CN106528623B - Search engine acceleration method and device - Google Patents

Search engine acceleration method and device

Info

Publication number
CN106528623B
CN106528623B CN201610878061.7A CN201610878061A
Authority
CN
China
Prior art keywords
document
search
address
search engine
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610878061.7A
Other languages
Chinese (zh)
Other versions
CN106528623A (en)
Inventor
张立峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Shenzhen Intellifusion Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Intellifusion Technologies Co Ltd
Publication of CN106528623A
Application granted
Publication of CN106528623B
Legal status: Active (current)
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a search engine acceleration method and device, including: a search engine receives a search keyword input by a user and outputs, for the search keyword, the top N search results that satisfy the condition, where the top N search results include: the matching score of a document, a document address ID, and a search engine shard, and N is a positive integer of massive scale; the document address ID is represented using a bitmap data structure, the internal data sequence numbers of the bitmap data structure match the ordering of the search results, and a single bit at the position corresponding to a sequence number indicates whether the document is stored. The technical solution provided by the invention has the advantages of fast search speed and low memory usage.

Description

Search engine acceleration method and device
Technical field
This application relates to the technical field of data processing, and in particular to a search engine acceleration method and device.
Background technology
Many search engines build an inverted index around keywords to search for documents, so as to accelerate keyword search. However, some fields are not suitable for an inverted index, because the documents returned by a keyword query often also require complex computation on some of their fields; such fields are not suitable for an inverted index, and if the system continues to be designed in the conventional way, throughput can be severely limited.
Taking picture search as an example, the usual processing flow of picture search is: cameras collect massive numbers of pictures, and the corresponding feature values are generated and stored in the search engine; at search time, a picture is uploaded and its feature value A is generated, and similar pictures are retrieved from the search engine using feature value A and other filter conditions, completing search-by-image. For example, on a single machine (CPU: Intel(R) Xeon(R) CPU E5-2520 v3 @ 2.40GHz, memory: 64G, hard disk: 1T SATA; the machines referred to below all have this configuration), with a keyword-search result scale of 2,000,000 documents, a design with complex mathematical operations is very difficult to convert into a keyword-only inverted index; adding the other complex computation, the total response time is 3s, and as the number of returned samples grows the response time increases sharply; when concurrency reaches 10 requests, the whole system collapses. Face search is another example within picture search: because the feature value of a face is a very large array that must be computed with a specific algorithm, the industry currently uses LIRE (Lucene Image Retrieval) or similar tools to search pictures, but the search accuracy for faces is not very high, so the industry requirements cannot be met.
When documents contain complex fields that are not suitable for an inverted index to complete the search task, while massive result sets and highly concurrent requests must still be handled, engine search easily suffers from high memory consumption, excessive CPU load, low processing efficiency, long response times, and low search accuracy, and it easily runs the risk of memory overflow.
Summary of the invention
The embodiment of the present application provides a search engine acceleration method and device, so that a search engine can meet the requirements of massive result sets, high concurrency, and low latency, thereby improving the processing efficiency and speed of the search engine.
A first aspect of the embodiment of the present application provides a search engine acceleration method, including:
a search engine receives a search keyword input by a user, and the search engine outputs, for the search keyword, the top N search results that satisfy the condition, where the top N search results include: the matching score of a document, a document address ID, and a search engine shard, and N is a positive integer of massive scale;
the document address ID is represented using a bitmap data structure, the internal data sequence numbers of the bitmap data structure match the ordering of the search results, and a single bit at the position corresponding to a sequence number indicates whether the document is stored.
Optionally, the method further includes:
creating multi-threaded tasks to process the document address IDs, allocating an independent data space for each thread, and having the computing tasks reuse the data space.
Optionally, allocating an independent data space for each thread and having the computing tasks reuse the data space specifically includes:
allocating an independent data space for each thread, clearing the computing task's data after the computing task of a thread is completed, and starting to run the next computing task.
Optionally, allocating an independent data space for each thread specifically includes:
allocating a data space of the same size for each thread.
Optionally, the method further includes:
obtaining all document information using a non-blocking I/O model; when the non-blocking I/O model has finished receiving the document information, notifying the central processing unit (CPU) by interrupt to process it; and when the central processing unit has finished processing all the document information, storing the document information into memory.
A second aspect provides a search engine device, including:
a receiving unit, configured to receive a search keyword input by a user;
a search unit, configured to output, for the search keyword, the top N search results that satisfy the condition, where the top N search results include: the matching score of a document, a document address ID, and a search engine shard, and N is a positive integer of massive scale;
the document address ID is represented using a bitmap data structure, the internal data sequence numbers of the bitmap data structure match the ordering of the search results, and a single bit at the position corresponding to a sequence number indicates whether the document is stored.
Optionally, the device further includes:
a creating unit, configured to create multi-threaded tasks to process the document address IDs;
an allocation unit, configured to allocate an independent data space for each thread and have the computing tasks reuse the data space.
Optionally, the allocation unit is specifically configured to allocate an independent data space for each thread, clear the computing task's data after the computing task of a thread is completed, and start running the next computing task.
Optionally, the allocation unit is specifically configured to allocate a data space of the same size for each thread.
Optionally, the device further includes:
an acquiring unit, configured to obtain all document information using a non-blocking I/O model;
a storage unit, configured to, when the non-blocking I/O model has finished receiving the document information, notify the central processing unit (CPU) by interrupt to process it, and, when the central processing unit has finished processing all the document information, store the document information into memory.
The technical solution provided by the invention has the advantage of fast search speed.
Description of the drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can also be obtained from these drawings without creative effort.
Fig. 1 is a schematic flow diagram of a search engine acceleration method provided by an embodiment of the present invention;
Fig. 2 is a schematic flow diagram of a search engine acceleration method provided by another embodiment of the present invention;
Fig. 3 is a schematic flow diagram of a search engine acceleration method provided by a further embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a search engine device provided by an embodiment of the present invention;
Fig. 5 is a schematic hardware structure diagram of a search device provided by an embodiment of the present invention.
Specific embodiment
It should be mentioned that, before the exemplary embodiments are discussed in greater detail, some exemplary embodiments are described as processes or methods depicted as flow charts. Although a flow chart describes the operations as a sequential process, many of the operations can be implemented in parallel, concurrently, or simultaneously. In addition, the order of the operations can be rearranged. A process can be terminated when its operations are completed, but it may also have additional steps not included in the drawings. A process can correspond to a method, a function, a procedure, a subroutine, a subprogram, and the like.
The "computer device" referred to in this context, also called a "computer", refers to an intelligent electronic device that can perform predetermined processing such as numerical computation and/or logical computation by running preset programs or instructions. It can include a processor and a memory, with the processor executing pre-stored instructions in the memory to perform the predetermined processing, or the predetermined processing can be performed by hardware such as an ASIC, FPGA, or DSP, or realized by a combination of the two. Computer devices include, but are not limited to, servers, personal computers, laptops, tablet computers, smart phones, and the like.
The methods discussed below (some of which are illustrated by flow charts) can be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented with software, firmware, middleware, or microcode, the program code or code segments to perform the necessary tasks can be stored in a machine-readable or computer-readable medium (such as a storage medium). One or more processors can perform the necessary tasks.
The specific structural and functional details disclosed herein are merely representative and are for the purpose of describing exemplary embodiments of the present invention. However, the present invention can be implemented in many alternative forms and should not be interpreted as being limited only to the embodiments set forth herein.
It should be understood that although the terms "first", "second", and so on may be used herein to describe units, these units should not be limited by these terms. These terms are only used to distinguish one unit from another. For example, without departing from the scope of the exemplary embodiments, a first unit can be referred to as a second unit, and similarly a second unit can be referred to as a first unit. The term "and/or" used herein includes any and all combinations of one or more of the listed associated items.
The terminology used herein is only for the purpose of describing specific embodiments and is not intended to limit the exemplary embodiments. Unless the context clearly indicates otherwise, the singular forms "a" and "an" used herein are also intended to include the plural. It should also be understood that the terms "comprising" and/or "including" used herein specify the presence of the stated features, integers, steps, operations, units, and/or components, and do not preclude the presence or addition of one or more other features, integers, steps, operations, units, components, and/or combinations thereof.
It should also be mentioned that, in some alternative implementations, the functions/actions mentioned can occur in an order different from that indicated in the drawings. For example, depending on the functions/actions involved, two figures shown in succession may actually be executed substantially simultaneously, or sometimes in the reverse order.
The present invention is described in further detail below with reference to the accompanying drawings.
According to an aspect of the invention, a search engine acceleration method is provided.
In one embodiment, the above method can be used in a smart device. It should be noted that the smart device is only an example; other existing network devices and user equipment, or those that may appear in the future, should also fall within the scope of the present invention if applicable to the present invention, and are incorporated herein by reference.
Referring first to Fig. 1, Fig. 1 is a schematic flow diagram of a search engine acceleration method provided by an embodiment of the present invention. As shown in Fig. 1, the above method can be applied in an intelligent terminal or a computer device; the intelligent terminal includes, but is not limited to, devices such as mobile phones, tablet computers, computers, and servers, and of course other devices such as smart watches or smart bracelets. As shown in Fig. 1, the method includes the following steps:
Step S101: the search engine receives a search keyword input by the user.
There can be many ways to receive the search keyword input by the user in step S101. For example, in a preferred embodiment of the present invention, step S101 obtains the search keyword input by the user through keyboard input. Of course, the keyboard can take different forms on different devices; for example, on a computer (including but not limited to desktop computers and notebook computers) the keyboard can be a physical keyboard, while on a tablet computer or mobile phone the keyboard can be a virtual keyboard generated by software; the present application does not limit the specific form of the keyboard. In another preferred embodiment of the present invention, the search keyword input by the user can be obtained in step S101 by voice input; in practical applications the voice input can be obtained through a built-in microphone, and in another practical application the voice input can also be obtained through a microphone of a device connected to the smart device. Of course, in practical applications step S101 can also use other reception methods, which are not listed one by one here.
Step S102: the search engine outputs, for the search keyword, the top N search results that satisfy the condition, where the top N search results include: the matching score of a document, a document address ID, and a search engine shard, and N is a positive integer of massive scale; the document address ID is represented using a bitmap data structure, the internal data sequence numbers of the bitmap data structure match the ordering of the search results, and a single bit at the position corresponding to a sequence number indicates whether the document is stored. The positive integer of massive scale generally refers to a positive integer whose value is greater than 1,000,000.
The search results in step S102 are the search results obtained by the search engine; different search engines may produce different search results, for example, the results returned by Baidu and by Google may differ.
In the above step, a bitmap data structure replaces the traditional document address ID. An existing document address ID has a size of 64 bits, whereas in the present application a document address ID occupies only a single bit: the bitmap data structure indicates whether the document at the position corresponding to a digit is saved, where 1 means saved and 0 means not saved. The specific implementation is illustrated below with a practical example; because of space limitations, 15 document addresses are taken as an example, while in practical applications the number of document address IDs can reach millions or even hundreds of millions. For example, for the bit string 0111000111110101, the corresponding meaning is that the corresponding documents are saved for document sequence numbers 2, 3, 4, 8, 9, 10, 11, 12, 14, and 15, and no documents are saved for the other sequence numbers.
From the above description, for the search engine, since 64 bits are reduced to 1 bit, the memory storage space is greatly saved. For search results, the quantity is generally of the order of millions, about 63*10^7 bits; even calculated for 1,000,000 search results, 6.3*10^8 bits of memory can be saved, so a substantial amount of memory is saved, thereby improving the search speed.
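The bitmap representation described above can be illustrated with a minimal sketch; the class and method names below (DocumentBitmap, markStored, isStored) are illustrative assumptions, not part of the patent, and the sketch only shows the idea of keeping one presence bit per document sequence number instead of a 64-bit address ID per document.

import java.util.BitSet;

/**
 * Minimal sketch of the bitmap described above: each document sequence
 * number maps to a single bit, where 1 means the document is stored and
 * 0 means it is not, instead of keeping a 64-bit address ID per document.
 */
public class DocumentBitmap {
    private final BitSet bits;

    public DocumentBitmap(int maxDocuments) {
        this.bits = new BitSet(maxDocuments);
    }

    /** Mark the document at the given sequence number as stored. */
    public void markStored(int sequenceNumber) {
        bits.set(sequenceNumber);
    }

    /** Return true if the document at the given sequence number is stored. */
    public boolean isStored(int sequenceNumber) {
        return bits.get(sequenceNumber);
    }

    public static void main(String[] args) {
        // The 15-address example from the description: documents with
        // sequence numbers 2, 3, 4, 8, 9, 10, 11, 12, 14, 15 are stored.
        DocumentBitmap bitmap = new DocumentBitmap(16);
        for (int seq : new int[] {2, 3, 4, 8, 9, 10, 11, 12, 14, 15}) {
            bitmap.markStored(seq);
        }
        System.out.println(bitmap.isStored(3));  // true
        System.out.println(bitmap.isStored(5));  // false
    }
}

At one bit per document, 1,000,000 documents occupy roughly 125 KB (10^6 bits), compared with about 8 MB for 1,000,000 64-bit address IDs, which illustrates the memory saving discussed above.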
The above search engine refers to a system that, according to a certain strategy and using specific computer programs, collects information from the Internet, organizes and processes the information, provides retrieval services to users, and displays the information relevant to the user's search to the user. Search engines include full-text indexes, directory indexes, meta search engines, vertical search engines, composite search engines, portal search engines, free link lists, and the like.
A search engine consists of four parts: a searcher, an indexer, a retriever, and a user interface. The function of the searcher is to roam the Internet, discovering and collecting information. The function of the indexer is to understand the information found by the searcher, extract index entries from it to represent the documents, and generate the index table of the document library. The function of the retriever is to quickly check the documents in the index library according to the user's query, evaluate the relevance between the documents and the query, sort the results to be output, and implement some user relevance feedback mechanism. The role of the user interface is to accept the user's query, display the query results, and provide a user relevance feedback mechanism.
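The four-part structure described above can be outlined roughly as follows; the interface names (Searcher, Indexer, Retriever, UserInterface) and method signatures are purely illustrative assumptions used to make the division of responsibilities concrete, not a definition from the patent.

import java.util.List;

interface Searcher {            // roams the Internet, discovers and collects information
    Iterable<String> collect();
}

interface Indexer {             // extracts index entries and builds the document-library index table
    void index(Iterable<String> documents);
}

interface Retriever {           // checks the index against a query, evaluates relevance, sorts the results
    List<String> retrieve(String query);
}

interface UserInterface {       // accepts the query, displays results, gathers relevance feedback
    void show(List<String> results);
}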
Referring to Fig. 2, Fig. 2 is a schematic flow diagram of a search engine acceleration method provided by another embodiment of the present invention. As shown in Fig. 2, the above method can be applied in an intelligent terminal or a computer device; the intelligent terminal includes, but is not limited to, devices such as mobile phones, tablet computers, computers, and servers, and of course other devices such as smart watches or smart bracelets. As shown in Fig. 2, the method includes the following steps:
Step S201: the search engine receives a search keyword input by the user.
There can be many ways to receive the search keyword input by the user in step S201. For example, in a preferred embodiment of the present invention, step S201 obtains the search keyword input by the user through keyboard input. Of course, the keyboard can take different forms on different devices; for example, on a computer (including but not limited to desktop computers and notebook computers) the keyboard can be a physical keyboard, while on a tablet computer or mobile phone the keyboard can be a virtual keyboard generated by software; the present application does not limit the specific form of the keyboard. In another preferred embodiment of the present invention, the search keyword input by the user can be obtained in step S201 by voice input; in practical applications the voice input can be obtained through a built-in microphone, and in another practical application the voice input can also be obtained through a microphone of a device connected to the smart device. Of course, in practical applications step S201 can also use other reception methods, which are not listed one by one here.
Step S202: the search engine outputs, for the search keyword, the top N search results that satisfy the condition, where the top N search results include: the matching score of a document, a document address ID, and a search engine shard, and N is a positive integer of massive scale; the document address ID is represented using a bitmap data structure, the internal data sequence numbers of the bitmap data structure match the ordering of the search results, and a single bit at the position corresponding to a sequence number indicates whether the document is stored.
The search results in step S202 are the search results obtained by the search engine; different search engines may produce different search results, for example, the results returned by Baidu and by Google may differ.
In the above step, a bitmap data structure replaces the traditional document address ID. An existing document address ID has a size of 64 bits, whereas in the present application a document address ID occupies only a single bit: the bitmap data structure indicates whether the document at the position corresponding to a digit is saved, where 1 means saved and 0 means not saved. The specific implementation is illustrated below with a practical example; because of space limitations, 15 document addresses are taken as an example, while in practical applications the number of document address IDs can reach millions or even hundreds of millions. For example, for the bit string 0111000111110101, the corresponding meaning is that the corresponding documents are saved for document sequence numbers 2, 3, 4, 8, 9, 10, 11, 12, 14, and 15, and no documents are saved for the other sequence numbers.
From the above description, for the search engine, since 64 bits are reduced to 1 bit, the memory storage space is greatly saved. For search results, the quantity is generally of the order of millions, about 63*10^7 bits; even calculated for 1,000,000 search results, 6.3*10^8 bits of memory can be saved, so a substantial amount of memory is saved, thereby improving the search speed.
Step S203: multi-threaded tasks are created to process the document address IDs, an independent data space is allocated for each thread, and the computing tasks reuse the data space.
The implementation of step S203 can specifically be as follows:
An independent data space is allocated for each thread; after the computing task of a thread is completed, the computing task's data is cleared and the next computing task starts to run, as in the sketch below. In addition, optionally, a data space of the same size is allocated for each thread, so that the number of documents handled by each thread in multi-threading is essentially the same, which avoids the problem of uneven document counts across threads and further improves speed.
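One possible realization of the per-thread data space described above is sketched here, assuming a fixed pool of worker threads in which each thread holds its own preallocated buffer of identical size, clears it after each computing task, and reuses it for the next one; the names (BitmapWorkers, WORK_SPACE, BUFFER_SIZE, workerCount) are illustrative assumptions, not the patent's own code.

import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BitmapWorkers {
    // Identical data-space size for every thread, so each thread handles
    // roughly the same number of documents.
    private static final int BUFFER_SIZE = 1 << 20;

    // Each worker thread gets its own independent buffer, allocated once
    // and reused by every computing task that runs on that thread.
    private static final ThreadLocal<long[]> WORK_SPACE =
            ThreadLocal.withInitial(() -> new long[BUFFER_SIZE]);

    public static void main(String[] args) {
        int workerCount = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workerCount);

        for (int task = 0; task < 8; task++) {
            pool.submit(() -> {
                long[] space = WORK_SPACE.get();   // reuse this thread's data space
                // ... process one slice of document address IDs into 'space' ...
                Arrays.fill(space, 0L);            // clear after the task completes,
                                                   // ready for the next computing task
            });
        }
        pool.shutdown();
    }
}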
In the technical solution of this other embodiment of the present invention, since 64 bits are reduced to 1 bit, the memory storage space is greatly saved; for search results, the quantity is generally of the order of millions, about 63*10^7 bits, and even calculated for 1,000,000 search results, 6.3*10^8 bits of memory can be saved, so a substantial amount of memory is saved and the search speed is improved. In addition, documents are processed by configuring multi-threading, which further improves the search speed, so the solution has the advantage of further accelerating the search.
Referring to Fig. 3, Fig. 3 is a schematic flow diagram of a search engine acceleration method provided by a further embodiment of the present invention. As shown in Fig. 3, the above method can be applied in an intelligent terminal or a computer device; the intelligent terminal includes, but is not limited to, devices such as mobile phones, tablet computers, computers, and servers, and of course other devices such as smart watches or smart bracelets. As shown in Fig. 3, the method includes the following steps:
Step S301: the search engine receives a search keyword input by the user.
There can be many ways to receive the search keyword input by the user in step S301. For example, in a preferred embodiment of the present invention, step S301 obtains the search keyword input by the user through keyboard input. Of course, the keyboard can take different forms on different devices; for example, on a computer (including but not limited to desktop computers and notebook computers) the keyboard can be a physical keyboard, while on a tablet computer or mobile phone the keyboard can be a virtual keyboard generated by software; the present application does not limit the specific form of the keyboard. In another preferred embodiment of the present invention, the search keyword input by the user can be obtained in step S301 by voice input; in practical applications the voice input can be obtained through a built-in microphone, and in another practical application the voice input can also be obtained through a microphone of a device connected to the smart device. Of course, in practical applications step S301 can also use other reception methods, which are not listed one by one here.
Step S302: the search engine outputs, for the search keyword, the top N search results that satisfy the condition, where the top N search results include: the matching score of a document, a document address ID, and a search engine shard, and N is a positive integer of massive scale; the document address ID is represented using a bitmap data structure, the internal data sequence numbers of the bitmap data structure match the ordering of the search results, and a single bit at the position corresponding to a sequence number indicates whether the document is stored.
The search results in step S302 are the search results obtained by the search engine; different search engines may produce different search results, for example, the results returned by Baidu and by Google may differ.
In the above step, a bitmap data structure replaces the traditional document address ID. An existing document address ID has a size of 64 bits, whereas in the present application a document address ID occupies only a single bit: the bitmap data structure indicates whether the document at the position corresponding to a digit is saved, where 1 means saved and 0 means not saved. The specific implementation is illustrated below with a practical example; because of space limitations, 15 document addresses are taken as an example, while in practical applications the number of document address IDs can reach millions or even hundreds of millions. For example, for the bit string 0111000111110101, the corresponding meaning is that the corresponding documents are saved for document sequence numbers 2, 3, 4, 8, 9, 10, 11, 12, 14, and 15, and no documents are saved for the other sequence numbers.
Step S303: all document information is obtained using a non-blocking I/O model; when the non-blocking I/O model has finished receiving the document information, the central processing unit (CPU) is notified to process it; when the central processing unit has finished processing all the document information, the document information is stored into memory.
Using non-blocking I/O in step S303 can separate I/O-intensive operations from CPU-intensive operations, reducing the CPU's waiting on I/O and thereby speeding up processing.
In blocking I/O mode, if less than the specified amount of data can be read from the network stream, blocking I/O blocks right there. For example, it is known that 10 bytes of data will be sent, but only 8 bytes have been received so far; the current thread then simply waits for the arrival of the next byte, doing nothing else, until all 10 bytes have been read, and only then is the current thread released from blocking.
In non-blocking I/O mode, if less than the specified amount of data can be read from the network stream, non-blocking I/O returns immediately. For example, it is known that 10 bytes of data will be sent, but only 8 bytes have been received so far; the current thread reads these 8 bytes, returns immediately after reading, and reads again when the other two bytes arrive.
From the above it can be seen that blocking I/O performs poorly: to build a web server with blocking I/O, a thread has to be started to handle each request. With non-blocking I/O, one or two threads are basically enough, because a thread never blocks; for example, it receives the data of request A, then receives the data of request B, and so on, constantly moving between connections and handling data as soon as it is received, as sketched below.
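A minimal sketch of the non-blocking behaviour described above, written here with Java NIO purely as an illustration (the patent does not prescribe a particular library): a channel in non-blocking mode returns immediately with however many bytes have arrived, and a selector lets one or two threads interleave many connections.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class NonBlockingReader {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9000));
        server.configureBlocking(false);                   // non-blocking mode
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select();                             // wait until some channel is ready
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buffer = ByteBuffer.allocate(10);   // expecting 10 bytes
                    int n = client.read(buffer);           // returns immediately, even if only
                                                           // 8 of the 10 bytes have arrived
                    if (n > 0) {
                        // hand the n received bytes to CPU-intensive processing and come
                        // back for the remaining bytes on a later readiness event
                    } else if (n == -1) {
                        client.close();
                    }
                }
            }
            selector.selectedKeys().clear();
        }
    }
}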
In the technical solution of this further embodiment of the present invention, since 64 bits are reduced to 1 bit, the memory storage space is greatly saved; for search results, the quantity is generally of the order of millions, about 63*10^7 bits, and even calculated for 1,000,000 search results, 6.3*10^8 bits of memory can be saved, so a substantial amount of memory is saved and the search speed is improved. In addition, this embodiment uses the non-blocking I/O mode, which further improves the search speed.
Referring to Fig. 4, Fig. 4 is a search engine device provided by an embodiment of the present invention, as shown in Fig. 4. For the definitions of the technical terms in the embodiment shown in Fig. 4, reference may be made to the definitions in the embodiments shown in Figs. 1, 2, and 3. The device includes:
a receiving unit 401, configured to receive a search keyword input by a user;
a search unit 402, configured to output, for the search keyword, the top N search results that satisfy the condition, where the top N search results include: the matching score of a document, a document address ID, and a search engine shard, and N is a positive integer of massive scale.
The document address ID is represented using a bitmap data structure, the internal data sequence numbers of the bitmap data structure match the ordering of the search results, and a single bit at the position corresponding to a sequence number indicates whether the document is stored.
Optionally, the device further includes:
a creating unit 403, configured to create multi-threaded tasks to process the document address IDs;
an allocation unit 404, configured to allocate an independent data space for each thread and have the computing tasks reuse the data space.
The allocation unit 404 is specifically configured to allocate an independent data space for each thread, clear the computing task's data after the computing task of a thread is completed, and start running the next computing task.
The allocation unit 404 is specifically configured to allocate a data space of the same size for each thread.
Optionally, the above device further includes:
an acquiring unit 405, configured to obtain all document information using a non-blocking I/O model;
a storage unit 406, configured to, when the non-blocking I/O model has finished receiving the document information, notify the central processing unit (CPU) to process it, and, when the central processing unit has finished processing all the document information, store the document information into memory.
Referring to Fig. 5, Fig. 5 is a schematic hardware structure diagram of a search device provided by an embodiment of the present invention. The above search device can specifically be a device such as a server, a computer, or a smart phone. As shown in Fig. 5, the search device 50 includes: a processor 501, a memory 502, a transceiver 503, and a bus 504. The transceiver 503 is used to send data to and receive data from external devices. The number of processors 501 in the search device 50 can be one or more. In some embodiments of the present application, the processor 501, the memory 502, and the transceiver 503 can be connected by a bus or in other ways. The memory 502 is used to store program code, and the processor 501 is used to call the program code stored in the memory 502 to realize the functions shown in Fig. 1, Fig. 2, and Fig. 3. For the meanings and examples of the terms involved in this embodiment, reference may be made to the corresponding embodiments in Figs. 1, 2, and 3, which are not repeated here. It should be noted that the processor 501 here can be one processing element or a collective name for multiple processing elements. For example, the processing element can be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application, such as one or more digital signal processors (DSP), or one or more field-programmable gate arrays (FPGA).
The memory 502 can be one storage device or a collective name for multiple storage elements, and is used to store the executable program code or the parameters, data, and the like required for the operation of the application running device. The memory 502 can include random-access memory (RAM) and can also include non-volatile memory, such as disk storage, flash memory, and the like.
The bus 504 can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in Fig. 5, but this does not mean that there is only one bus or only one type of bus.
The modules or sub-modules in all the embodiments of the present invention can be realized by a general-purpose integrated circuit, such as a CPU, or by an ASIC (Application Specific Integrated Circuit).
It should be noted that, for the foregoing method embodiments, for brevity of description, they are all expressed as a series of action combinations; however, those skilled in the art should know that the present invention is not limited by the described sequence of actions, because according to the present application some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in this specification are preferred embodiments, and the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the relevant descriptions of other embodiments.
The steps in the embodiments of the present invention can be reordered, merged, and deleted according to actual needs.
The units in the user terminal of the embodiments of the present invention can be combined, divided, and deleted according to actual needs.
One of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
A search engine acceleration method and device disclosed by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the method and core idea of the present invention. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (2)

1. A search engine acceleration method, characterized by comprising:
a search engine receives a search keyword input by a user, and the search engine outputs, for the search keyword, the top N search results that satisfy the condition, where the top N search results include: the matching score of a document, a document address ID, and a search engine shard, the N is a positive integer of massive scale, and the massive scale is a positive integer whose value is greater than 1,000,000;
the document address ID is represented using a bitmap data structure, the internal data sequence numbers of the bitmap data structure match the ordering of the search results, a single bit at the position corresponding to a sequence number indicates whether the document is stored, and the document address ID has only one bit;
wherein the method further comprises:
obtaining all document information using a non-blocking I/O model; when the non-blocking I/O model has finished receiving the document information, processing by notifying the central processing unit by interrupt; and when the central processing unit has finished processing all the document information, storing the document information into memory, where the non-blocking I/O is used to separate I/O-intensive operations from CPU-intensive operations;
the method further comprises:
creating multi-threaded tasks to process the document address IDs, allocating an independent data space for each thread, and having the computing tasks reuse the data space, specifically:
allocating an independent data space of the same size for each thread, clearing the computing task after the computing task of a thread is completed, and starting to run the next computing task.
2. A search engine device, characterized by comprising:
a receiving unit, configured to receive a search keyword input by a user;
a search unit, configured to output, for the search keyword, the top N search results that satisfy the condition, where the top N search results include: the matching score of a document, a document address ID, and a search engine shard, the N is a positive integer of massive scale, and the massive scale is a positive integer whose value is greater than 1,000,000;
the document address ID is represented using a bitmap data structure, the internal data sequence numbers of the bitmap data structure match the ordering of the search results, a single bit at the position corresponding to a sequence number indicates whether the document is stored, and the document address ID has only one bit;
wherein the device further comprises:
an acquiring unit, configured to obtain all document information using a non-blocking I/O model;
a storage unit, configured to, when the non-blocking I/O model has finished receiving the document information, notify the central processing unit (CPU) by interrupt to process it, and, when the central processing unit has finished processing all the document information, store the document information into memory, where the non-blocking I/O is used to separate I/O-intensive operations from CPU-intensive operations;
a creating unit, configured to create multi-threaded tasks to process the document address IDs;
an allocation unit, configured to allocate an independent data space for each thread and have the computing tasks reuse the data space, specifically: allocating an independent data space of the same size for each thread, clearing the computing task after the computing task of a thread is completed, and starting to run the next computing task.
CN201610878061.7A 2016-09-28 2016-10-08 Search engine acceleration method and device Active CN106528623B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610858742 2016-09-28
CN2016108587427 2016-09-28

Publications (2)

Publication Number Publication Date
CN106528623A CN106528623A (en) 2017-03-22
CN106528623B 2018-05-22

Family

ID=58331772

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610878061.7A Active CN106528623B (en) 2016-09-28 2016-10-08 Search engine acceleration method and device

Country Status (1)

Country Link
CN (1) CN106528623B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108121815B (en) * 2017-12-28 2022-03-11 深圳开思时代科技有限公司 Automobile part query method, device and system, electronic equipment and medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101295323A (en) * 2008-06-30 2008-10-29 腾讯科技(深圳)有限公司 Processing method and system for index updating
CN104636407A (en) * 2013-11-15 2015-05-20 腾讯科技(深圳)有限公司 Parameter choice training and search request processing method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Sharded Bitmap Index: An Auxiliary Index Mechanism for Cloud Data Management" (分片位图索引：一种适用于云数据管理的辅助索引机制); Meng Biping et al.; Chinese Journal of Computers (计算机学报); 2012-11-15; Vol. 35, No. 11; pp. 2306-2316 *
"Research on Vertical Data Partitioning Technology in a Database Acceleration Engine" (数据库加速引擎中数据垂直分片技术研究); Huang He et al.; Computer Engineering (计算机工程); 2006-08-31; Vol. 32, No. 16; pp. 34-35, 51 *

Also Published As

Publication number Publication date
CN106528623A (en) 2017-03-22

Similar Documents

Publication Publication Date Title
US10585915B2 (en) Database sharding
KR101661000B1 (en) Systems and methods to enable identification of different data sets
CN108255958A (en) Data query method, apparatus and storage medium
CN107704202B (en) Method and device for quickly reading and writing data
CN110275864A (en) Index establishing method, data query method and calculating equipment
CN104965826B (en) Search method and retrieval device based on browser
WO2020250064A1 (en) Context-aware data mining
CN109359237A (en) It is a kind of for search for boarding program method and apparatus
CN112035529B (en) Caching method, caching device, electronic equipment and computer readable storage medium
US20180129736A1 (en) System to organize search and display unstructured data
CN112148701A (en) File retrieval method and equipment
CN111367870A (en) Method, device and system for sharing picture book
CN108171189A (en) Video coding method, video coding device and electronic equipment
JP2021535473A (en) Token matching in a large document corpus
CN112070550A (en) Keyword determination method, device and equipment based on search platform and storage medium
CN107590248B (en) Search method, search device, search terminal and computer-readable storage medium
CN106528623B (en) Search engine acceleration method and device
CN109614478A (en) Construction method, key word matching method and the device of term vector model
CN112148865B (en) Information pushing method and device
CN110287284B (en) Semantic matching method, device and equipment
US20210034704A1 (en) Identifying Ambiguity in Semantic Resources
CN110688223A (en) Data processing method and related product
CN114741489A (en) Document retrieval method, document retrieval device, storage medium and electronic equipment
CN111783440B (en) Intention recognition method and device, readable medium and electronic equipment
CN111666449B (en) Video retrieval method, apparatus, electronic device, and computer-readable medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant