CN115438236A - Unified hybrid search method and system - Google Patents

Unified hybrid search method and system

Info

Publication number
CN115438236A
Authority
CN
China
Prior art keywords
data
keywords
keyword
cold
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211196900.9A
Other languages
Chinese (zh)
Other versions
CN115438236B (en)
Inventor
彭龙
鲁东民
杜宏博
葛晋鹏
米丽媛
郭亚辉
饶雷
张帅
王乃正
邵鹏志
梁冬
王静阳
印泰桦
袁艳敏
王乐和
曾帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China North Computer Application Technology Research Institute
Original Assignee
China North Computer Application Technology Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China North Computer Application Technology Research Institute filed Critical China North Computer Application Technology Research Institute
Priority to CN202211196900.9A priority Critical patent/CN115438236B/en
Publication of CN115438236A publication Critical patent/CN115438236A/en
Application granted granted Critical
Publication of CN115438236B publication Critical patent/CN115438236B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9038 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a unified hybrid search method and system in the technical field of data processing. It addresses the problem that prior-art keyword search must search exhaustively across all data, which occupies excessive search resources, slows the search, and makes the search engine prone to crashing when many users search at the same time. The method comprises: obtaining a keyword to be searched; comparing it, in a set priority order, against the locally pre-stored hot data keywords, the non-locally pre-stored hot data keywords, the locally pre-stored cold data keywords and the non-locally pre-stored cold data keywords; and, when a matching keyword is found among the hot data or cold data keywords, displaying the keyword and its corresponding data source address as the search result, the data source address being hyperlinked with the corresponding keyword.

Description

Unified hybrid search method and system
Technical Field
The invention relates to the technical field of data processing, in particular to a unified hybrid search method and a unified hybrid search system.
Background
A search engine is a retrieval technology that, according to user requirements and a certain algorithm, uses a specific strategy to retrieve specified information and feed it back to the user. Search engines rely on a range of technologies, such as web crawling, retrieval ranking, web page processing, big data processing and natural language processing, to provide fast and highly relevant information services to users. A search engine also holds a huge volume of data and consumes a large amount of resources. Existing search engines search exhaustively when performing a keyword search. This occupies excessive search resources, slows the search and affects the applications that depend on it, and the search engine is prone to crashing once many users search at the same time.
Disclosure of Invention
In view of the above analysis, the present invention aims to provide a unified hybrid search method and system. It addresses the problems that prior-art search systems incur high hardware storage costs, and that existing search methods search exhaustively for a keyword, so that search resources are excessively occupied, the search is slow, and the search engine is prone to crashing when many users search at the same time.
The purpose of the invention is mainly achieved by the following technical solution:
In one aspect, the present invention provides a unified hybrid search method, including the following steps:
acquiring a keyword to be searched;
comparing the keyword, in a set priority order, against the locally pre-stored hot data keywords, the non-locally pre-stored hot data keywords, the locally pre-stored cold data keywords and the non-locally pre-stored cold data keywords;
when a matching keyword is found among the hot data or cold data keywords, displaying the keyword and its corresponding data source address to obtain a search result, the data source address being hyperlinked with the corresponding keyword.
Further, the local pre-storage is local disk storage and is used to store hot data keywords and cold data keywords of the text type; the non-local pre-storage is cloud storage and is used to store non-text hot data keywords and non-text cold data keywords.
Further, the cold data keywords and the hot data keywords are obtained by counting the search frequency of each keyword over a preset interval and judging according to a search frequency threshold and a hot data storage probability.
Further, judging the cold data keywords and the hot data keywords according to the search frequency threshold and the hot data storage probability comprises the following steps:
setting an update period;
monitoring the number of times each keyword is retrieved in an update period, and making a preliminary cold/hot judgment according to the search frequency threshold to obtain pre-stored cold data keywords and pre-stored hot data keywords;
calculating the hot data storage probability of the pre-stored hot data keywords, and storing the data whose storage probability is greater than the storage probability threshold as hot data; and storing the data whose storage probability is smaller than the storage probability threshold, together with the pre-stored cold data keywords, as cold data keywords.
Further, the hot data storage probability is calculated by the following formula:
Figure BDA0003869915590000021
where P is the hot data storage probability, t is the number of searches for the current keyword, w is the total number of searches for all hot data keywords, R is the total data volume, and r is the data volume carrying the hot data keywords.
Further, the data source address is used to link to a web page or file matching the keyword; the data source addresses corresponding to locally pre-stored keywords are stored on a local disk, and the data source addresses corresponding to non-locally pre-stored keywords are stored in the cloud.
In another aspect, a unified hybrid search system is also disclosed, comprising: a central processing unit, a unified hybrid storage module and a data source address storage module;
the unified hybrid storage module is used to screen the cold data keywords and the hot data keywords and store them separately; the unified hybrid storage module comprises a cold and hot keyword judgment unit, a cold data keyword storage unit and a hot data keyword storage unit;
the data source address storage module is used to store data source address information and/or index information corresponding to the keywords; each data source address is hyperlinked with its corresponding keyword;
the central processing unit is used to call the cold and hot keyword judgment unit and the keyword storage units according to a preset cold and hot keyword judgment flow and a preset search flow, and to perform cold and hot data keyword judgment and cold and hot data keyword search; if a matching hot data or cold data keyword is found, the keyword and its corresponding data source address are displayed to obtain the search result.
Further, the cold and hot keyword judgment unit comprises a cold and hot data screening timing unit, a keyword search frequency statistics unit and a hot data storage probability calculation unit;
the cold and hot data screening timing unit is used to send a cold and hot data screening instruction to the central processing unit at regular intervals;
the keyword search frequency statistics unit is used to record and count the search frequency of each keyword within a preset time; the central processing unit identifies the cold and hot data keywords in the hot data keyword storage unit and the cold data keyword storage unit according to the counted search frequency of each keyword, and pre-stores them in the corresponding storage units according to the identification results;
the hot data storage probability calculation unit is used to calculate the hot data storage probability of the hot data keywords; data whose probability is greater than the threshold is stored as final hot data keywords, and data whose probability is smaller than the threshold, together with the data pre-stored in the cold data storage unit, is stored as final cold data keywords.
Further, the cold data keyword storage unit and the hot data keyword storage unit each include a local data resource storage module and a non-local data resource storage module;
the local data resource storage module is a local disk and is used to store cold and hot data keywords whose search steps are simple; the non-local data resource storage module is used to store cold and hot data keywords whose search steps are complex.
Further, the hot data storage probability is calculated by the following formula:
Figure BDA0003869915590000041
where P is the hot data storage probability, t is the number of searches for the current keyword, w is the total number of searches for all hot data keywords, R is the total data volume, and r is the data volume carrying the hot data keywords.
The beneficial effects of the technical solution are as follows:
1. By providing separate cold data and hot data keyword search units, the invention enables priority search: hot keywords with higher search frequency are searched first, which avoids exhaustive search and reduces the occupation of search resources.
2. The invention records the number of searches of each keyword through the keyword search frequency statistics module. After the 12 hours set by the cold and hot data screening timing module have elapsed, a cold and hot data screening instruction is sent to the central processing unit, and the keywords in the hot data keyword mixed search unit and the cold data keyword mixed search unit are re-identified according to the search frequency counted by the keyword search frequency statistics module and the hot data storage probability of each keyword. The cold and hot data are thus re-identified, redistributed and stored in the corresponding storage modules, so that each keyword search finds the matching keyword in the shortest time and resource occupation is reduced.
3. The invention first compares the input keyword against the keywords stored in the local data resource storage module of the pre-stored hot data keywords. Because this module is a local disk that stores text-type hot data keywords, a match at this stage greatly improves search efficiency and reduces the occupation of search resources.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, wherein like reference numerals are used to designate like parts throughout.
FIG. 1 is a flow chart of a unified hybrid search method for application-specific domains according to an embodiment of the present invention.
Detailed Description
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate preferred embodiments of the invention and together with the description, serve to explain the principles of the invention and not to limit the scope of the invention.
An embodiment of the present invention discloses a unified hybrid search method for a dedicated domain, as shown in FIG. 1, including the following steps:
Step S1: acquiring a keyword to be searched.
Specifically, the keyword to be searched can be entered through the search display page.
Preferably, depending on the actual application field of the search method, an initial database provided by an open-source resource, or the data storage database of a search engine for a professional field, can be imported when the search engine is constructed. The database comprises keywords and the data source addresses corresponding to the keywords. The keyword storage module and the corresponding data source address storage module are then configured to construct the search engine system. In actual use, the keyword to be searched is entered through an input device, and the search engine searches the keyword database according to a preset flow to obtain a search result.
Step S2: comparing the keyword, in a set priority order, against the locally pre-stored hot data keywords, the non-locally pre-stored hot data keywords, the locally pre-stored cold data keywords and the non-locally pre-stored cold data keywords.
Specifically, to improve search efficiency and reduce data storage cost, the keywords of the invention are divided into locally and non-locally stored hot data keywords and locally and non-locally stored cold data keywords. Following the set priority, the locally stored hot data keywords are searched first. If a keyword matches, the keyword and its corresponding data source address are displayed; if no match is found, the non-locally stored hot data keywords, the locally stored cold data keywords and the non-locally stored cold data keywords are searched and compared in turn in the same way. This avoids the conventional exhaustive search and improves search efficiency.
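The priority order described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each tier is an in-memory dictionary mapping a keyword to its data source address; the tier names, the function `tiered_search` and the example addresses are hypothetical and do not come from the patent.

```python
from typing import Dict, Optional, Sequence, Tuple

# Search order taken from the text: hot local, hot cloud, cold local, cold cloud.
# The tier names themselves are illustrative assumptions.
PRIORITY = ("hot_local", "hot_cloud", "cold_local", "cold_cloud")

def tiered_search(
    query: str,
    stores: Dict[str, Dict[str, str]],
    priority: Sequence[str] = PRIORITY,
) -> Optional[Tuple[str, str, str]]:
    """Return (tier, keyword, source_address) from the first tier that matches."""
    for tier in priority:
        address = stores.get(tier, {}).get(query)
        if address is not None:
            # Stop at the first hit so lower-priority tiers are never consulted.
            return tier, query, address
    return None

# Hypothetical example data; plain dicts stand in for disk and cloud storage.
stores = {
    "hot_local": {"search engine": "file:///data/docs/search_engine.txt"},
    "hot_cloud": {"benzene formula": "https://example.com/chem/benzene"},
    "cold_local": {"ancient poems": "file:///data/docs/poems.txt"},
    "cold_cloud": {},
}
result = tiered_search("ancient poems", stores)
```

Here `result` identifies the locally stored cold tier, because the two hot tiers were checked first and contained no match.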
Preferably, the local pre-storage is local disk storage and is used to store keywords whose search steps are simple, namely text keywords; the non-local pre-storage is cloud storage and is used to store keywords whose search steps are complex. Keywords with complex search steps are non-text keywords, such as pictures, formulas and chemical formulas, which must first be processed into keyword form before they can be searched.
Preferably, the cold data keywords and the hot data keywords are obtained by counting the search frequency of each keyword over a predetermined interval and judging according to the search frequency threshold and the hot data storage probability. According to statistical experience of search frequency and user behavior, the search frequency of data changes significantly over each 12-hour period, so in this embodiment the predetermined interval for counting the search frequency of cold data and hot data keywords is set to 12 hours.
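As a rough illustration of counting search frequency over a fixed interval, the following Python sketch keeps one counter per 12-hour period. The class name `SearchFrequencyCounter` and its methods are hypothetical, and averaging the count over the whole period to get a per-minute rate is an assumption; the text does not say how the per-minute frequency is measured.

```python
from collections import Counter

class SearchFrequencyCounter:
    """Counts searches per keyword within one update period (assumed 12 hours)."""

    def __init__(self, period_hours: float = 12.0) -> None:
        self.period_minutes = period_hours * 60
        self.counts: Counter = Counter()

    def record(self, keyword: str) -> None:
        """Register one search for the keyword."""
        self.counts[keyword] += 1

    def per_minute(self, keyword: str) -> float:
        """Average searches per minute over the whole period (an assumption)."""
        return self.counts[keyword] / self.period_minutes

    def reset(self) -> Counter:
        """Close the period: return the finished counts and start a new one."""
        finished, self.counts = self.counts, Counter()
        return finished

counter = SearchFrequencyCounter()
for _ in range(1440):
    counter.record("search engine")
rate = counter.per_minute("search engine")  # 1440 searches over 720 minutes
```

Calling `reset()` at the end of each period hands the finished counts to whatever performs the cold/hot judgment and starts a fresh count.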
Preferably, the cold data keywords and the hot data keywords are determined according to the search frequency threshold and the hot data storage probability by the following steps:
setting an update period;
monitoring the search frequency of each keyword in an update period, and making a preliminary cold/hot judgment according to the search frequency threshold to obtain pre-stored cold data keywords and pre-stored hot data keywords; the search frequency threshold in this embodiment is set to 1000 times/minute;
calculating the hot data storage probability of the pre-stored hot data keywords, and storing the data whose storage probability is greater than the storage probability threshold as hot data; the pre-stored cold data keywords and the data whose storage probability is smaller than the storage probability threshold are stored together as cold data keywords. This scheme makes flexible use of cloud resources for data storage without adding local storage hardware, while preserving the search efficiency of the hot data.
After the pre-stored hot data is obtained, a user with sufficient storage hardware can, to avoid the high cost of keeping all data in memory, evict the cold data to HDD storage while keeping the hot data on SSD.
For users with limited data storage hardware, the pre-stored hot data keywords can be further screened according to the hot data storage probability, so that the existing hardware resources are fully utilized.
Preferably, the hot data storage probability is calculated by the following formula:
Figure BDA0003869915590000071
where P is the hot data storage probability, t is the number of searches for the current keyword, w is the total number of searches for all hot data keywords, R is the total data volume, and r is the data volume carrying the hot data keywords.
The storage probability of each hot data keyword can be calculated with this formula. For example, if 10 hot data keywords are determined in a 12-hour period, the hot data storage probabilities P1, P2, …, P10 are calculated in turn and sorted in descending order; a threshold is set according to actual requirements, and the data above the threshold are selected from the sorted results as hot data keywords.
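The two-stage judgment (frequency threshold first, then storage probability) can be sketched in Python as follows. Because the probability formula appears only as an image in this text, the sketch takes the probability as a caller-supplied function rather than guessing its form. The function name `classify_keywords`, the treatment of values exactly equal to a threshold as cold, and all example numbers are assumptions.

```python
from typing import Callable, Dict, List, Tuple

def classify_keywords(
    per_minute: Dict[str, float],
    frequency_threshold: float,
    storage_probability: Callable[[str], float],
    probability_threshold: float,
) -> Tuple[List[str], List[str]]:
    """Two-stage split of keywords into (hot, cold) lists."""
    # Stage 1: preliminary split by search frequency.
    provisional_hot = [k for k, f in per_minute.items() if f > frequency_threshold]
    cold = [k for k, f in per_minute.items() if f <= frequency_threshold]

    # Stage 2: rank provisional hot keywords by storage probability, descending,
    # and keep only those above the probability threshold as hot.
    ranked = sorted(provisional_hot, key=storage_probability, reverse=True)
    hot = [k for k in ranked if storage_probability(k) > probability_threshold]
    cold += [k for k in ranked if storage_probability(k) <= probability_threshold]
    return hot, cold

# Hypothetical inputs: rates in searches per minute, probabilities precomputed.
rates = {"a": 1500.0, "b": 1200.0, "c": 10.0}
probabilities = {"a": 0.9, "b": 0.2}
hot, cold = classify_keywords(rates, 1000.0, probabilities.get, 0.5)
```

In this example "a" stays hot, while "b" passes the frequency stage but falls back to cold because its storage probability is below the threshold.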
Further, in the foregoing embodiment, a certain type of data may suddenly become hot data. In that case, provided the migration cost threshold is not exceeded, a large-scale migration of storage location can be performed and all data of that type saved directly to SSD.
Specifically, the total cost of hot data migration is checked with the following formula:
f*N<F;
where f is the migration cost of a single keyword, N is the number of hot data keywords to migrate, and F is the threshold for the total migration cost. If the inequality holds, batch migration can be carried out.
As a specific example, consider NoSQL big data storage in which a client that needs a retrieval engine rents 1000 machines. Each high-performance SSD machine costs 8w (80,000) per year, for a total of 8000 ten thousand (80 million) per year. Migrating cold data to low-performance machines costs 4w (40,000) per machine per year, so moving all data to 1000 HDD machines costs 4000 ten thousand (40 million) per year. Taking one cold/hot adjustment every 24 hours as an example, with 365 days in a year, 4000/365 is approximately 10.9, so after separating cold and hot data the migration threshold is 10.9 ten thousand (about 109,000). As long as the total cost of each batch migration is below this threshold, batch migration can be used and the total rental cost is lower.
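The arithmetic of this example and the batch condition f*N<F can be checked with a short Python sketch. The function names are hypothetical, amounts are in units of ten thousand as in the text, and the per-keyword migration costs used at the end are made-up values for illustration only.

```python
def migration_threshold(annual_amount: float, adjustments_per_year: int) -> float:
    """Spread an annual amount evenly over the adjustment rounds in a year."""
    return annual_amount / adjustments_per_year

def can_migrate_in_batch(cost_per_keyword: float, keyword_count: int, threshold: float) -> bool:
    """Check the condition f * N < F from the text."""
    return cost_per_keyword * keyword_count < threshold

machines = 1000
ssd_total = machines * 8   # 8000 (ten thousand) per year on SSD machines
hdd_total = machines * 4   # 4000 (ten thousand) per year on HDD machines

# The text divides the figure of 4000 by 365 daily adjustments.
F = migration_threshold(hdd_total, 365)

# Hypothetical batches: 1000 keywords at two different per-keyword costs.
cheap_batch_ok = can_migrate_in_batch(0.01, 1000, F)
costly_batch_ok = can_migrate_in_batch(0.02, 1000, F)
```

Under these assumed costs the cheaper batch satisfies the inequality and the more expensive one does not, so only the former would be migrated in bulk.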
With the search priority set in this way, the method searches the hot data local resources, the hot data cloud storage resources, the cold data local resources and the cold data cloud storage resources in turn. This avoids exhaustive search, greatly improves search efficiency, and solves the problem that conventional exhaustive search is slow and the search engine is prone to crashing when many users search at the same time. In addition, cold and hot data keywords are judged and stored according to search frequency and hot data storage probability; different thresholds can be set according to each user's capacity to bear hardware costs and the cost of separating cold and hot data, so that the requirements of different users are met and each user is offered a suitably differentiated storage scheme.
Step S3: when a matching keyword is found among the hot data or cold data keywords, displaying the keyword and its corresponding data source address to obtain a search result; the data source address is hyperlinked with the corresponding keyword.
Specifically, when a matching keyword is found, it is displayed on the keyword display page together with similar keywords, for the user's convenience. Similar keywords can be obtained according to semantic similarity; for example, "ancient poems", "ancient poetry" and "poetry sentences" are similar keywords. In practice, similar keywords can be determined as required from the cosine similarity between the input keyword and the keywords stored in the database, and displayed at the same time.
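A minimal Python sketch of cosine-similarity matching is given below. The text does not say how keywords are turned into vectors, so the sketch substitutes character-bigram counts as a simple stand-in for a semantic representation; the helper names and the 0.5 cut-off are assumptions.

```python
import math
from collections import Counter
from typing import Iterable, List

def char_bigrams(text: str) -> Counter:
    """Character-bigram counts; a stand-in for a real semantic embedding."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similar_keywords(query: str, stored: Iterable[str], min_similarity: float = 0.5) -> List[str]:
    """Stored keywords whose similarity to the query meets the cut-off, best first."""
    q = char_bigrams(query)
    scored = [(cosine_similarity(q, char_bigrams(k)), k) for k in stored if k != query]
    return [k for s, k in sorted(scored, reverse=True) if s >= min_similarity]

matches = similar_keywords("ancient poems", ["ancient poetry", "database index"])
```

Character bigrams only capture surface overlap, so a production system aiming at semantic similarity would presumably swap `char_bigrams` for a proper embedding while keeping the same cosine comparison.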
At the same time, the data source address corresponding to the found keyword is displayed on a source address display page. The data source address is used to link to a web page or file matching the keyword, and is likewise divided into local storage and non-local storage: the data source addresses corresponding to locally stored keywords are stored on a local disk, and those corresponding to non-locally stored keywords are stored in the cloud. The data source address information is hyperlinked with the corresponding keyword, and the hyperlinked source address information is displayed on the source address display page for the user to click through.
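One way to picture the hyperlink between a keyword and its data source address is the small Python sketch below. It assumes the display page is HTML, which the text does not state; the function `render_result` is hypothetical.

```python
from html import escape

def render_result(keyword: str, source_address: str) -> str:
    """Return an HTML anchor that links the keyword to its data source address."""
    # Escape both parts so special characters cannot break the markup.
    return f'<a href="{escape(source_address, quote=True)}">{escape(keyword)}</a>'

link = render_result("ancient poems", "https://example.com/poems?lang=zh&page=1")
```

The same function works for local addresses such as `file:///...` paths, since it treats the address as an opaque string.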
Another embodiment of the present invention further discloses a unified hybrid search system for a dedicated domain, comprising: a central processing unit, a unified hybrid storage module and a data source address storage module.
The unified hybrid storage module is used to screen the cold data keywords and the hot data keywords and store them separately; it comprises a cold and hot keyword judgment unit, a cold data keyword storage unit and a hot data keyword storage unit.
The cold and hot keyword judgment unit comprises a cold and hot data screening timing unit, a keyword search frequency statistics unit and a hot data storage probability calculation unit.
The cold and hot data screening timing unit is used to send a cold and hot data screening instruction to the central processing unit at regular intervals; preferably, its initial set value is 12 hours, that is, the cold and hot data keywords are re-judged every 12 hours.
The keyword search frequency statistics unit is used to record and count the search frequency of each keyword within a preset time; the central processing unit identifies the cold and hot data keywords in the hot data keyword storage unit and the cold data keyword storage unit according to the counted search frequency of each keyword, and pre-stores them in the corresponding cold and hot keyword storage units according to the identification result.
The hot data storage probability calculation unit is used to calculate the hot data storage probability of the hot data keywords. Data whose storage probability is greater than the threshold is stored in the hot data keyword storage unit as final hot data keywords; data whose storage probability is smaller than the threshold, together with the data pre-stored in the cold data storage unit, is stored in the cold data keyword storage unit as final cold data keywords.
Preferably, the hot data storage probability is calculated by the following formula:
Figure BDA0003869915590000101
where P is the hot data storage probability, t is the number of searches for the current keyword, w is the total number of searches for all hot data keywords, R is the total data volume, and r is the data volume carrying the hot data keywords.
The cold data keyword storage unit and the hot data keyword storage unit each include a local data resource storage module and a non-local data resource storage module.
The local data resource storage module is a local disk and is used to store cold and hot data keywords whose search steps are simple; the non-local data resource storage module is used to store cold and hot data keywords whose search steps are complex.
Preferably, keywords whose search steps are simple are text keywords, and keywords whose search steps are complex are non-text keywords, such as pictures, formulas and chemical formulas, which must first be processed into keyword form before they can be searched.
The data source address storage module is used to store data source address information and/or index information corresponding to the keywords; each data source address is hyperlinked with its corresponding keyword.
The central processing unit is used to call the cold and hot keyword judgment unit and the keyword storage units according to a preset cold and hot keyword judgment flow and a preset search flow, and to perform cold and hot data keyword judgment and cold and hot data keyword search; if a matching hot data or cold data keyword is found, the keyword and its corresponding data source address are displayed to obtain the search result.
The search display page module comprises a source address display page unit and a keyword display page unit; the source address display page unit is used to display the source address information of the keywords, and the keyword display page unit is used to display the keyword information.
In summary, according to the system and method for unified and hybrid search provided by the embodiments of the present invention, through the design of priority, the search of priority is performed through the hot data keyword hybrid search unit, and the search of the keyword with higher search frequency is performed in advance. In addition, the searching of the thermal data keyword mixed searching unit is also provided with priority, the keywords stored in the local data resource storage module of the pre-stored heat storage data keywords are preferentially called to be compared with the input keywords for searching, and the local data resource storage module of the pre-stored thermal data keywords is a local disk and is used for storing the thermal data keywords which are simple in searching steps, so that if the input keywords are consistent with the keywords stored in the local data resource storage module of the pre-stored heat storage data keywords, the searching efficiency can be greatly improved, and the occupation of the searching resources is reduced.
Secondly, when the hot data keyword mixed search unit does not search for a matched keyword, the cold data keyword mixed search unit is used for performing comparison search, the cold data keyword mixed search unit is also provided with priority, firstly, keywords stored in a local data resource storage module for pre-storing cold data keywords are called to perform comparison search with input keywords, and the local data resource storage module for pre-storing the cold data keywords is a local disk and is used for storing the cold data keywords which are simple in the search step, so that the search efficiency can be further improved, and the occupation of search resources can be reduced.
Finally, the keyword search frequency statistical module records the search frequency of each keyword. When the 12-hour interval set by the cold and hot data screening timing module elapses, a cold and hot data screening instruction is sent to the central processing unit, and the keywords in the hot data keyword mixed search unit and the cold data keyword mixed search unit are re-identified according to the search frequency of each keyword counted by the keyword search frequency statistical module. The cold and hot data are then further judged according to the hot data storage probability, so that they are re-identified, redistributed, and stored in the corresponding storage modules. This meets the different requirements of different users on hardware storage cost, allows each keyword search to find the matching keyword in the shortest possible time, and greatly improves search efficiency.
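The two-stage re-identification described above can be sketched as follows. This is a hedged illustration, not the embodiment's implementation: the `reclassify` function and its parameter names are hypothetical, and the hot data storage probability is passed in as a caller-supplied function because its formula is not reproduced in this text. Treating a count equal to the frequency threshold as a hot candidate, and a probability equal to the probability threshold as cold, are also assumptions.

```python
from typing import Callable, Dict, Set, Tuple


def reclassify(
    counts: Dict[str, int],
    frequency_threshold: int,
    probability_threshold: float,
    storage_probability: Callable[[str], float],
) -> Tuple[Set[str], Set[str]]:
    """Split keywords into (hot, cold) sets for one update period.

    counts holds the number of searches per keyword in the period.
    storage_probability is a caller-supplied stand-in for the hot data
    storage probability (hypothetical; the formula is not modeled here).
    """
    hot: Set[str] = set()
    cold: Set[str] = set()
    for keyword, count in counts.items():
        # Stage 1: preliminary judgment by search frequency.
        # Assumption: a count equal to the threshold is a hot candidate.
        if count < frequency_threshold:
            cold.add(keyword)
            continue
        # Stage 2: a candidate stays hot only if its storage probability
        # is greater than the threshold. Assumption: equality goes to cold.
        if storage_probability(keyword) > probability_threshold:
            hot.add(keyword)
        else:
            cold.add(keyword)
    return hot, cold
```

A scheduler would call such a function once per update period (12 hours in the embodiment above) and then move each keyword to the storage module matching its new label.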
Those skilled in the art will appreciate that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program, which is stored in a computer readable storage medium, to instruct related hardware. The computer readable storage medium is a magnetic disk, an optical disk, a read-only memory or a random access memory.
While the invention has been described with reference to specific preferred embodiments, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the following claims.

Claims (10)

1. A unified hybrid search method is characterized by comprising the following steps:
acquiring a keyword to be searched;
comparing and searching the locally pre-stored hot data keywords, the non-locally pre-stored hot data keywords, the locally pre-stored cold data keywords and the non-locally pre-stored cold data keywords according to the set priorities;
when a hot data keyword or cold data keyword matching the keyword to be searched is found, displaying the keyword and the data source address corresponding to the keyword to obtain a search result; wherein the data source address is hyperlinked with the corresponding keyword.
2. The unified hybrid search method according to claim 1, wherein said local pre-storage is local disk storage and is used for storing text-type hot data keywords and cold data keywords; the non-local pre-storage is cloud storage and is used for storing non-text hot data keywords and non-text cold data keywords.
3. The unified hybrid search method according to claim 1, wherein the cold data keywords and the hot data keywords are obtained by counting the search frequency of each keyword at predetermined intervals and determining according to a search frequency threshold and a hot data storage probability.
4. The unified hybrid search method of claim 3, wherein determining the cold data keyword and the hot data keyword according to the search frequency threshold and the hot data storage probability comprises:
setting an updating period;
monitoring the number of times each keyword is retrieved in an updating period, and preliminarily judging cold and hot data keywords according to a search frequency threshold to obtain pre-stored cold data keywords and pre-stored hot data keywords;
calculating the hot data storage probability of the pre-stored hot data keywords, and selecting the data whose storage probability is greater than the storage probability threshold as hot data keywords for storage; and storing the data whose storage probability is less than the storage probability threshold, together with the pre-stored cold data keywords, as cold data keywords.
5. The unified hybrid search method of claim 4, wherein the hot data storage probability is calculated by the following formula:
Figure FDA0003869915580000021
wherein P is the hot data storage probability, t is the number of times the current keyword is searched, w is the total number of times all hot data keywords are searched, R is the total data volume, and r is the data volume carrying the hot data keywords.
6. The unified hybrid search method of claim 1, wherein the data source address is used to link to a web page or file matching a keyword; the data source address corresponding to a locally pre-stored keyword is stored on a local disk, and the data source address corresponding to a non-locally pre-stored keyword is stored in cloud storage.
7. A unified hybrid search system, comprising: the system comprises a central processing unit, a unified hybrid storage module and a data source address storage module;
the unified hybrid storage module is used for screening and respectively storing the cold data keywords and the hot data keywords; the unified hybrid storage module comprises a cold and hot keyword judgment unit, a cold data keyword storage unit and a hot data keyword storage unit;
the data source address storage module is used for storing data source address information and/or index information corresponding to the keywords; the data source address is hyperlinked with the corresponding keyword;
the central processing unit is used for calling data of the cold and hot keyword judging unit and the keyword storage unit according to a preset cold and hot keyword judging flow and a preset search flow, and performing cold and hot data keyword judgment and cold and hot data keyword search; and if the matched hot data or cold data key words are searched, displaying the key words and the data source addresses corresponding to the key words to obtain search results.
8. The unified hybrid search system according to claim 7, wherein the cold and hot keyword judgment unit comprises a cold and hot data screening timing unit, a keyword search frequency counting unit and a hot data storage probability calculation unit;
the cold and hot data screening timing unit is used for sending a cold and hot data screening instruction to the central processing unit at regular time;
the keyword search frequency counting unit is used for recording and counting the search frequency of each keyword in preset time; the central processing unit identifies cold and hot data keywords in the hot data keyword storage unit and the cold data keyword storage unit according to the counted search frequency of each keyword, and prestores the cold and hot data keywords in the corresponding storage units according to identification results;
the hot data storage probability calculation unit is used for calculating the hot data storage probability of the hot data key words; and storing the data with the probability greater than the threshold value as a final hot data keyword, and storing the data with the probability less than the threshold value and the data prestored in the cold data storage unit as a final cold data keyword.
9. A unified hybrid search system according to claim 7 wherein said cold and hot data keyword storage units each comprise: the system comprises a local data resource storage module and a non-local data resource storage module;
the local data resource storage module is a local disk and is used for storing the cold and hot data keywords whose search steps are simple; the non-local data resource storage module is used for storing the cold and hot data keywords whose search steps are complicated.
10. The unified hybrid search system of claim 8, wherein the hot data storage probability is calculated by the following formula:
Figure FDA0003869915580000031
wherein P is the hot data storage probability, t is the number of times the current keyword is searched, w is the total number of times all hot data keywords are searched, R is the total data volume, and r is the data volume carrying the hot data keywords.
CN202211196900.9A 2022-09-28 2022-09-28 Unified hybrid search method and system Active CN115438236B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211196900.9A CN115438236B (en) 2022-09-28 2022-09-28 Unified hybrid search method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211196900.9A CN115438236B (en) 2022-09-28 2022-09-28 Unified hybrid search method and system

Publications (2)

Publication Number Publication Date
CN115438236A true CN115438236A (en) 2022-12-06
CN115438236B CN115438236B (en) 2023-08-29

Family

ID=84250566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211196900.9A Active CN115438236B (en) 2022-09-28 2022-09-28 Unified hybrid search method and system

Country Status (1)

Country Link
CN (1) CN115438236B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050198068A1 (en) * 2004-03-04 2005-09-08 Shouvick Mukherjee Keyword recommendation for internet search engines
US20100211605A1 (en) * 2009-02-17 2010-08-19 Subhankar Ray Apparatus and method for unified web-search, selective broadcasting, natural language processing utilities, analysis, synthesis, and other applications for text, images, audios and videos, initiated by one or more interactions from users
CN110633319A (en) * 2019-09-28 2019-12-31 绍兴柯桥浙工大创新研究院发展有限公司 Big data analysis system for industrial design
CN111090674A (en) * 2019-12-28 2020-05-01 安徽微沃信息科技股份有限公司 Search engine system based on hot words and cache
CN111159066A (en) * 2020-01-07 2020-05-15 杭州电子科技大学 Dynamically-adjusted cache data management and elimination method
CN113672169A (en) * 2021-07-19 2021-11-19 浙江大华技术股份有限公司 Data reading and writing method of stream processing system and stream processing system

Also Published As

Publication number Publication date
CN115438236B (en) 2023-08-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant