CN107169085B - Data search system - Google Patents


Info

Publication number
CN107169085B
Authority
CN
China
Prior art keywords
search
data
module
rule
server
Prior art date
Legal status
Active
Application number
CN201710332855.8A
Other languages
Chinese (zh)
Other versions
CN107169085A (en)
Inventor
杜源
李鸽子
景蔚亮
陈小刚
陈邦明
Current Assignee
Shanghai Xinchu Integrated Circuit Co Ltd
Original Assignee
Shanghai Xinchu Integrated Circuit Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Xinchu Integrated Circuit Co Ltd
Priority to CN201710332855.8A
Publication of CN107169085A
Application granted
Publication of CN107169085B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data search system in the technical field of data retrieval. The system comprises a first setting unit, a second setting unit, and a first data search unit.

The first setting unit sets a search time period. During this period the processor and memory of the server are both powered off, and the server is offline.

The second setting unit sets the search rule according to which data is searched.

The first data search unit includes two modules. A rule acquisition module acquires the search rule on which the data search depends. A rule search module, connected to the rule acquisition module, searches the storage network of the server according to the search rule during the search time period and outputs a corresponding search result.

The first data search unit is provided in a controller inside a storage device of the server. The beneficial effect of this technical scheme is that the server's idle time period is used for data search. This saves a large amount of power in the data center, reduces the service provider's operating cost, and improves service quality.

Description

Data search system
Technical Field
The invention relates to the technical field of data retrieval, in particular to a data search system.
Background
With the advent of the cloud era, big data technology is increasingly applied in everyday life. The term "big data" is generally used to describe two things: the massive data generated in the era of information explosion, and the technical developments and innovations related to that data. As data volume grows rapidly, finding the valuable data a user needs within a huge body of data has become a problem of general significance.
In the prior art, a user usually sends a data query request over the network from a personal computer or other user terminal. After receiving the request, a server in the data center queries or searches the storage network for the information the user needs. Much of the work of data-center servers consists of exactly such queries against a mass storage network.

Currently, the processor in such a server can only directly process data held in memory. Data in the storage devices of the storage network must first be transferred into the server's memory. The processor then queries or searches that data according to the request, and the result is finally returned to the user terminal.

Server memory capacity, however, is relatively limited. Because the volume of data in the storage network is huge and growing, the server must transfer data into memory in many batches for the processor to handle. Data moves from the storage network into memory much more slowly than the processor can process it, so this transfer inevitably becomes the bottleneck of processing speed.

In addition, the amount of data imported from the storage network into server memory is far larger than the amount returned to the user terminal. Server memory is generally built from dynamic random access memory (DRAM), which must be refreshed continuously to retain its data, so this traffic also wastes a large amount of power.
Moreover, as the internet becomes more widespread, more and more user terminals access the network and generate a huge number of query requests. When many query requests arrive at the same time, limited network bandwidth causes congestion. This greatly increases the time for useful information to reach the user and degrades the user experience.

At the same time, the large number of query requests produces huge volumes of search results. These are usually kept in server memory so that identical requests can be answered immediately, which places a heavy demand on memory capacity. Conventionally, search histories and results are cleared from memory in turn after a certain time. When the same or a similar query request arrives later, the same query or search must be performed again against the storage network.

Finally, when a conventional data-center server performs a data query, both the processor and the memory must participate. A large volume of user queries keeps them working continuously, which greatly increases the data center's power cost.
Disclosure of Invention
In view of the problems in the prior art, a data search system is provided. It aims to use the idle time periods of a server for data search, thereby saving a large amount of power in the data center, reducing the service provider's operating cost, and improving service quality.
The technical scheme specifically comprises the following:
A data search system, applied to a server of a data center, comprising:
a first setting unit, configured to set a search time period for data search, during which the processor and memory of the server are both powered off and the server is offline;
a second setting unit, through which a user sets the search rule on which the data search depends, wherein the search rule comprises a search prompt formed by at least one keyword and/or key phrase;
a first data search unit, connected to both the first setting unit and the second setting unit, the first data search unit further comprising:
a rule acquisition module for acquiring the search rule on which the data search depends;
a rule search module, connected to the rule acquisition module, for searching the storage network of the server according to the search rule during the search time period and outputting a corresponding search result;
wherein the first data search unit is provided in a controller inside a storage device of the server.
Preferably, the data search system further includes:
a statistical unit, connected to the first setting unit, for gathering statistics over the normal working cycle of the server to obtain the server's idle operating period;
wherein the first setting unit sets the search time period according to the idle operating period.
Preferably, the data search system further includes:
an input unit, connected to the first setting unit, through which a user enters a setting instruction for the search time period;
wherein the first setting unit sets the search time period according to the setting instruction.
Preferably, the data search system further includes:
a first storage unit, connected to the first data search unit, for generating and storing a result document corresponding to the search result.
Preferably, in the data search system, the first data search unit includes:
a result comparison module, connected to the rule search module, for comparing the result document of the current data search with that of the previous data search and outputting a corresponding comparison result;
a first notification module, connected to the result comparison module, for sending a notification message to the user when the comparison result shows that the data search result has been updated.
Preferably, in the data search system, the first data search unit includes:
a result comparison module, connected to the rule search module, for comparing the result document of the current data search with that of the previous data search and outputting a corresponding comparison result;
a search reconstruction module, connected to both the result comparison module and the rule search module, which, when the comparison result shows that the data search result has not been updated, recombines the search prompt in the search rule according to a preset rule to form a reconstructed search prompt;
a second notification module, connected to both the search reconstruction module and the rule search module, for presenting to the user the result document formed from the results of the search performed with the reconstructed search rule.
Preferably, the data search system is provided with a second storage unit connected to both the first storage unit and the first data search unit. The second storage unit stores the search rule on which each data search depends, together with the result document extracted from the first storage unit, and establishes a correspondence between each search rule and its result document;
the data search system further comprises a second data search unit connected to the second storage unit;
the second data search unit further comprises:
a request acquisition module for acquiring an externally input query request that includes a search rule;
a query module, connected to the request acquisition module, for searching data in the storage network according to the query request;
a rule judging module, connected to both the request acquisition module and the query module, which checks whether a search rule matching the one on which the data search depends exists in the second storage unit and outputs a corresponding judgment result;
the query module is configured to:
when a matching search rule exists in the second storage unit, directly extract the result document corresponding to that search rule and output it as the search result of the data search;
when no matching search rule exists in the second storage unit, perform the data search using the search rule on which the data search depends and output the corresponding search result.
Preferably, in the data search system, the second data search unit further includes:
a setting module, connected to the rule judging module, for enabling or disabling the rule judging module according to an externally input instruction.
Preferably, in the data search system, the data saved in the storage network of the server is distributed among a plurality of different user folders;
the data search system further comprises:
a third data search unit, connected to the first setting unit, for searching data in the different user folders of the server while the server is offline, so as to deduplicate identical data across the different user folders;
the third data search unit is arranged in a controller inside a storage device of the server;
the third data search unit further includes:
a first search module for searching data in the different user folders of the server while the server is offline, so as to find identical data across the different user folders;
a data deduplication module, connected to the first search module, which, according to the search results of the first search module, retains the identical data in one of the user folders that contain it and deletes it from all the other user folders;
a link generation module, connected to the data deduplication module, for generating a corresponding access link in a user folder whenever the data deduplication module deletes data from that folder;
after deduplication, of the user folders that contained the identical data, only one still holds it; that folder serves as the target folder, and the identical data serves as the target data;
the access link points to the storage address of the target data in the target folder.
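The deduplication steps above can be sketched in a few lines of Python. This is a minimal, illustrative sketch under stated assumptions, not the patented implementation. It assumes that "identical data" means files with identical content, detected here by SHA-256 digest. It models each user folder as an in-memory dictionary and represents the access link with a hypothetical `AccessLink` record. The names `UserFolder`, `AccessLink`, `deduplicate`, and `read` are all invented for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class AccessLink:
    """Hypothetical link to the storage address (folder, file) of the target data."""
    target_folder: str
    target_file: str


@dataclass
class UserFolder:
    """Hypothetical in-memory model of one user folder: file name -> bytes or link."""
    name: str
    files: Dict[str, Union[bytes, AccessLink]] = field(default_factory=dict)


def deduplicate(folders: List[UserFolder]) -> List[UserFolder]:
    """Keep the first copy of each identical file and replace later copies with links."""
    seen: Dict[str, Tuple[str, str]] = {}  # content digest -> (target folder, target file)
    for folder in folders:
        for fname, content in list(folder.files.items()):
            if isinstance(content, AccessLink):
                continue  # already replaced by a link on an earlier pass
            digest = hashlib.sha256(content).hexdigest()
            if digest not in seen:
                seen[digest] = (folder.name, fname)  # this copy becomes the target data
            else:
                target_folder, target_file = seen[digest]
                # Delete the duplicate and leave an access link in its place.
                folder.files[fname] = AccessLink(target_folder, target_file)
    return folders


def read(folders: List[UserFolder], folder_name: str, fname: str) -> bytes:
    """Read a file, following an access link to the target data if necessary."""
    by_name = {f.name: f for f in folders}
    content = by_name[folder_name].files[fname]
    if isinstance(content, AccessLink):
        content = by_name[content.target_folder].files[content.target_file]
    return content
```

In this sketch, the first folder in which a given content appears becomes the target folder. A production version would also compare the bytes of digest-equal files before deleting anything, and would operate on storage addresses rather than Python objects.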
Preferably, in the data search system, the data saved in the storage network of the server is distributed among a plurality of different user folders;
the data search system further comprises:
a fourth data search unit, connected to the first setting unit, for searching data in the different user folders of the server while the server is offline, so as to establish correspondences between similar data in the different user folders;
the fourth data search unit is arranged in the controller inside the storage device of the server;
the fourth data search unit further includes:
a second search module for searching data in the different user folders of the server while the server is offline, so as to find similar data across the different user folders;
a marking module, connected to the second search module, for marking similar data in the different user folders according to the search results of the second search module, thereby establishing correspondences between the similar data.
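This section does not define "similar data". As one possible interpretation, the sketch below treats two text files in different user folders as similar when the Jaccard similarity of their word sets reaches a threshold, and records each such pair as a mark. The whitespace tokenization, the 0.5 threshold, and the function names are assumptions for illustration only.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Entry = Tuple[str, str]  # (folder name, file name)


def tokens(text: str) -> Set[str]:
    """Lower-cased set of whitespace-separated words."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|, defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mark_similar(folders: Dict[str, Dict[str, str]], threshold: float = 0.5) -> List[Tuple[Entry, Entry]]:
    """Return marks pairing files in *different* folders whose similarity >= threshold."""
    entries = [
        (folder, fname, tokens(text))
        for folder, files in folders.items()
        for fname, text in files.items()
    ]
    marks = []
    for (f1, n1, t1), (f2, n2, t2) in combinations(entries, 2):
        if f1 != f2 and jaccard(t1, t2) >= threshold:
            marks.append(((f1, n1), (f2, n2)))
    return marks
```

This pairwise comparison grows quadratically with the number of files. For a large storage network, an approximate technique such as MinHash would be the usual substitute.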
The beneficial effect of the above technical scheme is that the data search system uses the server's idle time periods for data search. This not only saves a large amount of power in the data center, thereby reducing the service provider's operating cost, but also improves service quality.
Drawings
FIG. 1 is a schematic diagram of a data search system according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a specific structure of a first data search unit according to a preferred embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a specific structure of a second data search unit according to a preferred embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a specific structure of a third data search unit according to a preferred embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating a detailed structure of a fourth data search unit according to a preferred embodiment of the present invention;
FIG. 6 is a schematic diagram of the allocation of busy and idle time periods of the server according to a preferred embodiment of the present invention;
FIG. 7 is a diagram illustrating a data search system for performing data deduplication, according to a preferred embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The invention is further described with reference to the following drawings and specific examples, which are not intended to be limiting.
In view of the above problems in the prior art, a data search system applied to a server of a data center is provided. Its structure is shown in FIG. 1 and includes:
a first setting unit 1, configured to set a search time period for data search, during which the processor and memory of the server are both powered off and the server is offline;
a second setting unit 2, through which a user sets the search rule on which the data search depends, wherein the search rule comprises a search prompt formed by at least one keyword and/or key phrase;
a first data search unit 3, connected to both the first setting unit 1 and the second setting unit 2.
As shown in FIG. 2, the first data search unit 3 further includes:
a rule obtaining module 31, which obtains the search rule on which the current data search depends;
a rule search module 32, connected to the rule obtaining module 31, for searching the storage network of the server according to the search rule during the search time period and outputting a corresponding search result;
the first data search unit 3 is provided in a controller inside a storage device of the server.
Specifically, in this embodiment, the first data search unit is arranged in the controller inside a storage device of the server (e.g., an HDD or SSD). The controller inside the storage device searches the data in the storage network directly, instead of the processor and memory performing the search as in the prior art.
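The following sketch illustrates how the storage-device controller could search data directly. It streams records one at a time and yields those that match the search prompt, so the full data set never has to be loaded into server memory. It assumes, hypothetically, that a record matches when it contains every keyword, ignoring case; the real matching criteria are not specified here, and the function names are illustrative.

```python
from typing import Iterable, Iterator, List


def matches(record: str, keywords: List[str]) -> bool:
    """Assumed rule: a record matches when it contains every keyword, ignoring case."""
    low = record.lower()
    return all(k.lower() in low for k in keywords)


def scan_storage(records: Iterable[str], keywords: List[str]) -> Iterator[str]:
    """Examine records one at a time and yield matches; nothing is bulk-loaded."""
    for record in records:
        if matches(record, keywords):
            yield record
```

Because `scan_storage` accepts any iterable, it can scan an open text file object line by line. This mirrors the idea of processing data where it is stored rather than batching it into DRAM.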
Therefore, the data search described above can be performed when the network is not congested, that is, during the server's idle periods, when the server is usually offline. During these periods (typically at night), the server's processor (CPU) and memory may be turned off, and the controller inside the storage device searches the data in the storage network.
The controller inside the storage device has much lower search performance than the server's processor and therefore works more slowly. This disadvantage has little impact on the user experience, because the server is idle and offline and the controller has plenty of time to complete the search.
In this embodiment, the first setting unit 1 may be used to set the search time period to an idle period of the server (for example, from 0:00 to 6:00 in the morning). How the search time period is set is described in detail below.
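Checking whether the current time falls inside the search time period needs care when the period spans midnight (for example, 22:00 to 06:00). Below is a minimal sketch. It assumes the period is given as a start time and an end time of day, with the end excluded. The function name is hypothetical.

```python
from datetime import time


def in_search_period(now: time, start: time, end: time) -> bool:
    """True if `now` lies in [start, end); the period may wrap past midnight.

    A period with start == end is treated as empty.
    """
    if start <= end:
        return start <= now < end
    # Wrapping period, e.g. 22:00-06:00: after the start or before the end.
    return now >= start or now < end
```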
In this embodiment, the second setting unit 2 may be used to set the search rule. The rule may include a search prompt, composed of at least one keyword and/or key phrase, on which the data search relies. Specifically, the search prompt may be several keywords and/or combinations of key phrases. The search rule may also include the time period during which the user wishes the search to run, which should of course fall within the preset search time period.
For example, users can set keywords and/or keyword combinations and a query frequency (e.g., daily, weekly, or monthly) for automatic queries according to their needs and interests. They can also set the start and end times of the query. All of this information is included in the search rule. Once the rule is set and submitted, the first data search unit 3 automatically searches, during the search time period, for related topic information or content such as web pages, and finally outputs the search result.
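The search rule described above could be represented as a small record. The sketch below is hypothetical, and several of its details are assumptions:

- the field names and the three frequency strings;
- reading "monthly" as once per calendar month;
- `within` only handles a server search period that does not wrap past midnight, such as 0:00 to 6:00.

```python
from dataclasses import dataclass
from datetime import date, time
from typing import Optional, Tuple


@dataclass(frozen=True)
class SearchRule:
    """Hypothetical record holding the items the text lists for a search rule."""
    keywords: Tuple[str, ...]  # the search prompt
    frequency: str             # "daily", "weekly" or "monthly"
    start: time                # time the user wants the query to start
    end: time                  # time the user wants the query to end

    def within(self, period_start: time, period_end: time) -> bool:
        """Check the user's window fits inside a non-wrapping server search period."""
        return period_start <= self.start < self.end <= period_end

    def is_due(self, today: date, last_run: Optional[date]) -> bool:
        """Decide whether the automatic query should run on `today`."""
        if last_run is None:
            return True
        if self.frequency == "daily":
            return today > last_run
        if self.frequency == "weekly":
            return (today - last_run).days >= 7
        if self.frequency == "monthly":
            return (today.year, today.month) != (last_run.year, last_run.month)
        raise ValueError(f"unknown frequency: {self.frequency}")
```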
In a preferred embodiment of the present invention, as also shown in FIG. 1, the data search system further includes:
a statistical unit 4, connected to the first setting unit 1, for gathering statistics over the normal working cycle of the server to obtain the server's idle operating period;
the first setting unit 1 sets the search time period according to the idle operating period.
Specifically, in this embodiment, the statistical unit 4 collects data over the server's normal working cycle and derives the server's idle periods from these statistics. The data may include, for example, the operating status of the server's processor and memory and the utilization of the server's total network bandwidth. During an idle period, the server receives few user query requests and little network bandwidth is in use (i.e., the network is not congested). The processor and memory can therefore be shut down without affecting normal operation of the server. Because different servers may have different idle periods, the statistical unit 4 gathers statistics for each server separately, and the first setting unit 1 can then set each server's idle period as its search time period. The busy and idle periods of a server are compared in FIG. 6.
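The sketch below illustrates one way the statistical unit 4 might turn its observations into a search time period. An hour is marked idle when both average processor utilization and bandwidth utilization fall below a threshold. The longest run of consecutive idle hours is then chosen, allowing the run to wrap past midnight. The hourly granularity, the 0.2 thresholds, and the function names are assumptions, not values from the specification.

```python
from typing import List, Optional, Tuple


def idle_hours(cpu: List[float], net: List[float], cpu_max: float = 0.2, net_max: float = 0.2) -> List[int]:
    """Hours of day (0-23) whose average CPU and bandwidth utilization are both below the thresholds."""
    return [h for h in range(24) if cpu[h] < cpu_max and net[h] < net_max]


def longest_idle_window(hours: List[int]) -> Optional[Tuple[int, int]]:
    """Longest run of consecutive idle hours, allowed to wrap past midnight.

    Returns (start_hour, end_hour) with the end hour exclusive, (0, 24) if every
    hour is idle, or None if no hour is idle.
    """
    idle = set(hours)
    if not idle:
        return None
    if len(idle) == 24:
        return (0, 24)
    best_len, best_start = 0, 0
    for h in sorted(idle):
        if (h - 1) % 24 in idle:
            continue  # h is inside a run, not at its start
        length = 0
        while (h + length) % 24 in idle:
            length += 1
        if length > best_len:
            best_len, best_start = length, h
    return (best_start, (best_start + best_len) % 24)
```

For example, if hours 0 to 5 and 22 to 23 are idle, `longest_idle_window` returns `(22, 6)`, i.e. a search time period from 22:00 to 06:00.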
In another preferred embodiment of the present invention, as also shown in FIG. 1, the data search system further includes:
an input unit 5, connected to the first setting unit 1, through which a user enters a setting instruction for the search time period;
the first setting unit 1 sets the search time period according to the setting instruction.
Specifically, in this embodiment, the input unit 5 lets the user set the search time period manually.
The statistical unit 4 and the input unit 5 are both shown in FIG. 1; in different embodiments of the present invention, either one of them may be used.
In a preferred embodiment of the present invention, as also shown in FIG. 1, the data search system further includes:
a first storage unit 6, connected to the first data search unit 3, for generating and storing a result document corresponding to the search result.
Specifically, in this embodiment, the results of the data search performed by the first data search unit 3 are compiled into a corresponding document. This document is stored in the first storage unit 6, which is a storage device (HDD or SSD) in the server. In other words, the processor and memory do not need to participate in the data search process of the present invention; the search can be completed by the storage device and its internal controller alone.
In a preferred embodiment of the present invention, as shown in FIG. 2, the first data search unit 3 further includes:
a result comparison module 33, connected to the rule search module 32, for comparing the result document of the current data search with that of the previous data search and outputting a corresponding comparison result;
a first notification module 34, connected to the result comparison module 33, for sending a notification message to the user when the comparison result indicates that the data search result has been updated.
Specifically, in this embodiment, each data search ends by generating a result document. The result comparison module 33 then compares that document with the result document of the previous data search, to determine whether the search results for the same subject or web page contain updated information:
If there is updated information, it is organized into a new document, stored in the first storage unit 6, and sent to the user as a notification message. The message may be pushed, for example, by email, SMS, or instant messaging software. The push must be performed after the search time period ends, i.e., once the server's processor and memory resume operation.
If there is no updated information, the server need not create a new document or notify the user. The server and the user should agree in advance that the absence of a notification means the search result contains no updated information. Alternatively, the user may ask the server to send a notification in a preset manner even when the information is not updated; that is, the notification method can be changed according to the user's settings.
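The comparison and notification logic can be sketched as follows. The sketch assumes, hypothetically, that:

- a result document is a mapping from a result key (such as a web page URL) to its text;
- an "update" is a key that is new or whose content changed;
- the returned message is queued and only pushed after the search time period ends.

The `notify_when_unchanged` flag models the user's option to be notified even when nothing changed. All names are illustrative.

```python
import hashlib
from typing import Dict, List, Optional


def fingerprint(results: Dict[str, str]) -> Dict[str, str]:
    """Map each result key (e.g. a URL) to a SHA-256 digest of its content."""
    return {k: hashlib.sha256(v.encode("utf-8")).hexdigest() for k, v in results.items()}


def compare(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Keys that are new in `current` or whose content changed since `previous`."""
    old, new = fingerprint(previous), fingerprint(current)
    return sorted(k for k, digest in new.items() if old.get(k) != digest)


def build_notification(updates: List[str], notify_when_unchanged: bool = False) -> Optional[str]:
    """Message to push once the search period is over, or None to stay silent."""
    if updates:
        return "Updated results: " + ", ".join(updates)
    if notify_when_unchanged:
        return "No updates since the last search."
    return None
```

Storing only digests of the previous results keeps the comparison cheap for the storage controller. The trade-off is that a digest can show that a result changed, but not how.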
With this arrangement, the data search system of the present invention saves users from repeatedly entering information to perform the same query. It also reduces the power consumption and cost of the data center server.
In a preferred embodiment of the present invention, the result comparison module 33 is connected to the rule search module 32. It compares the result document of the current data search with that of the previous data search and outputs a corresponding comparison result.
As also shown in FIG. 2, the first data search unit 3 further includes:
a search reconstruction module 35, connected to both the result comparison module 33 and the rule search module 32, which, when the comparison result indicates that the data search result has not been updated, recombines the search prompt in the search rule according to a preset rule to form a reconstructed search prompt;
a second notification module 36, connected to both the search reconstruction module 35 and the rule search module 32, for presenting to the user the result document formed from the results of the search performed with the reconstructed search rule.
In particular, in this embodiment, a fuzzy search mechanism is provided for the case where the current data search yields no updated information relative to the previous one. Under this mechanism, the search reconstruction module 35 recombines the search prompt in the search rule to form a reconstructed search prompt. The search is then performed again according to the search rule containing the reconstructed prompt. "Recombining the search prompt" means rearranging and recombining the keywords and/or key phrases in the prompt to form a new one. The new prompt is related to the previous one but not identical to it, so restarting the data search with it widens the search scope.
In this embodiment, the reconstruction may be done in either of two ways. One way is to generate a set of random numbers with a random number generator and use them to recombine the keywords and/or key phrases in the search prompt. The other is to delete individual keywords from the prompt. Either way yields the reconstructed search prompt.
For example, suppose the content the user wants to search (i.e., the search prompt composed of keywords) is "air conditioner, smart, variable rotation speed, certain brand 1, certain brand 2", and a search produces no updated results relative to the previous one. The keywords are then encoded as follows: "air conditioner" as 1, "smart" as 2, "variable rotation speed" as 3, "certain brand 1" as 4, and "certain brand 2" as 5. A random number generator produces N (N < 5) random numbers with values from 1 to 5. If one run generates 1, 3, 4, and 5, the search prompt reconstructed from this set is "air conditioner, variable rotation speed, certain brand 1, certain brand 2".
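The random recombination in the example above can be sketched as follows. Following the text, keywords are encoded by their 1-based position, and N distinct codes are drawn, with N smaller than the number of keywords. Two details are assumptions: the drawn codes are kept in ascending order, so the prompt preserves its original keyword order as in the example, and the function names are invented.

```python
import random
from typing import List, Optional, Sequence


def apply_codes(keywords: Sequence[str], codes: Sequence[int]) -> List[str]:
    """Rebuild a prompt from 1-based keyword codes, e.g. [1, 3, 4, 5]."""
    return [keywords[c - 1] for c in sorted(set(codes))]


def reconstruct_prompt(keywords: Sequence[str], rng: Optional[random.Random] = None) -> List[str]:
    """Draw N < len(keywords) distinct codes at random and keep those keywords."""
    rng = rng or random.Random()
    total = len(keywords)
    if total < 2:
        return list(keywords)  # dropping a keyword would leave an empty prompt
    n = rng.randint(1, total - 1)                   # N < number of keywords
    codes = rng.sample(range(1, total + 1), n)      # distinct codes in 1..total
    return apply_codes(keywords, codes)
```

With the codes 1, 3, 4, 5 from the example, `apply_codes` yields `["air conditioner", "variable rotation speed", "certain brand 1", "certain brand 2"]`. Passing a seeded `random.Random` makes a run reproducible.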
In this embodiment, after the re-search, a new document is generated from the search results, stored in the first storage unit 6, and sent to the user as a notification message. The sending method and timing are as described above for the first notification module 34 and are not repeated here.
In a preferred embodiment of the present invention, as shown in FIG. 1, the data search system is provided with a second storage unit 7 connected to both the first storage unit 6 and the first data search unit 3. The second storage unit 7 stores the search rule on which each data search depends, together with the result document extracted from the first storage unit 6, and establishes a correspondence between each search rule and its result document;
the data search system further comprises a second data search unit 8 connected to the second storage unit 7;
as shown in FIG. 3, the second data search unit 8 further includes:
a request obtaining module 81 for obtaining an externally input query request that includes a search rule;
a query module 82, connected to the request obtaining module 81, for searching data in the storage network according to the query request;
a rule judging module 83, connected to both the request obtaining module 81 and the query module 82, which checks whether a search rule matching the one on which the data search depends exists in the second storage unit 7 and outputs a corresponding judgment result;
according to the judgment result, the query module 82 is configured to:
when a matching search rule exists in the second storage unit 7, directly extract the result document corresponding to that search rule and output it as the search result of the data search;
when no matching search rule exists in the second storage unit 7, perform the data search using the search rule on which the data search depends and output the corresponding search result.
Specifically, in one embodiment of the present invention, the second data search unit 8 may be disposed in the processor of the server, in which case its data search operations are performed outside the search period (i.e., while the processor and the memory are operating normally). In another embodiment, the second data search unit 8 may instead be disposed in the controller inside the memory, in which case its data search operations are performed during the search period (i.e., while the processor and the memory are switched off).
In this embodiment, the second storage unit 7 may be the memory of the server, a nonvolatile memory, or another memory in the storage network. Each result document produced by the first data search unit 3 during the search period is saved in the first storage unit 6 and, once the processor and the memory of the server resume normal operation, is also saved in the second storage unit 7. In addition, because the second storage unit 7 is connected to the first data search unit 3, once the processor and the memory resume normal operation the search rule that the first data search unit 3 relied on for each search is transferred to the second storage unit 7 and stored there, and the correspondence between each search rule and its result document is established in the second storage unit 7.
Specifically, when the request obtaining module 81 in the second data search unit 8 receives a query request, the rule judging module 83 parses the corresponding search rule (i.e., the combination of keywords/key phrases) from the query request and checks whether a matching search rule exists in the second storage unit 7:
if a matching search rule exists, the same search has already been performed before, so the corresponding result document is extracted directly and pushed to the user;
if no matching search rule exists, the search has not been performed before, so a normal data search is carried out directly according to the search rule.
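The lookup performed by the rule judging module 83 and the query module 82 can be sketched as a keyed store. In this minimal Python sketch, the class name `SearchRuleStore`, the `run_search` callback and the representation of a search rule as a frozen set of keywords are all assumptions. It implements only exact matching (the same keywords in any order), and it does not store fresh results, because the text does not say whether the second data search unit 8 does so.

```python
from typing import Callable, Dict, FrozenSet, Iterable

RuleKey = FrozenSet[str]


class SearchRuleStore:
    """Hypothetical model of second storage unit 7 plus rule judging module 83.

    Search rules are keyed by the set of their keywords, so keyword order
    does not matter. Exact-set lookup is an illustrative assumption.
    """

    def __init__(self, run_search: Callable[[RuleKey], str]) -> None:
        self._documents: Dict[RuleKey, str] = {}
        self._run_search = run_search  # stands in for a normal data search

    @staticmethod
    def rule_key(keywords: Iterable[str]) -> RuleKey:
        return frozenset(word.strip() for word in keywords)

    def save(self, keywords: Iterable[str], result_document: str) -> None:
        """Record the correspondence between a search rule and its result."""
        self._documents[self.rule_key(keywords)] = result_document

    def query(self, keywords: Iterable[str]) -> str:
        rule = self.rule_key(keywords)
        cached = self._documents.get(rule)
        if cached is not None:
            # A matching rule exists: return the stored result document.
            return cached
        # No matching rule: fall back to a normal data search.
        return self._run_search(rule)


store = SearchRuleStore(lambda rule: "fresh search for " + ", ".join(sorted(rule)))
store.save(["air conditioner", "smart"], "result document #1")
print(store.query(["smart", "air conditioner"]))  # prints "result document #1"
```

Keying on a frozen set is what makes "air conditioner, smart" and "smart, air conditioner" the same rule; looser matching would replace the dictionary lookup with a similarity test.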
In this embodiment, two search rules match if they are the same or similar, i.e., if the keywords/key phrases they contain are the same or similar. For example, the two search rules contain exactly the same keywords/key phrases (regardless of their order), or one contains only a few keywords/key phrases more or fewer than the other.
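The matching criterion can be made concrete with a small predicate. The source does not quantify "only a few" keywords, so the threshold `max_difference` (default 1), the extra requirement that the two rules share at least one keyword, and the function name `rules_match` are illustrative assumptions.

```python
from typing import Iterable


def rules_match(rule_a: Iterable[str], rule_b: Iterable[str],
                max_difference: int = 1) -> bool:
    """Return True if two search rules are "the same or similar".

    Hypothetical criterion: the rules share at least one keyword, and at most
    max_difference keywords appear in only one of them. Comparing sets
    ignores keyword order.
    """
    a, b = set(rule_a), set(rule_b)
    shared = a & b
    only_in_one = a ^ b  # keywords added to or dropped from one rule
    return bool(shared) and len(only_in_one) <= max_difference


print(rules_match(["smart", "air conditioner"], ["air conditioner", "smart"]))  # True
print(rules_match(["smart", "air conditioner"], ["air conditioner"]))           # True
print(rules_match(["smart", "air conditioner"], ["fan", "quiet"]))              # False
```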
In the technical solution of the invention, this search method spares the data center server from repeatedly searching the storage network for the same content, thereby saving power.
In a preferred embodiment of the present invention, as also shown in fig. 3, the second data search unit 8 further includes:
a setting module 84, connected to the rule judging module 83 and configured to enable or disable the rule judging module 83 according to an externally input instruction.
Specifically, the user can use the setting module 84 to choose whether a matching operation is performed before data is searched:
if the user inputs an instruction causing the setting module 84 to enable the rule judging module 83, indicating that the user wants a matching operation before the data search, the second data search unit 8 matches the search rule and performs the corresponding data search according to the matching process described above;
if the user inputs an instruction causing the setting module 84 to disable the rule judging module 83, indicating that the user wants to search directly, the second data search unit 8 performs the data search directly according to the search rule contained in the user's query request.
In a preferred embodiment of the present invention, the data stored in the storage network of the server is distributed across a plurality of different user folders;
as also shown in fig. 1, the data search system further includes:
a third data search unit 9, connected to the first setting unit 1 and configured to search the different user folders in the server while the server is offline, so as to deduplicate identical data across those user folders.
The third data search unit 9 is disposed in the controller inside the memory of the server, i.e., the third data search unit 9 likewise performs its function through the controller in the memory during the search period.
Further, as shown in fig. 4, the third data search unit 9 includes:
a first search module 91, configured to search the different user folders in the server while the server is offline, so as to find identical data in different user folders;
a data deduplication module 92, connected to the first search module 91 and configured to, according to the search result of the first search module 91, retain the identical data in one of the user folders that contain it and delete it from all the other user folders;
a link generation module 93, connected to the data deduplication module 92 and configured to generate a corresponding access link in a user folder whenever the data deduplication module 92 deletes data from that user folder;
after deduplication, among the user folders that contained the same data, exactly one user folder still holds that data; this folder serves as the target folder, and the retained data serves as the target data;
the access link points to the storage address of the target data in the target folder.
Specifically, in this embodiment, when the server is idle (and therefore offline), the controller in the memory automatically performs a comprehensive search of the data stored in the storage network in order to deduplicate it. The procedure is as follows:
first, the first search module 91 searches the entire storage space to determine whether identical data exists in different user folders, and outputs the search result;
then, for the user folders that contain the same data, the data deduplication module 92 retains the data in only one of them and deletes it from all the others; the user folder that retains the data is the target folder, and the retained data is the target data;
finally, the link generation module 93 generates an access link at the corresponding storage location in each user folder from which the data was deleted. The access link points to the storage address of the target data in the target folder, so a user can access the target data directly through the link. After deduplication, only one user folder in the entire storage network stores the target data, and the other user folders no longer hold copies of it.
In a preferred embodiment of the present invention, the general flow of the data deduplication process is shown in fig. 7. In fig. 7, the user folder named "user A" stores file A, file B, file C and other files, and the user folder named "user B" stores file A, file Y, file Z and other files. The data search finds that both user folders store the same data, "file A". File A in user A is therefore retained (user A becomes the target folder and file A the target data), file A in user B is deleted, and a new access link is generated at the location in user B where file A was originally stored; this access link points directly to the storage address of file A in user A.
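The deduplication flow of fig. 7 can be sketched in Python with user folders modelled as dictionaries. This is a hypothetical in-memory illustration, not the controller's actual mechanism: the `AccessLink` type, the `deduplicate` and `read` functions, and the use of a SHA-256 content digest to decide that two files hold "the same data" are all assumptions (a real system would typically confirm a digest match with a byte comparison). The first copy encountered becomes the target data.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class AccessLink:
    """Placeholder left where a duplicate was deleted (hypothetical type)."""
    target_folder: str
    target_name: str


Entry = Union[bytes, AccessLink]
Folders = Dict[str, Dict[str, Entry]]


def deduplicate(folders: Folders) -> None:
    """Keep the first copy of each piece of data; replace later copies in
    other user folders with an AccessLink to the retained copy (in place)."""
    first_seen: Dict[str, Tuple[str, str]] = {}  # digest -> (folder, name)
    for folder, files in folders.items():
        for name, entry in list(files.items()):
            if isinstance(entry, AccessLink):
                continue  # already deduplicated
            digest = hashlib.sha256(entry).hexdigest()
            target = first_seen.get(digest)
            if target is None:
                first_seen[digest] = (folder, name)  # becomes the target data
            elif target[0] != folder:
                # Same data is retained in another user folder: delete this
                # copy and leave an access link pointing at the target data.
                files[name] = AccessLink(*target)


def read(folders: Folders, folder: str, name: str) -> bytes:
    """Read a file, following an access link if its copy was deduplicated."""
    entry = folders[folder][name]
    if isinstance(entry, AccessLink):
        entry = folders[entry.target_folder][entry.target_name]
    return entry


folders = {
    "user A": {"file A": b"same bytes", "file B": b"b", "file C": b"c"},
    "user B": {"file A": b"same bytes", "file Y": b"y", "file Z": b"z"},
}
deduplicate(folders)
print(folders["user B"]["file A"])  # AccessLink(target_folder='user A', target_name='file A')
```

Because `read` follows the link transparently, user B still sees file A after deduplication while only user A's folder holds the bytes.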
In a preferred embodiment of the present invention, the data stored in the storage network of the server is distributed across a plurality of different user folders;
as also shown in fig. 1, the data search system further includes:
a fourth data search unit 10, connected to the first setting unit 1 and configured to search the different user folders in the server while the server is offline, so as to establish correspondences between similar data in different user folders;
the fourth data search unit 10 is disposed in the controller inside the memory of the server, i.e., the fourth data search unit 10 likewise performs its function through the controller in the memory during the search period.
Further, as shown in fig. 5, the fourth data search unit 10 includes:
a second search module 101, configured to search the different user folders in the server while the server is offline, so as to find similar data in different user folders. Whether two items of data are similar can be decided against a preset similarity: for example, a standard data similarity of 40% or 50% is preset, and two items of data are considered similar when their similarity exceeds this standard. In this embodiment, since each item of data or file has various attributes, such as author, content relevance, region and included keywords, the data similarity can be determined from the degree to which these attributes agree. For example, if the data has four different types of attributes and two items of data agree on two of them, their data similarity can be taken as 50%;
a marking module 102, connected to the second search module 101 and configured to mark, according to the search result of the second search module 101, the similar data in different user folders so as to establish correspondences between similar data.
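The attribute-based similarity in the 50% example can be sketched as follows. The attribute names, the use of exact equality per attribute, and counting every attribute type present in either item as the denominator are assumptions for illustration; `data_similarity` and `is_close` are hypothetical names. The "strictly exceeds" comparison follows the wording above.

```python
from typing import Dict

Attributes = Dict[str, str]


def data_similarity(a: Attributes, b: Attributes) -> float:
    """Fraction of attribute types on which two items of data agree."""
    attribute_types = set(a) | set(b)
    if not attribute_types:
        return 0.0
    agreeing = sum(1 for t in attribute_types
                   if t in a and t in b and a[t] == b[t])
    return agreeing / len(attribute_types)


def is_close(a: Attributes, b: Attributes, standard: float = 0.4) -> bool:
    """Two items are similar data when similarity exceeds the preset standard."""
    return data_similarity(a, b) > standard


doc_1 = {"author": "Li", "relevance": "HVAC", "region": "Shanghai", "keywords": "air conditioner"}
doc_2 = {"author": "Li", "relevance": "lighting", "region": "Shanghai", "keywords": "lamp"}
print(data_similarity(doc_1, doc_2))         # 0.5
print(is_close(doc_1, doc_2, standard=0.4))  # True
print(is_close(doc_1, doc_2, standard=0.5))  # False
```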
Specifically, in this embodiment, when the server is idle (and therefore offline), the controller in the memory automatically performs a comprehensive search of the data stored in the storage network in order to build an association database linking different data in the storage network.
First, the second search module 101 searches all data in the storage network to find similar data and outputs the corresponding search result.
Then, the marking module 102 marks the similar data to establish their associations; in other words, through data search and marking, a relational database of the data stored in the storage network is built. Once the relational database is established, more related information can be offered to the user for reference and selection when the user subsequently makes a query.
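The marking step can be sketched as building a symmetric relation map between items in different user folders. This brute-force pairwise comparison is only an illustration; the names `build_relations` and `ItemId` are hypothetical, and the similarity test is passed in as a callback so that any criterion (such as an attribute-similarity threshold) can be plugged in.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, Set, Tuple

ItemId = Tuple[str, str]  # (user folder, file name)
Attributes = Dict[str, str]


def build_relations(folders: Dict[str, Dict[str, Attributes]],
                    similar: Callable[[Attributes, Attributes], bool]
                    ) -> Dict[ItemId, Set[ItemId]]:
    """Mark similar data in different user folders as related (both ways)."""
    items = [((folder, name), attrs)
             for folder, files in folders.items()
             for name, attrs in files.items()]
    relations: Dict[ItemId, Set[ItemId]] = defaultdict(set)
    for (id_a, attrs_a), (id_b, attrs_b) in combinations(items, 2):
        if id_a[0] == id_b[0]:
            continue  # only relate data held in different user folders
        if similar(attrs_a, attrs_b):
            relations[id_a].add(id_b)
            relations[id_b].add(id_a)
    return dict(relations)


def same_author(a: Attributes, b: Attributes) -> bool:
    return a.get("author") == b.get("author")


folders = {
    "user A": {"report.doc": {"author": "Li", "region": "Shanghai"}},
    "user B": {"notes.txt": {"author": "Li", "region": "Beijing"},
               "photo.jpg": {"author": "Wang", "region": "Guangzhou"}},
}
relations = build_relations(folders, same_author)
print(relations.get(("user A", "report.doc"), set()))  # {('user B', 'notes.txt')}
```

At query time, looking up an item in `relations` yields the related items that could be offered to the user; comparing every pair grows quadratically, so a large store would need an index instead.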
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

Claims (10)

1. A data search system applied to a server of a data center, characterized by comprising:
a first setting unit, configured to set a search time period for performing data search, wherein during the search time period the processor and the memory of the server are both switched off and the server is offline;
a second setting unit, configured to allow a user to set the search rule on which the data search relies, wherein the search rule comprises a search prompt formed by at least one keyword and/or key phrase;
a first data search unit, connected to both the first setting unit and the second setting unit, the first data search unit further comprising:
a rule acquisition module, configured to acquire the search rule on which the data search relies;
a rule search module, connected to the rule acquisition module and configured to search the storage network in the server according to the search rule during the search time period and to output a corresponding search result;
wherein the first data search unit is disposed in a controller inside the memory of the server.
2. The data search system of claim 1, further comprising:
a statistics unit, connected to the first setting unit and configured to collect statistics on the normal working cycle of the server to obtain the idle operating period of the server;
wherein the first setting unit sets the search time period according to the idle operating period.
3. The data search system of claim 1, further comprising:
an input unit, connected to the first setting unit and configured to allow a user to enter a setting instruction for setting the search time period;
wherein the first setting unit sets the search time period according to the setting instruction.
4. The data search system of claim 1, further comprising:
a first storage unit, connected to the first data search unit and configured to generate and store a corresponding result document according to the search result.
5. The data search system of claim 4, wherein the first data search unit comprises:
a result comparison module, connected to the rule search module and configured to compare the result document obtained by the current data search with the result document obtained by the previous data search and to output a corresponding comparison result;
a first notification module, connected to the result comparison module and configured to send a notification message to the user when the comparison result shows that the result of the data search has been updated.
6. The data search system of claim 4, wherein the first data search unit comprises:
a result comparison module, connected to the rule search module and configured to compare the result document obtained by the current data search with the result document obtained by the previous data search and to output a corresponding comparison result;
a search reconstruction module, connected to both the result comparison module and the rule search module and configured to recombine the search prompt in the search rule according to a preset rule when the comparison result shows that the result of the data search has not been updated, so as to form a reconstructed search prompt;
a second notification module, connected to both the search reconstruction module and the rule search module and configured to provide the user with the result document formed from the search result obtained according to the reconstructed search rule.
7. The data search system according to claim 4, wherein a second storage unit is provided, connected to both the first storage unit and the first data search unit and configured to store the search rule on which each data search relies together with the result document extracted from the first storage unit, and to establish a correspondence between the search rule and the result document in the second storage unit;
the data search system further comprises a second data search unit connected to the second storage unit;
the second data search unit further comprises:
a request acquisition module, configured to acquire an externally input query request comprising the search rule;
a query module, connected to the request acquisition module and configured to search data in the storage network according to the query request;
a rule judging module, connected to both the request acquisition module and the query module and configured to check, according to the search rule on which the data search relies, whether a matching search rule exists in the second storage unit, and to output a corresponding judgment result;
the query module is configured to:
when a matching search rule exists in the second storage unit, directly extract the result document corresponding to the search rule and output it as the search result of the data search;
when no matching search rule exists in the second storage unit, perform the data search using the search rule on which the data search relies, and output the corresponding search result.
8. The data search system of claim 7, wherein the second data search unit further comprises:
a setting module, connected to the rule judging module and configured to enable or disable the rule judging module according to an externally input instruction.
9. The data search system according to claim 1, wherein the data held in the storage network of the server is distributed across a plurality of different user folders;
the data search system further comprises:
a third data search unit, connected to the first setting unit and configured to search the different user folders in the server while the server is offline, so as to deduplicate identical data in different user folders;
the third data search unit is disposed in a controller inside the memory of the server;
the third data search unit further comprises:
a first search module, configured to search the different user folders in the server while the server is offline, so as to find identical data in different user folders;
a data deduplication module, connected to the first search module and configured to, according to the search result of the first search module, retain the identical data in one of the user folders containing it and delete it from all the other user folders;
a link generation module, connected to the data deduplication module and configured to generate a corresponding access link in a user folder when the data deduplication module deletes data from that user folder;
after the deduplication, among the user folders that contained the same data, exactly one user folder still holds that data and serves as the target folder, and the retained data serves as the target data;
the access link points to the storage address of the target data in the target folder.
10. The data search system according to claim 1, wherein the data held in the storage network of the server is distributed across a plurality of different user folders;
the data search system further comprises:
a fourth data search unit, connected to the first setting unit and configured to search the different user folders in the server while the server is offline, so as to establish correspondences between similar data in different user folders;
the fourth data search unit is disposed in the controller inside the memory of the server;
the fourth data search unit further comprises:
a second search module, configured to search the different user folders in the server while the server is offline, so as to find similar data in different user folders;
a marking module, connected to the second search module and configured to mark the similar data in different user folders according to the search result of the second search module, so as to establish correspondences between the similar data.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710332855.8A CN107169085B (en) 2017-05-12 2017-05-12 Data search system

Publications (2)

Publication Number Publication Date
CN107169085A 2017-09-15
CN107169085B 2020-12-01

Family ID: 59815916

Country Status (1)

Country Link
CN (1) CN107169085B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant