CN108255972A - Full-text retrieval method and system - Google Patents

Full-text retrieval method and system

Info

Publication number
CN108255972A
Authority
CN
China
Prior art keywords
file
retrieval
index
description information
determining
Legal status
Pending
Application number
CN201711441728.8A
Other languages
Chinese (zh)
Inventor
张迪
崔俊啸
臧德波
蔺川
景长超
张鹏
褚波
Current Assignee
Inspur General Software Co Ltd
Original Assignee
Inspur General Software Co Ltd
Application filed by Inspur General Software Co Ltd
Priority to CN201711441728.8A
Publication of CN108255972A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a full-text retrieval method and system. The method includes: acquiring at least one piece of file data and determining description information corresponding to each piece of file data; constructing, according to the description information, a file index corresponding to each piece of file data; acquiring retrieval information input by a user; parsing at least one retrieval keyword from the retrieval information; determining, from the file indexes, a target file index corresponding to the at least one retrieval keyword; and determining target description information corresponding to the target file index and displaying the target description information. This solution can improve data retrieval efficiency.

Description

Full-text retrieval method and system
Technical Field
The invention relates to the field of computer technology, and in particular to a full-text retrieval method and system.
Background
With the development of computer technology, the volume of data has grown explosively, and how quickly target data can be retrieved from a file system has a significant effect on data processing efficiency.
The Hadoop Distributed File System (HDFS) can store large amounts of data, with each piece of data stored on a separate storage device, e.g., a separate disk. To retrieve target data, a user has to check the storage devices one by one to determine whether each contains the target data.
Because the file system stores a large amount of data in different storage locations, searching the storage devices one by one for the target data is inefficient.
Disclosure of Invention
Embodiments of the present invention provide a full-text retrieval method and system that can improve data retrieval efficiency.
In a first aspect, an embodiment of the present invention provides a full-text retrieval method, including:
acquiring at least one piece of file data, and determining description information corresponding to each piece of file data;
constructing, according to the description information, a file index corresponding to each piece of file data;
acquiring retrieval information input by a user;
parsing at least one retrieval keyword from the retrieval information;
determining, from the file indexes, a target file index corresponding to the at least one retrieval keyword; and
determining target description information corresponding to the target file index, and displaying the target description information.
Preferably, after the retrieval information input by the user is acquired, the method further comprises:
acquiring a retrieval condition input by the user;
in this case, determining, from the file indexes, the target file index corresponding to the at least one retrieval keyword comprises:
determining the target file index according to the retrieval condition and the retrieval keywords.
Preferably, determining the target file index according to the retrieval condition and the retrieval keywords comprises:
determining, from the file indexes, candidate file indexes matching the retrieval time, according to the retrieval time carried in the retrieval condition and the creation time in the description information corresponding to each file index; and
determining, from the candidate file indexes, the target file index corresponding to the retrieval keywords.
Preferably, determining the target file index according to the retrieval condition and the retrieval keywords comprises:
determining, from the file indexes, candidate file indexes matching the retrieval file type, according to the retrieval file type carried in the retrieval condition and the file type in the description information corresponding to each file index; and
determining, from the candidate file indexes, the target file index corresponding to the retrieval keywords.
Preferably, determining the target file index according to the retrieval condition and the retrieval keywords comprises:
combining the retrieval keywords according to the concatenation relationship carried in the retrieval condition; and
determining the target file index according to the combined retrieval keywords.
Preferably, the method further comprises: constructing an index library at a preset storage location;
and constructing, according to the description information, the file index corresponding to each piece of file data comprises:
segmenting the file content in the description information with a preset word segmenter to obtain at least one content keyword;
processing the at least one content keyword with a preset dictionary corresponding to the word segmenter, and writing the processed content keywords into the description information; and
storing the description information in the index library with a preset index creator to form the file index.
Preferably, the method further comprises:
receiving a file deletion request input by the user;
determining, from the at least one piece of file data, the file data to be deleted according to the file deletion request;
determining the description information and the file index corresponding to the file data to be deleted; and
deleting that description information and file index from the index library with the index creator.
In a second aspect, an embodiment of the present invention provides a full-text retrieval system, comprising an index construction unit, an acquisition unit, and a retrieval unit, wherein:
the index construction unit is configured to acquire at least one piece of file data, determine description information corresponding to each piece of file data, and construct, according to the description information, a file index corresponding to each piece of file data;
the acquisition unit is configured to acquire retrieval information input by a user and parse at least one retrieval keyword from the retrieval information; and
the retrieval unit is configured to determine, from the file indexes, a target file index corresponding to the at least one retrieval keyword, determine target description information corresponding to the target file index, and display the target description information.
Preferably, the acquisition unit is further configured to acquire a retrieval condition input by the user;
and the retrieval unit is configured to determine the target file index according to the retrieval condition and the retrieval keywords.
Preferably, the retrieval unit is configured to determine, from the file indexes, candidate file indexes matching the retrieval time, according to the retrieval time carried in the retrieval condition and the creation time in the description information corresponding to each file index, and to determine, from the candidate file indexes, the target file index corresponding to the retrieval keywords.
Preferably, the retrieval unit is configured to determine, from the file indexes, candidate file indexes matching the retrieval file type, according to the retrieval file type carried in the retrieval condition and the file type in the description information corresponding to each file index, and to determine, from the candidate file indexes, the target file index corresponding to the retrieval keywords.
Preferably, the retrieval unit is configured to combine the retrieval keywords according to the concatenation relationship carried in the retrieval condition, and to determine the target file index according to the combined retrieval keywords.
Preferably, the system further comprises a setting unit, wherein:
the setting unit is configured to construct an index library at a preset storage location; and
the index construction unit is configured to segment the file content in the description information with a preset word segmenter to obtain at least one content keyword, process the at least one content keyword with a preset dictionary corresponding to the word segmenter, write the processed content keywords into the description information, and store the description information in the index library with a preset index creator to form the file index.
Preferably, the system further comprises an index deletion unit, wherein:
the acquisition unit is further configured to receive a file deletion request input by the user; and
the index deletion unit is configured to determine, from the at least one piece of file data, the file data to be deleted according to the file deletion request, determine the description information and the file index corresponding to the file data to be deleted, and delete that description information and file index from the index library with the index creator.
Embodiments of the present invention provide a full-text retrieval method and system in which a file index is generated for each piece of acquired file data according to its description information. When retrieval information input by a user is acquired, retrieval keywords are parsed from it, the target file indexes corresponding to those keywords are determined, and the target description information corresponding to the target file indexes is displayed. Retrieval over all file data is thus automated, and target data no longer has to be found by searching the storage devices one by one, which improves data retrieval efficiency.
Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a full-text retrieval method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a full-text retrieval system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a full-text retrieval system according to another embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a full-text retrieval system according to yet another embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. The described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
As shown in fig. 1, an embodiment of the present invention provides a full-text retrieval method, which may include the following steps:
step 101: acquiring at least one piece of file data, and determining description information corresponding to each piece of file data;
step 102: constructing, according to the description information, a file index corresponding to each piece of file data;
step 103: acquiring retrieval information input by a user;
step 104: parsing at least one retrieval keyword from the retrieval information;
step 105: determining, from the file indexes, a target file index corresponding to the at least one retrieval keyword;
step 106: determining target description information corresponding to the target file index, and displaying the target description information.
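Steps 101 to 106 can be illustrated with a small in-memory sketch. It is not the patent's implementation (which builds on Lucene): the class SimpleFullTextIndex, its methods, the punctuation-based tokenization, and the rule that a target file must contain every keyword are assumptions made for illustration. The keyword-to-file-ID map plays the role of the file indexes, and the file-ID-to-description map supplies what step 106 displays.

```java
import java.util.*;

/** Hypothetical in-memory stand-in for the index library; not the patent's code. */
class SimpleFullTextIndex {
    // keyword -> IDs of files whose content contains it (an inverted index)
    private final Map<String, Set<String>> postings = new HashMap<>();
    // file ID -> description information displayed for a hit
    private final Map<String, String> descriptions = new HashMap<>();

    /** Steps 101-102: record the description information and index its content keywords. */
    void addFile(String fileId, String description, String content) {
        descriptions.put(fileId, description);
        for (String keyword : parseKeywords(content)) {
            postings.computeIfAbsent(keyword, k -> new HashSet<>()).add(fileId);
        }
    }

    /** Steps 103-104: split text into lowercase keywords on non-word characters (assumed). */
    static List<String> parseKeywords(String text) {
        List<String> keywords = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) keywords.add(token);
        }
        return keywords;
    }

    /** Steps 105-106: intersect the hits of all keywords, then return their descriptions. */
    List<String> search(String retrievalInfo) {
        Set<String> targets = null;
        for (String keyword : parseKeywords(retrievalInfo)) {
            Set<String> hits = postings.getOrDefault(keyword, Set.of());
            if (targets == null) targets = new TreeSet<>(hits); // sorted for stable output
            else targets.retainAll(hits);
        }
        List<String> result = new ArrayList<>();
        if (targets != null) {
            for (String id : targets) result.add(descriptions.get(id));
        }
        return result;
    }
}
```

Each keyword is resolved by a single map lookup instead of a scan over every storage device, which is the efficiency argument made above.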
In the above embodiment, a file index is generated for each piece of acquired file data according to its description information. When retrieval information input by the user is acquired, retrieval keywords are parsed from it, the target file indexes corresponding to those keywords are determined, and the corresponding target description information is displayed. Retrieval over all file data is thus automated, and target data no longer has to be found by searching the storage devices one by one, which improves data retrieval efficiency.
In one embodiment of the present invention, the method may further comprise: constructing an index library at a preset storage location;
and step 102 may specifically include:
segmenting the file content in the description information with a preset word segmenter to obtain at least one content keyword;
processing the at least one content keyword with a preset dictionary corresponding to the word segmenter, and writing the processed content keywords into the description information; and
storing the description information in the index library with a preset index creator to form the file index.
In this embodiment, a storage location for the index files, for example disk A, is chosen in the local file system and designated as the location of the index library, and the index library is constructed at that location. An index creator is then constructed; it creates file indexes, stores them at the location of the index library, and is set to append mode. Next, a word segmenter such as the IK word segmenter can be configured, and several lexicons, such as an extension lexicon, a stop-word lexicon, and a synonym lexicon, can be built and used to adjust the dictionary corresponding to the word segmenter, such as the IKAnalyzer dictionary. When a file index is created, a document description is created according to the file type, and the corresponding attribute fields are populated to form the description information of the file data, as shown in Table 1.
Table 1
Attribute name | Meaning
fileName | File name
fileDataName | Name of the uploaded file object
content | Document content
path | File path
type | File type
fileID | File identifier
category | Category
createTime | Creation time
top_directory | Upper-level directory
versionID | Version number
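Table 1 can be read as a record type. The sketch below uses a hypothetical class DescriptionInfo and helper fromPath to show how the path-derived attributes (fileName, type, top_directory) could be filled; deriving type from the file extension and storing createTime as a yyyyMMdd string are assumptions, not statements from the source. In a Lucene-based implementation, each attribute would typically become one field of a Lucene Document.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

/** Hypothetical holder for the Table 1 attributes; field names follow Table 1 verbatim. */
class DescriptionInfo {
    String fileName;       // file name
    String fileDataName;   // name of the uploaded file object
    String content;        // document content (later replaced by processed keywords)
    String path;           // file path
    String type;           // file type, taken from the extension here (assumption)
    String fileID;         // file identifier
    String category;       // category
    String createTime;     // creation time, assumed stored as sortable yyyyMMdd
    String top_directory;  // upper-level directory
    String versionID;      // version number

    /** Fills the path-derived fields plus ID and time; the caller sets the rest. */
    static DescriptionInfo fromPath(String filePath, String fileID, String createTime) {
        Path p = Paths.get(filePath);
        DescriptionInfo info = new DescriptionInfo();
        info.fileName = p.getFileName().toString();
        info.path = filePath;
        int dot = info.fileName.lastIndexOf('.');
        info.type = dot >= 0 ? info.fileName.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
        Path parent = p.getParent();
        // The root directory has no file name, so fall back to an empty string
        info.top_directory = (parent != null && parent.getFileName() != null)
                ? parent.getFileName().toString() : "";
        info.fileID = fileID;
        info.createTime = createTime;
        return info;
    }
}
```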
The word segmenter splits the file content in the description information into multiple content keywords, which are then processed with the adjusted dictionary. For example, if segmentation yields two separate tokens that together form the word "happy", the extension lexicon causes them to be merged into the single word "happy", and its synonyms are determined from the synonym lexicon. The processed content keywords are then written into the description information in place of the original file content, and the index creator stores the updated description information in the index library to form the file index of the file data. Because all file indexes are stored together in the index library, a search only needs to look in the index library's storage location rather than in each disk, which further improves data retrieval efficiency.
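The dictionary processing described above (merging tokens through the extension lexicon, dropping stop words, adding synonyms) can be sketched as a post-processing step over the segmenter's output. This is a simplified stand-in: the IK word segmenter applies its dictionaries during segmentation itself, and the class KeywordProcessor, the two-token merge rule, and the sample English lexicons are hypothetical.

```java
import java.util.*;

/** Hypothetical post-processing of segmenter tokens with adjusted lexicons. */
class KeywordProcessor {
    private final Set<String> extensionLexicon;        // compound words to keep whole
    private final Set<String> stopWords;               // words excluded from the index
    private final Map<String, List<String>> synonyms;  // word -> synonyms to add

    KeywordProcessor(Set<String> extensionLexicon, Set<String> stopWords,
                     Map<String, List<String>> synonyms) {
        this.extensionLexicon = extensionLexicon;
        this.stopWords = stopWords;
        this.synonyms = synonyms;
    }

    List<String> process(List<String> tokens) {
        // Pass 1: merge two adjacent tokens when their concatenation is a lexicon word
        List<String> merged = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (i + 1 < tokens.size()
                    && extensionLexicon.contains(tokens.get(i) + tokens.get(i + 1))) {
                merged.add(tokens.get(i) + tokens.get(i + 1));
                i++; // the second token was consumed by the merge
            } else {
                merged.add(tokens.get(i));
            }
        }
        // Pass 2: drop stop words and append each remaining word's synonyms after it
        List<String> result = new ArrayList<>();
        for (String word : merged) {
            if (stopWords.contains(word)) continue;
            result.add(word);
            result.addAll(synonyms.getOrDefault(word, List.of()));
        }
        return result;
    }
}
```

With the extension lexicon {"database"}, the stop word "is", and the synonym entry fast → quick, the tokens [data, base, is, fast] become [database, fast, quick].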
In one embodiment of the present invention, the method may further comprise:
receiving a file deletion request input by a user;
determining, from the at least one piece of file data, the file data to be deleted according to the file deletion request;
determining the description information and the file index corresponding to the file data to be deleted; and
deleting that description information and file index from the index library with the index creator.
Specifically, the file data to be deleted that corresponds to the file deletion request may be determined from the acquired file data and deleted; the description information and the file index corresponding to it are then determined, and the index creator deletes them. In this way, when file data is deleted, its file index is deleted too, so that no file index is left pointing to file data that can no longer be obtained, which improves retrieval accuracy.
It is worth mentioning that when file data is moved or modified, its file index and description information can be deleted, new description information can be generated from the modified file data, and the file index can be rebuilt. A new file index is thus created automatically whenever file data changes, keeping the file indexes synchronized with the file data, ensuring their accuracy, and improving retrieval accuracy.
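Deletion, and re-indexing after a move or modification, amount to removing every index entry of a file and, for an update, adding entries built from the new data. The in-memory sketch below assumes a hypothetical IndexLibrary keyed by fileID. In Lucene, IndexWriter provides deleteDocuments(Term...) and updateDocument(Term, ...), the latter deleting the matching documents and then adding the new one, so a term on a unique field such as fileID would be a natural key; that mapping is an assumption rather than something the source specifies.

```java
import java.util.*;

/** Hypothetical in-memory index library supporting delete and delete-then-reindex. */
class IndexLibrary {
    private final Map<String, Set<String>> postings = new HashMap<>();      // keyword -> fileIDs
    private final Map<String, List<String>> fileKeywords = new HashMap<>(); // fileID -> keywords

    void add(String fileId, List<String> keywords) {
        fileKeywords.put(fileId, new ArrayList<>(keywords));
        for (String k : keywords) postings.computeIfAbsent(k, x -> new HashSet<>()).add(fileId);
    }

    /** File deletion request: remove the file's entry and every posting that points to it. */
    void delete(String fileId) {
        List<String> keywords = fileKeywords.remove(fileId);
        if (keywords == null) return; // nothing was indexed for this file
        for (String k : keywords) {
            Set<String> ids = postings.get(k);
            if (ids == null) continue;
            ids.remove(fileId);
            if (ids.isEmpty()) postings.remove(k); // drop keywords no file uses any more
        }
    }

    /** File moved or modified: delete the stale index, then rebuild it from the new data. */
    void update(String fileId, List<String> newKeywords) {
        delete(fileId);
        add(fileId, newKeywords);
    }

    Set<String> lookup(String keyword) {
        return postings.getOrDefault(keyword, Set.of());
    }
}
```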
In an embodiment of the present invention, after step 103, the method further includes:
acquiring a retrieval condition input by a user;
and step 105 may specifically include:
determining the target file index according to the retrieval condition and the retrieval keywords.
Here, the user can customize the retrieval condition, for example the retrieval time, the retrieval file type, and the concatenation relationship between the retrieval keywords. Before the user's retrieval information is acquired, weights for the file name and the file content can be preset for ranking the search results; for example, the file name can be given a larger weight than the file content. After multiple pieces of file data matching the retrieval information are retrieved, they are ranked by relevance to the retrieval information, and matches in a field with a higher weight rank the file data higher. In addition, the IK word segmenter can be configured to process the retrieval keywords with the previously built extension lexicon, stop-word lexicon, and synonym lexicon, which further improves retrieval accuracy.
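The preset weights can be illustrated with a simplified additive score in which a keyword found in the file name counts more than one found in the content. The weights 2.0 and 1.0 and the class WeightedRanker are assumptions; Lucene's actual relevance scoring is more elaborate, and a comparable effect is usually obtained with per-field boosts, for example the boosts map accepted by MultiFieldQueryParser.

```java
import java.util.*;

/** Hypothetical additive ranking in which a name match outweighs a content match. */
class WeightedRanker {
    static final double NAME_WEIGHT = 2.0;    // assumed preset weight for fileName
    static final double CONTENT_WEIGHT = 1.0; // assumed preset weight for content

    /** Adds the field weight for every keyword found in that field (case-insensitive). */
    static double score(String fileName, String content, List<String> keywords) {
        String name = fileName.toLowerCase(Locale.ROOT);
        String body = content.toLowerCase(Locale.ROOT);
        double total = 0;
        for (String keyword : keywords) {
            String k = keyword.toLowerCase(Locale.ROOT);
            if (name.contains(k)) total += NAME_WEIGHT;
            if (body.contains(k)) total += CONTENT_WEIGHT;
        }
        return total;
    }

    /** Orders file names by descending score; ties fall back to alphabetical order. */
    static List<String> rank(Map<String, String> nameToContent, List<String> keywords) {
        List<String> names = new ArrayList<>(nameToContent.keySet());
        names.sort((a, b) -> {
            int byScore = Double.compare(score(b, nameToContent.get(b), keywords),
                                         score(a, nameToContent.get(a), keywords));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return names;
    }
}
```

A file whose name contains the keyword but whose content does not (score 2.0) still ranks above a file that mentions the keyword only in its content (score 1.0).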
Specifically, in an embodiment of the present invention, determining the target file index according to the retrieval condition and the retrieval keywords includes:
determining, from the file indexes, candidate file indexes matching the retrieval time, according to the retrieval time carried in the retrieval condition and the creation time in the description information corresponding to each file index; and
determining, from the candidate file indexes, the target file index corresponding to the retrieval keywords.
In this embodiment, the file indexes can be filtered against the retrieval time range specified in the user's retrieval condition by using createTime in the description information of each piece of file data, that is, the creation time of the file index. For example, if the user enters the retrieval time 2017.10.1-2017.11.1, the file indexes whose creation time falls within that period are taken as candidate file indexes, and the target file index corresponding to the retrieval keywords is determined from these candidates, which further improves retrieval accuracy.
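For example, this filtering can be implemented with Lucene 5.3 or later (where a BooleanQuery is assembled with BooleanQuery.Builder) as follows:
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermRangeQuery;
// createTime, dateBegin and dateEnd must use a sortable string format such as "20171001"
Query rangeQuery = TermRangeQuery.newStringRange("createTime", dateBegin, dateEnd, true, true);
BooleanQuery.Builder booleanQuery = new BooleanQuery.Builder();
booleanQuery.add(rangeQuery, Occur.MUST); // only files created in [dateBegin, dateEnd] match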
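Range filtering on createTime only works if the stored strings sort in date order, and user input such as 2017.10.1 does not: as strings, "2017.10.1" sorts before "2017.9.1". The sketch below assumes createTime is stored as zero-padded yyyyMMdd and uses a hypothetical class TimeRangeFilter; it normalizes the user's dates with java.time, keeps the candidates inside the inclusive range, and only then applies the keyword match. With Lucene 6 or later, indexing the time as a LongPoint and querying it with LongPoint.newRangeQuery is an alternative to string ranges.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.*;

/** Hypothetical two-stage filter: createTime range first, keyword match second. */
class TimeRangeFilter {
    private static final DateTimeFormatter USER_INPUT = DateTimeFormatter.ofPattern("yyyy.M.d");

    /** Converts user input such as "2017.10.1" into the sortable form "20171001". */
    static String toSortable(String userDate) {
        return LocalDate.parse(userDate, USER_INPUT).format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    /**
     * files maps fileID -> {createTime as yyyyMMdd, content}. Returns the sorted IDs of
     * files created within [begin, end] whose content contains the keyword.
     */
    static List<String> search(Map<String, String[]> files, String begin, String end,
                               String keyword) {
        String lo = toSortable(begin);
        String hi = toSortable(end);
        List<String> targets = new ArrayList<>();
        for (Map.Entry<String, String[]> e : files.entrySet()) {
            String createTime = e.getValue()[0];
            // Stage 1: candidate file index if lo <= createTime <= hi (string order = date order)
            boolean candidate = createTime.compareTo(lo) >= 0 && createTime.compareTo(hi) <= 0;
            // Stage 2: keyword match, evaluated only for candidates
            if (candidate && e.getValue()[1].contains(keyword)) targets.add(e.getKey());
        }
        Collections.sort(targets);
        return targets;
    }
}
```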
In an embodiment of the present invention, determining the target file index according to the retrieval condition and the retrieval keywords includes:
determining, from the file indexes, candidate file indexes matching the retrieval file type, according to the retrieval file type carried in the retrieval condition and the file type in the description information corresponding to each file index; and
determining, from the candidate file indexes, the target file index corresponding to the retrieval keywords.
In addition to the retrieval time range, the user can set the retrieval file type. For example, when the user sets the retrieval file type to Word, documents of type doc and docx are searched, and other file types are handled similarly, which further improves retrieval accuracy. When the user makes no particular setting for the retrieval file type, all file types may be searched by default. The correspondence between the retrieval file type set by the user and the formats of the file data is shown in Table 2.
Table 2
File type | Formats
All | All formats
Word | doc, docx
PDF | pdf
Excel | xls, xlsx
TXT | txt
PPT | ppt, pptx
PICTURE | bmp, jpg, jpeg, png, gif
VIDEO | avi, wma, rmvb, mp4, flash, mp3, wav
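Table 2 maps directly onto a lookup from retrieval file type to accepted formats. The sketch assumes a hypothetical class FileTypeFilter that compares against the lowercase type attribute, treats "All" or a missing selection as no filter (the default described above), and treats an unknown label as matching nothing, which is an assumption. In Lucene, the same check could be expressed as one TermQuery per format on the type field, combined with Occur.SHOULD and added to the main query with Occur.MUST.

```java
import java.util.*;

/** Hypothetical encoding of Table 2: retrieval file type -> accepted file formats. */
class FileTypeFilter {
    private static final Map<String, Set<String>> TYPES = Map.of(
            "Word", Set.of("doc", "docx"),
            "PDF", Set.of("pdf"),
            "Excel", Set.of("xls", "xlsx"),
            "TXT", Set.of("txt"),
            "PPT", Set.of("ppt", "pptx"),
            "PICTURE", Set.of("bmp", "jpg", "jpeg", "png", "gif"),
            "VIDEO", Set.of("avi", "wma", "rmvb", "mp4", "flash", "mp3", "wav"));

    /** True if a file of the given format is a candidate for the selected retrieval type. */
    static boolean accepts(String retrievalType, String fileFormat) {
        if (retrievalType == null || retrievalType.equals("All")) return true; // default: all types
        Set<String> formats = TYPES.get(retrievalType);
        return formats != null && formats.contains(fileFormat.toLowerCase(Locale.ROOT));
    }
}
```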
In an embodiment of the present invention, determining the target file index according to the retrieval condition and the retrieval keywords includes:
combining the retrieval keywords according to the concatenation relationship carried in the retrieval condition; and
determining the target file index according to the combined retrieval keywords.
Here, in addition to setting the retrieval time and the retrieval file type, the user can use advanced retrieval: selecting "AND", "OR", or "does not contain" from a drop-down box and combining the retrieval keywords to build the query. "AND" is a conjunction that retrieves the file indexes satisfying all conditions; "OR" is a disjunction that retrieves the file indexes satisfying at least one condition; and "does not contain" is a negation that removes the file indexes matching the condition that follows it. When the user's retrieval condition also includes a retrieval time and a retrieval file type, these conditions can likewise be joined by setting their concatenation relationship. Users can thus define their own retrieval conditions and accurately retrieve the file indexes they need, which improves the user experience.
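The AND / OR / "does not contain" combination can be modeled as set operations over the file IDs each keyword matches. The sketch below, with a hypothetical class KeywordCombiner, evaluates the operators strictly left to right, which is an assumption about how the drop-down selections are chained. In Lucene these operators correspond to BooleanClause.Occur.MUST, SHOULD, and MUST_NOT, although a BooleanQuery evaluates its clauses together rather than left to right.

```java
import java.util.*;

/** Hypothetical left-to-right combination of keyword hit sets with AND / OR / NOT. */
class KeywordCombiner {
    enum Op { AND, OR, NOT } // NOT stands for "does not contain"

    /**
     * hits maps keyword -> IDs of matching file indexes. The first keyword seeds the
     * result, then each (ops[i], keywords[i]) pair refines it in order.
     */
    static Set<String> combine(Map<String, Set<String>> hits, String first,
                               List<Op> ops, List<String> keywords) {
        Set<String> result = new TreeSet<>(hits.getOrDefault(first, Set.of()));
        for (int i = 0; i < ops.size(); i++) {
            Set<String> next = hits.getOrDefault(keywords.get(i), Set.of());
            switch (ops.get(i)) {
                case AND: result.retainAll(next); break; // must satisfy both
                case OR:  result.addAll(next);    break; // satisfy at least one
                case NOT: result.removeAll(next); break; // exclude matches of the next keyword
            }
        }
        return result;
    }
}
```

For example, "hadoop NOT draft OR lucene" first removes the draft hits from the hadoop hits and then adds the lucene hits.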
It is worth mentioning that after the target description information is displayed, the user may be offered preview and download functions for the corresponding file data. For example, when the search results include Word, PDF, or TXT files, clicking a file locates it through the file information in its attribute fields and caches it in the browser for preview. Clicking the download button below a file likewise locates the file through the file information in its attribute fields and downloads it, so that the user can conveniently obtain the file data, further improving the user experience.
In addition, Lucene is an open-source full-text indexing and search library supported and provided by the Apache Software Foundation. It offers a simple yet powerful application programming interface for full-text indexing and searching. As a full-text search engine, it has the following notable advantages: 1. The index file format is independent of the application platform. Lucene defines an index file format based on 8-bit bytes, so that compatible systems and applications on different platforms can share the same index files. 2. Building on the inverted index of traditional full-text search engines, Lucene implements segmented (block) indexing: a small index can be built for new files to speed up indexing, and it is later merged with the existing index for optimization. 3. Its well-designed object-oriented architecture makes Lucene easier to learn and to extend with new functions. 4. It defines a text analysis interface that is independent of language and file format; the indexer creates index files by consuming a token stream, so supporting a new language or file format only requires implementing the text analysis interface. 5. It ships with a powerful query engine by default, so users get strong query capabilities without writing their own code; Boolean operations, fuzzy queries (Fuzzy Search), grouped queries, and so on are implemented by default. Furthermore, Lucene is a mature, free, open-source, cross-platform tool in the Java development environment that gives software developers an easy-to-use toolkit on which full-text search engines suited to specific applications can be built. A search system for the Hadoop file system can therefore be built on Lucene.
As shown in FIG. 2, an embodiment of the present invention provides a full-text retrieval system, comprising an index construction unit 201, an acquisition unit 202, and a retrieval unit 203, wherein:
the index construction unit 201 is configured to acquire at least one piece of file data, determine description information corresponding to each piece of file data, and construct, according to the description information, a file index corresponding to each piece of file data;
the acquisition unit 202 is configured to acquire retrieval information input by a user and parse at least one retrieval keyword from the retrieval information; and
the retrieval unit 203 is configured to determine, from the file indexes, a target file index corresponding to the at least one retrieval keyword, determine target description information corresponding to the target file index, and display the target description information.
In an embodiment of the present invention, the acquisition unit 202 is further configured to acquire a retrieval condition input by the user;
and the retrieval unit 203 is configured to determine the target file index according to the retrieval condition and the retrieval keywords.
In an embodiment of the present invention, the retrieval unit 203 is configured to determine, from the file indexes, candidate file indexes matching the retrieval time, according to the retrieval time carried in the retrieval condition and the creation time in the description information corresponding to each file index, and to determine, from the candidate file indexes, the target file index corresponding to the retrieval keywords.
In an embodiment of the present invention, the retrieval unit 203 is configured to determine, from the file indexes, candidate file indexes matching the retrieval file type, according to the retrieval file type carried in the retrieval condition and the file type in the description information corresponding to each file index, and to determine, from the candidate file indexes, the target file index corresponding to the retrieval keywords.
In an embodiment of the present invention, the retrieval unit 203 is configured to combine the retrieval keywords according to the concatenation relationship carried in the retrieval condition, and to determine the target file index according to the combined retrieval keywords.
As shown in FIG. 3, in an embodiment of the present invention, the system may further include a setting unit 301, wherein:
the setting unit 301 is configured to construct an index library at a preset storage location; and
the index construction unit 201 is configured to segment the file content in the description information with a preset word segmenter to obtain at least one content keyword, process the at least one content keyword with a preset dictionary corresponding to the word segmenter, write the processed content keywords into the description information, and store the description information in the index library with a preset index creator to form the file index.
As shown in FIG. 4, in one embodiment of the present invention, the system may further include an index deletion unit 401, wherein:
the acquisition unit 202 is further configured to receive a file deletion request input by the user; and
the index deletion unit 401 is configured to determine, from the at least one piece of file data, the file data to be deleted according to the file deletion request, determine the description information and the file index corresponding to the file data to be deleted, and delete that description information and file index from the index library with the index creator.
Because the information interaction, execution process, and other contents between the units in the device are based on the same concept as the method embodiment of the present invention, specific contents may refer to the description in the method embodiment of the present invention, and are not described herein again.
An embodiment of the present invention further provides a readable medium, which includes an execution instruction, and when a processor of a storage controller executes the execution instruction, the storage controller executes a method provided in any one of the above embodiments of the present invention.
An embodiment of the present invention further provides a storage controller, including: a processor, a memory, and a bus; the memory is used for storing execution instructions, the processor is connected with the memory through the bus, and when the storage controller runs, the processor executes the execution instructions stored in the memory, so that the storage controller executes the method provided by any one of the above embodiments of the invention.
In summary, the above embodiments of the present invention have at least the following advantages:
1. In the embodiments of the present invention, a file index is generated for each piece of file data from the description information of the acquired file data. When retrieval information input by a user is acquired, retrieval keywords are parsed from it, the target file indexes corresponding to those keywords are determined, and the target description information corresponding to the target file indexes is displayed. File data is therefore retrieved automatically, without searching the storage devices one by one for the target data, which improves retrieval efficiency.
2. In the embodiments of the present invention, an index library is constructed at a preset storage location, and the description information is then stored into the index library with an index creator to form the file indexes. Because all file indexes are stored uniformly in the index library, a search only needs to access the storage location of the index library rather than scanning each disk, which further improves retrieval efficiency.
3. In the embodiments of the present invention, when a file deletion request input by a user is received, the file data to be deleted is determined from the acquired file data and deleted; the corresponding description information and file index are then determined and deleted with the index creator. Thus, when file data is deleted, its file index is deleted as well, which prevents stale indexes that point to file data that can no longer be obtained and improves retrieval accuracy.
4. In the embodiments of the present invention, when file data is moved or modified, the file index and description information corresponding to the file are deleted, new description information is generated from the modified file data, and the file index of the modified file data is rebuilt. New file indexes are therefore created automatically whenever file data changes, which keeps the indexes synchronized with the file data, ensures index accuracy, and improves retrieval accuracy.
5. In the embodiments of the present invention, user-defined retrieval conditions include the retrieval time, the retrieval file type, the splicing relationship between each retrieval condition and the retrieval keywords, and the like. File indexes that meet the user's requirements can therefore be retrieved accurately, improving the user experience.
6. In the embodiments of the present invention, after the target description information is displayed, the user can be offered preview and download functions for the corresponding file data, which makes the file data easier to obtain and further improves the user experience.
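Advantage 4 above keeps the indexes synchronized when files are moved or modified by deleting the old entry and rebuilding it. A minimal sketch of that delete-then-reindex pattern, assuming files are keyed by path and keyword extraction is a caller-supplied function; `SyncedIndex`, `on_modified`, and `on_moved` are hypothetical names, and how change events are detected is left open.

```python
from collections import defaultdict
from typing import Callable, Dict, Set


class SyncedIndex:
    """Hypothetical index kept in sync with file changes by delete-then-rebuild."""

    def __init__(self, extract_keywords: Callable[[str], Set[str]]):
        self.extract_keywords = extract_keywords
        self.descriptions: Dict[str, Set[str]] = {}            # path -> keywords
        self.postings: Dict[str, Set[str]] = defaultdict(set)  # keyword -> paths

    def add(self, path: str, content: str) -> None:
        """Generate description information for a file and build its index entries."""
        keywords = set(self.extract_keywords(content))
        self.descriptions[path] = keywords
        for kw in keywords:
            self.postings[kw].add(path)

    def remove(self, path: str) -> None:
        """Delete the description information and index entries of a file, if present."""
        for kw in self.descriptions.pop(path, set()):
            self.postings[kw].discard(path)
            if not self.postings[kw]:
                del self.postings[kw]

    def on_modified(self, path: str, new_content: str) -> None:
        # Drop the stale entry, then rebuild it from the modified content.
        self.remove(path)
        self.add(path, new_content)

    def on_moved(self, old_path: str, new_path: str, content: str) -> None:
        # A move invalidates the old path, so the entry is rebuilt under the new one.
        self.remove(old_path)
        self.add(new_path, content)
```

Rebuilding the whole entry instead of patching individual keywords is simpler and cannot leave behind postings for words that were removed from the file.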
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other similar elements in a process, method, article, or apparatus that comprises the element.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it is to be noted that: the above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A full-text search method, comprising:
acquiring at least one piece of file data, and determining description information corresponding to each piece of file data;
constructing, according to the description information, a file index corresponding to each piece of file data;
acquiring retrieval information input by a user;
parsing at least one retrieval keyword from the retrieval information;
determining, from the file indexes, a target file index corresponding to the at least one retrieval keyword;
and determining target description information corresponding to the target file index, and displaying the target description information.
2. The method of claim 1,
after the obtaining of the retrieval information input by the user, further comprising:
acquiring a retrieval condition input by a user;
the determining, from the file indexes, of a target file index corresponding to the at least one retrieval keyword comprises:
determining the target file index according to the retrieval condition and the retrieval keywords.
3. The method of claim 2,
the determining of the target file index according to the retrieval condition and the retrieval keywords comprises:
determining, from the file indexes, candidate file indexes corresponding to the retrieval time, according to the retrieval time carried by the retrieval condition and the creation time in the description information corresponding to each file index;
determining, from the determined candidate file indexes, a target file index corresponding to the retrieval keywords;
and/or,
the determining of the target file index according to the retrieval condition and the retrieval keywords comprises:
determining, from the file indexes, candidate file indexes corresponding to the retrieval file type, according to the retrieval file type carried by the retrieval condition and the file type recorded in the description information corresponding to each file index;
determining, from the determined candidate file indexes, a target file index corresponding to the retrieval keywords;
and/or,
the determining of the target file index according to the retrieval condition and the retrieval keywords comprises:
combining the retrieval keywords according to the splicing relationship carried in the retrieval condition;
and determining the target file index according to the combined retrieval keywords.
4. The method of claim 1,
further comprising: constructing an index library at a preset storage location;
the constructing of the file index corresponding to each piece of file data according to the description information comprises:
segmenting the file content in the description information with a preset word segmenter to obtain at least one content keyword;
processing the at least one content keyword with a preset dictionary corresponding to the word segmenter, and writing the processed content keywords into the description information;
and storing the description information into the index library by using a preset index creator to form the file index.
5. The method of claim 4,
further comprising:
receiving a file deletion request input by a user;
determining, according to the file deletion request, the file data to be deleted from the at least one piece of file data;
determining the description information and the file index corresponding to the file data to be deleted;
and deleting that description information and file index from the index library by using the index creator.
6. A full-text retrieval system, comprising an index construction unit, an acquisition unit and a retrieval unit, wherein,
the index construction unit is configured to acquire at least one piece of file data, determine description information corresponding to each piece of file data, and construct a file index corresponding to each piece of file data according to the description information;
the acquisition unit is configured to acquire retrieval information input by a user and parse at least one retrieval keyword from the retrieval information;
the retrieval unit is configured to determine, from the file indexes, a target file index corresponding to the at least one retrieval keyword, determine target description information corresponding to the target file index, and display the target description information.
7. The system of claim 6,
the acquisition unit is further used for acquiring a retrieval condition input by a user;
and the retrieval unit is used for determining the target file index according to the retrieval conditions and the retrieval keywords.
8. The system of claim 7,
the retrieval unit is configured to determine, from the file indexes, candidate file indexes corresponding to the retrieval time according to the retrieval time carried by the retrieval condition and the creation time in the description information corresponding to each file index, and to determine, from the determined candidate file indexes, a target file index corresponding to the retrieval keywords;
and/or,
the retrieval unit is configured to determine, from the file indexes, candidate file indexes corresponding to the retrieval file type according to the retrieval file type carried by the retrieval condition and the file type recorded in the description information corresponding to each file index, and to determine, from the determined candidate file indexes, a target file index corresponding to the retrieval keywords;
and/or,
the retrieval unit is configured to combine the retrieval keywords according to the splicing relationship carried in the retrieval condition, and to determine the target file index according to the combined retrieval keywords.
9. The system of claim 6,
further comprising: a setting unit; wherein,
the setting unit is used for constructing an index library at a preset storage position;
the index construction unit is configured to segment the file content in the description information with a preset word segmenter to obtain at least one content keyword; to process the at least one content keyword with a preset dictionary corresponding to the word segmenter and write the processed content keywords into the description information; and to store the description information into the index library with a preset index creator to form the file index.
10. The system of claim 9,
further comprising: an index deletion unit; wherein,
the acquisition unit is further used for receiving a file deletion request input by a user;
the index deletion unit is configured to determine, according to the file deletion request, the file data to be deleted from the at least one piece of file data; to determine the description information and the file index corresponding to the file data to be deleted; and to delete that description information and file index from the index library by using the index creator.
CN201711441728.8A 2017-12-27 2017-12-27 A kind of text searching method and system Pending CN108255972A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711441728.8A CN108255972A (en) 2017-12-27 2017-12-27 A kind of text searching method and system

Publications (1)

Publication Number Publication Date
CN108255972A true CN108255972A (en) 2018-07-06

Family

ID=62724110

Country Status (1)

Country Link
CN (1) CN108255972A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104391941A (en) * 2014-11-25 2015-03-04 浪潮电子信息产业股份有限公司 Method for rapidly establishing full-text retrieval tool for common files
CN105279150A (en) * 2015-10-27 2016-01-27 江苏电力信息技术有限公司 Lucene full-text retrieval based Chinese word segmentation method
CN105574062A (en) * 2015-07-01 2016-05-11 宇龙计算机通信科技(深圳)有限公司 File retrieval method and apparatus and terminal

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299466B (en) * 2018-10-22 2023-07-07 中国船舶工业综合技术经济研究院 Document retrieval method and system oriented to national defense science and technology field
CN109299466A (en) * 2018-10-22 2019-02-01 中国船舶工业综合技术经济研究院 A kind of document retrieval method and system towards science and techniques of defence field
CN109902150A (en) * 2019-02-25 2019-06-18 南京庚商网络信息技术有限公司 Unstructured digital resource text searching method and system
CN110399339A (en) * 2019-06-18 2019-11-01 平安科技(深圳)有限公司 File classifying method, device, equipment and the storage medium of knowledge base management system
CN110516157A (en) * 2019-08-30 2019-11-29 盈盛智创科技(广州)有限公司 A kind of document retrieval method, equipment and storage medium
CN110598009A (en) * 2019-09-12 2019-12-20 北京达佳互联信息技术有限公司 Method and device for searching works, electronic equipment and storage medium
CN110598009B (en) * 2019-09-12 2022-04-22 北京达佳互联信息技术有限公司 Method and device for searching works, electronic equipment and storage medium
CN111026712A (en) * 2019-11-04 2020-04-17 厦门天锐科技股份有限公司 File uploading method and device, file querying method and device and electronic equipment
CN111680072A (en) * 2020-05-07 2020-09-18 国家计算机网络与信息安全管理中心 Social information data-based partitioning system and method
CN111680072B (en) * 2020-05-07 2023-12-08 国家计算机网络与信息安全管理中心 System and method for dividing social information data
CN111581410A (en) * 2020-05-29 2020-08-25 上海依图网络科技有限公司 Image retrieval method, apparatus, medium, and system thereof
CN111581410B (en) * 2020-05-29 2023-11-14 上海依图网络科技有限公司 Image retrieval method, device, medium and system thereof
CN113553354A (en) * 2021-07-23 2021-10-26 中信银行股份有限公司 Row number fuzzy query method and system based on specific word bank
CN113553354B (en) * 2021-07-23 2024-08-23 中信银行股份有限公司 Fuzzy inquiry method and system for line numbers based on specific word bank
CN113987146A (en) * 2021-10-22 2022-01-28 国网江苏省电力有限公司镇江供电分公司 Dedicated novel intelligence of electric power intranet system of asking for answering
CN113987146B (en) * 2021-10-22 2023-01-31 国网江苏省电力有限公司镇江供电分公司 Dedicated intelligent question-answering system of electric power intranet
CN117033307A (en) * 2023-10-07 2023-11-10 北京天信瑞安信息技术有限公司 File indexing method, device, electronic equipment and computer readable storage medium

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20180706)