CN116775830A - Online technical document searching method, device and medium - Google Patents

Online technical document searching method, device and medium

Info

Publication number
CN116775830A
Authority
CN
China
Prior art keywords
document
search
target
index
guide information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310729187.8A
Other languages
Chinese (zh)
Inventor
亓文豪
周祥国
毛瑞雪
乔东旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur General Software Co Ltd
Original Assignee
Inspur General Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur General Software Co Ltd
Priority to CN202310729187.8A
Publication of CN116775830A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an online technical document searching method, device and medium. The method comprises the following steps: for a plurality of document warehouses storing online technical documents, acquiring guide information corresponding to each document warehouse from a distributed search engine, and updating the guide information according to the online technical documents to obtain target guide information corresponding to the document warehouse, wherein the target guide information comprises a target document index and a target document warehouse name; acquiring a document search request sent by a client, and analyzing the document search request to identify the search keywords in it; generating a search statement according to the search keywords and the document warehouse information carried in the document search request; and matching the search statement against the target guide information corresponding to each of the plurality of document warehouses to determine a designated document index corresponding to the search statement, and acquiring the online technical document corresponding to the document search request through the designated document index.

Description

Online technical document searching method, device and medium
Technical Field
The application relates to the technical field of document searching, in particular to an online technical document searching method, device and medium.
Background
Markdown is a lightweight markup language, and technical documents written in Markdown (.md) format are widely used for API documentation. However, as products grow in scale and versions iterate, the number of technical documents keeps increasing, and the growing document volume places higher demands on full-text retrieval performance. At present, most online technical documents are stored in a distributed manner, split by library and by version, and a conventional traversal search becomes inefficient when facing such a large volume of technical documents.
Disclosure of Invention
In order to solve the above problems, the present application proposes an online technical document searching method, comprising:
for a plurality of document warehouses for storing online technical documents, acquiring guide information corresponding to the document warehouses from a distributed search engine, and updating the guide information according to the online technical documents to obtain target guide information corresponding to the document warehouses; the target guide information comprises a target document index and a target document warehouse name;
acquiring a document search request sent by a client, and analyzing the document search request to identify search keywords in the document search request;
generating a search statement according to the search keywords and the document warehouse information carried in the document search request;
and matching the search statement with target guide information corresponding to the document warehouses respectively to determine a designated document index corresponding to the search statement, and acquiring an online technical document corresponding to the document search request through the designated document index.
In one implementation of the present application, before obtaining the guide information corresponding to the document repository from the distributed search engine, the method further includes:
determining whether historical guide information corresponding to the document warehouse exists in the distributed search engine; wherein the historical guide information comprises a historical document index and a historical document warehouse name;
if it exists, generating a document index corresponding to the document warehouse, taking the historical document warehouse name as the target document warehouse name of the document warehouse, and establishing a first mapping relation between the document index and the target document warehouse name;
if not, generating a corresponding document index and a target document warehouse name for the document warehouse.
In one implementation of the present application, updating the guide information to obtain the target guide information corresponding to the document warehouse specifically includes:
determining a directory hierarchy description corresponding to the updated online technical document; wherein the directory hierarchy description is used to characterize the physical storage address of the online technical document;
generating document path information corresponding to the online technical document according to the target document warehouse name and the directory hierarchy description corresponding to the online technical document;
and updating the document index corresponding to the target document warehouse name according to the document path information to obtain a target document index.
In one implementation of the present application, after updating the document index, the method further includes:
and releasing the first mapping relation between the document index and the target document warehouse name, and reestablishing the second mapping relation between the target document index and the target document warehouse name.
In one implementation of the present application, analyzing the document search request to identify search keywords in the document search request specifically includes:
performing word segmentation on the document search request, and performing feature recognition on the word segmentation result to screen out a plurality of search words from it;
and determining the number of searches for each search word in the distributed search engine, and taking the search words whose number of searches is larger than a preset value as search keywords.
In one implementation of the present application, matching the search statement against the target guide information corresponding to each of the plurality of document warehouses to determine a designated document index corresponding to the search statement, and acquiring the online technical document corresponding to the document search request through the designated document index, specifically includes:
determining the designated document warehouse name required by the document search request and the designated target guide information matching that name;
acquiring a plurality of documents to be retrieved under the designated document warehouse name according to the designated target document index in the designated target guide information;
for each document to be retrieved, comparing each search keyword with the document in turn to determine the matching weight of the search keywords in the document;
and arranging the documents to be retrieved in descending order of matching weight to obtain a corresponding sequence of documents to be retrieved, and sending the sequence to the client so that the client selects the online technical document corresponding to the document search request from it.
In one implementation manner of the present application, determining the matching weight of the search keyword in the document to be searched specifically includes:
assigning to each search keyword a first weight according to the number of searches corresponding to that keyword;
determining the number of matches of the search keyword in the document to be retrieved, and assigning a corresponding second weight to the search keyword according to the number of matches;
and adding the first weight and the second weight to obtain the matching weight corresponding to the search keyword.
In one implementation of the present application, before obtaining the guide information corresponding to the document repository from the distributed search engine, the method further includes:
fragmenting the document index, and storing a plurality of fragmented document indexes into designated nodes of a distributed search engine cluster;
and, for each document index fragment, generating at least one index copy corresponding to the fragment, and storing the index copy on a node of the distributed search engine other than the designated node.
An embodiment of the present application provides an online technical document searching apparatus, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
for a plurality of document warehouses for storing online technical documents, acquiring guide information corresponding to the document warehouses from a distributed search engine, and updating the guide information according to the online technical documents to obtain target guide information corresponding to the document warehouses; the target guide information comprises a target document index and a target document warehouse name;
acquiring a document search request sent by a client, and analyzing the document search request to identify search keywords in the document search request;
generating a search statement according to the search keywords and the document warehouse information carried in the document search request;
and matching the search statement with target guide information corresponding to the document warehouses respectively to determine a designated document index corresponding to the search statement, and acquiring an online technical document corresponding to the document search request through the designated document index.
An embodiment of the present application provides a nonvolatile computer storage medium storing computer executable instructions, wherein the computer executable instructions are configured to:
for a plurality of document warehouses for storing online technical documents, acquiring guide information corresponding to the document warehouses from a distributed search engine, and updating the guide information according to the online technical documents to obtain target guide information corresponding to the document warehouses; the target guide information comprises a target document index and a target document warehouse name;
acquiring a document search request sent by a client, and analyzing the document search request to identify search keywords in the document search request;
generating a search statement according to the search keywords and the document warehouse information carried in the document search request;
and matching the search statement with target guide information corresponding to the document warehouses respectively to determine a designated document index corresponding to the search statement, and acquiring an online technical document corresponding to the document search request through the designated document index.
The online technical document searching method provided by the application has the following beneficial effects:
the target guide information locates both the document warehouse and the online technical documents within it, so that, in a multi-warehouse, multi-directory document search scenario, the online technical documents matching the search keywords in the user's search request can be found quickly, which improves search efficiency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic flow chart of an online technical document searching method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an online technical document searching apparatus according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.
As shown in fig. 1, an online technical document searching method provided by an embodiment of the present application includes:
S101: For a plurality of document warehouses storing online technical documents, acquire guide information corresponding to each document warehouse from a distributed search engine, and update the guide information according to the online technical documents to obtain target guide information corresponding to the document warehouse; wherein the target guide information includes a target document index and a target document warehouse name.
Online technical documents are stored in separate warehouses: different types and versions of online technical documents can be spread across multiple document warehouses. To improve search efficiency, the Elasticsearch distributed search engine can be used as the real-time search and analysis engine for the technical documents; it can store and index structured or unstructured data efficiently. Elasticsearch is distributed, and correctly configuring fragments (shards) and copies (replicas) in an Elasticsearch cluster effectively guards against data loss. The embodiment of the present application meets the full-text retrieval requirement for online technical documents by building on the Elasticsearch interface.
After the server connects to the Elasticsearch cluster, the number of document index fragments (shards) and the number of copies (replicas) can be determined according to the node information in the cluster; in general, each document index fragment has at least one index copy. After the document index is fragmented, the resulting document index fragments are stored on designated nodes of the distributed search engine cluster, at least one index copy is generated for each document index fragment, and the index copies are stored on nodes of the distributed search engine other than the designated node. Note that an index copy and its corresponding document index fragment must be stored on different nodes, so that when a document index fragment fails, technical documents can still be searched normally by calling the index copy on another node.
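The placement rule described above (a fragment and its copies never share a node) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the round-robin strategy, the function name `allocate_fragments`, and the node names are hypothetical and are not taken from the application, and a real Elasticsearch cluster performs this allocation itself.

```python
from typing import Dict, List


def allocate_fragments(nodes: List[str], num_fragments: int, num_copies: int) -> Dict[str, List[str]]:
    """Place every index fragment and its copies on distinct nodes (round-robin)."""
    if num_copies + 1 > len(nodes):
        raise ValueError("need at least num_copies + 1 nodes to keep copies apart")
    layout: Dict[str, List[str]] = {node: [] for node in nodes}
    for frag in range(num_fragments):
        # The "designated node" for this fragment (hypothetical choice: round-robin).
        primary = nodes[frag % len(nodes)]
        layout[primary].append(f"fragment{frag}-primary")
        # Each copy lands on a node other than the designated one.
        for c in range(1, num_copies + 1):
            other = nodes[(frag + c) % len(nodes)]
            layout[other].append(f"fragment{frag}-copy{c}")
    return layout


layout = allocate_fragments(["node-a", "node-b", "node-c"], num_fragments=3, num_copies=1)
for node, held in layout.items():
    print(node, held)
```

Because the offset `c` is always smaller than the number of nodes, a copy can never wrap around onto the node that holds its own fragment, which is the property that lets a search continue when one node fails.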
Technical documents are stored in document warehouses. Obtaining them through a traditional traversal search would consume a large amount of computing resources and reduce search efficiency. In the embodiment of the present application, Elasticsearch stores the guide information of the different document warehouses. The guide information comprises a document index and a document warehouse name: the document warehouse name locates a specific document warehouse, and the document index locates each online technical document stored in that warehouse. Therefore, to search online technical documents, the guide information of each document warehouse must be stored in the distributed search engine.
Specifically, the server needs to determine whether historical guide information corresponding to the document warehouse exists in the distributed search engine; wherein the historical guide information includes a historical document index and a historical document warehouse name. If the historical guide information exists, only a new document index needs to be created: the document index corresponding to the document warehouse is generated, the historical document warehouse name is used as the target document warehouse name of the document warehouse, and a first mapping relation between the document index and the target document warehouse name is established. If it does not exist, both a corresponding document index and a target document warehouse name are generated for the document warehouse.
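The branch between "historical guide information exists" and "does not exist" can be sketched with an in-memory stand-in for the guide information held by the search engine. Everything named here (`GuideInfo`, `GuideRegistry`, `ensure`, the index naming scheme) is a hypothetical illustration, not the application's implementation.

```python
import itertools
from dataclasses import dataclass
from typing import Dict


@dataclass
class GuideInfo:
    warehouse_name: str  # document warehouse name
    index_name: str      # document index mapped to that name


class GuideRegistry:
    """In-memory stand-in for guide information stored in the search engine."""

    def __init__(self) -> None:
        self._by_warehouse: Dict[str, GuideInfo] = {}
        self._counter = itertools.count(1)

    def ensure(self, warehouse_id: str, default_name: str) -> GuideInfo:
        history = self._by_warehouse.get(warehouse_id)
        # Historical guide information exists: reuse its warehouse name.
        # Otherwise: generate a warehouse name as well (here, the given default).
        name = history.warehouse_name if history else default_name
        # In both branches a new document index is generated.
        index_name = f"{name}-idx-{next(self._counter)}"
        info = GuideInfo(warehouse_name=name, index_name=index_name)
        # Record the (first) mapping between index and warehouse name.
        self._by_warehouse[warehouse_id] = info
        return info


registry = GuideRegistry()
first = registry.ensure("w1", "api-docs")
second = registry.ensure("w1", "ignored-name")
print(first, second)
```

On the second call the warehouse name is carried over from the historical entry and only the index is regenerated, which mirrors the "only a new document index needs to be created" case.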
Once guide information exists in the distributed search engine, whenever an online technical document is updated, the guide information stored in the distributed search engine must be updated at the same time so that it stays current. The document warehouse name does not change when a technical document is updated, so only the document index is updated; a technical document is treated as updated whether its storage path has changed or it has been deleted or newly added. The server needs to determine the directory hierarchy description corresponding to the online technical document; the directory hierarchy description characterizes the physical storage address of the online technical document. For example, if the physical storage address of an online technical document is C:\Users\Administrator\ … … map.md, that address is the directory hierarchy description of the document. After the directory hierarchy description is determined, document path information corresponding to the online technical document is generated from the directory hierarchy description and the target document warehouse name of the document warehouse where the document is located; the document path information can be used to locate the technical document updated this time. Then, according to the document path information, the document warehouse containing the updated online technical document can be determined, and the document index of that warehouse in the distributed search engine is updated to obtain the updated target document index.
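A minimal sketch of this update step follows. The application does not specify the format of the document path information, so joining the warehouse name with the directory components using `/` is an assumption, as are the function names, the example path, and the use of a plain dictionary as the document index.

```python
from pathlib import PureWindowsPath
from typing import Dict, Optional


def build_document_path(warehouse_name: str, directory_description: str) -> str:
    """Combine the warehouse name with the directory hierarchy description."""
    path = PureWindowsPath(directory_description)
    # Drop the drive/root anchor (e.g. "C:\") so only the hierarchy remains.
    parts = path.parts[1:] if path.anchor else path.parts
    return "/".join((warehouse_name, *parts))


def update_index(index: Dict[str, str], path_info: str, content: Optional[str]) -> Dict[str, str]:
    """Return the target index; additions, changes and deletions are all 'updates'."""
    updated = dict(index)  # leave the original index untouched
    if content is None:
        updated.pop(path_info, None)   # document was deleted
    else:
        updated[path_info] = content   # document was added or changed
    return updated


# Hypothetical path, used only for illustration.
path_info = build_document_path("api-docs", r"C:\docs\v1\map.md")
target_index = update_index({}, path_info, "# Map API")
print(path_info, target_index)
```

Returning a new dictionary rather than mutating the old one keeps the original index available until the mapping is switched over to the target index.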
When updating the guide information, the first mapping relation between the original document index and the target document warehouse name must be released, the original document index deleted, and a second mapping relation established between the target document index and the target document warehouse name. The latest target document index then ensures the accuracy of the search results.
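The application does not say how the mapping relation is stored. One plausible realization, assumed here purely for illustration, is an Elasticsearch index alias named after the document warehouse: releasing the first mapping and establishing the second then becomes a pair of `remove` and `add` alias actions. The sketch below only builds the request body as a Python dictionary; the index and alias names are hypothetical, and deleting the original index is left as a separate step.

```python
import json
from typing import Any, Dict


def build_remap_actions(warehouse_name: str, old_index: str, new_index: str) -> Dict[str, Any]:
    """Body for an alias update that moves the warehouse alias to the new index."""
    return {
        "actions": [
            # Release the first mapping: original index <-> warehouse name.
            {"remove": {"index": old_index, "alias": warehouse_name}},
            # Establish the second mapping: target index <-> warehouse name.
            {"add": {"index": new_index, "alias": warehouse_name}},
        ]
    }


body = build_remap_actions("api-docs", "api-docs-idx-1", "api-docs-idx-2")
print(json.dumps(body, indent=2))
```

Sending both actions in one request means searches addressed to the warehouse name never see a moment where no index is mapped.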
S102: Acquire a document search request sent by the client, and analyze the document search request to identify the search keywords in it.
After a user sends a document search request to the server through a client, the server acquires the request and analyzes it to identify the search keywords. Note that the distributed search engine stores the historical search data of many users, and the search keywords are popular search words identified from that historical data; using words with a high search frequency greatly reduces the difficulty of the document search.
To identify the search keywords, the document search request is first segmented into words, and feature recognition is performed on the word segmentation result to screen out a plurality of search words. The feature recognition removes semantically irrelevant words from the segmentation result, such as modal particles and conjunctions. After the search words are screened out, the number of searches for each search word in the distributed search engine is determined, and the search words whose number of searches is larger than a preset value are taken as search keywords.
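The keyword identification step can be sketched as follows. The regular-expression tokenizer and the small stop-word set are simplified stand-ins for the word segmentation and feature recognition described above (Chinese text would need a real segmenter), and the function name, the search counts, and the threshold are all hypothetical.

```python
import re
from typing import Dict, List

# Stand-in for "feature recognition": words that carry no search meaning.
STOP_WORDS = {"how", "to", "the", "a", "an", "and", "or", "in", "of", "for", "please"}


def extract_keywords(request_text: str, search_counts: Dict[str, int], threshold: int) -> List[str]:
    """Segment the request, drop irrelevant words, keep frequently searched ones."""
    tokens = re.findall(r"[a-z0-9_]+", request_text.lower())
    # Remove stop words and duplicates while preserving the original order.
    search_words = list(dict.fromkeys(t for t in tokens if t not in STOP_WORDS))
    # Keep only words whose historical number of searches exceeds the preset value.
    return [w for w in search_words if search_counts.get(w, 0) > threshold]


history = {"configure": 12, "index": 40, "replica": 3}  # hypothetical search counts
keywords = extract_keywords("How to configure the index replica", history, threshold=10)
print(keywords)
```

With these sample counts, "replica" is dropped because it has been searched only 3 times, while "configure" and "index" pass the threshold of 10.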
S103: Generate a search statement according to the search keywords and the document warehouse information carried in the document search request.
Because the document retrieval in the embodiment of the present application is based on Elasticsearch, after the search keywords are determined, the server needs to generate a concrete Elasticsearch search statement according to the search keywords and the document warehouse information carried in the document search request. The document warehouse information includes the document warehouse name, the document version, and similar information.
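As an illustration of what such a search statement might look like, the sketch below builds an Elasticsearch query DSL body from the keywords and the warehouse information. The application does not give the statement's actual form: the `bool` query with `should`, `filter`, `match` and `term` clauses is standard query DSL, but the field names `content`, `warehouse` and `version` and the function name are assumptions.

```python
import json
from typing import Any, Dict, List, Optional


def build_search_statement(keywords: List[str], warehouse_name: str,
                           version: Optional[str] = None) -> Dict[str, Any]:
    """Build a query body: keywords are scored, warehouse info only filters."""
    filters: List[Dict[str, Any]] = [{"term": {"warehouse": warehouse_name}}]
    if version is not None:
        filters.append({"term": {"version": version}})
    return {
        "query": {
            "bool": {
                # At least one keyword has to occur in the document text.
                "should": [{"match": {"content": kw}} for kw in keywords],
                "minimum_should_match": 1,
                # Restrict the search to the requested warehouse (and version).
                "filter": filters,
            }
        }
    }


statement = build_search_statement(["configure", "index"], "api-docs", version="v2")
print(json.dumps(statement, indent=2))
```

Putting the warehouse name and version in `filter` keeps them out of relevance scoring, so only the keywords influence the order of the results.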
S104: Match the search statement against the target guide information corresponding to each of the plurality of document warehouses to determine a designated document index corresponding to the search statement, and acquire the online technical document corresponding to the document search request through the designated document index.
After matching the search statement against the target guide information corresponding to each of the plurality of document warehouses, the server can determine, from the search statement, the designated document warehouse name required by the document search request and the designated target guide information matching that name. Once the designated target guide information is obtained, a plurality of documents to be retrieved under the designated document warehouse name are acquired according to the designated target document index in it. For each document to be retrieved, each search keyword is compared with the document in turn to determine the matching weight of the search keywords in the document, and the documents are then arranged in descending order of matching weight to obtain a corresponding sequence of documents to be retrieved. After the server returns the sequence to the client, the client can pick the online technical document the user needs from the documents in the sequence.
Note that the matching weight reflects two things. On the one hand, it reflects the importance of the different search keywords: the more often a search keyword has been searched by different users, the more frequently the word occurs and the more important it is, so each search keyword can be assigned a first weight according to its number of searches, where the first weight is positively correlated with the number of searches. On the other hand, it reflects how well the document to be retrieved coincides with the search keywords: the more words in the document match the search keywords, the closer the document is to the user's search request, so a corresponding second weight can be assigned according to the number of matches of the search keywords, where the second weight is positively correlated with the number of matches. After the first weight and the second weight are obtained, they are added to give the matching weight corresponding to the search keyword.
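The weighting and ranking can be sketched as below. The application only requires that the first weight grow with the number of searches, that the second weight grow with the number of matches, and that the two be added; the logarithmic weight functions, the choice to sum the keyword weights per document, and the rule that a keyword with no match contributes nothing are all assumptions made for this example.

```python
import math
import re
from typing import Dict, List, Tuple


def matching_weight(num_searches: int, num_matches: int) -> float:
    """First weight (from search popularity) plus second weight (from matches)."""
    first = math.log1p(num_searches)   # positively correlated with number of searches
    second = math.log1p(num_matches)   # positively correlated with number of matches
    return first + second


def rank_documents(docs: Dict[str, str], keyword_searches: Dict[str, int]) -> List[Tuple[str, float]]:
    """Score each document to be retrieved and sort by weight, largest first."""
    scored: List[Tuple[str, float]] = []
    for path, text in docs.items():
        tokens = re.findall(r"[a-z0-9_]+", text.lower())
        score = 0.0
        for keyword, num_searches in keyword_searches.items():
            num_matches = tokens.count(keyword)
            if num_matches:  # assumption: unmatched keywords add no weight
                score += matching_weight(num_searches, num_matches)
        scored.append((path, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


docs = {  # hypothetical documents to be retrieved
    "b.md": "replica only",
    "c.md": "nothing relevant here",
    "a.md": "index index replica",
}
ranked = rank_documents(docs, {"index": 40, "replica": 12})
print(ranked)
```

The document that matches both keywords comes first, the one matching a single keyword second, and the one with no match last with a weight of zero.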
The above is a method embodiment of the present application. Based on the same thought, some embodiments of the present application also provide a device and a non-volatile computer storage medium corresponding to the above method.
FIG. 2 is a schematic structural diagram of an online technical document searching apparatus according to an embodiment of the present application. As shown in FIG. 2, the apparatus includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
for a plurality of document warehouses storing online technical documents, acquiring guide information corresponding to each document warehouse from a distributed search engine, and updating the guide information according to the online technical documents to obtain target guide information corresponding to the document warehouse; the target guide information comprises a target document index and a target document warehouse name;
acquiring a document search request sent by a client, and analyzing the document search request to identify search keywords in the document search request;
generating a search statement according to the search keywords and the document warehouse information carried in the document search request;
and matching the search statement with target guide information corresponding to the plurality of document warehouses respectively to determine a designated document index corresponding to the search statement, and acquiring an online technical document corresponding to the document search request through the designated document index.
The embodiment of the application provides a nonvolatile computer storage medium, which stores computer executable instructions, wherein the computer executable instructions are configured to:
for a plurality of document warehouses storing online technical documents, acquiring guide information corresponding to each document warehouse from a distributed search engine, and updating the guide information according to the online technical documents to obtain target guide information corresponding to the document warehouse; the target guide information comprises a target document index and a target document warehouse name;
acquiring a document search request sent by a client, and analyzing the document search request to identify search keywords in the document search request;
generating a search statement according to the search keywords and the document warehouse information carried in the document search request;
and matching the search statement with target guide information corresponding to the plurality of document warehouses respectively to determine a designated document index corresponding to the search statement, and acquiring an online technical document corresponding to the document search request through the designated document index.
The embodiments of the present application are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus and medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding parts of the method embodiments.
The devices and media provided in the embodiments of the present application are in one-to-one correspondence with the methods, so that the devices and media also have similar beneficial technical effects as the corresponding methods, and since the beneficial technical effects of the methods have been described in detail above, the beneficial technical effects of the devices and media are not repeated here.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (10)

1. An online technical document searching method, the method comprising:
for a plurality of document warehouses for storing online technical documents, acquiring guide information corresponding to the document warehouses from a distributed search engine, and updating the guide information according to the online technical documents to obtain target guide information corresponding to the document warehouses; the target guide information comprises a target document index and a target document warehouse name;
acquiring a document search request sent by a client, and analyzing the document search request to identify search keywords in the document search request;
generating a search statement according to the search keywords and the document warehouse information carried in the document search request;
and matching the search statement with target guide information corresponding to the document warehouses respectively to determine a designated document index corresponding to the search statement, and acquiring an online technical document corresponding to the document search request through the designated document index.
2. The online technical document searching method according to claim 1, wherein before acquiring the guide information corresponding to the document warehouse from the distributed search engine, the method further comprises:
determining whether historical guide information corresponding to the document warehouse exists in the distributed search engine; wherein the historical guide information comprises a historical document index and a historical document warehouse name;
if so, generating a document index corresponding to the document warehouse, taking the historical document warehouse name as the target document warehouse name of the document warehouse, and establishing a first mapping relation between the document index and the target document warehouse name;
if not, generating a corresponding document index and target document warehouse name for the document warehouse.
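The branching in claim 2 can be sketched as follows. This is only an illustration under stated assumptions: the historical guide information is modeled as a plain dictionary or `None`, and the function name `prepare_guide_info`, the `version` parameter and the naming patterns for indexes and warehouses are all hypothetical.

```python
def prepare_guide_info(history, warehouse_id, version):
    """Build guide information for one document warehouse.

    history: historical guide information found in the search engine,
             or None when the warehouse has never been indexed.
    """
    # A fresh document index is generated in both branches.
    document_index = f"{warehouse_id}-v{version}"
    if history is not None:
        # Historical guide information exists: reuse its warehouse name.
        warehouse_name = history["warehouse_name"]
    else:
        # No history: a new warehouse name is generated as well.
        warehouse_name = f"{warehouse_id}-docs"
    # The returned pair plays the role of the mapping relation
    # between the document index and the target warehouse name.
    return {"document_index": document_index, "warehouse_name": warehouse_name}


existing = prepare_guide_info(
    {"warehouse_name": "manuals", "document_index": "w1-v1"}, "w1", 2
)
fresh = prepare_guide_info(None, "w2", 1)
```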
3. The online technical document searching method according to claim 2, wherein updating the guide information to obtain the target guide information corresponding to the document warehouse comprises:
determining a directory hierarchy description corresponding to the updated online technical document; wherein the directory hierarchy description is used to characterize the physical storage address of the online technical document;
generating document path information corresponding to the online technical document according to the target document warehouse name and the directory hierarchy description corresponding to the online technical document;
and updating the document index corresponding to the target document warehouse name according to the document path information to obtain the target document index.
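As a rough illustration of claim 3, the sketch below joins a target document warehouse name with a directory hierarchy description to form document path information, then records that path in an index. The index is modeled as a dictionary from a document identifier to a path; this structure, the function names and the sample values are assumptions made for the example.

```python
from pathlib import PurePosixPath


def document_path(warehouse_name, directory_levels, file_name):
    """Join the warehouse name and directory hierarchy into one path string."""
    return str(PurePosixPath(warehouse_name, *directory_levels, file_name))


def update_index(index, doc_id, path):
    """Return a copy of the index with the document's path added or replaced."""
    updated = dict(index)
    updated[doc_id] = path
    return updated


path = document_path("manuals", ["install", "linux"], "setup.md")
target_index = update_index({"doc-1": "manuals/intro.md"}, "doc-2", path)
```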
4. The online technical document searching method according to claim 2, wherein after updating the document index, the method further comprises:
releasing the first mapping relation between the document index and the target document warehouse name, and establishing a second mapping relation between the target document index and the target document warehouse name.
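The remapping in claim 4 resembles repointing a stable name at a newly built index. A minimal sketch, assuming the mapping relations are kept in a dictionary keyed by warehouse name (the function name `swap_mapping` and the sample values are hypothetical):

```python
def swap_mapping(mappings, warehouse_name, target_index):
    """Release the existing mapping for a warehouse name and map it to the target index.

    Returns the updated mappings and the index that was released (or None).
    """
    updated = dict(mappings)
    released = updated.pop(warehouse_name, None)  # release the first mapping
    updated[warehouse_name] = target_index        # establish the second mapping
    return updated, released


before = {"manuals": "manuals-v1"}
after, released = swap_mapping(before, "manuals", "manuals-v2")
```

Because clients only ever refer to the warehouse name, a swap of this kind would let searches continue against the name while the underlying index changes.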
5. The online technical document searching method according to claim 1, wherein analyzing the document searching request to identify the search keyword in the document searching request comprises:
performing word segmentation on the document search request, and performing feature recognition on the obtained word segmentation result to screen out a plurality of search words from the word segmentation result;
and determining the search quantity of each search word in the distributed search engine, and taking the search words whose search quantity is larger than a preset value as the search keywords.
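The keyword selection in claim 5 can be sketched as below. The sketch makes several simplifying assumptions: a regular-expression tokenizer stands in for word segmentation (Chinese text would need a real segmenter), a stop-word filter stands in for feature recognition, and the search quantities are supplied as a dictionary. The stop-word list, function name and sample counts are invented for illustration.

```python
import re

# Hypothetical stop-word list used as a stand-in for feature recognition.
STOP_WORDS = {"how", "to", "the", "a", "in", "of"}


def select_keywords(request_text, search_counts, threshold):
    """Pick search words whose recorded search quantity exceeds the threshold."""
    tokens = re.findall(r"\w+", request_text.lower())
    # Drop duplicates while keeping order, then filter out stop words.
    candidates = [t for t in dict.fromkeys(tokens) if t not in STOP_WORDS]
    return [t for t in candidates if search_counts.get(t, 0) > threshold]


keywords = select_keywords(
    "How to configure the index alias",
    {"configure": 3, "index": 120, "alias": 45},
    threshold=10,
)
```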
6. The online technical document searching method according to claim 1, wherein matching the search statement with the target guide information corresponding to each of the plurality of document warehouses to determine a designated document index corresponding to the search statement, and acquiring the online technical document corresponding to the document search request through the designated document index, comprises:
determining a designated document warehouse name required by the document search request, and designated target guide information matching the designated document warehouse name;
acquiring a plurality of documents to be retrieved under the designated document warehouse name according to the designated target document index in the designated target guide information;
for each document to be retrieved, comparing each search keyword with the document to be retrieved in turn to determine the matching weight of the search keyword in the document to be retrieved;
and arranging the documents to be retrieved in descending order of matching quantity to obtain a corresponding sequence of documents to be retrieved, and sending the sequence to the client so that the client selects the online technical document corresponding to the document search request from the sequence.
7. The online technical document searching method according to claim 6, wherein determining the matching weight of the search keyword in the document to be retrieved comprises:
according to the search quantity corresponding to the search keyword, assigning to the search keyword a first weight corresponding to the search quantity;
determining the matching quantity of the search keyword in the document to be retrieved, and assigning a corresponding second weight to the search keyword according to the matching quantity;
and adding the first weight and the second weight to obtain the matching weight corresponding to the search keyword.
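The weighting in claim 7 can be illustrated with a small sketch. The claims do not state how a search quantity or a matching quantity is converted into a weight, so the two scaling functions below are assumptions, as is the choice to rank each document by the sum of its per-keyword matching weights. All names and sample data are hypothetical.

```python
def first_weight(search_quantity):
    """Assumed scaling: one point per 100 recorded searches."""
    return search_quantity / 100


def second_weight(match_count):
    """Assumed scaling: one point per occurrence in the document."""
    return float(match_count)


def matching_weight(keyword, document_text, search_quantities):
    """Add the first and second weights for one keyword in one document."""
    # Counts non-overlapping substring occurrences, ignoring case.
    match_count = document_text.lower().count(keyword.lower())
    return first_weight(search_quantities.get(keyword, 0)) + second_weight(match_count)


def rank_documents(documents, keywords, search_quantities):
    """Order document ids by the sum of their keyword matching weights, descending."""
    scores = {
        doc_id: sum(matching_weight(k, text, search_quantities) for k in keywords)
        for doc_id, text in documents.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


documents = {"a": "index alias index", "b": "alias only"}
quantities = {"index": 200, "alias": 100}
ranking = rank_documents(documents, ["index", "alias"], quantities)
```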
8. The online technical document searching method according to claim 2, wherein before acquiring the guide information corresponding to the document warehouse from the distributed search engine, the method further comprises:
sharding the document index, and storing the resulting plurality of document index shards on designated nodes of the distributed search engine cluster;
and for each document index shard, generating at least one index replica corresponding to the document index shard, and storing the index replica on nodes of the distributed search engine cluster other than the designated node.
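The placement rule in claim 8 (a replica never shares a node with its own shard) can be sketched as follows. Round-robin assignment is an assumption chosen for simplicity; the claims do not prescribe a placement strategy, and the function name and node names are hypothetical. The sketch covers placement only, not how documents are divided among shards.

```python
def place_shards(num_shards, nodes, replicas_per_shard=1):
    """Assign each shard a designated node and put its replicas on other nodes."""
    if replicas_per_shard >= len(nodes):
        raise ValueError("need more nodes than replicas per shard")
    layout = {}
    for shard in range(num_shards):
        primary_pos = shard % len(nodes)  # round-robin designated node
        # Replicas go on the next nodes in the ring, never on the designated one.
        replica_nodes = [
            nodes[(primary_pos + offset) % len(nodes)]
            for offset in range(1, replicas_per_shard + 1)
        ]
        layout[shard] = {"primary": nodes[primary_pos], "replicas": replica_nodes}
    return layout


layout = place_shards(3, ["n1", "n2", "n3"])
```

Keeping a replica off the designated node means that the loss of a single node leaves at least one copy of every shard available.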
9. An online technical document searching apparatus, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
for a plurality of document warehouses for storing online technical documents, acquiring guide information corresponding to the document warehouses from a distributed search engine, and updating the guide information according to the online technical documents to obtain target guide information corresponding to the document warehouses; the target guide information comprises a target document index and a target document warehouse name;
acquiring a document search request sent by a client, and analyzing the document search request to identify search keywords in the document search request;
generating a search statement according to the search keywords and the document warehouse information carried in the document search request;
and matching the search statement with target guide information corresponding to the document warehouses respectively to determine a designated document index corresponding to the search statement, and acquiring an online technical document corresponding to the document search request through the designated document index.
10. A non-transitory computer storage medium storing computer-executable instructions, the computer-executable instructions configured to:
for a plurality of document warehouses for storing online technical documents, acquiring guide information corresponding to the document warehouses from a distributed search engine, and updating the guide information according to the online technical documents to obtain target guide information corresponding to the document warehouses; the target guide information comprises a target document index and a target document warehouse name;
acquiring a document search request sent by a client, and analyzing the document search request to identify search keywords in the document search request;
generating a search statement according to the search keywords and the document warehouse information carried in the document search request;
and matching the search statement with target guide information corresponding to the document warehouses respectively to determine a designated document index corresponding to the search statement, and acquiring an online technical document corresponding to the document search request through the designated document index.
CN202310729187.8A 2023-06-19 2023-06-19 Online technical document searching method, device and medium Pending CN116775830A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310729187.8A CN116775830A (en) 2023-06-19 2023-06-19 Online technical document searching method, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310729187.8A CN116775830A (en) 2023-06-19 2023-06-19 Online technical document searching method, device and medium

Publications (1)

Publication Number Publication Date
CN116775830A true CN116775830A (en) 2023-09-19

Family

ID=87988982

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310729187.8A Pending CN116775830A (en) 2023-06-19 2023-06-19 Online technical document searching method, device and medium

Country Status (1)

Country Link
CN (1) CN116775830A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117807280A (en) * 2024-02-29 2024-04-02 山东佰泰丰信息科技有限公司 Silence automatic triggering type document collection method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117807280A (en) * 2024-02-29 2024-04-02 山东佰泰丰信息科技有限公司 Silence automatic triggering type document collection method
CN117807280B (en) * 2024-02-29 2024-05-03 山东佰泰丰信息科技有限公司 Silence automatic triggering type document collection method

Similar Documents

Publication Publication Date Title
CN111522816B (en) Data processing method, device, terminal and medium based on database engine
KR100981857B1 (en) System and method for scoping searches using index keys
US11449564B2 (en) System and method for searching based on text blocks and associated search operators
US7376650B1 (en) Method and system for redirecting a request using redirection patterns
RU2733482C2 (en) Method and system for updating search index database
CN110889023A (en) Distributed multifunctional search engine of elastic search
CN116775830A (en) Online technical document searching method, device and medium
CN115269631A (en) Data query method, data query system, device and storage medium
US20170124090A1 (en) Method of discovering and exploring feature knowledge
US10606805B2 (en) Object-level image query and retrieval
CN111625728B (en) Method, device, equipment and medium for generating retrieval catalog from webpage document
US9223833B2 (en) Method for in-loop human validation of disambiguated features
CN117033744A (en) Data query method and device, storage medium and electronic equipment
CN116361287A (en) Path analysis method, device and system
CN115794861A (en) Offline data query multiplexing method based on feature abstract and application thereof
US20230082446A1 (en) Compound predicate query statement transformation
CN114372083A (en) Metadata analysis method and device
CN114817293A (en) Data query method and system based on distributed SQL
CN116348868A (en) Metadata indexing for information management
CN116431756B (en) Method, equipment and medium for highlighting search text based on Vue
CN110908998B (en) Data storage and search method, system and computer readable storage medium
CN110928896A (en) Data query method and device
CN114841153B (en) Address segmentation updating method and device
CN111539208B (en) Sentence processing method and device, electronic device and readable storage medium
CN116166636A (en) Data migration method, device, equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination