WO2023236257A1

WO2023236257A1 - Document search platform, search method and apparatus, electronic device, and storage medium

Info

Publication number: WO2023236257A1
Application number: PCT/CN2022/100921
Authority: WO
Inventors: 邬丹琳; 张勇; 周津; 杜文杰
Original assignee: 来也科技(北京)有限公司
Priority date: 2022-06-07
Filing date: 2022-06-23
Publication date: 2023-12-14
Also published as: CN114936269A

Abstract

The present disclosure provides a document search platform, a search method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring a document to be processed, the document having a corresponding document type; acquiring tag information corresponding to the document; constructing, according to the tag information and the document, a target document library corresponding to the document type; and forming a target document search platform according to the target document library. Because the target document search platform is formed from a target document library that corresponds to a document type, it can supply document search services of the corresponding document type to different service scenarios. The reusability of the document search platform is therefore effectively improved, and the constructed platform can effectively satisfy document search requirements in different service scenarios. In addition, by combining RPA and AI, the present disclosure can achieve intelligent automation (IA) in constructing the document search platform, thereby further reducing labor cost.

Description

Document search platform, search method, device, electronic device and storage medium

Cross-references to related applications

This application is filed based on a Chinese patent application with application number 202210637112.2 and a filing date of June 7, 2022, and claims the priority of the Chinese patent application. The entire content of the Chinese patent application is hereby incorporated by reference into this application.

Technical field

The present disclosure relates to the technical field of computer technology, and in particular, to a document search platform, search method, device, electronic device and storage medium.

Background

Robotic Process Automation (RPA) refers to the use of specific "robot software" to simulate human operations on a computer and automatically execute process tasks according to rules. Artificial Intelligence (AI) is a technical science that studies and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence.

In related technologies, in order to meet the document search requirements of corresponding business scenarios, a document search platform is usually constructed based on all documents in the corresponding business scenarios. For different business scenarios, it is usually necessary to build multiple different document search platforms.

As a result, a document search platform built for one business scenario cannot meet the document search needs of other business scenarios, and therefore cannot be reused across different business scenarios.

Summary of the invention

Embodiments of the present disclosure provide a document search platform construction method, document search method, device, electronic device and storage medium to solve problems existing in related technologies. The technical solutions are as follows:

In a first aspect, an embodiment of the present disclosure provides a method for constructing a document search platform, including: obtaining a document to be processed, where the document to be processed has a corresponding document type; obtaining tag information corresponding to the document to be processed; building a target document library corresponding to the document type based on the tag information and the document to be processed; and forming a target document search platform based on the target document library.

In one implementation, obtaining tag information corresponding to the document to be processed includes: determining the parent tag corresponding to the document to be processed; parsing the child tag corresponding to the parent tag from the document to be processed; and using the parent tag and the child tag together as the tag information.

In one implementation, parsing the child tag corresponding to the parent tag from the document to be processed includes: calling a natural language processing (NLP) service in the field of artificial intelligence (AI) to identify, from the document to be processed, the document universal index corresponding to the parent tag, and using the document universal index as the child tag; and/or calling the NLP service to identify, from the document to be processed, the associated entity value corresponding to the parent tag, and using the associated entity value as the child tag.
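The assembly of parent tags and child tags into tag information can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the disclosure does not specify the NLP service interface, so `toy_extract_entity` is a hypothetical stand-in that only matches `Parent: value` lines, and the `Document` fields and index names are assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class Document:
    name: str
    doc_type: str
    text: str


def universal_index(doc: Document) -> Dict[str, str]:
    """Child tags that apply to every document (a document universal index)."""
    return {
        "document_name": doc.name,
        "document_type": doc.doc_type,
        "document_size": str(len(doc.text.encode("utf-8"))),
    }


def toy_extract_entity(parent_tag: str, text: str) -> Optional[str]:
    """Hypothetical stand-in for an NLP service: finds 'Parent: value' lines."""
    pattern = rf"^{re.escape(parent_tag)}\s*:\s*(.+)$"
    match = re.search(pattern, text, re.MULTILINE)
    return match.group(1).strip() if match else None


def build_tag_info(
    doc: Document,
    parent_tags: Iterable[str],
    extract_entity: Callable[[str, str], Optional[str]] = toy_extract_entity,
) -> Dict[str, str]:
    """Map each parent tag to the child tag parsed from the document."""
    tag_info = universal_index(doc)
    for parent in parent_tags:
        value = extract_entity(parent, doc.text)
        if value is not None:  # parent tags with no value are simply omitted
            tag_info[parent] = value
    return tag_info


doc = Document("aspirin.txt", "medical", "Drug: aspirin\nDosage: 100 mg")
tag_info = build_tag_info(doc, ["Drug", "Dosage", "Manufacturer"])
```

Here the universal index and the associated entity values end up in one mapping keyed by parent tag, which is one possible reading of "using the parent tag and the child tag together as the tag information".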

In one implementation, building a target document library corresponding to the document type based on the tag information and the document to be processed includes: calling a robotic process automation (RPA) robot to determine the initial document library corresponding to the document type; and storing the tag information and the document to be processed in the initial document library to form the target document library.

In one implementation, storing the tag information and the document to be processed in the initial document library includes: obtaining the target loading type corresponding to the document to be processed; and using the target document storage method corresponding to the target loading type to store the tag information and the document to be processed in the initial document library.

In one implementation, using the target document storage method corresponding to the target loading type to store the tag information and the document to be processed in the initial document library includes: if the target loading type is a document loading type, storing the document to be processed and the corresponding tag information in the target document library; and/or if the target loading type is a link loading type, storing the access link corresponding to the document to be processed and the corresponding tag information in the target document library; and/or if the target loading type is a rich text loading type, editing the document to be processed through a rich text editor, and storing the editing result and the corresponding tag information in the target document library.
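The three loading types can be illustrated with a small dispatch function. This is a sketch under stated assumptions: the `LoadType` names, the record layout, and the `edit` callable (standing in for a rich text editor) are all hypothetical and are not taken from the disclosure.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional


class LoadType(Enum):
    DOCUMENT = "document"    # store the document itself
    LINK = "link"            # store an access link to the document
    RICH_TEXT = "rich_text"  # store the result of rich text editing


def store(
    library: List[dict],
    load_type: LoadType,
    tag_info: Dict[str, str],
    *,
    content: Optional[str] = None,
    url: Optional[str] = None,
    edit: Optional[Callable[[str], str]] = None,
) -> dict:
    """Append one record to the library using the storage method for load_type."""
    if load_type is LoadType.DOCUMENT:
        payload = content
    elif load_type is LoadType.LINK:
        payload = url
    elif load_type is LoadType.RICH_TEXT:
        # 'edit' is a placeholder for whatever the rich text editor produces.
        payload = edit(content) if (edit and content is not None) else content
    else:
        raise ValueError(f"unsupported load type: {load_type!r}")
    if payload is None:
        raise ValueError(f"missing payload for load type {load_type.value}")
    record = {"load_type": load_type.value, "payload": payload, "tags": dict(tag_info)}
    library.append(record)
    return record


library: List[dict] = []
link_record = store(library, LoadType.LINK, {"document_name": "a"},
                    url="https://example.com/a")
rich_record = store(library, LoadType.RICH_TEXT, {"document_name": "b"},
                    content="  <p>hi</p> ", edit=str.strip)
```

In every branch the tag information is stored alongside the payload, which matches the requirement that each loading type keeps the corresponding tag information with the stored item.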

In one implementation, after determining the parent tag corresponding to the document to be processed, the method further includes: configuring attributes for the parent tag, and using the configured attributes as tag information, where the attributes are used to identify whether the parent tag participates in document search.
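As a rough illustration of such attributes, the sketch below models a parent tag's configuration and selects the tags that participate in search. The field names (`data_format`, `editable`, `searchable`) are illustrative assumptions, not names defined by the disclosure.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ParentTagAttributes:
    name: str                 # tag name
    data_format: str = "text" # tag data format (assumed representation)
    editable: bool = True     # whether the tag may be modified
    searchable: bool = True   # whether the tag participates in document search


def searchable_tag_names(attrs: Iterable[ParentTagAttributes]) -> List[str]:
    """Return the names of parent tags that participate in document search."""
    return [a.name for a in attrs if a.searchable]


configured = [
    ParentTagAttributes("document_name"),
    ParentTagAttributes("storage_address", searchable=False),
    ParentTagAttributes("update_time", data_format="datetime", editable=False),
]
names = searchable_tag_names(configured)
```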

In a second aspect, an embodiment of the present disclosure provides a document search method applied to a document search platform, where the document search platform is constructed by the construction method of the first aspect. The document search method includes: receiving a document search request; parsing the requirement document type and requirement tag information from the document search request; determining the target document library corresponding to the requirement document type from multiple document libraries, where the multiple document libraries belong to the document search platform and each document library is used to store documents of a corresponding document type; and searching the target document library for target documents corresponding to the requirement tag information.
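The search flow of the second aspect can be sketched as routing a request to one library and then matching tags. The request keys (`doc_type`, `tags`), the in-memory platform layout, and the exact-match rule are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

Platform = Dict[str, List[dict]]  # document type -> document library


def parse_request(request: dict) -> Tuple[str, Dict[str, str]]:
    """Extract the requirement document type and requirement tag information."""
    return request["doc_type"], dict(request.get("tags", {}))


def search(platform: Platform, request: dict) -> List[dict]:
    """Pick the library for the requested type, then match on tag values."""
    doc_type, wanted = parse_request(request)
    library = platform.get(doc_type)
    if library is None:
        raise KeyError(f"no document library for type {doc_type!r}")
    return [
        record for record in library
        if all(record["tags"].get(key) == value for key, value in wanted.items())
    ]


platform: Platform = {
    "medical": [
        {"name": "aspirin.txt", "tags": {"Drug": "aspirin"}},
        {"name": "ibuprofen.txt", "tags": {"Drug": "ibuprofen"}},
    ],
    "legal": [{"name": "lease.txt", "tags": {"Contract": "lease"}}],
}
hits = search(platform, {"doc_type": "medical", "tags": {"Drug": "aspirin"}})
```

Because the request carries its own document type, the same platform object serves both the medical and the legal scenario without being rebuilt.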

In one implementation, the requirement tag information includes requirement attributes and requirement sub-tags; the target document library has multiple corresponding parent tags, each parent tag has corresponding sub-tags, and the sub-tags are used to describe the documents. Searching the target document library for the target document corresponding to the requirement tag information includes: calling a natural language processing (NLP) service in the field of artificial intelligence to process the requirement attributes so as to determine a target parent tag from the multiple parent tags, where the target parent tag has a corresponding target sub-tag; and searching the target document library for the target document according to the requirement attributes, the requirement sub-tags, and the target sub-tag.

In one implementation, the target document library includes multiple documents, and searching the target document library for the target document according to the requirement attributes, the requirement sub-tags, and the target sub-tag includes: calling a robotic process automation (RPA) robot to search the multiple documents for documents to be filtered according to the requirement sub-tag and the target sub-tag; and obtaining the target document from the multiple documents to be filtered according to the requirement attributes.

In one implementation, searching the target document library for multiple documents to be filtered according to the requirement sub-tag and the target sub-tag includes: determining the similarity value between the requirement sub-tag and the target sub-tag of each document; and, if the similarity value meets the set condition, using the document corresponding to that target sub-tag as a document to be filtered.
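The similarity step can be sketched as a threshold test over each document's target sub-tag. The disclosure does not name a similarity measure or the set condition, so the use of `difflib.SequenceMatcher` and the threshold of 0.8 below are assumptions chosen only to make the example concrete.

```python
from difflib import SequenceMatcher
from typing import List


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def documents_to_filter(
    documents: List[dict],
    parent_tag: str,
    requirement_sub_tag: str,
    threshold: float = 0.8,  # assumed form of the "set condition"
) -> List[dict]:
    """Keep documents whose target sub-tag is similar enough to the requirement."""
    selected = []
    for document in documents:
        target_sub_tag = document["tags"].get(parent_tag)
        if target_sub_tag is None:
            continue  # this document has no sub-tag under the target parent tag
        if similarity(requirement_sub_tag, target_sub_tag) >= threshold:
            selected.append(document)
    return selected


documents = [
    {"name": "a", "tags": {"Drug": "Aspirin"}},
    {"name": "b", "tags": {"Drug": "zzz"}},
    {"name": "c", "tags": {}},
]
candidates = documents_to_filter(documents, "Drug", "aspirin")
```

The resulting candidates would then be narrowed further according to the requirement attributes, a step this sketch leaves out.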

In a third aspect, an embodiment of the present disclosure provides a device for constructing a document search platform, including: a first acquisition module, used to acquire a document to be processed, where the document to be processed has a corresponding document type; a second acquisition module, used to acquire tag information corresponding to the document to be processed; a building module, used to build a target document library corresponding to the document type based on the tag information and the document to be processed; and a forming module, used to form a target document search platform based on the target document library.

In one implementation, the second acquisition module includes: a first determination sub-module, used to determine the parent tag corresponding to the document to be processed; a parsing sub-module, used to parse the sub-tag corresponding to the parent tag from the document to be processed; and a processing sub-module, used to use the parent tag and the sub-tag together as the tag information.

In one implementation, the parsing sub-module is further used to: call a natural language processing (NLP) service in the field of artificial intelligence to identify, from the document to be processed, the document universal index corresponding to the parent tag, and use the document universal index as the sub-tag; and/or call the NLP service to identify, from the document to be processed, the associated entity value corresponding to the parent tag, and use the associated entity value as the sub-tag.

In one implementation, the building module includes: a second determination sub-module, used to call a robotic process automation (RPA) robot to determine the initial document library corresponding to the document type; and a storage sub-module, used to store the tag information and the document to be processed in the initial document library to form the target document library.

In one implementation, the storage sub-module is further used to: obtain the target loading type corresponding to the document to be processed; and use the target document storage method corresponding to the target loading type to store the tag information and the document to be processed in the initial document library.

In one implementation, the storage sub-module is further used to: if the target loading type is a document loading type, store the document to be processed and the corresponding tag information in the target document library; and/or if the target loading type is a link loading type, store the access link corresponding to the document to be processed and the corresponding tag information in the target document library; and/or if the target loading type is a rich text loading type, edit the document to be processed through a rich text editor, and store the editing result and the corresponding tag information in the target document library.

In one implementation, the second acquisition module further includes: a configuration sub-module, used to configure attributes for the parent tag and to use the configured attributes as tag information, where the attributes are used to identify whether the parent tag participates in document search.

In one implementation, the document search platform is constructed using artificial intelligence (AI) and robotic process automation (RPA).

In a fourth aspect, an embodiment of the present disclosure provides a document search device, where the document search device is constructed by the document search platform construction device of the third aspect. The document search device includes: a receiving module, used to receive a document search request; a parsing module, used to parse the requirement document type and requirement tag information from the document search request; a determination module, used to determine the target document library corresponding to the requirement document type from multiple document libraries, where the multiple document libraries belong to the document search platform and each document library is used to store documents of a corresponding document type; and a search module, used to search the target document library for target documents corresponding to the requirement tag information.

In one implementation, the requirement tag information includes requirement attributes and requirement sub-tags; the target document library has multiple corresponding parent tags, each parent tag has corresponding sub-tags, and the sub-tags are used to describe the documents. The search module includes: a third determination sub-module, used to call a natural language processing (NLP) service in the field of artificial intelligence to process the requirement attributes so as to determine a target parent tag from the multiple parent tags, where the target parent tag has a corresponding target sub-tag; and a search sub-module, used to search the target document library for target documents based on the requirement attributes, the requirement sub-tags, and the target sub-tag.

In one implementation, the target document library includes multiple documents, and the search sub-module is further used to: call a robotic process automation (RPA) robot to search the multiple documents for documents to be filtered according to the requirement sub-tag and the target sub-tag; and filter out the target document from the multiple documents to be filtered according to the requirement attributes.

In one implementation, the search sub-module is further used to: determine the similarity value between the requirement sub-tag and the target sub-tag of each document; and, if the similarity value meets the set condition, use the document corresponding to that target sub-tag as a document to be filtered.

In a fifth aspect, an electronic device provided by an embodiment of the present disclosure includes: a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the method for constructing a document search platform provided by the embodiment of the first aspect, or the document search method provided by the embodiment of the second aspect, is implemented.

In a sixth aspect, embodiments of the present disclosure provide a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the method for constructing a document search platform provided by the embodiment of the first aspect, or the document search method provided by the embodiment of the second aspect, is implemented.

The above summary is for illustration purposes only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments and features described above, further aspects, embodiments and features of the present disclosure will be readily apparent by reference to the drawings and the following detailed description.

Description of the drawings

In the drawings, unless otherwise specified, the same reference numbers refer to the same or similar parts or elements throughout the several figures. The drawings are not necessarily to scale. It should be understood that these drawings depict only some embodiments proposed in accordance with the disclosure and are not to be considered limiting of the scope of the disclosure.

Figure 1 is a schematic flowchart of a method for building a document search platform proposed by an embodiment of the present disclosure.

FIG. 2 is a schematic flowchart of a method for building a document search platform proposed by another embodiment of the present disclosure.

FIG. 3 is a schematic flowchart of a method for building a document search platform proposed by another embodiment of the present disclosure.

Figure 4 is a schematic diagram of a document library construction interface proposed by an embodiment of the present disclosure.

Figure 5 is a schematic diagram of a document storage management interface proposed by an embodiment of the present disclosure.

FIG. 6A is a schematic diagram of a document storage interface of document loading type proposed by an embodiment of the present disclosure.

FIG. 6B is a schematic diagram of a link loading type document storage interface proposed by an embodiment of the present disclosure.

FIG. 6C is a schematic diagram of a rich text loading type document storage interface proposed by an embodiment of the present disclosure.

Figure 7 is a schematic diagram of a tag information configuration interface proposed by an embodiment of the present disclosure.

Figure 8 is a schematic diagram of an attribute configuration interface proposed by an embodiment of the present disclosure.

FIG. 9 is a schematic flowchart of a document search method proposed by an embodiment of the present disclosure.

FIG. 10 is a schematic flowchart of a document search method proposed by another embodiment of the present disclosure.

Figure 11 is a schematic diagram of a document attribute editing interface proposed by an embodiment of the present disclosure.

Figure 12 is a schematic diagram of a document search interface proposed by an embodiment of the present disclosure.

Figure 13 is a schematic diagram of a document screening interface proposed by an embodiment of the present disclosure.

Figure 14 is a schematic structural diagram of a device for building a document search platform proposed by an embodiment of the present disclosure.

Figure 15 is a schematic structural diagram of a device for building a document search platform proposed by another embodiment of the present disclosure.

Figure 16 is a schematic structural diagram of a document search device proposed by an embodiment of the present disclosure.

FIG. 17 is a schematic structural diagram of a document search device according to another embodiment of the present disclosure.

FIG. 18 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present disclosure.

Detailed description

Embodiments of the present disclosure are described in detail below. Examples of the embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals throughout represent the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary and are only used to explain the present disclosure and are not to be construed as limitations of the present disclosure.

In the description of the embodiments of the present disclosure, the term "plurality" refers to two or more than two.

In the description of the embodiments of the present disclosure, the term "document to be processed" refers to a document that is currently to be processed, such as a professional knowledge document, an enterprise information document, etc.

In the description of the embodiments of the present disclosure, the term "document type" means that the document to be processed can be divided into multiple document types according to different classification basis, for example, medical document type, legal document type, etc.

In the description of the embodiment of the present disclosure, the term "tag information" refers to information used to describe the tag of the document to be processed, for example, the characteristic information of the tag, the content information of the tag, etc.

In the description of the embodiments of the present disclosure, the term "parent tag" refers to a preset tag of the document to be processed. The parent tag can be, for example, an associated entity reused from other platforms.

In the description of the embodiments of the present disclosure, the term "document universal index" refers to document indexes applicable to all documents to be processed, such as document type, document size, document name, document storage address, and document update time.

In the description of the embodiments of the present disclosure, the term "attribute" refers to information used to describe the attributes of the parent tag, such as the tag name, the tag data format, an indication of whether the tag may be modified, and an indication of whether the tag participates in search.

In the description of the embodiments of the present disclosure, the term "child tag" refers to the specific document content corresponding to the parent tag.

In the description of the embodiments of the present disclosure, the term "associated entity value" refers to the relevant information used to specifically describe the corresponding parent tag, for example, the characteristic information of the associated entity, the content information of the associated entity, etc.

In the description of the embodiments of the present disclosure, the term "initial document library" refers to the document library, among multiple document libraries, that corresponds to the document type of the document to be processed at the initial stage of executing the construction method of the document search platform.

In the description of the embodiments of the present disclosure, the term "document search request" refers to a request made by the user-side electronic device for triggering a document search in the document search platform.

In the description of the embodiments of the present disclosure, the term "requirement document type" refers to the document type that a user needs to search for when performing a document search. The requirement document type can be used to characterize the document search needs in the user's business scenario.

In the description of the embodiments of the present disclosure, the term "requirement tag information" refers to the tag information that the user needs to search for when searching for documents, such as the name of the document the user needs or keywords of the document content.

These and other aspects of embodiments of the present disclosure will become apparent with reference to the following description and accompanying drawings. In these descriptions and drawings, some specific implementations of the embodiments of the disclosure are specifically disclosed to represent some of the ways of implementing the principles of the embodiments of the disclosure, but it should be understood that the scope of the embodiments of the disclosure is not limited thereby. On the contrary, the disclosed embodiments include all changes, modifications and equivalents falling within the spirit and scope of the appended claims.

This embodiment is described by taking as an example the case in which the construction method of the document search platform is configured in a construction device of the document search platform. The construction device may be provided in a server or in an electronic device, which is not limited in the embodiments of the present disclosure.

Referring to Figure 1, the construction method of the document search platform includes:

S101: Obtain a document to be processed, where the document to be processed has a corresponding document type.

The document currently to be processed can be called a document to be processed. During execution of the construction method of the document search platform, the document to be processed is used to assist in building the document search platform. The document to be processed may be, for example, a professional knowledge document or an enterprise information document, and this is not limited.

It can be understood that the documents to be processed can be divided into multiple document types according to different classification basis. For example, the documents to be processed can be divided into different document types according to the different application scenarios to which the documents to be processed belong. The document type may be, for example, a medical document type or a legal document type, and there is no limit to this.

In the embodiment of the present disclosure, the document search platform may provide a corresponding data transmission interface in advance, obtain documents published in different offline business scenarios through the data transmission interface, and use those documents as the documents to be processed; this is not limited.

In some embodiments, data transmission links between different offline business scenario platforms and the document search platform can also be established in advance. When a new document is released on an offline business scenario platform, a corresponding document transmission instruction is generated, and this instruction triggers the offline business scenario platform to transfer the newly released document to the document search platform. Any other possible method can also be used to obtain the documents to be processed; this is not limited.

After obtaining the documents to be processed from different offline business scenarios, the embodiments of the present disclosure can annotate each document to be processed according to the business scenario to which it belongs. For example, when a document to be processed is obtained from a medical scenario, it can be marked as a medical document type; this is not limited.
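The annotation step can be pictured as a lookup from the source scenario to a document type. The scenario names and the mapping below are hypothetical; the disclosure only gives the medical and legal document types as examples.

```python
from typing import Dict

# Hypothetical mapping from source business scenario to document type.
SCENARIO_TO_DOC_TYPE: Dict[str, str] = {
    "hospital_portal": "medical",
    "law_firm_portal": "legal",
}


def annotate(raw_document: dict, scenario: str) -> dict:
    """Return a copy of the document marked with the type of its scenario."""
    doc_type = SCENARIO_TO_DOC_TYPE.get(scenario)
    if doc_type is None:
        raise ValueError(f"unknown business scenario: {scenario!r}")
    return {**raw_document, "doc_type": doc_type}


raw = {"name": "aspirin.txt", "text": "Drug: aspirin"}
annotated = annotate(raw, "hospital_portal")
```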

S102: Obtain tag information corresponding to the document to be processed.

A tag can be used to describe the basic attributes and characteristics of a document. A tag is structured field information that can be used to index and manage the document, for example a document name or a document update time; this is not limited.

The information used to describe the tag of the document to be processed can be called tag information. The tag information may be, for example, the characteristic information of the tag or the content information of the tag, and this is not limited.

In the embodiment of the present disclosure, after obtaining the document to be processed, tag information of the document to be processed can be identified to obtain tag information corresponding to the document to be processed.

For example, the tag information of the document to be processed can be identified by performing entity recognition on the document. After the document to be processed is obtained, it can be input into a pre-trained artificial intelligence (AI) model that supports entity recognition. The AI model performs entity recognition on the document to be processed to obtain multiple pieces of entity information corresponding to the document, and the entity information is used as the tag information corresponding to the document to be processed; this is not limited.

Alternatively, the tag information can be identified by performing feature parsing on the document to be processed after it is obtained. For example, a feature parsing algorithm can be applied to the document to be processed to obtain multiple pieces of feature information corresponding to the document, and the feature information is used as the tag information corresponding to the document to be processed; this is not limited.

S103: Build a target document library corresponding to the document type based on the tag information and the document to be processed.

In this disclosed embodiment, after obtaining the document to be processed and determining the tag information corresponding to the document to be processed, a document library corresponding to the document type can be constructed based on the tag information and the document to be processed, and the document library can be called a target Document library.

In some embodiments, to build the target document library corresponding to the document type based on the tag information and the document to be processed, document libraries of the corresponding document types can be constructed in the document search platform in advance and annotated with those document types. Then, after a document to be processed is obtained, it can be stored in the document library of its document type, and the tag information can be configured alongside the corresponding document to be processed, thereby constructing the target document library; this is not limited.

In the embodiment of the present disclosure, the target document library can be used to store documents to be processed of a corresponding document type and the tag information corresponding to those documents. That is to say, building the target document library corresponding to the document type based on the tag information and the documents to be processed can mean storing the obtained documents to be processed that share the same document type, together with their tag information, in one document library; this is not limited.
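Grouping documents of the same type into one library can be sketched as below. The in-memory dictionary of lists is an assumption for illustration; the disclosure does not prescribe a storage backend or record layout.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_libraries(documents: Iterable[dict]) -> Dict[str, List[dict]]:
    """Store each document and its tag information in the library of its type."""
    libraries: Dict[str, List[dict]] = defaultdict(list)
    for document in documents:
        libraries[document["doc_type"]].append(
            {"name": document["name"], "tags": dict(document["tags"])}
        )
    return dict(libraries)


incoming = [
    {"name": "aspirin.txt", "doc_type": "medical", "tags": {"Drug": "aspirin"}},
    {"name": "lease.txt", "doc_type": "legal", "tags": {"Contract": "lease"}},
    {"name": "ibuprofen.txt", "doc_type": "medical", "tags": {"Drug": "ibuprofen"}},
]
libraries = build_libraries(incoming)
```

Each resulting library holds exactly one document type, which is the property the later search step relies on when it selects a library by type.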

In this disclosed embodiment, since the constructed target document library stores documents to be processed and tag information of only one document type, a target document search platform can subsequently be formed based on the target document library. The target document library whose document type corresponds to the actual business scenario can then be called in the target document search platform, so that the documents found when searching that library effectively fit the document search needs of the actual business scenario.

S104: Form a target document search platform based on the target document library.

In the embodiments of the present disclosure, after the target document library corresponding to the document type is constructed based on the tag information and the document to be processed, the document search platform can be processed according to the target document library, and the document search platform obtained by this processing can be used as the target document search platform.

In the embodiments of the present disclosure, processing the document search platform according to the target document library may be done as follows: after the target document library is constructed, it is deployed in the document search platform and annotated according to the business scenario corresponding to the documents to be processed that it contains (for example, the target document library can be labeled as a medical document library, a legal document library, and so on, and there is no restriction on this). The document search platform obtained by this annotation processing is used as the target document search platform.

Alternatively, any other possible method can be used to form a target document search platform based on the target document library, and there is no limit to this.

In the embodiments of the present disclosure, the target document library corresponding to the document type, and hence to a business scenario, is constructed based on the tag information and the document to be processed. Once such target document libraries are deployed in the document search platform, the resulting target document search platform can provide document search services for different business scenarios based on multiple target document libraries for different business scenarios. Searching for documents in a given business scenario therefore does not require rebuilding the document search platform for that scenario; the target document search platform can be called directly. This effectively improves the reusability of the document search platform, so that the constructed target document search platform can effectively meet the document search needs of different business scenarios.

The embodiments of the present disclosure can effectively combine RPA and AI to realize intelligent automation (IA) of the document search platform construction process, thereby effectively improving the automation level of document search platform construction and reducing labor costs.

In this embodiment, the document to be processed is obtained, where the document to be processed has a corresponding document type; the tag information corresponding to the document to be processed is obtained; a target document library corresponding to the document type is then constructed based on the tag information and the document to be processed; and a target document search platform is formed based on the target document library. Since the target document search platform is formed based on the target document library corresponding to the document type, the constructed target document search platform can, based on the target document library of the corresponding document type, provide different business scenarios with document search services for the corresponding document types. This effectively improves the reusability of the document search platform, so that the constructed document search platform can effectively meet the document search needs of different business scenarios.

Referring to Figure 2, the construction method of the document search platform includes:

S201: Obtain a document to be processed, where the document to be processed has a corresponding document type.

For the description of S201, reference may be made to the above-mentioned embodiments, and details will not be described again here.

S202: Determine the parent tag corresponding to the document to be processed.

The tags preset for the document to be processed can be called parent tags. A parent tag can be an associated entity reused from another platform, a tag obtained in advance from a tag library, or index information applicable to all documents to be processed, and there is no restriction on this.

The document to be processed can specifically be a text containing entities, for example: "In the 2021 influenza situation survey for children, it was found that the occurrence of influenza has a certain seasonality."

Entities can include diseases, research objects, research time, etc., and there are no restrictions on this.

The associated entity refers to an entity obtained from other platforms that can be reused for the document to be processed. For example, the associated entity may be an associated entity reused from the corresponding medical business platform.

That is, in the embodiments of the present disclosure, obtaining the associated entity corresponding to the document to be processed may be done by obtaining, through the data transmission interface of the document search platform, an associated entity in another platform that can be reused for the document to be processed, and then using the reused associated entity as the parent tag corresponding to the document to be processed. There is no restriction on this.

The index information refers to the structured information related to all documents participating in the search. The index information can be, for example, document type, document size, document name, document storage address, document update time, etc., and there is no limit to this.

In some embodiments, determining the parent tag corresponding to the document to be processed may be done as follows: after the document to be processed is determined, the associated entity that can be reused from another corresponding business platform is obtained through the data transmission interface of the document search platform and used as the parent tag; or, after the document to be processed is obtained, tags in the tag library are obtained through the data transmission interface of the document search platform and used as the parent tag corresponding to the document to be processed. There is no restriction on this.

S203: Configure attributes for the parent label, and use the configured attributes as label information.

The information used to describe the parent tag can be called attributes. The attributes can be specific information such as the tag name, the tag data format, an indication of whether the tag can be modified, and an indication of whether the tag participates in search, and there is no restriction on this. The attribute is used to determine whether the parent tag participates in document search.

That is, after the parent tag is obtained, the embodiments of the present disclosure can configure various attributes of the parent tag to meet the document search requirements of different business scenarios. The configurable attributes can be, for example, tag classification, tag name, tag type, whether the tag is required, value type, whether the tag participates in indexing, visibility filtering, and so on, and there is no restriction on this.

In the embodiments of the present disclosure, by configuring corresponding attributes for the parent tag and using the attributes as tag information, the parent tag can be flexibly configured and modified based on the attributes, so that the document tag information in the document search platform can effectively meet the document search needs of different business scenarios.
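
As a rough illustration of attribute configuration, the sketch below models a few of the attributes listed above and uses one of them to decide which parent tags take part in search. The class `ParentTagAttributes`, its field names, and its default values are hypothetical; the disclosure does not prescribe a concrete schema.

```python
from dataclasses import dataclass


@dataclass
class ParentTagAttributes:
    """Hypothetical attribute set configured for one parent tag."""
    name: str
    value_type: str = "text"
    editable: bool = True
    participates_in_search: bool = True


def searchable_tag_names(attributes: list[ParentTagAttributes]) -> list[str]:
    """Return the names of parent tags configured to take part in search."""
    return [a.name for a in attributes if a.participates_in_search]


configured = [
    ParentTagAttributes(name="disease"),
    ParentTagAttributes(name="research time", value_type="date"),
    ParentTagAttributes(name="document storage address", participates_in_search=False),
]
```

Here `searchable_tag_names(configured)` keeps "disease" and "research time" and drops the storage address, since that tag is configured not to participate in search.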

S204: Parse the child tag corresponding to the parent tag from the document to be processed.

For example, assume the document to be processed is: "In the 2021 influenza situation survey for children, it was found that the occurrence of influenza has a certain seasonality", and the parent tags are: "disease, research object, research time, etc.". A child tag can be the specific document content corresponding to a parent tag; the child tags corresponding to these parent tags can be, for example: "disease - influenza, research object - children, research time - 2021". There is no restriction on this.
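
The relationship between parent tags and child tags in this example can be sketched with a toy extractor. Note that this is a deliberately simple lexicon lookup standing in for the model-based or NLP-based parsing that the embodiments describe; the `LEXICON` table and the function `extract_child_tags` are assumptions made purely for illustration.

```python
# Hypothetical lexicon: candidate child-tag values for each parent tag.
LEXICON = {
    "disease": ["influenza", "cold"],
    "research object": ["children", "adults"],
    "research time": ["2021", "2022"],
}


def extract_child_tags(document: str, parent_tags: list[str]) -> dict[str, str]:
    """Map each parent tag to the first lexicon value found in the document."""
    lowered = document.lower()
    tag_info = {}
    for parent in parent_tags:
        for candidate in LEXICON.get(parent, []):
            if candidate in lowered:
                tag_info[parent] = candidate
                break  # keep only the first matching value per parent tag
    return tag_info


document = (
    "In the 2021 influenza situation survey for children, it was found that "
    "the occurrence of influenza has a certain seasonality."
)
tag_info = extract_child_tags(document, ["disease", "research object", "research time"])
```

The resulting dictionary pairs each parent tag with its child tag, which is one possible way to represent the combined tag information.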

In some embodiments, parsing the child tag corresponding to the parent tag from the document to be processed may be done as follows: after the document to be processed is obtained and the corresponding parent tag is determined, the document to be processed and the parent tag are input into a pre-trained Global Pointer model to obtain the child tag, corresponding to the parent tag, that the Global Pointer model outputs. There is no restriction on this.

The Global Pointer model is an artificial intelligence model based on rotary position encoding (a form of relative position encoding) that supports extracting information from documents. The model can also be replaced by any other artificial intelligence model capable of extracting the corresponding child tags from documents, and there is no restriction on this.

Optionally, in some embodiments, parsing the child tag corresponding to the parent tag from the document to be processed may be done by calling a natural language processing (NLP) service in the field of artificial intelligence to identify, from the document to be processed, the document general index corresponding to the parent tag, and using the document general index as the child tag. The document to be processed can thus be parsed accurately, and the document general index obtained by parsing is adapted to the parent tag, so that using it as the child tag effectively improves how well the child tag is determined.

The document general index refers to the information used to specifically describe the corresponding parent tag. It can be, for example, the characteristic information or the content information of the corresponding parent tag, and there is no restriction on this.

For example, when the parent tag is "document update time", the corresponding document general index can be "April 20, 2022", and there is no restriction on this.

That is, in the embodiments of the present disclosure, after the parent tag is determined, the document to be processed can be parsed according to the parent tag (the parsing method can be, for example, semantic parsing or model parsing, and there is no restriction on this) to obtain the document general index corresponding to the parent tag, which is then used as the child tag. There is no restriction on this.

In the embodiments of the present disclosure, parsing the child tag corresponding to the parent tag from the document to be processed may also be done by calling a natural language processing (NLP) service to process the document to be processed, so as to parse the child tag corresponding to the parent tag from it. There is no restriction on this.

In some embodiments, extracting the associated entity value corresponding to the parent tag (for example, an associated entity) from the document to be processed may be done with an entity recognition model: the document to be processed and the corresponding parent tag are input into the entity recognition model to obtain the associated entity value, corresponding to the parent tag, that the model outputs. There is no restriction on this.

Optionally, in other embodiments, parsing the child tag corresponding to the parent tag from the document to be processed may be done by calling the NLP service to identify, from the document to be processed, the associated entity value corresponding to the parent tag, and using the associated entity value as the child tag. The associated entity value corresponding to the parent tag can thus be parsed accurately from the document to be processed, and the parsed value is adapted to the parent tag, so that using it as the child tag effectively improves how well the child tag is determined.

The associated entity value refers to the information used to specifically describe the corresponding parent tag (for example, an associated entity). It can be, for example, the characteristic information or the content information of the associated entity, and there is no restriction on this.

Wherein, when the associated entity is "disease", the corresponding associated entity value may be, for example, "flu, cold", and there is no limit to this.

In the embodiments of the present disclosure, parsing the child tag corresponding to the parent tag from the document to be processed may also be done by calling a natural language processing (NLP) service to process the document to be processed, so as to identify the associated entity value corresponding to the parent tag and use that value as the child tag. There is no restriction on this.

S205: Use the parent tag and the child tag together as tag information.

In this embodiment, the parent tag corresponding to the document to be processed is determined and the child tag corresponding to the parent tag is parsed from the document to be processed, so that when the parent tag and the child tag are used together as tag information, the tag information can accurately characterize the parent tag and its corresponding child tag. This effectively improves the comprehensiveness and reference value of the tag information, and when the tag information is provided to the document search platform, the platform can assist users in performing document searches along two dimensions: parent tags and child tags.

In the embodiments of the present disclosure, after the parent tag corresponding to the document to be processed is determined and the child tag corresponding to the parent tag is parsed from the document to be processed, the parent tag and the child tag can be used together as tag information, and the subsequent steps of the construction method of the document search platform are then performed with this tag information. For details, please refer to the subsequent embodiments.

S206: Build a target document library corresponding to the document type based on the tag information and the document to be processed.

S207: Form a target document search platform based on the target document library.

For descriptions of S206-S207, reference may be made to the above-mentioned embodiments and will not be described again here.

In this embodiment, the document to be processed is obtained, where the document to be processed has a corresponding document type; the parent tag corresponding to the document to be processed is determined; and the child tag corresponding to the parent tag is parsed from the document to be processed. When the parent tag and the child tag are used together as tag information, the tag information can accurately characterize the parent tag and its corresponding child tag, which effectively improves the comprehensiveness and reference value of the tag information; and when the tag information is provided to the document search platform, the platform can assist users in performing document searches along the two dimensions of parent tag and child tag. In addition, corresponding attributes are configured for the parent tag and used as tag information, so that the parent tag can be flexibly configured and modified based on the attributes, and the document tag information in the document search platform can effectively meet the document search needs of different business scenarios. Finally, a target document library corresponding to the document type is built based on the tag information and the document to be processed, and a target document search platform is formed based on the target document library, which effectively improves the reusability of the document search platform and enables the constructed document search platform to effectively meet the document search needs of different business scenarios.

Referring to Figure 3, the construction method of the document search platform includes:

S301: Obtain the document to be processed, where the document to be processed has a corresponding document type.

S302: Obtain tag information corresponding to the document to be processed.

For descriptions of S301-S302, reference may be made to the above-mentioned embodiments, and details will not be described again here.

S303: Call a robotic process automation (RPA) robot to determine the initial document library corresponding to the document type.

In the initial stage of the construction method of the document search platform, the document library selected from multiple document libraries as corresponding to the document type of the document to be processed can be called the initial document library. During the subsequent execution of the construction method, the initial document library is used to assist in building the target document library. For details, please refer to subsequent embodiments.

In the embodiments of the present disclosure, determining the initial document library corresponding to the document type may be done by calling a robotic process automation (RPA) robot to automatically annotate a document library according to the document type, so that after annotation the document library is used only to store documents of the corresponding document type. The annotated document library can be called the initial document library.

S304: Store tag information and documents to be processed in the initial document library to form a target document library.

In the embodiment of the present disclosure, after determining the initial document library corresponding to the document type, tag information and documents to be processed can be stored in the initial document library to form a target document library.

Optionally, in some embodiments, storing the tag information and the document to be processed in the initial document library to form the target document library may be done by obtaining the target loading type corresponding to the document to be processed and using the target document storage method corresponding to the target loading type to store the tag information and the document to be processed in the initial document library. The document to be processed is thus stored adaptively, using a storage method that matches it, which effectively meets the storage requirements of documents to be processed of different target loading types. In addition, because the tag information and the document to be processed are stored using the target document storage method corresponding to the target loading type, the documents in the initial document library need not be limited to a single format, which greatly expands the range of documents the document search platform can hold.

The document to be processed can be loaded in different ways, and the way it is loaded can be called the target loading type. The target loading type can be, for example, a document loading type, a link loading type, a rich text loading type, and so on, and there is no restriction on this.

Different target loading types may have corresponding document storage methods, and the document storage methods may be called target document storage methods.

For example, using the target document storage method corresponding to the document format to store the tag information and the document to be processed in the initial document library may be done as follows: when the document to be processed is of a text loading type, the document is stored directly in the initial document library; or, when the document to be processed is of an image loading type, the image can be recognized using optical character recognition (OCR) and the recognized text stored in the initial document library. There is no restriction on this.

Optionally, in some embodiments, using the target document storage method corresponding to the document format to store the tag information and the document to be processed in the initial document library may be done as follows: when the target loading type is the document loading type, the document to be processed and the corresponding tag information are stored in the target document library; and/or, when the target loading type is the link loading type, the access link corresponding to the document to be processed and the corresponding tag information are stored in the target document library; and/or, when the target loading type is the rich text loading type, the document to be processed is edited through a rich text editor, and the editing result and the corresponding tag information are stored in the target document library.

In the embodiments of the present disclosure, the document loading type means that the document to be processed can be loaded directly into the target document library from the local storage of the device to which the target document library belongs. In this case, the document to be processed and the corresponding tag information can be stored from the local storage of that device into the target document library.

In the embodiments of the present disclosure, the link loading type means that the document to be processed is provided as an external link (for example, a Uniform Resource Locator (URL)). That is, the original file of the document to be processed does not exist locally on the device to which the target document library belongs, and the external link supports jumping to the document to be processed that it points to. In this case, the external link and the corresponding tag information can be stored in the target document library.

In the embodiments of the present disclosure, the rich text loading type means that the document to be processed is loaded as an image, audio, video, or similar type. In this case, a rich text editor can be used to edit the document to be processed, and the resulting editing result and the corresponding tag information are stored in the target document library.
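
The three loading types and their storage methods can be sketched as a simple dispatch. The enum `LoadingType`, the `LibraryEntry` record, and the functions `store` and `edit_rich_text` are hypothetical names introduced for illustration, and `edit_rich_text` is only a placeholder for an actual rich text editor.

```python
from dataclasses import dataclass, field
from enum import Enum


class LoadingType(Enum):
    DOCUMENT = "document"
    LINK = "link"
    RICH_TEXT = "rich_text"


@dataclass
class LibraryEntry:
    loading_type: LoadingType
    payload: str  # document content, access link, or editing result
    tags: dict[str, str] = field(default_factory=dict)


def edit_rich_text(source: str) -> str:
    """Placeholder for a rich text editor; here it only trims whitespace."""
    return source.strip()


def store(library: list[LibraryEntry], loading_type: LoadingType,
          source: str, tags: dict[str, str]) -> LibraryEntry:
    """Store a document using the method that matches its loading type."""
    if loading_type is LoadingType.DOCUMENT:
        payload = source                   # keep the document itself
    elif loading_type is LoadingType.LINK:
        payload = source.strip()           # keep only the access link
    elif loading_type is LoadingType.RICH_TEXT:
        payload = edit_rich_text(source)   # keep the editing result
    else:
        raise ValueError(f"unsupported loading type: {loading_type}")
    entry = LibraryEntry(loading_type, payload, dict(tags))
    library.append(entry)
    return entry


library: list[LibraryEntry] = []
store(library, LoadingType.LINK, " https://example.com/report ", {"disease": "influenza"})
```

In every branch the tag information is stored next to the payload, so a later search by tag does not need to know how the document was loaded.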

In the embodiments of the present disclosure, since the initial document library corresponding to the document type is determined first, the tag information and the document to be processed can be stored accurately in the initial document library of the corresponding document type, so that the document type of the resulting target document library matches the document type of the document to be processed, which effectively improves the construction of the target document library.

S305: Form a target document search platform based on the target document library.

For the description of S305, reference may be made to the above-mentioned embodiments, and details will not be described again here.

In the embodiments of the present disclosure, the construction method of the document search platform can be illustrated with schematic diagrams. In the initial stage of the construction method, the document to be processed and the initial document library corresponding to its document type can be obtained (the initial document library can be built in advance through the document library construction interface of the document search platform; see Figure 4, which is a schematic diagram of the document library construction interface proposed by an embodiment of the present disclosure), and the tag information corresponding to the document to be processed can be obtained. Then, the target loading type corresponding to the document to be processed can be determined, and in the document upload management interface of the document search platform (see Figure 5, which is a schematic diagram of the document storage management interface proposed by an embodiment of the present disclosure), the configuration item of the corresponding document loading type is selected to enter the document storage interface for that loading type (see Figures 6A, 6B and 6C: Figure 6A is a schematic diagram of the document storage interface of the document loading type, Figure 6B is a schematic diagram of the document storage interface of the link loading type, and Figure 6C is a schematic diagram of the document storage interface of the rich text loading type, each proposed by an embodiment of the present disclosure). The document to be processed is then stored through the corresponding document storage interface.

In the embodiments of the present disclosure, after the document to be processed is stored in the initial document library, corresponding tag information can be configured alongside the document to be processed in the initial document library (for example, see Figure 7, which is a schematic diagram of the tag information configuration interface proposed by an embodiment of the present disclosure: the edit item on the interface can be clicked, and the tag information configuration operation performed under it, to configure the corresponding tag information for the document to be processed). In addition, corresponding attributes can be configured for the parent tag on the attribute configuration interface of the corresponding tag (see Figure 8, which is a schematic diagram of the attribute configuration interface proposed by an embodiment of the present disclosure). At this point, the construction of the target document library is complete, and the target document search platform is thereby formed.

In this embodiment, the document to be processed is obtained, where the document to be processed has a corresponding document type; the tag information corresponding to the document to be processed is obtained; and the initial document library corresponding to the document type is then determined. The tag information and the document to be processed can thus be stored accurately in the initial document library of the corresponding document type, so that the document type of the resulting target document library matches the document type of the document to be processed. This effectively improves the construction of the target document library, so that when the target document search platform is formed based on the target document library, the constructed document search platform can effectively meet the document search needs of different business scenarios.

This embodiment is described by taking as an example the case where the document search method is configured in a document search device. The document search device can be set in a server or in an electronic device, and the embodiments of the present disclosure do not limit this.

Referring to Figure 9, the document search method includes:

S901: Receive a document search request.

For terms in this embodiment that are the same as in the above embodiments, refer to the above embodiments for their meanings and descriptions, which will not be repeated here.

A request made by the user-side electronic device to trigger a document search in the document search platform may be called a document search request.

In the embodiments of the present disclosure, receiving the document search request may be done by providing a corresponding data transmission interface in the target document search platform in advance and receiving, via that interface, the document search request made by the user-side device. There is no restriction on this.

Alternatively, receiving the document search request may be done by setting up a corresponding monitoring device in the target document search platform in advance, monitoring the user-side device through it, and receiving the document search request when the user-side device generates one. There is no restriction on this.

S902: Parse the required document type and required tag information from the document search request.

After receiving the document search request, the embodiment of the present disclosure can parse the required document type and required tag information from the document search request.

When performing a document search, a user has a document type that they need to search for, which can be called the required document type. The required document type can be used to characterize the document search needs of the business scenario in which the user is located. For example, when the user is in a medical business scenario, it can be determined that the document type the user requires is the medical document type, and there is no restriction on this.

When performing a document search, a user may also have tag information that they need to search for, which can be called the required tag information. The required tag information can be, for example, the name of the document the user needs to search for, or keywords of the document content, and there is no restriction on this.

For example, if the received document search request is "Study on the causes of influenza in children in 2021", the required document type can be the medical document type, and the required tag information can be the keyword information in the document search request, such as "influenza, children, 2021". There is no restriction on this.

In the embodiments of the present disclosure, parsing the required document type and required tag information from the document search request may include performing semantic parsing on the document search request to obtain the required document type and the required tag information.
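
To make the inputs and outputs of this step concrete, the sketch below parses the example request with a toy keyword lookup. This is not semantic parsing as described in the embodiments; it is a simplified stand-in, and the vocabularies `TYPE_KEYWORDS` and `STOP_WORDS`, the class `ParsedRequest`, and the function `parse_request` are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabulary mapping keywords to a document type.
TYPE_KEYWORDS = {
    "medical": {"influenza", "disease", "flu"},
    "legal": {"contract", "statute"},
}
# Hypothetical words that carry no tag information.
STOP_WORDS = {"study", "on", "the", "of", "in", "causes"}


@dataclass
class ParsedRequest:
    required_document_type: Optional[str]
    required_tags: list[str]


def parse_request(text: str) -> ParsedRequest:
    """Split a request into a required document type and required tag keywords."""
    words = [w.strip(".,").lower() for w in text.split()]
    keywords = [w for w in words if w and w not in STOP_WORDS]
    document_type = None
    for candidate, vocabulary in TYPE_KEYWORDS.items():
        if vocabulary.intersection(keywords):
            document_type = candidate
            break
    return ParsedRequest(document_type, keywords)


parsed = parse_request("Study on the causes of influenza in children in 2021")
```

For the example request this yields the medical document type and the keywords "influenza", "children", and "2021"; a request with no known keyword leaves the document type as `None`.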

S903: Determine the target document library corresponding to the required document type from multiple document libraries, where the multiple document libraries belong to the document search platform, and the document library is used to store documents of the corresponding document type.

After receiving the document search request, the embodiment of the present disclosure can determine the document library corresponding to the required document type from multiple document libraries as the target document library according to the required document type in the document search request.

In the embodiment of the present disclosure, the multiple document libraries in the target document search platform can be used to store documents to be processed of the corresponding document types. To determine the target document library corresponding to the required document type from the multiple document libraries, the document types respectively corresponding to the multiple document libraries may first be determined. After the required document type is determined, it is compared with these previously determined document types, and when the required document type is the same as one of the document types, the document library corresponding to that document type is used as the target document library. There is no restriction on this.

For example, when the required document type is determined to be the medical document type, the document library used to store medical documents is determined as the target document library. Document search can then be performed in the medical document library, so that the target documents obtained by the search effectively meet the medical document needs of the corresponding medical business scenario.

In some embodiments, when the target document search platform is built, the multiple document libraries may be annotated with identifiers according to document type. Correspondingly, to determine the target document library corresponding to the required document type from the multiple document libraries, a document library may be used as the target document library when its identifier is determined to be the required document type. There is no restriction on this.

In this embodiment, since the target document library stores documents to be processed whose document type matches the user's business scenario, document search can be performed in the target document library that matches the business type of the user's business scenario. This effectively narrows the scope of the document search, so that search efficiency is improved while the documents found effectively meet the document search needs of different business scenarios.
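A minimal sketch of this selection step follows, assuming the platform keeps its document libraries in a mapping keyed by document type. The `libraries` data and the function name `select_target_library` are illustrative assumptions, not part of the disclosure.

```python
from typing import Dict, List, Optional

# Hypothetical platform state: document type -> documents in that library.
libraries: Dict[str, List[str]] = {
    "medical": ["flu_children_2021.pdf"],
    "legal": ["contract_template.docx"],
}


def select_target_library(
    required_type: str, libs: Dict[str, List[str]]
) -> Optional[List[str]]:
    """Compare the required type with each library's type; return the match."""
    for doc_type, library in libs.items():
        if doc_type == required_type:
            return library
    return None  # no library stores documents of the required type


target_library = select_target_library("medical", libraries)
print(target_library)
```

The explicit loop mirrors the comparison described in the text; with a mapping like this, `libs.get(required_type)` would behave the same way.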

S904: Search the target document library for the target document corresponding to the requirement tag information.

In the embodiment of the present disclosure, after determining the target document library corresponding to the required document type from multiple document libraries, the document to be processed corresponding to the required tag information is searched from the multiple documents to be processed in the target document library as the target document.

In the embodiment of the present disclosure, the target document library stores tag information associated with the documents. Accordingly, searching the target document library for the target document corresponding to the required tag information may consist of finding, in the target document library, tag information that matches the required tag information, and using the document corresponding to that tag information as the target document.

For example, if the required tag information is "2021, influenza, children", it can be checked whether any document stored in the target document library has the document tag information "2021, influenza, children". When the document tag information of a certain document is determined to be "2021, influenza, children", that document is used as the target document. There is no restriction on this.

In some embodiments, to find tag information that matches the required tag information in the target document library, a pre-trained information matching model may be used. That is, the required tag information and the tag information are input into the pre-trained information matching model, which performs matching processing on them to obtain a matching result. When the matching result indicates that the required tag information and the tag information match, the document to be processed corresponding to the tag information is used as the target document. There is no restriction on this.

Alternatively, to find tag information that matches the required tag information in the target document library, a matching degree value between the required tag information and the tag information may be determined. When the matching degree value is greater than a predetermined matching degree threshold, the document to be processed corresponding to the tag information is used as the target document. There is no restriction on this.

The embodiments of the present disclosure can effectively combine RPA and AI to realize intelligent automation (IA) of the document search process, thereby effectively improving the automation of document search and reducing labor costs.
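The exact-match case in this example can be sketched as a set comparison. The assumption here is that tag information is an unordered collection of strings compared case-insensitively; the library contents and the name `find_by_tags` are hypothetical.

```python
from typing import Dict, Iterable, List


def find_by_tags(required_tags: Iterable[str], library: Dict[str, List[str]]) -> List[str]:
    """Return the documents whose tag set equals the required tag set."""
    wanted = frozenset(t.lower() for t in required_tags)
    return [
        name
        for name, tags in library.items()
        if frozenset(t.lower() for t in tags) == wanted
    ]


# Hypothetical target document library: document name -> stored tag information.
medical_library = {
    "flu_children_2021.pdf": ["2021", "influenza", "children"],
    "asthma_adults_2019.pdf": ["2019", "asthma", "adults"],
}
print(find_by_tags(["Influenza", "children", "2021"], medical_library))
```

Using sets makes the match independent of the order in which the tags are written in the request.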
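The threshold variant can be sketched as follows. The disclosure does not say how the matching degree value is computed; Jaccard overlap between tag sets is used here purely as an assumed stand-in, and the names `matching_degree`, `search_by_threshold`, and the default threshold of 0.5 are hypothetical.

```python
from typing import Dict, Iterable, List


def matching_degree(required_tags: Iterable[str], doc_tags: Iterable[str]) -> float:
    """Jaccard overlap of two tag sets (an assumed matching-degree measure)."""
    a, b = set(required_tags), set(doc_tags)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def search_by_threshold(
    required_tags: Iterable[str],
    library: Dict[str, List[str]],
    threshold: float = 0.5,
) -> List[str]:
    """Keep documents whose matching degree is greater than the threshold."""
    required = list(required_tags)
    return [
        name
        for name, tags in library.items()
        if matching_degree(required, tags) > threshold
    ]


library = {
    "doc_a": ["2021", "influenza", "children", "vaccine"],
    "doc_b": ["2019", "asthma"],
}
print(search_by_threshold(["2021", "influenza", "children"], library))
```

Compared with exact matching, a threshold lets documents with extra or slightly different tags still be returned; the threshold controls how loose that match is.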

In this embodiment, a document search request is received, the required document type and the required tag information are parsed from the document search request, and the target document library corresponding to the required document type is determined from multiple document libraries, where the multiple document libraries belong to a document search platform and each document library is used to store documents of the corresponding document type. The target document corresponding to the required tag information is then searched for in the target document library. Document search is thus performed in the target document library that matches the business type of the user's business scenario, which effectively narrows the scope of the search. Search efficiency is improved, and the target documents obtained effectively meet the document search needs of different business scenarios.

Referring to Figure 10, the document search method includes:

S1001: Receive document search request.

S1002: Parse the requirement document type and requirement tag information from the document search request.

S1003: Determine the target document library corresponding to the required document type from multiple document libraries, where the multiple document libraries belong to the document search platform, and the document library is used to store documents of the corresponding document type.

For descriptions of S1001-S1003, reference may be made to the above-mentioned embodiments and will not be described again here.

S1004: Call the natural language processing NLP service in the field of artificial intelligence to process the requirement attributes to determine the target parent tag from multiple parent tags, where the target parent tag has a corresponding target subtag.

Among the multiple parent tags, the parent tag participating in this document search can be called the target parent tag, and correspondingly, the child tag corresponding to the target parent tag can be called the target child tag.

For example, the multiple parent tags and their corresponding sub-tags may be: "document format - text format, disease - influenza, document update time - April 2021, research object - children, cause of disease - spontaneously caused". The target parent tags may be the parent tags participating in this document search, such as "disease, research object, cause of disease". Correspondingly, the target sub-tags are the sub-tags corresponding to the target parent tags, such as "influenza, children, spontaneously caused"; there is no restriction on this.

In the embodiment of the present disclosure, attributes can be used to determine whether a parent tag and the child tags corresponding to that parent tag participate in the document search. The parent tags that participate in subsequent document searches may be called target parent tags, and correspondingly the child tags corresponding to a target parent tag may be called target child tags.

The attribute that describes the user's requirement may be called the requirement attribute. The requirement attribute supports adjusting the configuration of the parent tags in the target document library according to the user's document search requirements.

That is to say, after determining the target document library corresponding to the required document type from the multiple document libraries, the embodiment of the present disclosure can call the natural language processing (NLP) service in the field of artificial intelligence to process the requirement attribute and adjust the corresponding attributes of the parent tags in the target document library, so as to determine whether each parent tag and its corresponding sub-tags participate in subsequent document searches. The parent tags that participate in subsequent document searches are used as target parent tags, and the sub-tags corresponding to the target parent tags are used as target sub-tags. The subsequent document search can then be executed based on the target sub-tags. For details, please refer to subsequent embodiments.

For example, the multiple parent tags may be document format, disease, research time, and research object, and the document the user needs to search for may be a document whose research object is children. In this case, during the document search, the two parent tags disease and research time can be hidden according to the requirement attribute, so that these two parent tags and their corresponding sub-tags do not participate in subsequent document searches, while the remaining parent tags, document format and research object, are used as the target parent tags and their corresponding sub-tags as the target sub-tags. In this way, target parent tags that effectively satisfy the subsequent document search are determined from the multiple parent tags based on the requirement attribute. This further narrows the tag search scope and reduces the amount of tag data processed in the subsequent document search, improving document search efficiency while ensuring the document search effect.
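The hiding of parent tags in this example can be sketched as a filter over a tag mapping. The sketch assumes the requirement attribute has already been reduced to a per-parent-tag boolean saying whether the tag participates in the search; that `participates` mapping and the name `select_target_tags` are hypothetical.

```python
from typing import Dict


def select_target_tags(
    parent_tags: Dict[str, str], participates: Dict[str, bool]
) -> Dict[str, str]:
    """Keep only parent tags (and their sub-tags) marked as participating."""
    return {
        parent: sub
        for parent, sub in parent_tags.items()
        if participates.get(parent, False)  # unknown parent tags stay hidden
    }


# Parent tag -> sub-tag for one document, following the example in the text.
parent_tags = {
    "document format": "text format",
    "disease": "influenza",
    "research time": "2021",
    "research object": "children",
}
# Hypothetical attribute configuration: disease and research time are hidden.
participates = {
    "document format": True,
    "disease": False,
    "research time": False,
    "research object": True,
}
target = select_target_tags(parent_tags, participates)
print(target)
```

The keys of the result play the role of target parent tags and the values the role of target sub-tags, so later matching only touches the tags that were left visible.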

S1005: Search the target document from the target document library according to the requirement attribute, requirement subtag, and target subtag.

In the embodiment of the present disclosure, after determining the target parent tag from multiple parent tags according to the requirement attribute, the target document can be searched from the target document library according to the requirement attribute, the requirement subtag, and the target subtag.

In some embodiments, searching for the target document from the target document library according to the requirement attribute, the requirement subtag, and the target subtag may be performed by matching the requirement subtag against the target subtag to obtain a corresponding matching result (the matching method may be, for example, model matching or feature matching; there is no restriction on this), and then further filtering the matching result according to the requirement attribute to obtain the target document. There is no restriction on this.

Optionally, in some embodiments, searching for the target document from the target document library according to the requirement attribute, the requirement subtag, and the target subtag may consist of calling a robotic process automation (RPA) robot to automatically search the multiple documents for documents to be filtered according to the requirement sub-tag and the target sub-tag.

That is to say, in the embodiment of the present disclosure, multiple documents to be filtered can be determined from the multiple documents in the target document library according to the requirement sub-tag and the target sub-tag, and the multiple documents to be filtered can then be further filtered according to the requirement attribute to obtain the target document.

In some embodiments, searching for documents to be filtered from the target document library based on the requirement subtag and the target subtag may be performed by matching the requirement subtag against the target subtag and, when the two match, using the documents corresponding to the target sub-tag in the target document library as documents to be filtered. There is no restriction on this.

Alternatively, searching for multiple documents to be filtered from the target document library based on the requirement subtag and the target subtag may consist of looking in the target document library for a target subtag that is the same as the requirement subtag and, when the requirement subtag and the target subtag are the same, using the document corresponding to that target sub-tag in the target document library as a document to be filtered. There is no restriction on this.

Optionally, in some embodiments, searching multiple documents to be filtered from the target document library based on the demand subtag and the target subtag may be to determine the similarity value between the demand subtag and the target subtag of each document, and When the similarity value meets the set conditions, the document corresponding to the corresponding target sub-tag is used as the document to be filtered.

The similarity value characterizes the degree of similarity between the demand sub-label and the target sub-label. The greater the similarity value, the closer the demand sub-label and the target sub-label are to being the same; conversely, the smaller the similarity value, the larger the gap between the demand sub-label and the target sub-label. There is no restriction on this.

That is to say, in the embodiment of the present disclosure, the Euclidean distance between the demand sub-label and the target sub-label may be determined and used as the similarity value between the two. The similarity value is then compared with a preset condition (the condition can be adaptively configured based on the document search requirements of the actual business scenario, without restriction), and when the similarity value satisfies the preset condition, the documents corresponding to the corresponding target sub-tags are used as documents to be filtered.

In the embodiment of the present disclosure, the multiple documents to be filtered that are found in the target document library according to the requirement sub-tag and the target sub-tag can be sorted according to their corresponding similarity values. The target document can then be obtained by filtering the sorted documents to be filtered according to the requirement attribute.

It can be understood that, in the embodiment of the present disclosure, a document in the target document library may have multiple target sub-tags, and the target sub-tags of multiple documents to be processed may overlap to some extent. In this case, a search based on the requirement sub-tag may return multiple documents. The multiple documents to be filtered obtained by that search can then be further filtered according to the requirement attribute, so as to determine the target document from among them. There is no restriction on this.
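A minimal sketch of the distance-based comparison follows. Two things are assumptions rather than statements of the disclosure: sub-tags are represented as numeric feature vectors (how they are vectorized is not specified), and the distance is converted to a score with 1 / (1 + d) so that a larger value means more similar, in line with the description of the similarity value above. The names `similarity`, `documents_to_filter`, and `min_similarity` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Map Euclidean distance d to a score in (0, 1]; identical vectors give 1."""
    return 1.0 / (1.0 + math.dist(a, b))


def documents_to_filter(
    demand_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    min_similarity: float = 0.5,
) -> List[str]:
    """Keep documents whose target sub-tag vector satisfies the set condition."""
    return [
        name
        for name, vec in doc_vecs.items()
        if similarity(demand_vec, vec) >= min_similarity
    ]


# Hypothetical 2-D embeddings of each document's target sub-tag.
doc_vecs = {"doc_a": (0.0, 0.0), "doc_b": (3.0, 4.0)}
print(documents_to_filter((0.0, 0.0), doc_vecs))
```

If the raw Euclidean distance were used directly as the value, the set condition would need to be an upper bound instead of a lower bound, since a smaller distance means a closer match.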
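The sort-then-filter step can be sketched as below, assuming each document to be filtered carries a similarity value and a mapping from parent tags to sub-tags, and that the requirement attribute has been turned into filter conditions on parent tags. The `Candidate` layout, the sample data, and the name `rank_and_filter` are hypothetical.

```python
from typing import Dict, List, Tuple

# (document name, similarity value, parent tag -> sub-tag)
Candidate = Tuple[str, float, Dict[str, str]]


def rank_and_filter(candidates: List[Candidate], conditions: Dict[str, str]) -> List[str]:
    """Sort by similarity (highest first), then keep documents meeting all conditions."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [
        name
        for name, _score, tags in ranked
        if all(tags.get(parent) == sub for parent, sub in conditions.items())
    ]


candidates = [
    ("doc_a", 0.62, {"research object": "children", "document format": "text"}),
    ("doc_b", 0.91, {"research object": "adults", "document format": "text"}),
    ("doc_c", 0.85, {"research object": "children", "document format": "text"}),
]
print(rank_and_filter(candidates, {"research object": "children"}))
```

Sorting first keeps the similarity order intact in the filtered result, so the best-matching remaining document comes first.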

For example, the document search method described in the embodiment of the present disclosure can be illustrated with specific schematic diagrams. In the initial stage of the document search method, the document search platform receives the document search request. Then, according to the requirement attribute in the document search request, the attributes of the tags in the target document library are edited in the document attribute editing interface of the target document library (see Figure 11, which is a schematic diagram of the document attribute editing interface proposed by an embodiment of the present disclosure), thereby determining, from the parent tags in the target document library, the target parent tags and target sub-tags that participate in the document search used to obtain the corresponding target document.

Then, the requirement sub-tag in the document search request can be entered into the document search interface of the target document search platform (see Figure 12, which is a schematic diagram of the document search interface proposed by an embodiment of the present disclosure). The target document search platform searches according to the similarity value between the requirement sub-tag and the target sub-tag, sorts the one or more documents to be filtered obtained by the search according to the similarity value, and presents them in the document search interface. The document screening interface (see Figure 13, which is a schematic diagram of the document screening interface proposed by an embodiment of the present disclosure) can also be entered through the filtering configuration items of the document search interface shown in Figure 12. There, filter conditions are configured on the parent tags of the documents to be filtered according to the requirement attribute, so as to select the target document from the multiple documents to be filtered.

In this embodiment, a document search request is received, the required document type and the required tag information are parsed from the document search request, and the target document library corresponding to the required document type is determined from multiple document libraries, where the multiple document libraries belong to the document search platform and each document library is used to store documents of the corresponding document type. The target parent tag, which has a corresponding target subtag, is then determined from multiple parent tags according to the requirement attribute, and the target document is searched for in the target document library according to the requirement attribute, the requirement subtag, and the target subtag. Because the target parent tag that effectively satisfies the subsequent document search is determined from the multiple parent tags based on the requirement attribute, the tag search scope is further narrowed and the amount of tag data processed during the subsequent document search is reduced, which improves document search efficiency while ensuring the document search effect.

Referring to Figure 14, the construction device 140 of the document search platform includes: a first acquisition module 1401, used to acquire a document to be processed, where the document to be processed has a corresponding document type; a second acquisition module 1402, used to acquire the tag information corresponding to the document to be processed; a building module 1403, used to build a target document library corresponding to the document type based on the tag information and the document to be processed; and a forming module 1404, used to form a target document search platform based on the target document library.

Optionally, in some embodiments, refer to Figure 15, which is a schematic structural diagram of a device for building a document search platform proposed by another embodiment of the present disclosure, in which the second acquisition module 1402 includes: a first determination sub-module 14021, used to determine the parent tag corresponding to the document to be processed; a parsing sub-module 14022, used to parse the sub-tag corresponding to the parent tag from the document to be processed; and a processing sub-module 14023, used to take the parent tag and the sub-tag together as the tag information.

Optionally, in some embodiments, the parsing sub-module 14022 is also used to: call the natural language processing (NLP) service in the artificial intelligence field to identify the document universal index corresponding to the parent tag from the document to be processed, and use the document universal index as a child tag; and/or call the NLP service to identify the associated entity value corresponding to the parent tag from the document to be processed, and use the associated entity value as a child tag.

Optionally, in some embodiments, the building module 1403 includes: a second determination sub-module 14031, used to call the robotic process automation (RPA) robot to determine the initial document library corresponding to the document type; and a storage sub-module 14032, used to store the tag information and the document to be processed in the initial document library to form the target document library.

Optionally, in some embodiments, the storage submodule 14032 is also used to: obtain the target loading type corresponding to the document to be processed; and use the target document storage method corresponding to the target loading type to store the tag information and the document to be processed in the initial document library.

Optionally, in some embodiments, the storage submodule 14032 is also used to: if the target loading type is a document loading type, store the document to be processed and the corresponding tag information in the target document library; and/or if the target loading type is a link loading type, store the access link corresponding to the document to be processed and the corresponding tag information in the target document library; and/or if the target loading type is a rich text loading type, edit the document to be processed through a rich text editor, and store the editing result and the corresponding tag information in the target document library.

Optionally, in some embodiments, the second acquisition module 1402 also includes: a configuration sub-module 14024, used to configure attributes for the parent tag after the parent tag corresponding to the document to be processed is determined, and to use the configured attributes as part of the tag information, where an attribute identifies whether the parent tag participates in document searches.
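The three loading types handled by the storage sub-module can be sketched as a simple dispatch. The sketch assumes the library is a list of dictionary entries and uses an `edit` callback as a stand-in for the rich text editor; the names `LoadType`, `store`, and the entry layout are hypothetical.

```python
from enum import Enum
from typing import Callable, Dict, List


class LoadType(Enum):
    DOCUMENT = "document"    # store the document itself
    LINK = "link"            # store an access link to the document
    RICH_TEXT = "rich_text"  # store the result of rich text editing


def store(
    library: List[Dict],
    load_type: LoadType,
    payload: str,
    tags: Dict[str, str],
    edit: Callable[[str], str] = str.strip,  # placeholder for a rich text editor
) -> None:
    """Store the payload together with its tag information, by loading type."""
    if load_type is LoadType.DOCUMENT:
        entry = {"kind": "document", "content": payload}
    elif load_type is LoadType.LINK:
        entry = {"kind": "link", "url": payload}
    elif load_type is LoadType.RICH_TEXT:
        entry = {"kind": "rich_text", "content": edit(payload)}
    else:
        raise ValueError(f"unsupported loading type: {load_type!r}")
    entry["tags"] = dict(tags)  # copy so later changes do not affect the entry
    library.append(entry)


target_library: List[Dict] = []
store(target_library, LoadType.LINK, "https://example.com/flu-study", {"disease": "influenza"})
store(target_library, LoadType.RICH_TEXT, "  <p>Summary</p>  ", {"disease": "influenza"})
print(target_library)
```

Keeping the tag information on every entry, whatever its loading type, is what allows later tag-based search to treat documents, links, and rich text uniformly.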

Optionally, in some embodiments, the document search platform is constructed using artificial intelligence (AI) and robotic process automation (RPA).

It should be noted that the functions and specific implementation principles of the above-mentioned modules in the embodiments of the present disclosure may be referred to the above-mentioned method embodiments, and will not be described again here.

In this embodiment, the document to be processed is obtained, where the document to be processed has a corresponding document type; the tag information corresponding to the document to be processed is obtained; a target document library corresponding to the document type is then constructed based on the tag information and the document to be processed; and a target document search platform is formed based on the target document library. Since the target document search platform is formed based on the target document library corresponding to the document type, the constructed target document search platform can, based on the target document library of the corresponding document type, provide document search services of the corresponding document types for different business scenarios. This effectively improves the reusability of the document search platform, so that the constructed document search platform can effectively meet the document search needs of different business scenarios.

Referring to Figure 16, the document search device 160 includes: a receiving module 1601, used to receive a document search request; a parsing module 1602, used to parse the required document type and required tag information from the document search request; a determining module 1603, used to determine the target document library corresponding to the required document type from multiple document libraries, where the multiple document libraries belong to the document search platform and each document library is used to store documents of the corresponding document type; and a search module 1604, used to search the target document library for the target document corresponding to the required tag information.

Optionally, in some embodiments, refer to Figure 17, which is a schematic structural diagram of a document search device proposed by another embodiment of the present disclosure. The requirement tag information includes requirement attributes and requirement sub-tags; the target document library has multiple corresponding parent tags; each parent tag has corresponding sub-tags; and the corresponding sub-tags are used to describe the document.

The search module 1604 includes: a third determination sub-module 16041, used to call the natural language processing (NLP) service in the artificial intelligence field to process the requirement attribute, so as to determine the target parent tag from multiple parent tags, where the target parent tag has a corresponding target sub-tag; and a search sub-module 16042, used to search for target documents from the target document library according to the requirement attributes, requirement sub-tags, and target sub-tags.

Optionally, in some embodiments, the target document library includes multiple documents, and the search sub-module 16042 is also used to: call the robotic process automation (RPA) robot to search the multiple documents for documents to be filtered according to the requirement sub-tag and the target sub-tag; and filter the multiple documents to be filtered according to the requirement attributes to obtain the target documents.

Optionally, in some embodiments, the search sub-module 16042 is also used to: determine the similarity value between the requirement sub-tag and the target sub-tag of each document; and, if the similarity value meets the set condition, use the document corresponding to the corresponding target sub-tag as a document to be filtered.

In order to implement the above embodiments, the present disclosure also provides an electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the construction method of the document search platform proposed in the foregoing embodiments of the present disclosure is implemented, or the document search method proposed in the foregoing embodiments of the present disclosure is implemented.

FIG. 18 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present disclosure. As shown in Figure 18, the electronic device 180 includes: a memory 1810 and a processor 1820. The memory 1810 stores a computer program that can run on the processor 1820. When the processor 1820 executes the computer program, it implements the construction method of the document search platform in the above embodiment, or implements the document search method in the above embodiment. The number of memory 1810 and processor 1820 may be one or more.

The electronic device also includes: a communication interface 1830, used for communicating with external devices and performing interactive data transmission. If the memory 1810, the processor 1820 and the communication interface 1830 are implemented independently, the memory 1810, the processor 1820 and the communication interface 1830 can be linked to each other through a bus and complete communication with each other. The bus can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one thick line is used in Figure 18, but it does not mean that there is only one bus or one type of bus.

Optionally, in terms of specific implementation, if the memory 1810, the processor 1820 and the communication interface 1830 are integrated on one chip, the memory 1810, the processor 1820 and the communication interface 1830 can communicate with each other through the internal interface.

The present disclosure also provides a computer-readable storage medium that stores a computer program. When the computer program is executed by a processor, the method for building a document search platform as proposed in the foregoing embodiments of the disclosure is implemented, or the document search method as proposed in the foregoing embodiments is implemented.

The present disclosure also provides a computer program product. When instructions in the computer program product are executed by a processor, the construction method of a document search platform as proposed in the foregoing embodiments of the disclosure is implemented, or the document search method as proposed in the foregoing embodiments is implemented.

It should be understood that the above-mentioned processor can be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor can be a microprocessor or any conventional processor, etc. It is worth noting that the processor may be a processor that supports the Advanced RISC Machines (ARM) architecture.

Further, optionally, the above-mentioned memory may include read-only memory and random access memory, and may also include non-volatile random access memory. The memory may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. Non-volatile memory can include read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM) or flash memory. Volatile memory may include random access memory (Random Access Memory, RAM), which acts as an external cache. By way of illustration, but not limitation, many forms of RAM are available, for example, static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic Random Access Memory, DRAM) and direct Rambus random access memory (Direct Rambus RAM, DR RAM).

In addition, each functional unit in various embodiments of the present disclosure may be integrated into one processing module, each unit may exist physically alone, or two or more units may be integrated into one module. The above integrated modules can be implemented in the form of hardware or software function modules. If the above integrated modules are implemented in the form of software function modules and sold or used as independent products, they can also be stored in a computer-readable storage medium. The storage medium can be a read-only memory, a magnetic disk or an optical disk, etc.

The above are only specific embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any person familiar with the technical field can easily think of various changes or alternatives within the technical scope of the present disclosure, and these should all be covered by the protection scope of this disclosure. Therefore, the protection scope of the present disclosure should be subject to the protection scope of the claims.

Claims

A method for building a document search platform, including:

Obtain a document to be processed, where the document to be processed has a corresponding document type;

Obtain tag information corresponding to the document to be processed;

Construct a target document library corresponding to the document type according to the tag information and the document to be processed; and

According to the target document library, a target document search platform is formed.
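
The four steps above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the claimed implementation: the `TaggedDocument` class, the `build_platform` function, and the sample document types are hypothetical names, and a plain in-memory mapping from document type to library stands in for the target document search platform.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TaggedDocument:
    """A document to be processed together with its tag information (hypothetical shape)."""
    title: str
    doc_type: str                                        # e.g. "contract" (assumed example type)
    tags: dict[str, str] = field(default_factory=dict)   # parent tag -> child tag


def build_platform(documents: list[TaggedDocument]) -> dict[str, list[TaggedDocument]]:
    """Group tagged documents into one library per document type.

    The returned mapping (document type -> library) stands in for the
    target document search platform.
    """
    platform: dict[str, list[TaggedDocument]] = defaultdict(list)
    for doc in documents:
        platform[doc.doc_type].append(doc)
    return dict(platform)


docs = [
    TaggedDocument("Lease A", "contract", {"party": "Acme"}),
    TaggedDocument("Invoice 7", "invoice", {"vendor": "Acme"}),
    TaggedDocument("Lease B", "contract", {"party": "Globex"}),
]
platform = build_platform(docs)
```

Keeping one library per document type is what lets the same platform serve different service scenarios without rebuilding it for each one.
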
The method of claim 1, wherein said obtaining tag information corresponding to the document to be processed includes:

Determine the parent tag corresponding to the document to be processed;

Parse the child tag corresponding to the parent tag from the document to be processed; and

The parent tag and the child tag are collectively used as the tag information.
The method of claim 2, wherein parsing the child tag corresponding to the parent tag from the document to be processed includes:

Call the natural language processing NLP service in the artificial intelligence field, identify the document universal index corresponding to the parent tag from the document to be processed, and use the document universal index as the child tag; and/or

The NLP service is called, the associated entity value corresponding to the parent tag is identified from the document to be processed, and the associated entity value is used as the child tag.
The method according to any one of claims 1 to 3, wherein said constructing a target document library corresponding to the document type according to the tag information and the document to be processed includes:

Call the robotic process automation RPA robot to determine the initial document library corresponding to the document type;

The tag information and the document to be processed are stored in the initial document library to form the target document library.
The method of claim 4, wherein storing the tag information and the document to be processed into the initial document library includes:

Obtain the target loading type corresponding to the document to be processed;

The tag information and the document to be processed are stored in the initial document library using a target document storage method corresponding to the target loading type.
The method of claim 5, wherein the step of storing the tag information and the document to be processed in the initial document library using a target document storage method corresponding to the target loading type includes:

If the target loading type is a document loading type, store the document to be processed and the corresponding tag information in the target document library; and/or

If the target loading type is a link loading type, store the access link corresponding to the document to be processed and the corresponding tag information to the target document library; and/or

If the target loading type is a rich text loading type, the document to be processed is edited through a rich text editor, and the editing processing result and the corresponding tag information are stored in the target document library.
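
The three storage branches above can be sketched as a dispatch on the loading type. This is a minimal sketch under assumptions: the `LoadType` enum, the `make_record` function, the record field names, and the example URL are all hypothetical, and the rich text editor is represented only by its already-edited output string.

```python
from enum import Enum


class LoadType(Enum):
    """Hypothetical names for the three loading types."""
    DOCUMENT = "document"
    LINK = "link"
    RICH_TEXT = "rich_text"


def make_record(load_type: LoadType, tags: dict, *, content: str = "",
                link: str = "", edited: str = "") -> dict:
    """Build the record to store for one document, depending on its loading type."""
    if load_type is LoadType.DOCUMENT:
        payload = {"content": content}   # store the document itself
    elif load_type is LoadType.LINK:
        payload = {"link": link}         # store only the access link
    elif load_type is LoadType.RICH_TEXT:
        payload = {"content": edited}    # store the result of rich text editing
    else:
        raise ValueError(f"unsupported loading type: {load_type!r}")
    return {"load_type": load_type.value, "tags": tags, **payload}


library = []
library.append(
    make_record(LoadType.LINK, {"party": "Acme"}, link="https://example.com/lease-a")
)
```

In every branch the tag information is stored alongside the payload, so later tag-based search does not need to know how the document was loaded.
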
The method of claim 2 or 3, wherein after determining the parent tag corresponding to the document to be processed, it further includes:

Configure attributes for the parent tag, and use the configured attributes as the tag information, where the attributes are used to identify whether the parent tag participates in document search.
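
As a small illustration of such an attribute, the sketch below marks each parent tag with a flag and keeps only the flagged tags as search fields. The `ParentTag` class, the `searchable` attribute name, and the sample tag names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ParentTag:
    name: str
    searchable: bool = True   # hypothetical attribute: does this tag participate in search?


parent_tags = [ParentTag("party"), ParentTag("internal_note", searchable=False)]

# Only parent tags whose attribute allows it are offered as search fields.
search_fields = [tag.name for tag in parent_tags if tag.searchable]
```
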
A document search method, applied to a document search platform, the document search platform is constructed by the construction method of the document search platform described in any one of the above claims 1-7;

Wherein, the document search method includes:

Receive document search requests;

Parse the required document type and the requirement tag information from the document search request;

Determine a target document library corresponding to the required document type from a plurality of document libraries, wherein the plurality of document libraries belong to the document search platform, and the document library is used to store documents of the corresponding document type;

Search the target document library for target documents corresponding to the requirement tag information.
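
The search steps above can be sketched as a two-stage lookup: route the request to the library for its document type, then match tags inside that library. The sketch is illustrative only; the `search` function, the request and document dictionary layouts, and the exact-match rule are assumptions, not details given in the claims.

```python
def search(platform: dict, request: dict) -> list:
    """Route a request to the library for its document type, then match tags."""
    # Step 1: pick the target library by document type (empty if none exists).
    library = platform.get(request["doc_type"], [])
    # Step 2: keep documents whose tags contain every requested tag value.
    wanted = request["tags"]
    return [
        doc for doc in library
        if all(doc["tags"].get(key) == value for key, value in wanted.items())
    ]


platform = {
    "contract": [
        {"title": "Lease A", "tags": {"party": "Acme"}},
        {"title": "Lease B", "tags": {"party": "Globex"}},
    ],
    "invoice": [{"title": "Invoice 7", "tags": {"vendor": "Acme"}}],
}
hits = search(platform, {"doc_type": "contract", "tags": {"party": "Acme"}})
```
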
The method of claim 8, wherein the requirement tag information includes: requirement attributes and requirement sub-tags, the target document library has multiple corresponding parent tags, each parent tag has corresponding sub-tags, and the corresponding sub-tags are used to describe the document;

Wherein, searching for target documents corresponding to the requirement tag information from the target document library includes:

Call the natural language processing NLP service in the artificial intelligence field to process the requirement attributes to determine a target parent tag from the multiple parent tags, where the target parent tag has a corresponding target sub-tag;

The target document is searched from the target document library according to the requirement attribute, the requirement sub-tag, and the target sub-tag.
The method of claim 9, wherein the target document library includes: a plurality of the documents;

Wherein, searching for the target document from the target document library according to the requirement attribute, the requirement sub-tag, and the target sub-tag includes:

Call the robotic process automation RPA robot to search for documents to be filtered from multiple documents according to the requirement sub-tag and the target sub-tag;

The target document is obtained from a plurality of documents to be filtered according to the requirement attribute.
The method according to claim 10, wherein said searching for documents to be filtered from a plurality of said documents according to said requirement sub-tag and said target sub-tag includes:

Determine the similarity value between the requirement sub-tag and the target sub-tag of each of the documents;

If the similarity value meets the set condition, the document corresponding to the target sub-tag is used as the document to be filtered.
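
The similarity filter above can be sketched as follows. The claims do not specify the similarity measure or the set condition, so this sketch assumes a character-level ratio from Python's standard `difflib.SequenceMatcher` and a "greater than or equal to a threshold" condition; the `candidates` function, the `sub_tag` field, and the default threshold of 0.8 are hypothetical.

```python
from difflib import SequenceMatcher


def candidates(requirement_sub_tag: str, documents: list, threshold: float = 0.8) -> list:
    """Return documents whose target sub-tag is similar enough to the requirement sub-tag."""
    kept = []
    for doc in documents:
        # Assumed measure: case-insensitive character-level similarity in [0, 1].
        score = SequenceMatcher(
            None, requirement_sub_tag.lower(), doc["sub_tag"].lower()
        ).ratio()
        if score >= threshold:   # assumed form of the "set condition"
            kept.append(doc)
    return kept


docs = [
    {"title": "Lease A", "sub_tag": "Acme Corp"},
    {"title": "Lease B", "sub_tag": "Globex"},
]
result = candidates("acme corp.", docs)
```

A threshold rather than exact equality lets small spelling differences in the request still reach the intended documents; the documents kept here are the "documents to be filtered" that the requirement attribute narrows further.
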
A device for constructing a document search platform, including:

The first acquisition module is used to acquire documents to be processed, where the documents to be processed have corresponding document types;

The second acquisition module is used to acquire tag information corresponding to the document to be processed;

A building module, configured to build a target document library corresponding to the document type based on the tag information and the document to be processed; and

A forming module is used to form a target document search platform according to the target document library.
A document search device, applied to a document search platform, the document search platform is constructed by the document search platform construction device described in claim 12;

Wherein, the document search device includes:

The receiving module is used to receive document search requests;

A parsing module, used to parse the required document type and the requirement tag information from the document search request;

A determination module, configured to determine a target document library corresponding to the required document type from a plurality of document libraries, wherein the plurality of document libraries belong to the document search platform, and the document library is used to store documents of the corresponding document type;

A search module, configured to search for target documents corresponding to the requirement tag information from the target document library.
An electronic device, including: a processor and a memory, wherein instructions are stored in the memory, and the instructions are loaded and executed by the processor to implement the method for building a document search platform according to any one of claims 1 to 7, or to implement the document search method as described in any one of claims 8 to 11.
A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the method for building a document search platform as described in any one of claims 1 to 7 is implemented, or the document search method as described in any one of claims 8 to 11 is implemented.