WO2023236257A1 - Document search platform, search method and apparatus, electronic device, and storage medium - Google Patents

Info

Publication number
WO2023236257A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
target
tag
processed
library
Prior art date
Application number
PCT/CN2022/100921
Other languages
French (fr)
Chinese (zh)
Inventor
邬丹琳 (Wu Danlin)
张勇 (Zhang Yong)
周津 (Zhou Jin)
杜文杰 (Du Wenjie)
Original Assignee
来也科技(北京)有限公司 (Laiye Technology (Beijing) Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 来也科技(北京)有限公司 (Laiye Technology (Beijing) Co., Ltd.)
Publication of WO2023236257A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Definitions

  • The present disclosure relates to the field of computer technology, and in particular to a document search platform, a search method and apparatus, an electronic device, and a storage medium.
  • Robotic process automation (RPA) refers to the use of specific "robot software" to simulate human operations on a computer and automatically execute process tasks according to rules.
  • Artificial intelligence (AI) is a technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence.
  • In the related art, to meet the document search requirements of a business scenario, a document search platform is usually built from all documents in that scenario, so different business scenarios usually require multiple separate document search platforms.
  • As a result, a document search platform built for one scenario cannot meet the document search needs of other business scenarios and cannot be reused across them.
  • Embodiments of the present disclosure provide a method for constructing a document search platform, a document search method, corresponding apparatuses, an electronic device, and a storage medium, to solve the problems in the related art.
  • The technical solutions are as follows:
  • In a first aspect, the method for constructing a document search platform proposed by embodiments of the present disclosure includes: obtaining a document to be processed, where the document to be processed has a corresponding document type; obtaining tag information corresponding to the document to be processed; building, based on the tag information and the document to be processed, a target document library corresponding to the document type; and forming a target document search platform based on the target document library.
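The four construction steps can be pictured as a small pipeline. The following Python sketch is illustrative only: the names `Document`, `DocumentLibrary`, `SearchPlatform`, `obtain_tag_info`, and `build_platform` are hypothetical, and `obtain_tag_info` is a trivial stand-in for the NLP-based tag extraction the disclosure describes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Document:
    """A document to be processed (hypothetical shape)."""
    name: str
    doc_type: str  # e.g. "medical", "legal"
    text: str


@dataclass
class DocumentLibrary:
    """Target document library holding documents of a single type."""
    doc_type: str
    entries: list[tuple[Document, dict]] = field(default_factory=list)


@dataclass
class SearchPlatform:
    """Target document search platform: one library per document type."""
    libraries: dict[str, DocumentLibrary] = field(default_factory=dict)


def obtain_tag_info(doc: Document) -> dict:
    # Stand-in only: the disclosure obtains tags via an NLP/AI service.
    return {"document name": doc.name, "document type": doc.doc_type}


def build_platform(docs: list[Document]) -> SearchPlatform:
    platform = SearchPlatform()
    for doc in docs:                       # step 1: obtain documents to be processed
        tags = obtain_tag_info(doc)        # step 2: obtain tag information
        library = platform.libraries.setdefault(
            doc.doc_type, DocumentLibrary(doc.doc_type)
        )                                  # step 3: library for this document type
        library.entries.append((doc, tags))
    return platform                        # step 4: platform formed from the libraries
```

Keying libraries by document type is what lets one platform serve several business scenarios; a real system would persist the libraries rather than hold them in memory.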
  • In some embodiments, obtaining the tag information corresponding to the document to be processed includes: determining a parent tag corresponding to the document to be processed; parsing, from the document to be processed, a child tag corresponding to the parent tag; and using the parent tag and the child tag together as the tag information.
  • In some embodiments, parsing the child tag corresponding to the parent tag from the document to be processed includes: calling a natural language processing (NLP) service in the field of artificial intelligence (AI) to identify, from the document to be processed, a document universal index corresponding to the parent tag, and using the document universal index as the child tag; and/or calling the NLP service to identify, from the document to be processed, an associated entity value corresponding to the parent tag, and using the associated entity value as the child tag.
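To make the parent/child relationship concrete, here is a minimal Python sketch, assuming a tiny hand-written gazetteer (`GAZETTEER`) in place of the NLP service and three example universal-index fields. All names are hypothetical; a real implementation would call an NLP or entity-recognition service instead of substring matching.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical stand-in for the NLP service: known entity values per parent tag.
GAZETTEER = {
    "disease": ["influenza", "measles"],
    "research object": ["children", "adults"],
}
UNIVERSAL_INDEX = ("document name", "document type", "document size")


@dataclass
class Doc:
    name: str
    doc_type: str
    text: str


def universal_index_value(doc: Doc, parent: str) -> str | None:
    # Universal index values are taken from document metadata.
    return {
        "document name": doc.name,
        "document type": doc.doc_type,
        "document size": str(len(doc.text)),
    }.get(parent)


def entity_values(doc: Doc, parent: str) -> list[str]:
    # Naive substring matching standing in for entity recognition.
    lowered = doc.text.lower()
    return [term for term in GAZETTEER.get(parent, []) if term in lowered]


def tag_information(doc: Doc, parent_tags: list[str]) -> dict[str, list[str]]:
    """Pair each parent tag with the child tags found for it."""
    info: dict[str, list[str]] = {}
    for parent in parent_tags:
        if parent in UNIVERSAL_INDEX:
            value = universal_index_value(doc, parent)
            info[parent] = [value] if value is not None else []
        else:
            info[parent] = entity_values(doc, parent)
    return info
```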
  • In some embodiments, building the target document library corresponding to the document type based on the tag information and the document to be processed includes: calling a robotic process automation (RPA) robot to determine an initial document library corresponding to the document type; and storing the tag information and the document to be processed in the initial document library to form the target document library.
  • In some embodiments, storing the tag information and the document to be processed in the initial document library includes: obtaining a target loading type corresponding to the document to be processed; and storing the tag information and the document to be processed in the initial document library using a target document storage method corresponding to the target loading type.
  • In some embodiments, storing the tag information and the document to be processed in the initial document library using the target document storage method corresponding to the target loading type includes: if the target loading type is a document loading type, storing the document to be processed and the corresponding tag information in the target document library; and/or if the target loading type is a link loading type, storing the access link of the document to be processed and the corresponding tag information in the target document library; and/or if the target loading type is a rich text loading type, editing the document to be processed through a rich text editor and storing the editing result and the corresponding tag information in the target document library.
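The three storage branches amount to a dispatch on the target loading type. In this hypothetical Python sketch, `LoadingType`, `store`, and `rich_text_edit` are invented names, the library is a plain list of records, and `rich_text_edit` merely wraps lines in `<p>` tags to stand in for a rich text editor.

```python
from __future__ import annotations

from enum import Enum


class LoadingType(Enum):
    DOCUMENT = "document"
    LINK = "link"
    RICH_TEXT = "rich_text"


def rich_text_edit(content: str) -> str:
    # Hypothetical stand-in for a rich text editor: one <p> per non-empty line.
    return "".join(f"<p>{line}</p>" for line in content.split("\n") if line)


def store(library: list[dict], loading_type: LoadingType, payload: str, tags: dict) -> None:
    """Store a document and its tag information according to its loading type."""
    if loading_type is LoadingType.DOCUMENT:
        record = {"document": payload}                   # the document itself
    elif loading_type is LoadingType.LINK:
        record = {"access_link": payload}                # only the access link
    elif loading_type is LoadingType.RICH_TEXT:
        record = {"rich_text": rich_text_edit(payload)}  # the editing result
    else:
        raise ValueError(f"unsupported loading type: {loading_type!r}")
    record["tags"] = tags
    library.append(record)
```

Keeping the tag information on every record, whatever the loading type, means the search side can treat all three kinds of entries uniformly.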
  • In some embodiments, the method further includes: configuring an attribute for the parent tag and using the configured attribute as tag information, where the attribute identifies whether the parent tag participates in document search.
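The search-participation attribute behaves like a per-tag flag. This sketch assumes a frozen dataclass whose fields loosely mirror the attributes the disclosure lists in its definitions (tag name, data format, whether the tag may be modified, whether it participates in search); the class and field names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ParentTagAttribute:
    """Attribute configured for a parent tag (field names are hypothetical)."""
    name: str
    data_format: str = "text"
    modifiable: bool = True
    searchable: bool = True  # whether the parent tag participates in search


def searchable_parent_tags(attributes: list[ParentTagAttribute]) -> list[str]:
    """Names of parent tags whose attribute allows them to take part in search."""
    return [attr.name for attr in attributes if attr.searchable]
```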
  • In a second aspect, the document search method proposed by embodiments of the present disclosure is applied to a document search platform constructed by the construction method of the first aspect. The document search method includes: receiving a document search request, and parsing a requirement document type and requirement tag information from the document search request; determining, from multiple document libraries, a target document library corresponding to the requirement document type, where the multiple document libraries belong to the document search platform and each document library stores documents of the corresponding document type; and searching the target document library for a target document corresponding to the requirement tag information.
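The search flow first routes the request to one library and only then matches tags inside it. A minimal Python sketch, assuming the platform is a dict from document type to a list of records and that a request is a dict with `type` and `tags` keys; these shapes, the function names, and the exact-match rule are all hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SearchRequest:
    """Parsed document search request (hypothetical shape)."""
    document_type: str                                       # requirement document type
    tag_info: dict[str, str] = field(default_factory=dict)  # requirement tag information


def parse_request(raw: dict) -> SearchRequest:
    return SearchRequest(raw["type"], dict(raw.get("tags", {})))


def search(platform: dict[str, list[dict]], raw_request: dict) -> list[dict]:
    """Route the request to one library, then match tags inside it."""
    request = parse_request(raw_request)
    library = platform.get(request.document_type)  # target document library
    if library is None:
        return []
    # Exact matching on every requested (tag, value) pair; a placeholder rule.
    return [
        doc
        for doc in library
        if all(doc["tags"].get(tag) == value for tag, value in request.tag_info.items())
    ]
```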
  • In some embodiments, the requirement tag information includes a requirement attribute and a requirement sub-tag; the target document library has multiple corresponding parent tags, each parent tag has corresponding sub-tags, and the sub-tags describe the documents. Searching the target document library for the target document corresponding to the requirement tag information includes: calling an NLP service in the field of artificial intelligence to process the requirement attribute, so as to determine a target parent tag from the multiple parent tags, where the target parent tag has a corresponding target sub-tag; and searching the target document library for the target document according to the requirement attribute, the requirement sub-tag, and the target sub-tag.
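Determining the target parent tag is essentially mapping free-form requirement text onto a fixed set of parent tags. The sketch below swaps in Python's standard `difflib.get_close_matches` for the NLP service purely for illustration; the function names, the record shape, and the 0.6 cutoff are assumptions.

```python
from __future__ import annotations

import difflib


def pick_target_parent_tag(
    requirement_attribute: str, parent_tags: list[str], cutoff: float = 0.6
) -> str | None:
    """Map a requirement attribute to the closest parent tag, or None.

    difflib string similarity is only a stand-in for the NLP service.
    """
    lowered = [tag.lower() for tag in parent_tags]
    matches = difflib.get_close_matches(
        requirement_attribute.lower(), lowered, n=1, cutoff=cutoff
    )
    if not matches:
        return None
    return parent_tags[lowered.index(matches[0])]


def target_sub_tags(library: list[dict], target_parent: str) -> dict[str, str]:
    """Collect each document's sub-tag under the chosen target parent tag."""
    return {
        doc["name"]: doc["tags"][target_parent]
        for doc in library
        if target_parent in doc["tags"]
    }
```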
  • In some embodiments, the target document library includes multiple documents, and searching the target document library for the target document according to the requirement attribute, the requirement sub-tag, and the target sub-tag includes: calling an RPA robot to search the multiple documents for documents to be filtered according to the requirement sub-tag and the target sub-tag; and obtaining the target document from the documents to be filtered according to the requirement attribute.
  • In some embodiments, searching the target document library for the documents to be filtered according to the requirement sub-tag and the target sub-tag includes: determining a similarity value between the requirement sub-tag and the target sub-tag of each document; and, if the similarity value meets a set condition, using the document corresponding to that target sub-tag as a document to be filtered.
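The two-stage search (a coarse match on sub-tags, then filtering by the requirement attribute) can be sketched as follows. The similarity measure (`difflib.SequenceMatcher.ratio`), the threshold of 0.8 used as the "set condition", and modeling the requirement attribute as a caller-supplied predicate are all assumptions; the disclosure does not specify them.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from typing import Callable


def similarity(a: str, b: str) -> float:
    """Similarity value in [0, 1]; SequenceMatcher is an assumed stand-in."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def documents_to_filter(
    documents: list[dict],
    requirement_sub_tag: str,
    target_parent: str,
    threshold: float = 0.8,
) -> list[dict]:
    """Stage 1: keep documents whose target sub-tag is similar enough."""
    selected = []
    for doc in documents:
        target_sub_tag = doc["tags"].get(target_parent)
        if target_sub_tag is None:
            continue  # document has no sub-tag under the target parent tag
        if similarity(requirement_sub_tag, target_sub_tag) >= threshold:  # set condition
            selected.append(doc)
    return selected


def target_documents(
    candidates: list[dict], meets_attribute: Callable[[dict], bool]
) -> list[dict]:
    """Stage 2: apply the requirement attribute, modeled here as a predicate."""
    return [doc for doc in candidates if meets_attribute(doc)]
```

A looser threshold widens the candidate set at the cost of more work in the second stage.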
  • In a third aspect, the apparatus for constructing a document search platform proposed by embodiments of the present disclosure includes: a first acquisition module configured to acquire a document to be processed, where the document to be processed has a corresponding document type; a second acquisition module configured to acquire tag information corresponding to the document to be processed; a building module configured to build, based on the tag information and the document to be processed, a target document library corresponding to the document type; and a forming module configured to form a target document search platform based on the target document library.
  • In some embodiments, the second acquisition module includes: a first determination sub-module configured to determine a parent tag corresponding to the document to be processed; a parsing sub-module configured to parse, from the document to be processed, a child tag corresponding to the parent tag; and a processing sub-module configured to use the parent tag and the child tag together as the tag information.
  • In some embodiments, the parsing sub-module is further configured to: call an NLP service in the field of artificial intelligence to identify, from the document to be processed, a document universal index corresponding to the parent tag, and use the document universal index as the child tag; and/or call the NLP service to identify, from the document to be processed, an associated entity value corresponding to the parent tag, and use the associated entity value as the child tag.
  • In some embodiments, the building module includes: a second determination sub-module configured to call an RPA robot to determine an initial document library corresponding to the document type; and a storage sub-module configured to store the tag information and the document to be processed in the initial document library to form the target document library.
  • In some embodiments, the storage sub-module is further configured to: obtain a target loading type corresponding to the document to be processed; and store the tag information and the document to be processed in the initial document library using a target document storage method corresponding to the target loading type.
  • In some embodiments, the storage sub-module is further configured to: if the target loading type is a document loading type, store the document to be processed and the corresponding tag information in the target document library; and/or if the target loading type is a link loading type, store the access link of the document to be processed and the corresponding tag information in the target document library; and/or if the target loading type is a rich text loading type, edit the document to be processed through a rich text editor and store the editing result and the corresponding tag information in the target document library.
  • In some embodiments, the second acquisition module further includes a configuration sub-module configured to configure an attribute for the parent tag and use the configured attribute as tag information, where the attribute identifies whether the parent tag participates in document search.
  • The document search platform is constructed using artificial intelligence (AI) and robotic process automation (RPA).
  • In a fourth aspect, embodiments of the present disclosure propose a document search apparatus applied to a document search platform constructed by the construction apparatus of the third aspect.
  • The document search apparatus includes: a receiving module configured to receive a document search request; a parsing module configured to parse a requirement document type and requirement tag information from the document search request; a determination module configured to determine, from multiple document libraries, a target document library corresponding to the requirement document type, where the multiple document libraries belong to the document search platform and each document library stores documents of the corresponding document type; and a search module configured to search the target document library for a target document corresponding to the requirement tag information.
  • In some embodiments, the requirement tag information includes a requirement attribute and a requirement sub-tag; the target document library has multiple corresponding parent tags, each parent tag has corresponding sub-tags, and the sub-tags describe the documents. The search module includes: a third determination sub-module configured to call an NLP service in the field of artificial intelligence to process the requirement attribute, so as to determine a target parent tag from the multiple parent tags, where the target parent tag has a corresponding target sub-tag; and a search sub-module configured to search the target document library for the target document according to the requirement attribute, the requirement sub-tag, and the target sub-tag.
  • In some embodiments, the target document library includes multiple documents, and the search sub-module is further configured to: call an RPA robot to search the multiple documents for documents to be filtered according to the requirement sub-tag and the target sub-tag; and filter out the target document from the documents to be filtered according to the requirement attribute.
  • In some embodiments, the search sub-module is further configured to: determine a similarity value between the requirement sub-tag and the target sub-tag of each document; and, if the similarity value meets a set condition, use the document corresponding to that target sub-tag as a document to be filtered.
  • An electronic device provided by embodiments of the present disclosure includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the method provided by the embodiments of the first aspect is implemented.
  • Embodiments of the present disclosure further provide a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the method for constructing a document search platform provided by the embodiments of the first aspect, or the document search method provided by the embodiments of the second aspect, is implemented.
  • FIG. 1 is a schematic flowchart of a method for building a document search platform proposed by an embodiment of the present disclosure.
  • FIG. 2 is a schematic flowchart of a method for building a document search platform proposed by another embodiment of the present disclosure.
  • FIG. 3 is a schematic flowchart of a method for building a document search platform proposed by another embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of a document library construction interface proposed by an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of a document storage management interface proposed by an embodiment of the present disclosure.
  • FIG. 6A is a schematic diagram of a document storage interface of the document loading type proposed by an embodiment of the present disclosure.
  • FIG. 6B is a schematic diagram of a document storage interface of the link loading type proposed by an embodiment of the present disclosure.
  • FIG. 6C is a schematic diagram of a document storage interface of the rich text loading type proposed by an embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram of a tag information configuration interface proposed by an embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of an attribute configuration interface proposed by an embodiment of the present disclosure.
  • FIG. 9 is a schematic flowchart of a document search method proposed by an embodiment of the present disclosure.
  • FIG. 10 is a schematic flowchart of a document search method proposed by another embodiment of the present disclosure.
  • FIG. 11 is a schematic diagram of a document attribute editing interface proposed by an embodiment of the present disclosure.
  • FIG. 12 is a schematic diagram of a document search interface proposed by an embodiment of the present disclosure.
  • FIG. 13 is a schematic diagram of a document screening interface proposed by an embodiment of the present disclosure.
  • FIG. 14 is a schematic structural diagram of a device for building a document search platform proposed by an embodiment of the present disclosure.
  • FIG. 15 is a schematic structural diagram of a device for building a document search platform proposed by another embodiment of the present disclosure.
  • FIG. 16 is a schematic structural diagram of a document search device proposed by an embodiment of the present disclosure.
  • FIG. 17 is a schematic structural diagram of a document search device according to another embodiment of the present disclosure.
  • FIG. 18 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present disclosure.
  • The term "plurality" means two or more.
  • The term "document to be processed" refers to a document that is currently to be processed, such as a professional knowledge document or an enterprise information document.
  • Document type: documents to be processed can be divided into multiple document types according to different classification criteria, for example, a medical document type or a legal document type.
  • Tag information refers to information describing the tags of a document to be processed, for example, feature information of a tag or content information of a tag.
  • Parent tag refers to a tag preset for the document to be processed.
  • A parent tag can be, for example, an associated entity reused from another platform.
  • Document universal index refers to index information applicable to all documents to be processed, such as document type, document size, document name, document storage address, and document update time.
  • Attribute refers to information describing a parent tag, such as the tag name, the tag data format, an indication of whether the tag may be modified, and an indication of whether the tag participates in search.
  • Child tag refers to the specific document content corresponding to a parent tag.
  • Associated entity value refers to information that specifically describes the corresponding parent tag, for example, feature information or content information of the associated entity.
  • Initial document library refers to the document library, among multiple document libraries, that corresponds to the document type of the document to be processed at the initial stage of executing the construction method of the document search platform.
  • Document search request refers to a request, sent by a user-side electronic device, that triggers a document search in the document search platform.
  • The term "requirement document type" refers to the document type a user needs to search when performing a document search; it characterizes the document search needs of the user's business scenario.
  • The term "requirement tag information" refers to the tag information a user needs to search when performing a document search, such as the name of the document to be found or keywords of the document content.
  • FIG. 1 is a schematic flowchart of a method for building a document search platform proposed by an embodiment of the present disclosure.
  • This embodiment is illustrated with the construction method of the document search platform configured in a construction apparatus of the document search platform. The construction apparatus may be provided in a server or in an electronic device, which is not limited in the embodiments of the present disclosure.
  • The construction method of the document search platform includes:
  • The document currently to be processed is called the document to be processed. During execution of the construction method, the document to be processed is used to help build the document search platform. Specific examples include professional knowledge documents and enterprise information documents, without limitation.
  • Documents to be processed can be divided into multiple document types according to different classification criteria, for example, according to the application scenarios to which they belong. A document type may be, for example, a medical document type or a legal document type, without limitation.
  • For example, the document search platform may provide a data transmission interface in advance, obtain documents published in different offline business scenarios through that interface, and use them as documents to be processed, without limitation.
  • Alternatively, data transmission links between the platforms of different offline business scenarios and the document search platform can be established in advance. When a new document is released on one of these platforms, a corresponding document transfer instruction is generated, and this instruction triggers the offline business scenario platform to transfer the newly released document to the document search platform. Any other possible method of obtaining documents to be processed may also be used, without limitation.
  • The embodiments of the present disclosure can annotate each document to be processed according to the business scenario to which it belongs. For example, when a document to be processed is obtained from a medical scenario, it is marked as a medical document type, without limitation.
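The push-based acquisition and the scenario-based type annotation fit naturally together. In this minimal Python sketch, `ScenarioPlatform`, `DocumentSearchPlatform`, and the `SCENARIO_TO_TYPE` mapping are hypothetical; a plain callback plays the role of the pre-established data transmission link and the document transfer instruction.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical mapping from business scenario to document type annotation.
SCENARIO_TO_TYPE = {"medical": "medical document type", "legal": "legal document type"}


@dataclass
class DocumentSearchPlatform:
    pending: list[dict] = field(default_factory=list)  # documents to be processed

    def receive(self, scenario: str, document: str) -> None:
        # Annotate the document type from the scenario it came from.
        doc_type = SCENARIO_TO_TYPE.get(scenario, "unclassified")
        self.pending.append({"document": document, "doc_type": doc_type})


@dataclass
class ScenarioPlatform:
    """An offline business-scenario platform that publishes documents."""
    scenario: str
    links: list[Callable[[str, str], None]] = field(default_factory=list)

    def connect(self, receiver: Callable[[str, str], None]) -> None:
        # Pre-established data transmission link to the search platform.
        self.links.append(receiver)

    def publish(self, document: str) -> None:
        # Releasing a document triggers a transfer over every link.
        for receiver in self.links:
            receiver(self.scenario, document)
```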
  • A tag describes the basic attributes and characteristics of a document and can be used to index and manage the structured field information of the document. Specific examples of tags are the document name and the document update time, without limitation.
  • The information describing the tags of the document to be processed is called tag information, for example, feature information of a tag or content information of a tag, without limitation.
  • The tags of the document to be processed can be identified to obtain the corresponding tag information, for example, by performing entity recognition on the document to be processed.
  • For example, the document to be processed can be input into a pre-trained artificial intelligence (AI) model that supports entity recognition. The AI model performs entity recognition on the document to obtain multiple pieces of entity information, and the entity information is used as the tag information corresponding to the document to be processed, without limitation.
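The entity-recognition route can be sketched with a pluggable recognizer. In the Python sketch below, `EntityRecognizer` is an assumed interface for the pre-trained AI model and `RuleBasedRecognizer` is a toy regex stand-in, not a trained model; both names are hypothetical. The test sentence is the example text given later in this description.

```python
from __future__ import annotations

import re
from typing import Protocol


class EntityRecognizer(Protocol):
    """Assumed interface for the pre-trained AI model."""

    def recognize(self, text: str) -> list[tuple[str, str]]: ...


class RuleBasedRecognizer:
    """Toy stand-in for the AI model; NOT a trained entity recognizer."""

    PATTERNS = {
        "research time": re.compile(r"\b(?:19|20)\d{2}\b"),
        "disease": re.compile(r"\binfluenza\b", re.IGNORECASE),
        "research object": re.compile(r"\bchildren\b", re.IGNORECASE),
    }

    def recognize(self, text: str) -> list[tuple[str, str]]:
        found = []
        for label, pattern in self.PATTERNS.items():
            found.extend((label, m.group(0)) for m in pattern.finditer(text))
        return found


def tag_info_from_entities(text: str, model: EntityRecognizer) -> dict[str, list[str]]:
    """Use the recognized entities, de-duplicated per label, as tag information."""
    info: dict[str, list[str]] = {}
    for label, value in model.recognize(text):
        values = info.setdefault(label, [])
        if value not in values:
            values.append(value)
    return info
```

Because `tag_info_from_entities` depends only on the `recognize` method, a real model client could replace the toy recognizer without changing the calling code.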
  • Alternatively, after the document to be processed is obtained, its tag information can be identified by feature parsing, for example, by applying a feature parsing algorithm to the document to be processed to obtain its tag information.
  • A document library corresponding to the document type can then be built based on the tag information and the document to be processed; this document library is called the target document library.
  • For example, a document library for each document type can be built in advance in the document search platform and annotated with its document type. After a document to be processed is obtained, it is stored in the document library of its document type, and its tag information is configured alongside it, thereby constructing the target document library, without limitation.
  • The target document library stores documents to be processed of the corresponding document type together with their tag information. That is, building the target document library corresponding to the document type based on the tag information and the documents to be processed may consist of storing the obtained documents of the same document type, together with their tag information, in one document library, without limitation.
  • Because each constructed target document library stores the documents to be processed and tag information of only one document type, a target document search platform can subsequently be formed based on the target document libraries. The target document library whose document type matches the actual business scenario can then be called from the target document search platform, so that documents found in that library effectively meet the document search needs of that business scenario.
  • S104 Form a target document search platform based on the target document library.
  • After the target document library corresponding to the document type is built based on the tag information and the document to be processed, the document search platform can be processed according to the target document library, and the resulting document search platform is used as the target document search platform.
  • For example, after the target document library is constructed, it is deployed in the document search platform and labeled according to the business scenario of the documents to be processed that it contains; for example, it can be labeled as a medical document library or a legal document library, without limitation. The document search platform obtained through this labeling is used as the target document search platform.
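Deployment plus scenario labeling can be pictured as registering each library in a lookup keyed by business scenario. This is a hypothetical in-memory sketch; `TargetDocumentLibrary`, `TargetSearchPlatform`, and the label format are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TargetDocumentLibrary:
    doc_type: str
    documents: list[str] = field(default_factory=list)
    label: str | None = None  # business-scenario label, set on deployment


@dataclass
class TargetSearchPlatform:
    libraries: dict[str, TargetDocumentLibrary] = field(default_factory=dict)

    def deploy(self, library: TargetDocumentLibrary, scenario: str) -> None:
        # Label the library by business scenario, then register it.
        library.label = f"{scenario} document library"
        self.libraries[scenario] = library

    def library_for(self, scenario: str) -> TargetDocumentLibrary:
        # A scenario reuses the platform by looking up its own library.
        return self.libraries[scenario]
```

Adding a new business scenario then means deploying one more library, not building a new platform.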
  • Any other possible method may also be used to form the target document search platform based on the target document library, without limitation.
  • The constructed target document search platform can provide document search services for different business scenarios based on multiple target document libraries corresponding to those scenarios. When a business scenario needs to search for documents, the target document search platform can be called directly instead of building a new document search platform for that scenario. This effectively improves the reusability of the document search platform, so that the constructed target document search platform can meet the document search needs of different business scenarios.
  • The embodiments of the present disclosure can effectively combine RPA and AI to achieve intelligent automation (IA) of the process of building the document search platform, thereby improving its level of automation and reducing labor costs.
  • In this embodiment, a document to be processed that has a corresponding document type is obtained, tag information corresponding to the document is obtained, a target document library corresponding to the document type is built based on the tag information and the document, and a target document search platform is formed based on the target document library. Because the target document search platform is formed from target document libraries corresponding to document types, it can provide document search services over the library of the corresponding document type for each business scenario. This effectively improves the reusability of the document search platform, so that the constructed platform can meet the document search needs of different business scenarios.
  • FIG. 2 is a schematic flowchart of a method for building a document search platform proposed by another embodiment of the present disclosure.
  • The method for building the document search platform includes:
  • Tags preset for the document to be processed may be referred to as parent tags.
  • The parent tags may be associated entities reused from other platforms, tags obtained in advance from a tag library, or index information applicable to all documents to be processed; this is not limited.
  • The document to be processed may specifically be text containing entities, for example: "In the 2021 survey of influenza among children, it was found that the occurrence of influenza shows a certain seasonality."
  • Entities may include diseases, research objects, research time, and the like; this is not limited.
  • An associated entity is an entity, obtained from another platform, that can be reused for the document to be processed.
  • For example, the associated entity may be an entity reused from a corresponding medical business platform.
  • Obtaining the associated entity corresponding to the document to be processed may consist of obtaining, through the data transmission interface of the document search platform, an associated entity on another platform that can be reused for the document to be processed, and then using the reused associated entity as the parent tag corresponding to the document to be processed; this is not limited.
  • Index information refers to structured information related to all documents participating in the search.
  • The index information may be, for example, the document type, document size, document name, document storage address, or document update time; this is not limited.
  • Determining the parent tag corresponding to the document to be processed may consist of: after the document to be processed is determined, obtaining through the data transmission interface of the document search platform an associated entity that can be reused from another corresponding business platform and using it as the parent tag; or determining tags corresponding to the document to be processed in another way; or, after the document to be processed is obtained, obtaining tags from the tag library through the data transmission interface of the document search platform and using the obtained tags as the parent tag corresponding to the document to be processed. This is not limited.
  • S203 Configure attributes for the parent tag, and use the configured attributes as tag information.
  • Information that describes the properties of the parent tag may be referred to as attributes.
  • The attributes may be specific information such as the tag name, the tag data format, an indication of whether the tag may be modified, and an indication of whether the tag participates in search; this is not limited. The attributes are used to determine whether the parent tag participates in document searches.
  • In the embodiments of the present disclosure, the various attributes of the parent tag can be configured to meet the document search requirements of different business scenarios.
  • the attribute configuration can be, for example, tag classification, tag
  • By configuring corresponding attributes for the parent tag and using the attributes as tag information, the parent tag can be flexibly configured and modified based on its attributes, so that the document tag information in the document search platform can effectively meet the document search needs of different business scenarios.
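  • A minimal sketch of a parent tag whose attributes are configured and then used as tag information. The class names, the field names (`data_format`, `modifiable`, `searchable`) and the `source` values are hypothetical illustrations of the attributes listed above, not the platform's actual data model.

```python
from dataclasses import dataclass


@dataclass
class TagAttributes:
    """Hypothetical attribute set of a parent tag; field names are illustrative."""
    name: str
    data_format: str = "text"
    modifiable: bool = True   # whether the tag may be modified
    searchable: bool = True   # whether the tag participates in document search


@dataclass
class ParentTag:
    source: str               # e.g. "associated_entity", "tag_library" or "index_info"
    attributes: TagAttributes

    def configure(self, **changes):
        """Update attributes in place, refusing changes to a non-modifiable tag."""
        if not self.attributes.modifiable:
            raise PermissionError(f"tag {self.attributes.name!r} is not modifiable")
        for key, value in changes.items():
            if not hasattr(self.attributes, key):
                raise AttributeError(f"unknown attribute {key!r}")
            setattr(self.attributes, key, value)
        return self


def tag_information(parent_tags):
    """Use the configured attributes of each parent tag as its tag information."""
    return [dict(vars(tag.attributes)) for tag in parent_tags]


disease = ParentTag("associated_entity", TagAttributes("disease"))
disease.configure(searchable=False)
print(tag_information([disease]))
```

  • Keeping "modifiable" as an attribute of the tag itself means the same check guards every later reconfiguration, which mirrors the statement that tags can be modified only as their attributes allow.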
  • For example, the document to be processed is: "In the 2021 survey of influenza among children, it was found that the occurrence of influenza shows a certain seasonality",
  • and the parent tags are: "disease, research object, research time, etc."
  • A child tag may be the specific document content corresponding to the parent tag; for example, the child tags corresponding to the parent tags may be: "disease - influenza, research object - children, research time - 2021". This is not limited.
  • Obtaining the child tag corresponding to the parent tag from the document to be processed may consist of: after the document to be processed is obtained and the corresponding parent tag is determined, inputting the document to be processed and the parent tag into a pre-trained Global Pointer model to obtain the child tag, output by the Global Pointer model, that corresponds to the parent tag; this is not limited.
  • The Global Pointer model is an artificial intelligence model based on rotary position embedding (RoPE, a relative position encoding).
  • This model supports extracting information from documents. Alternatively, the artificial intelligence model that extracts the corresponding child tags may be any other model capable of extracting information from documents; this is not limited.
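  • The Global Pointer approach is commonly described as scoring every candidate span (start i, end j) by the inner product of a RoPE-rotated query at position i and key at position j, so the score depends only on the relative offset j - i; spans with a positive score are taken as entities. Below is a minimal pure-Python sketch for a single tag type with random, untrained weights. The helper names, the weight shapes and the 0.0 threshold are illustrative assumptions, not the trained model of this disclosure.

```python
import math
import random


def rope(vec, pos, base=10000.0):
    """Rotary position embedding: rotate each (even, odd) pair by pos * theta_t."""
    d = len(vec)
    out = list(vec)
    for t in range(d // 2):
        angle = pos * base ** (-2 * t / d)
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = vec[2 * t], vec[2 * t + 1]
        out[2 * t] = x0 * c - x1 * s
        out[2 * t + 1] = x0 * s + x1 * c
    return out


def project(h, w):
    """Multiply a hidden vector h (length H) by a weight matrix w (H x D)."""
    return [sum(h[r] * w[r][c] for r in range(len(h))) for c in range(len(w[0]))]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def global_pointer_scores(hidden, w_q, w_k):
    """Score every span (start i, end j); spans with j < i are masked to -inf."""
    n, d = len(hidden), len(w_q[0])
    q = [rope(project(h, w_q), i) for i, h in enumerate(hidden)]
    k = [rope(project(h, w_k), j) for j, h in enumerate(hidden)]
    return [
        [dot(q[i], k[j]) / math.sqrt(d) if j >= i else -math.inf for j in range(n)]
        for i in range(n)
    ]


def decode_spans(scores, threshold=0.0):
    """Return (start, end) token spans whose score exceeds the threshold."""
    return [(i, j) for i, row in enumerate(scores)
            for j, s in enumerate(row) if s > threshold]


if __name__ == "__main__":
    rng = random.Random(0)
    H, D, N = 6, 4, 5  # hypothetical hidden size, head size and token count
    hidden = [[rng.gauss(0, 1) for _ in range(H)] for _ in range(N)]
    w_q = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(H)]
    w_k = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(H)]
    print(decode_spans(global_pointer_scores(hidden, w_q, w_k)))
```

  • In a real model, `hidden` would come from a pretrained encoder and there would be one pair of projections per tag type; the sketch only shows how the relative-position scoring and span masking fit together.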
  • Alternatively, parsing the child tag corresponding to the parent tag from the document to be processed may consist of calling a natural language processing (NLP) service in the field of artificial intelligence to identify, from the document to be processed, the child tag corresponding to the parent tag.
  • In some embodiments, the document to be processed may be parsed to obtain the document general index corresponding to the parent tag, and the document general index is used as the child tag. This allows the document to be processed to be parsed accurately, and because the parsed document general index is adapted to the parent tag, using it as the child tag effectively improves how well the child tag is determined.
  • The document general index refers to information that specifically describes the corresponding parent tag.
  • The document general index may be, for example, feature information or content information of the corresponding parent tag; this is not limited.
  • For example, when the parent tag is "document update time", the corresponding document general index may be "April 20, 2022"; this is not limited.
  • The document to be processed may be parsed according to the parent tag (the parsing method may be, for example, semantic parsing or model-based parsing; this is not limited) to obtain the document general index corresponding to the parent tag, and the document general index is used as the child tag.
  • Alternatively, to parse the child tag corresponding to the parent tag from the document to be processed, a natural language processing (NLP) service may be called to process the document to be processed; this is not limited.
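  • For index-type parent tags, the document general index can often be read straight from document metadata rather than from the text. A minimal sketch, assuming each document carries a metadata dictionary; the `INDEX_FIELDS` mapping and its field names are hypothetical, and dates are rendered in the "April 20, 2022" style of the example above.

```python
from datetime import date

# Hypothetical mapping from index-type parent tags to metadata field names.
INDEX_FIELDS = {
    "document type": "doc_type",
    "document name": "name",
    "document size": "size_bytes",
    "document update time": "updated",
}


def general_index(parent_tag, metadata):
    """Return the document general index (child tag value) for an index-type parent tag."""
    field = INDEX_FIELDS.get(parent_tag)
    if field is None or field not in metadata:
        return None  # not an index-type tag, or the metadata lacks the field
    value = metadata[field]
    if isinstance(value, date):
        # Month name via strftime's %B, then day and year without zero padding.
        return f"{value:%B} {value.day}, {value.year}"
    return str(value)


print(general_index("document update time", {"updated": date(2022, 4, 20)}))
```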
  • In other embodiments, the associated entity value corresponding to the parent tag (for example, an associated entity) may be extracted from the document to be processed by using an entity recognition model; that is, the document to be processed and the corresponding parent tag (for example, an associated entity) may be input into the entity recognition model to obtain the associated entity value, output by the model, that corresponds to the parent tag.
  • Alternatively, the NLP service may be called to identify, from the document to be processed, the associated entity value corresponding to the parent tag, and the associated entity value is used as the child tag. In this way the associated entity value can be parsed accurately from the document to be processed, and because the parsed value is adapted to the parent tag, using it as the child tag effectively improves how well the child tag is determined.
  • The associated entity value refers to information that specifically describes the corresponding parent tag (for example, an associated entity).
  • The associated entity value may be, for example, feature information or content information of the associated entity; this is not limited.
  • For example, the corresponding associated entity value may be "flu, cold"; this is not limited.
  • Alternatively, a natural language processing (NLP) service may be called to process the document to be processed, so as to identify from it the associated entity value corresponding to the parent tag and use that value as the child tag; this is not limited.
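  • The entity recognition model and NLP service are not specified further, so the following is only a dictionary-lookup stand-in that shows the input and output of this step: the document and its parent tags go in, and associated entity values come out as child tags. The `LEXICON` contents and the year pattern used for "research time" are hypothetical.

```python
import re

# Hypothetical lexicon: parent tag (associated entity) -> surface forms of its values.
LEXICON = {
    "disease": ["influenza", "flu", "cold"],
    "research object": ["children", "adults", "elderly"],
}
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_child_tags(text, parent_tags):
    """Map each parent tag to the entity values found in the text (stand-in for an NER model)."""
    lowered = text.lower()
    child_tags = {}
    for tag in parent_tags:
        if tag == "research time":
            values = YEAR.findall(text)
        else:
            # Whole-word match, so "flu" does not fire inside "influenza".
            values = [v for v in LEXICON.get(tag, [])
                      if re.search(rf"\b{re.escape(v)}\b", lowered)]
        if values:
            child_tags[tag] = values
    return child_tags


doc = ("In the 2021 survey of influenza among children, it was found that "
       "the occurrence of influenza shows a certain seasonality.")
print(extract_child_tags(doc, ["disease", "research object", "research time"]))
```

  • A trained entity recognition model would replace the lexicon lookup; the surrounding contract (document plus parent tags in, child tags out) stays the same.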
  • S205 Use the parent tag and the child tag together as tag information.
  • After the child tag is obtained, the parent tag and the child tag may be used together as tag information, and the combined tag information is then used in the subsequent steps of the method for building the document search platform; for details, see the subsequent embodiments.
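  • A minimal sketch of combining parent tags and child tags into tag information, rendered in the "parent - child" form of the example given earlier ("disease - influenza, research object - children, research time - 2021"). The pair-list representation is an assumption made for illustration.

```python
def combine_tag_information(parent_tags, child_tags):
    """Pair every parent tag with each of its child tags, keeping parent order."""
    return [(parent, child)
            for parent in parent_tags
            for child in child_tags.get(parent, [])]


def render(tag_information):
    """Format tag information as 'parent - child' items separated by commas."""
    return ", ".join(f"{parent} - {child}" for parent, child in tag_information)


info = combine_tag_information(
    ["disease", "research object", "research time"],
    {"disease": ["influenza"], "research object": ["children"], "research time": ["2021"]},
)
print(render(info))
```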
  • S207 Form a target document search platform based on the target document library.
  • In this embodiment, the document to be processed, which has a corresponding document type, is obtained; the parent tag corresponding to the document to be processed is determined; and the child tag corresponding to the parent tag is parsed from the document to be processed. When the parent tag and the child tag are used together as tag information, the tag information accurately characterizes both the parent tag and its child tag, which effectively improves the comprehensiveness and accuracy of the tag information.
  • The document search platform can therefore assist the user in document search along the two dimensions of parent tag and child tag. Corresponding attributes are then configured for the parent tag and used as tag information, so that parent tags can be flexibly configured and modified based on their attributes and the document tag information in the platform can meet the document search needs of different business scenarios. Finally, a target document library corresponding to the document type is built based on the tag information and the document to be processed, and a target document search platform is formed based on that library, which effectively improves the reusability of the document search platform and enables it to meet the document search needs of different business scenarios.
  • FIG. 3 is a schematic flowchart of a method for building a document search platform proposed by another embodiment of the present disclosure.
  • A document library corresponding to the document type of the document to be processed is selected from multiple document libraries; this library may be referred to as the initial document library.
  • The initial document library is used in the subsequent steps to assist in building the target document library; for details, see the subsequent embodiments.
  • Determining the initial document library corresponding to the document type may consist of calling a robotic process automation (RPA) robot to automatically annotate a document library with the document type; after annotation, the document library can be used only to store documents of the corresponding document type.
  • The annotated document library may be referred to as the initial document library.
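  • A minimal sketch of selecting an initial document library by document type. The RPA robot itself is not modeled: the hypothetical `annotate()` method stands in for its annotation step, after which the library accepts only documents of that type. All names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DocumentLibrary:
    name: str
    document_type: Optional[str] = None  # None until the library is annotated
    documents: List[str] = field(default_factory=list)

    def annotate(self, document_type: str) -> "DocumentLibrary":
        """Stand-in for the RPA robot's annotation step."""
        if self.document_type is not None:
            raise ValueError(f"library {self.name!r} is already annotated")
        self.document_type = document_type
        return self

    def add(self, doc_type: str, document: str) -> None:
        """Accept only documents of the annotated document type."""
        if doc_type != self.document_type:
            raise ValueError(f"library {self.name!r} stores {self.document_type!r} documents only")
        self.documents.append(document)


def initial_library(libraries: List[DocumentLibrary], document_type: str) -> DocumentLibrary:
    """Reuse a library already annotated with the type, else annotate a free one."""
    for lib in libraries:
        if lib.document_type == document_type:
            return lib
    for lib in libraries:
        if lib.document_type is None:
            return lib.annotate(document_type)
    raise LookupError(f"no free document library for {document_type!r}")
```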
  • S304 Store tag information and documents to be processed in the initial document library to form a target document library.
  • tag information and documents to be processed can be stored in the initial document library to form a target document library.
  • Storing the tag information and the document to be processed in the initial document library to form the target document library may consist of obtaining the target loading type corresponding to the document to be processed and storing the tag information and the document to be processed in the initial document library using the target document storage method corresponding to that target loading type.
  • This enables adaptive storage of the document to be processed with a storage method that matches it, effectively meeting the storage requirements of documents to be processed of different target loading types.
  • Because the target loading type of documents in the initial document library need not be limited to a single format, the range of documents supported by the document search platform can be greatly expanded.
  • The document to be processed may be loaded in different ways; the way it is loaded may be referred to as the target loading type.
  • The target loading type may be, for example, a document loading type, a link loading type, or a rich text loading type; this is not limited.
  • Different target loading types may have corresponding document storage methods, and the document storage methods may be called target document storage methods.
  • For example, the target document storage method corresponding to the document format may be used to store the tag information and the document to be processed in the initial document library: in some cases the document to be processed is stored in the initial document library directly, whereas when the document to be processed is an image, its text may be recognized using optical character recognition (OCR) and the recognized text stored in the initial document library. This is not limited.
  • Specifically, when the target loading type is the document loading type, the document to be processed and the corresponding tag information may be stored in the target document library; and/or, when the target loading type is the link loading type, the access link corresponding to the document to be processed and the corresponding tag information may be stored in the target document library; and/or, when the target loading type is the rich text loading type, the document to be processed may be edited with a rich text editor, and the editing result and the corresponding tag information stored in the target document library.
  • The document loading type means that the document to be processed can be loaded into the target document library directly from the device on which the target document library resides; in this case, the document to be processed and the corresponding tag information can be stored from that device's local storage into the target document library.
  • The link loading type refers to an external link (for example, a Uniform Resource Locator (URL)): the original file of the document to be processed does not exist locally on the device on which the target document library resides, and the external link allows jumping to the corresponding document to be processed. In this case, the external link and the corresponding tag information may be stored in the target document library.
  • The rich text loading type means that the document to be processed is loaded as an image, audio, video, or the like.
  • In this case, a rich text editor may be used to edit the document to be processed to obtain a corresponding editing result, and the editing result and the corresponding tag information are stored in the target document library.
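  • A minimal sketch of choosing the storage method by target loading type. The OCR engine and the rich text editor are passed in as hypothetical callbacks, treating `bytes` payloads of the document loading type as images is an assumption, and the library is modeled as a plain list of records.

```python
from enum import Enum
from typing import Callable, List, Optional


class LoadingType(Enum):
    DOCUMENT = "document"
    LINK = "link"
    RICH_TEXT = "rich_text"


def store(library: List[dict], loading_type: LoadingType, payload, tag_info,
          ocr: Optional[Callable[[bytes], str]] = None,
          rich_text_editor: Optional[Callable[[bytes], str]] = None) -> dict:
    """Append one record to `library` using the storage method for the loading type."""
    if loading_type is LoadingType.DOCUMENT:
        # Text is stored directly; image bytes are first converted with the OCR callback.
        if isinstance(payload, bytes):
            if ocr is None:
                raise ValueError("image documents need an OCR callback")
            payload = ocr(payload)
        record = {"content": payload}
    elif loading_type is LoadingType.LINK:
        # Only the external link is kept; the original file stays where it is.
        record = {"link": payload}
    elif loading_type is LoadingType.RICH_TEXT:
        if rich_text_editor is None:
            raise ValueError("rich text documents need an editor callback")
        record = {"content": rich_text_editor(payload)}
    else:
        raise ValueError(f"unsupported loading type: {loading_type!r}")
    record["tags"] = tag_info
    library.append(record)
    return record
```

  • Passing OCR and editing as callbacks keeps the dispatch logic independent of any particular OCR engine or editor.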
  • Because the initial document library corresponding to the document type is determined first, the tag information and the document to be processed of a given document type can be stored accurately in the initial document library of that type, so that the document type of the resulting target document library matches the document type of the document to be processed, effectively improving the construction of the target document library.
  • S305 Form a target document search platform based on the target document library.
  • The method for building the document search platform described in the embodiments of the present disclosure is further illustrated below with reference to schematic diagrams.
  • First, the document to be processed and the initial document library corresponding to its document type can be obtained (the initial document library may be created in advance through the document library construction interface of the document search platform; see FIG. 4, which is a schematic diagram of a document library construction interface proposed by an embodiment of the present disclosure), and the tag information corresponding to the document to be processed is obtained.
  • FIG. 5 is a schematic diagram of the document storage management interface proposed by an embodiment of the present disclosure.
  • The document storage interfaces are shown in FIG. 6A, FIG. 6B, and FIG. 6C:
  • FIG. 6A is a schematic diagram of a document storage interface for the document loading type proposed by an embodiment of the present disclosure;
  • FIG. 6B is a schematic diagram of a document storage interface for the link loading type proposed by an embodiment of the present disclosure;
  • FIG. 6C is a schematic diagram of a document storage interface for the rich text loading type proposed by an embodiment of the present disclosure.
  • The document to be processed is stored through the corresponding document storage interface.
  • Corresponding tag information may then be configured for the document to be processed in the initial document library (for example, see FIG. 7, which is a schematic diagram of a tag information configuration interface proposed by an embodiment of the present disclosure: the user can click the edit item on the interface and perform the tag information configuration operation under it to configure the corresponding tag information for the document to be processed).
  • The attribute configuration interface of a tag also supports configuring corresponding attributes for the parent tag (see FIG. 8, which is a schematic diagram of the attribute configuration interface proposed by an embodiment of the present disclosure).
  • In this embodiment, the document to be processed, which has a corresponding document type, is obtained together with its tag information, and the initial document library corresponding to the document type is then determined. This allows the tag information and the document to be processed to be stored accurately in the initial document library of the corresponding document type, so that the document type of the target document library matches that of the document to be processed. This effectively improves the construction of the target document library, so that when the target document search platform is formed based on it, the platform can meet the document search needs of different business scenarios.
  • FIG. 9 is a schematic flowchart of a document search method proposed by an embodiment of the present disclosure.
  • In this embodiment, the document search method is described, by way of example, as being configured in a document search apparatus.
  • The document search apparatus may be provided in a server or in an electronic device; the embodiments of the present disclosure do not limit this.
  • the document search method includes:
  • A request made by the user-side electronic device to trigger a document search in the document search platform may be referred to as a document search request.
  • The document search request may be received through a data transmission interface provided in advance by the target document search platform, that is, the document search request sent by the user-side device is received via this interface; this is not limited.
  • Alternatively, a corresponding monitoring apparatus may be set up in advance in the target document search platform to monitor the user-side device, and when the user-side device generates a document search request, the request is received; this is not limited.
  • S902 Parse the requirement document type and requirement tag information from the document search request.
  • That is, the embodiments of the present disclosure can parse the document type and tag information required by the user from the document search request; the required document type may be referred to as the requirement document type.
  • The requirement document type characterizes the user's search requirements for documents in the user's business scenario. For example, when the user is in a medical business scenario, the document type required for the document search may be determined to be the medical document type; this is not limited.
  • The requirement tag information may be, for example, the name of the document the user needs to search for, or keywords of the document content; this is not limited.
  • For example, the requirement document type may be the medical document type, and the requirement tag information may be keyword information in the document search request, such as "influenza, children, 2021"; this is not limited.
  • parsing the requirement document type and requirement tag information from the document search request may include performing semantic parsing processing on the document search request to obtain the requirement document type and requirement tag information.
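  • The semantic parsing step is not specified further. As a stand-in, the sketch below assumes a hypothetical request shape with a `scenario` field and a comma-separated `query` field, maps the scenario to a requirement document type through an illustrative table, and splits the query into requirement tag information.

```python
# Hypothetical mapping from business scenario to requirement document type.
SCENARIO_TO_TYPE = {"medical": "medical document", "legal": "legal document"}


def parse_search_request(request):
    """Return (requirement document type, requirement tag list) from a request dict."""
    scenario = request.get("scenario", "").strip().lower()
    doc_type = SCENARIO_TO_TYPE.get(scenario)
    if doc_type is None:
        raise ValueError(f"unknown business scenario: {request.get('scenario')!r}")
    # Keyword splitting here replaces real semantic parsing.
    tags = [t.strip().lower() for t in request.get("query", "").split(",") if t.strip()]
    return doc_type, tags


print(parse_search_request({"scenario": "Medical", "query": "Influenza, children, 2021"}))
```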
  • S903 Determine the target document library corresponding to the required document type from multiple document libraries, where the multiple document libraries belong to the document search platform, and the document library is used to store documents of the corresponding document type.
  • That is, according to the requirement document type in the document search request, the document library corresponding to that type is determined from the multiple document libraries as the target document library.
  • The multiple document libraries in the target document search platform are used to store documents to be processed of the corresponding document types.
  • Determining the target document library corresponding to the requirement document type from the multiple document libraries may consist of first determining the document types corresponding to the respective document libraries and, after the requirement document type is determined, comparing it with those document types; when the requirement document type is the same as a document type, the document library corresponding to that document type is used as the target document library. This is not limited.
  • For example, when the requirement document type is determined to be the medical document type, the document library used to store medical documents is determined as the target document library, so that the document search is performed in the medical document library and the target documents obtained meet the medical document needs of the corresponding medical business scenario.
  • The target document library corresponding to the requirement document type may also be determined from the multiple document libraries in other ways; this is not limited.
  • Because the target document library stores documents to be processed that match the document type of the user's business scenario, the document search can be performed in the target document library that matches the business type of that scenario. This effectively narrows the scope of the document search, improving search efficiency while ensuring that the documents found meet the document search needs of different business scenarios.
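  • A minimal sketch of the comparison described above: each library's document type is compared with the requirement document type, and the first equal one is returned. Modeling the libraries as a hypothetical dictionary from document type to a list of documents is an assumption.

```python
from typing import Dict, List


def select_target_library(libraries: Dict[str, List[dict]], requirement_type: str) -> List[dict]:
    """Compare the requirement document type with each library's document type."""
    for library_type, library in libraries.items():
        if library_type == requirement_type:
            return library
    raise LookupError(f"no document library stores {requirement_type!r} documents")


libraries = {"medical document": [{"id": "d1"}], "legal document": []}
print(select_target_library(libraries, "medical document"))
```

  • Because the libraries are keyed by document type here, `libraries[requirement_type]` would perform the same comparison in a single lookup; the loop simply mirrors the step-by-step description.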
  • S904 Search the target document library for the target document corresponding to the requirement tag information.
  • That is, the document to be processed that corresponds to the requirement tag information is searched for among the documents to be processed in the target document library and used as the target document.
  • The target document library stores the tag information associated with each document. Accordingly, searching for the target document corresponding to the requirement tag information may consist of searching the target document library for tag information that matches the requirement tag information, and using the document corresponding to that tag information as the target document.
  • For example, when the requirement tag information is "2021, influenza, children" and a document's tag information is "2021, influenza, children", that document is used as the target document; this is not limited.
  • A pre-trained information matching model may be used to match the requirement tag information with the tag information: the requirement tag information and the tag information are input into the model, which matches them and outputs a matching result indicating whether the requirement tag information and the tag information match.
  • Alternatively, a matching degree value between the requirement tag information and each piece of tag information may be determined, and when the matching degree value is greater than a preset matching degree threshold, the document to be processed corresponding to that tag information is used as the target document; this is not limited.
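  • The matching degree is not defined further. The sketch below swaps in Jaccard similarity between tag sets as an illustrative matching degree, with a hypothetical default threshold of 0.5; documents scoring strictly above the threshold are returned, best match first. A trained matching model could replace `matching_degree` without changing `search`.

```python
def matching_degree(required, tags):
    """Jaccard similarity of two tag collections (an illustrative matching degree)."""
    a = {t.strip().lower() for t in required}
    b = {t.strip().lower() for t in tags}
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def search(library, required_tags, threshold=0.5):
    """Return ids of documents whose matching degree exceeds the threshold, best first."""
    hits = []
    for doc in library:
        score = matching_degree(required_tags, doc["tags"])
        if score > threshold:
            hits.append((score, doc["id"]))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [doc_id for _, doc_id in hits]


library = [
    {"id": "d1", "tags": ["2021", "influenza", "children"]},
    {"id": "d2", "tags": ["2020", "cold", "adults"]},
    {"id": "d3", "tags": ["2021", "influenza", "adults"]},
]
print(search(library, ["2021", "influenza", "children"]))
```

  • With these tags, d3 shares two of four distinct tags with the request (degree 0.5), so it is excluded at threshold 0.5 and included at 0.4.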
  • The embodiments of the present disclosure can effectively combine RPA and AI to realize intelligent automation (IA) of the document search process, thereby raising the degree of automation of document search and reducing labor costs.
  • In this embodiment, the requirement document type and the requirement tag information are parsed from the document search request, and the target document library corresponding to the requirement document type is determined from multiple document libraries, where the multiple document libraries belong to the document search platform and each document library stores documents of the corresponding document type; the target document corresponding to the requirement tag information is then searched for in the target document library.
  • The document search can thus be performed in the target document library that matches the business type of the user's business scenario, which effectively narrows the scope of the search, improving search efficiency while ensuring that the target documents found meet the document search needs of different business scenarios.
  • FIG. 10 is a schematic flowchart of a document search method proposed by another embodiment of the present disclosure.
  • the document search method includes:
  • S1002 Parse the requirement document type and requirement tag information from the document search request.
  • S1003 Determine the target document library corresponding to the required document type from multiple document libraries, where the multiple document libraries belong to the document search platform, and the document library is used to store documents of the corresponding document type.
  • S1004 Call the natural language processing (NLP) service in the field of artificial intelligence to process the requirement attributes, so as to determine the target parent tag from multiple parent tags, where the target parent tag has a corresponding target child tag.
  • The parent tag participating in this document search may be called the target parent tag; correspondingly, the child tag corresponding to the target parent tag may be called the target child tag.
  • For example, the multiple parent tags and corresponding child tags may be: "document format - text format, disease - influenza, document update time - April 2021, research object - children, cause of disease - spontaneously caused". The target parent tags may be the parent tags participating in this document search, such as "disease, research object, cause of disease", and the target child tags may be the child tags corresponding to the target parent tags, such as "influenza, children, spontaneously caused"; this is not limited.
  • Attributes may be used to determine whether a parent tag and its corresponding child tags participate in the document search.
  • Parent tags that participate in the subsequent document search may be referred to as target parent tags, and the child tags corresponding to a target parent tag may be referred to as target child tags.
  • Attributes configured according to the user's requirement may be referred to as requirement attributes; the requirement attributes allow the parent tags in the target document library to be configured and adjusted according to the user's document search requirements.
  • After the target document library corresponding to the requirement document type is determined from the multiple document libraries, the embodiments of the present disclosure may call the NLP service in the field of artificial intelligence to process the requirement attributes and adjust the attributes of the parent tags in the target document library, so as to determine whether each parent tag and its child tags participate in the subsequent document search. The parent tags that participate are used as target parent tags, their child tags are used as target child tags, and the subsequent document search is then executed based on the target child tags; for details, see the subsequent embodiments.
  • for example, the multiple parent tags may be: document format, disease, research time, and research object, and the user may need to search for documents whose research object is children.
  • during the document search, the two parent tags disease and research time can be hidden according to the requirement attribute, so that these two parent tags and their corresponding sub-tags do not participate in the subsequent document search. The remaining parent tags, document format and research object, are used as the target parent tags, and their corresponding sub-tags are used as the target sub-tags. In this way, based on the requirement attribute, the target parent tags that effectively serve the subsequent document search can be determined from the multiple parent tags, which narrows the tag search scope and reduces the amount of tag data processed in the subsequent search, improving document search efficiency while preserving the search effect.
  • S1005 Search the target document from the target document library according to the requirement attribute, requirement subtag, and target subtag.
  • after the target parent tags are determined from the multiple parent tags according to the requirement attribute, the target document can be searched from the target document library according to the requirement attribute, the requirement sub-tag, and the target sub-tag.
  • for example, the requirement sub-tag may be matched against the target sub-tags (the matching method may be, for example, model matching or feature matching, without restriction) to obtain matching results, and these matching results are then filtered according to the requirement attribute to obtain the target document; there is no restriction on this.
  • alternatively, searching for the target document according to the requirement attribute, the requirement sub-tag, and the target sub-tag may involve calling a robotic process automation (RPA) robot to automatically search multiple documents for documents to be filtered based on the requirement sub-tag and the target sub-tag.
  • searching for documents to be filtered from the target document library based on the requirement sub-tag and the target sub-tag may be performed by matching the requirement sub-tag against the target sub-tags: when the requirement sub-tag matches (for example, is the same as) a target sub-tag, the document corresponding to that target sub-tag in the target document library is used as a document to be filtered; there is no restriction on this.
  • alternatively, searching for multiple documents to be filtered from the target document library based on the requirement sub-tag and the target sub-tags may involve determining the similarity value between the requirement sub-tag and the target sub-tag of each document and, when the similarity value meets a set condition, using the document corresponding to that target sub-tag as a document to be filtered.
  • the similarity value characterizes the degree of similarity between the requirement sub-tag and the target sub-tag: the greater the similarity value, the closer the two tags are; conversely, the smaller the similarity value, the larger the gap between them. There is no restriction on this.
  • for example, the Euclidean distance between the requirement sub-tag and the target sub-tag may be determined and used to derive their similarity value. The similarity value is compared with a preset condition (which can be configured according to the document search requirements of the actual business scenario, without restriction), and when the similarity value satisfies the condition, the document corresponding to that target sub-tag is used as a document to be filtered.
  • the multiple documents to be filtered that are found in the target document library according to the requirement sub-tag and the target sub-tags can be sorted by their similarity values.
  • the sorted documents to be filtered can then be filtered according to the requirement attribute to obtain the target document.
  • documents in the target document library may have multiple target sub-tags, so multiple documents to be filtered may share some of the same target sub-tags; the target document is then determined from among these documents, and there is no restriction on this.
  • the document search method described in the embodiments of the present disclosure is illustrated below with schematic diagrams.
  • the document search platform can receive the document search request and then, according to the requirement attributes in the request, edit the attributes of the tags in the target document library in its document attribute editing interface (see Figure 11, a schematic diagram of the document attribute editing interface proposed by an embodiment of the present disclosure), thereby determining the target parent tags and target sub-tags among the parent tags in the target document library, which participate in the document search to obtain the corresponding target document.
  • the requirement sub-tag in the document search request can be entered into the document search interface of the target document search platform (see Figure 12, which is a schematic diagram of the document search interface proposed by an embodiment of the present disclosure).
  • the target document search platform can search for the target document based on the similarity value between the requirement sub-tag and the target sub-tags, sort the one or more documents to be filtered obtained by the search by similarity value, and present them in the document search interface.
  • via the filtering configuration item of the document search interface shown in Figure 12, the user can enter the document screening interface (see Figure 13, a schematic diagram of the document screening interface proposed by an embodiment of the present disclosure) and, according to the requirement attributes, configure filter conditions on the parent tags of the documents to be filtered, so as to select the target documents from the multiple documents to be filtered.
  • in this embodiment, the target document library corresponding to the required document type is determined from multiple document libraries, where the multiple document libraries belong to the document search platform and each document library stores documents of the corresponding document type; the target parent tag is then determined from the multiple parent tags according to the requirement attribute.
  • the target parent tag has a corresponding target sub-tag, and the target document is searched from the target document library according to the requirement attribute, the requirement sub-tag, and the target sub-tag.
  • in this way, the target parent tags that effectively serve the subsequent document search can be determined from the multiple parent tags, which further narrows the tag search scope and reduces the amount of tag data processed in the subsequent document search, thereby improving document search efficiency while ensuring the document search effect.
  • Figure 14 is a schematic structural diagram of a device for building a document search platform proposed by an embodiment of the present disclosure.
  • the construction device 140 of the document search platform includes: a first acquisition module 1401, used to acquire a document to be processed, where the document to be processed has a corresponding document type; a second acquisition module 1402, used to acquire the tag information corresponding to the document to be processed; a building module 1403, used to build a target document library corresponding to the document type based on the tag information and the document to be processed; and a forming module 1404, used to form a target document search platform based on the target document library.
  • Figure 15 is a schematic structural diagram of a device for building a document search platform proposed by another embodiment of the present disclosure, in which the second acquisition module 1402 includes: a first determination sub-module 14021, used to determine the parent tag corresponding to the document to be processed; a parsing sub-module 14022, used to parse the sub-tag corresponding to the parent tag from the document to be processed; and a processing sub-module 14023, used to combine the parent tag and the sub-tag as the tag information.
  • the parsing sub-module 14022 is further used to: call the natural language processing (NLP) service in the field of artificial intelligence to identify, from the document to be processed, the document universal index corresponding to the parent tag, and use the document universal index as a sub-tag; and/or call the NLP service to identify, from the document to be processed, the associated entity value corresponding to the parent tag, and use the associated entity value as a sub-tag.
  • the building module 1403 includes: a second determination sub-module 14031, used to call the robotic process automation (RPA) robot to determine the initial document library corresponding to the document type; and a storage sub-module 14032, used to store the tag information and the document to be processed in the initial document library to form the target document library.
  • the storage sub-module 14032 is further used to: obtain the target loading type corresponding to the document to be processed; and store the tag information and the document to be processed in the initial document library using the target document storage method corresponding to the target loading type.
  • the storage sub-module 14032 is further used to: if the target loading type is the document loading type, store the document to be processed and its tag information in the target document library; and/or if the target loading type is the link loading type, store the access link corresponding to the document to be processed and its tag information in the target document library; and/or if the target loading type is the rich text loading type, edit the document to be processed through a rich text editor and store the editing result and its tag information in the target document library.
  • the second acquisition module 1402 further includes a configuration sub-module 14024, configured to configure an attribute for the parent tag after the parent tag corresponding to the document to be processed is determined, and to use the configured attribute as tag information, where the attribute identifies whether the parent tag participates in document search.
  • the document search platform is constructed using artificial intelligence (AI) and robotic process automation (RPA).
  • in this embodiment, the document to be processed, which has a corresponding document type, is obtained together with its tag information; a target document library corresponding to the document type is then built based on the tag information and the document to be processed, and a target document search platform is formed based on the target document library. Since the target document search platform is formed from the target document library corresponding to the document type, the built platform can provide document search services of the corresponding document types to different business scenarios based on those libraries, which effectively improves the reusability of the document search platform and allows it to meet the document search needs of different business scenarios.
  • Figure 16 is a schematic structural diagram of a document search device proposed by an embodiment of the present disclosure.
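  • the document search device 160 includes: a receiving module 1601, used to receive a document search request; a parsing module 1602, used to parse the required document type and the requirement tag information from the document search request; a determining module 1603, used to determine, from multiple document libraries, the target document library corresponding to the required document type, where the multiple document libraries belong to the document search platform and each document library stores documents of the corresponding document type; and a search module 1604, used to search the target document library for the target document corresponding to the requirement tag information.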
  • Figure 17 is a schematic structural diagram of a document search device proposed by another embodiment of the present disclosure.
  • the requirement tag information includes a requirement attribute and a requirement sub-tag; the target document library has multiple corresponding parent tags, each parent tag has corresponding sub-tags, and the sub-tags are used to describe the documents.
  • the search module 1604 includes: a third determination sub-module 16041, used to call the natural language processing (NLP) service in the field of artificial intelligence to process the requirement attribute so as to determine the target parent tag from the multiple parent tags, where the target parent tag has a corresponding target sub-tag; and a search sub-module 16042, used to search the target document library for the target document according to the requirement attribute, the requirement sub-tag, and the target sub-tag.
  • the target document library includes multiple documents, and the search sub-module 16042 is further used to: call the robotic process automation (RPA) robot to search the multiple documents for documents to be filtered according to the requirement sub-tag and the target sub-tag; and filter the multiple documents to be filtered according to the requirement attribute to obtain the target document.
  • the search sub-module 16042 is further used to: determine the similarity value between the requirement sub-tag and the target sub-tag of each document; and, if the similarity value meets the set condition, use the document corresponding to that target sub-tag as a document to be filtered.
  • in this embodiment, the required document type and the requirement tag information are parsed from the document search request, and the target document library corresponding to the required document type is determined from multiple document libraries, where the multiple document libraries belong to the document search platform and each document library stores documents of the corresponding document type; the target documents corresponding to the requirement tag information are then searched from the target document library.
  • this supports performing the document search in a target document library that matches the business type of the user's business scenario, which effectively narrows the scope of the document search; while document search efficiency is improved, the target documents obtained can meet the document search needs of different business scenarios.
  • the present disclosure also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • when the processor executes the program, the methods of the foregoing embodiments of the present disclosure are implemented.
  • FIG. 18 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present disclosure.
  • the electronic device 180 includes: a memory 1810 and a processor 1820.
  • the memory 1810 stores a computer program that can run on the processor 1820.
  • when the processor 1820 executes the computer program, it implements the construction method of the document search platform in the above embodiments, or the document search method in the above embodiments.
  • there may be one or more memories 1810 and processors 1820.
  • the electronic device further includes a communication interface 1830, used to communicate with external devices for interactive data transmission. If the memory 1810, the processor 1820, and the communication interface 1830 are implemented independently, they can be connected to each other through a bus and communicate with each other.
  • the bus can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc.
  • the bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one thick line is used in Figure 18, but it does not mean that there is only one bus or one type of bus.
  • if the memory 1810, the processor 1820, and the communication interface 1830 are integrated on one chip, they can communicate with each other through an internal interface.
  • the present disclosure also provides a computer-readable storage medium storing a computer program.
  • when the computer program is executed by a processor, the method for constructing a document search platform proposed in the foregoing embodiments of the present disclosure, or the document search method of the foregoing embodiments, is implemented.
  • the present disclosure also provides a computer program product; when instructions in the computer program product are executed by a processor, the method for constructing a document search platform proposed in the foregoing embodiments of the present disclosure, or the document search method of the foregoing embodiments, is implemented.
  • the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor can be a microprocessor or any conventional processor, etc. It is worth noting that the processor may be a processor that supports Advanced RISC Machines (ARM) architecture.
  • the above-mentioned memory may include read-only memory and random access memory, and may also include non-volatile random access memory.
  • the memory may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory may include read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), or electrically erasable programmable read-only memory (EEPROM).
  • Volatile memory may include Random Access Memory (RAM), which acts as an external cache.
  • each functional unit in various embodiments of the present disclosure may be integrated into one processing module, each unit may exist physically alone, or two or more units may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or software function modules. If the above integrated modules are implemented in the form of software function modules and sold or used as independent products, they can also be stored in a computer-readable storage medium.
  • the storage medium can be a read-only memory, a magnetic disk or an optical disk, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed in the present disclosure are a document search platform, a search method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring a document to be processed, the document having a corresponding document type; acquiring tag information corresponding to the document; constructing, according to the tag information and the document, a target document library corresponding to the document type; and forming a target document search platform according to the target document library. Since the target document search platform is formed from the target document library corresponding to the document type, the constructed platform can provide, on the basis of the document library of each document type, document search services of the corresponding document types to different service scenarios. The reusability of the document search platform is thereby effectively improved, and the constructed platform can satisfy the document search requirements of different service scenarios. In addition, the present disclosure can achieve intelligent automation (IA) of the construction of a document search platform by combining RPA and AI, further reducing labor costs.

Description

Document search platform, search method, device, electronic device and storage medium
Cross-references to related applications
This application is based on and claims priority to Chinese patent application No. 202210637112.2, filed on June 7, 2022, the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure relates to the field of computer technology, and in particular to a document search platform, a search method, a device, an electronic device, and a storage medium.
Background
Robotic Process Automation (RPA) refers to the use of specific "robot software" to simulate human operations on a computer and automatically execute process tasks according to rules. Artificial Intelligence (AI) is a technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence.
In the related art, to meet the document search requirements of a given business scenario, a document search platform is usually constructed from all the documents in that scenario, so different business scenarios usually require multiple different document search platforms to be built.
In this approach, a constructed document search platform cannot meet the document search needs of other business scenarios, so it cannot be reused across different business scenarios.
Summary
Embodiments of the present disclosure provide a method for constructing a document search platform, a document search method, a device, an electronic device, and a storage medium to solve the problems in the related art. The technical solutions are as follows:
In a first aspect, an embodiment of the present disclosure provides a method for constructing a document search platform, including: acquiring a document to be processed, where the document to be processed has a corresponding document type; acquiring tag information corresponding to the document to be processed; building, based on the tag information and the document to be processed, a target document library corresponding to the document type; and forming a target document search platform based on the target document library.
In one implementation, acquiring the tag information corresponding to the document to be processed includes: determining the parent tag corresponding to the document to be processed; parsing, from the document to be processed, the sub-tag corresponding to the parent tag; and using the parent tag and the sub-tag together as the tag information.
In one implementation, parsing the sub-tag corresponding to the parent tag from the document to be processed includes: calling a natural language processing (NLP) service in the field of artificial intelligence (AI) to identify, from the document to be processed, the document universal index corresponding to the parent tag, and using the document universal index as the sub-tag; and/or calling the NLP service to identify, from the document to be processed, the associated entity value corresponding to the parent tag, and using the associated entity value as the sub-tag.
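The tag-parsing step above can be pictured with a minimal Python sketch. It assumes the NLP service can be replaced by a simple keyword lookup: `ENTITY_LEXICON`, `recognize_entities`, and `build_tag_info` are hypothetical names, and real entity recognition would be far more capable. The point is only the data shape: each parent tag paired with the sub-tags parsed for it, which together form the tag information.

```python
from dataclasses import dataclass, field

# Hypothetical lexicon standing in for the NLP service: for each parent tag,
# the associated entity values the "service" is able to recognize.
ENTITY_LEXICON: dict[str, list[str]] = {
    "disease": ["influenza", "measles"],
    "research object": ["children", "adults"],
}


@dataclass
class TagInfo:
    # parent tag -> sub-tags parsed from the document
    tags: dict[str, list[str]] = field(default_factory=dict)


def recognize_entities(text: str, parent_tag: str) -> list[str]:
    """Stand-in for an NLP entity-recognition call (case-insensitive keyword lookup)."""
    lowered = text.lower()
    return [value for value in ENTITY_LEXICON.get(parent_tag, []) if value in lowered]


def build_tag_info(text: str, parent_tags: list[str]) -> TagInfo:
    """Pair each parent tag with the sub-tags recognized for it."""
    info = TagInfo()
    for parent in parent_tags:
        info.tags[parent] = recognize_entities(text, parent)
    return info
```

For example, `build_tag_info("A study of influenza in children.", ["disease", "research object"])` yields tags `{"disease": ["influenza"], "research object": ["children"]}`.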
In one implementation, building the target document library corresponding to the document type based on the tag information and the document to be processed includes: calling a robotic process automation (RPA) robot to determine the initial document library corresponding to the document type; and storing the tag information and the document to be processed in the initial document library to form the target document library.
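As a rough illustration of keeping one document library per document type, the following Python sketch stores the libraries in a dictionary keyed by document type. The class and method names are hypothetical, and the RPA robot that determines the initial library in the source is replaced by an ordinary method call.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StoredDocument:
    doc_id: str
    doc_type: str
    tags: dict[str, list[str]]  # parent tag -> sub-tags


class DocumentSearchPlatform:
    """Hypothetical platform that keeps one document library per document type."""

    def __init__(self) -> None:
        self.libraries: dict[str, list[StoredDocument]] = defaultdict(list)

    def initial_library(self, doc_type: str) -> list[StoredDocument]:
        # Look up (or create) the library for this document type. In the source
        # an RPA robot performs this step; it is not modeled here.
        return self.libraries[doc_type]

    def store(self, doc: StoredDocument) -> None:
        # Store the document together with its tag information.
        self.initial_library(doc.doc_type).append(doc)

    def document_types(self) -> list[str]:
        return sorted(self.libraries)
```

Keying by document type is what lets one platform serve several business scenarios: a search only ever touches the library for its own type.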
In one implementation, storing the tag information and the document to be processed in the initial document library includes: obtaining the target loading type corresponding to the document to be processed; and storing the tag information and the document to be processed in the initial document library using the target document storage method corresponding to the target loading type.
In one implementation, storing the tag information and the document to be processed in the initial document library using the target document storage method corresponding to the target loading type includes: if the target loading type is the document loading type, storing the document to be processed and the corresponding tag information in the target document library; and/or if the target loading type is the link loading type, storing the access link corresponding to the document to be processed and the corresponding tag information in the target document library; and/or if the target loading type is the rich text loading type, editing the document to be processed via a rich text editor and storing the editing result and the corresponding tag information in the target document library.
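The three storage methods can be sketched as a dispatch on the loading type. This is a minimal Python sketch under stated assumptions: `LoadType`, `store`, and the dictionary entry layout are hypothetical, and `edit_rich_text` is only a placeholder for the rich text editor step (it merely collapses whitespace).

```python
from enum import Enum


class LoadType(Enum):
    DOCUMENT = "document"
    LINK = "link"
    RICH_TEXT = "rich_text"


def edit_rich_text(raw: str) -> str:
    """Placeholder for the rich text editor step: collapses whitespace only."""
    return " ".join(raw.split())


def store(library: list[dict], load_type: LoadType, payload: str, tag_info: dict) -> dict:
    """Store one document in `library` using the method for its loading type."""
    if load_type is LoadType.DOCUMENT:
        entry = {"kind": "document", "content": payload}
    elif load_type is LoadType.LINK:
        # Only the access link is stored, not the document body.
        entry = {"kind": "link", "url": payload}
    elif load_type is LoadType.RICH_TEXT:
        entry = {"kind": "rich_text", "content": edit_rich_text(payload)}
    else:
        raise ValueError(f"unsupported load type: {load_type}")
    entry["tags"] = tag_info  # tag information is stored alongside every entry
    library.append(entry)
    return entry
```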
In one implementation, after the parent tag corresponding to the document to be processed is determined, the method further includes: configuring an attribute for the parent tag and using the configured attribute as tag information, where the attribute identifies whether the parent tag participates in document search.
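A minimal Python sketch of the attribute configuration, assuming the attribute is a plain boolean per parent tag (the source does not specify its form). The field and method names are hypothetical, and treating unconfigured parent tags as participating is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class TagInfo:
    tags: dict[str, list[str]] = field(default_factory=dict)  # parent tag -> sub-tags
    attributes: dict[str, bool] = field(default_factory=dict)  # parent tag -> participates?

    def configure(self, parent: str, participates: bool) -> None:
        """Configure the attribute of one existing parent tag."""
        if parent not in self.tags:
            raise KeyError(f"unknown parent tag: {parent}")
        self.attributes[parent] = participates

    def searchable_tags(self) -> dict[str, list[str]]:
        """Parent tags (with their sub-tags) that take part in document search.

        Parent tags without a configured attribute participate by default.
        """
        return {p: subs for p, subs in self.tags.items() if self.attributes.get(p, True)}
```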
In a second aspect, an embodiment of the present disclosure provides a document search method applied to a document search platform constructed by the method for constructing a document search platform of the first aspect. The document search method includes: receiving a document search request, parsing the required document type and the requirement tag information from the document search request, and determining, from multiple document libraries, the target document library corresponding to the required document type, where the multiple document libraries belong to the document search platform and each document library stores documents of the corresponding document type; and searching the target document library for the target document corresponding to the requirement tag information.
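The routing described above, from request to document-type library to matching documents, might look like the following Python sketch. The request format (`"document_type"` and `"sub_tag"` keys) and the library entry layout are hypothetical, and exact sub-tag matching stands in for whatever matching the platform actually performs.

```python
def handle_search_request(request: dict, libraries: dict[str, list[dict]]) -> list[str]:
    """Route a parsed search request to the library for its document type.

    `request` is a hypothetical parsed request with "document_type" and
    "sub_tag" keys; each library entry has "id" and "tags" (parent -> sub-tags).
    """
    doc_type = request["document_type"]
    if doc_type not in libraries:
        raise KeyError(f"no document library for type {doc_type!r}")
    target_library = libraries[doc_type]  # only this library is searched
    wanted = request["sub_tag"]
    return [
        doc["id"]
        for doc in target_library
        if any(wanted in subs for subs in doc["tags"].values())
    ]
```

Because documents of other types never enter the loop, the search scope shrinks to the library that matches the user's business scenario.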
In one implementation, the required tag information includes a required attribute and a required sub-tag; the target document library has multiple corresponding parent tags, each parent tag has corresponding sub-tags, and the sub-tags describe documents. Searching the target document library for target documents corresponding to the required tag information includes: calling a natural language processing (NLP) service from the artificial intelligence (AI) field to process the required attribute, so as to determine a target parent tag among the multiple parent tags, where the target parent tag has corresponding target sub-tags; and searching the target document library for target documents according to the required attribute, the required sub-tag, and the target sub-tags.
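The disclosure leaves the NLP service unspecified. As a rough stand-in only, the sketch below maps a free-text required attribute onto one of the library's parent tags using a hypothetical alias table plus fuzzy string matching from Python's standard `difflib` module; a real NLP service would presumably rely on semantic rather than character-level similarity.

```python
import difflib
from typing import Dict, List, Optional

# Hypothetical alias table standing in for the NLP service's semantic matching.
ALIASES: Dict[str, str] = {
    "title": "document name",
    "doc title": "document name",
    "last modified": "update time",
}


def resolve_parent_tag(required_attribute: str, parent_tags: List[str]) -> Optional[str]:
    """Map a free-text required attribute onto one of the library's parent tags."""
    query = required_attribute.strip().casefold()
    alias = ALIASES.get(query)
    if alias in parent_tags:
        return alias
    # Fall back to fuzzy string matching against the parent tag names.
    by_folded = {tag.casefold(): tag for tag in parent_tags}
    matches = difflib.get_close_matches(query, list(by_folded), n=1, cutoff=0.8)
    return by_folded[matches[0]] if matches else None
```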
In one implementation, the target document library includes multiple documents. Searching the target document library for target documents according to the required attribute, the required sub-tag, and the target sub-tags includes: calling a robotic process automation (RPA) robot to search the multiple documents for candidate documents according to the required sub-tag and the target sub-tags; and filtering the candidate documents according to the required attribute to obtain the target documents.
In one implementation, searching the target document library for multiple candidate documents according to the required sub-tag and the target sub-tags includes: determining a similarity value between the required sub-tag and the target sub-tag of each document; and, if the similarity value satisfies a set condition, taking the document corresponding to that target sub-tag as a candidate document.
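The disclosure does not define the similarity measure or the "set condition". Assuming character-level similarity from the standard-library `difflib.SequenceMatcher` and a threshold of 0.6 (both illustrative choices), the candidate search and the subsequent attribute filtering described above can be sketched as follows. `Doc`, `candidate_documents`, and `filter_by_attribute` are hypothetical names, and the required-attribute filter is modeled as a caller-supplied predicate.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Dict, List


@dataclass
class Doc:
    name: str
    sub_tags: Dict[str, str]  # parent tag -> sub-tag value for this document


def similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def candidate_documents(
    docs: List[Doc], target_parent_tag: str, required_sub_tag: str, threshold: float = 0.6
) -> List[Doc]:
    """Keep documents whose target sub-tag is similar enough to the required sub-tag."""
    candidates = []
    for doc in docs:
        target_sub_tag = doc.sub_tags.get(target_parent_tag)
        # The "set condition" is assumed here to be: similarity >= threshold.
        if target_sub_tag is not None and similarity(required_sub_tag, target_sub_tag) >= threshold:
            candidates.append(doc)
    return candidates


def filter_by_attribute(candidates: List[Doc], keep: Callable[[Doc], bool]) -> List[Doc]:
    """Second pass: apply the required-attribute filter, here a caller-supplied predicate."""
    return [doc for doc in candidates if keep(doc)]
```

Because the threshold is a free parameter, raising it trades recall for precision; the value 0.6 is not taken from the disclosure.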
In a third aspect, an embodiment of the present disclosure provides an apparatus for constructing a document search platform, including: a first acquisition module configured to acquire a document to be processed, where the document to be processed has a corresponding document type; a second acquisition module configured to acquire tag information corresponding to the document to be processed; a construction module configured to construct, according to the tag information and the document to be processed, a target document library corresponding to the document type; and a forming module configured to form a target document search platform according to the target document library.
In one implementation, the second acquisition module includes: a first determination submodule configured to determine a parent tag corresponding to the document to be processed; a parsing submodule configured to parse, from the document to be processed, sub-tags corresponding to the parent tag; and a processing submodule configured to use the parent tag and the sub-tags together as the tag information.
In one implementation, the parsing submodule is further configured to: call an NLP service from the AI field to identify, in the document to be processed, the general document index corresponding to the parent tag, and use the general document index as a sub-tag; and/or call the NLP service to identify, in the document to be processed, the associated entity value corresponding to the parent tag, and use the associated entity value as a sub-tag.
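A simplified sketch of the parsing submodule follows. Instead of an NLP service, it assumes (hypothetically) that general-document-index sub-tags are read from file metadata and that associated entity values are found by looking up a small gazetteer of known values per parent tag. The `\b` word boundaries suit space-delimited text; Chinese documents would need word segmentation or a trained entity recognizer instead.

```python
import re
from typing import Dict, List


def extract_sub_tags(
    text: str,
    metadata: Dict[str, str],
    parent_tags: List[str],
    gazetteer: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Return tag information as {parent tag: [sub-tags]} for one document."""
    tag_info: Dict[str, List[str]] = {}
    for parent in parent_tags:
        if parent in metadata:
            # General document index (e.g. document name, update time).
            tag_info[parent] = [metadata[parent]]
        elif parent in gazetteer:
            # Associated entity values: known values of this parent tag found in the text.
            tag_info[parent] = [
                value for value in gazetteer[parent]
                if re.search(r"\b" + re.escape(value) + r"\b", text, flags=re.IGNORECASE)
            ]
        # Parent tags with no metadata field and no gazetteer entry get no sub-tags.
    return tag_info
```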
In one implementation, the construction module includes: a second determination submodule configured to call an RPA robot to determine the initial document library corresponding to the document type; and a storage submodule configured to store the tag information and the document to be processed in the initial document library to form the target document library.
In one implementation, the storage submodule is further configured to: acquire the target loading type corresponding to the document to be processed; and store the tag information and the document to be processed in the initial document library using the target document storage mode corresponding to the target loading type.
In one implementation, the storage submodule is further configured to: if the target loading type is the document loading type, store the document to be processed and its tag information in the target document library; and/or, if the target loading type is the link loading type, store the access link of the document to be processed and the corresponding tag information in the target document library; and/or, if the target loading type is the rich-text loading type, edit the document to be processed in a rich-text editor and store the editing result and the corresponding tag information in the target document library.
In one implementation, the second acquisition module further includes a configuration submodule configured to configure attributes for the parent tag and use the configured attributes as tag information, where the attributes indicate whether the parent tag participates in document search.
In one implementation, the method for constructing the document search platform is implemented using artificial intelligence (AI) and robotic process automation (RPA).
In a fourth aspect, an embodiment of the present disclosure provides a document search apparatus constructed by the apparatus for constructing a document search platform of the third aspect. The document search apparatus includes: a receiving module configured to receive a document search request; a parsing module configured to parse a required document type and required tag information from the document search request; a determination module configured to determine, among multiple document libraries, the target document library corresponding to the required document type, where the multiple document libraries belong to the document search platform and each document library stores documents of a corresponding document type; and a search module configured to search the target document library for target documents corresponding to the required tag information.
In one implementation, the required tag information includes a required attribute and a required sub-tag; the target document library has multiple corresponding parent tags, each parent tag has corresponding sub-tags, and the sub-tags describe documents. The search module includes: a third determination submodule configured to call an NLP service from the AI field to process the required attribute, so as to determine a target parent tag among the multiple parent tags, where the target parent tag has corresponding target sub-tags; and a search submodule configured to search the target document library for target documents according to the required attribute, the required sub-tag, and the target sub-tags.
In one implementation, the target document library includes multiple documents, and the search submodule is further configured to: call an RPA robot to search the multiple documents for candidate documents according to the required sub-tag and the target sub-tags; and filter the candidate documents according to the required attribute to obtain the target documents.
In one implementation, the search submodule is further configured to: determine a similarity value between the required sub-tag and the target sub-tag of each document; and, if the similarity value satisfies a set condition, take the document corresponding to that target sub-tag as a candidate document.
In a fifth aspect, an embodiment of the present disclosure provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the program, the processor implements the method for constructing a document search platform provided by the embodiments of the first aspect, or the document search method provided by the embodiments of the second aspect.
In a sixth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method for constructing a document search platform provided by the embodiments of the first aspect, or the document search method provided by the embodiments of the second aspect.
The above summary is for illustration only and is not intended to be limiting in any way. In addition to the illustrative aspects, implementations, and features described above, further aspects, implementations, and features of the present disclosure will become readily apparent with reference to the drawings and the following detailed description.
Description of the Drawings
In the drawings, unless otherwise specified, the same reference numerals denote the same or similar parts or elements throughout the figures. The drawings are not necessarily drawn to scale. It should be understood that the drawings depict only some implementations according to the present disclosure and should not be regarded as limiting its scope.
FIG. 1 is a schematic flowchart of a method for constructing a document search platform according to an embodiment of the present disclosure.
FIG. 2 is a schematic flowchart of a method for constructing a document search platform according to another embodiment of the present disclosure.
FIG. 3 is a schematic flowchart of a method for constructing a document search platform according to another embodiment of the present disclosure.
FIG. 4 is a schematic diagram of a document library construction interface according to an embodiment of the present disclosure.
FIG. 5 is a schematic diagram of a document storage management interface according to an embodiment of the present disclosure.
FIG. 6A is a schematic diagram of a document storage interface for the document loading type according to an embodiment of the present disclosure.
FIG. 6B is a schematic diagram of a document storage interface for the link loading type according to an embodiment of the present disclosure.
FIG. 6C is a schematic diagram of a document storage interface for the rich-text loading type according to an embodiment of the present disclosure.
FIG. 7 is a schematic diagram of a tag information configuration interface according to an embodiment of the present disclosure.
FIG. 8 is a schematic diagram of an attribute configuration interface according to an embodiment of the present disclosure.
FIG. 9 is a schematic flowchart of a document search method according to an embodiment of the present disclosure.
FIG. 10 is a schematic flowchart of a document search method according to another embodiment of the present disclosure.
FIG. 11 is a schematic diagram of a document attribute editing interface according to an embodiment of the present disclosure.
FIG. 12 is a schematic diagram of a document search interface according to an embodiment of the present disclosure.
FIG. 13 is a schematic diagram of a document filtering interface according to an embodiment of the present disclosure.
FIG. 14 is a schematic structural diagram of an apparatus for constructing a document search platform according to an embodiment of the present disclosure.
FIG. 15 is a schematic structural diagram of an apparatus for constructing a document search platform according to another embodiment of the present disclosure.
FIG. 16 is a schematic structural diagram of a document search apparatus according to an embodiment of the present disclosure.
FIG. 17 is a schematic structural diagram of a document search apparatus according to another embodiment of the present disclosure.
FIG. 18 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure are described in detail below, and examples of the embodiments are shown in the accompanying drawings, where the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present disclosure, and should not be construed as limiting it.
In the description of the embodiments of the present disclosure, the term "multiple" means two or more.
In the description of the embodiments of the present disclosure, the term "document to be processed" refers to a document that is currently awaiting processing, such as a professional knowledge document or an enterprise information document.
In the description of the embodiments of the present disclosure, the term "document type" reflects that documents to be processed can be classified into multiple document types according to different classification criteria, such as the medical document type or the legal document type.
In the description of the embodiments of the present disclosure, the term "tag information" refers to information describing the tags of a document to be processed, such as feature information or content information of a tag.
In the description of the embodiments of the present disclosure, the term "parent tag" refers to a tag preset for documents to be processed; a parent tag may be, for example, an associated entity reused from another platform.
In the description of the embodiments of the present disclosure, the term "general document index" refers to a document index that applies to all documents to be processed, such as document type, document size, document name, document storage address, and document update time.
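Most fields of the general document index can be read from the file system without any AI service. The sketch below, using only the Python standard library, is one assumed way to compute them; the dictionary keys are illustrative, and using the MIME type guessed by `mimetypes` as the "document type" is an assumption rather than something the disclosure specifies.

```python
import mimetypes
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Union


def general_index(path: str) -> Dict[str, Union[str, int]]:
    """Collect general-index fields that apply to any stored document."""
    p = Path(path)
    st = p.stat()
    mime, _encoding = mimetypes.guess_type(p.name)
    return {
        "document type": mime or p.suffix.lstrip(".") or "unknown",
        "document size": st.st_size,  # bytes
        "document name": p.name,
        "storage address": str(p.resolve()),
        "update time": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
    }
```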
In the description of the embodiments of the present disclosure, the term "attribute" refers to information describing a parent tag, such as the tag name, the tag data format, whether the tag may be modified, and whether the tag participates in search.
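The parent-tag attributes listed above map naturally onto a small record. In the following sketch (all names hypothetical), the search-participation flag decides which parent tags are searchable, and the modification flag guards edits to a document's sub-tags.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ParentTagAttributes:
    name: str
    data_format: str         # e.g. "text", "date", "number" (illustrative values)
    modifiable: bool = True  # whether sub-tags under this parent tag may be edited
    searchable: bool = True  # whether this parent tag participates in document search


def searchable_fields(attrs: List[ParentTagAttributes]) -> List[str]:
    """Names of the parent tags that participate in document search."""
    return [a.name for a in attrs if a.searchable]


def update_sub_tag(tags: Dict[str, str], attr: ParentTagAttributes, value: str) -> None:
    """Set a sub-tag value, refusing edits to read-only parent tags."""
    if not attr.modifiable:
        raise PermissionError(f"parent tag {attr.name!r} is read-only")
    tags[attr.name] = value
```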
In the description of the embodiments of the present disclosure, the term "sub-tag" refers to the specific document content corresponding to a parent tag.
In the description of the embodiments of the present disclosure, the term "associated entity value" refers to information that specifically describes the corresponding parent tag, such as feature information or content information of the associated entity.
In the description of the embodiments of the present disclosure, the term "initial document library" refers to the document library, among multiple document libraries, that corresponds to the document type of the document to be processed at the initial stage of executing the construction method of the document search platform.
In the description of the embodiments of the present disclosure, the term "document search request" refers to a request, issued by a user-side electronic device, that triggers a document search in the document search platform.
In the description of the embodiments of the present disclosure, the term "required document type" refers to the document type a user needs to search when performing a document search; it can be used to characterize the document search needs of the user's business scenario.
In the description of the embodiments of the present disclosure, the term "required tag information" refers to the tag information a user needs to search by when performing a document search, such as the name of the document sought or keywords in its content.
These and other aspects of the embodiments of the present disclosure will become apparent with reference to the following description and accompanying drawings. The description and drawings specifically disclose some particular implementations of the embodiments to illustrate some ways of implementing their principles, but it should be understood that the scope of the embodiments of the present disclosure is not limited thereby. On the contrary, the embodiments of the present disclosure include all changes, modifications, and equivalents falling within the spirit and scope of the appended claims.
FIG. 1 is a schematic flowchart of a method for constructing a document search platform according to an embodiment of the present disclosure.
This embodiment is illustrated with the method for constructing a document search platform configured in an apparatus for constructing a document search platform. The apparatus may be deployed in a server or in an electronic device, which is not limited in the embodiments of the present disclosure.
Referring to FIG. 1, the method for constructing the document search platform includes:
S101: Acquire a document to be processed, where the document to be processed has a corresponding document type.
Here, a document that is currently awaiting processing is called a document to be processed. During execution of the construction method, it is used to help construct the document search platform; it may be, for example, a professional knowledge document or an enterprise information document, without limitation.
It can be understood that documents to be processed can be classified into multiple document types according to different criteria. For example, documents may be classified by the application scenario to which they belong, yielding document types such as the medical document type or the legal document type, without limitation.
In the embodiments of the present disclosure, the document search platform may provide a data transmission interface in advance, acquire through it documents published in different offline business scenarios, and use those documents as documents to be processed, without limitation.
In some embodiments, data transmission links between the platforms of different offline business scenarios and the document search platform may be established in advance. When a new document is published on one of these platforms, a corresponding document transmission instruction is generated, which triggers that offline business scenario platform to transmit the newly published document to the document search platform. Any other feasible method of acquiring documents to be processed may also be used, without limitation.
After acquiring documents to be processed from different offline business scenarios, the embodiments of the present disclosure may label each document according to the business scenario to which it belongs. For example, a document acquired from a medical scenario may be labeled with the medical document type, without limitation.
S102: Acquire tag information corresponding to the document to be processed.
Here, a tag describes basic attributes and features of a document and can be used to index and manage the document's structured field information; examples include the document name and the document update time, without limitation.
Information describing the tags of a document to be processed is called tag information; it may be, for example, feature information or content information of a tag, without limitation.
In the embodiments of the present disclosure, after the document to be processed is acquired, tag information recognition may be performed on it to obtain the corresponding tag information.
For example, tag information recognition may be entity recognition: after the document to be processed is acquired, it is input into a pre-trained artificial intelligence (AI) model that supports entity recognition, which recognizes multiple pieces of entity information in the document; the entity information is then used as the tag information of the document, without limitation.
Alternatively, tag information recognition may be feature parsing: after the document to be processed is acquired, a feature parsing algorithm processes it to obtain multiple pieces of feature information, which are then used as the tag information of the document, without limitation.
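The disclosure does not name a feature parsing algorithm. Term frequency is one simple, commonly used choice, shown here only as an assumed example: the sketch returns the most frequent non-stopword terms as candidate tag information. The stopword list is illustrative, and `\w+` tokenization does not segment Chinese text, which would need a word segmenter first.

```python
import re
from collections import Counter
from typing import List, Set

# Illustrative English stopword list; a real deployment would use a fuller one.
STOPWORDS: Set[str] = {"the", "a", "an", "of", "and", "to", "is", "in", "for"}


def top_terms(text: str, k: int = 5) -> List[str]:
    """Return the k most frequent non-stopword terms as candidate tag information."""
    tokens = [t for t in re.findall(r"\w+", text.casefold()) if t not in STOPWORDS]
    # Counter.most_common orders equal counts by first occurrence.
    return [term for term, _count in Counter(tokens).most_common(k)]
```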
S103: Construct a target document library corresponding to the document type according to the tag information and the document to be processed.
In the embodiments of the present disclosure, after the document to be processed is acquired and its tag information determined, a document library corresponding to the document type can be constructed from the tag information and the document; this library is called the target document library.
In some embodiments, the target document library may also be constructed as follows: document libraries for the respective document types are constructed in the document search platform in advance and labeled with their document types; after a document to be processed is acquired, it is stored in the library of the matching document type, and its tag information is attached to it, thereby constructing the target document library, without limitation.
In the embodiments of the present disclosure, the target document library stores documents to be processed of the corresponding document type together with their tag information. In other words, constructing the target document library may consist of storing acquired documents of the same document type and their tag information in a single document library, without limitation.
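The per-type library construction can be sketched as grouping documents by document type into libraries that were preset and labeled with their types. The names are hypothetical, and creating a library on demand for an unforeseen type is an assumed design choice; a stricter platform could reject such documents instead.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StoredDocument:
    name: str
    doc_type: str
    tags: Dict[str, List[str]]  # parent tag -> sub-tags


@dataclass
class DocumentLibrary:
    doc_type: str  # label identifying the document type this library serves
    documents: List[StoredDocument] = field(default_factory=list)


def build_libraries(documents: List[StoredDocument], preset_types: List[str]) -> Dict[str, DocumentLibrary]:
    """Place each document, with its tag information, into the library of its document type."""
    libraries = {t: DocumentLibrary(t) for t in preset_types}
    for doc in documents:
        # Assumed choice: create a library on demand for a type that was not preset.
        library = libraries.setdefault(doc.doc_type, DocumentLibrary(doc.doc_type))
        library.documents.append(doc)
    return libraries
```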
In this embodiment, each constructed target document library stores the documents to be processed, and their tag information, of only one document type. Once a target document search platform has been formed from the target document libraries, the library whose document type matches an actual business scenario can be invoked for that scenario. Documents retrieved from that library are therefore suited to the scenario's document search needs.
S104: Form a target document search platform based on the target document library.
After the target document library corresponding to the document type has been constructed from the tag information and the documents to be processed, the document search platform can be processed according to the target document library. The resulting document search platform is used as the target document search platform.
Processing the document search platform according to the target document library may, for example, consist of two steps. First, the constructed target document library is deployed in the document search platform. Second, the library is labeled according to the business scenario of the documents it contains, for instance as a medical document library or a legal document library; the labels are not limited to these. The document search platform obtained after this labeling is used as the target document search platform.
Alternatively, any other feasible method may be used to form the target document search platform from the target document library; no limitation is imposed here.
Each target document library is constructed from tag information and documents to be processed for the business scenario of a given document type. Deploying these libraries in the document search platform therefore lets the resulting target document search platform serve multiple business scenarios, one library per scenario. A new business scenario that needs document search can call the target document search platform directly instead of building a dedicated platform. This improves the reusability of the document search platform and lets it meet the document search needs of different business scenarios.
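The deployment-and-reuse idea in the preceding paragraphs can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation. The class names `TargetLibrary` and `DocumentSearchPlatform` are hypothetical, as are the use of the scenario label as a lookup key and the naive substring search.

```python
from dataclasses import dataclass, field


@dataclass
class TargetLibrary:
    """A document library that holds documents of exactly one document type."""
    doc_type: str        # e.g. "medical" (illustrative)
    scenario_label: str  # business-scenario label applied at deployment
    documents: list = field(default_factory=list)


class DocumentSearchPlatform:
    """Hypothetical platform hosting several per-type target document libraries."""

    def __init__(self):
        self._libraries = {}  # scenario label -> TargetLibrary

    def deploy(self, library):
        # Deploying registers the library under its scenario label.
        if library.scenario_label in self._libraries:
            raise ValueError(f"scenario already deployed: {library.scenario_label}")
        self._libraries[library.scenario_label] = library

    def search(self, scenario_label, keyword):
        # Route the request to the library for this scenario, then run a
        # naive substring match over its documents.
        library = self._libraries[scenario_label]
        return [doc for doc in library.documents if keyword in doc]


platform = DocumentSearchPlatform()
platform.deploy(TargetLibrary("medical", "medical document library",
                              ["2021 influenza survey of children"]))
platform.deploy(TargetLibrary("legal", "legal document library",
                              ["contract dispute ruling"]))
print(platform.search("medical document library", "influenza"))
```

A new business scenario is served by deploying one more library on the same platform object rather than by building a second platform.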
The embodiments of the present disclosure can combine robotic process automation (RPA) and artificial intelligence (AI) to achieve intelligent automation (IA) of the platform construction process. This increases the degree of automation and reduces labor costs.
In this embodiment, the method proceeds as follows:
- A document to be processed, which has a corresponding document type, is obtained.
- Tag information corresponding to the document is obtained.
- A target document library corresponding to the document type is constructed from the tag information and the document.
- A target document search platform is formed from the target document library.

Because the platform is formed from target document libraries organized by document type, it can provide document search services of the corresponding document type to different business scenarios. This improves the reusability of the document search platform and lets it meet the document search needs of different business scenarios.
FIG. 2 is a schematic flowchart of a method for constructing a document search platform according to another embodiment of the present disclosure.
Referring to FIG. 2, the method for constructing the document search platform includes:
S201: Obtain a document to be processed, where the document to be processed has a corresponding document type.
For a description of S201, refer to the embodiments above; details are not repeated here.
S202: Determine a parent tag corresponding to the document to be processed.
A tag preset for the document to be processed is referred to as a parent tag. A parent tag may be any of the following; no limitation is imposed here:
- an associated entity reused from another platform;
- a tag obtained in advance from a tag library;
- index information applicable to all documents to be processed.
A document to be processed may be, for example, a passage of text containing entities, such as: "In the 2021 influenza survey of children, the occurrence of influenza was found to be somewhat seasonal."
Entities may include the disease, the survey subject, the survey time, and so on; no limitation is imposed here.
An associated entity is an entity obtained from another platform that the document to be processed can reuse, for example an associated entity reused from a corresponding medical business platform.
That is, in this embodiment, the associated entities of the document to be processed may be obtained through the data transmission interface of the document search platform. The interface fetches, from other platforms, associated entities that the document can reuse, and these reused associated entities are used as parent tags of the document. No limitation is imposed here.
Index information is structured information related to all documents that participate in search, such as the document type, document size, document name, document storage address, and document update time; no limitation is imposed here.
In some embodiments, the parent tag may be determined in either of two ways; no limitation is imposed here:
- After the document to be processed is determined, associated entities that other business platforms make available for reuse are obtained through the data transmission interface of the document search platform and used as parent tags.
- After the document to be processed is obtained, tags in the tag library are fetched through the same data transmission interface and used as the parent tags of the document.
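As a hedged sketch of S202, the snippet below gathers candidate parent tags from the three sources named above. Treating the sources as plain lists of tag names is an assumption made for illustration. So is merging them, since the text presents them as alternatives. `determine_parent_tags` is a hypothetical helper.

```python
def determine_parent_tags(associated_entities, tag_library_tags, index_fields):
    """Merge candidate parent tags, keeping first-seen order and dropping duplicates.

    associated_entities: entities reused from other business platforms
    tag_library_tags:    tags fetched from a tag library
    index_fields:        structured index fields shared by all documents
    """
    parent_tags = []
    seen = set()
    for source in (associated_entities, tag_library_tags, index_fields):
        for tag in source:
            if tag not in seen:
                seen.add(tag)
                parent_tags.append(tag)
    return parent_tags


tags = determine_parent_tags(
    ["disease", "survey subject"],             # e.g. from a medical platform
    ["survey time", "disease"],                # e.g. from a tag library
    ["document type", "document update time"]  # index information
)
print(tags)
```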
S203: Configure attributes for the parent tag, and use the configured attributes as tag information.
Attributes are information describing a parent tag. Examples include the tag name, the tag's data format, whether the tag may be modified, and whether the tag participates in search; no limitation is imposed here. Among other things, the attributes determine whether the parent tag participates in document search.
That is, after the parent tags are obtained, their attributes can be configured to meet the document search needs of different business scenarios. Configurable attributes may include the following; no limitation is imposed here:
- tag category, tag name, and tag type;
- whether the tag is required;
- value type;
- whether the tag participates in indexing;
- visibility for filtering.
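The attributes listed above can be modeled as a small record per parent tag. The sketch below is illustrative only, and its field names and defaults are assumptions. `searchable_parent_tags` shows one way a "participates in search" attribute could decide which parent tags take part in document search.

```python
from dataclasses import dataclass


@dataclass
class TagAttributes:
    """Illustrative attribute record for one parent tag (field names assumed)."""
    name: str
    category: str = "general"
    tag_type: str = "entity"
    value_type: str = "text"   # data format of the tag's value
    required: bool = False     # whether the tag must be filled in
    editable: bool = True      # whether the tag may be modified
    searchable: bool = True    # whether the tag participates in search
    indexed: bool = True       # whether the tag participates in indexing
    visible: bool = True       # whether the tag is offered as a visible filter


def searchable_parent_tags(attributes):
    """Return names of parent tags whose attributes let them take part in search."""
    return [attr.name for attr in attributes if attr.searchable]


configured = [
    TagAttributes("disease", required=True),
    TagAttributes("survey time", value_type="date"),
    TagAttributes("internal reviewer", searchable=False, visible=False),
]
print(searchable_parent_tags(configured))
```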
Configuring attributes for the parent tags and using them as tag information allows the parent tags to be flexibly configured and modified. As a result, the document tag information in the document search platform can meet the document search needs of different business scenarios.
S204: Parse the document to be processed to obtain a child tag corresponding to the parent tag.
For example, suppose the document to be processed is "In the 2021 influenza survey of children, the occurrence of influenza was found to be somewhat seasonal". Suppose also that the parent tags are "disease", "survey subject", and "survey time". A child tag is the specific document content corresponding to a parent tag. The child tags here might therefore be "disease: influenza", "survey subject: children", and "survey time: 2021"; no limitation is imposed here.
In some embodiments, after the document to be processed is obtained and its parent tags are determined, the document and the parent tags may be input into a pre-trained Global Pointer model. The model outputs the child tags corresponding to the parent tags; no limitation is imposed here.
The Global Pointer model is an artificial intelligence model based on rotary position embedding (a relative position encoding) that supports information extraction from documents. It may be replaced by any other artificial intelligence model capable of extracting the corresponding child tags from documents; no limitation is imposed here.
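The sketch below illustrates what "based on rotary position embedding" means in practice, using the span-scoring idea commonly described for Global Pointer. For one entity type, each token has a query vector and a key vector, and both are rotated according to the token's position. The score of a candidate span (i, j) with i ≤ j is the inner product of the rotated query at i and the rotated key at j, so it depends only on the offset j - i. This is a pure-Python illustration, not the disclosed model. In a real system the vectors would come from a trained encoder. The zero decision threshold is an assumption.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles (RoPE)."""
    d = len(vec)
    if d % 2:
        raise ValueError("vector length must be even")
    out = list(vec)
    for k in range(d // 2):
        angle = position * base ** (-2 * k / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[2 * k], vec[2 * k + 1]
        out[2 * k] = x * c - y * s
        out[2 * k + 1] = x * s + y * c
    return out


def span_score(q_i, k_j, i, j):
    """Score of span (i, j) for one entity type: <RoPE(q_i, i), RoPE(k_j, j)>."""
    return sum(a * b for a, b in zip(rope(q_i, i), rope(k_j, j)))


def decode_spans(queries, keys, threshold=0.0):
    """Return spans (start, end), start <= end, whose score exceeds `threshold`.

    `queries` and `keys` hold one vector per token for a single entity type
    (i.e. one parent tag); here they are hand-made instead of model outputs.
    """
    n = len(queries)
    return [(i, j) for i in range(n) for j in range(i, n)
            if span_score(queries[i], keys[j], i, j) > threshold]


# Relative-position check: shifting both positions leaves the score unchanged.
q, k = [0.3, -1.2, 0.5, 0.8], [1.0, 0.4, -0.7, 0.2]
print(span_score(q, k, 2, 5), span_score(q, k, 10, 13))

# Toy decoding: only the span from token 1 to token 2 gets a positive score.
queries = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
keys = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
print(decode_spans(queries, keys))
```

For rotation matrices, R_i^T R_j = R_(j-i). Shifting both positions by the same amount therefore leaves the score unchanged, which is the relative-position property mentioned above.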
Optionally, in some embodiments, the child tag may be obtained by calling a natural language processing (NLP) service from the artificial intelligence (AI) field. The service identifies, in the document to be processed, the document general index corresponding to the parent tag, and that document general index is used as the child tag. The document general index matching the parent tag can thus be parsed accurately from the document, which makes child tags more reliable when the index is used as the child tag.
A document general index is information that specifically describes the corresponding parent tag, such as feature information or content information of that parent tag; no limitation is imposed here.
For example, when the parent tag is "document update time", the corresponding document general index may be "April 20, 2022"; no limitation is imposed here.
That is, after the parent tag is determined, the document to be processed may be parsed according to the parent tag to obtain the corresponding document general index. The parsing may be, for example, semantic parsing or model-based parsing; no limitation is imposed here. The resulting document general index is then used as the child tag.
In this embodiment, the child tag may also be obtained by calling a natural language processing (NLP) service to process the document to be processed and parse out the child tag corresponding to the parent tag; no limitation is imposed here.
In some embodiments, the parent tag is an associated entity. Its associated entity value may then be extracted from the document to be processed with an entity recognition model. The document and the associated entity are input into the model, which outputs the associated entity value corresponding to that associated entity. No limitation is imposed here.
Optionally, in other embodiments, the child tag may be obtained by calling the NLP service to identify, in the document to be processed, the associated entity value corresponding to the parent tag. That associated entity value is used as the child tag. The associated entity value matching the parent tag can thus be parsed accurately from the document, which makes child tags more reliable when the value is used as the child tag.
An associated entity value is information that specifically describes the corresponding parent tag (for example, an associated entity), such as feature information or content information of the associated entity; no limitation is imposed here.
For example, when the associated entity is "disease", the corresponding associated entity values may be "influenza" and "common cold"; no limitation is imposed here.
In this embodiment, the child tag may also be obtained by calling a natural language processing (NLP) service to process the document to be processed. The service identifies the associated entity value corresponding to the parent tag, and that value is used as the child tag; no limitation is imposed here.
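A minimal sketch of S204 and S205 follows. To keep the example self-contained, the NLP service or entity recognition model is replaced by a hypothetical toy recognizer: a keyword lexicon plus a pattern for four-digit years. In practice the `recognizer` argument would wrap whatever model or service is actually used. The parent tags and the example sentence mirror the influenza example above.

```python
import re

# Hypothetical stand-in for the NLP service / entity recognition model:
# a keyword lexicon per parent tag plus a pattern for four-digit years.
LEXICON = {
    "disease": ["influenza", "cold"],
    "survey subject": ["children", "adults"],
}
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def toy_recognizer(text, parent_tag):
    """Return the values in `text` that fill `parent_tag` (the child tags)."""
    if parent_tag == "survey time":
        return YEAR.findall(text)
    lowered = text.lower()
    return [term for term in LEXICON.get(parent_tag, []) if term in lowered]


def build_tag_information(text, parent_tags, recognizer=toy_recognizer):
    """S204 + S205: map each parent tag to the child tags parsed from `text`."""
    return {tag: recognizer(text, tag) for tag in parent_tags}


doc = ("In the 2021 influenza survey of children, the occurrence of "
       "influenza was found to be somewhat seasonal.")
print(build_tag_information(doc, ["disease", "survey subject", "survey time"]))
```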
S205: Use the parent tag and the child tag together as the tag information.
In this embodiment, the parent tag of the document to be processed is determined and the corresponding child tag is parsed from the document. When the two are used together as tag information, the tag information accurately characterizes both the parent tag and its child tag. This makes the tag information more complete and more useful as a reference. When the tag information is provided to the document search platform, the platform can assist the user's document search along both the parent tag and child tag dimensions.
After the parent tag is determined and the corresponding child tag is parsed from the document, the two are used together as tag information. The subsequent steps of the construction method are then performed with this tag information, as described in the following embodiments.
S206: Construct a target document library corresponding to the document type based on the tag information and the document to be processed.
S207: Form a target document search platform based on the target document library.
For a description of S206 and S207, refer to the embodiments above; details are not repeated here.
In this embodiment, the method proceeds as follows:
- A document to be processed, which has a corresponding document type, is obtained, and its parent tag is determined.
- A child tag corresponding to the parent tag is parsed from the document. Used together as tag information, the parent tag and child tag are characterized accurately, which makes the tag information more complete and more useful as a reference. When this tag information is provided to the document search platform, the platform can assist the user's document search along both dimensions.
- Attributes are configured for the parent tag and used as tag information. The parent tag can then be flexibly configured and modified, and the document tag information in the platform can meet the document search needs of different business scenarios.
- A target document library corresponding to the document type is constructed from the tag information and the document, and a target document search platform is formed from that library.

This improves the reusability of the document search platform and lets it meet the document search needs of different business scenarios.
FIG. 3 is a schematic flowchart of a method for constructing a document search platform according to another embodiment of the present disclosure.
Referring to FIG. 3, the method for constructing the document search platform includes:
S301: Obtain a document to be processed, where the document to be processed has a corresponding document type.
S302: Obtain tag information corresponding to the document to be processed.
For a description of S301 and S302, refer to the embodiments above; details are not repeated here.
S303: Call a robotic process automation (RPA) robot to determine an initial document library corresponding to the document type.
In the initial stage of the construction method, one of multiple document libraries corresponds to the document type of the document to be processed. That library is referred to as the initial document library. It is used later in the method to help construct the target document library, as described in the following embodiments.
In this embodiment, the initial document library may be determined by calling an RPA robot that automatically annotates a document library according to the document type. After annotation, the library can store only documents of that document type. The annotated library is referred to as the initial document library.
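The effect of the annotation step, namely that an annotated library accepts only one document type, can be sketched as follows. `annotate_library` is a hypothetical stand-in for the RPA robot. The text only says that a document library is annotated, so creating a library when none exists for the type is an assumption.

```python
class InitialDocumentLibrary:
    """A document library annotated with a single document type."""

    def __init__(self, doc_type):
        self.doc_type = doc_type
        self.entries = []  # (document, tag_information) pairs

    def store(self, document, doc_type, tag_information):
        # After annotation, documents of any other type are rejected.
        if doc_type != self.doc_type:
            raise ValueError(f"library accepts {self.doc_type!r}, got {doc_type!r}")
        self.entries.append((document, tag_information))


def annotate_library(libraries, doc_type):
    """Hypothetical RPA step: return the library annotated with `doc_type`,
    creating and annotating a new one if none exists yet (an assumption)."""
    if doc_type not in libraries:
        libraries[doc_type] = InitialDocumentLibrary(doc_type)
    return libraries[doc_type]


libraries = {}
medical = annotate_library(libraries, "medical")
medical.store("2021 influenza survey", "medical", {"disease": ["influenza"]})
print(len(medical.entries))
```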
S304: Store the tag information and the document to be processed in the initial document library to form the target document library.
After the initial document library corresponding to the document type is determined, the tag information and the document to be processed can be stored in it to form the target document library.
Optionally, in some embodiments, storing the tag information and the document in the initial document library involves two steps. First, the target load type of the document to be processed is obtained. Second, the tag information and the document are stored with the target document storage mode corresponding to that load type. Each document is thus stored in a mode suited to it, which meets the storage needs of documents with different load types. Moreover, because the storage mode follows the target load type, documents in the initial document library are not limited to a single format. This makes it considerably easier to expand the set of documents available to the document search platform.
A document to be processed can be loaded in different ways. The way it is loaded is referred to as its target load type, for example the document load type, the link load type, or the rich text load type; no limitation is imposed here.
Each target load type can have a corresponding document storage mode, referred to as the target document storage mode.
For example, a document of the text load type may be stored directly in the initial document library. For a document of the image load type, the image may instead be recognized with optical character recognition (OCR), and the recognized text stored in the initial document library. No limitation is imposed here.
Optionally, in some embodiments, the target document storage mode depends on the target load type as follows:
- Document load type: the document and its tag information are stored in the target document library.
- Link load type: an access link to the document and its tag information are stored in the target document library.
- Rich text load type: the document is edited in a rich text editor, and the editing result and its tag information are stored in the target document library.
These cases may be combined.
With the document load type, the document to be processed can be loaded into the target document library directly from the local storage of the device hosting the library. In this case, the document and its tag information are stored from that local storage into the target document library.
The link load type refers to an external link, for example a Uniform Resource Locator (URL). The original file of the document is not present locally on the device hosting the target document library, and the external link points to the document. In this case, the external link and its tag information are stored in the target document library.
With the rich text load type, the document to be processed is loaded as an image, audio, video, or similar content. In this case, a rich text editor can be used to edit the document, and the editing result and its tag information are stored in the target document library.
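The mapping from target load type to storage mode described above can be sketched as a single dispatch function. The `LoadType` names, the record layout, and the `rich_text_editor` callable are assumptions for illustration; the editor callable might, for example, apply OCR to an image as in the earlier example. A Python list stands in for the target document library.

```python
from enum import Enum


class LoadType(Enum):
    DOCUMENT = "document"    # original file is available on the local device
    LINK = "link"            # only an external URL is available
    RICH_TEXT = "rich_text"  # image, audio, video, and similar content


def store_by_load_type(library, load_type, payload, tag_information,
                       rich_text_editor=None):
    """Append one record to `library` using the storage mode for `load_type`.

    `rich_text_editor` is a hypothetical callable standing in for the rich
    text editor; it receives the raw payload and returns the edited result.
    """
    if load_type is LoadType.DOCUMENT:
        record = {"kind": "file", "content": payload}
    elif load_type is LoadType.LINK:
        record = {"kind": "link", "url": payload}  # the file itself stays remote
    elif load_type is LoadType.RICH_TEXT:
        if rich_text_editor is None:
            raise ValueError("the rich text load type needs an editor")
        record = {"kind": "edited", "content": rich_text_editor(payload)}
    else:
        raise ValueError(f"unsupported load type: {load_type!r}")
    record["tags"] = tag_information
    library.append(record)
    return record


target_library = []
store_by_load_type(target_library, LoadType.LINK,
                   "https://example.com/report.pdf", {"disease": ["influenza"]})
store_by_load_type(target_library, LoadType.RICH_TEXT, "scanned page",
                   {}, rich_text_editor=str.upper)
print(target_library)
```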
本公开实施例中,由于是先确定与文档类型对应的初始文档库,从而能够实现将标签信息 和相应文档类型待处理文档准确存储至相应文档类型的初始文档库中,使得形成得到的目标文档库相应的文档类型能够和待处理文档的文档类型相适配,从而有效地提升目标文档库的构建效果。In the embodiment of the present disclosure, since the initial document library corresponding to the document type is first determined, it is possible to accurately store the tag information and the document to be processed of the corresponding document type into the initial document library of the corresponding document type, so that the obtained target document can be formed The corresponding document type of the library can be adapted to the document type of the document to be processed, thereby effectively improving the construction effect of the target document library.
S305:根据目标文档库,形成目标文档搜索平台。S305: Form a target document search platform based on the target document library.
S305的描述说明可以具体参见上述实施例,在此不再赘述。For the description of S305, reference may be made to the above-mentioned embodiments, and details will not be described again here.
本公开实施例中,可以结合具体的示意图对本公开实施例描述的文档搜索平台的构建方法进行具体的举例说明,在文档搜索平台的构建方法的初始阶段,可以获取待处理文档和与待处理文档的文档类型对应的初始文档库(该初始文档库可以是预先在文档搜索平台的文档搜索平台的文档库构建界面(文档库构建界面可以参见图4,图4是本公开一实施例提出的文档库构建界面的示意图)构建得到的),并获取与待处理文档对应的标签信息,而后,可以确定与待处理文档对应的目标加载类型,并在文档搜索平台的文档上传管理界面(文档上传管理界面可以参见图5,图5是本公开一实施例提出的文档存储管理界面的示意图),选择相应文档加载类型的配置项,以进入不同文档加载类型的文档存储界面(该不同文档加载类型的文档存储界面可以参见图6A,图6B,图6C,图6A是本公开一实施例提出的文档加载类型的文档存储界面的示意图,图6B是本公开一实施例提出的链接加载类型的文档存储界面的示意图,图6C是本公开一实施例提出的富文本加载类型的文档存储界面的示意图),并在该相应的文档存储界面存储相应的待处理文档。In the embodiment of the present disclosure, the method of constructing the document search platform described in the embodiment of the present disclosure can be specifically illustrated with specific schematic diagrams. In the initial stage of the method of constructing the document search platform, the document to be processed and the document to be processed can be obtained. The initial document library corresponding to the document type (the initial document library can be the document library construction interface of the document search platform in advance (the document library construction interface can be seen in Figure 4, Figure 4 is a document proposed by an embodiment of the present disclosure) Schematic diagram of the library construction interface), and obtain the tag information corresponding to the document to be processed. Then, the target loading type corresponding to the document to be processed can be determined, and the document upload management interface of the document search platform (document upload management The interface can be seen in Figure 5, which is a schematic diagram of the document storage management interface proposed by an embodiment of the present disclosure). Select the configuration item of the corresponding document loading type to enter the document storage interface of different document loading types (the different document loading types. The document storage interface can be seen in Figure 6A, Figure 6B, and Figure 6C. 
Figure 6A is a schematic diagram of a document storage interface of the document loading type proposed by an embodiment of the present disclosure, and Figure 6B is a link loading type of document storage proposed by an embodiment of the present disclosure. A schematic diagram of the interface, FIG. 6C is a schematic diagram of a rich text loading type document storage interface proposed by an embodiment of the present disclosure), and the corresponding document to be processed is stored in the corresponding document storage interface.
本公开实施例在将待处理文档存储至初始文档库后,可以在在初始文档库中的待处理文档侧,针对待处理文档配置相应的标签信息(例如,参见图7,图7是本公开一实施例提出的标签信息配置界面的示意图,即可以在该界面点击编辑项,并在编辑项下进行相应标签信息配置操作,以实现针对待处理文档配置相应的标签信息),此外,还可以支持在相应标签的属性配置界面(参见图8,图8是本公开一实施例提出的属性配置界面的示意图)针对父标签配置相应的属性,至此,完成目标文档库的构建,从而形成目标文档搜索平台。In this embodiment of the present disclosure, after the document to be processed is stored in the initial document library, corresponding tag information can be configured for the document to be processed on the side of the document to be processed in the initial document library (for example, see FIG. 7 , which is a diagram of the present disclosure). A schematic diagram of the tag information configuration interface proposed in one embodiment, that is, you can click on the edit item on the interface, and perform the corresponding tag information configuration operation under the edit item to configure the corresponding tag information for the document to be processed). In addition, you can also Support configuring corresponding attributes for the parent tag on the attribute configuration interface of the corresponding tag (see Figure 8, which is a schematic diagram of the attribute configuration interface proposed by an embodiment of the present disclosure). At this point, the construction of the target document library is completed, thereby forming the target document Search platform.
In this embodiment, a document to be processed is obtained. The document has a corresponding document type. The tag information corresponding to the document is obtained, and the initial document library corresponding to its document type is determined. The tag information and the document can therefore be stored accurately in the initial document library for that document type, so that the document type of the resulting target document library matches that of the document to be processed. This improves the construction of the target document library. As a result, the document search platform formed from the target document library can meet the document search needs of different business scenarios.
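The following is a minimal sketch of the storage step described above, assuming a simple in-memory layout in which each document type owns one library. The class names `Document` and `DocumentSearchPlatform` and the tag layout (a mapping from parent tag to child tag) are hypothetical illustrations, not structures taken from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical record: the disclosure only says a document has a
    # document type and associated tag information.
    doc_id: str
    doc_type: str
    tags: dict = field(default_factory=dict)  # parent tag -> child tag


class DocumentSearchPlatform:
    """Minimal sketch: one document library per document type."""

    def __init__(self) -> None:
        # document type -> initial (then target) document library
        self.libraries: dict = {}

    def initial_library(self, doc_type: str) -> list:
        # Determine the initial library for this type, creating it if absent.
        return self.libraries.setdefault(doc_type, [])

    def store(self, doc: Document) -> None:
        # Store the document, with its tag information, in the library
        # whose document type matches its own.
        self.initial_library(doc.doc_type).append(doc)


platform = DocumentSearchPlatform()
platform.store(Document("d1", "medical", {"disease": "influenza", "research subject": "children"}))
platform.store(Document("d2", "legal", {"document format": "text"}))
platform.store(Document("d3", "medical", {"disease": "measles"}))
print(sorted(platform.libraries))  # ['legal', 'medical']
```

Keying libraries by document type means that a later search only has to look up one key to find the library that matches the user's business scenario.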
FIG. 9 is a schematic flowchart of a document search method according to an embodiment of the present disclosure.
In this embodiment, the document search method is described as being configured in a document search apparatus. The document search apparatus may be deployed in a server or in an electronic device; the embodiments of the present disclosure do not limit this.
Referring to FIG. 9, the document search method includes:
S901: Receive a document search request.
For the meanings and descriptions of terms in this embodiment that are the same as in the foregoing embodiments, refer to those embodiments; they are not repeated here.
A document search request is a request issued by a user-side electronic device to trigger a document search on the document search platform.
In the embodiments of the present disclosure, the target document search platform may provide a data transmission interface in advance and receive the document search request from the user-side device through that interface; this is not limited.
Alternatively, a monitoring component may be set up in advance in the target document search platform to monitor the user-side device. The document search request is received when the component detects that the user-side device has generated one; this is not limited.
S902: Parse the required document type and the required tag information from the document search request.
After receiving the document search request, the embodiments of the present disclosure can parse the required document type and the required tag information from it.
When searching for documents, a user has a type of document that he or she needs to search for; this document type is referred to as the required document type. The required document type characterizes the document search needs of the user's business scenario. For example, when the user's business scenario is a medical one, it can be determined that the document type the user requires is the medical document type; this is not limited.
Likewise, a user performing a document search has tag information that he or she needs to search for; this is referred to as the required tag information. Examples include the name of the document the user needs or keywords describing its content; this is not limited.
For example, suppose the received document search request is "a study of the causes of influenza in children in 2021". The required document type may then be the medical document type. The required tag information may be the keyword information in the request, such as "influenza, children, 2021"; this is not limited.
In the embodiments of the present disclosure, the required document type and the required tag information may be obtained by performing semantic parsing on the document search request.
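The disclosure leaves the semantic parsing unspecified. The sketch below swaps in a plain keyword lookup only to show the shape of the output, a required document type plus a list of required tags, using the influenza example above. The vocabularies `TYPE_KEYWORDS` and `TAG_VOCABULARY` and the function `parse_request` are hypothetical; a real system would use an NLP model instead.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical vocabularies. The disclosure performs semantic parsing; this
# keyword lookup is only a simple stand-in for it.
TYPE_KEYWORDS = {
    "medical": {"influenza", "disease", "clinical", "patients"},
    "legal": {"contract", "statute", "court"},
}
TAG_VOCABULARY = {"influenza", "children", "adults"}
YEAR = re.compile(r"\d{4}")


def parse_request(query: str) -> Tuple[Optional[str], List[str]]:
    """Return (required document type, required tag information)."""
    words = re.findall(r"[a-z]+|\d{4}", query.lower())
    # Required tags: known keywords plus four-digit years.
    tags = [w for w in words if w in TAG_VOCABULARY or YEAR.fullmatch(w)]
    # Required type: the first type whose keywords occur in the request.
    doc_type = next(
        (t for t, keywords in TYPE_KEYWORDS.items() if keywords.intersection(words)),
        None,
    )
    return doc_type, tags


print(parse_request("Study on the causes of influenza in children in 2021"))
# ('medical', ['influenza', 'children', '2021'])
```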
S903: Determine, from multiple document libraries, the target document library corresponding to the required document type. The multiple document libraries belong to the document search platform, and each document library stores documents of the corresponding document type.
After receiving the document search request, the embodiments of the present disclosure can use the required document type in the request to select, from the multiple document libraries, the document library corresponding to that type as the target document library.
In the embodiments of the present disclosure, the document libraries in the target document search platform each store documents to be processed of the corresponding document type. To determine the target document library, the document types of the respective libraries may first be determined. Once the required document type is known, it is compared with each of these document types. The library whose document type is the same as the required document type is taken as the target document library; this is not limited.
For example, when the required document type is determined to be the medical document type, the document library that stores medical documents is taken as the target document library. The document search is then performed within that medical document library, so that the retrieved target documents meet the need for medical documents in the corresponding medical business scenario.
In some embodiments, the document libraries may be labeled according to their document types when the target document search platform is built. Correspondingly, the target document library may be determined by checking the identifier of each library. When a library's identifier is the required document type, that library is taken as the target document library; this is not limited.
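The label-based selection just described can be sketched as a linear scan over labeled libraries. The `DocumentLibrary` class and its `type_label` field are hypothetical names for the identifier the disclosure attaches to each library. Returning `None` when no label matches is an assumed behavior, because the disclosure does not say what happens in that case.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DocumentLibrary:
    # Hypothetical: each library is labeled with the document type it stores.
    type_label: str
    documents: List[str] = field(default_factory=list)


def select_target_library(
    libraries: List[DocumentLibrary], required_type: str
) -> Optional[DocumentLibrary]:
    # Compare the required document type with each library's label and
    # return the first library whose label is identical.
    for library in libraries:
        if library.type_label == required_type:
            return library
    return None  # no library stores the required document type


libraries = [DocumentLibrary("legal", ["l1"]), DocumentLibrary("medical", ["m1", "m2"])]
print(select_target_library(libraries, "medical").documents)  # ['m1', 'm2']
```

With many libraries, a dictionary keyed by type label would make the lookup constant-time. The scan above mirrors the comparison step in the text more directly.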
In the embodiments of the present disclosure, the target document library stores documents matching the document type of the user's business scenario, so the search can be performed within the library suited to that scenario. This narrows the scope of the document search, improving search efficiency while ensuring that the retrieved documents meet the document search needs of different business scenarios.
S904: Search the target document library for the target document corresponding to the required tag information.
After the target document library corresponding to the required document type has been determined, the embodiments of the present disclosure search the documents to be processed in that library for the document corresponding to the required tag information and take it as the target document.
In the embodiments of the present disclosure, the target document library stores the tag information associated with each document. Accordingly, the search may find tag information in the target document library that matches the required tag information. The document associated with that tag information is taken as the target document.
For example, suppose the required tag information is "2021, influenza, children". The search may check whether the documents stored in the target document library carry the tag information "2021, influenza, children". When a document is found to carry it, that document is taken as the target document; this is not limited.
In some embodiments, a pre-trained information matching model may be used to find tag information in the target document library that matches the required tag information. The required tag information and the stored tag information are input into the model, which matches them and outputs a matching result. When the result indicates that the two match, the document to be processed that is associated with that tag information is taken as the target document; this is not limited.
Alternatively, a matching degree value between the required tag information and the stored tag information may be determined. When the matching degree value is greater than a predetermined matching degree threshold, the document to be processed that is associated with that tag information is taken as the target document; this is not limited.
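As an illustration of the threshold variant, the sketch below assumes the matching degree value is the Jaccard overlap between the required tag set and a document's stored tag set. The disclosure does not name a measure, so Jaccard is a swapped-in choice. The dictionary layout of `library` is likewise hypothetical. The comparison is strictly "greater than", as in the text.

```python
from typing import Dict, List, Set


def matching_degree(required: Set[str], tags: Set[str]) -> float:
    # Jaccard overlap, an assumed stand-in for the matching degree value.
    union = required | tags
    return len(required & tags) / len(union) if union else 0.0


def search_by_tags(
    library: Dict[str, Set[str]], required: Set[str], threshold: float
) -> List[str]:
    # library: document id -> stored tag information (hypothetical layout).
    # A document is a target when its matching degree exceeds the threshold.
    return [
        doc_id
        for doc_id, tags in library.items()
        if matching_degree(required, tags) > threshold
    ]


library = {
    "d1": {"2021", "influenza", "children"},
    "d2": {"2020", "influenza", "adults"},
    "d3": {"2021", "influenza", "children", "vaccine"},
}
print(search_by_tags(library, {"2021", "influenza", "children"}, 0.5))  # ['d1', 'd3']
```

Here "d1" scores 1.0, "d3" scores 0.75 and "d2" scores 0.2. Raising the threshold toward 1.0 moves the behavior closer to the exact-match example given earlier.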
In the embodiments of the present disclosure, robotic process automation (RPA) and artificial intelligence (AI) can be combined to achieve intelligent automation (IA) of the document search process. This increases the degree of automation of document search and reduces labor costs.
In this embodiment, a document search request is received, and the required document type and the required tag information are parsed from it. The target document library corresponding to the required document type is determined from multiple document libraries, which belong to the document search platform and each store documents of the corresponding document type. The target document corresponding to the required tag information is then searched for in the target document library. The search can thus be performed in a target document library that matches the business type of the user's scenario. This narrows the search scope and improves search efficiency while ensuring that the retrieved target documents meet the document search needs of different business scenarios.
FIG. 10 is a schematic flowchart of a document search method according to another embodiment of the present disclosure.
Referring to FIG. 10, the document search method includes:
S1001: Receive a document search request.
S1002: Parse the required document type and the required tag information from the document search request.
S1003: Determine, from multiple document libraries, the target document library corresponding to the required document type. The multiple document libraries belong to the document search platform, and each document library stores documents of the corresponding document type.
For descriptions of S1001 to S1003, refer to the foregoing embodiments; they are not repeated here.
S1004: Call a natural language processing (NLP) service from the field of artificial intelligence (AI) to process the requirement attribute and determine a target parent tag from multiple parent tags, where the target parent tag has a corresponding target child tag.
Among the multiple parent tags, those that participate in the current document search are referred to as target parent tags. Correspondingly, the child tags corresponding to a target parent tag are referred to as target child tags.
For example, the parent tags and their child tags may be "document format: text format; disease: influenza; document update time: April 2021; research subject: children; cause of disease: spontaneous". The target parent tags are those participating in the current document search, for example "disease, research subject, cause of disease". Correspondingly, the target child tags are the child tags of those target parent tags, for example "influenza, children, spontaneous"; this is not limited.
In the embodiments of the present disclosure, attributes can be used to determine whether a parent tag, and the child tags corresponding to it, participate in the document search. The parent tags that participate in the subsequent document search are the target parent tags, and their corresponding child tags are the target child tags.
An attribute set according to the user's search need is referred to as a requirement attribute. The requirement attribute allows the configuration of the parent tags in the target document library to be adjusted according to the user's document search needs.
That is, after determining the target document library corresponding to the required document type from the multiple document libraries, the embodiments of the present disclosure can call the NLP service from the AI field to process the requirement attribute. The attributes of the parent tags in the target document library are adjusted accordingly, which determines whether each parent tag and its child tags participate in the subsequent document search. The participating parent tags are taken as target parent tags, and their child tags as target child tags. The subsequent document search is then performed based on the target child tags; see the following embodiments for details.
For example, the parent tags may be document format, disease, research time, and research subject, and the user may need documents whose research subject is children. In this case, the parent tags disease and research time can be hidden during the search according to the requirement attribute, so that they and their child tags do not participate in the subsequent document search. The remaining parent tags, document format and research subject, are taken as the target parent tags, and their child tags as the target child tags. Selecting, based on the requirement attribute, only the parent tags that serve the subsequent search further narrows the tag search scope and reduces the amount of tag data processed. This improves document search efficiency while preserving search quality.
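The hiding step in this example can be sketched as a dictionary filter. The sketch assumes that the NLP service has already turned the requirement attribute into a set of parent tags to hide; that conversion is not modeled here. The function name `select_target_tags` is hypothetical.

```python
from typing import Dict, Set


def select_target_tags(doc_tags: Dict[str, str], hidden_parents: Set[str]) -> Dict[str, str]:
    # Keep the parent tags that are not hidden (the target parent tags)
    # together with their child tags (the target child tags).
    return {parent: child for parent, child in doc_tags.items() if parent not in hidden_parents}


doc_tags = {
    "document format": "text",
    "disease": "influenza",
    "research time": "2021",
    "research subject": "children",
}
print(select_target_tags(doc_tags, {"disease", "research time"}))
# {'document format': 'text', 'research subject': 'children'}
```

Only the surviving parent/child pairs take part in later matching, which is how hiding tags reduces the amount of tag data processed.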
S1005: Search the target document library for the target document according to the requirement attribute, the required child tags, and the target child tags.
After the target parent tags have been determined from the multiple parent tags according to the requirement attribute, the embodiments of the present disclosure can search the target document library for the target document according to the requirement attribute, the required child tags, and the target child tags.
In some embodiments, this search may match the required child tags against the target child tags to obtain matching results, for example by model matching or feature matching (this is not limited). The matching results are then further filtered according to the requirement attribute to obtain the target document; this is not limited.
Optionally, in some embodiments, a robotic process automation (RPA) robot may be called to search the documents for candidate documents (documents to be filtered) automatically, according to the required child tags and the target child tags.
That is, in the embodiments of the present disclosure, multiple candidate documents can be determined from the documents in the target document library according to the required child tags and the target child tags. These candidates can then be further filtered according to the requirement attribute to obtain the target document.
In some embodiments, the search matches the required child tags against the target child tags. When a required child tag and a target child tag match, the documents in the target document library that correspond to that target child tag are taken as candidate documents; this is not limited.
Alternatively, the target document library may be searched for target child tags identical to the required child tags. When a required child tag and a target child tag are the same, the documents in the target document library that correspond to that target child tag are taken as candidate documents; this is not limited.
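One straightforward way to implement the identical-tag lookup is an inverted index from each target child tag to the documents that carry it. The sketch below assumes that a document becomes a candidate when any one of the required child tags matches, which gives the union of the per-tag results. The function names and the input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_child_tag_index(doc_child_tags: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    # Inverted index: target child tag -> ids of documents carrying it.
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, child_tags in doc_child_tags.items():
        for tag in child_tags:
            index[tag].add(doc_id)
    return index


def exact_candidates(index: Dict[str, Set[str]], required_child_tags: Iterable[str]) -> List[str]:
    # A document is a candidate if one of its target child tags is identical
    # to a required child tag (union over required tags, an assumption).
    found: Set[str] = set()
    for tag in required_child_tags:
        found |= index.get(tag, set())
    return sorted(found)


docs = {"d1": ["influenza", "children"], "d2": ["influenza", "adults"], "d3": ["measles"]}
index = build_child_tag_index(docs)
print(exact_candidates(index, ["influenza"]))  # ['d1', 'd2']
```

Building the index once per library makes each lookup proportional to the number of required child tags rather than the number of stored documents.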
Optionally, in some embodiments, a similarity value is determined between the required child tag and the target child tags of each document. When the similarity value satisfies a set condition, the document corresponding to those target child tags is taken as a candidate document.
The similarity value characterizes how similar a required child tag and a target child tag are. The larger the value, the closer the two tags are to being identical; conversely, the smaller the value, the greater the difference between them. This is not limited.
That is, in the embodiments of the present disclosure, the Euclidean distance between the required child tag and the target child tag may be determined and used as the measure of similarity between them, with a smaller distance indicating greater similarity. The resulting similarity value is compared with a preset condition, which can be configured adaptively according to the document search needs of the actual business scenario (this is not limited). When the similarity value satisfies the condition, the documents corresponding to that target child tag are taken as candidate documents.
In the embodiments of the present disclosure, the candidate documents retrieved from the target document library according to the required child tags and the target child tags can be sorted by their similarity values. The target document can then be selected from these candidates according to the requirement attribute.
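The sketch below ties together the Euclidean-distance similarity, the set condition and the sorting step. It rests on several assumptions, none of which comes from the disclosure:

* A toy letter-count vector stands in for a real text embedding of each tag.
* The distance is mapped to a similarity in (0, 1] with 1 / (1 + d), so identical tags score 1.0 and the score falls as the distance grows.
* The set condition is a minimum similarity.
* A document is scored by its best-matching child tag.

All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(tag: str) -> Counter:
    # Toy embedding (assumption): letter counts stand in for a real text vector.
    return Counter(ch for ch in tag.lower() if ch.isalpha())


def euclidean(a: Counter, b: Counter) -> float:
    # Missing keys count as 0 in a Counter.
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a.keys() | b.keys()))


def similarity(required: str, target: str) -> float:
    # Map distance into (0, 1]: identical tags score 1.0, and the score
    # falls as the distance grows (assumed mapping).
    return 1.0 / (1.0 + euclidean(embed(required), embed(target)))


def rank_candidates(
    required: str, doc_child_tags: Dict[str, List[str]], min_similarity: float
) -> List[Tuple[str, float]]:
    scored = []
    for doc_id, child_tags in doc_child_tags.items():
        # A document's score is its best-matching target child tag.
        best = max((similarity(required, t) for t in child_tags), default=0.0)
        if best >= min_similarity:  # the "set condition" (assumed threshold)
            scored.append((doc_id, best))
    # Sort candidates by similarity, highest first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = {"d1": ["influenza", "children"], "d2": ["influenza virus"], "d3": ["adults"]}
print([doc_id for doc_id, _ in rank_candidates("influenza", docs, 0.3)])  # ['d1', 'd2']
```

Converting the distance into a bounded similarity keeps the ordering consistent with the rule that a larger value means closer tags. Ranking by raw distance would need the sort order reversed.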
It can be understood that, in the embodiments of the present disclosure, a document in the target document library may have multiple target child tags, and several documents to be processed may share the same target child tag. In that case, a match search based on the required child tags may return multiple documents. These candidate documents can then be further configured and filtered according to the requirement attribute to determine the target document; this is not limited.
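When several candidates share a target child tag, the requirement attribute narrows them down. The sketch below assumes the attribute is expressed as filter conditions that map a parent tag to its allowed child-tag values, and that a candidate must satisfy every condition. Both the representation and the function name `filter_candidates` are hypothetical. The similarity order of the candidates is preserved.

```python
from typing import Dict, List, Set, Tuple


def filter_candidates(
    candidates: List[Tuple[str, float]],
    doc_tags: Dict[str, Dict[str, str]],
    conditions: Dict[str, Set[str]],
) -> List[Tuple[str, float]]:
    """Keep candidates whose parent tags satisfy every filter condition.

    candidates: (doc_id, similarity) pairs, already sorted by similarity.
    doc_tags: doc_id -> {parent tag: child tag}.
    conditions: parent tag -> allowed child-tag values (assumed form of the
    requirement attribute).
    """
    return [
        (doc_id, score)
        for doc_id, score in candidates
        if all(doc_tags[doc_id].get(parent) in allowed for parent, allowed in conditions.items())
    ]


candidates = [("d1", 1.0), ("d2", 0.8), ("d3", 0.6)]
doc_tags = {
    "d1": {"research subject": "children", "document format": "text"},
    "d2": {"research subject": "adults", "document format": "text"},
    "d3": {"research subject": "children", "document format": "pdf"},
}
print(filter_candidates(candidates, doc_tags, {"research subject": {"children"}}))
# [('d1', 1.0), ('d3', 0.6)]
```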
For example, the document search method described in the embodiments of the present disclosure can be illustrated with reference to specific schematic diagrams. In the initial stage of the method, the document search platform receives a document search request. According to the requirement attribute in the request, the attributes of the tags in the target document library are then edited on the document attribute editing interface of the target document library (see FIG. 11, which is a schematic diagram of a document attribute editing interface according to an embodiment of the present disclosure). This determines which target parent tags and target child tags, among the parent tags of the target document library, participate in the search for the target document.
Then, the required child tags from the document search request can be entered into the document search interface of the target document search platform (see FIG. 12, which is a schematic diagram of a document search interface according to an embodiment of the present disclosure). The platform searches for target documents according to the similarity values between the required child tags and the target child tags. It sorts the one or more retrieved candidate documents by similarity value and presents them on the document search interface. The user can also click the filter configuration item on the document search interface shown in FIG. 12 to open the document filtering interface (see FIG. 13, which is a schematic diagram of a document filtering interface according to an embodiment of the present disclosure). There, filter conditions on the parent tags of the candidate documents are configured according to the requirement attribute, so that the target document is selected from the candidates.
In this embodiment, a document search request is received, and the required document type and the required tag information are parsed from it. The target document library corresponding to the required document type is determined from multiple document libraries, which belong to the document search platform and each store documents of the corresponding document type. Target parent tags, each with corresponding target child tags, are determined from multiple parent tags according to the requirement attribute. The target document is then searched for in the target document library according to the requirement attribute, the required child tags, and the target child tags. Selecting, based on the requirement attribute, only the parent tags that serve the subsequent search further narrows the tag search scope and reduces the amount of tag data processed during the search. This improves document search efficiency while preserving search quality.
FIG. 14 is a schematic structural diagram of an apparatus for constructing a document search platform according to an embodiment of the present disclosure.
Referring to FIG. 14, the apparatus 140 for constructing a document search platform includes: a first acquisition module 1401, configured to acquire a document to be processed, the document having a corresponding document type; a second acquisition module 1402, configured to acquire tag information corresponding to the document to be processed; a construction module 1403, configured to construct, from the tag information and the document to be processed, a target document library corresponding to the document type; and a forming module 1404, configured to form a target document search platform from the target document library.
Optionally, in some embodiments, referring to Figure 15, which is a schematic structural diagram of an apparatus for constructing a document search platform according to another embodiment of the present disclosure, the second acquisition module 1402 includes: a first determination sub-module 14021, configured to determine a parent tag corresponding to the document to be processed; a parsing sub-module 14022, configured to parse, from the document to be processed, a sub-tag corresponding to the parent tag; and a processing sub-module 14023, configured to use the parent tag and the sub-tag together as the tag information.
Optionally, in some embodiments, the parsing sub-module 14022 is further configured to: call a natural language processing (NLP) service from the field of artificial intelligence (AI) to identify, in the document to be processed, a general document index corresponding to the parent tag, and use the general document index as a sub-tag; and/or call the NLP service to identify, in the document to be processed, an associated entity value corresponding to the parent tag, and use the associated entity value as a sub-tag.
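As a rough illustration of the parsing sub-module, the sketch below replaces the NLP service with regular expressions: each parent tag is assumed to map to a pattern whose matches are taken as its associated entity values (sub-tags). The tag names, the patterns, and `extract_sub_tags` are hypothetical; a real deployment would call whatever NLP service the platform uses, and the general document index branch is not modeled.

```python
import re

# Hypothetical stand-in for the NLP service: one regex per parent tag.
# A pattern with one capturing group makes findall return only the captured text.
ENTITY_PATTERNS = {
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "amount": re.compile(r"\$([\d,]+\.\d{2})"),
}


def extract_sub_tags(text, parent_tags, patterns=ENTITY_PATTERNS):
    """Return tag information: each parent tag mapped to its extracted sub-tags."""
    tag_info = {}
    for parent in parent_tags:
        pattern = patterns.get(parent)
        # Parent tags without a pattern get an empty sub-tag list.
        tag_info[parent] = pattern.findall(text) if pattern else []
    return tag_info


tag_info = extract_sub_tags(
    "Invoice dated 2022-06-07, total $1,250.00.", ["date", "amount", "vendor"]
)
# tag_info == {"date": ["2022-06-07"], "amount": ["1,250.00"], "vendor": []}
```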
Optionally, in some embodiments, the construction module 1403 includes: a second determination sub-module 14031, configured to call a robotic process automation (RPA) robot to determine an initial document library corresponding to the document type; and a storage sub-module 14032, configured to store the tag information and the document to be processed in the initial document library to form the target document library.
Optionally, in some embodiments, the storage sub-module 14032 is further configured to: acquire a target loading type corresponding to the document to be processed; and store the tag information and the document to be processed in the initial document library using a target document storage mode corresponding to the target loading type.
Optionally, in some embodiments, the storage sub-module 14032 is further configured to: if the target loading type is the document loading type, store the document to be processed and the corresponding tag information in the target document library; and/or if the target loading type is the link loading type, store an access link for the document to be processed and the corresponding tag information in the target document library; and/or if the target loading type is the rich text loading type, edit the document to be processed with a rich text editor and store the editing result and the corresponding tag information in the target document library.
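The three storage modes amount to a dispatch on the loading type. In this hedged sketch the library is a plain list, `LoadType` and `store` are hypothetical names, and the rich text editor is modeled as an arbitrary callable `edit` supplied by the caller.

```python
from enum import Enum


class LoadType(Enum):
    DOCUMENT = "document"    # store the document itself
    LINK = "link"            # store only an access link to the document
    RICH_TEXT = "rich_text"  # store the output of a rich text editing step


def store(library, load_type, source, tag_info, edit=None):
    """Append one entry to `library` using the mode selected by `load_type`.

    `source` is the document content for DOCUMENT and RICH_TEXT, and the
    access link for LINK. `edit` is a hypothetical stand-in for the editor.
    """
    if load_type is LoadType.RICH_TEXT:
        if edit is None:
            raise ValueError("rich text loading needs an editor callable")
        payload = edit(source)  # store the editing result
    elif load_type in (LoadType.DOCUMENT, LoadType.LINK):
        payload = source        # the document itself, or its access link
    else:
        raise ValueError(f"unsupported load type: {load_type!r}")
    library.append({"load_type": load_type.value, "payload": payload, "tags": tag_info})


library = []
store(library, LoadType.LINK, "https://example.com/docs/42", {"type": ["faq"]})
store(library, LoadType.RICH_TEXT, "<p>hello</p>", {}, edit=str.upper)
```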
Optionally, in some embodiments, the second acquisition module 1402 further includes a configuration sub-module 14024, configured to, after the parent tag corresponding to the document to be processed is determined, configure an attribute for the parent tag and use the configured attribute as tag information, where the attribute indicates whether the parent tag participates in document search.
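One way to read the configured attribute is as a per-parent-tag flag that the search side consults before matching. The `ParentTag` class and `searchable_tags` helper below are a hypothetical sketch, not the disclosed data model.

```python
from dataclasses import dataclass, field


@dataclass
class ParentTag:
    name: str
    sub_tags: list = field(default_factory=list)
    # Configured attribute: does this parent tag take part in document search?
    searchable: bool = True


def searchable_tags(tags):
    """Keep only the parent tags whose attribute marks them as searchable."""
    return [tag for tag in tags if tag.searchable]


tags = [
    ParentTag("amount", ["1,250.00"]),
    ParentTag("internal_note", ["draft"], searchable=False),
]
names = [tag.name for tag in searchable_tags(tags)]  # ["amount"]
```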
Optionally, in some embodiments, the method for constructing the document search platform is implemented using artificial intelligence (AI) and robotic process automation (RPA).
It should be noted that, for the functions and implementation principles of the above modules in the embodiments of the present disclosure, reference may be made to the foregoing method embodiments; details are not repeated here.
In this embodiment, a document to be processed, which has a corresponding document type, is acquired together with its corresponding tag information; a target document library corresponding to the document type is then constructed according to the tag information and the document, and a target document search platform is formed according to the target document library. Because the platform is formed from document libraries organized by document type, it can provide search services for the corresponding document types to different business scenarios. This improves the reusability of the document search platform and allows it to meet the document search needs of different business scenarios.
Figure 16 is a schematic structural diagram of a document search apparatus according to an embodiment of the present disclosure.
Referring to Figure 16, the document search apparatus 160 includes: a receiving module 1601, configured to receive a document search request; a parsing module 1602, configured to parse a required document type and required tag information from the document search request; a determination module 1603, configured to determine, from multiple document libraries, a target document library corresponding to the required document type, where the document libraries belong to the document search platform and each library stores documents of the corresponding document type; and a search module 1604, configured to search the target document library for a target document corresponding to the required tag information.
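The flow through the four modules can be sketched as follows, under the assumption (not stated in the disclosure) that a request is a dictionary holding a document type and a mapping from parent tags to required sub-tag values, and that each library is a list of `(doc_id, tags)` pairs. The function name `search_documents` is hypothetical, and matching here is exact rather than similarity-based.

```python
def search_documents(platform, request):
    """Route a request to the library for its document type, then match tags.

    platform: dict mapping document type -> list of (doc_id, tags) pairs,
              where tags maps parent tag -> list of sub-tag values.
    request:  dict with "doc_type" and optional "tags" (parent tag -> required value).
    """
    doc_type = request["doc_type"]                 # parsing module
    required = request.get("tags", {})
    library = platform.get(doc_type)               # determination module
    if library is None:
        return []
    return [                                       # search module
        doc_id
        for doc_id, tags in library
        if all(value in tags.get(parent, []) for parent, value in required.items())
    ]


platform = {
    "contract": [
        ("c1", {"party": ["Acme"], "year": ["2022"]}),
        ("c2", {"party": ["Beta"], "year": ["2022"]}),
    ],
    "invoice": [("i1", {"party": ["Acme"]})],
}
hits = search_documents(platform, {"doc_type": "contract", "tags": {"party": "Acme"}})
```

Only the contract library is scanned, so the invoice `i1` is never considered even though it carries the same party tag.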
Optionally, in some embodiments, referring to Figure 17, which is a schematic structural diagram of a document search apparatus according to another embodiment of the present disclosure, the required tag information includes a required attribute and a required sub-tag; the target document library has multiple corresponding parent tags, each parent tag has corresponding sub-tags, and the sub-tags describe the documents.
The search module 1604 includes: a third determination sub-module 16041, configured to call an NLP service from the AI field to process the required attribute so as to determine a target parent tag from the multiple parent tags, the target parent tag having corresponding target sub-tags; and a search sub-module 16042, configured to search the target document library for the target document according to the required attribute, the required sub-tag, and the target sub-tags.
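The third determination sub-module relies on an NLP service that the disclosure does not specify. Purely as a placeholder, the sketch below selects a parent tag when its name shares a word with the required attribute; the word-overlap rule and the name `select_target_parent_tags` are assumptions.

```python
def select_target_parent_tags(required_attribute, parent_tags):
    """Pick target parent tags (with their target sub-tags) for an attribute.

    parent_tags maps a parent tag name such as "signing_date" to its sub-tags.
    Word overlap is a hypothetical stand-in for the NLP service.
    """
    words = set(required_attribute.lower().split())
    return {
        name: sub_tags
        for name, sub_tags in parent_tags.items()
        if words & set(name.lower().split("_"))
    }


parent_tags = {"signing_date": ["2022-06-07"], "amount": ["1,250.00"], "party": ["Acme"]}
targets = select_target_parent_tags("contract signing date", parent_tags)
# targets == {"signing_date": ["2022-06-07"]}
```

Only the sub-tags under the selected parent tags then need to be compared, which is how the tag search scope is narrowed in this embodiment.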
Optionally, in some embodiments, the target document library includes multiple documents, and the search sub-module 16042 is further configured to: call an RPA robot to search the multiple documents for candidate documents according to the required sub-tag and the target sub-tags; and filter the candidate documents according to the required attribute to obtain the target document.
Optionally, in some embodiments, the search sub-module 16042 is further configured to: determine a similarity value between the required sub-tag and the target sub-tag of each document; and, if the similarity value satisfies a set condition, take the document corresponding to that target sub-tag as a candidate document.
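The two-stage search (candidate selection by sub-tag similarity, then filtering by the required attribute) might look like the sketch below. The disclosure names neither the similarity measure nor the set condition; here `difflib.SequenceMatcher.ratio()` and a threshold of 0.8 are assumptions, as are the function names and the dictionary layout of documents and attributes.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity in [0, 1]; SequenceMatcher is an assumed stand-in measure."""
    return SequenceMatcher(None, a, b).ratio()


def candidate_documents(required_sub_tag, documents, threshold=0.8):
    """Stage 1: keep documents with a target sub-tag similar enough to the request.

    documents maps doc_id -> list of target sub-tags.
    """
    return [
        doc_id
        for doc_id, target_sub_tags in documents.items()
        if any(similarity(required_sub_tag, t) >= threshold for t in target_sub_tags)
    ]


def filter_by_attribute(candidates, doc_attributes, required_attribute):
    """Stage 2: keep candidates whose attributes contain every required key/value."""
    return [
        doc_id
        for doc_id in candidates
        if all(doc_attributes.get(doc_id, {}).get(k) == v
               for k, v in required_attribute.items())
    ]


documents = {"d1": ["2022-06-07"], "d2": ["2021-01-15"], "d3": ["2022-06-08"]}
candidates = candidate_documents("2022-06-07", documents)  # ratios 1.0, 0.6, 0.9
attributes = {"d1": {"status": "signed"}, "d3": {"status": "draft"}}
targets = filter_by_attribute(candidates, attributes, {"status": "signed"})
```

Running the cheap sub-tag comparison first keeps the attribute filter, the more selective step, working on a short candidate list.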
It should be noted that, for the functions and implementation principles of the above modules in the embodiments of the present disclosure, reference may be made to the foregoing method embodiments; details are not repeated here.
In this embodiment, a document search request is received; the required document type and required tag information are parsed from it; a target document library corresponding to the required document type is determined from multiple document libraries, where the document libraries belong to the document search platform and each stores documents of the corresponding document type; and the target document library is searched for target documents corresponding to the required tag information. The search is thus performed in the document library that matches the business type of the user's business scenario, which narrows the search scope and improves search efficiency while ensuring that the retrieved target documents meet the search needs of different business scenarios.
To implement the above embodiments, the present disclosure further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the program, the processor implements the method for constructing a document search platform or the document search method proposed in the foregoing embodiments of the present disclosure.
Figure 18 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present disclosure. As shown in Figure 18, the electronic device 180 includes a memory 1810 and a processor 1820. The memory 1810 stores a computer program executable on the processor 1820. When executing the computer program, the processor 1820 implements the method for constructing a document search platform or the document search method in the above embodiments. There may be one or more memories 1810 and one or more processors 1820.
The electronic device further includes a communication interface 1830, configured to communicate with external devices for data exchange. If the memory 1810, the processor 1820, and the communication interface 1830 are implemented independently, they may be connected to one another through a bus and communicate over it. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in Figure 18, but this does not mean that there is only one bus or only one type of bus.
Optionally, in a specific implementation, if the memory 1810, the processor 1820, and the communication interface 1830 are integrated on one chip, they may communicate with one another through internal interfaces.
The present disclosure further provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the method for constructing a document search platform proposed in the foregoing embodiments of the present disclosure, or the document search method in the above embodiments.
The present disclosure further provides a computer program product. When instructions in the computer program product are executed by a processor, the method for constructing a document search platform or the document search method proposed in the foregoing embodiments of the present disclosure is implemented.
It should be understood that the above processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor. Notably, the processor may be one that supports the Advanced RISC Machines (ARM) architecture.
Further, optionally, the above memory may include read-only memory and random access memory, and may further include non-volatile random access memory. The memory may be volatile memory, non-volatile memory, or both. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), and direct Rambus RAM (DR RAM).
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing module, each unit may exist physically alone, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If implemented as a software functional module and sold or used as an independent product, the integrated module may also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
The above are merely specific embodiments of the present disclosure, and the protection scope of the present disclosure is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed herein shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (15)

  1. A method for constructing a document search platform, comprising:
    acquiring a document to be processed, wherein the document to be processed has a corresponding document type;
    acquiring tag information corresponding to the document to be processed;
    constructing, according to the tag information and the document to be processed, a target document library corresponding to the document type; and
    forming a target document search platform according to the target document library.
  2. The method according to claim 1, wherein acquiring the tag information corresponding to the document to be processed comprises:
    determining a parent tag corresponding to the document to be processed;
    parsing, from the document to be processed, a sub-tag corresponding to the parent tag; and
    using the parent tag and the sub-tag together as the tag information.
  3. The method according to claim 2, wherein parsing, from the document to be processed, the sub-tag corresponding to the parent tag comprises:
    calling a natural language processing (NLP) service from the field of artificial intelligence (AI) to identify, in the document to be processed, a general document index corresponding to the parent tag, and using the general document index as the sub-tag; and/or
    calling the NLP service to identify, in the document to be processed, an associated entity value corresponding to the parent tag, and using the associated entity value as the sub-tag.
  4. The method according to any one of claims 1 to 3, wherein constructing, according to the tag information and the document to be processed, the target document library corresponding to the document type comprises:
    calling a robotic process automation (RPA) robot to determine an initial document library corresponding to the document type; and
    storing the tag information and the document to be processed in the initial document library to form the target document library.
  5. The method according to claim 4, wherein storing the tag information and the document to be processed in the initial document library comprises:
    acquiring a target loading type corresponding to the document to be processed; and
    storing the tag information and the document to be processed in the initial document library using a target document storage mode corresponding to the target loading type.
  6. The method according to claim 5, wherein storing the tag information and the document to be processed in the initial document library using the target document storage mode corresponding to the target loading type comprises:
    if the target loading type is a document loading type, storing the document to be processed and the corresponding tag information in the target document library; and/or
    if the target loading type is a link loading type, storing an access link corresponding to the document to be processed and the corresponding tag information in the target document library; and/or
    if the target loading type is a rich text loading type, editing the document to be processed via a rich text editor, and storing the editing result and the corresponding tag information in the target document library.
  7. The method according to claim 2 or 3, further comprising, after determining the parent tag corresponding to the document to be processed:
    configuring an attribute for the parent tag, and using the configured attribute as the tag information, wherein the attribute indicates whether the parent tag participates in document search.
  8. A document search method, applied to a document search platform constructed by the method for constructing a document search platform according to any one of claims 1 to 7;
    wherein the document search method comprises:
    receiving a document search request;
    parsing a required document type and required tag information from the document search request;
    determining, from multiple document libraries, a target document library corresponding to the required document type, wherein the multiple document libraries belong to the document search platform, and each document library is used to store documents of the corresponding document type; and
    searching the target document library for a target document corresponding to the required tag information.
  9. The method according to claim 8, wherein the required tag information comprises a required attribute and a required sub-tag, the target document library has multiple corresponding parent tags, each parent tag has corresponding sub-tags, and the corresponding sub-tags are used to describe the documents;
    wherein searching the target document library for the target document corresponding to the required tag information comprises:
    calling a natural language processing (NLP) service from the field of artificial intelligence (AI) to process the required attribute, so as to determine a target parent tag from the multiple parent tags, wherein the target parent tag has corresponding target sub-tags; and
    searching the target document library for the target document according to the required attribute, the required sub-tag, and the target sub-tags.
  10. The method according to claim 9, wherein the target document library comprises multiple said documents;
    wherein searching the target document library for the target document according to the required attribute, the required sub-tag, and the target sub-tags comprises:
    calling a robotic process automation (RPA) robot to search the multiple documents for candidate documents according to the required sub-tag and the target sub-tags; and
    filtering the candidate documents according to the required attribute to obtain the target document.
  11. The method according to claim 10, wherein searching the multiple documents for candidate documents according to the required sub-tag and the target sub-tags comprises:
    determining a similarity value between the required sub-tag and the target sub-tag of each of the documents; and
    if the similarity value satisfies a set condition, using the document corresponding to that target sub-tag as a candidate document.
  12. An apparatus for constructing a document search platform, comprising:
    a first acquisition module, configured to acquire a document to be processed, wherein the document to be processed has a corresponding document type;
    a second acquisition module, configured to acquire tag information corresponding to the document to be processed;
    a construction module, configured to construct, according to the tag information and the document to be processed, a target document library corresponding to the document type; and
    a forming module, configured to form a target document search platform according to the target document library.
  13. A document search apparatus, applied to a document search platform constructed by the apparatus for constructing a document search platform according to claim 12;
    wherein the document search apparatus comprises:
    a receiving module, configured to receive a document search request;
    a parsing module, configured to parse a required document type and required tag information from the document search request;
    a determination module, configured to determine, from multiple document libraries, a target document library corresponding to the required document type, wherein the multiple document libraries belong to the document search platform, and each document library is used to store documents of the corresponding document type; and
    a search module, configured to search the target document library for a target document corresponding to the required tag information.
  14. An electronic device, comprising a processor and a memory, wherein the memory stores instructions that are loaded and executed by the processor to implement the method for constructing a document search platform according to any one of claims 1 to 7, or the document search method according to any one of claims 8 to 11.
  15. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method for constructing a document search platform according to any one of claims 1 to 7, or the document search method according to any one of claims 8 to 11.
PCT/CN2022/100921 2022-06-07 2022-06-23 Document search platform, search method and apparatus, electronic device, and storage medium WO2023236257A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210637112.2A CN114936269A (en) 2022-06-07 2022-06-07 Document searching platform, searching method, device, electronic equipment and storage medium
CN202210637112.2 2022-06-07

Publications (1)

Publication Number Publication Date
WO2023236257A1 true WO2023236257A1 (en) 2023-12-14

Family

ID=82866472

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/100921 WO2023236257A1 (en) 2022-06-07 2022-06-23 Document search platform, search method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN114936269A (en)
WO (1) WO2023236257A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118172448A (en) * 2024-05-11 2024-06-11 中移(苏州)软件技术有限公司 Data processing method and device, electronic equipment and storage medium

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
CN117971780A (en) * 2023-12-29 2024-05-03 青矩技术股份有限公司 Document storage method, device, equipment and storage medium
CN117688136B (en) * 2024-01-30 2024-04-30 广州敏行数字科技有限公司 Combined retrieval optimization method and system based on artificial intelligence

Citations (6)

Publication number Priority date Publication date Assignee Title
CN102262640A (en) * 2010-05-31 2011-11-30 中国移动通信集团贵州有限公司 Method and device for full-text retrieval of document database
US20200110839A1 (en) * 2018-10-05 2020-04-09 International Business Machines Corporation Determining tags to recommend for a document from multiple database sources
US10776376B1 (en) * 2014-12-05 2020-09-15 Veritas Technologies Llc Systems and methods for displaying search results
CN112507068A (en) * 2020-11-30 2021-03-16 北京百度网讯科技有限公司 Document query method and device, electronic equipment and storage medium
CN113204621A (en) * 2021-05-12 2021-08-03 北京百度网讯科技有限公司 Document storage method, document retrieval method, device, equipment and storage medium
CN113449063A (en) * 2021-06-25 2021-09-28 树根互联股份有限公司 Method and device for constructing document structure information retrieval library

Cited By (1)

Publication number Priority date Publication date Assignee Title
CN118172448A (en) * 2024-05-11 2024-06-11 中移(苏州)软件技术有限公司 Data processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114936269A (en) 2022-08-23

Similar Documents

Publication Publication Date Title
WO2023236257A1 (en) Document search platform, search method and apparatus, electronic device, and storage medium
WO2019136993A1 (en) Text similarity calculation method and device, computer apparatus, and storage medium
CN105431844B (en) Third party for search system searches for application
JP5736469B2 (en) Search keyword recommendation based on user intention
WO2021114810A1 (en) Graph structure-based official document recommendation method, apparatus, computer device, and medium
US20160239504A1 (en) Method for entity enrichment of digital content to enable advanced search functionality in content management systems
JP2010073114A (en) Image information search device, image information search method, computer program for the same
JP6932360B2 (en) Object search method, device and server
JP2022031625A (en) Method and device for pushing information, electronic device, storage medium, and computer program
US11756301B2 (en) System and method for automatically detecting and marking logical scenes in media content
CN112883030A (en) Data collection method and device, computer equipment and storage medium
CN112732949A (en) Service data labeling method and device, computer equipment and storage medium
JP7395377B2 (en) Content search methods, devices, equipment, and storage media
CN112307318A (en) Content publishing method, system and device
US9984108B2 (en) Database joins using uncertain criteria
CN107633080B (en) User task processing method and device
CN111859042A (en) Retrieval method and device and electronic equipment
CN110489740B (en) Semantic analysis method and related product
CN115878864A (en) Data retrieval method, device and equipment and readable storage medium
US11720614B2 (en) Method and system for generating a response to an unstructured natural language (NL) query
CN113420097A (en) Data analysis method and device, storage medium and server
CN117931812B (en) Digital object element registry system facing data space and searching method
CN111427870B (en) Resource management method, device and equipment
CN109408368A (en) A kind of output method, storage medium and server for testing auxiliary information
CN112883249B (en) Layout document processing method and device and application method of device

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22945396

Country of ref document: EP

Kind code of ref document: A1