CN117472854A - Acceleration batch file search model - Google Patents


Info

Publication number
CN117472854A
Authority
CN
China
Prior art keywords
file
index
authority
information
service
Prior art date
Legal status
Pending
Application number
CN202311417905.4A
Other languages
Chinese (zh)
Inventor
康宁波
李志伟
王海超
仇晨
Current Assignee
Suzhou Shaka Intelligent Technology Co ltd
Original Assignee
Suzhou Shaka Intelligent Technology Co ltd
Priority date
2023-10-30
Filing date
2023-10-30
Publication date
2024-01-30
Application filed by Suzhou Shaka Intelligent Technology Co ltd
Priority to CN202311417905.4A
Publication of CN117472854A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/17 Details of further file system functions
    • G06F16/178 Techniques for file synchronisation in file systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses an accelerated batch file search model comprising a file index and file metadata associated with the file index. Authority control of the file index covers the directory space, tenant information, user information and file authority. Creating the file index involves the tenant, the file service and the index service; the index service creates and updates the index asynchronously. For data whose update fails, a compensation mechanism is added on top of the corresponding retry handling, so that missing or abnormal files are updated on a schedule. By aggregating files into a unified file index, the invention supports keyword queries over files and low-latency updates of file data, and supports multi-condition searches such as word-segmented keyword search, authority control and file path search. It is decoupled from services such as file metadata query, preview and editing, and emphasizes unified aggregation of data, query efficiency, and high coverage and accuracy of both query results and authority control.

Description

Acceleration batch file search model
Technical Field
The invention belongs to the technical field of file search, and in particular relates to an accelerated batch file search model.
Background
In the standardized management of a company, the knowledge base is an extremely important link. Knowledge base management records information and knowledge, which helps teams accumulate experience and share resources, supports team collaboration and security control, and builds a complete knowledge system that keeps evolving.
At present, a large number of a company's files are stored on service terminals. These files are scattered, and such unstructured data is difficult to retrieve. Conventional file search targets queries over structured file metadata, while Elasticsearch search is currently applied mostly to log analysis and web blogs and is rarely used for knowledge-base file storage.
Existing company knowledge-base tools can manage company files and set access rights through authority control, but file search is mainly implemented as fuzzy matching on file names. A user therefore has to know exactly which keywords the file name contains; otherwise the required file cannot be found.
Disclosure of Invention
To remedy these shortcomings of the prior art, the invention provides an accelerated batch file search model, which solves the problem that existing company knowledge bases cannot find the range of matching files by searching on file keywords.
An accelerated batch file search model comprises a file index and file metadata associated with the file index. Authority control of the file index is governed by its association with the file metadata, and covers the directory space, tenant information, user information and file authority, wherein:
the file metadata stores the name, file type, file size and address attributes of the associated file library;
the directory space stores metadata of tables, indexes and other objects, including table names, column names, column data types and index names;
the tenant information controls the physical isolation of file metadata and the logical isolation of document indexes;
the user information controls the ownership authority of documents;
the file authority controls the authority workflow, the authority information and the external-link sharing of a file, and is the finest-grained authority in the file index.
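The patent does not give a schema, so the following is only a minimal sketch, assuming one index document per file that carries the tenant, directory, owner and file-level authority alongside the file metadata. All names (`FileMetadata`, `IndexDocument`, `allowed_users`, `allowed_orgs`, `build_index_document`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileMetadata:
    """Metadata for one file in the associated file library (hypothetical fields)."""
    file_id: str
    name: str
    file_type: str
    size_bytes: int
    address: str  # where the file lives in the file library


@dataclass
class IndexDocument:
    """One searchable record; hypothetical layout, not the patent's schema."""
    tenant_id: str        # tenant information: logical isolation inside the index
    directory_path: str   # directory space the file belongs to
    owner_id: str         # user information: document ownership
    metadata: FileMetadata
    # File authority, the finest-grained control: who may see this file.
    allowed_users: set[str] = field(default_factory=set)
    allowed_orgs: set[str] = field(default_factory=set)
    external_link_shared: bool = False


def build_index_document(tenant_id, directory_path, owner_id, meta):
    # The owner can always see their own document; further grants come later.
    return IndexDocument(tenant_id, directory_path, owner_id, meta,
                         allowed_users={owner_id})
```

Keeping the authority fields on the same record as the metadata is one way to let a search engine filter by permission in the same query that matches keywords.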
Further, creating the file index involves the tenant, the file service and the index service. The index service creates and updates the index asynchronously. For data whose update fails, a compensation mechanism is added on top of the corresponding retry handling, so that missing or abnormal files are updated periodically.
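The retry-plus-compensation behaviour can be sketched as below. This is an illustration under assumptions, not the patented implementation: `IndexUpdater`, `submit`, `drain` and `compensate` are hypothetical names, the queue is drained synchronously for clarity, and in a real deployment `compensate` would be triggered by a scheduler (for example a `threading.Timer` or cron job) rather than called by hand.

```python
import queue


class IndexUpdater:
    """Hypothetical async index updater: bounded retries, then a compensation sweep."""

    def __init__(self, index_fn, max_attempts=3):
        self.index_fn = index_fn          # callable that writes one file event to the index
        self.max_attempts = max_attempts
        self.pending = queue.Queue()      # events waiting to be indexed
        self.failed = {}                  # file_id -> event, parked for compensation

    def submit(self, event):
        # The file service returns immediately; indexing happens later.
        self.pending.put(event)

    def drain(self):
        # Consume queued events, retrying each up to max_attempts times.
        while not self.pending.empty():
            event = self.pending.get()
            for _ in range(self.max_attempts):
                try:
                    self.index_fn(event)
                    self.failed.pop(event["file_id"], None)
                    break
                except Exception:
                    continue  # retry the same event
            else:
                # All attempts failed: park it for the periodic compensation run.
                self.failed[event["file_id"]] = event

    def compensate(self):
        # Periodic job: re-queue every missed or failed file and try again.
        for event in list(self.failed.values()):
            self.pending.put(event)
        self.failed.clear()
        self.drain()
```

The split keeps the fast path (a few immediate retries) separate from the slow path (a scheduled sweep), so a temporarily unavailable index does not block file uploads.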
Further, creating the file index covers the following:
space maintenance: data maintained in any space directory under a tenant is pushed by the file service and aggregated into the index service;
file upload: besides the conventional file storage flow, the upload process stores the file's metadata and content identification information in the index service. This step runs asynchronously, so it does not affect file storage and the services stay decoupled;
authority control: fine-grained control over the visible range set on a document or directory;
file recycling: the change is reflected in the index service, and recycled files become invisible.
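To make the four index-creation operations concrete, here is a toy in-memory stand-in for the index service, with one handler per operation and a naive keyword search that respects visibility and recycling. Everything here is hypothetical (`InMemoryIndexService`, `IndexedFile`, the `on_*` handlers); a real system would use a search engine such as Elasticsearch, and splitting on non-word characters is only a placeholder for a proper word-segmentation analyzer, which Chinese text would require.

```python
import re
from dataclasses import dataclass, field


@dataclass
class IndexedFile:
    file_id: str
    name: str
    directory: str
    content: str = ""
    visible_to: set = field(default_factory=set)  # users allowed to see the file
    recycled: bool = False


class InMemoryIndexService:
    """Hypothetical toy index service with one handler per operation."""

    def __init__(self):
        self.docs = {}

    def on_space_maintained(self, file_id, directory):
        # Space maintenance: the file service pushes the file's directory.
        self.docs[file_id].directory = directory

    def on_uploaded(self, file_id, name, directory, content, owner):
        # File upload: metadata plus extracted content (invoked asynchronously).
        self.docs[file_id] = IndexedFile(file_id, name, directory, content, {owner})

    def on_authority_changed(self, file_id, users):
        # Authority control: replace the document's visible range.
        self.docs[file_id].visible_to = set(users)

    def on_recycled(self, file_id):
        # File recycling: keep the record but hide it from search.
        self.docs[file_id].recycled = True

    def search(self, keyword, user):
        kw = keyword.lower()
        hits = []
        for doc in self.docs.values():
            if doc.recycled or user not in doc.visible_to:
                continue
            # Placeholder tokenizer: split name and content on non-word characters.
            tokens = set(re.split(r"\W+", f"{doc.name} {doc.content}".lower()))
            if kw in tokens:
                hits.append(doc.file_id)
        return sorted(hits)
```

For example, after `on_uploaded("f1", "leave-policy.docx", "/hr", "Annual leave rules", "u1")`, searching `"leave"` as `u1` returns `["f1"]`, while the same search as any other user returns nothing until `on_authority_changed` grants access.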
Further, the file index is created by a file consumer, which consumes file operations under different space directories asynchronously; the file producer sends messages that identify the tenant, the file identifier, the operation identifier and so on. The specific steps of creating the file index are:
S1: using the tenant ID, file metadata ID and file edit type as input parameters, query the metadata and detailed information of files under the different tenants;
S2: synchronize the file according to its edit type. When a tenant is added, check whether an index already exists for that tenant, so that the first synchronization initializes the index. On an edit, update not only the edited content but also synchronize the authority changes to the index. On a deletion, physically delete the document from the index without retaining it;
S3: handle exceptions raised while indexing file information; the affected records wait in the database to be retried and updated again.
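Steps S1 to S3 can be sketched as a single consumer method. This is a minimal sketch loosely modeled on the consumer described above, assuming dictionaries in place of the metadata database, the per-tenant indexes and the retry table; `FileConsumerSketch`, `FileMessage` and `EditType` are hypothetical names, not the patent's classes.

```python
from dataclasses import dataclass
from enum import Enum


class EditType(Enum):
    TENANT_ADDED = "tenant_added"
    EDITED = "edited"
    DELETED = "deleted"


@dataclass(frozen=True)
class FileMessage:
    tenant_id: str
    file_id: str
    edit_type: EditType


class FileConsumerSketch:
    """Hypothetical consumer for steps S1-S3; storage is plain dicts."""

    def __init__(self, metadata_db):
        self.metadata_db = metadata_db  # (tenant_id, file_id) -> {"content", "users"}
        self.indexes = {}               # tenant_id -> {file_id: document}
        self.retry_table = []           # S3: failed messages awaiting retry

    def consume(self, msg):
        try:
            # S1: look up metadata/details by tenant ID and file ID.
            detail = self.metadata_db.get((msg.tenant_id, msg.file_id))
            # S2: dispatch on the edit type.
            if msg.edit_type is EditType.TENANT_ADDED:
                # First sync for a new tenant: create its index only if missing.
                self.indexes.setdefault(msg.tenant_id, {})
            elif msg.edit_type is EditType.EDITED:
                if detail is None:
                    raise LookupError(f"no metadata for {msg.file_id}")
                index = self.indexes.setdefault(msg.tenant_id, {})
                # Content and authority are written together.
                index[msg.file_id] = {"content": detail["content"],
                                      "users": set(detail["users"])}
            elif msg.edit_type is EditType.DELETED:
                # Physical delete: nothing about the file is kept in the index.
                self.indexes.get(msg.tenant_id, {}).pop(msg.file_id, None)
        except Exception:
            # S3: record the failure so a later retry can update it again.
            self.retry_table.append(msg)
```

Writing content and authority in one update is what keeps search results from briefly exposing an edited file under stale permissions.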
In particular, during space maintenance an empty space directory is meaningless to the index service; only space directories that contain file entities are added to the index document as attributes.
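The rule that directories enter the index only as attributes of the files they hold can be illustrated with a short sketch. The function names and the dictionary document shape are assumptions made for illustration.

```python
def index_documents_for_space(directories, files):
    """Hypothetical helper: build index documents for one space.

    directories: directory paths maintained in the space (some may be empty).
    files: (file_id, directory_path) pairs for the file entities in the space.
    """
    known = set(directories)
    docs = []
    for file_id, directory in files:
        if directory in known:
            # The directory is recorded only because a file entity lives in it.
            docs.append({"file_id": file_id, "directory": directory})
    return docs


def indexed_directories(docs):
    # Directories the index service knows about: exactly those carried by a document.
    return {d["directory"] for d in docs}
```

An empty directory produces no document, so it never appears in the index, while the `directory` attribute on each document is what makes file path search possible.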
Further, authority control of file search supports multi-tenant authority isolation at the SaaS-edition level, including setting directory authorities, updating document authorities, setting document authorities, updating document authorities and modifying document authorities;
authority rules for the hidden portion of files are created as follows: the storage space of a file determines its organization-scope authority, and the source of a file determines its own authority;
by setting a file's authorities, search visibility can be restricted to specific organizations and personnel: setting an organization authority writes the organization information to the index service, and the personnel under that organization are written to the authority entries in the index service.
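A search-time visibility check consistent with this description might look like the sketch below, assuming tenant isolation is checked first and a user then sees a file either as a listed person or as a member of a granted organization. `SearchableDoc`, `User`, `grant_org`, `visible` and `filter_results` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class SearchableDoc:
    tenant_id: str
    file_id: str
    org_scope: set = field(default_factory=set)     # organizations allowed to see the file
    person_scope: set = field(default_factory=set)  # individual users allowed to see it


@dataclass
class User:
    tenant_id: str
    user_id: str
    org_ids: set = field(default_factory=set)


def grant_org(doc, org_id, org_members):
    """Organization authority: store the org and expand it to its members."""
    doc.org_scope.add(org_id)
    doc.person_scope.update(org_members.get(org_id, ()))


def visible(doc, user):
    # Tenant isolation first: another tenant's documents are never returned.
    if doc.tenant_id != user.tenant_id:
        return False
    return user.user_id in doc.person_scope or bool(doc.org_scope & user.org_ids)


def filter_results(docs, user):
    return [d.file_id for d in docs if visible(d, user)]
```

Expanding an organization into its members at grant time mirrors the description's "personnel information under the organization is updated to the authority items"; the trade-off is that membership changes must also be pushed to the index.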
Compared with the prior art, the invention has the following advantages:
(1) Files are aggregated into a unified file index server, which supports keyword queries over files, low-latency updates of file data, and multi-condition searches such as word-segmented keyword search, authority control and file path search.
(2) Dependence on the business application systems is reduced and performance pressure on the database is relieved. Scattered file data is aggregated, file search becomes portable and highly reusable, search performance improves, full-text search is brought to the knowledge base as a new search mode, and search accuracy improves.
Drawings
FIG. 1 is a block diagram of a model of the present invention;
FIG. 2 is a flow chart of file index creation according to the present invention;
FIG. 3 is a diagram of the rights control for file searching of the present invention;
FIG. 4 is a diagram of a data structure in an embodiment of the present invention;
FIG. 5 is a class diagram of an implementation of an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in FIG. 1, an accelerated batch file search model comprises a file index and file metadata associated with the file index. Authority control of the file index is governed by its association with the file metadata, and covers the directory space, tenant information, user information and file authority. The file metadata stores the name, file type, file size and address attributes of the associated file library; the directory space stores metadata of tables, indexes and other objects, including table names, column names, column data types and index names; the tenant information controls the physical isolation of file metadata and the logical isolation of document indexes; the user information controls the ownership authority of documents; the file authority controls the authority workflow, the authority information and the external-link sharing of a file, and is the finest-grained authority in the file index.
As shown in FIG. 2, creating the file index involves the tenant, the file service and the index service. The index service creates and updates the index asynchronously; for data whose update fails, a compensation mechanism is added on top of the corresponding retry handling, and missing or abnormal files are updated periodically. Creating the file index covers the following:
Space maintenance: data maintained in any space directory under a tenant is pushed by the file service and aggregated into the index service.
File upload: in addition to storing the file in the file storage space as usual, the upload process stores the file's metadata and content identification information in the index service. This step runs asynchronously, so it does not affect file storage and the services stay decoupled.
Authority control: fine-grained control over the visible range set on a document or directory.
File recycling: the change is reflected in the index service, and recycled files become invisible.
As shown in FIG. 3, authority control of file search supports multi-tenant authority isolation at the SaaS-edition level, including setting directory authorities, updating document authorities, setting document authorities, updating document authorities and modifying document authorities. Authority rules for the hidden portion of files are created as follows: the storage space of a file determines its organization-scope authority, and the source of a file determines its own authority. By setting a file's authorities, search visibility can be restricted to specific organizations and personnel: setting an organization authority writes the organization information to the index service, and the personnel under that organization are written to the authority entries in the index service.
Examples
The business requirement is for a company to implement global knowledge-base search in a multi-tenant scenario: search conditions are defined by the directory space the user belongs to and by the project or product space that owns the content, and global search runs over the document information made visible through authority control.
FIG. 4 shows the data structure retrieved inside the company. The main class for document index creation is FileConsumer, which consumes document operations under different space directories asynchronously. The FileProducer sends messages identifying the tenant, the file identifier, the operation identifier and so on.
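The producer/consumer message exchange can be sketched with the standard-library `queue` module standing in for a message broker, which the patent does not name. The JSON field names (`tenantId`, `fileId`, `operation`) and the functions `produce` and `consume_all` are assumptions for illustration only.

```python
import json
import queue


def produce(bus, tenant_id, file_id, operation):
    """Hypothetical producer-side send: one JSON message per file operation."""
    message = {"tenantId": tenant_id, "fileId": file_id, "operation": operation}
    bus.put(json.dumps(message))


def consume_all(bus, handler):
    """Drain the bus and hand each decoded message to handler; return the count."""
    handled = 0
    while True:
        try:
            raw = bus.get_nowait()
        except queue.Empty:
            return handled
        msg = json.loads(raw)
        handler(msg["tenantId"], msg["fileId"], msg["operation"])
        handled += 1
```

Because the file service only enqueues a small message, the upload path stays fast, and the consumer can look up full metadata later, as in step (1).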
As the implementation class diagram of FIG. 5 shows, creating a file index comprises the following steps:
(1) using the tenant ID, file metadata ID and file edit type as input parameters, query the metadata and detailed information of files under the different tenants;
(2) synchronize the file according to its edit type. When a tenant is added, check whether an index already exists for that tenant, so that the first synchronization initializes the index. On an edit, update not only the edited content but also synchronize the authority changes to the index. On a deletion, physically delete the document from the index without retaining it;
(3) handle exceptions raised while indexing file information; the affected records wait in the database to be retried and updated again.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (6)

1. An accelerated batch file search model, characterized by comprising a file index and file metadata associated with the file index, wherein authority control of the file index is governed by its association with the file metadata and covers a directory space, tenant information, user information and file authority, wherein:
the file metadata stores the name, file type, file size and address attributes of an associated file library;
the directory space stores metadata of tables, indexes and other objects, including table names, column names, column data types and index names;
the tenant information controls the physical isolation of file metadata and the logical isolation of document indexes;
the user information controls the ownership authority of documents;
the file authority controls the authority workflow, the authority information and the external-link sharing of a file, and is the finest-grained authority in the file index.
2. The accelerated batch file search model of claim 1, wherein creating the file index involves a tenant, a file service and an index service; the index service creates and updates the index asynchronously; and, for data whose update fails, a compensation mechanism is added on top of the corresponding retry handling so that missing or abnormal files are updated periodically.
3. The accelerated batch file search model of claim 2, wherein creating the file index covers the following:
space maintenance: data maintained in any space directory under a tenant is pushed by the file service and aggregated into the index service;
file upload: besides the conventional file storage flow, the upload process stores the file's metadata and content identification information in the index service; this step runs asynchronously, so it does not affect file storage and the services stay decoupled;
authority control: fine-grained control over the visible range set on a document or directory;
file recycling: the change is reflected in the index service, and recycled files become invisible.
4. The accelerated batch file search model of claim 3, wherein the file index is created by fileConsumer, which consumes file operations under different space directories asynchronously, and fileProducer sends messages identifying the tenant, the file identifier, the operation identifier and so on; the specific steps of creating the file index are:
S1: using the tenant ID, file metadata ID and file edit type as input parameters, query the metadata and detailed information of files under the different tenants;
S2: synchronize the file according to its edit type: when a tenant is added, check whether an index already exists for that tenant, so that the first synchronization initializes the index; on an edit, update not only the edited content but also synchronize the authority changes to the index; on a deletion, physically delete the document from the index without retaining it;
S3: handle exceptions raised while indexing file information, the affected records waiting in the database to be retried and updated again.
5. The accelerated batch file search model of claim 3, wherein, in the space maintenance, an empty space directory is meaningless to the index service, and only space directories containing file entities are added to the index document as attributes.
6. The accelerated batch file search model of claim 3, wherein authority control of file search supports multi-tenant authority isolation at the SaaS-edition level, including setting directory authorities, updating document authorities, setting document authorities, updating document authorities and modifying document authorities;
authority rules for the hidden portion of files are created as follows: the storage space of a file determines its organization-scope authority, and the source of a file determines its own authority;
by setting a file's authorities, search visibility can be restricted to specific organizations and personnel: setting an organization authority writes the organization information to the index service, and the personnel under that organization are written to the authority entries in the index service.
CN202311417905.4A 2023-10-30 2023-10-30 Acceleration batch file search model Pending CN117472854A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311417905.4A CN117472854A (en) 2023-10-30 2023-10-30 Acceleration batch file search model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311417905.4A CN117472854A (en) 2023-10-30 2023-10-30 Acceleration batch file search model

Publications (1)

Publication Number Publication Date
CN117472854A true CN117472854A (en) 2024-01-30

Family

ID=89634120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311417905.4A Pending CN117472854A (en) 2023-10-30 2023-10-30 Acceleration batch file search model

Country Status (1)

Country Link
CN (1) CN117472854A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117909299A (en) * 2024-03-19 2024-04-19 电子科技大学 Dynamic hierarchical data splitting system
CN117909299B (en) * 2024-03-19 2024-05-10 电子科技大学 Dynamic hierarchical data splitting system

Similar Documents

Publication Publication Date Title
US11516289B2 (en) Method and system for displaying similar email messages based on message contents
US6961734B2 (en) Method, system, and program for defining asset classes in a digital library
US5802524A (en) Method and product for integrating an object-based search engine with a parametrically archived database
US7464084B2 (en) Method for performing an inexact query transformation in a heterogeneous environment
US8010887B2 (en) Implementing versioning support for data using a two-table approach that maximizes database efficiency
US20060248039A1 (en) Sharing of full text index entries across application boundaries
US20050060337A1 (en) System, method, and service for managing persistent federated folders within a federated content management system
US7734618B2 (en) Creating adaptive, deferred, incremental indexes
US20220083618A1 (en) Method And System For Scalable Search Using MicroService And Cloud Based Search With Records Indexes
US20090240714A1 (en) Semantic relational database
AU2017243870B2 (en) "Methods and systems for database optimisation"
US20140046928A1 (en) Query plans with parameter markers in place of object identifiers
CN111680041B (en) Safety high-efficiency access method for heterogeneous data
CA2379930A1 (en) Multi-model access to data
CN114116716A (en) Hierarchical data retrieval method, device and equipment
CN117472854A (en) Acceleration batch file search model
US20030135492A1 (en) Method, system, and program for defining asset queries in a digital library
US6701328B1 (en) Database management system
CN107291938A (en) Order Query System and method
EP3436988B1 (en) "methods and systems for database optimisation"
CN104834664A (en) Optical disc juke-box oriented full text retrieval system
CN114238241B (en) Metadata processing method and computer system for financial data
Zabback et al. Office documents on a database kernel—filing, retrieval, and archiving
JP2007156844A (en) Data registration/retrieval system and data registration/retrieval method
CN113806366A (en) Atlas-based method for realizing multidimensional metadata joint query

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Kang Ningbo

Inventor after: Wang Haichao

Inventor after: Qiu Chen

Inventor before: Kang Ningbo

Inventor before: Li Zhiwei

Inventor before: Wang Haichao

Inventor before: Qiu Chen