CN117472854A - Acceleration batch file search model - Google Patents
- Publication number
- CN117472854A
- Authority
- CN
- China
- Prior art keywords
- file
- index
- authority
- information
- service
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/14—Details of searching files based on file metadata
- G06F16/148—File search processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/178—Techniques for file synchronisation in file systems
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Human Computer Interaction (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Storage Device Security (AREA)
Abstract
The invention discloses an accelerated batch file search model comprising a file index and the file metadata associated with it. Authority control over the file index covers directory space, tenant information, user information and file authority. File index creation involves the tenant, the file service and the index service; the index service creates and updates the index asynchronously, and for data whose update fails, a compensation mechanism is added on top of the corresponding retry processing to update missing or abnormal files on a schedule. By aggregating files into a unified file index, the model supports file keyword queries with low-latency file data updates, and supports multi-condition searches such as keyword word-segmentation search, authority control and file path search. It is decoupled from services such as file metadata query, preview and editing, and focuses on unified aggregation of data, on query efficiency, and on high coverage and accuracy of query results and authority control.
Description
Technical Field
The invention belongs to the technical field of file searching, and particularly relates to an acceleration batch file searching model.
Background
In a company's standardized management, knowledge base storage is an extremely important link. Knowledge base management records information and knowledge, which helps teams accumulate experience and share resources, supports team collaboration and security control, and forms a complete knowledge system that evolves continuously.
At present, a large number of company files are stored on service terminals; the files are scattered, and unstructured data is difficult to retrieve. Conventional file search targets structured file metadata queries, while Elasticsearch is currently applied mainly to log analysis and web blogs and is not commonly used for knowledge base file storage.
The company's current knowledge base tool can manage company files and set access rights through authority control, but file search relies mainly on fuzzy matching of file names. A user must therefore know exactly which keywords a file name contains; otherwise the required file cannot be found.
Disclosure of Invention
To remedy the shortcomings of the prior art, the invention provides an accelerated batch file search model that solves the problem that the existing company knowledge base cannot match and search files by file keywords.
An accelerated batch file search model comprises a file index and file metadata associated with the file index. Authority control over the file index is governed by its association with the file metadata and covers directory space, tenant information, user information and file authority:
the file metadata stores the name, file type, file size and address attributes of the associated file library;
the directory space stores metadata of tables, indexes and other objects, including table names, column names, column data types and index names;
the tenant information controls the physical isolation of file metadata and the logical isolation of document indexes;
the user information controls the ownership authority of a document;
the file authority controls the authority process, authority information and external link sharing of a file, and is the finest-grained authority in the file index.
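The structure described above can be pictured as one index document per file that carries the metadata attributes together with the four authority dimensions. The following is a minimal sketch, not the patent's schema: every field name (`tenant_id`, `space_path`, `owner_id`, `visible_orgs`, `visible_users`, `share_link`) and the per-tenant index naming scheme are assumptions made for illustration, and the directory space is simplified to a path string.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass(frozen=True)
class FileMetadata:
    """Attributes the file metadata is said to store (field names are assumed)."""
    file_id: str
    name: str
    file_type: str
    size_bytes: int
    address: str  # location of the file in the associated file library


@dataclass
class IndexDocument:
    """One searchable entry: metadata plus the four authority dimensions."""
    metadata: FileMetadata
    tenant_id: str    # tenant information: the isolation boundary
    space_path: str   # directory space, simplified here to a path (assumption)
    owner_id: str     # user information: who the document belongs to
    visible_orgs: Set[str] = field(default_factory=set)   # file authority
    visible_users: Set[str] = field(default_factory=set)  # file authority
    share_link: Optional[str] = None                      # external link sharing
    content: str = ""  # extracted text used for keyword search


def index_name_for(tenant_id: str) -> str:
    """Logical isolation of document indexes: one index name per tenant (assumed scheme)."""
    return f"files-{tenant_id}"
```

Keeping the authority fields on the same document as the searchable content means a single query can filter by tenant, organization and user without a second lookup against the metadata store.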
Further, file index creation involves the tenant, the file service and the index service. The index service creates and updates the index asynchronously; for data whose update fails, a compensation mechanism is added on top of the corresponding retry processing, and missing or abnormal files are updated on a schedule.
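One way to read the compensation mechanism is as a periodic reconciliation pass that compares the metadata store with the index and re-synchronizes anything missing or out of date. The sketch below assumes each file carries a monotonically increasing version number, which the source does not state; the function names are hypothetical, and the scheduling of the pass (for example a timer or cron job) is left out.

```python
from typing import Dict, List


def find_stale_files(metadata_versions: Dict[str, int],
                     indexed_versions: Dict[str, int]) -> List[str]:
    """Return IDs of files missing from the index or indexed at an older version."""
    stale = []
    for file_id, version in metadata_versions.items():
        # A file absent from the index is treated as version -1, so it is always stale.
        if indexed_versions.get(file_id, -1) < version:
            stale.append(file_id)
    return sorted(stale)


def compensate(metadata_versions: Dict[str, int],
               indexed_versions: Dict[str, int]) -> List[str]:
    """One pass of the timed compensation job: re-sync every stale file."""
    repaired = find_stale_files(metadata_versions, indexed_versions)
    for file_id in repaired:
        # Stand-in for re-indexing the file; here only the version is copied over.
        indexed_versions[file_id] = metadata_versions[file_id]
    return repaired
```

Because the pass is idempotent, running it again after a successful repair finds nothing to do, which makes it safe to schedule at a fixed interval.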
Further, the creation of the file index includes the following:
space maintenance: data maintained in any space directory under the tenant information is pushed by the file service and aggregated into the index service;
file upload: during upload, in addition to the conventional storage of the file, the metadata information and content identification information of the file are stored in the index service; this operation runs asynchronously, so it does not affect file storage and keeps the services decoupled;
authority control: fine-grained control over the visible range set on a document or directory;
file recovery: the operation is updated and reflected in the index service, and the recovered (recycled) files become invisible.
Further, the operating class for file index creation is the file consumer, which consumes file operations under different space directories asynchronously; the file producer sends messages that identify the tenant, the file, the operation and so on. The specific steps of creating the file index are:
S1: query the metadata and detailed information of files under different tenants, using the tenant ID, file metadata ID and file editing type as input parameters;
S2: synchronize the file according to the strategy for its editing type: on addition, check whether an index exists under the tenant, since the first synchronization must initialize the index; on editing, update the edited content and also synchronize authority changes to the index; on deletion, physically delete the document information from the index without retaining it;
S3: record file information whose indexing failed in the database, where it waits for a retry to update it again.
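Steps S1 to S3 can be sketched as a small dispatcher keyed on the editing type. This is an illustrative reading rather than the patented implementation: the class `FileIndexer`, the `lookup` callback and the in-memory `failures` list (standing in for the database table of failed records) are all hypothetical.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class EditType(Enum):
    ADD = "add"
    EDIT = "edit"
    DELETE = "delete"


class FileIndexer:
    def __init__(self, lookup: Callable[[str, str], Optional[dict]]):
        self.lookup = lookup  # S1: fetch metadata and details for (tenant, file)
        self.indexes: Dict[str, Dict[str, dict]] = {}  # tenant_id -> file_id -> document
        self.failures: List[Tuple[str, str, EditType]] = []  # S3: records awaiting retry

    def handle(self, tenant_id: str, file_id: str, edit_type: EditType) -> bool:
        try:
            if edit_type is EditType.DELETE:
                # Physical deletion: nothing about the document is retained.
                self.indexes.get(tenant_id, {}).pop(file_id, None)
                return True
            info = self.lookup(tenant_id, file_id)  # S1
            if info is None:
                raise LookupError(f"no metadata for {tenant_id}/{file_id}")
            if edit_type is EditType.ADD:
                # The first synchronization for a tenant initializes its index.
                self.indexes.setdefault(tenant_id, {})
            # An EDIT for a tenant with no index raises KeyError and is recorded below.
            index = self.indexes[tenant_id]
            # S2: content and authority are written together.
            index[file_id] = {"content": info["content"], "authority": info["authority"]}
            return True
        except Exception:
            # S3: any failure is recorded so that a later retry can update the index.
            self.failures.append((tenant_id, file_id, edit_type))
            return False
```

Writing content and authority in the same step keeps a search from returning an edited document under stale permissions.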
In particular, in the space maintenance, an empty space directory is not meaningful for the index service, and only the space directory containing the file entity can be added as an attribute to the index document.
Further, authority control for file search supports the multi-tenant authority isolation level of the SaaS version, including setting directory authority, setting document authority, updating document authority and modifying document authority;
implicit authority rules are created together with the file: the space in which the file is stored gives the file's organization-scope authority, and the source of the file gives the file's own authority;
by setting the authority of a file, searches can be limited to the organizations and personnel for whom the file is visible; setting organization authority updates the organization information to the index service, and the personnel under that organization are updated into the authority entries of the index service.
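The visibility rules above amount to a filter applied to every search: the tenant must match, and the caller must be listed either personally or through an organization. A minimal sketch follows, assuming hypothetical `Doc` and `Caller` records and set-valued authority entries; the source does not specify how authority entries are stored in the index.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Doc:
    file_id: str
    tenant_id: str
    visible_orgs: Set[str] = field(default_factory=set)
    visible_users: Set[str] = field(default_factory=set)


@dataclass
class Caller:
    user_id: str
    tenant_id: str
    org_ids: Set[str]


def set_org_authority(doc: Doc, org_id: str, members: Iterable[str]) -> None:
    """Organization authority: record the organization and its personnel on the document."""
    doc.visible_orgs.add(org_id)
    doc.visible_users.update(members)


def can_see(doc: Doc, caller: Caller) -> bool:
    # Tenant isolation comes first: no cross-tenant visibility at all.
    if doc.tenant_id != caller.tenant_id:
        return False
    return caller.user_id in doc.visible_users or bool(doc.visible_orgs & caller.org_ids)


def visible_docs(docs: Iterable[Doc], caller: Caller) -> List[str]:
    """Return the IDs of the documents the caller is allowed to find."""
    return [d.file_id for d in docs if can_see(d, caller)]
```

For example, a user in tenant `t1` never sees a document from tenant `t2`, even when the user ID appears in that document's authority entries.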
Compared with the prior art, the invention has the following advantages:
(1) Files are aggregated into a unified file index server, which supports file keyword queries, low-latency file data updates, keyword word-segmentation search, authority control, and multi-condition searches such as file path search.
(2) Dependence on the business application system is reduced and performance pressure on the database is relieved. Scattered file data is aggregated, file search becomes portable and highly reusable, and search performance improves. Applying full-text search to the knowledge base provides a new search mode and improves file search accuracy.
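To make the difference from file-name fuzzy matching concrete, keyword word-segmentation search combined with a file path condition can be sketched with a small inverted index. This is a stand-in for whatever full-text engine a deployment would use, not the patent's implementation; the class `KeywordIndex` is hypothetical, and the regular-expression tokenizer only handles space-delimited text, whereas a production analyzer would also segment Chinese.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case word segmentation on runs of word characters."""
    return re.findall(r"\w+", text.lower())


class KeywordIndex:
    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)  # token -> file IDs
        self.paths: Dict[str, str] = {}                        # file ID -> path

    def add(self, file_id: str, path: str, content: str) -> None:
        self.paths[file_id] = path
        for token in tokenize(content):
            self.postings[token].add(file_id)

    def search(self, query: str, path_prefix: str = "") -> List[str]:
        """Return files containing every query token, optionally under a path prefix."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return sorted(f for f in hits if self.paths[f].startswith(path_prefix))
```

A query matches on file content rather than the file name, so a user no longer needs to know which keywords the name contains.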
Drawings
FIG. 1 is a block diagram of a model of the present invention;
FIG. 2 is a flow chart of file index creation according to the present invention;
FIG. 3 is a diagram of the rights control for file searching of the present invention;
FIG. 4 is a diagram of a data structure in an embodiment of the present invention;
FIG. 5 is a class diagram of an implementation of an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in FIG. 1, an accelerated batch file search model comprises a file index and file metadata associated with the file index. Authority control over the file index is governed by its association with the file metadata and covers directory space, tenant information, user information and file authority. The file metadata stores the name, file type, file size and address attributes of the associated file library; the directory space stores metadata of tables, indexes and other objects, including table names, column names, column data types and index names; the tenant information controls the physical isolation of file metadata and the logical isolation of document indexes; the user information controls the ownership authority of a document; the file authority controls the authority process, authority information and external link sharing of a file, and is the finest-grained authority in the file index.
As shown in FIG. 2, file index creation involves the tenant, the file service and the index service. The index service creates and updates the index asynchronously; for data whose update fails, a compensation mechanism is added on top of the corresponding retry processing, and missing or abnormal files are updated on a schedule. Creation of the file index includes the following:
Space maintenance: data maintained in any space directory under the tenant information is pushed by the file service and aggregated into the index service.
File upload: in addition to the conventional storage of the file in the file storage space, the upload process stores the metadata information and content identification information of the file in the index service. This operation runs asynchronously, so it does not affect file storage and keeps the services decoupled.
Authority control: fine-grained control over the visible range set on a document or directory.
File recovery: the operation is updated and reflected in the index service, and the recovered (recycled) files become invisible.
As shown in FIG. 3, authority control for file search supports the multi-tenant authority isolation level of the SaaS version, including setting directory authority, setting document authority, updating document authority and modifying document authority. Implicit authority rules are created together with the file: the space in which the file is stored gives the file's organization-scope authority, and the source of the file gives the file's own authority. By setting the authority of a file, searches can be limited to the organizations and personnel for whom the file is visible; setting organization authority updates the organization information to the index service, and the personnel under that organization are updated into the authority entries of the index service.
Examples
The business requirement is a global knowledge base search function for a company in a multi-tenant scenario: searches are constrained by the directory space the user belongs to and by the project or product space the user belongs to, and global search runs over the document information permitted by authority control.
FIG. 4 shows the data structure used for retrieval inside the company. The main operating class for document index creation is FileConsumer, which consumes document operations under different space directories asynchronously. FileProducer sends the messages, which identify the tenant, the file, the operation and so on.
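The producer and consumer relationship described here can be sketched with an in-process queue standing in for the message channel. Only the class names FileProducer and FileConsumer come from the source; the message fields, the `handler` callback, the `None` stop sentinel and the use of `queue.Queue` in place of a real message broker are assumptions for illustration.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMessage:
    tenant_id: str
    file_id: str
    operation: str  # for example "add", "edit" or "delete"


class FileProducer:
    def __init__(self, channel: queue.Queue):
        self.channel = channel

    def send(self, tenant_id: str, file_id: str, operation: str) -> None:
        # Called by the file service once the file itself has been stored,
        # so indexing never blocks the storage path.
        self.channel.put(FileMessage(tenant_id, file_id, operation))


class FileConsumer(threading.Thread):
    def __init__(self, channel: queue.Queue, handler):
        super().__init__(daemon=True)
        self.channel = channel
        self.handler = handler

    def run(self) -> None:
        while True:
            message = self.channel.get()
            if message is None:  # sentinel: stop consuming
                break
            self.handler(message)


def demo():
    """Send two operations and return them in the order the consumer handled them."""
    channel = queue.Queue()
    seen = []
    consumer = FileConsumer(channel, seen.append)
    consumer.start()
    producer = FileProducer(channel)
    producer.send("t1", "f1", "add")
    producer.send("t1", "f1", "edit")
    channel.put(None)
    consumer.join(timeout=5)
    return seen
```

With a single consumer thread, messages for the same file are handled in the order they were sent, so an edit is never applied before the add that precedes it.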
As the implementation class diagram of FIG. 5 shows, the specific steps of creating a file index are:
(1) Query the metadata and detailed information of files under different tenants, using the tenant ID, file metadata ID and file editing type as input parameters;
(2) Synchronize the file according to the strategy for its editing type: on addition, check whether an index exists under the tenant, since the first synchronization must initialize the index; on editing, update the edited content and also synchronize authority changes to the index; on deletion, physically delete the document information from the index without retaining it;
(3) Record file information whose indexing failed in the database, where it waits for a retry to update it again.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.
Claims (6)
1. An accelerated batch file search model, characterized by comprising a file index and file metadata associated with the file index, wherein authority control over the file index is governed by its association with the file metadata and covers directory space, tenant information, user information and file authority;
the file metadata stores the name, file type, file size and address attributes of the associated file library;
the directory space stores metadata of tables, indexes and other objects, including table names, column names, column data types and index names;
the tenant information controls the physical isolation of file metadata and the logical isolation of document indexes;
the user information controls the ownership authority of a document;
the file authority controls the authority process, authority information and external link sharing of a file, and is the finest-grained authority in the file index.
2. The accelerated batch file search model of claim 1, wherein file index creation involves the tenant, the file service and the index service; the index service creates and updates the index asynchronously, and for data whose update fails, a compensation mechanism is added on top of the corresponding retry processing to update missing or abnormal files on a schedule.
3. The accelerated batch file search model of claim 2, wherein the creation of the file index includes the following:
space maintenance: data maintained in any space directory under the tenant information is pushed by the file service and aggregated into the index service;
file upload: during upload, in addition to the conventional storage of the file, the metadata information and content identification information of the file are stored in the index service; this operation runs asynchronously, so it does not affect file storage and keeps the services decoupled;
authority control: fine-grained control over the visible range set on a document or directory;
file recovery: the operation is updated and reflected in the index service, and the recovered (recycled) files become invisible.
4. The accelerated batch file search model of claim 3, wherein the operating class for file index creation is fileConsumer, which consumes file operations under different space directories asynchronously, and fileProducer sends messages that identify the tenant, the file, the operation and so on; the specific steps of creating the file index are:
S1: query the metadata and detailed information of files under different tenants, using the tenant ID, file metadata ID and file editing type as input parameters;
S2: synchronize the file according to the strategy for its editing type: on addition, check whether an index exists under the tenant, since the first synchronization must initialize the index; on editing, update the edited content and also synchronize authority changes to the index; on deletion, physically delete the document information from the index without retaining it;
S3: record file information whose indexing failed in the database, where it waits for a retry to update it again.
5. The accelerated batch file search model of claim 3, wherein, in the space maintenance, an empty space directory is meaningless to the index service, and only a space directory containing file entities can be added to an index document as an attribute.
6. The accelerated batch file search model of claim 3, wherein authority control for file search supports the multi-tenant authority isolation level of the SaaS version, including setting directory authority, setting document authority, updating document authority and modifying document authority;
implicit authority rules are created together with the file: the space in which the file is stored gives the file's organization-scope authority, and the source of the file gives the file's own authority;
by setting the authority of a file, searches can be limited to the organizations and personnel for whom the file is visible; setting organization authority updates the organization information to the index service, and the personnel under that organization are updated into the authority entries of the index service.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311417905.4A CN117472854A (en) | 2023-10-30 | 2023-10-30 | Acceleration batch file search model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117472854A (en) | 2024-01-30 |
Family
ID=89634120
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311417905.4A Pending CN117472854A (en) | 2023-10-30 | 2023-10-30 | Acceleration batch file search model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117472854A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117909299A (en) * | 2024-03-19 | 2024-04-19 | 电子科技大学 | Dynamic hierarchical data splitting system |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117909299A (en) * | 2024-03-19 | 2024-04-19 | 电子科技大学 | Dynamic hierarchical data splitting system |
CN117909299B (en) * | 2024-03-19 | 2024-05-10 | 电子科技大学 | Dynamic hierarchical data splitting system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB03 | Change of inventor or designer information | Inventor after: Kang Ningbo, Wang Haichao, Qiu Chen; Inventor before: Kang Ningbo, Li Zhiwei, Wang Haichao, Qiu Chen |