CN117472854A

CN117472854A - Accelerated batch file search model

Info

Publication number: CN117472854A
Application number: CN202311417905.4A
Authority: CN
Inventors: 康宁波; 李志伟; 王海超; 仇晨
Original assignee: Suzhou Shaka Intelligent Technology Co ltd
Current assignee: Suzhou Shaka Intelligent Technology Co ltd
Priority date: 2023-10-30
Filing date: 2023-10-30
Publication date: 2024-01-30

Abstract

The invention discloses an accelerated batch file search model comprising a file index and the file metadata associated with it. Permission control of the file index covers the directory space, tenant information, user information and file permissions. Creating the file index involves the tenant, a file service and an index service; the index service creates and updates the index asynchronously, and for data whose update fails it adds, beyond the corresponding retry handling, a compensation mechanism that periodically re-applies missing or failed file updates. By aggregating files into a unified file index, the model supports keyword queries over files, achieves low-latency updates of file data, and supports multi-condition searches such as tokenized keyword search, permission-controlled search and file path search. It is decoupled from services such as file metadata query, preview and editing, and focuses on unified aggregation of data, on query efficiency, and on high coverage and accuracy of query results and permission control.

Description

Accelerated batch file search model

Technical Field

The invention belongs to the technical field of file search, and in particular relates to an accelerated batch file search model.

Background

In a company's standardized management, the knowledge base is an extremely important link. Knowledge base management records information and knowledge, helps a team accumulate experience and share resources, supports team collaboration and security control, and builds a complete knowledge system that evolves continuously.

At present, a company's files are stored in large numbers on service terminals; the files are scattered, and unstructured data is difficult to retrieve. Conventional file search targets queries over structured file metadata, while Elasticsearch-based search is currently applied mainly to log analysis and web blogs and is rarely used for knowledge base file storage.

Existing company knowledge base tools can manage company files and set access rights through permission control, but file search relies mainly on fuzzy matching of file names. The searcher must therefore know exactly which keywords the file name contains; otherwise the required file cannot be found.

Disclosure of Invention

To remedy the shortcomings of the prior art, the invention provides an accelerated batch file search model, which solves the problem that existing company knowledge bases cannot search and match files by file keywords.

An accelerated batch file search model comprises a file index and file metadata associated with the file index, wherein permission control of the file index is governed by its association with the file metadata and covers the directory space, tenant information, user information and file permissions;

the file metadata stores the name attribute, file type attribute, file size attribute and the address attribute of the associated file library;

the directory space stores metadata of tables, indexes and other objects, including table names, column names, column data types and index names;

the tenant information controls physical isolation of file metadata and logical isolation of document indexes;

the user information controls the ownership permissions of a document;

the file permission controls the file's permission workflow, permission information and external-link sharing, and is the finest-grained permission in the file index.
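The relationship between an index entry, its metadata and its permission fields can be sketched as plain Python dataclasses. This is only an illustration under stated assumptions: the class names, field names and the path-style directory space are hypothetical and are not given in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileMetadata:
    """Attributes the text assigns to file metadata."""
    name: str
    file_type: str
    size_bytes: int
    library_address: str  # address of the associated file library


@dataclass
class FilePermission:
    """Finest-grained permission attached to one indexed file (assumed fields)."""
    visible_user_ids: List[str] = field(default_factory=list)
    visible_org_ids: List[str] = field(default_factory=list)
    external_link_shared: bool = False


@dataclass
class IndexDocument:
    """One entry of the file index, linked to its metadata record."""
    tenant_id: str        # tenant information: isolates documents logically
    directory_space: str  # directory space the file belongs to (assumed to be a path)
    owner_user_id: str    # user information: who the document belongs to
    metadata: FileMetadata
    permission: FilePermission


doc = IndexDocument(
    tenant_id="tenant-a",
    directory_space="/product/specs",
    owner_user_id="u-1",
    metadata=FileMetadata("design.docx", "docx", 20480, "store://bucket/design.docx"),
    permission=FilePermission(visible_user_ids=["u-1", "u-2"]),
)
```

Keeping the permission fields on the index entry itself means a search can be filtered without a second lookup in the metadata store, which matches the text's emphasis on query efficiency.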

Further, creating the file index involves the tenant, a file service and an index service. The index service creates and updates the index asynchronously, and for data whose update fails, in addition to the corresponding retry handling, a compensation mechanism is added that periodically updates missing or failed files.
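A minimal sketch of the retry-plus-compensation idea follows. The text does not specify retry counts, storage or scheduling, so `IndexUpdater`, `max_attempts`, the in-memory `failed` map and the `flaky_write` stand-in are all hypothetical; in practice `compensate` would be invoked by a timer or scheduler.

```python
from typing import Callable, Dict, List


class IndexUpdater:
    """Immediate retries first, then a periodic compensation pass for what still failed."""

    def __init__(self, write: Callable[[str, dict], None], max_attempts: int = 3):
        self.write = write
        self.max_attempts = max_attempts
        self.failed: Dict[str, dict] = {}  # updates awaiting compensation, keyed by file ID

    def update(self, file_id: str, doc: dict) -> bool:
        for _ in range(self.max_attempts):
            try:
                self.write(file_id, doc)
                self.failed.pop(file_id, None)
                return True
            except Exception:
                continue  # retry handling: try again up to max_attempts
        self.failed[file_id] = doc  # leave it for the compensation job
        return False

    def compensate(self) -> List[str]:
        """Re-apply failed updates; meant to be called at a fixed interval."""
        recovered = []
        for file_id, doc in list(self.failed.items()):
            if self.update(file_id, doc):
                recovered.append(file_id)
        return recovered


index: Dict[str, dict] = {}
calls = {"n": 0}


def flaky_write(file_id: str, doc: dict) -> None:
    # Simulated index service that is unavailable for the first three calls.
    calls["n"] += 1
    if calls["n"] <= 3:
        raise ConnectionError("index service unavailable")
    index[file_id] = doc


updater = IndexUpdater(flaky_write, max_attempts=3)
first_try = updater.update("f-1", {"name": "a.txt"})  # all three attempts fail
recovered = updater.compensate()                      # the later pass succeeds
```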

Further, the creation of the file index includes the following:

space maintenance: data maintained in any space directory under a tenant is pushed by the file service and aggregated into the index service;

file upload: during upload, in addition to the conventional step of storing the file, the file's metadata and its content identification information are stored in the index service; this operation is asynchronous, so it does not affect file storage and keeps the services decoupled;

permission control: fine-grained control over the visible range set for a document or directory;

file recycling: the change is reflected in the index service, and recycled files become invisible.

Further, the operating class for file index creation is FileConsumer, which consumes file operations under different space directories asynchronously; FileProducer sends messages that identify the tenant, the file, the operation and so on. The specific steps of creating the file index include:

S1, taking the tenant ID, file metadata ID and file edit type as input parameters, query the metadata and detailed information of files under different tenants;

S2, synchronize the file according to the strategy for its edit type: on add, check whether an index already exists under the tenant, so that the first synchronization initializes the index; on edit, update the edited content and also synchronize permission changes to the index; on delete, physically delete the document information from the index without retaining it;

S3, record file information whose indexing failed in the database, where it waits to be retried and updated again.
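The three steps can be sketched as a single `consume` method. This is an illustrative sketch, not the patent's implementation: the edit-type labels `"add"`, `"edit"` and `"delete"`, the dictionary-based metadata store, and the in-memory `retry_table` standing in for the database are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FileMessage:
    tenant_id: str
    file_id: str
    edit_type: str  # "add", "edit" or "delete" (hypothetical labels)


class FileConsumer:
    def __init__(self, metadata_store: Dict[tuple, dict]):
        self.metadata_store = metadata_store           # stand-in for the file service
        self.indexes: Dict[str, Dict[str, dict]] = {}  # tenant ID -> file ID -> document
        self.retry_table: List[FileMessage] = []       # stand-in for the database

    def consume(self, msg: FileMessage) -> None:
        try:
            # Step 1: look up metadata and details by tenant ID and file ID.
            meta: Optional[dict] = self.metadata_store.get((msg.tenant_id, msg.file_id))
            # Step 2: apply the strategy for this edit type.
            if msg.edit_type == "add":
                # The first synchronization for a tenant initializes its index.
                tenant_index = self.indexes.setdefault(msg.tenant_id, {})
                tenant_index[msg.file_id] = dict(meta)
            elif msg.edit_type == "edit":
                # Edited content and permission changes are written together.
                self.indexes[msg.tenant_id][msg.file_id] = dict(meta)
            elif msg.edit_type == "delete":
                # Physical delete: nothing is retained in the index.
                self.indexes[msg.tenant_id].pop(msg.file_id, None)
            else:
                raise ValueError(f"unknown edit type: {msg.edit_type}")
        except Exception:
            # Step 3: record the failed message so it can be retried later.
            self.retry_table.append(msg)


store = {("t1", "f1"): {"name": "plan.md", "content": "v1", "visible_to": ["u1"]}}
consumer = FileConsumer(store)
consumer.consume(FileMessage("t1", "f1", "add"))
store[("t1", "f1")] = {"name": "plan.md", "content": "v2", "visible_to": ["u1", "u2"]}
consumer.consume(FileMessage("t1", "f1", "edit"))
after_edit = dict(consumer.indexes["t1"]["f1"])
consumer.consume(FileMessage("t1", "f1", "delete"))
consumer.consume(FileMessage("t1", "missing", "add"))  # no metadata: goes to the retry table
```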

In particular, in space maintenance an empty space directory has no meaning for the index service; only a space directory that contains file entities can be added to an index document as an attribute.
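As a small illustration of this rule, the hypothetical helper below attaches a directory to index documents only when the directory holds at least one file. The function name and the input shape (directory path mapped to a list of file IDs) are assumptions made for the sketch.

```python
from typing import Dict, List


def directories_to_index(files_by_directory: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each file to its space directory, skipping directories without files."""
    file_to_directory: Dict[str, str] = {}
    for directory, files in files_by_directory.items():
        if not files:
            continue  # empty space directory: there is no index document to attach it to
        for file_id in files:
            file_to_directory[file_id] = directory
    return file_to_directory


spaces = {"/product/specs": ["f1", "f2"], "/product/archive": []}
attrs = directories_to_index(spaces)
```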

Further, permission control for file search supports the multi-tenant permission isolation level of the SaaS version, including setting directory permissions, setting document permissions, updating document permissions and modifying document permissions;

when a file is created it carries implicit permission rules: the space in which the file is stored determines its organization-scope permission, and the source of the file determines the permission of the file itself;

by setting a file's permissions, the file can be made searchable within a visible scope of organizations and personnel; setting an organization permission updates the organization information to the index service and writes the personnel under that organization into the permission entries in the index service.
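One way to picture the organization permission update is a function that writes both the organization and its members into a document's permission entries. The field names `visible_orgs` and `visible_users` and the `org_members` lookup are hypothetical; the text does not name them.

```python
from typing import Dict, List, Set


def set_org_permission(index_doc: dict, org_id: str, org_members: Dict[str, List[str]]) -> dict:
    """Return a copy of the document with the organization and its members made visible."""
    orgs: Set[str] = set(index_doc.get("visible_orgs", []))
    users: Set[str] = set(index_doc.get("visible_users", []))
    orgs.add(org_id)
    users.update(org_members.get(org_id, []))  # personnel under the organization
    updated = dict(index_doc)
    updated["visible_orgs"] = sorted(orgs)
    updated["visible_users"] = sorted(users)
    return updated


members = {"org-rd": ["u2", "u3"]}
doc = {"file_id": "f1", "visible_orgs": [], "visible_users": ["u1"]}
doc = set_org_permission(doc, "org-rd", members)
```

Expanding members at write time trades a larger index entry for a simpler query-time filter on the user ID alone.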

Compared with the prior art, the invention has the following advantages:

(1) Files are aggregated into a unified file index service, which supports keyword queries over files, achieves low-latency updates of file data, and supports multi-condition searches such as tokenized keyword search, permission-controlled search and file path search.

(2) Dependence on the business application system is reduced and performance pressure on the database is relieved; scattered file data is aggregated, file search becomes portable and highly reusable, and search performance improves; applying full-text search to the knowledge base provides a new search mode and improves the accuracy of file search.

Drawings

FIG. 1 is a block diagram of a model of the present invention;

FIG. 2 is a flow chart of file index creation according to the present invention;

FIG. 3 is a diagram of the rights control for file searching of the present invention;

FIG. 4 is a diagram of a data structure in an embodiment of the present invention;

FIG. 5 is a class diagram of an implementation of an embodiment of the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

As shown in FIG. 1, an accelerated batch file search model comprises a file index and file metadata associated with the file index, wherein permission control of the file index is governed by its association with the file metadata and covers the directory space, tenant information, user information and file permissions. The file metadata stores the name attribute, file type attribute, file size attribute and the address attribute of the associated file library; the directory space stores metadata of tables, indexes and other objects, including table names, column names, column data types and index names; the tenant information controls physical isolation of file metadata and logical isolation of document indexes; the user information controls the ownership permissions of a document; the file permission controls the file's permission workflow, permission information and external-link sharing, and is the finest-grained permission in the file index.

As shown in FIG. 2, creating the file index involves the tenant, a file service and an index service. The index service creates and updates the index asynchronously, and for data whose update fails, in addition to the corresponding retry handling, a compensation mechanism is added that periodically updates missing or failed files. Creation of the file index includes the following:

Space maintenance: data maintained in any space directory under a tenant is pushed by the file service and aggregated into the index service.

File upload: in addition to the conventional step of storing the file in the file storage space, the upload process stores the file's metadata and its content identification information in the index service; this operation is asynchronous, so it does not affect file storage and keeps the services decoupled.

Permission control: fine-grained control over the visible range set for a document or directory.

As shown in FIG. 3, permission control for file search supports the multi-tenant permission isolation level of the SaaS version, including setting directory permissions, setting document permissions, updating document permissions and modifying document permissions. When a file is created it carries implicit permission rules: the space in which the file is stored determines its organization-scope permission, and the source of the file determines the permission of the file itself. By setting a file's permissions, the file can be made searchable within a visible scope of organizations and personnel; setting an organization permission updates the organization information to the index service and writes the personnel under that organization into the permission entries in the index service.
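A permission-filtered search can be sketched as a filter chain over index documents: tenant isolation first, then user visibility, then the path condition, then the keyword match. Everything here is a simplified assumption: the field names are hypothetical, and whitespace splitting stands in for the tokenized keyword search that a real index service would perform with its own analyzer.

```python
from typing import List


def search(docs: List[dict], tenant_id: str, user_id: str,
           keyword: str, path_prefix: str = "/") -> List[str]:
    """Return IDs of files the user may see that match the keyword and path."""
    hits = []
    for d in docs:
        if d["tenant_id"] != tenant_id:
            continue  # tenant isolation
        if user_id not in d["visible_users"]:
            continue  # file-level permission
        if not d["path"].startswith(path_prefix):
            continue  # file path condition
        if keyword.lower() in d["content"].lower().split():
            hits.append(d["file_id"])  # naive whitespace tokenization
    return hits


docs = [
    {"file_id": "f1", "tenant_id": "t1", "path": "/product/specs/a.md",
     "content": "Release plan for search", "visible_users": ["u1", "u2"]},
    {"file_id": "f2", "tenant_id": "t1", "path": "/product/specs/b.md",
     "content": "Search budget", "visible_users": ["u2"]},
    {"file_id": "f3", "tenant_id": "t2", "path": "/product/specs/c.md",
     "content": "search notes", "visible_users": ["u1"]},
]

for_u1 = search(docs, "t1", "u1", "search")
for_u2 = search(docs, "t1", "u2", "search", "/product")
outside_path = search(docs, "t1", "u2", "search", "/hr")
```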

Examples

The business requirement is that a company implements a global knowledge base search in a multi-tenant scenario: search conditions are restricted by the directory space to which the user belongs and by the project or product space to which the user belongs, and the global search runs over document information whose visibility is set through permission control.

FIG. 4 shows the data structure used for retrieval inside the company. The main operating class for document index creation is FileConsumer, which consumes document operations under different space directories asynchronously. FileProducer sends messages that identify the tenant, the file, the operation and so on.
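The asynchronous hand-off between the two classes can be sketched with Python's standard `queue` and `threading` modules. Only the class names FileProducer and FileConsumer come from the text; the in-process queue, the tuple message shape and the `None` stop sentinel are assumptions, and a deployment would more likely use a message broker.

```python
import queue
import threading
from typing import List, Tuple

Message = Tuple[str, str, str]  # (tenant ID, file ID, operation ID), an assumed shape


class FileProducer:
    def __init__(self, channel: queue.Queue):
        self.channel = channel

    def send(self, tenant_id: str, file_id: str, operation: str) -> None:
        # Returns immediately, so the file service never waits for indexing.
        self.channel.put((tenant_id, file_id, operation))


class FileConsumer(threading.Thread):
    def __init__(self, channel: queue.Queue):
        super().__init__(daemon=True)
        self.channel = channel
        self.handled: List[Message] = []

    def run(self) -> None:
        while True:
            msg = self.channel.get()
            if msg is None:  # sentinel: stop consuming
                self.channel.task_done()
                break
            self.handled.append(msg)  # a real consumer would update the index here
            self.channel.task_done()


channel: queue.Queue = queue.Queue()
consumer = FileConsumer(channel)
consumer.start()

producer = FileProducer(channel)
producer.send("t1", "f1", "add")
producer.send("t1", "f1", "edit")

channel.put(None)          # ask the consumer to stop
channel.join()             # wait until every message has been processed
consumer.join(timeout=5)
```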

As the implementation class diagram of FIG. 5 shows, the specific steps of creating a file index include:

(1) Taking the tenant ID, file metadata ID and file edit type as input parameters, query the metadata and detailed information of files under different tenants;

(2) Synchronize the file according to the strategy for its edit type: on add, check whether an index already exists under the tenant, so that the first synchronization initializes the index; on edit, update the edited content and also synchronize permission changes to the index; on delete, physically delete the document information from the index without retaining it;

(3) Record file information whose indexing failed in the database, where it waits to be retried and updated again.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims

1. An accelerated batch file search model, characterized by comprising a file index and file metadata associated with the file index, wherein permission control of the file index is governed by its association with the file metadata, and the permission control of the file index covers a directory space, tenant information, user information and file permissions,

2. The accelerated batch file search model of claim 1, wherein creating the file index involves a tenant, a file service and an index service; the index service creates and updates the index asynchronously, and for data whose update fails, in addition to the corresponding retry handling, a compensation mechanism is added that periodically updates missing or failed files.

3. The accelerated batch file search model of claim 2, wherein the creation of the file index comprises the following:

4. The accelerated batch file search model of claim 3, wherein the operating class for file index creation is fileConsumer, which consumes file operations under different space directories asynchronously, and fileProducer sends messages that identify the tenant, the file, the operation and so on; the specific steps of creating the file index include:

5. The accelerated batch file search model of claim 3, wherein in the space maintenance an empty space directory has no meaning for the index service, and only a space directory that contains file entities can be added to an index document as an attribute.

6. The accelerated batch file search model of claim 3, wherein permission control for file search supports the multi-tenant permission isolation level of the SaaS version, including setting directory permissions, setting document permissions, updating document permissions and modifying document permissions;