WO2022222303A1 - HDFS-based small file processing method, apparatus, medium, and electronic device

Info

Publication number
WO2022222303A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2021/110209
Inventor
魏鹏飞 (Wei Pengfei)
万月亮 (Wan Yueliang)
火一莽 (Huo Yimang)
Original Assignee
北京锐安科技有限公司 (Beijing Ruian Technology Co., Ltd.)
Priority date
2021-04-19
Application filed by 北京锐安科技有限公司 (Beijing Ruian Technology Co., Ltd.)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Abstract

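An HDFS-based small file processing method, apparatus, medium, and electronic device (400). The method includes: filtering files to be processed to obtain target files (S110); in response to a target file satisfying a file volume constraint, saving the target file to a target cluster according to a preset writing rule (S120); and merging multiple target files in the target cluster by file type to obtain a merged file, and transmitting the merged file to HDFS (S130).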

Description

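HDFS-based small file processing method, apparatus, medium, and electronic device
This disclosure claims priority to Chinese patent application No. 202110417936.4, filed with the China Patent Office on April 19, 2021, the entire contents of which are incorporated herein by reference.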
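Technical Field
Embodiments of the present application relate to the field of big data technologies, for example, to an HDFS-based small file processing method, apparatus, medium, and electronic device.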
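Background
With the development of Internet technology, the volume of network data has grown exponentially. In real production environments, big data reaches the scale of hundreds of billions of records or petabytes, and HDFS (Hadoop Distributed File System) is widely used to process files of different types.
The data input to HDFS consists of many small files. Here, a small file is a file smaller than the smallest storage and processing unit in HDFS (the HDFS block).
Small files are processed far more slowly than an equally sized portion of a large file. Each small file occupies one resource unit, so a large share, or even most, of the processing time is spent starting and releasing tasks. There is no good strategy for processing large numbers of small files.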
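Summary
Embodiments of the present application provide an HDFS-based small file processing method, apparatus, medium, and electronic device that save small files to a cluster, merge multiple small files in the cluster into a large file, and store the large file in HDFS, which saves small-file processing time and improves processing efficiency.
In a first aspect, an embodiment of the present application provides an HDFS-based small file processing method, including:
filtering files to be processed to obtain a target file;
in response to the target file satisfying a file volume constraint, saving the target file to a target cluster according to a preset writing rule; and
merging multiple target files in the target cluster by file type to obtain a merged file, and transmitting the merged file to HDFS.
In a second aspect, an embodiment of the present application provides an HDFS-based small file processing apparatus, including:
a target file obtaining module, configured to filter files to be processed to obtain a target file;
a target file saving module, configured to, in response to the target file satisfying a file volume constraint, save the target file to a target cluster according to a preset writing rule; and
a merged file transmission module, configured to merge multiple target files in the target cluster by file type to obtain a merged file, and transmit the merged file to HDFS.
In a third aspect, an embodiment of the present application provides a computer-readable medium storing a computer program which, when executed by a processor, implements the HDFS-based small file processing method described in the embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable by the processor, where the processor, when executing the computer program, implements the HDFS-based small file processing method described in the embodiments of the present application.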
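Brief Description of the Drawings
FIG. 1 is a flowchart of the HDFS-based small file processing method provided in Embodiment 1 of the present application;
FIG. 2 is a schematic diagram of the HDFS-based small file processing process provided in Embodiment 2 of the present application;
FIG. 3 is a schematic structural diagram of the HDFS-based small file processing apparatus provided in Embodiment 3 of the present application;
FIG. 4 is a schematic structural diagram of an electronic device provided in Embodiment 5 of the present application.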
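Detailed Description
The present application is described below with reference to the accompanying drawings and embodiments.
Some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the steps as a sequential process, some of the steps may be performed in parallel, concurrently, or simultaneously. In addition, the order of the steps may be rearranged. A process may terminate when its operations are complete, but it may also include additional steps not shown in the drawings. A process may correspond to a method, function, procedure, subroutine, subprogram, and so on.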
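Embodiment 1
FIG. 1 is a flowchart of the HDFS-based small file processing method provided in Embodiment 1 of the present application. This embodiment is applicable to processing large numbers of small files. The method may be executed by the HDFS-based small file processing apparatus provided in the embodiments of the present application; the apparatus may be implemented in software and/or hardware and may be integrated into a device used for file processing, such as an intelligent terminal.
In one implementation scenario, the HDFS-based small file processing apparatus may be a server that communicates with the client, with the nodes in the target cluster, and with HDFS.
As shown in FIG. 1, the HDFS-based small file processing method includes:
S110: Filter files to be processed to obtain target files.
In this embodiment, the files to be processed may be filtered with a preset function, for example the globStatus function, which returns the paths of the target files that meet the criteria. The globStatus function uses wildcards to match paths against a specified pattern.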
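In Hadoop, `FileSystem.globStatus(Path pathPattern)` returns the `FileStatus` entries whose paths match a glob pattern. The sketch below is a dependency-free local stand-in for that filtering step, not the patent's implementation. It assumes the candidate paths are already listed in memory and uses `pathlib.PurePosixPath.match`, where an absolute pattern must match every path component and `*` does not cross `/`. The paths and the pattern are illustrative only.

```python
from pathlib import PurePosixPath


def glob_filter(paths, pattern):
    """Return the paths that match a glob pattern, keeping input order.

    Local stand-in for Hadoop's FileSystem.globStatus(): with an absolute
    pattern, every path component must match and '*' never crosses '/'.
    """
    return [p for p in paths if PurePosixPath(p).match(pattern)]


# Hypothetical listing of files waiting to be processed.
pending = [
    "/ingest/2021-04-19/a.log",
    "/ingest/2021-04-19/b.jpg",
    "/ingest/2021-04-19/c.log",
    "/ingest/2021-04-19/tmp/d.log",
]

targets = glob_filter(pending, "/ingest/*/*.log")
print(targets)  # ['/ingest/2021-04-19/a.log', '/ingest/2021-04-19/c.log']
```

The file in the `tmp` subdirectory is excluded because the pattern has one fewer path component, which mirrors how a Hadoop glob treats `/` as a hard separator.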
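S120: In response to a target file satisfying the file volume constraint, save the target file to the target cluster according to the preset writing rule.
The file volume constraint may be set according to business requirements. For example, a target file smaller than 128 MB may be determined to satisfy the file volume constraint, and a target file larger than 128 MB may be determined not to satisfy it.
Servers can be added to the cluster as needed, and every server in the cluster can provide the same service, which keeps the cluster in a stable and efficient state.
In this embodiment, target files that satisfy the file volume constraint may be saved to the target cluster according to the preset writing rule, for example according to a naming rule. When target files are saved to the target cluster, they are saved separately by business group. Target files that do not satisfy the file volume constraint are saved directly to HDFS. The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware and is highly fault tolerant.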
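The size check and routing described above can be sketched as a simple partition of incoming files. The 128 MB threshold is the example from the text, interpreted here as 128 MiB (134,217,728 bytes). The source does not say what happens to a file of exactly 128 MB, so this sketch assumes it is routed to HDFS. The function names are hypothetical.

```python
SMALL_FILE_LIMIT = 128 * 1024 * 1024  # 128 MiB, the example threshold in the text


def satisfies_volume_constraint(size_bytes, limit=SMALL_FILE_LIMIT):
    """True when a file is small enough to be buffered in the target cluster."""
    # Assumption: a file of exactly `limit` bytes does not qualify.
    return size_bytes < limit


def route(files, limit=SMALL_FILE_LIMIT):
    """Split (path, size) pairs into cluster-bound and HDFS-bound lists."""
    to_cluster, to_hdfs = [], []
    for path, size in files:
        if satisfies_volume_constraint(size, limit):
            to_cluster.append(path)
        else:
            # Large files bypass the cluster and are written to HDFS directly.
            to_hdfs.append(path)
    return to_cluster, to_hdfs


files = [("a.log", 4 * 1024), ("video.mp4", 300 * 1024 * 1024), ("b.log", SMALL_FILE_LIMIT)]
cluster_bound, hdfs_bound = route(files)
print(cluster_bound, hdfs_bound)  # ['a.log'] ['video.mp4', 'b.log']
```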
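In an embodiment, the target cluster includes a Redis (Remote Dictionary Server) cluster, and
saving the target file to the target cluster according to the preset writing rule includes:
grouping the target files by business according to the preset writing rule, and saving the grouped target files to the Redis cluster.
Redis is an open-source, log-type key-value database written in ANSI C that supports networking, can run in memory or persist to disk, and provides APIs (Application Programming Interfaces) in multiple languages. A Redis cluster is used to strengthen the read and write capacity of Redis. In a Redis cluster, each Redis instance is called a node, and there are two types of nodes: master nodes and slave nodes.
Saving the target files to the Redis cluster makes full use of CPU (Central Processing Unit) resources and improves small file processing performance.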
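Grouping target files by business before writing them to Redis can be sketched with a key-naming rule. The `{business}:<file name>` key format below is a hypothetical writing rule, and a plain dict stands in for the Redis cluster so the sketch runs without a server; with a real client such as redis-py, each write would be a `set(key, value)` call on the same key. In Redis Cluster, only the substring inside the first `{...}` (a hash tag) is hashed, so all keys of one business land in the same hash slot, which is one plausible way to keep a group together.

```python
from collections import defaultdict


def group_by_business(files):
    """Group (business, name, payload) tuples by their business tag."""
    groups = defaultdict(list)
    for business, name, payload in files:
        groups[business].append((name, payload))
    return dict(groups)


def cluster_key(business, name):
    # Hypothetical naming rule. The {business} hash tag makes Redis Cluster
    # place every key of one business group in the same hash slot.
    return f"{{{business}}}:{name}"


def save_groups(store, groups):
    """Write each grouped file into a key-value store (a dict stands in for Redis)."""
    for business, items in groups.items():
        for name, payload in items:
            store[cluster_key(business, name)] = payload
    return store


incoming = [("billing", "inv1.pdf", b"%PDF"), ("logs", "app.log", b"ok"), ("billing", "inv2.pdf", b"%PDF2")]
store = save_groups({}, group_by_business(incoming))
print(sorted(store))  # ['{billing}:inv1.pdf', '{billing}:inv2.pdf', '{logs}:app.log']
```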
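S130: Merge multiple target files in the target cluster by file type to obtain a merged file, and transmit the merged file to HDFS.
In this embodiment, the file type may be looked up according to the protocol, and multiple target files may be merged into a large file according to a preset merging rule. For example, the target files may be merged into multiple large files with the copyBytes function, and the large files uploaded to HDFS.
A large file can be understood as a file whose storage size is larger than that of a small file, namely a file obtained by merging small files.
In an embodiment, target files of the same file type may be merged to obtain one large file for each file type.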
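Merging by file type amounts to appending the bytes of every small file of one type to a single output stream. In Hadoop, `IOUtils.copyBytes(InputStream in, OutputStream out, int buffSize, boolean close)` copies one source stream into a destination. The sketch below does the equivalent in memory with `shutil.copyfileobj`. Two details are assumptions rather than part of the source: the file type is taken from the extension, and each file's offset and length inside the merged bytes are recorded so that it can be located again.

```python
import io
import shutil
from collections import defaultdict


def file_type(name):
    # Assumption: the file type is taken from the extension.
    return name.rsplit(".", 1)[-1].lower() if "." in name else "bin"


def merge_by_type(small_files):
    """Concatenate small files of the same type.

    small_files: dict of name -> bytes.
    Returns {type: (merged_bytes, index)}, where index maps each name
    to its (offset, length) inside the merged bytes.
    """
    by_type = defaultdict(list)
    for name in sorted(small_files):  # deterministic order
        by_type[file_type(name)].append(name)

    merged = {}
    for ftype, names in by_type.items():
        out = io.BytesIO()
        index = {}
        for name in names:
            index[name] = (out.tell(), len(small_files[name]))
            # Append this file's bytes, as copyBytes does for one stream.
            shutil.copyfileobj(io.BytesIO(small_files[name]), out)
        merged[ftype] = (out.getvalue(), index)
    return merged


result = merge_by_type({"a.log": b"alpha\n", "b.log": b"beta\n", "x.jpg": b"\xff\xd8"})
log_bytes, log_index = result["log"]
print(log_bytes)  # b'alpha\nbeta\n'
print(log_index)  # {'a.log': (0, 6), 'b.log': (6, 5)}
```

In a real deployment each merged buffer would be streamed to an HDFS output stream rather than held in memory.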
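In an embodiment, merging the multiple target files in the target cluster by file type to obtain a merged file and transmitting the merged file to HDFS includes:
in response to the storage amount of the target cluster satisfying a preset storage amount condition, merging the multiple target files in the target cluster by file type to obtain a merged file, and transmitting the merged file to HDFS; or
in response to the storage time of the multiple target files in the target cluster satisfying a preset time condition, merging the multiple target files in the target cluster by file type to obtain a merged file, and transmitting the merged file to HDFS.
In other words, target files are stored in the target cluster, and when the amount of data stored there reaches a set volume, or the storage time of the target files reaches a set duration, the target files in the target cluster are merged by file type into a merged file, which is transmitted to HDFS.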
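The two triggers, stored volume and storage time, can be combined into one predicate. This sketch assumes the cluster tracks, for each buffer, the total bytes held and the time the oldest file was written; the `Buffer` type and the limits used in the example are illustrative, not values from the source.

```python
import time
from dataclasses import dataclass


@dataclass
class Buffer:
    total_bytes: int        # bytes of target files currently held in the cluster
    oldest_write_ts: float  # epoch seconds when the oldest held file was written


def should_flush(buf, max_bytes, max_age_s, now=None):
    """Flush when the stored amount OR the storage time reaches its limit."""
    now = time.time() if now is None else now
    volume_reached = buf.total_bytes >= max_bytes
    age_reached = (now - buf.oldest_write_ts) >= max_age_s
    return volume_reached or age_reached


limit = 128 * 1024 * 1024
print(should_flush(Buffer(200 * 1024 * 1024, 1000.0), limit, 600, now=1010.0))  # True (volume)
print(should_flush(Buffer(1024, 1000.0), limit, 600, now=1700.0))               # True (age)
print(should_flush(Buffer(1024, 1000.0), limit, 600, now=1100.0))               # False
```

Checking both conditions on every write keeps slow-filling buffers from waiting indefinitely and fast-filling ones from growing without bound.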
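Merging the small files in the target cluster into large files stored in HDFS saves small-file processing time and improves processing efficiency.
In an embodiment, merging the multiple target files in the target cluster by file type to obtain a merged file and transmitting the merged file to HDFS includes:
in response to detecting a cache exception, merging the multiple target files in the target cluster by file type according to a preset scheduled task to obtain a merged file, and transmitting the merged file to HDFS.
For example, if a client crashes abnormally and leaves the write counter in the target cluster in an abnormal state, or a client sends no requests for a long time so that the cache cannot be flushed to HDFS, a background scheduled task is started according to the configuration to flush the timed-out target cluster caches to HDFS.
Obtaining target cluster information in this way prevents abnormal conditions from causing the target cluster cache to overflow, and improves small file processing efficiency.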
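The body of the background scheduled task can be sketched as a sweep over cache entries. The sketch assumes each entry records its write counter and the time of its client's last request. An entry counts as timed out when its client has been silent longer than a configured timeout, whatever its counter says, because a crashed client may leave the counter non-zero. The `flush` callback is a hypothetical placeholder for the merge-and-upload step. The scheduling itself, for example a `threading.Timer` re-armed after each run, is left out.

```python
def sweep(entries, now, timeout_s, flush):
    """Flush and drop every cache entry whose client has been idle past timeout_s.

    entries: dict of cache id -> {"writers": int, "last_request": float}
    flush:   callable(cache_id) that merges the entry and uploads it to HDFS
    Returns the ids that were flushed.
    """
    stale = [cid for cid, e in entries.items() if now - e["last_request"] > timeout_s]
    for cid in stale:
        # A non-zero "writers" count is ignored here: the writer may have crashed.
        flush(cid)
        del entries[cid]
    return stale


cache = {
    "ns1/ds1": {"writers": 2, "last_request": 0.0},    # client crashed mid-write
    "ns1/ds2": {"writers": 1, "last_request": 950.0},  # still active
}
flushed = []
print(sweep(cache, now=1000.0, timeout_s=300, flush=flushed.append))  # ['ns1/ds1']
```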
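In an embodiment, the method further includes:
in response to a client read operation, detecting whether target data exists in the target cluster;
in response to the target data existing in the target cluster, sending the target data to the client; and
in response to the target data not existing in the target cluster, obtaining the target data from HDFS and sending it to the client.
In other words, when reading target data, the client first reads from the target cluster; if the target cluster does not hold the target data, the client reads it from HDFS. If the read time is later than the time at which the target file in the target cluster was uploaded to HDFS, the target data is read directly from HDFS.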
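The read path can be sketched as a cache-aside lookup. Dicts stand in for the target cluster and HDFS, and the per-key upload timestamp (`uploaded_at`) is an assumed bookkeeping detail used to implement the rule that a read issued after the upload goes straight to HDFS.

```python
def read(key, cluster, hdfs, uploaded_at, read_ts):
    """Return (source, data) for key.

    cluster, hdfs: dict-like stores (stand-ins for the Redis cluster and HDFS)
    uploaded_at:   dict of key -> time the key's merged file reached HDFS
    read_ts:       time of this read request
    """
    upload_ts = uploaded_at.get(key)
    if upload_ts is not None and read_ts > upload_ts:
        # Already merged and uploaded: read HDFS directly.
        return "hdfs", hdfs.get(key)
    if key in cluster:
        return "cluster", cluster[key]
    # Cluster miss: fall back to HDFS.
    return "hdfs", hdfs.get(key)


cluster = {"a": b"1"}
hdfs = {"b": b"2", "c": b"3"}
uploaded_at = {"c": 100.0}
print(read("a", cluster, hdfs, uploaded_at, 50.0))   # ('cluster', b'1')
print(read("b", cluster, hdfs, uploaded_at, 50.0))   # ('hdfs', b'2')
print(read("c", cluster, hdfs, uploaded_at, 150.0))  # ('hdfs', b'3')
```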
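Small files are saved to the cluster, multiple small files in the cluster are merged into large files, and the large files are stored in HDFS. Reading data from the cluster or from HDFS, as appropriate, improves read efficiency.
In the embodiments of the present application, files to be processed are filtered to obtain target files; in response to a target file satisfying the file volume constraint, the target file is saved to the target cluster according to the preset writing rule; and multiple target files in the target cluster are merged by file type into a merged file, which is transmitted to HDFS. By implementing this embodiment, small files can be saved to the cluster and multiple small files in the cluster can be merged into large files stored in HDFS, which saves small-file processing time and improves processing efficiency.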
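Embodiment 2
FIG. 2 is a schematic diagram of the HDFS-based small file processing process provided in Embodiment 2 of the present application. Embodiment 2 builds on Embodiment 1. For example, saving the target file to the target cluster according to the preset writing rule includes: in response to a client invocation request, sending target parameters to the client so that the client constructs, according to the target parameters, a to-be-stored file that satisfies a cluster constraint, where the target parameters include an absolute path parameter and an attachment name parameter; and, in response to a client write operation, saving the to-be-stored file to the target cluster. For content not described in Embodiment 2, see Embodiment 1. As shown in FIG. 2, the method includes the following steps:
S210: Filter files to be processed to obtain target files.
S220: In response to a client invocation request, send target parameters to the client so that the client constructs, according to the target parameters, a to-be-stored file that satisfies the cluster constraint, where the target parameters include an absolute path parameter and an attachment name parameter.
In this embodiment, the MapFile name service is implemented with the Spring Boot framework and exposes a REST interface. The client sends an invocation request to the MapFile name service to obtain the absolute path parameter of the attachment in MapFile storage and the attachment name parameter of the attachment within the MapFile. Using these two parameters, the client processes the target file and constructs a to-be-stored file that satisfies the cluster constraint. The to-be-stored file may be a MapFile.
In an embodiment, the to-be-stored file satisfying the cluster constraint may be constructed by naming the file in a specific format.
S230: In response to a client write operation, save the to-be-stored file to the target cluster.
In this embodiment, after the client processes the target file according to the absolute path parameter and the attachment name parameter and constructs the to-be-stored file that satisfies the cluster constraint, it writes the to-be-stored file into the target cluster.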
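In an embodiment, before the target parameters are sent to the client in response to the client invocation request, the method further includes:
determining information to be edited according to namespace information and dataset information sent by the client, where the information to be edited includes a counter; and
after the to-be-stored file is saved to the target cluster in response to the client write operation, the method further includes:
in response to detecting that the to-be-stored file has been saved to the target cluster, operating the counter to monitor the client write operation.
In an embodiment, the information to be edited includes a file length and a counter.
For example, the counter may be operated by changing its value, so as to monitor the client write operation.
For example, the client calls the MapFile name service to obtain the full path of the attachment in MapFile storage and the attachment's name within the MapFile, passing the namespace, the dataset, the attachment length, and the attachment name as parameters. From the namespace and dataset passed in by the writing client, the MapFile name service calculates the storage period of the data and looks up the MapFileInfo in its internal hash table. If no MapFileInfo is found, a new MapFileInfo is created; if a MapFileInfo exists, the file length in the current MapFile information is increased, the counter is incremented by one, and the absolute path parameter of the MapFile and the attachment name parameter of the attachment in MapFile storage are returned to the writing client. The client completes the write of the MapFile into the target cluster according to the absolute path parameter and the attachment name parameter. The client then calls the MapFile name service to report that the cache write is complete, and the MapFile service decrements the reference count of the cluster cache by one, which allows client write operations to be monitored. At this point, if the MapFile name service finds that the write counter of the current MapFile is 0 and the stored target files exceed the configured time or configured size, it notifies the client to flush the MapFile in the cluster to the HDFS file system and destroys the current MapFile structure.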
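The bookkeeping performed by the MapFile name service can be sketched as an in-memory hash table of MapFileInfo records. This is a hypothetical sketch, not the patent's code: the class and method names, the one-period-per-day rule (the source only says the period is computed for the namespace and dataset), the path layout, and treating a newly created record like an existing one (length increased, counter incremented) are all assumptions. The Spring Boot REST layer is omitted.

```python
from dataclasses import dataclass


@dataclass
class MapFileInfo:
    path: str          # absolute path of the MapFile in cluster storage
    created_at: float  # when this MapFile was opened
    length: int = 0    # total attachment bytes assigned to it
    writers: int = 0   # writes begun but not yet reported complete


class MapFileNameService:
    """Hypothetical in-memory core of the MapFile name service (no REST layer)."""

    def __init__(self, max_bytes, max_age_s):
        self.table = {}  # hash table: (namespace, dataset, period) -> MapFileInfo
        self.max_bytes = max_bytes
        self.max_age_s = max_age_s

    @staticmethod
    def storage_period(now):
        # Assumption: one storage period per day.
        return int(now // 86400)

    def begin_write(self, namespace, dataset, attachment_len, attachment_name, now):
        """Reserve space and return (key, absolute path, attachment name)."""
        key = (namespace, dataset, self.storage_period(now))
        info = self.table.get(key)
        if info is None:
            # Hypothetical path layout.
            info = MapFileInfo(path=f"/mapfile/{namespace}/{dataset}/{key[2]}", created_at=now)
            self.table[key] = info
        info.length += attachment_len
        info.writers += 1
        return key, info.path, attachment_name

    def end_write(self, key, now):
        """Record a finished cache write; True means the client should flush to HDFS."""
        info = self.table[key]
        info.writers -= 1
        flush_due = info.writers == 0 and (
            info.length >= self.max_bytes or now - info.created_at >= self.max_age_s
        )
        if flush_due:
            del self.table[key]  # destroy the MapFile structure once the flush is ordered
        return flush_due


svc = MapFileNameService(max_bytes=1000, max_age_s=3600)
k1, path, name = svc.begin_write("ns", "ds", 600, "a.jpg", now=0.0)
k2, _, _ = svc.begin_write("ns", "ds", 500, "b.jpg", now=10.0)
print(path, k1 == k2)           # /mapfile/ns/ds/0 True
print(svc.end_write(k1, 20.0))  # False: one writer still active
print(svc.end_write(k2, 30.0))  # True: no writers left and 1100 >= 1000 bytes
```

Gating the flush on a zero counter means a MapFile is never moved to HDFS while a client is still writing into it; the timeout sweep covers the case where a crashed client never decrements the counter.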
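Saving the target files to the Redis cluster makes full use of CPU resources and improves performance. Monitoring the cache writes into the cluster also improves small file processing efficiency.
S240: Merge multiple target files in the target cluster by file type to obtain a merged file, and transmit the merged file to HDFS.
In the embodiments of the present application, files to be processed are filtered to obtain target files; in response to a client invocation request, target parameters are sent to the client; in response to a client write operation, the to-be-stored file is saved to the target cluster; and multiple target files in the target cluster are merged by file type into a merged file, which is transmitted to HDFS. By implementing this embodiment, small files can be saved to the cluster, multiple small files in the cluster can be merged into large files, and the large files can be stored in HDFS, which saves small-file processing time and improves processing efficiency.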
实施例三
图3是本申请实施例三提供的基于HDFS的小文件处理装置的结构示意图,如图3所示,基于HDFS的小文件处理装置包括:
目标文件获取模块310,设置为对待处理文件进行筛选,获得目标文件;
目标文件保存模块320,设置为响应于所述目标文件满足文件体积约束条件,根据预设写入规则将所述目标文件保存至目标集群;
合并文件传输模块330,设置为将所述目标集群中的多个目标文件按文件类型进行合并,获得合并文件,并将所述合并文件传输至HDFS。
在一实施例中,所述目标文件保存模块320,包括:
调用请求响应单元,设置为响应于客户端调用请求,将目标参数发送至客户端,以供所述客户端根据所述目标参数,对目标文件进行处理,构建满足集群约束条件的待存储文件;其中,所述目标参数包括绝对路径参数和附件名称参数;
写入操作响应单元,设置为响应于客户端写入操作,将所述待存储文件保存至目标集群。
在一实施例中,所述装置还包括:
待编辑信息确定模块,设置为根据客户端发送的命名空间信息和数据集信息,确定待编辑信息;其中,所述待编辑信息包括计数器;
计数器操作模块,设置为响应于检测到所述待存储文件保存至所述目标集群,对所述计数器进行操作,以对所述客户端写入操作进行监控。
在一实施例中,所述合并文件传输模块330,包括:
存储量判断单元,设置为响应于所述目标集群的存储量满足预设存储量条件,将所述目标集群中的多个目标文件按文件类型进行合并,获得合并文件,并将所述合并文件传输至HDFS;
存储时间判断单元,设置为响应于所述目标集群中的多个目标文件的存储时间满足预设时间条件,将所述目标集群中的多个目标文件按文件类型进行合并,获得合并文件,并将所述合并文件传输至HDFS。
在一实施例中,所述合并文件传输模块330,包括:
缓存异常处理单元,设置为响应于检测到缓存异常,根据预设定时任务将所述目标集群中的多个目标文件按文件类型进行合并,获得合并文件,并将所述合并文件传输至HDFS。
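In an embodiment, the target cluster includes a Redis cluster; and
the target file saving module 320 is further configured to group the target files by business according to the preset write rule and save the grouped target files to the Redis cluster.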
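Grouping target files by business before writing them to Redis might look like the sketch below. A plain `dict` of `dict`s stands in for the Redis cluster, with one hash per business group; against a real Redis each inner assignment would correspond to an `HSET key field value` command. The `smallfile:` key prefix and the `business_of` rule (first path segment) are hypothetical, since the preset write rule is not specified.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple


def group_and_store(files: Iterable[Tuple[str, bytes]],
                    business_of: Callable[[str], str],
                    store: Dict[str, Dict[str, bytes]]) -> Dict[str, int]:
    """Write each file into its business group's hash; return files written per key."""
    counts: Dict[str, int] = defaultdict(int)
    for name, data in files:
        key = f"smallfile:{business_of(name)}"  # one hash per business group
        store.setdefault(key, {})[name] = data  # stands in for HSET key name data
        counts[key] += 1
    return dict(counts)
```

Keeping each business in its own hash also gives the later merge step a natural unit to scan.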
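In an embodiment, the apparatus further includes:
a client read operation response module, configured to detect, in response to a client read operation, whether target data exists in the target cluster;
a target data present module, configured to send, in response to the target data existing in the target cluster, the target data to the client; and
a target data absent module, configured to obtain, in response to the target data not existing in the target cluster, the target data from HDFS, and send the target data from HDFS to the client.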
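The read path described by these modules is a cluster-first lookup with HDFS as the fallback. In this sketch a `dict` stands in for the target cluster and `read_hdfs` is a hypothetical caller-supplied function that fetches the data from HDFS; because the text does not say whether data read from HDFS is written back to the cluster, the sketch does not do so.

```python
from typing import Callable, Dict, Optional


def read_target(key: str, cluster: Dict[str, bytes],
                read_hdfs: Callable[[str], Optional[bytes]]) -> Optional[bytes]:
    """Serve from the target cluster if present, otherwise fall back to HDFS."""
    data = cluster.get(key)
    if data is not None:   # target data exists in the target cluster
        return data
    return read_hdfs(key)  # target data absent: fetch it from HDFS instead
```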
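The above product can execute the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to executing the method.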
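Embodiment 4
An embodiment of the present application further provides a medium containing computer-executable instructions which, when executed by a computer processor, are used to perform an HDFS-based small file processing method, the method including:
filtering to-be-processed files to obtain target files;
saving, in response to the target files satisfying a file size constraint condition, the target files to a target cluster according to a preset write rule; and
merging multiple target files in the target cluster by file type to obtain merged files, and transferring the merged files to HDFS.
A medium refers to any of various types of memory devices or storage devices. The term "medium" is intended to include: installation media, such as a Compact Disc Read-Only Memory (CD-ROM), a floppy disk, or a tape device; computer system memory or random access memory, such as Dynamic Random Access Memory (DRAM), double data rate RAM (DDR RAM), Static Random-Access Memory (SRAM), Extended Data Out RAM (EDO RAM), Rambus RAM, and the like; non-volatile memory, such as flash memory, magnetic media (e.g., a hard disk), or optical storage; and registers or other similar types of memory elements. The medium may also include other types of memory or combinations thereof. In addition, the medium may be located in the computer system in which the program is executed, or in a different second computer system connected to that computer system through a network such as the Internet. The second computer system may provide program instructions to the computer for execution. The term "medium" may include two or more media residing in different locations, for example in different computer systems connected through a network. The medium may store program instructions executable by one or more processors, for example a computer program.
In the medium containing computer-executable instructions provided by this embodiment of the present application, the computer-executable instructions are not limited to the HDFS-based small file processing operations described above, and may also perform related operations in the HDFS-based small file processing method provided by any embodiment of the present application.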
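Embodiment 5
An embodiment of the present application provides an electronic device in which the HDFS-based small file processing apparatus provided by the embodiments of the present application may be integrated. FIG. 4 is a schematic structural diagram of an electronic device provided in Embodiment 5 of the present application. As shown in FIG. 4, this embodiment provides an electronic device 400, including one or more processors 420 and a storage apparatus 410 configured to store one or more programs; when the one or more programs are executed by the one or more processors 420, the one or more processors 420 implement the HDFS-based small file processing method provided by the embodiments of the present application, the method including:
filtering to-be-processed files to obtain target files;
saving, in response to the target files satisfying a file size constraint condition, the target files to a target cluster according to a preset write rule; and
merging multiple target files in the target cluster by file type to obtain merged files, and transferring the merged files to the distributed file system HDFS.
Those skilled in the art can understand that the processor 420 may also implement the HDFS-based small file processing method provided by any embodiment of the present application.
The electronic device 400 shown in FIG. 4 is merely an example.
As shown in FIG. 4, the electronic device 400 includes a processor 420, a storage apparatus 410, an input apparatus 430, and an output apparatus 440. The electronic device may have one or more processors 420; one processor 420 is taken as an example in FIG. 4. The processor 420, the storage apparatus 410, the input apparatus 430, and the output apparatus 440 in the electronic device may be connected by a bus or in other ways; connection via a bus 450 is taken as an example in FIG. 4.
As a computer-readable medium, the storage apparatus 410 may be configured to store software programs, computer-executable programs, and module units, such as the program instructions corresponding to the HDFS-based small file processing method in the embodiments of the present application.
The storage apparatus 410 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created according to the use of the terminal, and the like. In addition, the storage apparatus 410 may include a high-speed random access memory, and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some examples, the storage apparatus 410 may include memories located remotely from the processor 420, and these remote memories may be connected through a network. Examples of such networks include the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The input apparatus 430 may be configured to receive input numeric, character, or voice information, and to generate key signal inputs related to user settings and function control of the electronic device 400. The output apparatus 440 may include electronic devices such as a display screen and a speaker.
The electronic device provided by this embodiment of the present application can save small files to the cluster, merge multiple small files in the cluster into large files, and store the large files in HDFS, which saves small-file processing time and improves processing efficiency.
The HDFS-based small file processing apparatus, medium, and electronic device provided in the above embodiments can execute the HDFS-based small file processing method provided by any embodiment of the present application, and have the corresponding functional modules and beneficial effects for executing the method. For technical details not described in the above embodiments, reference may be made to the HDFS-based small file processing method provided by any embodiment of the present application.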

Claims (18)

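  1. An HDFS-based small file processing method, comprising:
    filtering to-be-processed files to obtain target files;
    in response to the target files satisfying a file size constraint condition, saving the target files to a target cluster according to a preset write rule; and
    merging multiple target files in the target cluster by file type to obtain merged files, and transmitting the merged files to a distributed file system HDFS.
  2. The method according to claim 1, wherein saving the target files to the target cluster according to the preset write rule comprises:
    in response to a client call request, sending target parameters to the client, so that the client processes the target file according to the target parameters and constructs a to-be-stored file satisfying a cluster constraint condition, wherein the target parameters comprise an absolute path parameter and an attachment name parameter; and
    in response to a client write operation, saving the to-be-stored file to the target cluster.
  3. The method according to claim 2, wherein before sending the target parameters to the client in response to the client call request, the method further comprises:
    determining to-be-edited information according to namespace information and dataset information sent by the client, wherein the to-be-edited information comprises a counter;
    and after saving the to-be-stored file to the target cluster in response to the client write operation, the method further comprises:
    in response to detecting that the to-be-stored file has been saved to the target cluster, operating on the counter to monitor the client write operation.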
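  4. The method according to claim 1, wherein merging the multiple target files in the target cluster by file type to obtain merged files and transmitting the merged files to HDFS comprises:
    in response to a storage amount of the target cluster satisfying a preset storage amount condition, merging the multiple target files in the target cluster by file type to obtain merged files, and transmitting the merged files to HDFS.
  5. The method according to claim 1 or 4, wherein merging the multiple target files in the target cluster by file type to obtain merged files and transmitting the merged files to HDFS comprises:
    in response to a storage time of the multiple target files in the target cluster satisfying a preset time condition, merging the multiple target files in the target cluster by file type to obtain merged files, and transmitting the merged files to HDFS.
  6. The method according to claim 1, wherein merging the multiple target files in the target cluster by file type to obtain merged files and transmitting the merged files to HDFS comprises:
    in response to detecting a cache exception, merging the multiple target files in the target cluster by file type according to a preset scheduled task to obtain merged files, and transmitting the merged files to HDFS.
  7. The method according to claim 1, wherein the target cluster comprises a Redis cluster; and
    saving the target files to the target cluster according to the preset write rule comprises:
    grouping the target files by business according to the preset write rule, and saving the grouped target files to the Redis cluster.
  8. The method according to claim 1, further comprising:
    in response to a client read operation, detecting whether target data exists in the target cluster;
    in response to the target data existing in the target cluster, sending the target data to the client; and
    in response to the target data not existing in the target cluster, obtaining the target data from HDFS, and sending the target data from HDFS to the client.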
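  9. An HDFS-based small file processing apparatus, comprising:
    a target file acquisition module, configured to filter to-be-processed files to obtain target files;
    a target file saving module, configured to save, in response to the target files satisfying a file size constraint condition, the target files to a target cluster according to a preset write rule; and
    a merged file transmission module, configured to merge multiple target files in the target cluster by file type to obtain merged files, and transmit the merged files to HDFS.
  10. The apparatus according to claim 9, wherein the target file saving module comprises:
    a call request response unit, configured to send, in response to a client call request, target parameters to the client, so that the client processes the target file according to the target parameters and constructs a to-be-stored file satisfying a cluster constraint condition, wherein the target parameters comprise an absolute path parameter and an attachment name parameter; and
    a write operation response unit, configured to save, in response to a client write operation, the to-be-stored file to the target cluster.
  11. The apparatus according to claim 10, further comprising:
    a to-be-edited information determination module, configured to determine to-be-edited information according to namespace information and dataset information sent by the client, wherein the to-be-edited information comprises a counter; and
    a counter operation module, configured to operate on the counter, in response to detecting that the to-be-stored file has been saved to the target cluster, so as to monitor the client write operation.
  12. The apparatus according to claim 9, wherein the merged file transmission module comprises:
    a storage amount judgment unit, configured to merge, in response to a storage amount of the target cluster satisfying a preset storage amount condition, the multiple target files in the target cluster by file type to obtain merged files, and transmit the merged files to HDFS.
  13. The apparatus according to claim 9 or 12, wherein the merged file transmission module comprises:
    a storage time judgment unit, configured to merge, in response to a storage time of the multiple target files in the target cluster satisfying a preset time condition, the multiple target files in the target cluster by file type to obtain merged files, and transmit the merged files to HDFS.
  14. The apparatus according to claim 9, wherein the merged file transmission module comprises:
    a cache exception handling unit, configured to merge, in response to detecting a cache exception, the multiple target files in the target cluster by file type according to a preset scheduled task to obtain merged files, and transmit the merged files to HDFS.
  15. The apparatus according to claim 9, wherein the target cluster comprises a Redis cluster; and
    the target file saving module is further configured to group the target files by business according to the preset write rule, and save the grouped target files to the Redis cluster.
  16. The apparatus according to claim 9, further comprising:
    a client read operation response module, configured to detect, in response to a client read operation, whether target data exists in the target cluster;
    a target data present module, configured to send, in response to the target data existing in the target cluster, the target data to the client; and
    a target data absent module, configured to obtain, in response to the target data not existing in the target cluster, the target data from HDFS, and send the target data from HDFS to the client.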
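  17. A computer-readable medium storing a computer program which, when executed by a processor, implements the HDFS-based small file processing method according to any one of claims 1-8.
  18. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable by the processor, wherein the processor, when executing the computer program, implements the HDFS-based small file processing method according to any one of claims 1-8.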
PCT/CN2021/110209 2021-04-19 2021-08-03 HDFS-based small file processing method, apparatus, medium and electronic device WO2022222303A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110417936.4 2021-04-19
CN202110417936.4A CN113111036A (zh) 2021-04-19 2021-04-19 HDFS-based small file processing method, apparatus, medium and electronic device

Publications (1)

Publication Number Publication Date
WO2022222303A1 true WO2022222303A1 (zh) 2022-10-27

Family

ID=76718274

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/110209 WO2022222303A1 (zh) 2021-04-19 2021-08-03 HDFS-based small file processing method, apparatus, medium and electronic device

Country Status (2)

Country Link
CN (1) CN113111036A (zh)
WO (1) WO2022222303A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
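CN113111036A (zh) * 2021-04-19 2021-07-13 北京锐安科技有限公司 HDFS-based small file processing method, apparatus, medium and electronic device
CN113448938A (zh) * 2021-07-20 2021-09-28 恒安嘉新(北京)科技股份公司 Data processing method and apparatus, electronic device and storage medium
CN114968939A (zh) * 2022-05-31 2022-08-30 济南浪潮数据技术有限公司 File merging method and apparatus, and computer-readable storage medium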

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108053863A (zh) * 2017-12-22 2018-05-18 中国人民解放军第三军医大学第附属医院 Massive medical data storage system suitable for both large and small files, and data storage method
CN108595589A (zh) * 2018-04-19 2018-09-28 中国科学院电子学研究所苏州研究院 Efficient access method for massive scientific data images
US10152493B1 (en) * 2015-06-30 2018-12-11 EMC IP Holding Company LLC Dynamic ephemeral point-in-time snapshots for consistent reads to HDFS clients
CN110825694A (zh) * 2019-11-01 2020-02-21 北京锐安科技有限公司 Data processing method, apparatus, device and storage medium
CN113111036A (zh) * 2021-04-19 2021-07-13 北京锐安科技有限公司 HDFS-based small file processing method, apparatus, medium and electronic device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111913927A (zh) * 2020-07-16 2020-11-10 珠海大横琴科技发展有限公司 Data writing method and apparatus, and computer device
CN111913917A (zh) * 2020-07-24 2020-11-10 北京锐安科技有限公司 File processing method, apparatus, device and medium

Also Published As

Publication number Publication date
CN113111036A (zh) 2021-07-13

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 21937527; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 21937527; Country of ref document: EP; Kind code of ref document: A1