CN113742410A - Data processing method, data processing apparatus, electronic device, storage medium, and program product

Publication number
CN113742410A
Authority
CN
China
Prior art keywords
file
data
virtual column
time virtual
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111064294.0A
Other languages
Chinese (zh)
Inventor
关振宇
朱家强
郑为锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lakala Payment Co ltd
Original Assignee
Lakala Payment Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lakala Payment Co ltd filed Critical Lakala Payment Co ltd
Priority to CN202111064294.0A priority Critical patent/CN113742410A/en
Publication of CN113742410A publication Critical patent/CN113742410A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Abstract

Embodiments of the present disclosure disclose a data processing method, a data processing apparatus, an electronic device, a storage medium, and a program product. The method comprises: detecting whether a target file has been operated on; when it is detected that the target file has been operated on, acquiring the target file operation time; and sending the target file operation time to a data warehouse component so that the data warehouse component stores the target file operation time. With this technical solution, no time field needs to be added to mark the processing time of a data file, which reduces operational complexity, manual workload, and the amount of data stored, avoids explicit programming to handle the time field, and relieves the storage pressure on the data warehouse.

Description

Data processing method, data processing apparatus, electronic device, storage medium, and program product
Technical Field
The disclosed embodiments relate to the technical field of data processing, and in particular, to a data processing method, an apparatus, an electronic device, a storage medium, and a program product.
Background
Hive is a Hadoop-based data warehouse tool used for operations such as data extraction, transformation, and loading, so as to enable query and analysis of large-scale data stored in Hadoop. When Hive is used for data extraction, transformation, loading, and similar operations, a time field usually has to be added to mark the processing time of each data file, but such a time field increases operational complexity, manual workload, and the amount of data stored.
Disclosure of Invention
The disclosed embodiment provides a data processing method, a data processing device, an electronic device, a storage medium and a program product.
In a first aspect, an embodiment of the present disclosure provides a data processing method.
Specifically, the data processing method includes:
detecting whether a target file has been operated on;
when it is detected that the target file has been operated on, acquiring the target file operation time;
and sending the target file operation time to a data warehouse component so that the data warehouse component stores the target file operation time.
In a second aspect, a data processing method is provided in an embodiment of the present disclosure.
Specifically, the data processing method includes:
receiving the operation time of a target file sent by a file system;
determining a preset data writing format;
and writing the target file operation time into a file time virtual column according to the preset data writing format.
With reference to the second aspect, in a first implementation manner of the second aspect, an embodiment of the present disclosure further includes:
setting a file time virtual column.
With reference to the second aspect and the first implementation manner of the second aspect, in a second implementation manner of the second aspect, an embodiment of the present disclosure further includes:
setting an enable parameter of the file time virtual column, wherein the enable parameter takes one of two values: enabled and not enabled.
With reference to the second aspect, the first implementation manner of the second aspect, and the second implementation manner of the second aspect, in a third implementation manner of the second aspect, the present disclosure further includes:
in response to a received file time virtual column data read-write command, determining whether the enable parameter of the file time virtual column is enabled;
when the enable parameter of the file time virtual column is enabled, reading or writing data of the file time virtual column according to the file time virtual column data read-write command;
and when the enable parameter of the file time virtual column is not enabled, returning prompt information indicating that the file time virtual column data is unavailable.
In a third aspect, an embodiment of the present disclosure provides a data processing method.
Specifically, the data processing method includes:
the file system detects whether a target file has been operated on and, when it is detected that the target file has been operated on, obtains the target file operation time and sends it to the data warehouse component;
and the data warehouse component receives the target file operation time sent by the file system, determines a preset data writing format, and writes the target file operation time into the file time virtual column according to the preset data writing format.
With reference to the third aspect, in a first implementation manner of the third aspect, the embodiment of the present disclosure further includes:
the data warehouse component sets the file time virtual column.
With reference to the third aspect and the first implementation manner of the third aspect, in a second implementation manner of the third aspect, an embodiment of the present disclosure further includes:
the data warehouse component sets an enable parameter of the file time virtual column, wherein the enable parameter takes one of two values: enabled and not enabled.
With reference to the third aspect, the first implementation manner of the third aspect, and the second implementation manner of the third aspect, in a third implementation manner of the third aspect, the embodiment of the present disclosure further includes:
the data warehouse component, in response to a received file time virtual column data read-write command, determines whether the enable parameter of the file time virtual column is enabled; when the enable parameter is enabled, it reads or writes data of the file time virtual column according to the command; and when the enable parameter is not enabled, it returns prompt information indicating that the file time virtual column data is unavailable.
In a fourth aspect, a data processing apparatus is provided in an embodiment of the present disclosure.
Specifically, the data processing apparatus includes:
a detection module configured to detect whether a target file has been operated on;
an acquisition module configured to acquire the target file operation time when it is detected that the target file has been operated on;
a sending module configured to send the target file operation time to a data warehouse component so that the data warehouse component stores the target file operation time.
In a fifth aspect, a data processing apparatus is provided in an embodiment of the present disclosure.
Specifically, the data processing apparatus includes:
the receiving module is configured to receive the target file operation time sent by the file system;
a determination module configured to determine a preset data writing format;
and the writing module is configured to write the target file operation time into a file time virtual column according to the preset data writing format.
With reference to the fifth aspect, in a first implementation manner of the fifth aspect, an embodiment of the present disclosure further includes:
a setting module configured to set a file time virtual column.
With reference to the fifth aspect and the first implementation manner of the fifth aspect, in a second implementation manner of the fifth aspect, the setting module is further configured to:
set an enable parameter of the file time virtual column, wherein the enable parameter takes one of two values: enabled and not enabled.
With reference to the fifth aspect, the first implementation manner of the fifth aspect, and the second implementation manner of the fifth aspect, in a third implementation manner of the fifth aspect, the writing module is further configured to:
in response to a received file time virtual column data read-write command, determine whether the enable parameter of the file time virtual column is enabled;
when the enable parameter of the file time virtual column is enabled, read or write data of the file time virtual column according to the file time virtual column data read-write command;
and when the enable parameter of the file time virtual column is not enabled, return prompt information indicating that the file time virtual column data is unavailable.
In a sixth aspect, a data processing apparatus is provided in an embodiment of the present disclosure.
Specifically, the data processing apparatus includes:
a file system configured to detect whether a target file has been operated on and, when it is detected that the target file has been operated on, obtain the target file operation time and send it to the data warehouse component;
the data warehouse component is configured to receive target file operation time sent by a file system, determine a preset data writing format and write the target file operation time into a file time virtual column according to the preset data writing format.
With reference to the sixth aspect, in a first implementation manner of the sixth aspect, the data warehouse component is further configured to:
set a file time virtual column.
With reference to the sixth aspect and the first implementation manner of the sixth aspect, in a second implementation manner of the sixth aspect, the data warehouse component is further configured to:
set an enable parameter of the file time virtual column, wherein the enable parameter takes one of two values: enabled and not enabled.
With reference to the sixth aspect, the first implementation manner of the sixth aspect, and the second implementation manner of the sixth aspect, in a third implementation manner of the sixth aspect, the data warehouse component is further configured to:
in response to a received file time virtual column data read-write command, determine whether the enable parameter of the file time virtual column is enabled; when the enable parameter is enabled, read or write data of the file time virtual column according to the command; and when the enable parameter is not enabled, return prompt information indicating that the file time virtual column data is unavailable.
In a seventh aspect, an embodiment of the present disclosure provides an electronic device, including a memory and a processor, where the memory is used to store one or more computer instructions for supporting a data processing apparatus to execute the data processing method, and the processor is configured to execute the computer instructions stored in the memory. The data processing apparatus may further comprise a communication interface for the data processing apparatus to communicate with other devices or a communication network.
In an eighth aspect, an embodiment of the present disclosure provides a computer-readable storage medium for storing computer instructions used by a data processing apparatus, including the computer instructions involved in executing the above data processing method.
In a ninth aspect, the disclosed embodiments provide a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the above-described data processing method.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
In this technical solution, the file operation time is recorded automatically by means of a newly added virtual column. No time field needs to be added to mark the processing time of a data file, which reduces operational complexity, manual workload, and the amount of data stored, avoids explicit programming to handle the time field, and relieves the storage pressure on the data warehouse.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of embodiments of the disclosure.
Drawings
Other features, objects, and advantages of embodiments of the disclosure will become more apparent from the following detailed description of non-limiting embodiments when taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 shows a flow diagram of a data processing method according to an embodiment of the present disclosure;
FIG. 2 shows a flow diagram of a data processing method according to another embodiment of the present disclosure;
FIG. 3 shows a flow diagram of a data processing method according to yet another embodiment of the present disclosure;
FIG. 4 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure;
FIG. 5 shows a block diagram of a data processing apparatus according to another embodiment of the present disclosure;
FIG. 6 shows a block diagram of a data processing apparatus according to yet another embodiment of the present disclosure;
FIG. 7 shows a block diagram of an electronic device according to an embodiment of the present disclosure;
FIG. 8 shows a schematic block diagram of a computer system suitable for implementing a data processing method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the disclosed embodiments will be described in detail with reference to the accompanying drawings so that they can be easily implemented by those skilled in the art. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.
In the disclosed embodiments, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of the disclosed features, numbers, steps, behaviors, components, parts, or combinations thereof, and are not intended to preclude the possibility that one or more other features, numbers, steps, behaviors, components, parts, or combinations thereof may be present or added.
It should be further noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. The embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
In the technical solution provided by the embodiments of the present disclosure, the file operation time is recorded automatically by means of a newly added virtual column. No time field needs to be added to mark the processing time of a data file, which reduces operational complexity, manual workload, and the amount of data stored, avoids explicit programming to handle the time field, and relieves the storage pressure on the data warehouse.
Fig. 1 shows a flowchart of a data processing method according to an embodiment of the present disclosure, which includes the following steps S101 to S103, as shown in fig. 1:
in step S101, it is detected whether a target file has been operated on;
in step S102, when it is detected that the target file has been operated on, the target file operation time is acquired;
in step S103, the target file operation time is sent to a data warehouse component, so that the data warehouse component stores the target file operation time.
As mentioned above, Hive is a Hadoop-based data warehouse tool used for operations such as data extraction, transformation, and loading, so as to enable query and analysis of large-scale data stored in Hadoop. When Hive is used for these operations, a time field usually has to be added to mark the processing time of each data file, but such a time field increases operational complexity, manual workload, and the amount of data stored.
In view of the above, this embodiment proposes a data processing method that records the file operation time automatically by means of a newly added virtual column. No time field needs to be added to mark the processing time of a data file, which reduces operational complexity, manual workload, and the amount of data stored, avoids explicit programming to handle the time field, and relieves the storage pressure on the data warehouse.
In an embodiment of the present disclosure, the data processing method may be applied to a file system hosted on, for example, a computer, an electronic device, or a server cluster that performs data processing.
In an embodiment of the present disclosure, the target file refers to a file whose operation time needs to be determined and recorded. The target file is stored in the file system, which refers to a system for storing files, and the file system may be, for example, a Hadoop Distributed File System (HDFS).
In an embodiment of the present disclosure, the operations may include a plurality of operation types: create, modify, edit, delete, etc. The target file operation time refers to a time at which the target file is operated, such as a creation time, a modification time, an editing time, a deletion time, and the like.
In an embodiment of the present disclosure, the data warehouse component refers to a component for storing information related to the file system and/or the files stored in the file system. When the file system is a Hadoop Distributed File System (HDFS), the data warehouse component may be, for example, the Hadoop-based data warehouse tool Hive, which may be used for operations such as data extraction, transformation, and loading, so as to query and analyze large-scale data stored in Hadoop.
In the above embodiment, when the file system detects that the target file has been operated on, it obtains the target file operation time and sends it to the data warehouse component, which stores it for use in subsequent extraction, transformation, and loading of the related data. This avoids explicit programming of a time field in the data warehouse component and thereby relieves the storage pressure on the data warehouse.
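The file-system side of steps S101 to S103 can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the class name `FileOperationDetector`, the `send` callback standing in for the channel to the data warehouse component, and the use of the local file modification timestamp as the "operation time" are all hypothetical, and only the modify operation type is covered.

```python
import os
import tempfile
from datetime import datetime, timezone
from typing import Callable, Optional


class FileOperationDetector:
    """Hypothetical file-system-side helper covering steps S101 to S103."""

    def __init__(self, path: str, send: Callable[[str, datetime], None]) -> None:
        self.path = path
        # Assumption: `send` stands in for the channel to the data warehouse component.
        self.send = send
        self._last_mtime_ns: Optional[int] = self._mtime_ns()

    def _mtime_ns(self) -> Optional[int]:
        # Returns the modification time in nanoseconds, or None if the file is missing.
        try:
            return os.stat(self.path).st_mtime_ns
        except FileNotFoundError:
            return None

    def check(self) -> bool:
        """Return True if a modification was detected and its time was sent."""
        current = self._mtime_ns()
        # S101: detect whether the target file has been operated on (modified).
        if current is None or current == self._last_mtime_ns:
            return False
        self._last_mtime_ns = current
        # S102: acquire the target file operation time.
        op_time = datetime.fromtimestamp(current / 1e9, tz=timezone.utc)
        # S103: send the operation time to the data warehouse component.
        self.send(self.path, op_time)
        return True


if __name__ == "__main__":
    fd, demo_path = tempfile.mkstemp()
    os.close(fd)
    detector = FileOperationDetector(demo_path, lambda p, t: print(p, t.isoformat()))
    os.utime(demo_path, ns=(1_700_000_000_000_000_000, 1_700_000_000_000_000_000))
    detector.check()
    os.remove(demo_path)
```

Polling the modification time is just one way to detect an operation; a real file system such as HDFS would more plausibly hook the operation itself, which this sketch does not attempt to model.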
Fig. 2 shows a flowchart of a data processing method according to another embodiment of the present disclosure, which includes the following steps S201 to S203, as shown in fig. 2:
in step S201, receiving a target file operation time sent by the file system;
in step S202, a preset data write format is determined;
in step S203, the target file operation time is written into the file time virtual column according to the preset data writing format.
As mentioned above, Hive is a Hadoop-based data warehouse tool used for operations such as data extraction, transformation, and loading, so as to enable query and analysis of large-scale data stored in Hadoop. When Hive is used for these operations, a time field usually has to be added to mark the processing time of each data file, but such a time field increases operational complexity, manual workload, and the amount of data stored.
In view of the above, this embodiment proposes a data processing method that records the file operation time automatically by means of a newly added virtual column. No time field needs to be added to mark the processing time of a data file, which reduces operational complexity, manual workload, and the amount of data stored, avoids explicit programming to handle the time field, and relieves the storage pressure on the data warehouse.
In an embodiment of the present disclosure, the data processing method may be applied to a data warehouse component hosted on, for example, a computer, an electronic device, a server, or a server cluster that performs data processing.
In an embodiment of the present disclosure, the preset data writing format refers to a data writing format, configured in advance, that meets the requirements of the data warehouse component.
In one embodiment of the present disclosure, the file time virtual column is a virtual column disposed in the data warehouse component for storing file operation times in a file system.
In the above embodiment, after receiving the target file operation time sent by the file system, the data warehouse component writes the target file operation time into the file time virtual column according to the preset data writing format, thereby avoiding explicit programming processing of the data warehouse component time field, and further reducing the storage pressure of the data warehouse.
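The data-warehouse side of steps S201 to S203 might look like the following minimal sketch. It rests on assumptions that are not in the source: the "preset data writing format" is modeled as a `strftime` pattern, the file time virtual column is modeled as an in-memory mapping from file path to formatted time, and the names `WarehouseComponent` and `PRESET_WRITE_FORMAT` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict

# Assumption: the preset data writing format is modeled as a strftime pattern.
PRESET_WRITE_FORMAT = "%Y-%m-%d %H:%M:%S"


class WarehouseComponent:
    """Hypothetical stand-in for the data warehouse component."""

    def __init__(self, write_format: str = PRESET_WRITE_FORMAT) -> None:
        self.write_format = write_format
        # Stand-in for the file time virtual column: file path -> formatted time.
        self.file_time_column: Dict[str, str] = {}

    def receive(self, path: str, op_time: datetime) -> str:
        """S201: receive the target file operation time sent by the file system."""
        # S202: determine the preset data writing format.
        fmt = self.write_format
        # S203: write the operation time into the file time virtual column
        # according to that format (normalized to UTC in this sketch).
        value = op_time.astimezone(timezone.utc).strftime(fmt)
        self.file_time_column[path] = value
        return value


if __name__ == "__main__":
    warehouse = WarehouseComponent()
    moment = datetime(2021, 9, 10, 8, 30, 0, tzinfo=timezone.utc)
    print(warehouse.receive("/data/example.csv", moment))
```

Because the time is kept alongside the file rather than in each row, no explicit time field has to be added to the table schema, which is the effect the embodiment describes.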
In an embodiment of the present disclosure, the method may further include the steps of:
setting a file time virtual column.
As noted above, using the file time virtual column avoids explicit programming of a time field in the data warehouse component and relieves the storage pressure on the data warehouse. Before the file time virtual column can be used, it must be set up and defined. For example, a new virtual column named FILE__time is defined in the data warehouse component so that it can be used to store the operation times of files in the file system.
In an embodiment of the present disclosure, the method may further include the steps of:
setting an enable parameter of the file time virtual column, wherein the enable parameter takes one of two values: enabled and not enabled.
As those skilled in the art will appreciate, the existing ROW__OFFSET__INSIDE__BLOCK virtual column is not enabled by default and becomes available only after the corresponding parameter is set. Accordingly, in this embodiment, to make the file time virtual column optional, an enable parameter may likewise be set for it, so that use of the file time virtual column is controlled through that parameter. The enable parameter may take two values, enabled and not enabled: when it is enabled, the file time virtual column is available; when it is not enabled, the file time virtual column is unavailable.
In an embodiment of the present disclosure, the method may further include the steps of:
in response to a received file time virtual column data read-write command, determining whether the enable parameter of the file time virtual column is enabled;
when the enable parameter of the file time virtual column is enabled, reading or writing data of the file time virtual column according to the file time virtual column data read-write command;
and when the enable parameter of the file time virtual column is not enabled, returning prompt information indicating that the file time virtual column data is unavailable.
Once the file time virtual column has been set, data can be read from and written to it. Because the column has an enable parameter, however, reads and writes must be handled according to that parameter's value. For example, after a file time virtual column data read-write command is received, it is determined whether the enable parameter of the file time virtual column is enabled. If it is enabled, data is read from or written to the file time virtual column according to the command. If it is not enabled, prompt information indicating that the file time virtual column data is unavailable is returned to signal that the column cannot currently be used; the enable parameter can then be changed to enabled and the read-write operation performed again.
The technical terms and technical features involved in Fig. 2 and its related embodiments are the same as or similar to those involved in Fig. 1 and its related embodiments. For their explanation, reference may be made to the description of Fig. 1 and its related embodiments above, which is not repeated here.
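The gating of reads and writes by the enable parameter can be illustrated with a short sketch. The names `GatedFileTimeColumn` and `FileTimeColumnUnavailable` are hypothetical, and modeling the "unavailable prompt information" as a raised exception is an assumption made for brevity; a real component might return a message or status code instead.

```python
from typing import Dict, Optional


class FileTimeColumnUnavailable(Exception):
    """Assumption: models the 'data unavailable' prompt information."""


class GatedFileTimeColumn:
    """Hypothetical file time virtual column guarded by an enable parameter."""

    def __init__(self, enabled: bool = False) -> None:
        # Not enabled by default in this sketch, so callers must opt in.
        self.enabled = enabled
        self._values: Dict[str, str] = {}

    def _require_enabled(self) -> None:
        # Every read-write command first checks the enable parameter.
        if not self.enabled:
            raise FileTimeColumnUnavailable(
                "file time virtual column data is unavailable: column is not enabled"
            )

    def write(self, path: str, value: str) -> None:
        self._require_enabled()
        self._values[path] = value

    def read(self, path: str) -> Optional[str]:
        self._require_enabled()
        return self._values.get(path)


if __name__ == "__main__":
    column = GatedFileTimeColumn()
    try:
        column.read("/data/example.csv")
    except FileTimeColumnUnavailable as exc:
        print(exc)
    column.enabled = True  # change the enable parameter, then retry
    column.write("/data/example.csv", "2021-09-10 08:30:00")
    print(column.read("/data/example.csv"))
```

The retry in the demo mirrors the flow described above: after the unavailable prompt, the enable parameter is changed to enabled and the read-write operation is performed again.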
Fig. 3 illustrates a flowchart of a data processing method according to still another embodiment of the present disclosure, which includes the following steps S301 to S302, as illustrated in fig. 3:
in step S301, the file system detects whether a target file has been operated on and, when it is detected that the target file has been operated on, obtains the target file operation time and sends it to the data warehouse component;
in step S302, the data warehouse component receives the target file operation time sent by the file system, determines a preset data writing format, and writes the target file operation time into the file time virtual column according to the preset data writing format.
As mentioned above, Hive is a Hadoop-based data warehouse tool used for operations such as data extraction, transformation, and loading, so as to enable query and analysis of large-scale data stored in Hadoop. When Hive is used for these operations, a time field usually has to be added to mark the processing time of each data file, but such a time field increases operational complexity, manual workload, and the amount of data stored.
In view of the above, this embodiment proposes a data processing method that records the file operation time automatically by means of a newly added virtual column. No time field needs to be added to mark the processing time of a data file, which reduces operational complexity, manual workload, and the amount of data stored, avoids explicit programming to handle the time field, and relieves the storage pressure on the data warehouse.
In one embodiment of the present disclosure, the data processing method may be applied to a system for data processing including a file system and a data warehouse component.
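How the two sides of steps S301 and S302 fit together can be sketched with a toy system. Everything here is an illustrative assumption: `InMemoryFileSystem` is a hypothetical stand-in for a file system such as HDFS, `make_warehouse_receiver` is a hypothetical stand-in for the data warehouse component, the preset data writing format is modeled as a `strftime` pattern, and the injectable `clock` exists only to make the example deterministic.

```python
from datetime import datetime, timezone
from typing import Callable, Dict

# Assumption: the preset data writing format is modeled as a strftime pattern.
PRESET_WRITE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def make_warehouse_receiver(
    column: Dict[str, str], write_format: str = PRESET_WRITE_FORMAT
) -> Callable[[str, datetime], None]:
    """S302: receive the time, apply the preset format, write the virtual column."""

    def receive(path: str, op_time: datetime) -> None:
        column[path] = op_time.astimezone(timezone.utc).strftime(write_format)

    return receive


class InMemoryFileSystem:
    """Hypothetical file system that reports each write to the warehouse (S301)."""

    def __init__(
        self,
        notify: Callable[[str, datetime], None],
        clock: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
    ) -> None:
        self.files: Dict[str, bytes] = {}
        self._notify = notify
        self._clock = clock

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data
        # The operation is detected at the point it happens; its time is
        # obtained from the clock and sent to the data warehouse component.
        self._notify(path, self._clock())


if __name__ == "__main__":
    file_time_column: Dict[str, str] = {}
    fs = InMemoryFileSystem(make_warehouse_receiver(file_time_column))
    fs.write("/warehouse/example/part-0", b"row data")
    print(file_time_column)
```

Note that the row data written to the file carries no time field of its own; the operation time lives only in the separate column mapping.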
In an embodiment of the present disclosure, the method may further include the steps of:
the data warehouse component sets the file time virtual column.
In an embodiment of the present disclosure, the method may further include the steps of:
the data warehouse component sets an enable parameter of the file time virtual column, wherein the enable parameter takes one of two values: enabled and not enabled.
In an embodiment of the present disclosure, the method may further include the steps of:
the data warehouse component, in response to a received file time virtual column data read-write command, determines whether the enable parameter of the file time virtual column is enabled; when the enable parameter is enabled, it reads or writes data of the file time virtual column according to the command; and when the enable parameter is not enabled, it returns prompt information indicating that the file time virtual column data is unavailable.
The technical terms and technical features involved in Fig. 3 and its related embodiments are the same as or similar to those involved in Figs. 1-2 and their related embodiments. For their explanation, reference may be made to the description of Figs. 1-2 and their related embodiments above, which is not repeated here.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods.
Fig. 4 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure, which may be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in fig. 4, the data processing apparatus includes:
a detection module 401 configured to detect whether a target file has been operated on;
an obtaining module 402 configured to obtain the target file operation time when it is detected that the target file has been operated on;
a sending module 403 configured to send the target file operation time to a data warehouse component so that the data warehouse component stores the target file operation time.
As mentioned above, Hive is a Hadoop-based data warehouse tool used for data extraction, transformation, and loading, which enables querying and analysis of large-scale data stored in Hadoop. When Hive is used for extraction, transformation, loading, and similar operations, a time field usually has to be added to mark the processing time of each data file, but adding this field increases operational complexity, manual workload, and the amount of data stored.
In view of this, this embodiment proposes a data processing apparatus that automatically records file operation times by means of a newly defined virtual column. Because no time field needs to be added to mark the processing time of a data file, this technical solution reduces operational complexity, manual workload, and the amount of data stored, avoids explicit programming for the time field, and relieves storage pressure on the data warehouse.
In an embodiment of the present disclosure, the data processing apparatus may be implemented as a file system that performs data processing, hosted for example on a computer, an electronic device, a server cluster, or the like.
In an embodiment of the present disclosure, the target file refers to a file whose operation time needs to be determined and recorded. The target file is stored in the file system, which refers to a system for storing files, and the file system may be, for example, a Hadoop Distributed File System (HDFS).
In an embodiment of the present disclosure, the operations may include a plurality of operation types: create, modify, edit, delete, etc. The target file operation time refers to a time at which the target file is operated, such as a creation time, a modification time, an editing time, a deletion time, and the like.
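By way of illustration only, the operation types and the target file operation time described above might be modeled as follows. This is a minimal sketch: the names `FileOperation`, `FileOperationEvent`, and `record_operation` are hypothetical and do not appear in the disclosure, and the use of UTC timestamps is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class FileOperation(Enum):
    """Operation types named in the text (identifiers are hypothetical)."""
    CREATE = "create"
    MODIFY = "modify"
    EDIT = "edit"
    DELETE = "delete"


@dataclass(frozen=True)
class FileOperationEvent:
    """One operation on a target file and the time at which it occurred."""
    path: str
    operation: FileOperation
    operation_time: datetime


def record_operation(path: str, operation: FileOperation) -> FileOperationEvent:
    # Capture the operation time at the moment the operation is observed
    # (UTC is an assumption; the source does not specify a time zone).
    return FileOperationEvent(path, operation, datetime.now(timezone.utc))


event = record_operation("/warehouse/orders/part-00000", FileOperation.MODIFY)
print(event.operation.value, event.operation_time.isoformat())
```

Keeping the operation type alongside the time lets a creation time be distinguished from a modification or deletion time for the same file.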
In an embodiment of the present disclosure, the data warehouse component refers to a component for storing information related to the file system and/or to the files stored in the file system. When the file system is a Hadoop Distributed File System (HDFS), the data warehouse component may be, for example, Hive, a Hadoop-based data warehouse tool that can be used for data extraction, transformation, and loading in order to query and analyze large-scale data stored in Hadoop.
In the above embodiment, when the file system detects that the target file has been operated on, it obtains the target file operation time and sends it to the data warehouse component, which stores it for use in subsequent extraction, transformation, and loading of the related data. This avoids explicit programming for a time field in the data warehouse component and relieves storage pressure on the data warehouse.
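The detect, obtain, and send sequence of this embodiment can be sketched as follows. The sketch assumes a toy in-memory file system; `WatchedFileSystem`, `Sink`, and the injected `clock` are hypothetical names introduced for illustration, and a real deployment on HDFS would rely on that system's own notification or metadata facilities instead.

```python
import time
from typing import Callable, List, Tuple

# Hypothetical sink type: receives (file path, operation time in epoch seconds).
Sink = Callable[[str, float], None]


class WatchedFileSystem:
    """Toy in-memory file system that reports operation times for target files."""

    def __init__(self, sink: Sink, clock: Callable[[], float] = time.time) -> None:
        self._files = {}
        self._targets = set()
        self._sink = sink
        self._clock = clock

    def watch(self, path: str) -> None:
        # Mark a file as a target file whose operation time must be recorded.
        self._targets.add(path)

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data
        self._on_operation(path)

    def delete(self, path: str) -> None:
        self._files.pop(path, None)
        self._on_operation(path)

    def _on_operation(self, path: str) -> None:
        # Detection step: only target files trigger a report.
        if path in self._targets:
            operation_time = self._clock()      # obtaining step
            self._sink(path, operation_time)    # sending step


received: List[Tuple[str, float]] = []
fs = WatchedFileSystem(sink=lambda p, t: received.append((p, t)),
                       clock=lambda: 1631232000.0)
fs.watch("/data/a.txt")
fs.write("/data/a.txt", b"x")
fs.write("/data/b.txt", b"y")   # not a target file, so nothing is sent
print(received)
```

Passing the sink in as a callable keeps the file system side independent of how the data warehouse component stores the value.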
Fig. 5 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure, which may be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in fig. 5, the data processing apparatus includes:
a receiving module 501, configured to receive a target file operation time sent by a file system;
a determining module 502 configured to determine a preset data writing format;
a writing module 503, configured to write the target file operation time into a file time virtual column according to the preset data writing format.
As mentioned above, Hive is a Hadoop-based data warehouse tool used for data extraction, transformation, and loading, which enables querying and analysis of large-scale data stored in Hadoop. When Hive is used for extraction, transformation, loading, and similar operations, a time field usually has to be added to mark the processing time of each data file, but adding this field increases operational complexity, manual workload, and the amount of data stored.
In view of this, this embodiment proposes a data processing apparatus that automatically records file operation times by means of a newly defined virtual column. Because no time field needs to be added to mark the processing time of a data file, this technical solution reduces operational complexity, manual workload, and the amount of data stored, avoids explicit programming for the time field, and relieves storage pressure on the data warehouse.
In an embodiment of the present disclosure, the data processing apparatus may be implemented as a data warehouse component that performs data processing, hosted for example on a computer, an electronic device, a server cluster, or the like.
In an embodiment of the present disclosure, the preset data writing format refers to a data writing format, set in advance, that meets the requirements of the data warehouse component.
In one embodiment of the present disclosure, the file time virtual column is a virtual column disposed in the data warehouse component for storing file operation times in a file system.
In the above embodiment, after receiving the target file operation time sent by the file system, the data warehouse component writes it into the file time virtual column according to the preset data writing format. This avoids explicit programming for a time field in the data warehouse component and relieves storage pressure on the data warehouse.
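The receive, determine-format, and write steps on the data warehouse side might look like the following minimal sketch. The format string `"%Y-%m-%d %H:%M:%S"`, the UTC conversion, and the names `WarehouseFileTimes` and `PRESET_WRITE_FORMAT` are assumptions made for illustration; the disclosure does not specify a concrete format.

```python
from datetime import datetime, timezone
from typing import Dict

# Assumed preset format; the source does not specify one.
PRESET_WRITE_FORMAT = "%Y-%m-%d %H:%M:%S"


class WarehouseFileTimes:
    """Keeps file operation times as metadata rather than as a table field."""

    def __init__(self, write_format: str = PRESET_WRITE_FORMAT) -> None:
        self._write_format = write_format
        self._file_time: Dict[str, str] = {}   # stands in for the virtual column

    def receive(self, path: str, epoch_seconds: float) -> None:
        # Convert the received time using the preset format, then write it.
        moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
        self._file_time[path] = moment.strftime(self._write_format)

    def file_time(self, path: str) -> str:
        return self._file_time[path]


warehouse = WarehouseFileTimes()
warehouse.receive("/data/a.txt", 1631232000.0)
print(warehouse.file_time("/data/a.txt"))   # 2021-09-10 00:00:00
```

Normalizing the value at write time means every reader of the column sees one consistent representation, whatever form the file system used when sending it.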
In an embodiment of the present disclosure, the apparatus may further include:
a setting module configured to set a file time virtual column.
As mentioned above, using the file time virtual column avoids explicit programming for the time field of the data warehouse component and relieves storage pressure on the data warehouse. Before the file time virtual column can be used, it must be set up and defined; for example, a new virtual column named FILE__TIME is defined in the data warehouse component so that it can store the operation times of files in the file system.
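To illustrate what defining such a virtual column means, the sketch below resolves a column name that is not part of the stored rows from per-file metadata at query time. It is a simplified model, not Hive's implementation: `define_virtual_column`, `select`, the `_file` marker key, and the sample data are all hypothetical, and the column name is written here as `FILE__TIME`. `INPUT__FILE__NAME` is an existing Hive virtual column and is included only for comparison.

```python
from typing import Callable, Dict, List

# Per-file metadata the warehouse already holds (illustrative values).
FILE_TIMES = {
    "/t/part-0": "2021-09-10 08:00:00",
    "/t/part-1": "2021-09-10 09:30:00",
}

# Registry of virtual columns: name -> function of the row's source file.
VIRTUAL_COLUMNS: Dict[str, Callable[[str], str]] = {
    "INPUT__FILE__NAME": lambda path: path,
}


def define_virtual_column(name: str, resolver: Callable[[str], str]) -> None:
    VIRTUAL_COLUMNS[name] = resolver


def select(rows: List[dict], columns: List[str]) -> List[dict]:
    """Project stored fields and virtual columns; '_file' marks each row's source file."""
    result = []
    for row in rows:
        out = {}
        for col in columns:
            if col in VIRTUAL_COLUMNS:
                # Virtual column: computed from file metadata, not stored in the row.
                out[col] = VIRTUAL_COLUMNS[col](row["_file"])
            else:
                out[col] = row[col]
        result.append(out)
    return result


define_virtual_column("FILE__TIME", lambda path: FILE_TIMES[path])
rows = [{"_file": "/t/part-0", "id": 1}, {"_file": "/t/part-1", "id": 2}]
print(select(rows, ["id", "FILE__TIME"]))
```

Because the value is looked up per file rather than stored per row, no time field has to be added to the table data itself, which is the storage saving the text describes.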
In an embodiment of the present disclosure, the setting module may be further configured to:
setting an enable parameter of the file time virtual column, wherein the enable parameter takes one of two values, enabled or disabled.
As those skilled in the art will appreciate, the existing ROW__OFFSET__INSIDE__BLOCK virtual column is disabled by default and becomes available only after the corresponding parameter is set. Accordingly, to make the file time virtual column optional, this embodiment may also provide an enable parameter for it, so that use of the column is controlled through this parameter. The enable parameter may take two values, enabled and disabled: when it is set to enabled, the file time virtual column is available; when it is set to disabled, the column is unavailable.
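A two-valued switch of this kind could be sketched as a session setting that defaults to disabled. The parameter name `warehouse.filetime.enabled` and the `Settings` class are hypothetical; the disclosure does not name the actual parameter, and this is not a real Hive configuration key.

```python
from typing import Dict

# Hypothetical parameter name; the source does not name the actual setting.
FILE_TIME_PARAM = "warehouse.filetime.enabled"


class Settings:
    """Session settings with the file time virtual column disabled by default."""

    def __init__(self) -> None:
        self._values: Dict[str, str] = {FILE_TIME_PARAM: "false"}

    def set(self, statement: str) -> None:
        # Accepts statements of the form "SET key=value".
        keyword, _, assignment = statement.strip().partition(" ")
        if keyword.upper() != "SET" or "=" not in assignment:
            raise ValueError(f"unsupported statement: {statement!r}")
        key, _, value = assignment.partition("=")
        self._values[key.strip()] = value.strip().lower()

    def file_time_enabled(self) -> bool:
        return self._values.get(FILE_TIME_PARAM) == "true"


settings = Settings()
print(settings.file_time_enabled())   # False: disabled by default
settings.set("SET warehouse.filetime.enabled=true")
print(settings.file_time_enabled())   # True
```

Defaulting to disabled mirrors the behavior the text attributes to the row offset column, so existing queries are unaffected until the feature is switched on.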
In an embodiment of the present disclosure, the writing module may be further configured to:
in response to receiving a file time virtual column data read-write command, determining whether the enable parameter of the file time virtual column is set to enabled;
when the enable parameter of the file time virtual column is set to enabled, reading or writing data of the file time virtual column according to the file time virtual column data read-write command;
and when the enable parameter of the file time virtual column is set to disabled, returning prompt information indicating that the file time virtual column data is unavailable.
After the file time virtual column has been set up, data can be read from and written to it. Because the column has an enable parameter, however, a read or write must be handled differently depending on the parameter's value. For example, after a file time virtual column data read-write command is received, it is first determined whether the enable parameter of the file time virtual column is set to enabled. If so, data is read from or written to the column according to the command. Otherwise, prompt information is returned indicating that the file time virtual column data is currently unavailable; the enable parameter can then be changed to enabled and the read-write operation executed again.
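The branching described above can be sketched as follows. The class `FileTimeColumn`, the command strings `"read"` and `"write"`, and the wording of the unavailability prompt are assumptions for illustration only.

```python
from typing import Dict, Optional

UNAVAILABLE = "file time virtual column data unavailable"  # assumed prompt wording


class FileTimeColumn:
    """File time virtual column guarded by a two-valued switch."""

    def __init__(self, enabled: bool = False) -> None:
        self.enabled = enabled
        self._values: Dict[str, str] = {}

    def handle(self, command: str, path: str, value: Optional[str] = None) -> str:
        """Process a 'read' or 'write' command against the virtual column."""
        if not self.enabled:
            # Switch is off: return the prompt instead of touching the data.
            return UNAVAILABLE
        if command == "write":
            if value is None:
                raise ValueError("write requires a value")
            self._values[path] = value
            return "ok"
        if command == "read":
            return self._values.get(path, "")
        raise ValueError(f"unknown command: {command}")


column = FileTimeColumn()
print(column.handle("read", "/data/a.txt"))    # switch is off, prompt is returned
column.enabled = True                          # modify the parameter, then retry
column.handle("write", "/data/a.txt", "2021-09-10 00:00:00")
print(column.handle("read", "/data/a.txt"))
```

Checking the switch before dispatching on the command ensures that neither reads nor writes reach the column while it is unavailable.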
The technical terms and features involved in fig. 5 and its related embodiments are the same as or similar to those involved in fig. 4 and its related embodiments. For their explanation, reference may be made to the above description of fig. 4 and its related embodiments, which is not repeated here.
Fig. 6 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure, which may be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in fig. 6, the data processing apparatus includes:
the file system 601 is configured to detect whether a target file is operated, and when the target file is detected to be operated, obtain target file operation time and send the target file operation time to the data warehouse component;
the data warehouse component 602 is configured to receive a target file operation time sent by a file system, determine a preset data writing format, and write the target file operation time into a file time virtual column according to the preset data writing format.
As mentioned above, Hive is a Hadoop-based data warehouse tool used for data extraction, transformation, and loading, which enables querying and analysis of large-scale data stored in Hadoop. When Hive is used for extraction, transformation, loading, and similar operations, a time field usually has to be added to mark the processing time of each data file, but adding this field increases operational complexity, manual workload, and the amount of data stored.
In view of this, this embodiment proposes a data processing apparatus that automatically records file operation times by means of a newly defined virtual column. Because no time field needs to be added to mark the processing time of a data file, this technical solution reduces operational complexity, manual workload, and the amount of data stored, avoids explicit programming for the time field, and relieves storage pressure on the data warehouse.
In an embodiment of the present disclosure, the data processing apparatus may be implemented as a system for data processing including a file system and a data warehouse component.
In an embodiment of the present disclosure, the data warehouse component 602 may be further configured to:
set the file time virtual column.
In an embodiment of the present disclosure, the data warehouse component 602 may be further configured to:
set an enable parameter of the file time virtual column, wherein the enable parameter takes one of two values, enabled or disabled.
In an embodiment of the present disclosure, the data warehouse component 602 may be further configured to:
in response to receiving a file time virtual column data read-write command, determine whether the enable parameter of the file time virtual column is set to enabled; when the enable parameter is set to enabled, read or write data of the file time virtual column according to the command; and when the enable parameter is set to disabled, return prompt information indicating that the file time virtual column data is unavailable.
The technical terms and features involved in fig. 6 and its related embodiments are the same as or similar to those involved in figs. 4-5 and their related embodiments. For their explanation, reference may be made to the above description of figs. 4-5 and their related embodiments, which is not repeated here.
The present disclosure also discloses an electronic device. Fig. 7 shows a block diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 7, the electronic device 700 includes a memory 701 and a processor 702, wherein
the memory 701 is used to store one or more computer instructions, which are executed by the processor 702 to implement the above-described method steps.
Fig. 8 is a schematic block diagram of a computer system suitable for implementing a data processing method according to an embodiment of the present disclosure.
As shown in fig. 8, the computer system 800 includes a processing unit 801, which can execute the various processes of the above-described embodiments according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The RAM 803 also stores various programs and data necessary for the operation of the system 800. The processing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as necessary, so that a computer program read from it is installed into the storage section 808 as necessary. The processing unit 801 may be implemented as a CPU, a GPU, a TPU, an FPGA, an NPU, or another processing unit.
In particular, according to embodiments of the present disclosure, the methods described above may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the data processing method. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809 and/or installed from the removable medium 811.
A computer program product is also disclosed in embodiments of the present disclosure, the computer program product comprising computer programs/instructions which, when executed by a processor, implement any of the above method steps.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented by software or hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
As another aspect, the disclosed embodiment also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus in the foregoing embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the embodiments of the present disclosure.
The foregoing description is only of the preferred embodiments of the disclosure and of the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above-mentioned features, but also encompasses other technical solutions formed by any combination of the above-mentioned features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the embodiments of the present disclosure.

Claims (10)

1. A method of data processing, comprising:
detecting whether a target file is operated;
when the target file is detected to be operated, acquiring the operation time of the target file;
and sending the target file operation time to a data warehouse component so that the data warehouse component stores the target file operation time.
2. A method of data processing, comprising:
receiving the operation time of a target file sent by a file system;
determining a preset data writing format;
and writing the target file operation time into a file time virtual column according to the preset data writing format.
3. The method of claim 2, further comprising:
setting the file time virtual column.
4. The method of claim 3, further comprising:
setting an enable parameter of the file time virtual column, wherein the enable parameter takes one of two values, enabled or disabled.
5. The method of claim 4, further comprising:
in response to receiving a file time virtual column data read-write command, determining whether the enable parameter of the file time virtual column is set to enabled;
when the enable parameter of the file time virtual column is set to enabled, reading or writing data of the file time virtual column according to the file time virtual column data read-write command;
and when the enable parameter of the file time virtual column is set to disabled, returning prompt information indicating that the file time virtual column data is unavailable.
6. A method of data processing, comprising:
the file system detects whether a target file is operated, and when the target file is detected to be operated, the operation time of the target file is obtained and sent to the data warehouse component;
and the data warehouse component receives the target file operation time sent by the file system, determines a preset data writing format, and writes the target file operation time into the file time virtual column according to the preset data writing format.
7. The method of claim 6, further comprising:
the data warehouse component sets the file time virtual column.
8. The method of claim 7, further comprising:
the data warehouse component sets an enable parameter of the file time virtual column, wherein the enable parameter takes one of two values, enabled or disabled.
9. The method of claim 8, further comprising:
the data warehouse component, in response to receiving a file time virtual column data read-write command, determines whether the enable parameter of the file time virtual column is set to enabled; when the enable parameter is set to enabled, reads or writes data of the file time virtual column according to the command; and when the enable parameter is set to disabled, returns prompt information indicating that the file time virtual column data is unavailable.
10. An electronic device comprising a memory and a processor, wherein
the memory is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the steps of the method of any one of claims 1-9.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111064294.0A CN113742410A (en) 2021-09-10 2021-09-10 Data processing method, data processing apparatus, electronic device, storage medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111064294.0A CN113742410A (en) 2021-09-10 2021-09-10 Data processing method, data processing apparatus, electronic device, storage medium, and program product

Publications (1)

Publication Number Publication Date
CN113742410A true CN113742410A (en) 2021-12-03

Family

ID=78738117

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111064294.0A Pending CN113742410A (en) 2021-09-10 2021-09-10 Data processing method, data processing apparatus, electronic device, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN113742410A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100274764A1 (en) * 2009-04-22 2010-10-28 International Business Machines Corporation Accessing snapshots of a time based file system
US20150088924A1 (en) * 2013-09-23 2015-03-26 Daniel ABADI Schema-less access to stored data
CN111095238A (en) * 2017-09-29 2020-05-01 甲骨文国际公司 Processing semi-structured and unstructured data in a partitioned database environment
US20190311051A1 (en) * 2018-04-04 2019-10-10 Sap Se Virtual columns to expose row specific details for query execution in column store databases
CN109240991A (en) * 2018-09-26 2019-01-18 Oppo广东移动通信有限公司 File recommendation method, device, storage medium and intelligent terminal
US20200201813A1 (en) * 2018-12-21 2020-06-25 Dropbox, Inc. Scaling hdfs for hive
CN111680092A (en) * 2020-06-05 2020-09-18 深圳市卡数科技有限公司 Method, system, server and storage medium for importing data into hive table

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination