CN112035459A - Data format conversion method and device - Google Patents

Info

Publication number
CN112035459A
Authority
CN
China
Prior art keywords
data
service
thread
processing
class
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010905102.3A
Other languages
Chinese (zh)
Inventor
李杨帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd filed Critical Bank of China Ltd
Priority to CN202010905102.3A priority Critical patent/CN112035459A/en
Publication of CN112035459A publication Critical patent/CN112035459A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/258Data format conversion from or to a database

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data format conversion method and device. The method comprises: receiving a data source address and a target address set by a user, a loader, a filter and a service class to be referenced, and an exporter in which a target import rule is set; starting a loader thread to read target data from the data source address, and converting the target data into entity class data; transmitting the entity class data into the filter, and starting a filter thread to clean and filter the entity class data according to a preset data filtering rule; starting a service thread to perform logic processing on the cleaned and filtered entity class data according to preset service processing logic; and starting an exporter thread to read the logically processed entity class data and write it into the target address. The invention enables the data format conversion process to be implemented quickly, reliably and simply.

Description

Data format conversion method and device
Technical Field
The invention relates to the technical field of big data processing, in particular to a data format conversion method and device.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
In an actual business scenario, a business department may obtain customer data from multiple channels. Because the data sources differ, the acquired customer data arrives in a variety of formats, for example text data, data exported from a back-end oracle database, excel data, and possibly big data stores such as hive, hbase and mongo. After the customer data is received, it often has to be converted into a uniform format so that subsequent business processing can follow the same logic.
In existing data format conversion schemes, data in different formats must be processed separately, so the data is processed many times, much of the work is repetitive, and the processing cycle is long. Moreover, after data in one format has been processed, all of the code has to be changed to suit the format of the next data, and the changed code has to be fully retested. The testing workload is therefore large, and developers need strong troubleshooting skills to fix problems found during code testing in time; otherwise subsequent data processing is seriously affected. How to provide a fast, reliable and simple data format conversion method has therefore become an urgent problem.
Disclosure of Invention
The embodiment of the invention provides a data format conversion method, which is used for quickly, reliably and simply realizing a data format conversion process and comprises the following steps:
receiving a data source address and a target address set by a user, a loader, a filter and a service class to be referenced, and an exporter in which a target import rule is set, wherein the loader is used for reading data in at least one specified format, and the target import rule defines that data is written into the target address in the format in which the target address stores data;
starting a loader thread to read target data from the data source address, and converting the target data into entity class data;
transmitting the entity class data into a filter, starting a filter thread to clean and filter the entity class data according to a preset data filtering rule;
transmitting the entity class data after being cleaned and filtered into a service class, and starting a service thread to perform logic processing on the entity class data after being cleaned and filtered according to a preset service processing logic;
and starting an exporter thread to read the entity class data after the logic processing, and writing the entity class data after the logic processing into a target address.
The embodiment of the present invention further provides a data format conversion device, which is used for quickly, reliably and simply implementing a data format conversion process, and the device includes:
the receiving module is used for receiving a data source address and a target address set by a user, a loader, a filter and a service class to be referenced, and an exporter in which a target import rule is set, wherein the loader is used for reading data in at least one specified format, and the target import rule defines that data is written into the target address in the format in which the target address stores data;
the data reading module is used for starting a loader thread to read target data from the data source address and converting the target data into entity class data;
the data cleaning module is used for transmitting the entity class data converted by the data reading module into the filter, and starting a filter thread to clean and filter the entity class data according to a preset data filtering rule;
the logic processing module is used for transmitting the entity class data cleaned and filtered by the data cleaning module into a service class, and starting a service thread to perform logic processing on the cleaned and filtered entity class data according to a preset service processing logic;
and the data import module is used for starting an exporter thread to read the entity class data which is logically processed by the logic processing module and writing the entity class data which is logically processed into the target address.
The embodiment of the invention also provides computer equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the data format conversion method when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program for executing the data format conversion method is stored in the computer-readable storage medium.
In the embodiment of the invention, the data format conversion process can be implemented simply by referencing the loader, filter, service class and exporter components; implementation is simple, development time is short, and developers do not need to be concerned with the intermediate business logic. Data in various formats can be read merely by inheriting from and implementing a loader, and data can be converted into a uniform format by inheriting from and implementing an exporter, so the code logic is simple. In addition, multithreaded stream processing is adopted: separate threads handle reading, cleaning, processing and importing the data, so that large volumes of data are processed as fast as possible while multithreaded data safety is guaranteed. Performance is efficient and stable, the blocking problems of synchronous processing do not arise, and business use of the data remains smooth.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts. In the drawings:
FIG. 1 is a flow chart of a data format conversion method according to an embodiment of the present invention;
FIG. 2 is a flow chart of another data format conversion method according to an embodiment of the present invention;
FIG. 3 is a flow chart of another data format conversion method according to an embodiment of the present invention;
FIG. 4 is a flow chart of another data format conversion method according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a data format conversion device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
The embodiment of the invention provides a data format conversion method that is based on Java technology and uses the Spring Boot framework to build a portable and efficient platform framework. A plurality of threads are started at the same time, so that reading data, screening data, processing data (which runs whenever there is data to consume) and exporting data are handled asynchronously as a stream. Redis lightweight storage is used to store customer data at the hundred-million level and to read it quickly.
As shown in fig. 1, the method includes steps 101 to 105:
Step 101, receiving a data source address and a target address set by a user, a loader, a filter and a service class to be referenced, and an exporter in which a target import rule is set.
The loader is used for reading data in at least one specified format. The loader identifies the format of the current data from the suffix of the target data, for example text data with the suffix txt or word data with the suffix doc or docx. Once the format has been identified, the read operation is performed by a loader that can read data in that format.
In the embodiment of the invention, a single loader can be defined to read data in several formats. Alternatively, each loader can be defined to read data in one format, and loaders for the different formats are referenced so that the target data can be read smoothly. The formats that a loader reads are configurable; for example, a new readable format can be added, or a format that is no longer used can be deleted.
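The suffix-based selection described above can be sketched as follows. This is only an illustration under stated assumptions: the class LoaderRegistrySketch, its methods and the loader names TextLoader and WordLoader are hypothetical and do not appear in the source.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical registry mapping a file suffix to the name of the loader that reads it.
class LoaderRegistrySketch {
    private final Map<String, String> loaderBySuffix = new LinkedHashMap<>();

    // Adds a readable format (or replaces the loader of an existing one).
    void register(String suffix, String loaderName) {
        loaderBySuffix.put(suffix.toLowerCase(Locale.ROOT), loaderName);
    }

    // Deletes a format that is no longer used.
    void unregister(String suffix) {
        loaderBySuffix.remove(suffix.toLowerCase(Locale.ROOT));
    }

    // Identifies the format from the suffix of the file name and returns the matching loader.
    String loaderFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            throw new IllegalArgumentException("No suffix in: " + fileName);
        }
        String suffix = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String loader = loaderBySuffix.get(suffix);
        if (loader == null) {
            throw new IllegalArgumentException("No loader registered for suffix: " + suffix);
        }
        return loader;
    }

    // Assumed default configuration, based on the suffixes named in the text.
    static LoaderRegistrySketch withDefaults() {
        LoaderRegistrySketch registry = new LoaderRegistrySketch();
        registry.register("txt", "TextLoader");
        registry.register("doc", "WordLoader");
        registry.register("docx", "WordLoader");
        return registry;
    }
}
```

A map keyed by suffix keeps adding and deleting formats a matter of configuration rather than of code changes, which is the property the paragraph above emphasizes.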
The target import rule defines that data is written to the target address in the format in which the target address stores data. For example, if the target address is a traditional relational database, an Exporter of the traditional relational database type is referenced; this Exporter creates a Java Database Connectivity (JDBC) connection and writes the processed target data into a traditional relational database such as oracle or mysql in the format in which that database stores data. If the target address is a big data store, an Exporter of the big data type is referenced, and it writes the processed target data into a big data store such as hive, hbase, mongo or hdfs in the format in which that store keeps data. If the target address is a text file, an Exporter of the text type is referenced, and the processed target data is imported into the text file in the format used by that file, such as csv.
Specifically, the referenced loader, filter, service, exporter, and data source address, destination address, etc. information may be configured in the properties file.
It should be noted that, if the target address is a traditional relational database or a big database, the target address may not be configured; in other cases, the target address needs to be configured.
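As a rough illustration of such a properties configuration, the sketch below parses a sample file with the standard java.util.Properties class. The key names (loader, filter, service, exporter, source.address) and the values FormatFilter, TopCustomerService and /data/input are assumptions made for this example; the source does not list the actual keys.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class JobConfigSketch {
    // Hypothetical properties content. No target address is given because the
    // exporter here is a relational database exporter, for which it is optional.
    static final String SAMPLE =
        "loader=ExcelLoader\n"
        + "filter=FormatFilter\n"
        + "service=TopCustomerService\n"
        + "exporter=oracleExporter\n"
        + "source.address=/data/input\n";

    // Parses properties text into a Properties object.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```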
In one implementation of the embodiment of the invention, a master function engine mechanism is used to reference the different components. The referenced components, namely loaderList, FilterChain, ServiceChain and ExporterList, are connected in series, in that order, in one job. The master function engine starts the service and controls overall scheduling: it calls each Chain to start a thread, each thread acts as the producer for the thread that follows it and as the consumer of the thread that precedes it, and the target data flows through the threads for asynchronous multithreaded processing.
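The producer and consumer chaining can be sketched with blocking queues between the stages. This is a minimal sketch, not the engine of the embodiment: the class PipelineSketch, the use of LinkedBlockingQueue, the end marker and the trivial per-stage work (trim, drop empty, upper-case, pass through) are all assumptions chosen to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

class PipelineSketch {
    // End marker telling the next stage that no more data will arrive.
    private static final String END = "__END__";

    // Starts one stage thread: it consumes from 'in', applies 'work' and produces to 'out'.
    // A null result from 'work' means the record is dropped.
    static Thread startStage(String name, BlockingQueue<String> in, BlockingQueue<String> out,
                             Function<String, String> work) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    String item = in.take();      // waits until the previous stage delivers
                    if (END.equals(item)) {
                        out.put(END);             // pass the end marker downstream
                        return;
                    }
                    String result = work.apply(item);
                    if (result != null) {
                        out.put(result);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        t.start();
        return t;
    }

    // Runs loader -> filter -> service -> exporter over an in-memory "data source".
    static List<String> run(List<String> source) {
        BlockingQueue<String> raw = new LinkedBlockingQueue<>();
        BlockingQueue<String> loaded = new LinkedBlockingQueue<>();
        BlockingQueue<String> filtered = new LinkedBlockingQueue<>();
        BlockingQueue<String> processed = new LinkedBlockingQueue<>();
        BlockingQueue<String> exported = new LinkedBlockingQueue<>();

        startStage("loader", raw, loaded, String::trim);
        startStage("filter", loaded, filtered, s -> s.isEmpty() ? null : s);
        startStage("service", filtered, processed, String::toUpperCase);
        startStage("exporter", processed, exported, s -> s);

        try {
            for (String s : source) {
                raw.put(s);
            }
            raw.put(END);
            List<String> target = new ArrayList<>();
            for (String s = exported.take(); !END.equals(s); s = exported.take()) {
                target.add(s);
            }
            return target;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while running the pipeline", e);
        }
    }
}
```

Because every stage blocks on take() until data arrives, a slow stage delays only the stages after it; earlier stages keep producing into their queues.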
Before receiving the service class to be referred to set by the user, as shown in fig. 2, the following steps 201 and 202 may also be performed:
Step 201, adding a service class, and making the added service class inherit the service parent class.
The newly added service class inherits the service parent class so that the program recognizes the newly added code as a service class.
Step 202, receiving service processing logic set by a user, and writing the service processing logic into the excute method of the inheriting service class to obtain a service class that can be referenced.
In the embodiment of the invention, an expansion function is provided, a user can customize the business processing logic, and a new service class is generated by utilizing the customized business processing logic. The generated new service class may be referenced by the user in step 101.
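Steps 201 and 202 might look like the following in Java. The parent class ServiceSketch, its generic signature and the subclass AmountRoundingService are hypothetical; the source states only that a service parent class exists and that the business logic is written into a method named excute.

```java
// Hypothetical parent class: the framework would recognize its subclasses as services.
abstract class ServiceSketch<T> {
    // Method name kept as spelled in the text.
    abstract T excute(T entity);
}

// Step 201: the newly added class inherits the service parent class.
class AmountRoundingService extends ServiceSketch<Double> {
    // Step 202: user-defined business logic, here rounding an amount to two decimal places.
    @Override
    Double excute(Double amount) {
        return Math.round(amount * 100.0) / 100.0;
    }
}
```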
Step 102, starting a loader thread to read target data from the data source address, and converting the target data into entity class data.
Entity class data is data held in an entity class, a programming construct that gives each field a field name.
For example, the algorithm that can be used when converting to entity class data is as follows:
[Figure BDA0002661133930000051: listing of the conversion algorithm, available only as an image in the original]
When calculating the number of cross-bank transfer transactions of each customer in the year, the input data is as follows:
zhangsan, a certain bank of workers, 100.00, 20200201
zhangsan, a bank of agriculture, 222.00, 20200601
lisi, a certain bank of workers, 500.00, 20200201
With the above algorithm, the transformed entity class data is:
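// zhangsan: two transactions, 100.00 + 222.00 in total
CustomerEnty zhangsan = new CustomerEnty();
zhangsan.name = "zhangsan";
zhangsan.times = 2;
zhangsan.volumns = 322.00;
// lisi: one transaction of 500.00
CustomerEnty lisi = new CustomerEnty();
lisi.name = "lisi";
lisi.times = 1;
lisi.volumns = 500.00;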
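Since the algorithm itself survives only as an image, the following is a reconstruction under assumptions rather than the patent's listing: it groups comma-separated records by customer name, counts the transactions and sums the amounts. The record layout (name, bank, amount, date), the class EntityConversionSketch and the use of BigDecimal for the amount are choices made for this sketch; the field names follow the text.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Entity class with the field names used in the text.
class CustomerEnty {
    String name;
    int times;                              // number of transactions
    BigDecimal volumns = BigDecimal.ZERO;   // total transaction amount
}

class EntityConversionSketch {
    // Assumed record layout: name, bank, amount, date (yyyyMMdd), separated by commas.
    static Map<String, CustomerEnty> convert(List<String> lines) {
        Map<String, CustomerEnty> byName = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length != 4) {
                continue;                   // malformed records are skipped in this sketch
            }
            String name = fields[0].trim();
            CustomerEnty customer = byName.computeIfAbsent(name, n -> {
                CustomerEnty created = new CustomerEnty();
                created.name = n;
                return created;
            });
            customer.times++;
            customer.volumns = customer.volumns.add(new BigDecimal(fields[2].trim()));
        }
        return byName;
    }
}
```

BigDecimal is used instead of double so that amounts such as 100.00 and 222.00 add up exactly.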
Step 103, transmitting the entity class data into a filter, and starting a filter thread to clean and filter the entity class data according to a preset data filtering rule.
The data filtering rules include general rules and user-defined rules. For example, a general rule may clean out junk data that does not conform to the format, non-compliant data and the like; a user-defined rule may check whether the value of the vehicle field is yes and clean out the data whose value is not yes.
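A combination of one general rule and one user-defined rule could be sketched as below. The class FilterRulesSketch, the representation of a record as a Map and the "non-empty name" general rule are assumptions for illustration; only the vehicle rule comes from the text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class FilterRulesSketch {
    // Assumed general rule: a record must carry a non-empty name.
    static final Predicate<Map<String, String>> HAS_NAME =
        r -> r.get("name") != null && !r.get("name").trim().isEmpty();

    // User-defined rule from the text: keep only records whose vehicle field is "yes".
    static final Predicate<Map<String, String>> VEHICLE_IS_YES =
        r -> "yes".equals(r.get("vehicle"));

    // Keeps the records that satisfy every rule; all others are cleaned out.
    static List<Map<String, String>> clean(List<Map<String, String>> records) {
        List<Predicate<Map<String, String>>> rules = new ArrayList<>();
        rules.add(HAS_NAME);
        rules.add(VEHICLE_IS_YES);
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> record : records) {
            boolean passes = true;
            for (Predicate<Map<String, String>> rule : rules) {
                if (!rule.test(record)) {
                    passes = false;
                    break;
                }
            }
            if (passes) {
                kept.add(record);
            }
        }
        return kept;
    }

    // Convenience constructor for a two-field record.
    static Map<String, String> record(String name, String vehicle) {
        Map<String, String> r = new HashMap<>();
        r.put("name", name);
        r.put("vehicle", vehicle);
        return r;
    }
}
```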
Step 104, transmitting the cleaned and filtered entity class data into a service class, and starting a service thread to perform logic processing on the cleaned and filtered entity class data according to preset service processing logic.
The Service can have various built-in functions for processing data efficiently and accurately in different application scenarios. For example, the Service class can provide a regular expression matching method for identifying transaction amount fields that carry symbols and units in text data, or a sentence word segmentation algorithm for parsing fields such as the customer name, the number of financial products purchased and the transaction amount out of a sentence.
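The regular expression matching mentioned above might be sketched as follows. The pattern, the accepted symbols and units, and the class AmountParserSketch are assumptions; the source does not give the actual expression. The sketch returns the first amount-like token in the text, so it would need tightening for sentences that contain other numbers before the amount.

```java
import java.math.BigDecimal;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmountParserSketch {
    // Optional currency symbol (yen/yuan sign or dollar sign), digits with optional
    // thousands separators, optional decimals, optional unit ("yuan", U+5143 or "CNY").
    private static final Pattern AMOUNT = Pattern.compile(
        "[\u00A5$]?\\s*(\\d{1,3}(?:,\\d{3})+|\\d+)(\\.\\d+)?\\s*(?:yuan|\u5143|CNY)?");

    // Returns the first amount found in the text, stripped of symbol, separators and unit.
    static Optional<BigDecimal> parse(String text) {
        Matcher m = AMOUNT.matcher(text);
        if (!m.find()) {
            return Optional.empty();
        }
        String integerPart = m.group(1).replace(",", "");
        String fraction = m.group(2) == null ? "" : m.group(2);
        return Optional.of(new BigDecimal(integerPart + fraction));
    }
}
```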
In another implementation manner, as shown in fig. 3, starting the service thread to perform logic processing on the cleaned and filtered entity class data according to the preset service processing logic may specifically be performed as the following steps 1041 and 1042:
Step 1041, when the user has identified the service processing logic as complex logic, executing the excute method with the service thread, and caching the cleaned and filtered entity class data to redis.
Step 1042, taking the cached data out of redis for logic processing.
Because complex logic takes a long time to process, the data passed on by the filter can be stored in redis and taken out of redis for logic processing once the data currently being handled has been processed, which avoids blocking. That is, if the processing is simple, the data does not need to be stored in redis and can be processed directly by the service. If complex logic is involved, for example finding the top-N customers for purchases of each type of financial product, or outputting a leaderboard of all business staff for each institution, the data needs to be cached to redis and then read from redis for logic processing in the final aggregation stage of the service. With redis lightweight storage, customer data at the hundred-million level can be stored and read quickly.
In another implementation manner of the embodiment of the present invention, as shown in fig. 4, the following step 401 may also be performed:
Step 401, receiving the atlas method of the service class as rewritten by the user.
The atlas method is used to indicate that logic processing is performed after all the data in the data source address has been transmitted into the service class.
After step 1041 has been executed to cache the cleaned and filtered entity class data to redis, the following steps 402 and 403 may also be executed:
Step 402, starting a service thread to execute the atlas method, taking the cleaned and filtered entity class data out of redis and storing it in the atlas result, until all the data received from the data source address has been processed and stored in the atlas result.
Step 403, taking out all the cleaned and filtered entity class data stored in the atlas result for logic processing.
For example, the atlas method may be used to rank the savings amounts of all customers, or in other application scenarios in which the data of all customers must take part in the logic processing together.
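The cache-then-aggregate flow of steps 1041, 402 and 403 can be illustrated with the single-threaded sketch below. It deliberately replaces redis with an in-memory queue so that the example runs without external services; a real deployment would use a redis client instead. The classes SavingsRecord and AtlasSketch, and the ranking by savings amount, are hypothetical; only the method names excute and atlas come from the text.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;

class SavingsRecord {
    final String name;
    final double savings;

    SavingsRecord(String name, double savings) {
        this.name = name;
        this.savings = savings;
    }
}

class AtlasSketch {
    // In-memory stand-in for the redis cache (assumption made for this sketch only).
    private final Deque<SavingsRecord> cache = new ArrayDeque<>();
    private final List<SavingsRecord> atlasResult = new ArrayList<>();

    // Step 1041: cache each cleaned record instead of processing it immediately.
    void excute(SavingsRecord record) {
        cache.addLast(record);
    }

    // Step 402: move everything from the cache into the atlas result.
    void atlas() {
        while (!cache.isEmpty()) {
            atlasResult.add(cache.pollFirst());
        }
    }

    // Step 403: logic that needs the full data set, here a ranking by savings, largest first.
    List<String> rankBySavings() {
        List<SavingsRecord> sorted = new ArrayList<>(atlasResult);
        sorted.sort(Comparator.comparingDouble((SavingsRecord r) -> r.savings).reversed());
        List<String> names = new ArrayList<>();
        for (SavingsRecord r : sorted) {
            names.add(r.name);
        }
        return names;
    }
}
```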
Step 105, starting an exporter thread to read the logically processed entity class data, and writing the logically processed entity class data into the target address.
In practical applications, a plurality of exporters, such as ExcelExporter, mysqlExporter, hdfsExporter and oracleExporter, may be established in advance; in use, the corresponding exporter can be referenced through xml configuration to import data into different target addresses.
Each piece of target data read by a loader thread in one read operation is processed in the order loader thread, filter thread, service thread, exporter thread; across all the target data, at least one loader thread, at least one filter thread, at least one service thread and at least one exporter thread execute concurrently. That is, every piece of data must pass through a loader, a filter, a service and an exporter in turn, and there may be several threads of each kind: after one loader thread has read data, another loader thread can continue reading; several filter threads can likewise receive the data produced by the loaders; and the same holds for the other threads. In this way the data is processed by multiple threads at the same time and read as a stream, and reading while exporting keeps IO overhead to a minimum.
In another implementation of the embodiment of the invention, an interface service can be provided, so that in addition to internal use the service class can be offered to external systems for matching operations. In this case the external caller needs to send a random string that is encrypted with a scheme such as 3DES under a fixed key agreed in advance, and continued access is permitted only if encryption and decryption succeed, which safeguards data security.
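One possible reading of this check is sketched below with the standard javax.crypto API. The exchange itself is an assumption: the caller sends a random string together with its 3DES ciphertext, and the service accepts the call only if decrypting the ciphertext with the agreed key reproduces the random string. The class TripleDesCheckSketch, the CBC mode with PKCS5 padding, the random IV and the placeholder key are all choices made for this example and are not taken from the source.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class TripleDesCheckSketch {
    // Fixed key agreed by both sides; 3DES needs 24 bytes. Placeholder value for illustration only.
    private static final byte[] SHARED_KEY =
        "0123456789abcdefghijklmn".getBytes(StandardCharsets.US_ASCII);

    // Creates a random 8-byte initialization vector (the DES block size).
    static byte[] newIv() {
        byte[] iv = new byte[8];
        new SecureRandom().nextBytes(iv);
        return iv;
    }

    // Caller side: encrypt the random string with the shared key.
    static byte[] encrypt(String randomString, byte[] iv) {
        return run(Cipher.ENCRYPT_MODE, randomString.getBytes(StandardCharsets.UTF_8), iv);
    }

    // Service side: accept only if the ciphertext decrypts to the random string that was sent.
    static boolean accept(String randomString, byte[] iv, byte[] cipherText) {
        try {
            byte[] plain = run(Cipher.DECRYPT_MODE, cipherText, iv);
            return MessageDigest.isEqual(plain, randomString.getBytes(StandardCharsets.UTF_8));
        } catch (IllegalStateException e) {
            return false;   // wrong key, wrong IV or corrupted ciphertext
        }
    }

    private static byte[] run(int mode, byte[] input, byte[] iv) {
        try {
            Cipher cipher = Cipher.getInstance("DESede/CBC/PKCS5Padding");
            cipher.init(mode, new SecretKeySpec(SHARED_KEY, "DESede"), new IvParameterSpec(iv));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("3DES operation failed", e);
        }
    }
}
```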
The embodiment of the invention also uses several mechanisms to ensure that the threads process data stably and safely; for example, the mechanisms used include a thread mechanism, a locking mechanism, a thread safety counting mechanism, a blocking thread mechanism and a waiting thread mechanism. The implementation principles of these mechanisms are briefly described below.
1. Thread mechanism
The same class is executed in different threads, and the mechanism ensures that the values held by each thread are isolated from and independent of those of the other threads. When processing massive data, the same program must be allowed to run in a multithreaded environment. For example, a customer calculation class is designed:
[Figures BDA0002661133930000071 and BDA0002661133930000081: listing of the customer calculation class, available only as images in the original]
Task A calculates the number of "tall, rich and handsome" customers, task B calculates the number of house owners, and task C calculates the number of customers who have been marketed to this month. Three threads are started, and each creates an instance of the class. The mechanism then guarantees that the three tasks compute the results 30, 40 and 20 respectively, rather than all computing a single result of 90.
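The original listing is not recoverable from the image. The isolation it describes matches what Java's ThreadLocal provides, so the sketch below uses ThreadLocal as an assumption; the class ThreadIsolationSketch and its method are hypothetical. Three threads share one static counter field, yet each ends with its own count.

```java
class ThreadIsolationSketch {
    // One shared field, but every thread sees its own independent value.
    private static final ThreadLocal<Integer> COUNT = ThreadLocal.withInitial(() -> 0);

    // Starts one thread per task; task i counts sizes[i] customers in its own thread.
    static int[] runTasks(int... sizes) {
        int[] results = new int[sizes.length];
        Thread[] threads = new Thread[sizes.length];
        for (int i = 0; i < sizes.length; i++) {
            final int index = i;
            threads[i] = new Thread(() -> {
                for (int k = 0; k < sizes[index]; k++) {
                    COUNT.set(COUNT.get() + 1);
                }
                results[index] = COUNT.get();   // each thread writes only its own slot
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for tasks", e);
            }
        }
        return results;
    }
}
```

With a plain static int instead of the ThreadLocal, the three tasks would update the same variable and the individual results would no longer be separable.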
2. Thread safety counting mechanism
A typical use case is the calculation of PV (page view) and UV (unique visitor) values; the mechanism ensures that counting is stable and thread-safe in a multithreaded environment.
[Figure BDA0002661133930000082: listing of the thread safety counting mechanism, available only as an image in the original]
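A thread-safe counter of this kind can be sketched with java.util.concurrent.atomic.AtomicLong. Whether the original listing used an atomic class is not known, so this is one possible realization; the class PageViewCounterSketch and its methods are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

class PageViewCounterSketch {
    private final AtomicLong pageViews = new AtomicLong();

    // Atomically adds one page view; safe to call from any number of threads.
    void recordView() {
        pageViews.incrementAndGet();
    }

    long total() {
        return pageViews.get();
    }

    // Lets several threads record views concurrently and returns the final total.
    static long countConcurrently(int threadCount, int viewsPerThread) {
        PageViewCounterSketch counter = new PageViewCounterSketch();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int k = 0; k < viewsPerThread; k++) {
                    counter.recordView();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while counting", e);
            }
        }
        return counter.total();
    }
}
```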
3. Blocking thread mechanism and waiting thread mechanism
Because I/O is sometimes slower than CPU processing, this mechanism lets a thread continue processing when data has flowed in from the reading thread and wait otherwise, so that the thread does not time out.
4. Locking mechanism
A lock can be placed on a class, an object or a method. Its function is that B cannot read while A is reading, which guarantees thread safety inside the lock. Locks are suitable for many scenarios, but they slow down program processing.
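In Java, locks on a method, an object or a class are commonly expressed with the synchronized keyword. The sketch below shows a method-level lock (which locks the object) and a class-level lock; the class LockSketch and its members are hypothetical and are not taken from the source.

```java
class LockSketch {
    private static int classTotal;   // guarded by the class-level lock
    private int objectTotal;         // guarded by the object-level lock (this)

    // Lock on the method: only one thread at a time can run it on the same object.
    synchronized void addToObject() {
        objectTotal++;
    }

    synchronized int objectTotal() {
        return objectTotal;
    }

    // Lock on the class: shared by all instances of LockSketch.
    static void addToClass() {
        synchronized (LockSketch.class) {
            classTotal++;
        }
    }

    static int classTotal() {
        synchronized (LockSketch.class) {
            return classTotal;
        }
    }

    // Several threads increment the same object; the lock prevents lost updates.
    static int runWithObjectLock(int threadCount, int incrementsPerThread) {
        LockSketch shared = new LockSketch();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int k = 0; k < incrementsPerThread; k++) {
                    shared.addToObject();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
        return shared.objectTotal();
    }
}
```

Every call has to acquire the lock, which is the slowdown the paragraph above refers to.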
The embodiment of the invention also provides complete exception handling, monitoring and logging mechanisms, which make it easy to examine data problems and the running state. The exception handling mechanism is as follows:
1. If abnormal data is input in the Loader stage, the abnormal data is output to the log and the next piece of data is processed, without throwing an exception.
2. When the Loader stage reads the various structured and unstructured data sources, an exception is thrown if the connection fails. When the filter stage filters and cleans the data, data that does not meet the business rules is output to the log and processing continues, without throwing an exception.
3. In complex data processing scenarios the redis cache is needed; if a connection error or a cache exception occurs, an exception is thrown.
4. In the Service logic processing layer, an exception is thrown if a runtime exception is caused by an error in the service logic code.
5. Other exceptions may be handled in a user-defined way.
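The first two rules can be illustrated with the sketch below: a bad record is logged and skipped, while an unavailable data source causes an exception. The class TolerantLoaderSketch, the use of java.util.logging and the representation of a failed connection as a null source are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

class TolerantLoaderSketch {
    private static final Logger LOG = Logger.getLogger(TolerantLoaderSketch.class.getName());

    // Parses amounts from raw records. A null source stands in for a failed connection.
    static List<Double> loadAmounts(List<String> source) {
        if (source == null) {
            // Rule 2: a connection failure is thrown to the caller.
            throw new IllegalStateException("Cannot connect to the data source");
        }
        List<Double> amounts = new ArrayList<>();
        for (String raw : source) {
            try {
                amounts.add(Double.parseDouble(raw.trim()));
            } catch (NumberFormatException e) {
                // Rule 1: abnormal data is logged and the next record is processed.
                LOG.warning("Skipping abnormal record: " + raw);
            }
        }
        return amounts;
    }
}
```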
The advantages of the data format conversion method of the present invention will be described below by way of specific examples.
If customer matching is performed using the traditional file upload method, the following steps are required:
1) downloading a template;
2) filling in information according to the fixed template, where an excel file supports at most sixty thousand rows and therefore cannot hold massive numbers of customers;
3) developing file upload code;
4) copying the file to the server side;
5) updating the database state table, and setting the state to matching;
6) performing the matching operation asynchronously for performance reasons;
7) matching the customers one by one in a for loop to find the customers who belong to the institution and have bought financial products;
8) establishing N maps, one for each financial product category;
9) after all customers have been read, looping over each map to find the top 10 customers;
10) querying the result, and developing export code that exports either to oracle or to text;
11) releasing the resources.
The method in the embodiment of the invention is realized by the following steps:
Directly configure properties: configure the Loader as the ExcelLoader class, configure the Exporter as the oracleExporter class, configure the length of the TreeMap dynamic variable of the Service as 10, and configure the input file path (that is, the data source address; it can be a folder into which massive numbers of files are placed and read one by one); then run the master function engine.
Therefore, compared with the traditional customer matching scheme in the prior art, the method of the embodiment is flexible and powerful and processes data efficiently with high throughput: more than 6000 transactions are processed per second, performance improves greatly on distributed servers, and TB-level data is processed within hours.
The embodiment of the present invention further provides a data format conversion device, as described in the following embodiments. Because the device solves the problem on a principle similar to that of the data format conversion method, the implementation of the device can refer to the implementation of the method, and the details are not repeated.
As shown in fig. 5, the apparatus includes a receiving module 501, a data reading module 502, a data cleaning module 503, a logic processing module 504, and a data importing module 505.
The receiving module 501 is configured to receive a data source address and a target address set by a user, a loader, a filter, a service class to be referred to, and an exporter with a target import rule set, where the loader is configured to read data in at least one specified format, and the target import rule defines that data is written to the target address in a format in which the target address stores the data.
The data reading module 502 is configured to start a loader thread to read target data from the data source address, and convert the target data into entity class data.
The data cleaning module 503 is configured to transmit the entity class data converted by the data reading module 502 into the filter, and start the filter thread to clean and filter the entity class data according to a preset data filtering rule.
The logic processing module 504 is configured to transmit the entity class data cleaned and filtered by the data cleaning module 503 into a service class, and start a service thread to perform logic processing on the cleaned and filtered entity class data according to preset service processing logic.
The data import module 505 is configured to start an exporter thread to read the entity class data logically processed by the logic processing module 504, and write the logically processed entity class data into the target address.
In one implementation manner of the embodiment of the present invention, the apparatus 500 further includes:
a configuration module 506, configured to add a service class, and inherit a service parent class using the added service class;
the configuration module 506 is further configured to receive a service processing logic set by a user, and write the service processing logic into an excute method of the inherited service class to obtain a service class that can be referred to.
In an implementation manner of the embodiment of the present invention, the logic processing module 504 is configured to:
when the service processing logic is identified as complex logic by a user, a service thread is utilized to execute an excute method, and the entity class data after being cleaned and filtered is cached to redis;
and taking out the cached data from the redis for logic processing.
In one implementation of an embodiment of the present invention,
the receiving module 501 is further configured to receive an atlas method of a service class rewritten by a user, where the atlas method is used to indicate that logic processing is performed after all data in a data source address is transmitted into the service class;
the logic processing module 504 is further configured to:
start a service thread to execute the atlas method, which takes cleaned and filtered entity class data out of Redis and stores it in an atlas result, until all data received from the data source address has been processed and stored in the atlas result;
and take all of the cleaned and filtered entity class data stored in the atlas result out for logic processing.
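The atlas step suits logic that needs the complete data set, such as computing each record's share of a total. The sketch below is one possible reading: the text does not say how the service learns that all data has arrived, so the `expected_count` parameter is an assumption, as are the class name, the `process_all` method, and the plain list standing in for the cache.

```python
class AtlasService:
    """Hypothetical service whose logic needs every record before it can run."""

    def __init__(self, expected_count):
        self.expected_count = expected_count  # assumption: total size known up front
        self.atlas_result = []

    def atlas(self, cached):
        """Move cached entities into atlas_result; return True once all have arrived."""
        while cached and len(self.atlas_result) < self.expected_count:
            self.atlas_result.append(cached.pop(0))
        return len(self.atlas_result) == self.expected_count

    def process_all(self):
        """Whole-set logic: attach each entity's share of the total amount."""
        total = sum(e["amount"] for e in self.atlas_result)
        return [{**e, "share": e["amount"] / total} for e in self.atlas_result]
```

A service thread would call `atlas` repeatedly as cleaned records arrive and call `process_all` only after `atlas` returns `True`.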
In one implementation of an embodiment of the present invention,
for the target data read by a loader thread in a single read operation, the target data is processed sequentially in the order of loader thread processing, filter thread processing, service thread processing, and exporter thread processing;
and across all target data, at least one loader thread, at least one filter thread, at least one service thread, and at least one exporter thread execute their processing concurrently.
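The two properties above (a fixed stage order for each batch, and several batches in flight at once) can be illustrated with one thread pool per stage. This is only a sketch under assumed names: `run_batches`, `STAGES`, and the driver threads that walk each batch through the stages are not part of the patent.

```python
from concurrent.futures import ThreadPoolExecutor

STAGES = ("loader", "filter", "service", "exporter")


def run_batches(batch_ids, workers_per_stage=2):
    """Send every batch through all stages in order, with batches overlapping."""
    pools = {s: ThreadPoolExecutor(max_workers=workers_per_stage) for s in STAGES}
    trace = {b: [] for b in batch_ids}  # stages visited by each batch, in order

    def step(stage, batch):
        trace[batch].append(stage)  # placeholder for the stage's real work
        return batch

    def pipeline(batch):
        # A batch enters the next stage only after the previous one has finished.
        for stage in STAGES:
            pools[stage].submit(step, stage, batch).result()

    # One driver per batch, so different batches can occupy different stages at once.
    with ThreadPoolExecutor(max_workers=len(batch_ids)) as drivers:
        list(drivers.map(pipeline, batch_ids))
    for pool in pools.values():
        pool.shutdown()
    return trace
```

Each pool plays the role of the "at least one" thread per stage; raising `workers_per_stage` lets more batches occupy the same stage simultaneously.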
In the embodiment of the present invention, the data format conversion process can be implemented by directly referencing the loader, filter, service class, and exporter components. The implementation is simple, development time is short, and developers need not concern themselves with the intermediate service logic. Data in various formats can be read simply by inheriting from and implementing the loader, and the data can be converted into a uniform format by inheriting from and implementing the exporter, so the code logic stays simple. In addition, multithreaded stream processing is used: separate threads handle reading, cleaning, processing, and importing the data, so that large data volumes are processed quickly while the thread safety of the data is preserved. Performance is efficient and stable, the blocking that synchronous processing would cause is avoided, and the services that consume the data continue to run smoothly.
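The inheritance-based extension described here might look like the following sketch, in which two loaders read different source formats into the same entity representation and one exporter writes a uniform format. The class names, the choice of CSV and JSON Lines as source formats, and the JSON array as the target format are all assumptions for illustration.

```python
import csv
import io
import json
from abc import ABC, abstractmethod


class Loader(ABC):
    """Hypothetical loader parent class: turns source text into entity dicts."""

    @abstractmethod
    def load(self, text):
        ...


class CsvLoader(Loader):
    def load(self, text):
        return [dict(row) for row in csv.DictReader(io.StringIO(text))]


class JsonLinesLoader(Loader):
    def load(self, text):
        return [json.loads(line) for line in text.splitlines() if line.strip()]


class Exporter(ABC):
    """Hypothetical exporter parent class: writes entities in the target format."""

    @abstractmethod
    def export(self, entities):
        ...


class JsonArrayExporter(Exporter):
    def export(self, entities):
        return json.dumps(entities, sort_keys=True)
```

Supporting a further source format then means adding one more `Loader` subclass, with no change to the filter, service, or exporter code.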
An embodiment of the present invention further provides a computer device comprising a memory, a processor, and a computer program that is stored on the memory and executable on the processor, wherein the processor implements the data format conversion method when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program for executing the data format conversion method is stored in the computer-readable storage medium.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (12)

1. A method for converting a data format, the method comprising:
receiving a data source address and a target address set by a user, a loader, a filter and a service class to be referenced, and an exporter in which a target import rule is set, wherein the loader is used for reading data in at least one specified format, and the target import rule specifies that data is written to the target address in the format in which the target address stores data;
starting a loader thread to read target data from the data source address, and converting the target data into entity class data;
transmitting the entity class data into a filter, starting a filter thread to clean and filter the entity class data according to a preset data filtering rule;
transmitting the entity class data after being cleaned and filtered into a service class, and starting a service thread to perform logic processing on the entity class data after being cleaned and filtered according to a preset service processing logic;
and starting an exporter thread to read the entity class data after the logic processing, and writing the entity class data after the logic processing into a target address.
2. The method of claim 1, wherein prior to receiving the user-defined service class to be referenced, the method further comprises:
adding a new service class that inherits from a service parent class;
and receiving a service processing logic set by a user, and writing the service processing logic into the excute method of the inheriting service class to obtain a service class that can be referenced.
3. The method of claim 1, wherein the step of starting a service thread to perform logic processing on the entity class data after being cleaned and filtered according to a preset service processing logic comprises the following steps:
when the user has marked the service processing logic as complex logic, using a service thread to execute the excute method and caching the cleaned and filtered entity class data to Redis;
and taking the cached data out of Redis for logic processing.
4. The method of claim 3, further comprising:
receiving an atlas method of the service class as overridden by the user, wherein the atlas method indicates that logic processing is to be performed only after all data at the data source address has been passed into the service class;
after caching the cleaned and filtered entity class data to Redis, the method further comprises:
starting a service thread to execute the atlas method, which takes cleaned and filtered entity class data out of Redis and stores it in an atlas result, until all data received from the data source address has been processed and stored in the atlas result;
and taking all of the cleaned and filtered entity class data stored in the atlas result out for logic processing.
5. The method of claim 1,
for the target data read by a loader thread in a single read operation, the target data is processed sequentially in the order of loader thread processing, filter thread processing, service thread processing, and exporter thread processing;
and across all target data, at least one loader thread, at least one filter thread, at least one service thread, and at least one exporter thread execute their processing concurrently.
6. An apparatus for converting a data format, the apparatus comprising:
the receiving module is used for receiving a data source address and a target address set by a user, a loader, a filter and a service class to be referenced, and an exporter in which a target import rule is set, wherein the loader is used for reading data in at least one specified format, and the target import rule specifies that data is written to the target address in the format in which the target address stores data;
the data reading module is used for starting a loader thread to read target data from the data source address and converting the target data into entity class data;
the data cleaning module is used for transmitting the entity class data converted by the data reading module into the filter, and starting a filter thread to clean and filter the entity class data according to a preset data filtering rule;
the logic processing module is used for transmitting the entity class data cleaned and filtered by the data cleaning module into a service class, and starting a service thread to perform logic processing on the cleaned and filtered entity class data according to a preset service processing logic;
and the data import module is used for starting an exporter thread to read the entity class data which is logically processed by the logic processing module and writing the entity class data which is logically processed into the target address.
7. The apparatus of claim 6, further comprising:
the configuration module is used for adding a new service class that inherits from a service parent class;
the configuration module is further configured to receive a service processing logic set by a user, and write the service processing logic into the excute method of the inheriting service class to obtain a service class that can be referenced.
8. The apparatus of claim 6, wherein the logic processing module is configured to:
when the user has marked the service processing logic as complex logic, use a service thread to execute the excute method and cache the cleaned and filtered entity class data to Redis;
and take the cached data out of Redis for logic processing.
9. The apparatus of claim 8,
the receiving module is further used for receiving an atlas method of the service class as overridden by the user, wherein the atlas method indicates that logic processing is to be performed only after all data at the data source address has been passed into the service class;
the logic processing module is further configured to:
start a service thread to execute the atlas method, which takes cleaned and filtered entity class data out of Redis and stores it in an atlas result, until all data received from the data source address has been processed and stored in the atlas result;
and take all of the cleaned and filtered entity class data stored in the atlas result out for logic processing.
10. The apparatus according to any one of claims 6 to 9,
for the target data read by a loader thread in a single read operation, the target data is processed sequentially in the order of loader thread processing, filter thread processing, service thread processing, and exporter thread processing;
and across all target data, at least one loader thread, at least one filter thread, at least one service thread, and at least one exporter thread execute their processing concurrently.
11. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 5 when executing the computer program.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method of any one of claims 1 to 5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010905102.3A CN112035459A (en) 2020-09-01 2020-09-01 Data format conversion method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010905102.3A CN112035459A (en) 2020-09-01 2020-09-01 Data format conversion method and device

Publications (1)

Publication Number Publication Date
CN112035459A true CN112035459A (en) 2020-12-04

Family

ID=73590875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010905102.3A Pending CN112035459A (en) 2020-09-01 2020-09-01 Data format conversion method and device

Country Status (1)

Country Link
CN (1) CN112035459A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115223541A (en) * 2022-06-21 2022-10-21 深圳市优必选科技股份有限公司 Text-to-speech processing method, device, equipment and storage medium
CN116303377A (en) * 2022-11-23 2023-06-23 南京视察者智能科技有限公司 Government affair data cleaning and filtering method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09282204A (en) * 1996-04-18 1997-10-31 Hokkaido Nippon Denki Software Kk Access device to multiple kinds of files
CN102999537A (en) * 2011-09-19 2013-03-27 阿里巴巴集团控股有限公司 System and method for data migration
US20190102403A1 (en) * 2017-09-29 2019-04-04 Intel Corporation Techniques for dynamic multi-storage format database access
CN110781231A (en) * 2019-09-19 2020-02-11 平安科技(深圳)有限公司 Batch import method, device, equipment and storage medium based on database

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
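IANLY梁炎: "Using Java 8 streams to deduplicate list data, filtering lists with filter(), and converting a list to a map", pages 330, Retrieved from the Internet <URL:https://blog.csdn.net/ianly123/article/details/82658622> *
洪陆合; 蔡建立; 吴顺祥: "Design and implementation of a data visualization system based on third-party controls", Computer Engineering and Design (计算机工程与设计), no. 13 *
阳光的亮亮: "A way of handling complex logic in a Java Service implementation class", pages 1 - 2, Retrieved from the Internet <URL:https://blog.csdn.net/weixin_39352976/article/details/104872831> *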

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination