CN116719866B

CN116719866B - Multi-format data self-adaptive distribution method and system

Info

Publication number: CN116719866B
Application number: CN202310518702.8A
Authority: CN
Inventors: 徐天南; 吴凯
Original assignee: Hainan Yinmancang Digital Technology Co ltd
Current assignee: Hainan Yinmancang Digital Technology Co ltd
Priority date: 2023-05-09
Filing date: 2023-05-09
Publication date: 2024-02-13
Anticipated expiration: 2043-05-09
Also published as: CN116719866A

Abstract

The embodiments of the invention disclose a multi-format data self-adaptive distribution method and system. The method comprises the following steps: receiving service data input by a service end, and classifying the input service data through AI fuzzy matching; preprocessing each piece of classified service data to obtain standard format data; storing each piece of obtained standard format data; receiving a request of a client for reading specific data, and reading a data reading code corresponding to the standard format data; generating data adapted to the client format according to the read data reading code; and outputting the processed data adapted to the client format. The system comprises a classification module, a preprocessing module, a data storage module, a self-adaptive processing module and an output module. The invention enables the system to process a large amount of supply chain data and to provide accurate, reliable, safe and efficient services.

Description

Multi-format data self-adaptive distribution method and system

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a method and a system for adaptively distributing multi-format data.

Background

In a supply chain, data may come from different data sources that differ in data quality. Because the amount of data in a supply chain is very large, processing it can take significant time and resources. Supply data therefore needs to be analyzed systematically.

Since supply data comes from different sources and involves multiple business processes, the problem of data integration must be addressed. Improperly integrated data can become inconsistent, duplicated or incomplete. Moreover, a supply data processing system relies on a variety of technologies, including databases, programming languages and data analysis tools. For non-professional users, these technologies can be overly complex and difficult to master, which hampers the use and maintenance of the system. In addition, supply data often contains sensitive information, such as transaction details and financial information, so the security of the system must be ensured to protect such data from unauthorized access and attack.

Disclosure of Invention

In view of the above, an object of the embodiments of the present invention is to provide a multi-format data adaptive distribution method and system, which can ensure that the system can process a large amount of data in a supply chain and provide accurate, reliable, safe and efficient services.

In a first aspect, an embodiment of the present invention provides a multi-format data adaptive distribution method, where the method includes:

receiving service data input by a service end, and classifying the input service data through AI fuzzy matching;

preprocessing each piece of classified service data to obtain standard format data;

storing each piece of obtained standard format data, receiving a request of a client for reading specific data, and reading a data reading code corresponding to the standard format data;

generating data adapted to the client format according to the read data reading code; and

outputting the processed data adapted to the client format.

With reference to the first aspect, an embodiment of the present invention provides a first possible implementation manner of the first aspect, where the receiving service data input by a service end and classifying the input service data through AI fuzzy matching includes:

collecting samples of service data from a third party platform;

determining classification categories of the service data, the classification categories including price, inventory, name and brand;

performing vectorization preprocessing on the collected samples of the service data to convert the samples into vector format data;

inputting the preprocessed vector format data and the labels of the corresponding categories into a machine learning model to train an AI model, and obtaining a trained AI model; and

receiving service data input by the service end, inputting the service data into the trained AI model, and matching the service data with the classification categories.
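The vectorization preprocessing step above is not specified further in the text. As a minimal sketch, assuming character bigram counts as the vector representation (an assumption, not something the document states), the conversion could look as follows; the function names and the sample labels are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Split a lower-cased string into overlapping character n-grams."""
    cleaned = text.lower().strip()
    return [cleaned[i:i + n] for i in range(len(cleaned) - n + 1)]


def build_vocabulary(samples: list[str], n: int = 2) -> dict[str, int]:
    """Assign a vector position to every n-gram seen in the samples."""
    vocab: dict[str, int] = {}
    for sample in samples:
        for gram in char_ngrams(sample, n):
            vocab.setdefault(gram, len(vocab))
    return vocab


def vectorize(text: str, vocab: dict[str, int], n: int = 2) -> list[int]:
    """Convert text into a count vector; n-grams outside the vocabulary are ignored."""
    vector = [0] * len(vocab)
    for gram, count in Counter(char_ngrams(text, n)).items():
        if gram in vocab:
            vector[vocab[gram]] = count
    return vector


# Hypothetical field labels collected from a third party platform.
samples = ["unit price", "stock qty", "product name", "brand"]
vocab = build_vocabulary(samples)
vector = vectorize("Price", vocab)
```

Each vector has one position per known bigram, so vectors built from the same vocabulary can be fed to any of the model types the description mentions.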

With reference to the first aspect, an embodiment of the present invention provides a second possible implementation manner of the first aspect, where the preprocessing each piece of classified service data to obtain standard format data includes:

defining each piece of service data by using the schema language provided by Avro according to the characteristics of the service data, where a data structure of the Avro format includes a field name, a default value, a document description of the field, an order of the field and an alias of the field;

combining the category of each piece of service data with a data type in the Avro format data; and

obtaining the standard format data corresponding to each piece of classified service data.
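To illustrate the schema definition step, the sketch below builds an Avro record schema as a Python dictionary and serializes it to JSON, which is the form in which Avro schemas are declared. The attribute keys (name, type, default, doc, order, aliases) are standard Avro field attributes; the record name, namespace, category-to-type mapping and alias values are assumptions made for illustration only.

```python
import json

# Hypothetical mapping from classification category to an Avro primitive type.
CATEGORY_TO_AVRO_TYPE = {
    "price": "double",
    "inventory": "int",
    "name": "string",
    "brand": "string",
}


def make_field(category, default, doc, order="ascending", aliases=()):
    """Build one Avro field definition for a classified piece of service data."""
    return {
        "name": category,                          # field name
        "type": CATEGORY_TO_AVRO_TYPE[category],   # data type combined with the category
        "default": default,                        # default value
        "doc": doc,                                # document description of the field
        "order": order,                            # sort order: ascending, descending or ignore
        "aliases": list(aliases),                  # aliases of the field
    }


schema = {
    "type": "record",
    "name": "ServiceData",               # hypothetical record name
    "namespace": "example.supplychain",  # hypothetical namespace
    "fields": [
        make_field("price", 0.0, "Unit price of the item", aliases=["cost"]),
        make_field("inventory", 0, "Units in stock", aliases=["stock"]),
        make_field("name", "", "Item name", order="ignore", aliases=["title"]),
        make_field("brand", "", "Brand of the item", order="ignore"),
    ],
}

schema_json = json.dumps(schema, indent=2)
```

The resulting JSON text can be handed to an Avro library of choice to encode records; that step is omitted here because it depends on the library used.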

With reference to the first aspect, an embodiment of the present invention provides a third possible implementation manner of the first aspect, where the storing each piece of obtained standard format data, receiving a request of a client for reading specific data, and reading a data reading code corresponding to the standard format data includes:

writing a data reading code for each piece of preprocessed standard format data; and

storing each piece of standard format data together with its corresponding data reading code.

With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation manner of the first aspect, where the receiving a request of a client for reading specific data and reading a data reading code corresponding to the standard format data includes:

receiving a request of a client for reading specific data, the request including a selected feature; and

obtaining the data reading code of the standard format data corresponding to the selected feature.
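One possible reading of the lookup above is a registry that maps each feature to its data reading code. The following is a minimal sketch under that assumption: a "data reading code" is modeled as a Python function stored per feature, and READERS, register_reader and handle_request are hypothetical names not taken from the document.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical registry: feature name -> data reading code for that feature.
READERS: dict[str, Callable[[dict], Any]] = {}


def register_reader(feature: str):
    """Decorator that stores a reading function under its feature name."""
    def decorator(func):
        READERS[feature] = func
        return func
    return decorator


@register_reader("price")
def read_price(record: dict) -> float:
    return float(record["price"])


@register_reader("stock")
def read_stock(record: dict) -> int:
    return int(record["stock"])


def handle_request(selected_features: list[str], records: list[dict]) -> list[dict]:
    """Look up the reading code for each selected feature and apply it to every record."""
    missing = [f for f in selected_features if f not in READERS]
    if missing:
        raise KeyError(f"no data reading code registered for: {missing}")
    return [{f: READERS[f](record) for f in selected_features} for record in records]


stored = [{"price": "9.5", "stock": "3"}, {"price": "12", "stock": "0"}]
result = handle_request(["price"], stored)
```

Rejecting unknown features up front keeps a malformed request from returning partially populated rows.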

With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation manner of the first aspect, where the generating data adapted to the client format according to the read data reading code includes:

selecting a corresponding output format according to the client;

selecting a corresponding library and method to read the data reading code according to the corresponding output format;

converting the read standard format data into a corresponding data structure and format by using the corresponding library and method; and

processing the converted data structure and format to obtain data adapted to the format of the corresponding client.

In a second aspect, an embodiment of the present invention further provides a multi-format data adaptive distribution system, where the system includes:

a classification module, configured to receive service data input by a service end and classify the input service data through AI fuzzy matching;

a preprocessing module, configured to preprocess each piece of classified service data to obtain standard format data;

a data storage module, configured to store each piece of obtained standard format data;

a self-adaptive processing module, configured to receive a request of a client for reading specific data, read a data reading code corresponding to the standard format data, and generate data adapted to the client format according to the read data reading code; and

an output module, configured to output the processed data adapted to the client format.
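How the five modules fit together can be sketched as a small pipeline. This is an illustrative arrangement only: the class name, the use of plain callables for each module and the stand-in lambdas are assumptions, and the output module is reduced to returning the adapted string.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DistributionSystem:
    """Wires the modules together; each module is supplied as a plain callable."""

    classify: Callable[[str], str]                    # classification module
    preprocess: Callable[[str, str], dict]            # preprocessing module
    adapters: dict[str, Callable[[list[dict]], str]]  # self-adaptive processing module
    storage: list[dict] = field(default_factory=list) # data storage module

    def ingest(self, raw_label: str, value: str) -> None:
        """Classify incoming service data, standardize it and store it."""
        category = self.classify(raw_label)
        self.storage.append(self.preprocess(category, value))

    def serve(self, client_format: str) -> str:
        """Adapt the stored data to the client's format and output it."""
        adapter = self.adapters[client_format]
        return adapter(self.storage)


# Trivial stand-ins for the real modules, for demonstration only.
system = DistributionSystem(
    classify=lambda raw_label: raw_label.strip().lower(),
    preprocess=lambda category, value: {"category": category, "value": value},
    adapters={"json": json.dumps},
)
system.ingest(" Price ", "9.5")
response = system.serve("json")
```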

With reference to the second aspect, an embodiment of the present invention provides a first possible implementation manner of the second aspect, where the classification module includes:

a sample acquisition unit, configured to collect samples of service data from a third party platform;

a category determining unit, configured to determine classification categories of the service data, the classification categories including price, inventory, name and brand;

a vectorization unit, configured to perform vectorization preprocessing on the collected samples of the service data and convert the samples into vector format data;

a training unit, configured to input the preprocessed vector format data and the labels of the corresponding categories into a machine learning model to train an AI model; and

a category matching unit, configured to receive the service data input by the service end, input the service data into the trained AI model, and match the service data with the classification categories.

With reference to the second aspect, an embodiment of the present invention provides a second possible implementation manner of the second aspect, where the preprocessing module includes:

a schema definition unit, configured to define each piece of service data by using the schema language provided by Avro according to the characteristics of the service data, where a data structure of the Avro format includes a field name, a default value, a document description of the field, an order of the field and an alias of the field;

a category definition unit, configured to combine the category of each piece of service data with a data type in the Avro format data; and

a standard format data output unit, configured to obtain the standard format data corresponding to each piece of classified service data.

With reference to the second aspect, an embodiment of the present invention provides a third possible implementation manner of the second aspect, where the data storage module includes:

a code writing unit, configured to write a data reading code for each piece of preprocessed standard format data;

a code matching unit, configured to store each piece of standard format data together with its corresponding data reading code;

a feature extraction unit, configured to receive a request of a client for reading specific data, the request including a selected feature; and

a code reading unit, configured to obtain the data reading code of the standard format data corresponding to the selected feature.

With reference to the second aspect, an embodiment of the present invention provides a fourth possible implementation manner of the second aspect, where the adaptive processing module includes:

an output format selection unit, configured to select a corresponding output format according to the client;

a standard format data reading unit, configured to select a corresponding library and method to read the data reading code according to the corresponding output format;

a data conversion unit, configured to convert the read standard format data into a corresponding data structure and format by using the corresponding library and method; and

a data processing unit, configured to process the converted data structure and format to obtain data adapted to the format of the corresponding client.

The embodiment of the invention has the beneficial effects that:

The multi-format data self-adaptive distribution method and system are mainly used for processing information such as supply data, inventory data and sales data. The required data is converted into standard format data through standardized processing, which resolves inconsistent data forms and differing data standards; the standard format data is then stored in a preset database, which facilitates extraction and pushing of target data and improves data acquisition efficiency.

The invention stores and reads data in the Avro format. Avro format data can be compressed into smaller files, which reduces storage and transmission costs and speeds up data processing; Avro format data is also interoperable across programming languages, which facilitates data transfer between different platforms.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flow chart of the multi-format data adaptive distribution method of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein can be arranged and designed in a wide variety of different configurations.

Referring to fig. 1, a first embodiment of the present invention provides a multi-format data adaptive distribution method, which includes:

And outputting the processed data adapting to the format of the client.

Specifically, the receiving the service data input by the service end and performing data classification on the input service data through AI fuzzy matching includes:

samples of business data are collected from a third party platform.

The machine learning model may be a support vector machine (SVM), a decision tree, naive Bayes, or the like.
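As one concrete instance of the model types listed above, the following is a minimal sketch of a multinomial naive Bayes classifier with Laplace smoothing, written without third-party libraries. The training labels, the bigram features and the class name are assumptions made for illustration; the document does not state which features or training data are used.

```python
import math
from collections import Counter, defaultdict


def bigrams(text):
    """Lower-case the text and return its overlapping character bigrams."""
    cleaned = text.lower().strip()
    return [cleaned[i:i + 2] for i in range(len(cleaned) - 1)]


class NaiveBayesClassifier:
    """Multinomial naive Bayes over character bigrams with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()

    def fit(self, samples, labels):
        for text, label in zip(samples, labels):
            self.class_counts[label] += 1
            for gram in bigrams(text):
                self.feature_counts[label][gram] += 1
                self.vocabulary.add(gram)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(count / total)
            denominator = sum(self.feature_counts[label].values()) + len(self.vocabulary)
            for gram in bigrams(text):
                score += math.log((self.feature_counts[label][gram] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled samples, three per classification category.
training = {
    "price": ["unit price", "sale price", "price cny"],
    "inventory": ["stock", "stock quantity", "inventory count"],
    "name": ["product name", "item title", "goods name"],
    "brand": ["brand", "brand label", "maker brand"],
}
samples = [text for texts in training.values() for text in texts]
labels = [label for label, texts in training.items() for _ in texts]

model = NaiveBayesClassifier().fit(samples, labels)
category = model.predict("Stock qty")
```

Because matching works on bigram overlap rather than exact equality, a label such as "Stock qty" is still assigned to the inventory category, which is one way to read the "fuzzy" aspect of the matching.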

Specifically, the preprocessing is performed on each piece of classified service data to obtain a standard data format, which includes:

Specifically, the storing each piece of the obtained standard format data, receiving a request of a client for reading specific data, and reading a data reading code corresponding to the standard format data includes:

The code may be written in a programming language such as Java or Python.

Specifically, the receiving the request of the client to read the specific data, and reading the data reading code corresponding to the standard format data includes:

Specifically, the following example uses Python: each piece of obtained standard format data is stored by means of the pandas library, a request of a client for reading specific data is received, and the data reading code corresponding to the standard format data is read.

(1) Writing data reading codes for each piece of preprocessed standard format data, wherein the data reading codes comprise:

writing the standard format data into a CSV file named data.csv by using the csv module of Python, wherein the format of the standard format data is data = ['parameter name', 'default value', 'description', 'sequence', 'nickname', 'field name'], and each piece of standard format data comprises a field name, a default value, a document description of the field, a field order and a field alias;

using the pandas library in Python, extracting the specified feature columns from each piece of standard format data according to the field names and field order in the data, thereby writing the data reading code of each piece of standard format data.

(2) And forming a data set X by each piece of standard format data and the corresponding data reading code, and storing the data set X in a database.

(3) Receiving a request of a client for reading specific data, and obtaining a corresponding data reading code, wherein the request comprises the following steps:

A request of a client for reading specific data, the request including selected features, is received through the instruction data = pd.read_csv('processed_data.csv'). Here the CSV file processed_data.csv represents the request to read specific data, and the read_csv function of the pandas library reads the file.

After the request is received, the data reading code of the standard format data corresponding to the selected features is obtained through the instruction selected_features = ['price', 'stock', 'title', 'brand'], where 'price' represents the price, 'stock' the inventory, 'title' the name and 'brand' the brand. The data reading codes of the standard format data containing these features are read through selected_features.

Through the instruction X = data[selected_features], a data set of the data reading codes containing the selected features is output, where data is the original data set and selected_features is a list of the names of the selected features.

Through the instruction X.to_csv("selected_features.csv", index=False), the data set containing the data reading codes of the selected features is exported to a CSV file. Here X is the data set to be saved as a CSV file; "selected_features.csv" is the file name of the saved CSV file, which can be customized as needed; and index=False means that the index of data set X is not saved to the CSV file. If this parameter is not specified, the index is saved to the CSV file by default.
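The select-and-export flow described above relies on pandas. As a hedged sketch that runs without third-party packages, the same flow is reproduced below with the standard csv module instead; this is a substitution for illustration, not the pandas code of the embodiment. The file names follow the text, while the sample rows, the extra "supplier" column and the select_features helper are assumptions.

```python
import csv
import tempfile
from pathlib import Path


def select_features(source, target, selected_features):
    """Copy only the selected feature columns from source CSV to target CSV.

    Mirrors the read, select and export steps: read the file, keep the
    selected columns, and write them out without an index column.
    """
    with open(source, newline="", encoding="utf-8") as src, \
            open(target, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        missing = [f for f in selected_features if f not in (reader.fieldnames or [])]
        if missing:
            raise KeyError(f"columns not found in {source}: {missing}")
        writer = csv.DictWriter(dst, fieldnames=selected_features, extrasaction="ignore")
        writer.writeheader()
        rows_written = 0
        for row in reader:
            writer.writerow(row)  # columns outside selected_features are dropped
            rows_written += 1
    return rows_written


selected_features = ["price", "stock", "title", "brand"]

with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "processed_data.csv"
    target = Path(workdir) / "selected_features.csv"
    # Hypothetical input data; "supplier" is an extra column that is not selected.
    source.write_text(
        "title,brand,price,stock,supplier\n"
        "Widget,Acme,9.5,3,North\n"
        "Gadget,Globex,12,0,South\n",
        encoding="utf-8",
    )
    rows_written = select_features(source, target, selected_features)
    exported_lines = target.read_text(encoding="utf-8").splitlines()
```

The exported file lists the columns in the order given by selected_features, which matches the column order produced by indexing a DataFrame with a list of names.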

Setting data reading codes for the standard format data makes data reading more automated and efficient: a large amount of data is processed automatically, which saves time and effort and improves the efficiency and accuracy of data processing. It ensures that the read data is consistent with the original data in format and content, reduces errors and deviations during data processing, and avoids non-uniform or erroneous data formats. It also makes the data format and structure clearer and facilitates data sharing and exchange; where data is shared among different systems, software and platforms, it can greatly simplify the process and reduce the cost of data exchange. Finally, the data reading code allows the data to be read into a specific data structure, such as a pandas DataFrame, which facilitates data analysis, visualization and interpretation.

Specifically, the generating data adapting to the client format according to the read data reading code includes:

and selecting a corresponding output format according to different clients.

The output format includes JSON, XML, etc.

In Java, XML and JSON data can be read by using libraries such as Jackson, Gson and dom4j; in Python, ElementTree may be used.

In Java, JSON data can be converted into Java objects by using libraries such as Jackson and Gson, and XML data can be converted into Java objects by using libraries such as XMLBeans and JAXB; in Python, the pandas library may be used to convert CSV data into DataFrame objects, and the xml.etree.ElementTree library may be used to convert XML data into Python objects.

In Java, Java objects can be converted into JSON or XML format by using libraries such as Jackson and Gson, and data can be written into Excel files by using libraries such as Apache POI; in Python, the pandas library may be used to convert DataFrame objects into CSV format, and the xml.etree.ElementTree library may be used to convert Python objects into XML format.
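The format selection and conversion steps can be sketched in Python with the standard json, csv and xml.etree.ElementTree modules. This is a minimal illustration under stated assumptions: the records are plain dictionaries, the element names "records" and "record" and the ADAPTERS table are hypothetical, and dictionary keys are assumed to be valid XML element names.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def to_json(records):
    """Serialize the records as a JSON array."""
    return json.dumps(records, ensure_ascii=False)


def to_xml(records):
    """Serialize the records as XML; keys must be valid XML element names."""
    root = ET.Element("records")
    for record in records:
        item = ET.SubElement(root, "record")
        for key, value in record.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def to_csv(records):
    """Serialize the records as CSV text with a header row."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical table mapping a client's format to the matching conversion method.
ADAPTERS = {"json": to_json, "xml": to_xml, "csv": to_csv}


def adapt(records, client_format):
    """Select the adapter for the client's format and convert the records."""
    try:
        adapter = ADAPTERS[client_format.lower()]
    except KeyError:
        raise ValueError(f"unsupported client format: {client_format}") from None
    return adapter(records)


records = [{"price": 9.5, "stock": 3, "title": "Widget", "brand": "Acme"}]
as_json = adapt(records, "json")
as_xml = adapt(records, "XML")
as_csv = adapt(records, "csv")
```

Keeping the format-to-function mapping in one table means that supporting another client format only requires adding one entry.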

A second embodiment of the present invention provides a multi-format data adaptive distribution system, including:

Specifically, the classification module includes:

Specifically, the preprocessing module includes:

Specifically, the data storage module includes:

The code may be written in a programming language such as Java or Python.

Specifically, the adaptive processing module includes:

The output format includes JSON, XML, etc.

The embodiment of the invention aims to protect a multi-format data self-adaptive distribution method and a system, and has the following effects:

1. The invention converts the required data into standard format data through standardized processing, which resolves inconsistent data forms and differing data standards; the standard format data is then stored in a preset database, which facilitates extraction and pushing of target data and improves data acquisition efficiency.

2. According to the invention, the data is stored and read through the Avro format, the Avro format data can be compressed into a smaller file, the storage and transmission cost can be reduced, and the data processing is accelerated; the Avro format of data can interoperate between different programming languages, facilitating data transfer between different platforms.

3. The invention stores service data such as goods data, stock data, sales data and the like in a structured data form by forming standard format data, is convenient for the data to carry out operations such as quick inquiry, screening, sorting, aggregation and the like, can form table data at the same time, is visualized, and is convenient for efficiently processing and analyzing the data. Through feature extraction, specific features are normalized, standardized, encoded and the like, so that the problems of non-uniformity in data form and different data standards are solved, and the efficiency of data acquisition and storage is improved.

The computer program product of the multi-format data adaptive distribution method and apparatus provided in the embodiments of the present invention includes a computer readable storage medium storing program codes, and instructions included in the program codes may be used to execute the method in the foregoing method embodiment, and specific implementation may refer to the method embodiment and will not be described herein.

In particular, the storage medium can be a general-purpose storage medium, such as a mobile disk, a hard disk, or the like, and when the computer program on the storage medium is executed, the above-described multi-format data adaptive distribution method can be executed, so that the system can be ensured to process a large amount of data in a supply chain, and an accurate, reliable, safe, and efficient service can be provided.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a processor-executable non-volatile computer readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.

Finally, it should be noted that the above embodiments are only specific implementations of the present invention, intended to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A multi-format data adaptive distribution method, comprising:

receiving service data input by a service end, and carrying out data classification on the input service data through AI fuzzy matching;

preprocessing each piece of classified service data respectively to obtain standard format data;

storing each piece of obtained standard format data;

receiving a request of a client for reading specific data, and reading a data reading code corresponding to the standard format data;

generating data adapting to a client format according to the read data reading code;

outputting the processed data adapting to the format of the client;

the step of receiving the service data input by the service terminal and carrying out data classification on the input service data through AI fuzzy matching comprises the following steps:

collecting a sample of the business data from a third party platform;

determining a classification category of the business data, the classification category including price, inventory, name, and brand;

carrying out vectorization preprocessing on the collected samples of the service data, and converting the samples into vector format data;

inputting the preprocessed vector format data and the labels of the corresponding categories into a machine model to train an AI model, and obtaining a trained AI model;

receiving service data input by a service end, inputting the service data into the trained AI model, and matching the service data with the classification category;

the preprocessing is performed on each piece of classified service data respectively to obtain a standard data format, and the method comprises the following steps:

defining each piece of service data by using the schema language provided by Avro according to the characteristics of the service data, wherein a data structure of the Avro format comprises a field name, a default value, a document description of the field, an order of the field and an alias of the field;

combining each category of the service data with a data type in the Avro format data;

2. The multi-format data adaptive distribution method according to claim 1, wherein the storing each piece of the obtained standard format data, receiving a request of a client for reading specific data, and reading the data reading code corresponding to the standard format data comprises:

writing a data reading code for each piece of preprocessed standard format data;

3. The multi-format data adaptive distribution method according to claim 2, wherein the receiving a request from a client to read specific data, reading a data reading code corresponding to the standard format data, comprises:

receiving a request of a client for reading specific data, wherein the request contains selected characteristics;

4. The multi-format data adaptive distribution method according to claim 3, wherein the generating data adapted to a client format according to the read data read code comprises:

selecting a corresponding output format according to different clients;

selecting a corresponding library and a corresponding method to read the data reading code according to the corresponding output format;

converting the read standard format data into a corresponding data structure and format using a corresponding library and method;

5. A multi-format data adaptive distribution apparatus, comprising:

the classification module is used for receiving the service data input by the service end and classifying the input service data through AI fuzzy matching;

the preprocessing module is used for respectively preprocessing each piece of classified service data to obtain standard format data;

the data storage module is used for storing each piece of obtained standard format data;

the self-adaptive processing module is used for receiving a request of a client for reading specific data, reading a data reading code corresponding to the standard format data, and generating data adapting to the format of the client according to the read data reading code;

the output module is used for outputting the processed data adapting to the format of the client;

the classification module comprises:

the sample acquisition unit is used for acquiring samples of service data from the third party platform;

a category determining unit, configured to determine a classification category of the service data, where the classification category includes a price, an inventory, a name, and a brand;

the vectorization unit is used for carrying out vectorization preprocessing on the collected samples of the service data and converting the samples into vector format data;

a training unit, configured to input the preprocessed vector format data and the labels of the corresponding categories into a machine model to train an AI model;

the preprocessing module comprises:

a schema definition unit, configured to define each piece of service data by using the schema language provided by Avro according to the characteristics of the service data, where a data structure in the Avro format includes a field name, a default value, a document description of the field, an order of the field, and an alias of the field;

a category definition unit, configured to combine each category of the service data with a data type in the Avro format data;

6. The multi-format data adaptive distribution apparatus according to claim 5, wherein the data storage module comprises:

the code writing unit is used for writing data reading codes for each piece of the preprocessed standard format data;

the code matching unit is used for storing each piece of standard format data and the corresponding data reading code;

the feature extraction unit is used for receiving a request of a client for reading specific data, wherein the request contains selected features;

7. The multi-format data adaptive distribution apparatus according to claim 5, wherein the adaptive processing module comprises:

the output format selection unit is used for selecting a corresponding output format according to different clients;

a standard format data reading unit, configured to select a corresponding library and a method to read the data reading code according to a corresponding output format;

a data conversion unit for converting the read standard format data into a corresponding data structure and format using a corresponding library and method;