CN116719866B - Multi-format data self-adaptive distribution method and system - Google Patents
- Publication number
- CN116719866B (application CN202310518702.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- format
- service
- reading
- client
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F16/258: Data format conversion from or to a database
- G06F16/215: Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G06F18/22: Pattern recognition; Matching criteria, e.g. proximity measures
- G06F18/24: Pattern recognition; Classification techniques
- G06N20/00: Machine learning
- Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
Embodiments of the invention disclose a multi-format data self-adaptive distribution method and system. The method comprises the following steps: receiving service data input by a service end, and classifying the input service data through AI fuzzy matching; preprocessing each piece of classified service data to obtain standard format data; storing each piece of obtained standard format data; receiving a request of a client to read specific data, and reading the data reading code corresponding to the standard format data; generating data adapted to the client format according to the read data reading code; and outputting the processed data adapted to the client format. The system comprises a classification module, a preprocessing module, a data storage module, a self-adaptive processing module and an output module. The invention enables the system to process the large amount of data in a supply chain and to provide accurate, reliable, safe and efficient services.
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and a system for adaptively distributing multi-format data.
Background
In a supply chain, data may come from different data sources, which may differ in data quality. Because the amount of data in a supply chain is very large, processing it can take significant time and resources, so supply data needs to be analyzed systematically.
Since supply data comes from different sources and involves multiple business processes, the problem of data integration must be addressed. Improperly integrated data can become inconsistent, duplicated or incomplete. Moreover, a supply data processing system requires a variety of techniques, including databases, programming languages and data analysis tools. For non-professional users these techniques can be overly complex and difficult to master, which affects the use and maintenance of the system. In addition, supply data often contains sensitive information, such as transaction details and financial information, so the security of the system must be ensured to protect such data from unauthorized access and attack.
Disclosure of Invention
In view of the above, an object of the embodiments of the present invention is to provide a multi-format data adaptive distribution method and system that enable the system to process the large amount of data in a supply chain and to provide accurate, reliable, safe and efficient services.
In a first aspect, an embodiment of the present invention provides a multi-format data adaptive distribution method, the method including:
Receiving service data input by a service end, and classifying the input service data through AI fuzzy matching.
Preprocessing each piece of classified service data to obtain standard format data.
Storing each piece of obtained standard format data, receiving a request of a client to read specific data, and reading the data reading code corresponding to the standard format data.
Generating data adapted to the client format according to the read data reading code.
Outputting the processed data adapted to the client format.
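The five steps above can be sketched as a small Python pipeline. This is only an illustrative sketch: the function names, the CATEGORY_HINTS lookup (a stand-in for the trained AI model) and the in-memory STORE list are hypothetical and do not come from the patent text.

```python
import json

# Hypothetical lookup standing in for the AI fuzzy-matching classifier.
CATEGORY_HINTS = {"price": "price", "stock": "inventory",
                  "title": "name", "brand": "brand"}

STORE = []  # hypothetical in-memory stand-in for the database


def classify(raw):
    """Step 1: map each incoming field to a classification category."""
    return {CATEGORY_HINTS[k]: v for k, v in raw.items() if k in CATEGORY_HINTS}


def standardize(classified):
    """Step 2: coerce the classified values into one standard record layout."""
    return {
        "price": float(classified["price"]),
        "inventory": int(classified["inventory"]),
        "name": str(classified["name"]).strip(),
        "brand": str(classified["brand"]).strip(),
    }


def store(record):
    """Step 3: store the standard format record and return its position."""
    STORE.append(record)
    return len(STORE) - 1


def adapt(record, client_format):
    """Step 4: generate data adapted to the client's format."""
    if client_format == "json":
        return json.dumps(record, sort_keys=True)
    if client_format == "csv":
        keys = sorted(record)
        return ",".join(keys) + "\n" + ",".join(str(record[k]) for k in keys)
    raise ValueError(f"unsupported client format: {client_format}")


def distribute(raw, client_format):
    """Step 5: run the whole pipeline and output the adapted data."""
    record_id = store(standardize(classify(raw)))
    return adapt(STORE[record_id], client_format)
```

Calling distribute with a raw record and "json" returns a JSON string whose keys are the four categories; an unknown client format raises ValueError rather than silently falling back.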
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation manner of the first aspect, where receiving the service data input by the service end and classifying the input service data through AI fuzzy matching includes:
Collecting samples of service data from a third-party platform.
Determining the classification categories of the service data, the classification categories including price, inventory, name and brand.
Performing vectorization preprocessing on the collected samples of service data to convert them into vector format data.
Inputting the preprocessed vector format data and the labels of the corresponding categories into a machine learning model for training, to obtain a trained AI model.
Receiving service data input by the service end, inputting the service data into the trained AI model, and matching the service data with a classification category.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation manner of the first aspect, where preprocessing each piece of classified service data to obtain standard format data includes:
Defining each piece of service data, according to its characteristics, using the schema language provided by Avro, where the Avro-format data structure comprises a field name, a default value, a documentation description of the field, the order of the field and an alias of the field.
Combining the category of each piece of service data with a data type of the Avro format data.
Obtaining the standard format data corresponding to each piece of classified service data.
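As a rough illustration of the schema step above, an Avro record schema is a JSON document whose fields carry name, type, default, doc, order and aliases attributes. The sketch below builds such a schema as a plain Python dict; the record name "Product", the field definitions and the CATEGORY_TO_AVRO_TYPE mapping are assumptions made for the example, not values given in the patent.

```python
import json

# Hypothetical mapping from a classification category to an Avro primitive type.
CATEGORY_TO_AVRO_TYPE = {
    "price": "double",
    "inventory": "int",
    "name": "string",
    "brand": "string",
}


def build_schema(record_name, fields):
    """Build an Avro record schema (as a dict) from categorized field specs."""
    return {
        "type": "record",
        "name": record_name,
        "fields": [
            {
                "name": spec["name"],
                # Combine the category with an Avro data type.
                "type": CATEGORY_TO_AVRO_TYPE[spec["category"]],
                "default": spec["default"],
                "doc": spec["doc"],
                "order": spec.get("order", "ascending"),
                "aliases": spec.get("aliases", []),
            }
            for spec in fields
        ],
    }


# Illustrative field specs for one kind of service data.
PRODUCT_FIELDS = [
    {"name": "price", "category": "price", "default": 0.0,
     "doc": "Unit price", "aliases": ["unit_price"]},
    {"name": "stock", "category": "inventory", "default": 0,
     "doc": "Units in stock"},
    {"name": "title", "category": "name", "default": "",
     "doc": "Product name", "aliases": ["product_name"]},
    {"name": "brand", "category": "brand", "default": "",
     "doc": "Brand name"},
]

schema = build_schema("Product", PRODUCT_FIELDS)
schema_json = json.dumps(schema, indent=2)  # text that an Avro library could parse
```

Keeping the schema as ordinary JSON means it can be handed to any Avro implementation later; this sketch deliberately stops short of serializing records, which would require a third-party Avro package.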
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation manner of the first aspect, where storing each piece of obtained standard format data includes:
Writing a data reading code for each piece of preprocessed standard format data.
Storing each piece of standard format data together with its corresponding data reading code.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation manner of the first aspect, where receiving the request of the client to read specific data and reading the data reading code corresponding to the standard format data includes:
Receiving a request of a client to read specific data, the request including the selected features.
Obtaining the data reading code of the standard format data corresponding to the selected features.
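One way to picture the relationship between stored data, data reading codes and selected features is a registry that keeps a small reader function per field. The sketch below is an assumption about how this could look; the ReadingCodeStore class, make_reader and handle_request names are hypothetical and not taken from the patent.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Reader = Callable[[List[Record]], List[object]]


def make_reader(field_name: str) -> Reader:
    """Return a 'data reading code' that extracts one field from all records."""
    def reader(records: List[Record]) -> List[object]:
        return [record[field_name] for record in records]
    return reader


class ReadingCodeStore:
    """Stores standard format records together with their reading codes."""

    def __init__(self) -> None:
        self._records: List[Record] = []
        self._readers: Dict[str, Reader] = {}

    def add(self, record: Record) -> None:
        """Store a record and register a reader for each of its fields."""
        self._records.append(record)
        for field_name in record:
            self._readers.setdefault(field_name, make_reader(field_name))

    def handle_request(self, selected_features: List[str]) -> Dict[str, List[object]]:
        """Look up the reading code for each selected feature and run it."""
        return {name: self._readers[name](self._records)
                for name in selected_features}
```

A request for a feature that was never stored raises KeyError here; a real system would more likely return an explicit error to the client.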
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation manner of the first aspect, where generating the data adapted to the client format according to the read data reading code includes:
Selecting a corresponding output format for each client.
Selecting, according to the corresponding output format, a corresponding library and method to read the data reading code.
Converting the read standard format data into a corresponding data structure and format using the corresponding library and method.
Processing the converted data structure and format to obtain data adapted to the format of the corresponding client.
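The format selection described above can be sketched as a dispatch table that maps each client to an output format and each format to a converter. The client names in CLIENT_FORMATS and the element names "records" and "record" are hypothetical; only Python standard-library calls are used.

```python
import json
import xml.etree.ElementTree as ET


def to_json(records):
    """Serialize a list of dict records as a JSON array."""
    return json.dumps(records)


def to_xml(records):
    """Serialize a list of dict records as <records><record>...</record></records>."""
    root = ET.Element("records")
    for record in records:
        item = ET.SubElement(root, "record")
        for key, value in record.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical client registry: which output format each client expects.
CLIENT_FORMATS = {"web": "json", "erp": "xml"}

# The library-backed method used for each output format.
CONVERTERS = {"json": to_json, "xml": to_xml}


def adapt_for_client(client, records):
    """Pick the client's output format, then convert with the matching method."""
    output_format = CLIENT_FORMATS[client]
    return CONVERTERS[output_format](records)
```

Adding a new output format then only requires one converter function and one table entry, which is the main reason for choosing a table over a chain of if statements.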
In a second aspect, an embodiment of the present invention further provides a multi-format data adaptive distribution system, the system including:
A classification module, configured to receive service data input by a service end and to classify the input service data through AI fuzzy matching.
A preprocessing module, configured to preprocess each piece of classified service data to obtain standard format data.
A data storage module, configured to store each piece of obtained standard format data.
A self-adaptive processing module, configured to receive a request of a client to read specific data, read the data reading code corresponding to the standard format data, and generate data adapted to the client format according to the read data reading code.
An output module, configured to output the processed data adapted to the client format.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation manner of the second aspect, where the classification module includes:
A sample acquisition unit, configured to collect samples of service data from a third-party platform.
A category determining unit, configured to determine the classification categories of the service data, the classification categories including price, inventory, name and brand.
A vectorization unit, configured to perform vectorization preprocessing on the collected samples of service data and convert them into vector format data.
A training unit, configured to input the preprocessed vector format data and the labels of the corresponding categories into a machine learning model to train an AI model.
A category matching unit, configured to receive service data input by the service end, input the service data into the trained AI model, and match the service data with a classification category.
With reference to the second aspect, an embodiment of the present invention provides a second possible implementation manner of the second aspect, where the preprocessing module includes:
A schema definition unit, configured to define each piece of service data, according to its characteristics, using the schema language provided by Avro, where the Avro-format data structure comprises a field name, a default value, a documentation description of the field, the order of the field and an alias of the field.
A category definition unit, configured to combine the category of each piece of service data with a data type of the Avro format data.
A standard format data output unit, configured to obtain the standard format data corresponding to each piece of classified service data.
With reference to the second aspect, an embodiment of the present invention provides a third possible implementation manner of the second aspect, where the data storage module includes:
A code writing unit, configured to write a data reading code for each piece of preprocessed standard format data.
A code matching unit, configured to store each piece of standard format data together with its corresponding data reading code.
A feature extraction unit, configured to receive a request of the client to read specific data, the request including the selected features.
A code reading unit, configured to obtain the data reading code of the standard format data corresponding to the selected features.
With reference to the second aspect, an embodiment of the present invention provides a fourth possible implementation manner of the second aspect, where the self-adaptive processing module includes:
An output format selection unit, configured to select a corresponding output format for each client.
A standard format data reading unit, configured to select, according to the corresponding output format, a corresponding library and method to read the data reading code.
A data conversion unit, configured to convert the read standard format data into a corresponding data structure and format using the corresponding library and method.
A data processing unit, configured to process the converted data structure and format to obtain data adapted to the format of the corresponding client.
The embodiments of the present invention have the following beneficial effects:
The multi-format data self-adaptive distribution method and system are mainly used for processing information such as supply data, inventory data and sales data. The required data is standardized and converted into standard format data, which resolves inconsistent data forms and differing data standards. The standard format data is then stored in a preset database, which facilitates the extraction and pushing of target data and improves the efficiency of data acquisition.
The invention stores and reads data in the Avro format. Avro format data can be compressed into smaller files, which reduces storage and transmission costs and speeds up data processing. Avro format data is also interoperable across programming languages, which facilitates data transfer between different platforms.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of the multi-format data adaptive distribution method of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein can be arranged and designed in a wide variety of different configurations.
Referring to Fig. 1, a first embodiment of the present invention provides a multi-format data adaptive distribution method, which includes:
Receiving service data input by a service end, and classifying the input service data through AI fuzzy matching.
Preprocessing each piece of classified service data to obtain standard format data.
Storing each piece of obtained standard format data, receiving a request of a client to read specific data, and reading the data reading code corresponding to the standard format data.
Generating data adapted to the client format according to the read data reading code.
Outputting the processed data adapted to the client format.
Specifically, receiving the service data input by the service end and classifying the input service data through AI fuzzy matching includes:
Collecting samples of service data from a third-party platform.
Determining the classification categories of the service data, the classification categories including price, inventory, name and brand.
Performing vectorization preprocessing on the collected samples of service data to convert them into vector format data.
Inputting the preprocessed vector format data and the labels of the corresponding categories into a machine learning model for training, to obtain a trained AI model.
The machine learning model may be a support vector machine (SVM), a decision tree, naive Bayes, or the like.
Receiving service data input by the service end, inputting the service data into the trained AI model, and matching the service data with a classification category.
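The training and matching steps above can be illustrated with a very small naive Bayes classifier, one of the model families the text names. This sketch assumes that vectorization means counting character bigrams and that the samples are field names; both are assumptions, as are the SAMPLES and LABELS values. It uses only the Python standard library rather than a machine learning package.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Convert a string into a bag of character bigrams (the vector format)."""
    text = text.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(samples, labels):
            self.feature_counts[label].update(vectorize(text))
        self.vocabulary = {f for counts in self.feature_counts.values()
                           for f in counts}
        return self

    def predict(self, text):
        features = vectorize(text)
        total_samples = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_samples in self.class_counts.items():
            counts = self.feature_counts[label]
            denominator = sum(counts.values()) + vocab_size
            # Log prior plus smoothed log likelihood of each observed bigram.
            score = math.log(n_samples / total_samples)
            for feature, k in features.items():
                score += k * math.log((counts[feature] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled samples, as if collected from a third-party platform.
SAMPLES = ["price", "unit_price", "stock", "inventory_qty",
           "title", "product_name", "brand", "brand_name"]
LABELS = ["price", "price", "inventory", "inventory",
          "name", "name", "brand", "brand"]

model = NaiveBayes().fit(SAMPLES, LABELS)
```

Because the features are character bigrams, a misspelled field name such as "prise" still shares bigrams with the price samples, which is what makes the matching tolerant of small variations.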
Specifically, preprocessing each piece of classified service data to obtain standard format data includes:
Defining each piece of service data, according to its characteristics, using the schema language provided by Avro, where the Avro-format data structure comprises a field name, a default value, a documentation description of the field, the order of the field and an alias of the field.
Combining the category of each piece of service data with a data type of the Avro format data.
Obtaining the standard format data corresponding to each piece of classified service data.
Specifically, storing each piece of obtained standard format data includes:
Writing a data reading code for each piece of preprocessed standard format data.
The code is written in a programming language such as Java or Python.
Storing each piece of standard format data together with its corresponding data reading code.
Specifically, receiving the request of the client to read specific data and reading the data reading code corresponding to the standard format data includes:
Receiving a request of a client to read specific data, the request including the selected features.
Obtaining the data reading code of the standard format data corresponding to the selected features.
Specifically, the following example uses Python and the pandas library to store each piece of obtained standard format data, receive a request of a client to read specific data, and read the data reading code corresponding to the standard format data.
(1) Writing a data reading code for each piece of preprocessed standard format data, which includes:
Writing the standard format data into a CSV file named data.csv using Python's csv module, where the standard format data has the form data = ['parameter name', 'default value', 'description', 'sequence', 'nickname', 'field name'], and each piece of standard format data comprises a field name, a default value, a documentation description of the field, the field order and a field alias;
Using the pandas library in Python to extract the specified feature columns from each piece of standard format data according to its field names and field order, thereby writing the data reading code of each piece of standard format data.
(2) Forming a data set X from each piece of standard format data and its corresponding data reading code, and storing the data set X in a database.
(3) Receiving a request of a client to read specific data and obtaining the corresponding data reading code, which includes:
Receiving the request of the client to read specific data, the request including the selected features, through the instruction data = pd.read_csv('processed_data.csv'). Here the CSV file processed_data.csv carries the request to read specific data, and the file is read with the read_csv function of the pandas library.
After the request is received, obtaining the data reading code of the standard format data corresponding to the selected features through the instruction selected_features = ['price', 'stock', 'title', 'brand'], where 'price' denotes the price, 'stock' the inventory, 'title' the name and 'brand' the brand. The data reading code of the standard format data containing these features is read through selected_features.
Outputting a data set of the data reading codes containing the selected features through the instruction X = data[selected_features], where data is the original data set and selected_features is the list of names of the selected features. This instruction generates the data set from the data reading codes of the standard format data corresponding to the selected features.
Exporting the data set containing the data reading codes of the selected features to a CSV file through the instruction X.to_csv("selected_features.csv", index=False), where X is the data set to be saved as a CSV file; "selected_features.csv" is the name of the saved CSV file, which can be customized as needed; and index=False means that the index of data set X is not saved to the CSV file. If this parameter is not specified, the index is saved to the CSV file by default.
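The read, select and export instructions above rely on pandas. For readers without pandas installed, the same column selection can be approximated with Python's built-in csv module, as in the sketch below. The select_features and demo function names and the sample row (including the extra "sku" column) are hypothetical; the file names follow the text.

```python
import csv
import os
import tempfile

SELECTED_FEATURES = ["price", "stock", "title", "brand"]


def select_features(src_path, dst_path, selected):
    """Copy only the selected columns from src_path to dst_path."""
    with open(src_path, newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        # extrasaction="ignore" drops any column not listed in `selected`.
        writer = csv.DictWriter(dst, fieldnames=selected, extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            writer.writerow(row)


def demo():
    """Write a small processed_data.csv, select features, return the result."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "processed_data.csv")
        dst = os.path.join(tmp, "selected_features.csv")
        with open(src, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["sku", "price", "stock", "title", "brand"])
            writer.writerow(["A1", "9.5", "3", "Widget", "Acme"])
        select_features(src, dst, SELECTED_FEATURES)
        with open(dst, newline="") as f:
            return list(csv.reader(f))
```

Unlike the pandas indexing in the text, which raises an error when a selected column is missing, this version writes an empty value for a missing column, and no index column is ever written, so there is no counterpart to index=False.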
Setting data reading codes for the standard format data has several benefits. Data reading becomes more automated, so large amounts of data can be processed without manual effort, saving time and improving processing efficiency. The read data is kept consistent with the format and content of the original data, which reduces errors and deviations during processing and avoids non-uniform or erroneous data formats. The data format and structure become clearer, which simplifies sharing and exchanging data between different systems, software and platforms and lowers the cost of doing so. Finally, the data can be read into a specific data structure, such as a pandas DataFrame, which makes analysis, visualization and interpretation more convenient.
Specifically, generating the data adapted to the client format according to the read data reading code includes:
Selecting a corresponding output format for each client.
The output formats include JSON, XML and the like.
Selecting, according to the corresponding output format, a corresponding library and method to read the data reading code.
In Java, XML and JSON data can be read using libraries such as Jackson, Gson and dom4j; in Python, ElementTree may be used.
Converting the read standard format data into a corresponding data structure and format using the corresponding library and method.
In Java, JSON data can be converted into Java objects using libraries such as Jackson and Gson, and XML data can be converted into Java objects using libraries such as XMLBeans and JAXB. In Python, the pandas library can convert CSV data into DataFrame objects, and the xml.etree.ElementTree library can convert XML data into Python objects.
Processing the converted data structure and format to obtain data adapted to the format of the corresponding client.
In Java, Java objects can be converted into JSON or XML using libraries such as Jackson and Gson, and data can be written into Excel files using libraries such as Apache POI. In Python, the pandas library can convert a DataFrame object into CSV format, and the xml.etree.ElementTree library can convert a Python object into XML format.
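For the Python side of the conversions described above, the reading direction can be sketched with the standard library alone: xml.etree.ElementTree parses XML into Python objects, the csv module parses CSV (standing in here for pandas), and json writes the client-facing output. The element name "record" and the helper function names are assumptions made for this example.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def xml_to_records(xml_text):
    """Parse <records><record>...</record></records> into a list of dicts."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in item}
            for item in root.findall("record")]


def csv_to_records(csv_text):
    """Parse CSV text with a header row into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]


def records_to_json(records):
    """Convert the intermediate Python objects into JSON for a client."""
    return json.dumps(records)
```

Both parsers return every value as a string; any type conversion (for example, price to a number) would have to be applied as a separate processing step before output.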
A second embodiment of the present invention provides a multi-format data adaptive distribution system, including:
A classification module, configured to receive service data input by a service end and to classify the input service data through AI fuzzy matching.
A preprocessing module, configured to preprocess each piece of classified service data to obtain standard format data.
A data storage module, configured to store each piece of obtained standard format data.
A self-adaptive processing module, configured to receive a request of a client to read specific data, read the data reading code corresponding to the standard format data, and generate data adapted to the client format according to the read data reading code.
An output module, configured to output the processed data adapted to the client format.
Specifically, the classification module includes:
A sample acquisition unit, configured to collect samples of service data from a third-party platform.
A category determining unit, configured to determine the classification categories of the service data, the classification categories including price, inventory, name and brand.
A vectorization unit, configured to perform vectorization preprocessing on the collected samples of service data and convert them into vector format data.
A training unit, configured to input the preprocessed vector format data and the labels of the corresponding categories into a machine learning model to train an AI model.
The machine learning model may be a support vector machine (SVM), a decision tree, naive Bayes, or the like.
A category matching unit, configured to receive service data input by the service end, input the service data into the trained AI model, and match the service data with a classification category.
Specifically, the preprocessing module includes:
A schema definition unit, configured to define each piece of service data, according to its characteristics, using the schema language provided by Avro, where the Avro-format data structure comprises a field name, a default value, a documentation description of the field, the order of the field and an alias of the field.
A category definition unit, configured to combine the category of each piece of service data with a data type of the Avro format data.
A standard format data output unit, configured to obtain the standard format data corresponding to each piece of classified service data.
Specifically, the data storage module includes:
A code writing unit, configured to write a data reading code for each piece of preprocessed standard format data.
The code is written in a programming language such as Java or Python.
A code matching unit, configured to store each piece of standard format data together with its corresponding data reading code.
A feature extraction unit, configured to receive a request of the client to read specific data, the request including the selected features.
A code reading unit, configured to obtain the data reading code of the standard format data corresponding to the selected features.
Specifically, the adaptive processing module includes:
An output format selection unit, configured to select the output format appropriate to each client.
The output formats include JSON, XML, and the like.
A standard format data reading unit, configured to select, according to the chosen output format, a suitable library and method with which to run the data reading code.
In Java, XML and JSON data can be read using libraries such as Jackson, Gson, and dom4j; in Python, ElementTree can be used.
A data conversion unit, configured to convert the standard format data that has been read into the corresponding data structure and format using the selected library and method.
In Java, JSON data can be converted into Java objects using libraries such as Jackson and Gson, and XML data can be converted into Java objects using libraries such as XMLBeans and JAXB; in Python, the Pandas library can convert CSV data into DataFrame objects, and the xml.etree.ElementTree library can convert XML data into Python objects.
A data processing unit, configured to process the converted data structure and format to obtain data adapted to the format of the corresponding client.
In Java, Java objects can be converted into JSON or XML using libraries such as Jackson and Gson, and data can be written to Excel files using libraries such as Apache POI; in Python, the Pandas library can convert a DataFrame object into CSV, and the xml.etree.ElementTree library can convert Python objects into XML.
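The selection and conversion steps can be sketched in Python with only the standard library. This is an illustrative assumption rather than the method as claimed: the client names in `CLIENT_FORMATS`, the function names, and the flat XML layout are hypothetical, and the standard `csv` module is used in place of Pandas to keep the example dependency-free. Record keys are assumed to be valid XML element names.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def to_json(records):
    """Serialize a list of flat dict records as a JSON array."""
    return json.dumps(records)


def to_xml(records):
    """Serialize records as <records><record>...</record></records>."""
    root = ET.Element("records")
    for record in records:
        item = ET.SubElement(root, "record")
        for key, value in record.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def to_csv(records):
    """Serialize records as CSV with a header row taken from the first record."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Output format -> conversion routine.
CONVERTERS = {"json": to_json, "xml": to_xml, "csv": to_csv}

# Hypothetical mapping from client type to the format it expects.
CLIENT_FORMATS = {"web": "json", "legacy-erp": "xml", "spreadsheet": "csv"}


def adapt(records, client):
    """Pick the client's output format, then convert the records to it."""
    output_format = CLIENT_FORMATS[client]
    return CONVERTERS[output_format](records)
```

For example, `adapt([{"name": "pen", "price": 1.5}], "spreadsheet")` yields a two-line CSV string, while passing `"web"` yields the same record as JSON. Keeping the client-to-format mapping separate from the converters means a new client can be supported by adding one dictionary entry.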
The embodiments of the invention seek to protect a multi-format data adaptive distribution method and system, which have the following effects:
1. The invention converts the required data into standard format data through standardized processing, which addresses the problems of non-uniform data forms and differing data standards. The standard format data is then stored in a preset database, which facilitates the extraction and pushing of target data and improves the efficiency of data acquisition.
2. The invention stores and reads data in the Avro format. Avro data can be compressed into smaller files, which reduces storage and transmission costs and speeds up data processing. Avro data is also interoperable across programming languages, which facilitates data transfer between different platforms.
3. By forming standard format data, the invention stores service data such as goods data, stock data, and sales data in structured form. This makes operations such as fast querying, filtering, sorting, and aggregation convenient, and the data can also be presented as tables and visualized, so that it can be processed and analyzed efficiently. Through feature extraction, specific features are normalized, standardized, encoded, and so on, which addresses the problems of non-uniform data forms and differing data standards and improves the efficiency of data acquisition and storage.
The computer program product of the multi-format data adaptive distribution method and apparatus provided in the embodiments of the present invention includes a computer-readable storage medium storing program code. The instructions included in the program code may be used to execute the method described in the foregoing method embodiments; for the specific implementation, refer to the method embodiments, which are not repeated here.
In particular, the storage medium can be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the multi-format data adaptive distribution method described above can be executed, so that the system can process a large amount of supply chain data and provide an accurate, reliable, secure, and efficient service.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art or in part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above examples are only specific embodiments of the present invention. They are intended to illustrate its technical solutions, not to limit them, and the scope of protection of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing examples, those skilled in the art should understand that anyone familiar with the art may, within the technical scope disclosed herein, modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions of some of their technical features. Such modifications, changes, or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention and are intended to be included in the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.
Claims (7)
1. A multi-format data adaptive distribution method, comprising:
receiving service data input by a service end, and carrying out data classification on the input service data through AI fuzzy matching;
preprocessing each piece of classified service data respectively to obtain standard format data;
storing each piece of obtained standard format data;
receiving a request of a client for reading specific data, and reading a data reading code corresponding to the standard format data;
generating data adapting to a client format according to the read data reading code;
outputting the processed data adapting to the format of the client;
the receiving of service data input by the service end and the data classification of the input service data through AI fuzzy matching comprise:
collecting a sample of the business data from a third party platform;
determining a classification category of the business data, the classification category including price, inventory, name, and brand;
carrying out vectorization preprocessing on the collected samples of the service data, and converting the samples into vector format data;
inputting the preprocessed vector format data and the labels of the corresponding categories into a machine model to train an AI model, and obtaining a trained AI model;
receiving service data input by a service end, inputting the service data into the trained AI model, and matching the service data with the classification category;
the preprocessing of each piece of classified service data to obtain standard format data comprises:
defining each piece of service data by using the schema language provided by Avro according to the characteristics of the service data, wherein a data structure of the Avro format comprises a field name, a default value, a document description of the field, an order of the field, and an alias of the field;
combining each category of the service data with a data type in the Avro format data;
and obtaining standard format data corresponding to each piece of classified service data.
2. The multi-format data adaptive distribution method according to claim 1, wherein the storing of each piece of obtained standard format data, the receiving of a request from a client to read specific data, and the reading of the data reading code corresponding to the standard format data comprise:
writing a data reading code for each piece of preprocessed standard format data;
and storing each piece of data in the standard format and the corresponding data reading code.
3. The multi-format data adaptive distribution method according to claim 2, wherein the receiving a request from a client to read specific data, reading a data reading code corresponding to the standard format data, comprises:
receiving a request of a client for reading specific data, wherein the request contains selected characteristics;
and obtaining a data reading code of the standard format data corresponding to the selected feature.
4. The multi-format data adaptive distribution method according to claim 3, wherein the generating data adapted to a client format according to the read data read code comprises:
selecting a corresponding output format according to different clients;
selecting a corresponding library and a corresponding method to read the data reading code according to the corresponding output format;
converting the read standard format data into a corresponding data structure and format using a corresponding library and method;
and processing the converted data structure and format to obtain data adapting to the format of the corresponding client.
5. A multi-format data adaptive distribution apparatus, comprising:
the classification module is used for receiving the service data input by the service end and classifying the input service data through AI fuzzy matching;
the preprocessing module is used for respectively preprocessing each piece of classified service data to obtain standard format data;
the data storage module is used for storing each piece of obtained standard format data;
the self-adaptive processing module is used for receiving a request of a client for reading specific data, reading a data reading code corresponding to the standard format data, and generating data adapting to the format of the client according to the read data reading code;
the output module is used for outputting the processed data adapting to the format of the client;
the classification module comprises:
the sample acquisition unit is used for acquiring samples of service data from the third party platform;
a category determining unit, configured to determine a classification category of the service data, where the classification category includes a price, an inventory, a name, and a brand;
the vectorization unit is used for carrying out vectorization preprocessing on the collected samples of the service data and converting the samples into vector format data;
a training unit, configured to input the preprocessed vector format data and the labels of the corresponding categories into a machine model to train an AI model;
the preprocessing module comprises:
a schema definition unit, configured to define each piece of service data by using the schema language provided by Avro according to the characteristics of the service data, where a data structure in the Avro format includes a field name, a default value, a document description of the field, an order of the field, and an alias of the field;
a category definition unit, configured to combine each category of the service data with a data type in the Avro format data;
and the standard format data output unit is used for obtaining standard format data corresponding to each piece of classified service data.
6. The multi-format data adaptive distribution apparatus according to claim 5, wherein the data storage module comprises:
the code writing unit is used for writing data reading codes for each piece of the preprocessed standard format data;
the code matching unit is used for storing each piece of standard format data and the corresponding data reading code;
the feature extraction unit is used for receiving a request of a client for reading specific data, wherein the request contains selected features;
and the code reading unit is used for obtaining a data reading code of the standard format data corresponding to the selected feature.
7. The multi-format data adaptive distribution apparatus according to claim 5, wherein the adaptive processing module comprises:
the output format selection unit is used for selecting a corresponding output format according to different clients;
a standard format data reading unit, configured to select a corresponding library and a method to read the data reading code according to a corresponding output format;
a data conversion unit for converting the read standard format data into a corresponding data structure and format using a corresponding library and method;
and the data processing unit is used for processing the converted data structure and format to obtain data adapting to the format of the corresponding client.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310518702.8A CN116719866B (en) | 2023-05-09 | 2023-05-09 | Multi-format data self-adaptive distribution method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116719866A (en) | 2023-09-08 |
CN116719866B (en) | 2024-02-13 |
Family
ID=87874180
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310518702.8A Active CN116719866B (en) | 2023-05-09 | 2023-05-09 | Multi-format data self-adaptive distribution method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116719866B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101546399A (en) * | 2008-03-28 | 2009-09-30 | 精工爱普生株式会社 | Voucher data management system and method for controlling voucher data management system |
CN102968306A (en) * | 2012-11-29 | 2013-03-13 | 广东全通教育股份有限公司 | Method and system for automatically generating code based on data model drive |
CN103491135A (en) * | 2013-09-02 | 2014-01-01 | 用友软件股份有限公司 | Device and method for conducting self-matching on data formats |
CN107957889A (en) * | 2016-10-17 | 2018-04-24 | 阿里巴巴集团控股有限公司 | Processing method, device, client, server and the system of product configuration data |
CN109358845A (en) * | 2017-12-27 | 2019-02-19 | 广州Tcl智能家居科技有限公司 | Method, tool and the storage medium of JS code are write based on XMPP protocol |
CN110597816A (en) * | 2019-09-17 | 2019-12-20 | 深圳追一科技有限公司 | Data processing method, data processing device, computer equipment and computer readable storage medium |
CN112214453A (en) * | 2020-09-14 | 2021-01-12 | 上海微亿智造科技有限公司 | Large-scale industrial data compression storage method, system and medium |
CN112235311A (en) * | 2020-10-20 | 2021-01-15 | 网络通信与安全紫金山实验室 | OVSDB client code automatic generation method, system, device and medium |
CN113010503A (en) * | 2021-03-01 | 2021-06-22 | 广州智筑信息技术有限公司 | Engineering cost data intelligent analysis method and system based on deep learning |
CN113626512A (en) * | 2021-08-17 | 2021-11-09 | 未鲲(上海)科技服务有限公司 | Data processing method, device, equipment and readable storage medium |
CA3142409A1 (en) * | 2020-12-19 | 2022-06-19 | The Toronto-Dominion Bank | Real-time prediction of parameter modifications based on structured messaging data |
CN114792145A (en) * | 2022-05-27 | 2022-07-26 | 中国标准化研究院 | Standard digital management maintenance system and method based on knowledge graph |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102551601B1 (en) * | 2018-02-05 | 2023-07-06 | 한국전자통신연구원 | Storage server and adaptable prefetching method performed by the storage server in distributed file system |
Also Published As
Publication number | Publication date |
---|---|
CN116719866A (en) | 2023-09-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||