CN116719866B - Multi-format data self-adaptive distribution method and system - Google Patents

Multi-format data self-adaptive distribution method and system

Info

Publication number
CN116719866B
Authority
CN
China
Prior art keywords
data
format
service
reading
client
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310518702.8A
Other languages
Chinese (zh)
Other versions
CN116719866A (en)
Inventor
徐天南
吴凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hainan Yinmancang Digital Technology Co ltd
Original Assignee
Hainan Yinmancang Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hainan Yinmancang Digital Technology Co ltd
Priority to CN202310518702.8A
Publication of CN116719866A
Application granted
Publication of CN116719866B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Quality & Reliability (AREA)
  • Document Processing Apparatus (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention disclose a multi-format data self-adaptive distribution method and system. The method comprises: receiving service data input by a service end, and classifying the input service data through AI fuzzy matching; preprocessing each piece of classified service data to obtain standard format data; storing each piece of obtained standard format data; receiving a request from a client to read specific data, and reading the data reading code corresponding to the standard format data; generating data adapted to the client format according to the data reading code that was read; and outputting the processed data adapted to the client format. The system comprises a classification module, a preprocessing module, a data storage module, a self-adaptive processing module and an output module. The invention enables the system to process a large amount of supply chain data and to provide accurate, reliable, safe and efficient services.

Description

Multi-format data self-adaptive distribution method and system
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and a system for adaptively distributing multi-format data.
Background
In a supply chain, data may come from different sources that differ in data quality. Because the amount of data in a supply chain is very large, processing it can take significant time and resources, so supply data needs to be analyzed systematically.
Since supply data comes from different sources and involves multiple business processes, the problem of data integration must be addressed. Improperly integrated data can end up inconsistent, duplicated or incomplete. Moreover, a supply data processing system relies on a variety of technologies, including databases, programming languages and data analysis tools. For non-professional users, these technologies can be overly complex and difficult to master, which affects the use and maintenance of the system. In addition, supply data often contains sensitive information, such as transaction details and financial information, so the security of the system must be ensured to protect such data from unauthorized access and attack.
Disclosure of Invention
In view of the above, an object of the embodiments of the present invention is to provide a multi-format data adaptive distribution method and system, which can ensure that the system can process a large amount of data in a supply chain and provide accurate, reliable, safe and efficient services.
In a first aspect, an embodiment of the present invention provides a multi-format data adaptive distribution method, the method comprising:
receiving service data input by a service end, and classifying the input service data through AI fuzzy matching;
preprocessing each piece of classified service data to obtain standard format data;
storing each piece of obtained standard format data, receiving a request from a client to read specific data, and reading the data reading code corresponding to the standard format data;
generating data adapted to the client format according to the data reading code that was read; and
outputting the processed data adapted to the client format.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, in which receiving the service data input by the service end and classifying the input service data through AI fuzzy matching comprises:
collecting samples of service data from a third-party platform;
determining the classification categories of the service data, the categories including price, inventory, name and brand;
vectorizing the collected samples of service data to convert them into vector format data;
inputting the preprocessed vector format data and the labels of the corresponding categories into a machine learning model for training, to obtain a trained AI model; and
receiving service data input by the service end, inputting the service data into the trained AI model, and matching the service data with a classification category.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, in which preprocessing each piece of classified service data to obtain standard format data comprises:
defining each piece of service data, according to its characteristics, using the schema language provided by Avro, where the Avro data structure comprises a field name, a default value, a document description of the field, an order of the field and an alias of the field;
combining the category of each piece of service data with the corresponding data type in the Avro format data; and
obtaining the standard format data corresponding to each piece of classified service data.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, in which storing each piece of obtained standard format data, receiving a request from a client to read specific data, and reading the data reading code corresponding to the standard format data comprises:
writing a data reading code for each piece of preprocessed standard format data; and
storing each piece of standard format data together with its corresponding data reading code.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, in which receiving a request from a client to read specific data and reading the data reading code corresponding to the standard format data comprises:
receiving a request from a client to read specific data, the request including a selected feature; and
obtaining the data reading code of the standard format data corresponding to the selected feature.
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, in which generating data adapted to the client format according to the data reading code that was read comprises:
selecting a corresponding output format for each client;
selecting, according to the output format, a corresponding library and method to read the data reading code;
converting the standard format data that was read into the corresponding data structure and format using that library and method; and
processing the converted data structure and format to obtain data adapted to the format of the corresponding client.
In a second aspect, an embodiment of the present invention further provides a multi-format data adaptive distribution system, the system comprising:
a classification module, configured to receive service data input by a service end and to classify the input service data through AI fuzzy matching;
a preprocessing module, configured to preprocess each piece of classified service data to obtain standard format data;
a data storage module, configured to store each piece of obtained standard format data;
a self-adaptive processing module, configured to receive a request from a client to read specific data, read the data reading code corresponding to the standard format data, and generate data adapted to the client format according to the data reading code that was read; and
an output module, configured to output the processed data adapted to the client format.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation of the second aspect, in which the classification module comprises:
a sample acquisition unit, configured to collect samples of service data from a third-party platform;
a category determining unit, configured to determine the classification categories of the service data, the categories including price, inventory, name and brand;
a vectorization unit, configured to vectorize the collected samples of service data and convert them into vector format data;
a training unit, configured to input the preprocessed vector format data and the labels of the corresponding categories into a machine learning model to train an AI model; and
a category matching unit, configured to receive service data input by the service end, input the service data into the trained AI model, and match the service data with a classification category.
With reference to the second aspect, an embodiment of the present invention provides a second possible implementation of the second aspect, in which the preprocessing module comprises:
a schema definition unit, configured to define each piece of service data, according to its characteristics, using the schema language provided by Avro, where the Avro data structure comprises a field name, a default value, a document description of the field, an order of the field and an alias of the field;
a category definition unit, configured to combine the category of each piece of service data with the corresponding data type in the Avro format data; and
a standard format data output unit, configured to obtain the standard format data corresponding to each piece of classified service data.
With reference to the second aspect, an embodiment of the present invention provides a third possible implementation of the second aspect, in which the data storage module comprises:
a code writing unit, configured to write a data reading code for each piece of preprocessed standard format data;
a code matching unit, configured to store each piece of standard format data together with its corresponding data reading code;
a feature extraction unit, configured to receive a request from a client to read specific data, the request including a selected feature; and
a code reading unit, configured to obtain the data reading code of the standard format data corresponding to the selected feature.
With reference to the second aspect, an embodiment of the present invention provides a fourth possible implementation of the second aspect, in which the adaptive processing module comprises:
an output format selection unit, configured to select a corresponding output format for each client;
a standard format data reading unit, configured to select, according to the output format, a corresponding library and method to read the data reading code;
a data conversion unit, configured to convert the standard format data that was read into the corresponding data structure and format using that library and method; and
a data processing unit, configured to process the converted data structure and format to obtain data adapted to the format of the corresponding client.
The embodiments of the invention have the following beneficial effects:
The multi-format data self-adaptive distribution method and system are mainly used for processing information such as supply data, inventory data and sales data. The required data is standardized and converted into standard format data, which resolves non-uniform data forms and differing data standards; the standard format data is then stored in a preset database, which facilitates extracting and pushing target data and improves the efficiency of data acquisition.
The invention stores and reads data in the Avro format. Avro data can be compressed into smaller files, which reduces storage and transmission costs and speeds up data processing; Avro data can also be exchanged between different programming languages, which facilitates data transfer between platforms.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of the multi-format data adaptive distribution method of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein can be arranged and designed in a wide variety of different configurations.
Referring to Fig. 1, a first embodiment of the present invention provides a multi-format data adaptive distribution method, which comprises:
receiving service data input by a service end, and classifying the input service data through AI fuzzy matching;
preprocessing each piece of classified service data to obtain standard format data;
storing each piece of obtained standard format data, receiving a request from a client to read specific data, and reading the data reading code corresponding to the standard format data;
generating data adapted to the client format according to the data reading code that was read; and
outputting the processed data adapted to the client format.
Specifically, receiving the service data input by the service end and classifying the input service data through AI fuzzy matching comprises:
collecting samples of service data from a third-party platform;
determining the classification categories of the service data, the categories including price, inventory, name and brand;
vectorizing the collected samples of service data to convert them into vector format data;
inputting the preprocessed vector format data and the labels of the corresponding categories into a machine learning model for training, to obtain a trained AI model;
the machine learning model may be a support vector machine (SVM), a decision tree, naive Bayes, or the like; and
receiving service data input by the service end, inputting the service data into the trained AI model, and matching the service data with a classification category.
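The classification steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the items being classified are short field labels such as "unit price", vectorizes them as character-bigram counts, and trains a small multinomial naive Bayes model (one of the model families named above) in pure Python. The training samples, the function names and the choice of bigram features are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def bigrams(text):
    """Vectorize a string as a bag of character bigrams (tolerant of typos)."""
    s = f" {text.lower().strip()} "
    return Counter(s[i:i + 2] for i in range(len(s) - 1))

def train(samples):
    """Train multinomial naive Bayes from (text, category) pairs."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        vec = bigrams(text)
        class_counts[label] += 1
        feature_counts[label].update(vec)
        vocab.update(vec)
    return class_counts, feature_counts, vocab

def classify(model, text):
    """Return the category with the highest log-posterior for the text."""
    class_counts, feature_counts, vocab = model
    total = sum(class_counts.values())
    vec = bigrams(text)
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        # Laplace smoothing so unseen bigrams do not zero out a class.
        denom = sum(feature_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for gram, count in vec.items():
            score += count * math.log((feature_counts[label][gram] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labelled samples for the four categories named in the text.
SAMPLES = [
    ("price", "price"), ("unit price", "price"), ("sale price", "price"),
    ("inventory", "inventory"), ("stock", "inventory"), ("stock quantity", "inventory"),
    ("name", "name"), ("product name", "name"), ("title", "name"),
    ("brand", "brand"), ("brand name", "brand"), ("manufacturer brand", "brand"),
]
model = train(SAMPLES)
```

Because the features are character bigrams rather than whole words, a misspelled or abbreviated label such as "brnad" or "stock qty" still shares most of its features with the right category, which is one simple way to obtain fuzzy rather than exact matching.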
Specifically, preprocessing each piece of classified service data to obtain standard format data comprises:
defining each piece of service data, according to its characteristics, using the schema language provided by Avro, where the Avro data structure comprises a field name, a default value, a document description of the field, an order of the field and an alias of the field;
combining the category of each piece of service data with the corresponding data type in the Avro format data; and
obtaining the standard format data corresponding to each piece of classified service data.
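As an illustration of the preprocessing step, the sketch below writes an Avro-style record schema as a Python dictionary and uses it to normalize a raw record. The per-field attributes (name, type, default, doc, order, aliases) are the ones the Avro specification defines for record fields; the record name, the namespace, the concrete fields and the `to_standard` helper are assumptions made for this example, and no Avro library is used.

```python
import json

# Hypothetical schema for one supply item; only the attribute names
# (name, type, default, doc, order, aliases) come from the Avro specification.
PRODUCT_SCHEMA = {
    "type": "record",
    "name": "Product",
    "namespace": "example.supply",  # assumed namespace
    "fields": [
        {"name": "title", "type": "string", "default": "",
         "doc": "Product name", "order": "ascending", "aliases": ["name"]},
        {"name": "brand", "type": "string", "default": "",
         "doc": "Brand", "order": "ignore", "aliases": ["maker"]},
        {"name": "price", "type": "double", "default": 0.0,
         "doc": "Unit price", "order": "ignore", "aliases": ["unit_price"]},
        {"name": "stock", "type": "long", "default": 0,
         "doc": "Units in inventory", "order": "ignore", "aliases": ["inventory"]},
    ],
}

# Category -> Avro primitive type: one reading of "combining the category
# of each piece of service data with the data type".
CATEGORY_TYPES = {f["name"]: f["type"] for f in PRODUCT_SCHEMA["fields"]}

# Python constructors matching the Avro primitive types used above.
CASTS = {"string": str, "double": float, "long": int}

def to_standard(record):
    """Fill in schema defaults and coerce each value to its declared type."""
    out = {}
    for field in PRODUCT_SCHEMA["fields"]:
        value = record.get(field["name"], field["default"])
        out[field["name"]] = CASTS[field["type"]](value)
    return out

# JSON text of the schema, in the form normally kept in an .avsc file.
schema_json = json.dumps(PRODUCT_SCHEMA, indent=2)
```

For example, `to_standard({"title": "Tea", "price": "12.5"})` fills the missing brand and stock from the schema defaults and converts the price string to a float.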
Specifically, storing each piece of obtained standard format data, receiving a request from a client to read specific data, and reading the data reading code corresponding to the standard format data comprises:
writing a data reading code for each piece of preprocessed standard format data;
the code may be written in a programming language such as Java or Python; and
storing each piece of standard format data together with its corresponding data reading code.
Specifically, receiving the request from the client to read specific data and reading the data reading code corresponding to the standard format data comprises:
receiving a request from a client to read specific data, the request including a selected feature; and
obtaining the data reading code of the standard format data corresponding to the selected feature.
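One way to picture "storing each piece of standard format data together with its data reading code" is a registry that maps each feature to a small reader function. The sketch below is only an interpretation: the text does not say how reading codes are represented, so the `READERS` registry, the `make_reader` factory, the `handle_request` function and the sample records are all hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Reader = Callable[[List[Record]], list]

def make_reader(field: str) -> Reader:
    """Build the 'data reading code' for one field: a function that extracts it."""
    def read(records: List[Record]) -> list:
        return [record[field] for record in records]
    return read

# Stored side by side: the standard format records and one reader per field.
STORE: List[Record] = [
    {"title": "Green tea", "brand": "Acme", "price": 12.5, "stock": 40},
    {"title": "Black tea", "brand": "Acme", "price": 9.0, "stock": 0},
]
READERS: Dict[str, Reader] = {
    name: make_reader(name) for name in ("title", "brand", "price", "stock")
}

def handle_request(selected_features: List[str]) -> Dict[str, list]:
    """Look up the reader for each selected feature and run it on the store."""
    unknown = [f for f in selected_features if f not in READERS]
    if unknown:
        raise KeyError(f"no data reading code for: {unknown}")
    return {f: READERS[f](STORE) for f in selected_features}
```

A request for `["price", "stock"]` then returns only those two columns, and a request for a feature without a registered reader fails explicitly instead of returning partial data.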
Specifically, the following Python example shows how each piece of obtained standard format data is stored and handled with the pandas library, how a request from a client to read specific data is received, and how the data reading code corresponding to the standard format data is read.
(1) Writing a data reading code for each piece of preprocessed standard format data, which comprises:
writing the standard format data into a CSV file named data.csv by using the csv module of Python, the standard format data having the form data = ['parameter name', 'default value', 'description', 'sequence', 'nickname', 'field name'], where each piece of standard format data comprises a field name, a default value, a document description of the field, a field order and a field alias;
using the pandas library in Python to extract, for each piece of standard format data, the specified feature columns according to the field names and field order in the data, and thereby writing the data reading code of each piece of standard format data.
(2) Forming a data set X from each piece of standard format data and its corresponding data reading code, and storing the data set X in a database.
(3) Receiving a request from a client to read specific data and obtaining the corresponding data reading code, which comprises:
A request from a client to read specific data is received by the instruction data = pd.read_csv('processed_data.csv'); the request includes the selected features. Here the CSV file processed_data.csv represents the request to read specific data, and the read_csv function of the pandas library reads that file.
After the request is received, the data reading code of the standard format data corresponding to the selected features is obtained by the instruction selected_features = ['price', 'stock', 'title', 'brand'], where 'price' denotes the price, 'stock' the inventory, 'title' the name and 'brand' the brand. The selected_features list determines which data reading codes of the standard format data are read.
The instruction X = data[selected_features] outputs a data set of the data reading codes containing the selected features. Here data is the original data set and selected_features is a list of the names of the selected features; the instruction builds the data set from the data reading codes of the standard format data corresponding to the selected features.
The instruction X.to_csv("selected_features.csv", index=False) exports the data set containing the data reading codes of the selected features to a CSV file. Here X is the data set to be saved as a CSV file; "selected_features.csv" is the name of the saved CSV file, which can be customized as needed; and index=False means that the index of data set X is not written to the CSV file. If this parameter is not specified, the index is written to the CSV file by default.
Setting a data reading code for the standard format data makes data reading more automated and efficient: large amounts of data are processed automatically, which saves time and effort. The data that is read stays consistent with the format and content of the original data, which reduces errors and deviations during processing and avoids non-uniform or erroneous data formats, improving the accuracy of data processing. The data format and structure also become clearer, which eases data sharing and exchange; where data is shared among different systems, software and platforms, the data reading code can greatly simplify the exchange process and reduce its cost. Finally, the data reading code can read the data into a specific data structure, such as a pandas DataFrame, which facilitates data analysis, visualization and interpretation.
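The read, select and export sequence in the example above can be sketched end to end. To keep the sketch runnable without third-party packages, it swaps pandas for Python's standard `csv` module and uses in-memory text in place of the files processed_data.csv and selected_features.csv; the helper names and the sample rows are assumptions. Unlike pandas, `csv` does no type inference, so every value stays a string.

```python
import csv
import io

SELECTED_FEATURES = ["price", "stock", "title", "brand"]

def select_features(source, features):
    """Read CSV text and keep only the requested columns, in the requested order.

    Plays the role of pd.read_csv(...) followed by data[selected_features].
    """
    reader = csv.DictReader(source)
    missing = [f for f in features if f not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return [{f: row[f] for f in features} for row in reader]

def export_csv(rows, features):
    """Serialize rows as CSV text with a header and no index column.

    Plays the role of X.to_csv(..., index=False).
    """
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=features, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

# In-memory stand-in for processed_data.csv; the contents are made up.
processed = io.StringIO(
    "sku,title,brand,price,stock\n"
    "A1,Green tea,Acme,12.5,40\n"
    "B2,Black tea,Acme,9.0,0\n"
)
X = select_features(processed, SELECTED_FEATURES)
selected_csv = export_csv(X, SELECTED_FEATURES)
```

The unselected `sku` column is dropped, and the output columns follow the order of `SELECTED_FEATURES` rather than the order in the source file, which matches how column selection by list behaves in pandas.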
Specifically, generating data adapted to the client format according to the data reading code that was read comprises:
selecting a corresponding output format for each client;
the output format includes JSON, XML and the like;
selecting, according to the output format, a corresponding library and method to read the data reading code;
in Java, XML and JSON data can be read with libraries such as Jackson, Gson and dom4j; in Python, ElementTree can be used;
converting the standard format data that was read into the corresponding data structure and format using that library and method;
in Java, JSON data can be converted into Java objects with libraries such as Jackson and Gson, and XML data can be converted into Java objects with libraries such as XMLBeans and JAXB; in Python, the pandas library can convert CSV data into DataFrame objects, and the xml.etree.ElementTree library can convert XML data into Python objects;
processing the converted data structure and format to obtain data adapted to the format of the corresponding client;
in Java, Java objects can be converted into JSON or XML with libraries such as Jackson and Gson, and data can be written to Excel files with libraries such as Apache POI; in Python, the pandas library can convert DataFrame objects into CSV, and the xml.etree.ElementTree library can convert Python objects into XML.
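The format adaptation step can be sketched with Python's standard library alone. The example below picks a serializer by client type and renders the same records as JSON, XML or CSV. The client names, the `CLIENT_FORMATS` mapping and the XML element names are assumptions made for illustration; the text only states that the output format is chosen per client.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

def to_json(records):
    """Render records as a JSON array of objects."""
    return json.dumps(records, ensure_ascii=False)

def to_xml(records):
    """Render records as <records><record>...</record></records>."""
    root = ET.Element("records")
    for record in records:
        item = ET.SubElement(root, "record")
        for key, value in record.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")

def to_csv(records):
    """Render records as CSV text, using the first record's keys as the header."""
    if not records:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()

# Hypothetical mapping from client type to output serializer.
CLIENT_FORMATS = {"web": to_json, "erp": to_xml, "spreadsheet": to_csv}

def adapt(records, client):
    """Return the records rendered in the format registered for this client."""
    serializer = CLIENT_FORMATS.get(client)
    if serializer is None:
        raise ValueError(f"unsupported client: {client!r}")
    return serializer(records)
```

Keeping the client-to-format mapping in one table means supporting a new client is a one-line change, while the stored standard format data stays untouched.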
A second embodiment of the present invention provides a multi-format data adaptive distribution system, comprising:
a classification module, configured to receive service data input by a service end and to classify the input service data through AI fuzzy matching;
a preprocessing module, configured to preprocess each piece of classified service data to obtain standard format data;
a data storage module, configured to store each piece of obtained standard format data;
a self-adaptive processing module, configured to receive a request from a client to read specific data, read the data reading code corresponding to the standard format data, and generate data adapted to the client format according to the data reading code that was read; and
an output module, configured to output the processed data adapted to the client format.
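The way the five modules fit together can be sketched as a small class whose collaborators are passed in as functions. This is only a structural illustration: the class name, the method names and the toy lambdas standing in for the classification, preprocessing and adaptive processing modules are all assumptions, not part of the described system.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class DistributionSystem:
    classify: Callable[[dict], str]           # classification module
    preprocess: Callable[[dict, str], dict]   # preprocessing module
    adapt: Callable[[List[dict], str], str]   # self-adaptive processing module
    store: Dict[str, List[dict]] = field(default_factory=dict)  # data storage module

    def ingest(self, raw: dict) -> None:
        """Classify one piece of service data, standardize it, and store it."""
        category = self.classify(raw)
        self.store.setdefault(category, []).append(self.preprocess(raw, category))

    def serve(self, category: str, client_format: str) -> str:
        """Output module: return the stored data rendered for the client."""
        return self.adapt(self.store.get(category, []), client_format)

# Toy stand-ins for the real modules, for illustration only.
system = DistributionSystem(
    classify=lambda raw: "price" if "price" in raw["label"] else "name",
    preprocess=lambda raw, category: {"category": category, "value": raw["value"]},
    adapt=lambda rows, fmt: json.dumps(rows) if fmt == "json" else str(rows),
)
system.ingest({"label": "unit price", "value": 12.5})
system.ingest({"label": "product title", "value": "Green tea"})
```

Injecting the modules as callables keeps each one replaceable, so a trained classifier or a different serializer could be swapped in without changing the ingest and serve flow.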
Specifically, the classification module comprises:
a sample acquisition unit, configured to collect samples of service data from a third-party platform;
a category determining unit, configured to determine the classification categories of the service data, the categories including price, inventory, name and brand;
a vectorization unit, configured to vectorize the collected samples of service data and convert them into vector format data;
a training unit, configured to input the preprocessed vector format data and the labels of the corresponding categories into a machine learning model to train an AI model;
the machine learning model may be a support vector machine (SVM), a decision tree, naive Bayes, or the like; and
a category matching unit, configured to receive service data input by the service end, input the service data into the trained AI model, and match the service data with a classification category.
Specifically, the preprocessing module comprises:
a schema definition unit, configured to define each piece of service data, according to its characteristics, using the schema language provided by Avro, where the Avro data structure comprises a field name, a default value, a document description of the field, an order of the field and an alias of the field;
a category definition unit, configured to combine the category of each piece of service data with the corresponding data type in the Avro format data; and
a standard format data output unit, configured to obtain the standard format data corresponding to each piece of classified service data.
Specifically, the data storage module comprises:
a code writing unit, configured to write a data reading code for each piece of preprocessed standard format data;
the code may be written in a programming language such as Java or Python;
a code matching unit, configured to store each piece of standard format data together with its corresponding data reading code;
a feature extraction unit, configured to receive a request from a client to read specific data, the request including a selected feature; and
a code reading unit, configured to obtain the data reading code of the standard format data corresponding to the selected feature.
Specifically, the adaptive processing module comprises:
an output format selection unit, configured to select a corresponding output format for each client;
the output format includes JSON, XML and the like;
a standard format data reading unit, configured to select, according to the output format, a corresponding library and method to read the data reading code;
in Java, XML and JSON data can be read with libraries such as Jackson, Gson and dom4j; in Python, ElementTree can be used;
a data conversion unit, configured to convert the standard format data that was read into the corresponding data structure and format using that library and method;
in Java, JSON data can be converted into Java objects with libraries such as Jackson and Gson, and XML data can be converted into Java objects with libraries such as XMLBeans and JAXB; in Python, the pandas library can convert CSV data into DataFrame objects, and the xml.etree.ElementTree library can convert XML data into Python objects;
a data processing unit, configured to process the converted data structure and format to obtain data adapted to the format of the corresponding client;
in Java, Java objects can be converted into JSON or XML with libraries such as Jackson and Gson, and data can be written to Excel files with libraries such as Apache POI; in Python, the pandas library can convert DataFrame objects into CSV, and the xml.etree.ElementTree library can convert Python objects into XML.
The embodiments of the invention seek to protect a multi-format data adaptive distribution method and system, and have the following effects:
1. The invention converts the required data into standard format data through standardized processing, which resolves inconsistent data forms and differing data standards; the standard format data is then stored in a preset database, which makes target data easier to extract and push and improves the efficiency of data acquisition.
2. The invention stores and reads data in the Avro format. Avro data can be compressed into smaller files, which reduces storage and transmission costs and speeds up data processing; Avro data is also interoperable across programming languages, which facilitates data transfer between platforms.
3. By forming standard format data, the invention stores service data such as goods data, inventory data, and sales data in structured form, which supports fast querying, filtering, sorting, and aggregation; the data can also be presented as tables for visualization, which makes processing and analysis more efficient. Through feature extraction, specific features are normalized, standardized, and encoded, which resolves inconsistent data forms and differing data standards and improves the efficiency of data acquisition and storage.
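To make the Avro-based standard format more concrete, the sketch below writes an Avro record schema as a Python dictionary and serializes it to JSON, which is how Avro schemas are declared. The attributes used (`name`, `type`, `doc`, `default`, `order`, `aliases`) are field attributes defined by the Avro specification; the record name `GoodsRecord`, the namespace, the alias, and the choice of types and defaults are assumptions for illustration only.

```python
import json

# Hypothetical Avro record schema for one piece of goods data.
# The four fields follow the classification categories named in the claims
# (name, brand, price, inventory); everything else is illustrative.
schema = {
    "type": "record",
    "name": "GoodsRecord",
    "namespace": "example.supplychain",
    "doc": "Standard-format goods record (illustrative only).",
    "fields": [
        {"name": "name", "type": "string", "doc": "Product name."},
        {
            "name": "brand",
            "type": "string",
            "default": "",
            "doc": "Brand name.",
            "aliases": ["maker"],
        },
        {
            "name": "price",
            "type": "double",
            "default": 0.0,
            "doc": "Unit price.",
            "order": "descending",
        },
        {
            "name": "inventory",
            "type": "int",
            "default": 0,
            "doc": "Units in stock.",
            "order": "ignore",
        },
    ],
}

# Avro schemas are exchanged as JSON text, e.g. in an .avsc file.
schema_json = json.dumps(schema, indent=2)
```

An Avro library such as fastavro or the official avro package could then parse this schema and serialize records against it; that step is left out here to keep the sketch free of third-party dependencies.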
The computer program product of the multi-format data adaptive distribution method and apparatus provided in the embodiments of the present invention includes a computer-readable storage medium storing program code. The instructions in the program code may be used to execute the method described in the foregoing method embodiments; for the specific implementation, refer to those embodiments, which are not repeated here.
Specifically, the storage medium may be a general-purpose storage medium such as a removable disk or a hard disk. When the computer program on the storage medium is run, the multi-format data adaptive distribution method described above can be executed, so that the system can process the large volume of data in a supply chain and provide accurate, reliable, secure, and efficient service.
If the functions are implemented as software functional units and sold or used as a stand-alone product, they may be stored in a processor-executable, non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied as a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above examples are only specific embodiments of the present invention; they illustrate its technical solutions and do not limit its scope of protection. Although the present invention has been described in detail with reference to the foregoing examples, those skilled in the art should understand that anyone familiar with the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or make equivalent substitutions for some of the technical features. Such modifications, variations, or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims (7)

1. A multi-format data adaptive distribution method, comprising:
receiving service data input by a service end, and carrying out data classification on the input service data through AI fuzzy matching;
preprocessing each piece of classified service data respectively to obtain standard format data;
storing each piece of obtained standard format data;
receiving a request of a client for reading specific data, and reading a data reading code corresponding to the standard format data;
generating data adapting to a client format according to the read data reading code;
outputting the processed data adapting to the format of the client;
wherein the receiving of service data input by a service end and the data classification of the input service data through AI fuzzy matching comprise:
collecting a sample of the business data from a third party platform;
determining a classification category of the business data, the classification category including price, inventory, name, and brand;
carrying out vectorization preprocessing on the collected samples of the service data, and converting the samples into vector format data;
inputting the preprocessed vector format data and the labels of the corresponding categories into a machine model to train an AI model, and obtaining a trained AI model;
receiving service data input by a service end, inputting the service data into the trained AI model, and matching the service data with the classification category;
wherein the preprocessing of each piece of classified service data to obtain standard format data comprises:
defining each piece of service data by using the schema language provided by Avro according to the characteristics of the service data, wherein a data structure in the Avro format comprises a field name, a default value, a documentation description of the field, an order of the field, and an alias of the field;
combining each category of the service data with a data type in the Avro format data;
and obtaining standard format data corresponding to each piece of classified service data.
2. The multi-format data adaptive distribution method according to claim 1, wherein the storing of each piece of obtained standard format data, together with the receiving of a request of a client for reading specific data and the reading of the data reading code corresponding to the standard format data, comprises:
writing a data reading code for each piece of preprocessed standard format data;
and storing each piece of data in the standard format and the corresponding data reading code.
3. The multi-format data adaptive distribution method according to claim 2, wherein the receiving a request from a client to read specific data, reading a data reading code corresponding to the standard format data, comprises:
receiving a request of a client for reading specific data, wherein the request contains selected features;
and obtaining a data reading code of the standard format data corresponding to the selected feature.
4. The multi-format data adaptive distribution method according to claim 3, wherein the generating data adapted to a client format according to the read data read code comprises:
selecting a corresponding output format according to different clients;
selecting a corresponding library and a corresponding method to read the data reading code according to the corresponding output format;
converting the read standard format data into a corresponding data structure and format using a corresponding library and method;
and processing the converted data structure and format to obtain data adapting to the format of the corresponding client.
5. A multi-format data adaptive distribution apparatus, comprising:
the classification module is used for receiving the service data input by the service end and classifying the input service data through AI fuzzy matching;
the preprocessing module is used for respectively preprocessing each piece of classified service data to obtain standard format data;
the data storage module is used for storing each piece of obtained standard format data;
the self-adaptive processing module is used for receiving a request of a client for reading specific data, reading a data reading code corresponding to the standard format data, and generating data adapting to the format of the client according to the read data reading code;
the output module is used for outputting the processed data adapting to the format of the client;
the classification module comprises:
the sample acquisition unit is used for acquiring samples of service data from the third party platform;
a category determining unit, configured to determine a classification category of the service data, where the classification category includes a price, an inventory, a name, and a brand;
the vectorization unit is used for carrying out vectorization preprocessing on the collected samples of the service data and converting the samples into vector format data;
a training unit, configured to input the preprocessed vector format data and the labels of the corresponding categories into a machine model to train an AI model;
the preprocessing module comprises:
a schema definition unit, configured to define each piece of service data by using the schema language provided by Avro according to the characteristics of the service data, where a data structure in the Avro format includes a field name, a default value, a documentation description of the field, an order of the field, and an alias of the field;
a category definition unit, configured to combine each category of the service data with a data type in the Avro format data;
and the standard format data output unit is used for obtaining standard format data corresponding to each piece of classified service data.
6. The multi-format data adaptive distribution apparatus according to claim 5, wherein the data storage module comprises:
the code writing unit is used for writing data reading codes for each piece of the preprocessed standard format data;
the code matching unit is used for storing each piece of standard format data and the corresponding data reading code;
the feature extraction unit is used for receiving a request of a client for reading specific data, wherein the request contains selected features;
and the code reading unit is used for obtaining a data reading code of the standard format data corresponding to the selected feature.
7. The multi-format data adaptive distribution apparatus according to claim 5, wherein the adaptive processing module comprises:
the output format selection unit is used for selecting a corresponding output format according to different clients;
a standard format data reading unit, configured to select a corresponding library and a method to read the data reading code according to a corresponding output format;
a data conversion unit for converting the read standard format data into a corresponding data structure and format using a corresponding library and method;
and the data processing unit is used for processing the converted data structure and format to obtain data adapting to the format of the corresponding client.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310518702.8A CN116719866B (en) 2023-05-09 2023-05-09 Multi-format data self-adaptive distribution method and system

Publications (2)

Publication Number Publication Date
CN116719866A (en) 2023-09-08
CN116719866B (en) 2024-02-13

Family

ID=87874180

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310518702.8A Active CN116719866B (en) 2023-05-09 2023-05-09 Multi-format data self-adaptive distribution method and system

Country Status (1)

Country Link
CN (1) CN116719866B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101546399A (en) * 2008-03-28 2009-09-30 精工爱普生株式会社 Voucher data management system and method for controlling voucher data management system
CN102968306A (en) * 2012-11-29 2013-03-13 广东全通教育股份有限公司 Method and system for automatically generating code based on data model drive
CN103491135A (en) * 2013-09-02 2014-01-01 用友软件股份有限公司 Device and method for conducting self-matching on data formats
CN107957889A (en) * 2016-10-17 2018-04-24 阿里巴巴集团控股有限公司 Processing method, device, client, server and the system of product configuration data
CN109358845A (en) * 2017-12-27 2019-02-19 广州Tcl智能家居科技有限公司 Method, tool and the storage medium of JS code are write based on XMPP protocol
CN110597816A (en) * 2019-09-17 2019-12-20 深圳追一科技有限公司 Data processing method, data processing device, computer equipment and computer readable storage medium
CN112214453A (en) * 2020-09-14 2021-01-12 上海微亿智造科技有限公司 Large-scale industrial data compression storage method, system and medium
CN112235311A (en) * 2020-10-20 2021-01-15 网络通信与安全紫金山实验室 OVSDB client code automatic generation method, system, device and medium
CN113010503A (en) * 2021-03-01 2021-06-22 广州智筑信息技术有限公司 Engineering cost data intelligent analysis method and system based on deep learning
CN113626512A (en) * 2021-08-17 2021-11-09 未鲲(上海)科技服务有限公司 Data processing method, device, equipment and readable storage medium
CA3142409A1 (en) * 2020-12-19 2022-06-19 The Toronto-Dominion Bank Real-time prediction of parameter modifications based on structured messaging data
CN114792145A (en) * 2022-05-27 2022-07-26 中国标准化研究院 Standard digital management maintenance system and method based on knowledge graph

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102551601B1 (en) * 2018-02-05 2023-07-06 한국전자통신연구원 Storage server and adaptable prefetching method performed by the storage server in distributed file system

Also Published As

Publication number Publication date
CN116719866A (en) 2023-09-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant