CN108153896B - Processing method and device for input data and output data - Google Patents

Info

Publication number
CN108153896B
Authority
CN
China
Prior art keywords
data
configuration information
processing
data structure
type conversion
Legal status
Active
Application number
CN201810015586.7A
Other languages
Chinese (zh)
Other versions
CN108153896A (en)
Inventor
杨强
戴文渊
陈雨强
裴兆友
石光川
Current Assignee
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd
Application filed by 4Paradigm Beijing Technology Co Ltd
Priority to CN201810015586.7A
Publication of CN108153896A
Application granted
Publication of CN108153896B
Active legal status (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/1824 Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • G06F16/183 Provision of network file services by network file servers, e.g. by using NFS, CIFS
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention disclose a processing method and a processing device for input data and output data, which relate to the field of computer technology and can effectively improve the versatility of data processing tools such as machine learning tools. The processing method for input data comprises: acquiring data input configuration information, wherein the data input configuration information comprises an operation identifier set for processing a binary byte stream, and the operation identifier set comprises a read operation identifier, a deserialization operation identifier and a type conversion operation identifier; and parsing the operation identifier set and performing the operations corresponding to the parsed set, namely reading the binary byte stream from a local or remote storage medium, performing a preset deserialization operation on the read binary byte stream to obtain a source data structure or object, and performing a type conversion operation on the source data structure or object to obtain a target data structure or object. The method and the device are suitable for scenarios involving heterogeneous data.

Description

Processing method and device for input data and output data
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for processing input data and output data.
Background
With the advent of massive amounts of data, data processing (e.g., data mining) often has to deal with data from many different sources and in many different formats. As a result, a data processing tool (e.g., machine learning platform software) is frequently confronted with third-party data that it may not support or match, and considerable time and effort may be needed to process such data accordingly. Furthermore, since the data formats accepted by different data processing tools are generally not interchangeable, users must have a certain level of data processing skill, which also increases the workload of using the tools. These problems significantly limit both the efficiency of use and the range of application of data processing.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for processing input data and output data, which can effectively improve the versatility of a data processing tool.
In a first aspect, an embodiment of the present invention provides a processing method for input data, including: acquiring data input configuration information, wherein the data input configuration information comprises an operation identifier set for processing a binary byte stream, and the operation identifier set comprises a read operation identifier, a deserialization operation identifier and a type conversion operation identifier; and parsing the operation identifier set and performing the operations corresponding to the parsed operation identifier set, so as to read the binary byte stream from a local or remote storage medium, perform a preset deserialization operation on the read binary byte stream to obtain a source data structure or object, and perform a type conversion operation on the source data structure or object to obtain a target data structure or object; wherein the read operation identifier indicates reading the binary byte stream from the local or remote storage medium, the deserialization operation identifier indicates the preset deserialization operation, and the type conversion operation identifier indicates at least one type conversion operation to be performed on the source data structure or object.
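The method of the first aspect can be sketched in Python as a small dispatcher, under stated assumptions: the operation identifier set is a plain dictionary, the preset deserialization is JSON, and the identifier values (`local_file`, `json`, `rows_to_tuples`) and every function name are hypothetical illustrations rather than anything defined by the patent.

```python
import json
import os
import tempfile

# Hypothetical operation identifier set: one identifier per processing step.
OPERATION_SET = {
    "read": "local_file",           # read operation identifier
    "deserialize": "json",          # deserialization operation identifier
    "convert": ["rows_to_tuples"],  # one or more type conversion identifiers
}

def read_local_file(path: str) -> bytes:
    """Read operation: load the binary byte stream from a local storage medium."""
    with open(path, "rb") as f:
        return f.read()

def deserialize_json(stream: bytes):
    """Preset deserialization: UTF-8 JSON bytes -> source data structure."""
    return json.loads(stream.decode("utf-8"))

def rows_to_tuples(rows):
    """Type conversion: list of {"x", "y"} dicts -> list of (x, y) tuples."""
    return [(row["x"], row["y"]) for row in rows]

# Hypothetical registries mapping identifier values to concrete operations.
READERS = {"local_file": read_local_file}
DESERIALIZERS = {"json": deserialize_json}
CONVERTERS = {"rows_to_tuples": rows_to_tuples}

def process_input(op_set: dict, address: str):
    """Parse the identifier set and run read -> deserialize -> convert(s) in order."""
    stream = READERS[op_set["read"]](address)
    target = DESERIALIZERS[op_set["deserialize"]](stream)  # source object
    for name in op_set["convert"]:
        target = CONVERTERS[name](target)
    return target

# Usage: write a small byte stream to a temporary file, then process it.
with tempfile.NamedTemporaryFile("wb", suffix=".json", delete=False) as tmp:
    tmp.write(b'[{"x": 1.0, "y": 0}, {"x": 2.5, "y": 1}]')
try:
    result = process_input(OPERATION_SET, tmp.name)
finally:
    os.remove(tmp.name)
print(result)  # [(1.0, 0), (2.5, 1)]
```

Keeping the registries separate from the pipeline means that supporting a new source or format only requires registering one more function; `process_input` itself does not change.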
With reference to the first aspect, in a first implementation manner of the first aspect, the data input configuration information further includes an address of the local or remote storage medium and/or a transmission protocol for the binary byte stream.
With reference to the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the transmission protocol indicates a transmission protocol corresponding to a relational database, a non-relational database, a message queue, or a log data stream.
With reference to the first aspect, in a third implementation of the first aspect, the target data structure or object is used as a training sample for a machine learning model.
With reference to the third implementation of the first aspect, in a fourth implementation of the first aspect, the method is performed by a machine learning system.
With reference to the first aspect or any one of the first to fourth implementation manners of the first aspect, in a fifth implementation manner of the first aspect, the type conversion operation includes a decompression operation.
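The fifth implementation manner treats decompression as a type conversion applied to the source object after deserialization. A minimal sketch, assuming a hypothetical source object whose `payload` field holds gzip-compressed text, uses the standard-library `gzip` module:

```python
import gzip

def decompress_payload(source: dict) -> dict:
    """Type conversion: gzip-compressed 'payload' bytes -> list of decoded text lines."""
    text = gzip.decompress(source["payload"]).decode("utf-8")
    # Build a new target object; the source object is left unchanged.
    return {**source, "payload": text.splitlines()}

# Hypothetical source object obtained from deserialization: metadata plus compressed data.
source_obj = {"name": "batch-0", "payload": gzip.compress(b"a,1\nb,2\n")}
target_obj = decompress_payload(source_obj)
print(target_obj["payload"])  # ['a,1', 'b,2']
```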
With reference to the first aspect or any one of the first to fourth implementation manners of the first aspect, in a sixth implementation manner of the first aspect, the data input configuration information is represented as a uniform resource identifier.
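The sixth implementation manner represents the data input configuration information as a uniform resource identifier. One hypothetical layout, assumed here purely for illustration, puts the transmission protocol in the URI scheme, the storage address in the authority and path, and the operation identifiers in query parameters; it can be parsed with the standard-library `urllib.parse` module:

```python
from urllib.parse import parse_qs, urlsplit

def parse_config_uri(uri: str) -> dict:
    """Split a hypothetical configuration URI into protocol, address and operation identifiers."""
    parts = urlsplit(uri)
    query = parse_qs(parts.query)
    return {
        "protocol": parts.scheme,              # transmission protocol, e.g. "hdfs", "mysql"
        "address": parts.netloc + parts.path,  # where the binary byte stream lives
        "read": query.get("read", ["default"])[0],
        "deserialize": query["deserialize"][0],
        # Repeated "convert" parameters keep their order, giving the conversion sequence.
        "convert": query.get("convert", []),
    }

cfg = parse_config_uri(
    "hdfs://namenode:8020/data/train.bin"
    "?read=stream&deserialize=protobuf&convert=decompress&convert=to_dense"
)
print(cfg)
```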
With reference to the first aspect or any one of the first to fourth implementation manners of the first aspect, in a seventh implementation manner of the first aspect, performing the operations corresponding to the parsed operation identifier set includes: performing the operations corresponding to the parsed operation identifier set serially or in parallel.
With reference to the first aspect or any one of the first to fourth implementation manners of the first aspect, in an eighth implementation manner of the first aspect, acquiring the data input configuration information includes: acquiring the data input configuration information by detecting a configuration operation performed by a user on a graphical user interface.
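The patent leaves the granularity of parallel execution open. One plausible reading, shown here only as a sketch, is that the same operation chain is applied to several independent byte streams (for example, file partitions) either one after another or concurrently. `pipeline`, `run_serial` and `run_parallel` are hypothetical names, and JSON deserialization followed by a list-to-tuple conversion stands in for the configured operations.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def pipeline(stream: bytes) -> tuple:
    """Deserialize one byte stream (JSON), then apply a type conversion (list -> tuple)."""
    return tuple(json.loads(stream))

def run_serial(streams):
    """Process the byte streams one after another."""
    return [pipeline(s) for s in streams]

def run_parallel(streams, workers: int = 4):
    """Process the byte streams concurrently; map() keeps results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(pipeline, streams))

partitions = [b"[1, 2]", b"[3]", b"[4, 5, 6]"]
print(run_parallel(partitions))  # [(1, 2), (3,), (4, 5, 6)]
```

Because `ThreadPoolExecutor.map` returns results in input order, both variants produce the same output. Threads mainly help with I/O-bound reads; CPU-heavy conversions might be better served by `ProcessPoolExecutor`.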
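Whatever GUI toolkit is used, the user's selections ultimately have to be turned into configuration information. A minimal sketch, assuming the form state is available as a dictionary with hypothetical field names and that the result uses the URI form of the sixth implementation manner (no GUI code is shown):

```python
from urllib.parse import urlencode

def build_config_uri(form: dict) -> str:
    """Assemble hypothetical GUI form selections into a configuration URI string."""
    pairs = [("read", form["read"]), ("deserialize", form["deserialize"])]
    # Each selected type conversion becomes a repeated "convert" parameter, in order.
    pairs += [("convert", name) for name in form["conversions"]]
    return f'{form["protocol"]}://{form["address"]}?{urlencode(pairs)}'

# Values a user might have picked in drop-down lists and text fields (hypothetical).
form_state = {
    "protocol": "kafka",
    "address": "broker:9092/clicks",
    "read": "consume",
    "deserialize": "avro",
    "conversions": ["decompress", "to_dense"],
}
config_uri = build_config_uri(form_state)
print(config_uri)
# kafka://broker:9092/clicks?read=consume&deserialize=avro&convert=decompress&convert=to_dense
```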
In a second aspect, an embodiment of the present invention further provides a processing method for output data, including: acquiring data output configuration information, wherein the data output configuration information comprises an operation identifier set for processing a source data structure or object, and the operation identifier set comprises a type conversion operation identifier, a serialization operation identifier and a write operation identifier; and parsing the operation identifier set and performing the operations corresponding to the parsed operation identifier set, so as to perform a type conversion operation on the source data structure or object to obtain a target data structure or object, perform a preset serialization operation on the target data structure or object to obtain a binary byte stream, and output the binary byte stream to a local or remote storage medium; wherein the type conversion operation identifier indicates at least one type conversion operation to be performed on the source data structure or object, the serialization operation identifier indicates the preset serialization operation, and the write operation identifier indicates outputting the binary byte stream to the local or remote storage medium.
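The output flow of the second aspect mirrors the input flow in reverse order: type conversion, then serialization, then writing. A minimal sketch, assuming a model training result represented as a `{feature: weight}` dictionary, JSON as the preset serialization and a local file as the storage medium; all function names are hypothetical:

```python
import json
import os
import tempfile

def to_sorted_pairs(weights: dict) -> list:
    """Type conversion: {feature: weight} -> [[feature, weight], ...] sorted by feature."""
    return [[name, weights[name]] for name in sorted(weights)]

def serialize_json(obj) -> bytes:
    """Preset serialization: target data structure -> compact UTF-8 JSON byte stream."""
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")

def write_local(stream: bytes, path: str) -> None:
    """Write operation: output the binary byte stream to a local storage medium."""
    with open(path, "wb") as f:
        f.write(stream)

def process_output(source: dict, path: str) -> bytes:
    """Run convert -> serialize -> write and return the bytes that were written."""
    target = to_sorted_pairs(source)
    stream = serialize_json(target)
    write_local(stream, path)
    return stream

out_path = os.path.join(tempfile.mkdtemp(), "model.json")
written = process_output({"b": 0.5, "a": -1.25}, out_path)
print(written)  # b'[["a",-1.25],["b",0.5]]'
```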
With reference to the second aspect, in a first implementation manner of the second aspect, the data output configuration information further includes an address of the local or remote storage medium and/or a transmission protocol for the binary byte stream.
With reference to the first implementation manner of the second aspect, in a second implementation manner of the second aspect, the transmission protocol indicates a transmission protocol corresponding to a relational database, a non-relational database, a message queue, or a log data stream.
With reference to the second aspect, in a third implementation of the second aspect, the source data structure or object is a training result of a machine learning model.
With reference to the third implementation of the second aspect, in a fourth implementation of the second aspect, the method is performed by a machine learning system.
With reference to the second aspect or any one of the first to fourth implementation manners of the second aspect, in a fifth implementation manner of the second aspect, the type conversion operation includes a compression operation.
With reference to the second aspect or any one of the first to fourth implementation manners of the second aspect, in a sixth implementation manner of the second aspect, the data output configuration information is represented as a uniform resource identifier.
With reference to the second aspect or any one of the first to fourth implementation manners of the second aspect, in a seventh implementation manner of the second aspect, performing the operations corresponding to the parsed operation identifier set includes: performing the operations corresponding to the parsed operation identifier set serially or in parallel.
With reference to the second aspect or any one of the first to fourth implementation manners of the second aspect, in an eighth implementation manner of the second aspect, acquiring the data output configuration information includes: acquiring the data output configuration information by detecting a configuration operation performed by a user on a graphical user interface.
In a third aspect, an embodiment of the present invention further provides a processing apparatus for input data, including: a configuration information acquisition unit, configured to acquire data input configuration information, wherein the data input configuration information comprises an operation identifier set for processing a binary byte stream, and the operation identifier set comprises a read operation identifier, a deserialization operation identifier and a type conversion operation identifier; and a processing unit, configured to parse the operation identifier set and perform the operations corresponding to the parsed operation identifier set, so as to read the binary byte stream from a local or remote storage medium, perform a preset deserialization operation on the read binary byte stream to obtain a source data structure or object, and perform a type conversion operation on the source data structure or object to obtain a target data structure or object; wherein the read operation identifier indicates reading the binary byte stream from the local or remote storage medium, the deserialization operation identifier indicates the preset deserialization operation, and the type conversion operation identifier indicates at least one type conversion operation to be performed on the source data structure or object.
With reference to the third aspect, in a first implementation manner of the third aspect, the data input configuration information further includes an address of the local or remote storage medium and/or a transmission protocol for the binary byte stream.
With reference to the first implementation manner of the third aspect, in a second implementation manner of the third aspect, the transmission protocol indicates a transmission protocol corresponding to a relational database, a non-relational database, a message queue, or a log data stream.
With reference to the third aspect, in a third implementation form of the third aspect, the target data structure or object is used as a training sample for a machine learning model.
With reference to the third implementation manner of the third aspect, in a fourth implementation manner of the third aspect, the processing device is integrated in a machine learning system.
With reference to the third aspect or any one of the first to fourth implementation manners of the third aspect, in a fifth implementation manner of the third aspect, the type conversion operation includes a decompression operation.
With reference to the third aspect or any one of the first to fourth implementation manners of the third aspect, in a sixth implementation manner of the third aspect, the data input configuration information is represented as a uniform resource identifier.
With reference to the third aspect or any one of the first to fourth implementation manners of the third aspect, in a seventh implementation manner of the third aspect, the processing unit performs the operations corresponding to the parsed operation identifier set serially or in parallel.
With reference to the third aspect or any one of the first to fourth implementation manners of the third aspect, in an eighth implementation manner of the third aspect, the configuration information acquisition unit acquires the data input configuration information by detecting a configuration operation performed by a user on a graphical user interface.
In a fourth aspect, an embodiment of the present invention further provides a processing apparatus for output data, including: a configuration information acquisition unit, configured to acquire data output configuration information, wherein the data output configuration information comprises an operation identifier set for processing a source data structure or object, and the operation identifier set comprises a type conversion operation identifier, a serialization operation identifier and a write operation identifier; and a processing unit, configured to parse the operation identifier set and perform the operations corresponding to the parsed operation identifier set, so as to perform a type conversion operation on the source data structure or object to obtain a target data structure or object, perform a preset serialization operation on the target data structure or object to obtain a binary byte stream, and output the binary byte stream to a local or remote storage medium; wherein the type conversion operation identifier indicates at least one type conversion operation to be performed on the source data structure or object, the serialization operation identifier indicates the preset serialization operation, and the write operation identifier indicates outputting the binary byte stream to the local or remote storage medium.
With reference to the fourth aspect, in a first implementation manner of the fourth aspect, the data output configuration information further includes an address of the local or remote storage medium and/or a transmission protocol for the binary byte stream.
With reference to the first implementation manner of the fourth aspect, in a second implementation manner of the fourth aspect, the transmission protocol indicates a transmission protocol corresponding to a relational database, a non-relational database, a message queue, or a log data stream.
With reference to the fourth aspect, in a third implementation of the fourth aspect, the source data structure or object is a training result of a machine learning model.
With reference to the third implementation manner of the fourth aspect, in a fourth implementation manner of the fourth aspect, the processing device is integrated in a machine learning system.
With reference to the fourth aspect or any one of the first to fourth implementation manners of the fourth aspect, in a fifth implementation manner of the fourth aspect, the type conversion operation includes a compression operation.
With reference to the fourth aspect or any one of the first to fourth implementation manners of the fourth aspect, in a sixth implementation manner of the fourth aspect, the data output configuration information is represented as a uniform resource identifier.
With reference to the fourth aspect or any one of the first to fourth implementation manners of the fourth aspect, in a seventh implementation manner of the fourth aspect, the processing unit performs the operations corresponding to the parsed operation identifier set serially or in parallel.
With reference to the fourth aspect or any one of the first to fourth implementation manners of the fourth aspect, in an eighth implementation manner of the fourth aspect, the configuration information acquisition unit acquires the data output configuration information by detecting a configuration operation performed by a user on a graphical user interface.
In a fifth aspect, an embodiment of the present invention further provides an electronic device, including a housing, a processor, a memory, a circuit board and a power supply circuit, wherein the circuit board is arranged inside a space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power supply circuit is configured to supply power to each circuit or component of the electronic device; the memory is configured to store executable program code; and the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform any of the processing methods for input data provided by the embodiments of the present invention, or any of the processing methods for output data provided by the embodiments of the present invention.
In a sixth aspect, embodiments of the present invention further provide a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement any one of the processing methods for input data provided by the embodiments of the present invention; or to implement any of the processing methods for output data provided by embodiments of the present invention.
With the processing method and the processing device for input data and output data provided by the embodiments of the invention, input data entering the data processing system from various sources, and output data leaving the system for various destinations, can be processed effectively by means of unified configuration information. This frees users from handling the various data input/output procedures and effectively improves the operating efficiency and versatility of the data processing system.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a processing method for input data according to an embodiment of the present invention;
FIG. 2 is a detailed flowchart of a processing method for input data according to an embodiment of the present invention;
FIG. 3 is a data flow diagram corresponding to the embodiment shown in FIG. 2;
FIG. 4 is a flowchart of a processing method for output data according to an embodiment of the present invention;
FIG. 5 is a detailed flowchart of a processing method for output data according to an embodiment of the present invention;
FIG. 6 is a data flow diagram corresponding to the embodiment shown in FIG. 5;
FIG. 7 is a schematic structural diagram of a processing apparatus for input data according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a processing apparatus for output data according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In a first aspect, embodiments of the present invention provide a processing method for input data, by which input data entering a data processing system from various sources can be processed effectively using unified configuration information. This frees the user from the data preprocessing stage and effectively improves the operating efficiency and versatility of the data processing system.
As shown in fig. 1, a processing method for input data provided by an embodiment of the present invention may include:
S11: Acquire data input configuration information, wherein the data input configuration information includes an operation identifier set for processing the binary byte stream, and the operation identifier set includes a read operation identifier, a deserialization operation identifier and a type conversion operation identifier.
Because external data comes in many forms, and the data structures or objects that programs in the current system can actually process do not necessarily match the external data, the data processing operations that the system needs to perform are relatively complex.
According to an exemplary implementation of the present invention, the format of the data input configuration information may be agreed upon in advance with an external data source. Transmitting the data input configuration information then directs the system to perform a series of processing steps on the input data, yielding data structures or objects that the system can process.
In this step, the data input configuration information is used to read external data into the system for further processing. Regardless of where the original external data comes from, how it is serialized and what data type it has, the operations performed on the basis of the data input configuration information can convert it into a data type that programs in the current system can process.
First, the data input configuration information may be obtained from the source of the external input data, either directly or over a network, or it may be generated by the system itself. The data input configuration information carries a series of related operation identifiers in a predetermined format. Here, the input data takes the form of a binary byte stream when it is stored on a local storage medium such as a disk or transmitted over a network. Therefore, in this step, the data input configuration information needs to indicate the series of operations from reading in the binary byte stream to finally converting it into a data structure or object that can be processed in the system. To provide this indication, in one embodiment of the present invention, the data input configuration information may include an operation identifier set containing a series of operation identifiers, each of which represents one processing step on the input data. For example, in one embodiment of the invention, the operation identifier set may include a read operation identifier, a deserialization operation identifier and a type conversion operation identifier. The read operation identifier indicates reading a binary byte stream from a local or remote storage medium, the deserialization operation identifier indicates performing a preset deserialization operation on the read binary byte stream to obtain a source data structure or object, and the type conversion operation identifier indicates at least one type conversion operation to be performed on the source data structure or object.
S12: Parse the operation identifier set and perform the operations corresponding to the parsed operation identifier set, so as to read the binary byte stream from the local or remote storage medium, perform the preset deserialization operation on the read binary byte stream to obtain the source data structure or object, and perform the type conversion operation on the source data structure or object to obtain the target data structure or object.
In this step, the operation identifier set acquired in step S11 may be parsed to determine which data processing operations need to be performed. In general, the operation identifier set may indicate how to read a binary byte stream from a local or remote storage medium, how to perform a preset deserialization operation on the read binary byte stream to obtain a source data structure or object, and how to perform a type conversion operation on the source data structure or object to obtain a target data structure or object. How to read the binary byte stream from the local or remote storage medium can be indicated by the read operation identifier in the set; for example, the read operation identifier can indicate that the operation type is a read operation and/or which specific read operation to use. How to perform the preset deserialization operation on the read binary byte stream to obtain the source data structure or object can be indicated by the deserialization operation identifier; for example, it can indicate a specific kind of deserialization operation that is preset for obtaining the source data structure or object. How to convert the source data structure or object into the target data structure or object can be indicated by the type conversion operation identifier, which may, for example, indicate one or more specific kinds of conversion.
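Since each identifier may name both the operation type and the specific operation, one way to model the set is as an ordered list of (kind, variant) pairs. The sketch below is illustrative only: the `kind:variant|...` text syntax, the `OperationId` type and the strict read, deserialize, convert ordering are assumptions, not details given in the patent.

```python
from typing import List, NamedTuple

class OperationId(NamedTuple):
    """One operation identifier: the operation type plus the specific operation."""
    kind: str     # "read", "deserialize" or "convert"
    variant: str  # specific operation, e.g. "local_file", "json", "decompress"

def validate_operation_set(op_set: List[OperationId]) -> List[OperationId]:
    """Require one read, then one deserialize, then at least one convert."""
    kinds = [op.kind for op in op_set]
    if kinds[:2] != ["read", "deserialize"]:
        raise ValueError(f"expected read then deserialize, got {kinds[:2]}")
    if len(kinds) < 3 or any(k != "convert" for k in kinds[2:]):
        raise ValueError("expected one or more convert operations after deserialize")
    return op_set

def parse_operation_set(text: str) -> List[OperationId]:
    """Parse a hypothetical 'kind:variant|kind:variant|...' string into identifiers."""
    ops = []
    for token in text.split("|"):
        kind, _, variant = token.partition(":")
        ops.append(OperationId(kind.strip(), variant.strip()))
    return validate_operation_set(ops)

ops = parse_operation_set("read:local_file|deserialize:json|convert:decompress|convert:to_dense")
print([op.variant for op in ops])  # ['local_file', 'json', 'decompress', 'to_dense']
```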
With the processing method for input data provided by the embodiment of the invention, data input configuration information can be acquired, and the read operation identifier, the deserialization operation identifier and the type conversion operation identifier in it indicate the preset data processing operations: reading a binary byte stream from a local or remote storage medium, performing a preset deserialization operation on the read binary byte stream to obtain a source data structure or object, and performing a type conversion operation on the source data structure or object to obtain a target data structure or object. In this way, the various kinds of data input into the system can be processed centrally and converted into data types on which the system can compute directly, which frees the user from various data preprocessing tasks and effectively improves the operating efficiency and versatility of the system.
Optionally, the input data in the embodiment of the present invention, that is, the binary byte stream to be read into the system, may carry various kinds of machine learning data, such as sample data, model data and/or prediction result data. Accordingly, the system can make good use of machine learning data from different sources. For example, when the binary byte stream carries training sample data for a model training process, the target data structure or object converted from the binary byte stream can serve as training samples of a machine learning model, on which a machine learning system performs model training. Preferably, the whole processing method for input data can be executed by the machine learning system, for example by a separate module provided in the machine learning system, so that the computing part of the machine learning system does not need to deal with any additional data processing. As another example, when the binary byte stream carries intermediate or final results of machine learning models trained on other machine learning platforms, the machine learning system can continue to compute on or process the target data structure or object converted from the binary byte stream. Different machine learning systems or platforms can thus cooperate in training machine learning models or in using model training results, which overcomes the prior-art drawback that the processing results of one machine learning platform cannot be used by another.
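For the training-sample case, the final type conversion might turn deserialized records into feature rows and labels that a training routine can consume. A sketch, assuming records arrive as dictionaries of strings; the column names `age`, `income` and `clicked` and the function name are hypothetical:

```python
def records_to_samples(records, feature_names, label_name):
    """Type conversion: list of dict records -> (feature rows, labels) for model training."""
    features = [[float(r[name]) for name in feature_names] for r in records]
    labels = [int(r[label_name]) for r in records]
    return features, labels

# Hypothetical source data structure produced by deserialization.
records = [
    {"age": "31", "income": "5200", "clicked": "1"},
    {"age": "45", "income": "8100", "clicked": "0"},
]
X, y = records_to_samples(records, ["age", "income"], "clicked")
print(X, y)  # [[31.0, 5200.0], [45.0, 8100.0]] [1, 0]
```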
Specifically, the operation steps may differ for different binary byte streams and different target data structures or objects, and the specific operations to be performed on a given binary byte stream need to be indicated in the data input configuration information; therefore, in step S11, the corresponding data input configuration information may be acquired in a targeted manner. This targeting may be achieved through user configuration. For example, in one embodiment of the present invention, acquiring the data input configuration information may include acquiring it by detecting a configuration operation performed by a user on a graphical user interface.
Further, so that the binary byte stream (e.g., a binary byte string) can be read smoothly from the local or remote storage medium into memory, the data input configuration information obtained in step S11 may further include the address of the local or remote storage medium where the binary byte stream is located and/or a transmission protocol for the binary byte stream. The address of a remote storage medium may be the address of any of various public or private clouds. The transmission protocol for a binary byte stream refers to the protocol that the binary byte stream follows during storage or transmission. Optionally, the binary byte stream in the embodiment of the present invention may follow any of various protocols during storage or transmission, and correspondingly the transmission protocol may indicate various kinds of data sources, for example a transmission protocol corresponding to a relational database, a non-relational database, a message queue, or a log data stream; the embodiment of the present invention is not limited in this respect.
For example, in an embodiment of the present invention, depending on the source of the binary byte stream, the transmission protocol may indicate that the binary byte stream follows the protocol of a relational database such as mysql, oracle, or db2, of a NoSQL database such as redis, mongodb, or hbase, or of a data stream obtained from a message queue or log collection system such as kafka, logstash, or flume.
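As a minimal sketch, assuming the transmission protocol is carried as the scheme of a URI-style configuration string, a system could map the protocol to a family of data sources and take the remaining authority and path as the storage address. The `PROTOCOL_FAMILIES` table and the `locate_source` helper are hypothetical illustrations, not part of the described method; a real system would dispatch each protocol to an actual client library.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from transmission protocol (URI scheme) to source family.
PROTOCOL_FAMILIES = {
    "mysql": "relational database",
    "oracle": "relational database",
    "db2": "relational database",
    "redis": "non-relational database",
    "mongodb": "non-relational database",
    "hbase": "non-relational database",
    "kafka": "message queue or log stream",
    "logstash": "message queue or log stream",
    "flume": "message queue or log stream",
    "hdfs": "distributed file system",
}


def locate_source(config: str) -> tuple[str, str]:
    """Return (protocol family, storage address) for a URI-style configuration."""
    parts = urlsplit(config)
    scheme = parts.scheme.lower()
    if scheme not in PROTOCOL_FAMILIES:
        raise ValueError(f"unsupported transmission protocol: {scheme!r}")
    # The address is everything between the scheme and the query string.
    return PROTOCOL_FAMILIES[scheme], parts.netloc + parts.path


print(locate_source("kafka://broker:9092/clicks?read=R121"))
```

In a fuller version, each family entry would point to a reader object rather than a label, so the same lookup selects both the protocol handling and the read operation.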
As can be seen from the data processing method for input data provided in the embodiment of the present invention, a binary byte stream in a storage medium needs to be transmitted to a system, such as a machine learning platform, deserialized in the system into a source data structure or object, and then converted from the source data structure or object into a target data structure or object by a type conversion operation, so as to be further processed by the system (e.g., operations performed in a memory, etc.). The type conversion from the source data structure or object to the target data structure or object may be performed by one type conversion operation or may be performed by multiple type conversion operations.
In particular, in one embodiment of the invention, the type conversion operation may include a decompression operation, which makes the design of the data input configuration information more concise. For example, a binary byte stream in a storage medium corresponds to data in a compressed format, that is, after the deserialization operation is completed, the obtained source data structure or object is in a compressed format, and needs to be decompressed before being further converted into a target data structure or object. The type conversion operation in this case comprises a 2-step conversion operation, i.e. a decompression operation and an operation of further converting the decompressed data into a target data structure or object.
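The two-step conversion can be sketched as an ordered list of conversion functions applied to the deserialized source object. This minimal Python sketch assumes, purely for illustration, that deserialization yields a hypothetical `CompressedRecord` holding zlib-compressed JSON, and that the target structure is a list of `(features, label)` training samples; the codec, the JSON layout, and all names are assumptions.

```python
import json
import zlib
from dataclasses import dataclass


@dataclass
class CompressedRecord:
    """Hypothetical source object produced by deserialization; still compressed."""
    codec: str
    payload: bytes


def decompress(record: CompressedRecord) -> bytes:
    # Type conversion step 1: undo the compression.
    if record.codec != "zlib":
        raise ValueError(f"unsupported codec: {record.codec!r}")
    return zlib.decompress(record.payload)


def to_samples(raw: bytes) -> list[tuple[list[float], int]]:
    # Type conversion step 2: decompressed JSON -> (features, label) training samples.
    rows = json.loads(raw.decode("utf-8"))
    return [([float(v) for v in row["x"]], int(row["y"])) for row in rows]


def convert(source, steps):
    # Apply the type conversion operations one after another.
    value = source
    for step in steps:
        value = step(value)
    return value


rows = [{"x": [1, 2], "y": 0}, {"x": [3.5, 4], "y": 1}]
record = CompressedRecord("zlib", zlib.compress(json.dumps(rows).encode("utf-8")))
print(convert(record, [decompress, to_samples]))
```

Keeping each conversion as a separate function is what lets a configuration string name them individually and chain them in the order it lists.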
It should be noted that the purpose of acquiring the data input configuration information in step S11 is to read the binary byte stream and convert it into the required data structure or object; the data input configuration information therefore only needs to indicate these operations clearly, and the embodiment of the present invention does not limit its specific representation. Preferably, in some embodiments of the present invention, the data input configuration information may be represented as a URI (Uniform Resource Identifier), as a character string satisfying a preset format requirement, as a computer-readable descriptive document, or the like.
For example, in one embodiment of the present invention, the data input configuration information in the form of a URI may be expressed as:
hdfs://host:port/path/name?read=R120&serialization=C130&P1=P140&P2=P150
The data input configuration information represented as a URI can be divided into three parts, namely a transmission protocol, a storage location, and an operation identification set, and the computer can convert the URI into a series of operations, wherein:
the transmission protocol followed by the binary byte stream is hdfs (i.e., the Hadoop Distributed File System), the storage location of the binary byte stream is host:port/path/name, and the operations on the binary byte stream indicated in the operation identification set are read=R120&serialization=C130&P1=P140&P2=P150.
Optionally, read indicates that a data read operation is to be performed, and R120 may be one of a plurality of specific operation symbols denoting read operations. When the binary byte stream follows different protocols, different read operation symbols are required to implement the corresponding read operations; for example, R120, R121, R122, R123, and so on may each denote a different read operation.
Optionally, serialization=C130 indicates that a deserialization operation is to be performed on the binary byte stream that has been read in, thereby restoring the binary byte stream to its corresponding data structure or object (i.e., the source data structure or object). Here C130 is a specific deserialization operation adapted to the characteristics of the binary byte stream itself. Naturally, different binary byte streams and source data structures or objects correspond to different deserialization operations, e.g., C131, C132, etc.
Optionally, P1 and P2 indicate a two-step data type conversion, where P140 may indicate a specific decompression operation, and P150 may indicate a type conversion operation from the data structure or object produced by the decompression operation to the data structure or object required by a system such as a machine learning program.
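For illustration, Python's standard `urllib.parse` module can split such a URI into the three parts described above. This is a sketch under the assumption that the operation identification set is encoded as an ordinary query string; `parse_config_uri` is a hypothetical helper, and the operation symbols remain the opaque codes of the example.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_config_uri(uri: str) -> tuple[str, str, list[tuple[str, str]]]:
    """Split a URI-style configuration into protocol, location and operation set."""
    parts = urlsplit(uri)
    protocol = parts.scheme                # e.g. "hdfs"
    location = parts.netloc + parts.path   # e.g. "host:port/path/name"
    # parse_qsl keeps the query order, which can serve as the operation order.
    operations = parse_qsl(parts.query)
    return protocol, location, operations


protocol, location, ops = parse_config_uri(
    "hdfs://host:port/path/name?read=R120&serialization=C130&P1=P140&P2=P150"
)
print(protocol, location, ops)
```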
In step S12, the acquired operation identification set indicates a series of operations to be performed. Optionally, in one embodiment of the present invention, the operations corresponding to the parsed operation identification set may be executed serially. Serial execution means that the binary byte stream read operation, the deserialization operation, and each step of the type conversion operation are executed in sequence: the read data is deserialized only after the read operation has finished, and the next part of the binary byte stream may be read only after the previously read data has been taken away for deserialization. The deserialization operation and the type conversion operations follow similar ordering rules. Optionally, in another embodiment of the present invention, the operations corresponding to the parsed operation identification set may instead be executed in parallel. Parallel execution means that the binary byte stream read operation, the deserialization operation, and each step of the type conversion operation are connected through buffer queues to form a pipeline.
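The parallel mode can be sketched with one thread per operation and bounded `queue.Queue` buffers between adjacent operations. In this hypothetical `run_pipeline` helper, the feeding thread stands in for the read operation and the stage functions stand in for deserialization and type conversion; the queue size of 4 and the sentinel-based shutdown are illustrative choices, not details of the described method.

```python
import queue
import threading
from typing import Callable, Iterable

_DONE = object()  # sentinel marking the end of the stream


def run_pipeline(chunks: Iterable, stages: list[Callable]) -> list:
    """Run the feeder and each stage in its own thread, linked by buffer queues."""
    queues = [queue.Queue(maxsize=4) for _ in range(len(stages) + 1)]

    def feed():
        # Stand-in for the read operation: push raw chunks into the first buffer.
        for chunk in chunks:
            queues[0].put(chunk)
        queues[0].put(_DONE)

    def work(stage, inbox, outbox):
        # Take an item, process it, pass it on; forward the sentinel and stop.
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)
                return
            outbox.put(stage(item))

    threads = [threading.Thread(target=feed)]
    threads += [
        threading.Thread(target=work, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for t in threads:
        t.start()

    results = []
    while (item := queues[-1].get()) is not _DONE:
        results.append(item)
    for t in threads:
        t.join()
    return results


stages = [bytes.decode, str.split, lambda words: [w.upper() for w in words]]
print(run_pipeline([b"a b", b"c"], stages))
```

Because each stage is a single thread reading a FIFO queue, output order matches input order. Error propagation is omitted for brevity: if a stage raises, the downstream threads would wait indefinitely, so a production version would forward exceptions through the queues.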
The following describes in detail a processing method for input data according to an embodiment of the present invention with specific embodiments.
As shown in fig. 2, a processing method for input data according to an embodiment of the present invention may include:
S101, acquiring the data input configuration information:
"hdfs://host:port/path/name?read=R120&serialization=C130&P1=P140&P2=P150", wherein the operation identification set is "read=R120&serialization=C130&P1=P140&P2=P150";
S102, generating a read operation R120 according to the transmission protocol or the read parameter;
S103, generating a deserialization operation C130 according to the serialization parameter;
S104, generating data conversion operations P140 and P150 according to parameters P1 and P2;
S105, connecting R120, C130, P140, and P150 in series to form a data input operation flow;
S106, reading the binary byte stream from storage medium A according to the formed input operation flow to finally obtain the required data type T160; a schematic diagram of the data flow of the whole process may be as shown in fig. 3.
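Steps S101 to S106 can be sketched end to end as follows. To keep the sketch runnable without a Hadoop cluster, it uses a local `file://` address in place of the `hdfs://` address, and it assigns hypothetical meanings to the opaque operation symbols: R120 reads a local file, C130 strips a 4-byte length prefix, P140 is zlib decompression, and P150 parses CSV text into rows of floats. None of these meanings comes from the source.

```python
import csv
import io
import os
import struct
import tempfile
import zlib
from urllib.parse import parse_qsl, urlsplit


def read_local(path: str) -> bytes:
    # R120 (assumed): read the whole binary byte stream from a local file.
    with open(path, "rb") as src:
        return src.read()


def unframe(stream: bytes) -> bytes:
    # C130 (assumed): 4-byte big-endian length prefix followed by the payload.
    (length,) = struct.unpack(">I", stream[:4])
    return stream[4:4 + length]


def parse_csv(raw: bytes) -> list[list[float]]:
    # P150 (assumed): decompressed UTF-8 CSV text -> rows of floats.
    text = io.StringIO(raw.decode("utf-8"))
    return [[float(v) for v in row] for row in csv.reader(text)]


# Hypothetical meanings for the opaque operation symbols; P140 is decompression.
OPERATIONS = {"R120": read_local, "C130": unframe, "P140": zlib.decompress, "P150": parse_csv}


def build_input_flow(config: str):
    parts = urlsplit(config)                                              # S101
    steps = [OPERATIONS[symbol] for _, symbol in parse_qsl(parts.query)]  # S102-S104

    def flow():                                                           # S105
        value = parts.path  # the read operation receives the storage address
        for step in steps:
            value = step(value)
        return value

    return flow


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "samples.bin")                               # storage medium A
    payload = zlib.compress(b"1,2\n3,4.5\n")
    with open(path, "wb") as out:
        out.write(struct.pack(">I", len(payload)) + payload)
    flow = build_input_flow(f"file://{path}?read=R120&serialization=C130&P1=P140&P2=P150")
    result = flow()                                                       # S106: data of type T160
print(result)
```

Executing the operations in query-string order is a design assumption here; a system could equally fix the order by operation type.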
In a second aspect, embodiments of the present invention also provide a processing method for output data, according to which output data provided from a data processing system to various destinations can be efficiently processed with unified configuration information, thereby freeing a user from a data output processing stage and effectively improving the operating efficiency and versatility of the data processing system.
As shown in fig. 4, a processing method for output data provided by an embodiment of the present invention may include:
S21, obtaining data output configuration information, wherein the data output configuration information includes an operation identification set for performing processing operations on a source data structure or object, and the operation identification set includes a type conversion operation identifier, a serialization operation identifier, and a write operation identifier.
Since there are various data destinations and data actually output by a program in the current system does not necessarily match the data destinations, data processing operations to be performed by the system are relatively complicated.
According to the exemplary implementation of the present invention, the form of the data output configuration information may be agreed with the external data destination, so that the system can be effectively prompted to perform a series of processes for the output data through the transmission of the data output configuration information, thereby obtaining processing results suitable for various data destinations.
In this step, the data output configuration information is used to write data generated by the system to the outside. Regardless of where the external data destination is, which serialization it requires, and what data type it expects, the data output from the system can be converted into a data type suitable for that destination by the relevant operations performed based on the data output configuration information.
First, the data output configuration information may be acquired from the destination that receives the output data, either directly or over a network, or it may be generated by the system itself. The data output configuration information carries a series of related operation identifiers in a predetermined format. Here, the output data is result data generated by the system through processing such as computation, and it takes the form of a binary byte stream when written to a local storage medium such as a disk or transmitted further over a network. Thus, in this step, the data output configuration information needs to indicate the sequence of operations that starts with type conversion of the source data structure or object and ends with a binary byte stream that can be stored or transmitted. To provide this indication, in an embodiment of the present invention, the data output configuration information may include an operation identification set, which may include a series of operation identifiers, each representing one processing step on the output data. For example, in one embodiment of the invention, the operation identification set may include a type conversion operation identifier, a serialization operation identifier, and a write operation identifier. Here, the type conversion operation identifier indicates at least one type conversion operation to be performed on a source data structure or object, the serialization operation identifier indicates the preset serialization operation, and the write operation identifier indicates outputting the binary byte stream to a local or remote storage medium.
S22, parsing the operation identification set and executing the operations corresponding to the parsed operation identification set, so as to perform a type conversion operation on the source data structure or object to obtain a target data structure or object, perform the preset serialization operation on the target data structure or object to obtain a binary byte stream, and output the binary byte stream to a local or remote storage medium.
In this step, the operation identification set acquired in step S21 may be parsed to determine which data processing operations need to be performed. In general, the operation identification set may indicate how to perform type conversion operations on a source data structure or object to obtain a target data structure or object, how to perform the preset serialization operation on the target data structure or object to obtain a binary byte stream, and how to output the binary byte stream from memory to a local or remote storage medium. How to convert the source data structure or object into the target data structure or object can be indicated by the type conversion operation identifier; for example, the type conversion operation identifier can indicate one or more specific type conversion modes. How to perform the preset serialization operation on the converted target data structure or object to obtain the binary byte stream may be indicated by the serialization operation identifier in the operation identification set; for example, the serialization operation identifier may indicate a specific kind of serialization operation that is preset for producing the binary byte stream. How to output the binary byte stream to the local or remote storage medium may be indicated by the write operation identifier in the operation identification set; for example, the write operation identifier may indicate that the operation type is a write operation and/or a specific write type.
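As a sketch of the parsing described in this step, the identifiers can be sorted into the three kinds before any operation is built. This assumes that the set is encoded as a query string, that the keys `serialization` and `write` mark the serialization and write identifiers, and that every other key (such as P1, P2) names a type conversion step; the `OutputOperations` class and `classify` function are hypothetical.

```python
from dataclasses import dataclass, field
from urllib.parse import parse_qsl


@dataclass
class OutputOperations:
    type_conversions: list[str] = field(default_factory=list)  # in execution order
    serialization: str = ""
    write: str = ""


def classify(query: str) -> OutputOperations:
    """Sort an operation identification set into the three kinds of identifiers."""
    ops = OutputOperations()
    for key, symbol in parse_qsl(query):
        if key == "serialization":
            ops.serialization = symbol
        elif key == "write":
            ops.write = symbol
        else:
            # Assumption: every other key (P1, P2, ...) names a type conversion step.
            ops.type_conversions.append(symbol)
    if not ops.serialization or not ops.write:
        raise ValueError("a serialization and a write identifier are both required")
    return ops


print(classify("P1=P210&P2=P220&serialization=C230&write=W240"))
```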
The processing method for the output data provided by the embodiment of the invention can acquire the data output configuration information, indicate the preset data processing operation through the type conversion operation identifier, the serialization operation identifier and the write operation identifier in the data output configuration information so as to execute the type conversion operation on the source data structure or object to obtain the target data structure or object, perform the serialization operation on the obtained target data structure or object to obtain the binary byte stream, and output the binary byte stream to the local or remote storage medium. Therefore, various types of data output from the system can be processed in a centralized manner, so that the operation result (such as a machine learning model) of the system can be utilized by other systems, and the compatibility and the universality of the system are effectively improved.
Optionally, the output data in the embodiment of the present invention, that is, the data structure or object to be output from the system to the local or remote storage medium, may carry various kinds of machine learning data, such as sample data, model data, and/or prediction result data. Accordingly, the system can conveniently output machine learning data to different destinations. For example, when the source data structure or object is a training result of a machine learning model (e.g., an intermediate or final result of the model), the binary byte stream converted from it may be used as a machine learning model by other external platforms to continue training or to make predictions. Preferably, the whole processing method for output data may be performed by the machine learning system itself, e.g., by a separate module provided in the machine learning system, so that the computing part of the machine learning system does not need to handle any additional data processing. As another example, when the source data structure or object is sample data of a machine learning model, the binary byte stream converted from it can be used by other external platforms to continue training or to make predictions. In this way, different machine learning systems or platforms can cooperate in training machine learning models or in using the results of model training, which overcomes the prior-art defect that the processing results of one machine learning platform cannot be used by another.
Specifically, because the concrete operation steps may differ for different source data structures or objects and different binary byte streams, and the data output configuration information must indicate the specific operations to be performed on a specific source data structure or object, the data output configuration information obtained in step S21 may be specific to the data at hand. This specificity may be provided through user configuration. For example, in one embodiment of the present invention, obtaining the data output configuration information may include obtaining it by detecting a configuration operation performed by a user on a graphical user interface.
Further, so that the converted binary byte stream (e.g., a binary byte string) can be output smoothly to the local or remote storage medium, the data output configuration information acquired in step S21 may also include the address of the local or remote storage medium and/or a transmission protocol for the binary byte stream. For example, the address of a remote storage medium may be the address of any of various public or private clouds. The transmission protocol for a binary byte stream refers to the protocol that the binary byte stream follows during storage or transmission. Optionally, the binary byte stream in the embodiment of the present invention may follow any of various protocols during storage or transmission, and correspondingly the transmission protocol may indicate various kinds of data destinations, for example a transmission protocol corresponding to a relational database, a non-relational database, a message queue, or a log data stream; the embodiment of the present invention is not limited in this respect.
For example, in an embodiment of the present invention, depending on where the binary byte stream is output to, the transmission protocol may indicate that the binary byte stream follows the protocol of a relational database such as mysql, oracle, or db2, of a NoSQL database such as redis, mongodb, or hbase, or of a message queue or log collection data stream such as kafka, logstash, or flume.
As can be seen from the data processing method for output data provided in the embodiment of the present invention, a source data structure or object generated in a system such as a machine learning platform is converted into a target data structure or object through a type conversion operation, the target data structure or object is converted into a corresponding binary byte stream through a serialization operation, and then the binary byte stream is output to a local or remote storage medium for use by another platform (e.g., another machine learning system) or the like. Optionally, the type conversion from the source data structure or object to the target data structure or object may be performed by one type conversion operation or may be performed by multiple type conversion operations.
In particular, in one embodiment of the invention, the type conversion operation may include a compression operation, which makes the design of the data output configuration information more concise. For example, the local or remote storage medium may require a binary byte stream corresponding to a data structure or object in a compressed format, i.e., the data structure or object obtained by type conversion must be compressed before the serialization operation. In this case, the type conversion operation includes a 2-step conversion, namely an operation that converts the source data structure or object into another data structure or object, and an operation that compresses the resulting data structure or object into the target data structure or object.
It should be noted that the purpose of acquiring the data output configuration information in step S21 is to convert the data structure or object into the required binary byte stream and output that stream; the data output configuration information therefore only needs to indicate these operations clearly, and the embodiment of the present invention does not limit its specific representation. Preferably, in some embodiments of the present invention, the data output configuration information may be represented as a URI (Uniform Resource Identifier), as a character string satisfying a preset format requirement, as a computer-readable descriptive document, or the like.
For example, in one embodiment of the present invention, the data output configuration information in the form of URI may be expressed as:
hdfs://host:port/path/name?P1=P210&P2=P220&serialization=C230&write=W240
The data output configuration information represented as a URI can be divided into three parts, namely a transmission protocol, a storage location, and an operation identification set, and the computer can convert the URI into a series of operations, wherein:
the transmission protocol followed by the binary byte stream is hdfs (i.e., the Hadoop Distributed File System), the destination of the source data structure or object is host:port/path/name, and the operations indicated in the operation identification set are P1=P210&P2=P220&serialization=C230&write=W240.
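Since the data output configuration information may also be generated by the system itself, a minimal sketch of composing it with Python's `urllib.parse.urlencode` and `urlunsplit` is given here. The helper `make_output_config` is hypothetical; it assumes the location is given as `authority/path` and that the order of the operation list is the execution order, which `urlencode` preserves.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def make_output_config(protocol: str, location: str, operations: list[tuple[str, str]]) -> str:
    """Compose URI-style data output configuration; list order = execution order."""
    netloc, _, path = location.partition("/")
    return urlunsplit((protocol, netloc, "/" + path, urlencode(operations), ""))


operations = [("P1", "P210"), ("P2", "P220"), ("serialization", "C230"), ("write", "W240")]
config = make_output_config("hdfs", "host:port/path/name", operations)
print(config)

# Parsing the generated string recovers the same ordered operation set.
print(parse_qsl(urlsplit(config).query) == operations)
```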
Optionally, P1 and P2 indicate a two-step data type conversion, where P210 may indicate a conversion operation from the source data structure or object to an intermediate data structure or object, and P220 may indicate a specific compression operation that compresses the intermediate data structure or object into the target data structure or object.
Optionally, serialization=C230 indicates that the target data structure or object is to be serialized into a binary byte stream. Here C230 is a specific serialization operation adapted to the characteristics of the binary byte stream itself. Naturally, different binary byte streams and target data structures or objects may correspond to different serialization operations, such as C231, C232, etc.
Optionally, write=W240 indicates that a data write operation is to be performed, and W240 may be one of a plurality of specific operation symbols denoting write operations. When the binary byte stream follows different protocols, different write operation symbols are required to implement the corresponding write operations; for example, W240, W241, W242, W243, and so on may each denote a different write operation.
In step S22, the acquired operation identification set indicates a series of operations to be performed. Optionally, in one embodiment of the present invention, the operations corresponding to the parsed operation identification set may be executed serially. Serial execution means that the type conversion operation, the serialization operation, and the binary byte stream write operation are executed in sequence: the converted data can be serialized only after the type conversion operation has finished, and the next part of the data may be type-converted only after the converted data has been taken away for serialization. The serialization and write operations follow similar ordering rules. Optionally, in another embodiment of the present invention, the operations corresponding to the parsed operation identification set may instead be executed in parallel. Parallel execution means that the step-by-step type conversion operations, the serialization operation, and the binary byte stream write operation are connected through buffer queues to form a pipeline in which each operation can proceed independently. For example, after the type conversion operation has been performed on the current data, the converted data can be placed into a buffer queue and the next part of the data can be type-converted immediately, without waiting for the current data to be taken away for serialization, which effectively improves data processing efficiency.
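The serial mode maps naturally onto lazy iterators: chaining Python's built-in `map` means each chunk is pulled through type conversion, serialization, and writing before the next chunk is converted. In this sketch the stages are stand-ins chosen only to make the ordering visible (`str.upper` for type conversion, `str.encode` for serialization, and `len` in place of a write that reports the number of bytes written); `serial_pipeline` and `step` are hypothetical helpers.

```python
from typing import Callable, Iterable, Iterator


def serial_pipeline(chunks: Iterable, stages: list[Callable]) -> Iterator:
    """Chain stages lazily so each chunk passes through every stage in turn."""
    stream: Iterable = chunks
    for stage in stages:
        stream = map(stage, stream)  # nothing runs until the result is consumed
    return iter(stream)


log: list[str] = []


def step(name: str, fn: Callable) -> Callable:
    # Wrap a stage so that every call is recorded in `log`.
    def run(x):
        log.append(f"{name}({x!r})")
        return fn(x)
    return run


stages = [step("convert", str.upper), step("serialize", str.encode), step("write", len)]
written = list(serial_pipeline(["ab", "c"], stages))
print(log)
print(written)
```

The log shows the second chunk is converted only after the first has been written. For the parallel mode, each stage would instead run in its own thread with a buffer queue between stages, so conversion of the next chunk could start while the previous one is still being serialized.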
The following describes in detail a processing method for output data according to an embodiment of the present invention with reference to specific embodiments.
As shown in fig. 5, a method for processing output data according to an embodiment of the present invention may include:
S201, acquiring the data output configuration information:
"hdfs://host:port/path/name?P1=P210&P2=P220&serialization=C230&write=W240", wherein the operation identification set is "P1=P210&P2=P220&serialization=C230&write=W240";
S202, generating data conversion operations P210 and P220 according to parameters P1 and P2;
S203, generating a serialization operation C230 according to the serialization parameter;
S204, generating a write operation W240 according to the transmission protocol or the write parameter;
S205, connecting P210, P220, C230, and W240 in series to form a data output operation flow;
S206, outputting the data T200 to be stored to storage medium B according to the formed output operation flow; a schematic diagram of the whole data flow may be as shown in fig. 6.
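Steps S201 to S206 can be sketched in the same spirit. A local `file://` address replaces `hdfs://` so the sketch runs without a Hadoop cluster, and the operation symbols are given hypothetical meanings that are not from the source: P210 turns a dictionary of model weights into sorted name/value pairs, P220 JSON-encodes and zlib-compresses them, C230 adds a 4-byte big-endian length prefix, and W240 writes the bytes to a local file.

```python
import json
import os
import struct
import tempfile
import zlib
from urllib.parse import parse_qsl, urlsplit


def to_pairs(weights: dict) -> list:
    # P210 (assumed): source dict -> sorted [name, value] pairs (intermediate structure).
    return [[name, float(value)] for name, value in sorted(weights.items())]


def compress(pairs: list) -> bytes:
    # P220 (assumed): JSON-encode and zlib-compress into the target structure.
    return zlib.compress(json.dumps(pairs).encode("utf-8"))


def frame(target: bytes) -> bytes:
    # C230 (assumed): 4-byte big-endian length prefix followed by the payload.
    return struct.pack(">I", len(target)) + target


OPERATIONS = {"P210": to_pairs, "P220": compress, "C230": frame}


def build_output_flow(config: str):
    parts = urlsplit(config)                                          # S201
    ops = parse_qsl(parts.query)
    steps = [OPERATIONS[sym] for key, sym in ops if key != "write"]   # S202, S203
    if dict(ops).get("write") != "W240":                              # S204
        raise ValueError("only the assumed local-file write W240 is sketched")

    def flow(data) -> int:                                            # S205
        for step in steps:
            data = step(data)
        with open(parts.path, "wb") as out:  # W240 (assumed): write a local file
            return out.write(data)

    return flow


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.bin")                             # storage medium B
    flow = build_output_flow(f"file://{path}?P1=P210&P2=P220&serialization=C230&write=W240")
    written = flow({"w2": 0.5, "w1": -1.25})                          # S206: output data T200
    with open(path, "rb") as src:
        stream = src.read()
print(written, len(stream))
```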
In a third aspect, an embodiment of the present invention further provides a processing apparatus for input data, which can effectively process input data from various sources entering the processing apparatus by using unified configuration information, so as to liberate a user from a data preprocessing stage, and effectively improve the operating efficiency and the universality of the processing apparatus.
As shown in fig. 7, a processing apparatus for input data according to an embodiment of the present invention includes:
a configuration information obtaining unit 31, configured to obtain data input configuration information, where the data input configuration information includes an operation identifier set for performing a processing operation on a binary byte stream, and the operation identifier set includes a read operation identifier, a deserialization operation identifier, and a type conversion operation identifier;
a processing unit 32, configured to parse the operation identifier set, perform an operation corresponding to the parsed operation identifier set, to read a binary byte stream from a local or remote storage medium, perform a preset deserialization operation on the read binary byte stream to obtain a source data structure or object, and perform a type conversion operation on the source data structure or object to obtain a target data structure or object;
wherein the read operation identification indicates reading a binary byte stream from a local or remote storage medium, the deserialization operation identification indicates the preset deserialization operation, and the type conversion operation identification indicates at least one type conversion operation to be performed on the source data structure or object.
Optionally, the data input configuration information further includes an address of the local or remote storage medium and/or a transmission protocol for the binary byte stream.
Optionally, the transmission protocol indicates a transmission protocol corresponding to a relational database, a non-relational database, a message queue, or a log data stream.
Optionally, the target data structure or object is used as a training sample for a machine learning model.
Optionally, the processing device is integrated in a machine learning system.
Optionally, the type conversion operation comprises a decompression operation.
Optionally, the data input configuration information is represented as a uniform resource identifier.
Optionally, the processing unit 32 performs the operations corresponding to the parsed operation identification set in series or in parallel.
Optionally, the configuration information obtaining unit 31 acquires the data input configuration information by detecting a configuration operation performed by the user on a graphical user interface.
In a fourth aspect, embodiments of the present invention further provide a processing apparatus for output data, which can effectively process output data provided from the processing apparatus to various destinations by using unified configuration information, thereby freeing a user from a data output processing stage, and effectively improving the operating efficiency and the versatility of the processing apparatus.
As shown in fig. 8, a processing apparatus for output data according to an embodiment of the present invention may include:
a configuration information obtaining unit 41, configured to obtain data output configuration information, where the data output configuration information includes an operation identifier set for performing a processing operation on a source data structure or object, and the operation identifier set includes a type conversion operation identifier, a serialization operation identifier, and a write operation identifier;
a processing unit 42 for parsing the operation identification set, performing an operation corresponding to the parsed operation identification set to perform a type conversion operation on the source data structure or object to obtain a target data structure or object, performing the preset serialization operation on the target data structure or object to obtain a binary byte stream, and outputting the binary byte stream to a local or remote storage medium,
wherein the type conversion operation identifier indicates at least one type conversion operation to be performed on a source data structure or object, the serialization operation identifier indicates the preset serialization operation, and the write operation identifier indicates outputting a binary byte stream to a local or remote storage medium.
Optionally, the data output configuration information further includes an address of the local or remote storage medium and/or a transmission protocol for the binary byte stream.
Optionally, the transmission protocol indicates a transmission protocol corresponding to a relational database, a non-relational database, a message queue, or a log data stream.
Optionally, the source data structure or the object is a training result of a machine learning model.
Optionally, the processing device is integrated in a machine learning system.
Optionally, the type conversion operation comprises a compression operation.
Optionally, the data output configuration information is represented as a uniform resource identifier.
Optionally, processing unit 42 performs operations corresponding to the parsed operation identification set in series or in parallel.
Optionally, the configuration information obtaining unit 41 acquires the data output configuration information by detecting a configuration operation performed by the user on a graphical user interface.
It should be noted that the two processing devices may be combined into a single device that processes both input data and output data. Accordingly, the configuration information acquisition unit acquires both the data input configuration information and the data output configuration information, and the processing unit performs both the corresponding input data processing and output data processing. Since the processing performed by each of the above units has been described in detail in the method embodiments, it is not repeated here.
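When the two devices are combined, the input direction mirrors the output direction: read a byte stream, deserialize it with the preset method, then apply the type conversions. The sketch below pairs both directions in one hypothetical `UnifiedProcessor` class, assuming JSON as the preset (de)serialization and a local temporary file as the storage medium.

```python
import json
import os
import tempfile
from typing import Any, Callable, List

Conversion = Callable[[Any], Any]


class UnifiedProcessor:
    """Hypothetical unified device: one object handles output and input data."""

    def write_output(self, obj: Any, conversions: List[Conversion], path: str) -> None:
        for convert in conversions:                 # type conversion operation(s)
            obj = convert(obj)
        stream = json.dumps(obj).encode("utf-8")    # assumed preset serialization
        with open(path, "wb") as f:                 # write operation
            f.write(stream)

    def read_input(self, path: str, conversions: List[Conversion]) -> Any:
        with open(path, "rb") as f:                 # read operation
            stream = f.read()
        obj = json.loads(stream.decode("utf-8"))    # assumed preset deserialization
        for convert in conversions:                 # type conversion operation(s)
            obj = convert(obj)
        return obj


proc = UnifiedProcessor()
path = os.path.join(tempfile.gettempdir(), "unified_demo.json")
# Output side: attach a weight count before writing.
proc.write_output({"weights": [1, 2, 3]}, [lambda m: {**m, "n": len(m["weights"])}], path)
# Input side: convert the stored structure into a list used as a sample.
sample = proc.read_input(path, [lambda m: [w * 2 for w in m["weights"]]])
```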
Further, as shown in fig. 9, an electronic device provided by an embodiment of the present invention may include a housing 51, a processor 52, a memory 53, a circuit board 54 and a power supply circuit 55. The circuit board 54 is arranged inside the space enclosed by the housing 51, and the processor 52 and the memory 53 are arranged on the circuit board 54; the power supply circuit 55 supplies power to each circuit or component of the electronic device; the memory 53 stores executable program code; and the processor 52 reads the executable program code stored in the memory 53 and runs the corresponding program so as to execute the processing method for input data and/or the processing method for output data provided in any of the foregoing embodiments.
For the specific process by which the processor 52 executes the above steps, and for further steps that the processor 52 executes by running the executable program code, reference may be made to the description of the foregoing embodiments; the details are not repeated here.
The electronic device may exist in various forms, and may have a single or distributed computing structure, which is not limited in the present invention.
In a fifth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where one or more programs are stored, and the one or more programs can be executed by one or more processors to implement any one of the processing methods for input data and/or the processing method for output data provided in the foregoing embodiments, so that corresponding technical effects can also be achieved, which have been described in detail above and are not described herein again.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments.
In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
For convenience of description, the above devices are described separately in terms of their division into functional units/modules. Of course, when implementing the invention, the functions of the units/modules may be implemented in one or more pieces of software and/or hardware.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (38)

1. A method for processing input data, comprising:
acquiring data input configuration information, wherein the data input configuration information comprises an operation identification set for processing a binary byte stream, and the operation identification set comprises a read operation identifier, a deserialization operation identifier and a type conversion operation identifier; the format of the data input configuration information is agreed upon with an external data source, so that transmitting the data input configuration information prompts the system to process the input data;
parsing the operation identification set, executing an operation corresponding to the parsed operation identification set to read the binary byte stream from the local or remote storage medium, executing a preset deserialization operation on the read binary byte stream to obtain a source data structure or object, and executing a type conversion operation on the source data structure or object to obtain a target data structure or object;
wherein the read operation identifier indicates reading a binary byte stream from a local or remote storage medium, the deserialization operation identifier indicates the preset deserialization operation, and the type conversion operation identifier indicates at least one type conversion operation to be performed on a source data structure or object;
the acquiring data input configuration information includes: the data input configuration information is obtained from an external data input source directly or through a network, or is generated in the system itself.
2. The processing method according to claim 1,
the data input configuration information further includes an address of the local or remote storage medium and/or a transmission protocol for the binary byte stream.
3. The processing method according to claim 2, wherein the transport protocol indicates a transport protocol corresponding to a relational database, a non-relational database, a message queue, or a log data stream.
4. The processing method according to claim 1, wherein the target data structure or object is used as a training sample for a machine learning model.
5. The processing method according to claim 4, wherein the processing method is performed by a machine learning system.
6. The processing method according to any of claims 1 to 5, wherein the type conversion operation comprises a decompression operation.
7. The processing method according to any of claims 1 to 5, wherein the data input configuration information is represented as a uniform resource identifier.
8. The processing method according to any one of claims 1 to 5, wherein the performing the operation corresponding to the parsed operation identification set comprises: operations corresponding to the parsed operation identification set are performed serially or in parallel.
9. The processing method according to any one of claims 1 to 5, wherein the acquiring data input configuration information comprises: obtaining the data input configuration information by detecting a configuration operation performed by a user on a graphical user interface.
10. A method for processing output data, comprising:
acquiring data output configuration information, wherein the data output configuration information comprises an operation identification set for processing a source data structure or object, and the operation identification set comprises a type conversion operation identifier, a serialization operation identifier and a write operation identifier; the format of the data output configuration information is agreed upon with an external data destination, so that transmitting the data output configuration information prompts the system to process the output data;
parsing the operation identification set, performing an operation corresponding to the parsed operation identification set to perform a type conversion operation on the source data structure or object to obtain a target data structure or object, performing a preset serialization operation on the target data structure or object to obtain a binary byte stream, and outputting the binary byte stream to a local or remote storage medium,
wherein the type conversion operation identifier indicates at least one type conversion operation to be performed on a source data structure or object, the serialization operation identifier indicates the preset serialization operation, and the write operation identifier indicates outputting a binary byte stream to a local or remote storage medium;
the acquiring data output configuration information includes: the data output configuration information is obtained directly or through a network from a destination that receives the output data, or is generated at the system itself.
11. The processing method according to claim 10,
the data output configuration information further includes an address of the local or remote storage medium and/or a transmission protocol for the binary byte stream.
12. The processing method of claim 11, wherein the transport protocol indicates a transport protocol corresponding to a relational database, a non-relational database, a message queue, or a log data stream.
13. The processing method according to claim 10, wherein the source data structure or object is a training result of a machine learning model.
14. The processing method according to claim 13, wherein the processing method is performed by a machine learning system.
15. The processing method according to any of claims 10 to 14, wherein the type conversion operation comprises a compression operation.
16. The processing method according to any of claims 10 to 14, wherein the data output configuration information is represented as a uniform resource identifier.
17. The processing method according to any one of claims 10 to 14, wherein the performing the operation corresponding to the parsed operation identification set comprises: operations corresponding to the parsed operation identification set are performed serially or in parallel.
18. The processing method according to any one of claims 10 to 14, wherein the acquiring data output configuration information comprises: obtaining the data output configuration information by detecting a configuration operation performed by a user on a graphical user interface.
19. A processing apparatus for input data, comprising:
a configuration information acquisition unit, configured to acquire data input configuration information, wherein the data input configuration information comprises an operation identification set for processing a binary byte stream, and the operation identification set comprises a read operation identifier, a deserialization operation identifier and a type conversion operation identifier; the format of the data input configuration information is agreed upon with an external data source, so that transmitting the data input configuration information prompts the system to process the input data;
a processing unit, configured to parse the operation identification set and execute the operations corresponding to the parsed operation identification set, so as to read the binary byte stream from a local or remote storage medium, execute a preset deserialization operation on the read binary byte stream to obtain a source data structure or object, and execute a type conversion operation on the source data structure or object to obtain a target data structure or object;
wherein the read operation identifier indicates reading a binary byte stream from a local or remote storage medium, the deserialization operation identifier indicates the preset deserialization operation, and the type conversion operation identifier indicates at least one type conversion operation to be performed on a source data structure or object;
the configuration information obtaining unit is specifically configured to obtain the data input configuration information directly or through a network from an external data input source, or generate the data input configuration information in the system itself.
20. The processing apparatus according to claim 19,
the data input configuration information further includes an address of the local or remote storage medium and/or a transmission protocol for the binary byte stream.
21. The processing apparatus according to claim 20, wherein the transport protocol indicates a transport protocol corresponding to a relational database, a non-relational database, a message queue, or a log data stream.
22. The processing apparatus according to claim 19, wherein the target data structure or object is used as a training sample for a machine learning model.
23. The processing device of claim 22, wherein the processing device is integrated in a machine learning system.
24. The processing apparatus according to any of the claims 19 to 23, wherein the type conversion operation comprises a decompression operation.
25. The processing apparatus according to any of claims 19 to 23, wherein the data input configuration information is represented as a uniform resource identifier.
26. The processing apparatus according to any of claims 19 to 23, wherein the processing unit performs operations corresponding to the parsed operation identification set in series or in parallel.
27. The processing apparatus according to any one of claims 19 to 23, wherein the configuration information acquisition unit acquires the data input configuration information by detecting a configuration operation performed by a user on a graphical user interface.
28. A processing apparatus for output data, comprising:
a configuration information acquisition unit, configured to acquire data output configuration information, wherein the data output configuration information comprises an operation identification set for processing a source data structure or object, and the operation identification set comprises a type conversion operation identifier, a serialization operation identifier and a write operation identifier; the format of the data output configuration information is agreed upon with an external data destination, so that transmitting the data output configuration information prompts the system to process the output data;
a processing unit for parsing the operation identification set, performing an operation corresponding to the parsed operation identification set to perform a type conversion operation on the source data structure or object to obtain a target data structure or object, performing a preset serialization operation on the target data structure or object to obtain a binary byte stream, and outputting the binary byte stream to a local or remote storage medium,
wherein the type conversion operation identifier indicates at least one type conversion operation to be performed on a source data structure or object, the serialization operation identifier indicates the preset serialization operation, and the write operation identifier indicates outputting a binary byte stream to a local or remote storage medium;
the configuration information obtaining unit is specifically configured to obtain the data output configuration information from a destination that receives output data directly or through a network, or generate the data output configuration information at the system itself.
29. The processing apparatus of claim 28,
the data output configuration information further includes an address of the local or remote storage medium and/or a transmission protocol for the binary byte stream.
30. The processing apparatus according to claim 29, wherein the transport protocol indicates a transport protocol corresponding to a relational database, a non-relational database, a message queue, or a log data stream.
31. The processing apparatus according to claim 28, wherein the source data structure or object is a training result of a machine learning model.
32. The processing device of claim 31, wherein the processing device is integrated in a machine learning system.
33. The processing apparatus according to any of the claims 28 to 32, wherein the type conversion operation comprises a compression operation.
34. The processing apparatus according to any of claims 28 to 32, wherein the data output configuration information is represented as a uniform resource identifier.
35. The processing apparatus according to any of claims 28 to 32, wherein the processing unit performs operations corresponding to the parsed operation identification set in series or in parallel.
36. The processing apparatus according to any one of claims 28 to 32, wherein the configuration information acquisition unit acquires the data output configuration information by detecting a configuration operation performed by a user on a graphical user interface.
37. An electronic device, characterized in that the electronic device comprises a housing, a processor, a memory, a circuit board and a power supply circuit, wherein the circuit board is arranged in the space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power supply circuit is configured to supply power to each circuit or component of the electronic device; the memory is configured to store executable program code; and the processor reads the executable program code stored in the memory and runs the corresponding program, so as to execute the processing method for input data of any one of claims 1 to 9, or the processing method for output data of any one of claims 10 to 18.
38. A computer-readable storage medium, characterized in that the computer-readable storage medium stores one or more programs which are executable by one or more processors to implement the processing method for input data of any one of the preceding claims 1 to 9; or to implement a processing method for output data as claimed in any of the preceding claims 10 to 18.
CN201810015586.7A 2018-01-08 2018-01-08 Processing method and device for input data and output data Active CN108153896B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810015586.7A CN108153896B (en) 2018-01-08 2018-01-08 Processing method and device for input data and output data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810015586.7A CN108153896B (en) 2018-01-08 2018-01-08 Processing method and device for input data and output data

Publications (2)

Publication Number Publication Date
CN108153896A 2018-06-12
CN108153896B 2020-07-10

Family

ID=62461185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810015586.7A Active CN108153896B (en) 2018-01-08 2018-01-08 Processing method and device for input data and output data

Country Status (1)

Country Link
CN (1) CN108153896B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582390A (en) * 2018-11-29 2019-04-05 上海哔哩哔哩科技有限公司 Game data generation method, device and storage medium based on exploitation allocation list
CN110705714B (en) * 2019-09-27 2022-07-22 上海联影医疗科技股份有限公司 Deep learning model detection method, deep learning platform and computer equipment
CN111026736B (en) * 2019-12-13 2024-03-12 中盈优创资讯科技有限公司 Data lineage management method and device and data lineage analysis method and device

Citations (3)

Publication number Priority date Publication date Assignee Title
CN103092980A (en) * 2013-01-31 2013-05-08 中国科学院自动化研究所 Method and system of data automatic conversion and storage
CN105335412A (en) * 2014-07-31 2016-02-17 阿里巴巴集团控股有限公司 Method and device for data conversion and data migration
CN106570018A (en) * 2015-10-10 2017-04-19 阿里巴巴集团控股有限公司 Serialization method and apparatus, deserialization method and apparatus, serialization and deserialization system, and electronic device

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN104077335B (en) * 2013-05-07 2017-05-03 腾讯科技(深圳)有限公司 Methods, devices and system for serializing and deserializing structured data

Also Published As

Publication number Publication date
CN108153896A (en) 2018-06-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant