CN111813846A - Data analysis processing system and data processing method - Google Patents

Publication number
CN111813846A
Authority
CN
China
Prior art keywords
data
data structure
dynamic
type
field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010611247.2A
Other languages
Chinese (zh)
Other versions
CN111813846B (en)
Inventor
焦悦光
胡宗星
邱剑生
郭璐
崔静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zetyun Tech Co ltd
Original Assignee
Beijing Zetyun Tech Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zetyun Tech Co ltd
Priority to CN202010611247.2A
Publication of CN111813846A
Application granted
Publication of CN111813846B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data analysis processing system and a data processing method. The method comprises: obtaining input data of a first data structure of a stream task; converting the input data of the first data structure into intermediate data of a second data structure; and processing the intermediate data by using an operator of the stream task and outputting a calculation result, wherein the second data structure includes a static data region and a dynamic data region. The data analysis processing system of the embodiments can process dynamic data and complex data, which improves data processing efficiency.

Description

Data analysis processing system and data processing method
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data analysis processing system and a data processing method.
Background
In recent years, big data processing and analysis have become a global challenge. As the economy and society become increasingly informatized and automated, many fields, such as government management, public services, scientific research, and commercial applications, face big data problems and need targeted, cost-effective solutions. A big data platform provides processing capacity for industry big data and integrates functions such as data access, data processing, data storage, query and retrieval, analysis and mining, and application interfaces.
Existing data analysis processing systems can process only single-layer data or static data; they cannot process dynamic data or complex (nested) data, so their data processing efficiency is low and the types of data they can process are limited.
Disclosure of Invention
The embodiment of the invention provides a data analysis processing system and a data processing method, which solve the problems of low data processing efficiency and single type of processed data of the conventional data analysis processing system.
In order to solve the above technical problem, the present invention provides a data processing method applied to a data analysis processing system, the method comprising:
obtaining input data of a first data structure of a streaming task;
converting input data of the first data structure into intermediate data of a second data structure;
processing the intermediate data by using an operator of the stream task, and outputting a processing result;
wherein the second data structure includes a static data region and a dynamic data region.
Preferably, in the above method, the converting the input data of the first data structure into the intermediate data of the second data structure includes:
acquiring the data type of the input data;
and converting the input data of the first data structure into intermediate data of a second data structure according to the data type.
Preferably, in the above method, the converting the input data of the first data structure into the intermediate data of the second data structure according to the data type includes:
determining a target data type corresponding to each field in the second data structure according to the original data type of each field of the input data, wherein the target data type comprises a static data type and a dynamic data type;
numbering the static data and the dynamic data in the second data structure together, in sequence, to obtain static area indexes, and numbering the dynamic data separately, in sequence, to obtain dynamic area indexes;
and converting the input data of the first data structure into intermediate data of a second data structure according to the static area index, the dynamic area index and the corresponding target data type of each field in the second data structure.
Preferably, in the above method, the step of determining, according to the original data type of each field of the input data, a corresponding target data type of each field in the second data structure includes:
sub-step a: if the original data type of a field of the input data is static and the data type is a scalar, marking the field as static data;
sub-step b: if the original data type of a field of the input data is static and the data type is a non-scalar, recursively repeating sub-steps a and b for each sub-field of the field;
sub-step c: if the original data type of a field of the input data is dynamic and the number and names of the sub-fields contained in the field are determined, recursively repeating sub-steps a, b and c for each sub-field of the field;
sub-step d: if the original data type of a field of the input data is dynamic and the number or names of the sub-fields of the field are undetermined, marking the field as dynamic data.
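The four sub-steps above can be sketched as a recursive walk over a schema. This is a minimal illustration only: the `SchemaField` class, the dotted field paths, and the convention that a scalar is a static field without sub-fields are assumptions made for the example and do not come from the source. The recursion also applies every sub-step to each sub-field, so a dynamic sub-field with undetermined sub-fields is caught by sub-step d.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SchemaField:
    """Hypothetical schema node used only for this illustration."""
    is_static: bool  # True if the field's original data type is static
    # Known sub-fields; empty for scalars and for open-ended dynamic fields.
    subfields: Dict[str, "SchemaField"] = field(default_factory=dict)


def mark_field(path: str, node: SchemaField, marks: List[Tuple[str, str]]) -> None:
    """Append (field path, "static" or "dynamic") for each leaf under node."""
    if node.is_static and not node.subfields:
        marks.append((path, "static"))       # sub-step a: static scalar
    elif node.subfields:
        # sub-steps b and c: the sub-fields are known, so recurse into them
        for name, child in node.subfields.items():
            mark_field(f"{path}.{name}", child, marks)
    else:
        marks.append((path, "dynamic"))      # sub-step d: sub-fields undetermined


# Example schema modelled on the "student achievement" structure.
schema = {
    "student_id": SchemaField(is_static=True),
    "name": SchemaField(is_static=True),
    "achievement": SchemaField(is_static=False, subfields={
        "chinese": SchemaField(is_static=True),
        "mathematics": SchemaField(is_static=True),
        "english": SchemaField(is_static=True),
        "other_subjects": SchemaField(is_static=False),  # open key-value data
    }),
}

marks: List[Tuple[str, str]] = []
for field_name, node in schema.items():
    mark_field(field_name, node, marks)
print(marks)
```

Only the open-ended "other_subjects" field ends up marked as dynamic; every other leaf is marked as static.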
Preferably, in the above method, before the step of converting the input data of the first data structure into the intermediate data of the second data structure according to the static area index, the dynamic area index, and the data type corresponding to each field in the second data structure, the method further includes:
establishing a static data area with a corresponding length according to the number of the static area indexes;
and establishing a dynamic data area with a corresponding length according to the number of the dynamic area indexes.
Preferably, in the above method, the static data area is a variable-length array, and the dynamic data area is a variable-length array.
Preferably, in the above method, the step of converting the input data of the first data structure into the intermediate data of the second data structure according to the static area index, the dynamic area index, and the data type corresponding to each field in the second data structure includes:
mapping the value of each field marked as static data into the array element of the static data area whose subscript is the static area index corresponding to that field;
mapping the value of each field marked as dynamic data into the array element of the dynamic data area whose subscript is the dynamic area index corresponding to that field;
and setting the value of the array element of the static data area whose subscript is the static area index corresponding to the field of the dynamic data to that field's dynamic area index.
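The mapping described above can be sketched as follows. The index tables, the dotted field paths, and the helper names `get_path` and `convert` are hypothetical and chosen for illustration; the sketch assumes the indexes have already been assigned, with every field holding a static area index and only the dynamic field holding a dynamic area index.

```python
from typing import Any, Dict, List, Tuple

# Assumed, pre-computed indexes (hypothetical values for this example).
STATIC_INDEX: Dict[str, int] = {
    "student_id": 0,
    "name": 1,
    "achievement.chinese": 2,
    "achievement.mathematics": 3,
    "achievement.english": 4,
    "achievement.other_subjects": 5,
}
DYNAMIC_INDEX: Dict[str, int] = {"achievement.other_subjects": 0}


def get_path(record: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'achievement.chinese' into a nested dict."""
    value: Any = record
    for key in path.split("."):
        value = value[key]
    return value


def convert(record: Dict[str, Any]) -> Tuple[List[Any], List[Any]]:
    """Convert a nested record into (static_area, dynamic_area)."""
    # Create each area with a length equal to the number of its indexes.
    static_area: List[Any] = [None] * len(STATIC_INDEX)
    dynamic_area: List[Any] = [None] * len(DYNAMIC_INDEX)
    for path, s_idx in STATIC_INDEX.items():
        value = get_path(record, path)
        if path in DYNAMIC_INDEX:
            d_idx = DYNAMIC_INDEX[path]
            dynamic_area[d_idx] = value   # dynamic value goes to the dynamic area
            static_area[s_idx] = d_idx    # static slot stores the dynamic index
        else:
            static_area[s_idx] = value    # static value is stored directly
    return static_area, dynamic_area


record = {
    "student_id": "S001",
    "name": "Li",
    "achievement": {
        "chinese": 90,
        "mathematics": 85,
        "english": 88,
        "other_subjects": {"physics": 92},
    },
}
static_area, dynamic_area = convert(record)
print(static_area, dynamic_area)
```

In this sketch the last static slot holds `0`, the subscript of the key-value data in the dynamic area, so an operator can reach the dynamic value through the static area.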
Preferably, in the above method, the acquiring the data type of the input data includes:
obtaining a data type of the input data based on a user configuration input; or
determining a data type of the input data based on a pre-established data type prediction model.
Preferably, in the above method, the input data includes nested data and/or dynamic data.
Preferably, in the above method, after the step of converting the input data of the first data structure into the intermediate data of the second data structure, the method further includes:
determining a computing mode of the stream task based on the target data type;
the step of processing the intermediate data by using the operator of the stream task and outputting a processing result comprises the following steps:
and processing the intermediate data by using the operator of the stream task based on the calculation mode, and outputting a processing result.
Preferably, in the above method, before the step of obtaining the input data of the first data structure of the streaming task, the method further includes: and acquiring input data of the streaming task, and performing deserialization processing on the input data.
Preferably, in the method, the step of processing the intermediate data by using an operator of the stream task and outputting a processing result includes:
accessing a value corresponding to the intermediate data through the subscript of the array element by using an operator of the stream task;
calculating by using the value to obtain a calculation result;
and converting the calculation result into data of the first data structure to obtain output data.
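As an illustration of these three steps, the sketch below shows a hypothetical operator that reads values by array subscript, computes a total, and returns the result as a field-name/value record. The subscripts, the sample values, and the function name `total_score_operator` are assumptions made for the example.

```python
from typing import Any, Dict, List

# Assumed intermediate data: slots 0-4 hold scalars, slot 5 holds the
# subscript of the key-value data in the dynamic area.
static_area: List[Any] = ["S001", "Li", 90, 85, 88, 0]
dynamic_area: List[Dict[str, int]] = [{"physics": 92}]


def total_score_operator(static: List[Any], dynamic: List[Dict[str, int]]) -> Dict[str, Any]:
    """Sum all scores, reading every value by array subscript."""
    total = static[2] + static[3] + static[4]   # direct subscript access
    other_scores = dynamic[static[5]]           # follow the stored dynamic index
    total += sum(other_scores.values())
    # Convert the calculation result back into a field-name/value record.
    return {"student_id": static[0], "total": total}


output = total_score_operator(static_area, dynamic_area)
print(output)
```

Because each field is reached by a fixed subscript, the operator does not need to look up field names in the nested input while it computes.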
Preferably, in the above method, after the step of converting the calculation result into the data of the first data structure and obtaining the output data, the method further includes:
carrying out serialization processing on the output data;
and outputting the output data after the serialization processing.
Preferably, in the above method, the stream task runs in a distributed manner, the step of processing the intermediate data by using an operator of the stream task and outputting a processing result includes:
calculating the intermediate data by using a first operator of the stream task, and performing serialization processing on the calculated data to obtain a byte stream;
inputting the byte stream into a second operator, and deserializing the byte stream to obtain calculation data; and processing the calculation data by using the second operator, and outputting a calculation result.
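The hand-off between two distributed operators can be sketched as below. The source does not name a serialization format; JSON is used here purely as an assumption, and the operator bodies (adding a bonus, summing scores) are hypothetical.

```python
import json
from typing import Any, Dict, List


def first_operator(static: List[Any], dynamic: List[Dict[str, int]]) -> bytes:
    """Compute on the intermediate data, then serialize it to a byte stream."""
    updated = list(static)
    updated[2] += 5  # hypothetical calculation: add a bonus to one score
    payload = {"static": updated, "dynamic": dynamic}
    return json.dumps(payload).encode("utf-8")


def second_operator(byte_stream: bytes) -> int:
    """Deserialize the byte stream, then continue processing."""
    data = json.loads(byte_stream.decode("utf-8"))
    static = data["static"]
    return static[2] + static[3] + static[4]


byte_stream = first_operator(["S001", "Li", 90, 85, 88, 0], [{"physics": 92}])
result = second_operator(byte_stream)
print(result)
```

In a real deployment the byte stream would cross a process or network boundary between the two operators; here both run in one process to keep the sketch self-contained.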
Preferably, in the above method, the second data structure further includes intrinsic attributes.
Preferably, in the above method, the step of determining, according to the original data type of each field of the input data, a corresponding target data type of each field in the second data structure includes:
and if the field of the input data is a field common to at least two data structures in the first data structure, mapping the field to the intrinsic attribute.
The embodiment of the present invention further provides a data analysis processing system, where the data analysis processing system includes:
the acquisition module is used for acquiring input data of a first data structure of the stream task;
a conversion module for converting the input data of the first data structure into intermediate data of a second data structure;
the processing module is used for processing the intermediate data by using the operator of the stream task and outputting a processing result;
wherein the second data structure includes a static data region and a dynamic data region.
Preferably, in the data analysis processing system, the conversion module includes:
the acquisition subunit is used for acquiring the data type of the input data;
and the conversion subunit is used for converting the input data of the first data structure into the intermediate data of the second data structure according to the data type.
Preferably, in the data analysis processing system, the conversion subunit is specifically configured to:
determining a target data type corresponding to each field in the second data structure according to the original data type of each field of the input data, wherein the target data type comprises a static data type and a dynamic data type;
numbering the static data and the dynamic data in the second data structure together, in sequence, to obtain static area indexes, and numbering the dynamic data separately, in sequence, to obtain dynamic area indexes;
and converting the input data of the first data structure into intermediate data of a second data structure according to the static area index, the dynamic area index and the corresponding target data type of each field in the second data structure.
Preferably, in the data analysis processing system, the step of obtaining, according to the original data type of each field of the input data, a target data type corresponding to each field in the second data structure includes:
sub-step a: if the original data type of a field of the input data is static and the data type is a scalar, marking the field as static data;
sub-step b: if the original data type of a field of the input data is static and the data type is a non-scalar, recursively repeating sub-steps a and b for each sub-field of the field;
sub-step c: if the original data type of a field of the input data is dynamic and the number and names of the sub-fields contained in the field are determined, recursively repeating sub-steps a, b and c for each sub-field of the field;
sub-step d: if the original data type of a field of the input data is dynamic and the number or names of the sub-fields of the field are undetermined, marking the field as dynamic data.
Preferably, in the data analysis processing system, before the step of converting the input data of the first data structure into the intermediate data of the second data structure according to the static area index, the dynamic area index, and the data type corresponding to each field in the second data structure, the method further includes:
establishing a static data area with a corresponding length according to the number of the static area indexes;
and establishing a dynamic data area with a corresponding length according to the number of the dynamic area indexes.
Preferably, in the data analysis processing system, the static data area is a variable-length array, and the dynamic data area is a variable-length array.
Preferably, in the data analysis processing system, the step of converting the input data of the first data structure into the intermediate data of the second data structure according to the static area index, the dynamic area index, and the data type corresponding to each field in the second data structure includes:
mapping the value of each field marked as static data into the array element of the static data area whose subscript is the static area index corresponding to that field;
mapping the value of each field marked as dynamic data into the array element of the dynamic data area whose subscript is the dynamic area index corresponding to that field;
and setting the value of the array element of the static data area whose subscript is the static area index corresponding to the field of the dynamic data to that field's dynamic area index.
Preferably, in the data analysis processing system, the obtaining subunit is specifically configured to:
obtaining a data type of the input data based on a user configuration input; or
determining a data type of the input data based on a pre-established data type prediction model.
Preferably, in the data analysis processing system, the input data includes nested data and/or dynamic data.
Preferably, in the data analysis processing system, after the step of converting the input data of the first data structure into the intermediate data of the second data structure, the data analysis processing system further includes:
determining a computing mode of the stream task based on the target data type;
the processing module is specifically configured to:
and processing the intermediate data by using the operator of the stream task based on the calculation mode, and outputting a processing result.
Preferably, the data analysis processing system further includes:
and the deserializing module is used for acquiring the input data of the stream task and deserializing the input data.
Preferably, in the data analysis processing system, the processing module is further specifically configured to:
accessing a value corresponding to the intermediate data through the subscript of the array element by using an operator of the stream task;
calculating by using the value to obtain a calculation result;
and converting the calculation result into data of the first data structure to obtain output data.
Preferably, in the data analysis processing system, the processing module is further specifically configured to:
carrying out serialization processing on the output data;
and outputting the output data after the serialization processing.
Preferably, in the data analysis processing system, the stream task runs in a distributed manner, and the processing module is further specifically configured to:
calculating the intermediate data by using a first operator of the stream task, and performing serialization processing on the calculated data to obtain a byte stream;
inputting the byte stream into a second operator, and deserializing the byte stream to obtain calculation data; and processing the calculation data by using the second operator, and outputting a calculation result.
Preferably, in the data analysis processing system, the second data structure further includes intrinsic attributes.
Preferably, in the data analysis processing system, the step of determining, according to the original data type of each field of the input data, a target data type corresponding to each field in the second data structure includes:
and if the field of the input data is a field common to at least two data structures in the first data structure, mapping the field to the intrinsic attribute.
The embodiment of the present invention further provides a data analysis processing system, where the data analysis processing system includes a processor, a memory, and a computer program stored in the memory and capable of running on the processor, and when the computer program is executed by the processor, the steps of the data processing method are implemented.
An embodiment of the present invention further provides a readable storage medium, where a computer program is stored, and when the computer program is executed, the steps of the data processing method are implemented.
The invention provides a data analysis processing system and a data processing method. The method comprises: obtaining input data of a first data structure of a stream task; converting the input data of the first data structure into a second data structure to obtain intermediate data; and processing the intermediate data by using an operator of the stream task and outputting a calculation result, wherein the second data structure includes a static data region and a dynamic data region. By converting the input data from the first data structure into the second data structure, the embodiments of the invention enable the data analysis processing system to process dynamic data and complex data, which improves data processing efficiency.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a data processing method provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of a graphical user interface for defining data structures provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of another graphical user interface for defining a data structure according to an embodiment of the present invention;
FIG. 4 is a flow chart of step 102 of a data processing method provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of a streaming task provided by an embodiment of the present invention;
FIG. 6 is a graphical configuration interface of a stream task operator provided by embodiments of the present invention;
FIG. 7 is a graphical configuration interface of yet another stream task operator provided by an embodiment of the present invention;
FIG. 8 is a block diagram of a data analysis processing system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, which is a flowchart of a data processing method provided by an embodiment of the present invention, the data processing method is applied to a data analysis processing system and includes the following steps:
Step 101: obtaining input data of a first data structure of a stream task.
Optionally, the input data includes real-time data, and may be nested data or dynamic data. Nested data is data that comprises at least two layers of data structures, that is, data that has a non-scalar field.
The first data structure includes field names and the types of their values. In an embodiment of the present invention, the first data structure includes at least one of: a dynamic data structure, a static data structure, and a nested data structure. If the field information of the first data structure is known in advance, that is, the names, types, and number of the fields can be determined and the type of each field is either a scalar or static, the data structure is called a static data structure; if the field information of the first data structure is unpredictable, the data structure is called a dynamic data structure.
Illustratively, the single-layer data structure and the nested data structure are described below using a "student achievement" scenario. The following data structure, called "student achievement", represents a student's score record. The first line is the name of the data structure; each following line gives a field name and, after the colon, the type of the field:
Student achievement
  Student ID: string
  Name: string
  Achievement: integer
The value of every field in this data structure is a scalar (that is, a value that needs no decomposition and can be processed directly, such as an integer, a floating-point number, or a string), so it is referred to as a single-layer (flat) data structure.
Now change the type of the "Achievement" field to another data structure, "subject scores":
Subject scores
  Chinese: integer
  Mathematics: integer
  English: integer
The overall definition of the "student achievement" data structure becomes:
Student achievement
  Student ID: string
  Name: string
  Achievement: subject scores
    Chinese: integer
    Mathematics: integer
    English: integer
The type of the "Achievement" field is now no longer a scalar, and the overall "student achievement" data structure is referred to as a nested (that is, multi-level) data structure.
The static data structures and dynamic data structures are further described below in conjunction with the above examples.
In the example above, the field names, field types, and number of fields of "student achievement" are fixed, that is, the field information of the data structure is predictable, so "student achievement" is a static data structure. Suppose instead that the "subject scores" data structure also contains a field "other subjects" whose type is a dynamic data structure (for example, the scores of other subjects can be stored as key-value pairs, where the key is the subject name and the value is the corresponding score, and the number and names of the keys are unpredictable). The "subject scores" data structure is then dynamic, and consequently the "student achievement" data structure that contains it is also dynamic.
Optionally, obtaining the input data of the first data structure of the stream task specifically includes: defining a data structure according to the input data to be processed, and processing the input data based on the defined data structure to obtain the input data of the first data structure. The data structure used in the stream task may be defined by the user in a data structure description language (for example, as JSON code) or through a Graphical User Interface (GUI). FIG. 2 is a schematic diagram of a graphical user interface defining the "subject scores" data structure. The "subject scores" data structure shown in FIG. 2 has three fields of integer type, "Chinese", "Mathematics", and "English", and one field "other subjects" whose type is key-value dynamic data.
FIG. 3 is a schematic diagram of a graphical user interface defining the "student achievement" data structure. It references the previously defined "subject scores" type, which forms a nested definition. The resulting overall "student achievement" data structure is defined as:
Student achievement
  Student ID: string
  Name: string
  Achievement: subject scores
    Chinese: integer
    Mathematics: integer
    English: integer
    Other subjects: key-value dynamic data
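The nested definition above could be written in a JSON-style description language roughly as follows. The source does not specify a concrete syntax, so the keys `name`, `fields`, and `type`, the type label `dynamic_key_value`, and the helper `is_dynamic` are all assumptions made for this sketch.

```python
import json
from typing import Any, Dict

# Hypothetical JSON-style definition of the "subject scores" structure.
SUBJECT_SCORES: Dict[str, Any] = {
    "name": "subject_scores",
    "fields": [
        {"name": "chinese", "type": "integer"},
        {"name": "mathematics", "type": "integer"},
        {"name": "english", "type": "integer"},
        {"name": "other_subjects", "type": "dynamic_key_value"},
    ],
}

# "student achievement" references the structure above, forming a nested definition.
STUDENT_ACHIEVEMENT: Dict[str, Any] = {
    "name": "student_achievement",
    "fields": [
        {"name": "student_id", "type": "string"},
        {"name": "name", "type": "string"},
        {"name": "achievement", "type": SUBJECT_SCORES},
    ],
}


def is_dynamic(structure: Dict[str, Any]) -> bool:
    """A structure is dynamic if any field, at any depth, is dynamic."""
    for field_def in structure["fields"]:
        field_type = field_def["type"]
        if field_type == "dynamic_key_value":
            return True
        if isinstance(field_type, dict) and is_dynamic(field_type):
            return True
    return False


definition_json = json.dumps(STUDENT_ACHIEVEMENT, indent=2)
print(definition_json)
print(is_dynamic(STUDENT_ACHIEVEMENT))
```

The `is_dynamic` helper mirrors the rule stated earlier: because "subject scores" contains a dynamic field, the "student achievement" structure that contains it is dynamic as well.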
Optionally, before the input data of the first data structure is obtained in step 101, the data processing method further includes: acquiring input data of the stream task, and deserializing the input data.
Specifically, the input data of a stream task is usually a byte stream, which the data analysis processing system cannot process directly; it must first be deserialized into data of the first data structure.
Step 102, converting the input data of the first data structure into intermediate data of a second data structure.
Wherein the second data structure includes a static data region and a dynamic data region.
Here, the static data area is a variable-length array whose values are scalars; the types of the values include, but are not limited to, at least one of: integers, strings, and Boolean values. The dynamic data area is a variable-length array whose values are dynamic data structures of various kinds, for example key-value dynamic data. The dynamic data area can allocate storage space through pointers and grow that space dynamically (for example, an array that grows dynamically by means of a linked list, or key-value data that grows dynamically by means of a hash table).
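A minimal sketch of such a two-area structure is shown below. The class name `IntermediateRecord` is hypothetical, and Python's growable `list` and hash-based `dict` stand in for the linked-list and hash-table mechanisms mentioned above; the source does not prescribe a particular implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class IntermediateRecord:
    """Hypothetical second data structure with a static and a dynamic area."""
    static_area: List[Any] = field(default_factory=list)              # scalar values
    dynamic_area: List[Dict[str, Any]] = field(default_factory=list)  # dynamic values


rec = IntermediateRecord()
rec.static_area.extend(["S001", 90])        # scalars: a string and an integer
rec.dynamic_area.append({"physics": 92})    # one key-value dynamic entry

# The dynamic entry can grow later without touching the static area.
rec.dynamic_area[0]["chemistry"] = 87
print(rec)
```

Keeping the open-ended data in its own area means the layout of the static area stays fixed even when new keys arrive at run time.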
Step 103: processing the intermediate data by using the operator of the stream task, and outputting a processing result.
The embodiments of the present invention use the second data structure to accommodate data structures that may be dynamic or nested: data of dynamic data structures, nested data structures, and the like is converted into data of the second data structure, which the data analysis processing system supports, so that data of dynamic and nested data structures can be processed in real time.
The implementation of each step of the method is described in detail below.
Optionally, as shown in FIG. 4, step 102 includes:
Step 1021: acquiring the data type of the input data.
Acquiring the data type of the input data of the first data structure in step 1021 specifically includes: obtaining the data type of the input data based on a user configuration input; or processing the input data with a data type prediction model established in advance in the data analysis processing system to determine the data type of the input data.
Specifically, obtaining the data type of the input data based on the user configuration input includes: displaying a user interface for configuring the data type, and acquiring the user's configuration operations on that interface to obtain the data type of the input data of the first data structure.
Specifically, determining the data type of the input data with the data type prediction model includes: the user inputs sample data, and the data analysis processing system uses the pre-trained data type prediction model to infer the data type automatically from that sample data. The user can then adjust and modify the automatically inferred data type to obtain the final data type.
Step 1022, converting the input data of the first data structure into the intermediate data of the second data structure according to the data type.
Specifically, converting the input data of the first data structure into the intermediate data of the second data structure includes two processes: marking the data type and establishing an index. The process of establishing the index is as follows:
In step 1022, converting the input data into intermediate data of the second data structure according to the data type comprises:
determining a target data type corresponding to each field in the second data structure according to the original data type of each field of the input data, wherein the target data type comprises a static data type and a dynamic data type;
numbering the static data and the dynamic data in the second data structure together, in sequence, to obtain static area indexes, and numbering the dynamic data separately, in sequence, to obtain dynamic area indexes;
and converting the input data of the first data structure into intermediate data of the second data structure according to the static area index, the dynamic area index and the target data type corresponding to each field in the second data structure.
The process of marking data types is as follows:
the step of determining a target data type corresponding to each field in the second data structure according to the original data type of each field of the input data includes:
substep a: if the original data type of a field of the input data is static and the data type is a scalar, marking the field as static data;
substep b: if the original data type of a field of the input data is static and the data type is a non-scalar, recursively repeating substeps a and b for each subfield of the field;
substep c: if the original data type of a field of the input data is dynamic and the number and names of the subfields contained in the field are determined, recursively repeating substeps a, b and c for each subfield of the field;
substep d: if the original data type of a field of the input data is dynamic and the number or names of the subfields of the field are uncertain, marking the field as dynamic data.
The marking process determines whether each field is placed in the static data area or the dynamic data area of the second data structure.
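The marking substeps can be sketched as a short recursive function. This is a minimal sketch under assumptions: the schema encoding (a `dynamic` flag and an optional `fields` dictionary that is absent when the subfields are not fixed), the dotted path names, and the sample schema are all hypothetical, and for simplicity the sketch applies all four substeps when it recurses into subfields.

```python
from typing import Dict


def mark_fields(schema: dict, prefix: str = "") -> Dict[str, str]:
    """Return {field path: "static" | "dynamic"} for a hypothetical schema encoding."""
    marks: Dict[str, str] = {}
    for name, spec in schema.items():
        path = f"{prefix}.{name}" if prefix else name
        subfields = spec.get("fields")  # None: scalar, or subfields not fixed
        if spec["dynamic"]:
            if subfields is not None:
                # Substep c: dynamic, but the subfields are known, so recurse.
                marks.update(mark_fields(subfields, path))
            else:
                # Substep d: dynamic with unknown subfields.
                marks[path] = "dynamic"
        elif subfields is None:
            # Substep a: static scalar.
            marks[path] = "static"
        else:
            # Substep b: static non-scalar, so recurse into its subfields.
            marks.update(mark_fields(subfields, path))
    return marks


# Illustrative schema, not taken from the patent.
schema = {
    "id": {"dynamic": False},
    "position": {"dynamic": False,
                 "fields": {"x": {"dynamic": False}, "y": {"dynamic": False}}},
    "tags": {"dynamic": True},
}
marks = mark_fields(schema)
```

Note that the compound field `position` receives no mark of its own; only its leaf subfields do.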
After completing the marking of the data type and the indexing, converting the input data of the first data structure into the intermediate data of the second data structure further comprises two processes of establishing a data area and mapping data. The process of establishing the data area is as follows:
before the step of converting the input data into intermediate data of the second data structure according to the static area index, the dynamic area index, and the corresponding data type of each field in the second data structure, the method further includes:
establishing a static data area with a corresponding length according to the number of the static area indexes;
and establishing a dynamic data area with a corresponding length according to the number of the dynamic area indexes.
The static data area is a variable-length array, and the dynamic data area is a variable-length array.
The process of mapping data is as follows:
the step of converting the input data into intermediate data of the second data structure according to the static area index, the dynamic area index, and the data type corresponding to each field in the second data structure includes:
mapping the value of each field marked as static data to the array element of the static data area whose subscript is the static area index of that field;
mapping the value of each field marked as dynamic data to the array element of the dynamic data area whose subscript is the dynamic area index of that field;
and setting the array element of the static data area whose subscript is the static area index of a field marked as dynamic data to the dynamic area index of that field.
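The numbering, area creation, and mapping steps above can be sketched together as follows. The function names, the dotted field paths, and the sample record are assumptions made for illustration; only the numbering rule and the three mapping rules come from the description.

```python
from typing import Any, Dict, List, Tuple


def build_indexes(marks: Dict[str, str]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Number all marked fields together, then number the dynamic fields separately."""
    static_index = {path: i for i, path in enumerate(marks)}
    dynamic_paths = [p for p, m in marks.items() if m == "dynamic"]
    dynamic_index = {path: i for i, path in enumerate(dynamic_paths)}
    return static_index, dynamic_index


def lookup(record: dict, path: str) -> Any:
    """Follow a dotted path such as "position.x" into a nested record."""
    value: Any = record
    for key in path.split("."):
        value = value[key]
    return value


def to_intermediate(record: dict, marks: Dict[str, str]) -> Tuple[List[Any], List[Any]]:
    static_index, dynamic_index = build_indexes(marks)
    # Establish both areas with lengths given by the number of indexes.
    static_area: List[Any] = [None] * len(static_index)
    dynamic_area: List[Any] = [None] * len(dynamic_index)
    for path, mark in marks.items():
        value = lookup(record, path)
        if mark == "static":
            static_area[static_index[path]] = value
        else:
            d = dynamic_index[path]
            dynamic_area[d] = value                # value goes to the dynamic area
            static_area[static_index[path]] = d    # static slot stores the dynamic index
    return static_area, dynamic_area


# Illustrative data, not taken from the patent.
marks = {"id": "static", "position.x": "static", "position.y": "static", "tags": "dynamic"}
record = {"id": "dev-1", "position": {"x": 3, "y": 4}, "tags": {"room": "lab"}}
static_area, dynamic_area = to_intermediate(record, marks)
```

Here the last static slot holds `0`, the dynamic area index of `tags`, rather than the value itself.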
Further, the second data structure further includes intrinsic attributes. Intrinsic attributes are fields that are common to at least two of the adapted data structures, where the adapted data structures include the first data structure of the stream task. For example, events in a stream task of a data analysis processing system have a timestamp, and the timestamp can be treated as an intrinsic attribute. The fields of the intrinsic attributes are static, and their storage mode is determined in the source code design stage of the data analysis processing system.
Further, when the second data structure includes intrinsic attributes, the step of determining a target data type corresponding to each field in the second data structure according to the original data type of each field of the input data further includes:
if a field of the input data is a field common to at least two data structures in the first data structure, mapping the field to the intrinsic attribute.
Because fields common to at least two data structures in the first data structure are mapped to the intrinsic attributes without going through the marking substeps described above, the intrinsic attributes speed up data structure conversion.
Example one: taking the "student achievement" data structure as an example, a field "other subjects" is added to the "score of each subject" data structure. The type of the "other subjects" field is a dynamic data structure stored in key-value form, where the key is the subject name and the value is the score for that subject. The overall definition of the "student achievement" data structure becomes:
Student achievement
  Student ID: string
  Name: string
  Achievement: score of each subject
    Chinese: integer
    Mathematics: integer
    English: integer
    Other subjects: key-value type dynamic data
Table 1 illustrates the marking and indexing of the fields in the "student achievement" data structure.
TABLE 1
[Table 1 is an image in the original publication and is not reproduced here.]
The "student ID" and "name" fields are marked as "static data" because they are scalars;
the "achievement" field is a compound type and is not itself marked, but its subfields "Chinese", "mathematics" and "English" are marked as "static data", and its subfield "other subjects", whose type is field-value type dynamic data, is marked as "dynamic data". The "static data" and "dynamic data" are then numbered.
The second data structure thus obtained is schematically shown in table 2 below:
TABLE 2
[Table 2 is an image in the original publication and is not reproduced here.]
In Table 2, the "Chinese" entry denotes the "Chinese" subfield under the "achievement" field; the other entries are named in the same way.
Illustratively, a piece of "student achievement" data in JSON format (first data structure) is as follows:
[The JSON record is an image in the original publication and is not reproduced here.]
After parsing into the second data structure, the data is as follows:
Static data area: "2020001", "Zhang San", 88, 92, 59, 0
Dynamic data area: {"physics": 70, "chemistry": 78}.
Example two: the two data structures used in the redefined stream task are as follows:
Student failed subject count
  Student ID: string
  Name: string
  Failed subject count:
    Chinese: integer
    Mathematics: integer
    English: integer
    Other subjects: integer
Student total failed subjects
  Student ID: string
  Name: string
  Number of failed subjects: integer
The marking and indexing process for the "student failed subject count" data structure is shown in Table 3 below.
TABLE 3
[Table 3 is an image in the original publication and is not reproduced here.]
The second data structure corresponding to the "student failed subject count" data structure is shown in Table 4 below:
TABLE 4
[Table 4 is an image in the original publication and is not reproduced here.]
The marking and indexing process for the "student total failed subjects" data structure is shown in Table 5 below:
TABLE 5
[Table 5 is an image in the original publication and is not reproduced here.]
The second data structure corresponding to the "student total failed subjects" data structure is shown in Table 6 below:
TABLE 6
[Table 6 is an image in the original publication and is not reproduced here.]
Optionally, an embodiment of the present invention provides the following implementation of step 103. Processing the intermediate data by using the operator of the stream task and outputting the processing result specifically includes:
accessing a value corresponding to the intermediate data through the subscript of the array element by using an operator of the stream task;
calculating by using the value to obtain a calculation result;
and converting the calculation result into data of the first data structure to obtain output data.
In the process of converting the data of the first data structure into the intermediate data of the second data structure, the index is constructed and is used as the subscript of the array element corresponding to the second data structure, so that the data can be directly obtained through the subscript of the array element without searching step by step through field names, the data can be quickly obtained, the waiting time is shortened, and the calculation speed of the stream task operator is increased.
Furthermore, it should be noted that, in the process of processing the intermediate data of the second data structure by the operator in the streaming task to obtain the calculation result, multiple serialization and deserialization processes are performed on the intermediate data of the second data structure based on the requirements of the operator on input and output data, so that the data output by the upstream operator can be transmitted to the downstream operator as input through the network.
Specifically, the serialization and deserialization of the intrinsic properties of the second data structure is done at the source code level.
The static data area of the second data structure is a variable-length array, and it can be serialized as follows: first, a serialized integer value is output to represent the number of elements in the array, and then the serialized value of each element is output in sequence. Each element is of a scalar type and is serialized in the native way for the corresponding data type in the programming language (for example, in Java, an integer is output directly as four bytes, and a character string is output as the code of each of its characters). When deserializing, an integer value is deserialized first to obtain the length of the array, and then the value of each element is deserialized in sequence according to the data types determined in advance.
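The static-area scheme above might look like the following Python sketch. It handles only integers and strings, writes integers as big-endian four-byte values (the layout that Java's `DataOutputStream.writeInt` produces), and, as an assumption not stated in the description, prefixes each string with a two-byte length so that the reader knows where it ends. The function names are hypothetical.

```python
import struct
from typing import Any, List


def serialize_static(values: List[Any]) -> bytes:
    out = [struct.pack(">i", len(values))]              # element count first
    for v in values:
        if isinstance(v, int):
            out.append(struct.pack(">i", v))            # four-byte integer
        else:
            raw = v.encode("utf-8")
            # Assumption: a 2-byte length prefix precedes the string bytes.
            out.append(struct.pack(">H", len(raw)) + raw)
    return b"".join(out)


def deserialize_static(data: bytes, types: List[type]) -> List[Any]:
    """Decode using the element types known in advance; no type tags are stored."""
    (count,) = struct.unpack_from(">i", data, 0)
    offset = 4
    values: List[Any] = []
    for t in types[:count]:
        if t is int:
            (v,) = struct.unpack_from(">i", data, offset)
            offset += 4
        else:
            (n,) = struct.unpack_from(">H", data, offset)
            offset += 2
            v = data[offset:offset + n].decode("utf-8")
            offset += n
        values.append(v)
    return values


static_area = ["2020001", "Zhang San", 88, 92, 59, 0]
types = [str, str, int, int, int, int]
payload = serialize_static(static_area)
restored = deserialize_static(payload, types)
```

Because the types are agreed in advance, the byte stream carries neither field names nor type tags.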
The dynamic data area of the second data structure is a variable-length array, and the serialization mode of the dynamic data area of the second data structure can be as follows: firstly, outputting a serialized integer value to represent the number of elements in the array, and then sequentially outputting the values of the serialized elements. The serialization way for each element may be: first a serialized integer value is output, representing the length of the byte string (as its length is not fixed) after the value of the element itself has been serialized, and then the serialized value of the element is output. When deserializing, an integer value is deserialized to know the length of the array, and then each element is deserialized in sequence. When each element is deserialized, an integer value is deserialized to know the length of the byte string to be read, and then the byte string with the length is read and deserialized into the value of the element.
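The dynamic-area scheme above, with a count followed by length-prefixed elements, can be sketched as follows. Encoding each element as JSON text is an assumption made only to keep the example short; the description does not say how an individual dynamic element is serialized. The function names are hypothetical.

```python
import json
import struct
from typing import Any, List


def serialize_dynamic(elements: List[Any]) -> bytes:
    out = [struct.pack(">i", len(elements))]            # element count first
    for element in elements:
        # Assumption: each element is encoded as UTF-8 JSON text.
        body = json.dumps(element).encode("utf-8")
        out.append(struct.pack(">i", len(body)) + body)  # byte length, then bytes
    return b"".join(out)


def deserialize_dynamic(data: bytes) -> List[Any]:
    (count,) = struct.unpack_from(">i", data, 0)
    offset = 4
    elements: List[Any] = []
    for _ in range(count):
        (n,) = struct.unpack_from(">i", data, offset)   # length of the next element
        offset += 4
        elements.append(json.loads(data[offset:offset + n].decode("utf-8")))
        offset += n
    return elements


dynamic_area = [{"physics": 70, "chemistry": 78}]
restored = deserialize_dynamic(serialize_dynamic(dynamic_area))
```

The per-element length prefix is what lets the reader skip or slice elements whose size is not known in advance.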
Further, after the step of converting the calculation result into the data of the first data structure to obtain the output data, the method further includes:
carrying out serialization processing on the output data;
and outputting the output data after the serialization processing.
Specifically, for each input or output field in the first data structure that the user has configured for the stream task, the corresponding static area index and dynamic area index in the second data structure can be obtained, so that when the stream task runs, its operators access the corresponding values directly through the subscripts of the array elements. The operators of the stream task compute on these values to obtain a calculation result. Based on the second data structure, the static area index, and the dynamic area index, the calculation result is converted into calculation data of the first data structure, which is then serialized into a byte stream; that is, field-value type data is obtained as the output data.
Optionally, in this embodiment of the present invention, after converting the input data of the first data structure into the intermediate data of the second data structure in step 102, the method further includes:
determining a computing mode of the stream task based on the target data type;
the step of processing the intermediate data by using the operator of the stream task and outputting a processing result comprises: processing the intermediate data by using the operator of the stream task based on the calculation mode, and outputting a processing result.
Because the target data type of each value in the second data structure is known, the calculation method can be optimized: it can be determined in advance from the types of the values in the data structure, so that the data analysis processing system processes the data directly with the predetermined calculation method, which improves the running speed. For example, when summing two values, if both are known in advance to be integers, the runtime can use integer addition directly to obtain the result; if the types of the two values cannot be known in advance, the runtime must check the possible type combinations one by one, apply the appropriate type conversions to the original data, and then use the addition operation for the resulting type, which reduces the running speed.
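The idea of fixing the calculation method ahead of time can be illustrated with a small sketch. The function names and the particular fallback conversion (strings converted to floating-point numbers) are hypothetical; the sketch only shows the contrast between choosing an addition routine once from known types and inspecting types on every call.

```python
from typing import Any, Callable, Optional


def add_int(a: int, b: int) -> int:
    """Fast path: both operands are known in advance to be integers."""
    return a + b


def add_dynamic(a: Any, b: Any) -> Any:
    """Fallback: inspect the operand types at run time and convert if needed."""
    if isinstance(a, str) or isinstance(b, str):
        a, b = float(a), float(b)   # hypothetical conversion rule
    return a + b


def choose_add(type_a: Optional[type], type_b: Optional[type]) -> Callable[[Any, Any], Any]:
    """Pick the addition routine once, before any data flows through the operator."""
    if type_a is int and type_b is int:
        return add_int
    return add_dynamic


add = choose_add(int, int)        # decided when the stream task is set up
typed_result = add(2, 3)
untyped_result = choose_add(None, None)("2", 3)
```

In the typed case the per-record work is a single addition; in the untyped case every record pays for the type checks.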
Optionally, when the stream task runs in a distributed manner, processing the intermediate data by using an operator of the stream task and outputting the processing result in step 103 may further include:
calculating the intermediate data by using a first operator of the stream task, and serializing the calculated data to obtain a byte stream;
inputting the byte stream into a second operator, and deserializing the byte stream to obtain calculation data; and processing the calculation data by using the second operator, and outputting a calculation result.
Specifically, when the stream task runs on the distributed platform, the instances of the operators may run on different hosts, the output data of the upstream operator needs to be serialized into a byte stream, the byte stream is transmitted to the host where the downstream operator is located through the network, and then deserialization operation is performed to restore the byte stream to the original data. Because the intermediate data of the second data structure is data of an array structure, and the subscript of the array structure is the index corresponding to the intermediate data field, the field name of the data does not need to be saved when the calculation data obtained by calculation based on the intermediate data is subjected to serialization processing, the size of the generated byte stream can be reduced, the network bandwidth during data transmission is saved, and the data processing efficiency can be further improved.
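The bandwidth argument above can be checked with a toy comparison. JSON is used here only as a convenient stand-in for a serialization format and is not the format the description specifies; the field names are hypothetical English renderings of the example fields.

```python
import json

# A record as field-value data (names travel with every record) ...
record = {"student_id": "2020001", "name": "Zhang San", "failed_subjects": 1}
named = json.dumps(record).encode("utf-8")

# ... versus the same values in array form, where position identifies the field.
positional = json.dumps(list(record.values())).encode("utf-8")

saved_bytes = len(named) - len(positional)
```

The saving grows with the number of fields and the length of their names, and it is paid back on every record sent between operators.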
Illustratively, a stream task is defined that works with the "student achievement" data structure and the "student total failed subjects" data structure described above.
The stream task is shown in FIG. 5, where both the input data format and the output data format are specified as JSON. The input data passes through the "parse to RT Event" operator and is converted into an RT Event (an RT Event is intermediate data of the second data structure) whose data structure is "student achievement". The "field value mapping" operator then changes the data structure to "student failed subject count", and the "sum" operator changes it to "student total failed subjects". Finally, the "construct output" operator converts the data into JSON format for output.
The specific operations performed by the operators of the stream task may be configured in a manner defined by a user input code (e.g., in a manner defined by a programming language such as Java, Python, or R) or in a manner defined by a graphical user interface. FIG. 6 below shows a graphical configuration interface for the operation of the "field value mapping" operator.
In the graphical configuration interface shown in FIG. 6, the "student ID" and "name" fields of the output data are configured to take the values of the same-named fields of the input data directly. The subfields "Chinese", "mathematics" and "English" under the "failed subject count" field of the output data are generated by conditional value calculation on the same-named subfields under the "score of each subject" field of the input data: when the original field value is less than 60, the result is 1; otherwise, the result is 0. The "other subjects" subfield under the "failed subject count" field of the output data is generated by conditional counting on the same-named subfield under the "score of each subject" field of the input data: the count is the number of dynamic subfields of the original field (which is field-value type dynamic data) whose value is less than 60.
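The configuration just described corresponds roughly to the following operator sketch. The function name and the positional layout (student ID, name, Chinese, mathematics, English, then the dynamic-area index of "other subjects") are assumptions chosen to match the parsed example record given earlier; they are not the system's actual operator code.

```python
from typing import Any, Dict, List, Tuple


def field_value_mapping(static_area: List[Any],
                        dynamic_area: List[Dict[str, int]],
                        threshold: int = 60) -> Tuple[List[Any], List[Any]]:
    """Map scores to failed-subject counts, reading every field by subscript."""
    student_id, name, chinese, maths, english, other_idx = static_area
    other = dynamic_area[other_idx]          # follow the stored dynamic-area index
    out_static = [
        student_id,
        name,
        1 if chinese < threshold else 0,
        1 if maths < threshold else 0,
        1 if english < threshold else 0,
        sum(1 for score in other.values() if score < threshold),
    ]
    return out_static, []                    # the output has no dynamic fields


out_static, out_dynamic = field_value_mapping(
    ["2020001", "Zhang San", 88, 92, 59, 0],
    [{"physics": 70, "chemistry": 78}],
)
```

Only the English score (59) is below 60, so the output static area carries a single 1 in the English position.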
FIG. 7 is a schematic diagram of the graphical configuration interface of the "sum" operator.
According to the embodiment of the invention, the first data structure of the input data is converted into the second data structure, so that the data analysis processing system can process dynamic data or complex data, and the data processing efficiency is improved.
In the graphical configuration interface shown in FIG. 7, the "student ID" and "name" fields of the output data are configured to take the values of the same-named fields of the input data directly, and the "number of failed subjects" field of the output data is the sum of all subfields under the "failed subject count" field of the input data.
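A minimal sketch of the "sum" operator configured above, again assuming a positional layout in which the first two static slots hold the student ID and the name and the remaining slots hold the per-subject counts; the function name is hypothetical.

```python
from typing import Any, List


def sum_operator(static_area: List[Any]) -> List[Any]:
    """Keep the student ID and name, and add up the per-subject counts."""
    student_id, name, *counts = static_area
    return [student_id, name, sum(counts)]


total = sum_operator(["2020001", "Zhang San", 0, 0, 1, 0])
```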
Taking a specific input record as an example, the following shows how the data changes after each operator.
Input data in JSON format:
[The JSON input record is an image in the original publication and is not reproduced here.]
The data after the "parse to RT Event" operator is as follows:
Static data area: "2020001", "Zhang San", 88, 92, 59, 0
Dynamic data area: {"physics": 70, "chemistry": 78}
The data after the "field value mapping" operator is as follows:
Static data area: "2020001", "Zhang San", 0, 0, 1, 0
Dynamic data area: empty
The data after the "sum" operator is as follows:
Static data area: "2020001", "Zhang San", 1
Dynamic data area: empty
The JSON format data output after the "construct output" operator is as follows:
[The JSON output record is an image in the original publication and is not reproduced here.]
According to the invention, the first data structure of the input data is converted into the second data structure, so that the data analysis processing system can process dynamic data or complex data, and the data processing efficiency is improved. In addition, user operation is simple and convenient, which lowers the barrier to use.
Based on the data processing method provided in the above embodiment, an embodiment of the present invention further provides a data analysis processing system for implementing the method. Referring to FIG. 8, a data analysis processing system 800 provided in an embodiment of the present invention includes:
An obtaining module 801, configured to obtain input data of a first data structure of a stream task.
A conversion module 802, configured to convert the input data of the first data structure into intermediate data of a second data structure.
And the processing module 803 is configured to process the intermediate data by using an operator of the stream task, and output a processing result. Wherein the second data structure includes a static data region and a dynamic data region.
Optionally, in the data analysis processing system, the conversion module includes:
an obtaining subunit, configured to obtain the data type of the input data;
and a conversion subunit, configured to convert the input data of the first data structure into the intermediate data of the second data structure according to the data type.
Optionally, in the data analysis processing system, the conversion subunit is specifically configured to: determining a target data type corresponding to each field in the second data structure according to the original data type of each field of the input data, wherein the target data type comprises a static data type and a dynamic data type;
numbering the static data and the dynamic data in the second data structure together, in sequence, to obtain static area indexes, and numbering the dynamic data separately, in sequence, to obtain dynamic area indexes;
and converting the input data of the first data structure into intermediate data of a second data structure according to the static area index, the dynamic area index and the corresponding target data type of each field in the second data structure.
Optionally, in the data analysis processing system, the step of determining, according to the original data type of each field of the input data, a target data type corresponding to each field in the second data structure includes:
substep a: if the original data type of a field of the input data is static and the data type is a scalar, marking the field as static data;
substep b: if the original data type of a field of the input data is static and the data type is a non-scalar, recursively repeating substeps a and b for each subfield of the field;
substep c: if the original data type of a field of the input data is dynamic and the number and names of the subfields contained in the field are determined, recursively repeating substeps a, b and c for each subfield of the field;
substep d: if the original data type of a field of the input data is dynamic and the number or names of the subfields of the field are uncertain, marking the field as dynamic data.
Optionally, in the data analysis processing system, the first data structure includes a field name and a type of a value.
Optionally, in the data analysis processing system, before the step of converting the input data of the first data structure into the intermediate data of the second data structure according to the static area index, the dynamic area index, and the data type corresponding to each field in the second data structure, the method further includes:
establishing a static data area with a corresponding length according to the number of the static area indexes;
and establishing a dynamic data area with a corresponding length according to the number of the dynamic area indexes.
Optionally, the static data area is a variable length array, and the dynamic data area is a variable length array.
Optionally, in the data analysis processing system, the step of converting the input data of the first data structure into the intermediate data of the second data structure according to the static area index, the dynamic area index, and the data type corresponding to each field in the second data structure includes:
mapping the value of each field marked as static data to the array element of the static data area whose subscript is the static area index of that field;
mapping the value of each field marked as dynamic data to the array element of the dynamic data area whose subscript is the dynamic area index of that field;
and setting the array element of the static data area whose subscript is the static area index of a field marked as dynamic data to the dynamic area index of that field.
Optionally, in the data analysis processing system, the obtaining subunit is specifically configured to:
obtaining a data type of the input data based on a user configuration input; or
Determining a data type of the input data based on a pre-established data type prediction model.
Optionally, in the data analysis processing system, the input data includes nested data and/or dynamic data.
Optionally, in the data analysis processing system, after the step of converting the input data of the first data structure into the intermediate data of the second data structure, the method further includes:
determining a computing mode of the stream task based on the target data type;
the processing module is specifically configured to:
and processing the intermediate data by using the operator of the stream task based on the calculation mode, and outputting a processing result.
Optionally, the data analysis processing system further includes a deserialization module, configured to obtain the input data of the stream task and deserialize the input data.
Optionally, in the data analysis processing system, the processing module 803 is further specifically configured to:
accessing a value corresponding to the intermediate data through the subscript of the array element by using an operator of the stream task;
calculating by using the value to obtain a calculation result;
and converting the calculation result into data of the first data structure to obtain output data.
Optionally, in the data analysis processing system, the processing module 803 is further specifically configured to:
carrying out serialization processing on the output data;
and outputting the output data after the serialization processing.
Optionally, in the data analysis processing system, the stream task runs in a distributed manner, and the processing module 803 is further specifically configured to:
calculating the intermediate data by using a first operator of the stream task, and performing serialization processing on the calculated data to obtain a byte stream;
inputting the byte stream into a second operator, and deserializing the byte stream to obtain calculation data; and processing the calculation data by using the second operator, and outputting a calculation result.
Optionally, in the data analysis processing system, the second data structure further includes intrinsic attributes.
Optionally, in the data analysis processing system, the determining, according to the original data type of each field of the input data, a target type corresponding to each field in the second data structure includes:
and if the field of the input data is a field common to at least two data structures in the first data structure, mapping the field to the intrinsic attribute.
The data analysis processing system provided by the invention has the advantages that the first data structure of the input data is converted into the second data structure, so that the data analysis processing system can process dynamic data or complex data, and the data processing efficiency is improved. Meanwhile, the user operation is simple and convenient, and the user operation threshold is reduced.
An embodiment of the present invention provides a data analysis processing system, which includes a processor, a memory, and a computer program stored on the memory and capable of running on the processor, and when executed by the processor, the computer program implements the steps of the data processing method according to the above embodiment.
An embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the data processing method according to the above embodiment.
The embodiment of the present invention further provides a readable storage medium, where a computer program is stored on the readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the data processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A data processing method, applied to a data analysis processing system, characterized by comprising the following steps:
obtaining input data of a first data structure of a streaming task;
converting the input data of the first data structure into intermediate data of a second data structure;
processing the intermediate data by using an operator of the streaming task, and outputting a processing result;
wherein the second data structure includes a static data region and a dynamic data region.
2. The method of claim 1, wherein converting the input data of the first data structure into the intermediate data of the second data structure comprises:
acquiring a data type of the input data;
and converting the input data of the first data structure into the intermediate data of the second data structure according to the data type.
3. The method of claim 2, wherein converting the input data of the first data structure into the intermediate data of the second data structure according to the data type comprises:
determining a target data type corresponding to each field in the second data structure according to the original data type of each field of the input data, wherein the target data type comprises a static data type and a dynamic data type;
uniformly and sequentially numbering the corresponding static data and dynamic data in the second data structure to obtain a static area index, and separately and sequentially numbering the dynamic data to obtain a dynamic area index;
and converting the input data of the first data structure into the intermediate data of the second data structure according to the static area index, the dynamic area index, and the target data type corresponding to each field in the second data structure.
4. The method of claim 1, wherein the input data comprises nested data and/or dynamic data.
5. The method of claim 3, wherein after the step of converting the input data of the first data structure into the intermediate data of the second data structure, the method further comprises:
determining a computing mode of the streaming task based on the target data type;
and wherein the step of processing the intermediate data by using the operator of the streaming task and outputting a processing result comprises:
processing the intermediate data by using the operator of the streaming task based on the computing mode, and outputting a processing result.
6. A data analysis processing system, characterized in that the data analysis processing system comprises:
an acquisition module configured to acquire input data of a first data structure of a streaming task;
a conversion module configured to convert the input data of the first data structure into intermediate data of a second data structure;
a processing module configured to process the intermediate data by using an operator of the streaming task and to output a processing result;
wherein the second data structure includes a static data region and a dynamic data region.
7. The data analysis processing system of claim 6, wherein the conversion module comprises:
an acquisition subunit configured to acquire a data type of the input data;
a conversion subunit configured to convert the input data of the first data structure into the intermediate data of the second data structure according to the data type.
8. The data analysis processing system according to claim 7, wherein the conversion subunit is specifically configured for:
determining a target data type corresponding to each field in the second data structure according to the original data type of each field of the input data, wherein the target data type comprises a static data type and a dynamic data type;
uniformly and sequentially numbering the corresponding static data and dynamic data in the second data structure to obtain a static area index, and separately and sequentially numbering the dynamic data to obtain a dynamic area index;
and converting the input data of the first data structure into the intermediate data of the second data structure according to the static area index, the dynamic area index, and the target data type corresponding to each field in the second data structure.
9. The data analysis processing system of claim 6, wherein the input data comprises nested data and/or dynamic data.
10. The data analysis processing system of claim 8, wherein, after the input data of the first data structure is converted into the intermediate data of the second data structure, the data analysis processing system is further configured for:
determining a computing mode of the streaming task based on the target data type;
and the processing module is specifically configured for:
processing the intermediate data by using the operator of the streaming task based on the computing mode, and outputting a processing result.
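
The conversion and processing steps recited in claims 1 to 5 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (FieldPlan, IntermediateRecord, plan_fields, convert, choose_mode, read_field) is hypothetical, and so are three rules the claims do not spell out: that fixed-width scalars are "static" and all other values "dynamic", that the static slot of a dynamic field stores its dynamic area index, and that the computing mode is either "static_only" or "mixed".

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Assumption: fixed-width scalars are treated as static; nested or
# variable-length values (lists, dicts, strings) are treated as dynamic.
STATIC_TYPES = (bool, int, float)


@dataclass
class FieldPlan:
    name: str
    target_type: str              # "static" or "dynamic"
    static_index: int             # every field is numbered in the static region
    dynamic_index: Optional[int]  # only dynamic fields are numbered here


@dataclass
class IntermediateRecord:
    static_region: List[Any]
    dynamic_region: List[Any]


def plan_fields(record: Dict[str, Any]) -> List[FieldPlan]:
    """Pick a target type per field and assign both sets of indexes."""
    plans = []
    next_dynamic = 0
    for static_index, (name, value) in enumerate(record.items()):
        if isinstance(value, STATIC_TYPES):
            plans.append(FieldPlan(name, "static", static_index, None))
        else:
            plans.append(FieldPlan(name, "dynamic", static_index, next_dynamic))
            next_dynamic += 1
    return plans


def convert(record: Dict[str, Any], plans: List[FieldPlan]) -> IntermediateRecord:
    """Convert the input record into the two-region intermediate structure."""
    static_region = [None] * len(plans)
    dynamic_region = [None] * sum(p.dynamic_index is not None for p in plans)
    for p in plans:
        if p.target_type == "static":
            static_region[p.static_index] = record[p.name]
        else:
            dynamic_region[p.dynamic_index] = record[p.name]
            # Assumption: the static slot holds a pointer into the dynamic region.
            static_region[p.static_index] = p.dynamic_index
    return IntermediateRecord(static_region, dynamic_region)


def choose_mode(plans: List[FieldPlan]) -> str:
    """Hypothetical computing-mode rule based on the target data types."""
    if all(p.target_type == "static" for p in plans):
        return "static_only"
    return "mixed"


def read_field(rec: IntermediateRecord, plan: FieldPlan) -> Any:
    """Accessor an operator could use to read a field from either region."""
    if plan.target_type == "static":
        return rec.static_region[plan.static_index]
    return rec.dynamic_region[plan.dynamic_index]


# Usage: one input record with two scalar fields, a list, and a nested object.
event = {"user_id": 7, "score": 0.5, "tags": ["a", "b"], "profile": {"city": "Beijing"}}
plans = plan_fields(event)
rec = convert(event, plans)
mode = choose_mode(plans)
```

In this sketch an operator would receive rec together with plans and read fields through read_field, so scalar fields are resolved from the static region by position, while nested or variable-length fields are resolved through the dynamic region.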
CN202010611247.2A 2020-06-29 2020-06-29 Data analysis processing system and data processing method Active CN111813846B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010611247.2A CN111813846B (en) 2020-06-29 2020-06-29 Data analysis processing system and data processing method

Publications (2)

Publication Number Publication Date
CN111813846A (en) 2020-10-23
CN111813846B (en) 2021-04-02

Family

ID=72855550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010611247.2A Active CN111813846B (en) 2020-06-29 2020-06-29 Data analysis processing system and data processing method

Country Status (1)

Country Link
CN (1) CN111813846B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1802614A (en) * 2003-06-04 2006-07-12 菲尔德巴士基金 Flexible function blocks
CN101197876A (en) * 2006-12-06 2008-06-11 中兴通讯股份有限公司 Method and system for multi-dimensional analysis of message service data
US20090106282A1 (en) * 2007-10-19 2009-04-23 Siemens Product Lifecycle Management Software Inc. System and method for interformat data conversion
CN101661514A (en) * 2008-05-21 2010-03-03 中国石化股份胜利油田分公司地质科学研究院 Oil deposit black oil model numerical simulation system
CN101958928A (en) * 2010-09-17 2011-01-26 北京大学 Online reconstruction method of fine-grain remote call
CN103886521A (en) * 2014-04-15 2014-06-25 杭州昊美科技有限公司 Intelligent tour inspection test terminal and method based on GIS
US20150292018A1 (en) * 2011-02-22 2015-10-15 The Procter & Gamble Company Method of making skin care compositions
US9361464B2 (en) * 2012-04-24 2016-06-07 Jianqing Wu Versatile log system
CN106296458A (en) * 2016-08-15 2017-01-04 成都九鼎瑞信科技股份有限公司 Water utilities data processing method, device and water utilities data collecting system
CN108733758A (en) * 2018-04-11 2018-11-02 北京三快在线科技有限公司 Hotel's static data method for pushing, device, electronic equipment and readable storage medium storing program for executing
CN108921423A (en) * 2018-06-28 2018-11-30 北京金风科创风电设备有限公司 experiment management and data analysis system
CN109657103A (en) * 2018-12-19 2019-04-19 广州天鹏计算机科技有限公司 Conversion method, device, computer equipment and the storage medium of data structure
CN109670267A (en) * 2018-12-29 2019-04-23 北京航天数据股份有限公司 A kind of data processing method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
PIERRE HIREL: "Atomsk: A tool for manipulating and converting atomic data files", 《COMPUTER PHYSICS COMMUNICATIONS》 *
SHERIF SAKR et al.: "The family of mapreduce and large-scale data processing systems", 《ACM COMPUTING SURVEYS (CSUR)》 *
张翼翔 等: "模型平台公共数据管理方法研究", 《系统仿真学报》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112506497A (en) * 2020-11-30 2021-03-16 北京九章云极科技有限公司 Data processing method and data processing system
CN112506497B (en) * 2020-11-30 2021-08-24 北京九章云极科技有限公司 Data processing method and data processing system

Also Published As

Publication number Publication date
CN111813846B (en) 2021-04-02

Similar Documents

Publication Publication Date Title
US10572822B2 (en) Modular memoization, tracking and train-data management of feature extraction
US10360405B2 (en) Anonymization apparatus, and program
CN111159220B (en) Method and apparatus for outputting structured query statement
CN110298019A (en) Name entity recognition method, device, equipment and computer readable storage medium
WO2015099976A1 (en) Generation of client-side application programming interfaces
US20130346939A1 (en) Methods and Systems Utilizing Behavioral Data Models With Views
CN110941655B (en) Data format conversion method and device
US20210110111A1 (en) Methods and systems for providing universal portability in machine learning
US11663245B2 (en) Initial loading of partial deferred object model
CN111159215A (en) Mapping method and device of Java class and relational database and computing equipment
CN113760839A (en) Log data compression processing method and device, electronic equipment and storage medium
CN110175128B (en) Similar code case acquisition method, device, equipment and storage medium
CA3089289C (en) System and methods for loading objects from hash chains
CN116244387A (en) Entity relationship construction method, device, electronic equipment and storage medium
CN117389544B (en) Artificial intelligence data modeling method, device, medium and equipment
CN111813846B (en) Data analysis processing system and data processing method
CN110888876A (en) Method and device for generating database script, storage medium and computer equipment
CN111125154B (en) Method and apparatus for outputting structured query statement
CN115629763A (en) Target code generation method and NPU instruction display method and device
CN113468258B (en) Heterogeneous data conversion method, heterogeneous data conversion device and storage medium
US10394898B1 (en) Methods and systems for analyzing discrete-valued datasets
CN113778846B (en) Method and device for generating test data
CN115526177A (en) Training of object association models
EP3355207A1 (en) K-selection using parallel processing
CN118093097B (en) Data storage cluster resource scheduling method, device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant