CN115391619A - Data analysis method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115391619A
Authority
CN
China
Prior art keywords
data
code
character string
instructions
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211035810.1A
Other languages
Chinese (zh)
Inventor
何文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BOE Technology Group Co Ltd
Original Assignee
BOE Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BOE Technology Group Co Ltd filed Critical BOE Technology Group Co Ltd
Priority to CN202211035810.1A priority Critical patent/CN115391619A/en
Publication of CN115391619A publication Critical patent/CN115391619A/en
Pending legal-status Critical Current


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/903: Querying
    • G06F16/90335: Query processing
    • G06F16/90344: Query processing by using string matching techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

The application provides a data parsing method and apparatus, an electronic device, and a storage medium. The method comprises: for each acquired character string, parsing all instructions in the character string by calling a preset first enumeration class, and parsing all data codes in the character string by calling a preset second enumeration class; performing syntax analysis on the character string to determine the execution logic between each instruction and each data code; determining the data source pointed to by each data code according to the fields in the data code, and determining the data structure of that data source; and causing each instruction to call the data codes according to the execution logic, and loading data according to the data structure pointed to by each data code to obtain target data. Target data can thus be loaded across data sources and across data types.

Description

Data analysis method and device, electronic equipment and storage medium
Technical Field
Embodiments of the present application relate to the field of data processing technologies, and in particular, to a data parsing method and apparatus, an electronic device, and a storage medium.
Background
In practical data applications, a single statement often needs to load data from several different data sources, and the data types of those data sources often differ.
However, in related data parsing technologies, it is often difficult for the instructions of one statement to load data across different data sources or across different data types. That is, when the statement is parsed, it is often difficult to derive a suitable loading manner for each data type and data source, which makes loading data across data sources and across data types very difficult.
Accordingly, there is a need for a data parsing scheme that works across different data sources and across different data types.
Disclosure of Invention
In view of the above, an object of the present application is to provide a data parsing method and apparatus, an electronic device, and a storage medium.
To this end, the present application provides a data parsing method, including:
for each acquired character string, parsing all instructions in the character string by calling a preset first enumeration class, and parsing all data codes in the character string by calling a preset second enumeration class;
parsing the character string to determine execution logic between each instruction and each data code;
determining a data source pointed by each data code according to the field in each data code, and determining a data structure of the data source;
and enabling each instruction to call the data code according to the execution logic, and loading data according to the data structure pointed by the data code to obtain target data.
Further, the first enumerated class and the second enumerated class are determined by:
taking each of a plurality of instructions as a first object in the first enumeration class, and defining each first object;
and taking the data type of each of the plurality of data sources as a second object in the second enumeration class, and defining each second object.
Further, parsing all instructions in the character string by calling the preset first enumeration class includes:
sequentially determining the characters representing each instruction according to the expression order of the characters in the character string, and querying the definition of each instruction in the first enumeration class.
Further, parsing all data codes in the character string by calling the preset second enumeration class includes:
sequentially determining the data codes representing each data source according to the expression order of the characters in the character string, and querying the definition of the data source corresponding to each data code in the second enumeration class.
Further, determining the data source to which each data code points according to the fields in the data code, and determining the data structure of the data source, includes:
for each data code, determining a first field of the data code pointing to the type of the data source, and determining a second field of the data code pointing to the name of the data source;
determining, by querying a definition of the second object according to the first field, that the data structure of the data source is one of the structured data and the semi-structured data;
and determining the name of the data source by inquiring a preset mapping class according to the second field.
Further, before the instructions call the data code according to the execution logic, the method further includes:
dividing all the instructions into command instructions, connection instructions and conditional instructions according to the definition of the first object;
and constructing each instruction and each data code in the character string into a syntax tree according to the expression sequence of each instruction and each data code in the character string.
Further, performing data loading according to the data structure pointed by the data code to obtain target data, including:
in response to determining that the data source to which the data code points is structured data, loading corresponding target data from the corresponding data source;
and in response to determining that the data source to which the data code points is semi-structured data, parsing the semi-structured data to obtain the corresponding target data.
Further, parsing the semi-structured data to obtain the corresponding target data includes:
calling a preset parser corresponding to the data type of the semi-structured data;
determining the target data pointed to by the second field by parsing the script file of the semi-structured data with the corresponding parser;
and loading the determined target data.
Based on the same inventive concept, the application also provides a data parsing apparatus, including: a lexical analysis module, a grammar parsing module, a data structure determining module, and a data loading module;
the lexical analysis module is configured to analyze all instructions in each obtained character string by calling the preset first enumeration class; analyzing all data codes in the character string by calling the preset second enumeration class;
the grammar parsing module is configured to parse the character strings and determine execution logic between each instruction and each data code;
the data structure determining module is configured to determine a data source pointed by each data code according to the field in the data code, and determine a data structure of the data source;
and the data loading module is configured to enable each instruction to call the data code according to the execution logic, and load data according to a data structure pointed by the data code to obtain target data.
Based on the same inventive concept, the present application further provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the data parsing method as described in any one of the above items is implemented.
Based on the same inventive concept, the present application further provides a non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer instructions for causing the computer to execute the data parsing method as described above.
As can be seen from the foregoing, according to the data parsing method, the data parsing apparatus, the electronic device, and the storage medium provided by the present application, the instructions and the data sources that may appear in the character string are respectively categorized based on the defined first enumeration class and the defined second enumeration class, so that in the process of parsing the character string, the characters related to the instructions and the characters related to the data sources can be respectively parsed according to the first enumeration class and the second enumeration class; meanwhile, the grammar analysis added in the method can effectively analyze the execution logic between the instruction and the data code.
Further, both structured data and semi-structured data are taken into account: the data code determines the specific data type being pointed to, so that different data sources and different data types can be distinguished when the data code is called, and target data of different data types can be loaded in different ways. Consequently, when a character string refers to several data sources whose data types differ, the target data can be loaded across data sources and across data types.
Drawings
In order to more clearly illustrate the technical solutions in the present application or the related art, the drawings needed to be used in the description of the embodiments or the related art will be briefly introduced below, and it is obvious that the drawings in the following description are only embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of a data parsing method according to an embodiment of the present application;
FIG. 2 is a logical representation of an abstract syntax tree according to an embodiment of the present application;
FIG. 3 is a logic diagram illustrating data loading according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a data analysis device according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is further described in detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that technical terms or scientific terms used in the embodiments of the present application should have a general meaning as understood by those having ordinary skill in the art to which the present application belongs, unless otherwise defined. The use of "first," "second," and similar terms in the embodiments of the present application do not denote any order, quantity, or importance, but rather the terms are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
As described in the background section, related data parsing methods struggle to meet the needs of actual production.
In the course of implementing the present application, the applicant found that the main problem with related data parsing methods is as follows: in practical data applications, the data to be loaded may be stored in advance with different types of data structures or in different databases, so a single statement often needs to load data from different data sources whose data types often differ.
However, in related data parsing technologies, it is often difficult for the instructions of one statement to load data across different data sources or across different data types. That is, when the statement is parsed, it is often difficult to parse and load different data structures at the same time according to the requirements of different data types and data sources, which makes loading data across data sources and across data types very difficult.
Embodiments of the present application are described in detail below with reference to the accompanying drawings.
In the embodiments of the present application, an SQL (Structured Query Language) data management system is taken as a specific example, SQL is taken as the language environment for implementing the method, and the data to be loaded is assumed to be stored in advance in different data sources with different data structures.
In the present embodiment, an SQL statement that calls data may be input in the SQL data management system, and the SQL statement may be regarded as a character string.
Further, when data stored with different data structures in different data sources needs to be loaded at the same time, one SQL statement may contain requests to call multiple different data sources, for example, requests to call data from MySQL (a relational database management system), PostgreSQL (an object-relational database management system), and ClickHouse (a column-oriented database), together with requests to call XML (Extensible Markup Language) configuration files, JSON (JavaScript Object Notation) log files, unprocessed raw data, and so on.
Similarly, the same data-calling SQL statement may further request data acquired by a crawler, data passed in as parameters, and the like.
Referring to fig. 1, a data parsing method according to an embodiment of the present application includes the following steps:
Step S101, for each acquired character string, parsing all instructions in the character string by calling the preset first enumeration class, and parsing all data codes in the character string by calling the preset second enumeration class.
In an embodiment of the present application, based on a language environment for implementing the method, a first enumeration class for enumerating each instruction and a second enumeration class for enumerating each data source may be defined first.
Specifically, an enumeration class may be established, in which each instruction is used as an object and each instruction is defined separately to enumerate each instruction; the individual instructions may be all instructions in the computer language for implementing the method, or a plurality of instructions selected in advance.
In this embodiment, each object corresponding to each instruction is taken as a first object, and the enumerated class is taken as a first enumerated class.
Further, another enumeration class may be established, in which each data type related to each data source is taken as an object, and each data type is defined respectively to enumerate each data type; each data type may be all data types that need to be related, or may be a plurality of data types selected in advance.
In this embodiment, each object corresponding to each data type is taken as a second object, and the enumerated class is taken as a second enumerated class.
In a specific example, the first enumeration class may be referred to as the Token class and the second enumeration class as the Databases class.
Further, a part or all of the instructions of the SQL may be defined in the Token class.
For example, the instruction SELECT may be defined as a first object as follows:
SELECT("SELECT")
Further, the data types related to some or all of the data sources that need to be called can be defined in the Databases class.
For example, the MYSQL type may be used as one second object and the CSV type as another second object, defined as follows:
MYSQL("MYSQL"),
CSV("CSV")
It can be seen that after each SQL instruction is defined, the Token class defining a plurality of SQL instructions, that is, the first enumeration class, is obtained; after the data types related to each data source are defined, the Databases class defining a plurality of data types, that is, the second enumeration class, is obtained.
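By way of illustration only, the two enumeration classes might be sketched as follows. The application does not name an implementation language, so the use of Python's enum module is an assumption, as are the helper names lookup_instruction and lookup_data_type and every member other than SELECT, MYSQL, and CSV.

```python
from enum import Enum


class Token(Enum):
    """First enumeration class: one member (first object) per instruction."""
    SELECT = "SELECT"
    FROM = "FROM"            # assumed member, not listed in the text
    LEFT_JOIN = "LEFT JOIN"  # assumed member
    WHERE = "WHERE"          # assumed member
    GT = ">"                 # assumed member


class Databases(Enum):
    """Second enumeration class: one member (second object) per data type."""
    MYSQL = "MYSQL"
    CSV = "CSV"
    CLICKHOUSE = "CLICKHOUSE"  # assumed member
    JSON = "JSON"              # assumed member


def lookup_instruction(text):
    """Return the Token defined for text, or None if it is not an instruction."""
    try:
        return Token(text.upper())
    except ValueError:
        return None


def lookup_data_type(text):
    """Return the Databases member defined for text, or None if unknown."""
    try:
        return Databases(text.upper())
    except ValueError:
        return None
```

Looking a word up by value mirrors "querying the definition" in the enumeration class: a word with no definition is simply not an instruction (or not a known data type).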
In an embodiment of the present application, based on the first enumeration class and the second enumeration class that are constructed as described above, the obtained character string may be analyzed, and all instructions and all data codes of the character string may be obtained.
In this embodiment, the character string, which is input by the user to read data, includes a plurality of characters. Some of these characters represent the instructions, and others represent the data sources involved; the latter are the data codes in this embodiment, and each data code represents one data source.
Further, after the character string is obtained, the character string may be input to a preset lexical analyzer to analyze each instruction and each data source therein.
Specifically, the lexical analyzer may sequentially distinguish instruction codes and data codes in the character string according to a description order of each character in the character string.
Further, for a distinguished instruction, a definition about the instruction may be queried from the first enumeration class constructed as described above.
Further, for distinguished data codes, the definition of the data source represented by the data code can be queried from the second enumeration class constructed as described above.
In a specific example, as shown in fig. 2, the following SQL statements are used as a specific example of the SQL statements input by the user:
SELECT user.id, user.age, data.name FROM ClickHouse.user user LEFT JOIN JSON.data data ON user.id = data.id WHERE user.age > 10 AND user.type = 'new'
In this embodiment, after the SQL statement input by the user is obtained, it is input to the Lexer in fig. 2, which serves as the lexical analyzer and performs lexical analysis on the SQL statement.
Further, after the SQL statement is analyzed, the following instructions can be obtained:
SELECT, FROM, LEFT JOIN, ON, =, WHERE, >, AND, =
It can be seen that the Lexer determines each instruction according to the sequence of each character in the SQL statement.
Further, the definitions of the above respective instructions may be queried from the Token class.
Further, after the SQL statement is analyzed, the following data codes are obtained:
user.id, user.age, data.name, ClickHouse.user, and JSON.data
It can be seen that the Lexer determines each data code according to the sequence of each character in the SQL statement.
Further, the definitions of the data sources corresponding to the data codes may be queried from the Databases class.
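The lexical analysis described above might be sketched as follows. This is a minimal illustration, not the Lexer of fig. 2: Python, the regular expression, the keyword set, and the function name lex are all assumptions, and a dotted identifier is simply treated as a data code.

```python
import re

# Assumed instruction vocabulary for the example statement.
KEYWORDS = {"SELECT", "FROM", "LEFT", "JOIN", "ON", "WHERE", "AND"}
OPERATORS = {"=", ">", "<"}
# Quoted literals, identifiers (optionally dotted), integers, operators.
TOKEN_RE = re.compile(r"'[^']*'|[A-Za-z_][\w.]*|\d+|[=<>]")


def lex(sql):
    """Scan sql left to right; return (instructions, data_codes)."""
    instructions, data_codes = [], []
    for word in TOKEN_RE.findall(sql):
        upper = word.upper()
        if upper == "JOIN" and instructions and instructions[-1] == "LEFT":
            instructions[-1] = "LEFT JOIN"   # merge the two-word instruction
        elif upper in KEYWORDS or word in OPERATORS:
            instructions.append(upper)
        elif "." in word and not word.startswith("'"):
            if word not in data_codes:       # keep first occurrence only
                data_codes.append(word)
    return instructions, data_codes


SQL = ("SELECT user.id,user.age,data.name FROM ClickHouse.user user "
       "LEFT JOIN JSON.data data ON user.id=data.id "
       "WHERE user.age>10 AND user.type='new'")
instructions, data_codes = lex(SQL)
```

Because every dotted identifier is collected, this sketch also reports data.id and user.type, which the list above does not mention.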
Step S102, performing syntax analysis on the character string to determine the execution logic between each instruction and each data code.
In an embodiment of the present application, based on the obtained character string, the character string may be input to a preset parser.
Further, the character string is parsed by the parser, so that the execution logic between each instruction and each data code in the character string can be obtained.
Wherein the execution logic specifically describes the syntax logic when each instruction calls the data code.
In a specific example, as shown in fig. 2, the SQL statement from the foregoing step may be input into the Parser, which serves as the syntax parser and performs syntax analysis on the SQL statement.
It should be noted that since the lexical analysis performed in step S101 and the syntax analysis performed in step S102 are executed by different analyzers, the two steps may be executed in order, with step S101 first and step S102 second; in reverse order, with step S102 first and step S101 second; or simultaneously.
Step S103, determining the data source pointed by the data code according to the fields in the data codes, and determining the data structure of the data source.
In the embodiment of the present application, based on each data code in the parsed character string, the data source represented by the data code and the data type of the data source may be determined through a specific field of the data code.
Specifically, for the acquired character string, the data source pointed by each data code may be determined according to the field in the data code, and the data source may be determined as structured data or semi-structured data.
In this embodiment, the data code includes two fields: a field indicating the data type of the data source, referred to herein as the first field, and a field indicating the name of the data source, referred to herein as the second field.
Further, the first field in this embodiment may use the specific name of the data type to represent the data source, and the data structure of the data source represented by each first field may be determined according to the definition of each second object in the second enumeration class.
In this embodiment, the data structure specifically includes structured data and semi-structured data.
Structured data may include data sources of a variety of different data types; in specific examples, structured data may include MySQL (a relational database management system), PostgreSQL (an object-relational database management system), ClickHouse (a column-oriented database), and the like.
Further, semi-structured data may also include data sources of a variety of different data types; in specific examples, semi-structured data may include CSV (comma-separated values files), JSON (JavaScript Object Notation files), XML (Extensible Markup Language files), and the like.
Further, based on the second field described above, the name of the data source pointed to by the second field may be determined.
Specifically, a mapping class may be preset, a plurality of key value pairs may be set in the mapping class, and the data source may be determined from the key value pairs according to the name of the data source in the second field.
In a specific example, based on the SQL statement in the foregoing step and taking the data code ClickHouse.user as an example, ClickHouse serves as the first field, and the data type it represents can be determined by querying the second enumeration class.
Further, user serves as the second field and can be determined to be the name of the required data source.
As shown in fig. 2, each data code can be further divided, according to its second field, into two kinds of data source: tables and columns. TABLES represents data sources in the form of tables, of which there are two data codes here: in ClickHouse.user, ClickHouse indicates the specific data structure of the data source, which is structured data, and user is the name of the data table; in JSON.data, JSON indicates the specific structure of the data source, which is semi-structured data, and data is the name of that semi-structured data.
Further, FIELDS represents data sources in the form of columns. Taking the parsed data code ClickHouse.user.age as an example, ClickHouse indicates the specific data structure of the data source, which is structured data; user is the name of the data table; and age is the column named age in the data table user.
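Resolving a data code into its data type, data structure, and data source might look like the sketch below. Python, the tuple-valued enumeration members, the SOURCE_MAP dictionary standing in for the preset mapping class, and the placeholder locations it contains are all assumptions made for illustration.

```python
from enum import Enum


class Databases(Enum):
    """Second enumeration class; each definition records the data structure."""
    MYSQL = ("MYSQL", "structured")
    CLICKHOUSE = ("CLICKHOUSE", "structured")
    CSV = ("CSV", "semi-structured")
    JSON = ("JSON", "semi-structured")

    @property
    def structure(self):
        return self.value[1]


# Hypothetical stand-in for the preset mapping class: key-value pairs from
# the name in the second field to a concrete data source (placeholders).
SOURCE_MAP = {
    "user": "table:user",
    "data": "file:data.json",
}


def resolve(data_code):
    """Split a data code such as 'ClickHouse.user' into its two fields."""
    first_field, second_field = data_code.split(".", 1)
    data_type = Databases[first_field.upper()]   # query the second object
    return {
        "type": data_type.name,
        "structure": data_type.structure,
        "source": SOURCE_MAP[second_field],      # query the mapping class
    }
```

For example, resolve("ClickHouse.user") reports a structured source, while resolve("JSON.data") reports a semi-structured one, which is the distinction the loading step relies on.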
Step S104, causing each instruction to call the data codes according to the execution logic, and loading data according to the data structure pointed to by each data code to obtain target data.
In the embodiment of the application, for the obtained character string, each instruction in the obtained character string can call a data code according to the execution logic, and when the data code points to the structured data, corresponding target data can be loaded from a corresponding data source; when the data code points to the semi-structured data, the corresponding target data can be obtained by analyzing the semi-structured data.
First, based on the respective instructions and the respective data codes parsed as described above, it is possible to construct a syntax tree and distinguish command instructions for commands, connection instructions for associations, and conditional instructions for filtering in the syntax tree.
Specifically, the instructions and the data codes parsed in the foregoing steps may still be arranged according to the expression order, and in this embodiment, the data source in the form of a column may be used as one branch of the syntax tree, and the data source in the form of a table may be used as another branch of the syntax tree.
Specifically, as shown in fig. 2, the parsed instructions SELECT, FROM, JOIN, and WHERE are arranged according to their expression order in the SQL statement, and the data codes are likewise arranged according to their expression order. The data sources in the form of columns, ClickHouse.user.id, ClickHouse.user.age, and JSON.logs.data.name, serve as branches of FIELDS; the data sources in the form of tables, ClickHouse.user and JSON.data, serve as branches of TABLES.
Further, among the parsed instructions, instructions with different functions, such as command instructions, connection instructions, and conditional instructions, may be distinguished. In a specific example, the command instructions may be SELECT, FROM, WHERE, and the like; the connection instructions may be JOIN, LEFT JOIN, and the like; and the conditional instructions may be symbols such as >.
Further, as shown in fig. 2, characters and data codes respectively associated with the connection instruction and the conditional instruction are respectively set in respective branches, based on which AST (abstract syntax tree) can be obtained.
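One possible in-memory shape for such a syntax tree is sketched below. Fig. 2 is not reproduced here, so the nested-dictionary layout is hypothetical, as are Python itself, the names classify and build_tree, and the treatment of = and AND as conditional instructions (the text names only >).

```python
COMMAND = {"SELECT", "FROM", "WHERE"}
CONNECTION = {"JOIN", "LEFT JOIN"}
CONDITIONAL = {">", "=", "AND"}  # only ">" is named in the text; rest assumed


def classify(instruction):
    """Return the functional category of an instruction."""
    if instruction in COMMAND:
        return "command"
    if instruction in CONNECTION:
        return "connection"
    if instruction in CONDITIONAL:
        return "conditional"
    raise ValueError(f"unknown instruction: {instruction!r}")


def build_tree(fields, tables, join_on, conditions):
    """Arrange instructions and data codes as a nested dictionary."""
    return {
        "SELECT": {"FIELDS": list(fields)},        # column-form data sources
        "FROM": {
            "TABLES": list(tables),                # table-form data sources
            "LEFT JOIN": {"ON": join_on},          # connection branch
        },
        "WHERE": list(conditions),                 # conditional branch
    }


tree = build_tree(
    fields=["ClickHouse.user.id", "ClickHouse.user.age", "JSON.data.name"],
    tables=["ClickHouse.user", "JSON.data"],
    join_on=("user.id", "=", "data.id"),
    conditions=[("user.age", ">", 10), ("user.type", "=", "new")],
)
```

Keeping the connection and conditional parts in their own branches lets a later stage execute the join before the filters without re-reading the original character string.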
Further, based on the constructed AST, the AST may be input to a data selection process for loading of target data.
Specifically, based on the structure of the constructed syntax tree, after the syntax tree is input to the data selection process, each instruction may be made to operate according to the execution logic parsed by the syntax parser, and the relevant data code is specifically called, so as to obtain the target data corresponding to the data code.
Further, when the data codes are called, different loading processes can be respectively carried out on the structured data and the semi-structured data according to the data types pointed by the data codes determined in the previous step.
Specifically, when the first field in the called data code points to a structured data source, the specific data source can be determined through the first field, and the corresponding target data is loaded from that data source according to the second field.
Further, when the first field in the called data code points to a semi-structured data source, the specific type of the semi-structured data can be determined through the first field, and the script file related to the target data is determined according to the second field. Once the script file indicated by the second field is obtained, it can be parsed; the target data can then be determined and loaded.
When parsing different types of semi-structured data, the parsing can be performed by a preset parser for the type.
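Selecting a preset parser by data type might be sketched as a small registry, shown below. Python and its standard json, csv, and xml.etree.ElementTree modules are assumptions, as are the registry name PARSERS, the function names, and the choice to read XML rows from element attributes.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def parse_json(text):
    """Parse a JSON script file into Python objects."""
    return json.loads(text)


def parse_csv(text):
    """Parse a CSV script file into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_xml(text):
    """Parse an XML script file; each child element's attributes form a row."""
    root = ET.fromstring(text)
    return [dict(child.attrib) for child in root]


# Hypothetical registry: semi-structured data type -> preset parser.
PARSERS = {"JSON": parse_json, "CSV": parse_csv, "XML": parse_xml}


def parse_semi_structured(data_type, text):
    """Call the preset parser registered for data_type."""
    try:
        parser = PARSERS[data_type.upper()]
    except KeyError:
        raise ValueError(f"no parser registered for {data_type!r}") from None
    return parser(text)
```

With a registry of this kind, supporting another semi-structured type only requires registering one more parser; the calling code does not change.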
In this embodiment, taking the logic of data selection shown in fig. 3 as a specific example, it can be seen that, when a data code in the constructed AST is called, for structured data, taking ClickHouse.
Further, for the semi-structured data, taking JSON.data as an example, it can be determined through the first field that the specific type of the semi-structured data is JSON.
Further, as can be seen from fig. 3, a parser is provided for each type of semi-structured data; after the semi-structured data is determined to be a JSON script file, the JSON parser can be used to parse it.
Further, when the JSON script file is parsed, the second field of the JSON script file can also be used as a tag, and a value corresponding to the tag is determined after parsing.
In this embodiment, the second field data of the json.data, which points to the name of the data source, may be used as a tag of data, and a value corresponding to the tag may be determined after parsing, and the value corresponding to the tag may be target data, so that target data to be loaded may be determined based on this.
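Using the second field as a tag into a parsed JSON script file might look like the following sketch. Python, the function name load_from_json, and the sample document are assumptions; the application does not specify the layout of the JSON file.

```python
import json


def load_from_json(data_code, script_text):
    """Return the value stored under the tag named by the second field."""
    first_field, second_field = data_code.split(".", 1)
    if first_field.upper() != "JSON":
        raise ValueError(f"{data_code!r} does not point to JSON data")
    document = json.loads(script_text)   # parse the script file
    return document[second_field]        # the tag's value is the target data


# Assumed sample script file: the tag "data" holds the rows to be loaded.
SCRIPT = '{"data": [{"id": 1, "name": "ann"}], "meta": {"version": 1}}'
target = load_from_json("JSON.data", SCRIPT)
```

Here only the value under the tag data is returned; other tags in the same file, such as meta, are ignored.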
It can be seen that, based on the determined requirement for simultaneously calling the structured data and the semi-structured data, different processing modes can be simultaneously implemented for different data structures because the specific data type is determined from the first field.
Furthermore, the data source can be directly loaded for any type of determined structured data; for the semi-structured data, the semi-structured data can be analyzed based on the first field, and the target data is determined after the analysis, so that the structured data and the semi-structured data which need to be called in the SQL statement can be analyzed simultaneously, that is, in this embodiment, the target data pointed by the JSON file can be determined while the target data in the clikhouse is determined, and the target data in the clikhouse and the target data pointed by the JSON file do not need to be processed separately.
Further, based on the determined target data to be loaded, as described above, the execution logic may be applied to the target data parsed by the parser, for example, by first performing the JOIN operation of the join instruction and then filtering by the conditional instruction, to obtain the data loading result.
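The "join first, then filter" order can be sketched on plain lists of rows. The helper names `inner_join` and `apply_condition`, the join key, and the sample rows are assumptions for illustration only; the description does not specify how the join or the condition is implemented.

```python
def inner_join(left, right, key):
    """Inner-join two lists of row dictionaries on a shared key."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def apply_condition(rows, predicate):
    """Keep only the rows that satisfy the conditional instruction."""
    return [row for row in rows if predicate(row)]


# Hypothetical target data: one set from a structured source, one from a parsed JSON file.
orders = [{"id": 1, "amount": 10}, {"id": 2, "amount": 50}, {"id": 3, "amount": 70}]
users = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

joined = inner_join(orders, users, "id")                       # connection step first
result = apply_condition(joined, lambda row: row["amount"] > 20)  # condition step second
```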
As can be seen, in the data parsing method according to the embodiments of the present application, the instructions and the data sources that may appear in a character string are categorized by the defined first enumeration class and second enumeration class respectively, so that, when the character string is parsed, the characters related to instructions and the characters related to data sources can be parsed according to the first enumeration class and the second enumeration class respectively; meanwhile, the syntax parsing added in the method can effectively determine the execution logic between the instructions and the data codes.
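A rough sketch of lexical analysis driven by two enumeration classes is given below. The member lists, the assignment of each keyword to a command, connection, or condition category, the token pattern, and the sample statement are all assumptions; the description only states that instructions and data source types are each defined in their own enumeration class.

```python
import re
from enum import Enum


class Instruction(Enum):
    """First enumeration class (assumed members): (keyword, category)."""
    SELECT = ("SELECT", "command")
    FROM = ("FROM", "command")
    JOIN = ("JOIN", "connection")
    WHERE = ("WHERE", "condition")

    @property
    def category(self):
        return self.value[1]


class SourceType(Enum):
    """Second enumeration class (assumed members): (type name, data structure)."""
    CLICKHOUSE = ("clickhouse", "structured")
    JSON = ("json", "semi-structured")


INSTRUCTIONS = {member.value[0]: member for member in Instruction}
SOURCE_TYPES = {member.value[0]: member for member in SourceType}


def lex(text):
    """Scan tokens in order; collect instructions and data codes separately."""
    instructions, data_codes = [], []
    for token in re.findall(r"[A-Za-z_][\w.]*", text):
        if token.upper() in INSTRUCTIONS:
            instructions.append(INSTRUCTIONS[token.upper()])
        elif "." in token and token.partition(".")[0].lower() in SOURCE_TYPES:
            data_codes.append(token)
    return instructions, data_codes


instructions, data_codes = lex(
    "SELECT * FROM clickhouse.orders JOIN json.users WHERE amount > 20"
)
```

Tokens that match neither enumeration class, such as column names and literals, are simply skipped in this sketch.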
Further, with both structured data and semi-structured data taken into account, the data code determines the specific data type that the data points to. When the data code is called, different data sources and different data types can be distinguished, and target data of different data types can be loaded in different ways. Therefore, when different data sources appear in a character string and each has a different data type, target data can be loaded across data sources and across data types.
It should be noted that the method of the embodiments of the present application may be executed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene and is completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may only perform one or more steps of the method of the embodiments of the present application, and the devices may interact with each other to complete the method.
It should be noted that the above describes some embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Based on the same inventive concept, and corresponding to the method of any of the above embodiments, the embodiments of the present application further provide a data parsing apparatus.
Referring to Fig. 4, the data parsing apparatus includes: a lexical analysis module 401, a syntax parsing module 402, a data structure determination module 403, and a data loading module 404;
the lexical analysis module 401 is configured to, for each obtained character string, parse out all instructions in the character string by calling the preset first enumeration class, and parse out all data codes in the character string by calling the preset second enumeration class;
the syntax parsing module 402 is configured to parse the character string to determine the execution logic between each instruction and each data code;
the data structure determination module 403 is configured to determine, according to the fields in each data code, the data source to which the data code points, and to determine the data structure of the data source;
the data loading module 404 is configured to cause each instruction to call the data code according to the execution logic, and to load data according to the data structure pointed to by the data code to obtain target data.
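How the four modules hand results to one another can be sketched as a small pipeline object. The class name `DataParsingApparatus`, the field names, the call signatures, and the stub modules in the usage lines are assumptions made for illustration; they only mirror the order of modules 401 to 404 described above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DataParsingApparatus:
    lexical: Callable    # module 401: text -> (instructions, data_codes)
    syntax: Callable     # module 402: (instructions, data_codes) -> execution plan
    structure: Callable  # module 403: data_code -> data structure label
    loader: Callable     # module 404: (plan, structures) -> target data

    def run(self, text):
        """Run the four modules in the order given in the description."""
        instructions, data_codes = self.lexical(text)
        plan = self.syntax(instructions, data_codes)
        structures = {code: self.structure(code) for code in data_codes}
        return self.loader(plan, structures)


# Stub modules, used only to show the wiring.
apparatus = DataParsingApparatus(
    lexical=lambda text: (["SELECT"], [t for t in text.split() if "." in t]),
    syntax=lambda ins, codes: [(i, codes) for i in ins],
    structure=lambda code: "semi-structured" if code.startswith("json.") else "structured",
    loader=lambda plan, structures: {"plan": plan, "structures": structures},
)
out = apparatus.run("SELECT * FROM clickhouse.orders JOIN json.users")
```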
For convenience of description, the above devices are described as being divided into various modules by functions, which are described separately. Of course, the functionality of the various modules may be implemented in the same one or more pieces of software and/or hardware in practicing embodiments of the present application.
The apparatus in the foregoing embodiment is used to implement the corresponding data analysis method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to the method of any embodiment, the embodiment of the present application further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the program, the data parsing method according to any embodiment is implemented.
Fig. 5 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via a bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present application.
The memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present application is implemented by software or firmware, the relevant program code is stored in the memory 1020 and called and executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present device and other devices. The communication module can communicate in a wired manner (for example, USB or network cable) or in a wireless manner (for example, mobile network, Wi-Fi, or Bluetooth).
The bus 1050 includes a path to transfer information between various components of the device, such as the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may also include only those components necessary to implement the embodiments of the present application, and not necessarily all of the components shown in the figures.
The electronic device in the foregoing embodiment is used to implement the corresponding data parsing method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to any of the above-mentioned embodiment methods, the present application also provides a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the data parsing method according to any of the above-mentioned embodiments.
Computer-readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The computer instructions stored in the storage medium of the foregoing embodiment are used to enable the computer to execute the data analysis method according to any one of the foregoing embodiments, and have the beneficial effects of the corresponding method embodiment, which are not described herein again.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the context of the present application, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present application as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown in the provided figures for simplicity of illustration and discussion, and so as not to obscure the embodiments of the application. Furthermore, devices may be shown in block diagram form in order to avoid obscuring embodiments of the application, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the embodiments of the application are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the application, it should be apparent to one skilled in the art that embodiments of the application can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present application has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures, such as Dynamic RAM (DRAM), may use the discussed embodiments.
The embodiments of the present application are intended to embrace all such alternatives, modifications, and variations that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present application are intended to be included within the scope of the present application.

Claims (11)

1. A data parsing method, comprising:
for each acquired character string, parsing out all instructions in the character string by calling a preset first enumeration class, and parsing out all data codes in the character string by calling a preset second enumeration class;
parsing the character string to determine execution logic between each instruction and each data code;
determining a data source pointed to by each data code according to the fields in the data code, and determining a data structure of the data source;
and causing each instruction to call the data code according to the execution logic, and loading data according to the data structure pointed to by the data code to obtain target data.
2. The method of claim 1, wherein the first enumerated class and the second enumerated class are determined by:
taking each of a plurality of instructions as a first object in the first enumeration class, and defining each first object;
and taking the data type of each of the plurality of data sources as a second object in the second enumeration class, and defining each second object.
3. The method according to claim 1, wherein the parsing out all instructions in the character string by calling the preset first enumeration class comprises:
and sequentially determining the characters representing the instructions according to the order in which the characters appear in the character string, and querying the definitions of the instructions in the first enumeration class.
4. The method according to claim 1, wherein the parsing out all data codes in the character string by calling the preset second enumeration class includes:
and sequentially determining the data codes representing the data sources according to the order in which the characters appear in the character string, and querying the definition of the data source corresponding to each data code in the second enumeration class.
5. The method according to claim 2, wherein determining the data source pointed to by the data code according to the field in each data code and determining the data structure of the data source comprises:
for each data code, determining a first field of the data code pointing to the type of the data source, and determining a second field of the data code pointing to the name of the data source;
determining, by querying a definition of the second object according to the first field, that the data structure of the data source is one of the structured data and the semi-structured data;
and determining the name of the data source by inquiring a preset mapping class according to the second field.
6. The method of claim 2, wherein prior to causing the instructions to call the data code in accordance with the execution logic, further comprising:
dividing all the instructions into command instructions, connection instructions and conditional instructions according to the definition of the first object;
and constructing each instruction and each data code in the character string into a syntax tree according to the expression sequence of each instruction and each data code in the character string.
7. The method according to claim 5, wherein the performing data loading according to the data structure pointed by the data code to obtain target data comprises:
in response to determining that the data source to which the data code points is structured data, loading corresponding target data from the corresponding data source;
and in response to determining that the data source to which the data code points is semi-structured data, parsing the semi-structured data to obtain corresponding target data.
8. The method according to claim 7, wherein the parsing the semi-structured data to obtain corresponding target data comprises:
calling a preset parser corresponding to the data type according to the data type of the semi-structured data;
determining the target data pointed to by the second field by parsing the script file of the semi-structured data with the corresponding parser;
and loading the target data based on the determined target data.
9. A data parsing apparatus, comprising: a lexical analysis module, a syntax parsing module, a data structure determination module, and a data loading module;
the lexical analysis module is configured to, for each obtained character string, parse out all instructions in the character string by calling a preset first enumeration class, and parse out all data codes in the character string by calling a preset second enumeration class;
the syntax parsing module is configured to parse the character string to determine execution logic between each instruction and each data code;
the data structure determination module is configured to determine a data source pointed to by each data code according to the fields in the data code, and determine a data structure of the data source;
and the data loading module is configured to cause each instruction to call the data code according to the execution logic, and load data according to the data structure pointed to by the data code to obtain target data.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable by the processor, characterized in that the processor implements the method according to any of claims 1 to 8 when executing the computer program.
11. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method according to any one of claims 1 to 8.
CN202211035810.1A 2022-08-26 2022-08-26 Data analysis method and device, electronic equipment and storage medium Pending CN115391619A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211035810.1A CN115391619A (en) 2022-08-26 2022-08-26 Data analysis method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115391619A true CN115391619A (en) 2022-11-25

Family

ID=84122018

Country Status (1)

Country Link
CN (1) CN115391619A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination