CN116467355A

CN116467355A - Method and system for selecting and integrating data in multiple data sets

Info

Publication number: CN116467355A
Application number: CN202310164803.XA
Authority: CN
Inventors: 包卿; 薛立俊; 王兴华; 方禺
Original assignee: Mingdu Zhiyun Zhejiang Technology Co Ltd
Current assignee: Mingdu Zhiyun Zhejiang Technology Co Ltd
Priority date: 2023-02-13
Filing date: 2023-02-13
Publication date: 2023-07-21

Abstract

The invention discloses a method and a system for selecting and integrating data in multiple data sets. All field attributes and field parameter values are obtained from multiple data sets that are connected to one another through connection line identifiers, and the integration attribute of each connection line identifier is read. The connection queue formed by connecting all the data sets is split into multiple data set groups arranged in series. Following a preset combination sequence, the field parameter values of two adjacent data sets are screened according to the integration attribute of the connection line identifier between them, and the two data sets are combined into a branch data set. The branch data set is then screened and combined with the next data set, and so on until all data sets have been combined into an integrated data set. In this way, all fields in a branched connection queue of multiple data sets can be screened and combined.

Description

Method and system for selecting and integrating data in multiple data sets

Technical Field

The present invention relates to the field of information technologies, and in particular, to a method and system for selecting and integrating data in multiple data sets.

Background

When data analysis is performed, multiple data sets often have to be screened and merged so that new data sets meeting the requirements can be output for subsequent analysis and presentation. For example, when production process data is analyzed, multiple data sets often need to be screened and combined into a target data set in order to better present the comparison and variation of monitoring data across multiple processes. The existing approach sets content screening rules on two data sets, screens the data of the two data sets, and combines them into a target data set meeting the requirements. However, this approach can only screen and combine two data sets at a time and cannot screen and combine more than two associated data sets.

Disclosure of Invention

To address the defects in the prior art, the invention provides a method for selecting and integrating data in multiple data sets, comprising the following steps:

S1, acquiring all field attributes and field parameter values in a plurality of data sets which are connected to one another through connection line identifiers, and reading the integration attribute of each connection line identifier;

S2, splitting the connection queue formed by connecting all the data sets into a plurality of data set groups arranged in series; following a preset combination sequence, screening the field parameter values of two adjacent data sets according to the integration attribute of the connection line identifier and combining the two adjacent data sets into a branch data set; and then screening and combining the branch data set with the next data set until all data sets have been combined;

S3, acquiring the integration attribute and the connection field of the connection line identifier of each data set group, screening the field parameter values of the branch data sets formed from the data set groups according to the integration attribute and connection field of the corresponding connection line identifier, and then combining them into an integrated data set.

Preferably, the step S2 further includes:

acquiring a starting data table node of the data set queue according to a set direction and, starting from that node, traversing each data table node up to the end data table node of the data set queue with a depth-first algorithm according to the connection relation of each data table, to form a queue connection path;

and, starting from the end data table node, tracing back along the queue connection path and recording the association relation with each data table, to form at least one data set group arranged in series.

Preferably, the step S2 further includes: when at least one data set in the data set queue is connected on one side to a plurality of different data sets through different connection line identifiers, a plurality of data set groups arranged in series are formed after the association relations with each data table are traced back and recorded from the end data table node along the queue connection path.

Preferably, the step S3 further includes: acquiring the multi-connection merging rule from the integration attribute of the connection line identifiers of a branch data set, screening the field parameter values of the plurality of data set groups according to that rule, and merging the data set groups into an integrated data set, wherein the branch data set is a data set that has a plurality of connection line identifiers connected on the same side.

Preferably, the step S3 further includes: generating a view result SQL statement from the integrated data set, checking the generated statement against SQL syntax, and parsing it to generate a visual report.

The invention also discloses a system for selecting and integrating data in multiple data sets, which comprises: a field acquisition module, configured to acquire each field attribute and each field parameter value in a plurality of data sets connected to one another through connection line identifiers, and to read the integration attribute of each connection line identifier; a decomposition module, configured to split the connection queue formed by connecting all the data sets into a plurality of data set groups arranged in series, to screen, following a preset combination sequence, the field parameter values of two adjacent data sets according to the integration attribute of the connection line identifier, to combine the two adjacent data sets into a branch data set, and then to screen and combine the branch data set with the next data set until all data sets have been combined; and an integration module, configured to acquire the integration attribute and the connection field of the connection line identifier of each data set group, to screen the field parameter values of the branch data sets formed from the data set groups according to the integration attribute and connection field of the corresponding connection line identifier, and then to merge them into an integrated data set.

Preferably, the decomposition module is further configured to acquire a starting data table node of the data set queue according to a set direction and, starting from that node, to traverse each data table node up to the end data table node of the data set queue with a depth-first algorithm according to the connection relation of each data table, forming a queue connection path; and, starting from the end data table node, to trace back along the queue connection path and record the association relation with each data table, forming at least one data set group arranged in series.

Preferably, when at least one data set in the data set queue is connected on one side to a plurality of different data sets through different connection line identifiers, a plurality of data set groups arranged in series are formed after the association relations with each data table are traced back and recorded from the end data table node along the queue connection path.

The invention also discloses a device for selecting and integrating data in multiple data sets, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the steps of any one of the methods when executing the computer program.

The present invention also discloses a computer-readable storage medium storing a computer program, characterized in that: the computer program, when executed by a processor, implements the steps of the method as described in any of the above.

In the method and system disclosed by the invention, the connection queue formed by a plurality of interconnected data sets is split into a plurality of data set groups arranged in series; within each group, adjacent data sets are screened and combined pairwise, following the preset combination sequence and the integration attribute of the connection line identifier between them, to form a branch data set; finally, the branch data sets are screened and combined into a single integrated data set. Fields in a branched connection queue of multiple data sets can thus be screened and merged. The method screens and combines multiple data sets more efficiently and meets the data analysis needs of scenarios such as visually presenting the comparison and variation of monitoring data across multiple production processes.

Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:

Fig. 1 is a flow chart of a method for selecting and integrating data in multiple data sets according to the present embodiment.

Fig. 2 is a schematic diagram of a plurality of data sets connected by a connection identifier according to the present embodiment.

Fig. 3 is a specific flowchart of step S2 disclosed in this embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without creative efforts, based on the described embodiments of the present invention fall within the protection scope of the present invention.

Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this invention belongs. The terms "first," "second," and the like in the description and in the claims, are not used for any order, quantity, or importance, but are used for distinguishing between different elements. Likewise, the terms "a" or "an" and the like do not denote a limitation of quantity, but rather denote the presence of at least one.

At present, when production process data is analyzed, multiple data sets often need to be screened and combined into a target data set in order to better present the comparison and variation of monitoring data across multiple working procedures. To this end, this embodiment discloses a method for selecting and integrating data in multiple data sets, shown in Fig. 1, which comprises the following.

Step S1, obtaining all field attributes and field parameter values in a plurality of data sets which are mutually connected through the connection line identifiers, and reading the integration attribute of each connection line identifier.

A large portion of the data sets generated in production are built into the production management system so that users can select them conveniently. A user can select a data set as required and place the desired data fields from the database in one data set, or can freely build a data set as required.

For example, in Fig. 2, one data set is deviation record data, which includes data fields such as deviation identification, programming code, deviation status, product code, product name, and product specification, and the other data set is batch deviation data, which includes data fields such as deviation identification, product lot number, job ticket, and product name. In this embodiment, each data set includes a plurality of fields, arranged vertically, that reflect characteristics of a production process, and each characteristic field in the data set is associated with the corresponding characteristic data in the target object database.

Data sets are associated with each other through connection line identifiers. The two ends of a connection line identifier are connected to target fields in two data sets, and the connection line identifier is configured with an integration attribute. The integration attribute comprises a judgment selection rule for fields and a combination relation of the data sets. The judgment selection rule records the comparison and screening logic applied to the parameter values of the fields at the two ends of the connection line. The combination relation is the field screening rule that determines which fields are output after the two data sets joined by the connection line are combined. The judgment selection rule for the parameter values of the target fields at the two ends of the connection line includes, but is not limited to: the parameter values of the two target fields are identical; the parameter value of the first target field is greater than that of the second target field; or the parameter value of the first target field is smaller than that of the second target field. The combination relation of the data sets may be: all fields of the associated data sets; primarily the fields of the first data set at the connection line end; primarily the fields of the second data set at the connection line end; or primarily the target fields of the connection line.
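The integration attribute described above can be pictured as a small record attached to each connection line. The following Python sketch is only an illustration under stated assumptions: the names `Rule`, `Combine`, `ConnectionLine`, and `merge_pair`, the in-memory list-of-dict representation, and the sample rows are hypothetical and do not come from the patent.

```python
import operator
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List

Row = Dict[str, Any]


class Rule(Enum):
    """Judgment selection rule comparing the two target fields."""
    EQUAL = "equal"
    GREATER = "greater"
    LESS = "less"


class Combine(Enum):
    """Combination relation: which fields survive the merge."""
    ALL_FIELDS = "all"
    FIRST_FIELDS = "first"
    SECOND_FIELDS = "second"
    TARGET_FIELDS = "target"


_COMPARE = {Rule.EQUAL: operator.eq, Rule.GREATER: operator.gt, Rule.LESS: operator.lt}


@dataclass(frozen=True)
class ConnectionLine:
    first_field: str   # target field in the first data set
    second_field: str  # target field in the second data set
    rule: Rule
    combine: Combine


def merge_pair(first: List[Row], second: List[Row], line: ConnectionLine) -> List[Row]:
    """Screen row pairs with the rule, then keep fields per the combination relation."""
    compare = _COMPARE[line.rule]
    merged: List[Row] = []
    for a in first:
        for b in second:
            if not compare(a[line.first_field], b[line.second_field]):
                continue
            if line.combine is Combine.ALL_FIELDS:
                row = {**b, **a}  # assumption: first data set wins on name clashes
            elif line.combine is Combine.FIRST_FIELDS:
                row = dict(a)
            elif line.combine is Combine.SECOND_FIELDS:
                row = dict(b)
            else:  # TARGET_FIELDS: only the two connected fields
                row = {line.first_field: a[line.first_field],
                       line.second_field: b[line.second_field]}
            merged.append(row)
    return merged


# Hypothetical sample data loosely modeled on the two data sets of Fig. 2.
deviation_records = [
    {"deviation_id": 1, "product_name": "A", "deviation_status": "open"},
    {"deviation_id": 2, "product_name": "B", "deviation_status": "closed"},
]
batch_deviations = [
    {"deviation_id": 1, "product_lot": "L-01"},
    {"deviation_id": 3, "product_lot": "L-09"},
]
line = ConnectionLine("deviation_id", "deviation_id", Rule.EQUAL, Combine.ALL_FIELDS)
result = merge_pair(deviation_records, batch_deviations, line)
```

Only the pair whose `deviation_id` values are identical survives, and it carries the fields of both data sets. How name clashes are resolved under "all fields" is not stated in the text, so the choice made here is an assumption.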

In another embodiment, a plurality of connection lines may be set between two data sets, and each connection line is provided with a judgment priority level in addition to a judgment selection rule. The connection lines between the two data sets are sorted by priority, the parameter values of the fields at the two ends of each connection line are compared and selected in that order according to the judgment selection rule, and the field parameter values that meet the requirements are stored as screening values in a field parameter value screening library.

The priority level is read from the attribute information attached to each connection line. For connection lines with the same priority level, the fields at their two ends are screened according to their judgment selection rules, and the field parameter values that satisfy the judgment selection rules of all the connection lines at that priority level simultaneously are entered into the field parameter value screening library.

Connection lines with different priority levels are sorted by priority level, the judgment selection rule of the connection line with the highest priority level is processed first, and the field parameter values that meet the requirements are added to the characteristic field parameter value screening library as screening values. For connection lines with different priority levels, if one end of several connection lines is connected to the same characteristic field in the same data set, the characteristic field parameter values that satisfy the judgment selection rules of all those connection lines simultaneously are entered into the characteristic field parameter value screening library.

If the attribute information attached to a connection line contains no judgment priority, that connection line is given the highest priority when the connection line identifiers are sorted. The parameter values of the characteristic fields at the two ends of each connection line identifier are then compared and selected in order according to its judgment selection rule, and the field parameter values that meet the requirements are stored as screening values in the field parameter value screening library.
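The priority handling can be sketched as a sort followed by an ordered check. This is a simplified reading, not the patent's implementation: it assumes priorities are positive integers with 1 as the highest, that a connection line without a priority sorts ahead of all others, and that a row pair enters the screening library only if it passes every rule, evaluated in priority order. The names `Line`, `order_lines`, and `screen` are hypothetical.

```python
import operator
from typing import Any, Callable, Dict, List, NamedTuple, Optional, Tuple

Row = Dict[str, Any]


class Line(NamedTuple):
    first_field: str
    second_field: str
    rule: Callable[[Any, Any], bool]
    priority: Optional[int] = None  # None means no priority was configured


def order_lines(lines: List[Line]) -> List[Line]:
    """Sort by priority; a line without a priority is treated as highest (rank 0)."""
    return sorted(lines, key=lambda l: 0 if l.priority is None else l.priority)


def screen(first: List[Row], second: List[Row], lines: List[Line]) -> List[Tuple[Row, Row]]:
    """Return the row pairs that satisfy every rule, checked in priority order."""
    ordered = order_lines(lines)
    library: List[Tuple[Row, Row]] = []
    for a in first:
        for b in second:
            # all() stops at the first failing rule, so high priorities are tested first
            if all(l.rule(a[l.first_field], b[l.second_field]) for l in ordered):
                library.append((a, b))
    return library


# Hypothetical connection lines and rows.
lines = [
    Line("qty", "qty", operator.gt, 2),
    Line("id", "id", operator.eq, 1),
    Line("lot", "lot", operator.eq),  # no priority: sorted first
]
first = [{"id": 1, "lot": "x", "qty": 5}, {"id": 2, "lot": "y", "qty": 1}]
second = [{"id": 1, "lot": "x", "qty": 3}, {"id": 2, "lot": "y", "qty": 4}]
library = screen(first, second, lines)
```

With these rows, only the first pair passes all three rules; the second pair matches on `lot` and `id` but fails the "greater than" rule on `qty`.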

Step S2, splitting the connection queue formed by connecting all the data sets into a plurality of data set groups arranged in series; following a preset combination sequence, screening the field parameter values of two adjacent data sets according to the integration attribute of the connection line identifier and combining the two adjacent data sets into a branch data set; and then screening and combining the branch data set with the next data set until all data sets have been combined.

When three or more interconnected data sets are to be integrated, and especially when one or more fields on one side of a data set are connected to two or more other data sets, adjacent data sets cannot be screened and merged directly. Each branch of the connection queue formed by the connected data sets must first be split off, decomposing the queue into a plurality of data set groups arranged in series, and each data set group is then screened and merged separately. As shown in Fig. 3, this may specifically include the following.
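As a rough illustration of the sequential merging in step S2, the Python sketch below folds one serial data set group into a single branch data set, merging the running result with the next data set each time. It assumes that every connection uses the "identical values" rule and keeps all fields; `join_equal`, `merge_group`, and the sample field names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def join_equal(left: List[Row], right: List[Row], left_field: str, right_field: str) -> List[Row]:
    """Keep row pairs whose connected fields are identical and output all fields."""
    return [{**r, **l} for l in left for r in right if l[left_field] == r[right_field]]


def merge_group(group: List[List[Row]], fields: List[Tuple[str, str]]) -> List[Row]:
    """Merge a serial group pairwise: ((set0 + set1) + set2) + ..."""
    if len(fields) != len(group) - 1:
        raise ValueError("need exactly one connection per adjacent pair of data sets")
    branch = group[0]
    for next_set, (left_field, right_field) in zip(group[1:], fields):
        branch = join_equal(branch, next_set, left_field, right_field)
    return branch


# Hypothetical serial group of three data sets.
deviations = [{"deviation_id": 1, "product_code": "P1"},
              {"deviation_id": 2, "product_code": "P2"}]
batches = [{"deviation_id": 1, "lot": "L-01"}]
products = [{"code": "P1", "name": "Widget"}, {"code": "P2", "name": "Gadget"}]

integrated = merge_group(
    [deviations, batches, products],
    [("deviation_id", "deviation_id"), ("product_code", "code")],
)
```

The intermediate result of the first merge plays the role of the branch data set, which is then merged with the third data set.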

Step S21, acquiring a starting data table node of the data set queue according to a set direction and, starting from that node, traversing each data table node up to the end data table node of the data set queue with a depth-first algorithm according to the connection relation of each data table, to form a queue connection path;

Step S22, starting from the end data table node, tracing back along the queue connection path and recording the association relation with each data table, to form at least one data set group arranged in series.

In this embodiment, step S2 further includes: when at least one data set in the data set queue is connected on one side to a plurality of different data sets through different connection line identifiers, a plurality of data set groups arranged in series are formed after the association relations with each data table are traced back and recorded from the end data table node along the queue connection path.
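Steps S21 and S22 can be sketched as a depth-first walk that records one path per leaf, followed by a backward pass over each path. The sketch assumes the connections form a tree rooted at the starting data table (no cycles) and are given as an adjacency mapping; `split_into_serial_groups` and `reverse_relations` are hypothetical names.

```python
from typing import Dict, List, Tuple


def split_into_serial_groups(links: Dict[str, List[str]], start: str) -> List[List[str]]:
    """Depth-first traversal from `start`; each root-to-leaf path is one serial group."""
    groups: List[List[str]] = []

    def visit(node: str, path: List[str]) -> None:
        path = path + [node]
        children = links.get(node, [])
        if not children:          # end data table node reached
            groups.append(path)
            return
        for child in children:    # a branching data set yields several groups
            visit(child, path)

    visit(start, [])
    return groups


def reverse_relations(group: List[str]) -> List[Tuple[str, str]]:
    """Walk a path backwards from its end node, pairing each table with its predecessor."""
    return [(group[i], group[i - 1]) for i in range(len(group) - 1, 0, -1)]


# Hypothetical queue: A connects to B, and B branches to C and D.
links = {"A": ["B"], "B": ["C", "D"]}
groups = split_into_serial_groups(links, "A")
relations = reverse_relations(groups[0])
```

Here the branching data set B produces two serial groups, A-B-C and A-B-D, and the backward pass over the first group records the relations (C, B) and then (B, A).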

Specifically, the relations among all the data sets can be regarded as a graph structure in which each data set corresponds to one node. Starting from the first node, the depth-first algorithm is followed until the last leaf node is found, and the association relations between that leaf node and the other data sets (1: inner join; 2: left join; 3: right join; 4: full join) are then acquired in reverse, starting from the leaf. The field relations between one node and another are combined with logical AND operations. Once a node has been processed, the algorithm traces back to the previous node along the depth-first path and superimposes the association relations between that node and all of its connected nodes, obtaining the union of the relations collected so far. This recursion continues until the SQL relation network of the whole graph is obtained.
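To make the resulting "SQL relation network" concrete, the sketch below turns a small graph of data sets into a FROM clause. It is a simplification under stated assumptions: joins are emitted in forward depth-first order rather than collected backwards from the leaves, identifiers are not quoted or validated, and the edge format, `JOIN_SQL`, and `build_from_clause` are hypothetical. The numeric codes mirror the 1 to 4 join types listed above.

```python
from typing import Dict, List, Tuple

# Join codes as listed in the text: 1 inner, 2 left, 3 right, 4 full.
JOIN_SQL = {1: "INNER JOIN", 2: "LEFT JOIN", 3: "RIGHT JOIN", 4: "FULL JOIN"}

# parent table -> list of (child table, join code, [(parent field, child field), ...])
Edges = Dict[str, List[Tuple[str, int, List[Tuple[str, str]]]]]


def build_from_clause(edges: Edges, start: str) -> str:
    """Emit one JOIN per connection; several field pairs on one edge are ANDed."""
    clauses = [f"FROM {start}"]

    def visit(node: str) -> None:
        for child, code, pairs in edges.get(node, []):
            condition = " AND ".join(f"{node}.{p} = {child}.{c}" for p, c in pairs)
            clauses.append(f"{JOIN_SQL[code]} {child} ON {condition}")
            visit(child)  # depth-first: finish this branch before the next sibling

    visit(start)
    return "\n".join(clauses)


# Hypothetical graph: one data set connected to two others.
edges: Edges = {
    "deviation": [
        ("batch", 2, [("deviation_id", "deviation_id")]),
        ("product", 1, [("product_code", "code"), ("product_name", "name")]),
    ]
}
sql = build_from_clause(edges, "deviation")
print(sql)
```

In a real system the table and field names would need to be validated or quoted before being placed in a statement; that is omitted here to keep the sketch short.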

Step S3, acquiring the integration attribute and the connection field of the connection line identifier of each data set group, screening the field parameter values of the branch data sets formed from the data set groups according to the integration attribute and connection field of the corresponding connection line identifier, and then combining them into an integrated data set.

In this embodiment, the step S3 further includes: acquiring the multi-connection merging rule from the integration attribute of the connection line identifiers of a branch data set, screening the field parameter values of the plurality of data set groups according to that rule, and merging the data set groups into an integrated data set, wherein the branch data set is a data set that has a plurality of connection line identifiers connected on the same side.

The step S3 further includes: generating a view result SQL statement from the integrated data set, checking the generated statement against SQL syntax, and parsing it to generate a visual report.

In this embodiment, both SQL statements generated automatically by the system and SQL statements edited by the user can be checked against SQL syntax. The final SQL statement is parsed with a lexical analysis method developed in-house. After verification succeeds, the statement can be used directly by reports inside the system or by a third-party external system. Through lexical analysis, a user-defined SQL statement is parsed and verified according to a keyword-priority principle, which may specifically include the following.

A keyword library of the SQL grammar is established by collecting all SQL keywords and functions into one set. The user-defined SQL statement is split on spaces to form an SQL statement linked list.

Each string in the linked list is matched against the keywords and functions, and matched strings are marked as keywords or functions. For an unrecognized string, the closest match in the keyword and function set is suggested; for example, if the string is "sam", the user is asked whether "sum" was intended.

For each marked string, the next node of the linked list is checked against the rules of SQL statements; two keywords cannot be directly linked to each other. If the marked string is a function expression, its function parameters are looked up in the statement linked list, and a prompt is given if the parameters do not meet the requirements. Finally, the whole SQL statement is executed to create the view.
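The checking steps above can be approximated in a few lines. This is only a sketch: the keyword and function sets are deliberately tiny, a plain Python list stands in for the linked list, closeness is measured with the standard library's `difflib.get_close_matches` (the patent does not say how the matching degree is computed), and a function call is assumed to be written without internal spaces, as in `sum(qty)`. The names `suggest` and `check_sql` are hypothetical.

```python
import difflib
from typing import List, Optional

KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR", "ON", "AS"}
FUNCTIONS = {"SUM", "AVG", "COUNT", "MIN", "MAX"}


def suggest(word: str) -> Optional[str]:
    """Return the closest keyword or function name, if any is similar enough."""
    candidates = sorted(KEYWORDS | FUNCTIONS)
    matches = difflib.get_close_matches(word.upper(), candidates, n=1, cutoff=0.6)
    return matches[0] if matches else None


def check_sql(statement: str) -> List[str]:
    """Split on whitespace, mark keywords and functions, and collect problems."""
    problems: List[str] = []
    previous_was_keyword = False
    for token in statement.split():
        raw_name, _, rest = token.partition("(")
        name = raw_name.upper()
        if name in KEYWORDS:
            if previous_was_keyword:  # rule from the text: no two linked keywords
                problems.append(f"keyword {name} directly follows another keyword")
            previous_was_keyword = True
            continue
        previous_was_keyword = False
        if name in FUNCTIONS:
            if not rest.rstrip(")"):  # nothing between the parentheses
                problems.append(f"function {name} has no parameter")
            continue
        hint = suggest(raw_name)
        if hint is not None:
            problems.append(f"unknown word '{raw_name}': did you mean {hint}?")
    return problems


print(check_sql("SELECT sam(qty) FROM t"))
```

Applied literally, the adjacency rule would also flag ordinary multi-word constructs such as `GROUP BY`, so a practical checker would need to treat those as single keywords; the source text does not address this.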

In the method disclosed by the invention, the connection queue formed by a plurality of interconnected data sets is split into a plurality of data set groups arranged in series; within each group, adjacent data sets are screened and combined pairwise, following the preset combination sequence and the integration attribute of the connection line identifier between them, to form a branch data set; finally, the branch data sets are screened and combined into a single integrated data set. Fields in a branched connection queue of multiple data sets can thus be screened and merged. The method screens and combines multiple data sets more efficiently and meets the data analysis needs of scenarios such as visually presenting the comparison and variation of monitoring data across multiple production processes.

In another embodiment, a system for selecting and integrating data in multiple data sets is also disclosed, comprising: a field acquisition module, configured to acquire each field attribute and each field parameter value in a plurality of data sets connected to one another through connection line identifiers, and to read the integration attribute of each connection line identifier; a decomposition module, configured to split the connection queue formed by connecting all the data sets into a plurality of data set groups arranged in series, to screen, following a preset combination sequence, the field parameter values of two adjacent data sets according to the integration attribute of the connection line identifier, to combine the two adjacent data sets into a branch data set, and then to screen and combine the branch data set with the next data set until all data sets have been combined; and an integration module, configured to acquire the integration attribute and the connection field of the connection line identifier of each data set group, to screen the field parameter values of the branch data sets formed from the data set groups according to the integration attribute and connection field of the corresponding connection line identifier, and then to merge them into an integrated data set.

In this embodiment, the decomposition module is further configured to acquire a starting data table node of the data set queue according to a set direction and, starting from that node, to traverse each data table node up to the end data table node of the data set queue with a depth-first algorithm according to the connection relation of each data table, forming a queue connection path; and, starting from the end data table node, to trace back along the queue connection path and record the association relation with each data table, forming at least one data set group arranged in series.

When at least one data set in the data set queue is connected on one side to a plurality of different data sets through different connection line identifiers, a plurality of data set groups arranged in series are formed after the association relations with each data table are traced back and recorded from the end data table node along the queue connection path.

It should be noted that the embodiments in this description are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be understood by reference to one another. Because the system for data selection and integration in multiple data sets disclosed in the embodiment corresponds to the method disclosed in the embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.

In other embodiments, there is further provided a device for data selection and integration in multiple data sets, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method for data selection and integration in multiple data sets as described in the embodiments above when the processor executes the computer program.

The device for data selection and integration in multiple data sets may include, but is not limited to, a processor and a memory. It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the device and does not constitute a limitation on it; the device may include more or fewer components than illustrated, may combine some components, or may use different components. For example, the device may further include an input-output device, a network access device, a bus, and the like.

The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the device for data selection and integration in multiple data sets and uses various interfaces and lines to connect the parts of the entire device.

The memory may be used to store the computer program and/or module, and the processor may implement various functions of the data selection integration apparatus device in the multiple data sets by running or executing the computer program and/or module stored in the memory and invoking the data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like, and the memory may include a high-speed random access memory, and may further include a nonvolatile memory such as a hard disk, a memory, a plug-in type hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card), at least one disk storage device, a Flash memory device, or other volatile solid-state storage device.

The device for data selection and integration in multiple data sets may be stored in a computer-readable storage medium if it is implemented in the form of software functional units and sold or used as a stand-alone product. Based on this understanding, all or part of the flow of the method embodiments described above may be implemented by a computer program instructing the related hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, may implement the steps of the above embodiments of the method for selecting and integrating data in multiple data sets. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a given jurisdiction; for example, in certain jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications and substitutions do not depart from the spirit of the invention.

In summary, the foregoing describes only the preferred embodiments of the present invention, and all equivalent changes and modifications made in accordance with the claims shall be construed as falling within the scope of the invention.

Claims

1. A method for selecting and integrating data in multiple data sets, characterized by comprising the following steps:

2. The method for selecting and integrating data in multiple data sets according to claim 1, wherein the step S2 further comprises:

3. The method for selecting and integrating data in multiple data sets according to claim 2, wherein the step S2 further comprises:

in the data set queue, one side of at least one data set is connected to a plurality of different data sets through different connection line identifiers, wherein the data set groups arranged in tandem are formed by following the queue connection path in reverse from the end data table node and acquiring and recording the association relationship between the data tables.

4. The method for selecting and integrating data in multiple data sets according to claim 3, wherein the step S3 further comprises:

acquiring a multi-connection merging rule from the integration attribute of the connection line identifiers of a branch data set, screening the field parameter values of the plurality of data set groups according to the multi-connection merging rule, and merging the data set groups to form an integrated data set, wherein the branch data set is a data set having a plurality of connection line identifiers connected on the same side.

5. The method for selecting and integrating data in multiple data sets according to claim 4, wherein the step S3 further comprises: generating a view result SQL statement according to the formed integrated data set, checking the generated view result SQL statement against SQL syntax, and parsing it to generate a visual report.
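As a non-limiting illustration of the SQL generation and checking step in claim 5, the following minimal Python sketch assembles the query behind a result view and asks a database engine to compile it before the view is created. The use of SQLite, the helper names `build_select_sql` and `check_sql`, and the tables `process_a` and `process_b` are all assumptions made for this example; the patent does not name a database, a dialect, or a checking mechanism.

```python
import sqlite3


def build_select_sql(fields, base, joins, where=None):
    """Assemble the SELECT behind a result view.

    joins: (join_type, table, left_field, right_field) tuples; join_type
    stands in for the integration attribute of a connection line identifier.
    All table and field names are hypothetical.
    """
    parts = [f"SELECT {', '.join(fields)} FROM {base}"]
    for join_type, table, left_field, right_field in joins:
        parts.append(f"{join_type} JOIN {table} ON {left_field} = {right_field}")
    if where:
        parts.append(f"WHERE {where}")
    return " ".join(parts)


def check_sql(conn, sql):
    """Return True if SQLite can compile the statement, False otherwise."""
    try:
        conn.execute("EXPLAIN " + sql)  # EXPLAIN compiles the statement but does not run the query
        return True
    except sqlite3.Error:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE process_a (batch_id INTEGER, temperature REAL)")
conn.execute("CREATE TABLE process_b (batch_id INTEGER, pressure REAL)")

select_sql = build_select_sql(
    fields=["process_a.batch_id", "temperature", "pressure"],
    base="process_a",
    joins=[("INNER", "process_b", "process_a.batch_id", "process_b.batch_id")],
    where="pressure < 5",
)
view_sql = f"CREATE VIEW integrated_view AS {select_sql}"
if check_sql(conn, select_sql):
    conn.execute(view_sql)  # create the view only after the query has passed the check
```

Checking the SELECT rather than the full CREATE VIEW statement is a design choice of this sketch: it lets the engine validate both the syntax and the referenced tables and fields before anything is written to the schema.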

6. A system for selecting and integrating data in multiple data sets, comprising:

a field acquisition module, configured to acquire each field attribute and each field parameter value in the plurality of data sets connected to one another through connection line identifiers, and to read the integration attribute of each connection line identifier;

a decomposition module, configured to split the connection queue formed by connecting all the data sets into a plurality of data set groups arranged sequentially in series, to screen, in a preset combination order, the field parameter values of two adjacent data sets according to the integration attribute of their connection line identifier and combine the two adjacent data sets into a branch data set, and then to screen and combine the branch data set with the next data set until all the data sets have been combined;

and an integration module, configured to acquire the integration attribute and the connection field of the connection line identifier of each data set group, to screen the field parameter values of the branch data sets formed from the data set groups according to the integration attribute and the connection field of the corresponding connection line identifier, and then to merge them into an integrated data set.
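As a non-limiting illustration of the sequential screening and merging recited in claim 6, the following Python sketch merges the first two data sets of a serial group into a branch data set and then merges that branch with each subsequent data set. Representing a data set as a list of dictionaries, and the integration attribute as a join type (`how`) plus a screening predicate (`keep`), are assumptions of this example, as are the function names and sample fields.

```python
def merge_pair(left, right, key, how="inner", keep=lambda row: True):
    """Screen the right-hand data set, then join it to the left on the connection field."""
    index = {}
    for row in right:
        if keep(row):  # field parameter value screening
            index.setdefault(row[key], []).append(row)
    merged = []
    for left_row in left:
        matches = index.get(left_row[key], [])
        if matches:
            merged.extend({**left_row, **right_row} for right_row in matches)
        elif how == "left":
            merged.append(dict(left_row))  # keep unmatched left rows for a left join
    return merged


def integrate(datasets, links):
    """Fold a serial group of data sets into one integrated data set.

    links[i] describes the connection between datasets[i] and datasets[i + 1].
    """
    branch = datasets[0]
    for next_dataset, link in zip(datasets[1:], links):
        branch = merge_pair(branch, next_dataset, **link)
    return branch


# Hypothetical process-monitoring data sets connected on the field "batch".
temperatures = [{"batch": 1, "temp": 70}, {"batch": 2, "temp": 75}]
pressures = [{"batch": 1, "pressure": 3}, {"batch": 2, "pressure": 9}]
operators = [{"batch": 1, "operator": "x"}]

links = [
    {"key": "batch", "how": "inner", "keep": lambda row: row["pressure"] < 5},
    {"key": "batch", "how": "left"},
]
integrated = integrate([temperatures, pressures, operators], links)
```

Because each step consumes the previous branch data set, a group of any length is reduced with the same two-data-set operation.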

7. The system for selecting and integrating data in multiple data sets according to claim 6, wherein the decomposition module is further configured to: obtain a starting data table node of the data set queue according to a set direction; traverse, by a depth-first algorithm and according to the connection relationship between the data tables, each data table node in the data set queue from the starting data table node to an end data table node, forming a queue connection path; and, following the queue connection path in reverse from the end data table node, acquire and record the association relationship between the data tables to form at least one data set group arranged in tandem.
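As a non-limiting illustration of the traversal recited in claim 7, the following Python sketch walks a connection queue depth-first from the starting data table node and then traces back from each end node to recover one serial group per branch. Modelling the queue as an acyclic adjacency dictionary directed away from the starting node, and treating each start-to-end chain as one data set group, are interpretive assumptions of this example; the table names are hypothetical.

```python
def depth_first(graph, start):
    """Traverse the connection queue depth-first; return visit order and parent links."""
    order, parent, stack = [], {start: None}, [start]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push neighbours in reverse so they are visited in their listed order.
        for neighbour in reversed(graph.get(node, [])):
            if neighbour not in parent:
                parent[neighbour] = node
                stack.append(neighbour)
    return order, parent


def serial_groups(graph, start):
    """Trace back from each end node to the start to form serially arranged groups."""
    order, parent = depth_first(graph, start)
    end_nodes = [n for n in order if not graph.get(n)]  # nodes with no onward connection
    groups = []
    for end in end_nodes:
        chain, node = [], end
        while node is not None:  # follow recorded associations in reverse
            chain.append(node)
            node = parent[node]
        groups.append(chain[::-1])  # store each group from start to end
    return groups


# Hypothetical queue: "orders" branches to "items" and "customers".
queue = {"orders": ["items", "customers"], "items": ["products"]}
groups = serial_groups(queue, "orders")
```

For this sample queue the sketch yields two groups, one per branch, each of which can then be merged pairwise in series.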

8. The system for selecting and integrating data in multiple data sets according to claim 7, wherein: in the data set queue, one side of at least one data set is connected to a plurality of different data sets through different connection line identifiers, wherein the data set groups arranged in tandem are formed by following the queue connection path in reverse from the end data table node and acquiring and recording the association relationship between the data tables.

9. A device for selecting and integrating data in multiple data sets, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1-6.

10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1-6.