CN113901094B - Data processing method, device, equipment and storage medium - Google Patents

Data processing method, device, equipment and storage medium

Info

Publication number
CN113901094B
Authority
CN
China
Prior art keywords
data
target
source
path
field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111154987.9A
Other languages
Chinese (zh)
Other versions
CN113901094A (en)
Inventor
石晓坤
张瑞
许超
孟迪
吴家林
代小亚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111154987.9A
Publication of CN113901094A
Application granted
Publication of CN113901094B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Abstract

The disclosure provides a data processing method, apparatus, device, and storage medium, and relates to the technical field of artificial intelligence, in particular to the technical fields of big data and intelligent medical care. The specific implementation scheme is as follows: determining a tracing path of data to be traced according to a generation path of the data to be traced; determining a data relationship between the data to be traced and target original data to be acquired according to the tracing path and structured configuration information associated with the generation path; and acquiring the target original data of the data to be traced according to the data relationship. With the technical scheme of the disclosure, data tracing can be completed accurately and quickly, which in turn improves the efficiency of data problem localization, data evaluation, and the like.

Description

Data processing method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technology, in particular to the field of big data and intelligent medical technology, and more particularly to a data processing method, apparatus, device, and storage medium.
Background
With the continuous development and application of internet technology, massive amounts of data are generated in different fields. In any field, such as the medical field, data in different forms are usually structured and converted into data in a standard structure to facilitate storage and use.
However, when using data in the standard structure, data personnel usually need to locate data problems and evaluate the data, and therefore need to trace the data to its source.
Disclosure of Invention
The disclosure provides a data processing method, apparatus, device and storage medium.
According to an aspect of the present disclosure, there is provided a data processing method, including:
determining a tracing path of data to be traced according to a generation path of the data to be traced;
determining a data relationship between the data to be traced and target original data to be acquired according to the tracing path and structured configuration information associated with the generation path; and
acquiring the target original data of the data to be traced according to the data relationship.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a data processing method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the data processing method according to any one of the embodiments of the present disclosure.
According to the technology of the present disclosure, data tracing can be completed accurately and quickly, which in turn improves the efficiency of data problem localization, data evaluation, and the like.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a data processing method provided according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of another data processing method provided in accordance with an embodiment of the present disclosure;
FIG. 3 is a flow chart of yet another data processing method provided in accordance with an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a data processing apparatus provided in accordance with an embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic device for implementing a data processing method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 is a flowchart of a data processing method provided according to an embodiment of the present disclosure. This embodiment is applicable to scenarios in which data needs to be traced to its source. The method may be performed by a data processing apparatus, which may be implemented in software and/or hardware and may be integrated in an electronic device that carries data processing functions. As shown in FIG. 1, the data processing method provided in this embodiment may include:
S101, determining a tracing path of the data to be traced according to the generation path of the data to be traced.
In this embodiment, the data to be traced is structured data whose original data needs to be found, and structured data is data in a standard structure obtained through structuring. Optionally, the data to be traced can be obtained from a tracing request submitted by a user. For example, if a user needs to evaluate data or locate a data problem while using structured data, the user can submit a tracing request containing the data to be traced to an electronic device with data processing functions, and the electronic device can extract the data to be traced from the request. Optionally, the electronic device in this embodiment may provide the user with an interaction channel, such as a visual interactive interface or an interaction interface.
Illustratively, the generation path of the data to be traced is the path by which that data was generated; specifically, it is the processing flow that structures the target original data to produce the data to be traced. Optionally, the generation path includes at least one structuring operation. For example, a piece of original data is converted into structured data through structuring operations such as mapping, parsing, and cleaning; the generation path of that structured data then consists of the mapping, parsing, and cleaning structuring operations. Further, when the structuring flow is complex, the generation path may be represented as a directed acyclic graph (DAG), where each node represents one structuring operation and each arrow between nodes represents the order of two structuring operations.
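As a minimal sketch of the DAG representation described above, the following Python snippet stores a generation path as a mapping from each structuring operation to the operations that must precede it, and derives one valid execution order. The dictionary layout and the name `execution_order` are assumptions made for illustration; the disclosure does not prescribe a concrete data structure.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical generation path: each key is a structuring operation and
# each value is the set of operations that must run before it.
generation_dag = {
    "mapping": set(),
    "parsing": {"mapping"},
    "cleaning": {"parsing"},
}


def execution_order(dag):
    """Return one valid execution order of the structuring operations."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(generation_dag)
```

For a simple chain such as this one, the order is just the sequence of operations; the same function also handles branching flows.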
In one implementation, the association between structured data and generation paths may be stored in advance; once the data to be traced is obtained, its generation path can be looked up from that stored association.
In this embodiment, the tracing path of the data to be traced is the tracing flow required to trace that data to its source. Optionally, the tracing path is the inverse of the generation path. Thus, once the generation path of the data to be traced is obtained, its reverse path can be used as the tracing path. For example, if the generation path consists of the mapping, parsing, and cleaning structuring operations, the tracing path consists of the cleaning, parsing, and mapping tracing operations, in that order.
For example, the tracing path may include at least one tracing operation, and each tracing operation uniquely corresponds to one structuring operation in the generation path. Further, when the tracing path includes two or more tracing operations, the association relationship between them (that is, their order or progressive relationship) is determined by the order of the structuring operations in the generation path.
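The reversal described above can be sketched as follows. This is an illustrative sketch only: the `trace_` naming prefix and both function names are hypothetical, and the DAG form assumes a mapping from each operation to its set of predecessors.

```python
def tracing_path(generation_path):
    """Reverse a linear generation path.

    Each tracing operation corresponds to exactly one structuring
    operation; the "trace_" prefix is only an illustrative naming choice.
    """
    return [f"trace_{op}" for op in reversed(generation_path)]


def reverse_dag(dag):
    """Reverse every edge of a DAG given as {node: set of predecessors}."""
    reversed_dag = {node: set() for node in dag}
    for node, predecessors in dag.items():
        for pred in predecessors:
            # Original edge pred -> node becomes node -> pred.
            reversed_dag.setdefault(pred, set()).add(node)
    return reversed_dag


path = tracing_path(["mapping", "parsing", "cleaning"])
```

Here `path` lists the cleaning, parsing, and mapping tracing operations in that order, mirroring the example in the text.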
S102, determining a data relationship between the data to be traced and the target original data to be acquired according to the tracing path and the structured configuration information associated with the generation path.
In this embodiment, the structured configuration information associated with the generation path may include the structured configuration information associated with each structuring operation in the generation path. Illustratively, the structured configuration information associated with a structuring operation is the rule by which that operation is performed on data. For example, if a parsing structuring operation needs to be performed on a piece of data, the structured configuration information associated with the parsing structuring operation is:
Figure BDA0003288383430000041
Based on this configuration information, the field named "marital" in the data can be renamed to "marital state".
The target original data is the original data from which the data to be traced was obtained through structuring. Further, the target original data may consist of a single type of data or of different types of data, such as semi-structured data and unstructured data. The data relationship between the data to be traced and the target original data to be acquired may be a correspondence between storage paths, and can represent where the content of the data to be traced is stored in the target original data. Furthermore, the data to be traced includes at least one field, so the data relationship between the data to be traced and the target original data to be acquired is essentially a combination of data relationships at the level of at least one field; that is, it can specifically represent the storage path, in the target original data, of the content corresponding to each field in the data to be traced.
For example, the data to be traced includes a "case characteristics" field, and the storage path of this field, obtained by parsing the structured configuration information associated with the generation path, is dst_path [ "first course record.
Figure BDA0003288383430000042
The two storage paths dst_path and src_path can then be used as the data relationship between the case characteristics field in the data to be traced and the corresponding content in the target original data, that is, as the correspondence between storage paths.
Further, the data relationship between the data to be traced and the target original data to be acquired is essentially an end-to-end data relationship, that is, a data relationship between the input end and the output end.
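One way to picture this field-level, end-to-end relationship is a small record type that pairs a target storage path with a source storage path. The sketch below is an assumption for illustration: the class name `FieldRelation`, the function name, and the path strings are hypothetical and are not taken from the configuration figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRelation:
    dst_path: str  # where the field is stored in the data to be traced
    src_path: str  # where the related content is stored in the original data


def build_data_relation(field_relations):
    """Combine field-level relations into one record-level relationship,
    indexed by the target storage path of each field."""
    return {rel.dst_path: rel for rel in field_relations}


# Illustrative path values only.
relation = build_data_relation([
    FieldRelation(
        dst_path="first course record.case characteristics",
        src_path="first course record.record content",
    ),
])
```

Indexing by `dst_path` makes it straightforward to look up the source location for any field a user selects in the data to be traced.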
In one implementation, the data relationship between the data to be traced and the target original data to be acquired may be determined with the help of a machine learning model. For example, the tracing path and the structured configuration information associated with the generation path may be input together into a pre-trained relationship determination model, which outputs the data relationship between the data to be traced and the target original data to be acquired.
In another implementation, because the tracing path is the reverse of the generation path, this embodiment may parse, for each tracing operation in the tracing path, the relevant configuration information in the structured configuration information associated with the generation path, and then determine the data relationship between the data to be traced and the target original data to be acquired based on the parsing results and the association relationship between the tracing operations in the tracing path.
S103, acquiring the target original data of the data to be traced according to the data relationship.
Specifically, after the data relationship between the data to be traced and the target original data to be obtained is determined, the target original data associated with the data to be traced can be obtained from the candidate original data according to the data relationship.
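A minimal sketch of this lookup, assuming the candidate original data is a nested dictionary and storage paths are dot-separated key sequences. Both assumptions, along with the function names and sample values, are illustrative; the disclosure does not fix a storage format.

```python
def resolve_path(record, path, sep="."):
    """Walk a dotted storage path through nested dicts.

    Returns None if any step of the path is missing.
    """
    current = record
    for key in path.split(sep):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def fetch_target_original(candidate_raw, data_relation):
    """data_relation maps dst_path -> src_path; return dst_path -> raw content."""
    return {
        dst: resolve_path(candidate_raw, src)
        for dst, src in data_relation.items()
    }


# Made-up sample data for illustration.
candidate_raw = {"first course record": {"record content": "Patient admitted with fever."}}
data_relation = {
    "first course record.case characteristics": "first course record.record content",
}
target_original = fetch_target_original(candidate_raw, data_relation)
```

Returning None for an unresolved path, rather than raising, is a design choice of this sketch so that one missing field does not block tracing of the others.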
It should be noted that, at present, for structured data stored in a single type of database (such as a relational database or a NoSQL database), tracing is generally performed by generating data relationships with the scripting language of that database. Composite data that combines several forms, such as structured, semi-structured, and unstructured data, is converted through structuring into data in a standard structure for users; if a user needs to trace such data while using it, the search has to be done manually.
In this embodiment, whether the data to be traced is structured data of a single type or structured data obtained by structuring composite data made up of multiple data forms, the data relationship between the data to be traced and the target original data can be determined quickly and accurately by combining the generation path with the structured configuration information associated with the generation path, and the target original data can then be acquired accurately based on that data relationship. Compared with the existing tracing approach based on a database scripting language, this approach is more general; compared with manual tracing, it is automatic and improves tracing efficiency, which in turn improves the efficiency of data problem localization and data evaluation (such as evaluating structuring accuracy).
According to the technical scheme provided by this embodiment of the disclosure, the tracing path of the data to be traced is determined according to its generation path; by combining the tracing path with the structured configuration information associated with the generation path, the data relationship between the data to be traced and the target original data to be acquired can be determined quickly and accurately, and the target original data can be acquired accurately based on that data relationship. Compared with the existing tracing approach based on a database scripting language, the scheme applies to both single-type structured data and composite data, and is therefore more general; it also does not depend on manual work, achieves automatic tracing, improves tracing efficiency, and in turn improves the efficiency of data problem localization, data evaluation, and the like.
On the basis of the above embodiment, to help the user locate data problems or evaluate data visually, the data to be traced, the target original data, and the data relationship between them can be displayed in association with one another according to a preset display mode.
For example, the data relationship between the data to be traced and the target original data may be displayed at a set position of a visual interface, and the data to be traced and the target original data may be displayed side by side, either horizontally or vertically. Further, if it is detected that the user clicks a field in the data to be traced, the related content in the target original data can be highlighted.
FIG. 2 is a flowchart of another data processing method provided according to an embodiment of the present disclosure. On the basis of the foregoing embodiment, this embodiment further explains in detail how to determine the data relationship between the data to be traced and the target original data to be acquired according to the tracing path and the structured configuration information associated with the generation path. As shown in FIG. 2, the data processing method provided in this embodiment may include:
S201, determining a tracing path of the data to be traced according to the generation path of the data to be traced.
S202, selecting the target configuration information of the tracing operation in the tracing path from the structured configuration information associated with the generation path.
Optionally, the structured configuration information associated with the generation path may include the structured configuration information associated with each structuring operation in the generation path, and each tracing operation in the tracing path uniquely corresponds to one structuring operation in the generation path. Therefore, for a tracing operation in the tracing path, the structured configuration information of its corresponding structuring operation can be used as the target configuration information of that tracing operation.
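The selection step can be sketched as a lookup through the one-to-one correspondence between tracing operations and structuring operations. All names below (`TRACE_TO_STRUCT`, `select_target_configs`, and the placeholder configuration contents) are hypothetical and only illustrate the idea.

```python
# Hypothetical one-to-one correspondence between tracing operations and
# the structuring operations they undo.
TRACE_TO_STRUCT = {
    "trace_cleaning": "cleaning",
    "trace_parsing": "parsing",
    "trace_mapping": "mapping",
}


def select_target_configs(tracing_path, structured_configs,
                          trace_to_struct=TRACE_TO_STRUCT):
    """For each tracing operation, pick the configuration of the
    structuring operation it corresponds to."""
    targets = {}
    for trace_op in tracing_path:
        struct_op = trace_to_struct[trace_op]
        targets[trace_op] = structured_configs[struct_op]
    return targets


# Placeholder configuration contents for illustration only.
structured_configs = {
    "mapping": {"rules": "mapping rules"},
    "parsing": {"rules": "parsing rules"},
    "cleaning": {"rules": "cleaning rules"},
}
targets = select_target_configs(
    ["trace_cleaning", "trace_parsing", "trace_mapping"], structured_configs
)
```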
S203, determining the data relationship corresponding to the tracing operation according to the target configuration information.
In this embodiment, the tracing operations may include a mapping tracing operation, a parsing tracing operation, a cleaning tracing operation, and the like. The mapping tracing operation infers, in reverse, the relationship information between input data (that is, source data) and output data (that is, result data) from the structured configuration information of the corresponding mapping structuring operation. Further, the structured configuration information of a mapping structuring operation generally defines the addressing process from a target field to a source field, and the source storage path in a mapping tracing operation can be described using continuous XPath.
The parsing tracing operation infers, in reverse, the relationship information between input data (that is, source data) and output data (that is, result data) from the structured configuration information of the corresponding parsing structuring operation. Further, a parsing structuring operation is generally used to process unstructured text content, and its structured configuration information generally includes the fields that need to be structured.
The cleaning tracing operation infers, in reverse, the relationship information between input data (that is, source data) and output data (that is, result data) from the structured configuration information of the corresponding cleaning structuring operation. Further, the structured configuration information of a cleaning structuring operation usually includes information about changed fields (such as fields that are added, deleted, or modified).
Optionally, for each tracing operation, the corresponding data relationship is the data relationship between the result data and the source data of that operation's target configuration information; specifically, it is the storage correspondence between the result data and the source data, and it can represent the storage path, in the source data, of the content corresponding to a field in the result data. Here, the result data is obtained by performing, on the source data and according to the target configuration information, the structuring operation corresponding to the tracing operation. For example, if the tracing operation is a cleaning tracing operation, the corresponding structuring operation is the cleaning structuring operation, and performing the cleaning structuring operation on the source data according to the target configuration information yields the result data.
Specifically, for each tracing operation, the data relationship corresponding to the tracing operation can be obtained by analyzing the target configuration information of the tracing operation.
For each tracing operation, target configuration information of the tracing operation, result data and source data of the target configuration information, and the like can be combined to determine a data relationship corresponding to the tracing operation.
S204, processing the data relationships corresponding to the tracing operations according to the association relationship between the tracing operations in the tracing path, to obtain the data relationship between the data to be traced and the target original data to be acquired.
In this embodiment, the association relationship between tracing operations represents the dependency or progressive relationship between them. For example, suppose the tracing path includes two tracing operations, tracing operation a and tracing operation b. In the generation path, structuring operation a (corresponding to tracing operation a) is executed first, and structuring operation b (corresponding to tracing operation b) is executed after it. The association relationship between the two tracing operations is therefore that tracing operation b is executed first, followed by tracing operation a.
Optionally, if only one tracing operation exists in the tracing path, the data relationship corresponding to the tracing operation may be used as the data relationship between the data to be traced and the target original data to be obtained.
Further, if the tracing path includes two or more tracing operations, the data relationships corresponding to the tracing operations may be connected in series according to the association relationship between the tracing operations in the tracing path, so as to obtain the data relationship between the data to be traced and the target original data to be obtained.
Specifically, for each field in the data to be traced, according to the association relationship between the tracing operations related to the field (for example, the field and the related field having a direct or indirect dependency relationship with the field) in the tracing path, the data relationships corresponding to the field in the tracing operations are connected in series, so that the data relationship between the field in the data to be traced and the related content in the target original data to be obtained can be obtained.
For example, the tracing path includes a mapping tracing operation and a cleaning tracing operation, and their association relationship is that the cleaning tracing operation is performed first, followed by the mapping tracing operation. Assume the data to be traced includes a "case characteristics" field. The data relationship for case characteristics is first extracted from the data relationship corresponding to the cleaning tracing operation. Using the source storage path src_path1 in that extracted relationship, the data relationship corresponding to the mapping tracing operation is searched for a target storage path dst_path2 that is identical to src_path1, and the source storage path src_path2 corresponding to dst_path2 is extracted. The target storage path dst_path1 (which corresponds to src_path1 in the data relationship of the cleaning tracing operation) and the source storage path src_path2 are then taken together as the data relationship between the case characteristics field in the data to be traced and the content related to case characteristics in the target original data to be acquired.
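The chaining in this example can be sketched as a composition of per-operation mappings from target storage path to source storage path. The sketch rests on two assumptions that the text does not state: each per-operation relationship is a flat dictionary, and a path with no entry in a later step is treated as passing through unchanged. The function name and all path strings are hypothetical.

```python
def chain_relations(relations):
    """Chain per-operation relationships into an end-to-end relationship.

    relations: list of {dst_path: src_path} dicts, ordered as in the
    tracing path (e.g. cleaning first, then mapping).
    """
    if not relations:
        return {}
    end_to_end = dict(relations[0])
    for step in relations[1:]:
        # Match src_path of the previous step against dst_path of this step.
        # Assumption: an unmatched path passed through this step unchanged.
        end_to_end = {dst: step.get(src, src) for dst, src in end_to_end.items()}
    return end_to_end


# Illustrative paths: dst_path1 -> src_path1, then dst_path2 (== src_path1) -> src_path2.
cleaning_relation = {"record.case characteristics": "record.case_features"}
mapping_relation = {"record.case_features": "/emr/doc/features"}
end_to_end = chain_relations([cleaning_relation, mapping_relation])
```

The result pairs dst_path1 directly with src_path2, which is the field-level relationship described in the example above.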
S205, acquiring the target original data of the data to be traced according to the data relationship between the data to be traced and the target original data to be acquired.
According to the technical scheme provided by this embodiment of the disclosure, tracing operations are introduced, and the data relationships corresponding to the tracing operations are processed according to the association relationship between the tracing operations in the tracing path, so that the data relationship between the data to be traced and the target original data to be acquired is obtained and the target original data can be acquired accurately based on it. By refining the tracing of complex-type data to the granularity of single-step tracing operations, the scheme reduces the overall tracing complexity and offers a new approach to tracing complex-type data; in addition, single-step tracing operations make it possible to verify the accuracy of an individual structuring operation.
FIG. 3 is a flowchart of yet another data processing method provided according to an embodiment of the present disclosure. On the basis of the above embodiment, this embodiment further explains in detail how to determine the data relationship between the data to be traced and the target original data to be acquired according to the tracing path and the structured configuration information associated with the generation path. As shown in FIG. 3, the data processing method provided in this embodiment may include:
S301, determining a tracing path of the data to be traced according to the generation path of the data to be traced.
S302, selecting the target configuration information of the tracing operation in the tracing path from the structured configuration information associated with the generation path.
S303, determining a target storage path of a target field associated with the tracing operation and a source storage path of a source field corresponding to the target field according to the target configuration information.
Optionally, the target field, the target storage path of the target field, the source field corresponding to the target field, and the source storage path of the source field (i.e., the source storage path of the source field corresponding to the target field) may be determined by analyzing the target configuration information. The target field is a field in the result data associated with the target configuration information, the source field is a field in the source data associated with the target configuration information, and further, the source field in the source data is subjected to structuring processing according to rules in the target configuration information, so that the target field in the result data can be obtained. The target storage path of the target field is the storage location of the target field in the result data associated with the target configuration information, and correspondingly, the source storage path is the storage location of the source field corresponding to the target field in the source data associated with the target configuration information.
Optionally, different tracing operations have different target configuration information, and fields are presented differently in different target configuration information. For the parsing tracing operation and the cleaning tracing operation, the relationships between fields in the target configuration information are simple, so a simple parse of the target configuration information is enough to determine the target field, the target storage path of the target field, and the source storage path of the corresponding source field quickly and accurately.
For example, the target configuration information of a parsing tracing operation is as follows:
Figure BDA0003288383430000101
The target configuration information is essentially used to structure the content of the source field "record content", adding two target fields, "case characteristics" and "preliminary diagnosis". Based on the target configuration information, it can be determined that the target storage path of the target field "case characteristics" is "dst_path": ["first course record.case characteristics"], and the source storage path of the source field (i.e., record content) corresponding to the target field "case characteristics" is "src_path": ["first course record.record content"]. Similarly, the target storage path of the target field "preliminary diagnosis" may be determined as "dst_path": ["first course record.
Further, for the mapping tracing operation, the value types of the fields may take forms such as constant, single value, nested, drill-down, and multi-table fusion, so the relationship between the fields in the target configuration information is relatively complex. The target field, the target storage path of the target field, and the source storage path of the source field corresponding to the target field may therefore be determined by analyzing the relationship between the fields in the target configuration information.
According to one embodiment, candidate fields associated with the mapping tracing operation can be determined from the target configuration information; a target field associated with the mapping tracing operation is determined from the candidate fields according to the subfield information of the candidate fields; and a target storage path of the target field and a source storage path of the source field corresponding to the target field are determined according to the target configuration information.
Specifically, the target configuration information is parsed, and all fields in it that relate to target fields are taken as candidate fields; the subfield information of each candidate field is determined according to the organization form of the data in the target configuration information; if the subfield information indicates that the candidate field has no subfield, the candidate field can be used as a target field; if the subfield information indicates that the candidate field has a subfield, the candidate field is rejected. After the target field is determined, the target storage path of the target field and the source storage path of the source field corresponding to the target field may be determined according to the organization form of the data in the target configuration information.
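The candidate-filtering step described above can be sketched as follows. This is only a minimal illustration in Python, assuming a hypothetical configuration layout in which each field is a dict with a `name` and an optional `children` list; that layout and the function names `collect_candidates` and `select_target_fields` are assumptions, not the format shown in the figures.

```python
from typing import Any, Dict, List


def collect_candidates(fields: List[Dict[str, Any]], prefix: str = "") -> List[Dict[str, Any]]:
    """Flatten a nested (hypothetical) field configuration into candidate fields."""
    candidates: List[Dict[str, Any]] = []
    for field in fields:
        path = f"{prefix}.{field['name']}" if prefix else field["name"]
        children = field.get("children", [])
        # The "subfield information" is reduced here to a single boolean.
        candidates.append({"path": path, "has_subfields": bool(children)})
        candidates.extend(collect_candidates(children, path))
    return candidates


def select_target_fields(candidates: List[Dict[str, Any]]) -> List[str]:
    """Keep candidates without subfields; reject candidates that have subfields."""
    return [c["path"] for c in candidates if not c["has_subfields"]]


# Hypothetical configuration: one parent field with three leaf fields.
config = [
    {
        "name": "first course record",
        "children": [
            {"name": "record ID"},
            {"name": "record name"},
            {"name": "disease area"},
        ],
    }
]

target_fields = select_target_fields(collect_candidates(config))
print(target_fields)
```

In this sketch the parent field is rejected because it has subfields, and only the three leaf fields remain as target fields.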
For example, the target configuration information of the mapping tracing operation is as follows:
Figure BDA0003288383430000111
The target configuration information is essentially used to structure the data in the xml data, so fields such as record ID, record name, and disease area can be used as candidate fields. According to the organization form of the data in the target configuration information, none of the candidate fields has a subfield, so each candidate field can be used as a target field; a target storage path can then be determined for each target field, together with the source storage path of the source field corresponding to that target field. For example, the target storage path of the target field "disease area" is "dst_path": ["first disease process record"], and this target field is derived from the xml data in the source field named DOCUMENT_CDA in the database. The source storage path may therefore be composed of a structured path and a semi-structured path, wherein the structured path describes which database field the xml data is located in (here, the field named DOCUMENT_CDA), and the semi-structured path describes the storage path within the xml data; that is, the source storage path may be expressed as:
Figure BDA0003288383430000121
Further, when the structured path is determined, if the field DOCUMENT_CDA is a subfield of another field, the method traces back through the parent fields until the root field is found, and splices the field names according to the relationship between the fields to obtain the structured path.
It should be noted that, for the mapping tracing operation, introducing the subfield information to handle the diversity of field value types allows the target field to be located quickly and accurately, after which the target storage path of the target field and the source storage path of the source field corresponding to the target field can be determined; this provides an optional way of performing the mapping tracing operation.
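The trace-back and splicing of the structured path might look like the sketch below. It assumes, purely for illustration, that the parent of each field is available in a `parent_of` dictionary and that field names are joined with a dot; the parent names `document` and `emr_record` are hypothetical.

```python
from typing import Dict


def build_structured_path(field: str, parent_of: Dict[str, str], sep: str = ".") -> str:
    """Walk from a field up to its root field, then join the names root-first."""
    parts = [field]
    seen = {field}
    while parts[-1] in parent_of:
        parent = parent_of[parts[-1]]
        if parent in seen:
            # Guard against a malformed configuration with a parent cycle.
            raise ValueError(f"cycle detected at field {parent!r}")
        seen.add(parent)
        parts.append(parent)
    return sep.join(reversed(parts))


# Hypothetical parent relationships; only DOCUMENT_CDA comes from the text.
parent_of = {"DOCUMENT_CDA": "document", "document": "emr_record"}
structured_path = build_structured_path("DOCUMENT_CDA", parent_of)
print(structured_path)
```

A field with no parent entry is treated as a root field, so its structured path is simply its own name.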
S304, determining a data relation corresponding to the source tracing operation according to the target storage path and the source storage path.
Optionally, for the mapping tracing operation and the cleaning tracing operation, after the target storage path of the target field associated with the tracing operation and the source storage path of the source field corresponding to the target field are determined, the two paths may be used directly as the data relationship corresponding to the tracing operation.
For the parsing tracing operation, the parsing structuring operation is usually used to process unstructured text content. Such text is uncertain, and a specific location can be determined only when actual data is involved. Therefore, to improve the efficiency of data problem localization and data evaluation, in this embodiment the specific location information of the text is added to the data relationship corresponding to the parsing tracing operation, and may be represented by offset.
Optionally, if the tracing operation includes a parsing tracing operation, the location information of the target field in the source field may be determined according to result data associated with the target configuration information of the parsing tracing operation; and the data relationship corresponding to the parsing tracing operation is determined according to the target storage path, the source storage path, and the location information.
For example, for the target configuration information of the parsing tracing operation, after the target storage paths of all target fields (such as case characteristics and preliminary diagnosis) associated with the parsing tracing operation and the source storage paths of the corresponding source fields are determined, the position information of each target field in its source field can be determined from the result data associated with the target configuration information. For example, for the case characteristics, the position information of the case characteristics in the record content may be read from the result data, and may be represented by "offset":9, or by "offset":9, 15.
Further, for each target field, after the location information of the target field in the source field is determined, the target storage path, the source storage path, and the location information of the target field may be used together as the data relationship corresponding to the target field in the parsing tracing operation.
Further, the data relationships corresponding to all the target fields associated with the parsing tracing operation may be used as the data relationship corresponding to the parsing tracing operation.
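As a rough sketch of the above, the data relationship of a parsing tracing operation can be assembled from the two storage paths plus an offset read from the result data. The Python below assumes that the offset is a `[start, end)` pair of character positions in the source text; that interpretation, the function names, and the sample text are illustrative assumptions only.

```python
from typing import Any, Dict, List


def build_parse_relations(
    path_pairs: Dict[str, str], result_offsets: Dict[str, List[int]]
) -> List[Dict[str, Any]]:
    """Build one relation per target field: dst_path, src_path and, if known, offset."""
    relations = []
    for dst_path, src_path in path_pairs.items():
        relation: Dict[str, Any] = {"dst_path": [dst_path], "src_path": [src_path]}
        if dst_path in result_offsets:
            # Position of the target field's text inside the source field.
            relation["offset"] = result_offsets[dst_path]
        relations.append(relation)
    return relations


def locate(source_text: str, offset: List[int]) -> str:
    """Return the span of source text addressed by an assumed [start, end) offset."""
    start, end = offset
    return source_text[start:end]


path_pairs = {
    "first course record.case characteristics": "first course record.record content"
}
result_offsets = {"first course record.case characteristics": [9, 15]}

relations = build_parse_relations(path_pairs, result_offsets)
snippet = locate("Summary: cough. Dx: flu", relations[0]["offset"])
print(relations[0], snippet)
```

With the offset stored in the relation, the exact span of unstructured text that produced a target field can be retrieved without re-running the parsing step.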
S305, processing the data relationship corresponding to the tracing operation according to the association relationship between the tracing operations in the tracing path, to obtain the data relationship between the data to be traced and the target original data to be acquired.
Further, if the target original data is unstructured text data stored in xml data in the database, the data relationships corresponding to the multiple source tracing operations are connected in series, and a source storage path in the finally obtained data relationship may be composed of a database query statement, an XPATH path corresponding to xml type data, location information corresponding to the text data, and the like.
For example, the tracing path of the data to be traced includes a parsing tracing operation and a mapping tracing operation, and the association relationship between the two is that the parsing tracing operation is performed first and the mapping tracing operation is performed afterwards. Assuming that the data to be traced includes the field "case characteristics", by combining the data relationships corresponding to this field in the parsing tracing operation and the mapping tracing operation, the source storage path of the related content in the target original data corresponding to the field can be obtained as follows:
Figure BDA0003288383430000131
This source storage path is composed of a database query statement (i.e., an SQL statement), an XPATH path corresponding to the xml-type data, and the location information offset corresponding to the text data.
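The series connection of per-operation data relationships can be sketched as below. The sketch assumes that each tracing operation yields a list of relations keyed by `dst_path`, and that the steps are supplied in tracing order (parsing first, then mapping). The SQL statement and XPATH value are hypothetical placeholders, not the content of the figure above.

```python
from typing import Any, Dict, List


def chain_relations(traced_path: List[str], steps: List[List[Dict[str, Any]]]) -> Dict[str, Any]:
    """Follow src_path links through each tracing operation in order."""
    current: Any = traced_path
    combined: Dict[str, Any] = {"dst_path": traced_path}
    for relations in steps:
        match = next((r for r in relations if r["dst_path"] == current), None)
        if match is None:
            raise KeyError(f"no relation produces {current!r}")
        if "offset" in match:
            combined["offset"] = match["offset"]  # carried over from the parsing step
        combined["src_path"] = match["src_path"]
        current = match["src_path"]
    return combined


parse_relations = [{
    "dst_path": ["first course record.case characteristics"],
    "src_path": ["first course record.record content"],
    "offset": [9, 15],
}]
mapping_relations = [{
    "dst_path": ["first course record.record content"],
    # Hypothetical structured (SQL) and semi-structured (XPATH) parts.
    "src_path": {
        "sql": "SELECT DOCUMENT_CDA FROM emr_document WHERE id = 1",
        "xpath": "/ClinicalDocument/component/section/text",
    },
}]

lineage = chain_relations(
    ["first course record.case characteristics"],
    [parse_relations, mapping_relations],
)
print(lineage)
```

The combined relation keeps the offset from the parsing step and replaces the intermediate source path with the database-level source path from the mapping step.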
S306, according to the data relation between the data to be traced and the target original data to be obtained, obtaining the target original data of the data to be traced.
According to the technical scheme provided by the embodiment of the disclosure, a target storage path of a target field and a source storage path of a source field corresponding to the target field are introduced, and a data relationship corresponding to a single-step tracing operation, namely a field-level data relationship, is determined based on the target storage path and the source storage path. The data relationship corresponding to the tracing operation is then processed according to the association relationship between the tracing operations in the tracing path, so as to obtain the data relationship between the data to be traced and the target original data to be acquired, namely a combination of at least one field-level data relationship, and the target original data can be accurately acquired based on this data relationship. By introducing the target storage path of the target field and the source storage path of the source field corresponding to the target field, the scheme generates field-level data relationships and improves the efficiency of subsequent data problem localization, data evaluation, and the like.
Illustratively, on the basis of any of the above embodiments, in order to locate problems more accurately, field values are introduced into the data relationship corresponding to the tracing operation, so that the data relationship between the data to be traced and the target original data to be acquired, obtained by connecting the tracing operations in series, also includes field values. A user can then quickly locate an abnormality by checking the data relationship corresponding to each tracing operation and verifying whether each structuring operation is correct.
According to an implementation manner, determining a data relationship corresponding to a source tracing operation according to a target storage path and a source storage path may be: acquiring a first field value from result data associated with target configuration information according to a target storage path; acquiring a second field value from source data associated with the target configuration information according to the source storage path; and determining a data relation corresponding to the source tracing operation according to the target storage path, the source storage path, the first field value and the second field value.
Specifically, for each target field associated with each source tracing operation, after determining a target storage path of the target field and a source storage path corresponding to the target field, a first field value, that is, a value of the target field, may be obtained from result data associated with target configuration information according to the target storage path; and according to the source storage path, acquiring a second field value from the source data associated with the target configuration information, namely the value of the source field corresponding to the target field. Then, the target storage path, the source storage path, the first field value, and the second field value may be used together as a data relationship corresponding to the target field in the tracing operation.
Further, the data relationships corresponding to all the target fields associated with the tracing operation may be used as the data relationships corresponding to the tracing operation.
It should be noted that, by introducing a field value into the data relationship corresponding to the tracing operation, a user can check the data relationship corresponding to each tracing operation to verify whether each structured operation is correct, so as to quickly locate the abnormal problem.
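A minimal sketch of attaching field values to a data relationship is given below. It assumes that result data and source data are nested dictionaries and that a storage path is a dot-separated sequence of keys; the key names `dst_value` and `src_value` and the sample records are hypothetical.

```python
from typing import Any, Dict


def get_by_path(data: Dict[str, Any], path: str, sep: str = ".") -> Any:
    """Read a value from nested dictionaries using a dot-separated storage path."""
    value: Any = data
    for key in path.split(sep):
        value = value[key]
    return value


def relation_with_values(
    dst_path: str, src_path: str, result_data: Dict[str, Any], source_data: Dict[str, Any]
) -> Dict[str, Any]:
    """Combine both storage paths with the first (target) and second (source) field values."""
    return {
        "dst_path": dst_path,
        "src_path": src_path,
        "dst_value": get_by_path(result_data, dst_path),  # first field value
        "src_value": get_by_path(source_data, src_path),  # second field value
    }


# Hypothetical sample records for illustration only.
result_data = {"first course record": {"case characteristics": "cough for 3 days"}}
source_data = {
    "first course record": {
        "record content": "Case characteristics: cough for 3 days. Preliminary diagnosis: bronchitis."
    }
}

relation = relation_with_values(
    "first course record.case characteristics",
    "first course record.record content",
    result_data,
    source_data,
)
print(relation)
```

Showing the two values side by side lets a reviewer confirm, for example, that the target value actually appears in the source value.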
Fig. 4 is a schematic structural diagram of a data processing apparatus provided according to an embodiment of the present disclosure. The embodiment of the disclosure is suitable for the situation of how to trace the source of the data. The apparatus can be implemented by software and/or hardware, and the apparatus can implement the data processing method according to any embodiment of the disclosure. As shown in fig. 4, the data processing apparatus includes:
a source tracing path determining module 401, configured to determine a source tracing path of the data to be traced according to the generated path of the data to be traced;
a data relationship determining module 402, configured to determine a data relationship between the data to be traced and the target original data to be obtained according to the tracing path and the structured configuration information associated with the generation path;
the original data selecting module 403 is configured to obtain target original data of the data to be traced according to the data relationship.
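The cooperation of the three modules can be sketched as a small pipeline. The Python below is only an illustration: each module is passed in as a callable, and the toy stand-ins assume that the tracing path is the reverse of the generation path and that the configuration maps each operation to a description of its source. Both are simplifying assumptions made here, not statements of the disclosed implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class DataProcessingApparatus:
    # Stand-ins for modules 401, 402 and 403, injected as plain callables.
    determine_tracing_path: Callable[[List[str]], List[str]]
    determine_data_relationship: Callable[[List[str], Dict[str, Any]], Dict[str, Any]]
    select_original_data: Callable[[Dict[str, Any]], Any]

    def trace(self, generation_path: List[str], structured_config: Dict[str, Any]) -> Any:
        tracing_path = self.determine_tracing_path(generation_path)
        relationship = self.determine_data_relationship(tracing_path, structured_config)
        return self.select_original_data(relationship)


# Toy stand-ins: reverse the generation path, collect each step's source, take the last one.
apparatus = DataProcessingApparatus(
    determine_tracing_path=lambda path: list(reversed(path)),
    determine_data_relationship=lambda path, cfg: {"sources": [cfg[op] for op in path]},
    select_original_data=lambda rel: rel["sources"][-1],
)

generation_path = ["mapping", "parsing"]
structured_config = {"mapping": "xml column", "parsing": "record content"}
original = apparatus.trace(generation_path, structured_config)
print(original)
```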
According to the technical scheme provided by the embodiment of the disclosure, the tracing path of the data to be traced can be determined according to the generation path of the data to be traced; by combining the tracing path and the structural configuration information associated with the generation path, the data relationship between the data to be traced and the target original data to be acquired can be quickly and accurately determined, and the target original data can be accurately acquired based on the data relationship. Compared with the existing data source tracing mode realized by the database-based scripting language, the method is applicable to the source tracing of single type structured data and the source tracing of composite data, namely the universality is strong; meanwhile, the method does not depend on manpower, realizes automatic tracing, improves the tracing efficiency, and further improves the efficiency of data problem positioning, data evaluation and the like.
Illustratively, the data relationship determination module 402 includes:
the target configuration information selection unit is used for selecting the target configuration information of the tracing operation in the tracing path from the structured configuration information associated with the generation path;
the single-step data relationship determining unit is used for determining a data relationship corresponding to the tracing operation according to the target configuration information;
and the total data relationship determining unit is used for processing the data relationship corresponding to the tracing operation according to the association relationship between the tracing operations in the tracing path to obtain the data relationship between the data to be traced and the target original data to be acquired.
Illustratively, the single-step data relationship determination unit includes:
the field information determining subunit is used for determining a target storage path of a target field associated with the source tracing operation and a source storage path of a source field corresponding to the target field according to the target configuration information;
and the single-step data relation determining subunit is used for determining the data relation corresponding to the source tracing operation according to the target storage path and the source storage path.
Illustratively, if the tracing operation includes a parsing tracing operation, the single-step data relationship determining subunit is specifically configured to:
determining the position information of the target field in the source field according to the result data associated with the target configuration information of the analysis traceability operation;
and determining a data relation corresponding to the analysis tracing operation according to the target storage path, the source storage path and the position information.
Illustratively, if the tracing operation includes a mapping tracing operation, the field information determining subunit is specifically configured to:
determining a candidate field associated with the mapping traceability operation from the target configuration information;
determining a target field associated with the mapping traceability operation from the candidate fields according to the sub-field information of the candidate fields;
and determining a target storage path of the target field and a source storage path of the source field corresponding to the target field according to the target configuration information.
Illustratively, the single-step data relationship determination subunit is specifically configured to:
acquiring a first field value from result data associated with target configuration information according to a target storage path;
acquiring a second field value from source data associated with the target configuration information according to the source storage path;
and determining a data relation corresponding to the source tracing operation according to the target storage path, the source storage path, the first field value and the second field value.
Exemplarily, the apparatus further includes:
and the display module is used for performing associated display on the data to be traced, the target original data and the data relationship between the data to be traced and the target original data to be acquired.
In the technical scheme of the disclosure, the acquisition, storage, application, and the like of the data to be traced, the original data, the structured configuration information, and other data involved all comply with relevant laws and regulations, and do not violate public order and good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 5 illustrates a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the electronic device 500 includes a computing unit 501, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the electronic device 500 can also be stored. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
A number of components in the electronic device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the electronic device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 501 executes the respective methods and processes described above, such as the data processing method. For example, in some embodiments, the data processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the data processing method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the data processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (14)

1. A method of data processing, comprising:
determining a tracing path of the data to be traced according to a generation path of the data to be traced;
determining a data relationship between the data to be traced and the target original data to be acquired according to the tracing path and the structured configuration information associated with the generation path;
acquiring target original data of the data to be traced according to the data relationship;
wherein the determining a data relationship between the data to be traced and the target original data to be acquired according to the tracing path and the structured configuration information associated with the generation path comprises:
selecting target configuration information of the tracing operation in the tracing path from the structured configuration information associated with the generation path;
determining a data relationship corresponding to the tracing operation according to the target configuration information;
and processing the data relationship corresponding to the tracing operation according to the association relationship between the tracing operations in the tracing path to obtain the data relationship between the data to be traced and the target original data to be acquired.
2. The method according to claim 1, wherein the determining a data relationship corresponding to the tracing operation according to the target configuration information includes:
determining a target storage path of a target field associated with the source tracing operation and a source storage path of a source field corresponding to the target field according to the target configuration information;
and determining a data relation corresponding to the source tracing operation according to the target storage path and the source storage path.
3. The method according to claim 2, wherein if the tracing operation includes a parsing tracing operation, the determining a data relationship corresponding to the tracing operation according to the target storage path and the source storage path includes:
determining the position information of the target field in the source field according to the result data associated with the target configuration information of the parsing tracing operation;
and determining a data relationship corresponding to the parsing tracing operation according to the target storage path, the source storage path and the position information.
4. The method according to claim 2, wherein if the tracing operation includes a mapping tracing operation, the determining, according to the target configuration information, a target storage path of a target field associated with the tracing operation and a source storage path of a source field corresponding to the target field includes:
determining candidate fields associated with the mapping traceability operation from the target configuration information;
determining a target field associated with the mapping traceability operation from the candidate fields according to the sub-field information of the candidate fields;
and determining a target storage path of the target field and a source storage path of the source field corresponding to the target field according to the target configuration information.
5. The method according to claim 2, wherein the determining the data relationship corresponding to the source tracing operation according to the target storage path and the source storage path includes:
acquiring a first field value from result data associated with the target configuration information according to the target storage path;
acquiring a second field value from the source data associated with the target configuration information according to the source storage path;
and determining a data relationship corresponding to the source tracing operation according to the target storage path, the source storage path, the first field value and the second field value.
6. The method of any of claims 1-5, further comprising:
and performing association display on the data to be traced, the target original data and the data relationship between the data to be traced and the target original data to be acquired.
7. A data processing apparatus comprising:
the source tracing path determining module is used for determining a source tracing path of the data to be traced according to a generation path of the data to be traced;
the data relationship determination module is used for determining the data relationship between the data to be traced and the target original data to be acquired according to the tracing path and the structured configuration information associated with the generation path;
the original data selection module is used for acquiring target original data of the data to be traced according to the data relation;
wherein the data relationship determination module comprises:
a target configuration information selecting unit, configured to select, from the structured configuration information associated with the generation path, target configuration information of a tracing operation in the tracing path;
the single-step data relation determining unit is used for determining a data relation corresponding to the source tracing operation according to the target configuration information;
and the total data relationship determining unit is used for processing the data relationships corresponding to the source tracing operations according to the association relationship between the source tracing operations in the source tracing path, to obtain the data relationship between the data to be traced and the target original data to be acquired.
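By way of non-limiting illustration, the cooperation of the three units above can be sketched as follows. The sketch assumes that the tracing path is the generation path walked in reverse, that each single-step relationship is a mapping from a target storage path to a source storage path, and that a step which does not mention the current field passes it through unchanged. All names and sample paths are hypothetical.

```python
from typing import Dict, List


def tracing_path(generation_path: List[str]) -> List[str]:
    """Assumption: tracing walks the generation operations in reverse order."""
    return list(reversed(generation_path))


def compose_relations(path: List[str],
                      step_relations: Dict[str, Dict[str, str]],
                      start_field: str) -> List[str]:
    """Follow target -> source links step by step and return every path visited.

    Assumption: a step whose mapping does not contain the current field is
    treated as a pass-through and leaves the field unchanged.
    """
    lineage = [start_field]
    current = start_field
    for operation in path:
        mapping = step_relations.get(operation, {})
        if current in mapping:
            current = mapping[current]
            lineage.append(current)
    return lineage


# Hypothetical generation path: raw text is parsed first, then mapped.
generation_path = ["parse", "map"]
step_relations = {
    "map": {"out.diagnosis": "parsed.diagnosis"},
    "parse": {"parsed.diagnosis": "raw.text"},
}
lineage = compose_relations(tracing_path(generation_path), step_relations, "out.diagnosis")
```

The last element of `lineage` is the storage path of the target original data, and the intermediate elements record how the single-step relationships were chained.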
8. The apparatus of claim 7, wherein the single-step data relationship determination unit comprises:
a field information determining subunit, configured to determine, according to the target configuration information, a target storage path of a target field associated with the source tracing operation, and a source storage path of a source field corresponding to the target field;
and the single-step data relation determining subunit is used for determining the data relation corresponding to the source tracing operation according to the target storage path and the source storage path.
9. The apparatus of claim 8, wherein, if the tracing operation comprises a parsing tracing operation, the single-step data relationship determination subunit is specifically configured to:
determine position information of the target field in the source field according to the result data associated with the target configuration information of the parsing tracing operation;
and determine a data relationship corresponding to the parsing tracing operation according to the target storage path, the source storage path and the position information.
10. The apparatus according to claim 8, wherein if the tracing operation includes a mapping tracing operation, the field information determining subunit is specifically configured to:
determine candidate fields associated with the mapping tracing operation from the target configuration information;
determine a target field associated with the mapping tracing operation from the candidate fields according to the sub-field information of the candidate fields;
and determine a target storage path of the target field and a source storage path of the source field corresponding to the target field according to the target configuration information.
11. The apparatus of claim 8, wherein the single-step data relationship determining subunit is specifically configured to:
acquire a first field value from result data associated with the target configuration information according to the target storage path;
acquire a second field value from source data associated with the target configuration information according to the source storage path;
and determine a data relationship corresponding to the source tracing operation according to the target storage path, the source storage path, the first field value and the second field value.
12. The apparatus of any of claims 7-11, further comprising:
and the display module is used for displaying, in association with one another, the data to be traced, the target original data, and the data relationship between the data to be traced and the target original data to be acquired.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data processing method of any one of claims 1-6.
14. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the data processing method according to any one of claims 1 to 6.
CN202111154987.9A 2021-09-29 2021-09-29 Data processing method, device, equipment and storage medium Active CN113901094B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111154987.9A CN113901094B (en) 2021-09-29 2021-09-29 Data processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111154987.9A CN113901094B (en) 2021-09-29 2021-09-29 Data processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113901094A CN113901094A (en) 2022-01-07
CN113901094B CN113901094B (en) 2022-08-23

Family

ID=79189550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111154987.9A Active CN113901094B (en) 2021-09-29 2021-09-29 Data processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113901094B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111125152A (en) * 2019-12-26 2020-05-08 积成电子股份有限公司 Full link data control method based on data processing process model
CN113434533A (en) * 2021-07-22 2021-09-24 支付宝(杭州)信息技术有限公司 Data tracing tool construction method, data processing method, device and equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2959651C (en) * 2014-09-03 2021-04-20 The Dun & Bradstreet Corporation System and process for analyzing, qualifying and ingesting sources of unstructured data via empirical attribution
CN110019116B (en) * 2017-09-26 2023-07-07 南京中兴新软件有限责任公司 Data tracing method, device, data processing equipment and computer storage medium
US10565229B2 (en) * 2018-05-24 2020-02-18 People.ai, Inc. Systems and methods for matching electronic activities directly to record objects of systems of record
CN111046085B (en) * 2019-12-19 2023-04-28 医渡云(北京)技术有限公司 Data tracing processing method and device, medium and equipment
CN111563103B (en) * 2020-04-28 2022-05-20 厦门市美亚柏科信息股份有限公司 Method and system for detecting data blood relationship
CN113326442A (en) * 2020-11-17 2021-08-31 崔海燕 Recommended material pushing method and system based on big data positioning and cloud computing center

Also Published As

Publication number Publication date
CN113901094A (en) 2022-01-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant