CN115862882B - Data extraction method, device, equipment and storage medium - Google Patents

Info

Publication number
CN115862882B
Authority
CN
China
Prior art keywords
sub
data extraction
flow
mode
condition
Legal status
Active
Application number
CN202211542433.0A
Other languages
Chinese (zh)
Other versions
CN115862882A (en)
Inventor
武惠韬
张思琦
吴家林
代小亚
黄海峰
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211542433.0A
Publication of CN115862882A
Application granted
Publication of CN115862882B

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a data extraction method, apparatus, device, and storage medium, and relates to the field of data processing, in particular to the field of big data. The specific implementation is as follows: obtaining a target data extraction mode; determining a target data extraction flow corresponding to the target data extraction mode; for each sub-flow in the target data extraction flow, obtaining configuration information of the configuration item corresponding to the sub-flow in the target data extraction mode; generating data extraction logic based on the obtained configuration information; and performing data extraction according to the data extraction logic. Applying the solution provided by the embodiments of the disclosure can improve data extraction efficiency.

Description

Data extraction method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to the field of big data technologies.
Background
In many current application scenarios, massive amounts of data exist, while users generally care about only part of that data. The data of interest therefore needs to be extracted from the massive data to meet the users' needs. For example, in a medical scenario, a clinical data center stores a large amount of unstructured and structured data, from which medical or scientific research personnel need to extract the medical data they care about.
In the prior art, a developer is usually required to write data extraction code, and an electronic device then performs data extraction by running that code.
Disclosure of Invention
The present disclosure provides a data extraction method, apparatus, device, and storage medium.
According to an aspect of the present disclosure, there is provided a data extraction method, including:
obtaining a target data extraction mode;
determining a target data extraction flow corresponding to the target data extraction mode;
for each sub-flow in the target data extraction flow, obtaining configuration information of the configuration item corresponding to the sub-flow in the target data extraction mode;
generating data extraction logic based on the obtained configuration information;
and performing data extraction according to the data extraction logic.
According to another aspect of the present disclosure, there is provided a data extraction apparatus including:
an extraction mode obtaining module, configured to obtain a target data extraction mode;
an extraction flow determining module, configured to determine a target data extraction flow corresponding to the target data extraction mode;
a configuration information obtaining module, configured to obtain, for each sub-flow in the target data extraction flow, configuration information of the configuration item corresponding to the sub-flow in the target data extraction mode;
a data extraction logic generation module, configured to generate data extraction logic based on the obtained configuration information;
and a data extraction module, configured to perform data extraction according to the data extraction logic.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data extraction method described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the above-described data extraction method.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the above-described data extraction method.
As can be seen from the above, when the solution provided by the embodiments of the present disclosure is applied to data extraction, a target data extraction mode is obtained and the target data extraction flow corresponding to it is determined. For each sub-flow in the target data extraction flow, configuration information of the configuration item corresponding to that sub-flow in the target data extraction mode is then obtained, so that data extraction logic can be generated based on the obtained configuration information and data extraction can be performed according to that logic.
In this process, once the user has configured the configuration items corresponding to the sub-flows of the target data extraction flow, the electronic device can obtain the configuration information corresponding to each sub-flow, generate the data extraction logic from it, and perform data extraction. The user does not need to write data extraction code, which lowers the technical threshold for data extraction, saves the time otherwise spent writing code, and improves data extraction efficiency.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are provided for a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:
FIG. 1 is a flowchart of a first data extraction method according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of a second data extraction method according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of a third data extraction method according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of field types according to an embodiment of the present disclosure;
FIG. 5 is a flowchart of a fourth data extraction method according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a user interface according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a data extraction flow according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a configuration according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a data streaming process according to an embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of a data extraction apparatus according to an embodiment of the present disclosure;
FIG. 11 is a block diagram of an electronic device for implementing a data extraction method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
First, the execution body of the solution provided by the embodiments of the present disclosure is described.
The execution body of the solution provided by the embodiments of the present disclosure is any electronic device with data processing, storage, and similar functions.
Next, the application scenario of the solution provided by the embodiments of the present disclosure is described.
The application scenario of the solution provided by the embodiments of the present disclosure is one in which specific data is extracted from pre-stored data.
In terms of storage form, the pre-stored data may be structured data or unstructured data. In terms of data type, the pre-stored data may be medical data, such as electronic medical reports, or any other type of data, such as commodity data, archive data, or business data.
The data extraction method provided by the embodiment of the present disclosure is specifically described below.
Referring to fig. 1, fig. 1 is a flowchart of a first data extraction method according to an embodiment of the disclosure, where the method includes the following steps S101 to S105.
Step S101: obtain a target data extraction mode.
This step is described below in terms of both the manner and the process of obtaining the target data extraction mode.
In terms of the manner of obtaining, the target data extraction mode may be obtained in the following ways.
In one embodiment, a target data extraction mode selected by a user through a user interface may be obtained.
Specifically, the available data extraction modes may be displayed on the user interface, and the target data extraction mode selected by the user is then obtained by monitoring the user's selection operation on the user interface.
In another embodiment, a target data extraction mode set by the user in a configuration file may be obtained.
Specifically, the user may write, in the configuration file, a configuration statement that sets the target data extraction mode; the electronic device then parses the configuration file according to a preset configuration file parsing rule to obtain the target data extraction mode expressed by the configuration statement.
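The configuration-file route can be illustrated with a minimal sketch. It assumes a JSON configuration file, a hypothetical key `target_mode`, and hypothetical mode names; the disclosure does not specify the file format or the parsing rule, so all of these are illustrative only.

```python
import json

# Hypothetical names for the data extraction modes known to the device.
SUPPORTED_MODES = {"mapping", "extraction_condition", "expression"}


def parse_target_mode(config_text: str) -> str:
    """Parse the configuration text and return the target data extraction mode."""
    config = json.loads(config_text)
    mode = config.get("target_mode")  # assumed key, not taken from the disclosure
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported data extraction mode: {mode!r}")
    return mode


example_config = '{"target_mode": "extraction_condition"}'
target_mode = parse_target_mode(example_config)
```

Rejecting unknown mode names at parse time keeps the later steps from having to handle a mode with no corresponding flow.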
In terms of the process of obtaining, the target data extraction mode may be obtained in the following ways.
In one embodiment, the target data extraction mode may be obtained directly.
In another embodiment, a main mode of data extraction may be obtained first, and the target data extraction mode may then be obtained from the sub-modes supported by the main mode.
For a detailed description of the main mode and the sub-modes it supports, see the embodiments shown in fig. 3 and fig. 5, which are not described in detail here.
It should be noted that the sub-modes supported by the main mode may be understood as the sub-modes that the main mode includes in cascade, which covers the following two cases:
In one case, the sub-mode supported by the main mode is a sub-mode directly included in the main mode; such a sub-mode may also be referred to as a direct sub-mode of the main mode.
In the other case, the sub-mode supported by the main mode is a sub-mode supported by another mode that the main mode directly includes; such a sub-mode may also be referred to as an indirect sub-mode of the main mode.
In this way, the main mode of data extraction is determined first, and the target data extraction mode is then determined from the sub-modes supported by the main mode, so that the target data extraction mode can be obtained conveniently and intuitively in a hierarchical manner.
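The cascade of direct and indirect sub-modes can be sketched as a small tree walk. The mode names and the nesting below are hypothetical and chosen only to show both cases; the disclosure does not prescribe a data structure.

```python
# Hypothetical hierarchy: each mode maps to the modes it directly includes.
MODE_TREE = {
    "variable_rule": ["mapping", "extraction_condition", "expression"],
    # Illustrative nesting only, to show an indirect sub-mode.
    "extraction_condition": ["condition_with_constraint"],
}


def direct_sub_modes(main_mode, tree=MODE_TREE):
    """Sub-modes directly included in main_mode."""
    return list(tree.get(main_mode, []))


def supported_sub_modes(main_mode, tree=MODE_TREE):
    """All sub-modes included in cascade: direct and indirect ones."""
    result = []
    stack = list(reversed(tree.get(main_mode, [])))
    while stack:
        mode = stack.pop()
        result.append(mode)
        # Descend into the modes that this sub-mode itself includes.
        stack.extend(reversed(tree.get(mode, [])))
    return result


all_modes = supported_sub_modes("variable_rule")
indirect = [m for m in all_modes if m not in direct_sub_modes("variable_rule")]
```

Here `indirect` contains only the sub-mode reached through another mode, which corresponds to the second case described above.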
Step S102: determine a target data extraction flow corresponding to the target data extraction mode.
Specifically, the target data extraction flow corresponding to the obtained target data extraction mode may be determined according to a preset correspondence between data extraction modes and data extraction flows.
For example, if the target data extraction mode is mode P1, the target data extraction flow corresponding to mode P1 is determined to include sub-flows P1, P2, P3, and so on.
For a detailed description of the target data extraction flow corresponding to each target data extraction mode, see the embodiments shown in fig. 3 and fig. 5, which are not described in detail here.
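The preset correspondence can be sketched as a lookup table. The two entries below follow the sub-flows that the disclosure later associates with the first and second sub-modes; the identifier names themselves are hypothetical.

```python
# Preset correspondence between data extraction modes and their sub-flows.
# Identifier names are illustrative; only the pairing follows the disclosure.
MODE_TO_FLOW = {
    "mapping": ["mapping_relation_config"],
    "extraction_condition": [
        "extraction_condition_config",
        "output_condition_config",
    ],
}


def determine_flow(target_mode):
    """Return the ordered list of sub-flows for the given target mode."""
    try:
        return list(MODE_TO_FLOW[target_mode])
    except KeyError:
        raise ValueError(f"no flow configured for mode {target_mode!r}") from None


flow = determine_flow("extraction_condition")
```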
Step S103: for each sub-flow in the target data extraction flow, obtain configuration information of the configuration item corresponding to the sub-flow in the target data extraction mode.
Each sub-flow corresponds to a configuration item in the target data extraction mode, and in this step the configuration information of the configuration item corresponding to each sub-flow needs to be obtained.
The configuration information is used to generate the data extraction logic. Different sub-flows have different configuration items, and the configuration information corresponding to those configuration items also differs, as briefly illustrated below.
For example, for the extraction condition configuration sub-flow, the configuration information of the corresponding configuration item may be information on the field to be extracted, information corresponding to the condition description element, and information corresponding to the condition logic symbol. The meaning of each type of information is described in detail in the description of step S307 in the embodiment shown in fig. 3.
As another example, for the mapping relation configuration sub-flow, the configuration information of the corresponding configuration item may be the fields for which a mapping relationship is to be set.
For the configuration information corresponding to each sub-flow, including the mapping relation configuration sub-flow and the extraction condition configuration sub-flow mentioned in the above examples, see the embodiments shown in fig. 3 and fig. 5, which are not described in detail here.
In this step, similarly to the manner of obtaining the target data extraction mode in step S101, the configuration information may be obtained from the configuration that the user performs on the configuration items presented in the user interface, or from the configuration information that the user writes for each configuration item in the configuration file.
Step S104: generate data extraction logic based on the obtained configuration information.
Specifically, the data extraction logic may be generated based on the obtained configuration information in the following ways.
In one embodiment, the combination condition corresponding to each sub-flow may be generated based on the data extraction rule and the configuration information corresponding to the sub-flow, and a data query statement expressing the obtained combination condition may then be generated as the data extraction logic. For details, see steps S204 to S206 of the embodiment shown in fig. 2, which are not described in detail here.
In another embodiment, the data extraction logic may be generated directly from the configuration information. This embodiment applies when the configuration information contains a complete data extraction rule.
For example, if the configuration information that the user writes in the configuration file for the configuration item of the extraction condition configuration sub-flow contains a complete data extraction rule, a data query statement can be generated directly from that rule and used as the data extraction logic.
The data query statement may be a statement written in a data query language such as SQL (Structured Query Language), HQL (Hibernate Query Language), or JPQL (Java Persistence Query Language).
Step S105: perform data extraction according to the data extraction logic.
Specifically, after the data extraction logic is generated, it may be run to perform data extraction and obtain a data extraction result.
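Steps S104 and S105 can be sketched end to end with an in-memory SQLite database. The table name `records`, the column names, and the shape of the `config` dictionary are assumptions made for this illustration; the disclosure names SQL only as one possible query language and does not fix a schema.

```python
import sqlite3

# Hypothetical whitelist of columns that configuration may refer to.
ALLOWED_FIELDS = {"patient_id", "symptom", "disease"}


def generate_extraction_logic(config):
    """Step S104 sketch: build a parameterized SQL query from configuration."""
    fields = config["fields"]
    cond_field, cond_value = config["condition"]
    if not (set(fields) | {cond_field}) <= ALLOWED_FIELDS:
        raise ValueError("unknown field in configuration")
    sql = (
        f"SELECT {', '.join(fields)} FROM records "
        f"WHERE {cond_field} = ? ORDER BY patient_id"
    )
    return sql, (cond_value,)


def extract(conn, logic):
    """Step S105 sketch: run the generated logic and return the result rows."""
    sql, params = logic
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (patient_id INTEGER, symptom TEXT, disease TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [(1, "cough", "flu"), (2, "fever", "flu"), (3, "cough", "asthma")],
)

config = {"fields": ["patient_id", "disease"], "condition": ("symptom", "cough")}
rows = extract(conn, generate_extraction_logic(config))
```

Field names are checked against a whitelist and values are passed as query parameters, so user-supplied configuration never ends up as raw SQL text.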
As can be seen from the above, when the solution provided by the embodiments of the present disclosure is applied to data extraction, a target data extraction mode is obtained and the target data extraction flow corresponding to it is determined. For each sub-flow in the target data extraction flow, configuration information of the configuration item corresponding to that sub-flow in the target data extraction mode is then obtained, so that data extraction logic can be generated based on the obtained configuration information and data extraction can be performed according to that logic.
In this process, once the user has configured the configuration items corresponding to the sub-flows of the target data extraction flow, the electronic device can obtain the configuration information corresponding to each sub-flow, generate the data extraction logic from it, and perform data extraction. The user does not need to write data extraction code, which lowers the technical threshold for data extraction, saves the time otherwise spent writing code, and improves data extraction efficiency.
In addition, the target data extraction mode determines the target data extraction flow, and the configuration information corresponding to each sub-flow in that flow determines the data extraction logic used for extraction. In other words, the solution provided by the embodiments of the present disclosure obtains the target data extraction flow, the configuration information, and the data extraction logic specifically for each target data extraction mode. The user can therefore flexibly select a target data extraction mode according to their own data extraction needs, which improves the flexibility and practicability of the data extraction solution as well as the user experience.
On the basis of the embodiment shown in fig. 1, when generating the data extraction logic based on the obtained configuration information, the combination condition corresponding to each sub-flow may be generated based on the data extraction rule and the configuration information corresponding to the sub-flow, and then the data query statement of the obtained combination condition may be generated as the data extraction logic. In view of the foregoing, embodiments of the present disclosure provide a second data extraction method.
Referring to fig. 2, fig. 2 is a flowchart of a second data extraction method according to an embodiment of the disclosure, where the method includes the following steps S201 to S207.
Step S201: obtain a target data extraction mode.
Step S202: determine a target data extraction flow corresponding to the target data extraction mode.
Step S203: for each sub-flow in the target data extraction flow, obtain configuration information of the configuration item corresponding to the sub-flow in the target data extraction mode.
Steps S201 to S203 are the same as steps S101 to S103 in the embodiment shown in fig. 1 and are not described again here.
After steps S201 to S203 are performed, the data extraction logic corresponding to each sub-flow may be generated according to the following steps S204 to S206.
Step S204: combine the configuration information corresponding to the sub-flow according to the data extraction rule corresponding to the sub-flow to generate a target condition unit.
The data extraction rule may be a preset rule corresponding to the sub-flow, or a rule determined according to the configuration information corresponding to the sub-flow. The manner of generating the target condition unit in each of these two cases is illustrated below.
Example 1: the data extraction rule is a preset rule corresponding to the sub-flow.
For example, for the mapping relation configuration sub-flow, the corresponding preset data extraction rule may be to map one field to another field.
In this case, field a and field b may be obtained from the configuration information corresponding to the mapping relation configuration sub-flow, and the two fields may be combined according to the rule to obtain the target condition unit: field a→field b.
Example 2: the data extraction rule is a rule determined based on the configuration information corresponding to the sub-flow.
For example, for the extraction condition configuration sub-flow, the corresponding configuration information includes the following information c1 to c4: symptom, sign, disease, and condition a. It also contains information characterizing the logical relationship among these items: c1 or c2 or c3 = c4.
In this case, the data extraction rule may be determined according to the condition description information, and the configuration information c1 to c4 may then be combined according to the data extraction rule to obtain the target condition unit: symptom/sign/disease = condition a, where "/" indicates the logical relationship "or".
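The two examples can be sketched as small helper functions that assemble a target condition unit as text. The function names and the string notation are hypothetical; the disclosure does not specify how a condition unit is represented internally.

```python
def mapping_unit(source_field, target_field):
    """Example 1 sketch: preset rule 'map one field to another field'."""
    return f"{source_field} -> {target_field}"


def condition_unit(fields, value, relation="or"):
    """Example 2 sketch: rule derived from the configured logical relationship.

    Only the 'or' relationship from the example is handled here.
    """
    if relation != "or":
        raise ValueError("only the 'or' relationship is sketched")
    # "/" stands for the logical relationship "or", as in the text above.
    return f"{'/'.join(fields)} = {value}"


unit_1 = mapping_unit("field a", "field b")
unit_2 = condition_unit(["symptom", "sign", "disease"], "condition a")
```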
In one embodiment of the present disclosure, step S204 may be implemented as follows:
combine the information corresponding to the field to be extracted, the condition description element, and the condition logic symbol in the configuration information corresponding to the sub-flow to generate a first condition unit; and obtain, according to the information corresponding to the condition constraint element in that configuration information, a second condition unit for constraining the generated first condition unit, thereby obtaining a target condition unit comprising the first condition unit and the second condition unit. For details of this embodiment, see steps D to E of the embodiment shown in fig. 3, which are not described in detail here.
Step S205: combine the generated target condition units according to the logical relationship among the condition units configured for the sub-flow to obtain a combination condition.
The logical relationship may be "and", "or", "not", or the like, as represented by a condition symbol.
For example, if the logical relationship between target condition unit a1 and target condition unit a2 is "and", and the logical relationship between the combination of a1 and a2 and target condition unit a3 is "or", the combination condition obtained by combining condition units a1, a2, and a3 is: (target condition unit a1 and target condition unit a2) or target condition unit a3.
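The combination in this example can be sketched with a helper that joins condition units under a logical relationship and parenthesizes the result so that nesting stays unambiguous. The function name and the text notation are hypothetical.

```python
def combine(relation, *units):
    """Combine condition units under 'and', 'or', or 'not'."""
    if relation == "not":
        if len(units) != 1:
            raise ValueError("'not' takes exactly one condition unit")
        return f"not ({units[0]})"
    if relation not in ("and", "or"):
        raise ValueError(f"unsupported logical relationship: {relation!r}")
    if len(units) < 2:
        raise ValueError(f"'{relation}' needs at least two condition units")
    return "(" + f" {relation} ".join(units) + ")"


# (a1 and a2) or a3, as in the example above.
inner = combine("and", "a1", "a2")
combined = combine("or", inner, "a3")
```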
In one embodiment of the present disclosure, step S205 may be implemented as follows:
combine each first condition unit with its corresponding second condition unit according to the information corresponding to the constraint logic type in the configuration information corresponding to the sub-flow, and then combine the resulting first condition units according to the information corresponding to the condition logic type in that configuration information to obtain the combination condition. For details of this embodiment, see steps F1 to F2 of the embodiment shown in fig. 3, which are not described in detail here.
Step S206: generate a data query statement expressing the combination condition as the data extraction logic corresponding to the sub-flow.
Specifically, after the combination condition is obtained, a data query statement that expresses the combination condition may be generated as the data extraction logic corresponding to the sub-flow.
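As a minimal sketch of step S206, a combination condition can be held as a nested tuple and rendered into a parameterized SQL WHERE clause. The tuple encoding, the table `records`, and its columns are assumptions for illustration; the disclosure does not define how the combination condition is stored or which query language is used.

```python
import sqlite3


def to_where(cond):
    """Render a nested condition as (sql_fragment, params)."""
    op = cond[0]
    if op == "unit":
        _, field, value = cond
        if not field.isidentifier():
            raise ValueError(f"invalid field name: {field!r}")
        return f"{field} = ?", [value]
    if op == "not":
        sql, params = to_where(cond[1])
        return f"NOT ({sql})", params
    if op in ("and", "or"):
        parts, params = [], []
        for child in cond[1:]:
            child_sql, child_params = to_where(child)
            parts.append(f"({child_sql})")
            params.extend(child_params)
        return f" {op.upper()} ".join(parts), params
    raise ValueError(f"unsupported operator: {op!r}")


# (symptom = cough AND sign = fever) OR disease = flu
cond = (
    "or",
    ("and", ("unit", "symptom", "cough"), ("unit", "sign", "fever")),
    ("unit", "disease", "flu"),
)
where, params = to_where(cond)
statement = f"SELECT id FROM records WHERE {where} ORDER BY id"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER, symptom TEXT, sign TEXT, disease TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [(1, "cough", "fever", "cold"), (2, "cough", "none", "flu"), (3, "cough", "none", "cold")],
)
matching_ids = [row[0] for row in conn.execute(statement, params)]
```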
Step S207: perform data extraction according to the data extraction logic.
As described above, when generating the data extraction logic based on the obtained configuration information, for each sub-flow the configuration information corresponding to the sub-flow is first combined according to the data extraction rule corresponding to the sub-flow to generate target condition units; the generated target condition units are then combined according to the logical relationship among the condition units configured for the sub-flow to obtain a combination condition; and a data query statement expressing the combination condition is then generated as the data extraction logic corresponding to the sub-flow. By generating the target condition units, combining them, and generating the data query statement in sequence, the final data extraction logic can be derived from the configuration information accurately and in an orderly, step-by-step manner.
In one embodiment of the present disclosure, step S207 may be implemented by the following steps A to C:
Step A: generate a data query request based on the data query statement.
Specifically, a data query request carrying the data query statement may be generated.
Step B: send the data query request to a data query end, so that the data query end performs data query and data extraction based on the data query request.
The data query end may be any type of electronic device, such as a server on which a data query engine is deployed.
After receiving the data query request, the data query end can obtain the data query statement carried in the request, run the statement to perform the data query, and extract the queried data.
Step C: receive the data extraction result fed back by the data query end.
In this way, the data query end performs the data query and data extraction, and the electronic device only receives the data extraction result fed back by the data query end. The electronic device serving as the execution body of the solution does not need to perform the actual data extraction, which saves its computing resources and improves data extraction efficiency.
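Steps A to C can be sketched with the data query end simulated in-process. The JSON request format, the class `InProcessQueryEnd`, and the `labs` table are all hypothetical; a real deployment would send the request over a network to a server running a query engine, which the disclosure does not detail.

```python
import json
import sqlite3


def build_request(statement, params=()):
    """Step A: wrap the data query statement in a request payload."""
    return json.dumps({"statement": statement, "params": list(params)})


class InProcessQueryEnd:
    """Stand-in for a remote data query end, backed by in-memory SQLite."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE labs (patient_id INTEGER, rbc REAL)")
        self.conn.executemany(
            "INSERT INTO labs VALUES (?, ?)", [(1, 4.2), (2, 5.1)]
        )

    def handle(self, request):
        """Run the statement carried in the request and return the result."""
        payload = json.loads(request)
        rows = self.conn.execute(payload["statement"], payload["params"]).fetchall()
        return json.dumps({"rows": rows})


query_end = InProcessQueryEnd()
request = build_request("SELECT patient_id FROM labs WHERE rbc > ?", [5.0])
response = query_end.handle(request)    # Step B: send the request
result = json.loads(response)["rows"]   # Step C: receive the extraction result
```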
On the basis of the embodiments shown in fig. 1 and fig. 2, the target data extraction mode obtained may differ depending on the main mode of data extraction, and the target data extraction flows corresponding to the target data extraction modes differ accordingly. The manner of obtaining the target data extraction flows corresponding to the different main modes and sub-modes is described below with reference to fig. 3 and fig. 5, respectively.
First, the embodiment shown in fig. 3 describes how the target data extraction mode, and the target data extraction flow for each sub-mode, are obtained when the main mode is the second data extraction mode, which generates rules based on variables.
Referring to fig. 3, fig. 3 is a flowchart of a third data extraction method according to an embodiment of the disclosure, where the method includes the following steps S301 to S308.
Step S301: obtain a main mode of data extraction from among a first data extraction mode that references existing rules and a second data extraction mode that generates rules based on variables.
Generating rules based on variables can be understood as setting a variable that represents a data extraction target and generating a data extraction rule for that variable.
There may be one or more variables representing data extraction targets.
Dividing the main modes into a first data extraction mode that references existing rules and a second data extraction mode that generates rules based on variables makes it convenient to then select a sub-mode under either main mode.
Step S302: in the case where the main mode is the second data extraction mode, obtain a target data extraction mode.
Specifically, the target data extraction mode may be obtained from the following sub-modes of data extraction:
1. A first sub-mode reflecting the mapping relationship between the field to be extracted corresponding to the variable and a set field.
The fields to be extracted corresponding to the variable may be all fields related to the data extraction target.
For example, if the data extraction target represented by the variable is the red blood cell count test result within 30 minutes before a heart bypass operation, the fields to be extracted may be fields such as heart bypass, routine blood test, and red blood cells.
The set field may be an actual field of the data stored in the database on which data extraction is to be performed.
2. A second sub-mode reflecting the extraction condition to be met by the field to be extracted corresponding to the variable.
The extraction condition is a condition for extracting data and is generally formed by combining fields with the logical relationships among them; for example, the extraction condition in the earlier example may be: symptom/sign/disease = condition a.
3. A third sub-mode corresponding to an expression described by the fields to be extracted corresponding to the variable.
The expression is described by the fields to be extracted. For example, if the fields to be extracted include "lesion time", "disease progression time", and "diagnosis time", the expression described by these fields may be "lesion time = disease progression time - diagnosis time".
It can be seen that the second data extraction mode, which generates rules based on variables, includes a first sub-mode for setting the mapping relationship between the field to be extracted and the set field, a second sub-mode for setting the extraction condition to be met by the field to be extracted, and a third sub-mode for setting the expression described by the fields to be extracted corresponding to the variable. These sub-modes serve different configuration purposes and together meet the user's data extraction needs more comprehensively.
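The third sub-mode can be illustrated with the expression from the example above. The sketch assumes the expression is configured as a tuple of (target field, left field, operator symbol, right field) and that the time fields hold dates; neither the encoding nor the field types are specified by the disclosure.

```python
import operator
from datetime import date

# Operators supported by this sketch; a hypothetical, deliberately small set.
OPERATORS = {"+": operator.add, "-": operator.sub}


def apply_expression(expression, record):
    """Evaluate a configured expression over a record and add the result field."""
    target, left, symbol, right = expression
    if symbol not in OPERATORS:
        raise ValueError(f"unsupported operator: {symbol!r}")
    value = OPERATORS[symbol](record[left], record[right])
    return {**record, target: value}


# lesion time = disease progression time - diagnosis time
expression = ("lesion_time", "disease_progression_time", "-", "diagnosis_time")
record = {
    "disease_progression_time": date(2023, 3, 1),
    "diagnosis_time": date(2023, 1, 1),
}
result = apply_expression(expression, record)
lesion_days = result["lesion_time"].days
```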
Step S303: if the target data extraction mode is the first sub-mode, determining that the target data extraction flow corresponding to the target data extraction mode comprises a mapping relation configuration sub-flow.
The mapping relation configuration sub-flow is used for setting the mapping relation between the field to be extracted and the set field.
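As an illustration only, the mapping relationship set by this sub-flow can be sketched as a lookup table. The physical field names (`surgery.operation_name`, `lab_result.rbc_count`) and the helper `resolve_fields` are hypothetical and are not taken from the disclosure.

```python
# Hypothetical mapping: field to be extracted -> set field actually stored in the database.
FIELD_MAPPING = {
    "heart bypass": "surgery.operation_name",
    "red blood cells": "lab_result.rbc_count",
}


def resolve_fields(fields_to_extract, mapping=FIELD_MAPPING):
    """Return the set fields for the requested fields, failing loudly on unmapped ones."""
    missing = [name for name in fields_to_extract if name not in mapping]
    if missing:
        raise KeyError(f"no mapping configured for: {missing}")
    return [mapping[name] for name in fields_to_extract]


print(resolve_fields(["heart bypass", "red blood cells"]))
```

Rejecting unmapped fields up front is a design choice of this sketch, so that a missing mapping surfaces at configuration time rather than during extraction.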
Step S304: if the target data extraction mode is the second sub-mode, determining that the target data extraction flow includes an extraction condition configuration sub-flow and an output condition configuration sub-flow.
Wherein the output condition configuration sub-flow includes at least one of the following sub-flows: an output type configuration sub-flow, a rule configuration sub-flow for output data, and an aggregation rule configuration sub-flow for output data.
Specifically, the operation of each of the above-described output condition configuration sub-processes will be described below.
1. The output type configuration sub-flow.
This sub-flow is used to set the data type of the data that is output when the extraction condition is satisfied or not satisfied. The data type may include an enumeration dictionary type, a variable type, a fixed value type, and the like.
For example, when the extraction condition is satisfied, data of the enumeration dictionary type is output; when the extraction condition is not satisfied, data of the fixed value type or the like is output.
2. The rule configuration sub-flow for output data.
This sub-flow is used to normalize the output data, that is, to map the output data onto preset standard data.
3. The aggregation rule configuration sub-flow for output data.
This sub-flow is used to further aggregate the output data to obtain a scalar value.
If the variable representing the data extraction target is a categorical variable, the aggregation operation on the output data may be taking the first output data, taking the last output data, taking the number of output data, and the like.
If the variable representing the data extraction target is a numerical variable, the aggregation operation on the output data may be taking the first output data, taking the last output data, taking the maximum value, taking the minimum value, taking the average value, taking the number of output data, and the like.
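The normalization and aggregation behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions: the operation labels (`"first"`, `"min"` and so on), the function names, and the choice to pass unmapped values through unchanged are all hypothetical and not specified by the disclosure.

```python
from statistics import mean

# Aggregation operations named in the text; the string labels are illustrative.
CATEGORICAL_AGGS = {
    "first": lambda values: values[0],
    "last": lambda values: values[-1],
    "count": len,
}
NUMERICAL_AGGS = {**CATEGORICAL_AGGS, "max": max, "min": min, "mean": mean}


def normalize(values, standard_map):
    """Map raw output values onto preset standard values (assumption: unmapped values pass through)."""
    return [standard_map.get(value, value) for value in values]


def aggregate(values, op, numerical):
    """Reduce the output data to a scalar using an operation allowed for the variable kind."""
    table = NUMERICAL_AGGS if numerical else CATEGORICAL_AGGS
    if op not in table:
        raise ValueError(f"aggregation {op!r} is not available for this variable kind")
    if not values:
        return 0 if op == "count" else None
    return table[op](values)


flags = normalize(["pos", "+", "neg"], {"pos": "positive", "+": "positive", "neg": "negative"})
print(aggregate(flags, "last", numerical=False))          # negative
print(aggregate([2.4, 1.8, 3.1], "min", numerical=True))  # 1.8
```

Keeping separate operation tables for categorical and numerical variables mirrors the distinction drawn in the text: requesting `max` for a categorical variable is rejected rather than silently computed.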
The actions of the respective sub-flows included in the output condition configuration sub-flow are intuitively described below by table 1.
TABLE 1
The first column in table 1 represents a main mode of data extraction, the second column represents a sub-mode supported by the main mode, the third column represents an effect of configuring a sub-flow for an output type, the fourth column represents an effect of configuring a sub-flow for a rule of output data, and the fifth column represents an effect of configuring a sub-flow for an aggregation rule of output data.
Step S305: if the target data extraction mode is the third sub-mode, determining that the target data extraction flow comprises a field configuration sub-flow to be extracted and an expression configuration sub-flow.
The above-mentioned field configuration sub-process to be extracted may be used to set a mapping relationship between the field to be extracted and the set field.
The above-described expression configuration sub-flow is used to obtain an expression based on the field description to be extracted.
It can be seen that when the main data extraction mode is the second data extraction mode based on the variable generation rule, the target data extraction flow is different according to the sub-mode as the target data extraction mode, so that the extraction flow is more targeted.
The target data extraction mode and the corresponding target data extraction flow described in the above steps S303 to S305 can be visually represented by the following table 2.
TABLE 2
In table 2, the first column represents a main mode of data extraction, the second column represents a sub-mode supported by the main mode, and the third column represents a target data extraction flow corresponding to the sub-mode.
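The correspondence between sub-modes and sub-flows in steps S303 to S305 can be sketched as a simple dispatch table. The enum and the string identifiers of the sub-flows are illustrative names chosen for this sketch, not names used by the disclosure.

```python
from enum import Enum


class SubMode(Enum):
    FIELD_MAPPING = 1  # first sub-mode
    CONDITION = 2      # second sub-mode
    EXPRESSION = 3     # third sub-mode


# Sub-flows per sub-mode, following steps S303 to S305.
SUB_FLOWS = {
    SubMode.FIELD_MAPPING: ["mapping_relation_config"],
    SubMode.CONDITION: ["extraction_condition_config", "output_condition_config"],
    SubMode.EXPRESSION: ["field_to_extract_config", "expression_config"],
}


def determine_extraction_flow(sub_mode):
    """Return the ordered sub-flows that make up the target data extraction flow."""
    return list(SUB_FLOWS[sub_mode])


print(determine_extraction_flow(SubMode.CONDITION))
```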
Step S306: for each sub-flow in the target data extraction flow, configuration information of a configuration item corresponding to the sub-flow in the target data extraction mode is obtained.
Configuration information of the configuration items corresponding to the respective sub-flows mentioned in the foregoing step S303 to step S305 in the target data extraction mode will be briefly described below.
1. Mapping relation configuration sub-flow
The corresponding configuration information may be names of fields in which the mapping relationship is to be set.
2. Extraction condition configuration sub-process
The corresponding configuration information can be information corresponding to the field to be extracted, the condition description element and the condition logic symbol. The meaning of the above information is referred to in the following examples.
3. Output type configuration sub-flow
The corresponding configuration information may be the type of the output data.
4. Rule configuration sub-flow for output data
The corresponding configuration information may be the standardized data set by the user and the mapping relationship between the output data and the standardized data.
5. Aggregation rule configuration sub-flow for output data
The corresponding configuration information may be the aggregation operation to apply to the output data.
6. Configuration sub-flow for the field to be extracted
The corresponding configuration information may be the names of the fields for which the mapping relationship is to be set.
7. Expression configuration sub-flow
The corresponding configuration information may be the fields to be extracted and the logical relationships between them, so that a custom expression can be generated by combining them.
The corresponding configuration information may also be the identifier of a preset expression, so that the preset expression corresponding to the identifier can be selected.
The preset expression may be the BMI (body mass index) formula, the CCr (creatinine clearance) formula, or the like.
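As a hedged sketch of the expression configuration sub-flow, the snippet below evaluates either a preset expression selected by identifier or a custom expression over named fields. The field names (`weight_kg`, `height_m`), the registry `PRESET_EXPRESSIONS` and the evaluator are assumptions made for illustration; only the standard BMI formula (weight divided by height squared) is shown, and the expression syntax used by the disclosure is not specified.

```python
import ast
import operator

# Preset expressions selectable by identifier (hypothetical registry; only BMI is sketched).
PRESET_EXPRESSIONS = {
    "BMI": "weight_kg / (height_m * height_m)",
}

_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression, fields):
    """Evaluate an arithmetic expression over named fields without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Name):
            return fields[node.id]
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression element")

    return walk(ast.parse(expression, mode="eval"))


# Preset expression selected by its identifier.
print(evaluate(PRESET_EXPRESSIONS["BMI"], {"weight_kg": 70, "height_m": 1.75}))
# Custom expression assembled from fields to be extracted.
print(evaluate("disease_progression_time - diagnosis_time",
               {"disease_progression_time": 30, "diagnosis_time": 10}))
```

Walking the parsed syntax tree and allowing only four arithmetic operators keeps user-configured expressions from executing arbitrary code.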
Step S307: based on the obtained configuration information, data extraction logic is generated.
In this step, the data extraction logic may be generated based on the obtained configuration information in the manner described in the foregoing step S204 to step S206 in the embodiment shown in fig. 2.
In one embodiment of the present disclosure, for the extraction condition configuration sub-process, the data extraction logic may also be generated based on the configuration information corresponding to the sub-process in the manner shown in the following step D-step G.
Step D: Combine the information corresponding to the field to be extracted, the condition description element and the condition logical symbol in the configuration information corresponding to the sub-flow to generate a first condition unit.
Each of these kinds of information is described below.
1. Information corresponding to the field to be extracted.
This is the detailed field information describing the field to be extracted, and may include the Chinese name of the field, the English name of the field, the field attributes, and the like.
2. Information corresponding to the condition description element.
This is the detailed information describing the extraction condition composed of the fields to be extracted, and may include the names of the fields contained in the extraction condition, the number of fields, and the like.
3. Information corresponding to the condition logical symbol.
This describes the logical relationships between the fields to be extracted, such as "AND", "OR" and "NOT".
The first condition unit can thus be generated by combining the above information.
In one embodiment of the present disclosure, information such as condition logic type information and condition type information may also be used to assist in generating the first condition unit.
The condition logic type information describes whether the information satisfying a condition is to be output or excluded, and the condition type information describes whether the condition applies to a single variable or to two variables having a temporal relationship.
Step E: Obtain, according to the information corresponding to the condition constraint element in the configuration information corresponding to the sub-flow, a second condition unit for constraining the generated first condition unit, thereby obtaining a target condition unit comprising the first condition unit and the second condition unit.
The information corresponding to the condition constraint element is used for constraining the first condition unit, and may include constraint fields and logical relationships between the constraint fields, and the second condition unit may be generated based on the constraint fields and the logical relationships between the constraint fields.
The constraint field described above is illustrated in connection with fig. 4.
Referring to fig. 4, a schematic diagram of field types is provided for an embodiment of the disclosure.
In fig. 4, fields such as examination item, examination description, hypernym, negative/positive status, concomitance, symptom, time of occurrence, duration, location, nature, degree, stage, grade, pathology typing, cause, frequency of occurrence, frequency of exacerbation, and remission factor are constraint fields that can be used to generate the second condition unit.
The target condition units obtained by step D and step E are exemplified by table 3 below.
TABLE 3
The first column in table 3 represents a main mode of data extraction, the second column represents a sub-mode supported by the main mode, and the third column represents a target condition unit obtained by configuring configuration information of a sub-flow according to extraction conditions.
It can be seen that the target condition unit consists of 1 first condition unit a and 2 second condition units A1, A2.
In this way, the first condition unit is generated by combining the information corresponding to the field to be extracted, the condition description element and the condition logical symbol, and the second condition unit for constraining the generated first condition unit is obtained based on the information corresponding to the condition constraint element. The resulting target condition unit, which comprises the first condition unit and the second condition unit, therefore contains richer information and describes the extraction condition more fully.
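Steps D and E can be sketched with two small data classes. The class names, the tuple-based configuration layout and the example values ("severe", "7 days") are hypothetical; the constraint fields "degree" and "duration" are taken from the field types listed for fig. 4.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ConditionUnit:
    """A single comparison such as: disease = condition a."""
    field_name: str
    operator: str
    value: str


@dataclass
class TargetConditionUnit:
    """A first condition unit plus the second condition units that constrain it."""
    first: ConditionUnit
    constraints: List[ConditionUnit] = field(default_factory=list)


def build_target_unit(config):
    """Step D builds the first condition unit; step E attaches its constraints."""
    first = ConditionUnit(*config["condition"])
    constraints = [ConditionUnit(*item) for item in config.get("constraints", [])]
    return TargetConditionUnit(first, constraints)


unit = build_target_unit({
    "condition": ("disease", "=", "condition a"),
    "constraints": [("degree", "=", "severe"), ("duration", ">", "7 days")],
})
print(unit.first.field_name, len(unit.constraints))
```

The example mirrors the shape described for table 3: one first condition unit A with two second condition units A1 and A2.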
Step F: Combine the generated target condition units according to the logical relationships between the condition units configured in the sub-flow to obtain a combined condition.
F1: Combine the second condition unit corresponding to each first condition unit with that first condition unit according to the information corresponding to the constraint logic type in the configuration information corresponding to the sub-flow.
The information corresponding to the constraint logic type describes the logical relationship between the second condition unit and the first condition unit; for example, the logical relationship may be "and", "or", "not", or the like.
F2: Combine the first condition units obtained after the combination according to the information corresponding to the condition logic type in the configuration information corresponding to the sub-flow to obtain the combined condition.
The information corresponding to the condition logic type describes the logical relationship between the combined first condition units; for example, the logical relationship may be "and", "or", "not", or the like.
Therefore, the second condition units corresponding to each first condition unit are first combined with that first condition unit according to the information corresponding to the constraint logic type, and the combined first condition units are then combined according to the information corresponding to the condition logic type, so that the combined condition can be generated conveniently and accurately through layer-by-layer combination.
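The layer-by-layer combination of F1 and F2 can be sketched on plain strings. The function names are hypothetical, and for brevity this sketch supports only "AND" and "OR"; the "not" relationship mentioned in the text is deliberately left out.

```python
ALLOWED_LOGIC = {"AND", "OR"}


def _join(parts, logic):
    """Join condition strings with one logical operator and wrap them in parentheses."""
    if logic not in ALLOWED_LOGIC:
        raise ValueError(f"unsupported logic type: {logic!r}")
    return "(" + f" {logic} ".join(parts) + ")"


def combine_unit(first, constraints, constraint_logic):
    """F1: combine the second condition units with their first condition unit."""
    return _join([first, *constraints], constraint_logic) if constraints else first


def combine_units(combined_units, condition_logic):
    """F2: combine the already-combined first condition units into one condition."""
    return _join(combined_units, condition_logic)


a = combine_unit("A", ["A1", "A2"], "AND")
b = combine_unit("B", [], "AND")
print(combine_units([a, b], "OR"))  # ((A AND A1 AND A2) OR B)
```

Parenthesizing every level keeps operator precedence explicit, so the result does not depend on how the downstream query language ranks AND against OR.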
Step G: Generate a data query statement for the combined condition as the data extraction logic corresponding to the sub-flow.
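Step G can be illustrated as follows, assuming SQL as the query language; the disclosure speaks only of a "data query statement", so the SQL form, the table `findings`, its columns and the function `build_query` are all assumptions of this sketch. An in-memory SQLite database stands in for the database being queried.

```python
import sqlite3

ALLOWED_OPERATORS = {"=", "!=", "<", "<=", ">", ">="}


def build_query(table, output_field, conditions, logic="AND"):
    """Render conditions as a parameterized SELECT.

    Identifiers are assumed to come from a trusted configuration; values are
    bound as parameters rather than interpolated into the statement.
    """
    for _, op, _ in conditions:
        if op not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {op!r}")
    clauses = [f"{name} {op} ?" for name, op, _ in conditions]
    params = [value for _, _, value in conditions]
    sql = f"SELECT {output_field} FROM {table} WHERE " + f" {logic} ".join(clauses)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE findings (patient_id TEXT, disease TEXT, degree TEXT)")
conn.executemany(
    "INSERT INTO findings VALUES (?, ?, ?)",
    [("p1", "condition a", "severe"), ("p2", "condition a", "mild"), ("p3", "condition b", "severe")],
)

sql, params = build_query(
    "findings", "patient_id",
    [("disease", "=", "condition a"), ("degree", "=", "severe")],
)
print(sql)
print(conn.execute(sql, params).fetchall())  # [('p1',)]
```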
Step S308: Perform data extraction according to the data extraction logic.
Step S308 is the same as step S105 and is not described again here.
The manner of obtaining the target data extraction mode and the target extraction flow in the case where the main mode is the first data extraction mode will be described by way of the embodiment shown in fig. 5.
Referring to fig. 5, fig. 5 is a flowchart of a fourth data extraction method according to an embodiment of the disclosure, where the method includes the following steps S501 to S506.
Step S501: a main pattern of data extraction is obtained from a first data extraction pattern referencing an existing rule and a second data extraction pattern based on a variable generation rule.
Step S502: and under the condition that the main mode is the first data extraction mode, obtaining a target data extraction mode from the historical rule reference sub-mode and the preset rule reference sub-mode.
It can be seen that the first data extraction mode referencing the existing rule includes a historical rule referencing sub-mode and a preset rule referencing sub-mode, and different sub-modes are used for setting different contents, so that the requirements of users can be met more comprehensively.
Step S503: Determine that the target data extraction flow includes a target rule configuration sub-flow and a reference confirmation sub-flow.
Wherein the target rule includes a history rule or a preset rule.
The history rule may be a rule generated from a history data extraction record of the user, for example, a data extraction logic generated when the history data of the user is extracted is used as the history rule.
It can be seen that when the main data extraction mode is the first data extraction mode referring to the existing rule, the target data extraction flow is different according to different sub-modes serving as target data extraction modes, so that the extraction flow is more targeted.
The target data extraction mode and the corresponding target data extraction flow described in the above steps S502 to S503 can be visually represented by the following table 4.
TABLE 4
In table 4, the first column represents a main mode of data extraction, the second column represents a sub-mode supported by the main mode, and the third column represents a target data extraction flow corresponding to the sub-mode.
Step S504: for each sub-flow in the target data extraction flow, configuration information of a configuration item corresponding to the sub-flow in the target data extraction mode is obtained.
The following describes the configuration information of the configuration items corresponding to the sub-flows, taking the target rule configuration sub-flow and the reference confirmation sub-flow as examples.
Specifically, the configuration information corresponding to the target rule configuration sub-flow may be an identification of the target rule.
Step S505: based on the obtained configuration information, data extraction logic is generated.
Specifically, the referenced target rule may be determined according to the identifier of the target rule, and then preset data extraction logic corresponding to the target rule is generated.
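A minimal sketch of steps S504 and S505 in the rule reference mode is given below. The rule library, its identifiers and the stored logic strings are hypothetical placeholders; the disclosure only states that the target rule is located by its identifier and that the reference is confirmed.

```python
# Hypothetical library of referable rules, keyed by rule identifier.
RULE_LIBRARY = {
    "preset:bmi": "weight_kg / (height_m * height_m)",
    "history:rbc-before-bypass": "SELECT rbc FROM labs WHERE patient_id = ?",
}


def generate_logic_from_reference(rule_id, confirmed):
    """Look up the referenced target rule and return its data extraction logic."""
    if rule_id not in RULE_LIBRARY:
        raise KeyError(f"unknown rule identifier: {rule_id}")
    if not confirmed:
        # Reference confirmation sub-flow: nothing is generated without confirmation.
        raise ValueError("the rule reference has not been confirmed")
    return RULE_LIBRARY[rule_id]


print(generate_logic_from_reference("preset:bmi", confirmed=True))
```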
Step S506: Perform data extraction according to the data extraction logic.
Step S506 is the same as step S105 in the embodiment shown in fig. 1 and is not repeated here.
The data extraction method provided by the embodiment of the present disclosure is further described more intuitively by several data extraction cases.
Table 5 below shows a data extraction case provided by an embodiment of the present disclosure, in which the main mode is the second data extraction mode based on a variable generation rule and the sub-mode is the first sub-mode.
TABLE 5
In table 5, the first column represents the main mode of data extraction, the second column represents the sub-mode supported by the main mode, the third column represents the variable name, the fourth column represents the extraction condition obtained according to the extraction sub-flow, and the fifth column represents the output type rule obtained according to the extraction sub-flow.
As can be seen from table 5, each variable is given a corresponding extraction condition for matching data in the database and an output type rule for determining the type of data output when the extraction condition is or is not satisfied. Thus, for different variables, data can be extracted according to the respective extraction conditions and output type rules; this data extraction flow may also be called a single-variable mode.
Table 6 below shows another data extraction case provided by an embodiment of the present disclosure, in which the main mode is the second data extraction mode based on a variable generation rule and the sub-mode is the first sub-mode.
TABLE 6
In table 6, the first column represents a main mode of data extraction, the second column represents a sub-mode supported by the main mode, the third column represents a variable name, the fourth column represents an extraction condition obtained according to an extraction sub-flow, the fifth column represents an output type rule obtained according to the extraction sub-flow, and the sixth column represents an aggregation rule obtained according to the extraction sub-flow.
It can be seen from table 6 that all of the variables correspond to the same extraction condition, but each corresponds to its own output type rule and aggregation rule. Thus, for different variables, the final output data is obtained according to the output type rule and aggregation rule corresponding to each variable; this data extraction flow may also be called a multi-variable mode.
For example, for the variable "tumor size minimum diameter", the corresponding output type rule is to output the tumor size when the extraction condition is satisfied, and the corresponding aggregation rule is to take the minimum (min) of the output data, so that the value finally output for the variable is the minimum tumor size.
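The multi-variable mode can be sketched as one shared extraction condition feeding several variables, each with its own aggregation rule. The records, the field names and the second variable (`tumor_size_max_diameter`) are invented for illustration and do not come from table 6.

```python
# Illustrative records; in practice these would come from the database being queried.
RECORDS = [
    {"patient_id": "p1", "finding": "tumor", "tumor_size_cm": 2.4},
    {"patient_id": "p1", "finding": "tumor", "tumor_size_cm": 1.8},
    {"patient_id": "p1", "finding": "cyst", "tumor_size_cm": 0.9},
    {"patient_id": "p2", "finding": "cyst", "tumor_size_cm": 1.1},
]

# One aggregation rule per variable; the extraction condition is shared.
VARIABLE_AGGREGATIONS = {
    "tumor_size_min_diameter": min,
    "tumor_size_max_diameter": max,
}


def matches(record):
    """Shared extraction condition (hypothetical): the finding is a tumor."""
    return record["finding"] == "tumor"


def extract_variables(records, patient_id):
    """Apply the shared condition once, then each variable's aggregation rule."""
    outputs = [r["tumor_size_cm"] for r in records
               if r["patient_id"] == patient_id and matches(r)]
    return {name: (agg(outputs) if outputs else None)
            for name, agg in VARIABLE_AGGREGATIONS.items()}


print(extract_variables(RECORDS, "p1"))
```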
Wherein the target condition units in the above tables 5 and 6 may be referred to as a condition group, the first condition unit may be referred to as a condition in the condition group, and the second condition unit for constraining the first condition unit may be referred to as a constraint of the condition.
As can be seen from the above data extraction cases, the scheme provided by the embodiment of the disclosure can support complex data extraction flows such as a univariate mode and a multivariate mode by performing simple configuration, and solves the problem that a user is limited by an ETL (Extract-Transform-Load) technical threshold when extracting data.
It should be noted that the data extraction cases shown in the foregoing tables 5 and 6 are only examples for easy understanding, and the extraction sub-processes related to the data extraction cases are not representative of all the extraction sub-processes provided in the embodiments of the present disclosure.
In one embodiment of the present disclosure, after the data extraction, an extraction report may also be generated according to the extracted data.
Referring to table 7, an example of an extraction report is provided for an embodiment of the present disclosure.
TABLE 7
The first column in table 7 represents a report item, which may include the variable name, the data extraction result, the physical table and the extraction person; the second column represents the report details corresponding to the report item.
The number of data extraction results may be obtained using a count aggregation rule, and the non-empty rate may be obtained by evaluating a set expression; this is not described in detail here.
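As a rough illustration, the count and the non-empty rate of an extraction report could be computed as below. The report layout, the key names and the definition of "empty" (`None` or an empty string) are assumptions of this sketch rather than details from the disclosure.

```python
def build_report(variable_name, results, physical_table, extractor):
    """Assemble report items; "empty" is assumed to mean None or an empty string."""
    total = len(results)
    non_empty = sum(1 for value in results if value is not None and value != "")
    return {
        "variable name": variable_name,
        "data extraction result": {
            "count": total,
            "non_empty_rate": non_empty / total if total else 0.0,
        },
        "physical table": physical_table,
        "extraction person": extractor,
    }


report = build_report("tumor_size_min_diameter", [1.8, None, 2.4, ""], "findings", "researcher-01")
print(report["data extraction result"])  # {'count': 4, 'non_empty_rate': 0.5}
```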
In one embodiment of the present disclosure, after the data extraction logic is generated and before data extraction is performed according to the data extraction logic, a preview of the data extraction result may also be displayed on the user interface, as described in detail below with reference to fig. 6.
Referring to fig. 6, a schematic diagram of a user interface is provided according to an embodiment of the present disclosure.
As can be seen from fig. 6, after the extraction conditions, output conditions, normalization, and data aggregation settings are completed, an extraction result preview interface can be displayed, such as an extraction result preview table shown on the right side of fig. 6, in which extraction results corresponding to each patient ID are displayed.
Through previewing, the user can intuitively and effectively learn the expected result of data extraction and can instruct the electronic device to extract data only when the previewed result meets expectations, which effectively improves the effect of data extraction and the match between the obtained data and the expected data.
A general description of a data extraction flow provided in an embodiment of the present disclosure is provided below with reference to fig. 7.
Referring to fig. 7, a schematic diagram of a data extraction flow is provided in an embodiment of the disclosure.
As can be seen from fig. 7, the data extraction process includes the following steps S701-S706:
Step S701: a data extraction mode is defined.
This step corresponds to the step of determining the target data extraction mode.
Step S702: Input module rule configuration.
This step corresponds to the configuration of the mapping relation configuration sub-flow, the extraction condition configuration sub-flow, the configuration sub-flow for the field to be extracted and the expression configuration sub-flow.
Step S703: Output module rule configuration.
This step corresponds to the configuration of the output type configuration sub-flow.
Step S704: Data standardization and aggregation rule configuration.
This step corresponds to the configuration of the rule configuration sub-flow for output data and the aggregation rule configuration sub-flow for output data.
Step S705: Generate an extraction logic set according to the extraction logic configuration.
This step corresponds to the step of generating data extraction logic based on the obtained configuration information.
Thus, the depth and breadth of data extraction rule definition are fundamentally improved through flexible and well-defined configuration of the input, output, standardization and aggregation layers.
Step S706: Confirm the preview data and store the secondary structured result.
In this step, the extracted data may be previewed, and after the user confirms the previewed data, the data is extracted, and the extracted secondary structured result is stored.
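The overall flow of steps S702 to S706 can be sketched as a small pipeline with a preview gate. All names, the dictionary-based configuration and the sample rows are hypothetical; the preview here is computed on the first few rows only, which is one possible reading of "previewing" rather than the method fixed by the disclosure.

```python
def apply_layers(rows, config):
    """Run the input, output and standardization layers over the given rows."""
    matched = [row for row in rows if config["input"](row)]            # S702: input rules
    outputs = [config["output"](row) for row in matched]               # S703: output rules
    return [config["normalize"].get(value, value) for value in outputs]  # S704: standardization


def run_extraction(rows, config, confirm, store, sample_size=2):
    """Preview a sample, then extract, aggregate and store only if confirmed (S706)."""
    preview = apply_layers(rows[:sample_size], config)
    if not confirm(preview):
        return None
    result = config["aggregate"](apply_layers(rows, config))            # S704: aggregation
    store.append(result)
    return result


rows = [
    {"patient_id": "p1", "test": "rbc", "flag": "H"},
    {"patient_id": "p1", "test": "wbc", "flag": "H"},
    {"patient_id": "p1", "test": "rbc", "flag": "N"},
]
config = {
    "input": lambda row: row["test"] == "rbc",
    "output": lambda row: row["flag"],
    "normalize": {"H": "high", "N": "normal"},
    "aggregate": lambda values: values[-1] if values else None,  # "take the last output data"
}
store = []
print(run_extraction(rows, config, confirm=lambda preview: True, store=store))  # normal
```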
From the above, when data extraction is performed using the strictly constrained and simply interactive method shown in steps S701-S706, the threshold for scientific researchers in the medical field to extract data stored in a clinical data center is lowered, medical data can be extracted effectively and rapidly, the barrier between business requirements and technical requirements that researchers face in clinical data applications is broken down, and the acquisition of medical data is ensured at the source. In addition, the technical means described herein carry medical data from treatment through to application, which is of significance and of promotion value for improving the efficiency of medical data application, standardizing the medical data extraction flow and ensuring the in-depth utilization of medical data assets.
The configuration content of the input and output modules is illustrated in the following with reference to fig. 8.
Referring to fig. 8, a schematic diagram of configuration content is provided in an embodiment of the disclosure.
It can be seen that the configuration content of the input and output modules includes eCRF (electronic Case Report Form) field definition, field rule setting, preset rule selection, mapping rule setting, enumeration rule setting, and formula rule setting.
The preset rule selection may include rules and normalized binding, and the mapping rule setting and enumeration rule setting may further include normalized rule setting.
After the configuration of each item of setting is completed, the structured task extraction can be performed according to the configuration content.
A data flow process provided by an embodiment of the present disclosure is described below with reference to fig. 9.
Referring to fig. 9, a schematic diagram of a data flow process is provided in an embodiment of the disclosure.
It can be seen that, firstly, the data of the clinical data center is obtained, then the extraction mode configuration is carried out, the configuration of the input layer, the output layer, the standardization layer and the aggregation layer is carried out, the result sampling and verification are carried out after the configuration is completed, and after the extraction result is confirmed and stored, the extracted data is stored in the scientific research variable extraction result set.
Result sampling and verification means that a preview of sampled results is displayed on the user interface, so that the user can decide, based on the previewed content, whether to proceed with data extraction.
The extraction mode configuration corresponds to the determination target data extraction mode, and the input layer, the output layer, the normalization layer and the aggregation layer configurations respectively correspond to the extraction condition configuration sub-flow, the output type configuration sub-flow, the rule configuration sub-flow for output data and the aggregation rule configuration sub-flow for output data.
Corresponding to the data extraction method, the embodiment of the disclosure also provides a data extraction device.
Referring to fig. 10, a schematic structural diagram of a data extraction device according to an embodiment of the disclosure is provided, where the device includes the following modules 1001-1005:
an extraction pattern obtaining module 1001, configured to obtain a target data extraction pattern;
an extraction flow determining module 1002, configured to determine a target data extraction flow corresponding to the target data extraction mode;
a configuration information obtaining module 1003, configured to obtain, for each sub-flow in the target data extraction flow, configuration information of a configuration item corresponding to the sub-flow in the target data extraction mode;
a data extraction logic generation module 1004, configured to generate data extraction logic based on the obtained configuration information;
a data extraction module 1005, configured to perform data extraction according to the data extraction logic.
From the above, when the scheme provided by the embodiment of the present disclosure is applied to perform data extraction, a target data extraction mode is obtained, and a target data extraction flow corresponding to the target data extraction mode is determined, so that for each sub-flow in the target data extraction flow, configuration information of a configuration item corresponding to the sub-flow in the target data extraction mode is obtained, so that data extraction logic can be generated based on the obtained configuration information, and data extraction can be performed successfully according to the data extraction logic.
In this process, the user only needs to configure the configuration items corresponding to the sub-flows in the target data extraction flow, and the electronic device can then obtain the configuration information corresponding to each sub-flow, generate the data extraction logic and perform data extraction. The user does not need to write data extraction code, which lowers the technical threshold for data extraction, saves the time needed to write code and improves data extraction efficiency.
In addition, the target data extraction mode determines a target data extraction flow, and configuration information corresponding to each sub-flow in the target data extraction flow determines data extraction logic for extracting data, that is, the scheme provided by the embodiment of the disclosure can obtain the target data extraction flow, the configuration information and the data extraction logic corresponding to different target data extraction modes in a targeted manner, so that a user can flexibly select the target data extraction mode according to own data extraction requirements, and the flexibility and the practicability of the data extraction scheme are improved, and the user experience is improved.
In one embodiment of the present disclosure, the data extraction logic generating module 1004 is specifically configured to generate the data extraction logic corresponding to each sub-flow according to the following sub-modules:
a target condition unit generation sub-module, configured to combine the configuration information corresponding to the sub-flow according to the data extraction rule corresponding to the sub-flow to generate a target condition unit;
a combined condition obtaining sub-module, configured to combine the generated target condition units according to the logical relationships between the condition units configured in the sub-flow to obtain a combined condition;
a data extraction logic generation sub-module, configured to generate a data query statement for the combined condition as the data extraction logic corresponding to the sub-flow.
From the above, when generating the data extraction logic based on the obtained configuration information, for each sub-flow, the configuration information corresponding to the sub-flow may first be combined according to the data extraction rule corresponding to the sub-flow to generate target condition units; the generated target condition units are then combined according to the logical relationships between the condition units configured in the sub-flow to obtain a combined condition; and a data query statement for the combined condition is then generated as the data extraction logic corresponding to the sub-flow. Thus, the final data extraction logic can be obtained accurately and in order from the configuration information through the steps of generating target condition units, combining them and generating the data query statement.
In one embodiment of the disclosure, the target condition unit generation sub-module is specifically configured to combine the information corresponding to the field to be extracted, the condition description element and the condition logical symbol in the configuration information corresponding to the sub-flow to generate a first condition unit; and to obtain, according to the information corresponding to the condition constraint element in the configuration information corresponding to the sub-flow, a second condition unit for constraining the generated first condition unit, thereby obtaining a target condition unit comprising the first condition unit and the second condition unit.
In this way, the first condition unit is generated by combining the information corresponding to the field to be extracted, the condition description element and the condition logical symbol, and the second condition unit for constraining the generated first condition unit is obtained based on the information corresponding to the condition constraint element. The resulting target condition unit, which comprises the first condition unit and the second condition unit, therefore contains richer information and describes the extraction condition more fully.
In one embodiment of the disclosure, the combination condition obtaining submodule is specifically configured to combine, according to information corresponding to a constraint logic type in configuration information corresponding to a sub-flow, a second condition unit corresponding to a first condition unit to the first condition unit; and combining the combined first condition units according to the information corresponding to the condition logic type in the configuration information corresponding to the sub-flow to obtain a combined condition.
Therefore, the second condition units corresponding to the first condition units are combined to the first condition units according to the information corresponding to the constraint logic types, and then the combined first condition units are combined according to the information corresponding to the condition logic types, so that the combined conditions can be conveniently and accurately generated through layer-by-layer combination.
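The layer-by-layer combination can be sketched as follows. This is only an illustration under stated assumptions: the `TargetConditionUnit` class, the string representation of condition units, and the example field names are hypothetical, and the disclosure does not prescribe how a second condition unit is represented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TargetConditionUnit:
    # First condition unit: field to be extracted + description element + logical symbol.
    first: str
    # Second condition unit: a constraint on the first unit (may be absent).
    second: Optional[str] = None


def attach_constraint(unit: TargetConditionUnit, constraint_logic: str) -> str:
    """Layer 1: combine the second condition unit onto its first condition unit."""
    if unit.second is None:
        return f"({unit.first})"
    return f"({unit.first} {constraint_logic} ({unit.second}))"


def combine_condition_units(
    units: List[TargetConditionUnit], constraint_logic: str, condition_logic: str
) -> str:
    """Layer 2: combine the already-constrained first condition units."""
    return f" {condition_logic} ".join(
        attach_constraint(u, constraint_logic) for u in units
    )


units = [
    TargetConditionUnit(
        first="status = 'active'",
        second="region = 'north' OR region = 'south'",
    ),
    TargetConditionUnit(first="score > 60"),
]
combined = combine_condition_units(units, constraint_logic="AND", condition_logic="OR")
print(combined)
```

Parenthesizing each layer keeps the constraint logic type from interacting with the condition logic type through operator precedence.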
In one embodiment of the disclosure, the data extraction module 1005 is specifically configured to generate a data query request based on the data query statement; sending the data query request to a data query end so that the data query end performs data query and data extraction based on the data query request; and receiving a data extraction result fed back by the data query end.
Therefore, the data query end performs the data query and data extraction steps, and the electronic device only receives the data extraction result fed back by the data query end. Because the electronic device serving as the execution subject of the scheme does not have to perform the actual data extraction itself, its computing resources are saved and the data extraction efficiency is improved.
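The request/response split can be sketched as below. The disclosure does not specify the data query end or the request format, so an in-memory SQLite database stands in for the data query end here, and `DataQueryRequest`, `InMemoryQueryEnd`, and the `records` table are hypothetical names used only for illustration.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DataQueryRequest:
    """Hypothetical request wrapping the generated data query statement."""
    statement: str
    params: Tuple = ()


class InMemoryQueryEnd:
    """Stand-in for the data query end; it owns the data and runs the query."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE records (id INTEGER, score INTEGER)")
        self._conn.executemany(
            "INSERT INTO records VALUES (?, ?)", [(1, 40), (2, 75), (3, 90)]
        )

    def handle(self, request: DataQueryRequest) -> List[tuple]:
        # Data query and data extraction both happen on the query end.
        return self._conn.execute(request.statement, request.params).fetchall()


def extract(statement: str, params: Tuple, query_end: InMemoryQueryEnd) -> List[tuple]:
    """Build the request, send it, and return the fed-back extraction result."""
    request = DataQueryRequest(statement, params)
    return query_end.handle(request)


result = extract(
    "SELECT id FROM records WHERE score >= ? ORDER BY id", (60,), InMemoryQueryEnd()
)
print(result)
```

The caller only builds a request and receives rows, which reflects the point that the executing device need not perform the extraction itself.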
In one embodiment of the present disclosure, the extraction mode obtaining module 1001 includes:
A main mode obtaining sub-module for obtaining a main mode of data extraction;
and the extraction mode obtaining sub-module is used for obtaining a target data extraction mode from the sub-modes supported by the main mode.
Therefore, the main mode of data extraction is determined first, and the target data extraction mode is then determined from the sub-modes supported by that main mode, so that the target data extraction mode can be obtained conveniently and intuitively through this hierarchical determination.
In one embodiment of the disclosure, the main mode obtaining sub-module is specifically configured to obtain the main mode of data extraction from a first data extraction mode that references an existing rule and a second data extraction mode that generates a rule based on variables.
The main modes are divided into the first data extraction mode, which references existing rules, and the second data extraction mode, which generates rules based on variables, so that a sub-mode can subsequently be selected conveniently under either main mode.
In one embodiment of the present disclosure, the extraction mode obtaining sub-module is specifically configured to obtain, in a case where the main mode is the second data extraction mode, a target data extraction mode from among the following sub-modes of data extraction: a first sub-mode reflecting the mapping relation, corresponding to the variable, between the field to be extracted and the set field; a second sub-mode reflecting the extraction conditions, corresponding to the variable, to be satisfied by the field to be extracted; and a third sub-mode reflecting an expression, corresponding to the variable, described in terms of the field to be extracted.
It can be seen that the second data extraction mode based on the variable generation rule includes a first sub-mode for setting the mapping relationship between the field to be extracted and the set field, a second sub-mode for setting the extraction conditions to be satisfied by the field to be extracted, and a third sub-mode for setting an expression, corresponding to the variable, described in terms of the field to be extracted. These different types of sub-modes serve different configurations, so that the data extraction requirements of the user are met more comprehensively.
In one embodiment of the present disclosure, the extraction flow determining module 1002 is specifically configured to: if the target data extraction mode is the first sub-mode, determine that the target data extraction flow corresponding to the target data extraction mode includes the mapping relationship configuration sub-flow; if the target data extraction mode is the second sub-mode, determine that the target data extraction flow includes the extraction condition configuration sub-flow and an output condition configuration sub-flow, wherein the output condition configuration sub-flow includes at least one of the following sub-flows: an output type configuration sub-flow, a rules configuration sub-flow for output data, and an aggregation rules configuration sub-flow for output data; and if the target data extraction mode is the third sub-mode, determine that the target data extraction flow includes a to-be-extracted field configuration sub-flow and an expression configuration sub-flow.
It can be seen that when the main data extraction mode is the second data extraction mode based on the variable generation rule, the target data extraction flow is different according to the sub-mode as the target data extraction mode, so that the extraction flow is more targeted.
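The dependence of the target data extraction flow on the selected sub-mode can be sketched as a simple dispatch. All identifiers below (the sub-mode strings, the sub-flow names, and `determine_flow`) are hypothetical labels chosen for illustration; the disclosure names the sub-flows but not any concrete representation.

```python
from typing import List, Sequence

# Hypothetical identifiers for the three sub-modes of the second data extraction mode.
FIRST_SUB_MODE = "mapping"
SECOND_SUB_MODE = "condition"
THIRD_SUB_MODE = "expression"

# The output condition configuration sub-flow includes at least one of these.
OUTPUT_SUB_FLOWS = ("output_type", "output_data_rule", "aggregation_rule")


def determine_flow(sub_mode: str, output_choices: Sequence[str] = ("output_type",)) -> List[str]:
    """Return the ordered sub-flows of the target data extraction flow."""
    if sub_mode == FIRST_SUB_MODE:
        return ["mapping_relation_configuration"]
    if sub_mode == SECOND_SUB_MODE:
        chosen = [c for c in output_choices if c in OUTPUT_SUB_FLOWS]
        if not chosen:
            raise ValueError("output condition configuration needs at least one sub-flow")
        return ["extraction_condition_configuration"] + [
            f"output_condition_configuration:{c}" for c in chosen
        ]
    if sub_mode == THIRD_SUB_MODE:
        return ["field_to_extract_configuration", "expression_configuration"]
    raise ValueError(f"unsupported sub-mode: {sub_mode}")


print(determine_flow(SECOND_SUB_MODE, ["output_type", "aggregation_rule"]))
```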
In one embodiment of the disclosure, the extraction mode obtaining sub-module is specifically configured to obtain, in a case where the main mode is the first data extraction mode, a target data extraction mode from a historical rule reference sub-mode and a preset rule reference sub-mode.
It can be seen that the first data extraction mode referencing the existing rule includes a historical rule referencing sub-mode and a preset rule referencing sub-mode, and different sub-modes are used for setting different contents, so that the requirements of users can be met more comprehensively.
In one embodiment of the disclosure, the extraction flow determining module 1002 is specifically configured to determine that the target data extraction flow includes a target rule configuration sub-flow and a reference confirmation sub-flow, where the target rule includes a history rule or a preset rule.
It can be seen that when the main data extraction mode is the first data extraction mode referring to the existing rule, the target data extraction flow is different according to different sub-modes serving as target data extraction modes, so that the extraction flow is more targeted.
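The two-level selection of a target data extraction mode, together with the flow used when an existing rule is referenced, can be sketched as follows. The enum names, the `SUPPORTED_SUB_MODES` table, and the sub-flow labels are assumptions made for illustration and do not come from the disclosure.

```python
from enum import Enum
from typing import List


class MainMode(Enum):
    REFERENCE_EXISTING_RULE = 1  # first data extraction mode
    VARIABLE_BASED_RULE = 2      # second data extraction mode


class SubMode(Enum):
    HISTORICAL_RULE_REFERENCE = "history rule"
    PRESET_RULE_REFERENCE = "preset rule"
    FIELD_MAPPING = "first sub-mode"
    EXTRACTION_CONDITION = "second sub-mode"
    EXPRESSION = "third sub-mode"


# Sub-modes supported by each main mode, as described in the embodiments.
SUPPORTED_SUB_MODES = {
    MainMode.REFERENCE_EXISTING_RULE: (
        SubMode.HISTORICAL_RULE_REFERENCE,
        SubMode.PRESET_RULE_REFERENCE,
    ),
    MainMode.VARIABLE_BASED_RULE: (
        SubMode.FIELD_MAPPING,
        SubMode.EXTRACTION_CONDITION,
        SubMode.EXPRESSION,
    ),
}


def obtain_target_mode(main_mode: MainMode, sub_mode: SubMode) -> SubMode:
    """Accept a sub-mode only if the chosen main mode supports it."""
    if sub_mode not in SUPPORTED_SUB_MODES[main_mode]:
        raise ValueError(f"{sub_mode.name} is not supported by {main_mode.name}")
    return sub_mode


def reference_flow(target_mode: SubMode) -> List[str]:
    """Flow for the first main mode: configure the target rule, then confirm the reference."""
    if target_mode not in SUPPORTED_SUB_MODES[MainMode.REFERENCE_EXISTING_RULE]:
        raise ValueError("not a rule-reference sub-mode")
    return [f"target_rule_configuration[{target_mode.value}]", "reference_confirmation"]


mode = obtain_target_mode(MainMode.REFERENCE_EXISTING_RULE, SubMode.HISTORICAL_RULE_REFERENCE)
print(reference_flow(mode))
```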
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information all comply with the relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
In one embodiment of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data extraction method described above.
In one embodiment of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the above-described data extraction method is provided.
In one embodiment of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the above-described data extraction method.
Fig. 11 illustrates a schematic block diagram of an example electronic device 1100 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in Fig. 11, the device 1100 includes a computing unit 1101 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. Various programs and data required for the operation of the device 1100 can also be stored in the RAM 1103. The computing unit 1101, the ROM 1102, and the RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.
Various components in device 1100 are connected to I/O interface 1105, including: an input unit 1106 such as a keyboard, a mouse, etc.; an output unit 1107 such as various types of displays, speakers, and the like; a storage unit 1108, such as a magnetic disk, optical disk, etc.; and a communication unit 1109 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 1109 allows the device 1100 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1101 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1101 performs the respective methods and processes described above, such as the data extraction method. For example, in some embodiments, the data extraction method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1108. In some embodiments, some or all of the computer program may be loaded and/or installed onto the device 1100 via the ROM 1102 and/or the communication unit 1109. When the computer program is loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of the data extraction method described above may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured to perform the data extraction method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on a machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed herein can be achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (21)

1. A data extraction method, comprising:
obtaining a target data extraction mode;
determining a target data extraction flow corresponding to the target data extraction mode;
for each sub-flow in the target data extraction flow, obtaining configuration information of a configuration item corresponding to the sub-flow in the target data extraction mode, wherein the configuration items corresponding to different sub-flows are different, the configuration information corresponding to different configuration items is different, and the sub-flow comprises at least one of the following sub-flows: an extraction condition configuration sub-flow, a mapping relation configuration sub-flow, and an output condition configuration sub-flow, wherein the mapping relation configuration sub-flow is used for setting the mapping relation between a field to be extracted and a set field, and the output condition configuration sub-flow comprises at least one of the following sub-flows: an output type configuration sub-flow, a rules configuration sub-flow for output data, and an aggregation rules configuration sub-flow for output data;
generating data extraction logic based on the obtained configuration information;
performing data extraction according to the data extraction logic;
the generating data extraction logic based on the obtained configuration information includes:
generating data extraction logic corresponding to each sub-flow in the following manner:
combining the configuration information corresponding to the sub-flow according to the data extraction rule corresponding to the sub-flow to generate target condition units; combining the generated target condition units according to the logical relationship among the condition units configured in the sub-flow to obtain a combined condition; and generating a data query statement of the combined condition as the data extraction logic corresponding to the sub-flow;
wherein the combining the configuration information corresponding to the sub-flow according to the data extraction rule corresponding to the sub-flow to generate target condition units comprises:
combining the information corresponding to the field to be extracted, the condition description element, and the condition logical symbol in the configuration information corresponding to the sub-flow to generate a first condition unit; obtaining, according to information corresponding to a condition constraint element in the configuration information corresponding to the sub-flow, a second condition unit for constraining the generated first condition unit; and obtaining a target condition unit comprising the first condition unit and the second condition unit, wherein the information corresponding to the condition constraint element comprises: constraint fields and logical relationships between the constraint fields.
2. The method of claim 1, wherein the combining the generated target condition units according to the logical relationship among the condition units configured in the sub-flow to obtain the combined condition includes:
combining, according to the information corresponding to the constraint logic type in the configuration information corresponding to the sub-flow, the second condition unit corresponding to a first condition unit to that first condition unit;
and combining the combined first condition units according to the information corresponding to the condition logic type in the configuration information corresponding to the sub-flow to obtain a combined condition.
3. The method of claim 1 or 2, wherein the performing data extraction according to the data extraction logic comprises:
generating a data query request based on the data query statement;
sending the data query request to a data query end so that the data query end performs data query and data extraction based on the data query request;
and receiving a data extraction result fed back by the data query end.
4. The method according to claim 1 or 2, wherein the obtaining a target data extraction mode comprises:
obtaining a main mode of data extraction;
and obtaining a target data extraction mode from the sub modes supported by the main mode.
5. The method of claim 4, wherein the obtaining a main mode of data extraction comprises:
obtaining the main mode of data extraction from a first data extraction mode that references an existing rule and a second data extraction mode that generates a rule based on variables.
6. The method of claim 5, wherein the obtaining a target data extraction mode from the sub-modes supported by the main mode comprises:
in the case where the main mode is the second data extraction mode, a target data extraction mode is obtained from the following sub-modes of data extraction:
a first sub-mode reflecting the mapping relation between the field to be extracted and the set field corresponding to the variable;
a second sub-mode reflecting extraction conditions to be satisfied by the field to be extracted corresponding to the variable;
and a third sub-mode corresponding to the expression of the field description to be extracted and corresponding to the variable.
7. The method of claim 6, wherein the determining the target data extraction flow corresponding to the target data extraction mode comprises:
if the target data extraction mode is the first sub-mode, determining that a target data extraction flow corresponding to the target data extraction mode comprises the mapping relation configuration sub-flow;
If the target data extraction mode is the second sub-mode, determining that the target data extraction flow comprises the extraction condition configuration sub-flow and the output condition configuration sub-flow;
and if the target data extraction mode is the third sub-mode, determining that the target data extraction flow comprises a field configuration sub-flow to be extracted and an expression configuration sub-flow.
8. The method of claim 5, wherein the obtaining a target data extraction mode from the sub-modes supported by the main mode comprises:
and under the condition that the main mode is the first data extraction mode, obtaining a target data extraction mode from a historical rule reference sub-mode and a preset rule reference sub-mode.
9. The method of claim 8, wherein the determining the target data extraction flow corresponding to the target data extraction mode comprises:
the target data extraction flow is determined to comprise a target rule configuration sub-flow and a reference confirmation sub-flow, wherein the target rule comprises a history rule or a preset rule.
10. A data extraction apparatus comprising:
the extraction mode obtaining module is used for obtaining a target data extraction mode;
The extraction flow determining module is used for determining a target data extraction flow corresponding to the target data extraction mode;
the configuration information obtaining module is configured to obtain, for each sub-flow in the target data extraction flow, configuration information of a configuration item corresponding to the sub-flow in the target data extraction mode, wherein the configuration items corresponding to different sub-flows are different, the configuration information corresponding to different configuration items is different, and the sub-flow comprises at least one of the following sub-flows: an extraction condition configuration sub-flow, a mapping relation configuration sub-flow, and an output condition configuration sub-flow, wherein the mapping relation configuration sub-flow is used for setting the mapping relation between a field to be extracted and a set field, and the output condition configuration sub-flow comprises at least one of the following sub-flows: an output type configuration sub-flow, a rules configuration sub-flow for output data, and an aggregation rules configuration sub-flow for output data;
the data extraction logic generation module is used for generating data extraction logic based on the obtained configuration information;
the data extraction module is used for extracting data according to the data extraction logic;
the data extraction logic generation module is specifically configured to generate data extraction logic corresponding to each sub-flow according to the following sub-modules:
a target condition unit generating sub-module, configured to combine the configuration information corresponding to the sub-flow according to the data extraction rule corresponding to the sub-flow, to generate target condition units; a combination condition obtaining sub-module, configured to combine the generated target condition units according to the logical relationship between the condition units configured in the sub-flow, to obtain a combined condition; and a data extraction logic generation sub-module, configured to generate a data query statement of the combined condition as the data extraction logic corresponding to the sub-flow;
the target condition unit generating sub-module is specifically configured to combine the information corresponding to the field to be extracted, the condition description element, and the condition logical symbol in the configuration information corresponding to the sub-flow to generate a first condition unit; to obtain, according to information corresponding to a condition constraint element in the configuration information corresponding to the sub-flow, a second condition unit for constraining the generated first condition unit; and to obtain a target condition unit comprising the first condition unit and the second condition unit, wherein the information corresponding to the condition constraint element comprises: constraint fields and logical relationships between the constraint fields.
11. The apparatus of claim 10, wherein,
the combination condition obtaining sub-module is specifically configured to combine, according to the information corresponding to the constraint logic type in the configuration information corresponding to the sub-flow, the second condition unit corresponding to a first condition unit to that first condition unit; and to combine the combined first condition units according to the information corresponding to the condition logic type in the configuration information corresponding to the sub-flow, to obtain a combined condition.
12. The apparatus according to claim 10 or 11, wherein,
the data extraction module is specifically configured to generate a data query request based on the data query statement; sending the data query request to a data query end so that the data query end performs data query and data extraction based on the data query request; and receiving a data extraction result fed back by the data query end.
13. The apparatus of claim 10 or 11, wherein the extraction mode obtaining module comprises:
a main mode obtaining sub-module for obtaining a main mode of data extraction;
and the extraction mode obtaining sub-module is used for obtaining a target data extraction mode from the sub-modes supported by the main mode.
14. The apparatus of claim 13, wherein,
the main mode obtaining sub-module is specifically configured to obtain the main mode of data extraction from a first data extraction mode that references an existing rule and a second data extraction mode that generates a rule based on variables.
15. The apparatus of claim 14, wherein,
the extraction mode obtaining sub-module is specifically configured to obtain, when the main mode is the second data extraction mode, a target data extraction mode from among the following sub-modes for performing data extraction: a first sub-mode reflecting the mapping relation between the field to be extracted and the set field corresponding to the variable; a second sub-mode reflecting extraction conditions to be satisfied by the field to be extracted corresponding to the variable; and a third sub-mode corresponding to the expression of the field description to be extracted and corresponding to the variable.
16. The apparatus of claim 15, wherein,
the extraction flow determining module is specifically configured to determine that, if the target data extraction mode is the first sub-mode, the target data extraction flow corresponding to the target data extraction mode includes the mapping relationship configuration sub-flow; if the target data extraction mode is the second sub-mode, determining that the target data extraction flow comprises the extraction condition configuration sub-flow and the output condition configuration sub-flow; and if the target data extraction mode is the third sub-mode, determining that the target data extraction flow comprises a field configuration sub-flow to be extracted and an expression configuration sub-flow.
17. The apparatus of claim 14, wherein,
the extraction mode obtaining sub-module is specifically configured to obtain, when the main mode is the first data extraction mode, a target data extraction mode from a history rule reference sub-mode and a preset rule reference sub-mode.
18. The apparatus of claim 17, wherein,
the extraction flow determining module is specifically configured to determine that the target data extraction flow includes a target rule configuration sub-flow and a reference confirmation sub-flow, where the target rule includes a history rule or a preset rule.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-9.
21. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-9.
CN202211542433.0A 2022-12-02 2022-12-02 Data extraction method, device, equipment and storage medium Active CN115862882B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211542433.0A CN115862882B (en) 2022-12-02 2022-12-02 Data extraction method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115862882A (en) 2023-03-28
CN115862882B (en) 2024-02-13

Family

ID=85669589

Country Status (1)

Country Link
CN (1) CN115862882B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103827853A (en) * 2011-09-29 2014-05-28 国际商业机器公司 Minimizing rule sets in rule management system
CN108010573A (en) * 2017-11-24 2018-05-08 苏州市环亚数据技术有限公司 Hospital data fusion system, method, electronic equipment and storage medium
CN109299177A (en) * 2018-09-30 2019-02-01 江苏满运软件科技有限公司 Data extraction method, apparatus, storage medium and electronic equipment
CN111597245A (en) * 2020-05-20 2020-08-28 政采云有限公司 Data extraction method and device, information statistics method and related equipment
CN111753546A (en) * 2020-06-23 2020-10-09 深圳市华云中盛科技股份有限公司 Document information extraction method and device, computer equipment and storage medium
CN112749219A (en) * 2021-01-04 2021-05-04 拉卡拉支付股份有限公司 Data extraction method, data extraction device, electronic equipment, storage medium and program product
CN112989763A (en) * 2021-03-16 2021-06-18 平安付科技服务有限公司 Data acquisition method and device, computer equipment and storage medium
CN113127522A (en) * 2019-12-31 2021-07-16 阿里巴巴集团控股有限公司 Data processing method, device, system and storage medium
CN113806434A (en) * 2021-09-22 2021-12-17 平安科技(深圳)有限公司 Big data processing method, device, equipment and medium
CN114328700A (en) * 2022-03-16 2022-04-12 上海柯林布瑞信息技术有限公司 Data checking method and device in medical data ETL task
CN114942971A (en) * 2022-07-22 2022-08-26 北京拓普丰联信息科技股份有限公司 Extraction method and device of structured data

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011100403A (en) * 2009-11-09 2011-05-19 Sony Corp Information processor, information extraction method, program and information processing system
US20210073446A1 (en) * 2019-09-06 2021-03-11 BeamUp. Ltd. Structural design systems and methods for generating an actionable room index using modeling and simulation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
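Research and design of monitoring in ETL multi-data-stream parallel extraction; 王茜; Microcomputer Information (No. 03); 183, 200-201 *
Web information extraction based on template flow configuration; 刘辉; 陈静玉; 徐学洲; Computer Engineering; Vol. 34 (No. 20); 55-57 *
Design and implementation of a large-scale web page information extraction platform based on a rule engine; 任宪臻; 朱义; Journal of Beijing City University (No. 05); 67-70 *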

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant