CN112667617A - Visual data cleaning system and method based on natural language - Google Patents

Visual data cleaning system and method based on natural language

Info

Publication number
CN112667617A
Authority
CN
China
Prior art keywords
data
cleaning
attribute
natural language
cleaned
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011617367.XA
Other languages
Chinese (zh)
Inventor
尹源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Chengqin Education Technology Co ltd
Original Assignee
Nanjing Chengqin Education Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Chengqin Education Technology Co ltd
Priority to CN202011617367.XA
Publication of CN112667617A
Legal status: Pending

Landscapes

  • Stored Programmes (AREA)

Abstract

The invention relates to the technical field of data processing, and in particular to a natural language-based visual data cleaning system and method. Connection information for the data source to be cleaned is specified through a server; the first N records of the data to be cleaned are obtained, their field types and formats are parsed, and useless fields are removed; cleaning modules are set up, and data synchronization and cleaning tasks are triggered; the cleaning rules of the cleaning modules are reverse-parsed into a data cleaning script, that script is executed on the data, and the cleaned data is passed into the analysis library; these steps are repeated until all data has been cleaned. The invention allows data to be cleaned without mastering the development and use of data cleaning tools, lowers the technical threshold of big data application services, improves the user experience of big data services, addresses the flexibility and maintainability problems of traditional data cleaning systems, reduces the cost of use for data cleaning staff, and improves their efficiency.

Description

Visual data cleaning system and method based on natural language
Technical Field
The invention relates to the technical field of data processing, in particular to a visual data cleaning system and method based on natural language.
Background
With the development of big data technology in recent years, new analytical techniques have become available for massive raw logs, internet records, historical data, and the like. Analyzing such data can reveal valuable information that would otherwise go unnoticed. The first step of big data analysis is to collect data scattered across various locations, clean it, and store the cleaned data in a warehouse. This process is called ETL and involves three steps: data extraction (Extract), data transformation (Transform), and data loading (Load).
In the past, data cleaning required different cleaning tools for different data sources, and different programs and scripts had to be written for each source. These approaches require the user to master a variety of cleaning tools and to have considerable development skills. The result is a high threshold for using a data cleaning system (the user must learn expertise specific to the data source or the cleaning tool) and a high maintenance cost for the cleaning process.
In the invention document No. CN201710011044.8, a data cleaning method and a data cleaning apparatus are disclosed, the data cleaning method including: acquiring original sample data to be cleaned; determining at least one data screening mechanism for cleaning the original sample data, and acquiring a screening value set by a user for each data screening mechanism according to the original sample data; and screening the original sample data according to the at least one data screening mechanism and the screening value set by the user so as to clean the original sample data. According to the technical scheme, the original sample data can be comprehensively cleaned, the dependence of a data cleaning process on operators can be reduced, the accuracy and the stability of a data cleaning result are ensured, and meanwhile, the data cleaning duration can be effectively shortened.
In an invention document with a patent number of CN201810143012, a data cleaning method and a data cleaning system are disclosed. The data cleaning method comprises the following steps: step S10: selecting a data source to be cleaned from heterogeneous data sources through a graphical interface; the heterogeneous data source comprises a text file and database data; step S11: editing a data cleaning rule through a graphical interface; step S12: data cleansing is performed through a graphical interface. According to the data cleaning method, the data source to be cleaned is selected from the heterogeneous data sources through the graphical interface, fusion cleaning of different data sources can be achieved, meanwhile, a user can clean data through simple operation on the graphical interface, the development and use method of a data cleaning tool does not need to be mastered, the technical threshold of big data application service is lowered, and the user experience of the big data service is improved.
In summary, traditional data cleaning systems mostly rely on hand-written scripts and configuration files, or on drag-and-drop controls. These approaches are simple to implement, but they are costly to learn and maintain and offer little flexibility.
Disclosure of Invention
Aiming at the defects of the prior art, the invention discloses a visual data cleaning system and method based on natural language. It addresses the problem that traditional data cleaning systems mostly rely on hand-written scripts and configuration files or on drag-and-drop controls, which are simple to implement but costly to learn and maintain and offer little flexibility.
The invention is realized by the following technical scheme:
In a first aspect, the invention discloses a visual data cleaning method based on natural language, which comprises the following steps:
S1: after the system is successfully initialized, specifying, through the server, the connection information of the data source to be cleaned;
S2: after the data source is successfully connected, obtaining the first N records of the data to be cleaned and parsing their field types and formats;
S3: confirming, through the graphical interface, the data fields to be imported, performing a first round of screening, and removing useless fields;
S4: entering the natural language cleaning configuration, setting up the cleaning modules, and choosing to trigger the data synchronization and cleaning tasks;
S5: reverse-parsing the cleaning rules of the cleaning modules into a data cleaning script and executing that script on the data;
S6: passing the cleaned data into the analysis library, and repeating the above steps until all data has been cleaned, thereby completing the cleaning.
Further, in the method, specifying the connection information of the data source to be cleaned includes: for a remote data source, providing the server host information, the user name and password, and the database; for a local data source, providing the corresponding directory and file path.
Furthermore, in the method, each time a cleaning module is added, the system gives a natural language prompt based on the parsed source field information and assists the user in configuring the cleaning rule in natural language.
Furthermore, in the method, abnormal attributes in the data set are identified: a weight is first assigned to each attribute, then the mean and standard deviation of each attribute's field values are computed, a confidence interval is set for each attribute on that basis, and an attribute is judged abnormal if its value falls outside the confidence interval.
Furthermore, the method uses a reduction algorithm based on attribute importance as the logic rule for cleaning data attributes: the discernibility matrix of the decision table S = {U, Q, V, F} is computed, and the core attributes of the discernibility matrix are assigned to the attribute set obtained after attribute reduction, where U, Q, V, and F are attributes of the data.
Furthermore, in the method, all attribute combination items that contain a core attribute are removed from the discernibility matrix; the frequency of occurrence of each condition attribute is calculated and the frequencies are sorted in descending order; the attribute with the highest frequency is denoted a and added to the reduct, RED = RED ∪ {a}; the combination items containing the condition attribute a are deleted from the discernibility matrix; if the discernibility matrix is not empty, the selection and deletion continue, and if it is empty, the procedure ends, where RED is the final reduction result.
Furthermore, in the method, for the data logic constraint rules formulated by the reduction algorithm based on attribute importance, the closure of the rule set is obtained mathematically, and whether a field value violates a rule constraint is judged automatically, thereby determining whether the logic rules are correct.
In a second aspect, the invention discloses a natural language-based visual data cleaning system, which includes a visual cleaning process canvas, a natural language conversion module, a server, and a memory storing execution instructions. When the server executes the execution instructions stored in the memory, the server performs the natural language-based visual data cleaning method of the first aspect.
Furthermore, the visual cleaning process canvas supports drawing the data cleaning logic by dragging: different data cleaning component blocks are added and connected with data flow paths.
Furthermore, when a data cleaning component block is added, the cleaning logic is described in natural language, and the natural language conversion module checks and parses the statement entered by the user. If parsing succeeds, the statement is converted into the corresponding underlying data filtering query statement and passed to the underlying data cleaning execution module; if parsing fails, an error status code is returned to the visual cleaning process canvas, which displays the corresponding error message to the user.
The invention has the beneficial effects that:
By combining a graphical interface with a natural language engine, the invention lets a user clean data through simple operations on the graphical interface, without mastering the development and use of data cleaning tools. This lowers the technical threshold of big data application services and improves the user experience of big data services. It also addresses the flexibility and maintainability problems of traditional data cleaning systems: visualization and natural language interaction reduce the cost of use for data cleaning staff and improve their efficiency, giving the invention strong market application prospects.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a diagram of the principle steps of a visualization data cleansing method based on natural language.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
This embodiment discloses a visual data cleaning method based on natural language, as shown in FIG. 1, which includes the following steps:
S1: after the system is successfully initialized, specifying, through the server, the connection information of the data source to be cleaned;
S2: after the data source is successfully connected, obtaining the first N records of the data to be cleaned and parsing their field types and formats;
S3: confirming, through the graphical interface, the data fields to be imported, performing a first round of screening, and removing useless fields;
S4: entering the natural language cleaning configuration, setting up the cleaning modules, and choosing to trigger the data synchronization and cleaning tasks;
S5: reverse-parsing the cleaning rules of the cleaning modules into a data cleaning script and executing that script on the data;
S6: passing the cleaned data into the analysis library, and repeating the above steps until all data has been cleaned, thereby completing the cleaning.
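Steps S2 to S6 can be sketched as a batch loop. This is only a minimal illustration under stated assumptions, not the implementation described by the patent: the in-memory list standing in for the data source, the function names `infer_field_types` and `clean_in_batches`, and the representation of a cleaning rule as a Python predicate are all hypothetical.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator

Record = dict[str, object]


def infer_field_types(sample: list[Record]) -> dict[str, str]:
    """S2: derive a field -> type-name map from the first N records."""
    types: dict[str, str] = {}
    for record in sample:
        for field, value in record.items():
            if value is not None:
                types.setdefault(field, type(value).__name__)
    return types


def batches(records: Iterable[Record], n: int) -> Iterator[list[Record]]:
    """Yield successive lists of at most n records."""
    it = iter(records)
    while batch := list(islice(it, n)):
        yield batch


def clean_in_batches(
    source: Iterable[Record],
    keep_fields: set[str],
    rule: Callable[[Record], bool],
    n: int,
) -> list[Record]:
    """S3 to S6: project, filter, and load batch by batch until the source is exhausted."""
    analysis_library: list[Record] = []  # stand-in for the analysis library
    for batch in batches(source, n):
        for record in batch:
            # S3: drop fields the user marked as useless
            projected = {k: v for k, v in record.items() if k in keep_fields}
            # S5: apply the (already translated) cleaning rule
            if rule(projected):
                # S6: pass the cleaned record on
                analysis_library.append(projected)
    return analysis_library


# Hypothetical sample data
source = [
    {"id": 1, "category": "required", "tmp": "x"},
    {"id": 2, "category": "elective", "tmp": "y"},
    {"id": 3, "category": "required", "tmp": "z"},
]
field_types = infer_field_types(source[:2])
cleaned = clean_in_batches(
    source,
    keep_fields={"id", "category"},
    rule=lambda r: r["category"] != "elective",
    n=2,
)
```

Processing in fixed-size batches keeps memory use bounded and mirrors the "repeat until all data has been cleaned" wording of S6.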
In this embodiment, specifying the connection information of the data source to be cleaned includes: for a remote data source, providing the server host information, the user name and password, and the database; for a local data source, such as an Excel file or a log file, providing the corresponding directory and file path.
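The two kinds of connection information could be modelled as two small record types. The sketch below is an assumption made for illustration: the class names `RemoteSource` and `LocalSource`, their field names, the `describe` helper, and the sample host and paths do not come from the patent.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Union


@dataclass(frozen=True)
class RemoteSource:
    """Connection information for a remote data source."""
    host: str
    username: str
    password: str
    database: str


@dataclass(frozen=True)
class LocalSource:
    """Connection information for a local file such as an Excel or log file."""
    directory: str
    filename: str

    @property
    def path(self) -> str:
        return str(PurePosixPath(self.directory) / self.filename)


DataSource = Union[RemoteSource, LocalSource]


def describe(source: DataSource) -> str:
    """Return a short label for the source; the password is deliberately left out."""
    if isinstance(source, RemoteSource):
        return f"remote {source.username}@{source.host}/{source.database}"
    return f"local {source.path}"


# Hypothetical examples
remote_label = describe(RemoteSource("db.example.com", "etl", "secret", "school"))
local_label = describe(LocalSource("/data/logs", "app.log"))
```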
In this embodiment, each time a cleaning module is added, the system provides a natural language prompt based on the parsed source field information and assists the user in configuring cleaning rules in natural language. An example is a data filter such as: import only records whose category is not equal to "professional elective course".
In this embodiment, abnormal attributes in the data set are identified: a weight is first assigned to each attribute, then the mean and standard deviation of each attribute's field values are computed, a confidence interval is set for each attribute on that basis, and an attribute is judged abnormal if its value falls outside the confidence interval.
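A minimal sketch of this check follows. The text does not say how the weight enters the calculation, so the sketch assumes that the weight of an attribute is the number of standard deviations that defines the half-width of its interval; the function names and sample scores are likewise hypothetical.

```python
from statistics import mean, pstdev


def confidence_interval(values: list[float], weight: float) -> tuple[float, float]:
    """Interval mean ± weight * standard deviation (the role of the weight is an assumption)."""
    mu = mean(values)
    sigma = pstdev(values)
    return mu - weight * sigma, mu + weight * sigma


def find_anomalies(records: list[dict], weights: dict[str, float]) -> list[tuple[int, str]]:
    """Return (record index, field) pairs whose value lies outside the field's interval."""
    anomalies = []
    for field, weight in weights.items():
        column = [record[field] for record in records]
        low, high = confidence_interval(column, weight)
        for index, value in enumerate(column):
            if not low <= value <= high:
                anomalies.append((index, field))
    return anomalies


# Hypothetical data: six plausible scores and one outlier
records = [{"score": s} for s in (70, 72, 68, 71, 69, 70, 150)]
anomalies = find_anomalies(records, {"score": 2.0})
```

With these values the mean is about 81.4 and the population standard deviation about 28.0, so only the record with score 150 falls outside the interval.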
Example 2
In this embodiment, a reduction algorithm based on attribute importance is used as the logic rule for cleaning data attributes: the discernibility matrix of the decision table S = {U, Q, V, F} is computed, and the core attributes of the discernibility matrix are assigned to the attribute set obtained after attribute reduction, where U, Q, V, and F are data attributes.
In this embodiment, all attribute combination items that contain a core attribute are removed from the discernibility matrix; the frequency of occurrence of each condition attribute is calculated and the frequencies are sorted in descending order; the attribute with the highest frequency is denoted a and added to the reduct, RED = RED ∪ {a}; the combination items containing the condition attribute a are deleted from the discernibility matrix; if the discernibility matrix is not empty, the selection and deletion continue, and if it is empty, the procedure ends, where RED is the final reduction result.
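The procedure reads like the frequency-based reduct heuristic on a discernibility matrix from rough set theory, and the sketch below follows that reading. It is an interpretation, not the patent's code: the function names, the tie-breaking by attribute name, and the four-row decision table are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def discernibility_entries(rows, condition_attrs, decision_attr):
    """For each pair of rows with different decisions, the set of attributes that tell them apart."""
    entries = []
    for x, y in combinations(rows, 2):
        if x[decision_attr] != y[decision_attr]:
            diff = frozenset(a for a in condition_attrs if x[a] != y[a])
            if diff:
                entries.append(diff)
    return entries


def frequency_reduct(rows, condition_attrs, decision_attr):
    """Start from the core, then repeatedly add the most frequent remaining attribute."""
    entries = discernibility_entries(rows, condition_attrs, decision_attr)
    # Core attributes are those that appear as singleton entries.
    red = {next(iter(e)) for e in entries if len(e) == 1}
    # Drop every entry already covered by a core attribute.
    entries = [e for e in entries if not e & red]
    while entries:
        counts = Counter(a for e in entries for a in e)
        # Highest frequency first; ties broken by name (an assumption, for determinism).
        a = min(counts, key=lambda attr: (-counts[attr], attr))
        red.add(a)  # RED = RED ∪ {a}
        entries = [e for e in entries if a not in e]
    return red


# Hypothetical decision table with condition attributes a, b, c and decision d
rows = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "yes"},
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 1, "d": "yes"},
]
reduct = frequency_reduct(rows, ["a", "b", "c"], "d")
```

In this table the first row differs from the second only in b, so b is a core attribute; the remaining entry {a, c} is then covered by picking a.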
In this embodiment, for the data logic constraint rules formulated by the reduction algorithm based on attribute importance, the closure of the rule set is obtained mathematically, and whether a field value violates a rule constraint is judged automatically, thereby determining whether the logic rules are correct.
In this embodiment, starting from the classification induced by the condition attributes, condition attributes are removed cumulatively and the relative positive regions are computed and compared, which determines whether a core attribute or any other important attribute has been removed. The condition attributes that qualify are then added to the attribute reduction set, and the final attribute reduction set is output.
In this embodiment, the attribute reduction algorithm is thus improved: the condition attributes are subdivided, and the attribute reduction set is output directly by comparing the relative positive regions obtained after removing condition attributes. The improved algorithm starts from the classification of the condition attributes, removes condition attributes cumulatively, and computes and compares relative positive regions to judge whether a core attribute or any other important attribute has been removed. Finally, the condition attributes that meet the requirement are added to the attribute reduction set, and the final attribute reduction set is output.
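One way to read this is as the standard positive-region test from rough set theory: an attribute may be dropped if the positive region of the decision is unchanged without it. The sketch below implements that reading; the function names, the left-to-right removal order, and the sample table are assumptions rather than details from the patent.

```python
from collections import defaultdict


def positive_region(rows, attrs, decision_attr):
    """Indices of rows whose equivalence class under attrs maps to a single decision."""
    classes = defaultdict(list)
    for index, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(index)
    region = set()
    for members in classes.values():
        if len({rows[i][decision_attr] for i in members}) == 1:
            region.update(members)
    return region


def positive_region_reduct(rows, condition_attrs, decision_attr):
    """Cumulatively remove attributes while the positive region stays the same."""
    full = positive_region(rows, condition_attrs, decision_attr)
    kept = list(condition_attrs)
    for attr in condition_attrs:
        trial = [a for a in kept if a != attr]
        if positive_region(rows, trial, decision_attr) == full:
            kept = trial  # attr is dispensable given the attributes still kept
        # otherwise attr is important and stays in the reduction set
    return kept


# Hypothetical decision table with condition attributes a, b, c and decision d
table = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "yes"},
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 1, "d": "yes"},
]
kept_attrs = positive_region_reduct(table, ["a", "b", "c"], "d")
```

The result depends on the order in which attributes are tried: here a is removed first, after which neither b nor c can be dropped without shrinking the positive region.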
Example 3
This embodiment discloses a visual data cleaning system based on natural language, which comprises a visual cleaning process canvas, a natural language conversion module, a server, and a memory storing execution instructions. When the server executes the execution instructions stored in the memory, the server performs the natural language-based visual data cleaning method.
Unlike a traditional data cleaning system, this embodiment lets the user draw the data cleaning logic directly on the visual cleaning process canvas by dragging, that is, by adding different data cleaning component blocks and connecting them with data flow paths.
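A canvas of component blocks joined by data flow paths can be modelled as a directed graph and executed in dependency order. The sketch below is a simplified assumption: it handles only a linear chain in which each block receives the output of the previous one, and the block names and the `run_canvas` function are hypothetical. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical component blocks: each takes a list of records and returns a list of records.
blocks = {
    "source": lambda rows: rows,
    "drop_empty": lambda rows: [r for r in rows if r.get("name")],
    "strip_names": lambda rows: [{**r, "name": r["name"].strip()} for r in rows],
}

# Data flow paths, written as block -> set of upstream blocks.
paths = {
    "drop_empty": {"source"},
    "strip_names": {"drop_empty"},
}


def run_canvas(blocks, paths, rows):
    """Run the blocks in an order that respects the data flow paths (linear chains only)."""
    for name in TopologicalSorter(paths).static_order():
        rows = blocks[name](rows)
    return rows


result = run_canvas(blocks, paths, [{"name": " Ann "}, {"name": ""}, {"name": None}])
```

A real canvas with branches and merges would need to track the output of every block separately; `TopologicalSorter` also raises `CycleError` if the paths form a loop, which gives a natural point to reject an invalid drawing.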
When a data cleaning component block is added, the cleaning logic can be described in natural language. For example, when the user enters "the creation date is greater than 2019", the natural language conversion module checks and parses the statement. If parsing succeeds, the statement is converted into the corresponding underlying data filtering query statement and passed to the underlying data cleaning execution module; if parsing fails, an error status code is returned to the canvas module, which displays the corresponding error message to the user.
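The conversion step can be illustrated with a very small rule-based parser. This is a toy sketch, not the natural language engine of the patent: the supported sentence pattern, the operator phrases, the SQL-like output, and the numeric status codes are all assumptions chosen for the example.

```python
import re

# Hypothetical mapping from comparison phrases to query operators.
OPERATORS = {
    "greater than": ">",
    "more than": ">",
    "less than": "<",
    "not equal to": "<>",
    "equal to": "=",
}

# Accepts statements of the form "[the] <field> is <comparison> <number>".
PATTERN = re.compile(
    r"^(?:the\s+)?(?P<field>[a-z][a-z ]*?)\s+is\s+"
    r"(?P<op>" + "|".join(OPERATORS) + r")\s+(?P<value>\d+)$"
)


def to_filter(statement: str, known_fields: set[str]) -> dict:
    """Translate one statement into a filter clause, or return an error status."""
    match = PATTERN.match(statement.strip().lower())
    if match is None:
        return {"status": 400, "error": "statement not understood"}
    field = match["field"].replace(" ", "_")
    if field not in known_fields:
        return {"status": 404, "error": f"unknown field: {field}"}
    clause = f"{field} {OPERATORS[match['op']]} {match['value']}"
    return {"status": 200, "filter": clause}


ok = to_filter("The creation date is more than 2019", {"creation_date"})
bad = to_filter("make it nice", {"creation_date"})
```

Checking the field against the parsed source fields and accepting only digits as the value means the generated clause contains nothing the user typed verbatim, which matters if the clause is later embedded in a query.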
In conclusion, by combining a graphical interface with a natural language engine, the invention lets a user clean data through simple operations on the graphical interface, without mastering the development and use of data cleaning tools. This lowers the technical threshold of big data application services and improves the user experience of big data services. It also addresses the flexibility and maintainability problems of traditional data cleaning systems: visualization and natural language interaction reduce the cost of use for data cleaning staff and improve their efficiency, giving the invention strong market application prospects.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A visual data cleaning method based on natural language, characterized in that the method comprises the following steps:
S1: after the system is successfully initialized, specifying, through the server, the connection information of the data source to be cleaned;
S2: after the data source is successfully connected, obtaining the first N records of the data to be cleaned and parsing their field types and formats;
S3: confirming, through the graphical interface, the data fields to be imported, performing a first round of screening, and removing useless fields;
S4: entering the natural language cleaning configuration, setting up the cleaning modules, and choosing to trigger the data synchronization and cleaning tasks;
S5: reverse-parsing the cleaning rules of the cleaning modules into a data cleaning script and executing that script on the data;
S6: passing the cleaned data into the analysis library, and repeating the above steps until all data has been cleaned, thereby completing the cleaning.
2. The natural language-based visual data cleaning method according to claim 1, characterized in that, in the method, specifying the connection information of the data source to be cleaned comprises: for a remote data source, providing the server host information, the user name and password, and the database; for a local data source, providing the corresponding directory and file path.
3. The natural language-based visual data cleaning method according to claim 1, characterized in that, in the method, each time a cleaning module is added, the system gives a natural language prompt based on the parsed source field information and assists the user in configuring the cleaning rules in natural language.
4. The natural language-based visual data cleaning method according to claim 1, characterized in that, in the method, abnormal attributes in the data set are identified by first assigning a weight to each attribute, then computing the mean and standard deviation of each attribute's field values, setting a confidence interval for each attribute on that basis, and judging whether the attribute is abnormal according to whether its value lies within the confidence interval.
5. The natural language-based visual data cleaning method according to claim 1, characterized in that the method uses a reduction algorithm based on attribute importance as the logic rule for cleaning data attributes, computes the discernibility matrix of the decision table S = {U, Q, V, F}, and assigns the core attributes of the discernibility matrix to the attribute set obtained after attribute reduction, where U, Q, V, and F are attributes of the data.
6. The natural language-based visual data cleaning method according to claim 5, characterized in that, in the method, all attribute combination items that contain a core attribute are removed from the discernibility matrix; the frequency of occurrence of each condition attribute is calculated and all attribute frequencies are sorted in descending order; the attribute with the highest frequency is denoted a, and RED = RED ∪ {a}; the combination items containing the condition attribute a are deleted from all combination items of the discernibility matrix; whether the discernibility matrix is empty is then judged: if it is not empty, the selection and deletion continue, and if it is empty, the procedure ends, where RED is the final reduction result.
7. The natural language-based visual data cleaning method according to claim 1, characterized in that, in the method, for the data logic constraint rules formulated by the reduction algorithm based on attribute importance, the closure of the rule set is obtained mathematically, and whether a field value violates a rule constraint is judged automatically, thereby determining whether the logic rules are correct.
8. A visual data cleaning system based on natural language, characterized in that it comprises a visual cleaning process canvas, a natural language conversion module, a server, and a memory storing execution instructions, wherein, when the server executes the execution instructions stored in the memory, the server performs the natural language-based visual data cleaning method according to any one of claims 1 to 7.
9. The natural language-based visual data cleaning system according to claim 8, characterized in that the visual cleaning process canvas supports drawing the data cleaning logic by dragging, adding different data cleaning component blocks, and connecting the component blocks with data flow paths.
10. The natural language-based visual data cleaning system according to claim 8, characterized in that, when a data cleaning component block is added, the cleaning logic is described in natural language and the natural language conversion module checks and parses the statement entered by the user; if parsing succeeds, the statement is converted into the corresponding underlying data filtering query statement and passed to the underlying data cleaning execution module; if parsing fails, an error status code is returned to the visual cleaning process canvas, which displays the corresponding error message to the user.
CN202011617367.XA 2020-12-30 2020-12-30 Visual data cleaning system and method based on natural language Pending CN112667617A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011617367.XA CN112667617A (en) 2020-12-30 2020-12-30 Visual data cleaning system and method based on natural language

Publications (1)

Publication Number Publication Date
CN112667617A true CN112667617A (en) 2021-04-16

Family

ID=75412038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011617367.XA Pending CN112667617A (en) 2020-12-30 2020-12-30 Visual data cleaning system and method based on natural language

Country Status (1)

Country Link
CN (1) CN112667617A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113138982A (en) * 2021-05-25 2021-07-20 黄柱挺 Big data cleaning method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105183814A (en) * 2015-08-27 2015-12-23 湖南人文科技学院 Internet of Things data cleaning method
WO2017162083A1 (en) * 2016-03-25 2017-09-28 阿里巴巴集团控股有限公司 Data cleaning method and apparatus
US20180150530A1 (en) * 2016-04-19 2018-05-31 Ping An Technology (Shenzhen) Co., Ltd. Method, Apparatus, Computing Device and Storage Medium for Analyzing and Processing Data
CN108363782A (en) * 2018-02-11 2018-08-03 中国联合网络通信集团有限公司 A kind of data cleaning method and Data clean system
CN110956541A (en) * 2019-08-27 2020-04-03 西安交通大学 A Stock Trend Classification Prediction Method Based on Intelligent Fusion Computing
CN111506595A (en) * 2020-04-20 2020-08-07 金蝶软件(中国)有限公司 Data query method, system and related equipment
CN111858569A (en) * 2020-07-01 2020-10-30 长江岩土工程总公司(武汉) Mass data cleaning method based on stream computing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination