CN112732701A - Method and system for intelligent indexing of data and automatic data cleaning - Google Patents

Info

Publication number
CN112732701A
Authority
CN
China
Prior art keywords
data
indexing
cleaning
instruction
intelligent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110105878.1A
Other languages
Chinese (zh)
Inventor
戴文艳
黄炳裕
洪章阳
林文国
王伟宗
王孝文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Evecom Information Technology Development Co ltd
Original Assignee
Evecom Information Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Evecom Information Technology Development Co ltd
Priority to CN202110105878.1A
Publication of CN112732701A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method for intelligent indexing of data and automatic data cleaning comprises the following steps: receiving a data input request; segmenting the received data to obtain a plurality of data segments; calling an instruction and judging whether the called instruction is an indexing instruction or a cleaning instruction; if it is an indexing instruction, inputting indexing keywords, applying the indexing keywords to each data segment in turn, indexing the data that matches the indexing rule, and sequentially integrating and storing the indexing results of each segment; if it is a cleaning instruction, pre-cleaning all segments simultaneously, integrating and storing the cleaning results of all segments, cleaning the integrated pre-cleaned data again, and finally outputting and displaying the deep-cleaned data. The invention also provides a system for intelligent indexing of data and automatic data cleaning. The invention segments the data and executes instructions on the different segments simultaneously, thereby improving operating efficiency.

Description

Method and system for intelligent indexing of data and automatic data cleaning
Technical Field
The invention relates to the technical field of computers, in particular to a method and a system for intelligent indexing of data and automatic data cleaning.
Background
Data cleaning is the process of re-examining and verifying data in order to delete duplicate information, correct existing errors, check data consistency, and handle invalid and missing values. Indexing means marking data so that users can find the information they need conveniently and quickly.
Existing data indexing and cleaning methods are inefficient, give poor indexing and cleaning results, and usually require manual proofreading.
Disclosure of Invention
(I) Objects of the invention
In order to solve the technical problems described in the background, the invention provides a method and a system for intelligent indexing of data and automatic data cleaning.
(II) Technical scheme
The invention provides a method for intelligent indexing of data and automatic data cleaning, which comprises the following specific steps:
S1, receiving a data input request;
S2, segmenting the received data to obtain a plurality of data segments;
S3, calling an instruction and judging whether the called instruction is an indexing instruction or a cleaning instruction: if it is an indexing instruction, executing S4; if it is a cleaning instruction, executing S8;
S4, inputting indexing keywords and judging whether the selection is finished: if yes, executing S5; if not, re-executing S4;
S5, applying each indexing keyword to each data segment in turn, indexing the data that matches the indexing rule, and executing S6;
S6, sequentially integrating and storing the indexing results of all segments;
S7, outputting and displaying the integrated result;
S8, data pre-cleaning: cleaning all segments simultaneously;
S9, integrating and storing the cleaning results of all segments;
S10, deep data cleaning: cleaning the integrated pre-cleaned data again;
S11, outputting and displaying the deep-cleaned data.
Preferably, the method further includes a repository, in which the integrated data is stored.
Preferably, the repository comprises a plurality of child repositories.
Preferably, the device for intelligent indexing of data and automatic data cleaning is configured into industrial equipment in the form of configuration files or code blocks.
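The segmentation of step S2 can be illustrated with a short sketch. The description does not state how the received data is divided, so the fixed segment size, the function name `segment` and the use of plain strings as records are all assumptions made only for illustration.

```python
from typing import List, Sequence


def segment(records: Sequence[str], segment_size: int) -> List[List[str]]:
    """S2 (illustrative): split the received records into consecutive segments.

    The fixed segment size is an assumption; the source only says that the
    received data is segmented into a plurality of segments.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [
        list(records[start:start + segment_size])
        for start in range(0, len(records), segment_size)
    ]


# Five records split into segments of at most two records each.
example_segments = segment(["r1", "r2", "r3", "r4", "r5"], 2)
```

Consecutive slicing keeps the original record order, so results integrated segment by segment later on stay in the order in which the data was received.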
The invention also provides a system for intelligent indexing of data and automatic data cleaning, which applies the above method and specifically comprises:
a download module, used for receiving an input data request command;
a grouping module, used for segmenting the data in the storage module so that commands can be executed on all segments simultaneously;
a calling module, used for calling the instruction;
a judging module, used for judging whether the called instruction is an indexing instruction or a cleaning instruction;
an indexing module, used for finding the corresponding indexing data in the data, extracting it and integrating it;
a cleaning module, used for performing data pre-cleaning and deep data cleaning, deleting erroneous, repeated and inconsistent data, and integrating the cleaned data;
and a display module, used for displaying the indexing data or the cleaned data.
Preferably, the system further comprises a storage module for storing the integrated data.
Preferably, the storage module comprises a plurality of sub-storage modules.
Preferably, the device for intelligent indexing of data and automatic data cleaning is configured into industrial equipment in the form of configuration files or code blocks.
The technical scheme of the invention has the following beneficial technical effects:
In the invention, a user downloads data according to a data request command, and the grouping module segments the downloaded data. The user then selects the instruction to be called, and the judging module judges the called instruction. When the indexing instruction is selected, an indexing keyword is input and searched for in all data segments at the same time, and the data found is integrated and displayed, which improves search efficiency. When the cleaning instruction is selected, all data segments are screened simultaneously; after the erroneous, repeated and inconsistent data are deleted, the segments are integrated and the integrated data is then deeply cleaned, which improves cleaning efficiency and accuracy.
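The indexing branch described above (searching every segment for the keywords, then integrating the hits) might look like the following minimal sketch. The source does not define the indexing rule, so plain substring matching is assumed here, and the names `index_segment` and `index_all` are hypothetical. A thread pool stands in for "at the same time"; the patent does not name a concurrency mechanism.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def index_segment(segment: Sequence[str], keywords: Sequence[str]) -> List[str]:
    """S5 (illustrative): keep records of one segment that contain any keyword.

    Substring matching is an assumed stand-in for the unspecified indexing rule.
    """
    return [record for record in segment if any(kw in record for kw in keywords)]


def index_all(segments: Sequence[Sequence[str]], keywords: Sequence[str]) -> List[str]:
    """S5-S6 (illustrative): search all segments, then integrate the results."""
    with ThreadPoolExecutor() as pool:
        # Executor.map returns results in input order, so the per-segment
        # results are integrated sequentially, segment by segment.
        per_segment = list(pool.map(lambda seg: index_segment(seg, keywords), segments))
    return [record for hits in per_segment for record in hits]


segments = [["pump alarm", "valve ok"], ["pump ok", "fan alarm"]]
pump_records = index_all(segments, ["pump"])
```

Because each segment is searched independently, the work per segment needs no shared state, which is what makes the simultaneous search straightforward.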
Drawings
Fig. 1 is a flowchart of a method for intelligent indexing of data and automatic data cleaning according to the present invention.
Fig. 2 is a system block diagram of a system for intelligent indexing of data and automatic data cleaning according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings in conjunction with the following detailed description. It should be understood that the description is intended to be exemplary only, and is not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.
Example 1
As shown in fig. 1, the method for intelligently indexing data and automatically cleaning data provided by the present invention includes the following specific steps:
S1, receiving a data input request;
S2, segmenting the received data to obtain a plurality of data segments;
S3, calling an instruction and judging whether the called instruction is an indexing instruction or a cleaning instruction: if it is an indexing instruction, executing S4; if it is a cleaning instruction, executing S8;
S4, inputting indexing keywords and judging whether the selection is finished: if yes, executing S5; if not, re-executing S4;
S5, applying each indexing keyword to each data segment in turn, indexing the data that matches the indexing rule, and executing S6;
S6, sequentially integrating and storing the indexing results of all segments;
S7, outputting and displaying the integrated result;
S8, data pre-cleaning: cleaning all segments simultaneously;
S9, integrating and storing the cleaning results of all segments;
S10, deep data cleaning: cleaning the integrated pre-cleaned data again;
S11, outputting and displaying the deep-cleaned data.
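The cleaning branch (S8 to S10) can be sketched as a two-stage pass. The concrete cleaning rules below are assumptions, since the source only says that erroneous, repeated and inconsistent data are deleted: pre-cleaning drops records with empty values and exact repeats inside one segment, and deep cleaning drops repeats across segments together with records whose key field (a hypothetical `key_field`) appears with conflicting values.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence

Record = Dict[str, str]


def preclean(segment: Sequence[Record]) -> List[Record]:
    """S8 (illustrative): per segment, drop empty-valued records and exact repeats."""
    seen = set()
    kept = []
    for record in segment:
        if any(value == "" for value in record.values()):
            continue  # treated as a missing or invalid value
        fingerprint = tuple(sorted(record.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            kept.append(record)
    return kept


def deep_clean(records: Sequence[Record], key_field: str) -> List[Record]:
    """S10 (illustrative): drop cross-segment repeats and conflicting records."""
    first_by_key: Dict[str, Record] = {}
    conflicting = set()
    for record in records:
        key = record[key_field]
        if key not in first_by_key:
            first_by_key[key] = record
        elif first_by_key[key] != record:
            conflicting.add(key)  # same key, different content: inconsistent
    return [rec for key, rec in first_by_key.items() if key not in conflicting]


def clean_all(segments: Sequence[Sequence[Record]], key_field: str) -> List[Record]:
    with ThreadPoolExecutor() as pool:
        cleaned = list(pool.map(preclean, segments))       # S8: all segments at once
    integrated = [rec for seg in cleaned for rec in seg]   # S9: integrate
    return deep_clean(integrated, key_field)               # S10: clean again
```

The second pass is needed because repeats and inconsistencies that span two segments are invisible while each segment is cleaned on its own.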
In an alternative embodiment, the method further includes a repository, in which the integrated data is stored.
In an alternative embodiment, the repository contains a plurality of child repositories, which makes it convenient to store multiple groups of integrated data.
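A repository with child repositories could be modelled as below. This is only a sketch: the class name `Repository`, its methods and the idea of naming each child repository after a request are assumptions, as the source does not describe the storage layout.

```python
from collections import defaultdict
from typing import Dict, List


class Repository:
    """Hypothetical repository holding one child repository per data group."""

    def __init__(self) -> None:
        self._children: Dict[str, List[str]] = defaultdict(list)

    def store(self, child: str, integrated: List[str]) -> None:
        """Append one group of integrated data to the named child repository."""
        self._children[child].extend(integrated)

    def load(self, child: str) -> List[str]:
        """Return a copy of the data held in the named child repository."""
        return list(self._children.get(child, []))

    def children(self) -> List[str]:
        """List the names of the child repositories in sorted order."""
        return sorted(self._children)


repo = Repository()
repo.store("request-1", ["a", "b"])
repo.store("request-2", ["c"])
```

Keeping each group in its own child repository means the results of different requests can be stored and read back without mixing.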
In an alternative embodiment, the device for intelligent indexing of data and automatic data cleaning is configured into industrial equipment in the form of configuration files or code blocks. The user only needs to configure the device into the industrial equipment for the equipment end to clean the equipment data automatically, which reduces the difficulty of data cleaning and improves its efficiency.
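As an illustration of the configuration-file form, the sketch below loads a small JSON document into a typed settings object. The patent does not specify a file format or any field names, so JSON and the fields `segment_size`, `instruction`, `keywords` and `key_field` are purely hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical configuration content; none of these fields come from the source.
EXAMPLE_CONFIG = """
{
  "segment_size": 1000,
  "instruction": "clean",
  "keywords": [],
  "key_field": "id"
}
"""


@dataclass(frozen=True)
class DeviceConfig:
    segment_size: int
    instruction: str
    keywords: List[str]
    key_field: str


def load_config(text: str) -> DeviceConfig:
    """Parse and validate the assumed JSON configuration."""
    raw = json.loads(text)
    config = DeviceConfig(
        segment_size=int(raw["segment_size"]),
        instruction=str(raw["instruction"]),
        keywords=list(raw.get("keywords", [])),
        key_field=str(raw.get("key_field", "id")),
    )
    if config.segment_size <= 0:
        raise ValueError("segment_size must be positive")
    if config.instruction not in ("index", "clean"):
        raise ValueError(f"unknown instruction: {config.instruction!r}")
    return config


config = load_config(EXAMPLE_CONFIG)
```

Validating the file when it is loaded lets a misconfigured device fail at start-up rather than partway through indexing or cleaning.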
Example 2
As shown in Fig. 2, the system for intelligent indexing of data and automatic data cleaning provided by the present invention applies the method of Example 1 and specifically comprises:
a download module, used for receiving an input data request command;
a grouping module, used for segmenting the data in the storage module so that commands can be executed on all segments simultaneously;
a calling module, used for calling the instruction;
a judging module, used for judging whether the called instruction is an indexing instruction or a cleaning instruction;
an indexing module, used for finding the corresponding indexing data in the data, extracting it and integrating it;
a cleaning module, used for performing data pre-cleaning and deep data cleaning, deleting erroneous, repeated and inconsistent data, and integrating the cleaned data;
and a display module, used for displaying the indexing data or the cleaned data.
In an alternative embodiment, the system further comprises a storage module for storing the integrated data.
In an alternative embodiment, the storage module comprises a plurality of sub-storage modules, which makes it convenient to receive different groups of data for different data receiving requests.
In an alternative embodiment, the device for intelligent indexing of data and automatic data cleaning is configured into industrial equipment in the form of configuration files or code blocks. The user only needs to configure the device into the industrial equipment for the equipment end to clean the equipment data automatically, which reduces the difficulty of data cleaning and improves its efficiency.
In the invention, a user downloads data according to a data request command, and the grouping module segments the downloaded data. The user then selects the instruction to be called, and the judging module judges the called instruction. When the indexing instruction is selected, an indexing keyword is input and searched for in all data segments at the same time, and the data found is integrated and displayed, which improves search efficiency. When the cleaning instruction is selected, all data segments are screened simultaneously; after the erroneous, repeated and inconsistent data are deleted, the segments are integrated and the integrated data is then deeply cleaned, which improves cleaning efficiency and accuracy.
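How the modules of Example 2 hand data to one another can be shown with a small wiring sketch. Each module is represented by a plain callable supplied by the caller; the class name `DataSystem`, the instruction strings "index" and "clean", and the callable signatures are assumptions, not part of the source.

```python
from dataclasses import dataclass
from typing import Callable, List

Segments = List[List[str]]


@dataclass
class DataSystem:
    """Hypothetical wiring of the modules; each field stands in for one module."""

    download: Callable[[str], List[str]]     # download module
    group: Callable[[List[str]], Segments]   # grouping module
    index: Callable[[Segments], List[str]]   # indexing module
    clean: Callable[[Segments], List[str]]   # cleaning module
    display: Callable[[List[str]], None]     # display module

    def run(self, request: str, instruction: str) -> List[str]:
        segments = self.group(self.download(request))
        # Judging module: decide which branch the called instruction selects.
        if instruction == "index":
            result = self.index(segments)
        elif instruction == "clean":
            result = self.clean(segments)
        else:
            raise ValueError(f"unknown instruction: {instruction!r}")
        self.display(result)
        return result


shown: List[List[str]] = []
system = DataSystem(
    download=lambda request: ["b", "a", "a"],
    group=lambda records: [records[:2], records[2:]],
    index=lambda segs: [r for seg in segs for r in seg if r == "a"],
    clean=lambda segs: sorted({r for seg in segs for r in seg}),
    display=shown.append,
)
```

Passing the modules in as callables keeps the control flow of the judging step separate from the indexing and cleaning rules, which can then be swapped without touching `run`.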
It is to be understood that the above-described embodiments of the present invention are merely illustrative of or explaining the principles of the invention and are not to be construed as limiting the invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention should be included in the protection scope of the present invention. Further, it is intended that the appended claims cover all such variations and modifications as fall within the scope and boundaries of the appended claims or the equivalents of such scope and boundaries.

Claims (8)

1. A method for intelligent indexing of data and automatic data cleaning is characterized by comprising the following specific steps:
S1, receiving a data input request;
S2, segmenting the received data to obtain a plurality of data segments;
S3, calling an instruction and judging whether the called instruction is an indexing instruction or a cleaning instruction: if it is an indexing instruction, executing S4; if it is a cleaning instruction, executing S8;
S4, inputting indexing keywords and judging whether the selection is finished: if yes, executing S5; if not, re-executing S4;
S5, applying each indexing keyword to each data segment in turn, indexing the data that matches the indexing rule, and executing S6;
S6, sequentially integrating and storing the indexing results of all segments;
S7, outputting and displaying the integrated result;
S8, data pre-cleaning: cleaning all segments simultaneously;
S9, integrating and storing the cleaning results of all segments;
S10, deep data cleaning: cleaning the integrated pre-cleaned data again;
S11, outputting and displaying the deep-cleaned data.
2. The method for intelligent indexing of data and automatic data cleaning according to claim 1, further comprising a repository, wherein the integrated data is stored in the repository.
3. The method according to claim 2, wherein the repository comprises a plurality of child repositories.
4. The method for intelligent indexing of data and automatic data cleaning according to claim 1, wherein the device for intelligent indexing of data and automatic data cleaning is configured into industrial equipment in the form of a configuration file or a code block.
5. A system for intelligent indexing of data and automatic data cleaning, applying the method for intelligent indexing of data and automatic data cleaning according to any one of claims 1-2, characterized by specifically comprising:
a download module, used for receiving an input data request command;
a grouping module, used for segmenting the data in the storage module so that commands can be executed on all segments simultaneously;
a calling module, used for calling the instruction;
a judging module, used for judging whether the called instruction is an indexing instruction or a cleaning instruction;
an indexing module, used for finding the corresponding indexing data in the data, extracting it and integrating it;
a cleaning module, used for performing data pre-cleaning and deep data cleaning, deleting erroneous, repeated and inconsistent data, and integrating the cleaned data;
and a display module, used for displaying the indexing data or the cleaned data.
6. The system for intelligent indexing of data and automatic data cleaning according to claim 5, further comprising a storage module for storing the integrated data.
7. The system according to claim 5, wherein the storage module comprises a plurality of sub-storage modules.
8. The system for intelligent indexing of data and automatic data cleaning according to claim 5, wherein the device for intelligent indexing of data and automatic data cleaning is configured into industrial equipment in the form of configuration files or code blocks.
CN202110105878.1A 2021-01-26 2021-01-26 Method and system for intelligent indexing of data and automatic data cleaning Pending CN112732701A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110105878.1A CN112732701A (en) 2021-01-26 2021-01-26 Method and system for intelligent indexing of data and automatic data cleaning

Publications (1)

Publication Number Publication Date
CN112732701A true CN112732701A (en) 2021-04-30

Family

ID=75594058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110105878.1A Pending CN112732701A (en) 2021-01-26 2021-01-26 Method and system for intelligent indexing of data and automatic data cleaning

Country Status (1)

Country Link
CN (1) CN112732701A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6611852B1 (en) * 2000-09-29 2003-08-26 Emc Corporation System and method for cleaning a log structure
CN106484915A (en) * 2016-11-03 2017-03-08 国家电网公司信息通信分公司 A kind of cleaning method of mass data and system
CN110389950A (en) * 2019-07-31 2019-10-29 南京安夏电子科技有限公司 A kind of big data cleaning method quickly run
CN110941957A (en) * 2019-11-26 2020-03-31 交通运输部科学研究院 Traffic science and technology data indexing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210430
