CN112732701A - Method and system for intelligent indexing of data and automatic data cleaning
- Publication number
- CN112732701A (application number CN202110105878.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- indexing
- cleaning
- instruction
- intelligent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
A method for intelligent indexing of data and automatic data cleaning comprises the following steps: receiving a data input request; segmenting the received data to obtain a plurality of data segments; calling an instruction and judging whether the called instruction is an indexing instruction or a cleaning instruction; if it is an indexing instruction, inputting indexing keywords, applying the indexing keywords to each data segment in turn, indexing the data that matches the indexing rule, and integrating and storing the indexing results of the segments in sequence; if it is a cleaning instruction, pre-cleaning all segments simultaneously, integrating and storing the cleaning results of all segments, cleaning the integrated pre-cleaned data again (deep data cleaning), and finally outputting and displaying the deep-cleaned data. The invention also provides a system for intelligent indexing of data and automatic data cleaning. Because the data is segmented and instructions are executed on different segments simultaneously, operating efficiency is improved.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a method and a system for intelligent indexing of data and automatic data cleaning.
Background
Data cleaning is the process of re-examining and verifying data in order to delete duplicate information, correct existing errors, check data consistency, and handle invalid and missing values. Indexing means marking information so that people can find what they need conveniently and quickly.
Existing data indexing and cleaning methods are inefficient, give poor indexing and cleaning results, and usually require manual proofreading.
Disclosure of Invention
(I) Objects of the invention
In order to solve the technical problems described in the background, the invention provides a method and a system for intelligent indexing of data and automatic data cleaning.
(II) Technical scheme
The invention provides a method for intelligent indexing of data and automatic data cleaning, which comprises the following specific steps:
S1, receiving a data input request;
S2, segmenting the received data to obtain a plurality of data segments;
S3, calling an instruction and judging whether the called instruction is an indexing instruction or a cleaning instruction: if it is an indexing instruction, executing S4; if it is a cleaning instruction, executing S8;
S4, inputting indexing keywords and judging whether keyword selection is finished: if yes, executing S5; if not, executing S4 again;
S5, applying each indexing keyword to each data segment in turn, indexing the data that matches the indexing rule, and executing S6;
S6, integrating and storing the indexing results of the segments in sequence;
S7, outputting and displaying the integrated result;
S8, data pre-cleaning: cleaning all segments simultaneously;
S9, integrating and storing the cleaning results of all segments;
S10, deep data cleaning: cleaning the integrated, pre-cleaned data again;
S11, outputting and displaying the data after deep data cleaning.
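Steps S2 and S3 can be sketched as follows. This is only an illustration under stated assumptions: the patent does not say how the data is segmented, so fixed-size segments are assumed here, and the names `segment`, `dispatch`, and the instruction strings `"index"` and `"clean"` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def segment(records: Sequence[str], segment_size: int) -> List[List[str]]:
    """S2: split the received records into consecutive fixed-size segments.

    Fixed-size segmentation is an assumption of this sketch.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [list(records[i:i + segment_size])
            for i in range(0, len(records), segment_size)]


def dispatch(instruction: str, handlers: Dict[str, Callable[[], str]]) -> str:
    """S3: judge which instruction was called and run the matching branch."""
    if instruction not in handlers:
        raise ValueError(f"unknown instruction: {instruction!r}")
    return handlers[instruction]()


segments = segment(["a", "b", "c", "d", "e"], segment_size=2)
# The handlers only report which step would run next (S4 or S8).
next_step = dispatch("clean", {"index": lambda: "S4", "clean": lambda: "S8"})
```

Rejecting an unknown instruction explicitly keeps the branch decision in S3 limited to the two cases the method names.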
Preferably, the method further includes a repository; the integrated data is stored in the repository.
Preferably, the repository comprises a plurality of child repositories.
Preferably, the device for intelligent indexing of data and automatic data cleaning is configured on industrial equipment in the form of a configuration file or a code block.
The invention also provides a system for intelligent indexing of data and automatic data cleaning, which applies the above method for intelligent indexing of data and automatic data cleaning and specifically comprises:
a download module, used for receiving an input data request command;
a grouping module, used for segmenting the data in the storage module so that commands can be executed on all segments simultaneously;
a calling module, used for calling the instruction;
a judging module, used for judging whether the called command is an indexing command or a cleaning command;
an indexing module, used for finding the corresponding indexed data in the data, extracting it and integrating it;
a cleaning module, used for performing data pre-cleaning and deep data cleaning on the data, deleting erroneous, duplicate and inconsistent data, and integrating the cleaned data;
and a display module, used for displaying the indexed data or the cleaned data.
Preferably, the system further comprises a storage module for storing the integrated data.
Preferably, the storage module comprises a plurality of sub-storage modules.
Preferably, the device for intelligent indexing of data and automatic data cleaning is configured on industrial equipment in the form of a configuration file or a code block.
The technical scheme of the invention has the following beneficial technical effects:
In the invention, a user downloads data by issuing a data request command, and the grouping module segments the downloaded data. The user then selects the command to be called, and the judging module judges which command was called. When the indexing command is selected, indexing keywords are input and searched for in every data segment at the same time, and the data found is integrated and displayed, which improves search efficiency. When the cleaning command is selected, all data segments are screened simultaneously; after duplicate, erroneous and inconsistent data are deleted, the segments are integrated and the integrated data is then deep-cleaned, which improves cleaning efficiency and accuracy.
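The two-stage cleaning described above can be sketched as follows. The patent does not define the cleaning rules, so this sketch assumes that pre-cleaning trims whitespace and drops empty and repeated records within a segment, and that deep cleaning removes duplicates that span segments. The names `dedupe`, `preclean`, and `clean` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def dedupe(records: Iterable[str]) -> List[str]:
    """Drop repeated records, keeping the first occurrence and the order."""
    return list(dict.fromkeys(records))


def preclean(segment: List[str]) -> List[str]:
    """Pre-clean one segment: trim whitespace, drop empty and repeated records."""
    trimmed = (record.strip() for record in segment)
    return dedupe(record for record in trimmed if record)


def clean(segments: List[List[str]]) -> List[str]:
    # Pre-clean all segments concurrently; map() preserves segment order.
    with ThreadPoolExecutor() as pool:
        precleaned = list(pool.map(preclean, segments))
    # Integrate the per-segment results into one list.
    integrated = [record for seg in precleaned for record in seg]
    # Deep cleaning: remove duplicates that span different segments.
    return dedupe(integrated)


result = clean([["a ", "a", ""], ["b", "a"], [" c", "b"]])
```

The second pass is needed because a per-segment pass cannot see a record that is repeated in another segment. A thread pool is used only to show the concurrent structure; for CPU-bound cleaning in CPython, a process pool may be the better fit.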
Drawings
Fig. 1 is a flowchart of a method for intelligent indexing of data and automatic data cleaning according to the present invention.
Fig. 2 is a system block diagram of a system for intelligent indexing of data and automatic data cleaning according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings in conjunction with the following detailed description. It should be understood that the description is intended to be exemplary only, and is not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.
Example 1
As shown in fig. 1, the method for intelligently indexing data and automatically cleaning data provided by the present invention includes the following specific steps:
S1, receiving a data input request;
S2, segmenting the received data to obtain a plurality of data segments;
S3, calling an instruction and judging whether the called instruction is an indexing instruction or a cleaning instruction: if it is an indexing instruction, executing S4; if it is a cleaning instruction, executing S8;
S4, inputting indexing keywords and judging whether keyword selection is finished: if yes, executing S5; if not, executing S4 again;
S5, applying each indexing keyword to each data segment in turn, indexing the data that matches the indexing rule, and executing S6;
S6, integrating and storing the indexing results of the segments in sequence;
S7, outputting and displaying the integrated result;
S8, data pre-cleaning: cleaning all segments simultaneously;
S9, integrating and storing the cleaning results of all segments;
S10, deep data cleaning: cleaning the integrated, pre-cleaned data again;
S11, outputting and displaying the data after deep data cleaning.
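The indexing branch (S5 to S7) might look like the following sketch. The patent does not define the indexing rule, so a plain substring match is assumed here; `index_segment`, `index_all`, and the sample records are hypothetical.

```python
from typing import Dict, List


def index_segment(segment: List[str], keyword: str) -> List[str]:
    """S5 for one segment: keep the records that contain the keyword."""
    return [record for record in segment if keyword in record]


def index_all(segments: List[List[str]],
              keywords: List[str]) -> Dict[str, List[str]]:
    """S5-S6: apply each keyword to each segment in turn and integrate the hits."""
    integrated: Dict[str, List[str]] = {keyword: [] for keyword in keywords}
    for keyword in keywords:
        for segment in segments:
            integrated[keyword].extend(index_segment(segment, keyword))
    return integrated


segments = [["pump fault", "valve ok"], ["pump ok", "motor fault"]]
hits = index_all(segments, ["pump", "fault"])
# S7: output and display the integrated result.
for keyword, records in hits.items():
    print(keyword, records)
```

Collecting hits per keyword, in segment order, keeps the integrated result traceable to the keyword that produced it.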
In an alternative embodiment, the method further includes a repository; the integrated data is stored in the repository.
In an alternative embodiment, the repository contains a plurality of child repositories, which facilitates storing multiple groups of integrated data.
In an alternative embodiment, the device for intelligent indexing of data and automatic data cleaning is configured on industrial equipment in the form of a configuration file or a code block; the user only needs to configure the device on the industrial equipment for the equipment end to clean the equipment data automatically, which reduces the difficulty of data cleaning and improves its efficiency.
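As an illustration of the configuration-file form, the sketch below loads a JSON configuration and validates it. The patent does not specify any configuration format, so the JSON layout and the field names `segment_size`, `instruction`, and `keywords` are purely hypothetical.

```python
import json

# Hypothetical configuration; none of these field names come from the patent.
CONFIG_TEXT = """
{
  "segment_size": 1000,
  "instruction": "clean",
  "keywords": []
}
"""


def load_config(text: str) -> dict:
    """Parse the configuration and check the fields this sketch relies on."""
    config = json.loads(text)
    if config["instruction"] not in ("index", "clean"):
        raise ValueError("instruction must be 'index' or 'clean'")
    if config["segment_size"] <= 0:
        raise ValueError("segment_size must be positive")
    return config


config = load_config(CONFIG_TEXT)
```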
Example 2
As shown in fig. 2, the system for intelligent indexing of data and automatic data cleaning provided by the present invention applies the method for intelligent indexing of data and automatic data cleaning of Example 1, and specifically comprises:
a download module, used for receiving an input data request command;
a grouping module, used for segmenting the data in the storage module so that commands can be executed on all segments simultaneously;
a calling module, used for calling the instruction;
a judging module, used for judging whether the called command is an indexing command or a cleaning command;
an indexing module, used for finding the corresponding indexed data in the data, extracting it and integrating it;
a cleaning module, used for performing data pre-cleaning and deep data cleaning on the data, deleting erroneous, duplicate and inconsistent data, and integrating the cleaned data;
and a display module, used for displaying the indexed data or the cleaned data.
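One possible way to wire these modules together is sketched below. It is a simplification under stated assumptions: the class `System` and all of its method names are hypothetical, indexing is a substring match, and a single de-duplication pass stands in for both pre-cleaning and deep cleaning.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class System:
    """Hypothetical wiring of the modules; all names are illustrative."""
    segment_size: int = 2
    displayed: List[str] = field(default_factory=list)

    def download(self, request: List[str]) -> List[str]:
        # Download module: receive the data named by the request.
        return list(request)

    def group(self, data: List[str]) -> List[List[str]]:
        # Grouping module: split the data into fixed-size segments.
        return [data[i:i + self.segment_size]
                for i in range(0, len(data), self.segment_size)]

    def index(self, segments: List[List[str]], keyword: str) -> List[str]:
        # Indexing module: extract and integrate records containing the keyword.
        return [r for seg in segments for r in seg if keyword in r]

    def clean(self, segments: List[List[str]]) -> List[str]:
        # Cleaning module: drop empty and repeated records, then integrate.
        merged = [r.strip() for seg in segments for r in seg if r.strip()]
        return list(dict.fromkeys(merged))

    def run(self, request: List[str], command: str, keyword: str = "") -> List[str]:
        segments = self.group(self.download(request))
        # Judging module: decide which branch the called command selects.
        if command == "index":
            result = self.index(segments, keyword)
        elif command == "clean":
            result = self.clean(segments)
        else:
            raise ValueError(f"unknown command: {command!r}")
        self.displayed = result  # Display module: stand-in for real output.
        return result


system = System()
indexed = system.run(["x1", "y1", "x2"], "index", keyword="x")
cleaned = system.run(["a", "a ", "", "b"], "clean")
```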
In an alternative embodiment, the system further comprises a storage module for storing the integrated data.
In an alternative embodiment, the storage module comprises a plurality of sub-storage modules, so that different groups of data can be received according to different data receiving requests.
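A storage module with sub-storage modules could be sketched as a mapping from a request identifier to its own list of records. The class `Storage`, its methods, and the request identifiers are hypothetical; the patent does not describe how sub-storage modules are selected.

```python
from collections import defaultdict
from typing import DefaultDict, List


class Storage:
    """Storage module holding one sub-storage list per data request."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[str]] = defaultdict(list)

    def store(self, request_id: str, records: List[str]) -> None:
        # Each request writes to its own sub-storage module.
        self._subs[request_id].extend(records)

    def load(self, request_id: str) -> List[str]:
        # Return a copy so callers cannot mutate stored data; unknown ids give [].
        return list(self._subs.get(request_id, []))


storage = Storage()
storage.store("request-1", ["a", "b"])
storage.store("request-2", ["c"])
```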
In an alternative embodiment, the device for intelligent indexing of data and automatic data cleaning is configured on industrial equipment in the form of a configuration file or a code block; the user only needs to configure the device on the industrial equipment for the equipment end to clean the equipment data automatically, which reduces the difficulty of data cleaning and improves its efficiency.
In the invention, a user downloads data by issuing a data request command, and the grouping module segments the downloaded data. The user then selects the command to be called, and the judging module judges which command was called. When the indexing command is selected, indexing keywords are input and searched for in every data segment at the same time, and the data found is integrated and displayed, which improves search efficiency. When the cleaning command is selected, all data segments are screened simultaneously; after duplicate, erroneous and inconsistent data are deleted, the segments are integrated and the integrated data is then deep-cleaned, which improves cleaning efficiency and accuracy.
It is to be understood that the above-described embodiments of the present invention merely illustrate or explain the principles of the invention and are not to be construed as limiting the invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention should be included in the protection scope of the present invention. Further, it is intended that the appended claims cover all such variations and modifications as fall within the scope and boundaries of the appended claims or the equivalents of such scope and boundaries.
Claims (8)
1. A method for intelligent indexing of data and automatic data cleaning is characterized by comprising the following specific steps:
S1, receiving a data input request;
S2, segmenting the received data to obtain a plurality of data segments;
S3, calling an instruction and judging whether the called instruction is an indexing instruction or a cleaning instruction: if it is an indexing instruction, executing S4; if it is a cleaning instruction, executing S8;
S4, inputting indexing keywords and judging whether keyword selection is finished: if yes, executing S5; if not, executing S4 again;
S5, applying each indexing keyword to each data segment in turn, indexing the data that matches the indexing rule, and executing S6;
S6, integrating and storing the indexing results of the segments in sequence;
S7, outputting and displaying the integrated result;
S8, data pre-cleaning: cleaning all segments simultaneously;
S9, integrating and storing the cleaning results of all segments;
S10, deep data cleaning: cleaning the integrated, pre-cleaned data again;
S11, outputting and displaying the data after deep data cleaning.
2. The method for intelligent indexing of data and automatic data cleaning according to claim 1, further comprising a repository, wherein the integrated data is stored in the repository.
3. The method of claim 2, wherein the repository comprises a plurality of sub-repositories.
4. The method for intelligent indexing of data and automatic data cleaning according to claim 1, wherein the apparatus for intelligent indexing of data and automatic data cleaning is configured on industrial equipment in the form of a configuration file or a code block.
5. A system for intelligent indexing of data and automatic data cleaning, applying the method for intelligent indexing of data and automatic data cleaning according to any one of claims 1-2, characterized by specifically comprising:
a download module, used for receiving an input data request command;
a grouping module, used for segmenting the data in the storage module so that commands can be executed on all segments simultaneously;
a calling module, used for calling the instruction;
a judging module, used for judging whether the called command is an indexing command or a cleaning command;
an indexing module, used for finding the corresponding indexed data in the data, extracting it and integrating it;
a cleaning module, used for performing data pre-cleaning and deep data cleaning on the data, deleting erroneous, duplicate and inconsistent data, and integrating the cleaned data;
and a display module, used for displaying the indexed data or the cleaned data.
6. The system for intelligent indexing of data and automatic data cleaning according to claim 5, further comprising a storage module for storing the integrated data.
7. The system of claim 5, wherein the storage module comprises a plurality of sub-storage modules.
8. The system for intelligent indexing of data and automatic data cleaning according to claim 5, wherein the apparatus for intelligent indexing of data and automatic data cleaning is configured on industrial equipment in the form of a configuration file or a code block.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110105878.1A CN112732701A (en) | 2021-01-26 | 2021-01-26 | Method and system for intelligent indexing of data and automatic data cleaning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110105878.1A CN112732701A (en) | 2021-01-26 | 2021-01-26 | Method and system for intelligent indexing of data and automatic data cleaning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112732701A (en) | 2021-04-30 |
Family
ID=75594058
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110105878.1A Pending CN112732701A (en) | 2021-01-26 | 2021-01-26 | Method and system for intelligent indexing of data and automatic data cleaning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112732701A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6611852B1 (en) * | 2000-09-29 | 2003-08-26 | Emc Corporation | System and method for cleaning a log structure |
CN106484915A (en) * | 2016-11-03 | 2017-03-08 | 国家电网公司信息通信分公司 | A kind of cleaning method of mass data and system |
CN110389950A (en) * | 2019-07-31 | 2019-10-29 | 南京安夏电子科技有限公司 | A kind of big data cleaning method quickly run |
CN110941957A (en) * | 2019-11-26 | 2020-03-31 | 交通运输部科学研究院 | Traffic science and technology data indexing method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210430 |