CN111949647A - Emergency management service data cleaning method, system, terminal and readable storage medium - Google Patents

Publication number
CN111949647A
Authority
CN
China
Prior art keywords
data
cleaning
cleaned
service
item
Legal status
Pending
Application number
CN202010916563.0A
Other languages
Chinese (zh)
Inventor
李莉
王志刚
周锟
罗敏
夏昕
郭宇
Current Assignee
Shenzhen Anyitong Science And Technology Dev Co ltd
Original Assignee
Shenzhen Anyitong Science And Technology Dev Co ltd
Application filed by Shenzhen Anyitong Science And Technology Dev Co ltd filed Critical Shenzhen Anyitong Science And Technology Dev Co ltd
Priority to CN202010916563.0A priority Critical patent/CN111949647A/en
Publication of CN111949647A publication Critical patent/CN111949647A/en
Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G06F16/248 Presentation of query results
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Abstract

The embodiments of the application provide an emergency management service data cleaning method, system, terminal and readable storage medium. The method comprises: when a data cleaning task is started, obtaining cleaning configuration data of an item to be cleaned, wherein the cleaning configuration data comprises a target service field and a corresponding service attribute; extracting corresponding emergency management service data from a database according to the target service field, importing the emergency management service data into a data warehouse, and performing data preprocessing to obtain data to be cleaned; determining a corresponding cleaning rule according to the service attribute and cleaning the data to be cleaned; and outputting a data cleaning result and displaying it visually. The technical solution of the application takes emergency management business requirements into account and sets corresponding cleaning rules for data with different service attributes, so that the cleaning content is richer and more comprehensive and the cleaning accuracy is higher; the results are also presented through a visual interface, which improves the user experience.

Description

Emergency management service data cleaning method, system, terminal and readable storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method, a system, a terminal, and a readable storage medium for cleaning emergency management service data.
Background
With the rapid development of informatization and intelligence in the emergency management industry, and with increasingly fine-grained requirements on business applications, the volume of data generated by emergency management departments and supervised objects at all levels has grown explosively. Faced with this volume, how to mine valuable information or knowledge from massive data to provide a reference for decision makers has become an important subject that cannot be ignored. Because of data entry errors, the merging or migration of data sources that use different representations, and similar causes, a system often contains redundant data, missing data, uncertain data, inconsistent data and the like. Such data is called dirty data, and it seriously affects the efficiency of data utilization and the quality of decisions.
Disclosure of Invention
In view of the above, an object of the present application is to overcome the deficiencies in the prior art and to provide an emergency management service data cleaning method, system, terminal and readable storage medium.
The embodiment of the application provides an emergency management service data cleaning method, comprising: when a data cleaning task is started, obtaining cleaning configuration data of an item to be cleaned according to the data cleaning task, wherein the cleaning configuration data comprises a target service field and a corresponding service attribute;
extracting corresponding emergency management service data from a database according to the target service field, importing the emergency management service data into a data warehouse, and performing data preprocessing to obtain data to be cleaned;
determining a corresponding cleaning rule according to the service attribute and carrying out data cleaning on the data to be cleaned based on the cleaning rule to obtain a data cleaning result;
and visually displaying the data cleaning result.
In one embodiment, the cleaning configuration data includes a configured first data interface, and the extracting, according to the target service field, corresponding emergency management service data from a database, and importing the emergency management service data into a data warehouse and performing data preprocessing to obtain data to be cleaned includes:
extracting emergency management service data matched with the target service field from a database through the first data interface;
importing the extracted service data into a specified data warehouse through a preset format file;
and performing preprocessing comprising data extraction, conversion and/or merging on the service data in the data warehouse to obtain the data to be cleaned.
In one embodiment, the method for cleaning emergency management service data further includes:
before the data cleaning is carried out, counting the total number of the data to be cleaned, and estimating the cleaning time of the data cleaning task according to the total number;
and after the data cleaning is carried out, counting the number of marked abnormal data in the data cleaning result, associating each abnormal data with corresponding business data, and displaying the associated business data on a visual interface in a link mode.
In one embodiment, the types of the service attribute include a non-duplicate item, a classification management specification item, a non-filling exception item, a non-check exception item, a non-empty item and a time limit early warning item, and the determining the corresponding cleaning rule according to the service attribute includes:
if the service attribute is the non-duplicate item, performing data similarity calculation on the data to be cleaned, and if data with a similarity greater than or equal to a preset similarity threshold exists, marking the data as suspected duplicate data;
if the service attribute is the classification management specification item, acquiring actual classification management information corresponding to an enterprise and performing consistency judgment between the actual classification management information and the data to be cleaned, and if inconsistent data exists, recording the inconsistent data as non-standard data;
if the service attribute is the non-filling exception item, judging whether the data to be cleaned meets a preset filling rule corresponding to the target service field, and if data that does not meet the preset filling rule exists, recording that data as filling exception data;
if the service attribute is the non-check exception item, judging whether the data to be cleaned meets a preset logic or a preset threshold, and if data that does not meet the preset logic or the preset threshold exists, recording that data as logic abnormal data;
if the service attribute is the non-empty item, judging whether the data to be cleaned is vacant or invalid, and if vacant or invalid data exists, recording that data as logic abnormal data;
if the service attribute is the time limit early warning item, judging whether the data to be cleaned has a time limit that has expired or has not been updated within a preset time period, and if so, recording the data as time abnormal data.
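As an illustration only, the non-empty check and the time limit early warning check described above might be sketched as follows. The placeholder strings, the 365-day update window and all function and parameter names are assumptions introduced for this sketch and do not come from the application.

```python
from datetime import date, timedelta

# Assumed markers of an "invalid" value; the application does not list them.
INVALID_PLACEHOLDERS = {"", "null", "none", "n/a"}


def is_vacant_or_invalid(value):
    """Non-empty item: True if the value is missing or an assumed placeholder."""
    return value is None or str(value).strip().lower() in INVALID_PLACEHOLDERS


def is_time_abnormal(expiry, last_updated, today, max_age_days=365):
    """Time limit early warning item: expired, or not updated within the window."""
    expired = expiry is not None and expiry < today
    stale = (last_updated is not None
             and today - last_updated > timedelta(days=max_age_days))
    return expired or stale


today = date(2020, 9, 3)
license_expired = is_time_abnormal(date(2020, 6, 30), date(2020, 8, 1), today)
record_stale = is_time_abnormal(None, date(2018, 1, 1), today)
record_current = is_time_abnormal(date(2021, 1, 1), date(2020, 8, 1), today)
```

Passing `today` explicitly keeps the check reproducible when a cleaning task is re-run on archived data.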
In one embodiment, the data cleaning task includes at least one item to be cleaned, and each item to be cleaned is configured with a corresponding service attribute;
wherein the items to be cleaned belonging to the non-duplicate item comprise: duplication of unique identification information of an enterprise and duplication of secure login information; the items to be cleaned belonging to the classification management specification item comprise: enterprise classification errors and inconsistent grid-inclusion management information of an enterprise; the items to be cleaned belonging to the non-filling exception item comprise: filling exceptions in enterprise basic information and filling exceptions in emergency special information; the items to be cleaned belonging to the non-check exception item comprise: logic exceptions in enterprise safety production management related information and logic exceptions in enterprise basic information; the items to be cleaned belonging to the non-empty item comprise: missing enterprise basic information and missing safety production management information; the items to be cleaned belonging to the time limit early warning item comprise: expiry of enterprise licenses and qualifications, and overdue updates of safety production related information.
In one embodiment, the cleansing configuration data further includes a configured second data interface, and the outputting and visually displaying the data cleansing result includes:
and importing the output data cleaning result into a corresponding cleaning category page through the second data interface for visual display, wherein the types of cleaning category correspond to the types of service attribute, and the cleaning categories comprise duplicate data cleaning, non-standard business data cleaning, filling exception data cleaning, missing data cleaning, check exception data cleaning and time check data cleaning.
In one embodiment, the method for cleaning emergency management service data further includes:
before the data cleaning task is started, synchronizing related service data in a preset emergency management range from each distributed source database into a local database.
In one embodiment, the method for cleaning emergency management service data further includes:
and after the data cleaning task is started, acquiring corresponding emergency management service data from each distributed source database in real time according to the target service field.
An embodiment of the present application further provides an emergency management service data cleaning system, including:
the task configuration module is used for providing a data cleaning configuration interface and generating a data cleaning task when cleaning configuration data input by a user in the data cleaning configuration interface is received;
the task cleaning module is used for acquiring cleaning configuration data of an item to be cleaned according to the data cleaning task when the data cleaning task is started, wherein the cleaning configuration data comprises a target service field and a corresponding cleaning rule; extracting corresponding emergency management service data from a database according to the target service field, importing the emergency management service data into a data warehouse, and performing data preprocessing to obtain data to be cleaned; performing data cleaning on the data to be cleaned according to the corresponding cleaning rule to obtain a data cleaning result;
and the visual display module is used for visually displaying the data cleaning result.
An embodiment of the present application further provides a terminal, where the terminal includes a processor and a memory, where the memory stores a computer program, and the processor is configured to execute the computer program to implement the method for cleaning emergency management service data.
Embodiments of the present application also provide a readable storage medium storing a computer program, which when executed, implements the above-mentioned emergency management service data cleansing method.
The embodiment of the application has the following advantages:
the technical solution of the application takes emergency management business requirements into account and sets corresponding cleaning rules for data with different service attributes, so that the cleaning content is richer and more comprehensive and the cleaning accuracy is higher; the results are also presented through a visual interface, which improves the user experience.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 shows a first flowchart of the emergency management service data cleaning method according to an embodiment of the present application;
Fig. 2 shows a second flowchart of the emergency management service data cleaning method according to an embodiment of the present application;
Fig. 3 shows a schematic diagram of the relationship between service attributes and items to be cleaned in the emergency management service data cleaning method according to an embodiment of the present application;
Fig. 4 shows a third flowchart of the emergency management service data cleaning method according to an embodiment of the present application;
Fig. 5 shows a schematic structural diagram of the emergency management service data cleaning system according to an embodiment of the present application;
Fig. 6 shows a schematic diagram of a cleaning configuration interface of the emergency management service data cleaning system according to an embodiment of the present application;
Fig. 7 shows a schematic diagram of a cleaning result display interface of the emergency management service data cleaning system according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments.
The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
Hereinafter, the terms "including", "having", and their derivatives, which may be used in various embodiments of the present application, are intended only to indicate specific features, numbers, steps, operations, elements, components, or combinations of the foregoing, and should not be construed as excluding the existence or possible addition of one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the various embodiments of the present application belong. The terms (such as those defined in commonly used dictionaries) should be interpreted as having a meaning that is consistent with their contextual meaning in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in various embodiments.
Example 1
Referring to Fig. 1, this embodiment provides an emergency management service data cleaning method for cleaning service data. The method can improve the accuracy of data cleaning and makes the cleaning content more comprehensive. The method is described in detail below.
S110, when the data cleaning task is started, cleaning configuration data of an item to be cleaned is obtained according to the data cleaning task, wherein the cleaning configuration data comprises a target service field and a corresponding service attribute.
For example, the data cleaning task includes at least one item to be cleaned, and each item to be cleaned is configured with a corresponding service attribute. The service attribute is used to determine the type of cleaning rule applied to the data to be cleaned. In one embodiment, the types of service attribute may include, but are not limited to, non-duplicate items, classification management specification items, non-filling exception items, non-check exception items, non-empty items, time limit early warning items, and the like. Different service data usually carry different requirements because the underlying business needs differ: some data must not be left unfilled, some data must not duplicate the data of other enterprises, and so on. It can be understood that the service attributes are divided mainly according to the characteristics of the different types of service data.
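The relationship between a cleaning task, its items to be cleaned, their target service fields and their service attributes can be sketched as a small data model. This is a minimal sketch under stated assumptions: the class names, field names and example items are hypothetical and are not defined by the application.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class ServiceAttribute(Enum):
    """Service attribute types named in this embodiment."""
    NON_DUPLICATE = "non-duplicate item"
    CLASSIFICATION_SPEC = "classification management specification item"
    NON_FILLING_EXCEPTION = "non-filling exception item"
    NON_CHECK_EXCEPTION = "non-check exception item"
    NON_EMPTY = "non-empty item"
    TIME_LIMIT_WARNING = "time limit early warning item"


@dataclass(frozen=True)
class CleaningItem:
    """One item to be cleaned (hypothetical structure)."""
    name: str
    target_fields: Tuple[str, ...]
    attribute: ServiceAttribute


@dataclass
class CleaningTask:
    """A data cleaning task holding at least one item to be cleaned."""
    items: List[CleaningItem]

    def configuration(self):
        """Return (target fields, service attribute) per item, as read in S110."""
        return [(item.target_fields, item.attribute) for item in self.items]


task = CleaningTask(items=[
    CleaningItem("duplicate unified social credit code",
                 ("unified_social_credit_code",),
                 ServiceAttribute.NON_DUPLICATE),
    CleaningItem("establishment time outside business term",
                 ("establishment_time", "business_term"),
                 ServiceAttribute.NON_CHECK_EXCEPTION),
])
```

Keeping the attribute as an enumeration makes it straightforward to select one cleaning rule per attribute type later in the pipeline.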
S120, extracting corresponding emergency management service data from the database according to the target service field, importing the emergency management service data into a data warehouse, and performing data preprocessing to obtain data to be cleaned.
For example, the cleaning configuration data includes a first data interface configured by a user, where the first data interface is mainly used to obtain the corresponding service data for data cleaning. In one embodiment, as shown in Fig. 2, step S120 mainly includes the following sub-steps:
S121, calling the first data interface to extract the emergency management service data matched with the target service field from the database.
S122, importing the extracted service data into a specified data warehouse through a preset format file.
S123, performing preprocessing comprising data extraction, conversion and/or merging on the service data in the data warehouse to obtain the data to be cleaned.
For example, the required emergency management service data may be extracted from a designated database through a big data interface or the like. When the item to be cleaned is "duplicate unified social credit code of an enterprise", the target service field is the unified social credit code, and the emergency management service data is accordingly the unified social credit code data registered by each enterprise on different platforms or systems. It is understood that one item to be cleaned may involve one or more target service fields. For example, when the item to be cleaned is "enterprise establishment time exceeds the business term range", there are two target service fields: the enterprise establishment time and the business term. The database may be a local database, or may be a source database in a platform providing emergency management, which is not limited herein.
In one embodiment, the extracted service data may be imported into a designated data warehouse, such as a Hive data warehouse, through a configured JSON file. Of course, the data may be imported by a file in another format, and is not limited herein.
For the business data imported into the data warehouse, before formal cleaning, the business data may be further preprocessed, for example, by merging the same type of business data, converting the business data in different data formats, extracting part of the data, and the like. It will be appreciated that the preprocessing operation is primarily used to provide corresponding data to be cleaned for subsequent cleaning operations.
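The preprocessing described above (extracting part of the data, converting differing formats, merging records of the same type) can be illustrated with plain Python. This sketch does not use Hive; the JSON payload, the two date formats and the field names are assumptions made only for the example.

```python
import json
from datetime import datetime

# Hypothetical export from two source platforms, serialized as JSON.
RAW_JSON = """
[{"source": "platform_a", "enterprise_name": " Acme Co. ",
  "establishment_time": "2015-03-01", "note": "x"},
 {"source": "platform_b", "enterprise_name": "Bolt Ltd",
  "establishment_time": "2018/07/15", "note": "y"}]
"""

DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d")  # assumed input formats


def normalize_date(text):
    """Convert a date string in any assumed format to ISO 8601, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def preprocess(records, target_fields):
    """Extract the target fields, convert formats, and merge into one list."""
    cleaned = []
    for record in records:
        row = {field: record.get(field) for field in target_fields}
        if isinstance(row.get("enterprise_name"), str):
            row["enterprise_name"] = row["enterprise_name"].strip()
        if row.get("establishment_time") is not None:
            row["establishment_time"] = normalize_date(row["establishment_time"])
        cleaned.append(row)
    return cleaned


to_clean = preprocess(json.loads(RAW_JSON),
                      ["enterprise_name", "establishment_time"])
```

Dates that match none of the assumed formats become `None` rather than raising, so that a later non-empty or filling check can flag them.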
In this embodiment, the service data can be acquired in two ways: synchronization in advance, before the emergency management service data is extracted, and real-time synchronization. Generally, when the volume of service data is large, the data can be acquired by synchronization in advance; when the volume is small, the data can be acquired by real-time synchronization. Of course, either mode can be selected according to actual requirements; for example, data of small volume can also be obtained by synchronization in advance.
In one embodiment, the relevant service data within the preset emergency management scope can be synchronized from each distributed source database to the local database. For example, the service data may be obtained from the source databases of a plurality of distributed platforms, such as a social security bureau, a business bureau, a national tax bureau, and the like. In addition, because the data on the other platforms may be updated, periodic synchronization can be used to keep the data in the local database up to date. It can be understood that, because the system synchronizes the relevant service data from other platforms before the data cleaning task is started, the synchronized data can be used directly for data cleaning once the task starts, which can greatly shorten the overall cleaning time. In another embodiment, when the volume of service data is small, the system may instead obtain the corresponding emergency management service data from each distributed source database in real time, according to the corresponding target service field, after the data cleaning task is started.
S130, determining a corresponding cleaning rule according to the service attribute and performing data cleaning on the data to be cleaned based on the cleaning rule to obtain a data cleaning result.
For example, determining the corresponding cleaning rule according to the service attribute in step S130 includes the following cases.
In a first case, if the service attribute is a non-duplicate item, data similarity calculation is performed on the data to be cleaned, and if data with a similarity greater than or equal to a preset similarity threshold exists, the data is marked as suspected duplicate data. It is understood that non-duplicate items mainly concern service data that must not be identical. For example, the enterprise names of two different enterprises are usually different; if they are the same, the two enterprise name records are abnormal and therefore need to be flagged for the user's attention.
The cleaning rule corresponding to the non-duplicate item is mainly based on similarity calculation: when the similarity of two pieces of data is greater than or equal to the preset similarity threshold, they are judged to be suspected duplicate data and marked. It is understood that the preset similarity threshold needs to be determined according to the corresponding service data. For information that uniquely identifies an enterprise, such as a mobile phone number or a unified social credit code, the preset similarity threshold is usually 100%.
Of course, the user may also adjust the preset similarity threshold for specific occasions. For example, if the item to be cleaned is "enterprises with similar names", that is, a query for how many enterprises have similar names, the preset similarity threshold may be lowered, which is not limited herein. For this item, a big data word segmentation retrieval tool can be used to query how many similar names exist among all registered enterprises.
For example, as shown in Fig. 3, among the plurality of items to be cleaned, the items applicable to the non-duplicate item may include, but are not limited to: duplication of unique identification information of an enterprise, duplication of secure login information, and the like. The duplication of unique identification information of an enterprise may specifically include, but is not limited to, similar or duplicate enterprise names, duplicate production and operation addresses, duplicate telephone numbers of enterprise contacts, duplicate registration addresses and the like; the duplication of secure login information may include, but is not limited to, duplicate login passwords and the like.
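The threshold-based duplicate marking of the first case can be sketched with the Python standard library. The application does not name a similarity measure; `difflib.SequenceMatcher` is swapped in here purely for illustration, and the sample codes and enterprise names are invented.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Similarity in [0, 1]; SequenceMatcher is an assumed stand-in measure."""
    return SequenceMatcher(None, a, b).ratio()


def find_suspected_duplicates(values, threshold=1.0):
    """Return index pairs whose similarity is >= the preset threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(values), 2):
        if similarity(a, b) >= threshold:
            pairs.append((i, j))
    return pairs


# Unique identifiers: threshold of 100%, so only exact matches are marked.
codes = ["91440300MA5F000001", "91440300MA5F000002", "91440300MA5F000001"]
exact = find_suspected_duplicates(codes)

# Similar-name query: the user lowers the threshold.
names = ["Shenzhen Acme Technology Co.",
         "Shenzhen Acme Technologies Co.",
         "Bolt Ltd"]
similar = find_suspected_duplicates(names, threshold=0.9)
```

The pairwise comparison is quadratic in the number of records, so a large data set would need blocking or an index (for instance the word segmentation retrieval tool mentioned above) before comparing candidates.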
In a second case, if the service attribute is a classification management specification item, the actual classification management information corresponding to the enterprise is acquired and checked for consistency against the data to be cleaned, and if inconsistent data exists, the inconsistent data is recorded as non-standard data. It can be understood that the classification management specification item mainly concerns service data that emergency management requires to be filled in according to a specification.
The cleaning rule corresponding to the classification management specification item mainly checks the consistency between the actual classification management information and the classification management service data to be cleaned. For example, for enterprises, gas stations and the like that are wrongly classified as three-small places or pure office places, it can be determined whether the enterprise name contains corresponding characters such as "company" or "gas station"; alternatively, hospitals, schools, manufacturing enterprises and the like can be judged according to their "industry administration classification".
For example, the items to be cleaned applicable to the classification management specification item may include, but are not limited to: enterprise classification errors, inconsistent grid-inclusion management information of an enterprise, and the like. The enterprise classification errors may include a gas station, a hospital, a school, a company or a manufacturing enterprise being classified as a three-small place or a pure office place, and the like. The inconsistent grid-inclusion management information may include: the actual address of an enterprise included in a grid (such as the address shown on Baidu Maps) is not within the range of that grid; the production and operation address of an enterprise not included in a grid is inconsistent with its positioning address; and an enterprise managed by the emergency management bureau has not been included in a grid. The "inconsistency" may cover two cases: the enterprise's "map location" information is not within the street given by the "administrative area" in the enterprise information, or the enterprise's "detailed address" does not match the Baidu positioning result or is not within the grid.
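The name-based classification check can be sketched as a keyword lookup. The keyword list, the class labels and the record fields below are hypothetical; a real deployment would compare against the actual classification management information rather than a fixed keyword set.

```python
# Keywords the text mentions checking for in enterprise names (illustrative).
NAME_KEYWORDS = ("company", "gas station")

# Classes that such enterprises should not be recorded under.
RESTRICTED_CLASSES = {"three-small place", "pure office place"}


def find_misclassified(records):
    """Return records whose name suggests a class other than the recorded one."""
    nonstandard = []
    for record in records:
        if record["recorded_class"] not in RESTRICTED_CLASSES:
            continue
        name = record["enterprise_name"].lower()
        if any(keyword in name for keyword in NAME_KEYWORDS):
            nonstandard.append(record)
    return nonstandard


records = [
    {"enterprise_name": "Sunrise Gas Station", "recorded_class": "pure office place"},
    {"enterprise_name": "Corner Noodle Shop", "recorded_class": "three-small place"},
    {"enterprise_name": "Harbor Trading Company", "recorded_class": "company"},
]
flagged = find_misclassified(records)
```

Only the first record is flagged: the second has no matching keyword, and the third is not recorded under a restricted class.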
In a third case, if the service attribute is a non-filling abnormal item, determining whether the data to be cleaned meets a preset filling rule corresponding to the target service field, and if the data does not meet the preset filling rule, recording the data which does not meet the preset filling rule as filling abnormal data.
And for the cleaning rule corresponding to the non-filled abnormal item, mainly judging whether the filling rule corresponding to the target service field is met. For example, for the address class, it can be determined whether characters such as "xx City xx district" are included, and if not, it is determined that the report is abnormal; alternatively, a total number of characters less than or equal to 5 characters may be considered as a filled-in exception or the like. For the unit and the name, whether a number class, an English letter class or a character less than or equal to 5 exists can be judged, and the unit and the name are all regarded as abnormal data. For the ID card number, whether the ID card information appears can be judged: (ii) 3 and more than 3 digits are repeated 2 and more than 2 times in succession, e.g. 421232323001200677; (ii) there are more than 3 and 3 consecutive increasing or decreasing natural numbers, such as 421232323001200677; (iii) there are more than 5 repeated numbers, such as 421023199001288888; fourthly, the number 18 digits are incomplete, such as 4210231990012; the gender and the penultimate of the identification number are not consistent, and if the penultimate is any one or more of males and females, the gender and the identification number can be judged as abnormal data. For the mobile phone number in the enterprise information, whether the mobile phone number appears can be judged: (ii) 2 and more than 2 digits are repeated consecutively 2 and more than 2 times, e.g. 13838226722; (ii) there are more than 5 and 5 consecutive increasing or decreasing natural numbers, such as 13223456789; third, repeat number of words is 5 or more, such as 13288888888; fourthly, the number 11 is incomplete, for example 13223456, if one or more than one of the numbers are present, the number is judged to be abnormal data.
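The digit-pattern rules for mobile phone numbers can be sketched with regular expressions. The sketch reads "digits repeated consecutively" as a group of digits immediately followed by a copy of itself, which matches the example 13838226722 but is an interpretation, not something the application states; the helper names are hypothetical.

```python
import re


def has_repeated_group(digits, group_len, times):
    """True if a group of >= group_len digits repeats >= times in a row."""
    pattern = r"(\d{%d,})\1{%d,}" % (group_len, times - 1)
    return re.search(pattern, digits) is not None


def has_run(digits, length):
    """True if >= length consecutive digits each step by +1, or each by -1."""
    for step in (1, -1):
        run = 1
        for prev, cur in zip(digits, digits[1:]):
            run = run + 1 if int(cur) - int(prev) == step else 1
            if run >= length:
                return True
    return False


def has_same_digit(digits, times):
    """True if one digit appears >= times in a row."""
    return re.search(r"(\d)\1{%d,}" % (times - 1), digits) is not None


def phone_anomalies(number):
    """Return the list of filling rules that an 11-digit phone number violates."""
    reasons = []
    if not re.fullmatch(r"\d{11}", number):
        reasons.append("incomplete")
    if has_repeated_group(number, 2, 2):
        reasons.append("repeated group")
    if has_run(number, 5):
        reasons.append("sequential run")
    if has_same_digit(number, 5):
        reasons.append("same digit")
    return reasons


examples = {n: phone_anomalies(n)
            for n in ("13838226722", "13223456789", "13288888888", "13223456")}
```

The same helpers can be reused for ID card numbers with the thresholds given in the text, for example `has_run(id_number, 3)` and `has_same_digit(id_number, 5)`. A number may violate several rules at once, which is why the function returns a list.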
Exemplary items to be cleaned for non-filling abnormal items may include, but are not limited to: abnormal reporting of enterprise basic information, abnormal reporting of emergency special information, and the like. Abnormal reporting of enterprise basic information may include: an abnormal "registration address" entry in the enterprise information, an abnormal "production and operation address" entry, an abnormal mobile phone number of the enterprise contact person, an abnormal ID card number of the enterprise legal person, and the like. Abnormal reporting of emergency special information may include: an abnormal "warehouse address" entry for emergency materials, an abnormal "storage address" entry for emergency materials, an abnormal "detailed address" entry for public risk points, and the like.
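The mobile phone number rules described for the third case can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the function names and returned labels are hypothetical, and "5 or more repeated digits" is assumed to mean a run of one and the same digit.

```python
import re


def has_repeated_group(digits, group_len=2, times=2):
    """True if a group of at least `group_len` digits occurs `times` or more times in a row."""
    pattern = r"(\d{%d,})\1{%d,}" % (group_len, times - 1)
    return re.search(pattern, digits) is not None


def has_monotonic_run(digits, run_len=5):
    """True if `run_len` or more consecutive digits each step up (or down) by exactly 1."""
    for step in (1, -1):
        run = 1
        for prev, cur in zip(digits, digits[1:]):
            run = run + 1 if int(cur) - int(prev) == step else 1
            if run >= run_len:
                return True
    return False


def has_same_digit_run(digits, run_len=5):
    """True if one digit appears `run_len` or more times in a row (assumed reading of rule iii)."""
    return re.search(r"(\d)\1{%d,}" % (run_len - 1), digits) is not None


def phone_filling_problems(value, expected_len=11):
    """Return labels of the filling rules that `value` violates; an empty list means no anomaly."""
    digits = value.strip()
    if not digits.isdigit():
        return ["not_numeric"]
    problems = []
    if len(digits) < expected_len:
        problems.append("incomplete")
    if has_repeated_group(digits):
        problems.append("repeated_group")
    if has_monotonic_run(digits):
        problems.append("monotonic_run")
    if has_same_digit_run(digits):
        problems.append("same_digit_run")
    return problems
```

The thresholds are parameters, so the variants given for ID card numbers could in principle reuse the same helpers with other values; handling of non-digit characters in ID numbers is not covered by this sketch.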
In a fourth case, if the service attribute is a non-check abnormal item, it is determined whether the data to be cleaned satisfies a preset logic or a preset threshold, and if there is unsatisfied data, the unsatisfied data is recorded as logic abnormal data.
For the cleaning rule of a non-check abnormal item, the user can check certain key data or information, mainly to judge whether it satisfies a preset logic or a preset threshold. Data that exceeds the preset threshold or falls outside the preset range is judged to be abnormal. For example, if the item to be cleaned is "the number of safety managers (persons) is 0", a record is judged to be abnormal when the value of that item is 0.
Exemplary items to be cleaned for non-check abnormal items may include, but are not limited to: logic abnormalities in enterprise safety production related information, logic abnormalities in enterprise basic information, and the like. Logic abnormalities in enterprise basic information may include: the industry classification and the industry supervising authority are inconsistent, the establishment time of the enterprise falls outside the range of its business term, and the like. Logic abnormalities in safety production related information may include: the number of safety managers is less than the number of certificate holders, the number of safety managers is "0", the number of persons for whom insurance should be paid is less than the number actually paid, and the like.
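As an illustration of the preset-logic checks listed for the fourth case, the following Python sketch expresses each check as a labeled predicate. The field names (`safety_managers`, `certificate_holders`, `insured_due`, `insured_paid`) and the labels are hypothetical and chosen only for this example.

```python
# Each rule is (label, predicate); the predicate returns True when the record is consistent.
# All field names here are hypothetical.
LOGIC_RULES = [
    ("safety_managers_is_zero", lambda r: r["safety_managers"] > 0),
    ("fewer_managers_than_certificate_holders",
     lambda r: r["safety_managers"] >= r["certificate_holders"]),
    ("insured_due_less_than_insured_paid",
     lambda r: r["insured_due"] >= r["insured_paid"]),
]


def logic_anomalies(record, rules=LOGIC_RULES):
    """Return the labels of all rules that the record fails."""
    return [label for label, is_consistent in rules if not is_consistent(record)]
```

A record that fails no rule yields an empty list; any returned label would mark the record as logic abnormal data.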
In a fifth case, if the service attribute is a non-empty item, it is determined whether the data to be cleaned is empty or invalid, and if there is empty or invalid data, the empty or invalid data is recorded as logic abnormal data.
For the cleaning rule of a non-empty item, if the data corresponding to the target service field is empty or invalid, the target service field is judged to be abnormally missing. Exemplary items to be cleaned for non-empty items may include, but are not limited to: missing enterprise basic information, missing safety production management information, and the like. Missing enterprise basic information may include: missing enterprise name, missing enterprise registration address, missing hazardous enterprise license number, missing unified social credit code, missing production and operation address, missing administrative area, missing enterprise affiliation, and the like. Missing safety production management information may include: missing information in the hazardous equipment special information table, missing information on limited space operation conditions, missing information in the hazardous chemicals special information table, and the like.
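A minimal Python sketch of the empty-or-invalid check for the fifth case follows. The set of placeholder strings treated as invalid and the field names in the usage are assumptions made for illustration; the application does not enumerate them.

```python
# Placeholder strings treated as "invalid" in this sketch (an assumption, not from the source).
INVALID_PLACEHOLDERS = {"", "null", "none", "n/a", "-"}


def is_missing(value):
    """True if the value is absent, blank, or one of the assumed invalid placeholders."""
    if value is None:
        return True
    return str(value).strip().lower() in INVALID_PLACEHOLDERS


def missing_fields(record, target_fields):
    """Return the target service fields whose data is empty or invalid in this record."""
    return [name for name in target_fields if is_missing(record.get(name))]
```

For instance, a record with a blank `enterprise_name` and no `administrative_area` key would be reported as missing both fields.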
In a sixth situation, if the service attribute is a time limit early warning item, it is determined whether the data to be cleaned has expired or is not updated within a preset time period, and if the data has expired or is not updated within the preset time period, the data is recorded as time abnormal data.
The cleaning rule of the time limit early warning item is judged mainly by time, for example, whether important information has not been updated in time or a related qualification certificate has expired. Exemplary items to be cleaned for the time limit early warning item may include, but are not limited to: expiration of enterprise license qualifications, overdue updating of safety production related information, and the like. Expiration of license qualifications may include: enterprises whose licenses have expired, enterprises whose safety manager certificates have expired, and the like. Overdue information updating may include: enterprise basic information not updated beyond the preset time, self-inspection and self-reporting not carried out for more than half a year, and the like.
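The two time-based judgments for the sixth case (expired, or not updated within a preset period) might be sketched in Python as below. The field names `license_expiry` and `last_updated` are hypothetical, and "half a year" is approximated as 183 days for this example only.

```python
from datetime import date, timedelta


def time_limit_problems(record, today, max_age=timedelta(days=183)):
    """Return time anomalies for a record: an expired license and/or an overdue update."""
    problems = []
    expiry = record.get("license_expiry")
    if expiry is not None and expiry < today:
        problems.append("expired")
    updated = record.get("last_updated")
    # A record that was never updated is also treated as overdue in this sketch.
    if updated is None or today - updated > max_age:
        problems.append("update_overdue")
    return problems
```

Passing `today` explicitly keeps the check reproducible; a scheduled cleaning task could supply `date.today()`.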
It can be understood that for these situations, by dividing the data into the above several service attributes according to different service requirements, different cleaning rules can be applied to different types of service data, and the cleaning accuracy can be improved.
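One way to apply a different cleaning rule per service attribute is a small registry keyed by attribute name. The Python sketch below is an assumption about how such dispatch could look; the attribute keys, rule functions, and labels are all hypothetical, and only two attributes are registered to keep it short.

```python
RULES = {}  # maps a service attribute name to its cleaning rule function


def cleaning_rule(service_attribute):
    """Decorator that registers a rule function under a service attribute name."""
    def register(func):
        RULES[service_attribute] = func
        return func
    return register


@cleaning_rule("non_empty")
def flag_missing(record, field):
    value = record.get(field)
    return "missing data" if value is None or str(value).strip() == "" else None


@cleaning_rule("non_check_abnormal")
def flag_below_minimum(record, field, minimum=1):
    return "logic abnormal data" if record.get(field, 0) < minimum else None


def clean(records, service_attribute, field, **options):
    """Apply the rule registered for `service_attribute`; return (record, label) for flagged rows."""
    rule = RULES[service_attribute]
    flagged = []
    for record in records:
        label = rule(record, field, **options)
        if label:
            flagged.append((record, label))
    return flagged
```

Registering a rule for a newly defined service attribute then requires only one more decorated function.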
Step S140: visually displaying the data cleaning result.
Exemplarily, the cleaning configuration data further includes a configured second data interface, and the second data interface may be configured to import the output data cleaning result to a corresponding page for display. In an embodiment, for step S140, the system may import the output data cleansing result into the corresponding cleansing large-class page through the configured second data interface for visual presentation.
The types of the cleaning large classes correspond to the types of the service attributes. It can be understood that the cleaning large classes mainly serve to distinguish or identify the different types of data cleaning, so that users can conveniently manage and query the various data. For the types of service attributes described above, the cleaning large classes illustratively include: repeated data cleaning, service irregular data cleaning, filling abnormal data cleaning, missing data cleaning, check abnormal data cleaning, and time check data cleaning. Of course, users may also define new service attributes according to their actual requirements, which is not limited herein.
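The correspondence between service attributes and cleaning large classes can be written as a simple lookup table. In the Python sketch below, the dictionary keys and the shape of the result items (`service_attribute`, `item_name`) are hypothetical; the class names are the ones listed in the text.

```python
from collections import defaultdict

# Service attribute (hypothetical key) -> cleaning large class shown as a page.
CLEANING_CLASS = {
    "non_duplicate": "repeated data cleaning",
    "classification_management_standard": "service irregular data cleaning",
    "non_filling_abnormal": "filling abnormal data cleaning",
    "non_empty": "missing data cleaning",
    "non_check_abnormal": "check abnormal data cleaning",
    "time_limit_warning": "time check data cleaning",
}


def group_by_cleaning_class(results):
    """Group cleaned item names by the page (cleaning large class) they belong on."""
    pages = defaultdict(list)
    for item in results:
        pages[CLEANING_CLASS[item["service_attribute"]]].append(item["item_name"])
    return dict(pages)
```

A user-defined service attribute would need one more entry in the table before its results could be routed to a page.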
In an alternative embodiment, as shown in fig. 4, before performing data cleaning, the method further includes step S150 of counting the total amount of data to be cleaned, and estimating the cleaning time of the data cleaning task according to the total amount. Optionally, after the data cleaning, the method further includes step S160 of counting the number of the marked abnormal data in the data cleaning result, where each abnormal data is associated with corresponding business data, and the associated business data can be displayed on a visual interface in a link manner.
It is to be understood that the exact positions of step S150 and step S160 in the execution sequence are not limited, as long as step S150 is performed before the data cleaning formally begins and step S160 is performed after the data cleaning is completed.
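Steps S150 and S160 can be illustrated with the short Python sketch below. The application does not state how the cleaning time is estimated, so a fixed throughput (`records_per_second`) is assumed here; the result fields and the link prefix are likewise hypothetical.

```python
import math


def estimate_cleaning_seconds(total_records, records_per_second=5000.0):
    """S150 (sketch): estimate cleaning time from the total count and an assumed throughput."""
    if records_per_second <= 0:
        raise ValueError("records_per_second must be positive")
    return math.ceil(total_records / records_per_second)


def summarize_anomalies(results, link_prefix="/business-data/"):
    """S160 (sketch): count marked abnormal records and build a link to each one's business data."""
    flagged = [r for r in results if r["abnormal"]]
    return {
        "abnormal_count": len(flagged),
        "links": [link_prefix + str(r["business_id"]) for r in flagged],
    }
```

In practice the throughput would probably be measured from earlier runs of the same cleaning mode rather than fixed.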
The emergency management service data cleaning method of this embodiment combines the requirements of the emergency management service and sets corresponding cleaning rules for data with different service attributes. The cleaning content is rich and comprehensive, the cleaning accuracy is high, and the results can be displayed through a visual interface, which improves the user experience. In addition, adopting a cleaning mode such as big data processing can resolve the data computation bottleneck, and the cleaning results obtained by the method are better suited to further in-depth mining and the like.
Example 2
Referring to fig. 5, based on the foregoing embodiment 1, the present embodiment provides an emergency management service data cleaning system. Exemplarily, the system mainly includes a task configuration module, a task cleaning module, and a visual display module, each of which is described below.
The task configuration module is used for providing a data cleaning configuration interface and generating a data cleaning task when receiving cleaning configuration data input by a user in the data cleaning configuration interface.
Exemplarily, as shown in fig. 6, the cleaning configuration interface may include, but is not limited to, configuration items such as the name of the item to be cleaned, the cleaning mode, the first data interface, the second data interface, and the cleaning result. For example, the cleaning mode may be cleaning with an Oracle function or through a big data interface. The Oracle function is mainly suitable for operations with simpler logic, whereas the big data mode can use distributed computation such as Spark and MapReduce; the big data cleaning mode performs better especially when there are many association relations between tables.
Further, processing is performed by calling the specific interfaces that the user designates as the first data interface (interface 1) and the second data interface (interface 2). Interface 1 is mainly used for acquiring emergency management service data and performing the specific cleaning operation; interface 2 is mainly used for counting the cleaning quantity and the like. The configuration items for the cleaning result may include, but are not limited to, the name of the storage table for the cleaning result, the page display path, the target service fields for display, and the like. In addition, the user may also define the execution mode of the data cleaning task, for example a timed execution mode, or a manual execution mode in which data cleaning is executed once per click.
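The configuration items named above could be held in a small record type. The Python sketch below is only one possible shape: the class name, attribute names, and the allowed mode strings are assumptions for illustration and do not come from the application.

```python
from dataclasses import dataclass, field

CLEANING_MODES = {"oracle_function", "big_data"}
EXECUTION_MODES = {"manual", "scheduled"}


@dataclass
class CleaningConfig:
    """Cleaning configuration data for one item to be cleaned (hypothetical field names)."""
    item_name: str
    cleaning_mode: str          # "oracle_function" or "big_data"
    first_interface: str        # fetches service data and performs the cleaning
    second_interface: str       # counts the cleaning quantity
    result_table: str           # storage table for the cleaning result
    page_path: str              # page display path
    display_fields: list = field(default_factory=list)
    execution_mode: str = "manual"

    def __post_init__(self):
        # Reject values outside the modes assumed by this sketch.
        if self.cleaning_mode not in CLEANING_MODES:
            raise ValueError(f"unknown cleaning mode: {self.cleaning_mode}")
        if self.execution_mode not in EXECUTION_MODES:
            raise ValueError(f"unknown execution mode: {self.execution_mode}")
```

Validating the modes when the configuration is created means a mistyped value is caught before any data cleaning task is generated from it.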
The task cleaning module is used for acquiring cleaning configuration data of an item to be cleaned according to the data cleaning task when the data cleaning task is started, wherein the cleaning configuration data comprises a target service field and a corresponding cleaning rule; extracting corresponding emergency management service data from a database according to the target service field, importing the emergency management service data into a data warehouse, and performing data preprocessing to obtain data to be cleaned; and carrying out data cleaning on the data to be cleaned according to the corresponding cleaning rule and outputting a data cleaning result.
The visual display module is used for visually displaying the output data cleaning result.
For example, taking the data cleaning result of one item to be cleaned, as shown in fig. 7, the system interface has a plurality of cleaning large-class tabs, including repeated data, service irregularity, filling abnormality, missing data, check abnormality, and time check. The user can enter the different cleaning large-class pages by clicking the corresponding tabs; the current page is the check abnormality page. On this page the user can see the historical cleaning item records and, to learn more about the cleaning result of a particular cleaning item, can click the corresponding record to view more detailed cleaning data.
It is understood that the task cleaning module corresponds to the steps S110-S130 of the above-mentioned emergency management service data cleaning method, and the visual display module corresponds to the step S140, so detailed description is not repeated here. Any alternatives in the above embodiments are also applicable to this embodiment, and therefore will not be described in detail here.
The application also provides a terminal, such as a server and the like. The terminal illustratively includes a processor and a memory, the memory storing a computer program, the processor being configured to execute the computer program to implement the above-described emergency management service data cleansing method.
The memory may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function; the data storage area may store data (such as a database, a data table, etc.) created according to the use of the terminal. Further, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The application also provides a readable storage medium for storing the computer program used in the terminal.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, each functional module or unit in each embodiment of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, or the portion thereof that contributes to the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a smart phone, a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program code.
The above description covers only specific embodiments of the present application, but the scope of the present application is not limited thereto. Any changes or substitutions that a person skilled in the art can readily conceive within the technical scope disclosed in the present application shall be covered by the scope of the present application.

Claims (10)

1. An emergency management service data cleaning method is characterized by comprising the following steps:
when a data cleaning task is started, acquiring cleaning configuration data of an item to be cleaned according to the data cleaning task, wherein the cleaning configuration data comprises a target service field and a corresponding service attribute;
extracting corresponding emergency management service data from a database according to the target service field, importing the emergency management service data into a data warehouse, and performing data preprocessing to obtain data to be cleaned;
determining a corresponding cleaning rule according to the service attribute and carrying out data cleaning on the data to be cleaned based on the cleaning rule to obtain a data cleaning result;
and visually displaying the data cleaning result.
2. The method of claim 1, wherein the cleaning configuration data comprises a configured first data interface, and the extracting corresponding emergency management service data from a database according to the target service field, importing the emergency management service data into a data warehouse, and performing data preprocessing to obtain data to be cleaned comprises:
extracting emergency management service data matched with the target service field from a database through the first data interface;
importing the extracted service data into a specified data warehouse through a preset format file;
and performing data extraction, conversion and/or merging pretreatment on the service data in the data warehouse to obtain the data to be cleaned.
3. The method of claim 1 or 2, further comprising:
before the data cleaning is carried out, counting the total number of the data to be cleaned, and estimating the cleaning time of the data cleaning task according to the total number;
and after the data cleaning is carried out, counting the number of marked abnormal data in the data cleaning result, associating each abnormal data with corresponding business data, and displaying the associated business data on a visual interface in a link mode.
4. The method of claim 1, wherein the types of the service attributes include a non-duplicate item, a classification management rule item, a non-filled abnormal item, a non-checked abnormal item, a non-empty item, and a time-limit warning item, and wherein determining the corresponding cleaning rule according to the service attributes includes:
if the service attribute is the non-repeated item, performing data similarity calculation on the data to be cleaned, and if the data with the similarity larger than or equal to a preset similarity threshold exists, marking the data as suspected repeated data;
if the business attribute is the classification management standard item, acquiring actual classification management information corresponding to an enterprise and performing consistency judgment on the actual classification management information and the data to be cleaned, and if inconsistent data exists, recording the inconsistent data as the non-standard data;
if the service attribute is the non-filling abnormal item, judging whether the data to be cleaned meets a preset filling rule corresponding to the target service field, and if the data to be cleaned does not meet the preset filling rule, recording the data which do not meet the preset filling rule as filling abnormal data;
if the service attribute is the non-check abnormal item, judging whether the data to be cleaned meets a preset logic or a preset threshold value, and if the data does not meet the preset logic or the preset threshold value, recording the data which does not meet the preset logic as logic abnormal data;
if the service attribute is the non-vacancy item, judging whether the data to be cleaned is vacant or invalid, and if the data to be cleaned is vacant or invalid, recording the data which is vacant or invalid as logic abnormal data;
if the service attribute is the time limit early warning item, judging whether the data to be cleaned has a time limit which is expired or is not updated within a preset time period, and if the data has the time limit which is expired or is not updated within the preset time period, recording the data as time abnormal data.
5. The method according to claim 4, wherein the data cleansing task comprises at least one item to be cleansed, each item to be cleansed being configured with a corresponding business attribute;
wherein the items to be cleaned belonging to the non-repeated items comprise: repeated unique identification information of the enterprise and repeated safe login information; the items to be cleaned belonging to the classification management standard items comprise: enterprise classification errors and inconsistency of the incorporated grid management information of the enterprise; the items to be cleaned belonging to the non-filled abnormal items comprise: enterprise basic information reporting abnormity and emergency special information reporting abnormity; the items to be cleaned belonging to the non-check abnormal items comprise: enterprise safety production management related information logic abnormity and enterprise basic information logic abnormity; the items to be cleaned belonging to the non-empty items comprise: missing enterprise basic information and missing safety production management information; the items to be cleaned belonging to the time limit early warning item comprise: expiration of the license qualification of the enterprise and overdue updating of the related information of the safety production.
6. The method of claim 4, wherein the cleansing configuration data further comprises a configured second data interface, and wherein visually presenting the data cleansing results comprises:
and importing the output data cleaning result into a corresponding cleaning large-class page through the second data interface for visual display, wherein the type of the cleaning large-class is correspondingly matched with the type of the service attribute, and the cleaning large-class comprises repeated data cleaning, service irregular data cleaning, abnormal data filling cleaning, missing data cleaning, abnormal verification data cleaning and time verification data cleaning.
7. The method of claim 1, further comprising:
before the data cleaning task is started, synchronizing related service data in a preset emergency management range from each distributed source database into a local database;
and/or acquiring corresponding emergency management service data from each distributed source database in real time according to the target service field after the data cleaning task is started.
8. An emergency management business data cleansing system, comprising:
the task configuration module is used for providing a data cleaning configuration interface and generating a data cleaning task when cleaning configuration data input by a user in the data cleaning configuration interface is received;
the task cleaning module is used for acquiring cleaning configuration data of an item to be cleaned according to the data cleaning task when the data cleaning task is started, wherein the cleaning configuration data comprises a target service field and a corresponding cleaning rule; extracting corresponding emergency management service data from a database according to the target service field, importing the emergency management service data into a data warehouse, and performing data preprocessing to obtain data to be cleaned; performing data cleaning on the data to be cleaned according to the corresponding cleaning rule to obtain a data cleaning result;
and the visual display module is used for visually displaying the data cleaning result.
9. A terminal, characterized in that the terminal comprises a processor and a memory, the memory storing a computer program, and the processor being configured to execute the computer program to implement the emergency management service data cleansing method according to any one of claims 1 to 7.
10. A readable storage medium, characterized in that it stores a computer program which, when executed, implements the emergency management service data cleansing method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010916563.0A CN111949647A (en) 2020-09-03 2020-09-03 Emergency management service data cleaning method, system, terminal and readable storage medium

Publications (1)

Publication Number Publication Date
CN111949647A true CN111949647A (en) 2020-11-17

Family

ID=73367991

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010916563.0A Pending CN111949647A (en) 2020-09-03 2020-09-03 Emergency management service data cleaning method, system, terminal and readable storage medium

Country Status (1)

Country Link
CN (1) CN111949647A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106294492A (en) * 2015-06-08 2017-01-04 深圳中兴网信科技有限公司 Data cleaning method and cleaning engine
CN109542885A (en) * 2018-11-19 2019-03-29 北京锐安科技有限公司 Data cleaning method, device, equipment and storage medium
WO2019227075A1 (en) * 2018-05-24 2019-11-28 People.ai, Inc. Systems and methods for maintaining a node graph from electronic activities and record objects
CN110727668A (en) * 2019-09-30 2020-01-24 北京百度网讯科技有限公司 Data cleaning method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Fan Chongjun (樊重俊) et al.: "Big Data Analysis and Application" (《大数据分析与应用》), pages 79-81 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113051257B (en) * 2021-03-22 2024-04-02 深圳微众信用科技股份有限公司 Service data cleaning method and device
CN113051257A (en) * 2021-03-22 2021-06-29 深圳微众信用科技股份有限公司 Service data cleaning method and device
CN113204544A (en) * 2021-05-10 2021-08-03 深圳技术大学 Data cleaning method and device and computer readable storage medium
CN113239027A (en) * 2021-05-11 2021-08-10 浪潮软件股份有限公司 Data cleaning and matching processing method
CN113138982A (en) * 2021-05-25 2021-07-20 黄柱挺 Big data cleaning method
CN113138982B (en) * 2021-05-25 2022-09-27 深圳市元宇宙科技有限公司 Big data cleaning method
CN113379219A (en) * 2021-06-04 2021-09-10 广东省电信规划设计院有限公司 Quality evaluation method and device for emergency management data
CN113609407A (en) * 2021-07-30 2021-11-05 盐城金堤科技有限公司 Region consistency checking method and device
CN113609407B (en) * 2021-07-30 2024-04-05 盐城天眼察微科技有限公司 Regional consistency verification method and device
CN114490616A (en) * 2022-02-10 2022-05-13 北京星汉博纳医药科技有限公司 Data cleaning method and device, electronic equipment and storage medium
CN114722037B (en) * 2022-05-16 2022-08-26 中国信息通信研究院 Industrial Internet middleware data processing method, middleware and readable storage medium
CN114722037A (en) * 2022-05-16 2022-07-08 中国信息通信研究院 Industrial internet middleware data processing method, middleware and readable storage medium
CN115203192A (en) * 2022-09-15 2022-10-18 北京清众神州大数据有限公司 Cleaning method and device based on visual data and related components
CN115640285A (en) * 2022-10-24 2023-01-24 北京国电通网络技术有限公司 Power abnormality information transmission method, device, electronic apparatus, and medium
CN115640285B (en) * 2022-10-24 2023-10-27 北京国电通网络技术有限公司 Power abnormality information transmission method, device, electronic equipment and medium
CN116362443A (en) * 2023-03-30 2023-06-30 中国水利水电第三工程局有限公司 Data management method and device for enterprise information platform
CN116303404A (en) * 2023-05-11 2023-06-23 起点(山东)大数据科技有限责任公司 Big data storage system for preventing data redundancy based on data classification and peer comparison
CN116303404B (en) * 2023-05-11 2023-08-04 起点(山东)大数据科技有限责任公司 Big data storage system for preventing data redundancy based on data classification and peer comparison

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 518000 1306, phase 1, Tianli Central Business Plaza, haidesan Road, Nanshan District, Shenzhen City, Guangdong Province

Applicant after: SHENZHEN ANYITONG SCIENCE AND TECHNOLOGY DEV Co.,Ltd.

Address before: 518100 1306, phase 1, Tianli Central Business Plaza, haidesan Road, Bao'an District, Shenzhen City, Guangdong Province

Applicant before: SHENZHEN ANYITONG SCIENCE AND TECHNOLOGY DEV Co.,Ltd.