CN112256682B - Data quality detection method and device for multi-dimensional heterogeneous data - Google Patents

Publication number
CN112256682B
Authority
CN
China
Prior art keywords
data
quality
resource
completion
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011140921.XA
Other languages
Chinese (zh)
Other versions
CN112256682A (en)
Inventor
贾志忠
袁伟林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PCI Technology Group Co Ltd
Original Assignee
PCI Technology Group Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the application discloses a data quality detection method and device for multi-dimensional heterogeneous data. According to the technical scheme provided by the embodiment of the application, resource data received from different data sources are standardized by establishing data quality check rules, so that heterogeneous data from different network domains and service domains conform to a common standard; the acquired resource data are further identified by calling different processing interfaces in order to complete missing data, and the completed data undergo quality detection to decide whether the data are stored or an alarm is raised.

Description

Data quality detection method and device for multi-dimensional heterogeneous data
Technical Field
The embodiment of the application relates to the technical field of data processing, in particular to a data quality detection method and device for multi-dimensional heterogeneous data.
Background
Urban multi-dimensional sensing data is gradually becoming a basic urban data source. At present, however, city management departments build separate siloed ("chimney-type") systems for different services. Each system has its own storage equipment, IT equipment, management tools and databases, and the systems cannot share resources with, interact with or access one another, so resource islands and information islands are formed. In addition, because usage scenarios differ, the data follow different data standards and cannot be managed effectively and controllably. Designing a method capable of accessing and managing multi-dimensional data information is therefore an urgent technical problem for those skilled in the art.
Disclosure of Invention
The embodiment of the application provides a data quality detection method and device for multi-dimensional heterogeneous data, which standardize resource data received from different data sources by establishing data quality check rules, bring the heterogeneous data islands of different network domains and service domains under a common standard, enable a management department to manage data access in a fine-grained way, and quickly prepare available, credible, reliable and safe data for video big data application scenarios.
In a first aspect, an embodiment of the present application provides a data quality detection method for multidimensional heterogeneous data, including:
receiving resource data transmitted by different data sources, wherein the resource data comprises basic device data, population data, logistics data and trajectory data;
performing data conversion and completion on the resource data according to a preset data table to obtain completed resource data, wherein the basic device data, the logistics data, the trajectory data and the population data are fused during the data conversion and completion in order to complete the data;
performing data quality detection on the completed resource data according to a preset data quality rule to judge whether the completed resource data meets a preset quality requirement;
and storing the completed resource data that meets the preset quality requirement.
Further, after storing the completed resource data that meets the preset quality requirement, the method further includes:
performing an alarm operation on the completed resource data that does not meet the preset quality requirement.
Further, before receiving the resource data transmitted by the different data sources, the method further includes:
performing pre-access data probing on the resource data transmitted by the different data sources, wherein the data probing comprises null rate probing, enumerated value probing and data distribution probing.
Further, performing data quality detection on the completed resource data according to a preset data quality rule includes:
performing data probing on the completed resource data according to the preset data quality rule, wherein the preset data quality rule comprises one or more of a data field rule, a device filing information verification rule and a video picture quality rule.
Further, performing data quality detection on the completed resource data according to a preset data quality rule includes:
configuring different preset data quality rules for different completed resource data to perform the data quality detection.
Further, after storing the completed resource data that meets the preset quality requirement, the method further includes:
performing periodic quality analysis on the stored completed resource data to obtain a corresponding quality report, and displaying the quality report.
Further, the trajectory data comprises one or more of face snapshot data, checkpoint vehicle-passing data, video access control data, video structured data, electronic fence data and WIFI data;
after receiving the resource data transmitted by the different data sources, the resource data including basic device data, population data, logistics data and trajectory data, the method further includes:
pushing the face snapshot data, the checkpoint vehicle-passing data, the video access control data, the video structured data, the electronic fence data and the WIFI data to a Kafka queue for real-time probing of the data.
Further, the performing data conversion completion on the resource data according to a preset data table to obtain completed resource data, and performing fusion processing on the basic device data, the logistics data, the trajectory data and the population data in the data conversion completion process to perform data completion includes:
extracting the resource data and filling the extracted data into a preset data table;
judging whether the filled preset data table has a missing data type, if so, calling corresponding resource data according to the missing data type to further identify the corresponding resource data;
identifying corresponding resource data to obtain corresponding position information and/or identity information;
and performing data completion on the missing data type in the preset data table according to the position information and/or the identity information to obtain completed resource data.
Further, the identity information is obtained in one of the following ways:
extracting face information from the face snapshot data, and calling a corresponding identity recognition module to recognize the face information to obtain the corresponding identity information; or
extracting vehicle information from the checkpoint vehicle-passing data, and obtaining the corresponding identity information according to the vehicle information, wherein the identity information comprises driver information or vehicle owner information; or
acquiring corresponding mobile phone information through the WIFI data, extracting a mobile phone number or a MAC address from the mobile phone information, and determining the identity information of the owner according to the mobile phone number or the MAC address.
In a second aspect, an embodiment of the present application provides an apparatus for detecting data quality of multidimensional heterogeneous data, including:
a receiving module, configured to receive resource data transmitted by different data sources, wherein the resource data comprises basic device data, population data, logistics data and trajectory data;
a conversion module, configured to perform data conversion and completion on the resource data according to a preset data table to obtain completed resource data, and to fuse the basic device data, the logistics data, the trajectory data and the population data during the data conversion and completion in order to complete the data;
a data probing module, configured to perform data quality detection on the completed resource data according to a preset data quality rule to judge whether the completed resource data meets a preset quality requirement;
a data storage module, configured to store the completed resource data that meets the preset quality requirement.
In a third aspect, an embodiment of the present application provides an electronic device, including:
a memory and one or more processors;
the memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the method for detecting data quality of multi-dimensional heterogeneous data according to the first aspect.
In a fourth aspect, embodiments of the present application provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform the method for data quality detection of multi-dimensional heterogeneous data according to the first aspect.
The embodiment of the application standardizes the received resource data of different data sources by establishing a data quality check rule, and standardizes heterogeneous data in different network domains and service domains; the acquired resource data are further identified by calling different processing interfaces to complete the data, and the data after being completed are subjected to quality detection to judge whether to store or alarm the data.
Drawings
Fig. 1 is a flowchart of a data quality detection method for multidimensional heterogeneous data according to an embodiment of the present application;
Fig. 2 is a diagram illustrating an example of a general data probing report provided by an embodiment of the present application;
Fig. 3 is a schematic flow chart illustrating data completion according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a page display of a video monitoring rule set provided by an embodiment of the present application;
Fig. 5 is a schematic diagram of a page display of quality task management provided by an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a data quality detection apparatus for multidimensional heterogeneous data according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, specific embodiments of the present application will be described in detail with reference to the accompanying drawings. It is to be understood that the specific embodiments described herein merely illustrate the application and do not limit it. It should be further noted that, for convenience of description, the drawings show only the portions relevant to the present application rather than all of it. Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be rearranged. A process may be terminated when its operations are completed, but may also have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
At present, city management departments build separate siloed ("chimney-type") systems for different services. Each system has its own storage equipment, IT equipment, management tools and databases, and the systems cannot share resources with, interact with or access one another, so resource islands and information islands are formed. In addition, because usage scenarios differ, the data follow different data standards and cannot be managed effectively and controllably. On this basis, the embodiment of the application standardizes the resource data received from different data sources by establishing data quality check rules, so that heterogeneous data from different network domains and service domains conform to a common standard; the acquired resource data are further identified by calling different processing interfaces in order to complete missing data, and the completed data undergo quality detection to decide whether the data are stored or an alarm is raised.
Fig. 1 shows a flowchart of a data quality detection method for multidimensional heterogeneous data according to an embodiment of the present application, where the data quality detection method for multidimensional heterogeneous data provided in this embodiment may be executed by a data quality detection device for multidimensional heterogeneous data, the data quality detection device for multidimensional heterogeneous data may be implemented in a software and/or hardware manner, and the data quality detection device for multidimensional heterogeneous data may be formed by two or more physical entities or may be formed by one physical entity. Generally, the data quality detection device for the multidimensional heterogeneous data can be a computer, a mobile phone, a tablet or a background server, and the like.
The following description will be given by taking a background server as an example of a device for executing the data quality detection method of the multidimensional heterogeneous data. Referring to fig. 1, the data quality detection method for the multidimensional heterogeneous data specifically includes:
S101: receiving resource data transmitted by different data sources, wherein the resource data comprises basic device data, population data, logistics data and trajectory data.
Because the current service data distribution is relatively dispersed, information between different service systems is relatively isolated; therefore, when further business development is carried out, no method is available for rapidly accessing reliable and comprehensive data, and the difficulty of data access of a subsequent development system is increased. Therefore, in this step, corresponding data is obtained mainly by interfacing different service systems, and then the data is subjected to fusion processing. For example, the docked service system includes a public security camera system, a logistics system, a public security identity system, and the like to obtain corresponding resource data.
Preferably, before receiving the resource data transmitted by the different data sources, the method further includes:
performing pre-access data probing on the resource data transmitted by the different data sources, wherein the data probing comprises null rate probing, enumerated value probing and data distribution probing.
Pre-access probing is performed so that subsequent data processing can proceed smoothly. It can connect directly to multiple data sources and supports data source connectivity checks, sample probing, dictionary distribution probing, data distribution probing, single-field data distribution probing and the like. It mainly examines basic properties of the data, such as whether null values are present and how the values are distributed. In a specific implementation, pre-access probing is mainly realized on Jupyter Notebook, and a general-purpose data probing tool is developed for different data sources (it obtains sample data and reports the number of fields, field types, null rate, example values and the like). After pre-access probing, a general data probing report is obtained; Fig. 2 shows an example of such a report provided in this embodiment of the present application. By reading the probing report, the data quality of the probed data source can be understood, and the corresponding data information, variable types, alarm data and the like can be determined from Fig. 2.
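As an illustration of the kind of output such a probing tool might produce, the following minimal Python sketch profiles a small in-memory sample. The function name probe, the field names and the sample records are hypothetical and are not taken from the application; the sketch only covers null rate, enumerated values and value distribution, and assumes that field values are hashable.

```python
from collections import Counter


def probe(records, max_enum=10):
    """Profile sample records: field type, null rate, enumerated values, distribution."""
    fields = sorted({key for rec in records for key in rec})
    total = len(records)
    report = {}
    for name in fields:
        values = [rec.get(name) for rec in records]
        # A missing key, None and the empty string are all counted as null.
        present = [v for v in values if v not in (None, "")]
        counts = Counter(present)
        report[name] = {
            "types": sorted({type(v).__name__ for v in present}),
            "null_rate": (total - len(present)) / total if total else 0.0,
            # Fields with few distinct values are treated as enumerations.
            "enum_values": sorted(counts) if len(counts) <= max_enum else None,
            "distribution": counts.most_common(3),
            "example": present[0] if present else None,
        }
    return report


# Hypothetical sample drawn from a device table.
sample = [
    {"device_id": "cam-001", "status": "online", "district": "A"},
    {"device_id": "cam-002", "status": "offline", "district": None},
    {"device_id": "cam-003", "status": "online", "district": ""},
    {"device_id": "cam-004", "status": "online"},
]
report = probe(sample)
for name, stats in report.items():
    print(name, stats["null_rate"], stats["enum_values"])
```

A report of this shape gives the number of fields, their types, null rates and example values before any data source is connected for access.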
S102: performing data conversion and completion on the resource data according to a preset data table to obtain completed resource data, wherein the basic device data, the logistics data, the trajectory data and the population data are fused during the data conversion and completion in order to complete the data.
This step is mainly intended to obtain data with standard content. Because the types of data obtained from different service systems differ, all data are unified and organized according to corresponding data standard specifications; for example, trajectory data, basic device data and population data are fused so that the obtained data are associated with people. The acquired data are converted into a standard data structure, and missing data are supplemented by extracting them from other attribute data, thereby achieving data completion.
Preferably, Fig. 3 is a schematic flow chart of performing data completion according to an embodiment of the present application. As shown in Fig. 3, step S102 includes:
S102a: extracting the resource data and filling the extracted data into a preset data table;
S102b: judging whether the filled preset data table has a missing data type, and if so, calling corresponding resource data according to the missing data type to further identify the corresponding resource data;
S102c: identifying the corresponding resource data to obtain corresponding position information and/or identity information;
S102d: performing data completion on the missing data type in the preset data table according to the position information and/or the identity information to obtain the completed resource data.
In the embodiment of the application, the standard for basic device data defines field attributes, whether each attribute is mandatory, attribute value ranges, basic data association information and the like; when a corresponding attribute is identified as missing, the corresponding data can be completed as required.
The above steps mainly realize data conversion and data completion, because data may be missing when the table is filled. For example, during information acquisition only a person's identity card number may be obtained, without the person's face information, position information and so on. In that case an identification interface can be called to retrieve the information associated with the identity card, and the corresponding face information is obtained through the identity card number. Because the identity card number, address information, face image information and the like are generally stored in the public security system, the information is completed by calling the corresponding interface. Similarly, when the data contains only face data but no identity card number, the corresponding interface can be called to obtain the identity card number and complete the information.
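Steps S102a to S102d can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions: the column names in PRESET_COLUMNS, the in-memory IDENTITY_REGISTRY and the two lookup functions are hypothetical stand-ins for the preset data table and for the external identification interface, whose actual form the application does not specify.

```python
# Hypothetical columns of the preset data table.
PRESET_COLUMNS = ("id_number", "face_image", "address")

# Hypothetical stand-in for the external identity system reached through an interface.
IDENTITY_REGISTRY = {
    "ID-0001": {"id_number": "ID-0001", "face_image": "face-0001.jpg", "address": "Street 1"},
}


def lookup_by_id_number(id_number):
    return IDENTITY_REGISTRY.get(id_number)


def lookup_by_face(face_image):
    for person in IDENTITY_REGISTRY.values():
        if person["face_image"] == face_image:
            return person
    return None


def complete(raw):
    # S102a: extract the resource data into the preset table layout.
    row = {col: raw.get(col) for col in PRESET_COLUMNS}
    # S102b: determine which data types are missing.
    missing = [col for col in PRESET_COLUMNS if row[col] is None]
    if not missing:
        return row
    # S102c: identify the record through whichever key is present.
    person = None
    if row["id_number"] is not None:
        person = lookup_by_id_number(row["id_number"])
    elif row["face_image"] is not None:
        person = lookup_by_face(row["face_image"])
    # S102d: fill only the missing columns from the identified record.
    if person is not None:
        for col in missing:
            row[col] = person.get(col)
    return row


print(complete({"id_number": "ID-0001"}))
print(complete({"face_image": "face-0001.jpg"}))
```

Columns that cannot be resolved stay empty in this sketch, so a later quality check can still flag them.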
Preferably, the identity information is obtained in one of the following ways:
extracting face information from the face snapshot data, and calling a corresponding identity recognition module to recognize the face information to obtain the corresponding identity information; or
extracting vehicle information from the checkpoint vehicle-passing data, and obtaining the corresponding identity information according to the vehicle information, wherein the identity information comprises driver information or vehicle owner information; or
acquiring corresponding mobile phone information through the WIFI data, extracting a mobile phone number or a MAC address from the mobile phone information, and determining the identity information of the owner according to the mobile phone number or the MAC address.
All information is associated with identity information, that is, with people. Besides face recognition, information about the reporting device can also be obtained; for example, the installation position of the device and the area and street in which it is located can be looked up, and the longitude and latitude of the position are finally determined. This information is stored as position information. Specifically, when a record has no position information, the position information is determined from the device information: the basic device data is read to retrieve the position information of the corresponding device, and the specific area, street, longitude and latitude are determined from it.
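The three identity paths and the device-based position fallback described above could be organized as in the following Python sketch. All tables (FACE_INDEX, VEHICLE_OWNERS, MAC_OWNERS, DEVICES), record keys and values are hypothetical placeholders for the recognition module and the external systems; a real deployment would call those systems instead of reading dictionaries.

```python
# Hypothetical lookup tables standing in for the recognition module and external systems.
FACE_INDEX = {"face-77": "person-A"}
VEHICLE_OWNERS = {"PLATE-123": "person-B"}
MAC_OWNERS = {"aa:bb:cc:dd:ee:ff": "person-C"}
DEVICES = {
    "cam-001": {"area": "District 1", "street": "Street 9", "lon": 113.30, "lat": 23.10},
}


def resolve_identity(record):
    """Pick the identity path that matches the kind of trajectory record."""
    kind = record["kind"]
    if kind == "face_snapshot":
        return FACE_INDEX.get(record["face_id"])
    if kind == "checkpoint_vehicle":
        return VEHICLE_OWNERS.get(record["plate"])
    if kind == "wifi":
        return MAC_OWNERS.get(record["mac"].lower())
    return None


def resolve_location(record):
    """Use the record's own position if present, else the reporting device's position."""
    if record.get("location"):
        return record["location"]
    return DEVICES.get(record.get("device_id"))


snapshot = {"kind": "face_snapshot", "face_id": "face-77", "device_id": "cam-001"}
print(resolve_identity(snapshot), resolve_location(snapshot))
```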
Preferably, in the embodiment of the application, the trajectory data includes one or more of face snapshot data, checkpoint vehicle-passing data, video access control data, video structured data, electronic fence data and WIFI data;
after receiving the resource data transmitted by the different data sources, the resource data including basic device data, population data, logistics data and trajectory data, the method further includes:
S1021: pushing the face snapshot data, the checkpoint vehicle-passing data, the video access control data, the video structured data, the electronic fence data and the WIFI data to a Kafka queue for real-time probing of the data.
In actual data acquisition, some data are static, such as basic device data, population data and logistics data, which do not change in real time or are updated only at certain intervals. Other data, such as face snapshot data, must be updated in real time: when face data is acquired through a camera, the captured images have to be recognized continuously, so probing of such data must also be performed in real time. The acquired data is therefore pushed to a Kafka queue for processing, and real-time probing is achieved by setting the Kafka queue as the data source to be probed. In the embodiment of the application, "kaffa queue" is an alternative rendering of "Kafka queue"; the two terms have the same meaning.
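The idea of pointing the probe at a queue instead of a table can be illustrated with the sketch below. To keep the example self-contained it uses Python's standard queue.Queue as a stand-in for the Kafka queue; this substitution, the function name probe_stream and the record fields are assumptions made for illustration, and a real deployment would read from a Kafka consumer instead.

```python
import queue


def probe_stream(source, required_fields, batch_size=100):
    """Drain up to batch_size records and return the null rate of each required field."""
    seen = 0
    nulls = {name: 0 for name in required_fields}
    while seen < batch_size:
        try:
            record = source.get_nowait()
        except queue.Empty:
            break  # nothing more to probe right now
        seen += 1
        for name in required_fields:
            if record.get(name) in (None, ""):
                nulls[name] += 1
    return {name: (nulls[name] / seen if seen else 0.0) for name in required_fields}


# Stand-in for the Kafka queue that receives trajectory data.
trajectory_queue = queue.Queue()
for rec in [
    {"device_id": "cam-001", "captured_at": "t1"},
    {"device_id": "cam-001", "captured_at": None},
    {"device_id": "cam-002", "captured_at": "t3"},
    {"device_id": "", "captured_at": "t4"},
]:
    trajectory_queue.put(rec)

rates = probe_stream(trajectory_queue, ["device_id", "captured_at"])
print(rates)
```

Calling probe_stream repeatedly on small batches gives a rolling view of data quality as records arrive.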
S103: performing data quality detection on the completed resource data according to a preset data quality rule to judge whether the completed resource data meets the preset quality requirement.
In the embodiment of the application, a general or service-customized quality rule checking function is built into the system to check data quality. If national standard or city-level standard requirements exist for the data, the data quality is checked against those requirements; if not, the quality rules can be customized according to business requirements.
Specifically, in the embodiment of the present application, different preset data quality rules can be configured for different completed resource data, and different quality check rule sets are created for different data tables and different service scenarios. Fig. 4 is a schematic page display of a video monitoring rule set provided in an embodiment of the present application. Fig. 4 shows an example rule set for the basic information table of video monitoring devices used for face snapshots: depending on its attributes, a device may be a second-class point, a third-class point or an internal video source; different quality check rule sets are created for these different scenarios, and the data quality check is then performed according to the configured task execution policy.
In implementation, a rule set can be added directly, and the following information is provided when adding it: rule set name, applicable scene, data source type, data table, statistics, dimension field and the like; the new rule set is created once this information is filled in. Specifically, a rule newly added to a rule set may include a field name, a rule template, a rule level, a rule description and the like; when the rule involves matching a specific part of a field, the start position, interception length, matching field and the like also need to be set.
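One possible way to model the rule set and rule attributes listed above is shown in the following Python sketch. The class names, attribute names, template names such as "position_match" and the example values are assumptions made for illustration; the application only enumerates the information items, not their representation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rule:
    field_name: str
    template: str                      # e.g. "not_null", "regex", "position_match"
    level: str                         # e.g. "error" or "warning"
    description: str = ""
    start: Optional[int] = None        # start position, for positional matching
    length: Optional[int] = None       # interception length
    match_field: Optional[str] = None  # field to compare the intercepted part with


@dataclass
class RuleSet:
    name: str
    scene: str
    source_type: str
    table: str
    dimension_fields: List[str] = field(default_factory=list)
    rules: List[Rule] = field(default_factory=list)

    def add_rule(self, rule):
        # Positional matching is only meaningful with all three extra settings.
        if rule.template == "position_match" and None in (rule.start, rule.length, rule.match_field):
            raise ValueError("position_match needs start, length and match_field")
        self.rules.append(rule)


rule_set = RuleSet(
    name="face snapshot device rules",
    scene="second-class point",
    source_type="relational database",
    table="video_device_basic_info",
    dimension_fields=["district"],
)
rule_set.add_rule(Rule("device_code", "not_null", "error", "device code must be present"))
rule_set.add_rule(Rule("police_area", "position_match", "error", start=0, length=8, match_field="device_code"))
print(len(rule_set.rules))
```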
Preferably, performing data quality detection on the completed resource data according to a preset data quality rule includes:
performing data probing on the completed resource data according to the preset data quality rule, wherein the preset data quality rule comprises one or more of a data field rule, a device filing information verification rule and a video picture quality rule.
In this step, real-time probing during access is adopted: a data probing tool is built into the data acquisition and access tool DTS, which supports both real-time (flash) and batch probing and reports data quality problems in real time.
Specifically, before the rules are loaded, the system dictionary table and the administrative division table need to be loaded. The data field rules include non-null verification, regular expression matching, enumerated value verification, dimension table verification, interval value verification, date verification, same-table field matching verification, constant verification at a designated position of a field, enumerated value verification at a designated position of a field, regular expression verification at a designated position of a field, field uniqueness verification, data type verification, character length verification and the like. In addition, the device filing information verification rules mentioned in the embodiment of the present application include administrative division verification, police area verification, longitude and latitude verification, parent device verification, verification that the device code is consistent with the administrative division, verification that the access network is consistent with the device code, verification of the service group ID of the virtual organization, verification that the first 4 digits of the platform code are consistent with the administrative division, IP + port uniqueness verification and the like. When a specific rule is set, a rule description and an example are provided. For police area verification, for example, the description is: "the field is a number of length 12; the first 8 digits are taken from the national standard code of the camera, and the last four digits are 0000. Reference pattern: province + city + region + base unit + 0000". After the data has passed the quality check against the device filing information verification rules, it can be stored; because it contains the corresponding position information, it is convenient for subsequent use.
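A few of the rules named above can be written as small predicate functions, as in the Python sketch below. The function names and the sample codes are hypothetical. The police area check follows the quoted rule description and additionally assumes that "the first 8 digits are taken from the national standard code of the camera" means they equal the first 8 digits of the device code; the longitude and latitude check only tests the global coordinate ranges.

```python
import re


def check_not_null(value):
    return value not in (None, "")


def check_enum(value, allowed):
    return value in allowed


def check_interval(value, low, high):
    return low <= value <= high


def check_lon_lat(lon, lat):
    return -180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0


def check_police_area(police_area, device_code):
    """12 digits: the first 8 match the device code prefix (assumed), the last 4 are 0000."""
    if not re.fullmatch(r"[0-9]{12}", police_area or ""):
        return False
    return police_area[:8] == device_code[:8] and police_area[8:] == "0000"


# Made-up codes used only to exercise the checks.
device_code = "44010601001320000001"
print(check_police_area("440106010000", device_code))
print(check_police_area("440106019999", device_code))
print(check_lon_lat(113.3, 23.1))
```

Each predicate returns a boolean, so a rule engine can collect the names of the failed rules per record.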
In addition to the data field rules and the device filing information verification rules, there are video picture quality rules, which include video coding format verification, video bit rate type verification, image resolution verification and image frame rate verification. The quality of the acquired video image data is verified through these rules, and the data can be stored after the corresponding rule verification has been passed.
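The four video picture quality checks might be combined as in the following Python sketch. The allowed coding formats, bit rate types and the minimum resolution and frame rate are placeholder thresholds chosen for illustration, not values stated in the application, and the metadata keys are likewise hypothetical.

```python
# Placeholder thresholds; real values would come from the applicable standard.
ALLOWED_CODECS = {"H.264", "H.265"}
ALLOWED_BITRATE_TYPES = {"CBR", "VBR"}
MIN_WIDTH, MIN_HEIGHT = 1280, 720
MIN_FPS = 25


def check_video(meta):
    """Return the names of the video picture quality rules that the metadata fails."""
    failures = []
    if meta.get("codec") not in ALLOWED_CODECS:
        failures.append("video coding format")
    if meta.get("bitrate_type") not in ALLOWED_BITRATE_TYPES:
        failures.append("video bit rate type")
    try:
        # Resolution is expected as "WIDTHxHEIGHT", for example "1920x1080".
        width, height = (int(part) for part in str(meta.get("resolution", "")).lower().split("x"))
    except ValueError:
        width = height = 0
    if width < MIN_WIDTH or height < MIN_HEIGHT:
        failures.append("image resolution")
    fps = meta.get("fps")
    if not isinstance(fps, (int, float)) or fps < MIN_FPS:
        failures.append("image frame rate")
    return failures


print(check_video({"codec": "H.264", "bitrate_type": "CBR", "resolution": "1920x1080", "fps": 25}))
print(check_video({"codec": "MJPEG", "bitrate_type": "CBR", "resolution": "640x480", "fps": 15}))
```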
Fig. 5 is a schematic diagram of the quality task management page provided in an embodiment of the present application. As shown in Fig. 5, a quality inspection task is created in quality inspection task management, data quality inspection is performed according to the configured task execution policy, and the quality inspection result is stored or an alarm is triggered.
S104: storing the completed resource data that meets the preset quality requirement.
After the corresponding quality check is completed, the data that passes the check is stored; for example, data that passes the video picture quality rules and the equipment filing information verification rules is stored, and data that passes the data field rules is not stored. At the same time, the quality inspection result can be stored or an alarm can be triggered.
More preferably, after the completed resource data that meets the preset quality requirement is stored, the method further includes:
performing an alarm operation on the completed resource data that does not meet the preset quality requirement.
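The store-or-alarm decision described above can be sketched as a simple routing step. Here `rule` is a hypothetical callable that returns the list of failed rule names for a record, and the alarm entry layout is an assumption of this sketch.

```python
def route_records(records, rule):
    """Split records into those to store and those that raise an alarm.

    `rule` is a hypothetical callable returning the names of the quality
    rules a record fails; an empty result means the record meets the
    preset quality requirement.
    """
    stored, alarms = [], []
    for record in records:
        failures = rule(record)
        if failures:
            alarms.append({"record": record, "failed_rules": failures})
        else:
            stored.append(record)
    return stored, alarms
```

Keeping the failed rule names in each alarm entry lets operation and maintenance personnel see why a record was rejected.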
Specifically, step S105 is further included after step S104: performing periodic quality analysis on the stored completed resource data to obtain a corresponding quality report, and displaying the quality report.
In addition to the alarm and data storage described above, data quality detection can also be performed after access: the accessed data is analyzed periodically and a quality report is produced, so that operation and maintenance personnel can learn the data quality status in time.
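A minimal sketch of one possible report is given below, assuming that the report contains a per-field null value rate over the stored records. The report layout and field names are hypothetical; the embodiment does not specify the contents of the quality report or how the periodic run is scheduled.

```python
def build_quality_report(records, fields):
    """Compute a per-field null value rate over stored records.

    A value counts as missing when it is None or an empty string.
    """
    total = len(records)
    report = {"total_records": total, "null_rate": {}}
    for field in fields:
        missing = sum(1 for r in records if r.get(field) in (None, ""))
        report["null_rate"][field] = missing / total if total else 0.0
    return report
```

Such a function could be invoked by whatever scheduler triggers the periodic analysis, with the result rendered on the quality report page.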
The embodiment of the present application establishes data standards and data quality rules for the basic equipment data, population data, logistics data and track data related to video big data. Each data standard has data quality rules for use in different scenarios. The platform provides data access, data quality detection and data quality inspection functions, and monitors and manages these tasks in a unified manner. Through the above steps, the association relationships among all the data can be constructed, so that each department can conveniently make subsequent customized data calls. By establishing data quality check rules, resource data received from different data sources is standardized, so that heterogeneous data in different network domains and service domains follows a unified standard; the acquired resource data is further identified by calling different processing interfaces so as to complete the data, and quality detection is performed on the completed data to decide whether to store the data or raise an alarm.
On the basis of the foregoing embodiment, fig. 6 is a schematic structural diagram of a data quality detection apparatus for multidimensional heterogeneous data according to an embodiment of the present application. Referring to fig. 6, the apparatus for detecting data quality of multidimensional heterogeneous data provided in this embodiment specifically includes:
a receiving module 21, configured to receive resource data transmitted by different data sources, wherein the resource data includes basic equipment data, population data, logistics data and track data;
a conversion module 22, configured to perform data conversion completion on the resource data according to a preset data table to obtain completed resource data, and to fuse the basic equipment data, the logistics data, the track data and the population data during the data conversion completion process so as to complete the data;
a data probing module 23, configured to perform data quality detection on the completed resource data according to a preset data quality rule to judge whether the completed resource data meets a preset quality requirement;
a data storage module 24, configured to store the completed resource data that meets the preset quality requirement.
By establishing data quality check rules, resource data received from different data sources is standardized, so that heterogeneous data in different network domains and service domains follows a unified standard; the acquired resource data is further identified by calling different processing interfaces so as to complete the data, and quality detection is performed on the completed data to decide whether to store the data or raise an alarm.
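The conversion and completion performed by the conversion module can be illustrated by the following sketch, which fills a preset data table, detects missing data types, and completes them from fused basic equipment data or from an identification interface. The column names, the `device_locations` lookup and the `identify` callable are hypothetical stand-ins assumed for this sketch, not the interfaces of the embodiment.

```python
# Assumed columns of the preset data table, for illustration only.
PRESET_COLUMNS = ("device_id", "capture_time", "location", "identity")


def extract_to_preset_table(raw):
    """Fill the preset data table from a raw record; absent values are None."""
    return {col: raw.get(col) for col in PRESET_COLUMNS}


def missing_types(row):
    """List the data types (columns) still missing after extraction."""
    return [col for col, value in row.items() if value is None]


def complete_row(raw, device_locations, identify):
    """Complete missing position and identity information.

    `device_locations` stands in for basic equipment data (device -> position)
    and `identify` for an identification interface; both are hypothetical.
    """
    row = extract_to_preset_table(raw)
    for col in missing_types(row):
        if col == "location":
            row[col] = device_locations.get(row["device_id"])
        elif col == "identity":
            row[col] = identify(raw)
    return row
```

A row that still has missing data types after completion would then be handled by the subsequent data quality detection, for example by the non-null rule.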
The data quality detection device for the multi-dimensional heterogeneous data provided by the embodiment of the application can be used for executing the data quality detection method for the multi-dimensional heterogeneous data provided by the embodiment, and has corresponding functions and beneficial effects.
Fig. 7 is a schematic structural diagram of an electronic device provided in an embodiment of the present application, and with reference to fig. 7, the electronic device includes: a processor 31, a memory 32, a communication module 33, an input device 34, and an output device 35. The number of processors 31 in the electronic device may be one or more, and the number of memories 32 in the electronic device may be one or more. The processor 31, the memory 32, the communication module 33, the input device 34 and the output device 35 of the electronic apparatus may be connected by a bus or other means.
The memory 32 is a computer readable storage medium, and can be used for storing software programs, computer executable programs, and modules, such as program instructions/modules corresponding to the data quality detection method for multidimensional heterogeneous data according to any embodiment of the present application (for example, a receiving module, a converting module, a data probing module, and a data storage module in a data quality detection apparatus for multidimensional heterogeneous data). The memory 32 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the device, and the like. Further, the memory 32 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory may further include memory located remotely from the processor, and these remote memories may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The communication module 33 is used for data transmission.
The processor 31 executes various functional applications and data processing of the device by running software programs, instructions and modules stored in the memory 32, that is, implements the data quality detection method of the multi-dimensional heterogeneous data.
The input device 34 may be used to receive entered numeric or character information and to generate key signal inputs relating to user settings and function controls of the apparatus. The output device 35 may include a display device such as a display screen.
The electronic device provided by the embodiment can be used for executing the data quality detection method of the multi-dimensional heterogeneous data provided by the embodiment, and has corresponding functions and beneficial effects.
The present embodiments also provide a storage medium containing computer-executable instructions, which when executed by a computer processor 31, are configured to perform a data quality detection method for multi-dimensional heterogeneous data, where the data quality detection method for multi-dimensional heterogeneous data includes:
receiving resource data transmitted by different data sources, wherein the resource data comprises basic equipment data, population data, logistics data and track data;
performing data conversion completion on the resource data according to a preset data table to obtain completed resource data, and performing fusion processing on the basic equipment data, the logistics data, the track data and the population data in the data conversion completion process to perform data completion;
performing data quality detection on the completion resource data according to a preset data quality rule to judge whether the completion resource data meet a preset quality requirement;
and storing the completion resource data meeting the preset quality requirement.
Storage medium: any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media such as CD-ROM, floppy disk, or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory such as flash memory, magnetic media (e.g., a hard disk), or optical storage; registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in a first computer system in which the program is executed, or may be located in a different second computer system connected to the first computer system through a network (such as the internet). The second computer system may provide program instructions to the first computer system for execution. The term "storage medium" may include two or more storage media residing in different locations, e.g., in different computer systems connected by a network. The storage medium may store program instructions (e.g., embodied as a computer program) that are executable by one or more processors 31.
Of course, in the storage medium containing computer-executable instructions provided in the embodiments of the present application, the computer-executable instructions are not limited to the operations of the data quality detection method for multi-dimensional heterogeneous data described above, and may also perform related operations in the data quality detection method for multi-dimensional heterogeneous data provided in any embodiment of the present application.
The data quality detection apparatus, the storage medium, and the electronic device for multi-dimensional heterogeneous data provided in the foregoing embodiments may perform the data quality detection method for multi-dimensional heterogeneous data provided in any embodiment of the present application. For technical details not described in detail in the foregoing embodiments, refer to the data quality detection method for multi-dimensional heterogeneous data provided in any embodiment of the present application.
The foregoing describes only the preferred embodiments of the present application and the technical principles employed. The present application is not limited to the particular embodiments described herein; various obvious changes, rearrangements and substitutions can be made by those skilled in the art without departing from the scope of the present application. Therefore, although the present application has been described in some detail with reference to the above embodiments, it is not limited to those embodiments and may include other equivalent embodiments without departing from the spirit of the present application; the scope of the present application is determined by the scope of the claims.

Claims (9)

1. A data quality detection method for multi-dimensional heterogeneous data is characterized by comprising the following steps:
receiving resource data transmitted by different data sources, wherein the resource data comprises basic equipment data, population data, logistics data and track data;
performing data conversion completion on the resource data according to a preset data table to obtain completed resource data, and performing fusion processing on the basic equipment data, the logistics data, the track data and the population data in the data conversion completion process to perform data completion, wherein the data extraction is performed on the resource data, and the extracted data is filled into the preset data table; judging whether the filled preset data table has a missing data type, if so, calling corresponding resource data according to the missing data type to further identify the corresponding resource data; identifying corresponding resource data to obtain corresponding position information and/or identity information; performing data completion on the missing data type in a preset data table according to the position information and/or the identity information to obtain completed resource data;
performing data quality detection on the completed resource data according to a preset data quality rule to judge whether the completed resource data meets a preset quality requirement, wherein the preset data quality rule comprises one or more of a data field rule, an equipment filing information verification rule and a video picture quality rule;
storing the completed resource data that meets the preset quality requirement, wherein the data passing the video picture quality rule and the equipment filing information verification rule is stored, and the data passing the data field rule is not stored;
and performing periodic quality analysis on the stored completed resource data to obtain a corresponding quality report, and displaying the quality report.
2. The method for detecting the data quality of the multi-dimensional heterogeneous data according to claim 1, further comprising, after the storing the data of the completed resource data satisfying the preset quality requirement:
and performing alarm operation on the completion resource data which does not meet the preset quality requirement.
3. The method for detecting data quality of multi-dimensional heterogeneous data according to claim 1, further comprising, before the receiving the resource data transmitted from different data sources:
and performing data exploration on resource data transmitted by different data sources before access, wherein the data exploration comprises null value rate exploration, enumerated value exploration and data distribution exploration.
4. The method for detecting the data quality of the multidimensional heterogeneous data according to claim 1, wherein the performing the data quality detection on the completion resource data according to the preset data quality rule comprises:
and configuring different preset data quality rules according to different completion resource data to perform data quality detection.
5. The data quality detection method of the multi-dimensional heterogeneous data according to claim 1, wherein the track data comprises one or more of face snapshot data, checkpoint vehicle-passing data, video access control data, video structured data, electric enclosure data and WIFI data;
after the receiving of the resource data transmitted by different data sources, the resource data including basic equipment data, population data, logistics data and track data, the method further includes:
pushing the face snapshot data, the checkpoint vehicle-passing data, the video access control data, the video structured data, the electric enclosure data and the WIFI data to a card queue to perform real-time exploration of the data.
6. The method for detecting the data quality of the multi-dimensional heterogeneous data according to claim 1, wherein the identity information is obtained by the following steps:
extracting face information from the face snapshot data, and calling a corresponding identity recognition module to recognize the face information to obtain corresponding identity information; or
extracting vehicle information from the checkpoint vehicle-passing data, and extracting corresponding identity information according to the vehicle information, wherein the identity information comprises driver information or vehicle owner information; or
acquiring corresponding mobile phone information through the WIFI data, extracting a mobile phone number or a MAC address according to the mobile phone information, and determining identity information of the owner according to the mobile phone number or the MAC address.
7. A data quality detection device for multi-dimensional heterogeneous data is characterized by comprising:
a receiving module, configured to receive resource data transmitted by different data sources, wherein the resource data comprises basic equipment data, population data, logistics data and track data;
a conversion module, configured to perform data conversion completion on the resource data according to a preset data table to obtain completed resource data, and to perform fusion processing on the basic equipment data, the logistics data, the track data and the population data in the data conversion completion process to perform data completion, wherein data extraction is performed on the resource data, and the extracted data is filled into the preset data table; whether the filled preset data table has a missing data type is judged, and if so, corresponding resource data is called according to the missing data type to further identify the corresponding resource data; the corresponding resource data is identified to obtain corresponding position information and/or identity information; and data completion is performed on the missing data type in the preset data table according to the position information and/or the identity information to obtain the completed resource data;
a data probing module, configured to perform data quality detection on the completed resource data according to a preset data quality rule to judge whether the completed resource data meets a preset quality requirement, wherein the preset data quality rule comprises one or more of a data field rule, an equipment filing information verification rule and a video picture quality rule;
a data storage module, configured to store the completed resource data that meets the preset quality requirement, wherein the data passing the video picture quality rule and the equipment filing information verification rule is stored, and the data passing the data field rule is not stored;
the data quality detection device of the multi-dimensional heterogeneous data is further configured to perform periodic quality analysis on the stored completed resource data to obtain a corresponding quality report, and to display the quality report.
8. An electronic device, comprising:
a memory and one or more processors;
the memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for data quality detection of multi-dimensional heterogeneous data according to any one of claims 1 to 6.
9. A storage medium containing computer-executable instructions for performing the method for data quality detection of multi-dimensional heterogeneous data according to any one of claims 1 to 6 when executed by a computer processor.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011140921.XA CN112256682B (en) 2020-10-22 2020-10-22 Data quality detection method and device for multi-dimensional heterogeneous data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011140921.XA CN112256682B (en) 2020-10-22 2020-10-22 Data quality detection method and device for multi-dimensional heterogeneous data

Publications (2)

Publication Number Publication Date
CN112256682A CN112256682A (en) 2021-01-22
CN112256682B true CN112256682B (en) 2022-09-20

Family

ID=74264043

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011140921.XA Active CN112256682B (en) 2020-10-22 2020-10-22 Data quality detection method and device for multi-dimensional heterogeneous data

Country Status (1)

Country Link
CN (1) CN112256682B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113760681A (en) * 2021-03-10 2021-12-07 中科天玑数据科技股份有限公司 Unified SQL (structured query language) -based multi-source heterogeneous data quality verification method and system
CN112988734A (en) * 2021-04-29 2021-06-18 贵州数据宝网络科技有限公司 Multi-element and multi-dimensional data fusion treatment method and system
CN113377776A (en) * 2021-06-29 2021-09-10 中煤能源研究院有限责任公司 Intelligent mine data management system, method, equipment and readable storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106708909A (en) * 2015-11-18 2017-05-24 阿里巴巴集团控股有限公司 Data quality detection method and apparatus
US20180232407A1 (en) * 2017-02-10 2018-08-16 Wipro Limited Method and system for assessing quality of incremental heterogeneous data
CN108446293A (en) * 2018-01-22 2018-08-24 中电海康集团有限公司 A method of based on urban multi-source isomeric data structure city portrait
CN109656812A (en) * 2018-11-19 2019-04-19 平安科技(深圳)有限公司 Data quality checking method, apparatus and storage medium
CN110737650A (en) * 2019-09-27 2020-01-31 北京明略软件系统有限公司 Data quality detection method and device
CN110851495A (en) * 2019-10-24 2020-02-28 长城计算机软件与系统有限公司 Heterogeneous source data processing method and device, storage medium and electronic equipment
CN111078761A (en) * 2019-12-27 2020-04-28 天津幸福生命科技有限公司 Data probing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN112256682A (en) 2021-01-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 306, zone 2, building 1, Fanshan entrepreneurship center, Panyu energy saving technology park, No. 832 Yingbin Road, Donghuan street, Panyu District, Guangzhou City, Guangdong Province

Applicant after: Jiadu Technology Group Co.,Ltd.

Address before: Room 306, zone 2, building 1, Fanshan entrepreneurship center, Panyu energy saving technology park, No. 832 Yingbin Road, Donghuan street, Panyu District, Guangzhou City, Guangdong Province

Applicant before: PCI-SUNTEKTECH Co.,Ltd.

GR01 Patent grant