CN115599775A - Data quality management method and device for market research and storage medium - Google Patents

Publication number
CN115599775A
Authority
CN
China
Prior art keywords
data
research
desensitization
quality
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211264642.3A
Other languages
Chinese (zh)
Inventor
何明龙
曾广层
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Wanren Market Research Co ltd
Original Assignee
Shenzhen Wanren Market Research Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Wanren Market Research Co ltd
Priority to CN202211264642.3A
Publication of CN115599775A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Abstract

The invention discloses a data quality management method, a data quality management device, and a storage medium for market research, and relates to the technical field of research data management. The method comprises the following steps: creating a pre-research database; cleaning the research data and removing dirty data; applying graded desensitization, through desensitization rules, to the sensitive information in the static data and in the dynamic data respectively; performing stream computing on the desensitized data to monitor data quality anomalies online; and monitoring data quality through offline computation. The invention manages the quality of research data through data cleaning and data desensitization, screens out dirty data and abnormal data, and thereby improves the quality of the research data. Different desensitization rules produce data desensitized to different degrees, and a dynamic, visual, and easy-to-operate way of setting quality management conditions and quality management tasks reduces the difficulty of working with research data and improves the efficiency of managing it.

Description

Data quality management method and device for market research and storage medium
Technical Field
The invention belongs to the technical field of research data management, and particularly relates to a data quality management method, a data quality management device and a storage medium for market research.
Background
Market research is the general term for market investigation and market study. It applies scientific methods to systematically design, collect, record, organize, analyze, and study market information and data and to report the findings, and it is an essential part of market forecasting and business decision-making.
Data generated during market research are an important data asset of an enterprise. Analyzing and using these data effectively can help the enterprise reduce costs, improve efficiency, and support production and operating decisions. However, the data come from many business systems and their quality is often low, so analysis results are inaccurate and the availability and value of the enterprise's data assets are seriously affected. Data quality is an important factor limiting the value of data: low-quality data create no value in use and instead waste resources, or even cause irretrievable loss or disaster, so data quality needs to be managed.
In the prior art, data quality problems are usually identified by manual review. Because this depends on the judgment of the reviewer, misjudgments and omissions occur easily, and efficiency is low.
Disclosure of Invention
The invention aims to provide a data quality management method, a device, and a storage medium for market research. The invention manages the quality of research data through data cleaning and data desensitization, screens out dirty data and abnormal data, and applies graded desensitization under different desensitization rules to obtain data desensitized to different degrees, thereby solving the problems that existing data quality management is complex to carry out and inefficient.
In order to solve the above technical problems, the invention is realized by the following technical scheme:
The invention relates to a data quality management method for market research, which comprises the following steps:
Step S1: dividing market research into research fields according to industry, research direction, and research method, and formulating research standard indexes and index standard options for each research field to form a pre-research database;
Step S2: establishing the pre-research database according to a data standard, and mapping external data into the pre-research database according to the data standard through a data acquisition interface;
Step S3: cleaning the research data and removing dirty data;
Step S4: applying graded desensitization, through desensitization rules, to the sensitive information in the static data and in the dynamic data respectively;
Step S5: performing stream computing on the desensitized data to monitor data quality anomalies online;
Step S6: monitoring data quality through offline computation.
As a preferred technical solution, in step S1, the research standard indexes are classified before the standard library is formed; the research standard indexes are classified into execution indexes, identity indexes, value indexes, and other indexes; the other indexes comprise research questions that cannot be standardized; the value indexes comprise a research content part and a research direction part.
As a preferred technical solution, in step S2, the corresponding research field is determined from the description of the research data, a first mapping relation is established between each research question and the corresponding research standard index in that field, and a second mapping relation is established between each question option and the corresponding index standard option; the research question is then converted into a standard index according to the first mapping relation, and the question option is converted into a standard option according to the second mapping relation.
As a preferred technical solution, in step S3, the quality criteria for research data comprise accuracy, completeness, consistency, validity, uniqueness, timeliness, and stability; cleaning the research data comprises checking the consistency of the data and eliminating invalid values and missing values.
As a preferred technical solution, in step S4, the static data are desensitized in batches using a Flume system; the dynamic data are desensitized using a Spark Streaming system; the desensitization rules comprise synonym replacement, partial data masking, hybrid masking, deterministic masking, and reversible desensitization.
As a preferred technical solution, in step S5, when judging data quality anomalies in the desensitized data through online monitoring, the application scenarios corresponding to the data quality judgment rules are first determined; when a target trigger condition is met, the judgment rule corresponding to the target application scenario is set as the effective rule, the target application scenario being one of the application scenarios; input data in the target application scenario are then acquired, and dirty data are screened out of the input data according to the effective rule.
The invention relates to a data quality management device for market research, which comprises a data acquisition unit, a quality rule unit, a data cleaning unit, a data desensitization unit, a data processing module, and a data display unit;
the data acquisition unit acquires and stores the research data through a data acquisition interface;
the quality rule unit updates and stores the data quality indexes and their standard values;
the data cleaning unit cleans the research data with cleaning rules according to the quality indexes of the quality rule unit;
the data desensitization unit updates and stores desensitization rules and the corresponding client types, and desensitizes data to different degrees according to the desensitization rules and the corresponding client types;
the data processing module processes the measured data through calculation;
and the data display unit displays desensitization results of different degrees according to the client type.
As a preferred technical solution, the quality rule unit comprises a data extraction unit, a definition unit, a verification unit, an analysis unit, and a governance unit; the data extraction unit provides adapters for multiple research data sources and an ETL tool, and extracts data from those sources; the definition unit configures verification rules and verification standards; the verification unit verifies the extracted data using a data verification engine and the verification standards of the definition unit, and stores the verification results in a relational database; the analysis unit displays verification results, generates statistical reports, reports problems, analyzes data, and locates problems; and the governance unit performs data synchronization, data error correction, data supplementation, and data deduplication.
As a preferred technical solution, the data processing module comprises an ID conversion service, a formula calculation service, and a state judgment; the ID conversion service converts the device ID in each message in real time through a mapping relation and sends the converted data back to the message bus, from which the data storage service stores them in a historical database, a real-time database, and a hot database; the formula calculation service, after receiving a message, determines the measurement type to be calculated, obtains the real-time calculation result, and stores it in the MPP for the front-end interface to query and display; and the state judgment synchronizes the state of the source-end data in real time during data access.
The invention also relates to a data quality management storage medium for market research, having stored thereon a computer program which, when executed by a cloud server, performs the steps of the method of any one of claims 1 to 6.
The invention has the following beneficial effects:
The invention manages the quality of research data through data cleaning and data desensitization, screens out dirty data and abnormal data, and thereby improves the quality of the research data. Different desensitization rules produce data desensitized to different degrees, and a dynamic, visual, and easy-to-operate way of setting quality management conditions and quality management tasks reduces the difficulty of working with research data and improves the efficiency of managing it.
Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a data quality management method for market research;
FIG. 2 is a schematic structural diagram of a data quality management device for market research.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example One
Referring to FIG. 1, the present invention is a data quality management method for market research, including the following steps:
Step S1: dividing market research into research fields according to industry, research direction, and research method, and formulating research standard indexes and index standard options for each research field to form a pre-research database;
In step S1, the research standard indexes are classified before the standard library is formed; they are classified into execution indexes, identity indexes, value indexes, and other indexes; the other indexes include research questions that cannot be standardized; the value indexes include a research content part and a research direction part.
Step S2: establishing the pre-research database according to a data standard, and mapping external data into the pre-research database according to the data standard through a data acquisition interface;
In step S2, the corresponding research field is determined from the description of the research data, a first mapping relation is established between each research question and the corresponding research standard index in that field, and a second mapping relation is established between each question option and the corresponding index standard option; the research question is then converted into a standard index according to the first mapping relation, and the question option is converted into a standard option according to the second mapping relation.
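The two mapping relations of step S2 can be pictured as two lookup tables. The following Python sketch is only an illustration under stated assumptions: the questions, indexes, options, and the names STANDARD_LIBRARY, QUESTION_TO_INDEX, OPTION_TO_STANDARD, and standardize are all hypothetical and do not come from the application.

```python
# Hypothetical standard library for one research field:
# standard index -> set of allowed standard options.
STANDARD_LIBRARY = {
    "gender": {"male", "female"},
    "purchase_intent": {"high", "medium", "low"},
}

# First mapping relation: raw research question -> standard index.
QUESTION_TO_INDEX = {
    "What is your gender?": "gender",
    "How likely are you to buy this product?": "purchase_intent",
}

# Second mapping relation: (standard index, raw option) -> standard option.
OPTION_TO_STANDARD = {
    ("gender", "M"): "male",
    ("gender", "F"): "female",
    ("purchase_intent", "Very likely"): "high",
    ("purchase_intent", "Somewhat likely"): "medium",
    ("purchase_intent", "Unlikely"): "low",
}


def standardize(question, option):
    """Convert a raw (question, option) pair to (standard index, standard option).

    Returns None when either mapping is missing, so the caller can treat the
    answer as an unstandardized ("other") item instead of guessing.
    """
    index = QUESTION_TO_INDEX.get(question)
    if index is None:
        return None
    standard_option = OPTION_TO_STANDARD.get((index, option))
    if standard_option is None or standard_option not in STANDARD_LIBRARY[index]:
        return None
    return index, standard_option
```

Returning None for an unmapped question is one possible design choice; it keeps questions that cannot be standardized out of the standardized data without discarding them silently.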
Step S3: cleaning the research data and removing dirty data;
In step S3, the quality criteria for research data include accuracy, completeness, consistency, validity, uniqueness, timeliness, and stability; cleaning the research data includes checking the consistency of the data and eliminating invalid values and missing values.
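As a minimal sketch of the cleaning described in step S3, the Python function below separates records with missing values, invalid values, or repeated identifiers from the rest. The field name respondent_id, the function name clean_records, and the decision to treat a repeated identifier as dirty (a reading of the uniqueness criterion) are assumptions made for illustration.

```python
def clean_records(records, required_fields, valid_values):
    """Split records into (clean, dirty) lists.

    A record is treated as dirty when a required field is missing or empty,
    when a field holds a value outside its allowed set, or when its
    respondent_id (a hypothetical key) has already been seen.
    """
    clean, dirty, seen_ids = [], [], set()
    for record in records:
        missing = any(record.get(f) in (None, "") for f in required_fields)
        invalid = any(
            record.get(f) not in allowed
            for f, allowed in valid_values.items()
            if record.get(f) not in (None, "")
        )
        duplicate = record.get("respondent_id") in seen_ids
        if missing or invalid or duplicate:
            dirty.append(record)
        else:
            seen_ids.add(record.get("respondent_id"))
            clean.append(record)
    return clean, dirty
```

Keeping the dirty records in a separate list, rather than dropping them, leaves them available for later review.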
Step S4: applying graded desensitization, through desensitization rules, to the sensitive information in the static data and in the dynamic data respectively;
In step S4, the static data are desensitized in batches using a Flume system; the dynamic data are desensitized using a Spark Streaming system; the desensitization rules include synonym replacement, partial data masking, hybrid masking, deterministic masking, and reversible desensitization.
Static data desensitization removes sensitive and private information from data files while preserving the associations among the data, for example by encrypting key user fields such as the ID card number, name, and mobile phone number. This mode suits cases in which a project development unit needs the complete data set to carry out data analysis, but the data provider does not want sensitive data to leak. It keeps shared data and results consistent during development and ensures that real data are not disclosed.
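Two of the listed rules, partial data masking and deterministic masking, can be sketched in a few lines of Python. This is an illustration only: it does not use Flume or Spark Streaming, the key and function names are hypothetical, and the choice of a keyed HMAC-SHA256 digest for deterministic masking is an assumption, not something the application specifies.

```python
import hashlib
import hmac

# Hypothetical demo key; a real deployment would load the key from secure storage.
SECRET_KEY = b"demo-key-not-for-production"


def mask_partial(value, keep_head=3, keep_tail=4, mask_char="*"):
    """Partial data masking: keep the leading and trailing characters, hide the middle."""
    if len(value) <= keep_head + keep_tail:
        return mask_char * len(value)
    hidden = len(value) - keep_head - keep_tail
    return value[:keep_head] + mask_char * hidden + value[len(value) - keep_tail:]


def mask_deterministic(value, key=SECRET_KEY):
    """Deterministic masking: the same input always yields the same token,
    so records that shared a value before masking still match afterwards."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]
```

Because the deterministic token is stable, two desensitized files can still be joined on the masked field, which is one way to keep the association relationship among data that static desensitization is meant to preserve.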
Dynamic data desensitization desensitizes sensitive data in a database transparently and in real time. User levels are generally defined according to each user's role and responsibilities, and the data returned by the production database are dynamically masked, encrypted, or hidden, so that users at different levels can access sensitive data to different degrees while the data in the production database are not changed at all.
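The idea of returning differently masked views to users at different levels, without touching the stored data, can be sketched as follows. The user levels, field names, and the names LEVEL_POLICY and desensitize_view are hypothetical; a production system would apply such a policy in the query path rather than in application code like this.

```python
# Hypothetical user levels and the sensitive fields each level may see in the clear.
LEVEL_POLICY = {
    "admin": {"name", "phone", "id_card"},
    "analyst": {"name"},
    "guest": set(),
}


def desensitize_view(row, user_level, sensitive_fields=("name", "phone", "id_card")):
    """Return a masked copy of the row for the given user level.

    The stored row is never modified; only the returned view is masked.
    Unknown user levels see no sensitive field in the clear.
    """
    visible = LEVEL_POLICY.get(user_level, set())
    view = dict(row)
    for field in sensitive_fields:
        if field in view and field not in visible:
            view[field] = "*" * len(str(view[field]))
    return view
```

For example, an analyst-level call leaves the name readable and masks the phone number, while the original row stays unchanged.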
Step S5: performing stream computing on the desensitized data to monitor data quality anomalies online;
In step S5, when judging data quality anomalies in the desensitized data through online monitoring, the application scenarios corresponding to the data quality judgment rules are first determined; when a target trigger condition is met, the judgment rule corresponding to the target application scenario is set as the effective rule, the target application scenario being one of the application scenarios; input data in the target application scenario are then acquired, and dirty data are screened out of the input data according to the effective rule.
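The selection of an effective rule in step S5 can be read as a two-stage filter: first choose the rules of the target scenario once the trigger condition holds, then apply them to the input data. The sketch below assumes hypothetical scenarios, rules, and names (JudgmentRule, RULES, effective_rules, screen_dirty), and works on an in-memory list rather than a real stream.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class JudgmentRule:
    scenario: str                      # application scenario the rule belongs to
    is_dirty: Callable[[dict], bool]   # returns True when a record is dirty


# Hypothetical judgment rules for two application scenarios.
RULES = [
    JudgmentRule("online_survey", lambda r: r.get("duration_seconds", 0) < 30),
    JudgmentRule("phone_interview", lambda r: not r.get("interviewer_id")),
]


def effective_rules(rules, target_scenario, trigger_met):
    """Return the rules that take effect: those of the target scenario,
    and only when the target trigger condition has been met."""
    if not trigger_met:
        return []
    return [rule for rule in rules if rule.scenario == target_scenario]


def screen_dirty(records, rules):
    """Screen out the records flagged as dirty by any effective rule."""
    return [r for r in records if any(rule.is_dirty(r) for rule in rules)]
```

With the online_survey scenario active, a response completed in 10 seconds would be flagged and one completed in 120 seconds would pass; with the trigger unmet, no rule is in effect and nothing is flagged.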
Step S6: monitoring data quality through offline computation.
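The application does not say which quantities the offline computation of step S6 produces. As one possible reading, the sketch below computes two ratios corresponding to the completeness and uniqueness criteria listed in step S3; the function names and the definitions of the ratios are assumptions.

```python
def completeness(records, fields):
    """Share of (record, field) cells that are filled in; 1.0 for empty input."""
    total = len(records) * len(fields)
    if total == 0:
        return 1.0
    filled = sum(1 for r in records for f in fields if r.get(f) not in (None, ""))
    return filled / total


def uniqueness(records, key):
    """Share of distinct key values among all records; 1.0 for empty input."""
    if not records:
        return 1.0
    return len({r.get(key) for r in records}) / len(records)
```

Such ratios could be computed periodically over the stored data and compared with the standard values kept by a quality rule.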
Example Two
Referring to FIG. 2, the present invention is a data quality management device for market research, including a data acquisition unit, a quality rule unit, a data cleaning unit, a data desensitization unit, a data processing module, and a data display unit;
the data acquisition unit acquires and stores the research data through a data acquisition interface;
the quality rule unit updates and stores the data quality indexes and their standard values;
the data cleaning unit cleans the research data with cleaning rules according to the quality indexes of the quality rule unit;
the data desensitization unit updates and stores desensitization rules and the corresponding client types, and desensitizes data to different degrees according to the desensitization rules and the corresponding client types;
the data processing module processes the measured data through calculation;
and the data display unit displays desensitization results of different degrees according to the client type.
The quality rule unit comprises a data extraction unit, a definition unit, a verification unit, an analysis unit, and a governance unit; the data extraction unit provides adapters for multiple research data sources and an ETL tool, and extracts data from those sources; the definition unit configures verification rules and verification standards; the verification unit verifies the extracted data using a data verification engine and the verification standards of the definition unit, and stores the verification results in a relational database; the analysis unit displays verification results, generates statistical reports, reports problems, analyzes data, and locates problems; and the governance unit performs data synchronization, data error correction, data supplementation, and data deduplication.
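The interplay of the definition, verification, and analysis units can be sketched with Python's built-in sqlite3 module standing in for the relational database. The rules, the table name check_result, and the function names are hypothetical, and the "verification engine" here is just a loop over callables.

```python
import sqlite3

# Definition unit (hypothetical): verification rules keyed by name.
CHECK_RULES = {
    "age_in_range": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
    "phone_present": lambda r: bool(r.get("phone")),
}


def run_checks(records, rules, conn):
    """Verification unit: apply every rule to every record and store the
    results in a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS check_result "
        "(record_id INTEGER, rule TEXT, passed INTEGER)"
    )
    rows = [
        (record["id"], name, int(check(record)))
        for record in records
        for name, check in rules.items()
    ]
    conn.executemany("INSERT INTO check_result VALUES (?, ?, ?)", rows)
    conn.commit()


def failure_counts(conn):
    """Analysis unit: count failed checks per rule for reporting."""
    cursor = conn.execute(
        "SELECT rule, COUNT(*) FROM check_result "
        "WHERE passed = 0 GROUP BY rule ORDER BY rule"
    )
    return dict(cursor.fetchall())
```

Storing one row per record and rule keeps the results queryable, so reports and problem location can be built as ordinary SQL queries over the same table.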
The data processing module comprises an ID conversion service, a formula calculation service, and a state judgment; the ID conversion service converts the device ID in each message in real time through a mapping relation and sends the converted data back to the message bus, from which the data storage service stores them in a historical database, a real-time database, and a hot database; the formula calculation service, after receiving a message, determines the measurement type to be calculated, obtains the real-time calculation result, and stores it in the MPP for the front-end interface to query and display; and the state judgment synchronizes the state of the source-end data in real time during data access.
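A rough sketch of the ID conversion service and the formula calculation service is given below. The message layout, the ID mapping, the formulas, and all names are hypothetical, and a collections.deque stands in for the message bus; the application does not describe the actual message format or formulas.

```python
from collections import deque

# Hypothetical mapping from source device IDs to standardized IDs.
ID_MAP = {"dev-001": "STD-A", "dev-002": "STD-B"}

# Stand-in for a real message bus.
message_bus = deque()


def convert_id(message, id_map=ID_MAP):
    """ID conversion service: return a copy of the message with its device ID
    replaced through the mapping relation (unknown IDs are left unchanged)."""
    converted = dict(message)
    converted["device_id"] = id_map.get(message["device_id"], message["device_id"])
    return converted


# Hypothetical formulas keyed by measurement type.
FORMULAS = {
    "average": lambda values: sum(values) / len(values),
    "total": sum,
}


def calculate(message):
    """Formula calculation service: pick the formula for the message's
    measurement type and compute the result."""
    formula = FORMULAS[message["measure_type"]]
    return formula(message["values"])
```

In this sketch the converted message would be appended to message_bus for a storage service to consume, and the calculated result would be written to whatever store the front-end interface queries.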
Through online monitoring, the data processing module and the online quality monitoring module detect abnormal data in real time and report data quality problems. After the measured data are stored in the databases, a data quality tool checks the stored data offline for quality problems, so that data quality is monitored from multiple angles.
Example Three
The present invention is a data quality management storage medium for market research, having stored thereon a computer program which, when executed by a cloud server, performs the steps of the method of any one of claims 1 to 6.
It should be noted that, in the above device embodiment, the units are divided only according to functional logic; the division is not limited to the one described, as long as the corresponding functions can be implemented. In addition, the specific names of the functional units serve only to distinguish them from one another and do not limit the protection scope of the present invention.
In addition, those skilled in the art can understand that all or part of the steps of the methods in the embodiments described above can be implemented by a program instructing the related hardware, and the program can be stored in a computer readable storage medium.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not exhaustive and do not limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, so that others skilled in the art can best understand and utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (10)

1. A data quality management method for market research, characterized by comprising the following steps:
Step S1: dividing market research into research fields according to industry, research direction, and research method, and formulating research standard indexes and index standard options for each research field to form a pre-research database;
Step S2: establishing the pre-research database according to a data standard, and mapping external data into the pre-research database according to the data standard through a data acquisition interface;
Step S3: cleaning the research data and removing dirty data;
Step S4: applying graded desensitization, through desensitization rules, to the sensitive information in the static data and in the dynamic data respectively;
Step S5: performing stream computing on the desensitized data to monitor data quality anomalies online;
Step S6: monitoring data quality through offline computation.
2. The data quality management method for market research according to claim 1, wherein in step S1, the research standard indexes are classified before the standard library is formed; the research standard indexes are classified into execution indexes, identity indexes, value indexes, and other indexes; the other indexes comprise research questions that cannot be standardized; the value indexes comprise a research content part and a research direction part.
3. The data quality management method for market research according to claim 1, wherein in step S2, the corresponding research field is determined from the description of the research data, a first mapping relation is established between each research question and the corresponding research standard index in that field, and a second mapping relation is established between each question option and the corresponding index standard option; the research question is then converted into a standard index according to the first mapping relation, and the question option is converted into a standard option according to the second mapping relation.
4. The data quality management method for market research according to claim 1, wherein in step S3, the quality criteria for research data comprise accuracy, completeness, consistency, validity, uniqueness, timeliness, and stability; cleaning the research data comprises checking the consistency of the data and eliminating invalid values and missing values.
5. The data quality management method for market research according to claim 1, wherein in step S4, the static data are desensitized in batches using a Flume system; the dynamic data are desensitized using a Spark Streaming system; the desensitization rules comprise synonym replacement, partial data masking, hybrid masking, deterministic masking, and reversible desensitization.
6. The data quality management method for market research according to claim 1, wherein in step S5, when judging data quality anomalies in the desensitized data through online monitoring, the application scenarios corresponding to the data quality judgment rules are first determined; when a target trigger condition is met, the judgment rule corresponding to the target application scenario is set as the effective rule, the target application scenario being one of the application scenarios; input data in the target application scenario are then acquired, and dirty data are screened out of the input data according to the effective rule.
7. A data quality management device for market research, characterized by comprising a data acquisition unit, a quality rule unit, a data cleaning unit, a data desensitization unit, a data processing module, and a data display unit;
the data acquisition unit acquires and stores the research data through a data acquisition interface;
the quality rule unit updates and stores the data quality indexes and their standard values;
the data cleaning unit cleans the research data with cleaning rules according to the quality indexes of the quality rule unit;
the data desensitization unit updates and stores desensitization rules and the corresponding client types, and desensitizes data to different degrees according to the desensitization rules and the corresponding client types;
the data processing module processes the measured data through calculation;
and the data display unit displays desensitization results of different degrees according to the client type.
8. The data quality management device for market research according to claim 7, wherein the quality rule unit comprises a data extraction unit, a definition unit, a verification unit, an analysis unit, and a governance unit; the data extraction unit provides adapters for multiple research data sources and an ETL tool, and extracts data from those sources; the definition unit configures verification rules and verification standards; the verification unit verifies the extracted data using a data verification engine and the verification standards of the definition unit, and stores the verification results in a relational database; the analysis unit displays verification results, generates statistical reports, reports problems, analyzes data, and locates problems; and the governance unit performs data synchronization, data error correction, data supplementation, and data deduplication.
9. The data quality management device for market research according to claim 7, wherein the data processing module comprises an ID conversion service, a formula calculation service, and a state judgment; the ID conversion service converts the device ID in each message in real time through a mapping relation and sends the converted data back to the message bus, from which the data storage service stores them in a historical database, a real-time database, and a hot database; the formula calculation service, after receiving a message, determines the measurement type to be calculated, obtains the real-time calculation result, and stores it in the MPP for the front-end interface to query and display; and the state judgment synchronizes the state of the source-end data in real time during data access.
10. A data quality management storage medium for market research, having stored thereon a computer program, wherein the computer program, when executed by a cloud server, implements the steps of the method of any one of claims 1 to 6.
CN202211264642.3A 2022-10-17 2022-10-17 Data quality management method and device for market research and storage medium Pending CN115599775A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211264642.3A CN115599775A (en) 2022-10-17 2022-10-17 Data quality management method and device for market research and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211264642.3A CN115599775A (en) 2022-10-17 2022-10-17 Data quality management method and device for market research and storage medium

Publications (1)

Publication Number Publication Date
CN115599775A true CN115599775A (en) 2023-01-13

Family

ID=84846775

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211264642.3A Pending CN115599775A (en) 2022-10-17 2022-10-17 Data quality management method and device for market research and storage medium

Country Status (1)

Country Link
CN (1) CN115599775A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination