CN115410718B - Method for evaluating error of investigator in large-scale face-to-face investigation - Google Patents


Info

Publication number: CN115410718B
Authority: CN (China)
Prior art keywords: error, investigator, survey, data, wrong
Legal status: Active
Application number: CN202110593435.1A
Other languages: Chinese (zh)
Other versions: CN115410718A
Inventors: 赵星 (Zhao Xing), 孙承媛 (Sun Chengyuan), 刘祥 (Liu Xiang), 郭冰 (Guo Bing), 肖雄 (Xiao Xiong)
Current assignee: Sichuan University
Original assignee: Sichuan University
Filing date: 2021-05-28
Publication date: 2023-04-18

Events:
2021-05-28: Application CN202110593435.1A filed by Sichuan University (priority date)
2022-11-29: Publication of CN115410718A
2023-04-18: Application granted; publication of CN115410718B
Status: Active

Abstract

The invention discloses a method for evaluating investigator error in large-scale face-to-face surveys. Questionnaire data and audio recordings are collected in a baseline survey; the questionnaire data are preprocessed, and outlier survey objects are identified with the Fast-MCD algorithm. The recordings of the outlier survey objects are then checked against their questionnaire data according to investigator error evaluation rules, and each check result is classified into one of five types: correct, wrong questioning mode, question asked wrongly or not asked, entry error, and cannot be verified. Of these, wrong questioning mode, question asked wrongly or not asked, and entry error constitute investigator error. Finally, an error occurrence rate index and an error contribution rate index are constructed from the recording-check data to evaluate how investigator error occurs. By introducing an outlier detection algorithm and carrying out recording checks only on the abnormal data it flags, the method finds and corrects as much investigator error as possible at low cost, quantifies each investigator's contribution to overall investigator error, and improves data quality.

Description

Method for evaluating error of investigator in large-scale face-to-face investigation
Technical Field
The invention relates to the technical field of data quality control, and in particular to a method for evaluating investigator error in large-scale face-to-face surveys.
Background
In large epidemiological surveys, information is often collected through face-to-face interviews. This data collection mode inevitably introduces investigator error, which in turn degrades data quality and the reliability of research results. Conventional epidemiological surveys control data quality by improving survey design, strengthening investigator training, and similar measures; however, because feasible means of assessing data quality are lacking and manpower and material resources are limited, these conventional measures cannot guarantee data quality.
Disclosure of Invention
The invention aims to provide a method for evaluating investigator error in large-scale face-to-face surveys, to solve the problems that prior-art data quality control lacks a means of evaluating data quality and that, with limited manpower and material resources, its quality control measures cannot guarantee data quality.
The invention solves the problems through the following technical scheme:
A method for assessing investigator error in a large-scale face-to-face survey, comprising:
Step S1: acquire the questionnaire data and audio recordings of a baseline survey through an electronic information platform, and generate an index of the questionnaire data and recordings for each survey object;
Step S2: after the baseline survey is finished, export the questionnaire data, preprocess them, and identify outlier survey objects with the Fast-MCD algorithm, specifically:
Step S21: the questionnaire data have n rows and p columns, i.e., n survey objects each described by p variables; h samples are drawn from the n survey objects, where h must satisfy
$$\frac{n+p+1}{2} \le h \le n;$$
to balance robustness and computational efficiency, h is set to 0.8n;
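Step S22: compute the sample mean $\hat{\mu}_1$, covariance matrix $\hat{\Sigma}_1$ and covariance determinant $\det(\hat{\Sigma}_1)$ of the h samples, whose index set is $H_1$:
$$\hat{\mu}_1 = \frac{1}{h}\sum_{i \in H_1} x_i, \qquad \hat{\Sigma}_1 = \frac{1}{h}\sum_{i \in H_1} (x_i - \hat{\mu}_1)(x_i - \hat{\mu}_1)^{\mathsf{T}},$$
where $x_i$ is the p-dimensional data vector of the i-th survey object; then, based on $\hat{\mu}_1$ and $\hat{\Sigma}_1$, compute the Mahalanobis distances of all n survey objects:
$$MD_i = \sqrt{(x_i - \hat{\mu}_1)^{\mathsf{T}} \hat{\Sigma}_1^{-1} (x_i - \hat{\mu}_1)}, \quad i = 1, \dots, n;$$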
Step S23: sort the Mahalanobis distances of the n survey objects in ascending order, select the h survey objects with the smallest distances (index set $H_2$), and compute their sample mean $\hat{\mu}_2$, covariance matrix $\hat{\Sigma}_2$ and covariance determinant $\det(\hat{\Sigma}_2)$, together with the corresponding Mahalanobis distances;
Step S24: iterate steps S21 to S23; if at the m-th iteration
$$\det(\hat{\Sigma}_m) = \det(\hat{\Sigma}_{m-1}),$$
the mean and covariance calculated from the m-th subset are taken as the final robust estimates of the mean and covariance, denoted $\hat{\mu}_{MCD}$ and $\hat{\Sigma}_{MCD}$;
Step S25: based on the robust estimates, compute the Mahalanobis distances of all survey objects:
$$MD_{i,MCD} = \sqrt{(x_i - \hat{\mu}_{MCD})^{\mathsf{T}} \hat{\Sigma}_{MCD}^{-1} (x_i - \hat{\mu}_{MCD})}, \quad i = 1, \dots, n;$$
Step S26: judge survey objects whose Mahalanobis distance exceeds a preset value to be outliers;
Step S3: perform recording checks on the outlier survey objects according to the investigator error evaluation rules:
a quality controller logs in to the electronic information platform, retrieves the questionnaire data and recording file of each outlier survey object by its unique index, and judges whether the questionnaire data agree with the recording. If they do not agree, the investigator failed to accurately capture and record the survey object's answers, i.e., an investigator error exists. The check result is recorded and classified into one of five types: correct, wrong questioning mode, question asked wrongly or not asked, entry error, and cannot be verified; wrong questioning mode, question asked wrongly or not asked, and entry error indicate investigator error;
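Step S4: based on the recording-check data, construct an error occurrence rate index and an error contribution rate index to evaluate the occurrence of investigator error, where the overall and type-specific error occurrence rates are
$$ER = \frac{N_{\text{err}}}{N_{\text{chk}}}, \qquad ER_{\text{mode}} = \frac{N_{\text{mode}}}{N_{\text{chk}}}, \qquad ER_{\text{ask}} = \frac{N_{\text{ask}}}{N_{\text{chk}}}, \qquad ER_{\text{entry}} = \frac{N_{\text{entry}}}{N_{\text{chk}}},$$
with $N_{\text{chk}}$ the number of questions checked against the recordings and $N_{\text{mode}}$, $N_{\text{ask}}$, $N_{\text{entry}}$ the numbers of questions with a wrong questioning mode, a question asked wrongly or not asked, and an entry error, respectively.
Each investigator may interview one or more survey objects; each investigator's performance is evaluated by calculating that investigator's error occurrence rates over the questions checked for investigator i:
$$ER_i = \frac{N_{i,\text{err}}}{N_{i,\text{chk}}}, \qquad ER_{i,\text{mode}} = \frac{N_{i,\text{mode}}}{N_{i,\text{chk}}}, \qquad ER_{i,\text{ask}} = \frac{N_{i,\text{ask}}}{N_{i,\text{chk}}}, \qquad ER_{i,\text{entry}} = \frac{N_{i,\text{entry}}}{N_{i,\text{chk}}},$$
where the number of questions with investigator error is the number with a wrong questioning mode plus the number asked wrongly or not asked plus the number with an entry error:
$$N_{\text{err}} = N_{\text{mode}} + N_{\text{ask}} + N_{\text{entry}}.$$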
The method further comprises step S5: based on the calculated investigator error occurrence rates, further analyze the epidemiological characteristics of investigator error across investigators by exploring its distribution pattern and aggregation pattern; the distribution pattern is shown with a probability density plot, and the aggregation pattern is used to explore whether investigator error is concentrated among a subset of investigators.
The method further comprises step S6: from the calculated investigator error occurrence rates $ER_i$ of the different investigators, calculate the investigator error contribution rate of each investigator as
$$ECR_i = \frac{ER_i}{\sum_{j=1}^{k} ER_j},$$
where k is the number of investigators and $ER_i$, $ER_j$ are the investigator error occurrence rates of the i-th and j-th investigators. The larger $ECR_i$, the greater that investigator's risk of investigator error.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) Compared with the conventional approach of re-interviewing survey objects and comparing the consistency of the two sets of survey data, the recording-check method saves the manpower and material resources consumed by a second survey, and avoids data discrepancies caused by the two surveys being conducted at different times.
(2) An outlier detection algorithm is introduced for the first time, and recording checks are carried out on the abnormal data it identifies, so that as much investigator error as possible is found and corrected at low cost; the approach has broad application value in large-scale epidemiological surveys.
(3) The method quantifies each investigator's contribution to overall investigator error, which helps target future measures to reduce investigator error and improve data quality.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples, but the embodiments of the present invention are not limited thereto.
Example 1:
Referring to FIG. 1, a method for evaluating investigator error in a large-scale face-to-face survey comprises four stages (baseline survey, outlier detection, recording check, and post-check analysis), specifically as follows:
In the first stage, the baseline survey, data are collected through face-to-face interviews that are recorded in full. Specifically, an electronic information platform is built to digitize the entire survey process, and the platform must include the following functional modules:
1) Data acquisition module (PAD side): collects questionnaire data and recordings on the PAD and uploads them when a network connection is available.
2) Data management module (computer side): generates a unique index for each survey object, through which its questionnaire data and recording file can be retrieved; survey objects meeting given conditions can be queried by keyword, and questionnaires and recordings can be exported in batches.
3) Quality control module (computer side): retrieves the questionnaire data and recording file of a specific survey object, lets the quality controller check the questionnaire while listening to the recording, and supports filling in a quality control report.
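The unique index that ties each survey object to its questionnaire and recording can be pictured as a simple lookup table. The sketch below is a minimal in-memory illustration only: the class names `SurveyRecord` and `SurveyIndex`, their fields, the keyword query over questionnaire answers and the use of random UUIDs as indexes are all hypothetical and do not describe the platform's actual implementation.

```python
from dataclasses import dataclass, field
import uuid


@dataclass
class SurveyRecord:
    """Hypothetical record linking one survey object to its data."""
    questionnaire: dict      # question id -> entered answer
    recording_path: str      # location of the interview audio file
    investigator_id: str     # investigator who conducted the interview
    survey_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class SurveyIndex:
    """In-memory stand-in for lookup by unique index."""

    def __init__(self):
        self._records = {}

    def add(self, record: SurveyRecord) -> str:
        # Each survey object must have exactly one index entry
        if record.survey_id in self._records:
            raise ValueError(f"duplicate index {record.survey_id}")
        self._records[record.survey_id] = record
        return record.survey_id

    def get(self, survey_id: str) -> SurveyRecord:
        return self._records[survey_id]

    def query(self, **criteria):
        """Return records whose questionnaire answers match every keyword criterion."""
        return [r for r in self._records.values()
                if all(r.questionnaire.get(k) == v for k, v in criteria.items())]
```

A quality controller's lookup would then be `index.get(survey_id)`, returning both the questionnaire answers and the recording path in one step.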
In the second stage, the questionnaire data are exported after the baseline survey is finished. After duplicate survey objects and survey objects with missing values are deleted, a multivariate outlier detection method, the Minimum Covariance Determinant (MCD) estimator computed with the Fast-MCD algorithm, is used to identify abnormal survey objects in the questionnaire data, specifically:
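The preprocessing described above (dropping duplicate survey objects and those with missing values) could look like the following plain-Python sketch. It assumes each row is a `(survey_id, values)` pair, that duplicates share a `survey_id` (the first occurrence is kept), and that `None` marks a missing answer; these conventions and the name `preprocess` are illustrative assumptions.

```python
def preprocess(rows):
    """Drop duplicate survey objects and any survey object with a missing value.

    rows: list of (survey_id, [v1, ..., vp]) pairs; None marks a missing value.
    Returns the cleaned rows with values converted to float.
    """
    seen = set()
    clean = []
    for survey_id, values in rows:
        if survey_id in seen:          # duplicate survey object: skip it
            continue
        seen.add(survey_id)
        if any(v is None for v in values):   # missing value: drop the survey object
            continue
        clean.append((survey_id, [float(v) for v in values]))
    return clean
```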
Step S21: the questionnaire data have n rows and p columns, i.e., n survey objects each described by p variables; h samples are drawn from the n survey objects, where h must satisfy
$$\frac{n+p+1}{2} \le h \le n.$$
The larger h is, the more efficient the MCD estimator, but the lower its robustness; to balance robustness and computational efficiency, h is set to 0.8n;
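Step S22: compute the sample mean $\hat{\mu}_1$, covariance matrix $\hat{\Sigma}_1$ and covariance determinant $\det(\hat{\Sigma}_1)$ of the h samples, whose index set is $H_1$:
$$\hat{\mu}_1 = \frac{1}{h}\sum_{i \in H_1} x_i, \qquad \hat{\Sigma}_1 = \frac{1}{h}\sum_{i \in H_1} (x_i - \hat{\mu}_1)(x_i - \hat{\mu}_1)^{\mathsf{T}},$$
where $x_i$ is the p-dimensional data vector of the i-th survey object; then, based on $\hat{\mu}_1$ and $\hat{\Sigma}_1$, compute the Mahalanobis distances of all n survey objects:
$$MD_i = \sqrt{(x_i - \hat{\mu}_1)^{\mathsf{T}} \hat{\Sigma}_1^{-1} (x_i - \hat{\mu}_1)}, \quad i = 1, \dots, n;$$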
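Step S22 can be sketched with NumPy as below. The covariance divides by h (`bias=True`), which is the usual raw MCD estimate but is an assumption here; the Mahalanobis distance is computed by solving a linear system instead of forming the inverse explicitly, which is numerically safer. The function names are hypothetical.

```python
import numpy as np


def subset_estimates(X: np.ndarray, idx: np.ndarray):
    """Sample mean, covariance matrix and its determinant for the rows X[idx]."""
    sub = X[idx]
    mu = sub.mean(axis=0)
    # bias=True divides by h; atleast_2d keeps the p = 1 case a 1x1 matrix
    sigma = np.atleast_2d(np.cov(sub, rowvar=False, bias=True))
    return mu, sigma, np.linalg.det(sigma)


def mahalanobis(X: np.ndarray, mu: np.ndarray, sigma: np.ndarray) -> np.ndarray:
    """Mahalanobis distance of every row of X from mu under covariance sigma."""
    diff = X - mu
    # Solve sigma @ z = diff^T rather than inverting sigma
    z = np.linalg.solve(sigma, diff.T).T
    # Row-wise dot product diff_i . z_i gives the squared distance
    return np.sqrt(np.einsum("ij,ij->i", diff, z))
```

With an identity covariance the distance reduces to the Euclidean distance, e.g. the point (3, 4) lies at distance 5 from the origin.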
Step S23: sort the Mahalanobis distances of the n survey objects in ascending order, select the h survey objects with the smallest distances (index set $H_2$), and compute their sample mean $\hat{\mu}_2$, covariance matrix $\hat{\Sigma}_2$ and covariance determinant $\det(\hat{\Sigma}_2)$, together with the corresponding Mahalanobis distances;
Step S24: iterate steps S21 to S23; if at the m-th iteration
$$\det(\hat{\Sigma}_m) = \det(\hat{\Sigma}_{m-1}),$$
the mean and covariance calculated from the m-th subset are taken as the final robust estimates of the mean and covariance, denoted $\hat{\mu}_{MCD}$ and $\hat{\Sigma}_{MCD}$;
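Steps S23 and S24 (keep the h closest survey objects, re-estimate, and stop when the determinant no longer decreases) can be sketched as the loop below. It follows a single random starting subset, as described here; the published FAST-MCD algorithm additionally runs these steps from many random starts and keeps the subset with the smallest determinant, so this is a simplified sketch. Helper names and the `max_iter` safeguard are assumptions; for production use, scikit-learn's `sklearn.covariance.MinCovDet` implements FAST-MCD.

```python
import numpy as np


def _estimates(X, idx):
    """Mean and divide-by-h covariance of the rows X[idx]."""
    sub = X[idx]
    return sub.mean(axis=0), np.atleast_2d(np.cov(sub, rowvar=False, bias=True))


def _mahalanobis(X, mu, sigma):
    diff = X - mu
    return np.sqrt(np.einsum("ij,ij->i", diff, np.linalg.solve(sigma, diff.T).T))


def mcd_iterate(X, h, rng, max_iter=100):
    """Single-start MCD concentration steps; returns (mu, sigma, subset indices)."""
    n = X.shape[0]
    idx = rng.choice(n, size=h, replace=False)     # S21: random h-subset
    mu, sigma = _estimates(X, idx)                 # S22: initial estimates
    det_prev = np.linalg.det(sigma)
    for _ in range(max_iter):
        d = _mahalanobis(X, mu, sigma)
        idx = np.argsort(d)[:h]                    # S23: h smallest distances
        mu, sigma = _estimates(X, idx)
        det_new = np.linalg.det(sigma)
        # S24: stop once the determinant no longer decreases
        if det_new >= det_prev or det_new == 0.0:
            break
        det_prev = det_new
    return mu, sigma, np.sort(idx)
```

Because each concentration step can only lower (or keep) the determinant, the loop normally terminates after a few iterations; `max_iter` only guards against floating-point oscillation.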
Step S25: based on the robust estimates, compute the robust Mahalanobis distances of all survey objects:
$$MD_{i,MCD} = \sqrt{(x_i - \hat{\mu}_{MCD})^{\mathsf{T}} \hat{\Sigma}_{MCD}^{-1} (x_i - \hat{\mu}_{MCD})}, \quad i = 1, \dots, n;$$
Step S26: the larger a survey object's $MD_{i,MCD}$, the more reasonable it is to judge it an outlier. Since the squared robust Mahalanobis distance approximately follows a chi-square distribution with p degrees of freedom,
$$MD_{i,MCD}^2 \overset{\text{approx.}}{\sim} \chi^2_p,$$
survey objects whose Mahalanobis distance exceeds a preset value such as
$$\sqrt{\chi^2_{p,\,1-\alpha}},$$
the square root of the $(1-\alpha)$ quantile of $\chi^2_p$ for a chosen significance level $\alpha$, are judged to be outliers;
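Steps S25 and S26 can be sketched as follows. The cutoff uses the 0.975 quantile of the chi-square distribution (alpha = 0.025), a common choice but an assumption here, and approximates that quantile with the Wilson-Hilferty formula so that only the standard library and NumPy are needed; `scipy.stats.chi2.ppf(q, p)` gives the exact value if SciPy is available. Function names are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def chi2_quantile_wh(q: float, df: int) -> float:
    """Wilson-Hilferty approximation of the q-quantile of chi-square(df)."""
    z = NormalDist().inv_cdf(q)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3


def flag_outliers(X, mu_mcd, sigma_mcd, q=0.975):
    """Robust Mahalanobis distances (S25) and outlier flags (S26)."""
    diff = X - mu_mcd
    # Row-wise quadratic form diff_i^T Sigma^{-1} diff_i without an explicit inverse
    md = np.sqrt(np.einsum("ij,ij->i", diff, np.linalg.solve(sigma_mcd, diff.T).T))
    cutoff = math.sqrt(chi2_quantile_wh(q, X.shape[1]))
    return md, md > cutoff
```

The approximation is close for moderate p: for p = 10 it gives about 20.48, essentially the exact 0.975 quantile.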
Step S3: perform recording checks on the outlier survey objects according to the investigator error evaluation rules:
a quality controller logs in to the electronic information platform, retrieves the questionnaire data and recording file of each outlier survey object by its unique index, and judges whether the questionnaire data agree with the recording. If they do not agree, the investigator failed to accurately capture and record the survey object's answers, i.e., an investigator error exists. The check result is recorded and classified into one of five types: correct, wrong questioning mode, question asked wrongly or not asked, entry error, and cannot be verified; wrong questioning mode, question asked wrongly or not asked, and entry error indicate investigator error. The specific investigator error evaluation rules are given in Table 1:
[Table 1 image not reproduced in this text]
Table 1 Investigator error assessment rules
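The five check outcomes, and the rule that three of them count as investigator error, can be encoded as below. The detailed judgement criteria of Table 1 are not reproduced in this text, so this sketch captures only the classification itself; the enum and function names are hypothetical.

```python
from enum import Enum


class CheckResult(Enum):
    """Five possible outcomes of checking a question against its recording (S3)."""
    CORRECT = "correct"
    WRONG_QUESTIONING_MODE = "wrong questioning mode"
    ASKED_WRONGLY_OR_NOT_ASKED = "asked wrongly / not asked"
    ENTRY_ERROR = "entry error"
    CANNOT_VERIFY = "cannot be verified"


# Only these three outcomes count as investigator error
INVESTIGATOR_ERRORS = frozenset({
    CheckResult.WRONG_QUESTIONING_MODE,
    CheckResult.ASKED_WRONGLY_OR_NOT_ASKED,
    CheckResult.ENTRY_ERROR,
})


def is_investigator_error(result: CheckResult) -> bool:
    return result in INVESTIGATOR_ERRORS
```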
Step S4: based on the recording-check data, construct an error occurrence rate index and an error contribution rate index to evaluate the occurrence of investigator error. The Error Rate (ER) index reflects both the overall occurrence of investigator error and the probability of each type of investigator error; it is calculated as
$$ER = \frac{N_{\text{err}}}{N_{\text{chk}}}, \qquad ER_{\text{mode}} = \frac{N_{\text{mode}}}{N_{\text{chk}}}, \qquad ER_{\text{ask}} = \frac{N_{\text{ask}}}{N_{\text{chk}}}, \qquad ER_{\text{entry}} = \frac{N_{\text{entry}}}{N_{\text{chk}}},$$
with $N_{\text{chk}}$ the number of questions checked against the recordings and $N_{\text{mode}}$, $N_{\text{ask}}$, $N_{\text{entry}}$ the numbers of questions with a wrong questioning mode, a question asked wrongly or not asked, and an entry error, respectively.
Each investigator may interview one or more survey objects; each investigator's performance is evaluated by calculating that investigator's error occurrence rates over the questions checked for investigator i:
$$ER_i = \frac{N_{i,\text{err}}}{N_{i,\text{chk}}}, \qquad ER_{i,\text{mode}} = \frac{N_{i,\text{mode}}}{N_{i,\text{chk}}}, \qquad ER_{i,\text{ask}} = \frac{N_{i,\text{ask}}}{N_{i,\text{chk}}}, \qquad ER_{i,\text{entry}} = \frac{N_{i,\text{entry}}}{N_{i,\text{chk}}},$$
where the number of questions with investigator error is the number with a wrong questioning mode plus the number asked wrongly or not asked plus the number with an entry error:
$$N_{\text{err}} = N_{\text{mode}} + N_{\text{ask}} + N_{\text{entry}}.$$
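The error occurrence rates can be computed from per-question check results. This sketch assumes one `(investigator_id, result_label)` pair per checked question and assumes that every checked question, including those that cannot be verified, counts in the denominator; the text does not settle that point. Labels and function names are hypothetical.

```python
from collections import Counter, defaultdict

ERROR_TYPES = ("wrong questioning mode", "asked wrongly / not asked", "entry error")


def error_rates(checks):
    """Overall and type-specific error occurrence rates over all checked questions."""
    checks = list(checks)
    total = len(checks)
    counts = Counter(label for _, label in checks)
    rates = {t: counts[t] / total for t in ERROR_TYPES}
    # N_err = N_mode + N_ask + N_entry
    rates["overall"] = sum(counts[t] for t in ERROR_TYPES) / total
    return rates


def investigator_error_rates(checks):
    """ER_i: share of each investigator's checked questions with investigator error."""
    per_inv = defaultdict(list)
    for inv, label in checks:
        per_inv[inv].append(label)
    return {inv: sum(label in ERROR_TYPES for label in labels) / len(labels)
            for inv, labels in per_inv.items()}
```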
The method further comprises step S5: based on the calculated investigator error occurrence rates, further analyze the epidemiological characteristics of investigator error across investigators by exploring its distribution pattern and aggregation pattern; the distribution pattern is shown with a probability density plot, and the aggregation pattern is used to explore whether investigator error is concentrated among a subset of investigators.
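The probability density plot of investigator error rates in step S5 can be drawn from a Gaussian kernel density estimate. This sketch assumes a Gaussian kernel and the normal-reference bandwidth 1.06 * s * n^(-1/5); the text specifies neither, and the returned values would normally be handed to a plotting library. The function name is hypothetical.

```python
import numpy as np


def kde_density(values, grid, bandwidth=None):
    """Gaussian kernel density estimate of error rates evaluated on a grid."""
    x = np.asarray(values, dtype=float)
    n = x.size
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption)
        bandwidth = 1.06 * x.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance of every grid point to every observation
    u = (np.asarray(grid, dtype=float)[:, None] - x[None, :]) / bandwidth
    return np.exp(-0.5 * u**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))
```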
This study is the first to propose an error contribution rate index reflecting the aggregation tendency of investigator error. Because investigators interview different numbers of survey objects, an investigator who interviews more survey objects tends to accumulate more questions with investigator error and therefore a larger raw share of total error. This study therefore proposes a standardized procedure that estimates each investigator's error contribution rate from error rates rather than error counts, specifically:
from the calculated investigator error occurrence rates $ER_i$ of the different investigators, the investigator error contribution rate of each investigator is further calculated as
$$ECR_i = \frac{ER_i}{\sum_{j=1}^{k} ER_j},$$
where k is the number of investigators and $ER_i$, $ER_j$ are the investigator error occurrence rates of the i-th and j-th investigators. The larger $ECR_i$, the greater this investigator's risk of investigator error.
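The error contribution rate normalizes the investigator-level error rates so that they sum to one across the k investigators. A minimal sketch, with a hypothetical function name and the added assumption that all-zero rates yield zero contributions:

```python
def error_contribution_rates(er_by_investigator):
    """ECR_i = ER_i / sum_j ER_j, using rates rather than error counts."""
    total = sum(er_by_investigator.values())
    if total == 0:
        # No investigator error at all: nobody contributes (an assumption)
        return {inv: 0.0 for inv in er_by_investigator}
    return {inv: er / total for inv, er in er_by_investigator.items()}
```

Sorting the result in descending order, e.g. `sorted(ecr.items(), key=lambda kv: kv[1], reverse=True)`, lists the investigators with the highest risk of investigator error first.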
A sensitivity analysis is also carried out to check the robustness of the data quality evaluation: a small number of non-outlier survey objects are randomly sampled, and their recording checks and analysis are completed in the same way. Finally, the recording-check results of the outlier sample and the non-outlier sample are compared to evaluate the robustness of the findings.
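The sensitivity analysis can be sketched as drawing a random sample of non-outlier survey objects and comparing the share of checked questions with investigator error in the two groups. The sample size `k` and `seed` are hypothetical parameters, since the text only says "a small number"; comparing simple error shares is one straightforward way to make the comparison, not necessarily the authors' own.

```python
import random


def sensitivity_sample(non_outlier_ids, k, seed=None):
    """Randomly draw k non-outlier survey objects for recording checks."""
    rng = random.Random(seed)
    return rng.sample(list(non_outlier_ids), k)


def compare_error_rates(outlier_checks, sample_checks, error_labels):
    """Share of checked questions with investigator error in each group."""
    def rate(checks):
        checks = list(checks)
        return sum(label in error_labels for label in checks) / len(checks)
    return rate(outlier_checks), rate(sample_checks)
```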
Although the invention has been described herein with reference to the illustrated embodiments, which are only preferred embodiments of the invention, the invention is not limited thereto; many other modifications and embodiments falling within the spirit and scope of the principles of this disclosure will be apparent to those skilled in the art.

Claims (3)

1. A method for assessing investigator error in a large-scale face-to-face survey, comprising:
Step S1: acquire the questionnaire data and audio recordings of a baseline survey through an electronic information platform, and generate an index of the questionnaire data and recordings for each survey object;
Step S2: after the baseline survey is finished, export the questionnaire data, preprocess them, and identify outlier survey objects with the Fast-MCD algorithm, specifically:
Step S21: the questionnaire data have n rows and p columns, i.e., n survey objects each described by p variables; h samples are drawn from the n survey objects, where h must satisfy
$$\frac{n+p+1}{2} \le h \le n;$$
h is set to 0.8n;
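Step S22: compute the sample mean $\hat{\mu}_1$, covariance matrix $\hat{\Sigma}_1$ and covariance determinant $\det(\hat{\Sigma}_1)$ of the h samples, whose index set is $H_1$:
$$\hat{\mu}_1 = \frac{1}{h}\sum_{i \in H_1} x_i, \qquad \hat{\Sigma}_1 = \frac{1}{h}\sum_{i \in H_1} (x_i - \hat{\mu}_1)(x_i - \hat{\mu}_1)^{\mathsf{T}},$$
where $x_i$ is the p-dimensional data vector of the i-th survey object; then, based on $\hat{\mu}_1$ and $\hat{\Sigma}_1$, compute the Mahalanobis distances of all n survey objects:
$$MD_i = \sqrt{(x_i - \hat{\mu}_1)^{\mathsf{T}} \hat{\Sigma}_1^{-1} (x_i - \hat{\mu}_1)}, \quad i = 1, \dots, n;$$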
Step S23: sort the Mahalanobis distances of the n survey objects in ascending order, select the h survey objects with the smallest distances (index set $H_2$), and compute their sample mean $\hat{\mu}_2$, covariance matrix $\hat{\Sigma}_2$ and covariance determinant $\det(\hat{\Sigma}_2)$, together with the corresponding Mahalanobis distances;
Step S24: iterate steps S21 to S23; if at the m-th iteration
$$\det(\hat{\Sigma}_m) = \det(\hat{\Sigma}_{m-1}),$$
the mean and covariance calculated from the m-th subset are taken as the final robust estimates of the mean and covariance, denoted $\hat{\mu}_{MCD}$ and $\hat{\Sigma}_{MCD}$;
Step S25: based on the robust estimates, compute the Mahalanobis distances of all survey objects:
$$MD_{i,MCD} = \sqrt{(x_i - \hat{\mu}_{MCD})^{\mathsf{T}} \hat{\Sigma}_{MCD}^{-1} (x_i - \hat{\mu}_{MCD})}, \quad i = 1, \dots, n;$$
Step S26: judge survey objects whose Mahalanobis distance exceeds a preset value to be outliers;
Step S3: perform recording checks on the outlier survey objects according to the investigator error evaluation rules:
a quality controller logs in to the electronic information platform, retrieves the questionnaire data and recording file of each outlier survey object by its unique index, and judges whether the questionnaire data agree with the recording. If they do not agree, the investigator failed to accurately capture and record the survey object's answers, i.e., an investigator error exists. The check result is recorded and classified into one of five types: correct, wrong questioning mode, question asked wrongly or not asked, entry error, and cannot be verified; wrong questioning mode, question asked wrongly or not asked, and entry error indicate investigator error;
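Step S4: based on the recording-check data, construct an error occurrence rate index and an error contribution rate index to evaluate the occurrence of investigator error, where the overall and type-specific error occurrence rates are
$$ER = \frac{N_{\text{err}}}{N_{\text{chk}}}, \qquad ER_{\text{mode}} = \frac{N_{\text{mode}}}{N_{\text{chk}}}, \qquad ER_{\text{ask}} = \frac{N_{\text{ask}}}{N_{\text{chk}}}, \qquad ER_{\text{entry}} = \frac{N_{\text{entry}}}{N_{\text{chk}}},$$
with $N_{\text{chk}}$ the number of questions checked against the recordings and $N_{\text{mode}}$, $N_{\text{ask}}$, $N_{\text{entry}}$ the numbers of questions with a wrong questioning mode, a question asked wrongly or not asked, and an entry error, respectively.
Each investigator may interview one or more survey objects; each investigator's performance is evaluated by calculating that investigator's error occurrence rates over the questions checked for investigator i:
$$ER_i = \frac{N_{i,\text{err}}}{N_{i,\text{chk}}}, \qquad ER_{i,\text{mode}} = \frac{N_{i,\text{mode}}}{N_{i,\text{chk}}}, \qquad ER_{i,\text{ask}} = \frac{N_{i,\text{ask}}}{N_{i,\text{chk}}}, \qquad ER_{i,\text{entry}} = \frac{N_{i,\text{entry}}}{N_{i,\text{chk}}},$$
where the number of questions with investigator error is the number with a questioning mode error plus the number asked wrongly or not asked plus the number with an entry error:
$$N_{\text{err}} = N_{\text{mode}} + N_{\text{ask}} + N_{\text{entry}}.$$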
2. The method for assessing investigator error in a large-scale face-to-face survey according to claim 1, further comprising:
step S5: based on the calculated investigator error occurrence rates, further analyze the epidemiological characteristics of investigator error across investigators by exploring its distribution pattern and aggregation pattern; the distribution pattern is shown with a probability density plot, and the aggregation pattern is used to explore whether investigator error is concentrated among a subset of investigators.
3. The method for assessing investigator error in a large-scale face-to-face survey according to claim 2, further comprising:
step S6: from the calculated investigator error occurrence rates $ER_i$ of the different investigators, calculate the investigator error contribution rate of each investigator as
$$ECR_i = \frac{ER_i}{\sum_{j=1}^{k} ER_j},$$
where k is the number of investigators and $ER_i$, $ER_j$ are the investigator error occurrence rates of the i-th and j-th investigators. The larger $ECR_i$, the greater that investigator's risk of investigator error.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110593435.1A CN115410718B (en) 2021-05-28 Method for evaluating error of investigator in large-scale face-to-face investigation

Publications (2)

Publication Number Publication Date
CN115410718A 2022-11-29
CN115410718B 2023-04-18

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014071776A1 (en) * 2012-11-06 2014-05-15 中兴通讯股份有限公司 Method and system for evaluating quality of experience of communication service users
CN107169734A (en) * 2017-05-10 2017-09-15 美亚联创(北京)科技有限公司 A kind of social investigation management system
JP2019100011A (en) * 2017-11-29 2019-06-24 清水建設株式会社 Determination method for positioning subsoil exploration, determination device, subsoil estimation method and subsoil estimation device

Non-Patent Citations (2)

Title
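刘佳丽, 王薇, 何法霖, 钟堃, 袁帅, 张志新, 杜雨轩, 王治国. Investigation and analysis of the imprecision of internal quality control of serum procalcitonin in 566 clinical laboratories nationwide (全国566家临床实验室血清降钙素原室内质量控制不精密度调查与分析). Journal of Modern Laboratory Medicine (现代检验医学杂志), 2018, issue 02, full text. *
李婷婷, 王薇, 赵海建, 何法霖, 钟堃, 袁帅, 王治国. A suggested approach to interpreting external quality assessment results of clinical laboratories and investigating the causes of unacceptable results (临床实验室室间质量评价结果解释及不合格结果原因调查的建议性方法). Journal of Modern Laboratory Medicine (现代检验医学杂志), 2017, issue 05, full text. *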

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant