CN115410718B - Method for evaluating error of investigator in large-scale face-to-face investigation - Google Patents
- Publication number
- CN115410718B (application number CN202110593435.1A)
- Authority
- CN
- China
- Prior art keywords
- error
- investigator
- survey
- data
- wrong
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention discloses a method for evaluating investigator error in large-scale face-to-face surveys, comprising: obtaining questionnaire data and recording data from a baseline survey; preprocessing the questionnaire data and then identifying outlier survey objects with the Fast-MCD algorithm; checking the recordings of the outlier survey objects according to the investigator error evaluation rules and recording the check results, which are classified into five types: correct, wrong questioning manner, wrong question asked/question not asked, wrong entry, and unverifiable, of which wrong questioning manner, wrong question asked/question not asked, and wrong entry constitute investigator error; and constructing an error occurrence rate index and an error contribution rate index from the recording check data to evaluate the occurrence of investigator error. An outlier detection algorithm is introduced and recording checks are carried out on the abnormal data it identifies, so that investigator errors are found and corrected as far as possible at low cost; each investigator's contribution to investigator error is quantified, and data quality is improved.
Description
Technical Field
The invention relates to the technical field of data quality control, in particular to a method for evaluating error of an investigator in large-scale face-to-face investigation.
Background
In large epidemiological surveys, information is often collected by means of face-to-face surveys. However, the data collection method of face-to-face investigation inevitably introduces investigator errors, and further influences the data quality and the reliability of research results. Conventional epidemiological surveys focus on data quality control through improvements in survey design, enhanced training of investigators, and the like, but the conventional data quality control measures cannot guarantee data quality due to lack of feasible data quality assessment means and limited manpower and material resources.
Disclosure of Invention
The invention aims to provide a method for evaluating investigator error in large-scale face-to-face surveys, to solve the problems in the prior art that data quality control lacks a means of data quality assessment and that, with limited manpower and material resources, data quality control measures cannot guarantee data quality.
The invention solves the problems through the following technical scheme:
a method of assessing investigator error in a large face-to-face investigation, comprising:
step S1: acquiring questionnaire data and recording data of a baseline survey through an electronic information platform, and generating indexes of the questionnaire data and the recording data according to survey objects;
step S2: after the baseline survey is finished, the questionnaire data are exported and preprocessed, and outlier survey objects are then identified with the Fast-MCD algorithm, specifically comprising the following steps:
step S21: the questionnaire data comprise n rows and p columns, i.e., n survey objects each with information on p variables; h sample data are then drawn from the n survey objects, where h must satisfy $(n+p+1)/2 \le h \le n$; to balance robustness and computational efficiency, h is set to 0.8n;
step S22: calculating the sample mean $\hat{\mu}_1$, covariance matrix $\hat{\Sigma}_1$ and covariance determinant $\det(\hat{\Sigma}_1)$ of the h sample data; based on $\hat{\mu}_1$ and $\hat{\Sigma}_1$, the Mahalanobis distances of the n survey objects are then calculated: $MD_{i,1} = \sqrt{(x_i - \hat{\mu}_1)^{\top} \hat{\Sigma}_1^{-1} (x_i - \hat{\mu}_1)}$, $i = 1, \dots, n$, where $x_i$ is the p-dimensional data vector of the i-th survey object;
step S23: sorting the Mahalanobis distances of the n survey objects from small to large, selecting the h survey objects with the smallest distances, and calculating their sample mean $\hat{\mu}_2$, covariance matrix $\hat{\Sigma}_2$, covariance determinant $\det(\hat{\Sigma}_2)$ and the Mahalanobis distances of the h survey objects;
step S24: performing iterative calculation according to steps S21 to S23; if at the m-th iteration $\det(\hat{\Sigma}_m) = \det(\hat{\Sigma}_{m-1})$, the mean and covariance calculated from the m-th sample are taken as the final robust estimates of the mean and covariance, denoted $\hat{\mu}_{MCD}$ and $\hat{\Sigma}_{MCD}$;
step S25: based on the robust estimates, Mahalanobis distances are calculated for all survey objects: $MD_{i,MCD} = \sqrt{(x_i - \hat{\mu}_{MCD})^{\top} \hat{\Sigma}_{MCD}^{-1} (x_i - \hat{\mu}_{MCD})}$;
step S26: judging the survey objects whose Mahalanobis distance is larger than a preset value to be outliers;
step S3: performing a recording check on the outlier survey objects according to the investigator error evaluation rules:
a quality controller logs in to the electronic information platform, retrieves the questionnaire data and recording file of the survey object corresponding to each outlier by its unique index, and judges whether the questionnaire data and the recording are consistent; if they are not, the investigator failed to accurately capture and record the survey object's answers, i.e., investigator error exists; the check result is recorded and classified into five types: correct, wrong questioning manner, wrong question asked/question not asked, wrong entry, and unverifiable, of which wrong questioning manner, wrong question asked/question not asked, and wrong entry constitute investigator error;
step S4: based on the recording check data, an error occurrence rate index and an error contribution rate index are constructed to evaluate the occurrence of investigator error:
different investigators may each survey one or more survey objects, and each investigator's survey performance is evaluated by calculating the investigator error occurrence rate separately for each investigator, which is calculated as follows:
the number of questions with investigator error = the number of questions with a wrong questioning manner + the number of questions wrongly asked or not asked + the number of questions with a wrong entry.
Further comprising step S5: based on the calculated investigator error occurrence rates, further analyzing the prevalence characteristics of investigator error across different investigators, and exploring its distribution pattern and aggregation pattern; the distribution pattern is shown by a probability density plot; the aggregation pattern is used to explore whether investigator error is concentrated among a subset of investigators.
Further comprising step S6: from the calculated investigator error occurrence rates $ER_i$ of the different investigators, the investigator error contribution rate of each investigator is further calculated as $ECR_i = ER_i / \sum_{j=1}^{k} ER_j$, where k denotes the number of investigators, and $ER_i$ and $ER_j$ denote the investigator error occurrence rates of the i-th and j-th investigators respectively; the larger $ECR_i$ is, the greater that investigator's risk of investigator error.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) Compared with the conventional approach of repeating the survey and comparing the consistency of the two sets of survey data, the recording check method saves the manpower and material resources consumed by a second survey, and avoids data differences caused by the two surveys being conducted at different times.
(2) An outlier detection algorithm is introduced for the first time, and recording checks are carried out on the abnormal data it identifies, so that investigator errors are found and corrected as far as possible at low cost; this has substantial application value in large-scale epidemiological surveys.
(3) The method quantifies each investigator's contribution to investigator error, which helps in taking targeted measures to reduce investigator error in the future and improves data quality.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples, but the embodiments of the present invention are not limited thereto.
Example 1:
referring to fig. 1, a method for evaluating investigator error in a large-scale face-to-face survey comprises four stages: baseline survey, outlier detection, recording check, and post-check analysis, and specifically includes the following steps:
in the first step, during the baseline survey, data are collected through face-to-face interviews that are audio-recorded in full. Specifically, an electronic information platform is built to digitize the whole survey process, and the platform needs to include the following functional modules:
1) Data acquisition module (tablet (PAD) side): questionnaire data and recording data are collected on the tablet and uploaded when a network connection is available.
2) Data management module (computer side): generates a unique index for each survey object, through which the survey object's questionnaire data and recording file can be retrieved; survey objects meeting given conditions can also be queried by keyword, and questionnaire and recording data can be exported in batches.
3) Quality control module (computer side): retrieves the questionnaire data and recording file of a specific survey object, so that the questionnaire can be checked while listening to the recording and a quality control report can be filled in.
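The lookup role of the unique index can be illustrated with a minimal sketch. The class and field names below (`SurveyRecord`, `SurveyIndex`, `recording_path`, and so on) are hypothetical and are not taken from the patent; the sketch only assumes that each survey object has one questionnaire and one recording file tied together by a unique index.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SurveyRecord:
    """One survey object's questionnaire and recording (hypothetical layout)."""
    index: str                # unique index generated per survey object
    investigator_id: str      # investigator who conducted the interview
    answers: Dict[str, str]   # question id -> recorded answer
    recording_path: str       # location of the full-interview audio file


class SurveyIndex:
    """In-memory stand-in for the data management module's index lookup."""

    def __init__(self) -> None:
        self._records: Dict[str, SurveyRecord] = {}

    def add(self, record: SurveyRecord) -> None:
        # The index must be unique, so a repeated index is rejected.
        if record.index in self._records:
            raise ValueError(f"duplicate index: {record.index}")
        self._records[record.index] = record

    def get(self, index: str) -> Optional[SurveyRecord]:
        """Retrieve questionnaire data and recording path by unique index."""
        return self._records.get(index)

    def export(self, indices: List[str]) -> List[SurveyRecord]:
        """Batch export: return the records found for the requested indices."""
        return [self._records[i] for i in indices if i in self._records]
```

A quality controller's retrieval step then reduces to `index.get(...)` for each survey object flagged for checking.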
In the second step, the questionnaire data are exported after the baseline survey is finished. After duplicate survey objects and survey objects with missing values are deleted, a multivariate outlier detection algorithm, the Minimum Covariance Determinant (MCD) method (i.e., the Fast-MCD algorithm), is used to identify abnormal survey objects in the questionnaire data, specifically comprising the following steps:
step S21: the questionnaire data comprise n rows and p columns, i.e., n survey objects each with information on p variables; h sample data are then drawn from the n survey objects, where h must satisfy $(n+p+1)/2 \le h \le n$. The larger h is, the higher the efficiency of the MCD estimator but the lower its robustness; to balance robustness and computational efficiency, h is set to 0.8n;
step S22: calculating the sample mean $\hat{\mu}_1$, covariance matrix $\hat{\Sigma}_1$ and covariance determinant $\det(\hat{\Sigma}_1)$ of the h sample data; based on $\hat{\mu}_1$ and $\hat{\Sigma}_1$, the Mahalanobis distances of the n survey objects are then calculated: $MD_{i,1} = \sqrt{(x_i - \hat{\mu}_1)^{\top} \hat{\Sigma}_1^{-1} (x_i - \hat{\mu}_1)}$, $i = 1, \dots, n$, where $x_i$ is the p-dimensional data vector of the i-th survey object;
step S23: sorting the Mahalanobis distances of the n survey objects from small to large, selecting the h survey objects with the smallest distances, and calculating their sample mean $\hat{\mu}_2$, covariance matrix $\hat{\Sigma}_2$, covariance determinant $\det(\hat{\Sigma}_2)$ and the Mahalanobis distances of the h survey objects;
step S24: performing iterative calculation according to steps S21 to S23; if at the m-th iteration $\det(\hat{\Sigma}_m) = \det(\hat{\Sigma}_{m-1})$, the mean and covariance calculated from the m-th sample are taken as the final robust estimates of the mean and covariance, denoted $\hat{\mu}_{MCD}$ and $\hat{\Sigma}_{MCD}$;
step S25: based on the robust estimates, Mahalanobis distances are calculated for all survey objects: $MD_{i,MCD} = \sqrt{(x_i - \hat{\mu}_{MCD})^{\top} \hat{\Sigma}_{MCD}^{-1} (x_i - \hat{\mu}_{MCD})}$;
step S26: the larger the $MD_{i,MCD}$ of a survey object, the more reasonable it is to judge it an outlier. Since the squared robust Mahalanobis distance approximately follows a chi-square distribution with p degrees of freedom, $\chi^2_p$, survey objects whose Mahalanobis distance exceeds a preset value, such as a cutoff taken from a quantile of $\chi^2_p$, are determined to be outliers;
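Steps S21 to S26 can be sketched in plain Python as follows. This is a simplified illustration and not the full Fast-MCD algorithm: it runs the concentration iteration from a handful of random starting subsets and keeps the estimate with the smallest determinant, and it omits the nested-subset and reweighting refinements of the published algorithm. The function names, the `n_starts`, `max_iter` and `seed` parameters, the lower bound on h (the usual MCD constraint) and the `cutoff` value are assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def mean_cov(rows: Sequence[Sequence[float]]) -> Tuple[List[float], Matrix]:
    """Sample mean vector and unbiased covariance matrix of the given rows."""
    m, p = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / m for j in range(p)]
    cov = [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (m - 1)
            for b in range(p)] for a in range(p)]
    return mu, cov


def inverse_and_det(mat: Matrix) -> Tuple[Matrix, float]:
    """Gauss-Jordan elimination with partial pivoting; raises if singular."""
    p = len(mat)
    aug = [list(row) + [float(i == j) for j in range(p)]
           for i, row in enumerate(mat)]
    det = 1.0
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det
        piv = aug[col][col]
        det *= piv
        aug[col] = [v / piv for v in aug[col]]
        for r in range(p):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[p:] for row in aug], det


def mahalanobis(x: Sequence[float], mu: Sequence[float], inv: Matrix) -> float:
    d = [xi - mi for xi, mi in zip(x, mu)]
    q = sum(d[a] * inv[a][b] * d[b] for a in range(len(d)) for b in range(len(d)))
    return math.sqrt(max(q, 0.0))


def robust_distances(data: Sequence[Sequence[float]], h_fraction: float = 0.8,
                     n_starts: int = 20, max_iter: int = 100,
                     seed: int = 0) -> List[float]:
    """Robust Mahalanobis distance of every row (simplified MCD sketch)."""
    n, p = len(data), len(data[0])
    h = max(int(h_fraction * n), (n + p + 1) // 2)   # S21: h = 0.8n, bounded below
    rng = random.Random(seed)
    best = None                                       # (det, mean, inverse cov)
    for _ in range(n_starts):
        subset = rng.sample(range(n), h)              # S21: draw h sample data
        prev_det = math.inf
        for _ in range(max_iter):
            mu, cov = mean_cov([data[i] for i in subset])        # S22
            try:
                inv, det = inverse_and_det(cov)
            except ValueError:
                break                                 # degenerate subset: skip start
            if det >= prev_det:                       # S24: determinant stopped falling
                break
            prev_det = det
            if best is None or det < best[0]:
                best = (det, mu, inv)
            dist = [mahalanobis(x, mu, inv) for x in data]
            subset = sorted(range(n), key=dist.__getitem__)[:h]  # S23: h smallest
    if best is None:
        raise ValueError("no non-singular subset found")
    _, mu, inv = best
    return [mahalanobis(x, mu, inv) for x in data]    # S25


def flag_outliers(distances: Sequence[float], cutoff: float) -> List[int]:
    """S26: indices whose robust distance exceeds the preset cutoff."""
    return [i for i, d in enumerate(distances) if d > cutoff]
```

In practice a maintained implementation such as scikit-learn's `MinCovDet` would normally be preferred; the pure-Python version is only meant to make the iteration in S22 to S24 concrete.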
step S3: performing a recording check on the outlier survey objects according to the investigator error evaluation rules:
a quality controller logs in to the electronic information platform, retrieves the questionnaire data and recording file of the survey object corresponding to each outlier by its unique index, and judges whether the questionnaire data and the recording are consistent; if they are not, the investigator failed to accurately capture and record the survey object's answers, i.e., investigator error exists; the check result is recorded and classified into five types: correct, wrong questioning manner, wrong question asked/question not asked, wrong entry, and unverifiable, of which wrong questioning manner, wrong question asked/question not asked, and wrong entry constitute investigator error; the specific investigator error evaluation rules are shown in table 1 below:
TABLE 1 investigator error assessment rules
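The five check results and the rule for which of them count as investigator error can be encoded directly. The sketch below is illustrative only: the enum member names are hypothetical labels for the five categories named in the text, and the detailed criteria for assigning a question to a category are not reproduced here.

```python
from enum import Enum


class CheckResult(Enum):
    """The five recording-check outcomes named in the text (labels are illustrative)."""
    CORRECT = "correct"
    WRONG_QUESTIONING_MANNER = "wrong questioning manner"
    WRONG_OR_OMITTED_QUESTION = "wrong question asked/question not asked"
    WRONG_ENTRY = "wrong entry"
    UNVERIFIABLE = "unverifiable"


# Only these three outcomes are treated as investigator error.
INVESTIGATOR_ERRORS = frozenset({
    CheckResult.WRONG_QUESTIONING_MANNER,
    CheckResult.WRONG_OR_OMITTED_QUESTION,
    CheckResult.WRONG_ENTRY,
})


def is_investigator_error(result: CheckResult) -> bool:
    """True when a check result counts toward investigator error."""
    return result in INVESTIGATOR_ERRORS
```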
step S4: based on the recording check data, an error occurrence rate index and an error contribution rate index are constructed to evaluate the occurrence of investigator error. The Error Rate (ER) index reflects both the overall occurrence of investigator error and the probability of each type of investigator error, and is calculated as follows:
different investigators may each survey one or more survey objects, and each investigator's survey performance is evaluated by calculating the investigator error occurrence rate separately for each investigator, which is calculated as follows:
the number of questions with investigator error = the number of questions with a wrong questioning manner + the number of questions wrongly asked or not asked + the number of questions with a wrong entry.
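A per-investigator error rate could be computed along the following lines. The numerator follows the definition above; the denominator is an assumption, taken here to be all questions checked for that investigator (including unverifiable ones), since the exact formula is not legible in this text. The result labels and the function name `error_rates` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Check results counted as investigator error (labels are illustrative).
ERROR_TYPES = {"wrong_manner", "wrong_or_omitted", "wrong_entry"}


def error_rates(checks: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Map investigator id -> share of checked questions with investigator error.

    `checks` yields (investigator_id, check_result) pairs, one per checked question.
    Assumption: the denominator is every checked question for that investigator.
    """
    totals: Counter = Counter()
    errors: Counter = Counter()
    for investigator, result in checks:
        totals[investigator] += 1
        if result in ERROR_TYPES:
            errors[investigator] += 1
    return {inv: errors[inv] / totals[inv] for inv in totals}
```

If unverifiable questions should be excluded from the denominator, they can simply be filtered out of `checks` before calling the function.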
Further comprising step S5: based on the calculated investigator error occurrence rates, further analyzing the prevalence characteristics of investigator error across different investigators, and exploring its distribution pattern and aggregation pattern; the distribution pattern is shown by a probability density plot; the aggregation pattern is used to explore whether investigator error is concentrated among a subset of investigators.
This study is the first to propose an error contribution rate index to reflect the aggregation tendency of investigator error. Because different investigators survey different numbers of survey objects, an investigator who surveys more objects will tend to have more questions with investigator error and therefore a larger raw contribution to investigator error. This study therefore proposes a standardized procedure for estimating each investigator's error contribution rate, specifically as follows:
from the calculated investigator error occurrence rates $ER_i$ of the different investigators, the investigator error contribution rate of each investigator is further calculated as $ECR_i = ER_i / \sum_{j=1}^{k} ER_j$, where k denotes the number of investigators, and $ER_i$ and $ER_j$ denote the investigator error occurrence rates of the i-th and j-th investigators respectively; the larger $ECR_i$ is, the greater that investigator's risk of investigator error.
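The standardization can be sketched as below, assuming the contribution rate of investigator i is that investigator's error rate divided by the sum of the error rates of all k investigators. That form is a reconstruction from the variable definitions in the text (k, ER_i, ER_j) and is not a confirmed reading of the original formula; the function name is hypothetical.

```python
from typing import Dict


def contribution_rates(er: Dict[str, float]) -> Dict[str, float]:
    """Map investigator id -> ER_i divided by the sum of all ER_j.

    Because each ER is already a rate, investigators who surveyed more
    survey objects do not get a larger share merely through volume.
    """
    total = sum(er.values())
    if total == 0:
        # No investigator error observed anywhere: every share is zero.
        return {inv: 0.0 for inv in er}
    return {inv: rate / total for inv, rate in er.items()}
```

With rates of 0.2, 0.1 and 0.1 for three investigators, the first accounts for half of the standardized error and the other two for a quarter each.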
Sensitivity analysis is also carried out to ensure the robustness of the data quality evaluation results. Specifically, a small number of non-outlier individuals are randomly selected, and the recording check and analysis are completed for them. Finally, the recording check results of the outlier sample and the non-outlier sample are compared to evaluate the robustness of the findings.
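The sampling and comparison in the sensitivity analysis might look like the following sketch. The sample size, the seed, and the choice to compare simple error proportions between the two groups are assumptions for illustration; the text does not specify how the two sets of check results are compared.

```python
import random
from typing import Dict, Iterable, List, Sequence, Set


def sample_non_outliers(indices: Iterable[int], outliers: Set[int],
                        size: int, seed: int = 0) -> List[int]:
    """Randomly draw up to `size` survey objects that were not flagged as outliers."""
    pool = sorted(i for i in indices if i not in outliers)
    return random.Random(seed).sample(pool, min(size, len(pool)))


def error_share(results: Sequence[bool]) -> float:
    """Proportion of checked questions marked as investigator error."""
    return sum(results) / len(results) if results else 0.0


def compare_groups(outlier_results: Sequence[bool],
                   non_outlier_results: Sequence[bool]) -> Dict[str, float]:
    """Error proportions in both groups and their difference."""
    out_rate = error_share(outlier_results)
    non_rate = error_share(non_outlier_results)
    return {
        "outlier_error_rate": out_rate,
        "non_outlier_error_rate": non_rate,
        "difference": out_rate - non_rate,
    }
```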
Although the invention has been described herein with reference to illustrative embodiments, which are merely preferred embodiments of the invention, the invention is not limited thereto; many other modifications and embodiments that those skilled in the art can devise will fall within the spirit and scope of the principles of this disclosure.
Claims (3)
1. A method of assessing investigator error in a large face-to-face investigation, comprising:
step S1: acquiring questionnaire data and recording data of a baseline survey through an electronic information platform, and generating indexes of the questionnaire data and the recording data according to survey objects;
step S2: after the baseline survey is finished, the questionnaire data are exported and preprocessed, and outlier survey objects are then identified with the Fast-MCD algorithm, specifically comprising the following steps:
step S21: the questionnaire data comprise n rows and p columns, i.e., n survey objects each with information on p variables; h sample data are then drawn from the n survey objects, where h must satisfy $(n+p+1)/2 \le h \le n$, and h is set to 0.8n;
step S22: calculating the sample mean $\hat{\mu}_1$, covariance matrix $\hat{\Sigma}_1$ and covariance determinant $\det(\hat{\Sigma}_1)$ of the h sample data; based on $\hat{\mu}_1$ and $\hat{\Sigma}_1$, the Mahalanobis distances of the n survey objects are then calculated: $MD_{i,1} = \sqrt{(x_i - \hat{\mu}_1)^{\top} \hat{\Sigma}_1^{-1} (x_i - \hat{\mu}_1)}$, $i = 1, \dots, n$, where $x_i$ is the p-dimensional data vector of the i-th survey object;
step S23: sorting the Mahalanobis distances of the n survey objects from small to large, selecting the h survey objects with the smallest distances, and calculating their sample mean $\hat{\mu}_2$, covariance matrix $\hat{\Sigma}_2$, covariance determinant $\det(\hat{\Sigma}_2)$ and the Mahalanobis distances of the h survey objects;
step S24: performing iterative calculation according to steps S21 to S23; if at the m-th iteration $\det(\hat{\Sigma}_m) = \det(\hat{\Sigma}_{m-1})$, the mean and covariance calculated from the m-th sample are taken as the final robust estimates of the mean and covariance, denoted $\hat{\mu}_{MCD}$ and $\hat{\Sigma}_{MCD}$;
step S25: based on the robust estimates, Mahalanobis distances are calculated for all survey objects: $MD_{i,MCD} = \sqrt{(x_i - \hat{\mu}_{MCD})^{\top} \hat{\Sigma}_{MCD}^{-1} (x_i - \hat{\mu}_{MCD})}$;
step S26: judging the survey objects whose Mahalanobis distance is larger than a preset value to be outliers;
step S3: performing a recording check on the outlier survey objects according to the investigator error evaluation rules:
a quality controller logs in to the electronic information platform, retrieves the questionnaire data and recording file of the survey object corresponding to each outlier by its unique index, and judges whether the questionnaire data and the recording are consistent; if they are not, the investigator failed to accurately capture and record the survey object's answers, i.e., investigator error exists; the check result is recorded and classified into five types: correct, wrong questioning manner, wrong question asked/question not asked, wrong entry, and unverifiable, of which wrong questioning manner, wrong question asked/question not asked, and wrong entry constitute investigator error;
step S4: based on the recording check data, an error occurrence rate index and an error contribution rate index are constructed to evaluate the occurrence of investigator error:
different investigators may each survey one or more survey objects, and each investigator's survey performance is evaluated by calculating the investigator error occurrence rate separately for each investigator, which is calculated as follows:
the number of questions with investigator error = the number of questions with a wrong questioning manner + the number of questions wrongly asked or not asked + the number of questions with a wrong entry.
2. The method of assessing investigator error in a large face-to-face investigation according to claim 1, further comprising:
step S5: based on the calculated investigator error occurrence rates, further analyzing the prevalence characteristics of investigator error across different investigators, and exploring its distribution pattern and aggregation pattern; the distribution pattern is shown by a probability density plot; the aggregation pattern is used to explore whether investigator error is concentrated among a subset of investigators.
3. The method of assessing investigator error in a large face-to-face investigation according to claim 2, further comprising:
step S6: from the calculated investigator error occurrence rates $ER_i$ of the different investigators, the investigator error contribution rate of each investigator is further calculated as $ECR_i = ER_i / \sum_{j=1}^{k} ER_j$, where k denotes the number of investigators, and $ER_i$ and $ER_j$ denote the investigator error occurrence rates of the i-th and j-th investigators respectively; the larger $ECR_i$ is, the greater that investigator's risk of investigator error.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110593435.1A CN115410718B (en) | 2021-05-28 | Method for evaluating error of investigator in large-scale face-to-face investigation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115410718A (en) | 2022-11-29 |
CN115410718B (en) | 2023-04-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |