Disclosure of Invention
An object of the present disclosure is to provide a method and an apparatus for processing questionnaire data, a storage medium, and an electronic device, so as to solve the above technical problems.
To achieve the above object, a first aspect of the present disclosure provides a method for processing questionnaire data, the method comprising: acquiring answer sheet data of a user for a questionnaire; and inputting the answer sheet data into a data processing model to obtain a data processing result output by the data processing model, wherein the data processing result comprises a data tag representing the degree of credibility of the user, and the data processing model is trained on the actual data tags of sample users and on answer sheet data samples of the sample users for the questionnaire.
Optionally, the questionnaire consists of questions whose degree of discrimination for fraudulent users among the sample users is higher than a discrimination threshold, wherein the degree of discrimination of a question is determined by: for each question, determining the proportion of fraudulent users among the sample users who selected each option of the question; and determining the degree of discrimination of the question according to the proportion, wherein the proportion of fraudulent users is positively correlated with the degree of discrimination.
Optionally, acquiring the answer sheet data of the user for the questionnaire comprises: acquiring answer sheet data of the user for a questionnaire corresponding to the service type of the user; and inputting the answer sheet data into the data processing model comprises: inputting the answer sheet data into a data processing model corresponding to the service type; wherein the questionnaire consists of questions whose degree of discrimination for fraudulent users among the sample users of the service type is higher than a first discrimination threshold, or of questions whose degree of discrimination for fraudulent users among the sample users of all service types is higher than a second discrimination threshold, the second discrimination threshold being smaller than the first discrimination threshold.
Optionally, the method further comprises: processing the answer sheet data samples of the sample users with the data processing model to obtain intermediate values of the processing results output by the data processing model; determining whether the proportion of fraudulent users among the sample users whose intermediate values fall within a preset value interval is higher than a preset proportion threshold; and if the proportion of fraudulent users is lower than the preset proportion threshold, re-determining the questions in the questionnaire and retraining on the re-determined questionnaire to generate a new data processing model.
Optionally, the questionnaire comprises verifiable questions and non-verifiable questions; for a verifiable question, the answer sheet data is obtained by: determining whether the answer of the sample user matches characteristic information of the sample user, the answer being correct if it matches and incorrect if it does not, wherein the answer sheet data comprises information indicating whether the answer of the user is correct, and the characteristic information is extracted from user data submitted when the sample user transacted business; for a non-verifiable question, the answer sheet data comprises the user's selection for each option provided for the question.
A second aspect of the present disclosure provides a device for processing questionnaire data, comprising: an obtaining module configured to obtain answer sheet data of a user for a questionnaire; and a processing module configured to input the answer sheet data into a data processing model to obtain a data processing result output by the data processing model, wherein the data processing result comprises a data tag representing the degree of credibility of the user, and the data processing model is trained on the actual data tags of sample users and on answer sheet data samples of the sample users for the questionnaire.
Optionally, the questionnaire consists of questions whose degree of discrimination for fraudulent users among the sample users is higher than a discrimination threshold, wherein the degree of discrimination of a question is determined by: for each question, determining the proportion of fraudulent users among the sample users who selected each option of the question; and determining the degree of discrimination of the question according to the proportion, wherein the proportion of fraudulent users is positively correlated with the degree of discrimination.
Optionally, the obtaining module is configured to obtain answer sheet data of the user for a questionnaire corresponding to the service type of the user; the processing module is configured to input the answer sheet data into a data processing model corresponding to the service type; and the questionnaire consists of questions whose degree of discrimination for fraudulent users among the sample users of the service type is higher than a first discrimination threshold, or of questions whose degree of discrimination for fraudulent users among the sample users of all service types is higher than a second discrimination threshold, the second discrimination threshold being smaller than the first discrimination threshold.
Optionally, the device further includes an optimization module configured to: process the answer sheet data samples of the sample users with the data processing model to obtain intermediate values of the processing results output by the data processing model; determine whether the proportion of fraudulent users among the sample users whose intermediate values fall within a preset value interval is higher than a preset proportion threshold; and, if the proportion of fraudulent users is lower than the preset proportion threshold, re-determine the questions in the questionnaire and retrain on the re-determined questionnaire to generate a new data processing model.
Optionally, the questionnaire comprises verifiable questions and non-verifiable questions;
for a verifiable question, the answer sheet data is obtained by: determining whether the answer of the sample user matches characteristic information of the sample user, the answer being correct if it matches and incorrect if it does not, wherein the answer sheet data comprises information indicating whether the answer of the user is correct, and the characteristic information is extracted from user data submitted when the sample user transacted business; for a non-verifiable question, the answer sheet data comprises the user's selection for each option provided for the question.
A third aspect of the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method according to any implementation of the first aspect of the present disclosure.
A fourth aspect of the present disclosure provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program in the memory to implement the steps of the method according to any implementation of the first aspect of the present disclosure.
With the above technical solution, the answer sheet data of a user can be processed by the data processing model to obtain a data tag representing the credibility of the user, which addresses the low accuracy and low efficiency of processing answer sheet data manually.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Detailed Description
The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
FIG. 1 is a flowchart illustrating a method for processing questionnaire data according to an exemplary embodiment of the present disclosure. As shown in FIG. 1, the method includes the following steps:
S11: obtaining answer sheet data of a user for a questionnaire.
The questionnaire may be an electronic questionnaire or a paper questionnaire. For a paper questionnaire, character recognition or fill-in area recognition may be performed on the answered questionnaire to obtain the answer sheet data. In one possible implementation, the questionnaire may be administered to the user by telephone, with the user's answer sheet data collected through speech recognition.
The answer sheet data consists of the user's answers to the questionnaire. An answer may be subjective, that is, an open answer for which the question stem offers no standard options, or objective, that is, an answer chosen from given options.
Optionally, the questionnaire consists of questions whose degree of discrimination for fraudulent users among the sample users is higher than a discrimination threshold. The degree of discrimination characterizes how well a question separates fraudulent users from non-fraudulent users by the options they select.
The degree of discrimination of a question is determined as follows:
for each question, determining the proportion of fraudulent users among the sample users who selected each option of the question; and determining the degree of discrimination of the question according to the proportion, wherein the proportion of fraudulent users is positively correlated with the degree of discrimination.
For example, suppose a question has four options A, B, C, and D. If fraudulent users account for as much as 60% of the sample users who selected option C, and for less than 10% of the sample users who selected option A, B, or D, then whether option C is selected gives a preliminary indication of whether a user will commit fraud, and the degree of discrimination of the question is high. If, on the other hand, fraudulent users are evenly distributed across the options (for example, the proportion lies in the 30%-40% range for every option), the selected option gives no preliminary indication of whether a user will commit fraud, and the degree of discrimination of the question is low.
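As an illustrative, non-limiting sketch, the per-option proportion of fraudulent users and one possible degree of discrimination can be computed as follows. The disclosure only requires that the degree of discrimination be positively correlated with the proportion of fraudulent users; the specific measure used here (the spread between the highest and lowest per-option proportion), as well as the function names and the sample counts, are assumptions made for illustration.

```python
from collections import defaultdict


def option_fraud_ratios(responses):
    """Return {option: proportion of fraudulent users among users who chose it}.

    responses: iterable of (selected_option, is_fraud) pairs for ONE question.
    """
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for option, is_fraud in responses:
        totals[option] += 1
        if is_fraud:
            frauds[option] += 1
    return {opt: frauds[opt] / totals[opt] for opt in totals}


def discrimination(responses):
    """Assumed measure: highest minus lowest per-option fraud proportion."""
    ratios = option_fraud_ratios(responses)
    return max(ratios.values()) - min(ratios.values())


def make_responses(counts):
    """Build synthetic responses from {option: (n_fraud, n_total)}."""
    out = []
    for opt, (n_fraud, n_total) in counts.items():
        out += [(opt, True)] * n_fraud + [(opt, False)] * (n_total - n_fraud)
    return out


# Hypothetical counts mirroring the two cases described above.
high = make_responses({"A": (5, 100), "B": (8, 100), "C": (60, 100), "D": (6, 100)})
low = make_responses({"A": (30, 100), "B": (35, 100), "C": (40, 100), "D": (33, 100)})

DISCRIMINATION_THRESHOLD = 0.3  # hypothetical threshold
keep_high = discrimination(high) > DISCRIMINATION_THRESHOLD
keep_low = discrimination(low) > DISCRIMINATION_THRESHOLD
```

Under these assumed counts, the first question would be kept for the questionnaire and the second would be discarded.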
The questionnaire includes verifiable questions and non-verifiable questions. A verifiable question is one that has a standard answer, or one for which the correctness of the user's answer can be checked against user characteristics. A non-verifiable question has no standard answer, so the correctness of the user's answer cannot be checked. For example, if the question concerns the user's monthly income level, the answer can be checked by comparing the user's payroll records with the option the user selected, so the question is verifiable. If the question concerns the range of risk the user can bear, the answer depends on the user's subjective feeling and cannot be checked, so the question is non-verifiable.
For a verifiable question, the answer sheet data is obtained as follows: it is determined whether the answer of the sample user matches characteristic information of the sample user; the answer is correct if it matches and incorrect if it does not. The answer sheet data comprises information indicating whether the answer of the user is correct, and the characteristic information is extracted from user data submitted when the sample user transacted business.
For example, suppose the question is "Which of the following is your monthly income: A. below 4000; B. 4000-8000; C. 8000-13000; D. 13000 or above", and the user selects option C. If the monthly income extracted from the payroll details submitted by the user is actually 4500 yuan, which does not fall within the range of option C, the user has answered incorrectly.
For a non-verifiable question, the answer sheet data comprises the user's selection for each option provided for the question.
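As an illustrative, non-limiting sketch, answer sheet data for the two kinds of question might be encoded as follows. The half-open income bands, the one-hot representation of the per-option selection, and all names are assumptions made for illustration; the disclosure does not prescribe a particular encoding.

```python
# Income bands for the monthly-income example; treating each band as the
# half-open range [low, high) is an assumption of this sketch.
INCOME_BANDS = {
    "A": (0, 4000),
    "B": (4000, 8000),
    "C": (8000, 13000),
    "D": (13000, float("inf")),
}


def encode_verifiable(selected, actual_income, bands=INCOME_BANDS):
    """Return 1 if the selected band contains the income taken from the
    user's submitted payroll details (answer correct), otherwise 0."""
    low, high = bands[selected]
    return 1 if low <= actual_income < high else 0


def encode_non_verifiable(selected, options):
    """Return one indicator per option: 1 for the selected option, else 0."""
    return [1 if opt == selected else 0 for opt in options]


def encode_answer_sheet(income_choice, actual_income, risk_choice, risk_options):
    """Concatenate both encodings into one feature vector."""
    return [encode_verifiable(income_choice, actual_income)] + encode_non_verifiable(
        risk_choice, risk_options
    )


# The user selects option C but the payroll details show 4500 yuan.
features = encode_answer_sheet("C", 4500, "B", ["A", "B", "C"])
```

Here the first element of `features` records that the income answer was incorrect, and the remaining elements record which risk option was selected.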
S12: inputting the answer sheet data into a data processing model to obtain a data processing result output by the data processing model.
The data processing result comprises a data tag representing the degree of credibility of the user.
The data processing model is trained on the actual data tags of sample users and on answer sheet data samples of the sample users for the questionnaire.
A sample user may be a user who has previously transacted business. A user whose answer sheet data has been processed may also become a sample user once the user carries out actual service operations. In this way, supported by actual sample data, the selection of questions for the questionnaire and the data processing capability of the data processing model are continually optimized as the number of sample users grows.
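As an illustrative, non-limiting sketch, a data processing model could be trained on encoded answer sheet data samples and actual data tags as shown below. The disclosure does not specify the model type; logistic regression trained by stochastic gradient descent is an assumption made here, as are the feature layout, the toy samples, and the hyperparameters.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=500, lr=0.5):
    """Fit logistic regression by plain stochastic gradient descent.

    samples: encoded answer sheet data, one feature list per sample user.
    labels: actual data tags, 1 = fraudulent user, 0 = non-fraudulent user.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def credibility(model, x):
    """Data tag in [0, 1]: one minus the predicted fraud probability."""
    weights, bias = model
    return 1.0 - sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical features: [income answer correct, risk option A, B, C].
SAMPLES = [
    [1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 0, 0],
    [0, 0, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1],
]
LABELS = [0, 0, 0, 1, 1, 1]

model = train(SAMPLES, LABELS)
trusted = credibility(model, [1, 1, 0, 0])
suspect = credibility(model, [0, 0, 0, 1])
```

As further users become sample users, the same routine could simply be rerun on the enlarged sample set.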
With the above technical solution, the answer sheet data of a user can be processed by the data processing model to obtain a data tag representing the credibility of the user, which addresses the low accuracy and low efficiency of processing answer sheet data manually.
FIG. 2 is a flowchart illustrating another method for processing questionnaire data according to an exemplary embodiment of the present disclosure. As shown in FIG. 2, the method includes the following steps:
S21: obtaining answer sheet data of a user for a questionnaire corresponding to the service type of the user.
The questionnaire may be an electronic questionnaire or a paper questionnaire. For a paper questionnaire, character recognition or fill-in area recognition may be performed on the answered questionnaire to obtain the answer sheet data. In one possible implementation, the questionnaire may be administered to the user by telephone, with the user's answer sheet data collected through speech recognition.
The answer sheet data consists of the user's answers to the questionnaire. An answer may be subjective, that is, an open answer for which the question stem offers no standard options, or objective, that is, an answer chosen from given options.
Optionally, the questionnaire consists of questions whose degree of discrimination for fraudulent users among the sample users is higher than a discrimination threshold. The degree of discrimination characterizes how well a question separates fraudulent users from non-fraudulent users by the options they select.
The degree of discrimination of a question is determined as follows:
for each question, determining the proportion of fraudulent users among the sample users who selected each option of the question; and determining the degree of discrimination of the question according to the proportion, wherein the proportion of fraudulent users is positively correlated with the degree of discrimination.
For example, suppose a question has four options A, B, C, and D. If fraudulent users account for as much as 60% of the sample users who selected option C, and for less than 10% of the sample users who selected option A, B, or D, then whether option C is selected gives a preliminary indication of whether a user will commit fraud, and the degree of discrimination of the question is high. If, on the other hand, fraudulent users are evenly distributed across the options (for example, the proportion lies in the 30%-40% range for every option), the selected option gives no preliminary indication of whether a user will commit fraud, and the degree of discrimination of the question is low.
The questionnaire includes verifiable questions and non-verifiable questions. A verifiable question is one that has a standard answer, or one for which the correctness of the user's answer can be checked against user characteristics. A non-verifiable question has no standard answer, so the correctness of the user's answer cannot be checked. For example, if the question concerns the user's monthly income level, the answer can be checked by comparing the user's payroll records with the option the user selected, so the question is verifiable. If the question concerns the range of risk the user can bear, the answer depends on the user's subjective feeling and cannot be checked, so the question is non-verifiable.
For a verifiable question, the answer sheet data is obtained as follows: it is determined whether the answer of the sample user matches characteristic information of the sample user; the answer is correct if it matches and incorrect if it does not. The answer sheet data comprises information indicating whether the answer of the user is correct, and the characteristic information is extracted from user data submitted when the sample user transacted business.
For example, suppose the question is "Which of the following is your monthly income: A. below 4000; B. 4000-8000; C. 8000-13000; D. 13000 or above", and the user selects option C. If the monthly income extracted from the payroll details submitted by the user is actually 4500 yuan, which does not fall within the range of option C, the user has answered incorrectly.
For a non-verifiable question, the answer sheet data comprises the user's selection for each option provided for the question.
Optionally, the questionnaire consists of questions whose degree of discrimination for fraudulent users among the sample users of the service type is higher than a first discrimination threshold, or of questions whose degree of discrimination for fraudulent users among the sample users of all service types is higher than a second discrimination threshold, the second discrimination threshold being smaller than the first discrimination threshold.
It should be noted that a service question pool may be generated in advance for each service, the service question pool consisting of questions whose degree of discrimination for fraudulent users among the sample users of that service type is higher than the first discrimination threshold. A general question pool may also be generated, consisting of questions whose degree of discrimination for fraudulent users among the sample users of all service types is higher than the second discrimination threshold. When a questionnaire is generated, a preset number of questions may be drawn from the service question pool corresponding to the user's service type, from the general question pool, or from both pools mixed in a certain proportion.
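As an illustrative, non-limiting sketch, the question pools and the mixed drawing of questions might be implemented as follows. The function names, the question identifiers, the threshold values, and the use of uniform random sampling are assumptions made for illustration.

```python
import random


def build_pool(discrimination_by_question, threshold):
    """Keep the questions whose degree of discrimination exceeds the threshold."""
    return sorted(q for q, d in discrimination_by_question.items() if d > threshold)


def build_questionnaire(service_pool, general_pool, total, service_share, rng=None):
    """Draw `total` questions, a `service_share` fraction from the service pool
    and the remainder from the general pool, without repeating a question."""
    rng = rng or random.Random()
    n_service = round(total * service_share)
    chosen = rng.sample(service_pool, n_service)
    # A question may belong to both pools, so exclude those already chosen.
    remaining = [q for q in general_pool if q not in chosen]
    return chosen + rng.sample(remaining, total - n_service)


# Hypothetical degrees of discrimination for one service type and overall.
service_pool = build_pool({"s1": 0.6, "s2": 0.5, "s3": 0.45, "s4": 0.2}, threshold=0.4)
general_pool = build_pool({"g1": 0.35, "g2": 0.3, "g3": 0.1, "s1": 0.32}, threshold=0.25)

questionnaire = build_questionnaire(
    service_pool, general_pool, total=4, service_share=0.5, rng=random.Random(0)
)
```

Setting `service_share` to 1.0 or 0.0 corresponds to drawing only from the service question pool or only from the general question pool, respectively.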
S22: inputting the answer sheet data into a data processing model corresponding to the service type to obtain a data processing result output by the data processing model.
The data processing result comprises a data tag representing the degree of credibility of the user.
The data processing model is trained on the actual data tags of sample users and on answer sheet data samples of the sample users for the questionnaire.
A sample user may be a user who has previously transacted business. A user whose answer sheet data has been processed may also become a sample user once the user carries out actual service operations. In this way, supported by actual sample data, the selection of questions for the questionnaire and the data processing capability of the data processing model are continually optimized as the number of sample users grows.
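As an illustrative, non-limiting sketch, step S22 can be viewed as routing the answer sheet data to the data processing model registered for the user's service type. The registry class, the service type names "loan" and "insurance", and the placeholder models below are hypothetical and serve only to show the dispatch.

```python
from typing import Callable, Dict, List

# A "model" here is any callable mapping encoded answer sheet data to a
# credibility score; the concrete model type is an assumption of this sketch.
Model = Callable[[List[int]], float]


class ModelRegistry:
    """Hypothetical registry holding one data processing model per service type."""

    def __init__(self) -> None:
        self._models: Dict[str, Model] = {}

    def register(self, service_type: str, model: Model) -> None:
        self._models[service_type] = model

    def process(self, service_type: str, answer_sheet: List[int]) -> float:
        """Route the answer sheet data to the model of the given service type."""
        if service_type not in self._models:
            raise KeyError(f"no data processing model for service type {service_type!r}")
        return self._models[service_type](answer_sheet)


registry = ModelRegistry()
# Placeholder models standing in for separately trained models.
registry.register("loan", lambda sheet: sum(sheet) / len(sheet))
registry.register("insurance", lambda sheet: 1.0 - sum(sheet) / len(sheet))

loan_tag = registry.process("loan", [1, 0, 1, 1])
```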
Optionally, the answer sheet data samples of the sample users may be processed with the data processing model to obtain intermediate values of the processing results output by the data processing model, and it is determined whether the proportion of fraudulent users among the sample users whose intermediate values fall within a preset value interval is higher than a preset proportion threshold. If the proportion of fraudulent users is lower than the preset proportion threshold, the questions in the questionnaire are re-determined, and a new data processing model is generated by retraining on the re-determined questionnaire.
For example, after the data processing model processes the answer sheet data samples of the sample users, an intermediate value is obtained for each sample user. The intermediate value may be any value from 1 to 100 and, from low to high, falls into one of ten intervals: interval 1: 1-10; interval 2: 11-20; interval 3: 21-30; interval 4: 31-40; interval 5: 41-50; interval 6: 51-60; interval 7: 61-70; interval 8: 71-80; interval 9: 81-90; interval 10: 91-100. The proportion of fraudulent users can be determined for each interval. If even the interval with the highest proportion of fraudulent users is below the preset proportion threshold, the results output by the data processing model do not identify fraudulent users well; the questions in the questionnaire then need to be re-determined and the data processing model regenerated from the re-determined questionnaire.
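As an illustrative, non-limiting sketch, the interval statistics and the resulting decision might be computed as follows. Integer intermediate values from 1 to 100, the function names, the sample data, and the 0.9 threshold are assumptions made for illustration.

```python
def interval_index(value, width=10):
    """Map an integer intermediate value to its interval: 1-10 -> 1, ..., 91-100 -> 10."""
    return (value - 1) // width + 1


def interval_stats(samples, width=10):
    """Return {interval: (n_users, n_fraud)}.

    samples: iterable of (intermediate_value, is_fraud) pairs, one per sample user.
    """
    stats = {}
    for value, is_fraud in samples:
        idx = interval_index(value, width)
        n_users, n_fraud = stats.get(idx, (0, 0))
        stats[idx] = (n_users + 1, n_fraud + int(is_fraud))
    return stats


def needs_new_questionnaire(samples, ratio_threshold):
    """True if even the interval with the highest proportion of fraudulent
    users stays below the preset proportion threshold."""
    stats = interval_stats(samples)
    best = max(n_fraud / n_users for n_users, n_fraud in stats.values())
    return best < ratio_threshold


# Hypothetical sample users: interval 3 concentrates the fraudulent users.
good = [(25, True)] * 19 + [(25, False)] + [(65, False)] * 50
# Hypothetical sample users: fraud spread evenly, no interval stands out.
flat = [(15, True)] * 3 + [(15, False)] * 7 + [(85, True)] * 4 + [(85, False)] * 6

redo_good = needs_new_questionnaire(good, ratio_threshold=0.9)
redo_flat = needs_new_questionnaire(flat, ratio_threshold=0.9)
```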
It should be noted that a data processing model that identifies fraudulent users well concentrates a very high proportion of fraudulent users in an interval containing few users. For example, suppose there are 600 sample users, the intermediate values of only 20 of them fall into interval 3, and 19 of those 20 are fraudulent users. The data processing model can then judge whether a user is likely to commit fraud by whether the user's data tag value lies in interval 3; that is, the model identifies fraudulent users very well. By contrast, if 400 of the 600 sample users fall into interval 7, then even though the proportion of fraudulent users in that interval is also high (66.7%), the interval still contains a large number of users who are not fraudulent, and the model does not identify fraudulent users well. When selecting a model in practice, it may first be determined whether there is an interval whose number of users is lower than a preset number threshold, and then whether the proportion of fraudulent users in that interval is higher than the preset proportion threshold, in order to decide whether the questions in the questionnaire need to be re-determined. This optimization may also be carried out by an optimization model, which automatically determines the optimal questionnaire and data processing model by controlling the number of sample users and the proportion of fraudulent users in each interval.
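As an illustrative, non-limiting sketch, the two-stage check (first the number of users in an interval, then the proportion of fraudulent users in it) might look as follows. The function name, both threshold values, and the counts for interval 7 are hypothetical; only the counts for interval 3 are taken from the example above.

```python
def find_distinguishing_intervals(stats, max_users, min_fraud_ratio):
    """Return the intervals that hold fewer than `max_users` sample users
    and whose proportion of fraudulent users exceeds `min_fraud_ratio`.

    stats: {interval: (n_users, n_fraud)}
    """
    return sorted(
        idx
        for idx, (n_users, n_fraud) in stats.items()
        if 0 < n_users < max_users and n_fraud / n_users > min_fraud_ratio
    )


# Interval 3: 20 users, 19 of them fraudulent (from the example above).
# Interval 7: hypothetical counts for a large, mixed interval.
stats = {3: (20, 19), 7: (400, 120)}

hits = find_distinguishing_intervals(stats, max_users=50, min_fraud_ratio=0.9)
redo_questionnaire = not hits  # no qualifying interval: re-determine the questions
```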
With the above technical solution, answer sheet data of users of different service types can be processed by the data processing models of the corresponding service types to obtain data tags representing the credibility of the users. This addresses the low accuracy and low efficiency of processing answer sheet data manually and, because the answer sheet data is processed separately for each service type, offers high flexibility.
FIG. 3 is a block diagram illustrating a device for processing questionnaire data according to an exemplary embodiment of the present disclosure. As shown in FIG. 3, the device 300 for processing questionnaire data includes an obtaining module 310 and a processing module 320.
The obtaining module 310 is configured to obtain answer sheet data of a user for a questionnaire.
The processing module 320 is configured to input the answer sheet data into a data processing model to obtain a data processing result output by the data processing model, where the data processing result comprises a data tag representing the degree of credibility of the user, and the data processing model is trained on the actual data tags of sample users and on answer sheet data samples of the sample users for the questionnaire.
Optionally, the questionnaire consists of questions whose degree of discrimination for fraudulent users among the sample users is higher than a discrimination threshold, wherein the degree of discrimination of a question is determined by: for each question, determining the proportion of fraudulent users among the sample users who selected each option of the question; and determining the degree of discrimination of the question according to the proportion, wherein the proportion of fraudulent users is positively correlated with the degree of discrimination.
Optionally, the obtaining module is configured to obtain answer sheet data of the user for a questionnaire corresponding to the service type of the user; the processing module is configured to input the answer sheet data into a data processing model corresponding to the service type; and the questionnaire consists of questions whose degree of discrimination for fraudulent users among the sample users of the service type is higher than a first discrimination threshold, or of questions whose degree of discrimination for fraudulent users among the sample users of all service types is higher than a second discrimination threshold, the second discrimination threshold being smaller than the first discrimination threshold.
Optionally, the device further includes an optimization module configured to: process the answer sheet data samples of the sample users with the data processing model to obtain intermediate values of the processing results output by the data processing model; determine whether the proportion of fraudulent users among the sample users whose intermediate values fall within a preset value interval is higher than a preset proportion threshold; and, if the proportion of fraudulent users is lower than the preset proportion threshold, re-determine the questions in the questionnaire and retrain on the re-determined questionnaire to generate a new data processing model.
Optionally, the questionnaire comprises verifiable questions and non-verifiable questions;
for a verifiable question, the answer sheet data is obtained by: determining whether the answer of the sample user matches characteristic information of the sample user, the answer being correct if it matches and incorrect if it does not, wherein the answer sheet data comprises information indicating whether the answer of the user is correct, and the characteristic information is extracted from user data submitted when the sample user transacted business; for a non-verifiable question, the answer sheet data comprises the user's selection for each option provided for the question.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
With the above technical solution, the answer sheet data of a user can be processed by the data processing model to obtain a data tag representing the credibility of the user, which addresses the low accuracy and low efficiency of processing answer sheet data manually.
FIG. 4 is a block diagram illustrating an electronic device 400 according to an exemplary embodiment. As shown in FIG. 4, the electronic device 400 may include a processor 401 and a memory 402. The electronic device 400 may also include one or more of a multimedia component 403, an input/output (I/O) interface 404, and a communication component 405.
The processor 401 controls the overall operation of the electronic device 400 to complete all or some of the steps of the above method for processing questionnaire data. The memory 402 stores various types of data to support operation of the electronic device 400; such data may include, for example, instructions for any application or method operating on the electronic device 400, as well as application-related data such as answer sheet data submitted by users and data tags of sample users. The memory 402 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The multimedia component 403 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals; a received audio signal may be stored in the memory 402 or transmitted through the communication component 405. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 404 provides an interface between the processor 401 and other interface modules such as a keyboard, a mouse, or buttons, which may be virtual or physical. The communication component 405 is used for wired or wireless communication between the electronic device 400 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G, or a combination of one or more of these, which is not limited here. Accordingly, the communication component 405 may include a Wi-Fi module, a Bluetooth module, an NFC module, and so on.
In an exemplary embodiment, the electronic device 400 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above method for processing questionnaire data.
In another exemplary embodiment, there is also provided a computer-readable storage medium including program instructions, which when executed by a processor, implement the steps of the above-described method of processing questionnaire data. For example, the computer readable storage medium may be the memory 402 including the program instructions executable by the processor 401 of the electronic device 400 to perform the above-described questionnaire data processing method.
FIG. 5 is a block diagram illustrating an electronic device 500 according to an exemplary embodiment. For example, the electronic device 500 may be provided as a server. Referring to FIG. 5, the electronic device 500 includes one or more processors 522 and a memory 532 for storing computer programs executable by the processor 522. The computer programs stored in the memory 532 may include one or more modules, each corresponding to a set of instructions. Further, the processor 522 may be configured to execute the computer programs to perform the above method for processing questionnaire data.
Additionally, the electronic device 500 may include a power component 526 and a communication component 550. The power component 526 may be configured to perform power management of the electronic device 500, and the communication component 550 may be configured to enable communication of the electronic device 500, for example wired or wireless communication. The electronic device 500 may also include an input/output (I/O) interface 558. The electronic device 500 may operate based on an operating system stored in the memory 532, such as Windows Server, Mac OS X™, Unix™, Linux™, and the like.
In another exemplary embodiment, there is also provided a computer-readable storage medium including program instructions, which when executed by a processor, implement the steps of the above-described method of processing questionnaire data. For example, the computer readable storage medium may be the memory 532 including program instructions executable by the processor 522 of the electronic device 500 to perform the above-described method of processing questionnaire data.
In another exemplary embodiment, a computer program product is also provided, which comprises a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-mentioned method of processing questionnaire data when executed by the programmable apparatus.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments. Various simple modifications may be made to the technical solution of the present disclosure within its technical concept, and all such simple modifications fall within the protection scope of the present disclosure.
It should be noted that the specific features described in the above embodiments may be combined in any suitable manner. To avoid unnecessary repetition, the possible combinations are not described separately in the present disclosure.
In addition, the various embodiments of the present disclosure may be combined in any manner, and such combinations should likewise be regarded as content disclosed herein, as long as they do not depart from the spirit of the present disclosure.