CN110569906B - Data processing method, data processing apparatus, and computer-readable storage medium

Info

Publication number
CN110569906B
Authority
CN
China
Prior art keywords
user
information
data
information feedback
behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910852637.6A
Other languages
Chinese (zh)
Other versions
CN110569906A (en)
Inventor
颜文靖
张思维
朱婷
郝硕
文嘉慈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Holding Co Ltd
Original Assignee
Jingdong Technology Holding Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Holding Co Ltd
Priority to CN201910852637.6A (patent CN110569906B)
Publication of CN110569906A
Priority to PCT/CN2020/110537 (patent WO2021047376A1)
Application granted
Publication of CN110569906B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

The disclosure provides a data processing method, a data processing apparatus, and a computer-readable storage medium, and relates to the technical field of information security. The data processing method comprises: acquiring behavior data of a target user performing an information feedback operation; generating, from the behavior data, behavior characteristics of the target user performing the information feedback operation; and processing the behavior characteristics with a pre-trained machine learning model to obtain safety degree data of the target user. The disclosure obtains the user's safety degree data simply and efficiently, requires neither complex operations nor special instruments, and has a wide range of application.

Description

Data processing method, data processing apparatus, and computer-readable storage medium
Technical Field
The present disclosure relates to the field of information security technologies, and in particular, to a data processing method, a data processing apparatus, and a computer-readable storage medium.
Background
As internet services develop, the information security of users is receiving more and more attention. To facilitate security management, security data needs to be determined for each user to measure how secure that user is with respect to internet services.
Disclosure of Invention
The technical problem solved by the present disclosure is how to obtain the security data of the user simply and efficiently.
According to an aspect of the embodiments of the present disclosure, there is provided a data processing method, including: acquiring behavior data of a target user for executing information feedback operation; generating behavior characteristics of the target user for executing the information feedback operation by using the behavior data; and processing the behavior characteristics by using a pre-trained machine learning model to obtain the safety degree data of the target user.
In some embodiments, further comprising: acquiring behavior data of a known user for executing information feedback operation; generating behavior characteristics of the known user for executing the information feedback operation by utilizing behavior data of the known user for executing the information feedback operation; marking the behavior characteristics of the known user for executing the information feedback operation by using the safety data of the known user; and training the machine learning model by using the marked behavior characteristics of the known user for executing the information feedback operation, so that the trained machine learning model can process the behavior data of the target user for executing the information feedback operation, and the safety data of the target user is obtained.
In some embodiments, the generating the behavior characteristics of the known user performing the information feedback operation by using the behavior data of the known user performing the information feedback operation includes: generating at least one candidate behavior characteristic of the information feedback operation executed by the known user by utilizing the behavior data of the information feedback operation executed by the known user; calculating the correlation degree between the candidate behavior characteristics and the safety degree data of the known user; and taking the candidate behavior characteristics with the correlation degree larger than the preset value as the behavior characteristics of the known user for executing the information feedback operation.
In some embodiments, processing the behavior characteristics using a pre-trained machine learning model to obtain the safety degree data of the target user includes: processing the input behavior characteristics using, among a plurality of pre-trained machine learning models, the model with the largest area under the receiver operating characteristic (ROC) curve, and outputting the safety degree data of the target user.
In some embodiments, processing the behavior features using a pre-trained machine learning model, and obtaining the safety data of the target user includes: respectively processing the input behavior characteristics by utilizing various machine learning models trained in advance, and outputting a plurality of preliminary safety degree data of a target user; and carrying out weighting processing on the plurality of preliminary safety degree data to obtain the safety degree data of the target user.
In some embodiments, the behavior characteristics include at least one of the following: the reaction duration when the user first feeds back each item of information, the total number of times the user modifies each item of information, the total number of times the user reviews and modifies each item of information, the number of times each item of information does not match the reserved information, a force parameter of the user pressing the information feedback device, and a pitch angle parameter and a swing angle parameter of the user holding the information feedback device.
In some embodiments, the behavioral data includes at least one of the following: the time when the user enters each information feedback page, the time when the user feeds back each item of information, the information identification and the information content when the user feeds back each item of information, the touch parameter when the user presses the information feedback device, and the angle parameter when the user holds the information feedback device.
In some embodiments, using the behavior data to generate the behavior characteristics of the target user performing the information feedback operation includes: determining the reaction duration when the user first feeds back each item of information, using the time when the user enters each information feedback page and the time when the user feeds back each item of information; or determining the total number of times the user modifies each item of information, the total number of times the user reviews each item of information, and the total number of times the user reviews and modifies each item of information, using the information identifier and the information content of each item of information fed back by the user.
In some embodiments, the candidate behavior characteristics include at least one of the following: the reaction duration when the user first feeds back each item of information, the dwell time of the user on each information feedback page, the total number of times the user modifies each item of information, the total number of times the user reviews and modifies each item of information, the number of times each item of information does not match the reserved information, a force parameter of the user pressing the information feedback device, the duration for which the user presses the information feedback device, the area over which the user presses the information feedback device, and a pitch angle parameter and a swing angle parameter of the user holding the information feedback device.
In some embodiments, the method further comprises: prior to generating the behavior characteristics, preprocessing the behavior data using at least one of the following methods: removing behavior data whose number of value categories is lower than a first threshold; removing behavior data whose missing rate is higher than a second threshold; and filling behavior data whose missing rate is lower than the second threshold with the mode or the mean.
According to another aspect of the embodiments of the present disclosure, there is provided a data processing apparatus including: the data acquisition module is configured to acquire behavior data of a target user for executing information feedback operation; the characteristic generation module is configured to generate behavior characteristics of the target user for executing the information feedback operation by utilizing the behavior data; and the model processing module is configured to process the behavior characteristics by utilizing a machine learning model trained in advance to obtain the safety degree data of the target user.
In some embodiments, further comprising a model training module configured to: acquiring behavior data of a known user for executing information feedback operation; generating behavior characteristics of the known user for executing the information feedback operation by utilizing behavior data of the known user for executing the information feedback operation; marking the behavior characteristics of the known user for executing the information feedback operation by using the safety data of the known user; and training the machine learning model by using the marked behavior characteristics of the known user for executing the information feedback operation, so that the trained machine learning model can process the behavior data of the target user for executing the information feedback operation to obtain the safety data of the target user.
In some embodiments, the model training module is configured to: generating at least one candidate behavior characteristic of the information feedback operation executed by the known user by utilizing the behavior data of the information feedback operation executed by the known user; calculating the correlation between the candidate behavior characteristics and the safety data of the known user; and taking the candidate behavior characteristics with the correlation degree larger than the preset value as the behavior characteristics of the known user for executing the information feedback operation.
In some embodiments, the model processing module is configured to: process the input behavior characteristics using, among a plurality of pre-trained machine learning models, the model with the largest area under the receiver operating characteristic (ROC) curve, and output the safety degree data of the target user.
In some embodiments, the model processing module is configured to: respectively processing the input behavior characteristics by utilizing various machine learning models trained in advance, and outputting a plurality of preliminary safety degree data of a target user; and carrying out weighting processing on the plurality of preliminary safety degree data to obtain the safety degree data of the target user.
In some embodiments, the behavior characteristics include at least one of the following: the reaction duration when the user first feeds back each item of information, the total number of times the user modifies each item of information, the total number of times the user reviews and modifies each item of information, the number of times each item of information does not match the reserved information, a force parameter of the user pressing the information feedback device, and a pitch angle parameter and a swing angle parameter of the user holding the information feedback device.
In some embodiments, the behavioral data includes at least one of the following: the time when the user enters each information feedback page, the time when the user feeds back each item of information, the information identification and the information content when the user feeds back each item of information, the touch parameter when the user presses the information feedback device, and the angle parameter when the user holds the information feedback device.
In some embodiments, the feature generation module is configured to: determine the reaction duration when the user first feeds back each item of information, using the time when the user enters each information feedback page and the time when the user feeds back each item of information; or determine the total number of times the user modifies each item of information, the total number of times the user reviews each item of information, and the total number of times the user reviews and modifies each item of information, using the information identifier and the information content of each item of information fed back by the user.
In some embodiments, the candidate behavior characteristics include at least one of the following: the reaction duration when the user first feeds back each item of information, the dwell time of the user on each information feedback page, the total number of times the user modifies each item of information, the total number of times the user reviews and modifies each item of information, the number of times each item of information does not match the reserved information, a force parameter of the user pressing the information feedback device, the duration for which the user presses the information feedback device, the area over which the user presses the information feedback device, and a pitch angle parameter and a swing angle parameter of the user holding the information feedback device.
In some embodiments, the apparatus further comprises a data preprocessing module configured to preprocess the behavior data using at least one of the following methods: removing behavior data whose number of value categories is lower than a first threshold; removing behavior data whose missing rate is higher than a second threshold; and filling behavior data whose missing rate is lower than the second threshold with the mode or the mean.
According to still another aspect of an embodiment of the present disclosure, there is provided a data processing apparatus including: a memory; and a processor coupled to the memory, the processor configured to perform the aforementioned data processing method based on instructions stored in the memory.
According to still another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions, and the instructions, when executed by a processor, implement the aforementioned data processing method.
The method and the device can simply and efficiently obtain the safety data of the user, do not need complex operation and special instruments, and have wide application range.
Other features of the present disclosure and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
To illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present disclosure; those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 shows a flow diagram of a data processing method of some embodiments of the present disclosure.
Fig. 2 shows a flow diagram of a data processing method according to further embodiments of the present disclosure.
Fig. 3 shows a schematic structural diagram of a data processing apparatus according to some embodiments of the present disclosure.
Fig. 4 shows a schematic structural diagram of a data processing apparatus according to further embodiments of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
The inventors found that test results can be produced using means such as lie detectors, electroencephalography, and brain imaging, and that a user's safety data can be determined from those results. However, these approaches generally require specific instruments and therefore have the following disadvantages: (1) the cost is high; (2) the instruments are inconvenient to carry and easily damaged; (3) the instruments are complicated to operate, and staff need professional training; (4) the tested user must cooperate closely, and the user experience is poor. These disadvantages give instrument-based determination of user safety data a narrow range of application.
In order to obtain the safety data of the user more simply and efficiently without complex operation and specific instruments, and to improve the application range, the present disclosure provides a data processing method, which is described in detail below.
Some embodiments of the disclosed data processing method are first described in conjunction with fig. 1 to explain how to train a machine learning model for data processing.
Fig. 1 shows a flow diagram of a data processing method of some embodiments of the present disclosure. As shown in fig. 1, the present embodiment includes steps S101 to S104.
In step S101, behavior data of a known user performing an information feedback operation is acquired.
When a user performs an information feedback operation, the handheld information feedback device (such as a tablet computer or a mobile phone terminal) enters an information feedback page. To feed back different items of information, the user needs to enter different information feedback pages. Therefore, the behavior data of the user can be collected by embedding tracking points in the different information feedback pages.
The behavior data may specifically include a time when the user enters each information feedback page, a time when the user feeds back each item of information, an information identifier and an information content when the user feeds back each item of information, a touch parameter when the user presses the information feedback device, an angle parameter when the user holds the information feedback device, and the like.
After the behavior data is collected, it can be sent to a background server for storage through an HTTP request, and the request body can be packaged as data in a custom JSON format. Alternatively, the behavior data may be stored in a local database of the information feedback device; when the behavior data stored in the local database within a specified time exceeds a certain threshold (for example, 50 records), the current batch of behavior data is sent to the background server together, so that the background server stores the user's behavior data in its database and performs feature mining on it.
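The local batching described above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the class name `BehaviorBuffer`, the record fields, and the `send` callback (standing in for the HTTP request to the background server) are assumptions, and the sketch ignores the "specified time" window and treats "exceeds the threshold" as "reaches the threshold". Only the JSON packaging and the example threshold of 50 come from the text.

```python
import json
from typing import Callable, Dict, List


class BehaviorBuffer:
    """Hypothetical local store that uploads behavior records in batches."""

    def __init__(self, send: Callable[[str], None], threshold: int = 50):
        self._send = send            # stand-in for the HTTP request to the server
        self._threshold = threshold  # example threshold from the text (50 records)
        self._records: List[Dict] = []

    def add(self, record: Dict) -> None:
        """Store one record locally; upload the batch once the threshold is reached."""
        self._records.append(record)
        if len(self._records) >= self._threshold:
            self.flush()

    def flush(self) -> None:
        """Package the current batch as JSON and hand it to the sender."""
        if not self._records:
            return
        payload = json.dumps({"records": self._records})
        self._send(payload)
        self._records = []


# Usage: collect payloads in a list instead of performing real network I/O.
sent_payloads: List[str] = []
buffer = BehaviorBuffer(send=sent_payloads.append, threshold=50)
for i in range(120):
    buffer.add({"time": i, "question_no": i % 10, "type": "feedback", "option": 1})
```

With 120 records and a threshold of 50, two batches are uploaded and 20 records stay in the local buffer until the next flush.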
In step S102, behavior characteristics of the known user performing the information feedback operation are generated using behavior data of the known user performing the information feedback operation.
Before generating behavior features, the following method may be used to preprocess the behavior data:
(1) Removing behavior data with few value categories, that is, removing behavior data whose number of value categories is lower than a first threshold. For example, for the question "Are you Chinese?", if all known users select "yes", the feedback for this question has fewer than 2 categories, and the information identifier (for example, the question number) and the information content (the answer "yes" or "no") fed back by the users for this question can be deleted.
(2) Removing behavior data with a high missing rate, that is, removing behavior data whose missing rate is higher than a second threshold. For example, no sensor-related data can be collected from mobile phone terminals running iOS. Then, when more than 90% of users use iOS mobile phone terminals, the missing rate of the angle parameter of the user holding the information feedback device is too high, and this behavior data can be deleted.
(3) Filling behavior data whose missing rate is lower than the second threshold with the mode or the mean.
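The three preprocessing rules can be sketched in plain Python as below. The column-oriented layout, the function name `preprocess`, the column names, and the use of `None` for a missing value are assumptions made for illustration; the default thresholds (2 categories, 90% missing rate) follow the examples in the text.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Optional

Column = List[Optional[object]]  # None marks a missing value (assumption)


def preprocess(columns: Dict[str, Column],
               min_categories: int = 2,
               max_missing_rate: float = 0.9) -> Dict[str, Column]:
    """Apply the three preprocessing rules to non-empty, column-oriented data."""
    cleaned: Dict[str, Column] = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        missing_rate = 1 - len(present) / len(values)
        # Rule (2): drop columns whose missing rate exceeds the second threshold.
        if missing_rate > max_missing_rate:
            continue
        # Rule (1): drop columns with fewer distinct values than the first threshold.
        if len(set(present)) < min_categories:
            continue
        # Rule (3): fill remaining gaps with the mean (numeric) or the mode (other).
        if all(isinstance(v, (int, float)) for v in present):
            fill = mean(present)
        else:
            fill = Counter(present).most_common(1)[0][0]
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned


raw = {
    "is_chinese": ["yes", "yes", "yes", "yes"],  # only one category: removed
    "pitch_angle": [None, None, None, None],     # entirely missing: removed
    "reaction_s": [1.0, None, 3.0, 2.0],         # numeric gap: filled with the mean
    "answer_q2": ["yes", "no", None, "yes"],     # categorical gap: filled with the mode
}
cleaned = preprocess(raw)
```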
In some embodiments, a binning method may be used to convert continuous behavior characteristics whose values differ only slightly into categorical behavior characteristics, which are then encoded. Taking the reaction duration when the user first feeds back each item of information as an example: values below the lower edge (0.95) are classed as a fast initial reaction, values between the lower edge and the upper quartile (0.95-0.98) as a normal initial reaction, values between the upper quartile and the upper edge (0.98-1) as a slow initial reaction, and values above the upper edge (1) as a still slower initial reaction; the four categories are then one-hot encoded.
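The binning and one-hot encoding step might look like the following sketch. The cut points 0.95, 0.98, and 1 are the example values from the text; the function names, the short category labels (in particular the label for the fourth bin), and the treatment of values that fall exactly on a boundary are assumptions.

```python
from typing import List

# Category labels; "slower" is a placeholder name for the fourth bin (assumption).
CATEGORIES = ["fast", "normal", "slow", "slower"]


def bin_reaction(value: float, lower_edge: float = 0.95,
                 upper_quartile: float = 0.98, upper_edge: float = 1.0) -> str:
    """Map a continuous reaction-duration value to one of four categories."""
    if value < lower_edge:
        return "fast"
    if value < upper_quartile:
        return "normal"
    if value <= upper_edge:
        return "slow"
    return "slower"


def one_hot(category: str) -> List[int]:
    """Encode a category as a vector with a single 1 at the category's position."""
    return [1 if category == name else 0 for name in CATEGORIES]


encoded = [one_hot(bin_reaction(v)) for v in (0.90, 0.96, 0.99, 1.20)]
```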
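In some embodiments, at least one candidate behavior characteristic of the known user performing the information feedback operation is first generated from the behavior data of the known user performing the information feedback operation. The candidate behavior characteristics may specifically include: the reaction duration when the user first feeds back each item of information, the dwell time of the user on each information feedback page, the total number of times the user modifies each item of information, the total number of times the user reviews and modifies each item of information, the number of times each item of information does not match the reserved information, a force parameter of the user pressing the information feedback device, the duration for which the user presses the information feedback device, the area over which the user presses the information feedback device, and a pitch angle parameter and a swing angle parameter of the user holding the information feedback device. The reserved information may include the name, gender, birth date, and identification number of the user.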
Then, the correlation between each candidate behavior characteristic and the safety degree data of the known user is calculated. The correlation may be calculated using the Pearson correlation coefficient, which measures the linear correlation between variables and takes values in the range [-1, 1]: a value of -1 represents complete negative correlation, a value of 1 represents complete positive correlation, and a value of 0 represents no linear correlation. Finally, the candidate behavior characteristics whose correlation is greater than a preset value (for example, a Pearson correlation coefficient greater than 0.5) are taken as the behavior characteristics of the known user performing the information feedback operation. The behavior characteristics may specifically include the reaction duration when the user first feeds back each item of information, the total number of times the user modifies each item of information, the total number of times the user reviews and modifies each item of information, the number of times each item of information does not match the reserved information, a force parameter of the user pressing the information feedback device, a pitch angle parameter and a swing angle parameter of the user holding the information feedback device, and the like.
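The correlation-based selection can be sketched as follows. The Pearson coefficient is computed from its standard definition; the function names, the feature names, and the sample values are invented for illustration, and returning 0 for a constant column is an assumption.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant column as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def select_features(candidates: Dict[str, List[float]],
                    labels: List[float],
                    threshold: float = 0.5) -> List[str]:
    """Keep candidate features whose correlation with the labels exceeds the threshold."""
    return [name for name, values in candidates.items()
            if pearson(values, labels) > threshold]


labels = [1, 1, 1, 0, 0, 0]  # illustrative safety labels of six known users
candidates = {
    "modification_count": [0, 1, 0, 3, 4, 3],
    "reaction_s": [3.0, 2.8, 3.1, 1.0, 1.2, 0.9],
    "press_force": [0.5, 0.7, 0.6, 0.6, 0.5, 0.7],
}
selected = select_features(candidates, labels)
```

As written in the text, the comparison is on the signed coefficient, so a strongly negatively correlated feature such as `modification_count` here is not selected; comparing the absolute value instead would be a variant, not something the text states.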
In step S103, the behavior characteristics of the known user performing the information feedback operation are labeled by using the safety data of the known user.
For example, if the known user a is a white list user, the behavior characteristics of the information feedback operation executed by the known user a are labeled 1; and if the known user b is the blacklist user, marking 0 for the behavior characteristics of the information feedback operation executed by the known user b.
In step S104, the machine learning model is trained by using the labeled behavior characteristics of the known user performing the information feedback operation, so that the trained machine learning model can process the behavior data of the target user performing the information feedback operation, and obtain the safety data of the target user.
Those skilled in the art will appreciate that the machine learning model may specifically be an SVM (Support Vector Machine), a random forest, a LightGBM model, an XGBoost model, and so on. When training, a plurality of machine learning models can be trained separately, with AUC (Area Under Curve, the area enclosed by the receiver operating characteristic curve and the coordinate axes) used as the evaluation index of each model; the machine learning model with the highest AUC is then selected for subsequent data processing. Of course, every machine learning model may also be used for subsequent data processing.
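Selecting a model by AUC can be illustrated with the sketch below. It computes AUC as the fraction of (positive, negative) pairs that a model ranks correctly, which is equivalent to the area under the ROC curve. The model names are taken from the text, but the validation scores are made-up numbers and no actual model is trained here.

```python
from typing import Dict, List


def auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the share of positive/negative pairs ranked correctly; ties count 0.5."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [1, 1, 1, 0, 0, 0]  # 1 = whitelist user, 0 = blacklist user
validation_scores: Dict[str, List[float]] = {  # hypothetical model outputs
    "svm": [0.9, 0.4, 0.7, 0.5, 0.2, 0.1],
    "random_forest": [0.8, 0.7, 0.6, 0.3, 0.2, 0.1],
    "lightgbm": [0.6, 0.3, 0.8, 0.7, 0.4, 0.2],
}
auc_by_model = {name: auc(labels, s) for name, s in validation_scores.items()}
best_model = max(auc_by_model, key=auc_by_model.get)
```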
Further embodiments of the disclosed data processing method are described below in conjunction with fig. 2 to explain how data processing is performed using a pre-trained machine learning model.
Fig. 2 shows a flow diagram of a data processing method according to further embodiments of the present disclosure. As shown in fig. 2, the present embodiment includes steps S201 to S203.
In step S201, behavior data of the target user performing the information feedback operation is acquired.
The specific process of acquiring the behavior data of the target user for performing the information feedback operation may refer to step S101, and will not be described repeatedly herein.
In step S202, behavior characteristics of the target user performing the information feedback operation are generated using the behavior data.
For example, the time when the user enters each information feedback page and the time when the user feeds back each item of information may be used to determine the reaction duration when the user feeds back each item of information for the first time. For another example, the total times of the user modifying each item of information, the total times of the user reviewing each item of information, and the total times of the user reviewing and modifying each item of information may be determined by using the information identifiers and the information contents of each item of information fed back by the user.
In step S203, the behavior characteristics are processed by using a machine learning model trained in advance, and security data of the target user is obtained.
In some embodiments, step S203 comprises: processing the input behavior characteristics using, among a plurality of pre-trained machine learning models, the model with the largest area under the receiver operating characteristic (ROC) curve, and outputting the safety degree data of the target user.
In some embodiments, step S203 comprises: respectively processing the input behavior characteristics by utilizing various machine learning models trained in advance, and outputting a plurality of preliminary safety degree data of a target user; and carrying out weighting processing on the plurality of preliminary safety degree data to obtain the safety degree data of the target user. For example, the same weight or a weight corresponding to the AUC indicator may be used to perform weighting processing on multiple pieces of security data, so as to obtain the security data of the target user.
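The weighting step can be sketched as a normalized weighted average. The function name `combine`, the model names used as keys, and all numeric values are assumptions for illustration; the text only states that equal weights or weights corresponding to the AUC index may be used.

```python
from typing import Dict, Optional


def combine(preliminary: Dict[str, float],
            weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of per-model preliminary scores (equal weights by default)."""
    if weights is None:
        weights = {name: 1.0 for name in preliminary}
    total = sum(weights[name] for name in preliminary)
    return sum(preliminary[name] * weights[name] for name in preliminary) / total


preliminary = {"svm": 0.6, "random_forest": 0.9, "lightgbm": 0.3}  # made-up scores
auc_weights = {"svm": 0.8, "random_forest": 1.0, "lightgbm": 0.7}  # made-up AUC values

equal_weighted = combine(preliminary)
auc_weighted = combine(preliminary, auc_weights)
```

Dividing by the sum of the weights keeps the combined value on the same scale as the preliminary scores, whichever weighting is chosen.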
According to the embodiment, the behavior data of the user executing the information feedback operation is utilized, the behavior characteristics of the user executing the information feedback operation can be generated, and the safety degree of the user is predicted by adopting a machine learning method, so that the safety degree data of the user can be simply and efficiently obtained, complex operation and specific instruments are not needed, and the application range is wide.
How to obtain the behavior characteristics from the behavior data of a user performing an information feedback operation is described below. For ease of understanding, the information feedback operation is illustrated by the user answering choice questions. Those skilled in the art will appreciate that the information feedback operation is not limited to answering choice questions.
Three categories of verifiable choice questions can be generated using a database associated with the user's personal information: expected questions, namely choice questions directly related to the user's personal identity information, such as "Were you born on year X, month X, day X?"; unexpected questions, namely choice questions derived from the personal identity information, such as "Is your Chinese zodiac sign X?"; and control questions, namely choice questions on which the user would not lie, such as "Is your gender X?". The user feeds back information by clicking the "yes" or "no" option.
When the user enters an information feedback page, the time of entering the page and the number of the choice question are recorded; when the user selects an option of a question, the current operation time, the number of the current question, and the number of the selected option are recorded. Table 1 shows operation records of the user performing the information feedback operation.
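TABLE 1: operation records of the user performing the information feedback operation (rendered as images BDA0002197312640000101 and BDA0002197312640000111 in the original publication)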
(1) The number of times each item of information does not match the reserved information
For example, if the user's information base records "Zhang San, born in 1995", but Zhang San selects "No" for "Is your Chinese zodiac sign the Pig?", one mismatch is recorded.
(2) Total number of times the user modifies each item of information
Extract the operation records with the same question number from Table 1, count the records whose operation type is "feedback", and subtract 1 to obtain the number of modifications for that question.
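The counting rule for modifications can be sketched as below. Since Table 1 is not available as text, the record layout (a dict with `time`, `question`, `op`, and `option` fields) is an assumption made for illustration, as is the function name.

```python
from collections import Counter


def modification_counts(records):
    """Return {question number: number of modifications} from operation records."""
    feedback_per_question = Counter(
        rec["question"] for rec in records if rec["op"] == "feedback"
    )
    # The first feedback on a question is the initial answer, so subtract 1.
    return {q: n - 1 for q, n in feedback_per_question.items()}


# Hypothetical operation records; the field names are assumptions.
records = [
    {"time": 0.0, "question": 1, "op": "enter", "option": None},
    {"time": 2.0, "question": 1, "op": "feedback", "option": "yes"},
    {"time": 2.5, "question": 2, "op": "enter", "option": None},
    {"time": 4.0, "question": 2, "op": "feedback", "option": "no"},
    {"time": 5.0, "question": 2, "op": "feedback", "option": "yes"},
    {"time": 6.0, "question": 2, "op": "feedback", "option": "no"},
]
modifications = modification_counts(records)
```

Here question 1 was answered once (no modification) and question 2 was answered three times (two modifications).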
(3) Reaction duration when the user first feeds back each item of information
The initial reaction duration is the time from the user first entering a question page to the user first selecting an option. In the operation records, the pairs of adjacent records in which the earlier question number is smaller than the later question number are obtained, and the initial reaction duration of the item of information is the operation time of the later record minus the operation time of the earlier record.
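A minimal sketch of this feature follows. It implements the definition in the first sentence (first entry to first option selection) rather than the record-pair procedure, and it assumes a hypothetical record layout with `time`, `question`, and `op` fields, with `op` being either "enter" or "feedback".

```python
def first_reaction_durations(records):
    """Return {question number: time from first entry to first feedback}."""
    first_enter = {}
    first_feedback = {}
    for rec in sorted(records, key=lambda r: r["time"]):
        q = rec["question"]
        if rec["op"] == "enter":
            # setdefault keeps only the earliest entry time per question.
            first_enter.setdefault(q, rec["time"])
        elif rec["op"] == "feedback":
            first_feedback.setdefault(q, rec["time"])
    return {
        q: first_feedback[q] - first_enter[q]
        for q in first_enter
        if q in first_feedback
    }


# Hypothetical records: the user answers 1 and 2, then returns to 1.
records = [
    {"time": 0.0, "question": 1, "op": "enter"},
    {"time": 2.5, "question": 1, "op": "feedback"},
    {"time": 3.0, "question": 2, "op": "enter"},
    {"time": 4.5, "question": 2, "op": "feedback"},
    {"time": 5.0, "question": 1, "op": "enter"},
    {"time": 6.0, "question": 1, "op": "feedback"},
]
reaction = first_reaction_durations(records)
```

The later return to question 1 does not affect its initial reaction duration, because only the earliest entry and feedback times are kept.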
(4) Dwell time of the user on each information feedback page
This behavior feature is the total time the user stays on each choice question page, that is, the time from entering the page to leaving it. Those skilled in the art will appreciate that a user can enter a choice question page in two ways: by answering in order, or by going back to review. For each choice question, the operation records with the click type "enter" are extracted from Table 1, the operation time of each record is subtracted from the operation time of the next record to obtain the duration of that visit, and the durations are accumulated. After the traversal, for the last question, the difference between the time of last entering the last question and the time of last leaving it is added.
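The accumulation described above can be sketched as follows, assuming the "enter" records are available as `(time, question)` pairs and that the time of leaving the last page is supplied separately; both the data layout and the name `dwell_times` are hypothetical.

```python
from collections import defaultdict


def dwell_times(enter_records, leave_last_time):
    """Return {question number: total time spent on that question's page}."""
    enters = sorted(enter_records)  # chronological (time, question) pairs
    dwell = defaultdict(float)
    # Each visit lasts until the next page is entered.
    for (t, q), (t_next, _) in zip(enters, enters[1:]):
        dwell[q] += t_next - t
    if enters:
        # The final visit ends when the user leaves the last page.
        last_t, last_q = enters[-1]
        dwell[last_q] += leave_last_time - last_t
    return dict(dwell)


# Hypothetical session: question 1 is visited twice (answer, then review).
enter_records = [(0.0, 1), (5.0, 2), (9.0, 1), (12.0, 3)]
dwell = dwell_times(enter_records, leave_last_time=20.0)
```

Question 1 accumulates both visits (5 s and 3 s), which is why the feature differs from the initial reaction duration.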
(5) Total number of times the user reviews each item of information
This index records the number of times the user goes back to a previous choice question while answering; each return to a target question is counted once. The operation records with the click type "enter" are extracted from Table 1 to obtain a list of question numbers. Starting from the second element, whenever the previous question number is greater than the current question number and the next question number is also greater than the current question number, the review count is increased by one.
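The review rule can be sketched as a scan over the list of entered question numbers. The function name is hypothetical, and skipping the last element (which has no successor to compare against) is an assumption of this sketch.

```python
def review_count(entered_questions):
    """Count returns to an earlier question that are followed by moving forward."""
    count = 0
    # Start from the second element; stop before the last, which has no successor.
    for i in range(1, len(entered_questions) - 1):
        prev_q = entered_questions[i - 1]
        cur_q = entered_questions[i]
        next_q = entered_questions[i + 1]
        if prev_q > cur_q and next_q > cur_q:
            count += 1
    return count


# Hypothetical sequence of "enter" records: the user goes back to 2, then to 1.
entered = [1, 2, 3, 2, 3, 4, 1, 5]
reviews = review_count(entered)
```

In this sequence the user returns from question 3 to question 2 and from question 4 to question 1, giving two reviews.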
(6) Total number of times the user reviews and modifies each item of information
This index records the number of times the user goes back to a previous choice question and modifies it while answering; each return followed by a modification is counted once. In the user's operation records, starting from the second element, whenever the previous question number is greater than the current question number and the next question number is equal to the current question number, the review-and-modify count is increased by one.
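A sketch of the review-and-modify rule follows, assuming the input is the sequence of question numbers taken from all operation records (both "enter" and "feedback") in chronological order; the function name and sample sequences are hypothetical.

```python
def review_and_modify_count(question_sequence):
    """Count returns to an earlier question that are followed by a new answer."""
    count = 0
    for i in range(1, len(question_sequence) - 1):
        prev_q = question_sequence[i - 1]
        cur_q = question_sequence[i]
        next_q = question_sequence[i + 1]
        # Going back (prev > cur) and then acting on the same question (next == cur).
        if prev_q > cur_q and next_q == cur_q:
            count += 1
    return count


# enter 1, answer 1, enter 2, answer 2, back to 1, answer 1 again, enter 3, answer 3
modified_after_review = review_and_modify_count([1, 1, 2, 2, 1, 1, 3, 3])
# Same session, but the user only looks at question 1 without answering again.
reviewed_only = review_and_modify_count([1, 1, 2, 2, 1, 3, 3])
```

The second sequence shows the difference from a plain review: the return to question 1 is not followed by another record for question 1, so nothing is counted.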
(7) Duration for which the user presses the information feedback device
If the information feedback device has a touch screen, the time at which the user presses an option button and the time at which the user releases it can be recorded by overriding the button control; the pressing duration of the touch is the difference between the two times.
(8) Area of the information feedback device pressed by the user
Assuming that the information feedback device has a touch screen, the size of the contact area between the user's finger and the screen can be obtained through the official Android API, using the getSize method provided by MotionEvent.
(9) Force parameter of the user pressing the information feedback device
From the pressing force values recorded each time the user touches the screen, the variance can further be calculated to measure how much the force fluctuates.
(10) Pitch angle parameter, roll angle parameter, and the like, of the user holding the information feedback device
The pitch angle, roll angle, and rotation angle of the handheld mobile phone are obtained from the sensors of the user's mobile phone. In addition, the kurtosis, the skewness, the first-order difference, the second-order difference, and the like can be further extracted for each angle.
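The statistics mentioned for the force and angle readings (variance, skewness, kurtosis, first-order and second-order differences) can be sketched with the standard library. The source does not say which estimator is used, so the population (biased) moment definitions and non-excess kurtosis below are assumptions, as are the function names and sample readings.

```python
from statistics import fmean, pvariance


def differences(values, order=1):
    """Return the order-th difference of a sequence (order=1: x[i+1] - x[i])."""
    out = list(values)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def standardized_moment(values, k):
    """k-th standardized central moment: k=3 is skewness, k=4 is kurtosis."""
    mean = fmean(values)
    variance = pvariance(values, mean)
    if variance == 0:
        # A constant series has no spread; returning 0.0 is a convention here.
        return 0.0
    return fmean([(v - mean) ** k for v in values]) / variance ** (k / 2)


# Hypothetical pitch-angle readings (degrees) sampled while the user answers.
pitch = [10.0, 12.0, 11.0, 15.0]
features = {
    "variance": pvariance(pitch),
    "skewness": standardized_moment(pitch, 3),
    "kurtosis": standardized_moment(pitch, 4),
    "first_difference": differences(pitch, 1),
    "second_difference": differences(pitch, 2),
}
```

The same helpers apply unchanged to a series of pressing force values or roll-angle readings.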
Some embodiments of the disclosed data processing apparatus are described below in conjunction with fig. 3.
Fig. 3 shows a schematic structural diagram of a data processing apparatus according to some embodiments of the present disclosure. As shown in fig. 3, the data processing apparatus 30 in the present embodiment includes:
a data obtaining module 302 configured to obtain behavior data of a target user performing an information feedback operation; a feature generation module 304 configured to generate behavior characteristics of the target user performing the information feedback operation by using the behavior data; and a model processing module 306 configured to process the behavior characteristics by using a pre-trained machine learning model to obtain the safety degree data of the target user.
In some embodiments, the data processing apparatus 30 further includes a model training module 301 configured to: acquire behavior data of a known user performing the information feedback operation; generate behavior characteristics of the known user performing the information feedback operation by using that behavior data; label the behavior characteristics of the known user with the safety degree data of the known user; and train the machine learning model with the labeled behavior characteristics, so that the trained machine learning model can process the behavior data of the target user performing the information feedback operation and obtain the safety degree data of the target user.
In some embodiments, the model training module 301 is configured to: generating at least one candidate behavior characteristic of the information feedback operation executed by the known user by utilizing the behavior data of the information feedback operation executed by the known user; calculating the correlation between the candidate behavior characteristics and the safety data of the known user; and taking the candidate behavior characteristics with the correlation degree larger than the preset value as the behavior characteristics of the known user for executing the information feedback operation.
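The correlation-based screening of candidate behavior characteristics can be sketched as below. The source does not name the correlation measure, so the Pearson coefficient and the use of its absolute value are assumptions, as are the function names, feature names, and threshold.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / (spread_x * spread_y)


def select_features(candidates, labels, threshold):
    """Keep candidate features whose |correlation| with the labels exceeds threshold."""
    return [
        name
        for name, values in candidates.items()
        if abs(pearson(values, labels)) > threshold
    ]


# Hypothetical safety degree labels of four known users (1 and 0 are class labels).
labels = [0, 0, 1, 1]
candidates = {
    "reaction_duration": [1.0, 2.0, 3.0, 4.0],  # rises with the label
    "press_area": [1.0, 2.0, 1.0, 2.0],         # unrelated to the label
}
selected = select_features(candidates, labels, threshold=0.3)
```

Only the characteristics that pass the threshold would then be used to train the machine learning model.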
In some embodiments, the model processing module 306 is configured to: process the input behavior characteristics by using, among multiple pre-trained machine learning models, the machine learning model with the largest area under the receiver operating characteristic curve (AUC), and output the safety degree data of the target user.
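Selecting the model with the largest AUC can be sketched as follows. The AUC is computed here with the pairwise-ranking formulation (the fraction of positive/negative pairs in which the positive sample scores higher, with ties counted as one half); the model names, validation scores, and labels are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive sample outranks a negative one."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def best_model_by_auc(validation_scores, labels):
    """Return the name of the model with the largest AUC, plus all AUC values."""
    aucs = {name: roc_auc(labels, s) for name, s in validation_scores.items()}
    return max(aucs, key=aucs.get), aucs


# Hypothetical validation labels (1 marks the positive class) and model outputs.
labels = [1, 0, 1, 0]
validation_scores = {
    "model_a": [0.9, 0.1, 0.8, 0.3],
    "model_b": [0.4, 0.6, 0.7, 0.2],
}
best_name, aucs = best_model_by_auc(validation_scores, labels)
```

The selected model alone would then be applied to the behavior characteristics of the target user.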
In some embodiments, the model processing module 306 is configured to: process the input behavior characteristics separately with multiple pre-trained machine learning models, and output a plurality of preliminary safety degree data of the target user; and weight the plurality of preliminary safety degree data to obtain the safety degree data of the target user.
In some embodiments, the behavior characteristics include at least one of the following: the reaction duration when the user first feeds back each item of information, the total number of times the user modifies each item of information, the total number of times the user reviews and modifies each item of information, the number of times each item of information does not match the reserved information, a force parameter of the user pressing the information feedback device, and a pitch angle parameter and a roll angle parameter of the user holding the information feedback device.
In some embodiments, the behavioral data includes at least one of the following: the time when the user enters each information feedback page, the time when the user feeds back each item of information, the information identification and the information content when the user feeds back each item of information, the touch parameter when the user presses the information feedback device, and the angle parameter when the user holds the information feedback device.
In some embodiments, the feature generation module 304 is configured to: determine the reaction duration when the user first feeds back each item of information by using the time when the user enters each information feedback page and the time when the user feeds back each item of information; or determine the total number of times the user modifies each item of information, the total number of times the user reviews each item of information, and the total number of times the user reviews and modifies each item of information by using the information identification and the information content of each item of information fed back by the user.
In some embodiments, the candidate behavior characteristics include at least one of the following: the reaction duration when the user first feeds back each item of information, the dwell time of the user on each information feedback page, the total number of times the user modifies each item of information, the total number of times the user reviews and modifies each item of information, the number of times each item of information does not match the reserved information, a force parameter of the user pressing the information feedback device, the duration for which the user presses the information feedback device, the area of the information feedback device pressed by the user, and a pitch angle parameter and a roll angle parameter of the user holding the information feedback device.
In some embodiments, the data processing apparatus 30 further includes a data preprocessing module 303 configured to preprocess the behavior data using at least one of the following methods: removing behavior data whose number of distinct value categories is lower than a first threshold; removing behavior data whose missing rate is higher than a second threshold; and filling behavior data whose missing rate is lower than the second threshold with the mode or the mean.
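The three preprocessing rules can be sketched on column-oriented data as below. The representation (a dict of columns with `None` marking a missing value), the reading of the first rule as a count of distinct values, and the function and parameter names are assumptions made for illustration.

```python
from statistics import fmean, mode


def preprocess(columns, min_distinct, max_missing_rate, fill="mean"):
    """Drop uninformative or sparse columns, then fill remaining gaps."""
    cleaned = {}
    for name, values in columns.items():
        if not values:
            continue
        present = [v for v in values if v is not None]
        missing_rate = 1 - len(present) / len(values)
        if missing_rate > max_missing_rate:
            continue  # too many missing values
        if len(set(present)) < min_distinct:
            continue  # too few distinct values to be informative
        filler = fmean(present) if fill == "mean" else mode(present)
        cleaned[name] = [filler if v is None else v for v in values]
    return cleaned


# Hypothetical behavior data for four sessions.
columns = {
    "reaction_duration": [1.0, None, 3.0, 2.0],  # one gap, filled with the mean
    "constant_flag": [5, 5, 5, 5],               # a single distinct value
    "press_area": [None, None, None, 1.0],       # 75% missing
}
cleaned = preprocess(columns, min_distinct=2, max_missing_rate=0.5)
```

Passing `fill="mode"` fills gaps with the most frequent value instead, which suits categorical behavior data.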
In this embodiment, behavior characteristics are generated from the behavior data of the user performing the information feedback operation, and the safety degree of the user is predicted with a machine learning method. The safety degree data of the user can therefore be obtained simply and efficiently, without complex operations or special instruments, so the approach is widely applicable.
Further embodiments of the data processing apparatus of the present disclosure are described below in conjunction with fig. 4.
Fig. 4 shows a schematic structural diagram of a data processing apparatus according to further embodiments of the present disclosure. As shown in fig. 4, the data processing apparatus 40 of this embodiment includes: a memory 410 and a processor 420 coupled to the memory 410, the processor 420 being configured to perform the data processing method of any of the foregoing embodiments based on instructions stored in the memory 410.
Memory 410 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, an application program, a boot loader, and other programs.
The data processing apparatus 40 may further include an input/output interface 430, a network interface 440, a storage interface 450, and the like. These interfaces 430, 440, 450, the memory 410, and the processor 420 may be connected, for example, via a bus 460. The input/output interface 430 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 440 provides a connection interface for various networking devices. The storage interface 450 provides a connection interface for external storage devices such as an SD card and a USB flash drive.
The present disclosure also includes a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement a data processing method in any of the foregoing embodiments.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only exemplary of the present disclosure and is not intended to limit the present disclosure, so that any modification, equivalent replacement, or improvement made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (12)

1. A method of data processing, comprising:
acquiring behavior data of a target user for executing information feedback operation;
generating behavior characteristics of information feedback operation executed by a target user by using the behavior data, wherein the information feedback operation is executed to answer a question, and the behavior characteristics comprise the total times of modifying various information by the user;
processing the behavior characteristics by utilizing a pre-trained machine learning model to obtain safety degree data of the target user, comprising:
respectively processing the input behavior characteristics by utilizing multiple pre-trained machine learning models, and outputting a plurality of preliminary safety degree data of the target user;
and carrying out weighting processing on the plurality of preliminary safety degree data to obtain the safety degree data of the target user, wherein the safety degree data of the user represents whether the user lies when answering the question.
2. The data processing method of claim 1, further comprising:
acquiring behavior data of a known user for executing information feedback operation;
generating behavior characteristics of the known user for executing the information feedback operation by utilizing behavior data of the known user for executing the information feedback operation;
marking the behavior characteristics of the known user for executing the information feedback operation by using the safety data of the known user;
and training the machine learning model by using the marked behavior characteristics of the known user for executing the information feedback operation, so that the trained machine learning model can process the behavior data of the target user for executing the information feedback operation, and the safety data of the target user is obtained.
3. The data processing method of claim 2, wherein the generating behavior characteristics of the known users performing the information feedback operations using the behavior data of the known users performing the information feedback operations comprises:
generating at least one candidate behavior characteristic of the information feedback operation executed by the known user by utilizing the behavior data of the information feedback operation executed by the known user;
calculating the correlation degree between the candidate behavior characteristics and the safety degree data of the known user;
and taking the candidate behavior characteristics with the correlation degree larger than a preset value as behavior characteristics of the known user for executing information feedback operation.
4. The data processing method of claim 1, wherein the processing the behavior features using a pre-trained machine learning model to obtain the safety data of the target user comprises:
processing the input behavior characteristics by utilizing, among multiple pre-trained machine learning models, the machine learning model with the largest area under the receiver operating characteristic curve (AUC), and outputting the safety degree data of the target user.
5. The data processing method of any of claims 1 to 4, wherein the behavioral characteristics comprise at least one of:
the reaction duration when the user first feeds back each item of information, the total number of times the user reviews each item of information, the total number of times the user reviews and modifies each item of information, the number of times each item of information does not match the reserved information, a force parameter of the user pressing the information feedback device, and a pitch angle parameter and a roll angle parameter of the user holding the information feedback device.
6. The data processing method of any of claims 1 to 4, wherein the behavioural data comprises at least one of:
the time when the user enters each information feedback page, the time when the user feeds back each item of information, the information identification and the information content when the user feeds back each item of information, the touch parameter when the user presses the information feedback device, and the angle parameter when the user holds the information feedback device.
7. The data processing method of claim 1, wherein the generating behavior characteristics of the target user performing the information feedback operation using the behavior data comprises:
determining the reaction duration when the user primarily feeds back each item of information by using the time when the user enters each information feedback page and the time when the user feeds back each item of information;
or,
and determining the total times of modifying the information by the user, the total times of reviewing the information by the user, and the total times of reviewing and modifying the information by the user by using the information identification and the information content of the information fed back by the user.
8. The data processing method of claim 3, wherein the candidate behavioral features comprise at least one of:
the reaction duration when the user first feeds back each item of information, the dwell time of the user on each information feedback page, the total number of times the user modifies each item of information, the total number of times the user reviews each item of information, the total number of times the user reviews and modifies each item of information, the number of times each item of information does not match the reserved information, a force parameter of the user pressing the information feedback device, the duration for which the user presses the information feedback device, the area of the information feedback device pressed by the user, and a pitch angle parameter and a roll angle parameter of the user holding the information feedback device.
9. The data processing method of claim 1, further comprising:
prior to generating the behavioral characteristics, preprocessing the behavioral data using at least one of the following methods:
removing behavior data whose number of distinct value categories is lower than a first threshold;
removing behavior data whose missing rate is higher than a second threshold;
and filling the behavior data with a mode or average number, wherein the missing rate of the behavior data is lower than a second threshold value.
10. A data processing apparatus comprising:
the data acquisition module is configured to acquire behavior data of a target user for executing information feedback operation;
the characteristic generating module is configured to generate behavior characteristics of information feedback operations executed by a target user by using the behavior data, wherein the information feedback operations are executed to answer questions, and the behavior characteristics comprise the total times of modifying various information by the user;
a model processing module configured to process the behavior characteristics by using a pre-trained machine learning model to obtain safety degree data of the target user, including:
respectively processing the input behavior characteristics by utilizing multiple pre-trained machine learning models, and outputting a plurality of preliminary safety degree data of the target user;
and carrying out weighting processing on the plurality of preliminary safety degree data to obtain the safety degree data of the target user, wherein the safety degree data of the user represents whether the user lies when answering the question.
11. A data processing apparatus comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the data processing method of any of claims 1 to 9 based on instructions stored in the memory.
12. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions which, when executed by a processor, implement a data processing method as claimed in any one of claims 1 to 9.
CN201910852637.6A 2019-09-10 2019-09-10 Data processing method, data processing apparatus, and computer-readable storage medium Active CN110569906B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910852637.6A CN110569906B (en) 2019-09-10 2019-09-10 Data processing method, data processing apparatus, and computer-readable storage medium
PCT/CN2020/110537 WO2021047376A1 (en) 2019-09-10 2020-08-21 Data processing method, data processing apparatus and related devices

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910852637.6A CN110569906B (en) 2019-09-10 2019-09-10 Data processing method, data processing apparatus, and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN110569906A CN110569906A (en) 2019-12-13
CN110569906B true CN110569906B (en) 2022-08-09

Family

ID=68778773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910852637.6A Active CN110569906B (en) 2019-09-10 2019-09-10 Data processing method, data processing apparatus, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN110569906B (en)
WO (1) WO2021047376A1 (en)

Also Published As

Publication number Publication date
CN110569906A (en) 2019-12-13
WO2021047376A1 (en) 2021-03-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 221, 2nd floor, Block C, 18 Kechuang 11th Street, Beijing Daxing District, Beijing

Applicant after: Jingdong Digital Technology Holding Co.,Ltd.

Address before: Room 221, 2nd floor, Block C, 18 Kechuang 11th Street, Beijing Daxing District, Beijing

Applicant before: JINGDONG DIGITAL TECHNOLOGY HOLDINGS Co.,Ltd.

CB02 Change of applicant information

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: Jingdong Digital Technology Holding Co.,Ltd.

GR01 Patent grant