WO2019196534A1 - Human-machine recognition method and apparatus for verification codes - Google Patents
- Publication number
- WO2019196534A1 (international application PCT/CN2019/072354)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- data
- machine learning
- verification code
- learning model
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
- G06F21/31—User authentication
- G06F21/316—User authentication by observing the pattern of computer usage, e.g. typical user behaviour
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
- G06F21/31—User authentication
- G06F21/36—User authentication by graphic or iconic representation
Definitions
- The present disclosure relates mainly to the technical field of machine learning, and more particularly to a human-machine recognition method and apparatus for verification codes.
- Human-machine recognition is a public Turing test that determines whether a registrant is a normal user or an abnormal user; it is a fully automated security measure that distinguishes computers from humans.
- An abnormal user, that is, a computer or machine, can request logins by repeatedly accessing a website and can simulate a normal user's input of the verification code in order to attack the website's services. For large websites, it is therefore crucial to identify whether a login request was initiated by a normal user or an abnormal user, so as to defend against such attacks.
- CAPTCHA, short for "Completely Automated Public Turing test to tell Computers and Humans Apart", is a public, fully automatic program that distinguishes whether a user is a computer or a human. It can automatically prevent malicious users from using a specific program to make continuous login attempts on a website.
- One existing method for identifying whether a registrant is a normal user or an abnormal user is to build a model of user browsing behavior, such as a Hidden Semi-Markov Model (HsMM), from data obtained from server logs in order to monitor whether user access is normal.
- Such a model is usually a statistical model, with relatively low accuracy and slow recognition.
- The present invention therefore proposes a method for performing human-machine recognition with a machine learning model.
- Machine learning is a branch of artificial intelligence. Its main purpose is to let a computer automatically "learn" rules from large amounts of past experience or data through an algorithm, so that it can predict or reason about future data.
- In a first aspect, an embodiment of the present invention provides a human-machine recognition method for a verification code, including: collecting real-time user data when a first user inputs a verification code; and predicting on the real-time user data with a machine learning model to determine an attribute of the first user. The machine learning model is obtained by training on a sample data set that includes one or more sets of training sample data and a label set for each set of training sample data, the label indicating an attribute of a second user.
- The training sample data includes at least one of the following: behavior data of the second user, risk data of the second user, and terminal information data of the second user. The real-time user data includes at least one of the following: behavior data of the first user, risk data of the first user, and terminal information data of the first user.
- The verification code is a slider verification code.
- The behavior data of the second user includes mouse movement trajectory data from before and after the second user drags the slider verification code.
- The risk data of the second user includes identity data and/or credit data of the second user.
- The terminal information data of the second user includes at least one of user agent data, a device fingerprint, and an IP address.
- The behavior data of the first user includes mouse movement trajectory data from before and after the first user drags the slider verification code.
- The risk data of the first user includes identity data and/or credit data of the first user.
- The terminal information data of the first user includes at least one of user agent data, a device fingerprint, and an IP address.
- The attribute of the first user indicates whether the first user is a normal user or an abnormal user.
- The method of the first aspect further includes: collecting a sample data set; and training the machine learning model with the sample data set.
- The method of the first aspect further includes: adjusting the machine learning model by using the real-time user data as new training sample data.
- Training the machine learning model with the sample data set includes: performing feature engineering on each of the one or more sets of training sample data to obtain one or more sets of sample features; and determining the parameters of the machine learning model from the one or more sets of sample features and the label corresponding to each set of training sample data.
- Predicting on the real-time user data with the machine learning model includes: performing feature engineering on the real-time user data to obtain real-time user features, and predicting on the real-time user features with the machine learning model.
- The machine learning model is an XGBoost model.
- In a second aspect, an embodiment of the present invention provides a human-machine recognition apparatus for a verification code, including: an acquisition module configured to collect real-time user data when a first user inputs a verification code; and a prediction module configured to predict on the real-time user data according to a machine learning model to determine an attribute of the first user.
- The machine learning model is obtained by training on a sample data set.
- The sample data set includes one or more sets of training sample data and a label set for each set of training sample data, the label indicating an attribute of a second user.
- The training sample data includes at least one of the following: behavior data of the second user, risk data of the second user, and terminal information data of the second user. The real-time user data includes at least one of the following: behavior data of the first user, risk data of the first user, and terminal information data of the first user.
- The verification code is a slider verification code.
- The behavior data of the second user includes mouse movement trajectory data from before and after the second user drags the slider verification code.
- The risk data of the second user includes identity data and/or credit data of the second user.
- The terminal information data of the second user includes at least one of user agent data, a device fingerprint, and an IP address.
- The behavior data of the first user includes mouse movement trajectory data from before and after the first user drags the slider verification code.
- The risk data of the first user includes identity data and/or credit data of the first user.
- The terminal information data of the first user includes at least one of user agent data, a device fingerprint, and an IP address.
- The attribute of the first user indicates whether the first user is a normal user or an abnormal user.
- The apparatus of the second aspect further includes: a collection module configured to collect a sample data set; and a training module configured to train the machine learning model with the sample data set.
- The apparatus of the second aspect further includes: an adjustment module configured to adjust the machine learning model by using the real-time user data as new training sample data.
- The training module is configured to perform feature engineering on each of the one or more sets of training sample data to obtain one or more sets of sample features, and to determine the parameters of the machine learning model from the one or more sets of sample features and the label corresponding to each set of training sample data.
- The prediction module is configured to perform feature engineering on the real-time user data to obtain real-time user features, and to predict on the real-time user features with the machine learning model.
- The machine learning model is an XGBoost model.
- An embodiment of the present invention provides a computer device, including: a processor; and a storage device storing computer instructions that, when executed by the processor, cause the processor to perform the human-machine recognition method for a verification code described in the first aspect.
- An embodiment of the present invention provides a computer-readable storage medium storing computer instructions that, when executed by a processor, cause the processor to perform the human-machine recognition method for a verification code of the first aspect.
- Embodiments of the present invention provide a human-machine recognition method and apparatus for a verification code.
- By using the trained machine learning model to predict on real-time user data during the verification-code verification stage, it is possible to accurately identify whether a user is a normal user, and thus to intercept abnormal users.
- Conventional statistical models can only handle a small amount of data with a narrow range of data attributes, whereas in embodiments of the present invention a larger amount of sample data can be processed when training the machine learning model, which improves the reliability and accuracy of prediction compared with the conventional method.
- FIG. 1 is a schematic flowchart of a human-machine recognition method for a verification code according to an embodiment of the invention.
- FIG. 2 is a schematic flowchart of a human-machine recognition method for a verification code according to another embodiment of the present invention.
- FIG. 3 is a schematic flowchart of a method for training a machine learning model according to an embodiment of the invention.
- FIG. 4 is a schematic flowchart of a method for predicting real-time user data according to an embodiment of the invention.
- FIG. 5 is a schematic structural diagram of a human-machine recognition apparatus for a verification code according to an embodiment of the present invention.
- FIG. 6 is a block diagram of a computer device for human-computer recognition of a verification code, according to an exemplary embodiment of the present invention.
- A slider verification code is a type of verification code that, during the verification stage, requires the user to drag a slider to a specified position in order to pass verification.
- Here, the verification code is a slider verification code.
- The present invention provides a human-machine recognition method for verification codes that can establish an accurate and robust user identification model in the verification-code verification stage.
- FIG. 1 is a schematic flowchart of a human-machine recognition method for a verification code according to an embodiment of the invention. As shown in FIG. 1, the method includes the following.
- 110: Collect real-time user data when a first user inputs a verification code.
- 120: Predict on the real-time user data according to a machine learning model to determine an attribute of the first user. The machine learning model is obtained by training on a sample data set that includes one or more sets of training sample data and a label set for each set of training sample data, the label indicating an attribute of a second user.
- The first user may be the user whose verification-code input is actually being identified with the machine learning model.
- The second user may be a user corresponding to the sample data set.
- The label corresponding to each set of training sample data may represent the attribute of the second user who generated that set.
- The collected one or more sets of training sample data, together with the label corresponding to each set, are collectively referred to as the sample data set.
- Embodiments of the present invention provide a human-machine recognition method for verification codes.
- By using the trained machine learning model to predict on real-time user data during the verification-code verification stage, it is possible to accurately identify whether a user is a normal user and thereby intercept abnormal users.
- In addition, the machine learning model used in embodiments of the present invention can run in parallel on multiple CPU threads, which also increases prediction speed.
- The attribute of the second user indicates whether the second user is a normal user or an abnormal user.
- A normal user means that the entity entering the verification code is a person, and an abnormal user means that the entity entering the verification code is a computer or the like.
- Training sample data of normal users may be used as negative samples with the label set to 0, and sample data of abnormal users may be used as positive samples with the label set to 1.
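- The 0/1 labelling convention above can be sketched in Python as follows. This is a minimal illustration, assuming each set of training sample data has already been reduced to a numeric feature list; the names `TrainingSample` and `build_sample_data_set` are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

NORMAL_LABEL = 0    # normal user (a person): negative sample
ABNORMAL_LABEL = 1  # abnormal user (a computer or program): positive sample


@dataclass
class TrainingSample:
    """One set of training sample data from a second user (hypothetical structure)."""
    features: List[float]  # sample features after feature engineering
    is_abnormal: bool      # known attribute of the second user who produced the data


def build_sample_data_set(
    samples: Sequence[TrainingSample],
) -> Tuple[List[List[float]], List[int]]:
    """Split samples into feature rows X and 0/1 labels y, one label per set."""
    X = [list(s.features) for s in samples]
    y = [ABNORMAL_LABEL if s.is_abnormal else NORMAL_LABEL for s in samples]
    return X, y


X, y = build_sample_data_set([
    TrainingSample(features=[1.2, 0.4], is_abnormal=False),
    TrainingSample(features=[0.1, 9.0], is_abnormal=True),
])
print(y)  # [0, 1]
```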
- The attribute of the first user may likewise indicate whether the first user is a normal user or an abnormal user.
- In that case, determining the attribute of the first user means determining whether the first user is a normal user or an abnormal user.
- Alternatively, the attribute of the first user and the attribute of the second user may have other meanings set according to the prediction goal.
- The real-time user data includes at least one of the following: behavior data of the first user, risk data of the first user, and terminal information data of the first user.
- The training sample data includes at least one of the following: behavior data of the second user, risk data of the second user, and terminal information data of the second user.
- The behavior data of the first user may include the movement trajectory and/or click behavior of the mouse operated by the first user, and the like.
- The risk data of the first user may include identity information and/or credit data of the first user, and the like.
- The user's terminal information data may include at least one of User-Agent data, a device fingerprint, and the client IP address.
- The behavior data, risk data, and terminal information data of the second user are similar to those of the first user and, to avoid repetition, are not described again here.
- The risk data and terminal information data of potential abnormal users can be obtained from data providers or shared information systems.
- The verification code is a slider verification code.
- The behavior data of the first user includes mouse movement trajectory data from before and after the first user drags the slider verification code.
- The behavior data of the second user includes mouse movement trajectory data from before and after the second user drags the slider verification code.
- The mouse movement trajectory data includes the abscissa, ordinate, timestamp, and number of retries of each mouse movement.
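- One possible in-memory representation of the trajectory fields listed above is sketched below. The field names, the pixel and millisecond units, and the raw `[x, y, t_ms, retries]` row format are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass(frozen=True)
class MouseEvent:
    """One mouse movement sample (field names and units are assumptions)."""
    x: float      # abscissa of the pointer, in pixels
    y: float      # ordinate of the pointer, in pixels
    t_ms: int     # timestamp, in milliseconds
    retries: int  # number of retries of the slider drag at this point


def parse_trajectory(rows: Iterable[Sequence[float]]) -> List[MouseEvent]:
    """Turn raw [x, y, t_ms, retries] rows (hypothetical log format) into time-ordered events."""
    events = [MouseEvent(float(r[0]), float(r[1]), int(r[2]), int(r[3])) for r in rows]
    return sorted(events, key=lambda e: e.t_ms)


trajectory = parse_trajectory([[12, 40, 1200, 0], [10, 40, 1100, 0]])
print([e.t_ms for e in trajectory])  # [1100, 1200]
```

- Sorting by timestamp guards against log rows arriving out of order, which matters for any later speed or interval computation.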
- The verification code may also be another form of verification code, such as a text or picture verification code.
- The training sample data may also be other data, such as the second user's identity information and credit data.
- The method further includes: collecting a sample data set; and training the machine learning model with the sample data set.
- Each set of training sample data refers to all the relevant data the computer obtains when a given second user logs in.
- The mouse movement trajectory data of one or more normal users and/or abnormal users before and after dragging the slider verification code, together with the terminal information data of the second users, may be collected by a log server and used to build the model.
- Normal users and/or abnormal users may also be simulated logging in to the website and dragging the slider verification code, so that the computer obtains the mouse movement trajectory data.
- Training the machine learning model with the sample data set includes: performing feature engineering on each of the one or more sets of training sample data to obtain one or more sets of sample features; and determining the parameters of the machine learning model from the one or more sets of sample features and the label corresponding to each set of training sample data.
- Feature engineering means extracting as many features as possible from the collected raw data, so as to obtain a more comprehensive, richer, and multi-faceted representation of the raw data for the model to use.
- Feature engineering may include selecting features that are highly correlated with the target, reducing or increasing the dimensionality of the data, and converting the raw data into numerical form.
- The feature engineering step may also be omitted.
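- As an illustration of selecting features that are highly correlated with the target, the sketch below keeps feature columns whose absolute Pearson correlation with the 0/1 label reaches a threshold. Pearson filtering and the default threshold of 0.3 are assumptions; the patent does not name a specific selection criterion.

```python
from math import sqrt
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant column carries no signal about the label
    return cov / sqrt(va * vb)


def select_correlated_features(
    X: Sequence[Sequence[float]], y: Sequence[int], min_abs_r: float = 0.3  # assumed threshold
) -> List[int]:
    """Indices of feature columns whose |correlation with the label| >= min_abs_r."""
    keep = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        if abs(pearson(column, y)) >= min_abs_r:
            keep.append(j)
    return keep


X = [[0, 5, 1], [1, 5, 0], [0, 5, 1], [1, 5, 0]]
y = [0, 1, 0, 1]
print(select_correlated_features(X, y))  # [0, 2]
```

- A negatively correlated column (index 2 here) is kept as well, because a tree model can split on it just as usefully.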
- The log server collects mouse movement trajectory data of one or more normal users and/or abnormal users before and after dragging the slider verification code, as well as the terminal information data of the second users.
- From the mouse movement trajectory data, such as the abscissa, ordinate, timestamp, and number of retries of each mouse movement, the following features are extracted: the elapsed time of the mouse movement; for horizontal movement, the distance moved, maximum distance, average speed, maximum speed, and speed variance; for vertical movement, the distance moved, maximum distance, average speed, maximum speed, and speed variance; the number of sliding attempts; and the time interval before sliding starts.
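- The trajectory features listed above could be computed roughly as follows. This is a sketch under stated assumptions: events are `(x, y, t_ms, retries)` tuples, "distance" is taken as the sum of per-step absolute displacements, "maximum distance" as the largest single step, and the number of sliding attempts as the highest retry count plus one. The patent does not define these quantities precisely.

```python
from statistics import pvariance
from typing import Dict, List, Sequence, Tuple

Event = Tuple[float, float, int, int]  # (x, y, t_ms, retries); a hypothetical raw format


def _axis_features(deltas: List[float], dts: List[int], elapsed_ms: int, axis: str) -> Dict[str, float]:
    """Distance and speed statistics along one axis (horizontal or vertical)."""
    steps = [abs(float(d)) for d in deltas]
    # per-step speed in pixels per second; steps with zero time difference are skipped
    speeds = [s * 1000.0 / dt for s, dt in zip(steps, dts) if dt > 0]
    total = sum(steps)
    return {
        f"{axis}_distance": total,                        # assumption: sum of step lengths
        f"{axis}_max_distance": max(steps, default=0.0),  # assumption: largest single step
        f"{axis}_avg_speed": total * 1000.0 / elapsed_ms if elapsed_ms > 0 else 0.0,
        f"{axis}_max_speed": max(speeds, default=0.0),
        f"{axis}_speed_var": pvariance(speeds) if len(speeds) > 1 else 0.0,
    }


def trajectory_features(events: Sequence[Event]) -> Dict[str, float]:
    ev = sorted(events, key=lambda e: e[2])
    if len(ev) < 2:
        raise ValueError("need at least two mouse events")
    pairs = list(zip(ev, ev[1:]))
    dts = [b[2] - a[2] for a, b in pairs]
    dxs = [b[0] - a[0] for a, b in pairs]
    dys = [b[1] - a[1] for a, b in pairs]
    elapsed_ms = ev[-1][2] - ev[0][2]
    # index of the first step in which the pointer actually moves
    first_move = next((i for i, (dx, dy) in enumerate(zip(dxs, dys)) if dx or dy), None)
    features: Dict[str, float] = {"elapsed_ms": float(elapsed_ms)}
    features.update(_axis_features(dxs, dts, elapsed_ms, "x"))
    features.update(_axis_features(dys, dts, elapsed_ms, "y"))
    # assumption: attempts = highest retry count seen + 1
    features["slide_attempts"] = float(max(e[3] for e in ev) + 1)
    features["time_before_slide_ms"] = float(
        ev[first_move][2] - ev[0][2] if first_move is not None else elapsed_ms
    )
    return features


f = trajectory_features([(0, 0, 0, 0), (0, 0, 100, 0), (10, 0, 200, 0), (30, 5, 300, 0)])
print(f["x_distance"], f["x_max_speed"], f["time_before_slide_ms"])  # 30.0 200.0 100.0
```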
- From the collected terminal information data, the following features are extracted: user agent data, device fingerprint data, and the IP address.
- The user agent data may include browser-related attributes such as the operating system and its version, the CPU type, the browser and its version, the browser language, and browser plug-ins.
- The device fingerprint data may include feature information that identifies the device, such as the device's hardware ID, a mobile phone's IMEI, the network card's MAC address, and font settings.
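- Terminal information is mostly strings, so it needs numerical processing before a tree model can use it. One common option, assumed here rather than taken from the patent, is the hashing trick: each string is hashed into one of a fixed number of buckets. The sketch also groups IP addresses by network prefix (/24 for IPv4, /48 for IPv6), which is likewise an assumption.

```python
import hashlib
import ipaddress
from typing import List


def _bucket(token: str, n_buckets: int) -> int:
    """Stable bucket index (Python's built-in hash() is salted per process)."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % n_buckets


def terminal_features(user_agent: str, device_fingerprint: str, ip: str,
                      n_buckets: int = 64) -> List[float]:
    """Hash terminal-information strings into a fixed-length numeric vector."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48  # assumption: group IPs by network
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    vec = [0.0] * n_buckets
    for token in (f"ua:{user_agent}", f"fp:{device_fingerprint}", f"net:{network}"):
        vec[_bucket(token, n_buckets)] += 1.0
    return vec


a = terminal_features("Mozilla/5.0 (X11; Linux x86_64)", "fp-3f9a", "203.0.113.7")
b = terminal_features("Mozilla/5.0 (X11; Linux x86_64)", "fp-3f9a", "203.0.113.250")
print(len(a), sum(a), a == b)  # 64 3.0 True
```

- A keyed, deterministic hash is used instead of `hash()` so that the same terminal maps to the same vector at training time and at prediction time.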
- Collecting terminal information data in addition to the behavior data of the second user improves the machine learning model's prediction accuracy for risky terminals.
- The feature-engineered sample data, that is, the one or more sets of sample features, together with the label corresponding to each set of training sample data (in one embodiment, the label is "0" or "1"), is used to determine the parameters of the machine learning model.
- The machine learning model used is XGBoost (eXtreme Gradient Boosting), a tree-based ensemble learning model.
- The XGBoost model function is
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}$$
where $K$ is the number of trees to learn, $x_i$ is the input, $\hat{y}_i$ is the prediction for $x_i$, and $\mathcal{F}$ is the hypothesis space.
- Each $f(x)$ is a Classification and Regression Tree (CART):
$$f(x) = w_{q(x)}$$
where $q(x)$ denotes the leaf node to which sample $x$ is assigned, $w$ is the vector of leaf-node scores, and $w_{q(x)}$ is the regression tree's predicted value for the sample.
- The model iteratively combines the predictions of the K regression trees to obtain the final prediction result. Each regression tree is trained on the basis of the training and predictions of the trees before it.
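- The additive structure described above, in which each of the K regression trees maps a sample to a leaf q(x) and contributes that leaf's score w_{q(x)}, can be illustrated with hand-built trees. This is a minimal sketch, not the XGBoost library; applying a logistic function to the summed score for 0/1 labels is an assumption matching the usual binary-classification setup.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    """CART node: internal nodes split on x[feature] < threshold; leaves carry an index."""
    feature: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    leaf: int = -1  # >= 0 only for leaf nodes


@dataclass
class RegressionTree:
    root: Node
    w: List[float]  # score of each leaf, indexed by leaf number

    def q(self, x: Sequence[float]) -> int:
        """Leaf index that sample x is assigned to."""
        node = self.root
        while node.leaf < 0:
            node = node.left if x[node.feature] < node.threshold else node.right
        return node.leaf

    def __call__(self, x: Sequence[float]) -> float:
        return self.w[self.q(x)]  # f(x) = w_{q(x)}


def predict_margin(trees: Sequence[RegressionTree], x: Sequence[float]) -> float:
    """Sum of the K tree outputs."""
    return sum(f(x) for f in trees)


def predict_proba(trees: Sequence[RegressionTree], x: Sequence[float]) -> float:
    """Logistic transform of the summed score for 0/1 labels (assumption)."""
    return 1.0 / (1.0 + math.exp(-predict_margin(trees, x)))


t1 = RegressionTree(Node(feature=0, threshold=0.5, left=Node(leaf=0), right=Node(leaf=1)), [-1.0, 1.0])
t2 = RegressionTree(Node(feature=1, threshold=10.0, left=Node(leaf=0), right=Node(leaf=1)), [0.5, -0.5])
print(predict_margin([t1, t2], [1.0, 3.0]))  # 1.5
```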
- As described above, feature engineering is performed on each of the one or more sets of training sample data to obtain one or more sets of sample features.
- The one or more sets of sample features are taken as the inputs $x_i$ of the data set $D$, and the label corresponding to each set of training sample data is taken as the corresponding $y_i$ in $D$. These are used to learn the parameters of the K regression trees in the XGBoost model, that is, to determine the mapping from each regression tree's input $x_i$ to its output, where $x_i$ may be an n-dimensional vector or array.
- The model's prediction results for the known training sample inputs $x_i$ are compared with the labels $y_i$ to which those samples actually map, and the model parameters are adjusted repeatedly until the expected accuracy is reached. The parameters are then fixed, which establishes the prediction model.
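- The loop of comparing predictions with the labels y_i and adjusting parameters until the expected accuracy is reached can be sketched with a much simplified gradient-boosting procedure: depth-1 regression trees (stumps) fitted by least squares to the negative gradient of the log loss, added one at a time until training accuracy meets a target. This is not XGBoost's actual algorithm (which uses second-order statistics, regularization, and usually deeper trees); the learning rate, target accuracy, and round limit are assumed values.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, float, float]  # (feature, threshold, left value, right value)


def stump_value(s: Stump, x: Sequence[float]) -> float:
    j, thr, left, right = s
    return left if x[j] < thr else right


def fit_stump(X: Sequence[Sequence[float]], r: Sequence[float]) -> Stump:
    """Least-squares depth-1 tree on residuals r, trying every feature/threshold."""
    best_sse, best = math.inf, (0, 0.0, 0.0, 0.0)
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            lo = [ri for row, ri in zip(X, r) if row[j] < thr]
            hi = [ri for row, ri in zip(X, r) if row[j] >= thr]
            lv = sum(lo) / len(lo) if lo else 0.0
            rv = sum(hi) / len(hi) if hi else 0.0
            sse = sum((v - lv) ** 2 for v in lo) + sum((v - rv) ** 2 for v in hi)
            if sse < best_sse:
                best_sse, best = sse, (j, float(thr), lv, rv)
    return best


def accuracy(margins: Sequence[float], y: Sequence[int]) -> float:
    # margin >= 0 is equivalent to sigmoid(margin) >= 0.5, i.e. predicted label 1
    return sum((1 if m >= 0 else 0) == t for m, t in zip(margins, y)) / len(y)


def train_until_accuracy(X, y, target_acc=0.95, lr=0.5, max_rounds=100):
    """Simplified gradient boosting (not XGBoost): add stumps until accuracy >= target."""
    stumps: List[Stump] = []
    margins = [0.0] * len(X)
    for _ in range(max_rounds):
        if accuracy(margins, y) >= target_acc:
            break  # expected accuracy reached: stop adjusting parameters
        # negative gradient of the log loss with respect to the margin
        residuals = [t - 1.0 / (1.0 + math.exp(-m)) for t, m in zip(y, margins)]
        j, thr, lv, rv = fit_stump(X, residuals)
        s = (j, thr, lr * lv, lr * rv)  # shrink the new tree by the learning rate
        stumps.append(s)
        margins = [m + stump_value(s, x) for m, x in zip(margins, X)]
    return stumps, accuracy(margins, y)


def predict(stumps: Sequence[Stump], x: Sequence[float]) -> int:
    return 1 if sum(stump_value(s, x) for s in stumps) >= 0 else 0


stumps, acc = train_until_accuracy([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
print(len(stumps), acc)  # 1 1.0
```

- Training accuracy is used as the stopping criterion only to mirror the text; in practice a held-out validation set would normally be used to avoid overfitting.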
- Tree-based boosting models other than XGBoost, or other types of machine learning models such as random forest models, may also be used.
- After the model has been built from the training sample data and the corresponding labels, the resulting model is saved.
- The model can then be used to predict on real-time users, that is, steps 110 and 120 can be performed.
- The data of the first user is captured by data-collection (event-tracking) code deployed on the website's login page.
- The verification code is a slider verification code.
- For each user performing a login operation, the mouse movement trajectory data from dragging the slider verification code and the user's terminal information data are collected. These data are of the same types as the training sample data described above and are not described again here.
- The collected real-time user data is predicted with the trained machine learning model to determine the attribute of the first user.
- Step 120 may include: performing feature engineering on the real-time user data; and predicting on the first user with the previously trained machine learning model to determine the attribute of the first user.
- The feature engineering method and the resulting feature types are similar to those used for the training sample data described above and are not repeated here.
- The attribute of the first user is determined with the following model function:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$$
- The parameters of the model function were determined in the steps above, so the prediction result for a given input is obtained by taking the feature-engineered real-time user data as the input $x_i$.
- The input $x_i$ may be an n-dimensional vector or array.
- If the prediction result is "1", the current login operation is from an abnormal user, that is, a machine or computer program, and the login is blocked; if the prediction result is "0", the current login operation is from a normal user and the login is allowed. Specifically, the prediction result may be fed back to the web front-end server so that abnormal users are intercepted.
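- The decision rule above can be sketched as follows, assuming the model outputs a probability that the user is abnormal and that a threshold of 0.5 maps it to the "1"/"0" prediction. The response fields sent to the front-end server are hypothetical.

```python
from enum import Enum
from typing import Dict, Tuple, Union


class Decision(str, Enum):
    ALLOW = "allow"
    BLOCK = "block"


def decide(p_abnormal: float, threshold: float = 0.5) -> Tuple[int, Decision]:
    """Map an abnormal-user probability to a 0/1 prediction and a login decision (threshold assumed)."""
    label = 1 if p_abnormal >= threshold else 0
    return label, (Decision.BLOCK if label == 1 else Decision.ALLOW)


def front_end_response(p_abnormal: float, threshold: float = 0.5) -> Dict[str, Union[int, str, bool]]:
    """Hypothetical payload fed back to the web front-end server."""
    label, decision = decide(p_abnormal, threshold)
    return {"prediction": label, "decision": decision.value, "login_allowed": label == 0}


print(front_end_response(0.93))  # {'prediction': 1, 'decision': 'block', 'login_allowed': False}
```

- Keeping the threshold as a parameter lets an operator trade false blocks of real users against missed bots without retraining the model.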
- The method further includes: adjusting the machine learning model by using the real-time user data as new training sample data.
- The real-time user data is fed back to the machine learning model as new training sample data, and the model is retrained to update its parameters, which improves its prediction accuracy.
- The model is retrained and updated on a T+1 cycle, where T denotes a natural day: the data of all users who log in on natural day T is used, on the following natural day (T+1), as new training sample data to retrain the model and adjust its parameters.
- The model may also be retrained at any other interval, for example in real time or hourly.
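- The T+1 retraining schedule amounts to choosing a time window of logged data. The sketch below, which assumes each logged record carries its login time, feature vector, and a label, selects the previous natural day on day T+1; passing a different `period`, such as one hour, gives the other update intervals. How real-time records obtain labels is not specified in the patent and is left open here.

```python
from datetime import date, datetime, time, timedelta
from typing import List, Sequence, Tuple

Record = Tuple[datetime, List[float], int]  # (login time, features, label): hypothetical


def records_in_window(records: Sequence[Record], end: datetime, period: timedelta) -> List[Record]:
    """Records whose login time falls in [end - period, end)."""
    start = end - period
    return [r for r in records if start <= r[0] < end]


def t_plus_1_window(run_day: date) -> Tuple[datetime, timedelta]:
    """On natural day T+1, the window is the whole of the previous natural day T."""
    return datetime.combine(run_day, time.min), timedelta(days=1)


def update_training_set(training_set: List[Record], records: Sequence[Record],
                        end: datetime, period: timedelta) -> List[Record]:
    """Append the window's real-time records as new training sample data."""
    return training_set + records_in_window(records, end, period)


logs = [(datetime(2019, 4, 1, 10, 30), [0.2], 0), (datetime(2019, 4, 2, 9, 15), [7.5], 1)]
end, period = t_plus_1_window(date(2019, 4, 2))
print(len(update_training_set([], logs, end, period)))  # 1
```

- The half-open window `[start, end)` ensures that a record logged exactly at midnight is counted in one day only.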
- The human-machine recognition method for verification codes provided by embodiments of the present invention can establish an accurate and robust user identification model in the verification-code verification stage and thus identify user types quickly and accurately.
- A prediction accuracy of 95% can be achieved.
- FIG. 2 is a schematic flowchart of a human-machine recognition method for a verification code according to another embodiment of the present invention. As shown in FIG. 2, the method includes the following.
- The sample data set includes one or more sets of training sample data and a label set for each set of training sample data, the label indicating the attribute of the second user corresponding to the sample data set, that is, whether the second user is a normal user or an abnormal user.
- 240: Predict on the real-time user data according to the machine learning model to determine the attribute of the first user.
- Step 280 may be performed after step 240, or after steps 260 and 270; the present invention does not limit this.
- Step 220 may further include the following.
- For the label-setting process, refer to the description of FIG. 1; details are not repeated here.
- Step 221 may also be performed before step 220.
- Step 222 may be performed before or after step 221. After the machine learning model has been established, it is saved, and step 230 and the steps following it can be performed.
- Step 240 may further include the following.
- For the feature engineering method, the resulting feature types, and the process of determining the attribute of the first user, refer to the description of FIG. 1; they are not described further here.
- FIG. 5 is a schematic structural diagram of a human-machine recognition apparatus 500 for a verification code according to an embodiment of the present invention.
- Apparatus 500 includes: an acquisition module 510 configured to collect real-time user data when a first user inputs a verification code; and a prediction module 520 configured to predict on the real-time user data according to a machine learning model to determine an attribute of the first user. The machine learning model is obtained by training on a sample data set that includes one or more sets of training sample data and a label set for each set of training sample data, the label indicating an attribute of a second user.
- Embodiments of the present invention provide a human-machine recognition apparatus for verification codes.
- By using the trained machine learning model to predict on real-time user data during the verification-code verification stage, the apparatus can accurately identify whether a user is a normal user and thus intercept abnormal users.
- The training sample data includes at least one of the following: behavior data of the second user, risk data of the second user, and terminal information data of the second user. The real-time user data includes at least one of the following: behavior data of the first user, risk data of the first user, and terminal information data of the first user.
- The verification code is a slider verification code.
- The behavior data of the second user includes mouse movement trajectory data from before and after the second user drags the slider verification code.
- The risk data of the second user includes identity data and/or credit data of the second user.
- The terminal information data of the second user includes at least one of user agent data, a device fingerprint, and an IP address.
- The behavior data of the first user includes mouse movement trajectory data from before and after the first user drags the slider verification code.
- The risk data of the first user includes identity data and/or credit data of the first user.
- The terminal information data of the first user includes at least one of user agent data, a device fingerprint, and an IP address.
- The attribute of the first user indicates whether the first user is a normal user or an abnormal user.
- Apparatus 500 further includes: a collection module 530 configured to collect a sample data set; and a training module 540 configured to train the machine learning model with the sample data set.
- Apparatus 500 further includes an adjustment module 550 configured to adjust the machine learning model by using the real-time user data as new training sample data.
- The training module 540 is configured to perform feature engineering on each of the one or more sets of training sample data to obtain one or more sets of sample features, and to determine the parameters of the machine learning model from the one or more sets of sample features and the label corresponding to each set of training sample data.
- The prediction module 520 is configured to perform feature engineering on the real-time user data to obtain real-time user features, and to predict on the real-time user features with the machine learning model.
- The machine learning model is an XGBoost model.
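- The module structure of apparatus 500 can be pictured as one class with a method per module, as in the sketch below. The class, the method names, and the injected `featurize` and `fit` callables are hypothetical; `threshold_fit` is only a toy stand-in so the example runs, not the XGBoost training described above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Raw = Dict[str, float]                 # raw user data (hypothetical shape)
Features = List[float]
Model = Callable[[Features], int]      # returns 1 (abnormal) or 0 (normal)


@dataclass
class RecognitionApparatus:
    featurize: Callable[[Raw], Features]                 # feature engineering
    fit: Callable[[List[Features], List[int]], Model]    # training back end
    samples: List[Tuple[Raw, int]] = field(default_factory=list)
    model: Optional[Model] = None

    def acquire(self, raw: Raw) -> Raw:                   # acquisition module 510
        return dict(raw)

    def predict(self, raw: Raw) -> int:                   # prediction module 520
        if self.model is None:
            raise RuntimeError("model has not been trained")
        return self.model(self.featurize(raw))

    def collect(self, raw: Raw, label: int) -> None:      # collection module 530
        self.samples.append((raw, label))

    def train(self) -> None:                              # training module 540
        X = [self.featurize(r) for r, _ in self.samples]
        y = [label for _, label in self.samples]
        self.model = self.fit(X, y)

    def adjust(self, raw: Raw, label: int) -> None:       # adjustment module 550
        self.collect(raw, label)  # real-time data becomes new training sample data
        self.train()


def threshold_fit(X: List[Features], y: List[int]) -> Model:
    """Toy stand-in for model training: threshold halfway between class means."""
    pos = [x[0] for x, t in zip(X, y) if t == 1]
    neg = [x[0] for x, t in zip(X, y) if t == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda f: 1 if f[0] >= cut else 0


app = RecognitionApparatus(featurize=lambda r: [r["max_speed"]], fit=threshold_fit)
app.collect({"max_speed": 150.0}, 0)
app.collect({"max_speed": 2400.0}, 1)
app.train()
print(app.predict(app.acquire({"max_speed": 3000.0})))  # 1
```

- Injecting `featurize` and `fit` keeps the module wiring independent of the particular model, which matches the text's note that models other than XGBoost may be used.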
- FIG. 6 is a block diagram of a computer device 600 for human-computer identification of a verification code, in accordance with an exemplary embodiment of the present invention.
- Apparatus 600 includes a processing component 610, which further includes one or more processors, and memory resources, represented by memory 620, for storing instructions executable by the processing component 610, such as application programs.
- An application stored in memory 620 can include one or more modules each corresponding to a set of instructions.
- The processing component 610 is configured to execute the instructions to perform the human-machine recognition method for a verification code described above.
- Apparatus 600 can also include a power supply component configured to perform power management of apparatus 600, a wired or wireless network interface configured to connect apparatus 600 to the network, and an input/output (I/O) interface.
- Apparatus 600 can operate based on an operating system stored in memory 620, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
- a non-transitory computer readable storage medium when the instructions in the storage medium are executed by the processor of the apparatus 600, enabling the apparatus 600 to perform a human identification method of the verification code, comprising: collecting the first user input Real-time user data at the time of verification code; and prediction of real-time user data according to a machine learning model to determine attributes of the first user, the machine learning model is obtained by training a sample data set comprising one or more sets of training The sample data and the labels respectively set for each set of training sample data, the labels indicating the attributes of the second user.
- The disclosed systems, devices, and methods may be implemented in other manners.
- The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
- The mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
- Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- The functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
- If the functions are implemented in the form of a software functional unit and sold or used as a standalone product, they may be stored in a computer-readable storage medium.
- The technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention.
- The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
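The method summarized above (train a model on labeled sample data from second users, then predict an attribute of a first user from real-time data) can be illustrated with the minimal Python sketch below. It is not the patented implementation: claim 9 names an XGBoost model, whereas this sketch swaps in plain logistic regression trained by batch gradient descent so that it runs with the standard library alone. The label convention (1 = abnormal/machine, 0 = normal/human), the two-feature vectors, the toy data, and the names `train` and `predict_attribute` are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical label convention: 1 = abnormal user (machine), 0 = normal user (human).
ABNORMAL, NORMAL = 1, 0

Model = Tuple[List[float], float]  # (weights, bias)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(samples: Sequence[Sequence[float]], labels: Sequence[int],
          lr: float = 0.1, epochs: int = 2000) -> Model:
    """Fit a logistic-regression model by batch gradient descent.

    Stand-in for the XGBoost model named in claim 9: only the
    "train on labeled sample data" step is illustrated.
    """
    n_features = len(samples[0])
    m = len(samples)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            # Prediction error for one sample whose label is the second user's attribute.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        # Average the gradients over all samples and take one descent step.
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_attribute(model: Model, features: Sequence[float],
                      threshold: float = 0.5) -> int:
    """Classify a first user's real-time feature vector as ABNORMAL or NORMAL."""
    w, b = model
    p_abnormal = sigmoid(sum(wi * xi for wi, xi in zip(w, features)) + b)
    return ABNORMAL if p_abnormal >= threshold else NORMAL


if __name__ == "__main__":
    # Made-up toy data: [drag duration in seconds, standard deviation of drag speed].
    samples = [[1.8, 0.9], [2.2, 1.1], [1.5, 0.8],      # normal (human) drags
               [0.2, 0.05], [0.3, 0.02], [0.25, 0.04]]  # abnormal (scripted) drags
    labels = [NORMAL, NORMAL, NORMAL, ABNORMAL, ABNORMAL, ABNORMAL]
    model = train(samples, labels)
    print(predict_attribute(model, [0.2, 0.03]))  # fast, uniform drag
    print(predict_attribute(model, [2.0, 1.0]))   # slower, irregular drag
```

Keeping training and prediction as separate functions mirrors the two phases in the text: the model is fitted once on the labeled sample data set and then applied to each new first user's data.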
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Social Psychology (AREA)
- User Interface Of Digital Computer (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Claims (20)
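1. A human-machine identification method for a verification code, comprising: collecting real-time user data while a first user inputs a verification code; and predicting on the real-time user data according to a machine learning model to determine an attribute of the first user, wherein the machine learning model is obtained by training on a sample data set, the sample data set comprising one or more sets of training sample data and a label set for each set of training sample data, the label representing an attribute of a second user.
2. The method according to claim 1, wherein the training sample data comprises at least one of: behavior data of the second user, risk data of the second user, and terminal information data of the second user; and the real-time user data comprises at least one of: behavior data of the first user, risk data of the first user, and terminal information data of the first user.
3. The method according to claim 2, wherein the verification code is a slider verification code; the behavior data of the second user comprises mouse movement trajectory data of the second user before and after dragging the slider verification code; the risk data of the second user comprises identity data and/or credit data of the second user; the terminal information data of the second user comprises at least one of user agent data, a device fingerprint, and an IP address; the behavior data of the first user comprises mouse movement trajectory data of the first user before and after dragging the slider verification code; the risk data of the first user comprises identity data and/or credit data of the first user; and the terminal information data of the first user comprises at least one of user agent data, a device fingerprint, and an IP address.
4. The method according to any one of claims 1 to 3, wherein the attribute of the first user represents whether the first user is a normal user or an abnormal user.
5. The method according to any one of claims 1 to 4, further comprising: collecting the sample data set; and training the machine learning model using the sample data set.
6. The method according to claim 5, further comprising: adjusting the machine learning model by using the real-time user data as new training sample data.
7. The method according to claim 5, wherein training the machine learning model using the sample data set comprises: performing feature engineering on each of the one or more sets of training sample data to obtain one or more sets of sample features; and determining parameters of the machine learning model from the one or more sets of sample features and the labels respectively corresponding to each set of training sample data.
8. The method according to any one of claims 1 to 7, wherein predicting on the real-time user data according to the machine learning model comprises: performing feature engineering on the real-time user data to obtain real-time user features, and using the machine learning model to predict on the real-time user features.
9. The method according to any one of claims 1 to 8, wherein the machine learning model is an XGBoost model.
10. A human-machine identification apparatus for a verification code, comprising: an acquisition module configured to collect real-time user data while a first user inputs a verification code; and a prediction module configured to predict on the real-time user data according to a machine learning model to determine an attribute of the first user, wherein the machine learning model is obtained by training on a sample data set, the sample data set comprising one or more sets of training sample data and a label set for each set of training sample data, the label representing an attribute of a second user.
11. The apparatus according to claim 10, wherein the training sample data comprises at least one of: behavior data of the second user, risk data of the second user, and terminal information data of the second user; and the real-time user data comprises at least one of: behavior data of the first user, risk data of the first user, and terminal information data of the first user.
12. The apparatus according to claim 11, wherein the verification code is a slider verification code; the behavior data of the second user comprises mouse movement trajectory data of the second user before and after dragging the slider verification code; the risk data of the second user comprises identity data and/or credit data of the second user; the terminal information data of the second user comprises at least one of user agent data, a device fingerprint, and an IP address; the behavior data of the first user comprises mouse movement trajectory data of the first user before and after dragging the slider verification code; the risk data of the first user comprises identity data and/or credit data of the first user; and the terminal information data of the first user comprises at least one of user agent data, a device fingerprint, and an IP address.
13. The apparatus according to any one of claims 10 to 12, wherein the attribute of the first user represents whether the first user is a normal user or an abnormal user.
14. The apparatus according to any one of claims 10 to 13, further comprising: a collection module configured to collect the sample data set; and a training module configured to train the machine learning model using the sample data set.
15. The apparatus according to claim 14, further comprising: an adjustment module configured to adjust the machine learning model by using the real-time user data as new training sample data.
16. The apparatus according to claim 14, wherein the training module is configured to perform feature engineering on each of the one or more sets of training sample data to obtain one or more sets of sample features, and to determine parameters of the machine learning model from the one or more sets of sample features and the labels respectively corresponding to each set of training sample data.
17. The apparatus according to any one of claims 10 to 16, wherein the prediction module is configured to perform feature engineering on the real-time user data to obtain real-time user features, and to use the machine learning model to predict on the real-time user features.
18. The apparatus according to any one of claims 10 to 17, wherein the machine learning model is an XGBoost model.
19. A computer device, comprising: a processor; and a storage device comprising computer instructions stored thereon which, when executed by the processor, cause the processor to perform the human-machine identification method for a verification code according to any one of claims 1 to 9.
20. A computer-readable storage medium comprising computer instructions stored thereon which, when executed by a processor, cause the processor to perform the human-machine identification method for a verification code according to any one of claims 1 to 9.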
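Claims 3, 7, and 8 (and their apparatus counterparts 12, 16, and 17) call for feature engineering on the mouse movement trajectory recorded around dragging a slider verification code, but do not say which features are extracted. The Python sketch below shows one plausible set under stated assumptions: the trajectory is a list of `(x, y, timestamp_ms)` samples, and the feature names in `FEATURE_ORDER` (path length, straightness, speed statistics, vertical jitter) together with the helpers `trajectory_features` and `to_vector` are hypothetical choices, not ones taken from the patent.

```python
import math
import statistics
from typing import Dict, List, Tuple

# One sampled mouse event: (x in pixels, y in pixels, timestamp in milliseconds).
Point = Tuple[float, float, float]

# Fixed order so every user yields a vector with the same layout (hypothetical feature set).
FEATURE_ORDER = ["n_points", "duration_ms", "path_length", "straightness",
                 "mean_speed", "speed_std", "y_range"]


def trajectory_features(points: List[Point]) -> Dict[str, float]:
    """Summarize a slider-drag mouse trajectory as a fixed set of numeric features."""
    if len(points) < 2:
        raise ValueError("a trajectory needs at least two points")
    seg_lengths: List[float] = []
    speeds: List[float] = []
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        dist = math.hypot(x1 - x0, y1 - y0)
        seg_lengths.append(dist)
        if t1 > t0:  # skip zero-duration segments to avoid division by zero
            speeds.append(dist / (t1 - t0))  # pixels per millisecond
    path_length = sum(seg_lengths)
    chord = math.hypot(points[-1][0] - points[0][0], points[-1][1] - points[0][1])
    ys = [p[1] for p in points]
    return {
        "n_points": float(len(points)),
        "duration_ms": points[-1][2] - points[0][2],
        "path_length": path_length,
        # Straight-line distance over path length; 1.0 means a perfectly straight drag.
        "straightness": chord / path_length if path_length > 0 else 0.0,
        "mean_speed": statistics.fmean(speeds) if speeds else 0.0,
        # Population standard deviation of segment speeds; 0.0 means constant speed.
        "speed_std": statistics.pstdev(speeds) if len(speeds) > 1 else 0.0,
        "y_range": max(ys) - min(ys),
    }


def to_vector(features: Dict[str, float]) -> List[float]:
    """Flatten the feature dict into the fixed order a model would be trained on."""
    return [features[name] for name in FEATURE_ORDER]
```

A scripted drag that moves in a straight line at constant speed would score `straightness == 1.0` and `speed_std == 0.0`, while a hand-made drag usually wobbles and changes speed. The resulting vectors could then be used both for training (claim 7) and for real-time prediction (claim 8); since claim 9 names XGBoost, one option would be `xgboost.XGBClassifier`, fitted on the labeled sample vectors and queried with `predict_proba`. Risk data and terminal information (claim 3) would need their own encodings and are not covered by this sketch.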
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/392,311 US20190311114A1 (en) | 2018-04-09 | 2019-04-23 | Man-machine identification method and device for captcha |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810309762.8A CN108491714A (zh) | 2018-04-09 | 2018-04-09 | Human-machine identification method for verification codes |
CN201810309762.8 | 2018-04-09 | | |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/392,311 Continuation US20190311114A1 (en) | 2018-04-09 | 2019-04-23 | Man-machine identification method and device for captcha |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019196534A1 (zh) | 2019-10-17 |
Family
ID=63315257
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/072354 WO2019196534A1 (zh) | 2018-04-09 | 2019-01-18 | Human-machine identification method and device for verification codes |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108491714A (zh) |
WO (1) | WO2019196534A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113420276A (zh) * | 2021-08-20 | 2021-09-21 | 北京顶象技术有限公司 | Verification-code-based risk determination method and apparatus, electronic device, and storage medium |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
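CN108491714A (zh) * | 2018-04-09 | 2018-09-04 | 众安信息技术服务有限公司 | Human-machine identification method for verification codes |
CN109255230A (zh) * | 2018-09-29 | 2019-01-22 | 武汉极意网络科技有限公司 | Method and system for identifying abnormal verification behavior, user equipment, and storage medium |
CN109361660B (zh) * | 2018-09-29 | 2021-09-03 | 武汉极意网络科技有限公司 | Abnormal behavior analysis method and system, server, and storage medium |
CN109409049A (zh) * | 2018-10-10 | 2019-03-01 | 北京京东金融科技控股有限公司 | Method and apparatus for identifying interactive operations |
CN110059457B (zh) * | 2018-11-05 | 2020-06-30 | 阿里巴巴集团控股有限公司 | Identity verification method and apparatus |
CN109902474B (zh) * | 2019-03-01 | 2020-11-03 | 北京奇艺世纪科技有限公司 | Method and apparatus for determining the movement trajectory of a moving object in a sliding verification code |
CN110046647A (zh) * | 2019-03-08 | 2019-07-23 | 同盾控股有限公司 | Method and apparatus for identifying machine behavior on verification codes |
CN110490632A (zh) * | 2019-07-01 | 2019-11-22 | 广州阿凡提电子科技有限公司 | Potential customer identification method, electronic device, and storage medium |
CN111062019A (zh) * | 2019-12-13 | 2020-04-24 | 支付宝(杭州)信息技术有限公司 | User attack detection method and apparatus, and electronic device |
CN111783063A (zh) * | 2020-06-12 | 2020-10-16 | 完美世界(北京)软件科技发展有限公司 | Operation verification method and apparatus |
CN111897435B (zh) * | 2020-08-06 | 2022-08-02 | 陈涛 | Human-machine identification method, identification system, MR smart glasses, and application |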
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
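WO2015049055A1 (de) * | 2013-10-04 | 2015-04-09 | Giesecke & Devrient Gmbh | Method for displaying information |
CN106155298A (zh) * | 2015-04-21 | 2016-11-23 | 阿里巴巴集团控股有限公司 | Human-machine identification method and apparatus, and method and apparatus for collecting behavioral feature data |
CN107846412A (zh) * | 2017-11-28 | 2018-03-27 | 五八有限公司 | Verification code request processing method and apparatus, and verification code processing system |
CN108491714A (zh) * | 2018-04-09 | 2018-09-04 | 众安信息技术服务有限公司 | Human-machine identification method for verification codes |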
- 2018-04-09: CN application CN201810309762.8A (patent/CN108491714A/zh), status: active, Pending
- 2019-01-18: WO application PCT/CN2019/072354 (patent/WO2019196534A1/zh), status: active, Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN108491714A (zh) | 2018-09-04 |
Similar Documents
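Publication | Title |
---|---|
WO2019196534A1 (zh) | Human-machine identification method and device for verification codes |
US20190311114A1 (en) | Man-machine identification method and device for captcha |
CN108009521B (zh) | Face image matching method and apparatus, terminal, and storage medium |
WO2019153604A1 (zh) | Apparatus and method for establishing a human-machine identification model, and computer-readable storage medium |
WO2019233421A1 (zh) | Image processing method and apparatus, electronic device, and storage medium |
US10938927B2 (en) | Machine learning techniques for processing tag-based representations of sequential interaction events |
JP7414901B2 (ja) | Liveness detection model training method and apparatus, liveness detection method and apparatus, electronic device, storage medium, and computer program |
CN107193962B (zh) | Method and apparatus for intelligently matching images to Internet promotional information |
WO2022105118A1 (zh) | Image-based health state identification method, apparatus, device, and storage medium |
CN109034069B (zh) | Method and apparatus for generating information |
EP3286679A1 (en) | Method and system for identifying a human or machine |
WO2020238353A1 (zh) | Data processing method and apparatus, storage medium, and electronic apparatus |
US10511681B2 (en) | Establishing and utilizing behavioral data thresholds for deep learning and other models to identify users across digital space |
WO2022148038A1 (zh) | Information recommendation method and apparatus |
CN110544109A (zh) | User profile generation method and apparatus, computer device, and storage medium |
CN110855648B (zh) | Early-warning control method and apparatus for network attacks |
CN107944032B (zh) | Method and apparatus for generating information |
US20190347472A1 (en) | Method and system for image identification |
WO2019061664A1 (zh) | Electronic apparatus, product recommendation method based on user Internet-access data, and storage medium |
CN111898561B (zh) | Face authentication method, apparatus, device, and medium |
CN115941322B (zh) | Artificial-intelligence-based attack detection method, apparatus, device, and storage medium |
CN109377347A (zh) | Network credit early-warning method and system based on feature selection, and electronic device |
WO2021175010A1 (zh) | User gender identification method and apparatus, electronic device, and storage medium |
CN110414562A (zh) | X-ray film classification method and apparatus, terminal, and storage medium |
CN114238764A (zh) | Course recommendation method, apparatus, and device based on a recurrent neural network |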
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | ENP | Entry into the national phase | Ref document number: 2019532747; Country of ref document: JP; Kind code of ref document: A |
| | 121 | EP: The EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19785382; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 32PN | EP: Public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04/02/2021) |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 19785382; Country of ref document: EP; Kind code of ref document: A1 |