US20150161365A1 - Automatic construction of human interaction proof engines - Google Patents


Info

Publication number
US20150161365A1
US20150161365A1 US14/624,936 US201514624936A
Authority
US
United States
Prior art keywords
hip
schemes
scheme
captchas
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/624,936
Inventor
Geoffrey J. Hulten
Patrice Y. Simard
Darko Kirovski
Jesper B. Lind
Christopher A. Meek
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Technology Licensing LLC filed Critical Microsoft Technology Licensing LLC
Priority to US14/624,936
Assigned to MICROSOFT CORPORATION reassignment MICROSOFT CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: LIND, JESPER B., SIMARD, PATRICE Y., HULTEN, GEOFFREY J., MEEK, CHRISTOPHER A., KIROVSKI, DARKO
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC reassignment MICROSOFT TECHNOLOGY LICENSING, LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: MICROSOFT CORPORATION
Publication of US20150161365A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2133 Verifying human interaction, e.g., Captcha

Definitions

  • A human interaction proof, which is sometimes referred to as a “captcha,” is a mechanism that is used to distinguish human users from robots.
  • Letters and numbers are displayed on a screen as graphics in some way that is designed to obscure them.
  • A user has to type the letters and numbers into a box as a form of proof that the user is human.
  • The theory behind captchas is that recognizing symbols that have been intentionally obscured is a hard problem that demands the flexibility of the human brain. Thus, captchas are something akin to an applied Turing test.
  • A problem that arises with captchas is that they can be broken in various ways. Once a particular captcha scheme has been in use for some amount of time, the obscured symbols become recognizable in the sense that optical character recognition (OCR) systems can be trained to recognize them. OCR is thus an automated way of breaking captchas, and it can work as long as there is enough data on which to train the OCR.
  • The training data can be generated by human captcha solvers, or even by guessing solutions and analyzing which guesses succeed and which fail. Since captchas themselves can be used as training data, for as long as a captcha scheme is in use it continues to generate training data that can be used to break it.
  • Captcha schemes therefore generally have a limited shelf life, after which they are likely to have been broken.
  • Another way to break a captcha scheme is to use inexpensive human labor to solve captchas.
  • Captchas can be transmitted electronically anywhere in the world (including places where labor is inexpensive), and teams of people can be employed to solve captchas.
  • The solved captchas can be used in real time, or the solutions can be stored and used as training data for OCR systems, thereby allowing human breaking to feed the process of automated breaking.
  • Because captchas are used to ensure, probabilistically, that services are being used by humans rather than machines, captcha schemes often have to be changed in order to continue serving their intended purpose. But changing a captcha scheme involves designing and testing a new scheme, which can be labor intensive. Thus, new captcha schemes generally are not designed and deployed as frequently as they could be.
  • Captchas may be specified using a system that streamlines the process of describing the elements and parameters of the scheme. Moreover, captcha schemes may be changed and enhanced over time, by using a genetic algorithm to change the elements and parameters of a captcha scheme. Additionally, the effectiveness of captcha schemes may be monitored to determine when an existing scheme has been broken by attackers, or is likely to have been broken.
  • A captcha specification language may be used to specify a captcha scheme.
  • The language may include features that allow the various elements of a captcha to be specified.
  • A captcha typically includes some sequence of letters and/or numbers that constitute the correct answer to a captcha challenge.
  • The symbols (e.g., letters and numbers) may be printed in some font.
  • The symbols may be distorted through warping, skewing, blurring, etc.
  • Distracters that are designed to confuse an OCR system (e.g., lines at various angles, shapes, backgrounds of various levels of contrast, etc.) may be shown with the symbols.
  • The language may allow parameters of the symbols and distracters to be specified, e.g., how much warping, skewing, or blurring; the type, size, and shape of the distracters; etc.
  • Parameters may be specified as probability distributions; e.g., a parameter may be specified as a normally distributed random variable, with some mean and variance, so that the actual parameter value used in a specific instance of the captcha is chosen through a random process with the specified distribution.
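As an illustrative sketch (the function name `sample_param` and the tuple encoding are assumptions, not part of the patent), randomized parameters of this kind might be represented and sampled as follows:

```python
import random

def sample_param(spec, rng=random):
    """Sample a concrete value for a captcha parameter.

    spec is either a fixed number, or a ("G", mean, variance) /
    ("U", low, high) tuple describing a probability distribution.
    """
    if isinstance(spec, (int, float)):
        return spec                      # fixed value
    kind = spec[0]
    if kind == "G":                      # Gaussian: mean, variance
        mean, variance = spec[1], spec[2]
        return rng.gauss(mean, variance ** 0.5)
    if kind == "U":                      # uniform: lower, upper bound
        return rng.uniform(spec[1], spec[2])
    raise ValueError(f"unknown distribution {kind!r}")

# e.g., a skew angle drawn from N(mean=30, variance=100), i.e. sd = 10 degrees
skew = sample_param(("G", 30, 100))
```

Each rendered captcha instance would call the sampler afresh, so two instances of the same scheme differ in their concrete parameter values.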
  • A captcha specification language makes it relatively easy for a person to specify new captcha schemes.
  • Another aspect of using such a language is that it makes it possible to automate the process of generating new schemes.
  • A genetic algorithm may be used to combine elements from captcha schemes that have been discovered to be effective, in order to create new schemes.
  • The effectiveness of captcha schemes may be monitored, and statistical techniques may be used to judge the effectiveness of particular features, or combinations of features, of a captcha scheme.
  • Regression analysis may be used to predict how long it will take to break a new captcha scheme as a function of the new scheme's measured level of resistance to existing OCRs, or based on the level of difference between the features of the new scheme and existing schemes.
  • FIG. 1 is a block diagram of some example symbols that may appear in a captcha.
  • FIG. 2 is a block diagram of various example features that may be specified in a captcha specification.
  • FIG. 3 is a flow diagram of an example process of creating a new captcha scheme.
  • FIG. 4 is a flow diagram of an example process that may be used to assess the quality of a program.
  • FIG. 5 is a block diagram of example components that may be used in connection with implementations of the subject matter described herein.
  • HIP is an abbreviation for “human interaction proof.”
  • Some web services, such as e-mail, blogs, social networking, etc., present a HIP challenge when a user attempts to register for the service. If the user does not pass the HIP challenge, then the user is not allowed to register for the account.
  • Certain actions that people perform on the web, such as posting to a blog or downloading a public record, are gated by HIP challenges, such that service is either allowed or disallowed based on whether a user correctly answers the HIP.
  • A HIP is sometimes referred to as a captcha.
  • A problem that arises with captchas is that they can be broken. An effective captcha generally depends on being able to show some set of symbols that a human would find relatively easy to recognize, but that a machine would find difficult to recognize. Ordinary, unadorned letters make poor captchas, since optical character recognition (OCR) technology can recognize ordinary letters with relative ease. Thus, captcha schemes generally focus on obscuring letters and numbers in some way: enough that an OCR algorithm would be confused, but not so much as to make the symbols unreadable to a human.
  • The symbols can be warped, skewed, blurred, or transformed in some other manner.
  • Distracters can be added to the symbols. Examples of distracters include: lines or curves at various angles that are designed to confuse the segmentation of the captcha into its discrete symbols; backgrounds in various colors or patterns that are designed to confuse the contrast-detection techniques that distinguish a symbol from its background; or other types of distracters.
  • A captcha scheme may involve having distinct symbols intersect with each other to some degree, which, like the line or curve distracters mentioned above, is also designed to confuse the segmentation of the captcha image into its constituent symbols.
  • A deployed captcha scheme provides a continual source of training data. Every captcha that is presented provides an example that a human captcha solver could solve in order to generate training data. Moreover, every time a captcha is presented, even if a robot simply takes an educated guess at the answer, the system that presents the challenge responds with either success or failure. Information about which guesses succeed and which fail can itself be used as a form of training data.
  • Captcha schemes have a shelf life in the sense that, some amount of time after they are first deployed, enough data will be available that an OCR with a machine-learning algorithm can be trained to solve the captchas with some level of reliability (possibly with some human-made adjustments to the machine-learning algorithm, the training data, and/or the results the algorithm produces).
  • The world provides sources of inexpensive labor that can be used to solve captchas. Since captchas may be made up of image data (or even audio data), the data can be sent anywhere in the world where the cost of labor is low. There are businesses in some of these low-cost areas that use human labor to solve captchas at the rate of hundreds of captchas for one dollar.
  • Keeping captchas effective may therefore depend on changing the captcha scheme frequently to confound OCR solvers.
  • The subject matter herein provides techniques for specifying captcha schemes in order to allow the schemes to be changed easily and quickly.
  • The subject matter herein also provides techniques for automatically creating new captcha schemes by combining effective features from existing captcha schemes.
  • Techniques described herein may be used to monitor how long deployed captcha schemes remain effective, in order to predict when new captcha schemes are likely to have been broken.
  • A captcha specification language may be used.
  • One example of a captcha specification language is a variant of XAML, which may be referred to as HXAML.
  • XAML is the Extensible Application Markup Language.
  • HXAML is an extension to XAML, which may be used to specify the HIP elements of a UI.
  • HXAML provides primitives that are relevant to the problem of obscuring symbols (e.g., blurring, skewing, etc.).
  • HXAML is merely one example of a language that may be used to specify captchas; other mechanisms could also be used.
  • The language may provide mechanisms for specifying the answer to the captcha (i.e., the letters, numbers, or other symbols that constitute the correct answer to a captcha challenge), as well as the way in which those symbols are to be drawn and distorted.
  • The language may allow users to specify the font of the symbols; the amount of skew, warp, blurring, etc., that is to be applied to the symbols; the existence and nature of distracters to be drawn with the symbols (e.g., extraneous lines or curves); the nature of the background on which the symbols are to be drawn; the way in which the symbols are to be animated; the extent to which symbols are to intersect; or any other features of the appearance of a captcha.
  • The language may also allow the scheme to have some built-in variability.
  • For example, a scheme might specify that a letter is to be skewed thirty degrees clockwise.
  • Alternatively, the amount of skew could be specified as a random variable, such as a normal variable with a mean of thirty degrees and a variance of 100 (i.e., a standard deviation of ten degrees).
  • Because a captcha specification language allows a captcha to be specified as a combination of features, it is possible to modify the captcha scheme automatically using techniques such as genetic algorithms. Genetic algorithms allow features of existing schemes to be combined in new ways to produce new schemes. In one example, the features from particularly effective captcha schemes may be combined, in order to generate a scheme that has a high likelihood of success.
  • When new captcha schemes are deployed, it is possible to monitor these schemes to determine when they have been broken. Moreover, the data from this monitoring can be used with statistical methods to estimate the amount of time that it will likely take for a new scheme to be broken. Given some set of captcha schemes with some set of features, the amount of time that it takes for a captcha scheme to be broken can be mapped against the scheme's features. Then, regression can be used to predict how long it would take to break a particular captcha scheme based on the features that it contains.
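As a sketch of the kind of regression described above (the feature, the data, and all names are hypothetical; the patent does not prescribe a particular regression technique), a single-feature least-squares fit could map a scheme's measured OCR resistance to an estimated time-to-break:

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: y ≈ a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b                # intercept, slope

# Hypothetical history: (OCR-resistance score, days until the scheme broke)
history = [(0.2, 15), (0.4, 31), (0.5, 38), (0.7, 55), (0.9, 70)]
a, b = fit_line([h[0] for h in history], [h[1] for h in history])

def predicted_days_to_break(resistance_score):
    """Estimate shelf life of a new scheme from its resistance score."""
    return a + b * resistance_score
```

A real system would presumably regress over many features (distracter types, distortion amounts, etc.) rather than a single score.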
  • FIG. 1 shows some example symbols that may appear in a captcha. These symbols appear with a variety of features.
  • Drawing 102 is a drawing of the letter A.
  • Drawing 102 includes a representation 104 of the letter A itself. Additionally, drawing 102 shows the letter A on a background 106.
  • Background 106 is represented as a dotted stipple pattern in the drawing of FIG. 1, although in a real-life captcha background 106 might have colors and/or more complicated patterns. Background 106 is designed to confuse an OCR algorithm. Since OCR algorithms generally include a process to distinguish a symbol from the background by looking for regions in which a sharp contrast occurs, the use of a complex background is designed to confuse the OCR algorithm's ability to detect the contrast.
  • Drawing 102 also contains a line 108, which cuts across the representation 104 of the letter A.
  • One hard problem in OCR is the segmentation of portions of an image into symbols. By drawing extraneous lines or curves over the symbols, the problem of segmentation is made more complicated for OCR algorithms that might be used to break a captcha scheme.
  • Drawing 110 contains another representation 112 of the letter A.
  • In drawing 110, the letter A is rotated about forty-five degrees counterclockwise. Rotation of a symbol is intended to confuse an OCR algorithm by complicating the problem of orienting the symbol to be recognized.
  • Drawing 114 contains another representation 116 of the letter A.
  • In drawing 114, the letter A is blurred (as represented by the dashed line). Blurring a symbol is another way to confuse an OCR algorithm.
  • Drawings 102, 110, and 114 show various ways to obscure a symbol that is being drawn.
  • These drawings are simplified representations of obscuring techniques; in a real-world captcha scheme, more complicated techniques would be used.
  • Each of the obscuring techniques used in these drawings, as well as the degrees to which they are applied, may constitute the features of a captcha scheme.
  • Features that tend to obscure the solution to a captcha may be referred to as “complications.”
  • Distracters, distortions, background, etc. are examples of complications.
  • The fact that a symbol is skewed can be a feature of a particular captcha scheme.
  • The amount by which the symbol is skewed can also be a feature of the scheme.
  • The background and distracter line shown in drawing 102 and the blurring of drawing 114 can also be features of a captcha scheme, as can the parameters that describe the extent to which these features are applied.
  • A configurable captcha generator 118 may be used to generate captchas with the specified features.
  • The configurable captcha generator 118 may generate captchas based on specifications written in a captcha specification language, such as HXAML.
  • There are various ways to design a captcha specification language.
  • In general, the language provides mechanisms to specify the various elements of the captcha, and the parameters that specify how those elements are to be drawn.
  • FIG. 2 shows various example features that may be specified in a captcha specification 200 .
  • One type of feature that may be specified in a captcha specification is the basic visual elements 202.
  • These elements include the text 204 to be rendered (e.g., symbols such as A, B, C, 1, 2, 3, etc.).
  • Another example of a visual element is a distracter 206.
  • Many features in a captcha (e.g., angle of skew, color or pattern of background, etc.) are designed to create confusion for an OCR system.
  • Distracter 206 is one specific way of creating that confusion, through the drawing of specific visual elements, such as lines, curves, smudges, etc.
  • Text 204 is part of the answer to a captcha challenge, while a distracter 206 is not. That is, if text 204 contains the letter A, then the letter A is part of the answer to the challenge. However, if distracter 206 is a line or curve, that line or curve is not part of the answer, but rather a particular way of obscuring the answer.
  • Parameters 208 are examples of parameters that may be specified in a captcha specification language.
  • One example parameter is the position 210.
  • Symbols in a font generally have a defined quadrilateral boundary with an upper left corner. By default, the upper left corner of a symbol is drawn in the same position as the upper left corner of the area that is designated to draw the symbol. However, the position 210 can be specified as some vertical and/or horizontal offset from that default position.
  • Tangent layout 212 refers to the extent to which elements intersect with each other. For example, by default symbols are drawn next to each other so as not to intersect. However, intersection among symbols may be a relevant property for captchas, since intersecting symbols tend to confuse visual segmentation algorithms. Thus, given some defined set of objects to be drawn, tangent layout 212 may specify the number of pixels that are to be made to intersect with each other. (One way to define a “set of objects to be drawn” is to put the objects in a container. Thus, the tangent layout parameter might specify the number of intersecting pixels among all objects in the container to which that parameter applies. The use of containers in a captcha specification language is further described below.)
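A minimal sketch of how a tangent-layout parameter might be enforced, assuming glyphs are represented as sets of pixel coordinates (the helper names and the shift-until-overlap strategy are illustrative, not taken from the patent):

```python
def intersection(mask_a, mask_b):
    """Number of pixels covered by both glyph masks (sets of (x, y))."""
    return len(mask_a & mask_b)

def shift(mask, dx):
    """Translate a glyph mask horizontally by dx pixels."""
    return {(x + dx, y) for x, y in mask}

def place_tangent(mask_a, mask_b, target_pixels, max_shift=500):
    """Slide mask_b leftward until it shares target_pixels pixels with mask_a.

    Hypothetical helper: the patent only says the tangent-layout parameter
    gives the number of intersecting pixels, not how a layout engine
    achieves it.
    """
    dx = 0
    placed = mask_b
    while intersection(mask_a, placed) < target_pixels:
        dx -= 1
        if dx < -max_shift:
            raise ValueError("cannot reach requested overlap")
        placed = shift(mask_b, dx)
    return placed
```

A production layout engine would work on rendered glyph bitmaps and likely shift along both axes, but the overlap-counting idea is the same.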
  • Animation refers to the idea that the entire view of the captcha that a user (or OCR engine) would have to see in order to solve the captcha may not be available at a single instant in time. In other words, acquiring the full amount of visual information that it would take to solve the captcha may involve not only space, but also time.
  • In one example, animation specifies the rate at which a drawing changes.
  • Many formats for describing visual elements allow some simple form of animation. For example, XAML and the Graphical Interchange Format (GIF) allow objects to be animated by proceeding through, or cycling through, a finite number of drawings.
  • In one example, animation may be specified as follows.
  • Parameters may be specified as random variables that are to be drawn from probability distributions. (The use of probability distributions as parameters is described in greater detail below.)
  • The animation parameter might take two arguments, N and x, which specify that, for each randomized parameter, N values are to be selected according to that parameter's probability distribution, and these N values are to be cycled on an x-second timer.
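The (N, x) animation parameter described above might be sketched as follows, assuming each randomized parameter is a Gaussian given by mean and variance (names are illustrative, and the x-second timer itself is omitted):

```python
import itertools
import random

def animation_frames(param_specs, n, rng=None):
    """Pre-sample n values for each randomized parameter and cycle them.

    param_specs maps a parameter name to (mean, variance) of a Gaussian.
    The animation parameter (N, x) would cycle these n frames on an
    x-second timer; only the frame sequence is modeled here.
    """
    rng = rng or random.Random()
    frames = [
        {name: rng.gauss(mu, var ** 0.5)
         for name, (mu, var) in param_specs.items()}
        for _ in range(n)
    ]
    return itertools.cycle(frames)       # frame 0, 1, ..., n-1, 0, 1, ...
```

Each frame is one concrete rendering of the captcha's randomized parameters; the renderer would redraw the captcha from the next frame every x seconds.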
  • An animation might take the form of moving “focus” across the letters and numbers in the captcha, so that different parts of the captcha are brought into focus at different times.
  • Alternatively, the animation might involve having pixels of the captcha that are near each other be in their correct relative positions at the same time, but having pixels that are far from each other be in their correct relative positions at different times, thereby complicating the process of performing simple image capture on the captcha, by ensuring that there is no single point in time at which the entire captcha is shown.
  • One or more parameters could define how this animation is to be performed, i.e., the way in which the captcha is to be shown over a duration of time, rather than all at one time.
  • Distortion may take various forms.
  • Distortion could take the form of blurring, warping, skewing, other types of transformations, or any combination thereof.
  • Each different form of distortion could be specified by a separate parameter, so distortion may actually be specified as a set of parameters.
  • For example, the skew angle could be one parameter, the amount of blurring another, and so on.
  • A parameter could be specified as a fixed value 220.
  • Alternatively, a parameter could be specified as a random variable chosen from some probability distribution.
  • One example of a probability distribution is a Gaussian (or “normal”) distribution 222.
  • Gaussian distributions may be specified by their mean and variance (or standard deviation).
  • For example, a parameter might be specified as “G10,1”, indicating that a number is to be drawn from a normal distribution with a mean of 10 and a variance (and standard deviation) of 1.
  • A parameter could also be specified as being drawn from a uniform distribution 224.
  • For example, a parameter might be specified as “U10,100”, indicating that the parameter is to be drawn from a uniform distribution having lower and upper bounds of 10 and 100, respectively.
  • Other distributions (e.g., exponential, binomial, Poisson, chi-square, etc.) could also be used.
  • In general, the value specifies the degree to which a particular distortion, or other type of complication, is to be applied to a captcha.
  • If the value of a blurring parameter is U10,100, then blurring is to be applied in a degree chosen from a uniform random variable with a range of ten to one hundred.
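The “G10,1”/“U10,100” notation could be parsed into samplers along these lines (a sketch; the parser name and the handling of fixed values are assumptions):

```python
import random

def parse_spec(spec, rng=random):
    """Turn a parameter spec string into a zero-argument sampler.

    "G10,1"   -> Gaussian with mean 10, variance 1
    "U10,100" -> uniform on [10, 100]
    "42"      -> fixed value 42
    """
    if spec[0] == "G":
        mean, var = (float(s) for s in spec[1:].split(","))
        return lambda: rng.gauss(mean, var ** 0.5)
    if spec[0] == "U":
        lo, hi = (float(s) for s in spec[1:].split(","))
        return lambda: rng.uniform(lo, hi)
    value = float(spec)                  # plain number: fixed value
    return lambda: value
```

The captcha generator would call the returned sampler once per captcha instance to obtain the concrete degree of blurring, skew, etc.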
  • A container might contain a letter and its distracters, or a sequence of letters. Parameters could be defined for the container, so that the container's parameters are applied to any elements in the container.
  • If a container includes both a symbol and a distracter, and a blur parameter is set on the container, then the blur is applied both to the symbol and its distracter.
  • If a container contains three letters and a skew parameter, then all letters in the container are skewed according to the parameter.
  • Similarly, if a container contains three letters and a tangent layout parameter is defined for that container, then the three letters are made to intersect with each other in accordance with that parameter.
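The container behavior described above might be modeled minimally like this (the class names and the rule that a child's own parameter overrides the container's are assumptions for illustration):

```python
class Element:
    """A single visual element: a symbol or a distracter."""
    def __init__(self, kind, text=None):
        self.kind = kind                 # "symbol" or "distracter"
        self.text = text
        self.params = {}                 # per-element parameters

class Container:
    """Groups elements; its parameters apply to every child."""
    def __init__(self, children, **params):
        self.children = children
        self.params = params             # e.g. skew=30, blur=0.5

    def resolve(self):
        """Push container parameters down onto every child element.

        A child's own parameter, if set, takes precedence (setdefault).
        """
        for child in self.children:
            for name, value in self.params.items():
                child.params.setdefault(name, value)
        return self.children
```

A tangent-layout parameter would work the same way, except that it constrains the children jointly (their mutual overlap) rather than each child independently.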
  • Captcha-generation programs created using the above-described features may be used by a captcha generation system, which creates captchas in accordance with the specifications that it receives.
  • For example, there may be an HXAML engine that generates captchas based on an HXAML specification.
  • Captcha schemes can thus be designed relatively quickly; however, some schemes are more effective than others.
  • The following is a description of a process for creating a new captcha scheme.
  • Although captcha schemes can be designed by hand, one aspect of the process described below is that it allows the process of generating captcha schemes to be automated. The process is shown in FIG. 3.
  • Before turning to a description of FIG. 3, it is noted that the flow diagrams contained herein (both in FIG. 3 and in FIG. 4) are described, by way of example, with reference to components shown in FIGS. 1 and 2, although these processes may be carried out in any system and are not limited to the scenarios shown in FIGS. 1 and 2. Additionally, each of the flow diagrams in FIGS. 3 and 4 shows an example in which stages of a process are carried out in a particular order, as indicated by the lines connecting the blocks, but the various stages shown in these diagrams can be performed in any order, or in any combination or sub-combination.
  • A HIP generation program is evaluated for effectiveness.
  • A HIP generation program is a program that generates captchas in accordance with some captcha scheme.
  • E.g., “choose five letters of the alphabet; skew the first by an angle chosen from a Gaussian random variable with a mean of 10 and a standard deviation of 1; blur the second by an amount chosen from a uniform random variable with bounds 10 and 100; . . . ” is an example of a captcha scheme.
  • Such a captcha scheme might be specified in a language such as HXAML.
  • Judging the effectiveness of a HIP-generation program may involve various considerations.
  • Some example considerations are: how quickly a legitimate user can solve captcha instances generated by the scheme; how difficult it is for an illegitimate user to solve the captchas; or some combination of the two, with the scheme being hard for OCRs and also difficult (but not prohibitively difficult) for humans. (Difficulty of human solving drives up the cost for people who employ humans to solve captchas, but it also discourages legitimate users. A balance may be struck by a captcha scheme that is very OCR resistant and also takes a human a moderate amount of time (e.g., 15 seconds) to solve.) Since many captcha schemes can eventually be broken, the difficulty for an illegitimate user is generally measured by how long it takes after deployment of the scheme before an OCR algorithm can solve the captchas some percentage of the time.
  • For example, an OCR that can solve the captchas 5% of the time might be considered to have broken the scheme, since the cost of using an OCR-based solver with a 5% success rate is probably not high enough to discourage the use of such a solver.
  • If data show that illegitimate users have a success rate in solving captchas that is similar to, or higher than, that of legitimate users, this may indicate that the illegitimate users are solving captchas using human labor (which is generally more accurate than OCR). Even where human labor is inexpensive, it is still generally more expensive than an OCR solution, so any indication that captchas are being solved by human labor tends to indicate failure of attempts to break the scheme with OCR.
  • In short, the quality of a HIP generation program may be judged by the amount of time that it takes to break it (or by the fact that it has not yet been broken, if that is the case).
  • Some set of programs may be selected in a manner that is biased by quality (at 304). That is, given that the quality of some set of programs, P, has been assessed, a subset of those programs may be chosen in a way that is random but gives a program a higher probability of being chosen if it has been judged to be of high quality.
  • The subset of programs that are actually chosen by this process may be referred to as P′.
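Quality-biased selection of this kind can be sketched as fitness-proportional sampling, with each program's assessed quality used as its weight (an illustrative sketch; the patent does not specify the weighting function):

```python
import random

def select_biased(programs, qualities, k, rng=random):
    """Choose k programs at random, biased toward higher quality.

    qualities are non-negative scores (e.g., estimated days-to-break);
    a program's chance of being picked is proportional to its score.
    """
    return rng.choices(programs, weights=qualities, k=k)
```

Because the draw is random rather than a strict top-k cut, lower-quality programs still occasionally contribute features, which helps the genetic process avoid converging on a single scheme family.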
  • A feature of a program may be a particular way in which symbols are chosen and/or obscured.
  • For example, a particular alphabet from which to choose symbols is a feature of a program, as is the way in which a symbol is distorted, including any parameters that are used in the distortion.
  • A captcha scheme might define that a letter is to be drawn in the Arial font, and that a distracter is to be drawn with the letter.
  • In that case, drawing a letter in the Arial font might be one feature, and the use of a line as a distracter (as well as the way in which the length and position of the line are chosen) might be another.
  • the resulting programs may be mutated in some way. For example, mutation might change a parameter (at 310 ), drop a feature (at 312 ), or add a feature (at 314 ). For example, if a parameter of one of the programs is chosen using a normal random variable with a mean of 10 and a standard deviation of 1, then the program might be mutated by changing the mean to 11 and the standard deviation to 2. Adding and/or dropping features might include adding or removing distracters, adding or removing certain types of distortion, adding or removing symbols from which the text of the captcha is chosen, or any other type of change to an existing HIP-generation program.
  • the various forms of mutation may have probabilities assigned to them, so that a particular mutation has some probability of occurring (or not occurring).
  • the mutation process might be performed so that there is, say, a 25% probability that a parameter will be changed; in other words, the mutation process would, on average, change only one out of every four parameters.
  • a probability of zero percent would mean that it is absolutely certain that the change will not occur, and a probability of one hundred percent would mean that it is absolutely certain the change would occur.
  • the probability of a change occurring may be set somewhere between zero and one hundred percent, exclusive.
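  • The mutation step can be sketched as follows. This is a minimal illustration, assuming a scheme is represented as a dictionary of feature names mapped to parameter dictionaries; the 5% drop/add probabilities and the multiplicative perturbation are hypothetical choices, with the 25% parameter-change probability taken from the example above:

```python
import random

MUTATION_PROB = 0.25  # illustrative: change one in four parameters on average

def mutate(scheme, feature_pool, drop_prob=0.05, add_prob=0.05):
    """Return a mutated copy of a scheme: parameters are perturbed with
    probability MUTATION_PROB, and features are occasionally dropped or added."""
    mutated = {}
    for feature, params in scheme.items():
        if random.random() < drop_prob:          # drop this feature entirely
            continue
        new_params = {}
        for name, value in params.items():
            if random.random() < MUTATION_PROB:  # perturb this parameter
                value = value * random.uniform(0.8, 1.2)
            new_params[name] = value
        mutated[feature] = new_params
    if feature_pool and random.random() < add_prob:  # add a new feature
        name, params = random.choice(feature_pool)
        mutated.setdefault(name, dict(params))
    return mutated
```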
  • the result is the creation of a new set of HIP-generation programs (at 316 ).
  • the resulting programs may combine features of programs that have generally been found to be successful (since the process of selecting the programs is biased in favor of high quality programs), and may also contain some new features (or new versions of old features) through the mutation process.
  • the result is a set of programs that implement new captcha schemes. These captcha schemes may be deployed, and their effectiveness may be judged. As these captchas are deployed (possibly after some amount of testing to ensure the captchas are not too hard and/or too easy to solve), and after their effectiveness is judged, the process may be repeated. That is, the newly-created programs may then be fed into the process shown in FIG. 3 , so that the high-quality programs may be combined and/or mutated to create the next generation of captcha schemes.
  • FIG. 4 shows an example process that may be used to assess the quality of a new program, based on how long it is estimated that the program can be used before it is broken.
  • the OCRs are applied to captchas generated by the new program (at 404 ).
  • the new program implements the captcha scheme whose quality we want to assess.
  • the idea behind applying the OCRs to captchas generated by the new program is that the OCRs that have been trained on broken programs provide a reasonable estimate of the tools attackers currently have at their disposal to break the new program.
  • statistics may be calculated on how well the new program performed against the OCRs (at 406 ). For example, the percentage of captchas that each OCR successfully breaks could be calculated. Using these percentages, a statistic could be calculated based on the average percentage over all the OCRs, the maximum percentage among the OCRs, etc. In general, the statistic measures the new program's success at generating captchas that resist recognition by the various trained OCR engines. The program may then be assigned a quality, Q, using whatever statistic is chosen to represent quality.
  • the program may then be deployed (at 408 ). I.e., the program may be used to generate actual captchas. Services whose use is gated by the captchas that the new program generates are then monitored to determine when the new program is broken. A measurement is then made of how long it takes between when the program is deployed and when the program is declared to be broken (at 410 ). This measurement is an amount of time, T. Thus, for each new program, it is possible to calculate two values, Q and T, representing the quality and time-to-breakage of the program, respectively. Regression analysis thus may be used to determine the relationship between quality and time-to-breakage (at 412 ).
  • the regression of T on Q may be calculated, thereby giving an average time-to-breakage for any given level of quality. Therefore, when a new program is created, its quality can be measured in the way described above (e.g., by training OCRs on known broken captcha schemes, applying those OCRs to a new program, and measuring the new program's resistance to the OCRs). Once the quality has been measured, the time-to-breakage (i.e., the shelf-life of the program) can be estimated using the function that results from the regression analysis.
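  • The quality statistic Q and the regression of T on Q can be illustrated with a short sketch. The particular choice of Q (resistance to the strongest OCR) and the sample history data are hypothetical:

```python
def quality_statistic(break_rates):
    """Q: the new program's resistance to its strongest attacker, i.e.
    one minus the highest per-OCR break fraction (expressed 0..1)."""
    return 1.0 - max(break_rates)

def fit_regression(qs, ts):
    """Ordinary least squares of T on Q, giving T ~ a*Q + b."""
    n = len(qs)
    mq, mt = sum(qs) / n, sum(ts) / n
    a = (sum((q - mq) * (t - mt) for q, t in zip(qs, ts))
         / sum((q - mq) ** 2 for q in qs))
    b = mt - a * mq
    return a, b

# Hypothetical history: (quality, days until broken) for past schemes.
history = [(0.2, 10), (0.5, 40), (0.8, 70)]
a, b = fit_regression([q for q, _ in history], [t for _, t in history])
shelf_life_estimate = a * 0.6 + b  # predicted days for a new scheme with Q=0.6
```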
  • a distance metric between two captcha schemes may be defined. For example, if two HIP-generating programs differ in the value of a particular parameter, then the distance between the two schemes along that parameter could be defined as the numerical difference in the parameter's values.
  • the distance could be defined by analogy to the Levenshtein distance (i.e., the number of insertion, deletion, and substitution operations that it takes to transform captcha scheme A so that it has the same features as captcha scheme B).
  • Given a distance metric, it is possible to calculate a statistic based on the distances between a new program and each existing broken program.
  • the statistic might be the average distance to the broken programs, the minimum distance, or any other appropriate statistic.
  • each program can be assigned a statistic, D, representing its distance to the known broken programs.
  • In addition to D, the time, T, that it takes for a new program to become broken may also be measured.
  • any new program may be associated with two values, D and T.
  • By calculating the regression of T on D, it is possible to identify a function that predicts the time that it takes to break a new program (i.e., its shelf life) in terms of the distance between the new program and existing broken programs.
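  • A minimal sketch of the distance statistic D, assuming each scheme is represented as a dictionary of feature names mapped to parameter values; the Levenshtein-style count of insertions, deletions, and substitutions follows the definition above, and the feature dictionaries are hypothetical:

```python
def feature_distance(scheme_a, scheme_b):
    """Count the insertions, deletions, and substitutions needed to
    transform scheme A's feature set into scheme B's (Levenshtein-style)."""
    keys_a, keys_b = set(scheme_a), set(scheme_b)
    insertions = len(keys_b - keys_a)   # features only in B
    deletions = len(keys_a - keys_b)    # features only in A
    substitutions = sum(1 for k in keys_a & keys_b
                        if scheme_a[k] != scheme_b[k])
    return insertions + deletions + substitutions

def distance_statistic(new_scheme, broken_schemes):
    """D: the minimum distance from a new scheme to any known-broken scheme."""
    return min(feature_distance(new_scheme, b) for b in broken_schemes)

# Hypothetical feature dictionaries.
new = {"skew": 30, "blur": 2}
broken = [{"skew": 45, "line": 1}, {"skew": 30, "blur": 5}]
```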
  • FIG. 5 shows an example environment in which aspects of the subject matter described herein may be deployed.
  • Computer 500 includes one or more processors 502 and one or more data remembrance components 504 .
  • Processor(s) 502 are typically microprocessors, such as those found in a personal desktop or laptop computer, a server, a handheld computer, or another kind of computing device.
  • Data remembrance component(s) 504 are components that are capable of storing data for either the short or long term. Examples of data remembrance component(s) 504 include hard disks, removable disks (including optical and magnetic disks), volatile and non-volatile random-access memory (RAM), read-only memory (ROM), flash memory, magnetic tape, etc.
  • Data remembrance component(s) are examples of computer-readable storage media.
  • Computer 500 may comprise, or be associated with, display 512 , which may be a cathode ray tube (CRT) monitor, a liquid crystal display (LCD) monitor, or any other type of monitor.
  • Software may be stored in the data remembrance component(s) 504 , and may execute on the one or more processor(s) 502 .
  • An example of such software is captcha generation software 506 , which may implement some or all of the functionality described above in connection with FIGS. 1-4 , although any type of software could be used.
  • Software 506 may be implemented, for example, through one or more components, which may be components in a distributed system, separate files, separate functions, separate objects, separate lines of code, etc.
  • The scenario in which a computer (e.g., a personal computer, server computer, handheld computer, etc.) stores a program on a hard disk, loads it into RAM, and executes it on the computer's processor(s) typifies the scenario depicted in FIG. 5, although the subject matter described herein is not limited to this example.
  • the subject matter described herein can be implemented as software that is stored in one or more of the data remembrance component(s) 504 and that executes on one or more of the processor(s) 502 .
  • the subject matter can be implemented as instructions that are stored on one or more computer-readable storage media. Tangible media, such as optical disks or magnetic disks, are examples of storage media.
  • the instructions may exist on non-transitory media. Such instructions, when executed by a computer or other machine, may cause the computer or other machine to perform one or more acts of a method.
  • the instructions to perform the acts could be stored on one medium, or could be spread out across plural media, so that the instructions might appear collectively on the one or more computer-readable storage media, regardless of whether all of the instructions happen to be on the same medium.
  • any acts described herein may be performed by a processor (e.g., one or more of processors 502 ) as part of a method.
  • a method may be performed that comprises the acts of A, B, and C.
  • a method may be performed that comprises using a processor to perform the acts of A, B, and C.
  • computer 500 may be communicatively connected to one or more other devices through network 508 .
  • Computer 510 which may be similar in structure to computer 500 , is an example of a device that can be connected to computer 500 , although other types of devices may also be so connected.

Abstract

Human Interaction Proofs (“HIPs”, sometimes referred to as “captchas”) may be generated automatically. A captcha specification language may be defined, which allows a captcha scheme to be defined in terms of how symbols are to be chosen and drawn, and how those symbols are obscured. The language may provide mechanisms to specify the various ways in which to obscure symbols. New captcha schemes may be generated from existing specifications, by using genetic algorithms that combine features from existing captcha schemes that have been successful. Moreover, the likelihood that a captcha scheme has been broken by attackers may be estimated by collecting data on the time that it takes existing captcha schemes to be broken, and using regression to estimate the time to breakage as a function of either the captcha's features or its measured quality.

Description

    CROSS-REFERENCE TO RELATED APPLICATION
  • This application is a continuation of U.S. patent application Ser. No. 14/281,316, filed May 19, 2014, entitled “AUTOMATIC CONSTRUCTION OF HUMAN INTERACTION PROOF ENGINES” (Atty. Docket No. 329556.06), which is a divisional of U.S. patent application Ser. No. 12/821,124 filed Jun. 22, 2010, entitled “AUTOMATIC CONSTRUCTION OF HUMAN INTERACTION PROOF ENGINES,” now U.S. Pat. No. 8,739,276, issued May 27, 2014 (Atty. Docket No. 329556.01). The entirety of each of these aforementioned applications is incorporated herein by reference.
  • BACKGROUND
  • A human interaction proof (HIP), which is sometimes referred to as a “captcha,” is a mechanism that is used to distinguish human users from robots. Many services that are available on the web—e.g., e-mail, blogs, social networks, access to patent databases, etc.—are gated by captchas. In a typical captcha scheme, letters and numbers are displayed on a screen as graphics in some way that is designed to obscure the letters and numbers. A user has to type the letters and numbers into a box as a form of proof that the user is human. The theory behind captchas is that recognizing symbols that intentionally have been obscured is a hard problem that demands the flexibility of the human brain. Thus, captchas are something akin to an applied Turing test.
  • A problem that arises with captchas is that they can be broken in various ways. Once a particular captcha scheme has been in use for some amount of time, the obscured symbols become recognizable in the sense that optical character recognition (OCR) systems can be trained to recognize them. OCR is thus an automated way of breaking captchas, and it can work as long as there is enough data on which to train the OCR. The training data can be generated by human captcha solvers, or can even be generated just by guessing solutions and analyzing which guesses succeed and which ones fail. Since captchas themselves can be used as training data, for as long as a captcha scheme is in use it continues to generate training data that can be used to break the scheme. Thus, captcha schemes generally have a limited shelf life, after which they are likely to have been broken. In addition to OCR, another way to break a captcha scheme is to use inexpensive human labor to solve captchas. Captchas can be transmitted electronically anywhere in the world (including places where labor is inexpensive), and teams of people can be employed to solve captchas. The solved captchas can be used in real-time, or the solutions can be stored and used as training data for OCR systems, thereby allowing human breaking to feed the process of automated breaking.
  • Since captchas are used to ensure, probabilistically, that services are being used by humans rather than machines, in order for captchas to continue to serve their intended purpose, the captcha schemes often have to be changed. But changing the captcha scheme involves designing and testing a new scheme, which can be labor intensive. Thus, new captcha schemes generally are not designed and deployed as frequently as they could be.
  • SUMMARY
  • Captchas may be specified using a system that streamlines the process of describing the elements and parameters of the scheme. Moreover, captcha schemes may be changed and enhanced over time, by using a genetic algorithm to change the elements and parameters of a captcha scheme. Additionally, the effectiveness of captcha schemes may be monitored to determine when an existing scheme has been broken by attackers, or is likely to have been broken.
  • A captcha specification language may be used to specify a captcha scheme. The language may include features that allow the various elements of a captcha to be specified. For example, a captcha typically includes some sequence of letters and/or numbers that constitute the correct answer to a captcha challenge. In order to create the graphic that is shown to a user as part of a challenge, the symbols (e.g., letters and numbers) may be printed in some font. The symbols may be distorted through warping, skewing, blurring, etc. Distracters that are designed to confuse an OCR system (e.g., lines at various angles, shapes, backgrounds of various levels of contrast, etc.) may be shown with the symbols. The language may allow parameters of the symbols and distracters to be specified—e.g., how much warping, skewing, blurring; the type, size, and shape of the distracters; etc. In one example, parameters may be specified as probability distributions—e.g., a parameter may be specified as a normally distributed random variable, with some mean and variance, so that the actual parameter value used in a specific instance of the captcha will be chosen through a random process with the specified distribution.
  • One aspect of using a captcha specification language is that it makes it relatively easy for a person to specify new captcha schemes. However, another aspect of using such a language is that it makes it possible to automate the process of generating new schemes. For example, a genetic algorithm may be used to combine elements from captcha schemes that have been discovered to be effective, in order to create new schemes. Moreover, the effectiveness of captcha schemes may be monitored, and statistical techniques may be used to judge the effectiveness of particular features, or combinations of features, of a captcha scheme. In particular, regression analysis may be used to predict how long it will take to break a new captcha scheme as a function of the new scheme's measured level of resistance to existing OCRs, or based on the level of difference between the features of the new scheme and existing schemes.
  • This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram of some example symbols that may appear in a captcha.
  • FIG. 2 is a block diagram of various example features that may be specified in a captcha specification.
  • FIG. 3 is a flow diagram of an example process of creating a new captcha scheme.
  • FIG. 4 is a flow diagram of an example process that may be used to assess the quality of a program.
  • FIG. 5 is a block diagram of example components that may be used in connection with implementations of the subject matter described herein.
  • DETAILED DESCRIPTION
  • A human interaction proof (HIP) is often used to gate access to services. HIPs are used to distinguish, probabilistically, human users from robots. For example, some web services such as e-mail, blogs, social networking, etc., present a HIP challenge when a user attempts to register for the service. If the user does not pass the HIP challenge, then the user is not allowed to register for the account. As another example, certain actions that people perform on the web, such as posting to a blog, downloading a public record, etc., are gated by HIP challenges, such that service is either allowed or disallowed based on whether a user correctly answers the HIP. An HIP is sometimes referred to as a captcha.
  • A problem that arises with captchas is that they can be broken. An effective captcha generally depends on being able to show some set of symbols that a human would find relatively easy to recognize, but that a machine would find difficult to recognize. Ordinary, unadorned letters make poor captchas, since optical character recognition (OCR) technology can recognize ordinary letters with relative ease. Thus, captcha schemes generally focus on obscuring letters and numbers in some way—enough that an OCR algorithm would be confused, but not so much as to make the symbols unreadable to a human.
  • There are various ways of obscuring symbols. For example, the symbols can be warped, skewed, blurred, or transformed in some other manner. Or, distracters can be added to the symbols. Examples of distracters include: lines or curves at various angles that are designed to confuse the segmentation of the captcha into its discrete symbols; backgrounds in various colors or patterns that are designed to confuse the contrast-detection techniques that distinguish a symbol from its background; or other types of distracters. In another example, a captcha scheme may involve having distinct symbols intersect with each other to some degree, which—like the line or curve distracters mentioned above—is also designed to confuse the segmentation of the captcha image into its constituent symbols.
  • However, no matter how elaborate a captcha scheme is, it can eventually be broken. The use of a captcha scheme provides a continual source of training data. Every captcha that is presented provides an example that a human captcha solver could solve in order to generate training data. Moreover, every time a captcha is presented, even if a robot simply takes an educated guess at the answer, the system that presents the captcha challenge responds with either success or failure. Information about which guesses succeed and which ones fail can, itself, be used as a form of training data. In other words, captcha schemes have a shelf life in the sense that, some amount of time after they are first deployed, enough data will be available such that an OCR with a machine-learning algorithm can be trained to solve the captcha with some level of reliability (possibly with some human-made adjustments to the machine-learning algorithm, the training data, and/or the results the algorithm produces). Moreover, even if training an OCR algorithm on a particular captcha scheme were to prove intractable, the world provides sources of inexpensive labor that can be used to solve captchas. Since captchas may be made up of image data (or even audio data), the data can be sent anywhere in the world where the cost of labor is low. There are businesses in some of these low-cost areas of the world that use human labor to solve captchas at the rate of hundreds of captchas for one dollar.
  • Thus, the effective use of captchas may depend on changing the captcha scheme frequently to confound OCR solvers. The subject matter herein provides techniques for specifying captcha schemes in order to allow the schemes to be changed easily and quickly. Moreover, the subject matter herein provides techniques for automatically creating new captcha schemes by combining effective features from existing captcha schemes. Additionally, techniques described herein may be used to monitor how long deployed captcha schemes remain effective, in order to predict when new captcha schemes are likely to have been broken.
  • In order to create captcha schemes efficiently, a captcha specification language may be used. One example of a captcha specification language is a variant of XAML, which may be referred to as HXAML. XAML (the Extensible Application Markup Language) is a language that is used to define elements of a user interface (UI), including graphical elements. HXAML is an extension to XAML, which may be used to specify the HIP elements of a UI. HXAML provides primitives that are relevant to the problem of obscuring symbols (e.g., blurring, skewing, etc.). HXAML is merely one example of a language that may be used to specify captchas; other mechanisms could also be used. Regardless of the particular captcha specification language that is used, the language may provide mechanisms for specifying the answer to the captcha (i.e., the letters, numbers or other symbols that constitute the correct answer to a captcha challenge), as well as the way in which those symbols are to be drawn and distorted. For example, the language may allow users to specify the font of the symbols; the amount of skew, warp, blurring, etc., that is to be applied to the symbols; the existence and nature of distracters to be drawn with the symbols (e.g., extraneous lines or curves); the nature of the background on which the symbols are to be drawn; the way in which the symbols are to be animated; the extent to which symbols are to intersect; or any other features of the appearance of a captcha. The language may allow the scheme to have some built-in variability. Thus, in one example (without variability), a scheme might specify that a letter is to be skewed thirty degrees clockwise. But, in another example, the amount of skew could be specified as a random variable, such as a normal variable with a mean of thirty degrees and a variance of 100 (i.e., a standard deviation of ten degrees).
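  • To make the idea concrete, the sketch below parses a small XML fragment written in the spirit of such a specification language and draws one concrete set of rendering parameters from it. The element and attribute names here are hypothetical and do not reproduce the actual HXAML syntax:

```python
import random
import xml.etree.ElementTree as ET

# A hypothetical spec: skew is a normal random variable, blur is fixed.
SPEC = """
<hip text="A7K" font="Arial">
  <skew distribution="normal" mean="30" stdev="10"/>
  <blur value="2"/>
</hip>
"""

def sample_parameters(spec_xml):
    """Draw one concrete set of rendering parameters from the spec:
    randomized parameters are sampled, fixed ones are read directly."""
    root = ET.fromstring(spec_xml)
    params = {"text": root.get("text"), "font": root.get("font")}
    for child in root:
        if child.get("distribution") == "normal":
            params[child.tag] = random.gauss(
                float(child.get("mean")), float(child.get("stdev")))
        else:
            params[child.tag] = float(child.get("value"))
    return params
```

Each call to `sample_parameters` yields a different concrete captcha instance, which is how one scheme can produce many distinct challenges.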
  • Since the captcha specification language allows a captcha to be specified as a combination of features, it is possible to modify the captcha scheme automatically using techniques such as genetic algorithms. Genetic algorithms allow features of existing schemes to be combined in new ways to produce new schemes. In one example, the features from particularly effective captcha schemes may be combined, in order to generate a scheme that has a high likelihood of success.
  • Moreover, when new captcha schemes are deployed, it is possible to monitor these schemes to determine when they have been broken. Furthermore, the data from this monitoring can be used with statistical methods to determine the amount of time that it will likely take for a new scheme to be broken. Given some set of captcha schemes with some set of features, the amount of time that it takes for a captcha scheme to be broken can be mapped against the captcha scheme's features. Then, regression can be used to predict how long it would take to break a particular captcha scheme based on the features that it contains.
  • Turning now to the drawings, FIG. 1 shows some example symbols that may appear in a captcha. These symbols appear with a variety of features. Drawing 102 is a drawing of the letter A. Drawing 102 includes a representation 104 of the letter A itself. Additionally, drawing 102 shows the letter A on a background 106. Background 106 is represented as a dotted stipple pattern in the drawing of FIG. 1, although in a real-life captcha background 106 might have colors and/or more complicated patterns. Background 106 is designed to confuse an OCR algorithm. Since OCR algorithms generally include a process to distinguish a symbol from the background by looking for regions in which a sharp contrast occurs, the use of a complex background is designed to confuse the OCR algorithm's ability to detect the contrast. Additionally, drawing 102 contains a line 108, which cuts across the representation 104 of the letter A. One hard problem in OCR is the segmentation of portions of an image into symbols. By drawing extraneous lines or curves over the symbols, the problem of segmentation is made more complicated for OCR algorithms that might be used to break a captcha scheme.
  • Drawing 110 contains another representation 112 of the letter A. In drawing 110, the letter A is rotated about forty-five degrees counterclockwise. Rotation of a representation of a symbol is intended to confuse an OCR algorithm by complicating the problem of orienting the symbol to be recognized.
  • Drawing 114 contains another representation 116 of the letter A. In drawing 114, the letter A is blurred (as represented by the dashed line). Blurring of a symbol is another way that can be used to confuse an OCR algorithm.
  • Drawings 102, 110, and 114 show various ways to obscure a symbol that is being drawn. (These drawings are simplified representations of obscuring techniques; in a real-world captcha scheme, more complicated techniques would be used.) Each of the obscuring techniques used in these drawings, as well as the degrees to which they are applied, may constitute the features of a captcha scheme. (For the purpose of the discussion herein, features that tend to obscure the solution to a captcha may be referred to as “complications.” Distracters, distortions, background, etc., are examples of complications.) Thus, the fact that a symbol is skewed (as in drawing 110) can be a feature of a particular captcha scheme. Additionally, the amount that the symbol is skewed (e.g., 45 degrees), or the particular way in which a random skew is selected (e.g., a normal random variable with a mean of 45 degrees and a standard deviation of 10 degrees), can also be features of the captcha scheme. The background and distracter line shown in drawing 102 and the blurring of drawing 114 can also be features of a captcha scheme, as can the parameters that describe the extent to which these features are applied. A configurable captcha generator 118 may be used to generate captchas with the specified features. The configurable captcha generator 118 may generate captchas based on specifications written in a captcha specification language, such as HXAML.
  • There are various ways to design a captcha specification language. In one example, the language provides mechanisms to specify the various elements of the captcha, and the parameters that specify how those elements are to be drawn. FIG. 2 shows various example features that may be specified in a captcha specification 200.
  • A type of feature that may be specified in a captcha specification is the basic visual elements 202. Examples of these elements include the text 204 to be rendered (e.g., a symbol, such as A, B, C, 1, 2, 3, etc.). Another example of a visual element is a distracter 206. To some extent, many features in a captcha (e.g., angle of skew, color or pattern of background, etc.) are designed to be confusing to an OCR algorithm. Distracter 206 is one specific way of creating that confusion, through the drawing of specific visual elements, such as lines, curves, smudges, etc. Semantically, the distinction between text 204 and distracter 206 is that—while both are objects to be drawn—text 204 is part of the answer to a captcha challenge, while a distracter 206 is not. That is, if text 204 contains the letter A, then the letter A is part of the answer to the challenge. However, if distracter 206 is a line or curve, that line or curve is not part of the answer, but rather a particular way of obscuring the answer.
  • The various visual elements may be parameterized in some way. Parameters 208 are some example parameters that may be specified in a captcha specification language. One example parameter is the position 210. Symbols in a font generally have a defined quadrilateral boundary with an upper left corner. By default, the upper left corner of a symbol is drawn in the same position as the upper left corner of the area that is designated to draw the symbol. However, the position 210 can be specified as some vertical and/or horizontal offset from that default position.
  • Another example of a parameter is tangent layout 212, which refers to the extent to which elements intersect with each other. For example, by default symbols are drawn next to each other so as not to intersect. However, intersection among symbols may be a relevant property for captchas, since intersecting symbols tend to confuse visual segmentation algorithms. Thus, given some defined set of objects to be drawn, tangent layout 212 may specify the number of pixels that are to be made to intersect with each other. (One way to define a “set of objects to be drawn” is to put the objects in a container. Thus, the tangent layout parameter might specify the number of intersecting pixels among all objects in the container to which that parameter applies. The use of containers in a captcha specification language is further described below.)
  • Another example of a parameter is animation 214. Animation refers to the idea that the entire view of the captcha that a user (or OCR engine) would have to see in order to solve the captcha may not be available at a single instant in time. In other words, acquiring the full amount of visual information that it would take to solve the captcha may involve not only space, but also time. In one simple example, animation specifies the rate at which a drawing changes. Many formats for describing visual elements allow some simple form of animation. For example, XAML and the Graphical Interchange Format (GIF) allow objects to be animated by proceeding through, or cycling through, a finite number of drawings. In one example, animation may be specified as follows. Parameters may be specified as random variables that are to be drawn from probability distributions. (The use of probability distributions as parameters is described in greater detail below.) Thus, in this example, the animation parameter might take two arguments, N and x, which specifies that—for each randomized parameter—N values are to be selected according to the random parameter's probability distribution, and these N values are to be cycled on an x second timer. Thus, if one parameter is an angle of a line to be drawn, and the parameter is selected from a normal distribution, then animating that line with N=5 and x=2 would select N values from the angle's distribution, and would change the angle of the line every two seconds. The pattern would repeat after five angle changes. However, the foregoing is merely one example of an animation. An animation might take the form of moving “focus” across the letters and numbers in the captcha, so that different parts of the captcha are brought into focus at different times. 
Or, the animation might involve having pixels of the captcha that are near each other be in their correct relative positions at the same time, but having pixels that are far from each other be in their correct relative positions at different times—thereby complicating the process of performing simple image capture on the captcha, by ensuring that there is not a single point in time at which the entire captcha is shown. One or more parameters could define how this animation is to be performed—i.e., the way in which the captcha is to be shown over a duration of time, rather than all at one time.
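  • The N-and-x animation example above can be sketched as follows, assuming `sampler` is any callable that draws one value from the randomized parameter's probability distribution:

```python
import random

def animation_schedule(sampler, n, x):
    """Draw n values for a randomized parameter and pair each with the
    time (in seconds) at which it takes effect; the pattern repeats
    after n changes, i.e., every n*x seconds."""
    values = [sampler() for _ in range(n)]
    return [(i * x, v) for i, v in enumerate(values)]

# Example: animate a distracter line's angle with N=5 values on a
# 2-second timer (the normal distribution's parameters are hypothetical).
schedule = animation_schedule(lambda: random.gauss(45, 10), n=5, x=2)
```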
  • Another example of a parameter is distortion 216. Distortion may take various forms. For example, distortion could take the form of blurring, warping, skewing, other types of transformations, or any combination thereof. Each different form of distortion could be specified by a separate parameter, so distortion may actually be specified as a set of parameters. For example, the skew angle could be one parameter, the amount of blurring could be specified as another parameter, and so on.
  • We now turn to the various different ways 218 to specify parameters. For example, if a particular captcha scheme specifies that an element of the captcha is to be skewed by some angle, there are various ways to specify that angle. A parameter could be specified as a fixed value 220. However, as noted above, a parameter could be specified as a random variable chosen from some probability distribution. One example of such a probability distribution is a Gaussian (or “normal”) distribution 222. Gaussian distributions may be specified by their mean and variance (or standard deviation). Thus, a parameter might be specified as “G10,1”, indicating that a number is to be drawn from a normal distribution with a mean of 10 and a variance/standard deviation of 1. Similarly, a parameter could be specified as being drawn from a uniform distribution 224. Thus, a parameter might be specified as “U10,100”, indicating that the parameter is to be drawn from a uniform distribution having lower and upper bounds of 10 and 100, respectively. Other distributions (e.g., exponential, binomial, Poisson, chi-square, etc.) could be defined. In general, the value specifies the degree to which a particular distortion, or other type of complication, is to be applied to a captcha. E.g., if the value of a blurring parameter is U10,100, then it may be said that blurring is to be applied in a degree that is chosen from a uniform random variable with a range of ten to one hundred.
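  • A minimal sketch of how such parameter specifications might be parsed and sampled follows; the function `sample_parameter` is an illustrative name, not part of any HXAML engine:

```python
import random

def sample_parameter(spec):
    # 'G10,1'   -> Gaussian with mean 10 and standard deviation 1
    # 'U10,100' -> uniform with lower bound 10 and upper bound 100
    # a bare number, e.g. '30', is treated as a fixed value
    if spec.startswith("G"):
        mean, std = (float(s) for s in spec[1:].split(","))
        return random.gauss(mean, std)
    if spec.startswith("U"):
        low, high = (float(s) for s in spec[1:].split(","))
        return random.uniform(low, high)
    return float(spec)

skew_angle = sample_parameter("G10,1")     # degree of skew
blur_amount = sample_parameter("U10,100")  # degree of blurring
```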
  • One way to organize the elements of a captcha, and the parameters that apply to them, is to define containers. For example, a container might contain a letter and its distracters, or a sequence of letters. Parameters could be defined for the container, so that the container's parameters would be applied to any elements in the container. Thus, if a container includes both a symbol and a distracter, and a blur parameter is set on the container, then the blur would be applied both to the symbol and its distracter. Or, if a container contains three letters and a skew parameter, then all letters in the container would be skewed according to the parameter. Or, as another example, if a container contains three letters and a tangent layout parameter is defined for that container, then the three letters would be made to intersect with each other in accordance with the tangent layout parameter.
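  • The container idea might be sketched as follows. The `Container` class and its `render_plan` method are hypothetical names, used only to show container-level parameters being inherited by every contained element:

```python
class Container:
    # A container groups captcha elements (symbols, distracters) and
    # holds parameters that apply to every element it contains.
    def __init__(self, params):
        self.params = params     # e.g. {"blur": "U10,100", "skew": "G10,1"}
        self.elements = []

    def add(self, element):
        self.elements.append(element)

    def render_plan(self):
        # Every element inherits every container-level parameter.
        return [(element, dict(self.params)) for element in self.elements]

container = Container({"blur": "U10,100"})
container.add("symbol:A")
container.add("distracter:line")
plan = container.render_plan()  # blur applies to the symbol and its distracter
```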
  • Captcha-generation programs created using the above-described features may be used by a captcha generation system, which creates captchas in accordance with the specifications that it receives. For example, there may be an HXAML engine that generates captchas based on an HXAML specification.
  • As mentioned above, there may be reason to change captcha schemes frequently. Since a captcha specification language makes it relatively easy to define a new captcha scheme by changing the features and/or parameters of the scheme, new schemes can be designed relatively quickly. However, some schemes are more effective than others. The following is a description of a process for creating a new captcha scheme. Although captcha schemes can be designed by hand, one aspect of the process described below is that it allows the process of generating captcha schemes to be automated. The process is shown in FIG. 3.
  • Before turning to a description of FIG. 3, it is noted that the flow diagrams contained herein (both in FIG. 3 and in FIG. 4) are described, by way of example, with reference to components shown in FIGS. 1 and 2, although these processes may be carried out in any system and are not limited to the scenarios shown in FIGS. 1 and 2. Additionally, each of the flow diagrams in FIGS. 3 and 4 shows an example in which stages of a process are carried out in a particular order, as indicated by the lines connecting the blocks, but the various stages shown in these diagrams can be performed in any order, or in any combination or sub-combination.
  • At 302, a HIP-generation program is evaluated for effectiveness. A HIP-generation program is a program that generates captchas in accordance with some captcha scheme. (E.g., “choose five letters of the alphabet, skew the first by an angle chosen from a Gaussian random variable with a mean of 10 and standard deviation of 1, blur the second by an amount chosen from a uniform random variable with bounds 10 and 100, . . . ” is an example of a captcha scheme. Such a captcha scheme might be specified in a language such as HXAML.) Judging the effectiveness of a HIP-generation program (or the captcha scheme that the program implements) may involve various considerations. Some example considerations are: how quickly a legitimate user can solve captcha instances generated by the scheme; how difficult it is for an illegitimate user to solve the captchas; or some combination of the scheme being difficult for OCRs and also difficult (but not prohibitively difficult) for humans. (Difficulty of human solving drives up the cost for people who employ humans to solve captchas, but also discourages legitimate users. Thus, one might like to find a captcha scheme that is very OCR-resistant and also takes a human a moderate amount of time (e.g., 15 seconds) to solve.) Since many captcha schemes can eventually be broken, the difficulty for an illegitimate user is generally measured by how long it takes after deployment of the scheme before an OCR algorithm can solve the captchas some percentage of the time. Given that the cost of failure is relatively low (i.e., the system might have to try again, and possibly have an IP address blocked for some period of time), an OCR that can solve the captcha 5% of the time might be considered to have broken the captcha scheme, since the cost of using an OCR-based solver with a 5% success rate is probably not high enough to discourage the use of such a solver.
Additionally, if data show that illegitimate users have a success rate in solving captchas that is similar to, or higher than, legitimate users, this fact may indicate that the illegitimate users are solving captchas using human labor (which is generally more accurate than OCR). Even where human labor is inexpensive, it is still generally more expensive than an OCR solution, so any indication that captchas are being solved by human labor tends to indicate failure of attempts to break the scheme with OCR.
  • Thus, based on the foregoing discussion, the quality of a HIP-generation program may be judged by the amount of time that it takes to break it (or by the fact that it has not yet been broken, if that is in fact the case). Regardless of the manner in which the quality of the HIP is judged, some set of programs may be selected in a manner that is biased by quality (at 304). That is, given that the quality of some set of programs, P, has been assessed, a subset of those programs may be chosen in a way that is random but gives a program a higher probability of being chosen if it has been judged to be of high quality. The subset of programs that are actually chosen by this process may be referred to as P′.
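  • Quality-biased selection of P′ can be sketched with a weighted random draw. The scheme names and quality scores below are hypothetical; note also that `random.choices` samples with replacement, so a draw without replacement would need a different routine:

```python
import random

def select_biased(programs, qualities, k):
    # Randomly choose k programs, weighting the draw by quality so that
    # higher-quality programs are more likely to land in P'.
    return random.choices(programs, weights=qualities, k=k)

programs = ["scheme_a", "scheme_b", "scheme_c", "scheme_d"]
qualities = [0.9, 0.1, 0.5, 0.7]   # hypothetical quality scores
p_prime = select_biased(programs, qualities, k=2)
```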
  • After the programs P′ have been chosen, features from pairs of programs may be combined in some manner (at 306). A feature of a program may be a particular way in which symbols are chosen and/or obscured. Thus, a particular alphabet from which to choose symbols is a feature of a program. So is the way in which the symbol is distorted, including any parameters that are used in the distortion. For example, a captcha scheme might define that a letter is to be drawn in the Arial font, and that a distracter is to be drawn with the letter. In this case, drawing a letter in the Arial font might be one feature, and the use of a line as a distracter (as well as the way in which the length and position of the line are chosen) might be another feature.
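  • Combining features from a pair of programs might be sketched as a uniform crossover over feature dictionaries; representing a scheme as a dictionary of named features is an assumption made for illustration:

```python
import random

def combine(scheme_a, scheme_b):
    # For each feature present in either parent, take its value from one
    # of the parents that defines it, chosen at random.
    child = {}
    for feature in set(scheme_a) | set(scheme_b):
        sources = [s for s in (scheme_a, scheme_b) if feature in s]
        child[feature] = random.choice(sources)[feature]
    return child

parent_a = {"font": "Arial", "skew": "G10,1"}
parent_b = {"font": "Courier", "distracter": "line"}
child = combine(parent_a, parent_b)
```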
  • At 308, the resulting programs may be mutated in some way. For example, mutation might change a parameter (at 310), drop a feature (at 312), or add a feature (at 314). For example, if a parameter of one of the programs is chosen using a normal random variable with a mean of 10 and a standard deviation of 1, then the program might be mutated by changing the mean to 11 and the standard deviation to 2. Adding and/or dropping features might include adding or removing distracters, adding or removing certain types of distortion, adding or removing symbols from which the text of the captcha is chosen, or any other type of change to an existing HIP-generation program. The various forms of mutation may have probabilities assigned to them, so that a particular mutation has some probability of occurring (or not occurring). For example, the mutation process might be performed so that there is, say, a 25% probability that a parameter will be changed; in other words, the mutation process would, on average, change one out of every four parameters. (A probability of zero percent would mean that it is absolutely certain that the change will not occur, and a probability of one hundred percent would mean that it is absolutely certain the change would occur. Thus, to introduce some randomness into the process, the probability of a change occurring may be set somewhere between zero and one hundred percent, exclusive.) Since the selection of programs for P′ is biased in favor of high quality programs, there may be reason to avoid changing the features of the programs in P′ too much. However, this consideration is balanced against the value of adding features to a captcha scheme that have not yet been seen by OCR engines. Thus, it may be effective to mutate combinations of successful programs to some degree, but not to an excessive degree. These considerations can be balanced by appropriately choosing the probability with which a particular type of mutation will occur.
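  • The three kinds of mutation (change a parameter, drop a feature, add a feature), each occurring with its own probability, might be sketched as follows. The default probabilities and the multiplicative perturbation rule are illustrative assumptions:

```python
import random

def mutate(scheme, feature_pool, p_change=0.25, p_drop=0.1, p_add=0.1):
    # Each numeric parameter has a p_change chance of being perturbed,
    # each feature a p_drop chance of being removed, and each unused
    # feature from the pool a p_add chance of being added.
    mutated = {}
    for feature, value in scheme.items():
        if random.random() < p_drop:
            continue                                  # drop the feature
        if isinstance(value, (int, float)) and random.random() < p_change:
            value = value * random.uniform(0.8, 1.2)  # change a parameter
        mutated[feature] = value
    for feature, value in feature_pool.items():
        if feature not in mutated and random.random() < p_add:
            mutated[feature] = value                  # add a new feature
    return mutated

mutant = mutate({"skew_mean": 10.0, "font": "Arial"}, {"warp": "U1,5"})
```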
  • After the combination of programs and/or mutation of those programs, the result is the creation of a new set of HIP-generation programs (at 316). The resulting programs may combine features of programs that have generally been found to be successful (since the process of selecting the programs is biased in favor of high quality programs), and may also contain some new features (or new versions of old features) through the mutation process. The result is a set of programs that implement new captcha schemes. These captcha schemes may be deployed, and their effectiveness may be judged. As these captchas are deployed (possibly after some amount of testing to ensure the captchas are not too hard and/or too easy to solve), and after their effectiveness is judged, the process may be repeated. That is, the newly-created programs may then be fed into the process shown in FIG. 3, so that the high-quality programs may be combined and/or mutated to create the next generation of captcha schemes.
  • When a new program is created, there may be reason to try to estimate the shelf life of the program—i.e., how long it will take for the captcha scheme implemented by the program to be broken. FIG. 4 shows an example process that may be used to assess the quality of a new program, based on how long it is estimated that the program can be used before it is broken.
  • In the process of FIG. 4, it is assumed that there is some set of HIP-generation programs that are known to have been broken in the sense that attackers have successfully trained OCR algorithms on the captcha schemes that the programs implement. For each such broken program, the entity that wants to measure the quality of new programs trains an OCR classifier at 402. (It is assumed that the entity that wants to measure the quality of new programs does not have access to the attackers' trained OCRs, and thus has to start by training its own OCRs on the broken programs). Training the OCR algorithms on the broken captcha schemes is relatively easy. Each program is used to generate a set of captchas and their answers. The captchas and their answers are then used as training data for a machine learning algorithm.
  • The process of training an OCR at 402 is repeated for each broken program. Thus, if the number of known broken programs is B, then the repetition of 402 will result in B trained OCRs.
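  • The training step might be sketched as below. Here `broken_program` stands in for a real HIP-generation program (modeled as a callable that yields (captcha, answer) pairs), and `learn` for an actual machine-learning trainer; both are placeholders, not the implementation described in the patent:

```python
def train_ocr(broken_program, n_examples, learn):
    # Generate labeled (captcha, answer) training pairs from one broken
    # program and hand them to a learning algorithm.
    training_data = [broken_program() for _ in range(n_examples)]
    return learn(training_data)

def train_all(broken_programs, n_examples, learn):
    # One trained OCR per known broken program: B programs -> B OCRs.
    return [train_ocr(p, n_examples, learn) for p in broken_programs]
```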
  • After OCRs have been trained for each of the broken programs, the OCRs are applied to captchas generated by the new program (at 404). The new program implements the captcha scheme whose quality we want to assess. The idea behind applying the OCRs to captchas generated by the new program is that the OCRs that have been trained on broken programs are a reasonable proxy for the tools attackers currently have at their disposal to break the new program.
  • When the OCRs have been applied to captchas generated by the new program, statistics may be calculated on how well the new program performed against the OCRs (at 406). For example, the percentage of captchas that each OCR successfully breaks could be calculated. Using these percentages, a statistic could be calculated based on the average percentage over all the OCRs, the maximum percentage among the OCRs, etc. In general, the statistic measures the new program's success at generating captchas that resist recognition by the various trained OCR engines. The program may then be assigned a quality, Q, using whatever statistic is chosen to represent quality.
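  • The statistic might be computed along the following lines; for illustration, Q is taken here as one minus the worst-case (maximum) break rate among the trained OCRs, though the average could be used instead:

```python
def break_rate(ocr_solve, captchas):
    # Fraction of (captcha, answer) pairs that an OCR solves correctly.
    hits = sum(1 for image, answer in captchas if ocr_solve(image) == answer)
    return hits / len(captchas)

def quality(ocr_solvers, captchas):
    # Q: resistance of the new scheme to the best of the trained OCRs.
    rates = [break_rate(solve, captchas) for solve in ocr_solvers]
    return 1.0 - max(rates)
```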
  • The program may then be deployed (at 408). I.e., the program may be used to generate actual captchas. Services whose use is gated by the captchas that the new program generates are then monitored to determine when the new program is broken. A measurement is then made of how long it takes between when the program is deployed and when the program is declared to be broken (at 410). This measurement is an amount of time, T. Thus, for each new program, it is possible to calculate two values, Q and T, representing the quality and time-to-breakage of the program, respectively. Regression analysis thus may be used to determine the relationship between quality and time-to-breakage (at 412). That is, the regression of T on Q may be calculated, thereby giving an average time-to-breakage for any given level of quality. Therefore, when a new program is created, its quality can be measured in the way described above (e.g., by training OCRs on known broken captcha schemes, applying those OCRs to the new program, and measuring the new program's resistance to the OCRs). Once the quality has been measured, the time-to-breakage (i.e., the shelf life of the program) can be estimated using the function that results from the regression analysis.
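  • The regression of T on Q might be sketched with an ordinary least-squares line fit; the (Q, T) observations below are hypothetical, with T measured in days:

```python
# Hypothetical (quality, time-to-breakage) observations.
observations = [(0.2, 5.0), (0.5, 20.0), (0.8, 35.0)]

def fit_line(points):
    # Ordinary least squares for T = a*Q + b.
    n = len(points)
    mean_q = sum(q for q, _ in points) / n
    mean_t = sum(t for _, t in points) / n
    cov = sum((q - mean_q) * (t - mean_t) for q, t in points)
    var = sum((q - mean_q) ** 2 for q, _ in points)
    a = cov / var
    return a, mean_t - a * mean_q

a, b = fit_line(observations)
predicted_shelf_life = a * 0.6 + b  # estimated T for a new scheme with Q = 0.6
```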
  • An alternative way of assessing a new program is to measure its distance from existing programs. Based on the idea that attackers' tools have been trained on existing captcha schemes, it is reasonable to assume that these tools will be more effective on new captcha schemes that are similar to existing ones, and less effective on captcha schemes that are very different from existing ones. Thus, a distance metric between two captcha schemes may be defined. For example, if two HIP-generating programs differ in the value of a particular parameter, then the distance along that parameter could be defined as the numerical difference between the two values. Or, when entire elements are present in one program and absent in another program (e.g., where one program contains a particular distracter and another one does not), then the distance could be defined by analogy to the Levenshtein distance (i.e., the number of insertion, deletion, and substitution operations that it takes to transform captcha scheme A so that it has the same features as captcha scheme B). The foregoing are some examples, although any appropriate distance metric could be defined.
  • Once a distance metric is defined, it is possible to calculate a statistic based on the distances between a new program and each existing broken program. For example, the statistic might be the average distance to the broken programs, the minimum distance, or any other appropriate statistic. Thus, each program can be assigned a statistic, D, representing its distance to the known broken programs. As described above in connection with FIG. 4, the time, T, that it takes for a new program to become broken may also be measured. Thus, any new program may be associated with two values, D and T. By calculating the regression of T on D, it is possible to identify a function that predicts the times that it takes to break a new program (i.e., its shelf life) in terms of the distance between the new program and existing programs.
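  • The distance metric and the statistic D might be sketched as follows; as before, treating a scheme as a dictionary of named features is an illustrative assumption:

```python
def scheme_distance(scheme_a, scheme_b):
    # Numeric differences for parameters the schemes share, plus one unit
    # per feature present in only one scheme (by analogy to insertions
    # and deletions in the Levenshtein distance).
    distance = 0.0
    for feature in set(scheme_a) | set(scheme_b):
        if feature in scheme_a and feature in scheme_b:
            a, b = scheme_a[feature], scheme_b[feature]
            if isinstance(a, (int, float)) and isinstance(b, (int, float)):
                distance += abs(a - b)
            elif a != b:
                distance += 1    # substitution of a non-numeric feature
        else:
            distance += 1        # insertion or deletion
    return distance

def distance_statistic(new_scheme, broken_schemes):
    # D: minimum distance from the new scheme to any known broken scheme.
    return min(scheme_distance(new_scheme, s) for s in broken_schemes)
```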
  • FIG. 5 shows an example environment in which aspects of the subject matter described herein may be deployed.
  • Computer 500 includes one or more processors 502 and one or more data remembrance components 504. Processor(s) 502 are typically microprocessors, such as those found in a personal desktop or laptop computer, a server, a handheld computer, or another kind of computing device. Data remembrance component(s) 504 are components that are capable of storing data for either the short or long term. Examples of data remembrance component(s) 504 include hard disks, removable disks (including optical and magnetic disks), volatile and non-volatile random-access memory (RAM), read-only memory (ROM), flash memory, magnetic tape, etc. Data remembrance component(s) are examples of computer-readable storage media. Computer 500 may comprise, or be associated with, display 512, which may be a cathode ray tube (CRT) monitor, a liquid crystal display (LCD) monitor, or any other type of monitor.
  • Software may be stored in the data remembrance component(s) 504, and may execute on the one or more processor(s) 502. An example of such software is captcha generation software 506, which may implement some or all of the functionality described above in connection with FIGS. 1-4, although any type of software could be used. Software 506 may be implemented, for example, through one or more components, which may be components in a distributed system, separate files, separate functions, separate objects, separate lines of code, etc. A computer (e.g., personal computer, server computer, handheld computer, etc.) in which a program is stored on hard disk, loaded into RAM, and executed on the computer's processor(s) typifies the scenario depicted in FIG. 5, although the subject matter described herein is not limited to this example.
  • The subject matter described herein can be implemented as software that is stored in one or more of the data remembrance component(s) 504 and that executes on one or more of the processor(s) 502. As another example, the subject matter can be implemented as instructions that are stored on one or more computer-readable storage media. Tangible media, such as optical disks or magnetic disks, are examples of storage media. The instructions may exist on non-transitory media. Such instructions, when executed by a computer or other machine, may cause the computer or other machine to perform one or more acts of a method. The instructions to perform the acts could be stored on one medium, or could be spread out across plural media, so that the instructions might appear collectively on the one or more computer-readable storage media, regardless of whether all of the instructions happen to be on the same medium.
  • Additionally, any acts described herein (whether or not shown in a diagram) may be performed by a processor (e.g., one or more of processors 502) as part of a method. Thus, if the acts A, B, and C are described herein, then a method may be performed that comprises the acts of A, B, and C. Moreover, if the acts of A, B, and C are described herein, then a method may be performed that comprises using a processor to perform the acts of A, B, and C.
  • In one example environment, computer 500 may be communicatively connected to one or more other devices through network 508. Computer 510, which may be similar in structure to computer 500, is an example of a device that can be connected to computer 500, although other types of devices may also be so connected.
  • Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (20)

1. An automated method for generating Human Interaction Proofs (HIP) schemes, the method comprising:
training one or more optical character recognition (OCR) engines on captchas generated by an input HIP scheme and on answers to the captchas generated by the input HIP scheme;
determining, by the one or more trained OCR engines, answers to captchas generated by one or more candidate HIP schemes;
determining an ability of the one or more trained OCR engines to correctly determine answers to the captchas generated by the one or more candidate HIP schemes; and
generating, based on the determined ability of the one or more trained OCR engines to correctly determine answers to the captchas generated by the one or more candidate HIP schemes, at least one output HIP scheme.
2. The automated method of claim 1, wherein generating the at least one output HIP scheme includes:
producing a combined HIP scheme based on at least two HIP schemes of the one or more candidate HIP schemes; and
generating the at least one output HIP scheme based on the combined HIP scheme.
3. The automated method of claim 1, wherein the method further comprises:
estimating a time to breakage of the at least one output HIP scheme based on the determined ability of the one or more trained OCR engines to correctly determine answers to the captchas generated by the one or more candidate HIP schemes.
4. The automated method of claim 1, wherein the at least one output HIP scheme is generated in a HIP specification language.
5. The automated method of claim 1, wherein the at least one output HIP scheme defines:
an alphabet from which multiple symbols are to be selected as answers to output captchas;
multiple complications that are selectable for use in generation of the output captchas; and
multiple values that define extents to which respective complications of the multiple complications are to be applied to the symbols of the alphabet in the generation of the output captchas.
6. A computing device for generating Human Interaction Proofs (HIP) schemes, comprising:
a memory and a processor that are respectively configured to store and execute instructions that cause the computing device to perform operations for generating the HIP schemes, the operations including:
applying one or more trained OCR engines to one or more candidate HIP schemes;
determining, based on the applying of the one or more trained OCR engines to the one or more candidate HIP schemes, information regarding an ability of the one or more trained OCR engines to ascertain answers to captchas generated by the one or more candidate HIP schemes; and
employing the information regarding the ability of the one or more trained OCR engines to generate at least one output HIP scheme.
7. The computing device of claim 6, wherein the information regarding the ability of the one or more trained OCR engines includes statistics regarding a percentage of captchas generated by the one or more candidate HIP schemes that can be decoded by the one or more trained OCR engines.
8. The computing device of claim 6, wherein employing the information regarding the ability of the one or more trained OCR engines includes:
selecting a set of HIP schemes from the plurality of HIP schemes;
producing a combined HIP scheme based on at least two HIP schemes from the selected set of HIP schemes; and
generating the at least one output HIP scheme based on the combined HIP scheme.
9. The computing device of claim 6, wherein the operations further comprise:
determining, based on the applying of the one or more trained OCR engines to the one or more candidate HIP schemes, information regarding an ability of the one or more trained OCR engines to ascertain answers to captchas generated by the at least one output HIP scheme.
10. The computing device of claim 6, wherein the operations further comprise:
estimating a time to breakage of the at least one output HIP scheme based on the information regarding the ability of the one or more trained OCR engines to ascertain answers to captchas generated by the at least one output HIP scheme.
11. The computing device of claim 6, wherein employing the information regarding the ability of the one or more trained OCR engines includes:
selecting a starting set of HIP schemes from the plurality of HIP schemes based on measures of quality of individual HIP schemes of the plurality of HIP schemes;
producing a combined HIP scheme based on at least two HIP schemes from the selected starting set of HIP schemes, including:
combining aspects from each of the at least two HIP schemes from the starting selected set of HIP schemes into the combined HIP scheme;
generating an output HIP scheme based on the combined HIP scheme, including:
mutating the combined HIP scheme into the output HIP scheme; and
outputting, by a computing device, the output HIP scheme.
12. The computing device of claim 6, wherein employing the information regarding the ability of the one or more trained OCR engines includes:
combining aspects from each of at least two HIP schemes into a combined HIP scheme.
13. The computing device of claim 6, wherein employing the information regarding the ability of the one or more trained OCR engines further includes:
mutating the combined HIP scheme by changing a parameter of the combined HIP scheme.
14. The computing device of claim 6, wherein the HIP schemes of the plurality of HIP schemes are in a HIP specification language.
15. The computing device of claim 6, wherein the at least one output HIP scheme defines:
an alphabet from which multiple symbols are to be selected as answers to output captchas;
multiple complications that are selectable for use in generation of the output captchas; and
multiple values that define extents to which respective complications of the multiple complications are to be applied to the symbols of the alphabet during generation of the output captchas.
16. The computing device of claim 15, wherein the multiple complications include a distracter, a background, and/or a distortion.
17. A computer-readable storage medium, comprising a memory and/or a disk, that stores computer-executable instructions that facilitate generation of human interaction proof (HIP) schemes, wherein the computer-executable instructions, in response to execution by a computing device, cause the computing device to perform operations, the operations comprising:
determining, by one or more OCR engines trained on captchas generated by an input HIP scheme and on answers to the captchas, answers to captchas generated by a plurality of candidate HIP schemes;
determining an ability of the one or more OCR engines to solve the captchas generated by the plurality of candidate HIP schemes; and
producing an output HIP scheme from aspects of each of at least two of the plurality of candidate HIP schemes according to the determined ability of the one or more OCR engines to solve the captchas generated by the plurality of candidate HIP schemes.
18. The computer-readable storage medium of claim 17, wherein the aspects include:
a background for at least some symbols;
an amount of skew for at least some of the symbols;
an amount of blurring for at least some of the symbols; and
an amount of warping for at least some of the symbols.
19. The computer-readable storage medium of claim 17, wherein the operations further comprise:
adding and/or dropping a feature from the output HIP scheme, wherein the feature includes a distracter, a background, and/or a distortion.
20. The computer-readable storage medium of claim 17, wherein the operations further comprise:
estimating a time to breakage of the output HIP scheme according to the determined ability of the one or more OCR engines to solve the captchas generated by the plurality of candidate HIP schemes.
US14/624,936 2010-06-22 2015-02-18 Automatic construction of human interaction proof engines Abandoned US20150161365A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/624,936 US20150161365A1 (en) 2010-06-22 2015-02-18 Automatic construction of human interaction proof engines

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US12/821,124 US8739276B2 (en) 2010-06-22 2010-06-22 Automatic construction of human interaction proof engines
US14/281,316 US8978144B2 (en) 2010-06-22 2014-05-19 Automatic construction of human interaction proof engines
US14/624,936 US20150161365A1 (en) 2010-06-22 2015-02-18 Automatic construction of human interaction proof engines

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US14/281,316 Continuation US8978144B2 (en) 2010-06-22 2014-05-19 Automatic construction of human interaction proof engines

Publications (1)

Publication Number Publication Date
US20150161365A1 true US20150161365A1 (en) 2015-06-11

Family

ID=45329879

Family Applications (3)

Application Number Title Priority Date Filing Date
US12/821,124 Active 2031-04-14 US8739276B2 (en) 2010-06-22 2010-06-22 Automatic construction of human interaction proof engines
US14/281,316 Active US8978144B2 (en) 2010-06-22 2014-05-19 Automatic construction of human interaction proof engines
US14/624,936 Abandoned US20150161365A1 (en) 2010-06-22 2015-02-18 Automatic construction of human interaction proof engines

Family Applications Before (2)

Application Number Title Priority Date Filing Date
US12/821,124 Active 2031-04-14 US8739276B2 (en) 2010-06-22 2010-06-22 Automatic construction of human interaction proof engines
US14/281,316 Active US8978144B2 (en) 2010-06-22 2014-05-19 Automatic construction of human interaction proof engines

Country Status (4)

Country Link
US (3) US8739276B2 (en)
EP (1) EP2585971B1 (en)
CN (1) CN102947837B (en)
WO (1) WO2011163098A2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9465928B2 (en) * 2014-12-31 2016-10-11 Verizon Patent And Licensing Inc. No-CAPTCHA CAPTCHA
US20170161477A1 (en) * 2015-12-03 2017-06-08 Google Inc. Image Based CAPTCHA Challenges
US10496809B1 (en) 2019-07-09 2019-12-03 Capital One Services, Llc Generating a challenge-response for authentication using relations among objects
US10614207B1 (en) * 2019-07-09 2020-04-07 Capital One Services, Llc Generating captcha images using variations of the same object

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8885931B2 (en) * 2011-01-26 2014-11-11 Microsoft Corporation Mitigating use of machine solvable HIPs
US8776173B2 (en) * 2011-03-24 2014-07-08 AYAH, Inc. Method for generating a human likeness score
US8621564B2 (en) * 2011-06-03 2013-12-31 Ebay, Inc. Focus-based challenge-response authentication
CN102710635A (en) * 2012-05-30 2012-10-03 无锡德思普科技有限公司 Verification method based on cyclic steady and dynamic-state verification code pictures
US20130339245A1 (en) * 2012-06-13 2013-12-19 Sri International Method for Performing Transaction Authorization to an Online System from an Untrusted Computer System
CN103731403B (en) * 2012-10-12 2017-06-23 阿里巴巴集团控股有限公司 Verification code generation system and method
US9679124B2 (en) * 2014-09-05 2017-06-13 Disney Enterprises, Inc. Smart CAPTCHAs
US9762597B2 (en) * 2015-08-26 2017-09-12 International Business Machines Corporation Method and system to detect and interrupt a robot data aggregator ability to access a website
US9710637B2 (en) * 2015-08-28 2017-07-18 Salesforce.Com, Inc. Unicode-based image generation and testing
US9710638B2 (en) * 2015-08-28 2017-07-18 Salesforce.Com, Inc. Unicode-based image generation and testing
CN105763319A (en) * 2016-02-02 2016-07-13 南京云创大数据科技股份有限公司 Random multi-state verification code generation method
CN106204559B (en) * 2016-06-30 2019-03-12 北京奇艺世纪科技有限公司 Image processing method and device
CN106355072B (en) * 2016-08-19 2019-02-22 沈建国 Implementation method and device for a three-dimensional model verification code
CN107844696B (en) * 2016-09-20 2021-07-27 腾讯科技(深圳)有限公司 Verification code interference method and server
WO2018226740A2 (en) * 2017-06-05 2018-12-13 Balanced Media Technology, LLC Platform for collaborative processing of computing tasks
US11074340B2 (en) * 2019-11-06 2021-07-27 Capital One Services, Llc Systems and methods for distorting CAPTCHA images with generative adversarial networks

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050229251A1 (en) * 2004-03-31 2005-10-13 Chellapilla Kumar H High performance content alteration architecture and techniques
US7624277B1 (en) * 2003-02-25 2009-11-24 Microsoft Corporation Content alteration for prevention of unauthorized scripts
US20090319270A1 (en) * 2008-06-23 2009-12-24 John Nicholas Gross CAPTCHA Using Challenges Optimized for Distinguishing Between Humans and Machines
US20100077209A1 (en) * 2008-09-24 2010-03-25 Yahoo! Inc Generating hard instances of captchas
US20100077210A1 (en) * 2008-09-24 2010-03-25 Yahoo! Inc Captcha image generation
US20100302255A1 (en) * 2009-05-26 2010-12-02 Dynamic Representation Systems, LLC-Part VII Method and system for generating a contextual segmentation challenge for an automated agent
US20100306055A1 (en) * 2009-05-26 2010-12-02 Knowledge Probe, Inc. Compelled user interaction with advertisement with dynamically generated challenge
US8483518B2 (en) * 2010-02-19 2013-07-09 Microsoft Corporation Image-based CAPTCHA exploiting context in object recognition
US20140181960A1 (en) * 2007-01-23 2014-06-26 Carnegie Mellon University Methods and apparatuses for controlling access to computer systems and for annotating media files

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AUPR824401A0 (en) * 2001-10-15 2001-11-08 Silverbrook Research Pty. Ltd. Methods and systems (npw002)
US7725395B2 (en) 2003-09-19 2010-05-25 Microsoft Corp. System and method for devising a human interactive proof that determines whether a remote client is a human or a computer program
US7533411B2 (en) * 2003-09-23 2009-05-12 Microsoft Corporation Order-based human interactive proofs (HIPs) and automatic difficulty rating of HIPs
US7523499B2 (en) 2004-03-25 2009-04-21 Microsoft Corporation Security attack detection and defense
US7533419B2 (en) 2004-10-29 2009-05-12 Microsoft Corporation Human interactive proof service
US7552467B2 (en) 2006-04-24 2009-06-23 Jeffrey Dean Lindsay Security systems for protecting an asset
US8601538B2 (en) * 2006-08-22 2013-12-03 Fuji Xerox Co., Ltd. Motion and interaction based CAPTCHA
US20080209223A1 (en) * 2007-02-27 2008-08-28 Ebay Inc. Transactional visual challenge image for user verification
US20090150983A1 (en) * 2007-08-27 2009-06-11 Infosys Technologies Limited System and method for monitoring human interaction
US8104070B2 (en) 2007-09-17 2012-01-24 Microsoft Corporation Interest aligned manual image categorization for human interactive proofs
US8217800B2 (en) * 2009-02-06 2012-07-10 Research In Motion Limited Motion-based disabling of messaging on a wireless communications device
US20110197268A1 (en) * 2010-02-05 2011-08-11 Yahoo! Inc. Captchas that include overlapped characters, projections on virtual 3d surfaces, and/or virtual 3d objects
US8990959B2 (en) * 2010-05-28 2015-03-24 Microsoft Corporation Manipulable human interactive proofs

Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7624277B1 (en) * 2003-02-25 2009-11-24 Microsoft Corporation Content alteration for prevention of unauthorized scripts
US20050229251A1 (en) * 2004-03-31 2005-10-13 Chellapilla Kumar H High performance content alteration architecture and techniques
US20050246775A1 (en) * 2004-03-31 2005-11-03 Microsoft Corporation Segmentation based content alteration techniques
US7505946B2 (en) * 2004-03-31 2009-03-17 Microsoft Corporation High performance content alteration architecture and techniques
US20140181960A1 (en) * 2007-01-23 2014-06-26 Carnegie Mellon University Methods and apparatuses for controlling access to computer systems and for annotating media files
US20130132093A1 (en) * 2008-06-23 2013-05-23 John Nicholas And Kristin Gross Trust U/A/D April 13, 2010 System and Method for Generating Challenge Items for CAPTCHAs
US20090319271A1 (en) * 2008-06-23 2009-12-24 John Nicholas Gross System and Method for Generating Challenge Items for CAPTCHAs
US8380503B2 (en) * 2008-06-23 2013-02-19 John Nicholas and Kristin Gross Trust System and method for generating challenge items for CAPTCHAs
US20090319274A1 (en) * 2008-06-23 2009-12-24 John Nicholas Gross System and Method for Verifying Origin of Input Through Spoken Language Analysis
US8489399B2 (en) * 2008-06-23 2013-07-16 John Nicholas and Kristin Gross Trust System and method for verifying origin of input through spoken language analysis
US8494854B2 (en) * 2008-06-23 2013-07-23 John Nicholas and Kristin Gross CAPTCHA using challenges optimized for distinguishing between humans and machines
US8744850B2 (en) * 2008-06-23 2014-06-03 John Nicholas and Kristin Gross System and method for generating challenge items for CAPTCHAs
US20090319270A1 (en) * 2008-06-23 2009-12-24 John Nicholas Gross CAPTCHA Using Challenges Optimized for Distinguishing Between Humans and Machines
US20100077209A1 (en) * 2008-09-24 2010-03-25 Yahoo! Inc Generating hard instances of captchas
US20100077210A1 (en) * 2008-09-24 2010-03-25 Yahoo! Inc Captcha image generation
US20100302255A1 (en) * 2009-05-26 2010-12-02 Dynamic Representation Systems, LLC-Part VII Method and system for generating a contextual segmentation challenge for an automated agent
US20100306055A1 (en) * 2009-05-26 2010-12-02 Knowledge Probe, Inc. Compelled user interaction with advertisement with dynamically generated challenge
US8483518B2 (en) * 2010-02-19 2013-07-09 Microsoft Corporation Image-based CAPTCHA exploiting context in object recognition

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9465928B2 (en) * 2014-12-31 2016-10-11 Verizon Patent And Licensing Inc. No-CAPTCHA CAPTCHA
US20170161477A1 (en) * 2015-12-03 2017-06-08 Google Inc. Image Based CAPTCHA Challenges
US9760700B2 (en) * 2015-12-03 2017-09-12 Google Inc. Image based CAPTCHA challenges
KR20180041699A (en) * 2015-12-03 2018-04-24 구글 엘엘씨 Image-based CAPTCHA task
CN108351932A (en) * 2015-12-03 2018-07-31 谷歌有限责任公司 CAPTCHA challenges based on image
US10042992B2 (en) * 2015-12-03 2018-08-07 Google Llc Image based CAPTCHA challenges
KR102043938B1 (en) * 2015-12-03 2019-11-12 구글 엘엘씨 Image-based CAPTCHA challenge
KR20200090963A (en) * 2015-12-03 2020-07-29 구글 엘엘씨 Image based captcha challenges
KR102504077B1 (en) * 2015-12-03 2023-02-27 구글 엘엘씨 Image based captcha challenges
US10496809B1 (en) 2019-07-09 2019-12-03 Capital One Services, Llc Generating a challenge-response for authentication using relations among objects
US10614207B1 (en) * 2019-07-09 2020-04-07 Capital One Services, Llc Generating captcha images using variations of the same object
US10949525B2 (en) 2019-07-09 2021-03-16 Capital One Services, Llc Generating a challenge-response for authentication using relations among objects

Also Published As

Publication number Publication date
WO2011163098A3 (en) 2012-02-23
US8739276B2 (en) 2014-05-27
CN102947837A (en) 2013-02-27
WO2011163098A2 (en) 2011-12-29
EP2585971B1 (en) 2022-03-23
US20110314537A1 (en) 2011-12-22
US8978144B2 (en) 2015-03-10
EP2585971A4 (en) 2018-01-17
EP2585971A2 (en) 2013-05-01
CN102947837B (en) 2016-03-02
US20140259104A1 (en) 2014-09-11

Similar Documents

Publication Publication Date Title
US8978144B2 (en) Automatic construction of human interaction proof engines
Nightingale et al. Can people identify original and manipulated photos of real-world scenes?
US10769487B2 (en) Method and device for extracting information from pie chart
US20200195667A1 (en) Url attack detection method and apparatus, and electronic device
CN109918892B (en) Verification code generation method and device, storage medium and computer equipment
Rojas et al. Sampling techniques to improve big data exploration
US7925062B2 (en) Image processing apparatus, image processing method, signature registration program, and storage medium
US8495518B2 (en) Contextual abnormality CAPTCHAs
EP2410450A1 (en) Method for providing a challenge based on a content
CN104200150B (en) Method and device for processing verification codes
US20210295114A1 (en) Method and apparatus for extracting structured data from image, and device
US20110283346A1 (en) Overlay human interactive proof system and techniques
US11900494B2 (en) Method and apparatus for adaptive security guidance
US11321524B1 (en) Systems and methods for testing content developed for access via a network
JP2019028094A (en) Character generation device, program and character output device
Parish et al. A study on priming methods for graphical passwords
JP4607943B2 (en) Security level evaluation apparatus and security level evaluation program
JP6168645B2 (en) Reverse Turing test method and access authentication method
Fu et al. Face morphing attacks and face image quality: The effect of morphing and the unsupervised attack detection by quality
Chu et al. Automated GUI testing for android news applications
Roshanbin Interweaving unicode, color, and human interactions to enhance CAPTCHA security
Alshehri et al. Using image saliency and regions of interest to encourage stronger graphical passwords
Hernández-Castro et al. BASECASS: A methodology for CAPTCHAs security assurance
Lago On the preservation of media trustworthiness in the social media era
Zelinsky Studies of Monte Carlo Methodology for Assessing Convergence, Incorporating Decision Making, and Manipulating Continuous Variables

Legal Events

Date Code Title Description
AS Assignment

Owner name: MICROSOFT CORPORATION, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HULTEN, GEOFFREY J.;SIMARD, PATRICE Y.;KIROVSKI, DARKO;AND OTHERS;SIGNING DATES FROM 20100612 TO 20100621;REEL/FRAME:035032/0612

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:035032/0902

Effective date: 20141014

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION