US20180143896A1 - Testing applications with a defined input format - Google Patents

Testing applications with a defined input format

Info

Publication number
US20180143896A1
Authority
US
United States
Prior art keywords
value
text element
document
score
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US15/360,554
Other versions
US9977729B1 (en
Inventor
Franjo Ivancic
Laszlo Szekeres
Domagoj Babic
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google LLC
Priority to US15/360,554 (US9977729B1)
Assigned to GOOGLE INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: BABIC, DOMAGOJ, IVANCIC, FRANJO, SZEKERES, LASZLO
Priority to DE102017121649.9A (DE102017121649A1)
Priority to DE202017105654.6U (DE202017105654U1)
Priority to GB2015227.8A (GB2590140B)
Priority to GB1715225.7A (GB2556978B)
Priority to CN202110308048.9A (CN113110988B)
Priority to CN201710892789.XA (CN108089974B)
Priority to PCT/US2017/054024 (WO2018097895A1)
Assigned to GOOGLE LLC. CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: GOOGLE INC.
Priority to US15/901,349 (US10095610B2)
Publication of US9977729B1
Application granted
Publication of US20180143896A1
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G06F11/3612 Software analysis for verifying properties of programs by runtime analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3684 Test management for test design, e.g. generating new test cases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites
    • G06F17/2247
    • G06F17/24
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/14 Tree-structured documents
    • G06F40/143 Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0445
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • Fuzz testing provides a technique for testing computer programs with the use of randomized input.
  • fuzz-based testing techniques may be used to generate and modify test inputs, including file documents, that conform with a defined text format such as HyperText Markup Language (HTML), Portable Document Format (PDF) or Cascading Style Sheet (CSS) language.
  • HTML HyperText Markup Language
  • PDF Portable Document Format
  • CSS Cascading Style Sheet
  • Certain generation-based fuzz techniques may randomly generate or change test documents based on a manually-specified grammar.
  • the requirements of a defined format may be written as a set of computer instructions that generate or change a sequence of random values such that the sequence remains fully consistent with the format.
  • Complicated formats may make it difficult and cumbersome to create computer instructions that fully implement the grammar, e.g., are capable of iterating through all of the requirements or iterating through the requirements in unexpected ways.
  • small changes to the requirements of the defined format may require substantial changes to the computer instructions.
  • a mutation-based fuzz technique may involve: selecting a document that conforms with a defined text format; mutating (e.g., modifying) the selected document by randomly changing characters (e.g., by bit flipping or byte incrementing), deleting characters, adding characters, or swapping strings of characters; processing the document using the application being tested; scoring the document based on its coverage (e.g., the identity of routines and the number of unique lines of code that were executed in the application as a result of processing the document); and using the score as a fitness function in a genetic algorithm or the like to determine whether the document should be further mutated and scored.
  • Documents that result in crashes or allow potentially malicious actions may also be selected for additional mutation and testing.
  • While mutation-based fuzz techniques are effective for certain formats such as media formats, they may be less effective than generation-based fuzz techniques when used in connection with complicated text formats.
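  • The mutation-based loop described above can be sketched roughly as follows. This is a minimal illustration, not the method of the disclosure: `toy_target` is a hypothetical stand-in for the application under test, its branch labels are invented, and keeping any mutant that reaches new coverage is a simplification of using the coverage score as a fitness function.

```python
import random


def mutate(doc: bytes, rng: random.Random) -> bytes:
    """Apply one random mutation: flip a bit, delete a byte, or insert a byte."""
    data = bytearray(doc)
    choice = rng.choice(["flip", "delete", "insert"])
    if choice == "flip" and data:
        i = rng.randrange(len(data))
        data[i] ^= 1 << rng.randrange(8)
    elif choice == "delete" and data:
        del data[rng.randrange(len(data))]
    else:
        # Also used as the fallback when the document is empty.
        data.insert(rng.randrange(len(data) + 1), rng.randrange(256))
    return bytes(data)


def toy_target(doc: bytes) -> set:
    """Hypothetical application under test; returns the branch labels it executed."""
    covered = {"start"}
    if doc.startswith(b"<"):
        covered.add("tag")
        if b">" in doc:
            covered.add("tag_closed")
    if b"href" in doc:
        covered.add("link")
    return covered


def fuzz(seed_doc: bytes, rounds: int, seed: int = 0):
    """Keep mutants that reach coverage not seen before (assumed fitness rule)."""
    rng = random.Random(seed)
    population = [seed_doc]
    seen = set(toy_target(seed_doc))
    for _ in range(rounds):
        child = mutate(rng.choice(population), rng)
        coverage = toy_target(child)
        if not coverage <= seen:  # the child executed something new
            seen |= coverage
            population.append(child)
    return population, seen


if __name__ == "__main__":
    docs, covered = fuzz(b'<a hre="x">link</a>', rounds=2000)
    print(len(docs), sorted(covered))
```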
  • One aspect of the technology relates to a method that includes: receiving a sequence of values of text elements; determining, with one or more computing devices, a score for a text element value of the sequence, where the score is related to the probability of a particular text element value equaling one or more given values, and where said probability is based on sequences of text element values that are consistent with a defined format; comparing, with the one or more computing devices, the score to a threshold; when the score is below the threshold, modifying, with the one or more computing devices, the value of the text element to form a modified sequence of text element values; processing, with the one or more computing devices, the modified sequence of text element values with a set of instructions; and testing, with the one or more computing devices, a performance characteristic of the set of instructions when the set of instructions process the modified sequence of text element values.
  • Another aspect of the technology relates to a system that includes one or more computing devices and a memory storing instructions executable by the one or more computing devices, where the instructions include: receiving an initial sequence of text elements having values; determining a first score for the value of a first text element of the initial sequence, wherein determining a score with respect to the value of a particular text element in a particular sequence of text elements is related to how frequently the value of the particular text element follows same or similar sequences of text element values that are consistent with a defined format; determining a second score for the value of a second text element of the initial sequence, wherein there is a third text element between the first and second text elements in the initial sequence; comparing the first and second scores to a threshold; when the first and second scores are above the threshold, generating a second sequence of text elements having values, where the value of a first text element in the second sequence equals the value of the first text element in the initial sequence, the value of a second text element in the second sequence equals the value of the second text element in the initial
  • Yet another aspect of the technology relates to a system of one or more computing devices and a memory storing instructions executable by the one or more computing devices, where the instructions include: receiving a document containing a sequence of text characters; determining a score for each of a plurality of characters of the document, wherein the score of a character is determined based on the value of the character, the value of one or more preceding characters in the document, and a machine learning component trained with sequences of characters conforming with the defined format; when the score of a character is below a threshold, associating the character with a set of characters eligible for modification; modifying at least one of the characters in the set of characters; and after modifying at least one of the characters in the set of characters, measuring the performance of an application as the application processes the document.
  • FIG. 1 is a functional diagram of a system in accordance with aspects of the disclosure.
  • FIG. 2 is a flow diagram in accordance with aspects of the disclosure.
  • FIG. 3 is a flow diagram of the training of a neural network in accordance with aspects of the disclosure.
  • FIG. 4 is an example of a test document.
  • FIG. 5 is an example of likelihood values of text elements.
  • FIG. 6 is an example of a test document.
  • FIG. 7 is an example of a test document modified in accordance with aspects of the disclosure.
  • FIG. 8 is an example of performance characteristics of an application.
  • FIG. 9 is an example of performance characteristics of an application.
  • FIG. 10 is a flow diagram in accordance with aspects of the disclosure.
  • the technology relates to generating a sequence of text element values that may be used to measure the performance of instructions for a computing device.
  • a system 100 of one or more computing devices 110 and 120 may be used to identify documents that are consistent with a defined format (block 210), train a recurrent neural network with the documents (block 220), use the network to identify the predictability of text elements in a test document (block 230), modify the test document based on the predictability of the text elements (block 240), test an application with the test document (block 250) and, depending on the results of the test, repeat the process of identifying predictability, modifying the test document and testing the application.
  • the system may identify documents that are consistent with a particular defined format and use those documents to train a machine learning component. For instance, the system may train neural network 340 with a corpus 330 of documents 320 that were retrieved from web servers 310-12.
  • the system may use the machine learning component to identify the predictability of text elements within a test document. For instance, neural network 340 may return score values that indicate the predictability of the characters within portion 611 (e.g., string 410) is relatively high, the predictability of the first character in portion 610 is relatively moderate, and the predictability of the characters in portion 630 is relatively low. As a result, the system may assign relatively high scores to portions 610 and 630 and relatively low scores to portion 611.
  • the system may modify the identified portions based on the text element scores returned by the neural network. For example and as shown in FIG. 7, one or more of the characters in portions 610 and 630 may be changed, deleted or added.
  • modified document 700 may be provided to a browser application for testing and the system may measure various characteristics of the browser's performance, such as how many of the browser's instructions were executed or failed as a result.
  • the system may determine a test score based on the performance and, as shown in FIG. 9, the modified document may be analyzed, modified, tested and scored again. If the test score increases, the document may be continuously analyzed, modified, tested and scored until its test score does not increase.
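  • The repeat-until-no-improvement behavior described above might be sketched as a simple loop. The function names `refine`, `modify` and `test_score` are hypothetical; in a real setup `modify` would perform the score-guided modification and `test_score` would run the application and measure its performance.

```python
from typing import Callable


def refine(document: str,
           modify: Callable[[str], str],
           test_score: Callable[[str], float]) -> str:
    """Modify and re-test a document for as long as its test score keeps increasing."""
    best, best_score = document, test_score(document)
    while True:
        candidate = modify(best)
        score = test_score(candidate)
        if score <= best_score:
            # The score did not increase, so stop and keep the previous document.
            return best
        best, best_score = candidate, score


if __name__ == "__main__":
    # Deterministic stand-ins, chosen only to make the loop observable.
    result = refine("ab", lambda d: d + "x", lambda d: min(len(d), 5))
    print(result)
```

  • With the stand-ins shown, the score stops increasing once the document reaches five characters, so the loop terminates there.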
  • FIG. 1 provides the example of system 100 , which includes computing devices 110 and 120 .
  • the computing devices are configured to accept information, perform operations based on that information, and take an action or provide additional information in response.
  • the computing devices may be, or include, a processor that is capable of receiving one or more electrical signals representing information expressed as a numerical value as input, determine a numerical value based on the input in accordance with instructions, and provide one or more electrical signals that represent the determined numerical value as output.
  • Device 110 includes processor 111, which may be a commercially available central processing unit (CPU), application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA).
  • CPU central processing unit
  • ASIC application-specific integrated circuit
  • FPGA field-programmable gate array
  • the instructions used by a computing device include any set of one or more instructions that are accessed and executed by the computing device.
  • device 110 stores values representing instructions 113 and processor 111 is able to access those values and perform, or cause other components of device 110 or system 100 to perform, operations associated with those instructions.
  • instructions 113 of device 110 may include machine code (e.g., machine code stored in object code) that is capable of being executed directly by processor 111.
  • instructions 113 may be stored in a format that requires additional processing before execution, such as a script or collection of independent source code modules that are interpreted on demand.
  • An operation expressed as a single instruction in one format may correspond with multiple instructions in another format, e.g., executing a single command in script may require the execution of multiple machine code instructions.
  • the instructions may include instructions that run in, above, or below the operating system layer. For instance, some of the operations described herein may involve the execution of instructions provided by the Chrome or Android operating systems provided by Google, the Windows operating system provided by Microsoft, or the macOS, OS X or iOS operating systems provided by Apple.
  • the instructions may be stored in a memory.
  • instructions 113 are stored in memory 112 .
  • the memory may be any component that is capable of storing information on a non-transitory storage medium that can be read by a computing device, e.g., registers provided on the same substrate as processor 111 , volatile memory such as RAM (random-access memory), non-volatile memory such as flash memory, e.g. a Secure Digital (SD) card, a hard-disk drive, a solid-state drive, optical storage, or tape backups.
  • Device 110 , processor 111 and memory 112 are configured so that processor 111 can read, modify, delete and add values stored in memory 112 .
  • Memory may be configured to provide less access than the example of memory 112, e.g., memory may be read-only.
  • Memory may store information that is used by, or results from, the operations performed by the computing device.
  • memory 112 stores data 114 , which includes values that are retrieved or stored by processor 111 in accordance with instructions 113 , such as information that is required or determined by device 110 when performing some of the operations described herein. Values stored in memory 112 may be stored in accordance with one or more data structures.
  • a value stored in memory 112 may represent a single numeric value (e.g., a binary number, an integer, a floating point number, a Unicode value representing a single character of text, digit or punctuation mark, or a value representing a single machine code instruction), a set of multiple numeric values (e.g., an array of numbers, a string of text characters, XML-formatted data, or a file), or information from which values to be processed in accordance with instructions 113 may be obtained (e.g., a reference to a value stored at a remote location or a parameter of a function from which the required value is calculated).
  • a computing device may include components for receiving information from the physical environment surrounding the device and allowing direct user input to the computing device. Similar to device 110, device 120 includes a processor 111, memory 112, instructions 113 and data 114. Device 120 also includes components that detect information relating to the physical environment in which the component is disposed, and this information may include information provided by user 150. Device 120 includes a user input component 125 having circuitry and other components configured to receive input from user 150, such as information provided tactilely (e.g., a mouse, keyboard, keypad, button or touchscreen). User input components may perform functions that are not primarily directed to user input.
  • camera 127 may be used to capture user commands (e.g., hand gestures) and other visual information (e.g., the visual characteristics of a mountain).
  • Microphone 126 may be used to capture user commands (e.g., verbal commands) and other audio information (e.g., the sound of a waterfall).
  • a computing device may include components for providing information via the physical environment surrounding the device and providing output directly to users.
  • a component may include circuitry that outputs visual, audio or tactile information to users of the device, such as display 130 (e.g., a computer monitor, a touch-screen, a projector or another component that is operable to change a visual characteristic in response to a signal), speaker 128 , or motor 126 to vibrate the device.
  • a computing device may include one or more components for communicating with other computing devices.
  • devices 110 and 120 include circuitry (e.g., a network interface) connecting each device to a different node of communication network 190 .
  • Network 190 may be composed of multiple networks using different communication protocols. For instance, when device 110 transmits information to device 120 , the information may be sent over one or more of the Internet (e.g., via core Internet routers in accordance with the Transmission Control Protocol (TCP) and Internet Protocol (IP)), a cellular network (e.g., in accordance with the LTE (Long-Term Evolution) standard), a local network (e.g., an Ethernet or Wi-Fi network), and a Bluetooth connection.
  • TCP Transmission Control Protocol
  • IP Internet Protocol
  • a device may provide information to a user via other devices, e.g., device 110 may display information to user 150 by sending the information over network 190 to device 120 for display on display 130 .
  • a computing device may also provide information to another computing device without the use of a network.
  • one computing device may output information with a display and another computing device may detect that information with a camera.
  • In addition to the devices shown in FIG. 1, the system may include a large number of computing devices that are connected to the network at a large number of nodes.
  • Although FIG. 1 shows computing devices 110 and 120 as individual blocks, each of which contains its own processor and memory, the operations described herein may involve a single computing device or many computing devices, e.g., in the “cloud”.
  • For instance, various operations described below as involving a single computing device (e.g., a single central processing unit (CPU) in a single server) may involve a plurality of computing devices (e.g., multiple processors in a load-balanced server farm).
  • memory components at different locations may store different portions of instructions 113 and collectively form a medium for storing the instructions.
  • operations described as involving a plurality of computing devices may be performed by a single computing device, e.g., rather than sending data to device 110 for processing, device 120 may process the data itself.
  • device 120 may function as a thin client wherein device 110 performs all or nearly all operations that are not directly related to receiving and providing information to users via user input component 125 and display 130 .
  • Various operations described herein as being performed by a computing device may be performed by a virtual machine.
  • instructions 113 may be specific to a Windows server, but the relevant operations may be performed by a Linux server running a hypervisor that emulates a Windows server.
  • the operations may also be performed by a container, e.g., a computing environment that does not rely on an operating system tied to specific types of hardware.
  • device 110 is a server and devices 120-21 are client devices.
  • device 110 may be a server and device 120 may be a desktop (notebook) computer system, e.g., processor 121 and memory 122 may be contained in a desktop personal computer, display 130 may be an external monitor connected to the personal computer by a cable, and user input component 125 may be an external keyboard that communicates with the computer via Bluetooth.
  • device 120 may be a wireless phone with a touchscreen that functions as both display 130 and user input component 125 .
  • client devices may include, by way of example, laptops, notebooks, netbooks, tablets, set-top boxes (e.g., a cable-television set-top box connected to a television) and wearable devices (e.g., a smartwatch).
  • a computing device may include other components that are typically present in such devices or general purpose computers but are not expressly described herein.
  • the system may also store documents having one or more text elements, e.g., a single character, a token (e.g., a sequence of characters between delimiters within a document such as a word) or a sequence of characters of a given length.
  • the text elements may have values, e.g., the value of a character may be “a” or “*”.
  • the text element may be stored within a document as a sequence of values that conform with a defined format, e.g., one or more requirements regarding how text elements relating to certain types of information should be stored within a document.
  • the requirements may be considered the grammar of the language.
  • the requirements may relate to, by way of example only, the permissible text element values (e.g., UNICODE or ASCII), where certain types of information need to be stored relative to the beginning of the document (e.g., documents that comply with HTML version 5 start with “<!DOCTYPE html>”), reserved keywords, and where certain types of information are stored relative to other types of information (e.g., the destination URL of a link on a webpage follows the keyword “href”). While most of the examples below focus on HTML for ease of illustration, the system and operations described herein apply to other defined formats, including proprietary standards.
  • the system may identify preexisting samples of information arranged in accordance with a selected defined format. For instance and as shown in FIG. 1, device 110 may use network 190 to access documents that are stored at sources 191 and are consistent with the defined format. By way of example, as shown in FIG. 3, the system may retrieve a plurality of web pages formatted in accordance with HTML from a plurality of web servers 310-12. The web pages 320 collectively form a corpus 330 of HTML documents.
  • the system may train a machine learning component with the samples of the defined format.
  • device 110 may store instructions associated with training a neural network, traversing the neural network, extracting data stored in connection with the neural network, and generating sequences of data values based on the neural network.
  • neural network 340 may be a recurrent network such as a Long Short-Term Memory (LSTM) neural network, and device 110 may train the neural network with corpus 330.
  • LSTM Long Short-Term Memory
  • the weights, biases and other aspects stored in the data structures of the machine learning component may represent a model of the defined format, wherein the model is not a set of grammatical rules (such as might be present in a generation-based fuzzing technique) but rather a probabilistic model.
  • system 100 may use neural network 340 to determine a score (e.g., a numeric value) for a text element that is related to how frequently the value of the particular text element followed the same or similar sequences of text elements in the corpus 330 of HTML documents that were used to train the neural network.
  • the neural network may model the defined format by providing scores that are related to the likelihood that the value of a particular text element within a sequence of text elements values will equal a specific value when the sequence conforms with the defined format.
  • the likelihood may be a function of the values of the other text elements in the sequence and the proximity of the other values to the particular text element such as, in the case of a recurrent network, the values of a given number of text elements that precede the particular text element.
  • model 350 may indicate (and the scores returned by the neural network may reflect) that the likelihood of “e” following “head” is 33% (e.g., as in “<header>”), the likelihood of “>” is 22% (e.g., as in “<head>”), the likelihood of a space is 11% (e.g., when the word “head” is used in a sentence), the likelihood of “a” is 6% (e.g., as in “headache”), etc.
  • the example percentage distributions of the example strings and characters discussed herein have been selected for ease of illustration, e.g., they ignore issues such as case sensitivity.
  • a large corpus of publicly-accessible HTML documents may yield different probabilities and distributions than those set forth herein.
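  • As a rough illustration of a probabilistic model of this kind, the sketch below estimates next-character likelihoods by counting how often each character follows a given context in a corpus. This is a deliberately simple frequency table, not the recurrent neural network described in the disclosure; the names `train` and `likelihood`, the three-document corpus and the five-character context length are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train(corpus, context_len=5):
    """Count, for each context of up to `context_len` characters, which character follows."""
    counts = defaultdict(Counter)
    for doc in corpus:
        for i, ch in enumerate(doc):
            context = doc[max(0, i - context_len):i]
            counts[context][ch] += 1
    return counts


def likelihood(counts, context, ch, context_len=5):
    """Estimated probability that `ch` follows `context`, based on the training counts."""
    followers = counts.get(context[-context_len:])
    if not followers:
        return 0.0  # context never seen in the corpus
    return followers[ch] / sum(followers.values())


if __name__ == "__main__":
    # Tiny hypothetical corpus; a real corpus would contain many HTML documents.
    model = train(["<head>", "<header>", "<head>"])
    for ch in (">", "e", "z"):
        print(repr(ch), likelihood(model, "<head", ch))
```

  • A count table cannot generalize to contexts it has never seen, which is one reason a trained network may be preferable in practice.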
  • the system may use the scores to identify portions of a test document to be modified. For instance, the scores provided by the neural network 340 may be used to determine whether a particular text element of a test document is eligible for mutation.
  • FIG. 4 provides an example of an HTML test document. Like nearly all, if not all, HTML documents, test document 400 starts with the string 410 (“<!DO”), which corresponds with the keywords “!DOCTYPE html”.
  • model 350 may indicate that the likelihood 510 of an HTML document starting with “<” is 100%, the likelihood 511 that the next character is “!” is 100%, and the likelihood 512 that the next character is “D” is also 100%.
  • neural network 340 may return a relatively high score for each character of the string “<!D” at the beginning of a document because the string has relatively low perplexity, e.g., each character is highly predictable in view of the characters before it. (For the purposes of discussing FIGS. 4 and 5, it is assumed that the neural network 340 is configured to score the next character based on up to five characters that immediately precede it.)
  • the character after the string “[space]lang” may be considered to have relatively moderate perplexity because model 350 indicates that there is a 60% chance it will be one of two values.
  • the likelihood of the next character being a double quote symbol is relatively high, e.g., 88%.
  • the likelihood 523 that the character after that would be “e” is significant.
  • FIG. 5 further illustrates the predictability of the character following the string 430 , e.g., “inter”.
  • the number of popular words that contain the string “inter” may be so numerous that many characters have a small yet relatively equal likelihood 530 of being next. Therefore, since the relative perplexity of the character following the string “inter” is relatively high, neural network 340 may return a relatively low score for that character.
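  • One way to put a number on the contrast between a highly predictable position and an unpredictable one is the information-theoretic perplexity of the next-character distribution, i.e., 2 raised to the Shannon entropy in bits. The disclosure uses "perplexity" informally, so treating it as this specific quantity is an assumption, and the two distributions below are invented for illustration.

```python
import math


def perplexity(distribution):
    """Return 2 ** H, where H is the Shannon entropy (in bits) of the distribution."""
    entropy = -sum(p * math.log2(p) for p in distribution.values() if p > 0)
    return 2 ** entropy


if __name__ == "__main__":
    # A position where only one character is ever expected.
    certain = {"!": 1.0}
    # A position where twenty characters are equally likely.
    flat = {chr(ord("a") + i): 0.05 for i in range(20)}
    print(perplexity(certain), perplexity(flat))
```

  • Under this definition a distribution spread evenly over N values has a perplexity of N, and a position with a single certain value has a perplexity of 1.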
  • the system may select text elements of a sequence for modification based on the scores returned by the neural network. Modifying keywords in a document may result in an application being unable to parse the document and may thus cause the application to cease processing the document altogether or crash; that may be helpful in some circumstances, but it may also result in many routines of the application going untested. As noted above, keywords tend to be associated with high scores. As a result and as shown in FIG. 6, the system may designate the characters within portions 611 and 612 (containing many keywords) as being sufficiently predictable to remain unmodified and the characters within portions 610 (containing one of many possible language codes) and 630 (containing text intended for display to users) as being sufficiently unpredictable to be modified.
  • the system may determine whether a text element is eligible or not for modification by comparing the text element's score to a threshold. For instance, device 110 may iterate through the values of the characters of test document 400 and, for each character, determine a score based on the neural network and compare that score to a modification eligibility threshold.
  • the modification eligibility threshold may correspond with a likelihood threshold.
  • the score returned by neural network 340 may exceed a corresponding modification eligibility threshold and, if so, the device 110 may designate the character as ineligible for modification.
  • modification eligibility thresholds associated with likelihood thresholds of 90% or greater may yield more interesting results.
  • the device 110 may have designated the character as eligible for modification.
  • the system may further designate all portions between portions of high predictability, such as portions 610 and 630 between portions 611 - 12 and portions 631 - 32 , respectively, as eligible for modification.
  • the score and modification eligibility threshold may also be based on factors that are not specific to the value of the text element. For instance, if many character values have a relatively uniform likelihood of following a particular string, the modification eligibility threshold may be lowered, e.g., system 100 may be more likely to designate the character as being eligible for modification.
  • the modification eligibility threshold may also be dynamically determined. For instance, the threshold may be a randomly determined number. By way of example, a character within document 400 may be identified as ineligible for modification when S>Random(0.00-1.00), where S is the score returned by neural network 340 for the character, and the scores and potential random values range between 0.00 to 1.00.
  • the system may designate a text element as being eligible for modification when S>1−Random(0.00-1.00).
  • the score may also be based on the number of potential values that are above a minimum threshold or the score of the most-likely value for the text element.
  • the portions of the document that were designated as eligible for modification may be randomly changed.
  • the system may generate a modified document 700 by indicating to a mutation-based fuzzing module that portions 610 and 630 are available for mutation, e.g., characters may be randomly changed, added or deleted.
  • the portions that are eligible for modification may also be selected for replacement by portions from the same or other documents. For instance and as shown in FIG. 7 , the system has swapped the positions of portions 640 and 650 . Portions of the test document may also be replaced with portions from other test documents. Moreover, the replacement portion does not have to be the same size as the replaced portion, particularly if the replacement is taken from another document. For example, increasingly larger replacement sequences may be added until a maximum length is reached or a relatively high perplexity point is reached.
  • the system may consider not only the preceding text elements but subsequent text elements as well.
  • FIG. 6 shows string “</head><body>” as a single block 631.
  • the model may indicate that the character “b” is relatively unpredictable since it may be highly likely that a keyword will follow “ead><” but the precise keyword itself may be hard to predict.
  • the model may indicate that the remaining characters of the block 631 are highly predictable because the tag “<body>” often follows “<head>”.
  • the system may check whether the lower-scored character and subsequent high-scored characters form a single keyword.
  • the system may measure one or more characteristics of a set of computer instructions' performance as they process the test sequences.
  • device 110 may load test document 700 into a browser 810 and generate a report 805 regarding how well the browser performed.
  • the report may include the name 811 of the document and information such as whether loading the document caused particular routines 820 - 23 (e.g., routines relating to rendering content, executing JavaScript, communicating information over a network and securing data) and third party plug-ins 830 - 32 to be called and, if so, whether any errors occurred.
  • the system may also calculate, and the report may also include, a test score 815 based on other performance characteristics such as the document's coverage (e.g., the number of different routines that were called, the total number of unique lines of code that were executed), processing speed (e.g., load times), CPU and memory usage, whether any interesting results were encountered (e.g., errors, security issues such as buffer overflow errors and decompression bombs, plug-in failure 832 , the application crashed, etc.), and any other measurable aspects relating to performance. Different events may be weighted differently when calculating a test score.
  • the performance characteristics may be used to determine whether further modifications should be made to the application or document. For instance, after the initial test, the document may be modified as described above and tested once again. As shown in FIG. 9 , the system may then generate a second report 905 , which indicates that an additional routine 921 of the browser and third-party plug-in 931 were called. The report also indicates that third-party plugin 930 , which processed the prior version of the document without difficulty, failed with a buffer overflow error that could, in some circumstances, create security issues for data handled by that plug-in. As a result, the calculated test score 915 of the document increased relative to the first report 805 .
  • the system may repeat the process of identifying portions of a document that are eligible for modification based on scores returned by the neural network, modifying the eligible portions of the document, and determining a test score for the document until the document's test score stops increasing.
  • the system may test a PDF reader, e.g., the functionality of a browser for displaying PDF documents, a stand-alone PDF application for displaying and editing PDF documents, etc.
  • the system may also be used to generate a completely new set of test documents in compliance with the model of the defined format. For instance, the system may create a new test document and randomly select the value of the text elements based on the likelihood of their occurrence as indicated by the model represented by the neural network.
  • model 350 may indicate that all HTML documents in the corpus begin with “<!DOCTYPE html”.
  • the probabilities may change, e.g., the model may indicate that 80% of the time the string “html” is followed by a space and 20% of the time it is followed by a “>”.
  • the system may randomly select a space or “>” by generating a random number between 0.00 and 1.00 and determining whether the number is greater or less than 0.80 and select a space or “>” accordingly as the next character of the generated document.
  • the remaining characters of the document may be similarly determined.
  • the randomly-generated test documents may then be modified and tested as described above.
  • the model may be periodically updated to reflect changes to requirements and features of the defined format.
  • neural network 340 may be periodically trained with recently created documents in order to keep model 350 up to date with changes to the defined format.
  • the cost and other resources required to train and maintain a neural network such as neural network 340 may be less than the cost and resources required to write and maintain computer programs that create or analyze documents based on a rigid set of grammatical rules that were set by the people or organization that defined the format.
  • the system may use other components to provide a score that is based on the relationship of a given sequence of text element values to sequences of text element values that conform with a defined format.
  • the machine learning component may be a support vector machine trained with N-grams copied from documents that comply with the defined format, or a hidden Markov model.
  • the text-element scoring component may include a statistical regression routine that uses a sequence containing the text-element as the dependent variable and sequences in the corpus as independent variables.
  • FIG. 10 provides a flowchart of a method that may be executed by one or more computing devices.
  • a sequence of values of text elements is received.
  • a score for a text element value of the sequence is determined, where the score with respect to a particular text element value is related to the probability that the particular text element value will equal one or more given values, and where the probability is determined based on sequences of text element values that are consistent with a defined format.
  • the score is compared to a threshold.
  • the value of the text element is modified to form a modified sequence of text element values.
  • the modified sequence is processed with a set of instructions.
  • a characteristic of the performance of the computer instructions is tested by processing the modified sequence with the computer instructions.
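  • By way of illustration only, the scoring, threshold comparison and modification steps listed above may be sketched as follows. The function names, the hard-coded scores (which stand in for values that a trained model would return) and the fixed threshold of 0.90 are assumptions made for this sketch and do not appear in the disclosure.

```python
import random


def select_eligible(scores, threshold):
    """Return the indices of text elements whose score is below the threshold."""
    return [i for i, s in enumerate(scores) if s < threshold]


def modify(sequence, eligible, alphabet, rng):
    """Replace each eligible character with a character drawn at random from alphabet."""
    chars = list(sequence)
    for i in eligible:
        chars[i] = rng.choice(alphabet)
    return "".join(chars)


# Hypothetical per-character scores; a real system would obtain them from the model.
sequence = "<html lang"
scores = [0.99, 0.95, 0.97, 0.96, 0.98, 0.40, 0.30, 0.35, 0.20, 0.25]

eligible = select_eligible(scores, threshold=0.90)
modified = modify(sequence, eligible, "abcdefghijklmnopqrstuvwxyz", random.Random(0))
```

  • In this sketch the highly predictable prefix “<html” is left untouched and only the lower-scored characters are replaced; the modified sequence would then be passed to the instructions under test.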

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

A system and method are provided for testing the performance of applications. By way of example only, the method may include training a neural network with documents containing text elements that are arranged in accordance with a defined format and using the neural network to determine the predictability of the value of individual text elements within a test document. When the neural network indicates that the value of a text element is unlikely, the value may be modified and the modified document may be used to test an application that processes documents in accordance with the defined format.

Description

    BACKGROUND
  • Fuzz testing provides a technique for testing computer programs with the use of randomized input. For instance, fuzz-based testing techniques may be used to generate and modify test inputs, including file documents, that conform with a defined text format such as HyperText Markup Language (HTML), Portable Document Format (PDF) or Cascading Style Sheet (CSS) language. When the document is provided to an application for processing, the application may be monitored for unexpected or undesirable behaviors, such as crashes or exposing data to unauthorized access.
  • Certain generation-based fuzz techniques may randomly generate or change test documents based on a manually-specified grammar. For example, the requirements of a defined format may be written as a set of computer instructions that generate or change a sequence of random values such that the sequence remains fully consistent with the format. Complicated formats may make it difficult and cumbersome to create computer instructions that fully implement the grammar, e.g., are capable of iterating through all of the requirements or iterating through the requirements in unexpected ways. Moreover, small changes to the requirements of the defined format may require substantial changes to the computer instructions.
  • Certain mutation-based fuzz techniques may make small changes to an existing test document, analyze the results and then repeat the process. By way of example, a mutation-based fuzz technique may involve: selecting a document that conforms with a defined text format; mutating (e.g., modifying) the selected document by randomly changing characters (e.g., by bit flipping or byte incrementing), deleting characters, adding characters, or swapping strings of characters; processing the document using the application being tested; scoring the document based on its coverage (e.g., the identity of routines and the number of unique lines of code that were executed in the application as a result of processing the document); and using the score as a fitness function in a genetic algorithm or the like to determine whether the document should be further mutated and scored. Documents that result in crashes or allow potentially malicious actions (e.g., buffer overflow) may also be selected for additional mutation and testing. Although mutation-based fuzz techniques are effective for certain formats such as media formats, they may be less effective than generation-based fuzz techniques when used in connection with complicated text formats.
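  • By way of illustration only, a mutation-based loop of the kind described above may be sketched as follows. The sketch uses a simple keep-if-not-worse rule rather than a genetic algorithm, and `toy_coverage` is a hypothetical stand-in for coverage measured on a real application; all names are assumptions made for this sketch.

```python
import random


def mutate(doc, rng):
    """Apply one random edit: change, delete, add, or swap characters."""
    chars = list(doc)
    op = rng.choice(["change", "delete", "add", "swap"])
    i = rng.randrange(len(chars))
    if op == "change":
        chars[i] = chr(rng.randrange(32, 127))
    elif op == "delete" and len(chars) > 1:
        del chars[i]
    elif op == "add":
        chars.insert(i, chr(rng.randrange(32, 127)))
    else:
        # "swap", or a "delete" that would have emptied the document
        j = rng.randrange(len(chars))
        chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)


def toy_coverage(doc):
    """Hypothetical coverage: which branches an imaginary parser would take."""
    branches = set()
    if "<" in doc:
        branches.add("tag_open")
    if ">" in doc:
        branches.add("tag_close")
    if '"' in doc:
        branches.add("quoted")
    if "&" in doc:
        branches.add("entity")
    return branches


def fuzz(seed, rounds, rng):
    """Repeatedly mutate the best document, keeping mutants that do not lose coverage."""
    best, best_score = seed, len(toy_coverage(seed))
    for _ in range(rounds):
        candidate = mutate(best, rng)
        score = len(toy_coverage(candidate))
        if score >= best_score:
            best, best_score = candidate, score
    return best, best_score


best_doc, best_score = fuzz("<p>hi</p>", rounds=200, rng=random.Random(1))
```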
  • SUMMARY
  • One aspect of the technology relates to a method that includes: receiving a sequence of values of text elements; determining, with one or more computing devices, a score for a text element value of the sequence, where the score is related to the probability of a particular text element value equaling one or more given values, and where said probability is based on sequences of text element values that are consistent with a defined format; comparing, with the one or more computing devices, the score to a threshold; when the score is below the threshold, modifying, with the one or more computing devices, the value of the text element to form a modified sequence of text element values; processing, with the one or more computing devices, the modified sequence of text element values with a set of instructions; and testing, with the one or more computing devices, a performance characteristic of the set of instructions when the set of instructions processes the modified sequence of text element values.
  • Another aspect of the technology relates to a system that includes one or more computing devices and a memory storing instructions executable by the one or more computing devices, where the instructions include: receiving an initial sequence of text elements having values; determining a first score for the value of a first text element of the initial sequence, wherein determining a score with respect to the value of a particular text element in a particular sequence of text elements is related to how frequently the value of the particular text element follows same or similar sequences of text element values that are consistent with a defined format; determining a second score for the value of a second text element of the initial sequence, wherein there is a third text element between the first and second text elements in the initial sequence; comparing the first and second scores to a threshold; when the first and second scores are above the threshold, generating a second sequence of text elements having values, where the value of a first text element in the second sequence equals the value of the first text element in the initial sequence, the value of a second text element in the second sequence equals the value of the second text element in the initial sequence, the value of a third text element in the second sequence is different from the value of the third text element in the initial sequence, and the third text element is in between the first and second text elements in the sequence; processing, with the one or more computing devices, the second sequence of text elements with an application; and testing, with the one or more computing devices, a performance characteristic of the application when the application processes the modified sequence of text element values.
  • Yet another aspect of the technology relates to a system of one or more computing devices and a memory storing instructions executable by the one or more computing devices, where the instructions include: receiving a document containing a sequence of text characters; determining a score for each of a plurality of characters of the document, wherein the score of a character is determined based on the value of the character, the value of one or more preceding characters in the document, and a machine learning component trained with sequences of characters conforming with the defined format; when the score of a character is below a threshold, associating the character with a set of characters eligible for modification; modifying at least one of the characters in the set of characters; and after modifying at least one of the characters in the set of characters, measuring the performance of an application as the application processes the document.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a functional diagram of a system in accordance with aspects of the disclosure.
  • FIG. 2 is a flow diagram in accordance with aspects of the disclosure.
  • FIG. 3 is a flow diagram of the training of a neural network in accordance with aspects of the disclosure.
  • FIG. 4 is an example of a test document.
  • FIG. 5 is an example of likelihood values of text elements.
  • FIG. 6 is an example of a test document.
  • FIG. 7 is an example of a test document modified in accordance with aspects of the disclosure.
  • FIG. 8 is an example of performance characteristics of an application.
  • FIG. 9 is an example of performance characteristics of an application.
  • FIG. 10 is a flow diagram in accordance with aspects of the disclosure.
  • DETAILED DESCRIPTION
  • Overview
  • The technology relates to generating a sequence of text element values that may be used to measure the performance of instructions for a computing device. By way of example and as shown in FIGS. 1 and 2, a system 100 of one or more computing devices 110 and 120 may be used to identify documents that are consistent with a defined format (block 210), train a recurrent neural network with the documents (block 220), use the network to identify the predictability of text elements in a test document (block 230), modify the test document based on the predictability of the text elements (block 240), test an application with the test document (block 250) and, depending on the results of the test, repeat the process of identifying predictability, modifying the test document and testing the application.
  • In that regard and as shown in FIG. 3, the system may identify documents that are consistent with a particular defined format and use those documents to train a machine learning component. For instance, the system may train neural network 340 with a corpus 330 of documents 320 that were retrieved from web servers 310-12.
  • As shown in FIGS. 4-6, the system may use the machine learning component to identify the predictability of text elements within a test document. For instance, neural network 340 may return score values that indicate the predictability of the characters within portion 611 (e.g., string 410) is relatively high, the predictability of the first character in portion 610 is relatively moderate, and the predictability of the characters in portion 630 is relatively low. As a result, the system may assign relatively high scores to portions 610 and 630 and relatively low scores to portion 611.
  • The system may modify the identified portions based on the text element scores returned by the neural network. For example and as shown in FIG. 7, one or more of the characters in portions 610 and 630 may be changed, deleted or added.
  • The modified document may be provided to a set of computer instructions for testing. By way of example and as shown in FIG. 8, modified test document 700 may be provided to a browser application for testing and the system may measure various characteristics of the browser's performance, such as how many of the browser's instructions were executed or failed as a result. The system may determine a test score based on the performance and, as shown in FIG. 9, the modified document may be analyzed, modified, tested and scored again. If the test score increases, the document may be continuously analyzed, modified, tested and scored until its test score does not increase.
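  • By way of illustration only, the repetition described above, in which a document is modified and rescored until its test score no longer increases, may be sketched as follows. The callables `modify` and `test_score` and the `max_rounds` limit are hypothetical placeholders for the modification and testing operations; the toy callables at the end exist only to make the sketch runnable.

```python
def refine(document, modify, test_score, max_rounds=100):
    """Modify and retest a document until its test score stops increasing."""
    score = test_score(document)
    for _ in range(max_rounds):
        candidate = modify(document)
        candidate_score = test_score(candidate)
        if candidate_score <= score:
            break  # no improvement, so keep the previous document
        document, score = candidate, candidate_score
    return document, score


# Toy placeholders: each modification appends a character, and the score
# saturates at 5, so the loop halts once the score stops growing.
final_doc, final_score = refine(
    "ab",
    modify=lambda d: d + "x",
    test_score=lambda d: min(len(d), 5),
)
```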
  • Example Systems
  • Systems such as those described above may include one or more computing devices. For instance, FIG. 1 provides the example of system 100, which includes computing devices 110 and 120. The computing devices are configured to accept information, perform operations based on that information, and take an action or provide additional information in response. The computing devices may be, or include, a processor that is capable of receiving one or more electrical signals representing information expressed as a numerical value as input, determine a numerical value based on the input in accordance with instructions, and provide one or more electrical signals that represent the determined numerical value as output. Device 110 includes processor 111, which may be a commercially available central processing unit (CPU), application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA).
  • The instructions used by a computing device include any set of one or more instructions that are accessed and executed by the computing device. By way of example, device 110 stores values representing instructions 113 and processor 111 is able to access those values and perform, or cause other components of device 110 or system 100 to perform, operations associated with those instructions. For example, device 110 instructions 113 may include machine code (e.g., machine code stored in object code) that is capable of being executed directly by processor 111. Alternatively or in addition, instructions 113 may be stored in a format that requires additional processing before execution, such as a script or collection of independent source code modules that are interpreted on demand. An operation expressed as a single instruction in one format may correspond with multiple instructions in another format, e.g., executing a single command in script may require the execution of multiple machine code instructions. If the computing device has an operating system, the instructions may include instructions that run in, above, or below the operating system layer. For instance, some of the operations described herein may involve the execution of instructions provided by the Chrome or Android operating systems provided by Google, the Windows operating system provided by Microsoft, or the macOS, OS X or iOS operating systems provided by Apple.
  • The instructions may be stored in a memory. For instance, instructions 113 are stored in memory 112. The memory may be any component that is capable of storing information on a non-transitory storage medium that can be read by a computing device, e.g., registers provided on the same substrate as processor 111, volatile memory such as RAM (random-access memory), non-volatile memory such as flash memory, e.g., a Secure Digital (SD) card, a hard-disk drive, a solid-state drive, optical storage, or tape backups. Device 110, processor 111 and memory 112 are configured so that processor 111 can read, modify, delete and add values stored in memory 112. Memory may be configured to provide less access than the example of memory 112, e.g., memory may be read-only.
  • Memory may store information that is used by, or results from, the operations performed by the computing device. By way of example, memory 112 stores data 114, which includes values that are retrieved or stored by processor 111 in accordance with instructions 113, such as information that is required or determined by device 110 when performing some of the operations described herein. Values stored in memory 112 may be stored in accordance with one or more data structures. For instance, a value stored in memory 112 may represent a single numeric value (e.g., a binary number, an integer, a floating point number, a Unicode value representing a single character of text, digit or punctuation mark, or a value representing a single machine code instruction), a set of multiple numeric values (e.g., an array of numbers, a string of text characters, XML-formatted data, or a file), or information from which values to be processed in accordance with instructions 113 may be obtained (e.g., a reference to a value stored at a remote location or a parameter of a function from which the required value is calculated).
  • A computing device may include components for receiving information from the physical environment surrounding the device and allow direct user input to the computing device. Similar to device 110, device 120 includes a processor 111, memory 112, instructions 113 and data 114. Device 120 also includes components that detect information relating to the physical environment in which the component is disposed, and this information may include information provided by user 150. Device 120 includes a user input component 125 having circuitry and other components configured to receive input from user 150, such as information provided tactilely (e.g., a mouse, keyboard, keypad, button or touchscreen). User input components may perform functions that are not primarily directed to user input. By way of example, camera 127 may be used to capture user commands (e.g., hand gestures) and other visual information (e.g., the visual characteristics of a mountain). Microphone 126 may be used to capture user commands (e.g., verbal commands) and other audio information (e.g., the sound of a waterfall).
  • A computing device may include components for providing information via the physical environment surrounding the device and provide output directly to users. For example, a component may include circuitry that outputs visual, audio or tactile information to users of the device, such as display 130 (e.g., a computer monitor, a touch-screen, a projector or another component that is operable to change a visual characteristic in response to a signal), speaker 128, or motor 126 to vibrate the device.
  • A computing device may include one or more components for communicating with other computing devices. By way of example, devices 110 and 120 include circuitry (e.g., a network interface) connecting each device to a different node of communication network 190. Network 190 may be composed of multiple networks using different communication protocols. For instance, when device 110 transmits information to device 120, the information may be sent over one or more of the Internet (e.g., via core Internet routers in accordance with the Transmission Control Protocol (TCP) and Internet Protocol (IP)), a cellular network (e.g., in accordance with the LTE (Long-Term Evolution) standard), a local network (e.g., an Ethernet or Wi-Fi network), and a Bluetooth connection. A device may provide information to a user via other devices, e.g., device 110 may display information to user 150 by sending the information over network 190 to device 120 for display on display 130. A computing device may also provide information to another computing device without the use of a network. By way of example, one computing device may output information with a display and another computing device may detect that information with a camera. Although only a few computing devices are depicted in FIG. 1, the system may include a large number of computing devices that are connected to the network at a large number of nodes.
  • Although FIG. 1 shows computing devices 110 and 120 as individual blocks, each of which contains its own processor and memory, the operations described herein may involve a single computing device or many computing devices, e.g., in the “cloud”. For example, various operations described below as involving a single computing device (e.g., a single central processing unit (CPU) in a single server) may involve a plurality of computing devices (e.g., multiple processors in a load-balanced server farm). Similarly, memory components at different locations may store different portions of instructions 113 and collectively form a medium for storing the instructions. By way of further example, operations described as involving a plurality of computing devices may be performed by a single computing device, e.g., rather than sending data to device 110 for processing, device 120 may process the data itself. Alternatively, device 120 may function as a thin client wherein device 110 performs all or nearly all operations that are not directly related to receiving and providing information to users via user input component 125 and display 130. Various operations described herein as being performed by a computing device may be performed by a virtual machine. By way of example, instructions 113 may be specific to a Windows server, but the relevant operations may be performed by a Linux server running a hypervisor that emulates a Windows server. The operations may also be performed by a container, e.g., a computing environment that does not rely on an operating system tied to specific types of hardware.
  • In various examples described herein, device 110 is a server and devices 120-21 are client devices. For instance, device 110 may be a server and device 120 may be a desktop (notebook) computer system, e.g., processor 121 and memory 122 may be contained in a desktop personal computer, display 130 may be an external monitor connected to the personal computer by a cable, and user input component 125 may be an external keyboard that communicates with the computer via Bluetooth. Alternatively, device 120 may be a wireless phone with a touchscreen that functions as both display 130 and user input component 125. Other client devices may include, by way of example, laptops, notebooks, netbooks, tablets, set-top boxes (e.g., a cable-television set-top box connected to a television) and wearable devices (e.g., a smartwatch). In that regard, a computing device may include other components that are typically present in such devices or general purpose computers but are not expressly described herein.
  • The system may also store documents having one or more text elements, e.g., a single character, a token (e.g., a sequence of characters between delimiters within a document such as a word) or a sequence of characters of a given length. The text elements may have values, e.g., the value of a character may be “a” or “*”.
  • The text element may be stored within a document as a sequence of values that conform with a defined format, e.g., one or more requirements regarding how text elements relating to certain types of information should be stored within a document. To the extent the defined format may be considered a language, the requirements may be considered the grammar of the language. The requirements may relate to, by way of example only, the permissible text element values (e.g., UNICODE or ASCII), where certain types of information need to be stored relative to the beginning of the document (e.g., documents that comply with HTML version 5 start with “<!DOCTYPE html>”), reserved keywords, and where certain types of information are stored relative to other types of information (e.g., the destination URL of a link on a webpage follows the keyword “href”). While most of the examples below focus on HTML for ease of illustration, the system and operations described herein apply to other defined formats, including proprietary standards.
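  • By way of illustration only, the three kinds of text elements mentioned above (single characters, tokens between delimiters, and character sequences of a given length) may be obtained from a document as sketched below. Treating whitespace as the only delimiter is an assumption made for this sketch, as are the function names.

```python
import re


def as_characters(doc):
    """Each text element is a single character."""
    return list(doc)


def as_tokens(doc):
    """Each text element is a run of characters between whitespace delimiters."""
    return re.findall(r"\S+", doc)


def as_chunks(doc, n):
    """Each text element is a sequence of n characters (the last may be shorter)."""
    return [doc[i:i + n] for i in range(0, len(doc), n)]


sample = '<a href="x">'
tokens = as_tokens(sample)
chunks = as_chunks(sample, 4)
```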
  • Example Methods
  • Operations in accordance with a variety of aspects of the method will now be described. It should be understood that the following operations do not have to be performed in the precise order described below. Rather, various steps can be handled in different order or simultaneously.
  • The system may identify preexisting samples of information arranged in accordance with a selected defined format. For instance and as shown in FIG. 1, device 110 may use network 190 to access documents that are stored at sources 191 and are consistent with the defined format. By way of example, as shown in FIG. 3, the system may retrieve a plurality of web pages formatted in accordance with HTML from a plurality of web servers 310-12. The web pages 320 collectively form a corpus 330 of HTML documents.
  • The system may train a machine learning component with the samples of the defined format. In that regard, device 110 may store instructions associated with training a neural network, traversing the neural network, extracting data stored in connection with the neural network, and generating sequences of data values based on the neural network. By way of example, neural network 340 may be a recurrent network such as a Long-Short Term Memory (LSTM) neural network, and device 110 may train the neural network with corpus 330.
  • Once trained, the weights, biases and other aspects stored in the data structures of machine learning component may represent a model of the defined format, wherein the model is not a set of grammatical rules (such as might be present in a generation-based fuzzing technique) but rather a probabilistic model. For instance, system 100 may use neural network 340 to determine a score (e.g., a numeric value) for a text element that is related to how frequently the value of the particular text element followed the same or similar sequences of text elements in the corpus 330 of HTML documents that were used to train the neural network. In that regard, the neural network may model the defined format by providing scores that are related to the likelihood that the value of a particular text element within a sequence of text elements values will equal a specific value when the sequence conforms with the defined format. The likelihood may be a function of the values of the other text elements in the sequence and the proximity of the other values to the particular text element such as, in the case of a recurrent network, the values of a given number of text elements that precede the particular text element. By way of example, after neural network 340 is trained with corpus 330 of HTML documents, model 350 may indicate (and the scores returned by the neural network may reflect) that the likelihood of “e” following “head” is 33% (e.g., as in “<header>”), the likelihood of “>” is 22% (e.g., as in “<head>”), the likelihood of a space is 11% (e.g., when the word “head” is used in a sentence), the likelihood of “a” is 6% (e.g., as in “headache”), etc. (The example percentage distributions of the example strings and characters discussed herein have been selected for ease of illustration, e.g., they ignore issues such as case sensitivity. A large corpus of publicly-accessible HTML documents may yield different probabilities and distributions than those set forth herein.)
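The scoring behavior described above can be illustrated with a much simpler stand-in for the recurrent network: a table of how often each character follows a short context in the corpus. This is only a sketch and not the LSTM the description refers to; the names `train_char_model` and `score`, the four-character context and the toy corpus are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def train_char_model(corpus, context_len=4):
    """Count which character follows each context of up to `context_len` characters."""
    counts = defaultdict(Counter)
    for doc in corpus:
        for i, ch in enumerate(doc):
            context = doc[max(0, i - context_len):i]
            counts[context][ch] += 1
    return counts


def score(counts, context, ch, context_len=4):
    """Return the relative frequency of `ch` after `context` (0.0 if the context is unseen)."""
    followers = counts.get(context[-context_len:])
    if not followers:
        return 0.0
    return followers[ch] / sum(followers.values())


# Toy corpus (hypothetical); a real corpus would contain many complete HTML documents.
corpus = ["<head>", "<header>", "<head>", "headache"]
model = train_char_model(corpus)
print(score(model, "<head", ">"))  # 0.5 for this toy corpus
```

A frequency table only sees a fixed window, whereas a recurrent network can carry information across longer spans; the table is used here solely because it makes the relationship between corpus frequency and score easy to inspect.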
  • The system may use the scores to identify portions of a test document to be modified. For instance, the scores provided by the neural network 340 may be used to determine whether a particular text element of a test document is eligible for mutation. FIG. 4 provides an example of an HTML test document. Like nearly all, if not all, HTML documents, test document 400 starts with the string 410 (“<!DO”), which corresponds with the keywords “!DOCTYPE html”. In that regard and as shown in FIG. 5, model 350 may indicate that the likelihood 510 of an HTML document starting with “<” is 100%, the likelihood 511 that the next character is “!” is 100%, and the likelihood 512 that the next character is “D” is also 100%. As a result, neural network 340 may return a relatively high score for each character of the string “<!D” at the beginning of a document because the string has relatively low perplexity, e.g., each character is highly predictable in view of the characters before it. (For the purposes of discussing FIGS. 4 and 5, it is assumed that the neural network 340 is configured to score the next character based on up to five characters that immediately precede it.)
  • As is common with many HTML documents, document 400 also contains the keyword “lang=” followed by “[double quote]en[double quote]” (string 420), which indicates that the document is written in English. In that regard, model 350 may indicate that the likelihood 520 that “=” will follow “[space]lang” is 35% (e.g., when “lang” is used as a keyword), the likelihood 521 that “u” will follow “[space]lang” is 25% (e.g., when the word “language” is used in a sentence), and the likelihood that any other character will follow “[space]lang” may be close to zero (and thus not shown in the chart of FIG. 5). As a result, the character after the string “[space]lang” may be considered to have relatively moderate perplexity because model 350 indicates that there is a 60% chance it will be one of two values. As also indicated in FIG. 5, if the next character after “[space]lang” is “=”, the likelihood of the next character being a double quote symbol is relatively high, e.g., 88%. Because of the number of HTML documents that contain the string “[space]lang=[double quote]en[double quote]”, the likelihood 523 that the character after that would be “e” is significant. However, because there are many other potential languages for an HTML document beyond English, “e” may be one of many characters that have a significant likelihood of following “lang=[double quote]”. As a result, the character after the string “ang=[double quote]” may be considered to have a relatively moderate perplexity and predictability, and neural network 340 may thus return a relatively moderate score for the character “e” in the string “lang=[double quote]e”.
  • FIG. 5 further illustrates the predictability of the character following the string 430, e.g., “inter”. The number of popular words that contain the string “inter” may be so numerous that many characters have a small yet relatively equal likelihood 530 of being next. Therefore, since the relative perplexity of the character following the string “inter” is relatively high, neural network 340 may return a relatively low score for that character.
  • The system may select text elements of a sequence for modification based on the scores returned by the neural network. Modifying keywords in a document may result in an application being unable to parse the document and may thus cause the application to cease processing the document altogether or crash; that may be helpful in some circumstances, but it may also result in many routines of the application going untested. As noted above, keywords tend to be associated with high scores. As a result and as shown in FIG. 6, the system may designate the characters within portions 611 and 612 (containing many keywords) as being sufficiently predictable to remain unmodified and the characters within portions 610 (containing one or many possible language codes) and 630 (containing text intended for display to users) as being sufficiently unpredictable to be modified.
  • The system may determine whether a text element is eligible or not for modification by comparing the text element's score to a threshold. For instance, device 110 may iterate through the values of the characters of test document 400 and, for each character, determine a score based on the neural network and compare that score to a modification eligibility threshold. The modification eligibility threshold may correspond with a likelihood threshold. By way of example and with reference to FIGS. 4 and 5, model 350 may indicate that the likelihood of “=” being the next character after “[space]lang” is 35%, which exceeds a threshold (line 550) of 30%. As a result, the score returned by neural network 340 may exceed a corresponding modification eligibility threshold and, if so, the device 110 may designate the character as ineligible for modification. (Although FIG. 5 indicates a threshold of 30% for ease of illustration, modification eligibility thresholds associated with likelihood thresholds of 90% or greater may yield more interesting results.) However, if the value of the character following “[space]lang” had been “u” instead of “=”, and if the score of “u” was lower than the modification eligibility threshold (e.g., the model indicates that the odds of the character being “u” was relatively unlikely), the device 110 may have designated the character as eligible for modification. The system may further designate all portions between portions of high predictability, such as portions 610 and 630 between portions 611-12 and portions 631-32, respectively, as eligible for modification.
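The threshold comparison can be sketched as a single pass over per-character scores. The function name `eligible_positions` and the example scores are hypothetical; the 0.30 default mirrors the illustrative threshold of FIG. 5, and treating a score exactly equal to the threshold as ineligible is an assumption, since the description only addresses scores above or below it.

```python
def eligible_positions(scores, threshold=0.30):
    """Return the indices whose score falls below the modification eligibility threshold."""
    return [i for i, s in enumerate(scores) if s < threshold]


# Hypothetical scores: a fully predictable character, the "=" after " lang" (0.35),
# a less likely "u" (0.25), and one of many candidates after "inter" (0.05).
scores = [1.0, 0.35, 0.25, 0.05]
print(eligible_positions(scores))  # [2, 3]
```

With a higher threshold, such as the 0.90 suggested in the parenthetical above, every position except the first would be marked eligible in this example.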
  • The score and modification eligibility threshold may also be based on factors that are not specific to the value of the text element. For instance, if many character values have a relatively uniform likelihood of following a particular string, the modification eligibility threshold may be lowered, e.g., system 100 may be more likely to designate the character as being eligible for modification. The modification eligibility threshold may also be dynamically determined. For instance, the threshold may be a randomly determined number. By way of example, a character within document 400 may be identified as ineligible for modification when S>Random(0.00-1.00), where S is the score returned by neural network 340 for the character, and the scores and potential random values range from 0.00 to 1.00. As a corollary, the system may designate a text element as being eligible for modification when S≤Random(0.00-1.00). The score may also be based on the number of potential values that are above a minimum threshold or the score of the most-likely value for the text element.
  • The portions of the document that were designated as eligible for modification may be randomly changed. By way of example and as shown in FIG. 7, the system may generate a modified document 700 by indicating to a mutation-based fuzzing module that portions 610 and 630 are available for mutation, e.g., characters may be randomly changed, added or deleted.
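A randomly determined threshold makes eligibility probabilistic: a character with score S is held back whenever S exceeds a fresh random draw, so it is protected roughly a fraction S of the time. The sketch below assumes scores in the range 0.00 to 1.00; the function name `is_ineligible` and the seeded generator are illustrative choices only.

```python
import random


def is_ineligible(score, rng):
    """Protect the character when its score exceeds a random threshold drawn from [0, 1)."""
    return score > rng.random()


rng = random.Random(0)  # seeded only so that repeated runs behave the same
trials = 10_000
protected = sum(is_ineligible(0.9, rng) for _ in range(trials))
print(protected / trials)  # close to 0.9, since a score of 0.9 is usually protected
```

Compared with a fixed threshold, this occasionally lets highly predictable characters such as keywords be mutated, which may exercise parser error paths that would otherwise go untested.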
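A mutation step restricted to eligible positions might look like the following sketch, which randomly changes, adds or deletes a character at each eligible index and copies every other character unchanged. The function name `mutate`, the replacement alphabet and the choice of one operation per eligible index are assumptions; an existing mutation-based fuzzer would normally supply its own operators.

```python
import random
import string


def mutate(text, eligible, rng, alphabet=string.ascii_letters + string.digits):
    """Apply one random operation (change, insert or delete) at each eligible index."""
    eligible = set(eligible)
    out = []
    for i, ch in enumerate(text):
        if i not in eligible:
            out.append(ch)  # predictable characters (e.g., keywords) are preserved
            continue
        op = rng.choice(("change", "insert", "delete"))
        if op == "change":
            out.append(rng.choice(alphabet))
        elif op == "insert":
            out.append(ch)
            out.append(rng.choice(alphabet))
        # "delete": append nothing
    return "".join(out)


rng = random.Random(0)
# Indices 12 and 13 hold the language code "en" in this hypothetical fragment.
print(mutate('<html lang="en">', [12, 13], rng))
```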
  • The portions that are eligible for modification may also be selected for replacement by portions from the same or other documents. For instance and as shown in FIG. 7, the system has swapped the positions of portions 640 and 650. Portions of the test document may also be replaced with portions from other test documents. Moreover, the replacement portion does not have to be the same size as the replaced portion, particularly if the replacement is taken from another document. For example, increasingly larger replacement sequences may be added until a maximum length is reached or a relatively high perplexity point is reached.
  • When determining whether a text element should be modified, the system may consider not only the preceding text elements but subsequent text elements as well. For example, FIG. 6 shows the string “</head><body>” as a single block 631. However, the model may indicate that the character “b” is relatively unpredictable since it may be highly likely that a keyword will follow “ead><” but the precise keyword itself may be hard to predict. However, once it is known that the value of the next character is “b”, the model may indicate that the remaining characters of the block 631 are highly predictable because the tag “<body>” often follows “<head>”. As a result and in some aspects, if a single low-scored character is sandwiched between strings of high-scored characters, the system may check whether the lower-scored character and subsequent high-scored characters form a single keyword.
  • The system may measure one or more characteristics of a set of computer instructions' performance as they process the test sequences. By way of example and as shown in FIG. 8, device 110 may load test document 700 into a browser 810 and generate a report 805 regarding how well the browser performed. The report may include the name 811 of the document and information such as whether loading the document caused particular routines 820-23 (e.g., routines relating to rendering content, executing JavaScript, communicating information over a network and securing data) and third party plug-ins 830-32 to be called and, if so, whether any errors occurred. The system may also calculate, and the report may also include, a test score 815 based on other performance characteristics such as the document's coverage (e.g., the number of different routines that were called, the total number of unique lines of code that were executed), processing speed (e.g., load times), CPU and memory usage, whether any interesting results were encountered (e.g., errors, security issues such as buffer overflow errors and decompression bombs, plug-in failure 832, application crashes, etc.), and any other measurable aspects relating to performance. Different events may be weighted differently when calculating a test score.
  • The performance characteristics may be used to determine whether further modifications should be made to the application or document. For instance, after the initial test, the document may be modified as described above and tested once again. As shown in FIG. 9, the system may then generate a second report 905, which indicates that an additional routine 921 of the browser and third-party plug-in 931 were called. The report also indicates that third-party plug-in 930, which processed the prior version of the document without difficulty, failed with a buffer overflow error that could, in some circumstances, create security issues for data handled by that plug-in. As a result, the calculated test score 915 of the document increased relative to the first report 805. The system may repeat the process of identifying portions of a document that are eligible for modification based on scores returned by the neural network, modifying the eligible portions of the document, and determining a test score for the document until the document's test score stops increasing. If the defined format is PDF, the system may test a PDF reader, e.g., the functionality of a browser for displaying PDF documents, a stand-alone PDF application for displaying and editing PDF documents, etc.
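Weighting different events differently can be sketched as a weighted sum over event counts taken from a report. The event names, the weights and the function name `compute_test_score` below are all hypothetical; the description does not specify particular weights or a particular formula.

```python
# Hypothetical weights: errors and crashes count for more than routine coverage.
DEFAULT_WEIGHTS = {
    "routine_called": 1.0,
    "plugin_called": 1.0,
    "error": 5.0,
    "crash": 10.0,
}


def compute_test_score(event_counts, weights=DEFAULT_WEIGHTS):
    """Sum each observed event count multiplied by its weight; unknown events score zero."""
    return sum(weights.get(name, 0.0) * count for name, count in event_counts.items())


# A report in which four routines and two plug-ins were called and one error occurred.
report = {"routine_called": 4, "plugin_called": 2, "error": 1}
print(compute_test_score(report))  # 11.0
```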
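The repeat-until-the-score-stops-increasing loop can be sketched as a simple hill climb. Here `mutate_fn` and `evaluate_fn` are placeholder callbacks standing in for the score-guided modification step and the test-score measurement, and `max_rounds` is an added safety limit; none of these names come from the description.

```python
def fuzz_until_plateau(document, mutate_fn, evaluate_fn, max_rounds=100):
    """Keep modifying the document while each new version earns a higher test score."""
    best_doc, best_score = document, evaluate_fn(document)
    for _ in range(max_rounds):
        candidate = mutate_fn(best_doc)
        candidate_score = evaluate_fn(candidate)
        if candidate_score <= best_score:
            break  # the test score stopped increasing
        best_doc, best_score = candidate, candidate_score
    return best_doc, best_score


# Toy stand-ins: each "mutation" appends a character and the score saturates at 5.
doc, final_score = fuzz_until_plateau("ab", lambda d: d + "x", lambda d: min(len(d), 5))
print(doc, final_score)  # abxxx 5
```

Stopping at the first non-improving round is the most literal reading; a practical harness might instead tolerate several non-improving rounds before giving up, since random mutations do not improve the score monotonically.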
  • The system may also be used to generate a completely new set of test documents in compliance with the model of the defined format. For instance, the system may create a new test document and randomly select the value of the text elements based on the likelihood of their occurrence as indicated by the model represented by the neural network.
  • By way of example and as shown in part in FIG. 5, model 350 may indicate that all HTML documents in the corpus begin with “<!DOCTYPE html”. As a result, when system 100 uses neural network 340 to randomly generate a new document, the first fourteen characters of the document would be “<!DOCTYPE html”. Thereafter, however, the probabilities may change, e.g., the model may indicate that 80% of the time the string “html” is followed by a space and 20% of the time it is followed by a “>”. As a result, the system may randomly select a space or “>” by generating a random number between 0.00 and 1.00 and determining whether the number is greater or less than 0.80 and select a space or “>” accordingly as the next character of the generated document. The remaining characters of the document may be similarly determined. The randomly-generated test documents may then be modified and tested as described above.
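Selecting the next character from the model's probabilities amounts to drawing a random number and walking the cumulative distribution. In the sketch below the 80%/20% split is taken from the example above, while the function name `sample_next` and the dictionary representation of the distribution are assumptions; the distribution is assumed to be non-empty.

```python
import random


def sample_next(distribution, rng):
    """Pick a value with probability proportional to its entry in `distribution`."""
    r = rng.random()  # uniform draw from [0.0, 1.0)
    cumulative = 0.0
    for value, probability in distribution.items():
        cumulative += probability
        if r < cumulative:
            return value
    return value  # guard against floating-point rounding in the final bucket


rng = random.Random(0)
after_html = {" ": 0.80, ">": 0.20}  # probabilities from the example above
prefix = "<!DOCTYPE html"
print(repr(prefix + sample_next(after_html, rng)))
```

Calling `sample_next` repeatedly, each time with the distribution the model reports for the text generated so far, would yield a complete document one character at a time.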
  • The model may be periodically updated to reflect changes to requirements and features of the defined format. For instance, neural network 340 may be periodically trained with recently created documents in order to keep model 350 up to date with changes to the defined format. Depending on the complexity of the defined format and other circumstances, the cost and other resources required to train and maintain a neural network such as neural network 340 may be less than the cost and resources required to write and maintain computer programs that create or analyze documents based on a rigid set of grammatical rules that were set by the people or organization that defined the format.
  • While the use of a recurrent neural network to score text elements may be particularly advantageous in certain applications, the system may use other components to provide a score that is based on the relationship of a given sequence of text element values to sequences of text element values that conform with a defined format. For instance, in lieu of a recurrent neural network, the machine learning component may be a support vector machine trained with N-grams copied from documents that comply with the defined format, or a hidden Markov model. Moreover, in lieu of machine learning, the text-element scoring component may include a statistical regression routine that uses a sequence containing the text-element as the dependent variable and sequences in the corpus as independent variables.
  • FIG. 10 provides a flowchart of a method that may be executed by one or more computing devices. At block 1010, a sequence of values of text elements is received. At block 1020, a score for a text element value of the sequence is determined, where the score with respect to a particular text element value is related to the probability that the particular text element value will equal one or more given values, and where the probability is determined based on sequences of text element values that are consistent with a defined format. At block 1030, the score is compared to a threshold. At block 1040, when the score is below a threshold, the value of the text element is modified to form a modified sequence of text element values. At block 1050, the modified sequence is processed with a set of instructions. At block 1060, a characteristic of the performance of the computer instructions is tested by processing the modified sequence with the computer instructions.
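The blocks of FIG. 10 can be strung together in a few lines. This sketch treats the scorer and the program under test as caller-supplied callbacks; `run_fuzz_step`, `score_fn`, `process_fn` and the replacement alphabet are hypothetical names, and replacing a low-scored character with a random one is just one of the modifications the description allows.

```python
import random


def run_fuzz_step(sequence, score_fn, threshold, process_fn, rng, alphabet="abc"):
    """Score each character, replace those below the threshold, then process the result."""
    modified = []
    for i, value in enumerate(sequence):          # block 1010: the received sequence
        s = score_fn(sequence[:i], value)         # block 1020: score from preceding text
        if s < threshold:                         # block 1030: compare to the threshold
            modified.append(rng.choice(alphabet)) # block 1040: modify the value
        else:
            modified.append(value)
    modified = "".join(modified)
    return modified, process_fn(modified)         # blocks 1050-1060: process and measure


# Toy scorer: angle brackets are treated as fully predictable, everything else is not.
toy_score = lambda preceding, ch: 1.0 if ch in "<>" else 0.0
result, measurement = run_fuzz_step("<b>", toy_score, 0.5, len, random.Random(0))
print(result, measurement)
```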
  • As these and other variations and combinations of the features discussed above can be utilized without departing from the invention as defined by the claims, the foregoing description of the embodiments should be taken by way of illustration rather than by way of limitation of the invention as defined by the claims. The provision of examples of the invention (as well as clauses phrased as “such as,” “e.g.”, “including” and the like) should not be interpreted as limiting the invention to the specific examples; rather, the examples are intended to illustrate only some of many possible aspects. Similarly, references to “based on” and the like mean “based at least in part on”.

Claims (20)

1. A method comprising:
receiving a sequence of values of text elements;
determining, with one or more computing devices, a score for a text element value of the sequence, where the score is related to the probability of a particular text element value equaling one or more given values, and where said probability is based on sequences of text element values that are consistent with a defined format;
comparing, with the one or more computing devices, the score to a threshold;
when the score is below a threshold, modifying, with the one or more computing devices, the value of the text element to form a modified sequence of text element values;
processing, with the one or more computing devices, the modified sequence of text element values with a set of instructions; and
testing, with the one or more computing devices, a performance characteristic of the set of instructions when the set of instructions process the modified sequence of text element values.
2. The method of claim 1 wherein determining the score comprises determining the score based on a recurrent neural network trained with the sequences of values of text elements that conform with the defined format.
3. The method of claim 2 wherein the recurrent neural network is a Long-Short Term Memory (LSTM) neural network.
4. The method of claim 2 wherein the sequences of values of text elements that conform with the defined format are accessible via the Internet.
5. The method of claim 4 wherein the defined format is HTML or PDF.
6. The method of claim 1 further comprising determining a test value based on the performance characteristic.
7. The method of claim 6 further comprising:
determining a test score based on the performance characteristic; and
continuing to determine a test value for a given sequence of text element values, modify a value of one or more text elements of the given sequence of text element values, and test the performance characteristic of the instructions with the given sequence until the test value exceeds a threshold.
8. A system comprising
one or more computing devices, and
a memory storing instructions executable by the one or more computing devices,
wherein the instructions comprise:
receiving an initial sequence of text elements having values;
determining a first score for the value of a first text element of the initial sequence, wherein determining a score with respect to the value of a particular text element in a particular sequence of text elements is related to how frequently the value of the particular text element follows same or similar sequences of text element values that are consistent with a defined format;
determining a second score for the value of a second text element of the initial sequence, wherein there is a third text element between the first and second text elements in the initial sequence;
comparing the first and second scores to a threshold;
when the first and second scores are above the threshold, generating a modified sequence of text elements having values, where the value of a first text element in the modified sequence equals the value of the first text element in the initial sequence, the value of a second text element in the modified sequence equals the value of the second text element in the initial sequence, the value of a third text element in the modified sequence is different from the value of the third text element in the initial sequence, and the third text element is in between the first and second text elements in the sequence;
processing, with the one or more computing devices, the modified sequence of text elements with an application; and
testing, with the one or more computing devices, a performance characteristic of the application when the application processes the modified sequence of text element values.
9. The system of claim 8 wherein determining a score comprises determining the score based on a recurrent neural network trained with the sequences of text element values that are consistent with a defined format.
10. The system of claim 9 wherein the recurrent neural network is a Long-Short Term Memory (LSTM) neural network.
11. The system of claim 10 wherein the defined format is PDF, the neural network is trained with PDF documents, and the application comprises a PDF reader.
12. The system of claim 8 wherein the instructions further comprise:
receiving the modified sequence of text elements having values, the modified sequence including fourth, fifth and sixth text elements, the sixth text element being between the fourth and fifth text elements;
determining a fourth score for the value of a fourth text element of the modified sequence,
determining a fifth score for the value of a fifth text element of the modified sequence,
comparing the fourth and fifth scores to the threshold;
when the first, second, fourth and fifth scores are above the threshold, modifying the value of the third text element to match the value of the sixth text element.
13. The system of claim 12 wherein the instructions further comprise modifying the value of the sixth text element to match the value of the third text element.
14. A system comprising:
one or more computing devices;
a memory storing instructions executable by the one or more computing devices;
wherein the instructions comprise:
receiving a document containing a sequence of text characters;
determining a score for each of a plurality of characters of the document, wherein the score of a character is determined based on the value of the character, the value of one or more preceding characters in the document, and a machine learning component trained with sequences of characters conforming with the defined format;
when the score of a character is below a threshold, associating the character with a set of characters eligible for modification;
modifying at least one of the characters in the set of characters; and
after modifying at least one of the characters in the set of characters, measuring the performance of an application as the application processes the document.
15. The system of claim 14 wherein the machine learning component comprises a recurrent neural network.
16. The system of claim 15 wherein the recurrent neural network is a Long-Short Term Memory (LSTM) neural network.
17. The system of claim 14 wherein modifying a character comprises replacing the character with a plurality of characters.
18. The system of claim 14 wherein the modifying a character comprises replacing the character with other characters in the set of characters.
19. The system of claim 14 wherein the score of a character is further based on the value of at least two preceding characters in the document.
20. The system of claim 14 wherein the received document is a first document and wherein the instructions further comprise:
generating a second document that is a copy of the first document after the at least one or more characters have been modified, determining the scores of the characters of the second document, modifying at least one character of the second document based on the character's score; and
depending on the performance of the application, generating a third document that is a copy of the second document after the at least one or more characters of the second document have been modified, determining the scores of the characters of the third document, modifying at least one character of the third document based on the character's score, and measuring the performance of the application as the application processes the third document.
US15/360,554 2016-11-23 2016-11-23 Testing applications with a defined input format Active US9977729B1 (en)

Priority Applications (9)

Application Number Priority Date Filing Date Title
US15/360,554 US9977729B1 (en) 2016-11-23 2016-11-23 Testing applications with a defined input format
DE102017121649.9A DE102017121649A1 (en) 2016-11-23 2017-09-19 Testing applications with a defined input format
DE202017105654.6U DE202017105654U1 (en) 2016-11-23 2017-09-19 Testing applications with a defined input format
GB2015227.8A GB2590140B (en) 2016-11-23 2017-09-21 Testing applications with a defined input format
GB1715225.7A GB2556978B (en) 2016-11-23 2017-09-21 Testing applications with a defined input format
CN201710892789.XA CN108089974B (en) 2016-11-23 2017-09-27 Testing applications with defined input formats
CN202110308048.9A CN113110988B (en) 2016-11-23 2017-09-27 Testing applications with defined input formats
PCT/US2017/054024 WO2018097895A1 (en) 2016-11-23 2017-09-28 Testing applications with a defined input format
US15/901,349 US10095610B2 (en) 2016-11-23 2018-02-21 Testing applications with a defined input format

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US15/360,554 US9977729B1 (en) 2016-11-23 2016-11-23 Testing applications with a defined input format

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/901,349 Continuation US10095610B2 (en) 2016-11-23 2018-02-21 Testing applications with a defined input format

Publications (2)

Publication Number Publication Date
US9977729B1 US9977729B1 (en) 2018-05-22
US20180143896A1 true US20180143896A1 (en) 2018-05-24

Family

ID=60162247

Family Applications (2)

Application Number Title Priority Date Filing Date
US15/360,554 Active US9977729B1 (en) 2016-11-23 2016-11-23 Testing applications with a defined input format
US15/901,349 Active US10095610B2 (en) 2016-11-23 2018-02-21 Testing applications with a defined input format

Family Applications After (1)

Application Number Title Priority Date Filing Date
US15/901,349 Active US10095610B2 (en) 2016-11-23 2018-02-21 Testing applications with a defined input format

Country Status (5)

Country Link
US (2) US9977729B1 (en)
CN (2) CN108089974B (en)
DE (2) DE102017121649A1 (en)
GB (2) GB2556978B (en)
WO (1) WO2018097895A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10474562B2 (en) * 2017-09-20 2019-11-12 Salesforce.Com Machine learning based ranking of test cases for software development
US11080603B2 (en) * 2018-05-18 2021-08-03 Google Llc Systems and methods for debugging neural networks with coverage guided fuzzing
US11281567B1 (en) * 2020-06-02 2022-03-22 Meta Platforms, Inc. Selecting an optimal subset of software tests for execution

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10379851B2 (en) 2017-06-23 2019-08-13 International Business Machines Corporation Fine-grained management of exception enablement of floating point controls
US10725739B2 (en) 2017-06-23 2020-07-28 International Business Machines Corporation Compiler controls for program language constructs
US10514913B2 (en) 2017-06-23 2019-12-24 International Business Machines Corporation Compiler controls for program regions
US10740067B2 (en) 2017-06-23 2020-08-11 International Business Machines Corporation Selective updating of floating point controls
US10481908B2 (en) 2017-06-23 2019-11-19 International Business Machines Corporation Predicted null updated
US10684852B2 (en) 2017-06-23 2020-06-16 International Business Machines Corporation Employing prefixes to control floating point operations
CN108944930B (en) * 2018-07-05 2020-04-21 合肥工业大学 Automatic car following method and system for simulating driver characteristics based on LSTM
US11734375B2 (en) * 2018-09-27 2023-08-22 Google Llc Automatic navigation of interactive web documents
CN109522221B (en) * 2018-10-26 2021-08-31 中国科学院信息工程研究所 Method and system for improving fuzzy test efficiency
CN109598334B (en) * 2018-12-03 2021-06-29 中国信息安全测评中心 Sample generation method and device
US10474564B1 (en) * 2019-01-25 2019-11-12 Softesis Inc. Identifying user interface elements using element signatures
CN112530235A (en) * 2020-12-15 2021-03-19 深圳市新亚恒利科技有限公司 Fast reading training control method, device, equipment and storage medium
US11775878B2 (en) 2020-12-22 2023-10-03 Sas Institute Inc. Automated machine learning test system
CN114385491A (en) * 2021-12-30 2022-04-22 大连理工大学 JS translator defect detection method based on deep learning

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5369574A (en) * 1990-08-01 1994-11-29 Canon Kabushiki Kaisha Sentence generating system
DE4306508A1 (en) * 1993-03-03 1994-09-08 Philips Patentverwaltung Method and arrangement for determining words in a speech signal
US5523945A (en) * 1993-09-17 1996-06-04 Nec Corporation Related information presentation method in document processing system
US7117153B2 (en) * 2003-02-13 2006-10-03 Microsoft Corporation Method and apparatus for predicting word error rates from text
KR100857862B1 (en) * 2007-06-05 2008-09-10 한국전자통신연구원 The file mutation method and system using file section information and mutation rules
US8862940B2 (en) 2012-02-14 2014-10-14 Microsoft Corporation Integrated fuzzing
US9043349B1 (en) * 2012-11-29 2015-05-26 A9.Com, Inc. Image-based character recognition
US9367435B2 (en) * 2013-12-12 2016-06-14 Vertafore, Inc. Integration testing method and system for web services
US9558089B2 (en) 2014-11-12 2017-01-31 Intuit Inc. Testing insecure computing environments using random data sets generated from characterizations of real data sets
CN104965784B (en) * 2015-06-16 2017-12-26 广州华多网络科技有限公司 Automatic test approach and device
CN106095684B (en) * 2016-06-17 2019-01-29 北京大学 A kind of predictability mutation testing method
US10452526B2 (en) * 2017-06-15 2019-10-22 Microsoft Technology Licensing, LLC Machine learning for constrained mutation-based fuzz testing

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10474562B2 (en) * 2017-09-20 2019-11-12 Salesforce.com Machine learning based ranking of test cases for software development
US11080603B2 (en) * 2018-05-18 2021-08-03 Google LLC Systems and methods for debugging neural networks with coverage guided fuzzing
US11281567B1 (en) * 2020-06-02 2022-03-22 Meta Platforms, Inc. Selecting an optimal subset of software tests for execution

Also Published As

Publication number Publication date
GB202015227D0 (en) 2020-11-11
DE202017105654U1 (en) 2018-01-19
GB2590140A (en) 2021-06-23
GB201715225D0 (en) 2017-11-08
WO2018097895A1 (en) 2018-05-31
GB2590140B (en) 2021-09-08
GB2556978A (en) 2018-06-13
US9977729B1 (en) 2018-05-22
CN108089974B (en) 2021-03-26
CN113110988B (en) 2022-07-29
DE102017121649A1 (en) 2018-05-24
US20180181486A1 (en) 2018-06-28
CN108089974A (en) 2018-05-29
CN113110988A (en) 2021-07-13
GB2556978B (en) 2020-11-04
US10095610B2 (en) 2018-10-09

Similar Documents

Publication Publication Date Title
US10095610B2 (en) Testing applications with a defined input format
US20210264109A1 (en) Stylistic Text Rewriting for a Target Author
US11264023B2 (en) Using multiple modality input to feedback context for natural language understanding
CN110276023B (en) POI transition event discovery method, device, computing equipment and medium
US20120158742A1 (en) Managing documents using weighted prevalence data for statements
US10885281B2 (en) Natural language document summarization using hyperbolic embeddings
CN106663123B (en) Comment-centric news reader
JP7103264B2 (en) Generation device, learning device, generation method and program
US9298694B2 (en) Generating a regular expression for entity extraction
US8224642B2 (en) Automated identification of documents as not belonging to any language
US20160154785A1 (en) Optimizing generation of a regular expression
WO2016155643A1 (en) Input-based candidate word display method and device
WO2022134577A1 (en) Translation error identification method and apparatus, and computer device and readable storage medium
JP2019036210A (en) FAQ registration support method using machine learning, and computer system
US20150199332A1 (en) Browsing history language model for input method editor
US11056215B2 (en) Performing chemical textual analysis to discover dangerous chemical pathways
US11544449B2 (en) Annotating chemical reactions
CN115858776A (en) Variant text classification recognition method, system, storage medium and electronic equipment
US9305117B2 (en) String generation tool
CN109933788B (en) Type determining method, device, equipment and medium
JP6972788B2 (en) Specific program, specific method and information processing device
CN112732913B (en) Method, device, equipment and storage medium for classifying unbalanced samples
JP6828812B2 (en) Information processing equipment, information processing methods and programs
KR20220045797A (en) Apparatus and method for providing summarized translation text
CN116796730A (en) Text error correction method, device, equipment and storage medium based on artificial intelligence

Legal Events

Date Code Title Description
AS Assignment
Owner name: GOOGLE INC., CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:IVANCIC, FRANJO;SZEKERES, LASZLO;BABIC, DOMAGOJ;REEL/FRAME:040483/0936
Effective date: 2016-11-30

AS Assignment
Owner name: GOOGLE LLC, CALIFORNIA
Free format text: CHANGE OF NAME;ASSIGNOR:GOOGLE INC.;REEL/FRAME:044567/0001
Effective date: 2017-09-29

STCF Information on status: patent grant
Free format text: PATENTED CASE

MAFP Maintenance fee payment
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
Year of fee payment: 4