US20140122511A1 - Framework for generating programs to process beacons - Google Patents

Framework for generating programs to process beacons

Publication number
US20140122511A1
Authority
US
United States
Prior art keywords
beacons
information
objects
beacon
generator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US13/660,788
Other versions
US8725750B1 (en)
Inventor
Lucas Waye
Kevin Seng
Viral Bajaria
Shane Moriah
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hulu LLC
Original Assignee
Hulu LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hulu LLC
Priority to US13/660,788 (patent US8725750B1)
Assigned to HULU LLC: assignment of assignors interest (see document for details). Assignors: WAYE, LUCAS; SENG, KEVIN; BAJARIA, VIRAL; MORIAH, SHANE
Priority to US14/228,003 (patent US9305032B2)
Publication of US20140122511A1
Application granted
Publication of US8725750B1
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/986 Document structures and storage, e.g. HTML extensions

Definitions

  • a user may view a video in a media player.
  • the companies often seek to improve their service by analyzing events that occur while the users are using their client devices. For example, while viewing the video, the user performs different actions, such as seeking to different times in the video, stopping the video, hovering over icons, etc.
  • Web requests are generated to document the actions taken at the client devices (also referred to as “beacons”).
  • a server may aggregate information, such as the IP address of the computer being used; the time the material was viewed; the type of browser that was used, the type of action taken by the user, etc.
  • the beacons are logged and aggregated for the company.
  • the beacons include information that is in an unstructured format.
  • the unstructured format is not in a pre-defined data model that a company can easily store in a structured database. For example, many analysis applications are keyed to retrieve data in fields in a structured database.
  • the beacons do not include data that can easily be stored in the correct fields. Thus, if a company is going to analyze the information in the beacons, the company needs to transform the unstructured data into structured data.
  • the structured data organizes the data in a format desired by the company where the company can then analyze the structured data.
  • a method receives a specification for processing beacons.
  • the beacons are associated with an event occurring at a client while a user is interacting with a web application and include unstructured data.
  • the method then parses the specification to determine an object model including objects determined from the specification where different specifications are parsed into a format of the object model.
  • a generator is determined from a set of generators. Each generator is configured to process the format of the object model to generate a different type of target program to process the beacons and multiple generators can process different specifications that are parsed into the format of the object model.
  • the method then runs the generator with the object model to generate a target program configured to identify the beacons for the specification, determine unstructured data in the beacons that were specified in the specification, and transform the unstructured data into structured data as specified in the specification.
  • a non-transitory computer-readable storage medium containing instructions, that when executed, control a computer system to be configured for: receiving a specification for processing beacons, the beacons being associated with an event occurring at a client while a user is interacting with a web application and including unstructured data; parsing the specification to determine an object model including objects determined from the specification, wherein different specifications are parsed into a format of the object model; determining a generator from a set of generators, wherein each generator is configured to process the format of the object model to generate a different type of target program to process the beacons and multiple generators can process different specifications that are parsed into the format of the object model; and running the generator with the object model to generate a target program configured to identify the beacons for the specification, determine unstructured data in the beacons that were specified in the specification, and transform the unstructured data into structured data as specified in the specification.
  • an apparatus comprising: one or more computer processors; and a computer-readable storage medium comprising instructions, that when executed, control the one or more computer processors to be configured for: receiving a specification for processing beacons, the beacons being associated with an event occurring at a client while a user is interacting with a web application and including unstructured data; parsing the specification to determine an object model including objects determined from the specification, wherein different specifications are parsed into a format of the object model; determining a generator from a set of generators, wherein each generator is configured to process the format of the object model to generate a different type of target program to process the beacons and multiple generators can process different specifications that are parsed into the format of the object model; and running the generator with the object model to generate a target program configured to identify the beacons for the specification, determine unstructured data in the beacons that were specified in the specification, and transform the unstructured data into structured data as specified in the specification.
  • FIG. 1 depicts a simplified system for processing beacons according to one embodiment.
  • FIG. 2 shows an example of a compiler according to one embodiment.
  • FIG. 3 depicts a simplified flowchart for generating target programs according to one embodiment.
  • FIG. 4 shows a specification according to one embodiment.
  • FIG. 5 shows the relationship of objects within the composite, beacon, and basefact objects.
  • FIG. 6 shows an example of a target program according to one embodiment.
  • Described herein are techniques for a framework for processing beacons.
  • numerous examples and specific details are set forth in order to provide a thorough understanding of particular embodiments.
  • Particular embodiments as defined by the claims may include some or all of the features in these examples alone or in combination with other features described below, and may further include modifications and equivalents of the features and concepts described herein.
  • FIG. 1 depicts a simplified system 100 for processing beacons according to one embodiment.
  • System 100 includes clients 102, a server 104, beacon target programs 106, and a beacon target program generation compiler 108.
  • the beacons may include Unicode strings and URL-encoded binary strings. To obtain any further semantic meaning of the beacon data, the beacon data needs to be interpreted and transformed by target programs.
  • Although beacons, which may be web event logs for events that occur while users use clients 102, are described, other types of unstructured data may also be used.
  • beacons may also include extensible mark-up language (XML) specifications, hypertext mark-up language (HTML) code, and other human-readable documentation.
  • Users interact with clients 102 to produce events. For example, users may interact with websites on the worldwide web (WWW), such as through mouse clicks, hovering over objects, and other user interactions with web pages.
  • Beacons are created based on the events and include information for the actions taken by the users and may also include other metadata about the event.
  • the metadata may include user identification information, what platform (e.g., device type or operating system) is being used, what application is being used, etc.
  • the beacons may be unstructured data. Also, different clients 102 and different web sites may generate beacons in different formats.
  • a server 104 receives and stores the beacons for later processing.
  • server 104 may aggregate beacons from multiple network devices.
  • server 104 may be a distributed system of servers that are storing the beacons.
  • server 104 stores the beacons, but other storage devices may store the beacons.
  • target programs 106 may be executed to process the beacons.
  • target programs 106 may determine beacons that are of interest and then transform the unstructured data of the beacons into structured data that can be used by a company. For example, different target programs 106 may be interested in different types of beacons. Each target program 106 would identify the applicable beacons. Then, target programs 106 transform the unstructured data into structured data.
  • the structured data may be stored in a database for later querying, such as to generate reports.
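As a rough illustration of the filtering and transformation described above, the sketch below treats a beacon as a URL-encoded web request and turns it into a record with fixed field names. The request layout, the parameter names (`event`, `video_id`, `browser`), and the helper `beacon_to_row` are assumptions made for illustration; the source does not give a concrete beacon format.

```python
from urllib.parse import urlsplit, parse_qs


def beacon_to_row(raw_url, wanted_event, fields):
    """Return a structured row for a beacon of interest, else None.

    `fields` maps raw beacon parameter names to structured field names.
    All names here are hypothetical.
    """
    params = parse_qs(urlsplit(raw_url).query)
    # Filter: keep only beacons for the event this program cares about.
    if params.get("event", [""])[0] != wanted_event:
        return None
    # Transform: copy selected raw parameters into named structured fields.
    return {out: params.get(raw, [None])[0] for raw, out in fields.items()}


raw = "http://example.com/beacon?event=playback_start&video_id=42&browser=Firefox%2F16"
field_map = {"video_id": "videoId", "browser": "browserName"}
row = beacon_to_row(raw, "playback_start", field_map)
skipped = beacon_to_row(raw, "playback_stop", field_map)
```

In this sketch a beacon that does not match the requested event yields `None`, which loosely mirrors a target program ignoring beacons it is not interested in.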
  • compiler 108 receives a specification and uses the specification to automatically generate a target program 106.
  • Using the specification allows users to declaratively specify what beacons are of interest and what structured data is desired.
  • Compiler 108 then generates target programs 106 that can process the beacons and perform the desired transformations from unstructured data to structured data.
  • FIG. 2 shows a more detailed example of compiler 108 according to one embodiment.
  • Specifications 202 may be written using a specific grammar that declares what beacons are of interest and what structured data is desired. Users may write different specifications 202 to generate different structured data from different beacons.
  • an abstract syntax tree generator 203 first converts specifications 202 into abstract syntax trees 204.
  • the abstract syntax tree is an abstract way of representing the syntax of different specifications 202.
  • an abstract syntax tree is a tree representation of the syntactic structure of the input program. The syntax tree is built through the use of a parser, which produces a tree representation of the input program based on a grammar specification.
  • An object model generator 205 uses the abstract syntax trees to generate object models 206 .
  • Object models 206 convert nodes of the abstract syntax tree into objects that are in the object model.
  • the object model is used such that generators 208 can be written to read a specific format defined in the object model. This allows generators 208 to be reused to process different specifications 202.
  • specifications 202 may be written and parsed into object models 206.
  • object models 206 with different objects may be generated, but the same generators 208 may be used.
  • the same generator 208 may be used because each generator 208 is configured to parse the same format of an object model 206.
  • the object model is a simplified and generalized view of the input specification based on the abstract syntax tree. The object model is generated by passing over the abstract syntax tree multiple times. Specification correctness checks may be performed (semantic analysis), symbols may be resolved (e.g., various references that must be resolved and disambiguated), and a simplified structure is created (called the object model) so that generators 208 can be written more concisely.
  • Object models 206 are in a format that can be read by different generators 208-1 through 208-N.
  • Each generator 208-1 through 208-N may generate target programs #1-N, respectively.
  • some generators 208 may generate MapReduce source code, structured query language (SQL) queries, representational state transfer (REST) requests, HTML documentation, and other target programs.
  • Each generator 208 may be written to process the formats of object models 206 and thus multiple generators 208 do not need to be written for different specifications 202. That is, if MapReduce code is desired, the same MapReduce generator 208 is used for multiple specifications 202.
  • the objects in object model 206 may change, but the same generator 208 may be used.
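The reuse of generators across specifications can be pictured with a small sketch: one object model format, and several generator functions that each read that format and emit a different kind of output. The class names, the `GENERATORS` registry, and the emitted SQL and documentation text are assumptions for illustration only, and the example names are borrowed from the specification discussed for FIG. 4. The source does not show the real object model or generator interfaces.

```python
from dataclasses import dataclass


@dataclass
class FieldDef:
    name: str  # structured data field name
    kind: str  # "dimension" or "fact"


@dataclass
class ObjectModel:
    basefact: str
    beacon: str
    fields: list


def generate_sql(model):
    """Hypothetical SQL generator: dimensions are grouped, facts are summed."""
    dims = [f.name for f in model.fields if f.kind == "dimension"]
    facts = [f.name for f in model.fields if f.kind == "fact"]
    cols = dims + [f"SUM({f}) AS {f}" for f in facts]
    sql = f"SELECT {', '.join(cols)} FROM {model.beacon}"
    if dims and facts:
        sql += f" GROUP BY {', '.join(dims)}"
    return sql


def generate_docs(model):
    """Hypothetical documentation generator reading the same model format."""
    lines = [f"{model.basefact} (from beacon {model.beacon})"]
    lines += [f"  {f.name}: {f.kind}" for f in model.fields]
    return "\n".join(lines)


# Every generator accepts the same ObjectModel, so a new specification
# needs a new model instance but no new generator.
GENERATORS = {"sql": generate_sql, "docs": generate_docs}

model = ObjectModel(
    basefact="start_by_video_and_browser",
    beacon="playback_start",
    fields=[
        FieldDef("videoName", "dimension"),
        FieldDef("browserName", "dimension"),
        FieldDef("totalCount", "fact"),
    ],
)
sql_text = GENERATORS["sql"](model)
doc_text = GENERATORS["docs"](model)
```

The design point this tries to capture is that the objects inside the model vary per specification while the generator code stays fixed.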
  • FIG. 3 depicts a simplified flowchart for generating target programs 106 according to one embodiment.
  • compiler 108 receives a specification 202.
  • Specification 202 specifies which beacons to process and what transformations of the unstructured data to specified structured data are desired.
  • specification 202 does not include code that is used to process beacons and transform the unstructured data to structured data.
  • compiler 108 may check the specification for correctness. For example, compiler 108 may check the specification for semantic correctness, such as determining that a basefact is referencing a beacon that is not defined.
  • compiler 108 parses specification 202 into an abstract syntax tree 204.
  • the abstract syntax tree organizes the elements of specification 202 into a tree structure.
  • compiler 108 converts abstract syntax tree 204 into an object model 206.
  • compiler 108 parses nodes of abstract syntax tree 204 to generate object model 206.
  • Object model 206 organizes specification 202 into objects.
  • compiler 108 determines a generator 208 for a target program 106.
  • compiler 108 may receive a user selection of a generator 208.
  • the selected generator 208 is configured to produce a specific type of target program 106.
  • compiler 108 generates target program 106 using generator 208 based on object model 206.
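The conversion from syntax tree to object model, including the semantic check for a basefact that references an undefined beacon, might look roughly like the following. The tuple-based tree, the `SpecError` exception, and `build_object_model` are hypothetical stand-ins; the source does not describe the actual tree or model data structures.

```python
class SpecError(Exception):
    """Raised when a specification fails a semantic check (hypothetical)."""


def build_object_model(ast):
    """Build a simple object model from a flat stand-in syntax tree.

    `ast` is a list of (node_type, name, payload) tuples.
    """
    # Pass 1: collect the declared beacons.
    beacons = {name: payload for kind, name, payload in ast if kind == "beacon"}
    # Pass 2: check and resolve each basefact's beacon reference.
    basefacts = {}
    for kind, name, payload in ast:
        if kind != "basefact":
            continue
        ref = payload["beacon"]
        if ref not in beacons:
            raise SpecError(
                f"basefact {name!r} references undefined beacon {ref!r}"
            )
        basefacts[name] = {"beacon": ref, "beacon_fields": beacons[ref]}
    return {"beacons": beacons, "basefacts": basefacts}


good_ast = [
    ("beacon", "playback_start", ["selected_video", "user_browser", "count"]),
    ("basefact", "start_by_video_and_browser", {"beacon": "playback_start"}),
]
bad_ast = [("basefact", "orphan", {"beacon": "playback_stop"})]

model = build_object_model(good_ast)
try:
    build_object_model(bad_ast)
    error = None
except SpecError as exc:
    error = str(exc)
```

Making more than one pass over the tree, as here, lets references be resolved regardless of the order in which sections appear in the specification.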
  • FIG. 4 shows a specification 202 according to one embodiment.
  • Specification 202 produces a target program 106 to convert a video ID to a video name, transform the raw browser information for the browser used to play a video into a browser name, and count the number of times the video was played. It should be noted that specification 202 may not be a complete specification and has parts redacted, such as when a “ . . . ” is shown.
  • Specification 202 includes three sections of “composite”, “beacon”, and “basefact”.
  • a composite defines what is in the beacon, such as the raw data that is in the beacon, and how to transform the raw data in the beacon.
  • three composite objects of “Video”, “Browser”, and “Count” are shown. Composites may have any number of input fields and one or more output fields.
  • the Video composite object has an input parameter object named “video_id”. This is what the beacon parameter name is in a raw log line.
  • the unstructured data may include the term “video_id”.
  • the Video composite object includes an output field object called “video_name”. This is the field name after video_id is transformed.
  • a mapper object for “MapReduceJob” includes transformational logic for the output field object video_name.
  • the mapper object includes details for performing the transformation that is specified in the mapper definition located at conversionMethod. Additional mappers may also be included in a composite object that may perform other transformations.
  • other composite objects of “Browser” and “Count” are included. Details have not been provided, but would be similar to those found in the Video composite object. It will be understood that specification 202 may include any number of composite objects 402. For example, specification 202 may include additional composite objects (not shown) that may be used by other beacon objects.
  • a beacon object is identified as “playback_start” and uniquely identifies the beacon within specification 202. Because specification 202 may include multiple composite objects, the beacon object identifies which composite objects are part of this beacon object.
  • the beacon includes three field objects: “selected_video”, which references the Video composite object; “user_browser”, which references the Browser composite object; and “count”, which references the Count composite object. The field objects are used to refer back to composite objects.
  • specification 202 defines a basefact object of “start_by_video_and_browser”.
  • the basefact object is used to define what structured data is desired and what unstructured data should be used to populate the structured data.
  • the basefact object may use multiple beacon objects. For example, this basefact object uses the “playback_start” beacon object to determine applicable data. That is, this basefact ignores all other beacon objects that are not named “playback_start” in specification 202.
  • the basefact object includes three structured data field objects for the “playback_start” beacon.
  • the structured data fields may be different types, such as dimension or fact fields.
  • a dimension maps a field in the beacon to a structured data field.
  • a fact may perform a function (e.g., an aggregation function) on a field in the beacon to determine a result that is mapped to a structured data field.
  • a first structured data field of “videoName” is defined as a dimension of the video_name field object in the composite object referenced by the selected_video field object in the beacon object and a second structured data field of “browserName” is defined as a dimension from the name field object in the composite object referenced by the user_browser field object in the beacon object.
  • a third structured data field of “totalCount” is defined as a fact that is the aggregation of the count field object in the composite object referenced by the count field object in the beacon object.
  • compiler 108 selects a generator 208 that is used to generate a target program 106.
  • compiler 108 converts specification 202 into object model 206.
  • Generator 208 takes object model 206 and generates code in a software language that is used to process beacons.
  • compiler 108 generates MapReduce job code as a target program 106.
  • Target program 106 is configured to receive unstructured data, such as raw web event log lines, and generate structured data specified by the starts_by_video_and_browser basefact definition. That is, transformed data from the beacons is stored in structured data fields of videoName, browserName, and totalCount.
  • FIG. 5 shows the relationship of objects within the composite, beacon, and basefact objects that generator 208 analyzes to generate code for target program 106.
  • generator 208 identifies the beacon object for the basefact object.
  • specification 202 may include multiple beacon objects and the beacon object for this basefact object is the playback_start beacon object.
  • Generator 208 generates filtering code that determines which beacons should be processed by target program 106.
  • the structured data field objects in the basefact object point to field objects in the beacon object at 504.
  • selected_video, user_browser, and count are referenced in both the basefact and the beacon objects.
  • the field objects in the beacon object are associated with composite objects.
  • Generator 208 uses the referenced composite objects from the beacon object to generate instructions on how to map unstructured data to structured data. For example, generator 208 generates instructions on how to tokenize (breaking the text of the beacon into words or phrases) and transform raw web log data to structured data.
  • the basefact object defines the structured data by the terms videoName, browserName, and totalCount, which are structured data fields that can be defined in a database.
  • the transformations for the field objects in the basefact object are specified in the composite object that each beacon field object references as was discussed with respect to 506. Also, for the fact field object, generator 208 generates instructions to aggregate rows based on the count composite object.
  • Target program 106 can then be used to process beacons and produce the transformed data as specified in the basefact definition.
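The chain of references walked above (basefact field, then beacon field, then composite) can be sketched as a lookup over nested dictionaries. The dictionary layout and `plan_for` are illustrative assumptions. The `Browser` and `Count` input parameter names are also hypothetical, since the source gives details only for the `Video` composite.

```python
SPEC = {
    "composites": {
        # raw input parameter -> output field
        "Video": {"input": "video_id", "output": "video_name"},
        "Browser": {"input": "browser", "output": "name"},  # input assumed
        "Count": {"input": "count", "output": "count"},     # input assumed
    },
    "beacons": {
        # beacon field object -> referenced composite
        "playback_start": {
            "selected_video": "Video",
            "user_browser": "Browser",
            "count": "Count",
        }
    },
    "basefacts": {
        "start_by_video_and_browser": {
            "beacon": "playback_start",
            # structured field -> (type, beacon field, composite output field)
            "fields": {
                "videoName": ("dimension", "selected_video", "video_name"),
                "browserName": ("dimension", "user_browser", "name"),
                "totalCount": ("fact", "count", "count"),
            },
        }
    },
}


def plan_for(spec, basefact_name):
    """Resolve a basefact into a beacon filter plus per-column mappings."""
    basefact = spec["basefacts"][basefact_name]
    beacon_name = basefact["beacon"]
    beacon = spec["beacons"][beacon_name]
    columns = []
    for out_name, (kind, beacon_field, comp_field) in basefact["fields"].items():
        composite_name = beacon[beacon_field]
        composite = spec["composites"][composite_name]
        if composite["output"] != comp_field:
            raise KeyError(f"{comp_field} is not an output of {composite_name}")
        columns.append((out_name, kind, composite["input"], comp_field))
    return {"filter_beacon": beacon_name, "columns": columns}


plan = plan_for(SPEC, "start_by_video_and_browser")
```

A generator could then turn `plan["filter_beacon"]` into filtering code and each entry of `plan["columns"]` into tokenizing and transformation code, with fact columns additionally receiving aggregation code.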
  • FIG. 6 shows an example of target program 106 according to one embodiment.
  • Generator 208 may generate target program 106 based on specification 202 and object model 206.
  • the function “Map” defines the aggregator/reducer based on the MapReduce paradigm. Dimensions correspond to Keys, and Facts correspond to Values.
  • the term “playback_start” is based on which beacons were defined by specification 202. In this case, only events defined by playback_start beacons are reviewed.
  • the conversion found in the composite Video is found, and at 612, the conversion found in the composite Browser is found.
  • the functions “Identity<Long>()” and “StaticInputAction<Long>(1L)” are determined based on the fact “sum” in the basefact in specification 202. The above information is determined by reviewing object model 206 to generate the target program 106.
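The correspondence between dimensions and keys, and between facts and values, can be shown with a toy map and reduce pair. This is a plain Python sketch, not the generated MapReduce code of FIG. 6. The record layout and the function names `map_beacon` and `reduce_counts` are assumptions, and the beacons are assumed to have already been converted to video and browser names. Emitting a constant 1 per beacon and summing it is one reading of how `StaticInputAction<Long>(1L)` combined with the “sum” fact could yield a play count.

```python
from collections import defaultdict


def map_beacon(beacon):
    """Emit (key, value) pairs; only playback_start beacons are kept."""
    if beacon.get("event") != "playback_start":
        return []
    # Dimensions (videoName, browserName) form the key.
    key = (beacon["video_name"], beacon["browser_name"])
    # The fact contributes a constant 1 per beacon as the value.
    return [(key, 1)]


def reduce_counts(pairs):
    """Sum the values for each key to produce totalCount."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


beacons = [
    {"event": "playback_start", "video_name": "Show A", "browser_name": "Firefox"},
    {"event": "playback_start", "video_name": "Show A", "browser_name": "Firefox"},
    {"event": "seek", "video_name": "Show A", "browser_name": "Firefox"},
    {"event": "playback_start", "video_name": "Show B", "browser_name": "Chrome"},
]
pairs = [pair for beacon in beacons for pair in map_beacon(beacon)]
totals = reduce_counts(pairs)
```

Each entry of `totals` corresponds to one structured row: the key supplies `videoName` and `browserName`, and the summed value supplies `totalCount`.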
  • compiler 108 generates target program 106, which can map unstructured data to structured data.
  • a user can declare the structured data that is desired and the transformations needed to transform unstructured data to structured data.
  • Compiler 108 then generates the software code to perform the desired transformations. A user thus does not need to write software code for target program 106.
  • particular embodiments leverage object model 206 that allows different generators 208 to operate on the object model.
  • different specifications 202 may be parsed into an object model 206 that can be operated on by the same generators 208.
  • Particular embodiments may be implemented in a non-transitory computer-readable storage medium for use by or in connection with the instruction execution system, apparatus, system, or machine.
  • the computer-readable storage medium contains instructions for controlling a computer system to perform a method described by particular embodiments.
  • the instructions, when executed by one or more computer processors, may be operable to perform that which is described in particular embodiments.

Abstract

A method receives a specification for processing beacons. The beacons are associated with an event occurring at a client while a user is interacting with a web application and include unstructured data. The method parses the specification to determine an object model including objects determined from the specification where different specifications are parsed into a format of the object model. A generator is determined and each generator is configured to process the format of the object model to generate a different type of target program to process the beacons and multiple generators can process different specifications that are parsed into the format of the object model. The method runs the generator with the object model to generate a target program configured to identify the beacons for the specification, determine unstructured data in the beacons that were specified in the specification, and transform the unstructured data into structured data.

Description

    BACKGROUND
  • Companies provide services that users access using client devices. For example, a user may view a video in a media player. The companies often seek to improve their service by analyzing events that occur while the users are using their client devices. For example, while viewing the video, the user performs different actions, such as seeking to different times in the video, stopping the video, hovering over icons, etc. Web requests are generated to document the actions taken at the client devices (also referred to as “beacons”). For example, when a user's browser requests information from a website, a server may aggregate information, such as the IP address of the computer being used; the time the material was viewed; the type of browser that was used, the type of action taken by the user, etc. The beacons are logged and aggregated for the company.
  • The beacons include information that is in an unstructured format. The unstructured format is not in a pre-defined data model that a company can easily store in a structured database. For example, many analysis applications are keyed to retrieve data in fields in a structured database. The beacons do not include data that can easily be stored in the correct fields. Thus, if a company is going to analyze the information in the beacons, the company needs to transform the unstructured data into structured data. The structured data organizes the data in a format desired by the company where the company can then analyze the structured data.
  • Programs need to be written to perform the transformation of the unstructured data of the beacons into structured data. However, each type of beacon has different types of information. Thus, for each type of beacon that the company wants to analyze, a programmer needs to write a program to transform the unstructured data for the beacon to the desired type of structured data. Writing the programs to perform these transformations may be a tedious process. Also, having to write code for the programs limits the number of users that can write the programs because most users are not programmers.
  • SUMMARY
  • In one embodiment, a method receives a specification for processing beacons. The beacons are associated with an event occurring at a client while a user is interacting with a web application and include unstructured data. The method then parses the specification to determine an object model including objects determined from the specification where different specifications are parsed into a format of the object model. A generator is determined from a set of generators. Each generator is configured to process the format of the object model to generate a different type of target program to process the beacons and multiple generators can process different specifications that are parsed into the format of the object model. The method then runs the generator with the object model to generate a target program configured to identify the beacons for the specification, determine unstructured data in the beacons that were specified in the specification, and transform the unstructured data into structured data as specified in the specification.
  • In one embodiment, a non-transitory computer-readable storage medium is provided containing instructions, that when executed, control a computer system to be configured for: receiving a specification for processing beacons, the beacons being associated with an event occurring at a client while a user is interacting with a web application and including unstructured data; parsing the specification to determine an object model including objects determined from the specification, wherein different specifications are parsed into a format of the object model; determining a generator from a set of generators, wherein each generator is configured to process the format of the object model to generate a different type of target program to process the beacons and multiple generators can process different specifications that are parsed into the format of the object model; and running the generator with the object model to generate a target program configured to identify the beacons for the specification, determine unstructured data in the beacons that were specified in the specification, and transform the unstructured data into structured data as specified in the specification.
  • In one embodiment, an apparatus is provided comprising: one or more computer processors; and a computer-readable storage medium comprising instructions, that when executed, control the one or more computer processors to be configured for: receiving a specification for processing beacons, the beacons being associated with an event occurring at a client while a user is interacting with a web application and including unstructured data; parsing the specification to determine an object model including objects determined from the specification, wherein different specifications are parsed into a format of the object model; determining a generator from a set of generators, wherein each generator is configured to process the format of the object model to generate a different type of target program to process the beacons and multiple generators can process different specifications that are parsed into the format of the object model; and running the generator with the object model to generate a target program configured to identify the beacons for the specification, determine unstructured data in the beacons that were specified in the specification, and transform the unstructured data into structured data as specified in the specification.
  • The following detailed description and accompanying drawings provide a better understanding of the nature and advantages of particular embodiments.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 depicts a simplified system for processing beacons according to one embodiment.
  • FIG. 2 shows an example of a compiler according to one embodiment.
  • FIG. 3 depicts a simplified flowchart for generating target programs according to one embodiment.
  • FIG. 4 shows a specification according to one embodiment.
  • FIG. 5 shows the relationship of objects within the composite, beacon, and basefact objects.
  • FIG. 6 shows an example of a target program according to one embodiment.
  • DETAILED DESCRIPTION
  • Described herein are techniques for a framework for processing beacons. In the following description, for purposes of explanation, numerous examples and specific details are set forth in order to provide a thorough understanding of particular embodiments. Particular embodiments as defined by the claims may include some or all of the features in these examples alone or in combination with other features described below, and may further include modifications and equivalents of the features and concepts described herein.
  • FIG. 1 depicts a simplified system 100 for processing beacons according to one embodiment. System 100 includes clients 102, a server 104, beacon target programs 106, and a beacon target program generation compiler 108. The beacons may include Unicode strings and URL-encoded binary strings. To obtain any further semantic meaning of the beacon data, the beacon data needs to be interpreted and transformed by target programs. Although beacons are described, which may be web event logs for events that occur while users use clients 102, other types of unstructured data may also be processed. For example, beacons may also include extensible markup language (XML) specifications, hypertext markup language (HTML) code, and other human-readable documentation.
  • Users interact with clients 102 to produce events. For example, users may interact with websites on the worldwide web (WWW), such as through mouse clicks, hovering over objects, and other user interactions with web pages. Beacons are created based on the events and include information for the actions taken by the users and may also include other metadata about the event. For example, the metadata may include user identification information, what platform (e.g., device type or operating system) is being used, what application is being used, etc. The beacons may be unstructured data. Also, different clients 102 and different web sites may generate beacons in different formats.
  • A server 104 receives and stores the beacons for later processing. In one example, server 104 may aggregate beacons from multiple network devices. Also, server 104 may be a distributed system of servers that are storing the beacons. In this example, server 104 stores the beacons, but other storage devices may store the beacons.
  • In one example, target programs 106 may be executed to process the beacons. When executed, target programs 106 may determine beacons that are of interest and then transform the unstructured data of the beacons into structured data that can be used by a company. For example, different target programs 106 may be interested in different types of beacons. Each target program 106 would identify the applicable beacons. Then, target programs 106 transform the unstructured data into structured data. The structured data may be stored in a database for later querying, such as to generate reports.
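  • As a rough illustration of what a single hand-written target program 106 might do, the following Python sketch filters raw beacon lines for one event of interest and transforms the matching lines into structured rows. The line layout ("<event>?<url-encoded parameters>"), the `VIDEO_NAMES` lookup table, the `browser` parameter name, and the function name `process` are assumptions made for this example only; the description does not fix a beacon format.

```python
from urllib.parse import parse_qsl

# Hypothetical lookup table; a real system would consult some catalog.
VIDEO_NAMES = {"42": "Pilot Episode"}


def process(raw_lines, wanted_event="playback_start"):
    """Keep only beacons of interest and emit structured rows."""
    rows = []
    for line in raw_lines:
        # Assumed line layout: "<event>?<url-encoded parameters>".
        event, _, query = line.partition("?")
        if event != wanted_event:
            continue  # not a beacon this program is interested in
        # parse_qsl percent-decodes parameter names and values.
        params = dict(parse_qsl(query))
        rows.append({
            "videoName": VIDEO_NAMES.get(params.get("video_id"), "unknown"),
            "browserName": params.get("browser", "unknown"),
        })
    return rows


raw = [
    "playback_start?video_id=42&browser=Firefox%2F15",
    "ad_click?ad_id=7",
]
rows = process(raw)
```

  • Only the first line survives the filter, and its parameters become the named fields of one structured row that could then be stored in a database.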
  • Conventionally, users would have to write target programs 106 for each type of beacon that a company wanted to process. However, particular embodiments automatically generate target programs 106. For example, as will be described in more detail below, compiler 108 receives a specification and uses the specification to automatically generate a target program 106. Using the specification allows users to declaratively specify what beacons are of interest and what structured data is desired. Compiler 108 then generates target programs 106 that can process the beacons and perform the desired transformations from unstructured data to structured data. By using the specification to declare what is wanted, users do not have to write a program that is used to process the beacons. This may allow more users to specify how to process beacons.
  • The process of generating a target program 106 from a specification will now be described in more detail. FIG. 2 shows a more detailed example of compiler 108 according to one embodiment. Specifications 202 may be written using a specific grammar that declares what beacons are of interest and what structured data is desired. Users may write different specifications 202 to generate different structured data from different beacons.
  • In one embodiment, an abstract syntax tree generator 203 first converts specifications 202 into abstract syntax trees 204. The abstract syntax tree is an abstract way of representing the syntax of different specifications 202. In one embodiment, an abstract syntax tree is a tree representation of the syntactic structure of the input program. The syntax tree is built through the use of a parser, which produces a tree representation of the input program based on a grammar specification.
  • An object model generator 205 uses the abstract syntax trees to generate object models 206. Object models 206 convert nodes of the abstract syntax tree into objects that are in the object model. The object model is used such that generators 208 can be written to read a specific format defined in the object model. This allows generators 208 to be reused to process different specifications 202. Because beacons may have similar formats of data, specifications 202 may be written and parsed into object models 206. Thus, to process different types of beacons, object models 206 with different objects may be generated, but the same generators 208 may be used. Also, even though the information that is being transformed from unstructured data to structured data may be different, the same generator 208 may be used because each generator 208 is configured to parse the same format of an object model 206. In one embodiment, the object model is a simplified and generalized view of the input specification based on the abstract syntax tree. The object model is generated by passing over the abstract syntax tree multiple times. Specification correctness checks may be performed (semantic analysis), symbols may be resolved (e.g., various references that must be resolved and disambiguated), and a simplified structure is created (called the object model) so that generators 208 can be written more concisely.
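  • The multi-pass conversion from abstract syntax tree to object model can be sketched as follows. This is a minimal Python illustration, not the structure of FIG. 2: the `Node` class, the node kinds "beacon", "basefact", and "beacon_ref", and the function `build_object_model` are hypothetical. The first pass collects declared beacons into a symbol table, and the second pass resolves each basefact's beacon reference and rejects references to undefined beacons.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A generic AST node: a kind, a name, and child nodes."""
    kind: str
    name: str
    children: list = field(default_factory=list)


def build_object_model(ast_root: Node) -> dict:
    # Pass 1: collect every declared beacon by name (a symbol table).
    beacons = {n.name: n for n in ast_root.children if n.kind == "beacon"}

    # Pass 2: resolve each basefact's beacon references and check them.
    model = {"beacons": sorted(beacons), "basefacts": {}}
    for n in ast_root.children:
        if n.kind != "basefact":
            continue
        refs = [c.name for c in n.children if c.kind == "beacon_ref"]
        for ref in refs:
            if ref not in beacons:
                raise ValueError(
                    f"basefact {n.name!r} references undefined beacon {ref!r}"
                )
        model["basefacts"][n.name] = {"beacons": refs}
    return model


ast = Node("spec", "example", [
    Node("beacon", "playback_start"),
    Node("basefact", "start_by_video_and_browser",
         [Node("beacon_ref", "playback_start")]),
])
model = build_object_model(ast)
```

  • The resulting dictionary is flatter than the tree it came from, which is the property that lets a generator be written against one fixed shape.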
  • Object models 206 are in a format that can be read by different generators 208-1-208-N. Each generator 208-1-208-N may generate target programs #1-N, respectively. For example, some generators 208 may generate MapReduce source code, structured query language (SQL) queries, representational state transfer (REST) requests, HTML documentation, and other target programs. Each generator 208 may be written to process the formats of object models 206 and thus multiple generators 208 do not need to be written for different specifications 202. That is, if MapReduce code is desired, the same MapReduce generator 208 is used for multiple specifications 202. The objects in object model 206 may change, but the same generator 208 may be used.
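  • A small Python sketch may help show why one object model format allows generators to be reused. Here two hypothetical generators, `generate_sql` and `generate_docs`, read the same dictionary-shaped object model and emit different kinds of output. The dictionary layout, the `GENERATORS` registry, and the choice to treat the beacon name as a table name are all assumptions for illustration.

```python
def generate_sql(model: dict) -> str:
    """Emit a SQL aggregation query from the object model."""
    dims = ", ".join(model["dimensions"])
    facts = ", ".join(
        f"SUM({src}) AS {name}" for name, src in model["facts"].items()
    )
    # Assumption: the beacon name doubles as the source table name.
    return f"SELECT {dims}, {facts} FROM {model['beacon']} GROUP BY {dims}"


def generate_docs(model: dict) -> str:
    """Emit human-readable documentation from the same object model."""
    lines = [f"Basefact {model['name']} (beacon: {model['beacon']})"]
    lines += [f"  dimension: {d}" for d in model["dimensions"]]
    lines += [f"  fact: {f}" for f in model["facts"]]
    return "\n".join(lines)


# Registry of available generators, keyed by target type.
GENERATORS = {"sql": generate_sql, "docs": generate_docs}


def compile_model(model: dict, target: str) -> str:
    """Run the selected generator against the object model."""
    return GENERATORS[target](model)


model = {
    "name": "start_by_video_and_browser",
    "beacon": "playback_start",
    "dimensions": ["videoName", "browserName"],
    "facts": {"totalCount": "count"},
}
sql = compile_model(model, "sql")
docs = compile_model(model, "docs")
```

  • A different specification would produce a different `model` dictionary, but neither generator function would need to change.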
  • FIG. 3 depicts a simplified flowchart for generating target programs 106 according to one embodiment. At 302, compiler 108 receives a specification 202. Specification 202 specifies which beacons to process and what transformations of the unstructured data to specified structured data are desired. In one embodiment, specification 202 does not include code that is used to process beacons and transform the unstructured data to structured data. Also, compiler 108 may check the specification for correctness. For example, compiler 108 may check the specification for semantic correctness, such as by determining that a basefact is referencing a beacon that is not defined.
  • At 304, compiler 108 parses specification 202 into an abstract syntax tree 204. The abstract syntax tree organizes the elements of specification 202 into a tree structure.
  • At 306, compiler 108 converts abstract syntax tree 204 into an object model 206. For example, compiler 108 parses nodes of abstract syntax tree 204 to generate object model 206. Object model 206 organizes specification 202 into objects.
  • At 308, compiler 108 determines a generator 208 for a target program 106. For example, compiler 108 may receive a user selection of a generator 208. The selected generator 208 is configured to produce a specific type of target program 106.
  • At 310, compiler 108 generates target program 106 for generator 208 based on object model 206. To illustrate the above process of generating target program 106 from specification 202, an example specification 202 will be described. FIG. 4 shows a specification 202 according to one embodiment. Specification 202 produces a target program 106 to convert a video ID to a video name, transform a browser name for the browser used to play a video to a browser name, and count the number of times the video was played. It should be noted that specification 202 may not be a complete specification and has parts redacted, such as when a “ . . . ” is shown.
  • Specification 202 includes three sections of “composite”, “beacon”, and “basefact”. A composite defines what is in the beacon, such as the raw data that is in the beacon, and how to transform the raw data in the beacon. At 402, three composite objects of “Video”, “Browser”, and “Count” are shown. Composites may have any number of input fields and one or more output fields. At 404, the Video composite object has an input parameter object named “video_id”. This is what the beacon parameter name is in a raw log line. For example, the unstructured data may include the term “video_id”. At 408, the Video composite object includes an output field object called “video_name”. This is the field name after video_id is transformed. At 410, a mapper object for “MapReduceJob” includes transformational logic for the output field object video_name. The mapper object includes details for performing the transformation that is specified in the mapper definition located at conversionMethod. Additional mappers may also be included in a composite object that may perform other transformations. At 412, other composite objects of “Browser” and “Count” are included. Details have not been provided, but would be similar to those found in the Video composite object. It will be understood that specification 202 may include any number of composite objects 402. For example, specification 202 may include additional composite objects (not shown) that may be used by other beacon objects.
  • At 412, a beacon object is identified as “playback_start” and uniquely identifies the beacon within specification 202. Because specification 202 may include multiple composite objects, the beacon object identifies which composite objects are part of this beacon object. At 414, the beacon includes three field objects: “selected_video”, which references the Video composite object; “user_browser”, which references the Browser composite object; and “count”, which references the Count composite object. The field objects are used to refer back to composite objects.
  • At 416, specification 202 defines a basefact object of “start_by_video_and_browser”. The basefact object is used to define what structured data is desired and what unstructured data should be used to populate the structured data. The basefact object may use multiple beacon objects. For example, this basefact object uses the “playback_start” beacon object to determine applicable data. That is, this basefact ignores all other beacon objects that are not named “playback_start” in specification 202. At 418, the basefact object includes three structured data field objects for the “playback_start” beacon. The structured data fields may be different types, such as dimension or fact fields. A dimension maps a field in the beacon to a structured data field. A fact may perform a function (e.g., an aggregation function) on a field in the beacon to determine a result that is mapped to a structured data field.
  • A first structured data field of “videoName” is defined as a dimension of the video_name field object in the composite object referenced by the selected_video field object in the beacon object and a second structured data field of “browserName” is defined as a dimension from the name field object in the composite object referenced by the user_browser field object in the beacon object. A third structured data field of “totalCount” is defined as a fact that is the aggregation of the count field object in the composite object referenced by the count field object in the beacon object.
  • Upon receiving specification 202, compiler 108 selects a generator 208 that is used to generate a target program 106. As discussed above, compiler 108 converts specification 202 into object model 206. Generator 208 takes object model 206 and generates code in a software language that is used to process beacons. In one embodiment, compiler 108 generates MapReduce job code as a target program 106. Target program 106 is configured to receive unstructured data, such as raw web event log lines, and generate structured data specified by the starts_by_video_and_browser basefact definition. That is, transformed data from the beacons is stored in structured data fields of videoName, browserName, and totalCount.
  • FIG. 5 shows the relationship of objects within the composite, beacon, and basefact objects that generator 208 analyzes to generate code for target program 106. At 502, generator 208 identifies the beacon object for the basefact object. For example, specification 202 may include multiple beacon objects and the beacon object for this basefact object is the playback_start beacon object. Generator 208 generates filtering code that determines which beacons should be processed by target program 106.
  • The structured data field objects in the basefact object point to field objects in the beacon object at 504. For example, selected_video, user_browser, and count are referenced in both the basefact and the beacon objects. To determine which composite objects these structured data field objects are associated with, at 506, the field objects in the beacon object are associated with composite objects.
  • Generator 208 then uses the referenced composite objects from the beacon object to generate instructions on how to map unstructured data to structured data. For example, generator 208 generates instructions on how to tokenize (breaking the text of the beacon into words or phrases) and transform raw web log data to structured data. For example, at 508, the basefact object defines the structured data by the terms videoName, browserName, and totalCount, which are structured data fields that can be defined in a database. The transformations for the field objects in the basefact object are specified in the composite object that each beacon field object references as was discussed with respect to 506. Also, for the fact field object, generator 208 generates instructions to aggregate rows based on the count composite object.
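  • The chain of references that generator 208 follows (basefact object, then beacon field object, then composite object) can be sketched in Python as below. The nested-dictionary layout and the function name `resolve` are illustrative assumptions, not the grammar of FIG. 4. The input parameter name for the Browser composite and the absence of an input for the Count composite are also assumptions, since the description does not give them.

```python
SPEC = {
    "composites": {
        "Video": {"input": "video_id", "output": "video_name"},
        "Browser": {"input": "browser", "output": "name"},  # input name assumed
        "Count": {"input": None, "output": "count"},        # no input assumed
    },
    "beacons": {
        "playback_start": {
            "selected_video": "Video",
            "user_browser": "Browser",
            "count": "Count",
        }
    },
    "basefact": {
        "name": "start_by_video_and_browser",
        "beacon": "playback_start",
        # structured field -> (kind, beacon field, composite output field)
        "fields": {
            "videoName": ("dimension", "selected_video", "video_name"),
            "browserName": ("dimension", "user_browser", "name"),
            "totalCount": ("fact", "count", "count"),
        },
    },
}


def resolve(spec: dict) -> dict:
    """Follow basefact -> beacon field -> composite for each structured field."""
    basefact = spec["basefact"]
    beacon = spec["beacons"][basefact["beacon"]]   # which beacon applies (502)
    plan = {}
    for target, (kind, beacon_field, output) in basefact["fields"].items():
        composite_name = beacon[beacon_field]      # field -> composite (504, 506)
        composite = spec["composites"][composite_name]
        if composite["output"] != output:
            raise ValueError(f"{composite_name} has no output field {output!r}")
        plan[target] = {
            "kind": kind,
            "composite": composite_name,
            "input": composite["input"],
            "output": output,
        }
    return plan


plan = resolve(SPEC)
```

  • The returned plan records, for each structured data field, which raw input parameter feeds it and whether it is a dimension or a fact, which is the information a generator would need before emitting transformation and aggregation instructions.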
  • Generator 208 then outputs the final software code that is compiled into target program 106. Target program 106 can then be used to process beacons and produce the transformed data as specified in the basefact definition.
  • FIG. 6 shows an example of target program 106 according to one embodiment. Generator 208 may generate target program 106 based on specification 202 and object model 206. At 602, the function “Map” defines the aggregator/reducer based on the MapReduce paradigm. Dimensions correspond to Keys, and Facts correspond to Values. At 604, the field “totalCount” corresponds to the structured data field defined in the basefact object of specification 202. Also, at 606, the “+=” symbol is determined based on the “sum” function in specification 202 that is an aggregator.
  • At 608, the term “playback_start” is based on which beacons were defined by specification 202. In this case, only events defined by playback_start beacons are reviewed. At 610, the conversion found in the composite Video is found, and at 612, the conversion found in the composite Browser is found. Further, at 614, the functions “Identity<Long>( )” and “StaticInputAction<Long>(1L)” are determined based on the fact “sum” in the basefact in specification 202. The above information is determined by reviewing object model 206 to generate the target program 106.
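  • The correspondence between dimensions and keys, and between facts and values, can be sketched with a small in-memory map and reduce in Python. This is not the code of FIG. 6, which is not reproduced here; the function names, the pre-parsed beacon dictionaries, and the `VIDEO_NAMES` table are hypothetical. Each playback_start beacon contributes a static value of 1, and the reducer sums the values with "+=".

```python
from collections import defaultdict

# Hypothetical video_id -> video_name conversion table.
VIDEO_NAMES = {"42": "Pilot Episode", "43": "Second Episode"}


def map_beacon(beacon):
    """Emit (key, value) for playback_start beacons; skip everything else."""
    if beacon["event"] != "playback_start":
        return None
    # Dimensions form the key.
    key = (VIDEO_NAMES.get(beacon["video_id"], "unknown"), beacon["browser"])
    # The fact is the value: each beacon counts as one playback.
    return key, 1


def reduce_counts(pairs):
    """Sum the values for each key and emit structured rows."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value  # the "sum" aggregator
    return [
        {"videoName": v, "browserName": b, "totalCount": n}
        for (v, b), n in sorted(totals.items())
    ]


beacons = [
    {"event": "playback_start", "video_id": "42", "browser": "Firefox"},
    {"event": "playback_start", "video_id": "42", "browser": "Firefox"},
    {"event": "playback_start", "video_id": "43", "browser": "Chrome"},
    {"event": "ad_click", "ad_id": "7"},
]
pairs = [p for p in map(map_beacon, beacons) if p is not None]
rows = reduce_counts(pairs)
```

  • In a real MapReduce job the map and reduce phases would run on separate workers over partitioned log files; the in-memory lists here only stand in for that data flow.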
  • Accordingly, compiler 108 generates target program 106, which can map unstructured data to structured data. A user can declare the structured data that was desired and the transformations needed to transform unstructured data to structured data. Compiler 108 then generates the software code to perform the desired transformations. A user thus does not need to write software code for target program 106.
  • Further, particular embodiments leverage object model 206 that allows different generators 208 to operate on the object model. Thus, different specifications 202 may be parsed into an object model 206 that can be operated on by the same generators 208.
  • Particular embodiments may be implemented in a non-transitory computer-readable storage medium for use by or in connection with the instruction execution system, apparatus, system, or machine. The computer-readable storage medium contains instructions for controlling a computer system to perform a method described by particular embodiments. The instructions, when executed by one or more computer processors, may be operable to perform that which is described in particular embodiments.
  • As used in the description herein and throughout the claims that follow, “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.
  • The above description illustrates various embodiments along with examples of how aspects of particular embodiments may be implemented. The above examples and embodiments should not be deemed to be the only embodiments, and are presented to illustrate the flexibility and advantages of particular embodiments as defined by the following claims. Based on the above disclosure and the following claims, other arrangements, embodiments, implementations and equivalents may be employed without departing from the scope hereof as defined by the claims.

Claims (22)

What is claimed is:
1. A method comprising:
receiving a specification for processing beacons, the beacons being associated with an event occurring at a client while a user is interacting with a web application and including unstructured data;
parsing, by a computer system, the specification to determine an object model including objects determined from the specification, wherein different specifications are parsed into a format of the object model;
determining, by the computer system, a generator from a set of generators, wherein each generator is configured to process the format of the object model to generate a different type of target program to process the beacons and multiple generators can process different specifications that are parsed into the format of the object model; and
running, by the computer system, the generator with the object model to generate a target program configured to identify the beacons for the specification, determine unstructured data in the beacons that were specified in the specification, and transform the unstructured data into structured data as specified in the specification.
2. The method of claim 1, wherein parsing the specification comprises determining a set of composite objects specifying a set of input parameters in the beacon and transformations to transform the set of input parameters to a set of output fields, a beacon object including a set of field objects that identify composite objects for the beacon object, and a basefact object including a set of structured data objects that identify the set of output fields in the composite object to map to a set of structured data fields.
3. The method of claim 2, wherein running the generator comprises determining first information for the beacon object referenced in the basefact object to determine which beacon objects are applicable for the basefact object.
4. The method of claim 3, wherein running the generator comprises determining second information for the set of structured data fields referenced in the basefact object to determine which output fields map to which structured data fields.
5. The method of claim 4, wherein running the generator comprises determining third information for a set of transformations for the set of input parameters in the set of composite objects to determine how to perform transformations to transform the set of input parameters to the set of output fields.
6. The method of claim 5, wherein running the generator comprises generating instructions for the target program using the first information, the second information, and the third information to transform the set of input fields to the set of output fields and map the set of output fields to the set of structured data fields.
7. The method of claim 6, wherein the first information, the second information, and the third information comprise software code.
8. The method of claim 1, further comprising parsing the specification to determine an abstract syntax tree, wherein the object model is determined from the abstract syntax tree.
9. A non-transitory computer-readable storage medium containing instructions, that when executed, control a computer system to be configured for:
receiving a specification for processing beacons, the beacons being associated with an event occurring at a client while a user is interacting with a web application and including unstructured data;
parsing the specification to determine an object model including objects determined from the specification, wherein different specifications are parsed into a format of the object model;
determining a generator from a set of generators, wherein each generator is configured to process the format of the object model to generate a different type of target program to process the beacons and multiple generators can process different specifications that are parsed into the format of the object model; and
running the generator with the object model to generate a target program configured to identify the beacons for the specification, determine unstructured data in the beacons that were specified in the specification, and transform the unstructured data into structured data as specified in the specification.
10. The non-transitory computer-readable storage medium of claim 9, wherein parsing the specification comprises determining a set of composite objects specifying a set of input parameters in the beacon and transformations to transform the set of input parameters to a set of output fields, a beacon object including a set of field objects that identify composite objects for the beacon object, and a basefact object including a set of structured data objects that identify the set of output fields in the composite object to map to a set of structured data fields.
11. The non-transitory computer-readable storage medium of claim 10, wherein running the generator comprises determining first information for the beacon object referenced in the basefact object to determine which beacon objects are applicable for the basefact object.
12. The non-transitory computer-readable storage medium of claim 11, wherein running the generator comprises determining second information for the set of structured data fields referenced in the basefact object to determine which output fields map to which structured data fields.
13. The non-transitory computer-readable storage medium of claim 12, wherein running the generator comprises determining third information for a set of transformations for the set of input parameters in the set of composite objects to determine how to perform transformations to transform the set of input parameters to the set of output fields.
14. The non-transitory computer-readable storage medium of claim 13, wherein running the generator comprises generating instructions for the target program using the first information, the second information, and the third information to transform the set of input fields to the set of output fields and map the set of output fields to the set of structured data fields.
15. The non-transitory computer-readable storage medium of claim 14, wherein the first information, the second information, and the third information comprise software code.
16. The non-transitory computer-readable storage medium of claim 9, further comprising parsing the specification to determine an abstract syntax tree, wherein the object model is determined from the abstract syntax tree.
17. An apparatus comprising:
one or more computer processors; and
a computer-readable storage medium comprising instructions, that when executed, control the one or more computer processors to be configured for:
receiving a specification for processing beacons, the beacons being associated with an event occurring at a client while a user is interacting with a web application and including unstructured data;
parsing the specification to determine an object model including objects determined from the specification, wherein different specifications are parsed into a format of the object model;
determining a generator from a set of generators, wherein each generator is configured to process the format of the object model to generate a different type of target program to process the beacons and multiple generators can process different specifications that are parsed into the format of the object model; and
running the generator with the object model to generate a target program configured to identify the beacons for the specification, determine unstructured data in the beacons that were specified in the specification, and transform the unstructured data into structured data as specified in the specification.
18. The apparatus of claim 17, wherein parsing the specification comprises determining a set of composite objects specifying a set of input parameters in the beacon and transformations to transform the set of input parameters to a set of output fields, a beacon object including a set of field objects that identify composite objects for the beacon object, and a basefact object including a set of structured data objects that identify the set of output fields in the composite object to map to a set of structured data fields.
19. The apparatus of claim 18, wherein running the generator comprises determining first information for the beacon object referenced in the basefact object to determine which beacon objects are applicable for the basefact object.
20. The apparatus of claim 19, wherein running the generator comprises determining third information for a set of transformations for the set of input parameters in the set of composite objects to determine how to perform transformations to transform the set of input parameters to the set of output fields.
21. The apparatus of claim 20, wherein running the generator comprises generating instructions for the target program using the first information, the second information, and the third information to transform the set of input fields to the set of output fields and map the set of output fields to the set of structured data fields.
22. The apparatus of claim 21, wherein the first information, the second information, and the third information comprise software code.
US13/660,788 2012-10-25 2012-10-25 Framework for generating programs to process beacons Active US8725750B1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US13/660,788 US8725750B1 (en) 2012-10-25 2012-10-25 Framework for generating programs to process beacons
US14/228,003 US9305032B2 (en) 2012-10-25 2014-03-27 Framework for generating programs to process beacons

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US13/660,788 US8725750B1 (en) 2012-10-25 2012-10-25 Framework for generating programs to process beacons

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/228,003 Continuation US9305032B2 (en) 2012-10-25 2014-03-27 Framework for generating programs to process beacons

Publications (2)

Publication Number Publication Date
US20140122511A1 (en) 2014-05-01
US8725750B1 US8725750B1 (en) 2014-05-13

Family

ID=50548393

Family Applications (2)

Application Number Title Priority Date Filing Date
US13/660,788 Active US8725750B1 (en) 2012-10-25 2012-10-25 Framework for generating programs to process beacons
US14/228,003 Active 2033-01-05 US9305032B2 (en) 2012-10-25 2014-03-27 Framework for generating programs to process beacons

Family Applications After (1)

Application Number Title Priority Date Filing Date
US14/228,003 Active 2033-01-05 US9305032B2 (en) 2012-10-25 2014-03-27 Framework for generating programs to process beacons

Country Status (1)

Country Link
US (2) US8725750B1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10951685B1 (en) * 2018-07-19 2021-03-16 Poetic Systems, Llc Adaptive content deployment
US11288448B2 (en) * 2019-07-26 2022-03-29 Arista Networks, Inc. Techniques for implementing a command line interface

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8392452B2 (en) * 2010-09-03 2013-03-05 Hulu Llc Method and apparatus for callback supplementation of media program metadata
US8868648B2 (en) 2012-05-14 2014-10-21 Business Objects Software Ltd. Accessing open data using business intelligence tools
US20140214897A1 (en) * 2013-01-31 2014-07-31 Yuankai Zhu SYSTEMS AND METHODS FOR ACCESSING A NoSQL DATABASE USING BUSINESS INTELLIGENCE TOOLS
US10803083B2 (en) 2015-08-27 2020-10-13 Infosys Limited System and method of generating platform-agnostic abstract syntax tree

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8015143B2 (en) * 2002-05-22 2011-09-06 Estes Timothy W Knowledge discovery agent system and method
CA2528492A1 (en) * 2003-06-04 2005-01-06 The Trustees Of The University Of Pennsylvania Ndma db schema dicom to relational schema translation and xml to sql query translation
US20050234973A1 (en) * 2004-04-15 2005-10-20 Microsoft Corporation Mining service requests for product support
US7822768B2 (en) * 2004-11-23 2010-10-26 International Business Machines Corporation System and method for automating data normalization using text analytics
US20060173865A1 (en) * 2005-02-03 2006-08-03 Fong Joseph S System and method of translating a relational database into an XML document and vice versa
US7849048B2 (en) * 2005-07-05 2010-12-07 Clarabridge, Inc. System and method of making unstructured data available to structured data analysis tools
US20070011183A1 (en) * 2005-07-05 2007-01-11 Justin Langseth Analysis and transformation tools for structured and unstructured data
US7613996B2 (en) * 2005-08-15 2009-11-03 Microsoft Corporation Enabling selection of an inferred schema part
US20080091591A1 (en) * 2006-04-28 2008-04-17 Rockne Egnatios Methods and systems for opening and funding a financial account online
US7849030B2 (en) * 2006-05-31 2010-12-07 Hartford Fire Insurance Company Method and system for classifying documents
US8271429B2 (en) * 2006-09-11 2012-09-18 Wiredset Llc System and method for collecting and processing data
US8160977B2 (en) * 2006-12-11 2012-04-17 Poulin Christian D Collaborative predictive model building
US20090276403A1 (en) * 2008-04-30 2009-11-05 Pablo Tamayo Projection mining for advanced recommendation systems and data mining
US20100100439A1 (en) * 2008-06-12 2010-04-22 Dawn Jutla Multi-platform system apparatus for interoperable, multimedia-accessible and convertible structured and unstructured wikis, wiki user networks, and other user-generated content repositories
US9460189B2 (en) * 2010-09-23 2016-10-04 Microsoft Technology Licensing, Llc Data model dualization
US9111018B2 (en) * 2010-12-30 2015-08-18 Cerner Innovation, Inc. Patient care cards
US9092802B1 (en) * 2011-08-15 2015-07-28 Ramakrishna Akella Statistical machine learning and business process models systems and methods
US8311973B1 (en) * 2011-09-24 2012-11-13 Zadeh Lotfi A Methods and systems for applications for Z-numbers

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10951685B1 (en) * 2018-07-19 2021-03-16 Poetic Systems, Llc Adaptive content deployment
US11288448B2 (en) * 2019-07-26 2022-03-29 Arista Networks, Inc. Techniques for implementing a command line interface

Also Published As

Publication number Publication date
US9305032B2 (en) 2016-04-05
US8725750B1 (en) 2014-05-13
US20140214867A1 (en) 2014-07-31

Similar Documents

Publication Publication Date Title
US9305032B2 (en) Framework for generating programs to process beacons
CN106575166B (en) Method for processing hand input character, splitting and merging data and processing encoding and decoding
US9607061B2 (en) Using views of subsets of nodes of a schema to generate data transformation jobs to transform input files in first data formats to output files in second data formats
US9495429B2 (en) Automatic synthesis and presentation of OLAP cubes from semantically enriched data sources
WO2016082468A1 (en) Data graphing method, device and database server
US8726229B2 (en) Multi-language support for service adaptation
US20070038930A1 (en) Method and system for an architecture for the processing of structured documents
US20110276603A1 (en) Dependency graphs for multiple domains
US9535966B1 (en) Techniques for aggregating data from multiple sources
CN104536987B (en) A kind of method and device for inquiring about data
EP3846089B1 (en) Generating a knowledge graph of multiple application programming interfaces
US20180129712A1 (en) Data provenance and data pedigree tracking
US10031981B2 (en) Exporting data to web-based applications
Daquino et al. Creating RESTful APIs over SPARQL endpoints using RAMOSE
CN109284088B (en) Signaling big data processing method and electronic equipment
US9886424B2 (en) Web application framework for extracting content
Bader et al. Semantic annotation of heterogeneous data sources: Towards an integrated information framework for service technicians
US11726994B1 (en) Providing query restatements for explaining natural language query results
KR100491725B1 (en) A data integration system and method using XQuery for defining the integrated schema
US7617448B2 (en) Method and system for validation of structured documents
CN1588371A (en) Forming method for package device
US20230306002A1 (en) Help documentation enabler
CN116303322B (en) Declaration type log generalization method and device
Dong et al. Implementation of Web Resource Service to Product Design
Eeda Rendering real-time dashboards using a GraphQL-based UI Architecture

Legal Events

Date Code Title Description
AS Assignment

Owner name: HULU LLC, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WAYE, LUCAS;SENG, KEVIN;BAJARIA, VIRAL;AND OTHERS;SIGNING DATES FROM 20121022 TO 20121024;REEL/FRAME:029194/0325

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551)

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8