EP1194862A1 - Intelligent computer system - Google Patents

Intelligent computer system

Info

Publication number
EP1194862A1
Authority
EP
European Patent Office
Prior art keywords
inference
evidence
bayesian
domain
bayesian models
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP00930757A
Other languages
English (en)
French (fr)
Inventor
Joel Ratsaby
Gad Barnea
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Manna Inc
Original Assignee
Manna Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Manna Inc filed Critical Manna Inc
Publication of EP1194862A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 - Commerce
    • G06Q30/02 - Marketing; Price estimation or determination; Fundraising
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 - Computing arrangements using knowledge-based models
    • G06N5/02 - Knowledge representation; Symbolic representation

Definitions

  • the invention relates generally to distributed computer systems accessible by a plurality of users. More specifically, the present invention relates to intelligent distributed computer systems that make recommendations or answer queries related to users (or virtual sessions that represent users either online or offline) as a function of what the system learns related to those users.
  • network-based computer systems seek to go beyond the static, pre-scripted, or predetermined presentation of information to computer users.
  • some systems now attempt to provide secondary or ancillary information, such as marketing information, to a user (e.g., a consumer engaged in e-commerce), wherein this secondary information is unsolicited; it is simply provided (in response to an event) in addition to the primary information sought by the user.
  • a system will attempt to determine and provide the primary information that the system deems relevant or optimal for the user.
  • a system may attempt to provide information that is not directly presented to a user, but is provided in response to an event related to a user (or to some other system activity).
  • some systems attempt to "learn" something about the user, and then selectively provide information as a result of a prediction based on what the system has learned or knows - sometimes referred to as "personalization".
  • Systems that attempt to accomplish this vary, and so too does their effectiveness.
  • collaborative filtering is a supervised learning approach.
  • User profiles are constructed during a training phase where the user rates selected items, e.g., movies, with a score in a supervised manner.
  • Such supervised training approaches are time consuming and can pose an inconvenience for the user.
  • collaborative filtering approaches can be viewed as learning by discovering clusters of similar user profiles and predicting the response for a selected user based on the cluster to which that user belongs. Clustering is the most basic way of discovering interesting relationships between attributes (of profiles), for instance, discovering that high scores for certain movies are correlated with low scores for a selected movie.
  • Such techniques base their predictions or classification of the selected user using the profiles of similar users.
  • collaborative filtering is a basic form of case-based learning. It is based on algorithms that were developed in the 1960s and 1970s, specifically, the "weighted nearest-neighbor" algorithm and the "k-nearest neighbor" algorithm. These are non-parametric methods that rely on stored data for prediction. Because these approaches do not develop models of any sort, they require storing all the data cases in memory in order to make predictions. In addition to high memory costs, significant computations are required at prediction time, since the pattern similarity computations are done at recommendation time rather than in advance. This consumption of system resources may undesirably limit the prediction response times.
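  • As an illustrative sketch only (this code is not part of the patent's disclosure, and all names are hypothetical), the memory and prediction-time costs described above can be seen in a minimal k-nearest-neighbor recommender, which must retain every profile and scan all of them for each prediction:

        import java.util.*;

        // Hypothetical sketch of the prior-art k-nearest-neighbor approach:
        // every profile stays in memory, and every prediction scans them all.
        class KnnRecommender {
            private final List<double[]> profiles = new ArrayList<>(); // stored rating vectors
            private final List<Double> scores = new ArrayList<>();     // known score for the target item

            void store(double[] profile, double score) {
                profiles.add(profile);
                scores.add(score);
            }

            double predict(double[] query, int k) {
                // O(N * d) similarity computations happen at recommendation time
                Integer[] order = new Integer[profiles.size()];
                for (int i = 0; i < order.length; i++) order[i] = i;
                Arrays.sort(order, Comparator.comparingDouble(i -> distance(profiles.get(i), query)));
                double sum = 0;
                for (int i = 0; i < k; i++) sum += scores.get(order[i]);
                return sum / k; // average score of the k most similar profiles
            }

            private static double distance(double[] a, double[] b) {
                double s = 0;
                for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
                return Math.sqrt(s);
            }
        }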
  • Neural networks are parametric models used to approximate multi-variable real valued mappings or functions. They can be used to learn probability functions.
  • At a micro-level neural networks are composed of nodes which are graphically interconnected, with a moderate degree of approximation ability.
  • On a macro-level neural networks are not well suited for making predictions in a highly dynamic real-time environment.
  • Neural networks are like black-boxes, wherein an input layer and an output layer must be defined a priori to learning. After learning, only this single mapping can be used for inferences.
  • the strengths between the internal nodes (also known as the hidden layers), which are learned by the neural networks, are where the information lies. However, there is no easy way of interpreting this information other than requesting an output for a given input.
  • neural networks cannot utilize available prior knowledge in addition to the input data. That is, a neural network is static with respect to time. If at some later stage new variables are added to or deleted from the data, then the neural network must be re-learned over the changed domain. This means that previously learned information will be lost.
  • Neural networks in some instances can learn incrementally, but using a heuristic approach as opposed to sound probability-based updating, which compromises the eventual results.
  • because neural networks cannot learn with partially incomplete data, they generally cannot do "clustering" (or unsupervised learning). That is, multi-layered feedforward neural networks cannot do clustering because they require complete data (this mode of learning is known as supervised learning).
  • although neural networks have strong predictive and estimation capabilities utilizing their inherent non-linear structure, the algorithms with which they learn are too slow for dynamic settings such as the ones encountered in today's distributed computer systems, e.g., on-line e-commerce systems. They require learning from batch data, which means they need to scan through each data case numerous times before converging to the optimal solution. As such, they do not scale efficiently with respect to the size of large, changing data sets.
  • neural networks are very hard to scale (if at all possible) mainly due to their black-box nature. For example, one cannot break a traditional single neural network into two networks, which limits the usefulness in a dynamic distributed computer architecture, where such scalability, if available, can be advantageous.
  • the present invention is an intelligence system capable of learning trends and profiles related to a user based on, at least in part, the user's interaction with the system and possibly some domain related information or events to facilitate on-line, real-time production and communication of predictions and/or recommendations related to the user.
  • the intelligence system is characterized as having automated distributed intelligence (ADI), and is added to an enterprise computer system to add intelligence and personalization thereto, creating an intelligent enterprise system.
  • due to the intelligence system, the intelligent enterprise system has substantially no down-time, operates in real-time, and is structured to have unencumbered scalability, through the use of standardized and platform-independent software entities and the ability of the intelligence system to monitor and redistribute its load dynamically.
  • an enterprise may be a vendor of products and/or services engaged in e-commerce, wherein the access to the intelligent enterprise system may be controlled, as in a private network, or open, as in a publicly accessible e-commerce Web site.
  • the intelligent enterprise system may be distributed over any of a variety of commonly known networks, such as a local area network (LAN), wide area network (WAN), world wide web (the "Web"), intranet, extranet, Internet, private network, or some combination thereof.
  • the intelligent enterprise system may include one or more servers and databases that may be accessed by multiple users simultaneously over a network by wired and/or wireless devices (e.g., personal computers, personal digital assistants, cellular telephones, and so on).
  • the enterprise computer system may take any of a variety of forms.
  • the enterprise computer system may be an application (e.g., a word processing application) running on a computer that includes an interface to the intelligence system for gaining a personalization capability.
  • the enterprise computer system may be an e-mail system having an interface with the intelligence system.
  • the e-mail system may include servers and databases and may link to an intelligence system having its own set of servers and databases, or the two systems may share servers and databases.
  • the enterprise system may be a video system wherein personalization from the intelligence system influences the video content presented to the user of the (combined) intelligent enterprise system.
  • the enterprise system may be a web application running in a browser with its backend running at a web server. The web application will link to the intelligence system.
  • the intelligence system is preferably implemented using a client-server architecture, having a set of server-side software modules and possibly a client-side software module (i.e., client module).
  • the software modules of the present invention are, largely, implemented in accordance with object oriented design methodologies, but those skilled in the art will appreciate that other design methods may also be used to practice the present invention.
  • the intelligence system preferably includes (or has access to) at least one wired or wireless intelligence system server (or computer) to host the intelligence system server-side software, along with at least one associated database.
  • the client module may be hosted on any of a variety of types of wired or wireless electronic devices, such as a computer, server, or other device.
  • the client module and server-side software may be hosted on the same platform, and with or without an enterprise application. That is, the distinction between the server-side software and client module of the intelligence system is primarily functional and logical, rather than physical.
  • the client module provides an interface between the enterprise application (e.g., e-mail, word processor, and so on) and the personalization functionality of the server-side.
  • the client module may be hosted on a front-end Web server, along with an enterprise Web-based application and the server-side software may be hosted on a back-end intelligence system server.
  • the front-end server could be a Web server, an application server, or a wired or wireless communication server, as examples, or any other system, server, or device that seeks to take advantage of the personalization offered by intelligence system of the present invention.
  • the front-end server may be integrated into the same platform as the back-end server, as discussed above.
  • the server-side software includes a business command center, a core module, an artificial intelligence (AI) module, and a set of administrative tools.
  • the administrative tools include a business object developer for automated creation of business (i.e., enterprise) objects, which embody enterprise specific rules.
  • the business command center may be run on the intelligence system (e.g., back-end) server and use the same database.
  • the business command center provides logic for an enterprise, non-technical user to generate and maintain enterprise specific rules (as objects). These rules are related to the goals, tasks, and processes necessary to carry out the enterprise's objectives for the system. For example, if the enterprise were an on-line food distributor, the rules may relate to enterprise product offerings, to shopping cart abandonment, and so on.
  • the business command center implements standard browser-based functionality and "wizards" to allow ease of use by enterprise users.
  • Rules may contain a variety of components, depending on the application of the intelligent enterprise system.
  • the rules may include a "time frame" component indicating when the rule is to be applied and a "situation" component that provides dynamic information about the current state of the session.
  • a "profile” component provides static information about the user and a “result” component indicates the nature of the system's response as a function of the previous components.
  • the business command center is also configured to generate a variety of reports about the system, based on data it collects, that are useful to the enterprise.
  • the core module and AI module are hosted and run on the intelligence system (e.g., back-end) server.
  • the core module serves as an "operating system" to the intelligence system and provides centralized administration and processing functions.
  • the core module includes a messaging facility and offers dynamic views of available computer resources. Using a highly distributed architecture, plug-ins, load balancing, messaging, and events, the core module continually ensures maximum utilization of system resources and retains only those objects that relate to the active sessions.
  • the core also includes a rule evaluator that evaluates information generated by an external event (e.g., a consumer purchase) according to the business rules established by the enterprise using the business command center.
  • the AI module provides for the real-time creation, maintenance and application of Bayesian models, which are used to make personalized recommendations and to infer answers for various queries.
  • the Bayesian models are created and maintained using on-line and offline processes as a function of previous and current user responses and the general state of the intelligent enterprise system.
  • the Bayesian models and rules are applied to generate intelligent responses to user inputs, including providing to a user information (e.g., recommendations) regarding a second subject matter that is related to a first subject matter being queried by the user.
  • the AI module is the "brain" (or AI engine) of the intelligence system, and makes real-time recommendations based on user behavior models, continuously updating them with new information.
  • the AI module, more than any other component, is what transforms a standard enterprise site into a dynamic learning and inference center.
  • the AI module creates and employs intelligent virtual agents that are capable of automatically learning, utilizing and sharing learned knowledge to serve the client.
  • the intelligence system's robust, scalable and distributed server architecture allows multiple virtual agents, learning independently, to form a unified intelligence entity, which acts as a single distributed "virtual brain".
  • the model building process constructs attributes, defines their values and then gathers them into models according to logical relationships. These models are then inter-linked, based on their statistical coupling.
  • models include enterprise subject matter categories and subcategories, such as products/services categories.
  • Bayesian network models built off-line and tailored by the intelligence system are customized for the enterprise's application, within the context of the intelligent enterprise system.
  • Off-line learning involves running several intelligent programs with several input data files, resulting in a set of models.
  • Each model is an object file, in the preferred embodiment.
  • a model includes variables and probability connections between those variables, which may be expressed in tables.
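  • As a minimal illustration (the attributes and numbers here are hypothetical, not taken from the patent), a probability table for a variable BUYS_MILK with a single parent variable AGE_GROUP might look like:

        AGE_GROUP | P(BUYS_MILK = yes) | P(BUYS_MILK = no)
        young     | 0.30               | 0.70
        middle    | 0.55               | 0.45
        senior    | 0.60               | 0.40

    Each row sums to one, and the model stores one such table per variable, conditioned on that variable's parents.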
  • the models are updated during normal system operation to reflect knowledge gained through on-line learning.
  • as user-related events occur, they are logged into a database.
  • at preset intervals, the intelligent enterprise system opens the database and puts the gathered information into data files.
  • the data files are continuously updated as new events occur.
  • On-line learning processes the files, updating the values in the models' probability tables. This action permits an enterprise site to learn quickly and accurately about the trends and patterns of user behavior and activity.
  • the AI module comprises two main parts, a machine learning system (MLS) and an inference system (IS).
  • the MLS automatically creates Bayesian network models based on consumers' past data. These models are then used by the IS as on-line intelligent resources. All of the intelligent operations performed by the AI module, such as prediction, classification, and maximum expected utility optimization, can be obtained from these Bayesian models.
  • Both of the MLS and IS are structured using parallel distributed Java object-oriented code, allowing advantage to be taken of the distributed application server architecture.
  • the processes of machine learning (using MLS) and inference (using IS) are scalable. This implies that for making inferences, the number of on-line sessions that require intelligent resources is scalable, while for machine learning the number of Bayesian network models is scalable with the number of computers (or servers).
  • Bayesian Model networks are not a supervised learning approach, wherein user profiles are constructed during a training phase where the user rates selected items, e.g., movies, with a score in a supervised manner.
  • the Bayesian Model-based learning algorithms of the present invention can incorporate both supervised and unsupervised forms of learning. Through statistically based incremental learning algorithms, learning can take place even with partially incomplete data.
  • User profiles containing dynamic information such as on-line browsing behavior are formed in an unsupervised manner without burdening users with questionnaires.
  • Bayesian model networks are by no means black-boxes. Every node, or attribute, has a meaning and its value may be queried.
  • a Bayesian model network in its whole, represents a joint probability distribution over the attributes of the domain where the functional form is depicted via its structure and probabilities.
  • the learned structure depicts useful information in terms of inter-dependencies amongst these attributes. For instance, the fact that one Bayesian model attribute is linked to another attribute implies that there is a statistically significant (based on the data) relationship between the two variables.
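  • As a worked illustration (hypothetical attributes, not from the patent), a model with links AGE → BUYS_MILK and AGE → BUYS_COOKIES encodes the factored joint distribution

        P(AGE, BUYS_MILK, BUYS_COOKIES) = P(AGE) * P(BUYS_MILK | AGE) * P(BUYS_COOKIES | AGE)

    so each attribute requires only a probability table conditioned on its parents, rather than one table over every combination of all attributes.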
  • no prior input-output mapping needs to be defined at the start of learning. Any attribute(s) may serve as an input or output at any particular query time.
  • Bayesian model networks provide benefits over neural networks.
  • the implementation of Bayesian model networks in the present invention causes the system to accumulate available observations (or evidence), compound this evidence in a statistically sound manner, and produce the best (in a probabilistic sense) output or recommendation. Additionally, a Bayesian model network can incrementally change both its structure and its probabilities based on a new domain and on new data. New attributes can be added and old ones can be deleted with minor loss of probabilistic interdependency information.
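  • The statistically sound compounding referred to here is Bayes' rule. As a worked numeric sketch (hypothetical values), two conditionally independent pieces of evidence e1 and e2 update a hypothesis h as

        P(h | e1, e2) ∝ P(e1 | h) * P(e2 | h) * P(h)

    For instance, with prior P(h) = 0.2 and likelihoods P(e1|h) = 0.9, P(e2|h) = 0.8 versus P(e1|¬h) = 0.3, P(e2|¬h) = 0.4, the posterior is (0.9 * 0.8 * 0.2) / (0.9 * 0.8 * 0.2 + 0.3 * 0.4 * 0.8) = 0.144 / 0.240 = 0.6.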
  • Figure 1 is a representative computer architecture using the intelligent enterprise system, in accordance with the present invention;
  • Figure 2 is a program architecture for the intelligence system of Figure 1;
  • Figure 3 is a representative graphical user interface of the business command center of the system of Figure 2;
  • Figure 4 is a flowchart depicting the process of rule evaluation in accordance with the present invention;
  • Figure 5 is a block diagram of an artificial intelligence module, as part of the intelligent enterprise system of the present invention;
  • Figures 6A and 6B are examples of a representative product database file and portions of a corresponding representative serial file, respectively, used by the machine learning system of the AI module of Figure 5;
  • Figures 7A and 7B show examples of logical and topological linking models, respectively, of the IS and MLS of Figure 5;
  • Figure 8 is a representative hyperlink file generated by the MLS of Figure 5;
  • Figure 9 is an example of a serial session data file generated and used by the MLS of Figure 5;
  • Figure 10 is an object-oriented depiction of a DRV model, used by the MLS of Figure 5;
  • Figure 11 is a table depicting the on-line learning modules of the MLS of Figure 5;
  • Figures 12A and 12B are block diagrams of an architecture of the MLS of Figure 5;
  • Figures 13A and 13B provide a table depicting the MLS components and their interactions;
  • Figure 14 is a table depicting messaging objects used by the MLS, in accordance with the present invention;
  • Figures 15A-15C provide a table of the components of the IS of Figure 5 and their interactions;
  • Figure 16 is a table depicting messaging objects used by the IS, in accordance with the present invention;
  • Figure 17 is a representative block diagram of the IS of Figure 5;
  • Figure 18 is an example of a Bayesian network model, in accordance with the present invention;
  • Figure 19 is an example of a Bayesian model in accordance with the present invention;
  • Figure 20 is an example of a category change event file generated and used by the IS of Figure 5;
  • Figures 21A-21E are samples of QRC™ agent structures of various APIs supported by the IS;
  • Figure 22 is a sample recommended list returned by an IS inference agent; and
  • Figures 23-26 depict various absorption techniques of evidence among models.
  • the present invention is an intelligence system capable of real-time inference and learning, and updating of user profiles and trends to facilitate on-line, real-time recommendations or answers to queries related to the user, referred to as "personalization".
  • the intelligent computer system may be added to, or integrated with, any of a variety of enterprise computer systems to add dynamic personalization capability thereto, resulting in an intelligent enterprise system.
  • the enterprise system may be an application (e.g., a word processor, or e-mail personalization server), a network-based application with a presentation device (like a video tour on a kiosk that is personalized), or an e-commerce system.
  • An intelligent enterprise system in accordance with the present invention may include at least three characteristics.
  • the intelligent enterprise system automatically develops Bayesian models and uses them to generate intelligent responses to system events.
  • the intelligent enterprise system accesses data and responds to events throughout the system.
  • the intelligent enterprise system processes, analyzes, and applies data according to rules defined by an enterprise user (i.e., enterprise rules).
  • an enterprise computer system may be any of a variety of systems and may include or be accessible by a network, such as, for example, a LAN, a WAN, an intranet, an extranet, a private network, the Internet, or the World Wide Web (the "Web"), or some combination thereof.
  • Representatives of the enterprise which configure the system for a particular enterprise's needs are referred to as "enterprise users" and may include, for example, non-technical sales and marketing staff.
  • any of a variety of types of entities may generate and/or receive events to and from the intelligence system.
  • such other types of entities may include, for example, an e-commerce consumer searching for at least one primary product or service, or a non-e-commerce user.
  • the intelligent enterprise system may support events related to a variety of types of entities.
  • the intelligence system is added to an e-commerce system of an enterprise to add, among other things, intelligence thereto, resulting in an intelligent e-commerce system.
  • the intelligent e-commerce system may provide recommendations related to one or more secondary products or services, or predict the answer for a query about the consumer that could result in a dynamic site change (like offering an incentive).
  • a secondary product or service is one not directly searched by the consumer, and may or may not be related to the primary product or service that was searched by the consumer.
  • a secondary (related) product may be a certain cheese from a certain maker or supplier, but another (unrelated) secondary product may be a particular automobile.
  • the particular secondary products or services offered or recommended are a result of the intelligence provided by the intelligent computer system, as it continually updates consumer profiles and applies probabilistic models relating to the products and services sought by the consumer.
  • predictions and recommendations are not confined to the secondary products (i.e., information).
  • the intelligence system can make predictions and recommendations regarding any user-related information (i.e., primary, secondary, or characterized in some other way).
  • the predictions and recommendations need not relate (directly) to a user's request for information. Rather, such predictions and recommendations are made in response to events that are generated and somehow related to said user.
  • the intelligent enterprise system is implemented on a distributed computer architecture 100, as shown in Figure 1.
  • consumers, or other types of users may access the intelligent enterprise system 180 via any of a variety of known means and with any of a variety of known wired or wireless devices.
  • a software application (or system) 101 may access the intelligent enterprise system 180 by accessing an application server 110 via a network, represented by network cloud 118.
  • such devices include a personal computer 102, a laptop computer 104, a personal digital assistant 106, which are shown in Figure 1 as accessing a Web server 152 via the Internet and World Wide Web (the Web), represented by cloud 120.
  • any of these devices may also include an application (like application 101) that accesses server 162 via the Internet and Web 120 and server 152.
  • a user can access the intelligent enterprise system 180 using a typical telephone or a cellular telephone, which are shown accessing telephone server 114 via a telephone network, represented as network cloud 122.
  • servers 110, 152, and 114 access an intelligence system server 162 via wired or wireless means. All separation is logical, and in theory servers 162 and 152 could run on the same computer.
  • the intelligent enterprise system 180 includes the intelligence system integrated with the enterprise (e.g. , e-commerce) system, each of which may include one or more servers and associated databases. In some configurations, the intelligence and enterprise systems may share servers and databases.
  • application server 110, Web server 152, and/or telephone server 114 may be front-end servers, through which entities access the intelligent system 180.
  • One or more intelligence system (e.g., back-end) servers and databases (e.g., server 162 and database 163) service requests or events related to activity by entities interacting with the intelligent enterprise system 180, and received via one of the front-end servers.
  • a database server 142 (or another relevant server or system) may be linked to the intelligent system server 162 locally via a LAN or remotely via a WAN as indicated by link 126, as examples.
  • database server 142, being a general site DB server, could also be connected to servers 110, 152, and 114.
  • Figure 2 shows a representative architecture for the intelligent enterprise system 180.
  • the intelligence system 200 integrates with an e-commerce solution, tying into an e-commerce Web server 152 to obtain real-time information related to consumer 110 click-stream behavior or other site information, as well as tying into existing database information (e.g., database 143), which may include information such as consumer demographics and buying behavior.
  • the Web server 152 hosts an intelligence system 200 client module (the "client") 154.
  • Client 154 interacts with intelligence system server 162, which hosts various intelligence system program modules, including a core module 210, an artificial intelligence (AI) module 220 and a business command center (BCC) module 230.
  • an enterprise user 112 creates and updates business rules using the business command center module 230 and the rules are stored in the intelligence system database 163 and published (i.e., made available) to the intelligence system server 162. Thereafter, consumer 110 accesses the enterprise's e-commerce Web site through the Web server 152 to interact with the intelligent e-commerce system 180. Beyond consumer activity, the intelligence system 200 may also be responsive to events generated by other (non-consumer) entities. In response to consumer activity, the client 154 sends related events to and receives results (e.g., recommendations) from the intelligence system server 162. Client 154 can be deployed in any of a variety of known manners, for example, using ActiveX, Servlets or Sockets, depending on the system tools and platform.
  • when using sockets, the client 154 is actually nonexistent and the e-commerce system communicates directly with server 162.
  • the core 210 (i.e., the intelligence system's "operating system") listens for occurrences of selected events resulting from the consumer's click-stream activity.
  • through a rule evaluator, which is a component of core 210, an event is sent to all active rules to determine which rules are relevant for the given consumer 110.
  • the rule evaluator calls on AI module 220, as needed, to determine how to apply each rule for the given consumer.
  • the AI module 220 can utilize operational and historical databases 143 for specific consumer/product data.
  • core 210 sends a resulting recommendation for personalized products, services or other content, as examples, to client 154.
  • client 154 communicates the recommendation to Web server 152, which in turn changes the Web page and passes a new one to consumer 110.
  • core 210 executes basic functions such as load balancing, rule creation and messaging.
  • the core 210 reads and updates data, including rules and configuration information, which are stored as XML files in the intelligence system database 163.
  • the intelligence system 200 integrates to offer an enterprise a comprehensive, scalable and customizable on-line personalization tool, including the five components: business command center 230, intelligence system server 162 (the actual personalization server), intelligence system client 154, intelligence system database 163, and a set of administration tools 240. Working together, these components enable the enterprise to create and test specific e-commerce initiatives, interact with and learn from consumers and their preferences, as well as analyze and report results.
  • the business command center 230 allows an enterprise user to create, pre-test, update, and evaluate the impact of their intended e-commerce initiatives using their own defined business rules. Preferably, this interaction takes place via a standard Web browser (e.g., Internet Explorer by Microsoft Corporation of Redmond, WA) and "wizards", in an easy to use windowing environment, without requiring support from technical personnel (e.g., computer programmers or information technology personnel).
  • the business command center 230 provides in-depth reports and analyses offering enterprise users 112 the critical information necessary for making effective decisions in real-time mode.
  • a business command center main screen 300 is shown in Figure 3, which may be used for building and editing enterprise defined rules. Rules are built using business objects created using a business object developer (BOD), discussed with respect to the administration tools 240 below.
  • BOD business object developer
  • a business object provides a tangible expression for (or embodies) a rule, wherein a rule has four components (or levels) in the preferred embodiment, including time frame 302, situation 304, profile 306 and result 308, shown in Figure 3.
  • the latter three components are constructed of Boolean expressions composed of operations. Those operations could be of several types; for example, they may be SQL queries or stored procedures (as discussed with respect to the business object developer below).
  • the time frame level 302 defines the exact times when the business rule will be activated by the system 180 (e.g., a first quarter promotion (Jan 1 - Mar 31)).
  • the situation level 304 defines the dynamic information relating to the current state of the system (e.g., "The user has clicked to purchase cookies.”).
  • the situation 304 triggers the evaluation of the business rule by the rule evaluator (discussed in Section 2.1 below).
  • the profile 306 provides static information about the user (e.g., "This user is age 50."; "This user bought milk last month.").
  • the result component is the response to the user (e.g., "This month we have a sale on milk.”), which delivers personalized recommendations of products, services or content.
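  • As an illustrative sketch only (the patent does not prescribe this representation; all names are hypothetical), a rule with the four components above could be modeled as:

        import java.time.LocalDate;
        import java.util.function.Predicate;

        // Hypothetical sketch of a four-component business rule.
        class FourPartRule {
            LocalDate start, end;            // time frame: when the rule is active
            Predicate<Session> situation;    // dynamic state, e.g. "clicked to purchase cookies"
            Predicate<Profile> profile;      // static user information, e.g. "age 50"
            String result;                   // personalized response, e.g. "sale on milk"

            boolean applies(LocalDate now, Session s, Profile p) {
                return !now.isBefore(start) && !now.isAfter(end)
                        && situation.test(s) && profile.test(p);
            }
        }
        class Session { /* current click-stream state */ }
        class Profile { /* stored user attributes */ }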
  • the business command center 230 includes a reporting capability that permits an enterprise to track the performance (i.e., consumer response) of its initiatives and provides a powerful analysis tool in the process.
  • intelligence system 200 can recommend refinements to various rules to improve the intelligent computer system's 180 performance. Ongoing analysis, refinement and changing business needs lead naturally to the creation of new rules, in turn providing better marketing intelligence for the enterprise.
  • the intelligence system 200 reporting module tracks several critical types of information and provides corresponding reports 328, including: business rule reports, intelligence system performance reports, and site behavior reports. This performance monitoring and the corresponding reports allow enterprise users to make improvements through the creation of new business rules and the modification of existing business rules.
  • for business rule reports (e.g., holiday initiatives), the intelligence system 200 tracks the impact on consumer behavior over time.
  • using key metrics such as page views, click-throughs, and purchase patterns, intelligence system 200 enables the enterprise user to determine which rules are most effective, and to analyze rule effectiveness across consumer profiles, time periods, products and categories.
  • intelligence system 200 tracks its own effectiveness across all business rules, measuring bottom-line impact on consumer behavior. Again, this information can be analyzed across profiles, time periods, products and categories. Finally, at the most basic level, enterprise users must understand how consumers are behaving on-line. Therefore, intelligence system 200 provides key information on consumer profiles (e.g., who are my consumers?), page views and click-throughs (e.g., where are they going on the site?) and consumer purchase patterns (e.g., what are they buying?).
  • Intelligence system 200 is also equipped with a set of default tables and charts for each type of report, which are preferably output in a standard output form, such as Excel™ by Microsoft Corporation, and customized by enterprise users as needed. All reports can be run in real-time or scheduled for on-going reporting.
  • the BCC 230 includes simulation capability. That is, once a new rule has been created, the enterprise user 112 can simulate it, prior to publishing it, by accessing the learning and inference engine (discussed in section 6 below) to run a prediction for success on the considered rule. Thus, marketing initiatives can be tested internally, before publication (or launch). As a result, failed initiatives are minimized and optimal on-line results may be realized.
  • 2. The Core 210
  • both the core 210 (and its rule evaluator) and the AI module 220 are key components that run on the intelligence system server 162.
  • Core 210 provides operating system-level and other services to the intelligence system applications and modules.
  • the intelligence system server 162 is responsible for distributing events to the appropriate services and to the intelligence system database 163, to be logged for future reporting purposes.
  • the core 210 which is the "heart" of the intelligence system 200, also includes a messaging facility and offers dynamic views of available computer resources. Using a highly distributed architecture, plug-ins, load balancing, sophisticated messaging, and events, core 210 continually ensures maximum utilization of system resources and retains only those objects that relate to active sessions. Merging a variety of methodologies, the core provides a robust intelligence system to an enterprise's e-commerce system, delivers high scalability and automation, and supports multiple applications. The core 210 accomplishes this primarily in two ways. First, the core uses plug-ins to automate system processes, such as data retrieval and rule evaluation. The plug-ins act independently of other system components and can be deleted, edited or modified instantly without rebooting the system. Second, the core implements a highly efficient, adaptable messaging system. The core reads files in XML and transmits in XML DOM, i.e., formats supporting the intelligence system's 200 distributed architecture.
  • Plug-ins are executable components of the core created for a specific business (or enterprise) function. They can be created, invoked, updated, and removed in real- time without interrupting the system operation. Plug-ins contribute to the intelligent system's flexibility by allowing rules to be created, edited and deleted during run-time. Business rules, business objects, and certain other components are plug-ins, as examples.
  • Load balancing is the term given to the intelligent system's ability to control the distribution of sessions (or tasks) over a group of virtual machines. Load balancing facilitates optimal deployment for any given hardware configuration, ensuring that a given machine is not overloaded with sessions, relative to other machines in the distributed architecture. More demanding services are placed on the machines with the highest capacity, as determined by a load balancing system. This, combined with the ability to run on more than one physical machine, achieves scalability. Generally, load-balancing functionality is known and not discussed in detail herein.
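  • A minimal sketch of the least-loaded placement implied above (the patent gives no specific algorithm; this logic and all names are assumptions):

        import java.util.HashMap;
        import java.util.Map;

        // Hypothetical sketch: place each new session on the machine with the most spare capacity.
        class SessionBalancer {
            private final Map<String, Integer> load = new HashMap<>();     // machine -> active sessions
            private final Map<String, Integer> capacity = new HashMap<>(); // machine -> max sessions

            void addMachine(String name, int maxSessions) {
                capacity.put(name, maxSessions);
                load.put(name, 0);
            }

            String assignSession() {
                String best = null;
                int bestFree = -1;
                for (Map.Entry<String, Integer> e : capacity.entrySet()) {
                    int free = e.getValue() - load.get(e.getKey()); // remaining capacity
                    if (free > bestFree) { bestFree = free; best = e.getKey(); }
                }
                if (best != null) load.merge(best, 1, Integer::sum); // record the placement
                return best;
            }
        }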
  • Query request context (QRC™) agents are the intelligence system's internal messaging agents (QRC™ is a trademark of Manna, Inc. of Newton, MA).
  • messages are sent to components external to the intelligence system's server 162 in XML format, and transmitted internally in XML DOM format, wherein both formats are generally known in the art.
  • a QRC™ agent contains data, a session number, a consumer ID, a list of target plug-ins (to receive the QRC™ agent), and a list of objects that are interested in receiving a reply from the target plug-ins.
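  • Using only the fields named above, a QRC™ agent might be sketched as the following class (all types are assumptions; the patent does not give a definition):

        import java.util.List;

        // Sketch of a QRC agent message; field types are illustrative only.
        class QrcAgent {
            Object data;                 // payload being routed
            long sessionNumber;          // identifies the on-line session
            String consumerId;           // identifies the consumer
            List<String> targetPlugins;  // plug-ins that should receive this agent
            List<Object> replyListeners; // objects interested in replies from the targets
        }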
  • a system integrator, in conjunction with an enterprise system administrator, establishes or defines events. Each event is recorded in a configuration table, which is stored in the intelligence system database 163. Any visitor (e.g., consumer) action on the e-commerce Web site (or server 152) can trigger an event (e.g., a visitor registering his name). Internal system events can be used as well.
  • a rule is a specified business scenario that can occur on-line, including an interactive event that is triggered by a scenario.
  • the rule evaluator, a component of the core 210, is the mechanism that evaluates information generated by an external event (e.g., a consumer purchase) according to a business rule.
  • the rule evaluator receives the rules established by the business command center 230, and evaluates each rule using data stored (e.g., in database 163) or the intelligence system's consumer behavior models combined with other available data. The result is routed to client 154 in real-time, using the intelligence system's server's messaging facility.
  • One business rule can serve many sessions and can be in one of two states: published or unpublished (as indicated in Figure 3), wherein the state of the rule is controlled through the business command center 230 (as discussed above).
  • a published rule is active in the personalization server, whereas an unpublished rule is not active but still appears in the rule database 163.
  • the business rule evaluation cycle is summarized in the flow chart 400 of Figure 4.
  • a visitor comes to the e-commerce Web site and triggers an event that is sent to the intelligence system's server 162, in step 402.
  • the rule evaluator, in step 404, looks only at the published rules that fit the current time frame to see if they apply to the event.
  • the rule evaluator then, in step 406, filters out all rules that do not have the event in the situation parameter, without yet determining whether the rule will be evaluated as "true".
  • the next step is the evaluation of the clauses in the situation and profile components of the remaining (i.e., not eliminated) rules. If the situation evaluates to be true, then the profile is checked, in step 408.
  • following step 408, the result parameter (or level) is determined and sent back to the client 154, in step 410.
  • the system then returns to step 412 and waits for another event. All of these steps could run in parallel (assume two events arrive at the server at the same time).
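  • Hedged as a sketch of flow chart 400 (all interfaces here are assumptions, chosen only to mirror steps 402-412):

        import java.util.List;

        interface Rule {
            boolean inTimeFrame(long time);
            boolean situationMentions(Event e);
            boolean situationTrue(Event e);
            boolean profileTrue(String consumerId);
            String result();
        }
        class Event { long time; String consumerId; }
        interface Client { void send(String result); }

        // Sketch of the rule evaluation cycle of Figure 4.
        class Evaluator {
            private final List<Rule> publishedRules;
            Evaluator(List<Rule> publishedRules) { this.publishedRules = publishedRules; }

            void onEvent(Event event, Client client) {               // step 402: event arrives
                for (Rule rule : publishedRules) {
                    if (!rule.inTimeFrame(event.time)) continue;     // step 404: time-frame filter
                    if (!rule.situationMentions(event)) continue;    // step 406: situation filter
                    if (rule.situationTrue(event)
                            && rule.profileTrue(event.consumerId)) { // step 408: evaluate clauses
                        client.send(rule.result());                  // step 410: send result to client
                    }
                }
                // step 412: return and wait for the next event; cycles for
                // distinct events may run in parallel
            }
        }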
  • Client 154 uses highly adaptable, yet standard, protocols to mediate between all leading Web applications and the intelligence system server 162.
  • the client can support a single Web application or adapt itself to work with Web applications distributed over a number of hosts, as well as "non-sticky" sessions.
  • the intelligence system database 163 stores the configuration tables required for system setup, business objects created using a business object developer, and business rules created in the business command center 230. Log files created by intelligence system 200 for reporting purposes are also stored in the intelligence system database 163. The data is stored in standard XML format so that it is easy to send the information between system components.
  • the intelligence system 200 includes administration tools for implementing and maintaining a stable, scalable, personalized enterprise system, in this case a Web-based e-commerce system, such as the business object developer (BOD).
  • the BOD is an automated tool for creating business objects and their operations, which are later used as building blocks by the BCC to create rule components.
  • the BOD implements a browser-based approach and "wizards" to facilitate ease of use.
  • Business objects are designed for three stages of the rule evaluation process 400, namely, the evaluation of the situation, profile and result components.
  • the BOD supports operations relating to:
  • SQL, which identifies standard SQL queries to retrieve one or more data values;
  • Stored Procedure, which identifies standard stored procedure routines with defined input and output parameters.
  • Business objects are the fundamental building blocks of any business rule. Once created with the BOD, the business objects are available to the enterprise through the BCC for creating or modifying enterprise related initiatives or business rules.
  • a business object is a group of related business functions called "methods" or "operations", and provides a mechanism to group like methods.
  • a business object may be expressed in the format: BusinessObjectName.Method(Parameter). Methods are pieces of business functionality that are derived by accessing consumer and legacy databases, data stores, or through a series of procedures. Parameters are the values required to properly evaluate methods. However, as will be understood by those skilled in the art, not all methods require parameters.
  • a business object could be "Consumer", offering a choice of several methods, including “Age”, “Address” or “Occupation”, among others. Each method may have a selection of parameters that is used to determine the value of the method, such as a consumer number.
  • a complete business object would be: Consumer.Address(ConsumerID)
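  • As a hypothetical sketch of the BusinessObjectName.Method(Parameter) convention (the patent does not prescribe an implementation), a business object could group its methods by name:

        import java.util.Map;
        import java.util.function.Function;

        // Sketch: a business object groups named methods, each evaluated against a parameter,
        // mirroring e.g. Consumer.Address(ConsumerID); all names here are illustrative.
        class BizObject {
            final String name;
            final Map<String, Function<String, String>> methods; // method name -> evaluation
            BizObject(String name, Map<String, Function<String, String>> methods) {
                this.name = name;
                this.methods = methods;
            }
            String evaluate(String method, String parameter) {
                return methods.get(method).apply(parameter);     // e.g. look up the consumer's address
            }
        }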
  • a business rule is a series of clauses consisting of business objects, which are evaluated sequentially to determine whether the overall rule is relevant for a specific consumer at that moment.
  • the rule evaluator determines where to go to evaluate business objects. This may require invoking click-stream data stored within the historical database 143 and/or database 163 to see if the consumer has just completed a specific action on the Web site, or by using stored demographic or past buying data (stored in enterprise databases 143 or by using the artificial intelligence module 220) to determine the best Profile or Result to target.
  • 6. AI Module 220
  • the AI module 220 is the "brain" (or AI engine) of the intelligence system 200, and makes real-time recommendations and predictions based on consumer behavior models, continuously updating them with new information.
  • the AI module 220, more than any other component, is what transforms a standard e-commerce site into a dynamic learning and inference center.
  • the AI module 220 creates and employs intelligent virtual agents that are capable of automatically learning, utilizing and sharing learned knowledge to serve the client 154 using inference.
  • the intelligence system's 200 robust, scalable and distributed server architecture allows multiple virtual agents, learning independently, to form a unified intelligence entity, which acts as a single distributed "virtual brain". Learning scalability enables the intelligence system 200 to deal with the exponential growth of data and to learn large data sets rapidly.
  • Consumer behavior models are implemented as Bayesian network models. Unlike other learning models (e.g., collaborative learning and neural networks), the Bayesian network models can work with incomplete data cases and levels of uncertainty. Once deterministic or probabilistic evidence has been obtained, the probability distribution for an attribute or combination of attributes can then be determined dynamically, in real-time. These consumer behavior models are reusable and can be combined to form new models, saving on both data resources and learning power.
  • the model building process constructs attributes, defines their values and then gathers them into models according to logical relationships. These models are then inter-linked, based on their statistical coupling.
  • models include products/services and categories.
  • Bayesian network models are built off-line and tailored by intelligence system 200 and then customized for the enterprise's e-commerce Web site. Off-line learning involves running several intelligence system 200 programs with several input data files, resulting in a set of Bayesian models that include variables and probability connections between those variables, which may be expressed in tables.
  • the Bayesian models are adapted during normal system operation to reflect knowledge gained through on-line learning. As consumer-related events occur, they are logged into a database (e.g., database 143). At preset intervals, intelligence system 200 opens the database and puts the information gathered into data files. The data files are continuously updated as new events occur. On-line learning processes the files, updating the values in the models' probability tables. This action permits an e-commerce site to learn quickly and accurately about the trends and patterns of consumer behavior and activity.
  • the AI module 220 comprises two main parts, a machine learning system (MLS) 510 and an inference system (IS) 520, as shown in Figure 5.
  • the MLS 510 automatically creates Bayesian network models based on consumers' past data. These models are then used by the IS 520 as on-line intelligent resources. All of the following intelligent operations can be obtained from these models: prediction, classification, and maximum expected utility optimization.
  • Both of the MLS 510 and IS 520 are designed using parallel distributed Java object-oriented code, allowing advantage to be taken of the distributed application server architecture.
  • the processes of machine learning (using MLS) and inference (using IS) are scalable. This implies that for making inferences the number of on-line sessions that require intelligent resources is scalable, while for machine learning the number of Bayesian Network Models is scalable with the number of computers (or servers).
  • the MLS 510 provides (1) automatic building of Bayesian models (including attribute or feature extraction) based on existing consumer-domain knowledge as well as on available data, (2) automatic learning of the built models (including structural/parametric estimation), and (3) automatic continuous adaptation of the Bayesian models based on new data and/or new domain knowledge, which includes the capability to incrementally change the structure based on data and add/delete attributes/links without losing learned information from older data.
  • the MLS 510 merges two independent technologies, parallel distributed processing and advanced machine learning, in implementing an efficient, fault-tolerant (via its distributed agents) and fully configurable system.
  • MLS 510 includes two main parts.
  • An off-line sub-system automatically builds Bayesian network models based on input data files, runs advanced off-line learning processes that produce the structure as well as the probability parameters for the models.
  • An on-line sub-system is deployed as any other regular plug-in on the server and updates the Bayesian models.
  • the MLS 510 continuously operates in a background manner to adapt Bayesian models and to update them with new statistics based on the newly acquired data. Note that all references to the word 'off-line' versus the word 'on-line' mean that the off-line process is carried out using a program that need not be run on the intelligence system server 162, but rather on a separate Java machine, in the preferred embodiment.
  • the MLS 510 is able to provide a potentially unlimited amount of machine learning resources (limited only by the enterprise's number of computers) that are scalable and configurable, while at the same time supporting a broad range of dynamic, real-time adaptation of these resources from a variety of sources.
  • These resources are based on Bayesian network models (or Bayesnet models) that can learn to estimate joint probability distributions used for predictive inference and clustering (i.e., unsupervised learning), which is used to discover groupings in the data.
  • MLS 510 is capable of using information from a variety of sources to construct and adapt the Bayesian network models.
  • the sources of information handled by the MLS for adapting the models include, for example, qualitative knowledge about the domain.
  • a key aspect of the MLS 510 is that the above adaptations can be carried out in an interleaving manner during the lifetime of the model. For instance, initially, a model may be automatically built by off-line learning.
  • the on-line learning processes can then take over and adapt the model over the next month of new data, as an example.
  • an individual (e.g., an enterprise user 112) may then manually modify the structure of the model, for instance by adding a new attribute.
  • the on-line learning can then continue to adapt the model based on the new structure, while keeping all important probabilistic parameter information that has been learned during the past month. For instance, the probability distribution of "AGE" given "GENDER" will be maintained even if, according to the new structure, a new attribute named "LOCATION" has been added.
  • 6.1.2 Distributed Machine Learning
  • the MLS comprises multiple autonomous Java "learning agents" (LAs), each of which is responsible for learning a single Bayesian network model.
  • the task of learning is accomplished by efficient parallel distributed processing starting from the lowest level of the learning process, which is filtering locally relevant data for each model, to the top level, updating the Bayesian network model at a central model repository (CMR), described in more detail in section 6.2 IS 520.
  • Each learning agent looks at the same continuous serial stream of incoming data (from serial.dat files). Each learning agent then extracts from the stream only data relevant to its model. The agent re-samples this serial data and converts it to parallel data cases.
  • the learning agent has at its disposal an inference engine (i.e., the same Java object discussed with respect to IS 520, described in section 6.2 herein) which is used to complete the unknown value based on an expectation-maximization (EM) algorithm, wherein an EM-algorithm is a well-known technique for estimation of statistical parameters with partially incomplete data.
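  • A minimal sketch of the EM idea for a missing value (for a single binary attribute only; a real learning agent applies this across full Bayesian network models, and this code is an assumption, not the patent's algorithm):

        // E-step: count missing cases at their expected value under the current estimate.
        // M-step: re-estimate the parameter from the expected counts; repeat until stable.
        class EmSketch {
            // entries are Boolean.TRUE / Boolean.FALSE, or null where the value is unknown
            static double estimateProbTrue(Boolean[] data, int iterations) {
                double p = 0.5;                            // initial guess for P(X = true)
                for (int it = 0; it < iterations; it++) {
                    double expectedTrue = 0;
                    for (Boolean x : data) {
                        if (x == null) expectedTrue += p;  // E-step: expected count
                        else if (x) expectedTrue += 1;
                    }
                    p = expectedTrue / data.length;        // M-step: re-estimate
                }
                return p;
            }
        }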
  • a model is updated to an extent that it can be used by IS 520 for making inferences (e.g., for providing recommendations to on-line consumers).
  • the process by which a model that just completed learning becomes available as an on-line resource is fully automatic and involves certain intelligent synchronization between MLS 510 and IS 520.
  • Learning agents cooperate amongst themselves and also with inference agents (IA) by responding to requests that change their set of attributes.
  • An example of this takes place as part of the adaptation of the link structure that links "inference agents" (see section 6.2 IS herein).
  • an inference agent named HLA, every so often, sends a message that is intercepted by all learning agents, which causes multiple pairs of learning agents to exchange local information, import/export attributes, and so on.
  • the augmented models then continue to be learned without the loss of previously learned probabilistic information.
  • Another example is the model update. Once a learning agent writes its latest version of a model to the CMR, the CMR notifies the appropriate inference agent about the new update.
• the inference agent, even if it is still using an older version of the model, is able to read the new version of the Bayesian model and use it for new sessions. As can be appreciated, there is a high degree of cooperation between inference and learning agents. Together they form an adaptive system influenced by changing data and domain knowledge.
6.1.3 Machine Learning Processes Overview
• MLS 510 implements two main processes: a one-time off-line building and learning process and an on-going on-line learning process.
  • the off-line process is an automatic process by which Bayesian network models are built using feature extraction.
  • the off-line process need not be run on the intelligence system server 162, since it is more of a background process.
  • hyperlinks are constructed that define the association between statistically related models, and an off-line learning process is performed to arrive at a fully learned model. Building a Bayesian model is an iterative process by which important attributes are determined, grouped and associated.
  • On-line learning is an incremental learning process, which reads only once through a given stream of data.
• the primary responsibility of the on-line learning process is to ensure that the models are updated as time progresses. While the off-line learning process need not run on server 162, the on-line learning process does run on server 162, since it has real-time interaction (e.g., updating) with models available to other real-time processes. For instance, the on-line learning process is able to automatically and quickly adapt Bayesian network models based on a real-time incoming click stream of data.
• the on-line learning process employs algorithms that modify the structure of the Bayesian network models, as well as track the changing parametric statistics over time in order to estimate each model's probabilities.
• On-line learning is a local learning process. Each model gets learned and improves itself over time. Off-line building, on the other hand, does attribute extraction from data. This requires that the whole domain of attributes (not just those that are local to a single model at a time) be considered. As such, off-line learning is a more involved process, and it requires iteratively reading the input data multiple times. While Bayesian network models may be built automatically using the off-line learning process, they could alternatively be constructed using a partially or completely manual process. As an example, partial building could mean manually ascribing a structure (e.g., a naive Bayesian structure), while automatically learning the probabilities.
• the on-line learning process can adapt models regardless of how they were originally learned, e.g., through the automatic off-line building/learning process or manually.
• the on-line learning process can also continue to learn models after modification, for instance, after a new attribute is added to the model, without losing previously learned parametric and structural information.
• the off-line learning process comprises the following stages (or steps):
• Input is provided via a serial.dat file that, as the name suggests, is a serial stream of contiguous blocks of data (referred to as serial cases).
• Sets of attributes (referred to as "intrinsic attributes") which are logically related (according to the hierarchy of codes defined in the totalproductinfo.dat) are formed, together with a set of non-intrinsic attributes that have been determined to be statistically correlated with the intrinsic attributes; this process includes a "feature extraction" stage (as referred to in the art).
• the MLS 510 of the AI module 220 provides an automatic way in which complex statistical models are learned, requiring very minor human intervention, e.g., only at the start of the process through the definition of the domain. This definition stage is carried out through a sequence of simple database operations that merge into a single file all of the products (or, more generally, AI entities) on which the AI system can learn and produce intelligent predictions.
• the off-line learning process is configured for use by non-technical enterprise users and is the basis for a relatively quick on-site AI integration phase (i.e., integration of the intelligence system 200 into an enterprise's system).
• the on-line learning process adapts (i.e., modifies over time) the Bayesian network models built by the off-line learning process, with an aim of ensuring that the models' representations of their respective domains remain accurate as those domains change over time. This means that both the probability distributions of the attributes and their interdependencies remain up to date.
• the on-line learning process has the following key aspects, in the preferred embodiment: 1) The input data stream is fully compatible with the data used in the off-line learning process, so that off-line and on-line learning can be interchanged or repeated at any time. 2) Statistics are accumulated in a local statistical repository (LSR); this feature is well suited for a dynamic, data-rich, real-world environment, such as with e-commerce sites on the Internet. 3) Incremental learning of parameters using a standard known algorithm, and incremental learning of the structure of each model. 4) An ability to learn with incomplete data, based on the EM-algorithm; inference is used as a completion mechanism for learning, followed by the formation of multiple parallel probabilistic cases, each with a real-valued weight. These cases then update the LSR using the same process as regular cases coming from complete data. 5) A configurable automatic "window-of-focus" permits the adaptation of models according to a decreasing level of importance for older data. This is constantly in effect, thereby making the new incoming data stream always more important for inference purposes. 6) Hyperlink topology adaptation (HLA), described further below.
  • the off-line learning process is the initial stage of the machine learning process.
• the first file encapsulates the domain for the AI module (e.g., the totalproductinfo.dat file).
• the second file is a sequence of blocks (or "serial cases"), analogous to a series of consumer orders, each block listing the full path names of AI entities which are related to each other by the fact that they are in the same serial case (e.g., a serial.dat file).
• each AI entity representation includes a comma-delimited sequence of codes.
• An example of an AI entity having 4 codes is: 'A1', 'BAK', 'BAKBAG', '100343'.
• the sequence may have a variable number of code names for different entities.
• An example of a code domain having a fixed code width equal to 4 that describes a product hierarchy is: Acode, Bcode, Ccode, productIDcode.
• a product is an AI entity that has a full path code, e.g., A1, BAK, BAKBAG, 12911 (which indicates a bagel product).
• the full path code shows that the product is from the A1 category, BAK subcategory, and BAKBAG sub-subcategory, with a product ID code of 12911.
  • any string name can be used as a code name.
• the bagel product comes from a defined Fresh-Food-Category having a code "A1".
  • the bagel product also comes from a defined Bakery- Category having a code "BAK", which appears as a subcategory under the Fresh-Food category.
• the data files, which convey associated items, are put in the format of a list of associated full path codes that are denoted as the serial.dat file, in the preferred embodiment.
• the two files form an AI domain and input definition according to which many real-world domains, whether they represent products, user profile values, etc., can be well-represented.
• a serial stream, rather than a parallel set of cases (as, for instance, required by other learning methodologies, such as neural networks), permits variable-length contiguous blocks of associated items. Consequently, the present approach permits a variable degree of evidence per serial case, which is typically the case in a real-world input data file.
• the serial case {A1.BAK.BAKBAG.1202, A2.ICE.APPL.493, B2.COF.SANK.303} simultaneously affects the A1 model, the A1.BAK model, the A2 model, the A2.ICE model, the B2 model and the B2.COF model, as sketched below.
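As a minimal illustration (the class and method names here are hypothetical, not from the patent), the fan-out rule can be sketched in Java: an entity code affects every model whose name is a proper prefix of the code, the final code being the product ID rather than a model.

```java
import java.util.*;

// Hypothetical sketch: expand one serial case into the set of models it affects.
// An AI entity code such as "A1.BAK.BAKBAG.1202" touches every prefix model
// ("A1" and "A1.BAK" here); the last code is the product ID and names no model.
public class SerialCaseFanOut {
    static Set<String> affectedModels(List<String> serialCase) {
        Set<String> models = new TreeSet<>();
        for (String entityCode : serialCase) {
            String[] codes = entityCode.split("\\.");
            StringBuilder prefix = new StringBuilder();
            for (int i = 0; i < codes.length - 1; i++) { // skip the leaf ID code
                if (i > 0) prefix.append('.');
                prefix.append(codes[i]);
                models.add(prefix.toString());
            }
        }
        return models;
    }

    public static void main(String[] args) {
        // prints [A1, A1.BAK, A2, A2.ICE, B2, B2.COF]
        System.out.println(affectedModels(Arrays.asList(
            "A1.BAK.BAKBAG.1202", "A2.ICE.APPL.493", "B2.COF.SANK.303")));
    }
}
```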
• the totalproductinfo.dat file is a listing of all valid AI entities and their respective code names.
• Figure 6A shows an example of sections of a totalproductinfo.dat file, including its format.
• each AI entity has a code name consisting of 4 codes (the first 4 values in each line). The remaining 4 values are description names of each of the codes, respectively.
• AI entity code names may have a variable number of codes.
  • the serial.dat file is a single stream of data used both for off-line building and on-line learning. It is defined as any stream of consecutive Al entity code names appended by an order number.
  • An example of a portion of the serial.dat file is shown in Figure 6B. Regarding the example of Figure 6B, several points may be appreciated.
• the serial.dat file is a collection of serial cases sorted by the order number of each AI entity, which is the last code in each line, wherein a serial case is a collection of AI entities.
• the number of codes of an AI entity may vary from one AI entity to the next.
• an attribute extraction technique includes the following key steps or aspects: 1) Collecting and forming sets of logically related attributes, i.e., related according to their hierarchy of codes.
  • each hyperlink has a value that is proportional to the mutual information between the two respective representative sets.
  • This coupling of models serves as a basis for a measure of information over the communication channel.
  • An inference agent when having the choice to receive information from multiple agents, decides from whom to receive, based on this coupling and also based on the evidence (or data) available in each of these agents.
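For concreteness, the coupling carried by a hyperlink is the standard mutual information between the two representative sets S1 and S2, which in the usual discrete form reads:

```latex
I(S_1; S_2) \;=\; \sum_{s_1,\, s_2} P(s_1, s_2)\,\log\frac{P(s_1, s_2)}{P(s_1)\,P(s_2)}
```

A coupling of zero means the two representative sets are statistically independent, so such a hyperlink carries no information between the models.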
• Figures 7A and 7B show examples of logical and topological relations linking models, respectively.
  • a topology of inference agents 700 is shown.
• model A2 702 is connected to model A1.DAR 704, which suggests that there is a strong association (the strength of which is available to the two models) between patterns of product orders on the domain of model A1.DAR and patterns of product orders on the domain of model A2.
• model A1.DAR 704 happens to represent dairy products ("DAR") and model A2 702 represents the domain of frozen foods.
• model A1.DAR 704 is represented as a folder "Dar" 708 that is subordinate to (or included in) folder "A1" 706, wherein folder A1 706 has a close relationship with A2 710, i.e., A1 706 is at the same level and proximate to A2 710 in the logical hierarchy.
  • the data serial.dat and totalproductinfo.dat files may be prepared using a stored- procedure database routine or some other external mechanism.
• the corresponding model definition based thereon has a variety of characteristics. For example, given a corresponding Bayesian Network model by the name of "Top" (shown in Figure 7B), all of the first code names (i.e., the first code in each line of the totalproductinfo.dat of Figure 6A) are defined as attributes of the model Top. Additionally, all triples of consecutive code names are obtained recursively: within a triple, the first code acts as a model name, the second code as an attribute name, and the third code as an attribute value for the respective attribute, as sketched below.
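A minimal Java sketch of this recursive definition (the class name and map layout are illustrative assumptions, not the patent's implementation):

```java
import java.util.*;

// Hypothetical sketch of the recursive model definition: within each full path
// code, every consecutive triple of codes is read as (model, attribute, value),
// with prefixes accumulated. E.g. "A1,BAK,BAKBAG,100343" yields model "A1" with
// attribute "A1.BAK" taking value "BAKBAG", and model "A1.BAK" with attribute
// "A1.BAK.BAKBAG" taking value "100343".
public class DomainBuilder {
    // model name -> (attribute name -> set of observed values)
    static Map<String, Map<String, Set<String>>> build(List<String[]> entityCodes) {
        Map<String, Map<String, Set<String>>> models = new TreeMap<>();
        for (String[] codes : entityCodes) {
            String model = codes[0];
            for (int i = 0; i + 2 < codes.length; i++) {
                String attribute = model + "." + codes[i + 1];
                String value = codes[i + 2];
                models.computeIfAbsent(model, m -> new TreeMap<>())
                      .computeIfAbsent(attribute, a -> new TreeSet<>())
                      .add(value);
                model = attribute; // descend one level: A1 -> A1.BAK -> ...
            }
        }
        return models;
    }

    public static void main(String[] args) {
        List<String[]> lines = Arrays.asList(
            new String[]{"A1", "BAK", "BAKBAG", "100343"},
            new String[]{"A1", "BAK", "BAKBRD", "100366"});
        // model A1 gets attribute A1.BAK with values {BAKBAG, BAKBRD};
        // model A1.BAK gets attributes A1.BAK.BAKBAG and A1.BAK.BAKBRD.
        System.out.println(build(lines));
    }
}
```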
• a Bayesian model by the name of A1 is created with the following attributes: A1.BAK and A2.VGP.
• the attribute A1.BAK has the following values: 'BAKBAG', 'BAKBRD'.
  • the attribute A2.VGP has the following values: 'FRZVGG', 'FRZVBE'.
• a model by the name of A1.BAK is created with the following attributes: A1.BAK.BAKBAG and A1.BAK.BAKBRD, taking the values '100335' to '100357' and '100366' to '100376', respectively.
  • an "intrinsic attribute" of a Bayesian model is an attribute whose name consists of the model's name as a prefix followed by an additional single code name.
• A1.BAK.BAKBAG is an intrinsic attribute of model A1.BAK.
• For a specific model X from the set: a) For each model M ≠ X (where model M represents each model in the set that is not model X): i) Compute the score S(X, M) and store it in list L1. b) Let there be d hyperlinks between model X and the d other models M whose scores S(X, M) are the top d scores in list L1, where a hyperlink is created using the exchangeAttributes algorithm (see the exchange attributes discussion below, and the selection sketch following this list). c) Store the d hyperlinks in list L2. d) Prepare a representative set for model X and for these d models.
• For each model X: a) Create and print the cases for model X into a file cases.dat in the respective directory of model X (see the case-creation discussion below). b) Print the attribute information for model X into a file varinfo.dat in the respective directory of model X.
• For model X, import the attributes listed in Lm and label them as non-intrinsic attributes for X.
• For model M, import the attributes in list Lx and label them as non-intrinsic attributes for M.
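The top-d selection in step b) can be sketched as follows; the score function S(X, M) is assumed to be supplied, and the class and method names are hypothetical:

```java
import java.util.*;
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch of hyperlink selection: for a model X, score every other
// model M with S(X, M), keep the d best-scoring models, and record a hyperlink
// to each (the attribute exchange itself is performed by exchangeAttributes).
public class HyperlinkSelector {
    static List<String> topD(String x, Collection<String> allModels,
                             ToDoubleBiFunction<String, String> score, int d) {
        // list L1: every candidate model M != X paired with its score S(X, M)
        List<Map.Entry<String, Double>> l1 = new ArrayList<>();
        for (String m : allModels) {
            if (!m.equals(x)) {
                l1.add(Map.entry(m, score.applyAsDouble(x, m)));
            }
        }
        // keep the d top-scoring models (list L2 holds the hyperlink partners)
        l1.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<String> l2 = new ArrayList<>();
        for (int i = 0; i < Math.min(d, l1.size()); i++) {
            l2.add(l1.get(i).getKey());
        }
        return l2;
    }
}
```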
• i) Attribute A1.BAK has {[NULL, 0], [BAKBAG, 0.5], [BAKBRD, 0.5]}, ii) Attribute A1.FRT has {[NULL, 1], [FRTA, 0], [FRTB, 0]}, iii) Attribute A2.VGP has {[NULL, 0], [VGPFZZ, 1], [VGPA, 0]}, iv) Attribute A2.ICE has {[NULL, 1], [ICEC, 0], [ICEYOG, 0]}.
• a single hyperlink defines a connection between MODEL1 and MODEL2, based on a shared set of attributes whose names are listed under SHVARS.
• the link has a coupling which is defined as the mutual information I(S1; S2), wherein S1 is a subset of shared variables originating from MODEL1 and S2 is a subset originating from MODEL2.
  • DRV models are used for predicting the probability of a NULL and then distributing this prediction as derived evidence to the associated models.
  • hyperlinks are not used for connecting the DRV models.
  • the cases.dat files are formed based on a conversion process from a serial serialsession.dat file to parallel cases that reflect the time of entry into a subcategory, for each subcategory of a given category.
• An example of a serialsession.dat file is shown in Figure 9; its format is similar to that of the serial.dat file.
  • the last code name in each line represents a session number.
• the serialsession.dat file lists AI entity code names in an incremental fashion. For instance, with respect to session No. 0, the file indicates that the consumer visited category TOP, then chose a link taking him to subcategory A1.BAK.BAKBAG (this counts as the first entry to category A1).
  • Off-line learning picks up where off-line building left off. This is a standalone routine that, from the parallel data cases produced by off-line building, learns a structure and estimates the probability parameters of a Bayesian model.
• the algorithms are primarily extensions of well-known, yet advanced, statistical principles, such as statistical model selection based on the minimum description length (MDL) criterion.
  • the structured learning process constructively forms an initial structure. It picks out a representative set for a model, then links the rest of the attributes based on their coupling to this set. Special care is taken when a set of attributes is said to be of particular interest for prediction.
• Let n be the number of attributes in model X, let N be the number of data cases, and let H(Ai | Pa_i) be the conditional entropy of Ai given its set of parents Pa_i (for the definition of conditional entropy see T. Cover & J. Thomas, Elements of Information Theory (John Wiley & Sons, 1991)).
• MDL-score(X) = Σ_i [ log(n) + log Comb(n, #Pa_i) ] + ½ log(N) Σ_i [ |Pa_i| · (|Ai| − 1) ] + N Σ_i H(Ai | Pa_i).
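Rendered in standard notation, and reading #Pa_i as the number of parents of attribute Ai and |Pa_i| as the number of joint configurations of those parents (an interpretation, since the extract does not define the bars), the score is:

```latex
\mathrm{MDL\text{-}score}(X) \;=\; \sum_{i=1}^{n}\Bigl(\log n + \log\binom{n}{\#\mathrm{Pa}_i}\Bigr)
\;+\; \tfrac{1}{2}\log N \sum_{i=1}^{n} |\mathrm{Pa}_i|\,\bigl(|A_i|-1\bigr)
\;+\; N \sum_{i=1}^{n} H\!\left(A_i \mid \mathrm{Pa}_i\right)
```

The first term prices the description of the structure, the second prices the table parameters, and the third is the (negative log-likelihood) fit of the data; structure search seeks the minimum.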
  • a DRV model has two types of attributes, NULL and TIME with an outgoing link from NULL to TIME. Thus it has a fixed structure, as shown in Figure 10.
  • the NULL attribute takes two values ⁇ NULL, OTHER ⁇ while the TIME attribute takes values ⁇ 0,1,...,M ⁇ , for some constant M > 0.
  • a value of "OTHER" indicates that the consumer did enter the subcategory and the associated time of entry is presented as a value to the corresponding TIME attribute.
• the reason for having NULL feed an arrow to TIME is that NULL is viewed as the class variable to be predicted.
• the model presents two possible class-conditional distributions over the domain of TIME: the first is P(TIME | NULL = NULL) and the second is P(TIME | NULL = OTHER). Learning processes take care of estimating these two distributions.
  • off-line learning simply learns the probability estimates for each of the conditional probability tables of the DRV model as it learns any other "regular" model.
  • MLS 510 includes the following aspects:
• Bayesian models can be changed structurally, i.e., links and attributes alike, with minor loss of parametric information that has been previously computed based on past data.
• communication between distributed components is carried out over an object request broker (ORB), such as Voyager™ by ObjectSpace, Inc. of Dallas, TX, with components deployed across multiple Java virtual machines (VMs or JVMs).
• a component is a modular sub-part of the on-line MLS learning system or of the AI system as a whole.
• a QRC™ agent is an XML-based object-oriented communication messaging protocol.
• a table 1400 describing various QRC™ agents is shown in Figure 14. Different AI components communicate using this protocol.
• Client stands for any external event-generating server, for instance, an Internet server that passes "product-order" events made by on-line users (e.g., client 154 and consumers 110 of Figure 2). The interaction between these components can be appreciated with respect to the on-line MLS learning system shown in Figures 12A and 12B.
  • FIG. 12B is a partially decomposed portion of Figure 12A and the interactions between the MLS on-line learning system components are described in table 1300.
• table 1300 indicates, in row 1310, that the learning manager (LM) 1202 sends a QRC™ agent to the CMR 1208.
• the "Interaction Type" of each component is with respect to another component (not a component's partner). Additionally, with respect to table 1300:
• - Admin stands for an external plug-in used to administer/configure the on-line MLS.
  • each LU accesses the same serial.dat file stream (e.g., input data files 1212 of Figure 12B). After completion of processing the current serial.dat file, each LU polls to check if a new serial.dat file has been placed.
• the format of the serial.dat file is identical to that used for off-line building (see Figure 6B).
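A minimal sketch of the polling behavior described above (the file location, poll interval, and method names are assumptions for illustration):

```java
import java.io.File;

// Hypothetical sketch of a learning unit's input loop: process the current
// serial.dat, then poll until a new file is dropped into the input directory.
public class SerialDatPoller {
    public static void main(String[] args) throws InterruptedException {
        File input = new File("input/serial.dat"); // assumed location
        long lastProcessed = 0L;
        while (true) {
            if (input.exists() && input.lastModified() > lastProcessed) {
                lastProcessed = input.lastModified();
                processSerialFile(input);          // filter + form parallel cases
            }
            Thread.sleep(5_000);                   // poll interval is configurable
        }
    }
    static void processSerialFile(File f) { /* extract model-relevant cases */ }
}
```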
  • the LU's input provider forms parallel cases from each serial case using the same mechanism as described in section 6.1.6 above for creating cases for models.
  • Each LU 1206 contains a tree-like data structure that holds statistics necessary and sufficient to estimate the probabilities of any of the conditional probability distributions of the relevant model.
• a structure known by the name of AD-tree 1214 (see A. Moore & M. S. Lee, Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets, Journal of Artificial Intelligence Research 8 (1998)).
• each learning agent uses a standard state-space search strategy to find an optimal (in the local sense) structure for the model, i.e., one which has an optimal MDL score (see, for instance, D. Heckerman, A Tutorial on Learning with Bayesian Networks, Microsoft™ Technical Report MSR-TR-95-06 (1996)).
  • the search is similar to that described with respect to Figure 9, except instead of basing the estimate for the conditional probability distributions on the cases.dat file, the learning agent uses the statistics in its LSR. For a given current structure, the probability estimates of each of the conditional probability tables in a model are computed based on the contents of the same LSR.
  • a learning agent writes its updated model to the CMR via the communication between its LU and the CMR (see Figures 12A, 12B, 13A and 13B).
  • the frequency of updates is configurable via a parameter in the MLS configuration setup on the server 162 of Figures 1 and 2.
  • a learning agent computes its model's representative set using the same mechanism as described with respect to the algorithm for preparing rep-sets in section 6.1.6 above. All probabilistic knowledge for computing the scores of each attribute is available to the learning agent 1216 based on the content of its LSR.
• a Bayesian model that has been updated and written is saved in an XML format which can be read using the JavaBayes editor (JavaBayes is a public domain Java object editor provided by the Free Software Foundation, Inc., Cambridge, MA, and useful for creating and maintaining object oriented Bayesian Models).
  • manual changes to Bayesian models may be accomplished, such as adding new attributes or deleting existing ones, or adding/deleting values of attributes.
• the MLS on-line learning process is capable of reading (via the CMR) the XML file and building the Bayesnet model (or Bayesian Network Model) based on its newly changed structure.
  • the LU 1206 of the particular model ensures that any new serial cases will get translated into parallel cases (as described above in this section) that conform to the new structure.
• the LSR does not lose any statistical information that was learned for the prior structure of the model and which is still needed by the new structure. The only statistics deleted are those that are no longer necessary under the new structure. The algorithm that determines which statistics to hold and which to delete from the LSR is described above in this section with respect to the structured learning process.
6.1.13 Learning with Incomplete Data
  • the MLS on-line learning system 1250 can learn from partially-complete cases.
• the corresponding algorithm, which is an implementation of the well-known EM algorithm, is as follows:
  • An external configuration file identifies some models as "models with incomplete cases". For an attribute of such a model, a NULL value is never inserted by the learning unit's input provider. This is true both when this attribute appears as an intrinsic attribute in the model itself or as a non-intrinsic attribute in some other model.
  • the learning agent employs an object called the Learning Assistant 1218 which utilizes the latest version of the learned model in conjunction with an inference engine described in section 6.2 below.
• Given a case, the learning assistant 1218 first sets respective values for all attributes having a non-NULL? value. The learning assistant 1218 then computes a joint probability distribution P for the subset of attributes that have a NULL? in the current case.
  • the learning agent 1216 transforms the original case c into k complete cases, each having a different combination comb of values over the attributes that originally had a NULL? value.
• the learning agent 1216 increments its local statistics repository (LSR) due to each case, but with the increment being not 1 but rather the weight of the case (a number between 0 and 1), as sketched below.
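A minimal Java sketch of this weighted expansion (the record and method names are hypothetical; the joint distribution over the unknown attributes is assumed to be supplied by the learning assistant's inference engine):

```java
import java.util.*;

// Hypothetical sketch of the EM-style completion step: a case with unknown
// ("NULL?") attributes is expanded into k complete cases, one per value
// combination, each carrying the combination's probability as its weight.
// The LSR counts are then incremented by the weight instead of by 1.
public class CaseCompletion {
    record WeightedCase(Map<String, String> values, double weight) {}

    static List<WeightedCase> complete(Map<String, String> observed,
                                       Map<List<String>, Double> jointOverUnknowns,
                                       List<String> unknownAttrs) {
        List<WeightedCase> out = new ArrayList<>();
        for (Map.Entry<List<String>, Double> comb : jointOverUnknowns.entrySet()) {
            Map<String, String> full = new HashMap<>(observed);
            for (int i = 0; i < unknownAttrs.size(); i++) {
                full.put(unknownAttrs.get(i), comb.getKey().get(i));
            }
            out.add(new WeightedCase(full, comb.getValue())); // weight in [0, 1]
        }
        return out;
    }
}
```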
• the Hyperlink Adaptation (HLA) algorithm enables the adaptation of the hyperlinks used to connect the inference agents described in section 6.1, and is as follows: 1) Let there be an existing collection of Bayesian models, M1, M2, ..., Mk, fully learned and deployed as part of the on-line MLS.
• 2) Let a model named HLA be defined as follows: define an attribute HLA.Mi in the HLA model to correspond to each model Mi in the above collection.
• the attribute HLA.Mi takes two possible values {T, F} (i.e., true or false).
• model HLA is manually defined so that it is sufficient to represent every possible joint probability distribution over any possible pair of models Mi and Mj.
• the serial.dat file is augmented using the following routine, for each serial case: a) Determine the list of models, every one of which has at least one AI entity code that has the model as a prefix. Let this list be denoted as L. b) For each model X in L do: i) Form the AI entity code HLA, X, T. ii) Append the AI entity code to the end of the current serial case.
• model HLA learns, and the corresponding inference agent (IA) for model HLA can compute any pair-wise joint probability distribution P(HLA.Mi, HLA.Mj).
• a module called HLAM uses the HLA IA for computing the mutual information I(HLA.Mi, HLA.Mj) between all pairs of attributes.
  • the HLAM creates a new list of hyperlinks L as follows: for each model X, connect the d highest coupled models to it.
• the HLAM deletes duplicate hyperlinks from L.
• the HLAM considers each of the learning units 1206 once in turn and repeats for each one the following: a) The HLAM sends the list L via a QRC™ agent message to the learning manager 1202 containing the currently considered learning unit 1206. b) Once having received L, a learning unit 1206 instructs its learning agent 1216 to: i) Delete non-intrinsic attributes that came from currently linked neighboring models. ii) Import the new attributes from the representative set of the new model to which it is to be connected based on the list L. c) The LA 1216 imports these attributes from each of the d to-be-linked neighbors in a similar manner to the model building algorithm described in section 6.1.6.
  • the inference system (IS) 520 parallelizes the task of inductive inference.
  • Inductive inference is the process of making intelligent decisions/predictions given observations (also referred to as evidences) which have not necessarily been part of an original data set used to learn a Bayesian model.
  • the first is the evidence channel, which is an event-based, asynchronous channel by which observations are provided to IS 520.
  • the observations are provided, in parallel, for any subset of attributes of any of the Bayesian models.
  • the second channel provides the means of obtaining the prediction/decision output from any of the Bayesian models. This second channel is based on an interface that defines several types of outputs, which are called recommendations.
  • the inference system 520 merges two independent technologies, parallel distributed processing and artificial intelligence, in implementing an efficient, fault-tolerant and accurate prediction/decision inference process.
• the IS has the ability to provide a potentially unlimited amount of AI resources (i.e., limited only by the enterprise's number of computers), which are scalable and configurable, while simultaneously keeping a dynamic and efficient representation of the ever-changing state of the domain (e.g., simultaneous on-line user actions over multiple sessions) such that accurate predictions (i.e., recommendations to the users) can be promptly and reliably made.
  • the IS 520 is a main part of a single unified artificial intelligent "brain" from which AI inference is performed.
• IS 520 comprises multiple autonomous inference agents (IAs) that utilize the Bayesian models created by the MLS 510 to make the recommendations, wherein inference is performed using an intricate collection of localized algorithms (local at each agent). Inference agents cooperate, that is, they share "beliefs" on the current state of the "world". The inference agents listen to each other, taking various measures of confidence and information values into account before deciding on their recommendations.
  • IS 520 uses Bayesian model networks to accomplish inference with partial evidence.
• the use of Bayesian model networks to represent dynamic, complex (e.g., e-commerce) applications suits well the dynamic nature in which information appears and changes in a typical on-line product-recommendation application, for example.
• the IS is configured such that it can give recommendations from the very start of a consumer's session, even if, for instance, personal information about the user is missing and all that is known about the consumer is learned from current actions made on the site.
  • One of the primary advantages of having a distributed AI system 220 is in its ability to learn concepts on a multi-resolution level. This provides a multi-degree of inference about concepts. For instance, consider inferring about the concept of "Milk” . This concept is, on a broad level, related to product families such as "Frozen Foods”, “Packaged Foods”, “Drinks” and so on. In the preferred embodiment, the representation of milk takes place over several Bayesian models, each model having a context in a different inference level. As discussed in section 6.1, these models are automatically built using the MLS 510. These simultaneous multiple context levels for the same concept, for all concepts in the domain, permit a rich variety of inference capabilities that suit different states of evidences and thereby increase the chances that even with partial evidence highly accurate predictions are possible.
  • Most e-commerce applications involve dynamically changing information. For instance, consider an on-line session of a consumer on an e-commerce Web-site. The consumer goes from one link to another, continuously changing the location on the site, adding and deleting products to a shopping cart. In general, not only the static information about the consumer, such as his gender, age and demographic information, is important for predicting his product liking, but also dynamically changing information.
• the dynamic inference state representation (ISR) ensures that all types of evidence, whether they originate from static database tables or from rapidly changing user click data, are taken into account prior to product recommendation or, more generally, prior to any AI prediction.
• any user behavior, for instance expressing a variable degree of interest in a product, which can be formulated either based on direct consumer-related events or on indirect effects (e.g., time of entry to a category, pattern of category entries), can provide an important extension of the Bayesian models.
  • the present invention includes a capability for deriving evidence that is based on predictions made by a separate set of models.
• These models are denoted as Derive Models (DMs or DRV models), and are not used for predictions geared for recommendation to the outside, but rather only for AI-internal purposes.
• the DMs can be defined to monitor various events, such as category-change events, product-order events, and events that reflect a user's level of interest (for instance, the amount of time a user spends at a particular category), and are used to predict various internal attributes of the recommending Bayesian models (i.e., the models that are used for external recommendation to the on-line user or to the business marketers). Even so, the DMs are in many respects similar to the recommending models; they are off-line as well as on-line learned, and thus can be updated over time.
  • the IS 520 generates evidence which does not directly come from the external world, e.g., does not reflect the purchase of a product or the change to another category on the site, but yet is usable by the recommending models.
  • This evidence is probabilistic with varying degrees of confidence and is available for the same recommending models that usually get their evidence deterministically directly based on external events.
  • a recommending Bayesian model which has been learned and is being used for inference based on direct evidence, can also utilize this other source of information, the DMs, which derive evidence based on more subtle combinations of events or the absence thereof.
• The purpose of inference agents is to deliver intelligent predictions and decisions concerning any subset of a fixed set of variables in a fixed and defined Bayesian model, while automatically and continuously collecting as much evidence as possible from different information sources, which include, but are not limited to: dynamically changing information from other inference agents, static information residing in a database, and evidence due to real actions taken by the current on-line consumers via server 152.
  • An inference agent is, in the preferred form, a complex self-autonomous Java code utilizing an inference engine (IE) for computing probabilistic inferences that are used for delivering predictions on variables in the Bayesian model network.
  • the IA has various interfaces which connect it to the various sources of information, based on which the IA receives (and also sends) probabilistic information as evidence. Also, the IA employs several algorithms for information communication, all of which aim at improving the value of information that it receives. These algorithms include probability-distribution emphasis mode (which may be viewed as a filtering action) and absorption from more informative peers. 6.2.4.1 Classification and Prediction Modes
  • the most common prediction mode used with Bayesian model networks simply provides the probability distribution of the targeted variable. For doing classification (as is known in the pattern recognition field) the well-known Maximum Aposteriori Probability (MAP) decision process can then be applied.
  • the inference agents of the present invention can predict and classify in this manner.
  • the IS 520 also implements a score-based classification mode that provides, for such Bayesian model networks, a higher rate of accuracy than the classical MAP decision process.
  • the score-based classification mode is referred to as the DELTA classification mode.
• each mode is a different formula that assigns a real-valued number to each of the queried attribute's values. The decision is then to choose the value with the largest score.
• an inference agent first computes the no-evidence probability distribution for each variable in the Bayesian model, which is denoted as P_no-evid; once evidence has been entered, the probability distribution for an attribute X is denoted as P_evid(X).
  • Each inference agent constantly distributes probabilistic evidence to a fixed set of peers, as soon as it receives deterministic evidence that comes from the inference manager (IM) or evidence from a peer agent. This is conducted passively, independent of recommendation requests that may arrive at any time to any of the agents.
  • the purpose of this process is to make the IS agents (or inference agents) as aware as possible of the dynamically changing evidence simultaneously for all on-line sessions on the server 162, so that when a prediction request for any particular agent arrives, the chances are that the corresponding agent has evidence, even if the user did not necessarily act in a direct context of the agent's model, e.g., agent Dairy will still have evidence even if a user did not buy a Dairy product.
  • a communication network links all inference agents to their peers and to the source of evidence.
  • the inference manager serves as an interface to server 162, which is a source of evidence.
  • This is not a communication network as used in the field of data communication, but is rather a logical software-based graphical representation of the relationships (i.e. , links) between agents.
• probabilistic information, in the form of Java objects representing discrete probability distributions, is passed.
• One IA passes its 'belief' to another IA. For instance, evidence due to a placement of a Dairy product in the session "basket" can become evidence that the user most likely is not interested in Meat products, but is interested in Fresh Cheese products.
  • This evidence is dispersed by the Dairy Agent passing its belief to the Meat Agent and to the Fresh- Food Agent.
  • the present invention includes various features related to making information pass efficiently amongst the agent network, such as cycle avoidance to avoid redundant message passing and caching of previous probabilistic evidence distributions.
• each inference agent determines when and from whom to request evidence. This is dynamically determined based on several considerations, such as the amount and value of evidence that the agent peer has obtained. The value relates to how many stages (i.e., agents) the evidence propagated through before it influenced the peer agent, and also to the statistical coupling between the two inference agents.
  • each inference agent in the intelligence system has a pool of inference engines (IEs) on which it runs all inference computations. Having multiple identical service providers, such as these engines, permits concurrent (or parallel) processing, which can reduce significantly the performance times.
• each inference engine is a Java implementation of a well-known sum-product algorithm, specifically the bucket elimination algorithm. This algorithm is extended in several ways in the present invention, as described below.
  • One of the most important features of the AI inference system 520 is to be able to predict in the domain of one context when evidence is present in other contexts. For example, in the e-commerce field this may be referred to as doing cross-category recommendation.
  • each inference agent uses a single Bayesian model for prediction. This model has a particular context and its attributes are features that describe this context in various ways. For instance, an agent for dairy foods has attributes that characterize dairy products. Therefore, there is a certain context attributed to every agent. Additionally, the agents in a pool of inference agents cooperate by passing probabilistic messages to each other in order to give a cross-context prediction capability.
  • Agent Hopping pertains to the formation of a broad-recommendation to the on-line user based on the collective inferences of many inference agents, while also adhering to the main objective (as always) that the recommendation is based on prediction, which is based on the evidence at hand and thus fits the particular user.
  • the communication of these inferences occurs as a function of the relationships between inference agents.
• inference agents are related in two ways. The first is a logical relationship, which is taken directly from the underlying logical hierarchy according to which the context categories of all agents are interrelated.
  • the second is a topological relationship, which is based on a link structure that is statistically formed during the operation of the off-line machine learning system.
• Figure 7A can be used to depict a topology of inference agents.
• agent A2 702 is related to agent A1.DAR 704, which suggests that there is a strong association (the strength of which is available to the two agents) between patterns of product orders on the domain of A1.DAR (which represents dairy products) and on the domain of agent A2 (which represents frozen foods).
  • Figure 7B can be used to depict the logical hierarchy of the inference agents.
  • the Agent Hopping algorithm carries out a hopping sequence through the topology of inference agents.
• a sequence of agents is automatically selected, for instance, agent A2.BRK, A1.MTS, A2.ICE, etc., and eventually a recommendation is made based on the available evidence of the inference agents selected (i.e., not hopped over).
  • the selection process is based on a dynamically controlled series of tests that are made once the hopping reaches a certain agent.
  • the tests include:
  • the hopping path is state or evidence dependent, i.e., it is not hard-wired but rather each session may lead to different hopping paths since each session may have a different state of evidence.
  • Agent hopping is used to form a broad recommendation that spans the recommendations of multiple agents and aggregates them into one recommendation.
• the intelligence system 200 may make a prediction or recommendation on the same context by more than one agent by using "agent voting".
• For instance, suppose there is a variable named abandoned, which may take a value of "yes" or "no" and which indicates whether an on-line user will abandon his/her electronic shopping cart and not buy at all. Let there be a network of n agents, each with a different context, while all have the attribute abandoned in addition to their specific context attributes.
• An "agent voting" algorithm combines the weighted predictions of all n agents on the variable abandoned and outputs a single yes or no prediction. The algorithm takes as the weight of an agent the confidence that it has in its prediction. This confidence is defined as the L1-norm of the score vector (see Section 6.2.4 regarding classification and prediction modes). The algorithm then uses one of the following possible decision modes:
• Majority Vote outputs the most common decision amongst all n agents, as sketched below.
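A minimal Java sketch of two voting modes (the confidence-weighted variant is an assumption consistent with "takes as the weight of an agent the confidence that it has in its prediction"; class and record names are hypothetical):

```java
import java.util.*;

// Hypothetical sketch of agent voting on a binary attribute such as
// "abandoned": each agent contributes its predicted value together with a
// confidence weight (the L1-norm of its score vector).
public class AgentVote {
    record Vote(String value, double confidence) {}

    // Majority vote: the most common decision amongst all agents.
    static String majority(List<Vote> votes) {
        Map<String, Long> counts = new HashMap<>();
        for (Vote v : votes) counts.merge(v.value(), 1L, Long::sum);
        return Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    // Weighted vote: each decision accumulates its agents' confidences.
    static String weighted(List<Vote> votes) {
        Map<String, Double> mass = new HashMap<>();
        for (Vote v : votes) mass.merge(v.value(), v.confidence(), Double::sum);
        return Collections.max(mass.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        List<Vote> votes = List.of(new Vote("yes", 0.9),
                                   new Vote("no", 0.3), new Vote("no", 0.4));
        System.out.println(majority(votes)); // "no"  (two agents against one)
        System.out.println(weighted(votes)); // "yes" (0.9 outweighs 0.3 + 0.4 = 0.7... equal? no: 0.9 > 0.7)
    }
}
```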
• a QRC™ agent is an XML-based object-oriented communication messaging protocol; a table 1600 describing various QRC™ agents is shown in Figure 16. Different AI components communicate using this protocol.
  • Client stands for any external event-generating server, for instance, an Internet server that passes "product-order" events made by on-line users (e.g., client 154 and user 110 of Figure 2).
• table 1500 indicates, in row 1510, that the inference manager (IM) 1710 sends a QRC™ agent to CMR 1708 to request a net structure object.
  • the "Interaction Type" of each component is with respect to another component (not the component's partner).
• - Accept/Send means the component performs this action.
• - Invoke Method means the component invokes the method locally (not through Voyager™).
  • the CMR 1708 (denoted as 1208 in Figures 12A and 12B) is responsible for maintaining all the AI Bayesian models and serves as the librarian of the AI system 220. Both MLS 510 and the IS 520 need to access the Bayesian models and they do so via CMR 1708.
  • the inference agents' primary interaction with CMR 1708 is in ensuring that they have the most recently updated Bayesian models so that their recommendations will be more accurate and up to date.
  • the IS 520 interaction with CMR 1708 is in reading models.
  • the MLS 510 includes multiple learning agents and is concerned with updating models by writing them to CMR 1708.
  • the CMR maintains an XML representation of an agent-network topology (also referred to as the net) of the inference agents, e.g., 1730, 1740, and 1750.
  • This topology dictates the interaction and communication direction between pairs of inference agents 1730 and 1740, for example, and is based on the findings of an "automatic building" process, which is the initial process of MLS off-line learning.
• the CMR reads the net hyperlinks XML file, produces a net flow file, and builds a net structure object (e.g., a hash table).
• CMR 1708 receives a request (via a QRC™ agent) to supply the net structure to the IM. This process is discussed in more detail below with respect to the scenarios of the IS.
  • the Bayesian models are updated by MLS 510, wherein CMR 1708 is in charge of sending messages of model updates to the IM 1710 (or IMs).
  • the IMs 1710 let the relevant inference agents, via remote method invocation on the CMR, obtain the actual updated models, which is also discussed in more detail below with respect to the scenarios of the IS.
  • Files containing the Bayesian models are organized within a directory hierarchy.
  • a flow-handler computes the direction of evidence flow over each hyperlink.
• An example of a flow network is depicted in Figure 7A.
• the agent A2 has a downstream neighbor A2.EN2 and an upstream neighbor A1.DAR.
6.2.6 Inference Manager (IM) 1710
• IM 1710 is the gateway for its set of agents to the rest of the server. As indicated in table 1500 of Figures 15A-15C, there may be several IMs. Sessions are split over these IMs so that each session obtains its intelligent resources from a single IM 1710. Each IM has its own set of inference agents 1730, and each inference agent serves all possible sessions that are served by its respective IM 1710. Each IM receives QRC™ agents from other AI sub-parts, such as the CMR 1708, learning managers 1202 (see Figures 12A and 12B), and AI business objects, which interface IS 520 to other non-AI sub-systems. As shown in table 1500, the IM receives QRC™ agents of the following types: 1) user-driven events, which originate from the clients; 2) recommendation requests from AI business objects; and 3) model update notifications originating from the CMR.
  • an IM 1710 delegates the responsibility for the actual handling of the events to its inference handlers (IH) 1720 and 1722, and each IH runs in its dedicated Java thread.
• the IHs 1720 and 1722 are maintained in a fixed-size pool that permits a balanced parallel distribution of all requests handled by a single IM 1710.
  • the IM keeps a representation of its net topology of agents, wherein different agents may be placed on different Java Virtual Machines.
  • the IHs need to know about the location of these agents in order to perform global operations which involve more than a single agent, for instance, to cause voluntary absorption between pairs of agents, to query multiple agents in a sequence based on the Agent Hopping module, and so on.
• the structure of the net topology of agents is represented as a list of URL proxy addresses of all inference agents of the relevant IM 1710. Every IM uses its respective hyperlink file as the source for knowing which inference agents to create and deploy on-line. Each IH, in a pool of IHs, is in charge of activating the inference agents for answering a recommendation request. A hyperlink is defined between two agents, or more precisely between their respective models, for which there needs to be a link of communication. Hyperlinks are represented in the AI system 220 both in memory, inside the data structure of the IMs, and also in an XML file format, as shown in Figure 8.
  • the file of Figure 8 is read by the CMR and can either be formed manually or be generated via the MLS off-line model building process discussed in section 6.1.8, given the proper data and domain definition files.
• a single hyperlink defines a connection between MODEL1 and MODEL2 based on a shared set of attributes whose names are listed under SHVARS.
• the link has a coupling that is defined as the mutual information I(S1; S2), where S1 is the subset of shared variables originating from MODEL1 and S2 is the subset originating from MODEL2.
  • the IH 1722 is responsible for performing operations on the set of inference agents (e.g., IA 1730, IA 1740, IA 1750) dedicated to its IM 1710.
• the IH handles operations such as business object recommendation requests, which may involve a series of operations on different inference agents.
  • the result returned from each inference agent could determine the selection of the next inference agent upon which to operate.
• This form of activity requires the IH 1722 to run asynchronously from IM 1710, keeping IM 1710 free to process further incoming requests. To do so, IH 1722 operates in its own dedicated thread, using IM 1710 services (i.e., methods) to access and traverse the agent network.
  • an IH When handling QRCTM agents that require a reply (such as recommendation requests arriving from AI Business Objects), an IH is responsible for sending the reply to the intended listeners. Since some of the high level operations of IH 1722 could be application dependent, it is possible to create specialized IH Java objects that inherit all characteristics from the general IH, but that also contain additional application or client specific capabilities.
  • Each inference agent (e.g., 1730, 1740, and 1750) is responsible to deliver predictive inference capabilities based on a single Bayesian model. Furthermore, each inference agent serves all on-line sessions to which its IM is in charge of delivering intelligent services. Each inference agent 1730 remembers the state of evidence for each such session separately, so that when doing predictive inference it gives out special recommendations which are personalized to the profile of the particular user (e.g. , user 110) on each session.
• the main tasks of an IA are: 1) Loading its model; the IA ensures prompt replacement of the model when the CMR has updated it. 2) Answering predictive queries about any attribute in its model; the IA understands the various recommendation requests through an inference API, and handles these requests for every session. 3) Maintaining the state of evidence, deterministic and probabilistic, concerning the model, for every session. In order to handle multiple sessions, an IA keeps a list of inference session objects that record and maintain the state of evidence for each session, as discussed above in section 6.2.7.
• Evidence is propagated by first converting the evidence to a probability distribution over one or more attributes. Both the sender and receiver of this evidence must have a set of shared attributes over which this distribution is well-defined. Each inference agent keeps a list of neighbors, including the direction of evidence flow from/to each of them. Session evidence (which is client driven) enters the server 162 and gets sent to the appropriate inference manager (e.g., IM 1710). The IM sends the evidence (via remote method invocation) to one or more inference agents as "real evidence", namely, evidence which directly reflects a real event that the on-line user 110 has generated, e.g., a product order, a category change, etc.
  • the real evidence is immediately translated into deterministic or probabilistic evidence, in the case where there needs to be multiple values placed on a single attribute.
  • the inference agent uses its IE (e.g., IE 1734) to re-compute the probability distribution over all shared sets with all its neighboring inference agents.
• the inference agent passes (i.e., propagates) evidence only to those neighboring inference agents which are "downstream" with respect to it, based on the flow structure.
  • Each IA can handle three forms of evidence input:
  • Probabilistic evidence is transferred as a list of discrete functions, whose product is the probability of the shared variable(s). This multi-function form is done to avoid the creation and transformation of high-dimensionality probability functions.
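A minimal Java sketch of this factored representation (names are hypothetical): the message carries small discrete functions, and the probability of any full assignment over the shared variables is the product of the factors, so no high-dimensional table is ever materialized.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of factored probabilistic evidence: a list of small
// discrete functions whose product defines the shared-variable distribution.
public class EvidenceMessage {
    // one discrete function: its variables, and a table from their joint
    // value assignment to the function's value
    record Factor(List<String> vars, Map<List<String>, Double> table) {}

    private final List<Factor> factors;

    EvidenceMessage(List<Factor> factors) { this.factors = factors; }

    // probability of a full assignment = product of every factor evaluated
    // at the values its own variables take in that assignment
    double probability(Map<String, String> assignment) {
        double p = 1.0;
        for (Factor f : factors) {
            List<String> key = f.vars().stream().map(assignment::get).toList();
            p *= f.table().getOrDefault(key, 0.0);
        }
        return p;
    }
}
```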
  • the inference agent When receiving new evidence the inference agent performs the following steps: 1) It identifies the inference session object for which new evidence has arrived.
  • the involuntary evidence propagation mechanism described in the last section aims at spreading information-rich evidence to as many inference agents as possible.
• the mechanism operates in a background manner and information flows in the direction of the flow structure. This flow direction ensures that an agent which is more coupled, i.e., has neighbors with a high mutual information coupling over its shared attribute sets, gets to be the sender of information to its downstream neighbors. As indicated before, this operation is independent of the query activity that agents undergo.
  • the quality of the evidence plays an important role here, since evidence is probabilistic and thus has a certain information-content, i.e., parts of which are noisy. For this reason, there is an extended ability by which an inference agent can voluntarily absorb evidence from "trustworthy" neighboring agents.
• the meaning of trustworthy is based on a combination of the coupling of the hyperlink between the agent and its neighboring agent, and also on the confidence in the neighbor at the time of absorption. It is also a function of the strength of evidence that the neighbor has, the latter being inversely proportional to the distance that the evidence propagated since the time it was first placed as real evidence at some terminal point in the agent network. Voluntary evidence absorption is performed by an inference agent prior to performing one or several possible inference computations.
• the idea is to absorb evidence from a set of agents that does not include the agent's upstream neighbors, since they have already involuntarily propagated evidence to the agent.
  • the absorption is performed recursively so that an agent X receiving an absorption request from agent Y performs an absorption itself from its own neighbors (not including Y) prior to computing the probabilistic evidence that it returns to Y.
  • a timestamp for each probabilistic evidence is kept, which is the value of a time counter of the agent that originally sent it (note that an agent maintains a time counter per each session).
  • the absorbing agent Y When the absorbing agent Y enters a request to X, it sends to it the timestamp of the last evidence that it got from X. Upon receiving this absorption request, X performs the following:
  • the agent If the received timestamp is not older than the timestamp of the last evidence received on this session from this particular neighbor, then the agent returns with an indication that no absorption is necessary. Again, this is because the requesting agent already has the most recent belief (or evidence) based on its timestamp.
  • the agent checks its cache for the requested calculation result, and returns it if it is found.
  • the session object is updated, i.e. , the agent X changes its internal state due to the absorbed evidence.
• When receiving a simple query request, the IA will perform the following: 1) Get a snapshot of the session's state, which includes all of the session's information. The idea here is to capture the state of the session, making the calculation that is about to be performed insensitive to incoming evidence or ongoing concurrent queries. 2) If a result for the requested calculation is found in the snapshot cache, then it is immediately returned as the result of the computation, which saves the need to reactivate the inference engine and re-compute the same prediction again.
  • the returned objects supplied to the inference manager are lists of values obtained from a predictive query operation, and are referred to as recommendations.
  • a recommendation has both a value and a score, for example, ⁇ Coffee, 0.56 ⁇ .
  • a recommendation can be given on any value of a Bayesian model's attribute.
  • the recommendation object contains the following information:
• 1) Subject of recommendation: the actual recommended value (e.g., A1.BAK.BAKMUF.10499).
• 2) Context of recommendation: the attribute whose value was recommended (e.g., A1.BAK.BAKMUF). 3) Local score: the score of the recommendation indicating the strength of belief in the specific value of the attribute (e.g., 0.343).
• 4) Context score: a vector of scores indicating the strength of belief in the context in which the recommendation is given. If the recommendation was performed after a certain traversal of the agent network, the context score represents the strength of belief in each traversal step (e.g., <0.33, 0.56, 0.77>).
  • a group of recommendations is kept in a modular object called a RecommendList.
  • a RecommendList can be collected at a given Bayesian model or during the execution of a high level recommendation algorithm such as the Agent Hopping algorithm.
  • the recommendation list is sorted by context score as a primary key and local score as a secondary key. Other forms of sorting are also possible.
  • the inference currently handles five types of recommendation requests:
  • Var-Recommend: a recommendation on the values of a specific variable.
  • Shallow-Recommend: a recommendation on the attributes of a model, produced by performing a Var-Recommend on the Class variable of the model.
  • Deep-Recommend: a recommendation on a selected number of intrinsic attributes of a model. First the selected attributes are chosen by performing a Shallow-Recommend on the model; then a Var-Recommend is performed on all of the chosen attributes.
  • Agent-Hop-Recommend: a higher level recommendation produced by an algorithm that traverses the agent network.
  • Agent-Vote-Recommend: a group of agents that can predict a given attribute are activated; the final recommendation result is taken as a function of the confidence of each of these agents and the number of agreeing agents.
  • All recommendation results are returned in the form of a recommendation list.
  • PROBABILITY mode: the local score of recommended subject values is determined by their probabilities.
  • DELTA mode: the local score of recommended subject values is determined by the change in their probability values given the evidence in the model, relative to their original probabilities without the evidence. This mode makes it possible to concentrate on directions of change caused by the session evidence.
  • in DELTA mode, the local score of a subject S is calculated as either of the following:
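The text does not spell out the two alternatives. The difference and the ratio of the with-evidence and no-evidence probabilities are the two natural readings of "change in probability", and both are shown below purely as assumed illustrations.

```java
public class DeltaScore {
    // Additive change: positive when the session evidence raises belief in S.
    static double difference(double pGivenEvidence, double pOriginal) {
        return pGivenEvidence - pOriginal;
    }

    // Multiplicative change: greater than 1 when the evidence raises belief in S.
    static double ratio(double pGivenEvidence, double pOriginal) {
        return pGivenEvidence / pOriginal;
    }
}
```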
  • Gamma represents the total probabilistic evidence for a given Bayesian model, corresponding to a specific session state.
  • Gamma is represented as a list of discrete functions. It is calculated from a set of probabilistic evidence on different variables, and is affected by existing deterministic evidence in the Bayesian model. The calculation of Gamma is performed as follows:
  • Data structures for GAMMA include: 1) A list of neighbors (stating the propagation flow direction).
  • a primary mechanism of the inference system 520 is the ability of its distributed resources, namely the Inference Agents, to communicate and pass on their beliefs to their neighbors. Having this ability is paramount since it enables completing patterns of values over attributes across the domain of all models as a whole. For instance, having a partial observation in the domain of model Al is still sufficient to produce a prediction over the domain of model A2. This mechanism is what ties all models together into a single domain over which inference can be achieved.
  • the primary mechanism of communication is the transfer of probabilistic evidence.
  • the mathematical notion of transfer of probabilistic evidence is well known.
  • in the running example, model X propagates evidence over the shared variables S1, S2 to model Y, and the updated belief is obtained as P*(Y1) = Σ_{S1,S2} P(Y1 | S1, S2) · P*(S1, S2), where:
  • P*(Y1) is the new distribution after absorption of the evidence;
  • P*(S1, S2) is the propagated evidence distribution based on model X;
  • P(Y1 | S1, S2) is the conditional probability distribution based on model Y.
  • Gamma is composed of a set of numerator functions (in the example, functions whose product is P*(S1, S2)) and a set of denominator functions (whose product is P(S1, S2)).
  • the joint probability distribution P(Y1, Y2, Y3, S1, S2) is the core of model Y. Based on it, any set of attributes in Y can be predicted by a simple marginalization operation.
  • the IS 520 uses the BE algorithm to do this marginalization.
  • probability functions whose product forms the joint probability distribution are initially placed in "buckets". Since the effect of evidence is localized to the Gamma(S1, S2) term, it is possible to place Gamma's functions, i.e., both the numerator and the inverted denominator functions, into the buckets using the same rule (see the sketch below).
  • the BE algorithm's computation then proceeds in a regular manner yielding the probability distribution of the variable Yl.
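A minimal sketch of the standard bucket-placement rule referred to above, applied uniformly to the model's CPTs and to Gamma's numerator and inverted-denominator functions. The `Fn` type and method names are assumptions made for illustration.

```java
import java.util.*;

public class BucketPlacement {
    record Fn(String name, Set<String> scope) {}   // a probability function's scope

    // Each function is placed in the bucket of its highest-ranked variable in
    // the elimination ordering; Gamma's functions are inserted with the very
    // same rule, so the BE computation then proceeds in the regular manner.
    static Map<String, List<Fn>> place(List<String> order, List<Fn> functions) {
        Map<String, List<Fn>> buckets = new LinkedHashMap<>();
        order.forEach(v -> buckets.put(v, new ArrayList<>()));
        for (Fn f : functions) {
            String target = null;
            for (String v : order)                     // last match wins, i.e.,
                if (f.scope().contains(v)) target = v; // the highest-ranked variable
            buckets.get(target).add(f);                // assumes every scope variable
        }                                              // appears in the ordering
        return buckets;
    }
}
```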
  • the Gamma object encapsulates the influence of probabilistic evidence on calculations within a given Bayesian model:
  • Gamma ⁇ null, null ⁇
  • the numerator or denominator of Gamma may be set to null in order to force their recalculation.
  • Arrival of deterministic evidence sets both numerator and denominator to null.
  • Arrival of probabilistic evidence sets only the numerator to null, while the need for denominator recalculation is determined at the time of Gamma recalculation.
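A minimal sketch of these invalidation rules, with assumed class and field names; the representation of the function lists is an illustration only.

```java
import java.util.*;

public class GammaState {
    private List<double[]> numerator;    // functions whose product is P*(S1, S2)
    private List<double[]> denominator;  // functions whose product is P(S1, S2)

    void onDeterministicEvidence() {     // arrival of deterministic evidence:
        numerator = null;                // force recalculation of both parts
        denominator = null;
    }

    void onProbabilisticEvidence() {     // arrival of probabilistic evidence:
        numerator = null;                // only the numerator is invalidated; the
    }                                    // denominator's fate is decided at
                                         // Gamma-recalculation time
    boolean fullRecalcNeeded() {
        return numerator == null && denominator == null;
    }
}
```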
  • Recommendation for an attribute means outputting a predicted probability distribution for the attribute.
  • a Bayesian model network enables any attribute to serve as an input or an output, but not both simultaneously. Thus, prior to answering such a prediction, any evidence at hand for the queried variable must be removed; this way, the recommendation for the variable will not simply be the evidence that the inference agent has for it. This is done as follows:
  • An inference session object maintains the state of a session, including its evidence, Gamma function, and calculations cache.
  • the ISO is also responsible for keeping its local time counters (since sessions are independent); accordingly, each ISO keeps the timestamp of its last arriving evidence.
  • the data structures used for ISO include:
  • the JBayes Handler (JBH) is responsible for performing all calculations concerning a single Bayesian model.
  • JBH maintains a pool of IEs (e.g., IE 1734) which perform the actual calculations: predictions, MAP decisions, Maximum Expected Utility decisions, and Gamma calculations.
  • the JBH also performs other special purpose tasks that involve direct interaction with the IE.
  • a public-domain software package named JavaBayes is extended and used as the basis for the inference engine. For instance, such extensions include the ability to use probabilistic evidence, to predict multiple attributes simultaneously, and to perform dynamic simplification of computations as part of a tradeoff with the available memory resources.
  • the data structures used for JBH include a pool of engines of a specific model.
  • An inference engine (e.g., IE 1734) is the primitive module (or component) for performing the various processes involved in making an inference. Given a Bayesian model, a state of evidence (or observed values) for any subset of attributes of the model, and a query request, which may be a request to predict the probability distribution of one or multiple attributes, a Maximum Aposteriori Probability (MAP) classification request, or a Maximum Expected Utility (MEU) decision, the engine proceeds with the requested computation.
  • MAP: Maximum Aposteriori Probability
  • MEU: Maximum Expected Utility
  • the IE's computations are based on the Bucket Elimination (BE) Algorithm, previously mentioned. Extended use of standard ideas from probability theory, as well as concepts of "d-separation", makes computations more efficient.
  • the low level computations of the inference system 520 are:
  • IS 520 adds extensions to the standard package JavaBayes to enable additional computational modes.
  • the standard JavaBayes software implements the BE algorithm.
  • the standard JavaBayes uses the interdependency information inherent in the graphical structure of the network in order to determine which variables are relevant, i.e., have influence on the queried variable. This is done in the constructor of a class called "Ordering", through a call to a method all_affecting(objective_index), which is in the Java class DSeparation.
  • the method is based on the standard concept of d-separation; see J. Pearl, Probabilistic Reasoning in Intelligent Systems (Morgan Kaufmann Publishers, 1988).
  • Inputs to the extended BE algorithm include:
  • Outputs of the extended BE algorithm include:
  • the extended BE algorithm includes the following steps: Step 1: Determine the set of "affecting variables" with regard to the queried variables, based on the deterministic evidence (observations) and Gamma.
  • the set of "affecting variables” consists of the union of variables that are not d-separated from at list one queried variable. Variables for which probabilistic evidence exists are treated as observed as far as d-separation is concerned.
  • the standard d-separation algorithm can be used as-is, even for joint probability prediction and in the presence of probabilistic evidence. The only required modifications are in its form of activation, as explained above.
  • the "participating variable” set is the set of variables that will participate in the bucket elimination phase. This set is defined as the union of the "affecting variable” set and the set of variables participating in Gamma.
  • Step 2: Filter Gamma functions as a result of d-separation. Functions appearing in the denominator that do not involve any of the "affecting variables" are removed from Gamma.
  • Step 3: Produce an ordered set of variables, and a set of functions that will participate in the bucket elimination phase.
  • the set of variables is initialized to the "participating variable” set, induced in the previous step.
  • the function set is initialized to include the conditional probability tables (CPTs) of the "affecting variables” and the set of filtered Gamma functions.
  • CPTs: conditional probability tables
  • the moralization algorithm has been altered to receive a set of functions, replacing the model CPTs. As a result, Gamma functions are taken into account when inducing the order of variables.
  • the altered moralization algorithm receives a set of variables and a set of functions, from which it builds a "moralization graph". In the graph, each variable is represented by a node. Two nodes are linked if their variables appear in a common function. After initialization, the altered moralization algorithm chooses, in an iterative manner, the node having the minimal number of links.
  • the returned order of variables is the order of elimination. Since the process of node elimination corresponds exactly to the BE algorithm, which is executed at some later time on the same ordered set of variables, it is possible to identify the maximum bucket dimensionality at this early stage of moralization.
  • Our "kill" extension uses this to identify when a certain bucket is about to exceed a certain determined threshold and then kills the variable. b) If during the creation of the moralization graph, one of the graph nodes exceeds a certain threshold known as the "maximum bucket dimensionality", the ordering attempt fails.
  • the maximum bucket dimensionality represents the maximal allowed size (number of elements) in the "Lambda" function, which every bucket produces.
  • This size is equal to the product of the cardinalities of all variables of all functions in the bucket.
  • the dimensionality of a variable's node in the moralization graph is equal to the dimensionality of its bucket in the bucket elimination phase.
  • the killed variable is chosen, in the preferred embodiment, to be that of the highest dimensionality, but other heuristics may be considered as well.
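A minimal sketch of the altered moralization and ordering step with the "kill" extension described above. The class and method names, the tie-breaking on the minimal-links choice, and the exact re-examination loop are assumptions; only the minimal-links ordering, the bucket-dimensionality test, and the highest-cardinality kill follow the text.

```java
import java.util.*;

public class MoralizationOrdering {

    // card: cardinality of each variable, e.g. G -> 5
    // functions: scopes of CPTs and Gamma functions, each a set of variables
    // maxBucketDim: the "maximum bucket dimensionality" threshold
    static List<String> eliminationOrder(Map<String, Integer> card,
                                         List<Set<String>> functions,
                                         long maxBucketDim) {
        // Moralization graph: one node per variable; two nodes are linked
        // if their variables appear in a common function.
        Map<String, Set<String>> adj = new HashMap<>();
        card.keySet().forEach(v -> adj.put(v, new HashSet<>()));
        for (Set<String> f : functions)
            for (String a : f) for (String b : f)
                if (!a.equals(b) && adj.containsKey(a) && adj.containsKey(b))
                    adj.get(a).add(b);

        List<String> order = new ArrayList<>();
        while (!adj.isEmpty()) {
            // Iteratively choose the node having the minimal number of links.
            String v = Collections.min(adj.keySet(),
                    Comparator.comparingInt(x -> adj.get(x).size()));
            // Node dimensionality = product of cardinalities of v and its links;
            // it equals the dimensionality of v's bucket in the BE phase.
            long dim = card.get(v);
            for (String n : adj.get(v)) dim *= card.get(n);
            if (dim > maxBucketDim) {
                // "Kill" extension: remove the linked variable of highest
                // cardinality instead of failing the ordering attempt.
                String victim = Collections.max(adj.get(v),
                        Comparator.comparingInt(card::get));
                removeNode(adj, victim);
                continue;                          // re-examine with victim gone
            }
            // Standard elimination: connect v's neighbours (fill-in), drop v.
            for (String a : adj.get(v)) for (String b : adj.get(v))
                if (!a.equals(b)) adj.get(a).add(b);
            removeNode(adj, v);
            order.add(v);                          // order of elimination
        }
        return order;
    }

    private static void removeNode(Map<String, Set<String>> adj, String v) {
        for (String n : adj.get(v)) adj.get(n).remove(v);
        adj.remove(v);
    }
}
```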
  • Step 4: When creating the buckets for the BE process, a bucket is created for each member of the ordered variable set produced in the previous step. All members of the produced function set (including the Gamma functions) are inserted into the buckets according to the standard BE algorithm.
  • Step 5: The buckets are also reduced according to the standard BE algorithm, wherein in each step the current last bucket is reduced into a Lambda function that is inserted into one of the remaining buckets.
  • the only exception concerns the case of a joint probability computation.
  • the reduction of buckets is terminated at the stage where the only remaining buckets are those of the queried variables.
  • the reduction stops and the returned result is the union of functions found in the buckets of the queried variables.
  • since the joint probability is equal to the product of the function set, it is kept in the form of a function set to avoid the creation of a function of high dimension.
  • Step 6: In the case of a MAP or MPE calculation, a backward maximization is performed to induce the most probable value of each variable.
  • Range of G ⁇ gl, g2, g3, g4, g5 ⁇
  • Range of Q ⁇ ql, q2, q3 ⁇
  • Max Dimensionality Threshold 15 Notice that the product of functions in the denominator equals the joint probability of B, D in model 1800; also notice that the existence of P2(B,D) indicates that B and D are not independent, as can easily be seen by viewing model 1800.
  • Phase 1: Determining the set of "affecting variables": a) Before activating the d-separation algorithm, set the variables having probabilistic evidence, namely B and D, as observed, just for the sake of d-separation. (The observation settings are lifted right after d-separation.) b) After activating the d-separation algorithm, the "affecting variable" set is determined to be {A, D, G, E, Q}. The deterministic evidence on H caused the elimination of H and I, while the probabilistic evidence P*(B) on B caused the elimination of B and C. The "participating variable" set is {A, D, G, E, Q, B}.
  • Phase 2: Filter Gamma functions.
  • Phase 3: Producing an ordered variable set.
  • the ordered variable set is initialized to the "participating variable" set, namely ⁇ A, D, G, E, Q, B ⁇ ; the function set becomes ⁇ P(A
  • the bucket of variable E contains variables {E, D, B, G}, so its dimensionality (240) exceeds the maximum bucket dimensionality, which is set to 15.
  • variable G is chosen to be killed (since it has the highest cardinality, 5).
  • the bucket of variable E now contains variables {E, D, B}; although smaller than in the previous step, its dimensionality (48) still exceeds the threshold.
  • variable E, of highest cardinality 4, is killed.
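Running the moralization sketch given earlier on this example, with assumed cardinalities of 4 for D and 3 for B (these two values are not stated in the example, but they are consistent with the quoted dimensionalities 4 × 4 × 3 × 5 = 240 and 4 × 4 × 3 = 48), reproduces exactly this kill sequence: G is killed first, then E, after which the remaining buckets fall below the threshold of 15.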
  • Phase 4: Creating the buckets for the bucket elimination phase. The produced buckets are: a) Bucket D: P(A
  • Phase 5: Performing the bucket elimination phase: a) Step 1: Bucket A: P(Q; Bucket Q: empty. b) Step 2: Bucket Q: λ(Q). c) Step 3: The product of functions in bucket Q, that is λ(Q), is normalized, such that Norm(λ(Q)) is returned as the result of the algorithm.
  • Phase 6: This phase is not run in this example, since it is a prediction calculation and not a MAP or MPE.
6.2.15 DRV Inference Agents or Models
  • DRV models are used in the AI system 220 for the purpose of predicting the probability of the NULL value for each attribute in any model. This is because a NULL value has an important and strong influence on prediction accuracy. However, a NULL value can never be entered as a deterministic value, since as long as a session is active it is not possible to state that a user entered NULL for an attribute (i.e., has not selected that category of product).
  • the present invention includes a process for updating any model's belief concerning any of its attributes taking the value NULL. The basis of this prediction is information that contains the entry time to categories in a session.
  • An example of a DRV agent, DRV.A1, associated with regular Agent A1, is shown in Figure 10 and also applies in this case.
  • the NULL attribute 1004 takes a value in {NULL, OTHER}, where OTHER indicates that the user has selected something from the related category, while NULL signifies that the user has not.
  • the TIME attribute 1002 takes the values ⁇ 0, 1, ..., M ⁇ , where M is some fixed number.
  • the time attribute's value x refers to the time that the associated category was entered relative to the entry time of its parent category.
  • An example of such a sequence of activity is: A3, A1, A1.PKG, A1, A1.FRT, A1, A1.SEA.
  • in this sequence the user entered A1 three times without entering A1.VEG; therefore, the likelihood of the user entering A1.VEG will be low.
  • the inference system distributes such time evidence to all relevant DRV models based on activity events called <CATEGORY_CHANGE> events. These events contain the time sequence of category entries, as shown and described above.
  • the format 2000 of this event is shown in Figure 20.
  • the IS 520 uses the DRV models 1800 to predict the NULL attribute.
  • the DRV.A1 agent 1000 then computes, for each of its DRV.NULL attributes, one of the following, depending on where the user currently is (as indicated by the last CATEGORY_CHANGE event received by the IM); without loss of generality we display it for DRV.A1.PKG:
  • probabilistic evidence on NULL is placed based on the confidence of the DRV model 1800, which is directly related to the category changes exhibited so far in the current on-line session.
  • the DRV models are learned using the same Machine Learning System that learns the entity Bayesian models (see also section 6.1 on MLS 510). However, instead of using the serial.dat files, the DRV models utilize category-change information, which is still represented in a serial manner. A module converts this serial stream of category-change information into cases for the attributes NULL and TIME by scanning the number of times each category was visited, as seen in the serial stream. When the end of the session activity stream is reached, the TIME attribute takes the last entry time relative to the parent category (discussed in more detail in MLS section 6.1.9). From these cases, the DRV model 1800 is learned using the same off-line process as the regular, entity Bayesian models. A sketch of this conversion is given below.
  • the learned DRV model can then be used by a DRV Inference Agent.
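A minimal sketch of the serial-stream-to-cases conversion, under stated assumptions: the class and method names are hypothetical, the TIME value for unvisited categories is assumed to be 0, and the exact relative-time convention (last entry measured from the most recent preceding entry of the parent) is an assumption consistent with the description above.

```java
import java.util.*;

public class DrvCaseBuilder {
    record Case(String category, String nullValue, int time) {}

    // stream example: ["A3","A1","A1.PKG","A1","A1.FRT","A1","A1.SEA"]
    static List<Case> toCases(List<String> stream, Set<String> allCategories) {
        List<Case> cases = new ArrayList<>();
        for (String cat : allCategories) {
            if (stream.contains(cat)) {
                // Visited: the NULL attribute takes OTHER, and TIME takes the
                // last entry time relative to the parent category's entry.
                cases.add(new Case(cat, "OTHER", relativeTime(stream, cat)));
            } else {
                cases.add(new Case(cat, "NULL", 0)); // TIME value for unvisited
            }                                        // categories is assumed 0
        }
        return cases;
    }

    // Last entry time of cat, measured from the most recent preceding entry
    // of its parent category.
    static int relativeTime(List<String> stream, String cat) {
        String parent = cat.contains(".") ? cat.substring(0, cat.lastIndexOf('.')) : "";
        int lastParent = -1, rel = 0;
        for (int t = 0; t < stream.size(); t++) {
            if (stream.get(t).equals(parent)) lastParent = t;
            else if (stream.get(t).equals(cat)) rel = lastParent >= 0 ? t - lastParent : t;
        }
        return rel;
    }
}
```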
  • IS 520 can be viewed as an AI system 200 resource that receives query requests and returns sets of induced recommendations.
  • the available query forms are represented by a set of APIs, which are activated by sending QRCTM agents to the IS.
  • the result of each API is returned in the form of a ReplyObject containing a set of recommendation descriptions.
  • agentName: name of the queried agent.
  • maxItemsNum: maximum number of returned recommendations.
  • evaluationCriterion: name of the evaluation criterion applied by the IS. Return: a RecommendList reply object.
3) Deep-Agent-Recommend:
  • agentName: name of the queried agent.
  • maxItemsNum: maximum number of returned recommendations.
  • evaluationCriterion: name of the evaluation criterion applied by the IS.
  • maxAttributes (int): maximum number of recommended attributes within the queried agent.
  • maxAttributeValues (int): maximum number of recommended values for each attribute.
  • All APIs are activated by sending a QRCTM agent of tag "INF_RECOMMEND_EVENT" to the IS.
  • the sent QRCTM agent must contain the following elements: 1) MODEL - indicating the name of the queried agent;
  • when activating the Deep-Agent-Recommend API, the QRCTM agent must also contain the following fields (a hypothetical construction is sketched below):
  • MAX_ATTRIBS - indicating the maximum number of recommended attributes within the queried agent.
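A minimal sketch of building such a QRCTM agent. The tag INF_RECOMMEND_EVENT and the MODEL and MAX_ATTRIBS fields come from the text; the map-of-strings representation and the remaining field keys (MAX_ITEMS, EVAL_CRITERION, MAX_ATTRIB_VALUES) are assumptions made for illustration.

```java
import java.util.*;

class QrctmAgent {
    final String tag;
    final Map<String, String> fields = new HashMap<>();
    QrctmAgent(String tag) { this.tag = tag; }
}

public class RecommendApi {
    static QrctmAgent deepAgentRecommend(String agentName, int maxItemsNum,
                                         String evaluationCriterion,
                                         int maxAttributes, int maxAttributeValues) {
        QrctmAgent q = new QrctmAgent("INF_RECOMMEND_EVENT");
        q.fields.put("MODEL", agentName);                   // queried agent (from text)
        q.fields.put("MAX_ITEMS", String.valueOf(maxItemsNum));          // assumed key
        q.fields.put("EVAL_CRITERION", evaluationCriterion);             // assumed key
        q.fields.put("MAX_ATTRIBS", String.valueOf(maxAttributes));      // from text
        q.fields.put("MAX_ATTRIB_VALUES", String.valueOf(maxAttributeValues)); // assumed
        return q;                                           // sent to the IS for reply
    }
}
```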
  • the recommendations are returned in the form of a reply object of class RecommendList (see Figure 22), which represents a collection of recommendations, induced by IS 520.
  • Each recommendation element is an object of class Recommend containing the following fields:
  • Subject: the subject of the recommendation (e.g., "A1.B1.C1", "A2.B2", etc.).
  • SubjectType: the type of the recommended subject (e.g., "product", "B level category", etc.).
  • Context: the context in which this recommendation was made (e.g., the category in which this subject was recommended).
  • LocalScore: the relative score (e.g., probability) of this subject within its context.
  • ContextScore: the score of the context of the recommended subject.
  • the contents of the RecommendList object are translated to outer "users" (such as Business Objects) via a set of defined interface methods.
  • An example 2200 is given in Figure 22, which presents the returned RecommendList information 2200 when activating a "Shallow-Neighbors-Agent-Recommend" API on agent A1 (see Figure 19), using the QRCTM agent structure 2150 of Figure 21E.
  • the returned RecommendList 2200 of Figure 22 contains a variety of information.
  • there are two items whose subjects are A1.BRD.BRDPIZ and A1.BRD.BRDCRM.
  • Asynchronous handling is achieved by a pool of Inference Handlers (IHs), each operating in its own dedicated thread. Another configuration parameter is the default number of IEs dedicated to each IA.
  • IH: Inference Handlers
  • Each engine requires memory for its model. Engines are preferably not ORB objects, so it is possible for them to share the same model, thereby saving memory.
  • the configuration is implemented as static Java variables in the AI Configuration object. This AI configuration is accessible to all objects that run in the current Java VM.
6.2.18 Scenarios of the Inference System
  • For illustrative purposes, several scenarios of IS 520 are discussed below.
  • the initiator is the intelligence system 200 server 162.
  • IM sends CMR a QRCTM agent "Supply Model Net" and waits for the reply.
  • CMR sends a reply object containing a list of model objects and neighbors.
  • IM creates an internal agent graph.
  • IM creates the real agents (i.e., ORB objects).
  • the initiator is CMR.
  • IM receives an IP reference to the new model.
  • the initiator is, for example, client 154, as a result of user action.
  • IM finds the affected agent(s) and propagates the "real" evidence to them.
  • the IA of each affected agent receives the "real" evidence and updates its corresponding inference session object (ISO).
  • the ISO performs the following (see the sketch below): a) Updates its evidence state (real, deterministic, and probabilistic evidence). b) Updates the timestamp of the last arriving evidence to the current time stamp (the time stamp is a local counter independent of the clock). c) Points the calculations cache to an empty list (hash table). d) Recalculates Gamma and stores it in the cache.
  • the downstream agent's IA receives the probabilistic evidence and performs steps 4 to 7, excluding the updating of real evidence.
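A minimal sketch of the session object's update steps a) to d), with assumed class names; the Evidence, EvidenceState, and Gamma types are stand-ins for illustration.

```java
import java.util.*;

class Evidence {}                                  // stand-in evidence record
class EvidenceState { void update(Evidence e) {} } // real/deterministic/probabilistic
class Gamma { static Gamma recalc(EvidenceState s) { return new Gamma(); } }

public class InferenceSessionObject {
    private final EvidenceState evidence = new EvidenceState();
    private long timeCounter = 0;                  // local counter, not a clock
    private Map<String, Object> cache = new Hashtable<>();

    synchronized void onEvidence(Evidence e) {
        evidence.update(e);                         // a) update the evidence state
        timeCounter++;                              // b) timestamp of last evidence
        cache = new Hashtable<>();                  // c) point cache at an empty table
        cache.put("GAMMA", Gamma.recalc(evidence)); // d) recalculate and cache Gamma
    }
}
```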
  • the initiator is an AI rule evaluation system.
  • IM allocates a free inference handler (IH) and sends it the contents of the QRCTM agent, including the requester's address.
  • the IH, operating in a dedicated thread, interprets the query and selects the initial agent with which to start the query.
  • IH performs a series of steps, wherein in each step the currently selected agent (e.g., IA 1730) is requested to perform a single operation or a series of operations, such as: a) Evidence absorption from neighbors. b) Simple computations such as Posterior Probability, Classification (e.g., MAP), and Most Probable Explanation (MPE). c) More complex computations such as "Highest Peak" and "Delta Classification".
  • the IH can shift its attention to a different IA, requesting it to perform further calculations.
  • the query result is calculated by a target agent or agents that are reached through this series of steps.
  • upon achieving the result, the IH returns it in a reply object to the initiating business rule component, and becomes ready for further incoming requests.
  • the initiator is an IH, as part of performing the "Query Request” scenario.
  • the Inference Agent receives the computation request.
  • IA attains a snapshot of the corresponding session; if a result for the requested calculation is found in the snapshot's cache, it is returned as the result of the computation.
  • JBH: Java Bayes Handler
  • the IA receives a request to absorb all evidence from its non-downstream neighbors.
  • IA requests each of its non-downstream neighbors to supply it with probabilistic evidence on their mutually shared variables, by sending it an absorb request. Each such neighbor then absorbs evidence from its own non-downstream neighbors, calculates the requested probabilistic evidence, and returns it to the requesting agent. This recursive process terminates when the initial IA has completed absorption of evidence from all of its neighbors.
6.2.18.7 Session Termination
  • the initiator is AI system 200, (external, client 154 driven).
  • Model 1 2310 gets (and has) probabilistic evidence from Model 2 2320 and Model 3 2330.
  • An example of overriding evidence is shown in Figure 24, wherein Model 1 2410 shares variable set A with Model 2 2420 and Model 3 2430. Suppose that both Model 2 and Model 3 propagate probabilistic evidence on variable A. There are various ways to form the single P*(A). For instance, let P*(A) be the linear combination of both functions with weights equaling the model-dependent confidence, where confidence is a function of the coupling (i.e., the mutual information of the set A with respect to the no-evidence distribution) and the distance between the evidence distribution of A and the no-evidence distribution of A (any of the standard distance functions may be used, e.g., the L1 distance). A sketch of such a combination appears below.
  • alternatively, the algorithm may let P*(A) be the most recently propagated probabilistic evidence; namely, the most recently obtained evidence over A overrides any prior probabilistic evidence on A.
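A minimal sketch of the linear-combination strategy for forming a single P*(A). The use of the L1 distance follows the text; treating confidence as the product of a supplied coupling term and that distance, and the final renormalization, are assumptions made for illustration.

```java
public class EvidenceCombiner {

    // L1 distance between two distributions over the same range of A.
    static double l1(double[] p, double[] q) {
        double d = 0;
        for (int i = 0; i < p.length; i++) d += Math.abs(p[i] - q[i]);
        return d;
    }

    // Combine the evidence distributions propagated by Model 2 and Model 3
    // over A; noEvidence is the no-evidence (prior) distribution of A, and
    // coupling2/coupling3 stand in for the mutual-information coupling terms.
    static double[] combine(double[] e2, double[] e3, double[] noEvidence,
                            double coupling2, double coupling3) {
        double w2 = coupling2 * l1(e2, noEvidence);  // model-dependent confidence
        double w3 = coupling3 * l1(e3, noEvidence);
        double[] out = new double[noEvidence.length];
        double norm = 0;
        for (int i = 0; i < out.length; i++) {
            out[i] = w2 * e2[i] + w3 * e3[i];        // weighted linear combination
            norm += out[i];
        }
        if (norm == 0) return noEvidence.clone();    // neither model adds information
        for (int i = 0; i < out.length; i++) out[i] /= norm; // renormalize to P*(A)
        return out;
    }
}
```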
  • Model 1 2510 propagates probabilistic evidence P*(A,B) to Model 2 2520. Since Model 1 2510 received real evidence on A 2530, the probabilistic evidence sent to Model 2 2520 contains only a function P*(B). Notice that the evidence on A is sent to Model 2 in the form of real evidence by the IM of Model 1; thus there is no need for Agent A to send it to Model 2.
6.2.19.4 Preferring Probabilistic Evidence
  • probabilistic evidence that emanates from real evidence 2630 is preferred over propagated probabilistic evidence. That is, real evidence on two values of A is translated into probabilistic evidence on A. In the preferred embodiment, in Model 1 2610, this evidence overrides probabilistic evidence propagated from Model 2 2620.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Marketing (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
EP00930757A 1999-05-14 2000-05-15 Intelligentes computersystem Withdrawn EP1194862A1 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US13410599P 1999-05-14 1999-05-14
US134105P 1999-05-14
PCT/US2000/013360 WO2000070481A1 (en) 1999-05-14 2000-05-15 Intelligent computer system

Publications (1)

Publication Number Publication Date
EP1194862A1 true EP1194862A1 (de) 2002-04-10

Family

ID=22461792

Family Applications (1)

Application Number Title Priority Date Filing Date
EP00930757A Withdrawn EP1194862A1 (de) 1999-05-14 2000-05-15 Intelligentes computersystem

Country Status (3)

Country Link
EP (1) EP1194862A1 (de)
AU (1) AU4852000A (de)
WO (1) WO2000070481A1 (de)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5075041A (en) * 1990-06-28 1991-12-24 Shell Oil Company Process for the preparation of secondary alcohol sulfate-containing surfactant compositions
US20130066743A1 (en) * 2002-03-08 2013-03-14 Faten "Fay" HELLAL Method and Apparatus for Providing a Shopping List Service
US20190190797A1 (en) * 2017-12-14 2019-06-20 International Business Machines Corporation Orchestration engine blueprint aspects for hybrid cloud composition
US11770307B2 (en) 2021-10-29 2023-09-26 T-Mobile Usa, Inc. Recommendation engine with machine learning for guided service management, such as for use with events related to telecommunications subscribers

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7295991B1 (en) * 2000-11-10 2007-11-13 Erc Dataplus, Inc. Employment sourcing system
EP1410297A4 (de) * 2001-04-16 2008-08-27 Bea Systems Inc System und verfahren für web-gestütztes marketing und kampagnenmanagement
US8379830B1 (en) 2006-05-22 2013-02-19 Convergys Customer Management Delaware Llc System and method for automated customer service with contingent live interaction
US7937349B2 (en) 2006-11-09 2011-05-03 Pucher Max J Method for training a system to specifically react on a specific input
US8706545B2 (en) * 2007-02-22 2014-04-22 Fair Isaac Corporation Variable learning rate automated decisioning
US8103598B2 (en) 2008-06-20 2012-01-24 Microsoft Corporation Compiler for probabilistic programs
US8073809B2 (en) 2008-10-02 2011-12-06 Microsoft Corporation Graphical model for data validation
CN103136247B (zh) 2011-11-29 2015-12-02 阿里巴巴集团控股有限公司 属性数据区间划分方法及装置
US20130262504A1 (en) * 2012-03-30 2013-10-03 Sap Ag Case-based Adaptation Framework for Customization Knowledge in Enterprise Systems
US8935191B2 (en) 2012-05-02 2015-01-13 Sap Ag Reuse of on-demand enterprise system customization knowledge utilizing collective experience
US9251467B2 (en) 2013-03-03 2016-02-02 Microsoft Technology Licensing, Llc Probabilistic parsing
US9886247B2 (en) 2014-10-30 2018-02-06 International Business Machines Corporation Using an application programming interface (API) data structure in recommending an API composite
US20170212650A1 (en) * 2016-01-22 2017-07-27 Microsoft Technology Licensing, Llc Dynamically optimizing user engagement
MX2018012578A (es) 2016-04-15 2019-03-01 Walmart Apollo Llc Sistemas y metodos para proporcionar recomendaciones de productos basadas en contenido.
MX2018012569A (es) * 2016-04-15 2019-03-11 Walmart Apollo Llc Sistemas y metodos que proporcionan ambientes minoristas representados a los consumidores.
WO2017181017A1 (en) 2016-04-15 2017-10-19 Wal-Mart Stores, Inc. Partiality vector refinement systems and methods through sample probing
US10592959B2 (en) 2016-04-15 2020-03-17 Walmart Apollo, Llc Systems and methods for facilitating shopping in a physical retail facility
CA3027866A1 (en) 2016-06-15 2017-12-21 Walmart Apollo, Llc Vector-based characterizations of products and individuals with respect to customer service agent assistance
US10373464B2 (en) 2016-07-07 2019-08-06 Walmart Apollo, Llc Apparatus and method for updating partiality vectors based on monitoring of person and his or her home
US10984034B1 (en) 2016-10-05 2021-04-20 Cyrano.ai, Inc. Dialogue management system with hierarchical classification and progression
WO2018191451A1 (en) 2017-04-13 2018-10-18 Walmart Apollo, Llc Systems and methods for receiving retail products at a delivery destination
US10833962B2 (en) 2017-12-14 2020-11-10 International Business Machines Corporation Orchestration engine blueprint aspects for hybrid cloud composition
US11025511B2 (en) 2017-12-14 2021-06-01 International Business Machines Corporation Orchestration engine blueprint aspects for hybrid cloud composition
US20200265270A1 (en) * 2019-02-20 2020-08-20 Caseware International Inc. Mutual neighbors
CN110210944B (zh) * 2019-06-05 2021-04-23 齐鲁工业大学 联合贝叶斯推理与加权拒绝采样的多任务推荐方法及系统
US11537416B1 (en) 2021-06-10 2022-12-27 NTT DATA Services, LLC Detecting and handling new process scenarios for robotic processes

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4884217A (en) * 1987-09-30 1989-11-28 E. I. Du Pont De Nemours And Company Expert system with three classes of rules
US5999908A (en) * 1992-08-06 1999-12-07 Abelow; Daniel H. Customer-based product design module
US5644686A (en) * 1994-04-29 1997-07-01 International Business Machines Corporation Expert system and method employing hierarchical knowledge base, and interactive multimedia/hypermedia applications
US6076083A (en) * 1995-08-20 2000-06-13 Baker; Michelle Diagnostic system utilizing a Bayesian network model having link weights updated experimentally
US5867799A (en) * 1996-04-04 1999-02-02 Lang; Andrew K. Information system and method for filtering a massive flow of information entities to meet user information classification needs
US5963447A (en) * 1997-08-22 1999-10-05 Hynomics Corporation Multiple-agent hybrid control architecture for intelligent real-time control of distributed nonlinear processes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0070481A1 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5075041A (en) * 1990-06-28 1991-12-24 Shell Oil Company Process for the preparation of secondary alcohol sulfate-containing surfactant compositions
US20130066743A1 (en) * 2002-03-08 2013-03-14 Faten "Fay" HELLAL Method and Apparatus for Providing a Shopping List Service
US9519929B2 (en) 2002-03-08 2016-12-13 Facebook, Inc. Method and apparatus for providing a shopping list service
US20190190797A1 (en) * 2017-12-14 2019-06-20 International Business Machines Corporation Orchestration engine blueprint aspects for hybrid cloud composition
US10972366B2 (en) * 2017-12-14 2021-04-06 International Business Machines Corporation Orchestration engine blueprint aspects for hybrid cloud composition
US11770307B2 (en) 2021-10-29 2023-09-26 T-Mobile Usa, Inc. Recommendation engine with machine learning for guided service management, such as for use with events related to telecommunications subscribers

Also Published As

Publication number Publication date
WO2000070481A1 (en) 2000-11-23
AU4852000A (en) 2000-12-05

Similar Documents

Publication Publication Date Title
WO2000070481A1 (en) Intelligent computer system
US11423086B2 (en) Data processing system and method of associating internet devices based upon device usage
Klusch Semantic web service coordination
US9514248B1 (en) System to group internet devices based upon device usage
Wang et al. Toward trust and reputation based web service selection: A survey
US7567915B2 (en) Ontology-driven information system
Kokash et al. Web service discovery based on past user experience
US20030053615A1 (en) Methods and apparatus for automated monitoring and action taking based on decision support mechanism
US11431582B2 (en) Systems and methods for context aware adaptation of services and resources in a distributed computing system
Hussain et al. Integrated AHP-IOWA, POWA framework for ideal cloud provider selection and optimum resource management
Rosaci et al. A multi-agent recommender system for supporting device adaptivity in e-commerce
Zhang et al. LA-LMRBF: Online and long-term web service QoS forecasting
EP1189160A1 (de) Verfahren und System zum Transformieren von Sitzungs-Daten
Baldominos Gómez et al. AWS PredSpot: Machine learning for predicting the price of spot instances in AWS cloud
Yu et al. Adaptive web services composition using q-learning in cloud
Mukhopadhyay et al. Multi‐agent information classification using dynamic acquaintance lists
Patel et al. Context aware semantic service discovery
Zhang et al. Weighted Bayesian Runtime Monitor: A Novel QoS Monitoring Approach Sensitive to Environmental Factors
Chen et al. A model for managing and discovering services based on dynamic quality of services
Yanagimoto Customer state estimation with Poisson distribution model
Xiang Context-aware data mining methodology for supply chain finance cooperative systems
Chen et al. HTG: A heterogeneous topology aware model to improve cold start in cloud service QoS prediction
Ramadhan Approaches to Web Service Composition for the Semantic Web
Ludwig Fuzzy match score of semantic service match
Liu A new enterprise customer and supplier cooperative system framework based on multiple criteria decision-making

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20011214

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

RIN1 Information on inventor provided before grant (corrected)

Inventor name: BARNEA, GAD

Inventor name: RATSABY, JOEL

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20021203