US20160224951A1 - Game theoretic prioritization system and method - Google Patents



Publication number
US20160224951A1
United States
Steven M. Hoffberg
Original Assignee
Steven M. Hoffberg
Priority to US60907004P
Priority to US11/005,460 (patent US7590589B2)
Priority to US12/560,293 (patent US9311670B2)
Application filed by Steven M. Hoffberg
Priority to US15/095,565 (publication US20160224951A1)
Publication of US20160224951A1
Application status is Pending




    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/04 Payment circuits
    • G06Q20/06 Private payment circuits, e.g. involving electronic currency used among participants of a common payment scheme
    • G06Q20/065 Private payment circuits, e.g. involving electronic currency used among participants of a common payment scheme using e-cash
    • G06Q20/0652 Private payment circuits, e.g. involving electronic currency used among participants of a common payment scheme using e-cash e-cash with decreasing value according to a parameter, e.g. time
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/08 Payment architectures
    • G06Q20/10 Payment architectures specially adapted for electronic funds transfer [EFT] systems; specially adapted for home banking systems
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q30/00 Commerce, e.g. shopping or e-commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/08 Auctions, matching or brokerage
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation, credit approval, mortgages, home banking or on-line banking
    • G06Q40/025 Credit processing or loan processing, e.g. risk analysis for mortgages
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06 Investment, e.g. financial instruments, portfolio management or fund management
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services; Handling legal documents
    • G06Q50/188 Electronic negotiation
    • H04W4/028
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02 Services making use of location information
    • H04W4/029 Location-based management or tracking services


A method for transferring virtual wealth using an automated communication network, comprising: providing a wealth generation function which, when automatically evaluated, defines a change of virtual wealth over time of an agent having a communications port configured to interface with the automated communication network; evaluating the wealth generation function with at least one automated processor, to define the change in virtual wealth from a current state of virtual wealth to a subsequent state of virtual wealth; and transferring ownership of at least a portion of the virtual wealth dependent on the subsequent state of virtual wealth, through the communications port, in consideration of an automated transaction.


  • The present application is a Division of U.S. patent application Ser. No. 12/560,293, filed Sep. 15, 2009, now U.S. Pat. No. 9,311,670, issued Apr. 12, 2016, which is a Division of U.S. patent application Ser. No. 11/005,460, filed Dec. 6, 2004, now U.S. Pat. No. 7,590,589, issued Sep. 15, 2009, which claims benefit of priority from U.S. Provisional Patent Application No. 60/609,070, filed Sep. 10, 2004, and from which PCT/US05/32113 filed Sep. 9, 2005 claims priority, each of which is expressly incorporated herein by reference.
  • The present invention relates to the field of ad hoc network protocols and control architectures.
  • A number of fields of endeavor are relevant to the present invention, and exemplary prior art, each of which is expressly incorporated herein by reference, are disclosed below. The references disclosed provide a skilled artisan with disclosure of embodiments of various elements of the present invention, and the teachings therein may be combined and subcombined in various manners in accordance with the present teachings. Therefore, the identified prior art provides a point of reference and set of tools which are expressly available, both as a part of the invention, and to implement the invention.
  • The topical headings are advisory only, and are not intended to limit the applicability of any reference. While some embodiments are discussed as being preferred, it should be understood that all embodiments discussed, in any portion of this document, whether stated as having advantages or not, form a part of the invention and may be combined and/or subcombined in a consistent manner in accordance with the teachings hereof. Likewise, the disclosure herein is intended to disclose permissive combinations, subcombinations, and attributes, and any language which appears to limit the scope of applicant's invention is intended to apply to the particular embodiment referenced, or as a permissive suggestion for implementation of other embodiments with which it may be consistently applied. The present disclosure includes details of a number of aspects, which may find independent utility, and therefore the present specifications are not intended to be construed as being limited to the conjunction of the elements of the disclosure.
  • INTERNET: The Internet is structured such that various networks are interconnected, with communications effected by addressed packets conforming to a common protocol. Based on the packet addressing, information is routed from source to destination, often through a set of networks having multiple potential pathways. The communications medium is shared between all users. Statistically, some proportion of the packets are extraordinarily delayed, or simply lost. Therefore, protocols involving communications using these packets include error detection schemes that request a retransmit of required data not received within a time window. In the event that the network nears capacity or is otherwise subject to limiting constraint, the incidence of delayed or lost packets increases, thereby increasing both the requests for retransmission and the retransmissions themselves. Therefore, as traffic approaches the available bandwidth, the load increases further, ultimately leading to failure. In instances where a minimum quality of service must be guaranteed, special Internet technologies are required, to reserve bandwidth or to specify network pathways. End-to-end quality of service guarantees, however, may exceed the cost of circuit switched technologies, such as dialup modems, especially where the high quality needs are intermittent. Internet usage typically involves an Internet server, an automated system capable of responding to communications received through the Internet, and often communicating with other systems not directly connected to the Internet. The server typically has relatively large bandwidth to the Internet, allowing multiple simultaneous communications sessions, and usually supports the Hypertext Transfer Protocol (HTTP), which provides, in conjunction with a so-called web browser on a remote client system, a human readable interface which facilitates navigation of various resources available in the Internet. 
The client systems are typically human user interfaces, which employ a browser to display “web pages” delivered over HTTP. The browser typically does not provide intelligence. Bandwidth between the client and Internet is typically relatively small, with various communications and display rendering delays considered normal. Typically, both client and server are connected to the Internet through Internet service providers, each having its own router. It is also known to provide so-called proxy servers and firewalls, which are automated systems that insulate the client system from the Internet. Further, so-called Internet applications and applets are known which provide local intelligence at the client system. Further, it is known to provide a local server within the client system for locally processing a portion of the information. These local servers, applications and applets are non-standard, and thus require special software to be available locally for execution. Nevertheless, the Internet offers a number of advantages for commercial use, including low cost and ubiquitous connectivity. Therefore, it is desirable to employ standard Internet technologies while achieving sufficient quality communications to effect an efficient transaction. A widely dispersed network of access points may implement a mobile telecommunications protocol, such as IETF RFC 3344 (Mobile IP, IPv4), or various mobile ad hoc network (MANET) protocols, 2.5G or 3G cellular, or other types of protocols. Preferably, the protocol allows the client to maintain a remote connection while traversing between various access points. See, U.S. Pub. App. No. 20040073642, expressly incorporated herein by reference. Mobile Internet Protocol (Mobile IP or MIP, in this case, v4) is an Internet Engineering Task Force (IETF) network layer protocol, specified in RFC-3344. 
It is designed to allow seamless connectivity session maintenance under TCP (Transmission Control Protocol) or other connection oriented transport protocols when a mobile node moves from one IP subnet to another. MIPv4 uses two network infrastructure entities, a Home Agent (HA) and an optional Foreign Agent (FA), to deliver packets to the mobile node when it has left its home network. MIPv4 also supports point-of-attachment Care-of Addresses (CoA) if a FA is unavailable. Mobile IP is increasingly being deployed for 2.5/3 G (2.5 or third generation wireless) provider networks and may be deployed in medium and large Enterprise IEEE 802.11-based LANs (Local Area Networks) with multiple subnets. MIPv4 relies on the use of permanently assigned “home” IP addresses to help maintain connectivity when a mobile device connects to a foreign network. On the other hand, IPsec-based (Internet Protocol Security, a security protocol from IETF) VPNs (Virtual Private Networks) use a tunneling scheme in which the outer source IP address is based on a CoA at the point-of-attachment and an inner source IP address assigned for the “home” domain. In general if either address is changed, such as when the mobile node switches IP subnets, then a new tunnel is negotiated with new keys and several round trip message exchanges. The renegotiation of the tunnel interferes with seamless mobility across wired and wireless IP networks spanning multiple IP subnets.
  • MARKET ECONOMY SYSTEMS: In modern retail transactions, predetermined price transactions are common, with market transactions, i.e., commerce conducted in a setting which allows the transaction price to float based on the respective valuation allocated by the buyer(s) and seller(s), often left to specialized fields. While interpersonal negotiation is often used to set a transfer price, this price is often different from a transfer price that might result from a best-efforts attempt at establishing a market price. Assuming that the market price is optimal, it is therefore assumed that alternatives are suboptimal. Therefore, the establishment of a market price is desirable over simple negotiations. One particular problem with market-based commerce is that both seller optimization and market efficiency depend on the fact that representative participants of a preselected class are invited to participate, and are able to promptly communicate, on a relevant timescale, in order to accurately value the goods or services and make an offer. Thus, in a traditional market-based system, all participants are in the same room, or connected by a high quality telecommunications link. Alternately, the market valuation process is prolonged over an extended period, allowing non-real time communications of market information and bids. Thus, attempts at ascertaining a market price for non-commodity goods can be subject to substantial inefficiencies, which reduce any potential gains by market pricing. Further, while market pricing might be considered “fair”, it also imposes an element of risk, reducing the ability of parties to predict future pricing and revenues. Addressing this risk may also reduce efficiency of a market-based system.
  • AUCTION SYSTEMS: When a single party seeks to sell goods to the highest valued purchaser(s), to establish a market price, the rules of conduct typically define an auction. Typically, known auctions provide an ascending price or descending price over time, with bidders making offers or ceasing to make offers, in the descending price or ascending price models, respectively, to define the market price. After determining the winner of the auction, the pricing rules define uniform price auctions, wherein all successful bidders pay the lowest successful bid, second price auctions wherein the winning bidder pays the amount bid by the next highest bidder, and pay-what-you-bid auctions. The pay-what-you-bid auction is also known as a discriminative auction while the uniform price auction is known as a non-discriminative auction. In a second-price auction, also known as a Vickrey auction, the policy seeks to create a disincentive for speculation and to encourage bidders to submit bids reflecting their true value for the good. In the uniform price and second price schemes, the bidder is encouraged to disclose the actual private value to the bidder of the good or service, since at any price below this amount, there is an excess gain to the buyer, whereas by withholding this amount the bid may be unsuccessful, resulting in a loss of the presumably desirable opportunity. In the pay-what-you-bid auction, on the other hand, the buyer need not disclose the maximum private valuation, and those bidders with lower risk tolerance will bid higher prices.
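As a minimal sketch of the three pricing rules described above, the following Python function settles a sealed-bid auction in which each bidder wants one unit. The function name `settle_sealed_bids`, the rule labels, and the one-unit-per-bidder simplification are illustrative assumptions and are not taken from the disclosure.

```python
from typing import Dict


def settle_sealed_bids(bids: Dict[str, float], units: int, rule: str) -> Dict[str, float]:
    """Return {bidder: price paid} for the winners of a sealed-bid auction.

    Hypothetical helper: each bidder is assumed to want exactly one unit.
    rule is one of:
      "pay_what_you_bid"  discriminative: each winner pays their own bid
      "uniform"           non-discriminative: winners pay the lowest successful bid
      "second_price"      single unit: the winner pays the next-highest bid
    """
    if units < 1:
        raise ValueError("units must be at least 1")
    # Highest bids first; ties keep their original insertion order.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winners = ranked[:units]
    if not winners:
        return {}
    if rule == "pay_what_you_bid":
        return {name: bid for name, bid in winners}
    if rule == "uniform":
        lowest_successful = winners[-1][1]
        return {name: lowest_successful for name, _ in winners}
    if rule == "second_price":
        if units != 1:
            raise ValueError("second_price is defined here for one unit only")
        # With a single bidder there is no runner-up, so fall back to the own bid.
        runner_up = ranked[1][1] if len(ranked) > 1 else winners[0][1]
        return {winners[0][0]: runner_up}
    raise ValueError(f"unknown rule: {rule}")
```

With bids of 10, 8 and 5 and two units for sale, the pay-what-you-bid rule charges 10 and 8, while the uniform rule charges 8 to both winners. With one unit, the second-price rule charges the top bidder 8.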
  • Two common types of auction are the English auction, which sells a single good to the highest bidder in an ascending price auction, and the Dutch auction, in which multiple units are available for sale, and in which a starting price is selected by the auctioneer, which is successively reduced, until the supply is exhausted by bidders (or the minimum price/final time is reached), with the buyer(s) paying the lowest successful bid. The term Dutch auction is also applied to a type of sealed bid auction. In a multi-unit live Dutch auction, each participant is provided with the current price, the quantity on hand and the time remaining in the auction. This type of auction typically takes place over a very short period of time, and there is a flurry of activity in the last portion of the auction process. The actual auction terminates when there is no more product to be sold or the time period expires. In selecting the optimal type of auction, a number of factors are considered. In order to sell large quantities of a perishable commodity in a short period of time, the descending price auctions are often preferred. For example, the produce and flower markets in Holland routinely use the Dutch auction (hence the derivation of the name), while the U.S. Government uses this form to sell its financial instruments. The format of a traditional Dutch auction encourages early bidders to bid up to their “private value”, hoping to pay some price below the “private value”. In making a bid, the “private value” becomes known, helping to establish a published market value and demand curve for the goods, thus allowing both buyers and sellers to define strategies for future auctions.
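The multi-unit Dutch auction described above can be sketched as a simple simulation. This is an illustration under stated assumptions: the function name `run_dutch_auction` is hypothetical, prices are integers, and each bidder is assumed to accept as soon as the clock price falls to or below a fixed private value, which ignores the strategic waiting a real bidder might attempt.

```python
from typing import Dict, List, Optional, Tuple


def run_dutch_auction(
    start_price: int,
    decrement: int,
    floor_price: int,
    supply: int,
    demands: List[Tuple[str, int, int]],
) -> Tuple[Dict[str, int], Optional[int], int]:
    """Simulate a descending-price, multi-unit auction.

    demands holds (bidder, private_value, quantity_wanted) tuples.
    Returns (units allocated per bidder, lowest successful price, unsold units).
    All winners are taken to pay the lowest successful price.
    """
    if decrement <= 0:
        raise ValueError("decrement must be positive")
    price = start_price
    remaining = supply
    allocations: Dict[str, int] = {}
    clearing_price: Optional[int] = None
    waiting = list(demands)
    # The clock runs until supply is exhausted or the minimum price is passed.
    while price >= floor_price and remaining > 0 and waiting:
        still_waiting = []
        for bidder, private_value, quantity in waiting:
            if remaining > 0 and private_value >= price:
                take = min(quantity, remaining)
                allocations[bidder] = take
                remaining -= take
                clearing_price = price  # the latest acceptance is the lowest so far
            else:
                still_waiting.append((bidder, private_value, quantity))
        waiting = still_waiting
        price -= decrement
    return allocations, clearing_price, remaining
```

Starting at 100 with a decrement of 10, a floor of 50 and 5 units, bidders valuing the good at 95, 72 and 70 are served at clock prices 90, 70 and 70, so the lowest successful price is 70 and a bidder valuing the good at 40 is never reached.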
  • In an auction, typically a seller retains an auctioneer to conduct an auction with multiple buyers. (In a reverse auction, a buyer solicits the lowest price from multiple competing vendors for a desired purchase). Since the seller retains the auctioneer, the seller essentially defines the rules of the auction. These rules are typically defined to maximize the revenues or profit to the seller, while providing an inviting forum to encourage a maximum number of high valued buyers. If the rules discourage high valuations of the goods or services, or discourage participation by an important set of potential bidders, then the rules are not optimum. A rule may also be imposed to account for the valuation of the good or service applied by the seller, in the form of a reserve price. It is noted that these rules typically seek to allocate to the seller a portion of the economic benefit that would normally inure to the buyer, creating an economic inefficiency. However, since the auction is to benefit the seller, not society as a whole, this potential inefficiency is tolerated. An optimum auction thus seeks to produce a maximum profit (or net revenues) for the seller. An efficient auction, on the other hand, maximizes the sum of the utilities for the buyer and seller. It remains a subject of academic debate as to which auction rules are optimal in given circumstances; however, in practice, simplicity of implementation may be a paramount concern, and simple auctions may result in highest revenues; complex auctions, while theoretically closer to optimal, may discourage bidders from participating or from applying their true and full private valuation in the auction process. Typically, the auction rules are predefined and invariant. 
Further, for a number of reasons, auctions typically apply the same rules to all bidders, even though, with a priori knowledge of the private values assigned by each bidder to the goods, or a prediction of the private value, an optimization rule may be applied to extract the full value assigned by each bidder, while selling above the seller's reserve.
  • In a known ascending price auction, each participant must be made aware of the status of the auction, e.g., open, closed, and the contemporaneous price. A bid is indicated by the identification of the bidder at the contemporaneous price, or occasionally at any price above the minimum bid increment plus the previous price. The bids are asynchronous, and therefore each bidder must be immediately informed of the particulars of each bid by other bidders. In a known descending price auction, the process traditionally entails a common clock, which corresponds to a decrementing price at each decrement interval, with an ending time (and price). Therefore, once each participant is made aware of the auction parameters, e.g., starting price, price decrement, ending price/time, before the start of the auction, the only information that must be transmitted is auction status (e.g., inventory remaining).
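Because the descending-price parameters are fixed before the auction starts, each participant can compute the current price locally from a shared clock. The sketch below illustrates this; the function name `clock_price` and its integer-price, whole-second parameters are assumptions made for the example.

```python
def clock_price(
    start_price: int,
    decrement: int,
    interval_s: int,
    end_price: int,
    elapsed_s: int,
) -> int:
    """Price shown by a descending-price clock after elapsed_s seconds.

    The price drops by `decrement` once per `interval_s` seconds and never
    falls below `end_price`. Only the auction status (for example, inventory
    remaining) then needs to be transmitted while the auction runs.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if decrement < 0:
        raise ValueError("decrement must not be negative")
    steps = max(0, elapsed_s) // interval_s  # completed decrement intervals
    return max(end_price, start_price - decrement * steps)
```

For a start price of 100, a decrement of 5 every 10 seconds and an ending price of 60, the clock shows 100 for the first 9 seconds, 95 at 10 seconds, 85 at 35 seconds, and 60 from 80 seconds onward.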
  • As stated above, an auction is traditionally considered an efficient manner of liquidating goods at a market price. The theory of an auction is that either the buyer will not resell, and thus has an internal or private valuation of the goods regardless of others' perceived values, or that the winner will resell, either to gain economic efficiency or as a part of the buyer's regular business. In the latter case, it is a general presumption that the resale buyers are not in attendance at the auction or are otherwise precluded from bidding, and therefore that, after the auction, there will remain demand for the goods at a price in excess of the price paid during the auction. Extinction of this residual demand results in the so-called “winner's curse”, in which the buyer can make no profit from the transaction during the auction. Since this detracts from the value of the auction as a means of conducting profitable commerce, it is of concern to both buyer and seller. In fact, experience with initial public offerings (IPOs) of stock through various means has demonstrated that by making stock available directly to all classes of potential purchasers, latent demand for a new issue is extinguished, and the stock price is likely to decline after issuance, resulting in an IPO which is characterized as “unsuccessful”. This potential for post-IPO decline tempers even initial interest in the issue, resulting in a paradoxical decline in revenues from the vehicle. In other words, the “money on the table” resulting from immediate retrading of IPO shares is deemed a required aspect of the IPO process. Thus, methods that retain latent demand after the IPO shares are issued result in post-IPO price increases, and therefore a “successful” IPO. Therefore, where the transaction scheme anticipates demand for resale after the initial distribution, it is often important to assure a reasonable margin for resellers and limitations on direct sale to ultimate consumers. 
Research into auction theory (game theory) shows that in an auction, the goal of the seller is to optimize the auction by allocating the goods inefficiently, and thus to appropriate to himself an excess gain. This inefficiency manifests itself by either withholding goods from the market or placing the goods in the wrong hands. In order to assure for the seller a maximum gain from a misallocation of the goods, restrictions on resale are imposed; otherwise, post auction trading will tend to undo the misallocation, and the anticipation of this trading will tend to control the auction pricing. The misallocation of goods imposed by the seller through restrictions allows the seller to achieve greater revenues than if free resale were permitted. It is believed that in an auction followed by perfect resale, any mis-assignment of the goods lowers the seller's revenues below the optimum and likewise, in an auction market followed by perfect resale, it is optimal for the seller to allocate the goods to those with the highest value. Therefore, if post-auction trading is permitted, the seller will not benefit from these later gains, and the seller will obtain suboptimal revenues.
  • These studies, however, typically do not consider transaction costs and internal inefficiencies of the resellers, as well as the possibility of multiple classes of purchasers, or even multiple channels of distribution, which may be subject to varying controls or restrictions, and thus in a real market, such theoretical optimal allocation is unlikely. In fact, in real markets the transaction costs involved in transfer of ownership are often critical in determining a method of sale and distribution of goods. For example, it is the efficiency of sale that motivates the auction in the first place. Yet, the auction process itself may consume a substantial margin, for example 1-15% of the transaction value. To presume, even without externally imposed restrictions on resale, that all of the efficiencies of the market may be extracted by free reallocation, ignores that the motivation of the buyer is a profitable transaction, and the buyer may have fixed and variable costs on the order of magnitude of the margin. Thus, there are substantial opportunities for the seller to gain enhanced revenues by defining rules of the auction, strategically allocating inventory amount and setting reserve pricing. Therefore, perfect resale is but a fiction created in auction (game) theory. Given this deviation from the ideal presumptions, auction theory may be interpreted to provide the seller with a motivation to misallocate or withhold based on the deviation of practice from theory, likely based on the respective transaction costs, seller's utility of the goods, and other factors not considered by the simple analyses.
  • A number of proposals have been made for effecting auction systems using the Internet. These systems include consumer-to-consumer, business-to-consumer, and business-to-business types. Generally, these auctions, of various types and implementations discussed further below, are conducted through Internet browsers using hypertext markup language (HTML) “web pages”, using HTTP. In some instances, such as BIDWATCH, discussed further below, an application with associated applets is provided to define a user interface instead of HTML. As stated above, the information packets from the transaction server to client systems associated with respective bidders communicate various information regarding the status of an interactive auction during the progress thereof. The network traffic from the client systems to the transaction server is often limited to the placement of bids; however, the amount of information required to be transmitted can vary greatly, and may involve a complex dialogue of communications to complete the auction offer. Typically, Internet based auction systems have scalability issues, wherein economies of scale are not completely apparent, leading to implementation of relatively large transaction server systems to handle peak loads. When the processing power of the transaction server system is exceeded, entire system outages may occur, resulting in lost sales or diminished profits, and diminished goodwill. In most Internet auction system implementations, there are a large quantity of simultaneous auctions, with each auction accepting tens or hundreds of bids over a timescale of hours to days. In systems where the transaction volume exceeds these scales, for example in stock and commodity exchanges, which can accommodate large numbers of transactions per second involving the same issue, a private network, or even a local area network, is employed, and the public Internet is not used as a direct communications system with the transaction server. 
Thus, while infrastructures are available to allow successful handling of massive transactions-per-second volumes, these systems typically avoid direct public Internet communications or use of some of its limiting technologies. The transaction processing limitations are often due to the finite time required to handle, e.g., open, update, and close, database records. In business-to-business auctions, buyers seek to ensure that the population of ultimate consumers for the goods or services are not present at the auction, in order to avoid the “winner's curse”, where the highest bidder in the auction cannot liquidate or work the asset at a profit. Thus, business-to-business auctions are distinct from business-to-consumer auctions. In the former, the optimization by the seller must account for the desire or directive of the seller to avoid direct retail distribution, and instead to rely on a distribution tier represented in the auction. In the latter, the seller seeks maximum revenues and to exhaust the possibilities for downstream trade in the goods or services. In fact, these types of auctions may be distinguished by various implementing rules, such as requiring sales tax resale certificates, minimum lot size quantities, preregistration or qualification, support or associated services, or limitations on the title to the goods themselves. The conduct of these auctions may also differ, in that consumer involvement typically is permissive of mistake or indecision, while in a pure business environment professionalism and decisiveness are mandated. In many instances, psychology plays an important role in auction conduct. In a live auction, bidders can see each other, and judge the tempo of the auction. In addition, multiple auctions are often conducted sequentially, so that each bidder can begin to understand the other bidders' patterns, including hesitation, bluffing, facial gestures or mannerisms. 
Thus, bidders often prefer live auctions to remote or automated auctions if the bidding is to be conducted strategically.
  • INTERNET AUCTIONS: On-line electronic auction systems which allow efficient sales of products and services are well known, for example, EBAY.COM, ONSALE.COM, UBID.COM, and the like. Inverse auctions that allow efficient purchases of product are also known, establishing a market price by competition between sellers. The Internet holds the promise of further improving efficiency of auctions by reducing transaction costs and freeing the “same time-same place” limitations of traditional auctions. This is especially appropriate where the goods may be adequately described by text or images, and thus a physical examination of the goods is not required prior to bidding. In existing Internet systems, the technological focus has been in providing an auction system that, over the course of hours to days, allows a large number of simultaneous auctions, among a large number of bidders, to occur. These systems must be scalable and have high transaction throughput, while assuring database consistency and overall system reliability. Even so, certain users may selectively exploit known technological limitations and artifacts of the auction system, including non-real time updating of bidding information, especially in the final stages of an auction. Because of existing bandwidth and technological hurdles, Internet auctions are quite different from live auctions with respect to psychological factors. Live auctions are often monitored closely by bidders, who strategically make bids, based not only on the “value” of the goods, but also on an assessment of the competition, timing, psychology, and progress of the auction. It is for this reason that so-called proxy bidding, wherein the bidder creates a preprogrammed “strategy”, usually limited to a maximum price, is disfavored. 
A maximum price proxy bidding system is somewhat inefficient, in that other bidders may test the proxy, seeking to increase the bid price, without actually intending to purchase, or contrarily, after testing the proxy, a bidder might give up, even below a price he might have been willing to pay. Thus, the proxy imposes inefficiency in the system that effectively increases the transaction cost.
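The proxy-testing behavior described above can be illustrated with a short sketch. The price rule used here (the leader pays the runner-up's maximum plus one increment, capped at the leader's own maximum) is a common convention assumed for the example, and the name `proxy_outcome` is hypothetical; neither is specified in the text.

```python
from typing import Dict, Optional, Tuple


def proxy_outcome(
    max_bids: Dict[str, int], start_price: int, increment: int
) -> Tuple[Optional[str], int]:
    """Return (current leader, current price) under maximum-price proxy bidding.

    Assumed rule: the highest maximum leads, and the price is the second-highest
    maximum plus one increment, never exceeding the leader's own maximum.
    """
    ranked = sorted(max_bids.items(), key=lambda item: item[1], reverse=True)
    eligible = [(name, limit) for name, limit in ranked if limit >= start_price]
    if not eligible:
        return None, start_price
    leader, top = eligible[0]
    if len(eligible) == 1:
        return leader, start_price  # no competition, so the price stays at the start
    runner_up = eligible[1][1]
    return leader, min(top, runner_up + increment)
```

With a single proxy set to a maximum of 100 and a start price of 50, the price is 50. A rival who enters a maximum of 80 moves the price to 85, and one who enters 98 moves it to 100, so the proxy's maximum is reached even though the rival never wins.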
  • In order to address a flurry of activity that often occurs at the end of an auction, an auction may be held open until no further bids are cleared for a period of time, even if advertised to end at a certain time. This is common to both live and automated auctions. However, this lack of determinism may upset coordinated schedules, thus impairing efficient business use of the auction system.
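A minimal sketch of such a "held open" rule follows. The function name `effective_close` and the use of plain numeric timestamps are assumptions for illustration; the text does not specify how the quiet period is measured.

```python
from typing import Iterable


def effective_close(
    scheduled_end: float, bid_times: Iterable[float], quiet_period: float
) -> float:
    """Time at which an auction with a quiet-period extension actually closes.

    The auction stays open past scheduled_end until quiet_period has elapsed
    with no accepted bid. Bids arriving after the close are ignored.
    """
    close = scheduled_end
    for bid_time in sorted(bid_times):
        if bid_time > close:
            break  # the auction had already closed when this bid arrived
        close = max(close, bid_time + quiet_period)
    return close
```

With an advertised end at time 100 and a quiet period of 10, bids at times 95 and 103 move the close to 113, and a bid at 120 arrives too late. The close time therefore depends on bidder behavior, which is the lack of determinism noted above.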
  • In order to facilitate management of bids and bidding, some of the Internet auction sites have provided a software “applet”, not based on a Hypertext Markup Language (HTML) browser, to track auctions. For example, ONSALE.COM has made available a Marimba Castanet® applet called Bidwatch to track auction progress for particular items or classes of items, and to facilitate bidding thereon. This system, however, lacks real-time performance under many circumstances, having a stated refresh period of 10 seconds, with a long latency for confirmation of a bid, due to constraints on software execution, quality of service in communications streams, and bid confirmation dialogue. Thus, it is possible to lose a bid even if an attempt was made prior to another bidder. The need to quickly enter the bid, at risk of being too late, makes the process potentially error prone.
  • Proxy bidding, as discussed above, is a known technique for overcoming the constraints of Internet communications and client processing limitations, since it bypasses the client and telecommunications links and may execute solely on the host system or local thereto. However, proxy bidding undermines some of the efficiencies gained by a live market.
  • U.S. Pat. No. 5,890,138, expressly incorporated herein by reference in its entirety, relates to an Internet auction system. The system implements a declining price auction process, removing a user from the auction process once an indication to purchase has been received. See, Rockoff, T. E., Groves, M; “Design of an Internet-based System for Remote Dutch Auctions”, Internet Research, v 5, n 4, pp. 10-16, MCB University Press, Jan. 1, 1995.
  • A known computer site for auctioning a product on-line comprises at least one web server computer designed for serving a host of computer browsers and providing the browsers with the capability to participate in various auctions, where each auction is of a single product, at a specified time, with a specified number of the product available for sale. The web server cooperates with a separate database computer, separated from the web server computer by a firewall. The database computer is accessible to the web server computer to allow selective retrieval of product information, which includes a product description, the quantity of the product to be auctioned, a start price of the product, and an image of the product. The web server computer displays, updated during an auction, the current price of the product, the quantity of the product remaining available for purchase and the measure of the time remaining in the auction. The current price is decreased in a predetermined manner during the auction. Each user is provided with an input instructing the system to purchase the product at a displayed current price, transmitting an identification and required financial authorization for the purchase of the product, which must be confirmed within a predetermined time. In the known system, a certain fall-out rate in the actual purchase confirmation may be assumed, and therefore some overselling allowed. Further, after a purchase is indicated, the user's screen is not updated, obscuring the ultimate lowest selling price from the user. However, if the user maintains a second browser, he can continue to monitor the auction to determine whether the product could have been purchased at a lower price, and if so, fail to confirm the committed purchase and purchase the same goods at a lower price while reserving the goods to avoid risk of loss. Thus, the system is flawed, and may fail to produce an efficient transaction or optimal price. 
An Internet declining price auction system may provide the ability to track the price demand curve, providing valuable marketing information. For example, in trying to determine the response at different prices, companies normally have to conduct market surveys. In contrast, with a declining price auction, substantial information regarding price and demand is immediately known. The relationship between participating bidders and average purchasers can then be applied to provide a conventional price demand curve for the particular product. See U.S. Pat. No. 5,835,896, expressly incorporated herein by reference in its entirety. The auction rules may be flexible, for example including Dutch-type auctions, for example by implementing a price markdown feature with scheduled price adjustments, and English-type (progressive) auctions, with price increases corresponding to successively higher bids. In the Dutch type auction, the price markdown feature may be responsive to bidding activity over time, amount of bids received, and number of items bid for. Likewise, in the progressive auction, the award price may be dependent on the quantity desired, and typically implements a lowest successful bid price rule. Bids that are below a preset maximum posted selling price are maintained in reserve by the system. If a certain sales volume is not achieved in a specified period of time, the price is reduced to liquidate demand above the price point, with the new price becoming the posted price. On the other hand, if a certain sales volume is exceeded in a specified period of time, the system may automatically increase the price. These automatic price changes allow the seller to respond quickly to market conditions while keeping the price of the merchandise as high as possible, to the seller's benefit. 
A “Proxy Bidding” feature allows a bidder to place a bid for the maximum amount they are willing to pay, keeping this value a secret, displaying only the amount necessary to win the item up to the amount of the currently high bids or proxy bids of other bidders. This feature allows bidders to participate in the electronic auction without revealing to the other bidders the extent to which they are willing to increase their bids, while maintaining control of their maximum bid without closely monitoring the bidding. The feature assures proxy bidders the lowest possible price up to a specified maximum without requiring frequent inquiries as to the state of the bidding. A “Floating Closing Time” feature may also be implemented whereby the auction for a particular item is automatically closed if no new bids are received within a predetermined time interval, assuming an increasing price auction. Bidders thus have an incentive to place bids expeditiously, rather than waiting until near the anticipated close of the auction. See, U.S. Pat. No. 590,975, expressly incorporated herein by reference in its entirety.
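As an illustration of the “Proxy Bidding” feature, the following minimal sketch computes the current leader and the publicly displayed price from a set of secret maximum bids. The function name, the fixed bid increment, and the tie-breaking rule (the earlier-entered bid wins) are assumptions made for the example and are not taken from the referenced patent.

```python
def resolve_proxy_bids(max_bids, start_price, increment):
    """Return (leader, displayed_price) for secret maximum bids.

    max_bids maps bidder -> secret maximum, in order of entry.
    Assumption: the displayed price is only as high as needed to beat
    the runner-up by one increment, capped at the leader's maximum.
    """
    eligible = {b: m for b, m in max_bids.items() if m >= start_price}
    if not eligible:
        return None, start_price
    # sorted() is stable, so equal maxima keep their order of entry.
    ranked = sorted(eligible.items(), key=lambda kv: kv[1], reverse=True)
    leader, top = ranked[0]
    if len(ranked) == 1:
        return leader, start_price
    runner_up = ranked[1][1]
    return leader, min(top, runner_up + increment)


print(resolve_proxy_bids({"a": 50, "b": 42, "c": 30}, start_price=10, increment=1))
```

Here bidder "a" leads at a displayed price just above the runner-up's maximum, while a's own maximum of 50 stays hidden. This is the property that lets other bidders probe the proxy by bidding incrementally.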
  • SECURE NETWORKS: Various references relate to secure networks, an aspect of various embodiments of the present invention. These references are incorporated herein by reference in their entirety, including U.S. Pat. Nos. 5,933,498; 5,978,918; 6,005,943; 6,009,526; 6,021,202; 6,021,491; 6,021,497; 6,023,762; 6,029,245; 6,049,875; 6,055,508; 6,065,119; 6,073,240; 6,075,860; and 6,075,861.
  • CRYPTOGRAPHIC TECHNOLOGY: See, U.S. Pat. Nos. 5,956,408; 5,982,891; 5,949,876; 5,892,900; 6,009,177; 6,052,467; and 6,052,780. See also, U.S. Pat. Nos. 4,200,770; 4,218,582; 4,264,782; 4,306,111; 4,309,569; 4,326,098; 4,351,982; 4,365,110; 4,386,233; 4,393,269; 4,399,323; 4,405,829; 4,438,824; 4,453,074; 4,458,109; 4,471,164; 4,514,592; 4,528,588; 4,529,870; 4,558,176; 4,567,600; 4,575,621; 4,578,531; 4,590,470; 4,595,950; 4,625,076; 4,633,036; 5,991,406; 6,026,379; 6,026,490; 6,028,932; 6,028,933; 6,028,936; 6,028,937; 6,028,939; 6,029,150; 6,029,195; 6,029,247; 6,031,913; 6,031,914; 6,034,618; 6,035,041; 6,035,398; 6,035,402; 6,038,315; 6,038,316; 6,038,322; 6,038,581; 6,038,665; 6,038,666; 6,041,122; 6,041,123; 6,041,357; 6,041,408; 6,041,410; 6,044,131; 6,044,155; 6,044,157; 6,044,205; 6,044,349; 6,044,350; 6,044,388; 6,044,462; 6,044,463; 6,044,464; 6,044,466; 6,044,468; 6,047,051; 6,047,066; 6,047,067; 6,047,072; 6,047,242; 6,047,268; 6,047,269; 6,047,374; 6,047,887; 6,049,610; 6,049,612; 6,049,613; 6,049,671; 6,049,785; 6,049,786; 6,049,787; 6,049,838; 6,049,872; 6,049,874; 6,052,466; 6,052,467; 6,052,469; 6,055,314; 6,055,321; 6,055,508; 6,055,512; 6,055,636; 6,055,639; 6,056,199; 6,057,872; 6,058,187; 6,058,188; 6,058,189; 6,058,193; 6,058,381; 6,058,383; 6,061,448; 6,061,454; 6,061,692; 6,061,789; 6,061,790; 6,061,791; 6,061,792; 6,061,794; 6,061,796; 6,061,799; 6,064,723; 6,064,738; 6,064,740; 6,064,741; 6,064,764; 6,064,878; 6,065,008; 6,067,620; 6,069,647; 6,069,952; 6,069,954; 6,069,955; 6,069,969; 6,069,970; 6,070,239; 6,072,870; 6,072,874; 6,072,876; 6,073,125; 6,073,160; 6,073,172; 6,073,234; 6,073,236; 6,073,237; 6,073,238; 6,073,242; 6,075,864; 6,075,865; 6,076,078; 6,076,162; 6,076,163; 6,076,164; 6,076,167; 6,078,663; 6,078,665; 6,078,667; 6,078,909; 6,079,018; 6,079,047; 6,081,597; 6,081,598; 6,081,610; 6,081,790; 6,081,893; and 6,192,473; each of which is expressly incorporated herein by reference. See, also, U.S. Pat. Nos. 
6,028,937; 6,026,167; 6,009,171; 5,991,399; 5,948,136, 5,915,018, 5,715,403; 5,638,443; 5,634,012; and 5,629,980, expressly incorporated herein by reference, and Jim Wright and Jeff Robillard (Philsar Semiconductor), “Adding Security to Portable Designs”, Portable Design, March 2000, pp. 16-20. U.S. Provisional Patent Application No. 60/278,317, filed Mar. 23, 2001, provides a three party authorization encryption technique. This technique has the significant advantage of being more secure than a Public Key system because it requires the agreement of all three parties—the creator of the secure record, the party that the secure record is about, and the database repository—on a case by case basis, in order to release secure records. Then and only then can a released secure record be decrypted and accessed by the requesting party alone. Each access generates an entry into a log file with automatic security alerts for any unusual activity. A component of this system is that each party wishing to secure records enters into a contract with a “virtual trust agency” to represent that party in all matters where privacy is an issue. The virtual trust agency never has access to the data contained in the secured records and yet acts on behalf of the party whose information is contained in the secure records to control data access to authorized requesting parties. To enable this privacy, the virtual trust agency issues a public-private key pair and maintains the party's private key. The private key is only used in calculations to generate an intermediate key that is passed on to the data repository and used to re-encrypt the data for the requesting party's view. 
A unique aspect of this patent pending technique is the fact that the party's records are actually protected from unauthorized use at all times inside the organization that holds the database repository or by the original record's creator, not simply in transmission from the database repository or the individual or organization that created the record in the first place, to outside requesting parties. This system requires consent of all three parties to decrypt secured information. Its virtual trust component takes the place of the trusted individual or organization in protecting the party whose record contains information that has legal mandates to rights of privacy.
  • COMPUTER SECURITY AND DEVICES: Various references relate to computer system security, part of various embodiments of the invention. The following references relevant to this issue are incorporated herein by reference: U.S. Pat. Nos. 5,881,225; 5,937,068; 5,949,882; 5,953,419; 5,956,400; 5,958,050; 5,978,475; 5,991,878; 6,070,239; and 6,079,021. A number of references relate to computer security devices, which are a part of various embodiments of the invention. The following references relevant to this issue are incorporated herein by reference: U.S. Pat. Nos. 5,982,520; 5,991,519; 5,999,629; 6,034,618; 6,041,412; 6,061,451; and 6,069,647.
  • VIRTUAL PRIVATE NETWORK: Various references relate to virtual private networks, part of various embodiments of the invention. The following references relevant to this issue are incorporated herein by reference: U.S. Pat. Nos. 6,079,020; 6,081,900; 6,081,533; 6,078,946; 6,078,586; 6,075,854; 6,075,852; 6,073,172; 6,061,796; 6,061,729; 6,058,303; 6,055,575; 6,052,788; 6,047,325; 6,032,118; 6,029,067; 6,016,318; 6,009,430; 6,005,859; 6,002,767; and 6,002,756, each of which is expressly incorporated herein by reference. See also, U.S. Pat. Nos. 4,564,018; 4,731,841; 4,736,203; 4,752,676; 4,819,267; 4,827,518; 4,868,376; 4,890,323; 4,896,363; 4,926,480; 4,941,173; 4,952,928; 4,961,142; 4,972,476; 4,993,068; 5,020,105; 5,036,461; 5,056,141; 5,056,147; 5,065,429; 5,067,162; 5,073,950; 5,131,038; 5,155,680; 5,163,094; 5,191,611; 5,204,670; 5,208,858; 5,224,173; 5,228,094; 5,229,764; 5,245,329; 5,272,754; 5,280,527; 5,283,431; 5,291,560; 5,335,288; 5,341,428; 5,345,549; 5,347,580; 5,363,453; 5,412,727; 5,414,755; 5,432,864; 5,448,045; 5,453,601; 5,455,407; 5,457,747; 5,469,506; 5,475,839; 5,478,993; 5,483,601; 5,485,312; 5,485,519; 5,497,430; 5,523,739; 5,526,428; 5,533,123; 5,534,855; 5,544,255; 5,553,155; 5,557,765; 5,559,885; 5,561,718; 5,572,596; 5,578,808; 5,583,933; 5,583,950; 5,586,171; 5,588,059; 5,592,408; 5,594,806; 5,608,387; 5,613,012; 5,615,277; 5,633,932; 5,636,282; 5,646,839; 5,647,017; 5,647,364; 5,659,616; 5,666,400; 5,668,878; 5,680,460; 5,682,032; 5,682,142; 5,696,827; 5,703,562; 5,706,427; 5,712,912; 5,712,914; 5,719,950; 5,734,154; 5,737,420; 5,742,683; 5,742,685; 5,745,555; 5,745,573; 5,748,738; 5,751,809; 5,751,836; 5,757,431; 5,757,916; 5,761,298; 5,763,862; 5,764,789; 5,767,496; 5,768,382; 5,770,849; 5,771,071; 5,774,551; 5,784,461; 5,784,566; 5,787,187; 5,789,733; 5,790,668; 5,790,674; 5,799,083; 5,799,086; 5,799,088; 5,802,199; 5,805,719; 5,815,252; 5,815,577; 5,825,871; 5,825,880; 5,828,751; 5,832,119; 5,832,464; 5,838,812; 5,841,122; 
5,841,865; 5,841,886; 5,841,907; 5,844,244; 5,848,231; 5,850,442; 5,850,451; 5,857,022; 5,862,223; 5,862,246; 5,862,260; 5,867,578; 5,867,795; 5,867,802; 5,869,822; 5,870,723; 5,872,834; 5,872,848; 5,872,849; 5,875,108; 5,876,926; 5,878,144; 5,881,226; 5,889,474; 5,890,152; 5,892,824; 5,892,838; 5,892,902; 5,897,616; 5,898,154; 5,901,246; 5,907,149; 5,910,988; 5,912,818; 5,912,974; 5,913,025; 5,913,196; 5,915,973; 5,920,058; 5,920,384; 5,920,477; 5,923,763; 5,930,804; 5,933,498; 5,933,515; 5,935,071; 5,943,423; 5,949,046; 5,949,879; 5,949,881; 5,951,055; 5,952,641; 5,954,583; 5,963,657; 5,963,908; 5,966,446; 5,970,143; 5,974,146; 5,978,494; 5,979,773; 5,982,894; 5,984,366; 5,986,746; 5,987,153; 5,987,155; 5,991,408; 5,991,429; 5,991,431; 5,995,630; 5,999,095; 5,999,637; 6,002,770; 6,003,135; 6,006,328; 6,009,177; 6,011,858; 6,012,039; 6,012,049; 6,016,476; 6,018,739; 6,026,166; 6,031,910; 6,035,398; 6,035,402; 6,035,406; 6,037,870; 6,038,315; 6,038,337; 6,038,666; 6,040,783; 6,041,410; 6,044,155; 6,044,349; 6,045,039; 6,052,468; 6,056,197; 6,064,751; 6,068,184; 6,070,141; 6,072,894; 6,075,455; 6,076,167; 6,078,265; 6,079,621; 6,081,199; 6,081,750; 6,081,900; each of which is expressly incorporated herein by reference.
  • E-COMMERCE SYSTEMS: See, U.S. Pat. Nos. 5,946,669; 6,005,939; 6,016,484; 6,029,150; 6,047,269; expressly incorporated herein by reference.
  • MICROPAYMENTS: See, U.S. Pat. No. 5,999,919, expressly incorporated herein by reference. The following U.S. patents, expressly incorporated herein by reference, define aspects of micropayment, digital certificate, and on-line payment systems: U.S. Pat. Nos. 5,666,416; 5,677,955; 5,717,757; 5,793,868; 5,815,657; 5,839,119; 5,857,023; 5,884,277; 5,903,651; 5,903,880; 5,915,093; 5,930,777; 5,933,498; 5,937,394; 5,960,083; 5,963,924; 5,987,132; 5,996,076; 6,016,484; 6,018,724; 6,021,202; 6,035,402; 6,049,786; 6,049,787; 6,057,872; 6,058,381; 6,061,448; and 6,061,665. See also, Rivest and Shamir, “PayWord and MicroMint: Two Simple Micropayment Schemes” (May 7, 1996); Micro PAYMENT transfer Protocol (MPTP) Version 0.1 (22 Nov. 1995) et seq.,; Common Markup for web Micropayment Systems, (9 Jun. 1999); “Distributing Intellectual Property: a Model of Microtransaction Based Upon Metadata and Digital Signatures”, Olivia, Maurizio,˜olivia/RFC/09/, all of which are expressly incorporated herein by reference. See, also: U.S. Pat. Nos. 4,977,595; 5,224,162; 5,237,159; 5,392,353; 5,511,121; 5,621,201; 5,623,547; 5,679,940; 5,696,908; 5,754,939; 5,768,385; 5,799,087; 5,812,668; 5,828,840; 5,832,089; 5,850,446; 5,889,862; 5,889,863; 5,898,154; 5,901,229; 5,920,629; 5,926,548; 5,943,424; 5,949,045; 5,952,638; 5,963,648; 5,978,840; 5,983,208; 5,987,140; 6,002,767; 6,003,765; 6,021,399; 6,026,379; 6,029,150; 6,029,151; 6,047,067; 6,047,887; 6,055,508; 6,065,675; 6,072,870; each of which is expressly incorporated herein by reference.
  • NEURAL NETWORKS: The resources relating to Neural Networks, listed in the Neural Networks References Appendix, each of which is expressly incorporated herein by reference, provide a sound basis for understanding the field of neural networks (and the subset called artificial neural networks, as distinguished from biological systems) and how these might be used to solve problems. A review of these references will provide a state of knowledge appropriate for an understanding of aspects of the invention which rely on Neural Networks, and avoids a prolix discussion of no benefit to those already possessing an appropriate state of knowledge.
  • TELEMATICS: The resources relating to telematics listed in the Telematics Appendix, each of which is expressly incorporated herein by reference, provide a background in the theory and practice of telematics, as well as some of the underlying technologies. A review of these references is therefore useful in understanding practical issues and the context of functions and technologies which may be used in conjunction with the advances set forth herein.
  • The drawings show:
  • FIG. 1 shows a Bayesian Network;
  • FIG. 2 shows a Markov chain;
  • FIG. 3 shows a model of the output of a Markov chain as a mixture of Gaussians;
  • FIGS. 4A-4C show an input-output, a factorial, and a coupled Hidden Markov Model (HMM), respectively;
  • FIG. 5 shows a predictor corrector algorithm of the discrete Kalman filter cycle;
  • FIG. 6 shows aspects of the discrete Kalman filter cycle algorithm;
  • FIG. 7 shows aspects of the extended Kalman filter cycle;
  • FIG. 8 shows a block diagram of a preferred embodiment of a communications system according to the present invention;
  • FIG. 9 is a schematic diagram showing the prioritization scheme; and
  • FIG. 10 is a block diagram representing a message format.
  • The present invention seeks, among other aspects, to apply aspects of game theory to the enhancement or optimization of communities. These communities may themselves have various rules, arrangements or cultures, which can be respected or programmed as a part of the system operation. Thus, in accordance with a game theoretic analysis, various rules and perceived benefits may be applied to appropriately model the real system, or may be imposed to control behavior. These communities may be formed or employed for various purposes, and a preferred embodiment optimizes wireless communications in an open access, e.g., unlicensed band. By optimizing communications, a greater communications bandwidth will generally be available, which will allow richer communications. This, in turn, permits new applications which depend on communications.
  • First Embodiment
  • In a typical auction, each player is treated fairly; that is, the same rules apply to each player, and therefore a single economy describes the process. The fair auction therefore poses challenges for an inherently hierarchal set of users, such as a military organization, where rank is accompanied by privilege. The net result, however, is a decided disadvantage to lower ranking agents, at least when viewed in light of constricted self-interest. The issues that arise are similar to those relating to “altruism”, although not identical, and thus the game theoretic analysis of altruistic behavior may be imported for consideration as appropriate.
  • In a mobile ad hoc communications network, a real issue is user defection or non-compliance. For example, where a cost is imposed on a user for participating in the ad hoc network, e.g., battery power consumption, if the anticipated benefit does not exceed the cost, the user will simply turn off the device until actually needed. The result of mass defection will, of course, be the instability and failure of the ad hoc network itself, leading to decreased utility, even for those who gain an unfair or undue advantage under the system. Thus, perceived fairness and net benefit are required for network success, assuming that defection and/or non-compliance are possible. On the other hand, in military systems, the assertion of rank as a basis for priority is not itself arbitrary and capricious. Orders and communications from a central command are critical for the organization itself, and thus the lower ranking agents gain at least a peripheral benefit as their own chain of command employs their resources. Therefore, the difficulty in analyzing the application of a fair game to a hierarchal organization is principally a result of conceptualizing and aligning the individual incentives with those of the organization as a whole and the relationship between branches. Thus, in contradistinction to typical self-organizing peer-to-peer networks, a hierarchal network is not seen as self-organizing, at least in terms of the hierarchy, which is extrinsic to the formation of the communications network under consideration.
  • As discussed below, the “distortions” of the network imposed by the external hierarchy can be analyzed and accounted for by, for example, the concepts of inheritance and delegation. Thus, each branch of a hierarchy tree may be considered an object, which receives a set of characteristics from its root, and from which each sub-branch inherits the characteristics and adds subcharacteristics of, for example, specialization. It is noted that the hierarchy need not follow non-ambiguous or perfect rules, and thus there is no particular limit imposed that the hierarchy necessarily follow these formalisms. Rather, by analyzing those aspects of the hierarchy which comply with these formalisms in accordance therewith, efficiency is facilitated.
  • In establishing an economic system, a preliminary question is whether the system is microeconomic or macroeconomic; that is, whether the economy is linked to a real economy or insulated from it. One disadvantage of a real economy with respect to a peer relationship is that external wealth can override internal dynamics, thus diminishing the advantages to be gained by optimization, and potentially creating a perception of unfairness for externally less wealthy agents, at least unless and until the system accomplishes a wealth redistribution. An artificial economy provides a solution for a peer network in which each node has an equal opportunity to gain control over the ad hoc network, independent of outside influences. On the other hand, by insulating the network from external wealth redistribution, real efficiency gains may be unavailable. Therefore, both types of economies, as well as hybrids, are available. Thus, as discussed in more detail below, a “fair” initial (or recurring) wealth distribution may be applied, which may be supplemented with, and/or provide an output of, external wealth. The rules or proportion of external influence may be predetermined, adaptive, or otherwise.
  • In accordance with the proposed artificial economy, each node has a generator function for generating economic units, which are then used in an auction with other nodes to create a market economy, that is, each node has a supply and demand function, and acts as a source or sink for a limited resource. In some cases, nodes may have only supply or demand functions, or a degree of asymmetry, but in this case, these are typically subject to an external economic consideration, and the artificial economy will be less effective in providing appropriate incentives. According to this embodiment, the artificial economic units have a temporally and spatially declining value, so that wealth does not accumulate over long periods and cannot be transferred over large distances. The decline may be linear, exponential, or based on some other function. This creates a set of microeconomies insulated from each other. Where distant microeconomies must deal with each other, there is a discount. This architecture provides a number of advantages, for example, by decreasing the influence of more spatially and temporally distant effects, the scope of an optimization analysis may be relatively constrained, while reducing the amount of information which must be stored over time and/or carried over distance in order to permit an optimization. Likewise, since the economy is artificial, the discount need not be recouped within the scope of the system. In the same manner, a somewhat different incentive structure may be provided; that is, economic units generated at one location and at one time may have a higher value at a different location and time; this may encourage reduced immediate use of the system, and relocation to higher valued locations. As discussed below, one embodiment of the invention permits trading of credits, and thus, for example, a user may establish a repeater site at an underserved location to gain credits for use elsewhere. 
Preferably, beyond a “near field” effect, the value does not continue to increase, since this may result in inflationary pressures, and undermine the utility of the system in optimally balancing immediate supply and demand at a particular location. Although, through modifications of the governing rules and formulae, the system can be incentivized to behave in certain ways, care should be exercised, since too narrow an analysis of the incentive might result in unintended effects. To the extent that human behavior is involved, care should also be exercised in applying a rationality assumption, since this assumption does not always hold. Rather, there may be applicable models for human irrational behavior that are better suited to an understanding of the network behavior in response to a perturbation.
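By way of illustration only, the temporally and spatially declining value of an economic unit might be sketched as follows. The exponential form, the function name `unit_value`, and the parameters `half_life` and `range_scale` are assumptions chosen for the example; the text equally permits a linear decline or some other function.

```python
import math


def unit_value(face_value, age, distance, half_life, range_scale):
    """Present value of a unit generated `age` ago and `distance` away.

    Assumptions (hypothetical): value halves every `half_life` time
    units and falls by a factor of e for every `range_scale` distance
    units from the point of generation.
    """
    temporal = 0.5 ** (age / half_life)
    spatial = math.exp(-distance / range_scale)
    return face_value * temporal * spatial


print(unit_value(100.0, age=10, distance=0, half_life=10, range_scale=5))
print(unit_value(100.0, age=10, distance=5, half_life=10, range_scale=5))
```

Because both factors shrink toward zero, contributions from old or distant units become negligible, which is what allows an optimization to consider only a bounded neighborhood in time and space.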
  • The typical peer-to-peer ad hoc network may be extended to the hierarchal case by treating each branch (including sub-branches) within the chain of command as an economic unit with respect to the generator function. At any level of the hierarchy, the commander retains a portion of the wealth generation capacity, and delegates the remainder to its subordinates. Therefore, the rank and hierarchal considerations are translated to an economic wealth (or wealth generation) distribution. One aspect of this system allows wealth transfer or redistribution, although in a real system, a time delay is imposed, and in the event of a temporally and/or spatially declining value, the transfer will impose a cost. Thus, an initial misallocation is undesired, and there will be an incentive to optimally distribute the wealth initially. Of course, if centralized control with low penalty is desired, it is possible to limit the penalty, if any, for wealth redistribution through appropriate rules, although the time for propagation through the network remains an issue, and blind nodes (i.e., those which do not have an efficient communication path, or have insufficient resources to utilize otherwise available paths through the hierarchy) may also lead to limitations on system performance.
  • In this system, there may be an economic competitive distortion, under which a node's subjective value of a resource is influenced by its then subjective wealth. If a node is supplied with wealth beyond its needs, the wealth is wasted, since it declines in value and cannot be hoarded indefinitely. (In a network wealth model in which wealth could be hoarded indefinitely, small deviations from optimality and arbitrage opportunities may be exploited to create a perception of unfairness; thus, this is not preferred.) If a node is supplied with insufficient wealth, economic surplus through transactional gains is lost. Thus, each node must analyze its expected circumstances to retain or delegate the generator function, and to optimally allocate wealth between competing subordinates. Likewise, there may be a plurality of quasi-optimal states.
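The retain-and-delegate scheme can be sketched, under simplifying assumptions, as a recursive walk of the chain of command. The dictionary layout (`name`, `retain`, `subs`) is hypothetical, and the equal split among subordinates is an assumption made for brevity; the text contemplates that each commander allocates among subordinates as it sees fit.

```python
def delegate(node, capacity, rates=None):
    """Distribute wealth-generation capacity down a hierarchy.

    node: {"name": str, "retain": fraction kept, "subs": [child nodes]}
    Assumption: the delegated remainder is split equally among the
    immediate subordinates; a node without subordinates keeps it all.
    """
    if rates is None:
        rates = {}
    subs = node.get("subs", [])
    if not subs:
        rates[node["name"]] = capacity
        return rates
    kept = capacity * node["retain"]
    rates[node["name"]] = kept
    share = (capacity - kept) / len(subs)
    for sub in subs:
        delegate(sub, share, rates)
    return rates


tree = {"name": "commander", "retain": 0.1, "subs": [
    {"name": "squad1", "retain": 0.2, "subs": [{"name": "p1"}, {"name": "p2"}]},
    {"name": "squad2", "retain": 0.2, "subs": [{"name": "p3"}, {"name": "p4"}]},
]}
print(delegate(tree, 100.0))
```

With these illustrative fractions, most of the generation capacity ends up at the lowest rank, and the total across all nodes equals the capacity supplied at the root.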
  • In any economic transaction, there is an amount that a seller requires to part with the resource, a price a buyer is willing to pay, and a surplus between them. Typically, in a two party transaction, the surplus is allocated to the party initiating the transaction, that is, the party initiating the transaction uses some discovery mechanism to find the minimum price acceptable by the buyer. In brokered or agent-mediated transactions, a portion of the surplus is allocated to a facilitator. In accordance with this aspect of the present invention, compliance with the community rules, as well as an incentive to bid or ask a true private value, is encouraged by distributing a portion of the transaction surplus to competitive bidders in accordance with their reported valuations. In particular, the competitive bidders seeking to allocate a scarce resource for themselves receive compensation for deferring to the winning bidder in an amount commensurate with their reported value. Thus, sellers receive their minimum acceptable value, buyers pay their maximum valuation, and the surplus is distributed to the community in a manner tending to promote the highest bids, that is, the true bidder value (or even possibly slightly higher). In a corresponding manner, the auction rules can be established to incentivize sellers to ask the minimum possible amount. For example, a portion of the surplus may be allocated to bidders in accordance with how close they come to the winning ask. Therefore, both incentives may be applied, for example with the surplus split in two, and half allocated to the bidder pool and half allocated to the seller pool. Clearly, other allocations are possible.
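A minimal sketch of this surplus distribution follows, assuming a single seller with a stated ask, a winner who pays its own reported value, and rebates to the deferring bidders in proportion to their reported values. The function name `settle`, the result keys, and the proportional rule are illustrative assumptions; the text leaves the exact allocation formula open.

```python
def settle(ask, bids, bidder_pool_fraction=0.5):
    """Settle one transaction and rebate part of the surplus.

    bids maps bidder -> reported value. The seller receives the ask,
    the winner pays its bid, and `bidder_pool_fraction` of the surplus
    is shared among losing bidders in proportion to their reported
    values (an assumed rule). The rest is left for other pools.
    """
    winner = max(bids, key=bids.get)
    paid = bids[winner]
    if paid < ask:
        return None  # no bid meets the ask, so no transaction occurs
    surplus = paid - ask
    pool = surplus * bidder_pool_fraction
    losers = {b: v for b, v in bids.items() if b != winner}
    total = sum(losers.values())
    rebates = {b: pool * v / total for b, v in losers.items()} if total > 0 else {}
    return {"winner": winner, "paid": paid, "seller_receives": ask,
            "rebates": rebates, "remaining_surplus": surplus - sum(rebates.values())}


print(settle(60.0, {"a": 100.0, "b": 80.0, "c": 20.0}))
```

In this example the surplus is 40, half of which goes to the bidder pool, and bidder "b" receives four times the rebate of bidder "c" because its reported value was four times higher. The remaining half corresponds to the portion the text suggests could be allocated to a seller pool.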
  • The winning bidder and/or seller may be included within the rebate pool. This is particularly advantageous where for various reasons, the winning bidder is not selected. Thus, this process potentially decouples the bidding (auction) process and the resulting commercial transaction. It may also be useful to apply Vickrey (second price) rules to the auction, that is, the winning bidder pays the second bid price, and/or the winning seller pays the second ask price. Because of transactional inefficiencies, human behavioral aspects, and a desire to avoid increased network overhead by “false” bidders seeking a share of the allocation pool without intending to win the auction, it may be useful to limit the allocation of the surplus pool to a subset of the bidders and/or sellers, for example the top three of one or both. This therefore encourages bidders and/or sellers to seek to be in the limited group splitting the pool, and thus incentivizes higher bids and lower asks. Of course, a party will have a much stronger incentive to avoid bidding outside its valuation bounds, so the risk of this type of inefficiency is small. One embodiment of the invention provides a possible redistribution or wealth among nodes within a hierarchal chain. This redistribution may be of accumulated wealth, or of the generation function portion. Trading among hierarchically related parties is preferred, since the perceived cost is low, and the wealth can be repeatedly redistributed. In fact, it is because of the possibility of wealth oscillation and teaming that the declining wealth function is preferred, since this will tend to defeat closely related party control over the network for extended periods.
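The combination of Vickrey pricing with a rebate pool limited to the top bidders might be sketched as follows. The function name, the externally supplied `pool` amount, and the proportional split among the top `top_k` bidders (here including the winner, as the text permits) are assumptions for illustration.

```python
def vickrey_with_pool(bids, pool, top_k=3):
    """Second-price auction with a rebate pool for the top_k bidders.

    bids maps bidder -> reported value; at least two bids are needed.
    Returns (winner, price, rebates). The winner pays the second-highest
    bid. Assumption: the pool is split among the top_k bidders in
    proportion to their bids.
    """
    if len(bids) < 2:
        raise ValueError("a second price requires at least two bids")
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    sharers = ranked[:top_k]
    total = sum(v for _, v in sharers)
    rebates = {b: pool * v / total for b, v in sharers}
    return winner, price, rebates


print(vickrey_with_pool({"a": 100.0, "b": 80.0, "c": 20.0, "d": 10.0}, pool=20.0))
```

Bidder "d" falls outside the top three and receives nothing, which is the mechanism by which a low “false” bid placed only to collect a rebate goes unrewarded.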
  • It is noted that, in a multihop mobile ad hoc network, if a communication path fails, no further transfers are possible, potentially resulting in stalled or corrupt system configuration. It is possible to transfer an expiring or declining portion of the generating function; however, this might lead a node which is out of range to have no ability to rejoin the network upon return, and thus act as an impediment to efficient network operation. Therefore, it is preferred that, in an artificial economy, each node has some intrinsic wealth generator function, so an extended period of inactivity, a node gains wealth likely sufficient to rejoin the network as a full participant. In practice, in a typical military-type hierarchy, the bulk of the wealth generating function will be distributed to the lowest ranks with the highest numbers. Thus, under normal circumstances, the network will appear to operate according to a non-hierarchical (i.e., peer-to-peer) model, with the distortion that not all nodes have a common generator function. On the other hand, hierarchically superior nodes either retain, or more likely, can quickly recruit surrounding subordinates to allocate their wealth generating function and accumulated wealth to pass urgent or valuable messages. Thus, if 85% of the wealth and network resources are distributed to the lowest-ranking members, the maximum distortion due to hierarchal modifications is 15%.
  • One way that this allocation of wealth may be apparent is with respect to the use of expensive assets. Thus, a high level node might have access to a high power broadcast system or licensed spectrum, while low level nodes might ordinarily be limited to lower power transmission and/or unlicensed spectrum or cellular wireless communications. For a low level node to generate a broadcast using an expensive asset (or to allocate a massive amount of space-bandwidth product), it must pass the request up through the chain of command, until sufficient wealth (i.e., authority) is available to implement the broadcast.
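  • The escalation of a request up the chain of command can be sketched as a walk over parent links. The Node class, its fields, and the sample wealth values are hypothetical names introduced for illustration; the sketch simply treats wealth as a stand-in for authority.

```python
class Node:
    """A participant with a wealth balance and an optional superior."""

    def __init__(self, name, wealth, parent=None):
        self.name = name
        self.wealth = wealth
        self.parent = parent


def find_authorizer(requester, cost):
    """Walk up the chain of command until a node can afford cost.

    Returns the first node, starting with the requester itself, whose
    wealth covers the cost, or None if even the root cannot.
    """
    node = requester
    while node is not None:
        if node.wealth >= cost:
            return node
        node = node.parent
    return None


hq = Node("hq", wealth=1000)
platoon = Node("platoon", wealth=50, parent=hq)
soldier = Node("soldier", wealth=5, parent=platoon)
print(find_authorizer(soldier, cost=200).name)   # hq
```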
  • In fact, such communications and authorizations are quite consistent with the expectations within a hierarchal organization, and this construct is likely to be accepted within a military-type hierarchal organization.
  • Under normal circumstances, a superior would have an incentive to assure that each subordinate node possesses sufficient wealth to carry out its function and be incentivized to participate in the network. If a subordinate has insufficient initial wealth (or wealth generating function) allocation, it may still participate, but it must expend its internal resources to obtain wealth for participation in its own benefit. This, in turn, leads to a potential exhaustion of resources, and the unavailability of the node for ad hoc intermediary use, even for the benefit of the hierarchy. An initial surplus allocation will lead to overbidding for resources, and thus inefficient resource allocation, potential waste of allocation, and a disincentive to act as an intermediary in the ad hoc network. While in a traditional military hierarchy, cooperation can be mandated, in systems where cooperation is perceived as contrary to the net personal interests of the actor, network stability may be poor, and defection may occur in spite of mandate. In a military system, it is thus possible to formulate an “engineered” solution which forces participation and eliminates defection; however, it is clear that such solutions forfeit the potential gains of optimality, and incentivize circumvention and non-compliance. Further, because such a system is not “cost sensitive” (however the appropriate cost function might be expressed), it fails to respond to “market” forces.
  • Accordingly, a peer to peer mobile ad hoc network suitable for respecting hierarchal organization structures and delegation is provided. In this hierarchal system, the hierarchy is represented by an initial wealth or wealth generation function distribution, and the hierarchically higher nodes can reallocate wealth of nodes beneath themselves, exercising their higher authority. This wealth redistribution can be overt or covert, and if overt, the hierarchal orders can be imposed without nodal assent. In a covert redistribution, trust may be required to assure redistribution by a node to a grandchild node. The wealth and its distribution can be implemented using modified micropayment techniques and other verifiable cryptographic techniques. This wealth can be applied to auctions and markets, to allocate resources. Various aspects of this system are discussed in more detail elsewhere herein.
  • Second Embodiment
  • Multihop Ad Hoc Networks require cooperation of nodes which are relatively disinterested in the content being conveyed. Typically, such disinterested intermediaries incur a cost for participation, for example, power consumption or opportunity cost. Economic incentives may be used to promote cooperation of disinterested intermediaries. An economic optimization may be achieved using a market-finding process, such as an auction. In many scenarios, the desire for the fairness of an auction is tempered by other concerns, i.e., there are constraints on the optimization which influence price and parties of a transaction. For example, in military communication systems, rank may be deemed an important factor in access to, and control over, the communications medium. A simple process of rank-based preemption, without regard for subjective or objective importance, will result in an inefficient economic distortion. In order to normalize the application of rank, one is presented with two options: imposing a normalization scheme with respect to rank to create a unified economy, or considering rank using a set of rules outside of the economy. One way to normalize rank, and the implicit hierarchy underlying the rank, is by treating the economy as an object-oriented hierarchy, in which each individual inherits or is allocated a subset of the rights of a parent, with peers within the hierarchy operating in a purely economic manner. The extrinsic consideration of rank, outside of an economy, can be denominated “respect”, which corresponds to the societal treatment of the issue, rather than normalizing this factor within the economy, in order to avoid unintended secondary economic distortion. Each system has its merits and limitations.
  • An economic optimization is one involving a transaction in which all benefits and detriments can be expressed in normalized terms, and therefore by balancing all factors, including supply and demand, at a price, an optimum is achieved. Auctions are well known means to achieve an economic optimization between distinct interests, to transfer a good or right in exchange for a market price. While there are different types of auctions, each having their limitations and attributes, as a class these are well accepted as a means for transfer of goods or rights at an optimum price. Where multiple goods or rights are required in a sufficient combination to achieve a requirement, a so-called Vickrey-Clarke-Groves (VCG) auction may be employed. In such an auction, each supplier asserts a desired price for his component. The various combinations which meet the requirement are then compared, and the lowest selected. In a combinatorial supply auction, a plurality of buyers each seek a divisible commodity, and each bids its best price. The bidders with the combination of prices which is maximum are selected. In a commodity market, there are a plurality of buyers and sellers, so the auction is more complex. In a market economy, goods or services are typically transferred from those who value them least to those who value them most. The transaction price depends on the balance between supply and demand; with the surplus being allocated to the limiting factor.
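  • The selection of the lowest-priced combination of suppliers can be sketched as follows, together with the standard Clarke pivot payment commonly associated with VCG procurement auctions (each winner is paid its ask plus the amount by which its presence lowers the best achievable total). The function names, the representation of combinations as tuples of supplier names, and the sample asks are assumptions for illustration; the payment rule shown is the textbook form and is not asserted to be the exact rule of any embodiment.

```python
def cheapest(combos, asks, exclude=None):
    """Return (combo, cost) of the lowest-cost feasible combination.

    combos lists the supplier sets that meet the requirement; exclude
    optionally removes every combination using a given supplier.
    """
    feasible = [c for c in combos if exclude not in c]
    if not feasible:
        return None, float("inf")
    best = min(feasible, key=lambda c: sum(asks[s] for s in c))
    return best, sum(asks[s] for s in best)


def vcg_procurement(combos, asks):
    """Pick the cheapest combination and compute Clarke pivot payments."""
    winners, cost = cheapest(combos, asks)
    payments = {}
    for supplier in winners:
        _, cost_without = cheapest(combos, asks, exclude=supplier)
        # Ask plus the marginal saving this supplier brings to the buyer.
        payments[supplier] = asks[supplier] + (cost_without - cost)
    return winners, cost, payments


# Two-hop route via A and B (asks 3 and 4) versus a direct route via C (ask 9).
combos = [("A", "B"), ("C",)]
asks = {"A": 3, "B": 4, "C": 9}
print(vcg_procurement(combos, asks))
# (('A', 'B'), 7, {'A': 5, 'B': 6})
```

  • In the example, the payments total 11 although the asks total 7, which illustrates the overpayment that this payment rule can incur in exchange for making truthful asks a dominant strategy.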
  • DERIVATIVES, HEDGES, FUTURES AND INSURANCE: In a market economy, the liquidity of the commodity is typically such that the gap between bid and ask is small enough that each buyer and seller gain portions of the surplus, and the gap between them is small enough that it is insignificant in terms of preventing a transaction. Of course, the quantum of liquidity necessary to assure an acceptably low gap is subjective, but typically, if the size of the market is sufficient, there will be low opportunity for arbitrage, or at least a competitive market for arbitrage. The arbitrage may be either in the commodity, or options, derivatives, futures, or the like.
  • In a market for communications resources, derivatives may provide significant advantages over a simple unitary market for direct transactions. For example, a node may wish to procure a reliable communications pathway for an extended period. Thus, it may seek to commit resources into the future, and not be subject to future competition for those resources, especially being subject to a prior broadcast of its own private valuation and a potential understanding by competitors of the presumed need for continued allocation of the resources. Thus, for similar reasons for the existence of derivative, options, futures, etc. markets, their analogy may be provided within a communications resource market.
  • In a futures market analogy, an agent seeks to procure its long-term or bulk requirements, or seeks to dispose of its assets in advance of their availability. In this way, there is increased predictability, and less possibility of self-competition. It also allows transfer of assets in bulk to meet an entire requirement or production lot capability, thus increasing efficiency and avoiding partial availability or disposal.
  • One issue in mobile ad hoc networks is accounting for mobility of nodes and unreliability of communications. In commodities markets, one option is insurance of the underlying commodity and its production. The analogy in communications resource markets focuses on communications reliability, since the nodal mobility is “voluntary” and not typically associated with an insurable risk. On the other hand, the mobility risk may be mitigated by an indemnification. In combination, these, and other risk transfer techniques may provide means for a party engaged in a communications market transaction to monetarily compensate for risk tolerance factors. An agent in the market having a low risk tolerance can undertake risk transference, at some additional transaction costs, while one with a high risk tolerance can “go bare” and obtain a lower transaction cost.
  • Insurance may be provided in various manners. For example, some potential market participants may reserve wealth, capacity or demand for a fee, subject to claim in the event of a risk event. In other cases, a separate system may be employed, such as a cellular carrier, to step in in the event that a lower cost resource is unavailable (typically for bandwidth supply only). A service provider may provide risk-related allocations to network members in an effort to increase perceived network stability; likewise, if the network is externally controlled, each node can be subject to a reserve requirement which is centrally (or hierarchically) allocated. If an agent promises to deliver a resource, and ultimately fails to deliver, it may undertake an indemnification, paying the buyer an amount representing “damages”, the transaction cost of buyer, e.g., the cost of reprocurement plus lost productivity. Likewise, if an agent fails to consume resources committed to it, it owes the promised payment, less the resale value of the remaining resources. An indemnification insurer/guarantor can undertake to pay the gap on behalf of the defaulting party. Typically, the insurer is not a normal agent peer, but can be. Hedge strategies may also be employed in known manner.
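  • The two default cases described above reduce to simple arithmetic, sketched here in Python. The function names and the floor at zero in the buyer case are assumptions for illustration; the text itself only gives the two formulas in words.

```python
def seller_default_damages(reprocurement_cost, lost_productivity):
    """Amount owed to the buyer when a promised resource is not delivered."""
    return reprocurement_cost + lost_productivity


def buyer_default_debt(promised_payment, resale_value):
    """Amount owed by an agent that fails to consume committed resources.

    The promised payment is reduced by what the unused resources fetch
    on resale; clamping at zero is an added assumption.
    """
    return max(promised_payment - resale_value, 0.0)


print(seller_default_damages(8.0, 2.5))   # 10.5
print(buyer_default_debt(10.0, 4.0))      # 6.0
```

  • An insurer or guarantor, where present, would pay these amounts on behalf of the defaulting party.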
  • In order for markets to be efficient, there must be a possibility for resale of future assets. This imposes some complexity, since the assets are neither physical nor possessed by the intermediary. However, cryptographic authentication of transactions may provide some remedy. On the other hand, by increasing liquidity and providing market-makers, the transaction surplus may be minimized, and thus the reallocation of the surplus as discussed above minimized. Likewise, in a market generally composed of agents within close proximity, the interposition of intermediaries may result in inefficiencies rather than efficiencies, and the utility of such complexity may better come from the facilitation of distant transactions. Thus, if one presumes slow, random nodal mobility, little advantage is seen from liquid resource and demand reallocation. On the other hand, if an agent has a predefined itinerary for rapidly relocating, it can efficiently conduct transactions over its path, prearranging communication paths, and thus providing trunk services. Thus, over a short term, direct multihop communications provide long-distance communications of both administrative and content data. On the other hand, over a longer term, relocation of agents may provide greater efficiency for transport of administrative information, increasing the efficiency of content data communications over the limited communications resources.
  • BANDWIDTH AUCTION: A previous scheme proposes the application of game theory in the control of multihop mobile ad hoc networks according to “fair” principles. In this prior scheme, nodes seeking to control the network (i.e., are “buyers” of bandwidth), conduct an auction for the resources desired. Likewise, potential intermediate nodes conduct an auction to supply the resources. The set of winning bidders and winning sellers is optimized to achieve the maximum economic surplus. Winning bidders pay the maximum bid price or second price, while winning sellers receive their winning ask or second price. The remaining surplus is redistributed among losing bidders, whose cooperation and non-interference with the winning bidders is required for network operation, in accordance with their proportionate bid for contested resources. The winning bids are determined by a VCG combinatorial process. The result is an optimum network topology with a reasonable, but by no means the only, fairness criterion, while promoting network stability and utility.
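  • The redistribution of the remaining surplus among losing bidders, in proportion to their bids, can be sketched as follows. The function name, the argument layout, and the handling of a non-positive surplus are assumptions for illustration; selection of the winning set by the combinatorial process is taken as already done.

```python
def redistribute_surplus(buyer_payment, seller_receipts, losing_bids):
    """Share the surplus among losing bidders in proportion to their bids.

    buyer_payment is what the winning bidder pays, seller_receipts lists
    what each winning seller receives, and losing_bids maps each losing
    bidder to its bid for the contested resources.
    """
    surplus = buyer_payment - sum(seller_receipts)
    total = sum(losing_bids.values())
    if surplus <= 0 or total == 0:
        return {name: 0.0 for name in losing_bids}
    return {name: surplus * bid / total for name, bid in losing_bids.items()}


# The winner pays 12, two relays receive 3 and 4, leaving a surplus of 5.
print(redistribute_surplus(12.0, [3.0, 4.0], {"x": 6.0, "y": 4.0}))
# {'x': 3.0, 'y': 2.0}
```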
  • As discussed above, risk may be a factor in valuing a resource. The auction optimization may therefore be normalized or perturbed in dependence on an economic assessment of a risk tolerance, either based on a personal valuation, or based on a third party valuation (insurance/indemnification). Likewise, the optimization may also be modified to account for other factors.
  • Thus, one issue with such a traditional scheme for fair allocation of resources is that it does not readily permit intentional distortions. However, in some instances, a relatively extrinsic consideration to supply and subjective demand may be a core requirement of a system. For example, in military systems, it is traditional and expected that higher military rank will provide access to and control over resources on a favored basis. In civilian systems, emergency and police use may also be considered privileged. However, by seeking to apply economic rules to this access, a number of issues arise. Most significantly, as a privileged user disburses currency, this is distributed to unprivileged users, leading to an inflationary effect and comparative dilution of the intended privilege. If the economy is real, that is the currency is linked to a real economy, this grant of privilege will incur real costs, which is also not always an intended effect. If the economy is synthetic, that is, it is unlinked to external economies, then the redistribution of wealth within the system can grant dramatic and potentially undesired control to a few nodes, potentially conveying the privilege to those undeserving, except perhaps due to fortuitous circumstances.
  • Two different schemes may be used to address this desire for both economic optimality and hierarchal considerations. One scheme maintains optimality and fairness within the economic structure, but applies a generally orthogonal consideration of “respect” as a separate factor within the operation of the protocol. Respect is a subjective factor, and thus permits each bidder to weight its own considerations. It is further noted that Buttyan et al. have discussed this factor as a part of an automated means for ensuring compliance with network rules, in the absence of a hierarchy. Levente Buttyan and Jean-Pierre Hubaux, Nuglets: a Virtual Currency to Stimulate Cooperation in Self-Organized Mobile Ad Hoc Networks, Technical Report DSC/2001/004, EPFL-DI-ICA, January 2001, incorporated herein by reference. See, P. Michiardi and R. Molva, CORE: A collaborative reputation mechanism to enforce node cooperation in mobile ad hoc networks, In B. Jerman-Blazic and T. Klobucar, editors, Communications and Multimedia Security, IFIP TC6/TC11 Sixth Joint Working Conference on Communications and Multimedia Security, Sep. 26-27, 2002, Portoroz, Slovenia, volume 228 of IFIP Conference Proceedings, pages 107-121. Kluwer Academic, 2002; Sonja Buchegger and Jean-Yves Le Boudec, A Robust Reputation System for P2P and Mobile Ad-hoc Networks, Second Workshop on the Economics of Peer-to-Peer Systems, June 2004; Po-Wah Yau and Chris J. Mitchell, Reputation Methods for Routing Security for Mobile Ad Hoc Networks; Frank Kargl, Andreas Klenk, Stefan Schlott, and Micheal Weber. Advanced Detection of Selfish or Malicious Nodes in Ad Hoc Network. The 1st European Workshop on Security in Ad-Hoc and Sensor Networks (ESAS 2004); He, Qi, et al., SORI: A Secure and Objective Reputation-based Incentive Scheme for Ad-Hoc Networks, IEEE Wireless Communications and Networking Conference 2004, each of which is expressly incorporated herein by reference.
  • The bias introduced in this manner is created by an assertion by one claiming privilege, and deference by one respecting privilege. One way to avoid substantial economic distortions is to require that the payment made be based on a purely economic optimization, while selecting the winner based on other factors. In this way, the perturbation of the auction process itself is subtle, that is, since bidders realize that the winning bid may not result in the corresponding benefit, but incurs the publication of private values and potential bidding costs, there may be perturbation of the bidding strategy from optimal. Likewise, since the privilege is itself unfair and predictable, those with lower privilege ratings will have greater incentive to defect from, or act against, the network. Therefore, it is important that either the assertion of privilege be subjectively reasonable to those who must defer to it, or the incidence or impact of the assertions be uncommon or have low anticipated impact on the whole.
  • In the extreme case, the assertion of privilege will completely undermine the auction optimization, and the system will be prioritized on purely hierarchal grounds, and the pricing non-optimal or unpredictable. This condition may be acceptable or even efficient in military systems, but may be unacceptable where the deference is voluntary and choice of network protocol is available.
  • It is noted that those seeking access based on respect, must still make an economic bid. This bid, for example, should be sufficient in the case that respect is not afforded, for example, from those of equal rank or above, or those who for various reasons have other factors that override the assertion of respect. Therefore, one way to determine the amount of respect to be afforded is the self-worth advertised for the resources requested. This process therefore minimizes the deviation from optimal and therefore promotes stability of the network. It is further noted that those who assert respect based on hierarchy typically have available substantial economic resources, and therefore it is largely a desire to avoid economic redistribution rather than an inability to effect such a redistribution, that compels a consideration of respect.
  • In a combinatorial auction, each leg of a multihop link is separately acquired and accounted. Therefore, administration of the process is quite involved. That is, each bidder broadcasts a set of bids for the resources required, and an optimal network with maximum surplus is defined. Each leg of each path is therefore allocated a value. Accordingly, if a bidder seeks to acquire the route, even though it was an insufficient economic bidder, but awarded the route based on respect, then those who must defer or accept reduced compensation must acquiesce based on deference, which is neither intrinsically mandated nor uniform. If pricing is defined by the economic optimization, the respect consideration requires that a subsidy be applied, either as an excess payment up to the amount of the winning bid, or as a discount provided by the sellers, down to the actually bid value. Since we presume that a surplus exists, this value may be applied to meet the gap, while maintaining optimal valuation. The node demanding respect may have an impact on path segments outside the required route; and thus the required payment to meet the differential between the optimum network and the resulting network may thus be significant. If there is insufficient surplus, then a different strategy may be applied.
  • Since the allocation of respect is subjective, each bidder supplies a bid, as well as an assertion of respect. Each supplier receives the bids, and applies a weighting or discount based on its subjective analysis of the respect assertion. In this case, the same bid is interpreted differently by each supplier, and the subjective analysis must be performed by or for each supplier. By converting the respect assertion into a subjective weighting or discount, a pure economic optimization may then be performed.
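  • The conversion of a respect assertion into a supplier-specific weighting can be sketched as follows. The rank labels, the multiplier tables, and the function name are hypothetical; each supplier is assumed to hold its own table, so the same set of bids may produce different winners at different suppliers.

```python
def pick_winner(bids, respect_weight):
    """Choose the bidder with the highest respect-weighted bid.

    bids maps a bidder name to (amount, asserted_rank); respect_weight
    maps a rank to the multiplier this particular supplier is willing
    to apply. Ranks the supplier does not recognize get a weight of 1.0.
    """
    def effective(name):
        amount, rank = bids[name]
        return amount * respect_weight.get(rank, 1.0)

    return max(bids, key=effective)


bids = {"colonel": (8.0, "officer"), "medic": (10.0, "enlisted")}
deferential = {"officer": 1.5}   # treats the 8.0 bid as 12.0
indifferent = {}                 # applies no weighting at all
print(pick_winner(bids, deferential))   # colonel
print(pick_winner(bids, indifferent))   # medic
```

  • Once each supplier has applied its own weighting, the remaining optimization operates on ordinary numeric bids.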
  • An alternate scheme for hierarchal deference is to organize the economy itself into a hierarchy. In a hierarchy, a node has one parent and possibly multiple children. At each level, a node receives an allocation of wealth from its parent, and distributes all or a portion of its wealth to children. A parent is presumed to control its children, and therefore can allocate their wealth or subjective valuations to its own ends. When nodes representing different lineages must be reconciled, one may refer to the common ancestor for arbitration, or a set of inherited rules to define the hierarchal relationships.
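  • Locating the common ancestor to which two lineages may refer for arbitration can be sketched with a parent map. The node names and the dictionary representation are assumptions for illustration; each node has a single parent, as stated above.

```python
def ancestors(node, parent_of):
    """Return the chain from node up to the root, node first."""
    chain = [node]
    while node in parent_of:
        node = parent_of[node]
        chain.append(node)
    return chain


def common_ancestor(a, b, parent_of):
    """Return the nearest node that is an ancestor of both a and b.

    A node counts as its own ancestor here, so a parent arbitrates
    directly between itself and its descendants.
    """
    seen = set(ancestors(a, parent_of))
    for node in ancestors(b, parent_of):
        if node in seen:
            return node
    return None


parent_of = {
    "squad1": "platoonA",
    "squad2": "platoonA",
    "platoonA": "company",
    "platoonB": "company",
}
print(common_ancestor("squad1", "squad2", parent_of))     # platoonA
print(common_ancestor("squad1", "platoonB", parent_of))   # company
```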
  • In this system, the resources available for reallocation between branches of the hierarchy depends on the allocation by the common grandparent, as well as competing allocations within the branch. This system presumes that children communicate with their parents and are obedient. In fact, if the communication presumption is violated, one must then rely on a priori instructions, which may not be sufficiently adaptive to achieve an optimal result. If the obedience presumption is violated, then the hierarchal deference requires an enforcement mechanism within the hierarchy. If both presumptions are simultaneously violated, then the system will likely fail, except on a voluntary basis, with results similar to the “reputation” scheme described above.
  • Thus, it is possible to include hierarchal deference as a factor in optimization of a multihop mobile ad hoc network, leading to compatibility with tiered organizations, as well as with shared resources.
  • GAME THEORY: Use of Game Theory to control arbitration of ad hoc networks is well known. F. P. Kelly, A. Maulloo, and D. Tan. Rate control in communication networks: shadow prices, proportional fairness and stability. Journal of the Operational Research Society, 49, 1998; J. MacKie-Mason and H. Varian. Pricing congestible network resources. IEEE Journal on Selected Areas in Communications, 13(7):1141-1149, 1995. Some prior studies have focused on the incremental cost to each node for participation in the network, without addressing the opportunity cost of a node foregoing control over the communication medium. Courcoubetis, C., Siris, V. A. and Stamoulis, G. D. Integration of pricing and flow control for available bit rate services in ATM networks. In Proceedings IEEE Globecom '96, pp. 644-648. London, UK. A game theoretic approach addresses the operation of an agent which has freedom of choice, allowing optimization on a high level, considering the possibility of alternatives to a well-designed system. According to game theory, the best way to ensure that a system retains compliant agents, is to provide the greatest anticipated benefit, at the least anticipated cost, compared to the alternates.
  • Game Theory provides a basis for understanding the actions of Ad hoc network nodes. A multihop ad hoc network requires a communication to be passed through a disinterested node. The disinterested node incurs some cost, thus leading to a disincentive to cooperate. Meanwhile, bystander nodes must defer their own communications in order to avoid interference, especially in highly loaded networks. By understanding the decision analysis of the various nodes in a network, it is possible to optimize a system which, in accordance with game theory, provides benefits or incentives, to promote network reliability and stability. The incentive, in economic form, may be charged to those benefiting from the communication, and is preferably related to the value of the benefit received. The proposed network optimization scheme employs a modified combinatorial (VCG) auction, which optimally compensates those involved in the communication, with the benefiting party paying the second highest bid price (second price). The surplus between the second price and VCG price is distributed among those who defer to the winning bidder according to respective bid value. Equilibrium usage and headroom may be influenced by deviating from a zero-sum condition. The mechanism seeks to define fairness in terms of market value, providing probable participation benefit for all nodes, leading to network stability.
  • AD HOC NETWORKS: An ad hoc network is a wireless network which does not require fixed infrastructure or centralized control. The terminals in the network cooperate and communicate with each other, in a self-organizing network. In a multihop network, communications can extend beyond the scope of a single node, employing neighboring nodes (within the scope) to forward messages to their destination. In a mobile ad hoc network, constraints are not placed on the mobility of nodes, that is, they can relocate within a time scale which is short with respect to the communications, thus requiring consideration of dynamic changes in network architecture.
  • Ad hoc networks pose control issues with respect to contention, routing and information conveyance. There are typically tradeoffs involving equipment size, cost and complexity, protocol complexity, throughput efficiency, energy consumption, and “fairness” of access arbitration. Other factors may also come into play. L. Buttyan and J.-P. Hubaux. Rational exchange-a formal model based on game theory. In Proceedings of the 2nd International Workshop on Electronic Commerce (WELCOM), November 2001.; P. Michiardi and R. Molva. Game theoretic analysis of security in mobile ad hoc networks. Technical Report RR-02-070, Institut Eurécom, 2002; P. Michiardi and R. Molva. A game theoretical approach to evaluate cooperation enforcement mechanisms in mobile ad hoc networks. In Proceedings of WiOpt'03, March 2003; Michiardi, P., Molva, R.: Making greed work in mobile ad hoc networks. Technical report, Institut Eur ecom (2002); S. Shenker. Making greed work in networks: A game-theoretic analysis of switch service disciplines. IEEE/ACM Transactions on Networking, 3(6):819-831, December 1995; A. B. MacKenzie and S. B. Wicker. Selfish users in aloha: A game-theoretic approach. In Vehicular Technology Conference, 2001. VTC 2001 Fall. IEEE VTS 54th, volume 3, October 2001; J. Crowcroft, R. Gibbens, F. Kelly, and S. Östring. Modelling incentives for collaboration in mobile ad hoc networks. In Proceedings of WiOpt'03, 2003. Game theory studies the interactions of multiple independent decision makers, each seeking to fulfill their own objectives. Game theory encompasses, for example, auction theory and strategic decision-making. By providing appropriate incentives, a group of independent actors may be persuaded, according to self-interest, to act toward the benefit of the group. That is, the selfish individual interests are aligned with the community interests. In this way, the community will be both efficient and the network of actors stable and predictable. 
Of course, any systems wherein the “incentives” impose too high a cost, themselves encourage circumvention. In this case, game theory also addresses this issue. In computer networks, issues arise as the demand for communications bandwidth approaches the theoretical limit. Under such circumstances, the behavior of nodes will affect how close to the theoretical limit the system comes, and also which communications are permitted. The well-known carrier sense multiple access (CSMA) protocol allows each node to request access to the network, essentially without cost or penalty, and regardless of the importance of the communication. While the protocol incurs relatively low overhead and may provide fully decentralized control, under congested network conditions, the system may exhibit instability, that is, a decline in throughput as demand increases, resulting in ever increasing demand on the system resources and decreasing throughput. Durga P. Satapathy and Jon M. Peha, “Performance of Unlicensed Devices With a Spectrum Etiquette,” Proceedings of IEEE Globecom, November 1997, pp. 414-418. According to game theory, the deficit of the CSMA protocol is that it is a dominant strategy to be selfish and hog resources, regardless of the cost to society, resulting in “the tragedy of the commons.” Garrett Hardin. The Tragedy of the Commons. Science, 162:1243-1248, 1968. Alternate Location: Game theory is most readily applied in the optimization of communications routes through a defined network, to achieve the best surplus allocation. The problems of determining the network topology, and conducting the communications themselves, are also applications of game theory. Since the communications incidental to the network access arbitration require consideration of some of the same issues as the underlying communications, elements of game theory apply correspondingly. Due to various uncertainties, the operation of the system is stochastic. 
This presumption, in turn, allows estimation of optimality within an acceptable margin of error, permitting simplifying assumptions and facilitating implementation.
  • In an ad hoc network used for conveying real-time information, as might be the case in a telematics system, there are potentially unlimited data communication requirements (e.g., video data), and network congestion is almost guaranteed. Therefore, using a CSMA protocol as the paradigm for basic information conveyance is destined for failure, unless there is a disincentive to network use. (In power constrained circumstances, this cost may itself provide such a disincentive). On the other hand, a system which provides more graceful degradation under high load, sensitivity to the importance of information to be communicated, and efficient utilization of the communications medium would appear more optimal. One way to impose a cost which varies in dependence on the societal value of the good or service, is to conduct an auction, which is a mechanism to determine the market value of the good or service, at least between the auction participants. Walsh, W. and M. Wellman (1998). A market protocol for decentralized task allocation, in “Proceedings of the Third International Conference on Multi-Agent Systems,” pp. 325-332, IEEE Computer Society Press, Los Alamitos. In an auction, the bidder seeks to bid the lowest value, up to a value less than or equal to his own private value (the actual value which the bidder appraises the good or service, and above which there is no surplus), that will win the auction. Since competitive bidders can minimize the gains of another bidder by exploiting knowledge of the private value attached to the good or service by the bidder, it is generally a dominant strategy for the bidder to attempt to keep its private value a secret, at least until the auction is concluded, thus yielding strategies that result in the largest potential gain. Auction strategies become more complex when the bidder himself is not a consumer or collector, but rather a reseller. 
In this case, the private value of the bidder is influenced by the perception of the private value of other bidders, and thus may change over the course of the auction in a successive price auction. On the other hand, in certain situations, release or publication of the private value is a dominant strategy, and can result in substantial efficiency, that is, honesty in reporting the private value results in the maximum likelihood of prospective gain.
  • APPLICATION OF GAME THEORY TO AD HOC NETWORKS: There are a number of aspects of ad hoc network control which may be adjusted in accordance with game theoretic approaches. An example of the application of game theory to influence system architecture arises when communications latency is an issue. A significant factor in latency is the node hop count. Therefore, a system may seek to reduce node hop count by using an algorithm other than a nearest neighbor algorithm, bypassing some nodes with longer distance communications. In analyzing this possibility, one must not only look at the cost to the nodes involved in the communication, but also the cost to nodes which are prevented from simultaneously accessing the network due to interfering uses of network resources. As a general proposition, the analysis of the network must include the impact of each action, or network state, on every node in the system, although simplifying presumptions may be appropriate where information is unavailable, or the anticipated impact is trivial. Game theory is readily applied in the optimization of communications routes through a defined network, to achieve the best economic surplus allocation. In addition, the problem of determining the network topology, and the communications themselves, are ancillary, though real, applications of game theory. Since the communications incidental to the arbitration require consideration of some of the same issues as the underlying communications, corresponding elements of game theory may apply at both levels of analysis. Due to various uncertainties, the operation of the system is stochastic. This presumption, in turn, allows estimation of optimality within a margin of error, simplifying implementation as compared to a rigorous analysis without regard to statistical significance. There are a number of known and proven routing models proposed for forwarding of packets in ad hoc networks. 
These include Ad Hoc On-Demand Distance Vector (AODV) Routing, Optimized Link State Routing Protocol (OLSR), Dynamic Source Routing Protocol (DSR), and Topology Dissemination Based on Reverse-Path Forwarding (TBRPF). M. Mauve, J. Widmer, and H. Hartenstein. A survey on position-based routing in mobile ad hoc networks. IEEE Network Magazine, 15(6):30-39, November 2001.; Z. Haas. A new routing protocol for reconfigurable wireless networks. In IEEE 6th International Conference on Universal Communications Record, volume 2, pages 562-566, October 1997; X. Hong, K. Xu, and M. Gerla. Scalable routing protocols for mobile ad hoc networks. IEEE Networks, 16(4):11-21, July 2002; D. Johnson, D. Maltz, and Y.-C. Hu. The dynamic source routing protocol for mobile ad hoc networks, April; S.-J. Lee, W. Su, J. Hsu, M. Gerla, and R. Bagrodia. A performance comparison study of ad hoc wireless multicast protocols. In Proceedings of IEEE INFOCOM 2000, pages 565-574, March 2000; K. Mase, Y. Wada, N. Mori, K. Nakano, M. Sengoku, and S. Shinoda. Flooding schemes for a universal ad hoc network. In Industrial Electronics Society, 2000. IECON 2000, v. 2, pp. 1129-1134, 2000; R. Ogier, F. Templin, and M. Lewis. Topology dissemination based on reversepath forwarding, October 2003.vesuvio.ipv6.cseltit/internet-drafts/draft-ietf-manet-tbrpf-11.txt; A. Orda, R. Rom, and N. Shimkin. Competitive routing in multi-user communication networks. IEEE/ACM Transactions on Networking, 1(5):510-521, October 1993; C. Perkins, E. Belding-Royer, and S. Das. Ad hoc on-demand distance vector (AODV) routing. Request for comments 3561, Internet Engineering Task Force, 2003; C. E. Perkins, editor. Ad Hoc Networking. Addison-Wesley, Boston, 2001; E. Royer and C.-K. Toh. A review of current routing protocols for ad hoc mobile wireless networks. 
IEEE Personal Communications, 6(2):46-55, April 1999; Holger Füßler, Hannes Hartenstein, Dieter Vollmer, Martin Mauve, Michael Käsemann, Location-Based Routing for Vehicular Ad-Hoc Networks, Reihe Informatik March 2002; J. Broch, D. A. Maltz, D. B. Johnson, Y. C. Hu, and J. Jetcheva. A Performance Comparison of Multi-Hop Wireless Ad Hoc Network Routing Protocols. In Proc. of the ACM/IEEE MobiCom, October 1998. In most systems analyzed to date, the performance metrics studied were power consumption, end-to-end data throughput and delay, route acquisition time, percentage out-of-order delivery, and efficiency. A critical variable considered in many studies is power cost, presuming a battery operated transceiver with limited power availability. Juha Leino, “Applications of Game Theory in Ad Hoc Network”, Master's Thesis, Helsinki University of Technology (2003); J. Shneidman and D. Parkes, “Rationality and Self-Interest in Peer to Peer Networks”, In Proc. 2nd Int. Workshop on Peer-to-Peer Systems (IPTPS'03), 2003; V. Rodoplu and H.-Y. Meng. Minimum energy mobile wireless networks. IEEE Journal on Selected Areas in Communications, 17(8):1333-1344, August 1999; S. Singh, M. Woo, and C. S. Raghavendra. Power-aware routing in mobile ad hoc networks. In Proceeding of MOBICOM 1998, pages 181-190, 1998; A. Urpi, M. Bonuccelli, and S. Giordano. Modelling cooperation in mobile ad hoc networks: a formal description of selfishness. In Proceedings of WiOpt'03, March 2003; A. van den Nouweland, P. Borm, W. van Golstein Brouwers, R. Groot Bruinderink, and S. Tijs. A game theoretic approach to problems in telecommunication. Management Science, 42(2):294-303, February 1996.
  • There can be significant differences in optimum routing depending on whether a node can modulate its transmit power, which in turn controls range, and provides a further control over network topology. Likewise, steerable antennas, antenna arrays, and other forms of multiplexing provide further degrees of control over network topology. Note that the protocol-level communications are preferably broadcasts, while information conveyance communications are typically point-to-point. Prior studies typically presume a single transceiver, with a single omnidirectional antenna, operating according to in-band protocol data, for all communications. The tradeoff made in limiting system designs according to these presumptions should be clear.
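The tradeoff between hop count and transmit power can be illustrated with a minimal sketch. It assumes relays are available wherever needed along a straight line and that per-hop transmit energy grows as range raised to a path-loss exponent; the function names, the exponent value, and the distances are illustrative assumptions, not part of the described system.

```python
import math

def hop_count(distance_m: float, tx_range_m: float) -> int:
    """Fewest hops needed to span distance_m when each hop covers at most tx_range_m.

    Assumes relays are available wherever needed along a straight line.
    """
    if distance_m <= 0 or tx_range_m <= 0:
        raise ValueError("distance_m and tx_range_m must be positive")
    return math.ceil(distance_m / tx_range_m)

def relative_energy(distance_m: float, tx_range_m: float, exponent: float = 2.0) -> float:
    """Relative transmit energy for the whole route: hops * range**exponent.

    The path-loss exponent is an assumed propagation model, not taken from the text.
    """
    return hop_count(distance_m, tx_range_m) * tx_range_m ** exponent

if __name__ == "__main__":
    # Longer range means fewer hops (lower latency) but more energy per route.
    for tx_range in (100.0, 250.0, 500.0):
        print(tx_range, hop_count(1000.0, tx_range), relative_energy(1000.0, tx_range))
```

Under these assumptions, raising the range reduces the number of hops while increasing the total transmit energy, which is the kind of tradeoff a node with adjustable transmit power would have to weigh.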
  • It is in the general self-interest of a node to conserve its own resources and maintain an opportunity to access network resources, while consuming whatever resources of other nodes it desires. Clearly, this presents a significant risk of the “tragedy of the commons”, in which selfish individuals fail to respect the very basis for the community they enjoy, and a network of rational nodes operating without significant incentives to cooperate would likely fail. On the other hand, if donating a node's resources generated a sufficient associated benefit to that node, while consuming network resources imposed a sufficient cost, stability and reliability could be achieved. So long as the functionality is sufficient to meet the need, and the economic surplus is “fairly” allocated, that is, the cost incurred is less than the private value of the benefit, and that cost is transferred as compensation to those burdened in an amount in excess of their incremental cost, adoption of the system should increase stability. Even outside of these bounds, the system may be more stable than one which neither taxes system use nor rewards altruistic behavior. While the basic system is a zero sum system, and over time, the economic effects will likely average out (assuming symmetric nodes), in any particular instance, the incentive for selfish behavior by a node will be diminished. One way to remedy selfish behavior is to increase the cost of acting this way, that is, to impose a cost or tax for access to the network. In a practical implementation, however, this is problematic, since under lightly loaded conditions, the “value” of the communications may not justify a fixed cost which might be reasonable under other conditions, and likewise, under heavier loads, critical communications may still be delayed or impeded.
Note that where the network includes more nodes, the throughput may increase, since there are more potential routes and overall reliability may be increased, but the increased number of nodes will likely also increase network demand. A variable cost, dependent on relative “importance”, may be imposed, and indeed, as alluded to above, this cost may be market based, in the manner of an auction. In a multihop network, such an auction is complicated by the requirement for a distribution of payments within the chain of nodes, with each node having potential alternate demands for its cooperation. The market-based price-finding mechanism, e.g., a VCG auction mechanism, excludes nodes which ask a price not supported by their market position, and the auction itself may comprise a value function encompassing reliability, latency, quality of service, or other non-economic parameters, expressed in economic terms. The network may further require compensation to nodes which must defer communications because of inconsistent states, such as in order to avoid interference or duplicative use of an intermediary node, and which take no direct part in the communication. It is noted that the concept of the winner of an auction paying the losers is not well known, and indeed somewhat counterintuitive. Indeed, this rule perturbs the traditional analysis framework, since the possibility of a payment from the winner to the loser alters the allocation of economic surplus between the bidder, seller, and others. Likewise, while the cost to the involved nodes may be real, the cost to the uninvolved nodes may be subjective. Clearly, it would appear that involved nodes should generally be better compensated than uninvolved nodes, although a rigorous analysis remains to be performed.
  • The network provides competitive access to the physical transport medium, and cooperation with the protocol provides significant advantages over competition with it. Under normal circumstances, a well-developed ad hoc network system can present as a formidable coordinated competitor for access to contested bandwidth by other systems, while within the network, economic surplus is optimized. Thus, a node presented with a communications requirement is presented not with the simple choice to participate or abstain, but rather whether to participate in an ad hoc network with predicted stability and mutual benefit, or one with the possibility of failure due to selfish behavior, and non-cooperation. Even in the absence of a present communication requirement, a network which rewards cooperative behavior may be preferable to one which simply expects altruism.
  • Game theory also encompasses the concept that each node may have an associated “reputation” in the community. The protocol may likewise encompass node reputation, that is, a positive or negative statement by others regarding the node in question. P. Michiardi and R. Molva. Core: A collaborative reputation mechanism to enforce node cooperation in mobile ad hoc networks. In Communication and Multimedia Security 2002 Conference, 2002. This reputation may be evaluated as a parameter in an economic analysis, or applied separately, and may be anecdotal or statistical. In any case, if access to resources and payments are made dependent on reputation, nodes will be incentivized to maintain a good reputation, and avoid generating a bad reputation. Therefore, by maintaining and applying the reputation in a manner consistent with the community goals, the nodes are compelled to advance those goals in order to benefit from the community. Game theory distinguishes between good reputation and bad reputation. Nodes may have a selfish motivation to assert that another node has a bad reputation, while they would have little selfish motivation, absent collusion, for undeservedly asserting a good reputation. On the other hand, a node may have a selfish motivation in failing to reward behavior with a good reputation. Economics and reputation may be considered orthogonal, since the status of a node's currency account provides no information about the status of its reputation. This reputation parameter may be extended to encompass respect, that is, a subjective deference to another based on an asserted or imputed entitlement. While the prior system uses reputation as a factor to ensure compliance with system rules, this can be extended to provide deferential preferences either within or extrinsic to an economy.
Thus, in a military hierarchy, a relatively higher ranking official can assert rank, and if accepted, override a relatively lower ranking bidder at the same economic bid. For each node, an algorithm is provided to translate a particular assertion of respect (i.e., rank and chain of command) into an economic perturbation. For example, in the same chain of command, each difference in rank might be associated with a 25% compounded discount, when compared with other bids, i.e. B1=B0×10(1+0.25×ΔR), while outside the chain of command, a different, generally lower, discount may be applied, possibly with a base discount as compared to all bids within the chain of command, i.e., B1=B0×10(1+dCOC+dNCOC×ΔR). The discount is applied so that higher ranking officers pay less, while lower ranking officers pay more. There is a high incentive for each bid to originate from the highest available commander within the chain of command, and given the effect of the perturbation, for ranking officers to “pull rank” judiciously.
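As a hedged illustration of translating rank into an economic perturbation, the sketch below does not reproduce the formulas above. It assumes a plain compounded weighting of 25% per rank step, so that a higher-ranking bidder's nominal bid counts for more (equivalently, that bidder pays less for the same outcome). The function names, the `rate` parameter, and the choice of the lowest rank present as the reference are all hypothetical; a different rate could be passed for bidders outside the chain of command.

```python
def effective_bid(nominal_bid: float, rank_delta: int, rate: float = 0.25) -> float:
    """Weight a bid by rank difference, compounding `rate` once per rank step.

    rank_delta > 0: the bidder outranks the reference, so the bid counts for more.
    rank_delta < 0: the bidder is outranked, so the bid counts for less.
    The compounding form is an assumption made for illustration only.
    """
    if nominal_bid < 0 or rate <= -1.0:
        raise ValueError("nominal_bid must be >= 0 and rate must be > -1")
    return nominal_bid * (1.0 + rate) ** rank_delta

def winner(bids: dict, ranks: dict, rate: float = 0.25) -> str:
    """Return the bidder with the highest rank-weighted bid.

    Ranks are compared against the lowest rank present (a hypothetical convention).
    """
    base = min(ranks[b] for b in bids)
    return max(bids, key=lambda b: effective_bid(bids[b], ranks[b] - base, rate))

if __name__ == "__main__":
    bids = {"captain": 90.0, "lieutenant": 100.0}
    ranks = {"captain": 3, "lieutenant": 2}
    print(winner(bids, ranks))
```

With these assumed numbers the captain's bid of 90 is weighted to 112.5 and prevails over the lieutenant's unweighted 100; at equal nominal bids the higher rank always prevails, matching the override behavior described above.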
  • THE MODIFIED VCG AUCTION: A so-called Vickrey-Clarke-Groves, or VCG, auction, is a type of auction suitable for bidding, in a single auction, for the goods or services of a plurality of offerors, as a unit. Vickrey, W. (1961). Counterspeculation, auctions, and competitive sealed tenders, Journal of Finance 16, 8-37; Clarke, E. H. (1971). Multipart pricing of public goods, Public Choice 11, 17-33; Felix Brandt and Gerhard Weiß. Antisocial Agents and Vickrey Auctions. In Pre-proceedings of the Eighth International Workshop on Agent Theories, Architectures, and Languages (ATAL-2001), pages 120-132, 2001; Tuomas Sandholm. Limitations of the Vickrey Auction in Computational Multiagent Systems. In Proceedings of the 2nd International Conference on Multi-Agent Systems (ICMAS). AAAI Press, 1996. Menlo Park, Calif.; Ron Lavi, Ahuva Mu'alem, and Noam Nisan, “Towards a Characterization of Truthful Combinatorial Auctions”; Moulin, H. (1999). Incremental cost sharing; characterization by strategyproofness, Social Choice and Welfare 16, 279-320; Moulin, H. and S. Shenker (1997). Strategyproof Sharing of Submodular Costs: Budget Balance Versus Efficiency, to appear in Economic Theory; Moulin, Hervé, and Scott Shenker (2001). “Strategyproof Sharing of Submodular Costs: Budget Balance versus Efficiency.” Economic Theory 18, 511-533; Feigenbaum, Joan, Christos Papadimitriou, Rahul Sami, and Scott Shenker (2002). “A BGP-based Mechanism for Lowest-Cost Routing.” In Proc. 21st Symposium on Principles of Distributed Computing, ACM Press, 173-182; J. Feigenbaum and S. Shenker. Distributed algorithmic mechanism design: Recent results and future directions. In Proc. 6th Intl Workshop on Discrete Algorithms and Methods for Mobile Computing and Communications, pages 1-13, Atlanta, Ga., September 2002; Nisan, N. and A. Ronen (2000). Computationally Feasible VCG Mechanisms, to be presented at “Games 2000.” www.cs.huji.ac.il/~noam/;
C. Jason Woodard and David C. Parkes, 1st Workshop on the Economics of P2P systems, Strategyproof Mechanisms for Ad Hoc Network Formation, 2003; D. C. Parkes. Iterative Combinatorial Auctions: Achieving Economic and Computational Efficiency (Chapter 2). PhD thesis, University of Pennsylvania, May 2001. ˜parkes/pubs/. In the classic case, each bidder bids a value vector for each available combination of goods or services. The various components and associated ask price are evaluated combinatorially to achieve the minimum sum to meet the requirement. The winning bid set is that which produces the maximum value of the accepted bids, although the second (Vickrey) price is paid. In the present context, each offeror submits an ask price (reserve) or evaluable value function for a component of the combination. If the minimum aggregate to meet the bid requirement is not met, the auction fails. If the auction is successful, then the set of offerors selected is that with the lowest aggregate bid, and they are compensated that amount.
  • The VCG auction is postulated as being optimal for allocation of multiple resources between agents. It is “strategyproof” and efficient, meaning that it is a dominant strategy for agents to report their true valuation for a resource, and the result of the optimization is a network which maximizes the value of the system to the agents. Game theory also allows an allocation of cost between various recipients of a broadcast or multicast. That is, the communication is of value to a plurality of nodes, and a large set of recipient nodes may efficiently receive the same information. This allocation from multiple bidders to multiple sellers is a direct extension of VCG theory, and a similar algorithm may be used to optimize allocation of costs and benefit.
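One common textbook way to apply VCG pricing to a multihop route is sketched below. It is not asserted to be the modified auction described here: relay nodes report ask prices, the route with the lowest aggregate ask is selected, and each selected relay is paid the cost the buyer would have faced without it, less the asks of the other selected relays. The graph, the ask values, and the function names are hypothetical.

```python
import heapq

def cheapest_path(links, ask, src, dst, excluded=frozenset()):
    """Return (total_ask, path) minimizing the summed asks of intermediate relays.

    links: dict mapping node -> iterable of neighbors; ask: dict relay -> ask price.
    Returns (inf, None) if dst is unreachable.
    """
    best = {src: 0.0}
    heap = [(0.0, src, (src,))]
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, list(path)
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt in links.get(node, ()):
            if nxt in excluded or nxt in path:
                continue
            new_cost = cost + (0.0 if nxt == dst else ask[nxt])
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt, path + (nxt,)))
    return float("inf"), None

def vcg_payments(links, ask, src, dst):
    """Select the lowest-ask route and compute a VCG payment for each relay on it.

    A relay with no alternative route around it receives an infinite payment,
    which signals a monopoly position rather than a usable price.
    """
    total, path = cheapest_path(links, ask, src, dst)
    if path is None:
        return None, {}  # the auction fails: no route meets the requirement
    payments = {}
    for relay in path[1:-1]:
        alt_total, _ = cheapest_path(links, ask, src, dst, excluded=frozenset({relay}))
        payments[relay] = alt_total - (total - ask[relay])
    return path, payments

if __name__ == "__main__":
    links = {"S": ["A", "B"], "A": ["D"], "B": ["D"], "D": []}
    ask = {"A": 3.0, "B": 5.0}
    print(vcg_payments(links, ask, "S", "D"))
```

In this example relay A is selected and paid 5.0, the ask of the bypassed alternative B, so A gains nothing by overstating its ask short of losing the route, which is the incentive property the paragraph above refers to as strategyproofness.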
  • The principal issue involved in VCG auctions is that the computational complexity of the optimization grows with the number of buyers and their different value functions and allocations. While various simplifying presumptions may be applied, studies reveal that these simplifications may undermine the VCG premise, and therefore do not promote honesty in reporting the buyer's valuation, and are thus not “strategyproof”, which is a principal advantage of the VCG process.
  • The surplus, i.e., the gap between bid and ask, is then available to compensate the deferred bidders. This surplus is distributed in proportion to the original bid value of each deferred bidder, thus further encouraging an honest valuation of control over the resource.
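The proportional distribution of surplus can be sketched as follows, under the simplifying assumption that the surplus is the winning bid less the aggregate accepted ask. The function and parameter names are illustrative only.

```python
def distribute_surplus(winning_bid: float, accepted_ask_total: float, deferred_bids: dict) -> dict:
    """Split the bid-ask surplus among deferred (losing) bidders in proportion to their bids.

    Returns an empty dict when there is no surplus or nothing to apportion.
    """
    surplus = winning_bid - accepted_ask_total
    total_deferred = sum(deferred_bids.values())
    if surplus <= 0 or total_deferred <= 0:
        return {}
    return {node: surplus * bid / total_deferred for node, bid in deferred_bids.items()}

if __name__ == "__main__":
    # Surplus of 4.0 split 3:1 between two deferred bidders.
    print(distribute_surplus(10.0, 6.0, {"n1": 6.0, "n2": 2.0}))
```

Because a deferred bidder's share scales with its reported bid, while a bid high enough to win must actually be paid, the rule is intended to discourage both understating and overstating the valuation.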
  • The optimization is such that, if any offeror asks an amount that is too high, it will be bypassed, e.g., in favor of more “reasonable” offerors. Since the bidder pays the second highest price, honesty in bidding the full private value is encouraged. Distribution of the surplus to losing bidders, which exercise deference to the winner, is proportional to the amount bid, that is, the reported value. In a scenario involving a request for information meeting specified criteria, the auction is complicated by the fact that the information resource content is unknown to the recipient, and therefore the bid is blind; the value of the information to the recipient is indeterminate. However, game theory supports the communication of a value function or utility function, which can then be evaluated at each node possessing information to be communicated, to normalize its value. It is a dominant strategy in a VCG auction to communicate a truthful value, and therefore broadcasting the private value function, to be evaluated by a recipient, is not untenable. In a mere request for information conveyance, such as the transport nodes in a multihop network, or in a cellular network infrastructure extension model, the bid may be a true (resolved) value, since the information content is not the subject of the bidding; rather it is the value of the communications per se, and the bidding node can reasonably value its bid. Game theory also allows an allocation of cost between recipients of a broadcast or multicast. In many instances, information is of value to a plurality of nodes, and a large set of recipient nodes may efficiently receive the same information. This allocation is a direct extension of VCG theory.
  • OPERATION OF PROTOCOL: The preferred method for acquiring an estimate of the state of the network is through use of a proactive routing protocol. In order to determine the network architecture state, each node must broadcast its existence, and, for example, a payload of information including its identity, location, itinerary (navigation vector) and “information value function” and/or “information availability function”. Typically, the system operates in a continuous state, so that it is reasonable to commence the process with an estimate of the state based on prior information. In a system with mobile nodes, the mobility may be predicted, or updates provided as necessary. Using an in-band or out-of-band propagation mechanism, this information must propagate to a network edge, which may be physically or artificially defined. If all nodes operate with a substantially common estimation of network topology, only deviations from previously propagated information need be propagated. A mechanism should be provided for initialization and in case a new node joins the network. If such estimates were accurate, the network could then be modeled similarly to a non-mobile network, with certain extensions. On the other hand, typical implementations will present substantial deviations between actual network architecture and predicted network architecture, requiring substantial fault tolerance in the fundamental operation of the protocol and system. CSMA is proposed for the protocol-related communications because it is relatively simple and robust, and well suited for ad hoc communications in lightly loaded networks. An initial node transmits using an adaptive power protocol, to achieve an effective transmit range of somewhat less than about two times the estimated average inter-nodal distance. 
This distance therefore promotes propagation to a set of neighboring nodes, without unnecessarily interfering with communications of non-neighboring nodes and therefore allowing this task to be performed in parallel. Neighboring nodes also transmit in succession, providing sequential and complete protocol information propagation over a relevance range.
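A minimal sketch of choosing the protocol transmit range from node positions follows. It interprets the "average inter-nodal distance" as the mean nearest-neighbor distance and uses a factor of 1.8 as one value "somewhat less than about two"; both choices, and the function names, are assumptions for illustration.

```python
import math

def mean_nearest_neighbor_distance(positions) -> float:
    """Average, over all nodes, of the distance to each node's closest neighbor."""
    if len(positions) < 2:
        raise ValueError("need at least two node positions")
    total = 0.0
    for i, p in enumerate(positions):
        total += min(math.dist(p, q) for j, q in enumerate(positions) if j != i)
    return total / len(positions)

def target_transmit_range(positions, factor: float = 1.8) -> float:
    """Range for protocol broadcasts: `factor` times the mean nearest-neighbor distance.

    factor is kept below 2 so that a broadcast reaches immediate neighbors
    without routinely reaching, and interfering with, nodes two spacings away.
    """
    if not 0.0 < factor < 2.0:
        raise ValueError("factor must lie strictly between 0 and 2")
    return factor * mean_nearest_neighbor_distance(positions)

if __name__ == "__main__":
    nodes = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0)]
    print(target_transmit_range(nodes))
```

An adaptive power protocol could recompute this target as position reports arrive and adjust transmit power toward it.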
  • If we presume that there is a spatial or temporal limit to relevance, for example, 5 miles or 10 hops, or 1-5 minutes, then the network state propagation may be so limited. Extending the network to encompass a large number of nodes will necessarily reduce the tractability of the optimization, although this may also produce substantial benefits, especially if the hop distance is relatively short with respect to the desired communication range. Each node has a local estimate of relevance as a filter on communications, especially arbitration communications. This consideration is accommodated, along with a desire to prevent exponential growth in protocol-related data traffic, by receiving an update communication from all nodes within a node's network relevance boundary, and a state variable which represents an estimate of relevant status beyond the arbitrarily defined boundary. The propagation of network state may thus conveniently occur over a finite number of hops, for example 5-10. The boundary estimate is advantageous in order to ensure long range consistency. On a practical note, assuming a cost is incurred by employing the ad hoc network, which scales with the number of hops, then at some point, especially considering the latency and reliability issues of ad hoc networks with a large number of hops, it is more efficient to employ cellular communications or the like. Making the ad hoc network suitable and reliable for 100 hop communications will necessarily impede communications over a much smaller number of hops, thus disincentivizing the more reasonable uses of the network. Under conditions of relatively high nodal densities, the system may employ a zone strategy, that is, a proximate group of nodes is treated as an entity for purposes of external state estimation, especially with respect to distant nodes or zones.
Such a presumption is realistic, since at extended distances, geographically proximate nodes may be modeled as being similar or inter-related, while at close distances, particularly in a zone in which all nodes are in direct communication, inter-node communications may be subject to mutual interference, and can occur without substantial external influence. Alternately, it is clear that to limit latencies and communication risks, it may be prudent to bypass neighboring nodes, thus trading latency for power consumption and overall network capacity. Therefore, a hierarchical scheme may be implemented to geographically organize the network at higher analytical levels, and geographic cells may cooperate to appear externally as a single entity.
  • In order to estimate a network edge condition, a number of presumptions must be made. An inaccurate estimate of the network edge condition typically leads to inefficiency, while inordinate efforts to accurately estimate the network edge condition also lead to inefficiency. Perhaps the best way to achieve the best compromise is to have a set of adaptive presumptions or rules, with a reasonable starting point. For example, in a multihop network, one might arbitrarily set a network edge at the maximum range of five hops of administrative data using a 95% reliable transmission capability. Beyond this range, a set of state estimators is provided by each node for its surroundings, which are then communicated up to five hops (or the maximum range represented by five hops). This state estimator is at least one cycle old, and by the time it is transferred five hops away, it is at least six cycles old. Meanwhile, in a market economy, each node may respond to perceived opportunities, leading to a potential for oscillations if a time-element is not also communicated. Thus, it is preferred that the network edge state estimators represent a time-prediction of network behavior under various conditions, rather than a simple scalar value or instantaneous function.
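The aging of an edge state estimator as it is relayed, and the value of attaching a time element, can be sketched as below. The linear trend used for the projection is an assumed stand-in for whatever time-prediction a node actually reports, and the class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EdgeEstimate:
    """A network edge state estimate carrying its own age and a simple trend."""
    value: float            # estimated quantity (e.g., a price) when measured
    trend_per_cycle: float  # assumed linear drift per protocol cycle
    age_cycles: int = 1     # an estimate is at least one cycle old when first sent

    def relayed(self) -> "EdgeEstimate":
        """Return the estimate as seen one hop further away (one cycle older)."""
        return EdgeEstimate(self.value, self.trend_per_cycle, self.age_cycles + 1)

    def projected(self) -> float:
        """Project the stale value forward to the present using the trend."""
        return self.value + self.trend_per_cycle * self.age_cycles

if __name__ == "__main__":
    estimate = EdgeEstimate(value=10.0, trend_per_cycle=0.5)
    for _ in range(5):  # transfer across five hops
        estimate = estimate.relayed()
    print(estimate.age_cycles, estimate.projected())
```

A receiver that uses the projected value rather than the raw stale value is less likely to react to conditions that have already changed, which is the oscillation risk noted above.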
  • For example, each node may estimate a network supply function and a network demand function, liquidity estimate and bid-ask gap for its environment, and its own subjective risk tolerance, if separately reported; the impact of nodes closer than five hops may then be subtracted from this estimate to compensate for redundant data. Further, if traffic routes are identifiable, which would correspond, in a physical setting, to highways, fixed infrastructure access points, etc., a state estimator for these may be provided as well. As discussed above, nodes may bid not only for their own needs or resources, but also to act as market-makers or merchants, and may obtain long term commitments (futures and/or options) and employ risk reduction techniques (insurance and/or indemnification), and thus may provide not only an estimate of network conditions, but also “guaranty” this state.
  • A node seeking to communicate within the five hop range need consider the edge state estimate only when calculating its own supply and demand functions, bearing in mind competitive pressures from outside. On the other hand, nodes seeking resources outside the five hop range must rely on the estimate, because a direct measurement or information would require excess administrative communications, and incur an inefficient administrative transaction. Thus, a degree of trust and reliance on the estimate may ensue, wherein a node at the arbitrary network edge is designated as an agent for the principal in procuring or selling the resource beyond its own sphere of influence, based on the provided parameters. The incentive for a node to provide misinformation is limited, since nodes with too high a reported estimate value lose gains from sale transactions, and indeed may be requested to be buyers, and vice versa. While this model may compel trading by intermediary nodes, if the information communicated accurately represents the network state, an economic advantage will accrue to the intermediary for participating, especially in a non-power constrained, unlicensed spectrum node configuration.
  • It should be borne in mind that the intended administration of the communications is an automated process, with little human involvement, other than setting goals. In a purely virtual economy with temporally declining currency value, the detriment of inaccurate optimizations is limited to reduced nodal efficiency, and with appropriate adaptivity, the system can learn from its “mistakes”. A fraud/malfeasance detection and remediation system may limit the adverse impact of such issues.
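A virtual currency with temporally declining value can be sketched with an exponential decay. The half-life form and its parameter are assumptions chosen for simplicity; no particular decay law is specified above.

```python
def decayed_value(amount: float, elapsed: float, half_life: float) -> float:
    """Value of `amount` of virtual currency after `elapsed` time units.

    Assumes exponential decay with the given half-life (an illustrative choice).
    """
    if amount < 0 or elapsed < 0 or half_life <= 0:
        raise ValueError("amount and elapsed must be >= 0 and half_life > 0")
    return amount * 0.5 ** (elapsed / half_life)

if __name__ == "__main__":
    for t in (0.0, 10.0, 20.0):
        print(t, decayed_value(100.0, t, half_life=10.0))
```

Because unspent balances lose value, the cost of an inaccurate optimization is bounded in time, and hoarding is discouraged.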
  • A supernode within a zone may be selected for its superior capability, or perhaps a central location. The zone is defined by a communication range of the basic data interface for communications, with the control channel having a longer range, for example at least double the normal data communications range. Communications control channel transmitters operate on a number of channels, for example at least 7, allowing neighboring zones in a hexagonal tiled array to communicate simultaneously without interference. In a geographic zone system, alternate zones which would otherwise be interfering may use an adaptive multiplexing scheme to avoid interference. All nodes may listen on all control channels, permitting rapid propagation of control information. As discussed elsewhere herein, directional antennas of various types may be employed, although it is preferred that out-of-band control channels employ omnidirectional antennas, having a generally longer range (and lower data bandwidth) than the normal data communications channels, in order to have a better chance to disseminate the control information to potentially interfering sources, and to allow coordination of nodes more globally. In order to effectively provide decentralized control, either each node must have a common set of information to allow execution of an identical control algorithm, or nodes defer to the control signals of other nodes without internal analysis for optimality. A model of semi-decentralized control is also known, in which dispersed supernodes are nominated as master, with other topologically nearby nodes remaining as slave nodes. In the pure peer network, relatively complete information conveyance to each node is required, imposing a relatively high overhead. In a master-slave (or supernode) architecture, increased reliance on a single node trades off reliability and robustness (and other advantages of pure peer-to-peer networks) for efficiency.
A supernode within a cellular zone may be selected for its superior capability, or perhaps because it is centrally located or immobile.
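The use of seven control channels in a hexagonal tiling can be illustrated with a standard reuse pattern. The sketch assumes zones are indexed by axial hexagonal coordinates (q, r), which is a hypothetical indexing; with that indexing, the rule (q + 3r) mod 7 gives every zone a channel different from all six of its neighbors, and the neighbors differ from one another as well.

```python
# Axial-coordinate offsets of the six neighbors of a hexagonal zone.
NEIGHBOR_OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def control_channel(q: int, r: int) -> int:
    """Channel index in 0..6 for the zone at axial coordinates (q, r)."""
    return (q + 3 * r) % 7

def cluster_channels(q: int, r: int) -> set:
    """Channels used by a zone together with its six immediate neighbors."""
    cells = [(q, r)] + [(q + dq, r + dr) for dq, dr in NEIGHBOR_OFFSETS]
    return {control_channel(cq, cr) for cq, cr in cells}

if __name__ == "__main__":
    # Every zone and its six neighbors occupy seven distinct channels.
    print(all(len(cluster_channels(q, r)) == 7
              for q in range(-5, 6) for r in range(-5, 6)))
```

The rule works because the six neighbor offsets shift the channel index by 1, 2, 3, 4, 5, and 6 modulo 7, so no two zones in a cluster coincide.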
  • Once each control node (node or supernode) has an estimate of network topology, the next step is to optimize network channels. According to VCG theory, each agent has an incentive to broadcast its truthful value or value function for the scarce resource, which in this case, is control over the communications physical layer, and/or access to information. This communication can be consolidated with the network discovery transmission. Each control node then performs a combinatorial solution to select the optimum network configuration from the potentially large number of possibilities, which may include issues of transmit power, data rate, path, timing, reliability and risk criteria, economic and virtual economic costs, multipath and redundancy, etc., for the set of simultaneous equations according to VCG theory (or extensions thereof). This solution should be consistent between all nodes, and the effects of inconsistent solutions may be resolved by collision sensing, and possibly an error/inconsistency detection and correction algorithm specifically applied to this type of information. Thus, if each node has relatively complete information, or accurate estimates for incomplete information, then each node can perform the calculation and derive a closely corresponding solution, and verify that solutions reported by others are reasonably consistent to allow or promote reliance thereon.
  • As part of the network mapping, communications impairment and interference sources may also be mapped. GPS assistance may be particularly useful in this aspect. Where interference is caused by interfering communications, the issue is a determination of a strategy of deference, circumvention, or competition. If the interfering communication is continuous or unresponsive, then the only available strategies are circumvention or in some cases competition. On the other hand, when the competing system uses, for example, a CSMA system, such as 802.11, competition with such a communication simply leads to retransmission, and therefore ultimately increased network load, and a deference strategy may be more optimal (dominant), at least until it is determined that the competing communication is incessant, that is, the channel burden seen is consistently high. Other communications protocols, however, may have a more or less aggressive strategy. By observation of a system over time, its strategies may be revealed, and game theory permits composition of an optimal strategy to deal with interference or coexistence.
  • The optimization process produces a representation of an optimal network architecture during the succeeding period(s). That is, value functions representing bids or other economic interests are broadcast, with the system then being permitted to determine an optimal real valuation and distribution of that value. Thus, prior to completion of the optimization, potentially inconsistent allocations must be prevented, and each node must communicate its evaluation of other nodes' value functions, so that the optimization is performed on a normalized economic basis. This step may substantially increase the system overhead, but is generally required for completion of the auction, at least if the auction does not account for incomplete or surrogate information. This valuation may be inferred, however, for intermediate nodes in a multihop network path, since there is little subjectivity for nodes solely in this role, and the respective value functions may be persistent. For example, the valuation applied by a node to forward information is generally independent of content and involved party.
  • As discussed above, may of the strategies for making the economic markets more efficient may be employed either directly, or analogy, to the virtual economy of the ad hoc network. The ability of nodes to act as market maker and derivative market agents facilitates the optimization, since a node may elect to undertake a responsibility (e.g., transaction risk), rather than relay to others, and therefore the control/administrative channel chain may be truncated at that point. If the network is dense, then a node which acts selfishly will be bypassed, and if the network is sparse, the node may well be entitled to gain transactional profit by acting as a principal and trader, subject to the fact that profits will generally be suboptimal if pricing is too high or too low.
  • After the network architecture and/or usage is defined, compensation is paid to those nodes providing value or subjected to a burden (including forgoing communication opportunity) by those gaining a benefit. The payment may be a virtual currency, with no specific true value, although the virtual currency system provides a convenient method to tax, subsidize, or control the system, and thus apply a normalized extrinsic value. A hybrid economy may be provided, linking both the virtual and real currencies, to some degree. This is especially useful if the network itself interfaces with an outside economy, such as the cellular telephony infrastructure (e.g., 2G, 2.5G, 3G, 4G, proposals for 5G, WiFi (802.11x) hotspots, WiMax (802.16x), etc.).
  • Using the protocol communication system, each node transmits its value function (or change thereof), passes through communications from neighboring nodes, and may, for example, transmit payment information for the immediate-past bid for incoming communications. Messages are forwarded outward (avoiding redundant propagation back to the source), with messages appended from the series of nodes. Propagation continues for a finite number of hops, until the entire community has an estimate of the state and value function of each node in the community. Advantageously, the network beyond a respective community may be modeled in simplified form, to provide a better estimate of the network as a whole. If the propagation were not reasonably limited, the information would be stale by the time it is employed, and the system latency would be inordinate. Of course, in networks where a large number of hops are realistic, the limit may be time, distance, a counter or value decrement, or other variable, rather than hops. Likewise, the range may be adaptively determined, rather than predetermined, based on some criteria.
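The hop-limited propagation described above may be sketched as a breadth-first traversal that stops at a maximum hop count. The adjacency-map representation and the name `community` are assumptions made for illustration only; the actual protocol messages, appended node lists, and adaptive range limits are not modeled.

```python
from collections import deque


def community(adjacency, source, max_hops):
    """Return {node: hop_count} for nodes within max_hops of source.

    adjacency: dict mapping each node to an iterable of its neighbors
    (a hypothetical stand-in for radio connectivity).
    """
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # propagation limit reached; do not forward further
        for neighbor in adjacency.get(node, ()):
            if neighbor not in hops:  # avoid redundant propagation back toward the source
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


# Example: a five-node chain A-B-C-D-E with a two-hop limit.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
reach = community(chain, "A", 2)
```

With a two-hop limit, node A learns the value functions of B and C only; D and E lie outside its community and would be modeled in simplified form.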
  • After propagation, each node evaluates the set of value functions for its community, with respect to its own information and ability to forward packets. Each node may then make an offer to supply or forward information, based on the provided information. In the case of multihop communications, the offers are propagated to the remainder of the community, for the maximum number of hops, including the originating node. At this point, each node has a representation of the state of its community, with community edge estimates providing consistency for nodes with differing community scopes, the valuation function each node assigns to control over portions of the network, as well as a resolved valuation of each node for supplying the need. Under these circumstances, each node may then evaluate an optimization for the network architecture, and come to a conclusion consistent with that of other members of its community. If supported, node reputation may be updated based on past performance, and the reputation applied as a factor in the optimization and/or externally to the optimization. As discussed above, a VCG-type auction is preferably employed as a basis for optimization. Since each node receives bid information from all other nodes within the network range (or maximum node count), the VCG auction produces an optimized result. By permitting futures, options, derivatives, insurance/indemnification/guaranties, long and short sales, etc., the markets may be relatively stabilized as compared to a simple set of independent and sequential auctions, which may show increased volatility, oscillations, chaotic behavior, and other features which may be inefficient. Transmissions may be made in frames, with a single bidding process controlling multiple frames, for example a multiple of the maximum number of hops. Therefore, the bid encompasses a frame's-worth of control over the modalities. 
In the event that the simultaneous use of, or control over, a modality by various nodes is not inconsistent, then the value of the respective nodes may be summed, with the resulting allocation based on, for example, a ratio of the respective value functions. In a preferred embodiment, as a part of the optimization, nodes are rewarded not only for supporting the communication, but also for deferring their own respective communications needs. As a result, after controlling the resources, a node will be relatively less wealthy and less able to subsequently control the resources, while other nodes will be more able to control the resources. The distribution to deferred nodes also serves to prevent pure reciprocal communications, since the proposed mechanism distributes and dilutes the wealth to deferring nodes.
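As a minimal sketch of the ratio-based allocation mentioned above, the following assumes that each node's value function has already been resolved to a single non-negative number for the modality in question. The function name and the `capacity` parameter are hypothetical.

```python
def proportional_allocation(values, capacity=1.0):
    """Share a modality among nodes in proportion to their stated values.

    values: dict mapping node -> non-negative value for the modality.
    capacity: total amount of the shared resource (illustrative unit).
    """
    total = sum(values.values())
    if total <= 0:
        # No node places any value on the modality; allocate nothing.
        return {node: 0.0 for node in values}
    return {node: capacity * value / total for node, value in values.items()}
```

For example, two nodes valuing a shared channel at 3 and 1 units would receive three quarters and one quarter of the capacity, respectively.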
  • Another possible transaction between nodes is a loan, that is, instead of providing bandwidth per se, one node may loan a portion of its generator function or accumulated wealth to another node. Presumably, there will be an associated interest payment. Since the currency in the preferred embodiment is itself defined by an algorithm, the loan transaction may also be defined by an algorithm. While this concept is somewhat inconsistent with a virtual currency which declines in value over time and/or space, it is not completely inconsistent, and, in fact, the exchange may arbitrage these factors, especially location-based issues.
  • Because each node in the model presented above is presumed to have relatively complete information or accurate estimates, for a range up to the maximum node count, the wealth of each node can be estimated by its neighbors, and payment inferred even if not actually consummated. (Failure of payment can occur for a number of reasons, both malicious and accidental.) Because each hop adds significant cost, the fact that nodes beyond the maximum hop distance are essentially incommunicado is typically of little consequence, since it is very unlikely that a node more than 5 or 10 hops away will be efficiently included in any communication, due to increasing cost with distance, as well as reduced reliability and increased latency. Thus, large area and scalable networks may exist. One way to facilitate longer-range transactions is for an intermediary node to undertake responsibility, and thus minimize the need for communications beyond that node in a chain. Enforcement of responsibility may be provided by a centralized system which assures that the transactions for each node are properly cleared, and that non-compliant nodes are either excluded from the network or at least labeled. While an automated clearinghouse which periodically ensures nodal compliance is preferred, a human discretion clearinghouse, for example presented as an arbitrator or tribunal, may be employed.
  • THE SYNTHETIC ECONOMY: Exerting external economic influences on the system may have various effects on the optimization, and may exacerbate differences in subjective valuations. The application of a monetary value to the virtual currency also substantially increases the possibility of misbehavior and external attacks. On the other hand, a virtual currency with no assessed real value is self-normalizing, while monetization leads to external and generally irrelevant influences as well as possible arbitrage. External economic influences may also lead to benefits, which are discussed in various papers on non-zero sum games.
  • In order to provide fairness, the virtual currency (similar to the so-called “nuglets” or “nuggets” proposed for use in the Terminodes project) is self-generated at each node according to a schedule, and itself may have a time dependent value. L. Blazevic, L. Buttyan, S. Capkun, S. Giordano, J.-P. Hubaux, and J.-Y. Le Boudec. Self-organization in mobile ad-hoc networks: the approach of terminodes. IEEE Communications Magazine, 39(6):166-174, June 2001; M. Jakobsson, J. P. Hubaux, and L. Buttyan. A micro-payment scheme encouraging collaboration in multi-hop cellular networks. In Proceedings of Financial Crypto 2003, January 2003; J. P. Hubaux, et al., “Toward Self-Organized Mobile Ad Hoc Networks: The Terminodes Project”, IEEE Communications, 39(1), 2001.; Buttyan, L., and Hubaux, J.-P. Stimulating Cooperation in Self-Organizing Mobile Ad Hoc Networks. Tech. Rep. DSC/; Levente Buttyan and Jean-Pierre Hubaux, “Enforcing Service Availability in Mobile Ad-Hoc WANs”, 1st IEEE/ACM Workshop on Mobile Ad Hoc Networking and Computing (MobiHOC; L. Buttyan and J.-P. Hubaux. Nuglets: a virtual currency to stimulate cooperation in self-organized ad hoc networks. Technical Report DSC/2001,; Mario Cagalj, Jean-Pierre Hubaux, and Christian Enz. Minimum-energy broadcast in all-wireless networks: Np-completeness and distribution issues. In The Eighth ACM International Conference on Mobile Computing and Networking (MobiCom 2002),; N. Ben Salem, L. Buttyan, J. P. Hubaux, and Jakobsson M. A charging and rewarding scheme for packet forwarding. In Proceeding of Mobihoc, June 2003. For example, the virtual currency may have a half-life or temporally declining value. On the other hand, the value may peak at a time after generation, which would encourage deference and short term savings, rather than immediate spending, and would allow a recipient node to benefit from virtual currency transferred before its peak value. 
This also means that long term hoarding of the currency is of little value, since it will eventually decay in value, while the system presupposes a nominal rate of spending, which is normalized among nodes. The variation function may also be adaptive, but this poses a synchronization issue for the network. An external estimate of node wealth may be used to infer counterfeiting, theft and failure to pay debts, and to further effect remediation.
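The two time-dependent value schedules described above (a half-life decay, and a value that peaks some time after generation) can be illustrated as follows. Both functional forms are assumptions chosen for simplicity; the text does not prescribe a particular curve, and the names `half_life_value` and `peaked_value` are hypothetical.

```python
import math


def half_life_value(face, age, half_life):
    """Value of a currency unit that halves every `half_life` time units."""
    return face * 0.5 ** (age / half_life)


def peaked_value(face, age, peak_age):
    """Value that rises to `face` at `peak_age` and then declines.

    Uses the illustrative shape x * exp(1 - x), with x = age / peak_age,
    which is zero at generation and reaches its maximum of 1 at x = 1.
    """
    x = age / peak_age
    return face * x * math.exp(1.0 - x)
```

Under either schedule long term hoarding is unrewarding, since the value of a held unit eventually decays toward zero; the peaked form additionally rewards short term saving and deference.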
  • The currency is generated and verified in accordance with micropayment theory. Rivest, R. L., A. Shamir, PayWord and MicroMint: Two simple micropayment schemes, also presented at the RSA '96 conference; Silvio Micali and Ronald Rivest. Micropayments revisited. In Bart Preneel, editor, Progress in Cryptology - CT-RSA 2002, volume 2271 of Lecture Notes in Computer Science. Springer-Verlag, Feb. 18-22, 2002.
  • Micropayment theory generally encompasses the transfer of secure tokens (e.g., cryptographically endorsed information) having presumed value, which are intended for verification, if at all, in a non-real time transaction, after the transfer to the recipient. The currency is circulated (until expiration) as a token, and therefore is not subject to immediate authentication by source. Since these tokens may be communicated through an insecure network, the issue of forcing allocation of payment to particular nodes may be dealt with by cryptographic techniques, in particular public key cryptography, in which the currency is placed in a cryptographic “envelope” (often called a “cryptolope”) addressed to the intended recipient, e.g., is encrypted with the recipient's public key, which must be broadcast and used as, or in conjunction with, a node identifier. This makes the payment unavailable to other than the intended recipient. The issue of holding the encrypted token hostage and extorting a portion of the value to forward the packet can be dealt with by community pressure, that is, any node presenting this (or other undesirable) behavior might be ostracized. The likelihood of this type of misbehavior is also diminished by avoiding monetization of the virtual currency.
  • This currency generation and allocation mechanism generally encourages equal consumption by the various nodes over the long term. In order to discourage consumption of bandwidth, an external tax may be imposed on the system, that is, withdrawing value from the system based on usage. Clearly, the effects of such a tax must be carefully weighed, since this may also impose an impediment to adoption as compared to an untaxed system. On the other hand, a similar use-disincentive effect may be obtained by rewarding low consumption, for example by allocating an advertising subsidy between nodes, or in reward of deference. The external tax, if associated with efficiency-promoting regulation, may have a neutral or even beneficial effect. In a model telematics system, an audio and/or visual display provides a useful possibility for advertising and sponsorship; likewise, location based services may include commercial services. A synthetic economy affords the opportunity to provide particular control over the generator function, which in turn provides particular advantages with respect to a hierarchal organization. In this scheme, each node has the ability to control the generator function at respectively lower nodes, and can thus allocate wealth among subordinates. If one assumes real time communications, then it is clear that the superordinate node can directly place bids on behalf of subordinates, thus effectively controlling its entire branch. In the absence of real time communications, the superordinate node must defer to the discretion of the subordinate, subject to reallocation later if the subordinate defects. If communications are impaired, and a set of a priori instructions is insufficient, then it is up to the subjective response of a node to provide deference. Thus, a node may transfer all or a portion of its generator function, either for a limited time or permanently, using feedforward or feedback control. 
In this sense, the hierarchical and financial derivatives, options, futures, loans, etc. embodiments share a common theme.
  • It is noted that when sets of nodes “play favorites”, the VCG auction will no longer be considered “strategyproof”. The result is that bidders will assume bidding strategies that do not express their secret valuation, with the result being likely suboptimal market price finding during the auction. This factor can be avoided if hierarchal overrides and group bidding play only a small role in the economy, and thus the expected benefits from shaded bidding are outweighed by the normal operation of the system. On the other hand, the present invention potentially promotes competition within branches of a hierarchy, to the extent the hierarchy does not prohibit this. Between different branches of a hierarchy, there will generally be full competition, while within commonly controlled branches of a hierarchy, cooperation will be expected. Since the competitive result is generally more efficient, there will be incentive for the hierarchal control to permit competition as a default state, asserting control only where required for the hierarchal purpose.
  • MILITARY HIERARCHY: In a typical auction, each player is treated fairly; that is, the same rules apply to each player, and therefore a single economy describes the process. The fair auction therefore poses challenges for an inherently hierarchal set of users, such as a military organization. In the military, there is typically an expectation that “rank has its privileges”. The net result, however, is a decided subjective unfairness to lower ranking nodes. In a mobile ad hoc network, a real issue is user defection or non-compliance. For example, where a cost is imposed on a user for participating in the ad hoc network, e.g., battery power consumption, if the anticipated benefit does not exceed the cost, the user will simply turn off the device until actually needed, to conserve battery power outside the control of the network. The result of mass defection will of course be the instability and failure of the ad hoc network itself. Thus, perceived fairness and net benefit is important for network success, assuming that defection or non-compliance are possible. On the other hand, in military systems, the assertion of rank as a basis for priority is not necessarily perceived as arbitrary and capricious. Orders and communications from a central command are critical for the organization itself. Therefore, the difficulty in analyzing the application of a fair game to a hierarchal organization is principally a result of conceptualizing and aligning the individual incentives with those of the organization as a whole. Since the organization exists outside of the ad hoc network, it is generally not unrealistic to expect compliance with the hierarchal attributes both within and outside of the network. An artificial economy provides a basis for an economically efficient solution. In this economy, each node has a generator function for generating economic units which are used in a combinatorial auction with other nodes. 
The economic units may have a declining value, so that wealth does not accumulate over long periods, and by implication, wealth accumulated in one region is not available for transfer in a distant region. The geographic decline may also be explicit, for example based on a GPS or navigational system. In other cases, nodal motility is valuable, and mobile nodes are to be rewarded over those which are stationary. Therefore, the value or a portion thereof, or the generator function, may increase with respect to relocations.
  • This scheme may be extended to the hierarchal case by treating each chain of command as an economic unit with respect to the generator function. At any level of the hierarchy, the commander retains a portion of the wealth generation capacity, and delegates the remainder to its subordinates. In the case of real-time communications, a commander may directly control allocation of the generator function at each time period. Typically, there is no real-time communications capability, and the wealth generator function must be allocated a priori. Likewise, wealth may also be reallocated, although a penalty is incurred in the event of an initial misallocation since the transfer itself incurs a cost, and there will be an economic competitive distortion, under which a node's subjective value of a resource is influenced by its subjective wealth. If a node is supplied with wealth beyond its needs, the wealth is wasted, since it declines in value and cannot be hoarded indefinitely. If a node is supplied with insufficient wealth, economic surplus through transactional gains is lost. Thus, each node must analyze its expected circumstances to retain or delegate the generator function, and to optimally allocate wealth between competing subordinates. In any transaction, there will be a component which represents the competitive “cost”, and a possible redistribution among nodes within a hierarchal chain. This redistribution may be of accumulated wealth, or of the generation function portion. In the former case, if the communication path fails, no further transfers are possible, while in the latter case, the result is persistent until the transfer function allocation is reversed. It is also possible to transfer an expiring or declining portion of the generating function; however, this might lead a node which is out of range to have no ability to rejoin the network upon return, and thus act as an impediment to efficient network operation. 
As discussed above, one possibility is for nodes to borrow or loan currency. In this case, a node deemed credit-worthy may blunt the impact of initially having insufficient wealth by merely incurring a transaction cost (including interest, if applied). The bulk of the wealth generating function will be distributed to the lowest ranks with highest numerosity, and under normal circumstances, the network appears to operate according to a non-hierarchal VCG model, with the distortion that not all nodes have a common generator function. It is possible, however, for nodes within one branch of a hierarchy to conspire against nodes outside that branch, resulting in a different type of distortion. Since the ad hoc network typically gains by having a larger number of participating nodes, this type of behavior may naturally be discouraged. On the other hand, hierarchically superior nodes either retain, or more likely, can quickly recruit surrounding subordinates to allocate their wealth generating function and accumulated wealth to pass urgent or valuable messages. One way that this allocation of wealth may be apparent is through the use of expensive assets. Thus, a high level node might have access to a high power broadcast system, while low level nodes might ordinarily be limited to cellular wireless communications (including mobile cells, e.g., mobile ad hoc networks (MANETs)). For a low level node to generate a broadcast using an expensive asset (or to allocate a massive amount of space-bandwidth product), it must pass the request up through the chain of command, until sufficient wealth (i.e., authority) is available to implement the broadcast. In fact, such communications and authorizations are quite consistent with the expectations within a hierarchal organization, and thus likely to be accepted.
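The a priori delegation of the generator function down a chain of command, in which each commander retains a portion and delegates the remainder, may be sketched as a recursive split. The tree representation, the per-node `retain` fractions, and the equal division among subordinates are all illustrative assumptions; an actual allocation would reflect each subordinate's expected circumstances.

```python
def delegate(rate, tree, node, retain):
    """Split a wealth generation rate down a hierarchy.

    rate:   generation rate available to `node` (units per period).
    tree:   dict mapping a node to the list of its direct subordinates.
    retain: dict mapping a node to the fraction of its rate it keeps;
            nodes without subordinates keep everything they receive.
    Returns a dict mapping every node in the branch to its retained rate.
    """
    subordinates = tree.get(node, [])
    if not subordinates:
        return {node: rate}
    kept = rate * retain.get(node, 0.0)
    # Assumption: the delegated remainder is divided equally.
    share = (rate - kept) / len(subordinates)
    allocation = {node: kept}
    for subordinate in subordinates:
        allocation.update(delegate(share, tree, subordinate, retain))
    return allocation
```

Because every unit is either retained or passed down, the allocated rates sum to the rate entering the top of the branch, consistent with treating each chain of command as a single economic unit.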
  • Under normal circumstances, a superior would have an incentive to assure that each subordinate node possesses sufficient wealth to carry out its function and be incentivized to participate in the network. If a subordinate has insufficient initial wealth (or wealth generating function) allocation, it may still participate, but it must expend its internal resources to obtain wealth for participation in its own benefit. This, in turn, leads to a potential exhaustion of resources (including, for example, assets and credit), and the unavailability of the node for ad hoc intermediary use, even for the benefit of the hierarchy. An initial surplus allocation will lead to overbidding for resources, and thus inefficient resource allocation, potential waste of allocation, and a disincentive to act as an intermediary in the ad hoc network. In a military system, it is clearly possible to formulate an “engineered” solution which forces participation and eliminates defection; however, it is clear that such solutions forfeit the potential gains of optimality, and incentivize circumvention.
  • CELLULAR NETWORK EXTENSION: Cellular networks provide efficient coverage of large portions of the inhabited landmass. Achieving complete coverage, including relatively uninhabited areas, may however be cost inefficient or infeasible, even though there remains significant unmet demand for coverage of certain areas. Generally, it is likely that a need for service arises within a few miles from the edge of a cellular network. That is, the fixed infrastructure is almost in reach. Yet the infrastructure costs required to fill in gaps or marginally extend the network may be inordinately high for the direct economic benefits achieved. At present, there is no effective means for remediating these gaps.
  • One problem arises in that the present networks generally have a threshold usage plan. All territory encompassed by a network is treated as fungible, and incurs the same cost. Likewise, usage of partner networks is also treated as fungible. Therefore, the incentive to extend network reach for any commercial enterprise is limited to the overall incentive for customers to defect to different networks, balanced against the increased cost of extending the network. It is in the context of this economic problem that a solution is proposed. Quite simply, in the same areas where the cellular infrastructure is insufficient and there is demand for service, it may be possible to implement a peer-to-peer network or multihop network to extend a cellular network system. In fact, if we presume that the coverage is absent, the network extension function may make use of the licensed spectrum, thus making the transceiver design more efficient, and eliminating extrinsic competing uses for the bandwidth. Likewise, this may be implemented in or as part of existing cellular network handsets, using common protocols. On the other hand, different spectrum and/or protocols may be employed, which may be licensed or unlicensed. Various studies have shown that modeled multihop mobile ad hoc network architectures tend to have low efficiency over three to five or more hops, due to node mobility and the probability of finding an end-to-end connection, mutual interference and competition for bandwidth in shared channel protocols, and the overhead of maintaining useful routing tables. If we take five hops as a reasonable maximum, and each transceiver has a 1000 meter range, then a 5 km maximum range extension is possible. It is believed that by extending the fringe of cellular networks by 3-5 km, a significant portion of the unmet demand for cellular service will be satisfied, at relatively low cost. 
If we assume that a significant portion of the mobile nodes are power constrained (e.g., battery operated), that is, retransmission of packets imposes a power cost, then the stability of the mobile ad hoc network and cooperation with its requirements will depend on properly incentivizing intermediary nodes to allocate their resources to the network. Since this incentive is provided in a commercial context, that is, the cellular service is a commercial enterprise with substantial cash flow, a real economy with monetary incentives for cooperation may be provided. Under such circumstances, it is relatively straightforward to allocate costs and benefits between the competing interests to achieve consistent and apparent incentives. On the other hand, the cost of this additional process must be commensurate with the benefits provided, or else the ad hoc network will become unreliable. The incentives therefore may be, for example unrestricted credits (cash), recurring fee credits (basic monthly fee), or non-recurring fee credits (additional minute credits). The issue is, are users willing to pay for extended cellular reach? If so, do they value the benefits commensurate with the overall costs, including service fees, hardware, and ad hoc cooperative burdens? As such, care must be exercised to define competitive compensation or the business will be inefficient. Since this extension is driven by the cellular network operator, a suitable return on investment is mandated.
  • Many analyses and studies have concluded that voluntary ad hoc networks are efficient when the incentives to cooperate with the network goals are aligned and sufficient to incentivize users accordingly. If the reward for cooperation is optimum, then the network will benefit by increased coverage and reliability, each node will benefit from increased utility, and intermediary nodes will specifically benefit through compensation. Due to the technical possibility for potential intermediaries to fail to either participate in network administration or operation, while taking advantage of the network as a beneficiary, the promotion of network availability as an incentive for cooperation is typically itself insufficient incentive to assure cooperation. The particular cost of the limited power resource for potential intermediaries makes non-cooperation a particularly important factor. On the other hand, the presumption of power cost as a critical factor may be accurate only in some circumstances: In many cases, a cheap power source is available, such as in a home or office, or in a vehicle, making other factors more important. It is noted that, in a cellular telephone system, the reasonable acts of a user which might undermine the network are limited. Clearly, the user can choose a different network or provider. The user may turn off his phone or make it unavailable. The user may abuse the service contract, taking advantage of promotions or “free” access to the detriment of others. Notably, the user typically has no reasonable ability to reprogram the phone or alter its operation in accordance with the protocol, unless granted this option by the network operator. The user cannot reasonably compete or interfere with the licensed spectrum, and if he does, it is a problem outside the scope of the ad hoc network issues. 
While older analog cellular phones provided the user with an option to install power amplifiers and vehicle mount antennas, few current users employ these options. If one limits the present system to a five hop distance from fixed cellular infrastructure (or more accurately, permits the system to deny service to nodes more than five hops away) then the routing requirements and node complexity may be substantially simplified. We also presume that each node has geolocation capability, and therefore can provide both its location and velocity vector. This is reasonable, since the FCC E911 mandate provides for geolocation of handsets within range of the cellular infrastructure, and GPS is one option to provide this feature.
  • The ad hoc communications can occur using a licensed or unlicensed band. For example, since we presume that nodes are beyond range of a fixed cellular tower (except the closest node), the ad hoc network may reuse licensed bandwidth in the uncovered region. The ad hoc communications may also occur in unlicensed spectrum, such as the 2.4 GHz ISM band.
  • In order to provide optimum compensation, two issues are confronted. First, the total compensation paid; and second, the distribution of payments between the intermediaries. The VCG auction is a known means for optimizing a payment which must be distributed between a number of participants. In this case, each potential intermediary places a “bid”. A multi-factorial optimization is performed to determine the lowest cost set which provides sufficient services.
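The selection of a lowest cost set of intermediaries and the distribution of payments among them can be illustrated with a textbook VCG payment rule applied to candidate routes. This is a sketch under simplifying assumptions, not necessarily the multi-factorial optimization contemplated above: each intermediary's bid is reduced to a single asking price, candidate routes are enumerated in advance, and the names `route_cost` and `vcg_route` are hypothetical.

```python
import math


def route_cost(route, asks):
    """Total asking price of the intermediaries on a route."""
    return sum(asks[node] for node in route)


def vcg_route(routes, asks):
    """Pick the cheapest route and compute VCG payments to its members.

    routes: list of tuples of intermediary node names.
    asks:   dict mapping node -> asking price (its bid to forward).
    Each winner is paid its ask plus the extra cost the system would
    bear if that node were absent.
    """
    best = min(routes, key=lambda r: route_cost(r, asks))
    best_cost = route_cost(best, asks)
    payments = {}
    for node in best:
        alternatives = [r for r in routes if node not in r]
        if not alternatives:
            # The node is indispensable; the payment is unbounded in this model.
            payments[node] = math.inf
            continue
        alt_cost = min(route_cost(r, asks) for r in alternatives)
        payments[node] = asks[node] + (alt_cost - best_cost)
    return best, payments


# Example with three candidate routes.
example_routes = [("a", "b"), ("c",), ("a", "d")]
example_asks = {"a": 2, "b": 3, "c": 7, "d": 4}
winner, pay = vcg_route(example_routes, example_asks)
```

In the example, route a-b wins at a total ask of 5, and each of a and b is paid 4. The total paid (8) exceeds the sum of the asks, which is the premium that makes truthful bidding the preferred strategy under this rule.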
  • In a cellular system, each subscriber typically purchases a number of minute units on a monthly recurring charge basis. Compensation might therefore be based on minutes or money. Since there is a substantial disincentive to exceed the number of committed minutes, providing a surplus of minutes may not provide a significant incentive, because all but a minority of users will rarely exceed the committed amount. Monetary incentives, on the other hand, must be coupled to a higher monthly recurring fee, since the proposal would be unprofitable otherwise.
  • A more direct scheme provides an economy for multihop networks somewhat independent from the cellular system economy. That is, nodes that participate as intermediaries may also participate as principals to the information communication, while those who abstain from intermediary activities are denied access to the network extension as a principal.
  • While, on a theoretical basis, optimization of both price and distribution would be considered useful, practically, simplifying presumptions may be useful. For example, while a VCG auction may provide an optimal cost and distribution of compensation, in a commercial network, a degree of certainty may actually be advantageous. For example, a fixed compensation per hop or per milliWatt-second may prove both fair and reasonable. Likewise, a degree of certainty over cost would be beneficial over an “optimal” cost. On the other hand, fixed cost and fixed compensation are inconsistent in a revenue neutral system. Even if the cellular carrier subsidizes the extension operation, there is little rationale for making the usage at the fringe insensitive to cost, other than the relief from uncertainty, which will tend to increase fringe usage, and the scope of the subsidy cost. As discussed above, there are methods drawn from financial models which may also serve to improve certainty and reduce perceived risk.
  • Therefore, it is realistic for a node requesting extension service to apply a value function to define a maximum payment for service. The payment is therefore dependent on system cost, alleviating the requirement for subsidy, but also dependent on need.
  • In the typical case, the load on the extension network will be low, since if demand were high, the fixed infrastructure would likely be extended to this region. On the other hand, there may be cases where demand is high, and therefore there is competition for access to the network, leading to a need to arbitrate access. In general, where economic demand is high, there is a tendency to recruit new sources of supply. That is, the system may operate in two modes. In a first, low demand mode, costs are based on a relatively simple algorithm, with a monetary cap. In a second mode, costs are competitive (and typically in excess of the algorithmic level), with compensation also being competitive. In contrast to the proposal described above for allocating the surplus between the set of bidders and/or offerors, the surplus, in this case, would generally be allocated to the cellular carrier, since this represents the commercial profit of the enterprise. The first mode also serves another purpose; under lightly loaded conditions, the market may be thin, and therefore pricing unstable. Therefore, the imposition of fixed pricing leads to reduced pricing risk.
  • In the second mode, an intended user specifies his demand as a maximum price and demand function, that is, for example, a bid based on a value of the communication. Generally, this would be set by the user in advance as a static value or relatively simple function representing the contextual value of the communication. The actual price may be, for example, the bid price less otherwise attributable discount under the first mode based on the maximum number of hops, etc. The intermediate nodes set forth their bids in the manner of a VCG auction, with each bid presumably exceeding the first mode compensation. The VCG optimization may be corrected for quality of service factors and anticipated network stability. It is noted, as elsewhere herein, that the preferred bidding and optimization is performed automatically as a part of the protocol, and not under direct human control and supervision. Therefore, the automated processes may be defined to promote stability and cooperation by both the device and its owner with the network. In other cases, human involvement may be used, although this will typically be quite inefficient and impose transactional expenses (opportunity costs) in excess of the underlying transaction value. The user therefore provides a set of explicit or implicit subjective criteria as a basis for the agent to act accordingly. An intermediary chooses its bid for providing packet forwarding services based on a number of factors, such as anticipated power cost, opportunity cost, etc.
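  • The two operating modes may be sketched as a single pricing function. This is only an illustration under stated assumptions: the switch between modes is taken to be a comparison of demand against a capacity figure, the first-mode algorithm is taken to be a per-hop fee with a cap, and the second-mode price is taken to be the sum of the competitive relay asks. The function `quote_price` and all of its parameters are hypothetical names introduced here.

```python
def quote_price(hops, demand, capacity, per_hop_fee, cap, max_bid, relay_asks):
    """Return the price charged to the requesting node, or None if declined.

    All parameters are hypothetical; the mode switch on demand vs. capacity
    is an assumption made for this sketch.
    """
    if demand <= capacity:
        # First mode: simple algorithmic cost with a monetary cap.
        price = min(hops * per_hop_fee, cap)
    else:
        # Second mode: the price must cover the competitive relay asks.
        price = sum(relay_asks)
    # The requesting node's value function sets a maximum payment.
    return price if price <= max_bid else None


low_demand = quote_price(3, demand=2, capacity=10, per_hop_fee=0.5, cap=1.0,
                         max_bid=2.0, relay_asks=[0.5, 0.75, 0.5])
high_demand = quote_price(3, demand=20, capacity=10, per_hop_fee=0.5, cap=1.0,
                          max_bid=2.0, relay_asks=[0.5, 0.75, 0.5])
too_costly = quote_price(3, demand=20, capacity=10, per_hop_fee=0.5, cap=1.0,
                         max_bid=1.5, relay_asks=[0.5, 0.75, 0.5])
print(low_demand, high_demand, too_costly)
```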
  • Clearly, the economics of the system are substantially under the control of the cellular carrier, who may offer “plans” and “services” for their customers, thus providing an alternative to the usage-based bidding process, at least for some users. The VCG process, however, remains useful for compensating intermediaries.
  • CONCLUSION: Game theory is a useful basis for analyzing ad hoc networks, and understanding the behavior of complex networks of independent nodes. By presuming a degree of choice and decision-making by nodes, we obtain an analysis that is robust with respect to such considerations. The principal issues impeding deployment are the inherent complexity of the system, as well as the overhead required to continuously optimize the system. A set of simplifying presumptions may be employed to reduce protocol overhead and reduce complexity. Hierarchal considerations can be imposed to alter the optimization of the system, which would be expected to provide only a small perturbation to the efficient and optimal operation of the system according to a pure VCG protocol.
  • Third Embodiment
  • A third embodiment of the invention, described below, represents a system which may employ a self-organizing network to convey information between mobile nodes. It is expressly understood that the concepts set forth above in the first and second embodiments are directly applicable, and each aspect of the third embodiment may be extended using the hierarchal principles and modifications, in a consistent manner, to achieve the advantages described herein. That is, while the third embodiment generally describes peer nodes, the extension of the systems and methods to non-peer nodes is specifically envisioned and encompassed. This patent builds upon and extends aspects of U.S. Pat. No. 6,252,544 (Hoffberg), Jun. 26, 2001, U.S. Pat. No. 6,429,812, Aug. 6, 2002, U.S. Pat. No. 6,791,472, Sep. 14, 2004, which are expressly incorporated herein by reference in their entirety. See, also, U.S. Pat. No. 6,397,141 (Binnig, May 28, 2002, Method and device for signaling local traffic delays), expressly incorporated herein by reference, which relates to a method and an apparatus for signaling local traffic disturbances by means of decentralized communication between vehicles, performed by exchanging their respective vehicle data. Through repeated evaluation of these individual vehicle data, each reference vehicle may determine a group of vehicles having relevance for itself from within a maximum group of vehicles and compare the group behavior of the relevant group with its own behavior. The results of this comparison are indicated in the reference vehicle, whereby a homogeneous flow of traffic may be generated, and the occurrence of accidents is reduced. See, also U.S. Pat. Nos. 
4,706,086; 4,860,216; 5,131,020; 5,164,904; 5,302,955; 5,428,544; 5,539,645; 5,594,779; 5,689,252; 5,699,056; 5,809,437; 5,864,305; 5,889,473; 5,919,246; 5,982,298; 6,115,654; 6,173,159; 6,304,758; 6,338,011; 6,359,571; 6,384,739; 6,401,027; 6,411,221; 6,411,889; and 6,473,688; and Japanese Patent Document Nos. JP 9-236650 (September, 1997); 10-84430 (March, 1998); 5-151496 (June, 1993); and 11-183184 (July, 1999), each of which is expressly incorporated herein by reference. See also: Martin E. Liggins, II, et al., “Distributed Fusion Architectures and Algorithms for Target Tracking”, Proceedings of the IEEE, vol. 85, No. 1, (XP-002166088) January, 1997, pp. 95-106; D. M. Hosmer, “Data-Linked Associate Systems”, 1994 IEEE International Conference on Systems, Man, and Cybernetics. Humans, Information and Technology (Cat. No. 94CH3571-5), Proceedings of IEEE International Conference on Systems, Man and Cybernetics, San Antonio, Tex., vol. 3, (XP-002166089) (1994), pp. 2075-2079.
  • One aspect of the invention provides a communications system, method and infrastructure. According to one preferred embodiment, an ad hoc, self organizing, cellular radio system (sometimes known as a “mesh network”) is provided. Advantageously, high gain antennas are employed, preferably electronically steerable antennas, to provide efficient communications and to increase communications bandwidth, both between nodes and for the system comprising a plurality of nodes communicating with each other. See, U.S. Pat. No. 6,507,739 (Gross, et al., Jan. 14, 2003), expressly incorporated herein by reference.
  • In general, time-critical communications, e.g., voice, require tight routing to control communications latency. On the other hand, non-time critical communications generally are afforded more leeway in terms of communications pathways, including a number of “hops”, retransmission latency, and out-of-order packet communication tolerance, between the source and destination or fixed infrastructure, and quality of communication pathway. Further, it is possible to establish redundant pathways, especially where communications bandwidth is available, multiple paths are possible, and no single available path meets the entire communications requirements or preferences. Technologies for determining a position of a mobile device are also well known. Most popular are radio triangulation techniques, including artificial satellite and terrestrial transmitters or receivers, dead reckoning and inertial techniques. Advantageously, a satellite-based or augmented satellite system is employed, although other suitable geolocation systems are applicable. Navigation systems are also well known. These systems generally combine a position sensing technology with a geographic information system (GIS), e.g., a mapping database, to assist navigation functions. Systems which integrate GPS, GLONASS, LORAN or other positioning systems into vehicular guidance systems are well known, and indeed navigational purposes were prime motivators for the creation of these systems. Environmental sensors are well known. For example, sensing technologies for temperature, weather, object proximity, location and identification, vehicular traffic and the like are well developed. In particular, known systems for analyzing vehicular traffic patterns include both stationary and mobile sensors, and networks thereof. Most often, such networks provide a stationary or centralized system for analyzing traffic information, which is then broadcast to vehicles.
  • Encryption technologies are well known and highly developed. These are generally classified as being symmetric key, for example the Data Encryption Standard (DES), and the more recent Advanced Encryption Standard (AES), in which the same key is used for encryption as decryption, and asymmetric key cryptography, in which different and complementary keys are used to encrypt and decrypt, in which the former and the latter are not derivable from each other (or one from the other) and therefore can be used for authentication and digital signatures. The use of asymmetric keys allows a so-called public key infrastructure, in which one of the keys is published, to allow communications to be directed to a possessor of a complementary key, and/or the identity of the sender of a message to be verified. Typical asymmetric encryption systems include the Rivest-Shamir-Adleman algorithm (RSA), the Diffie-Hellman algorithm (DH), elliptic curve encryption algorithms, and the so-called Pretty Good Privacy (PGP) algorithm.
  • One embodiment of the invention provides a system that analyzes both a risk and an associated reliability. Another embodiment of the invention communicates the risk and associated reliability in a manner for efficient human comprehension, especially in a distracting environment. See, U.S. Pat. Nos. 6,201,493; 5,977,884; 6,118,403; 5,982,325; 5,485,161; WO0077539, each of which is expressly incorporated herein by reference, and the Uniden GPSRD (see Uniden GPSRD User's Manual, expressly incorporated herein by reference). See, also U.S. Pat. Nos. 5,650,770; 5,450,329; 5,504,482; 5,504,491; 5,539,645; 5,929,753; 5,983,161; 6,084,510; 6,255,942; 6,225,901; 5,959,529; 5,752,976; 5,748,103; 5,720,770; 6,005,517; 5,805,055; 6,147,598; 5,687,215; 5,838,237; 6,044,257; 6,144,336; 6,285,867; 6,340,928; 6,356,822; 6,353,679 each of which is expressly incorporated herein by reference.
  • STATISTICAL ANALYSIS: It is understood that the below analysis and analytical tools, as well as those known in the art, may be used individually, in sub-combination, or in appropriate combination, to achieve the goals of the invention. These techniques may be implemented in dedicated or reprogrammable/general purpose hardware, and may be employed for low level processing of signals, such as in digital signal processors, within an operating system or dynamic linked libraries, or within application software. Likewise, these techniques may be applicable, for example, to low level data processing, system-level data processing, or user interface data processing. A risk and reliability communication system may be useful, for example, to allow a user to evaluate a set of events in statistical context. Most indicators present data by means of a logical indicator or magnitude, as a single value. Scientific displays may provide a two-dimensional display of a distribution, but these typically require significant user focus to comprehend, especially where a multimodal distribution is represented. Typically, the human visual input can best accommodate a three dimensional color input representing a set of bounded objects which change partially over time, and it is ergonomically difficult to present more degrees of freedom of information simultaneously. That is, the spatial image is not arbitrary, but represents bounded objects (or possibly fuzzy edges), and the sequence over time should provide transitions. User displays of a magnitude or binary value typically do not provide any information about a likelihood of error. Thus, while a recent positive warning of the existence of an event may be a reliable indicator of the actual existence of the event, the failure to warn of an event does not necessarily mean that the event does not exist. Further, as events age, their reliability often decreases. 
The present invention therefore seeks to provide additional information which may be of use in decision-making, including a reliability of the information presented, and/or risk associated with that information, if true. These types of information are typically distinct from the data objects themselves. In order to present these additional degrees of freedom of information within the confines of efficient human cognition, a new paradigm is provided. Essentially, the objects presented (which may be, for example, identifiers of events), are mapped or ranked by a joint function of risk and reliability. Typically, the joint function will adopt economic theory to provide a normalized cost function. Of course, the risk and reliability need not be jointly considered, and these may remain independent considerations for mapping purposes. Because of human subjective perception of risk and reliability, it may be useful to tune the economic normalized cost function for subjective considerations, although in other instances, an objective evaluation is appropriate and efficient.
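  • A joint function of risk and reliability, as discussed above, might be sketched as an expected cost used to rank events for display. The form of the function is an assumption made purely for illustration: reliability is modeled as decaying exponentially with the age of the report (the half-life value is arbitrary), and the normalized cost is the product of the risk cost and the decayed reliability. The function name and the example events are hypothetical.

```python
def expected_cost(risk_cost, reliability, age_s, half_life_s=300.0):
    """Risk weighted by a reliability that decays as the report ages.

    The exponential decay and the 300 s half-life are assumptions made
    for this sketch, not values from the specification.
    """
    decayed_reliability = reliability * 0.5 ** (age_s / half_life_s)
    return risk_cost * decayed_reliability


# (label, risk cost if true, reliability when reported, age in seconds)
events = [
    ("pothole", 10.0, 0.9, 600.0),
    ("collision ahead", 100.0, 0.6, 0.0),
    ("ice on road", 20.0, 0.95, 300.0),
]

# Rank events so that the highest expected cost is presented first.
ranked = sorted(events, key=lambda e: expected_cost(*e[1:]), reverse=True)
ranked_labels = [label for label, *_ in ranked]
print(ranked_labels)
```

  • Risk and reliability could equally be kept as independent display dimensions, as noted above; the single scalar is used here only because it yields a simple ordering.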
  • In analyzing a complex data set for both time and space patterns, wavelets may be useful. While the discrete wavelet transform (DWT), an analogy of the discrete Fourier transform (DFT) may be employed, it is perhaps more general to apply arbitrary wavelet functions to the data set, and adopting mathematical efficiencies as these present themselves, rather than mandating that an efficient and predefined transform necessarily be employed.
  • A Bayesian network is a representation of the probabilistic relationships among distinctions about the world. Each distinction, sometimes called a variable, can take on one of a mutually exclusive and exhaustive set of possible states. Associated with each variable in a Bayesian network is a set of probability distributions. Using conditional probability notation, the set of probability distributions for a variable can be denoted by $p(x_i \mid \pi_i, \chi)$, where “p” refers to the probability distribution, where “$\pi_i$” denotes the parents of variable $X_i$ and where “χ” denotes the knowledge of the expert. The Greek letter “χ” indicates that the Bayesian network reflects the knowledge of an expert in a given field. Thus, this expression reads as follows: the probability distribution for variable $X_i$ given the parents of $X_i$ and the knowledge of the expert. For example, $X_1$ is the parent of $X_2$. The probability distributions specify the strength of the relationships between variables. For instance, if $X_1$ has two states (true and false), then associated with $X_1$ is a single probability distribution $p(x_1 \mid \chi)$ and associated with $X_2$ are two probability distributions $p(x_2 \mid x_1 = t, \chi)$ and $p(x_2 \mid x_1 = f, \chi)$.
  • A Bayesian network is expressed as a directed acyclic graph where the variables correspond to nodes and the relationships between the nodes correspond to arcs. The arcs in a Bayesian network convey dependence between nodes. When there is an arc between two nodes, the probability distribution of the first node depends upon the value of the second node when the direction of the arc points from the second node to the first node. Missing arcs in a Bayesian network convey conditional independencies. However, two variables indirectly connected through intermediate variables are conditionally dependent given lack of knowledge of the values (“states”) of the intermediate variables. In other words, sets of variables X and Y are said to be conditionally independent, given a set of variables Z, if the probability distribution for X given Z does not depend on Y. If Z is empty, however, X and Y are said to be “independent” as opposed to conditionally independent. If X and Y are not conditionally independent, given Z, then X and Y are said to be conditionally dependent given Z.
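  • The two-variable network discussed above, in which X1 is the parent of X2 and both are Boolean, can be worked through numerically. The probability values below are invented for illustration only. The sketch stores one distribution for X1 and two for X2 (one per state of its parent), then derives the joint distribution, the marginal of X2, and the posterior of X1 given an observed X2.

```python
# p(x1): a single distribution for the parentless variable X1 (values assumed).
p_x1 = {True: 0.3, False: 0.7}

# p(x2 | x1): two distributions for X2, one per state of its parent X1.
p_x2_given_x1 = {
    True: {True: 0.9, False: 0.1},
    False: {True: 0.2, False: 0.8},
}


def joint(x1, x2):
    """p(x1, x2) factorizes along the single arc X1 -> X2."""
    return p_x1[x1] * p_x2_given_x1[x1][x2]


def marginal_x2(x2):
    """Sum the joint distribution over the states of the parent."""
    return sum(joint(x1, x2) for x1 in (True, False))


def posterior_x1(x1, x2):
    """Bayes' rule: p(x1 | x2) = p(x1, x2) / p(x2)."""
    return joint(x1, x2) / marginal_x2(x2)


print(marginal_x2(True), posterior_x1(True, True))
```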
  • The variables used for each node may be of different types. Specifically, variables may be of two types: discrete or continuous. A discrete variable is a variable that has a finite or countable number of states, whereas a continuous variable is a variable that has an effectively infinite number of states. An example of a discrete variable is a Boolean variable. Such a variable can assume only one of two states: “true” or “false.” An example of a continuous variable is a variable that may assume any real value between −1 and 1. Discrete variables have an associated probability distribution. Continuous variables, however, have an associated probability density function (“density”). Where an event is a set of possible outcomes, the density p(x) for a variable “x” and events “a” and “b” is defined as:
  • $$p(x) = \lim_{a \to b} \left[ \frac{p(a \le x \le b)}{\lvert a - b \rvert} \right],$$
  • where p(a≦x≦b) is the probability that x lies between a and b. Conventional systems for generating Bayesian networks cannot use continuous variables in their nodes.
  • There are two conventional approaches for constructing Bayesian networks. Using the first approach (“the knowledge-based approach”), first the distinctions of the world that are important for decision making are determined. These distinctions correspond to the variables of the domain of the Bayesian network. The “domain” of a Bayesian network is the set of all variables in the Bayesian network. Next the dependencies among the variables (the arcs) and the probability distributions that quantify the strengths of the dependencies are determined. In the second approach (“the data-based approach”), the variables of the domain are first determined. Next, data is accumulated for those variables, and an algorithm is applied that creates a Bayesian network from this data. The accumulated data comes from real world instances of the domain, that is, real world instances of decision making in a given field. Conventionally, this second approach exists for domains containing only discrete variables.
  • U.S. Pat. No. 5,704,018 describes a system and method for generating Bayesian networks (also known as “belief networks”) that utilize both expert data received from an expert (“expert knowledge”) and data received from real world instances of decisions made (“empirical data”). By utilizing both expert knowledge and empirical data, the network generator provides an improved Bayesian network that may be more accurate than conventional Bayesian networks or provide other advantages, e.g., ease of implementation and lower reliance on “expert” estimations of probabilities. Likewise, it is known to initiate a network using estimations of the probabilities (and often the relevant variables), and subsequently use accumulated data to refine the network to increase its accuracy and precision. Expert knowledge consists of two components: an equivalent sample size or sizes (“sample size”), and the prior probabilities of all possible Bayesian-network structures (“priors on structures”). The effective sample size is the effective number of times that the expert has rendered a specific decision. For example, a doctor with 20 years of experience diagnosing a specific illness may have an effective sample size in the hundreds. The priors on structures refers to the confidence of the expert that there is a relationship between variables (e.g., the expert is 70% sure that two variables are related). The priors on structures can be decomposed for each variable-parent pair known as the “prior probability” of the variable-parent pair. Empirical data is typically stored in a database. The database may contain a list of the observed state of some or all of the variables in the Bayesian network. Each data entry constitutes a case. 
When one or more variables are unobserved in a case, the case containing the unobserved variable is said to have “missing data.” Thus, missing data refers to when there are cases in the empirical data database that contain no observed value for one or more of the variables in the domain. An assignment of one state to each variable in a set of variables is called an “instance” of that set of variables. Thus, a “case” is an instance of the domain. The “database” is the collection of all cases. Therefore, it is seen that Bayesian networks can be used to probabilistically model a problem, in a mathematical form. This model may then be analyzed to produce one or more outputs representative of the probability that a given fact is true, or a probability density distribution that a variable is at a certain value. A review of certain statistical methods is provided below for the convenience of the reader, and is not intended to limit the scope of methods, whether statistical or of another type, which may be employed in conjunction with the system and method according to the present invention. It is understood that these mathematical models and methods may be implemented in known manner on general purpose computing platforms, for example as a compiled application in a real-time operating system such as RT Linux, QNX, versions of Microsoft Windows, or the like. Further, these techniques may be implemented as applets operating under Matlab or other scientific computing platform. Alternately, the functions may be implemented natively in an embedded control system or on a microcontroller.
  • It is also understood that, while the mathematical methods are capable of producing precise and accurate results, various simplifying presumptions and truncations may be employed to increase the tractability of the problem to be solved. Further, the outputs generally provided according to preferred embodiments of the present invention are relatively low precision, and therefore higher order approximation of the analytic solution, in the case of a rapidly convergent calculation, will often be sufficient.
  • A time domain process demonstrates a Markov property if the conditional probability density of the current event, given all present and past events, depends only on the j most recent events. If the current event depends solely on the most recent past event, then the process is a first order Markov process. There are three key problems in the use of a hidden Markov model (HMM): evaluation, estimation, and decoding. The evaluation problem is: given an observation sequence O and a model λ, what is the probability that the observed sequence was generated by the model, Pr(O|λ)? If this can be evaluated for all competing models for an observation sequence, then the model with the highest probability can be chosen for recognition.
  • Pr(O|λ) can be calculated several ways. The naive way is to sum the probability over all the possible state sequences in a model for the observation sequence:
  • $$\Pr(O \mid \lambda) = \sum_{\text{all } S} \prod_{t=1}^{T} a_{s_{t-1} s_t}\, b_{s_t}(O_t).$$
  • However, this method is exponential in time, so the more efficient forward-backward algorithm is used in practice. The following algorithm defines the forward variable α and uses it to generate Pr(O|λ) (π are the initial state probabilities, a are the state transition probabilities, and b are the output probabilities).
  • $$\alpha_1(i) = \pi_i\, b_i(O_1),$$
  • for all states i (if $i \in S_I$, $\pi_i = \frac{1}{a_I}$; otherwise $\pi_i = 0$);
  • Calculating α( ) along the time axis, for t=2, . . . , T, and all states j, compute
  • $$\alpha_t(j) = \left[ \sum_i \alpha_{t-1}(i)\, a_{ij} \right] b_j(O_t),$$
  • Final probability is given by
  • $$\Pr(O \mid \lambda) = \sum_{i \in S_p} \alpha_T(i).$$
  • The first step initializes the forward variable with the initial probability for all states, while the second step inductively steps the forward variable through time. The final step gives the desired result Pr(O|λ), and it can be shown by constructing a lattice of states and transitions through time that the computation is only order $O(N^2 T)$. The backward algorithm, using a process similar to the above, can also be used to compute Pr(O|λ) and defines the convenience variable β.
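  • The three steps of the forward procedure can be sketched in a few lines and checked against the naive sum over all state sequences. This sketch assumes a discrete-observation HMM stored as plain lists, and it assumes that every state is an admissible final state, so the final sum runs over all states rather than over a restricted set. The model values and the names `forward` and `naive_likelihood` are hypothetical.

```python
from itertools import product


def forward(pi, A, B, obs):
    """Pr(O | lambda) by the forward recursion, O(N^2 T) time."""
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(O_1).
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction along the time axis.
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination (assumes any state may be final).
    return sum(alpha)


def naive_likelihood(pi, A, B, obs):
    """Pr(O | lambda) by summing over every state sequence, O(N^T) time."""
    n = len(pi)
    total = 0.0
    for states in product(range(n), repeat=len(obs)):
        p = pi[states[0]] * B[states[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[states[t - 1]][states[t]] * B[states[t]][obs[t]]
        total += p
    return total


pi = [0.6, 0.4]                          # initial state probabilities
A = [[0.7, 0.3], [0.4, 0.6]]             # A[i][j]: transition i -> j
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]   # B[j][k]: state j emits symbol k
obs = [0, 1, 2]

print(forward(pi, A, B, obs), naive_likelihood(pi, A, B, obs))
```

  • Both functions return the same likelihood; only the forward recursion remains practical as the sequence length grows.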
  • The estimation problem concerns how to adjust λ to maximize Pr(O|λ) given an observation sequence O. Given an initial model, which can have flat probabilities, the forward-backward algorithm allows us to evaluate this probability. All that remains is to find a method to improve the initial model. Unfortunately, an analytical solution is not known, but an iterative technique can be employed. Using the actual evidence from the training data, a new estimate for the respective output probability can be assigned:
  • $$\bar{b}_j(k) = \frac{\sum_{t \,:\, O_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)},$$
  • where $\gamma_t(i)$ is defined as the posterior probability of being in state i at time t given the observation sequence and the model. Similarly, the evidence can be used to develop a new estimate of the probability of a state transition ($a_{ij}$) and initial state probabilities ($\pi_i$). Thus all the components of model (λ) can be re-estimated. Since either the forward or backward algorithm can be used to evaluate Pr(O|λ) versus the previous estimation, the above technique can be used iteratively to converge the model to some limit. While the technique described only handles a single observation sequence, it is easy to extend to a set of observation sequences.
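  • One re-estimation step for the output probabilities may be sketched as follows. The sketch assumes a discrete-observation HMM held in plain lists, computes the state posteriors from unscaled forward and backward variables (adequate only for short sequences, since long sequences would need scaling or log arithmetic), and then applies the re-estimation formula above. The function names and the model values are hypothetical.

```python
def state_posteriors(pi, A, B, obs):
    """gamma[t][i]: posterior of being in state i at time t given O and the model."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
    likelihood = sum(alpha[T - 1])
    return [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)] for t in range(T)]


def reestimate_b(gamma, obs, n_symbols):
    """New b_j(k): posterior mass in state j while symbol k was seen, normalized."""
    new_b = []
    for j in range(len(gamma[0])):
        denom = sum(g[j] for g in gamma)
        new_b.append([sum(g[j] for g, o in zip(gamma, obs) if o == k) / denom
                      for k in range(n_symbols)])
    return new_b


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2, 2, 0]

gamma = state_posteriors(pi, A, B, obs)
new_B = reestimate_b(gamma, obs, n_symbols=3)
print(new_B)
```

  • Repeating the two calls with the updated matrix, together with the analogous updates for the transition and initial probabilities, gives the iterative procedure described above.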
  • The Hidden Markov Model is a finite set of states, each of which is associated with a (generally multidimensional) probability distribution [˜gerjanos/HMM/node4.html-r4#4]. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state an outcome or observation can be generated, according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer, and therefore the states are “hidden” to the outside; hence the name Hidden Markov Model. In order to define an HMM completely, the following elements are needed:
  • The number of states of the model, N.
  • The number of observation symbols in the alphabet, M. If the observations are continuous then M is infinite.
  • A set of state transition probabilities $\Lambda = \{a_{ij}\}$, $a_{ij} = p\{q_{t+1} = j \mid q_t = i\}$, $1 \le i, j \le N$, where $q_t$ denotes the current state. Transition probabilities should satisfy the normal stochastic constraints, $a_{ij} \ge 0$, $1 \le i, j \le N$, and
  • $$\sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i \le N.$$
  • A probability distribution in each of the states, $B = \{b_j(k)\}$, $b_j(k) = p\{o_t = v_k \mid q_t = j\}$, $1 \le j \le N$, $1 \le k \le M$, where $v_k$ denotes the kth observation symbol in the alphabet, and $o_t$ the current parameter vector. The following stochastic constraints must be satisfied: $b_j(k) \ge 0$, $1 \le j \le N$, $1 \le k \le M$, and
  • $$\sum_{k=1}^{M} b_j(k) = 1, \quad 1 \le j \le N.$$
  • If the observations are continuous then we will have to use a continuous probability density function, instead of a set of discrete probabilities. In this case we specify the parameters of the probability density function. Usually the probability density is approximated by a weighted sum of M Gaussian distributions $\mathcal{N}$,
  • $$b_j(o_t) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(\mu_{jm}, \Sigma_{jm}, o_t),$$
  • where $c_{jm}$ = weighting coefficients; $\mu_{jm}$ = mean vectors; $\Sigma_{jm}$ = covariance matrices.
  • $c_{jm}$ should satisfy the stochastic constraints, $c_{jm} \ge 0$, $1 \le j \le N$, $1 \le m \le M$, and
  • $$\sum_{m=1}^{M} c_{jm} = 1, \quad 1 \le j \le N.$$
  • The initial state distribution, $\pi = \{\pi_i\}$, where $\pi_i = p\{q_1 = i\}$, $1 \le i \le N$. Therefore we can use the compact notation $\lambda = (\Lambda, B, \pi)$ to denote an HMM with discrete probability distributions, and $\lambda = (\Lambda, c_{jm}, \mu_{jm}, \Sigma_{jm}, \pi)$ to denote one with continuous densities. For the sake of mathematical and computational tractability, the following assumptions are made in the theory of HMMs.
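  • The stochastic constraints on a discrete HMM can be checked mechanically before the model is used. The following sketch assumes the model is held as nested Python lists, with `A` standing in for the transition set Λ; the function names and the tolerance are hypothetical choices made for illustration.

```python
import math


def is_distribution(row, tol=1e-9):
    """True if every entry is non-negative and the entries sum to 1."""
    return all(p >= 0.0 for p in row) and math.isclose(sum(row), 1.0, abs_tol=tol)


def validate_hmm(A, B, pi):
    """Check the shapes and stochastic constraints of lambda = (A, B, pi).

    Returns (N, M), the number of states and of observation symbols.
    """
    n = len(pi)
    if len(A) != n or any(len(row) != n for row in A):
        raise ValueError("A must be N x N")
    if len(B) != n or len({len(row) for row in B}) != 1:
        raise ValueError("B must be N x M")
    if not is_distribution(pi):
        raise ValueError("pi is not a probability distribution")
    for name, matrix in (("A", A), ("B", B)):
        for i, row in enumerate(matrix):
            if not is_distribution(row):
                raise ValueError(f"row {i} of {name} violates the stochastic constraints")
    return n, len(B[0])


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
shape = validate_hmm(A, B, pi)
print(shape)
```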
  • (1) The Markov assumption: As given in the definition of HMMs, transition probabilities are defined as $a_{ij} = p\{q_{t+1} = j \mid q_t = i\}$. In other words it is assumed that the next state is dependent only upon the current state. This is called the Markov assumption and the resulting model is a first order HMM. In general, however, the next state may depend on the past k states, and it is possible to obtain such a model, called a kth order HMM, by defining the transition probabilities as $a_{i_1 i_2 \ldots i_k j} = p\{q_{t+1} = j \mid q_t = i_1, q_{t-1} = i_2, \ldots, q_{t-k+1} = i_k\}$, $1 \le i_1, i_2, \ldots, i_k, j \le N$. A higher order HMM, however, has a higher complexity. Even though first order HMMs are the most common, some attempts have been made to use higher order HMMs too.
  • (2) The stationarity assumption: Here it is assumed that state transition probabilities are independent of the actual time at which the transitions take place. Mathematically, $p\{q_{t_1+1} = j \mid q_{t_1} = i\} = p\{q_{t_2+1} = j \mid q_{t_2} = i\}$, for any $t_1$ and $t_2$.
  • (3) The output independence assumption: This is the assumption that current output (observation) is statistically independent of the previous outputs (observations). The assumption can be formulated mathematically, by considering a sequence of observations, O=o1, o2, . . . , oT. Then
  • $$p\{O \mid q_1, q_2, \ldots, q_T, \lambda\} = \prod_{t=1}^{T} p(o_t \mid q_t, \lambda),$$
  • according to the assumption for an HMM λ. However unlike the other two, this assumption has a very limited validity. In some cases this assumption may not be fair enough and therefore becomes a severe weakness of the HMMs. A Hidden Markov Model (HMM) is a Markov chain, where each state generates an observation. You only see the observations, and the goal is to infer the hidden state sequence. HMMs are very useful for time-series modeling, since the discrete state-space can be used to approximate many non-linear, non-Gaussian systems.
  • HMMs and some common variants (e.g., input-output HMMs) can be concisely explained using the language of Bayesian Networks. Consider the Bayesian network in FIG. 1, which represents a hidden Markov model (HMM). (Circles denote continuous-valued random variables, squares denote discrete-valued, clear means hidden, shaded means observed.) This encodes the joint distribution $P(Q, Y) = P(Q_1) P(Y_1 \mid Q_1) P(Q_2 \mid Q_1) P(Y_2 \mid Q_2) \ldots$. For a sequence of length T, we simply “unroll” the model for T time steps. In general, such a dynamic Bayesian network (DBN) can be specified by just drawing two time slices (this is sometimes called a 2TBN); the structure (and parameters) are assumed to repeat. The Markov property states that the future is independent of the past given the present, i.e., $Q_{t+1} \perp Q_{t-1} \mid Q_t$. We can parameterize this Markov chain using a transition matrix, $M_{ij} = P(Q_{t+1} = j \mid Q_t = i)$, and a prior distribution, $\pi_i = P(Q_1 = i)$.
  • We have assumed that this is a homogeneous Markov chain, i.e., the parameters do not vary with time. This assumption can be made explicit by representing the parameters as nodes: see FIG. 2: P1 represents $\pi$, P2 represents the transition matrix, and P3 represents the parameters for the observation model. If we think of these parameters as random variables (as in the Bayesian approach), parameter estimation becomes equivalent to inference. If we think of the parameters as fixed, but unknown, quantities, parameter estimation requires a separate learning procedure (usually EM). In the latter case, we typically do not represent the parameters in the graph; shared parameters (as in this example) are implemented by specifying that the corresponding CPDs are “tied”. An HMM is a hidden Markov model because we don't see the states of the Markov chain, $Q_t$, but just a function of them, namely $Y_t$. For example, if $Y_t$ is a vector, we might define $P(Y_t=y \mid Q_t=i) = \mathcal{N}(y; \mu_i, \sigma_i)$. A richer model, widely used in speech recognition, is to model the output (conditioned on the hidden state) as a mixture of Gaussians. This is shown in FIG. 3.
  • Some popular variations on the basic HMM theme are illustrated in FIGS. 4A, 4B and 4C, which represent, respectively, an input-output HMM, a factorial HMM, and a coupled HMM. (In the input-output model, the CPD P(Q|U) could be a softmax function, or a neural network.) Software is available to handle inference and learning in general Bayesian networks, making all of these models trivial to implement. It is noted that the parameters may also vary with time. This does not violate the presumptions inherent in an HMM, but rather merely complicates the analysis since a static simplifying presumption may not be made.
  • A discrete-time, discrete-space dynamical system governed by a Markov chain emits a sequence of observable outputs: one output (observation) for each state in a trajectory of such states. From the observable sequence of outputs, we may infer the most likely dynamical system. The result is a model for the underlying process. Alternatively, given a sequence of outputs, we can infer the most likely sequence of states. We might also use the model to predict the next observation or more generally a continuation of the sequence of observations.
  • The Evaluation Problem and the Forward Algorithm: We have a model $\lambda=(A,B,\pi)$ and a sequence of observations $O=o_1, o_2, \ldots, o_T$, and $p\{O \mid \lambda\}$ must be found. We can calculate this quantity using simple probabilistic arguments, but the direct calculation requires on the order of $N^T$ operations, which is very large even when the sequence length $T$ is moderate. Fortunately there is a method of considerably lower complexity, which makes use of an auxiliary variable $\alpha_t(i)$ called the forward variable. The forward variable is defined as the probability of the partial observation sequence $o_1, o_2, \ldots, o_t$ when it terminates at the state $i$. Mathematically,

  • $\alpha_t(i) = p\{o_1, o_2, \ldots, o_t, q_t = i \mid \lambda\}$  (1.1)
  • Then it is easy to see that following recursive relationship holds.
  • $\alpha_{t+1}(j) = b_j(o_{t+1}) \sum_{i=1}^{N} \alpha_t(i)\, a_{ij}, \quad 1 \le j \le N,\ 1 \le t \le T-1$, where $\alpha_1(j) = \pi_j\, b_j(o_1),\ 1 \le j \le N$  (1.2)
  • Using this recursion we can calculate $\alpha_T(i)$, $1 \le i \le N$, and then the required probability is given by,
  • $p\{O \mid \lambda\} = \sum_{i=1}^{N} \alpha_T(i).$  (1.3)
  • The complexity of this method, known as the forward algorithm, is proportional to $N^2 T$, which is linear with respect to $T$, whereas the direct calculation mentioned earlier has exponential complexity. In a similar way we can define the backward variable $\beta_t(i)$ as the probability of the partial observation sequence $o_{t+1}, o_{t+2}, \ldots, o_T$, given that the current state is $i$. Mathematically,

  • $\beta_t(i) = p\{o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = i, \lambda\}$  (1.4)
  • As in the case of αt(i) there is a recursive relationship which can be used to calculate βt(i) efficiently.
  • $\beta_t(i) = \sum_{j=1}^{N} \beta_{t+1}(j)\, a_{ij}\, b_j(o_{t+1}), \quad 1 \le i \le N,\ 1 \le t \le T-1$, where $\beta_T(i) = 1,\ 1 \le i \le N$  (1.5)
  • Further we can see that,

  • $\alpha_t(i)\,\beta_t(i) = p\{O, q_t = i \mid \lambda\}, \quad 1 \le i \le N,\ 1 \le t \le T$  (1.6)
  • Therefore this gives another way to calculate p{O|λ}, by using both forward and backward variables as given in eqn. 1.7. See,˜gerjanos/HMM/, expressly incorporated herein by reference.
  • $p\{O \mid \lambda\} = \sum_{i=1}^{N} p\{O, q_t = i \mid \lambda\} = \sum_{i=1}^{N} \alpha_t(i)\,\beta_t(i)$  (1.7)
  • Eqn. 1.7 is very useful, especially in deriving the formulas required for gradient based training.
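The forward and backward recursions (eqns. 1.2 and 1.5) can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the implementation of the cited reference: the function names, the 0-based time index, and the toy two-state model are assumptions made for the example.

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward variables alpha[t, i] (eqns. 1.1-1.2), with 0-based t."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                           # alpha_1(j) = pi_j b_j(o_1)
    for t in range(T - 1):
        alpha[t + 1] = B[:, obs[t + 1]] * (alpha[t] @ A)   # sum_i alpha_t(i) a_ij
    return alpha

def backward(A, B, obs):
    """Backward variables beta[t, i] (eqns. 1.4-1.5), with 0-based t."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))                                 # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

# Hypothetical toy model: 2 states, 3 observation symbols.
A = np.array([[0.7, 0.3], [0.4, 0.6]])            # a_ij
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])  # b_j(k)
pi = np.array([0.6, 0.4])
obs = [0, 1, 2, 1]

alpha = forward(A, B, pi, obs)
beta = backward(A, B, obs)
likelihood = alpha[-1].sum()          # eqn. 1.3
per_t = (alpha * beta).sum(axis=1)    # eqn. 1.7, evaluated at every t
```

Every entry of `per_t` should equal `likelihood`, since eqn. 1.7 holds for any choice of t. For long sequences a practical implementation would rescale `alpha` and `beta` at each step (or work in log space) to avoid underflow.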
  • The Decoding Problem and the Viterbi Algorithm: While the estimation and evaluation processes described above are sufficient for the development of an HMM system, the Viterbi algorithm provides a quick means of evaluating a set of HMM's in practice as well as providing a solution for the decoding problem. In decoding, the goal is to recover the state sequence given an observation sequence. The Viterbi algorithm can be viewed as a special form of the forward-backward algorithm where only the maximum path at each time step is taken instead of all paths. This optimization reduces computational load and allows the recovery of the most likely state sequence. The steps of the Viterbi algorithm are: Initialization. For all states $i$, $\delta_1(i) = \pi_i b_i(O_1)$; $\psi_1(i) = 0$. Recursion. From $t=2$ to $T$ and for all states $j$, $\delta_t(j) = \max_i[\delta_{t-1}(i)\, a_{ij}]\, b_j(O_t)$; $\psi_t(j) = \arg\max_i[\delta_{t-1}(i)\, a_{ij}]$. Termination. $P = \max_{s \in S}[\delta_T(s)]$; $s_T = \arg\max_{s \in S}[\delta_T(s)]$. Recovering the state sequence. From $t=T-1$ to 1, $s_t = \psi_{t+1}(s_{t+1})$.
  • In many HMM system implementations, the Viterbi algorithm is used for evaluation at recognition time. Note that since Viterbi only guarantees the maximum of Pr(O, S|λ) over all state sequences S (as a result of the first order Markov assumption) instead of the sum over all possible state sequences, the resultant scores are only an approximation.
  • So far the discussion has assumed some method of quantization of feature vectors into classes. However, instead of using vector quantization, the actual probability densities for the features may be used. Baum-Welch, Viterbi, and the forward-backward algorithms can be modified to handle a variety of characteristic densities. In this context, however, the densities will be assumed to be Gaussian. Specifically,
  • $b_j(O_t) = \frac{1}{\sqrt{(2\pi)^n\, |\sigma_j|}}\; e^{-\frac{1}{2}(O_t - \mu_j)^t\, \sigma_j^{-1}\, (O_t - \mu_j)}.$
  • Initial estimations of μ and σ may be calculated by dividing the evidence evenly among the states of the model and calculating the mean and variance in the normal way. Whereas flat densities were used for the initialization step before, the evidence is used here. Now all that is needed is a way to provide new estimates for the output probability. We wish to weight the influence of a particular observation for each state based on the likelihood of that observation occurring in that state. Adapting the solution from the discrete case yields
  • $\bar{\mu}_j = \frac{\sum_{t=1}^{T} \gamma_t(j)\, O_t}{\sum_{t=1}^{T} \gamma_t(j)}$, and $\bar{\sigma}_j = \frac{\sum_{t=1}^{T} \gamma_t(j)\, (O_t - \bar{\mu}_j)(O_t - \bar{\mu}_j)^t}{\sum_{t=1}^{T} \gamma_t(j)}.$
  • For convenience, $\mu_j$ is used to calculate $\bar{\sigma}_j$ instead of the re-estimated $\bar{\mu}_j$. While not strictly proper, the values are approximately equal in contiguous iterations and seem not to make an empirical difference. See,˜testarne/asl/asl-tr375, expressly incorporated herein by reference. Since only one stream of data is being used and only one mixture (Gaussian density) is being assumed, the algorithms above can proceed normally, incorporating these changes for the continuous density case. We want to find the most likely state sequence for a given sequence of observations, $O=o_1, o_2, \ldots, o_T$, and a model, $\lambda=(A,B,\pi)$. The solution to this problem depends upon the way “most likely state sequence” is defined. One approach is to find the most likely state $q_t$ at each time $t$ and to concatenate all such $q_t$. But sometimes this method does not give a physically meaningful state sequence, for example when two consecutive states in the result are joined by a transition of zero probability. Therefore we seek another method which has no such problem. In this method, commonly known as the Viterbi algorithm, the whole state sequence with the maximum likelihood is found. In order to facilitate the computation we define an auxiliary variable,
  • $\delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} p\{q_1, q_2, \ldots, q_{t-1}, q_t = i, o_1, o_2, \ldots, o_t \mid \lambda\},$
  • which gives the highest probability that partial observation sequence and state sequence up to t=t can have, when the current state is i.
  • It is easy to observe that the following recursive relationship holds.
  • $\delta_{t+1}(j) = b_j(o_{t+1}) \left[\max_{1 \le i \le N} \delta_t(i)\, a_{ij}\right], \quad 1 \le j \le N,\ 1 \le t \le T-1$, where $\delta_1(j) = \pi_j\, b_j(o_1),\ 1 \le j \le N$  (1.8)
  • So the procedure to find the most likely state sequence starts from calculation of δT(j), 1≦j≦N using recursion in 1.8, while always keeping a pointer to the “winning state” in the maximum finding operation. Finally the state j*, is found where
  • $j^* = \arg\max_{1 \le j \le N} \delta_T(j),$
  • and starting from this state, the sequence of states is back-tracked as the pointer in each state indicates. This gives the required set of states. This whole algorithm can be interpreted as a search in a graph whose nodes are formed by the states of the HMM in each of the time instant t, 1≦t≦T.
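The recursion in eqn. 1.8 together with the back-pointer bookkeeping can be sketched as follows. This is a minimal sketch under assumed names (`viterbi`, `delta`, `psi`) and a hypothetical toy model; time indices are 0-based in the code.

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Most likely state sequence and its joint probability (eqn. 1.8)."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)       # pointers to the "winning state"
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A  # scores[i, j] = delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # Termination: pick j*, then back-track along the stored pointers.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    path.reverse()
    return path, float(delta[-1].max())

# Hypothetical toy model: 2 states, 3 observation symbols.
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
obs = [0, 1, 2, 1]

best_path, best_prob = viterbi(A, B, pi, obs)
```

The cost is proportional to $N^2 T$, the same as the forward algorithm, because the sum over predecessor states is simply replaced by a maximum.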
  • The Learning Problem: Generally, the learning problem is how to adjust the HMM parameters, so that the given set of observations (called the training set) is represented by the model in the best way for the intended application. Thus it would be clear that the “quantity” we wish to optimize during the learning process can be different from application to application. In other words there may be several optimization criteria for learning, out of which a suitable one is selected depending on the application.
  • There are two main optimization criteria found in the ASR literature: Maximum Likelihood (ML) and Maximum Mutual Information (MMI). The solutions to the learning problem under each of these criteria are described below.
  • Maximum Likelihood (ML) criterion: In ML we try to maximize the probability of a given sequence of observations $O^w$, belonging to a given class $w$, given the HMM $\lambda_w$ of the class $w$, with respect to the parameters of the model $\lambda_w$. This probability is the total likelihood of the observations and can be expressed mathematically as $L_{tot} = p\{O^w \mid \lambda_w\}$.
  • However since we consider only one class w at a time we can drop the subscript and superscript ‘w’s. Then the ML criterion can be given as,

  • $L_{tot} = p\{O \mid \lambda\}$  (1.9)
  • However there is no known way to solve analytically for the model $\lambda=(A,B,\pi)$ which maximizes the quantity $L_{tot}$. But we can choose model parameters such that it is locally maximized, using an iterative procedure such as the Baum-Welch method or a gradient based method, both of which are described below.
  • Baum-Welch Algorithm: This method can be derived using simple “occurrence counting” arguments or using calculus to maximize the auxiliary quantity
  • $Q(\lambda, \bar{\lambda}) = \sum_{q} p\{q \mid O, \lambda\}\, \log\left[p\{O, q, \bar{\lambda}\}\right]$ over $\bar{\lambda}$
  • [˜gerjanos/HMM/node11.html-r4#4], [˜gerjanos/HMM/node11.html-r21#r21, p 344-346 [[,]]]. A special feature of the algorithm is the guaranteed convergence. To describe the Baum-Welch algorithm, (also known as Forward-Backward algorithm), we need to define two more auxiliary variables, in addition to the forward and backward variables defined in a previous section. These variables can however be expressed in terms of the forward and backward variables.
  • First one of those variables is defined as the probability of being in state i at t=t and in state j at t=t+1. Formally,
  • $\xi_t(i,j) = p\{q_t = i, q_{t+1} = j \mid O, \lambda\}$  (1.10), which is the same as $\xi_t(i,j) = \frac{p\{q_t = i, q_{t+1} = j, O \mid \lambda\}}{p\{O \mid \lambda\}}$  (1.11)
  • Using forward and backward variables this can be expressed as,
  • $\xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, \beta_{t+1}(j)\, b_j(o_{t+1})}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, \beta_{t+1}(j)\, b_j(o_{t+1})}$  (1.12)
  • The second variable is the a posteriori probability,

  • $\gamma_t(i) = p\{q_t = i \mid O, \lambda\}$  (1.13)
  • that is the probability of being in state i at t=t, given the observation sequence and the model. In forward and backward variables this can be expressed by,
  • $\gamma_t(i) = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{i=1}^{N} \alpha_t(i)\, \beta_t(i)}$  (1.14)
  • One can see that the relationship between γt(i) and ξt(i,j) is given by,
  • $\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i,j), \quad 1 \le i \le N,\ 1 \le t \le T-1$  (1.15)
  • Now it is possible to describe the Baum-Welch learning process, in which the parameters of the HMM are updated so as to maximize the quantity $p\{O \mid \lambda\}$. Assuming a starting model $\lambda=(A,B,\pi)$, we calculate the $\alpha$'s and $\beta$'s using the recursions 1.2 and 1.5, and then the $\xi$'s and $\gamma$'s using 1.12 and 1.15. The next step is to update the HMM parameters according to eqns. 1.16 to 1.18, known as the re-estimation formulas.
  • $\bar{\pi}_i = \gamma_1(i), \quad 1 \le i \le N$  (1.16); $\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \quad 1 \le i \le N,\ 1 \le j \le N$  (1.17); $\bar{b}_j(k) = \frac{\sum_{t=1,\, o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}, \quad 1 \le j \le N,\ 1 \le k \le M$  (1.18)
  • These re-estimation formulas can easily be modified to deal with the continuous density case too.
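One Baum-Welch iteration for a discrete HMM, following eqns. 1.12, 1.14 and 1.16 to 1.18, can be sketched as below. The function name `baum_welch_step`, the toy model, and the single training sequence are assumptions for illustration; a practical version would add scaling and handle several sequences.

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One re-estimation step; returns the new (A, B, pi) and the old likelihood."""
    obs = np.asarray(obs)
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(T - 1):                        # forward recursion, eqn. 1.2
        alpha[t + 1] = B[:, obs[t + 1]] * (alpha[t] @ A)
    for t in range(T - 2, -1, -1):                # backward recursion, eqn. 1.5
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    L = alpha[-1].sum()                           # p{O | lambda}, eqn. 1.3
    gamma = alpha * beta / L                      # eqn. 1.14
    # xi[t, i, j] = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / L, eqn. 1.12
    emit_next = B[:, obs[1:]].T * beta[1:]        # shape (T-1, N)
    xi = alpha[:-1, :, None] * A[None] * emit_next[:, None, :] / L
    new_pi = gamma[0]                                                # eqn. 1.16
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]         # eqn. 1.17
    new_B = np.stack([gamma[obs == k].sum(axis=0)
                      for k in range(B.shape[1])], axis=1)
    new_B /= gamma.sum(axis=0)[:, None]                              # eqn. 1.18
    return new_A, new_B, new_pi, L

# Hypothetical toy model and training sequence.
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
obs = [0, 1, 2, 1, 0, 2, 2, 1]

A1, B1, pi1, L0 = baum_welch_step(A, B, pi, obs)
_, _, _, L1 = baum_welch_step(A1, B1, pi1, obs)   # L1 is the likelihood after one update
```

Repeating the step until the likelihood stops increasing gives the iterative procedure described above; the updated rows of `A1` and `B1` remain valid probability distributions by construction.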
  • Gradient based method: In the gradient based method, any parameter Θ of the HMM λ is updated according to the standard formula,
  • $\Theta^{new} = \Theta^{old} - \eta \left[\frac{\partial J}{\partial \Theta}\right]_{\Theta = \Theta^{old}}$  (1.19)
  • where J is a quantity to be minimized. We define in this case,

  • $J = E_{ML} = -\log(p\{O \mid \lambda\}) = -\log(L_{tot})$  (1.20)
  • Since the minimization of J=EML is equivalent to the maximization of Ltot, eqn. 1.19 yields the required optimization criterion, ML. But the problem is to find the derivative
  • $\frac{\partial J}{\partial \Theta}$
  • for any parameter Θ of the model. This can be easily done by relating J to model parameters via Ltot. As a key step to do so, using the eqns. 1.7 and 1.9 we can obtain,
  • $L_{tot} = \sum_{i=1}^{N} p\{O, q_t = i \mid \lambda\} = \sum_{i=1}^{N} \alpha_t(i)\, \beta_t(i)$  (1.21)
  • Differentiating the last equality in eqn. 1.20 with respect to an arbitrary parameter Θ,
  • $\frac{\partial J}{\partial \Theta} = -\frac{1}{L_{tot}} \frac{\partial L_{tot}}{\partial \Theta}$  (1.22)
  • Eqn. 1.22 gives $\partial J / \partial \Theta$ if we know $\partial L_{tot} / \partial \Theta$, which can be found using eqn. 1.21. However this derivative is specific to the actual parameter concerned. Since there are two main parameter sets in the HMM, namely transition probabilities $a_{ij}$, $1 \le i, j \le N$, and observation probabilities $b_j(k)$, $1 \le j \le N$, $1 \le k \le M$, we can find the derivative $\partial L_{tot} / \partial \Theta$ for each of the parameter sets and hence the gradient $\partial J / \partial \Theta$.
  • Gradient with respect to transition probabilities: Using the chain rule,
  • $\frac{\partial L_{tot}}{\partial a_{ij}} = \sum_{t=1}^{T} \frac{\partial L_{tot}}{\partial \alpha_t(j)} \frac{\partial \alpha_t(j)}{\partial a_{ij}}$  (1.23)
  • By differentiating eqn. 1.21 with respect to $\alpha_t(j)$ we get,
  • $\frac{\partial L_{tot}}{\partial \alpha_t(j)} = \beta_t(j),$  (1.24)
  • and differentiating (a time shifted version of) eqn. 1.2 with respect to $a_{ij}$,
  • $\frac{\partial \alpha_t(j)}{\partial a_{ij}} = b_j(o_t)\, \alpha_{t-1}(i)$  (1.25)
  • Eqns. 1.23, 1.24 and 1.25 give $\partial L_{tot} / \partial a_{ij}$, and substituting this quantity in eqn. 1.22 (keeping in mind that $\Theta = a_{ij}$ in this case), we get the required result,
  • $\frac{\partial J}{\partial a_{ij}} = -\frac{1}{L_{tot}} \sum_{t=1}^{T} \beta_t(j)\, b_j(o_t)\, \alpha_{t-1}(i)$  (1.26)
  • Gradient with respect to observation probabilities: Using the chain rule,
  • $\frac{\partial L_{tot}}{\partial b_j(o_t)} = \frac{\partial L_{tot}}{\partial \alpha_t(j)} \frac{\partial \alpha_t(j)}{\partial b_j(o_t)}$  (1.27)
  • Differentiating (a time shifted version of) eqn. 1.2 with respect to $b_j(o_t)$,
  • $\frac{\partial \alpha_t(j)}{\partial b_j(o_t)} = \frac{\alpha_t(j)}{b_j(o_t)}$  (1.28)
  • Finally we get the required derivative by substituting for $\partial L_{tot} / \partial b_j(o_t)$ in eqn. 1.22 (keeping in mind that $\Theta = b_j(o_t)$ in this case), which is obtained by substituting eqns. 1.28 and 1.24 in eqn. 1.27.
  • $\frac{\partial J}{\partial b_j(o_t)} = -\frac{1}{L_{tot}} \frac{\alpha_t(j)\, \beta_t(j)}{b_j(o_t)}$  (1.29)
  • Usually this is given the following form, by first substituting for $L_{tot}$ from eqn. 1.21 and then substituting from eqn. 1.14.
  • $\frac{\partial J}{\partial b_j(o_t)} = -\frac{\gamma_t(j)}{b_j(o_t)},$  (1.30)
  • If continuous densities are used, then $\partial J / \partial c_{jm}$, $\partial J / \partial \mu_{jm}$, and $\partial J / \partial \Sigma_{jm}$ can be found by further propagating the derivative $\partial J / \partial b_j(o_t)$ using the chain rule. The same method can be used to propagate the derivative (if necessary) to a front end processor of the HMM. This will be discussed in detail later.
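The ML gradients of eqns. 1.26 and 1.29 can be computed directly from the forward and backward variables. The sketch below is illustrative only: the function names and toy model are assumptions, the parameters are treated as unconstrained (no renormalization of rows), and the sum in eqn. 1.26 starts at the second observation because $\alpha_{t-1}$ does not exist for the first one. For a discrete HMM the per-time derivative of eqn. 1.29 is accumulated into the entry $b_j(k)$ for which $o_t = v_k$.

```python
import numpy as np

def forward_backward(A, B, pi, obs):
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(T - 1):
        alpha[t + 1] = B[:, obs[t + 1]] * (alpha[t] @ A)
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return alpha, beta

def neg_log_lik(A, B, pi, obs):
    """J = E_ML = -log(L_tot), eqn. 1.20."""
    alpha, _ = forward_backward(A, B, pi, obs)
    return -np.log(alpha[-1].sum())

def ml_gradients(A, B, pi, obs):
    """dJ/da_ij (eqn. 1.26) and dJ/db_j(k) (eqn. 1.29 summed over matching t)."""
    alpha, beta = forward_backward(A, B, pi, obs)
    L = alpha[-1].sum()
    grad_A = np.zeros_like(A)
    for t in range(1, len(obs)):
        # entry [i, j] accumulates alpha_{t-1}(i) * b_j(o_t) * beta_t(j)
        grad_A += np.outer(alpha[t - 1], B[:, obs[t]] * beta[t])
    grad_B = np.zeros_like(B)
    for t, o in enumerate(obs):
        grad_B[:, o] += alpha[t] * beta[t] / B[:, o]
    return -grad_A / L, -grad_B / L

# Hypothetical toy model.
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
obs = [0, 1, 2, 1]

grad_A, grad_B = ml_gradients(A, B, pi, obs)
eta = 0.01
A_new = A - eta * grad_A   # eqn. 1.19; rows would still need renormalizing
```

Because eqn. 1.19 ignores the constraint that probabilities sum to one, a practical gradient method would either renormalize after each update or reparameterize the probabilities (for example through a softmax).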
  • Maximum Mutual Information (MMI) criterion: In ML we optimize an HMM of only one class at a time, and do not touch the HMMs for other classes at that time. This procedure does not involve the concept of “discrimination”, which is of great interest in Pattern Recognition. Thus the ML learning procedure gives the HMM system a poor discrimination ability, especially when the estimated parameters (in the training phase) of the HMM system do not match the inputs used in the recognition phase. Such mismatches can arise for two reasons. One is that the training and recognition data may have considerably different statistical properties, and the other is the difficulty of obtaining reliable parameter estimates in training.
  • The MMI criterion, on the other hand, considers the HMMs of all the classes simultaneously during training. Parameters of the correct model are updated to enhance its contribution to the observations, while parameters of the alternative models are updated to reduce their contributions. This procedure gives a high discriminative ability to the system, and thus MMI belongs to the so-called “discriminative training” category. Consider a set of HMMs $\Lambda = \{\lambda_v, 1 \le v \le V\}$. The task is to minimize the conditional uncertainty of a class $v$ of utterances given an observation sequence $\hat{O}$ of that class. This is equivalent to minimizing the conditional information,

  • I(v|Ô,Λ)=−log p{v|Ô,Λ} with respect to Λ  (1.31)
  • In an information theoretical framework this leads to the minimization of conditional entropy, defined as the expectation ($E(\cdot)$) of the conditional information $I$,

  • $H(V \mid O) = E[I(v \mid \hat{O})]$  (1.32)
  • where V represents all the classes and O represents all the observation sequences. Then the mutual information between the classes and observations,

  • $I(V, O) = H(V) - H(V \mid O)$  (1.33)
  • becomes maximized, provided $H(V)$ is constant. This is the reason for calling it the Maximum Mutual Information (MMI) criterion. The other name of the method, Maximum A Posteriori (MAP), has its roots in eqn. 1.31, where the a posteriori probability $p\{v \mid \hat{O}, \Lambda\}$ is maximized.
  • Even though eqn. 1.31 defines the MMI criterion, it can be rearranged using Bayes' theorem to obtain better insight, as in eqn. 1.34.
  • $E_{MMI} = -\log p\{v \mid \hat{O}, \Lambda\} = -\log \frac{p\{v, \hat{O} \mid \Lambda\}}{p\{\hat{O} \mid \Lambda\}} = -\log \frac{p\{v, \hat{O} \mid \Lambda\}}{\sum_{w} p\{w, \hat{O} \mid \Lambda\}}$  (1.34)
  • where $w$ represents an arbitrary class.
  • If we use an analogous notation as in eqn. 1.9, we can write the likelihoods,
  • $L_{tot}^{clamped} = p\{v, \hat{O} \mid \Lambda\}$  (1.35); $L_{tot}^{free} = \sum_{w} p\{w, \hat{O} \mid \Lambda\}$  (1.36)
  • In the above equations the superscripts clamped and free are used to imply the correct class and all the other classes respectively. If we substitute eqns. 1.35 and 1.36 in the eqn. 1.34, we get,
  • $E_{MMI} = -\log \frac{L_{tot}^{clamped}}{L_{tot}^{free}}$  (1.37)
  • As in the case of ML, re-estimation or gradient methods can be used to minimize the quantity $E_{MMI}$. In the following, a gradient based method, which again makes use of eqn. 1.19, is described.
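  • Since $E_{MMI}$ is to be minimized, in this case $J = E_{MMI}$, and therefore $J$ is directly given by eqn. 1.37. The problem then simplifies to the calculation of the gradients $\partial J / \partial \Theta$, where $\Theta$ is an arbitrary parameter of the whole set of HMMs, $\Lambda$. This can be done by differentiating 1.37 with respect to $\Theta$,
  • $\frac{\partial J}{\partial \Theta} = \frac{1}{L_{tot}^{free}} \frac{\partial L_{tot}^{free}}{\partial \Theta} - \frac{1}{L_{tot}^{clamped}} \frac{\partial L_{tot}^{clamped}}{\partial \Theta}$  (1.38)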
  • The same technique, as in the case of ML, can be used to compute the gradients of the likelihoods with respect to the parameters. As a first step likelihoods from eqns. 1.35 and 1.36, are expressed in terms of forward and backward variables using the form as in eqn. 1.7.
  • $L_{tot}^{clamped} = \sum_{i \in \text{class } v} \alpha_t(i)\, \beta_t(i)$  (1.39); $L_{tot}^{free} = \sum_{w} \sum_{i \in \text{class } w} \alpha_t(i)\, \beta_t(i)$  (1.40)
  • Then the required gradients can be found by differentiating eqns. 1.39 and 1.40. But we consider two cases; one for the transition probabilities and another for the observation probabilities, similar to the case of ML.
  • Gradient with respect to transition probabilities
  • Using the chain rule for any of the likelihoods, free or clamped,
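  • $\frac{\partial L_{tot}^{(\cdot)}}{\partial a_{ij}} = \sum_{t=1}^{T} \frac{\partial L_{tot}^{(\cdot)}}{\partial \alpha_t(j)} \frac{\partial \alpha_t(j)}{\partial a_{ij}}$  (1.41)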
  • Differentiating eqns. 1.39 and 1.40 with respect to αt(j), to get two results for free and clamped cases and using the common result in eqn. 1.25, we get substitutions for both terms on the right hand side of eqn. 1.41. This substitution yields two separate results for free and clamped cases.
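  • $\frac{\partial L_{tot}^{clamped}}{\partial a_{ij}} = \delta_{kv} \sum_{t=1}^{T} \beta_t(j)\, b_j(o_t)\, \alpha_{t-1}(i), \quad i \in \text{class } k$  (1.42)
  • where $\delta_{kv}$ is a Kronecker delta.
  • $\frac{\partial L_{tot}^{free}}{\partial a_{ij}} = \sum_{t=1}^{T} \beta_t(j)\, b_j(o_t)\, \alpha_{t-1}(i)$  (1.43)
  • Substitution of eqns. 1.42 and 1.43 in eqn. 1.38 (keeping in mind that $\Theta = a_{ij}$ in this case) gives the required result,
  • $\frac{\partial J}{\partial a_{ij}} = \left[\frac{1}{L_{tot}^{free}} - \frac{\delta_{kv}}{L_{tot}^{clamped}}\right] \sum_{t=1}^{T} \beta_t(j)\, b_j(o_t)\, \alpha_{t-1}(i),$  (1.44)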
  • Gradient with respect to observation probabilities:
  • Using the chain rule for any of the likelihoods, free or clamped,
  • $\frac{\partial L_{tot}^{(\cdot)}}{\partial b_j(o_t)} = \frac{\partial L_{tot}^{(\cdot)}}{\partial \alpha_t(j)} \frac{\partial \alpha_t(j)}{\partial b_j(o_t)}$  (1.45)
  • Differentiating eqns. 1.39 and 1.40 with respect to αt(j), to get two results for free and clamped cases, and using the common result in eqn. 1.28, we get substitutions for both terms on the right hand side of eqn. 1.45. This substitution yields two separate results for free and clamped cases.
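  • $\frac{\partial L_{tot}^{clamped}}{\partial b_j(o_t)} = \delta_{kv} \frac{\alpha_t(j)\, \beta_t(j)}{b_j(o_t)}, \quad j \in \text{class } k$  (1.46)
  • where $\delta_{kv}$ is a Kronecker delta. And
  • $\frac{\partial L_{tot}^{free}}{\partial b_j(o_t)} = \frac{\alpha_t(j)\, \beta_t(j)}{b_j(o_t)}$  (1.47)
  • Substituting eqns. 1.46 and 1.47 in eqn. 1.38, we get the required result,
  • $\frac{\partial J}{\partial b_j(o_t)} = \left[\frac{1}{L_{tot}^{free}} - \frac{\delta_{kv}}{L_{tot}^{clamped}}\right] \frac{\alpha_t(j)\, \beta_t(j)}{b_j(o_t)}, \quad j \in \text{class } k$  (1.48)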
  • This equation can be given a more aesthetic form by defining,
  • $\gamma_t(j)^{clamped} = \delta_{kv} \frac{\alpha_t(j)\, \beta_t(j)}{L_{tot}^{clamped}}, \quad j \in \text{class } k$  (1.49)
  • where $\delta_{kv}$ is a Kronecker delta, and
  • $\gamma_t(j)^{free} = \frac{\alpha_t(j)\, \beta_t(j)}{L_{tot}^{free}}$  (1.50)
  • With these variables we express eqn. 1.48 in the following form.
  • $\frac{\partial J}{\partial b_j(o_t)} = \frac{1}{b_j(o_t)} \left[\gamma_t(j)^{free} - \gamma_t(j)^{clamped}\right]$  (1.51)
  • This equation completely defines the update of observation probabilities. If however continuous densities are used, then we can further propagate this derivative using the chain rule, in exactly the same way as mentioned in the case of ML. Similar comments are valid also for preprocessors.
  • Training: We assume that the preprocessing part of the system gives out a sequence of observation vectors $O = \{o_1, o_2, \ldots, o_N\}$. Starting from a certain set of values, the parameters of each of the HMMs $\lambda_i$, $1 \le i \le N$, can be updated as given by eqn. 1.19, while the required gradients will be given by eqns. 1.44 and 1.48. However for this particular case, isolated recognition, the likelihoods in the last two equations are calculated in a peculiar way. First consider the clamped case. Since we have an HMM for each class of units in isolated recognition, we can select the model $\lambda_l$ of the class $l$ to which the current observation sequence $O^l$ belongs. Then starting from eqn. 1.39,
  • $L_{tot}^{clamped} = L_l^l = \sum_{i \in \lambda_l} \alpha_t(i)\, \beta_t(i) = \sum_{i \in \lambda_l} \alpha_T(i)$  (1.52)
  • where the second line follows from eqn. 1.3.
  • Similarly for the free case, starting from eqn. 1.40,
  • $L_{tot}^{free} = \sum_{m=1}^{N} L_m^l = \sum_{m=1}^{N} \left[\sum_{i \in \lambda_m} \alpha_t(i)\, \beta_t(i)\right] = \sum_{m=1}^{N} \sum_{i \in \lambda_m} \alpha_T(i)$  (1.53)
  • where Lm l represents the likelihood of the current observation sequence belonging to class l, in the model λm. With those likelihoods defined in eqns. 1.52 and 1.53, the gradient giving equations 1.44 and 1.48 will take the forms,
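  • $\frac{\partial J}{\partial a_{ij}} = \left[\frac{1}{\sum_{m=1}^{N} L_m^l} - \frac{\delta_{kl}}{L_l^l}\right] \sum_{t=1}^{T} \beta_t(j)\, b_j(o_t)\, \alpha_{t-1}(i), \quad i, j \in \lambda_k$  (1.54)
  • $\frac{\partial J}{\partial b_j(o_t)} = \left[\frac{1}{\sum_{m=1}^{N} L_m^l} - \frac{\delta_{kl}}{L_l^l}\right] \frac{\alpha_t(j)\, \beta_t(j)}{b_j(o_t)}, \quad j \in \lambda_k$  (1.55)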
  • Now we can summarize the training procedure as follows.
  • (1) Initialize each HMM, $\lambda_i = (A_i, B_i, \pi_i)$, $1 \le i \le N$, with values generated randomly or using an initialization algorithm like segmental K means [˜gerjanos/HMM/node19.html-r4#r4].
  • (2) Take an observation sequence and: Calculate the forward and backward probabilities for each HMM, using the recursions 1.5 and 1.2; Using the equations 1.52 and 1.53 calculate the likelihoods; Using the equations 1.54 and 1.55 calculate the gradients with respect to parameters for each model; Update parameters in each of the models using the eqn. 1.19.
  • (3) Go to step (2), unless all the observation sequences are considered.
  • (4) Repeat step (2) to (3) until a convergence criterion is satisfied.
  • This procedure can easily be modified if continuous density HMMs are used, by propagating the gradients via the chain rule to the parameters of the continuous probability distributions. Further, it is worth mentioning that preprocessors can also be trained simultaneously, with such a further back propagation.
  • Recognition: Compared with training, recognition is much simpler, and the procedure is given below.
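For isolated recognition with one discrete HMM per class, the MMI criterion (eqn. 1.37 with the likelihoods of eqns. 1.52 and 1.53) and the observation-probability gradient of eqn. 1.55 can be sketched as below. The names `lik_and_dL_dB` and `mmi_loss_and_grads`, the two toy models, and the choice to accumulate the per-time derivative into $b_j(k)$ for $o_t = v_k$ are assumptions made for this illustration.

```python
import numpy as np

def lik_and_dL_dB(A, B, pi, obs):
    """Likelihood L_m^l of one model and dL/db_j(k) (eqn. 1.47 summed over t)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(T - 1):
        alpha[t + 1] = B[:, obs[t + 1]] * (alpha[t] @ A)
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    dL = np.zeros_like(B)
    for t, o in enumerate(obs):
        dL[:, o] += alpha[t] * beta[t] / B[:, o]
    return alpha[-1].sum(), dL

def mmi_loss_and_grads(models, obs, label):
    """E_MMI and dJ/dB for every model k, with prefactor [1/L_free - delta_kl/L_clamped]."""
    results = [lik_and_dL_dB(A, B, pi, obs) for A, B, pi in models]
    liks = np.array([r[0] for r in results])
    L_clamped, L_free = liks[label], liks.sum()      # eqns. 1.52 and 1.53
    loss = -np.log(L_clamped / L_free)               # eqn. 1.37
    grads = [(1.0 / L_free - (k == label) / L_clamped) * r[1]
             for k, r in enumerate(results)]
    return loss, grads

# Two hypothetical class models sharing the same structure.
models = [
    (np.array([[0.7, 0.3], [0.4, 0.6]]),
     np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]),
     np.array([0.6, 0.4])),
    (np.array([[0.5, 0.5], [0.2, 0.8]]),
     np.array([[0.1, 0.3, 0.6], [0.3, 0.3, 0.4]]),
     np.array([0.5, 0.5])),
]
obs, label = [0, 1, 0, 2], 0

loss, grads = mmi_loss_and_grads(models, obs, label)
```

The sign of the prefactor shows the discriminative behaviour described above: a gradient step (eqn. 1.19) raises the likelihood of the correct model and lowers that of the competing models.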
  • (1) Take an observation sequence to be recognized and (2) Calculate the forward and backward probabilities for each HMM, using the recursions 1.5 and 1.2; As in the equation 1.53 calculate the likelihoods, Lm l, 1≦m≦N; The recognized class l*, to which the observation sequence belongs, is given by
  • $l^* = \arg\max_{1 \le m \le N} L_m^l;$
  • (3) Go to step (2), unless all the observation sequences to be recognized are considered. The recognition rate in this case can be calculated as the ratio between number of correctly recognized speech units and total number of speech units (observation sequences) to be recognized.
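The recognition procedure reduces to evaluating each class model on the sequence and taking the arg max, after which the recognition rate is a simple ratio. The sketch below uses only the forward recursion; the function names, the two toy models, and the labelled test sequences are hypothetical.

```python
import numpy as np

def likelihood(A, B, pi, obs):
    """p{O | lambda} via the forward recursion (eqns. 1.2-1.3)."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = B[:, o] * (alpha @ A)
    return alpha.sum()

def recognize(models, obs):
    """Index of the model with the highest likelihood for this sequence."""
    scores = [likelihood(A, B, pi, obs) for A, B, pi in models]
    return int(np.argmax(scores))

# Hypothetical models: model 0 favours symbol 0, model 1 favours symbol 1.
A = np.array([[0.7, 0.3], [0.4, 0.6]])
pi = np.array([0.5, 0.5])
models = [
    (A, np.array([[0.9, 0.1], [0.8, 0.2]]), pi),
    (A, np.array([[0.1, 0.9], [0.2, 0.8]]), pi),
]

# Labelled test sequences: (observation sequence, true class).
test_set = [([0, 0, 1, 0], 0), ([1, 1, 1, 0], 1), ([0, 0, 0, 0], 0)]
predictions = [recognize(models, seq) for seq, _ in test_set]
correct = sum(p == y for p, (_, y) in zip(predictions, test_set))
recognition_rate = correct / len(test_set)
```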
  • Use of Fourier transform in pre-processing: The Hartley Transform is an integral transform which shares some features with the Fourier Transform, but which (in the discrete case), multiplies the kernel by
  • $\cos\left(\frac{2\pi kn}{N}\right) - \sin\left(\frac{2\pi kn}{N}\right)$  (1.56)
  • instead of
  • $e^{-2\pi i kn/N} = \cos\left(\frac{2\pi kn}{N}\right) - i \sin\left(\frac{2\pi kn}{N}\right).$  (1.57)
  • The Hartley transform produces real output for a real input, and is its own inverse. It therefore can have computational advantages over the discrete Fourier transform, although analytic expressions are usually more complicated for the Hartley transform. The discrete version of the Hartley transform can be written explicitly as
  • $\mathcal{H}[a] \equiv \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} a_n \left[\cos\left(\frac{2\pi kn}{N}\right) - \sin\left(\frac{2\pi kn}{N}\right)\right]$  (1.58)
  • $= \Re\,\mathcal{F}[a] - \Im\,\mathcal{F}[a]$  (1.59)
  • where $\mathcal{F}$ denotes the Fourier Transform. The Hartley transform obeys the convolution property
  • $\mathcal{H}[a * b]_k = \tfrac{1}{2}\left(A_k B_k - \bar{A}_k \bar{B}_k + A_k \bar{B}_k + \bar{A}_k B_k\right)$  (1.60)
  • where
  • $\bar{a}_0 \equiv a_0$  (1.61)
  • $\bar{a}_{n/2} \equiv a_{n/2}$  (1.62)
  • $\bar{a}_k \equiv a_{n-k}$  (1.63)
  • (Arndt). Like the fast Fourier Transform algorithm, there is a “fast” version of the Hartley transform algorithm. A decimation in time algorithm makes use of

  • $\mathcal{H}_n^{left}[a] = \mathcal{H}_{n/2}[a^{even}] + \chi\, \mathcal{H}_{n/2}[a^{odd}]$  (1.64)

  • $\mathcal{H}_n^{right}[a] = \mathcal{H}_{n/2}[a^{even}] - \chi\, \mathcal{H}_{n/2}[a^{odd}]$  (1.65)
  • where $\chi$ denotes the sequence with elements
  • $a_n \cos\left(\frac{\pi n}{N}\right) - \bar{a}_n \sin\left(\frac{\pi n}{N}\right)$  (1.66)
  • A decimation in frequency algorithm makes use of

  • $\mathcal{H}_n^{even}[a] = \mathcal{H}_{n/2}[a^{left} + a^{right}]$  (1.67)

  • $\mathcal{H}_n^{odd}[a] = \mathcal{H}_{n/2}[\chi(a^{left} - a^{right})]$  (1.68)
  • The discrete Fourier transform
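  • $A_k \equiv \mathcal{F}[a] = \sum_{n=0}^{N-1} e^{-2\pi i kn/N}\, a_n$  (1.69)
  • can be written
  • $\begin{bmatrix} A_k \\ A_{-k} \end{bmatrix} = \sum_{n=0}^{N-1} \begin{bmatrix} e^{-2\pi i kn/N} & 0 \\ 0 & e^{2\pi i kn/N} \end{bmatrix} \begin{bmatrix} a_n \\ a_n \end{bmatrix}$  (1.70)
  • $= \sum_{n=0}^{N-1} \frac{1}{2} \begin{bmatrix} 1-i & 1+i \\ 1+i & 1-i \end{bmatrix} \begin{bmatrix} \cos\left(\frac{2\pi kn}{N}\right) & \sin\left(\frac{2\pi kn}{N}\right) \\ -\sin\left(\frac{2\pi kn}{N}\right) & \cos\left(\frac{2\pi kn}{N}\right) \end{bmatrix} \frac{1}{2} \begin{bmatrix} 1+i & 1-i \\ 1-i & 1+i \end{bmatrix} \begin{bmatrix} a_n \\ a_n \end{bmatrix}$  (1.71)
  • so $F = T^{-1} H T$. See,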
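The properties stated above can be checked numerically. The sketch below builds the discrete Hartley transform from the kernel of eqn. 1.58 with the $1/\sqrt{N}$ normalization, confirms that it is real and its own inverse, and verifies the 2×2 similarity relation behind eqn. 1.71. The function names and the test vector are assumptions. Note that NumPy's forward DFT uses the kernel $e^{-2\pi i kn/N}$; with that convention the $\cos - \sin$ kernel equals the real part plus the imaginary part of the DFT, and with the opposite sign convention for the DFT the same quantity reads as real part minus imaginary part.

```python
import numpy as np

def hartley_matrix(N):
    """Kernel [cos(2 pi k n / N) - sin(2 pi k n / N)] / sqrt(N), eqn. 1.58."""
    k = np.arange(N)
    theta = 2 * np.pi * np.outer(k, k) / N
    return (np.cos(theta) - np.sin(theta)) / np.sqrt(N)

def dht(a):
    """Discrete Hartley transform of a real sequence (O(N^2) matrix form)."""
    a = np.asarray(a, dtype=float)
    return hartley_matrix(len(a)) @ a

a = np.array([1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 4.0])
h = dht(a)
roundtrip = dht(h)                      # applying the transform twice returns a

# Same result from NumPy's FFT (forward kernel e^{-2 pi i k n / N}).
F = np.fft.fft(a)
via_fft = (F.real + F.imag) / np.sqrt(len(a))

# 2x2 check of eqn. 1.71: T^{-1} R(theta) T = diag(e^{-i theta}, e^{+i theta}).
T_mat = 0.5 * np.array([[1 + 1j, 1 - 1j], [1 - 1j, 1 + 1j]])
T_inv = 0.5 * np.array([[1 - 1j, 1 + 1j], [1 + 1j, 1 - 1j]])
theta = 0.7
R = np.array([[np.cos(theta), np.sin(theta)], [-np.sin(theta), np.cos(theta)]])
D = T_inv @ R @ T_mat
```

The matrix form costs $O(N^2)$ operations and is meant only to make the definitions concrete; the decimation formulas above give the fast version.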
  • A Hartley transform based fixed pre-processing may be considered, on some bases, inferior to that based on the Fourier transform. One explanation for this is based on the respective symmetries and shift invariance properties. Therefore we expect improved performance from the Fourier transform even when the pre-processing is adaptive. However a training procedure which preserves the symmetries of the weight distributions must be used. The main argument for the use of the Hartley transform is to avoid complex weights. A Fourier transform, however, can be implemented as a neural network containing real weights, but with a network structure slightly modified from the usual MLP. We can easily derive the equations which give the forward and backward pass.
  • Forward pass is given by,
  • $\left[\sum_{i=0}^{N-1} x_t(i) \cos\left(\frac{2\pi ij}{N}\right)\right]^2 + \left[\sum_{i=0}^{N-1} x_t(i) \sin\left(\frac{2\pi ij}{N}\right)\right]^2 = \tilde{X}_t^2(j)$  (2.1)
  • where $N$ denotes the window length, and $\tilde{X}_t(j) = |X_t(j)|$.
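  • If we use the notation $\theta_{ij} = \frac{2\pi ij}{N}$, and the error is denoted by $J$, then we can find $\partial J / \partial \theta_{ij}$ simply by using the chain rule,
  • $\frac{\partial J}{\partial \theta_{ij}} = \sum_{t=1}^{T} \frac{\partial J}{\partial \tilde{X}_t^2(j)} \frac{\partial \tilde{X}_t^2(j)}{\partial \theta_{ij}}$  (2.2)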
  • We assume that $\partial J / \partial \tilde{X}_t^2(j)$ is known, and $\partial \tilde{X}_t^2(j) / \partial \theta_{ij}$ can simply be found by differentiating eqn. 2.1 with respect to $\theta_{ij}$. Thus we get,
  • $\frac{\partial \tilde{X}_t^2(j)}{\partial \theta_{ij}} = 2 x_t(i) \cos(\theta_{ij}) \sum_{k=0}^{N-1} x_t(k) \sin(\theta_{kj}) - 2 x_t(i) \sin(\theta_{ij}) \sum_{k=0}^{N-1} x_t(k) \cos(\theta_{kj})$  (2.3)
  • Eqns. 2.2 and 2.3 define the backward pass. Note that θij can be further back propagated as usual. Training procedure which preserves symmetry: We can use a training procedure which preserves symmetrical distribution of weights in the Hartley or Fourier transform stages. In addition to the improved shift invariance, this approach can lead to parameter reduction. The procedure starts by noting the equal weights at initialization. Then the forward and backward passes are performed as usual. But in updating we use the same weight update for all the equal weights, namely the average value of all the weight updates corresponding to the equal weights. In this way we can preserve any existing symmetry in the initial weight distributions. At the same time number of parameters is reduced because only one parameter is needed to represent the whole class of equal weights.
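The forward pass of eqn. 2.1, the derivative of eqn. 2.3, and the symmetry-preserving update can be sketched together. This is an illustration under stated assumptions: a single frame `x` stands in for $x_t$, the angles are stored as an N×N array `theta[i, j]`, and weights are treated as "equal" when they share the same value of $(i \cdot j) \bmod N$, which is one plausible reading of the tying rule rather than the thesis' exact scheme.

```python
import numpy as np

def power_spectrum(x, theta):
    """Forward pass, eqn. 2.1: returns X~^2(j) for every frequency bin j."""
    C = x @ np.cos(theta)     # C[j] = sum_i x(i) cos(theta_ij)
    S = x @ np.sin(theta)     # S[j] = sum_i x(i) sin(theta_ij)
    return C ** 2 + S ** 2

def grad_theta(x, theta):
    """Eqn. 2.3: entry [i, j] is d X~^2(j) / d theta_ij."""
    C = x @ np.cos(theta)
    S = x @ np.sin(theta)
    return 2 * x[:, None] * (np.cos(theta) * S[None, :] - np.sin(theta) * C[None, :])

def tied_update(theta, grad, labels, lr):
    """Apply the average gradient of each group of equal weights to the whole group."""
    flat = labels.ravel()
    mean_grad = np.bincount(flat, weights=grad.ravel()) / np.bincount(flat)
    return theta - lr * mean_grad[flat].reshape(theta.shape)

N = 8
idx = np.arange(N)
labels = np.outer(idx, idx) % N            # weights with equal label start out equal
theta0 = 2 * np.pi * labels / N            # same cos/sin values as 2*pi*i*j/N
x = np.random.default_rng(0).normal(size=N)

spectrum = power_spectrum(x, theta0)       # equals |FFT(x)|^2 at initialization
# Toy error J = sum_j X~^2(j), so dJ/dX~^2(j) = 1 and eqn. 2.2 reduces to eqn. 2.3.
grad = grad_theta(x, theta0)
theta1 = tied_update(theta0, grad, labels, lr=1e-3)
```

After the update, all entries of `theta1` that share a label are still equal, so only one parameter per group needs to be stored.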
  • See, A Hybrid ANN-HMM ASR system with NN based adaptive preprocessing, Narada Dilp Warakagoda, M.Sc. thesis (Norges Tekniske Høgskole, Institutt for Teleteknikk Transmisjonsteknikk),˜gerjanos/HMM/hoved.html.
  • As an alternative to the Hartley transform, a Wavelet transform may be applied.
  • The fast Fourier transform (FFT) and the discrete wavelet transform (DWT) are both linear operations that generate a data structure that contains $\log_2 n$ segments of various lengths, usually filling and transforming it into a different data vector of length $2^n$.
  • The mathematical properties of the matrices involved in the transforms are similar as well. The inverse transform matrix for both the FFT and the DWT is the transpose of the original (the conjugate transpose, in the complex-valued Fourier case). As a result, both transforms can be viewed as a rotation in function space to a different domain. For the FFT, this new domain contains basis functions that are sines and cosines. For the wavelet transform, this new domain contains more complicated basis functions called wavelets, mother wavelets, or analyzing wavelets.
  • Both transforms have another similarity. The basis functions are localized in frequency, making mathematical tools such as power spectra (how much power is contained in a frequency interval) and scalegrams (to be defined later) useful at picking out frequencies and calculating power distributions.
  • The most interesting dissimilarity between these two kinds of transforms is that individual wavelet functions are localized in space. Fourier sine and cosine functions are not. This localization feature, along with wavelets' localization of frequency, makes many functions and operators using wavelets “sparse” when transformed into the wavelet domain. This sparseness, in turn, results in a number of useful applications such as data compression, detecting features in images, and removing noise from time series.
  • One way to see the time-frequency resolution differences between the Fourier transform and the wavelet transform is to look at the basis function coverage of the time-frequency plane.
  • In a windowed Fourier transform, where the window is simply a square wave, the square wave window truncates the sine or cosine function to fit a window of a particular width. Because a single window is used for all frequencies in the WFT, the resolution of the analysis is the same at all locations in the time-frequency plane.
  • An advantage of wavelet transforms is that the windows vary. In order to isolate signal discontinuities, one would like to have some very short basis functions. At the same time, in order to obtain detailed frequency analysis, one would like to have some very long basis functions. A way to achieve this is to have short high-frequency basis functions and long low-frequency ones. This happy medium is exactly what you get with wavelet transforms.
  • One thing to remember is that wavelet transforms do not have a single set of basis functions like the Fourier transform, which utilizes just the sine and cosine functions. Instead, wavelet transforms have an infinite set of possible basis functions. Thus wavelet analysis provides immediate access to information that can be obscured by other time-frequency methods such as Fourier analysis.
  • Wavelet transforms comprise an infinite set. The different wavelet families make different trade-offs between how compactly the basis functions are localized in space and how smooth they are.
  • Some of the wavelet bases have fractal structure. The Daubechies wavelet family is one example.
  • Within each family of wavelets (such as the Daubechies family) are wavelet subclasses distinguished by the number of coefficients and by the level of iteration. Wavelets are classified within a family most often by the number of vanishing moments. This is an extra set of mathematical relationships for the coefficients that must be satisfied, and is directly related to the number of coefficients. For example, within the Coiflet wavelet family are Coiflets with two vanishing moments, and Coiflets with three vanishing moments.
  • The Discrete Wavelet Transform: Dilations and translations of the “Mother function,” or “analyzing wavelet” Φ(x) define an orthogonal basis, our wavelet basis:
  • $$\Phi_{(s,l)}(x) = 2^{-s/2}\,\Phi\left(2^{-s}x - l\right).$$
  • The variables s and l are integers that scale and dilate the mother function Φ(x) to generate wavelets, such as a Daubechies wavelet family. The scale index s indicates the wavelet's width, and the location index l gives its position. Notice that the mother functions are rescaled, or “dilated” by powers of two, and translated by integers. What makes wavelet bases especially interesting is the self-similarity caused by the scales and dilations. Once we know about the mother functions, we know everything about the basis. Note that the scaling-by-two is a feature of the Discrete Wavelet Transform (DWT), and is not, itself, compelled by Wavelet theory. That is, while it is computationally convenient to employ a binary tree, in theory, if one could define a precise wavelet that corresponds to a feature of a data set to be processed, this wavelet could be directly extracted. Clearly, the utility of the DWT is its ability to handle general cases without detailed pattern searching, and therefore the more theoretical wavelet transform techniques based on precise wavelet matching are often reserved for special cases. On the other hand, by carefully selecting wavelet basis functions, or combinations of basis functions, a very sparse representation of a complex and multidimensional data space may be obtained. The utility, however, may depend on being able to operate in the wavelet transform domain (or subsequent transforms of the sparse representation coefficients) for subsequent analysis. Note that, while wavelets are generally represented as two dimensional functions of amplitude and time, it is clear that wavelet theory extends into n-dimensional space. Thus, the advantageous application of wavelet theory is in cases where a modest number of events, for example having associated limited time and space parameters, are represented in a large data space. 
If the events could be extracted with fair accuracy, the data space could be replaced with a vector quantized model (VQM), wherein the extracted events correspond to real events, and wherein the VQM is highly compressed as compared to the raw data space. Further, while there may be some data loss as a result of the VQM expression, if the real data corresponds to the wavelet used to model it, then the VQM may actually serve as a form of error correction. Clearly, in some cases, especially where events are overlapping, the possibility for error occurs. Further, while the DWT is often useful in denoising data, in some cases, noise may be inaccurately represented as an event, while in the raw data space, it might have been distinguished. Thus, one aspect of a denoised DWT representation is that there is an implicit presumption that all remaining elements of the representation matrix are signal.
  • A particular advantage of a DWT approach is that it facilitates a multiresolution analysis of data sets. That is, if the raw data set is decomposed with the basis function, transformed according to a regular progression, e.g., powers of 2, then at each level of decomposition, a level of scale is revealed and presented. It is noted that the transform need not be a simple power of two, and itself may be a function or a complex and/or multidimensional function. Typically, non-standard analyses are reserved for instances where there is, or is believed to be, a physical basis for the application of such functions instead of binary splitting of the data space.
  • Proceeding with the DWT analysis, we span our data domain at different resolutions, see, using the analyzing wavelet in a scaling equation:
  • $$W(x) = \sum_{k=-1}^{N-2} (-1)^{k}\, c_{k+1}\, \Phi(2x + k),$$
  • where W(x) is the scaling function for the mother function Φ(x), and ck are the wavelet coefficients. The wavelet coefficients must satisfy linear and quadratic constraints of the form
  • $$\sum_{k=0}^{N-1} c_k = 2, \qquad \sum_{k=0}^{N-1} c_k\, c_{k+2l} = 2\,\delta_{l,0},$$
  • where δ is the delta function and l is the location index.
  • One of the most useful features of wavelets is the ease with which one can choose the defining coefficients for a given wavelet system to be adapted for a given problem. In Daubechies' original paper, I. Daubechies, “Orthonormal Bases of Compactly Supported Wavelets,” Comm. Pure Appl. Math., Vol 41, 1988, pp. 909-996, she developed specific families of wavelet systems that were very good for representing polynomial behavior. The Haar wavelet is even simpler, and it is often used for educational purposes. (That is, while it may be limited to certain classes of problems, the Haar wavelet often produces comprehensible output which can be generated into graphically pleasing results).
  • It is helpful to think of the coefficients {c0, . . . , cn} as a filter. The filter or coefficients are placed in a transformation matrix, which is applied to a raw data vector. The coefficients are ordered using two dominant patterns, one that works as a smoothing filter (like a moving average), and one pattern that works to bring out the data's “detail” information. These two orderings of the coefficients are called a quadrature mirror filter pair in signal processing parlance. A more detailed description of the transformation matrix can be found in W. Press et al., Numerical Recipes in Fortran, Cambridge University Press, New York, 1992, pp. 498-499, 584-602. To complete the discussion of the DWT, let's look at how the wavelet coefficient matrix is applied to the data vector. The matrix is applied in a hierarchical algorithm, sometimes called a pyramidal algorithm. The wavelet coefficients are arranged so that odd rows contain an ordering of wavelet coefficients that act as the smoothing filter, and the even rows contain an ordering of wavelet coefficients with different signs that act to bring out the data's detail. The matrix is first applied to the original, full-length vector. Then the vector is smoothed and decimated by half and the matrix is applied again. Then the smoothed, halved vector is smoothed, and halved again, and the matrix applied once more. This process continues until a trivial number of “smooth-smooth-smooth . . . ” data remain. Each matrix application brings out a higher resolution of the data while at the same time smoothing the remaining data. The output of the DWT consists of the remaining “smooth (etc.)” components, and all of the accumulated “detail” components.
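  • As an illustration of the pyramidal algorithm just described, the following sketch uses the two-coefficient Haar filter pair, chosen here only because it is the simplest case. The function names, the orthonormal 1/√2 scaling, and the restriction to power-of-two lengths are assumptions of the example, not features of the text above.

```python
import numpy as np

def haar_pyramid(data):
    """Pyramidal Haar DWT of a vector whose length is a power of two.

    Returns (smooth, details): the final smooth component and the list of
    accumulated detail arrays, finest scale first.
    """
    smooth = np.asarray(data, dtype=float)
    n = smooth.size
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    details = []
    while smooth.size > 1:
        even, odd = smooth[0::2], smooth[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # "detail" ordering
        smooth = (even + odd) / np.sqrt(2.0)          # smoothing, decimated by half
    return smooth, details

def haar_pyramid_inverse(smooth, details):
    """Undo haar_pyramid by re-interleaving from the coarsest scale upward."""
    smooth = np.asarray(smooth, dtype=float)
    for d in reversed(details):
        out = np.empty(2 * smooth.size)
        out[0::2] = (smooth + d) / np.sqrt(2.0)
        out[1::2] = (smooth - d) / np.sqrt(2.0)
        smooth = out
    return smooth

signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
smooth, details = haar_pyramid(signal)
restored = haar_pyramid_inverse(smooth, details)
```

  • For an 8-sample input the loop runs three times, leaving one smooth value and detail arrays of length 4, 2 and 1.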
  • The Fast Wavelet Transform: The DWT matrix is not sparse in general, so we face the same complexity issues that we had previously faced for the discrete Fourier transform. Wickerhauser, Adapted Wavelet Analysis from Theory to Software, AK Peters, Boston, 1994, pp. 213-214, 237, 273-274, 387. We solve it as we did for the FFT, by factoring the DWT into a product of a few sparse matrices using self-similarity properties. The result is an algorithm that requires only order n operations to transform an n-sample vector. This is the “fast” DWT of Mallat and Daubechies.
  • Wavelet Packets: The wavelet transform is actually a subset of a far more versatile transform, the wavelet packet transform. M. A. Cody, “The Wavelet Packet Transform,” Dr. Dobb's Journal, Vol 19, April 1994, pp. 44-46, 50-54. Wavelet packets are particular linear combinations of wavelets. V. Wickerhauser, Adapted Wavelet Analysis from Theory to Software, AK Peters, Boston, 1994, pp. 213-214, 237, 273-274, 387. They form bases which retain many of the orthogonality, smoothness, and localization properties of their parent wavelets. The coefficients in the linear combinations are computed by a recursive algorithm making each newly computed wavelet packet coefficient sequence the root of its own analysis tree.
  • Adapted Waveforms: Because we have a choice among an infinite set of basis functions, we may wish to find the best basis function for a given representation of a signal. Wickerhauser, Id. A basis of adapted waveform is the best basis function for a given signal representation. The chosen basis carries substantial information about the signal, and if the basis description is efficient (that is, very few terms in the expansion are needed to represent the signal), then that signal information has been compressed.
  • According to Wickerhauser, Id., some desirable properties for adapted wavelet bases are
  • 1. speedy computation of inner products with the other basis functions;
  • 2. speedy superposition of the basis functions;
  • 3. good spatial localization, so researchers can identify the position of a signal that is contributing a large component;
  • 4. good frequency localization, so researchers can identify signal oscillations; and
  • 5. independence, so that not too many basis elements match the same portion of the signal.
  • For adapted waveform analysis, researchers seek a basis in which the coefficients, when rearranged in decreasing order, decrease as rapidly as possible. To measure rates of decrease, they use tools from classical harmonic analysis, including calculation of information cost functions. An information cost function is defined as the expense of storing the chosen representation. Examples of such functions include the number of coefficients above a threshold, concentration, entropy, logarithm of energy, Gauss-Markov calculations, and the theoretical dimension of a sequence.
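  • Two of the information cost functions named above can be sketched as follows. The function names and the specific entropy definition (Shannon entropy of the normalized coefficient energies) are assumptions made for illustration; other normalizations are possible.

```python
import numpy as np

def count_above_threshold(coeffs, threshold):
    """Number of coefficients whose magnitude exceeds the threshold."""
    return int(np.sum(np.abs(np.asarray(coeffs, dtype=float)) > threshold))

def entropy_cost(coeffs):
    """Shannon entropy of the normalized energies p_i = c_i**2 / sum(c**2)."""
    energy = np.asarray(coeffs, dtype=float) ** 2
    total = energy.sum()
    if total == 0.0:
        return 0.0
    p = energy[energy > 0] / total
    return float(-np.sum(p * np.log(p)))

concentrated = np.array([10.0, 0.1, 0.1, 0.1])  # energy in one coefficient
spread = np.array([5.0, 5.0, 5.0, 5.0])         # energy spread evenly
print(entropy_cost(concentrated), entropy_cost(spread))
```

  • A basis in which the signal has the lower cost is the cheaper representation; here the concentrated coefficient set has a much lower entropy than the evenly spread one.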
  • Multiresolution analysis results from the embedded subsets generated by the interpolations at different scales.
  • A function f(x) is projected at each step j onto the subset $V_j$. This projection is defined by the scalar product $c_j(k)$ of f(x) with the scaling function φ(x), which is dilated and translated: $c_j(k) = \langle f(x),\, 2^{-j}\varphi(2^{-j}x - k)\rangle$
  • As φ(x) is a scaling function which has the property:
  • $$\frac{1}{2}\,\varphi\!\left(\frac{x}{2}\right) = \sum_n h(n)\,\varphi(x - n)$$
  • or $\hat{\varphi}(2\nu) = \hat{h}(\nu)\,\hat{\varphi}(\nu)$, where $\hat{h}(\nu)$ is the Fourier transform of the function $\sum_n h(n)\,\delta(x - n)$. We get: $\hat{h}(\nu) = \sum_n h(n)\, e^{-2\pi i n \nu}$.
  • The property of the scaling function φ(x) is that it permits us to compute directly the set $c_{j+1}(k)$ from $c_j(k)$. If we start from the set $c_0(k)$, we compute all the sets $c_j(k)$, with j>0, without directly computing any other scalar product: $c_{j+1}(k) = \sum_n h(n - 2k)\, c_j(n)$. At each step, the number of scalar products is divided by 2. Step by step the signal is smoothed and information is lost. The remaining information can be restored using the complementary subspace $W_{j+1}$ of $V_{j+1}$ in $V_j$. This subspace can be generated by a suitable wavelet function ψ(x) with translation and dilation.
  • $$\frac{1}{2}\,\psi\!\left(\frac{x}{2}\right) = \sum_n g(n)\,\varphi(x - n)$$
  • or $\hat{\psi}(2\nu) = \hat{g}(\nu)\,\hat{\varphi}(\nu)$
  • We compute the scalar products $\langle f(x),\, 2^{-(j+1)}\psi(2^{-(j+1)}x - k)\rangle$ with:
  • $$w_{j+1}(k) = \sum_n g(n - 2k)\, c_j(n)$$
  • With this analysis, we have built the first part of a filter bank. In order to restore the original data, Mallat uses the properties of orthogonal wavelets, but the theory has been generalized to a large class of filters by introducing two other filters {tilde over (h)} and {tilde over (g)} named conjugated to h and g. The restoration, that is, the inverse transform after filtering in the transform domain, is performed with:
  • $$c_j(k) = 2\sum_l \left[\, c_{j+1}(l)\,\tilde{h}(k + 2l) + w_{j+1}(l)\,\tilde{g}(k + 2l) \,\right]$$
  • In order to get an exact restoration, two conditions are required for the conjugate filters:

  • Dealiasing condition: $\hat{h}(\nu + \tfrac{1}{2})\,\hat{\tilde{h}}(\nu) + \hat{g}(\nu + \tfrac{1}{2})\,\hat{\tilde{g}}(\nu) = 0$

  • Exact restoration: $\hat{h}(\nu)\,\hat{\tilde{h}}(\nu) + \hat{g}(\nu)\,\hat{\tilde{g}}(\nu) = 1$
  • In the decomposition, the function is successively convolved with the two filters H (low frequencies) and G (high frequencies). Each resulting function is decimated by suppression of one sample out of two. The high frequency signal is left, and we iterate with the low frequency signal. In the reconstruction, we restore the sampling by inserting a 0 between each sample, then we convolve with the conjugate filters {tilde over (H)} and {tilde over (G)}, we add the resulting functions and we multiply the result by 2. We iterate up to the smallest scale.
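  • The decomposition and reconstruction steps described above can be sketched for a single level with the two-tap Haar pair. The filter values h = (1/2, 1/2) and g = (1/2, −1/2), the function names, and the indexing convention for the conjugate filters are assumptions of this example; for this real Haar pair the conjugate filters are taken to coincide with h and g.

```python
import numpy as np

H = np.array([0.5, 0.5])    # low-frequency filter, normalized so sum h(n) = 1
G = np.array([0.5, -0.5])   # high-frequency filter

def analyze(c):
    """Filter with H and G and keep one sample out of two."""
    c = np.asarray(c, dtype=float)
    if c.size % 2:
        raise ValueError("even length required")
    pairs = c.reshape(-1, 2)  # row k holds c(2k), c(2k+1)
    return pairs @ H, pairs @ G

def synthesize(smooth, detail):
    """Insert a zero between samples, convolve, add, and multiply by 2."""
    up_s = np.zeros(2 * smooth.size)
    up_s[0::2] = smooth
    up_d = np.zeros(2 * detail.size)
    up_d[0::2] = detail
    out = np.convolve(up_s, H) + np.convolve(up_d, G)
    return 2.0 * out[: up_s.size]

c0 = np.array([4.0, 6.0, 10.0, 12.0])
c1, w1 = analyze(c0)
c0_restored = synthesize(c1, w1)
```

  • Iterating `analyze` on the low-frequency output, and `synthesize` in the reverse order, gives the multi-level pyramid.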
  • Orthogonal wavelets correspond to the restricted case where:

  • $\hat{g}(\nu) = e^{-2\pi i \nu}\,\hat{h}^{*}(\nu + \tfrac{1}{2})$

  • $\hat{\tilde{h}}(\nu) = \hat{h}^{*}(\nu), \quad |\hat{h}(\nu)|^{2} + |\hat{h}(\nu + \tfrac{1}{2})|^{2} = 1$

  • $\hat{\tilde{g}}(\nu) = \hat{g}^{*}(\nu)$
  • We can easily see that this set satisfies the dealiasing condition and exact restoration condition. Daubechies wavelets are the only compact solutions. For biorthogonal wavelets we have the relations:

  • $\hat{g}(\nu) = e^{-2\pi i \nu}\,\hat{\tilde{h}}^{*}(\nu + \tfrac{1}{2}),$

  • $\hat{h}(\nu)\,\hat{\tilde{h}}(\nu) + \hat{h}^{*}(\nu + \tfrac{1}{2})\,\hat{\tilde{h}}^{*}(\nu + \tfrac{1}{2}) = 1$

  • $\hat{\tilde{g}}(\nu) = e^{2\pi i \nu}\,\hat{h}^{*}(\nu + \tfrac{1}{2})$
  • Which also satisfy the dealiasing condition and exact restoration condition. A large class of compact wavelet functions can be derived. Many sets of filters were proposed, especially for coding. The choice of these filters must be guided by the regularity of the scaling and the wavelet functions. The complexity is proportional to N. The algorithm provides a pyramid of N elements.
  • The 2D algorithm is based on separate variables leading to prioritizing of x and y directions. The scaling function is defined by: φ(x,y)=φ(x)φ(y)
  • The passage from a resolution to the next one is done by:
  • $$f_{j+1}(k_x, k_y) = \sum_{l_x=-\infty}^{+\infty}\ \sum_{l_y=-\infty}^{+\infty} h(l_x - 2k_x)\, h(l_y - 2k_y)\, f_j(l_x, l_y)$$
  • The detail signal is obtained from three wavelets: a vertical wavelet: ψ1(x,y)=φ(x)ψ(y), a horizontal wavelet: ψ2 (x,y)=ψ(x)φ(y), and a diagonal wavelet: ψ3 (x,y)=ψ(x)ψ(y), which leads to three sub-images:
  • $$\begin{aligned}
C^{1}_{j+1}(k_x, k_y) &= \sum_{l_x=-\infty}^{+\infty}\ \sum_{l_y=-\infty}^{+\infty} g(l_x - 2k_x)\, h(l_y - 2k_y)\, f_j(l_x, l_y) \\
C^{2}_{j+1}(k_x, k_y) &= \sum_{l_x=-\infty}^{+\infty}\ \sum_{l_y=-\infty}^{+\infty} h(l_x - 2k_x)\, g(l_y - 2k_y)\, f_j(l_x, l_y) \\
C^{3}_{j+1}(k_x, k_y) &= \sum_{l_x=-\infty}^{+\infty}\ \sum_{l_y=-\infty}^{+\infty} g(l_x - 2k_x)\, g(l_y - 2k_y)\, f_j(l_x, l_y)
\end{aligned}$$
  • The wavelet transform can be interpreted as the decomposition on frequency sets with a spatial orientation.
  • The à trous algorithm: The discrete approach of the wavelet transform can be done with the special version of the so-called à trous algorithm (with holes). One assumes that the sampled data {c0(k)} are the scalar products at pixels k of the function f(x) with a scaling function φ(x) which corresponds to a low pass filter.
  • The first filtering is then performed by a twice magnified scale leading to the {c1(k)} set. The signal difference {c0(k)}−{c1(k)} contains the information between these two scales and is the discrete set associated with the wavelet transform corresponding to φ(x). The associated wavelet is therefore
  • ψ(x): $\frac{1}{2}\,\psi\!\left(\frac{x}{2}\right) = \varphi(x) - \frac{1}{2}\,\varphi\!\left(\frac{x}{2}\right)$
  • The distance between samples increases by a factor 2 from the scale (i−1) (i>0) to the next one, and $c_i(k)$ is given by:
  • $$c_i(k) = \sum_l h(l)\, c_{i-1}(k + 2^{i-1} l)$$
  • and the discrete wavelet transform $w_i(k)$ by: $w_i(k) = c_{i-1}(k) - c_i(k)$
  • The coefficients {h(k)} derive from the scaling function
  • φ(x): $\frac{1}{2}\,\varphi\!\left(\frac{x}{2}\right) = \sum_l h(l)\,\varphi(x - l)$
  • The algorithm allowing one to rebuild the data frame is evident: the last smoothed array $c_{n_p}$ is added to all the differences $w_i$.
  • $$c_0(k) = c_{n_p}(k) + \sum_{j=1}^{n_p} w_j(k).$$
  • If we choose the linear interpolation for the scaling function φ:

  • $\varphi(x) = 1 - |x|$ if $x \in [-1, 1]$

  • $\varphi(x) = 0$ if $x \notin [-1, 1]$
  • we have:
  • $$\frac{1}{2}\,\varphi\!\left(\frac{x}{2}\right) = \frac{1}{4}\,\varphi(x + 1) + \frac{1}{2}\,\varphi(x) + \frac{1}{4}\,\varphi(x - 1),$$
  • c1 is obtained by:

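  • $c_1(k) = \tfrac{1}{4}\,c_0(k - 1) + \tfrac{1}{2}\,c_0(k) + \tfrac{1}{4}\,c_0(k + 1)$
  • and $c_{j+1}$ is obtained from $c_j$ by

  • $c_{j+1}(k) = \tfrac{1}{4}\,c_j(k - 2^{j}) + \tfrac{1}{2}\,c_j(k) + \tfrac{1}{4}\,c_j(k + 2^{j})$
  • The wavelet coefficients at the scale j are:

  • $w_{j+1}(k) = -\tfrac{1}{4}\,c_j(k - 2^{j}) + \tfrac{1}{2}\,c_j(k) - \tfrac{1}{4}\,c_j(k + 2^{j})$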
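  • The one-dimensional à trous recursion with the linear-interpolation kernel (1/4, 1/2, 1/4) can be sketched as follows. The function name `a_trous` and the periodic treatment of the borders via `np.roll` are assumptions of this example; the text above does not specify boundary handling.

```python
import numpy as np

def a_trous(c0, n_scales):
    """À trous transform with the kernel (1/4, 1/2, 1/4).

    Returns (planes, smooth): planes[j] = c_j - c_{j+1}, and smooth is the
    last smoothed array. Borders are treated as periodic (an assumption).
    """
    c = np.asarray(c0, dtype=float)
    planes = []
    for j in range(n_scales):
        step = 2 ** j  # distance between samples doubles at each scale
        smoothed = 0.25 * np.roll(c, step) + 0.5 * c + 0.25 * np.roll(c, -step)
        planes.append(c - smoothed)
        c = smoothed
    return planes, c

data = np.array([1.0, 3.0, 2.0, 8.0, 5.0, 4.0, 7.0, 6.0])
planes, smooth = a_trous(data, 3)
rebuilt = smooth + sum(planes)  # last smoothed array plus all the differences
```

  • Every wavelet plane has the same number of samples as the input, and adding the planes to the last smoothed array returns the original data exactly.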
  • The above à trous algorithm is easily extensible to the two dimensional space. This leads to a convolution with a mask of 3×3 pixels for the wavelet connected to linear interpolation. The coefficients of the mask are:
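  • $$\begin{pmatrix} \frac{1}{16} & \frac{1}{8} & \frac{1}{16} \\ \frac{1}{8} & \frac{1}{4} & \frac{1}{8} \\ \frac{1}{16} & \frac{1}{8} & \frac{1}{16} \end{pmatrix}$$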
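  • At each scale j, we obtain a set $\{w_j(k,l)\}$ (we will call it wavelet plane in the following), which has the same number of pixels as the image. If we choose a B3-spline for the scaling function, the coefficients of the convolution mask in one dimension are $\left(\tfrac{1}{16},\ \tfrac{1}{4},\ \tfrac{3}{8},\ \tfrac{1}{4},\ \tfrac{1}{16}\right)$,
  • and in two dimensions:
  • $$\begin{pmatrix}
\frac{1}{256} & \frac{1}{64} & \frac{3}{128} & \frac{1}{64} & \frac{1}{256} \\
\frac{1}{64} & \frac{1}{16} & \frac{3}{32} & \frac{1}{16} & \frac{1}{64} \\
\frac{3}{128} & \frac{3}{32} & \frac{9}{64} & \frac{3}{32} & \frac{3}{128} \\
\frac{1}{64} & \frac{1}{16} & \frac{3}{32} & \frac{1}{16} & \frac{1}{64} \\
\frac{1}{256} & \frac{1}{64} & \frac{3}{128} & \frac{1}{64} & \frac{1}{256}
\end{pmatrix}$$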
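  • Both two-dimensional masks above are separable: each is the outer product of the corresponding one-dimensional mask with itself. The short check below uses exact fractions; the helper name `outer_mask` is introduced only for this illustration.

```python
from fractions import Fraction as F

def outer_mask(mask_1d):
    """2-D separable mask: entry (r, c) is mask_1d[r] * mask_1d[c]."""
    return [[a * b for b in mask_1d] for a in mask_1d]

linear_1d = [F(1, 4), F(1, 2), F(1, 4)]
b3_1d = [F(1, 16), F(1, 4), F(3, 8), F(1, 4), F(1, 16)]

linear_2d = outer_mask(linear_1d)  # 3x3 mask for linear interpolation
b3_2d = outer_mask(b3_1d)          # 5x5 mask for the B3-spline

for row in b3_2d:
    print("  ".join(str(x) for x in row))
```

  • Separability means the 2-D convolution can be carried out as a row pass followed by a column pass with the 1-D mask.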
  • The Wavelet transform using the Fourier transform: We start with the set of scalar products $c_0(k) = \langle f(x),\, \varphi(x - k)\rangle$. If φ(x) has a cut-off frequency $\nu_c \le \tfrac{1}{2}$, the data are correctly sampled. The data at the resolution j=1 are:
  • $$c_1(k) = \left\langle f(x),\ \frac{1}{2}\,\varphi\!\left(\frac{x}{2} - k\right) \right\rangle$$
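  • and we can compute the set $c_1(k)$ from $c_0(k)$ with a discrete filter $\hat{h}(\nu)$:
  • $$\hat{h}(\nu) = \begin{cases} \dfrac{\hat{\varphi}(2\nu)}{\hat{\varphi}(\nu)} & \text{if } |\nu| < \nu_c \\[1ex] 0 & \text{if } \nu_c \le |\nu| < \tfrac{1}{2} \end{cases}$$
  • and $\forall \nu,\ \forall n$: $\hat{h}(\nu + n) = \hat{h}(\nu)$, where n is an integer. So: $\hat{c}_{j+1}(\nu) = \hat{c}_j(\nu)\,\hat{h}(2^{j}\nu)$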
  • The cut-off frequency is reduced by a factor 2 at each step, allowing a reduction of the number of samples by this factor.
  • The wavelet coefficients at the scale j+1 are: $w_{j+1}(k) = \langle f(x),\, 2^{-(j+1)}\psi(2^{-(j+1)}x - k)\rangle$
  • and they can be computed directly from $c_j(k)$ by: $\hat{w}_{j+1}(\nu) = \hat{c}_j(\nu)\,\hat{g}(2^{j}\nu)$
  • where g is the following discrete filter:
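  • $$\hat{g}(\nu) = \begin{cases} \dfrac{\hat{\psi}(2\nu)}{\hat{\varphi}(\nu)} & \text{if } |\nu| < \nu_c \\[1ex] 1 & \text{if } \nu_c \le |\nu| < \tfrac{1}{2} \end{cases}, \quad \text{and } \forall \nu,\ \forall n:\ \hat{g}(\nu + n) = \hat{g}(\nu)$$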
  • The frequency band is also reduced by a factor 2 at each step. Applying the sampling theorem, we can build a pyramid of
  • $$N + \frac{N}{2} + \cdots + 1 = 2N$$
  • elements. For an image analysis the number of elements is $\tfrac{4}{3}N^{2}$. The overdetermination is not very high. The B-spline functions are compact in this direct space. They correspond to the autoconvolution of a square function. In the Fourier space we have:
  • $$\hat{B}_l(\nu) = \left(\frac{\sin \pi\nu}{\pi\nu}\right)^{l+1}$$
  • $B_3(x)$ is a set of 4 polynomials of degree 3. We choose the scaling function φ which has a $B_3(x)$ profile in the Fourier space: $\hat{\varphi}(\nu) = \tfrac{3}{2}\,B_3(4\nu)$. In the direct space we get:
  • $$\varphi(x) = \frac{3}{8}\left[\frac{\sin\frac{\pi x}{4}}{\frac{\pi x}{4}}\right]^{4}$$
  • This function is quite similar to a Gaussian one and converges rapidly to 0. For 2-D the scaling function is defined by $\hat{\varphi}(u, v) = \tfrac{3}{2}\,B_3(4r)$, with $r = \sqrt{u^{2} + v^{2}}$. It is an isotropic function.
  • The wavelet transform algorithm with np scales is the following one:
  • 1. We start with a B3-Spline scaling function and we derive ψ, h and g numerically.
  • 2. We compute the corresponding image FFT. We name T0 the resulting complex array;
  • 3. We set j to 0. We iterate:
  • 4. We multiply $T_j$ by $\hat{g}(2^{j}u, 2^{j}v)$. We get the complex array $W_{j+1}$. The inverse FFT gives the wavelet coefficients at the scale $2^{j}$;
  • 5. We multiply $T_j$ by $\hat{h}(2^{j}u, 2^{j}v)$. We get the array $T_{j+1}$. Its inverse FFT gives the image at the scale $2^{j+1}$. The frequency band is reduced by a factor 2.
  • 6. We increment j.
  • 7. If $j \le n_p$, we go back to 4.
  • 8. The set $\{w_1, w_2, \ldots, w_{n_p}, c_{n_p}\}$ describes the wavelet transform.
  • If the wavelet is the difference between two resolutions, we have: $\hat{\psi}(2\nu) = \hat{\varphi}(\nu) - \hat{\varphi}(2\nu)$ and: $\hat{g}(\nu) = 1 - \hat{h}(\nu)$; then the wavelet coefficients $\hat{w}_j(\nu)$ can be computed by $\hat{c}_{j-1}(\nu) - \hat{c}_j(\nu)$. The Reconstruction: If the wavelet is the difference between two resolutions, an evident reconstruction for a wavelet transform
  • $\mathcal{W} = \{w_1, w_2, \ldots, w_{n_p}, c_{n_p}\}$ is: $\hat{c}_0(\nu) = \hat{c}_{n_p}(\nu) + \sum_j \hat{w}_j(\nu)$.
  • But this is a particular case and other wavelet functions can be chosen. The reconstruction can be done step by step, starting from the lowest resolution. At each scale, we have the relations: $\hat{c}_{j+1}(\nu) = \hat{h}(2^{j}\nu)\,\hat{c}_j(\nu)$, $\hat{w}_{j+1}(\nu) = \hat{g}(2^{j}\nu)\,\hat{c}_j(\nu)$; we look for $c_j$ knowing $c_{j+1}$, $w_{j+1}$, h and g. We restore $\hat{c}_j(\nu)$ with a least mean square estimator:

  • $\hat{p}_h(2^{j}\nu)\,\left|\hat{c}_{j+1}(\nu) - \hat{h}(2^{j}\nu)\,\hat{c}_j(\nu)\right|^{2} + \hat{p}_g(2^{j}\nu)\,\left|\hat{w}_{j+1}(\nu) - \hat{g}(2^{j}\nu)\,\hat{c}_j(\nu)\right|^{2}$
  • is minimum. $\hat{p}_h(\nu)$ and $\hat{p}_g(\nu)$ are weight functions which permit a general solution to the restoration of $\hat{c}_j(\nu)$. By $\hat{c}_j(\nu)$ derivation we get: $\hat{c}_j(\nu) = \hat{c}_{j+1}(\nu)\,\hat{\tilde{h}}(2^{j}\nu) + \hat{w}_{j+1}(\nu)\,\hat{\tilde{g}}(2^{j}\nu)$, where the conjugate filters have the expression:
  • $$\hat{\tilde{h}}(\nu) = \frac{\hat{p}_h(\nu)\,\hat{h}^{*}(\nu)}{\hat{p}_h(\nu)\,|\hat{h}(\nu)|^{2} + \hat{p}_g(\nu)\,|\hat{g}(\nu)|^{2}}, \qquad \hat{\tilde{g}}(\nu) = \frac{\hat{p}_g(\nu)\,\hat{g}^{*}(\nu)}{\hat{p}_h(\nu)\,|\hat{h}(\nu)|^{2} + \hat{p}_g(\nu)\,|\hat{g}(\nu)|^{2}}$$
  • It is easy to see that these filters satisfy the exact reconstruction equation. In fact, the above pair of equations gives the general solution to this equation. In this analysis, the Shannon sampling condition is always respected. No aliasing exists, so that the dealiasing condition is not necessary (i.e., it is satisfied as a matter of course).
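  • The exact reconstruction claim can be checked by direct substitution. In the sketch below, $D(\nu)$ is a shorthand introduced here for the common denominator of the two conjugate filters; it does not appear in the original text.

```latex
\begin{aligned}
D(\nu) &= \hat{p}_h(\nu)\,|\hat{h}(\nu)|^{2} + \hat{p}_g(\nu)\,|\hat{g}(\nu)|^{2}, \\
\hat{h}(\nu)\,\hat{\tilde{h}}(\nu) + \hat{g}(\nu)\,\hat{\tilde{g}}(\nu)
  &= \frac{\hat{p}_h(\nu)\,\hat{h}(\nu)\,\hat{h}^{*}(\nu)}{D(\nu)}
   + \frac{\hat{p}_g(\nu)\,\hat{g}(\nu)\,\hat{g}^{*}(\nu)}{D(\nu)}
   = \frac{D(\nu)}{D(\nu)} = 1 .
\end{aligned}
```

  • The identity holds for any weight functions for which $D(\nu)$ is non-zero, which is why the pair of conjugate filters is described as a general solution.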
  • The denominator is reduced if we choose: $\hat{g}(\nu) = \sqrt{1 - |\hat{h}(\nu)|^{2}}$. This corresponds to the case where the wavelet is the difference between the square of two resolutions: $|\hat{\psi}(2\nu)|^{2} = |\hat{\varphi}(\nu)|^{2} - |\hat{\varphi}(2\nu)|^{2}$. The reconstruction algorithm is:
  • 1. We compute the FFT of the image at the low resolution.
  • 2. We set j to np. We iterate:
  • 3. We compute the FFT of the wavelet coefficients at the scale j.
  • 4. We multiply the wavelet coefficients ŵj by {tilde over (ĝ)}.
  • 5. We multiply the image at the lower resolution ĉj by {tilde over (ĥ)}.
  • 6. The inverse Fourier Transform of the addition of $\hat{w}_j\,\hat{\tilde{g}}$ and $\hat{c}_j\,\hat{\tilde{h}}$ gives the image $c_{j-1}$.
  • 7. j=j−1 and we go back to 3.
  • The use of a scaling function with a cut-off frequency allows a reduction of sampling at each scale, and limits the computing time and the memory size. Thus, it is seen that the DWT is in many respects comparable to the DFT, and, where convenient, may be employed in place thereof. While substantial work has been done in the application of wavelet analysis and filtering to image data, it is noted that the wavelet transform analysis is not so limited. In particular, one embodiment of the present invention applies the transform to describe statistical events represented within a multidimensional data-space. By understanding the multi-resolution interrelationships of various events and probabilities of events, in a time-space representation, a higher level analysis is possible than with other common techniques. Likewise, because aspects of the analysis are relatively content dependent, they may be accelerated by digital signal processing techniques or array processors, without need to apply artificial intelligence. On the other hand, the transformed (and possibly filtered) data set, is advantageously suitable for intelligent analysis, either by machine or human. Generally, there will be no need to perform an inverse transform on the data set. On the other hand, the wavelet analysis may be useful for characterizing and analyzing only a limited range of events. Advantageously, if an event is recognized with high reliability within a transform domain, the event may be extracted from the data representation and an inverse transform performed to provide the data set absent the recognized feature or event. This allows a number of different feature-specific transforms to be conducted, and analyzed. This analysis may be in series, that is, having a defined sequence of transforms, feature extractions, and inverse transforms. On the other hand, the process may be performed in parallel. 
That is, the data set is subjected to various “tests”, which are conducted by optimally transforming the data to determine if a particular feature (event) is present, determined with high reliability. As each feature is identified, the base data set may be updated for the remaining “tests”, which will likely simplify the respective analysis, or improve the reliability of the respective determination. As each event or feature is extracted, the data set becomes simpler and simpler, until only noise remains. It should be noted that, in some instances, a high reliability determination of the existence of an event cannot be concluded. In those cases, it is also possible to perform a contingent analysis, leading to a plurality of possible results for each contingency. Thus, a putative feature is extracted or not extracted from the data set and both results passed on for further analysis. Where one of the contingencies is inconsistent with a subsequent high reliability determination, that entire branch of analysis may be truncated. Ideally, the output consists of a data representation with probabilistic representation of the existence of events or features represented within the data set. As discussed below, this may form the basis for a risk-reliability output space representation of the data, useable directly by a human (typically in the form of a visual output) and/or for further automated analysis. It is also noted that the data set is not temporally static, and therefore the analysis may be conducted in real time based on a stream of data.
  • The Process to be Estimated: The Kalman filter addresses the general problem of trying to estimate the state $x \in \mathbb{R}^{n}$ of a discrete-time controlled process that is governed by the linear stochastic difference equation

  • $x_k = A x_{k-1} + B u_k + w_{k-1}$,  (3.1)
  • with a measurement $z_k \in \mathbb{R}^{m}$ that is

  • $z_k = H x_k + v_k$.  (3.2)
  • The random variables wk and vk represent the process and measurement noise (respectively). They are assumed to be independent (of each other), white, and with normal probability distributions

  • $p(w) \sim N(0, Q)$,  (3.3)

  • $p(v) \sim N(0, R)$.  (3.4)
  • In practice, the process noise covariance Q and measurement noise covariance R matrices might change with each time step or measurement; however, here we assume they are constant. See, Kalman, Rudolph Emil, “A New Approach to Linear Filtering and Prediction Problems”, Transactions of the ASME—Journal of Basic Engineering, 82D:35-45 (1960) (describes the namesake Kalman filter, which is a set of mathematical equations that provides an efficient computational (recursive) solution of the least-squares method. The filter is very powerful in several aspects: it supports estimations of past, present, and even future states, and it can do so even when the precise nature of the modeled system is unknown.)
  • The n×n matrix A in the difference equation (3.1) relates the state at the previous time step k−1 to the state at the current step k, in the absence of either a driving function or process noise. Note that in practice A might change with each time step, but here we assume it is constant. The n×l matrix B relates the optional control input $u \in \mathbb{R}^{l}$ to the state x. The m×n matrix H in the measurement equation (3.2) relates the state to the measurement $z_k$. In practice H might change with each time step or measurement, but here we assume it is constant.
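  • To make the roles of A, B, H, Q and R concrete, the following sketch draws a short trajectory from equations (3.1) and (3.2). The function name `simulate`, the constant-velocity example matrices, and the noise levels are assumptions chosen for illustration and are not taken from the text.

```python
import numpy as np

def simulate(A, B, H, Q, R, x0, controls, rng):
    """Draw states x_k and measurements z_k from the model (3.1)-(3.2)."""
    x = np.asarray(x0, dtype=float)
    states, measurements = [], []
    for u in controls:
        w = rng.multivariate_normal(np.zeros(Q.shape[0]), Q)  # process noise ~ N(0, Q)
        v = rng.multivariate_normal(np.zeros(R.shape[0]), R)  # measurement noise ~ N(0, R)
        x = A @ x + B @ u + w
        states.append(x)
        measurements.append(H @ x + v)
    return np.array(states), np.array(measurements)

# Hypothetical example: n = 2 (position, velocity), l = 1, m = 1.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.25]])
controls = np.ones((5, 1))

states, measurements = simulate(A, B, H, Q, R, [0.0, 0.0], controls,
                                np.random.default_rng(0))
```

  • Here A is 2×2, B is 2×1 and H is 1×2, matching the n×n, n×l and m×n shapes stated above.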
  • The Computational Origins of the Filter: We define $\hat{x}_k^{-} \in \mathbb{R}^{n}$ (note the “super minus”) to be our a priori state estimate at step k given knowledge of the process prior to step k, and $\hat{x}_k \in \mathbb{R}^{n}$ to be our a posteriori state estimate at step k given measurement $z_k$. We can then define a priori and a posteriori estimate errors as $e_k^{-} \equiv x_k - \hat{x}_k^{-}$ and $e_k \equiv x_k - \hat{x}_k$.
  • The a priori and the a posteriori estimate error covariance is then

  • $P_k^{-} = E\left[e_k^{-}\,{e_k^{-}}^{T}\right]$,  (3.5)

  • $P_k = E\left[e_k\,e_k^{T}\right]$.  (3.6)
  • In deriving the equations for the Kalman filter, we begin with the goal of finding an equation that computes an a posteriori state estimate $\hat{x}_k$ as a linear combination of an a priori estimate $\hat{x}_k^{-}$ and a weighted difference between an actual measurement $z_k$ and a measurement prediction $H\hat{x}_k^{-}$ as shown below in (3.7). Some justification for (3.7) is given in “The Probabilistic Origins of the Filter” found below. See,˜welch/kalman/kalman_filter/kalman-1.htm, expressly incorporated herein by reference.

  • $\hat{x}_k = \hat{x}_k^{-} + K\left(z_k - H\hat{x}_k^{-}\right)$  (3.7)
  • The difference $(z_k - H\hat{x}_k^-)$ in (3.7) is called the measurement innovation, or the residual. The residual reflects the discrepancy between the predicted measurement $H\hat{x}_k^-$ and the actual measurement $z_k$. A residual of zero means that the two are in complete agreement. The n×m matrix K in (3.7) is chosen to be the gain or blending factor that minimizes the a posteriori error covariance (3.6). This minimization can be accomplished by first substituting (3.7) into the above definition for $e_k$, substituting that into (3.6), performing the indicated expectations, taking the derivative of the trace of the result with respect to K, setting that result equal to zero, and then solving for K. For more details see [Maybeck79; Brown92; Jacobs93]. One form of the resulting K that minimizes (3.6) is given by
  • $K_k = P_k^- H^T (H P_k^- H^T + R)^{-1} = \dfrac{P_k^- H^T}{H P_k^- H^T + R}$.  (3.8)
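  • The minimization that leads to this gain can be sketched as the following worked derivation. It is a reading aid only, and it assumes that $P_k^-$ and $R$ are symmetric and that the measurement noise $v_k$ is uncorrelated with the a priori error $e_k^-$.

```latex
\begin{aligned}
e_k &= x_k - \hat{x}_k = (I - KH)\,e_k^- - K v_k, \\
P_k &= E[e_k e_k^T] = (I - KH)\,P_k^-\,(I - KH)^T + K R K^T, \\
\frac{\partial\,\mathrm{tr}(P_k)}{\partial K} &= -2\,P_k^- H^T + 2K\,(H P_k^- H^T + R) = 0, \\
K &= P_k^- H^T\,(H P_k^- H^T + R)^{-1}.
\end{aligned}
```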
  • Looking at (3.8) we see that as the measurement error covariance $R_k$ approaches zero, the gain K weights the residual more heavily. Specifically,
  • $\lim_{R_k \to 0} K_k = H^{-1}$.
  • On the other hand, as the a priori estimate error covariance $P_k^-$ approaches zero, the gain K weights the residual less heavily. Specifically,
  • $\lim_{P_k^- \to 0} K_k = 0$.
  • Another way of thinking about the weighting by K is that as the measurement error covariance R approaches zero, the actual measurement $z_k$ is “trusted” more and more, while the predicted measurement $H\hat{x}_k^-$ is trusted less and less. On the other hand, as the a priori estimate error covariance $P_k^-$ approaches zero, the actual measurement $z_k$ is trusted less and less, while the predicted measurement $H\hat{x}_k^-$ is trusted more and more.
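  • The two limiting cases of the gain can be checked numerically. The following is a minimal sketch for a scalar state and scalar measurement; the function name `kalman_gain` and the numeric values are illustrative assumptions.

```python
def kalman_gain(p_prior, h, r):
    """Scalar form of the gain: K = P^- H / (H P^- H + R)."""
    return p_prior * h / (h * p_prior * h + r)

h = 2.0  # assumed scalar measurement matrix H

# As R -> 0 the gain approaches 1/H, so the measurement dominates.
gain_small_r = kalman_gain(p_prior=1.0, h=h, r=1e-12)

# As P^- -> 0 the gain approaches 0, so the prediction dominates.
gain_small_p = kalman_gain(p_prior=1e-12, h=h, r=1.0)

print(gain_small_r, gain_small_p)
```

  With H = 2, the first value is close to 1/H = 0.5 and the second is close to zero.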
  • The Probabilistic Origins of the Filter: The justification for (3.7) is rooted in the probability of the a priori estimate $\hat{x}_k^-$ conditioned on all prior measurements $z_k$ (Bayes' rule). For now let it suffice to point out that the Kalman filter maintains the first two moments of the state distribution, $E[x_k] = \hat{x}_k$ and $E[(x_k - \hat{x}_k)(x_k - \hat{x}_k)^T] = P_k$. The a posteriori state estimate (3.7) reflects the mean (the first moment) of the state distribution; it is normally distributed if the conditions of (3.3) and (3.4) are met. The a posteriori estimate error covariance (3.6) reflects the variance of the state distribution (the second central moment). In other words, $p(x_k \mid z_k) \sim N(E[x_k], E[(x_k - \hat{x}_k)(x_k - \hat{x}_k)^T]) = N(\hat{x}_k, P_k)$.
  • For more details on the probabilistic origins of the Kalman filter, see [Maybeck79; Brown92; Jacobs93].
  • The Discrete Kalman Filter Algorithm: The Kalman filter estimates a process by using a form of feedback control: the filter estimates the process state at some time and then obtains feedback in the form of (noisy) measurements. As such, the equations for the Kalman filter fall into two groups: time update equations and measurement update equations. The time update equations are responsible for projecting forward (in time) the current state and error covariance estimates to obtain the a priori estimates for the next time step. The measurement update equations are responsible for the feedback—i.e. for incorporating a new measurement into the a priori estimate to obtain an improved a posteriori estimate. The time update equations can also be thought of as predictor equations, while the measurement update equations can be thought of as corrector equations. Indeed the final estimation algorithm resembles that of a predictor-corrector algorithm for solving numerical problems as shown below in FIG. 5, which shows the ongoing discrete Kalman filter cycle. The time update projects the current state estimate ahead in time. The measurement update adjusts the projected estimate by an actual measurement at that time.
  • The specific equations for the time and measurement updates are presented below: Discrete Kalman filter time update equations.

  • $\hat{x}_k^- = A\hat{x}_{k-1} + Bu_k$,  (3.9)

  • $P_k^- = AP_{k-1}A^T + Q$  (3.10)
  • The time update equations (3.9) and (3.10) project the state and covariance estimates forward from time step k−1 to step k. A and B are from (3.1), while Q is from (3.3). Initial conditions for the filter are discussed in the earlier references.
  • Discrete Kalman filter measurement update equations.

  • $K_k = P_k^- H^T (H P_k^- H^T + R)^{-1}$  (3.11)

  • $\hat{x}_k = \hat{x}_k^- + K_k(z_k - H\hat{x}_k^-)$  (3.12)

  • $P_k = (I - K_k H)P_k^-$  (3.13)
  • The first task during the measurement update is to compute the Kalman gain, $K_k$. Notice that the equation given here as (3.11) is the same as (3.8). The next step is to actually measure the process to obtain $z_k$, and then to generate an a posteriori state estimate by incorporating the measurement as in (3.12). Again (3.12) is simply (3.7) repeated here for completeness. The final step is to obtain an a posteriori error covariance estimate via (3.13). All of the Kalman filter equations can be algebraically manipulated into several forms. Equation (3.8) represents the Kalman gain in one popular form.
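  • The predictor-corrector cycle of the time update and measurement update can be sketched in a few lines of plain Python. This is an illustrative sketch, not a definitive implementation: it assumes a two-element state (position and velocity), a single scalar position measurement so that no matrix inverse is needed, no control input, and noise-free measurements so that the run is reproducible. The matrices A, H, Q, the variance r, and all helper names are assumptions chosen for the example.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def time_update(x_post, p_post, A, Q):
    """Project state and covariance ahead, with no control input (B u_k = 0)."""
    x_prior = mat_mul(A, x_post)
    apat = mat_mul(mat_mul(A, p_post), transpose(A))
    n = len(Q)
    p_prior = [[apat[i][j] + Q[i][j] for j in range(n)] for i in range(n)]
    return x_prior, p_prior

def measurement_update(x_prior, p_prior, z, H, r):
    """Correct the projection with one scalar measurement z of variance r."""
    n = len(x_prior)
    pht = mat_mul(p_prior, transpose(H))       # n x 1 column, P^- H^T
    s = mat_mul(H, pht)[0][0] + r              # scalar H P^- H^T + R
    K = [[pht[i][0] / s] for i in range(n)]    # Kalman gain
    residual = z - mat_mul(H, x_prior)[0][0]   # measurement innovation
    x_post = [[x_prior[i][0] + K[i][0] * residual] for i in range(n)]
    kh = mat_mul(K, H)
    i_minus_kh = [[(1.0 if i == j else 0.0) - kh[i][j] for j in range(n)]
                  for i in range(n)]
    p_post = mat_mul(i_minus_kh, p_prior)      # a posteriori covariance
    return x_post, p_post

# Assumed constant-velocity model: position advances by velocity each step.
A = [[1.0, 1.0], [0.0, 1.0]]
H = [[1.0, 0.0]]                 # only position is measured
Q = [[1e-4, 0.0], [0.0, 1e-4]]
r = 1.0

x = [[0.0], [0.0]]               # initial guess: position 0, velocity 0
P = [[10.0, 0.0], [0.0, 10.0]]   # large initial uncertainty
for k in range(1, 51):
    x, P = time_update(x, P, A, Q)
    x, P = measurement_update(x, P, z=2.0 * k, H=H, r=r)  # true velocity is 2

print(x)
```

  After fifty cycles the velocity estimate `x[1][0]` approaches the true value of 2 even though velocity is never measured directly, which illustrates how the recursion accumulates information from past measurements.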
  • After each time and measurement update pair, the process is repeated with the previous a posteriori estimates used to project or predict the new a priori estimates. This recursive nature is one of the very appealing features of the Kalman filter: it makes practical implementations much more feasible than (for example) an implementation of a Wiener filter [Brown92], which is designed to operate on all of the data directly for each estimate. The Kalman filter instead recursively conditions the current estimate on all of the past measurements. FIG. 6 offers a complete picture of the operation of the filter, combining the high-level diagram of FIG. 5 with the equations (3.9) to (3.13).
  • Filter Parameters and Tuning: In the actual implementation of the filter, the measurement noise covariance R is usually measured prior to operation of the filter. Measuring the measurement error covariance R is generally practical (possible) because we need to be able to measure the process anyway (while operating the filter), so we should generally be able to take some off-line sample measurements in order to determine the variance of the measurement noise. Determining the process noise covariance Q is generally more difficult, as typically there is an inability to directly observe the process being estimated. Sometimes a relatively simple (poor) process model can produce acceptable results if one “injects” enough uncertainty into the process via the selection of Q. In this case one would hope that the process measurements are reliable. In either case, whether or not we have a rational basis for choosing the parameters, superior filter performance (statistically speaking) can often be obtained by tuning the filter parameters Q and R. The tuning is usually performed off-line, frequently with the help of another (distinct) Kalman filter in a process generally referred to as system identification.
  • Under conditions where Q and R are in fact constant, both the estimation error covariance $P_k$ and the Kalman gain $K_k$ will stabilize quickly and then remain constant (see the filter update equations in FIG. 6). If this is the case, these parameters can be pre-computed by either running the filter off-line, or for example by determining the steady-state value of $P_k$ as described in [Grewal93].
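  • Pre-computing the steady-state values by running the filter off-line can be sketched as follows for a scalar system. The covariance recursion does not depend on the measurements, so it can be iterated without any data. The function name `steady_state`, the tolerance, and the parameter values are assumptions for illustration.

```python
def steady_state(a, h, q, r, p0=1.0, tol=1e-12, max_iter=10_000):
    """Iterate the scalar covariance recursion until P_k stops changing.

    Returns the a posteriori covariance, the gain, and the step count.
    """
    p = p0
    gain = 0.0
    for step in range(1, max_iter + 1):
        p_prior = a * p * a + q                        # time update of P
        gain = p_prior * h / (h * p_prior * h + r)     # Kalman gain
        p_new = (1.0 - gain * h) * p_prior             # measurement update of P
        if abs(p_new - p) < tol:
            return p_new, gain, step
        p = p_new
    return p, gain, max_iter

# Assumed constant parameters: A = H = 1, Q = 1e-5, R = 0.01.
p_ss, k_ss, steps = steady_state(a=1.0, h=1.0, q=1e-5, r=0.01)
print(p_ss, k_ss, steps)
```

  Once `p_ss` and `k_ss` are known, a fixed-gain filter can apply `k_ss` at every step instead of recomputing the gain.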
  • It is frequently the case however that the measurement error (in particular) does not remain constant. For example, when observing otherwise similar transmitters, the noise in measurements of nearby transmitters will generally be smaller than that in measurements of far-away transmitters. Also, the process noise Q is sometimes changed dynamically during filter operation (becoming $Q_k$) in order to adjust to different dynamics. For example, in the case of tracking the head of a user of a 3D virtual environment we might reduce the magnitude of $Q_k$ if the user seems to be moving slowly, and increase the magnitude if the dynamics start changing rapidly. In such cases $Q_k$ might be chosen to account for both uncertainty about the user's intentions and uncertainty in the model.
  • The Extended Kalman Filter (EKF): The Process to be Estimated: As described above, the Kalman filter addresses the general problem of trying to estimate the state $x \in \mathbb{R}^n$ of a discrete-time controlled process that is governed by a linear stochastic difference equation. But what happens if the process to be estimated and (or) the measurement relationship to the process is non-linear? Some of the most interesting and successful applications of Kalman filtering have been such situations. A Kalman filter that linearizes about the current mean and covariance is referred to as an extended Kalman filter or EKF.
  • In something akin to a Taylor series, we can linearize the estimation around the current estimate using the partial derivatives of the process and measurement functions to compute estimates even in the face of non-linear relationships. To do so, we must begin by modifying some of the analysis presented above. Let us assume that our process again has a state vector $x \in \mathbb{R}^n$, but that the process is now governed by the non-linear stochastic difference equation

  • $x_k = f(x_{k-1}, u_k, w_{k-1})$,  (4.1)
  • with a measurement $z \in \mathbb{R}^m$ that is

  • $z_k = h(x_k, v_k)$,  (4.2)
  • where the random variables $w_k$ and $v_k$ again represent the process and measurement noise as in (3.3) and (3.4). In this case the non-linear function ƒ in the difference equation (4.1) relates the state at the previous time step k−1 to the state at the current time step k. It includes as parameters any driving function $u_k$ and the zero-mean process noise $w_k$. The non-linear function h in the measurement equation (4.2) relates the state $x_k$ to the measurement $z_k$. See,˜welch/kalman/kalman_filter/kalman-2.html, expressly incorporated herein by reference.
  • In practice of course one does not know the individual values of the noise wk and vk at each time step. However, one can approximate the state and measurement vector without them as

  • $\tilde{x}_k = f(\hat{x}_{k-1}, u_k, 0)$, and,  (4.3)

  • $\tilde{z}_k = h(\tilde{x}_k, 0)$  (4.4)
  • where $\hat{x}_k$ is some a posteriori estimate of the state (from a previous time step k).
  • It is important to note that a fundamental flaw of the EKF is that the distributions (or densities in the continuous case) of the various random variables are no longer normal after undergoing their respective nonlinear transformations. The EKF is simply an ad hoc state estimator that only approximates the optimality of Bayes' rule by linearization. Some interesting work has been done by Julier et al. in developing a variation to the EKF, using methods that preserve the normal distributions throughout the non-linear transformations [Julier96].
  • The Computational Origins of the Filter: To estimate a process with non-linear difference and measurement relationships, we begin by writing new governing equations that linearize an estimate about (4.3) and (4.4),

  • $x_k \approx \tilde{x}_k + A(x_{k-1} - \hat{x}_{k-1}) + W w_{k-1}$,  (4.5)

  • $z_k \approx \tilde{z}_k + H(x_k - \tilde{x}_k) + V v_k$.  (4.6)
  • Where: $x_k$ and $z_k$ are the actual state and measurement vectors, $\tilde{x}_k$ and $\tilde{z}_k$ are the approximate state and measurement vectors from (4.3) and (4.4), $\hat{x}_k$ is an a posteriori estimate of the state at step k, the random variables $w_k$ and $v_k$ represent the process and measurement noise as in (3.3) and (3.4),
  • A is the Jacobian matrix of partial derivatives of ƒ with respect to x, that is
  • $A_{(i,j)} = \dfrac{\partial f_{(i)}}{\partial x_{(j)}}(\hat{x}_{k-1}, u_k, 0)$,
  • W is the Jacobian matrix of partial derivatives of ƒ with respect to w,
  • $W_{(i,j)} = \dfrac{\partial f_{(i)}}{\partial w_{(j)}}(\hat{x}_{k-1}, u_k, 0)$,
  • H is the Jacobian matrix of partial derivatives of h with respect to x,
  • $H_{(i,j)} = \dfrac{\partial h_{(i)}}{\partial x_{(j)}}(\tilde{x}_k, 0)$,
  • and
  • V is the Jacobian matrix of partial derivatives of h with respect to v,
  • $V_{(i,j)} = \dfrac{\partial h_{(i)}}{\partial v_{(j)}}(\tilde{x}_k, 0)$.
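  • When analytic partial derivatives are inconvenient, a Jacobian such as H can be approximated numerically. The sketch below uses central differences; this is a substitute technique offered for illustration and is not described in the text above. The range-and-bearing measurement function `range_bearing` is a hypothetical example of a non-linear h, and the step size `eps` is an assumed value.

```python
import math

def numerical_jacobian(func, x, eps=1e-6):
    """Central-difference estimate of d func_i / d x_j evaluated at x."""
    fx = func(x)
    jac = [[0.0] * len(x) for _ in fx]
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = func(x_plus), func(x_minus)
        for i in range(len(fx)):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * eps)
    return jac

def range_bearing(x):
    """Hypothetical non-linear measurement: range and bearing to a 2-D position."""
    px, py = x
    return [math.hypot(px, py), math.atan2(py, px)]

# Evaluate the measurement Jacobian at an assumed state estimate (3, 4).
H = numerical_jacobian(range_bearing, [3.0, 4.0])
print(H)
```

  For this state the analytic Jacobian is [[0.6, 0.8], [−0.16, 0.12]], which the numerical estimate should match closely.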
  • Note that for simplicity in the notation we do not use the time step subscript k with the Jacobians A, W, H, and V, even though they are in fact different at each time step. We define a new notation for the prediction error,

  • $\tilde{e}_{x_k} \equiv x_k - \tilde{x}_k$,  (4.7)
  • and the measurement residual,

  • $\tilde{e}_{z_k} \equiv z_k - \tilde{z}_k$.  (4.8)
  • Remember that in practice one does not have access to $x_k$ in (4.7); it is the actual state vector, i.e. the quantity one is trying to estimate. On the other hand, one does have access to $z_k$ in (4.8); it is the actual measurement that one is using to estimate $x_k$. Using (4.7) and (4.8) we can write governing equations for an error process as

  • $\tilde{e}_{x_k} \approx A(x_{k-1} - \hat{x}_{k-1}) + \epsilon_k$,  (4.9)

  • $\tilde{e}_{z_k} \approx H\tilde{e}_{x_k} + \eta_k$,  (4.10)
  • where $\epsilon_k$ and $\eta_k$ represent new independent random variables having zero mean and covariance matrices $WQW^T$ and $VRV^T$, with Q and R as in (3.3) and (3.4) respectively.
  • Notice that the equations (4.9) and (4.10) are linear, and that they closely resemble the difference and measurement equations (3.1) and (3.2) from the discrete Kalman filter. This motivates us to use the actual measurement residual $\tilde{e}_{z_k}$ in (4.8) and a second (hypothetical) Kalman filter to estimate the prediction error $\tilde{e}_{x_k}$ given by (4.9). This estimate, call it $\hat{e}_k$, could then be used along with (4.7) to obtain the a posteriori state estimates for the original non-linear process as

  • $\hat{x}_k = \tilde{x}_k + \hat{e}_k$.  (4.11)
  • The random variables of (4.9) and (4.10) have approximately the following probability distributions:

  • $p(\tilde{e}_{x_k}) \sim N(0, E[\tilde{e}_{x_k}\tilde{e}_{x_k}^T])$

  • $p(\epsilon_k) \sim N(0, W Q_k W^T)$

  • $p(\eta_k) \sim N(0, V R_k V^T)$
  • Given these approximations and letting the predicted value of $\hat{e}_k$ be zero, the Kalman filter equation used to estimate $\hat{e}_k$ is

  • $\hat{e}_k = K_k \tilde{e}_{z_k}$.  (4.12)
  • By substituting (4.12) back into (4.11) and making use of (4.8) we see that we do not actually need the second (hypothetical) Kalman filter:

  • $\hat{x}_k = \tilde{x}_k + K_k\tilde{e}_{z_k} = \tilde{x}_k + K_k(z_k - \tilde{z}_k)$  (4.13)
  • Equation (4.13) can now be used for the measurement update in the extended Kalman filter, with $\tilde{x}_k$ and $\tilde{z}_k$ coming from (4.3) and (4.4), and the Kalman gain $K_k$ coming from (3.11) with the appropriate substitution for the measurement error covariance. The complete set of EKF equations is shown below. Note that we have substituted $\hat{x}_k^-$ for $\tilde{x}_k$ to remain consistent with the earlier “super minus” a priori notation, and that we now attach the subscript k to the Jacobians A, W, H, and V, to reinforce the notion that they are different at (and therefore must be recomputed at) each time step.
  • EKF time update equations:

  • $\hat{x}_k^- = f(\hat{x}_{k-1}, u_k, 0)$  (4.14)

  • $P_k^- = A_k P_{k-1} A_k^T + W_k Q_{k-1} W_k^T$  (4.15)
  • As with the basic discrete Kalman filter, the time update equations (4.14) and (4.15) project the state and covariance estimates from the previous time step k−1 to the current time step k. Again ƒ in (4.14) comes from (4.3), Ak and Wk are the process Jacobians at step k, and Qk is the process noise covariance (3.3) at step k.
  • EKF measurement update equations:

  • $K_k = P_k^- H_k^T (H_k P_k^- H_k^T + V_k R_k V_k^T)^{-1}$  (4.16)

  • $\hat{x}_k = \hat{x}_k^- + K_k(z_k - h(\hat{x}_k^-, 0))$  (4.17)

  • $P_k = (I - K_k H_k)P_k^-$  (4.18)
  • As with the basic discrete Kalman filter, the measurement update equations (4.16), (4.17) and (4.18) correct the state and covariance estimates with the measurement $z_k$. Again h in (4.17) comes from (4.4), $H_k$ and $V_k$ are the measurement Jacobians at step k, and $R_k$ is the measurement noise covariance (3.4) at step k. (Note we now subscript R, allowing it to change with each measurement.) The basic operation of the EKF is the same as the linear discrete Kalman filter as shown in FIG. 5. FIG. 7 offers a complete picture of the operation of the EKF, combining the high-level diagram of FIG. 5 with the equations (4.14) through (4.18). An important feature of the EKF is that the Jacobian $H_k$ in the equation for the Kalman gain $K_k$ serves to correctly propagate or “magnify” only the relevant component of the measurement information. For example, if there is not a one-to-one mapping between the measurement $z_k$ and the state via h, the Jacobian $H_k$ affects the Kalman gain so that it only magnifies the portion of the residual $z_k - h(\hat{x}_k^-, 0)$ that does affect the state. Of course, if over all measurements there is not a one-to-one mapping between the measurement $z_k$ and the state via h, then as you might expect the filter will quickly diverge. In this case the process is unobservable.
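  • The EKF time and measurement updates can be sketched for a scalar state as follows. The model is an assumption made for illustration: the state is a slowly drifting constant (so ƒ is the identity and A = W = 1), and the measurement is the square of the state plus additive noise (so h(x) = x², H = 2x, V = 1). The true state, noise levels, initial values, and random seed are all hypothetical.

```python
import random

def ekf_step(x_post, p_post, z, q, r):
    """One EKF cycle for x_k = x_{k-1} + w and z_k = x_k**2 + v (assumed model)."""
    # Time update: f is the identity, so A = 1 and W = 1.
    x_prior = x_post
    p_prior = p_post + q
    # Measurement Jacobian H = dh/dx at the a priori estimate; V = 1.
    h_jac = 2.0 * x_prior
    # Measurement update: gain, state correction, covariance correction.
    gain = p_prior * h_jac / (h_jac * p_prior * h_jac + r)
    x_new = x_prior + gain * (z - x_prior ** 2)
    p_new = (1.0 - gain * h_jac) * p_prior
    return x_new, p_new

rng = random.Random(0)      # seeded so the run is repeatable
true_state = 2.0            # hypothetical value being estimated
q, r = 1e-4, 0.1            # assumed process and measurement noise variances

x_hat, p = 1.0, 1.0         # deliberately poor initial guess
for _ in range(100):
    z = true_state ** 2 + rng.gauss(0.0, r ** 0.5)
    x_hat, p = ekf_step(x_hat, p, z, q, r)

print(x_hat, p)
```

  Note that the Jacobian is recomputed from the current a priori estimate on every cycle, which is the step that distinguishes this loop from the linear filter.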
  • The Process Model: In a simple example we attempt to estimate a scalar random constant, a voltage for example. Let's assume that we have the ability to take measurements of the constant, but that the measurements are corrupted by a 0.1 volt RMS white measurement noise (e.g. our analog to digital converter is not very accurate). In this example, our process is governed by the linear difference equation $x_k = Ax_{k-1} + Bu_k + w_k = x_{k-1} + w_k$, with a measurement $z \in \mathbb{R}^1$ that is $z_k = Hx_k + v_k = x_k + v_k$.
  • The state does not change from step to step so A=1. There is no control input so u=0. Our noisy measurement is of the state directly so H=1. (Notice that we dropped the subscript k in several places because the respective parameters remain constant in our simple model.)
  • The Filter Equations and Parameters: Our time update equations are $\hat{x}_k^- = \hat{x}_{k-1}$, $P_k^- = P_{k-1} + Q$, and our measurement update equations are
  • $K_k = P_k^-(P_k^- + R)^{-1} = \dfrac{P_k^-}{P_k^- + R}$, $\hat{x}_k = \hat{x}_k^- + K_k(z_k - \hat{x}_k^-)$, $P_k = (1 - K_k)P_k^-$.  (5.1)
  • Presuming a very small process variance, we let Q=1e-5. (We could certainly let Q=0, but assuming a small but non-zero value gives us more flexibility in “tuning” the filter as we will demonstrate below.) Let's assume that from experience we know that the true value of the random constant has a standard normal probability distribution, so we will “seed” our filter with the guess that the constant is 0. In other words, before starting we let $\hat{x}_{k-1} = 0$. Similarly we need to choose an initial value for $P_{k-1}$, call it $P_0$. If we were absolutely certain that our initial state estimate $\hat{x}_0 = 0$ was correct, we would let $P_0 = 0$. However, given the uncertainty in our initial estimate $\hat{x}_0$, choosing $P_0 = 0$ would cause the filter to initially and always believe $\hat{x}_k = 0$. As it turns out, the alternative choice is not critical. We could choose almost any $P_0 \neq 0$ and the filter would eventually converge. It is convenient, for example, to start with $P_0 = 1$.
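  • The random-constant example can be simulated with a few lines of Python. This is a sketch under stated assumptions: the true voltage, the number of samples, and the random seed are hypothetical; R is set to 0.01, the square of the 0.1 volt RMS measurement noise; Q is 1e-5; and the filter starts from an initial estimate of 0 with a non-zero initial error covariance of 1.

```python
import random

def estimate_constant(measurements, q=1e-5, r=0.01, x0=0.0, p0=1.0):
    """Scalar Kalman filter with A = H = 1 and no control input."""
    x_hat, p = x0, p0
    history = []
    for z in measurements:
        # Time update: the state is modeled as constant.
        x_prior = x_hat
        p_prior = p + q
        # Measurement update.
        gain = p_prior / (p_prior + r)
        x_hat = x_prior + gain * (z - x_prior)
        p = (1.0 - gain) * p_prior
        history.append((x_hat, p))
    return history

rng = random.Random(42)          # seeded so the run is repeatable
true_voltage = -0.37727          # hypothetical true value of the constant
zs = [true_voltage + rng.gauss(0.0, 0.1) for _ in range(200)]

history = estimate_constant(zs)
final_estimate, final_p = history[-1]
print(final_estimate, final_p)
```

  The error covariance in `history` falls rapidly from its initial value and then levels off at a small non-zero value set by Q and R, while the estimate settles near the true voltage.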
  • See, Brown92 Brown, R. G. and P. Y. C. Hwang. 1992. Introduction to Random Signals and Applied Kalman Filtering, Second Edition, John Wiley & Sons, Inc; Gelb74 Gelb, A. 1974. Applied Optimal Estimation, MIT Press, Cambridge, Mass.; Grewal93 Grewal, Mohinder S., and Angus P. Andrews (1993). Kalman Filtering Theory and Practice. Upper Saddle River, N.J. USA, Prentice Hall; Jacobs93 Jacobs, O. L. R. 1993. Introduction to Control Theory, 2nd Edition. Oxford University Press; Julier96 Julier, Simon and Jeffrey Uhlman. “A General Method of Approximating Nonlinear Transformations of Probability Distributions,” Robotics Research Group, Department of Engineering Science, University of Oxford [cited 14 Nov. 1995]. Available from˜siju/work/publications/; Kalman60 Kalman, R. E. 1960. “A New Approach to Linear Filtering and Prediction Problems,” Transaction of the ASME—Journal of Basic Engineering, pp. 35-45 (March 1960); Lewis86 Lewis, Richard. 1986. Optimal Estimation with an Introduction to Stochastic Control Theory, John Wiley & Sons, Inc; Maybeck79 Maybeck, Peter S. 1979. Stochastic Models, Estimation, and Control, Volume 1, Academic Press, Inc; Sorenson70 Sorenson, H. W. 1970. “Least-Squares estimation: from Gauss to Kalman,” IEEE Spectrum, vol. 7, pp. 63-68, July 1970. See, also: “A New Approach for Filtering Nonlinear Systems” by S. J. Julier, J. K. Uhlmann, and H. F. Durrant-Whyte, Proceedings of the 1995 American Control Conference, Seattle, Wash., Pages:1628-1632. Available from˜siju/work/publications/; Simon Julier's home page at˜siju/; “Fuzzy Logic Simplifies Complex Control Problems”, Tom Williams, Computer Design, Mar. 1, 1991; “Neural Network And Fuzzy Systems—A Dynamical Systems Approach To Machine Intelligence”, Bart Kosko; Prentice Hall 1992; Englewood Cliffs, N.J.; pp. 13, 18, 19; B. 
Krogh et al., “Integrated Path Planning and Dynamic Steering Control for Autonomous Vehicles,” 1986; Brockstein, A., “GPS-Kalman-Augmented Inertial Navigation System Performance,” Naecom '76 Record, pp. 864-868, 1976; Brooks, R., “Solving the Fine-Path Problem by Good Representation of Free Space,” IEEE Transactions on Systems, Man, and Cybernetics, pp. 190-197, March-April, 1983; Brown, R., “Kalman Filtering Study Guide-A Guided Tour,” Iowa State University, pp. 1-19, 1984; Brown, R., Random Signal Analysis & Kalman Filtering, Chapter 5, pp. 181-209, no date; D. Kuan et al., “Model-based Geometric Reasoning for Autonomous Road Following,” pp. 416-423, 1987; D. Kuan, “Autonomous Robotic Vehicle Road Following,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 647-658, 1988; D. Touretzky et al., “What's Hidden in the Hidden Layers?,” Byte, pp. 227-233, August 1989; Data Fusion in Pathfinder and Travtek, Roy Sumner, VNIS '91 conference, October 20-23, Dearborn, Mich.; Database Accuracy Effects on Vehicle Positioning as Measured by the Certainty Factor, R. Borcherts, C. Collier, E. Koch, R. Bennet, VNIS '91 conference from October 20-23, Dearborn, Mich.; Daum, F., et al., “Decoupled Kalman Filters for Phased Array Radar Tracking,” IEEE Transactions on Automatic Control, pp. 269-283, March 1983; Denavit, J. et al., “A Kinematic Notation for Lower-Pair Mechanisms Bases on Matrices,” pp. 215-221, June, 1955; Dickmanns, E. et al., “Guiding Land Vehicles Along Roadways by Computer Vision”, The Tools for Tomorrow, Oct. 23, 1985; Edward J. Krakiwsky, “A Kalman Filter for Integrating Dead Reckoning, Map Matching and GPS Positioning”, IEEE Plans '88 Position Location and Navigation Symposium Record, Kissemee, Fla. USA, Nov. 29-Dec. 2, 1988, pp. 39-46; Fuzzy Systems and Applications, United Signals and Systems, Inc., Bart Kosko with Fred Watkins, Jun. 5-7, 1991; IEEE Journal of Robotics & Automation, vol. 4, No. 4, August. 1988, IEEE (New York) J. 
LeM “Domain-dependent reasoning for visual navigation of roadways, pp. 419-427 (Nissan) Mar. 24, 1988; J. Crowley, “Part 3: Knowledge Based Supervision of Robotics Systems,” 1989 IEEE Conference on Robotics and Automation, pp. 37-42, 1989; Kaczmarek, K. W., “Cellular Networking: A Carrier's Perspective”, 39th IEEE Vehicular Technology Conference, May 1, 1989, vol. 1, pp. 1-6; Knowledge Representation in Fuzzy Logic, Lotfi A. Zadeh, IEEE Transactions on Knowledge and Data Engineering, vol. 1, No. 1, March 1989; Sennott, J. et al., “A Queuing Model for Analysis of A Bursty Multiple-Access Communication Channel,” IEEE, pp. 317-321, 1981; Sheridan, T. “Three Models of Preview Control,” IEEE Transactions on Human Factors in Electronics, pp. 91-102, June 1966; Sheth, P., et al., “A Generalized Symbolic Notation for Mechanism,” Transactions of the ASME, pp. 102-112, Febuary 1971; Sorenson, W., “Least-Squares estimation: From Gauss to Kalman,” IEEE Spectrum, pp. 63-68, July 1970; “Automobile Navigation System Using Beacon Information” pp. 139-145; W. Uttal, “Teleoperators,” Scientific American, pp. 124-129, December 1989; Wareby, Jan, “Intelligent Signaling: FAR & SS7”, Cellular Business, pp. 58, 60 and 62, July 1990; Wescon/87 Conference Record, vol. 31, 1987, (Los Angeles, US) M. T. Allison et al “The next generation navigation system”, pp. 941-947; Ekaterina L.-Rundblad, Alexei Maidan, Peter Novak, Valeriy Labunets, Fast Color Wavelet-Haar Hartley-Prometheus Transforms For Image Processing,; Richard Tolimieri and Myoung An, Group Filters And Image Processing,; Daniel N. Rockmore, Recent Progress And Applications In Group FFTs,; Thomas TheuβI and Robert F. Tobler and Eduard Gröller, “The Multi-Dimensional Hartley Transform as a Basis for Volume Rendering”, M; Krebs, “Cars That Tell You Where To Go,” The New York Times, Dec. 15, 1996, section 11, p. 1; L. Kraar, “Knowledge Engineering,” Fortune, Oct. 28, 1996, pp. 163-164; S. 
Heuchert, “Eyes Forward: An ergonomic solution to driver information overload,” Society of Automobile Engineering, September 1996, pp. 27-31; J. Braunstein, “Airbag Technology Take Off,” Automotive & Transportation Interiors, August 1996, p. 16; I. Adcock, No Longer Square,” Automotive & Transportation Interiors, August 1996, p. 38. See also, U.S. patent Nos. (expressly incorporated herein by reference): U.S. Pat. Nos. 3,582,926; 4,291,749; 4,314,232; 4,337,821; 4,401,848; 4,407,564; 4,419,730; 4,441,405; 4,451,887; 4,477,874; 4,536,739; 4,582,389; 4,636,782; 4,653,003; 4,707,788; 4,731,769; 4,740,779; 4,740,780; 4,752,824; 4,787,039; 4,795,223; 4,809,180; 4,818,048; 4,827,520; 4,837,551; 4,853,687; 4,876,594; 4,914,705; 4,967,178; 4,988,976; 4,995,258; 4,996,959; 5,006,829; 5,043,736; 5,051,735; 5,070,323; 5,070,931; 5,119,504; 5,198,797; 5,203,499; 5,214,413; 5,214,707; 5,235,633; 5,257,190; 5,274,560; 5,278,532; 5,293,115; 5,299,132; 5,334,974; 5,335,276; 5,335,743; 5,345,817; 5,351,041; 5,361,165; 5,371,510; 5,400,045; 5,404,443; 5,414,439; 5,416,318; 5,422,565; 5,432,904; 5,440,428; 5,442,553; 5,450,321; 5,450,329; 5,450,613; 5,475,399; 5,479,482; 5,483,632; 5,486,840; 5,493,658; 5,494,097; 5,497,271; 5,497,339; 5,504,622; 5,506,595; 5,511,724; 5,519,403; 5,519,410; 5,523,559; 5,525,977; 5,528,248; 5,528,496; 5,534,888; 5,539,869; 5,547,125; 5,553,661; 5,555,172; 5,555,286; 5,555,502; 5,559,520; 5,572,204; 5,576,724; 5,579,535; 5,627,547; 5,638,305; 5,648,769; 5,650,929; 5,653,386; 5,654,715; 5,666,102; 5,670,953; 5,689,252; 5,691,695; 5,702,165; 5,712,625; 5,712,640; 5,714,852; 5,717,387; 5,732,368; 5,734,973; 5,742,226; 5,752,754; 5,758,311; 5,777,394; 5,781,872; 5,919,239; 6,002,326; 6,013,956; 6,078,853; 6,104,101; and 6,449,535.
  • One embodiment of the present invention thus advances the art by explicitly communicating reliability or risk information to the user. Therefore, in addition to communicating an event or predicted event, the system also computes or determines a reliability of the information and outputs this information. The reliability referred to herein generally is unavailable to the original detection device, though such device may generate its own reliability information for a sensor reading.
  • Therefore, a user interface according to this embodiment is improved by outputting information relating to both the event and a reliability or risk with respect to that information.
  • According to a preferred embodiment of the invention, a vehicle travel information system is provided, for example integrated with a vehicular navigation system. In a symmetric peer-to-peer model, each vehicle includes both environmental event sensors and a user interface, but the present invention is not dependent on both aspects being present in a device. As the vehicle travels, and as time advances, its context sphere is altered. For any context sphere, certain events or sensed conditions will be most relevant. These most relevant events or sensed conditions, to the extent known by the system, are then output through a user interface. However, often, the nature or existence of a relevant or potentially relevant event is unreliable, or reliance thereon entails risk.
  • In the case of a vehicle traveling along a roadway, there are two particular risks to analyze: first, that the recorded event may not exist (false positive), and second, that an absence of indication of an event is in error (false negative). For example, the degree of risk may be indicated by an indication of color (e.g., red, yellow, green) or magnitude (e.g., a bar graph or dial).
  • In many cases, the degree of risk is calculable, and thus may be readily available. For example, if the sensed event is a detection of police radar, reliability may be inferred from the time since the last recording of an event. If a car is traveling along a highway, and receives a warning of traffic enforcement radar from a car one mile ahead, there is a high degree of certainty that the traffic enforcement radar will actually exist as the vehicle proceeds along the highway. Further, if the traffic radar is in a fixed location, there is a high degree of certainty that there is no traffic enforcement radar closer than one mile. On the other hand, if a warning of traffic radar at a given location is two hours old, then the risk of reliance on this information is high, and the warning should be deemed general and advisory of the nature of risks in the region. Preferably, as such a warning ages, the temporal proximity of the warning is spread from its original focus. By contrast, if the warning relates to a pothole in a certain lane on the highway, the temporal range of risk is much broader: even a week later, the reliability of the continued existence at that location remains high. However, over the course of a year, the reliability wanes. On the other hand, while there may be a risk of other potholes nearby, the particular detected pothole would not normally move. The algorithm may also be more complex. For example, if a traffic accident occurs at a particular location, there are generally acceptable predictions of the effect of the accident on road traffic for many hours thereafter. These include rubbernecking, migrations of the traffic pattern, and secondary accidents. These considerations may be programmed, and the set of events and datapoints used to predict spatial and temporal effects, as well as the reliability of the existence of such effects. This, in turn, may be used to advise a traveler to take a certain route to a destination.
  • Eventually, the reliability of the information is inferred to be so low as to cause an expiration of the event, although preferably a statistical database is maintained to indicate geographic regional issues broadly.
  • Therefore, the system and method according to the present invention provides an output that can be considered “two dimensional” (or higher dimensional); the nature of the warning, and the reliability of the warning. In conjunction, the system may therefore output a reliability of an absence of warning. In order to conserve communications bandwidth, it is preferred that an absence of warning is inferred from the existence of a communications channel with a counterpart, along with a failure of a detection of an event triggering a warning. Alternately, such communications may be explicit.
  • The present invention can provide a mobile warning system having a user interface for conveying an event warning and an associated reliability or risk of reliance on the warning. Preferably, the reliability or risk of reliance is assessed based on a time between original sensing and proximity. The reliability may also be based on the nature of the event or sensed condition. An intrinsic reliability of the original sensed event or condition may also be relayed, as distinct from the reliability or risk of reliance assuming the event or condition to have been accurately sensed.
  • In order to determine risk, often statistical and probabilistic techniques may be used. Alternately, non-linear techniques, such as neural networks, may be employed. In employing a probabilistic scheme, a sensor reading at time zero, and the associated intrinsic probability of error are stored. A model is associated with the sensor reading to determine a decay pattern. Thus, in the case of traffic enforcement radar, the half-life for a “radar trap” for K band radar being fixed in one location is, for example, about 5 minutes. Thereafter, the enforcement officer may give a ticket, and proceed up the road. Thus, for times less than three minutes, the probability of the traffic enforcement radar remaining in fixed position is high. For this same time-period, the probability that the traffic enforcement officer has moved up the road against the direction of traffic flow is low. A car following 3 miles behind a reliable sensor at 60 mph would therefore have a highly reliable indication of prospective conditions. As the time increases, so does the risk; a car following ten miles behind a sensor would only have a general warning of hazards, and a general indication of the lack thereof. However, over time, a general (and possibly diurnal or other cyclic time-sensitive variation) risk of travel within a region may be established, to provide a baseline. It is noted that the risks are not limited to traffic enforcement radar or laser. Rather, the scheme according to the present invention is generalized to all sorts of risks. For example, a sensor may detect or predict sun glare. In this case, a model would be quite accurate for determining changes over time, and assuming a reliable model is employed, this condition could generally be accurately predicted.
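  • The probabilistic scheme described above, in which a sensor reading at time zero is stored with its intrinsic reliability and a model supplies a decay pattern, can be sketched as follows. The exponential half-life form is one assumed decay model among many possible ones, and the initial reliability of 0.95 and the function names are hypothetical. The 5 minute half-life, the 60 mph speed, and the 3 mile and 10 mile following distances are taken from the example in the text.

```python
def warning_age_minutes(distance_miles, speed_mph):
    """Time for a following vehicle to reach the point where the event was sensed."""
    return 60.0 * distance_miles / speed_mph

def reliability(initial_reliability, age_minutes, half_life_minutes):
    """Assumed exponential decay: reliability halves every half-life."""
    return initial_reliability * 0.5 ** (age_minutes / half_life_minutes)

RADAR_TRAP_HALF_LIFE_MIN = 5.0   # example half-life for a fixed K band radar trap
INTRINSIC_RELIABILITY = 0.95     # hypothetical reliability of the original reading

near_age = warning_age_minutes(distance_miles=3.0, speed_mph=60.0)
far_age = warning_age_minutes(distance_miles=10.0, speed_mph=60.0)

near = reliability(INTRINSIC_RELIABILITY, near_age, RADAR_TRAP_HALF_LIFE_MIN)
far = reliability(INTRINSIC_RELIABILITY, far_age, RADAR_TRAP_HALF_LIFE_MIN)
print(near_age, near, far_age, far)
```

  A different event type, such as a pothole, would use the same structure with a much longer half-life, and an expiration rule could drop the event once the computed reliability falls below a chosen floor.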
  • Another example is road flooding. This may be detected, for example, through the use of optical sensors, tire drag sensors, “splash” sensors, or other known sensors. In this case, the relevant time-constant for onset and decay will be variable, although for a given location, the dynamics may be modeled with some accuracy, based on sensed actual conditions, regional rainfall, ground saturation, and particular storm pattern. Therefore, a puddle or hydroplaning risk may be communicated to the driver in terms of location, likely magnitude, and confidence. It is noted that these three independent parameters need not all be conveyed to the user. For example, the geographic proximity to an event location may be used to trigger an output. Therefore, no independent output of location may be necessary in this case. In some cases, the magnitude of the threat is relevant, in other cases it is not. In many present systems (e.g., radar detection), threat magnitude is used as a surrogate for risk. However, it is well understood that there are high magnitude artifacts, and low magnitude true threats, and thus this paradigm has limited basis for use. The use of risk or confidence as an independent factor may be express or intermediate. Thus, a confidence threshold may be internally applied before communicating an event to the user. In determining or predicting risk or confidence, it may be preferred to provide a central database. Therefore, generally more complex models may be employed, supported by a richer data set derived from many measurements over an extended period of time. The central database may either directly perform the necessary computations, or convey an appropriate model, preferably limited to the context (e.g., geography, time, general environmental conditions), for local calculation of risk. The incorporated references relate, for example, to methods and apparatus which may be used as part of, or in conjunction with the present invention. 
Therefore, it is understood that the present invention may integrate other systems, or be integrated in other systems, having complementary, synergistic, or otherwise related functions. For example, common sensors, antennas, processors, memory, communications hardware, subsystems and the like may provide a basis for combination, even if the functions are separate. The techniques according to the present invention may be applied to other circumstances. Therefore, it is understood that the present invention has, as an object, to provide a user interface harnessing the power of statistical methods. Therefore, it is seen that, as an aspect of the present invention, a user interface, a method of providing a user interface, computer software for generating a human-computer interface, and a system providing such a user interface each present a prediction of a state as well as an indication of a statistical reliability of the prediction. Within a vehicular environment, the statistical analysis according to the present invention may also be used to improve performance and the user interface of other systems. In particular, modern vehicles have a number of indicators and warnings. In most known systems, warnings are provided at pre-established thresholds. According to the present invention, a risk analysis may be performed on sensor and other data to provide further information for the user, e.g., an indication of the reliability of the sensor data, or the reliability under the circumstances of the sensor data as a basis for decision. (For example, a temperature sensor alone does not indicate whether an engine is operating normally.)
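  • By way of a hedged example, the internally applied confidence threshold and the proximity trigger discussed above may be sketched as a simple filter. The record layout, field names, and default threshold values are hypothetical and chosen only for illustration. Note that magnitude is carried in the record but is deliberately not used as a surrogate for confidence.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alert:
    kind: str          # nature of the event, e.g. "flood"
    distance_m: float  # distance from the vehicle to the event location
    magnitude: float   # sensed magnitude; not used as a proxy for risk
    confidence: float  # estimated probability that the event is real


def alerts_to_show(alerts: List[Alert],
                   min_confidence: float = 0.6,
                   trigger_distance_m: float = 1000.0) -> List[Alert]:
    """Keep alerts that pass the confidence threshold and are close enough
    to trigger an output, ordered nearest first."""
    shown = [a for a in alerts
             if a.confidence >= min_confidence
             and a.distance_m <= trigger_distance_m]
    return sorted(shown, key=lambda a: a.distance_m)


if __name__ == "__main__":
    sample = [
        Alert("flood", 400.0, 0.3, 0.80),  # low magnitude, high confidence
        Alert("radar", 200.0, 0.9, 0.30),  # high magnitude artifact
        Alert("glare", 5000.0, 0.5, 0.90),  # too far away to trigger
        Alert("ice", 100.0, 0.7, 0.95),
    ]
    print([a.kind for a in alerts_to_show(sample)])
```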
  • Fourth Embodiment
  • The present example provides a mobile telecommunications device having a position detector, which may be absolute, relative, hybrid, or other type, and preferably a communications device for communicating information, typically location relevant information. The device may serve as a transmitter, transmitting information relevant to the location (or prior locations) of the device, a receiver, receiving information relevant to the location (or prospective location) of the device, or a composite. In the case of a transmitter device or stand-alone device, a sensor is provided to determine a condition of or about the device or its context. This sensor may populate a map or mapping system with historical map data.
  • During use, a receiving device seeks to output location context-relevant information to the user, and therefore in this embodiment includes a human user interface. Typically, in a vehicle having a generally linear or highly constrained type path, a position output is not a critical feature, and may be suppressed in order to simplify the interface. Rather, a relative position output is more appropriate, indicating a relative position (distance, time, etc.) with respect to a potential contextually relevant position. In addition, especially in systems where a plurality of different types of sensors or sensed parameters are available, the nature of the relevant context is also output. Further, as a particular feature of the present invention, a risk or reliability assessment is indicated to the user. This risk or reliability assessment is preferably statistically derived, although it may be derived through other known means, for example Boolean analysis, fuzzy logic, or neural networks. For example, the device may provide weather information to the user. Through one or more of meteorological data from standard reporting infrastructure (e.g., NOAA, Accuweather®, etc.), mobile reporting nodes (e.g., mobile devices having weather sensors), satellite data, and other weather data sources, a local weather map is created, preferably limited to contextual relevance. In most cases, this weather map is stored locally; however, if the quality of service for a communications link may be assured, a remote database system serving one or more devices may be provided. For example, a cellular data communications system may be used to communicate with the Internet or a service provider.
  • The mobile unit, in operation, determines its position, and, through explicit user input and/or inferential analysis, determines the itinerary or expected path of the device and time sequence. The device (or associated systems) then determines the available weather information for the route and anticipated itinerary (which may itself be dependent on the weather information and/or reaction thereto). This available information is then modeled, for example using a statistical model as described hereinabove, to predict the forthcoming weather conditions for the device or transporting vehicle.
  • The device then determines the anticipated conditions and sorts them by relevance. In this case, both positive and negative information may be useful, i.e., a warning about bad weather, ice, freezing road surfaces, fog, sand-storms, rain, snow, sleet, hail, sun glare, etc., and an indication of dry, warm, well-illuminated road surfaces may both be useful information.
  • In addition, through the analysis, a number of presumptions and predictions are made, for example using a chain. Therefore, while the system may predict a most likely state of affairs, this alone does not provide sufficient information for full reliance thereon. For example, the present road surface freezing conditions thirty miles ahead on a road may be a poor indicator of the road conditions when the device is at that position. In addition to changes in the weather, human action may be taken, such as road salt, sand, traffic, etc., which would alter the conditions, especially in response to a warning. On the other hand, a report of freezing road conditions one mile ahead would generally have high predictive value for the actual road conditions when the device is at that location, assuming that the vehicle is traveling in that direction. In many cases, there is too much raw information to effectively display to the user all relevant factors in making a reliability or risk determination. Thus, the device outputs a composite estimation of the reliability or risk, which may be a numeric or non-parametric value. This is output in conjunction with the nature of the alert and its contextual proximity. As stated above, there will generally be a plurality of events, each with an associated risk or reliability and location. The relevance of an event may be predicted based on the dynamics of the vehicle in which the device is transported and the nature of the event. Thus, if the vehicle requires 170 feet to stop from a speed of 60 MPH, a warning which might trigger a panic stop should be issued between 170 and 500 feet in advance. If the warning is triggered closer than 170 feet, preferably the warning indicates that an evasive maneuver will be necessary. In this case, the risk indicator includes a number of factors. First, there is the reliability of the data upon which the warning is based.
Second, there is the reliability of the predictive model which extrapolates from the time the raw data is acquired to the conjunction of the device and the location of the event. Third, there is an assessment of the relative risks of responding to a false positive versus failing to respond in the case of a false negative. Other risks may also be included in the analysis. Together, the composite risk is output, for example as a color indicator. Using, for example, a tricolor (red-green-blue) light emitting diode (LED) or bicolor LED (red-green), a range of colors may be presented to the user. Likewise, in an audio alert, the loudness or harmonic composition (e.g., harmonic distortion) of a tone or alert signal may indicate the risk or reliability. (In the case of loudness, preferably a microphone measures ambient noise to determine a minimum loudness necessary to indicate an alert).
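  • One possible way to combine the three factors described above is sketched below. The multiplication of the two reliabilities, the expected-cost comparison, and the linear mapping onto a bicolor LED are all assumptions made for illustration; the 170 and 500 foot figures are taken from the stopping-distance example above.

```python
from typing import Tuple


def event_probability(data_reliability: float, model_reliability: float) -> float:
    """Chance the event exists on arrival, assuming independent factors."""
    return data_reliability * model_reliability


def should_alert(p_event: float, cost_missed_event: float,
                 cost_false_alarm: float) -> bool:
    """Alert when the expected cost of staying silent exceeds the expected
    cost of a false alarm."""
    return p_event * cost_missed_event > (1.0 - p_event) * cost_false_alarm


def bicolor_led(p_event: float) -> Tuple[int, int]:
    """Map composite risk onto (red, green) drive levels in 0..255."""
    p = min(max(p_event, 0.0), 1.0)
    red = round(255 * p)
    return red, 255 - red


def warning_kind(distance_ft: float, stopping_ft: float = 170.0,
                 max_lead_ft: float = 500.0) -> str:
    """Choose the type of warning from the distance to the event."""
    if distance_ft < stopping_ft:
        return "evasive maneuver"  # too close to stop in time
    if distance_ft <= max_lead_ft:
        return "stop"
    return "advisory"


if __name__ == "__main__":
    p = event_probability(0.9, 0.5)
    print(p, should_alert(p, 10.0, 1.0), bicolor_led(p), warning_kind(300.0))
```

The asymmetric costs allow an alert to be issued for an unlikely but severe hazard, while a likely but trivial event may be suppressed.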
  • The position detector is preferably a GPS or combined GPS-GLONASS receiver, although a network position detection system (e.g., Enhanced 911 type system) may also be employed. Preferably, the position detector achieves an accuracy of ±30 meters 95% of the time, and preferably provides redundant sensors, e.g., GPS and inertial sensors, in case of failure or error of one of the systems. However, for such purposes as pothole reporting, positional accuracies of 1 to 3 meters are preferred. These may be obtained through a combination of techniques, and therefore the inherent accuracy of any one technique need not meet the overall system requirement. The position detector may also be linked to a mapping system and possibly a dead reckoning system, in order to pinpoint a position with a geographic landmark. Thus, while precise absolute coordinate measurements of position may be used, it may also be possible to obtain useful data at reduced cost by applying certain presumptions to available data. In an automotive system, steering angle, compass direction, and wheel revolution information may be available, thereby giving a rough indication of position from a known starting point. When this information is applied to a mapping system, a relatively precise position may be estimated. Therefore, the required precision of another positioning system used in conjunction need not be high, in order to provide high reliability position information. For example, where it is desired to map potholes, positional accuracy of 10 cm may be desired, far more precise than might be available from a normal GPS receiver mounted in a moving automobile. Systems having such accuracy may then be used as part of an automated repair system. However, when combined with other data, location and identification of such events is possible. 
Further, while the system may include or tolerate inaccuracies, it is generally desired that the system have high precision, as compensation for inaccuracies may be applied.
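  • The rough position estimate from wheel revolution and compass information, refined against a mapping system, may be sketched as follows. This is a minimal flat-plane sketch; the wheel circumference, the segment format, and the nearest-point snapping are hypothetical simplifications, and a practical system would likely use a proper map-matching method.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float]  # (east_m, north_m) from a known starting point


def dead_reckon(start: Point, segments: Iterable[Tuple[float, float]],
                wheel_circumference_m: float = 2.0) -> Point:
    """Integrate (wheel_revolutions, compass_heading_deg) segments.

    Heading is measured clockwise from north, as on a compass.
    """
    x, y = start
    for revolutions, heading_deg in segments:
        distance = revolutions * wheel_circumference_m
        theta = math.radians(heading_deg)
        x += distance * math.sin(theta)  # east component
        y += distance * math.cos(theta)  # north component
    return x, y


def snap_to_road(estimate: Point, road_points: List[Point]) -> Point:
    """Return the mapped road point closest to the rough estimate."""
    return min(road_points, key=lambda p: math.dist(p, estimate))


if __name__ == "__main__":
    rough = dead_reckon((0.0, 0.0), [(50, 0.0), (25, 90.0)])
    print(rough, snap_to_road(rough, [(0.0, 0.0), (48.0, 101.0), (60.0, 90.0)]))
```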
  • A typical implementation of the device provides a memory for storing events and respective locations. Preferably, further information is also stored, such as a time of the event, its character or nature, and other quantitative or qualitative aspects of the information or its source and/or conditions of acquisition. This memory may be a solid state memory or module (e.g., 64-256 MB Flash memory), rotating magnetic and/or optical memory devices, or other known types of memory. The events to be stored may be detected locally, such as through a detector for radar and/or laser emission source, radio scanner, traffic or road conditions (mechanical vehicle sensors, visual and/or infrared imaging, radar or LIDAR analysis, acoustic sensors, or the like), places of interest which may be selectively identified, itinerary stops, and/or fixed locations. The events may also be provided by a remote transmitter, with no local event detection. Therefore, while means for identifying events having associated locations is a part of the system as a whole, such means need not be included in every apparatus embodying the invention.
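  • A memory for storing events and respective locations may be sketched, purely for illustration, as the following in-memory structure. The field names and the great-circle proximity query are assumptions; the storage medium and record format are not limited to those shown.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


@dataclass
class Event:
    lat: float              # degrees
    lon: float              # degrees
    kind: str               # character or nature of the event
    timestamp_s: float      # time of the event
    source: str = "local"   # "local" detector or "remote" transmitter
    detail: Dict[str, float] = field(default_factory=dict)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


class EventStore:
    """Keeps events and answers 'what is stored near this position?'."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def add(self, event: Event) -> None:
        self._events.append(event)

    def near(self, lat: float, lon: float, radius_m: float) -> List[Event]:
        return [e for e in self._events
                if haversine_m(lat, lon, e.lat, e.lon) <= radius_m]


if __name__ == "__main__":
    store = EventStore()
    store.add(Event(40.0, -74.0, "pothole", 0.0, detail={"shock_g": 1.8}))
    print(store.near(40.0005, -74.0, radius_m=100.0))
```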
  • Radar detectors typically are employed to detect operating emitters of X (10.5 GHz), K (25 GHz) and Ka (35 GHz) radar emissions from traffic control devices or law enforcement personnel for detecting vehicle speed by the Doppler effect. These systems typically operate as superheterodyne receivers which sweep one or more bands, and detect a wave having an energy significantly above background. As such, these types of devices are subject to numerous sources of interference, accidental, intentional, and incidental. A known system, Safety Warning System (SWS) licensed by Safety Warning System L.C., Englewood Fla., makes use of such radar detectors to specifically warn motorists of identified road hazards. In this case, one of a set of particular signals is modulated within a radar band by a transmitter operated near the roadway. The receiver decodes the transmission and warns the driver of the hazard.
  • LIDAR devices emit an infrared laser signal, which is then reflected off a moving vehicle and analyzed for delay, which relates to distance. Through successive measurements, a speed can be calculated. A LIDAR detector therefore seeks to detect the characteristic pulsatile infrared energy. Police radios employ certain restricted frequencies, and in some cases, police vehicles continuously transmit a signal. While certain laws restrict interception of messages sent on police bands, it is believed that the mere detection and localization of a carrier wave is not and may not be legally restricted. These radios tend to operate below 800 MHz, and thus a receiver may employ standard radio technologies. Potholes and other road obstructions and defects have two characteristics. First, they adversely affect vehicles which encounter them. Second, they often cause a secondary effect of motorists seeking to avoid a direct encounter or damage, by slowing or executing an evasive maneuver. These obstructions may therefore be detected in three ways: first, by analyzing the suspension of the vehicle for unusual shocks indicative of such events; second, by analyzing speed and steering patterns of the subject vehicle and possibly surrounding vehicles; and third, by a visual, ultrasonic, or other direct sensor for detecting the pothole or other obstruction. Such direct sensors are known; however, their effectiveness is limited, and therefore an advance mapping of such potholes and other road obstructions greatly facilitates avoiding both vehicle damage and unsafe or emergency evasive maneuvers. An advance mapping may also be useful in remediation of such road hazards, as well.
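  • The relationship between LIDAR pulse delay, distance, and speed, and the detection of unusual suspension shocks, may be illustrated with the following sketch. The sampling interval, the acceleration threshold, and the function names are hypothetical values chosen for the example.

```python
from typing import List, Sequence

SPEED_OF_LIGHT_M_S = 299_792_458.0


def range_from_delay(round_trip_s: float) -> float:
    """Distance to the target; the pulse travels out and back."""
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0


def closing_speed(delays_s: Sequence[float], interval_s: float) -> float:
    """Average closing speed (m/s) from successive, evenly spaced pulses.

    Positive when the target is approaching.
    """
    if len(delays_s) < 2:
        raise ValueError("at least two measurements are required")
    first = range_from_delay(delays_s[0])
    last = range_from_delay(delays_s[-1])
    return (first - last) / (interval_s * (len(delays_s) - 1))


def shock_indices(vertical_accel_g: Sequence[float],
                  threshold_g: float = 1.5) -> List[int]:
    """Indices of suspension samples whose magnitude exceeds the threshold."""
    return [i for i, a in enumerate(vertical_accel_g) if abs(a) > threshold_g]


if __name__ == "__main__":
    # Target at 300 m, then 297 m one tenth of a second later.
    delays = [2 * 300.0 / SPEED_OF_LIGHT_M_S, 2 * 297.0 / SPEED_OF_LIGHT_M_S]
    print(closing_speed(delays, interval_s=0.1))
    print(shock_indices([0.1, -0.2, 2.3, 0.0, -1.9]))
```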
  • Traffic jams occur for a variety of reasons. Typically, the road carries traffic above a threshold, and for some reason the normal traffic flow patterns are disrupted. Therefore, there is a dramatic slowdown in the average vehicle speed, and a reduced throughput. Because of the reduced throughput, even after the cause of the disruption has abated, the roadways may take minutes to hours to return to normal. Therefore, it is typically desired to have advance warnings of disruptions, which include accidents, icing, rain, sun glare, lane closures, road debris, police action, exits and entrances, and the like, in order to allow the driver to avoid the involved region or plan accordingly. Abnormal traffic patterns may be detected by comparing a vehicle speed to the speed limit or a historical average speed, by a visual evaluation of traffic conditions, or by broadcast road advisories. High traffic conditions are associated with braking of traffic, which in turn results in deceleration and the illumination of brake lights. Brake lights may be determined by both the specific level of illumination and the center brake light, which is not normally illuminated. Deceleration may be detected by an optical, radar or LIDAR sensor for detecting the speed and/or acceleration state of nearby vehicles.
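  • Detection of an abnormal traffic pattern by comparing vehicle speed to the speed limit or a historical average speed may be sketched as below. The choice of the lower of the two references and the 60% ratio are assumptions for illustration only.

```python
def traffic_state(speed_mph: float, historical_mph: float,
                  speed_limit_mph: float, slow_ratio: float = 0.6) -> str:
    """Classify the current flow as "congested" or "normal".

    The reference speed is the lower of the historical average and the
    speed limit, so a road that is always slow is not flagged merely for
    being under the posted limit.
    """
    reference = min(historical_mph, speed_limit_mph)
    if reference <= 0:
        raise ValueError("reference speed must be positive")
    return "congested" if speed_mph < slow_ratio * reference else "normal"


if __name__ == "__main__":
    print(traffic_state(20.0, historical_mph=55.0, speed_limit_mph=65.0))
    print(traffic_state(50.0, historical_mph=55.0, speed_limit_mph=65.0))
```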
  • While a preferred embodiment of the present invention employs one or more sensors, broadcast advisories, including those from systems according to or compatible with the present invention, provide a valuable source of information relating to road conditions and information of interest at a particular location. Therefore, the sensors need not form a part of the core system. Further, some or all of the required sensors may be integrated with the vehicle electronics (“vetronics”), and therefore the sensors may be provided separately or as options. It is therefore an aspect of an embodiment of the invention to integrate the transceiver, and event database into a vetronics system, preferably using a digital vetronics data bus to communicate with existing systems, such as speed sensors, antilock brake sensors, cruise control, automatic traction system, suspension, engine, transmission, and other vehicle systems. According to one aspect of the invention, an adaptive cruise control system is provided which, in at least one mode of operation, seeks to optimize various factors of vehicle operation, such as fuel efficiency, acceleration, comfort, tire wear, etc. For example, an automatic acceleration feature is provided which determines a most fuel-efficient acceleration for a vehicle. Too slow an acceleration will result i