WO2002008943A2 - Method and system for finding match in database related to waveforms - Google Patents

Info

Publication number
WO2002008943A2
WO2002008943A2 PCT/US2001/022891 US0122891W WO0208943A2 WO 2002008943 A2 WO2002008943 A2 WO 2002008943A2 US 0122891 W US0122891 W US 0122891W WO 0208943 A2 WO0208943 A2 WO 0208943A2
Authority
WO
WIPO (PCT)
Prior art keywords
database
recited
recordings
represented
selected recording
Prior art date
Application number
PCT/US2001/022891
Other languages
French (fr)
Other versions
WO2002008943A3 (en)
Inventor
Steven D. Scherf
Paul E. Quinn
Original Assignee
Cddb, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US09/621,619 external-priority patent/US7228280B1/en
Application filed by Cddb, Inc. filed Critical Cddb, Inc.
Priority to JP2002514577A priority Critical patent/JP2004511838A/en
Priority to AU2001277034A priority patent/AU2001277034A1/en
Priority to EP01954813A priority patent/EP1303817A2/en
Publication of WO2002008943A2 publication Critical patent/WO2002008943A2/en
Publication of WO2002008943A3 publication Critical patent/WO2002008943A3/en
Priority to NO20030319A priority patent/NO20030319L/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data

Definitions

  • the present invention is directed to locating records in a database and, more particularly, to locating a match for a waveform in a database of records representing waveforms.
  • the traditional experience of the musical recording is listening by a small group of persons gathered together in a room.
  • the music fills the room acoustically, but there is little associated visual content, and there is only a limited interaction with the recording, consisting essentially of deciding which tracks to play and performing simple transformations on the recorded sound, such as setting the volume or applying an audio equalizer.
  • This traditional experience dates back to the early age of 78 r.p.m. musical recordings almost a century ago.
  • the traditional production of a musical recording complements the traditional experience of the recording.
  • the recording is produced in a number of recording sessions, subject to careful mixing and editing, and then released to the public. At that point, the recording is in a fixed form, nowadays an audio CD, whose purpose is to record as faithfully as possible the final sonic experience designed by its authors, the musicians, producer, and recording engineers.
  • Music videos have supplemented the traditional experience of musical recordings by allowing the association of visual content with tracks of such a recording. In practice, however, music videos have been broadcast, with all the problems of lack of user control which that implies, and they have not contributed to interactivity or participation by the consumer.
  • On-line services offer opportunities for enriching the experience associated with musical recordings.
  • the present invention is addressed to computer programs, systems, and protocols which can fulfil this promise.
  • In one aspect of the invention, software is provided which permits a computer program running on a remote host to control a compact disc (CD) player, DVD player, or the like on a user's computer.
  • the software is designed to permit the remote host both to initiate actions on the CD player and to become aware of actions which the user has initiated by other control means, such as the buttons on the CD player's front panel or a different CD player control program.
  • This aspect of the invention is a building-block for the provision of complementary entertainment for musical recordings when those recordings are fixed in the prevailing contemporary form, the audio CD.
  • In a second aspect of the invention, visual content, including interactive content, may be delivered over an on-line service in such a way that it is synchronized to the delivery of content from a musical recording.
  • Such visual content may, for example, be synchronized to the playing of an audio CD in the user's computer.
  • the visual content is thematically linked to the musical recording, for example in the manner of a music video.
  • In a third aspect of the invention, a method is provided for assigning a unique identifier to musical recordings consisting of a number of tracks.
  • a unique identifier is a useful complement to the delivery of visual content in conjunction with the playing of an audio CD in that it allows the software which delivers the visual content to be sure that the audio CD is in fact the correct CD to which the visual content corresponds. If the visual content is designed, for example, to accompany the Rosary Sonatas of Heinrich Ignaz Franz Biber, it would presumably not function well if the CD in the user's player were the soundtrack for the film Mary Poppins.
  • the unique identifier also allows a CD to be used as a key to access a premium Web area. Furthermore, the unique identifier can allow the user to be directed to an area of the Web corresponding to the CD which is in the user's machine.
  • the immensely popular on-line service generally referred to as a "chat room" may be enhanced by means of a link to a musical recording to which all persons in the room are listening.
  • the chat room experience as it exists today in on-line services has a disembodied quality by comparison with traditional face-to-face social encounters, in which there are identifiable surroundings.
  • the only experience common to the chat users today is the words of the chat as they fly by on a computer screen, and perhaps the user icons ("avatars") or other visual content occupying a small space on the screen.
  • the use of a musical recording in conjunction with a chat room opens up the possibility of restoring to the experience a degree of the shared ambience of traditional social encounters.
  • the musical recording offers a focal point that allows chat-seekers to group together by means of shared interests in a particular type of recording.
  • FIG. 1 is a block diagram of the environment in which the preferred embodiment operates.
  • FIG. 2 is a flowchart of the synchronization code of the invention.
  • FIG. 3 is a flowchart of the sequence of operations for connection to a chat room focused on a musical recording.
  • FIGS. 4A and 4B are explanatory diagrams of waveform analysis according to the present invention.
  • the preferred embodiment of this invention operates on the World Wide Web.
  • the software implementation environment provided by the World Wide Web is described in a number of books, for example, John December & Mark Ginsburg, HTML 3.2 and CGI Unleashed (1996).
  • the World Wide Web is based on a network protocol called HTTP (hypertext transfer protocol), which is described in
  • HTTP protocol must be run atop a general connection-oriented protocol, which today is generally TCP/IP, described in Douglas E. Comer, Internetworking with TCP/IP (3d ed. 1995).
  • the invention described here is not limited to HTTP running over any particular kind of network software or hardware. The principles of the invention apply to other protocols for access to remote information that may come to compete with or supplant HTTP.
  • a Web user sits at his or her computer and runs a computer program called a browser.
  • the browser sends out HTTP requests to other computers, referred to as servers.
  • the requests ask for particular items of data, referred to as resources, which are available on servers. Resources are referred to by means of uniform resource locators (URL's), character strings in a particular format defined in Berners-Lee et al., supra.
  • a URL includes both an identification of the server and an identification of a particular item of data within the server. Reacting to the requests, the servers return responses to the user's browser, and the browser acts upon those responses, generally by displaying some sort of content to the user.
  • the content portion of the responses can be a "Web page," expressed in the hypertext markup language (HTML). That language allows one to express content consisting of text interspersed with bitmap-format images and links (also known as anchors and hyperlinks). The links are further URL's to which the browser may, at the user's prompting, send further requests.
  • the responses can also include more complex commands to be interpreted by the browser, e.g., commands which result in an animation. HTML itself does not define complex commands, but rather they are considered to belong to separately-defined scripting languages, of which the two most common ones are JavaScript and VBScript.
  • In addition to extending the function of the browser by means of code written in a scripting language, it is also possible to extend the function of a browser with compiled code. Such compiled code is referred to as a "plug-in." The precise protocol for writing a plug-in is dependent on the particular browser. Plug-ins for the Microsoft browser are referred to by the name of ActiveX controls.
  • Plug-ins may be very complex.
  • a plug-in which may advantageously be used in connection with the invention is Shockwave from Macromedia. It permits animations which are part of a server response to be downloaded and played to the user.
  • Shockwave defines its own scripting language called Lingo.
  • Lingo scripts are contained within the downloadable animations which the Shockwave plug-in can play.
  • the general format of a Shockwave animation is a timeline consisting of a series of frames, together with a number of visual objects which appear, perform motions, and disappear at particular frames within the timeline.
  • Lingo scripts may be invoked in addition to predefined visual objects.
  • a preferred embodiment of the invention employs a plug-in, referred to as the command plug-in, which provides to a scripting language the ability to command in a detailed fashion the playing of a musical recording.
  • the command plug-in should provide, at a minimum, the following basic functions: (1) Start and stop play.
  • the command plug-in is preferably written in a conventional programming language such as C++.
  • the plug-in must conform to the existing standards for plug-ins, such as those required of Microsoft ActiveX objects.
  • In order to obtain the information and carry out the functions which the command plug-in makes available to the scripting language, the command plug-in relies on functions which provide control and information regarding the musical recording being played. These functions will depend on the precise source of the recording. If, as in the currently preferred embodiment, the recording is being played on an audio CD in the computer CD player, and if the browser is running under Microsoft Windows 3.1 or Windows 95, these functions would be the MCI functions, which form a part of the Win32 application programming interface. These functions are documented, for example, in Microsoft Win32 Programmer's Reference. Different functions may be provided by streaming audio receivers, as for example receivers which capture audio which is coming into the user's computer over a network connection in a suitable audio encoding format such as MPEG.
  • An important point to note about the implementation of the command plug-in is that the operations which it carries out, as for example seeks, may take times on the order of a second. It is undesirable for the command plug-in to retain control of the machine during that interval, so it is important that the plug-in relinquish control of the machine to the browser whenever a lengthy operation is undertaken, and report on the results of the operation via the asynchronous event handling capability used in the common scripting languages. Given the above summary of the functions which the command plug-in provides, a general knowledge of how to write plug-ins (e.g., of how to write
  • a command plug-in providing the functions listed above to a scripting language is a foundation on which entertainment complementary to a musical recording may be constructed.
  • the synchronization of the visual content to the audio CD proceeds as follows.
  • the visual content is provided by means of a Shockwave animation, which is downloaded from the server and displayed for the user by means of a Shockwave plug-in.
  • This downloading may take place before the animation is displayed, or alternatively it may take place as the animation is being displayed, provided the user's connection to the network is fast enough to support download at an appropriate speed.
  • the downloading is a function provided by the Shockwave plug-in itself.
  • a Lingo script executes each time a frame finishes displaying.
  • the Lingo script contains a description of the relationship which should exist between frames of the animation and segments of the musical recording, identified by track number and by time.
  • the Lingo script determines, by means of the command plug-in described above, at which track and time the play of the audio CD is. It then refers to the description in order to determine which frames of the animation correspond to that portion of the audio CD. If the current frame is not one of those frames, the Lingo script resets the time line of the animation so that the animation will begin to play at the frame which corresponds to the current position of the audio CD.
  • the frames of the animation are arranged into groups of contiguous frames. A correspondence is established between each such group of frames and a particular segment of the audio recording (box 200 in FIG. 2). At the end of each frame of the animation, the audio play position is determined (box 210). A test is done to determine whether the audio play position is within the segment of the recording that corresponds to the group of frames to which the next sequential frame belongs (box 215).
  • If it is, the playback of the animation proceeds with that next frame (box 230). If the audio play position is not within that segment, then the playback of the animation is advanced to the frame corresponding to where the audio is (boxes 220 and 225).
  • a further aspect of the invention is the ability, by making use of the command plug-in, to provide a technique for establishing a unique identifier for a recording which may be stored in mass storage, whether integrated circuit, magnetic (e.g., hard disk), or any other medium, or on a removable medium, such as an audio CD, or integrated circuit memory, such as compact flash memory, Memory Stick™, etc., accessed by a CD-ROM drive of a computer, MP3 player/recorder or any other device capable of accessing the medium.
  • the unique identifier may be based on the number and lengths of the tracks
  • the identifier could simply be a concatenation of the track lengths that can be used with a fuzzy comparison algorithm and also for more precise matching if more than one possible match is located.
  • the algorithm then divides the match error by the match number, subtracts the resulting quotient from 1, and converts the difference to a percentage which is indicative of how well the two CDs match.
  • Use of track length to create an identifier for a recording is best suited to media that have multiple tracks and preferably those that store such information in a table of contents or TOC, such as CDs and DVDs. Furthermore, use of track length or TOC data has been found to work best with fuzzy matching, but this sometimes results in finding more than one possible match.
  • An alternative or supplement for TOC data is to use the content of the recording. However, it is desirable to use a content-based identifier that is relatively small, to minimize storage space and bandwidth requirements.
  • An embodiment of the present invention uses an amplitude signature providing a content-based identifier generated from short (e.g., five-second) sample segments from multiple locations in each track (if there is more than one track in a recording), such as the beginning, middle and end.
  • An example of one such sample segment (the term sample segment is used to distinguish the segments used for generating the identifier from identified segments, i.e., segments identified in the TOC, that are commonly referred to as "tracks" on a CD) is illustrated in Fig. 4A with a waveform 410.
  • a plurality of amplitude bands or slots are defined and the number of occurrences of all segments of the waveform within each slot are counted.
  • the first step is to normalize the waveform, so that the first and last slots have at least one occurrence of the waveform.
  • the waveform 410 in Fig. 4A is normalized over the seven slots 420 illustrated in Fig. 4A to produce waveform 410b in Fig. 4B with the slots 420 separately indicated as slots 421-427.
  • 16 time samples are taken, one at each of the vertical lines.
  • This can be represented by the linear array A1 [1, 3, 2, 1, 2, 3, 4].
  • a fuzzy match may be accomplished by calculating an average of the difference between the elements of A1 and the elements of existing signature arrays.
  • one of the records in the database may have a signature array of A2 [2, 3, 4, 1, 1, 3, 3] for a difference array of [1, 0, 2, 0, 1, 0, 1] or an average difference of 5/7 or 0.714.
  • a "fuzzy match" based on average difference allows for errors in the waveform and imperfect starting locations for the signature generation. However, the average difference that is accepted as a match should be set to minimize false positives.
  • the number or length of the sample segments could be increased to reduce false positives, but this increases the time spent reading the recording and calculating the signature array.
  • an average difference of 10 has been found able to find virtually all possible matches while eliminating a significant number of false positives when using CD waveforms and three sample segments of five seconds each with 2048 slots. Under these conditions it has been found that 256 slots produces too many matches of nonsimilar waveforms and 4000 slots leaves the slots so sparsely populated that there are a large number of near matches.
  • the precise number of slots can be varied depending on the size of the sample segment(s) and the type of waveforms being sampled.
  • more precise comparison of the identifying and existing signature arrays may be performed.
  • the number of slots that match exactly or are within one occurrence of matching may be used. In the example given above, 6 out of 7 or 86% of the elements of arrays A1 and A2 match if an error of one (or one grace) is permitted and 3 out of 7 or 43% of the elements match precisely. It has been found that a better than 80% match for a one grace or a better than 70% match with no grace is likely to be an acceptable match.
  • the grace value can be increased to more than one to allow more forgiveness in matching the waveforms.
  • a unique identifier for a musical recording may be employed as a database key.
  • a site may maintain a database of information about CDs, for example information about all CDs issued by the particular record company can be maintained on that record company's site.
  • a third way of searching which is enabled by the unique identifier of the invention is for there to be a Web page which invites the user to place in the computer's CD drive the CD about which he or she is seeking information.
  • a script in the Web page computes the unique identifier corresponding to the CD and sends it to the server.
  • the server displays information about the CD retrieved from a database on the basis of that unique identifier.
  • This information may include a Web address (URL) that is related to the audio CD (e.g., that of the artists' home page), simple data such as the names of the songs, and also complementary entertainment, including potentially photographs (e.g., of the band), artwork, animations, and video clips.
  • (i) the Web browser is launched if not already running, (ii) the browser computes the CD's unique identifier and from that unique identifier derives a URL, and (iii) the browser does an HTTP get transaction on that URL.
  • An alternative application of unique identifiers for musical recordings is to employ an audio CD as a key for entering into a premium area of the Web.
  • There are presently premium areas of the Web to which people are admitted by subscription.
  • a simple form of admission based on the unique identifier is to require, before accessing a particular area of the Web, that the user place in his or her CD drive a particular CD, or a CD published by a particular company or containing the music of a particular band or artist. This is readily accomplished by means of a script which invokes the functions provided by the command plug-in and computes a unique identifier.
  • a third aspect of the invention is the connection of chat rooms with musical recordings. The goal is to provide all participants in a chat room with the same music at approximately the same time.
  • the prevailing network protocol for chat services is Internet Relay Chat (IRC), described in J. Oikarinen & D. Reed, Internet Relay Chat Protocol (Internet Request for Comments No. 1459, 1993).
  • the chat server receives messages from all of its clients and relays the messages sent in by one client to all the other clients connected in the same room as that client.
  • the messages which a client sends are typically typed in by the user who is running the client, and the messages which a client receives are typically displayed for the user who is running the client to read.
  • a chat client is customized by means of a plug-in, which we will call the chat plug-in.
  • the chat client is started up by a browser as follows (see FIG. 3).
  • the user connects by means of the browser to a central Web page (box 300) which, upon being downloaded, asks that the user insert a CD into his or her player (box 305).
  • a unique identifier of the CD is computed and communicated back to the server by using the control plug-in described above under the command of a script in the central Web page (box 310).
  • the server then employs the unique identifier to determine whether it has a chat room focused on the CD (box 315).
  • This step may be carried out by looking the unique identifier up in a database using techniques well known in the art. There exists a vast literature on connecting Web pages to databases, e.g., December & Ginsburg, supra, chapter 21. If a chat room focused on the CD exists or can be created, the server responds with the name of that chat room, and the browser starts up a chat client on the user's computer as a client of that chat room (box 320).
  • the chat room's name is set by the server to contain information about the track which the CD is playing in the other chat room clients' machines and the time at which the track started to play, as well as about the volume at which the
  • the chat client plug-in employs that information to direct the control plug-in to set the CD in the user's computer to play in such a manner that it is approximately synchronized to the CD which is playing in the other chat room clients' machines (box 320).
  • Each user in the chat room is able to control the CD which is playing in his or her machine.
  • Control actions result in the chat plug-in sending messages to the chat server which describe the control action being taken (box 325). For example, such messages may indicate a change in the position of the CD, a change in the volume, or the ejection of the CD to replace it with another.
  • the chat plug-ins running on the other users' machines upon seeing a message of this kind, replicate the action (as far as possible) on the other users' machines by using the control plug-in described above (box 330).
  • a chat room focused on a particular musical recording might allow for a voting procedure to select particular tracks.
  • a simple voting procedure would be for each chat plug-in to act upon a change message of the kind described in the preceding paragraph only when it sees two identical consecutive change messages. This would mean that in order to change the track which is being played, it would be necessary for two users to change to that track. The number two may be replaced by a higher number.
  • the messages delivered to the users of a chat can be driven from a text file rather than manual typing. This would allow a pre-recorded experience to be played back for a group of chat users. Such a technique may be used to create a pre-recorded, narrated tour of an audio CD.
  • An important advantage of the preferred embodiment as described above is that it may be used with any chat server software which supports the minimal functionality required by Internet Relay Chat or by a protocol providing similar minimum chat service. The additional software required is located in the chat client plug-in and in the central Web page, with its connection to a database of CD information.
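
The fuzzy comparison and the more precise "grace" comparison described in the definitions above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names (average_difference, grace_match_fraction, find_matches), the dictionary-shaped database, and the choice to rank multiple candidates by their one-grace fraction are all introduced here for illustration. Only the arrays A1 and A2 and the resulting figures come from the text.

```python
from typing import Dict, List, Sequence


def average_difference(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean absolute difference between two signature arrays of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("signature arrays must be non-empty and equal in length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def grace_match_fraction(a: Sequence[int], b: Sequence[int], grace: int = 0) -> float:
    """Fraction of slots whose counts differ by at most `grace` occurrences."""
    if len(a) != len(b) or not a:
        raise ValueError("signature arrays must be non-empty and equal in length")
    return sum(1 for x, y in zip(a, b) if abs(x - y) <= grace) / len(a)


def find_matches(signature: Sequence[int],
                 database: Dict[str, Sequence[int]],
                 max_avg_diff: float) -> List[str]:
    """Hypothetical two-stage lookup: a fuzzy filter on average difference,
    then, if several candidates remain, ordering by one-grace match fraction."""
    candidates = [key for key, stored in database.items()
                  if average_difference(signature, stored) <= max_avg_diff]
    if len(candidates) > 1:
        candidates.sort(
            key=lambda k: grace_match_fraction(signature, database[k], grace=1),
            reverse=True)
    return candidates


# Worked example using the arrays given in the text.
A1 = [1, 3, 2, 1, 2, 3, 4]
A2 = [2, 3, 4, 1, 1, 3, 3]
print(round(average_difference(A1, A2), 3))             # 0.714 (5/7)
print(round(grace_match_fraction(A1, A2, grace=1), 2))  # 0.86 (6 of 7 slots)
print(round(grace_match_fraction(A1, A2, grace=0), 2))  # 0.43 (3 of 7 slots)
```

The acceptance thresholds quoted in the text (an average difference of 10 with 2048 slots, better than 80% with one grace, better than 70% with no grace) would be passed in by the caller; they are not hard-coded here because the text notes that suitable values depend on the slot count and the type of waveform.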

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

To determine whether there is a record in a database corresponding to a file containing a waveform, one or more segments of a digitally sampled waveform are used to form an amplitude signature of the waveform. The amplitude signature is generated by counting the number of occurrences within the segment(s) of the waveform in each of a plurality of amplitude bands or slots. The amplitude signature of the waveform undergoes a fuzzy comparison with amplitude signatures in the database. If more than one potential match is found, a more precise comparison is made. A CD amplitude signature may be formed of approximately 2000 amplitude bands or slots from the lowest amplitude to the highest amplitude of the waveform by accumulating the occurrence of signals within each amplitude slot for all of the sample segments of the CD. The amplitude signature can be used to distinguish between multiple potential matches obtained based on table of contents (TOC) data for the CD indicating the number of tracks and the length of each.
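
The signature generation summarized in this abstract can be sketched in Python as follows. It is a minimal illustration under stated assumptions rather than the patented method: the function name amplitude_signature is hypothetical, samples are taken to be plain numeric amplitudes, and normalization is assumed to be done once over all sample segments together so that the lowest amplitude lands in the first slot and the highest in the last. The text does not say whether normalization is per segment or global.

```python
from typing import List, Sequence


def amplitude_signature(segments: Sequence[Sequence[float]],
                        num_slots: int = 2048) -> List[int]:
    """Count how many samples fall into each of `num_slots` amplitude bands.

    All segments are pooled (an assumption), the amplitude range is stretched
    over the slots from lowest to highest sample, and occurrences are
    accumulated per slot.
    """
    samples = [s for segment in segments for s in segment]
    if not samples or num_slots < 1:
        raise ValueError("need at least one sample and one slot")

    lowest, highest = min(samples), max(samples)
    counts = [0] * num_slots
    if highest == lowest:
        # Flat waveform: every sample shares one slot (assumed convention).
        counts[0] = len(samples)
        return counts

    span = highest - lowest
    for s in samples:
        # Map the amplitude onto a slot index; the top amplitude is clamped
        # into the last slot so that both end slots are occupied.
        index = min(int((s - lowest) * num_slots / span), num_slots - 1)
        counts[index] += 1
    return counts


# Small example: two short segments counted into seven slots.
signature = amplitude_signature([[0, 1, 2, 3], [3, 6]], num_slots=7)
print(signature)
```

The resulting list has one count per slot and sums to the total number of samples, so it has the same shape as the signature arrays compared in the fuzzy match.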

Description

METHOD AND SYSTEM FOR FINDING MATCH IN DATABASE RELATED TO WAVEFORMS
CROSS-REFERENCE TO RELATED APPLICATIONS
This is a continuation-in-part of co-pending U.S. Patent Application Serial No. 09/354,164 filed July 16, 1999, which is a divisional of U.S. Patent Application Serial No. 08/838,082, incorporated herein by reference, which issued November 16, 1999 as U.S. Patent No. 5,987,525.
BACKGROUND OF THE INVENTION
Field of the Invention
The present invention is directed to locating records in a database and, more particularly, to locating a match for a waveform in a database of records representing waveforms.
Description of the Related Art
Over the past few years, on-line services have experienced explosive growth and have become a major new form of entertainment. Alongside this new entertainment, more traditional forms such as musical recordings have continued to be consumed on a massive scale.
The traditional experience of the musical recording is listening by a small group of persons gathered together in a room. The music fills the room acoustically, but there is little associated visual content, and there is only a limited interaction with the recording, consisting essentially of deciding which tracks to play and performing simple transformations on the recorded sound, such as setting the volume or applying an audio equalizer. This traditional experience dates back to the early age of 78 r.p.m. musical recordings almost a century ago.
The traditional production of a musical recording complements the traditional experience of the recording. The recording is produced in a number of recording sessions, subject to careful mixing and editing, and then released to the public. At that point, the recording is in a fixed form, nowadays an audio CD, whose purpose is to record as faithfully as possible the final sonic experience designed by its authors, the musicians, producer, and recording engineers. Music videos have supplemented the traditional experience of musical recordings by allowing the association of visual content with tracks of such a recording. In practice, however, music videos have been broadcast, with all the problems of lack of user control which that implies, and they have not contributed to interactivity or participation by the consumer.
On-line services offer opportunities for enriching the experience associated with musical recordings. The present invention is addressed to computer programs, systems, and protocols which can fulfil this promise.
SUMMARY OF THE INVENTION
It is therefore an object of this invention to provide computer programs, systems, and protocols which allow producers to deliver entertainment complementary to musical recordings by means of on-line services such as the Internet. It is a further object of this invention to provide computer programs, systems, and protocols which allow such complementary entertainment to be meaningfully interactive for the consumer, such that the consumer can also be a creator of the experience.
It is a further object of the invention to achieve the foregoing objects by means of implementations designed to attain integration with existing environments and programs, particularly on the Internet, while retaining the flexibility to adapt to the continuing evolution of standards for on-line services.
In one aspect of the invention, software is provided which permits a computer program running on a remote host to control a compact disc (CD) player, DVD player, or the like on a user's computer. (For convenience, we use the term "CD player" to refer also to DVD players and similar devices.) The software is designed to permit the remote host both to initiate actions on the CD player and to become aware of actions which the user has initiated by other control means, such as the buttons on the CD player's front panel or a different CD player control program. This aspect of the invention is a building-block for the provision of complementary entertainment for musical recordings when those recordings are fixed in the prevailing contemporary form, the audio CD.
In a second aspect of the invention, visual content, including interactive content, may be delivered over an on-line service in such a way that it is synchronized to the delivery of content from a musical recording. Such visual content may, for example, be synchronized to the playing of an audio CD in the user's computer. The visual content is thematically linked to the musical recording, for example in the manner of a music video.
In a third aspect of the invention, a method is provided for assigning a unique identifier to musical recordings consisting of a number of tracks. A unique identifier is a useful complement to the delivery of visual content in conjunction with the playing of an audio CD in that it allows the software which delivers the visual content to be sure that the audio CD is in fact the correct CD to which the visual content corresponds. If the visual content is designed, for example, to accompany the Rosary Sonatas of Heinrich Ignaz Franz Biber, it would presumably not function well if the CD in the user's player were the soundtrack for the film Mary Poppins. The unique identifier also allows a CD to be used as a key to access a premium Web area. Furthermore, the unique identifier can allow the user to be directed to an area of the Web corresponding to the CD which is in the user's machine.
In a fourth aspect of the invention, the immensely popular on-line service generally referred to as a "chat room" may be enhanced by means of a link to a musical recording to which all persons in the room are listening. The chat room experience as it exists today in on-line services has a disembodied quality by comparison with traditional face-to-face social encounters, in which there are identifiable surroundings. The only experience common to chat users today is the words of the chat as they fly by on a computer screen, and perhaps the user icons ("avatars") or other visual content occupying a small space on the screen. The use of a musical recording in conjunction with a chat room opens up the possibility of restoring to the experience a degree of the shared ambience of traditional social encounters. Furthermore, the musical recording offers a focal point that allows chat-seekers to group together by means of shared interests in a particular type of recording.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of the environment in which the preferred embodiment operates.
FIG. 2 is a flowchart of the synchronization code of the invention.
FIG. 3 is a flowchart of the sequence of operations for connection to a chat room focused on a musical recording.
FIGS. 4A and 4B are explanatory diagrams of waveform analysis according to the present invention.
DESCRIPTION OF PREFERRED EMBODIMENTS
The preferred embodiment of this invention operates on the World Wide Web. The software implementation environment provided by the World Wide Web is described in a number of books, for example, John December & Mark Ginsburg, HTML 3.2 and CGI Unleashed (1996). The World Wide Web is based on a network protocol called HTTP (hypertext transfer protocol), which is described in T. Berners-Lee et al., Hypertext Transfer Protocol-HTTP/1.0 (Internet Request for Comments No. 1945, 1996). The HTTP protocol must be run atop a general connection-oriented protocol, which today is generally TCP/IP, described in Douglas E. Comer, Internetworking with TCP/IP (3d ed. 1995). However, the invention described here is not limited to HTTP running over any particular kind of network software or hardware. The principles of the invention apply to other protocols for access to remote information that may come to compete with or supplant HTTP.
As shown in FIG. 1, a Web user sits at his or her computer and runs a computer program called a browser. The browser sends out HTTP requests to other computers, referred to as servers. In requests, particular items of data, referred to as resources, which are available on servers, are referred to by means of uniform resource locators (URL's), character strings in a particular format defined in Berners-Lee et al., supra. A URL includes both an identification of the server and an identification of a particular item of data within the server. Reacting to the requests, the servers return responses to the user's browser, and the browser acts upon those responses, generally by displaying some sort of content to the user.
The content portion of the responses can be a "Web page," expressed in the hypertext markup language (HTML). That language allows one to express content consisting of text interspersed with bitmap-format images and links (also known as anchors and hyperlinks). The links are further URL's to which the browser may, at the user's prompting, send further requests. The responses can also include more complex commands to be interpreted by the browser, e.g., commands which result in an animation. HTML itself does not define complex commands, but rather they are considered to belong to separately-defined scripting languages, of which the two most common ones are JavaScript and VBScript.
In addition to extending the function of the browser by means of code written in a scripting language, it is also possible to extend the function of a browser with compiled code. Such compiled code is referred to as a "plug-in." The precise protocol for writing a plug-in is dependent on the particular browser. Plug-ins for the Microsoft browser are referred to by the name of ActiveX controls.
Plug-ins may be very complex. A plug-in which may advantageously be used in connection with the invention is Shockwave from Macromedia. It permits animations which are part of a server response to be downloaded and played to the user. Shockwave defines its own scripting language called Lingo. Lingo scripts are contained within the downloadable animations which the Shockwave plug-in can play. The general format of a Shockwave animation is a timeline consisting of a series of frames, together with a number of visual objects which appear, perform motions, and disappear at particular frames within the timeline. To achieve more complex effects within a Shockwave animation, Lingo scripts may be invoked in addition to predefined visual objects.
A preferred embodiment of the invention employs a plug-in, referred to as the command plug-in, which provides to a scripting language the ability to command in a detailed fashion the playing of a musical recording. The command plug-in should provide, at a minimum, the following basic functions:
(1) Start and stop play.
(2) Get current track and position within the track.
(3) Seek to a track and a position within the track.
(4) Get and set volume.
(5) Get information regarding the CD (e.g., the number of tracks, their lengths, the pauses between tracks).
(6) Get information regarding the capabilities of the CD drive.
Other functions may be provided, limited only by what the underlying operating system services are able to provide.
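The basic functions listed above can be pictured as a small programming interface. The following Python sketch is illustrative only: the class names `CommandPlugin` and `FakeCdPlayer`, the method names, the 0 to 100 volume range, and the capability dictionary are all assumptions made for this example and do not appear in the specification, which contemplates a compiled plug-in written in a language such as C++. Pauses between tracks are omitted for brevity.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class CommandPlugin(ABC):
    """Hypothetical interface mirroring basic functions (1) to (6)."""

    @abstractmethod
    def play(self) -> None: ...                      # (1) start play

    @abstractmethod
    def stop(self) -> None: ...                      # (1) stop play

    @abstractmethod
    def get_position(self) -> Tuple[int, int]:       # (2)
        """Return (track number, offset within the track in frames)."""

    @abstractmethod
    def seek(self, track: int, offset_frames: int) -> None: ...  # (3)

    @abstractmethod
    def get_volume(self) -> int: ...                 # (4)

    @abstractmethod
    def set_volume(self, level: int) -> None: ...    # (4)

    @abstractmethod
    def get_track_lengths(self) -> List[int]: ...    # (5) lengths in frames

    @abstractmethod
    def get_drive_capabilities(self) -> Dict[str, bool]: ...  # (6)


class FakeCdPlayer(CommandPlugin):
    """In-memory stand-in used only to exercise the interface."""

    def __init__(self, track_lengths_frames: List[int]) -> None:
        self._lengths = list(track_lengths_frames)
        self._track, self._offset = 1, 0
        self._volume = 50          # assumed 0..100 scale
        self.playing = False

    def play(self) -> None:
        self.playing = True

    def stop(self) -> None:
        self.playing = False

    def get_position(self) -> Tuple[int, int]:
        return (self._track, self._offset)

    def seek(self, track: int, offset_frames: int) -> None:
        if not 1 <= track <= len(self._lengths):
            raise ValueError("no such track")
        if not 0 <= offset_frames < self._lengths[track - 1]:
            raise ValueError("offset outside track")
        self._track, self._offset = track, offset_frames

    def get_volume(self) -> int:
        return self._volume

    def set_volume(self, level: int) -> None:
        # Clamp to the assumed 0..100 range.
        self._volume = max(0, min(100, level))

    def get_track_lengths(self) -> List[int]:
        return list(self._lengths)

    def get_drive_capabilities(self) -> Dict[str, bool]:
        return {"can_eject": True}   # placeholder capability
```

A script would call `seek(2, 750)` to move to ten seconds into track 2 (750 frames at 75 frames per second) and `get_position()` to learn where play currently stands.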
The command plug-in is preferably written in a conventional programming language such as C++. The plug-in must conform to the existing standards for plug-ins, such as those required of Microsoft ActiveX objects. In order to obtain the information and carry out the functions which the command plug-in makes available to the scripting language, the command plug-in relies on functions which provide control and information regarding the playing musical recording. These functions will depend on the precise source of the recording. If, as in the currently preferred embodiment, the recording is being played on an audio CD in the computer CD player, and if the browser is running under Microsoft Windows 3.1 or Windows 95, these functions would be the MCI functions, which form a part of the Win32 application programming interface. These functions are documented, for example, in Microsoft Win32 Programmer's Reference. Different functions may be provided by streaming audio receivers, as for example receivers which capture audio which is coming into the user's computer over a network connection in a suitable audio encoding format such as MPEG.
An important point to note about the implementation of the command plug-in is that the operations which it carries out, as for example seeks, may take times on the order of a second. It is undesirable for the command plug-in to retain control of the machine during that interval, so it is important that the plug-in relinquish control of the machine to the browser whenever a lengthy operation is undertaken, and report on the results of the operation via the asynchronous event handling capability used in the common scripting languages.
Given the above summary of the functions which the command plug-in provides, a general knowledge of how to write plug-ins (e.g., of how to write ActiveX objects), and a knowledge of the relevant application programming interface for controlling the play of the musical recording (e.g., MCI in Win32), a person skilled in the art could readily and without undue experimentation develop an actual working command plug-in. For this reason, further details of how the command plug-in is implemented are not provided here.
The existence of a command plug-in providing the functions listed above to a scripting language is a foundation on which entertainment complementary to a musical recording may be constructed. In particular, it is possible to devise, building on this foundation, a method for synchronizing the display of visual content by means of the scripting language with the events which are occurring on the audio CD.
In a preferred embodiment of the invention, the synchronization of the visual content to the audio CD proceeds as follows. The visual content is provided by means of a Shockwave animation, which is downloaded from the server and displayed for the user by means of a Shockwave plug-in. This downloading may take place before the animation is displayed, or alternatively it may take place as the animation is being displayed, provided the user's connection to the network is fast enough to support download at an appropriate speed. The downloading is a function provided by the Shockwave plug-in itself.
As the Shockwave animation is played, a Lingo script executes each time a frame finishes displaying. The Lingo script contains a description of the relationship which should exist between frames of the animation and segments of the musical recording, identified by track number and by time. The Lingo script determines, by means of the command plug-in described above, at which track and time the play of the audio CD is. It then refers to the description in order to determine which frames of the animation correspond to that portion of the audio CD. If the current frame is not one of those frames, the Lingo script resets the time line of the animation so that the animation will begin to play at the frame which corresponds to the current position of the audio CD. This permits the visual content to catch up if it ever lags the CD, for example because downloading from the network has fallen behind, because the user's computer lacks the cycles to play the animation at full speed, or because the user has fast forwarded the CD. In a variant form of this synchronization algorithm (shown in FIG. 2), the frames of the animation are arranged into groups of contiguous frames. A correspondence is established between each such group of frames and a particular segment of the audio recording (box 200 in FIG. 2). At the end of each frame of the animation, the audio play position is determined (box 210). A test is done to determine whether the audio play position is within the segment of the recording that corresponds to the group of frames to which the next sequential frame belongs (box 215). If the audio play position is within that segment, the playback of the animation proceeds with that next frame (box 230). If the audio play position is not within that segment, then the playback of the animation is advanced to the frame corresponding to where the audio is (boxes 220 and 225).
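The variant algorithm of FIG. 2 can be sketched in a few lines. The Python below is a hedged illustration, not the Lingo script of the preferred embodiment: the `FrameGroup` record, the representation of an audio position as a `(track, seconds)` pair, and the choice to jump to the first frame of the matching group are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Position = Tuple[int, float]  # (track number, seconds into the track)


@dataclass
class FrameGroup:
    first_frame: int   # first animation frame in the group
    last_frame: int    # last animation frame in the group (inclusive)
    start: Position    # start of the matching audio segment (inclusive)
    end: Position      # end of the matching audio segment (exclusive)


def group_for_frame(groups: List[FrameGroup], frame: int) -> Optional[FrameGroup]:
    """Find the group to which an animation frame belongs."""
    for g in groups:
        if g.first_frame <= frame <= g.last_frame:
            return g
    return None


def group_for_position(groups: List[FrameGroup], pos: Position) -> Optional[FrameGroup]:
    """Find the group whose audio segment contains the play position."""
    for g in groups:
        if g.start <= pos < g.end:   # tuples compare track first, then time
            return g
    return None


def next_frame(groups: List[FrameGroup], current_frame: int, audio_pos: Position) -> int:
    """Decide which frame to show after current_frame (boxes 210 to 230)."""
    candidate = current_frame + 1
    g = group_for_frame(groups, candidate)
    # Box 215: is the audio inside the segment for the next sequential frame?
    if g is not None and g.start <= audio_pos < g.end:
        return candidate                      # box 230: carry on
    target = group_for_position(groups, audio_pos)
    if target is None:
        return candidate                      # no mapping; assumed fallback
    return target.first_frame                 # boxes 220 and 225: jump
```

With two groups covering seconds 0 to 10 and 10 to 20 of track 1, a call such as `next_frame(groups, 3, (1, 12.0))` skips ahead to the first frame of the second group, which is how the animation catches up after the user fast forwards the CD.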
A further aspect of the invention is the ability, by making use of the command plug-in, to provide a technique for establishing a unique identifier for a recording which may be stored in mass storage, whether integrated circuit, magnetic (e.g., hard disk), or any other medium, or on a removable medium, such as an audio CD, or integrated circuit memory, such as compact flash memory, Memory Stick™, etc., accessed by a CD-ROM drive of a computer, MP3 player/recorder or any other device capable of accessing the medium. The unique identifier may be based on the number and lengths of the tracks (measured in frames, i.e., 1/75ths of a second) from Table of Contents (TOC) data or the content of the recording itself. The identifier could simply be a concatenation of the track lengths that can be used with a fuzzy comparison algorithm and also for more precise matching if more than one possible match is located.
Following is an example of a fuzzy comparison algorithm that can be used with the present invention. For each of the two audio CDs to be compared, one determines the lengths of all the tracks in the recordings in milliseconds. One then shifts each of the track lengths to the right by eight bits, in effect performing a truncating division by 2^8 = 256. One then goes through both of the recordings track by track, accumulating two numbers as one proceeds, the match total and the match error. These numbers are both initialized to zero at the start of the comparison. For each of the tracks, one increments the match total by the shifted length of that track in the first CD to be compared, and one increments the match error by the absolute value of the difference between the shifted lengths of the track in the two CDs. If one of the CDs has fewer tracks than the other, when one gets to the last track in the CD with fewer tracks, one continues with the tracks in the other CD, incrementing both the match total and the match error by the shifted lengths of the remaining tracks. Following these steps of going through the tracks, the algorithm then divides the match error by the match total, subtracts the resulting quotient from 1, and converts the difference to a percentage which is indicative of how well the two CDs match.
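The fuzzy comparison just described can be written out as a short routine. The Python below is a minimal sketch under stated assumptions: the function name `fuzzy_match_percent` is hypothetical, and returning 0.0 when the accumulated total is zero is a choice made here to avoid division by zero, which the text does not address. The final step divides the accumulated error by the accumulated total.

```python
from typing import Sequence


def fuzzy_match_percent(lengths_a_ms: Sequence[int], lengths_b_ms: Sequence[int]) -> float:
    """Compare two CDs by track length and return a match percentage."""
    # Shift right by eight bits: a truncating division by 256.
    a = [length >> 8 for length in lengths_a_ms]
    b = [length >> 8 for length in lengths_b_ms]

    match_total = 0
    match_error = 0

    # Tracks present on both CDs.
    common = min(len(a), len(b))
    for i in range(common):
        match_total += a[i]                  # shifted length on the first CD
        match_error += abs(a[i] - b[i])      # disagreement between the CDs

    # Remaining tracks on whichever CD is longer count fully as error.
    longer = a if len(a) > len(b) else b
    for extra in longer[common:]:
        match_total += extra
        match_error += extra

    if match_total == 0:
        return 0.0   # assumed handling; not specified in the text
    return (1 - match_error / match_total) * 100
```

For example, track lengths of 2560 ms and 5120 ms on one CD against 2560 ms and 4608 ms on the other shift to [10, 20] and [10, 18], giving a total of 30, an error of 2, and a match of about 93.3%.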
Use of track length to create an identifier for a recording is best suited to media that have multiple tracks and preferably those that store such information in a table of contents or TOC, such as CDs and DVDs. Furthermore, use of track length or TOC data has been found to work best with fuzzy matching, but this sometimes results in finding more than one possible match. An alternative or supplement for TOC data is to use the content of the recording. However, it is desirable to use a content-based identifier that is relatively small, to minimize storage space and bandwidth requirements.
An embodiment of the present invention uses an amplitude signature providing a content-based identifier generated from short, e.g., five second sample segments from multiple locations in each track (if there is more than one track in a recording), such as the beginning, middle and end. An example of one such sample segment (the term sample segment is used to distinguish the segments used for generating the identifier from identified segments, i.e., segments identified in the TOC, that are commonly referred to as "tracks" on a CD) is illustrated in Fig. 4A with a waveform 410. According to the present invention, a plurality of amplitude bands or slots are defined and the number of occurrences of all segments of the waveform within each slot are counted. Redbook CD Audio is a sampled digital audio file of 44.1 K samples per second per channel, 16-bit stereo with 75 frames of data per second. Thus, there are a maximum of 220,500 occurrences in one five second sample segment (75 frames/sec * 588 samples/frame * 5 sec = 220,500 samples in 5 seconds of data). To ensure uniqueness, it is desirable to use about 2000 (e.g., 2^11 or 2048) slots, but other sizes, number and types of samples and number of slots can be used, depending on the characteristics of the waveforms being compared. To simplify the explanation of the invention, a coarser example will be given with respect to Figs. 4A and 4B.
In the preferred embodiment, the first step is to normalize the waveform, so that the first and last slots have at least one occurrence of the waveform. The waveform 410 in Fig. 4A is normalized over the seven slots 420 illustrated in Fig. 4A to produce waveform 410b in Fig. 4B with the slots 420 separately indicated as slots 421-427. In the simplified example provided in Fig. 4B, 16 time samples are taken, one at each of the vertical lines. Thus, there is one sample of waveform 410b in slot 421, three in slot 422, two in slot 423, one in slot 424, two in slot 425, three in slot 426 and four in slot 427. This can be represented by the linear array A1 [1, 3, 2, 1, 2, 3, 4]. If the array A1 is an identifying signature array representing a selected recording for which a match in a database is sought, a fuzzy match may be accomplished by calculating an average of the difference between the elements of A1 and the elements of existing signature arrays. For example, one of the records in the database may have a signature array of A2 [2, 3, 4, 1, 1, 3, 3] for a difference array of [1, 0, 2, 0, 1, 0, 1] or an average difference of 5/7 or 0.714. A "fuzzy match" based on average difference allows for errors in the waveform and imperfect starting locations for the signature generation. However, the average difference that is accepted as a match should be set to minimize false positives. Alternatively, the number or length of the sample segments could be increased to reduce false positives, but this increases the time spent reading the recording and calculating the signature array. For the waveforms that have been tested, an average difference of 10 has been found able to find virtually all possible matches while eliminating a significant number of false positives when using CD waveforms and three sample segments of five seconds each with 2048 slots.
Under these conditions it has been found that 256 slots produces too many matches of nonsimilar waveforms and 4000 slots leaves the slots so sparsely populated that there are a large number of near matches. The precise number of slots can be varied depending on the size of the sample segment(s) and the type of waveforms being sampled.
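The signature generation and average-difference comparison described above can be sketched as follows. This Python is illustrative only: the function names are hypothetical, and the normalization is implemented here as a simple min-to-max rescaling of integer sample values, which is one way (assumed, not stated in the text) of guaranteeing that the first and last slots each receive at least one occurrence.

```python
from typing import List, Sequence


def amplitude_signature(samples: Sequence[int], num_slots: int) -> List[int]:
    """Count how many samples fall in each of num_slots amplitude bands."""
    lo, hi = min(samples), max(samples)
    counts = [0] * num_slots
    if hi == lo:
        # Flat waveform: everything lands in one band (assumed handling).
        counts[0] = len(samples)
        return counts
    span = hi - lo
    for s in samples:
        # Rescale so lo maps to slot 0 and hi to the last slot.
        slot = (s - lo) * num_slots // span
        if slot == num_slots:      # only the maximum sample reaches this
            slot = num_slots - 1
        counts[slot] += 1
    return counts


def average_difference(sig_a: Sequence[int], sig_b: Sequence[int]) -> float:
    """Mean absolute difference between corresponding signature elements."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same number of slots")
    return sum(abs(x - y) for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

Applied to the arrays of the simplified example, `average_difference([1, 3, 2, 1, 2, 3, 4], [2, 3, 4, 1, 1, 3, 3])` evaluates to 5/7, matching the figure of 0.714 given in the text.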
If more than one possible match has been found, more precise comparison of the identifying and existing signature arrays may be performed. The number of slots that match exactly or are within one occurrence of matching may be used. In the example given above, 6 out of 7 or 86% of the elements of arrays A1 and A2 match if an error of one (or one grace) is permitted and 3 out of 7 or 43% of the elements match precisely. It has been found that a better than 80% match for a one grace or a better than 70% match with no grace is likely to be an acceptable match. The grace value can be increased to more than one to allow more forgiveness in matching the waveforms.
A unique identifier for a musical recording may be employed as a database key. A site may maintain a database of information about CDs, for example information about all CDs issued by a particular record company can be maintained on that record company's site. There are various alternative ways for users to navigate this information. For example, they could use a Web page containing many hyperlinks as a table of contents, or they could use a conventional search engine. A third way of searching which is enabled by the unique identifier of the invention is for there to be a Web page which invites the user to place in the computer's CD drive the CD about which he or she is seeking information. Upon detection of the presence of the CD in the drive, a script in the Web page computes the unique identifier corresponding to the CD and sends it to the server. The server then displays information about the CD retrieved from a database on the basis of that unique identifier. This information may include a Web address (URL) that is related to the audio CD (e.g., that of the artists' home page), simple data such as the names of the songs, and also complementary entertainment, including potentially photographs (e.g., of the band), artwork, animations, and video clips.
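Returning to the more precise comparison of signature arrays, the grace-based matching percentage can be sketched as below. The function names are hypothetical, and the thresholds are simply the 80% (one grace) and 70% (no grace) figures reported in the text, offered as illustrative defaults rather than fixed parameters.

```python
from typing import Sequence


def match_percentage(sig_a: Sequence[int], sig_b: Sequence[int], grace: int = 0) -> float:
    """Percentage of slots whose counts differ by no more than grace."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same number of slots")
    matches = sum(1 for x, y in zip(sig_a, sig_b) if abs(x - y) <= grace)
    return 100.0 * matches / len(sig_a)


def is_acceptable_match(sig_a: Sequence[int], sig_b: Sequence[int]) -> bool:
    """Apply the reported thresholds: >80% with one grace or >70% with none."""
    return (match_percentage(sig_a, sig_b, grace=1) > 80.0
            or match_percentage(sig_a, sig_b, grace=0) > 70.0)
```

For A1 [1, 3, 2, 1, 2, 3, 4] and A2 [2, 3, 4, 1, 1, 3, 3], the routine gives 6/7 (about 86%) with one grace and 3/7 (about 43%) with no grace, so the pair would be accepted on the one-grace criterion.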
It is also possible to arrange things so that, when the user inserts an audio CD into the computer, (i) the Web browser is launched if not already running, (ii) the browser computes the CD's unique identifier and from that unique identifier derives a URL, and (iii) the browser does an HTTP get transaction on that URL.
An alternative application of unique identifiers for musical recordings is to employ an audio CD as a key for entering into a premium area of the Web. There are presently premium areas of the Web to which people are admitted by subscription. A simple form of admission based on the unique identifier is to require, before accessing a particular area of the Web, that the user place in his or her CD drive a particular CD, or a CD published by a particular company or containing the music of a particular band or artist. This is readily accomplished by means of a script which invokes the functions provided by the command plug-in and computes a unique identifier.
A fourth aspect of the invention is the connection of chat rooms with musical recordings. The goal is to provide all participants in a chat room with the same music at approximately the same time.
The prevailing network protocol for chat services is Internet Relay Chat (IRC), described in J. Oikarinen & D. Reed, Internet Relay Chat Protocol (Internet Request for Comments No. 1459, 1993). In this protocol, when one becomes a client of a chat server, one sends the name of a chat room. The chat server receives messages from all of its clients and relays the messages sent in by one client to all the other clients connected in the same room as that client. The messages which a client sends are typically typed in by the user who is running the client, and the messages which a client receives are typically displayed for the user who is running the client to read.
In a preferred embodiment of the invention, a chat client is customized by means of a plug-in, which we will call the chat plug-in. The chat client is started up by a browser as follows (see FIG. 3). The user connects by means of the browser to a central Web page (box 300) which, upon being downloaded, asks that the user insert a CD into his or her player (box 305). A unique identifier of the CD is computed and communicated back to the server by using the command plug-in described above under the command of a script in the central Web page (box 310). The server then employs the unique identifier to determine whether it has a chat room focused on the CD (box 315). This step may be carried out by looking the unique identifier up in a database using techniques well known in the art. There exists a vast literature on connecting Web pages to databases, e.g., December & Ginsburg, supra, chapter 21. If a chat room focused on the CD exists or can be created, the server responds with the name of that chat room, and the browser starts up a chat client on the user's computer as a client of that chat room (box 320).
The chat room's name is set by the server to contain information about the track which the CD is playing in the other chat room clients' machines and the time at which the track started to play, as well as about the volume at which the CD is playing. The chat client plug-in employs that information to direct the command plug-in to set the CD in the user's computer to play in such a manner that it is approximately synchronized to the CD which is playing in the other chat room clients' machines (box 320). Each user in the chat room is able to control the CD which is playing in his or her machine. Control actions result in the chat plug-in sending messages to the chat server which describe the control action being taken (box 325). For example, such messages may indicate a change in the position of the CD, a change in the volume, or the ejection of the CD to replace it with another. The chat plug-ins running on the other users' machines, upon seeing a message of this kind, replicate the action (as far as possible) on the other users' machines by using the command plug-in described above (box 330).
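One way a newly joined client might use the track and start time carried in the chat room's name is sketched below. This Python is a hedged illustration: the function name `catch_up_position`, the use of seconds for times and track lengths, and the assumption that play has continued uninterrupted into later tracks are all introduced for this example; the specification does not say how the information is encoded or how the seek target is computed.

```python
from typing import Optional, Sequence, Tuple


def catch_up_position(
    track_lengths_s: Sequence[float],
    track: int,
    started_at: float,
    now: float,
) -> Optional[Tuple[int, float]]:
    """Return (track, seconds into track) to seek to, or None if play has ended.

    track is 1-based; started_at and now are timestamps in seconds on a
    clock assumed to be shared by all clients.
    """
    elapsed = max(0.0, now - started_at)   # guard against small clock skew
    index = track - 1
    # Walk forward through tracks that would already have finished.
    while index < len(track_lengths_s) and elapsed >= track_lengths_s[index]:
        elapsed -= track_lengths_s[index]
        index += 1
    if index >= len(track_lengths_s):
        return None                        # the other clients reached the end
    return (index + 1, elapsed)
```

A client joining 150 seconds after track 1 (100 seconds long) began would be told to seek to 50 seconds into track 2, which it could then do through the seek function of the command plug-in.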
In a further aspect of the invention, a chat room focused on a particular musical recording might allow for a voting procedure to select particular tracks. A simple voting procedure would be for each chat plug-in to act upon a change message of the kind described in the preceding paragraph only when it sees two identical consecutive change messages. This would mean that in order to change the track which is being played, it would be necessary for two users to change to that track. The number two may be replaced by a higher number.
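The simple voting rule can be sketched as a small filter that a chat plug-in consults before acting on a change message. The class name `ChangeVoteFilter` and the representation of a change message as a string are assumptions for this illustration, as is the decision to reset the count once a change has been acted upon.

```python
from typing import Optional


class ChangeVoteFilter:
    """Act on a change only after `required` identical consecutive messages."""

    def __init__(self, required: int = 2) -> None:
        self.required = required
        self._last: Optional[str] = None
        self._count = 0

    def observe(self, message: str) -> bool:
        """Record a change message; return True when it should be acted upon."""
        if message == self._last:
            self._count += 1
        else:
            self._last = message
            self._count = 1
        if self._count >= self.required:
            # Assumed behavior: start counting afresh after acting.
            self._last = None
            self._count = 0
            return True
        return False
```

With the default of two, a single user requesting track 5 has no effect, but a second identical request arriving immediately afterwards causes every client to change track; passing a larger `required` value implements the higher threshold mentioned in the text.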
In a further aspect of the invention the messages delivered to the users of a chat can be driven from a text file rather than manual typing. This would allow a pre-recorded experience to be played back for a group of chat users. Such a technique may be used to create a pre-recorded, narrated tour of an audio CD. An important advantage of the preferred embodiment as described above is that it may be used with any chat server software which supports the minimal functionality required by Internet Relay Chat or by a protocol providing similar minimum chat service. The additional software required is located in the chat client plug-in and in the central Web page, with its connection to a database of CD information.
The many features and advantages of the present invention are apparent from the detailed specification and thus, it is intended by the appended claims to cover all such features and advantages of the system which fall within the true spirit and scope of the invention. Further, numerous modifications and changes will readily occur to those skilled in the art from the disclosure of this invention. It is not desired to limit the invention to the exact construction and operation illustrated and described; accordingly, suitable modification and equivalents may be resorted to, as falling within the scope and spirit of the invention.

Claims

What is claimed is:
1. A method of searching for a match in a database of a plurality of records, where the records in the database correspond to recordings containing waveforms, comprising: generating an amplitude signature for at least one segment of a selected recording; and determining at least one matching record in the database for the selected recording based on the amplitude signature.
2. A method as recited in claim 1, further comprising calculating approximate length information for the records in the database and for the selected recording, and wherein said determining is also based on the approximate length information.
3. A method as recited in claim 2, wherein the recordings have at least one track wherein said calculating calculates a length of each track of each recording represented in the database and for the selected recording, and wherein said determining is also based on the number and length of tracks of the recordings represented in the database and the selected recording.
4. A method as recited in claim 3, wherein the waveforms are represented by sampled digital data in the recordings and the selected recording, wherein said method further comprises storing an existing signature array for each of the recordings represented in the database, where each element of the existing signature array corresponds to a number of occurrences of the sampled digital data within an amplitude band in at least one segment of the recordings represented in the database, and wherein said generating produces an identifying signature array with each element of the identifying signature array corresponding to a number of occurrences of the sampled digital data within an amplitude band in the at least one segment of the selected recording.
5. A method as recited in claim 4, wherein said determining includes calculating an average difference between the elements of the identifying signature array and the existing signature array for the recordings represented in the database; and identifying as a possible match any recording represented in the database for which the average difference is greater than a predetermined value.
6. A method as recited in claim 4, wherein said determining includes calculating a matching percentage of corresponding elements in the identifying signature array and the existing signature arrays within a predetermined number of each other; and indicating as a possible match any recording represented in the database for which the matching percentage is greater than a predetermined percentage.
7. A method as recited in claim 6, wherein the predetermined number is zero and the predetermined percentage is approximately 70%.
8. A method as recited in claim 6, wherein the predetermined number is one and the predetermined percentage is approximately 80%.
9. A method as recited in claim 4, wherein the recordings are stored on removable storage media possessed by the user.
10. A method as recited in claim 4, wherein the recordings are digital files stored on mass storage accessible by a listener of the selected recording.
11. A method as recited in claim 3, further comprising receiving a query to search for a match between the selected recording and the records in the database, the query including the number of tracks and the length information for the selected recording.
12. A method as recited in claim 1, wherein the waveforms are represented by sampled digital data in the recordings and the selected recording, wherein said method further comprises storing an existing signature array for each of the recordings represented in the database, where each element of the existing signature array corresponds to a number of occurrences of the sampled digital data within an amplitude band in at least one segment of the recordings represented in the database, and wherein said generating produces an identifying signature array with each element of the identifying signature array corresponding to a number of occurrences of the sampled digital data within an amplitude band in the at least one segment of the selected recording.
13. A method as recited in claim 12, wherein said determining includes calculating an average difference between the elements of the identifying signature array and the existing signature array for the recordings represented in the database; and identifying as a possible match any recording represented in the database for which the average difference is greater than a predetermined value.
14. A method as recited in claim 13, wherein said determining includes calculating a matching percentage of corresponding elements in the identifying signature array and the existing signature arrays within a predetermined number of each other; and indicating as a possible match any recording represented in the database for which the matching percentage is greater than a predetermined percentage.
15. A method as recited in claim 14, wherein the predetermined number is zero and the predetermined percentage is approximately 70%.
16. A method as recited in claim 14, wherein the predetermined number is one and the predetermined percentage is approximately 80%.
17. A method as recited in claim 12, wherein the recordings are stored on removable storage media possessed by the user.
18. A method as recited in claim 17, wherein the recordings are digital files stored on mass storage accessible by a listener of the selected recording.
19. A method as recited in claim 11, wherein the selected recording is played at a first location on equipment possessed by a user, and wherein said method further comprises: generating a query by the equipment at the first location; and sending the query to a server at a second location where the database is stored, to search for at least one matching record.
20. A method as recited in claim 19, further comprising sending from the server to the equipment at the first location additional information stored in the at least one approximately matching record and not included in the selected recording.
21. A database system, comprising: a storage unit storing a database of records including existing signatures for recordings corresponding to the records; and a processing unit, coupled to said storage unit, programmed to generate an identifying amplitude signature for a selected recording, and to determine at least one matching record in the database for the selected recording by comparing the identifying amplitude signature with the existing amplitude signatures in the database.
22. A database system as recited in claim 21, wherein said storage unit further stores information indicating length and number of identified segments of the recordings, and wherein said processing unit calculates approximate length information for the selected recording and further determines the at least one matching record in the database based on the approximate length information and a number of identified segments in the selected recording and the recordings corresponding to the records in the database.
23. A database system as recited in claim 21, wherein the recordings contain sampled digital data, wherein said storage unit stores the existing signature array with each element corresponding to a number of occurrences of the sampled digital data within an amplitude band in at least one segment of the recordings represented in the database, and wherein said processing unit generates the identifying signature array with each element corresponding to a number of occurrences of the sampled digital data within an amplitude band in at least one segment of the selected recording and determines the at least one matching record by calculating an average difference between the elements of the identifying signature array and the existing signature array for the recordings represented in the database and identifying as a possible match any recording represented in the database for which the average difference is greater than a predetermined value.
24. A database system as recited in claim 21, wherein the recordings contain sampled digital data, wherein said storage unit stores the existing signature array with each element corresponding to a number of occurrences of the sampled digital data within an amplitude band in at least one segment of the recordings represented in the database, and wherein said processing unit generates the identifying signature array with each element corresponding to a number of occurrences of the sampled digital data within an amplitude band in at least one segment of the selected recording and determines the at least one matching record by calculating a matching percentage of corresponding elements in the identifying signature array and the existing signature arrays within a predetermined number of each other and indicating as a possible match any recording represented in the database for which the matching percentage is greater than a predetermined percentage.
25. A database system as recited in claim 21, further comprising a communication unit, coupled to said storage unit, to receive a query to search for a match between the selected recording and the records in the database, the query including the number of segments and the length information for the selected recording.
26. A database system as recited in claim 25, wherein the recordings corresponding to the records in the database and the selected recording each contain at least an audio portion and the number of segments are the number of tracks in the audio portion.
27. A database system as recited in claim 26, wherein the recordings are stored on removable storage media possessed by the user.
28. A database system as recited in claim 26, wherein the recordings are digital files stored on mass storage accessible by a listener of the selected recording.
29. A database system as recited in claim 25, wherein said processing unit, storage unit and communication unit are at a first location, and wherein said database system further comprises: equipment possessed by a user at a second location, remote from the first location, to generate the query and play the selected recording; and a communication network at least temporarily coupling said equipment and said communication unit to send the query from said equipment to said communication unit.
30. A database system as recited in claim 29, wherein said communication unit sends to the equipment via said communication network additional information stored in the at least one approximately matching record and not included in the selected recording.
31. At least one computer program stored on a computer-readable medium, embodying a method of searching for a match in a database of a plurality of records, where the records in the database correspond to recordings containing waveforms, comprising: generating an amplitude signature for at least one segment of a selected recording; and determining at least one matching record in the database for the selected recording based on the amplitude signature.
32. At least one computer program as recited in claim 31, further comprising calculating approximate length information for the records in the database and for the selected recording, and wherein said determining is also based on the approximate length information.
33. At least one computer program as recited in claim 32, wherein the recordings have at least one track, wherein said calculating calculates a length of each track of each recording represented in the database and for the selected recording, and wherein said determining is also based on the number and length of tracks of the recordings represented in the database and the selected recording.
34. At least one computer program as recited in claim 33, wherein the waveforms are represented by sampled digital data in the recordings and the selected recording, wherein said method further comprises storing an existing signature array for each of the recordings represented in the database, where each element of the existing signature array corresponds to a number of occurrences of the sampled digital data within an amplitude band in at least one segment of the recordings represented in the database, and wherein said generating produces an identifying signature array with each element of the identifying signature array corresponding to a number of occurrences of the sampled digital data within an amplitude band in the at least one segment of the selected recording.
35. At least one computer program as recited in claim 34, wherein said determining includes calculating an average difference between the elements of the identifying signature array and the existing signature array for the recordings represented in the database; and identifying as a possible match any recording represented in the database for which the average difference is greater than a predetermined value.
36. At least one computer program as recited in claim 34, wherein said determining includes calculating a matching percentage of corresponding elements in the identifying signature array and the existing signature arrays within a predetermined number of each other; and indicating as a possible match any recording represented in the database for which the matching percentage is greater than a predetermined percentage.
37. At least one computer program as recited in claim 34, wherein the recordings are stored on removable storage media possessed by the user.
38. At least one computer program as recited in claim 34, wherein the recordings are digital files stored on mass storage accessible by a listener of the selected recording.
39. At least one computer program as recited in claim 33, further comprising receiving a query to search for a match between the selected recording and the records in the database, the query including the number of tracks and the length information for the selected recording.
40. At least one computer program as recited in claim 31, wherein the waveforms are represented by sampled digital data in the recordings and the selected recording, wherein said method further comprises storing an existing signature array for each of the recordings represented in the database, where each element of the existing signature array corresponds to a number of occurrences of the sampled digital data within an amplitude band in at least one segment of the recordings represented in the database, and wherein said generating produces an identifying signature array with each element of the identifying signature array corresponding to a number of occurrences of the sampled digital data within an amplitude band in the at least one segment of the selected recording.
41. At least one computer program as recited in claim 40, wherein said determining includes calculating an average difference between the elements of the identifying signature array and the existing signature array for the recordings represented in the database; and identifying as a possible match any recording represented in the database for which the average difference is greater than a predetermined value.
42. At least one computer program as recited in claim 40, wherein said determining includes calculating a matching percentage of corresponding elements in the identifying signature array and the existing signature arrays within a predetermined number of each other; and indicating as a possible match any recording represented in the database for which the matching percentage is greater than a predetermined percentage.
43. At least one computer program as recited in claim 40, wherein the recordings are stored on removable storage media possessed by the user.
44. At least one computer program as recited in claim 43, wherein the recordings are digital files stored on mass storage accessible by a listener of the selected recording.
45. At least one computer program as recited in claim 40, wherein the selected recording is played at a first location on equipment possessed by a user, and wherein said method further comprises: generating a query by the equipment at the first location; and sending the query to a server at a second location where the database is stored, to search for at least one matching record.
46. At least one computer program as recited in claim 45, further comprising sending from the server to the equipment at the first location additional information stored in the at least one approximately matching record and not included in the selected recording.
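
The signature comparison recited in claims 12 to 16 can be illustrated with a short Python sketch. It is not the applicant's implementation, only one possible reading of the claim language. The assumptions are: samples are signed 16-bit PCM values, bands are equal-width ranges of absolute amplitude, and the "difference" between elements is an absolute difference. The function names, the default of 16 bands and the dictionary used as a stand-in database are all hypothetical.

```python
from typing import Dict, List, Sequence


def amplitude_signature(samples: Sequence[int], num_bands: int = 16,
                        max_amplitude: int = 32768) -> List[int]:
    """Count the samples falling in each amplitude band.

    Assumption: bands are equal-width ranges of absolute amplitude.
    """
    band_width = max_amplitude / num_bands
    signature = [0] * num_bands
    for sample in samples:
        # Clamp so a full-scale sample lands in the top band.
        band = min(int(abs(sample) / band_width), num_bands - 1)
        signature[band] += 1
    return signature


def _check_lengths(identifying: Sequence[int], existing: Sequence[int]) -> None:
    if len(identifying) != len(existing) or not identifying:
        raise ValueError("signature arrays must be non-empty and equal in length")


def average_difference(identifying: Sequence[int],
                       existing: Sequence[int]) -> float:
    """Mean absolute difference between corresponding elements.

    The comparison against the predetermined value is left to the caller.
    """
    _check_lengths(identifying, existing)
    total = sum(abs(a - b) for a, b in zip(identifying, existing))
    return total / len(identifying)


def matching_percentage(identifying: Sequence[int], existing: Sequence[int],
                        tolerance: int = 0) -> float:
    """Percentage of corresponding elements within `tolerance` of each other."""
    _check_lengths(identifying, existing)
    close = sum(1 for a, b in zip(identifying, existing)
                if abs(a - b) <= tolerance)
    return 100.0 * close / len(identifying)


def possible_matches(identifying: Sequence[int],
                     database: Dict[str, Sequence[int]],
                     tolerance: int, min_percentage: float) -> List[str]:
    """Return the ids of stored signatures whose matching percentage
    exceeds `min_percentage`."""
    return [record_id for record_id, existing in database.items()
            if matching_percentage(identifying, existing, tolerance) > min_percentage]
```

With this sketch, the thresholds named in claims 15 and 16 would correspond to calling `possible_matches(sig, db, tolerance=0, min_percentage=70)` or `possible_matches(sig, db, tolerance=1, min_percentage=80)`, where `sig` and `db` are placeholders for a computed signature and the stored signatures.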
PCT/US2001/022891 2000-07-21 2001-07-20 Method and system for finding match in database related to waveforms WO2002008943A2 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2002514577A JP2004511838A (en) 2000-07-21 2001-07-20 Method and system for finding matches in a database for waveforms
AU2001277034A AU2001277034A1 (en) 2000-07-21 2001-07-20 Method and system for finding match in database related to waveforms
EP01954813A EP1303817A2 (en) 2000-07-21 2001-07-20 Method and system for finding match in database related to waveforms
NO20030319A NO20030319L (en) 2000-07-21 2003-01-21 Procedure and system for finding a match in a waveform database

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/621,619 US7228280B1 (en) 1997-04-15 2000-07-21 Finding database match for file based on file characteristics
US09/621,619 2000-07-21

Publications (2)

Publication Number Publication Date
WO2002008943A2 (en) 2002-01-31
WO2002008943A3 (en) 2002-07-25

Family

ID=24490907

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/022891 WO2002008943A2 (en) 2000-07-21 2001-07-20 Method and system for finding match in database related to waveforms

Country Status (5)

Country Link
EP (1) EP1303817A2 (en)
JP (1) JP2004511838A (en)
AU (1) AU2001277034A1 (en)
NO (1) NO20030319L (en)
WO (1) WO2002008943A2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005101243A1 (en) * 2004-04-13 2005-10-27 Matsushita Electric Industrial Co. Ltd. Method and apparatus for identifying audio such as music
US20120294459A1 (en) * 2011-05-17 2012-11-22 Fender Musical Instruments Corporation Audio System and Method of Using Adaptive Intelligence to Distinguish Information Content of Audio Signals in Consumer Audio and Control Signal Processing Function

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
JP5982791B2 (en) 2011-11-16 2016-08-31 ソニー株式会社 Information processing apparatus, information processing method, information providing apparatus, and information providing system

Citations (4)

Publication number Priority date Publication date Assignee Title
US5250745A (en) * 1991-07-31 1993-10-05 Ricos Co., Ltd. Karaoke music selection device
US5347083A (en) * 1992-07-27 1994-09-13 Yamaha Corporation Automatic performance device having a function of automatically controlling storage and readout of performance data
EP0731446A1 (en) * 1995-03-08 1996-09-11 GENERALMUSIC S.p.A. A microprocessor device for selection and recognition of musical pieces
US5983176A (en) * 1996-05-24 1999-11-09 Magnifi, Inc. Evaluation of media content in media files

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US5987525A (en) * 1997-04-15 1999-11-16 Cddb, Inc. Network delivery of interactive entertainment synchronized to playback of audio recordings
JP3434223B2 (en) * 1998-11-19 2003-08-04 日本電信電話株式会社 Music information search device, music information storage device, music information search method, music information storage method, and recording medium recording these programs
JP3467415B2 (en) * 1998-12-01 2003-11-17 日本電信電話株式会社 Music search device, music search method, and recording medium recording music search program

Non-Patent Citations (1)

Title
PFEIFFER S ET AL: "AUTOMATIC AUDIO CONTENT ANALYSIS" PROCEEDINGS OF ACM MULTIMEDIA 96. BOSTON, NOV. 18 - 22, 1996, NEW YORK, ACM, US, 18 November 1996 (1996-11-18), pages 21-30, XP000734706 ISBN: 0-89791-871-1 *

Also Published As

Publication number Publication date
NO20030319L (en) 2003-03-20
JP2004511838A (en) 2004-04-15
AU2001277034A1 (en) 2002-02-05
NO20030319D0 (en) 2003-01-21
WO2002008943A3 (en) 2002-07-25
EP1303817A2 (en) 2003-04-23

Similar Documents

Publication Publication Date Title
US7167857B2 (en) Method and system for finding approximate matches in database
US5987525A (en) Network delivery of interactive entertainment synchronized to playback of audio recordings
US7945645B2 (en) Method and system for accessing web pages based on playback of recordings
US7228280B1 (en) Finding database match for file based on file characteristics
EP0875846B1 (en) Multimedia information transfer via a wide area network
US20020194260A1 (en) Method and apparatus for creating multimedia playlists for audio-visual systems
KR20020072453A (en) Reproducing apparatus and additional information providing server system therefor
JP2010257466A (en) Digital audio track set recognition system
JPH09247599A (en) Interactive video recording and reproducing system
CA2553159A1 (en) Network delivery of interactive entertainment complementing audio recording
WO2002008943A2 (en) Method and system for finding match in database related to waveforms
Fingerhut The ircam multimedia library: A digital music library
KR20030094155A (en) Information storage medium for additional information
KR20030094153A (en) Additional information providing method

Legal Events

Date Code Title Description
AK Designated states. Kind code of ref document: A2. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW
AL Designated countries for regional patents. Kind code of ref document: A2. Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG
121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code. Ref country code: DE. Ref legal event code: 8642
WWE Wipo information: entry into national phase. Ref document number: 2001954813. Country of ref document: EP
WWP Wipo information: published in national office. Ref document number: 2001954813. Country of ref document: EP