WO2012033505A1 - Systems and methods for recording and sharing audio files - Google Patents

Info

Publication number
WO2012033505A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio files
server
audio
website
user
Prior art date
Application number
PCT/US2010/053084
Other languages
French (fr)
Inventor
Walter Bachtiger
Original Assignee
Walter Bachtiger
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US12/878,014 (US20110072350A1)
Application filed by Walter Bachtiger
Publication of WO2012033505A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63 Querying
    • G06F16/638 Presentation of query results
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/685 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics

Definitions

  • the field of the present invention relates to systems and methods for recording, indexing, storing, and sharing audio files, including, but not limited to, recorded telephone conversations, internet communications, and other audio content.
  • Such currently-available systems do not efficiently allow users to share and publish comments regarding specific and limited portions of the content of a particular audio file, for a plurality of other users to review.
  • currently-available systems do not efficiently allow users to query a large body of different audio files for content that relates to a particular topic - or rank such audio files in order of relevance to a particular topic.
  • currently-available communication management systems fail to adequately incentivize users to share, publish, and make available to others the audio files that may be recorded within a particular database used by such systems.
  • the present invention addresses many of these, and other, drawbacks that are associated with currently-available audio storage and retrieval systems.
  • systems are provided for recording and sharing audio files among a plurality of users. More particularly, the systems generally comprise a server that is configured to receive, index, and store a plurality of audio files, which are received by the server from a plurality of sources, within at least one database in communication with the server.
  • the invention provides that the server is configured to make one or more of the audio files accessible to one or more persons - other than the original sources of such audio files. In other words, if certain conditions are satisfied, the audio files that a first person records within the database of the system will be accessible by other persons.
  • the server is preferably configured to receive and publish comments associated with specific portions of the audio files within a graphical user interface of a website.
  • the invention provides that the comments may be submitted to the server through the website by persons other than the original sources (or authors) of such audio files.
  • the audio files that are stored within the server may be derived from audio-only content (e.g., a telephone conversation) or, in certain cases, may comprise audio content derived from a video file (which has an audio component embedded therein).
  • systems for recording and sharing audio files which incentivize users of the system to share, publish, and make available to others the audio files that may be recorded within a particular database.
  • the systems generally comprise a server that is configured to (a) receive, index, and store a plurality of audio files, which are received by the server from a plurality of sources, within at least one database in communication with the server; and (b) make one or more of the audio files accessible to persons other than the original sources (or authors) of such audio files.
  • the server is also configured to track the number of audio files shared by each user of the system.
  • the invention provides that an audio file is considered "shared" when a user makes an audio file accessible to, or otherwise refers the audio file to, another user of the system. Still further, the invention provides that the server may, optionally, be configured to grant credit to each user of the system based on the number of audio files shared by each user during a defined period of time. According to such embodiments, the credit that is granted to each user may be redeemed, for example, in exchange for the right to use the system without charge (for a defined period of time) or other forms of consideration.
  • systems and methods for converting audio content into text if, and only if, such conversion satisfies a minimum accuracy confidence threshold are provided.
  • the invention provides that other non-literary symbols are used to signify the presence of those audio-to-text conversions that do not meet the predefined minimum accuracy confidence threshold.
  • the methods comprise receiving audio content within a computer server, and instructing the server to perform an audio content to text transcription using one or more algorithms.
  • the methods further comprise instructing the server to display a set of results for such transcription within a graphical user interface of a computing device for each word that (i) was converted into text from the audio content and (ii) meets or exceeds a predefined accuracy confidence threshold. Still further, according to such embodiments, the methods comprise instructing the server to display a non-literary symbol for each word that was converted into text from the audio content, but which does not meet or exceed the predefined accuracy confidence threshold. The invention further provides that a non-literary symbol may be shown for each letter that comprises the word that does not meet or exceed the predefined accuracy confidence threshold. For example, if the non-displayed word includes five letters, the transcription results would display five consecutive non-literary symbols (to indicate the number of letters that are included in the non-displayed word).
  • methods for recording, indexing, storing, and sharing audio files are provided, which generally comprise the use of the systems described herein.
  • FIGURE 1 is a diagram showing the different components of the systems described herein.
  • FIGURE 2 is a diagram showing the interactive nature and audio file sharing capability of the systems described herein.
  • FIGURE 3 is a flow chart illustrating the controls provided by the systems described herein, which allow only specified users to access certain audio files and/or comments related thereto within the centralized website.
  • FIGURE 4 is a diagram showing certain non-limiting components of an exemplary graphical user interface in which a user may query the content of a plurality of audio files, identify those audio files which include a certain key word (or set of key words) that the user defines, and quickly view the context in which such key word is used in one or more audio files.
  • FIGURE 5 is a flow diagram that summarizes certain audio-to-text methods of the present invention.
  • FIGURE 6 is a non-limiting example of certain output of an audio-to-text conversion using the methods and systems of the present invention.
  • the present invention generally encompasses systems for recording and sharing audio files among a plurality of users.
  • the systems generally comprise a server 2 that is configured to receive, index, and store a plurality of audio files, which are received by the server 2 from a plurality of sources, within at least one database 4 in communication with the server 2.
  • the invention provides that the database 4 may reside within the server 2 or, alternatively, may exist outside of the server 2 while being in communication therewith via a network connection.
  • the audio files may be indexed 6 and categorized within the database 4 based on author, time of recordation, geographical location of origin, IP addresses, language, key word usage, combinations of the foregoing, and other factors.
  • the invention provides that the audio files are preferably submitted to the server 2 through a centralized website 8 that may be accessed through a standard internet connection 10.
  • the invention provides that the website 8 may be accessed, and the audio files submitted to the server 2, using any device that is capable of establishing an internet connection 10, such as using a personal computer 12 (including tablet computers), telephone 14 (including smart phones, PDAs, and other similar devices), meeting conference speaker phones 16, and other devices.
  • the invention provides that the audio files may be created by such devices and then uploaded to the server 2 or, alternatively, the audio content may be streamed in real time (through such devices) with the audio file being created (and then indexed and stored) within the server 2 and database 4.
  • the invention provides that the audio files that are stored within the server 2 and database 4 may be derived from audio-only content (e.g., a telephone conversation) or, in certain cases, may comprise audio content derived from a video file (which has an audio component embedded therein).
  • the invention provides that the server 2 may receive and manage audio files in many ways, such that the contents thereof may be deciphered and used as described herein.
  • the invention provides that upon an audio file being submitted to the server 2, the server 2 may perform a speech-to-text, speech-to-phoneme, speech-to-syllable, and/or speech-to-subword conversion, and then store an output of such conversion within the database 4.
  • the content of each audio file may be intelligently queried and used in the manner described herein, such as for querying such content for key words.
  • the invention provides that when reference is made to "audio files that contain a key word," and similar phrases, it should be understood that such phrase encompasses a text file that contains the key word, with the text file being derived from an audio file, as explained above.
  • For example, after performing a speech-to-text conversion, and storing such text within the database 4, if a search is performed using the system of the present invention for audio files that contain a particular key word, the system will actually search the converted text forms of such audio files. Upon identifying any text forms of such audio files that contain the queried key word, it will be inferred that the audio file that corresponds with the searched text file will actually contain the key word.
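The inference described above, in which a key word search runs over the converted text and the corresponding audio file is reported as a match, can be sketched as follows. This is a minimal illustration, not the implementation disclosed in the application: the `transcripts` mapping, the file identifiers, and the function name `find_audio_files` are hypothetical, and matching is simplified to a case-insensitive whole-word comparison.

```python
import re

def find_audio_files(transcripts: dict, key_word: str) -> list:
    """Return IDs of audio files whose converted text contains key_word.

    transcripts maps a (hypothetical) audio file ID to the text produced
    by an earlier speech-to-text step; the audio itself is never searched.
    """
    pattern = re.compile(r"\b" + re.escape(key_word) + r"\b", re.IGNORECASE)
    return [audio_id for audio_id, text in transcripts.items()
            if pattern.search(text)]

# Hypothetical converted text stored alongside each audio file.
transcripts = {
    "call-001": "We should play golf on Friday.",
    "call-002": "The quarterly report is due next week.",
}
matches = find_audio_files(transcripts, "golf")
```

Here `matches` holds only the identifier of the first file, because only its converted text contains the queried word.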
  • the audio file that is provided to the server 2 and database 4 may represent and be derived from, for example, a recorded telephone conversation, VoIP conversation, group meeting (through a speaker phone), speech or lecture (through a microphone), deposition or court room testimony (through a court reporter's microphone and/or transcript data entry), and other audio sources.
  • the invention provides that the system described herein is preferably compatible with, and capable of receiving audio files from, any devices that may be used among persons to communicate, to transmit communications, or to record communications. In general, the invention provides that such devices may record the audio file, which may then be submitted to the server 2 as described herein.
  • the system may include a recordation means which records, in real time, an audio file that is representative of (and streamed from) a conversation between two or more people using, for example, a cellular telephone or other electronic communication devices.
  • the invention provides that the server 2 may comprise a single server or a group of servers.
  • the invention provides that the system may employ the use of cloud computing, whereby the server paradigm that is utilized to support the system of the present invention is scalable and may involve the use of different servers (and a variable number of servers) at any given time, depending on the number of users who are utilizing the system at different time points, with such servers being in fluid communication with the database 4 described herein.
  • the invention provides that the server 2 is configured to make one or more of the audio files accessible to persons other than the original source (or author) of the audio files.
  • the invention provides that the term "source" refers to a person who is responsible for uploading an audio file to the server 2, whereas the term "author" refers to one or more persons who contributed content to an uploaded audio file (who may, or may not, be the same person who uploads the audio file to the server 2).
  • a first user (User-1) 18 may submit 20 an audio file to the server 2 through the centralized website 8, which is then indexed and stored within a database 4.
  • the invention provides that if certain conditions are satisfied, as described below, the audio files that the first user (User-1) 18 records within and uploads to the database 4 will then be accessible by other persons.
  • a second user (User-2) 22 may retrieve 24 and listen to User-1's audio file from the database 4 through the centralized website 8.
  • User-2 22 may publish comments 26 regarding User-1's audio files within a graphical user interface of the website 8. Moreover, User-2 22 may publish comments 26 regarding certain limited portions of User-1's audio files, with the relative location of such comments being quickly ascertainable within the graphical user interface of the website 8.
  • the invention provides that the comments 26 may be submitted to the server 2 through the website 8 by User-2 22, or any other persons who are granted access to User-1's 18 original audio files.
  • the invention provides that the comments 26 will be associated with User-1's 18 original audio files within the database 4, along with other information collected by the server 2, such as the identity of the user / person submitting the comments 26, the date and time of submission, and/or other relevant information.
  • the invention further provides that the comments 26 may be viewed by any person accessing the website 8 or, alternatively, a limited group of persons who are granted access to User-1's 18 original audio files.
  • an author of an audio file, and/or the person (source) who submits an audio file to the server 2 may submit instructions to the server 2 which only allow certain persons to access and listen to the audio file.
  • the invention provides that such access controls may be used if a user (or author or source of an audio file) does not want an audio file to be generally available to all users of the system.
  • the invention provides that a user may access his/her account 34, by providing the server 2 with an authorized username / password through the centralized website 8. The user may then perform a search 36 of the database 4 for desired audio files, namely, audio files containing one or more search terms (key words), as described herein.
  • the invention provides that the server 2 will then generate a list of results 38, i.e., audio files that contain one or more of the queried search terms, and then display (within the centralized website 8) only those audio files to which the user is granted access 40.
  • the user may then select one or more audio files within the viewable search results for playback and/or other content review 42.
  • upon a user selecting an audio file from the search results within the centralized website 8, the server 2 will display only those comments (related to the selected audio file) that the user is allowed to view 44. In other words, the individuals who publish comments regarding an audio file may further limit access to such comments to only authorized users of the system.
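The access controls described above (search, then show only the accessible audio files, then show only the viewable comments) can be sketched as a pair of filters. This is a sketch under stated assumptions: the `AudioFile` and `Comment` classes, their field names, and the convention that `None` means "no restriction" are all hypothetical, and the key word match is reduced to a plain substring test.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Comment:
    text: str
    viewers: Optional[Set[str]] = None  # None: no extra restriction (assumption)

@dataclass
class AudioFile:
    audio_id: str
    transcript: str
    listeners: Optional[Set[str]] = None  # None: available to all users (assumption)
    comments: List[Comment] = field(default_factory=list)

def can_access(allowed: Optional[Set[str]], user: str) -> bool:
    """A user has access when no restriction is set or the user is listed."""
    return allowed is None or user in allowed

def search(files: List[AudioFile], user: str, term: str) -> List[AudioFile]:
    """Find files containing term, then keep only those the user may access."""
    hits = [f for f in files if term.lower() in f.transcript.lower()]
    return [f for f in hits if can_access(f.listeners, user)]

def visible_comments(audio: AudioFile, user: str) -> List[str]:
    """Return only the comments that this user is allowed to view."""
    return [c.text for c in audio.comments if can_access(c.viewers, user)]

files = [
    AudioFile("a1", "golf outing plans", listeners={"user2", "user3"},
              comments=[Comment("good point"),
                        Comment("private note", viewers={"user3"})]),
    AudioFile("a2", "golf budget review", listeners={"user3"}),
]
results = search(files, "user2", "golf")
```

Both files contain the search term, but `results` holds only the file that "user2" may access, and `visible_comments(results[0], "user2")` omits the comment restricted to "user3".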
  • the invention provides that a user of the system, such as User-2 22, may refer 28 an audio file (with or without comments 26 associated therewith) to another user.
  • when the other user, e.g., User-3 30, receives notice of such referral 28, the other user may access and listen to the referred audio file and, optionally, publish comments 32 regarding User-1's audio files within a graphical user interface of the website 8.
  • the invention provides that users of the system may share, refer, and transmit to other users a limited portion of one or more audio files.
  • the invention provides that the graphical user interface of the website 8 may include certain controls which allow a user to excise portions of an audio file and refer the same to another user, e.g., by using time coordinates associated with an audio file, from beginning to end, to identify and refer only the relevant portion of an audio file to another user of the system.
  • the system is configured to allow users to query the database 4, preferably through the website 8, for audio files that include within the content thereof one or more key words.
  • a non-limiting example of a portion of a graphical user interface showing an exemplary search function 46 is provided in Figure 4. More particularly, the invention provides that the server 2 of the system may be configured to receive one or more key words 48 that are submitted by a user of the system through the website 8, whereupon the server 2 queries the database 4 to identify all audio files which include the one or more key words 48.
  • the system, and the search function 46, may employ Boolean search logic, e.g., by allowing conjunctive and disjunctive searches, truncated and non-truncated forms of key words, exact match searches, and other forms of Boolean search logic.
  • the server 2 may then present the search results 50 to the user within the website 8 and, preferably, list all responsive audio files in a defined order within such graphical user interface, but only those audio files to which the user has been granted access, as described above.
  • the search results may list the audio files in chronological order based on the date (and time) 52 that each audio file was recorded and provided to the database 4.
  • the audio files may be listed in an order that is based on the number of occasions that a key word is used within each audio file.
  • the invention provides that the location of each search term that was queried may be indicated along the line 56.
  • the location of each search term may be indicated with a triangle 58, or other suitable and readily visible element.
  • the line 56 may be annotated with multiple triangles 58 (or other suitable elements), each of which may exhibit a different color that is correlated with a particular search term. More particularly, for example, if two search terms are used, the line 56 may be annotated with triangles 58 (or other suitable elements), which exhibit one of two colors, with one color representing a location of a first search term and a second color indicating the location of a second search term.
  • each line 56 that represents a relevant audio file may be annotated with one or more comments 60 posted by other users, as described herein.
  • the invention provides that such annotation of the comments 60 will preferably indicate the location within the audio file to which each comment 60 relates.
  • the invention provides that when a user places a cursor (within the centralized website 8) over or in the near vicinity of a triangle 58 (or other element indicating the location of a search term) or a comment 60, the graphical user interface of the website 8 will automatically publish a temporary text box 62 in which the search term may be viewed, along with a limited number of words before and after the search term (i.e., the context in which the search term is used).
  • the invention provides that the text box 62 will allow a user to quickly review the context in which the search term is used, which will facilitate knowing whether the audio file (or a portion thereof) may be relevant to the user and worthy of playback and/or further review.
  • the invention provides that a user may, optionally, control the number of words appearing before and after the search term in the text box 62, by entering the desired number of words in a specified field within the user's dedicated account page. This way, each user may adjust the size of the text box 62 in accordance with his / her personal preferences.
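The marker locations and the adjustable text box described above can be illustrated with a short key-word-in-context routine. This is a minimal sketch, not the disclosed implementation: the function name `key_word_contexts` and the `window` parameter are hypothetical, and the position of a hit is approximated by its word index divided by the total word count, whereas the described interface places markers along a line representing the audio file.

```python
def key_word_contexts(words, key_word, window=3):
    """Return (relative_position, context) for every occurrence of key_word.

    relative_position is a value between 0 and 1 that could place a marker
    along a line representing the file; context holds up to `window` words
    before and after the hit, as a user-adjustable text box might show.
    """
    hits = []
    for i, word in enumerate(words):
        if word.strip(".,!?").lower() == key_word.lower():
            start = max(0, i - window)
            context = " ".join(words[start:i + window + 1])
            hits.append((i / len(words), context))
    return hits

words = "we met at the golf course and then discussed golf clubs".split()
hits = key_word_contexts(words, "golf", window=2)
```

With `window=2`, the two contexts are "at the golf course and" and "then discussed golf clubs". The number of hits per file could also serve as the sort key when audio files are listed by how often a key word is used.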
  • the invention provides that a limited number of fields within the database 4 (which are associated with a particular audio file) may be pre-filled by an audio recording device.
  • the invention provides that the title and description fields (within the database 4) that are associated with an audio file may be pre-filled with information that is sourced from the calendar entries stored within, for example, a mobile phone of the user that is submitting the audio file (through the mobile phone) to the server 2 and database 4.
  • the system will automatically query any calendar entries stored within the phone and transmit relevant information to the appropriate fields of a database 4 entry that is created for the audio file, such as the audio file title, the names of the persons who contributed to the content of the audio file, date and time of recordation, and/or other relevant information.
  • the automatically-filled data fields would be editable by the user, in order to make any necessary corrections thereto.
  • the invention provides that similar functionality may be implemented using other recording means, such as internet-mediated communication portals (which may allow the system to automatically query emails and/or calendar programs stored within a personal computer).
  • the invention provides that the system described herein will allow users to identify other users who, based on the frequency of certain key word usage, may be experts or knowledgeable regarding a particular topic.
  • the database 4 may be queried for other users who have submitted one or more audio files which include the word "golf," with the search results being listed in the website 8 - e.g., the names (or usernames) of such users who satisfy the search criteria.
  • the invention provides that this search functionality will be useful for identifying persons who may be knowledgeable about a particular topic.
  • the search results may be listed in an order that is most relevant to the user, such as by ranking the users who use the search term most often - either relatively or absolutely - and/or based on geographical proximity to the user who initiated the search.
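The distinction between ranking users "relatively" and "absolutely" can be made concrete with a small example. This is a sketch under assumptions: the function name `rank_users` and the sample data are hypothetical, "relative" is interpreted here as occurrences divided by the user's total word count, and ranking by geographical proximity is left out.

```python
from collections import Counter

def rank_users(transcripts_by_user, key_word, relative=False):
    """Rank users by how often key_word appears in their converted text.

    relative=False ranks by raw count; relative=True ranks by the share of
    the user's words that are the key word (one possible reading of
    "relatively"). Users who never use the key word are omitted.
    """
    scores = {}
    for user, texts in transcripts_by_user.items():
        words = [w.strip(".,!?").lower() for t in texts for w in t.split()]
        count = Counter(words)[key_word.lower()]
        if count == 0:
            continue
        scores[user] = count / len(words) if relative else count
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda u: (-scores[u], u))

data = {
    "alice": ["golf golf golf and a long chat about many other unrelated things today"],
    "bob": ["golf golf"],
    "carol": ["no mention here"],
}
absolute_order = rank_users(data, "golf")
relative_order = rank_users(data, "golf", relative=True)
```

In this data, "alice" leads the absolute ranking (three uses against two), while "bob" leads the relative ranking because every word in his transcript is the key word.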
  • the system may further communicate with one or more social networking sites, such as LinkedIn, MySpace, Facebook, and others.
  • the system will not only list the users (usernames) who have submitted at least one audio file which includes the word "golf," it may also query the communications (i.e., audio files stored within the server 2 and database 4) of those users' "friends" and/or "friends-of-friends," as listed in the associated social networking sites, who have also submitted audio files to the server 2 and database 4.
  • if the key word is a person's name (or social network username), such functionality would allow users of the system to easily identify other users who may know, or be related to, the person identified by the key word search.
  • the audio files provided to the server 2 and database 4 by each user may be automatically queried for certain key words included therein. More particularly, the system may query each audio file to determine whether any words included therein are found in a pre-recorded list of advertising terms. If such analysis reveals that any of the words included within the audio files match any of the pre-recorded advertising terms, the server 2 may cause a relevant advertisement to be posted within the graphical user interface of the website 8 when the user accesses the website 8.
  • the server 2 may publish one or more golf-related advertisements in the graphical user interface of the website 8. According to such embodiments, the invention provides that the server 2 will be in communication with one or more databases that correlate certain terms with one or more advertisements.
  • the invention provides that whether certain advertisements are posted within the website 8 may be determined not only on whether a particular user's audio file includes a certain key word, but also (1) the number of times that such key word is used within an audio file, (2) the number of distinct audio files provided by the same user over a period of time that includes the key word, or (3) combinations of the foregoing.
  • the server 2 may cause one or more advertisements related to golf products or golf services to be published in the website 8 - when the user visits the website 8 (with the publication of the advertisement being triggered based on the user's IP address) and/or when the user submits a valid username / password to login to the website 8.
  • the invention provides that other criteria may be employed to determine which advertisement(s) to display, such as the location in which the audio file is recorded (e.g., the geographic location may be communicated to the server 2 if a mobile device is used to capture the audio recording), the level of background noise, the quality of the audio file, the type of recording device used, and/or other information and data that may be retrieved by the server 2 regarding a user, an audio file or the contents thereof.
  • the invention provides that advertisements may be posted within the graphical user interface of the website 8 based on the key words that may be used by a particular user to query the database 4 for relevant audio files.
  • the server 2 may search for and determine if the word "golf" matches any terms included within a pre-recorded list of advertising terms and, if so, the server 2 will cause one or more advertisements related to golf products or golf services to be published in the website 8.
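The advertisement selection described above (matching words against a pre-recorded list of advertising terms, optionally with thresholds on usage within a file and on the number of distinct files) can be sketched as follows. The `AD_TERMS` table, the function name `select_ads`, and the `min_uses` and `min_files` parameters are hypothetical names chosen for illustration, and the "period of time" criterion is assumed to have been applied already when choosing which transcripts to pass in.

```python
from collections import Counter

# Hypothetical table correlating advertising terms with advertisements.
AD_TERMS = {"golf": "Golf club sale", "travel": "Discount flights"}

def select_ads(user_transcripts, ad_terms, min_uses=1, min_files=1):
    """Pick advertisements whose term appears often enough in a user's files.

    min_uses:  required number of uses within at least one single file.
    min_files: required number of distinct files that contain the term.
    """
    max_uses = Counter()
    files_with_term = Counter()
    for text in user_transcripts:
        words = Counter(w.strip(".,!?").lower() for w in text.split())
        for term in ad_terms:
            if words[term]:
                max_uses[term] = max(max_uses[term], words[term])
                files_with_term[term] += 1
    return [ad for term, ad in ad_terms.items()
            if max_uses[term] >= min_uses and files_with_term[term] >= min_files]

transcripts = ["I love golf. Golf is great.", "golf again", "travel plans"]
ads = select_ads(transcripts, AD_TERMS, min_files=2)
```

Requiring two distinct files keeps the golf advertisement and drops the travel one, since "travel" appears in only one transcript.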
  • systems for recording and sharing audio files are provided, which incentivize users of the system to share and publish comments regarding the audio files described herein.
  • such embodiments are designed to encourage users to distribute, and make publicly available, the audio files recorded by each user and, in the case of referrals, the audio files recorded by other users.
  • these systems will generally comprise a server 2 that is configured to receive, index, and store a plurality of audio files - which are received by the server 2 from a plurality of sources - within at least one database 4 in communication with the server 2.
  • the system will preferably make one or more of the audio files accessible to persons other than the original sources (or authors) of such audio files.
  • the system will preferably allow users to share, exchange, and publish comments regarding the plurality of audio files recorded within the database 4 and managed by the system.
  • the server 2 may also be configured to track the number of audio files shared by each user of the system. The invention provides that an audio file is considered "shared" when a user makes an audio file accessible to, or otherwise refers the audio file to, another user of the system.
  • the invention provides that the system may be configured to enable a user to send (such as via e-mail) to another user, directly or indirectly, a hyperlink to the website 8 or a location therein where a particular audio file may be accessed - such that the receiving user may listen to and optionally submit comments regarding the audio file.
  • the referring or sharing user may provide instructions to the server 2 that are housed within the database 4, which provide that certain audio files submitted to the server 2 by the referring or sharing user may only be accessed by another user (or set of users) specified by the sharing or referring user.
  • Such lists of authorized users, who may access a particular audio file, may also be configured and communicated to such authorized users as an invitation to access, listen to, and submit comments regarding a particular audio file.
  • the system may be configured to track the number of audio files shared in such manner by each user of the system.
  • the invention provides that the server 2 may, optionally, be configured to grant credit to each user of the system based on the number of audio files shared or referred by each user during a defined period of time.
  • the credit that is granted to each user may be redeemed for a variety of items, such as money, gift certificates, gift cards, the right to use the system without charge for a defined period of time, or other items.
  • the invention provides that such credit system will preferably encourage audio file sharing among users of the system.
  • the website 8 may include an account page for each user, which lists the amount of accumulated credit that has been awarded to each user at any given time (and, optionally, may further display credit that has been redeemed by the user of the system).
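The share tracking and credit grant described above can be sketched as a tally over a log of share events. This is an illustration under assumptions: the log layout, the function name `credits_for_period`, and the rule of one credit per share event are hypothetical, since the text says only that credit is based on the number of audio files shared or referred during a defined period.

```python
from datetime import date

def credits_for_period(share_log, start, end, credit_per_share=1):
    """Tally credit per user for shares made between start and end, inclusive.

    share_log is a list of (user, audio_id, recipient, shared_on) tuples,
    one per share or referral event (a hypothetical record layout).
    """
    credits = {}
    for user, _audio_id, _recipient, shared_on in share_log:
        if start <= shared_on <= end:
            credits[user] = credits.get(user, 0) + credit_per_share
    return credits

log = [
    ("user1", "a1", "user2", date(2010, 9, 1)),
    ("user1", "a2", "user3", date(2010, 9, 15)),
    ("user2", "a1", "user3", date(2010, 10, 2)),
]
september = credits_for_period(log, date(2010, 9, 1), date(2010, 9, 30))
```

Only the two September shares by "user1" fall inside the period, so `september` credits that user alone. A balance like this is what an account page could display alongside any credit already redeemed.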
  • systems and methods for converting audio content into text are provided. More specifically, the systems and methods of the present invention will transcribe and display audio content in the form of published text, provided that the audio-to-text conversion satisfies a minimum accuracy confidence threshold. According to such embodiments, the invention provides that other non-literary symbols are preferably used to signify the presence of certain audio-to-text conversions that do not meet the predefined minimum accuracy confidence threshold.
  • the methods comprise receiving audio content within a computer server 64, and instructing the server to perform an audio content to text transcription 66 using one or more algorithms.
  • the audio content may be provided to the server via any of various ways.
  • the invention provides that the audio content is submitted to the server through a centralized website that may be accessed through a standard internet connection.
  • the invention provides that the website may be accessed, and the audio files submitted to the server, using any device that is capable of establishing an internet connection, such as using a personal computer (including tablet computers), telephone (including smart phones, PDAs, and other similar devices), meeting conference speaker phones, and other devices.
  • the invention provides that the audio content may be created by such devices and then uploaded to the server or, alternatively, the audio content may be streamed in real time (through such devices) with the audio file being created within the server and stored within a database.
  • the invention provides that the audio files that are stored within the server and database may be derived from audio-only content (e.g., a telephone conversation) or, in certain cases, may comprise audio content derived from a video file (which has an audio component embedded therein).
  • a variety of algorithms may be employed during the transcription step, including, but not limited to, algorithms that may be used to perform speech-to-text, speech-to-phoneme, speech-to-syllable, and/or speech-to-subword conversions, as described herein.
  • Hidden Markov Model algorithms may be employed to execute the transcription.
  • the methods further comprise calculating an accuracy confidence value 68, which will be a quantitative measure of the estimated accuracy of the transcription of a word derived from the audio content into written text.
  • the methods further comprise instructing the server to display a set of results for such transcription 70 within a graphical user interface of a computing device, which may include a personal computer, smart phone, PDA, and other devices having a visual display.
  • the invention provides, however, that such results will include transcribed words for only those words that meet or exceed a predefined accuracy confidence threshold 72.
  • the associated accuracy confidence value for such word will be compared to the predefined accuracy confidence threshold. If the accuracy confidence value meets or exceeds the predefined accuracy confidence threshold, the transcribed word will be published within the set of results for such transcription 72.
  • non-literary symbols include, but are not limited to, spaces (i.e., no text or symbols), punctuation marks (e.g., !, @,
  • A non-limiting example of such audio-to-text conversion is illustrated in Figure 6.
  • the invention further provides that a non-literary symbol may be shown for each letter that comprises the word that does not meet or exceed the predefined accuracy confidence threshold. For example, if the non-displayed word includes five letters, the transcription results would display five consecutive non-literary symbols (to indicate the number of letters that are included in the non-displayed word).
  • a centralized website is provided in which the audio-to-text conversions may be viewed.
  • the website may include a set of controls and, particularly, a control that allows a user to quickly and easily adjust the predefined accuracy confidence threshold that is applied to a transcription (either before or after a transcription).
  • the invention provides that the website may include a sliding control, which allows a user to adjust the predefined accuracy confidence threshold up and down, while simultaneously viewing the effect that such adjustment has on the number of words transcribed and the accuracy thereof.
  • systems which comprise a computer server, a website, a database, and at least one computing device, which are configured to execute the methods of the present invention.
  • a system of the present invention will preferably include a computer server that is configured to receive audio content provided by a source device, wherein the server is further configured to perform an audio content to text transcription using one or more algorithms.
  • the systems will further include a database that is accessed and used by the server, in connection with the one or more algorithms, to perform the audio content to text transcription.
  • the systems of the present invention will comprise a computing device that includes a graphical user interface in which the server displays a set of transcription results for each word that (i) was converted into text from said audio content and (ii) meets or exceeds a predefined accuracy confidence threshold (as described herein).
  • the server is further configured to display a non-literary symbol for each word that was converted into text from said audio content, which does not meet or exceed the predefined accuracy confidence threshold.
  • methods for recording, indexing, storing, sharing, and publishing comments regarding audio files are provided, which generally comprise the use of the systems described herein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Library & Information Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Systems for recording and sharing audio files among a plurality of users. The systems include a server that is configured to receive, index, and store a plurality of audio files, which are received by the server from a plurality of sources, within at least one database in communication with the server. In addition, the server is configured to make one or more of the audio files accessible to one or more persons - other than the original sources of such audio files. Still further, the server is configured to receive and publish comments associated with the audio files within a graphical user interface of a website. The comments may be submitted to the server through the website by persons other than the original sources of such audio files.

Description

SYSTEMS AND METHODS FOR RECORDING AND SHARING AUDIO FILES
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to, and incorporates by reference, U.S. patent application serial number 12/878,014, filed September 8, 2010, and U.S. provisional patent application serial number 61/392,411, filed October 12, 2010.
FIELD OF THE INVENTION
[0002] The field of the present invention relates to systems and methods for recording, indexing, storing, and sharing audio files, including, but not limited to, recorded telephone conversations, internet communications, and other audio content.
BACKGROUND OF THE INVENTION
[0003] Systems for recording and storing audio files have been available for many years and, indeed, are used by many individuals and businesses today. In addition, currently-available systems allow users to retrieve, either using a telephone or internet connection, audio files that may be stored in a database and correlated with a specific user of the system. Although these systems have become a ubiquitous part of communication (and communication management) in today's world, these systems do not efficiently capture, utilize, and make available to others, the value of the content stored within such audio files.
[0004] For example, currently-available systems do not efficiently allow users to share recorded audio files with other persons and, more importantly, publish comments regarding the content of a particular audio file for a plurality of other users to view. More particularly, such currently-available systems do not efficiently allow users to share and publish comments regarding specific and limited portions of the content of a particular audio file, for a plurality of other users to review. In addition, currently-available systems do not efficiently allow users to query a large body of different audio files for content that relates to a particular topic - or rank such audio files in order of relevance to a particular topic. Still further, currently-available communication management systems fail to adequately incentivize users to share, publish, and make available to others the audio files that may be recorded within a particular database used by such systems.
[0005] As described further below, the present invention addresses many of these, and other, drawbacks that are associated with currently-available audio storage and retrieval systems.
SUMMARY OF THE INVENTION
[0006] According to certain aspects of the present invention, systems are provided for recording and sharing audio files among a plurality of users. More particularly, the systems generally comprise a server that is configured to receive, index, and store a plurality of audio files, which are received by the server from a plurality of sources, within at least one database in communication with the server. In addition, the invention provides that the server is configured to make one or more of the audio files accessible to one or more persons - other than the original sources of such audio files. In other words, if certain conditions are satisfied, the audio files that a first person records within the database of the system will be accessible by other persons. Still further, according to such embodiments of the invention, the server is preferably configured to receive and publish comments associated with specific portions of the audio files within a graphical user interface of a website. The invention provides that the comments may be submitted to the server through the website by persons other than the original sources (or authors) of such audio files. The audio files that are stored within the server may be derived from audio-only content (e.g., a telephone conversation) or, in certain cases, may comprise audio content derived from a video file (which has an audio component embedded therein).
[0007] According to additional aspects of the present invention, similar to the embodiments described above, systems for recording and sharing audio files are provided, which incentivize users of the system to share, publish, and make available to others the audio files that may be recorded within a particular database. According to such embodiments, the systems generally comprise a server that is configured to (a) receive, index, and store a plurality of audio files, which are received by the server from a plurality of sources, within at least one database in communication with the server; and (b) make one or more of the audio files accessible to persons other than the original sources (or authors) of such audio files.
[0008] According to such embodiments, the server is also configured to track the number of audio files shared by each user of the system. The invention provides that an audio file is considered "shared" when a user makes an audio file accessible to, or otherwise refers the audio file to, another user of the system. Still further, the invention provides that the server may, optionally, be configured to grant credit to each user of the system based on the number of audio files shared by each user during a defined period of time. According to such embodiments, the credit that is granted to each user may be redeemed, for example, in exchange for the right to use the system without charge (for a defined period of time) or other forms of consideration.
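For purposes of illustration only, the share-tracking and credit-granting behavior described above might be sketched as follows. The function name `credits_for_period`, the event format, and the one-credit-per-share rate are assumptions made for this sketch and are not specified by the invention.

```python
from collections import Counter
from datetime import datetime


def credits_for_period(share_events, start, end, credit_per_share=1):
    """Return {user: credit} for shares made in the half-open period [start, end).

    share_events is an iterable of (user, shared_at) pairs. The event format
    and the flat credit_per_share rate are hypothetical.
    """
    counts = Counter(
        user for user, shared_at in share_events if start <= shared_at < end
    )
    return {user: n * credit_per_share for user, n in counts.items()}


# Hypothetical share log: three shares fall within September 2010.
events = [
    ("alice", datetime(2010, 9, 1)),
    ("alice", datetime(2010, 9, 15)),
    ("bob", datetime(2010, 9, 20)),
    ("alice", datetime(2010, 10, 2)),
]
september = credits_for_period(events, datetime(2010, 9, 1), datetime(2010, 10, 1))
```

Under these assumptions, a share made outside the defined period (such as the October entry) earns no credit for that period.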
[0009] According to still further embodiments of the present invention, systems and methods for converting audio content into text if, and only if, such conversion satisfies a minimum accuracy confidence threshold are provided. Furthermore, according to certain embodiments, the invention provides that other non-literary symbols are used to signify the presence of those audio-to-text conversions that do not meet the predefined minimum accuracy confidence threshold. For example, according to certain embodiments of the present invention, the methods comprise receiving audio content within a computer server, and instructing the server to perform an audio content to text transcription using one or more algorithms. The methods further comprise instructing the server to display a set of results for such transcription within a graphical user interface of a computing device for each word that (i) was converted into text from the audio content and (ii) meets or exceeds a predefined accuracy confidence threshold. Still further, according to such embodiments, the methods comprise instructing the server to display a non-literary symbol for each word that was converted into text from the audio content, but which does not meet or exceed the predefined accuracy confidence threshold. The invention further provides that a non-literary symbol may be shown for each letter that comprises the word that does not meet or exceed the predefined accuracy confidence threshold. For example, if the non-displayed word includes five letters, the transcription results would display five consecutive non-literary symbols (to indicate the number of letters that are included in the non-displayed word).
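By way of a non-limiting sketch, the threshold-based display described above may be expressed as follows. The function name `render_transcript`, the use of an underscore as the non-literary symbol, and the representation of confidence as a number between 0 and 1 are assumptions of this sketch; the word-level confidence values would come from whatever transcription algorithm is used.

```python
def render_transcript(words, threshold, symbol="_"):
    """Render (text, confidence) pairs, masking low-confidence words.

    A word is shown only if its confidence meets or exceeds the threshold;
    otherwise one symbol is shown per letter of the non-displayed word.
    The symbol choice and the 0-to-1 confidence scale are hypothetical.
    """
    rendered = []
    for text, confidence in words:
        if confidence >= threshold:
            rendered.append(text)
        else:
            rendered.append(symbol * len(text))
    return " ".join(rendered)


# Hypothetical transcription output with per-word confidence values.
words = [("please", 0.95), ("call", 0.91), ("Smith", 0.42), ("tomorrow", 0.88)]
print(render_transcript(words, threshold=0.80))  # please call _____ tomorrow
```

In this example the five-letter word "Smith" falls below the threshold and is therefore replaced with five consecutive symbols.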
[0010] According to yet further aspects of the present invention, methods for recording, indexing, storing, and sharing audio files are provided, which generally comprise the use of the systems described herein.
[0011] The above-mentioned and additional features of the present invention are further illustrated in the Detailed Description contained herein.
BRIEF DESCRIPTION OF THE FIGURES
[0012] FIGURE 1 is a diagram showing the different components of the systems described herein.
[0013] FIGURE 2 is a diagram showing the interactive nature and audio file sharing capability of the systems described herein.
[0014] FIGURE 3 is a flow chart illustrating the controls provided by the systems described herein, which allow only specified users to access certain audio files and/or comments related thereto within the centralized website.
[0015] FIGURE 4 is a diagram showing certain non-limiting components of an exemplary graphical user interface in which a user may query the content of a plurality of audio files, identify those audio files which include a certain key word (or set of key words) that the user defines, and quickly view the context in which such key word is used in one or more audio files.
[0016] FIGURE 5 is a flow diagram that summarizes certain audio-to-text methods of the present invention.
[001] FIGURE 6 is a non-limiting example of certain output of an audio-to-text conversion using the methods and systems of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0017] The following will describe, in detail, several preferred embodiments of the present invention. These embodiments are provided by way of explanation only, and thus, should not unduly restrict the scope of the invention. In fact, those of ordinary skill in the art will appreciate upon reading the present specification and viewing the present drawings that the invention teaches many variations and modifications, and that numerous variations of the invention may be employed, used and made without departing from the scope and spirit of the invention.
[0018] According to certain preferred embodiments, the present invention generally encompasses systems for recording and sharing audio files among a plurality of users. Referring to Figure 1, the systems generally comprise a server 2 that is configured to receive, index, and store a plurality of audio files, which are received by the server 2 from a plurality of sources, within at least one database 4 in communication with the server 2. The invention provides that the database 4 may reside within the server 2 or, alternatively, may exist outside of the server 2 while being in communication therewith via a network connection.
[0019] The audio files may be indexed 6 and categorized within the database 4 based on author, time of recordation, geographical location of origin, IP addresses, language, key word usage, combinations of the foregoing, and other factors. The invention provides that the audio files are preferably submitted to the server 2 through a centralized website 8 that may be accessed through a standard internet connection 10. The invention provides that the website 8 may be accessed, and the audio files submitted to the server 2, using any device that is capable of establishing an internet connection 10, such as using a personal computer 12 (including tablet computers), telephone 14 (including smart phones, PDAs, and other similar devices), meeting conference speaker phones 16, and other devices. The invention provides that the audio files may be created by such devices and then uploaded to the server 2 or, alternatively, the audio content may be streamed in real time (through such devices) with the audio file being created (and then indexed and stored) within the server 2 and database 4. In addition, the invention provides that the audio files that are stored within the server 2 and database 4 may be derived from audio-only content (e.g., a telephone conversation) or, in certain cases, may comprise audio content derived from a video file (which has an audio component embedded therein).
[0020] The invention provides that the server 2 may receive and manage audio files in many ways, such that the contents thereof may be deciphered and used as described herein. For example, the invention provides that upon an audio file being submitted to the server 2, the server 2 may perform a speech-to-text, speech-to-phoneme, speech-to-syllable, and/or speech-to-subword conversion, and then store an output of such conversion within the database 4. This way, the content of each audio file may be intelligently queried and used in the manner described herein, such as for querying such content for key words.
[0021] The invention provides that when reference is made to "audio files that contain a key word," and similar phrases, it should be understood that such phrase encompasses a text file that contains the key word, with the text file being derived from an audio file, as explained above. In other words, for example, after performing a speech-to-text conversion, and storing such text within the database 4, if a search is performed using the system of the present invention for audio files that contain a particular key word, the system will actually search the converted text forms of such audio files. Upon identifying any text forms of such audio files that contain the queried key word, it will be inferred that the audio file that corresponds with the searched text file will actually contain the key word.
[0022] The audio file that is provided to the server 2 and database 4 may represent and be derived from, for example, a recorded telephone conversation, VoIP conversation, group meeting (through a speaker phone), speech or lecture (through a microphone), deposition or court room testimony (through a court reporter's microphone and/or transcript data entry), and other audio sources. The invention provides that the system described herein is preferably compatible with, and capable of receiving audio files from, any devices that may be used among persons to communicate, to transmit communications, or to record communications. In general, the invention provides that such devices may record the audio file, which may then be submitted to the server 2 as described herein. In other embodiments, the invention provides that the system may include a recordation means which records, in real time, an audio file that is representative of (and streamed from) a conversation between two or more people using, for example, a cellular telephone or other electronic communication devices.
[0023] When the present specification refers to the server 2, the invention provides that the server 2 may comprise a single server or a group of servers. In addition, the invention provides that the system may employ the use of cloud computing, whereby the server paradigm that is utilized to support the system of the present invention is scalable and may involve the use of different servers (and a variable number of servers) at any given time, depending on the number of users who are utilizing the system at different time points, which are in fluid communication with the database 4 described herein.
[0024] According to certain preferred embodiments, the invention provides that the server 2 is configured to make one or more of the audio files accessible to persons other than the original source (or author) of the audio files. The invention provides that the term "source" refers to a person who is responsible for uploading an audio file to the server 2, whereas the term "author" refers to one or more persons who contributed content to an uploaded audio file (who may, or may not, be the same person who uploads the audio file to the server 2). For example, referring now to Figure 2, a first user (User-1) 18 may submit 20 an audio file to the server 2 through the centralized website 8, which is then indexed and stored within a database 4. The invention provides that if certain conditions are satisfied, as described below, the audio files that the first user (User-1) 18 records within and uploads to the database 4 will then be accessible by other persons. For example, a second user (User-2) 22 may retrieve 24 and listen to User-1's audio file from the database 4 through the centralized website 8.
[0025] Upon retrieving and accessing User-1's audio file, User-2 22 may publish comments 26 regarding User-1's audio files within a graphical user interface of the website 8. Moreover, User-2 22 may publish comments 26 regarding certain limited portions of User-1's audio files, with the relative location of such comments being quickly ascertainable within the graphical user interface of the website 8. The invention provides that the comments 26 may be submitted to the server 2 through the website 8 by User-2 22, or any other persons who are granted access to User-1's 18 original audio files. The invention provides that the comments 26 will be associated with User-1's 18 original audio files within the database 4, along with other information collected by the server 2, such as the identity of the user / person submitting the comments 26, the date and time of submission, and/or other relevant information.
[0026] The invention further provides that the comments 26 may be viewed by any person accessing the website 8 or, alternatively, a limited group of persons who are granted access to User-1's 18 original audio files. For example, an author of an audio file, and/or the person (source) who submits an audio file to the server 2, may submit instructions to the server 2 which only allow certain persons to access and listen to the audio file. The invention provides that such access controls may be used if a user (or author or source of an audio file) does not want an audio file to be generally available to all users of the system.
[0027] Referring to Figure 3, for example, the invention provides that a user may access his/her account 34, by providing the server 2 with an authorized username / password through the centralized website 8. The user may then perform a search 36 of the database 4 for desired audio files, namely, audio files containing one or more search terms (key words), as described herein. The invention provides that the server 2 will then generate a list of results 38, i.e., audio files that contain one or more of the queried search terms, and then display (within the centralized website 8) only those audio files to which the user is granted access 40. The user may then select one or more audio files within the viewable search results for playback and/or other content review 42. In addition, upon selecting an audio file from the search results within the centralized website 8, the server 2 will display only those comments (related to the selected audio file) that the user is allowed to view 44. In other words, the individuals who publish comments regarding an audio file may further limit access to such comments to only authorized users of the system.
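For purposes of illustration only, the step of displaying only those audio files to which the user is granted access 40 might be sketched as follows. The function name `visible_results`, the access-list structure, and the "*" wildcard denoting general availability are assumptions of this sketch; the same filter could be applied to comments 44.

```python
def visible_results(results, user, acl):
    """Keep only the search results that the given user may access.

    acl maps an audio file identifier to the set of authorized usernames;
    the hypothetical entry "*" marks a file as available to all users.
    Files with no acl entry are treated as inaccessible.
    """
    visible = []
    for audio_id in results:
        allowed = acl.get(audio_id, set())
        if "*" in allowed or user in allowed:
            visible.append(audio_id)
    return visible


# Hypothetical search results and access instructions.
results = ["a1", "a2", "a3"]
acl = {"a1": {"user2"}, "a2": {"*"}, "a3": {"user3"}}
print(visible_results(results, "user2", acl))  # ['a1', 'a2']
```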
[0028] Referring now to Figure 2, according to certain preferred embodiments, the invention provides that a user of the system, such as User-2 22, may refer 28 an audio file (with or without comments 26 associated therewith) to another user. When the other user, e.g., User-3 30, receives notice of such referral 28, the other user may access and listen to the referred audio file and, optionally, publish comments 32 regarding User-1's audio files within a graphical user interface of the website 8. In addition, the invention provides that users of the system may share, refer, and transmit to other users a limited portion of one or more audio files. For example, if a first user determines that a second user may find a particular portion of an audio file to be of interest, the first user may refer only the interesting portion of that audio file to the second user. According to such embodiments, the invention provides that the graphical user interface of the website 8 may include certain controls which allow a user to excise portions of an audio file and refer the same to another user, e.g., by using time coordinates associated with an audio file, from beginning to end, to identify and refer only the relevant portion of an audio file to another user of the system.
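By way of a non-limiting sketch, a referral of a limited portion of an audio file using time coordinates might be represented as follows. The `Referral` record, the function name `refer_portion`, and the use of seconds as the time unit are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Referral:
    """Hypothetical record of a referred portion of an audio file."""
    audio_id: str
    start_s: float
    end_s: float
    recipient: str


def refer_portion(audio_id, duration_s, start_s, end_s, recipient):
    """Create a referral covering only [start_s, end_s] of the audio file.

    The time coordinates must lie within the file, from beginning (0)
    to end (duration_s).
    """
    if not (0 <= start_s < end_s <= duration_s):
        raise ValueError("require 0 <= start < end <= duration")
    return Referral(audio_id, start_s, end_s, recipient)


# Refer seconds 60 through 90 of a five-minute file to a hypothetical User-3.
referral = refer_portion("a1", duration_s=300, start_s=60, end_s=90, recipient="user3")
```

Storing only the time coordinates, rather than a copy of the excised audio, is one possible design choice; the invention does not prescribe either approach.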
[0029] As mentioned above, according to certain preferred embodiments of the present invention, the system is configured to allow users to query the database 4, preferably through the website 8, for audio files that include within the content thereof one or more key words. A non-limiting example of a portion of a graphical user interface showing an exemplary search function 46 is provided in Figure 4. More particularly, the invention provides that the server 2 of the system may be configured to receive one or more key words 48 that are submitted by a user of the system through the website 8, whereupon the server 2 queries the database 4 to identify all audio files which include the one or more key words 48. The invention provides that the system, and search function 46, may employ Boolean search logic, e.g., by allowing conjunctive and disjunctive searches, truncated and non-truncated forms of key words, exact match searches, and other forms of Boolean search logic.
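For purposes of illustration only, a conjunctive or disjunctive key word query against the converted text forms of the audio files might be sketched as follows. The function name `search_transcripts`, the in-memory dictionary standing in for the database 4, and the simple word tokenization are assumptions of this sketch; truncated and exact-phrase matching are not shown.

```python
import re


def search_transcripts(transcripts, keywords, mode="all"):
    """Return identifiers of audio files whose text form matches the key words.

    transcripts maps an audio file identifier to its converted text.
    mode="all" is a conjunctive (AND) search; mode="any" is disjunctive (OR).
    Matching is case-insensitive on whole words (a simplifying assumption).
    """
    terms = [k.lower() for k in keywords]
    check = all if mode == "all" else any
    hits = []
    for audio_id, text in transcripts.items():
        tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
        if check(term in tokens for term in terms):
            hits.append(audio_id)
    return hits


# Hypothetical converted text forms of three audio files.
transcripts = {
    "a1": "We discussed golf and tennis.",
    "a2": "Golf clubs are on sale.",
    "a3": "Quarterly budget review.",
}
print(search_transcripts(transcripts, ["golf", "tennis"], mode="all"))  # ['a1']
print(search_transcripts(transcripts, ["golf", "tennis"], mode="any"))  # ['a1', 'a2']
```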
[0030] The server 2 may then present the search results 50 to the user within the website 8 and, preferably, list all responsive audio files in a defined order within such graphical user interface, but only those audio files to which the user has been granted access, as described above. For example, the search results may list the audio files in chronological order based on the date (and time) 52 that each audio file was recorded and provided to the database 4. In other embodiments, the audio files may be listed in an order that is based on the number of occasions that a key word is used within each audio file. These criteria, combinations thereof, or other criteria may be employed to list the responsive audio files in a manner that will be most relevant to the user. Still further, the invention provides that a user may specify the criteria that should be used to rank (and sort) the search results, with such criteria preferably being selected from a predefined list 54.
[0031] Still referring to Figure 4, each audio file included within a set of search results will preferably be graphically portrayed, such as in the form of a line 56 that begins at time equals zero (t = 0) and ends at a point when the audio file is terminated. For example, if the total length of an audio file is five minutes, the left side of the line will be correlated with t = 0 of the audio file, whereas the right side of the line will be correlated with t = 5 minutes of the audio file. Still further, the invention provides that the location of each search term that was queried may be indicated along the line 56. For example, the location of each search term may be indicated with a triangle 58, or other suitable and readily visible element. The invention further provides that if multiple search terms were used in the search, the line 56 may be annotated with multiple triangles 58 (or other suitable elements), each of which may exhibit a different color that is correlated with a particular search term. More particularly, for example, if two search terms are used, the line 56 may be annotated with triangles 58 (or other suitable elements), which exhibit one of two colors, with one color representing a location of a first search term and a second color indicating the location of a second search term.
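By way of a non-limiting sketch, the placement of each triangle 58 along the line 56 may be computed by scaling the time at which a search term occurs to the width of the line. The function name `marker_positions`, the pixel-based line width, and the availability of per-term occurrence times are assumptions of this sketch.

```python
def marker_positions(hit_times_s, duration_s, line_width_px):
    """Map occurrence times (in seconds) to horizontal offsets along the line.

    t = 0 maps to the left end (offset 0) and t = duration_s maps to the
    right end (offset line_width_px). Pixel units are a hypothetical choice.
    """
    return [round(t / duration_s * line_width_px) for t in hit_times_s]


# A five-minute (300 s) audio file drawn as a 600-pixel line, with two
# search terms that would each be given a different marker color.
hits_by_term = {"golf": [0, 150], "budget": [300]}
positions_by_term = {
    term: marker_positions(times, duration_s=300, line_width_px=600)
    for term, times in hits_by_term.items()
}
print(positions_by_term)  # {'golf': [0, 300], 'budget': [600]}
```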
[0032] The invention further provides that each line 56 that represents a relevant audio file may be annotated with one or more comments 60 posted by other users, as described herein. The invention provides that such annotation of the comments 60 will preferably indicate the location within the audio file to which each comment 60 relates. According to yet further embodiments, the invention provides that when a user places a cursor (within the centralized website 8) over or in the near vicinity of a triangle 58 (or other element indicating the location of a search term) or a comment 60, the graphical user interface of the website 8 will automatically publish a temporary text box 62 in which the search term may be viewed, along with a limited number of words before and after the search term (i.e., the context in which the search term is used).
[0033] The invention provides that the text box 62 will allow a user to quickly review the context in which the search term is used, which will facilitate knowing whether the audio file (or a portion thereof) may be relevant to the user and worthy of playback and/or further review. According to certain embodiments, the invention provides that a user may, optionally, control the number of words appearing before and after the search term in the text box 62, by entering the desired number of words in a specified field within the user's dedicated account page. This way, each user may adjust the size of the text box 62 in accordance with his / her personal preferences.
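For purposes of illustration only, the content of the text box 62 might be produced as follows, with the number of words shown on each side of the search term taken from the user's account settings. The function name `context_snippet` and the whitespace-based word splitting are assumptions of this sketch.

```python
def context_snippet(tokens, hit_index, words_each_side):
    """Return the search term with a limited number of words before and after.

    tokens is the transcript split into words and hit_index is the position
    of the search term. The window is clipped at the transcript boundaries.
    """
    start = max(0, hit_index - words_each_side)
    end = min(len(tokens), hit_index + words_each_side + 1)
    return " ".join(tokens[start:end])


# Hypothetical transcript in which "golf" is the queried search term.
tokens = "we played golf at the club yesterday".split()
print(context_snippet(tokens, hit_index=2, words_each_side=2))  # we played golf at the
```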
[0034] According to certain embodiments, the invention provides that a limited number of fields within the database 4 (which are associated with a particular audio file) may be pre-filled by an audio recording device. For example, the invention provides that the title and description fields (within the database 4) that are associated with an audio file may be pre-filled with information that is sourced from the calendar entries stored within, for example, a mobile phone of the user that is submitting the audio file (through the mobile phone) to the server 2 and database 4. For purposes of illustration, when the user submits an audio file to the server 2 and database 4 through a mobile phone, the system will automatically query any calendar entries stored within the phone and transmit relevant information to the appropriate fields of a database 4 entry that is created for the audio file, such as the audio file title, the names of the persons who contributed to the content of the audio file, date and time of recordation, and/or other relevant information. According to such embodiments, the automatically filled data fields would be editable by the user, in order to make any necessary corrections thereto. The invention provides that similar functionality may be implemented using other recording means, such as internet-mediated communication portals (which may allow the system to automatically query emails and/or calendar programs stored within a personal computer).
[0035] The invention provides that the system described herein will allow users to identify other users who, based on the frequency of certain key word usage, may be experts or knowledgeable regarding a particular topic. For example, the database 4 may be queried for other users who have submitted one or more audio files which include the word "golf," with the search results being listed in the website 8 - e.g., the names (or usernames) of such users who satisfy the search criteria. The invention provides that this search functionality will be useful for identifying persons who may be knowledgeable about a particular topic. The search results may be listed in an order that is most relevant to the user, such as by ranking the users who use the search term most often - either relatively or absolutely - and/or based on geographical proximity to the user who initiated the search.
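By way of a non-limiting illustration, ranking users by how often they use a search term, either absolutely or relative to the total number of words they have submitted, might be sketched as follows. The function name `rank_users_by_term` and the input shape (pairs of username and transcript text) are assumptions made for this sketch; geographical proximity is not modeled.

```python
from collections import defaultdict


def rank_users_by_term(transcripts, term, relative=False):
    """Rank users who used `term` at least once, most frequent first.

    transcripts: iterable of (username, transcript_text) pairs
                 (a hypothetical input shape for this sketch).
    relative:    if True, rank by term uses divided by total words spoken;
                 otherwise rank by the raw number of uses.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    target = term.lower()
    for user, text in transcripts:
        # Strip common punctuation so "golf." counts as "golf".
        words = [w.strip('.,;:!?"()').lower() for w in text.split()]
        hits[user] += words.count(target)
        totals[user] += len(words)
    scores = {}
    for user, count in hits.items():
        if count == 0:
            continue  # users who never used the term are not listed
        scores[user] = count / totals[user] if relative else count
    # Highest score first; ties are broken alphabetically by username.
    return sorted(scores, key=lambda u: (-scores[u], u))
```

The two modes can order the same users differently: a user with few recordings that are almost entirely about golf may outrank, on a relative basis, a user who mentions golf more often in absolute terms.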
[0036] According to certain embodiments, the system may further communicate with one or more social networking sites, such as LinkedIn, MySpace, Facebook, and others. Referring to the example above, when a user submits a key word search as described above, the system will not only list the users (usernames) who have submitted at least one audio file which includes the word "golf," it may also query the communications (i.e., audio files stored within the server 2 and database 4) of those users' "friends" and/or "friends-of-friends," as listed in the associated social networking sites, who have also submitted audio files to the server 2 and database 4. This way, a user may quickly identify a group of people who may be knowledgeable about a particular topic. Still further, if the key word is a person's name (or social network username), such functionality would allow users of the system to easily identify other users who may know, or be related to, the person identified by the key word search.
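By way of a non-limiting illustration, extending a key word match to the "friends" and "friends-of-friends" of each matching user might be sketched as follows. The friendship data is assumed to have already been obtained from a social networking site and reduced to a simple mapping; the function names and data shapes are hypothetical, and no actual social network API is modeled.

```python
def network_within_two_hops(friends, user):
    """Return the user's friends and friends-of-friends, excluding the user.

    friends: dict mapping a username to a set of direct friends
             (an assumed, simplified representation).
    """
    direct = set(friends.get(user, ()))
    second = set()
    for friend in direct:
        second |= set(friends.get(friend, ()))
    return (direct | second) - {user}


def related_submitters(friends, matching_users, submitters):
    """For each user whose audio files matched the key word, list the
    members of that user's two-hop network who have also submitted
    audio files to the system."""
    result = {}
    for user in matching_users:
        network = network_within_two_hops(friends, user)
        result[user] = sorted(network & set(submitters))
    return result
```

The audio files of the returned users could then be queried in the same manner as those of the originally matching users.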
[0037] According to further embodiments of the present invention, the audio files provided to the server 2 and database 4 by each user may be automatically queried for certain key words included therein. More particularly, the system may query each audio file to determine whether any words included therein are found in a pre-recorded list of advertising terms. If such analysis reveals that any of the words included within the audio files match any of the pre-recorded advertising terms, the server 2 may cause a relevant advertisement to be posted within the graphical user interface of the website 8 when the user accesses the website 8. Referring to the example above, if a user uploads an audio file to the database 4 which includes (in the transcript of the audio content thereof) the word "golf," the server 2 may publish one or more golf-related advertisements in the graphical user interface of the website 8. According to such embodiments, the invention provides that the server 2 will be in communication with one or more databases that correlate certain terms with one or more advertisements.
[0038] In addition, the invention provides that whether certain advertisements are posted within the website 8 may be determined based not only on whether a particular user's audio file includes a certain key word, but also on (1) the number of times that such key word is used within an audio file, (2) the number of distinct audio files provided by the same user over a period of time that includes the key word, or (3) combinations of the foregoing. For example, if the system detects that a particular user has submitted a certain minimum number of audio files to the database 4 which include the word "golf" (and not just a single audio file that contains such term), the server 2 may cause one or more advertisements related to golf products or golf services to be published in the website 8 - when the user visits the website 8 (with the publication of the advertisement being triggered based on the user's IP address) and/or when the user submits a valid username / password to login to the website 8. In addition, the invention provides that other criteria may be employed to determine which advertisement(s) to display, such as the location in which the audio file is recorded (e.g., the geographic location may be communicated to the server 2 if a mobile device is used to capture the audio recording), the level of background noise, the quality of the audio file, the type of recording device used, and/or other information and data that may be retrieved by the server 2 regarding a user, an audio file or the contents thereof.
[0039] Still further, the invention provides that advertisements may be posted within the graphical user interface of the website 8 based on the key words that may be used by a particular user to query the database 4 for relevant audio files. For example, continuing the example described above, if a user queries the database 4 for audio files that include the word "golf," the server 2 may search for and determine if the word "golf" matches any terms included within a pre-recorded list of advertising terms and, if so, the server 2 will cause one or more advertisements related to golf products or golf services to be published in the website 8.
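By way of a non-limiting illustration, the advertisement selection described above, in which a term must appear a minimum number of times per audio file and in a minimum number of a user's audio files, might be sketched as follows. The function name `select_ads`, the parameters `min_files` and `min_uses_per_file`, and the dictionary that correlates terms with advertisements are assumptions made for this sketch.

```python
def select_ads(user_transcripts, ad_terms, min_files=1, min_uses_per_file=1):
    """Return the advertisements triggered by one user's audio files.

    user_transcripts:  list of transcript strings, one per audio file.
    ad_terms:          dict mapping an advertising term to a list of
                       advertisements (a stand-in for the database that
                       correlates terms with advertisements).
    min_files:         minimum number of distinct files that must qualify.
    min_uses_per_file: minimum uses of the term for a file to qualify.
    """
    selected = []
    for term, ads in ad_terms.items():
        qualifying_files = 0
        for text in user_transcripts:
            # Strip common punctuation so "golf." counts as "golf".
            words = [w.strip('.,;:!?"()').lower() for w in text.split()]
            if words.count(term.lower()) >= min_uses_per_file:
                qualifying_files += 1
        if qualifying_files >= min_files:
            selected.extend(ads)
    return selected
```

The same lookup could be applied to a search query by passing the query text as a single-element list, which corresponds to selecting advertisements based on the key words a user submits.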
[0040] According to additional and related embodiments of the present invention, similar to the embodiments described above, systems for recording and sharing audio files are provided, which incentivize users of the system to share and publish comments regarding the audio files described herein. In other words, such embodiments are designed to encourage users to distribute, and make publicly available, the audio files recorded by each user and, in the case of referrals, the audio files recorded by other users. For example, as described above, these systems will generally comprise a server 2 that is configured to receive, index, and store a plurality of audio files - which are received by the server 2 from a plurality of sources - within at least one database 4 in communication with the server 2.
[0041] In addition, as described above, the system will preferably make one or more of the audio files accessible to persons other than the original sources (author) of such audio files. In other words, the system will preferably allow users to share, exchange, and publish comments regarding the plurality of audio files recorded within the database 4 and managed by the system. According to such embodiments, the server 2 may also be configured to track the number of audio files shared by each user of the system. The invention provides that an audio file is considered "shared" when a user makes an audio file accessible to, or otherwise refers the audio file to, another user of the system.
[0042] For example, the invention provides that the system may be configured to enable a user to send (such as via e-mail) to another user, directly or indirectly, a hyperlink to the website 8 or a location therein where a particular audio file may be accessed - such that the receiving user may listen to and optionally submit comments regarding the audio file. In other embodiments, the referring or sharing user may provide instructions to the server 2 that are housed within the database 4, which provide that certain audio files submitted to the server 2 by the referring or sharing user may only be accessed by another user (or set of users) specified by the sharing or referring user. Such lists of authorized users, who may access a particular audio file, may also be configured and communicated to such authorized users as an invitation to access, listen to, and submit comments regarding a particular audio file. As described above, the system may be configured to track the number of audio files shared in such manner by each user of the system.
[0043] Still further, according to such embodiments, the invention provides that the server 2 may, optionally, be configured to grant credit to each user of the system based on the number of audio files shared or referred by each user during a defined period of time. According to such embodiments, the credit that is granted to each user may be redeemed for a variety of items, such as money, gift certificates, gift cards, the right to use the system without charge for a defined period of time, or other items. The invention provides that such credit system will preferably encourage audio file sharing among users of the system. The invention provides that the website 8 may include an account page for each user, which lists the amount of accumulated credit that has been awarded to each user at any given time (and, optionally, may further display credit that has been redeemed by the user of the system).
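By way of a non-limiting illustration, granting credit based on the number of audio files shared by each user during a defined period of time might be sketched as follows. The event tuple layout, the function name `credits_for_period`, and the fixed `credit_per_share` rate are assumptions; this sketch counts each distinct audio file once per user, which is only one possible reading of "number of audio files shared."

```python
from datetime import date


def credits_for_period(share_events, start, end, credit_per_share=1):
    """Compute the credit earned by each user for files shared in a period.

    share_events: iterable of (username, audio_file_id, shared_on) tuples,
                  where shared_on is a datetime.date (hypothetical layout).
    start, end:   inclusive bounds of the defined period.
    """
    shared = {}
    for user, file_id, shared_on in share_events:
        if start <= shared_on <= end:
            # A set ensures a file shared twice is counted only once.
            shared.setdefault(user, set()).add(file_id)
    return {user: len(files) * credit_per_share for user, files in shared.items()}
```

The resulting totals could be displayed on each user's account page alongside any credit already redeemed.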
[0044] According to yet further embodiments of the present invention, systems and methods for converting audio content into text are provided. More specifically, the systems and methods of the present invention will transcribe and display audio content in the form of published text, provided that the audio-to-text conversion satisfies a minimum accuracy confidence threshold. According to such embodiments, the invention provides that other non-literary symbols are preferably used to signify the presence of certain audio-to-text conversions that do not meet the predefined minimum accuracy confidence threshold.
[0045] Referring now to Figure 5, the methods comprise receiving audio content within a computer server 64, and instructing the server to perform an audio content to text transcription 66 using one or more algorithms. The audio content may be provided to the server via any of various ways. According to certain embodiments, the invention provides that the audio content is submitted to the server through a centralized website that may be accessed through a standard internet connection. The invention provides that the website may be accessed, and the audio files submitted to the server, using any device that is capable of establishing an internet connection, such as using a personal computer (including tablet computers), telephone (including smart phones, PDAs, and other similar devices), meeting conference speaker phones, and other devices. The invention provides that the audio content may be created by such devices and then uploaded to the server or, alternatively, the audio content may be streamed in real time (through such devices) with the audio file being created within the server and stored within a database. In addition, the invention provides that the audio files that are stored within the server and database may be derived from audio-only content (e.g., a telephone conversation) or, in certain cases, may comprise audio content derived from a video file (which has an audio component embedded therein).
[0046] A variety of algorithms may be employed during the transcription step, including, but not limited to, algorithms that may be used to perform speech-to-text, speech-to-phoneme, speech-to-syllable, and/or speech-to-subword conversions, as described herein. In certain embodiments, Hidden Markov Model algorithms may be employed to execute the transcription. The methods further comprise calculating an accuracy confidence value 68, which will be a quantitative measure of the estimated accuracy of the transcription of a word derived from the audio content into written text.
[0047] The methods further comprise instructing the server to display a set of results for such transcription 70 within a graphical user interface of a computing device, which may include a personal computer, smart phone, PDA, and other devices having a visual display. The invention provides, however, that such results will include transcribed words for only those words that meet or exceed a predefined accuracy confidence threshold 72. In other words, for each word that is transcribed from the audio content, the associated accuracy confidence value for such word will be compared to the predefined accuracy confidence threshold. If the accuracy confidence value meets or exceeds the predefined accuracy confidence threshold, the transcribed word will be published within the set of results for such transcription 72.
[0048] If the accuracy confidence value does not meet or exceed the predefined accuracy confidence threshold, the transcribed word will not be published within the set of results for such transcription and, in its place, a non-literary symbol will be shown 74. Examples of non-literary symbols include, but are not limited to, spaces (i.e., no text or symbols), punctuation marks (e.g., !, @, #, $, *, -, etc.), underscores (e.g., _), or other symbols that are not included within the 26-letter English alphabet. A non-limiting example of such audio-to-text conversion is illustrated in Figure 6. The invention further provides that a non-literary symbol may be shown for each letter that comprises the word that does not meet or exceed the predefined accuracy confidence threshold. For example, if the non-displayed word includes five letters, the transcription results would display five consecutive non-literary symbols (to indicate the number of letters that are included in the non-displayed word).
[0049] In certain preferred embodiments of the invention, a centralized website is provided in which the audio-to-text conversions may be viewed. In such case, the website may include a set of controls and, particularly, a control that allows a user to quickly and easily adjust the predefined accuracy confidence threshold that is applied to a transcription (either before or after a transcription). For example, the invention provides that the website may include a sliding control, which allows a user to adjust the predefined accuracy confidence threshold up and down, while simultaneously viewing the effect that such adjustment has on the number of words transcribed and the accuracy thereof.
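By way of a non-limiting illustration, publishing only those words that meet the accuracy confidence threshold, and showing one non-literary symbol per letter for the others, might be sketched as follows. The function name `render_transcript`, the representation of confidence values as numbers between 0.0 and 1.0, and the default underscore symbol are assumptions made for this sketch.

```python
def render_transcript(words, threshold, symbol="_"):
    """Render transcribed words, masking those below the threshold.

    words:     list of (text, confidence) pairs; confidence is assumed to
               lie in [0.0, 1.0] (the specification does not fix a scale).
    threshold: the accuracy confidence threshold; words whose confidence
               meets or exceeds it are published as text.
    symbol:    the non-literary symbol shown once per letter of a word
               that falls below the threshold.
    """
    rendered = []
    for text, confidence in words:
        if confidence >= threshold:
            rendered.append(text)
        else:
            rendered.append(symbol * len(text))
    return " ".join(rendered)
```

Calling the function repeatedly with different `threshold` values on the same word list mirrors the effect of moving a sliding control: raising the threshold masks more words, and lowering it publishes more.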
[0050] According to additional embodiments of the invention, systems are provided which comprise a computer server, a website, a database, and at least one computing device, which are configured to execute the methods of the present invention. For example, a system of the present invention will preferably include a computer server that is configured to receive audio content provided by a source device, wherein the server is further configured to perform an audio content to text transcription using one or more algorithms. The systems will further include a database that is accessed and used by the server, in connection with the one or more algorithms, to perform the audio content to text transcription. Still further, the systems of the present invention will comprise a computing device that includes a graphical user interface in which the server displays a set of transcription results for each word that (i) was converted into text from said audio content and (ii) meets or exceeds a predefined accuracy confidence threshold (as described herein). The server is further configured to display a non-literary symbol for each word that was converted into text from said audio content, which does not meet or exceed the predefined accuracy confidence threshold.
[0051] According to still further embodiments of the present invention, methods for recording, indexing, storing, sharing, and publishing comments regarding audio files are provided, which generally comprise the use of the systems described herein.
[0052] The many aspects and benefits of the invention are apparent from the detailed description, and thus, it is intended for the following claims to cover all such aspects and benefits of the invention which fall within the scope and spirit of the invention. In addition, because numerous modifications and variations will be obvious and readily occur to those skilled in the art, the claims should not be construed to limit the invention to the exact construction and operation illustrated and described herein. Accordingly, all suitable modifications and equivalents should be understood to fall within the scope of the invention as claimed herein.

Claims

What is claimed is:
1. A system for recording and sharing audio files, which comprises a server that is configured to:
(a) receive, index, and store a plurality of audio files, which are received by the server from a plurality of sources, within at least one database in communication with the server;
(b) make one or more of the audio files accessible to persons other than the sources of such audio files; and
(c) receive and publish comments associated with the audio files within a graphical user interface of a website, wherein the comments are submitted to the server through the website by the persons other than the sources of such audio files.
2. The system of claim 1, wherein the server is further configured to:
(a) receive a key word that is submitted by a user of the system through the website, whereupon the server queries the database to identify all audio files which include the key word; and
(b) list all audio files that include the key word in a defined order within the graphical user interface of the website.
3. The system of claim 2, wherein the defined order may (a) list the audio files in chronological order based on a date of recordation in the database for each audio file, (b) list the audio files based on a number of occasions that the key word is used in each audio file, or (c) combinations of the foregoing.
4. The system of claim 3, wherein the audio files provided to the server by each user are automatically queried; words included within the audio files are compared to a list of pre-recorded advertising terms; and, if any of the words included within the audio files match any of the pre-recorded advertising terms, an advertisement is posted within the graphical user interface of the website when the user accesses the website.
5. The system of claim 3, wherein upon an audio file being submitted to the server, the server performs a speech-to-text, speech-to-phoneme, speech-to-syllable, or speech-to-subword conversion and stores an output of such conversion within the database.
6. The system of claim 5, wherein an audio file may represent a recorded telephone conversation, VoIP conversation, group meeting, speech, or lecture.
7. The system of claim 6, wherein the website comprises a graphical user interface that portrays a beginning and an end of each audio file, and a location of each key word contained therein.
8. The system of claim 7, wherein the website is configured to display a text box in which a key word and surrounding context is shown upon placing a cursor over an element that indicates the location of a key word contained in the audio file.
9. A system for recording and sharing audio files, which comprises a server that is configured to:
(a) receive, index, and store a plurality of audio files, which are received by the server from a plurality of sources, within at least one database in communication with the server;
(b) make one or more of the audio files accessible to persons other than the sources of such audio files;
(c) track a number of audio files shared by each user of the system, wherein an audio file is shared when a user makes the audio file accessible to another user of the system; and
(d) grant credit to each user of the system based on the number of audio files shared during a defined period of time by each user.
10. The system of claim 9, wherein a pre-defined amount of credit affords a user an ability to use the system without charge for a defined period of time.
11. The system of claim 10, wherein the server is further configured to:
(a) receive a key word that is submitted by a user of the system through the website, whereupon the server queries the database to identify all audio files which include the key word; and
(b) list all audio files in a defined order within the graphical user interface of the website.
12. The system of claim 11, wherein the defined order may (a) list the audio files in chronological order based on a date of recordation for each audio file, (b) list the audio files based on a number of occasions that the key word is used in each audio file, or (c) combinations of the foregoing.
13. The system of claim 12, wherein the audio files provided to the server by each user are automatically queried; words included within the audio files are compared to a list of pre-recorded advertising terms; and, if any of the words included within the audio files match any of the pre-recorded advertising terms, an advertisement is posted within the graphical user interface of the website when the user accesses the website.
14. The system of claim 13, wherein upon an audio file being submitted to the server, the server performs a speech-to-text, speech-to-phoneme, speech-to-syllable, or speech-to-subword conversion and stores an output of such conversion within the database.
15. The system of claim 14, wherein an audio file may represent a recorded telephone conversation, VoIP conversation, group meeting, speech, or lecture.
16. A method for recording and sharing audio files, which comprises:
(a) receiving within a server a plurality of audio files, which are provided to the server from a plurality of sources, and storing and indexing said audio files within at least one database in communication with the server;
(b) allowing one or more of the audio files to be accessible to persons other than the sources of such audio files; and
(c) receiving and publishing comments associated with the audio files within a graphical user interface of a website, wherein the comments are submitted to the server through the website by the persons other than the sources of such audio files.
17. The method of claim 16, which further comprises:
(a) receiving, within the server, a key word that is submitted by a user of the website, whereupon the server queries the database to identify all audio files which include the key word; and
(b) listing all audio files that include the key word in a defined order within the graphical user interface of the website.
18. The method of claim 17, which further comprises instructing the server to perform a speech-to-text, speech-to-phoneme, speech-to-syllable, or speech-to-subword conversion of each audio file uploaded to the server, and storing an output of such conversion within the database.
19. The method of claim 18, which further comprises portraying a beginning and an end of each audio file that comprises a key word within a graphical user interface of the website, and a location of each key word within each such audio file.
20. The method of claim 19, which further comprises displaying a text box in which a key word and surrounding context is shown upon placing a cursor over an element that indicates the location of a key word contained in each audio file.

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US12/878,014 US20110072350A1 (en) 2009-09-21 2010-09-08 Systems and methods for recording and sharing audio files
US12/878,014 2010-09-08
US39241110P 2010-10-12 2010-10-12
US61/392,411 2010-10-12

Publications (1)

Publication Number Publication Date
WO2012033505A1 (en) 2012-03-15

Family

ID=45810916

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2010/053084 WO2012033505A1 (en) 2010-09-08 2010-10-18 Systems and methods for recording and sharing audio files

Country Status (1)

Country Link
WO (1) WO2012033505A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070233551A1 (en) * 2000-02-29 2007-10-04 Ebay Inc. Method and system for harvesting feedback and comments regarding multiple items from users of a network-based transaction facility
EP1752925A1 (en) * 2005-07-20 2007-02-14 Siemens Aktiengesellschaft Method and system for distribution of digital protected content data via a peer-to-peer data network
US20070265855A1 (en) * 2006-05-09 2007-11-15 Nokia Corporation mCARD USED FOR SHARING MEDIA-RELATED INFORMATION
KR20080019981A (en) * 2006-08-30 2008-03-05 (주)컴스타 Reputation management method for reliable peer to peer services
US20080201361A1 (en) * 2007-02-16 2008-08-21 Alexander Castro Targeted insertion of an audio - video advertising into a multimedia object
US20090082111A1 (en) * 2007-04-06 2009-03-26 Smith Michael J System and method for connecting users based on common interests, such as shared interests of representations of professional athletes

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014105912A1 (en) * 2012-12-29 2014-07-03 Genesys Telecommunications Laboratories Inc. Fast out-of-vocabulary search in automatic speech recognition systems
US9542936B2 (en) 2012-12-29 2017-01-10 Genesys Telecommunications Laboratories, Inc. Fast out-of-vocabulary search in automatic speech recognition systems
US10290301B2 (en) 2012-12-29 2019-05-14 Genesys Telecommunications Laboratories, Inc. Fast out-of-vocabulary search in automatic speech recognition systems
CN113688422A (en) * 2021-08-26 2021-11-23 上海明略人工智能(集团)有限公司 Method and device for checking recording data, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US20120029918A1 (en) Systems and methods for recording, searching, and sharing spoken content in media files
US20110072350A1 (en) Systems and methods for recording and sharing audio files
US10146869B2 (en) Systems and methods for organizing and analyzing audio content derived from media files
US11552916B2 (en) Indexing and searching content behind links presented in a communication
US20210294825A1 (en) System and method for context enhanced mapping
US9858348B1 (en) System and method for presentation of media related to a context
US10275485B2 (en) Retrieving context from previous sessions
US9183291B2 (en) Mobile content capture and discovery system based on augmented user identity
US20130138438A1 (en) Systems and methods for capturing, publishing, and utilizing metadata that are associated with media files
US8200757B2 (en) Semantic note taking system
US20090106307A1 (en) System of a knowledge management and networking environment and method for providing advanced functions therefor
US11080287B2 (en) Methods, systems and techniques for ranking blended content retrieved from multiple disparate content sources
US20130311181A1 (en) Systems and methods for identifying concepts and keywords from spoken words in text, audio, and video content
US20170097966A1 (en) Method and system for updating an intent space and estimating intent based on an intent space
CN103403705A (en) Loading a mobile computing device with media files
US11232522B2 (en) Methods, systems and techniques for blending online content from multiple disparate content sources including a personal content source or a semi-personal content source
US20130138637A1 (en) Systems and methods for ranking media files
US11836169B2 (en) Methods, systems and techniques for providing search query suggestions based on non-personal data and user personal data according to availability of user personal data
US20130124531A1 (en) Systems for extracting relevant and frequent key words from texts and their presentation in an auto-complete function of a search service
WO2012033505A1 (en) Systems and methods for recording and sharing audio files
US9142216B1 (en) Systems and methods for organizing and analyzing audio content derived from media files
EP2201519A2 (en) System of a knowledge management and networking environment and method for providing advanced functions therefor

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10857086

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10857086

Country of ref document: EP

Kind code of ref document: A1