US12033619B2 - Intelligent media transcription - Google Patents
- Publication number
- US12033619B2 (application US17/095,797)
- Authority
- US
- United States
- Prior art keywords
- transcription
- media
- features
- user
- models
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/01—Assessment or evaluation of speech recognition systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B19/00—Teaching not covered by other main groups of this subclass
- G09B19/003—Repetitive work cycles; Sequence of movements
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B5/00—Electrically-operated educational appliances
- G09B5/02—Electrically-operated educational appliances with visual presentation of the material to be studied, e.g. using film strip
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
Definitions
- the exemplary embodiments relate generally to digital media, and more particularly to transcribing digital media.
- the exemplary embodiments disclose a method, a computer program product, and a computer system for transcribing media.
- the exemplary embodiments may include collecting media, extracting one or more features from the media, and transcribing the media based on the extracted one or more features and one or more models.
- FIG. 1 depicts an exemplary schematic diagram of an intelligent transcription system 100 , in accordance with the exemplary embodiments.
- FIG. 3 depicts an exemplary flowchart illustrating the operations of an intelligent transcription analyzer 134 of the intelligent transcription system 100 in transcribing media, in accordance with the exemplary embodiments.
- FIG. 4 depicts an exemplary block diagram depicting the operations of an intelligent transcription analyzer 134 of the intelligent transcription system 100 in synchronizing the presentation of a video with the video's transcription.
- FIG. 5 depicts an exemplary block diagram depicting the hardware components of the intelligent transcription system 100 of FIG. 1 , in accordance with the exemplary embodiments.
- FIG. 6 depicts a cloud computing environment, in accordance with the exemplary embodiments.
- FIG. 7 depicts abstraction model layers, in accordance with the exemplary embodiments.
- references in the specification to “one embodiment,” “an embodiment,” “an exemplary embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to implement such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
- the network 108 may be a communication channel capable of transferring data between connected devices. Accordingly, the components of the intelligent transcription system 100 may represent network components or network devices interconnected via the network 108 .
- the network 108 may be the Internet, representing a worldwide collection of networks and gateways to support communications between devices connected to the Internet. Moreover, the network 108 may utilize various types of connections such as wired, wireless, fiber optic, etc. which may be implemented as an intranet network, a local area network (LAN), a wide area network (WAN), or a combination thereof.
- the network 108 may be a Bluetooth network, a Wi-Fi network, or a combination thereof.
- the network 108 may be a telecommunications network used to facilitate telephone calls between two or more parties comprising a landline network, a wireless network, a closed network, a satellite network, or a combination thereof.
- the network 108 may represent any combination of connections and protocols that will support communications between connected devices.
- the intelligent transcription server 130 may include one or more intelligent transcription models 132 and an intelligent transcription analyzer 134 , and may act as a server in a client-server relationship with the intelligent transcription client 122 .
- the intelligent transcription server 130 may be an enterprise server, a laptop computer, a notebook, a tablet computer, a netbook computer, a PC, a desktop computer, a server, a PDA, a rotary phone, a touchtone phone, a smart phone, a mobile phone, a virtual device, a thin client, an IoT device, or any other electronic device or computing system capable of receiving and sending data to and from other computing devices.
- the intelligent transcription models 132 may be one or more algorithms modelling a correlation between one or more features detected by the sensors 124 and a media transcription style, media transcription, or both.
- the intelligent transcription models 132 may be generated using machine learning methods, such as neural networks, deep learning, hierarchical learning, Gaussian Mixture modelling, Hidden Markov modelling, and K-Means, K-Medoids, or Fuzzy C-Means learning, etc., and may model a likelihood of one or more features being indicative of an appropriate media transcription style, media transcription, or both.
- such features may include speech features such as topics, importance, vocabulary, frequency, tones, moods, etc.
- Such features may additionally include gestural features such as pointing, waving, facial expressions, eye direction/movement, etc.
- the intelligent transcription models 132 may weight the features based on an effect that the features have on appropriately transcribing media.
- when the intelligent transcription analyzer 134 determines that an outline is an appropriate transcription style and transcribes the video as a searchable outline, with timestamps and with sections of text considered “high importance” bolded and highlighted, the intelligent transcription analyzer 134 visually notifies the user of the outline transcription according to the user's preferences.
- On-demand self-service: a cloud consumer can unilaterally provision computing capabilities, such as server time and network storage, automatically and as needed, without requiring human interaction with the service's provider.
- Platform as a Service (PaaS): the consumer does not manage or control the underlying cloud infrastructure including networks, servers, operating systems, or storage, but has control over the deployed applications and possibly application hosting environment configurations.
- Private cloud: the cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on-premises or off-premises.
- Hybrid cloud: the cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load-balancing between clouds).
- a non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
- the computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
- each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
- the functions noted in the blocks may occur out of the order noted in the Figures.
- two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
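The feature weighting described above for the intelligent transcription models 132 can be illustrated with a small sketch. It assumes a simple linear scheme in which each transcription style has a table of per-feature weights and the style with the highest summed weight is selected. The style names, feature names, weight values, and the functions `score_styles` and `choose_style` are all hypothetical and do not come from the patent, which leaves the model type open (neural networks, Hidden Markov modelling, and so on).

```python
# Hypothetical weights: how strongly each extracted feature suggests a style.
# All names and numbers here are illustrative assumptions, not patent values.
STYLE_WEIGHTS = {
    "outline": {"importance_high": 0.6, "pointing": 0.3, "tone_stressed": 0.1},
    "verbatim": {"importance_low": 0.5, "tone_relaxed": 0.3, "smiling": 0.2},
}


def score_styles(features, weights=STYLE_WEIGHTS):
    """Score each style as the sum of the weights of the features present."""
    return {
        style: sum(w for name, w in table.items() if name in features)
        for style, table in weights.items()
    }


def choose_style(features, weights=STYLE_WEIGHTS):
    """Return the style with the highest score for the given feature set."""
    scores = score_styles(features, weights)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    extracted = {"importance_high", "pointing", "tone_stressed"}
    print(choose_style(extracted))
```

A learned model would replace the hand-written weight tables, but the selection step (score every candidate style, pick the best) would likely look similar.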
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Electrically Operated Instructional Devices (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
Description
TABLE 1: Extracted Features

| Topic | Importance | Tone/Mood | Facial Expression | Gestures |
|---|---|---|---|---|
| Addition | High (on exam) | Stressed | Serious | Pointing |
| Subtraction | High (on exam) | Stressed | Serious | Pointing, Waving |
| Multiplication | Low (not on exam) | Relaxed | Smiling | |
| Division | Low (not on exam) | Relaxed | Smiling | |
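As a rough illustration of how the features in Table 1 could drive an outline-style transcription with high-importance sections bolded, the sketch below encodes the four topics as records and renders a timestamped outline. The `Segment` class, the `render_outline` function, the markdown-style bold markers, and the start times are assumptions made for this example; Table 1 itself lists no timestamps.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    topic: str
    importance: str               # "High" or "Low", as in Table 1
    tone: str
    facial_expression: str
    gestures: list = field(default_factory=list)
    start_seconds: int = 0        # hypothetical; not given in Table 1


TABLE_1 = [
    Segment("Addition", "High", "Stressed", "Serious", ["Pointing"], 0),
    Segment("Subtraction", "High", "Stressed", "Serious", ["Pointing", "Waving"], 300),
    Segment("Multiplication", "Low", "Relaxed", "Smiling", [], 600),
    Segment("Division", "Low", "Relaxed", "Smiling", [], 900),
]


def format_timestamp(seconds):
    """Format a second count as MM:SS."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes:02d}:{secs:02d}"


def render_outline(segments):
    """Render one outline line per segment, bolding high-importance topics."""
    lines = []
    for seg in segments:
        title = f"**{seg.topic}**" if seg.importance == "High" else seg.topic
        lines.append(f"[{format_timestamp(seg.start_seconds)}] {title}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_outline(TABLE_1))
```

Keeping the extracted features in a structured record makes it straightforward to swap the bolding rule for a different emphasis rule, such as one keyed on gestures or tone.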
Claims (13)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/095,797 US12033619B2 (en) | 2020-11-12 | 2020-11-12 | Intelligent media transcription |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/095,797 US12033619B2 (en) | 2020-11-12 | 2020-11-12 | Intelligent media transcription |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20220148583A1 (en) | 2022-05-12 |
| US12033619B2 (en) | 2024-07-09 |
Family
ID=81454721
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/095,797 Active 2041-04-07 US12033619B2 (en) | 2020-11-12 | 2020-11-12 | Intelligent media transcription |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US12033619B2 (en) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12271869B2 (en) * | 2021-03-12 | 2025-04-08 | Hubspot, Inc. | Multi-service business platform system having conversation intelligence systems and methods |
| EP4036755A1 (en) * | 2021-01-29 | 2022-08-03 | Deutsche Telekom AG | Method for generating and providing information of a service presented to a user |
| US20250308529A1 (en) * | 2024-03-28 | 2025-10-02 | Lenovo (United States) Inc. | Real-time transcript production with digital assistant |
- 2020-11-12: US application US17/095,797 (patent US12033619B2), status: Active
Patent Citations (60)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5625748A (en) * | 1994-04-18 | 1997-04-29 | Bbn Corporation | Topic discriminator using posterior probability or confidence scores |
| US7889847B2 (en) | 1994-04-19 | 2011-02-15 | Securus Technologies Holdings, Inc. | Computer-based method and apparatus for controlling, monitoring, recording and reporting telephone access |
| US7117231B2 (en) | 2000-12-07 | 2006-10-03 | International Business Machines Corporation | Method and system for the automatic generation of multi-lingual synchronized sub-titles for audiovisual data |
| US7318031B2 (en) * | 2001-05-09 | 2008-01-08 | International Business Machines Corporation | Apparatus, system and method for providing speech recognition assist in call handover |
| US20070106508A1 (en) * | 2003-04-29 | 2007-05-10 | Jonathan Kahn | Methods and systems for creating a second generation session file |
| US20090055175A1 (en) * | 2007-08-22 | 2009-02-26 | Terrell Ii James Richard | Continuous speech transcription performance indication |
| US20090300003A1 (en) * | 2008-05-30 | 2009-12-03 | Kabushiki Kaisha Toshiba | Apparatus and method for supporting keyword input |
| US20110026689A1 (en) * | 2009-07-30 | 2011-02-03 | Metz Brent D | Telephone call inbox |
| US20110087491A1 (en) * | 2009-10-14 | 2011-04-14 | Andreas Wittenstein | Method and system for efficient management of speech transcribers |
| US20110099011A1 (en) * | 2009-10-26 | 2011-04-28 | International Business Machines Corporation | Detecting And Communicating Biometrics Of Recorded Voice During Transcription Process |
| US20110112832A1 (en) * | 2009-11-06 | 2011-05-12 | Altus Learning Systems, Inc. | Auto-transcription by cross-referencing synchronized media resources |
| US20120022865A1 (en) * | 2010-07-20 | 2012-01-26 | David Milstein | System and Method for Efficiently Reducing Transcription Error Using Hybrid Voice Transcription |
| US20130006623A1 (en) * | 2011-06-30 | 2013-01-03 | Google Inc. | Speech recognition using variable-length context |
| US20140214426A1 (en) * | 2013-01-29 | 2014-07-31 | International Business Machines Corporation | System and method for improving voice communication over a network |
| US9805018B1 (en) * | 2013-03-15 | 2017-10-31 | Steven E. Richfield | Natural language processing for analyzing internet content and finding solutions to needs expressed in text |
| US20140350918A1 (en) * | 2013-05-24 | 2014-11-27 | Tencent Technology (Shenzhen) Co., Ltd. | Method and system for adding punctuation to voice files |
| US20150279357A1 (en) * | 2014-03-31 | 2015-10-01 | NetTalk.com, Inc. | System and method for processing flagged words or phrases in audible communications |
| US9641681B2 (en) | 2015-04-27 | 2017-05-02 | TalkIQ, Inc. | Methods and systems for determining conversation quality |
| US20180046710A1 (en) * | 2015-06-01 | 2018-02-15 | AffectLayer, Inc. | Automatic generation of playlists from conversations |
| US11315546B2 (en) * | 2015-09-02 | 2022-04-26 | Verizon Patent And Licensing Inc. | Computerized system and method for formatted transcription of multimedia content |
| US20170148341A1 (en) * | 2015-11-25 | 2017-05-25 | David A. Boulton | Methodology and system for teaching reading |
| US20170236517A1 (en) * | 2016-02-17 | 2017-08-17 | Microsoft Technology Licensing, Llc | Contextual note taking |
| US20170287482A1 (en) * | 2016-04-05 | 2017-10-05 | SpeakWrite, LLC | Identifying speakers in transcription of multiple party conversations |
| US9552548B1 (en) * | 2016-07-01 | 2017-01-24 | Intraspexion Inc. | Using classified text and deep learning algorithms to identify risk and provide early warning |
| US20180144747A1 (en) * | 2016-11-18 | 2018-05-24 | Microsoft Technology Licensing, Llc | Real-time caption correction by moderator |
| US20180143956A1 (en) * | 2016-11-18 | 2018-05-24 | Microsoft Technology Licensing, Llc | Real-time caption correction by audience |
| US20180174587A1 (en) * | 2016-12-16 | 2018-06-21 | Kyocera Document Solution Inc. | Audio transcription system |
| US20200190585A1 (en) * | 2017-02-15 | 2020-06-18 | Checkpoint Sciences, Inc. | Data Processing and Classification for Determining a Likelihood Score for Immune-Related Adverse Events |
| US20180315429A1 (en) * | 2017-04-28 | 2018-11-01 | Cloud Court, Inc. | System and method for automated legal proceeding assistant |
| US11003839B1 (en) * | 2017-04-28 | 2021-05-11 | I.Q. Joe, Llc | Smart interface with facilitated input and mistake recovery |
| US20180358052A1 (en) * | 2017-06-13 | 2018-12-13 | 3Play Media, Inc. | Efficient audio description systems and methods |
| US20190051301A1 (en) * | 2017-08-11 | 2019-02-14 | Slack Technologies, Inc. | Method, apparatus, and computer program product for searchable real-time transcribed audio and visual content within a group-based communication system |
| US20210034819A1 (en) * | 2017-08-25 | 2021-02-04 | Ping An Technology (Shenzhen) Co., Ltd. | Method and device for identifying a user interest, and computer-readable storage medium |
| US20210050000A1 (en) * | 2017-10-02 | 2021-02-18 | Yobs Technologies, Inc. | Multimodal video system for generating a personality assessment of a user |
| US20190175011A1 (en) * | 2017-12-11 | 2019-06-13 | 1-800 Contacts, Inc. | Digital visual acuity eye examination for remote physician assessment |
| US20190236139A1 (en) * | 2018-01-31 | 2019-08-01 | Jungle Disk, L.L.C. | Natural language generation using pinned text and multiple discriminators |
| US20190236148A1 (en) * | 2018-02-01 | 2019-08-01 | Jungle Disk, L.L.C. | Generative text using a personality model |
| US20210210097A1 (en) * | 2018-05-04 | 2021-07-08 | Microsoft Technology Licensing, Llc | Computerized Intelligent Assistant for Conferences |
| US20200286485A1 (en) * | 2018-09-24 | 2020-09-10 | Veritone, Inc. | Methods and systems for transcription |
| US11431517B1 (en) * | 2018-10-17 | 2022-08-30 | Otter.ai, Inc. | Systems and methods for team cooperation with real-time recording and transcription of conversations and/or speeches |
| US20200243094A1 (en) * | 2018-12-04 | 2020-07-30 | Sorenson Ip Holdings, Llc | Switching between speech recognition systems |
| US11315569B1 (en) * | 2019-02-07 | 2022-04-26 | Memoria, Inc. | Transcription and analysis of meeting recordings |
| US20200265485A1 (en) * | 2019-02-15 | 2020-08-20 | Highradius Corporation | Customer relationship management call intent generation |
| US20200321007A1 (en) * | 2019-04-08 | 2020-10-08 | Speech Cloud, Inc. | Real-Time Audio Transcription, Video Conferencing, and Online Collaboration System and Methods |
| US11170782B2 (en) * | 2019-04-08 | 2021-11-09 | Speech Cloud, Inc | Real-time audio transcription, video conferencing, and online collaboration system and methods |
| US20200349950A1 (en) * | 2019-04-30 | 2020-11-05 | Microsoft Technology Licensing, Llc | Speaker Attributed Transcript Generation |
| US20200395111A1 (en) * | 2019-06-11 | 2020-12-17 | Esaote Spa | Method for generating medical reports and an imaging system carrying out said method |
| WO2021026617A1 (en) * | 2019-08-15 | 2021-02-18 | Imran Bonser | Method and system of generating and transmitting a transcript of verbal communication |
| US20210076105A1 (en) * | 2019-09-11 | 2021-03-11 | Educational Vision Technologies, Inc. | Automatic Data Extraction and Conversion of Video/Images/Sound Information from a Slide presentation into an Editable Notetaking Resource with Optional Overlay of the Presenter |
| US10978077B1 (en) * | 2019-10-31 | 2021-04-13 | Wisdom Garden Hong Kong Limited | Knowledge point mark generation system and method thereof |
| US20210174787A1 (en) * | 2019-12-09 | 2021-06-10 | Microsoft Technology Licensing, Llc | Interactive augmentation and integration of real-time speech-to-text |
| US20210264921A1 (en) * | 2020-02-21 | 2021-08-26 | BetterUp, Inc. | Synthesizing higher order conversation features for a multiparty conversation |
| WO2021206679A1 (en) * | 2020-04-06 | 2021-10-14 | Hi Auto LTD. | Audio-visual multi-speaker speech separation |
| US20210375291A1 (en) * | 2020-05-27 | 2021-12-02 | Microsoft Technology Licensing, Llc | Automated meeting minutes generation service |
| US20210375289A1 (en) * | 2020-05-29 | 2021-12-02 | Microsoft Technology Licensing, Llc | Automated meeting minutes generator |
| US20210383127A1 (en) * | 2020-06-04 | 2021-12-09 | Microsoft Technology Licensing, Llc | Classification of auditory and visual meeting data to infer importance of user utterances |
| US20210407506A1 (en) * | 2020-06-30 | 2021-12-30 | Snap Inc. | Augmented reality-based translation of speech in association with travel |
| US20230266874A1 (en) * | 2020-09-29 | 2023-08-24 | Google Llc | Scroller Interface for Transcription Navigation |
| US20220109585A1 (en) * | 2020-10-05 | 2022-04-07 | International Business Machines Corporation | Customized meeting notes |
| US20220115020A1 (en) * | 2020-10-12 | 2022-04-14 | Soundhound, Inc. | Method and system for conversation transcription with metadata |
Non-Patent Citations (15)
| Title |
|---|
| Anonymous, "Convolutional Neural Network," Wikipedia, https://en.wikipedia.org/wiki/Convolutional_neural_network, May 8, 2020, pp. 1-30. |
| Anonymous, "Long Short-Term Memory," Wikipedia, https://en.wikipedia.org/wiki/Long_short-term_memory, May 8, 2020, pp. 1-13. |
| Anonymous, "Mel-Frequency Cepstrum," Wikipedia, https://en.wikipedia.org/wiki/Mel-frequency_cepstrum, May 8, 2020, pp. 1-4. |
| Anonymous, Watson Media, "Streaming Video Platform & Hosting Services," https://video.ibm.com/, 2018, pp. 1-5. |
| Cisco, "Cisco Unified Workforce Optimization Quality Management User Guide Release 11.5," Cisco Systems, Inc., Oct. 28, 2019, pp. 1-213. |
| Daga et al., "Domain-Specific Language Model Using Domain Literature and Experts' Spoken Language," ip.com, Oct. 20, 2017, pp. 1-11. |
| Disclosed Anonymously, "Click-To-Call Conversion Measurement Based On Transcribing Conversation," ip.com, Feb. 13, 2019, pp. 1-6. |
| Disclosed Anonymously, "Extracting Meaning and Sentiment From Recorded Conversations Using NLP According to a User's Physiological Signal Spikes in Order to Compose Sound Highlights," ip.com, Dec. 18, 2019, pp. 1-5. |
| Disclosed Anonymously, "System and Method to Record, Transcribe and Index a Custodian's Compliance to a Legal Notice Through a Telephonic Conversation," ip.com, Feb. 8, 2017, pp. 1-6. |
| Duran, et al., "Align: Analyzing Linguistic Interactions With Generalizable Techniques—A Python Library," American Psychological Association, 2019, pp. 1-68. |
| Hansen, "Knowledge Sharing, Maintenance and Use in Online Support Communities," ResearchGate, https://www.researchgate.net/publication/30858323, Apr. 2006, pp. 1-399. |
| Mell et al., "The NIST Definition of Cloud Computing", National Institute of Standards and Technology, Special Publication 800-145, Sep. 2011, pp. 1-7. |
| Myers, "How Transcriptions Can Help Professors with Lecture Courses," Rev.Com, https://www.rev.com/blog/how-transcriptions-can-help-professors-with-lecture-courses, Aug. 23, 2019, pp. 1-7. |
| Tatan, "Auto Generated FAQ with Python Dash, Topic Analysis and Reddit Praw API," Towards Data Science, https://towardsdatascience.com/auto-generated-faq-with-python-dash-text-analysis-and-reddit-api-90fb66a86633, May 14, 2019, pp. 1-3. |
| X. Che, H. Yang and C. Meinel, "Automatic Online Lecture Highlighting Based on Multimedia Analysis," in IEEE Transactions on Learning Technologies, vol. 11, No. 1, pp. 27-40, Jan.-Mar. 2018, doi: 10.1109/TLT.2017.2716372. (Year: 2018). * |
Also Published As
| Publication number | Publication date |
|---|---|
| US20220148583A1 (en) | 2022-05-12 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| AU2020400345B2 (en) | | Anaphora resolution |
| KR102346046B1 (en) | | 3d virtual figure mouth shape control method and device |
| US11443227B2 (en) | | System and method for cognitive multilingual speech training and recognition |
| US12417381B2 (en) | | Alternative soft label generation |
| US10223440B2 (en) | | Question and answer system emulating people and clusters of blended people |
| US10970898B2 (en) | | Virtual-reality based interactive audience simulation |
| US20230169272A1 (en) | | Communication framework for automated content generation and adaptive delivery |
| US11645138B2 (en) | | Diagnosing and resolving technical issues |
| US11019371B2 (en) | | Control of content broadcasting |
| US10678855B2 (en) | | Generating descriptive text contemporaneous to visual media |
| US12033619B2 (en) | | Intelligent media transcription |
| US11158210B2 (en) | | Cognitive real-time feedback speaking coach on a mobile device |
| WO2023046016A1 (en) | | Optimization of lip syncing in natural language translated video |
| US12417762B2 (en) | | Speech-to-text voice visualization |
| US20220415317A1 (en) | | Virtual meeting content enhancement triggered by audio tracking |
| US11922824B2 (en) | | Individualized media playback pacing to improve the listener's desired outcomes |
| US10621990B2 (en) | | Cognitive print speaker modeler |
| US12019673B2 (en) | | Digital semantic structure conversion |
| US11024289B2 (en) | | Cognitive recommendation engine facilitating interaction between entities |
| US11146678B2 (en) | | Determining the context of calls |
| US11386056B2 (en) | | Duplicate multimedia entity identification and processing |
| US11526544B2 (en) | | System for object identification |
| US20220114219A1 (en) | | Determining device assistant manner of reply |
| US11036925B2 (en) | | Managing the distinctiveness of multimedia |
| US12361298B2 (en) | | Displaying contextual information of media |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DECROP, CLEMENT;AGRAWAL, TUSHAR;FOX, JEREMY R.;AND OTHERS;SIGNING DATES FROM 20201111 TO 20201112;REEL/FRAME:054344/0307 |
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: ADVISORY ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | ZAAA | Notice of allowance and fees due | Free format text: ORIGINAL CODE: NOA |
| | ZAAB | Notice of allowance mailed | Free format text: ORIGINAL CODE: MN/=. |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |