GB2342802A - Indexing conference content onto a timeline - Google Patents

Indexing conference content onto a timeline

Info

Publication number
GB2342802A
Authority
GB
United Kingdom
Prior art keywords
conference
participant
sound
audio
timeline
Prior art date
Legal status
Granted
Application number
GB9916394A
Other versions
GB2342802B (en)
GB9916394D0 (en)
Inventor
Steven L Potts
Peter L Chu
Current Assignee
Polycom Inc
Original Assignee
Picturetel Corp
Priority date
Filing date
Publication date
Application filed by Picturetel Corp
Publication of GB9916394D0
Publication of GB2342802A
Application granted
Publication of GB2342802B
Anticipated expiration
Status: Expired - Fee Related

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/38 Graded-service arrangements, i.e. some subscribers prevented from establishing certain connections
    • H04M3/382 Graded-service arrangements, i.e. some subscribers prevented from establishing certain connections using authorisation codes or passwords
    • H04M3/385 Graded-service arrangements, i.e. some subscribers prevented from establishing certain connections using authorisation codes or passwords using speech signals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 Querying
    • G06F16/738 Presentation of query results
    • G06F16/739 Presentation of query results in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7834 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/38 Graded-service arrangements, i.e. some subscribers prevented from establishing certain connections
    • H04M3/387 Graded-service arrangements, i.e. some subscribers prevented from establishing certain connections using subscriber identification cards
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/56 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M3/567 Multimedia conference systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/147 Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2203/00 Aspects of automatic or semi-automatic exchanges
    • H04M2203/60 Aspects of automatic or semi-automatic exchanges related to security aspects in telephonic communication systems
    • H04M2203/6045 Identity confirmation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2242/00 Special services or facilities
    • H04M2242/30 Determination of the location of a subscriber
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/42221 Conversation recording systems

Abstract

A method and system for indexing the content of conferences. The method includes identifying each conference participant producing a sound, capturing an image of each such participant, and correlating the images of the conference participants with the audio segments of an audio recording that correspond to the sound each participant produced. The indexing system includes a sound recording mechanism; at least one identifier of conference participant locations; a camera; an image storage device; a processor that associates the still images captured by the camera with the sound recorded by the sound recording mechanism, thereby correlating still images of the conference participants to the audio segments they produced; and a graphical user interface allowing easy access to the stored sound, images, and correlated data. The system may also include an aiming device for pointing the camera at the person speaking.

Description

METHOD AND APPARATUS FOR INDEXING CONFERENCE CONTENT

This invention relates to the field of multimedia.
With the advent of economical digital storage media and sophisticated video/audio decompression technology capable of running on personal computers, thousands of hours of digitized video/audio data can be stored with virtually instantaneous random access. In order for this stored data to be utilized, it must be indexed efficiently in a manner allowing a user to find desired portions of the digitized video/audio data quickly.
For recorded conferences having a number of participants, indexing is generally performed on the basis of "who" said "what" and "when" (at what time). Currently used methods of indexing do not reliably give this information, primarily because video pattern recognition, speech recognition, and speaker identification techniques are unreliable technologies in the noisy, reverberant, uncontrolled environments in which conferences occur.
Also, a need exists for a substitute for tedious trial-and-error techniques for finding when a conference participant first starts speaking in a recording.
The invention features a method and a system for indexing the content of a conference by matching images captured during the conference to the recording of sounds produced by conference participants.
Using reliable sound source localization technology implemented with microphone arrays, the invention produces reliable information concerning "who" and "when" (which persons spoke at what time) for a conference. While information concerning "what" (subject matter) is missing, the "who-when" information greatly facilitates manual annotation for the missing "what" information. In many search-retrieval situations, the "who-when" information alone will be sufficient for indexing.
In one aspect of the invention, the method includes identifying a conference participant producing a sound, capturing a still image of the conference participant, correlating the still image of the conference participant to the audio segments of the audio recording corresponding to the sound produced by the conference participant, and generating a timeline by creating a speech-present segment representing the correlated still image and associated audio segment. Thus, the timeline includes speech-present segments representing a still image and associated audio segments. The still image is a visual representation of the sound source producing the associated audio segments.
The audio recording can be segmented into audio segment portions and associated with conference participants, whose images are captured, for example, with a video camera.
Embodiments of this aspect of the invention may include one or more of the following features.
The still image of each conference participant producing a sound is captured as a segment of a continuous video recording of the conference, thereby establishing a complete visual indicator of all speakers participating in a conference.
The timeline is presented visually so that a user can quickly and easily access individual segments of the continuous recording of the conference.
The timeline can include a colored line or bar representing the duration of each speech segment with a correlated image to index the recorded conference. The timeline can be presented as a graphical user interface (GUI), so that the user can use an input device (for example, a mouse) to select or highlight the appropriate part of the timeline corresponding to the start of the desired recording, access that part, and start playing the recording. Portions of the audio and video recordings can be played on a playback monitor.
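As a rough illustration of this click-to-play behavior, the following sketch converts a mouse x-coordinate on the timeline into an elapsed conference time and looks up the speech-present segment containing it. All names and the pixels-per-second scale are assumptions for illustration; the patent does not prescribe any particular implementation.

```python
# Minimal sketch of mapping a timeline click to a playback position.
# Names and the pixel scale are illustrative assumptions.

def click_to_elapsed_seconds(click_x, timeline_x0, pixels_per_second):
    """Convert a mouse x-coordinate on the timeline into elapsed conference time."""
    return max(0.0, (click_x - timeline_x0) / pixels_per_second)

def find_segment(segments, t):
    """Return the speech-present segment (start_s, end_s, source) containing t, if any."""
    for start, end, source in segments:
        if start <= t <= end:
            return (start, end, source)
    return None

# Example: three speech-present segments (start s, end s, sound source).
segments = [(0.0, 12.5, "participant_62"),
            (13.0, 40.2, "participant_64"),
            (41.0, 55.0, "far_end")]
t = click_to_elapsed_seconds(click_x=260, timeline_x0=100, pixels_per_second=4.0)
print(find_segment(segments, t))  # -> (13.0, 40.2, 'participant_64')
```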
Various approaches can be used to identify a conference participant. In one embodiment, a microphone array is used to locate the conference participant by sound.
The microphone arrays together with reliable sound source localization technology reliably and accurately estimate the position and presence of sound sources in space.
The time elapsed from a start of the conference is stored with each audio segment and each still image. An indexing engine can be provided to generate the timeline by matching the elapsed time associated with an audio segment and a still image.
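A minimal sketch of that matching step, assuming each audio segment and each still image is stamped with elapsed seconds from the start of the conference (the data layout and the nearest-neighbour rule are illustrative, not the patent's specification):

```python
# Sketch of the elapsed-time matching performed by an indexing engine:
# each audio segment carries the elapsed time at which it began, each
# still image the elapsed time at which it was captured, and the index
# pairs every segment with the still captured nearest its start time.

from bisect import bisect_left

def build_index(audio_segments, stills):
    """audio_segments: list of (start_s, duration_s).
    stills: list of (capture_s, image_id), sorted by capture time.
    Returns (start_s, duration_s, image_id) triples."""
    capture_times = [t for t, _ in stills]
    index = []
    for start, duration in audio_segments:
        i = bisect_left(capture_times, start)
        # Pick whichever neighbouring still is closer in elapsed time.
        neighbours = [j for j in (i - 1, i) if 0 <= j < len(stills)]
        j = min(neighbours, key=lambda k: abs(capture_times[k] - start))
        index.append((start, duration, stills[j][1]))
    return index

stills = [(0.0, "img_81"), (13.1, "img_83"), (41.0, "img_85")]
segments = [(0.0, 12.5), (13.0, 27.2), (41.0, 14.0)]
print(build_index(segments, stills))
```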
The system can be used to index a conference with only one participant. The timeline then includes an indication of the times in which sound was produced, as well as an image of the lone participant.
In applications in which more than one conference participant is present and identified, the system stores the times elapsed from the start of the conference and identifications of when a speaker begins speaking with each still image, a participant being associated with each image.
The elapsed time is also stored with the audio recording each time a change in sound location is identified. The indexing engine creates an index, that is, a list of associated images and sound segments. Based on this index, a timeline is then generated for each still image (that is, each conference participant) designating the times from the start of the conference when the participant speaks. The timeline also indicates any other conference participant who might also appear in the still image (for example, a neighbor sitting in close proximity to the speaker), but is silent at the particular elapsed time, thus giving a comprehensive overview of the sounds produced by all conference participants, as well as helping identify all persons present in the still images. The timeline may be generated either in real time or after the conference is finished.
In embodiments in which a video camera is used to capture still images of the conference participants, it can also be used to record a continuous video recording of the conference.
The system can be used for a conference with all participants in one room (near-end participants) as well as for a conference with participants (far-end participants) at another site.
Assuming that a speaker has limited movement during a conference, the same person is assumed to be talking every time sound is detected from a particular locality. Thus, if the speech source is determined to be the same as the locality of a previously detected conference participant, a speech-present segment is added to the timeline for the previously detected conference participant. If the location of a conference participant is different from a previously detected location of a near-end conference participant, a still image of the new near-end conference participant is stored and a new timeline is started for the new near-end conference participant.
In a video conference involving a far-end participant, the audio source is a loudspeaker at the near end transmitting a sound from a far-end speech source. The timeline is then associated with the far end, and generating a timeline includes creating a speech-present segment for the far end if a far-end speech source is present. Thus, a user of the invention can identify and access far-end speech segments. Further, if a far-end speech source is involved in the conference, echo can be suppressed by subtracting a block of accumulated far-end loudspeaker data from a block of accumulated near-end microphone array data.
Advantageously, therefore, a video image of a display presented at the conference is captured, and a timeline is generated for the captured video image of the display. This enables the indexing of presentation material as well as sounds produced by conference participants.
The present invention is illustrated in the following figures.
Fig. 1 is a schematic representation of a videoconferencing embodiment using two microphone arrays;
Fig. 2 is a block diagram of the computer which performs some of the functions illustrated in Fig. 1;
Fig. 3 is an exemplary display showing timelines generated during a videoconference; and
Fig. 4 is a flow diagram illustrating operation of the microphone array conference indexing method.
While the description which follows is associated with a videoconference between the local or near-end site and a distant or far-end site, the invention can be used with a single site conference as well.
Referring then to Fig. 1, a videoconference indexing system 10 (shown enclosed by dashed lines) is used to record and index a videoconference having, in this particular embodiment, four conference participants 62, 64, 66, 68 sitting around a table 60.
One or more far-end conference participants (not shown) also participate in the conference through the use of a local videoconferencing system 20 connected over a communication channel 16 to a far-end video conferencing system 18. The communication channel 16 connects the far-end video conferencing system to the near-end videoconferencing system 20 and far-end decompressed audio is available to a source locator 22.
Videoconference indexing system 10 includes videoconferencing system 20, a computer 30, and a playback system 50. Videoconferencing system 20 includes a display monitor 21 and a loudspeaker 23 for allowing the far-end conference participant to be seen and heard by conference participants 62, 64, 66, and 68. In an alternative embodiment, the arrangement shown in Fig. 1 is used to record a meeting that is not in a conference-call mode, eliminating the need for the display monitor 21 and loudspeaker 23 of videoconferencing system 20. System 20 also includes microphone arrays 12, 14 for acquiring sound (for example, participants' speech), the source locator 22 for determining the location of a sound-producing conference participant, and a video camera 24 for capturing video images of the setting and participants as part of a continuous video recording.

In one embodiment, source locator 22 is standalone hardware, called "LimeLight", manufactured and sold by PictureTel Corporation, which is a videoconferencing unit having an integrated motorized camera and microphone array. The "LimeLight" locator 22 has a digital signal processing (DSP) integrated circuit which efficiently implements the source locator function, receiving electrical signals representing sound picked up in the room and outputting source location parameters. Further details of the structure and implementation of the "LimeLight" system are described in U.S. 5,778,082, the contents of which are incorporated herein by reference. (In other embodiments of the invention, multiple camera and microphone configurations can be used.)

Alternative methods can be used to fulfill the function of source locator 22. For example, a camera video pattern recognition algorithm can be used to identify the location of an audio source based on mouth movements. In another embodiment of the invention, an infrared motion detector can be used to identify an audio source location, for example to detect a speaker approaching a podium.
Computer 30 includes an audio storage 32 and a video storage 34 for storing audio and video data provided from microphone arrays 12, 14 and video camera 24, respectively.
Computer 30 also includes an indexing engine software module 40 whose operations will be discussed in greater detail below.
Referring to Fig. 2, the hardware for computer 30 used to store and process data and computer instructions is shown. In particular, computer 30 includes a processor 31, a memory storage 33, and a working memory 35, all of which are connected by an interface bus 37. Memory storage 33, typically a disk drive, is used for storing the audio and video data provided from microphone arrays 12, 14 and camera 24, respectively, and thus includes audio storage 32 and video storage 34. In operation, indexing engine software 40 is loaded into working memory 35, typically RAM, from memory storage 33 so that the computer instructions from the indexing engine can be processed by processor 31. Computer 30 serves as an intermediate storage facility which records, compresses, and combines the audio, video, and indexing information data as the actual conference occurs.
Referring again to Fig. 1, playback system 50 is connected to computer 30 and includes a playback display 52 and a playback server 54, which together allow the recording of the videoconference to be reviewed quickly and accessed at a later time.
Although a more detailed description of the operation is provided below, in general, microphone arrays 12, 14 generate signals in response to sound generated in the videoconference, and these signals are sent to source locator 22.
Source locator 22, in turn, transmits signals representative of the location of a sound source both to a pointing mechanism 26 connected to video camera 24 and to computer 30. These signals are transmitted along lines 27 and 28, respectively. Pointing mechanism 26 includes motors which, in the most general case, control panning, tilting, zooming, and auto-focus functions of the video camera (subsets of these functions can also be used). Further details of pointing mechanism 26 are described in U.S. 5,633,681, incorporated herein by reference. Video camera 24, in response to the signals from source locator 22, is then pointed, by pointing mechanism 26, in the direction of the conference participant who is the current sound source.
Images of the conference participant captured by video camera 24 are stored in video storage 34 as video data, along with an indication of the time which has elapsed from the start of the conference.
Simultaneously, the sound picked up by microphone arrays 12, 14 is transmitted to and stored in audio storage 32, also along with the time which has elapsed from the start of the conference until the beginning of each new sound segment. Thus, the elapsed time is stored with each sound segment in audio storage 32. A new sound segment corresponds to each change, determined by source locator 22, in the detected location of the sound source.
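That segmentation rule might be sketched as follows; the report format and the location-equality test are placeholders, and the patent's actual same-location thresholds are given later in the description.

```python
# Sketch of opening a new sound segment whenever the source locator
# reports a changed location, stamping each segment with the elapsed
# time at which it begins.

def segment_by_location(locator_reports, same_location):
    """locator_reports: iterable of (elapsed_s, location), in time order.
    Returns the segments as (start_elapsed_s, location) pairs."""
    segments = []
    current = None
    for elapsed, location in locator_reports:
        if current is None or not same_location(location, current):
            segments.append((elapsed, location))  # new segment starts here
            current = location
    return segments

reports = [(0.0, "A"), (1.0, "A"), (12.9, "B"), (20.0, "B"), (41.5, "A")]
print(segment_by_location(reports, same_location=lambda a, b: a == b))
# -> [(0.0, 'A'), (12.9, 'B'), (41.5, 'A')]
```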
In order to minimize storage requirements, both the audio and video data are stored, in this illustrated embodiment, in a compressed format. If further storage minimization is necessary, only those portions of the videoconference during which speech is detected will be stored, and further, if necessary, the video data, other than the conference participant still images, need not be stored.
Although the embodiment illustrated in Fig. 1 uses one camera, more than one camera can be used to capture the video images of conference participants. This approach is especially useful for cases where one participant may block a camera's view of another participant. Alternatively, a separate camera can be dedicated to recording, for example viewgraphs or whiteboard drawings, shown during the course of a conference.
As noted above, audio storage 32 and video storage 34 are both part of computer 30 and the stored audio and video images are available to both the indexing engine 40 and playback system 50. The latter includes the playback display 52 and the playback server 54 as noted above.
Indexing engine 40 associates the stored video images with the stored sound segments based on elapsed time from the start of the conference and generates a file with indexing information; it indexes the compressed audio and video data using a protocol such as, for example, the AVI format. For long-term storage, the audio, video, and indexing information is transmitted from computer 30 to the playback server 54 for access by users of the system. Playback server 54 can retrieve the audio and video data from its own memory when requested by a user. Playback server 54 stores data from the conference in such a way as to make it quickly available to many users on a computer network. In one embodiment, playback server 54 includes many computers, with a library of multimedia files distributed across the computers. A user can access playback server 54, as well as the information generated by the indexing engine 40, by using GUI 45 with a GUI display 47. The playback display terminal 52 is then used to display video data stored in video storage 34 and to play audio data stored in audio storage 32; playback display 52 is also used to display video data and to play audio data stored in playback server 54.
Alternatively, instead of using video images for indexing, an icon is generated based on a still image selected from the continuous video recording. Then, the icon of the conference participant is associated with the audio segment generated by the conference participant. Thus the system builds a database index that associates, with each identified sound source and its representative icon or image, a sequence of elapsed times and time durations for each instance when the participant was a "sound source".
The elapsed times and the durations can be used to access the stored audio and video as described in detail below.
One feature of the invention is to index conference content using the identification of various sound sources and their locations. In the embodiment shown in Fig. 1, the identification and location of sound sources are achieved by the source locator 22 and the two microphone arrays 12, 14. Each microphone array is a PictureTel "LimeLight" array having four microphones, one positioned at each vertex of an inverted "T" and one at the intersection of its two linear portions. In this illustrated embodiment, the inverted-T array has a height of 12 inches and a width of 18 inches. Arrays of this type are described in U.S. Patent 5,778,082 by Chu et al., the contents of which are incorporated herein by reference.
In other embodiments, other microphone array position estimation procedures and microphone array configurations, with different structures and techniques of estimating spatial location, can be used to locate a sound source. For example, a microphone can be situated close to each conference participant, and any microphone with a sufficiently loud signal indicates that the particular person associated with that microphone is speaking.
Accurate time-of-arrival differences of emitted sound in the room are obtained between selected combinations of microphone pairs in each microphone array 12, 14 by the use of a highly modified cross-correlation technique (modified for robustness to room echo and background noise degradation) as described in U.S. 5,778,082. Assuming plane sound waves (the far-field assumption), these pairs of time differences can be translated by source locator 22 into corresponding bearing angles from the respective array. The angles provide an estimate of the location of the sound source in three-dimensional space.
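For illustration, a bare-bones version of this far-field computation for a single microphone pair might look like the sketch below. It uses a plain cross-correlation, whereas the patent relies on the heavily modified correlation of U.S. 5,778,082 for robustness to echo and noise; signal names and parameters are assumptions.

```python
# Sketch of a far-field bearing estimate from one microphone pair:
# estimate the time-of-arrival difference by cross-correlation, then
# map it to a bearing angle from broadside via sin(theta) = c*tau/d.

import numpy as np

def bearing_from_pair(sig_a, sig_b, fs, mic_spacing_m, c=343.0):
    """Estimate the bearing (radians from broadside) of a far-field source
    from two microphone signals sampled at fs Hz, mic_spacing_m apart."""
    xcorr = np.correlate(sig_a, sig_b, mode="full")
    delay_samples = np.argmax(xcorr) - (len(sig_b) - 1)
    tau = delay_samples / fs                      # time-of-arrival difference
    s = np.clip(c * tau / mic_spacing_m, -1.0, 1.0)
    return np.arcsin(s)

# Example: a noise burst arriving 3 samples later at mic A than at mic B.
rng = np.random.default_rng(0)
src = rng.standard_normal(4096)
sig_b, sig_a = src, np.roll(src, 3)               # A lags B by 3 samples
theta = bearing_from_pair(sig_a, sig_b, fs=16000, mic_spacing_m=0.3)
print(np.degrees(theta))
```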
In the embodiment shown in Fig. 1, the sound is picked up by a microphone array integrated with the sound localization array, so that the microphone arrays serve double duty as both sound localization and sound pick-up apparatus. However, in other embodiments, one microphone or microphone array can be used for recording while another microphone or microphone array can be used for sound localization.
Although two microphone arrays 12, 14 are shown in use with videoconferencing indexing system 10, only one array is required. In other embodiments, the number and configurations of microphone arrays may vary, for example, from one microphone to many. Using more than one array provides advantages. In particular, while the azimuth and elevation angles provided by each of arrays 12, 14 are highly accurate and are estimated to within a fraction of a degree, range estimates are not nearly as accurate. Even though the range error is higher, however, the information is sufficient for use with pointing mechanism 26.
However, the larger range estimation error of the microphone arrays gives rise to sound source ambiguity problems for a single microphone array. Thus, with reference to Fig. 1, microphone array 12 might view persons 66, 68 as the same person, since their difference in range to microphone array 12 might be less than the range error of array 12. To address this problem, source localization estimates from microphone array 14 could be used by source locator 22 as a second source of information to separate persons 66 and 68, since persons 66 and 68 are separated substantially in azimuth angle from the viewpoint of microphone array 14.
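In the horizontal plane, this two-array disambiguation amounts to intersecting two accurate bearing lines, as in the following sketch (the geometry and the least-squares solver are illustrative choices, not the patent's method):

```python
# Sketch of why a second array resolves range ambiguity: each array
# contributes an accurate bearing line, and the source is estimated at
# the intersection of the two lines, so neither array's poor range
# estimate is needed.

import numpy as np

def intersect_bearings(p1, az1, p2, az2):
    """p1, p2: 2-D array positions; az1, az2: azimuths in radians.
    Solve p1 + t1*d1 = p2 + t2*d2 for the crossing point."""
    d1 = np.array([np.cos(az1), np.sin(az1)])
    d2 = np.array([np.cos(az2), np.sin(az2)])
    A = np.column_stack([d1, -d2])
    t = np.linalg.lstsq(A, np.asarray(p2) - np.asarray(p1), rcond=None)[0]
    return np.asarray(p1) + t[0] * d1

# Two arrays a few metres apart, both hearing a talker at (3, 2).
p1, p2 = np.array([0.0, 0.0]), np.array([4.0, 0.0])
az1 = np.arctan2(2 - 0, 3 - 0)      # bearing from array 1
az2 = np.arctan2(2 - 0, 3 - 4)      # bearing from array 2
print(intersect_bearings(p1, az1, p2, az2))  # -> approximately [3., 2.]
```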
An alternative approach to indexing by sound source location is to use manual camera position commands such as pan/tilt commands and presets to index the meeting. These commands in general may indicate a change in content whereby a change in camera position is indicative of a change in sound source location.
Fig. 3 shows an example of a display 80, viewed on GUI display 47 (Fig. 1), resulting from a videoconference.
The following features, included in the display 80, indicate to a user of system 10 exactly who was speaking and when that person spoke. Horizontal axis 99 is a time scale representing the actual time during the recorded conference.
Pictures of conference participants appear along the vertical axis of display 80. Indexing engine 40 (Fig. 1) selects and extracts from video storage 34 pictures 81, 83, 85 of conference participants 62, 64, 66 on the basis of elapsed time from the start of the conference and the beginning of new sound segments. These pictures represent the conference participant(s) producing the sound segment(s). Pictures 81, 83, 85 are single still frames from a continuous video recording captured by video camera 24 and stored in video storage 34. A key criterion for the selection of images for the pictures is the elapsed time from the start of the conference to the beginning of each respective sound segment: the pictures selected for the timeline are the ones captured at the same elapsed time as the beginning of each respective sound segment.
Display 80 includes a picture 87, denoting a far-end conference participant. This image, too, is selected by the indexing engine 40. It can be an image of the far-end conference participant, if images from a far-end camera are available. Alternatively, it can be an image of a logo, a photograph, etc., captured by a near-end camera.
Display 80 also includes a block 89 representing, for example, data presented by one of the conference participants at the conference. Data content can be recorded by use of an electronic viewgraph display system (not shown) which provides signals to videoconferencing system 20. Alternatively, a second camera can be used to record slides presented with a conventional viewgraph. The slides, greatly reduced in size, would then form part of display 80.
Associated with each picture 81, 83, 85, 87 and block 89 are line segments representing when sound corresponding to each respective picture occurred. For example, segments 90, 92, 92', and 94 represent the duration of sound produced by three conference participants, e.g., 62, 64, and 66 of Fig. 1. Segment 96 represents sounds produced by a far-end conference participant (not shown in Fig. 1).
Segments 97 and 98, on the other hand, show when data content was displayed during the presentation and show a representation of the data content. The segments may be different colors, with different meaning assigned to each color. For example, a blue line could represent a near-end sound source, and a red line could represent a far-end sound source. In essence, the pictures and blocks, together with the segments, provide a series of timelines for each conference participant and presented data block.
In display 80, the content of what each person 62, 64, 66 said is not presented, but this information can, if desired, be filled in after the fact by manual annotation, such as a note added to the display 80 through the GUI 45 at each speech segment 90, 92, 92', and 94.
A user can view display 80 using GUI 45, GUI display 47, and playback display 52. In particular, the user can click a mouse or other input device (for example, a trackball or cursor control keys on a keyboard) on any point in segments 90, 92, 92', 94, 96, 97, and 98 in the display 80 to access and play back or display that portion of the stored conference file.
A flow diagram of a method 100, according to the invention, is presented in Fig. 4. Method 100 of Fig. 4 is generic to system operation and could be applied to a wide variety of different microphone array configurations. With reference also to Figs. 1-3, the operation of the system will be described.
In operation, audio is simultaneously acquired from both the far end and the near end of a videoconference.
From the far end, audio is continuously acquired for successive preselected durations of time as it is received by videoconferencing system 20 (step 101). Audio received from the far-end videoconferencing system 18 is thus directed to the source locator 22 (step 102). The source locator analyzes the frequency components of the far-end audio signals. The onset of a new segment is characterized by i) the magnitude of a particular frequency component being greater than the background noise for that frequency and ii) the magnitude of a particular frequency component being greater than the magnitude of the same component acquired during a predetermined number of preceding time frames. If speech is present, an audio segment (e.g., segment 96 in Fig. 3) is begun (step 103) for the timeline corresponding to audio produced by the far-end conference participant(s).
An audio segment is continued for the timeline, corresponding to a far-end conference participant, if speech continues to be present at the far-end and there has been no temporal interruption since the beginning of the previously started audio segment.
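A sketch of this two-condition onset test, with the band count and noise margin chosen arbitrarily for illustration:

```python
# Sketch of the onset test: a new speech segment begins when some
# frequency component both (i) exceeds the background-noise estimate
# for that band and (ii) exceeds that band's magnitude in each of a
# predetermined number of preceding frames.

import numpy as np

def onset_detected(frame_mags, history_mags, noise_floor, margin=3.0):
    """frame_mags: per-band magnitudes of the current frame.
    history_mags: (n_prev_frames, n_bands) magnitudes of preceding frames.
    noise_floor: per-band background-noise magnitude estimate."""
    above_noise = frame_mags > margin * noise_floor             # condition (i)
    above_history = np.all(frame_mags > history_mags, axis=0)   # condition (ii)
    return bool(np.any(above_noise & above_history))

rng = np.random.default_rng(2)
noise_floor = np.full(128, 0.01)
quiet_frames = rng.uniform(0.0, 0.01, size=(4, 128))  # preceding quiet frames
frame = rng.uniform(0.0, 0.01, size=128)
frame[20] = 0.5                                       # strong new component
print(onset_detected(frame, quiet_frames, noise_floor))  # -> True
```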
While the preselected durations of far-end audio are being acquired (step 101) and analyzed, the system simultaneously acquires successive N second durations of audio from microphone arrays 12, 14 (step 104). Because the audio from the far-end site can interfere with near-end detection of audio in the room, the far-end signal received through the microphone arrays is suppressed by the subtraction of a block of N second durations of far-end audio from the acquired near-end audio (step 105). In this way, false sound localization of the loudspeaker as a "person" (audio source) will not occur. Echo suppression will not affect a signal resulting from two near-end participants speaking simultaneously. In this case, the sound locator locates both participants, locates the stronger of the two, or does nothing.
Echo suppression can be implemented with adaptive filters, or by use of a bandpass filter bank (not shown) with band-by-band gating (setting to zero those bands with significant far-end energy, allowing processing to occur only on bands with far-end energy near the far-end background noise level), as is well known to those skilled in the art. Methods for achieving both adaptive filtering and echo suppression are described in U.S. 5,305,307 by Chu, the contents of which are incorporated herein by reference.
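For illustration only, the band-by-band gating variant might look like the sketch below; the FFT block size and the 10 dB margin are assumptions, and a production system would more likely use the adaptive-filter methods of U.S. 5,305,307.

```python
# Sketch of band-by-band gating: transform a block of near-end audio,
# zero the bands in which the far-end signal carries significant energy
# above its background-noise level, and keep only bands where the far
# end is near its noise floor.

import numpy as np

def gate_near_end(near_block, far_block, far_noise_floor, margin_db=10.0):
    """near_block, far_block: equal-length 1-D float arrays (one block).
    far_noise_floor: per-band magnitude estimate of far-end background noise."""
    near_spec = np.fft.rfft(near_block)
    far_mag = np.abs(np.fft.rfft(far_block))
    # Gate (zero) bands where far-end energy is well above its noise floor.
    active = far_mag > far_noise_floor * 10 ** (margin_db / 20.0)
    near_spec[active] = 0.0
    return np.fft.irfft(near_spec, n=len(near_block))

# Example: near-end noise stand-in plus a far-end tone leaking in at 1 kHz.
fs, n = 16000, 1024
t = np.arange(n) / fs
far = np.sin(2 * np.pi * 1000 * t)
near = 0.1 * np.random.default_rng(1).standard_normal(n) + 0.5 * far  # echo
noise_floor = np.full(n // 2 + 1, 1e-3)
cleaned = gate_near_end(near, far, noise_floor)
```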
The detection and location of speech of a near-end source is determined (step 106) using source locator 22 and microphone arrays 12, 14. If speech is detected, then source locator 22 estimates the spatial location of the speech source (step 107). Further details of the manner in which source location is accomplished are described in U.S. 5,778,082. This method involves estimating the time delay between signals arriving at a pair of microphones from a common source. As described in connection with the far-end audio analysis, a near-end speech source is detected if the magnitude of a frequency component is significantly greater than the background noise for that frequency, and if the magnitude of the frequency component is greater than that acquired for that frequency in a predetermined number of preceding time frames. The fulfillment of both conditions signifies the start of a speech segment from a particular speech source. A speech source location is calculated by comparing the time delay of the signals received at the microphone arrays 12, 14, as determined by source locator 22.
Indexing engine 40 compares the newly derived source location parameters (step 107) to the parameters of previously detected sources (step 108). Due to errors in estimation and small movements of the person speaking, the new source location parameters may differ slightly from previously estimated parameters of the same person. If the difference between location parameters for the new source and old source is small enough, it is assumed that a previously detected source (person) is audible (speaking) again, and the speech segment in his/her timeline is simply extended or reinstated (step 111).
The difference thresholds for the location parameters according to one particular embodiment of the invention are as follows (encoded in the sketch after this list):

1. If the range of both of the two sources (previously detected and current) is less than 2 meters, then it is determined that a new source is audible if: the pan angle difference is greater than 12 degrees, or the tilt angle difference is greater than 4 degrees, or the range difference is greater than 0.5 meters.

2. If the range of either of the two sources is greater than 2 meters but less than 3.5 meters, then it is determined that a new source is audible if: the pan angle difference is greater than 9 degrees, or the tilt angle difference is greater than 3 degrees, or the range difference is greater than 0.75 meters.

3. If the range of either of the two sources is greater than 3.5 meters, then it is determined that a new source is audible if: the pan angle difference is greater than 6 degrees, or the tilt angle difference is greater than 2 degrees, or the range difference is greater than 1 meter.
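These thresholds translate directly into a same-source test. The sketch below encodes them as listed, with angles in degrees and ranges in meters.

```python
# Same-source decision using the range-dependent thresholds above.

def is_new_source(old, new):
    """old, new: (pan_deg, tilt_deg, range_m) source location parameters.
    Returns True if the differences exceed the range-dependent thresholds."""
    d_pan = abs(new[0] - old[0])
    d_tilt = abs(new[1] - old[1])
    d_range = abs(new[2] - old[2])
    if old[2] < 2.0 and new[2] < 2.0:        # both sources within 2 m
        pan_t, tilt_t, range_t = 12.0, 4.0, 0.5
    elif old[2] < 3.5 and new[2] < 3.5:      # either between 2 and 3.5 m
        pan_t, tilt_t, range_t = 9.0, 3.0, 0.75
    else:                                    # either beyond 3.5 m
        pan_t, tilt_t, range_t = 6.0, 2.0, 1.0
    return d_pan > pan_t or d_tilt > tilt_t or d_range > range_t

print(is_new_source((10.0, 0.0, 1.5), (15.0, 1.0, 1.6)))  # False: same person
print(is_new_source((10.0, 0.0, 1.5), (30.0, 1.0, 1.6)))  # True: new speaker
```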
Video camera 24, according to this embodiment of the invention, is automatically pointed, in response to the determined location, at the current or most recent sound source. Thus, during a meeting, a continuous video recording can be made of each successive speaker. Indexing engine 40, based on correlating the elapsed times for the video images and sound segments, extracts still images from the video for purposes of providing images to be shown on GUI display 47 to allow the user to visually identify the person associated with a timeline (step 109). A new segment of data storage is begun for each new speaker (step 110).
Alternatively, a continuous video recording of the meeting can be sampled after the meeting is over, and still video images, such as pictures 81, 83, and 85 of the participants, can be extracted by the indexing engine 40 from the continuous stored video recording.
Occasionally, a person may change his position during a conference. The method of Fig. 4 treats the new position of the person as a new speaker. By using video pattern recognition and/or speaker audio identification techniques, however, the new speaker can be identified as being one of the old speakers who has moved. When such a positive identification occurs, the new speaker timeline (including, for example, images and sound segments 85 and 94 in Fig. 3) can be merged with the original timeline for the speaker. Techniques of video-based tracking are discussed in a co-pending patent application (Serial No. 09/79840, filed May 15, 1998) assigned to the assignee of the present invention, the contents of which are hereby incorporated by reference. The co-pending application describes the combination of video with audio techniques for autopositioning the camera.
In some cases, more than one conference participant may appear in a still image. As described above, the timeline can indicate any such participant who appears in the still image but is silent at the particular elapsed time, giving a comprehensive overview of the sounds produced by all conference participants and helping identify all persons present in the still images.
Conference data can also be indexed for a multipoint conference in which more than two sites engage in a conference together. In this multipoint configuration, microphone arrays at each site can send indexing information for the stream of video/audio/data content from that site to a central computer for storage and display.
Additions, deletions, and other modifications of the described embodiments will be apparent to those practiced in this field and are within the scope of the following claims.

Claims (26)

Claims
1. A method for indexing the content of a conference with at least one participant, said method comprising: recording an audio recording of the conference; identifying a conference participant producing a sound; capturing a still image of the identified conference participant; correlating the still image of the conference participant to at least one audio segment portion of the audio recording, said at least one segment corresponding to the sound produced by the identified conference participant; and generating a timeline by creating at least one speech-present segment representing the correlated still image and associated at least one audio segment.
2. The method claimed in claim 1, further comprising: displaying the timeline on a display monitor; and accessing the timeline displayed on the monitor using a graphical user interface (GUI).
3. The method claimed in claim 2, wherein capturing the still image includes making a video recording of the conference and capturing a video image of the conference participant producing the sound from a segment of the associated video recording of the conference, and further comprising: using the GUI to select a portion of a specific audio segment for replaying portions of the audio and video recordings on a playback monitor.
4. The method of claim 1, wherein capturing the still image comprises capturing a video image of the conference participant producing the sound from a segment of an associated continuous video recording of the conference.
5. The method of claim 1, further comprising using a video camera to capture the still video image.
6. The method of claim 1 wherein identifying the conference participant is based on identifying the location of the participant.
7. The method of claim 6, wherein identifying the conference participant includes using a microphone array.
8. The method of claim 1, further comprising: storing time elapsed from a start of the conference with the audio segment and the still image, wherein the timeline is generated by an indexing engine matching the elapsed time associated with the audio segment and the still image.
9. The method of claim 1, further comprising: identifying a plurality of conference participants; capturing a still image of each one of the plurality of conference participants; storing a time elapsed from a start of the conference indicating the time of the capturing of each still image; and storing a time elapsed from a start of the conference in association with the audio recording each time a change in audio source location is identified, wherein generating a timeline includes indicating for each identified conference participant the particular elapsed times from the start of the conference during which the particular participant was speaking, and wherein generating the timeline includes indicating any other conference participant who also appears in the video image and is silent at the particular elapsed time.
10. The method of claim 9, wherein a conference participant has been previously identified and wherein a speech-present segment is added to the timeline for the previously detected conference participant when the participant speaks.
11. The method of claim 10, wherein each identified conference participant is a near-end conference participant.
12. The method of claim 11, wherein identifying each near-end conference participant is based on location.
13. The method of claim 12, wherein a still image of a new near-end conference participant is identified and a new timeline is started for the new near-end conference participant, if the location of the new near-end conference participant is different from previously detected locations of the other identified near-end conference participant.
14. The method of claim 1, wherein the audio source is a far-end loudspeaker transmitting a sound from a far-end speech source, wherein the timeline is a far-end timeline, and wherein generating the far-end timeline includes creating a speech-present segment on the far-end timeline if a far-end speech source is present.
15. The method of claim 14, further comprising: accumulating a block of far-end loudspeaker data; accumulating a block of near-end microphone array data; and suppressing echo by subtracting accumulated far-end loudspeaker data from accumulated near-end microphone array data.
16. The method of claim 1, further comprising: capturing a video image of a display presented at the conference; and generating a timeline for the captured video image of the display.
17. The method of claim 1, wherein the generated timeline is color-coded.
18. A system for indexing the content of a conference with at least one participant, said system comprising: a sound recording mechanism which records sound created by a conference participant; at least one source locator for identifying the location of a conference participant, wherein the source locator generates signals corresponding to the location of the conference participant; a camera assembly including a camera and a camera movement device, which, in response to the signals generated by said source locator, moves the camera to point at the conference participant; an image capture unit for capturing an image of the conference participant; an image storage device for storing images captured by said image capture unit; a processor for associating the image captured by the camera to the sound recorded by the sound recording mechanism and to create a timeline comprising images and indicators of presence of associated sound; and a graphical user interface which allows access to the stored sound, images, and timeline.
19. The system of claim 18, wherein the sound locator uses at least one microphone array.
20. The system of claim 18, wherein the sound locator uses a plurality of microphones.
21. The system of claim 18, wherein the sound locator comprises a plurality of microphone arrays.
22. A system for indexing the content of a conference with at least one participant, said system comprising: means for recording an audio recording of the conference; means for identifying each conference participant producing a sound; means for capturing a still image of each identified conference participant; and means for associating the still image of each identified conference participant to at least one audio segment portion of the audio recording corresponding to the sound produced by such conference participant.
23. A method for presenting an audio index database representation of a conference, comprising: generating a plurality of participant timelines, each timeline having at least one speech-present segment representing a correlated still image and at least one associated audio segment; enabling a user to identify any of the segments representing audio desired; and playing back the identified segment.
24. A method for indexing the content of a conference with at least one participant substantially as herein described with reference to Figures 1 to 4.
25. A system for indexing the content of a conference with at least one participant substantially as herein described and shown with reference to Figures 1 to 4.
26. A method for presenting an audio index database representation of a conference substantially as herein described with reference to Figures 1 to 4.
GB9916394A 1998-10-14 1999-07-13 Method and apparatus for indexing conference content Expired - Fee Related GB2342802B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US17346298A 1998-10-14 1998-10-14

Publications (3)

Publication Number Publication Date
GB9916394D0 GB9916394D0 (en) 1999-09-15
GB2342802A true GB2342802A (en) 2000-04-19
GB2342802B GB2342802B (en) 2003-04-16

Family

ID=22632148

Family Applications (1)

Application Number Title Priority Date Filing Date
GB9916394A Expired - Fee Related GB2342802B (en) 1998-10-14 1999-07-13 Method and apparatus for indexing conference content

Country Status (2)

Country Link
JP (1) JP2000125274A (en)
GB (1) GB2342802B (en)

Cited By (155)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2351628A (en) * 1999-04-14 2001-01-03 Canon Kk Image and sound processing apparatus
GB2351627A (en) * 1999-03-26 2001-01-03 Canon Kk Image processing apparatus
WO2002013522A2 (en) * 2000-08-10 2002-02-14 Quindi Audio and video notetaker
EP1427205A1 (en) * 2001-09-14 2004-06-09 Sony Corporation Network information processing system and information processing method
FR2849564A1 (en) * 2002-12-31 2004-07-02 Droit In Situ METHOD AND SYSTEM FOR PRODUCING A MULTIMEDIA EDITION BASED ON ORAL SERVICES
US7113201B1 (en) 1999-04-14 2006-09-26 Canon Kabushiki Kaisha Image processing apparatus
US7117157B1 (en) 1999-03-26 2006-10-03 Canon Kabushiki Kaisha Processing apparatus for determining which person in a group is speaking
GB2429133A (en) * 2004-08-31 2007-02-14 Sony Corp Method and device for indexing image data to associated audio data
EP1906707A1 (en) * 2005-07-08 2008-04-02 Yamaha Corporation Audio transmission system and communication conference device
GB2486793A (en) * 2010-12-23 2012-06-27 Samsung Electronics Co Ltd Identifying a speaker via mouth movement and generating a still image
EP2557778A1 (en) * 2010-09-15 2013-02-13 ZTE Corporation Method and apparatus for video recording in video calls
US8452037B2 (en) 2010-05-05 2013-05-28 Apple Inc. Speaker clip
US8560309B2 (en) 2009-12-29 2013-10-15 Apple Inc. Remote conferencing center
WO2013169621A1 (en) * 2012-05-11 2013-11-14 Qualcomm Incorporated Audio user interaction recognition and context refinement
US8644519B2 (en) 2010-09-30 2014-02-04 Apple Inc. Electronic devices with improved audio
EP2709357A1 (en) * 2012-01-16 2014-03-19 Huawei Technologies Co., Ltd Conference recording method and conference system
US8811648B2 (en) 2011-03-31 2014-08-19 Apple Inc. Moving magnet audio transducer
US8858271B2 (en) 2012-10-18 2014-10-14 Apple Inc. Speaker interconnect
US8892446B2 (en) 2010-01-18 2014-11-18 Apple Inc. Service orchestration for intelligent automated assistant
US8903108B2 (en) 2011-12-06 2014-12-02 Apple Inc. Near-field null and beamforming
US8942410B2 (en) 2012-12-31 2015-01-27 Apple Inc. Magnetically biased electromagnet for audio applications
US8977584B2 (en) 2010-01-25 2015-03-10 Newvaluexchange Global Ai Llp Apparatuses, methods and systems for a digital conversation management platform
US8989428B2 (en) 2011-08-31 2015-03-24 Apple Inc. Acoustic systems in electronic devices
US9007871B2 (en) 2011-04-18 2015-04-14 Apple Inc. Passive proximity detection
US9020163B2 (en) 2011-12-06 2015-04-28 Apple Inc. Near-field null and beamforming
US9225701B2 (en) 2011-04-18 2015-12-29 Intelmate Llc Secure communication systems and methods
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US9300784B2 (en) 2013-06-13 2016-03-29 Apple Inc. System and method for emergency calls initiated by voice command
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US9357299B2 (en) 2012-11-16 2016-05-31 Apple Inc. Active protection for acoustic device
US9368114B2 (en) 2013-03-14 2016-06-14 Apple Inc. Context-sensitive handling of interruptions
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US9483461B2 (en) 2012-03-06 2016-11-01 Apple Inc. Handling speech synthesis of content for multiple languages
US9495129B2 (en) 2012-06-29 2016-11-15 Apple Inc. Device, method, and user interface for voice-activated navigation and browsing of a document
US9502031B2 (en) 2014-05-27 2016-11-22 Apple Inc. Method for supporting dynamic grammars in WFST-based ASR
US9525943B2 (en) 2014-11-24 2016-12-20 Apple Inc. Mechanically actuated panel acoustic system
US9535906B2 (en) 2008-07-31 2017-01-03 Apple Inc. Mobile device having human language translation capability with positional feedback
US9576574B2 (en) 2012-09-10 2017-02-21 Apple Inc. Context-sensitive handling of interruptions by intelligent digital assistant
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
US9620104B2 (en) 2013-06-07 2017-04-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9620105B2 (en) 2014-05-15 2017-04-11 Apple Inc. Analyzing audio input for efficient speech and music recognition
US9626955B2 (en) 2008-04-05 2017-04-18 Apple Inc. Intelligent text-to-speech conversion
US9633660B2 (en) 2010-02-25 2017-04-25 Apple Inc. User profiling for voice input processing
US9633004B2 (en) 2014-05-30 2017-04-25 Apple Inc. Better resolution when referencing to concepts
US9633674B2 (en) 2013-06-07 2017-04-25 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US9646614B2 (en) 2000-03-16 2017-05-09 Apple Inc. Fast, language-independent method for user authentication by voice
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US9697822B1 (en) 2013-03-15 2017-07-04 Apple Inc. System and method for updating an adaptive speech recognition model
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US9734193B2 (en) 2014-05-30 2017-08-15 Apple Inc. Determining domain salience ranking from ambiguous words in natural speech
US9746916B2 (en) 2012-05-11 2017-08-29 Qualcomm Incorporated Audio user interaction recognition and application interface

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4212274B2 (en) * 2001-12-20 2009-01-21 シャープ株式会社 Speaker identification device and video conference system including the speaker identification device
US7598975B2 (en) 2002-06-21 2009-10-06 Microsoft Corporation Automatic face extraction for use in recorded meetings timelines
JP2005277445A (en) * 2004-03-22 2005-10-06 Fuji Xerox Co Ltd Conference video image processing apparatus, and conference video image processing method and program
JP2005354541A (en) * 2004-06-11 2005-12-22 Fuji Xerox Co Ltd Display apparatus, system, and display method
JP2005352933A (en) * 2004-06-14 2005-12-22 Fuji Xerox Co Ltd Display arrangement, system, and display method
JP4656395B2 (en) * 2005-03-30 2011-03-23 カシオ計算機株式会社 Recording apparatus, recording method, and recording program
JP2007052565A (en) 2005-08-16 2007-03-01 Fuji Xerox Co Ltd Information processing system and information processing method
JP5573402B2 (en) * 2010-06-21 2014-08-20 株式会社リコー Conference support device, conference support method, conference support program, and recording medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH03162187A (en) * 1989-11-21 1991-07-12 Mitsubishi Electric Corp Video conference equipment
JP3266959B2 (en) * 1993-01-07 2002-03-18 富士ゼロックス株式会社 Electronic conference system
JPH06266632A (en) * 1993-03-12 1994-09-22 Toshiba Corp Method and device for processing information of electronic conference system
US5778082A (en) * 1996-06-14 1998-07-07 Picturetel Corporation Method and apparatus for localization of an acoustic source
JPH10145763A (en) * 1996-11-15 1998-05-29 Mitsubishi Electric Corp Conference system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS60205151A (en) * 1984-03-29 1985-10-16 Toshiba Electric Equip Corp Sun tracking device
EP0660249A1 (en) * 1993-12-27 1995-06-28 AT&T Corp. Table of contents indexing system
US5729741A (en) * 1995-04-10 1998-03-17 Golden Enterprises, Inc. System for storage and retrieval of diverse types of information obtained from different media sources which includes video, audio, and text transcriptions
WO1997001932A1 (en) * 1995-06-27 1997-01-16 At & T Corp. Method and apparatus for recording and indexing an audio and multimedia conference
US5717869A (en) * 1995-11-03 1998-02-10 Xerox Corporation Computer controlled display system using a timeline to control playback of temporal data representing collaborative activities
US5786814A (en) * 1995-11-03 1998-07-28 Xerox Corporation Computer controlled display system activities using correlated graphical and timeline interfaces for controlling replay of temporal data representing collaborative activities

Cited By (228)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2351627A (en) * 1999-03-26 2001-01-03 Canon Kk Image processing apparatus
GB2351627B (en) * 1999-03-26 2003-01-15 Canon Kk Image processing apparatus
US7117157B1 (en) 1999-03-26 2006-10-03 Canon Kabushiki Kaisha Processing apparatus for determining which person in a group is speaking
GB2351628B (en) * 1999-04-14 2003-10-01 Canon Kk Image and sound processing apparatus
GB2351628A (en) * 1999-04-14 2001-01-03 Canon Kk Image and sound processing apparatus
US7113201B1 (en) 1999-04-14 2006-09-26 Canon Kabushiki Kaisha Image processing apparatus
US9646614B2 (en) 2000-03-16 2017-05-09 Apple Inc. Fast, language-independent method for user authentication by voice
WO2002013522A2 (en) * 2000-08-10 2002-02-14 Quindi Audio and video notetaker
WO2002013522A3 (en) * 2000-08-10 2003-10-30 Quindi Audio and video notetaker
EP1427205A4 (en) * 2001-09-14 2006-10-04 Sony Corp Network information processing system and information processing method
EP1427205A1 (en) * 2001-09-14 2004-06-09 Sony Corporation Network information processing system and information processing method
FR2849564A1 (en) * 2002-12-31 2004-07-02 Droit In Situ METHOD AND SYSTEM FOR PRODUCING A MULTIMEDIA EDITION BASED ON ORAL SERVICES
WO2004062285A1 (en) * 2002-12-31 2004-07-22 Dahan Templier Jennifer Method and system for producing a multimedia publication on the basis of oral material
GB2429133B (en) * 2004-08-31 2007-08-29 Sony Corp Recording and reproduction device
US7636121B2 (en) 2004-08-31 2009-12-22 Sony Corporation Recording and reproducing device
GB2429133A (en) * 2004-08-31 2007-02-14 Sony Corp Method and device for indexing image data to associated audio data
EP1906707A1 (en) * 2005-07-08 2008-04-02 Yamaha Corporation Audio transmission system and communication conference device
EP1906707A4 (en) * 2005-07-08 2010-01-20 Yamaha Corp Audio transmission system and communication conference device
US8208664B2 (en) 2005-07-08 2012-06-26 Yamaha Corporation Audio transmission system and communication conference device
US10318871B2 (en) 2005-09-08 2019-06-11 Apple Inc. Method and apparatus for building an intelligent automated assistant
US9117447B2 (en) 2006-09-08 2015-08-25 Apple Inc. Using event alert text as input to an automated assistant
US8942986B2 (en) 2006-09-08 2015-01-27 Apple Inc. Determining user intent based on ontologies of domains
US8930191B2 (en) 2006-09-08 2015-01-06 Apple Inc. Paraphrasing of user requests and results by automated digital assistant
US10568032B2 (en) 2007-04-03 2020-02-18 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US10381016B2 (en) 2008-01-03 2019-08-13 Apple Inc. Methods and apparatus for altering audio output signals
US9626955B2 (en) 2008-04-05 2017-04-18 Apple Inc. Intelligent text-to-speech conversion
US9865248B2 (en) 2008-04-05 2018-01-09 Apple Inc. Intelligent text-to-speech conversion
US9535906B2 (en) 2008-07-31 2017-01-03 Apple Inc. Mobile device having human language translation capability with positional feedback
US10108612B2 (en) 2008-07-31 2018-10-23 Apple Inc. Mobile device having human language translation capability with positional feedback
US9959870B2 (en) 2008-12-11 2018-05-01 Apple Inc. Speech recognition involving a mobile device
US11080012B2 (en) 2009-06-05 2021-08-03 Apple Inc. Interface for a virtual digital assistant
US10475446B2 (en) 2009-06-05 2019-11-12 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US10795541B2 (en) 2009-06-05 2020-10-06 Apple Inc. Intelligent organization of tasks items
US9858925B2 (en) 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US10283110B2 (en) 2009-07-02 2019-05-07 Apple Inc. Methods and apparatuses for automatic speech recognition
US8560309B2 (en) 2009-12-29 2013-10-15 Apple Inc. Remote conferencing center
US10679605B2 (en) 2010-01-18 2020-06-09 Apple Inc. Hands-free list-reading by intelligent automated assistant
US8892446B2 (en) 2010-01-18 2014-11-18 Apple Inc. Service orchestration for intelligent automated assistant
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US11423886B2 (en) 2010-01-18 2022-08-23 Apple Inc. Task flow identification based on user intent
US10496753B2 (en) 2010-01-18 2019-12-03 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10705794B2 (en) 2010-01-18 2020-07-07 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10553209B2 (en) 2010-01-18 2020-02-04 Apple Inc. Systems and methods for hands-free notification summaries
US10706841B2 (en) 2010-01-18 2020-07-07 Apple Inc. Task flow identification based on user intent
US9548050B2 (en) 2010-01-18 2017-01-17 Apple Inc. Intelligent automated assistant
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US8903716B2 (en) 2010-01-18 2014-12-02 Apple Inc. Personalized vocabulary for digital assistant
US10607141B2 (en) 2010-01-25 2020-03-31 Newvaluexchange Ltd. Apparatuses, methods and systems for a digital conversation management platform
US10984326B2 (en) 2010-01-25 2021-04-20 Newvaluexchange Ltd. Apparatuses, methods and systems for a digital conversation management platform
US10984327B2 (en) 2010-01-25 2021-04-20 Newvaluexchange Ltd. Apparatuses, methods and systems for a digital conversation management platform
US11410053B2 (en) 2010-01-25 2022-08-09 Newvaluexchange Ltd. Apparatuses, methods and systems for a digital conversation management platform
US9424862B2 (en) 2010-01-25 2016-08-23 Newvaluexchange Ltd Apparatuses, methods and systems for a digital conversation management platform
US9424861B2 (en) 2010-01-25 2016-08-23 Newvaluexchange Ltd Apparatuses, methods and systems for a digital conversation management platform
US10607140B2 (en) 2010-01-25 2020-03-31 Newvaluexchange Ltd. Apparatuses, methods and systems for a digital conversation management platform
US9431028B2 (en) 2010-01-25 2016-08-30 Newvaluexchange Ltd Apparatuses, methods and systems for a digital conversation management platform
US8977584B2 (en) 2010-01-25 2015-03-10 Newvaluexchange Global Ai Llp Apparatuses, methods and systems for a digital conversation management platform
US10049675B2 (en) 2010-02-25 2018-08-14 Apple Inc. User profiling for voice input processing
US9633660B2 (en) 2010-02-25 2017-04-25 Apple Inc. User profiling for voice input processing
US8452037B2 (en) 2010-05-05 2013-05-28 Apple Inc. Speaker clip
US10063951B2 (en) 2010-05-05 2018-08-28 Apple Inc. Speaker clip
US9386362B2 (en) 2010-05-05 2016-07-05 Apple Inc. Speaker clip
US8866867B2 (en) 2010-09-15 2014-10-21 Zte Corporation Method and apparatus for video recording in video calls
EP2557778A4 (en) * 2010-09-15 2014-01-15 Zte Corp Method and apparatus for video recording in video calls
EP2557778A1 (en) * 2010-09-15 2013-02-13 ZTE Corporation Method and apparatus for video recording in video calls
US8644519B2 (en) 2010-09-30 2014-02-04 Apple Inc. Electronic devices with improved audio
US10762293B2 (en) 2010-12-22 2020-09-01 Apple Inc. Using parts-of-speech tagging and named entity recognition for spelling correction
GB2486793B (en) * 2010-12-23 2017-12-20 Samsung Electronics Co Ltd Moving image photographing method and moving image photographing apparatus
GB2486793A (en) * 2010-12-23 2012-06-27 Samsung Electronics Co Ltd Identifying a speaker via mouth movement and generating a still image
US8687076B2 (en) 2010-12-23 2014-04-01 Samsung Electronics Co., Ltd. Moving image photographing method and moving image photographing apparatus
US10102359B2 (en) 2011-03-21 2018-10-16 Apple Inc. Device access using voice authentication
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US8811648B2 (en) 2011-03-31 2014-08-19 Apple Inc. Moving magnet audio transducer
US9674625B2 (en) 2011-04-18 2017-06-06 Apple Inc. Passive proximity detection
US9007871B2 (en) 2011-04-18 2015-04-14 Apple Inc. Passive proximity detection
US10032066B2 (en) 2011-04-18 2018-07-24 Intelmate Llc Secure communication systems and methods
US9225701B2 (en) 2011-04-18 2015-12-29 Intelmate Llc Secure communication systems and methods
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
US11120372B2 (en) 2011-06-03 2021-09-14 Apple Inc. Performing actions associated with task items that represent tasks to perform
US10706373B2 (en) 2011-06-03 2020-07-07 Apple Inc. Performing actions associated with task items that represent tasks to perform
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US10402151B2 (en) 2011-07-28 2019-09-03 Apple Inc. Devices with enhanced audio
US10771742B1 (en) 2011-07-28 2020-09-08 Apple Inc. Devices with enhanced audio
US9798393B2 (en) 2011-08-29 2017-10-24 Apple Inc. Text correction processing
US8989428B2 (en) 2011-08-31 2015-03-24 Apple Inc. Acoustic systems in electronic devices
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US10284951B2 (en) 2011-11-22 2019-05-07 Apple Inc. Orientation-based audio
US8903108B2 (en) 2011-12-06 2014-12-02 Apple Inc. Near-field null and beamforming
US9020163B2 (en) 2011-12-06 2015-04-28 Apple Inc. Near-field null and beamforming
EP2709357A4 (en) * 2012-01-16 2014-11-12 Huawei Tech Co Ltd Conference recording method and conference system
EP2709357A1 (en) * 2012-01-16 2014-03-19 Huawei Technologies Co., Ltd Conference recording method and conference system
US10134385B2 (en) 2012-03-02 2018-11-20 Apple Inc. Systems and methods for name pronunciation
US9483461B2 (en) 2012-03-06 2016-11-01 Apple Inc. Handling speech synthesis of content for multiple languages
US9746916B2 (en) 2012-05-11 2017-08-29 Qualcomm Incorporated Audio user interaction recognition and application interface
WO2013169618A1 (en) * 2012-05-11 2013-11-14 Qualcomm Incorporated Audio user interaction recognition and context refinement
US10073521B2 (en) 2012-05-11 2018-09-11 Qualcomm Incorporated Audio user interaction recognition and application interface
WO2013169621A1 (en) * 2012-05-11 2013-11-14 Qualcomm Incorporated Audio user interaction recognition and context refinement
US9736604B2 (en) 2012-05-11 2017-08-15 Qualcomm Incorporated Audio user interaction recognition and context refinement
US9953088B2 (en) 2012-05-14 2018-04-24 Apple Inc. Crowd sourcing information to fulfill user requests
US10079014B2 (en) 2012-06-08 2018-09-18 Apple Inc. Name recognition system
US9495129B2 (en) 2012-06-29 2016-11-15 Apple Inc. Device, method, and user interface for voice-activated navigation and browsing of a document
US9576574B2 (en) 2012-09-10 2017-02-21 Apple Inc. Context-sensitive handling of interruptions by intelligent digital assistant
US9971774B2 (en) 2012-09-19 2018-05-15 Apple Inc. Voice-based media searching
US9820033B2 (en) 2012-09-28 2017-11-14 Apple Inc. Speaker assembly
US8858271B2 (en) 2012-10-18 2014-10-14 Apple Inc. Speaker interconnect
US9357299B2 (en) 2012-11-16 2016-05-31 Apple Inc. Active protection for acoustic device
US8942410B2 (en) 2012-12-31 2015-01-27 Apple Inc. Magnetically biased electromagnet for audio applications
US10199051B2 (en) 2013-02-07 2019-02-05 Apple Inc. Voice trigger for a digital assistant
US10978090B2 (en) 2013-02-07 2021-04-13 Apple Inc. Voice trigger for a digital assistant
US11499255B2 (en) 2013-03-13 2022-11-15 Apple Inc. Textile product having reduced density
US9368114B2 (en) 2013-03-14 2016-06-14 Apple Inc. Context-sensitive handling of interruptions
US9697822B1 (en) 2013-03-15 2017-07-04 Apple Inc. System and method for updating an adaptive speech recognition model
US9922642B2 (en) 2013-03-15 2018-03-20 Apple Inc. Training an at least partial voice command system
US9966060B2 (en) 2013-06-07 2018-05-08 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
US9620104B2 (en) 2013-06-07 2017-04-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9633674B2 (en) 2013-06-07 2017-04-25 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
US9966068B2 (en) 2013-06-08 2018-05-08 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US10657961B2 (en) 2013-06-08 2020-05-19 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US10185542B2 (en) 2013-06-09 2019-01-22 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US9300784B2 (en) 2013-06-13 2016-03-29 Apple Inc. System and method for emergency calls initiated by voice command
US10791216B2 (en) 2013-08-06 2020-09-29 Apple Inc. Auto-activating smart responses based on activities from remote devices
US10063977B2 (en) 2014-05-12 2018-08-28 Apple Inc. Liquid expulsion from an orifice
US9620105B2 (en) 2014-05-15 2017-04-11 Apple Inc. Analyzing audio input for efficient speech and music recognition
US10592095B2 (en) 2014-05-23 2020-03-17 Apple Inc. Instantaneous speaking of content on touch devices
US9502031B2 (en) 2014-05-27 2016-11-22 Apple Inc. Method for supporting dynamic grammars in WFST-based ASR
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
US10078631B2 (en) 2014-05-30 2018-09-18 Apple Inc. Entropy-guided text prediction using combined word and character n-gram language models
US9734193B2 (en) 2014-05-30 2017-08-15 Apple Inc. Determining domain salience ranking from ambiguous words in natural speech
US10497365B2 (en) 2014-05-30 2019-12-03 Apple Inc. Multi-command single utterance input method
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US10169329B2 (en) 2014-05-30 2019-01-01 Apple Inc. Exemplar-based natural language processing
US9633004B2 (en) 2014-05-30 2017-04-25 Apple Inc. Better resolution when referencing to concepts
US9760559B2 (en) 2014-05-30 2017-09-12 Apple Inc. Predictive text input
US10289433B2 (en) 2014-05-30 2019-05-14 Apple Inc. Domain specific language for encoding assistant dialog
US9966065B2 (en) 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
US11257504B2 (en) 2014-05-30 2022-02-22 Apple Inc. Intelligent assistant for home automation
US11133008B2 (en) 2014-05-30 2021-09-28 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US10083690B2 (en) 2014-05-30 2018-09-25 Apple Inc. Better resolution when referencing to concepts
US9785630B2 (en) 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US10904611B2 (en) 2014-06-30 2021-01-26 Apple Inc. Intelligent automated assistant for TV user interactions
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US9668024B2 (en) 2014-06-30 2017-05-30 Apple Inc. Intelligent automated assistant for TV user interactions
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10431204B2 (en) 2014-09-11 2019-10-01 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US9986419B2 (en) 2014-09-30 2018-05-29 Apple Inc. Social reminders
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US10362403B2 (en) 2014-11-24 2019-07-23 Apple Inc. Mechanically actuated panel acoustic system
US9525943B2 (en) 2014-11-24 2016-12-20 Apple Inc. Mechanically actuated panel acoustic system
US11556230B2 (en) 2014-12-02 2023-01-17 Apple Inc. Data detection
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US10311871B2 (en) 2015-03-08 2019-06-04 Apple Inc. Competing devices responding to voice triggers
US11087759B2 (en) 2015-03-08 2021-08-10 Apple Inc. Virtual assistant activation
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10356243B2 (en) 2015-06-05 2019-07-16 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US9900698B2 (en) 2015-06-30 2018-02-20 Apple Inc. Graphene composite acoustic diaphragm
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US11500672B2 (en) 2015-09-08 2022-11-15 Apple Inc. Distributed personal assistant
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US9858948B2 (en) 2015-09-29 2018-01-02 Apple Inc. Electronic equipment with ambient noise sensing input circuitry
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US11526368B2 (en) 2015-11-06 2022-12-13 Apple Inc. Intelligent automated assistant in a messaging environment
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
NO20160989A1 (en) * 2016-06-08 2017-12-11 Pexip AS Video Conference timeline
US10049663B2 (en) 2016-06-08 2018-08-14 Apple Inc. Intelligent automated assistant for media exploration
US11069347B2 (en) 2016-06-08 2021-07-20 Apple Inc. Intelligent automated assistant for media exploration
US10354011B2 (en) 2016-06-09 2019-07-16 Apple Inc. Intelligent automated assistant in a home environment
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US11037565B2 (en) 2016-06-10 2021-06-15 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10733993B2 (en) 2016-06-10 2020-08-04 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US10297253B2 (en) 2016-06-11 2019-05-21 Apple Inc. Application integration with a digital assistant
US10269345B2 (en) 2016-06-11 2019-04-23 Apple Inc. Intelligent task discovery
US10521466B2 (en) 2016-06-11 2019-12-31 Apple Inc. Data driven natural language event detection and classification
US11152002B2 (en) 2016-06-11 2021-10-19 Apple Inc. Application integration with a digital assistant
US10089072B2 (en) 2016-06-11 2018-10-02 Apple Inc. Intelligent device arbitration and control
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
US10553215B2 (en) 2016-09-23 2020-02-04 Apple Inc. Intelligent automated assistant
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US10755703B2 (en) 2017-05-11 2020-08-25 Apple Inc. Offline personal assistant
US11405466B2 (en) 2017-05-12 2022-08-02 Apple Inc. Synchronization and task delegation of a digital assistant
US10791176B2 (en) 2017-05-12 2020-09-29 Apple Inc. Synchronization and task delegation of a digital assistant
US10410637B2 (en) 2017-05-12 2019-09-10 Apple Inc. User-specific acoustic models
US10810274B2 (en) 2017-05-15 2020-10-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
US10482874B2 (en) 2017-05-15 2019-11-19 Apple Inc. Hierarchical belief states for digital assistants
US11217255B2 (en) 2017-05-16 2022-01-04 Apple Inc. Far-field extension for digital assistant services
US11307661B2 (en) 2017-09-25 2022-04-19 Apple Inc. Electronic device with actuators for producing haptic and audio output along a device housing
US11907426B2 (en) 2017-09-25 2024-02-20 Apple Inc. Electronic device with actuators for producing haptic and audio output along a device housing
US10757491B1 (en) 2018-06-11 2020-08-25 Apple Inc. Wearable interactive audio device
US10873798B1 (en) 2018-06-11 2020-12-22 Apple Inc. Detecting through-body inputs at a wearable audio device
US11743623B2 (en) 2018-06-11 2023-08-29 Apple Inc. Wearable interactive audio device
US11740591B2 (en) 2018-08-30 2023-08-29 Apple Inc. Electronic watch with barometric vent
US11334032B2 (en) 2018-08-30 2022-05-17 Apple Inc. Electronic watch with barometric vent
US11561144B1 (en) 2018-09-27 2023-01-24 Apple Inc. Wearable electronic device with fluid-based pressure sensing
US11857063B2 (en) 2019-04-17 2024-01-02 Apple Inc. Audio output system for a wirelessly locatable tag

Also Published As

Publication number Publication date
GB2342802B (en) 2003-04-16
JP2000125274A (en) 2000-04-28
GB9916394D0 (en) 1999-09-15

Similar Documents

Publication Publication Date Title
GB2342802A (en) Indexing conference content onto a timeline
KR101238586B1 (en) Automatic face extraction for use in recorded meetings timelines
Lee et al. Portable meeting recorder
Cutler et al. Distributed meetings: A meeting capture and broadcasting system
JP3143125B2 (en) System and method for recording and playing multimedia events
US7428000B2 (en) System and method for distributed meetings
US5548346A (en) Apparatus for integrally controlling audio and video signals in real time and multi-site communication control method
US7113201B1 (en) Image processing apparatus
US7355623B2 (en) System and process for adding high frame-rate current speaker data to a low frame-rate video using audio watermarking techniques
JP3620855B2 (en) Method and apparatus for recording and indexing audio and multimedia conferences
US7362350B2 (en) System and process for adding high frame-rate current speaker data to a low frame-rate video
US20060251384A1 (en) Automatic video editing for real-time multi-point video conferencing
CN107820037B (en) Audio signal, image processing method, device and system
US7355622B2 (en) System and process for adding high frame-rate current speaker data to a low frame-rate video using delta frames
CN111193890B (en) Conference record analyzing device and method and conference record playing system
JP2006085440A (en) Information processing system, information processing method and computer program
JP4414708B2 (en) Movie display personal computer, data display system, movie display method, movie display program, and recording medium
WO2002013522A2 (en) Audio and video notetaker
Wu et al. MoVieUp: Automatic mobile video mashup
Arnaud et al. The CAVA corpus: synchronised stereoscopic and binaural datasets with head movements
Sumec Multi camera automatic video editing
JP6860178B1 (en) Video processing equipment and video processing method
TWI799048B (en) Panoramic video conference system and method
JP2000333125A (en) Editing device and recording device

Legal Events

Date Code Title Description
732E Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977)
PCNP Patent ceased through non-payment of renewal fee
Effective date: 2015-07-13