GB2342802A - Indexing conference content onto a timeline - Google Patents
- Publication number
- GB2342802A (application GB9916394A)
- Authority
- GB
- United Kingdom
- Prior art keywords
- conference
- participant
- sound
- audio
- timeline
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/38—Graded-service arrangements, i.e. some subscribers prevented from establishing certain connections
- H04M3/382—Graded-service arrangements, i.e. some subscribers prevented from establishing certain connections using authorisation codes or passwords
- H04M3/385—Graded-service arrangements, i.e. some subscribers prevented from establishing certain connections using authorisation codes or passwords using speech signals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/73—Querying
- G06F16/738—Presentation of query results
- G06F16/739—Presentation of query results in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7834—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/38—Graded-service arrangements, i.e. some subscribers prevented from establishing certain connections
- H04M3/387—Graded-service arrangements, i.e. some subscribers prevented from establishing certain connections using subscriber identification cards
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/56—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
- H04M3/567—Multimedia conference systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
- H04N7/147—Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2203/00—Aspects of automatic or semi-automatic exchanges
- H04M2203/60—Aspects of automatic or semi-automatic exchanges related to security aspects in telephonic communication systems
- H04M2203/6045—Identity confirmation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2242/00—Special services or facilities
- H04M2242/30—Determination of the location of a subscriber
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/42221—Conversation recording systems
Abstract
A method and system for indexing the content of conferences. The method includes identifying each conference participant producing a sound, capturing an image of each such participant, and correlating the images of the conference participants with the audio segments of an audio recording that correspond to the sound each participant produced. The indexing system includes a sound recording mechanism, at least one identifier of the locations of conference participants, a camera, an image storage device, a processor that associates the still images captured by the camera with the sound recorded by the sound recording mechanism, thereby correlating still images of the conference participants with the audio segments they produced, and a graphical user interface allowing easy access to the stored sound, images, and correlated data. The system may also include an aiming device for pointing the camera at the person speaking.
Description
METHOD AND APPARATUS FOR INDEXING CONFERENCE CONTENT
This invention relates to the field of multimedia.
With the advent of economical digital storage media and sophisticated video/audio decompression technology capable of running on personal computers, thousands of hours of digitized video/audio data can be stored with virtually instantaneous random access. In order for this stored data to be utilized, it must be indexed efficiently in a manner allowing a user to find desired portions of the digitized video/audio data quickly.
For recorded conferences having a number of participants, indexing is generally performed on the basis of "who" said "what" and "when" (at what time). Currently used methods of indexing do not reliably give this information, primarily because video pattern recognition, speech recognition, and speaker identification techniques are unreliable in the noisy, reverberant, uncontrolled environments in which conferences occur.
Also, a need exists for a substitute for tedious trial-and-error techniques for finding when a conference participant first starts speaking in a recording.
The invention features a method and a system for indexing the content of a conference by matching images captured during the conference to the recording of sounds produced by conference participants.
Using reliable sound source localization technology implemented with microphone arrays, the invention produces reliable information concerning "who" and "when" (which persons spoke at what time) for a conference. While information concerning "what" (subject matter) is missing, the "who-when" information greatly facilitates manual annotation for the missing "what" information. In many search-retrieval situations, the "who-when" information alone will be sufficient for indexing.
In one aspect of the invention, the method includes identifying a conference participant producing a sound, capturing a still image of the conference participant, correlating the still image of the conference participant to the audio segments of the audio recording corresponding to the sound produced by the conference participant, and generating a timeline by creating a speech-present segment representing the correlated still image and associated audio segment. Thus, the timeline includes speech-present segments representing a still image and associated audio segments. The still image is a visual representation of the sound source producing the associated audio segments.
The audio recording can be segmented into audio segment portions and associated with conference participants, whose images are captured, for example, with a video camera.
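The correlation described above can be sketched as a simple grouping of located audio segments under the still image captured for each sound source. This is an illustrative sketch, not the patented implementation; the `AudioSegment` and `ParticipantTrack` names and the one-image-per-source policy are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class AudioSegment:
    start: float      # elapsed seconds from the start of the conference
    end: float
    source_id: int    # locality identified by the source locator

@dataclass
class ParticipantTrack:
    source_id: int
    still_image: str                      # reference to the captured frame
    segments: list = field(default_factory=list)

def build_timeline(segments, images):
    """Group audio segments under the still image captured for each source,
    yielding one speech-present track per conference participant."""
    tracks = {}
    for seg in segments:
        if seg.source_id not in tracks:
            tracks[seg.source_id] = ParticipantTrack(seg.source_id,
                                                     images[seg.source_id])
        tracks[seg.source_id].segments.append((seg.start, seg.end))
    return tracks
```

Each resulting track corresponds to one timeline row: a visual representation of the sound source plus the intervals during which it produced sound.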
Embodiments of this aspect of the invention may include one or more of the following features.
The still image of each conference participant producing a sound is captured as a segment of a continuous video recording of the conference, thereby establishing a complete visual indicator of all speakers participating in a conference.
The timeline is presented visually so that a user can quickly and easily access individual segments of the continuous recording of the conference.
The timeline can include a colored line or bar representing the duration of each speech segment with a correlated image to index the recorded conference. The timeline can be presented as a graphical user interface (GUI), so that the user can use an input device (for example, a mouse) to select or highlight the appropriate part of the timeline corresponding to the start of the desired recording, access that part, and start playing the recording. Portions of the audio and video recordings can be played on a playback monitor.
Various approaches can be used to identify a conference participant. In one embodiment, a microphone array is used to locate the conference participant by sound.
The microphone arrays together with reliable sound source localization technology reliably and accurately estimate the position and presence of sound sources in space.
The time elapsed from a start of the conference is stored with each audio segment and each still image. An indexing engine can be provided to generate the timeline by matching the elapsed time associated with an audio segment and a still image.
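The matching of elapsed times by the indexing engine might be sketched as a nearest-timestamp lookup; the one-second tolerance and the `match_image` helper are hypothetical choices, not figures from the patent:

```python
import bisect

def match_image(image_times, seg_start, tolerance=1.0):
    """Return the index of the still image whose capture time (elapsed
    seconds, sorted ascending) is closest to the start of an audio segment,
    or None if no image falls within `tolerance` seconds."""
    i = bisect.bisect_left(image_times, seg_start)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(image_times)]
    if not candidates:
        return None
    best = min(candidates, key=lambda j: abs(image_times[j] - seg_start))
    return best if abs(image_times[best] - seg_start) <= tolerance else None
```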
The system can be used to index a conference with only one participant. The timeline then includes an indication of the times in which sound was produced, as well as an image of the lone participant.
In applications in which more than one conference participant is present and identified, the system stores, with each still image, the time elapsed from the start of the conference and an identification of when the speaker begins speaking, a participant being associated with each image.
The elapsed time is also stored with the audio recording each time a change in sound location is identified. The indexing engine creates an index, that is, a list of associated images and sound segments. Based on this index, a timeline is then generated for each still image (that is, each conference participant) designating the times from the start of the conference when the participant speaks. The timeline also indicates any other conference participant who might also appear in the still image (for example, a neighbor sitting in close proximity to the speaker), but is silent at the particular elapsed time, thus giving a comprehensive overview of the sounds produced by all conference participants, as well as helping identify all persons present in the still images. The timeline may be generated either in real time or after the conference is finished.
In embodiments in which a video camera is used to capture still images of the conference participants, it can also be used to record a continuous video recording of the conference.
The system can be used for a conference with all participants in one room (near-end participants) as well as for a conference with participants (far-end participants) at another site.
Assuming that a speaker has limited movement during a conference, the same person is assumed to be talking every time sound is detected from a particular locality. Thus, if the speech source is determined to be the same as the locality of a previously detected conference participant, a speech-present segment is added to the timeline for the previously detected conference participant. If the location of a conference participant is different from a previously detected location of a near-end conference participant, a still image of the new near-end conference participant is stored and a new timeline is started for the new near-end conference participant.
In a video conference involving a far-end participant, the audio source is a loudspeaker at the near end transmitting a sound from a far-end speech source. The timeline is then associated with the far end, and generating a timeline includes creating a speech-present segment for the far end if a far-end speech source is present. Thus, a user of the invention can identify and access far-end speech segments. Further, if a far-end speech source is involved in the conference, echo can be suppressed by subtracting a block of accumulated far-end loudspeaker data from a block of accumulated near-end microphone array data.
Advantageously, therefore, a video image of a display presented at the conference is captured, and a timeline is generated for the captured video image of the display. This enables the indexing of presentation material as well as sounds produced by conference participants.
The present invention is illustrated in the following figures.
Fig. 1 is a schematic representation of a videoconferencing embodiment using two microphone arrays;
Fig. 2 is a block diagram of the computer which performs some of the functions illustrated in Fig. 1 ;
Fig. 3 is an exemplary display showing timelines generated during a videoconference; and
Fig. 4 is a flow diagram illustrating operation of the microphone array conference indexing method.
While the description which follows is associated with a videoconference between the local or near-end site and a distant or far-end site, the invention can be used with a single site conference as well.
Referring then to Fig. 1, a videoconference indexing system 10 (shown enclosed by dashed lines) is used to record and index a videoconference having, in this particular embodiment, four conference participants 62, 64, 66, 68 sitting around a table 60.
One or more far-end conference participants (not shown) also participate in the conference through the use of a local videoconferencing system 20 connected over a communication channel 16 to a far-end video conferencing system 18. The communication channel 16 connects the far-end video conferencing system to the near-end videoconferencing system 20 and far-end decompressed audio is available to a source locator 22.
Videoconference indexing system 10 includes videoconferencing system 20, a computer 30, and a playback system 50. Videoconferencing system 20 includes a display monitor 21 and a loudspeaker 23 for allowing the far-end conference participant to be seen and heard by conference participants 62, 64, 66, and 68. In an alternative embodiment, the arrangement shown in Fig. 1 is used to record a meeting not in a conference-call mode, so the need for the display monitor 21 and loudspeaker 23 of videoconferencing system 20 is eliminated. System 20 also includes microphone arrays 12, 14 for acquiring sound (for example, participants' speech), the source locator 22 for determining the location of a sound-producing conference participant, and a video camera 24 for capturing video images of the setting and participants as part of a continuous video recording. In one embodiment, source locator 22 is standalone hardware, called "LIMELIGHT", manufactured and sold by PictureTel Corporation, which is a videoconferencing unit having an integrated motorized camera and microphone array. The "LIMELIGHT" locator 22 has a digital signal processing (DSP) integrated circuit which efficiently implements the source locator function, receiving electrical signals representing sound picked up in the room and outputting source location parameters. Further details of the structure and implementation of the "LIMELIGHT" system are described in U.S. 5,778,082, the contents of which are incorporated herein by reference. (In other embodiments of the invention, multiple camera and microphone configurations can be used.)
Alternative methods can be used to fulfill the function of source locator 22. For example, a camera video pattern recognition algorithm can be used to identify the location of an audio source, based on mouth movements. In another embodiment of the invention, an infrared motion detector can be used to identify an audio source location, for example to detect a speaker approaching a podium.
Computer 30 includes an audio storage 32 and a video storage 34 for storing audio and video data provided from microphone arrays 12, 14 and video camera 24, respectively.
Computer 30 also includes an indexing engine software module 40 whose operations will be discussed in greater detail below.
Referring to Fig. 2, the hardware for computer 30 used to store and process data and computer instructions is shown. In particular, computer 30 includes a processor 31, a memory storage 33, and a working memory 35, all of which are connected by an interface bus 37. Memory storage 33, typically a disk drive, is used for storing the audio and video data provided from microphone arrays 12,14 and camera 24, respectively, and thus includes audio storage 32 and video storage 34. In operation, indexing engine software 40 is loaded into working memory 35, typically RAM, from memory storage 33 so that the computer instructions from the indexing engine can be processed by processor 31. Computer 30 serves as an intermediate storage facility which records, compresses, and combines the audio, video, and indexing information data as the actual conference occurs.
Referring again to Fig. 1, playback system 50 is connected to computer 30 and includes a playback display 52 and a playback server 54, which together allow the recording of the videoconference to be reviewed quickly and accessed at a later time.
Although a more detailed description of the operation is provided below, in general, microphone arrays 12, 14 generate signals, in response to sound generated in the videoconference, which are sent to source locator 22.
Source locator 22, in turn, transmits signals representative of the location of a sound source both to a pointing mechanism 26 connected to video camera 24 and to computer 30. These signals are transmitted along lines 27 and 28, respectively. Pointing mechanism 26 includes motors which, in the most general case, control panning, tilting, zooming, and auto-focus functions of the video camera (subsets of these functions can also be used). Further details of pointing mechanism 26 are described in U.S. 5,633,681, incorporated herein by reference. Video camera 24, in response to the signals from source locator 22, is then pointed, by pointing mechanism 26, in the direction of the conference participant who is the current sound source.
Images of the conference participant captured by video camera 24 are stored in video storage 34 as video data, along with an indication of the time which has elapsed from the start of the conference.
Simultaneously, the sound picked up by microphone arrays 12, 14 is transmitted to and stored in audio storage 32, also along with the time which has elapsed from the start of the conference until the beginning of each new sound segment. Thus, the elapsed time is stored with each sound segment in audio storage 32. A new sound segment corresponds to each change, determined by the source locator 22, in the detected location of the sound source.
In order to minimize storage requirements, both the audio and video data are stored, in this illustrated embodiment, in a compressed format. If further storage minimization is necessary, only those portions of the videoconference during which speech is detected will be stored, and further, if necessary, the video data, other than the conference participant still images, need not be stored.
Although the embodiment illustrated in Fig. 1 uses one camera, more than one camera can be used to capture the video images of conference participants. This approach is especially useful for cases where one participant may block a camera's view of another participant. Alternatively, a separate camera can be dedicated to recording, for example viewgraphs or whiteboard drawings, shown during the course of a conference.
As noted above, audio storage 32 and video storage 34 are both part of computer 30 and the stored audio and video images are available to both the indexing engine 40 and playback system 50. The latter includes the playback display 52 and the playback server 54 as noted above.
Indexing engine 40 associates the stored video images with the stored sound segments based on elapsed time from the start of the conference, and generates a file with indexing information; it indexes compressed audio and video data using a protocol such as, for example, the AVI format. For long-term storage, audio, video, and indexing information is transmitted from computer 30 to the playback server 54 for access by users of the system. Playback server 54 can retrieve from its own memory the audio and video data when requested by a user. Playback server 54 stores data from the conference in such a way as to make it quickly available to many users on a computer network. In one embodiment, playback server 54 includes many computers, with a library of multimedia files distributed across the computers. A user can access playback server 54, as well as the information generated by the indexing engine 40, by using GUI 45 with a GUI display 47. Then, the playback display terminal 52 is used to display video data stored in video storage 34 and to play audio data stored in audio storage 32; playback display 52 is also used to display video data and to play audio data stored in playback server 54.
Alternatively, instead of using video images for indexing, an icon is generated based on a still image selected from the continuous video recording. Then, the icon of the conference participant is associated with the audio segment generated by that conference participant. Thus the system builds a database index associating, with each identified sound source and its representative icon or image, a sequence of elapsed times and time durations for each instance when the participant was a "sound source".
The elapsed times and the durations can be used to access the stored audio and video as described in detail below.
One feature of the invention is to index conference content using the identification of various sound sources and their locations. In the embodiment shown in Fig. 1, the identification and location of sound sources is achieved by the source locator 22 and the two microphone arrays 12, 14. Each microphone array is a PictureTel "LimeLight" array having four microphones, one positioned at each vertex of an inverted "T" and one at the intersection of its two linear portions. In this illustrated embodiment, the inverted-T array has a height of 12 inches and a width of 18 inches. Arrays of this type are described in U.S. Patent 5,778,082 by Chu et al., the contents of which are incorporated herein by reference.
In other embodiments, other microphone array position estimation procedures and microphone array configurations, with different structures and techniques of estimating spatial location, can be used to locate a sound source. For example, a microphone can be situated close to each conference participant, and any microphone with a sufficiently loud signal indicates that the particular person associated with that microphone is speaking.
Accurate time-of-arrival differences of emitted sound in the room are obtained between selected combinations of microphone pairs in each microphone array 12, 14 by the use of a highly modified cross-correlation technique (modified for robustness to room echo and background noise degradation) as described in U.S. 5,778,082. Assuming plane sound waves (the far-field assumption), these pairs of time differences can be translated by source locator 22 into bearing angles from the respective array. The angles provide an estimate of the location of the sound source in three-dimensional space.
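Under the far-field assumption, a single microphone pair's time difference maps to a bearing angle via the arccosine of the normalized delay. The sketch below illustrates only this last geometric step; the patent's modified cross-correlation that produces the delay is not reproduced, and the clamping policy and speed-of-sound constant are illustrative assumptions:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at room temperature (assumed)

def bearing_from_delay(tau, mic_spacing):
    """Far-field bearing angle (radians) of a sound source relative to the
    axis of a microphone pair.  tau is the time-of-arrival difference in
    seconds; mic_spacing is the distance between the microphones in metres."""
    x = SPEED_OF_SOUND * tau / mic_spacing
    x = max(-1.0, min(1.0, x))  # clamp against delay-estimation noise
    return math.acos(x)
```

A zero delay corresponds to a broadside source (90 degrees); a delay equal to the acoustic travel time across the pair corresponds to an endfire source (0 degrees).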
In the embodiment shown in Fig. 1, the sound is picked up by a microphone array integrated with the sound localization array, so that the microphone arrays serve double duty as both sound localization and sound pick-up apparatus. However, in other embodiments, one microphone or microphone array can be used for recording while another microphone or microphone array can be used for sound localization.
Although two microphone arrays 12, 14 are shown in use with videoconferencing indexing system 10, only one array is required. In other embodiments, the number and configurations of microphone arrays may vary, for example, from one microphone to many. Using more than one array provides advantages. In particular, while the azimuth and elevation angles provided by each of arrays 12, 14 are highly accurate and are estimated to within a fraction of a degree, range estimates are not nearly as accurate. Even though the range error is higher, however, the information is sufficient for use with pointing mechanism 26.
However, the larger range estimation error of the microphone arrays gives rise to sound source ambiguity problems for a single microphone array. Thus, with reference to Fig. 1, microphone array 12 might view persons 66, 68 as the same person, since their difference in range to microphone array 12 might be less than the range error of array 12. To address this problem, source localization estimates from microphone array 14 could be used by source locator 22 as a second source of information to separate persons 66 and 68, since persons 66 and 68 are separated substantially in azimuth angle from the viewpoint of microphone array 14.
An alternative approach to indexing by sound source location is to use manual camera position commands such as pan/tilt commands and presets to index the meeting. These commands in general may indicate a change in content whereby a change in camera position is indicative of a change in sound source location.
Fig. 3 shows an example of a display 80, viewed on GUI display 47 (Fig. 1), resulting from a videoconference.
The following features, included in the display 80, indicate to a user of system 10 exactly who was speaking and when that person spoke. Horizontal axis 99 is a time scale, representing the actual time during the recorded conference.
Pictures of conference participants appear along the vertical axis of display 80. Indexing engine 40 (Fig. 1) selects and extracts from video storage 34 pictures 81, 83, 85 of conference participants 62, 64, 66, on the basis of elapsed time from the start of the conference and the beginning of new sound segments. These pictures represent the conference participant(s) producing the sound segment(s). Pictures 81, 83, 85 are single still frames from a continuous video recording captured by video camera 24 and stored in video storage 34. A key criterion for the selection of images is the elapsed time from the start of the conference to the beginning of each respective sound segment: the pictures selected for the timeline are the ones captured at the same elapsed time as the beginning of each respective sound segment.
Display 80 includes a picture 87, denoting a far-end conference participant. This image, too, is selected by the indexing engine 40. It can be an image of the far-end conference participant, if images from a far-end camera are available. Alternatively, it can be an image of a logo, a photograph, etc., captured by a near-end camera.
Display 80 also includes a block 89 representing, for example, data presented by one of the conference participants at the conference. Data content can be recorded by use of an electronic viewgraph display system (not shown) which provides signals to videoconferencing system 20. Alternatively, a second camera can be used to record slides presented with a conventional viewgraph. The slides, greatly reduced in size, would then form part of display 80.
Associated with each picture 81, 83, 85, 87 and block 89 are line segments representing when sound corresponding to each respective picture occurred. For example, segments 90, 92, 92', and 94 represent the duration of sound produced by three conference participants, e.g. 62, 64, and 66 of Fig. 1. Segment 96 represents sounds produced by a far-end conference participant (not shown in Fig. 1).
Segments 97 and 98, on the other hand, show when data content was displayed during the presentation and show a representation of the data content. The segments may be different colors, with different meaning assigned to each color. For example, a blue line could represent a near-end sound source, and a red line could represent a far-end sound source. In essence, the pictures and blocks, together with the segments, provide a series of timelines for each conference participant and presented data block.
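A timeline of per-participant speech-present segments like the one in display 80 can be mocked up in text form. The rendering below is purely illustrative and stands in for the colored bars of the GUI; the row format and character choices are assumptions:

```python
def render_timeline(tracks, total, width=40):
    """Render one text row per participant: '#' where speech is present,
    '-' where it is absent.  tracks maps a participant label to a list of
    (start, end) intervals; total is the conference duration, all in
    the same time unit."""
    rows = []
    for name, segs in tracks.items():
        cells = ["-"] * width
        for start, end in segs:
            a = int(start / total * width)
            b = max(a + 1, int(end / total * width))
            for i in range(a, min(b, width)):
                cells[i] = "#"
        rows.append(f"{name:>10} |{''.join(cells)}|")
    return "\n".join(rows)
```

In the actual display, clicking a point along such a row would start playback of the corresponding portion of the recording.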
In display 80, the content of what each person 62, 64,66 said is not presented, but this information can, if desired, be filled in after-the-fact by manual annotation, such as a note on the display 80 through the GUI 45 at each speech segment 90,92,92', and 94.
A user can view display 80 using GUI 45, GUI display 47, and playback display 52. In particular, the user can click a mouse or other input device (for example, a trackball or cursor control keys on a keyboard) on any point in segments 90,92,92', 94,96,97, and 98 in the display 80 to access and playback or display that portion of the stored conference file.
A flow diagram of a method 100, according to the invention, is presented in Fig. 4. Method 100 of Fig. 4 is generic to system operation, and could be applied to a wide variety of different microphone array configurations. With reference also to Figs. 1-3, the operation of the system will be described.
In operation, audio is simultaneously acquired from both the far end and the near end of a videoconference.
From the far end, audio is continuously acquired for successive preselected durations of time as it is received by videoconferencing system 20 (step 101). Audio received from the far-end videoconferencing system 18 is thus directed to the source locator 22 (step 102). The source locator analyzes the frequency components of far-end audio signals. The onset of a new segment is characterized by i) the magnitude of a particular frequency component being greater than the background noise for that frequency and ii) the magnitude of a particular frequency component being greater than the magnitude of the same component acquired during a predetermined number of preceding time frames. If speech is present, an audio segment (e.g., segment 96 in Fig. 3) is begun (step 103) for the timeline corresponding to audio produced by the far-end conference participant(s).
An audio segment is continued for the timeline, corresponding to a far-end conference participant, if speech continues to be present at the far-end and there has been no temporal interruption since the beginning of the previously started audio segment.
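The two-part onset test described for step 103 can be sketched directly on a magnitude spectrum. The margin value and data layout below are assumptions for illustration, not figures from the patent:

```python
def is_speech_onset(frame_mags, noise_floor, history, margin=3.0):
    """Apply the two onset conditions to a frame's magnitude spectrum:
    (i) some frequency component exceeds its background-noise estimate by
    a margin, and (ii) the same component exceeds its value in every one
    of the preceding frames kept in `history`.  All arguments are lists
    of per-component magnitudes; the margin is an assumed threshold."""
    for k, mag in enumerate(frame_mags):
        if mag > margin * noise_floor[k] and all(mag > past[k] for past in history):
            return True
    return False
```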
While the preselected durations of far-end audio are being acquired (step 101) and analyzed, the system simultaneously acquires successive N-second durations of audio from microphone arrays 12, 14 (step 104). Because the audio from the far-end site can interfere with near-end detection of audio in the room, the far-end signal received through the microphone arrays is suppressed by the subtraction of a block of N-second durations of far-end audio from the acquired near-end audio (step 105). In this way, false sound localization of the loudspeaker as a "person" (audio source) will not occur. Echo suppression will not affect a signal resulting from two near-end participants speaking simultaneously. In this case, the source locator locates both participants, locates the stronger of the two, or does nothing.
Echo suppression can be implemented with adaptive filters, or by use of a bandpass filter bank (not shown) with band-by-band gating (setting to zero those bands with significant far-end energy, so that processing occurs only on bands whose far-end energy is near the far-end background noise level), as is well known to those skilled in the art. Methods for achieving both adaptive filtering and echo suppression are described in U.S. 5,305,307 by Chu, the contents of which are incorporated herein by reference.
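The band-by-band gating variant can be sketched as a simple mask over per-band energies. The function and parameter names, and the margin factor, are illustrative assumptions; a real filter bank would supply the band energies.

```python
import numpy as np

def gate_bands(near_bands, far_bands, far_noise_floor, margin=2.0):
    """Band-by-band echo gating sketch (names and margin are illustrative).

    Bands whose far-end energy is well above the far-end background-noise
    level are zeroed; only bands with far-end energy near the noise floor
    pass through for near-end processing.
    """
    # bands currently dominated by loudspeaker (far-end) audio
    active_far = far_bands > margin * far_noise_floor
    gated = near_bands.copy()
    gated[active_far] = 0.0  # suppress those bands in the near-end signal
    return gated
```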
The detection and location of speech from a near-end source are determined (step 106) using source locator 22 and microphone arrays 12, 14. If speech is detected, then source locator 22 estimates the spatial location of the speech source (step 107). Further details of the manner in which source location is accomplished are described in U.S. 5,778,082. This method involves estimating the time delay between signals arriving at a pair of microphones from a common source. As described in connection with the far-end audio analysis, a near-end speech source is detected if the magnitude of a frequency component is significantly greater than the background noise for that frequency, and if the magnitude of that frequency component is greater than that acquired for the same frequency in a predetermined number of preceding time frames. The fulfillment of both conditions signifies the start of a speech segment from a particular speech source. The speech source location is calculated by comparing the time delay of the signals received at microphone arrays 12, 14, as determined by source locator 22.
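The time-delay estimation step can be illustrated with a basic cross-correlation peak search. This is only a sketch of the general technique; the method of U.S. 5,778,082 referenced above is more elaborate.

```python
import numpy as np

def estimate_delay(sig_a, sig_b, fs):
    """Estimate the arrival-time delay (seconds) between two microphone
    signals by locating the peak of their cross-correlation.

    A positive result means sig_a is a delayed copy of sig_b, i.e. the
    source is closer to microphone b.
    """
    corr = np.correlate(sig_a, sig_b, mode="full")
    # convert the full-mode output index to a lag in samples
    lag = int(np.argmax(corr)) - (len(sig_b) - 1)
    return lag / fs
```

Given delay estimates for each microphone pair and the known array geometry, the source position (pan, tilt, range) can then be triangulated.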
Indexing engine 40 compares the newly derived source location parameters (step 107) to the parameters of previously detected sources (step 108). Due to errors in estimation and small movements of the person speaking, the new source location parameters may differ slightly from previously estimated parameters of the same person. If the difference between the location parameters for the new source and an old source is small enough, it is assumed that a previously detected source (person) is audible (speaking) again, and the speech segment in his/her timeline is simply extended or reinstated (step 111).
The difference thresholds for the location parameters according to one particular embodiment of the invention are:
1. If the ranges of both sources (previously detected and current) are less than 2 meters, then it is determined that a new source is audible if the pan angle difference is greater than 12 degrees, the tilt angle difference is greater than 4 degrees, or the range difference is greater than 0.5 meters.
2. If the range of either source is greater than 2 meters but less than 3.5 meters, then it is determined that a new source is audible if the pan angle difference is greater than 9 degrees, the tilt angle difference is greater than 3 degrees, or the range difference is greater than 0.75 meters.
3. If the range of either source is greater than 3.5 meters, then it is determined that a new source is audible if the pan angle difference is greater than 6 degrees, the tilt angle difference is greater than 2 degrees, or the range difference is greater than 1 meter.
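The three threshold tiers above can be expressed directly as a function. The tuple representation and the use of the larger of the two ranges to select a tier are illustrative choices; the threshold values themselves are the ones listed in the text.

```python
def is_new_source(prev, curr):
    """Decide whether a localized sound comes from a new source, using the
    range-dependent thresholds of the described embodiment.

    prev, curr -- (pan_deg, tilt_deg, range_m) tuples for the previously
                  detected source and the current detection.
    """
    d_pan = abs(curr[0] - prev[0])
    d_tilt = abs(curr[1] - prev[1])
    d_range = abs(curr[2] - prev[2])
    # the larger range decides the tier: both < 2 m, either in 2-3.5 m,
    # or either beyond 3.5 m
    r = max(prev[2], curr[2])
    if r < 2.0:
        pan_t, tilt_t, range_t = 12.0, 4.0, 0.5
    elif r < 3.5:
        pan_t, tilt_t, range_t = 9.0, 3.0, 0.75
    else:
        pan_t, tilt_t, range_t = 6.0, 2.0, 1.0
    return d_pan > pan_t or d_tilt > tilt_t or d_range > range_t
```

If this test returns false, the detection is attributed to the already-known speaker and that speaker's segment is extended (step 111); otherwise a new source is registered.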
Video camera 24, according to this embodiment of the invention, is automatically pointed, in response to the determined location, at the current or most recent sound source. Thus, during a meeting, a continuous video recording can be made of each successive speaker. Indexing engine 40, by correlating the elapsed times of the video images and sound segments, extracts still images from the video to be shown on
GUI display 47, allowing the user to visually identify the person associated with a timeline (step 109). A new segment of data storage is begun for each new speaker (step 110).
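The per-speaker bookkeeping of extending a segment while a speaker remains audible and starting a new one after an interruption (steps 110 and 111) can be sketched with a small data structure. The class and attribute names, and the gap threshold, are illustrative assumptions.

```python
class SpeakerTimeline:
    """Minimal sketch of a per-speaker timeline: speech segments are
    extended while the speaker stays audible and a new segment begins
    after a temporal gap (names and gap threshold are illustrative)."""

    def __init__(self, still_image=None, gap=1.0):
        self.still_image = still_image  # frame grabbed for this speaker
        self.segments = []              # list of [start, end] elapsed times
        self.gap = gap                  # max silence (s) that still extends

    def mark_audible(self, t):
        """Record that this speaker is audible at elapsed time t."""
        if self.segments and t - self.segments[-1][1] <= self.gap:
            self.segments[-1][1] = t      # extend the current segment
        else:
            self.segments.append([t, t])  # begin a new segment
```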
Alternatively, a continuous video recording of the meeting can be sampled after the meeting is over, and still video images, such as pictures 81, 83, and 85 of the participants, can be extracted by the indexing engine 40 from the continuous stored video recording.
Occasionally, a person may change his position during a conference. The method of Fig. 4 treats the new position of the person as a new speaker. By using video pattern recognition and/or speaker audio identification techniques, however, the new speaker can be identified as being one of the old speakers who has moved. When such a positive identification occurs, the new speaker timeline (including, for example, images and sound segments 85 and 94 in Fig. 3) can be merged with the original timeline for the speaker. Techniques of video-based tracking are discussed in a co-pending patent application (Serial No.
09/79840, filed May 15, 1998), assigned to the assignee of the present invention, the contents of which are hereby incorporated by reference. The co-pending application describes the combination of video with audio techniques for autopositioning the camera.
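The merge of the two timelines can be sketched as a union of their speech segments, here represented as (start, end) pairs of elapsed time. The function name and segment representation are illustrative assumptions.

```python
def merge_timelines(original, moved):
    """Merge a mistakenly created 'new speaker' timeline back into the
    original speaker's timeline once they are identified as the same
    person. Segments are [start, end] elapsed-time pairs; overlapping or
    touching segments are coalesced."""
    merged = sorted(original + moved)
    out = [list(merged[0])]
    for start, end in merged[1:]:
        if start <= out[-1][1]:               # overlaps or touches previous
            out[-1][1] = max(out[-1][1], end)  # coalesce into one segment
        else:
            out.append([start, end])
    return out
```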
In some cases, more than one conference participant may appear in a still image. The timeline can also indicate any other conference participant who might also appear in the still image (for example, a neighbor sitting in close proximity to the speaker), but is silent at the particular elapsed time, thus giving a comprehensive overview of the sounds produced by all conference participants, as well as helping identify all persons present in the still images.
Conference data can also be indexed for a multipoint conference in which more than two sites engage in a conference together. In this multipoint configuration, microphone arrays at each site can send indexing information for the stream of video/audio/data content from that site to a central computer for storage and display.
Additions, deletions, and other modifications of the described embodiments will be apparent to those practiced in this field and are within the scope of the following claims.
Claims (26)
1. A method for indexing the content of a conference with at least one participant, said method comprising:
recording an audio recording of the conference;
identifying a conference participant producing a sound;
capturing a still image of the identified conference participant;
correlating the still image of the conference participant to at least one audio segment portion of the audio recording, said at least one segment corresponding to the sound produced by the identified conference participant; and
generating a timeline by creating at least one speech-present segment representing the correlated still image and associated at least one audio segment.
2. The method claimed in claim 1, further comprising:
displaying the timeline on a display monitor; and
accessing the timeline displayed on the monitor using a graphical user interface (GUI).
3. The method claimed in claim 2, wherein capturing the still image includes making a video recording of the conference and capturing a video image of the conference participant producing the sound from a segment of the associated video recording of the conference, and further comprising:
using the GUI to select a portion of a specific audio segment for replaying portions of the audio and video recordings on a playback monitor.
4. The method of claim 1, wherein capturing the still image comprises capturing a video image of the conference participant producing the sound from a segment of an associated continuous video recording of the conference.
5. The method of claim 1, further comprising using a video camera to capture the still video image.
6. The method of claim 1 wherein identifying the conference participant is based on identifying the location of the participant.
7. The method of claim 6, wherein identifying the conference participant includes using a microphone array.
8. The method of claim 1, further comprising:
storing time elapsed from a start of the conference with the audio segment and the still image, wherein the timeline is generated by an indexing engine matching the elapsed time associated with the audio segment and the still image.
9. The method of claim 1, further comprising:
identifying a plurality of conference participants;
capturing a still image of each one of the plurality of conference participants;
storing a time elapsed from a start of the conference indicating the time of the capturing of each still image; and
storing a time elapsed from a start of the conference in association with the audio recording each time a change in audio source location is identified, wherein generating a timeline includes indicating for each identified conference participant the particular elapsed times from the start of the conference during which the particular participant was speaking, and wherein generating the timeline includes indicating any other conference participant who also appears in the video image and is silent at the particular elapsed time.
10. The method of claim 9, wherein a conference participant has been previously identified and wherein a speech-present segment is added to the timeline for the previously detected conference participant when the participant speaks.
11. The method of claim 10, wherein each identified conference participant is a near-end conference participant.
12. The method of claim 11, wherein identifying each near-end conference participant is based on location.
13. The method of claim 12, wherein a still image of a new near-end conference participant is identified and a new timeline is started for the new near-end conference participant, if the location of the new near-end conference participant is different from previously detected locations of the other identified near-end conference participant.
14. The method of claim 1, wherein the audio source is a far-end loudspeaker transmitting a sound from a far-end speech source, wherein the timeline is a far-end timeline, and wherein generating the far-end timeline includes creating a speech-present segment on the far-end timeline if a far-end speech source is present.
15. The method of claim 14, further comprising:
accumulating a block of far-end loudspeaker microphone array data;
accumulating a block of near-end microphone array data; and
suppressing echo by subtracting accumulated far-end loudspeaker data from accumulated near-end microphone array data.
16. The method of claim 1, further comprising:
capturing a video image of a display presented at the conference; and
generating a timeline for the captured video image of the display.
17. The method of claim 1, wherein the generated timeline is color-coded.
18. A system for indexing the content of a conference with at least one participant, said system comprising :
a sound recording mechanism which records sound created by a conference participant;
at least one source locator for identifying the location of a conference participant, wherein the source locator generates signals corresponding to the location of the conference participant;
a camera assembly including a camera and a camera movement device, which, in response to the signals generated by said source locator, moves the camera to point at the conference participant;
an image capture unit for capturing an image of the conference participant;
an image storage device for storing images captured by said image capture unit;
a processor for associating the image captured by the camera with the sound recorded by the sound recording mechanism and for creating a timeline comprising images and indicators of the presence of associated sound; and
a graphical user interface which allows access to the stored sound, images, and timeline.
19. The system of claim 18, wherein the sound locator uses at least one microphone array.
20. The system of claim 18, wherein the sound locator uses a plurality of microphones.
21. The system of claim 18, wherein the sound locator comprises a plurality of microphone arrays.
22. A system for indexing the content of a conference with at least one participant, said system comprising:
means for recording an audio recording of the conference;
means for identifying each conference participant producing a sound;
means for capturing a still image of each identified conference participant; and
means for associating the still image of each identified conference participant to at least one audio segment portion of the audio recording corresponding to the sound produced by such conference participant.
23. A method for presenting an audio index database representation of a conference, comprising:
generating a plurality of participant timelines, each timeline having at least one speech-present segment representing a correlated still image and at least one associated audio segment;
enabling a user to identify any of the segments representing the desired audio; and
playing back the identified segment.
24. A method for indexing the content of a conference with at least one participant substantially as herein described with reference to Figures 1 to 4.
25. A system for indexing the content of a conference with at least one participant substantially as herein described and shown with reference to Figures 1 to 4.
26. A method for presenting an audio index database representation of a conference substantially as herein described with reference to Figures 1 to 4.
Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| US17346298A | 1998-10-14 | 1998-10-14 | |

Publications (3)

| Publication Number | Publication Date |
| --- | --- |
| GB9916394D0 | 1999-09-15 |
| GB2342802A | 2000-04-19 |
| GB2342802B | 2003-04-16 |

Family

ID=22632148

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| GB9916394A (Expired - Fee Related), granted as GB2342802B | Method and apparatus for indexing conference content | 1998-10-14 | 1999-07-13 |

Country Status (2)

| Country | Link |
| --- | --- |
| JP | JP2000125274A |
| GB | GB2342802B |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS60205151A (en) * | 1984-03-29 | 1985-10-16 | Toshiba Electric Equip Corp | Sun tracking device |
EP0660249A1 (en) * | 1993-12-27 | 1995-06-28 | AT&T Corp. | Table of contents indexing system |
WO1997001932A1 (en) * | 1995-06-27 | 1997-01-16 | At & T Corp. | Method and apparatus for recording and indexing an audio and multimedia conference |
US5717869A (en) * | 1995-11-03 | 1998-02-10 | Xerox Corporation | Computer controlled display system using a timeline to control playback of temporal data representing collaborative activities |
US5729741A (en) * | 1995-04-10 | 1998-03-17 | Golden Enterprises, Inc. | System for storage and retrieval of diverse types of information obtained from different media sources which includes video, audio, and text transcriptions |
US5786814A (en) * | 1995-11-03 | 1998-07-28 | Xerox Corporation | Computer controlled display system activities using correlated graphical and timeline interfaces for controlling replay of temporal data representing collaborative activities |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH03162187A (en) * | 1989-11-21 | 1991-07-12 | Mitsubishi Electric Corp | Video conference equipment |
JP3266959B2 (en) * | 1993-01-07 | 2002-03-18 | 富士ゼロックス株式会社 | Electronic conference system |
JPH06266632A (en) * | 1993-03-12 | 1994-09-22 | Toshiba Corp | Method and device for processing information of electronic conference system |
US5778082A (en) * | 1996-06-14 | 1998-07-07 | Picturetel Corporation | Method and apparatus for localization of an acoustic source |
JPH10145763A (en) * | 1996-11-15 | 1998-05-29 | Mitsubishi Electric Corp | Conference system |
- 1999
  - 1999-07-13 GB GB9916394A patent/GB2342802B/en not_active Expired - Fee Related
  - 1999-08-03 JP JP11219819A patent/JP2000125274A/en active Pending
Cited By (228)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2351627A (en) * | 1999-03-26 | 2001-01-03 | Canon Kk | Image processing apparatus |
GB2351627B (en) * | 1999-03-26 | 2003-01-15 | Canon Kk | Image processing apparatus |
US7117157B1 (en) | 1999-03-26 | 2006-10-03 | Canon Kabushiki Kaisha | Processing apparatus for determining which person in a group is speaking |
GB2351628B (en) * | 1999-04-14 | 2003-10-01 | Canon Kk | Image and sound processing apparatus |
GB2351628A (en) * | 1999-04-14 | 2001-01-03 | Canon Kk | Image and sound processing apparatus |
US7113201B1 (en) | 1999-04-14 | 2006-09-26 | Canon Kabushiki Kaisha | Image processing apparatus |
US9646614B2 (en) | 2000-03-16 | 2017-05-09 | Apple Inc. | Fast, language-independent method for user authentication by voice |
WO2002013522A2 (en) * | 2000-08-10 | 2002-02-14 | Quindi | Audio and video notetaker |
WO2002013522A3 (en) * | 2000-08-10 | 2003-10-30 | Quindi | Audio and video notetaker |
EP1427205A4 (en) * | 2001-09-14 | 2006-10-04 | Sony Corp | Network information processing system and information processing method |
EP1427205A1 (en) * | 2001-09-14 | 2004-06-09 | Sony Corporation | Network information processing system and information processing method |
FR2849564A1 (en) * | 2002-12-31 | 2004-07-02 | Droit In Situ | METHOD AND SYSTEM FOR PRODUCING A MULTIMEDIA EDITION BASED ON ORAL SERVICES |
WO2004062285A1 (en) * | 2002-12-31 | 2004-07-22 | Dahan Templier Jennifer | Method and system for producing a multimedia publication on the basis of oral material |
GB2429133B (en) * | 2004-08-31 | 2007-08-29 | Sony Corp | Recording and reproduction device |
US7636121B2 (en) | 2004-08-31 | 2009-12-22 | Sony Corporation | Recording and reproducing device |
GB2429133A (en) * | 2004-08-31 | 2007-02-14 | Sony Corp | Method and device for indexing image data to associated audio data |
EP1906707A1 (en) * | 2005-07-08 | 2008-04-02 | Yamaha Corporation | Audio transmission system and communication conference device |
EP1906707A4 (en) * | 2005-07-08 | 2010-01-20 | Yamaha Corp | Audio transmission system and communication conference device |
US8208664B2 (en) | 2005-07-08 | 2012-06-26 | Yamaha Corporation | Audio transmission system and communication conference device |
US10318871B2 (en) | 2005-09-08 | 2019-06-11 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US9117447B2 (en) | 2006-09-08 | 2015-08-25 | Apple Inc. | Using event alert text as input to an automated assistant |
US8942986B2 (en) | 2006-09-08 | 2015-01-27 | Apple Inc. | Determining user intent based on ontologies of domains |
US8930191B2 (en) | 2006-09-08 | 2015-01-06 | Apple Inc. | Paraphrasing of user requests and results by automated digital assistant |
US10568032B2 (en) | 2007-04-03 | 2020-02-18 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US10381016B2 (en) | 2008-01-03 | 2019-08-13 | Apple Inc. | Methods and apparatus for altering audio output signals |
US9626955B2 (en) | 2008-04-05 | 2017-04-18 | Apple Inc. | Intelligent text-to-speech conversion |
US9865248B2 (en) | 2008-04-05 | 2018-01-09 | Apple Inc. | Intelligent text-to-speech conversion |
US9535906B2 (en) | 2008-07-31 | 2017-01-03 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US10108612B2 (en) | 2008-07-31 | 2018-10-23 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US9959870B2 (en) | 2008-12-11 | 2018-05-01 | Apple Inc. | Speech recognition involving a mobile device |
US11080012B2 (en) | 2009-06-05 | 2021-08-03 | Apple Inc. | Interface for a virtual digital assistant |
US10475446B2 (en) | 2009-06-05 | 2019-11-12 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US10795541B2 (en) | 2009-06-05 | 2020-10-06 | Apple Inc. | Intelligent organization of tasks items |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US10283110B2 (en) | 2009-07-02 | 2019-05-07 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US8560309B2 (en) | 2009-12-29 | 2013-10-15 | Apple Inc. | Remote conferencing center |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
US8892446B2 (en) | 2010-01-18 | 2014-11-18 | Apple Inc. | Service orchestration for intelligent automated assistant |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US10706841B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Task flow identification based on user intent |
US9548050B2 (en) | 2010-01-18 | 2017-01-17 | Apple Inc. | Intelligent automated assistant |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US8903716B2 (en) | 2010-01-18 | 2014-12-02 | Apple Inc. | Personalized vocabulary for digital assistant |
US10607141B2 (en) | 2010-01-25 | 2020-03-31 | Newvaluexchange Ltd. | Apparatuses, methods and systems for a digital conversation management platform |
US10984326B2 (en) | 2010-01-25 | 2021-04-20 | Newvaluexchange Ltd. | Apparatuses, methods and systems for a digital conversation management platform |
US10984327B2 (en) | 2010-01-25 | 2021-04-20 | New Valuexchange Ltd. | Apparatuses, methods and systems for a digital conversation management platform |
US11410053B2 (en) | 2010-01-25 | 2022-08-09 | Newvaluexchange Ltd. | Apparatuses, methods and systems for a digital conversation management platform |
US9424862B2 (en) | 2010-01-25 | 2016-08-23 | Newvaluexchange Ltd | Apparatuses, methods and systems for a digital conversation management platform |
US9424861B2 (en) | 2010-01-25 | 2016-08-23 | Newvaluexchange Ltd | Apparatuses, methods and systems for a digital conversation management platform |
US10607140B2 (en) | 2010-01-25 | 2020-03-31 | Newvaluexchange Ltd. | Apparatuses, methods and systems for a digital conversation management platform |
US9431028B2 (en) | 2010-01-25 | 2016-08-30 | Newvaluexchange Ltd | Apparatuses, methods and systems for a digital conversation management platform |
US8977584B2 (en) | 2010-01-25 | 2015-03-10 | Newvaluexchange Global Ai Llp | Apparatuses, methods and systems for a digital conversation management platform |
US10049675B2 (en) | 2010-02-25 | 2018-08-14 | Apple Inc. | User profiling for voice input processing |
US9633660B2 (en) | 2010-02-25 | 2017-04-25 | Apple Inc. | User profiling for voice input processing |
US8452037B2 (en) | 2010-05-05 | 2013-05-28 | Apple Inc. | Speaker clip |
US10063951B2 (en) | 2010-05-05 | 2018-08-28 | Apple Inc. | Speaker clip |
US9386362B2 (en) | 2010-05-05 | 2016-07-05 | Apple Inc. | Speaker clip |
US8866867B2 (en) | 2010-09-15 | 2014-10-21 | Zte Corporation | Method and apparatus for video recording in video calls |
EP2557778A4 (en) * | 2010-09-15 | 2014-01-15 | Zte Corp | Method and apparatus for video recording in video calls |
EP2557778A1 (en) * | 2010-09-15 | 2013-02-13 | ZTE Corporation | Method and apparatus for video recording in video calls |
US8644519B2 (en) | 2010-09-30 | 2014-02-04 | Apple Inc. | Electronic devices with improved audio |
US10762293B2 (en) | 2010-12-22 | 2020-09-01 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
GB2486793B (en) * | 2010-12-23 | 2017-12-20 | Samsung Electronics Co Ltd | Moving image photographing method and moving image photographing apparatus |
GB2486793A (en) * | 2010-12-23 | 2012-06-27 | Samsung Electronics Co Ltd | Identifying a speaker via mouth movement and generating a still image |
US8687076B2 (en) | 2010-12-23 | 2014-04-01 | Samsung Electronics Co., Ltd. | Moving image photographing method and moving image photographing apparatus |
US10102359B2 (en) | 2011-03-21 | 2018-10-16 | Apple Inc. | Device access using voice authentication |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US8811648B2 (en) | 2011-03-31 | 2014-08-19 | Apple Inc. | Moving magnet audio transducer |
US9674625B2 (en) | 2011-04-18 | 2017-06-06 | Apple Inc. | Passive proximity detection |
US9007871B2 (en) | 2011-04-18 | 2015-04-14 | Apple Inc. | Passive proximity detection |
US10032066B2 (en) | 2011-04-18 | 2018-07-24 | Intelmate Llc | Secure communication systems and methods |
US9225701B2 (en) | 2011-04-18 | 2015-12-29 | Intelmate Llc | Secure communication systems and methods |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US10706373B2 (en) | 2011-06-03 | 2020-07-07 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US10402151B2 (en) | 2011-07-28 | 2019-09-03 | Apple Inc. | Devices with enhanced audio |
US10771742B1 (en) | 2011-07-28 | 2020-09-08 | Apple Inc. | Devices with enhanced audio |
US9798393B2 (en) | 2011-08-29 | 2017-10-24 | Apple Inc. | Text correction processing |
US8989428B2 (en) | 2011-08-31 | 2015-03-24 | Apple Inc. | Acoustic systems in electronic devices |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US10284951B2 (en) | 2011-11-22 | 2019-05-07 | Apple Inc. | Orientation-based audio |
US8903108B2 (en) | 2011-12-06 | 2014-12-02 | Apple Inc. | Near-field null and beamforming |
US9020163B2 (en) | 2011-12-06 | 2015-04-28 | Apple Inc. | Near-field null and beamforming |
EP2709357A4 (en) * | 2012-01-16 | 2014-11-12 | Huawei Tech Co Ltd | Conference recording method and conference system |
EP2709357A1 (en) * | 2012-01-16 | 2014-03-19 | Huawei Technologies Co., Ltd | Conference recording method and conference system |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US9746916B2 (en) | 2012-05-11 | 2017-08-29 | Qualcomm Incorporated | Audio user interaction recognition and application interface |
WO2013169618A1 (en) * | 2012-05-11 | 2013-11-14 | Qualcomm Incorporated | Audio user interaction recognition and context refinement |
US10073521B2 (en) | 2012-05-11 | 2018-09-11 | Qualcomm Incorporated | Audio user interaction recognition and application interface |
WO2013169621A1 (en) * | 2012-05-11 | 2013-11-14 | Qualcomm Incorporated | Audio user interaction recognition and context refinement |
US9736604B2 (en) | 2012-05-11 | 2017-08-15 | Qualcomm Incorporated | Audio user interaction recognition and context refinement |
US9953088B2 (en) | 2012-05-14 | 2018-04-24 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US10079014B2 (en) | 2012-06-08 | 2018-09-18 | Apple Inc. | Name recognition system |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9971774B2 (en) | 2012-09-19 | 2018-05-15 | Apple Inc. | Voice-based media searching |
US9820033B2 (en) | 2012-09-28 | 2017-11-14 | Apple Inc. | Speaker assembly |
US8858271B2 (en) | 2012-10-18 | 2014-10-14 | Apple Inc. | Speaker interconnect |
US9357299B2 (en) | 2012-11-16 | 2016-05-31 | Apple Inc. | Active protection for acoustic device |
US8942410B2 (en) | 2012-12-31 | 2015-01-27 | Apple Inc. | Magnetically biased electromagnet for audio applications |
US10199051B2 (en) | 2013-02-07 | 2019-02-05 | Apple Inc. | Voice trigger for a digital assistant |
US10978090B2 (en) | 2013-02-07 | 2021-04-13 | Apple Inc. | Voice trigger for a digital assistant |
US11499255B2 (en) | 2013-03-13 | 2022-11-15 | Apple Inc. | Textile product having reduced density |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
US9697822B1 (en) | 2013-03-15 | 2017-07-04 | Apple Inc. | System and method for updating an adaptive speech recognition model |
US9922642B2 (en) | 2013-03-15 | 2018-03-20 | Apple Inc. | Training an at least partial voice command system |
US9966060B2 (en) | 2013-06-07 | 2018-05-08 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
US9620104B2 (en) | 2013-06-07 | 2017-04-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9633674B2 (en) | 2013-06-07 | 2017-04-25 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US9966068B2 (en) | 2013-06-08 | 2018-05-08 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US10657961B2 (en) | 2013-06-08 | 2020-05-19 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US10185542B2 (en) | 2013-06-09 | 2019-01-22 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US9300784B2 (en) | 2013-06-13 | 2016-03-29 | Apple Inc. | System and method for emergency calls initiated by voice command |
US10791216B2 (en) | 2013-08-06 | 2020-09-29 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US10063977B2 (en) | 2014-05-12 | 2018-08-28 | Apple Inc. | Liquid expulsion from an orifice |
US9620105B2 (en) | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
US9502031B2 (en) | 2014-05-27 | 2016-11-22 | Apple Inc. | Method for supporting dynamic grammars in WFST-based ASR |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US9734193B2 (en) | 2014-05-30 | 2017-08-15 | Apple Inc. | Determining domain salience ranking from ambiguous words in natural speech |
US10497365B2 (en) | 2014-05-30 | 2019-12-03 | Apple Inc. | Multi-command single utterance input method |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US10169329B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Exemplar-based natural language processing |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
US9966065B2 (en) | 2014-05-30 | 2018-05-08 | Apple Inc. | Multi-command single utterance input method |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US10083690B2 (en) | 2014-05-30 | 2018-09-25 | Apple Inc. | Better resolution when referencing to concepts |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US10904611B2 (en) | 2014-06-30 | 2021-01-26 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US9668024B2 (en) | 2014-06-30 | 2017-05-30 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10431204B2 (en) | 2014-09-11 | 2019-10-01 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US9986419B2 (en) | 2014-09-30 | 2018-05-29 | Apple Inc. | Social reminders |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US10362403B2 (en) | 2014-11-24 | 2019-07-23 | Apple Inc. | Mechanically actuated panel acoustic system |
US9525943B2 (en) | 2014-11-24 | 2016-12-20 | Apple Inc. | Mechanically actuated panel acoustic system |
US11556230B2 (en) | 2014-12-02 | 2023-01-17 | Apple Inc. | Data detection |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US10311871B2 (en) | 2015-03-08 | 2019-06-04 | Apple Inc. | Competing devices responding to voice triggers |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10356243B2 (en) | 2015-06-05 | 2019-07-16 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US9900698B2 (en) | 2015-06-30 | 2018-02-20 | Apple Inc. | Graphene composite acoustic diaphragm |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US9858948B2 (en) | 2015-09-29 | 2018-01-02 | Apple Inc. | Electronic equipment with ambient noise sensing input circuitry |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
NO20160989A1 (en) * | 2016-06-08 | 2017-12-11 | Pexip AS | Video Conference timeline |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
US11069347B2 (en) | 2016-06-08 | 2021-07-20 | Apple Inc. | Intelligent automated assistant for media exploration |
US10354011B2 (en) | 2016-06-09 | 2019-07-16 | Apple Inc. | Intelligent automated assistant in a home environment |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US10733993B2 (en) | 2016-06-10 | 2020-08-04 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10297253B2 (en) | 2016-06-11 | 2019-05-21 | Apple Inc. | Application integration with a digital assistant |
US10269345B2 (en) | 2016-06-11 | 2019-04-23 | Apple Inc. | Intelligent task discovery |
US10521466B2 (en) | 2016-06-11 | 2019-12-31 | Apple Inc. | Data driven natural language event detection and classification |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
US10089072B2 (en) | 2016-06-11 | 2018-10-02 | Apple Inc. | Intelligent device arbitration and control |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US10553215B2 (en) | 2016-09-23 | 2020-02-04 | Apple Inc. | Intelligent automated assistant |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US10755703B2 (en) | 2017-05-11 | 2020-08-25 | Apple Inc. | Offline personal assistant |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10791176B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10410637B2 (en) | 2017-05-12 | 2019-09-10 | Apple Inc. | User-specific acoustic models |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
US10482874B2 (en) | 2017-05-15 | 2019-11-19 | Apple Inc. | Hierarchical belief states for digital assistants |
US11217255B2 (en) | 2017-05-16 | 2022-01-04 | Apple Inc. | Far-field extension for digital assistant services |
US11307661B2 (en) | 2017-09-25 | 2022-04-19 | Apple Inc. | Electronic device with actuators for producing haptic and audio output along a device housing |
US11907426B2 (en) | 2017-09-25 | 2024-02-20 | Apple Inc. | Electronic device with actuators for producing haptic and audio output along a device housing |
US10757491B1 (en) | 2018-06-11 | 2020-08-25 | Apple Inc. | Wearable interactive audio device |
US10873798B1 (en) | 2018-06-11 | 2020-12-22 | Apple Inc. | Detecting through-body inputs at a wearable audio device |
US11743623B2 (en) | 2018-06-11 | 2023-08-29 | Apple Inc. | Wearable interactive audio device |
US11740591B2 (en) | 2018-08-30 | 2023-08-29 | Apple Inc. | Electronic watch with barometric vent |
US11334032B2 (en) | 2018-08-30 | 2022-05-17 | Apple Inc. | Electronic watch with barometric vent |
US11561144B1 (en) | 2018-09-27 | 2023-01-24 | Apple Inc. | Wearable electronic device with fluid-based pressure sensing |
US11857063B2 (en) | 2019-04-17 | 2024-01-02 | Apple Inc. | Audio output system for a wirelessly locatable tag |
Also Published As
Publication number | Publication date |
---|---|
GB2342802B (en) | 2003-04-16 |
JP2000125274A (en) | 2000-04-28 |
GB9916394D0 (en) | 1999-09-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
GB2342802A (en) | Indexing conference content onto a timeline | |
KR101238586B1 (en) | Automatic face extraction for use in recorded meetings timelines | |
Lee et al. | Portable meeting recorder | |
Cutler et al. | Distributed meetings: A meeting capture and broadcasting system | |
JP3143125B2 (en) | System and method for recording and playing multimedia events | |
US7428000B2 (en) | System and method for distributed meetings | |
US5548346A (en) | Apparatus for integrally controlling audio and video signals in real time and multi-site communication control method | |
US7113201B1 (en) | Image processing apparatus | |
US7355623B2 (en) | System and process for adding high frame-rate current speaker data to a low frame-rate video using audio watermarking techniques | |
JP3620855B2 (en) | Method and apparatus for recording and indexing audio and multimedia conferences | |
US7362350B2 (en) | System and process for adding high frame-rate current speaker data to a low frame-rate video | |
US20060251384A1 (en) | Automatic video editing for real-time multi-point video conferencing | |
CN107820037B (en) | Audio signal, image processing method, device and system | |
US7355622B2 (en) | System and process for adding high frame-rate current speaker data to a low frame-rate video using delta frames | |
CN111193890B (en) | Conference record analyzing device and method and conference record playing system | |
JP2006085440A (en) | Information processing system, information processing method and computer program | |
JP4414708B2 (en) | Movie display personal computer, data display system, movie display method, movie display program, and recording medium | |
WO2002013522A2 (en) | Audio and video notetaker | |
Wu et al. | MoVieUp: Automatic mobile video mashup | |
Arnaud et al. | The CAVA corpus: synchronised stereoscopic and binaural datasets with head movements | |
Sumec | Multi camera automatic video editing | |
JP6860178B1 (en) | Video processing equipment and video processing method | |
TWI799048B (en) | Panoramic video conference system and method | |
JP2000333125A (en) | Editing device and recording device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 732E | Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977) | |
| PCNP | Patent ceased through non-payment of renewal fee | Effective date: 20150713 |