WO2024093442A1 - Method and apparatus for checking audiovisual content, and device and storage medium - Google Patents

Method and apparatus for checking audiovisual content, and device and storage medium

Info

Publication number
WO2024093442A1
WO2024093442A1 (PCT/CN2023/113406)
Authority
WO
WIPO (PCT)
Prior art keywords
speech
timeline
content
audio
speaker
Prior art date
Application number
PCT/CN2023/113406
Other languages
French (fr)
Chinese (zh)
Inventor
郑康
张鼎
李继超
刘敬晖
和君
李想
Original Assignee
北京字跳网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Publication of WO2024093442A1 publication Critical patent/WO2024093442A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/22Interactive procedures; Man-machine interfaces
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/07User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail characterised by the inclusion of specific contents
    • H04L51/10Multimedia information
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems

Definitions

  • Example embodiments of the present disclosure relate generally to the field of computers, and more particularly to methods, devices, apparatuses, and computer-readable storage media for viewing audiovisual content.
  • With the development of computer technology, the Internet has become the main platform for people to obtain and share content.
  • For example, people can use the Internet to publish a variety of content, or receive content shared by other users.
  • In Internet-based content sharing, the sharing of audiovisual content (e.g., audio content or video content) has become one of the most prominent forms.
  • For example, people can use a player to play a speech or a video or audio recording of a meeting shared by other users. However, during such playback it is difficult for people to quickly locate the part of the video or audio recording that corresponds to a specific speaker.
  • In a first aspect of the present disclosure, a method for viewing audio-visual content is provided. The method includes: receiving a selection of a plurality of text segments, the plurality of text segments corresponding to a plurality of parts of target audio-visual content, the plurality of parts at least including a first part and a second part that are not continuous in the target audio-visual content; causing segment audio-visual content to be created based on at least the plurality of parts of the target audio-visual content, wherein the first part and the second part are continuous in the segment audio-visual content; and presenting a sharing entry for sharing the segment audio-visual content.
  • In a second aspect of the present disclosure, an apparatus for viewing audio-visual content is provided. The apparatus includes: a receiving module configured to receive a selection of multiple text segments, the multiple text segments corresponding to multiple parts of the target audio-visual content, the multiple parts at least including a first part and a second part that are discontinuous in the target audio-visual content; a control module configured to cause segment audio-visual content to be created based on at least the multiple parts of the target audio-visual content, wherein the first part and the second part are continuous in the segment audio-visual content; and a presentation module configured to present a sharing entry for sharing the segment audio-visual content.
  • In a third aspect of the present disclosure, an electronic device is provided. The device includes at least one processing unit, and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit. The instructions, when executed by the at least one processing unit, cause the device to perform the method of the first aspect.
  • In a fourth aspect of the present disclosure, a computer-readable storage medium is provided. A computer program is stored on the medium, and the program, when executed by a processor, implements the method of the first aspect.
  • In a fifth aspect of the present disclosure, a playback system is provided. The playback system includes: a main timeline that at least indicates the current playback position of audio-visual content; and at least one speech timeline used to indicate the temporal distribution of the speech content of at least one speaker associated with the audio-visual content.
  • FIG. 1 shows a schematic diagram of a conventional audio-visual content player;
  • FIGS. 2A to 2C show schematic diagrams of example playback systems according to some embodiments of the present disclosure;
  • FIGS. 3A and 3B illustrate example viewing interfaces for audiovisual content according to some embodiments of the present disclosure;
  • FIGS. 4A and 4B are schematic diagrams showing the sharing of segment audio-visual content according to some embodiments of the present disclosure;
  • FIG. 5 illustrates a flow chart of an example process for viewing audiovisual content according to some embodiments of the present disclosure;
  • FIG. 6 shows a block diagram of an apparatus for viewing audiovisual content according to some embodiments of the present disclosure; and
  • FIG. 7 shows a block diagram of a device capable of implementing various embodiments of the present disclosure.
  • Figure 1 shows a schematic diagram of a traditional audio-visual content player 100. As shown in Figure 1, in the player 100, people usually need to drag the timeline control to locate the desired playback moment.
  • For example, the audiovisual content may be more than one hour long, which makes it difficult for the user to quickly locate the desired playback position using the timeline.
  • To this end, embodiments of the present disclosure provide a system for playing audio-visual content (audio content or video content).
  • The system may include a main timeline that at least indicates the current playback position of the audio-visual content.
  • The system may also include at least one speech timeline, which indicates the temporal distribution of the speech content of at least one speaker associated with the audio-visual content.
  • Embodiments of the present disclosure also provide a solution for viewing audio-visual content.
  • A viewing interface for the audio-visual content can be provided, wherein the viewing interface includes a playback control for playing the audio-visual content.
  • At least one speech timeline can be presented in the playback control, the at least one speech timeline indicating the temporal distribution of the speech content of at least one speaker associated with the audio-visual content.
  • In this way, embodiments of the present disclosure provide a speech timeline in the playback system or playback control, showing how the speech content of each speaker associated with the audio-visual content is distributed over time.
  • This makes it easier for users to locate and view the parts corresponding to a specific speaker, thereby improving the efficiency with which users obtain the desired content.
  • In addition, embodiments of the present disclosure can utilize the timeline to provide richer information about the audio-visual content.
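As an illustration only, the playback system and speech timeline described above could be captured by a small data model along the following lines. All type and field names in this sketch are hypothetical and are not prescribed by the disclosure.

```typescript
// Illustrative data model only; all names are hypothetical and not from the disclosure.

interface SpeechSegment {
  startSec: number; // segment start, in seconds from the beginning of the content
  endSec: number;   // segment end, in seconds
}

interface SpeechTimeline {
  speakerId: string;         // identifier of the speaker (e.g. terminal or account)
  speakerName: string;       // text identifier such as a user name or nickname
  avatarUrl?: string;        // optional graphic identifier
  segments: SpeechSegment[]; // temporal distribution of this speaker's speech content
}

interface PlaybackSystemState {
  durationSec: number;               // total duration of the audio-visual content
  positionSec: number;               // current playback position shown on the main timeline
  speechTimelines: SpeechTimeline[]; // one speech timeline per speaker
}
```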
  • FIG. 2A shows a schematic diagram 200A of an example playback system 205 according to some embodiments of the present disclosure.
  • the playback system 205 (also referred to as a player 205 or a playback control 205) can be used to play corresponding audio-visual content.
  • the playback system 205 can be provided by, for example, an appropriate electronic device, examples of which can include, but are not limited to, a desktop computer, a laptop computer, a smart phone, a tablet computer, a personal digital assistant, or a smart wearable device.
  • The audiovisual content may include audio and video files stored locally in the playback system 205, audio and video files stored in the cloud, or audio and video streams.
  • For example, it may include a playback stream of recorded audio-visual content (e.g., a conference recording), or a live stream of real-time audio-visual content.
  • the playback system 205 may include a main timeline 210.
  • the main timeline 210 may indicate the current playback position of the audio-visual content, that is, the playback progress.
  • the main timeline 210 may include a playback position indicator 215 to indicate the time point at which the audio-visual content is currently being played.
  • In the case where the length of the audio-visual content is fixed, the total length of the timeline can correspond to the total duration of the audio-visual content.
  • The playback position indicator 215 can then be placed according to the correspondence between position on the timeline and playback time.
  • Alternatively, the playback position indicator 215 may, for example, always be set to the far right of the main timeline 210.
  • the user may, for example, jump back to the corresponding time point by moving the play position indicator 215.
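As a minimal sketch of the correspondence between a position on the main timeline and a playback time mentioned above, the helpers below assume the timeline is rendered as a horizontal bar of a known pixel width; the function names are hypothetical and not taken from the disclosure.

```typescript
// Hypothetical helpers mapping between a pixel offset on the main timeline
// and a playback time, assuming a horizontal bar of widthPx pixels.

function offsetToTime(offsetPx: number, widthPx: number, durationSec: number): number {
  const clamped = Math.min(Math.max(offsetPx, 0), widthPx);
  return (clamped / widthPx) * durationSec;
}

function timeToOffset(timeSec: number, widthPx: number, durationSec: number): number {
  const clamped = Math.min(Math.max(timeSec, 0), durationSec);
  return (clamped / durationSec) * widthPx;
}

// Example: a click 150 px into a 600 px timeline over a 3600 s recording
// maps to offsetToTime(150, 600, 3600) === 900, i.e. the 15-minute mark.
```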
  • the main timeline 210 may also present graphic information corresponding to the audio waveform of the audio-visual content.
  • In this way, the user can more conveniently understand which parts of the audio-visual content are worth paying attention to, and which parts, for example those with weaker audio waveforms, can be temporarily ignored.
  • Such a playback system 205 can improve the efficiency with which users obtain content.
  • The audiovisual content may be, for example, recorded content of an online conference. Accordingly, as shown in FIG. 2A, the main timeline 210 may also present, for example, an interaction identifier 220 corresponding to an interaction behavior in the online conference.
  • Such an interaction identifier 220 may be placed at a corresponding position on the main timeline 210 to indicate that a corresponding interaction behavior occurred at the corresponding moment.
  • Different graphics of the interaction identifier 220 may correspond to different interaction behaviors.
  • For example, the main timeline 210 may include an interaction identifier 220 for indicating file sharing in the online conference.
  • The playback system 205 can, for example, guide the user to obtain descriptive information about the file sharing. For example, when the user hovers over the interaction identifier 220 with a mouse, the playback system 205 can indicate information about the shared file, such as the file name, format, size, and sharer, in a floating window. In another example, if the user clicks the interaction identifier 220, the playback system 205 can guide the user to obtain the content of the shared file, for example by guiding the user to jump to an online viewing interface for the file.
  • In addition, the main timeline 210 may include, for example, an interaction identifier 220 for indicating an online chat in the online conference.
  • The online chat herein refers to any appropriate chat based on text, emoticons, images, and/or audio, for example conducted using an instant messaging tool of the online conference.
  • The graphic representation of the interaction identifier 220 may be determined, for example, based on the content of the online chat.
  • Alternatively, the graphic representation of the interaction identifier 220 may be determined, for example, by a graphic identifier (e.g., an avatar) of a user participating in the chat.
  • The playback system 205 can guide the user to obtain descriptive information about the online chat. For example, when the user hovers over the interaction identifier 220 with a mouse, the playback system 205 can indicate information about the online chat, such as its participants and the chat content, in a floating window. In another example, if the user clicks the interaction identifier 220, the playback system 205 can guide the user to obtain the complete content of the chat, for example by guiding the user to jump to an interface for viewing the in-meeting chat content.
  • Further, the main timeline 210 may include, for example, an interaction identifier 220 for indicating comments in the online conference.
  • The comments here may include, for example, any appropriate comments based on text, emoticons, images, and/or audio.
  • A user's "like" may also be understood as a comment on the corresponding content.
  • The graphic representation of the interaction identifier 220 may be determined, for example, based on the content and/or type of the comment. For example, for an emoticon-based comment, the graphic representation of the interaction identifier 220 may be generated based on the emoticon.
  • When the user selects the interaction identifier 220, the playback system 205 may, for example, guide the user to obtain descriptive information about the comment. For example, when the user hovers over the interaction identifier 220 with a mouse, the playback system 205 may indicate information about the comment, such as the commenter, the comment time, and comment replies, in a floating window. In another example, if the user clicks the interaction identifier 220, the playback system 205 may guide the user to jump to an interface for viewing comments to obtain richer information about the comment.
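The hover and click behaviors described for the interaction identifiers (file sharing, online chat, comments) could be dispatched as sketched below. The marker type and callback signatures are assumptions introduced for illustration; they are not part of the disclosure.

```typescript
// Hypothetical interaction markers on the main timeline and their hover/click handling.

type InteractionKind = 'fileShare' | 'chat' | 'comment';

interface InteractionMarker {
  kind: InteractionKind;
  timeSec: number;   // position of the marker on the main timeline
  summary: string;   // e.g. file name and sharer, chat participants, or commenter and time
  detailUrl: string; // e.g. online file viewer, in-meeting chat record, or comment view
}

function onMarkerHover(marker: InteractionMarker, showTooltip: (text: string) => void): void {
  // Hovering shows descriptive information in a floating window.
  showTooltip(`${marker.kind}: ${marker.summary}`);
}

function onMarkerClick(marker: InteractionMarker, navigate: (url: string) => void): void {
  // Clicking guides the user to the full content (shared file, chat record, comments).
  navigate(marker.detailUrl);
}
```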
  • the main timeline 210 may also present speaker information indicating the temporal distribution of speech content of at least one speaker associated with the audiovisual content.
  • In this case, the main timeline 210 may also be regarded as a type of speech timeline.
  • For example, the main timeline 210 may assign a corresponding color mark to each speaker. Accordingly, the color distribution on the main timeline 210 may be used to indicate which one or more speakers each time period corresponds to. It should be understood that other appropriate styles may also be used on the main timeline 210 to indicate the temporal distribution of the speakers' speech content.
  • The playback system 205 may further include a viewing entry 230 for viewing the speech timelines.
  • The viewing entry 230 may indicate a graphic identifier (e.g., an avatar) of one or more speakers associated with the audiovisual content.
  • The playback system 205 may present, for example, a speech timeline 240-1 and a speech timeline 240-2 (individually or collectively referred to as speech timelines 240).
  • The speech timeline 240 may be used to indicate the temporal distribution of the speech content of at least one speaker associated with the audio-visual content. For example, if the speaker spoke at a given moment, the speech timeline 240 may be filled with a first graphic at that position; conversely, if the speaker did not speak at that moment, the speech timeline 240 may be filled with a second graphic. In this way, the user can intuitively see at which moments each speaker spoke.
  • The speech timeline 240 may also similarly present graphic information corresponding to the audio waveform of the part of the audio-visual content that corresponds to the speaker. In this way, the user can intuitively see at which moments the speaker was silent and at which moments the speaker spoke frequently. Such information further helps users quickly obtain the desired content.
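As a sketch of how the "first graphic"/"second graphic" fill described above might be derived, the helper below converts a speaker's (assumed non-overlapping) speech segments into alternating speaking and silent intervals. The names and structure are assumptions for illustration only.

```typescript
// Hypothetical conversion of a speaker's (non-overlapping) speech segments into
// alternating intervals for rendering: speaking intervals get the "first graphic",
// silent intervals get the "second graphic".

interface TimelineInterval {
  startSec: number;
  endSec: number;
  speaking: boolean;
}

function buildIntervals(
  segments: { startSec: number; endSec: number }[],
  durationSec: number,
): TimelineInterval[] {
  const sorted = [...segments].sort((a, b) => a.startSec - b.startSec);
  const out: TimelineInterval[] = [];
  let cursor = 0;
  for (const seg of sorted) {
    if (seg.startSec > cursor) {
      out.push({ startSec: cursor, endSec: seg.startSec, speaking: false });
    }
    out.push({ startSec: seg.startSec, endSec: seg.endSec, speaking: true });
    cursor = Math.max(cursor, seg.endSec);
  }
  if (cursor < durationSec) {
    out.push({ startSec: cursor, endSec: durationSec, speaking: false });
  }
  return out;
}
```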
  • the number of speech timelines 240 may be determined based on the number of speakers participating in the audiovisual content. In some embodiments, the number of such speakers may be determined by the number of terminals participating in the online conference. For example, multiple conference participants may access the online conference through the same terminal (or use the same account), and such multiple participants may be identified as the same speaker, although they may include multiple different speakers.
  • the number of such speakers may be determined based on the number of speakers in the audiovisual content. It should be understood that any appropriate speaker recognition technology may be used to determine the corresponding speakers in the audiovisual content, and the present disclosure is not intended to be limited thereto.
  • the playback system 205 may present a speech timeline 240 corresponding to all speakers of the audio-visual content.
  • the audio-visual content may include two speakers (“speaker 1” and “speaker 2”). Accordingly, the presentation order of the corresponding speech timeline 240-1 and speech timeline 240-2 in the playback system 205 may be determined based on the information of the speakers.
  • For example, the presentation order of the speech timelines may be determined based on the text identifiers of the speakers, such as their user names or nicknames; for instance, the speech timelines may be presented in the order of the speakers' text identifiers.
  • Alternatively, the presentation order of the speech timelines can be determined based on the proportion of each speaker's speech content. For example, if the proportion of speech content of "Speaker 1" reaches "70%", which is greater than the "30%" proportion of "Speaker 2", then the speech timeline 240-1 can be presented ahead of the speech timeline 240-2.
  • The presentation order of the speech timelines can also be determined based on the start time of each speaker's speech content.
  • For example, the start time of the speech content of "Speaker 1" may be the first minute after the meeting starts, which is earlier than the start time of the speech content of "Speaker 2", for example the third minute after the meeting starts. Accordingly, the speech timeline 240-1 can be presented ahead of the speech timeline 240-2.
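The three ordering criteria above (text identifier, proportion of speech content, start time) could be implemented as a simple sort, as sketched below with hypothetical names that are not taken from the disclosure.

```typescript
// Hypothetical ordering of speech timelines by speaker name, share of speech
// content, or start time of the first speech, as discussed above.

interface SpeakerStats {
  name: string;          // text identifier (user name or nickname)
  speechRatio: number;   // proportion of speech content, e.g. 0.7 for 70%
  firstStartSec: number; // start time of this speaker's first speech
}

type OrderBy = 'name' | 'ratio' | 'start';

function orderTimelines(speakers: SpeakerStats[], by: OrderBy): SpeakerStats[] {
  const copy = [...speakers];
  switch (by) {
    case 'name':
      return copy.sort((a, b) => a.name.localeCompare(b.name));
    case 'ratio':
      return copy.sort((a, b) => b.speechRatio - a.speechRatio); // larger share first
    default:
      return copy.sort((a, b) => a.firstStartSec - b.firstStartSec); // earlier start first
  }
}

// Example: with ratios 0.7 ("Speaker 1") and 0.3 ("Speaker 2"),
// orderTimelines(speakers, 'ratio') places "Speaker 1" first.
```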
  • the playback system 205 may also present the description information of the corresponding speaker in association with the speech timeline 240.
  • the speech timeline 240-1 may have a text identifier (e.g., a user name or nickname) of the corresponding speaker.
  • the speech timeline 240-1 may also have a graphic identifier (e.g., an avatar) of the corresponding speaker.
  • the playback system 205 may also present the percentage information of the speech content of the corresponding speaker in association with at least one speech timeline.
  • the speech timeline 240-1 may include the percentage "XX%" of the speech content of "speaker 1".
  • the speech timeline 240 may also present an interaction identifier (not shown in FIG. 2B ) for indicating an interaction behavior associated with a corresponding speaker in the online conference.
  • Such interaction behaviors refer to interaction behaviors in which the corresponding speaker participated, such as the file sharing, online chatting, or commenting behaviors discussed above.
  • The interaction logic of the interaction identifiers presented on the speech timelines 240 may be similar to that of the interaction identifiers 220 discussed above, and is not described in detail here.
  • The speech timeline 240-1 may also be automatically collapsed or expanded in response to the user's selection of the viewing entry 230.
  • Alternatively, the playback system may always provide the speech timelines of all speakers by default, regardless of the selection of the viewing entry 230.
  • The playback system 205 may also provide a search entry 250 for the speech timelines. Using the search entry 250, a user may initiate a viewing request associated with a specific speaker.
  • For example, the playback system 205 may present visual elements for all speakers associated with the audio-visual content.
  • Such visual elements may include, for example, a text identifier of each speaker (e.g., a user name or nickname) or a graphic identifier (e.g., an avatar).
  • The playback system 205 may receive a user's selection of a specific visual element from among the multiple visual elements, to determine that the user wishes to view the speech timeline of the speaker corresponding to the selected visual element. For example, the user may click the avatar of "Speaker 1", so that the playback system 205 presents only the speech timeline 240-1 corresponding to "Speaker 1" and not the speech timeline 240-2.
  • Alternatively, the user may provide input indicating the target speaker via the search entry 250.
  • For example, the user may enter at least part of the nickname or user name of "Speaker 1" to automatically match "Speaker 1" and cause the playback system 205 to present the corresponding speech timeline 240-1 instead of the speech timeline 240-2.
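A possible way to match the input provided via the search entry against speaker names, so that only the matching speech timelines remain visible, is sketched below; the function and field names are hypothetical.

```typescript
// Hypothetical matching of the search entry's input against speaker names,
// keeping only the speech timelines of speakers whose name contains the query.

function matchSpeakers<T extends { speakerName: string }>(timelines: T[], query: string): T[] {
  const q = query.trim().toLowerCase();
  if (q.length === 0) {
    return timelines; // empty query: keep all speech timelines
  }
  return timelines.filter(t => t.speakerName.toLowerCase().includes(q));
}

// Example: matchSpeakers(timelines, "er 1") keeps "Speaker 1" but not "Speaker 2".
```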
  • In some embodiments, the search entry 250 may be provided independently of the viewing entry 230.
  • In that case, the playback system 205 provides the search entry 250 for viewing a specific speaker regardless of whether the viewing entry 230 has been selected.
  • Alternatively, the search entry 250 may be provided dependent on the viewing entry 230. That is, the search entry 250 is provided for quickly filtering or locating a specific speech timeline only when the viewing entry 230 is activated and the speech timelines of all speakers are presented.
  • The speech timelines 240 may also support various types of user interaction. For example, as shown in FIG. 2C, the user may click on a position 260 in the speech timeline 240-1 to indicate that the audiovisual content is expected to be played starting from that position.
  • the playback system 205 can play the audiovisual content from the time point 270 corresponding to the position 260.
  • the playback system 205 can play the audiovisual content continuously from the time point 270. For example, if the time point 270 is the moment "5 minutes and 30 seconds", the audiovisual content will be played continuously from "5 minutes and 30 seconds" until the end.
  • the playback system 205 may also play part of the audiovisual content corresponding to "Speaker 1" from time point 270. That is, the playback system 205 may only play part of the audiovisual content of "Speaker 1" corresponding to the speech timeline 240-1, and play it from time point 270, thereby achieving the effect of only listening to a specific speaker.
  • the playback system 205 can make the partial audio-visual content corresponding to "Speaker 1" play from the beginning, that is, only play the partial audio-visual content corresponding to "Speaker 1" in the audio-visual content.
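The two playback modes described above (continuous playback from the selected time point, or playback of only the selected speaker's parts) could look roughly like the sketch below. The Player interface is an assumption, and playRange is taken to queue each interval for sequential playback; none of these names come from the disclosure.

```typescript
// Hypothetical player interface; playRange is assumed to queue each interval
// for sequential playback.

interface Player {
  seek(sec: number): void;                           // jump to a time point and keep playing
  playRange(startSec: number, endSec: number): void; // queue one interval for playback
}

// Continuous playback from the selected time point until the end.
function playFrom(player: Player, timeSec: number): void {
  player.seek(timeSec);
}

// Play only the selected speaker's segments, starting from the time point.
function playSpeakerOnlyFrom(
  player: Player,
  segments: { startSec: number; endSec: number }[],
  timeSec: number,
): void {
  for (const seg of segments) {
    if (seg.endSec <= timeSec) {
      continue; // segment lies entirely before the selected time point
    }
    player.playRange(Math.max(seg.startSec, timeSec), seg.endSec);
  }
}
```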
  • the various features discussed above can be provided independently or in a combination different from that shown in FIGS. 2A to 2C .
  • For example, the timeline of the playback system can have a graphical style similar to the timeline of a conventional playback system, and does not necessarily have to indicate an audio waveform.
  • the playback system 205 can also be used to play real-time audio-visual content (e.g., live audio and video streams).
  • the speech timeline discussed above can be used to indicate the temporal distribution of the historical speech content of at least one speaker associated with the historical portion of the real-time audio-visual content.
  • For example, the speech timeline can graphically present the temporal distribution of the historical speech content of each speaker from the start of the live broadcast to the current moment.
  • the embodiments of the present disclosure may also provide a viewing interface for audio-visual content.
  • a viewing interface may be, for example, a playback interface for recorded content, or a live interface for real-time content.
  • the following uses the "meeting minutes" scenario as an example of viewing audio-visual content, but such a scenario is only exemplary, and the embodiments of the present disclosure may also be applied to other appropriate scenarios.
  • FIG. 3A shows an example viewing interface 300 according to some embodiments of the present disclosure.
  • the viewing interface 300 may include a playback control 310.
  • the playback control 310 may be implemented, for example, using the playback system 205 discussed above.
  • The playback control 310 may include, for example, a main timeline 312 and speech timelines 314-1 and 314-2 (individually or collectively referred to as speech timelines 314).
  • the viewing interface 300 further includes a text control 320 for presenting text content corresponding to the audio-visual content.
  • the text content may be generated based on the audio of the audio-visual content. Taking the audio-visual content as a meeting record as an example, the text content may be generated based on voice recognition of the audio of each speaker in the meeting. Taking the audio-visual content as a real-time live broadcast content as an example, the text content may be generated based on real-time voice recognition of each speaker.
  • The user may select the speech timeline 314-1, for example, and accordingly the text content 322 corresponding to "Speaker 1" may be adjusted to be highlighted in the text control 320 relative to the other text content 324 of other speakers.
  • Making the text content 322 highlighted relative to the other text content 324 may include, for example, increasing the prominence of the text content 322 displayed in the text control 320.
  • For example, the display style (e.g., text color, background color, boldness, font size, underline) of the text content 322 may be adjusted.
  • For example, the text content 322 may be bolded or highlighted.
  • Making the text content 322 highlighted relative to the text content 324 may also include, for example, reducing the prominence of the other text content 324 displayed in the text control.
  • For example, the display style (e.g., text color, background color, boldness, font size, underline) of the other text content 324 may be adjusted.
  • For example, the text color of the other text content 324 may be changed to gray, thereby forming a contrast with the black text content 322.
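The increase and decrease of prominence described above could be implemented, for example, by restyling the transcript paragraphs in the text control. The sketch below uses plain DOM styling with hypothetical names; it is one possible approach, not the disclosed implementation.

```typescript
// Hypothetical restyling of transcript paragraphs in the text control:
// paragraphs of the selected speaker are emphasised, others are dimmed.

interface TranscriptParagraph {
  speakerId: string;
  element: HTMLElement; // DOM node rendering this paragraph in the text control
}

function highlightSpeaker(paragraphs: TranscriptParagraph[], selectedSpeakerId: string): void {
  for (const p of paragraphs) {
    if (p.speakerId === selectedSpeakerId) {
      p.element.style.fontWeight = 'bold'; // increase prominence
      p.element.style.color = '#000000';
    } else {
      p.element.style.fontWeight = 'normal'; // reduce prominence
      p.element.style.color = '#999999';
    }
  }
}
```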
  • the user may also select a specific position in the speech timeline to trigger the audiovisual content to be played starting from the corresponding moment.
  • the text content corresponding to the specific position may also be adjusted to be highlighted in the text control 320.
  • the text content presented in the text control 320 always corresponds to the moment at which the audiovisual content is currently playing.
  • For example, the text content of the segment corresponding to that moment (e.g., the text corresponding to a certain paragraph spoken by the speaker) may be presented in the text control 320.
  • The display style of that segment of text content can also be adjusted to highlight it. For example, one or more words corresponding to the current time point can be highlighted.
  • embodiments of the present disclosure may also support sharing of audiovisual content segments based on speech timelines.
  • For example, a user may select one or more speech timelines (e.g., the speech timeline 430-1) among the multiple speech timelines in the playback control 410 (or playback system 410) for sharing.
  • Accordingly, segment audio-visual content corresponding to the speech timeline 430-1 can be generated for sharing.
  • For example, the entire speech content of "Speaker 1" can be used to generate independent audio-visual content, for example for sharing with other users or organizations.
  • The user can also select one or more time segments in the speech timelines 430-1 and 430-2, for example a time segment 440-1, a time segment 440-2, and a time segment 440-3. Accordingly, after the user clicks the sharing entry 420 (i.e., sends a sharing request), multiple discrete audio-visual content segments corresponding to the time segments 440-1, 440-2, and 440-3 can be combined to generate independent segment audio-visual content, for example for sharing with other users or organizations.
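One way to assemble the selected, possibly discontinuous time segments into a single clip specification for sharing is sketched below; the actual extraction and concatenation of the media is outside this sketch, and all names are hypothetical.

```typescript
// Hypothetical assembly of selected, possibly discontinuous time segments into a
// single ordered clip specification; the segments are sorted and overlaps merged,
// so that discontinuous parts become adjacent in the resulting clip.

interface TimeSegment {
  startSec: number;
  endSec: number;
}

function buildClipSpec(selected: TimeSegment[]): TimeSegment[] {
  const sorted = [...selected]
    .filter(s => s.endSec > s.startSec)
    .sort((a, b) => a.startSec - b.startSec);
  const merged: TimeSegment[] = [];
  for (const seg of sorted) {
    const last = merged[merged.length - 1];
    if (last !== undefined && seg.startSec <= last.endSec) {
      last.endSec = Math.max(last.endSec, seg.endSec); // merge overlapping selections
    } else {
      merged.push({ ...seg });
    }
  }
  return merged;
}
```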
  • In this way, the embodiments of the present disclosure enable users to share audio-visual content more efficiently by selecting a speech timeline or time segments, thereby improving the efficiency of audio-visual content sharing and the efficiency of information acquisition by the recipient.
  • In addition, the embodiments of the present disclosure also allow users to select non-continuous segments to create segment audio-visual content, which further improves the flexibility of sharing audio-visual content segments.
  • FIG. 5 shows a flow chart of an example process 500 for viewing audiovisual content according to some embodiments of the present disclosure.
  • Process 500 may be implemented at a suitable electronic device. Examples of such electronic devices may include, but are not limited to, desktop computers, laptop computers, smart phones, tablet computers, personal digital assistants, or smart wearable devices.
  • the electronic device provides a viewing interface for audio-visual content, where the viewing interface includes a play control for playing the audio-visual content.
  • the electronic device presents at least one speech timeline in the playback control, where the at least one speech timeline is used to indicate the temporal distribution of speech content of at least one speaker associated with the audio-visual content.
  • the viewing interface further includes a text control for presenting text content corresponding to the audiovisual content, the text content being generated based on the audio of the audiovisual content.
  • the method also includes: in response to selection of a first speech timeline in at least one speech timeline, causing first text content corresponding to a first speaker in the text content to be highlighted in the text control relative to second text content of other speakers, wherein the first speech timeline corresponds to the first speaker.
  • In some embodiments, causing the first text content corresponding to the first speaker to be highlighted in the text control relative to the second text content of other speakers includes at least one of the following: increasing the prominence of the first text content displayed in the text control; and/or reducing the prominence of the second text content displayed in the text control.
  • the method further includes: receiving a selection of a first position in a first speech timeline of at least one speech timeline; and causing text content in the text content corresponding to the first position to be highlighted in the text control.
  • presenting at least one speech timeline in the playback control includes: presenting a viewing entry for viewing the speech timeline in the playback control; and presenting at least one speech timeline in the playback control in response to a selection of the viewing entry.
  • In some embodiments, the at least one speech timeline includes multiple speech timelines.
  • The presentation order of the multiple speech timelines in the playback control is determined based on at least one of the following: the text identifiers of the multiple speakers corresponding to the multiple speech timelines, the proportion of the speech content of the multiple speakers, or the start time of the speech content of the multiple speakers.
  • presenting at least one speech timeline in a playback control includes: receiving a viewing request associated with a target speaker; and presenting a target speech timeline corresponding to the target speaker, the target speech timeline being used to indicate the temporal distribution of the target speech content of the target speaker.
  • receiving a viewing request associated with a target speaker includes: presenting multiple visual elements associated with multiple speakers associated with audio-visual content; and receiving a viewing request associated with the target speaker based on a preset operation of a target visual element corresponding to the target speaker among the multiple visual elements.
  • receiving a view request associated with the target speaker includes receiving a view request associated with the target speaker based on an input indicating the target speaker.
  • the method further includes: receiving a selection of a second position in a first speech timeline of at least one speech timeline; and causing a corresponding portion of the audiovisual content to be played from a time point corresponding to the second position.
  • the first speech timeline corresponds to the first speaker
  • causing at least part of the audio-visual content to be played from a time point corresponding to the second position includes: causing the audio-visual content to be played continuously from the time point; or causing part of the audio-visual content corresponding to the first speaker in the audio-visual content to be played from the time point.
  • the method further comprises: presenting description information of the corresponding speaker in association with at least one speech timeline, the description information being generated based on a text identifier and/or a graphic identifier of the speaker.
  • the method further includes: presenting the proportion information of the speech content of the corresponding speaker in association with at least one speech timeline.
  • the playback controls further include a main timeline for presenting graphical information corresponding to an audio waveform of the audiovisual content.
  • the audiovisual content is an audiovisual recording of an online meeting
  • the playback control further includes a main timeline, which is used to present a first interaction identifier corresponding to a first interaction behavior in the online meeting.
  • The at least one speech timeline also presents a second interaction identifier for indicating a second interaction behavior associated with the corresponding speaker in the online meeting.
  • the first interactive behavior and/or the second interactive behavior includes at least one of the following: file sharing, online chatting, and commenting.
  • the method further includes: in response to a first selection of the first interaction identifier, presenting first description information for the first interaction behavior; and/or in response to a second selection of the second interaction identifier, presenting second description information for the second interaction behavior.
  • the method further includes: receiving a selection of at least one time segment in at least one speech timeline; and based on a first sharing request associated with the at least one time segment, generating a first segment of audiovisual content corresponding to the at least one time segment for sharing.
  • the method also includes: receiving a selection of a group of speech timelines in at least one speech timeline, the group of timelines including one or more speech timelines; and based on a second sharing request associated with the group of timelines, causing a second segment of audio-visual content corresponding to the group of speech timelines to be generated for sharing.
  • the audiovisual content includes real-time audiovisual content
  • at least one speech timeline is used to indicate the temporal distribution of historical speech content of at least one speaker associated with the historical portion of the real-time audiovisual content.
  • Fig. 6 shows a schematic structural block diagram of a device 600 for viewing audio-visual content according to some embodiments of the present disclosure.
  • the apparatus 600 includes a providing module 610 configured to provide a viewing interface for audio-visual content, where the viewing interface includes a playback control for playing the audio-visual content.
  • the apparatus 600 further includes a presentation module 620 configured to present at least one speech timeline in the playback control, wherein the at least one speech timeline is used to indicate the temporal distribution of speech content of at least one speaker associated with the audio-visual content.
  • the viewing interface further includes a text control for presenting text content corresponding to the audiovisual content, the text content being generated based on the audio of the audiovisual content.
  • the presentation module 620 is further configured to: in response to selection of a first speech timeline in at least one speech timeline, cause first text content corresponding to a first speaker in the text content to be highlighted in the text control relative to second text content of other speakers, wherein the first speech timeline corresponds to the first speaker.
  • In some embodiments, causing the first text content corresponding to the first speaker to be highlighted in the text control relative to the second text content of other speakers includes at least one of the following: increasing the prominence of the first text content displayed in the text control; and/or reducing the prominence of the second text content displayed in the text control.
  • the presentation module 620 is further configured to: receive a selection of a first position in a first speech timeline of at least one speech timeline; and highlight the text content corresponding to the first position in the text content in the text control.
  • the presentation module 620 is further configured to: present a viewing entry for viewing the speech timeline in the playback control; and present at least one speech timeline in the playback control in response to a selection of the viewing entry.
  • In some embodiments, the at least one speech timeline includes multiple speech timelines.
  • the presentation order of the multiple speech timelines in the playback control is determined based on at least one of the following: text identifiers of the multiple speakers corresponding to the multiple speech timelines, the proportion of the speech content of the multiple speakers, or the start time of the speech content of the multiple speakers.
  • the presentation module 620 is further configured to: receive a viewing request associated with a target speaker; and present a target speech timeline corresponding to the target speaker, the target speech timeline being used to indicate the temporal distribution of the target speech content of the target speaker.
  • the presentation module 620 is further configured to: present multiple visual elements associated with multiple speakers associated with the audio-visual content; and receive a viewing request associated with a target speaker based on a preset operation of a target visual element corresponding to the target speaker among the multiple visual elements.
  • presentation module 620 is further configured to: based on the input indicating the target speaker, receive a viewing request associated with the target speaker.
  • the presentation module 620 is further configured to: receive a selection of a second position in a first speech timeline of at least one speech timeline; and cause the corresponding portion of the audiovisual content to be played from a time point corresponding to the second position.
  • the first speech timeline corresponds to the first speaker
  • the presentation module 620 is further configured to: cause the audiovisual content to be played continuously from the time point; or cause the part of the audiovisual content corresponding to the first speaker to be played from the time point.
  • the presentation module 620 is further configured to present description information of the corresponding speaker in association with at least one speech timeline, where the description information is generated based on a text identifier and/or a graphic identifier of the speaker.
  • the presentation module 620 is further configured to present the proportion information of the speech content of the corresponding speaker in association with at least one speech timeline.
  • the playback controls further include a main timeline for presenting graphical information corresponding to an audio waveform of the audiovisual content.
  • the audiovisual content is an audiovisual recording of an online meeting
  • the playback control further includes a main timeline, which is used to present a first interaction identifier corresponding to a first interaction behavior in the online meeting.
  • At least one speech timeline further presents a second interaction identifier for indicating a second interaction behavior associated with the corresponding speaker in the online conference.
  • the first interactive behavior and/or the second interactive behavior includes at least one of the following: file sharing, online chatting, and commenting.
  • the presentation module 620 is further configured to: present first description information for a first interaction behavior in response to a first selection of a first interaction identifier; and/or present second description information for a second interaction behavior in response to a second selection of a second interaction identifier.
  • the presentation module 620 is further configured to: receive a selection of at least one time segment in at least one speech timeline; and based on a first sharing request associated with the at least one time segment, generate a first segment of audio-visual content corresponding to the at least one time segment for sharing.
  • the presentation module 620 is further configured to: receive a selection of a group of speech timelines in the at least one speech timeline, the group of timelines including one or more speech timelines; and based on a second sharing request associated with the group of timelines, cause a second segment of audio-visual content corresponding to the group of speech timelines to be generated for sharing.
  • the audiovisual content includes real-time audiovisual content
  • at least one speech timeline is used to indicate the temporal distribution of historical speech content of at least one speaker associated with the historical portion of the real-time audiovisual content.
  • the units included in the device 600 can be implemented in various ways, including software, hardware, firmware, or any combination thereof.
  • one or more units can be implemented using software and/or firmware, such as machine executable instructions stored on a storage medium.
  • some or all of the units in the device 600 can be implemented at least in part by one or more hardware logic components.
  • exemplary types of hardware logic components include field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chips (SOCs), complex programmable logic devices (CPLDs), and the like.
  • Figure 7 shows a block diagram of a computing device/server 700 in which one or more embodiments of the present disclosure may be implemented. It should be understood that the computing device/server 700 shown in Figure 7 is merely exemplary and should not constitute any limitation on the functionality and scope of the embodiments described herein.
  • computing device/server 700 is in the form of a general computing device.
  • the components of computing device/server 700 may include, but are not limited to, one or more processors or processing units 710, memory 720, storage device 730, one or more communication units 740, one or more input devices 750, and one or more output devices 760.
  • Processing unit 710 may be an actual or virtual processor and is capable of performing various processes according to a program stored in memory 720. In a multi-processor system, multiple processing units execute computer executable instructions in parallel to improve the parallel processing capabilities of computing device/server 700.
  • the computing device/server 700 typically includes a plurality of computer storage media. Such media can be any available media accessible to the computing device/server 700, including but not limited to volatile and nonvolatile media, removable and non-removable media.
  • the memory 720 can be a volatile memory (e.g., registers, cache, random access memory (RAM)), a non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof.
  • Storage device 730 may be removable or non-removable media and may include machine-readable media such as a flash drive, a disk, or any other media that may be capable of storing information and/or data (e.g., training data for training) and may be accessed within computing device/server 700.
  • the computing device/server 700 may further include additional removable/non-removable, volatile/non-volatile storage media.
  • For example, a disk drive for reading from or writing to a removable, non-volatile disk (e.g., a "floppy disk") may be provided.
  • Likewise, an optical drive for reading from or writing to a removable, non-volatile optical disk may be provided.
  • each drive may be connected to a bus (not shown) by one or more data media interfaces.
  • the memory 720 may include a computer program product 725 having one or more program modules configured to perform various methods or actions of various embodiments of the present disclosure.
  • the communication unit 740 enables communication with other computing devices via a communication medium. Additionally, the functions of the components of the computing device/server 700 can be implemented in a single computing cluster or multiple computing machines that can communicate via a communication connection. Thus, the computing device/server 700 can operate in a networked environment using a logical connection with one or more other servers, a network personal computer (PC), or another network node.
  • Input device 750 may be one or more input devices, such as a mouse, keyboard, trackball, etc.
  • Output device 760 may be one or more output devices, such as a display, speaker, printer, etc.
  • Computing device/server 700 may also communicate with one or more external devices (not shown) as needed, such as storage devices, display devices, etc., with one or more devices that enable users to interact with computing device/server 700, or with any device that enables computing device/server 700 to communicate with one or more other computing devices (e.g., network card, modem, etc.) through communication unit 740. Such communication may be performed via an input/output (I/O) interface (not shown).
  • a computer-readable storage medium on which one or more computer instructions are stored, wherein the one or more computer instructions are executed by a processor to implement the method described above.
  • These computer-readable program instructions can be provided to a processing unit of a general-purpose computer, a special-purpose computer, or other programmable data processing device, thereby producing a machine, so that when these instructions are executed by the processing unit of the computer or other programmable data processing device, a device that implements the functions/actions specified in one or more boxes in the flowchart and/or block diagram is generated.
  • These computer-readable program instructions can also be stored in a computer-readable storage medium, and these instructions cause the computer, programmable data processing device, and/or other equipment to work in a specific manner, so that the computer-readable medium storing the instructions includes a manufactured product, which includes instructions for implementing various aspects of the functions/actions specified in one or more boxes in the flowchart and/or block diagram.
  • Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device so that a series of operational steps are performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, thereby causing the instructions executed on the computer, other programmable data processing apparatus, or other device to implement the functions/actions specified in one or more boxes in the flowchart and/or block diagram.
  • Each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions, which contains one or more executable instructions for implementing the specified logical function.
  • In some alternative implementations, the functions noted in the blocks may also occur in an order different from that noted in the figures. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
  • Each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Library & Information Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

According to embodiments of the present disclosure, a method and apparatus for viewing audiovisual content, a device, and a storage medium are provided. The method comprises: providing a viewing interface for audiovisual content, wherein the viewing interface comprises a playback control for playing the audiovisual content; and presenting at least one speech timeline in the playback control, wherein the at least one speech timeline is used for indicating the temporal distribution of the speech content of at least one speaker associated with the audiovisual content. In this way, embodiments of the present disclosure help a user to obtain information about the distribution of each speaker's speech time in the audiovisual content, thereby helping the user to obtain the required information more conveniently.

Description

用于查看视听内容的方法、装置、设备和存储介质Method, apparatus, device and storage medium for viewing audiovisual content
本申请要求2022年10月31日递交的、标题为“用于查看视听内容的方法、装置、设备和存储介质”、申请号为202211352393.3的中国发明专利申请的优先权,该申请的全部内容通过引用结合在本申请中。This application claims priority to the Chinese invention patent application entitled “Methods, devices, equipment and storage media for viewing audiovisual content” filed on October 31, 2022 and application number 202211352393.3, the entire contents of which are incorporated by reference into this application.
技术领域Technical Field
本公开的示例实施例总体涉及计算机领域,特别地涉及用于查看视听内容的方法、装置、设备和计算机可读存储介质。Example embodiments of the present disclosure relate generally to the field of computers, and more particularly to methods, devices, apparatuses, and computer-readable storage media for viewing audiovisual content.
背景技术Background technique
随着计算机技术的发展,互联网已经成为人们获取和分享内容的主要平台。例如,人们可以利用互联网来发布各式各样的内容,或者接收其它用户分享的内容。With the development of computer technology, the Internet has become the main platform for people to obtain and share content. For example, people can use the Internet to publish a variety of content, or receive content shared by other users.
在基于互联网的内容分享中,视听内容(例如,音频内容或视频内容)的分享已经成为最主要的形式之一。人们例如可以利用播放器来播放其它用户所分享的一段演讲或者某个会议的视频或音频记录。然而,在这样的播放过程中,人们难以快速地定位某个特定的发言方在这样的视频或音频记录中的对应部分。In the content sharing based on the Internet, the sharing of audiovisual content (e.g., audio content or video content) has become one of the most important forms. For example, people can use a player to play a speech or a video or audio recording of a meeting shared by other users. However, during such a playback process, it is difficult for people to quickly locate the corresponding part of a specific speaker in such a video or audio recording.
发明内容Summary of the invention
In a first aspect of the present disclosure, a method for viewing audiovisual content is provided. The method includes: receiving a selection of a plurality of text segments, the plurality of text segments corresponding to a plurality of portions of target audiovisual content, the plurality of portions including at least a first portion and a second portion that are discontinuous in the target audiovisual content; causing segment audiovisual content to be created based at least on the plurality of portions of the target audiovisual content, wherein the first portion and the second portion are continuous in the segment audiovisual content; and presenting a sharing entry for sharing the segment audiovisual content.
In a second aspect of the present disclosure, an apparatus for viewing audiovisual content is provided. The apparatus includes: a receiving module configured to receive a selection of a plurality of text segments, the plurality of text segments corresponding to a plurality of portions of target audiovisual content, the plurality of portions including at least a first portion and a second portion that are discontinuous in the target audiovisual content; a control module configured to cause segment audiovisual content to be created based at least on the plurality of portions of the target audiovisual content, wherein the first portion and the second portion are continuous in the segment audiovisual content; and a presentation module configured to present a sharing entry for sharing the segment audiovisual content.
In a third aspect of the present disclosure, an electronic device is provided. The device includes at least one processing unit and at least one memory, the at least one memory being coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit. The instructions, when executed by the at least one processing unit, cause the device to perform the method of the first aspect.
In a fourth aspect of the present disclosure, a computer-readable storage medium is provided. A computer program is stored on the medium, and the program, when executed by a processor, implements the method of the first aspect.
In a fifth aspect of the present disclosure, a playback system is provided. The playback system includes: a main timeline that indicates at least a current playback position of audiovisual content; and at least one speech timeline that indicates the temporal distribution of the speech content of at least one speaker associated with the audiovisual content.
It should be understood that the content described in this Summary is not intended to identify key or essential features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements, in which:
FIG. 1 shows a schematic diagram of a conventional audiovisual content player;
FIGS. 2A to 2C show schematic diagrams of example playback systems according to some embodiments of the present disclosure;
FIGS. 3A and 3B show example viewing interfaces for audiovisual content according to some embodiments of the present disclosure;
FIGS. 4A and 4B show schematic diagrams of sharing segment audiovisual content according to some embodiments of the present disclosure;
FIG. 5 shows a flowchart of an example process for viewing audiovisual content according to some embodiments of the present disclosure;
FIG. 6 shows a block diagram of an apparatus for viewing audiovisual content according to some embodiments of the present disclosure; and
FIG. 7 shows a block diagram of a device capable of implementing various embodiments of the present disclosure.
DETAILED DESCRIPTION
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as being limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only and are not intended to limit the scope of protection of the present disclosure.
In the description of the embodiments of the present disclosure, the term "including" and similar terms should be understood as open-ended inclusion, that is, "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The term "some embodiments" should be understood as "at least some embodiments". Other explicit and implicit definitions may also be included below.
As discussed above, people can use a player to obtain audiovisual content. FIG. 1 shows a schematic diagram of a conventional audiovisual content player 100. As shown in FIG. 1, in the player 100 a user typically has to drag a timeline control to locate the moment at which playback is desired.
Such playback control, however, is inefficient. In the example of FIG. 1, for instance, the audiovisual content is more than one hour long, which makes it difficult for the user to quickly locate the desired playback position on the timeline.
This is especially pronounced when reviewing audiovisual content such as meetings, lectures, or online classes. Such scenarios usually involve multiple speakers, and users may wish to quickly locate the portions in which a particular speaker is speaking.
Embodiments of the present disclosure provide a playback system for audiovisual content (audio content or video content). The system may include a main timeline that indicates at least the current playback position of the audiovisual content. In addition, the system may include at least one speech timeline that indicates the temporal distribution of the speech content of at least one speaker associated with the audiovisual content.
Embodiments of the present disclosure further provide a solution for viewing audiovisual content. According to the solution, a viewing interface for the audiovisual content may be provided, the viewing interface including a playback control for playing the audiovisual content. Further, at least one speech timeline may be presented in the playback control, the at least one speech timeline indicating the temporal distribution of the speech content of at least one speaker associated with the audiovisual content.
In this way, embodiments of the present disclosure can provide speech timelines in the playback system or playback control, showing how the speech content of the speakers associated with the audiovisual content is distributed over time. Implementations of the present disclosure thus make it convenient for users to view the portions corresponding to a particular speaker, thereby improving the efficiency with which users obtain the content they want.
Example solutions according to embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Example Playback System
In some embodiments, timelines can be used to provide richer information about the audiovisual content.
FIG. 2A shows a schematic diagram 200A of an example playback system 205 according to some embodiments of the present disclosure. As shown in FIG. 2A, the playback system 205 (also referred to as the player 205 or the playback control 205) may be used to play corresponding audiovisual content. The playback system 205 may be provided by a suitable electronic device; examples of such electronic devices include, but are not limited to, desktop computers, laptop computers, smartphones, tablet computers, personal digital assistants, and smart wearable devices.
In some embodiments, the audiovisual content may include audio/video files local to the playback system 205, audio/video files stored in the cloud, or audio/video streams. Such a stream may be, for example, a playback stream of already recorded audiovisual content (for example, a meeting recording), or a live stream of live audiovisual content.
Main Timeline
As shown in FIG. 2A, the playback system 205 may include a main timeline 210. In some embodiments, the main timeline 210 may indicate the current playback position, that is, the playback progress, of the audiovisual content. For example, the main timeline 210 may include a playback position indicator 215 that indicates the point in time at which the audiovisual content is currently being played.
For example, if the audiovisual content to be played has already been recorded, its length is fixed, and the full length of the timeline may correspond to the total duration of the content. The playback position indicator 215 may then be placed according to the correspondence between position and playback time.
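For illustration only, a minimal sketch of this position-to-time correspondence is given below. The `MainTimeline` shape and the function names are assumptions introduced for the example and are not part of the disclosed embodiments.

```typescript
// Hypothetical helper mapping between a pixel offset on the main timeline
// and a playback time of recorded content; names are illustrative only.
interface MainTimeline {
  widthPx: number;      // rendered width of the timeline, in pixels
  durationSec: number;  // total duration of the recorded audiovisual content
}

// Convert a click/drag position on the timeline into a playback time.
function positionToTime(t: MainTimeline, offsetPx: number): number {
  const clamped = Math.min(Math.max(offsetPx, 0), t.widthPx);
  return (clamped / t.widthPx) * t.durationSec;
}

// Convert the current playback time into the position of indicator 215.
function timeToPosition(t: MainTimeline, timeSec: number): number {
  const clamped = Math.min(Math.max(timeSec, 0), t.durationSec);
  return (clamped / t.durationSec) * t.widthPx;
}
```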
In still other examples, if the audiovisual content to be played is live content whose duration is still increasing, the playback position indicator 215 may, for example, always be placed at the far right of the main timeline 210. If the user wishes to review specific content that has already been broadcast, the user may jump back to the corresponding point in time by moving the playback position indicator 215.
In some embodiments, as shown in FIG. 2A, the main timeline 210 may also present graphical information corresponding to the audio waveform of the audiovisual content. In this way, the user can more easily see which parts of the audiovisual content deserve attention and which parts, for example those with little audio activity, can be skipped for the time being. Such a playback system 205 can thereby improve the efficiency with which users obtain content.
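One common way to produce such waveform graphics is to reduce the decoded audio samples to one amplitude value per timeline pixel. The following is a non-limiting sketch under the assumption that decoded samples are available; it is not a required implementation.

```typescript
// Downsample decoded audio samples into one peak amplitude per timeline
// pixel column, which can then be drawn as the waveform graphic.
function waveformBuckets(samples: Float32Array, bucketCount: number): number[] {
  const bucketSize = Math.ceil(samples.length / bucketCount);
  const peaks: number[] = [];
  for (let b = 0; b < bucketCount; b++) {
    let peak = 0;
    const start = b * bucketSize;
    const end = Math.min(start + bucketSize, samples.length);
    for (let i = start; i < end; i++) {
      peak = Math.max(peak, Math.abs(samples[i]));
    }
    peaks.push(peak); // amplitude in the range 0..1 for this pixel column
  }
  return peaks;
}
```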
In some embodiments, the audiovisual content may be, for example, a recording of an online meeting. Accordingly, as shown in FIG. 2A, the main timeline 210 may also present interaction markers 220 corresponding to interaction behaviors in the online meeting.
Such an interaction marker 220 may be placed at the corresponding position on the main timeline 210 to indicate that the corresponding interaction occurred at that moment. In some embodiments, different graphics of the interaction markers 220 may correspond to different interaction behaviors.
In some embodiments, the main timeline 210 may include, for example, an interaction marker 220 indicating file sharing in the online meeting. Accordingly, this interaction marker 220 may have a graphic corresponding to the format of the shared file, for example a thumbnail of the shared file.
In some embodiments, when the user selects this interaction marker 220, the playback system 205 may guide the user to obtain descriptive information about the file sharing. For example, when the user hovers the mouse over the interaction marker 220, the playback system 205 may show information about the shared file, such as its name, format, size, and sharer, in a floating window. In another example, if the user clicks the interaction marker 220, the playback system 205 may guide the user to the content of the shared file, for example by navigating the user to an online viewing interface for the file.
In some embodiments, the main timeline 210 may include, for example, an interaction marker 220 indicating an online chat in the online meeting. An online chat here refers to any suitable chat based on text, emoticons, images, and/or audio, for example one conducted with the instant messaging tool of the online meeting. Accordingly, the graphic of this interaction marker 220 may be determined based on the content of the online chat. Alternatively, the graphic of the interaction marker 220 may be determined based on the graphical identifiers (for example, avatars) of the users participating in the chat.
In some embodiments, when the user selects this interaction marker 220, the playback system 205 may guide the user to obtain descriptive information about the online chat. For example, when the user hovers the mouse over the interaction marker 220, the playback system 205 may show information about the online chat, such as its participants and content, in a floating window. In another example, if the user clicks the interaction marker 220, the playback system 205 may guide the user to the complete content of the chat, for example by navigating the user to an interface for viewing the chat content of the meeting.
In some embodiments, the main timeline 210 may include, for example, an interaction marker 220 indicating a comment in the online meeting. A comment here may include any suitable comment based on text, emoticons, images, and/or audio. For example, a user's "like" may also be understood as a comment on the corresponding content. Accordingly, the graphic of this interaction marker 220 may be determined based on the content and/or type of the comment. For example, for an emoticon-based comment, the graphical representation of the interaction marker 220 may be generated based on that emoticon.
In some embodiments, when the user selects this interaction marker 220, the playback system 205 may guide the user to obtain descriptive information about the comment. For example, when the user hovers the mouse over the interaction marker 220, the playback system 205 may show information about the comment, such as the commenter, the time of the comment, and replies to the comment, in a floating window. In another example, if the user clicks the interaction marker 220, the playback system 205 may navigate the user to an interface for viewing comments, where richer information about the comment is available.
In some embodiments, the main timeline 210 may also present speaker information indicating the temporal distribution of the speech content of at least one speaker associated with the audiovisual content. In this case, the main timeline 210 may itself be regarded as a type of speech timeline.
For example, the main timeline 210 may assign a corresponding color mark to each speaker. The color distribution on the main timeline 210 then indicates which speaker or speakers a given time period corresponds to. It should be understood that other suitable styles may also be used to indicate, on the main timeline 210, the temporal distribution of the speakers' speech content.
Speech Timeline
In some embodiments, as shown in FIG. 2A, the playback system 205 may further include a viewing entry 230 for viewing the speech timelines. In some embodiments, the viewing entry 230 may show, for example, graphical identifiers (such as avatars) of one or more speakers associated with the audiovisual content.
After receiving the user's selection of the viewing entry 230, the playback system 205 may, as shown in FIG. 2B, present a speech timeline 240-1 and a speech timeline 240-2 (individually or collectively referred to as speech timelines 240).
In some embodiments, a speech timeline 240 may indicate the temporal distribution of the speech content of at least one speaker associated with the audiovisual content. For example, if the speaker spoke at a given moment, the speech timeline 240 may be filled with a first graphic at that moment; conversely, if the speaker did not speak at that moment, the speech timeline 240 may be filled with a second graphic. The user can thus see at a glance when each speaker spoke.
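As a minimal, non-limiting sketch of this idea, a speaker's speech content can be represented as time intervals, and each pixel column of the speech timeline can then be filled with the first or second graphic depending on whether the column overlaps an interval. The `SpeechInterval` shape and function name below are assumptions for illustration only.

```typescript
// Illustrative sketch: a speaker's speech content as time intervals, and a
// per-pixel fill decision for a speech timeline 240.
interface SpeechInterval { startSec: number; endSec: number; }

function timelineFill(
  intervals: SpeechInterval[],
  durationSec: number,
  widthPx: number,
): boolean[] {
  // true  -> fill this pixel column with the "speaking" graphic
  // false -> fill it with the "not speaking" graphic
  const fill = new Array<boolean>(widthPx).fill(false);
  for (const { startSec, endSec } of intervals) {
    const from = Math.max(0, Math.floor((startSec / durationSec) * widthPx));
    const to = Math.min(widthPx, Math.ceil((endSec / durationSec) * widthPx));
    for (let x = from; x < to; x++) fill[x] = true;
  }
  return fill;
}
```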
In some embodiments, as shown in FIG. 2B, a speech timeline 240 may similarly present graphical information corresponding to the audio waveform of the portion of the audiovisual content associated with the speaker. In this way, the user can also see at a glance at which moments the speaker did not speak and at which moments the speaker spoke frequently, which further helps the user quickly find the desired content.
In some embodiments, the number of speech timelines 240 may be determined based on the number of speakers participating in the audiovisual content. In some embodiments, the number of such speakers may be determined by the number of terminals participating in the online meeting. For example, multiple meeting participants may join the online meeting through the same terminal (or with the same account); such participants may then be treated as a single speaker, even though they may include several different people speaking.
In some embodiments, the number of speakers may be determined based on the number of speakers detected in the audiovisual content. It should be understood that any suitable speaker recognition technique may be used to identify the corresponding speakers in the audiovisual content; the present disclosure is not intended to be limited in this respect.
In some embodiments, after receiving the selection of the viewing entry 230, the playback system 205 may present the speech timelines 240 corresponding to all speakers of the audiovisual content. Taking FIG. 2B as an example, the audiovisual content may involve two speakers ("Speaker 1" and "Speaker 2"). The order in which the corresponding speech timelines 240-1 and 240-2 are presented in the playback system 205 may be determined based on information about the speakers.
In some examples, the presentation order of the speech timelines may be determined based on the speakers' text identifiers. Such a text identifier may be, for example, the speaker's user name or nickname, and the presentation order may follow the ordering of these text identifiers.
In still other examples, the presentation order of the speech timelines may be determined based on each speaker's share of the speech content. For example, if the share of "Speaker 1" is "70%", which is greater than the "30%" share of "Speaker 2", the speech timeline 240-1 may be presented ahead of the speech timeline 240-2.
In still other examples, the presentation order of the speech timelines may be determined based on the start time of each speaker's speech content. For example, the speech content of "Speaker 1" may start in the first minute after the meeting begins, which is earlier than the start time of the speech content of "Speaker 2", for example the third minute after the meeting begins. Accordingly, the speech timeline 240-1 may be presented ahead of the speech timeline 240-2.
It should be understood that other suitable ordering strategies may also be used to order the multiple speech timelines 240, so that users can obtain the desired content more efficiently.
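For illustration, one possible way to combine the ordering criteria discussed above is sketched below. The `Speaker` shape and the particular priority among the criteria are assumptions; any of the strategies may equally be used on its own.

```typescript
// Illustrative comparator for ordering speech timelines by speaker.
interface Speaker {
  name: string;           // text identifier (user name or nickname)
  speechRatio: number;    // share of speech content, e.g. 0.7 for 70%
  firstSpeechSec: number; // start time of the speaker's first speech
}

function orderTimelines(speakers: Speaker[]): Speaker[] {
  return [...speakers].sort((a, b) =>
    b.speechRatio - a.speechRatio ||        // larger share of speech first
    a.firstSpeechSec - b.firstSpeechSec ||  // earlier first speech next
    a.name.localeCompare(b.name));          // finally, by text identifier
}
```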
In some embodiments, the playback system 205 may also present, in association with a speech timeline 240, descriptive information about the corresponding speaker. For example, the speech timeline 240-1 may carry the text identifier (for example, user name or nickname) of the corresponding speaker. Alternatively, the speech timeline 240-1 may carry the graphical identifier (for example, avatar) of the corresponding speaker.
In some embodiments, the playback system 205 may also present, in association with at least one speech timeline, the share of speech content of the corresponding speaker. For example, the speech timeline 240-1 may show the share "XX%" of the speech content of "Speaker 1".
In some embodiments, similarly to the main timeline 210, a speech timeline 240 may also present interaction markers (not shown in FIG. 2B) indicating interaction behaviors associated with the corresponding speaker in the online meeting.
In some embodiments, such interaction behaviors are the interactions in which the corresponding speaker took part, for example the file sharing, online chatting, or commenting discussed above. The interaction logic of the markers presented on a speech timeline 240 may be similar to that of the interaction markers 220 discussed above, and is not described again here.
In some embodiments, the speech timeline 240-1 may also be automatically collapsed or expanded in response to the user's selection of the viewing entry 230. Alternatively, the playback system may by default always provide the speech timelines of all speakers, independently of any selection of the viewing entry 230.
In some embodiments, the playback system 205 may also provide a search entry 250 for the speech timelines. With the search entry 250, the user can initiate a viewing request associated with a specific speaker.
In some embodiments, after receiving a selection of the search entry 250, the playback system 205 may present visual elements associated with all speakers associated with the audiovisual content. Such visual elements may include, for example, the speakers' text identifiers (for example, user names or nicknames) or graphical identifiers (for example, avatars).
Further, the playback system 205 may receive the user's selection of a particular visual element among the multiple visual elements, and thereby determine that the user wishes to view the speech timeline of the speaker corresponding to the selected visual element. For example, the user may click the avatar of "Speaker 1", so that the playback system 205 presents only the speech timeline 240-1 corresponding to "Speaker 1" and does not present the speech timeline 240-2.
As another example, the user may also provide input indicating the target speaker through the search entry 250. For example, the user may enter at least part of the nickname or user name of "Speaker 1", which is automatically matched to "Speaker 1", so that the playback system 205 presents the speech timeline 240-1 corresponding to "Speaker 1" and does not present the speech timeline 240-2.
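A minimal sketch of such partial matching is shown below. It is purely illustrative; the disclosure does not prescribe any particular matching algorithm, and the function name is an assumption.

```typescript
// Illustrative matching of a partial name typed into the search entry 250
// against the speakers of the audiovisual content.
function matchSpeakers(query: string, speakerNames: string[]): string[] {
  const q = query.trim().toLowerCase();
  if (q.length === 0) return speakerNames; // empty query: keep everyone
  return speakerNames.filter((name) => name.toLowerCase().includes(q));
}

// e.g. matchSpeakers("spea", ["Speaker 1", "Speaker 2"]) keeps both entries,
// while matchSpeakers("1", ["Speaker 1", "Speaker 2"]) keeps only "Speaker 1".
```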
In some embodiments, the search entry 250 may be provided independently of the viewing entry 230. For example, even when the viewing entry 230 has not been selected, the playback system 205 may still provide the search entry 250 for viewing a specific speaker.
Alternatively, the search entry 250 may be provided only in dependence on the viewing entry 230. That is, only when the viewing entry 230 has been activated and the speech timelines of all speakers are presented is the search entry 250 provided, for quickly filtering or locating a specific speech timeline.
In some embodiments, the speech timelines 240 may also support various types of user interaction. For example, as shown in FIG. 2C, the user may click a position 260 on the speech timeline 240-1 to indicate that playback of the audiovisual content is desired from that position.
Accordingly, the playback system 205 may cause the audiovisual content to be played from the time point 270 corresponding to the position 260. In some embodiments, the playback system 205 may cause the audiovisual content to be played continuously from the time point 270. For example, if the time point 270 is "5 minutes 30 seconds", the audiovisual content is played continuously from "5 minutes 30 seconds" until the end.
Alternatively, the playback system 205 may cause the portion of the audiovisual content corresponding to "Speaker 1" to be played from the time point 270. That is, the playback system 205 may play only the portion of the audiovisual content of "Speaker 1" corresponding to the speech timeline 240-1, starting from the time point 270, thereby achieving the effect of listening to a specific speaker only.
In some embodiments, if the user performs a preset operation on the speech timeline 240-1 (for example, double-clicking it), the playback system 205 may cause the portion of the audiovisual content corresponding to "Speaker 1" to be played from the beginning, that is, play only the portion of the audiovisual content that corresponds to "Speaker 1".
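The following is a sketch of the "listen to a specific speaker only" behavior described above. The `PlayerLike` interface is hypothetical and merely stands in for whatever playback API is actually used; the interval shape is likewise an assumption.

```typescript
// Illustrative sketch: play only the portions of the audiovisual content
// that belong to one speaker, starting from a chosen time point 270.
interface SpeakerInterval { startSec: number; endSec: number; }

interface PlayerLike {
  // Hypothetical API: plays the content between two time points, resolving
  // when playback of that range has finished.
  playRange(startSec: number, endSec: number): Promise<void>;
}

async function playSpeakerOnly(
  player: PlayerLike,
  intervals: SpeakerInterval[], // the speaker's speech intervals, sorted
  fromSec: number,              // the time point selected by the user
): Promise<void> {
  for (const { startSec, endSec } of intervals) {
    if (endSec <= fromSec) continue; // skip intervals before the chosen point
    await player.playRange(Math.max(startSec, fromSec), endSec);
  }
}
```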
It should be understood that, although various examples of the playback system have been discussed above with reference to FIGS. 2A to 2C for the purpose of description, the various features discussed above (for example, provision of the audio waveform, provision of interaction markers, provision of speech timelines, the style of the speech timelines, interaction with the speech timelines, and so on) may be provided independently or in combinations different from those shown in FIGS. 2A to 2C. For example, where the playback system provides the interaction-marker feature, its timeline may have a graphical style similar to the timeline of a conventional playback system and need not indicate the audio waveform.
In addition, although the examples shown in FIGS. 2A to 2C concern the review of already recorded content, the playback system 205 may also be used to play real-time audiovisual content (for example, a live audio/video stream). Accordingly, the speech timelines discussed above may indicate the temporal distribution of the historical speech content of at least one speaker associated with the historical portion of the real-time audiovisual content. For example, a speech timeline may graphically present how each speaker's historical speech content is distributed over time from the start of the live broadcast to the current moment.
Example Viewing Interface
In some embodiments, embodiments of the present disclosure may also provide a viewing interface for audiovisual content. Such a viewing interface may be, for example, a review interface for recorded content or a live interface for real-time content. It should be understood that, merely for convenience of description, the "meeting recording" scenario is used below as an example of viewing audiovisual content; this scenario is only exemplary, and embodiments of the present disclosure may also be applied to other suitable scenarios.
FIG. 3A shows an example viewing interface 300 according to some embodiments of the present disclosure. As shown in FIG. 3A, the viewing interface 300 may include a playback control 310. The playback control 310 may be implemented, for example, with the playback system 205 discussed above. As shown in FIG. 3A, the playback control 310 may include, for example, a main timeline 312 and speech timelines 314-1 and 314-2 (individually or collectively referred to as speech timelines 314).
In some embodiments, the viewing interface 300 further includes a text control 320 for presenting text content corresponding to the audiovisual content. In some embodiments, this text content may be generated based on the audio of the audiovisual content. Where the audiovisual content is a meeting recording, the text content may, for example, be generated by performing speech recognition on the audio of each speaker's speech in the meeting. Where the audiovisual content is real-time live content, the text content may, for example, be generated by real-time speech recognition of each speaker.
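For illustration, such speaker-attributed text content could be organized as time-aligned segments, as sketched below. The field names are assumptions; any speech-recognition pipeline that produces per-speaker, time-aligned text would serve the same purpose.

```typescript
// Illustrative shape of the text content shown in the text control 320:
// one entry per recognized utterance, attributed to a speaker and aligned
// to a time range of the audiovisual content.
interface TranscriptSegment {
  speakerId: string; // e.g. "speaker-1"
  startSec: number;  // where this utterance begins in the content
  endSec: number;    // where it ends
  text: string;      // speech-recognition result for this utterance
}

// Find the segment that corresponds to the current playback time.
function segmentAt(
  transcript: TranscriptSegment[],
  timeSec: number,
): TranscriptSegment | undefined {
  return transcript.find((s) => timeSec >= s.startSec && timeSec < s.endSec);
}
```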
In some embodiments, as shown in FIG. 3B, the user may, for example, select the speech timeline 314-1; accordingly, the text content 322 corresponding to "Speaker 1" may be adjusted to be emphasized in the text control 320 relative to the other text content 324 of the other speakers.
In some embodiments, emphasizing the text content 322 relative to the other text content 324 may include, for example, increasing the prominence with which the text content 322 is displayed in the text control 320. For example, the display style of the text content 322 (for example, text color, background color, boldness, font size, underlining) may be adjusted to stand out more, for instance by bolding or highlighting the text content 322.
Alternatively, emphasizing the text content 322 relative to the text content 324 may also include reducing the prominence with which the other text content 324 is displayed in the text control. For example, the display style of the text content 324 (for example, text color, background color, boldness, font size, underlining) may be adjusted to be less prominent. For example, as shown in FIG. 3B, the text color of the other text content 324 may be changed to gray, forming a contrast with the black text content 322.
As discussed with reference to FIG. 2C, the user may also select a specific position on a speech timeline to trigger playback of the audiovisual content from the corresponding moment. Alternatively or additionally, when a specific position on a speech timeline is selected, the text content corresponding to that position may also be adjusted to be presented prominently in the text control 320.
For example, the text content presented in the text control 320 always corresponds to the moment currently being played in the audiovisual content. When the user selects a specific moment on a speech timeline, the piece of text content corresponding to that moment (for example, the text corresponding to a passage spoken by the speaker) may be moved to the top of the text control 320 so as to be presented prominently. Alternatively or additionally, the display style of that piece of text content may be adjusted to present it prominently. For example, one or more words corresponding to that time point may be highlighted.
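One possible sketch of this emphasis logic is shown below. The class names and the exact styling are illustrative assumptions; raising the prominence of the selected speaker's text, lowering that of the others, or both achieves the described effect.

```typescript
// Illustrative sketch of the emphasis adjustment: given the speaker whose
// timeline was selected, decide how each transcript segment is styled in
// the text control 320.
interface Segment { speakerId: string; text: string; }

function styleFor(segment: Segment, selectedSpeakerId: string | null): string {
  if (selectedSpeakerId === null) return "segment--normal";
  return segment.speakerId === selectedSpeakerId
    ? "segment--emphasized" // e.g. bold or highlighted text content 322
    : "segment--dimmed";    // e.g. gray text for the other content 324
}
```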
Sharing of Segment Audiovisual Content
In some embodiments, embodiments of the present disclosure may also support sharing segment audiovisual content based on speech timelines. As shown in FIG. 4A, the user may select one or more of the multiple speech timelines in the playback control 410 (or the playback system 410), for example the speech timeline 430-1, for sharing.
After this selection is received, the segment audiovisual content corresponding to the speech timeline 430-1 may be generated for sharing. Taking FIG. 4A as an example, after the user selects the speech timeline 430-1 and clicks the sharing entry 420 (that is, issues a sharing request), the entire speech content of "Speaker 1" may be used to generate standalone segment audiovisual content, for example for sharing with other users or organizations.
As another example, as shown in FIG. 4B, the user may also select one or more time segments on the speech timelines 430-1 and 430-2, for example a time segment 440-1, a time segment 440-2, and a time segment 440-3. Accordingly, after the user clicks the sharing entry 420 (that is, issues a sharing request), the multiple discrete pieces of audiovisual content corresponding to the time segments 440-1, 440-2, and 440-3 may be combined to generate standalone segment audiovisual content, for example for sharing with other users or organizations.
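As a non-limiting sketch, the selected time segments could be normalized into an ordered, merged list that a downstream cutting and concatenation step then renders as the shared clip. The merging policy shown here is an assumption, not a required behavior.

```typescript
// Illustrative sketch: turn the selected time segments into a single clip
// description that a media pipeline could render as standalone segment
// audiovisual content.
interface TimeSegment { startSec: number; endSec: number; }

function buildClip(selected: TimeSegment[]): TimeSegment[] {
  // Sort by start time and merge overlapping selections, so the originally
  // discontinuous parts play back to back in the resulting clip.
  const sorted = [...selected].sort((a, b) => a.startSec - b.startSec);
  const merged: TimeSegment[] = [];
  for (const seg of sorted) {
    const last = merged[merged.length - 1];
    if (last && seg.startSec <= last.endSec) {
      last.endSec = Math.max(last.endSec, seg.endSec);
    } else {
      merged.push({ ...seg });
    }
  }
  return merged; // hand off to a cutting/concatenation step, e.g. an ffmpeg job
}
```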
In this way, embodiments of the present disclosure enable users to share segment audiovisual content more efficiently by selecting speech timelines or time segments, which improves the efficiency of sharing audiovisual content and the efficiency with which the recipients obtain information. In addition, embodiments of the present disclosure allow users to select discontinuous segments for creation, which further improves the flexibility of sharing segment audiovisual content.
Example Process
FIG. 5 shows a flowchart of an example process 500 for viewing audiovisual content according to some embodiments of the present disclosure. The process 500 may be implemented at a suitable electronic device. Examples of such electronic devices include, but are not limited to, desktop computers, laptop computers, smartphones, tablet computers, personal digital assistants, and smart wearable devices.
As shown in FIG. 5, at block 510 the electronic device provides a viewing interface for audiovisual content, the viewing interface including a playback control for playing the audiovisual content.
At block 520, the electronic device presents at least one speech timeline in the playback control, the at least one speech timeline indicating the temporal distribution of the speech content of at least one speaker associated with the audiovisual content.
In some embodiments, the viewing interface further includes a text control for presenting text content corresponding to the audiovisual content, the text content being generated based on the audio of the audiovisual content.
In some embodiments, the method further includes: in response to a selection of a first speech timeline of the at least one speech timeline, causing first text content in the text content that corresponds to a first speaker to be emphasized in the text control relative to second text content of other speakers, wherein the first speech timeline corresponds to the first speaker.
In some embodiments, causing the first text content corresponding to the first speaker to be emphasized in the text control relative to the second text content of the other speakers includes: increasing the prominence with which the first text content is displayed in the text control; and/or reducing the prominence with which the second text content is displayed in the text control.
In some embodiments, the method further includes: receiving a selection of a first position in a first speech timeline of the at least one speech timeline; and causing the text content corresponding to the first position to be presented prominently in the text control.
In some embodiments, presenting the at least one speech timeline in the playback control includes: presenting, in the playback control, a viewing entry for viewing the speech timelines; and in response to a selection of the viewing entry, presenting the at least one speech timeline in the playback control.
In some embodiments, the at least one speech timeline includes a plurality of speech timelines, and the order in which the plurality of speech timelines are presented in the playback control is determined based on at least one of: text identifiers of a plurality of speakers corresponding to the plurality of speech timelines, the proportions of the speech content of the plurality of speakers, or the start times of the speech content of the plurality of speakers.
In some embodiments, presenting the at least one speech timeline in the playback control includes: receiving a viewing request associated with a target speaker; and presenting a target speech timeline corresponding to the target speaker, the target speech timeline indicating the temporal distribution of the target speech content of the target speaker.
In some embodiments, receiving the viewing request associated with the target speaker includes: presenting a plurality of visual elements associated with a plurality of speakers associated with the audiovisual content; and receiving the viewing request associated with the target speaker based on a preset operation on a target visual element, among the plurality of visual elements, that corresponds to the target speaker.
In some embodiments, receiving the viewing request associated with the target speaker includes: receiving the viewing request associated with the target speaker based on an input indicating the target speaker.
In some embodiments, the method further includes: receiving a selection of a second position in a first speech timeline of the at least one speech timeline; and causing the corresponding portion of the audiovisual content to be played from the time point corresponding to the second position.
In some embodiments, the first speech timeline corresponds to a first speaker, and causing at least part of the audiovisual content to be played from the time point corresponding to the second position includes: causing the audiovisual content to be played continuously from the time point; or causing the portion of the audiovisual content corresponding to the first speaker to be played from the time point.
In some embodiments, the method further includes: presenting, in association with the at least one speech timeline, descriptive information about the corresponding speaker, the descriptive information being generated based on a text identifier and/or a graphical identifier of the speaker.
In some embodiments, the method further includes: presenting, in association with the at least one speech timeline, the proportion of the speech content of the corresponding speaker.
In some embodiments, the playback control further includes a main timeline for presenting graphical information corresponding to the audio waveform of the audiovisual content.
In some embodiments, the audiovisual content is an audiovisual recording of an online meeting, and the playback control further includes a main timeline for presenting a first interaction marker corresponding to a first interaction behavior in the online meeting.
In some embodiments, the at least one speech timeline also presents a second interaction marker indicating a second interaction behavior associated with the corresponding speaker in the online meeting.
In some embodiments, the first interaction behavior and/or the second interaction behavior include at least one of: file sharing, online chatting, and commenting.
In some embodiments, the method further includes: in response to a first selection of the first interaction marker, presenting first descriptive information about the first interaction behavior; and/or in response to a second selection of the second interaction marker, presenting second descriptive information about the second interaction behavior.
In some embodiments, the method further includes: receiving a selection of at least one time segment in the at least one speech timeline; and based on a first sharing request associated with the at least one time segment, causing first segment audiovisual content corresponding to the at least one time segment to be generated for sharing.
In some embodiments, the method further includes: receiving a selection of a group of speech timelines in the at least one speech timeline, the group of timelines including one or more speech timelines; and based on a second sharing request associated with the group of timelines, causing second segment audiovisual content corresponding to the group of speech timelines to be generated for sharing.
In some embodiments, the audiovisual content includes real-time audiovisual content, and the at least one speech timeline indicates the temporal distribution of the historical speech content of at least one speaker associated with the historical portion of the real-time audiovisual content.
Example Apparatus and Devices
Embodiments of the present disclosure also provide corresponding apparatuses for implementing the above methods or processes. FIG. 6 shows a schematic structural block diagram of an apparatus 600 for viewing audiovisual content according to some embodiments of the present disclosure.
As shown in FIG. 6, the apparatus 600 includes a providing module 610 configured to provide a viewing interface for audiovisual content, the viewing interface including a playback control for playing the audiovisual content.
The apparatus 600 further includes a presentation module 620 configured to present at least one speech timeline in the playback control, the at least one speech timeline indicating the temporal distribution of the speech content of at least one speaker associated with the audiovisual content.
In some embodiments, the viewing interface further includes a text control for presenting text content corresponding to the audiovisual content, the text content being generated based on the audio of the audiovisual content.
In some embodiments, the presentation module 620 is further configured to: in response to a selection of a first speech timeline of the at least one speech timeline, cause first text content in the text content that corresponds to a first speaker to be emphasized in the text control relative to second text content of other speakers, wherein the first speech timeline corresponds to the first speaker.
In some embodiments, causing the first text content corresponding to the first speaker to be emphasized in the text control relative to the second text content of the other speakers includes: increasing the prominence with which the first text content is displayed in the text control; and/or reducing the prominence with which the second text content is displayed in the text control.
In some embodiments, the presentation module 620 is further configured to: receive a selection of a first position in a first speech timeline of the at least one speech timeline; and cause the text content corresponding to the first position to be presented prominently in the text control.
In some embodiments, the presentation module 620 is further configured to: present, in the playback control, a viewing entry for viewing the speech timelines; and in response to a selection of the viewing entry, present the at least one speech timeline in the playback control.
In some embodiments, the at least one speech timeline includes a plurality of speech timelines, and the order in which the plurality of speech timelines are presented in the playback control is determined based on at least one of: text identifiers of a plurality of speakers corresponding to the plurality of speech timelines, the proportions of the speech content of the plurality of speakers, or the start times of the speech content of the plurality of speakers.
In some embodiments, the presentation module 620 is further configured to: receive a viewing request associated with a target speaker; and present a target speech timeline corresponding to the target speaker, the target speech timeline indicating the temporal distribution of the target speech content of the target speaker.
In some embodiments, the presentation module 620 is further configured to: present a plurality of visual elements associated with a plurality of speakers associated with the audiovisual content; and receive the viewing request associated with the target speaker based on a preset operation on a target visual element, among the plurality of visual elements, that corresponds to the target speaker.
In some embodiments, the presentation module 620 is further configured to: receive the viewing request associated with the target speaker based on an input indicating the target speaker.
In some embodiments, the presentation module 620 is further configured to: receive a selection of a second position in a first speech timeline of the at least one speech timeline; and cause the corresponding portion of the audiovisual content to be played from the time point corresponding to the second position.
In some embodiments, the first speech timeline corresponds to a first speaker, and the presentation module 620 is further configured to: cause the audiovisual content to be played continuously from the time point; or cause the portion of the audiovisual content corresponding to the first speaker to be played from the time point.
In some embodiments, the presentation module 620 is further configured to: present, in association with the at least one speech timeline, descriptive information about the corresponding speaker, the descriptive information being generated based on a text identifier and/or a graphical identifier of the speaker.
In some embodiments, the presentation module 620 is further configured to: present, in association with the at least one speech timeline, the proportion of the speech content of the corresponding speaker.
In some embodiments, the playback control further includes a main timeline for presenting graphical information corresponding to the audio waveform of the audiovisual content.
In some embodiments, the audiovisual content is an audiovisual recording of an online meeting, and the playback control further includes a main timeline for presenting a first interaction marker corresponding to a first interaction behavior in the online meeting.
In some embodiments, the at least one speech timeline also presents a second interaction marker indicating a second interaction behavior associated with the corresponding speaker in the online meeting.
In some embodiments, the first interaction behavior and/or the second interaction behavior include at least one of: file sharing, online chatting, and commenting.
In some embodiments, the presentation module 620 is further configured to: in response to a first selection of the first interaction marker, present first descriptive information about the first interaction behavior; and/or in response to a second selection of the second interaction marker, present second descriptive information about the second interaction behavior.
In some embodiments, the presentation module 620 is further configured to: receive a selection of at least one time segment in the at least one speech timeline; and based on a first sharing request associated with the at least one time segment, cause first segment audiovisual content corresponding to the at least one time segment to be generated for sharing.
In some embodiments, the presentation module 620 is further configured to: receive a selection of a group of speech timelines in the at least one speech timeline, the group of timelines including one or more speech timelines; and based on a second sharing request associated with the group of timelines, cause second segment audiovisual content corresponding to the group of speech timelines to be generated for sharing.
In some embodiments, the audiovisual content includes real-time audiovisual content, and the at least one speech timeline indicates the temporal distribution of the historical speech content of at least one speaker associated with the historical portion of the real-time audiovisual content.
The units included in the apparatus 600 may be implemented in various ways, including software, hardware, firmware, or any combination thereof. In some embodiments, one or more units may be implemented using software and/or firmware, such as machine-executable instructions stored on a storage medium. In addition to or instead of machine-executable instructions, some or all of the units in the apparatus 600 may be implemented, at least in part, by one or more hardware logic components. By way of example and not limitation, exemplary types of hardware logic components that may be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and the like.
FIG. 7 shows a block diagram of a computing device/server 700 in which one or more embodiments of the present disclosure can be implemented. It should be understood that the computing device/server 700 shown in FIG. 7 is merely exemplary and shall not constitute any limitation on the functionality and scope of the embodiments described herein.
As shown in FIG. 7, the computing device/server 700 takes the form of a general-purpose computing device. Components of the computing device/server 700 may include, but are not limited to, one or more processors or processing units 710, a memory 720, a storage device 730, one or more communication units 740, one or more input devices 750, and one or more output devices 760. The processing unit 710 may be a physical or virtual processor and is capable of performing various processes according to programs stored in the memory 720. In a multi-processor system, multiple processing units execute computer-executable instructions in parallel to improve the parallel processing capability of the computing device/server 700.
The computing device/server 700 typically includes multiple computer storage media. Such media may be any available media accessible to the computing device/server 700, including, but not limited to, volatile and non-volatile media, and removable and non-removable media. The memory 720 may be volatile memory (e.g., registers, cache, random access memory (RAM)), non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof. The storage device 730 may be removable or non-removable media, and may include machine-readable media, such as a flash drive, a magnetic disk, or any other media that can be used to store information and/or data (e.g., training data for training) and that can be accessed within the computing device/server 700.
The computing device/server 700 may further include additional removable/non-removable, volatile/non-volatile storage media. Although not shown in FIG. 7, a magnetic disk drive for reading from or writing to a removable, non-volatile magnetic disk (e.g., a "floppy disk") and an optical disc drive for reading from or writing to a removable, non-volatile optical disc may be provided. In these cases, each drive may be connected to a bus (not shown) by one or more data media interfaces. The memory 720 may include a computer program product 725 having one or more program modules configured to perform the various methods or acts of the various embodiments of the present disclosure.
The communication unit 740 enables communication with other computing devices via communication media. Additionally, the functions of the components of the computing device/server 700 may be implemented as a single computing cluster or as multiple computing machines capable of communicating over communication connections. Thus, the computing device/server 700 may operate in a networked environment using logical connections to one or more other servers, network personal computers (PCs), or another network node.
The input device 750 may be one or more input devices, such as a mouse, a keyboard, a trackball, and the like. The output device 760 may be one or more output devices, such as a display, a speaker, a printer, and the like. As needed, the computing device/server 700 may also communicate via the communication unit 740 with one or more external devices (not shown), such as storage devices and display devices, with one or more devices that enable a user to interact with the computing device/server 700, or with any device (e.g., a network card, a modem, etc.) that enables the computing device/server 700 to communicate with one or more other computing devices. Such communication may be performed via input/output (I/O) interfaces (not shown).
According to an exemplary implementation of the present disclosure, a computer-readable storage medium is provided, on which one or more computer instructions are stored, the one or more computer instructions being executed by a processor to implement the method described above.
Various aspects of the present disclosure are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products implemented according to the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processing unit of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices, causing a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other devices to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other devices implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the figures illustrate possible architectures, functions, and operations of systems, methods, and computer program products according to multiple implementations of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions, which comprises one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
Various implementations of the present disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed implementations. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described implementations. The terms used herein were chosen to best explain the principles of the implementations, their practical applications, or improvements over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the implementations disclosed herein.

Claims (32)

  1. A method for viewing audiovisual content, comprising:
    providing a viewing interface for the audiovisual content, the viewing interface including a playback control for playing the audiovisual content; and
    presenting at least one speech timeline in the playback control, the at least one speech timeline being used to indicate a temporal distribution of speech content of at least one speaker associated with the audiovisual content.
  2. The method according to claim 1, wherein the viewing interface further includes a text control for presenting text content corresponding to the audiovisual content, the text content being generated based on audio of the audiovisual content.
  3. The method according to claim 2, further comprising:
    in response to a selection of a first speech timeline in the at least one speech timeline, causing first text content in the text content corresponding to a first speaker to be emphasized in the text control relative to second text content of other speakers, wherein the first speech timeline corresponds to the first speaker.
  4. The method according to claim 3, wherein causing the first text content in the text content corresponding to the first speaker to be emphasized in the text control relative to the second text content of other speakers comprises:
    increasing the prominence with which the first text content is displayed in the text control; and/or
    reducing the prominence with which the second text content is displayed in the text control.
  5. The method according to claim 2, further comprising:
    receiving a selection of a first position in a first speech timeline of the at least one speech timeline; and
    causing text content in the text content corresponding to the first position to be highlighted in the text control.
  6. The method according to claim 1, wherein presenting at least one speech timeline in the playback control comprises:
    presenting, in the playback control, a viewing entry for viewing speech timelines; and
    in response to a selection of the viewing entry, presenting the at least one speech timeline in the playback control.
  7. The method according to claim 1, wherein the at least one speech timeline includes a plurality of speech timelines, and a presentation order of the plurality of speech timelines in the playback control is determined based on at least one of:
    text identifiers of a plurality of speakers corresponding to the plurality of speech timelines,
    proportions of the speech content of the plurality of speakers, or
    start times of the speech content of the plurality of speakers.
  8. The method according to claim 1, wherein presenting at least one speech timeline in the playback control comprises:
    receiving a viewing request associated with a target speaker; and
    presenting a target speech timeline corresponding to the target speaker, the target speech timeline being used to indicate a temporal distribution of target speech content of the target speaker.
  9. The method according to claim 8, wherein receiving the viewing request associated with the target speaker comprises:
    presenting a plurality of visual elements associated with a plurality of speakers associated with the audiovisual content; and
    receiving the viewing request associated with the target speaker based on a preset operation on a target visual element, among the plurality of visual elements, corresponding to the target speaker.
  10. The method according to claim 8, wherein receiving the viewing request associated with the target speaker comprises:
    receiving the viewing request associated with the target speaker based on an input indicating the target speaker.
  11. The method according to claim 1, further comprising:
    receiving a selection of a second position in a first speech timeline of the at least one speech timeline; and
    causing a corresponding portion of the audiovisual content to be played from a time point corresponding to the second position.
  12. The method according to claim 11, wherein the first speech timeline corresponds to a first speaker, and causing at least a portion of the audiovisual content to be played from the time point corresponding to the second position comprises:
    causing the audiovisual content to be played continuously from the time point; or
    causing a portion of the audiovisual content corresponding to the first speaker to be played from the time point.
  13. The method according to claim 1, further comprising:
    presenting, in association with the at least one speech timeline, description information of the corresponding speaker, the description information being generated based on a text identifier and/or a graphic identifier of the speaker.
  14. The method according to claim 1, further comprising:
    presenting, in association with the at least one speech timeline, proportion information of the speech content of the corresponding speaker.
  15. The method according to claim 1, wherein the playback control further includes a main timeline for presenting graphical information corresponding to an audio waveform of the audiovisual content.
  16. The method according to claim 1, wherein the audiovisual content is an audiovisual recording of an online meeting, and the playback control further includes a main timeline for presenting a first interaction identifier corresponding to a first interaction behavior in the online meeting.
  17. The method according to claim 16, wherein the at least one speech timeline further presents a second interaction identifier for indicating a second interaction behavior associated with the corresponding speaker in the online meeting.
  18. The method according to claim 16 or 17, wherein the first interaction behavior and/or the second interaction behavior includes at least one of: file sharing, online chatting, and commenting.
  19. The method according to claim 16 or 17, further comprising:
    in response to a first selection of the first interaction identifier, presenting first description information for the first interaction behavior; and/or
    in response to a second selection of the second interaction identifier, presenting second description information for the second interaction behavior.
  20. The method according to claim 1, further comprising:
    receiving a selection of at least one time segment in the at least one speech timeline; and
    based on a first sharing request associated with the at least one time segment, causing a first segment of audiovisual content corresponding to the at least one time segment to be generated for sharing.
  21. The method according to claim 1, further comprising:
    receiving a selection of a group of speech timelines among the at least one speech timeline, the group of timelines including one or more speech timelines; and
    based on a second sharing request associated with the group of timelines, causing a second segment of audiovisual content corresponding to the group of speech timelines to be generated for sharing.
  22. The method according to claim 1, wherein the audiovisual content includes real-time audiovisual content, and the at least one speech timeline is used to indicate a temporal distribution of historical speech content of at least one speaker associated with a historical portion of the real-time audiovisual content.
  23. An apparatus for viewing audiovisual content, comprising:
    a providing module configured to provide a viewing interface for audiovisual content, the viewing interface including a playback control for playing the audiovisual content; and
    a presentation module configured to present at least one speech timeline in the playback control, the at least one speech timeline being used to indicate a temporal distribution of speech content of at least one speaker associated with the audiovisual content.
  24. A playback system, comprising:
    a main timeline, the main timeline indicating at least a current playback position of audiovisual content; and
    at least one speech timeline, the at least one speech timeline being used to indicate a temporal distribution of speech content of at least one speaker associated with the audiovisual content.
  25. The playback system according to claim 24, wherein the main timeline further presents graphical information corresponding to an audio waveform of the audiovisual content.
  26. The playback system according to claim 24, wherein the audiovisual content is an audiovisual recording of an online meeting, and the main timeline further presents a first interaction identifier corresponding to a first interaction behavior in the online meeting.
  27. The playback system according to claim 26, wherein the at least one speech timeline further presents a second interaction identifier for indicating a second interaction behavior associated with the corresponding speaker in the online meeting.
  28. The playback system according to claim 26 or 27, wherein the first interaction behavior and/or the second interaction behavior includes at least one of: file sharing, online chatting, and commenting.
  29. The playback system according to claim 26 or 27, wherein:
    a first selection of the first interaction identifier is used to trigger presentation of first description information for the first interaction behavior; and/or
    a second selection of the second interaction identifier is used to trigger presentation of second description information for the second interaction behavior.
  30. The playback system according to claim 24, wherein the at least one speech timeline includes a plurality of speech timelines, and a presentation order of the plurality of speech timelines in the playback system is determined based on at least one of:
    text identifiers of a plurality of speakers corresponding to the plurality of speech timelines,
    proportions of the speech content of the plurality of speakers, or
    start times of the speech content of the plurality of speakers.
  31. An electronic device, comprising:
    at least one processing unit; and
    at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit, the instructions, when executed by the at least one processing unit, causing the device to perform the method according to any one of claims 1 to 22.
  32. A computer-readable storage medium having a computer program stored thereon, the program, when executed by a processor, implementing the method according to any one of claims 1 to 22.
PCT/CN2023/113406 2022-10-31 2023-08-16 Method and apparatus for checking audiovisual content, and device and storage medium WO2024093442A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211352393.3A CN117956233A (en) 2022-10-31 2022-10-31 Method, apparatus, device and storage medium for viewing audio-visual content
CN202211352393.3 2022-10-31

Publications (1)

Publication Number Publication Date
WO2024093442A1 true WO2024093442A1 (en) 2024-05-10

Family

ID=90798805

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/113406 WO2024093442A1 (en) 2022-10-31 2023-08-16 Method and apparatus for checking audiovisual content, and device and storage medium

Country Status (2)

Country Link
CN (1) CN117956233A (en)
WO (1) WO2024093442A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112151041A (en) * 2019-06-26 2020-12-29 北京小米移动软件有限公司 Recording method, device and equipment based on recorder program and storage medium
JP2021184189A (en) * 2020-05-22 2021-12-02 i Smart Technologies株式会社 Online conference system
CN113194349A (en) * 2021-04-25 2021-07-30 腾讯科技(深圳)有限公司 Video playing method, commenting method, device, equipment and storage medium
CN113326387A (en) * 2021-05-31 2021-08-31 引智科技(深圳)有限公司 Intelligent conference information retrieval method
CN114491087A (en) * 2022-01-13 2022-05-13 Oppo广东移动通信有限公司 Text processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN117956233A (en) 2024-04-30

Similar Documents

Publication Publication Date Title
US9154531B2 (en) Systems and methods for enhanced conference session interaction
US20180039951A1 (en) Computer-assisted agendas for videoconferences
JP2017229060A (en) Methods, programs and devices for representing meeting content
US12015683B2 (en) Method, apparatus and device for issuing and replying to multimedia content
US20240129263A1 (en) Shared Group Reactions Within A Video Communication Session
US20220182428A1 (en) Promotion of users in collaboration sessions
WO2024041549A1 (en) Method and apparatus for presenting session message, and device and storage medium
US10237082B2 (en) System and method for multimodal interaction aids
CN108845741A (en) A kind of generation method, client, terminal and the storage medium of AR expression
CN113574555A (en) Intelligent summarization based on context analysis of auto-learning and user input
US20190095392A1 (en) Methods and systems for facilitating storytelling using visual media
US11792468B1 (en) Sign language interpreter view within a communication session
WO2024083124A1 (en) Live streaming interface interaction method and apparatus, device and storage medium
US20240098362A1 (en) Method, apparatus, device and storage medium for content capturing
WO2024099452A1 (en) Video interaction method and apparatus, and device and storage medium
CN116368459A (en) Voice commands for intelligent dictation automated assistant
WO2023226853A1 (en) Method and apparatus for work reposting, and device and storage medium
WO2023226855A1 (en) Work forwarding method and apparatus, device and storage medium
WO2024093442A1 (en) Method and apparatus for checking audiovisual content, and device and storage medium
US20230353613A1 (en) Active speaker proxy presentation for sign language interpreters
CN115623133A (en) Online conference method and device, electronic equipment and readable storage medium
WO2024093937A1 (en) Method and apparatus for viewing audio-visual content, device, and storage medium
CN114915836A (en) Method, apparatus, device and storage medium for editing audio
US8572497B2 (en) Method and system for exchanging contextual keys
WO2023246395A1 (en) Method and apparatus for audio-visual content sharing, device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23884381

Country of ref document: EP

Kind code of ref document: A1