CN113329237B - Method and equipment for presenting event label information - Google Patents

Method and equipment for presenting event label information

Info

Publication number
CN113329237B
CN113329237B (granted publication of application CN202110584469.4A)
Authority
CN
China
Prior art keywords
event
audio
user
trigger
video content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110584469.4A
Other languages
Chinese (zh)
Other versions
CN113329237A (en)
Inventor
陈童
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yijiao Wenshu Technology Co ltd
Original Assignee
Beijing Yijiao Wenshu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yijiao Wenshu Technology Co ltd filed Critical Beijing Yijiao Wenshu Technology Co ltd
Publication of CN113329237A
Application granted
Publication of CN113329237B
Legal status: Active

Classifications

    • H04N21/23109 — Content storage operation by placing content in organized collections, e.g. EPG data repository
    • H04N21/2393 — Interfacing the upstream path of the transmission network, involving handling client requests
    • H04N21/26283 — Content or additional data distribution scheduling, for associating distribution time parameters to content
    • H04N21/26603 — Channel or content management, for automatically generating descriptors from content using content analysis techniques
    • H04N21/4332 — Client-side content storage operation by placing content in organized collections, e.g. local EPG data repository
    • H04N21/437 — Interfacing the upstream path of the transmission network, e.g. for transmitting client requests to a VOD server
    • H04N7/155 — Conference systems involving storage of or access to video conference sessions

Abstract

The application aims to provide a method and equipment for presenting event tag information. The method comprises: recording the multi-user audio/video content corresponding to a plurality of users while an audio/video conference in which the users participate is in progress; in response to a trigger event in the audio/video conference, creating and storing event tag information corresponding to the trigger event; sending the event tag information to the plurality of users, so that the event tag information is presented on the conference timeline corresponding to the audio/video conference; and, upon receiving an event viewing request about the event tag information from a first user among the plurality of users, obtaining the target multi-user audio/video content corresponding to the event tag information from the recorded multi-user audio/video content and returning it to the first user for playback.

Description

Method and equipment for presenting event label information
This application claims priority to Chinese Patent Application No. CN202110142958.4, filed on February 2, 2021.
Technical Field
The present application relates to the field of communications, and more particularly, to a technique for presenting event tag information.
Background
With the development of the times, networks have become essential tools in people's life and work, and audio/video is one of the main means of online interaction; for example, people can communicate through audio/video conferences. In the prior art, audio/video content is usually preserved by recording the video or by converting the audio into text.
Disclosure of Invention
An object of the present application is to provide a method and apparatus for presenting event tag information.
According to an aspect of the present application, there is provided a method for presenting event tag information, applied to a network device, the method comprising:
recording the multi-user audio/video content corresponding to a plurality of users while an audio/video conference in which the users participate is in progress;
in response to a trigger event in the audio/video conference, creating and storing event tag information corresponding to the trigger event, wherein the event tag information comprises the event trigger time and the event description information of the trigger event;
sending the event tag information to the plurality of users, so that the event tag information is presented on the conference timeline corresponding to the audio/video conference; and
receiving an event viewing request about the event tag information sent by a first user among the plurality of users, obtaining the target multi-user audio/video content corresponding to the event tag information from the multi-user audio/video content, and returning the target multi-user audio/video content to the first user for playback.
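The four network-device steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the class names, fields, and the 30-second viewing window around a tag are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class EventTag:
    """Event tag information: when the event fired and what it was about."""
    trigger_time: float  # seconds from conference start
    description: str     # event description (title, topic, people involved, ...)

@dataclass
class ConferenceRecord:
    """Server-side record of one audio/video conference."""
    tags: list = field(default_factory=list)

    def create_tag(self, trigger_time, description):
        # Step 2: create and store the event tag information.
        tag = EventTag(trigger_time, description)
        self.tags.append(tag)
        return tag  # step 3 would push this tag to every participant

    def serve_view_request(self, tag, window=30.0):
        # Step 4: answer an event viewing request by returning the slice of
        # recorded content around the tag's trigger time (window is assumed).
        start = max(0.0, tag.trigger_time - window)
        return {"start": start, "end": tag.trigger_time + window,
                "description": tag.description}
```

A first user clicking a tag on the timeline would then receive only the clip delimited by `start` and `end`, rather than the whole recording.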
According to an aspect of the present application, there is provided a method for presenting event tag information, applied to a first user equipment, the method comprising:
receiving event tag information sent by a network device while an audio/video conference in which a plurality of users participate is in progress, and presenting the event tag information on the conference timeline corresponding to the audio/video conference, wherein the event tag information is created by the network device in response to a trigger event in the audio/video conference and comprises the event trigger time and the event description information of the trigger event;
in response to an event viewing operation performed by a first user on the event tag information, generating an event viewing request about the event tag information and sending it to the network device; and
receiving the target multi-user audio/video content corresponding to the event tag information returned by the network device and playing it, wherein the target multi-user audio/video content is obtained from the multi-user audio/video content of the plurality of users recorded by the network device during the conference.
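Presenting tags on the conference timeline amounts to mapping each tag's trigger time onto a position along the timeline control. The sketch below is an assumption-laden illustration (the fractional-position representation is not specified by the patent):

```python
def timeline_positions(tags, conference_length):
    """Map each (trigger_time, description) tag to a 0..1 position on the
    conference timeline, clamping out-of-range times to the ends."""
    positions = []
    for trigger_time, description in tags:
        frac = min(max(trigger_time / conference_length, 0.0), 1.0)
        positions.append((round(frac, 3), description))
    return positions
```

A client UI could place a marker at each returned fraction of the timeline's width, with the description shown on hover.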
According to an aspect of the present application, there is provided a network device for presenting event tag information, the device comprising:
a module 1-1, configured to record the multi-user audio/video content corresponding to a plurality of users while an audio/video conference in which the users participate is in progress;
a module 1-2, configured to, in response to a trigger event in the audio/video conference, create and store event tag information corresponding to the trigger event, wherein the event tag information comprises the event trigger time and the event description information of the trigger event;
a module 1-3, configured to send the event tag information to the plurality of users, so that the event tag information is presented on the conference timeline corresponding to the audio/video conference; and
a module 1-4, configured to receive an event viewing request about the event tag information sent by a first user among the plurality of users, obtain the target multi-user audio/video content corresponding to the event tag information from the multi-user audio/video content, and return the target multi-user audio/video content to the first user for playback.
According to an aspect of the present application, there is provided a first user equipment for presenting event tag information, the equipment comprising:
a module 2-1, configured to receive event tag information sent by a network device while an audio/video conference in which a plurality of users participate is in progress, and to present the event tag information on the conference timeline corresponding to the audio/video conference, wherein the event tag information is created by the network device in response to a trigger event in the audio/video conference and comprises the event trigger time and the event description information of the trigger event;
a module 2-2, configured to, in response to an event viewing operation performed by a first user on the event tag information, generate an event viewing request about the event tag information and send it to the network device; and
a module 2-3, configured to receive the target multi-user audio/video content corresponding to the event tag information returned by the network device and to play it, wherein the target multi-user audio/video content is obtained from the multi-user audio/video content of the plurality of users recorded by the network device during the conference.
According to an aspect of the present application, there is provided an apparatus for presenting event tag information, wherein the apparatus comprises:
a processor; and
a memory arranged to store computer-executable instructions that, when executed, cause the processor to:
record the multi-user audio/video content corresponding to a plurality of users while an audio/video conference in which the users participate is in progress;
in response to a trigger event in the audio/video conference, create and store event tag information corresponding to the trigger event, wherein the event tag information comprises the event trigger time and the event description information of the trigger event;
send the event tag information to the plurality of users, so that the event tag information is presented on the conference timeline corresponding to the audio/video conference; and
receive an event viewing request about the event tag information sent by a first user among the plurality of users, obtain the target multi-user audio/video content corresponding to the event tag information from the multi-user audio/video content, and return the target multi-user audio/video content to the first user for playback.
According to an aspect of the present application, there is provided an apparatus for presenting event tag information, wherein the apparatus comprises:
a processor; and
a memory arranged to store computer-executable instructions that, when executed, cause the processor to:
receive event tag information sent by a network device while an audio/video conference in which a plurality of users participate is in progress, and present the event tag information on the conference timeline corresponding to the audio/video conference, wherein the event tag information is created by the network device in response to a trigger event in the audio/video conference and comprises the event trigger time and the event description information of the trigger event;
in response to an event viewing operation performed by a first user on the event tag information, generate an event viewing request about the event tag information and send it to the network device; and
receive the target multi-user audio/video content corresponding to the event tag information returned by the network device and play it, wherein the target multi-user audio/video content is obtained from the multi-user audio/video content of the plurality of users recorded by the network device during the conference.
According to one aspect of the present application, there is provided a computer-readable medium storing instructions that, when executed, cause a system to:
record the multi-user audio/video content corresponding to a plurality of users while an audio/video conference in which the users participate is in progress;
in response to a trigger event in the audio/video conference, create and store event tag information corresponding to the trigger event, wherein the event tag information comprises the event trigger time and the event description information of the trigger event;
send the event tag information to the plurality of users, so that the event tag information is presented on the conference timeline corresponding to the audio/video conference; and
receive an event viewing request about the event tag information sent by a first user among the plurality of users, obtain the target multi-user audio/video content corresponding to the event tag information from the multi-user audio/video content, and return the target multi-user audio/video content to the first user for playback.
According to one aspect of the present application, there is provided a computer-readable medium storing instructions that, when executed, cause a system to:
receive event tag information sent by a network device while an audio/video conference in which a plurality of users participate is in progress, and present the event tag information on the conference timeline corresponding to the audio/video conference, wherein the event tag information is created by the network device in response to a trigger event in the audio/video conference and comprises the event trigger time and the event description information of the trigger event;
in response to an event viewing operation performed by a first user on the event tag information, generate an event viewing request about the event tag information and send it to the network device; and
receive the target multi-user audio/video content corresponding to the event tag information returned by the network device and play it, wherein the target multi-user audio/video content is obtained from the multi-user audio/video content of the plurality of users recorded by the network device during the conference.
According to another aspect of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, performs a method comprising:
recording the multi-user audio/video content corresponding to a plurality of users while an audio/video conference in which the users participate is in progress;
in response to a trigger event in the audio/video conference, creating and storing event tag information corresponding to the trigger event, wherein the event tag information comprises the event trigger time and the event description information of the trigger event;
sending the event tag information to the plurality of users, so that the event tag information is presented on the conference timeline corresponding to the audio/video conference; and
receiving an event viewing request about the event tag information sent by a first user among the plurality of users, obtaining the target multi-user audio/video content corresponding to the event tag information from the multi-user audio/video content, and returning the target multi-user audio/video content to the first user for playback.
According to another aspect of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, performs a method comprising:
receiving event tag information sent by a network device while an audio/video conference in which a plurality of users participate is in progress, and presenting the event tag information on the conference timeline corresponding to the audio/video conference, wherein the event tag information is created by the network device in response to a trigger event in the audio/video conference and comprises the event trigger time and the event description information of the trigger event;
in response to an event viewing operation performed by a first user on the event tag information, generating an event viewing request about the event tag information and sending it to the network device; and
receiving the target multi-user audio/video content corresponding to the event tag information returned by the network device and playing it, wherein the target multi-user audio/video content is obtained from the multi-user audio/video content of the plurality of users recorded by the network device during the conference.
Compared with the prior art, the present application can, in response to a trigger event in an audio/video conference, create and store the event tag information corresponding to the trigger event, send the event tag information to the plurality of users so that it is presented on the conference timeline corresponding to the audio/video conference, and then, upon receiving an event viewing request about the event tag information from a first user among the users, obtain the target multi-user audio/video content corresponding to the event tag information from the recorded multi-user audio/video content and return it to the first user for playback. The application can thus record the multi-user audio/video content while simultaneously recording the trigger events raised by users or by the system at the corresponding time nodes, providing a slicing hint for users who join late or browse the recording afterwards. Such a user can quickly locate and learn about the target multi-user audio/video content of interest, which offers great convenience to the users of an audio/video conference.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
fig. 1 shows a flowchart of a method for presenting event tag information applied to a network device according to an embodiment of the present application;
fig. 2 shows a flowchart of a method for presenting event tag information applied to a first user equipment according to an embodiment of the present application;
FIG. 3 illustrates a schematic diagram of presenting event tag information, according to one embodiment of the present application;
FIG. 4 illustrates a flow diagram of a system method for presenting event tag information according to one embodiment of the present application;
FIG. 5 illustrates a network device architecture diagram presenting event tag information, according to one embodiment of the present application;
FIG. 6 illustrates a first user equipment structure diagram presenting event tag information according to one embodiment of the present application;
FIG. 7 illustrates an exemplary system that can be used to implement the various embodiments described in this application.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present application is described in further detail below with reference to the attached figures.
In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (e.g., Central Processing Units, CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read-Only Memory (ROM) or flash memory. Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, Phase-Change Memory (PCM), Programmable Random Access Memory (PRAM), Static Random-Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The device referred to in this application includes, but is not limited to, a user equipment, a network device, or a device formed by integrating a user equipment and a network device through a network. The user equipment includes, but is not limited to, any mobile electronic product capable of human-computer interaction with a user (e.g., through a touch panel), such as a smartphone or a tablet computer; the mobile electronic product may run any operating system, such as Android or iOS. The network device includes an electronic device capable of automatically performing numerical computation and information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Programmable Logic Device (PLD), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud of multiple servers; here, the cloud is composed of a large number of computers or network servers based on cloud computing, a kind of distributed computing in which one virtual supercomputer consists of a collection of loosely coupled computers. The network includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, a wireless ad hoc network, etc. Preferably, the device may also be a program running on the user equipment, the network device, or a device formed by integrating the user equipment with the network device, the touch terminal, or the network device with the touch terminal through a network.
Of course, those skilled in the art will appreciate that the foregoing is by way of example only, and that other existing or future devices, which may be suitable for use in the present application, are also encompassed within the scope of the present application and are hereby incorporated by reference.
In the description of the present application, "a plurality" means two or more unless specifically defined otherwise.
Fig. 1 shows a flowchart of a method for presenting event tag information applied to the network device side according to an embodiment of the present application; the method includes steps S11 to S14. In step S11, the network device records the multi-user audio/video content corresponding to a plurality of users while an audio/video conference in which the users participate is in progress. In step S12, in response to a trigger event in the audio/video conference, the network device creates and stores event tag information corresponding to the trigger event, where the event tag information includes the event trigger time and the event description information of the trigger event. In step S13, the network device sends the event tag information to the plurality of users, so that the event tag information is presented on the conference timeline corresponding to the audio/video conference. In step S14, the network device receives an event viewing request about the event tag information sent by a first user among the plurality of users, obtains the target multi-user audio/video content corresponding to the event tag information from the multi-user audio/video content, and returns the target multi-user audio/video content to the first user for playback.
In step S11, the network device records the multi-user audio/video content corresponding to a plurality of users while an audio/video conference in which the users participate is in progress. In some embodiments, the video input content and the audio input content of each user are recorded separately, and the recorded audio input contents of the multiple users are synthesized into a combined multi-user audio content. In some embodiments, the recorded video input contents of the multiple users are likewise synthesized into a combined multi-user video content; the multi-user audio/video content then includes the combined multi-user audio content and the combined multi-user video content. In other embodiments, the recorded video input contents are not synthesized on the server: the multi-user audio/video content includes the combined multi-user audio content together with the separate video input contents, and when a user device later plays the multi-user audio/video content, it merges the video input contents together for playback. In some embodiments, recording of the multi-user audio/video content starts when the audio/video conference starts, and stops either when the conference ends or after all users have quit the conference.
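The audio synthesis step above can be sketched as a simple sample-wise mix. This is an illustrative assumption about one way to combine tracks (sum and hard-clip, padding shorter tracks with silence); the patent does not specify a mixing algorithm:

```python
def mix_audio(tracks):
    """Mix per-user audio tracks (lists of float samples in [-1, 1]) into one
    multi-user track: sum sample-wise, pad shorter tracks with silence, and
    hard-clip the result back into the valid range."""
    length = max(len(t) for t in tracks)
    mixed = []
    for i in range(length):
        s = sum(t[i] if i < len(t) else 0.0 for t in tracks)
        mixed.append(max(-1.0, min(1.0, s)))  # clip to [-1, 1]
    return mixed
```

A production mixer would work on encoded frames and apply gain control rather than hard clipping, but the shape of the operation is the same: one combined track from many per-user tracks.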
In step S12, in response to a trigger event in the audio/video conference, the network device creates and stores event tag information corresponding to the trigger event, where the event tag information includes event trigger time corresponding to the trigger event and event description information corresponding to the trigger event. In some embodiments, a first user device, in response to a trigger event creating operation executed by a first user of multiple users participating in the audio/video conference, acquires event description information corresponding to a trigger event input by the first user, generates a trigger event creating request, and sends the trigger event creating request to a network device, where the network device creates and stores event tag information corresponding to the trigger event according to a received trigger event creating request sent by the first user, where the event tag information includes event description information input by the first user and event trigger time corresponding to the trigger event, that is, request time corresponding to the trigger event creating request, where the first user is any one of the multiple users participating in the audio/video conference, and the event description information includes, but is not limited to, an event title, an event focus discussion, event related personnel, event related business information, and the like. 
In some embodiments, in response to a trigger event that is automatically triggered according to one or more preset event trigger rules in the audio/video conference, the network device creates and stores event tag information corresponding to the trigger event, where the event tag information includes event description information corresponding to the trigger event and event trigger time corresponding to the trigger event, where the event trigger rule may be preset by a server, or may also be preset or temporarily set by a creator or an initiator of the audio/video conference, or may also be temporarily set by one participant of the audio/video conference. In some embodiments, the event triggering rule may be that a triggering event is automatically triggered at a predetermined time, for example, every 5 minutes from the beginning of the conference. In some embodiments, the event triggering rule may also be to automatically trigger a triggering event in response to a non-user action in the audio-video conference, for example, to automatically trigger a triggering event when the network condition of a user participating in the audio-video conference is poor, and to automatically trigger a triggering event when the network condition of the user is restored. In some embodiments, the event triggering rule may also be that a triggering event is automatically triggered in response to a user action of a certain user participating in the audio-video conference, for example, a triggering event is automatically triggered after a new user joins the audio-video conference, and a triggering event is automatically triggered after a user exits the audio-video conference. 
In some embodiments, for a trigger event automatically triggered according to one or more preset event trigger rules, event description information corresponding to the trigger event may be determined according to a target event trigger rule corresponding to the trigger event, or target multi-person audio/video content corresponding to the trigger event may be obtained from recorded multi-person audio/video content corresponding to multiple users participating in the audio/video conference according to an event trigger time corresponding to the trigger event, and then, by performing voice recognition and/or image recognition on the target multi-person audio/video content, event description information corresponding to the trigger event is determined, or event description information corresponding to the trigger event may be manually input by a creator, an initiator, or any one user participating in the audio/video conference and uploaded to a network device. In some embodiments, in response to each trigger event in the audio-video conference, the network device creates and stores event tag information corresponding to the trigger event, so that one or more event tag information exists in each audio-video conference. In some embodiments, each event tag information further includes event identification information (e.g., an event ID) for uniquely identifying the event tag information, and each event tag information corresponds to a different respective event identification information.
In step S13, the network device sends the event tag information to the multiple users, so as to present the event tag information on a conference time axis corresponding to the audio/video conference of the multiple users. In some embodiments, for each piece of event tag information, after it is created, it is sent to each user participating in the audio/video conference and presented on the conference time axis corresponding to that user's view of the audio/video conference. In some embodiments, as shown in fig. 3, after each user joins the audio/video conference, a conference time axis (time progress bar) is presented on the interface (conference window) of the audio/video conference; the starting point of the conference time axis is the conference starting time of the audio/video conference or the time at which the user most recently joined the audio/video conference, and the ending point of the conference time axis is the current time or a predetermined conference ending time corresponding to the audio/video conference. One or more pieces of event Tag information (e.g., Tag1, Tag2, etc.) sent to each user by the network device are presented on the conference time axis, where each piece of event tag information is presented at the position on the conference time axis corresponding to the event trigger time in that event tag information.
In some embodiments, according to the event trigger time in the event tag information, the event tag information is directly presented at the corresponding position on the conference timeline; alternatively, only a control (e.g., an arrow-shaped graphical control) corresponding to the event tag information is presented at the corresponding position, and the user needs to hover a finger or the mouse pointer over the control, or click the control, to have the event tag information presented at that position.
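The mapping of an event trigger time to its presentation position on the conference time axis can be sketched as follows. This is an illustrative sketch under assumed names (`tag_position`, a pixel-width axis); the application itself only requires that each tag appear at the position corresponding to its trigger time between the axis start and end points.

```python
def tag_position(trigger_time, axis_start, axis_end, axis_width_px):
    """Map an event trigger time to a horizontal pixel offset on a
    conference timeline spanning [axis_start, axis_end]."""
    if axis_end <= axis_start:
        return 0
    frac = (trigger_time - axis_start) / (axis_end - axis_start)
    frac = max(0.0, min(1.0, frac))  # clamp tags outside the visible range
    return round(frac * axis_width_px)
```

Since the axis end point may be the current time, a client would recompute these positions as the conference progresses, so earlier tags drift left along the progress bar.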
In step S14, the network device receives an event viewing request about the event tag information sent by a first user of the multiple users, obtains target multi-user audio and video content corresponding to the event tag information from the multi-user audio and video content, and returns the target multi-user audio and video content to the first user for playing. In some embodiments, the first user is any one of the plurality of users participating in the audio/video conference, without limitation. In some embodiments, the first user performs an event viewing operation on certain event tag information presented on the conference timeline, generating an event viewing request about that event tag information, which is sent to the network device, where the event viewing request includes the event identification information in the event tag information. In some embodiments, according to the received event viewing request sent by the first user, and according to the event identification information in the event viewing request, the network device searches for the event tag information identified by that event identification information among the one or more pieces of event tag information corresponding to the audio/video conference stored in the network device, and then obtains the target multi-user audio/video content corresponding to the event tag information from the multi-user audio/video content corresponding to the multiple users of the audio/video conference. In some embodiments, according to the event trigger time in the event tag information, the target multi-user audio/video content after the event trigger time is obtained from the multi-user audio/video content, and the audio/video data stream of the target multi-user audio/video content is returned to the first user for playing.
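The lookup-and-slice performed in step S14 can be sketched as follows. This is an illustrative sketch: it models the recording as a list of samples at one sample per second purely for readability, and `handle_event_view_request` is an assumed name, not an API from the application.

```python
def handle_event_view_request(event_id, tags, recording, conference_start):
    """Look up the event tag identified by event_id, then return the
    target multi-user content: everything recorded from the event
    trigger time onward (recording modeled at 1 sample/second)."""
    tag = tags.get(event_id)
    if tag is None:
        return None  # unknown event identification information
    offset = int(tag["trigger_time"] - conference_start)
    return recording[offset:]
```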
In some embodiments, the step S12 includes: the method comprises the steps that network equipment receives a trigger event creating request sent by a first user, wherein the trigger event creating request comprises event description information corresponding to a trigger event; and creating and storing event tag information corresponding to the trigger event, wherein the event tag information comprises event trigger time corresponding to the trigger event and the event description information, and the event trigger time is request time corresponding to the trigger event creation request. In some embodiments, a first user device, in response to a trigger event creating operation executed by a first user of multiple users participating in the audio/video conference, acquires event description information corresponding to a trigger event input by the first user, generates a trigger event creating request, and sends the trigger event creating request to a network device, where the network device creates and stores event tag information corresponding to the trigger event according to a received trigger event creating request sent by the first user, where the event tag information includes event description information input by the first user and event trigger time corresponding to the trigger event, that is, request time corresponding to the trigger event creating request, where the first user is any one of the multiple users participating in the audio/video conference, and the event description information includes, but is not limited to, an event title, an event focus discussion, event related personnel, event related business information, and the like.
In some embodiments, the step S12 includes: and the network equipment responds to a trigger event which is automatically triggered according to one or more preset event trigger rules in the audio and video conference, and creates and stores event tag information corresponding to the trigger event, wherein the event tag information comprises event trigger time corresponding to the trigger event and event description information corresponding to the trigger event. In some embodiments, in response to a trigger event that is automatically triggered according to one or more preset event trigger rules in the audio/video conference, the network device creates and stores event tag information corresponding to the trigger event, where the event tag information includes event description information corresponding to the trigger event and event trigger time corresponding to the trigger event, where the event trigger rule may be preset by a server, or may also be preset or temporarily set by a creator or an initiator of the audio/video conference, or may also be temporarily set by one participant of the audio/video conference. In some embodiments, the event triggering rule may be that a triggering event is automatically triggered at a predetermined time, for example, every 5 minutes from the beginning of the conference. In some embodiments, the event triggering rule may also be to automatically trigger a triggering event in response to some non-user action in the audio-visual conference, for example, to automatically trigger a triggering event when the network condition of a certain user participating in the audio-visual conference is poor, and to automatically trigger a triggering event when the network condition of the user is recovered. 
In some embodiments, the event triggering rule may also be that a triggering event is automatically triggered in response to a user action of a certain user participating in the audio-video conference, for example, a triggering event is automatically triggered after a new user joins the audio-video conference, and a triggering event is automatically triggered after a user exits the audio-video conference. In some embodiments, for a trigger event automatically triggered according to one or more preset event trigger rules, event description information corresponding to the trigger event may be determined according to a target event trigger rule corresponding to the trigger event, or target multi-person audio/video content corresponding to the trigger event may be obtained from recorded multi-person audio/video content corresponding to multiple users participating in the audio/video conference according to an event trigger time corresponding to the trigger event, and then, by performing voice recognition and/or image recognition on the target multi-person audio/video content, event description information corresponding to the trigger event is determined, or event description information corresponding to the trigger event may be manually input by a creator, an initiator, or any one user participating in the audio/video conference and uploaded to a network device.
In some embodiments, the triggering event includes, but is not limited to:
1) Conference start event or conference end event
In some embodiments, the conference start event includes, but is not limited to, the audio/video conference being created, more than a predetermined number of users having joined the audio/video conference, a user speaking for the first time in the audio/video conference, and the like. In some embodiments, the conference end event includes, but is not limited to, all users in the audio/video conference having exited, the ratio of the number of users having exited to the number of all users in the audio/video conference being greater than or equal to a predetermined ratio threshold, no user of the audio/video conference having spoken within a predetermined time frame (e.g., 5 minutes), and the like.
2) Event of change of participants
In some embodiments, participant change events include, but are not limited to, a trigger each time a user joins the audio/video conference, a trigger each time a user exits the audio/video conference, and the like.
3) Speaking event of participant
In some embodiments, participant speaking events include, but are not limited to, a trigger each time a user starts speaking, a trigger each time a user who has not previously spoken starts speaking, and the like.
4) Participant turn on or off audio-video events
In some embodiments, participant turn on or off audio/video events include, but are not limited to, a trigger each time a user turns on audio/video input, a trigger each time a user turns off audio/video input, and the like.
5) Participant start or stop sharing screen content events
In some embodiments, participant start or stop sharing screen content events include, but are not limited to, a trigger each time a user starts sharing screen content, a trigger each time a user stops sharing screen content, and the like.
6) Network change event
In some embodiments, the network change event may be to automatically trigger a trigger event when the network condition of a certain user is poor, or may also be to automatically trigger a trigger event when the network condition of the user is recovered, or may also be to automatically trigger a trigger event when the overall network condition of the audio/video conference is poor, or may also be to automatically trigger a trigger event when the overall network condition of the audio/video conference is recovered.
7) Predetermined time trigger event
In some embodiments, a trigger event may be automatically triggered at a predetermined time, for example, every 5 minutes from the start of the conference.
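Several of the trigger rule categories above (items 2, 6, and 7) can be sketched as simple predicates over a stream of conference events. This is an illustrative sketch only: the event dictionary shape (a `"type"` key) and the function names are assumptions, and a real implementation would cover the remaining rule categories analogously.

```python
PERIOD_SECONDS = 300  # "every 5 minutes from the beginning of the conference"

def periodic_rule(elapsed_seconds):
    """7) Predetermined time trigger: fires every PERIOD_SECONDS
    after the conference starts."""
    return elapsed_seconds > 0 and elapsed_seconds % PERIOD_SECONDS == 0

def participant_change_rule(event):
    """2) Participant change: fires whenever a user joins or exits
    the audio/video conference."""
    return event["type"] in ("user_joined", "user_exited")

def network_change_rule(event):
    """6) Network change: fires when a user's (or the conference's
    overall) network condition degrades or recovers."""
    return event["type"] in ("network_poor", "network_recovered")
```

The network device would evaluate each preset rule against incoming conference events and, whenever a rule fires, create event tag information for the resulting trigger event as described in step S12.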
In some embodiments, the method further comprises: the network device determines the event description information according to a target event trigger rule corresponding to the trigger event. In some embodiments, if the event trigger rule is that a trigger event is automatically triggered at a predetermined time, the event description information corresponding to the trigger event may be automatically determined according to the predetermined time; for example, if a trigger event is automatically triggered every 5 minutes after the conference starts, the event description information corresponding to each trigger event is "5 minutes", "10 minutes", "15 minutes", and so on. In some embodiments, if the event trigger rule is to automatically trigger a trigger event in response to a certain non-user action in the audio/video conference, or in response to a user action of a certain user participating in the audio/video conference, the event description information corresponding to the trigger event may be automatically determined according to the action content of the non-user action or the user action; for example, if a trigger event is automatically triggered after User1 joins the audio/video conference, the event description information corresponding to the trigger event is "User1 joins the conference", and if a trigger event is automatically triggered after User2 exits the audio/video conference, the event description information corresponding to the trigger event is "User2 exits the conference".
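The rule-based generation of event description information just described can be sketched as follows, reproducing the example strings from the text ("5 minutes", "User1 joins the conference"). The function names are illustrative assumptions.

```python
def describe_periodic(elapsed_seconds):
    """Description for a predetermined-time trigger, e.g. "5 minutes"."""
    return f"{elapsed_seconds // 60} minutes"

def describe_user_action(user, action):
    """Description for a user-action trigger, e.g. "User1 joins the
    conference". Only join/exit are sketched here."""
    verbs = {"join": "joins the conference", "exit": "exits the conference"}
    return f"{user} {verbs[action]}"
```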
In some embodiments, the method further comprises: the network equipment acquires target multi-person audio and video contents corresponding to the event tag information from the multi-person audio and video contents; and carrying out voice recognition and/or image recognition on the target multi-person audio and video content, and determining the event description information. In some embodiments, according to the event trigger time in the event tag information, target multi-person audio and video content after the event trigger time is obtained from multi-person audio and video content corresponding to multiple recorded users of the audio and video conference, then voice recognition and/or image recognition is performed on the target multi-person audio and video content, and event description information such as an event title, an event key discussion item, event related personnel, event related service information and the like is determined according to a voice recognition result and/or an image recognition result.
In some embodiments, the method further comprises the following steps performed after the step S14: receiving a single-user event viewing request sent by the first user, where the single-user event viewing request includes target user identification information; and acquiring the single-user audio and video content corresponding to the user identified by the target user identification information from the target multi-user audio and video content, and returning the single-user audio and video content to the first user for playing. In some embodiments, for target multi-user audio and video content corresponding to certain event tag information, the first user may select a certain target user from the multiple users participating in the audio/video conference and request to independently play the single-user audio and video content of the target multi-user audio and video content in the dimension of that target user; the first user equipment generates a single-user event viewing request including the target user identification information and sends it to the network device. In some embodiments, the single-user event viewing request further includes identification information of the target multi-user audio/video content, or further includes the event identification information; the identified target multi-user audio/video content is obtained in the network device according to the identification information of the target multi-user audio/video content or the event identification information, then the single-user audio/video content in the target user dimension identified by the target user identification information is obtained from the target multi-user audio/video content according to the target user identification information, and the audio/video data stream of the single-user audio/video content is returned to the first user for playing.
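The resolution of a single-user event viewing request can be sketched as follows. This illustrative sketch assumes the network device keeps, per event, the target multi-user content with per-user tracks; the dictionary shape and the name `handle_single_view_request` are assumptions.

```python
def handle_single_view_request(single_request, event_store):
    """Resolve a single-user event viewing request: find the identified
    target multi-user content by event identification information, then
    return only the track of the user identified by the target user
    identification information."""
    content = event_store.get(single_request["event_id"])
    if content is None:
        return None
    return content["tracks"].get(single_request["target_user_id"])
```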
In some embodiments, the method further comprises: after the audio and video conference is finished, the network equipment stores the multi-person audio and video content, and generates and stores event list information corresponding to the audio and video conference according to a plurality of event tag information corresponding to the audio and video conference; responding to a conference viewing request which is sent by the first user and relates to the audio and video conference in a historical conference list in which the first user participates, and returning the event list information to the first user for presentation; and responding to an event viewing request which is sent by the first user and is about target event tag information in the event list information, obtaining target multi-person audio and video content corresponding to the target event tag information from the multi-person audio and video content, and returning the target multi-person audio and video content to the first user for playing. 
In some embodiments, after the audio/video conference is finished, the recording of the multi-user audio/video content corresponding to the audio/video conference is stopped, the recorded multi-user audio/video content is stored, and the pieces of event tag information corresponding to the audio/video conference are combined into an event list and stored in the network device. The first user can see the audio/video conference in a historical conference list in which the first user participated; upon opening the audio/video conference, the first user sees the event list, that is, the pieces of event tag information corresponding to the audio/video conference. When the first user then clicks one of the pieces of event tag information, the target multi-user audio/video content corresponding to that event tag information, obtained from the multi-user audio/video content, is played. In some embodiments, user identification information of the multiple users who participated in the audio/video conference may also be presented on the playing interface of the target multi-user audio/video content; when the first user clicks one piece of user identification information, the target multi-user audio/video content in the user dimension identified by that user identification information is presented. Here, the first user is any one of the multiple users who participated in the audio/video conference, without limitation.
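The generation of event list information after the conference ends can be sketched as follows; the name `build_event_list` and the dictionary shape are illustrative assumptions, with the tags ordered by event trigger time as they would appear along the timeline.

```python
def build_event_list(conference_id, tags):
    """After the conference ends, combine its pieces of event tag
    information into stored event list information, ordered by
    event trigger time."""
    return {
        "conference_id": conference_id,
        "events": sorted(tags, key=lambda t: t["trigger_time"]),
    }
```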
In some embodiments, the event tag information further includes the first video frame of the corresponding target multi-user audio/video content. In some embodiments, each piece of event tag information further includes the first video frame of the target multi-user audio/video content corresponding to that event tag information. In some embodiments, according to the event trigger time in the event tag information, the first video frame is directly presented at the corresponding position on the conference timeline; alternatively, the user needs to move a finger or the mouse pointer to the corresponding position on the conference timeline, or click the corresponding position, to have the first video frame presented.
In some embodiments, the returning the target multi-user audio and video content to the first user for playing includes: returning the target multi-user audio and video content to the first user for playing, and sending at least one piece of user identification information of at least one user corresponding to the target multi-user audio and video content to the first user for presentation. In some embodiments, when the target multi-user audio and video content is provided for the first user to play, at least one piece of user identification information corresponding to the at least one user appearing in the target multi-user audio and video content is also provided to the first user for presentation; when the first user clicks one piece of user identification information, the single-user audio and video content of the target multi-user audio and video content in the user dimension identified by that user identification information can be viewed.
In some embodiments, the method further comprises: and the network equipment performs voice recognition on the target multi-person audio and video content and determines at least one user corresponding to the target multi-person audio and video content. In some embodiments, at least one user who has an audio input that is a spoken word in the target multi-person audio-video content is obtained by performing speech recognition on the target multi-person audio-video content, the at least one user is determined as the at least one user participating in the target multi-person audio-video content, and at least one user identification information corresponding to the at least one user is provided for the first user to be presented.
In some embodiments, the event tag information further includes an event end time corresponding to the trigger event; the obtaining of the target multi-user audio and video content corresponding to the event tag information from the multi-user audio and video content includes: acquiring the target multi-user audio and video content corresponding to the event tag information from the multi-user audio and video content according to the event trigger time and the event end time. In some embodiments, if the event tag information only includes the event trigger time, the target multi-user audio/video content from the event trigger time to the end of the recording is obtained from the multi-user audio/video content corresponding to the audio/video conference according to the event trigger time. In some embodiments, if the event tag information includes both an event trigger time and an event end time, the target multi-user audio/video content from the event trigger time to the event end time is obtained from the multi-user audio/video content corresponding to the audio/video conference according to the two times.
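The two slicing cases just described, trigger time to end of recording when no end time is stored, trigger time to end time otherwise, can be sketched as follows. As before, the recording is modeled as one sample per second purely for illustration, and `target_content_slice` is an assumed name.

```python
def target_content_slice(recording, conference_start, trigger_time, end_time=None):
    """Cut the target multi-user content out of the full recording:
    from the event trigger time to the event end time, or to the end
    of the recording when the tag carries no end time."""
    begin = int(trigger_time - conference_start)
    if end_time is None:
        return recording[begin:]
    return recording[begin:int(end_time - conference_start)]
```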
In some embodiments, the step S12 includes: the network device receives a trigger event creating request and a trigger event ending request sent by a first user, where the trigger event creating request or the trigger event ending request includes event description information corresponding to a trigger event; and creates and stores event tag information corresponding to the trigger event, where the event tag information includes event trigger time corresponding to the trigger event, the event description information corresponding to the trigger event, and event end time corresponding to the trigger event, the event trigger time being a first request time corresponding to the trigger event creating request, and the event end time being a second request time corresponding to the trigger event ending request. In some embodiments, the first user equipment, in response to a trigger event creating operation executed by the first user, generates a trigger event creating request and sends it to the network device, and then, in response to a trigger event ending operation executed by the first user, generates a trigger event ending request and sends it to the network device; the request time corresponding to the trigger event creating request is taken as the event trigger time corresponding to the trigger event, and the request time corresponding to the trigger event ending request is taken as the event end time corresponding to the trigger event. In some embodiments, the first user may input the event description information corresponding to the trigger event in the trigger event creating operation, in which case the trigger event creating request includes the event description information.
In some embodiments, the first user may input the event description information corresponding to the trigger event in the trigger event ending operation, in which case the trigger event ending request includes the event description information.
In some embodiments, the step S12 includes: the network device, in response to a trigger event that is automatically triggered according to one or more preset event trigger rules in the audio/video conference and an end trigger event corresponding to the trigger event, creates and stores event tag information corresponding to the trigger event, where the event tag information includes event trigger time corresponding to the trigger event, event end time corresponding to the end trigger event, and event description information corresponding to the trigger event. In some embodiments, each of at least one of the trigger events automatically triggered according to the one or more preset event trigger rules has a corresponding end trigger event. In some embodiments, in response to an automatically triggered trigger event, event tag information corresponding to the trigger event is created, where the event tag information includes the event trigger time and event description information corresponding to the trigger event; then, in response to the automatically triggered end trigger event, the event tag information corresponding to the trigger event is updated, and the event end time corresponding to the end trigger event is added to the event tag information. In some embodiments, in response to an automatically triggered trigger event, the event trigger time and event description information corresponding to the trigger event are first recorded, and then, in response to the automatically triggered end trigger event, event tag information corresponding to the trigger event is created, where the event tag information includes the previously recorded event trigger time and event description information, as well as the event end time corresponding to the end trigger event.
For example, when a certain trigger event is that a user starts speaking, the end trigger event corresponding to that trigger event is that the user stops speaking. For another example, when a certain trigger event is that a user turns on audio/video input, the end trigger event corresponding to that trigger event is that the user turns off audio/video input.
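The paired trigger / end-trigger handling described above (create the tag when the event fires, add the end time when the corresponding end trigger event fires) can be sketched as follows. The function names and the tag dictionary shape are illustrative assumptions.

```python
def on_auto_trigger(store, event_id, trigger_time, description):
    """Create event tag information when a rule-triggered event fires;
    the event end time is not yet known at this point."""
    store[event_id] = {
        "trigger_time": trigger_time,
        "description": description,
        "end_time": None,
    }

def on_auto_end_trigger(store, event_id, end_time):
    """Update the stored tag when the corresponding end trigger event
    fires, adding the event end time to the event tag information."""
    store[event_id]["end_time"] = end_time
```

With the speaking example from the text, the tag would be created when a user starts speaking and completed when that user stops speaking.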
Fig. 2 shows a flowchart of a method for presenting event tag information applied to a first user equipment according to an embodiment of the present application, where the method includes step S21, step S22, and step S23. In step S21, a first user device receives event tag information sent by a network device during a conference process of an audio/video conference in which multiple users participate, and presents the event tag information on a conference time axis corresponding to the audio/video conference, where the event tag information is created by the network device in response to a trigger event in the audio/video conference, and the event tag information includes event trigger time corresponding to the trigger event and event description information corresponding to the trigger event; in step S22, the first user equipment, in response to an event viewing operation performed by the first user for the event tag information, generates an event viewing request about the event tag information and sends it to the network device; in step S23, the first user equipment receives the target multi-user audio and video content corresponding to the event tag information returned by the network device, and plays the target multi-user audio and video content, where the target multi-user audio and video content is obtained from the multi-user audio and video content corresponding to the multiple users recorded by the network device during the conference process.
In step S21, the first user equipment receives event tag information sent by a network device during an audio/video conference in which multiple users participate, and presents the event tag information on a conference timeline corresponding to the audio/video conference, where the event tag information is created by the network device in response to a trigger event in the audio/video conference and includes the event trigger time and event description information corresponding to the trigger event. In some embodiments, the event description information includes, but is not limited to, an event title, the focus of the event discussion, event-related personnel, event-related business information, and the like. In some embodiments, the first user equipment, in response to a trigger event creating operation performed by a first user of the multiple users participating in the audio/video conference, obtains the event description information, corresponding to the trigger event, input by the first user, generates a trigger event creating request, and sends it to the network device; according to the received trigger event creating request sent by the first user, the network device creates and stores event tag information corresponding to the trigger event, where the event tag information includes the event description information input by the first user and the event trigger time corresponding to the trigger event, that is, the request time corresponding to the trigger event creating request, and then sends the event tag information to each user participating in the audio/video conference. In some embodiments, the first user is any one of the multiple users participating in the audio/video conference, without limitation.
In some embodiments, in response to a trigger event that is automatically triggered according to one or more preset event trigger rules in the audio/video conference, the network device creates and stores event tag information corresponding to the trigger event, where the event tag information includes event description information corresponding to the trigger event and event trigger time corresponding to the trigger event, and then sends the event tag information to each user participating in the audio/video conference. In some embodiments, the event triggering rule may be preset by the server, or may also be preset or temporarily set by a creator or an initiator of the audio/video conference, or may also be temporarily set by one of participants of the audio/video conference, and the event triggering rule is described in detail in the foregoing embodiments and is not described herein again. In some embodiments, each event tag information further includes event identification information (e.g., an event ID) for uniquely identifying the event tag information, and each event tag information corresponds to a different respective event identification information. In some embodiments, as shown in fig. 3, after each user joins the audio/video conference, a conference time axis is presented on an interface of the audio/video conference, and one or more pieces of event tag information sent to each user by the network device are presented on the conference time axis, where each piece of event tag information is presented at a corresponding position corresponding to an event trigger time in the event tag information on the conference time axis. 
In some embodiments, according to the event trigger time in the event tag information, the event tag information is directly presented at a corresponding position on the conference timeline, or only a control (e.g., an arrow-shaped graphical control) corresponding to the event tag information is presented at the corresponding position on the conference timeline, and a user needs to move his finger or a mouse used by the user over the control to present the event tag information at the corresponding position, or needs to click the control to present the event tag information at the corresponding position.
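Placing each tag at "a corresponding position corresponding to an event trigger time" amounts to mapping a timestamp onto the timeline's pixel range. A minimal sketch of that mapping, with all names assumed:

```python
def timeline_position(trigger_time, start_time, end_time, width_px):
    """Map an event trigger time to a horizontal pixel offset on the conference
    timeline. start_time/end_time bound the timeline (conference start to the
    current time or predetermined end); the fraction is clamped so out-of-range
    tags still land on the bar. Illustrative only."""
    if end_time <= start_time:
        return 0
    frac = (trigger_time - start_time) / (end_time - start_time)
    frac = min(max(frac, 0.0), 1.0)  # clamp to the visible timeline
    return round(frac * width_px)
```

A tag triggered 30 s into a 120 s conference on a 600 px timeline lands at offset 150, a quarter of the way along the bar.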
In step S22, the first user device generates an event viewing request about the event label information to send to the network device in response to an event viewing operation performed by the first user for the event label information. In some embodiments, the first user is any one of a plurality of users participating in the audio-video conference, without limitation. In some embodiments, the first user performs an event operation on certain event tag information presented on the conference timeline, generates an event viewing request about the event tag information, and sends the event viewing request to the network device, where the event viewing request includes event identification information in the event tag information.
In step S23, the first user equipment receives the target multi-user audio/video content corresponding to the event tag information returned by the network device and plays it, where the target multi-user audio/video content is obtained from the multi-user audio/video content, corresponding to the multiple users, recorded by the network device during the conference. In some embodiments, according to the event identification information in the received event viewing request sent by the first user, the network device searches the one or more pieces of event tag information stored for the audio/video conference for the event tag information identified by that event identification information, and then obtains the target multi-user audio/video content corresponding to the event tag information from the multi-user audio/video content of the audio/video conference. In some embodiments, according to the event trigger time in the event tag information, the target multi-user audio/video content starting from the event trigger time is obtained from the multi-user audio/video content, and the audio/video data stream of the target multi-user audio/video content is returned to the first user for playing. In some embodiments, if the multi-user audio/video content includes synthesized multi-user audio content and synthesized multi-user video content, the multi-user audio/video content may be played directly. In some embodiments, if the multi-user audio/video content includes the synthesized multi-user audio content and a plurality of video input contents, the user equipment merges the plurality of video input contents together when playing the multi-user audio/video content.
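Obtaining "the target multi-person audio/video content after the event trigger time" is essentially a seek into the recording by timestamp. A simplified sketch, assuming the recording is an in-memory list of timestamped frames rather than a real media container:

```python
import bisect

def target_content_from(recording, trigger_time):
    """Return the portion of the recorded multi-user content at or after the
    event trigger time. `recording` is a list of (timestamp, frame) pairs sorted
    by timestamp; a real system would seek inside a container file instead."""
    timestamps = [t for t, _ in recording]
    start = bisect.bisect_left(timestamps, trigger_time)  # first frame >= trigger_time
    return recording[start:]
```

If the event tag also carries an event end time, the same bisect could bound the slice on the right to yield only the tagged span.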
In some embodiments, the method further comprises: the first user equipment, in response to a trigger event creating operation performed by the first user, obtains the event description information, corresponding to the trigger event, input by the first user, generates a trigger event creating request, and sends it to the network device, where the trigger event creating request includes the event description information. In some embodiments, the first user equipment, in response to a trigger event creating operation performed by a first user of the multiple users participating in the audio/video conference, obtains the event description information, corresponding to the trigger event, input by the first user, generates a trigger event creating request, and sends it to the network device; according to the received trigger event creating request sent by the first user, the network device creates and stores event tag information corresponding to the trigger event, where the event tag information includes the event description information input by the first user and the event trigger time corresponding to the trigger event, that is, the request time corresponding to the trigger event creating request. For example, the first user applies to create a trigger event at the current time point of the audio/video conference, edits the event description information corresponding to the trigger event, and reports it to the network device.
In some embodiments, the starting point of the conference timeline is the conference start time of the audio/video conference or the most recent time at which the first user joined the audio/video conference, and the ending point of the conference timeline is the current time or the predetermined conference end time corresponding to the audio/video conference.
In some embodiments, the method further comprises: the first user equipment automatically reduces the audio volume of the audio/video conference during the playing of the target multi-user audio/video content. In some embodiments, during the playing of the target multi-user audio/video content, the audio volume of the audio/video conference is automatically reduced to avoid audio interference, and the amount of the reduction may be preset by the network device or by the first user.
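The automatic volume reduction can be modelled as applying a fixed attenuation to the live conference gain while the tagged content plays. A sketch, where the default reduction of 12 dB is an arbitrary assumption (the text leaves the amount to the network device or the first user):

```python
def duck_volume(conference_gain, reduction_db=12.0):
    """Lower the live conference audio while tagged content plays back.

    Gains are linear (1.0 = unchanged); the decibel reduction is converted to a
    linear factor via 10^(-dB/20). The 12 dB default is an assumption."""
    return conference_gain * 10 ** (-reduction_db / 20.0)
```

Restoring the original gain when playback ends is then just discarding the ducked value and reapplying the stored `conference_gain`.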
In some embodiments, the method further comprises: the first user equipment, in response to a volume adjustment operation of the first user on the target multi-user audio/video content or on the audio/video conference during the playing of the target multi-user audio/video content, adjusts the audio volume of the target multi-user audio/video content or of the audio/video conference. In some embodiments, during the playing of the target multi-user audio/video content, to avoid audio interference, the first user may manually adjust the audio volume of the target multi-user audio/video content or of the audio/video conference.
In some embodiments, the method further comprises at least one of the following: the first user equipment keeps recording the input audio of the first user during the playing of the target multi-user audio/video content; and the first user equipment stops recording the input audio of the first user during the playing of the target multi-user audio/video content. In some embodiments, during the playing of the target multi-user audio/video content, the first user's microphone stops recording the first user's input audio. In some embodiments, during the playing of the target multi-user audio/video content, the first user's microphone keeps recording the first user's input audio, which continues to be sent to the other users participating in the audio/video conference.
In some embodiments, the receiving and playing of the target multi-user audio/video content corresponding to the event tag information returned by the network device includes: receiving the target multi-user audio/video content corresponding to the event tag information returned by the network device, together with at least one piece of user identification information of at least one user appearing in the target multi-user audio/video content; playing the target multi-user audio/video content and presenting the at least one piece of user identification information; and, in response to an event viewing operation of the first user on target user identification information among the at least one piece of user identification information, stopping playing the target multi-user audio/video content and playing the single-user audio/video content corresponding to the user identified by the target user identification information. In some embodiments, while providing the target multi-user audio/video content to the first user for playing, the network device also provides, for presentation, at least one piece of user identification information corresponding to at least one user appearing in the target multi-user audio/video content, and the first user may click one piece of user identification information to view the single-user audio/video content of the user dimension identified by it.
In some embodiments, the playing of the single-user audio/video content corresponding to the user identified by the target user identification information includes: obtaining the single-user audio/video content corresponding to the user identified by the target user identification information from the target multi-user audio/video content, and playing the single-user audio/video content. In some embodiments, according to the target user identification information, the single-user audio/video content of the target user dimension identified by the target user identification information is obtained directly from the target multi-user audio/video content locally cached or locally stored by the first user equipment, and is then played.
In some embodiments, the playing of the single-user audio/video content corresponding to the user identified by the target user identification information includes: generating a single-user event viewing request and sending it to the network device, where the single-user event viewing request includes the target user identification information; and receiving the single-user audio/video content, corresponding to the user identified by the target user identification information, returned by the network device, and playing the single-user audio/video content. In some embodiments, the first user performs an event viewing operation on the target user identification information among the at least one piece of user identification information presented on the first user equipment, and the first user equipment generates a single-user event viewing request including the target user identification information and sends it to the network device. In some embodiments, the single-user event viewing request further includes identification information of the target multi-user audio/video content or the event identification information; according to the received single-user event viewing request sent by the first user, the network device obtains the identified target multi-user audio/video content according to the identification information of the target multi-user audio/video content or the event identification information, then obtains the single-user audio/video content of the target user dimension identified by the target user identification information from the target multi-user audio/video content, and returns the audio/video data stream of the single-user audio/video content to the first user for playing.
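The server-side handling of a single-person event viewing request reduces to two lookups: find the target multi-person content, then pull out the requested user's dimension. A hypothetical sketch (the store layout and field names are assumptions, not the patent's data model):

```python
def handle_single_view_request(store, request):
    """Resolve a single-user event viewing request on the network device side.

    `store` maps event_id -> {"tracks": {user_id: single_av_content}}, and the
    request carries the event identification information and the target user
    identification information. Returns None if nothing matches."""
    event = store.get(request["event_id"])
    if event is None:
        return None
    # Pull only the requested user's dimension out of the multi-user content.
    return event["tracks"].get(request["target_user_id"])
```

In the cached variant described in the previous paragraph, the same per-user lookup would run locally on the first user equipment instead of on the network device.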
In some embodiments, the method further comprises: after the audio/video conference ends, the first user equipment, in response to a conference viewing operation of the first user on the audio/video conference in the first user's list of historical conferences, generates a conference viewing request about the audio/video conference and sends it to the network device; receives and presents the event list information, corresponding to the audio/video conference, returned by the network device; in response to an event viewing operation of the first user on target event tag information in the event list information, generates an event viewing request about the target event tag information and sends it to the network device; and receives and presents the target multi-user audio/video content, corresponding to the target event tag information and obtained from the multi-user audio/video content, returned by the network device. In some embodiments, after the audio/video conference ends, the network device stops recording the multi-user audio/video content corresponding to the audio/video conference, stores the recorded multi-user audio/video content, and merges the multiple pieces of event tag information corresponding to the audio/video conference into an event list stored in the network device; the first user can then see the audio/video conference in the list of historical conferences in which the first user participated, click the audio/video conference to see the event list with its multiple pieces of event tag information, and click one piece of event tag information to see the target multi-user audio/video content, corresponding to that event tag information, obtained from the multi-user audio/video content.
In some embodiments, the user identification information of the multiple users participating in the audio/video conference is also presented on the playing interface of the target multi-user audio/video content, and the first user may click one piece of user identification information to view the single-user audio/video content of the target multi-user audio/video content in the user dimension identified by that user identification information.
In some embodiments, the event tag information further includes the first video frame of the corresponding target multi-user audio/video content. In some embodiments, each piece of event tag information further includes the first video frame of the target multi-user audio/video content corresponding to that event tag information. In some embodiments, according to the event trigger time in the event tag information, the first user equipment may present the first video frame directly at the position on the conference timeline corresponding to the event tag information, or the first video frame may be presented only when the user moves a finger or the mouse to that position on the conference timeline, or only when the user clicks that position on the conference timeline.
In some embodiments, the event tag information further includes an event end time corresponding to the trigger event. In some embodiments, the event tag information further includes an event end time, and the event end time may be directly presented on the conference timeline, or the user needs to move his finger or his mouse to the display position of the event tag information on the conference timeline to present the event end time, or the user needs to click the display position of the event tag information on the conference timeline to present the event end time. In some embodiments, each of at least one of the one or more trigger events that the network device automatically triggers according to the preset one or more event trigger rules has a corresponding end trigger event. In some embodiments, in response to an automatically triggered trigger event, a network device creates event tag information corresponding to the trigger event, where the event tag information includes event trigger time and event description information corresponding to the trigger event, then, in response to an automatically triggered end trigger event, updates the event tag information corresponding to the trigger event, and adds an event end time corresponding to the end trigger event in the event tag information. In some embodiments, in response to an automatically triggered trigger event, a network device records an event trigger time and event description information corresponding to the trigger event, and then creates event tag information corresponding to the trigger event in response to an automatically triggered end trigger event, where the event tag information includes the previously recorded event trigger time and event description information, and also includes an event end time corresponding to the end trigger event. 
For example, when a trigger event is that a user starts speaking, the end trigger event corresponding to that trigger event is that the same user stops speaking. For another example, when a trigger event is that a user turns on audio/video input, the corresponding end trigger event is that the user turns off the audio/video input.
In some embodiments, the method further comprises: the first user equipment, in response to a trigger event creating operation and a trigger event ending operation performed by the first user, obtains the event description information, corresponding to the trigger event, input by the first user, generates a trigger event creating request and a trigger event ending request, and sends them to the network device, where the trigger event creating request or the trigger event ending request includes the event description information. In some embodiments, the first user equipment generates a trigger event creating request in response to the trigger event creating operation performed by the first user and sends it to the network device, and the network device uses the request time corresponding to the trigger event creating request as the event trigger time corresponding to the trigger event; then the first user equipment generates a trigger event ending request in response to the trigger event ending operation performed by the first user and sends it to the network device, and the network device uses the request time corresponding to the trigger event ending request as the event end time corresponding to the trigger event. In some embodiments, the first user may input the event description information corresponding to the trigger event in the trigger event creating operation, in which case the trigger event creating request includes the event description information. In some embodiments, the first user may input the event description information corresponding to the trigger event in the trigger event ending operation, in which case the trigger event ending request includes the event description information.
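Under this scheme the network device derives both timestamps from request times. A minimal sketch of assembling a tag from the two requests, with illustrative dict field names:

```python
def tag_from_requests(create_req, end_req):
    """Build event tag information from a manual trigger event creating request
    and the matching trigger event ending request. The request time of the
    create request becomes the event trigger time, the request time of the end
    request becomes the event end time, and the description may ride on either
    request. Field names are assumptions."""
    return {
        "trigger_time": create_req["time"],  # request time of the creating request
        "end_time": end_req["time"],         # request time of the ending request
        "description": create_req.get("description") or end_req.get("description"),
    }
```

Accepting the description from either request mirrors the two placements the text allows.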
FIG. 4 illustrates a flow diagram of a system method for presenting event tag information according to one embodiment of the present application.
As shown in fig. 4, in step S31, the network device records multi-user audio/video content corresponding to a plurality of users in a conference process of an audio/video conference in which the plurality of users participate, where step S31 is the same as or similar to step S11, and is not described herein again; in step S32, the network device creates and stores event tag information corresponding to the trigger event in response to the trigger event in the audio/video conference, where the event tag information includes event trigger time corresponding to the trigger event and event description information corresponding to the trigger event, and step S32 is the same as or similar to step S12, and is not described again here; in step S33, the network device sends the event label information to the first user equipment, and step S33 is the same as or similar to step S13, and is not described herein again; in step S34, the first user equipment receives the event tag information sent by the network equipment, and presents the event tag information on a conference time axis corresponding to the audio/video conference, where step S34 is the same as or similar to step S21, and is not described herein again; in step S35, the first user equipment generates an event viewing request related to the event tag information in response to an event viewing operation performed by the first user for the event tag information, and sends the event viewing request to the network equipment, where step S35 is the same as or similar to step S22, and is not described herein again; in step S36, the network device receives the event viewing request sent by the first user device, obtains target multi-user audio and video content corresponding to the event tag information from the multi-user audio and video content, and returns the target multi-user audio and video content to the first user, where step S36 is the same as or similar to step S14, and is not described herein again; in step S37, 
the first user equipment receives the target multi-user audio and video content returned by the network equipment, and plays the target multi-user audio and video content, where step S37 is the same as or similar to step S23, and is not described herein again.
Fig. 5 is a block diagram of a network device for presenting event tag information according to an embodiment of the present application, where the network device includes a first module 11, a second module 12, a third module 13, and a fourth module 14. The first module 11 is configured to record multi-user audio/video content corresponding to multiple users during an audio/video conference in which the multiple users participate; the second module 12 is configured to create and store event tag information corresponding to a trigger event in the audio/video conference in response to the trigger event, where the event tag information includes the event trigger time and event description information corresponding to the trigger event; the third module 13 is configured to send the event tag information to the multiple users, so that the event tag information is presented on the conference timeline, corresponding to the audio/video conference, of each of the multiple users; and the fourth module 14 is configured to receive an event viewing request about the event tag information sent by a first user of the multiple users, obtain the target multi-user audio/video content corresponding to the event tag information from the multi-user audio/video content, and return the target multi-user audio/video content to the first user for playing.
The first module 11 is configured to record multi-user audio/video content corresponding to multiple users during an audio/video conference in which the multiple users participate. In some embodiments, the video input content and the audio input content of each user are recorded separately, and the recorded audio input contents of the multiple users are synthesized to obtain synthesized multi-user audio content. In some embodiments, the recorded video input contents of the multiple users are also synthesized to obtain synthesized multi-user video content; in this case, the multi-user audio/video content corresponding to the multiple users includes the synthesized multi-user audio content and the synthesized multi-user video content. In some embodiments, the recorded video input contents of the multiple users are not synthesized; in this case, the multi-user audio/video content corresponding to the multiple users includes the synthesized multi-user audio content and the plurality of video input contents, and when a user equipment subsequently plays the multi-user audio/video content, it merges the plurality of video input contents together for playing. In some embodiments, recording of the multi-user audio/video content corresponding to the multiple users starts after the audio/video conference starts, and stops after the audio/video conference ends or after all users have quit the audio/video conference.
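The synthesis of several users' audio input contents into one multi-user audio content is, at its simplest, sample-wise summation with clipping. A deliberately minimal sketch (a real mixer would also resample, align, and normalise the tracks):

```python
def mix_audio(tracks):
    """Mix per-user audio input content into synthesized multi-user audio.

    Each track is a list of samples in [-1, 1]; tracks of different lengths are
    allowed, and the sum at each sample index is clipped to the valid range."""
    if not tracks:
        return []
    length = max(len(t) for t in tracks)
    mixed = []
    for i in range(length):
        s = sum(t[i] for t in tracks if i < len(t))  # sum whichever tracks cover i
        mixed.append(max(-1.0, min(1.0, s)))         # clip to [-1, 1]
    return mixed
```

Keeping the video inputs separate, as the unsynthesized variant above does, trades server-side compositing cost for client-side merge work at playback time.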
The second module 12 is configured to create and store event tag information corresponding to a trigger event in response to the trigger event in the audio/video conference, where the event tag information includes the event trigger time and event description information corresponding to the trigger event. In some embodiments, the first user equipment, in response to a trigger event creating operation performed by a first user of the multiple users participating in the audio/video conference, obtains the event description information, corresponding to the trigger event, input by the first user, generates a trigger event creating request, and sends it to the network device; according to the received trigger event creating request sent by the first user, the network device creates and stores the event tag information corresponding to the trigger event, where the event tag information includes the event description information input by the first user and the event trigger time corresponding to the trigger event, that is, the request time corresponding to the trigger event creating request; the first user is any one of the multiple users participating in the audio/video conference, and the event description information includes, but is not limited to, an event title, the focus of the event discussion, event-related personnel, event-related business information, and the like.
In some embodiments, in response to a trigger event that is automatically triggered according to one or more preset event trigger rules in the audio/video conference, the network device creates and stores event tag information corresponding to the trigger event, where the event tag information includes event description information corresponding to the trigger event and event trigger time corresponding to the trigger event, where the event trigger rule may be preset by a server, or may also be preset or temporarily set by a creator or an initiator of the audio/video conference, or may also be temporarily set by one participant of the audio/video conference. In some embodiments, the event triggering rule may be that a triggering event is automatically triggered at a predetermined time, for example, every 5 minutes from the beginning of the conference. In some embodiments, the event triggering rule may also be to automatically trigger a triggering event in response to a non-user action in the audio-video conference, for example, to automatically trigger a triggering event when the network condition of a user participating in the audio-video conference is poor, and to automatically trigger a triggering event when the network condition of the user is restored. In some embodiments, the event triggering rule may also be that a triggering event is automatically triggered in response to a user action of a certain user participating in the audio-video conference, for example, a triggering event is automatically triggered after a new user joins the audio-video conference, and a triggering event is automatically triggered after a user exits the audio-video conference. 
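The three kinds of rules mentioned above (timed, non-user-action, user-action) can be sketched as a single evaluation pass. The 5-minute period comes from the text's example; every other name here is illustrative:

```python
def evaluate_rules(elapsed_s, events, period_s=300):
    """Evaluate preset event trigger rules at one point in the conference.

    elapsed_s is seconds since the conference started; `events` is a list of
    (kind, user) pairs observed since the last check, e.g. ("join", "bob").
    Returns the descriptions of the trigger events that fire."""
    triggered = []
    # Timed rule: fire at every multiple of the period (e.g. every 5 minutes).
    if elapsed_s > 0 and elapsed_s % period_s == 0:
        triggered.append(f"periodic tag at {elapsed_s}s")
    # User-action rules: a user joining or leaving the conference.
    for kind, user in events:
        if kind == "join":
            triggered.append(f"{user} joined the conference")
        elif kind == "leave":
            triggered.append(f"{user} left the conference")
    return triggered
```

A network-condition rule of the kind the text mentions would slot in the same way, driven by connection-quality events instead of join/leave events.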
In some embodiments, for a trigger event automatically triggered according to the one or more preset event trigger rules, the event description information corresponding to the trigger event may be determined from the target event trigger rule corresponding to the trigger event; alternatively, the target multi-user audio/video content corresponding to the trigger event may be obtained, according to the event trigger time corresponding to the trigger event, from the recorded multi-user audio/video content corresponding to the multiple users participating in the audio/video conference, and the event description information may then be determined by performing speech recognition and/or image recognition on the target multi-user audio/video content; alternatively, the event description information corresponding to the trigger event may be manually input by the creator or initiator of the audio/video conference, or by any user participating in it, and uploaded to the network device. In some embodiments, in response to each trigger event in the audio/video conference, the network device creates and stores the event tag information corresponding to that trigger event, so that each audio/video conference has one or more pieces of event tag information. In some embodiments, each piece of event tag information further includes event identification information (e.g., an event ID) that uniquely identifies it, and different pieces of event tag information correspond to different event identification information.
A third module 13, configured to send the event tag information to the multiple users, so as to present the event tag information on a conference timeline corresponding to the audio/video conference for each of the multiple users. In some embodiments, after each piece of event tag information is created, it is sent to each user participating in the audio/video conference and is presented on the conference timeline corresponding to the audio/video conference for that user. In some embodiments, as shown in fig. 3, after each user joins the audio/video conference, a conference timeline (a time progress bar) is presented on the interface (the conference window) of the audio/video conference. The starting point of the conference timeline is the conference starting time of the audio/video conference or the time when the first user most recently joined the audio/video conference, and the ending point of the conference timeline is the current time or a predetermined conference ending time corresponding to the audio/video conference. One or more pieces of event tag information (e.g., Tag1, Tag2, etc.) sent to each user by the network device are presented on the conference timeline, where each piece of event tag information is presented at the position on the conference timeline corresponding to the event trigger time in that event tag information.
In some embodiments, according to the event trigger time in the event tag information, the event tag information is presented directly at the corresponding position on the conference timeline. Alternatively, only a control (e.g., an arrow-shaped graphical control) corresponding to the event tag information is presented at that position, and the user hovers a finger or mouse pointer over the control, or clicks the control, to present the event tag information at the corresponding position.
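A minimal sketch of how a client might map an event trigger time to a horizontal position on the time progress bar; the linear mapping and the clamping are assumptions, since the embodiment only states that the tag appears at the position corresponding to its trigger time:

```python
def tag_position_px(trigger_time_s, axis_start_s, axis_end_s, bar_width_px):
    """Map an event trigger time to a pixel offset on the conference
    timeline, clamped to the bar's extent."""
    span = max(axis_end_s - axis_start_s, 1e-9)  # avoid division by zero
    frac = (trigger_time_s - axis_start_s) / span
    frac = min(max(frac, 0.0), 1.0)
    return round(frac * bar_width_px)
```

Clamping matters because, before the conference ends, the axis endpoint is the current time and keeps moving, so previously computed positions must be recomputed as the bar grows.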
A fourth module 14, configured to receive an event viewing request, sent by a first user of the multiple users, about the event tag information, obtain target multi-person audio/video content corresponding to the event tag information from the multi-person audio/video content, and return the target multi-person audio/video content to the first user for playing. In some embodiments, the first user is any one of the multiple users participating in the audio/video conference, without limitation. In some embodiments, the first user performs an event viewing operation on a piece of event tag information presented on the conference timeline, which generates an event viewing request about that event tag information and sends it to the network device, where the event viewing request includes the event identification information in the event tag information. In some embodiments, according to the event identification information in the received event viewing request sent by the first user, the network device searches the one or more pieces of event tag information corresponding to the audio/video conference stored on the network device for the event tag information identified by that event identification information, and then obtains the target multi-person audio/video content corresponding to the event tag information from the recorded multi-person audio/video content corresponding to the multiple users of the audio/video conference. In some embodiments, according to the event trigger time in the event tag information, the target multi-person audio/video content after the event trigger time is obtained from the multi-person audio/video content, and the audio/video data stream of the target multi-person audio/video content is returned to the first user for playing.
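The lookup-and-slice behavior described above can be sketched as follows, modeling the network device's tag storage as an in-memory dictionary and the recording as a list of one-second frames (both modeling choices are assumptions for illustration):

```python
class TagStore:
    """Minimal in-memory stand-in for the network device's tag storage;
    the real storage backend is not specified by the embodiment."""

    def __init__(self):
        # event identification info -> (event trigger time, description)
        self.tags = {}

    def create(self, event_id, trigger_time_s, description):
        # Create and store event tag information for a trigger event.
        self.tags[event_id] = (trigger_time_s, description)

    def handle_view_request(self, event_id, recording):
        """Look up the tag by its event identification information and
        return the recorded content after the event trigger time."""
        trigger_time_s, _description = self.tags[event_id]
        return recording[int(trigger_time_s):]
```

In a real deployment the returned slice would be a streamed audio/video data segment rather than a Python list.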
In some embodiments, the secondary module 12 is configured to: receiving a trigger event creating request sent by the first user, wherein the trigger event creating request comprises event description information corresponding to a trigger event; and creating and storing event label information corresponding to the trigger event, wherein the event label information comprises event trigger time corresponding to the trigger event and the event description information, and the event trigger time is request time corresponding to the trigger event creation request. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and thus are not described again, and are included herein by reference.
In some embodiments, the secondary module 12 is configured to: and in response to a trigger event which is automatically triggered according to one or more preset event trigger rules in the audio and video conference, creating and storing event tag information corresponding to the trigger event, wherein the event tag information comprises event trigger time corresponding to the trigger event and event description information corresponding to the trigger event. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and therefore are not described again, and are included herein by reference.
In some embodiments, the triggering event includes, but is not limited to:
1) Conference start event or conference end event
2) Participant change event
3) Participant speaking event
4) Participant audio/video on or off event
5) Participant screen-sharing start or stop event
6) Network change event
7) Predetermined-time trigger event
Here, the related trigger events are the same as or similar to those in the embodiment shown in fig. 1, and therefore are not described herein again, and are included herein by reference.
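For illustration, the seven trigger-event categories above can be captured in a single enumeration; the member names are paraphrases of the list, not claim language:

```python
from enum import Enum, auto

class TriggerEventType(Enum):
    """The seven trigger-event categories enumerated above."""
    CONFERENCE_START_OR_END = auto()
    PARTICIPANT_CHANGE = auto()
    PARTICIPANT_SPEAKING = auto()
    AUDIO_VIDEO_TOGGLE = auto()
    SCREEN_SHARE_TOGGLE = auto()
    NETWORK_CHANGE = auto()
    PREDETERMINED_TIME = auto()
```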
In some embodiments, the apparatus is further configured to: and determining the event description information according to a target event trigger rule corresponding to the trigger event. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and thus are not described again, and are included herein by reference.
In some embodiments, the apparatus is further configured to: acquiring target multi-person audio and video contents corresponding to the event tag information from the multi-person audio and video contents; and carrying out voice recognition and/or image recognition on the target multi-person audio and video content, and determining the event description information. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and therefore are not described again, and are included herein by reference.
In some embodiments, the apparatus is further configured to: receiving a single-person event viewing request sent by the first user, wherein the single-person event viewing request comprises target user identification information; and acquiring single audio and video content corresponding to the user identified by the target user identification information from the target multi-person audio and video content, and returning the single audio and video content to the first user for playing. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and therefore are not described again, and are included herein by reference.
In some embodiments, the apparatus is further configured to: after the audio and video conference is finished, storing the multi-person audio and video content, and generating and storing event list information corresponding to the audio and video conference according to a plurality of event tag information corresponding to the audio and video conference; responding to a conference viewing request which is sent by the first user and relates to the audio and video conference in a historical conference list in which the first user participates, and returning the event list information to the first user for presentation; and responding to an event viewing request which is sent by the first user and is about to the target event label information in the event list information, obtaining target multi-person audio and video content corresponding to the target event label information from the multi-person audio and video content, and returning the target multi-person audio and video content to the first user for playing. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and thus are not described again, and are included herein by reference.
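One way to assemble the event list information described above, ordering the stored event tags by event trigger time, might look like the sketch below; the dictionary layout is an assumption:

```python
def build_event_list(tags):
    """Assemble event list information for a finished conference from
    the stored event tags, ordered by event trigger time. `tags` maps
    event identification info -> (trigger_time_s, description)."""
    return [
        {"event_id": eid, "trigger_time_s": t, "description": d}
        for eid, (t, d) in sorted(tags.items(), key=lambda kv: kv[1][0])
    ]
```

The resulting list is what would be returned in response to the first user's conference viewing request, with each entry keyed for later event viewing requests.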
In some embodiments, the event tag information further includes a first frame video frame corresponding to the target multi-person audio-video content. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and thus are not described again, and are included herein by reference.
In some embodiments, the returning the target multi-person audio-video content to the first user for playing includes: and returning the target multi-user audio and video content to the first user for playing, and sending at least one user identification information of at least one user corresponding to the target multi-user audio and video content to the first user for presentation. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and therefore are not described again, and are included herein by reference.
In some embodiments, the apparatus is further configured to: and performing voice recognition on the target multi-person audio and video content, and determining at least one user corresponding to the target multi-person audio and video content. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and therefore are not described again, and are included herein by reference.
In some embodiments, the event tag information further includes an event end time corresponding to the trigger event; the obtaining of the target multi-person audio and video content corresponding to the event tag information from the multi-person audio and video content includes: and acquiring target multi-person audio and video content corresponding to the event tag information from the multi-person audio and video content according to the event triggering time and the event ending time. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and therefore are not described again, and are included herein by reference.
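When an event end time is present, the extracted segment is bounded on both sides; a sketch, again modeling the recording as one entry per second (an assumption made for illustration):

```python
def slice_target_content(frames, trigger_time_s, end_time_s=None):
    """Obtain the target multi-person content between the event trigger
    time and the event end time; with no end time, everything after the
    trigger time is returned, as in the earlier embodiments."""
    stop = len(frames) if end_time_s is None else int(end_time_s)
    return frames[int(trigger_time_s):stop]
```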
In some embodiments, the secondary module 12 is configured to: receiving a trigger event creating request and a trigger event ending request sent by the first user, wherein the trigger event creating request or the trigger event ending request comprises event description information corresponding to the trigger event; and creating and storing event tag information corresponding to the trigger event, wherein the event tag information comprises event trigger time corresponding to the trigger event, event description information corresponding to the trigger event and event end time corresponding to the trigger event, the event trigger time is the first request time corresponding to the trigger event creating request, and the event end time is the second request time corresponding to the trigger event ending request. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and thus are not described again, and are included herein by reference.
In some embodiments, the secondary module 12 is configured to: and responding to a trigger event which is automatically triggered according to one or more preset event trigger rules in the audio and video conference and an end trigger event corresponding to the trigger event, and creating and storing event tag information corresponding to the trigger event, wherein the event tag information comprises event trigger time corresponding to the trigger event, event end time corresponding to the end trigger event and event description information corresponding to the trigger event. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and therefore are not described again, and are included herein by reference.
Fig. 6 shows a structural diagram of first user equipment for presenting event tag information according to an embodiment of the present application, which includes a two-one module 21, a two-two module 22, and a two-three module 23. The two-one module 21 is configured to receive event tag information sent by a network device during an audio/video conference in which multiple users participate, and to present the event tag information on a conference timeline corresponding to the audio/video conference, where the event tag information is created by the network device in response to a trigger event in the audio/video conference and includes the event trigger time corresponding to the trigger event and event description information corresponding to the trigger event. The two-two module 22 is configured to generate an event viewing request about the event tag information and send it to the network device in response to an event viewing operation performed by the first user on the event tag information. The two-three module 23 is configured to receive target multi-person audio/video content corresponding to the event tag information returned by the network device and play it, where the target multi-person audio/video content is obtained from the multi-person audio/video content, corresponding to the multiple users, recorded by the network device during the conference.
The two-one module 21 is configured to receive event tag information sent by a network device during an audio/video conference in which multiple users participate, and to present the event tag information on a conference timeline corresponding to the audio/video conference, where the event tag information is created by the network device in response to a trigger event in the audio/video conference and includes the event trigger time corresponding to the trigger event and event description information corresponding to the trigger event. In some embodiments, the event description information includes, but is not limited to, an event title, key points of the event discussion, event-related personnel, event-related business information, and the like. In some embodiments, in response to a trigger event creating operation executed by a first user of the multiple users participating in the audio/video conference, the first user equipment obtains the event description information, input by the first user, corresponding to the trigger event, generates a trigger event creating request, and sends it to the network device. According to the received trigger event creating request sent by the first user, the network device creates and stores event tag information corresponding to the trigger event, where the event tag information includes the event description information input by the first user and the event trigger time corresponding to the trigger event, that is, the request time corresponding to the trigger event creating request, and then sends the event tag information to each user participating in the audio/video conference. In some embodiments, the first user is any one of the multiple users participating in the audio/video conference, without limitation.
In some embodiments, in response to a trigger event that is automatically triggered according to one or more preset event trigger rules in the audio/video conference, the network device creates and stores event tag information corresponding to the trigger event, where the event tag information includes event description information corresponding to the trigger event and the event trigger time corresponding to the trigger event, and then sends the event tag information to each user participating in the audio/video conference. In some embodiments, the event trigger rule may be preset by the server, preset or temporarily set by a creator or initiator of the audio/video conference, or temporarily set by one of the participants of the audio/video conference; the event trigger rules are described in detail in the foregoing embodiments and are not repeated here. In some embodiments, each piece of event tag information further includes event identification information (e.g., an event ID) that uniquely identifies it, and each piece of event tag information corresponds to different event identification information. In some embodiments, as shown in fig. 3, after each user joins the audio/video conference, a conference timeline is presented on the interface of the audio/video conference, and one or more pieces of event tag information sent to each user by the network device are presented on the conference timeline, where each piece of event tag information is presented at the position on the conference timeline corresponding to the event trigger time in that event tag information.
In some embodiments, according to the event trigger time in the event tag information, the event tag information is presented directly at the corresponding position on the conference timeline. Alternatively, only a control (e.g., an arrow-shaped graphical control) corresponding to the event tag information is presented at that position, and the user hovers a finger or mouse pointer over the control, or clicks the control, to present the event tag information at the corresponding position.
A two-two module 22, configured to generate an event viewing request about the event tag information and send it to the network device in response to an event viewing operation performed by the first user on the event tag information. In some embodiments, the first user is any one of the multiple users participating in the audio/video conference, without limitation. In some embodiments, the first user performs an event viewing operation on a piece of event tag information presented on the conference timeline, which generates an event viewing request about that event tag information and sends it to the network device, where the event viewing request includes the event identification information in the event tag information.
The two-three module 23 is configured to receive the target multi-person audio/video content corresponding to the event tag information returned by the network device and play it, where the target multi-person audio/video content is obtained from the multi-person audio/video content, corresponding to the multiple users, recorded by the network device during the conference. In some embodiments, according to the event identification information in the received event viewing request sent by the first user, the network device searches the one or more pieces of event tag information corresponding to the audio/video conference stored on the network device for the event tag information identified by that event identification information, and then obtains the target multi-person audio/video content corresponding to the event tag information from the multi-person audio/video content corresponding to the multiple users of the audio/video conference. In some embodiments, according to the event trigger time in the event tag information, the target multi-person audio/video content after the event trigger time is obtained from the multi-person audio/video content, and the audio/video data stream of the target multi-person audio/video content is returned to the first user for playing. In some embodiments, if the multi-person audio/video content includes synthesized multi-person audio content and synthesized multi-person video content, the multi-person audio/video content may be played directly. In some embodiments, if the multi-person audio/video content includes the synthesized multi-person audio content and a plurality of video input contents, the user equipment merges the plurality of video input contents together for playing when it subsequently plays the multi-person audio/video content.
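A near-square grid is one common way a playing device might merge several separate video input contents into one view; the embodiment does not mandate any particular layout, so the following is purely illustrative:

```python
import math

def grid_layout(n_streams: int):
    """Choose a near-square (rows, cols) grid for merging n separate
    video input contents into one playback view."""
    cols = math.ceil(math.sqrt(n_streams))
    rows = math.ceil(n_streams / cols)
    return rows, cols
```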
In some embodiments, the apparatus is further configured to: and responding to a trigger event creating operation executed by the first user, acquiring event description information corresponding to the trigger event input by the first user, generating a trigger event creating request and sending the trigger event creating request to the network equipment, wherein the trigger event creating request comprises the event description information. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 2, and therefore are not described again, and are included herein by reference.
In some embodiments, the starting point of the conference timeline is the conference starting time of the audio/video conference or the time when the first user most recently joined the audio/video conference, and the ending point of the conference timeline is the current time or a predetermined conference ending time corresponding to the audio/video conference. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 2, and therefore are not described again, and are included herein by reference.
In some embodiments, the apparatus is further configured to: and automatically reducing the audio volume of the audio and video conference in the playing process of the target multi-person audio and video content. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 2, and therefore are not described again, and are included herein by reference.
In some embodiments, the apparatus is further configured to: and in the playing process of the target multi-person audio and video content, responding to the volume adjustment operation of the first user aiming at the target multi-person audio and video content or the audio and video conference, and adjusting the audio volume of the target multi-person audio and video content or the audio and video conference. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 2, and therefore are not described again, and are included herein by reference.
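The automatic volume reduction during playback, with a user's volume adjustment overriding the ducked level, can be sketched as simple arithmetic; the duck factor of 0.3 is an assumed value, not one specified by the embodiment:

```python
def adjust_volumes(playback_active, user_adjustment=None,
                   conference_volume=1.0, playback_volume=1.0,
                   duck_factor=0.3):
    """While target content is playing, automatically reduce (duck) the
    live conference audio; an explicit user volume adjustment overrides
    the ducked level."""
    if playback_active:
        if user_adjustment is not None:
            conference_volume = user_adjustment
        else:
            conference_volume = conference_volume * duck_factor
    return conference_volume, playback_volume
```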
In some embodiments, the apparatus is further for at least one of: continuously recording the input audio of the first user in the playing process of the target multi-person audio and video content; and stopping recording the input audio of the first user in the playing process of the target multi-person audio and video content. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 2, and therefore are not described again, and are included herein by reference.
In some embodiments, the receiving the target multi-person audio and video content corresponding to the event tag information returned by the network device, and playing the target multi-person audio and video content includes: receiving target multi-person audio and video content corresponding to the event tag information returned by the network equipment and at least one user identification information of at least one user corresponding to the target multi-person audio and video content; playing the target multi-person audio and video content and presenting the at least one user identification information; and in response to the event viewing operation of the first user aiming at the target user identification information in the at least one user identification information, stopping playing the target multi-user audio and video content, and playing the single-user audio and video content corresponding to the user identified by the target user identification information. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 2, and therefore are not described again, and are included herein by reference.
In some embodiments, the playing the single audiovisual content corresponding to the user identified by the target user identification information includes: and acquiring single audio and video content corresponding to the user identified by the target user identification information from the target multi-user audio and video content, and playing the single audio and video content. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 2, and therefore are not described again, and are included herein by reference.
In some embodiments, the playing the single audiovisual content corresponding to the user identified by the target user identification information includes: generating a single-person event viewing request and sending the single-person event viewing request to the network equipment, wherein the single-person event viewing request comprises the target user identification information; and receiving the single audio and video content which is returned by the network equipment and corresponds to the user identified by the target user identification information, and playing the single audio and video content. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 2, and therefore are not described again, and are included herein by reference.
In some embodiments, the apparatus is further configured to: after the audio and video conference is finished, responding to conference viewing operation of the first user aiming at the audio and video conference in a history conference list in which the first user participates, generating a conference viewing request about the audio and video conference and sending the conference viewing request to the network equipment; receiving and presenting event list information corresponding to the audio and video conference returned by the network equipment; responding to an event viewing operation of the first user for target event label information in the event list information, generating an event viewing request about the target event label information and sending the event viewing request to the network equipment; and receiving and presenting target multi-person audio and video content corresponding to the target event tag information obtained from the multi-person audio and video content returned by the network equipment. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 2, and therefore are not described again, and are included herein by reference.
In some embodiments, the event tag information further includes a first frame video frame corresponding to the target multi-person audio-video content. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 2, and therefore are not described again, and are included herein by reference.
In some embodiments, the event tag information further includes an event end time corresponding to the trigger event. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 2, and therefore are not described again, and are included herein by reference.
In some embodiments, the apparatus is further configured to: responding to a trigger event creating operation and a trigger event ending operation executed by the first user, acquiring event description information corresponding to the trigger event input by the first user, generating a trigger event creating request and a trigger event ending request, and sending the trigger event creating request and the trigger event ending request to the network equipment, wherein the trigger event creating request or the trigger event ending request comprises the event description information. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 2, and therefore are not described again, and are included herein by reference.
FIG. 7 illustrates an exemplary system that can be used to implement the various embodiments described in this application.
In some embodiments, as shown in FIG. 7, the system 300 can be implemented as any of the devices in the various embodiments described. In some embodiments, system 300 may include one or more computer-readable media (e.g., system memory or NVM/storage 320) having instructions and one or more processors (e.g., processor(s) 305) coupled with the one or more computer-readable media and configured to execute the instructions to implement modules to perform the actions described herein.
For one embodiment, system control module 310 may include any suitable interface controllers to provide any suitable interface to at least one of processor(s) 305 and/or any suitable device or component in communication with system control module 310.
The system control module 310 may include a memory controller module 330 to provide an interface to the system memory 315. Memory controller module 330 may be a hardware module, a software module, and/or a firmware module.
System memory 315 may be used, for example, to load and store data and/or instructions for system 300. For one embodiment, system memory 315 may include any suitable volatile memory, such as suitable DRAM. In some embodiments, the system memory 315 may include double data rate type 4 synchronous dynamic random access memory (DDR4 SDRAM).
For one embodiment, system control module 310 may include one or more input/output (I/O) controllers to provide an interface to NVM/storage 320 and communication interface(s) 325.
For example, NVM/storage 320 may be used to store data and/or instructions. NVM/storage 320 may include any suitable non-volatile memory (e.g., flash memory) and/or may include any suitable non-volatile storage device(s) (e.g., one or more Hard Disk Drives (HDDs), one or more Compact Disc (CD) drives, and/or one or more Digital Versatile Disc (DVD) drives).
NVM/storage 320 may include storage resources that are physically part of the device on which system 300 is installed or may be accessed by the device and not necessarily part of the device. For example, NVM/storage 320 may be accessible over a network via communication interface(s) 325.
Communication interface(s) 325 may provide an interface for system 300 to communicate over one or more networks and/or with any other suitable device. System 300 may wirelessly communicate with one or more components of a wireless network according to any of one or more wireless network standards and/or protocols.
For one embodiment, at least one of the processor(s) 305 may be packaged together with logic for one or more controller(s) (e.g., memory controller module 330) of the system control module 310. For one embodiment, at least one of the processor(s) 305 may be packaged together with logic for one or more controller(s) of the system control module 310 to form a System In Package (SiP). For one embodiment, at least one of the processor(s) 305 may be integrated on the same die with logic for one or more controller(s) of the system control module 310. For one embodiment, at least one of the processor(s) 305 may be integrated on the same die with logic for one or more controller(s) of the system control module 310 to form a system on a chip (SoC).
In various embodiments, system 300 may be, but is not limited to being: a server, a workstation, a desktop computing device, or a mobile computing device (e.g., a laptop computing device, a handheld computing device, a tablet, a netbook, etc.). In various embodiments, system 300 may have more or fewer components and/or different architectures. For example, in some embodiments, system 300 includes one or more cameras, a keyboard, a Liquid Crystal Display (LCD) screen (including a touch screen display), a non-volatile memory port, multiple antennas, a graphics chip, an Application Specific Integrated Circuit (ASIC), and speakers.
The present application also provides a computer readable storage medium having stored thereon computer code which, when executed, performs a method as in any one of the preceding embodiments.
The present application also provides a computer program product, which when executed by a computer device, performs the method of any of the preceding claims.
The present application further provides a computer device, comprising:
one or more processors;
a memory for storing one or more computer programs;
the one or more computer programs, when executed by the one or more processors, cause the one or more processors to implement the method of any preceding claim.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, as an Application Specific Integrated Circuit (ASIC), a general purpose computer or any other similar hardware device. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
Additionally, some portions of the present application may be applied as a computer program product, such as computer program instructions, which, when executed by a computer, may invoke or provide the method and/or solution according to the present application through the operation of the computer. Those skilled in the art will appreciate that the form in which the computer program instructions reside on a computer-readable medium includes, but is not limited to, source files, executable files, installation package files, and the like, and that the manner in which the computer program instructions are executed by a computer includes, but is not limited to: the computer directly executes the instruction, or the computer compiles the instruction and then executes the corresponding compiled program, or the computer reads and executes the instruction, or the computer reads and installs the instruction and then executes the corresponding installed program. Computer-readable media herein can be any available computer-readable storage media or communication media that can be accessed by a computer.
Communication media includes media by which communication signals, including, for example, computer readable instructions, data structures, program modules, or other data, are transmitted from one system to another. Communication media may include conductive transmission media such as cables and wires (e.g., fiber optics, coaxial, etc.) and wireless (non-conductive transmission) media capable of propagating energy waves, such as acoustic, electromagnetic, RF, microwave, and infrared. Computer readable instructions, data structures, program modules, or other data may be embodied in a modulated data signal, for example, in a wireless medium such as a carrier wave or similar mechanism such as is embodied as part of spread spectrum techniques. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. The modulation may use analog, digital or hybrid modulation techniques.
By way of example, and not limitation, computer-readable storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. For example, computer-readable storage media include, but are not limited to, volatile memory such as random access memory (RAM, DRAM, SRAM); and non-volatile memory such as flash memory, various read-only memories (ROM, PROM, EPROM, EEPROM), magnetic and ferromagnetic/ferroelectric memories (MRAM, FeRAM); and magnetic and optical storage devices (hard disk, tape, CD, DVD); or other media now known or later developed that can store computer-readable information/data for use by a computer system.
An embodiment according to the present application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or a solution according to the aforementioned embodiments of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not to denote any particular order.
Various aspects of various embodiments are defined in the claims. These and other aspects of the various embodiments are specified in the following numbered clauses:
1. A method for presenting event label information is applied to a network equipment terminal, wherein the method comprises the following steps:
recording multi-user audio and video content corresponding to a plurality of users during an audio and video conference participated in by the plurality of users;
responding to a trigger event in the audio and video conference, and creating and storing event tag information corresponding to the trigger event, wherein the event tag information comprises event trigger time corresponding to the trigger event and event description information corresponding to the trigger event;
sending the event tag information to the plurality of users so as to present the event tag information on a conference time axis corresponding to the audio and video conference of the plurality of users;
and receiving an event viewing request about the event tag information sent by a first user of the multiple users, obtaining target multi-user audio and video content corresponding to the event tag information from the multi-user audio and video content, and returning the target multi-user audio and video content to the first user for playing.
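By way of illustration only, the network-device-side steps of clause 1 can be sketched roughly as follows. Every name here (`EventTag`, `ConferenceServer`, the 5-second clip window) is a hypothetical choice for the sketch, not taken from the specification:

```python
from dataclasses import dataclass

# Hypothetical sketch of the event tag information: an event trigger time
# plus event description information, as described in clause 1.
@dataclass
class EventTag:
    trigger_time: float  # seconds from conference start
    description: str     # event description information

class ConferenceServer:
    """Minimal sketch: record multi-user content, create and save tags on
    trigger events, and serve the target content for a viewing request."""

    def __init__(self):
        self.recording = []  # recorded (timestamp, frame) pairs
        self.tags = []       # saved event tag information

    def record(self, timestamp, frame):
        self.recording.append((timestamp, frame))

    def create_tag(self, trigger_time, description):
        tag = EventTag(trigger_time, description)
        self.tags.append(tag)  # the "create and save" step; the tag would
        return tag             # then be sent to every user's timeline

    def serve_clip(self, tag, window=5.0):
        # Answer an event viewing request: return the recorded content
        # starting at the event trigger time (the window length is assumed).
        return [f for t, f in self.recording
                if tag.trigger_time <= t < tag.trigger_time + window]
```

A server recording ten frames and tagging an event at second 3 would return the frames for seconds 3 through 7 when the tag is viewed.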
2. The method according to clause 1, wherein the creating and saving event tag information corresponding to a trigger event in the audio-video conference in response to the trigger event in the audio-video conference includes:
receiving a trigger event creating request sent by the first user, wherein the trigger event creating request comprises event description information corresponding to a trigger event;
and creating and storing event label information corresponding to the trigger event, wherein the event label information comprises event trigger time corresponding to the trigger event and the event description information, and the event trigger time is request time corresponding to the trigger event creation request.
3. The method according to clause 1, wherein the creating and saving event tag information corresponding to a trigger event in the audio-video conference in response to the trigger event in the audio-video conference includes:
and in response to a trigger event which is automatically triggered according to one or more preset event trigger rules in the audio and video conference, creating and storing event tag information corresponding to the trigger event, wherein the event tag information comprises event trigger time corresponding to the trigger event and event description information corresponding to the trigger event.
4. The method of clause 3, wherein the triggering event comprises at least one of:
a conference start event or a conference end event;
a participant change event;
a participant speaking event;
an event of a participant turning audio or video on or off;
an event of a participant starting or stopping sharing screen content;
a network change event;
a predetermined-time trigger event.
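As an illustrative sketch only (all names hypothetical), the preset event trigger rules listed above can be modeled as predicates evaluated against conference state changes, each yielding the event description information of clause 5 when it matches:

```python
# Hypothetical sketch: each preset rule is a (predicate, description) pair.
# A state-change record that satisfies a predicate triggers an event whose
# description comes from the matching rule.
TRIGGER_RULES = [
    (lambda ev: ev["type"] == "participant_joined",
     "participant change event"),
    (lambda ev: ev["type"] == "screen_share" and ev["on"],
     "participant started sharing screen content"),
    (lambda ev: ev["type"] == "network_change",
     "network change event"),
]

def match_trigger(ev):
    """Return the description of the first matching rule, or None."""
    for predicate, description in TRIGGER_RULES:
        if predicate(ev):
            return description
    return None
```

A state change that matches no rule simply creates no tag, which mirrors the "automatically triggered according to one or more preset event trigger rules" wording of clause 3.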
5. The method of clause 3, wherein the method further comprises:
and determining the event description information according to a target event trigger rule corresponding to the trigger event.
6. The method of clause 3, wherein the method further comprises:
acquiring target multi-person audio and video content corresponding to the event tag information from the multi-person audio and video content;
and carrying out voice recognition and/or image recognition on the target multi-person audio and video content, and determining the event description information.
7. The method according to clause 1, wherein the returning of the target multi-person audio-video content to the first user for playing comprises:
and returning the target multi-user audio and video content to the first user for playing, and sending at least one user identification information of at least one user corresponding to the target multi-user audio and video content to the first user for presentation.
8. The method of clause 7, wherein the method further comprises:
and performing voice recognition on the target multi-person audio and video content, and determining at least one user corresponding to the target multi-person audio and video content.
9. The method according to clause 7, wherein the method further comprises the following steps, performed after receiving the event viewing request about the event tag information sent by the first user of the multiple users, obtaining the target multi-person audio-video content corresponding to the event tag information from the multi-person audio-video content, and returning the target multi-person audio-video content to the first user for playing:
receiving a single-person event viewing request sent by the first user, wherein the single-person event viewing request comprises target user identification information;
and acquiring single-person audio and video content corresponding to the user identified by the target user identification information from the target multi-user audio and video content, and returning the single-person audio and video content to the first user for playing.
10. The method of clause 1, wherein the method further comprises:
after the audio and video conference is finished, storing the multi-person audio and video content, and generating and storing event list information corresponding to the audio and video conference according to a plurality of event tag information corresponding to the audio and video conference;
responding to a conference viewing request which is sent by the first user and relates to the audio and video conference in a historical conference list in which the first user participates, and returning the event list information to the first user for presentation;
and responding to an event viewing request which is sent by the first user and is about the target event label information in the event list information, obtaining target multi-person audio and video content corresponding to the target event label information from the multi-person audio and video content, and returning the target multi-person audio and video content to the first user for playing.
11. The method of clause 1, wherein the event tag information further includes a first video frame corresponding to the target multi-person audio-video content.
12. The method of clause 1, wherein the event tag information further includes an event end time corresponding to the triggering event;
the obtaining of the target multi-person audio and video content corresponding to the event tag information from the multi-person audio and video content includes:
and acquiring target multi-person audio and video content corresponding to the event tag information from the multi-person audio and video content according to the event triggering time and the event ending time.
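Purely as an illustration (the function name and the inclusive bounds are assumptions), the acquisition step of clause 12 amounts to slicing the recorded stream between the event trigger time and the event end time:

```python
def extract_segment(recording, trigger_time, end_time):
    """Sketch: the target multi-person content of a tag is the recorded
    (timestamp, frame) pairs whose timestamps lie between the event
    trigger time and the event end time (inclusive bounds assumed)."""
    return [(t, f) for t, f in recording if trigger_time <= t <= end_time]
```

With a recording of four timestamped frames, a tag spanning seconds 1 to 2 yields exactly the two frames inside that interval.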
13. The method according to clause 12, wherein the creating and saving event tag information corresponding to a trigger event in the audio-video conference in response to the trigger event in the audio-video conference includes:
receiving a trigger event creating request and a trigger event ending request sent by the first user, wherein the trigger event creating request or the trigger event ending request comprises event description information corresponding to the trigger event;
and creating and storing event tag information corresponding to the trigger event, wherein the event tag information comprises event trigger time corresponding to the trigger event, event description information corresponding to the trigger event and event end time corresponding to the trigger event, the event trigger time is first request time corresponding to the trigger event creation request, and the event end time is second request time corresponding to the trigger event ending request.
14. The method according to clause 12, wherein the creating and saving event tag information corresponding to a trigger event in the audio-video conference in response to the trigger event in the audio-video conference includes:
and in response to a trigger event which is automatically triggered according to one or more preset event trigger rules in the audio and video conference and an end trigger event corresponding to the trigger event, creating and storing event tag information corresponding to the trigger event, wherein the event tag information comprises event trigger time corresponding to the trigger event, event end time corresponding to the end trigger event and event description information corresponding to the trigger event.
15. A method for presenting event label information is applied to a first user equipment terminal, wherein the method comprises the following steps:
receiving event tag information sent by network equipment in a conference process of an audio and video conference participated by a plurality of users, and presenting the event tag information on a conference time axis corresponding to the audio and video conference, wherein the event tag information is created by the network equipment in response to a trigger event in the audio and video conference, and the event tag information comprises event trigger time corresponding to the trigger event and event description information corresponding to the trigger event;
in response to an event viewing operation performed by a first user for the event label information, generating an event viewing request about the event label information and sending the event viewing request to the network equipment;
receiving target multi-person audio and video contents corresponding to the event tag information returned by the network equipment, and playing the target multi-person audio and video contents, wherein the target multi-person audio and video contents are obtained from multi-person audio and video contents corresponding to the plurality of users recorded by the network equipment in the conference process.
16. The method of clause 15, wherein the method further comprises:
and responding to a trigger event creating operation executed by the first user, acquiring event description information corresponding to the trigger event input by the first user, generating a trigger event creating request and sending the trigger event creating request to the network equipment, wherein the trigger event creating request comprises the event description information.
17. The method according to clause 15, wherein a starting point of the conference time axis is the conference starting time of the audio/video conference or the time at which the first user last joined the audio/video conference, and an ending point of the conference time axis is the current time or a predetermined conference ending time corresponding to the audio/video conference.
18. The method of clause 15, wherein the method further comprises:
and automatically reducing the audio volume of the audio and video conference in the playing process of the target multi-person audio and video content.
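The automatic volume reduction of clause 18 can be sketched as a simple ducking rule; the function name and the attenuation factor below are illustrative assumptions, not specified in the text:

```python
def conference_volume(base_volume, clip_playing, duck_factor=0.2):
    """Sketch of clause 18: while the target multi-person content plays,
    the live conference audio is automatically reduced; it is restored
    when playback stops (duck_factor is an assumed attenuation)."""
    return base_volume * duck_factor if clip_playing else base_volume
```

The conference audio thus drops to a fraction of its level for the duration of playback and returns to full level afterwards, without any action by the first user.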
19. The method of clause 15, wherein the method further comprises:
and in the playing process of the target multi-person audio and video content, responding to the volume adjustment operation of the first user aiming at the target multi-person audio and video content or the audio and video conference, and adjusting the target multi-person audio and video content or the audio volume of the audio and video conference.
20. The method of clause 15, wherein the method further comprises any one of:
continuously recording the input audio of the first user in the playing process of the target multi-person audio and video content;
and stopping recording the input audio of the first user in the playing process of the target multi-person audio and video content.
21. The method according to clause 15, wherein the receiving of the target multi-person audio-video content corresponding to the event tag information returned by the network device and playing of the target multi-person audio-video content includes:
receiving target multi-person audio and video content corresponding to the event tag information returned by the network equipment and at least one user identification information of at least one user corresponding to the target multi-person audio and video content;
playing the target multi-person audio and video content and presenting the at least one user identification information;
and in response to the event viewing operation of the first user aiming at the target user identification information in the at least one user identification information, stopping playing the target multi-user audio and video content, and playing the single-user audio and video content corresponding to the user identified by the target user identification information.
22. The method of clause 21, wherein the playing the single audiovisual content corresponding to the user identified by the target user identification information comprises:
and acquiring single audio and video content corresponding to the user identified by the target user identification information from the target multi-user audio and video content, and playing the single audio and video content.
23. The method of clause 21, wherein the playing the single audiovisual content corresponding to the user identified by the target user identification information comprises:
generating a single-person event viewing request and sending the single-person event viewing request to the network equipment, wherein the single-person event viewing request comprises the target user identification information;
and receiving the single audio and video content which is returned by the network equipment and corresponds to the user identified by the target user identification information, and playing the single audio and video content.
24. The method of clause 15, wherein the method further comprises:
after the audio and video conference is finished, responding to conference viewing operation of the first user aiming at the audio and video conference in a history conference list in which the first user participates, generating a conference viewing request about the audio and video conference and sending the conference viewing request to the network equipment;
receiving and presenting event list information corresponding to the audio and video conference returned by the network equipment;
responding to an event viewing operation of the first user for target event label information in the event list information, generating an event viewing request about the target event label information and sending the event viewing request to the network equipment;
and receiving and presenting target multi-person audio and video content corresponding to the target event tag information from the multi-person audio and video content returned by the network equipment according to the target event trigger time in the target event tag information.
25. The method according to clause 15, wherein the event tag information further comprises a first video frame corresponding to the target multi-person audio-video content.
26. The method of clause 15, wherein the event tag information further includes an event end time corresponding to the triggering event.
27. The method of clause 26, wherein the method further comprises:
responding to a trigger event creating operation and a trigger event ending operation executed by the first user, acquiring event description information corresponding to the trigger event input by the first user, generating a trigger event creating request and a trigger event ending request, and sending the trigger event creating request and the trigger event ending request to the network equipment, wherein the trigger event creating request or the trigger event ending request comprises the event description information.
28. A method of presenting event tag information, wherein the method comprises:
the method comprises the steps that network equipment records multi-user audio and video contents corresponding to a plurality of users in the process of an audio and video conference participated by the plurality of users;
the network equipment responds to a trigger event in the audio and video conference, and creates and stores event tag information corresponding to the trigger event, wherein the event tag information comprises event trigger time corresponding to the trigger event and event description information corresponding to the trigger event;
the network equipment sends the event label information to first user equipment;
the first user equipment receives the event label information sent by the network equipment and presents the event label information on a conference time axis corresponding to the audio and video conference;
the first user equipment generates an event viewing request related to the event label information and sends the event viewing request to the network equipment in response to an event viewing operation executed by the first user on the event label information;
the network equipment receives the event viewing request sent by the first user equipment, obtains target multi-person audio and video content corresponding to the event tag information from the multi-person audio and video content, and returns the target multi-person audio and video content to the first user;
and the first user equipment receives the target multi-person audio and video content returned by the network equipment and plays the target multi-person audio and video content.
29. An apparatus for presenting event tag information, the apparatus comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the method of any of clauses 1 to 27.
30. A computer-readable medium storing instructions that, when executed by a computer, cause the computer to perform the operations of the method of any of clauses 1 to 27.
31. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method according to any of clauses 1 to 27 when executed by a processor.

Claims (29)

1. A method for presenting event label information is applied to a network equipment terminal, wherein the method comprises the following steps:
recording multi-user audio and video content corresponding to a plurality of users during an audio and video conference participated in by the plurality of users;
responding to a trigger event in the audio and video conference, and creating and storing event tag information corresponding to the trigger event, wherein the event tag information comprises event trigger time corresponding to the trigger event and event description information corresponding to the trigger event;
sending the event tag information to the plurality of users so as to present the event tag information on a conference time axis corresponding to the audio and video conference of the plurality of users;
receiving an event viewing request about the event tag information sent by a first user of the multiple users, obtaining target multi-user audio and video content corresponding to the event tag information from the multi-user audio and video content, and returning the target multi-user audio and video content to the first user for playing;
wherein, the creating and storing event tag information corresponding to the trigger event in response to the trigger event in the audio and video conference includes:
responding to a trigger event which is automatically triggered according to one or more preset event trigger rules in the audio and video conference, and creating and storing event tag information corresponding to the trigger event;
wherein the method further comprises:
acquiring target multi-person audio and video content corresponding to the event tag information from the multi-person audio and video content;
and performing voice recognition and/or image recognition on the target multi-person audio and video content, and determining the event description information, wherein the event description information comprises at least one of an event title, an event key discussion matter, event-related personnel, and event-related service information.
2. The method of claim 1, wherein the creating and saving event tag information corresponding to a trigger event in the audio-video conference in response to the trigger event comprises:
receiving a trigger event creating request sent by the first user, wherein the trigger event creating request comprises event description information corresponding to a trigger event;
and creating and storing event label information corresponding to the trigger event, wherein the event label information comprises event trigger time corresponding to the trigger event and the event description information, and the event trigger time is request time corresponding to the trigger event creation request.
3. The method of claim 1, wherein the creating and saving event tag information corresponding to a trigger event in the audio-video conference in response to the trigger event comprises:
and in response to a trigger event which is automatically triggered according to one or more preset event trigger rules in the audio and video conference, creating and storing event tag information corresponding to the trigger event, wherein the event tag information comprises event trigger time corresponding to the trigger event and event description information corresponding to the trigger event.
4. The method of claim 3, wherein the triggering event comprises at least one of:
a conference start event or a conference end event;
a participant change event;
a participant speaking event;
an event of a participant turning audio or video on or off;
an event of a participant starting or stopping sharing screen content;
a network change event;
a predetermined-time trigger event.
5. The method of claim 3, wherein the method further comprises:
and determining the event description information according to a target event trigger rule corresponding to the trigger event.
6. The method of claim 1, wherein said returning said target multi-person audio-video content to said first user for playing comprises:
and returning the target multi-user audio and video content to the first user for playing, and sending at least one user identification information of at least one user corresponding to the target multi-user audio and video content to the first user for presentation.
7. The method of claim 6, wherein the method further comprises:
and performing voice recognition on the target multi-person audio and video content, and determining at least one user corresponding to the target multi-person audio and video content.
8. The method of claim 6, wherein the method further comprises the following steps, executed after receiving an event viewing request about the event tag information sent by a first user of the multiple users, obtaining target multi-person audio-video content corresponding to the event tag information from the multi-person audio-video content, and returning the target multi-person audio-video content to the first user for playing:
receiving a single-person event viewing request sent by the first user, wherein the single-person event viewing request comprises target user identification information;
and acquiring single-person audio and video content corresponding to the user identified by the target user identification information from the target multi-user audio and video content, and returning the single-person audio and video content to the first user for playing.
9. The method of claim 1, wherein the method further comprises:
after the audio and video conference is finished, storing the multi-person audio and video content, and generating and storing event list information corresponding to the audio and video conference according to a plurality of event tag information corresponding to the audio and video conference;
responding to a conference viewing request which is sent by the first user and relates to the audio and video conference in a historical conference list in which the first user participates, and returning the event list information to the first user for presentation;
and responding to an event viewing request which is sent by the first user and is about the target event label information in the event list information, obtaining target multi-person audio and video content corresponding to the target event label information from the multi-person audio and video content, and returning the target multi-person audio and video content to the first user for playing.
10. The method of claim 1, wherein the event tag information further comprises a first video frame corresponding to the target multi-person audio-video content.
11. The method of claim 1, wherein the event tag information further includes an event end time corresponding to the trigger event;
the obtaining of the target multi-person audio and video content corresponding to the event tag information from the multi-person audio and video content includes:
and acquiring target multi-person audio and video content corresponding to the event tag information from the multi-person audio and video content according to the event triggering time and the event ending time.
12. The method of claim 11, wherein the creating and saving event tag information corresponding to a trigger event in the audio-video conference in response to the trigger event comprises:
receiving a trigger event creating request and a trigger event ending request sent by the first user, wherein the trigger event creating request or the trigger event ending request comprises event description information corresponding to the trigger event;
and creating and storing event tag information corresponding to the trigger event, wherein the event tag information comprises event trigger time corresponding to the trigger event, event description information corresponding to the trigger event and event end time corresponding to the trigger event, the event trigger time is first request time corresponding to the trigger event creation request, and the event end time is second request time corresponding to the trigger event ending request.
13. The method of claim 11, wherein the creating and saving event tag information corresponding to a trigger event in the audio-video conference in response to the trigger event comprises:
and in response to a trigger event which is automatically triggered according to one or more preset event trigger rules in the audio and video conference and an end trigger event corresponding to the trigger event, creating and storing event tag information corresponding to the trigger event, wherein the event tag information comprises event trigger time corresponding to the trigger event, event end time corresponding to the end trigger event and event description information corresponding to the trigger event.
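The rule-driven tagging of claims 12 and 13 can be pictured as a set of predicates evaluated against live conference state. A rough Python sketch; the `EventTag` shape, the `state` dict, and the screen-sharing rule are illustrative assumptions, not the patent's own data model:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Optional

@dataclass
class EventTag:
    trigger_time: datetime
    end_time: Optional[datetime]  # filled in when the end trigger event arrives
    description: str

# A trigger rule inspects the live conference state and returns an event
# description when it fires, or None otherwise.
TriggerRule = Callable[[dict], Optional[str]]

def screen_share_rule(state: dict) -> Optional[str]:
    """Example preset rule: fire when someone starts sharing a screen."""
    if state.get("screen_sharing"):
        return "Screen sharing started by {}".format(state["presenter"])
    return None

def evaluate_rules(state: dict, rules: List[TriggerRule], now: datetime) -> List[EventTag]:
    """Create an EventTag for each rule that fires on the current state;
    end_time stays None until the matching end trigger event is seen."""
    tags = []
    for rule in rules:
        desc = rule(state)
        if desc is not None:
            tags.append(EventTag(trigger_time=now, end_time=None, description=desc))
    return tags
```

A manually created tag (claim 12) would follow the same `EventTag` shape, with the trigger and end times taken from the two request times instead of rule evaluation.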
14. A method for presenting event tag information, applied to a first user equipment, wherein the method comprises:
receiving event tag information sent by network equipment during a conference process of an audio and video conference in which a plurality of users participate, and presenting the event tag information on a conference time axis corresponding to the audio and video conference, wherein the event tag information is created by the network equipment in response to a trigger event automatically triggered in the audio and video conference according to one or more preset event trigger rules, the event tag information comprises event trigger time corresponding to the trigger event and event description information corresponding to the trigger event, and the network equipment obtains target multi-person audio and video content corresponding to the event tag information from multi-person audio and video content, performs speech recognition and/or image recognition on the target multi-person audio and video content, and determines the event description information, wherein the event description information comprises at least one of an event title, event key discussion matters, event-related personnel and event-related service information;
in response to an event viewing operation performed by a first user on the event tag information, generating an event viewing request about the event tag information and sending the event viewing request to the network equipment;
receiving target multi-person audio and video content corresponding to the event tag information returned by the network equipment, and playing the target multi-person audio and video content, wherein the target multi-person audio and video content is obtained from multi-person audio and video content corresponding to the plurality of users recorded by the network equipment during the conference process.
15. The method of claim 14, wherein the method further comprises:
and responding to a trigger event creating operation executed by the first user, acquiring the event description information, input by the first user, corresponding to the trigger event, generating a trigger event creating request and sending the trigger event creating request to the network equipment, wherein the trigger event creating request comprises the event description information.
16. The method according to claim 14, wherein a starting point of the conference time axis is the conference starting time of the audio and video conference or the time when the first user last joined the audio and video conference, and an ending point of the conference time axis is the current time or a predetermined conference end time corresponding to the audio and video conference.
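Claim 16's two choices for each end of the timeline reduce to a pair of fallbacks. A trivial sketch (the parameter names are mine, not the claim's):

```python
from datetime import datetime
from typing import Optional, Tuple

def timeline_bounds(conference_start: datetime,
                    last_join: Optional[datetime],
                    now: datetime,
                    scheduled_end: Optional[datetime]) -> Tuple[datetime, datetime]:
    """Pick the timeline start (conference start, or the user's most recent
    join time if they joined after the start) and the timeline end (the
    current time, or the predetermined conference end time if one is set)."""
    start = last_join if last_join is not None else conference_start
    end = scheduled_end if scheduled_end is not None else now
    return start, end
```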
17. The method of claim 14, wherein the method further comprises:
and automatically reducing the audio volume of the audio and video conference while the target multi-person audio and video content is playing.
18. The method of claim 14, wherein the method further comprises:
and during the playing of the target multi-person audio and video content, in response to a volume adjustment operation of the first user for the target multi-person audio and video content or the audio and video conference, adjusting the audio volume of the target multi-person audio and video content or the audio and video conference.
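Claims 17 and 18 together describe ducking the live conference while a tag clip replays, plus independent user control of either volume. One way to sketch that state in Python; the 0.2 ducking factor and the volume scale are invented constants, not values from the patent:

```python
class ConferenceAudioMixer:
    """Tracks two volume levels: the live conference, and a tag clip being
    replayed on top of it. Starting a clip automatically ducks the
    conference; stopping it restores the previous level; the user may
    adjust either level at any time."""

    DUCK_LEVEL = 0.2  # assumed ducking factor

    def __init__(self):
        self.conference_volume = 1.0
        self.clip_volume = 1.0
        self._saved = None

    def start_clip(self):
        """Begin clip playback: remember and duck the conference volume."""
        self._saved = self.conference_volume
        self.conference_volume = min(self.conference_volume, self.DUCK_LEVEL)

    def stop_clip(self):
        """End clip playback: restore the conference volume."""
        if self._saved is not None:
            self.conference_volume = self._saved
            self._saved = None

    def set_volume(self, target: str, level: float):
        """User volume adjustment for either stream, clamped to [0, 1]."""
        level = max(0.0, min(1.0, level))
        if target == "conference":
            self.conference_volume = level
        elif target == "clip":
            self.clip_volume = level
```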
19. The method of claim 14, wherein the method further comprises any one of:
continuing to record the input audio of the first user during the playing of the target multi-person audio and video content;
and stopping recording the input audio of the first user during the playing of the target multi-person audio and video content.
20. The method of claim 14, wherein the receiving of the target multi-person audio-video content corresponding to the event tag information returned by the network device and playing the target multi-person audio-video content comprises:
receiving target multi-person audio and video content corresponding to the event tag information returned by the network equipment, and at least one piece of user identification information of at least one user corresponding to the target multi-person audio and video content;
playing the target multi-person audio and video content, and presenting the at least one piece of user identification information;
and in response to an event viewing operation of the first user for target user identification information in the at least one piece of user identification information, stopping playing the target multi-person audio and video content, and playing single-person audio and video content corresponding to the user identified by the target user identification information.
21. The method of claim 20, wherein the playing of the single-person audio and video content corresponding to the user identified by the target user identification information comprises:
and acquiring the single-person audio and video content corresponding to the user identified by the target user identification information from the target multi-person audio and video content, and playing the single-person audio and video content.
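Claim 21 extracts the single-person content client-side from the already-received multi-person content, which presupposes that the multi-person content carries per-user tracks. The patent does not define the container format; a hypothetical per-user track layout and lookup in Python:

```python
def extract_single_user_content(multi_person_content: dict, user_id: str) -> dict:
    """Assuming the recorded multi-person content keeps one audio track and
    one video track per participant, select the tracks for the user named
    by the target user identification information."""
    tracks = multi_person_content["tracks"]
    if user_id not in tracks:
        raise KeyError("no track recorded for user {!r}".format(user_id))
    return {
        "user_id": user_id,
        "audio": tracks[user_id]["audio"],
        "video": tracks[user_id]["video"],
    }
```

Claim 22 covers the alternative design: instead of a local lookup, the client sends a single-person event viewing request and the network equipment performs the equivalent extraction server-side.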
22. The method of claim 20, wherein the playing of the single-person audio and video content corresponding to the user identified by the target user identification information comprises:
generating a single-person event viewing request and sending the single-person event viewing request to the network equipment, wherein the single-person event viewing request comprises the target user identification information;
and receiving the single-person audio and video content, returned by the network equipment, corresponding to the user identified by the target user identification information, and playing the single-person audio and video content.
23. The method of claim 14, wherein the method further comprises:
after the audio and video conference ends, responding to a conference viewing operation of the first user for the audio and video conference in a history conference list of conferences in which the first user participated, generating a conference viewing request about the audio and video conference and sending the conference viewing request to the network equipment;
receiving and presenting event list information corresponding to the audio and video conference returned by the network equipment;
responding to an event viewing operation of the first user for target event tag information in the event list information, generating an event viewing request about the target event tag information and sending the event viewing request to the network equipment;
and receiving and presenting target multi-person audio and video content corresponding to the target event tag information, which the network equipment obtains from the multi-person audio and video content according to the target event trigger time in the target event tag information.
24. The method of claim 14, wherein the event tag information further comprises a first video frame of the target multi-person audio and video content.
25. The method of claim 14, wherein the event tag information further includes an event end time corresponding to the triggering event.
26. The method of claim 25, wherein the method further comprises:
responding to a trigger event creating operation and a trigger event ending operation executed by the first user, acquiring the event description information, input by the first user, corresponding to the trigger event, generating a trigger event creating request and a trigger event ending request, and sending the trigger event creating request and the trigger event ending request to the network equipment, wherein the trigger event creating request or the trigger event ending request comprises the event description information.
27. A method of presenting event tag information, wherein the method comprises:
network equipment records multi-person audio and video content corresponding to a plurality of users during an audio and video conference in which the plurality of users participate;
the network equipment, in response to a trigger event automatically triggered in the audio and video conference according to one or more preset event trigger rules, creates and stores event tag information corresponding to the trigger event, wherein the event tag information comprises event trigger time corresponding to the trigger event and event description information corresponding to the trigger event;
the network equipment sends the event tag information to first user equipment;
the first user equipment receives the event tag information sent by the network equipment and presents the event tag information on a conference time axis corresponding to the audio and video conference;
the first user equipment, in response to an event viewing operation performed by the first user on the event tag information, generates an event viewing request about the event tag information and sends the event viewing request to the network equipment;
the network equipment receives the event viewing request sent by the first user equipment, obtains target multi-person audio and video content corresponding to the event tag information from the multi-person audio and video content, and returns the target multi-person audio and video content to the first user;
the first user equipment receives the target multi-person audio and video content returned by the network equipment and plays the target multi-person audio and video content;
wherein the method further comprises:
the network equipment obtains target multi-person audio and video content corresponding to the event tag information from the multi-person audio and video content;
and performs speech recognition and/or image recognition on the target multi-person audio and video content and determines the event description information, wherein the event description information comprises at least one of an event title, event key discussion matters, event-related personnel and event-related service information.
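The final step of claim 27 derives the enumerated description fields from recognition output. Assuming upstream ASR, face recognition, and keyword spotting produce the inputs shown, the merge step might look like this Python sketch (the field names and heuristics are illustrative, not the patent's):

```python
def build_event_description(transcript_segments, detected_faces, detected_keywords):
    """Combine speech-recognition and image-recognition outputs into the
    four description fields the claims enumerate: event title, key
    discussion matters, related personnel, and related service information.
    The recognizers themselves are assumed upstream components."""
    # Title heuristic: truncate the first transcribed utterance.
    title = transcript_segments[0]["text"][:50] if transcript_segments else ""
    # Key discussion matters: keep only keywords the spotter marked salient.
    key_items = [kw["text"] for kw in detected_keywords if kw.get("salient")]
    # Related personnel: union of identified speakers and recognized faces.
    people = sorted({seg["speaker"] for seg in transcript_segments} | set(detected_faces))
    return {
        "event_title": title,
        "key_discussion_items": key_items,
        "related_personnel": people,
        "related_business_info": {},  # would be filled from business-system lookups
    }
```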
28. An apparatus for presenting event tag information, the apparatus comprising:
a processor; and
a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the method of any one of claims 1 to 26.
29. A computer-readable medium storing instructions that, when executed by a computer, cause the computer to perform operations of any of the methods of claims 1-26.
CN202110584469.4A 2021-02-02 2021-05-27 Method and equipment for presenting event label information Active CN113329237B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2021101429584 2021-02-02
CN202110142958 2021-02-02

Publications (2)

Publication Number Publication Date
CN113329237A CN113329237A (en) 2021-08-31
CN113329237B true CN113329237B (en) 2023-03-21

Family

ID=77421737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110584469.4A Active CN113329237B (en) 2021-02-02 2021-05-27 Method and equipment for presenting event label information

Country Status (1)

Country Link
CN (1) CN113329237B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116193179A (en) * 2021-11-26 2023-05-30 华为技术有限公司 Conference recording method, terminal equipment and conference recording system

Citations (2)

Publication number Priority date Publication date Assignee Title
CN103548339A * 2011-05-19 2014-01-29 Oracle International Corporation Temporally-correlated activity streams for conferences
CN103561229A (en) * 2013-10-21 2014-02-05 华为技术有限公司 Conference tag generation and application method, device and system

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US9021118B2 (en) * 2010-07-30 2015-04-28 Avaya Inc. System and method for displaying a tag history of a media event
US10193940B2 (en) * 2017-02-07 2019-01-29 Microsoft Technology Licensing, Llc Adding recorded content to an interactive timeline of a teleconference session
US10171256B2 (en) * 2017-02-07 2019-01-01 Microsoft Technology Licensing, Llc Interactive timeline for a teleconference session

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN103548339A * 2011-05-19 2014-01-29 Oracle International Corporation Temporally-correlated activity streams for conferences
CN103561229A (en) * 2013-10-21 2014-02-05 华为技术有限公司 Conference tag generation and application method, device and system

Also Published As

Publication number Publication date
CN113329237A (en) 2021-08-31

Similar Documents

Publication Publication Date Title
KR20080023318A (en) Aspects of media content rendering
CN112822431B (en) Method and equipment for private audio and video call
US10319411B2 (en) Device and method for playing an interactive audiovisual movie
CN110795004B (en) Social method and device
WO2022037011A1 (en) Method and device for providing video information
CN108573393A (en) Comment information processing method, device, server and storage medium
US20100100820A1 (en) User specific music in virtual worlds
US20230298628A1 (en) Video editing method and apparatus, computer device, and storage medium
CN112685121A (en) Method and equipment for presenting session entrance
CN112214678A (en) Method and device for recommending short video information
CN113329237B (en) Method and equipment for presenting event label information
CN113490063A (en) Method, device, medium and program product for live broadcast interaction
US20230283813A1 (en) Centralized streaming video composition
CN112261337A (en) Method and equipment for playing voice information in multi-person voice
CN109547830B (en) Method and device for synchronous playing of multiple virtual reality devices
US10328336B1 (en) Concurrent game functionality and video content
CN112788004B (en) Method, device and computer readable medium for executing instructions by virtual conference robot
CN109660940B (en) Method and equipment for generating information
CN112887786B (en) Video playing method and device and computer readable medium
CN112787831A (en) Method and equipment for splitting conference group
CN112261236A (en) Method and equipment for mute processing in multi-person voice
CN111078654A (en) Method and equipment for sharing information
US20230128808A1 (en) Connected cloud applications
US11893672B2 (en) Context real avatar audience creation during live video sharing
CN114168878B (en) Dynamic effect playing method, device, equipment, storage medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant