CN107317817B - Method for generating index file, method for identifying speaking state of user and terminal


Info

Publication number
CN107317817B
CN107317817B
Authority
CN
China
Prior art keywords: user, audio, speaking, index file, uniform resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710542650.2A
Other languages
Chinese (zh)
Other versions
CN107317817A (en)
Inventor
刘滔
陈群
张丛武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huaduo Network Technology Co Ltd
Original Assignee
Guangzhou Huaduo Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huaduo Network Technology Co Ltd
Priority to CN201710542650.2A
Publication of CN107317817A
Application granted
Publication of CN107317817B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/75 Media network packet handling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/41 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/65 Network streaming protocols, e.g. real-time transport protocol [RTP] or real-time control protocol [RTCP]

Abstract

The invention provides a method for a server to generate an index file based on the HLS protocol, comprising: generating a plurality of audio/video files of preset playing duration while the server mixes audio; writing the uniform resource locators corresponding to the audio/video files into an index file; and writing the unique user identifiers of the speaking users in the audio/video files into the index file, thereby establishing a mapping relationship between the uniform resource locators and the unique user identifiers. Through this mapping relationship, after acquiring the index file a terminal can indicate who is speaking while playing the audio/video files, which effectively improves the user experience. Correspondingly, a method for a terminal to identify the speaking state of a user according to the index file, and a corresponding terminal, are also provided.

Description

Method for generating index file, method for identifying speaking state of user and terminal
Technical Field
The invention relates to the technical field of the internet, and in particular to a method for a server to generate an index file based on the HLS protocol, a method for a terminal to identify the speaking state of a user according to the index file, and a corresponding terminal.
Background
The HLS (HTTP Live Streaming) protocol is an HTTP-based streaming media transmission protocol developed by Apple Inc. It supports both live broadcast and on-demand playback of streaming media, is mainly applied on the iOS system, and provides an audio/video live and on-demand solution for iOS devices (such as the iPhone and iPad). The HLS protocol is the basis of mobile H5 (HTML5) audio/video live broadcast technology. However, when a mobile terminal plays audio/video through an H5 page using the HLS protocol, the speaking users in the played audio/video cannot be identified or distinguished (for example, during a live broadcast other users cannot tell from the page which user name the current speaker corresponds to), so the user experience is poor.
Disclosure of Invention
The aim of the present invention is to solve at least one of the above technical drawbacks, in particular the inability to identify and distinguish the speaking users in the played audio/video.
The invention provides a method for a server to generate an index file based on the HLS protocol, comprising the following steps:
generating a plurality of audio/video files of preset playing duration while the server mixes audio;
writing the uniform resource locators corresponding to the audio/video files into an index file;
and writing the unique user identifiers of the speaking users in the audio/video files into the index file, and establishing a mapping relationship between the uniform resource locators and the unique user identifiers.
In one embodiment, the method further comprises the following steps:
providing the index file to a terminal and keeping it updated, so that the terminal performs the following operations:
acquiring the uniform resource locators corresponding to the audio/video files and the unique user identifiers, and, when playing the audio/video files, identifying in real time through a user interface the speaking state of the users corresponding to the unique user identifiers.
In one embodiment, the user unique identifier is written into the index file in the form of an annotation.
In one embodiment, the mapping relationship between the uniform resource locator and the user unique identifier includes: the annotation recording the unique user identifier is attached to its corresponding uniform resource locator according to a preset positional relationship.
In one embodiment, the preset positional relationship includes: the annotation is located on the line following its corresponding uniform resource locator.
According to the above method for a server to generate an index file based on the HLS protocol, a plurality of audio/video files of preset playing duration are generated while the server mixes audio; the uniform resource locators corresponding to the audio/video files are written into an index file; and the unique user identifiers of the speaking users in the audio/video files are written into the index file, establishing a mapping relationship between the uniform resource locators and the unique user identifiers. Through this mapping relationship, after acquiring the index file a terminal can indicate who is speaking while playing the audio/video files, which effectively improves the user experience.
The invention also provides a method for a terminal to identify the speaking state of a user according to an index file, comprising the following steps:
the terminal acquires an HLS-based index file from a server, wherein the index file records the uniform resource locators of a plurality of audio/video files and the unique user identifiers of the speaking users in the audio/video files, the playing duration of each audio/video file is a preset duration, and a mapping relationship is established between the uniform resource locators and the unique user identifiers;
and extracting the uniform resource locators and the unique user identifiers from the index file, playing the audio/video files located at the network locations corresponding to the uniform resource locators, and identifying in real time, through a user interface, the speaking state of the users corresponding to the unique user identifiers.
In one embodiment, the user unique identifier is recorded in the index file in the form of an annotation.
In one embodiment, the mapping relationship between the uniform resource locator and the user unique identifier includes: the annotation recording the unique user identifier is attached to its corresponding uniform resource locator according to a preset positional relationship.
In one embodiment, the terminal extracts the corresponding annotations one by one, at intervals of the preset duration, following the arrangement order of the uniform resource locators of the plurality of audio/video files.
The present invention also provides a terminal, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to execute the method of any of the above embodiments for a terminal to identify the speaking state of a user according to an index file.
According to the above method for a terminal to identify the speaking state of a user according to an index file, and the corresponding terminal, the terminal acquires an HLS-based index file from a server, wherein the index file records the uniform resource locators of a plurality of audio/video files and the unique user identifiers of the speaking users in those files, the playing duration of each audio/video file is a preset duration, and a mapping relationship is established between the uniform resource locators and the unique user identifiers; the terminal then extracts the uniform resource locators and the unique user identifiers from the index file, plays the audio/video files located at the network locations corresponding to the uniform resource locators, and identifies in real time, through a user interface (for example an H5-based page), the speaking state of the users corresponding to the unique user identifiers. By acquiring the index file from the server, the terminal can identify the speaking state of the corresponding users in real time through the user interface while the audio/video files play, thereby indicating who is speaking, which effectively improves the user experience.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart of a method for a server to generate an index file based on the HLS protocol according to one embodiment;
FIG. 2 is a flowchart of a method for a terminal to identify the speaking state of a user according to an index file.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
FIG. 1 is a flowchart of a method for a server to generate an index file based on the HLS protocol according to an embodiment.
The invention provides a method for a server to generate an index file based on the HLS protocol, comprising the following steps:
Step S110: generating a plurality of audio/video files of preset playing duration while the server mixes audio. Mixing is the process of merging audio information from multiple sources into one integrated stream of audio information; for example, in a live broadcast, the sound of multiple speaking users (each speaking user's UID corresponding to one source) is merged into one stream of sound information. Here, an audio/video file may be an audio file, such as a file in MP3 format, or a video file carrying audio information, such as a TS file.
HLS (HTTP Live Streaming) is an HTTP-based streaming solution developed by Apple for iOS mobile devices such as the iPhone, iPod, iTouch and iPad. In HLS, a Web server provides near-real-time audio/video streams to clients, but only the standard HTTP protocol is used, so on-demand and live broadcast can be provided directly over ordinary HTTP applications. Video-related applications in the App Store essentially all use this technology.
The basic principle of the technology is to cut a video file or video stream into many small segments (the audio/video files mentioned above, e.g. TS files) and to establish an index file (e.g. an M3U8 file). The supported video stream is encoded with H.264 and the audio stream with AAC.
The preset duration is set in advance on the server side, for example 2 seconds; that is, when mixing, the server splits the mixed audio/video information into a plurality of audio/video files, e.g. TS files, each with a playing duration of 2 seconds.
Step S120: writing the uniform resource locators corresponding to the audio/video files into an index file.
Each audio/video file generated by the server is stored at some network location, for example on the server itself, so each audio/video file has a corresponding uniform resource locator (URL) identifying its location. For example, the URLs of n audio/video files 0.ts, 1.ts, …, n.ts are written into the index file in the following order:
http://…/0.ts
http://…/1.ts
http://…/n.ts
When the URLs of the audio/video files are written into the index file, their order in the index file can follow the playing order of the audio/video files. For example, if the playing order of the audio/video files is 0.ts, 1.ts, …, n.ts, the URLs are arranged in the index file as follows:
http://…/0.ts
http://…/1.ts
http://…/n.ts
the index file, which is composed of data files, is an indexed sequential file, and may be an M3U8 file, for example. The M3U8 file refers to an M3U file in UTF-8 encoding format. The M3U file records an index plain text file, and when it is opened, the playing software does not play it, but finds the network address of the corresponding audio/video file according to its index to play online. In the following description, the index file is exemplified by an M3U8 file.
Step S130: writing the unique user identifiers of the speaking users in the audio/video files into the index file, and establishing a mapping relationship between the uniform resource locators and the unique user identifiers.
When mixing, the server can determine from the multi-source audio information the user unique identifier (UID) of the speaking user corresponding to each source, because the audio information of each source corresponds to exactly one speaking user, i.e. to one UID. The server can therefore determine, while mixing generates each audio/video file, the UIDs of the speaking users in that file (there may be one or more, or none). Thus in any given audio/video file there may be speaking users or nobody speaking at all: in some files only one user speaks, in some more than one user speaks, and in some no user speaks.
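As an illustration only, the following Python sketch shows one way a mixer could track which sources are speaking while producing one segment; the per-source PCM sample arrays, the peak-amplitude threshold and the naive additive mix are all assumptions made for illustration, not details specified by this description:
from typing import Dict, List, Set, Tuple

SILENCE_THRESHOLD = 500  # assumed peak amplitude below which a source counts as silent

def mix_segment(sources: Dict[str, List[int]]) -> Tuple[List[int], Set[str]]:
    """Mix one fixed-duration segment from several sources (UID -> PCM samples);
    return the mixed samples and the UIDs of the users who spoke in the segment."""
    length = max(len(s) for s in sources.values())
    mixed = [0] * length
    speaking: Set[str] = set()
    for uid, samples in sources.items():
        if samples and max(abs(v) for v in samples) > SILENCE_THRESHOLD:
            speaking.add(uid)  # this source contributed audible speech
        for i, v in enumerate(samples):
            mixed[i] += v  # naive additive mix; clipping is ignored in this sketch
    return mixed, speaking

# mix_segment({"UID1": [600, -700, 0], "UID2": [0, 0, 0]})[1] == {"UID1"}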
Therefore, when an audio/video file is generated and its URL has been determined, the UIDs of the speaking users in that file are written into the index file and a mapping relationship between the URL and the user unique identifiers is established, so that after acquiring the index file the terminal can indicate who is speaking while playing the audio/video file.
In this embodiment, the user unique identifier may be written into the index file in the form of an annotation. An M3U8 file is a text file consisting of individual lines separated by carriage return/line feed; each line is either a URL or a string beginning with the "#" character, and spaces may only appear between elements within a line. A URL denotes a media segment or a "variant playlist file" (with at most one level of nesting, i.e. one M3U8 file referencing another M3U8 file). Lines beginning with "#EXT" are tags, while other lines beginning with "#" are annotations. In this embodiment, therefore, a specific character string can be placed immediately after the "#" character to indicate that the annotation records the UIDs of the speaking users in an audio/video file. The specific character string can be chosen freely, as long as it can be distinguished from the existing tags. For example, it may be preset as HJUIDAUDIOEXT; then, if the speaking users in an audio/video file are UID1 and UID2, the annotation "#HJUIDAUDIOEXT UID1,UID2" corresponding to that file can be written into the index file.
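A minimal sketch of the server-side writing step, assuming the segment URL and the speaking UIDs have already been determined (the function shape, the in-memory list of lines and the 2-second default duration are illustrative assumptions, written in Python only for illustration):
from typing import List

def append_segment(index_lines: List[str], url: str, uids: List[str],
                   duration: float = 2.0) -> None:
    """Append one segment entry: the standard #EXTINF tag, the segment URL and,
    when somebody spoke in the segment, the custom annotation on the next line."""
    index_lines.append(f"#EXTINF:{duration:.1f},")
    index_lines.append(url)
    if uids:  # segments in which nobody speaks simply carry no annotation
        index_lines.append("#HJUIDAUDIOEXT " + ",".join(uids))

# append_segment(lines, "http://…/0.ts", ["UID1", "UID2"]) appends:
#   #EXTINF:2.0,
#   http://…/0.ts
#   #HJUIDAUDIOEXT UID1,UID2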
In the index file, the mapping relationship between URLs and UIDs identifies which UIDs correspond to each audio/video file, i.e. it lets the terminal know which UIDs correspond to the URL of each audio/video file. This mapping relationship can be embodied in various ways, illustrated below:
Example one:
The annotation recording the UIDs is attached to its corresponding URL according to a preset positional relationship, which may be: the annotation is located on the line after its corresponding URL. For example, if there are n audio/video files 0.ts, 1.ts, …, n.ts, whose URLs and corresponding UIDs are as follows:
http://…/0.ts;UID1,UID2;
http://…/1.ts;UID3;
http://…/n.ts;UID1,UID3;
then, in the index file, the preset positional relationship between each URL and the annotation recording the corresponding UIDs is as follows:
http://…/0.ts
#HJUIDAUDIOEXT UID1,UID2
http://…/1.ts
#HJUIDAUDIOEXT UID3
http://…/n.ts
#HJUIDAUDIOEXT UID1,UID3
Here, "#HJUIDAUDIOEXT UID1,UID2", "#HJUIDAUDIOEXT UID3", …, "#HJUIDAUDIOEXT UID1,UID3" are the annotations corresponding to the respective URLs. As can be seen, each URL and its annotation form a group of information (e.g. http://…/0.ts and #HJUIDAUDIOEXT UID1,UID2); each URL has its own corresponding group, and the groups of different URLs are distinguished from one another. Therefore the annotation need not be located exactly on the line after its corresponding URL: it may also be located M lines after it, or even on the line before it or M lines before it, as long as each URL and its corresponding annotation form one group of information and the groups of different URLs remain distinguishable.
Further, if an audio/video file contains more than one speaking user and the start and end times of the different users' speech do not coincide, then, in order to identify more precisely which users are speaking at each moment, the start time and end time of each speaking user's speech may also be written into the annotation. For example, if the UIDs corresponding to the URL http://…/0.ts are UID1 and UID2, the speech of the user with UID1 starts at Start1 and ends at End1, and the speech of the user with UID2 starts at Start2 and ends at End2, then the URL and the annotation recording the corresponding UIDs are arranged as follows:
http://…/0.ts
#HJUIDAUDIOEXT UID1,Start1,End1|UID2,Start2,End2
if the speaking user speaks intermittently, for example, UID1 speaks intermittently from Start1 to End1, then speaks again for some time from Start1 'to End 1', and these time points are all within a preset time period, the preset positional relationship between the URL and the comment recorded with the corresponding UID is as follows:
http://…/0.ts
#HJUIDAUDIOEXT UID1,Start1,End1|UID1,Start1',End1'|UID2,Start2,End2
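The annotation grammar illustrated above, with entries separated by "|" and fields separated by ",", can be read back as in the following sketch; treating an entry as timed only when it has exactly three fields is an assumption made here for illustration (a plain list such as "UID1,UID2" would otherwise be ambiguous):
from typing import List, Optional, Tuple

PREFIX = "#HJUIDAUDIOEXT"

def parse_annotation(line: str) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Parse an annotation into (UID, start, end) tuples; entries written
    without speaking times come back as (UID, None, None)."""
    body = line[len(PREFIX):].strip()
    result: List[Tuple[str, Optional[str], Optional[str]]] = []
    for entry in body.split("|"):
        fields = [f.strip() for f in entry.split(",") if f.strip()]
        if len(fields) == 3:  # the UID,Start,End form
            result.append((fields[0], fields[1], fields[2]))
        else:  # the plain UID list form, e.g. "UID1,UID2"
            result.extend((uid, None, None) for uid in fields)
    return result

# parse_annotation("#HJUIDAUDIOEXT UID1,Start1,End1|UID2,Start2,End2")
#   -> [("UID1", "Start1", "End1"), ("UID2", "Start2", "End2")]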
example two:
the corresponding URL can be written directly in the annotation.
Continuing with the above example, the UID corresponding to the URL of http:// …/0.ts is UID1 and UID2, and the corresponding comments may be "# HJUIDAUDIOEXT http:// …/0.tsUID1, UID 2".
Therefore, the mapping relationship may have various embodiments, which are not repeated herein.
In this embodiment, after step S130 the method may further comprise the step of providing the index file to a terminal and keeping it updated, so that the terminal performs the following operations:
acquiring the uniform resource locators corresponding to the audio/video files and the unique user identifiers, and displaying in real time, through a user interface, the user icons corresponding to the unique user identifiers when the audio/video files are played.
After generating the index file, the server keeps updating it as audio/video files continue to accumulate. After acquiring the index file from the server, the terminal uses the acquired URLs and corresponding UIDs to identify in real time, through a user interface (for example an H5-based page), the speaking state of the users corresponding to the UIDs while the audio/video files are played. For example, if the speaking users of a certain audio/video file are UID1 and UID2, the user interface can simultaneously identify the speaking states of both users in real time during playback: the user avatars of UID1 and UID2 can be shown or lit up in real time to indicate that they are speaking, or the speaking icons corresponding to UID1 and UID2 can be shown or lit up in real time, any form being acceptable as long as viewers know which users are speaking; this is not elaborated further here.
As a simple example of the method of the invention, if a user clicks a button icon such as "I want to speak" on the user interface, the server treats the user as speaking and identifies this on the user interface, regardless of whether the user actually makes a sound; when the user turns the "I want to speak" button off, the user is treated as having given up speaking. The method of the invention can be applied to multi-user voice interaction scenarios, such as voice interaction software or games. Take the discovery APP of YY (Guangzhou Huaduo Network Technology Co., Ltd.) as an example: in this APP the host and the seated guests communicate and interact through voice, and when the host or a seated guest chooses to speak, his or her speaking state can be identified in real time on the user interface of the client. By applying the method of the invention, an H5 game client can synchronously update the speaking-state icon on the seat of the corresponding host or guest while playing the voice, improving the application experience. Taking a multiplayer battle game as another example, if a team member chooses to speak, that member's speaking state can be identified on the game interface, improving the game experience.
According to the above method for a server to generate an index file based on the HLS protocol, a plurality of audio/video files of preset playing duration are generated while the server mixes audio; the uniform resource locators corresponding to the audio/video files are written into an index file; and the unique user identifiers of the speaking users in the audio/video files are written into the index file, establishing a mapping relationship between the uniform resource locators and the unique user identifiers. Through this mapping relationship, after acquiring the index file a terminal can indicate who is speaking while playing the audio/video files, which effectively improves the user experience.
FIG. 2 is a flowchart of a method for a terminal to identify the speaking state of a user according to an index file. Corresponding to the above method for a server to generate an index file based on the HLS protocol, a method for a terminal to identify the speaking state of a user according to the index file is also provided, comprising the following steps:
Step S210: the terminal acquires an HLS-based index file from a server, wherein the index file records the uniform resource locators of a plurality of audio/video files and the unique user identifiers of the speaking users in those files, the playing duration of each audio/video file is a preset duration, and a mapping relationship is established between the uniform resource locators and the unique user identifiers.
The audio/video files are generated while the server mixes audio; see the detailed description of step S110 in the above method for a server to generate an index file based on the HLS protocol.
Each audio/video file generated by the server is stored at some network location, for example on the server itself, so each audio/video file has a corresponding uniform resource locator (URL) identifying its location. For example, the index file contains the URLs of n audio/video files 0.ts, 1.ts, …, n.ts, arranged in the following order:
http://…/0.ts
http://…/1.ts
http://…/n.ts
The order of the URLs in the index file may follow the playing order of the audio/video files. For example, if the playing order of the audio/video files is 0.ts, 1.ts, …, n.ts, the URLs are arranged in the index file as follows:
http://…/0.ts
http://…/1.ts
http://…/n.ts
the index file, which is composed of data files, is an indexed sequential file, and may be an M3U8 file, for example. The M3U8 file refers to an M3U file in UTF-8 encoding format. The M3U file records an index plain text file, and when it is opened, the playing software does not play it, but finds the network address of the corresponding audio/video file according to its index to play online. In the following description, the index file is exemplified by an M3U8 file.
When the server mixes sound, the user unique identifier UID of the speaking user corresponding to the audio information of each source can be determined through the audio information of multiple sources, because the audio information of each source corresponds to a unique speaking user, namely the UID of the unique speaking user. Therefore, the server may determine UIDs (which may be 1 or more, and may also be 0) of speaking users corresponding to each of the audio-video files when mixing to generate each of the audio-video files. Thus, in each of the audio-video files, there may be a speaking user speaking or there may be no one speaking. For example, only one speaking user speaks in some audio/video files, more than one speaking user speaks in some audio/video files, and no user speaks in some audio/video files.
Therefore, when the server generates the audio and video file, after the corresponding URL is determined, the UID of the speaking user in the audio and video file is written into the index file, and the mapping relation between the URL and the unique user identification is established, so that the terminal can simultaneously prompt who the speaking user is when the audio and video file is played after acquiring the index file.
The user unique identifier is recorded in the index file in an annotated form, and the mapping relation between the uniform resource locator and the user unique identifier comprises the following steps: and the annotation recorded with the unique user identifier is attached to the corresponding uniform resource locator according to a preset position relationship. Specifically, refer to the description in step S130 in the method for generating the index file based on the HLS protocol by the server, which is not repeated herein.
Step S220: extracting the uniform resource locators and the unique user identifiers from the index file, playing the audio/video files located at the network locations corresponding to the uniform resource locators, and identifying in real time, through a user interface, the speaking state of the users corresponding to the unique user identifiers.
In some embodiments, the terminal extracts the corresponding annotations one by one, at intervals of the preset duration (for example 2 seconds), following the arrangement order of the uniform resource locators of the plurality of audio/video files. As described above, the URLs in the index file can be arranged according to the playing order of the audio/video files, and the annotation recording the UIDs is attached to its corresponding URL according to a preset positional relationship; for example, in the index file the URLs and the annotations recording the UIDs are arranged as follows:
http://…/0.ts
#HJUIDAUDIOEXT UID1,UID2
http://…/1.ts
#HJUIDAUDIOEXT UID3
http://…/n.ts
#HJUIDAUDIOEXT UID1,UID3
if the preset time is 2 seconds, when the terminal plays the audio and video files according to the URL arrangement sequence, the timer can be set to acquire the comments once every 2 seconds, so that the corresponding speaking user UID can be acquired in real time during playing, and the speaking state of the user corresponding to the UID is identified in real time through the user interface.
After acquiring the index file from the server, the terminal uses the acquired URLs and corresponding UIDs to identify in real time, through a user interface (for example an H5-based page), the speaking state of the users corresponding to the UIDs while the audio/video files are played. For example, if the speaking users of a certain audio/video file are UID1 and UID2, the user interface can simultaneously identify the speaking states of both users in real time during playback: the user avatars of UID1 and UID2 can be shown or lit up in real time to indicate that they are speaking, or the speaking icons corresponding to UID1 and UID2 can be shown or lit up in real time, any form being acceptable as long as viewers know which users are speaking; this is not elaborated further here.
For example, if a user clicks a button icon such as "I want to speak" on the user interface, the server treats the user as speaking and identifies this on the user interface, regardless of whether the user actually makes a sound; when the user turns the "I want to speak" button off, the user is treated as having given up speaking. The method of the invention can be applied to multi-user voice interaction scenarios, such as voice interaction software or games. Take the discovery APP of YY (Guangzhou Huaduo Network Technology Co., Ltd.) as an example: in this APP the host and the seated guests communicate and interact through voice, and when the host or a seated guest chooses to speak, his or her speaking state can be identified in real time on the user interface of the client. By applying the method of the invention, an H5 game client can synchronously update the speaking-state icon on the seat of the corresponding host or guest while playing the voice, improving the application experience. Taking a multiplayer battle game as another example, if a team member chooses to speak, that member's speaking state can be identified on the game interface, improving the game experience.
Corresponding to the above method for a terminal to identify the speaking state of a user according to an index file, a terminal is provided, comprising one or more processors, a memory, and one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to execute the method of any of the above embodiments for a terminal to identify the speaking state of a user according to an index file. The terminal may be a fixed terminal, such as a desktop computer, or a mobile terminal, such as a notebook computer, a tablet computer, a mobile phone or another smart terminal.
According to the above method and terminal, the terminal acquires an HLS-based index file from a server, wherein the index file records the uniform resource locators of a plurality of audio/video files and the unique user identifiers of the speaking users in those files, the playing duration of each audio/video file is a preset duration, and a mapping relationship is established between the uniform resource locators and the unique user identifiers; the terminal then extracts the uniform resource locators and the unique user identifiers from the index file, plays the audio/video files located at the network locations corresponding to the uniform resource locators, and identifies in real time, through a user interface (for example an H5-based page), the speaking state of the users corresponding to the unique user identifiers. By acquiring the index file from the server, the terminal can identify the speaking state of the corresponding users in real time through the user interface while the audio/video files play, thereby indicating who is speaking, which effectively improves the user experience.
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, they are not necessarily performed in that order; unless explicitly stated herein, their execution is not strictly limited to that order, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may comprise multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different moments, and not necessarily in sequence: they may be performed in turn, or alternately with other steps or with at least part of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the invention, and such improvements and refinements shall also fall within the protection scope of the invention.

Claims (7)

1. A method for a server to generate an index file based on the HLS protocol, characterized by comprising the following steps:
when the server mixes audio, generating a plurality of audio/video files of preset playing duration, establishing an index file, and determining, according to the sources of the audio information, the unique user identifier of the speaking user corresponding to each audio/video file;
writing the uniform resource locators corresponding to the audio/video files into the index file;
writing the unique user identifiers of the speaking users in the audio/video files into the index file in the form of annotations, and establishing a mapping relationship between the uniform resource locators and the unique user identifiers;
providing the index file to a terminal and keeping it updated, so that the terminal performs the following operations:
acquiring the uniform resource locators corresponding to the audio/video files and the unique user identifiers, and, when playing the audio/video files, identifying in real time through a user interface the speaking state of the users corresponding to the unique user identifiers.
2. The method for a server to generate an index file based on the HLS protocol according to claim 1, wherein the mapping relationship between the uniform resource locator and the user unique identifier comprises: the annotation recording the unique user identifier is attached to its corresponding uniform resource locator according to a preset positional relationship.
3. The method for a server to generate an index file based on the HLS protocol according to claim 2, wherein the preset positional relationship comprises: the annotation is located on the line following its corresponding uniform resource locator.
4. A method for a terminal to identify the speaking state of a user according to an index file, characterized by comprising the following steps:
the terminal acquires an HLS-based index file from a server, wherein the index file records the uniform resource locators of audio/video files and, in the form of annotations, the unique user identifiers of the speaking users in the audio/video files, the playing duration of each audio/video file is a preset duration, and a mapping relationship is established between the uniform resource locators and the unique user identifiers;
and extracting the uniform resource locators and the unique user identifiers from the index file, playing the audio/video files located at the network locations corresponding to the uniform resource locators, and identifying in real time, through a user interface, the speaking state of the users corresponding to the unique user identifiers.
5. The method according to claim 4, wherein the mapping relationship between the uniform resource locator and the user unique identifier comprises: the annotation recording the unique user identifier is attached to its corresponding uniform resource locator according to a preset positional relationship.
6. The method according to claim 4, wherein the terminal extracts the corresponding annotations one by one, at intervals of the preset duration, following the arrangement order of the uniform resource locators of the plurality of audio/video files.
7. A terminal for identifying a speaking status of a user according to an index file, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to execute the method for a terminal to identify the speaking state of a user according to an index file according to any one of claims 4 to 6.
CN201710542650.2A 2017-07-05 2017-07-05 Method for generating index file, method for identifying speaking state of user and terminal Active CN107317817B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710542650.2A CN107317817B (en) 2017-07-05 2017-07-05 Method for generating index file, method for identifying speaking state of user and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710542650.2A CN107317817B (en) 2017-07-05 2017-07-05 Method for generating index file, method for identifying speaking state of user and terminal

Publications (2)

Publication Number Publication Date
CN107317817A CN107317817A (en) 2017-11-03
CN107317817B true CN107317817B (en) 2021-03-16

Family

ID=60180568

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710542650.2A Active CN107317817B (en) 2017-07-05 2017-07-05 Method for generating index file, method for identifying speaking state of user and terminal

Country Status (1)

Country Link
CN (1) CN107317817B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101715102A (en) * 2008-10-02 2010-05-26 宝利通公司 Displaying dynamic caller identity during point-to-point and multipoint audio/video conference
CN102598055A (en) * 2009-10-23 2012-07-18 微软公司 Automatic labeling of a video session
CN104320679A (en) * 2014-10-11 2015-01-28 中兴通讯股份有限公司 Method for obtaining user information based on HLS protocol and server
CN104902343A (en) * 2015-05-26 2015-09-09 北京微吼时代科技有限公司 Methods for transmitting and playing audio/video and message, server and terminal
CN106548793A (en) * 2015-09-16 2017-03-29 中兴通讯股份有限公司 Storage and the method and apparatus for playing audio file


Also Published As

Publication number Publication date
CN107317817A (en) 2017-11-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20171103

Assignee: GUANGZHOU CUBESILI INFORMATION TECHNOLOGY Co.,Ltd.

Assignor: GUANGZHOU HUADUO NETWORK TECHNOLOGY Co.,Ltd.

Contract record no.: X2021440000031

Denomination of invention: Method for generating index file, method for identifying user's speaking state and terminal

License type: Common License

Record date: 20210125

GR01 Patent grant