CN110572716A - Multimedia data playing method, device and storage medium - Google Patents


Info

Publication number: CN110572716A
Application number: CN201910927850.9A
Authority: CN (China)
Prior art keywords: comment information, time point, multimedia data, playing, information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.): Granted
Other languages: Chinese (zh)
Other versions: CN110572716B (en)
Inventors: 平思嘉, 沈艳慧, 张仁寿, 贝俊达, 梁志杰, 徐子闻, 林婧, 周文翊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.): Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910927850.9A
Publication of CN110572716A
Application granted
Publication of CN110572716B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312 Generation of visual interfaces involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N21/4318 Generation of visual interfaces by altering the content in the rendering process, e.g. blanking, blurring or masking an image region
    • H04N21/435 Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/475 End-user interface for inputting end-user data, e.g. personal identification number [PIN], preference data
    • H04N21/4756 End-user interface for inputting end-user data for rating content, e.g. scoring a recommended movie
    • H04N21/488 Data services, e.g. news ticker
    • H04N21/4884 Data services for displaying subtitles
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8455 Structuring of content involving pointers to the content, e.g. pointers to the I-frames of the video stream

Abstract

The application discloses a multimedia data playing method, apparatus, and storage medium, belonging to the field of computer technology. The method comprises the following steps: playing multimedia data based on a playing interface and displaying comment information of the multimedia data; when a jump operation on the comment information is detected at a first playing time point of the multimedia data, obtaining a second playing time point associated with the comment information; and jumping the multimedia data from the first playing time point to the second playing time point. Merely by triggering the jump operation on comment information in the multimedia data, playback jumps to the playing time point associated with that comment information and resumes from there; the user does not need to manually drag the progress bar to the target playing time point, which avoids mislocated playing time points and improves positioning accuracy.

Description

Multimedia data playing method, device and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a multimedia data playing method, apparatus, and storage medium.
Background
With the rapid development of computer technology and users' growing entertainment needs, various types of multimedia data, such as video data and audio data, are widely distributed on the Internet, and playing multimedia data has become a commonly used function.
In the related art, a terminal plays multimedia data based on a playing interface. The playing interface includes a time axis, on which a progress bar represents the current playing progress. When the user needs to jump the multimedia data to a target playing time point, the user manually drags the progress bar to the position of the target playing time point.
However, manual operation introduces error: the progress bar may be dragged to a position near the target playing time point but not exactly onto it, so the located playing time point is incorrect and positioning accuracy is low.
Disclosure of Invention
The embodiments of the present application provide a multimedia data playing method, apparatus, and storage medium, which can avoid mislocated playing time points and improve positioning accuracy. The technical solutions are as follows:
In one aspect, a multimedia data playing method is provided, and the method includes:
Playing multimedia data based on a playing interface, and displaying comment information of the multimedia data;
When the skip operation of the comment information is detected at a first playing time point of the multimedia data, a second playing time point associated with the comment information is obtained;
Skipping the multimedia data from the first play time point to the second play time point.
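The patent specifies no implementation for the three steps above; as a minimal sketch (all names hypothetical), the association between a piece of comment information and its second playing time point can be modeled as a lookup table consulted by a jump handler:

```python
# Hypothetical association table: comment id -> second playing time point (seconds).
comment_time_points = {"comment-001": 125.0}

class Player:
    def __init__(self):
        self.position = 0.0  # current (first) playing time point, in seconds

    def on_comment_jump(self, comment_id):
        # Look up the second playing time point associated with the comment.
        target = comment_time_points.get(comment_id)
        if target is None:
            return self.position  # no association; keep playing as-is
        # Jump from the first playing time point to the second.
        self.position = target
        return self.position

player = Player()
player.position = 42.0                        # first playing time point
print(player.on_comment_jump("comment-001"))  # jumps to 125.0
```

A real player would additionally seek the media decoder to the returned position; this sketch only models the time-point bookkeeping the claims describe.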
In another aspect, a multimedia data playing apparatus is provided, the apparatus comprising:
The display module is used for playing multimedia data based on a playing interface and displaying comment information of the multimedia data;
The time point acquisition module is used for acquiring a second playing time point associated with the comment information when a jump operation on the comment information is detected at a first playing time point of the multimedia data;
The jumping module is used for jumping the multimedia data from the first playing time point to the second playing time point.
Optionally, the display module is configured to display a return option based on the playing interface;
The jumping module is configured to jump the multimedia data from the second playing time point back to the first playing time point when a trigger operation on the return option is detected.
Optionally, the time point acquisition module includes:
The first query unit, configured to query the association relationship between the comment information and playing time points to obtain the playing time point associated with the comment information as the second playing time point; or,
The multimedia data is video data, and the second query unit is configured to query the association relationship between the comment information and video frames to obtain the video frame associated with the comment information and determine the playing time point corresponding to that video frame as the second playing time point.
Optionally, the apparatus further comprises:
The first establishing module is used for establishing an association relationship between the comment information and the playing time point indicated by a vocabulary item when the comment information includes a vocabulary item used to indicate a playing time point.
Optionally, the first establishing module is further configured to:
establish an association relationship between the comment information and the playing time point indicated by the vocabulary item when the comment information includes a vocabulary item used to indicate a playing time point and that playing time point lies between the starting time point and the ending time point of the multimedia data.
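The patent does not give a concrete matching rule for time-indicating vocabulary items; one hedged Python sketch (function name, regex, and units are all assumptions) treats a pattern like "3:25" in a comment as a candidate playing time point and establishes the association only when it lies within the media's duration:

```python
import re

def extract_time_point(comment, duration_seconds):
    """Find a vocabulary item such as '3:25' in a comment and, if it lies
    between the starting and ending time points of the media, return it
    in seconds; otherwise return None. Illustrative only."""
    match = re.search(r"(\d{1,2}):([0-5]\d)", comment)
    if not match:
        return None
    seconds = int(match.group(1)) * 60 + int(match.group(2))
    # Only establish the association when the time point is inside the media.
    if 0 <= seconds <= duration_seconds:
        return seconds
    return None

print(extract_time_point("The scene at 3:25 is great", 600))  # 205
print(extract_time_point("Jump to 59:59 please", 600))        # None (past the end)
```

A production system would also need to handle phrasings like "at the five-minute mark"; the regex here covers only the mm:ss form.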
Optionally, the apparatus further comprises:
The set acquisition module is used for acquiring an information set of the multimedia data, wherein the information set comprises at least one piece of text information and a playing time point corresponding to each piece of text information, and each piece of text information is matched with the voice information of the corresponding playing time point in the multimedia data;
The first establishing module is further configured to establish an association relationship between the comment information and a playing time point corresponding to any piece of text information when the similarity between the comment information and any piece of text information in the information set is greater than a preset similarity.
Optionally, the apparatus further comprises:
The word segmentation module is used for segmenting words of the comment information to obtain at least one word of the comment information;
The word segmentation module is further configured to segment each piece of text information in the information set to obtain at least one word of the text information;
The first similarity obtaining module is used for obtaining the similarity between the comment information and the text information according to the at least one vocabulary item of the comment information and the at least one vocabulary item of the text information.
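The patent leaves the similarity measure over the two vocabulary sets unspecified; a Jaccard word-overlap score is one simple sketch (the function name, the sample information set, and the preset similarity value are assumptions):

```python
def word_overlap_similarity(comment_words, text_words):
    """Jaccard overlap between the vocabulary sets of a comment and a piece
    of text information; one way to realize the 'similarity greater than a
    preset similarity' test described above."""
    a, b = set(comment_words), set(text_words)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# (segmented text information, corresponding playing time point) pairs
info_set = [
    (["who", "are", "you"], 12.5),
    (["see", "you", "later"], 80.0),
]
comment = ["who", "are", "you", "really"]
PRESET_SIMILARITY = 0.5
for words, time_point in info_set:
    if word_overlap_similarity(comment, words) > PRESET_SIMILARITY:
        print(time_point)  # associate the comment with this playing time point
```

For Chinese comments, the word segmentation step itself would need a segmenter; this sketch assumes segmentation has already been done.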
Optionally, the multimedia data is video data, and the set acquisition module is further configured to extract the subtitle information included in each video frame and the playing time point corresponding to each video frame from the video data; or,
The multimedia data is audio data, and the set acquisition module is further configured to extract text information included in each frame of audio data and a playing time point corresponding to each frame of audio data from the audio data.
Optionally, the apparatus further comprises:
The video frame acquisition module is used for acquiring a plurality of video frames included in the video data;
The second establishing module is used for establishing an association relationship between the comment information and any one of the plurality of video frames when the similarity between the comment information and that video frame is greater than the preset similarity.
Optionally, the apparatus further comprises:
The semantic feature determining module is used for segmenting the comment information to obtain at least one vocabulary of the comment information, and determining a first semantic feature of the comment information according to the at least one vocabulary;
The semantic feature determining module is further configured to obtain, for each of the plurality of video frames, a second semantic feature of the video frame based on a feature extraction model;
The second similarity obtaining module is used for obtaining the similarity between the comment information and the video frame according to the first semantic feature and the second semantic feature.
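The patent does not name the comparison between the first and second semantic features; cosine similarity over feature vectors is a common choice and serves here as a hedged sketch (the vectors and function name are illustrative, and a real system would obtain them from the segmented comment and the feature extraction model):

```python
import math

def cosine_similarity(first_feature, second_feature):
    """Cosine similarity between the first (comment) and second (video
    frame) semantic feature vectors."""
    dot = sum(x * y for x, y in zip(first_feature, second_feature))
    norm = (math.sqrt(sum(x * x for x in first_feature))
            * math.sqrt(sum(y * y for y in second_feature)))
    return dot / norm if norm else 0.0

comment_feature = [0.2, 0.9, 0.1]  # assumed output for the comment's words
frame_feature = [0.1, 0.8, 0.0]    # assumed output of the feature extraction model
print(round(cosine_similarity(comment_feature, frame_feature), 3))
```

A high score (near 1.0) would then be compared against the preset similarity to decide whether to establish the comment-to-frame association.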
In another aspect, a multimedia data playing apparatus is provided, the apparatus comprising a processor and a memory, the memory storing at least one program code, the at least one program code being loaded and executed by the processor to implement the multimedia data playing method described above.
In another aspect, a computer-readable storage medium is provided, in which at least one program code is stored, the at least one program code being loaded and executed by a processor to implement the multimedia data playing method described above.
The multimedia data playing method, apparatus, and storage medium provided by the embodiments of the present application play multimedia data based on a playing interface and display comment information of the multimedia data; when a jump operation on the comment information is detected at a first playing time point of the multimedia data, a second playing time point associated with the comment information is acquired, and the multimedia data jumps from the first playing time point to the second playing time point. This provides a method of locating a playing time point: merely by triggering the jump operation on comment information, playback jumps to the playing time point associated with that comment information and resumes from there, without the user manually dragging the progress bar to the target playing time point, thereby avoiding mislocated playing time points and improving positioning accuracy.
Moreover, by establishing an association relationship between comment information and playing time points of the multimedia data, the multimedia data corresponding to any playing time point associated with a comment can be determined; that data is the key content attended to by the user who posted the comment. Alternatively, by establishing an association relationship between comment information and video frames of the multimedia data, the video frame associated with a comment can be determined; that frame is likewise key content for the commenting user. Multiple users can evaluate the multimedia data from multiple angles and post multiple pieces of comment information, from which more key content in the multimedia data can be identified, so a user playing the multimedia data can jump to the corresponding key content through any comment, making the playback process more intelligent.
Furthermore, after the multimedia data jumps from the first playing time point to the second playing time point, a return option can be displayed in the playing interface. When a trigger operation on the return option is detected, the multimedia data jumps from the second playing time point back to the first playing time point. If the user is not interested in the content at the second playing time point, triggering the return option restores playback at the first playing time point, which simplifies the user's operations and makes the playback process more intelligent.
drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on them without creative effort.
Fig. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application;
Fig. 2 is a flowchart of a method for establishing an association relationship according to an embodiment of the present application;
Fig. 3 is an interface diagram of a video frame provided by an embodiment of the present application;
Fig. 4 is a flowchart of a method for recognizing text information according to an embodiment of the present application;
Fig. 5 is a flowchart of a method for segmenting comment information provided by an embodiment of the present application;
Fig. 6 is a flowchart illustrating the process of establishing an association relationship between bullet screen information and a playing time point according to an embodiment of the present application;
Fig. 7 is a flowchart of a method for establishing an association relationship according to an embodiment of the present application;
Fig. 8 is a flowchart of a multimedia data playing method according to an embodiment of the present application;
Fig. 9 is a schematic diagram of a playing interface provided in an embodiment of the present application;
Fig. 10 is a schematic diagram of a playing interface provided in an embodiment of the present application;
Fig. 11 is a schematic diagram of an interface for displaying a jump button according to an embodiment of the present application;
Fig. 12 is a schematic diagram of an interface for displaying a jump button according to an embodiment of the present application;
Fig. 13 is a schematic diagram of a playing interface provided in an embodiment of the present application;
Fig. 14 is a schematic diagram of an interface for displaying a return option provided by an embodiment of the present application;
Fig. 15 is a schematic structural diagram of a multimedia data playing apparatus according to an embodiment of the present application;
Fig. 16 is a schematic structural diagram of another multimedia data playing apparatus according to an embodiment of the present application;
Fig. 17 is a schematic structural diagram of a terminal according to an embodiment of the present application;
Fig. 18 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
The embodiments of the present application perform processing using Natural Language Processing (NLP) techniques. Natural language processing is an important direction in the fields of computer science and artificial intelligence; it studies theories and methods for effective communication between humans and computers using natural language, and is a science integrating linguistics, computer science, and mathematics. Research in this field involves natural language, i.e., the language people use every day, so it is closely related to linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
The embodiments of the present application provide a multimedia data playing method that can jump to the playing time point associated with a piece of comment information when a jump operation on that comment information is detected.
The method provided by the embodiments of the present application can be applied to a video playing scenario: the terminal plays video data in a playing interface and displays, in that interface, comment information posted on the video data by users watching it.
Alternatively, the method can be applied to an audio playing scenario: the terminal plays audio data in a playing interface and displays users' comment information on the audio data in that interface. With the method provided by the embodiments of the present application, the playing time point associated with a piece of comment information in the audio data can be determined; when a user performs a jump operation on the comment information, the terminal jumps to the associated playing time point and starts playing from there.
Alternatively, the method can be applied to a live streaming scenario. While any user is streaming from a terminal, the live data is cached with the stream's start time as the starting time point, and the live data and other users' comment information are displayed in the playing interface. With the method provided by the embodiments of the present application, the playing time point of historical live data associated with a piece of comment information can be determined; when another user performs a jump operation on the comment information, the terminal jumps to the corresponding playing time point and plays the historical live data from there.
Fig. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application. Referring to fig. 1, the embodiment of the present application includes a server 101 and a plurality of terminals 102.
The server 101 may be a single server, a cluster of several servers, or a cloud computing service center. The terminal 102 may be a mobile phone, a tablet computer, a personal computer, or the like. The terminal 102 may log into the server 101 based on a user identifier, such as a telephone number, a user nickname, or a user account, which uniquely identifies the corresponding user.
The server 101 is used for storing at least one multimedia data, and the terminal 102 is used for playing any multimedia data. The multimedia data may be video data, audio data, or other types of data. When the terminal 102 plays any multimedia data, comment information of the user on the multimedia data can be published, and the comment information of the multimedia data is used for representing the evaluation of the user on the multimedia data. The server 101 can thus collect comment information of each terminal 102 on multimedia data, store the collected comment information in correspondence with the multimedia data, and can also provide comment information of any multimedia data to the terminal 102.
The process of playing the multimedia data is as follows: after receiving the multimedia data and its comment information from the server 101, the terminal 102 plays the multimedia data and displays the comment information during playback.
For example, the terminal 102 displays identifiers of multiple multimedia data items provided by the server 101. When the terminal 102 detects a trigger operation on any multimedia data identifier, it sends a data acquisition request carrying that identifier to the server 101. On receiving the request, the server 101 sends the corresponding multimedia data and its comment information to the terminal 102, after which the terminal 102 can play the multimedia data and display the comment information during playback.
In addition, the process of posting comment information includes: during playback of the multimedia data, the terminal 102 acquires comment information input by the user, which represents the user's evaluation of the multimedia data. The terminal 102 sends a comment publishing request, which includes the comment information and the multimedia data identifier, to the server 101; after receiving the request, the server 101 stores the comment information in correspondence with the multimedia data identifier.
On the basis of the above implementation environment, Fig. 2 is a flowchart of a method for establishing an association relationship according to an embodiment of the present application. Referring to Fig. 2, the method is applied to a terminal and includes the following steps:
201. Obtain comment information of the multimedia data.
The embodiment of the application explains the process of establishing the incidence relation between the comment information of the multimedia data and the playing time point by the terminal.
In one possible implementation, the process of establishing the association relationship is performed while the multimedia data is played. The method provided by the embodiment of the application is executed when the terminal starts to play the multimedia data.
In another possible implementation, the process of establishing the association is performed before the multimedia data is played. The terminal executes the method provided by the embodiment of the application aiming at any multimedia data in advance. The multimedia data can be multimedia data designated by a user, or multimedia data recommended by a server for a terminal, and the like.
In the embodiment of the application, the server may issue any multimedia data to the terminal, and may also issue comment information of the multimedia data to the terminal.
For example, the server stores the multimedia data identifier in correspondence with the comment information, and when the server recommends any multimedia data identifier to the terminal, it sends the comment information corresponding to that identifier to the terminal.
It should be noted that the comment information of the multimedia data includes one or more pieces; the embodiments of the present application take one piece of comment information as an example, and the process of establishing an association relationship for each other piece is similar and is not described again.
202. Obtain an information set of the multimedia data.
The information set comprises at least one piece of text information and a playing time point corresponding to each piece of text information, and each piece of text information is matched with the voice information corresponding to the playing time point in the multimedia data.
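The information set described above can be represented concretely; as a small sketch (variable and function names are assumptions, not from the patent), it is a list of pairs of text information and the playing time point that the text matches:

```python
# (text information, corresponding playing time point in seconds)
information_set = [
    ("who are you", 12.5),
    ("see you later", 80.0),
]

def text_at(info_set, time_point):
    """Return the text information matched to a given playing time point,
    or None if the set contains no entry for that time point."""
    for text, t in info_set:
        if t == time_point:
            return text
    return None

print(text_at(information_set, 80.0))  # see you later
```

In practice the set would be built from subtitle extraction or speech recognition, as the following paragraphs describe.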
The text information may be subtitle information in the multimedia data, text generated by converting audio data in the multimedia data, or information generated for the multimedia data in other ways.
In one possible implementation, the multimedia data is video data, and the subtitle information included in each video frame and the playing time point corresponding to each video frame are extracted from the video data. Each video frame corresponds to a playing time point in the video data, so after the subtitle information contained in a video frame is extracted, that subtitle information also corresponds to the playing time point of the video frame it belongs to.
When extracting video frames from video data, FFmpeg (an open-source, cross-platform audio and video processing tool) may be used.
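One hedged way to invoke FFmpeg for this step (paths, sampling rate, and helper names are illustrative; the `ffmpeg` binary must be on the PATH) is to dump frames at a fixed rate, after which each frame index maps back to a playing time point:

```python
import subprocess

def extract_frames(video_path, out_dir, fps=1):
    """Call FFmpeg to dump `fps` frames per second of video into out_dir.
    Frame n of the output then corresponds to play time n/fps seconds."""
    cmd = [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={fps}",          # sampling rate of extracted frames
        f"{out_dir}/frame_%06d.png",
    ]
    subprocess.run(cmd, check=True)

def frame_time_point(frame_index, fps=1):
    # Playing time point (seconds) corresponding to an extracted frame.
    return frame_index / fps

print(frame_time_point(125))  # 125.0
```

The mapping from frame index to playing time point is what lets extracted subtitle text inherit the time point of the frame it came from.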
Optionally, each video frame in the video data and the playing time point corresponding to each video frame are extracted, for each video frame in the video data, text information in each video frame is detected based on a text detection model, the detected text information is used as subtitle information of the video frame, and the text information detected in each video frame corresponds to the playing time point of the video frame.
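As an illustrative sketch (not taken from the patent), pairing extracted subtitle text with playing time points might look like the following. Actual frame extraction would be delegated to a tool such as FFmpeg (for example, `ffmpeg -i in.mp4 -vf fps=1 frame_%d.png`); here only the conversion from a frame index to its playing time point is shown, and the subtitle texts are hypothetical:

```python
def frame_play_time(frame_index: int, fps: float) -> float:
    """Return the playing time point (in seconds) of a frame in the video."""
    return frame_index / fps

def pair_subtitles_with_times(subtitles, fps):
    """subtitles: list of (frame_index, detected_text) pairs.
    Returns a list of (play_time_seconds, text) pairs -- the 'information set'
    of text information and corresponding playing time points."""
    return [(frame_play_time(idx, fps), text) for idx, text in subtitles]

# Hypothetical subtitles detected in frames 0 and 50 of a 25 fps video:
info_set = pair_subtitles_with_times([(0, "hello"), (50, "is you")], fps=25.0)
print(info_set)  # [(0.0, 'hello'), (2.0, 'is you')]
```
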
Because the subtitle information in the video data is typically rendered in a white font, the character detection model identifies the white areas in a video frame and determines the subtitle information according to the identified white areas. For example, as shown in fig. 3, a video frame in the video data includes the text information "is you", and this text information can be identified by the character detection model.
In another possible implementation manner, the multimedia data is audio data, and the text information included in each frame of audio data and the playing time point corresponding to each frame of audio data are extracted from the audio data. When the text information is extracted, a speech recognition algorithm is adopted: each frame of audio data is extracted from the audio data, and the voice information in the extracted audio data is converted into text information, so that the text information contained in each frame of audio data is obtained.
In one possible implementation, when the multimedia data is video data and the video data does not include subtitle information, audio data may be extracted from the video data and converted into text information that serves as the subtitle information of the video data. In another possible implementation manner, when the multimedia data is video data and the video data includes subtitle information, the video data is directly divided into video frames, and the subtitle information in each video frame is extracted as the subtitle information of the video data.
In a possible implementation manner, the server obtains the information set according to the multimedia data and sends the information set to the terminal, or the server sends the multimedia data to the terminal, and the terminal obtains the information set according to the multimedia data.
203. For each piece of text information in the information set, performing word segmentation on the text information to obtain at least one word of the text information.
Each piece of text information in the information set can represent the multimedia data played at its corresponding playing time point. For each piece of text information in the information set, word segmentation is performed on the text information to obtain at least one vocabulary, and the at least one vocabulary can represent the semantic features of the text information.
In the process of segmenting the text information, at least one vocabulary labeled with its part of speech can be obtained; for example, the obtained vocabularies are classified into proper nouns, common spoken words, verbs, adjectives, and the like.
In one possible implementation, when performing word segmentation on the text information, jieba word segmentation, NLTK (Natural Language Toolkit), LTP (Language Technology Platform), or another tool may be used.
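As a minimal stdlib stand-in for a segmentation tool such as jieba (which the text names but is not shown here), the following toy forward maximum-matching segmenter over a tiny hypothetical dictionary illustrates the idea of splitting a text into vocabularies:

```python
# Hypothetical dictionary; a real segmenter ships a large lexicon with
# part-of-speech tags.
DICTIONARY = {"multi", "media", "multimedia", "data", "play"}

def max_match_segment(text: str, dictionary=DICTIONARY):
    """Greedy forward maximum matching: at each position, take the longest
    dictionary word starting there; fall back to a single character."""
    words, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest candidate first
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            words.append(text[i])          # unknown single character
            i += 1
    return words

print(max_match_segment("multimediadata"))  # ['multimedia', 'data']
```

A production segmenter additionally labels each vocabulary with its part of speech, which the database lookups described later rely on.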
For example, as shown in fig. 4, when the multimedia data is video data, a video frame is extracted from the video data by calling an image recognition interface, so as to detect text information in the video frame, and obtain semantic features of the video frame.
204. The comment information is segmented to obtain at least one vocabulary of the comment information.
The comment information is segmented so that at least one vocabulary of the comment information can be obtained, and the at least one vocabulary can represent the semantic features of the comment information.
In the process of segmenting the comment information, at least one vocabulary labeled with its part of speech can be obtained; for example, the obtained vocabularies are classified into proper nouns, common spoken words, verbs, adjectives, and the like.
In one possible implementation, when performing word segmentation on the comment information, jieba word segmentation, NLTK (Natural Language Toolkit), LTP (Language Technology Platform), or another tool may be adopted.
For example, as shown in fig. 5, after the comment information is obtained, it may be segmented to obtain at least one vocabulary of the comment information.
205. The similarity between the comment information and the text information is acquired according to at least one vocabulary of the text information and at least one vocabulary of the comment information.
The comment information represents the user's evaluation of the multimedia data, and the text information represents the multimedia data played at the corresponding playing time point; by acquiring the similarity between the text information and the comment information, it can be determined whether the comment information is related to the multimedia data played at the corresponding playing time point.
In one possible implementation manner, word vectors of the at least one vocabulary of the text information are obtained, and their average value is calculated as a first vector of the text information. Word vectors of the at least one vocabulary of the comment information are obtained, and their average value is calculated as a second vector of the comment information. The similarity between the first vector and the second vector is then obtained as the similarity between the text information and the comment information.
In another possible implementation manner, word vectors of the at least one vocabulary of the text information are averaged to obtain a first vector of the text information, and word vectors of the at least one vocabulary of the comment information are averaged to obtain a second vector of the comment information. The vector difference between the first vector and the second vector is calculated, and the modulus of the vector difference is obtained. This modulus is adopted to represent the similarity between the text information and the comment information, with the modulus of the vector difference and the similarity in a negative correlation relationship.
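Both implementations above can be sketched as follows. The word vectors are tiny hypothetical values (a real system would load pretrained embeddings such as word2vec); the cosine and vector-difference measures are standard:

```python
import math

# Hypothetical 3-dimensional word embeddings.
WORD_VECTORS = {
    "good": [1.0, 0.0, 1.0], "movie": [0.0, 1.0, 1.0],
    "great": [1.0, 0.2, 0.8], "film": [0.2, 1.0, 1.0],
}

def sentence_vector(words):
    """Average the word vectors of the vocabularies of one text."""
    vecs = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    return [sum(v[k] for v in vecs) / len(vecs) for k in range(len(vecs[0]))]

def norm(a):
    return math.sqrt(sum(x * x for x in a))

def cosine(a, b):
    """Similarity of two vectors (first implementation)."""
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

def diff_modulus(a, b):
    """Modulus of the vector difference (second implementation):
    a smaller modulus means a higher similarity (negative correlation)."""
    return norm([x - y for x, y in zip(a, b)])

first = sentence_vector(["good", "movie"])   # first vector (text information)
second = sentence_vector(["great", "film"])  # second vector (comment information)
print(cosine(first, second), diff_modulus(first, second))
```
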
It should be noted that, in the embodiment of the present application, the similarity between the text information and the comment information is obtained according to the original vocabularies in the text information and the comment information. In another embodiment, similar words can be obtained by expanding the original vocabularies in the comment information and the text information, and both the original vocabularies and the similar words participate in obtaining the similarity.
Optionally, the terminal obtains similar words of at least one vocabulary in the comment information, and the at least one vocabulary in the comment information and its similar words form a first vocabulary set. Similar words of at least one vocabulary in the text information are obtained, and the at least one vocabulary in the text information and its similar words form a second vocabulary set. The similarity between the text information and the comment information is then obtained according to the first vocabulary set and the second vocabulary set.
Optionally, in step 203 and step 204, the text information and the comment information are segmented to obtain at least one vocabulary of each. These vocabularies may include proper nouns, in which case other proper nouns similar to the proper nouns of the text information and of the comment information can be obtained from the proper nouns stored in a database. Furthermore, the vocabularies may include verbs, and other verbs similar to the verbs of the text information or of the comment information can be obtained from the verbs stored in the database.
Obtaining the similarity between the text information and the comment information through the proper nouns and verbs of the text information and the comment information and their similar vocabularies may be done in any one of the following ways:
First, word vectors of the at least one vocabulary of the text information are averaged to obtain a first vector of the text information, and word vectors of the at least one vocabulary of the comment information are averaged to obtain a second vector of the comment information; the similarity between the first vector and the second vector is taken as a first similarity between the text information and the comment information. The proper nouns and verbs in the text information and the comment information are then replaced with similar vocabularies; the average word vector of the at least one vocabulary in the replaced text information is taken as a third vector of the text information, the average word vector of the at least one vocabulary in the replaced comment information is taken as a fourth vector of the comment information, and the similarity between the third vector and the fourth vector is taken as a second similarity. The average of the first similarity and the second similarity is obtained as the similarity between the text information and the comment information.
Second, word vectors of the at least one vocabulary of the text information are averaged to obtain a first vector, and word vectors of the at least one vocabulary of the comment information are averaged to obtain a second vector; a first vector difference between the first vector and the second vector is calculated, and its modulus is obtained. The proper nouns and verbs in the text information and the comment information are then replaced with similar vocabularies; the average word vector of the replaced text information is taken as a third vector and the average word vector of the replaced comment information as a fourth vector, a second vector difference between them is calculated, and its modulus is obtained. The modulus of the first vector difference and the modulus of the second vector difference are obtained as the modulus of the difference between the text information and the comment information, and this modulus is adopted to represent the similarity between the text information and the comment information, with the modulus of the vector difference and the similarity in a negative correlation relationship.
Third, the proper nouns and verbs in the text information and the comment information are replaced with similar vocabularies. The average word vector of the at least one vocabulary in the replaced text information is taken as a third vector of the text information, and the average word vector of the at least one vocabulary in the replaced comment information is taken as a fourth vector of the comment information. The similarity between the third vector and the fourth vector is obtained as the similarity between the text information and the comment information.
Fourth, the proper nouns and verbs in the text information and the comment information are replaced with similar vocabularies, the average word vector of the at least one vocabulary in the replaced text information is taken as a third vector of the text information, and the average word vector of the at least one vocabulary in the replaced comment information is taken as a fourth vector of the comment information. The vector difference between the third vector and the fourth vector is calculated and its modulus obtained; this modulus is adopted to represent the similarity between the text information and the comment information, with the modulus of the vector difference and the similarity in a negative correlation relationship.
In the embodiment of the application, a vocabulary in the text information and a vocabulary in the comment information may express the same meaning, that is, the two vocabularies are similar; but because they are not completely the same, their word vectors differ, which may affect the obtained similarity. Obtaining similar words of the vocabularies in the text information and the comment information and replacing the vocabularies with the similar words avoids this situation, improves the accuracy of the similarity, and thus ensures the accuracy of the subsequently established association relationship.
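A hedged sketch of the first way listed above: vocabularies are replaced with similar words from a (hypothetical) synonym database before comparison, and the final similarity averages the values obtained before and after replacement. A trivial word-overlap measure stands in for the vector similarity discussed earlier:

```python
# Hypothetical synonym database mapping a vocabulary to a similar word.
SYNONYMS = {"watch": "view", "film": "movie"}

def replace_similar(words):
    """Replace each vocabulary with its similar word where one is known."""
    return [SYNONYMS.get(w, w) for w in words]

def overlap_similarity(a, b):
    """Stand-in similarity measure: Jaccard overlap of the word sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def combined_similarity(text_words, comment_words):
    """Average of the similarity before and after synonym replacement."""
    first = overlap_similarity(text_words, comment_words)
    second = overlap_similarity(replace_similar(text_words),
                                replace_similar(comment_words))
    return (first + second) / 2

# "watch movie" vs "view film" share no surface words (first = 0), but
# after replacement both become {"view"/"view", "movie"/"movie"} (second = 1).
print(combined_similarity(["watch", "movie"], ["view", "film"]))  # 0.5
```

This is exactly the situation the paragraph above describes: two texts with the same meaning but different surface words score 0 before replacement and 1 after.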
206. When the similarity between the comment information and any piece of text information in the information set is greater than a preset similarity, an association relationship between the comment information and the playing time point corresponding to that piece of text information is established.
The similarity between the comment information and the text information represents how alike the two are: the higher the similarity, the more similar the comment information is to the text information; the lower the similarity, the more dissimilar they are.
In a possible implementation manner, after the similarity between the comment information and each piece of text information in the information set is obtained in step 205, it is determined whether that similarity is greater than the preset similarity. When it is, the comment information is similar to the text information, and an association relationship between the comment information and the playing time point corresponding to the text information is established. When it is not, the comment information is dissimilar to the text information, and no such association relationship is established.
The preset similarity may be set by a terminal or a developer.
In a possible implementation manner, the similarity between the comment information and the text information is obtained directly, and when judging whether it is greater than the preset similarity, the similarity and the preset similarity are compared directly.
In another possible implementation manner, the modulus of the vector difference between the comment information and the text information may be calculated, and this modulus is adopted to represent the similarity between the text information and the comment information. Because the modulus of the vector difference and the similarity are in a negative correlation relationship, a preset numerical value corresponding to the preset similarity is set when judging whether the similarity is greater than the preset similarity: when the modulus of the vector difference is less than the preset numerical value, the comment information is similar to the text information; when it is not less than the preset numerical value, the comment information is dissimilar to the text information.
After the association relationship between the comment information and the playing time point corresponding to the text information with a larger similarity is established, a user who is interested in the comment information can subsequently trigger a skip operation on it. The terminal then detects the skip operation, obtains the playing time point associated with the comment information, skips the multimedia data to that playing time point, and starts playing, so the user can watch the multimedia data corresponding to the playing time point associated with the comment information.
In a possible implementation manner, when two or more pieces of text information similar to the comment information exist, the piece with the maximum similarity is obtained, and the association relationship between the comment information and the playing time point corresponding to that piece of text information is established.
In another possible implementation manner, when two or more pieces of text information similar to the comment information exist, the piece whose playing time point is closest to the current playing time point is acquired from among them, and the association relationship between the comment information and the playing time point corresponding to that piece of text information is established.
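The two tie-breaking strategies above can be sketched as follows, with candidates given as hypothetical (playing_time_seconds, similarity) pairs:

```python
def pick_by_similarity(candidates):
    """Choose the candidate text information with the maximum similarity."""
    return max(candidates, key=lambda c: c[1])

def pick_by_nearest_time(candidates, current_time):
    """Choose the candidate whose playing time point is closest to the
    current playing time point."""
    return min(candidates, key=lambda c: abs(c[0] - current_time))

candidates = [(30.0, 0.82), (95.0, 0.90), (120.0, 0.85)]
print(pick_by_similarity(candidates))           # (95.0, 0.9)
print(pick_by_nearest_time(candidates, 110.0))  # (120.0, 0.85)
```
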
Fig. 6 is a flowchart for establishing an association relationship between bullet screen information and a playing time point according to an embodiment of the present application. Referring to fig. 6, when the multimedia data is video data and any user sends a piece of bullet screen information, a video frame in the video data is obtained, text detection is performed on the video frame to obtain the text information in it, and the similarity between the bullet screen information and the text information is obtained. If, according to the similarity, the bullet screen information matches the video frame in which the text information is located, an association relationship between the bullet screen information and the playing time point corresponding to that video frame in the video data is established; if it does not match, no such association relationship is established.
The first point to be described is that the embodiment of the present application is described only by taking as an example the case in which an information set is obtained and the association between the comment information and a playing time point is established according to the similarity between the text information in the information set and the comment information. In another embodiment, steps 202-203 and steps 205-206 need not be performed; instead, after step 204 it is only detected whether the comment information includes a vocabulary indicating a playing time point, and when it does, an association relationship between the comment information and the playing time point indicated by that vocabulary is established.
In a possible implementation manner, since the playing time point mentioned in the comment information may exceed the playing termination time point of the multimedia data, the association relationship between the comment information and the playing time point indicated by the vocabulary is established only when the comment information includes a vocabulary indicating a playing time point and that playing time point lies between the starting time point and the ending time point of the multimedia data. When the playing time point does not lie between the starting time point and the ending time point of the multimedia data, the association relationship is not established.
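A sketch of this check, assuming (purely for illustration) that a time-indicating vocabulary takes an "mm:ss" form detectable by a regular expression:

```python
import re

# Hypothetical pattern for a vocabulary indicating a playing time point.
TIME_PATTERN = re.compile(r"\b(\d{1,2}):([0-5]\d)\b")

def extract_time_point(comment: str):
    """Return the indicated playing time point in seconds, or None."""
    m = TIME_PATTERN.search(comment)
    if not m:
        return None
    return int(m.group(1)) * 60 + int(m.group(2))

def associate(comment: str, start: float, end: float):
    """Associate the comment with the indicated time point only if it lies
    between the starting and ending time points of the multimedia data."""
    t = extract_time_point(comment)
    if t is not None and start <= t <= end:
        return t      # establish the association with this time point
    return None       # no time word, or out of range: no association

print(associate("the scene at 3:45 is great", 0, 600))   # 225
print(associate("the scene at 15:00 is great", 0, 600))  # None (past the end)
```
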
The second point to be described is that the embodiments of the present application are described only by taking a terminal as the execution subject. In another embodiment, the server may be the execution subject: the server executes steps 201 to 206, establishes the association relationship between the comment information and the playing time point corresponding to the text information, and stores the association relationship locally, or sends it to the terminal to be stored by the terminal.
In the method provided by the embodiment of the application, comment information and an information set of the multimedia data are obtained. The information set comprises at least one piece of text information and the playing time point corresponding to each piece of text information, and each piece of text information matches the voice information corresponding to its playing time point in the multimedia data. When the similarity between the comment information and any piece of text information in the information set is greater than a preset similarity, an association relationship between the comment information and the playing time point corresponding to that piece of text information is established. In the subsequent process, a user only needs to trigger a skip operation on the comment information for the multimedia data to be played from the playing time point associated with the comment information, without manually dragging the progress bar to the position of a target playing time point; this avoids errors in the located playing time point and improves the positioning accuracy. In addition, the similarity between the comment information and the text information is obtained by segmenting both the text information and the comment information and using at least one vocabulary of each, which improves the accuracy of the similarity.
Moreover, by establishing the association relationship between the comment information and the playing time points of the multimedia data, the multimedia data corresponding to any playing time point can be associated through the comment information of the multimedia data, and that multimedia data is the key data that the user who issued the comment information cares about. Multiple users can evaluate the multimedia data from multiple angles and issue multiple pieces of comment information, so more key data in the multimedia data can be reached through the comment information, and a user playing the multimedia data can jump to the corresponding key data through any comment information, making the process of playing the multimedia data more intelligent.
Based on the above implementation environment, fig. 7 is a flowchart of a method for establishing an association relationship according to an embodiment of the present application. Referring to fig. 7, the method is applied to a terminal, and the method includes:
701. Comment information of the multimedia data is obtained.
The process of obtaining the comment information in step 701 is similar to that in step 201, and is not described herein again.
702. A plurality of video frames of the multimedia data are acquired.
The multimedia data is video data, and the video data comprises a plurality of video frames.
Because the plurality of video frames in the video data may show different pictures, the plurality of video frames are extracted from the video data after it is acquired, and the comment information matched with the plurality of video frames can then be determined according to those video frames.
703. The comment information is segmented to obtain at least one vocabulary of the comment information, and a first semantic feature of the comment information is determined according to the at least one vocabulary.
The process of segmenting the comment information in step 703 is similar to that in step 204, and is not described herein again.
For example, after at least one vocabulary of the comment information is acquired, a word vector of the at least one vocabulary is acquired, an average value of the word vectors of the at least one vocabulary is calculated, and the calculated average value is used as a first semantic feature of the comment information.
704. For each video frame in the plurality of video frames, a second semantic feature of the video frame is obtained based on a feature extraction model.
Since the picture of each of the plurality of video frames may be different (for example, the plurality of video frames may include an image with a river as the background, an image with a mountain as the background, an image of animal motion, and so on), the second semantic feature of each video frame is obtained, and the picture of the corresponding video frame can be represented by the second semantic feature.
The feature extraction model is used for acquiring semantic features of any video frame according to the picture of the video frame.
In one possible implementation, the feature extraction model may be a convolutional neural network model, a full convolutional neural network model, or other models, etc.
For each video frame in the plurality of video frames, the video frame is input into the feature extraction model, which outputs the second semantic feature of the video frame; the comment information matched with the video frame is subsequently determined according to the obtained second semantic feature.
705. The similarity between the comment information and the video frame is acquired according to the first semantic feature and the second semantic feature.
The comment information represents the user's evaluation of the video data, and the video frame belongs to a part of the video data. Therefore, the similarity between the first semantic feature of the comment information and the second semantic feature of the video frame is obtained; this similarity represents how alike the comment information and the video frame are, and whether the comment information is related to the video frame can be determined according to it.
The higher the similarity of the comment information to the video frame, the higher the association of the comment information to the video frame, and the lower the similarity of the comment information to the video frame, the lower the association of the comment information to the video frame.
In one possible implementation manner, the similarity between the first semantic feature and the second semantic feature may be represented by parameters such as cosine similarity, jaccard similarity, euclidean distance, and the like.
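The three measures named above, in a minimal stdlib form (the feature vectors and word sets are hypothetical; note that for Euclidean distance, smaller means more similar):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def euclidean_distance(a, b):
    """Smaller distance corresponds to higher similarity."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def jaccard_similarity(s, t):
    """Jaccard similarity of two sets, e.g. sets of vocabularies."""
    s, t = set(s), set(t)
    return len(s & t) / len(s | t)

print(cosine_similarity([1, 0], [1, 0]))           # 1.0
print(euclidean_distance([0, 0], [3, 4]))          # 5.0
print(jaccard_similarity({"a", "b"}, {"b", "c"}))  # 0.333...
```
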
706. When the similarity between the comment information and any one of the plurality of video frames is greater than the preset similarity, an association relationship between the comment information and that video frame is established.
In a possible implementation manner, after the similarity between the comment information and the video frame is obtained in step 705, it is determined whether that similarity is greater than the preset similarity. When it is, the comment information is similar to the video frame, and an association relationship between the comment information and the video frame is established; when it is not, the comment information is dissimilar to the video frame, and no such association relationship is established.
The preset similarity may be set by a terminal or a developer.
It should be noted that, in the embodiment of the present application, the similarity between the comment information and the video frame is obtained directly. In another possible implementation manner, the modulus of the vector difference between the comment information and the video frame may be calculated and adopted to represent the similarity between the video frame and the comment information. Because the modulus of the vector difference and the similarity are in a negative correlation relationship, a preset numerical value corresponding to the preset similarity is set when judging whether the similarity is greater than the preset similarity: when the modulus of the vector difference is less than the preset numerical value, the comment information is similar to the video frame; when it is not less than the preset numerical value, the comment information is dissimilar to the video frame.
After the association relationship between the comment information and the video frame with a larger similarity is established, a user who is interested in the comment information can subsequently trigger a skip operation on it. The terminal then detects the skip operation, obtains the video frame associated with the comment information, skips the multimedia data to that video frame, and starts playing the multimedia data with the video frame as the starting point.
In one possible implementation manner, when two or more video frames similar to the comment information exist, the video frame with the maximum similarity is obtained, and the association relationship between the comment information and that video frame is established.
In another possible implementation manner, when two or more video frames similar to the comment information exist, the video frame whose playing time point is closest to the current playing time point is acquired from among them, and the association relationship between the comment information and that video frame is established.
It should be noted that the embodiments of the present application are described only by taking a terminal as the execution subject. In another embodiment, the server may be the execution subject: the server executes steps 701 to 706, establishes the association relationship between the comment information and the video frame, and stores it locally, or sends it to the terminal to be stored by the terminal.
In the method provided by the embodiment of the application, comment information and a plurality of video frames of the multimedia data are obtained, and when the similarity between the comment information and any video frame is greater than the preset similarity, an association relationship between the comment information and that video frame is established, expressing the association between the comment information and the picture of the video frame. In the subsequent process, triggering a skip operation on the comment information causes a jump to the playing time point corresponding to the video frame associated with the comment information, and the multimedia data is played from that playing time point; the user does not need to manually drag the progress bar to the position of a target playing time point, which avoids errors in the located playing time point and improves the positioning accuracy.
In addition, it is to be added that the embodiment shown in fig. 2 and the embodiment shown in fig. 7 respectively establish the association relationship between the comment information and the playing time point, and the association relationship between the comment information and the video frame. For the same comment information, the two association relations can be established, or only one of the association relations can be established.
For example, whether or not a word indicating a play time point is included in the comment information may be used as a determination criterion, and when a word indicating a play time point is included in the comment information, an association relationship between the comment information and the play time point indicated by the word is established, and when a word indicating a play time point is not included in the comment information, an association relationship between the comment information and the video frame is established.
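As a sketch of this determination criterion — the time-word pattern `TIME_WORD` and the return convention are assumptions for illustration, since the embodiment does not fix how such vocabulary is recognized:

```python
import re

# Hypothetical pattern for a vocabulary indicating a play time point, e.g. "26:51".
TIME_WORD = re.compile(r"\b(\d{1,2}):([0-5]\d)\b")

def associate(comment: str):
    """Decide which association to establish for a piece of comment information.

    If the comment contains a word indicating a play time point, associate the
    comment with that time point (returned in seconds); otherwise fall back to
    associating it with a similar video frame (sentinel tuple here).
    """
    m = TIME_WORD.search(comment)
    if m:
        minutes, seconds = int(m.group(1)), int(m.group(2))
        return ("time_point", minutes * 60 + seconds)
    return ("video_frame", None)
```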
Fig. 8 is a flowchart of a multimedia data playing method according to an embodiment of the present application. Referring to fig. 8, the method is applied to a terminal and includes:
801. The multimedia data is played based on the playing interface, and comment information of the multimedia data is displayed.
Wherein, the playing interface is an interface for playing multimedia data.
In a possible implementation manner, an application program is installed on the terminal. When the terminal runs the application program, a main interface of the application program is displayed, and at least one multimedia data identifier is displayed in the main interface. When a trigger operation on any multimedia data identifier is detected, the terminal switches from the main interface to the playing interface, and the multimedia data corresponding to that multimedia data identifier is played in the playing interface.
Optionally, when the terminal displays the main interface of the application, a preview display area is displayed in the main interface, and any multimedia data can be played in the preview display area.
The application program may be a video playing application program, an audio playing application program, a live broadcast application program, a browser application program, or another type of application program.
When the terminal plays the multimedia data based on the playing interface, comment information of the multimedia data can also be displayed.
In the process of playing the multimedia data, the user of the terminal can also input comment information on the terminal to publish their own views on the multimedia data. The terminal sends the comment information to the server, so that the server can collect the comment information of each terminal on the multimedia data and send the collected comment information to each terminal playing the multimedia data, and each terminal displays the comment information. In addition, the server also stores the comment information sent by each terminal; then, when any terminal plays the multimedia data, the server sends the stored comment information to that terminal, and the terminal displays the received comment information.
In addition, when the terminal displays the comment information based on the play interface, the comment information may be displayed in different forms.
In one possible implementation manner, the comment information is displayed in a floating manner on an upper layer of the playing interface, and the comment information may be located in at least one of the upper, middle, and lower portions of the playing interface. The comment information may move from right to left, from top to bottom, or from bottom to top in the playing interface.
When the comment information moves from right to left in the playing interface, the comment information gradually appears from the right edge of the playing interface and gradually moves to the left side, and when the comment information moves to the left edge of the playing interface, the comment information gradually disappears. Or when the comment information moves from top to bottom in the playing interface, the comment information gradually appears from the upper side edge of the playing interface and gradually moves to the lower side, and when the comment information moves to the lower side edge of the playing interface, the comment information gradually disappears. Or, when the comment information moves from bottom to top in the playing interface, the comment information gradually appears from the lower side edge of the playing interface and gradually moves to the upper side edge of the playing interface, and when the comment information moves to the upper side edge of the playing interface, the comment information gradually disappears.
For example, as shown in fig. 9, the comment information in the multimedia data may be bullet screen information. The bullet screen information is displayed in a sliding manner in the playing interface, gradually appearing from one side of the playing interface and gradually disappearing from the other side.
Moreover, because the comment information is displayed in the playing interface in a floating manner, in order to prevent the comment information from influencing the played multimedia data, the terminal can also adjust the transparency of the comment information and display the comment information in the playing interface according to the adjusted transparency.
In another possible implementation manner, the playing interface of the terminal includes a first display area and a second display area, the first display area is used for displaying the played multimedia data, the second display area is used for displaying the comment information of the user, and the first display area is different from the second display area. For example, as shown in fig. 10, the upper area of the playing interface is a first display area for playing multimedia data, and the lower area of the playing interface is a second display area for displaying comment information.
When the terminal displays the comment information of the multimedia data in the second display area, the comment information is displayed in a manner similar to that in the playing interface, and is not described herein again.
In a possible implementation manner, on the basis that the association relationship between the comment information and the play time point is established, the terminal may further display a skip button corresponding to the comment information, and the skip operation is a trigger operation on the skip button. When the terminal detects the trigger operation of the skip button at the first playing time point, the skip operation of the comment information is determined to be detected, and a second playing time point associated with the comment information is obtained.
Optionally, in the process of displaying the comment information, when the comment information has an associated playing time point, the terminal displays a skip button corresponding to the comment information. When the comment information does not have an associated playing time point, only the comment information is displayed, and the skip button corresponding to the comment information is not displayed.
For example, as shown in fig. 11, since only the comment information of "want to go to this place" has an associated play time point, when the comment information is displayed, a skip button of "go around" is also displayed.
Optionally, the terminal displays a skip button corresponding to each piece of comment information. When the comment information has the associated playing time point, the skip button corresponding to the comment information is in a triggerable state, and when the comment information does not have the associated playing time point, the skip button corresponding to the comment information is in a non-triggerable state.
For example, as shown in fig. 12, the "go around" skip buttons of the comment information are all displayed in the playing interface, but only the comment information of "want to go to this place" has an associated playing time point; therefore, only the skip button of that comment information is in a triggerable state.
802. When a skip operation to the comment information is detected at the first playing time point of the multimedia data, a second playing time point associated with the comment information is acquired.
The terminal plays the multimedia data based on the play time point of the multimedia data, when the skip operation of the comment information is detected, the play time point of the currently played multimedia data is a first play time point, and the play time point associated with the obtained comment information is a second play time point.
The skip operation may be a click operation, a double-click operation, a long-press operation, or the like, on the comment information.
In a possible implementation manner, on the basis that the association relationship between the comment information and the playing time point has been established, when a skip operation to the comment information is detected at the first playing time point of the multimedia data, the association relationship between the comment information and the playing time point is queried, and the playing time point associated with the comment information is obtained and serves as the second playing time point.
Since the association relationship between the comment information and the play time point is already established in the embodiment of fig. 2, the terminal may acquire the established association relationship, and when a skip operation to the comment information is detected, the association relationship is queried, and a second play time point associated with the comment information is acquired.
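A minimal sketch of this query step, assuming the association relationship is held as a mapping from comment identifiers to playing time points; the mapping shape and the on-demand fallback (standing in for running steps 201 to 205 when no association exists) are illustrative:

```python
def get_second_play_time(associations, comment_id, fallback=None):
    """Query the pre-established comment -> play-time association.

    Returns the associated playing time point (the "second playing time
    point"), or None when the comment has no association. When a fallback
    is supplied, it is invoked to compute the time point on demand
    (e.g. similarity matching as in steps 201-205).
    """
    t = associations.get(comment_id)
    if t is None and fallback is not None:
        t = fallback(comment_id)
    return t
```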
In another possible implementation manner, on the basis that the association relationship between the comment information and the play time point is not established, when a skip operation to the comment information is detected at the first play time point of the multimedia data, the above steps 201 to 205 may be executed, the similarity between the comment information and at least one text information is obtained, and the play time point corresponding to the text information whose similarity to the comment information is greater than a preset similarity is used as the play time point associated with the comment information.
In another possible implementation manner, when the terminal detects a skip operation to the comment information, a confirmation window is displayed in the playing interface, and when the terminal detects a confirmation operation of the user through the confirmation window, the second playing time point corresponding to the comment information is acquired. When the terminal detects a negative operation of the user through the confirmation window, the confirmation window is closed, the operation of acquiring the second playing time point corresponding to the comment information is not executed, and the multimedia data continues to be played from the first playing time point.
Optionally, when the multimedia data is video data and the terminal detects a skip operation to the comment information, a video frame associated with the comment information may also be acquired, and a playing time point corresponding to the video frame is taken as a second playing time point.
In a possible implementation manner, on the basis that the association relationship between the comment information and the video frame has been established, when a skip operation to the comment information is detected at the first playing time point of the multimedia data, the association relationship between the comment information and the video frame is queried to obtain the video frame associated with the comment information, and the playing time point corresponding to the video frame is taken as the second playing time point.
In another possible implementation manner, on the basis that the association relationship between the comment information and the play time point is not established, when a skip operation to the comment information is detected at a first play time point of the multimedia data, the above steps 701 to 705 may be performed, the similarity between the comment information and a plurality of video frames is obtained, a video frame having a similarity with the comment information greater than a preset similarity is used as a video frame associated with the comment information, and a play time point corresponding to the video frame is used as a second play time point.
In another possible implementation manner, when the terminal detects a skip operation to the comment information, a confirmation window is displayed in the playing interface, and when the terminal detects a confirmation operation of the user through the confirmation window, the video frame associated with the comment information is acquired. When the terminal detects a negative operation of the user through the confirmation window, the confirmation window is closed, the step of acquiring the video frame associated with the comment information is not executed, and the multimedia data continues to be played from the first playing time point.
With the method and the device, jumping to the playing time point corresponding to the video frame associated with the comment information displays that video frame. From the user's perspective, the user can jump to the associated video frame simply by triggering a skip operation to the comment information, so that the video frame can be watched, and the operation is simple, convenient, and quick.
It should be noted that, in the embodiment of the present application, the playing time point associated with the comment information may be obtained directly, or the corresponding playing time point may be obtained after the video frame associated with the comment information is obtained. In another embodiment, whether the comment information includes a vocabulary for indicating a playing time point may be used as the determination criterion: when the comment information includes such a vocabulary, the playing time point indicated by the vocabulary is used as the second playing time point associated with the comment information; when it does not, a video frame associated with the comment information is acquired, and the playing time point corresponding to the video frame is used as the second playing time point.
In another embodiment, when the play time point and the video frame associated with the comment information have been obtained, a time difference between the play time point associated with the comment information and a first play time point at which the comment information is currently located, and a time difference between a play time point corresponding to the video frame associated with the comment information and the first play time point may also be obtained, and the play time point with a smaller time difference is selected as the second play time point.
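The time-difference selection described in this embodiment can be sketched as follows (names and the None convention for a missing association are illustrative):

```python
def choose_jump_target(first_time, comment_time_point, frame_time_point):
    """Pick the second playing time point when a comment has both a directly
    associated playing time point and a video-frame-associated one: select
    the candidate whose time difference from the current (first) playing
    time point is smaller. Missing candidates are passed as None."""
    candidates = [t for t in (comment_time_point, frame_time_point) if t is not None]
    if not candidates:
        return None
    return min(candidates, key=lambda t: abs(t - first_time))
```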
803. The multimedia data is jumped from the first playing time point to the second playing time point.
The terminal plays the multimedia data at the first playing time point; after the second playing time point corresponding to the comment information is obtained, the multimedia data can be jumped from the first playing time point to the second playing time point and played from the second playing time point. In this way, by triggering a jump operation to the comment information of the multimedia data, playback jumps to the corresponding playing time point.
For example, in the playing interface shown in fig. 12, when the playing time point of the played multimedia data is 11:51 and the terminal detects a jump operation to "want to go to this place", the terminal jumps the multimedia data from 11:51 to 26:51 as shown in fig. 13, and starts playing the multimedia data from 26:51.
In the embodiment of the application, when the terminal plays the multimedia data on the playing interface, comment information can be displayed on the playing interface, so that while watching the multimedia data the user can also view its comment information. When the user is interested in any comment information, the user triggers a skip operation to that comment information; the terminal detects the skip operation, jumps to the playing time point associated with the comment information, and plays the multimedia data, so that the user can watch the multimedia data of interest.
In a possible implementation manner, the multimedia data is stored in the terminal, and when the terminal detects a skip operation to the comment information, and determines a second play time point associated with the comment information, the terminal directly skips the multimedia data from the first play time point to the second play time point.
In another possible implementation manner, the multimedia data is stored in the server, when the terminal detects a skip operation to the comment information and determines a second play time point associated with the comment information, the terminal sends a data acquisition request to the server, the data acquisition request carries the second play time point, when the server receives the data acquisition request, the multimedia data is sent to the terminal from the second play time point, and the terminal plays the multimedia data after receiving the multimedia data.
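A minimal sketch of this request flow, assuming a simple message format and an in-memory media store; both are illustrative, since the embodiment only requires that the data acquisition request carry the second playing time point:

```python
def build_data_request(media_id, second_play_time):
    """Terminal side: build the data acquisition request sent to the server.
    Field names are illustrative assumptions."""
    return {"type": "data_acquisition", "media_id": media_id, "seek_to": second_play_time}

def serve_request(request, media_store):
    """Server side: return the media chunks starting from the requested
    second playing time point. media_store maps media_id to a list of
    (play_time, chunk) pairs."""
    stream = media_store[request["media_id"]]
    return [chunk for t, chunk in stream if t >= request["seek_to"]]
```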
When the terminal switches, in the playing interface, from playing the multimedia data at the first playing time point to playing it at the second playing time point, a preset dynamic effect may be adopted to display the switching process.
The preset dynamic effect may be that the video frame corresponding to the second playing time point gradually appears from top to bottom while the video frame corresponding to the first playing time point gradually disappears from top to bottom; or the video frame corresponding to the second playing time point gradually appears by sliding in from left to right while the video frame corresponding to the first playing time point gradually disappears by sliding out from right to left; or the video frame corresponding to the second playing time point gradually appears by rolling from top to bottom; or the video frame corresponding to the first playing time point gradually disappears by turning over from one side while the video frame corresponding to the second playing time point gradually appears by turning over from one side; and the like.
804. A return option is displayed based on the playing interface.
805. When a trigger operation on the return option is detected, the multimedia data is jumped from the second playing time point to the first playing time point.
After the terminal jumps the multimedia data from the first playing time point to the second playing time point, the user may not be interested in the multimedia data at the second playing time point. To ensure that the user can control the terminal to return to the first playing time point, a return option is displayed in the playing interface, where the return option indicates the first playing time point before the jump. When the user performs a trigger operation on the return option, the terminal detects the trigger operation, jumps the multimedia data from the second playing time point to the first playing time point, and continues playing the multimedia data from the first playing time point.
The trigger operation may be a single-click operation, a double-click operation, a long-press operation, and the like.
For example, as shown in fig. 14, after the terminal jumps from the first playing time point to the second playing time point, the return option is displayed in the lower right corner of the playing interface. When the terminal detects the trigger operation for the return option, it returns to the first play time point as shown in fig. 11.
In a possible implementation manner, when the terminal detects a trigger operation on the return option, a confirmation window is displayed in the playing interface, and when the terminal detects a confirmation operation of the user through the confirmation window, the multimedia data is jumped from the second playing time point to the first playing time point. When the terminal detects a negative operation of the user through the confirmation window, the confirmation window is closed, the operation of jumping the multimedia data from the second playing time point to the first playing time point is not executed, and the multimedia data continues to be played from the second playing time point.
In another possible implementation manner, when the duration for which the terminal displays the return option in the playing interface reaches a preset duration, the return option is no longer displayed.
After the terminal jumps the multimedia data from the first playing time point to the second playing time point, if no trigger operation on the return option is detected within the preset duration, the terminal may infer that the user will continue watching the multimedia data and will no longer trigger the return option, so the terminal no longer displays the return option.
The preset duration may be set by the terminal, or may also be set by the user. The preset time period may be 30 seconds, 60 seconds, 90 seconds, or other values.
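The auto-hide behavior of the return option can be sketched as follows; the class shape and the explicit clock argument are assumptions made so the sketch stays deterministic, where a real player would use a UI timer:

```python
class ReturnOption:
    """Return option that is hidden once the preset display duration elapses
    without a trigger operation."""

    def __init__(self, first_play_time, preset_duration=30.0):
        self.first_play_time = first_play_time  # time point to return to, seconds
        self.preset_duration = preset_duration  # e.g. 30, 60, or 90 seconds
        self.shown_at = None
        self.visible = False

    def show(self, now):
        self.shown_at = now
        self.visible = True

    def tick(self, now):
        # Hide the option once the preset duration has elapsed.
        if self.visible and now - self.shown_at >= self.preset_duration:
            self.visible = False

    def trigger(self):
        """User triggers the option: jump back to the first playing time
        point; returns None if the option is no longer displayed."""
        if not self.visible:
            return None
        self.visible = False
        return self.first_play_time
```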
It should be noted that steps 804-805 are optional. In another embodiment, the return option may not be displayed in the playing interface, and the multimedia data is not jumped from the second playing time point back to the first playing time point according to the return option.
According to the method provided by the embodiment of the application, the multimedia data is played based on the playing interface, the comment information of the multimedia data is displayed, and when a skip operation to the comment information is detected at the first playing time point of the multimedia data, the second playing time point associated with the comment information is acquired and the multimedia data is jumped from the first playing time point to the second playing time point. A method for positioning a playing time point is thus provided: the playing time point associated with the comment information can be jumped to simply by triggering a skip operation to the comment information in the multimedia data, and the multimedia data is played from that playing time point, so the user does not need to manually drag the progress bar to the position of the target playing time point, which avoids errors in the located playing time point and improves positioning accuracy.
By establishing the association relationship between the comment information and the playing time points of the multimedia data, the multimedia data corresponding to any playing time point associated with the comment information can be determined, and the multimedia data corresponding to that playing time point is the key data concerned by the user who issued the comment information. Alternatively, by establishing the association relationship between the comment information and the video frames of the multimedia data, any video frame associated with the comment information can be determined, and that video frame is the key data concerned by the user who issued the comment information. Multiple users can evaluate the multimedia data from multiple angles and issue multiple pieces of comment information, and more key data in the multimedia data can be obtained through the multiple pieces of comment information, so that a user playing the multimedia data can jump to the corresponding key data through any comment information, making the process of playing the multimedia data more intelligent.
In addition, after the multimedia data is jumped from the first playing time point to the second playing time point, a return option can be displayed in the playing interface, and when a trigger operation on the return option is detected, the multimedia data is jumped from the second playing time point to the first playing time point. When the user is not interested in the multimedia data at the second playing time point, the user can return to the multimedia data at the first playing time point by triggering the return option, which fully considers the user in the multimedia data playing process, simplifies operations, and makes the playing process more intelligent.
Fig. 15 is a schematic structural diagram of a multimedia data playing apparatus according to an embodiment of the present application, and referring to fig. 15, the apparatus includes:
The display module 1501 is configured to play the multimedia data based on the play interface and display comment information of the multimedia data;
A time point obtaining module 1502, configured to obtain a second play time point associated with the comment information when a skip operation on the comment information is detected at a first play time point of the multimedia data;
The skipping module 1503 is configured to skip the multimedia data from the first playing time point to the second playing time point.
The device provided by the embodiment of the application plays the multimedia data based on the playing interface, displays the comment information of the multimedia data, acquires the second playing time point associated with the comment information when a skip operation to the comment information is detected at the first playing time point of the multimedia data, and skips the multimedia data from the first playing time point to the second playing time point. A method for positioning a playing time point is thus provided: the playing time point associated with the comment information can be jumped to simply by triggering a skip operation to the comment information in the multimedia data, and the multimedia data is played from that playing time point, so the user does not need to manually drag the progress bar to the position of the target playing time point, which avoids errors in the located playing time point and improves positioning accuracy.
Optionally, referring to fig. 16, the display module 1501 includes:
a display unit 15011, configured to play multimedia data based on the play interface, and display the comment information and a skip button corresponding to the comment information; the jump operation is a trigger operation of the jump button.
Optionally, the display module 1501 is configured to display a return option based on the play interface;
a skipping module 1503, configured to skip the multimedia data from the second playing time point to the first playing time point when the triggering operation for the returning option is detected.
Optionally, referring to fig. 16, the time point obtaining module 1502 includes:
A first query unit 15021, configured to query the association relationship between the comment information and the playing time point, to obtain the playing time point associated with the comment information as the second playing time point; or,
The multimedia data is video data, and the second query unit 15022 is configured to query an association relationship between the comment information and the video frame to obtain a video frame associated with the comment information, and determine a playing time point corresponding to the video frame as a second playing time point.
Optionally, referring to fig. 16, the apparatus further comprises:
The first establishing module 1504 is configured to, when the comment information includes a vocabulary for indicating a playing time point, establish an association relationship between the comment information and the playing time point indicated by the vocabulary.
Optionally, the first establishing module 1504 is further configured to:
And when the comment information comprises words used for indicating the playing time point and the playing time point is positioned between the starting time point and the ending time point of the multimedia data, establishing the incidence relation between the comment information and the playing time point indicated by the words.
Optionally, referring to fig. 15, the apparatus further comprises:
The set obtaining module 1505 is used for obtaining an information set of the multimedia data, where the information set includes at least one piece of text information and a playing time point corresponding to each piece of text information, and each piece of text information is matched with the voice information corresponding to the playing time point in the multimedia data;
The first establishing module 1504 is further configured to establish an association relationship between the comment information and a playing time point corresponding to any piece of text information when the similarity between the comment information and any piece of text information in the information set is greater than a preset similarity.
Optionally, referring to fig. 16, the apparatus further comprises:
The word segmentation module 1506 is used for segmenting words of the comment information to obtain at least one word of the comment information;
The word segmentation module 1506 is further configured to segment each piece of text information in the information set to obtain at least one vocabulary of the text information;
a first similarity obtaining module 1507, configured to obtain a similarity between the comment information and the text information according to the at least one vocabulary of the comment information and the at least one vocabulary of the text information.
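The embodiment does not fix the similarity function between the two vocabulary sets; as one illustrative choice (not prescribed by the application), the similarity could be computed as Jaccard overlap:

```python
def jaccard_similarity(comment_words, text_words):
    """One possible similarity between the comment information's vocabulary
    and a piece of text information's vocabulary: the Jaccard overlap of the
    two word sets (intersection size over union size)."""
    a, b = set(comment_words), set(text_words)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```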
Optionally, the multimedia data is video data, and the set acquisition module is further configured to extract subtitle information included in each video frame and the playing time point corresponding to each video frame from the video data; or,
The multimedia data is audio data, and the set acquisition module is further configured to extract text information included in each frame of audio data and a playing time point corresponding to each frame of audio data from the audio data.
Optionally, the apparatus further comprises:
A video frame obtaining module 1508, configured to obtain a plurality of video frames included in the video data;
A second establishing module 1509, configured to establish an association relationship between the comment information and any one of the video frames when the similarity between the comment information and any one of the video frames is greater than a preset similarity.
Optionally, the apparatus further comprises:
The semantic feature determining module 1510 is configured to perform word segmentation on the comment information to obtain at least one vocabulary of the comment information, and determine a first semantic feature of the comment information according to the at least one vocabulary;
The semantic feature determining module 1510 is further configured to, for each video frame of the plurality of video frames, obtain a second semantic feature of the video frame based on the feature extraction model;
A second similarity obtaining module 1511, configured to obtain the similarity between the comment information and the video frame according to the first semantic feature and the second semantic feature.
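The embodiment likewise leaves the feature-comparison function open; one common illustrative choice is the cosine similarity between the first and second semantic feature vectors:

```python
import math

def cosine_similarity(first_feature, second_feature):
    """Similarity between the comment's first semantic feature and a video
    frame's second semantic feature, computed as the cosine of the angle
    between the two vectors; zero vectors yield 0.0 by convention."""
    dot = sum(x * y for x, y in zip(first_feature, second_feature))
    na = math.sqrt(sum(x * x for x in first_feature))
    nb = math.sqrt(sum(y * y for y in second_feature))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)
```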
All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present application, and details are not described herein again.
It should be noted that: in the multimedia data playing device provided in the foregoing embodiment, when playing multimedia data, only the division of the functional modules is exemplified, and in practical applications, the functions may be allocated by different functional modules according to needs, that is, the internal structure of the multimedia data playing device is divided into different functional modules to complete all or part of the functions described above. In addition, the embodiment of the multimedia data playing apparatus provided in the foregoing embodiment and the embodiment of the multimedia data playing method belong to the same concept, and specific implementation processes thereof are described in the method embodiment and are not described herein again.
Fig. 17 is a schematic structural diagram of a terminal according to an embodiment of the present application. The terminal 1700 may be a portable mobile terminal such as: a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group audio Layer III, motion Picture Experts compression standard audio Layer 3), an MP4 player (Moving Picture Experts Group audio Layer IV, motion Picture Experts compression standard audio Layer 4), a notebook computer, a desktop computer, a head-mounted device, or any other intelligent terminal. Terminal 1700 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, and the like.
In general, terminal 1700 includes: a processor 1701 and a memory 1702.
The processor 1701 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1701 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1701 may also include a main processor and a coprocessor; the main processor, also called a CPU (Central Processing Unit), is a processor for processing data in an awake state, and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1701 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1701 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 1702 may include one or more computer-readable storage media, which may be non-transitory. The memory 1702 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 1702 is used to store at least one instruction, which is executed by the processor 1701 to implement the multimedia data playing method provided by the method embodiments of the present application.
In some embodiments, terminal 1700 may also optionally include: a peripheral interface 1703 and at least one peripheral. The processor 1701, memory 1702 and peripheral interface 1703 may be connected by buses or signal lines. Various peripheral devices may be connected to peripheral interface 1703 by a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 1704, a touch display screen 1705, a camera assembly 1706, an audio circuit 1707, a positioning assembly 1708, and a power supply 1709.
The peripheral interface 1703 may be used to connect at least one peripheral associated with I/O (Input/Output) to the processor 1701 and the memory 1702. In some embodiments, the processor 1701, memory 1702, and peripheral interface 1703 are integrated on the same chip or circuit board; in some other embodiments, any one or both of the processor 1701, the memory 1702, and the peripheral interface 1703 may be implemented on separate chips or circuit boards, which are not limited in this embodiment.
The radio frequency circuit 1704 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1704 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 1704 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1704 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1704 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1704 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 1705 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1705 is a touch display screen, the display screen 1705 also has the ability to capture touch signals on or above the surface of the display screen 1705. The touch signal may be input as a control signal to the processor 1701 for processing. At this point, the display screen 1705 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1705, disposed on the front panel of terminal 1700; in other embodiments, there may be at least two display screens 1705, each disposed on a different surface of terminal 1700 or in a folded design; in still other embodiments, the display screen 1705 may be a flexible display disposed on a curved surface or a folded surface of terminal 1700. The display screen 1705 may even be arranged in a non-rectangular irregular figure, namely, a shaped screen. The display screen 1705 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or other materials.
The camera assembly 1706 is used to capture images or video. Optionally, camera assembly 1706 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 1706 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. The dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.
The audio circuit 1707 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, inputting the electric signals into the processor 1701 for processing, or inputting the electric signals into the radio frequency circuit 1704 for voice communication. For stereo capture or noise reduction purposes, multiple microphones may be provided, each at a different location of terminal 1700. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 1701 or the radio frequency circuit 1704 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 1707 may also include a headphone jack.
The positioning component 1708 is used to locate the current geographic location of terminal 1700 to implement navigation or LBS (Location Based Service). The positioning component 1708 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 1709 is used to power the various components in terminal 1700. The power supply 1709 may be an alternating-current supply, a direct-current supply, a disposable battery, or a rechargeable battery. When power supply 1709 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast-charging technology.
In some embodiments, terminal 1700 also includes one or more sensors 1710. The one or more sensors 1710 include, but are not limited to: acceleration sensor 1711, gyro sensor 1712, pressure sensor 1713, fingerprint sensor 1714, optical sensor 1715, and proximity sensor 1716.
The acceleration sensor 1711 can detect the magnitude of acceleration on three coordinate axes of the coordinate system established with the terminal 1700. For example, the acceleration sensor 1711 may be used to detect components of gravitational acceleration in three coordinate axes. The processor 1701 may control the touch display screen 1705 to display a user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1711. The acceleration sensor 1711 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 1712 may detect the body direction and rotation angle of terminal 1700, and the gyro sensor 1712 may cooperate with the acceleration sensor 1711 to acquire a 3D motion of the user on terminal 1700. The processor 1701 may perform the following functions based on the data collected by the gyro sensor 1712: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization during photographing, game control, and inertial navigation.
The pressure sensor 1713 may be disposed on a side frame of terminal 1700 and/or at a lower layer of the touch display screen 1705. When the pressure sensor 1713 is disposed on the side frame of terminal 1700, the user's grip signal on terminal 1700 can be detected, and the processor 1701 performs left-right hand recognition or shortcut operation according to the grip signal collected by the pressure sensor 1713. When the pressure sensor 1713 is disposed at the lower layer of the touch display screen 1705, the processor 1701 controls an operability control on the UI interface according to the user's pressure operation on the touch display screen 1705. The operability control comprises at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 1714 is configured to capture a fingerprint of the user; the processor 1701 identifies the user's identity based on the fingerprint captured by the fingerprint sensor 1714, or the fingerprint sensor 1714 itself identifies the user's identity based on the captured fingerprint. Upon identifying that the user's identity is a trusted identity, the processor 1701 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 1714 may be disposed on the front, back, or side of terminal 1700. When a physical key or vendor logo is provided on terminal 1700, the fingerprint sensor 1714 may be integrated with the physical key or vendor logo.
The optical sensor 1715 is used to collect the ambient light intensity. In one embodiment, the processor 1701 may control the display brightness of the touch display screen 1705 based on the ambient light intensity collected by the optical sensor 1715. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 1705 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 1705 is turned down. In another embodiment, the processor 1701 may also dynamically adjust the shooting parameters of the camera assembly 1706 according to the ambient light intensity collected by the optical sensor 1715.
The proximity sensor 1716, also known as a distance sensor, is typically disposed on the front panel of terminal 1700. The proximity sensor 1716 is used to collect the distance between the user and the front face of terminal 1700. In one embodiment, when the proximity sensor 1716 detects that the distance between the user and the front face of terminal 1700 gradually decreases, the processor 1701 controls the touch display screen 1705 to switch from the screen-on state to the screen-off state; when the proximity sensor 1716 detects that the distance between the user and the front face of terminal 1700 gradually increases, the processor 1701 controls the touch display screen 1705 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the architecture shown in fig. 17 is not intended to be limiting with respect to terminal 1700, and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be employed.
Fig. 18 is a schematic structural diagram of a server according to an embodiment of the present application, where the server 1800 may have a relatively large difference due to different configurations or performances, and may include one or more processors (CPUs) 1801 and one or more memories 1802, where the memory 1802 stores at least one instruction, and the at least one instruction is loaded and executed by the processors 1801 to implement the methods provided by the foregoing method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface, so as to perform input/output, and the server may also include other components for implementing the functions of the device, which are not described herein again.
The server 1800 may be configured to perform the steps performed by the server in the multimedia data playing method.
The embodiment of the present application further provides a multimedia data playing apparatus, which includes a processor and a memory, where the memory stores at least one program code, and the at least one program code is loaded and executed by the processor to implement the operations of the multimedia data playing method of the foregoing embodiments.
An embodiment of the present application further provides a computer-readable storage medium, in which at least one program code is stored, and the at least one program code is loaded and executed by a processor to implement the operations of the multimedia data playing method of the foregoing embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting the present application; any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall be included in the protection scope of the present application.

Claims (15)

1. A method for playing multimedia data, the method comprising:
Playing multimedia data based on a playing interface, and displaying comment information of the multimedia data;
When the skip operation of the comment information is detected at a first playing time point of the multimedia data, a second playing time point associated with the comment information is obtained;
Skipping the multimedia data from the first play time point to the second play time point.
2. The method of claim 1, wherein the playing multimedia data based on the playing interface and displaying comment information of the multimedia data comprises:
Playing the multimedia data based on the playing interface, and displaying the comment information and a skip button corresponding to the comment information; the jump operation is a trigger operation of the jump button.
3. The method of claim 1, wherein after jumping the multimedia data from the first play time point to the second play time point, the method further comprises:
Displaying a return option based on the playing interface;
Jumping the multimedia data from the second playing time point to the first playing time point when a triggering operation on the return option is detected.
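By way of a minimal sketch (not part of the claims), the jump-and-return behaviour of claims 1 to 3 can be modelled as a player remembering the first playing time point when the user jumps to the comment's time point and restoring it when the return option is triggered. The class and method names below are hypothetical.

```python
class PlayerJumpState:
    """Minimal sketch of the jump/return behaviour in claims 1-3."""

    def __init__(self):
        self.return_point = None  # first playing time point, saved on jump

    def jump_to_comment(self, current_time, comment_time):
        # Remember the first playing time point before jumping.
        self.return_point = current_time
        return comment_time  # seek target: the second playing time point

    def jump_back(self):
        # Triggered by the return option; restores the first time point.
        target, self.return_point = self.return_point, None
        return target
```

The returned value would be handed to the player's seek function; actual playback control is outside this sketch.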
4. The method of claim 1, wherein the obtaining of the second play time point associated with the comment information comprises:
Querying an association relationship between the comment information and a playing time point, and obtaining the playing time point associated with the comment information as the second playing time point; or,
When the multimedia data is video data, querying an association relationship between the comment information and video frames, obtaining the video frame associated with the comment information, and determining the playing time point corresponding to the video frame as the second playing time point.
5. The method according to claim 4, wherein before the querying of the association relationship between the comment information and the playing time point to obtain the playing time point associated with the comment information as the second playing time point, the method further comprises:
When the comment information comprises a word used for indicating a playing time point, establishing the association relationship between the comment information and the playing time point indicated by the word.
6. The method of claim 5, further comprising:
When the comment information comprises a word used for indicating a playing time point and the playing time point is located between the start time point and the end time point of the multimedia data, establishing the association relationship between the comment information and the playing time point indicated by the word.
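As an illustrative sketch of claims 5 and 6 (the claims do not prescribe any particular parsing rule), a comment containing a word indicating a playing time point, such as "3:45", could be recognised with a regular expression and associated only when it falls within the media duration. The pattern and function name are hypothetical.

```python
import re

# Matches words like "3:45" or "12:07" (minutes:seconds).
TIME_PATTERN = re.compile(r'\b(\d{1,2}):([0-5]\d)\b')

def extract_time_point(comment, duration_seconds):
    """Find a word indicating a playing time point; keep it only if it
    lies between the start and end time points of the multimedia data."""
    match = TIME_PATTERN.search(comment)
    if not match:
        return None
    seconds = int(match.group(1)) * 60 + int(match.group(2))
    return seconds if 0 <= seconds <= duration_seconds else None
```

A comment such as "the scene at 3:45 is great" would yield 225 seconds for a ten-minute video, while "15:00" would be discarded as beyond the end time point.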
7. The method according to claim 4, wherein before the querying of the association relationship between the comment information and the playing time point to obtain the playing time point associated with the comment information as the second playing time point, the method further comprises:
Acquiring an information set of the multimedia data, wherein the information set comprises at least one piece of text information and a playing time point corresponding to each piece of text information, and each piece of text information is matched with voice information of the corresponding playing time point in the multimedia data;
When the similarity between the comment information and any piece of text information in the information set is greater than a preset similarity, establishing an association relationship between the comment information and the playing time point corresponding to the piece of text information.
8. The method of claim 7, further comprising:
Performing word segmentation on the comment information to obtain at least one vocabulary of the comment information;
For each piece of text information in the information set, performing word segmentation on the text information to obtain at least one vocabulary of the text information;
Acquiring the similarity between the comment information and the text information according to the at least one vocabulary of the comment information and the at least one vocabulary of the text information.
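One simple way to score the vocabulary-based similarity of claim 8 (the claim does not fix a particular measure) is Jaccard overlap between the two vocabulary sets, scanning the information set for the first entry above the preset similarity. The function names and threshold are hypothetical.

```python
def jaccard_similarity(comment_words, text_words):
    """Vocabulary-overlap similarity between comment and text information."""
    a, b = set(comment_words), set(text_words)
    return len(a & b) / len(a | b) if a | b else 0.0

def find_associated_time_point(comment_words, info_set, threshold=0.5):
    """info_set: list of (vocabulary_list, play_time) pairs built from the
    subtitle or speech text of the multimedia data.  Returns the playing
    time point of the first sufficiently similar text, or None."""
    for words, play_time in info_set:
        if jaccard_similarity(comment_words, words) > threshold:
            return play_time
    return None
```

A real implementation would use a Chinese word-segmentation tool and possibly weighted term vectors; the set-overlap form is only the simplest instance of the claimed vocabulary comparison.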
9. The method of claim 7, wherein the obtaining of the information set of the multimedia data comprises:
When the multimedia data is video data, extracting, from the video data, subtitle information contained in each video frame and a playing time point corresponding to each video frame; or,
When the multimedia data is audio data, extracting, from the audio data, text information contained in each frame of audio data and a playing time point corresponding to each frame of audio data.
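The information-set construction of claim 9 can be sketched as collecting, per frame, the recognised text together with its playing time point. The `extract_text` callable below stands in for a subtitle OCR step (video) or a speech-recognition step (audio) and is purely hypothetical.

```python
def build_info_set(frames, extract_text):
    """Sketch of claim 9: build the information set as (text, time) pairs.

    frames: iterable of (play_time, frame) pairs from the media data.
    extract_text: OCR/ASR stand-in mapping a frame to its text, or '' if none.
    """
    info_set = []
    for time_point, frame in frames:
        text = extract_text(frame)
        if text:  # skip frames with no subtitle/speech text
            info_set.append((text, time_point))
    return info_set
```

The resulting pairs are what claim 7's similarity matching consumes.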
10. The method of claim 4, wherein before querying the association relationship between the comment information and the video frame and obtaining the video frame associated with the comment information, the method further comprises:
Acquiring a plurality of video frames included in the video data;
When the similarity between the comment information and any one of the plurality of video frames is greater than a preset similarity, establishing an association relationship between the comment information and that video frame.
11. The method of claim 10, further comprising:
Performing word segmentation on the comment information to obtain at least one vocabulary of the comment information, and determining a first semantic feature of the comment information according to the at least one vocabulary;
For each video frame in the plurality of video frames, obtaining a second semantic feature of the video frame based on a feature extraction model;
Acquiring the similarity between the comment information and the video frame according to the first semantic feature and the second semantic feature.
12. A multimedia data playback apparatus, comprising:
A display module, configured to play multimedia data based on a playing interface and display comment information of the multimedia data;
A time point acquisition module, configured to acquire a second playing time point associated with the comment information when a skip operation on the comment information is detected at a first playing time point of the multimedia data;
A skipping module, configured to skip the multimedia data from the first playing time point to the second playing time point.
13. The apparatus of claim 12, wherein the display module comprises:
The display unit is used for playing the multimedia data based on the playing interface and displaying the comment information and a skip button corresponding to the comment information; the jump operation is a trigger operation of the jump button.
14. a multimedia data playback apparatus, comprising a processor and a memory, wherein at least one program code is stored in the memory, and wherein the at least one program code is loaded and executed by the processor to implement the multimedia data playback method according to any one of claims 1 to 11.
15. a computer-readable storage medium having at least one program code stored therein, the at least one program code being loaded and executed by a processor to implement the multimedia data playback method as claimed in any one of claims 1 to 11.
CN201910927850.9A 2019-09-27 2019-09-27 Multimedia data playing method, device and storage medium Active CN110572716B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910927850.9A CN110572716B (en) 2019-09-27 2019-09-27 Multimedia data playing method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910927850.9A CN110572716B (en) 2019-09-27 2019-09-27 Multimedia data playing method, device and storage medium

Publications (2)

Publication Number Publication Date
CN110572716A true CN110572716A (en) 2019-12-13
CN110572716B CN110572716B (en) 2022-05-17

Family

ID=68783118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910927850.9A Active CN110572716B (en) 2019-09-27 2019-09-27 Multimedia data playing method, device and storage medium

Country Status (1)

Country Link
CN (1) CN110572716B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111294660A (en) * 2020-03-12 2020-06-16 咪咕文化科技有限公司 Video clip positioning method, server, client and electronic equipment
CN113194336A (en) * 2021-04-30 2021-07-30 北京奇艺世纪科技有限公司 Comment display method and device
CN113407740A (en) * 2021-05-11 2021-09-17 北京达佳互联信息技术有限公司 Multimedia collection management method and device and electronic equipment
CN113411680A (en) * 2021-06-18 2021-09-17 腾讯科技(深圳)有限公司 Multimedia resource playing method, device, terminal and storage medium
CN114697756A (en) * 2022-04-07 2022-07-01 脸萌有限公司 Display method, display device, terminal equipment and medium
WO2024016703A1 (en) * 2022-07-22 2024-01-25 腾讯科技(深圳)有限公司 Data processing method and apparatus, computer device, and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100318520A1 (en) * 2009-06-01 2010-12-16 Telecordia Technologies, Inc. System and method for processing commentary that is related to content
CN103442300A (en) * 2013-08-27 2013-12-11 Tcl集团股份有限公司 Audio and video skip playing method and device
CN105338419A (en) * 2015-10-29 2016-02-17 网易传媒科技(北京)有限公司 Subtitle collection generating method and apparatus
CN108156506A (en) * 2017-12-26 2018-06-12 优酷网络技术(北京)有限公司 The progress adjustment method and device of barrage information


Also Published As

Publication number Publication date
CN110572716B (en) 2022-05-17

Similar Documents

Publication Publication Date Title
CN110572716B (en) Multimedia data playing method, device and storage medium
CN110322760B (en) Voice data generation method, device, terminal and storage medium
CN110163066B (en) Multimedia data recommendation method, device and storage medium
CN110377195B (en) Method and device for displaying interaction function
WO2022057435A1 (en) Search-based question answering method, and storage medium
CN109144346B (en) Song sharing method and device and storage medium
CN110933468A (en) Playing method, playing device, electronic equipment and medium
CN112256181B (en) Interaction processing method and device, computer equipment and storage medium
CN113411680A (en) Multimedia resource playing method, device, terminal and storage medium
CN111432245B (en) Multimedia information playing control method, device, equipment and storage medium
CN110798327B (en) Message processing method, device and storage medium
CN113918767A (en) Video clip positioning method, device, equipment and storage medium
CN112764600B (en) Resource processing method, device, storage medium and computer equipment
CN110837557B (en) Abstract generation method, device, equipment and medium
CN112069350A (en) Song recommendation method, device, equipment and computer storage medium
CN113301444B (en) Video processing method and device, electronic equipment and storage medium
CN115905374A (en) Application function display method and device, terminal and storage medium
CN111554314A (en) Noise detection method, device, terminal and storage medium
CN111782767A (en) Question answering method, device, equipment and storage medium
CN111367492A (en) Webpage display method and device and storage medium
CN112311652A (en) Message sending method, device, terminal and storage medium
CN110688046A (en) Song playing method and device and storage medium
CN114489559B (en) Audio playing method, audio playing processing method and device
CN111368103B (en) Multimedia data playing method, device, equipment and storage medium
CN110277105B (en) Method, device and system for eliminating background audio data

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40018610

Country of ref document: HK

SE01 Entry into force of request for substantive examination
GR01 Patent grant