CN111581403A - Data processing method and device, electronic equipment and storage medium - Google Patents
- Publication number: CN111581403A (application CN202010251438.2A)
- Authority
- CN
- China
- Prior art keywords
- multimedia content
- target
- index
- index information
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/41—Indexing; Data structures therefor; Storage structures
- G06F16/43—Querying
- G06F16/44—Browsing; Visualisation therefor
Abstract
The present disclosure provides a data processing method and apparatus, an electronic device, and a storage medium. The method includes: acquiring target multimedia content; acquiring index information for indexing multimedia content; matching the index information with the target multimedia content, and determining, based on the matching result, the local multimedia content in the target multimedia content that corresponds to the index information; and processing the local multimedia content while presenting the target multimedia content, wherein processing the local multimedia content includes playing only the local multimedia content, or fast-forwarding or skipping the local multimedia content. Embodiments of the disclosure enable finer-grained processing of multimedia content.
Description
Technical Field
The present disclosure relates to the field of information processing, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development and popularization of information technology, people encounter a wide variety of multimedia content in daily life, for example when watching local movies on a computer or online television shows on a mobile phone. In many cases, the presented multimedia contains content the user does not want presented. For example, while a family watches a film, parents may not want language harmful to children's psychological development to appear in the film. Targeted screening of multimedia content therefore has significant application value. In the prior art, multimedia content is screened only at a coarse level, so the processing built on that screening is correspondingly limited.
Disclosure of Invention
An object of the present disclosure is to provide a data processing method, apparatus, electronic device, and storage medium that enable finer-grained processing of multimedia content.
According to an aspect of the disclosed embodiments, a data processing method is disclosed, the method comprising:
acquiring target multimedia content;
acquiring index information for indexing multimedia content;
matching the index information with the target multimedia content, and determining local multimedia content corresponding to the index information in the target multimedia content based on a matching result;
processing the local multimedia content while presenting the target multimedia content, wherein processing the local multimedia content comprises playing only the local multimedia content, or fast-forwarding or skipping the local multimedia content.
According to an aspect of the disclosed embodiments, a data processing apparatus is disclosed, the apparatus comprising:
the first acquisition module is configured to acquire target multimedia content;
a second obtaining module configured to obtain index information for indexing the multimedia content;
the determining module is configured to match the index information with the target multimedia content, and determine local multimedia content corresponding to the index information in the target multimedia content based on a matching result;
a processing module configured to process the local multimedia content when the target multimedia content is presented, wherein processing the local multimedia content includes playing only the local multimedia content, or fast-forwarding or skipping the local multimedia content.
In an exemplary embodiment of the present disclosure, the index information is an index text in a text form, and the apparatus is configured to:
acquiring a target text contained in the target multimedia content;
matching the index text with the target text, and determining the text position of the index text in the target text;
determining the appearance time point of the index text in the target multimedia content based on the text position;
and determining local multimedia content corresponding to the index text in the target multimedia content based on the appearance time point.
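The text-based flow above (match the index text against the target text, locate the text position, then map it to an appearance time point) can be sketched as follows. This is a minimal illustration that assumes the target text is available as timestamped subtitle entries; the `SubtitleEntry` structure and the midpoint heuristic are hypothetical simplifications, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SubtitleEntry:
    start: float  # display start, in seconds
    end: float    # display end, in seconds
    text: str

def find_appearance_times(index_text: str,
                          subtitles: List[SubtitleEntry]) -> List[float]:
    """Return appearance time points of index_text in the target text.

    Each subtitle entry whose text contains the index text contributes the
    midpoint of its display interval as an approximate appearance time.
    """
    times = []
    for entry in subtitles:
        if index_text in entry.text:
            times.append((entry.start + entry.end) / 2)
    return times

subs = [
    SubtitleEntry(0.0, 2.0, "hello there"),
    SubtitleEntry(10.0, 12.0, "vocabulary A appears here"),
]
print(find_appearance_times("vocabulary A", subs))  # [11.0]
```

Each returned time point can then be expanded into a local time interval to delimit the local multimedia content.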
In an exemplary embodiment of the present disclosure, the index information is an index text in a text form, and the apparatus is configured to:
acquiring a target audio contained in the target multimedia content;
acquiring a target text corresponding to the target audio based on a preset audio recognition technology;
matching the index text with the target text, and determining the text position of the index text in the target text;
determining the appearance time point of the index text in the target multimedia content based on the text position;
and determining local multimedia content corresponding to the index text in the target multimedia content based on the appearance time point.
In an exemplary embodiment of the present disclosure, the index information is an index text in a text form, and the apparatus is configured to:
acquiring a target video contained in the target multimedia content;
acquiring image texts respectively contained in each image frame in the target video based on a preset image recognition technology;
matching the index text with each image text respectively, and determining the position of an image frame of the index text in the target video;
determining an appearance time point of the index text in the target multimedia content based on the image frame position;
and determining local multimedia content corresponding to the index text in the target multimedia content based on the appearance time point.
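The frame-based flow above can be sketched similarly. Here the per-frame recognized text (`frame_texts`) is assumed to already exist as the output of some image recognition step; matched frame positions are found by substring search and converted to time points using the video's frame rate.

```python
from typing import List

def find_frame_positions(index_text: str, frame_texts: List[str]) -> List[int]:
    """Return indices of frames whose recognized image text contains index_text."""
    return [i for i, text in enumerate(frame_texts) if index_text in text]

def frame_to_time(frame_index: int, fps: float) -> float:
    """Convert an image-frame position to an appearance time point in seconds."""
    return frame_index / fps

# A frame matched at index 450 in a 25 fps video appears at second 18.
frames = ["", "vocabulary A on a sign", ""]
positions = find_frame_positions("vocabulary A", frames)
print(positions)                  # [1]
print(frame_to_time(450, 25.0))   # 18.0
```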
In an exemplary embodiment of the disclosure, the apparatus is configured to:
acquiring a preset time interval length;
and determining the target multimedia content within a local time interval as the local multimedia content, wherein the local time interval is the time interval whose length is the preset time interval length and whose center is the appearance time point.
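The interval construction above (a window of the preset length centered on the appearance time point) can be sketched as follows; clamping the window to the content's duration is an added assumption to keep the interval within bounds.

```python
def local_interval(appearance_time: float, interval_length: float,
                   duration: float) -> tuple:
    """Return (start, end) of the local time interval: a window of
    interval_length centered on appearance_time, clamped to [0, duration]."""
    half = interval_length / 2
    start = max(0.0, appearance_time - half)
    end = min(duration, appearance_time + half)
    return start, end

print(local_interval(11.0, 2.0, 100.0))  # (10.0, 12.0)
print(local_interval(0.5, 2.0, 100.0))   # (0.0, 1.5)
```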
In an exemplary embodiment of the disclosure, the index information is index audio in the form of audio, and the apparatus is configured to:
acquiring a target audio contained in the target multimedia content;
determining the audio position of the index audio in the target audio based on a preset audio matching technology;
determining an appearance time point of the index audio in the target multimedia content based on the audio position;
and determining local multimedia content corresponding to the index audio in the target multimedia content based on the occurrence time point.
In an exemplary embodiment of the present disclosure, the index information is an index image in the form of an image, and the apparatus is configured to:
acquiring a target video contained in the target multimedia content;
determining the position of an image frame of the index image in the target video based on a preset image matching technology;
determining an appearance time point of the index image in the target multimedia content based on the image frame position;
and determining local multimedia content corresponding to the index image in the target multimedia content based on the appearance time point.
In an exemplary embodiment of the disclosure, the apparatus is configured to: and acquiring pre-stored index information.
In an exemplary embodiment of the disclosure, the apparatus is configured to:
when detecting that an adding button is triggered, entering an adding interface for adding index information;
and acquiring the index information added in the adding interface.
In an exemplary embodiment of the present disclosure, an information input box is displayed in a first area of the add interface, and the apparatus is configured to: and determining the text input in the information input box as the added index information, and acquiring the added index information.
In an exemplary embodiment of the disclosure, candidate index information items are displayed in the second area of the add interface, and the apparatus is configured to: when detecting that a preset first gesture is triggered on a piece of candidate index information in the second area, determine that candidate index information as the added index information, and acquire the added index information.
In an exemplary embodiment of the disclosure, each pre-saved index information is displayed in a third area of the add interface, and the apparatus is configured to:
displaying the added index information in the third area;
when detecting that a preset second gesture is triggered on index information displayed in the third area, deleting the displayed index information from the third area;
and when detecting that the save button in the adding interface is triggered, saving the index information displayed in the third area.
According to an aspect of an embodiment of the present disclosure, there is disclosed a data processing electronic device including: a memory storing computer readable instructions; a processor reading computer readable instructions stored by the memory to perform the method of any of the preceding claims.
According to an aspect of an embodiment of the present disclosure, a computer program medium is disclosed, having computer readable instructions stored thereon, which, when executed by a processor of a computer, cause the computer to perform the method of any of the preceding claims.
In the embodiment of the disclosure, before the target multimedia content is presented, the index information is matched with the target multimedia content in advance to determine the local multimedia content corresponding to the index information; the local multimedia content is then processed while the target multimedia content is presented, where the processing includes play-only, skip, and fast-forward processing, thereby enabling finer-grained processing of the multimedia content.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
It is to be understood that both the foregoing general description and the following detailed description are exemplary, and are not restrictive, of the disclosure.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
FIG. 1 illustrates an architectural diagram according to one embodiment of the present disclosure.
FIG. 2 illustrates an architectural diagram according to one embodiment of the present disclosure.
FIG. 3 shows a flow diagram of a data processing method according to one embodiment of the present disclosure.
FIG. 4 illustrates a multimedia presentation interface integrated with an add button according to one embodiment of the present disclosure.
FIG. 5 illustrates an add interface for adding keywords, according to one embodiment of the present disclosure.
Fig. 6 illustrates a process of inputting a keyword in an add interface of a terminal according to one embodiment of the present disclosure.
Fig. 7 illustrates a process of obtaining keywords from candidate keywords in an add interface of a terminal according to an embodiment of the disclosure.
Fig. 8 illustrates a process of deleting a keyword in an add interface of a terminal according to one embodiment of the present disclosure.
FIG. 9 illustrates a multimedia presentation interface with text displayed in a video frame according to one embodiment of the present disclosure.
FIG. 10 shows a full flow diagram of user interaction with a terminal according to one embodiment of the present disclosure.
FIG. 11 shows a block diagram of a data processing apparatus according to one embodiment of the present disclosure.
FIG. 12 shows a hardware diagram of data processing electronics, according to one embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these example embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The drawings are schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more example embodiments. In the following description, numerous specific details are provided to give a thorough understanding of example embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, steps, and so forth. In other instances, well-known structures, methods, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.
Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
First, some concepts related to the embodiments of the present disclosure will be explained.
The target multimedia content may be all of the multimedia content contained in a multimedia file (for example, when the terminal receives an instruction to play a local movie file, all multimedia content contained in the movie file is determined as the target multimedia content), or part of the multimedia content contained in a multimedia file (for example, when the terminal loads and plays a movie file from the server in real time, the multimedia content contained in the preloaded part is determined as the target multimedia content). Correspondingly, the target text refers to the text in the target multimedia content; the target audio refers to the audio in the target multimedia content; the target video refers to the video in the target multimedia content; and the local multimedia content refers to a part of the target multimedia content.
The index information refers to information used to index multimedia content. In embodiments of the present disclosure, local multimedia content matching the index information is indexed within the target multimedia content according to the index information, and that local multimedia content can then be processed (the processing includes play-only, skip, and fast-forward processing). Specifically, the index information may be index text in text form, so that local multimedia content matching the index text can be indexed in the target multimedia content; it may be index audio in audio form, so that local multimedia content matching the index audio can be indexed in the target multimedia content; or it may be an index image in image form, so that local multimedia content matching the index image can be indexed in the target multimedia content.
An architecture to which embodiments of the present disclosure may be applied is described below with reference to fig. 1 and 2.
FIG. 1 illustrates the architectural components of one embodiment of the present disclosure: the terminal 10. In this embodiment, the main body of execution of the data processing method is the terminal 10.
Specifically, the terminal 10 matches the acquired index information with the acquired target multimedia content, and determines a local multimedia content corresponding to the index information in the target multimedia content; and processing the local multimedia content when the target multimedia content is presented to the user, wherein the processing comprises a play-only processing, a skip processing and a fast-forward processing.
For example: the method comprises the steps that index information is added in advance in a terminal by a user, and the index information is a keyword word A, so that when the terminal plays target multimedia, audio and video corresponding to the word A in the target multimedia can be skipped over.
Specifically, the terminal continuously receives a multimedia data stream of a movie from the server to play the movie online. In the online playing process, on the basis of the current playing content, the terminal can preload the multimedia content for a certain time.
The terminal matches "vocabulary A" with the preloaded multimedia content to determine whether it contains "vocabulary A"; if so, the terminal locates the time point at which "vocabulary A" appears in the movie. When playback approaches that time point, the terminal skips the movie's audio and video from one second before the time point to one second after it, thereby skipping the audio and video corresponding to "vocabulary A".
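The skip behavior in this example (jump over the window from one second before to one second after the located time point) can be sketched as follows; the interval representation and the playhead-jump model are hypothetical simplifications of a real player.

```python
def skip_intervals_for(time_points, margin: float = 1.0):
    """Build skip windows of +/- margin seconds around each located time point."""
    return [(t - margin, t + margin) for t in time_points]

def next_position(current: float, skip_intervals) -> float:
    """If the playhead falls inside a skip window, jump to the window's end;
    otherwise keep playing from the current position."""
    for start, end in skip_intervals:
        if start <= current < end:
            return end
    return current

windows = skip_intervals_for([11.0])   # "vocabulary A" located at t = 11 s
print(windows)                          # [(10.0, 12.0)]
print(next_position(10.5, windows))     # 12.0
print(next_position(5.0, windows))      # 5.0
```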
An advantage of this embodiment is that the data processing logic of the present disclosure is integrated in the terminal, with the terminal as the execution subject, which reduces the requirements on the user's network conditions and broadens the scenarios in which multimedia content can be skipped.
FIG. 2 illustrates the architectural components of one embodiment of the present disclosure: terminal 10, server 20. In this embodiment, the terminal 10 and the server 20 together serve as an execution subject of the data processing method. Wherein, the server 20 is mainly responsible for preprocessing the target multimedia content; the terminal 10 is primarily responsible for rendering the targeted multimedia content.
Specifically, the server 20 matches the acquired index information with the acquired target multimedia content, determines and marks the local multimedia content in the target multimedia content that corresponds to the index information, and then transmits the target multimedia content with the marked local multimedia content to the terminal 10. The terminal 10 then processes the local multimedia content when presenting the target multimedia content to the user, where the processing includes play-only, skip, and fast-forward processing.
For example: the method comprises the steps that index information is added in advance in a terminal by a user and is a keyword word A, and the terminal reports the word A added by the user to a server, so that on the basis of processing of the server, when a target multimedia is played by the terminal, an audio corresponding to the word A in the multimedia can be skipped, and a video is still normally reserved.
Specifically, after receiving a request of the terminal for requesting on-line playing of the movie, the server matches the vocabulary A with the multimedia content of the movie, determines whether the multimedia content of the movie has the vocabulary A, and if the multimedia content of the movie has the vocabulary A, the server locates and marks the time point of the vocabulary A appearing in the movie.
The server continuously transmits, to the terminal, the multimedia content marked with the time point at which "vocabulary A" appears, so that during online playback the terminal, according to the mark, does not play the movie's audio from one second before the time point to one second after it, while the video continues to play normally, thereby screening out the audio corresponding to "vocabulary A".
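The audio-only screening in this example (mute the audio inside the marked window while the video keeps playing) can be sketched as a gain function applied during playback; this gain-based model is an illustrative assumption, not the disclosure's implementation.

```python
def audio_gain(current: float, mute_intervals) -> float:
    """Return the audio gain at the current playback position:
    0.0 (muted) inside a marked interval, 1.0 (normal) elsewhere.
    The video track is unaffected and continues to play."""
    for start, end in mute_intervals:
        if start <= current < end:
            return 0.0
    return 1.0

marked = [(10.0, 12.0)]  # one second on either side of t = 11 s
print(audio_gain(11.0, marked))  # 0.0
print(audio_gain(9.0, marked))   # 1.0
```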
The embodiment has the advantages that the server is responsible for preprocessing the multimedia content, so that the accuracy and the efficiency of preprocessing the multimedia content are improved, and the accuracy and the efficiency of screening the multimedia content are improved.
It should be noted that the terminal shown in the above architecture may be provided with a client for presenting multimedia content, where the client may be a video client, a browser client, an instant messaging client, a content sharing client, or other client having a multimedia content presentation function; the terminal can be a mobile phone, a tablet computer, a desktop computer, a notebook computer, an intelligent television, a game console or other hardware terminals supporting the client. The server shown in the above architecture may be a single server, a server cluster formed by multiple servers, or a cloud server.
It should be noted that the above architecture embodiments are only exemplary and should not limit the functionality and scope of the disclosure. It is understood that, in addition to the architectures shown in fig. 1 and 2, other architectures may exist depending on the specific application scenario. For example: in addition to being responsible for "matching the index information with the target multimedia content" and "determining the local multimedia content", the server may also be responsible for "processing the local multimedia content", with the terminal solely responsible for presenting the received multimedia content. Specifically, after determining the local multimedia content, the server processes it and then sends the terminal the target multimedia content in which the local multimedia content has been processed. The terminal presents the multimedia content it receives as-is, namely the target multimedia content with the processed local multimedia content.
Specific implementations of embodiments of the present disclosure are described in detail below.
First, it should be noted from the above description of the architecture of the embodiments of the present disclosure that the execution subject may be a terminal, a server, or a combination of the two. For brevity, the following description mainly takes "a terminal as the execution subject" as an example; this does not mean that the execution subject of the embodiments can only be a terminal.
It should be noted that more than one piece of local multimedia content may be determined by matching the index information with the target multimedia content (for example, "vocabulary A" may be determined to appear 3 times at different positions in the movie, giving 3 corresponding pieces of local multimedia content). For brevity, the following description mainly takes "determining one piece of local multimedia content" as an example; this does not mean that the embodiments apply only to the case where one piece of local multimedia content is determined.
Fig. 3 shows a data processing method according to an embodiment of the present disclosure, where an exemplary execution subject is a terminal, and the method includes:
step S310, obtaining target multimedia content;
step S320, acquiring index information for indexing the multimedia content;
step S330, matching the index information with the target multimedia content, and determining local multimedia content corresponding to the index information in the target multimedia content based on a matching result;
step S340, processing the local multimedia content when the target multimedia content is presented, wherein processing the local multimedia content includes playing only the local multimedia content, or fast-forwarding or skipping the local multimedia content.
In the embodiment of the disclosure, before the target multimedia content is presented, the index information is matched with the target multimedia content in advance to determine the local multimedia content corresponding to the index information; the local multimedia content is then processed while the target multimedia content is presented, where the processing includes play-only, skip, and fast-forward processing, thereby enabling finer-grained processing of the multimedia content.
In the embodiment of the disclosure, after acquiring target multimedia content and index information, a terminal matches the index information with the target multimedia content, and further determines local multimedia content corresponding to the index information in the target multimedia content. According to different application scenarios, the target multimedia content may be all multimedia contents contained in the multimedia file or a part of multimedia contents contained in the multimedia file. The sources of the target multimedia content have been described in the above description, and thus will not be described in detail herein.
In one embodiment, obtaining index information for indexing multimedia content includes: and acquiring the index information saved in advance.
In this embodiment, the terminal stores index information in advance. The index information may have been saved via the terminal's default settings, or customized by the user in advance and then saved in the terminal.
For example: the terminal defaults to set the vocabulary A as index information and stores the index information. When a movie needs to be played, the terminal extracts the vocabulary A to match the vocabulary A with the multimedia content of the movie, and determines the local multimedia content corresponding to the vocabulary A.
During operation of the terminal, the user opens the terminal's settings interface and sets "vocabulary B" as custom index information, which the terminal then saves as index information. After this customization, when a movie needs to be played, the terminal extracts "vocabulary A" to match it against the movie's multimedia content and determine the local multimedia content corresponding to "vocabulary A"; it also extracts "vocabulary B" to match it against the movie's multimedia content and determine the local multimedia content corresponding to "vocabulary B".
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
In one embodiment, obtaining index information for indexing multimedia content includes:
when detecting that an adding button is triggered, entering an adding interface for adding index information;
and acquiring the index information added from the adding interface.
In this embodiment, an add button is integrated in the terminal in advance. When the adding button is detected to be triggered, the terminal enters an adding interface. The adding interface can be used for the user to configure and add the index information, so that the terminal obtains the index information added by the adding interface.
For example: the terminal displays a keyword button at the bottom of a multimedia presentation interface in the multimedia presentation interface for presenting the target multimedia content to prompt a user to add the keyword by clicking the keyword button, so that the terminal can process local multimedia content matched with the added keyword when the target multimedia content is played.
And when the user clicks the key word button, the terminal enters an adding interface. The adding interface comprises various components and buttons so that a user can add the index information in the adding interface.
It should be noted that the entry point of the add interface does not affect the execution logic. The add button may be located in the multimedia presentation interface or in other interfaces; wherever it is located, index information added through the add interface is globally valid in the terminal, and the terminal processes it according to the same logic when presenting the target multimedia content.
FIG. 4 illustrates a multimedia presentation interface integrated with an add button according to an embodiment of the present disclosure. In this embodiment, the terminal plays a TV series in the multimedia presentation interface. Integrated at the bottom of the interface are an episode-selection button ("selection"), a definition-adjustment button ("1080P"), a playback-speed button ("double speed"), and an add button ("keyword").
The user can click the "keyword" button to bring up the add interface for adding keywords shown in fig. 5 and add keywords there, so that when playback of the TV series continues, the terminal processes the text, audio, or video matching the keywords the user added.
FIG. 5 illustrates an adding interface for adding keywords according to an embodiment of the disclosure. In this embodiment, the terminal displays, in the top area of the adding interface, a text input box for inputting a keyword and a "confirm" button for determining the input keyword as index information; displays, in the middle area of the adding interface, each keyword that has been determined as index information at the current time point, where the displayed keywords may include pre-stored keywords as well as keywords input and added by the user this time; and displays, in the bottom area of the adding interface, a "save keyword" button for saving the keywords displayed in the middle area.
The adding interface can be implemented through an Activity (an interface for interacting with a user) in the Android system. After the user clicks the "keyword" button in the multimedia presentation interface, the terminal opens the corresponding Activity and enters the adding interface. The user can input a keyword to be skipped in a text input box (EditText), and the input keyword is displayed in the text input box. Logic for saving the input keyword in memory is added to the click-event response of the "confirm" button, so that when the user clicks the "confirm" button, the keyword is first saved in memory. Meanwhile, a display text (TextView) is created from the keyword and displayed in the middle area of the adding interface, prompting the user with all the keywords input up to the current time point. A displayed text in the middle area can be deleted by a long press. Saving of the keywords is implemented through a Button; after the button is clicked, all input keywords are saved in the memory of the terminal, so that when the terminal needs to match the target multimedia content against the keywords, it can match against all the saved keywords one by one.
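As an illustration only, the in-memory keyword handling described above can be sketched in a platform-neutral form. This is a minimal Python sketch, not the Android implementation the patent describes; the class and method names are hypothetical:

```python
class KeywordStore:
    """Sketch of the keyword flow: confirm keywords, delete displayed
    ones, save them all, then match saved keywords against content."""

    def __init__(self):
        self._pending = []   # keywords confirmed at the current time point
        self._saved = []     # keywords persisted via the "save keyword" button

    def confirm(self, keyword):
        # "confirm" button: keep the keyword in memory and display it
        if keyword and keyword not in self._pending:
            self._pending.append(keyword)

    def delete(self, keyword):
        # long press on a displayed keyword: remove it from the middle area
        if keyword in self._pending:
            self._pending.remove(keyword)

    def save_all(self):
        # "save keyword" button: persist every currently displayed keyword
        self._saved = list(self._pending)

    def matches(self, text):
        # match a piece of target content against every saved keyword
        return [kw for kw in self._saved if kw in text]
```

The one-by-one matching in `matches` mirrors the last step described above: each saved keyword is compared against the target content in turn.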
It should be noted that the terminal interfaces in fig. 4 and 5 are only exemplary illustrations, and should not limit the functions and the scope of the present disclosure.
In one embodiment, an information input box is displayed in the first area of the adding interface. Acquiring the index information added in the adding interface includes: determining the information input in the information input box as the added index information, and acquiring the added index information.
In this embodiment, the first area of the add interface in the terminal (for example, the top area of the add interface) is mainly used for the user to input the index information. The form of the information input box and the form of the index information are different according to different application scenes.
Specifically, when the information input box is used for inputting text, the user can input text through the information input box, so that the terminal determines the text as the index information added by the user and acquires it. For example: the user clicks the information input box in the adding interface, and the terminal pops up a keyboard for text input, so that the user inputs text through the keyboard. After the text is input, the terminal determines the text as the index information and acquires it.
When the information input box is used for inputting audio, the user can input audio through the information input box, so that the terminal determines the audio as the index information added by the user and acquires it. For example: the user long-presses the information input box in the adding interface, and the terminal keeps the microphone activated during the long press, so that the user inputs audio through the microphone. When the user releases the information input box, marking the end of the audio input, the terminal determines the audio as the index information and acquires it.
When the information input box is used for inputting an image, the user can input an image through the information input box, so that the terminal determines the image as the index information added by the user and acquires it. For example: the user clicks the information input box in the adding interface, the terminal calls up the electronic album, and the user selects an image from the album. After the user confirms the selection, the terminal determines the image as the index information and acquires it.
The embodiment has the advantage that, through the information input box, the user can set and add index information by inputting it, which improves the flexibility of adding index information.
Fig. 6 illustrates a process of inputting a keyword in the adding interface of a terminal in an embodiment of the present disclosure. In this embodiment, a text input box and a "confirm" button are displayed in the top area of the adding interface; the middle area of the adding interface displays the determined keywords "vocabulary A", "vocabulary B", and "vocabulary C". When "vocabulary D" is input in the text input box and the "confirm" button is clicked, the terminal confirms "vocabulary D" and displays it in the middle area of the adding interface.
It should be noted that this embodiment is only exemplarily shown in the process of inputting the keyword in the adding interface of the terminal, and should not limit the function and the application scope of the present disclosure.
In an embodiment, candidate index information is displayed in the second area of the adding interface. Acquiring the index information added in the adding interface includes: when a preset first gesture triggered on a piece of candidate index information in the second area is detected, determining that candidate index information as the added index information, and acquiring the added index information.
In this embodiment, the second area of the terminal addition interface (for example, the right sidebar area in the addition interface) is mainly used for displaying the candidate index information, so that the user can select and add the index information from the candidate index information.
The candidate index information may be determined by the terminal in advance by counting the index-information adding records of a user group within a certain time and deriving the candidates from the statistical result (for example, the terminal determines, from the keyword adding records of the user group in the last month, the 20 most frequently added historical keywords, and determines these 20 historical keywords as candidate keywords in descending order of adding times, so that the user can select keywords from them). Alternatively, the terminal may decompose and count the target multimedia content itself and determine the candidates from the statistical result (for example, when the target multimedia content is a movie file, the terminal extracts the subtitles of the movie file, performs word segmentation on the subtitles to obtain each real word in the subtitles, determines the 20 real words with the highest occurrence counts, and determines these 20 real words as candidate keywords for the user to select from).
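The second route above (candidate keywords from subtitle statistics) can be sketched as follows. This is an illustrative Python sketch only: whitespace splitting stands in for real word segmentation, and a small stop-word set stands in for the filtering of function words down to "real words":

```python
from collections import Counter

# Hypothetical stop-word set standing in for function-word filtering;
# the embodiment keeps only "real words" (content words) after segmentation.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "is", "in", "it"}

def candidate_keywords(subtitle_lines, top_n=20):
    """Count content words across the subtitle lines and return the
    top_n most frequent ones, ordered from most to least frequent."""
    counts = Counter()
    for line in subtitle_lines:
        for word in line.lower().split():   # naive segmentation
            word = word.strip(".,!?\"'")
            if word and word not in STOP_WORDS:
                counts[word] += 1
    return [word for word, _ in counts.most_common(top_n)]
```

With `top_n=20` this yields the 20 candidate keywords mentioned in the example, ready to be displayed in the second area for selection.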
In this embodiment, after the terminal enters the adding interface, the candidate index information is displayed in the second area of the adding interface. When a user carries out a preset first gesture on one candidate index information, the terminal determines the candidate index information as the index information, and then determines local multimedia content according to the index information.
For example: the terminal determines 20 candidate keywords in advance; the preset first gesture is dragging, and specifically, the first gesture is dragging the candidate keyword to an information input box of the added interface.
After the user clicks the "keyword" button at the bottom of the playing interface of the terminal, the terminal enters the adding interface, and the 20 candidate keywords are displayed in the right sidebar area of the adding interface. If the user drags the candidate keyword "vocabulary C" to the information input box of the adding interface, the terminal determines "vocabulary C" as index information and acquires it, and then processes the local multimedia content matched with "vocabulary C" in the subsequent presentation of the target multimedia content.
The embodiment has the advantage that displaying predetermined candidate index information for the user to select from saves, to a certain extent, the user's operations of setting and adding index information, and improves the convenience of adding index information.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure. It is understood that the first gesture may be a single click, a double click, a long press, a press with more than a certain force, or the like, in addition to the drag.
Fig. 7 illustrates a process of obtaining keywords from candidate keywords in the adding interface of a terminal in an embodiment of the present disclosure. In this embodiment, a text input box is displayed in the top area of the adding interface; the right sidebar area of the adding interface displays the candidate keywords "vocabulary A" through "vocabulary H". After "vocabulary D" is dragged to the text input box, the terminal determines "vocabulary D" as a keyword and displays it in the middle area of the adding interface.
It should be noted that this embodiment is only exemplarily shown in the process of obtaining the keyword from each candidate keyword in the adding interface of the terminal, and should not limit the function and the application scope of the present disclosure.
In an embodiment, the third area of the add interface displays the pre-saved index information. The method further comprises the following steps:
displaying the added index information in the third area;
when detecting that a preset second gesture is triggered on index information displayed in the third area, deleting the displayed index information from the third area;
and when the condition that the save button in the adding interface is triggered is detected, saving the index information displayed in the third area.
In this embodiment, the third area of the terminal add interface (for example, the middle area of the add interface) is mainly used for displaying and deleting the index information.
Specifically, after the terminal enters the adding interface, the index information stored in advance is displayed in the third area of the adding interface. The index information stored in advance may be index information stored in a default setting of the terminal, or index information stored in a previous use by the user.
The terminal displays the index information added this time in the third area in addition to the index information stored in advance in the third area. That is, as long as the index information is determined at the current time, the index information is displayed in the third area.
When the user makes a preset second gesture on the index information displayed in the third area, the terminal deletes the index information from the third area; when the user triggers the save button in the add interface, the terminal saves the index information displayed in the third area.
For example: the index information pre-stored by the terminal comprises ' vocabulary A ', ' vocabulary B ', ' second gesture preset by the vocabulary is long press.
After entering the adding interface, the terminal displays "vocabulary A", "vocabulary B", and "vocabulary D" in the middle area of the adding interface. If the user determines "vocabulary C" as index information and adds it, the terminal also displays "vocabulary C" in the middle area; if the user long-presses "vocabulary A", the terminal deletes "vocabulary A" from the middle area; if the user then clicks the save button in the adding interface while the index information displayed in the middle area is "vocabulary B", "vocabulary C", and "vocabulary D", the terminal saves "vocabulary B", "vocabulary C", and "vocabulary D".
The embodiment has the advantage that displaying the added index information prompts the user and, at the same time, allows the user to further adjust and delete the added index information, improving the efficiency with which the user manages the index information.
It should be noted that the embodiment is only an exemplary illustration of a preferred case, and should not limit the function and scope of the disclosure. It can be understood that the second gesture may be a long press, a single click, a double click, a drag, a press exceeding a certain force, or the like; saving the displayed index information does not necessarily need to be performed by triggering a save button; and a save button is not necessarily present in the third area.
It should be noted that the first area, the second area, and the third area in the adding interface illustrated above are not necessarily all present. In a specific embodiment, the three areas may exist simultaneously, or only one of them may exist. The various combinations of these three areas will not be described in detail here.
Fig. 8 illustrates a process of deleting a keyword in the adding interface of a terminal in an embodiment of the present disclosure. In this embodiment, the middle area of the adding interface displays "vocabulary A", "vocabulary B", and "vocabulary C", each of which has been determined as a keyword. After the user double-clicks "vocabulary B", the terminal no longer determines "vocabulary B" as a keyword and deletes it from the middle area.
It should be noted that this embodiment is only exemplarily shown in the process of deleting the keyword in the adding interface of the terminal, and should not limit the function and the application scope of the present disclosure.
The following describes in detail a specific implementation process of matching the index information with the target multimedia content and determining the local multimedia content in the embodiment of the present disclosure.
In one embodiment, the index information is index text in text form. Matching the index information with the target multimedia content and determining, based on a matching result, local multimedia content corresponding to the index information in the target multimedia content includes:
acquiring a target text contained in the target multimedia content;
matching the index text with the target text, and determining the text position of the index text in the target text;
determining the appearance time point of the index text in the target multimedia content based on the text position;
and determining local multimedia content corresponding to the index text in the target multimedia content based on the appearance time point.
In this embodiment, the index information acquired by the terminal is index text in text form, and the target multimedia content directly contains target audio or video together with target text in text form (for example, the movie file of a soft-subtitle movie, in which the subtitles are separate from the audio and video). The terminal determines the local multimedia content corresponding to the index text mainly through matching between texts.
Specifically, after acquiring the target text in the target multimedia content, the terminal matches the target text with the index text to determine the text position of the index text in the target text, and then determines the appearance time point of the index text in the target multimedia content based on the text position. It can be understood that, in general, the audio/video and the text belonging to the same multimedia content are synchronized in time and matched in content. Therefore, the text position of the index text can be used to locate the appearance time point of the index text in the target multimedia content.
The local multimedia content corresponding to the index text is then determined based on the appearance time point, so that the terminal can process the local multimedia content on this basis (the processing includes a play-only process, a skip process, and a fast-forward process).
For example: the movie file includes, in addition to audio data (corresponding to the audio of the movie) and video data (corresponding to the video of the movie), subtitle data (corresponding to the subtitle of the movie) of the movie.
The user presets index text in the terminal, the index text being the keyword "vocabulary A", and opens the terminal to play the movie file. The terminal extracts the subtitles of the movie and matches "vocabulary A" with the subtitles to determine the position of "vocabulary A" in the subtitles, namely the subtitle portion at time 01:10:03 on the subtitle timeline. It is then determined that the appearance time point of "vocabulary A" in the movie is 01:10:03, and the movie content in the interval 01:10:02-01:10:04 is determined as the local movie content to be processed.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
In one embodiment, the index information is index text in text form. Matching the index information with the target multimedia content and determining, based on a matching result, local multimedia content corresponding to the index information in the target multimedia content includes:
acquiring a target audio contained in the target multimedia content;
acquiring a target text corresponding to the target audio based on a preset audio recognition technology;
matching the index text with the target text, and determining the text position of the index text in the target text;
determining the appearance time point of the index text in the target multimedia content based on the text position;
and determining local multimedia content corresponding to the index text in the target multimedia content based on the appearance time point.
In this embodiment, the index information acquired by the terminal is an index text in a text form; the target multimedia content at least directly contains the target audio, and may not directly contain the target text in text form (e.g., a movie file corresponding to a movie without subtitles). The terminal determines local multimedia content corresponding to the index text mainly according to matching between the text and the audio.
Specifically, after acquiring the target audio in the target multimedia content, the terminal processes the target audio based on a preset audio recognition technology and converts it into target text in text form; matches the target text with the index text to determine the text position of the index text in the target text; then determines the appearance time point of the index text in the target multimedia content based on the text position; and further determines the local multimedia content corresponding to the index text based on the appearance time point, so that the terminal can process the local multimedia content on this basis (the processing includes a play-only process, a skip process, and a fast-forward process).
For example: the movie file contains audio data (corresponding to the audio of the movie) and video data (corresponding to the video of the movie) of the movie.
The user presets index text in the terminal, the index text being the keyword "vocabulary A", and opens the terminal to play the movie file. The terminal extracts the audio of the movie and converts the audio into corresponding text. It can be understood that the converted text generally corresponds to the subtitles of the movie. The terminal matches "vocabulary A" with this text to determine its position in the text, and further determines the position of "vocabulary A" in the audio, namely the audio portion at time 01:10:03 on the audio timeline. It is then determined that the appearance time point of "vocabulary A" in the movie is 01:10:03, and the movie content in the interval 01:10:02-01:10:04 is determined as the local movie content to be processed.
The embodiment has the advantages that the audio is converted into the target text, and the local multimedia content is determined according to the matching result of the index text and the target text, so that the determined local multimedia content can be suitable for the target multimedia content without subtitles and the like, and the applicable scenes of the disclosure are expanded.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
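The speech-recognition route above can be sketched as follows, for illustration only. Here `transcribe` is a placeholder for any speech-to-text engine (the patent does not name one); it is assumed to return time-stamped text segments, after which the matching step is identical to the subtitle case:

```python
def locate_keyword_via_speech(index_text, target_audio, transcribe):
    """Convert the target audio to text and match the index text.

    transcribe(target_audio) is a hypothetical stand-in for a preset
    audio recognition technology; it is assumed to return a list of
    (start_seconds, text) segments. Returns the appearance time point
    of every segment whose recognized text contains the index text."""
    segments = transcribe(target_audio)
    return [start for start, text in segments if index_text in text]
```

Because the recognized text generally corresponds to the (absent) subtitles, the segment start times locate the index text on the audio timeline just as subtitle cue times do.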
In one embodiment, the index information is index text in text form. Matching the index text with the target multimedia content and determining, based on a matching result, local multimedia content corresponding to the index text in the target multimedia content includes:
acquiring a target video contained in the target multimedia content;
acquiring image texts respectively contained in each image frame in the target video based on a preset image recognition technology;
matching the index text with each image text respectively, and determining the position of an image frame of the index text in the target video;
determining an appearance time point of the index text in the target multimedia content based on the image frame position;
and determining local multimedia content corresponding to the index text in the target multimedia content based on the appearance time point.
In this embodiment, the index information acquired by the terminal is an index text in a text form; the target video is directly contained in the target multimedia content, and the text in the target multimedia content exists in the form of an image as a part of the target video (for example, a movie file corresponding to a hard subtitle movie, where a subtitle is a part of an image in the video).
Specifically, after acquiring the target video in the target multimedia content, the terminal processes the target video based on a preset image recognition technology and extracts the image text contained in each image frame of the target video; matches the image text with the index text to determine the image frame position of the index text in the target video; then determines the appearance time point of the index text in the target multimedia content based on the image frame position; and further determines the local multimedia content containing the index text based on the appearance time point, so that the terminal can process the local multimedia content on this basis (the processing includes a play-only process, a skip process, and a fast-forward process).
For example: the movie file contains audio data (corresponding to the audio of the movie) and video data (corresponding to the video of the movie) of the movie. Wherein the subtitles of the movie are present as images as part of the video.
The user presets index text in the terminal, the index text being the keyword "vocabulary A", and opens the terminal to play the movie file. The terminal extracts the video of the movie and extracts the image text contained in each image frame of the video; it then matches "vocabulary A" with the image text to determine the image frame in which "vocabulary A" appears, namely the image frame at time 01:10:03 on the video timeline. The movie content in the interval 01:10:02-01:10:04 is then determined as the local movie content to be processed.
The embodiment has the advantages that the local multimedia content is determined through the extraction of the image text in the image frame and the matching result of the index text and the image text, so that the determined local multimedia content can be suitable for the target multimedia content such as hard captions, and the applicable scenes of the disclosure are further expanded.
Fig. 9 illustrates a multimedia presentation interface with text displayed in a video frame in an embodiment of the disclosure. Referring to fig. 9, in this embodiment, text is displayed in the form of an image as part of a video in a multimedia presentation interface. Therefore, the terminal can match the image text with the index text by extracting the image text in the image frame, and further determine the local multimedia content corresponding to the index text.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
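The hard-subtitle route above can likewise be sketched for illustration. Here `ocr` is a placeholder for any image-text recognition engine (the patent does not name one), and the frame-index-to-time conversion via a frame rate is an assumption about how the image frame position maps to an appearance time point:

```python
def locate_keyword_via_ocr(index_text, frames, fps, ocr):
    """Extract the text burned into each image frame and match it.

    ocr(frame) is a hypothetical stand-in for a preset image
    recognition technology returning the text visible in one frame
    (e.g. a hard subtitle). Returns the appearance time, frame index
    divided by the frame rate, of every matching frame."""
    times = []
    for i, frame in enumerate(frames):
        if index_text in ocr(frame):
            times.append(i / fps)
    return times
```

A frame at index 2 in a 2 fps sequence, for instance, maps to the 1.0-second mark on the video timeline.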
In one embodiment, determining the local multimedia content corresponding to the index text in the target multimedia content based on the appearance time point includes:
acquiring a preset time interval length;
and determining the target multimedia content within a local time interval as the local multimedia content, where the local time interval is a time interval of the preset time interval length whose center is the appearance time point.
In this embodiment, the terminal locates, according to the appearance time point of the index information in the target multimedia content, the local multimedia content that contains the index information and has the preset time interval length, so that this local content can then be skipped or otherwise processed. For example: the preset time interval length is 2 seconds. If it is determined that the appearance time point of the index information in the movie is 01:10:03, the movie content in the interval 01:10:02-01:10:04 is determined as the local movie content to be processed.
In this embodiment, the length of the time interval may be set by default by the terminal, or may be set by user-defined.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
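The interval construction above is simple arithmetic and can be sketched directly; the clamping at zero is an assumption for occurrences near the start of the content, not something the patent specifies:

```python
def local_time_interval(appearance_time, interval_length=2.0):
    """Return the local time interval (start, end) of the preset
    interval_length whose center is the appearance time point,
    clamped so the start never precedes the beginning of the content.

    E.g. an appearance at 01:10:03 (4203 s) with a 2-second length
    yields 01:10:02-01:10:04 (4202.0, 4204.0)."""
    half = interval_length / 2.0
    start = max(0.0, appearance_time - half)
    return (start, appearance_time + half)
```

The interval length here is the value that may be set by the terminal's default or customized by the user.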
In one embodiment, the index information is index audio in audio form. Matching the index information with the target multimedia content and determining, based on a matching result, local multimedia content corresponding to the index information in the target multimedia content includes:
acquiring a target audio contained in the target multimedia content;
determining the audio position of the index audio in the target audio based on a preset audio matching technology;
determining the occurrence time point of the index audio in the target multimedia content based on the audio position;
and determining local multimedia content corresponding to the index audio in the target multimedia content based on the occurrence time point.
In this embodiment, the index information acquired by the terminal is an index audio in an audio form; the target multimedia content at least comprises target audio. The terminal determines the local multimedia content corresponding to the index audio mainly according to the matching between the audios.
Specifically, after acquiring the target audio in the target multimedia content, the terminal matches the target audio with the index audio to determine the audio position of the index audio in the target audio; then determines the appearance time point of the index audio in the target multimedia content based on the audio position; and further determines the local multimedia content containing the index audio based on the appearance time point, so that the terminal can process the local multimedia content on this basis (the processing includes a play-only process, a skip process, and a fast-forward process).
For example: the movie file contains audio data (corresponding to the audio of the movie) and video data (corresponding to the video of the movie) of the movie.
The user presets the index audio "audio G" in the terminal and opens the terminal to play the movie file. The terminal extracts the audio of the movie and matches "audio G" with it, that is, determines the position of "audio G" in the audio of the movie, namely the audio portion in the period 01:10:03-01:10:05 on the audio timeline. It is then determined that the occurrence period of "audio G" in the movie is 01:10:03-01:10:05, and the movie content in the interval 01:10:02-01:10:07 is determined as the local movie content to be processed.
This embodiment has the advantage that local multimedia content is determined by matching between the audios, enabling a more accurate localization of the content to be processed to the language level.
It should be noted that the embodiment is only an exemplary illustration and should not limit the function and scope of the disclosure. It can be understood that the audio position at which the index audio appears in the target audio can be determined by directly matching the audio features of the index audio against the target audio, or by converting both the index audio and the target audio into corresponding texts and then matching the two texts.
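For illustration, the direct audio-to-audio route can be sketched as a naive sliding-window search over raw samples. This is an assumption-laden toy: a real preset audio matching technology would compare acoustic features or fingerprints rather than raw sample values:

```python
def find_audio_offset(index_samples, target_samples, tolerance=0.0):
    """Naive audio matching sketch: slide the index audio over the
    target audio and return the first sample offset at which every
    sample pair differs by at most the tolerance, or -1 if none.

    Dividing the returned offset by the sample rate converts it into
    the appearance time point on the audio timeline."""
    n, m = len(target_samples), len(index_samples)
    for offset in range(n - m + 1):
        if all(abs(target_samples[offset + j] - index_samples[j]) <= tolerance
               for j in range(m)):
            return offset
    return -1
```

For example, an index clip found at sample offset 44100 in audio sampled at 44.1 kHz would place its occurrence one second into the target audio.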
In one embodiment, the index information is an index image in image form. Matching the index information with the target multimedia content and determining, based on a matching result, local multimedia content corresponding to the index information in the target multimedia content includes:
acquiring a target video contained in the target multimedia content;
determining the position of an image frame of the index image in the target video based on a preset image matching technology;
determining an appearance time point of the index image in the target multimedia content based on the image frame position;
and determining local multimedia content corresponding to the index image in the target multimedia content based on the appearance time point.
In this embodiment, the index information acquired by the terminal is an index image in an image form; the target multimedia content at least comprises a target video. The terminal determines the local multimedia content corresponding to the index image mainly according to the matching between the images. The index image may be a photo image, a screenshot image, or an expression image (e.g., emoji expression).
Specifically, after acquiring the target video in the target multimedia content, the terminal matches the target video with the index image to determine the image frame position of the index image in the target video; then determines the appearance time point of the index image in the target multimedia content based on the image frame position; and further determines the local multimedia content containing the index image based on the appearance time point, so that the terminal can process the local multimedia content on this basis (the processing includes a play-only process, a skip process, and a fast-forward process).
For example: the movie file contains audio data (corresponding to the audio of the movie) and video data (corresponding to the video of the movie) of the movie.
The user sets the index image "image H" in the terminal in advance, and opens the terminal to play the movie file. The terminal extracts the video of the movie and matches "image H" with it, determining that the image frame where "image H" appears is located at time "01:10:03" on the video time axis; it is further determined that the appearance time point of "image H" in the movie is "01:10:03"; further, the movie content in the period "01:10:02-01:10:04" is determined as the local movie content to be processed.
This embodiment has the advantage that local multimedia content is determined by matching between the images, thereby more accurately locating the content to be processed at the image level.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure. It can be understood that the image frame position of the index image appearing in the target video can be determined by directly matching the index image with the target video in the image characteristics; the positions of the image frames of the index image in the target video can also be determined by extracting the texts of the image frames in the index image and the target video and then matching the texts in the index image and the texts in the image frames in the target video; when the index image is an expression image, a specific expression represented by the expression image and an expression contained in each image frame in the target video can be identified based on an expression identification technology, and then the image frame containing the specific expression in the target video is determined as the position of the image frame where the expression image appears in the target video.
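The frame-level matching described above can be sketched in code. The following is a minimal illustrative sketch, assuming frames are small grayscale pixel grids and using a simple average-hash comparison in place of a real image matching technology; the function names (e.g. `find_index_image`) are hypothetical and not part of the disclosure:

```python
def average_hash(frame):
    """Compute a simple binary hash: 1 where a pixel is above the frame's mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)

def hamming(h1, h2):
    """Number of positions at which two hashes differ."""
    return sum(a != b for a, b in zip(h1, h2))

def find_index_image(index_image, video_frames, fps, max_distance=0):
    """Return the appearance time points (in seconds) of frames whose
    hash is within max_distance of the index image's hash."""
    target = average_hash(index_image)
    hits = []
    for i, frame in enumerate(video_frames):
        if hamming(average_hash(frame), target) <= max_distance:
            hits.append(i / fps)
    return hits
```

A production system would instead compute perceptual hashes with a library such as OpenCV or ImageHash and tolerate compression noise via a non-zero `max_distance`.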
The following describes in detail a specific implementation process for presenting the target multimedia content in the embodiment of the present disclosure.
In one embodiment, audio and video in the local multimedia content are skipped when the target multimedia content is presented.
In this embodiment, the terminal screens the multimedia content by skipping the audio and video. Specifically, when presenting the target multimedia content, the terminal skips both the audio and the video in the local multimedia content containing the index information. For example: the index information is the keyword "vocabulary A", and the terminal determines that the time period in which "vocabulary A" appears in the movie is "01:10:02-01:10:04". The terminal plays the movie up to "01:10:02", skips "01:10:02-01:10:04", and continues playing the movie directly from "01:10:04".
This embodiment has the advantage that the index information is screened out to the maximum extent by skipping the audio/video containing the index information.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
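The skip processing amounts to computing which segments remain to be played once the marked periods are removed. A minimal sketch, assuming the skip intervals are given in seconds; the function name is illustrative:

```python
def play_segments(duration, skip_intervals):
    """Given a total duration (seconds) and non-overlapping (start, end)
    intervals to skip, return the segments that will actually be played."""
    segments = []
    cursor = 0.0
    for start, end in sorted(skip_intervals):
        if start > cursor:
            segments.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        segments.append((cursor, duration))
    return segments
```

For the movie example, skipping "01:10:02-01:10:04" (seconds 4202-4204) of a two-hour movie yields the two segments before and after the marked period.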
In one embodiment, the audio of the partial multimedia content is skipped when the target multimedia content is presented.
In this embodiment, the terminal filters the multimedia content in a mute manner. Specifically, when the terminal presents the target multimedia content, the audio in the local multimedia content containing the index information is skipped. For example: the index information is the keyword "vocabulary A", and the terminal determines that the time period in which "vocabulary A" appears in the movie is "01:10:02-01:10:04". When the terminal plays the movie during "01:10:02-01:10:04", this part of the movie is muted while its video remains unchanged; that is, when the movie is played during "01:10:02-01:10:04", there is only a picture and no sound.
The embodiment has the advantage that the completeness of the visual information of the target multimedia content is ensured while the language information corresponding to the index information is screened out by skipping the audio containing the index information.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
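The mute processing can be sketched as zeroing the audio samples inside the marked interval while leaving all other samples (and the video) untouched. This is an illustrative toy implementation operating on a plain list of samples; a real player would apply it to a decoded PCM buffer:

```python
def mute_interval(samples, sample_rate, start, end):
    """Return a copy of the audio samples with the [start, end) time
    interval (seconds) zeroed out, leaving the rest unchanged."""
    lo = int(start * sample_rate)
    hi = int(end * sample_rate)
    return [0 if lo <= i < hi else s for i, s in enumerate(samples)]
```

The black-screen processing of the following embodiment is analogous, replacing the image frames within the interval with black frames instead of zeroing audio samples.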
In one embodiment, the video of the partial multimedia content is skipped when the target multimedia content is presented.
In this embodiment, the terminal screens the multimedia content in a black screen manner. Specifically, when presenting the target multimedia content, the terminal skips the video in the local multimedia content containing the index information. For example: the index information is the keyword "vocabulary A", and the terminal determines that the time period in which "vocabulary A" appears in the movie is "01:10:02-01:10:04". When the terminal plays the movie during "01:10:02-01:10:04", this part of the movie is given black-screen processing while its audio remains unchanged; that is, when the movie is played during "01:10:02-01:10:04", there is no picture but only sound.
The embodiment has the advantage that the completeness of the language information of the target multimedia content is ensured while the visual information corresponding to the index information is screened out by skipping the video containing the index information.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure. The multimedia content can be skipped in various ways, not limited to the way of black screen processing, but also in other ways, such as displaying advertisements, displaying content related to a scenario, and the like.
In one embodiment, the audio and video in the local multimedia content are fast forwarded when the target multimedia content is presented.
In this embodiment, the terminal screens the multimedia content by fast-forwarding the audio and video. Specifically, when presenting the target multimedia content, the terminal synchronously fast-forwards the audio and video in the local multimedia content containing the index information. For example: the index information is the keyword "vocabulary A", and the terminal determines that the time period in which "vocabulary A" appears in the movie is "01:10:02-01:10:04". When the terminal plays the movie to "01:10:02", the playing speed of the audio and video is increased to four times the normal speed, so that the audio and video in the period "01:10:02-01:10:04" take only 0.5 second to finish playing; the movie is then played on at the normal speed.
The embodiment has the advantage that the continuity of the audios and videos in the target multimedia content is maintained to a certain extent while the audios and videos containing the index information are screened.
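The fast-forward arithmetic in this example can be checked with a small helper: a 2-second period played at four times the normal speed takes 0.5 second of wall-clock time. A sketch with an illustrative function name:

```python
def rendered_duration(segments):
    """segments: list of (start, end, rate) triples in seconds.
    Return the wall-clock time needed to present them, where each
    segment is played at `rate` times the normal speed."""
    return sum((end - start) / rate for start, end, rate in segments)
```

For the full movie schedule (normal speed before and after the marked period), the total presentation time is shortened by 1.5 seconds.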
In one embodiment, the audio of the local multimedia content is fast forwarded when the target multimedia content is presented.
In this embodiment, the terminal filters the multimedia content by fast forwarding the audio. Specifically, when the terminal presents the target multimedia content, the terminal fast forwards the audio in the local multimedia content containing the index information.
For example: the terminal can be connected to an audio sharing platform and a video sharing platform simultaneously. The terminal acquires broadcast audio from the audio sharing platform and acquires a calligraphy video from the video sharing platform, where the information contained in the broadcast audio and the information contained in the calligraphy video may be unrelated; the terminal encapsulates the broadcast audio and the calligraphy video to obtain packaged audio and video; when the terminal presents the packaged audio and video to the user, the user can watch the calligraphy video while listening to the broadcast audio.
The index information is the keyword "vocabulary A", and the terminal determines that the time period in which "vocabulary A" appears in the packaged audio and video is "01:10:02-01:10:04". When the terminal plays the packaged audio and video to "01:10:02", the playing speed of the broadcast audio is increased to four times the normal speed, so that the broadcast audio in the period "01:10:02-01:10:04" takes only 0.5 second to finish playing, after which the broadcast audio resumes playing at the normal speed; in this process, the playing speed of the calligraphy video is unchanged.
This embodiment has the advantage that the continuity of the audio in the target multimedia content is maintained to a certain extent while the audio containing the index information is filtered out.
In one embodiment, the video of the local multimedia content is fast forwarded when the target multimedia content is presented.
In this embodiment, the terminal filters the multimedia content by fast forwarding the video. Specifically, when the terminal presents the target multimedia content, the terminal fast forwards the video in the local multimedia content containing the index information.
For example: the terminal can be connected to an audio sharing platform and a video sharing platform simultaneously. The terminal acquires broadcast audio from the audio sharing platform and acquires a calligraphy video from the video sharing platform, where the information contained in the broadcast audio and the information contained in the calligraphy video may be unrelated; the terminal encapsulates the broadcast audio and the calligraphy video to obtain packaged audio and video; when the terminal presents the packaged audio and video to the user, the user can watch the calligraphy video while listening to the broadcast audio.
The index information is the keyword "vocabulary A", and the terminal determines that the time period in which "vocabulary A" appears in the packaged audio and video is "01:10:02-01:10:04". When the terminal plays the packaged audio and video to "01:10:02", the playing speed of the calligraphy video is increased to four times the normal speed, so that the calligraphy video in the period "01:10:02-01:10:04" takes only 0.5 second to finish playing, after which the calligraphy video continues playing at the normal speed; in this process, the playing speed of the broadcast audio is unchanged.
This embodiment has the advantage that the continuity of the video in the target multimedia content is maintained to a certain extent while the video containing the index information is filtered out.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
In one embodiment, only the audio and video of the local multimedia content are played when the target multimedia content is presented.
In this embodiment, the terminal screens the multimedia content by playing only the audio and video of the local multimedia content. Specifically, when the terminal presents the target multimedia content, only the audio and the video in the local multimedia content containing the index information are played. For example: the index information is the keyword "vocabulary A", and the terminal determines that the time period in which "vocabulary A" appears in the movie is "01:10:02-01:10:04". The terminal then plays only the audio and video during "01:10:02-01:10:04".
The embodiment has the advantage that the index information is screened out to the maximum extent by only playing the audio and video containing the index information.
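The play-only processing is the inverse of skipping: only the marked intervals are presented. A minimal sketch that also clamps the intervals to the content's duration and merges overlaps; names are illustrative:

```python
def play_only(duration, keep_intervals):
    """Clamp the marked (start, end) intervals to [0, duration] and
    merge overlapping ones, returning the only segments presented."""
    merged = []
    for start, end in sorted(keep_intervals):
        start, end = max(0.0, start), min(duration, end)
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        elif start < end:
            merged.append((start, end))
    return merged
```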
In one embodiment, only the audio in the local multimedia content is played when the target multimedia content is presented.
In this embodiment, the terminal screens the multimedia content by playing only the audio in the local multimedia content. Specifically, when the terminal presents the target multimedia content, only the audio in the local multimedia content containing the index information is played. For example: the index information is the keyword "vocabulary A", and the terminal determines that the time period in which "vocabulary A" appears in the movie is "01:10:02-01:10:04". The terminal then plays only the audio during "01:10:02-01:10:04".
This embodiment has the advantage that the language information of the index information is highlighted to the maximum extent by playing only the audio containing the index information.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
In one embodiment, when the target multimedia content is presented, only the video in the local multimedia content is played.
In this embodiment, the terminal screens the multimedia content by playing only the video in the local multimedia content. Specifically, when the terminal presents the target multimedia content, only the video in the local multimedia content containing the index information is played. For example: the index information is the keyword "vocabulary A", and the terminal determines that the time period in which "vocabulary A" appears in the movie is "01:10:02-01:10:04". The terminal then plays only the video during "01:10:02-01:10:04".
The embodiment has the advantage that the visual information corresponding to the index information is highlighted to the greatest extent by playing only the video containing the index information.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
It should be noted that, in the embodiment of the present disclosure, the processing of the local multimedia content when the terminal presents the target multimedia content is not limited to play-only processing, skip processing, and fast-forward processing, and may also include other processing manners capable of suppressing the audio or video expression information, for example sound mixing processing and filter processing.
Specifically, in one embodiment, for the processing of suppressing the audio expression information: the terminal presets a section of ring tone audio for sound mixing. After the local multimedia content corresponding to the index information is determined, the terminal mixes the ring tone audio with the audio in the local multimedia content when presenting the target multimedia content, so that the user is disturbed by the ring tone while hearing the audio in the local multimedia content. In this way, the audio expression information in the local multimedia content is suppressed, thereby suppressing the influence of the audio containing the index information.
For the processing of suppressing the video expression information: after the local multimedia content corresponding to the index information is determined, the terminal adds a preset filter to the video in the local multimedia content when presenting the target multimedia content, so that the user is disturbed by the filter while watching the video in the local multimedia content. In this way, the video expression information in the local multimedia content is suppressed, thereby suppressing the influence of the video containing the index information.
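The sound mixing suppression described above can be sketched as averaging a looping ring tone with the original audio inside the marked interval. An illustrative toy implementation on plain sample lists; a real implementation would mix decoded PCM with proper gain control:

```python
def mix_ringtone(samples, ringtone, sample_rate, start, end):
    """Mix a looping ring tone into the [start, end) interval (seconds)
    by averaging the two signals, leaving all other samples untouched."""
    lo, hi = int(start * sample_rate), int(end * sample_rate)
    out = list(samples)
    for i in range(lo, min(hi, len(out))):
        out[i] = (out[i] + ringtone[(i - lo) % len(ringtone)]) / 2
    return out
```

The filter processing for video is analogous: a per-frame transform (blur, pixelation, etc.) applied only to the frames inside the marked interval.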
Fig. 10 shows a complete flow diagram of user interaction with a terminal of an embodiment of the present disclosure.
In this embodiment, in a movie played by a terminal, subtitles exist as a part of a video in the form of an image.
Before watching a movie on the terminal, a user inputs one or more keywords to be skipped into the terminal in advance, and then selects a movie to watch.
The terminal starts to play the movie, preloading a period of data during playback; the terminal then detects the preloaded data to determine whether the keywords appear in it. There are two detection modes: 1. detecting the voiceprint data of the preloaded data in a voiceprint matching mode, and determining whether a keyword appears in the preloaded data by matching the voiceprint with the keyword; 2. detecting the image frame data of the preloaded data in an image frame matching mode, and determining whether a keyword appears in the preloaded data by matching the subtitles in the image frames with the keyword.
If the keywords are detected in the preloaded data, marking a time period of occurrence of the matched keywords, and recording the time period in the terminal; thus, skipping is performed when the terminal plays to the marked time period, and the movie is played continuously directly from the end point of the time period.
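The detect-and-mark step of this flow can be sketched as scanning timed text entries (recognised speech or subtitles extracted from the preloaded data) for the keywords and recording padded time periods to skip. The entry format and the `pad` parameter are assumptions for illustration:

```python
def mark_keyword_periods(entries, keywords, pad=1.0):
    """entries: list of (start, end, text) tuples in seconds, taken from
    the preloaded data. Return the time periods, padded by `pad` seconds
    on each side, whose text contains any of the keywords."""
    periods = []
    for start, end, text in entries:
        if any(k in text for k in keywords):
            periods.append((max(0.0, start - pad), end + pad))
    return periods
```

The resulting periods are what the player marks and later skips when playback reaches them.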
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
According to an embodiment of the present disclosure, as shown in fig. 11, there is also provided a data processing apparatus including:
a first obtaining module 410 configured to obtain target multimedia content;
a second obtaining module 420 configured to obtain index information for indexing the multimedia content;
a determining module 430, configured to match the index information with the target multimedia content, and determine, based on a matching result, a local multimedia content corresponding to the index information in the target multimedia content;
a processing module 440 configured to process the partial multimedia content when rendering the target multimedia content, wherein processing the partial multimedia content comprises playing only the partial multimedia content, or fast forwarding or skipping the partial multimedia content.
In an exemplary embodiment of the present disclosure, the index information is an index text in a text form, and the apparatus is configured to:
acquiring a target text contained in the target multimedia content;
matching the index text with the target text, and determining the text position of the index text in the target text;
determining the appearance time point of the index text in the target multimedia content based on the text position;
and determining local multimedia content corresponding to the index text in the target multimedia content based on the appearance time point.
In an exemplary embodiment of the present disclosure, the index information is an index text in a text form, and the apparatus is configured to:
acquiring a target audio contained in the target multimedia content;
acquiring a target text corresponding to the target audio based on a preset audio recognition technology;
matching the index text with the target text, and determining the text position of the index text in the target text;
determining the appearance time point of the index text in the target multimedia content based on the text position;
and determining local multimedia content corresponding to the index text in the target multimedia content based on the appearance time point.
In an exemplary embodiment of the present disclosure, the index information is an index text in a text form, and the apparatus is configured to:
acquiring a target video contained in the target multimedia content;
acquiring image texts respectively contained in each image frame in the target video based on a preset image recognition technology;
matching the index text with each image text respectively, and determining the position of an image frame of the index text in the target video;
determining an appearance time point of the index text in the target multimedia content based on the image frame position;
and determining local multimedia content corresponding to the index text in the target multimedia content based on the appearance time point.
In an exemplary embodiment of the disclosure, the apparatus is configured to:
acquiring a preset time interval length;
and determining the target multimedia content in a local time interval as the local multimedia content, wherein the local time interval is the time interval with the length of the time interval and the center of the time interval being the appearance time point.
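The local-time-interval computation can be sketched directly from this description: an interval of the preset length centred on the appearance time point, clamped to the content's duration. Applied to the earlier movie example, an appearance time of 01:10:03 with a 2-second interval yields 01:10:02-01:10:04. Names are illustrative:

```python
def local_interval(appearance_time, interval_length, duration):
    """Return the (start, end) interval of the given length centred on
    the appearance time point, clamped to [0, duration] (all seconds)."""
    half = interval_length / 2
    return (max(0.0, appearance_time - half),
            min(duration, appearance_time + half))
```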
In an exemplary embodiment of the disclosure, the index information is index audio in the form of audio, and the apparatus is configured to:
acquiring a target audio contained in the target multimedia content;
determining the audio position of the index audio in the target audio based on a preset audio matching technology;
determining an appearance time point of the index audio in the target multimedia content based on the audio position;
and determining local multimedia content corresponding to the index audio in the target multimedia content based on the occurrence time point.
In an exemplary embodiment of the present disclosure, the index information is an index image in the form of an image, and the apparatus is configured to:
acquiring a target video contained in the target multimedia content;
determining the position of an image frame of the index image in the target video based on a preset image matching technology;
determining an appearance time point of the index image in the target multimedia content based on the image frame position;
and determining local multimedia content corresponding to the index image in the target multimedia content based on the appearance time point.
In an exemplary embodiment of the disclosure, the apparatus is configured to: and acquiring pre-stored index information.
In an exemplary embodiment of the disclosure, the apparatus is configured to:
when detecting that an adding button is triggered, entering an adding interface for adding index information;
and acquiring the index information added in the adding interface.
In an exemplary embodiment of the present disclosure, an information input box is displayed in a first area of the add interface, and the apparatus is configured to: and determining the text input in the information input box as the added index information, and acquiring the added index information.
In an exemplary embodiment of the disclosure, each candidate index information is displayed in the second area of the add interface, and the apparatus is configured to: when a preset first gesture which triggers a candidate index information in the second area is detected, the candidate index information is determined as the added index information, and the added index information is obtained.
In an exemplary embodiment of the disclosure, each pre-saved index information is displayed in a third area of the add interface, and the apparatus is configured to:
displaying the added index information in the third area;
when detecting that a preset second gesture is triggered on index information displayed in the third area, deleting the displayed index information from the third area;
and when the condition that the save button in the adding interface is triggered is detected, saving the index information displayed in the third area.
As shown in fig. 12, the data processing electronics 50 is embodied in the form of a general purpose computing device. The components of the data processing electronics 50 may include, but are not limited to: the at least one processing unit 510, the at least one memory unit 520, and a bus 530 that couples various system components including the memory unit 520 and the processing unit 510.
Wherein the storage unit stores program code that is executable by the processing unit 510 to cause the processing unit 510 to perform steps according to various exemplary embodiments of the present invention as described in the description part of the above exemplary methods of the present specification. For example, the processing unit 510 may perform the various steps as shown in fig. 3.
The memory unit 520 may include a readable medium in the form of a volatile memory unit, such as a random access memory unit (RAM)5201 and/or a cache memory unit 5202, and may further include a read only memory unit (ROM) 5203.
The data processing electronic device 50 may also communicate with one or more external devices 600 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the data processing electronic device 50, and/or with any devices (e.g., router, modem, etc.) that enable the data processing electronic device 50 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 550. An input/output (I/O) interface 550 is connected to the display unit 540. Also, the data processing electronics 50 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 560. As shown, network adapter 560 communicates with the other modules of data processing electronics 50 via bus 530. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the data processing electronics 50, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor of a computer, cause the computer to perform the method described in the above method embodiment section.
According to an embodiment of the present disclosure, there is also provided a program product for implementing the method in the above method embodiment, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a server or a terminal or the like. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Moreover, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
Claims (15)
1. A method of data processing, the method comprising:
acquiring target multimedia content;
acquiring index information for indexing multimedia content;
matching the index information with the target multimedia content, and determining local multimedia content corresponding to the index information in the target multimedia content based on a matching result;
processing the local multimedia content while presenting the target multimedia content, wherein processing the local multimedia content comprises playing only the local multimedia content, or fast-forwarding through or skipping the local multimedia content.
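The overall flow of claim 1 — acquire the content, acquire the index information, match, then process the matched local content — can be sketched as follows. This is an illustrative sketch only: the `MultimediaContent` type, the subtitle-style transcript, and the fixed 5-second half-window are assumptions for illustration, not part of the claim.

```python
from dataclasses import dataclass, field

@dataclass
class MultimediaContent:
    duration_s: float                                # total length in seconds
    transcript: list = field(default_factory=list)   # (time_s, text) pairs

def match_index(content: MultimediaContent, index_text: str):
    """Match the index text against the transcript and return the
    (start, end) intervals of the corresponding local content."""
    half = 5.0                                       # assumed half-window, seconds
    return [(max(0.0, t - half), min(content.duration_s, t + half))
            for t, text in content.transcript if index_text in text]

def process(content: MultimediaContent, index_text: str, mode: str):
    """Return the intervals to present: only the local content, or the
    whole content plus the local intervals the player should
    fast-forward through or skip."""
    local = match_index(content, index_text)
    if mode == "play_only":
        return local
    return [(0.0, content.duration_s)], local
```

For example, with a 60-second content whose transcript contains "goal scored" at 30 s, `match_index(content, "goal")` yields the single local interval `(25.0, 35.0)`.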
2. The method of claim 1, wherein the index information is index text in text form,
wherein matching the index information with the target multimedia content, and determining local multimedia content corresponding to the index information in the target multimedia content based on a matching result, comprises:
acquiring a target text contained in the target multimedia content;
matching the index text with the target text, and determining the text position of the index text in the target text;
determining the appearance time point of the index text in the target multimedia content based on the text position;
and determining local multimedia content corresponding to the index text in the target multimedia content based on the appearance time point.
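The text-position-to-time-point mapping in claim 2 can be sketched as follows, assuming the target text is assembled from timestamped segments (e.g. subtitle cues); the segment format is hypothetical, as the claim does not fix how the target text is obtained.

```python
def find_time_point(segments, index_text):
    """segments: list of (start_time_s, text) pairs. Returns the appearance
    time point of index_text in the concatenated target text, or None."""
    offset, spans, parts = 0, [], []
    for t, text in segments:
        spans.append((offset, offset + len(text), t))  # (char_start, char_end, time)
        parts.append(text)
        offset += len(text) + 1                        # +1 for the joining space
    target = " ".join(parts)
    pos = target.find(index_text)                      # text position of the index text
    if pos < 0:
        return None
    for start, end, t in spans:
        if start <= pos < end:
            return t                                   # time of the segment containing the match
    return None
```

A match whose first character falls inside a segment is mapped to that segment's start time; finer-grained interpolation within a segment would be a further design choice.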
3. The method of claim 1, wherein the index information is index text in text form,
wherein matching the index information with the target multimedia content, and determining local multimedia content corresponding to the index information in the target multimedia content based on a matching result, comprises:
acquiring a target audio contained in the target multimedia content;
acquiring a target text corresponding to the target audio based on a preset audio recognition technology;
matching the index text with the target text, and determining the text position of the index text in the target text;
determining the appearance time point of the index text in the target multimedia content based on the text position;
and determining local multimedia content corresponding to the index text in the target multimedia content based on the appearance time point.
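Claim 3 differs from claim 2 only in its first steps: the "preset audio recognition technology" turns the target audio into text before the same text matching applies. A sketch with a stand-in recognizer (a real system would call an ASR engine here; the segment format is an assumption):

```python
def recognize_audio(audio_segments):
    """Stand-in for the 'preset audio recognition technology': each input
    segment is assumed to yield a (start_time_s, spoken_text) pair."""
    return list(audio_segments)

def find_in_target_audio(audio_segments, index_text):
    """Return the appearance time point of the index text in the
    recognized target text, or None if it does not occur."""
    for t, text in recognize_audio(audio_segments):
        if index_text in text:
            return t
    return None
```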
4. The method of claim 1, wherein the index information is index text in text form,
wherein matching the index text with the target multimedia content, and determining local multimedia content corresponding to the index text in the target multimedia content based on a matching result, comprises:
acquiring a target video contained in the target multimedia content;
acquiring the image text contained in each image frame of the target video based on a preset image recognition technology;
matching the index text with each image text, and determining the image frame position of the index text in the target video;
determining an appearance time point of the index text in the target multimedia content based on the image frame position;
and determining local multimedia content corresponding to the index text in the target multimedia content based on the appearance time point.
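Claim 4's conversion from an image frame position to an appearance time point can be sketched as follows. The per-frame recognition step is stubbed (a real system would run OCR on each frame), and the frame rate `fps` is an assumed parameter.

```python
def find_frame_time(frame_texts, index_text, fps=25.0):
    """frame_texts: recognized text of each frame, in frame order.
    Returns (frame_index, time_point_s) of the first frame whose
    recognized text contains the index text, or None."""
    for i, text in enumerate(frame_texts):
        if index_text in text:
            return i, i / fps        # frame position -> appearance time point
    return None
```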
5. The method according to any one of claims 2 to 4, wherein determining the local multimedia content corresponding to the index text in the target multimedia content based on the appearance time point comprises:
acquiring a preset time interval length;
and determining the target multimedia content within a local time interval as the local multimedia content, wherein the local time interval is a time interval of the preset length whose center is the appearance time point.
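Claim 5's interval construction reduces to simple arithmetic. In this sketch the interval is additionally clamped to the content's duration; the clamping is an assumption, since the claim does not address boundary handling.

```python
def local_interval(time_point_s, interval_len_s, duration_s):
    """Interval of the preset length centered on the appearance time point,
    clamped (an assumption) to [0, duration_s]."""
    half = interval_len_s / 2.0
    return max(0.0, time_point_s - half), min(duration_s, time_point_s + half)
```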
6. The method of claim 1, wherein the index information is index audio in the form of audio,
wherein matching the index information with the target multimedia content, and determining local multimedia content corresponding to the index information in the target multimedia content based on a matching result, comprises:
acquiring a target audio contained in the target multimedia content;
determining the audio position of the index audio in the target audio based on a preset audio matching technology;
determining an appearance time point of the index audio in the target multimedia content based on the audio position;
and determining local multimedia content corresponding to the index audio in the target multimedia content based on the occurrence time point.
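Claim 6 leaves the "preset audio matching technology" open. As a stand-in, a naive exact sliding-window match over raw samples illustrates how an audio position converts to an appearance time point; real systems would use fingerprinting or cross-correlation instead.

```python
def find_audio_position(target, clip, sample_rate=8000):
    """Return the appearance time point (seconds) of `clip` inside `target`
    by exact sliding-window comparison of sample lists, or None if absent."""
    n, m = len(target), len(clip)
    for i in range(n - m + 1):
        if target[i:i + m] == clip:
            return i / sample_rate   # sample offset -> time point
    return None
```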
7. The method of claim 1, wherein the index information is an index image in the form of an image,
wherein matching the index information with the target multimedia content, and determining local multimedia content corresponding to the index information in the target multimedia content based on a matching result, comprises:
acquiring a target video contained in the target multimedia content;
determining the position of an image frame of the index image in the target video based on a preset image matching technology;
determining an appearance time point of the index image in the target multimedia content based on the image frame position;
and determining local multimedia content corresponding to the index image in the target multimedia content based on the appearance time point.
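Claim 7 likewise leaves the "preset image matching technology" open. As a stand-in, a mean-absolute-difference comparison over decoded pixel values (frames represented as flat lists, an assumption) shows the frame-position-to-time-point conversion:

```python
def find_image_frame(frames, index_image, fps=25.0, tol=2.0):
    """Return (frame_index, time_point_s) of the first frame within
    mean-absolute-difference `tol` of the index image, or None."""
    for i, frame in enumerate(frames):
        if len(frame) == len(index_image):
            mad = sum(abs(a - b) for a, b in zip(frame, index_image)) / len(frame)
            if mad <= tol:
                return i, i / fps    # image frame position -> appearance time point
    return None
```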
8. The method of claim 1, wherein obtaining index information for indexing multimedia content comprises: acquiring pre-stored index information.
9. The method of claim 1, wherein obtaining index information for indexing multimedia content comprises:
when it is detected that an add button is triggered, entering an adding interface for adding index information;
and acquiring the index information added in the adding interface.
10. The method of claim 9, wherein an information entry box is displayed in a first area of the adding interface,
and acquiring the index information added in the adding interface comprises: determining the text entered in the information entry box as the added index information, and acquiring the added index information.
11. The method of claim 9, wherein candidate index information items are displayed in a second area of the adding interface,
and acquiring the index information added in the adding interface comprises: when a preset first gesture triggering a candidate index information item in the second area is detected, determining that candidate index information item as the added index information, and acquiring the added index information.
12. The method of claim 9, wherein pre-saved index information items are displayed in a third area of the adding interface, the method further comprising:
displaying the added index information in the third area;
when it is detected that a preset second gesture is triggered on an index information item displayed in the third area, deleting that index information item from the third area;
and when it is detected that a save button in the adding interface is triggered, saving the index information displayed in the third area.
13. A data processing apparatus, the apparatus comprising:
the first acquisition module is configured to acquire target multimedia content;
a second obtaining module configured to obtain index information for indexing the multimedia content;
the determining module is configured to match the index information with the target multimedia content, and determine local multimedia content corresponding to the index information in the target multimedia content based on a matching result;
a processing module configured to process the local multimedia content when the target multimedia content is presented, wherein processing the local multimedia content comprises playing only the local multimedia content, or fast-forwarding through or skipping the local multimedia content.
14. An electronic device for data processing, comprising:
a memory storing computer-readable instructions;
a processor configured to read the computer-readable instructions stored in the memory to perform the method of any one of claims 1 to 12.
15. A computer-readable storage medium having computer-readable instructions stored thereon, which, when executed by a processor of a computer, cause the computer to perform the method of any of claims 1-12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010251438.2A CN111581403B (en) | 2020-04-01 | 2020-04-01 | Data processing method, device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111581403A true CN111581403A (en) | 2020-08-25 |
CN111581403B CN111581403B (en) | 2023-05-23 |
Family
ID=72122454
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010251438.2A Active CN111581403B (en) | 2020-04-01 | 2020-04-01 | Data processing method, device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111581403B (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006074403A2 (en) * | 2005-01-05 | 2006-07-13 | Clearplay, Inc. | Media player configured to receive playback filters from alternative storage mediums |
CN101256568A (en) * | 2008-03-18 | 2008-09-03 | 深圳市迅雷网络技术有限公司 | A method, system and device for providing multimedia resources |
CN102402593A (en) * | 2010-11-05 | 2012-04-04 | 微软公司 | Multi-modal approach to search query input |
CN102867042A (en) * | 2012-09-03 | 2013-01-09 | 北京奇虎科技有限公司 | Method and device for searching multimedia file |
CN103414948A (en) * | 2013-08-01 | 2013-11-27 | 王强 | Method and device for playing video |
US20140164371A1 (en) * | 2012-12-10 | 2014-06-12 | Rawllin International Inc. | Extraction of media portions in association with correlated input |
CN104618807A (en) * | 2014-03-31 | 2015-05-13 | 腾讯科技(北京)有限公司 | Multimedia playing method, device and system |
CN104850559A (en) * | 2014-02-18 | 2015-08-19 | 华东师范大学 | Slide independent storage, retrieval and recombination method and equipment based on presentation document |
CN106899859A (en) * | 2015-12-18 | 2017-06-27 | 北京奇虎科技有限公司 | A kind of playing method and device of multi-medium data |
CN107943894A (en) * | 2017-11-16 | 2018-04-20 | 百度在线网络技术(北京)有限公司 | Method and apparatus for pushing content of multimedia |
CN108347646A (en) * | 2018-03-20 | 2018-07-31 | 百度在线网络技术(北京)有限公司 | multimedia content playing method and device |
CN110248251A (en) * | 2019-06-14 | 2019-09-17 | 维沃移动通信有限公司 | A kind of multi-medium play method and terminal device |
CN110401878A (en) * | 2019-07-08 | 2019-11-01 | 天脉聚源(杭州)传媒科技有限公司 | A kind of video clipping method, system and storage medium |
Non-Patent Citations (2)
Title |
---|
GOPAL S. PINGALI et al.: "Instantly indexed multimedia databases of real world events", IEEE Transactions on Multimedia * |
ZHENG Liming et al.: "Research on a multimedia database system based on video time-interval retrieval", Computer Systems & Applications * |
Also Published As
Publication number | Publication date |
---|---|
CN111581403B (en) | 2023-05-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108881994B (en) | Video access method, client, device, terminal, server and storage medium | |
CN107801096B (en) | Video playing control method and device, terminal equipment and storage medium | |
CN108024079B (en) | Screen recording method, device, terminal and storage medium | |
EP2109313B1 (en) | Television receiver and method | |
US8214368B2 (en) | Device, method, and computer-readable recording medium for notifying content scene appearance | |
CN110121093A (en) | The searching method and device of target object in video | |
KR20130050983A (en) | Technique and apparatus for analyzing video and dialog to build viewing context | |
KR20200010455A (en) | Method and system for correcting input based on speech generated using automatic speech recognition | |
CN111800668B (en) | Barrage processing method, barrage processing device, barrage processing equipment and storage medium | |
CN113259740A (en) | Multimedia processing method, device, equipment and medium | |
US11277668B2 (en) | Methods, systems, and media for providing media guidance | |
US20110197226A1 (en) | Linking Real Time Media Context to Related Applications and Services | |
US9544656B1 (en) | Systems and methods for recognition of sign language for improved viewing experiences | |
US20230401030A1 (en) | Selecting options by uttered speech | |
CN111209437A (en) | Label processing method and device, storage medium and electronic equipment | |
CN109511010B (en) | Video processing method, video processing device, electronic device, and storage medium | |
CN108810580B (en) | Media content pushing method and device | |
US20230300429A1 (en) | Multimedia content sharing method and apparatus, device, and medium | |
CN113552984A (en) | Text extraction method, device, equipment and medium | |
US20240314400A1 (en) | Content display method, apparatus, device and medium | |
US20220391058A1 (en) | Interaction information processing method and apparatus, electronic device and storage medium | |
CN114501042B (en) | Cross-border live broadcast processing method and electronic equipment | |
US20240168995A1 (en) | System and methods for resolving query related to content | |
CN111581403B (en) | Data processing method, device, electronic equipment and storage medium | |
US20210377454A1 (en) | Capturing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
REG | Reference to a national code |
Ref country code: HK Ref legal event code: DE Ref document number: 40027977 Country of ref document: HK |
|
GR01 | Patent grant | ||