CN111581403B - Data processing method, device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111581403B
Authority
CN
China
Prior art keywords
multimedia content
index
index information
text
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010251438.2A
Other languages
Chinese (zh)
Other versions
CN111581403A (en
Inventor
孔凡阳 (Kong Fanyang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010251438.2A priority Critical patent/CN111581403B/en
Publication of CN111581403A publication Critical patent/CN111581403A/en
Application granted granted Critical
Publication of CN111581403B publication Critical patent/CN111581403B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40 Information retrieval of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 16/41 Indexing; Data structures therefor; Storage structures
    • G06F 16/43 Querying
    • G06F 16/44 Browsing; Visualisation therefor

Abstract

The disclosure provides a data processing method, an apparatus, an electronic device, and a storage medium. The method comprises the following steps: acquiring target multimedia content; acquiring index information for indexing multimedia content; matching the index information with the target multimedia content, and determining, based on the matching result, the local multimedia content corresponding to the index information within the target multimedia content; and processing the local multimedia content while presenting the target multimedia content, where processing the local multimedia content comprises playing only the local multimedia content, fast-forwarding it, or skipping it. Embodiments of the disclosure can enhance the processing capability for multimedia content.

Description

Data processing method, device, electronic equipment and storage medium
Technical Field
The disclosure relates to the field of information processing, and in particular to a data processing method, a data processing apparatus, an electronic device, and a storage medium.
Background
With the rapid development and popularization of information technology, people encounter a wide variety of multimedia content in daily life, for example when watching a local movie on a computer or watching television online on a mobile phone. In many cases, multimedia content contains material that the user does not wish to have presented. For example, while a family watches a movie together, parents may not want language harmful to a child's psychological development to be presented in the movie. Targeted screening of multimedia content therefore has significant application value. In the prior art, however, the screening of multimedia content is too one-sided, so the capability to process multimedia content on the basis of such screening is weak.
Disclosure of Invention
An object of the present disclosure is to provide a data processing method, apparatus, electronic device, and storage medium that can enhance the processing capability for multimedia content.
According to an aspect of an embodiment of the present disclosure, a data processing method is disclosed, the method including:
acquiring target multimedia content;
acquiring index information for indexing multimedia content;
matching the index information with the target multimedia content, and determining local multimedia content corresponding to the index information in the target multimedia content based on a matching result;
processing the local multimedia content while presenting the target multimedia content, wherein processing the local multimedia content includes playing only the local multimedia content, or fast forwarding or skipping the local multimedia content.
According to an aspect of an embodiment of the present disclosure, there is disclosed a data processing apparatus, the apparatus including:
a first acquisition module configured to acquire a target multimedia content;
a second acquisition module configured to acquire index information for indexing the multimedia content;
a determining module configured to match the index information with the target multimedia content, and determine local multimedia content corresponding to the index information in the target multimedia content based on a matching result;
a processing module configured to process the local multimedia content when presenting the target multimedia content, wherein processing the local multimedia content comprises playing only the local multimedia content, or fast forwarding or skipping the local multimedia content.
In an exemplary embodiment of the disclosure, the index information is an index text in text form, and the apparatus is configured to:
acquiring a target text contained in the target multimedia content;
matching the index text with the target text, and determining the text position of the index text in the target text;
determining a point in time of occurrence of the index text in the target multimedia content based on the text position;
and determining local multimedia content corresponding to the index text in the target multimedia content based on the occurrence time point.
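As an illustration of this text-matching embodiment, the sketch below assumes the target text is available as subtitle-style entries with start times (a hypothetical representation; the patent does not prescribe one) and maps each match of the index text onto an occurrence time point:

```python
def find_occurrence_times(index_text, subtitles):
    """Return the time points (in seconds) at which the index text occurs.

    `subtitles` is a list of (start_seconds, text) entries standing in for
    the target text contained in the multimedia content; each entry whose
    text contains the index text contributes that entry's start time.
    """
    return [start for start, text in subtitles if index_text in text]

subs = [(0.0, "hello there"), (5.0, "word A appears here"), (9.5, "goodbye")]
print(find_occurrence_times("word A", subs))  # [5.0]
```

The occurrence time points returned here are the input to the interval construction described in the following embodiments.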
In an exemplary embodiment of the disclosure, the index information is an index text in text form, and the apparatus is configured to:
acquiring target audio contained in the target multimedia content;
acquiring a target text corresponding to the target audio based on a preset audio recognition technology;
matching the index text with the target text, and determining the text position of the index text in the target text;
determining a point in time of occurrence of the index text in the target multimedia content based on the text position;
and determining local multimedia content corresponding to the index text in the target multimedia content based on the occurrence time point.
In an exemplary embodiment of the disclosure, the index information is an index text in text form, and the apparatus is configured to:
acquiring a target video contained in the target multimedia content;
acquiring the image text contained in each image frame of the target video based on a preset image recognition technology;
matching the index text with each image text respectively, and determining the image frame position of the index text in the target video;
determining a point in time of occurrence of the index text in the target multimedia content based on the image frame position;
and determining local multimedia content corresponding to the index text in the target multimedia content based on the occurrence time point.
In an exemplary embodiment of the present disclosure, the apparatus is configured to:
acquiring a preset time interval length;
and determining the target multimedia content within a local time interval as the local multimedia content, wherein the local time interval has the preset time interval length and is centered on the occurrence time point.
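The interval construction described in this embodiment can be sketched as follows; the function name and the clamping to the content bounds are illustrative assumptions, not taken from the patent:

```python
def local_interval(occurrence_time, interval_length, duration):
    """Local time interval of the preset length, centered on the occurrence
    time point and clamped to the bounds of the content [0, duration]."""
    half = interval_length / 2.0
    return (max(0.0, occurrence_time - half),
            min(duration, occurrence_time + half))

print(local_interval(42.0, 2.0, 60.0))  # (41.0, 43.0)
print(local_interval(0.5, 2.0, 60.0))   # (0.0, 1.5)
```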
In an exemplary embodiment of the present disclosure, the index information is index audio in audio form, and the apparatus is configured to:
acquiring target audio contained in the target multimedia content;
determining an audio position of the index audio in the target audio based on a preset audio matching technology;
determining a point in time of occurrence of the index audio in the target multimedia content based on the audio position;
and determining local multimedia content corresponding to the index audio in the target multimedia content based on the occurrence time point.
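The patent leaves the "preset audio matching technology" unspecified. As a toy stand-in, the sketch below slides the index clip over the target samples and converts the first exact match into an occurrence time point; a production system would use audio fingerprinting or cross-correlation instead:

```python
def find_audio_time(index_clip, target_samples, sample_rate):
    """Return the time point (seconds) where the index clip first occurs in
    the target audio samples, or None if it never occurs."""
    n = len(index_clip)
    for i in range(len(target_samples) - n + 1):
        if target_samples[i:i + n] == index_clip:
            return i / sample_rate
    return None

target = [0, 0, 7, 9, 4, 0, 0, 0]
print(find_audio_time([7, 9, 4], target, sample_rate=2))  # 1.0
```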
In an exemplary embodiment of the disclosure, the index information is an index image in the form of an image, and the apparatus is configured to:
acquiring a target video contained in the target multimedia content;
determining the position of an image frame of the index image in the target video based on a preset image matching technology;
determining a point in time of occurrence of the index image in the target multimedia content based on the image frame position;
and determining local multimedia content corresponding to the index image in the target multimedia content based on the occurrence time point.
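For the image-matching embodiment, the step from image frame position to occurrence time point reduces to dividing the frame index by the frame rate. The sketch below uses exact frame equality as a stand-in for the unspecified "preset image matching technology"; real matching would tolerate pixel noise:

```python
def find_frame_time(index_image, video_frames, fps):
    """Match an index image against each video frame (flat pixel lists) and
    convert the matching frame position into a time point in seconds."""
    for position, frame in enumerate(video_frames):
        if frame == index_image:
            return position / fps
    return None

frames = [[0, 0, 0], [10, 20, 30], [5, 5, 5]]
print(find_frame_time([10, 20, 30], frames, fps=25))  # 0.04
```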
In an exemplary embodiment of the present disclosure, the apparatus is configured to: and obtaining pre-stored index information.
In an exemplary embodiment of the present disclosure, the apparatus is configured to:
when triggering of the add button is detected, an adding interface for adding index information is entered;
and obtaining the index information added in the adding interface.
In an exemplary embodiment of the present disclosure, an information input box is displayed in a first area of the adding interface, and the apparatus is configured to: determine the text input into the information input box as the added index information, and acquire the added index information.
In an exemplary embodiment of the disclosure, each piece of candidate index information is displayed in a second area of the adding interface, and the apparatus is configured to: when a preset first gesture on a piece of candidate index information in the second area is detected, determine that candidate index information to be the added index information, and acquire the added index information.
In an exemplary embodiment of the disclosure, each piece of pre-saved index information is displayed in a third area of the adding interface, and the apparatus is configured to:
display the added index information in the third area;
when a preset second gesture on a piece of index information displayed in the third area is detected, delete that index information from the third area;
and when triggering of the save button in the adding interface is detected, save each piece of index information displayed in the third area.
According to an aspect of the embodiments of the present disclosure, a data processing electronic device is disclosed, including: a memory storing computer-readable instructions; and a processor that reads the computer-readable instructions stored in the memory to perform the method of any of the foregoing embodiments.
According to an aspect of the embodiments of the present disclosure, a computer program medium is disclosed, on which computer-readable instructions are stored which, when executed by a processor of a computer, cause the computer to perform the method of any of the foregoing embodiments.
In the embodiments of the present disclosure, before the target multimedia content is presented, the index information is matched with the target multimedia content in advance so as to determine the local multimedia content corresponding to the index information; the local multimedia content is then processed (including play-only, skip, and fast-forward processing) while the target multimedia content is presented, thereby enhancing the processing capability for multimedia content.
Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the disclosure.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
FIG. 1 illustrates an architecture diagram according to one embodiment of the present disclosure.
FIG. 2 illustrates an architecture diagram according to one embodiment of the present disclosure.
Fig. 3 shows a flow chart of a data processing method according to one embodiment of the present disclosure.
Fig. 4 illustrates a multimedia presentation interface integrated with an add button according to one embodiment of the present disclosure.
FIG. 5 illustrates an add interface for adding keywords according to one embodiment of the present disclosure.
Fig. 6 illustrates a process of inputting keywords in an add interface of a terminal according to one embodiment of the present disclosure.
Fig. 7 illustrates a process of acquiring keywords from candidate keywords in an add interface of a terminal according to one embodiment of the present disclosure.
Fig. 8 illustrates a process of deleting keywords in an add interface of a terminal according to one embodiment of the present disclosure.
Fig. 9 illustrates a multimedia presentation interface with text displayed in a video screen according to one embodiment of the present disclosure.
Fig. 10 shows a complete flow chart of user interaction with a terminal according to one embodiment of the present disclosure.
FIG. 11 illustrates a block diagram of a data processing apparatus according to one embodiment of the present disclosure.
FIG. 12 illustrates a hardware diagram of a data processing electronic device according to one embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these example embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The drawings are schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more example embodiments. In the following description, numerous specific details are provided to give a thorough understanding of example embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the aspects of the disclosure may be practiced without one or more of the specific details, or with other methods, components, steps, etc. In other instances, well-known structures, methods, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.
Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
First, some concepts related to the embodiments of the present disclosure will be explained.
The target multimedia content may be all of the multimedia content contained in a multimedia file (for example, when the terminal receives an instruction to play a local movie file, all multimedia content contained in the movie file is determined to be the target multimedia content) or part of the multimedia content contained in a multimedia file (for example, when the terminal loads and plays a movie file from a server in real time, the multimedia content contained in the preloaded portion is determined to be the target multimedia content). Accordingly, the target text refers to text in the target multimedia content; the target audio refers to audio in the target multimedia content; the target video refers to video in the target multimedia content; and local multimedia content refers to a portion of the target multimedia content.
The index information refers to information for indexing multimedia content. In the embodiments of the present disclosure, the local multimedia content matching the index information is indexed within the target multimedia content according to the index information, and the local multimedia content can then be processed (including play-only, skip, and fast-forward processing). Specifically, the index information may be index text in text form, so that local multimedia content matching the index text can be indexed within the target multimedia content; index audio in audio form, so that local multimedia content matching the index audio can be indexed; or an index image in image form, so that local multimedia content matching the index image can be indexed.
The architecture to which embodiments of the present disclosure may be applied is described below with reference to fig. 1 and 2.
FIG. 1 illustrates the architecture composition of an embodiment of the present disclosure: the terminal 10. In this embodiment, the execution subject of the data processing method is the terminal 10.
Specifically, the terminal 10 matches the acquired index information with the acquired target multimedia content and determines the local multimedia content corresponding to the index information in the target multimedia content; it then processes the local multimedia content when presenting the target multimedia content to the user, where the processing includes play-only processing, skip processing, and fast-forward processing.
For example: the user adds index information in the terminal in advance, where the index information is the keyword "word A", so that when playing the target multimedia content the terminal can skip the audio and video corresponding to "word A".
Specifically, the terminal continuously receives the multimedia data stream of a movie from a server to play the movie online. During online playback, the terminal can preload a certain time period of multimedia content beyond the currently playing content.
The terminal matches "word A" against the preloaded multimedia content and determines whether "word A" appears in it; if so, the terminal locates the time point at which "word A" appears in the movie. When playback approaches that time point, the terminal skips the movie's audio and video from 1 second before the time point to 1 second after it, thereby skipping the audio and video corresponding to "word A".
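The skipping behaviour in this example can be sketched as a small playhead rule, assuming the terminal polls the current playback position (names are illustrative, not from the patent):

```python
def next_play_position(current, skip_intervals):
    """If the playhead falls inside a skip interval, jump to its end;
    otherwise keep playing from the current position."""
    for start, end in skip_intervals:
        if start <= current < end:
            return end
    return current

# "word A" located at t = 42 s: skip from 1 s before to 1 s after
skips = [(41.0, 43.0)]
print(next_play_position(41.5, skips))  # 43.0
print(next_play_position(40.0, skips))  # 40.0
```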
The advantage of this embodiment is that the data processing logic is integrated in the terminal, with the terminal as the execution subject, which lowers the requirements on the user's network conditions and broadens the application scenarios for skipping multimedia content.
FIG. 2 illustrates the architecture composition of an embodiment of the present disclosure: a terminal 10 and a server 20. In this embodiment, the terminal 10 and the server 20 together form the execution subject of the data processing method, with the server 20 mainly responsible for preprocessing the target multimedia content and the terminal 10 mainly responsible for presenting the target multimedia content.
Specifically, the server 20 matches the acquired index information with the acquired target multimedia content, determines and marks the local multimedia content corresponding to the index information in the target multimedia content, and then transmits the target multimedia content, with the local multimedia content marked, to the terminal 10. The terminal 10 thus processes the local multimedia content when presenting the target multimedia content to the user, where the processing includes play-only processing, skip processing, and fast-forward processing.
For example: the user adds index information in the terminal in advance, where the index information is the keyword "word A", and the terminal reports the user-added "word A" to the server, so that, on the basis of the server's processing, the terminal skips the audio corresponding to "word A" when playing the target multimedia content while the video is still retained and played normally.
Specifically, after receiving the terminal's request to play a movie online, the server matches "word A" with the multimedia content of the movie, determines whether "word A" appears in it, and if so, locates and marks the time point at which "word A" appears in the movie.
The server continuously transmits the multimedia content, marked with the time points at which "word A" appears, to the terminal, so that during online playback the terminal, according to the marks, does not play the movie's audio from 1 second before each time point to 1 second after it, while the video portion still plays normally, thereby screening out the audio corresponding to "word A".
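The terminal-side behaviour in this architecture (silence the audio inside server-marked intervals, leave the video untouched) can be sketched as a gain function; the names and the 0/1 gain representation are illustrative assumptions:

```python
def audio_gain(current, marked_intervals):
    """Return 0.0 (silence) while the playhead is inside a marked interval
    and 1.0 (normal volume) elsewhere; video playback is unaffected."""
    for start, end in marked_intervals:
        if start <= current < end:
            return 0.0
    return 1.0

marks = [(41.0, 43.0)]  # 1 s before / after a marked time point at 42 s
print(audio_gain(42.0, marks))  # 0.0
print(audio_gain(45.0, marks))  # 1.0
```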
The advantage of this embodiment is that the server is responsible for preprocessing the multimedia content, which improves the accuracy and efficiency of the preprocessing and thus the accuracy and efficiency of screening the multimedia content.
It should be noted that the terminal shown in the above architectures may be provided with a client for presenting multimedia content, where the client may be a video client, a browser client, an instant messaging client, a content sharing client, or another application with multimedia presentation functions; the terminal may be a mobile phone, a tablet computer, a desktop computer, a notebook computer, a smart television, a game console, or another hardware terminal supporting the client. The server shown in the above architectures may be a single server, a server cluster formed by multiple servers, or a cloud server.
It should be noted that the above architecture embodiments are merely exemplary and should not be construed as limiting the functionality and scope of the disclosure. Other architectures besides those shown in FIG. 1 and FIG. 2 may exist depending on the particular application scenario. For example, in addition to the server being responsible for "matching the index information with the target multimedia content" and "determining the local multimedia content", the server may also be responsible for "processing the local multimedia content", with the terminal simply presenting the received multimedia content. Specifically, after determining the local multimedia content, the server processes it and then sends the target multimedia content, with the local multimedia content already processed, to the terminal. The terminal presents the received multimedia content as-is, i.e., the target multimedia content in which the local multimedia content has been processed.
Specific implementations of embodiments of the present disclosure are described in detail below.
It should be noted that, as is clear from the above description of the architectures, the execution subject of the embodiments of the present disclosure may be a terminal, a server, or a combination of the two. For brevity, the following description mainly takes "the terminal as the execution subject" as an example; this does not mean that the execution subject can only be a terminal.
It should be noted that there may be multiple pieces of local multimedia content determined by matching the index information with the target multimedia content (for example, "word A" is determined to appear 3 times at different positions in the movie, giving 3 corresponding pieces of local multimedia content). For brevity, the following description mainly takes "determining one piece of local multimedia content" as an example; this does not mean the disclosed embodiments apply only to that case.
Fig. 3 shows a data processing method according to an embodiment of the present disclosure, in which an exemplary execution body is a terminal, the method includes:
step S310, obtaining target multimedia content;
step S320, obtaining index information for indexing the multimedia content;
step S330, matching the index information with the target multimedia content, and determining local multimedia content corresponding to the index information in the target multimedia content based on a matching result;
step S340, when presenting the target multimedia content, processing the local multimedia content, where processing the local multimedia content includes playing only the local multimedia content, or fast forwarding or skipping the local multimedia content.
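The three processing options of step S340 can be illustrated by turning the determined local intervals into playback segments. This is a sketch under the assumption that playback is described by (start, end, rate) tuples and that fast-forwarding uses a 4x rate; neither representation is prescribed by the patent:

```python
def plan_playback(duration, local_intervals, mode):
    """Turn the local intervals determined in step S330 into playback
    segments (start, end, rate) according to the processing mode:
    'play_only' keeps just the local content, 'skip' drops it, and
    'fast_forward' plays it at a higher rate (4x assumed here)."""
    if mode == "play_only":
        return [(s, e, 1.0) for s, e in local_intervals]
    segments, cursor = [], 0.0
    for s, e in sorted(local_intervals):
        if cursor < s:
            segments.append((cursor, s, 1.0))
        if mode == "fast_forward":
            segments.append((s, e, 4.0))
        cursor = max(cursor, e)
    if cursor < duration:
        segments.append((cursor, duration, 1.0))
    return segments

print(plan_playback(60.0, [(41.0, 43.0)], "skip"))
# [(0.0, 41.0, 1.0), (43.0, 60.0, 1.0)]
```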
In the embodiments of the present disclosure, before the target multimedia content is presented, the index information is matched with the target multimedia content in advance so as to determine the local multimedia content corresponding to the index information; the local multimedia content is then processed (including play-only, skip, and fast-forward processing) while the target multimedia content is presented, thereby enhancing the processing capability for multimedia content.
In the embodiments of the present disclosure, after acquiring the target multimedia content and the index information, the terminal matches the index information with the target multimedia content to determine the local multimedia content corresponding to the index information in the target multimedia content. Depending on the application scenario, the target multimedia content may be all or part of the multimedia content contained in a multimedia file; the sources of the target multimedia content have been described above and are not repeated here.
In one embodiment, obtaining index information for indexing multimedia content includes: the index information stored in advance is acquired.
In this embodiment, index information is stored in the terminal in advance. It may be stored as a terminal default setting, or stored by the terminal after the user configures it.
For example: the terminal sets "word A" as index information by default and stores it. When a movie needs to be played, the terminal retrieves "word A", matches it with the multimedia content of the movie, and determines the local multimedia content corresponding to "word A".
During the terminal's operation, the user calls up the terminal's settings interface and defines "word B" as additional index information, so the terminal stores "word B" as index information as well. After this user-defined setting, when a movie needs to be played the terminal retrieves "word A" to match it with the movie's multimedia content and determine the local multimedia content corresponding to "word A", and also retrieves "word B" to match it with the movie's multimedia content and determine the local multimedia content corresponding to "word B".
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and scope of use of the present disclosure.
In one embodiment, obtaining index information for indexing multimedia content includes:
when triggering of the add button is detected, an adding interface for adding index information is entered;
and obtaining the index information added in the adding interface.
In this embodiment, an add button is integrated into the terminal in advance. When triggering of the add button is detected, the terminal enters the adding interface, in which the user can configure and add index information; the terminal then acquires the index information added through the adding interface.
For example: in the multimedia presentation interface for presenting the target multimedia content, the terminal displays a "keyword" button at the bottom of the interface, prompting the user to add keywords by clicking it, so that when playing the target multimedia content the terminal can process the local multimedia content matching the added keywords.
After the user clicks the "keyword" button, the terminal enters the add interface. The adding interface comprises various components and buttons for a user to add index information in the adding interface.
It should be noted that the location of the entry to the adding interface does not affect the execution of this logic: the add button may be located in the multimedia presentation interface or in another interface. Wherever the add button is located, as long as the index information added in the adding interface is globally valid in the terminal, processing is performed according to the same logic when the target multimedia content is presented.
FIG. 4 illustrates a multimedia presentation interface integrated with an add button according to an embodiment of the present disclosure. In this embodiment, the terminal plays a television series in the multimedia presentation interface. Integrated at the bottom of the interface are an episode selection button ("select episodes"), a definition adjustment button ("1080P"), a playback speed adjustment button ("speed"), and an add button ("keyword").
The user can click the "keyword" button to call up the adding interface for adding keywords shown in FIG. 5 and add keywords there, so that the terminal processes the text, audio or video matching the added keywords while continuing to play the television series.
FIG. 5 illustrates an adding interface for adding keywords in accordance with an embodiment of the present disclosure. In this embodiment, the terminal displays, in the top area of the adding interface, a text input box for entering keywords and an "OK" button for determining the entered keyword as index information. The middle area displays each keyword that has been determined as index information at the current time point; the displayed keywords may include pre-stored keywords as well as keywords newly added in the current input. The bottom area displays a "save keyword" button for saving the keywords displayed in the middle area.
The adding interface can be implemented with an Activity (a component that provides an interface for interacting with the user) in the Android system. After the user clicks the "keyword" button in the multimedia presentation interface, the terminal opens the corresponding Activity and enters the adding interface. The user inputs a keyword to be skipped in a text input box (EditText), which displays the input keyword. Logic that saves the entered keyword in memory is added to the click-event handler of the "OK" button, so that when the user clicks "OK", the keyword is first saved in memory. At the same time, a display text (TextView) is created from the keyword and shown in the middle area of the adding interface, prompting the user with all keywords already entered at the current time point. A display text in the middle area can be deleted by a long press. Saving the keywords is implemented with a Button: after the "save keyword" button is clicked, all entered keywords are stored in the memory of the terminal, so that when the terminal needs to match the target multimedia content against the keywords, it matches against all stored keywords one by one.
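The click-handling flow described above can be modeled apart from the Android widgets. The following is a minimal sketch with an in-memory store; the class and method names are illustrative, not from the patent: "OK" stages a keyword into the displayed list, a long press deletes it, and "save keyword" commits the list used later for matching.

```python
# Hypothetical sketch of the keyword flow described above; the Android widgets
# (EditText, TextView, Button) are replaced by plain method calls.
class KeywordStore:
    def __init__(self):
        self.displayed = []  # keywords shown in the middle area of the adding interface
        self.saved = []      # keywords committed to the terminal's memory

    def on_ok_clicked(self, text):
        # Click handler of the "OK" button: save the entered keyword for display.
        text = text.strip()
        if text and text not in self.displayed:
            self.displayed.append(text)

    def on_long_press(self, text):
        # A long press on a displayed keyword deletes it from the middle area.
        if text in self.displayed:
            self.displayed.remove(text)

    def on_save_clicked(self):
        # Click handler of the "save keyword" button: commit all displayed keywords.
        self.saved = list(self.displayed)
        return self.saved

store = KeywordStore()
store.on_ok_clicked("vocabulary A")
store.on_ok_clicked("vocabulary B")
store.on_long_press("vocabulary A")
print(store.on_save_clicked())  # ['vocabulary B']
```

In an actual Activity, `on_ok_clicked` would be wired to the button's click listener and `on_save_clicked` would also persist the list so that it remains globally valid in the terminal.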
It should be noted that the terminal interfaces of fig. 4 and 5 are only exemplary, and should not limit the functions and application scope of the present disclosure.
In one embodiment, an information input box is displayed in a first area of the adding interface. Obtaining the index information added in the adding interface comprises: determining the information input in the information input box as the added index information, and acquiring the added index information.
In this embodiment, the first area of the adding interface in the terminal (e.g., the top area of the adding interface) is mainly used for the user to input index information. The form of the information input box and the form of the index information differ according to the specific application scene.
Specifically, when the information input box is used for the user to input text, the user inputs text through the information input box, and the terminal determines the text as index information added by the user and acquires it. For example: the user clicks the information input box in the adding interface, and the terminal pops up a keyboard for text input so that the user can type the text. After the text is entered, the terminal determines the text as index information and acquires it.
When the information input box is used for the user to input audio, the user inputs audio through the information input box, and the terminal determines the audio as index information added by the user and acquires it. For example: the user long-presses the information input box in the adding interface; during the long press, the terminal keeps the microphone active so that the user can record audio through it. After the user releases the information input box, the audio input is complete, and the terminal determines the audio as index information and acquires it.
When the information input box is used for the user to input an image, the user inputs an image through the information input box, and the terminal determines the image as index information added by the user and acquires it. For example: the user clicks the information input box in the adding interface, and the terminal opens the photo album so that the user can select an image from it. After the user confirms the selection, the terminal determines the image as index information and acquires it.
An advantage of this embodiment is that, by providing the information input box, the user can add index information by inputting it, which improves the flexibility of adding index information.
Fig. 6 illustrates a process of inputting keywords in the adding interface of the terminal in an embodiment of the present disclosure. In this embodiment, a text input box and an "OK" button are displayed in the top area of the adding interface; the middle area displays the keywords already determined, namely "vocabulary A", "vocabulary B" and "vocabulary C". After "vocabulary D" is typed in the text input box and the "OK" button is clicked, the terminal determines "vocabulary D" as a keyword and displays it in the middle area of the adding interface.
It should be noted that this embodiment only exemplarily illustrates the process of inputting keywords in the adding interface of the terminal and should not limit the functions and application scope of the present disclosure.
In one embodiment, candidate index information items are displayed in a second area of the adding interface. Obtaining the index information added in the adding interface comprises: when a preset first gesture is detected on a candidate index information item in the second area, determining that candidate index information as the added index information, and acquiring the added index information.
In this embodiment, the second area of the adding interface (for example, the right sidebar area) is mainly used to display candidate index information, so that the user can select index information from the candidates and add it.
The candidate index information may be determined by the terminal in advance from statistics over the index-information-adding records of a user group within a certain period (for example, based on the keyword-adding records of the user group over the last month, the terminal determines the top-20 historical keywords ranked by adding count from high to low, and takes these 20 historical keywords as candidate keywords for the user to select from). Alternatively, the terminal may decompose and analyze the target multimedia content itself (for example, when the target multimedia content is a movie file, the terminal extracts the subtitles from the movie file, segments them into words to obtain the content words appearing in the subtitles, determines the top-20 content words ranked by occurrence count from high to low, and takes these 20 content words as candidate keywords for the user to select from).
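The second source of candidates above (decomposing the target content) can be sketched as follows. The cue format and the stop-word list are assumptions for illustration; a production system would use a proper word-segmentation step for the subtitle language.

```python
# Illustrative sketch: derive candidate keywords from subtitle text by taking
# the most frequent content words. A small stop-word set stands in for the
# filtering of non-content words.
from collections import Counter
import re

STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "it", "at", "i", "you"}

def candidate_keywords(subtitle_cues, top_n=20):
    # Count content words across all cues and keep the top_n by frequency.
    counts = Counter()
    for cue_text in subtitle_cues:
        for word in re.findall(r"[a-z']+", cue_text.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return [word for word, _ in counts.most_common(top_n)]

cues = ["The dragon returns", "A dragon!", "The knight rides at dawn"]
print(candidate_keywords(cues, top_n=3))  # ['dragon', 'returns', 'knight']
```

With `top_n=20` this yields the top-20 content words described in the embodiment; ties are ordered by first occurrence, which `Counter.most_common` guarantees.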
In this embodiment, after the terminal enters the adding interface, each candidate index information item is displayed in the second area of the adding interface. When the user makes the preset first gesture on one of the candidates, the terminal determines that candidate as index information, and subsequently determines the local multimedia content according to it.
For example: the terminal predetermines 20 candidate keywords, and the preset first gesture is a drag, specifically dragging a candidate keyword into the information input box of the adding interface.
After the user clicks the "keyword" button at the bottom of the playing interface, the terminal enters the adding interface and displays the 20 candidate keywords in the right sidebar area. If the user drags the candidate keyword "vocabulary C" into the information input box, the terminal determines "vocabulary C" as index information and acquires it; subsequently, while presenting the target multimedia content, the terminal processes the local multimedia content matching "vocabulary C".
An advantage of this embodiment is that displaying predetermined candidate index information for the user to select from saves, to a certain extent, the operations the user would otherwise need to add index information, improving the convenience of adding index information.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and scope of use of the present disclosure. It can be understood that the first gesture may be, besides dragging, a single click, a double click, a long press, a press exceeding a certain force, or the like.
Fig. 7 illustrates a process of acquiring keywords from candidate keywords in the adding interface of the terminal in an embodiment of the present disclosure. In this embodiment, a text input box is displayed in the top area of the adding interface; the right sidebar area displays the candidate keywords "vocabulary A" through "vocabulary H". After the user drags "vocabulary D" into the text input box, the terminal determines "vocabulary D" as a keyword and displays it in the middle area of the adding interface.
It should be noted that this embodiment only exemplarily illustrates the process of acquiring keywords from the candidate keywords in the adding interface of the terminal and should not limit the functions and application scope of the present disclosure.
In one embodiment, each piece of pre-saved index information is displayed in a third area of the adding interface. The method further comprises:
displaying the added index information in the third area;
when a preset second gesture is detected on a piece of index information displayed in the third area, deleting that index information from the third area;
and when triggering of a save button in the adding interface is detected, saving each piece of index information displayed in the third area.
In this embodiment, the third area of the terminal add interface (e.g., the middle area of the add interface) is mainly used for displaying and deleting index information.
Specifically, after the terminal enters the adding interface, each piece of pre-stored index information is displayed in the third area. The pre-stored index information may be index information saved by the terminal's default settings, or index information saved by the user in a previous use.
Besides the pre-stored index information, the terminal also displays the index information added this time in the third area. That is, anything determined as index information at the current time point is displayed in the third area.
When a user makes a preset second gesture on index information displayed in the third area, the terminal deletes the index information from the third area; when the user triggers the save button in the add interface, the terminal saves each index information displayed in the third area.
For example: the index information pre-stored by the terminal includes "vocabulary A", "vocabulary B" and "vocabulary D", and the preset second gesture is a long press.
After the terminal enters the adding interface, "vocabulary A", "vocabulary B" and "vocabulary D" are displayed in the middle area. If the user determines "vocabulary C" as index information and adds it, the terminal also displays "vocabulary C" in the middle area; if the user long-presses "vocabulary A", the terminal deletes "vocabulary A" from the middle area; and if the index information displayed in the middle area is "vocabulary B", "vocabulary C" and "vocabulary D" when the user clicks the save button in the adding interface, the terminal saves "vocabulary B", "vocabulary C" and "vocabulary D".
An advantage of this embodiment is that displaying the added index information both prompts the user with what has been added and allows the user to further adjust and delete it, improving the efficiency with which the user manages index information.
It should be noted that this embodiment merely exemplarily illustrates one preferred case and should not limit the functions and application scope of the present disclosure. It can be understood that the second gesture may be a long press, a single click, a double click, a drag, a press exceeding a certain force, or the like; saving the displayed index information does not necessarily need to be performed by triggering a save button; nor does a save button necessarily exist in the third area.
The first area, the second area and the third area of the adding interface described above are not necessarily all present. In a specific embodiment, the three areas may exist simultaneously, or only one of them may exist. The possible combinations of the three areas are not enumerated here.
Fig. 8 illustrates a process of deleting keywords in the adding interface of the terminal in an embodiment of the present disclosure. In this embodiment, the words determined as keywords, namely "vocabulary A", "vocabulary B" and "vocabulary C", are displayed in the middle area of the adding interface. After the user double-clicks "vocabulary B", the terminal no longer determines "vocabulary B" as a keyword and deletes it from the middle area.
It should be noted that this embodiment only exemplarily illustrates the process of deleting keywords in the adding interface of the terminal and should not limit the functions and application scope of the present disclosure.
The specific implementation process of matching index information with target multimedia content and determining local multimedia content in the embodiments of the present disclosure is described in detail below.
In one embodiment, the index information is an index text in text form. Matching the index information with the target multimedia content, and determining local multimedia content corresponding to the index information in the target multimedia content based on a matching result, wherein the method comprises the following steps:
acquiring a target text contained in the target multimedia content;
matching the index text with the target text, and determining the text position of the index text in the target text;
determining a point in time of occurrence of the index text in the target multimedia content based on the text position;
based on the appearance time point, local multimedia content corresponding to the index text in the target multimedia content is determined.
In this embodiment, the index information acquired by the terminal is an index text in text form, and the target multimedia content directly contains target audio or target video together with a target text in text form (for example, a movie file with soft subtitles, where the subtitles are separate from the audio and video). The terminal determines the local multimedia content corresponding to the index text mainly through matching between texts.
Specifically, after acquiring the target text in the target multimedia content, the terminal matches the index text with the target text and determines the text position of the index text in the target text; it then determines the occurrence time point of the index text in the target multimedia content based on the text position. It can be appreciated that audio/video and text belonging to the same multimedia content are, in general, synchronized in time, so the occurrence time point of the index text in the target multimedia content can be located from the text position where the index text appears.
The local multimedia content corresponding to the index text is then determined based on the occurrence time point, so that the terminal can process the local multimedia content on this basis (the processing includes play-only processing, skipping and fast-forwarding).
For example: the movie file contains, in addition to audio data (the audio of the movie) and video data (the video of the movie), subtitle data (the subtitles of the movie).
The user presets an index text, the keyword "vocabulary A", in the terminal, and opens the terminal to play the movie file. The terminal extracts the subtitles of the movie, matches "vocabulary A" against the subtitles, and determines the position of "vocabulary A" in the subtitles: the subtitle portion at time "01:10:03" on the subtitle timeline. The occurrence time point of "vocabulary A" in the movie is therefore determined to be "01:10:03", and the movie content within "01:10:02-01:10:04" is determined as the local movie content to be processed.
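The soft-subtitle matching steps above can be sketched as follows, assuming subtitles are available as (start, end, text) cues with times in seconds; the cue format is an assumption for illustration, and a real player would parse, e.g., an SRT or WebVTT track.

```python
# Illustrative sketch: find the subtitle cue containing the index text, take
# its start time as the occurrence time point, and pad by one second on each
# side to get the local time interval to be processed.
def locate_local_interval(index_text, cues, pad=1.0):
    for start, end, text in cues:
        if index_text in text:
            occurrence = start  # occurrence time point of the index text
            return (occurrence - pad, occurrence + pad)
    return None

# 01:10:03 is 4203 seconds into the movie.
cues = [(4203.0, 4205.0, "so vocabulary A appears here"),
        (4206.0, 4208.0, "other dialogue")]
print(locate_local_interval("vocabulary A", cues))  # (4202.0, 4204.0), i.e. 01:10:02-01:10:04
```

The returned interval is what the terminal would then skip, fast-forward or otherwise process when playback reaches it.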
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and scope of use of the present disclosure.
In one embodiment, the index information is an index text in text form. Matching the index information with the target multimedia content, and determining local multimedia content corresponding to the index information in the target multimedia content based on a matching result, wherein the method comprises the following steps:
acquiring target audio contained in the target multimedia content;
acquiring a target text corresponding to the target audio based on a preset audio recognition technology;
matching the index text with the target text, and determining the text position of the index text in the target text;
determining a point in time of occurrence of the index text in the target multimedia content based on the text position;
based on the appearance time point, local multimedia content corresponding to the index text in the target multimedia content is determined.
In this embodiment, the index information acquired by the terminal is an index text in text form; the target multimedia content directly contains at least target audio and may not directly contain a target text in text form (e.g., a movie file without subtitles). The terminal determines the local multimedia content corresponding to the index text mainly through matching between text and audio.
Specifically, after the terminal acquires the target audio in the target multimedia content, it processes the target audio based on a preset audio recognition technology and converts it into a target text in text form; it matches the target text with the index text and determines the text position of the index text in the target text; it then determines the occurrence time point of the index text in the target multimedia content based on the text position, and further determines the local multimedia content corresponding to the index text based on the occurrence time point, so that the terminal can process the local multimedia content on this basis (the processing includes play-only processing, skipping and fast-forwarding).
For example: the movie file contains audio data (corresponding to the audio of the movie) and video data (corresponding to the video of the movie) of the movie.
The user presets an index text, the keyword "vocabulary A", in the terminal, and opens the terminal to play the movie file. The terminal extracts the audio of the movie and converts it into the corresponding text. It can be appreciated that the converted text generally corresponds to the subtitles of the movie. The terminal then matches "vocabulary A" against this text to determine its position, and hence the position of "vocabulary A" in the audio: the audio portion at time "01:10:03" on the audio timeline. The occurrence time point of "vocabulary A" in the movie is therefore determined to be "01:10:03", and the movie content within "01:10:02-01:10:04" is determined as the local movie content to be processed.
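The speech-recognition variant can be sketched as follows. `fake_transcribe` is a hypothetical stand-in for an ASR engine that returns word-level timestamps, not a real API; any recognizer providing (word, start time) pairs would slot in.

```python
# Illustrative sketch of the speech-recognition variant described above.
def fake_transcribe(audio):
    # Hypothetical stand-in output: (word, start_second) pairs around
    # 01:10:03 (4203 s into the movie).
    return [("so", 4202.6), ("vocabulary", 4203.0), ("A", 4203.4)]

def locate_by_speech(index_text, audio, pad=1.0):
    words = fake_transcribe(audio)
    transcript = " ".join(word for word, _ in words)
    if index_text not in transcript:
        return None
    first_word = index_text.split()[0]
    for word, start in words:
        if word == first_word:
            # Local interval around the occurrence time point.
            return (start - pad, start + pad)
    return None

print(locate_by_speech("vocabulary A", audio=None))  # (4202.0, 4204.0)
```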
An advantage of this embodiment is that, by converting the audio into a target text and determining the local multimedia content from the matching result between the index text and the target text, the method becomes applicable to target multimedia content without subtitles, expanding the scenes to which this embodiment applies.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and scope of use of the present disclosure.
In one embodiment, the index information is an index text in text form. Matching the index text with the target multimedia content, and determining local multimedia content corresponding to the index text in the target multimedia content based on a matching result, wherein the method comprises the following steps:
acquiring a target video contained in the target multimedia content;
acquiring image texts respectively contained in each image frame in the target video based on a preset image recognition technology;
respectively matching the index text with each image text, and determining the position of an image frame of the index text in the target video;
determining a point in time of occurrence of the index text in the target multimedia content based on the image frame position;
based on the appearance time point, local multimedia content corresponding to the index text in the target multimedia content is determined.
In this embodiment, the index information acquired by the terminal is an index text in text form; the target multimedia content directly contains a target video, and the text in the target multimedia content exists in image form as part of the target video (e.g., a movie file with hard subtitles, where the subtitles are part of the images in the video).
Specifically, after the terminal acquires a target video in the target multimedia content, processing the target video based on a preset image recognition technology, and extracting an image text contained in each image frame in the target video; matching the image text with the index text, and determining the position of an image frame of the index text in the target video; determining the occurrence time point of the index text in the target multimedia content based on the image frame position; and further determining the local multimedia content containing the index text based on the occurrence time point. So that the terminal can process the partial multimedia content on the basis of this (the process includes a play-only process, a skip process, and a fast forward process).
For example: the movie file contains audio data (corresponding to the audio of the movie) and video data (corresponding to the video of the movie) of the movie. Wherein the subtitles of the movie exist in the form of images as part of the video.
The user presets an index text, the keyword "vocabulary A", in the terminal, and opens the terminal to play the movie file. The terminal extracts the video of the movie and extracts the image text contained in each image frame; it then matches "vocabulary A" against the image texts and determines the image frame in which "vocabulary A" appears: the image frame at time "01:10:03" on the video timeline. The movie content within "01:10:02-01:10:04" is accordingly determined as the local movie content to be processed.
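The hard-subtitle variant can be sketched as follows. `fake_ocr` is a hypothetical stand-in for a per-frame image recognition (OCR) step, not a real API, and the frame rate is an assumption for the example.

```python
# Illustrative sketch of the hard-subtitle variant described above.
FRAME_RATE = 25.0  # assumed frames per second

def fake_ocr(frame_index):
    # Hypothetical stand-in output: frames around 01:10:03 (frame 105075 at
    # 25 fps) carry the burned-in subtitle.
    return "vocabulary A" if 105075 <= frame_index <= 105125 else ""

def locate_by_frame_text(index_text, frame_count, pad=1.0):
    for i in range(frame_count):
        if index_text in fake_ocr(i):
            occurrence = i / FRAME_RATE  # frame position -> occurrence time point
            return (occurrence - pad, occurrence + pad)
    return None

print(locate_by_frame_text("vocabulary A", frame_count=106000))  # (4202.0, 4204.0)
```

A production system would OCR sampled frames rather than every frame; the frame-position-to-time conversion is the step the embodiment relies on.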
An advantage of this embodiment is that, by extracting the image text from the image frames and determining the local multimedia content from the matching result between the index text and the image text, the method becomes applicable to target multimedia content with hard subtitles and the like, further expanding the scenes to which this embodiment applies.
Fig. 9 illustrates a multimedia presentation interface with text displayed in a video screen in an embodiment of the present disclosure. Referring to fig. 9, in this embodiment, text is displayed in the form of images as part of a video in a multimedia presentation interface. Therefore, the terminal can match the image text with the index text by extracting the image text in the image frame, and further determine the local multimedia content corresponding to the index text.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and scope of use of the present disclosure.
In one embodiment, determining local multimedia content corresponding to the index text in the target multimedia content based on the occurrence time point includes:
acquiring a preset time interval length;
determining the target multimedia content within a local time interval as the local multimedia content, wherein the local time interval is a time interval of the preset time interval length centered on the occurrence time point.
In this embodiment, according to the occurrence time point of the index information in the target multimedia content, the terminal locates the local multimedia content containing the index information to a window of the preset time interval length, so that the local multimedia content of that length can then be skipped. For example: the preset time interval length is 2 seconds. If the occurrence time point of the index information in the movie is determined to be "01:10:03", the movie content within "01:10:02-01:10:04" is determined as the local movie content to be processed.
In this embodiment, the time interval length may be set by default by the terminal, or may be set by user definition.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and scope of use of the present disclosure.
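The interval rule above can be written as a small sketch: center a window of the preset length on the occurrence time point. The clamping to the content's duration is an assumption added for edge cases near the start or end of the content.

```python
# Illustrative sketch of the local time interval rule described above.
def local_interval(occurrence, interval_length, duration):
    half = interval_length / 2.0
    # Center the window on the occurrence time point, clamped to [0, duration].
    return (max(0.0, occurrence - half), min(duration, occurrence + half))

# 01:10:03 = 4203 s; a 2-second window gives 01:10:02-01:10:04.
print(local_interval(4203.0, 2.0, duration=7200.0))  # (4202.0, 4204.0)
```

Whether `interval_length` comes from the terminal's defaults or from a user setting, only this one parameter changes.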
In one embodiment, the index information is index audio in audio form. Matching the index information with the target multimedia content, and determining local multimedia content corresponding to the index information in the target multimedia content based on a matching result, wherein the method comprises the following steps:
acquiring target audio contained in the target multimedia content;
determining an audio position of the index audio in the target audio based on a preset audio matching technology;
determining a point in time of occurrence of the index audio in the target multimedia content based on the audio position;
based on the occurrence time point, local multimedia content corresponding to the index audio among the target multimedia content is determined.
In this embodiment, the index information acquired by the terminal is index audio in audio form; the target multimedia content at least comprises target audio. The terminal determines the local multimedia content corresponding to the index audio mainly according to the matching between the audio.
Specifically, after the terminal acquires the target audio in the target multimedia content, matching the target audio with the index audio, and determining the audio position of the index audio in the target audio; further determining a point in time of occurrence of the index audio in the target multimedia content based on the audio position; and further determining the local multimedia content containing the index audio based on the occurrence time point. So that the terminal can process the partial multimedia content on the basis of this (the process includes a play-only process, a skip process, and a fast forward process).
For example: the movie file contains audio data (corresponding to the audio of the movie) and video data (corresponding to the video of the movie) of the movie.
The user presets the index audio "audio G" in the terminal and opens the terminal to play the movie file. The terminal extracts the audio of the movie and matches "audio G" against it, determining the position of "audio G" in the movie audio: the audio portion in the period "01:10:03-01:10:05" on the audio timeline. The occurrence period of "audio G" in the movie is therefore determined to be "01:10:03-01:10:05", and the movie content within "01:10:02-01:10:07" is determined as the local movie content to be processed.
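Matching between audio signals can be sketched by sliding the index audio's per-frame features over the target's and taking the best-aligned offset. Plain energy values stand in for real audio fingerprints here; the feature choice is an assumption for illustration.

```python
# Illustrative sketch of audio-to-audio matching via a sliding window over
# per-frame feature vectors.
def find_audio_offset(index_feats, target_feats):
    n = len(index_feats)
    best_offset, best_dist = None, float("inf")
    for off in range(len(target_feats) - n + 1):
        # Squared distance between the index and this window of the target.
        dist = sum((a - b) ** 2 for a, b in zip(index_feats, target_feats[off:off + n]))
        if dist < best_dist:
            best_offset, best_dist = off, dist
    return best_offset  # frame offset; divide by the frame rate for seconds

target = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1]
index = [0.9, 0.8, 0.7]
print(find_audio_offset(index, target))  # 2
```

Production systems use robust fingerprints (e.g., spectrogram peak hashes) instead of raw energies, but the offset-to-time-point step is the same.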
The advantage of this embodiment is that the local multimedia content is determined by matching between the audio frequencies, so that the language-level content to be processed can be located more accurately.
It should be noted that this embodiment is only an exemplary illustration and should not limit the functions and scope of use of the present disclosure. It can be understood that the audio position of the index audio in the target audio may be determined by directly matching the audio features of the index audio against the target audio, or by converting both the index audio and the target audio into corresponding texts and then matching the two texts.
In one embodiment, the index information is an index image in the form of an image. Matching the index information with the target multimedia content, and determining local multimedia content corresponding to the index information in the target multimedia content based on a matching result, wherein the method comprises the following steps:
acquiring a target video contained in the target multimedia content;
determining the position of an image frame of the index image in the target video based on a preset image matching technology;
determining a point in time of occurrence of the index image in the target multimedia content based on the image frame position;
based on the appearance time point, local multimedia content corresponding to the index image among the target multimedia content is determined.
In this embodiment, the index information acquired by the terminal is an index image in the form of an image; the target multimedia content at least comprises target video. The terminal determines the local multimedia content corresponding to the index image mainly according to the matching between the images. The index image may be a photo image, a screen capturing image, or an expression image (e.g., emoji expression).
Specifically, after the terminal acquires the target video in the target multimedia content, it matches the target video with the index image and determines the image frame position of the index image in the target video; it then determines the occurrence time point of the index image in the target multimedia content based on the image frame position, and further determines the local multimedia content containing the index image based on the occurrence time point, so that the terminal can process the local multimedia content on this basis (the processing includes play-only processing, skipping and fast-forwarding).
For example: the movie file contains audio data (corresponding to the audio of the movie) and video data (corresponding to the video of the movie) of the movie.
The user presets an index image "image H" in the terminal, and opens the terminal to play the movie file. The terminal extracts the video of the movie, matches "image H" against the video of the movie, and determines the image frame in which "image H" appears, which is located at "01:10:03" on the video timeline; further, it is determined that the occurrence time point of "image H" in the movie is "01:10:03"; further, the movie content during "01:10:02-01:10:04" is determined as the local multimedia content to be processed.
An advantage of this embodiment is that the local multimedia content is determined by matching between images, thereby more accurately locating the content to be processed at the image level.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and scope of use of the present disclosure. It can be understood that the image frame position of the index image in the target video can be determined by directly matching the image features of the index image against the target video; or by first extracting text from the index image and from each image frame in the target video, and then matching the text in the index image against the text in each image frame; or, when the index image is an expression image, by recognizing, based on an expression recognition technology, the specific expression represented by the expression image and the expressions contained in each image frame of the target video, so that the image frames containing the specific expression in the target video are determined as the image frame positions of the expression image in the target video.
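The frame-matching step described above can be sketched as follows. This is a minimal illustration only: frames are modeled as small grayscale 2D lists, and `average_hash`, `hamming`, and `locate_index_image` are hypothetical helper names rather than part of the disclosed method; a real implementation would decode actual video frames and use a robust perceptual hash from an image library.

```python
def average_hash(frame):
    """Hash a grayscale frame: one bit per pixel, set if above the frame mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)

def hamming(h1, h2):
    """Number of differing bits between two hashes."""
    return sum(a != b for a, b in zip(h1, h2))

def locate_index_image(index_frame, video_frames, fps, max_distance=2):
    """Return the occurrence time point (in seconds) of the best-matching
    video frame, or None if no frame is close enough to the index image."""
    target_hash = average_hash(index_frame)
    best = None
    for i, frame in enumerate(video_frames):
        d = hamming(target_hash, average_hash(frame))
        if d <= max_distance and (best is None or d < best[1]):
            best = (i, d)
    return best[0] / fps if best else None
```

Dividing the matched frame index by the frame rate yields the occurrence time point, from which the local multimedia content is then delimited.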
Specific implementation procedures for presenting target multimedia contents in the embodiments of the present disclosure are described in detail below.
In one embodiment, audio and video in the local multimedia content are skipped when the target multimedia content is presented.
In this embodiment, the terminal screens the multimedia content by skipping the audio and video. Specifically, when the terminal presents the target multimedia content, the audio and video in the local multimedia content containing the index information are skipped. For example: the index information is the keyword "vocabulary A", and the terminal determines that the time period during which "vocabulary A" appears in the movie is "01:10:02-01:10:04". When the terminal plays the movie to "01:10:02", it skips the audio and video during "01:10:02-01:10:04" and continues playing the movie directly from "01:10:04".
The embodiment has the advantage that the index information is screened out to the greatest extent by skipping the audio-video containing the index information.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and scope of use of the present disclosure.
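The skip processing above amounts to a small timeline lookup at playback time. The helper name `resolve_position` and the representation of marked periods as (start, end) pairs in seconds are assumptions for illustration ("01:10:02" corresponds to 4202 seconds):

```python
def resolve_position(t, skip_periods):
    """Given the current playback position t (seconds) and the marked time
    periods containing index information, return the position playback
    should actually continue from."""
    for start, end in sorted(skip_periods):
        if start <= t < end:
            return end  # jump directly to the end point of the marked period
    return t
```

A player polling `resolve_position` at each tick would jump from 4202 straight to 4204, matching the "vocabulary A" example.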
In one embodiment, the audio of the local multimedia content is skipped when the target multimedia content is presented.
In this embodiment, the terminal screens the multimedia content by muting. Specifically, the terminal skips the audio in the local multimedia content containing the index information when presenting the target multimedia content. For example: the index information is the keyword "vocabulary A", and the terminal determines that the time period during which "vocabulary A" appears in the movie is "01:10:02-01:10:04". When the terminal plays the movie to "01:10:02-01:10:04", the audio of that portion is muted while the video remains unchanged; that is, during "01:10:02-01:10:04", only the picture is displayed and no sound is played.
The embodiment has the advantage that the audio containing the index information is skipped, so that the integrity of the visual information of the target multimedia content is ensured while the language information corresponding to the index information is screened out.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and scope of use of the present disclosure.
In one embodiment, the video of the local multimedia content is skipped when the target multimedia content is presented.
In this embodiment, the terminal screens the multimedia content by means of a black screen. Specifically, when the terminal presents the target multimedia content, the video in the local multimedia content containing the index information is skipped. For example: the index information is the keyword "vocabulary A", and the terminal determines that the time period during which "vocabulary A" appears in the movie is "01:10:02-01:10:04". When the terminal plays the movie to "01:10:02-01:10:04", the video of that portion is replaced with a black screen while the audio remains unchanged; that is, during "01:10:02-01:10:04", there is only sound and no picture.
The embodiment has the advantage that the video containing the index information is skipped, so that the visual information corresponding to the index information is screened out, and the integrity of the language information of the target multimedia content is ensured.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and scope of use of the present disclosure. The method for skipping the multimedia content is not limited to the black screen processing method, but can be other methods, such as advertisement displaying, scenario related content displaying, and the like.
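The mute and black-screen embodiments above can both be sketched as per-track suppression at a single playback instant. The sample and frame models and the `present_instant` helper are hypothetical simplifications, not the disclosed implementation:

```python
def present_instant(t, audio_samples, video_frame, period, mode):
    """For playback instant t, suppress one track if t falls inside the
    marked period: 'mute' zeroes the audio samples, 'black_screen' zeroes
    the video pixels; outside the period both tracks pass through unchanged."""
    start, end = period
    if start <= t < end:
        if mode == "mute":
            audio_samples = [0] * len(audio_samples)
        elif mode == "black_screen":
            video_frame = [[0] * len(row) for row in video_frame]
    return audio_samples, video_frame
```

The untouched track is returned as-is, which is what preserves the visual information (mute) or the language information (black screen).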
In one embodiment, the audio and video in the local multimedia content is fast forwarded while the target multimedia content is presented.
In this embodiment, the terminal screens the multimedia content by fast-forwarding the audio and video. Specifically, when the terminal presents the target multimedia content, the terminal synchronously fast forwards the audio and the video in the local multimedia content containing the index information. For example: the index information is the keyword "vocabulary A", and the terminal determines that the time period during which "vocabulary A" appears in the movie is "01:10:02-01:10:04". When the terminal plays the movie to "01:10:02", the playing speed of the audio and video is increased to four times normal speed, so that the audio and video during "01:10:02-01:10:04", which would normally take 2 seconds to play, takes only 0.5 seconds; the movie then resumes playing at normal speed.
The embodiment has the advantage that the continuity of the audio and video in the target multimedia content is maintained to a certain extent while the audio and video containing the index information are screened out.
In one embodiment, the audio of the local multimedia content is fast-forwarded while the target multimedia content is presented.
In this embodiment, the terminal screens the multimedia content by fast-forwarding the audio. Specifically, when the terminal presents the target multimedia content, the terminal fast forwards the audio in the local multimedia content containing the index information.
For example: the terminal can be connected with the audio sharing platform and the video sharing platform at the same time. The terminal acquires broadcast audio from the audio sharing platform and acquires handwriting video from the video sharing platform, wherein information contained in the broadcast audio and information contained in the handwriting video can be irrelevant; the terminal encapsulates the broadcast audio and the handwriting video to obtain encapsulated audio and video; when the terminal presents the packaged audio and video to the user, the user can watch the calligraphic video while listening to the broadcast audio.
The index information is the keyword "vocabulary A", and the terminal determines that the time period during which "vocabulary A" appears in the encapsulated audio and video is "01:10:02-01:10:04". When the terminal plays the encapsulated audio and video to "01:10:02", the playing speed of the broadcast audio is increased to four times normal speed, so that the broadcast audio during "01:10:02-01:10:04", which would normally take 2 seconds to play, takes only 0.5 seconds, after which the broadcast audio resumes playing at normal speed; during this process, the playing speed of the calligraphy video is unchanged.
This embodiment has the advantage that the audio continuity in the target multimedia content is maintained to a certain extent while the audio containing the index information is filtered out.
In one embodiment, the video of the local multimedia content is fast-forwarded while the target multimedia content is presented.
In this embodiment, the terminal screens the multimedia content by fast-forwarding the video. Specifically, when the terminal presents the target multimedia content, the terminal fast forwards the video in the local multimedia content containing the index information.
For example: the terminal can be connected with the audio sharing platform and the video sharing platform at the same time. The terminal acquires broadcast audio from the audio sharing platform and acquires handwriting video from the video sharing platform, wherein information contained in the broadcast audio and information contained in the handwriting video can be irrelevant; the terminal encapsulates the broadcast audio and the handwriting video to obtain encapsulated audio and video; when the terminal presents the packaged audio and video to the user, the user can watch the calligraphic video while listening to the broadcast audio.
The index information is the keyword "vocabulary A", and the terminal determines that the time period during which "vocabulary A" appears in the encapsulated audio and video is "01:10:02-01:10:04". When the terminal plays the encapsulated audio and video to "01:10:02", the playing speed of the calligraphy video is increased to four times normal speed, so that the calligraphy video during "01:10:02-01:10:04", which would normally take 2 seconds to play, takes only 0.5 seconds, after which the calligraphy video resumes playing at normal speed; during this process, the playing speed of the broadcast audio is unchanged.
An advantage of this embodiment is that the continuity of the video in the target multimedia content is maintained to some extent while the video containing the index information is filtered out.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and scope of use of the present disclosure.
In one embodiment, when the target multimedia content is presented, only the audio and video in the local multimedia content is played.
In this embodiment, the terminal screens the multimedia content by playing only the audio and video in the local multimedia content. Specifically, when the terminal presents the target multimedia content, only the audio and video in the local multimedia content containing the index information are played. For example: the index information is the keyword "vocabulary A", and the terminal determines that the time period during which "vocabulary A" appears in the movie is "01:10:02-01:10:04". When playing the movie, the terminal plays only the audio and video during "01:10:02-01:10:04".
The embodiment has the advantage that the index information is screened out to the greatest extent by playing only the audio/video containing the index information.
In one embodiment, when the target multimedia content is presented, only audio in the local multimedia content is played.
In this embodiment, the terminal screens the multimedia content by playing only the audio in the local multimedia content. Specifically, when the terminal presents the target multimedia content, only the audio in the local multimedia content containing the index information is played. For example: the index information is the keyword "vocabulary A", and the terminal determines that the time period during which "vocabulary A" appears in the movie is "01:10:02-01:10:04". When playing the movie, the terminal plays only the audio during "01:10:02-01:10:04".
This embodiment has the advantage that the language information of the index information is highlighted to the greatest extent by playing only the audio containing the index information.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and scope of use of the present disclosure.
In one embodiment, when the target multimedia content is presented, only the video in the local multimedia content is played.
In this embodiment, the terminal screens the multimedia content by playing only the video in the local multimedia content. Specifically, when the terminal presents the target multimedia content, only the video in the local multimedia content containing the index information is played. For example: the index information is the keyword "vocabulary A", and the terminal determines that the time period during which "vocabulary A" appears in the movie is "01:10:02-01:10:04". When playing the movie, the terminal plays only the video during "01:10:02-01:10:04".
This embodiment has the advantage that the visual information of the index information is highlighted to the greatest extent by playing only the video containing the index information.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and scope of use of the present disclosure.
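The play-only embodiments above all reduce to keeping only the marked periods on the timeline. A minimal sketch, with `play_only_segments` as an assumed helper that also clamps to the content duration and merges overlapping periods:

```python
def play_only_segments(duration, periods):
    """Return the list of (start, end) segments to play: the marked periods
    clamped to [0, duration], with overlapping periods merged."""
    kept = []
    for start, end in sorted(periods):
        start, end = max(0, start), min(end, duration)
        if start >= end:
            continue  # period lies entirely outside the content
        if kept and start <= kept[-1][1]:
            kept[-1] = (kept[-1][0], max(kept[-1][1], end))  # merge overlap
        else:
            kept.append((start, end))
    return kept
```

The player would then seek from one kept segment to the next and present nothing in between.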
It should be noted that, in the embodiment of the present disclosure, the processing of the local multimedia content when the terminal presents the target multimedia content is not limited to only play processing, skip processing, and fast forward processing, but may also include other processing modes capable of suppressing the audio or video expression information. For example: mixing and filtering.
Specifically, in one embodiment, for processing that suppresses audio expression information: the terminal presets a segment of ringtone audio for mixing. After determining the local multimedia content corresponding to the index information, the terminal mixes the ringtone into the audio of the local multimedia content when presenting the target multimedia content, so that the user is interfered with by the ringtone while hearing the audio of the local multimedia content. In this way, the audio expression information in the local multimedia content is suppressed, thereby suppressing the influence of the audio containing the index information.
For processing that suppresses video expression information: after determining the local multimedia content corresponding to the index information, the terminal adds a preset filter to the video in the local multimedia content when presenting the target multimedia content, so that the user is interfered with by the filter when watching the video in the local multimedia content. In this way, the video expression information in the local multimedia content is suppressed, thereby suppressing the influence of the video containing the index information.
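The mixing suppression can be sketched as sample-wise addition with clipping. The 16-bit signed sample range and the looping of a shorter ringtone are assumptions for illustration, not details given in the embodiment:

```python
def mix(audio, ringtone):
    """Mix a preset ringtone into the local content's audio samples,
    clipping the result to the 16-bit signed sample range."""
    mixed = []
    for i, sample in enumerate(audio):
        s = sample + ringtone[i % len(ringtone)]  # loop the ringtone if shorter
        mixed.append(max(-32768, min(32767, s)))
    return mixed
```

The result is audio in which the original language information is still present but masked by the interfering ringtone.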
Fig. 10 shows a complete flow chart of user interaction with a terminal according to an embodiment of the present disclosure.
In this embodiment, in a movie played by a terminal, subtitles exist in the form of images as a part of video.
Before the user watches the film, one or more keywords to be skipped are input into the terminal in advance, and then the film is selected for watching.
The terminal starts playing the movie, and in the playing process, data for a period of time are preloaded; and the terminal detects the preloaded data to determine whether the keyword appears in the preloaded data. There are two ways of detection: 1. detecting voiceprint data of the pre-load data in a voiceprint matching mode, and determining whether the keywords appear in the pre-load data by matching the voiceprints with the keywords; 2. image frame data of the preloaded data is detected in a manner of image frame matching, and whether keywords appear in the preloaded data is determined by matching subtitles and keywords in the image frames.
If the keywords are detected in the preloaded data, marking the time period of occurrence of the matched keywords, and recording the time period in the terminal; thus, when the terminal plays to the marked time period, the terminal skips, and the movie is played directly from the end point of the time period.
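The detection-and-marking flow above can be sketched as follows, modeling the preloaded data as (timestamp, recognized text) entries; the recognition itself (voiceprint matching or subtitle matching in image frames) is assumed to be done elsewhere, and the helper name `mark_periods` and the fixed `padding` around each hit are illustrative assumptions:

```python
def mark_periods(entries, keywords, padding=1.0):
    """Scan preloaded (timestamp, recognized_text) entries for keywords and
    record the time periods to skip, merging periods that run together."""
    periods = []
    for t, text in entries:
        if any(k in text for k in keywords):
            start, end = t - padding, t + padding
            if periods and start <= periods[-1][1]:
                periods[-1] = (periods[-1][0], end)  # extend the last period
            else:
                periods.append((start, end))
    return periods
```

When playback reaches a recorded period, the player skips to its end point, as described above.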
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and scope of use of the present disclosure.
There is also provided, in accordance with an embodiment of the present disclosure, as shown in fig. 11, a data processing apparatus including:
a first acquisition module 410 configured to acquire target multimedia content;
a second acquisition module 420 configured to acquire index information for indexing the multimedia content;
a determining module 430 configured to match the index information with the target multimedia content, and determine local multimedia content corresponding to the index information among the target multimedia content based on a matching result;
a processing module 440 configured to process the local multimedia content when presenting the target multimedia content, wherein processing the local multimedia content comprises playing only the local multimedia content, or fast forwarding or skipping the local multimedia content.
In an exemplary embodiment of the disclosure, the index information is an index text in text form, and the apparatus is configured to:
acquiring a target text contained in the target multimedia content;
matching the index text with the target text, and determining the text position of the index text in the target text;
determining a point in time of occurrence of the index text in the target multimedia content based on the text position;
and determining local multimedia content corresponding to the index text in the target multimedia content based on the occurrence time point.
In an exemplary embodiment of the disclosure, the index information is an index text in text form, and the apparatus is configured to:
acquiring target audio contained in the target multimedia content;
acquiring a target text corresponding to the target audio based on a preset audio recognition technology;
matching the index text with the target text, and determining the text position of the index text in the target text;
determining a point in time of occurrence of the index text in the target multimedia content based on the text position;
And determining local multimedia content corresponding to the index text in the target multimedia content based on the occurrence time point.
In an exemplary embodiment of the disclosure, the index information is an index text in text form, and the apparatus is configured to:
acquiring a target video contained in the target multimedia content;
acquiring image texts respectively contained in each image frame in the target video based on a preset image recognition technology;
respectively matching the index text with each image text, and determining the position of an image frame of the index text in the target video;
determining a point in time of occurrence of the index text in the target multimedia content based on the image frame position;
and determining local multimedia content corresponding to the index text in the target multimedia content based on the occurrence time point.
In an exemplary embodiment of the present disclosure, the apparatus is configured to:
acquiring a preset time interval length;
and determining the target multimedia content in a local time interval as the local multimedia content, wherein the local time interval is a time interval with the length of the time interval and the center of the local time interval is the appearance time point.
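The centered local time interval described here can be sketched directly; clamping the interval to the bounds of the content is an added assumption not stated in the embodiment:

```python
def local_interval(occurrence_point, interval_length, duration):
    """Return the local time interval: a span of the preset length centered
    on the occurrence time point, clamped to [0, duration]."""
    half = interval_length / 2
    start = max(0.0, occurrence_point - half)
    end = min(duration, occurrence_point + half)
    return start, end
```

For an occurrence point at 4203 seconds and a 2-second preset length, this yields the (4202, 4204) interval used in the earlier movie examples.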
In an exemplary embodiment of the present disclosure, the index information is index audio in audio form, and the apparatus is configured to:
acquiring target audio contained in the target multimedia content;
determining an audio position of the index audio in the target audio based on a preset audio matching technology;
determining a point in time of occurrence of the index audio in the target multimedia content based on the audio position;
and determining local multimedia content corresponding to the index audio in the target multimedia content based on the occurrence time point.
In an exemplary embodiment of the disclosure, the index information is an index image in the form of an image, and the apparatus is configured to:
acquiring a target video contained in the target multimedia content;
determining the position of an image frame of the index image in the target video based on a preset image matching technology;
determining a point in time of occurrence of the index image in the target multimedia content based on the image frame position;
and determining local multimedia content corresponding to the index image in the target multimedia content based on the occurrence time point.
In an exemplary embodiment of the present disclosure, the apparatus is configured to: and obtaining pre-stored index information.
In an exemplary embodiment of the present disclosure, the apparatus is configured to:
when the trigger adding button is detected, an adding interface for adding index information is entered;
and obtaining the index information added in the adding interface.
In an exemplary embodiment of the present disclosure, an information input box is displayed in a first area of the add interface, and the apparatus is configured to: and determining the text input in the information input box as the added index information, and acquiring the added index information.
In an exemplary embodiment of the disclosure, each candidate index information is displayed in the second area of the add interface, and the apparatus is configured to: when a first gesture triggering a preset on a candidate index information in the second area is detected, the candidate index information is determined to be the added index information, and the added index information is acquired.
In an exemplary embodiment of the disclosure, each pre-saved index information is displayed in a third area of the add interface, and the apparatus is configured to:
Displaying the added index information in the third area;
when a second preset gesture is detected to be triggered on the displayed index information in the third area, deleting the displayed index information from the third area;
and when the triggering of the save button in the adding interface is detected, saving the index information displayed in each third area.
The data processing electronic device 50 according to an embodiment of the present disclosure is described below with reference to fig. 12. The data processing electronic device 50 shown in fig. 12 is only one example and should not be taken as limiting the functionality and scope of use of the disclosed embodiments.
As shown in FIG. 12, the data processing electronic device 50 is embodied in the form of a general purpose computing device. Components of the data processing electronic device 50 may include, but are not limited to: the at least one processing unit 510, the at least one memory unit 520, and a bus 530 connecting the various system components (including the memory unit 520 and the processing unit 510).
Wherein the storage unit stores program code that is executable by the processing unit 510 such that the processing unit 510 performs the steps according to various exemplary embodiments of the present invention described in the description of the exemplary methods described above in this specification. For example, the processing unit 510 may perform the various steps as shown in fig. 3.
The storage unit 520 may include readable media in the form of volatile storage units, such as Random Access Memory (RAM) 5201 and/or cache memory unit 5202, and may further include Read Only Memory (ROM) 5203.
The storage unit 520 may also include a program/utility 5204 having a set (at least one) of program modules 5205, such program modules 5205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 530 may be one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The data processing electronic device 50 may also communicate with one or more external devices 600 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the data processing electronic device 50, and/or with any device (e.g., router, modem, etc.) that enables the data processing electronic device 50 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 550. An input/output (I/O) interface 550 is connected to the display unit 540. Also, data processing electronics 50 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via network adapter 560. As shown, network adapter 560 communicates with other modules of data processing electronic device 50 over bus 530. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with data processing electronics 50, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a terminal, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon computer-readable instructions, which, when executed by a processor of a computer, cause the computer to perform the method described in the method embodiment section above.
According to an embodiment of the present disclosure, there is also provided a program product for implementing the method in the above-described method embodiment, which may employ a portable compact disc read only memory (CD-ROM) and comprise program code, and may run on a server or a terminal or the like. However, the program product of the present invention is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Furthermore, although the steps of the methods in the present disclosure are depicted in a particular order in the drawings, this does not require or imply that the steps must be performed in that particular order or that all illustrated steps be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a terminal, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any adaptations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (12)

1. A method of data processing, the method comprising:
acquiring target multimedia content;
when triggering of an add button is detected, entering an adding interface for adding index information; and acquiring index information added in the adding interface;
matching the index information with the target multimedia content, and determining local multimedia content corresponding to the index information in the target multimedia content based on a matching result;
processing the local multimedia content when the target multimedia content is presented, wherein processing the local multimedia content includes playing only the local multimedia content, or fast forwarding or skipping the local multimedia content;
wherein an information input box is displayed in a first area of the adding interface, and acquiring the index information added in the adding interface comprises:
when the information input box is used for a user to input text, determining the text input by the user through the information input box as the index information and acquiring the index information; when the information input box is used for a user to input audio, determining the audio input by the user through the information input box as the index information and acquiring the index information; when the information input box is used for a user to input an image, the image input by the user through the information input box is determined as the index information and the index information is acquired.
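To make the claimed flow concrete, here is a minimal, hypothetical Python sketch of what claim 1 describes: locate a local segment of the target content from a piece of index information, then either play only that segment or skip it during playback. This is not the patented implementation; the word-level transcript format, the fixed half-window, and all names are illustrative assumptions.

```python
# Illustrative sketch only: match index text against a timed transcript,
# derive a local segment, and build a playback plan around it.

def find_local_segment(transcript, index_text, half_window=5.0):
    """transcript: list of (word, time_in_seconds) pairs.
    Returns (start, end) of a segment centred on the first occurrence
    of index_text, or None if the index text never appears."""
    for word, t in transcript:
        if word == index_text:
            return (max(0.0, t - half_window), t + half_window)
    return None

def playback_plan(duration, segment, mode="play_only"):
    """Return the list of (start, end) ranges to play at normal speed."""
    if segment is None:
        return [(0.0, duration)]       # nothing matched: play everything
    start, end = segment
    if mode == "play_only":            # play only the local content
        return [(start, min(end, duration))]
    if mode == "skip":                 # skip (or fast-forward) the local content
        return [(0.0, start), (min(end, duration), duration)]
    raise ValueError(mode)

transcript = [("intro", 1.0), ("goal", 12.0), ("summary", 58.0)]
seg = find_local_segment(transcript, "goal")
print(seg)                             # (7.0, 17.0)
print(playback_plan(60.0, seg, "skip"))
```

A real system would replace the word list with subtitle, speech-recognition, or OCR output, as the dependent claims describe.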
2. The method of claim 1, wherein the index information is an index text in the form of text,
matching the index information with the target multimedia content, and determining local multimedia content corresponding to the index information in the target multimedia content based on a matching result, wherein the method comprises the following steps:
acquiring a target text contained in the target multimedia content;
matching the index text with the target text, and determining the text position of the index text in the target text;
determining a point in time of occurrence of the index text in the target multimedia content based on the text position;
and determining local multimedia content corresponding to the index text in the target multimedia content based on the occurrence time point.
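The text-matching route in claim 2 can be sketched as follows: concatenate timed subtitle cues into a single target text, find the index text's character position in it, and map that position back to the time point of the cue it falls in. The cue format and all names are hypothetical assumptions, not the patented method.

```python
# Illustrative sketch: map a text position in concatenated subtitle text
# back to an occurrence time point.

def build_text_with_offsets(cues):
    """cues: list of (start_time, text). Returns (full_text, offsets) where
    offsets[i] is the character position at which cue i begins."""
    parts, offsets, pos = [], [], 0
    for _, text in cues:
        offsets.append(pos)
        parts.append(text)
        pos += len(text) + 1           # +1 for the joining space
    return " ".join(parts), offsets

def occurrence_time(cues, index_text):
    full, offsets = build_text_with_offsets(cues)
    pos = full.find(index_text)        # text position of the index text
    if pos < 0:
        return None
    # the cue with the largest offset not exceeding pos supplies the time point
    for (start, _), off in reversed(list(zip(cues, offsets))):
        if off <= pos:
            return start
    return None

cues = [(0.0, "hello world"), (4.0, "the demo begins"), (9.0, "closing remarks")]
print(occurrence_time(cues, "demo"))   # 4.0
```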
3. The method of claim 1, wherein the index information is an index text in the form of text,
matching the index information with the target multimedia content, and determining local multimedia content corresponding to the index information in the target multimedia content based on a matching result, wherein the method comprises the following steps:
acquiring target audio contained in the target multimedia content;
acquiring a target text corresponding to the target audio based on a preset audio recognition technology;
matching the index text with the target text, and determining the text position of the index text in the target text;
determining a point in time of occurrence of the index text in the target multimedia content based on the text position;
and determining local multimedia content corresponding to the index text in the target multimedia content based on the occurrence time point.
4. The method of claim 1, wherein the index information is an index text in the form of text,
matching the index text with the target multimedia content, and determining local multimedia content corresponding to the index text in the target multimedia content based on a matching result, wherein the method comprises the following steps:
acquiring a target video contained in the target multimedia content;
acquiring image texts respectively contained in each image frame in the target video based on a preset image recognition technology;
respectively matching the index text with each image text, and determining the position of an image frame of the index text in the target video;
determining a point in time of occurrence of the index text in the target multimedia content based on the image frame position;
and determining local multimedia content corresponding to the index text in the target multimedia content based on the occurrence time point.
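Claim 4's image route can be sketched in the same spirit: assuming OCR text is already available per frame, locate the first frame whose text contains the index text, then convert that frame position to a time point via the frame rate. The per-frame text list and the names are illustrative assumptions.

```python
# Illustrative sketch: frame position of an index text via per-frame OCR
# output, converted to a time point.

def frame_of_index_text(frame_texts, index_text):
    """frame_texts: list of OCR strings, one per frame.
    Returns the first matching frame index, or -1 if absent."""
    for i, text in enumerate(frame_texts):
        if index_text in text:
            return i
    return -1

def time_of_frame(frame_index, fps):
    """Convert a frame position to a point in time, in seconds."""
    return frame_index / fps

frames = ["", "", "chapter 1", "chapter 1", "chapter 2"]
idx = frame_of_index_text(frames, "chapter 2")   # 4
print(time_of_frame(idx, 25.0))
```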
5. The method according to any one of claims 2 to 4, wherein determining local multimedia content corresponding to the index text among the target multimedia content based on the occurrence time point includes:
acquiring a preset time interval length;
and determining the target multimedia content within a local time interval as the local multimedia content, wherein the local time interval is a time interval having the preset time interval length and centered on the occurrence time point.
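The interval construction in claim 5 is simple enough to state directly: a window of the preset length centered on the occurrence time point. In this hypothetical sketch the clamping to the content's bounds is an added assumption, not something the claim specifies.

```python
# Illustrative sketch: preset-length interval centered on the occurrence
# time point, clamped to [0, total_duration] (clamping is an assumption).

def local_interval(occurrence_time, interval_length, total_duration):
    half = interval_length / 2.0
    start = max(0.0, occurrence_time - half)
    end = min(total_duration, occurrence_time + half)
    return start, end

print(local_interval(30.0, 10.0, 60.0))  # (25.0, 35.0)
print(local_interval(2.0, 10.0, 60.0))   # (0.0, 7.0), clamped at the start
```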
6. The method of claim 1, wherein the index information is index audio in audio form,
matching the index information with the target multimedia content, and determining local multimedia content corresponding to the index information in the target multimedia content based on a matching result, wherein the method comprises the following steps:
acquiring target audio contained in the target multimedia content;
determining an audio position of the index audio in the target audio based on a preset audio matching technology;
determining a point in time of occurrence of the index audio in the target multimedia content based on the audio position;
and determining local multimedia content corresponding to the index audio in the target multimedia content based on the occurrence time point.
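For claim 6, a toy stand-in for the "preset audio matching technology" illustrates the position-to-time conversion: slide the index clip over the target samples, pick the offset with the smallest difference, and divide by the sample rate. Production systems would use audio fingerprinting or cross-correlation; everything here is an illustrative assumption.

```python
# Illustrative sketch: locate an index-audio clip inside target audio by
# sliding-window sum of absolute differences over raw samples.

def audio_position(target, clip):
    """Return the sample offset at which clip best matches target,
    or -1 if clip is longer than target."""
    n, m = len(target), len(clip)
    if m > n:
        return -1
    best_off, best_err = -1, float("inf")
    for off in range(n - m + 1):
        err = sum(abs(target[off + i] - clip[i]) for i in range(m))
        if err < best_err:
            best_off, best_err = off, err
    return best_off

def time_point(sample_offset, sample_rate):
    """Convert a sample offset to a point in time, in seconds."""
    return sample_offset / sample_rate

target = [0, 0, 1, 3, 2, 0, 0]
clip = [1, 3, 2]
off = audio_position(target, clip)   # 2
print(time_point(off, 1.0))
```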
7. The method of claim 1, wherein the index information is an index image in the form of an image,
matching the index information with the target multimedia content, and determining local multimedia content corresponding to the index information in the target multimedia content based on a matching result, wherein the method comprises the following steps:
acquiring a target video contained in the target multimedia content;
determining the position of an image frame of the index image in the target video based on a preset image matching technology;
determining a point in time of occurrence of the index image in the target multimedia content based on the image frame position;
and determining local multimedia content corresponding to the index image in the target multimedia content based on the occurrence time point.
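Claim 7's "preset image matching technology" can likewise be stood in for by a toy comparison: pick the frame with the smallest mean absolute pixel difference from the index image, then convert the frame position to a time point. The flat grayscale frame representation is an illustrative assumption.

```python
# Illustrative sketch: frame position of an index image by mean absolute
# pixel difference over tiny grayscale frames.

def best_matching_frame(frames, index_image):
    """frames: list of equal-length pixel lists. Returns the index of the
    frame most similar to index_image."""
    def diff(a, b):
        return sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    errors = [diff(f, index_image) for f in frames]
    return errors.index(min(errors))

frames = [[0, 0, 0, 0], [10, 10, 10, 10], [200, 200, 200, 200]]
index_image = [198, 201, 199, 200]
i = best_matching_frame(frames, index_image)   # 2
print(i / 25.0)                                # time point at 25 fps
```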
8. The method of claim 1, wherein each candidate index information is displayed in a second area of the add interface,
acquiring the index information added in the adding interface comprises: when a preset first gesture triggered on a piece of candidate index information in the second area is detected, determining that candidate index information as the added index information, and acquiring the added index information.
9. The method of claim 1, wherein each pre-saved index information is displayed in a third area of the add interface, the method further comprising:
displaying the added index information in the third area;
when a preset second gesture triggered on index information displayed in the third area is detected, deleting that index information from the third area;
and when triggering of a save button in the adding interface is detected, saving each piece of index information displayed in the third area.
10. A data processing apparatus, the apparatus comprising:
a first acquisition module configured to acquire a target multimedia content;
a second acquisition module configured to enter an adding interface for adding index information when triggering of an add button is detected, wherein an information input box is displayed in a first area of the adding interface, and to acquire index information added in the adding interface;
a determining module configured to match the index information with the target multimedia content, and determine local multimedia content corresponding to the index information in the target multimedia content based on a matching result;
a processing module configured to process the local multimedia content when the target multimedia content is presented, wherein processing the local multimedia content includes playing only the local multimedia content, or fast forwarding or skipping the local multimedia content;
wherein acquiring the index information added in the adding interface comprises:
when the information input box is used for a user to input text, determining the text input by the user through the information input box as the index information and acquiring the index information; when the information input box is used for a user to input audio, determining the audio input by the user through the information input box as the index information and acquiring the index information; when the information input box is used for a user to input an image, the image input by the user through the information input box is determined as the index information and the index information is acquired.
11. A data processing electronic device, comprising:
a memory storing computer readable instructions;
a processor configured to read the computer readable instructions stored in the memory to perform the method of any one of claims 1-9.
12. A computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor of a computer, cause the computer to perform the method of any of claims 1-9.
CN202010251438.2A 2020-04-01 2020-04-01 Data processing method, device, electronic equipment and storage medium Active CN111581403B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010251438.2A CN111581403B (en) 2020-04-01 2020-04-01 Data processing method, device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111581403A CN111581403A (en) 2020-08-25
CN111581403B true CN111581403B (en) 2023-05-23

Family

ID=72122454

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010251438.2A Active CN111581403B (en) 2020-04-01 2020-04-01 Data processing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111581403B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102402593A (en) * 2010-11-05 2012-04-04 Microsoft Corp Multi-modal approach to search query input
CN104850559A (en) * 2014-02-18 2015-08-19 East China Normal University Slide independent storage, retrieval and recombination method and equipment based on presentation document
CN110401878A (en) * 2019-07-08 2019-11-01 Tianmai Juyuan (Hangzhou) Media Technology Co., Ltd. A kind of video clipping method, system and storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006074403A2 (en) * 2005-01-05 2006-07-13 Clearplay, Inc. Media player configured to receive playback filters from alternative storage mediums
CN101256568B (en) * 2008-03-18 2010-11-17 深圳市迅雷网络技术有限公司 Method, system and apparatus for providing multimedia resource
CN102867042A (en) * 2012-09-03 2013-01-09 北京奇虎科技有限公司 Method and device for searching multimedia file
US20140164371A1 (en) * 2012-12-10 2014-06-12 Rawllin International Inc. Extraction of media portions in association with correlated input
CN103414948A (en) * 2013-08-01 2013-11-27 Wang Qiang Method and device for playing video
CN104618807B (en) * 2014-03-31 2017-11-17 腾讯科技(北京)有限公司 Multi-medium play method, apparatus and system
CN106899859A (en) * 2015-12-18 2017-06-27 北京奇虎科技有限公司 A kind of playing method and device of multi-medium data
CN107943894A (en) * 2017-11-16 2018-04-20 百度在线网络技术(北京)有限公司 Method and apparatus for pushing content of multimedia
CN108347646B (en) * 2018-03-20 2019-07-02 百度在线网络技术(北京)有限公司 Multimedia content playing method and device
CN110248251B (en) * 2019-06-14 2022-07-29 维沃移动通信有限公司 Multimedia playing method and terminal equipment



Similar Documents

Publication Publication Date Title
US11206448B2 (en) Method and apparatus for selecting background music for video shooting, terminal device and medium
KR102436734B1 (en) method for confirming a position of video playback node, apparatus, electronic equipment, computer readable storage medium and computer program
US9741395B2 (en) System and method for content-based navigation of live and recorded TV and video programs
KR102091414B1 (en) Enriching broadcast media related electronic messaging
US9602870B2 (en) Devices, systems, methods, and media for detecting, indexing, and comparing video signals from a video display in a background scene using a camera-enabled device
EP2109313B1 (en) Television receiver and method
KR20130050983A (en) Technique and apparatus for analyzing video and dialog to build viewing context
US8214368B2 (en) Device, method, and computer-readable recording medium for notifying content scene appearance
US8307403B2 (en) Triggerless interactive television
CN110663079A (en) Method and system for correcting input generated using automatic speech recognition based on speech
US8453179B2 (en) Linking real time media context to related applications and services
KR101246917B1 (en) Method and system for sharing the information between users of the media reproducing systems
US9558784B1 (en) Intelligent video navigation techniques
CN113259740A (en) Multimedia processing method, device, equipment and medium
CN108810580B (en) Media content pushing method and device
CN111209437A (en) Label processing method and device, storage medium and electronic equipment
US9544656B1 (en) Systems and methods for recognition of sign language for improved viewing experiences
CN113886612A (en) Multimedia browsing method, device, equipment and medium
US20190208280A1 (en) Information Processing Apparatus, Information Processing Method, Program, And Information Processing System
CN111581403B (en) Data processing method, device, electronic equipment and storage medium
JP2009077166A (en) Information processor and information display method
CN112291585A (en) Multimedia resource searching method and device, electronic equipment and storage medium
US20200092021A1 (en) Method of recording a forthcoming telebroadcast program
WO2022012299A1 (en) Display device and person recognition and presentation method
US20210377454A1 (en) Capturing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40027977

Country of ref document: HK

GR01 Patent grant