CN110602528B

CN110602528B - Video processing method, terminal, server and storage medium

Info

Publication number: CN110602528B
Application number: CN201910885938.9A
Authority: CN
Inventors: 吴国祖
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-09-18
Filing date: 2019-09-18
Publication date: 2021-07-27
Anticipated expiration: 2039-09-18
Also published as: CN110602528A

Abstract

The embodiment of the invention discloses a video processing method, a terminal, a server and a storage medium, wherein the method comprises the following steps: sending a playing request of a target video to a server; receiving playing data corresponding to the target video sent by the server, wherein the playing data comprises the target video, a target subtitle corresponding to the target video and relevant information of a word to be displayed, and the relevant information of the word to be displayed comprises one or more of paraphrase information and use case information of the word to be displayed; and playing the target video by a user interface, and displaying the target caption and floating and displaying the relevant information of the word to be displayed. By adopting the embodiment of the invention, the relevant information of the words to be displayed in the target video, which need to be learned by the user, can be effectively displayed in a floating manner, thereby being beneficial to improving the word learning efficiency.

Description

Video processing method, terminal, server and storage medium

Technical Field

The present application relates to the field of internet technologies, and in particular, to a video processing method, a terminal, a server, and a storage medium.

Background

In recent years, with the continuous penetration of internet technology in the field of education, rich high-quality resources are provided for online teaching. Video teaching is a mainstream online learning mode, and the video teaching means that a teacher can make a section of learning video according to learning contents in advance, and a learner can master the learning contents by watching the learning video. For example, video foreign language instruction is a video that a learner pre-records or cuts out a section of video to help a learner learn a foreign language.

At present, in the video foreign language teaching process, a common foreign language learning method is as follows: the lines included in the foreign language video are displayed through the bilingual subtitles, wherein one language is a language that a learner is skilled to master, and the other language is a language that the learner needs to learn. Therefore, when the learner watches foreign language videos, the learner can learn the paraphrase and the usage of each foreign language new word through the contrast relation between the bilingual subtitles. However, because grammars of different languages are different, the word orders of bilingual subtitles are often different, and the subtitle contents corresponding to two languages may not correspond to the same sentence line, so that for a foreign language new word, a user needs to manually input the new word into a new word query application to obtain the corresponding paraphrase and usage, resulting in low learning efficiency. Therefore, in the field of video foreign language teaching, how to effectively learn foreign language words becomes a hot issue of research at present.

Disclosure of Invention

The embodiment of the invention provides a video processing method, a terminal, a server and a storage medium, which can effectively float and display relevant information of a word to be displayed, which needs to be learned by a user, in a target video.

In a first aspect, an embodiment of the present invention provides a video processing method, including:

sending a playing request of a target video to a server;

receiving playing data corresponding to the target video sent by the server, wherein the playing data comprises the target video, a target subtitle corresponding to the target video and relevant information of the word to be displayed, and the relevant information of the word to be displayed comprises one or more of paraphrase information and use case information of the word to be displayed;

and playing the target video on a user interface, and displaying the target subtitle and floating the relevant information of the word to be displayed.

In a second aspect, an embodiment of the present invention provides a video processing method, including:

acquiring a target video and a target subtitle corresponding to the target video;

acquiring a word to be displayed for floating display according to a target subtitle and acquiring related information of the word to be displayed, wherein the related information of the word to be displayed comprises one or more of paraphrase information and use case information of the word to be displayed;

and storing the target video, the target subtitle, the word to be displayed and the relevant information of the word to be displayed in a correlation manner, so that when a playing request of a terminal about the target video is received, the relevant information of the target video, the target subtitle and the word to be displayed is sent to the terminal, and the terminal displays the target subtitle and displays the relevant information of the word to be displayed in a floating manner when the target video is played on a user interface.

Optionally, if the floating display mode is a second display mode in which a floating display duration is set by a user, and the first display state includes a floating display state and a continuous play display state of the target video segment, updating the unfamiliar degree corresponding to the target word based on the unfamiliar degree update rule corresponding to the first display state includes:

acquiring a floating display strangeness degree adjustment amplitude value corresponding to the floating display state, and generating first strangeness degree information based on the floating display strangeness degree adjustment amplitude value;

acquiring a time interval between the time when the continuous playing display state of the target video clip is detected and the time when the automatic playing of the target video clip is detected to be paused last time, and generating third unfamiliarity information based on the time interval;

and updating the degree of strangeness corresponding to the target word according to the first degree of strangeness information and the third degree of strangeness information.

Optionally, if the floating display mode is the first display mode in which a floating display duration is set by a terminal, the second display state includes a subtitle display state; or if the floating display mode is a second display mode in which the floating display duration is set by the user, and the second display state includes a subtitle display state, updating the strangeness degree corresponding to the other word based on the strangeness degree update rule corresponding to the second display state includes:

acquiring a subtitle display unfamiliarity adjustment amplitude value corresponding to the subtitle display state;

and updating the strangeness degrees corresponding to the other words based on the subtitle display strangeness degree adjustment amplitude value.

The method of claim 8, wherein the method further comprises:

obtaining display words included in the target subtitle fragments, wherein the display words at least include each word in the word set;

and aiming at other words which are not displayed in a preset word bank except the displayed word, acquiring the adding time of the other words which are not displayed and added into the preset word bank, and updating the strangeness degrees corresponding to the other words which are not displayed based on the adding time and a strangeness degree updating rule corresponding to the time state.

In a third aspect, an embodiment of the present invention provides a video processing apparatus, including:

a sending unit, configured to send a play request of a target video to a server;

a receiving unit, configured to receive play data corresponding to the target video sent by the server, where the play data includes the target video, a target subtitle corresponding to the target video, and relevant information of a word to be displayed, and the relevant information of the word to be displayed includes one or more of paraphrase information and use case information of the word to be displayed;

and the processing unit is used for playing the target video on a user interface, displaying the target caption and floating and displaying the relevant information of the word set to be displayed.

In a fourth aspect, an embodiment of the present invention provides another video processing apparatus, including:

the device comprises an acquisition unit, a display unit and a display unit, wherein the acquisition unit is used for acquiring a target video and a target subtitle corresponding to the target video;

the acquisition unit is further used for acquiring a word to be displayed which is in accordance with floating display according to the target subtitle and acquiring related information of the word to be displayed, wherein the related information of the word to be displayed comprises one or more of paraphrase information and use case information of the word to be displayed;

and the storage unit is used for storing the target video, the target subtitle and the relevant information of the word to be displayed in an associated manner, so that when a playing request of a terminal about the target video is received, the relevant information of the target video, the target subtitle and the word to be displayed is sent to the terminal, and the terminal displays the target subtitle and displays the relevant information of the word to be displayed in a floating manner when playing the target video on a user interface.

In a fifth aspect, an embodiment of the present invention provides a terminal, where the terminal includes:

a processor adapted to implement one or more instructions; and the number of the first and second groups,

a computer storage medium storing one or more instructions adapted to be loaded by the processor and to perform the steps of:

sending a playing request of a target video to a server;

In a sixth aspect, an embodiment of the present invention provides a server, where the server includes:

acquiring a word to be displayed for floating display according to the target caption, and acquiring related information of the word to be displayed, wherein the related information of the word to be displayed comprises one or more of paraphrase information and use case information of the word to be displayed;

and storing the target video, the target subtitle and the relevant information of the word to be displayed in a correlation manner, so that when a playing request of a terminal about the target video is received, the relevant information of the target video, the target subtitle and the word to be displayed is sent to the terminal, and the terminal displays the target subtitle and displays the relevant information of the word to be displayed in a floating manner when the target video is played on a user interface.

In a seventh aspect, an embodiment of the present invention provides a computer storage medium, where the computer storage medium stores first computer program instructions, and the first computer program instructions, when executed by a processor, are configured to perform the video processing method according to the first aspect; alternatively, the computer storage medium has stored therein second computer program instructions for executing the video processing method of the second aspect when executed by the processor.

In the embodiment of the invention, a server acquires a target video and a target subtitle corresponding to the target video, and further acquires a word to be displayed for floating display in the target video and related information of the word to be displayed according to the target subtitle; and storing the target video, the target caption and the relevant information of the word to be displayed in a correlation manner; when a playing request of a terminal about a target video is received, the target video, a target subtitle and relevant information of a word to be displayed are sent to the terminal as playing data, the target video is played on a user interface by the terminal, the target subtitle is displayed, and the relevant information of the word to be displayed is displayed in a floating mode, so that paraphrasing information or use case information of the word to be displayed, which needs to be learned by a user, in the target video is effectively displayed in a floating mode, and the learning efficiency is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

Fig. 1a is a schematic structural diagram of a video processing system according to an embodiment of the present invention;

FIG. 1b is a schematic diagram of a user interface of a terminal according to an embodiment of the present invention;

FIG. 1c is a schematic diagram of a user interface of another terminal according to an embodiment of the present invention;

FIG. 1d is a diagram illustrating a floating display of a target word in a user interface according to an embodiment of the present invention;

fig. 2 is a schematic flow chart of a video processing method according to an embodiment of the present invention;

fig. 3 is a schematic flow chart of another video processing method according to an embodiment of the present invention;

FIG. 4a is a schematic diagram of a setup interface of a predetermined word bank according to an embodiment of the present invention;

FIG. 4b is a diagram illustrating an alternative interface for setting up a predetermined word library according to an embodiment of the present invention;

FIG. 4c is a diagram illustrating a setup interface of another predetermined word bank according to an embodiment of the present invention;

fig. 5 is a schematic flowchart of another video processing method according to an embodiment of the present invention;

fig. 6 is a schematic flowchart of another video processing method according to an embodiment of the present invention;

fig. 7 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present invention;

fig. 8 is a schematic structural diagram of another video processing apparatus according to an embodiment of the present invention;

fig. 9 is a schematic structural diagram of a terminal according to an embodiment of the present invention;

fig. 10 is a schematic structural diagram of a server according to an embodiment of the present invention.

Detailed Description

The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.

The embodiment of the invention provides a video processing scheme which can be applied to the field of video foreign language teaching. In different foreign language teaching fields, the meaning of a word is different, and in the English teaching field, the word refers to an English word, such as 'hello', 'word', 'A' and the like; in the field of Chinese teaching, a word can refer to a Chinese character, such as "rare", "you", "yes", or a word, such as "epicosity", "storm exterminate", etc. Specifically, the video processing scheme provided by the embodiment of the present invention may include: acquiring a target video and a target subtitle corresponding to the target video; acquiring a word to be displayed for floating display according to the target caption, and acquiring related information of the word to be displayed; storing the target video, the target subtitle and the relevant information of the word to be displayed in a correlation manner; and when a playing request of a terminal about the target video is received, sending the target video, the target subtitle and the relevant information of the word to be displayed to the terminal as playing data, so that the terminal displays the target subtitle and floatingly displays the relevant information of the word to be displayed when the target video is played on a user interface.

The target video may refer to any piece of video stored in the server. The word to be displayed for floating display means that the word to be displayed needs to be displayed in a floating display mode for a user to learn. The related information of the word to be displayed can comprise one or more of paraphrase information and use case information of the word to be displayed, wherein the paraphrase information refers to the explanation of the meaning and the part of speech of the word to be displayed, and comprises any one or more of the shape, the phonetic symbol, the meaning explanation, the belonging word category such as noun, adjective, adverb and the like, the use context, common phrases and the like of the word to be displayed. The use case information is used for explaining the use method of the word to be displayed in various situations by making a sentence by using the word to be displayed. Assuming that the paraphrase information corresponding to the word "you" can be "person refers to pronoun, you", the use case information can be: "You are my friend".

Based on the above video processing scheme, an embodiment of the present invention provides a structure diagram of a video processing system, as shown in fig. 1 a. The video processing system shown in fig. 1a may include a server 101 and a terminal 102, where the server 101 stores a plurality of videos, the terminal 102 may be installed with a video client playing the videos, and a user may view the plurality of videos stored in the server 101 through the video client in the terminal 102.

In one embodiment, the server 101 executes the video processing scheme to process the plurality of videos stored therein, so as to obtain subtitles, words to be displayed, and related information of the words to be displayed corresponding to each of the plurality of videos. When a user inputs a play operation with respect to a target video (the target video is any one of a plurality of videos stored in a server) in the terminal 102, the terminal 102 transmits a play request with respect to the target video to the server; the server responds to the playing request and sends playing data to the terminal 102, wherein the playing data comprises a target video, a target subtitle corresponding to the target video and relevant information of a word to be displayed; the terminal 102 plays the target video in the user interface according to the data sent by the server 101, and displays the target caption and the information related to the word to be displayed in a floating manner in the user interface.

In one embodiment, the terminal 102 and the server 101 may be node devices in a blockchain, which is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, encryption algorithm, and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.

The block chain underlying platform can comprise processing modules such as user management, basic service, intelligent contract and operation monitoring. The user management module is responsible for identity information management of all blockchain participants, and comprises public and private key generation maintenance (account management), key management, user real identity and blockchain address corresponding relation maintenance (authority management) and the like, and under the authorization condition, the user management module supervises and audits the transaction condition of certain real identities and provides rule configuration (wind control audit) of risk control; the basic service module is deployed on all block chain node equipment and used for verifying the validity of the service request, recording the valid request after consensus is completed on storage, for a new service request, the basic service firstly performs interface adaptation analysis and authentication processing (interface adaptation), then encrypts service information (consensus management) through a consensus algorithm, and transmits the encrypted service information to a shared account (network communication) completely and consistently, and records and stores the encrypted service information.

The video processing system described with reference to fig. 1a is applied to video foreign language teaching as an example to describe the video processing scheme provided by the embodiment of the present invention. In one embodiment, a plurality of segments of videos may be stored in the server, the terminal may be installed with a video client for playing the videos, and a user may view a plurality of videos stored in the server through the video client in the terminal, such as a business foreign language learning video, a daily dialogue foreign language learning video, a daily movie and television play, and the like. After the user logs in the video client, the user can select a target video which is interested in the user from a plurality of videos to play, wherein the target video is any one of the videos included in the video client.

Optionally, referring to fig. 1b, which is a schematic view of a user interface of a terminal according to an embodiment of the present invention, it is assumed that after a user logs in a video client of the terminal, a plurality of marks corresponding to videos may be displayed in the user interface of the video client, where the marks may include information such as a video name, a video duration, and a video cover image. The video cover image can be the first frame image in the video, or can be any frame image, or can be any image related to the video content but not the image in the video. As can be seen from fig. 1b, the mark corresponding to video 1, the mark corresponding to video 2, the mark corresponding to video 3, the mark corresponding to video 4, and the like are shown in the user interface of the video client.

If the user is interested in the video 3, the user can click a mark corresponding to the video 3 (at this time, the video 3 is assumed to be the target video), in response to the selection operation of the user on the mark corresponding to the target video, the terminal generates a playing request about the target video and sends the playing request to the server, wherein the playing request can include an identifier of the target video, such as the name of the target video; the server searches a target video from a plurality of stored videos according to the identification of the target video, acquires a target subtitle corresponding to the target video and relevant information of a word to be displayed by executing the video processing scheme, and sends the information to a terminal as play data corresponding to the target video; and the terminal plays the target video in a user interface, displays the target subtitle and floats and displays the relevant information of the word to be displayed in the process of playing the target video.

In one embodiment, the words to be displayed corresponding to the target video may include a plurality of words, and in order to enable the user to effectively read and learn the words to be displayed in a floating manner, the terminal may display only one of the words to be displayed in a floating manner on the user interface each time when playing the target video. Optionally, when the target video is played, an implementation manner that only one word is displayed in a floating manner at a time may be as follows: when the server executes the video processing scheme, the target video can be divided into a plurality of video segments, so that each word in the words to be displayed corresponds to one video segment, and when the terminal plays each video segment in the user interface, the word which needs to be displayed currently is found out to be displayed according to the corresponding relation. In one embodiment, the target video may be divided into a plurality of video segments according to the time required for playing each sentence of speech in the target video, each sentence of speech corresponds to one video segment, and the time length of one video segment is about 2 to 3 seconds. The floating display of the words can cause the user to pay attention to the words, so that the learning of the user to the words is facilitated, the words are effectively displayed, and the learning efficiency is improved.

In other embodiments, when the target video is played, an implementation that only one of the words to be displayed is displayed in a floating manner at a time may also be: when the terminal plays the target video, a word is selected from words to be displayed corresponding to the target video according to a preset selection rule each time, and relevant information of the selected word is displayed in a floating mode in a user interface.

In the following, the server divides the target video into a plurality of video segments to realize that only one word is displayed in a floating manner in the user interface at a time in the process of executing the above video processing. Assuming that the plurality of video segments include a target video segment and the words to be displayed include target words corresponding to the target video segment, how the terminal plays the target video segment and floatingly displays relevant information of the target words corresponding to the target video segment will be described below with reference to fig. 1b to 1 d. Assuming that the target video segment is as shown in fig. 1c, the target subtitle segment corresponding to the target video segment is "I met a girl, we talk and it wa epic.but the sun cam up and relevance set in"; further, the server acquires that a target word corresponding to the target video clip is epic, and then acquires related information of the target word. The server sends the target video clip, the target caption and the related information of the target word to the terminal, when the terminal displays the target video clip of the target video in the user interface, as shown in fig. 1d, the target caption corresponding to the target video clip is displayed, as shown in fig. 1d by 103, and the related information of the target word is displayed in the user interface in a floating manner. Assume that the relevant information of the target word may include paraphrase information of the target word, such as: "phonetic symbol" ("phonetic symbol")^|epIk]The term: narrate poetry, historical poetry, high elevation and amateur elevation; adjectives: steward, steward ", as shown at 104 in fig. 1 d.

Based on the above description, an embodiment of the present invention provides a flow chart diagram of a video processing method, as shown in fig. 2. The video processing method described in fig. 2 may be performed by a terminal. The video processing method shown in fig. 2 may include the steps of:

step S201, the terminal sends a playing request about the target video to the server.

In one embodiment, a video client for playing videos can be installed in the terminal, the terminal displays marks of a plurality of videos stored in the video client on a user interface, and a user can input a selection operation on a target video in the user interface, wherein the target video can be any one of the videos; the terminal can generate a playing request after detecting the selection operation of the user for the target video, and send the playing request to the server.

In an embodiment, the play request may include an identifier of the target video, so that after receiving the play request, the server finds the target video according to the identifier of the target video, and further carries the target video, a target subtitle associated with the target video, and related information of a word to be displayed in the play data and sends the information to the terminal.

In an embodiment, before generating the play request, the terminal may further detect whether the play data of the target video is stored in the local storage according to the identifier of the target video; if yes, acquiring playing data of the target video from a local storage; and if not, carrying the identification of the target video in the playing data and sending the identification to a server.

Step S202, the terminal receives playing data corresponding to the target video sent by the server, wherein the playing data comprises the target video, target subtitles corresponding to the target video and relevant information of words to be displayed.

In one embodiment, the playing data stored in the server may be obtained by performing real-time video processing on the target video when the server receives a playing request from the terminal; alternatively, the server may perform video processing on the target video before receiving a play request from the terminal.

And the target video included in the playing data is acquired by the server according to the identifier in the playing request. Any video segment is composed of a plurality of frames of images and an audio sequence, wherein the audio sequence can be called a speech, each frame of image forms a video segment, and each video segment corresponds to a speech. And converting the audio sequence corresponding to the target video into characters, namely the characters are the target subtitles corresponding to the target video.

The words to be displayed refer to the words needing floating display in the target subtitles of the target video, the floating displayed words can attract the attention of the user, and the learning efficiency of the user on the words is improved. The related information of the word to be displayed included in the playing data may refer to any one or more of paraphrase information and use case information of the word to be displayed. The paraphrase information of the word to be displayed can refer to information such as appearance, part of speech, explanation, common phrases and the like of the word to be displayed. The use case information of the word to be displayed may refer to an application example of the word to be displayed under different meanings, and the like.

It should be understood that the target video includes at least one speech, and the server may divide the target video into a plurality of video segments according to the duration of each speech in the target video, and the playing of the target video is substantially to sequentially play the video segments of the target video. And each video clip corresponds to a speech, and the speech corresponding to each video clip is converted into a character form to obtain a subtitle clip of each video clip. That is to say, the target video included in the playing data may include a plurality of video segments, the target subtitle includes a subtitle segment corresponding to each video segment, and the word to be displayed may include a word corresponding to each video segment.

And S203, the terminal plays the target video on a user interface, and displays the target caption and the relevant information of the word to be displayed in a floating manner.

As can be seen from the foregoing, the implementation of step S203 may be: and the terminal sequentially plays each video clip of the target video in the user interface, displays the subtitle clip corresponding to each video clip and displays the word corresponding to the corresponding video clip in a floating manner. Taking an example that a target video includes a target video clip, specifically, the terminal plays the target video on a user interface, displays the target subtitle and floatingly displays the relevant information of the word to be displayed, including: and when the target video clip is played on a user interface, displaying the target subtitle clip and floating and displaying the related information of the target word.

In one embodiment, in the process of playing a target video clip, the terminal further acquires a preset floating display mode, and acquires a display state set of the target video clip corresponding to the floating display mode; sending the display state set to the server, so that the server acquires a first display state associated with the target word from the display state set, and updating the strangeness degree corresponding to the target word based on a strangeness degree updating rule corresponding to the first display state; and acquiring a second display state associated with other words except the target word in the word set corresponding to the target video segment from the display state set, and updating the strangeness degree corresponding to the other words based on a strangeness degree updating rule corresponding to the second display state.

In one embodiment, the floating display mode may be set by a user or may be set by a default of the terminal. The floating display mode may include a first display mode in which the floating display duration is set by the terminal and a second display mode in which the floating display duration is set by the user.

In one embodiment, the first display mode in which the floating display duration is set by the terminal may be: the terminal presets a time length threshold value for floatingly displaying the relevant information of the target word, detects the time length of the floating display of the relevant information of the target word when the relevant information of the target word is floatingly displayed in the user interface, and ends the floating display of the relevant information of the target word if the time length is equal to the time length threshold value. In the first display mode, the user may manually pause the target video if the user needs the relevant information of the target word to be floatingly displayed for a longer time for the user to peruse and learn the target word during the time of floatingly displaying the relevant information of the target word. Therefore, the user interface stays at the target video clip, the terminal stops monitoring the duration of the continuous floating display of the relevant information of the target word after the pause playing operation is detected, and the terminal continues monitoring on the basis of the duration of the floating display of the relevant information of the target word when the pause playing operation is detected after the user continues playing the target video clip. For example, when the play pause operation is detected, the terminal monitors that the related information of the target word is displayed in a floating manner for 1 minute, and when the play pause operation is detected, the terminal interrupts monitoring of the floating display duration of the related information of the target word; when a continuous playing instruction is detected, the terminal continues to monitor on the basis of 1 minute, the preset time threshold of the terminal is assumed to be 2 minutes, and if the time threshold is 1 minute after the continuous playing instruction is detected, the terminal determines to finish floating and displaying the relevant information of the target word.

In an embodiment, in order to ensure that the target subtitles and the floating display of the target video are synchronized, in a general case, the time threshold of the related information of the target word for floating display may be set to be equal to or less than the time required for playing the target speech corresponding to the target video segment. The terminal may obtain in advance at which time point the target lines start to be played and at which time point the target lines end to be played on the playing time axis of the target video, that is, each line of lines corresponds to a playing time range. Optionally, a playback function button may be disposed on the user interface of the terminal for playing the target video, and if the user interface of the terminal is currently playing the target video segment and the target speech-line corresponding to the target video segment is already played partially, if the playback function button is triggered, the terminal obtains a playing time range corresponding to the target video segment, and starts to play the target video segment and the target speech-line corresponding to the target video segment again from the first word of the target speech-line corresponding to the target video segment according to the playing start time in the playing time range. For example, the playing time range of the target speech corresponding to the target video segment is 10 seconds from 3 rd minute to 3 rd minute, and if the target speech has already been played for 3 rd minute and 15 seconds, and it is detected that the playback function button is triggered, the terminal starts playing the target speech again from 3 rd minute.

Due to the playback function, in the first display mode, if the terminal finishes floating the relevant information of the target word within the time of playing the target video segment and the target speech, the user can click the playback function button if the user wants to continue to carefully and learn and read the relevant information of the target word.

Based on the above description, in the first display mode in which the terminal sets the floating display duration, the display state corresponding to the target video clip includes: a floating display state, a subtitle display state, and a paused display state of the target video segment and a playback display state of the target video segment; the first display state associated with the target word may include any one or more of a floating display state, a subtitle display state, a paused display state of the target video, and a playback state of the target video. The floating display state means that the target word is displayed in a floating mode in the user interface, the subtitle display state means that the target word is displayed in the current subtitle, and the pause playing display state means that the user triggers the pause playing of the target video within the time period of the floating display of the target word; the playback display state refers to that the user triggers the playback target video within the time of playing the current speech-line corresponding to the current frame image. The second display states associated with other words in the set of words may include: any one or more of a subtitle display state, a paused display state of the target video segment, and a playback display state of the target video segment;

in one embodiment, the second display mode for setting the floating display time period by the user may be: when the terminal detects that the relevant information of the target word starts to be displayed in a floating mode in the user interface, the terminal automatically pauses the playing of the target video until the fact that the user continues playing operation is detected, and then the target video continues to be played. In the second display mode, the time length of the floating display of the relevant information of the target word is equal to the time difference between the automatic pause time and the time of detecting the user to continue the playing operation. Therefore, in the second display mode, the user can continue to play the target video along with the selection according to the reading and learning requirements of the user, and the floating display time length of the target video to be displayed is indirectly determined. In general, after detecting that the user continues the playing operation, the floating display of the related information of the target word is finished.

Based on this, in the second display mode, the display state set corresponding to the target video segment may include: a floating display state, a subtitle display state, and a continuous play display state of the target video clip. The first display state of the target word association may include: any one or more of a floating display state, a subtitle display state, and a continued display state of the target video. The continuous display state is that the user continuous playing operation is detected within a period of time after the playing of the target video is automatically paused, and in this state, the floating display of the relevant information of the target word is finished, in other words, the continuous display state determines the time length of the floating display of the relevant information of the target word.

According to the embodiment of the invention, a terminal sends a playing request about a target video to a server and receives playing data about the target video sent by the server, wherein the playing data comprises the target video, a target subtitle corresponding to the target video and relevant information of a word to be displayed. Further, the target video is played on a user interface, the target subtitle is displayed, and the related information of the word to be displayed is displayed in a floating mode. The words to be displayed refer to the words needing to be displayed in a floating mode, so that the related information of the words to be displayed, needing to be learned by the user, in the target video is effectively displayed in a floating mode, the attention of the user is attracted, and the learning efficiency is improved.

Based on the above description, an embodiment of the present invention provides a flowchart of another video processing method, as shown in fig. 3. The video processing method described in fig. 3 may be performed by a server. The video processing method shown in fig. 3 may include the steps of:

step S301, a target video and a target subtitle corresponding to the target video are obtained.

In one embodiment, a plurality of video data can be stored in the server, the video data can support the corresponding video to be played in the terminal, and when a video playing request sent by the terminal is received, the corresponding video data is sent to the terminal so that the terminal plays the video in the user interface. Optionally, the target subtitles corresponding to the target video are obtained by converting the lines of the target video into characters.

Step S302, words to be displayed for floating display are obtained according to the target subtitles, and relevant information of the words to be displayed is obtained.

Optionally, the obtaining of the relevant information of the word to be displayed may include: detecting whether relevant information corresponding to the word to be displayed is stored in a local storage; if yes, determining the relevant information in the local storage as the relevant information corresponding to the word to be displayed; if not, the server can query the word to be displayed as a search keyword in a dictionary database for relevant information corresponding to the word to be displayed and store the relevant information.

It should be understood that the target video includes at least one speech, and the target video may be divided into a plurality of video segments according to the duration of each speech in the target video, where each video segment corresponds to one speech. In order to facilitate the user to understand the target video, the lines corresponding to the target video are usually converted into text, and the text is used as the target subtitles of the target video. As such, each video clip corresponds to a subtitle clip. That is, the target video segment includes a plurality of video segments, and the target subtitle includes a subtitle segment corresponding to each video segment.

In one embodiment, the implementation manner of step S202 may be: obtaining a word for floating display corresponding to each video clip according to the subtitle clip corresponding to each video clip; and taking the word which corresponds to each video clip and is used for floating display as the word to be displayed. That is, the word to be displayed may include a plurality of words, and the related information of the word to be displayed may include related information corresponding to each word.

In an embodiment, the obtaining, by the target subtitle, a word for floating display corresponding to each video clip according to the subtitle clip corresponding to each video clip includes: acquiring a word set which is included in the target subtitle fragment and meets the floating display condition; and acquiring a target word corresponding to the target video clip from the word set according to the display priority corresponding to each word in the word set. The target video segment may refer to any one of a plurality of video segments, and the target subtitle segment is obtained by converting a speech corresponding to the target video segment into a text.

In an embodiment, the method for the server to obtain the target subtitle segment corresponding to the target video segment may be: before video processing is carried out on a target video, a server divides the target video into a plurality of video segments according to the playing time of each sentence of lines of the target video; converting the speech corresponding to each video clip into characters to obtain a subtitle clip corresponding to each video clip; and acquiring a segment identifier corresponding to each video segment, and storing the segment identifier and the subtitle segment corresponding to the corresponding segment in an associated manner. When the server starts to process the target video segment in the target video, the segment identification corresponding to the target video segment is obtained, and the target subtitle segment corresponding to the target video segment is obtained according to the corresponding relation between the segment identification and the subtitle segment. The clip identifier may be a start playing time and an end playing time corresponding to a certain video clip, for example, the clip identifier corresponding to the video clip a is (00: 03: 05-00: 03: 08), which indicates that the video clip a starts playing in the 5 th second at the 3 rd minute and ends playing in the 8 th second at the 3 rd minute.

In other embodiments, the method for the server to obtain the target subtitle segment corresponding to the target video segment may be: before the server carries out video processing on the target video, dividing the target video into a plurality of video segments according to the playing time of each sentence of lines of the target video; acquiring a segment identifier corresponding to each video segment, and storing the segment identifier and the speech corresponding to each video segment in a correlation manner; when the server starts to process a target video segment in the target video, segment identification corresponding to the target video segment is obtained, a target speech corresponding to the target video segment is obtained according to the corresponding relation between the segment identification and the speech, and the target speech is converted into characters, so that a target subtitle segment corresponding to the target video segment is obtained.

In one embodiment, if a word meets the floating display condition, it means that the word is a word that needs to be displayed in the user interface of the terminal in a floating display manner, in other words, if it is determined that a word needs to be displayed in the user interface in a floating display manner, it is determined that the word meets the floating display condition. Therefore, the word set meeting the floating display condition means that each word included in the word set needs to be displayed in the user interface in a floating display manner. The floating display means that some words are displayed in a manner of floating on the content displayed on the user interface, such as a bullet screen.

In an embodiment, the implementation of the server obtaining the word set corresponding to the target subtitle segment and meeting the floating display condition may be: the method comprises the steps that a dynamic preset word bank is preset by a server, and a plurality of words and adding time of each word added to the preset word bank are recorded in the dynamic preset word bank. The dynamic preset word library is that before video processing is carried out on a target video, a server sets a preset word library, and when video processing is carried out on the target video, words which do not belong to the preset word library and are included in target caption segments or target lines corresponding to target video segments are added into the preset word library so as to enrich the preset word library; the server analyzes each word included in the target caption segment, and obtains the time difference between the adding time of each word added into the preset word library and the current time; determining each word with the time difference being equal to or smaller than a first time length threshold value and the time difference being larger than a second time length threshold value as a word meeting the floating display condition in the target video clip; these words are grouped into a set of words that meet the floating display condition.

The first duration threshold may be 0, and if a time difference between an addition time of a word to the preset word bank and a current time is equal to or less than 0, it indicates that the word is a word that has not been added to the preset word bank and is to be added to the preset word bank. Words not in the preset word library are completely strange to the user, and the words are determined to be words meeting the floating display condition, so that the user can learn strange words through floating display, the vocabulary of the user is increased, and the purpose of foreign language learning is achieved. The second duration threshold may be 1 hour, or 1 hour and 30 minutes, or any other duration not equal to 0. If the time difference between the adding time of a word added into the preset word bank and the current time is larger than the second duration threshold, the word is considered to belong to a word to be forgotten by the user, and the word is determined to be a word meeting the floating display condition, so that the user can strengthen the memory of the word by floating display of the word, and the word is prevented from being forgotten completely.

In one embodiment, the predetermined word bank stored in the server may be acquired from the terminal. The implementation mode that the terminal presets the preset word bank can be as follows: a video client installed in the terminal provides a setting function of a preset word bank, and after a user inputs a trigger operation for setting the preset word bank, the terminal can display a setting interface of the preset word bank to the user, and fig. 4a is a schematic diagram of the setting interface of the preset word bank provided by the embodiment of the present invention. The setting interface of the preset word library may include examination options of foreign language levels, such as below english level four, english level six, professional level eight, toffee, yasi, etc. shown in fig. 4 a; selecting corresponding foreign language level examination options by a user according to the foreign language level of the user; after the terminal detects that a user selects a target foreign language level examination option in the foreign language level examination options, preset words corresponding to the target foreign language level examination are obtained, and all the preset words form a preset word bank. The preset word may be a word with the highest frequency of occurrence in the target foreign language level test, or all words occurring in the target foreign language level test. For example, the user selects the foreign language level examination option of english six in fig. 4a, and the terminal obtains preset words corresponding to the english six level option, and the words form a preset word bank.

In other embodiments, referring to fig. 4b, a setting interface of another preset word library provided in the embodiments of the present invention may include a word import option 401 and a word addition area 402. When the user clicks 401, the user may import the word from a storage device of the terminal or other external device. If the user imports a word library, the word import option 401 may appear with the user imported word library token as shown in FIG. 4 c. After the user clicks on the word addition area 402 in FIG. 4b, the user may enter one or more words in the area 402. After completing the word import and word addition, the user may click on the generate word library button 403 in FIG. 4 b; after detecting the operation of the user, the terminal may generate a preset word bank according to the word bank included in the word import option 401 and the word included in the word addition region 402.

In one embodiment, a target word corresponding to the target video clip may be obtained from the word set according to a display priority corresponding to each word in the word set. The display priority corresponding to each word in the word set may be determined according to the degree of strangeness corresponding to each word and/or the degree of importance corresponding to each word, where the degree of strangeness corresponding to a word is used to measure the degree of strangeness of the word to a user, or the degree of strangeness is used to reflect a learning condition of the user on the word. If the word is never seen by the user from the beginning of the target video playing to the current moment, the word is safe and unfamiliar to the user, and as the word appears in the caption or floats in the user interface, the unfamiliarity corresponding to the word is correspondingly reduced. The importance is used to reflect the importance of a word to the user's learning of a word in a language, and the size of the importance depends on the frequency of use of the word in the user's viewing of all historical foreign language learning videos.

In one embodiment, the unfamiliarity is determined according to history display record information corresponding to each word, wherein the history display record information is recorded when the terminal plays the target video and is sent to the server. The importance is determined according to a bag-of-words model algorithm. In one embodiment, how the importance degree corresponding to each word is determined is described by taking a target word in a word set (the target word is a word corresponding to the target video clip and used for floating display) as an example. Optionally, in the embodiment of the present invention, the manner of calculating the importance degree corresponding to the target word may be: acquiring the word frequency of the target word in a target subtitle segment corresponding to the target video; acquiring the inverse document frequency corresponding to the target word according to the historical subtitles and the target subtitles corresponding to the historical video watched by the user; and determining the importance corresponding to the target word based on the word frequency and the inverse document frequency corresponding to the target word.

And converting all the lines in the target video into characters to obtain the target subtitles corresponding to the target video. The method for acquiring the word frequency of the target word in the target caption may be as follows: and calculating the ratio of the number of times of the target word appearing in the target caption to the total number of words included in the target caption, wherein the ratio is the word frequency. This calculation process can be expressed by the following formula (1):

where tf represents the word frequency corresponding to the target word in the target video, word _ count represents the number of times the target word appears in the target subtitle, and total _ count represents the total number of words included in the target subtitle.

In one embodiment, the obtaining the inverse document frequency corresponding to the target word according to the historical subtitles and the target subtitles corresponding to the historical video watched by the user may include: and calculating the ratio of the total number of the historical subtitles corresponding to the historical video watched by the user to the number of the subtitles comprising the target word, and then carrying out logarithm operation on the ratio, wherein the operation result is the inverse document frequency. The above calculation process can be realized by formula (2):

where idf represents the inverse document frequency, total _ document represents the total number of history subtitles, and contact _ word represents the number of subtitles including the target word.

In one embodiment, a word frequency and an inverse document frequency corresponding to a target word are determined, and the importance corresponding to the target word may be determined based on the word frequency and the inverse document frequency corresponding to the target word. Specifically, the word frequency and the inverse document frequency may be multiplied, and the operation result is the importance of the target word. Alternatively, the calculation of the importance is represented by formula (3):

tfidf＝tf·idf (3)

wherein, the larger the value of tfidf, the greater the importance, that is, the more important the target word is in the target video. In one embodiment, the terminal can calculate the importance of each word by using the method.

In one embodiment, before the target video is played, any word corresponds to an initial unfairness value, and as the playing time of the target video goes by, each word may be gradually displayed in the user interface of the terminal, so that the unfairness of each word changes in real time, and at this time, determining the unfairness of each word according to the historical display record information of each word may be considered to update the current unfairness of each word according to the historical display record of each word. For a target word in the word set, the history display record information of the target word comprises a first display state associated with the target word, and for other words in the word set except the target word, the history display records corresponding to the other words comprise other associated second display states. Further, the unfamiliarity of the target word is updated based on the first display state, and the unfamiliarity of other words is updated based on the second display state.

Specifically, determining the degree of strangeness of each word based on the history display record information of each word may include: acquiring a display state set corresponding to the target video clip; acquiring a first display state associated with the target word from the display state set, and updating the unfamiliarity corresponding to the target word based on an unfamiliarity updating rule corresponding to the first display state; and acquiring a second display state associated with other words except the target word in the display word set from the display state set, and updating the strangeness degree corresponding to the other words based on a strangeness degree updating rule corresponding to the second display state.

As can be seen from the foregoing, in different floating display modes of the terminal, the display state sets corresponding to the target video clips are also different, and the first display state and the second display state are also different. The specific embodiment of determining the degree of strangeness of each word based on the history display record information of each word will be described in detail in the following embodiments.

And step S303, storing the related information of the target video, the target caption and the word to be displayed in a correlation mode.

Based on the above description, the implementation manner of step S303 may be: and storing the target video clip, the target subtitle clip and the relevant information corresponding to the target word included in the target video in a correlation manner, so that when the terminal plays the target video clip in a user interface, the target subtitle clip is displayed and the relevant information of the target word is displayed in a floating manner.

In the embodiment of the invention, a server acquires a target video and a target subtitle corresponding to the target video, and further acquires a word to be displayed for floating display in the target video and related information of the word to be displayed according to the target subtitle; and the target video, the target caption and the first-relevant information of the word to be displayed are stored in an associated manner, so that when a playing request of the terminal about the target video is received, the relevant information of the target video, the target caption and the word to be displayed is sent to the terminal as playing data, the terminal plays the target video on a user interface, and displays the target caption and the relevant information of the word to be displayed in a floating manner, the purpose that the relevant information of the word to be displayed in the target video, which needs to be learned by a user, is effectively displayed in a floating manner is achieved, and the learning efficiency is improved.

Based on the above description, a flowchart of another video processing method according to an embodiment of the present invention is shown in fig. 5, where the video processing method shown in fig. 5 may be executed by a server, and specifically may be executed by a processor of the server. The video processing method shown in fig. 5 may include the steps of:

step S501, the server obtains a target video segment included in the target video and a target subtitle segment corresponding to the target video segment.

And step S502, acquiring a word set which is included in the target subtitle fragment and meets the floating display condition.

And S503, acquiring a target word corresponding to the target video clip from the word set according to the display priority corresponding to each word in the word set.

Step S504, the server stores the target video clip, the target subtitle clip and the related information of the target word in a correlation mode.

In an embodiment, some possible implementations included in steps S501 to S504 may refer to descriptions of related steps in fig. 2, and are not described herein again.

It should be understood that, the above steps S501 to S504 specifically describe how to obtain the target subtitle segment and the target word corresponding to the target video segment in the target video, and for other video segments except the target video segment in the target video, the same method may be adopted to obtain the subtitle segments and the words for floating display corresponding to other video segments.

Step S505, the terminal sends a play request about the target video to the server.

Step S506, the server sends the target video clip, the target subtitle clip and the relevant information of the target word to the terminal, and the terminal plays the target video clip and displays the target subtitle clip in the user interface.

In one embodiment, if the user wants to watch the target video through the terminal, the user inputs a selection operation on the target video through a video client of the terminal. And after detecting the selection operation of the user, the terminal sends a playing request about the target video to the server.

In one embodiment, the server responds to a playing request of the terminal, sends each video clip included by the target video, a subtitle clip corresponding to each video clip and relevant information of words for floating display to the terminal, and the terminal displays the video clips of the target video in sequence. In the following description, the video processing method according to the embodiment of the present invention will be described by taking a terminal to display a target video clip as an example, except for the feature description.

And step S507, the terminal acquires a preset floating display mode and floats and displays the relevant information of the target in the user interface based on the floating display mode.

In one embodiment, the floating display mode may include a first display mode in which the terminal sets the floating display duration and a second display mode in which the user sets the floating display duration, and in brief, the first display mode in which the terminal sets the floating display duration refers to: and if the time length of the target word displayed in the floating mode in the user interface is equal to the preset time length threshold value of the terminal, the terminal finishes the floating display of the target word. In this case, if the user wants to view the related information of the target word for a long time, the user may manually pause the target video clip or manually play back the target video clip during the time of floating the related information of the display target word. The second display mode for setting the floating display time length by the user is as follows: when the floating display of the related information of the target word is started, the terminal automatically suspends the playing of the target video clip to give the user sufficient time to peruse the related information of the target word. And if the user finishes checking, clicking to continue playing the target video clip, wherein in this case, the time difference between the automatic pause time and the continuous playing time is the time length of floatingly displaying the relevant information of the target word.

And step S508, the terminal acquires a display state set of the target video clip corresponding to the floating display mode and sends the display state set to the server.

In one embodiment, the obtaining the display state set of the target video segment may include: detecting user play behavior operations in the process of displaying the target subtitle segment, such as one or more of pause operation, playback operation and continuous play operation, of the target video segment; acquiring a display mode of each word in the word set in the process of playing the target video clip, such as one or more of display in subtitles, floating display and the like; and generating a display state set according to the monitored user playing behavior operation and the obtained display mode of each word in the process of playing the target video clip.

In one embodiment, as can be seen from the foregoing, when the floating display mode is the first display mode, the user needs to manually pause or manually play back to repeatedly view the relevant information of the target word for a long time; when the floating display mode is the second display mode, the user does not need to pause manually, and only needs to input the continuous playing operation. Therefore, in the process of playing the target video clip, the user playing behavior operations detected by the terminal are different, so that the generated display state sets of the target video clip are different. Therefore, in different floating display modes, the display state sets of the target video segments are different, and specifically, the implementation of step S508 may be: if the floating display mode is the first display mode with the floating display time length set by the terminal, the display state set comprises: a floating display state, a subtitle display state, a paused display state of the target video segment, and a playback display state of the target video segment; if the floating display mode is a second display mode in which the floating display time length is set by the user, the display state set comprises: a floating display state, a subtitle display state, and a continuous play display state of the target video clip.

In step S509, the server obtains a first display state associated with the target word from the display state set, and updates the unfamiliar degree corresponding to the target word based on the unfamiliar degree update rule corresponding to the first display state.

In one embodiment, the target word and the other words in the set of words are not displayed in the same way, such as though the target word and the other words are displayed in subtitles in the user interface, the target word is displayed floating in the user interface. Therefore, the first display state associated with the target word and the second display state associated with the other words in the display states are also different.

Under different floating display modes, the display state sets are different, so that under different floating display modes, the first display states associated with the target words are different; under different floating display modes, the second display states of other word associations can be the same or different.

In one embodiment, if the floating display mode is the first display mode, the first display state associated with the target word may include any one or more of a floating display state, a paused display state of the target video segment, and a playback display state of the target video segment; if the floating display mode is the second display mode, the first display state associated with the target word may include any one or more of a floating display state and a continuous play display state of the target video segment.

In other embodiments, if the floating display mode is the first display mode, the first display state associated with the target word may further include any one or more of a floating display state, a subtitle display state, a paused display state of the target video segment, and a playback display state of the target video segment; and if the floating display mode is the second display mode, the first display state associated with the target word further comprises any one or more of a floating display state, a subtitle display state and a continuous playing display state of the target video clip.

To summarize, for the target word, the floating display state and the subtitle display state may be mutually exclusive, that is, if the floating display state is included in the first display state associated with the target word, the subtitle display state may not be included in the first display state; alternatively, the floating display state and the subtitle display state may coexist, that is, both the floating display state and the subtitle display state may be included in the first display state. In the following description of the embodiments of the present invention, a floating display state and a subtitle display state are described as examples of mutual exclusion.

In one embodiment, if the floating display mode is the first display mode in which a floating display duration is set by a terminal, and the first display state includes a floating display state, a paused display state of the target video segment, and a playback display state of the target video segment, the updating the strangeness corresponding to the target word based on the strangeness update rule corresponding to the first display state includes: acquiring a floating display strangeness degree adjustment amplitude value corresponding to the floating display state, and generating first strangeness degree information based on the floating display strangeness degree adjustment amplitude value; obtaining pause time corresponding to the pause playing display state and playback times corresponding to the playback state, and generating second strangeness information based on the pause time and the playback times; and updating the degree of strangeness corresponding to the target word according to the first degree of strangeness information and the second degree of strangeness information.

The floating display strangeness degree adjustment amplitude value may be a positive number smaller than 1, and in general, the floating display strangeness degree adjustment amplitude value may be set to 0.2. The first unfamiliarity information generated based on the floating display unfamiliar adjustment amplitude value may include: and performing first updating processing on the strangeness degree corresponding to the target word through the floating display strangeness degree adjusting amplitude value to obtain a first updating strangeness degree. Optionally, the server performs a first update process on the strangeness degree corresponding to the target word based on the floating display strangeness degree adjustment magnitude value to obtain a first update strangeness degree, which may be implemented by the following formula (4):

S_d(1)＝S_d(1-k₂) (4)

wherein k is₂Indicating the magnitude of the adjustment of the degree of strangeness of the floating display, S_dIndicating the degree of strangeness corresponding to the target word, S_d(1)Indicating a first update degree of strangeness.

In an embodiment, the implementation manner of obtaining the pause duration corresponding to the pause playing display state may be: and the terminal monitors the time of inputting the pause operation by the user and the time of continuously playing by the user, determines the time difference as pause duration and sends the pause duration to the server. It should be understood that if the pause of the target video is too long, it is likely that the user is not reading the information related to the target word, but may stop watching the target video, so in order to improve the update strangeness, the terminal may preset a pause duration threshold, and if the time difference is smaller than the pause duration threshold, the time difference is determined as the pause duration of the pause display state and sent to the server. If the time difference is greater than the pause duration threshold, the terminal can ignore the pause playing operation of the user, namely the server fails to acquire the pause duration.

In an embodiment, the implementation manner of the server obtaining the playback times corresponding to the playback display state may be: and the terminal monitors the number of times of clicking a playback function button in the user interface, determines the number of times as the number of times of playback corresponding to the playback display state, and sends the number of times of playback to the server. It should be understood that if the number of playbacks is too large, it may indicate that the user is not reading the relevant information of the target word displayed floating, it may be that the user leaves the screen, or that the terminal is always in an automatic playback state. Therefore, in order to improve the accuracy of updating the strangeness degree, the terminal may set a playback number threshold, and if the number of times the playback function button is clicked is less than the playback number threshold, determine the number as the playback number corresponding to the playback display state. If the number of times of playback is greater than the threshold number of times of playback, the playback can be ignored, i.e., the server fails to acquire the number of times of playback.

In one embodiment, the second unfamiliar degree information generated by the server based on the pause duration and the playback frequency may include a second update processing on the unfamiliar degree corresponding to the target word based on the pause duration and the playback frequency, and the obtained second update unfamiliar degree. Optionally, performing the second update process on the strangeness degree corresponding to the target word based on the pause duration and the playback frequency may be implemented by the following formula (5):

S_d(2)＝S_d(1-k₃r-k₄t₁),(r∈[1,r_m]∧t∈[1,t_m]) (5)

where r denotes the number of playbacks, t₁Indicates the time of pause, k₃Weight value, k, corresponding to the number of playbacks₄Representing a weight value corresponding to the time of pause, S_d(2)Indicating a second degree of update strangeness, r_mIndicating a threshold number of playbacks, t_mIndicating a pause duration threshold. In general, k is₃Set to 0.01, k₄May be set to 0.001.

After the first unfamiliar degree information and the second unfamiliar degree information are obtained, the unfamiliar degree corresponding to the target word can be updated according to the first unfamiliar degree information and the second unfamiliar degree information. In one embodiment, the manner of updating the unfamiliarity corresponding to the target word according to the first unfamiliarity information and the second unfamiliarity information may be: setting a first weight value for a first update degree included in the first update degree information, and setting a second weight value for a second update degree; and adding the result of multiplying the first update strangeness by the first weight value and the result of multiplying the second update strangeness by the second weight value, and taking the operation result as the update strangeness corresponding to the target word.

In other embodiments, if the floating display mode is a second display mode in which a floating display duration is set by a user, and the first display state includes a floating display state and a continued playing display state of the target video segment, the updating the strangeness degree corresponding to the target word based on the strangeness degree update rule corresponding to the first display state includes: acquiring a floating display strangeness degree adjustment amplitude value corresponding to the floating display state, and generating first strangeness degree information based on the floating display strangeness degree adjustment amplitude value; acquiring a time interval between the time when the continuous playing display state of the target video clip is detected and the time when the automatic playing pause is detected last time, and generating third unfamiliar degree information based on the time interval; and updating the degree of strangeness corresponding to the target word according to the first degree of strangeness information and the third degree of strangeness information.

The embodiment of obtaining the floating display unfamiliar degree adjustment amplitude value corresponding to the floating display state and generating the first unfamiliar degree information based on the floating display unfamiliar degree adjustment amplitude value is the same as the method for generating the first unfamiliar degree information. And the time interval between the detection of the continuous playing display state of the target video clip and the last detection of the automatic playing pause is obtained by the terminal and sent to the server. In one embodiment, the time interval between the time when the display state of the target video segment is detected to be continuously played and the time when the automatic pause is detected last time, that is, how long the terminal detects that the target video is automatically paused, the longer the pause time is, the longer the time when the user learns the relevant information of the target word displayed in a floating manner, the less the strangeness of the target word is. In one embodiment, the third unfamiliar degree information generated by the server based on the time interval may include a third update degree corresponding to the target word based on the time interval, and the obtained third update degree may be specifically represented by the following formula (6):

S_d(3)＝S_d(1-k₅t₂) (6)

wherein S is_d(3)Indicates a third degree of update strangeness, t₂Time interval, k, between the time when the display status of the target video segment is detected to continue playing and the time when the automatic pause playing was last detected₅Represents a weight coefficient, typically a positive integer less than 1.

In an embodiment, the updating of the degree of strangeness corresponding to the target character according to the first degree of strangeness information and the third degree of strangeness information may be implemented by: setting a first weight value for a first update degree included in the first unfamiliar degree information; setting a third weight value for a third update degree included in the third unfamiliar degree information; and adding the result of multiplying the first update strangeness degree by the first weight value and the result of multiplying the third update strangeness degree by the second weight value, and taking the operation result as the update strangeness degree corresponding to the target word.

In step S5010, the server acquires a second display state associated with the other words excluding the target word from the display state set, and updates the strangeness degree corresponding to the other words based on the strangeness degree corresponding to the second display state.

In one embodiment, when the floating display mode is the first display mode and the second display mode, the second display states associated with other words may be the same, and are both subtitle display states. In other embodiments, when the floating display mode is the first display mode and the second display mode, the second display states associated with other words may be different. Specifically, in the first display mode, the second display state associated with the other words may include any one or more of a subtitle display state, a pause display state of the target video segment, and a playback display state of the target video segment; in the second display mode, the second display state associated with other words may include any one or more of a subtitle display state and a playback continuation display state of the target video segment. In practice, the second display state includes at least a subtitle display state regardless of the floating display mode.

In one embodiment, if the floating display mode is the first display mode and the second display mode, and the second display states associated with other words can be the same and are both subtitle display states, the server updates the strangeness degree corresponding to other words based on the strangeness degree corresponding to the second display state, including: acquiring a subtitle display unfamiliarity adjustment amplitude value corresponding to the subtitle display state; and updating the strangeness degrees corresponding to the other words based on the subtitle display strangeness degree adjustment amplitude value. It should be understood that, for any other word, although the corresponding paraphrase information is not shown in the caption, the word will be seen by the user in the caption, and the user can roughly know the meaning of the word according to the context relatively easily, so the strangeness degree corresponding to the word should be reduced appropriately. And, as the number of times of the appearance of one other word in the caption is more and more, the corresponding strangeness should approach a certain threshold value, which is assumed to be b, b is not equal to 0 (b is set to be equal to 10 in general), because the user cannot completely learn the paraphrase information of one word through the caption.

In one embodiment, the value of the subtitle display unfamiliar degree adjustment amplitude may be set by the terminal, and in general, the value of the subtitle display unfamiliar degree adjustment amplitude is smaller than the value of the floating display unfamiliar degree adjustment amplitude and may be set to 0.05. The updating of the strangeness degree corresponding to the other words based on the subtitle display strangeness degree adjustment amplitude value can be realized in the following manner (7):

wherein S is_qgIndicating updated degree of strangeness of other words, S_qIndicating the corresponding degree of strangeness, k, of other words before the update has been made₁And the adjustment amplitude value of the strangeness degree of the subtitle display is represented. As can be seen from fig. 7, when the strangeness degree corresponding to other words is greater than the strangeness degree threshold b, a step of updating the strangeness degree according to the degree-of-strangeness adjustment magnitude value of the subtitle display is performed; and when the degree of strangeness corresponding to other words is not greater than the degree of strangeness threshold b, the degree of strangeness of the words is not updated.

Step S5011, the server obtains the displayed words included in the target subtitle fragments, obtains adding time for adding other undisplayed words into a preset word library aiming at other undisplayed words except the displayed words in the preset word library, and updates the strangeness degree corresponding to other undisplayed words based on the adding time and a strangeness degree updating rule corresponding to the time state.

In one embodiment, the displayed words included in the target subtitle segment refer to all words included in the target video segment, the words included in the preset word library are added to the preset word library at different times, and a part of words included in the preset word library, which are referred to as other non-displayed words, may not be included in the target subtitle segment. Over time, the degree of strangeness of the part of the word increases. If the influence of time on the strangeness of the part of words is not considered, the probability that the part of words is displayed can be reduced, and the part of words is forgotten. Therefore, the embodiment of the invention can also update the unfamiliar degree of other undisplayed words which do not appear in the target subtitle segment in the preset word library based on the time state.

Specifically, display words included in the target subtitle segment are obtained, adding time for adding other non-displayed words into a preset word library is obtained for other non-displayed words except the display words in the preset word library, and the strangeness degree corresponding to the other non-displayed words is updated based on the adding time and a strangeness degree updating rule corresponding to the time state. Wherein the updating the unfamiliarity corresponding to the other words not shown based on the updating rule corresponding to the adding time and the time state comprises: determining a time state adjustment amplitude value corresponding to the time state according to the time difference between the adding time and the current moment; and updating the unfamiliarity corresponding to the other words which are not displayed based on the time state adjustment amplitude value.

In an embodiment, the implementation of determining the time state adjustment amplitude value corresponding to the time state according to the time difference between the adding time and the current time may be: substituting the time difference between the adding time and the current time into an Ebings forgetting function for calculation, wherein the calculated function value can be used as a time state adjustment amplitude value; and then adjusting the amplitude value based on the time state to adjust the strangeness degree corresponding to other words which are not displayed, and updating the strangeness degree, wherein the specific updating can be represented by the following formula (8):

wherein t represents a time difference between an addition time when other words are added to the preset word bank and the current time, f (t) represents an Ebings forgetting function, S_nIndicating the degree of strangeness corresponding to other words not shown before the update of the degree of strangeness, S_n+1Indicating updated strangeness corresponding to other words not shown, k and b being positive numbers, S₀For an initial strangeness value, this value may be set to 100.

Based on the above description, an embodiment of the present invention further provides a flowchart of another video processing method, as shown in fig. 6, a server obtains a target subtitle segment corresponding to a target video segment; analyzing the target caption segment to obtain a display word included in the target caption segment; determining words which do not belong to a preset word library in the displayed words as words for floating display corresponding to the target video clip, wherein each word for floating display forms a word set; adding each word for floating display into a preset word library as a newly added word; and determining target words from the words, and storing the target video clips, the target subtitles and relevant information of the target words in a related manner. When the terminal needs to play the target video clip in the user interface, the server sends the target video clip, the target subtitle clip and the relevant information corresponding to the target word to the terminal; the terminal displays relevant information of the target word in a floating mode in the user interface in a selected floating display mode; and detecting a display state set corresponding to the target video clip matched with the selected floating display mode within the time of floating display of the relevant information of the target word, and sending the display state set to the server.

For a target word, the server updates the strangeness degree of the target word according to one or more of a floating display state included in the display state set, a pause display state of the target video clip, and a playback display state of the target video clip; or the server updates the strangeness degree corresponding to the target word according to the floating display state included in the display state set and one or more of the continuous playing states of the target video, and then determines the display priority of the target word according to the updated strangeness degree when floating display is needed next time; for other words except the target word in the word set, the server updates the strangeness degree corresponding to other words according to the subtitle display state in the display state set; and updating the corresponding strangeness degrees of other words except the displayed words in the preset word library according to the time state.

In the embodiment of the invention, a server acquires a target video clip and a target subtitle clip corresponding to the target video clip; acquiring a word set which corresponds to the target subtitle fragment and accords with the floating display condition; further, acquiring a target word from the word set according to the display priority of each word in the word set; transmitting the target video clip, the target subtitle clip and the relevant information of the target word to a terminal, and playing the target video clip and displaying the target subtitle by the terminal in a user interface; and the terminal displays the related information of the target word in a floating manner on the user interface according to a preset floating display mode. According to the invention, the server can automatically select the target word from the word set according to the display priority of each word without the participation of a user, so that the terminal can float and display the relevant information of the target word in the user interface. In addition, the related information of the target word displayed in a floating mode can cause the user to pay attention to the target word, the learning of the user to the target word is facilitated, the target word is effectively displayed, and the learning efficiency can be improved.

In addition, in or after the process of floatingly displaying the relevant information of the target word, the server also acquires a display state set corresponding to the word set, acquires a first display state associated with the target word from the display state set, and updates the strangeness degree corresponding to the target word based on a strangeness degree update rule corresponding to the first display state; and acquiring a second display state associated with other words except the target word in the word set from the display state set, and updating the strangeness degree corresponding to the other words based on the strangeness degree corresponding to the second display state. And updating the strangeness degree of other words which are not displayed and are included in the preset word bank except the displayed word included in the target subtitle fragment based on the updating rule corresponding to the time state, so that the display priority corresponding to each word in the preset word bank can be determined according to the updated strangeness degree of each word in the preset word bank, and the word displayed in a floating mode is selected according to the display priority.

Based on the above video processing method, an embodiment of the present invention further provides a video processing apparatus, which can be configured in a terminal and configured to execute the video processing method executed by the terminal in fig. 2 and 3. Referring to fig. 7, the video processing apparatus may operate as follows:

a sending unit 701, configured to send a play request of a target video to a server;

a receiving unit 702, configured to receive play data corresponding to the target video sent by the server, where the play data includes the target video, a target subtitle corresponding to the target video, and relevant information of a word to be displayed;

and the processing unit 703 is configured to play the target video on a user interface, and display the target subtitle and float and display the related information of the word to be displayed.

In one embodiment, the target video includes a target video segment, the target subtitle includes a target subtitle segment corresponding to the target video segment, the word to be displayed includes a target word corresponding to the target video segment, and the processing unit 703 performs the following operations when playing the target video in the user interface, displaying the target subtitle, and floating and displaying the related information of the word to be displayed: and when the target video clip is played on a user interface, displaying the target subtitle clip and floating and displaying the related information of the target word.

In one embodiment, the processing unit 703 is further configured to: acquiring a preset floating display mode; in the process of playing the target video clip, acquiring a display state set of the target video clip corresponding to the floating display mode; sending the display state set to the server, so that the server acquires a first display state associated with the target word from the display state set, and updating the strangeness degree corresponding to the target word based on a strangeness degree updating rule corresponding to the first display state; and acquiring a second display state associated with other words except the target word in the word set corresponding to the target video segment from the display state set, and updating the strangeness degree corresponding to the other words based on a strangeness degree updating rule corresponding to the second display state.

In one embodiment, if the floating display mode is a first display mode in which a floating display duration is set by a terminal, the display state set includes: a floating display state, a subtitle display state, and a paused display state of the target video segment and a playback display state of the target video segment; if the floating display mode is a second display mode in which the floating display time length is set by the user, the display state set comprises: a floating display state, a subtitle display state, and a continuous play display state of the target video clip.

The steps involved in the methods shown in fig. 2 and 5 may be performed by various units in the video processing apparatus shown in fig. 7, according to an embodiment of the present invention. For example, step S201 shown in fig. 2 may be performed by the transmitting unit 701 in the video processing apparatus described in fig. 7, step S202 may be performed by the receiving unit 702 in the video processing apparatus described in fig. 6, and step S203 may be performed by the processing unit 703 in the video processing apparatus described in the figure; as another example, step S505 shown in fig. 5 may be performed by the transmitting unit 701 in the video processing apparatus shown in fig. 7, and steps S507 to S508 may be performed by the processing unit 703 in the video processing apparatus shown in fig. 7.

According to another embodiment of the present invention, the units in the video processing apparatus shown in fig. 7 may be respectively or entirely combined into one or several other units to form one or several other units, or some unit(s) thereof may be further split into multiple units with smaller functions to form the same operation, without affecting the achievement of the technical effect of the embodiment of the present invention. The units are divided based on logic functions, and in practical application, the functions of one unit can be realized by a plurality of units, or the functions of a plurality of units can be realized by one unit. In other embodiments of the present invention, the word-based display device may also include other units, and in practical applications, these functions may also be implemented by the assistance of other units, and may be implemented by cooperation of a plurality of units.

According to another embodiment of the present invention, the video processing apparatus as shown in fig. 7 may be constructed by running a computer program (including program codes) capable of executing the steps involved in the respective methods as shown in fig. 2 or fig. 5 on a general-purpose computing device such as a computer including a processing element such as a Central Processing Unit (CPU), a random access storage medium (RAM), a read-only storage medium (ROM), and a storage element, and a video processing method according to an embodiment of the present invention may be implemented. The computer program may be embodied on a computer-readable storage medium, for example, and loaded into and executed by the above-described computing apparatus via the computer-readable storage medium.

Based on the above video processing method and video processing apparatus, another video processing apparatus is provided in an embodiment of the present invention, the video processing apparatus can execute the video processing method shown in fig. 2 and fig. 3, and the video processing apparatus can be configured in a server. Referring to fig. 8, the video processing apparatus may operate as follows:

an obtaining unit 801, configured to obtain a target video and a target subtitle corresponding to the target video;

the obtaining unit 801 is further configured to obtain a word to be displayed for floating display according to the target subtitle, and obtain related information of the word to be displayed;

a storage unit 802, configured to store the target video, the target subtitle, and relevant information of a word to be displayed in an associated manner, so as to send the target video, the target subtitle, and the relevant information of the word to be displayed to a terminal when a play request of the terminal about the target video is received, so that the terminal displays the target subtitle and displays the relevant information of the word to be displayed in a floating manner when the terminal plays the target video on a user interface.

In one embodiment, the target video includes a plurality of video segments, the target subtitle includes a subtitle segment corresponding to each video segment, and the obtaining unit 801 performs the following operations when the target video includes a plurality of video segments, the target subtitle includes a subtitle segment corresponding to each video segment, and the word to be displayed for floating display is obtained according to the target subtitle: obtaining a word for floating display corresponding to each video clip according to the subtitle clip corresponding to each video clip; and taking the word which corresponds to each video clip and is used for floating display as the word to be displayed.

In one embodiment, the plurality of video segments include a target video segment, the target subtitle includes a target subtitle segment corresponding to the target video segment, and the obtaining unit 801, when obtaining the word for floating display corresponding to each video segment according to the subtitle segment corresponding to each video segment, performs the following operations: acquiring a word set which corresponds to the target subtitle fragment and accords with a floating display condition; and acquiring a target word corresponding to the target video clip from the word set according to the display priority corresponding to each word in the word set.

In one embodiment, the display priority corresponding to each word in the word set is determined according to the strangeness degree corresponding to each word and/or the importance degree corresponding to each word; the unfamiliarity corresponding to each word is determined according to historical display record information corresponding to the word; the corresponding importance of each word is determined according to a bag-of-words model algorithm.

In one embodiment, when the storage unit 802 stores the target video, the target subtitle, and the related information of the word to be displayed in association, the following operations are performed: and storing the target video clip, the target subtitle clip and the relevant information corresponding to the target word included in the target video in a correlation manner, so that when the terminal plays the target video clip in a user interface, the target subtitle clip is displayed and the relevant information of the target is displayed in a floating manner.

In one embodiment, the word display apparatus further comprises a processing unit 803:

an obtaining unit 801, configured to obtain a display state set corresponding to the target video segment;

an obtaining unit 801, further configured to obtain a first display state associated with the target word from the display state set,

the processing unit 803 is configured to update the unfamiliar degree corresponding to the target word based on the unfamiliar degree update rule corresponding to the first display state;

an obtaining unit 801, further configured to obtain, from the display state set, a second display state associated with a word other than the target word in the display word set,

the processing unit 803 is further configured to update the unfamiliar degree corresponding to the other word based on the unfamiliar degree update rule corresponding to the second display state.

In one embodiment, if the floating display mode is a first display mode in which a floating display duration is set by a terminal, the display state set includes: a floating display state, a subtitle display state, a paused display state of the target video segment, and a playback display state of the target video segment; the first display state comprises any one or more of a floating display state, a subtitle display state, a pause display state of the target video segment and a playback display state of the target video segment; the second display state comprises any one or more of a subtitle display state, a pause display state of the target video segment and a playback display state of the target video segment;

if the floating display mode is a second display mode in which the floating display time length is set by the user, the display state set comprises: a floating display state, a subtitle display state and a continuous playing display state of the target video clip; the first display state comprises any one or more of a floating display state, a subtitle display state and a continuous playing display state of the target video clip; the second display state comprises any one or more of a subtitle display state and a continuous playing display state of the target video segment.

In one embodiment, if the floating display mode is the first display mode in which a floating display duration is set by a terminal, and the first display state includes a floating display state, a paused display state of the target video segment, and a playback display state of the target video segment, the processing unit 803 performs the following operations when the strangeness degree corresponding to the target word is updated based on the strangeness degree update rule corresponding to the first display state: acquiring a floating display strangeness degree adjustment amplitude value corresponding to the floating display state, and generating first strangeness degree information based on the floating display strangeness degree adjustment amplitude value; obtaining pause time corresponding to the pause playing display state and playback times corresponding to the playback display state, and generating second strangeness degree information based on the pause time and the playback times; and updating the degree of strangeness corresponding to the target word according to the first degree of strangeness information and the second degree of strangeness information.

In one embodiment, if the floating display mode is a second display mode in which the floating display duration is set by the user, and the first display state includes a floating display state and a continuously playing display state of the target video segment, the processing unit 803 performs the following operations when updating the strangeness degree corresponding to the target word based on the strangeness degree update rule corresponding to the first display state: acquiring a floating display strangeness degree adjustment amplitude value corresponding to the floating display state, and generating first strangeness degree information based on the floating display strangeness degree adjustment amplitude value; acquiring a time interval between the time when the continuous playing display state of the target video clip is detected and the time when the automatic playing pause is detected last time, and generating third unfamiliar degree information based on the time interval; and updating the degree of strangeness corresponding to the target character according to the first degree of strangeness information and the third degree of strangeness information.

In one embodiment, if the floating display mode is the first display mode in which a floating display duration is set by a terminal, the second display state includes a subtitle display state; or the floating display mode is a second display mode in which the floating display duration is set by the user, where the second display state includes a subtitle display state, when the strangeness degree corresponding to the other word is updated based on the strangeness degree update rule corresponding to the second display state, the processing unit 803 performs the following operations: acquiring a subtitle display unfamiliarity adjustment amplitude value corresponding to the subtitle display state; and updating the strangeness degrees corresponding to the other words based on the subtitle display strangeness degree adjustment amplitude value.

In one embodiment, the obtaining unit 801 is further configured to obtain display words included in the target subtitle segment, where the display words include at least each word in the word set; and aiming at other words which are not displayed in a preset word bank except the displayed word, acquiring the adding time of the other words which are not displayed and added into the preset word bank, and updating the strangeness degrees corresponding to the other words which are not displayed based on the adding time and a strangeness degree updating rule corresponding to the time state.

The steps involved in the methods shown in fig. 3 and 5 may be performed by various units in the video processing apparatus shown in fig. 8, according to an embodiment of the present invention. For example, steps S301 to S302 shown in fig. 3 may be performed by the acquisition unit 801 in the video processing apparatus described in fig. 8, and step S303 may be performed by the storage unit 802 in the video processing apparatus described in the figure; as another example, steps S501 to S503, and steps S509 to S5011 shown in fig. 5 may be executed by the acquisition unit 801 in the video processing apparatus shown in fig. 8, and step S506 may be executed by the storage unit 802 in the video processing apparatus shown in fig. 8.

According to another embodiment of the present invention, the units in the video processing apparatus shown in fig. 8 may be respectively or entirely combined into one or several other units to form one or several other units, or some unit(s) may be further split into multiple units with smaller functions to form the same operation, without affecting the achievement of the technical effect of the embodiment of the present invention. The units are divided based on logic functions, and in practical application, the functions of one unit can be realized by a plurality of units, or the functions of a plurality of units can be realized by one unit. In other embodiments of the present invention, the word-based display device may also include other units, and in practical applications, these functions may also be implemented by the assistance of other units, and may be implemented by cooperation of a plurality of units.

According to another embodiment of the present invention, the video processing apparatus as shown in fig. 8 may be constructed by running a computer program (including program codes) capable of executing the steps involved in the respective methods as shown in fig. 3 or fig. 5 on a general-purpose computing device such as a computer including a processing element such as a Central Processing Unit (CPU), a random access storage medium (RAM), a read-only storage medium (ROM), and a storage element, and a video processing method according to an embodiment of the present invention may be implemented. The computer program may be embodied on a computer-readable storage medium, for example, and loaded into and executed by the above-described computing apparatus via the computer-readable storage medium.

Based on the description of the above method embodiment and apparatus embodiment, the embodiment of the present invention further provides a terminal, and fig. 9 is a schematic structural diagram of the terminal provided in the embodiment of the present invention. As shown in fig. 9, the terminal can include a processor 901 and a computer storage medium 902.

A computer storage medium 902 may be stored in the memory of the server, the computer storage medium 902 being for storing a computer program comprising program instructions, the processor 901 being for executing the program instructions stored by the computer storage medium 902. The processor 901 or CPU (Central Processing Unit) is a computing core and a control core of the server, and is adapted to implement one or more instructions, and in particular, is adapted to load and execute the one or more instructions so as to implement a corresponding method flow or a corresponding function; in one embodiment, the processor 901 according to the embodiment of the present invention may be configured to perform: sending a playing request of a target video to a server; receiving playing data corresponding to the target video sent by the server, wherein the playing data comprises the target video, a target subtitle corresponding to the target video and relevant information of a word to be displayed; and playing the target video on a user interface, and displaying the target subtitle and floating the relevant information of the word to be displayed.

In one embodiment, the target video includes a target video segment, the target subtitle includes a target subtitle segment corresponding to the target video segment, the word to be displayed includes a target word corresponding to the target video segment, and when the processor 901 plays the target video in the user interface, and displays the target subtitle and floatingly displays the related information of the word to be displayed, the following steps are performed: and when the target video clip is played on a user interface, displaying the target subtitle clip and floating and displaying the related information of the target word.

In one embodiment, the processor 901 is further configured to: acquiring a preset floating display mode; in the process of playing the target video clip, acquiring a display state set of the target video clip corresponding to the floating display mode; sending the display state set to the server, so that the server acquires a first display state associated with the target word from the display state set, and updating the strangeness degree corresponding to the target word based on a strangeness degree updating rule corresponding to the first display state; and acquiring a second display state associated with other words except the target word in the word set corresponding to the target video segment from the display state set, and updating the strangeness degree corresponding to the other words based on a strangeness degree updating rule corresponding to the second display state.

Based on the description of the above method embodiment and apparatus embodiment, the embodiment of the present invention further provides a server, and fig. 10 is a schematic structural diagram of a server provided in the embodiment of the present invention. As shown in fig. 10, the server may include a processor 1001 and a computer storage medium 1002.

A computer storage medium 1002 may be stored in the memory of the server, the computer storage medium 802 being for storing a computer program comprising program instructions, the processor 801 being for executing the program instructions stored by the computer storage medium 1002. The processor 1001 or CPU (Central Processing Unit) is a computing core and a control core of the server, and is adapted to implement one or more instructions, and in particular, is adapted to load and execute the one or more instructions so as to implement a corresponding method flow or a corresponding function; in one embodiment, the processor 1001 according to an embodiment of the present invention may be configured to perform: acquiring a target video and a target subtitle corresponding to the target video; acquiring a word to be displayed for floating display according to the target caption, and acquiring related information of the word to be displayed, wherein the related information of the word to be displayed comprises one or more of paraphrase information and use case information of the word to be displayed; and storing the target video, the target subtitle and the relevant information of the word to be displayed in a correlation manner, so that when a playing request of a terminal about the target video is received, the relevant information of the target video, the target subtitle and the word to be displayed is sent to the terminal, and the terminal displays the target subtitle and displays the relevant information of the word to be displayed in a floating manner when the target video is played on a user interface.

In one embodiment, the target video includes a plurality of video segments, the target subtitle includes a subtitle segment corresponding to each video segment, and the processor 1002 performs the following operations when obtaining a word to be displayed for floating display according to the target subtitle: obtaining a word for floating display corresponding to each video clip according to the subtitle clip corresponding to each video clip; and taking the word which corresponds to each video clip and is used for floating display as the word to be displayed.

In one embodiment, the plurality of video segments include a target video segment, the target subtitle includes a target subtitle segment corresponding to the target video segment, and the processor 1001, when obtaining a word for floating display corresponding to each video segment according to the subtitle segment corresponding to each video segment, performs the following operations: acquiring a word set which corresponds to the target subtitle fragment and accords with a floating display condition; and acquiring a target word corresponding to the target video clip from the word set according to the display priority corresponding to each word in the word set.

In one embodiment, when storing the target video, the target subtitle, and the related information of the word to be displayed in association, the processor 1001 performs the following operations: and storing the target video clip, the target subtitle clip and the relevant information corresponding to the target word included in the target video in a correlation manner, so that when the terminal plays the target video clip in a user interface, the target subtitle clip is displayed and the relevant information of the target is displayed in a floating manner.

In one embodiment, the processor 1001 is further configured to: acquiring a display state set corresponding to the target video clip; acquiring a first display state associated with the target word from the display state set, and updating the unfamiliarity corresponding to the target word based on an unfamiliarity updating rule corresponding to the first display state; and acquiring a second display state associated with other words except the target word in the display word set from the display state set, and updating the strangeness degree corresponding to the other words based on a strangeness degree updating rule corresponding to the second display state.

In one embodiment, if the floating display mode is the first display mode in which a floating display duration is set by a terminal, and the first display state includes a floating display state, a paused display state of the target video segment, and a playback display state of the target video segment, the processor 1001 performs the following operations when updating the strangeness degree corresponding to the target word based on a strangeness degree update rule corresponding to the first display state: acquiring a floating display strangeness degree adjustment amplitude value corresponding to the floating display state, and generating first strangeness degree information based on the floating display strangeness degree adjustment amplitude value; obtaining pause time corresponding to the pause playing display state and playback times corresponding to the playback display state, and generating second strangeness degree information based on the pause time and the playback times; and updating the degree of strangeness corresponding to the target word according to the first degree of strangeness information and the second degree of strangeness information.

In one embodiment, if the floating display mode is a second display mode in which a floating display duration is set by a user, the first display state includes a floating display state and a continuously playing display state of the target video segment, the processor 1001 performs the following operations when updating the strangeness degree corresponding to the target word based on a strangeness degree update rule corresponding to the first display state: acquiring a floating display strangeness degree adjustment amplitude value corresponding to the floating display state, and generating first strangeness degree information based on the floating display strangeness degree adjustment amplitude value; acquiring a time interval between the time when the continuous playing display state of the target video clip is detected and the time when the automatic playing of the target video clip is detected to be paused last time, and generating third unfamiliarity information based on the time interval; and updating the degree of strangeness corresponding to the target word according to the first degree of strangeness information and the third degree of strangeness information.

In one embodiment, if the floating display mode is the first display mode in which a floating display duration is set by a terminal, the second display state includes a subtitle display state; or the floating display mode is a second display mode in which the floating display duration is set by the user, where the second display state includes a subtitle display state, and when the strangeness degree corresponding to the other word is updated based on the strangeness degree update rule corresponding to the second display state, the processor 1001 performs the following operations: acquiring a subtitle display unfamiliarity adjustment amplitude value corresponding to the subtitle display state; and updating the strangeness degrees corresponding to the other words based on the subtitle display strangeness degree adjustment amplitude value.

In one embodiment, the processor 1001 is further configured to: obtaining display words included in the target subtitle fragments, wherein the display words at least include each word in the word set; and aiming at other words which are not displayed in a preset word bank except the displayed word, acquiring the adding time of the other words which are not displayed and added into the preset word bank, and updating the strangeness degrees corresponding to the other words which are not displayed based on the adding time and a strangeness degree updating rule corresponding to the time state.

An embodiment of the present invention further provides a computer storage medium (Memory), which is a Memory device in an electronic device and is used for storing programs and data. It is understood that the computer storage medium herein may include both a built-in storage medium in the electronic device and, of course, an extended storage medium supported by the electronic device. The computer storage medium may be a high-speed RAM memory, or may be a non-volatile memory (non-volatile memory), such as at least one disk memory; and optionally at least one computer storage medium located remotely from the processor. The computer storage medium may include first computer program instructions that, when executed, perform: sending a playing request of a target video to a server; receiving playing data corresponding to the target video sent by the server, wherein the playing data comprises the target video, a target subtitle corresponding to the target video and relevant information of a word to be displayed, and the relevant information of the word to be displayed comprises one or more of paraphrase information and use case information of the word to be displayed; and playing the target video on a user interface, and displaying the target subtitle and floating the relevant information of the word to be displayed.

In one embodiment, the computer storage medium may include second computer program instructions that, when executed, perform: acquiring a target video and a target subtitle corresponding to the target video; acquiring a word to be displayed for floating display according to the target caption, and acquiring related information of the word to be displayed, wherein the related information of the word to be displayed comprises one or more of paraphrase information and use case information of the word to be displayed; and storing the target video, the target subtitle and the relevant information of the word to be displayed in a correlation manner, so that when a playing request of a terminal about the target video is received, the relevant information of the target video, the target subtitle and the word to be displayed is sent to the terminal, and the terminal displays the target subtitle and displays the relevant information of the word to be displayed in a floating manner when the target video is played on a user interface.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

The above disclosure is intended to be illustrative of only some embodiments of the invention, and is not intended to limit the scope of the invention.

Claims

1. A video processing method, comprising:

sending a playing request of a target video to a server;

receiving playing data corresponding to the target video sent by the server, wherein the playing data comprises the target video clip, a target subtitle clip corresponding to the target video clip and relevant information of a target word corresponding to the target video clip, and the relevant information of the target word comprises one or more of paraphrase information and use case information of the target word;

acquiring a preset floating display mode, displaying the target subtitle fragment and displaying related information of the target word in a floating mode according to the floating display mode when the target video fragment is played on a user interface;

if the floating display mode is a first display mode with floating display duration set by a terminal, monitoring user playing behavior operation in the process of playing the target video clip, and acquiring a display state set of the target video clip corresponding to the first display mode according to the monitored user playing behavior operation and the first display mode, wherein the display state set of the target video clip corresponding to the first display mode comprises: a floating display state, a subtitle display state, and a paused display state of the target video segment and a playback display state of the target video segment;

sending the display state set of the target video segment corresponding to the first display mode to the server, so that the server acquires a first display state associated with the target word from the display state set of the target video segment corresponding to the first display mode, wherein the first display state associated with the target word in the first display mode comprises the floating display state, the pause display state and the playback display state; acquiring a floating display strangeness degree adjustment amplitude value corresponding to the floating display state, wherein the floating display strangeness degree adjustment amplitude value is a preset positive number smaller than 1; generating first strangeness degree information based on the floating display strangeness degree adjustment amplitude value, acquiring pause time corresponding to the pause display state and playback frequency corresponding to the playback display state, generating second strangeness degree information based on the pause time and the playback frequency, and updating the strangeness degree of the target word according to the first strangeness degree information and the second strangeness degree information;

and acquiring a second display state associated with other words except the target word in the word set corresponding to the target video segment from the display state set of the target video segment corresponding to the first display mode, wherein in the first display mode, the second display state associated with the other words comprises the subtitle display state, acquiring a subtitle display unfamiliarity adjustment amplitude value corresponding to the subtitle display state, and updating the unfamiliarity corresponding to the other words based on the subtitle display unfamiliarity adjustment amplitude value, wherein the subtitle display unfamiliarity adjustment amplitude value is a positive number smaller than the floating display unfamiliarity adjustment amplitude value.

2. The method of claim 1, wherein if the floating display mode is a first display mode in which a floating display duration is set by a terminal, the user play behavior operation comprises a pause play operation or a playback operation.

3. The method of claim 1, wherein the method further comprises:

if the floating display mode is a second display mode with floating display duration set by a user, monitoring user playing behavior operation in the process of playing the target video clip, and acquiring a display state set of the target video clip corresponding to the second display mode according to the monitored user playing behavior operation and the second display mode, wherein the display state set of the target video clip corresponding to the second display mode comprises: a floating display state, a subtitle display state and a continuous playing display state of the target video clip;

sending the display state set of the target video clip corresponding to the second display mode to the server, so that the server obtains a first display state associated with the target word from a second display state set of the target video clip corresponding to the second display mode, wherein the first display state associated with the target word in the second display mode comprises the floating display state and the continuous playing display state; acquiring a floating display strangeness degree adjustment amplitude value corresponding to the floating display state, wherein the floating display strangeness degree adjustment amplitude value is a preset positive number smaller than 1; generating first strangeness information based on the floating display strangeness adjustment amplitude value; acquiring a time interval between the time when the display state of continuous playing is detected and the time when automatic playing pause is detected last time, generating third unfamiliar degree information based on the time interval, and updating the unfamiliar degree corresponding to the target word according to the first unfamiliar degree information and the third unfamiliar degree information;

and acquiring a second display state associated with other words except the target word in the word set corresponding to the target video segment from the display state set of the target video segment corresponding to the second display mode, wherein in the second display mode, the second display state associated with the other words comprises a subtitle display state, acquiring a subtitle display unfamiliar degree adjustment amplitude value corresponding to the subtitle display state, and updating the unfamiliar degree corresponding to the other words based on the subtitle display unfamiliar degree adjustment amplitude value, wherein the subtitle display unfamiliar degree adjustment amplitude value is a positive number smaller than the floating display unfamiliar degree adjustment amplitude value.

4. The method of claim 3, wherein if the floating display mode is a second display mode in which a floating display duration is set by a user, the user play behavior operation comprises a continue play operation.

5. A method for single video processing, comprising:

acquiring a target video clip and a target subtitle clip corresponding to the target video clip;

acquiring a target word for floating display according to the target caption segment, and acquiring related information of the target word, wherein the related information of the target word comprises one or more of paraphrase information and use case information of the target word;

storing the target video clip, the target subtitle clip and the relevant information of the target word in an associated manner, so that when a playing request of a terminal about the target video is received, the relevant information of the target video clip, the target subtitle clip and the target word is sent to the terminal, the terminal can obtain a preset floating display mode, and when the target video clip is played on a user interface, the target subtitle clip is displayed and the relevant information of the target word is displayed in a floating manner according to the floating display mode; and if the floating display mode is a first display mode with floating display duration set by a terminal, monitoring user playing behavior operation in the process of playing the target video clip, and acquiring a display state set of the target video clip corresponding to the first display mode according to the monitored user playing behavior operation and the first display mode, wherein the display state set of the target video clip corresponding to the first display mode comprises: a floating display state, a subtitle display state, and a paused display state of the target video segment and a playback display state of the target video segment; sending the display state set of the target video clip corresponding to the first display mode to a server;

receiving a display state set of the target video clip corresponding to a first display mode sent by the terminal, and acquiring a first display state associated with the target word from the display state set of the target video clip corresponding to the first display mode, wherein the first display state associated with the target word in the first display mode comprises the floating display state, the pause display state and the playback display state; acquiring a floating display strangeness degree adjustment amplitude value corresponding to the floating display state, wherein the floating display strangeness degree adjustment amplitude value is a preset positive number smaller than 1; generating first strangeness information based on the floating display strangeness adjustment amplitude value, acquiring pause time corresponding to the pause playing display state and playback frequency corresponding to the playback display state, generating second strangeness information based on the pause time and the playback frequency, and updating the strangeness of the target word according to the first strangeness information and the second strangeness information;

6. The method of claim 5, wherein if the floating display mode is a first display mode in which a floating display duration is set by a terminal, the user play behavior operation comprises a pause play operation or a playback operation.

7. The method of claim 5, wherein the retrieving a target word for floating display from the target subtitle segment comprises:

acquiring a word set which corresponds to the target subtitle fragment and accords with a floating display condition;

and acquiring a target word corresponding to the target video clip from the word set according to the display priority corresponding to each word in the word set.

8. The method of claim 7, wherein the display priority corresponding to each word in the set of words is determined according to the strangeness degree corresponding to each word and/or the importance degree corresponding to each word; the unfamiliarity corresponding to each word is determined according to historical display record information corresponding to the word; the corresponding importance of each word is determined according to a bag-of-words model algorithm.

9. The method of claim 5, wherein if the floating display mode is a second display mode in which a floating display time period is set by a user, the method further comprises:

receiving a display state set of the target video clip corresponding to the second display mode, which is sent by the terminal, wherein the display state set of the target video clip corresponding to the second display mode is obtained by monitoring a user playing behavior operation and according to the monitored user playing behavior operation and the second display mode in the process of playing the target video clip by the terminal; the display state set of the target video segment corresponding to the second display mode comprises: a floating display state, a subtitle display state and a continuous playing display state of the target video clip;

acquiring a first display state associated with the target word from the display state set of the target video clip corresponding to the second display mode, wherein the first display state associated with the target word comprises the floating display state and the continuous playing display state in the second display mode; acquiring a floating display strangeness degree adjustment amplitude value corresponding to the floating display state, wherein the floating display strangeness degree adjustment amplitude value is a preset positive number smaller than 1; generating first strangeness information based on the floating display strangeness adjustment amplitude value; acquiring a time interval between the time when the display state of continuous playing is detected and the time when automatic playing pause is detected last time, generating third unfamiliar degree information based on the time interval, and updating the unfamiliar degree corresponding to the target word according to the first unfamiliar degree information and the third unfamiliar degree information;

and acquiring a second display state associated with other words except the target word in the display word set from the display state set of the target video segment corresponding to the second display mode, wherein the second display state associated with the other words comprises a subtitle display state and acquires a subtitle display unfairness adjustment amplitude value corresponding to the subtitle display state, and updating the unfairness corresponding to the other words based on the subtitle display unfairness adjustment amplitude value, wherein the subtitle display unfairness adjustment amplitude value is a positive number smaller than the subtitle display unfairness adjustment amplitude value.

10. The method of claim 9, wherein if the floating display mode is a second display mode in which a floating display duration is set by a user, the user play behavior operation comprises a continue play operation.

11. A terminal, comprising:

sending a playing request of a target video to a server;

sending the display state set of the target video segment corresponding to the first display mode to the server, so that the server acquires a first display state associated with the target word from the display state set of the target video segment corresponding to the first display mode, wherein the first display state associated with the target word in the first display mode comprises the floating display state, the pause display state and the playback display state; acquiring a floating display strangeness degree adjustment amplitude value corresponding to the floating display state, wherein the floating display strangeness degree adjustment amplitude value is a preset positive number smaller than 1; generating first strangeness information based on the floating display strangeness adjustment amplitude value, acquiring pause time corresponding to the pause playing display state and playback frequency corresponding to the playback display state, generating second strangeness information based on the pause time and the playback frequency, and updating the strangeness of the target word according to the first strangeness information and the second strangeness information;

12. A server, comprising:

receiving a display state set of the target video clip corresponding to a first display mode sent by the terminal, and acquiring a first display state associated with the target word from the display state set of the target video clip corresponding to the first display mode, wherein the first display state associated with the target word in the first display mode comprises the floating display state, the pause display state and the playback display state; acquiring a floating display strangeness degree adjustment amplitude value corresponding to the floating display state, wherein the floating display strangeness degree adjustment amplitude value is a preset positive number smaller than 1; generating first strangeness degree information based on the floating display strangeness degree adjustment amplitude value, acquiring pause time corresponding to the pause display state and playback frequency corresponding to the playback display state, generating second strangeness degree information based on the pause time and the playback frequency, and updating the strangeness degree of the target word according to the first strangeness degree information and the second strangeness degree information;

and acquiring a second display state associated with other words except the target word in the word set corresponding to the target video segment from the display state set of the target video segment corresponding to the first display mode, wherein in the first display mode, the second display state associated with the other words comprises the subtitle display state, acquiring a subtitle display unfamiliarity adjustment amplitude value corresponding to the subtitle display state, and updating the unfamiliarity corresponding to the other words based on the subtitle display unfamiliarity adjustment amplitude value, wherein the subtitle display unfamiliarity adjustment amplitude value is a positive number smaller than the subtitle display unfamiliarity adjustment amplitude value.

13. A computer storage medium, characterized in that the computer storage medium stores first computer program instructions for performing the video processing method of any of claims 1-4 when executed by a processor; alternatively, the computer storage medium has stored therein second computer program instructions for executing the video processing method according to any of claims 5-10 when executed by a processor.