CN112752134A - Video processing method and device, storage medium and electronic device

Info

Publication number: CN112752134A
Authority: CN (China)
Prior art keywords: target, content, video, target control, displaying
Prior art date
Legal status: Granted (the status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number: CN202010693888.7A
Other languages: Chinese (zh)
Other versions: CN112752134B (English)
Inventor: 田元
Current Assignee (the listed assignee may be inaccurate): Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd
Priority date (the date is an assumption and is not a legal conclusion)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010693888.7A
Publication of CN112752134A
Application granted
Publication of CN112752134B
Current legal status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/488Data services, e.g. news ticker
    • H04N21/4884Data services, e.g. news ticker for displaying subtitles
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/278Subtitling

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a video processing method and apparatus, a storage medium, and an electronic device. The method includes: displaying a target video in a display interface on a client; in response to a received conversion instruction, converting the voice content in the target video into text content; displaying the text content in a target control contained in the display interface; and in response to a trigger instruction executed on the target control, executing a target function corresponding to the target control, where the target function of the target control is determined according to the type of the client. The invention solves the technical problem of poor flexibility in processing video content in the related art.

Description

Video processing method and device, storage medium and electronic device
Technical Field
The present invention relates to the field of computers, and in particular, to a video processing method and apparatus, a storage medium, and an electronic apparatus.
Background
In the related art, after a user receives video content, the user can only watch it. The user cannot further process content of interest within the video; to do so, the user must remember the content of interest while watching and then process it separately with a particular application or function.
That is, the related art suffers from low efficiency in processing content of interest in a video.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
Embodiments of the invention provide a video processing method and apparatus, a storage medium, and an electronic device, to at least solve the technical problem of low efficiency in processing content of interest in videos in the related art.
According to an aspect of an embodiment of the present invention, a video processing method is provided, including: displaying a target video in a display interface on a client; in response to a received conversion instruction, converting voice content in the target video into text content; displaying the text content in a target control contained in the display interface; and in response to a trigger instruction executed on the target control, executing a target function corresponding to the target control, where the target function of the target control is determined according to the type of the client.
According to another aspect of the embodiments of the present invention, a video processing apparatus is also provided, including: a first display unit, configured to display a target video in a display interface on a client; a conversion unit, configured to convert, in response to a received conversion instruction, voice content in the target video into text content; a second display unit, configured to display the text content in a target control contained in the display interface; and an execution unit, configured to execute, in response to a trigger instruction executed on the target control, a target function corresponding to the target control, where the target function of the target control is determined according to the type of the client.
As an alternative example, the second display unit includes: a third display module, configured to display a plurality of target controls in the display interface; and a fourth display module, configured to display one word of the text content in each target control.
As an optional example, the apparatus further comprises: and a third display unit, configured to display a target result obtained after the target function is executed after the target function corresponding to the target control is executed in response to the trigger instruction executed on the target control, where the target result is a result obtained after the target function is executed on the text content in the target control.
As an optional example, the apparatus further comprises: an obtaining unit, configured to obtain a type of the client before the target function corresponding to the target control is executed in response to the trigger instruction executed on the target control; a first determining unit configured to determine a plurality of functions of the client matching the type; a second determining unit configured to determine one function from the plurality of functions as the target function.
As an alternative example, the conversion unit includes: the input module is used for inputting the voice content into a target neural network model, wherein the target neural network model is obtained by training an original neural network model by using sample voice, and the target neural network model is used for outputting text content corresponding to the voice content after the voice content is input; and the acquisition module is used for acquiring the text content output by the target neural network model.
As an optional example, the execution unit includes: the processing module is used for searching the text content in the target control under the condition that the target function is a search function, sharing the text content in the target control under the condition that the target function is a sharing function, translating the text content in the target control under the condition that the target function is a translation function, and displaying the meaning of the text content in the target control under the condition that the target function is an interpretation function.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the video processing method.
According to another aspect of the embodiments of the present invention, there is also provided an electronic apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the video processing method through the computer program.
In the embodiments of the invention, a target video is displayed in a display interface on a client; in response to a received conversion instruction, voice content in the target video is converted into text content; the text content is displayed in a target control contained in the display interface; and in response to a trigger instruction executed on the target control, a target function corresponding to the target control is executed, where the target function is determined according to the type of the client. In this way, content of interest in a video can be processed directly from the display interface, which solves the technical problem of low efficiency in processing content of interest in videos.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of an application environment of an alternative video processing method according to an embodiment of the invention;
FIG. 2 is a schematic diagram of an application environment of an alternative video processing method according to an embodiment of the invention;
FIG. 3 is a flow diagram of an alternative video processing method according to an embodiment of the invention;
FIG. 4 is an interface diagram of an alternative video processing method according to an embodiment of the invention;
FIG. 5 is an interface diagram of an alternative video processing method according to an embodiment of the invention;
FIG. 6 is an interface diagram of yet another alternative video processing method according to an embodiment of the invention;
FIG. 7 is an interface diagram of yet another alternative video processing method according to an embodiment of the invention;
FIG. 8 is an interface diagram of yet another alternative video processing method according to an embodiment of the invention;
FIG. 9 is an interface diagram of yet another alternative video processing method according to an embodiment of the invention;
FIG. 10 is a schematic structural diagram of an alternative video processing apparatus according to an embodiment of the invention;
FIG. 11 is a schematic structural diagram of an alternative electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of the embodiments of the present invention, a video processing method is provided. As an optional implementation, the method may be applied, but is not limited, to the environment shown in fig. 1.
In fig. 1, the user 102 can interact with the user device 104. The user device 104 includes a memory 106 for storing interaction data and a processor 108 for processing the interaction data. The user device 104 may interact with the server 112 via the network 110. The server 112 includes a database 114 for storing interaction data and a processing engine 116 for processing the interaction data. The user device 104 may run a client, display a target video in a display interface of the client, display a target control in the display interface when a conversion instruction is received, and execute a target function corresponding to the target control when a trigger instruction executed on the target control is received.
As an alternative embodiment, the above video processing method may also be applied, but is not limited, to the environment shown in fig. 2.
In fig. 2, the user 202 can interact with the user device 204. The user device 204 includes a memory 206 for storing interaction data and a processor 208 for processing the interaction data. The user device 204 may run a client, display a target video in a display interface of the client, display a target control in the display interface when a conversion instruction is received, and execute a target function corresponding to the target control when a trigger instruction executed on the target control is received.
Alternatively, the user device 104 or the user device 204 may be, but is not limited to, a terminal such as a mobile phone, a tablet computer, a notebook computer, or a PC, and the network 110 may include, but is not limited to, a wireless network or a wired network. The wireless network includes WIFI and other networks that enable wireless communication; the wired network may include, but is not limited to, wide area networks, metropolitan area networks, and local area networks. The server 112 may include, but is not limited to, any hardware device capable of performing computations.
Optionally, as shown in fig. 3, the video processing method includes:
S302, displaying a target video in a display interface on a client;
S304, in response to a received conversion instruction, converting voice content in the target video into text content;
S306, displaying the text content in a target control contained in the display interface;
S308, in response to a trigger instruction executed on the target control, executing a target function corresponding to the target control, where the target function of the target control is determined according to the type of the client.
Alternatively, the above video processing method may be applied to, but is not limited to, any client. For example, the client may be a video applet, a live-streaming application, or a mailbox; the method may also be applied to a client with a chat function, which may be a real-time chat function. Such a client has a chat function as well as other functions, such as a transfer function, a search function, and a forwarding function. That is, the client in the present application is not limited to a real-time communication client and may be any other client with a chat function; for example, friends can also chat in Alipay.
The client in the present application is a client capable of displaying the target video. It may display the address of the target video, or display a video identifier corresponding to that address; the target video can be played by clicking the address or the video identifier. The type of the client is not limited in the present application, and any client that can display the target video or its video identifier is within the scope of the present application.
Taking live broadcasting as an example: while the live video stream is displayed, the live voice content can be converted into text content and displayed in the form of a control. If the user clicks the control, the target function is executed, where the target function is one that matches the type of the live application.
Or, taking a client with a chat function as an example: the target video is displayed in the chat window, the voice content of the target video is converted into text content, and the text content is displayed in the form of a control. If the user clicks the control, a target function corresponding to the type of the client is executed.
Or, taking an ordinary client such as a news client as an example: the news client may display a target video, convert the voice content of the target video into text content, and display the text content in the form of a control. If the user clicks the control, a target function corresponding to the type of the client is executed.
Or, taking email as an example: a received email may carry a target video, whose voice content can be converted into text content and displayed in the form of a control. If the user clicks the control, a target function corresponding to the type of the client, such as forwarding, is executed.
Optionally, in the present application, the target video may be displayed through a chat window. The target video may be a video sent to the current user by another user, and the chat window may be between two users or among multiple users. The target video is displayed in the chat window of the client, for example in a video frame. As shown in fig. 4, which is the display interface of one user's client while two users chat in a chat window, a target video 402 is displayed in the display interface.
After the target video is displayed, it can be played automatically or upon receiving a play instruction from the user. In the case of automatic playing, the voice content in the target video can be obtained before playback, converted into text content, and then displayed through the target control. The user can click the control to execute the target function on the text content.
There are many ways to display the target control. It may be displayed around the target video while the target video is playing, or while it is not playing.
For example, as shown in fig. 5, text information 502 may be displayed in the video during playback; the text information 502 is obtained by converting the audio content of the video into text. The text information may be segmented into words and then displayed in target controls: in fig. 6, a target control 602 is displayed, and text information is displayed in the target control 602. Figs. 5 and 6 show the cases where a target control or text information is displayed in the video.
As shown in figs. 7 and 8: in fig. 7 the text information is displayed below the video content, and in fig. 8 the target controls 802 are displayed below the video content after the text information has been segmented into words. In these cases the video need not be playing.
If the target control is displayed in the video, it needs to replace the original subtitle. That is, if the video has subtitles, then when the target control is determined and used to display the text content, the target control replaces the original subtitle; the replacement may delete the original subtitle or overlay it. The target control is displayed during the time period in which the original subtitle would be displayed.
If the target video has no subtitles, then after the audio content in the target video is converted into text content, the correspondence between the audio content and the text content is recorded. For example, the correspondence between target voice content and target text content is recorded, where the target voice content is a segment of speech in the audio and the target text content is the text converted from it. The start time point and end time point of the target voice content are obtained, and the target text content is displayed between those two points. When the target text content is displayed, the target control is displayed, with the target text content shown in it, as in the sketch below.
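A minimal sketch of this timing rule, assuming the recognizer reports a start and end time for each recognized speech segment; the types and names are illustrative only.

```kotlin
// Hypothetical representation of the recorded speech/text correspondence.
data class TimedText(val startMs: Long, val endMs: Long, val text: String)

// Returns the text content (if any) that the target control should show at the
// given playback position: visible from the start point up to the end point.
fun textToDisplay(segments: List<TimedText>, positionMs: Long): String? =
    segments.firstOrNull { positionMs in it.startMs until it.endMs }?.text

fun main() {
    val segments = listOf(
        TimedText(0, 2_000, "hello everyone"),
        TimedText(2_000, 5_500, "today we talk about the weather")
    )
    println(textToDisplay(segments, 3_000))  // -> today we talk about the weather
    println(textToDisplay(segments, 6_000))  // -> null: the control is no longer displayed
}
```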
In the present application, multiple target controls can be displayed. That is, after the audio content in the video is converted into text content, the text content may be segmented into multiple words, and each of the multiple target controls then displays one word.
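A minimal sketch of the one-word-per-control idea follows; simple whitespace splitting stands in for a real word-segmentation module, which for Chinese text would be considerably more involved.

```kotlin
// Hypothetical control holding one vocabulary word of the converted text.
data class WordControl(val word: String)

// Segment the text content and create one target control per word.
fun segmentIntoControls(textContent: String): List<WordControl> =
    textContent.split(Regex("\\s+"))
        .filter { it.isNotBlank() }
        .map { WordControl(it) }

fun main() {
    val controls = segmentIntoControls("weather forecast for Shenzhen")
    controls.forEach { println("control shows: ${it.word}") }  // four controls
}
```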
The target function in the present application can be a function carried by the client itself, such as searching, interpreting, translating, or forwarding.
After the target control is displayed, once a trigger instruction is received and the function corresponding to the control is executed, the result of executing that function can be displayed: for example, a search result, a sharing result, a translation result, or the meaning of the text content. As shown in fig. 9, taking search as an example, after the target control is clicked, the text content in the target control is searched and the search result is displayed. The search can be performed within the client, or a search engine interface can be invoked for a whole-network search.
Optionally, in the present application, when the target control is generated, a function needs to be given to it, so that the corresponding function can be executed once the control is triggered. The function assigned to the target control may be determined by the type of the client. For example, if the client is a search engine, the target control can be given a search function; if the client is translation software, a translation function. If a client has multiple functions, one of them can be selected and assigned to the target control. Of course, several functions can also be selected, with one target control assigned to each; the function of each target control needs to be displayed.
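A minimal sketch of deriving control functions from the client type; the type names and function sets here are assumptions for illustration, not part of the patent.

```kotlin
// Hypothetical catalogue of functions a control can be given.
enum class Function { SEARCH, TRANSLATE, SHARE, EXPLAIN }

// Map a client type to the function(s) its target controls are assigned.
fun functionsFor(clientType: String): List<Function> = when (clientType) {
    "search_engine" -> listOf(Function.SEARCH)
    "translator"    -> listOf(Function.TRANSLATE)
    // a client with several functions: either pick one, or create one control per function
    "chat"          -> listOf(Function.SHARE, Function.SEARCH)
    else            -> listOf(Function.EXPLAIN)
}

fun main() {
    println(functionsFor("translator"))  // [TRANSLATE]
    println(functionsFor("chat"))        // one control per listed function
}
```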
In the present application, a target neural network model may be used to convert the voice content into text content. The target neural network model is obtained by training an original neural network model with sample speech; after voice content is input, the model outputs the corresponding text content.
In the present application, sample speech can be obtained and input into the original neural network model to train it. The loss of the model is calculated to determine whether to adjust its weights and parameters; when the recognition accuracy of the model exceeds a first threshold, for example 99%, the trained model is taken as the target neural network model and made available to the user.
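A minimal sketch of this training criterion, with the model and its update step as stand-ins rather than a real speech recognizer:

```kotlin
// Hypothetical interface; a real model would wrap an actual speech recognizer.
interface SpeechModel {
    fun accuracy(samples: List<Pair<ByteArray, String>>): Double  // fraction recognized correctly
    fun adjustWeights()  // one optimization step driven by the computed loss
}

// Keep adjusting weights and parameters until recognition accuracy on the
// sample speech exceeds the first threshold (e.g. 99%); the result is the
// target neural network model handed to the user.
fun train(
    model: SpeechModel,
    samples: List<Pair<ByteArray, String>>,
    threshold: Double = 0.99
): SpeechModel {
    while (model.accuracy(samples) <= threshold) {
        model.adjustWeights()
    }
    return model
}
```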
In the above process, the speech is converted into text and segmented into words automatically. The application also provides a way to segment words according to the user's wishes. Unlike the above, after the target video is obtained and its voice content converted into text content, the text content may be displayed first; when the user then clicks on the text content, the content selected by the user is generated as a target control, with the selected text displayed in it. Alternatively, the user may segment the text content, and the segmentation result is generated as target controls. In this way, target controls are generated in a targeted manner, and target functions are created and executed for exactly the content the user is interested in.
The present application is explained below with a specific example. Suppose the application runs in a client with a chat function, and a user receives a friend's message that includes a video message, displayed as shown in fig. 4 but not yet played. The user can enable a feature that converts the sound in the video into subtitles. If it is enabled, the voice content is converted into text content while the video plays, and the text content is segmented into words; the segmentation result is then displayed in the form of target controls, each bound to its corresponding function. When the user clicks a target control, the corresponding function is executed. Alternatively, with the feature enabled, the video converts the voice content into text content during playback and displays it; the user segments the text or selects a word of interest, and the system converts the selected word into a target control and displays it. When the user clicks the target control, the corresponding function is executed. When the target control is displayed, it can replace the original text content. The result may be that the target control is displayed during video playback, as shown in fig. 5.
In the above process, the target control is displayed during video playback. The target control can also be displayed while the video is not playing. When the target video is shown in the client's display interface, the user can long-press the target video and select the speech-to-text function, after which the text content is displayed below the target video. When the text content is displayed, the target controls generated after word segmentation can be shown directly; alternatively, the text content itself is shown, the user segments it or selects a word of interest, and the system then generates and displays the target controls. The displayed target control replaces the original subtitle or text content. When the user clicks a target control, its function is executed, such as searching for the word in the target control, forwarding it, or translating it, and the result is displayed.
The client may be a receiving end, with the target video sent by a sending end and forwarded to the receiving end through a server. The receiving end obtains the unique identification code (VID) of the target video and sends the VID to the server; the server retrieves the video data by VID, performs speech-to-text processing on the video, and sends the text to the receiving end. After receiving the text data, the receiving end refreshes its display front end to show the text. The user can long-press the text on the receiving end and select word segmentation in the pop-up menu, and a word-segmentation module segments the text; the word-segmentation module may also be deployed on the server, in which case the server performs the segmentation. After segmentation succeeds, a control is generated for each word, and the receiving end displays each control. If a control is clicked, its word is used as input to invoke the corresponding information-association function in the application, such as retrieval, translation, forwarding, or paraphrasing. A sketch of this exchange follows.
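A minimal sketch of the VID exchange described above; the request/response types and the in-memory video store are hypothetical, and a real deployment would use its own RPC or HTTP layer.

```kotlin
// Hypothetical message types for the receiving-end/server exchange.
data class TranscriptRequest(val vid: String)
data class TranscriptResponse(val vid: String, val text: String)

// The server retrieves the video data by VID and runs speech-to-text on it.
class TranscriptServer(
    private val videoStore: Map<String, ByteArray>,  // stand-in for stored video data
    private val speechToText: (ByteArray) -> String  // stand-in recognizer
) {
    fun handle(request: TranscriptRequest): TranscriptResponse {
        val audio = videoStore.getValue(request.vid)
        return TranscriptResponse(request.vid, speechToText(audio))
    }
}

fun main() {
    val server = TranscriptServer(mapOf("VID-42" to ByteArray(0))) { _ -> "hello world" }
    val reply = server.handle(TranscriptRequest("VID-42"))
    println("receiving end refreshes its display with: ${reply.text}")
}
```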
Alternatively, the receiving end sends the server a request to convert the video's sound into subtitles, carrying the video's unique identification code VID. The server looks up the video data it stores by VID, performs speech-to-text processing on the video, adds a time axis, and packs the result into a subtitle file (text information). The server transmits the subtitle file corresponding to the video to the client of the receiving end. The client loads the subtitle file when the video is played and displays the subtitles in the video. When the receiving-end user clicks a subtitle, video playback is paused and the subtitle changes into a pop-up subtitle word-segmentation control. After the user clicks this control, the subtitle sentence is passed to the word-segmentation module, which segments the text information and returns the data to the client. The client refreshes the display, showing the segmentation result at the subtitle's original position, with each segmented word generating a clickable control. When the receiving-end user clicks the control corresponding to a word, that word is used as input to invoke the corresponding information-association function in the application.
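A minimal sketch of this click-to-segment interaction, with the player and the word-segmentation module as stand-ins:

```kotlin
// Hypothetical timed subtitle entry from the server's subtitle file.
data class Caption(val startMs: Long, val endMs: Long, val sentence: String)

class CaptionInteraction(private val segment: (String) -> List<String>) {
    var paused = false
        private set

    // Clicking a caption pauses playback and turns the sentence into
    // per-word clickable controls at the caption's original position.
    fun onCaptionClicked(caption: Caption): List<String> {
        paused = true
        return segment(caption.sentence)
    }
}

fun main() {
    val interaction = CaptionInteraction { s -> s.split(" ") }  // stand-in segmenter
    val words = interaction.onCaptionClicked(Caption(0, 2_000, "weather in Shenzhen"))
    println("paused=${interaction.paused}, word controls=$words")
}
```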
As an optional embodiment, the displaying the text content in a target control included in the display interface includes:
under the condition that the target video is not played, displaying the target control in an area except the target video in the display interface, and displaying the text content in the target control;
and under the condition that the target video is playing, displaying the target control in the target video, and displaying the text content in the target control.
Optionally, the target video may be played, and the target control is displayed at the original subtitle position during playing. Or when the target video is not played, the target control is displayed below the target video, so that the flexibility of displaying the target control is improved.
As an optional embodiment, the displaying the target control in the target video and the text content in the target control in the case that the target video is playing comprises:
replacing the subtitle content in the target video with the target control when the subtitle content is contained in the target video;
and displaying the target control in the time period for displaying the subtitle content.
In this way, repeated display of subtitles is avoided, and the target control is displayed more accurately.
As an optional embodiment, the displaying the target control in the target video and the text content in the target control in the case that the target video is playing comprises:
under the condition that the target video does not include subtitle content, acquiring a starting time point and an ending time point of target voice content in the target video, wherein the target voice content is a section of content in the voice content;
starting to display the target control at the starting time point, and displaying the text content corresponding to the target voice content in the target control;
and ending the display of the target control at the ending time point, and canceling the display of the text content corresponding to the target voice content.
That is, when the target video does not include subtitles, text content converted from the voice content can be displayed along with the target video. The start time point and end time point of the target voice content can be determined, so that the corresponding text content is displayed between those two points, achieving the goal of playing the text content along with the voice content.
As an optional embodiment, the displaying the text content in a target control included in the display interface includes:
displaying a plurality of the target controls in the display interface;
and displaying one word of the text content in each target control.
That is to say, in the application, word segmentation can be performed on the text information, and then a word segmentation result is displayed by each target control of the plurality of target controls, so that the effect of improving the efficiency of displaying the target controls is achieved.
As an optional embodiment, after the target function corresponding to the target control is executed in response to the triggering instruction executed on the target control, the method further includes:
and displaying a target result obtained after the target function is executed, wherein the target result is obtained after the target function is executed on the text content in the target control.
Optionally, the target result may be displayed by jumping to another page, or directly on the current page. This embodiment improves the flexibility of processing the video.
As an optional embodiment, before executing the target function corresponding to the target control in response to the triggering instruction executed on the target control, the method further includes:
acquiring the type of the client;
determining a plurality of functions of the client matching the type;
determining one function from the plurality of functions as the target function.
That is, in the present application, one function may be selected from a plurality of functions of the client to process the text information converted from the video, thereby improving flexibility of processing the video.
As an optional implementation, the converting the voice content in the target video into the text content in response to the received conversion instruction includes:
and inputting the voice content into a target neural network model, wherein the target neural network model is obtained by training an original neural network model by using sample voice, and the target neural network model is used for outputting the text content corresponding to the voice content after the voice content is input.
In the present application, the voice content is recognized by the target neural network model and converted into text content, so that the conversion from voice to text is automatic, accurate, and efficient.
As an optional embodiment, in response to the triggering instruction executed on the target control, executing the target function corresponding to the target control includes:
searching the text content in the target control under the condition that the target function is a searching function;
under the condition that the target function is a sharing function, sharing the text content in the target control;
under the condition that the target function is a translation function, translating the text content in the target control;
in the case where the target function is an interpretation function, the meaning of the text content in the target control is displayed.
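A minimal sketch of this four-way dispatch; the handlers are placeholders for the client's real search, share, translate, and explain features.

```kotlin
// Hypothetical enumeration of the four target functions named above.
enum class TargetFunction { SEARCH, SHARE, TRANSLATE, EXPLAIN }

// Execute the target function on the text content in the clicked control.
fun execute(function: TargetFunction, textContent: String) = when (function) {
    TargetFunction.SEARCH    -> println("searching: $textContent")
    TargetFunction.SHARE     -> println("sharing: $textContent")
    TargetFunction.TRANSLATE -> println("translating: $textContent")
    TargetFunction.EXPLAIN   -> println("showing meaning of: $textContent")
}

fun main() = execute(TargetFunction.TRANSLATE, "weather forecast")
```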
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
According to another aspect of the embodiments of the present invention, there is also provided a video processing apparatus for implementing the above-described video processing method. As shown in fig. 10, the apparatus includes:
a first display unit 1002, configured to display a target video in a display interface on a client;
a conversion unit 1004, configured to, in response to the received conversion instruction, convert the voice content in the target video into text content;
a second display unit 1006, configured to display the text content in a target control included in the display interface;
an executing unit 1008, configured to respond to a trigger instruction executed on the target control, and execute a target function corresponding to the target control, where the target function of the target control is determined according to the type of the client.
Alternatively, the video processing apparatus can be applied, but is not limited, to any client that can receive and display a message. The applicable client types, the ways of displaying the target video and the target controls, the speech-to-text conversion, and the word segmentation are the same as those described above for the video processing method and are not repeated here.
As an alternative embodiment, the second display unit includes:
the first display module is used for displaying the target control in an area except the target video in the display interface under the condition that the target video is not played, and displaying the text content in the target control;
and the second display module is used for displaying the target control in the target video and displaying the text content in the target control under the condition that the target video is playing.
Optionally, the target video may be played, and the target control is displayed at the original subtitle position during playing. Or when the target video is not played, the target control is displayed below the target video, so that the flexibility of displaying the target control is improved.
As an alternative embodiment, the second display module comprises:
the replacing sub-module is used for replacing the subtitle content in the target video with the target control under the condition that the subtitle content is contained in the target video;
and the first display sub-module is used for displaying the target control in a time period for displaying the subtitle content.
In this way, repeated display of subtitles is avoided, and the target control is displayed more accurately.
As an alternative embodiment, the second display module comprises:
the obtaining sub-module is used for obtaining a starting time point and an ending time point of target voice content in the target video under the condition that the target video does not include subtitle content, wherein the target voice content is a segment of content in the voice content;
and the second display sub-module is used for starting to display the target control at the starting time point, displaying the text content corresponding to the target voice content in the target control, finishing displaying the target control at the finishing time point and canceling displaying the text content corresponding to the target voice content.
That is, in the case where the target video does not include subtitles in the present application, text content converted from the voice content can be displayed along with the target video. The starting time point and the ending time point of the target voice content can be determined, so that the text content corresponding to the target voice content is displayed in the starting time point and the ending time point, and the aim of playing the text content along with the voice content is fulfilled.
As an alternative embodiment, the second display unit includes:
the third display module is used for displaying a plurality of target controls in the display interface;
and the fourth display module is used for displaying one word of the text content in each target control.
That is to say, in the application, word segmentation can be performed on the text information, and then a word segmentation result is displayed by each target control of the plurality of target controls, so that the effect of improving the efficiency of displaying the target controls is achieved.
As an alternative embodiment, the apparatus further comprises:
and a third display unit, configured to display a target result obtained after the target function is executed after the target function corresponding to the target control is executed in response to the trigger instruction executed on the target control, where the target result is a result obtained after the target function is executed on the text content in the target control.
Optionally, the target result may be displayed by jumping to another page, or directly on the current page. This embodiment improves the flexibility of processing the video.
As an alternative embodiment, the apparatus further comprises:
an obtaining unit, configured to obtain a type of the client before the target function corresponding to the target control is executed in response to the trigger instruction executed on the target control;
a first determining unit configured to determine a plurality of functions of the client matching the type;
a second determining unit configured to determine one function from the plurality of functions as the target function.
That is, in the present application, one function may be selected from a plurality of functions of the client to process the text information converted from the video, thereby improving flexibility of processing the video.
As an alternative embodiment, the conversion unit comprises:
the input module is used for inputting the voice content into a target neural network model, wherein the target neural network model is obtained by training an original neural network model by using sample voice, and the target neural network model is used for outputting text content corresponding to the voice content after the voice content is input;
and the acquisition module is used for acquiring the text content output by the target neural network model.
In the present application, the voice content is recognized by the target neural network model and converted into text content, so that the conversion from voice to text is automatic, accurate, and efficient.
As an alternative embodiment, the execution unit includes:
the processing module is used for searching the text content in the target control under the condition that the target function is a search function, sharing the text content in the target control under the condition that the target function is a sharing function, translating the text content in the target control under the condition that the target function is a translation function, and displaying the meaning of the text content in the target control under the condition that the target function is an interpretation function.
According to yet another aspect of the embodiments of the present invention, there is also provided an electronic device for implementing the video processing method, as shown in fig. 11, the electronic device includes a memory 1102 and a processor 1104, the memory 1102 stores therein a computer program, and the processor 1104 is configured to execute the steps in any one of the method embodiments by the computer program.
Optionally, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps through the computer program (an end-to-end sketch of these steps is given after the list):
displaying a target video in a display interface on a client;
responding to the received conversion instruction, and converting the voice content in the target video into text content;
displaying the text content in a target control contained in the display interface;
and responding to a trigger instruction executed on the target control, and executing a target function corresponding to the target control, wherein the target function of the target control is determined according to the type of the client.
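Reusing the helpers sketched in the preceding embodiments (all of them hypothetical), the four steps can be tied together as follows; the print call is a stand-in for the client's real display logic.

```python
# End-to-end sketch of the four processor steps, built on the helper
# functions sketched earlier: speech_to_text, determine_target_function,
# and execute_target_function.
def handle_conversion_instruction(video_audio, client_type, model, id_to_char):
    text = speech_to_text(video_audio, model, id_to_char)  # convert voice to text
    print(f"[target control] {text}")                      # display in target control
    target_fn = determine_target_function(client_type)     # chosen per client type
    return execute_target_function(target_fn, text)        # run on trigger
```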
Alternatively, it can be understood by those skilled in the art that the structure shown in fig. 11 is only an illustration, and the electronic device may also be a terminal device such as a smart phone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 11 does not limit the structure of the electronic device. For example, the electronic device may further include more or fewer components (such as network interfaces) than those shown in fig. 11, or have a configuration different from that shown in fig. 11.
The memory 1102 may be used to store software programs and modules, such as the program instructions/modules corresponding to the video processing method and apparatus in the embodiments of the present invention, and the processor 1104 executes various functional applications and data processing by running the software programs and modules stored in the memory 1102, thereby implementing the video processing method described above. The memory 1102 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memories, or other non-volatile solid-state memories. In some examples, the memory 1102 may further include memory located remotely from the processor 1104, and such remote memory may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1102 may be used for storing information such as the target video and the target control, but is not limited thereto. As an example, as shown in fig. 11, the memory 1102 may include, but is not limited to, the first display unit 1002, the conversion unit 1004, the second display unit 1006, and the execution unit 1008 of the video processing apparatus. In addition, the memory may further include, but is not limited to, other module units of the video processing apparatus, which are not described again in this example.
Optionally, a transmission device 1106 is used to receive or send data via a network. Examples of the network may include wired networks and wireless networks. In one example, the transmission device 1106 includes a network adapter (NIC) that can be connected to a router via a network cable so as to communicate with the internet or a local area network. In another example, the transmission device 1106 is a radio frequency (RF) module, which is used to communicate with the internet in a wireless manner.
In addition, the electronic device further includes: a display 1108 for displaying the target video and the target control; and a connection bus 1110 for connecting the respective module parts in the above-described electronic apparatus.
According to a further aspect of an embodiment of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the steps in any of the above-mentioned method embodiments when executed.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
displaying a target video in a display interface on a client;
responding to the received conversion instruction, and converting the voice content in the target video into text content;
displaying the text content in a target control contained in the display interface;
and responding to a trigger instruction executed on the target control, and executing a target function corresponding to the target control, wherein the target function of the target control is determined according to the type of the client.
Alternatively, in this embodiment, a person skilled in the art may understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware associated with the terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (15)

1. A video processing method, comprising:
displaying a target video in a display interface on a client;
responding to the received conversion instruction, and converting the voice content in the target video into text content;
displaying the text content in a target control contained in the display interface;
and responding to a trigger instruction executed on the target control, and executing a target function corresponding to the target control, wherein the target function of the target control is determined according to the type of the client.
2. The method of claim 1, wherein displaying the textual content in a target control included in the display interface comprises:
under the condition that the target video is not played, displaying the target control in an area except the target video in the display interface, and displaying the text content in the target control;
and under the condition that the target video is playing, displaying the target control in the target video, and displaying the text content in the target control.
3. The method of claim 2, wherein displaying the target control in the target video and the textual content in the target control while the target video is playing comprises:
replacing the subtitle content in the target video with the target control when the subtitle content is contained in the target video;
and displaying the target control in the time period for displaying the subtitle content.
4. The method of claim 2, wherein displaying the target control in the target video and the textual content in the target control while the target video is playing comprises:
under the condition that the target video does not include subtitle content, acquiring a starting time point and an ending time point of target voice content in the target video, wherein the target voice content is a section of content in the voice content;
starting to display the target control at the starting time point, and displaying the text content corresponding to the target voice content in the target control;
and ending the display of the target control at the ending time point, and canceling the display of the text content corresponding to the target voice content.
5. The method of claim 1, wherein displaying the textual content in a target control included in the display interface comprises:
displaying a plurality of the target controls in the display interface;
and displaying one word of the text content in each target control.
6. The method of claim 1, wherein after executing the target function corresponding to the target control in response to the triggering instruction executed on the target control, the method further comprises:
and displaying a target result obtained after the target function is executed, wherein the target result is obtained after the target function is executed on the text content in the target control.
7. The method of claim 1, wherein before executing the target function corresponding to the target control in response to the triggering instruction executed on the target control, the method further comprises:
acquiring the type of the client;
determining a plurality of functions of the client matching the type;
determining one function from the plurality of functions as the target function.
8. The method of claim 1, wherein the converting the voice content in the target video to text content in response to the received conversion instruction comprises:
and inputting the voice content into a target neural network model, wherein the target neural network model is obtained by training an original neural network model by using sample voice, and the target neural network model is used for outputting the text content corresponding to the voice content after the voice content is input.
9. The method according to any one of claims 1 to 8, wherein the executing the target function corresponding to the target control in response to the triggering instruction executed on the target control comprises:
searching the text content in the target control under the condition that the target function is a search function;
sharing the text content in the target control under the condition that the target function is a sharing function;
translating the text content in the target control under the condition that the target function is a translation function;
and displaying the meaning of the text content in the target control under the condition that the target function is an explanation function.
10. A video processing apparatus, comprising:
the first display unit is used for displaying the target video in a display interface on the client;
the conversion unit is used for responding to the received conversion instruction and converting the voice content in the target video into text content;
the second display unit is used for displaying the text content in a target control contained in the display interface;
and the execution unit is used for responding to a trigger instruction executed on the target control and executing a target function corresponding to the target control, wherein the target function of the target control is determined according to the type of the client.
11. The apparatus of claim 10, wherein the second display unit comprises:
the first display module is used for displaying the target control in an area except the target video in the display interface under the condition that the target video is not played, and displaying the text content in the target control;
and the second display module is used for displaying the target control in the target video and displaying the text content in the target control under the condition that the target video is playing.
12. The apparatus of claim 11, wherein the second display module comprises:
the replacing sub-module is used for replacing the subtitle content in the target video with the target control under the condition that the subtitle content is contained in the target video;
and the first display sub-module is used for displaying the target control in a time period for displaying the subtitle content.
13. The apparatus of claim 11, wherein the second display module comprises:
the obtaining sub-module is used for obtaining a starting time point and an ending time point of target voice content in the target video under the condition that the target video does not include subtitle content, wherein the target voice content is a segment of content in the voice content;
and the second display sub-module is used for starting to display the target control at the starting time point, displaying the text content corresponding to the target voice content in the target control, finishing displaying the target control at the finishing time point and canceling displaying the text content corresponding to the target voice content.
14. A computer-readable storage medium, wherein the storage medium stores a computer program, and the computer program, when executed by a processor, implements the method of any one of claims 1 to 9.
15. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program which, when executed by the processor, implements the method of any of claims 1 to 9.
CN202010693888.7A 2020-07-17 2020-07-17 Video processing method and device, storage medium and electronic device Active CN112752134B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010693888.7A CN112752134B (en) 2020-07-17 2020-07-17 Video processing method and device, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010693888.7A CN112752134B (en) 2020-07-17 2020-07-17 Video processing method and device, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN112752134A true CN112752134A (en) 2021-05-04
CN112752134B CN112752134B (en) 2023-09-22

Family

ID=75645252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010693888.7A Active CN112752134B (en) 2020-07-17 2020-07-17 Video processing method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN112752134B (en)

Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1052828A2 (en) * 1999-05-12 2000-11-15 Seecops Co. Ltd. System and method for providing multimedia information over a network
WO2013122909A1 (en) * 2012-02-13 2013-08-22 Ortsbo, Inc. Real time closed captioning language translation
US20130325466A1 (en) * 2012-05-10 2013-12-05 Clickberry, Inc. System and method for controlling interactive video using voice
US20140161356A1 (en) * 2012-12-10 2014-06-12 Rawllin International Inc. Multimedia message from text based images including emoticons and acronyms
US20160239571A1 (en) * 2015-02-13 2016-08-18 Google Inc. Systems and methods for content selection based on search query correlation with broadcast media
US20170171622A1 (en) * 2015-12-15 2017-06-15 Le Holdings (Beijing) Co., Ltd. Methods and content systems, servers, terminals and communication systems
US20180227534A1 (en) * 2016-01-20 2018-08-09 Tencent Technology (Shenzhen) Company Limited Method and apparatus for presenting information, and computer storage medium
WO2018188589A1 (en) * 2017-04-11 2018-10-18 腾讯科技(深圳)有限公司 Media information playback method and apparatus, storage medium and electronic apparatus
WO2019047850A1 (en) * 2017-09-07 2019-03-14 腾讯科技(深圳)有限公司 Identifier displaying method and device, request responding method and device
CN110708589A (en) * 2017-11-30 2020-01-17 腾讯科技(深圳)有限公司 Information sharing method and device, storage medium and electronic device
CN108334540A (en) * 2017-12-15 2018-07-27 深圳市腾讯计算机系统有限公司 Methods of exhibiting and device, storage medium, the electronic device of media information
CN109543102A (en) * 2018-11-12 2019-03-29 百度在线网络技术(北京)有限公司 Information recommendation method, device and storage medium based on video playing
CN110149549A (en) * 2019-02-26 2019-08-20 腾讯科技(深圳)有限公司 The display methods and device of information
CN110225387A (en) * 2019-05-20 2019-09-10 北京奇艺世纪科技有限公司 A kind of information search method, device and electronic equipment
CN110460872A (en) * 2019-09-05 2019-11-15 腾讯科技(深圳)有限公司 Information display method, device, equipment and the storage medium of net cast
CN110650378A (en) * 2019-09-27 2020-01-03 北京奇艺世纪科技有限公司 Information acquisition method, device, terminal and storage medium
CN110781347A (en) * 2019-10-23 2020-02-11 腾讯科技(深圳)有限公司 Video processing method, device, equipment and readable storage medium
CN111415665A (en) * 2020-04-07 2020-07-14 浙江国贸云商控股有限公司 Voice processing method and device for video call and electronic equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115567473A (en) * 2021-06-30 2023-01-03 北京有竹居网络技术有限公司 Data processing method, device, server, client, medium and product
CN114143591A (en) * 2021-11-26 2022-03-04 网易(杭州)网络有限公司 Subtitle display method, device, terminal and machine-readable storage medium

Also Published As

Publication number Publication date
CN112752134B (en) 2023-09-22

Similar Documents

Publication Publication Date Title
CN110149549B (en) Information display method and device
CN103915095B (en) The method of speech recognition, interactive device, server and system
CN108536414B (en) Voice processing method, device and system and mobile terminal
WO2018077214A1 (en) Information search method and apparatus
CN110414404A (en) Image processing method, device and storage medium based on instant messaging
CN109829064B (en) Media resource sharing and playing method and device, storage medium and electronic device
CN109429522A (en) Voice interactive method, apparatus and system
CN109036416B (en) Simultaneous interpretation method and system, storage medium and electronic device
US20140344707A1 (en) Information Distribution Method and Device
CN107071554B (en) Method for recognizing semantics and device
CN113014854B (en) Method, device, equipment and medium for generating interactive record
CN104618806A (en) Method, device and system for acquiring comment information of video
CN108304368B (en) Text information type identification method and device, storage medium and processor
CN111444415B (en) Barrage processing method, server, client, electronic equipment and storage medium
CN104598502A (en) Method, device and system for obtaining background music information in played video
US20120221656A1 (en) Tracking message topics in an interactive messaging environment
CN112929253B (en) Virtual image interaction method and device
CN107547922B (en) Information processing method, device, system and computer readable storage medium
CN104317804A (en) Voting information publishing method and device
CN112053692B (en) Speech recognition processing method, device and storage medium
CN112752134B (en) Video processing method and device, storage medium and electronic device
CN108574878B (en) Data interaction method and device
CN114064943A (en) Conference management method, conference management device, storage medium and electronic equipment
KR101351264B1 (en) System and method for message translation based on voice recognition
JP7071514B2 (en) Audio information processing methods, devices, storage media and electronic devices

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40044196; country of ref document: HK)
SE01 Entry into force of request for substantive examination
GR01 Patent grant