CN111209437A - Label processing method and device, storage medium and electronic equipment - Google Patents

Label processing method and device, storage medium and electronic equipment

Info

Publication number
CN111209437A
CN111209437A (application CN202010030363.5A)
Authority
CN
China
Prior art keywords
video
label
tag
voice input
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010030363.5A
Other languages
Chinese (zh)
Other versions
CN111209437B (en)
Inventor
杨广煜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010030363.5A priority Critical patent/CN111209437B/en
Publication of CN111209437A publication Critical patent/CN111209437A/en
Application granted granted Critical
Publication of CN111209437B publication Critical patent/CN111209437B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 - Information retrieval of video data
    • G06F 16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/7867 - Retrieval using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings

Abstract

The embodiment of the application discloses a tag processing method, a tag processing apparatus, a storage medium and an electronic device. The method comprises the following steps: displaying a video playing page corresponding to a target video in a video playing client, wherein the video playing page comprises a voice input control for triggering generation of a video tag and a tag viewing control for viewing the video tag; when a voice input operation for the voice input control is detected, generating a video tag corresponding to the current video time, the video tag comprising the current video time and video frame image information corresponding to the current video time; and when a tag viewing operation for the tag viewing control is detected, displaying a video tag list comprising at least one generated video tag arranged in a predetermined order. According to the scheme, video tags can be generated from the user's operations during video playing, which improves the flexibility of tag processing.

Description

Label processing method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a tag processing method, an apparatus, a storage medium, and an electronic device.
Background
An electronic bookmark is a mark added at the point where a reader stops reading an electronic book. With such a bookmark, the next time the reader opens the electronic book, the position where reading was interrupted can be found quickly and conveniently, and reading can continue from there. In the prior art, a reader can manually add an electronic bookmark at a desired position, or a reading application can add one automatically at the position where reading stops. However, in the prior art, electronic bookmarks are only applied in the reading field, so the application range of electronic bookmark generation is narrow.
Disclosure of Invention
The embodiment of the application provides a label processing method, a label processing device, a storage medium and electronic equipment, which can improve the flexibility of label processing.
An embodiment of the present application provides a tag processing method, including:
displaying a video playing page corresponding to a target video in a video playing client, wherein the video playing page comprises a voice input control for triggering generation of a video label and a label viewing control for viewing the video label;
when a voice input operation aiming at the voice input control is detected, generating a video label corresponding to the current video time, wherein the video label comprises: the current video time and video frame image information corresponding to the current video time;
when a label viewing operation aiming at the label viewing control is detected, a video label list is displayed, wherein the video label list comprises at least one generated video label arranged according to a preset sequence.
Correspondingly, an embodiment of the present application further provides a tag processing apparatus, including:
the display module is used for displaying a video playing page corresponding to a target video in a video playing client, and the video playing page comprises a voice input control for triggering generation of a video label and a label viewing control for viewing the video label;
a generating module, configured to generate a video tag corresponding to a current video time when a voice input operation for the voice input control is detected, where the video tag includes: the current video time and video frame image information corresponding to the current video time;
and the display module is used for displaying a video label list when the label viewing operation aiming at the label viewing control is detected, wherein the video label list comprises at least one generated video label arranged according to a preset sequence.
Optionally, in some embodiments, the generating module may include an obtaining sub-module and a generating sub-module, as follows:
the acquisition submodule is used for acquiring video frame image information corresponding to the current video time, a video unit to be marked corresponding to the current video time and currently recorded voice information when the voice input operation aiming at the voice input control is detected;
and the generation submodule is used for generating a video label corresponding to the video unit to be marked based on the voice information and the video frame image information.
Optionally, in some embodiments, the obtaining sub-module may include a first obtaining sub-module and a second obtaining sub-module, as follows:
the first obtaining sub-module is used for obtaining current video time, video frame image information corresponding to the current video time and current voice information recorded aiming at the voice input control when voice input operation aiming at the voice input control is detected;
and the second obtaining submodule is used for obtaining the video unit to be marked corresponding to the current video time from the target video based on the voice information.
At this time, the second obtaining sub-module may be specifically configured to, when it is detected that the voice information includes first tag type information, obtain a current video frame image corresponding to the current video time from the target video, and determine the current video frame image as the video unit to be marked.
At this time, the second obtaining sub-module may be specifically configured to, when it is detected that the voice information includes second tag type information, determine an operation start time point and an operation end time point corresponding to the voice input operation, obtain a current video segment from the target video based on the operation start time point and the operation end time point, and determine the current video segment as the video unit to be marked.
At this time, the generating sub-module may be specifically configured to, when it is detected that the voice information includes tag content voice information, convert the tag content voice information into tag content text information, and generate a video tag corresponding to the video unit to be tagged based on the tag content text information and the video frame image information.
At this time, the obtaining sub-module may be specifically configured to, when a voice input operation for the voice input control is detected, obtain the current video time and video frame image information corresponding to the current video time, close the audio in the target video, play the target video with the audio closed, and obtain the voice information recorded for the voice input control at present.
Optionally, in some embodiments, the tag processing apparatus may further include a first obtaining module and an arranging module, as follows:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring video tags corresponding to a plurality of videos in a video set, and the video set comprises videos corresponding to a plurality of hierarchies;
and the arranging module is used for arranging the video tags based on the hierarchy of the video set and the current video time corresponding to the video tags to obtain a video tag list.
Optionally, in some embodiments, the tag processing apparatus may further include a determining module, a second obtaining module, and a skipping module, as follows:
the determining module is used for determining target video time corresponding to a target video tag and a video to be played corresponding to the target video tag when a skip playing operation aiming at the target video tag in the video tag list is detected;
the second obtaining module is used for obtaining a video clip to be played from the video to be played based on the target video time;
and the skipping module is used for skipping and playing the video clip to be played.
At this time, the obtaining sub-module may be specifically configured to, when a voice input operation for the voice input control is detected, detect the login situation of the user with respect to the video playing client to obtain user login state information, and, when the user login state information indicates that the user has logged in to the video playing client, obtain the video frame image information corresponding to the current video time, the video unit to be marked corresponding to the current video time, and the currently recorded voice information.
In addition, a computer storage medium is provided in an embodiment of the present application, where a plurality of instructions are stored in the computer storage medium, and the instructions are suitable for being loaded by a processor to perform steps in any one of the tag processing methods provided in the embodiment of the present application.
In addition, an electronic device is further provided in an embodiment of the present application, and includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the steps in any one of the tag processing methods provided in the embodiment of the present application.
The embodiment of the application can display a video playing page corresponding to a target video in a video playing client, the video playing page comprising a voice input control for triggering generation of a video tag and a tag viewing control for viewing the video tag. When a voice input operation for the voice input control is detected, a video tag corresponding to the current video time is generated, the video tag comprising the current video time and video frame image information corresponding to the current video time. When a tag viewing operation for the tag viewing control is detected, a video tag list is displayed, the list comprising at least one generated video tag arranged in a predetermined order. According to the scheme, a video tag can be generated from the user's operations during video playing, which improves the flexibility of tag processing.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic view of a scenario of a tag processing system provided in an embodiment of the present application;
fig. 2 is a first flowchart of a tag processing method provided in an embodiment of the present application;
fig. 3 is a second flowchart of a label processing method provided by an embodiment of the present application;
fig. 4 is a third flowchart of a label processing method provided in an embodiment of the present application;
fig. 5 is a schematic diagram of a first video tag list provided by an embodiment of the present application;
FIG. 6 is a schematic view of video tag processing provided by an embodiment of the present application;
fig. 7 is a schematic diagram for determining a user login situation according to an embodiment of the present application;
fig. 8 is a schematic diagram of a second video tag list provided by an embodiment of the present application;
fig. 9 is a schematic diagram of a third video tag list provided in an embodiment of the present application;
FIG. 10 is a schematic diagram of a video playback page provided in an embodiment of the present application;
fig. 11 is a schematic structural diagram of a label processing apparatus provided in an embodiment of the present application;
fig. 12 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
Referring to the drawings, wherein like reference numbers refer to like elements, the principles of the present application are illustrated as being implemented in a suitable computing environment. The following description is based on illustrated embodiments of the application and should not be taken as limiting the application with respect to other embodiments that are not detailed herein.
In the description that follows, specific embodiments of the present application will be described with reference to steps and symbols executed by one or more computers, unless otherwise indicated. Accordingly, these steps and operations will be referred to, several times, as being performed by a computer, the computer performing operations involving a processing unit of the computer in electronic signals representing data in a structured form. This operation transforms the data or maintains it at locations in the computer's memory system, which may be reconfigured or otherwise altered in a manner well known to those skilled in the art. The data maintains a data structure that is a physical location of the memory that has particular characteristics defined by the data format. However, while the principles of the application have been described in language specific to above, it is not intended to be limited to the specific form set forth herein, and it will be recognized by those of ordinary skill in the art that various of the steps and operations described below may be implemented in hardware.
The term "module" as used herein may be considered a software object executing on the computing system. The different components, modules, engines, and services described herein may be considered as implementation objects on the computing system. The apparatus and method described herein may be implemented in software, but may also be implemented in hardware, and are within the scope of the present application.
The terms "first", "second", and "third", etc. in this application are used to distinguish between different objects and not to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules is not limited to only those steps or modules listed, but rather, some embodiments may include other steps or modules not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The embodiment of the application provides a tag processing method, a tag processing apparatus, a storage medium and an electronic device. Specifically, the embodiment of the application provides a tag processing method suitable for an electronic device. The electronic device may be a terminal such as a mobile phone, a tablet computer, a notebook computer, a personal computer, a smart TV or a set-top box; the electronic device may also be a server, which may be a single server or a server cluster composed of a plurality of servers.
For example, the tag processing means may be integrated in a terminal or a server.
In the embodiment of the present application, the tag processing method may be executed by a terminal or a server alone, or may be executed by both the terminal and the server.
Referring to fig. 1, for example, the electronic device may be configured to display a video play page corresponding to a target video in a video play client, where the video play page includes a voice input control for triggering generation of a video tag and a tag viewing control for viewing the video tag, and when a voice input operation for the voice input control is detected, a video tag corresponding to a current video time is generated, where the video tag includes the current video time and video frame image information corresponding to the current video time, and when a tag viewing operation for the tag viewing control is detected, a video tag list is displayed, where the video tag list includes at least one generated video tag arranged in a predetermined order.
In another embodiment, a video playing page corresponding to the target video may be displayed on a video playing client installed in the terminal, and when a voice input operation of the user for the voice input control is detected, the currently recorded voice information is acquired and sent to the server. The server can generate a video tag corresponding to the current video time according to the received voice information, and returns the video tag to the video playing client.
It is understood that, in another embodiment, the steps in the tag processing method may also be executed by a terminal, and the tag processing apparatus may be integrated in the terminal in the form of a video playing client, and the video playing client may perform professional video editing and the like.
The following are detailed below. It should be noted that the following description of the embodiments is not intended to limit the preferred order of the embodiments.
The embodiments of the present application will be described from the perspective of a tag processing apparatus, which may be specifically integrated in a terminal or a server.
In the tag processing method provided in the embodiment of the present application, the method may be executed by a processor of a terminal, as shown in fig. 2, a specific flow of the tag processing method may be as follows:
201. and displaying a video playing page corresponding to the target video in the video playing client.
The video playing client is a client application capable of providing video playing services for users; it can be installed on the terminal and operate in cooperation with the server side. The user can log in to the video playing client with a user account, so that the client can record data such as the user's video playing history and viewing preferences. The video playing client is not limited to a client whose main function is playing video; it may be any client that includes a video playing function, such as a video client or a browser client.
For example, when a user is watching video 1 through the video playing client, video 1 may be determined as the target video. The embodiment of the application may not limit video content, video format, and the like of the target video.
For example, as shown in fig. 10, the video playing page may play a target video so that the user can view the target video, and the video playing page may further include a voice input control for triggering generation of a video tag and a tag viewing control for viewing the video tag.
The video tag is an electronic bookmark that guides the user to a specific video frame or video segment in a video. For example, if the user adds a video tag while watching episode 2 of TV series 1 at the 20 minutes 30 seconds mark, the user can later use that tag to jump directly to the segment of episode 2 starting at 20 minutes 30 seconds.
The user may also add tag content to the video tag. The tag content may be a summary of the currently playing video content, a subtitle added to the currently playing video, the user's viewing impressions, and so on. For example, suppose the user is watching episode 2 of TV series 1: when playback reaches 20 minutes 30 seconds, scenario 1 occurs and the user adds a video tag with the tag content "scenario 1"; when playback reaches 30 minutes 30 seconds, scenario 2 occurs and the user adds another video tag with the tag content "scenario 2". When the user later views the video tag list, the tag content "scenario 1" for 20:30 of episode 2 and the tag content "scenario 2" for 30:30 of episode 2 are displayed, so the user can see at a glance which scenarios have occurred in the video and when each occurred.
The voice input control can be a control which is positioned in a video playing page and guides a user to input voice, and the video tag is generated according to the voice information input by the user in the embodiment of the application, so that the voice input control can also be used as a control for triggering the video tag processing request. For example, as shown in fig. 10, the voice input control may be in the form of a button, and the user may trigger the video tag processing request and record the voice information by pressing the voice input control in the form of the button for a long time. The voice input control can be in various forms, for example, the voice input control can be in the form of an input box, a button, an icon, and the like.
The label viewing control can be a control which is positioned in the video playing page and guides the user to view the video label. For example, as shown in fig. 10, the label viewing control may be in the form of a button, and a user may click the label viewing control in the form of the button to trigger a request for displaying a video label list, where the video label list may be displayed on the terminal interface. The form of the label viewing control may be various, for example, the label viewing control may be in the form of an input box, a button, an icon, and the like.
In practical applications, for example, as shown in fig. 10, the user 1 may log in a video playing client by using the account 1, and view the target video based on the video playing client, at this time, the target video may be played in a video playing page of the video playing client. The video playing page can further comprise a voice input control used for triggering generation of a video label and a label viewing control used for viewing the video label.
In an embodiment, the video tag may refer to a certain moment in the target video, or to a certain time period in the target video. For example, a video tag may refer not only to the 20 minutes 30 seconds mark of episode 3 of TV series 1, but also to the 20th through 30th minutes of episode 3 of TV series 1; in the latter case, the tag can be used to jump to and play the video segment covering the 20th to 30th minutes of episode 3.
202. And when the voice input operation aiming at the voice input control is detected, generating a video label corresponding to the current video time.
The voice input operation may be an operation of a user inputting voice for the voice input control, for example, the user may press the voice input control in the form of a button for a long time and speak during the long time pressing of the button, at this time, the spoken utterance of the user may be recorded and stored, and this operation of pressing the button for a long time by the user may be determined as the voice input operation for the voice input control. For another example, the user may press a voice input control in the form of a button for a long time, and play audio from the playing device during the long-time pressing of the button, at this time, the played audio may be recorded and stored, and this operation of pressing the button for a long time by the user may also be determined as a voice input operation for the voice input control.
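The following is a minimal sketch of how such a long-press voice input control might be wired up in a web-based player, assuming a browser environment with the standard MediaRecorder API; the class name, callback, and event choices are illustrative assumptions, not part of the patent.

```typescript
// Hypothetical long-press voice input control (browser APIs).
// Recording starts when the button is pressed and stops on release,
// reporting the captured audio plus the press start/end times.
class VoiceInputControl {
  private recorder?: MediaRecorder;
  private chunks: Blob[] = [];
  private pressStart = 0;

  constructor(
    button: HTMLElement,
    private onVoiceInput: (audio: Blob, pressStart: number, pressEnd: number) => void,
  ) {
    button.addEventListener("pointerdown", () => void this.start());
    button.addEventListener("pointerup", () => this.stop());
  }

  private async start(): Promise<void> {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    this.chunks = [];
    this.pressStart = Date.now();
    this.recorder = new MediaRecorder(stream);
    this.recorder.ondataavailable = (e) => this.chunks.push(e.data);
    this.recorder.start();
  }

  private stop(): void {
    if (!this.recorder) return;
    this.recorder.onstop = () => {
      const audio = new Blob(this.chunks, { type: "audio/webm" });
      this.onVoiceInput(audio, this.pressStart, Date.now());
    };
    this.recorder.stop();
  }
}
```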
The current video time and the video frame image information corresponding to the current video time are needed to construct the video tag. For example, if the video tag is constructed while the user is watching episode 3 of season 2 of TV series 1 at the 20 minutes 30 seconds mark, the current video time corresponding to the video tag is 20 minutes 30 seconds of episode 3 of season 2 of TV series 1.
In that case, the video frame image information may include the video title of episode 3 of season 2 of TV series 1, the unique video identifier of that episode, the position of 20 minutes 30 seconds within the episode, a timestamp, and the like.
The current video time is the playback position of the target video in the video playing page at the moment the user performs the voice input operation. For example, if episode 3 of season 2 of TV series 1 is playing in the video playing page, and the user long-presses the button-style voice input control to input voice information when playback reaches 20 minutes 30 seconds, then 20 minutes 30 seconds of episode 3 of season 2 of TV series 1 is determined as the current video time.
In practical applications, for example, when playback in the video playing page reaches 20 minutes 30 seconds of episode 3 of season 2 of TV series 1 and the user is detected long-pressing the button-style voice input control and saying "bookmark", a video tag corresponding to 20 minutes 30 seconds of episode 3 of season 2 of TV series 1 can be generated.
In an embodiment, after the video tag is generated, the user may be notified through a message prompt box that the tag was created successfully; for example, the box may read "tag created successfully". Alternatively, a voice prompt may be played after the video tag is generated to inform the user that the tag was created successfully.
In an embodiment, after the video tag is generated, it may also be shared. For example, after user 1 shares the video tag for the 20th minute of video 1 with user 2, the video playing client of user 2 may notify user 2 that user 1 has shared the video tag corresponding to the 20th minute of video 1.
In an embodiment, since the video tag needs to be constructed according to the input voice information, the current video time, and the video frame image information, the voice information, the current video time, and the video frame image information also need to be acquired. Specifically, the step "generating a video tag corresponding to the current video time when the voice input operation for the voice input control is detected" may include:
when voice input operation aiming at the voice input control is detected, video frame image information corresponding to the current video time and currently recorded voice information are obtained;
and generating a video label corresponding to the current video time based on the voice information and the video frame image information.
In practical applications, for example, when playback in the video playing page reaches 20 minutes 30 seconds of episode 3 of season 2 of TV series 1, a long press on the button-style voice input control is detected, which triggers a video tag generation request; the user may then say "bookmark" into the voice input control. According to the video tag generation request, the terminal may determine 20 minutes 30 seconds of episode 3 of season 2 of TV series 1 as the current video time and obtain the video frame image information corresponding to that time, which may include one or more of the video title of the episode, its unique video identifier, the current video time of 20 minutes 30 seconds, and a timestamp. The terminal also obtains the voice information "bookmark" recorded for the voice input control, and then generates a video tag corresponding to 20 minutes 30 seconds of episode 3 of season 2 of TV series 1 from the voice information, the current video time and the video frame image information.
In an embodiment, after the video tag corresponding to the current video time in the target video is generated from the voice information, the current video time and the video frame image information, the video tag together with the voice information, the current video time, the video frame image information and other data may be stored in a database. The data structure of the video tag may include: the video ID of the target video, the video name of the target video, the video subtitle corresponding to the target video, the link of the target video image, the current video time, the total duration of the target video, the content information of the video tag, and the like.
The video subtitle may be the episode label of the target video. For example, if the target video is episode 3 of season 2 of TV series 1, the video subtitle may be "Episode 3".
For example, if the video tag corresponds to 20 minutes 30 seconds of episode 3 of season 2 of TV series 1, the target video image may be the video frame image at 20 minutes 30 seconds of that episode. As another example, if the video tag corresponds to the 20th through 30th minutes of episode 3 of season 2 of TV series 1, the target video image may be a poster image for season 2 of TV series 1, and so on.
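A sketch of the stored tag record implied by the data structure above; the field names, and the optional segment-end field for short-video-type tags, are assumptions for illustration.

```typescript
// Hypothetical shape of a stored video tag, following the fields listed above.
interface VideoTag {
  videoId: string;          // video ID of the target video
  videoName: string;        // video name, e.g. "TV series 1"
  videoSubtitle: string;    // episode label, e.g. "Episode 3"
  imageUrl: string;         // link to the target video image (frame or poster)
  currentVideoTime: number; // marked position, in seconds (e.g. 20 * 60 + 30)
  totalDuration: number;    // total duration of the target video, in seconds
  tagContent?: string;      // optional tag content text, e.g. "scenario 1"
  segmentEndTime?: number;  // present only for short-video-type tags
}
```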
In an embodiment, since the video tag belongs to a single user, it is necessary to determine whether the user has logged in the video playing client when creating the video tag. Specifically, the step "acquiring video frame image information corresponding to the current video time and currently recorded voice information when a voice input operation for the voice input control is detected" may include:
when the voice input operation aiming at the voice input control is detected, detecting the login condition of the user aiming at the video playing client to obtain user login state information;
and when the user login state information determines that the user has logged in the video playing client, acquiring video frame image information corresponding to the current video time and currently recorded voice information.
In practical applications, for example, when playback in the video playing page reaches 20 minutes 30 seconds of episode 3 of season 2 of TV series 1, a long press on the button-style voice input control is detected, which triggers a video tag generation request. As shown in fig. 7, the terminal may check the user's login situation according to the video tag generation request. If the user has already logged in to the video playing client, the conditions for constructing a video tag are satisfied, and the steps of acquiring the video frame image information and the voice information can be performed. If the user has not logged in to the video playing client, the conditions for constructing a video tag are not satisfied; a login request can therefore be sent to the user, the user can log in to the video playing client with a user account according to the login request, and the video tag construction steps then proceed.
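A minimal sketch of this login gate, assuming hypothetical helper functions (getLoginState, promptLogin, collectTagInputs) that are not named by the patent:

```typescript
// Hypothetical login check before tag construction, per fig. 7.
interface LoginState { loggedIn: boolean; userId?: string; }

async function onTagRequest(
  getLoginState: () => Promise<LoginState>,  // queries user login state information
  promptLogin: () => void,                   // sends a login request to the user
  collectTagInputs: () => Promise<void>,     // gathers frame info + recorded voice
): Promise<void> {
  const state = await getLoginState();
  if (!state.loggedIn) {
    promptLogin();   // tag construction conditions not met until login completes
    return;
  }
  await collectTagInputs();
}
```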
In an embodiment, since the video tag in the embodiment of the present application may not only be specific to a certain moment in the target video, but also be specific to a certain time period in the target video, the types of the video tag may be distinguished according to the recorded voice information. Specifically, the step "acquiring video frame image information corresponding to the current video time and currently recorded voice information when a voice input operation for the voice input control is detected" may include:
when voice input operation aiming at the voice input control is detected, acquiring current video time and video frame image information corresponding to the current video time;
acquiring the current voice information recorded aiming at the voice input control;
acquiring a video unit to be marked corresponding to the current video time from the target video based on the voice information;
the step of generating the video tag corresponding to the current video time based on the voice information and the video frame image information includes:
and generating a video label corresponding to the video unit to be marked based on the voice information and the video frame image information.
The video unit to be marked is the video unit that the video tag marks; it may be a single video frame in the target video or a video clip in the target video. For example, the video unit to be marked may be the video frame at 20 minutes 30 seconds of episode 3 of TV series 1, or the video clip covering the 20th through 22nd minutes of episode 3 of TV series 1.
In practical applications, for example, when playback in the video playing page reaches 20 minutes 30 seconds of episode 3 of season 2 of TV series 1, a long press on the button-style voice input control is detected, which triggers a video tag generation request; the user may then say "bookmark" into the voice input control. According to the video tag generation request, the terminal may determine 20 minutes 30 seconds of episode 3 of season 2 of TV series 1 as the current video time and obtain the corresponding video frame image information. The terminal also obtains the voice information "bookmark" recorded for the voice input control, then takes the video frame at 20 minutes 30 seconds of episode 3 of season 2 of TV series 1 as the video unit to be marked, and generates a video tag for that unit from the voice information, the current video time and the video frame image information.
In an embodiment, the video tag may correspond to a certain time in a target video, and specifically, the step "acquiring a video unit to be marked corresponding to the current video time from the target video based on the voice information" may include:
when the voice information is detected to comprise first label type information, acquiring a current video frame image corresponding to the current video time from the target video;
and determining the current video frame image as a video unit to be marked.
The first tag type information may represent that a video tag to be constructed is a bookmark type, the video tag of the bookmark type corresponds to a certain moment in the target video, and a video unit to be marked corresponding to the video tag of the bookmark type is a frame of video frame in the target video. For example, in the embodiment of the present application, the first tag type information may be preset as a "bookmark", and therefore, when it is detected that the voice information includes the "bookmark", it may be determined that a video tag of a bookmark type needs to be constructed at this time.
In practical applications, for example, the first tag type information may be preset as "bookmark". When playback in the video playing page reaches 20 minutes 30 seconds of episode 3 of season 2 of TV series 1, the user is detected long-pressing the button-style voice input control and saying "bookmark". Since the voice information input by the user includes "bookmark", it can be determined that the voice information includes the first tag type information; the current video frame image at 20 minutes 30 seconds can therefore be obtained from episode 3 of season 2 of TV series 1 and determined as the video unit to be marked.
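A sketch of this first-tag-type branch; the keyword mirrors the "bookmark" example above, and the function and result shapes are illustrative assumptions.

```typescript
// Hypothetical check for the first tag type ("bookmark"): the unit to mark
// is the single video frame at the current video time.
const FIRST_TAG_TYPE = "bookmark";

function bookmarkUnit(
  transcript: string,
  currentVideoTime: number,
): { kind: "frame"; time: number } | null {
  if (!transcript.includes(FIRST_TAG_TYPE)) return null;
  return { kind: "frame", time: currentVideoTime }; // one frame of the target video
}
```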
In an embodiment, the video tag may further correspond to a certain time period in a target video, and specifically, the step "acquiring a video unit to be marked corresponding to the current video time from the target video based on the voice information" may include:
when the voice information is detected to comprise second label type information, determining an operation starting time point and an operation ending time point corresponding to the voice input operation;
acquiring a current video clip from the target video based on the operation starting time point and the operation ending time point;
and determining the current video clip as a video unit to be marked.
The second tag type information may indicate that the video tag to be constructed is of the short-video type. A short-video-type tag corresponds to a time period in the target video, and its video unit to be marked is a video segment of the target video. For example, in the embodiment of the application the second tag type information may be preset as "record short video"; when the voice information is detected to include "record short video", it can be determined that a short-video-type tag needs to be constructed.
In practical applications, for example, the second tag type information may be preset as "record short video". When playback in the video playing page reaches 20 minutes 30 seconds of episode 3 of season 2 of TV series 1, the user is detected long-pressing the button-style voice input control and saying "record short video", and the user releases the control at 22 minutes 30 seconds. Since the voice information input by the user includes "record short video", it can be determined that the voice information includes the second tag type information; 20 minutes 30 seconds of episode 3 of season 2 of TV series 1 is determined as the operation start time point of the voice input operation, and 22 minutes 30 seconds as the operation end time point. The current video segment from 20 minutes 30 seconds to 22 minutes 30 seconds can then be obtained from episode 3 of season 2 of TV series 1 and determined as the video unit to be marked.
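The second-tag-type branch, sketched under the same assumptions as above: the press start and end points (as video-time positions) bound the segment.

```typescript
// Hypothetical check for the second tag type ("record short video"): the unit
// to mark is the segment bounded by the operation start and end time points.
const SECOND_TAG_TYPE = "record short video";

function shortVideoUnit(
  transcript: string,
  opStart: number, // video time when the long press began, in seconds
  opEnd: number,   // video time when the long press ended, in seconds
): { kind: "segment"; start: number; end: number } | null {
  if (!transcript.includes(SECOND_TAG_TYPE)) return null;
  return { kind: "segment", start: opStart, end: opEnd };
}
```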
In an embodiment, the first tag type information and the second tag type information are not limited to "bookmark" and "record short video", and information contents of the first tag type information and the second tag type information may be adjusted according to a practical application requirement, as long as a type of a video tag that needs to be currently constructed can be distinguished according to the first tag type information and the second tag type information.
In an embodiment, the user can add content information in the video tags, so that the user can intuitively know the information which the user wants to pay attention to according to the content information when viewing the video tag list. Specifically, the step "generating a video tag corresponding to the current video time based on the voice information and the video frame image information" may include:
when detecting that the voice information comprises the tag content voice information, converting the tag content voice information into tag content text information;
and generating a video label corresponding to the current video time based on the label content text information and the video frame image information.
The tag content voice information may be voice information containing a summary of the currently playing video content, a subtitle added to the currently playing video, the user's viewing impressions, and so on. For example, when playback in the video playing page reaches 20 minutes 30 seconds of episode 3 of season 2 of TV series 1 and the user is detected long-pressing the button-style voice input control and saying "bookmark scenario 1", "scenario 1" may be determined as the tag content voice information, indicating that content related to scenario 1 plays at 20 minutes 30 seconds of episode 3 of season 2 of TV series 1.
In practical applications, for example, when playback in the video playing page reaches 20 minutes 30 seconds of episode 3 of season 2 of TV series 1 and the user is detected long-pressing the button-style voice input control and saying "bookmark scenario 1", "scenario 1" may be determined as the tag content voice information. That voice information can then be converted into the tag content text information "scenario 1", and a video tag corresponding to 20 minutes 30 seconds of episode 3 of season 2 of TV series 1 is generated from the tag content text information and the acquired video frame image information, the video tag including the tag content "scenario 1".
As another example, when playback in the video playing page reaches 20 minutes 30 seconds of episode 3 of season 2 of TV series 1, the user is detected long-pressing the button-style voice input control and saying "record short video scenario 1", and the user releases the control at 22 minutes 30 seconds. Here, "scenario 1" may be determined as the tag content voice information and converted into the tag content text information "scenario 1", and a video tag corresponding to the segment from 20 minutes 30 seconds to 22 minutes 30 seconds of episode 3 of season 2 of TV series 1 is generated from the tag content text information and the acquired video frame image information, the video tag including the tag content "scenario 1".
In an embodiment, for example, when playback in the video playing page reaches 20 minutes 30 seconds of episode 3 of season 2 of TV series 1, the user is detected long-pressing the button-style voice input control and saying "bookmark" followed by their viewing impressions. The spoken impressions may be determined as the tag content voice information, converted into tag content text information, and used together with the acquired video frame image information to generate a video tag corresponding to 20 minutes 30 seconds of episode 3 of season 2 of TV series 1, the video tag including the impressions as its tag content. In this way, notes and viewing impressions recorded by the user while watching the video are preserved, so that when browsing the video tag list the user can revisit earlier notes and impressions, much like notes made while reading a novel.
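A sketch of extracting the tag content from the recognized speech: run speech-to-text, strip the leading type keyword, and keep the remainder as the tag content text. The speechToText parameter stands in for any ASR service and, like the keyword list, is an assumption.

```typescript
// Hypothetical parsing of a voice instruction into tag type + tag content text.
async function parseVoiceInstruction(
  audio: Blob,
  speechToText: (audio: Blob) => Promise<string>, // assumed ASR service
): Promise<{ tagType: string; content: string } | null> {
  const transcript = await speechToText(audio); // e.g. "bookmark scenario 1"
  for (const keyword of ["record short video", "bookmark"]) {
    if (transcript.startsWith(keyword)) {
      // Everything after the keyword becomes the tag content text information.
      return { tagType: keyword, content: transcript.slice(keyword.length).trim() };
    }
  }
  return null; // no recognizable tag-type keyword in the voice information
}
```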
In an embodiment, because the voice information recorded by the user needs to be recognized and its content extracted, interference with the recorded voice information should be reduced as much as possible so that the captured voice is as clear as possible. Specifically, the step "acquiring video frame image information corresponding to the current video time and currently recorded voice information when a voice input operation for the voice input control is detected" may include:
when voice input operation aiming at the voice input control is detected, video frame image information corresponding to the current video time is obtained;
closing the audio in the target video, and playing the target video after the audio is closed;
and acquiring the currently recorded voice information.
In practical applications, for example, when playback in the video playing page reaches 20 minutes 30 seconds of episode 3 of TV series 1, a long press on the button-style voice input control is detected and the user says "bookmark". Because the target video keeps playing while the user is speaking, the audio in the target video can be muted for the duration of the long press, so that the target video continues playing silently; when the user releases the voice input control, the audio in the target video is restored and the target video continues playing with audio. In this way, the voice information can be captured with minimal interference, improving the accuracy of tag generation.
In an embodiment, in the process that the user presses the voice input control in the form of a long button, the volume of the audio in the target video may also be reduced, so that the target audio continues to be played in a low volume state, and when the user stops pressing the voice input control for a long time, the volume of the audio in the target video is restored, and the target video with the original audio volume continues to be played. Therefore, on one hand, the influence of the audio in the target video on the voice information can be reduced, and on the other hand, the user watching the video is not influenced.
In one embodiment, for example, when the user presses the voice input control in the form of a button for a long time, the playing of the target video may be paused at the same time, so that the user may record the voice information without interruption, and when the "bookmark" is detected to be included in the voice information, the target video may be continuously played when the user stops pressing the voice input control, so that the video may be automatically played without additional operation by the user.
For another example, when the user presses the voice input control in the form of a button for a long time, the playing of the target video may be paused at the same time, so that the user may record the voice information without interference, when the voice information is detected to include "record short video", the video may be automatically played, and when the user stops pressing the voice input control for a long time, the operation start time point and the operation end time point corresponding to the voice input operation are recorded. At this time, in the process that the user presses the voice input control for a long time, the target video with the volume turned down or eliminated can be played, so that the voice information recorded by the user is not influenced.
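The mute, volume-reduction, and pause variants described above can all be expressed as one small helper over the standard HTMLVideoElement API; this is a sketch, and the mode names are illustrative.

```typescript
// Hypothetical audio suppression while the voice input control is held.
// Returns a restore function to call when the long press ends.
function suppressAudioWhileRecording(
  video: HTMLVideoElement,
  mode: "mute" | "duck" | "pause",
): () => void {
  const wasMuted = video.muted;
  const previousVolume = video.volume;
  const wasPaused = video.paused;

  if (mode === "mute") video.muted = true;                          // silent playback
  else if (mode === "duck") video.volume = Math.min(previousVolume, 0.1); // low volume
  else video.pause();                                               // pause playback

  return () => {
    video.muted = wasMuted;
    video.volume = previousVolume;
    if (!wasPaused && video.paused) void video.play(); // resume if we paused it
  };
}
```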
In one embodiment, due to the uncertainty of the voice information, the voice information needs to be checked in several steps to improve the accuracy of tag generation. For example, as shown in fig. 6, when playback in the video playing page reaches 20 minutes 30 seconds of episode 3 of season 2 of TV series 1, a long press on the button-style voice input control is detected and the user says "bookmark". The client voice service may then upload an instruction to the server, the instruction including the voice information "bookmark", the current video time, and the video frame image information corresponding to the current video time. After receiving the instruction, the server can judge the validity of the voice information in the instruction, that is, whether the voice information is invalid because it is unrecognizable or corresponds to an unsupported instruction; if the voice information is detected to be invalid, an error can be reported back to the client.
If the voice information in the instruction is valid, the server can further judge whether the instruction is a tag generation instruction; if not, the other voice command corresponding to the instruction can be executed. If it is a tag generation instruction, the tag creation process runs and a video tag is generated. After the video tag is generated, the video tag, the current video time carried in the instruction, and the video frame image information corresponding to the current video time can be stored in the database corresponding to the user on the server. The video tag is then sent to the client, and the client database is updated with the video tag, the current video time and the video frame image information. After the tag is stored, the user can be informed through a message prompt box or a voice prompt that the video bookmark has been created.
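A server-side sketch of the flow in fig. 6. All helper names (isValid, isTagGeneration, createTag, saveTag) are assumptions injected as dependencies, not APIs named by the patent.

```typescript
// Hypothetical server-side handling of an uploaded voice instruction.
interface Instruction {
  userId: string;
  transcript: string;       // recognized voice information, e.g. "bookmark"
  currentVideoTime: number; // seconds
  frameInfo: unknown;       // video frame image information
}

async function handleInstruction(
  instr: Instruction,
  deps: {
    isValid: (t: string) => boolean;
    isTagGeneration: (t: string) => boolean;
    runOtherCommand: (i: Instruction) => Promise<string>;
    createTag: (i: Instruction) => object;
    saveTag: (userId: string, tag: object) => Promise<void>;
  },
): Promise<string> {
  if (!deps.isValid(instr.transcript)) {
    return "error: unrecognizable or unsupported voice instruction"; // report back
  }
  if (!deps.isTagGeneration(instr.transcript)) {
    return deps.runOtherCommand(instr);   // some other voice command
  }
  const tag = deps.createTag(instr);      // tag creation process
  await deps.saveTag(instr.userId, tag);  // persist tag + time + frame info
  return "ok: tag stored; client database can be updated";
}
```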
In an embodiment, after the video tag is generated, it can be uploaded to the cloud so that tags remain consistent across platforms. For example, user 1 logs in to video playing client 1 with user account 1 and adds a video tag at 20 minutes 30 seconds of episode 3 of TV series 1. Later, when the user logs in to video playing client 2 with the same user account 1, the video tag corresponding to 20 minutes 30 seconds of episode 3 of TV series 1 can also be found in video playing client 2.
203. And when a label viewing operation aiming at the label viewing control is detected, displaying the video label list.
The tag viewing operation may be an operation of viewing a tag for the tag viewing control by a user, for example, the user may click the tag viewing control in a button form, at this time, it may be determined that the user needs to view the video tag list, and the video tag list is displayed for the user to view.
Wherein the video tag list comprises at least one generated video tag arranged in a predetermined order. The video tag list may be for a particular TV series or movie. Within the list, video tags belonging to the same episode of a series are arranged in time order, and the episodes themselves are arranged by episode number. For example, a video tag list for TV series 1 may include two video tags for episode 1 and one video tag for episode 2, with the two tags in episode 1 arranged in the order of their video times.
In practical applications, for example, as shown in fig. 9, when the user is detected clicking the button-style tag viewing control in the video playing page, a video tag list may be displayed on the left side of the video playing page. The list includes two video tags for episode 1 and one video tag for episode 2 of TV series 1, with the two tags in episode 1 arranged in the order of their video times.
In an embodiment, video tags from multiple seasons of the same series may also be presented together in the video tag list. For example, as shown in fig. 5, the video tag list may include two video tags for episode 1 of TV series 1, one video tag for episode 2, and one video tag for episode 1 of season 2 of TV series 1, with the two tags in episode 1 arranged in the order of their video times.
In an embodiment, the video tag list may be arranged according to a hierarchy of the video set, and specifically, the tag processing method may further include:
acquiring video tags corresponding to a plurality of videos in a video set, wherein the video set comprises videos corresponding to a plurality of hierarchies;
and arranging the video tags based on the hierarchy of the video set and the current video time corresponding to the video tags to obtain a video tag list.
The video set may be a set formed by a plurality of videos belonging to a series, for example, a plurality of episodes in a tv episode may form a video set; also for example, a plurality of episodes in a television episode belonging to a plurality of seasons of a series may constitute a video set, and so on.
The hierarchy may be the standard by which the videos in the video set are divided. For example, if the video set includes episodes 1-6 of the first season of the television drama 1 and episodes 1-6 of its second season, the video set may first be divided, by the season hierarchy, into the videos of the first season and the videos of the second season. By the episode hierarchy, the videos of the first season can then be divided into the 6 videos of episodes 1-6, the videos of the second season likewise into the 6 videos of episodes 1-6, and so on.
In practical application, for example, as shown in fig. 5, the video set includes episodes 1-2 of the first season of the television drama 1 and episode 1 of its second season. The 4 video tags corresponding to the video set may be acquired; the video set is divided by the season hierarchy into the videos of the first season and the videos of the second season, and then by the episode hierarchy into the 2 videos of episodes 1-2 and the 1 video of episode 1 respectively, and so on. The seasons are then arranged by season number, the videos within each season by episode number, and the video tags of each video in time order, yielding the video tag list.
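The season -> episode -> time ordering described above can be illustrated with a short, assumed data model (the patent does not fix concrete field names):

    # A sketch of the hierarchy-based ordering described above. Each tag
    # carries its season, episode, and video time; sorting by that triple
    # yields the season -> episode -> time order of the video tag list.
    tags = [
        {"season": 1, "episode": 2, "time": 310,  "content": "scenario 2"},
        {"season": 2, "episode": 1, "time": 95,   "content": "scenario 3"},
        {"season": 1, "episode": 1, "time": 1230, "content": "scenario 1"},
        {"season": 1, "episode": 1, "time": 42,   "content": "opening clue"},
    ]

    video_tag_list = sorted(tags, key=lambda t: (t["season"], t["episode"], t["time"]))
    for t in video_tag_list:
        print(f"S{t['season']}E{t['episode']} @ {t['time']}s: {t['content']}")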
In an embodiment, for example, as shown in fig. 5, when it is detected that the user clicks the tag viewing control in the form of a button in the video playing page, a video tag list may also be displayed. The video tag list includes 2 video tags corresponding to the first episode of the television drama 1, 1 video tag corresponding to the second episode, and 1 video tag corresponding to the first episode of the second season of the television drama 1; the total duration of each episode's video is indicated after the episode, each video tag is followed by its video time and its tag content, and each video tag corresponds to one target video image.
In one embodiment, because a television series may be updated over a long time span, the user may no longer remember the content of earlier episodes when watching later ones, and reviewing them would otherwise require many fast-forward and rewind operations, which is cumbersome and inefficient. After the video tag list is displayed, the user can recall the content of earlier episodes from the tag contents of the video tags in the list, both to review the plot and to see the content the user cared about; this also allows more accurate content recommendation for the user. For example, for mystery and suspense videos, the user can trace the clues and the development of the events from the tag contents of the video tags in the list.
In an embodiment, the user may further perform video skip playing based on the video tag list, and specifically, the tag processing method may further include:
when a skip playing operation for a target video label in the video label list is detected, determining target video time corresponding to the target video label and a video to be played corresponding to the target video label;
acquiring a video clip to be played from the video to be played based on the target video time;
and skipping to play the video clip to be played.
The skip playing operation may be an operation by the user for skip playing based on the video tag list. For example, the user may click, in the video tag list, the video tag corresponding to the 20th minute 30th second of the third episode in the second season of the television drama 1; playback then jumps to the 20th minute 30th second of that episode.
In practical applications, for example, as shown in fig. 8, after the video tag list is presented, when it is detected that the user clicks a target video tag in the video tag list, it may be determined that the target video tag corresponds to the 20th minute 30th second of the third episode in the second season of the television drama 1. The video segment to be played is determined to be the segment of that episode starting from the 20th minute 30th second, and the video playing page then jumps to play the video segment to be played.
In an embodiment, since a video tag may also be for a time period in the target video, when the user clicks a target video tag for a time period in the video tag list, it may be determined that the target video tag corresponds to, for example, the video segment from the 20th minute 30th second to the 22nd minute 30th second of the third episode in the second season of the television drama 1. That segment may be determined as the video segment to be played, and the video playing page then jumps to play it.
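A sketch of the jump-playing logic under an assumed minimal player interface (Player, resolve_segment, and the tag fields are illustrative, not from the patent):

    class Player:
        # Minimal stand-in for a video player, for illustration only.
        def open(self, video_id): print("opening", video_id)
        def seek(self, t): print("seeking to", t, "s")
        def play_until(self, t): print("playing until", t, "s")

    def resolve_segment(tag, video_duration):
        # A moment tag plays from its time to the end of the video;
        # a time-period tag carries an explicit end time.
        start = tag["time"]
        end = tag.get("end_time", video_duration)
        return start, end

    def jump_play(player, tag, video_duration):
        start, end = resolve_segment(tag, video_duration)
        player.open(tag["video_id"])   # load the video the tag belongs to
        player.seek(start)             # jump to the target video time
        player.play_until(end)

    # e.g. a period tag for 20:30-22:30 of episode 3, season 2 of drama 1
    period_tag = {"video_id": "series1_s2e3", "time": 1230, "end_time": 1350}
    jump_play(Player(), period_tag, video_duration=45 * 60)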
In an embodiment, the tag processing method of the embodiments of the present application is not limited to generating video tags from voice; video tags may also be generated from gestures. For example, when the video playing page is playing the 20th minute 30th second of the third episode in the second season of the television drama 1 and the gesture captured by the terminal camera is a wave of the hand to the left, a video tag corresponding to that 20th minute 30th second can be generated.
There may be various preset gestures for generating a video tag from a gesture, and the embodiments of the present application do not unduly limit the user's gestures; for example, the preset gesture may be waving the hand to the left, waving the hand to the right, and the like.
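A sketch of gesture-triggered tag creation under an assumed gesture vocabulary (the patent does not specify a recognition model or concrete gesture labels):

    PRESET_GESTURES = {"wave_left", "wave_right"}   # assumed gesture labels

    def on_gesture(gesture, current_time, frame_info, create_tag):
        # create_tag stands in for whatever tag-creation routine the
        # client already uses for the voice-triggered path.
        if gesture in PRESET_GESTURES:
            return create_tag(time=current_time, frame_info=frame_info)
        return None   # unrecognized gestures do not produce tags

    tag = on_gesture("wave_left", 20 * 60 + 30,
                     {"title": "drama 1", "episode": "s2e3"},
                     lambda **kw: kw)
    print(tag)   # {'time': 1230, 'frame_info': {'title': 'drama 1', ...}}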
As can be seen from the above, in the embodiment of the present application, a video playing page corresponding to a target video in a video playing client may be displayed, where the video playing page includes a voice input control for triggering generation of a video tag and a tag viewing control for viewing the video tag; when a voice input operation for the voice input control is detected, a video tag corresponding to the current video time is generated, where the video tag includes the current video time and video frame image information corresponding to the current video time; and when a tag viewing operation for the tag viewing control is detected, a video tag list is displayed, where the video tag list includes at least one generated video tag arranged in a predetermined order. According to the scheme, a video tag can be generated according to the user's operation during video playing, improving the flexibility of tag processing. Because the user can add tag content to a video tag, when viewing the video tag list the user can clearly and intuitively recall, from the added tag content, the content the user cared about, the viewing experience, the development of the plot, and the like, which facilitates plot review. The user can also use the video tag list for skip playing: when the user's skip playing operation is detected, the video playing page can directly jump to and play the video segment the user cares about. Moreover, a video tag can target not only a moment in a video but also a time period in it, so the user can save a video segment with a video tag, add tag content to it, and jump to it quickly and conveniently through the video tag.
According to the method described in the foregoing embodiment, the following will be described in further detail by way of example in which the tag processing apparatus is specifically integrated in an electronic device.
Referring to fig. 3, a specific flow of the tag processing method according to the embodiment of the present application may be as follows:
301. The electronic device displays the video playing page corresponding to the second episode of the television drama 1 in the video playing client.
In practical applications, for example, as shown in fig. 10, the user 1 may log in to the video playing client with the account 1 and watch the second episode of the television drama 1 through the video playing client; the second episode of the television drama 1 is then played in the video playing page of the video playing client. The video playing page may further include a voice input control for triggering generation of a video tag and a tag viewing control for viewing the video tag.
302. When it is detected that the user long-presses the voice input button, the electronic device checks the user's login status.
In practical applications, for example, when the 20th minute 30th second of the second episode of the television drama 1 is being played in the video playing page and a long press of the voice input button by the user is detected, a video tag generation request is triggered. As shown in fig. 7, the electronic device may check the user's login status according to the video tag generation request; if it is detected that the user has already logged in to the video playing client, the conditions for constructing a video tag are satisfied, and the subsequent steps of the tag processing method may be performed.
If it is detected that the user has not logged in to the video playing client, the conditions for constructing a video tag are not met. A login request can therefore be sent to the user; the user logs in to the video playing client with a user account according to the login request, and the subsequent steps of the tag processing method are then performed.
303. When the user has logged in to the video playing client, the electronic device acquires the video frame image information corresponding to the current video time and the currently recorded voice information.
In practical applications, for example, when it is detected that the user has logged in to the video playing client, the 20th minute 30th second of the second episode of the television drama 1 may be determined as the current video time, and the video frame image information corresponding to the current video time may be acquired. The video frame image information may include one or more of the video title "drama 1" corresponding to the television drama 1, the sub-video title "second episode" corresponding to its second episode, the unique video identifier corresponding to that episode, and a timestamp. The voice information recorded by the user via the voice input button is also acquired.
After acquiring the video frame image information corresponding to the current video time and the currently recorded voice information, the client voice service may upload an instruction to the server, where the instruction may include the video frame image information corresponding to the current video time and the currently recorded voice information. After the server receives the instruction, it can judge the validity of the voice information in the instruction, that is, whether the voice information is invalid, for example unrecognizable or not a supported instruction; if the voice information is detected to be invalid, an error can be fed back to the client. If the voice information in the instruction is detected to be valid, whether the instruction is a tag generation instruction can then be judged; if it is not, the other voice instruction corresponding to the instruction can be executed. If the instruction is a tag generation instruction, the following video tag generation step can be performed.
In an embodiment, since the video in the video playing page continues playing while the user records voice information via the voice input button, the audio of the video can be muted while the user holds down the voice input button, so that the video continues playing silently; when the user releases the voice input button, the audio of the video is restored and the video continues playing with audio.
In an embodiment, while the user holds down the voice input button, the volume of the audio in the video may instead be lowered so that the audio continues playing at a low volume; when the user releases the voice input button, the volume of the audio is restored and the video continues playing at the original volume.
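The mute/duck-and-restore behaviour of these two embodiments can be sketched as follows (hypothetical player audio interface; volumes in [0.0, 1.0]):

    class PlaybackAudio:
        def __init__(self):
            self.volume = 1.0
            self._saved = 1.0

        def on_voice_button_down(self, duck_to=0.0):
            self._saved = self.volume
            self.volume = duck_to        # 0.0 mutes; e.g. 0.2 ducks to low volume

        def on_voice_button_up(self):
            self.volume = self._saved    # restore the original volume

    audio = PlaybackAudio()
    audio.on_voice_button_down(duck_to=0.2)   # recording starts: duck the audio
    audio.on_voice_button_up()                # recording ends: restore the audio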
304. When the voice information includes "bookmark", the electronic device acquires the video unit to be marked corresponding to the 20th minute 30th second from the second episode of the television drama 1.
In practical applications, for example, after the voice information recorded by the user is acquired, the voice information can be examined; when it is detected that the voice information includes "bookmark", it indicates that the user needs to construct a bookmark-type video tag. The electronic device may acquire the video frame corresponding to the 20th minute 30th second from the second episode of the television drama 1 and use that video frame as the video unit to be marked.
305. Based on the voice information and the video frame image information, the electronic device generates the video tag corresponding to the video unit to be marked.
In practical applications, for example, when it is detected that the voice information includes "bookmark", a bookmark-type video tag may be constructed; when it is detected that the voice information also includes "scenario 1", it indicates that the user wants to add the tag content "scenario 1" to the video tag, so a video tag containing the tag content "scenario 1" may be constructed, corresponding to the 20th minute 30th second of the second episode of the television drama 1.
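Deriving the tag type and tag content from the recognized voice text could be sketched as below; the parsing scheme is an assumption, as the patent only names the keywords "bookmark" and "record short video":

    def parse_voice_text(text):
        if text.startswith("bookmark"):
            kind, rest = "bookmark", text[len("bookmark"):]
        elif text.startswith("record short video"):
            kind, rest = "short_video", text[len("record short video"):]
        else:
            return None                   # not a tag-generation instruction
        content = rest.strip() or None    # remaining words become tag content
        return {"type": kind, "content": content}

    print(parse_voice_text("bookmark scenario 1"))
    # {'type': 'bookmark', 'content': 'scenario 1'}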
In an embodiment, for example, after the video tag is generated, the user may be prompted in a message prompt box that the video tag was created successfully; the prompt box may contain "tag created successfully". For another example, after the video tag is generated, a voice prompt may inform the user that the video tag was created successfully.
In an embodiment, after the video tag is generated, the video tag may also be shared. For example, after the user 1 shares the video tag for the 20th minute 30th second of the second episode of the television drama 1 with the user 2, the video playing client of the user 2 may prompt the user 2 that the user 1 has shared that video tag.
In an embodiment, after the video tag is generated, information such as the video tag, the associated voice information, the current video time, and the video frame image information may be stored in a database. The data structure of a video tag may include: the video ID, the video name, the video subtitle, a link to the target video image, the current video time, the total duration of the target video, the content information of the video tag, and the like.
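That data structure, rendered as an illustrative Python dataclass (field names are assumptions based on the contents listed above):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class VideoTag:
        video_id: str            # unique identifier of the target video
        video_name: str          # e.g. "drama 1"
        video_subtitle: str      # e.g. "second episode"
        image_url: str           # link to the target video image (thumbnail)
        video_time: float        # video time the tag points at, in seconds
        total_duration: float    # total duration of the target video, seconds
        content: Optional[str]   # user-supplied tag content, e.g. "scenario 1"
        end_time: Optional[float] = None   # set for time-period (short video) tags

    tag = VideoTag("series1_e2", "drama 1", "second episode",
                   "https://example.invalid/thumb.jpg", 1230.0, 2700.0, "scenario 1")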
In an embodiment, after the video tag is generated, it can also be uploaded to the cloud so that it stays consistent across multiple platforms. For example, the user 1 logs in to the video playing client 1 with the user account 1 and adds a video tag at the 20th minute 30th second of the third season of the television drama 1. Then, when the user logs in to the video playing client 2 with the user account 1, the video tag corresponding to the 20th minute 30th second of the third season of the television drama 1 can also be found in the video playing client 2.
306. When it is detected that the user clicks the tag viewing button, the electronic device displays the video tag list.
In practical applications, for example, as shown in fig. 5, when it is detected that the user clicks the tag viewing button in the video playing page, a video tag list may be displayed. The video tag list includes 2 video tags corresponding to the first episode of the television drama 1, 1 video tag corresponding to the second episode, and 1 video tag corresponding to the first episode of the second season of the television drama 1; the total duration of each episode's video is indicated after the episode, each video tag is followed by its video time and its tag content, and each video tag corresponds to one target video image.
In an embodiment, for example, as shown in fig. 8, when it is detected that the user clicks the tag viewing button in the video playing page, a video tag list may also be displayed on the left side of the video playing page. The video tag list includes 2 video tags corresponding to the first episode of the television drama 1, 1 video tag corresponding to the second episode, and 1 video tag corresponding to the first episode of the second season of the television drama 1; each video tag is followed by its video time and its tag content.
307. When it is detected that the user clicks the area corresponding to the target video tag in the video tag list, the electronic device jumps to play the video segment to be played.
In practical applications, for example, after the video tag list is presented, when it is detected that the user clicks the area corresponding to a target video tag in the video tag list, it may be determined that the target video tag corresponds to the 20th minute 30th second of the second episode of the television drama 1. The video segment to be played is determined to be the segment of that episode starting from the 20th minute 30th second, and the video playing page then jumps to play the video segment to be played.
As can be seen from the above, in the embodiment of the present application, the electronic device may display the video playing page corresponding to the second episode of the television drama 1 in the video playing client; when it is detected that the user long-presses the voice input button, check the user's login status; when the user has logged in to the video playing client, acquire the video frame image information corresponding to the current video time and the currently recorded voice information; when the voice information includes "bookmark", acquire the video unit to be marked corresponding to the 20th minute 30th second from the second episode of the television drama 1; generate the video tag corresponding to the video unit to be marked based on the voice information and the video frame image information; display the video tag list when it is detected that the user clicks the tag viewing button; and jump to play the video segment to be played when it is detected that the user clicks the area corresponding to the target video tag in the video tag list. According to the scheme, a video tag can be generated according to the user's operation during video playing, improving the flexibility of tag processing. Because the user can add tag content to a video tag, when viewing the video tag list the user can clearly and intuitively recall, from the added tag content, the content the user cared about, the viewing experience, the development of the plot, and the like, which facilitates plot review. The user can also use the video tag list for skip playing: when the user's skip playing operation is detected, the video playing page can directly jump to and play the video segment the user cares about. Moreover, a video tag can target not only a moment in a video but also a time period in it, so the user can save a video segment with a video tag, add tag content to it, and jump to it quickly and conveniently through the video tag.
According to the method described in the foregoing embodiment, the following will be described in further detail by way of example in which the tag processing apparatus is specifically integrated in an electronic device.
Referring to fig. 4, a specific flow of the tag processing method according to the embodiment of the present application may be as follows:
401. The electronic device displays the video playing page corresponding to the second episode of the television drama 1 in the video playing client.
In practical applications, the specific steps for displaying the video playing page have been described above and are not repeated here.
402. When it is detected that the user long-presses the voice input button, the electronic device checks the user's login status.
In practical applications, for example, when the video playing page plays to the 20th minute 30th second of the second episode of the television drama 1, a long press of the voice input button by the user is detected and voice information is recorded; when the long press stops at the 22nd minute 30th second, a video tag generation request is triggered. As shown in fig. 7, the electronic device may check the user's login status according to the video tag generation request; if it is detected that the user has already logged in to the video playing client, the conditions for constructing a video tag are satisfied, and the subsequent steps of the tag processing method may be performed.
403. When the user has logged in to the video playing client, the electronic device acquires the video frame image information corresponding to the current video time and the currently recorded voice information.
In practical applications, for example, when it is detected that the user has logged in to the video playing client, the 20th minute 30th second of the second episode of the television drama 1 may be determined as the current video time, and the video frame image information corresponding to the current video time may be acquired. The video frame image information may include one or more of the video title "drama 1" corresponding to the television drama 1, the sub-video title "second episode" corresponding to its second episode, the unique video identifier corresponding to that episode, and a timestamp. The voice information recorded by the user via the voice input button is also acquired.
After acquiring the video frame image information corresponding to the current video time and the currently recorded voice information, the client voice service may upload an instruction to the server, where the instruction may include the video frame image information corresponding to the current video time and the currently recorded voice information. After the server receives the instruction, it can judge the validity of the voice information in the instruction, that is, whether the voice information is invalid, for example unrecognizable or not a supported instruction; if the voice information is detected to be invalid, an error can be fed back to the client. If the voice information in the instruction is detected to be valid, whether the instruction is a tag generation instruction can then be judged; if it is not, the other voice instruction corresponding to the instruction can be executed. If the instruction is a tag generation instruction, the following video tag processing steps can be entered.
In an embodiment, since the video in the video playing page continues playing while the user records voice information via the voice input button, the audio of the video can be muted while the user holds down the voice input button, so that the video continues playing silently; when the user releases the voice input button, the audio of the video is restored and the video continues playing with audio.
In an embodiment, while the user holds down the voice input button, the volume of the audio in the video may instead be lowered so that the audio continues playing at a low volume; when the user releases the voice input button, the volume of the audio is restored and the video continues playing at the original volume.
404. When the voice information includes "record short video", the electronic device acquires the video unit to be marked corresponding to the 20th minute 30th second to the 22nd minute 30th second from the second episode of the television drama 1.
In practical applications, for example, after the voice information recorded by the user is acquired, the voice information can be examined; when it is detected that the voice information includes "record short video", it indicates that the user needs to construct a short-video-type video tag. The electronic device may acquire the video segment corresponding to the 20th minute 30th second to the 22nd minute 30th second from the second episode of the television drama 1 and use that video segment as the video unit to be marked.
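Determining the time-period unit from the press and release times of the voice input button can be sketched as follows (illustrative names; times in seconds):

    def segment_to_mark(press_time, release_time, video_duration):
        start = max(0.0, press_time)
        end = min(video_duration, release_time)
        if end <= start:
            raise ValueError("operation end must come after operation start")
        return start, end   # the video segment that becomes the unit to be marked

    # Button pressed at 20:30 and released at 22:30 of a 45-minute episode:
    print(segment_to_mark(20 * 60 + 30, 22 * 60 + 30, 45 * 60))   # (1230, 1350)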
405. Based on the voice information and the video frame image information, the electronic device generates the video tag corresponding to the video unit to be marked.
In practical applications, for example, when it is detected that the voice information includes "record short video", a short-video-type video tag may be constructed; when it is detected that the voice information also includes "scenario 1", it indicates that the user wants to add the tag content "scenario 1" to the video tag, so a video tag containing the tag content "scenario 1" may be constructed, corresponding to the 20th minute 30th second to the 22nd minute 30th second of the second episode of the television drama 1.
In an embodiment, for example, after the video tag is generated, the user may be prompted in a message prompt box that the video tag was created successfully; the prompt box may contain "tag created successfully". For another example, after the video tag is generated, a voice prompt may inform the user that the video tag was created successfully.
In an embodiment, after the video tag is generated, the video tag may also be shared. For example, after the user 1 shares the video tag for the 20th minute 30th second to the 22nd minute 30th second of the second episode of the television drama 1 with the user 2, the video playing client of the user 2 may prompt the user 2 that the user 1 has shared that video tag.
In an embodiment, after the video tag is generated, information such as the video tag, the associated voice information, the current video time, and the video frame image information may be stored in a database. The data structure of a video tag may include: the video ID, the video name, the video subtitle, a link to the target video image, the current video time, the total duration of the target video, the content information of the video tag, and the like.
In an embodiment, after the video tag is generated, it can also be uploaded to the cloud so that it stays consistent across multiple platforms. For example, the user 1 logs in to the video playing client 1 with the user account 1 and adds a video tag for the 20th minute 30th second to the 22nd minute 30th second in the third season of the television drama 1. Then, when the user logs in to the video playing client 2 with the user account 1, the video tag corresponding to the 20th minute 30th second to the 22nd minute 30th second in the third season of the television drama 1 can also be found in the video playing client 2.
406. When it is detected that the user clicks the tag viewing button, the electronic device displays the video tag list.
In practical applications, for example, as shown in fig. 5, when it is detected that the user clicks the tag viewing button in the video playing page, a video tag list may be displayed. The video tag list includes 2 video tags corresponding to the first episode of the television drama 1, 1 video tag corresponding to the second episode, and 1 video tag corresponding to the first episode of the second season of the television drama 1; the total duration of each episode's video is indicated after the episode, each video tag is followed by its video time and its tag content, and each video tag corresponds to one target video image.
In an embodiment, for example, as shown in fig. 8, when it is detected that the user clicks the tag viewing button in the video playing page, a video tag list may also be displayed on the left side of the video playing page. The video tag list includes 2 video tags corresponding to the first episode of the television drama 1, 1 video tag corresponding to the second episode, and 1 video tag corresponding to the first episode of the second season of the television drama 1; each video tag is followed by its video time and its tag content.
407. When it is detected that the user clicks the area corresponding to the target video tag in the video tag list, the electronic device jumps to play the video segment to be played.
In practical applications, for example, after the video tag list is presented, when it is detected that the user clicks the area corresponding to a target video tag in the video tag list, it may be determined that the target video tag corresponds to the 20th minute 30th second to the 22nd minute 30th second of the second episode of the television drama 1. The video segment to be played is determined to be that segment, and the video playing page then jumps to play the video segment to be played.
As can be seen from the above, in the embodiment of the present application, the electronic device may display the video playing page corresponding to the second episode of the television drama 1 in the video playing client; when it is detected that the user long-presses the voice input button, check the user's login status; when the user has logged in to the video playing client, acquire the video frame image information corresponding to the current video time and the currently recorded voice information; when the voice information includes "record short video", acquire the video unit to be marked corresponding to the 20th minute 30th second to the 22nd minute 30th second from the second episode of the television drama 1; generate the video tag corresponding to the video unit to be marked based on the voice information and the video frame image information; display the video tag list when it is detected that the user clicks the tag viewing button; and jump to play the video segment to be played when it is detected that the user clicks the area corresponding to the target video tag in the video tag list. According to the scheme, a video tag can be generated according to the user's operation during video playing, improving the flexibility of tag processing. Because the user can add tag content to a video tag, when viewing the video tag list the user can clearly and intuitively recall, from the added tag content, the content the user cared about, the viewing experience, the development of the plot, and the like, which facilitates plot review. The user can also use the video tag list for skip playing: when the user's skip playing operation is detected, the video playing page can directly jump to and play the video segment the user cares about. Moreover, a video tag can target not only a moment in a video but also a time period in it, so the user can save a video segment with a video tag, add tag content to it, and jump to it quickly and conveniently through the video tag.
In order to better implement the above method, an embodiment of the present application correspondingly further provides a tag processing apparatus, which may be integrated in an electronic device. Referring to fig. 11, the tag processing apparatus includes a display module 111, a generation module 112, and a presentation module 113, as follows:
the display module 111 is configured to display a video playing page corresponding to a target video in a video playing client, where the video playing page includes a voice input control for triggering generation of a video tag and a tag viewing control for viewing the video tag;
a generating module 112, configured to generate, when a voice input operation for the voice input control is detected, a video tag corresponding to a current video time, where the video tag includes: the current video time and video frame image information corresponding to the current video time;
a presentation module 113, configured to, when a tag viewing operation for the tag viewing control is detected, present a video tag list, where the video tag list includes at least one generated video tag arranged in a predetermined order.
In an embodiment, the generating module 112 may include an obtaining sub-module 1121 and a generating sub-module 1122 as follows:
the obtaining sub-module 1121 is configured to, when a voice input operation for the voice input control is detected, obtain video frame image information corresponding to a current video time, a video unit to be marked corresponding to the current video time, and currently recorded voice information;
the generating submodule 1122 is configured to generate a video tag corresponding to the video unit to be marked based on the voice information and the video frame image information.
In an embodiment, the obtaining sub-module 1121 may include a first obtaining sub-module 11211 and a second obtaining sub-module 11212, as follows:
a first obtaining sub-module 11211, configured to, when a voice input operation for the voice input control is detected, obtain a current video time, video frame image information corresponding to the current video time, and voice information recorded for the voice input control at present;
a second obtaining sub-module 11212, configured to obtain, from the target video, a video unit to be marked corresponding to the current video time based on the voice information.
In an embodiment, the second obtaining sub-module 11212 may be specifically configured to:
when the voice information is detected to comprise first label type information, acquiring a current video frame image corresponding to the current video time from the target video;
and determining the current video frame image as a video unit to be marked.
In an embodiment, the second obtaining sub-module 11212 may be specifically configured to:
when the voice information is detected to comprise second label type information, determining an operation starting time point and an operation ending time point corresponding to the voice input operation;
acquiring a current video clip from the target video based on the operation starting time point and the operation ending time point;
and determining the current video clip as a video unit to be marked.
In an embodiment, the generating sub-module 1122 may be specifically configured to:
when detecting that the voice information comprises the tag content voice information, converting the tag content voice information into tag content text information;
and generating a video label corresponding to the video unit to be marked based on the label content text information and the video frame image information.
In an embodiment, the obtaining sub-module 1121 may be specifically configured to:
when a voice input operation for the voice input control is detected, acquiring the current video time and the video frame image information corresponding to the current video time;
closing the audio in the target video, and playing the target video after the audio is closed;
and acquiring the current voice information recorded aiming at the voice input control.
In one embodiment, the tag processing apparatus may further include a first obtaining module 114 and an arranging module 115, as follows:
a first obtaining module 114, configured to obtain video tags corresponding to multiple videos in a video set, where the video set includes videos corresponding to multiple hierarchies;
the arranging module 115 is configured to arrange the video tags based on the hierarchy of the video set and the current video time corresponding to the video tags to obtain a video tag list.
In an embodiment, the tag processing apparatus may further include a determining module 116, a second obtaining module 117, and a skipping module 118, as follows:
a determining module 116, configured to determine, when a skip play operation for a target video tag in the video tag list is detected, a target video time corresponding to the target video tag and a video to be played corresponding to the target video tag;
a second obtaining module 117, configured to obtain a to-be-played video clip from the to-be-played video based on the target video time;
and the skipping module 118 is used for skipping to play the video clip to be played.
In an embodiment, the obtaining sub-module 1121 may be specifically configured to:
when the voice input operation for the voice input control is detected, detecting the login status of the user for the video playing client to obtain user login state information;
and when the user login state information indicates that the user has logged in to the video playing client, acquiring the video frame image information corresponding to the current video time, the video unit to be marked corresponding to the current video time, and the currently recorded voice information.
In specific implementations, the above units may be implemented as independent entities, or combined arbitrarily into one or several entities; for the specific implementation of the above units, reference may be made to the foregoing method embodiments, which are not repeated here.
As can be seen from the above, in the embodiment of the present application, the display module 111 may display a video playing page corresponding to a target video in a video playing client, where the video playing page includes a voice input control for triggering generation of a video tag and a tag viewing control for viewing the video tag; when a voice input operation for the voice input control is detected, the generation module 112 generates a video tag corresponding to the current video time, where the video tag includes the current video time and video frame image information corresponding to the current video time; and when a tag viewing operation for the tag viewing control is detected, the presentation module 113 displays a video tag list including at least one generated video tag arranged in a predetermined order. According to the scheme, a video tag can be generated according to the user's operation during video playing, improving the flexibility of tag processing. Because the user can add tag content to a video tag, when viewing the video tag list the user can clearly and intuitively recall, from the added tag content, the content the user cared about, the viewing experience, the development of the plot, and the like, which facilitates plot review. The user can also use the video tag list for skip playing: when the user's skip playing operation is detected, the video playing page can directly jump to and play the video segment the user cares about. Moreover, a video tag can target not only a moment in a video but also a time period in it, so the user can save a video segment with a video tag, add tag content to it, and jump to it quickly and conveniently through the video tag.
The embodiment of the application also provides electronic equipment, and the electronic equipment can integrate any one of the label processing devices provided by the embodiment of the application.
For example, as shown in fig. 12, a schematic structural diagram of an electronic device according to an embodiment of the present application is shown, specifically:
the electronic device may include components such as a processor 121 of one or more processing cores, memory 122 of one or more computer-readable storage media, a power supply 123, and an input unit 124. Those skilled in the art will appreciate that the electronic device configuration shown in fig. 12 does not constitute a limitation of the electronic device and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components. Wherein:
the processor 121 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by operating or executing software programs and/or modules stored in the memory 122 and calling data stored in the memory 122, thereby performing overall monitoring of the electronic device. Alternatively, processor 121 may include one or more processing cores; preferably, the processor 121 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 121.
The memory 122 may be used to store software programs and modules, and the processor 121 executes various functional applications and data processing by running the software programs and modules stored in the memory 122. The memory 122 may mainly include a program storage area and a data storage area: the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data created according to the use of the electronic device, and the like. Further, the memory 122 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 122 may also include a memory controller to provide the processor 121 with access to the memory 122.
The electronic device further comprises a power supply 123 for supplying power to the various components, and preferably, the power supply 123 may be logically connected to the processor 121 via a power management system, so that functions of managing charging, discharging, and power consumption are implemented via the power management system. The power supply 123 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The electronic device may also include an input unit 124, and the input unit 124 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the electronic device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 121 in the electronic device loads the executable file corresponding to the process of one or more application programs into the memory 122 according to the following instructions, and the processor 121 runs the application programs stored in the memory 122, so as to implement various functions as follows:
the method comprises the steps of displaying a video playing page corresponding to a target video in a video playing client, wherein the video playing page comprises a voice input control used for triggering generation of a video label and a label viewing control used for viewing the video label, when voice input operation aiming at the voice input control is detected, the video label corresponding to the current video time is generated, and the video label comprises: and when detecting a label viewing operation aiming at the label viewing control, displaying a video label list, wherein the video label list comprises at least one generated video label arranged according to a preset sequence.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
As can be seen from the above, in the embodiment of the present application, a video playing page corresponding to a target video in a video playing client may be displayed, where the video playing page includes a voice input control for triggering generation of a video tag and a tag viewing control for viewing the video tag; when a voice input operation for the voice input control is detected, a video tag corresponding to the current video time is generated, where the video tag includes the current video time and video frame image information corresponding to the current video time; and when a tag viewing operation for the tag viewing control is detected, a video tag list is displayed, where the video tag list includes at least one generated video tag arranged in a predetermined order. According to the scheme, a video tag can be generated according to the user's operation during video playing, improving the flexibility of tag processing. Because the user can add tag content to a video tag, when viewing the video tag list the user can clearly and intuitively recall, from the added tag content, the content the user cared about, the viewing experience, the development of the plot, and the like, which facilitates plot review. The user can also use the video tag list for skip playing: when the user's skip playing operation is detected, the video playing page can directly jump to and play the video segment the user cares about. Moreover, a video tag can target not only a moment in a video but also a time period in it, so the user can save a video segment with a video tag, add tag content to it, and jump to it quickly and conveniently through the video tag.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions, or by associated hardware controlled by instructions; the instructions may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present application provides a computer-readable storage medium, in which a plurality of instructions are stored; the instructions can be loaded by a processor to execute the steps of any one of the tag processing methods provided in the embodiments of the present application. For example, the instructions may perform the following steps:
the method comprises the steps of displaying a video playing page corresponding to a target video in a video playing client, wherein the video playing page comprises a voice input control used for triggering generation of a video label and a label viewing control used for viewing the video label, when voice input operation aiming at the voice input control is detected, the video label corresponding to the current video time is generated, and the video label comprises: and when detecting a label viewing operation aiming at the label viewing control, displaying a video label list, wherein the video label list comprises at least one generated video label arranged according to a preset sequence.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
Wherein the storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the storage medium can execute the steps in any of the tag processing methods provided in the embodiments of the present application, beneficial effects that can be achieved by any of the tag processing methods provided in the embodiments of the present application can be achieved, which are detailed in the foregoing embodiments and will not be described again here.
The foregoing detailed description is directed to a tag processing method, a tag processing apparatus, a storage medium, and an electronic device provided in the embodiments of the present application, and specific examples are applied in the present application to explain the principles and implementations of the present application, and the descriptions of the foregoing embodiments are only used to help understand the method and the core ideas of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (13)

1. A label processing method, comprising:
displaying a video playing page corresponding to a target video in a video playing client, wherein the video playing page comprises a voice input control for triggering generation of a video label and a label viewing control for viewing the video label;
when a voice input operation for the voice input control is detected, generating a video label corresponding to a current video time, wherein the video label comprises the current video time and video frame image information corresponding to the current video time; and
when a label viewing operation for the label viewing control is detected, displaying a video label list, wherein the video label list comprises at least one generated video label arranged in a predetermined order.
2. The label processing method according to claim 1, wherein generating the video label corresponding to the current video time when the voice input operation for the voice input control is detected comprises:
when the voice input operation for the voice input control is detected, acquiring video frame image information corresponding to the current video time, a video unit to be marked corresponding to the current video time, and currently recorded voice information; and
generating a video label corresponding to the video unit to be marked based on the voice information and the video frame image information.
3. The label processing method according to claim 2, wherein, when the voice input operation for the voice input control is detected, acquiring the video frame image information corresponding to the current video time, the video unit to be marked corresponding to the current video time, and the currently recorded voice information comprises:
when the voice input operation for the voice input control is detected, acquiring the current video time, the video frame image information corresponding to the current video time, and voice information currently recorded for the voice input control; and
acquiring, from the target video, the video unit to be marked corresponding to the current video time based on the voice information.
4. The label processing method according to claim 3, wherein acquiring the video unit to be marked corresponding to the current video time from the target video based on the voice information comprises:
when it is detected that the voice information comprises first label type information, acquiring a current video frame image corresponding to the current video time from the target video; and
determining the current video frame image as the video unit to be marked.
5. The label processing method according to claim 3, wherein acquiring the video unit to be marked corresponding to the current video time from the target video based on the voice information comprises:
when it is detected that the voice information comprises second label type information, determining an operation starting time point and an operation ending time point corresponding to the voice input operation;
acquiring a current video clip from the target video based on the operation starting time point and the operation ending time point; and
determining the current video clip as the video unit to be marked.
6. The label processing method according to claim 2, wherein generating the video label corresponding to the video unit to be marked based on the voice information and the video frame image information comprises:
when it is detected that the voice information comprises label content voice information, converting the label content voice information into label content text information; and
generating the video label corresponding to the video unit to be marked based on the label content text information and the video frame image information.
7. The label processing method according to claim 3, wherein, when the voice input operation for the voice input control is detected, acquiring the current video time, the video frame image information corresponding to the current video time, and the voice information currently recorded for the voice input control comprises:
when the voice input operation for the voice input control is detected, acquiring the current video time and the video frame image information corresponding to the current video time;
closing the audio in the target video, and playing the target video with the audio closed; and
acquiring the voice information currently recorded for the voice input control.
8. The label processing method according to claim 1, further comprising:
acquiring video labels corresponding to a plurality of videos in a video set, wherein the video set comprises videos corresponding to a plurality of hierarchies; and
arranging the video labels based on the hierarchy of the video set and the current video times corresponding to the video labels to obtain the video label list.
9. The label processing method according to claim 1, further comprising:
when a skip playing operation for a target video label in the video label list is detected, determining a target video time corresponding to the target video label and a video to be played corresponding to the target video label;
acquiring a video clip to be played from the video to be played based on the target video time; and
skipping to play the video clip to be played.
10. The label processing method according to claim 2, wherein, when the voice input operation for the voice input control is detected, acquiring the video frame image information corresponding to the current video time, the video unit to be marked corresponding to the current video time, and the currently recorded voice information comprises:
when the voice input operation for the voice input control is detected, detecting the login status of the user for the video playing client to obtain user login state information; and
when the user login state information indicates that the user has logged in to the video playing client, acquiring the video frame image information corresponding to the current video time, the video unit to be marked corresponding to the current video time, and the currently recorded voice information.
11. A label processing apparatus, comprising:
the display module is used for displaying a video playing page corresponding to a target video in a video playing client, wherein the video playing page comprises a voice input control for triggering generation of a video label and a label viewing control for viewing the video label;
the generating module is used for generating a video label corresponding to the current video time when a voice input operation for the voice input control is detected, wherein the video label comprises the current video time and video frame image information corresponding to the current video time;
and the list display module is used for displaying a video label list when a label viewing operation for the label viewing control is detected, wherein the video label list comprises at least one generated video label arranged according to a preset sequence.
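Read as code, claim 11's apparatus is three cooperating modules. A skeleton with placeholder bodies; the structure and names are illustrative, not the claimed implementation:

```python
class LabelProcessingApparatus:
    def show_play_page(self, video_id):
        """Display module: render the video playing page with a voice input
        control and a label viewing control."""
        raise NotImplementedError

    def on_voice_input(self, current_time, frame_image_info):
        """Generating module: build a video label from the current video time
        and the corresponding frame image information."""
        return {"video_time": current_time, "frame_image": frame_image_info}

    def on_view_labels(self, labels):
        """List display module: return generated labels in a preset order
        (here, ascending video time, as one possible preset sequence)."""
        return sorted(labels, key=lambda t: t["video_time"])
```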
12. A computer storage medium having stored thereon a computer program, characterized in that, when the computer program is run on a computer, it causes the computer to execute the label processing method according to any one of claims 1 to 10.
13. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method according to any of claims 1 to 10 are implemented when the program is executed by the processor.
CN202010030363.5A 2020-01-13 2020-01-13 Label processing method and device, storage medium and electronic equipment Active CN111209437B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010030363.5A CN111209437B (en) 2020-01-13 2020-01-13 Label processing method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111209437A (en) 2020-05-29
CN111209437B CN111209437B (en) 2023-11-28

Family

ID=70787796

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010030363.5A Active CN111209437B (en) 2020-01-13 2020-01-13 Label processing method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111209437B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150139610A1 (en) * 2013-11-15 2015-05-21 Clipmine, Inc. Computer-assisted collaborative tagging of video content for indexing and table of contents generation
WO2017096953A1 (en) * 2015-12-10 2017-06-15 乐视控股(北京)有限公司 Hot video displaying method and device
CN107071542A (en) * 2017-04-18 2017-08-18 百度在线网络技术(北京)有限公司 Video segment player method and device
CN108009293A (en) * 2017-12-26 2018-05-08 北京百度网讯科技有限公司 Video tab generation method, device, computer equipment and storage medium
CN108810637A (en) * 2018-06-12 2018-11-13 优视科技有限公司 Video broadcasting method, device and terminal device
CN109688475A (en) * 2018-12-29 2019-04-26 深圳Tcl新技术有限公司 Video playing jump method, system and computer readable storage medium
CN110297943A (en) * 2019-07-05 2019-10-01 联想(北京)有限公司 Adding method, device, electronic equipment and the storage medium of label

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIU Liqiang; LIU Zhaohui: "Design and Implementation of a Video Playback System Based on Android", Journal of Hunan Institute of Science and Technology (Natural Science Edition), no. 04, pages 42-48 *
AI Lili: "Research on Video Resource Classification Based on Text Mining", China Master's Theses Full-text Database, Information Science and Technology Series, no. 01, pages 138-2348 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111654749A (en) * 2020-06-24 2020-09-11 百度在线网络技术(北京)有限公司 Video data production method and device, electronic equipment and computer readable medium
WO2021258655A1 (en) * 2020-06-24 2021-12-30 百度在线网络技术(北京)有限公司 Video data production method and apparatus, electronic device, and computer readable medium
CN111654749B (en) * 2020-06-24 2022-03-01 百度在线网络技术(北京)有限公司 Video data production method and device, electronic equipment and computer readable medium
CN111783892A (en) * 2020-07-06 2020-10-16 广东工业大学 Robot instruction identification method and device, electronic equipment and storage medium
CN111783892B (en) * 2020-07-06 2021-10-01 广东工业大学 Robot instruction identification method and device, electronic equipment and storage medium
CN111741350A (en) * 2020-07-15 2020-10-02 腾讯科技(深圳)有限公司 File display method and device, electronic equipment and computer readable storage medium
CN112102823A (en) * 2020-07-21 2020-12-18 深圳市创维软件有限公司 Voice interaction method of intelligent terminal, intelligent terminal and storage medium
CN112887794A (en) * 2021-01-26 2021-06-01 维沃移动通信有限公司 Video editing method and device
CN112887794B (en) * 2021-01-26 2023-07-18 维沃移动通信有限公司 Video editing method and device
WO2024077909A1 (en) * 2022-10-12 2024-04-18 腾讯科技(深圳)有限公司 Video-based interaction method and apparatus, computer device, and storage medium

Also Published As

Publication number Publication date
CN111209437B (en) 2023-11-28

Similar Documents

Publication Publication Date Title
CN111209437A (en) Label processing method and device, storage medium and electronic equipment
US9799375B2 (en) Method and device for adjusting playback progress of video file
US9715901B1 (en) Video preview generation
KR102436734B1 (en) method for confirming a position of video playback node, apparatus, electronic equipment, computer readable storage medium and computer program
US20190354763A1 (en) Video processing for enabling sports highlights generation
CN107093100B (en) Multifunctional multimedia device
CN109474843B (en) Method for voice control of terminal, client and server
JP5135024B2 (en) Apparatus, method, and program for notifying content scene appearance
CN111629253A (en) Video processing method and device, computer readable storage medium and electronic equipment
US20230421859A1 (en) Systems and methods for recommending content using progress bars
CN112329403A (en) Live broadcast document processing method and device
US11558444B1 (en) Automatic discovery and reporting of streaming content of interest and connection of user to same
CN115665450A (en) Video update pushing method and terminal
CN113992972A (en) Subtitle display method and device, electronic equipment and readable storage medium
WO2021126867A1 (en) Providing enhanced content with identified complex content segments
CN110209870B (en) Music log generation method, device, medium and computing equipment
CN112689165A (en) Video playing method and device
US20230300429A1 (en) Multimedia content sharing method and apparatus, device, and medium
CN111651111A (en) Media file processing method and device
CN110740385A (en) Audio or video circulating playing method, storage medium and electronic equipment
US11785314B2 (en) Systems and methods to enhance segment during trick play
CN112052376A (en) Resource recommendation method, device, server, equipment and medium
CN111581403B (en) Data processing method, device, electronic equipment and storage medium
US11546666B1 (en) Key event trick-play operation
CN113556602B (en) Video playing method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant