CN113949920A - Video annotation method and device, terminal equipment and storage medium - Google Patents

Video annotation method and device, terminal equipment and storage medium

Info

Publication number
CN113949920A
CN113949920A (application CN202111558433.5A)
Authority
CN
China
Prior art keywords
video
time
target
tag
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111558433.5A
Other languages
Chinese (zh)
Inventor
杜秋雨
刘国清
杨广
王启程
Current Assignee
Shenzhen Minieye Innovation Technology Co Ltd
Original Assignee
Shenzhen Minieye Innovation Technology Co Ltd
Priority date: 2021-12-20
Filing date: 2021-12-20
Publication date: 2022-01-18
Application filed by Shenzhen Minieye Innovation Technology Co Ltd
Priority to CN202111558433.5A
Publication of CN113949920A
Legal status: Pending

Classifications

    • H04N 21/4312: Generation of visual interfaces for content selection or interaction involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N 21/47205: End-user interface for interacting with content; for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • H04N 21/47217: End-user interface for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • H04N 21/8456: Structuring of content by decomposing the content in the time domain, e.g. in time segments

Abstract

The application discloses a video annotation method, apparatus, terminal device and storage medium. A video annotation interface is displayed; in response to a tag selection operation acting on a tag list sub-area, a target tag box of a target video tag is added to a time track of a time axis sub-area; in response to a time selection operation acting on the time axis sub-area, the start time scale and end time scale of the target tag box on the time track are determined according to the video time scale; finally, video segment tag information is generated based on the target video tag, the start time scale and the end time scale. A user can thus visually check the video content and annotate the video in real time on the same interface, adding tags to the video in a what-you-see-is-what-you-get manner as playback progresses, without repeatedly watching the video. This simplifies the annotation operation and improves video annotation efficiency.

Description

Video annotation method and device, terminal equipment and storage medium
Technical Field
The present application relates to the field of data annotation technologies, and in particular, to a video annotation method, apparatus, terminal device, and storage medium.
Background
With the rapid development of vision-based artificial intelligence technology, the application scenarios of intelligent models keep widening, and the demand for video annotation across different scenarios keeps growing. During intelligent model training, usually only key segments extracted from videos are needed as sample data, and different models require different key segments. To make full use of video information, the segments of a video in which key information appears must be annotated.
Currently, an annotator plays a video through a player, records the key scenes in the video, creates annotation information, fills in a start time and an end time according to the video playing progress, and repeatedly watches the video to confirm the filled-in time points and annotation information. However, when several pieces of annotation information occur in the same time period, playback has to be repeated for each operation, making the annotation process cumbersome and video annotation inefficient.
Disclosure of Invention
The application provides a video annotation method, apparatus, terminal device and storage medium, which aim to solve the technical problem of low annotation efficiency in current video annotation approaches.
In order to solve the above technical problem, in a first aspect, an embodiment of the present application provides a video annotation method, including:
displaying a video annotation interface, wherein the video annotation interface comprises a video playing area and a video annotation area, the video annotation area comprises a tag list sub-area and a time axis sub-area, the video playing area comprises a video picture corresponding to video data, the tag list sub-area comprises a plurality of video tags, and the time axis sub-area comprises video time scales and time tracks;
in response to a tag selection operation acting on the tag list sub-area, adding a target tag box of a target video tag to a time track of the time axis sub-area, wherein the target video tag is a video tag in the tag list sub-area;
in response to a time selection operation acting on the time axis sub-area, determining a start time scale and an end time scale of the target tag box on the time track according to the video time scale;
and generating video segment tag information based on the target video tag, the start time scale and the end time scale.
In the embodiment of the application, displaying the video annotation interface lets the user visually check the video content and annotate the video in real time on the same interface. Adding the target tag box of the target video tag to the time track of the time axis sub-area in response to the tag selection operation allows the video to be tagged in a WYSIWYG (what you see is what you get) manner as playback progresses, without repeatedly watching the video, which simplifies the annotation operation. Determining the start time scale and end time scale of the target tag box on the time track according to the video time scale, in response to the time selection operation, lets the time period corresponding to a tag be selected directly along with the playing progress, so the user need not re-watch the video for confirmation, which improves video annotation efficiency. Finally, video segment tag information is generated based on the target video tag, the start time scale and the end time scale, realizing video segment annotation and further improving annotation efficiency.
In one embodiment, adding a target tag box of a target video tag to a time track of the time axis sub-area in response to a tag selection operation acting on the tag list sub-area includes:
determining a target video tag in the tag list sub-area in response to the tag selection operation acting on the tag list sub-area;
and moving the target video tag to a time track of the time axis sub-area, and initializing a target tag box of the target video tag on the time track.
With this embodiment, the user can select a video tag in the tag list sub-area and add it to a time track to generate a target tag box, so the video can be annotated in real time; displaying the annotation as a tag box also makes it easy for the user to modify the start and end times later, which simplifies the annotation operation and improves annotation efficiency.
In one embodiment, determining a start time scale and an end time scale of the target tag box on the time track according to the video time scale, in response to a time selection operation acting on the time axis sub-area, includes:
determining a tag box position and a tag box length of the target tag box on the time track in response to the time selection operation acting on the time axis sub-area;
and aligning the target tag box with the video time scale based on the tag box position and the tag box length to obtain the start time scale and the end time scale.
In this embodiment, the start time scale and the end time scale are determined by automatic alignment, which reduces the user's operational difficulty and improves annotation efficiency.
In one embodiment, the video data includes a video duration, and generating the video segment tag information based on the target video tag, the start time scale and the end time scale includes:
determining a first video time point corresponding to the start time scale and a second video time point corresponding to the end time scale according to the correspondence between the video duration and the time scales;
and forming the video segment tag information from the tag content corresponding to the target video tag, the first video time point and the second video time point.
This embodiment converts the time scale of the time axis sub-area into the actual video time progress so that fine-grained time selection can be performed on the time axis sub-area, which is convenient for the user.
In one embodiment, before displaying the video annotation interface, the method further comprises:
loading video data, wherein the video data comprises a video length;
converting the video length into a GUI pixel length of a graphical user interface, and taking the GUI pixel length as the time scale;
based on the GUI pixel length, a time track is generated.
By converting the video length into a GUI pixel length, this embodiment lets the user operate on the time axis sub-area with fine granularity, which is simpler and more convenient than adjusting the video progress in a player.
Optionally, generating a time track based on the GUI pixel length comprises:
based on the GUI pixel length, a plurality of time tracks are generated, and track identification information for each time track is configured.
By generating multiple time tracks, this embodiment makes it easy for the user to add multiple tags and to view and modify them later.
In an embodiment, the time axis sub-area further includes a time pointer, which converts a pixel value on the time scale into a target time progress of the video data, so that the video picture corresponding to the target time progress is displayed.
Because the time pointer works at a smaller time unit, adjusting the video picture with it allows the user to perform fine-grained operations.
In a second aspect, an embodiment of the present application provides a video annotation device, including:
a display module, configured to display a video annotation interface, where the video annotation interface includes a video playing area and a video annotation area, the video annotation area includes a tag list sub-area and a time axis sub-area, the video playing area includes a video picture corresponding to video data, the tag list sub-area includes a plurality of video tags, and the time axis sub-area includes video time scales and time tracks;
a first response module, configured to add a target tag box of a target video tag to a time track of the time axis sub-area in response to a tag selection operation acting on the tag list sub-area, where the target video tag is a video tag in the tag list sub-area;
a second response module, configured to determine, in response to a time selection operation acting on the time axis sub-area, a start time scale and an end time scale of the target tag box on the time track according to the video time scale;
and a generation module, configured to generate video segment tag information based on the target video tag, the start time scale and the end time scale.
In a third aspect, an embodiment of the present application provides a terminal device, including a display, a processor and a memory, where the display is configured to display a video annotation interface and the memory is configured to store a computer program which, when executed by the processor, implements the steps of the video annotation method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the video annotation method according to the first aspect.
Please refer to the relevant description of the first aspect for the beneficial effects of the second to fourth aspects, which are not repeated herein.
Drawings
FIG. 1 is a schematic flowchart of a video annotation method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a video annotation interface according to an embodiment of the present application;
FIG. 3 is another schematic diagram of a video annotation interface according to an embodiment of the present application;
FIG. 4 is yet another schematic diagram of a video annotation interface according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a video annotation device according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As described in the related art, an annotator currently plays a video through a player, records the key scenes in the video, creates annotation information, fills in a start time and an end time according to the video playing progress, and repeatedly views the video to confirm the filled-in time points and annotation information. However, when several pieces of annotation information occur in the same time period, playback has to be repeated for each operation, making the annotation process cumbersome and video annotation inefficient.
Therefore, the embodiments of the application provide a video annotation method, apparatus, terminal device and storage medium. By displaying a video annotation interface, the user can visually check the video content and annotate the video in real time on the same interface; by adding the target tag box of a target video tag to a time track of the time axis sub-area in response to a tag selection operation acting on the tag list sub-area, the video can be tagged in a WYSIWYG (what you see is what you get) manner as playback progresses, without repeatedly watching the video, which simplifies the annotation operation; by determining the start time scale and end time scale of the target tag box on the time track according to the video time scale in response to a time selection operation acting on the time axis sub-area, the time period corresponding to a tag can be selected directly along with the playing progress, so the user need not re-watch the video for confirmation, which improves video annotation efficiency; finally, video segment tag information is generated based on the target video tag, the start time scale and the end time scale, realizing video segment annotation and improving annotation efficiency.
Referring to fig. 1, fig. 1 is a schematic flowchart of a video annotation method according to an embodiment of the present application. The video annotation method can be applied to terminal devices, including but not limited to computing devices equipped with a display, such as smart phones, tablet computers, notebook computers and desktop computers. As shown in fig. 1, the video annotation method includes steps S101 to S104, detailed as follows:
Step S101, displaying a video annotation interface, where the video annotation interface includes a video playing area and a video annotation area, the video annotation area includes a tag list sub-area and a time axis sub-area, the video playing area includes a video picture corresponding to video data, the tag list sub-area includes a plurality of video tags, and the time axis sub-area includes video time scales and time tracks.
In this step, fig. 2 shows a schematic diagram of the video annotation interface. The video playing area is similar to the playing interface of a conventional player, with a video progress bar, a play/pause button, a volume control, a playback speed control, and zoom-in/zoom-out buttons. The tag list sub-area includes a plurality of video tags and also provides an add button for creating new tags.
It will be appreciated that corresponding video tags may be preset for different video annotation tasks. For example, fig. 2 shows a task of annotating a vehicle driving video, where the video tags include, but are not limited to, road conditions, lane line colors, lane line types, slopes, weather, lighting, target types, area distribution, work conditions, target distances, traffic flows, target locations, and standards.
Optionally, each video tag may also include multiple subclasses. As shown in fig. 3, another schematic diagram of the video annotation interface, the lighting tag includes shadows, reflections, daytime, tunnel, night, and so on.
The time axis sub-area includes time scales, which may be 00:00:00, 00:00:20, 00:00:40, 00:01:00 and 00:01:20 as shown in fig. 3, and time tracks. It will be appreciated that in practice the time scale includes marks for every second or even smaller time units; the figure merely shows whole marks for readability, which should not be taken as a limitation of the present application. The time track may be the track on which the target video tag "+tunnel" is shown in fig. 3.
It should be noted that the video annotation interfaces shown in fig. 2 to fig. 4 are merely examples; other layouts may be adopted in other embodiments, which are not described here again.
Optionally, the time axis sub-area further includes a time pointer, which converts a pixel value on the time scale into a target time progress of the video data, so as to display the video picture corresponding to the target time progress.
As shown in fig. 3, the video duration in the video playing area is 5 minutes and 59 seconds. The time axis converts this duration into a finer GUI pixel length and marks the time scales accordingly. In fig. 3, the time scale of the GUI visible area ranges from 0 min 0 s to 1 min 20 s, so operating the time pointer in the time axis sub-area adjusts the video progress more easily; because the time pointer works at a smaller time unit, the video picture can be adjusted precisely, which is convenient for fine-grained operation.
It should be noted that, in the embodiment of the present application, the time value on the time scale corresponds to the video time value in the video playing area, and is converted proportionally from the GUI pixel value in the time axis sub-area.
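As an illustrative, non-limiting sketch of this proportional mapping, the TypeScript fragment below converts a drag of the time pointer into a video seek. The names (`TimeAxis`, `pxToSeconds`, `onPointerDrag`) and the use of an HTML5 video element are assumptions made for illustration; the application itself only specifies that pixel values and time values are converted in proportion.

```typescript
// Assumed setup: the time axis renders totalPx pixels for a video of
// durationSec seconds, so every pixel corresponds to a fixed slice of time.
interface TimeAxis {
  totalPx: number;      // GUI pixel length of the full time axis
  durationSec: number;  // video duration in seconds
}

function pxToSeconds(axis: TimeAxis, px: number): number {
  return (px / axis.totalPx) * axis.durationSec;
}

function secondsToPx(axis: TimeAxis, t: number): number {
  return (t / axis.durationSec) * axis.totalPx;
}

// Dragging the time pointer seeks the player to the corresponding progress.
function onPointerDrag(axis: TimeAxis, pointerPx: number, video: HTMLVideoElement): void {
  const t = pxToSeconds(axis, pointerPx);
  video.currentTime = Math.min(Math.max(t, 0), axis.durationSec);
}
```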
Optionally, since the display range of the GUI visible area in the time axis sub-area is limited, the part of the time axis beyond the GUI visible area is hidden. A horizontal scroll bar may be displayed at the bottom of the time axis sub-area, and scrolling it left and right adjusts the display range of the GUI visible area.
Alternatively, the horizontal scroll bar may automatically follow the time pointer, keeping the time axis of the GUI visible area around the time point at which the video is currently playing.
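A possible realization of this follow behaviour, again only as an assumed sketch (the scroll container and the choice to center the pointer are illustrative, not specified by the application):

```typescript
// Keep the GUI visible area of the time axis around the current playback point
// by scrolling the container so the time pointer sits near its middle.
function followTimePointer(
  axis: { totalPx: number; durationSec: number },
  container: HTMLElement,
  currentTimeSec: number
): void {
  const pointerPx = (currentTimeSec / axis.durationSec) * axis.totalPx;
  container.scrollLeft = Math.max(0, pointerPx - container.clientWidth / 2);
}
```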
Optionally, the time axis sub-area may also be horizontally enlarged or reduced via preset shortcut keys to adjust the time granularity corresponding to the GUI pixel length.

Step S102, in response to a tag selection operation acting on the tag list sub-area, adding a target tag box of a target video tag to a time track of the time axis sub-area, where the target video tag is a video tag in the tag list sub-area.
In this step, as shown in fig. 3, the tag selection operation may be selecting a target video tag in the tag list sub-area and dragging it onto the time track.
Optionally, a plurality of video tags may be contained within the target tag box. In yet another schematic of the video annotation interface shown in fig. 4, the target tag box includes two video tags, tunnel and white.
Optionally, the video tags of the target tag box may be modified in response to a tag modification operation applied to it; for example, double-clicking the target tag box and then changing, adding or deleting a video tag.
Step S103, in response to a time selection operation acting on the time axis sub-area, determining the start time scale and end time scale of the target tag box on the time track according to the video time scale.
In this step, the two ends of the target tag box represent the start time scale and end time scale corresponding to the video tags in the box. Optionally, as shown in fig. 4, the time selection operation may be dragging the two ends of the target tag box to change its position on the time track, thereby selecting the time. Optionally, the time selection operation may also use a preset time-filling option in the target tag box to fill in the start time scale and end time scale directly.
Step S104, generating video segment tag information based on the target video tag, the start time scale and the end time scale.
In this step, each target video tag, together with the video progress corresponding to its start time scale and end time scale, forms one piece of video segment tag information; finally, all pieces of video segment tag information are assembled into the video annotation data.
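As an assumed sketch of the data assembled in this step (field names such as `trackId`, `startSec` and `endSec` are illustrative; the application only specifies that the tag content is combined with the start and end video time points):

```typescript
// One piece of video segment tag information.
interface SegmentLabel {
  trackId: string;   // time track on which the target tag box sits
  tags: string[];    // tag content of the target tag box (may hold several tags)
  startSec: number;  // video time point of the start time scale
  endSec: number;    // video time point of the end time scale
}

// All pieces together form the annotation data for the video.
function buildAnnotationData(labels: SegmentLabel[]): string {
  return JSON.stringify({ segments: labels }, null, 2);
}

// Hypothetical example: a "tunnel" tag box covering 0:20 to 0:40.
console.log(buildAnnotationData([
  { trackId: "track-1", tags: ["tunnel"], startSec: 20, endSec: 40 },
]));
```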
In an embodiment, based on the embodiment shown in fig. 1, the step S102 includes:
determining the target video tag in the tag list sub-area in response to a tag selection operation acting on the tag list sub-area;
and moving the target video tag to a time track of the time axis sub-area, and initializing a target tag box of the target video tag on the time track.
In this embodiment, as shown in fig. 3, during video playback, when the user sees a key scene, the target video tag can be dragged with the mouse onto a time track aligned with the time scale, and the time track may be highlighted to indicate that it is ready to receive the tag. After the mouse is released, a target tag box is created at the coordinates where the mouse was released on the time track, and the default time range covered by the target video tag is automatically calculated from the tag's default configuration to complete initialization.
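An assumed sketch of this drop-to-initialize behaviour follows; the `onTagDrop` signature and the five-second default span are illustrative, since the application only states that a default time range is computed from the tag's default configuration.

```typescript
interface TagBox {
  tag: string;
  startPx: number;  // pixel position of the start of the tag box on its track
  endPx: number;    // pixel position of the end of the tag box
}

// Create a target tag box at the coordinates where the mouse was released.
function onTagDrop(
  tag: string,
  dropPx: number,            // x coordinate of the mouse release on the track
  pxPerSecond: number,       // pixels per second on the time axis
  defaultSpanSec: number = 5 // assumed default covered time range
): TagBox {
  const startPx = Math.round(dropPx);
  return { tag, startPx, endPx: startPx + defaultSpanSec * pxPerSecond };
}
```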
With this embodiment, the user can select a video tag in the tag list sub-area and add it to a time track to generate a target tag box, so the video can be annotated in real time; displaying the annotation as a tag box also makes it easy for the user to modify the start and end times later, which simplifies the annotation operation and improves annotation efficiency.
In an embodiment, on the basis of the embodiment shown in fig. 1, the step S103 includes:
determining a tag box position and a tag box length of the target tag box on the time track in response to the time selection operation acting on the time axis sub-area;
and aligning the target tag box with the video time scale based on the tag box position and the tag box length to obtain the start time scale and the end time scale.
In this embodiment, as shown in fig. 4, during video playback, when the user sees a key scene, double-clicking the target tag box with the mouse enters a modification state in which a preset tag can be quickly filled in or selected. The left and right ends of the target tag box can be dragged with the mouse to adjust its position on the time track and quickly modify the time range it covers; the adjusted ends are aligned with the time scale, and the covered time range, start time scale and end time scale are calculated automatically.
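A minimal sketch of this automatic alignment, assuming a one-second snapping granularity (both the granularity and the names are illustrative):

```typescript
// Snap a pixel position to the nearest time scale mark.
function snapToScale(px: number, pxPerSecond: number): number {
  return Math.round(px / pxPerSecond) * pxPerSecond;
}

// Align both ends of the target tag box with the video time scale and derive
// the start and end time scales from the snapped positions.
function alignTagBox(box: { startPx: number; endPx: number }, pxPerSecond: number) {
  const startPx = snapToScale(box.startPx, pxPerSecond);
  // Keep the box at least one scale unit wide after snapping.
  const endPx = Math.max(snapToScale(box.endPx, pxPerSecond), startPx + pxPerSecond);
  return { startPx, endPx, startSec: startPx / pxPerSecond, endSec: endPx / pxPerSecond };
}
```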
In this embodiment, the start time scale and the end time scale are determined by automatic alignment, which reduces the user's operational difficulty and improves annotation efficiency.
In an embodiment, the video data includes a video duration, and on the basis of the embodiment shown in fig. 1, the step S104 includes:
determining a first video time point corresponding to the start time scale and a second video time point corresponding to the end time scale according to the correspondence between the video duration and the time scales;
and forming the video segment tag information from the tag content corresponding to the target video tag, the first video time point and the second video time point.
In this embodiment, the time scale represents the scale corresponding to the GUI pixel length. Optionally, the first GUI pixel length at which the video duration is displayed in the video playing area is converted, at a certain ratio, into a finer-grained second GUI pixel length displayed in the time axis sub-area; thus, when a second GUI pixel value is determined in the time axis sub-area, it can be converted back at the same ratio into the corresponding video time point in the video playing area.
For example, the video length is 100 min and the GUI pixel length of the duration icon in the video playing area is 100 px. This pixel length is enlarged 10 times to 1000 px, which is used as the time scale displayed in the time axis sub-area; the pixel length beyond the GUI visible area is hidden. When a time scale of 500 px is determined on the time axis sub-area, it can be converted proportionally to the 50 min video time point in the video playing area.
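The same numeric example, written out as a short TypeScript sketch:

```typescript
// A 100 min video shown as a 100 px duration icon is enlarged 10x, giving a
// 1000 px time scale (partly hidden beyond the GUI visible area).
const videoMinutes = 100;
const iconPx = 100;
const zoom = 10;
const timeScalePx = iconPx * zoom; // 1000 px

// Convert a pixel position on the time scale back to a video time point.
function scalePxToMinutes(px: number): number {
  return (px / timeScalePx) * videoMinutes;
}

console.log(scalePxToMinutes(500)); // 50 (minutes), as in the example above
```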
In one possible implementation, the time scale is expressed in video image frames, obtained by converting the video duration: for example, 1 second corresponds to 25 frames in PAL format and to 30 frames in NTSC format, so the time scale needs to be converted into an actual video time point. This embodiment converts the time scale of the time axis sub-area into the actual video time progress so that fine-grained time selection can be performed on the time axis sub-area, which is convenient for the user.
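An assumed sketch of the frame-based conversion, using the PAL and NTSC frame rates given above:

```typescript
// Convert a time scale expressed in video image frames to an actual
// video time point in seconds.
function frameToSeconds(frame: number, fps: 25 | 30): number {
  return frame / fps;
}

console.log(frameToSeconds(150, 25)); // 6 s under PAL (25 frames per second)
console.log(frameToSeconds(150, 30)); // 5 s under NTSC (30 frames per second)
```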
In an embodiment, on the basis of the embodiment shown in fig. 1, before the step S101, the method further includes:
loading the video data, wherein the video data comprises a video length;
converting the video length into a GUI pixel length of a graphical user interface, and taking the GUI pixel length as the time scale;
generating the time track based on the GUI pixel length.
In this embodiment, by converting the video length into a GUI pixel length, the user can operate on the time axis sub-area with fine granularity, which is simpler and more convenient than adjusting the video progress in a player.
Optionally, generating the time track based on the GUI pixel length includes:
and generating a plurality of time tracks based on the GUI pixel length, and configuring track identification information of each time track.
In this alternative embodiment, as shown in fig. 4, multiple time tracks are created for video annotation along different dimensions, and each time track can hold multiple video tag boxes for visual display. Further, to make the time tracks easy to distinguish, each time track corresponds to unique ID identification information, and the target video tags on a track are associated with that ID. By generating multiple time tracks, this embodiment makes it easy for the user to add multiple tags and to view and modify them later.
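An assumed sketch of generating several identically sized time tracks, each with unique track identification information (the id format is illustrative):

```typescript
interface TimeTrack {
  id: string;        // unique track identification information
  lengthPx: number;  // GUI pixel length shared by all tracks
  boxes: { tag: string; startPx: number; endPx: number }[];
}

// Generate `count` time tracks of the same GUI pixel length and configure
// the identification information of each.
function createTracks(count: number, lengthPx: number): TimeTrack[] {
  return Array.from({ length: count }, (_, i) => ({
    id: `track-${i + 1}`,
    lengthPx,
    boxes: [],
  }));
}
```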
To execute the video annotation method of the above method embodiment and realize the corresponding functions and technical effects, an embodiment of the present application provides a video annotation device. Referring to fig. 5, fig. 5 is a block diagram of the structure of a video annotation device according to an embodiment of the present application. For convenience of explanation, only the parts related to the present embodiment are shown. The video annotation device includes:
a display module 501, configured to display a video annotation interface, where the video annotation interface includes a video playing area and a video annotation area, the video annotation area includes a tag list sub-area and a time axis sub-area, the video playing area includes a video picture corresponding to video data, the tag list sub-area includes a plurality of video tags, and the time axis sub-area includes video time scales and time tracks;
a first response module 502, configured to add a target tag box of a target video tag to a time track of the time axis sub-area in response to a tag selection operation acting on the tag list sub-area, where the target video tag is the video tag in the tag list sub-area;
a second response module 503, configured to determine, in response to a time selection operation acting on the time axis sub-area, a start time scale and an end time scale of the target tag box on the time track according to the video time scale;
a generating module 504, configured to generate video segment tag information based on the target video tag, the start time scale and the end time scale.
In one embodiment, the first response module 502 includes:
a first response unit, configured to determine the target video tag in the tag list sub-area in response to a tag selection operation acting on the tag list sub-area;
and a moving unit, configured to move the target video tag to the time track of the time axis sub-area and initialize a target tag box of the target video tag on the time track.
In one embodiment, the second response module 503 includes:
a second response unit, configured to determine, in response to a time selection operation acting on the time axis sub-area, a tag box position and a tag box length of the target tag box on the time track;
and an alignment unit, configured to align the target tag box with the video time scale based on the tag box position and the tag box length to obtain the start time scale and the end time scale.
In an embodiment, the generating module 504 includes:
a determining unit, configured to determine a first video time point corresponding to the start time scale and a second video time point corresponding to the end time scale according to the correspondence between the video duration and the time scale;
and a forming unit, configured to form the video segment tag information from the tag content corresponding to the target video tag, the first video time point and the second video time point.
In one embodiment, the video annotation device further comprises:
a loading module, configured to load the video data, where the video data includes a video length;
a conversion module, configured to convert the video length into a GUI pixel length of a graphical user interface and take the GUI pixel length as the time scale;
and a second generation module, configured to generate the time track based on the GUI pixel length.
Optionally, the second generating module includes:
a generating unit, configured to generate a plurality of time tracks based on the GUI pixel length and configure track identification information for each time track.
In an embodiment, the time axis sub-area further includes a time pointer, which converts the pixel values on the time scale into a target time progress of the video data so as to display the video picture corresponding to the target time progress.
The video annotation device can implement the video annotation method of the above method embodiments, and the alternatives in those embodiments also apply to this embodiment and are not repeated here; for anything else, reference may be made to the above method embodiments.
Fig. 6 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 6, the terminal device 6 of this embodiment includes: a display 63, at least one processor 60 (only one shown in fig. 6), a memory 61, and a computer program 62 stored in the memory 61 and executable on the at least one processor 60, the processor 60 implementing the steps in any of the above-described method embodiments when executing the computer program 62.
The terminal device 6 may be a computing device such as an intelligent acquisition device, a tablet computer, a desktop computer or a cloud server. The terminal device may include, but is not limited to, the processor 60 and the memory 61. Those skilled in the art will appreciate that fig. 6 is only an example of the terminal device 6 and does not constitute a limitation on it; the terminal device may include more or fewer components than shown, combine some components, or use different components, such as an input/output device or a network access device.
The processor 60 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or any conventional processor.
In some embodiments, the memory 61 may be an internal storage unit of the terminal device 6, such as a hard disk or memory of the terminal device 6. In other embodiments, the memory 61 may be an external storage device of the terminal device 6, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card equipped on the terminal device 6. Further, the memory 61 may include both an internal storage unit and an external storage device of the terminal device 6. The memory 61 is used to store an operating system, application programs, a boot loader, data and other programs, such as the program code of the computer program, and may also be used to temporarily store data that has been output or is to be output.
In addition, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in any of the method embodiments described above.
An embodiment of the present application further provides a computer program product which, when run on a terminal device, causes the terminal device to implement the steps of the above method embodiments.
In several embodiments provided herein, it will be understood that each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a terminal device to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above-mentioned embodiments are further detailed to explain the objects, technical solutions and advantages of the present application, and it should be understood that the above-mentioned embodiments are only examples of the present application and are not intended to limit the scope of the present application. It should be understood that any modifications, equivalents, improvements and the like, which come within the spirit and principle of the present application, may occur to those skilled in the art and are intended to be included within the scope of the present application.

Claims (10)

1. A method for video annotation, comprising:
displaying a video annotation interface, wherein the video annotation interface comprises a video playing area and a video annotation area, the video annotation area comprises a tag list sub-area and a time axis sub-area, the video playing area comprises a video picture corresponding to video data, the tag list sub-area comprises a plurality of video tags, and the time axis sub-area comprises video time scales and time tracks;
in response to a tag selection operation acting on the tag list sub-area, adding a target tag box of a target video tag to a time track of the time axis sub-area, wherein the target video tag is the video tag in the tag list sub-area;
in response to a time selection operation acting on the time axis sub-area, determining a start time scale and an end time scale of the target tag box on the time track according to the video time scale;
and generating video segment tag information based on the target video tag, the start time scale and the end time scale.
2. The video annotation method of claim 1, wherein the adding a target tag box of a target video tag to a time track of the time axis sub-area in response to a tag selection operation acting on the tag list sub-area comprises:
determining the target video tag in the tag list sub-area in response to a tag selection operation acting on the tag list sub-area;
and moving the target video tag to a time track of the time axis sub-area, and initializing a target tag box of the target video tag on the time track.
3. The video annotation method of claim 1, wherein the determining a start time scale and an end time scale of the target tag box on the time track according to the video time scale in response to a time selection operation acting on the time axis sub-area comprises:
determining a tag box position and a tag box length of the target tag box on the time track in response to the time selection operation acting on the time axis sub-area;
and aligning the target tag box with the video time scale based on the tag box position and the tag box length to obtain the start time scale and the end time scale.
4. The video annotation method of claim 1, wherein the video data comprises a video duration, and the generating video segment tag information based on the target video tag, the start time scale and the end time scale comprises:
determining a first video time point corresponding to the start time scale and a second video time point corresponding to the end time scale according to the correspondence between the video duration and the time scales;
and forming the video segment tag information from the tag content corresponding to the target video tag, the first video time point and the second video time point.
5. The video annotation method of any one of claims 1 to 4, further comprising, before displaying the video annotation interface:
loading the video data, wherein the video data comprises a video length;
converting the video length into a GUI pixel length of a graphical user interface, and taking the GUI pixel length as the time scale;
and generating the time track based on the GUI pixel length.
6. The video annotation method of claim 5, wherein said generating the time track based on the GUI pixel length comprises:
and generating a plurality of time tracks based on the GUI pixel length, and configuring track identification information of each time track.
7. The video annotation method of claim 1, wherein the time axis sub-area further comprises a time pointer, the time pointer being used to convert the pixel value on the time scale into a target time progress of the video data, so as to display the video picture corresponding to the target time progress.
8. A video annotation device, comprising:
a display module, configured to display a video annotation interface, wherein the video annotation interface comprises a video playing area and a video annotation area, the video annotation area comprises a tag list sub-area and a time axis sub-area, the video playing area comprises a video picture corresponding to video data, the tag list sub-area comprises a plurality of video tags, and the time axis sub-area comprises video time scales and time tracks;
a first response module, configured to add a target tag box of a target video tag to a time track of the time axis sub-area in response to a tag selection operation acting on the tag list sub-area, wherein the target video tag is the video tag in the tag list sub-area;
a second response module, configured to determine, in response to a time selection operation acting on the time axis sub-area, a start time scale and an end time scale of the target tag box on the time track according to the video time scale;
and a generation module, configured to generate video segment tag information based on the target video tag, the start time scale and the end time scale.
9. A terminal device comprising a display for displaying a video annotation interface, a processor and a memory for storing a computer program which, when executed by the processor, carries out the steps of the video annotation method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that it stores a computer program which, when being executed by a processor, carries out the steps of the video annotation method according to any one of claims 1 to 7.
CN202111558433.5A, filed 2021-12-20 (priority 2021-12-20): Video annotation method and device, terminal equipment and storage medium. Status: Pending. Publication: CN113949920A.

Priority Applications (1)

Application Number: CN202111558433.5A · Priority Date: 2021-12-20 · Filing Date: 2021-12-20 · Title: Video annotation method and device, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113949920A 2022-01-18

Family

ID=79339304

Family Applications (1)

Application Number: CN202111558433.5A · Title: Video annotation method and device, terminal equipment and storage medium · Priority/Filing Date: 2021-12-20 · Status: Pending (CN113949920A)

Country Status (1)

Country Link
CN (1) CN113949920A (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130145327A1 (en) * 2011-06-07 2013-06-06 Intersect Ptp, Inc. Interfaces for Displaying an Intersection Space
CN102665128A (en) * 2012-04-27 2012-09-12 北京人民广播电台 Method and device for customizing timer-shaft content
WO2014002004A1 (en) * 2012-06-25 2014-01-03 Batchu Sumana Krishnaiahsetty A method for marking highlights in a multimedia file and an electronic device thereof
US20150135068A1 (en) * 2013-11-11 2015-05-14 Htc Corporation Method for performing multimedia management utilizing tags, and associated apparatus and associated computer program product
US20160322081A1 (en) * 2015-04-30 2016-11-03 Rodica Schileru Method and system for segmenting videos
CN109495791A (en) * 2018-11-30 2019-03-19 北京字节跳动网络技术有限公司 A kind of adding method, device, electronic equipment and the readable medium of video paster
WO2020201780A1 (en) * 2019-04-04 2020-10-08 Google Llc Video timed anchors
US20210004131A1 (en) * 2019-07-01 2021-01-07 Microsoft Technology Licensing, Llc Highlights video player
CN110381382A (en) * 2019-07-23 2019-10-25 腾讯科技(深圳)有限公司 Video takes down notes generation method, device, storage medium and computer equipment
US20210090610A1 (en) * 2019-09-20 2021-03-25 Beijing Xiaomi Mobile Software Co., Ltd. Video processing method, video playing method, devices and storage medium
CN111010619A (en) * 2019-12-05 2020-04-14 北京奇艺世纪科技有限公司 Method, apparatus, computer device and storage medium for processing short video data
CN111526405A (en) * 2020-04-30 2020-08-11 网易(杭州)网络有限公司 Media material processing method, device, equipment, server and storage medium
CN111654749A (en) * 2020-06-24 2020-09-11 百度在线网络技术(北京)有限公司 Video data production method and device, electronic equipment and computer readable medium
CN113038265A (en) * 2021-03-01 2021-06-25 创新奇智(北京)科技有限公司 Video annotation processing method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
快乐猪脚饭: "[AE semi-localized] Entry-level basics: interface introduction and editing workflow", Bilibili, HTTPS://B23.TV/53FV3MB *
爱喝咖啡的当麻: "How to add progress-bar segments to your Bilibili video", Bilibili, HTTPS://WWW.BILIBILI.COM/VIDEO/BV1VL411G7N7/?SPM_ID_FROM=AUTONEXT *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115334354A (en) * 2022-08-15 2022-11-11 北京百度网讯科技有限公司 Video annotation method and device
CN115334354B (en) * 2022-08-15 2023-12-29 北京百度网讯科技有限公司 Video labeling method and device
CN115424393A (en) * 2022-08-24 2022-12-02 青岛海容商用冷链股份有限公司 Vending machine energy-saving time setting method and system

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 20220118)