CN115334354B - Video labeling method and device - Google Patents

Video labeling method and device

Info

Publication number
CN115334354B
CN115334354B (application CN202210975567.5A; published as CN115334354A)
Authority
CN
China
Prior art keywords
information
temporary
labeling
video
annotation
Prior art date
Legal status
Active
Application number
CN202210975567.5A
Other languages
Chinese (zh)
Other versions
CN115334354A (en)
Inventor
王浩
王伟
杨云鹏
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210975567.5A
Publication of CN115334354A
Application granted
Publication of CN115334354B
Legal status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44012Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44004Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving video buffer management, e.g. video decoder buffer or video display buffer
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47217End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/835Generation of protective data, e.g. certificates
    • H04N21/8352Generation of protective data, e.g. certificates involving content or source identification data, e.g. Unique Material Identifier [UMID]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The disclosure provides a video annotation method and device, relating to the field of artificial intelligence and in particular to video evaluation. A specific implementation scheme is as follows: acquiring temporary annotation information for at least one annotation item of a target video, together with the corresponding time periods; rendering buried points for the temporary annotation information of each time period on the time axis of the player; aggregating the temporary annotation information of each annotation item to obtain the final annotation information; and outputting the final annotation information. This embodiment automatically calculates the score proportion of each annotation item in each dimension, which improves manual annotation efficiency and facilitates data tracing.

Description

Video labeling method and device
Technical Field
The disclosure relates to the field of artificial intelligence, in particular to video evaluation, and specifically to a video annotation method and device.
Background
With ongoing innovation in content production, generating videos from text-and-image material with AI has entered practical use, greatly improving the consumption experience of text content. Exploration and breakthroughs are under way in three directions: raising the information density of the video, adding emotional presentation, and enhancing visual perception. At present, evaluating the effectiveness of AI-produced video material means manually annotating the time range of each index item offline and manually computing the resulting report; this is time-consuming and labor-intensive, and hinders both data tracing and rapid production of annotation results.
For annotating video time periods, the traditional approach is to play the video under evaluation in a browser, record the time periods occupied by each score of each index item, manually compute, for each index item, the proportion that the summed duration of periods sharing a score takes of the total video duration, judge from the scores and proportions whether the video can be accepted, and finally fill in an Excel spreadsheet by hand. The whole flow is completed offline: evaluation is inefficient, annotation progress cannot be monitored, there are no online data results, and the pace of improving the project's user experience suffers.
Disclosure of Invention
The present disclosure provides a video annotation method, apparatus, device, storage medium and computer program product.
According to a first aspect of the present disclosure, there is provided a video annotation method, including: acquiring temporary annotation information of at least one annotation item of a target video and the corresponding time periods; rendering buried points for the temporary annotation information corresponding to each time period on the time axis of the player; aggregating the temporary annotation information of each annotation item to obtain final annotation information; and outputting the final annotation information.
According to a second aspect of the present disclosure, there is provided a video annotation device, comprising: an acquisition unit configured to acquire temporary annotation information of at least one annotation item of a target video and the corresponding time periods; a rendering unit configured to render buried points for the temporary annotation information corresponding to each time period on the time axis of the player; an aggregation unit configured to aggregate the temporary annotation information of each annotation item to obtain final annotation information; and an output unit configured to output the final annotation information.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of the first aspect.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of the first aspect.
According to the video annotation method and device of this disclosure, the start and end values of each time period are reported through the time-point rendering capability of the player plug-in, the temporary result of the current time period is cached in a Redis hash structure, and the proportion of each annotation item's time periods is computed by automatic aggregation at final annotation, which improves annotation efficiency. It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram in which an embodiment of the present disclosure may be applied;
FIG. 2 is a flow chart of one embodiment of a video annotation method according to the present disclosure;
FIGS. 3a-3e are schematic diagrams of an application scenario of the video annotation method according to the present disclosure;
FIG. 4 is a flow chart of yet another embodiment of a video annotation method according to the present disclosure;
FIG. 5 is a schematic structural view of one embodiment of a video annotation device according to the present disclosure;
fig. 6 is a schematic diagram of a computer system suitable for use in implementing embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 illustrates an exemplary system architecture 100 in which embodiments of video annotation methods or video annotation devices of the present disclosure may be applied.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a video player, a video annotation class application, a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices having a display screen and supporting video playback, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above, implemented either as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server providing various services, for example a back-end server supporting the videos displayed on the terminal devices 101, 102, 103. The back-end server can analyze and process received video annotation requests and the like, and feed the processing result (for example, the aggregated annotation information) back to the terminal devices.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., for providing distributed services) or as a single piece of software or software module. No specific limitation is made here. The server may also be a server of a distributed system, a server that incorporates a blockchain, a cloud server, or an intelligent cloud-computing server or intelligent cloud host with artificial intelligence technology.
It should be noted that the video annotation method provided by the embodiments of the present disclosure may be performed by the terminal devices 101, 102, 103, or by the server 105. Accordingly, the video annotation device may be provided in the terminal devices 101, 102, 103 or in the server 105. No specific limitation is made here.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a video annotation method according to the present disclosure is shown. The video annotation method includes the following steps:
Step 201: acquire temporary annotation information of at least one annotation item of the target video and the corresponding time periods.
In this embodiment, the execution body of the video annotation method (for example, the server shown in FIG. 1) may receive, over a wired or wireless connection, report requests from the terminal on which the user performs video annotation. To obtain the scores of different annotation items in different time periods, an annotator browses the whole video on a visualization platform and reports, along the video's playback timeline, the score of each annotation item at each time point together with the start and end nodes (i.e., the time period) over which that score holds; this is the notion of time-period annotation. Because an annotation item may be scored over multiple time periods, the annotation of a single time period is called a temporary annotation, so that all data can be aggregated at the final review; the overall conclusion drawn for one annotated material is called the final annotation. The manually produced temporary annotation information and its corresponding time period can be sent to the server in a report request. Annotation items can be configured as needed and typically include speech, picture, subtitles, and so on; each item may be further subdivided, e.g., subtitle overlap, subtitle usability, subtitle sentence breaks.
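For illustration, such a report payload might be modeled as follows. This is only a sketch; the field names are hypothetical rather than taken from the patent:

```typescript
// Hypothetical shape of one temporary-annotation report, for illustration only;
// the field names are not taken from the patent.
interface TemporaryAnnotationReport {
  videoId: string;        // identifier of the target video
  annotationItem: string; // e.g. "subtitle overlap", "subtitle usability"
  score: number;          // score given for this single time period
  startSec: number;       // start of the scored time period, in seconds
  endSec: number;         // end of the scored time period, in seconds
  videoLenSec: number;    // total video length, used later for proportions
  remark?: string;        // optional free-text remark stored in the database
}
```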
Step 202: render buried points on the time axis of the player for the temporary annotation information corresponding to each time period.
In this embodiment, based on the dynamic time-point rendering capability provided by the video.js plug-in (e.g., player.markers), the data reported at different time points are rendered as buried points on the time axis of the player; clicking a rendered point retrieves the corresponding temporary annotation information from the server.
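As a rough illustration, the following sketch renders such buried points with the videojs-markers plugin (which provides player.markers); the element id, the reports list, and the click handler are assumptions for the example:

```typescript
import videojs from 'video.js';
import 'videojs-markers'; // assumed: the videojs-markers plugin, which adds player.markers

// Stand-in for the temporary-annotation reports fetched from the server
// (see the hypothetical TemporaryAnnotationReport shape above).
declare const reports: Array<{ startSec: number; annotationItem: string }>;

const player = videojs('annotation-player'); // id of the <video> element is assumed

// Render one buried point per reported time period on the progress bar.
// The plugin ships without TypeScript typings, hence the cast.
(player as any).markers({
  markers: reports.map(r => ({ time: r.startSec, text: r.annotationItem })),
  // Clicking a rendered point would fetch that period's temporary annotation
  // from the server; the handler here is a placeholder.
  onMarkerClick: (marker: { time: number }) => {
    console.log('fetch temporary annotation at', marker.time);
  },
});
```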
Step 203: aggregate the temporary annotation information of each annotation item to obtain the final annotation information.
In this embodiment, the temporary annotation information may include a segment score for each time period; temporary annotation information carrying the same score under the same annotation item across all time periods can be aggregated into the final annotation information. The final annotation information may also record the proportion taken by each score of each annotation item, which makes it convenient to evaluate video quality. For time-period information sharing the same annotation item and the same score, the durations are first summed, the proportion of the total video duration is computed, and the start and end points are mapped into "minutes:seconds" format, finally yielding a single result such as: "[Item 1] 00:01-00:03, 00:05-00:10, 00:18-00:19 (8 seconds in total), proportion xx%".
Step 204: output the final annotation information.
In this embodiment, the final annotation information may be output in various ways, such as push notifications or mail, and may also be displayed on the player page as shown in FIG. 3e. Temporary annotation information may be output selectively as well. Whether the video should be published can then be judged from the final annotation information: if it does not meet the publication requirements, it is sent back for revision; if it does, it can be published to the network platform.
With the method provided by this embodiment of the disclosure, the start and end values of each time period are reported through the time-point rendering capability of the player plug-in, and the proportion of each annotation item's time periods is computed by automatic aggregation at final annotation, which improves annotation efficiency. The method mainly targets time-period annotation of AI-produced video content along multiple dimensions such as quality and relevance, automatically computing the score proportion of each dimension; this improves manual annotation efficiency and facilitates data tracing. It fits the broader drive to reduce cost and raise efficiency, and ultimately improves user experience and product quality.
In some optional implementations of this embodiment, the temporary annotation information includes a segment score and remark information for each time period, and the method further comprises: constructing the key of a hash structure from the video identifier of the target video, constructing the fields of the hash structure from the different time periods, constructing the values of the hash structure from each time period's start-stop time, segment score, and the video length, and storing them in Redis; and storing the remark information in a database under the video identifier.
Since video segment annotation information may be read and written frequently, the segment-related information is cached in Redis. Considering that a single user may add segment information for multiple annotation items of a single video at different time points, a hash structure suits this function best. The hash structure is shown in FIG. 3a.
In the hash structure, the key may be built from the video identifier of the target video, for example constant + user mailbox prefix + video id; the values reported at different time points serve as fields; and each value is a JSON string whose structure stores the start and end points of each annotation item and the video length. Other remark information, such as the temporary annotation's textual conclusion, the temporary annotation score, and the evaluation material id, is stored in the database (db).
Storing frequently read data in Redis speeds up data processing, while storing the remark information in a database ensures the durability of the data. A sketch of this caching step follows.
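A minimal sketch, assuming the Node.js ioredis client; the VIDEO_ANNOTATION key prefix and the payload field names are hypothetical:

```typescript
import Redis from 'ioredis';

const redis = new Redis(); // connects to localhost:6379 by default

// Key: constant + user mailbox prefix + video id; field: the reporting time point;
// value: a JSON string with the period's start/stop time, segment score and video length.
async function cacheTemporaryAnnotation(
  mailboxPrefix: string,
  videoId: string,
  reportTime: number,
  payload: {
    item: string;
    score: number;
    startSec: number;
    endSec: number;
    videoLenSec: number;
  },
): Promise<void> {
  const key = `VIDEO_ANNOTATION:${mailboxPrefix}:${videoId}`; // hypothetical prefix
  await redis.hset(key, String(reportTime), JSON.stringify(payload));
}
```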
In some optional implementations of this embodiment, aggregating the temporary annotation information of each annotation item to obtain the final annotation information includes: reading all values of the hash structure from Redis; constructing mappings among annotation items, segment scores, and time periods from all values of the hash structure; for time periods sharing the same annotation item and the same segment score, summing the durations and computing their proportion of the video length as a single aggregation result; and splicing each single aggregation result with the remark information to form the final annotation information. In detail:
1. First, splice the hash key for the user-and-material dimension and query all hash values under that key.
2. Traverse the JSON strings stored under the different fields and cast them into an array.
3. For the same annotation item, build separate new arrays, each storing the mapping between one scoring result and the information of its several time periods; also build a new array storing the mapping between annotation items and time periods. The new array mapping is shown in FIG. 3b.
4. Traverse the arrays; for time-period information with the same annotation item and the same score, sum the durations, compute the proportion of the total video duration, and map the start and end nodes into "minutes:seconds" format, finally yielding a single result such as: "[Item 1] 00:01-00:03, 00:05-00:10, 00:18-00:19 (8 seconds in total), proportion xx%".
5. Splice all single results to obtain the aggregation result of the segment annotations.
In this way, the temporary annotation information can be aggregated quickly; a compact sketch of steps 1-5 follows.
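Putting steps 1-5 together, a compact sketch of the aggregation, reusing the redis client from the caching sketch; the field names remain the hypothetical ones used there, and whole seconds are assumed:

```typescript
type Group = { periods: Array<[number, number]>; videoLen: number };

// Aggregate all cached periods of one video: group by (item, score), sum the
// durations, compute the share of the total video length, and format "mm:ss".
async function aggregateFinalAnnotation(key: string): Promise<string[]> {
  const fields = await redis.hgetall(key); // all hash fields for this user/video
  const groups = new Map<string, Group>();

  for (const json of Object.values(fields)) {
    const p = JSON.parse(json);
    if (p.deleted) continue; // skip logically deleted entries (discussed below)
    const gk = `${p.item} / score ${p.score}`; // same item + same score -> one group
    const g: Group = groups.get(gk) ?? { periods: [], videoLen: p.videoLenSec };
    g.periods.push([p.startSec, p.endSec]);
    groups.set(gk, g);
  }

  const mmss = (s: number) =>
    `${String(Math.floor(s / 60)).padStart(2, '0')}:${String(s % 60).padStart(2, '0')}`;

  return [...groups.entries()].map(([gk, g]) => {
    const total = g.periods.reduce((sum, [a, b]) => sum + (b - a), 0);
    const ratio = ((total / g.videoLen) * 100).toFixed(1);
    const spans = g.periods.map(([a, b]) => `${mmss(a)}-${mmss(b)}`).join(', ');
    return `[${gk}] ${spans} (${total} seconds in total), proportion ${ratio}%`;
  });
}
```

The "[item / score]" label stands in for whatever heading the platform renders for a group; only the grouping and the proportion arithmetic matter here.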
In some optional implementations of this embodiment, the method further includes: verifying the start-stop time of each time period against the video length stored in the hash values, and filtering out the hash-structure values that fail the check. In step 2 above, outliers can thereby be removed (a period whose start is less than 0, whose end node exceeds the total video duration, whose end precedes its start, and so on). Filtering outliers after time-point verification prevents them from distorting the evaluation of the video; the anomalies can also be flagged, providing quality inspection of the manual annotation process. The check reduces to a simple predicate, sketched below under the same assumptions.
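```typescript
// Reject abnormal periods before aggregation: a start below zero, an end beyond
// the video length, or an end that does not come after the start.
function isValidPeriod(startSec: number, endSec: number, videoLenSec: number): boolean {
  return startSec >= 0 && endSec <= videoLenSec && endSec > startSec;
}
```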
On the online platform, once the user has finished all temporary annotations of a single material, the relevant information has already been written to the database and Redis. When the user switches from the temporary-annotation tab to the final-annotation tab, the aggregated segment information is displayed; at this point the automatic computation of the video segment annotation is complete.
In some optional implementations of this embodiment, the method further includes: acquiring newly added temporary annotation information of the target video; updating the hash structure according to the newly added temporary annotation information; and re-aggregating the temporary annotation information of each annotation item according to the updated hash structure to obtain the final annotation information. If the user finds that temporary annotation information is missing, the reporting operation can simply be repeated: the cached information is updated, the value of the new time point becomes a new field, and the new time period's value information is added. Repeating the aggregation yields a new computation result for the video segment annotations. New temporary annotation information can thus be added flexibly, and the computation accumulates on top of the existing aggregation result, saving computation time. A sketch of this update path follows.
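Under the same hypothetical names as the earlier sketches, the update is one more hash write followed by a re-run of the aggregation:

```typescript
// Adding a missed report reuses the same hash: the new time point becomes a new
// field, and the aggregation is simply re-run over the updated hash.
async function appendTemporaryAnnotation(
  key: string,
  newReportTime: number,
  payload: object, // same hypothetical payload shape as in the caching sketch
): Promise<string[]> {
  await redis.hset(key, String(newReportTime), JSON.stringify(payload));
  return aggregateFinalAnnotation(key); // recompute the segment aggregation
}
```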
In some optional implementations of this embodiment, the method further includes: in response to receiving a request to delete target temporary annotation information, logically deleting the target temporary annotation information in the hash structure; and re-aggregating the temporary annotation information of each annotation item according to the hash structure after the logical deletion to obtain the final annotation information. If the user finds that some temporary annotation information is wrong, the reported information can be logically deleted (the status of the temporary annotation data is set to deleted): the cached temporary annotation information is updated, and the value information under the field of the reporting time point is marked accordingly. Repeating the aggregation yields a new computation result for the video segment annotations. Because the temporary annotation information is not physically removed from Redis, deletion is fast; moreover, when the data needs to be recovered, setting the status back to not-deleted restores it quickly without rewriting. One possible realization is sketched below.
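A minimal sketch of the logical deletion, again with hypothetical names; the deleted flag is an assumed convention, and the aggregation sketch above skips entries that carry it:

```typescript
// Logical deletion: the field stays in the hash, but its value is flagged as
// deleted, so recovery only needs the flag flipped back (no rewrite of the data).
async function logicallyDeleteAnnotation(key: string, reportTime: number): Promise<string[]> {
  const json = await redis.hget(key, String(reportTime));
  if (json !== null) {
    const p = JSON.parse(json);
    p.deleted = true; // assumed status flag
    await redis.hset(key, String(reportTime), JSON.stringify(p));
  }
  return aggregateFinalAnnotation(key); // recompute without the deleted entry
}
```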
The overall video annotation flow is shown in FIG. 3d. On the annotation page, the user can view various annotation information, such as the annotation item set, the video being annotated, basic video information, annotation progress, dynamically rendered breakpoints on the progress bar, and the display of temporary annotation information. While watching the video, the user scores each annotation item segment by segment, e.g., subtitle overlap below 2 points for seconds 0-10, content difference 0, and so on, and may also write remark information, e.g., the 2:06 shot cuts too quickly. After annotating, the user clicks report: the time-period information is stored in Redis and the remark information in the database (db). When the system detects that the user has annotated the complete video, segment annotation computation starts: the hash structure is read from the db and Redis, dirty data is filtered out (abnormal hash values identified by conflicts between the time points and the video length, etc.), the overall mapping is constructed, single-segment proportions are computed, and finally the multi-segment information is aggregated. Adding and modifying temporary annotation information and displaying annotation information are also supported. The usage effect is shown in FIG. 3e.
The scheme of this application fills a gap in video segment annotation. It can improve the efficiency of manual video annotation; online annotation makes progress monitoring and data tracing convenient; and it improves the accuracy of the data computation and the confidence of each index item's annotation result.
With further reference to FIG. 4, a flow 400 of yet another embodiment of a video annotation method is shown. The video annotation method flow 400 comprises the steps of:
Step 401: acquire temporary annotation information of at least one annotation item of the target video and the corresponding time periods.
Step 402: render buried points on the time axis of the player for the temporary annotation information corresponding to each time period.
Step 403: aggregate the temporary annotation information of each annotation item to obtain the final annotation information.
Steps 401-403 are substantially identical to steps 201-203 and are therefore not repeated here.
Step 404: receive an annotation information query request.
In this embodiment, as shown in FIG. 3c, the annotation material ID, i.e., the target video ID, may be entered to query annotation information. The user may also choose to query the basic information of the annotation task (e.g., reviewers, review flow), the set of annotation items, the basic information of the annotation material (e.g., title, URL), or the annotation results (temporary and final annotation results can be queried separately). After the basic information of the annotation task, the annotation item set, and the annotation material has been selected and queried, basic information such as the current material progress, the playing video, and the title can be rendered on the front end.
Step 405: query the temporary annotation information and the final annotation information.
In this embodiment, the temporary and final annotation information may be queried from Redis and the db. The data related to time periods is stored in Redis; the rest may be stored in the db, and the db may also back up the entire contents of Redis.
Step 406: render the queried temporary and final annotation information on the time axis of the player.
In this embodiment, if querying the temporary annotation results is selected, the temporary annotation time points may be aggregated and then assigned by the front end into player.markers (the dynamic rendering component), so that the progress bar renders the time points. If querying the final annotation results is selected, the scores of all annotation items are parsed and rendered.
This embodiment achieves online annotation, which makes progress monitoring and data tracing convenient. The annotation progress is visualized at query time, repeated or missed annotation is prevented, and annotation efficiency improves.
With further reference to FIG. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of a video annotation device; this device embodiment corresponds to the method embodiment shown in FIG. 2, and the device can be applied to various electronic devices.
As shown in FIG. 5, the video annotation device 500 of this embodiment includes an acquisition unit 501, a rendering unit 502, an aggregation unit 503, and an output unit 504. The acquisition unit 501 is configured to acquire temporary annotation information of at least one annotation item of the target video and the corresponding time periods; the rendering unit 502 is configured to render buried points on the time axis of the player for the temporary annotation information corresponding to each time period; the aggregation unit 503 is configured to aggregate the temporary annotation information of each annotation item to obtain the final annotation information; and the output unit 504 is configured to output the final annotation information.
In this embodiment, the specific processing of the obtaining unit 501, the rendering unit 502, the aggregation unit 503, and the output unit 504 of the video labeling apparatus 500 may refer to step 201, step 202, step 203, and step 204 in the corresponding embodiment of fig. 2.
In some optional implementations of this embodiment, the temporary annotation information includes a segment score and remark information for each time period, and the device 500 further comprises a storage unit (not shown in the drawings) configured to: construct the key of a hash structure from the video identifier of the target video, construct the fields of the hash structure from the different time periods, construct the values of the hash structure from each time period's start-stop time, segment score, and the video length, and store them in Redis; and store the remark information in a database under the video identifier.
In some optional implementations of this embodiment, the aggregation unit 503 is further configured to: read all values of the hash structure from Redis; construct mappings among annotation items, segment scores, and time periods from all values of the hash structure; for time periods sharing the same annotation item and the same segment score, sum the durations and compute their proportion of the video length as a single aggregation result; and splice each single aggregation result with the remark information to form the final annotation information.
In some optional implementations of this embodiment, the device 500 further includes a filtering unit (not shown in the drawings) configured to: verify the start-stop time of each time period against the video length in the values of the hash structure; and filter out the hash-structure values that fail the verification.
In some optional implementations of this embodiment, the device 500 further includes an adding unit (not shown in the drawings) configured to: acquire newly added temporary annotation information of the target video; update the hash structure according to the newly added temporary annotation information; and re-aggregate the temporary annotation information of each annotation item according to the updated hash structure to obtain the final annotation information.
In some optional implementations of this embodiment, the device 500 further includes a deleting unit (not shown in the drawings) configured to: in response to receiving a request to delete target temporary annotation information, logically delete the target temporary annotation information in the hash structure; and re-aggregate the temporary annotation information of each annotation item according to the hash structure after the logical deletion to obtain the final annotation information.
In some optional implementations of this embodiment, the device 500 further includes a query unit (not shown in the drawings) configured to: receive an annotation information query request; query the temporary annotation information and the final annotation information; and render the queried temporary and final annotation information on the time axis of the player.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of users' personal information comply with the relevant laws and regulations and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of flow 200 or 400.
A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of flow 200 or 400.
A computer program product comprising a computer program that when executed by a processor implements the method of flow 200 or 400.
Fig. 6 illustrates a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the apparatus 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 may also be stored. The computing unit 601, ROM 602, and RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Various components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, mouse, etc.; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the various methods and processes described above, such as video annotation methods. For example, in some embodiments, the video annotation method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into RAM 603 and executed by the computing unit 601, one or more steps of the video annotation method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the video annotation method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or a middleware component (e.g., an application server), or a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (15)

1. A video annotation method, comprising:
acquiring temporary annotation information of at least one annotation item of a target video and the corresponding time periods, wherein the temporary annotation information comprises the annotation item's score for a single time period and the segment score and remark information of each time period, and the annotation item comprises at least one of the following: speech, picture, subtitles, wherein the subtitle score comprises at least one of the following: subtitle overlap, subtitle usability, subtitle sentence breaks;
rendering buried points on a time axis of a player for the temporary annotation information corresponding to each time period, based on the dynamic time-point rendering capability provided by the video.js plug-in, wherein the temporary annotation information is obtained from a server by clicking a rendered buried point;
aggregating the temporary annotation information of each annotation item according to the start and end values of the reported time periods to obtain final annotation information;
outputting the final annotation information;
judging, according to the final annotation information, whether the video is to be published: if not, returning it for revision; if so, publishing it to the network platform;
constructing a key of a hash structure according to the video identifier of the target video, constructing fields of the hash structure according to the different time periods, constructing values of the hash structure according to each time period's start-stop time, segment score, and the video length, and storing them into Redis;
storing the remark information into a database according to the video identifier;
wherein the aggregating of the temporary annotation information of each annotation item according to the start and end values of the reported time periods to obtain the final annotation information comprises:
aggregating the temporary annotation information having the same score under the same annotation item across all time periods, and computing the proportion of the total video duration occupied by each score of each annotation item by summing the time periods, to obtain final annotation information for evaluating the video quality.
2. The method of claim 1, wherein the aggregating of the temporary annotation information of each annotation item according to the start and end values of the reported time periods to obtain the final annotation information comprises:
reading all values of the hash structure from Redis;
constructing mappings among the annotation items, the segment scores, and the time periods according to all values of the hash structure;
for time periods with the same annotation item and the same segment score, summing the durations and computing their proportion of the video length as a single aggregation result; and
splicing each single aggregation result with the remark information to form the final annotation information.
3. The method of claim 2, wherein the method further comprises:
verifying the start-stop time of each time period according to the video length in the values of the hash structure; and
filtering out the hash-structure values that fail the verification.
4. The method of claim 1, wherein the method further comprises:
acquiring newly added temporary annotation information of the target video;
updating the hash structure according to the newly added temporary annotation information; and
re-aggregating the temporary annotation information of each annotation item according to the updated hash structure to obtain the final annotation information.
5. The method of claim 1, wherein the method further comprises:
in response to receiving a request for deleting target temporary annotation information, logically deleting the target temporary annotation information in the hash structure; and
re-aggregating the temporary annotation information of each annotation item according to the hash structure after the logical deletion of the target temporary annotation information, to obtain the final annotation information.
6. The method of claim 1, wherein the method further comprises:
receiving an annotation information query request;
querying the temporary annotation information and the final annotation information; and
rendering the queried temporary annotation information and final annotation information on the time axis of the player.
7. A video annotation device, comprising:
an acquisition unit configured to acquire temporary annotation information of at least one annotation item of a target video and the corresponding time periods, wherein the temporary annotation information comprises the annotation item's score for a single time period and the segment score and remark information of each time period, and the annotation item comprises at least one of the following: speech, picture, subtitles, wherein the subtitle score comprises at least one of the following: subtitle overlap, subtitle usability, subtitle sentence breaks;
a rendering unit configured to render buried points on a time axis of the player for the temporary annotation information corresponding to each time period, based on the dynamic time-point rendering capability provided by the video.js plug-in, wherein the temporary annotation information is obtained from the server by clicking a rendered buried point;
an aggregation unit configured to aggregate the temporary annotation information of each annotation item according to the start and end values of the reported time periods to obtain final annotation information;
an output unit configured to output the final annotation information, and to judge, according to the final annotation information, whether the video is to be published, returning it for revision if not, and publishing it to the network platform if so; and
a storage unit configured to construct a key of a hash structure according to the video identifier of the target video, construct fields of the hash structure according to the different time periods, construct values of the hash structure according to each time period's start-stop time, segment score, and the video length, store them into Redis, and store the remark information into a database according to the video identifier;
wherein the aggregation unit is further configured to:
aggregate the temporary annotation information having the same score under the same annotation item across all time periods, and compute the proportion of the total video duration occupied by each score of each annotation item by summing the time periods, to obtain the final annotation information.
8. The device of claim 7, wherein the aggregation unit is further configured to:
read all values of the hash structure from Redis;
construct mappings among the annotation items, the segment scores, and the time periods according to all values of the hash structure;
for time periods with the same annotation item and the same segment score, sum the durations and compute their proportion of the video length as a single aggregation result; and
splice each single aggregation result with the remark information to form the final annotation information.
9. The device of claim 8, wherein the device further comprises a filtering unit configured to:
verify the start-stop time of each time period according to the video length in the values of the hash structure; and
filter out the hash-structure values that fail the verification.
10. The device of claim 7, wherein the device further comprises an adding unit configured to:
acquire newly added temporary annotation information of the target video;
update the hash structure according to the newly added temporary annotation information; and
re-aggregate the temporary annotation information of each annotation item according to the updated hash structure to obtain the final annotation information.
11. The device of claim 7, wherein the device further comprises a deleting unit configured to:
in response to receiving a request for deleting target temporary annotation information, logically delete the target temporary annotation information in the hash structure; and
re-aggregate the temporary annotation information of each annotation item according to the hash structure after the logical deletion of the target temporary annotation information, to obtain the final annotation information.
12. The apparatus of claim 7, wherein the apparatus further comprises a query unit configured to:
receive an annotation information query request;
query the temporary annotation information and the final annotation information;
render the queried temporary annotation information and final annotation information on the time axis of the player.
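A sketch of the timeline rendering follows. video.js itself has no built-in marker API; community plugins such as videojs-markers provide the time-point rendering the claims rely on, and this sketch instead positions plain DOM markers on the progress bar by hand. The CSS class, the fetch URL, and the marker shape are all assumptions.

```typescript
// Sketch of claim 12's rendering: place one marker per period on the
// video.js progress bar, proportional to its start time.
import videojs from "video.js";

function renderMarkers(
  player: ReturnType<typeof videojs>,
  periods: Array<{ start: number; item: string; score: number }>
): void {
  const bar = player.el().querySelector<HTMLElement>(".vjs-progress-holder");
  if (!bar) return;
  for (const p of periods) {
    const marker = document.createElement("div");
    marker.className = "vjs-annotation-marker"; // assumed class, styled elsewhere
    marker.style.position = "absolute";
    // Place the buried point at its proportional position on the time axis.
    marker.style.left = `${(p.start / player.duration()) * 100}%`;
    marker.title = `${p.item}: score ${p.score}`;
    // Clicking a rendered buried point fetches the full temporary annotation
    // information from the server (the URL is illustrative).
    marker.addEventListener("click", () =>
      fetch(`/api/annotations?start=${p.start}`).then((r) => r.json())
    );
    bar.appendChild(marker);
  }
}
```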
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-6.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-6.
CN202210975567.5A 2022-08-15 2022-08-15 Video labeling method and device Active CN115334354B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210975567.5A CN115334354B (en) 2022-08-15 2022-08-15 Video labeling method and device

Publications (2)

Publication Number Publication Date
CN115334354A 2022-11-11
CN115334354B 2023-12-29

Family

ID=83924789

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210975567.5A Active CN115334354B (en) 2022-08-15 2022-08-15 Video labeling method and device

Country Status (1)

Country Link
CN (1) CN115334354B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014002004A1 (en) * 2012-06-25 2014-01-03 Batchu Sumana Krishnaiahsetty A method for marking highlights in a multimedia file and an electronic device thereof
CN106101573A (en) * 2016-06-24 2016-11-09 中译语通科技(北京)有限公司 The grappling of a kind of video labeling and matching process
CN111526405A (en) * 2020-04-30 2020-08-11 网易(杭州)网络有限公司 Media material processing method, device, equipment, server and storage medium
CN111757170A (en) * 2020-07-01 2020-10-09 三星电子(中国)研发中心 Method and device for segmenting and marking video
CN112100159A (en) * 2020-09-27 2020-12-18 北京有竹居网络技术有限公司 Data processing method and device, electronic equipment and computer readable medium
CN112418088A (en) * 2020-11-23 2021-02-26 华中师范大学 Video learning resource extraction and knowledge annotation method and system based on crowd-sourcing
CN112528898A (en) * 2020-12-17 2021-03-19 长扬科技(北京)有限公司 Alarm event aggregation method and device based on multi-target detection of surveillance videos
CN113207039A (en) * 2021-05-08 2021-08-03 腾讯科技(深圳)有限公司 Video processing method and device, electronic equipment and storage medium
CN113268180A (en) * 2021-05-14 2021-08-17 北京字跳网络技术有限公司 Data annotation method, device, equipment, computer readable storage medium and product
CN113949920A (en) * 2021-12-20 2022-01-18 深圳佑驾创新科技有限公司 Video annotation method and device, terminal equipment and storage medium
CN114898426A (en) * 2022-04-20 2022-08-12 国网智能电网研究院有限公司 Synonym label aggregation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN115334354A (en) 2022-11-11

Similar Documents

Publication Publication Date Title
CN109213747B (en) Data management method and device
US20150254572A1 (en) Relevance-ordering of messages
US9418117B1 (en) Displaying relevant messages of a conversation graph
US20200074509A1 (en) Business data promotion method, device, terminal and computer-readable storage medium
CN109889891A (en) Obtain the method, apparatus and storage medium of target media file
CN113779381B (en) Resource recommendation method, device, electronic equipment and storage medium
CN114969161B (en) Data processing method and device and data center system
CN115099239B (en) Resource identification method, device, equipment and storage medium
CN112561332A (en) Model management method, model management apparatus, electronic device, storage medium, and program product
CN105354283A (en) Resource searching method and apparatus
CN115334354B (en) Video labeling method and device
CN116955856A (en) Information display method, device, electronic equipment and storage medium
CN115293291B (en) Training method and device for sequencing model, sequencing method and device, electronic equipment and medium
CN116304236A (en) User portrait generation method and device, electronic equipment and storage medium
CN104281581A (en) Method and system for monitoring exposure of content at recommendation position of webpage
CN115905322A (en) Service processing method and device, electronic equipment and storage medium
CN115329150A (en) Method and device for generating search condition tree, electronic equipment and storage medium
CN114880498A (en) Event information display method and device, equipment and medium
CN104866545A (en) Method for searching keywords on information display page
CN114706610A (en) Business flow chart generation method, device, equipment and storage medium
CN115774712A (en) Metadata management method, system, device and storage medium
CN113326357A (en) Method and device for consulting document information, electronic equipment and computer readable medium
CN113886637A (en) Video coarse sorting method and device, electronic equipment and storage medium
CN113590914A (en) Information processing method, device, electronic equipment and storage medium
CN108536872B (en) Method and device for optimizing knowledge base structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant