CN115474089A - Audio and video online examination method and related equipment - Google Patents

Audio and video online examination method and related equipment

Info

Publication number
CN115474089A
CN115474089A
Authority
CN
China
Prior art keywords
audio
instruction
video
annotation
image
Prior art date
Legal status
Pending
Application number
CN202210971679.3A
Other languages
Chinese (zh)
Inventor
谭熙
Current Assignee
Shenzhen Big Head Brothers Technology Co Ltd
Original Assignee
Shenzhen Big Head Brothers Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Big Head Brothers Technology Co Ltd filed Critical Shenzhen Big Head Brothers Technology Co Ltd
Priority to CN202210971679.3A priority Critical patent/CN115474089A/en
Publication of CN115474089A publication Critical patent/CN115474089A/en
Pending legal-status Critical Current

Classifications

    • H04N 21/44008 — analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 21/435 — processing of additional data, e.g. decrypting of additional data
    • H04N 21/44 — processing of video elementary streams
    • H04N 21/47205 — end-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • H04N 21/8133 — additional data specifically related to the content, e.g. detailed information about an article seen in a video program
    • H04N 21/8455 — structuring of content involving pointers to the content, e.g. pointers to the I-frames of the video stream
    • H04N 21/8547 — content authoring involving timestamps for synchronizing content
    • H04N 5/262 — studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects

Abstract

The invention discloses an audio and video online review method and related equipment. The method comprises: acquiring an audio/video file to be processed; parsing the audio/video file to obtain a time axis corresponding to the audio/video file; when a positioning instruction for the time axis is detected, displaying target information in the audio/video file according to the positioning instruction; and when an annotation instruction for the target information is detected, generating annotation information corresponding to the target information according to the annotation instruction. The invention makes it convenient for users to annotate and modify audio/video files and improves efficiency.

Description

Audio and video online examination method and related equipment
Technical Field
The invention relates to the technical field of multimedia processing, and in particular to an audio/video online review method and related equipment.
Background
As the barriers to video shooting and processing fall, more and more users can easily shoot and produce videos. During video editing, team collaboration is often needed to improve the quality of the video in post-production. Current video post-processing typically operates frame by frame, but a video contains a large number of frames, so after an issue is raised the corresponding frames must be repeatedly located during revision, which is inefficient. Direct offline communication, on the other hand, requires all team members to be present at the same time, making frequent and efficient discussion of revisions impossible.
Disclosure of Invention
In view of the deficiencies of the prior art, the invention aims to solve the technical problem of inefficient annotation and modification of audio/video, and provides an audio/video online review method and related equipment.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
An audio/video online review method, comprising:
acquiring an audio/video file to be processed;
analyzing the audio and video file to obtain a time axis corresponding to the audio and video file;
when a positioning instruction for the time axis is detected, target information in the audio and video file is displayed according to the positioning instruction;
and when an annotation instruction for the target information is detected, generating annotation information corresponding to the target information according to the annotation instruction.
In the audio/video online review method, the time axis comprises an image axis and/or an audio axis, and the positioning instruction comprises an image instruction for the image axis and an audio instruction for the audio axis; displaying the corresponding target information in the audio/video file according to the positioning instruction comprises:
when the image instruction is detected, displaying the image information corresponding to the audio/video file according to the image instruction;
and when the audio instruction is detected, displaying the audio information corresponding to the audio/video file according to the audio instruction.
In the audio/video online review method, the image instruction comprises a single-frame instruction and a multi-frame instruction; displaying the target information in the audio/video file according to the positioning instruction comprises:
when the positioning instruction is a single-frame instruction, taking the corresponding frame image in the audio/video file as target information and displaying it according to the timestamp corresponding to the single-frame instruction;
and when the positioning instruction is a multi-frame instruction, taking the corresponding image set in the audio/video file as target information and displaying it according to the start time and end time corresponding to the multi-frame instruction.
In the audio/video online review method, taking the corresponding image set in the audio/video file as target information and displaying it according to the start time and end time corresponding to the multi-frame instruction comprises:
determining a start image and an end image in the audio/video file according to the start frame corresponding to the start time and the end frame corresponding to the end time;
taking the video images between the start image and the end image as an image set;
and displaying the image set according to a preset preview rule.
In the audio/video online review method, the annotation instruction comprises a start instruction and annotation text; generating the annotation information according to the annotation instruction comprises:
when a start instruction corresponding to the target information is detected, activating a preset annotation area;
and when annotation text in the annotation area is detected, generating annotation information according to the positioning instruction and the annotation text.
In the audio/video online review method, generating annotation information according to the positioning instruction and the annotation text comprises:
generating time information according to the positioning instruction;
determining an annotation object according to a time axis corresponding to the positioning instruction;
and generating annotation information according to the time information, the annotation object and the annotation text.
The audio/video online review method further comprises:
when a modification instruction for the audio/video file is detected, determining, according to the timestamp corresponding to the modification instruction, whether the modification instruction corresponds to the annotation information;
and if so, generating a modification remark according to the modification instruction.
In the audio/video online review method, before acquiring the audio/video file to be processed, the method further comprises:
acquiring a file to be processed;
performing shot recognition on the file to be processed to obtain boundary frames corresponding to different shots;
and splitting the file to be processed according to the boundary frames to obtain a plurality of audio/video files.
A computer-readable storage medium storing one or more programs executable by one or more processors to implement the steps of any of the audio/video online review methods described above.
A terminal device, comprising: a processor, a memory, and a communication bus, wherein the memory stores a computer-readable program executable by the processor;
the communication bus realizes connection and communication between the processor and the memory;
and the processor, when executing the computer-readable program, implements the steps of the audio/video online review method described above.
Beneficial effects: the invention provides an audio/video online review method and related equipment. When a user needs to annotate the audio/video at a certain moment or during a certain time period, the user first issues a positioning instruction, and the target information is displayed according to that instruction so the user can confirm whether the content needs annotation. After confirming, the user inputs an annotation instruction, and annotation information corresponding to that moment or time period is generated. This allows users to conveniently annotate and modify audio/video files and improves working efficiency.
Drawings
Fig. 1 is a flowchart of the audio/video online review method provided by the present invention.
Fig. 2 is a schematic diagram of boundary-frame determination in the audio/video online review method provided by the present invention.
Fig. 3 is a schematic diagram of a first target information display in the audio/video online review method provided by the present invention.
Fig. 4 is a schematic diagram of a second target information display in the audio/video online review method provided by the present invention.
Fig. 5 is a schematic structural diagram of a terminal device provided in the present invention.
Detailed Description
The invention provides an audio/video online review method. To make the purpose, technical scheme and effect of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As shown in fig. 1, this embodiment provides an audio/video online review method. For convenience of description, a common server is used as the execution subject; the server may be replaced with a tablet, a computer, or another device with data processing capability. The audio/video online review method includes the following steps:
and S10, acquiring an audio and video file to be processed.
Specifically, an audio/video file to be processed is obtained first; an audio/video file may contain audio or video. For example, audio/video file 1 is a video file and audio/video file 2 is an audio file.
Generally, video files are large, and annotating directly on a full video file requires considerable computing power. To improve efficiency, in this embodiment the file to be processed may be split into a plurality of audio/video files, each a fragment of the file to be processed, so that processing efficiency is improved. Further, a video generally consists of a plurality of shots; if it is cut at arbitrary points, a shot is likely to be split apart, which disrupts subsequent viewing. Therefore, in this embodiment, before the audio/video file is acquired, the method further includes:
and A10, acquiring a file to be processed.
Specifically, a file to be processed is first obtained, and the file to be processed is a video file.
And A20, carrying out shot identification on the file to be processed to obtain boundary frames corresponding to different shots.
Specifically, a video file contains a plurality of shots. As shown in fig. 2, the frame image at which the video switches from one shot to another, i.e. the boundary frame, can be determined.
For example, an optical flow analysis algorithm is used to analyze the file to be processed and determine the boundary moments. Optical flow is the instantaneous velocity of the pixel motion of a spatially moving object projected onto the imaging plane. Between frame images shot within the same shot, a consistent instantaneous velocity can be computed; between frame images from different shots, matching pixels are hard to find, so the computed instantaneous velocity becomes abnormal. A normal range of optical flow values, i.e. an optical flow threshold, can therefore be preset.
An optical flow value is calculated between each frame image in the video file and its preceding frame image; when the optical flow value falls outside the optical flow threshold range, that frame image is taken as a boundary frame.
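The boundary-frame check above can be sketched as follows. This is a minimal, hypothetical stand-in: the mean absolute difference between consecutive frames plays the role of the optical flow value (a real implementation would use a dense optical flow algorithm), frames are simple lists of pixel intensities, and all names are illustrative.

```python
# Hypothetical sketch of boundary-frame detection; the mean absolute pixel
# difference stands in for the optical-flow value described in the text.

def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel difference between two equally sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def find_boundary_frames(frames, threshold):
    """Return indices of frames whose difference from the previous frame
    exceeds the preset threshold (the 'optical flow threshold')."""
    boundaries = []
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            boundaries.append(i)
    return boundaries

# Two synthetic "shots": dark frames followed by bright frames.
frames = [[10, 10, 10]] * 3 + [[200, 200, 200]] * 3
print(find_boundary_frames(frames, threshold=50))  # -> [3]
```

The same loop structure applies if the similarity measure mentioned below is used instead of a difference measure; only the comparison function changes.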
Besides optical flow analysis, boundary frames can also be determined from the similarity between adjacent frame images.
After a boundary frame is determined, the boundary frame and several frames before and after it can be shown on the display screen; if the user finds on inspection that the boundary frame is wrong, a suitable boundary frame can be manually reselected.
And A30, splitting the file to be processed according to the boundary frame to obtain a plurality of audio and video files.
Specifically, after the boundary frames are obtained, they are used as dividing points to split the file to be processed: the frame images before the first boundary frame form one group, the frame images after the first boundary frame and before the second boundary frame form a second group, and so on, until all frame images in the file to be processed are divided into several image groups. The audio of the file to be processed is split at the same time according to the moments corresponding to the boundary frames, yielding several audio segments. Each audio segment is then combined with the image group of the same time period in the file to be processed, giving the audio/video files that can subsequently be annotated.
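The splitting step can be sketched as a small function (names illustrative; frames are stand-in strings here). Each boundary frame starts a new group, and the case of zero boundary frames keeps the file whole.

```python
# Minimal sketch of splitting a frame list into groups using boundary-frame
# indices as dividing points, as described above.

def split_by_boundaries(frames, boundary_indices):
    """Each boundary frame starts a new group; frames before the first
    boundary form the first group. No boundaries -> one group."""
    if not boundary_indices:
        return [frames]
    groups, start = [], 0
    for b in boundary_indices:
        groups.append(frames[start:b])
        start = b
    groups.append(frames[start:])
    return groups

frames = ["f0", "f1", "f2", "f3", "f4", "f5"]
print(split_by_boundaries(frames, [3]))  # -> [['f0','f1','f2'], ['f3','f4','f5']]
print(split_by_boundaries(frames, []))   # -> [['f0','f1','f2','f3','f4','f5']]
```

The corresponding audio would be cut at the same time offsets, then paired with each group.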
If no boundary frame exists, i.e. the number of boundary frames is zero, the file to be processed is used directly as the audio/video file.
Further, teamwork requires multiple people to cooperate; for example, one user manages one shot. A management account corresponding to each shot can therefore be preset, with each management account corresponding to one shot. After the file is split into audio/video files, the corresponding management account is determined according to the shot to which each audio/video file corresponds.
For example, a management account A corresponding to the first shot is preset, and the first audio/video file of the file to be processed is sent to management account A.
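The routing of split files to accounts reduces to a preset mapping; a tiny sketch follows (account names and file names are made up for illustration).

```python
# Sketch: a preset mapping from shot index to management account, used to
# route each split audio/video file to its manager.

shot_accounts = {0: "account_A", 1: "account_B"}  # preset per shot

def route_files(av_files, shot_accounts):
    """Pair each audio/video file (ordered by shot) with its account."""
    return [(shot_accounts.get(i), f) for i, f in enumerate(av_files)]

print(route_files(["clip0.mp4", "clip1.mp4"], shot_accounts))
# -> [('account_A', 'clip0.mp4'), ('account_B', 'clip1.mp4')]
```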
And S20, analyzing the audio and video file to obtain a time axis corresponding to the audio and video file.
Specifically, both audio files and video files are played along a time axis. To make it convenient for the user to pinpoint the moments that need modification, the acquired audio/video file is parsed to obtain the time axis corresponding to it.
To distinguish them, the time axis corresponding to an audio file is called the audio axis, and the time axis corresponding to a video file is called the image axis.
And S30, when a positioning instruction for the time axis is detected, displaying the target information in the audio/video file according to the positioning instruction.
Specifically, the user can send a positioning instruction to the computer through an external device, for example by clicking the time axis with the mouse. Each position on the time axis corresponds to a moment of the audio/video file; when a positioning instruction is detected, its corresponding moment can be determined from the coordinate of the instruction, and the information at that moment in the audio/video file is taken as the target information and displayed.
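The coordinate-to-moment mapping just described is linear; a minimal sketch follows (the axis origin, width, and duration are illustrative parameters, not values from the patent).

```python
# Sketch: map a click's x coordinate on the rendered time axis to a media
# timestamp in seconds, clamped to the clip's duration.

def coord_to_time(click_x, axis_x0, axis_width, duration_s):
    """Linearly map a pixel offset along the time axis to seconds."""
    frac = (click_x - axis_x0) / axis_width
    return max(0.0, min(1.0, frac)) * duration_s

# A 600-px time axis starting at x=100 for a 120 s clip:
print(coord_to_time(400, axis_x0=100, axis_width=600, duration_s=120))  # -> 60.0
```

Clamping keeps clicks slightly outside the drawn axis from producing out-of-range moments.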
In the above example, the target information to be annotated corresponds to a single moment. The positioning instruction may instead correspond to a time period, for example by dragging the mouse while holding the button: the moment on the time axis corresponding to the start point of the drag is taken as the start time, and the moment corresponding to the end point as the end time. A time period is determined from the start time and end time, and the corresponding target information in the audio/video file can then be determined and displayed.
If the audio/video file is a video file, a positioning instruction on the image axis is called an image instruction. When an image instruction is detected, the one or more images corresponding to it are displayed. For a video file, positioning instructions are further divided into single-frame instructions and multi-frame instructions according to whether they correspond to a single moment or a time period. Since a single audio frame does not correspond to a perceivable moment, in this embodiment a positioning instruction on an audio file can only correspond to a time period.
When the positioning instruction is a single-frame instruction, the corresponding frame image in the audio/video file is taken as the target information and displayed according to the timestamp corresponding to the single-frame instruction. For example, as shown in fig. 3, a display area for the target information is set above the time axis, and the frame image is displayed there.
When the positioning instruction is a multi-frame instruction, the corresponding image set in the audio/video file is taken as the target information and displayed according to the start time and end time corresponding to the multi-frame instruction: the start image and end image in the audio/video file are determined from the start frame corresponding to the start time and the end frame corresponding to the end time, and the video images between them are taken as the image set. The start and end images themselves may or may not be included in the image set. The image set is then displayed according to a preset preview rule.
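Resolving single-frame and multi-frame instructions into target information can be sketched as below, assuming a fixed frame rate and purely illustrative instruction tuples (the patent does not fix a data format). This variant includes both the start and end images in the set.

```python
# Sketch: turn a positioning instruction into the target frame(s),
# assuming a constant frame rate.

def time_to_frame(t_s, fps):
    """Nearest frame index for a timestamp in seconds."""
    return round(t_s * fps)

def resolve_target(frames, fps, instruction):
    """instruction: ('single', t) or ('multi', t_start, t_end)."""
    if instruction[0] == "single":
        return [frames[time_to_frame(instruction[1], fps)]]
    _, t_start, t_end = instruction
    start, end = time_to_frame(t_start, fps), time_to_frame(t_end, fps)
    return frames[start:end + 1]  # include both start and end images

frames = [f"frame{i}" for i in range(10)]  # 10 frames at 2 fps = 5 s clip
print(resolve_target(frames, 2, ("single", 1.0)))      # -> ['frame2']
print(resolve_target(frames, 2, ("multi", 1.0, 2.5)))  # -> frame2..frame5
```

Whether the boundary images belong to the set is a design choice, as noted above; excluding them would simply change the slice bounds.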
Because a multi-frame instruction corresponds to multiple frame images, one preview rule, shown in fig. 3, presets multiple display frames; when several frame images need to be shown, they are imported into the display frames in sequence so the image set is displayed. In another preview rule, shown in fig. 4, a thumbnail strip is generated containing a thumbnail for each frame image in the image set; when a click on one of the thumbnails is detected, the corresponding frame image is displayed. The former lets the user see many images of the target information at once, while the latter lets the user check frame by frame, so the two preview modes can be switched between.
And S40, when an annotation instruction aiming at the target information is detected, generating annotation information corresponding to the target information according to the annotation instruction.
Specifically, after the target information is determined, the user may input an annotation instruction for the target information, where the annotation instruction includes content to be annotated, and when the annotation instruction is detected, generate annotation information corresponding to the target information according to the content in the annotation instruction.
As shown in fig. 3, after generating the annotation information, the annotation information may be disposed on the left side of the display interface, and the annotation information may include information about a time or a time period, content of the annotation, time of the annotation, and the like.
Further, the annotation instruction may include a start instruction and annotation text: the start instruction starts the annotation, and the annotation text is the content the user wants to annotate. To make it convenient for the user to input and confirm the annotation text, a preset annotation area is activated when the start instruction is detected, and the user can then input the content to annotate, i.e. the annotation text. When the annotation text is detected, annotation information is generated from the positioning instruction and the annotation text. For example, time information is generated from the moment on the time axis corresponding to the positioning instruction. At the same time, the annotation object is determined from the time axis the positioning instruction refers to; the annotation object may be an image, audio, or video. For example, the annotation text "volume when a person is talking is adjusted" has audio as its object, "exposure is increased" has an image as its object, and "video is not harmonized with music tempo" has video as its object. For the first annotation text, the corresponding annotation information may combine the time information, the object (audio) and the annotation text.
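Assembling annotation information from these three parts can be sketched as a small record builder; field names and the tuple format are assumptions for illustration, not from the patent.

```python
# Sketch: build annotation information from the positioning instruction's
# time information, the annotation object (which axis was used), and the text.

def make_annotation(positioning, annotation_text, axis):
    """positioning: (t_start, t_end) in seconds; axis: 'audio'|'image'|'video'."""
    t_start, t_end = positioning
    return {
        "time": (t_start, t_end),  # time information from the instruction
        "object": axis,            # annotation object from the axis used
        "text": annotation_text,
    }

note = make_annotation((12.0, 15.5),
                       "volume when a person is talking is adjusted", "audio")
print(note["object"], note["time"])  # -> audio (12.0, 15.5)
```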
In addition, to make it convenient for the user to see the position corresponding to an annotation instruction, a preset prompt tag may be displayed, while the annotation information is generated, in the region of the time axis corresponding to the annotation information; for example, the background color of the corresponding time region on the time axis is changed, with a green background serving as the prompt tag. In fig. 4, for example, a region of the time axis is shown in light gray, indicating that annotation information exists for that time period.
Further, when the user modifies the audio/video file, a modification instruction for the audio/video file is sent to the server, and whether the modification instruction corresponds to annotation information is determined according to the timestamp of the modification instruction. For example, if the user modifies audio at a time that falls within the time period of an existing annotation, the modification instruction corresponds to that annotation information, and a modification remark is generated according to the modification instruction.
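The timestamp match can be sketched as an interval lookup over stored annotations (data shapes assumed, matching the illustrative record format rather than anything the patent specifies):

```python
# Sketch: check whether a modification instruction's timestamp falls within
# the time period of an existing annotation; a hit triggers a remark.

def matching_annotation(annotations, mod_time):
    """Return the first annotation whose [t_start, t_end] contains mod_time,
    or None if the modification matches no annotation."""
    for note in annotations:
        t_start, t_end = note["time"]
        if t_start <= mod_time <= t_end:
            return note
    return None

annotations = [{"time": (12.0, 15.5), "text": "adjust dialogue volume"}]
print(matching_annotation(annotations, 13.0) is not None)  # -> True
print(matching_annotation(annotations, 30.0))              # -> None
```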
Based on the above, the user can annotate content at a certain moment or in a certain time period through a positioning instruction, so the region that needs modification can be quickly located afterwards, improving working efficiency.
Based on the above audio/video online review method, the present invention further provides a terminal device. As shown in fig. 5, the terminal device includes at least one processor 20; a display screen 21; a memory 22; and may further include a communication interface 23 and a bus 24. The processor 20, the display screen 21, the memory 22 and the communication interface 23 can communicate with one another through the bus 24. The display screen 21 is configured to display a user guidance interface preset in the initial setting mode. The communication interface 23 can transmit information. The processor 20 can call logic instructions in the memory 22 to perform the method in the above embodiments.
In addition, the logic commands in the memory 22 may be implemented as software functional units and stored in a computer-readable storage medium when sold or used as a standalone product.
The memory 22, as a computer-readable storage medium, may be configured to store software programs and computer-executable programs, such as the program commands or modules corresponding to the methods in the embodiments of the present disclosure. The processor 20 executes functional applications and performs data processing by running the software programs, commands or modules stored in the memory 22, thereby implementing the method in the above embodiments.
The memory 22 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the terminal device. Further, the memory 22 may include a high-speed random access memory and may also include a non-volatile memory. For example, a variety of media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, may also serve as the computer-readable storage medium.
In addition, the specific processes loaded and executed by the computer-readable storage medium and by the plurality of commands processed in the terminal device are described in detail in the method above and are not repeated here.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced, and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An on-line examination method for audio and video is characterized by comprising the following steps:
acquiring an audio and video file to be processed;
analyzing the audio and video file to obtain a time axis corresponding to the audio and video file;
when a positioning instruction for the time axis is detected, target information in the audio and video file is displayed according to the positioning instruction;
and when an annotation instruction for the target information is detected, generating annotation information corresponding to the target information according to the annotation instruction.
2. The audio and video online examination method according to claim 1, wherein the time axis comprises an image axis and/or an audio axis, and the positioning instruction comprises an image instruction for the image axis and an audio instruction for the audio axis; the step of displaying corresponding target information in the audio and video file according to the positioning instruction comprises the following steps:
when the image instruction is detected, displaying, according to the image instruction, the image information corresponding to the audio and video file;
and when the audio instruction is detected, displaying, according to the audio instruction, the audio information corresponding to the audio and video file.
3. The audio and video online examination method according to claim 2, wherein the image instruction comprises a single-frame instruction and a multi-frame instruction; the displaying the target information in the audio and video file according to the positioning instruction comprises the following steps:
when the positioning instruction is a single-frame instruction, taking, according to the timestamp corresponding to the single-frame instruction, the corresponding frame image in the audio and video file as target information and displaying the target information;
and when the positioning instruction is a multi-frame instruction, taking, according to the starting time and the ending time corresponding to the multi-frame instruction, the corresponding image set in the audio and video file as target information and displaying the image set.
4. The audio and video online examination method according to claim 3, wherein the taking, according to the starting time and the ending time corresponding to the multi-frame instruction, the corresponding image set in the audio and video file as target information and displaying the image set comprises the following steps:
determining a starting image and an ending image in the audio and video file according to a starting frame corresponding to the starting time and an ending frame corresponding to the ending time;
taking the video images between the starting image and the ending image as the image set;
and displaying the image set according to a preset preview rule.
5. The audio and video online examination method according to claim 1, wherein the annotation instruction comprises a start instruction and an annotation text; the generating annotation information according to the annotation instruction comprises:
when a start instruction corresponding to the audio and video information is detected, activating a preset annotation area;
and when the annotation text aiming at the annotation area is detected, generating annotation information according to the positioning instruction and the annotation text.
6. The audio and video online examination method according to claim 5, wherein the generating annotation information according to the positioning instruction and the annotation text comprises:
generating time information according to the positioning instruction;
determining an annotation object according to a time axis corresponding to the positioning instruction;
and generating annotation information according to the time information, the annotation object and the annotation text.
7. The audio and video online examination method according to claim 1, wherein the method further comprises:
when a modification instruction for the audio/video file is detected, determining whether the modification instruction corresponds to the annotation information or not according to a timestamp corresponding to the modification instruction;
and if so, generating a modification remark according to the modification instruction.
8. The audio and video online examination method according to claim 1, wherein the acquiring an audio and video file to be processed comprises:
acquiring a file to be processed;
performing shot recognition on the file to be processed to obtain boundary frames corresponding to different shots;
and splitting the file to be processed according to the boundary frame to obtain a plurality of audio and video files.
9. A computer readable storage medium, characterized in that the computer readable storage medium stores one or more programs which can be executed by one or more processors to implement the steps in the method for on-line examination of audio and video according to any one of claims 1 to 8.
10. A terminal device, comprising: a processor, a memory, and a communication bus; the memory has stored thereon a computer readable program executable by the processor;
the communication bus implements connection and communication between the processor and the memory;
the processor realizes the steps of the audio and video online examination method according to any one of claims 1 to 8 when executing the computer readable program.
CN202210971679.3A 2022-08-12 2022-08-12 Audio and video online examination method and related equipment Pending CN115474089A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210971679.3A CN115474089A (en) 2022-08-12 2022-08-12 Audio and video online examination method and related equipment


Publications (1)

Publication Number Publication Date
CN115474089A (en) 2022-12-13

Family

ID=84365995


Country Status (1)

Country Link
CN (1) CN115474089A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090290847A1 (en) * 2008-05-20 2009-11-26 Honeywell International Inc. Manual voice annotations for cctv reporting and investigation
CN103024602A (en) * 2011-09-23 2013-04-03 华为技术有限公司 Method and device for adding annotations to videos
CN110012358A (en) * 2019-05-08 2019-07-12 腾讯科技(深圳)有限公司 Review of a film by the censor information processing method, device
CN110381382A (en) * 2019-07-23 2019-10-25 腾讯科技(深圳)有限公司 Video takes down notes generation method, device, storage medium and computer equipment
US20210287718A1 (en) * 2020-03-10 2021-09-16 Sony Corporation Providing a user interface for video annotation tools


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
吴平颐 et al.: "Design of a teaching-skill training and review system based on video annotation", Education and Teaching Forum (《教育教学论坛》), no. 2020, pp. 78-81 *
王少珠 et al.: "A mobile review system based on a multi-network environment", Video Engineering (《电视技术》), vol. 38, no. 24, pp. 95-97 *
董守斌 et al.: "Network Information Retrieval" (《网络信息检索》), Xidian University Press, 2010, pp. 291-293 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116824107A (en) * 2023-07-13 2023-09-29 北京万物镜像数据服务有限公司 Processing method, device and equipment for three-dimensional model review information
CN116824107B (en) * 2023-07-13 2024-03-19 北京万物镜像数据服务有限公司 Processing method, device and equipment for three-dimensional model review information


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 518000 Building 1901, 1902, 1903, Qianhai Kexing Science Park, Labor Community, Xixiang Street, Bao'an District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Flash Scissor Intelligent Technology Co.,Ltd.

Address before: 518000 Unit 9ABCDE, Building 2, Haihong Industrial Plant Phase II, Haihong Industrial Plant, West Side of Xixiang Avenue, Labor Community, Xixiang Street, Bao'an District, Shenzhen, Guangdong

Applicant before: Shenzhen big brother Technology Co.,Ltd.