CN116033182A - Method and device for determining video cover map, electronic equipment and storage medium - Google Patents

Method and device for determining video cover map, electronic equipment and storage medium Download PDF

Info

Publication number
CN116033182A
CN116033182A
Authority
CN
China
Prior art keywords
video
video frame
target
processed
cover map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211644371.4A
Other languages
Chinese (zh)
Inventor
宁本德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202211644371.4A priority Critical patent/CN116033182A/en
Publication of CN116033182A publication Critical patent/CN116033182A/en
Pending legal-status Critical Current

Abstract

The method, apparatus, electronic device, and storage medium for determining a video cover map include: obtaining a video to be processed and an initial cover map of the video, where the initial cover map contains a target object; determining, from the video to be processed, at least one target video frame similar to the initial cover map; performing image processing on the target object in each target video frame to obtain a first mask image; calculating a first sharpness of the target object from each first mask image; and determining the video cover map from the at least one target video frame based on the at least one first sharpness. In this way, target video frames similar to the initial cover map can be located in the video to be processed, and a sharp video cover map can then be determined based on the sharpness of the target object in those frames.

Description

Method and device for determining video cover map, electronic equipment and storage medium
Technical Field
The embodiments of the present invention relate to the field of video technology, and in particular to a method, an apparatus, an electronic device, and a storage medium for determining a video cover map.
Background
With the development of video technology, video has become an important part of everyday entertainment. A video cover map is an image extracted from the frame sequence of a video, and it matters greatly to video websites: it presents a representative image of the current video to users, attracts clicks, and, when it is a high-quality and representative highlight, helps users quickly locate the target video, improving both the viewing and the search experience.
At present, a video cover map is usually obtained by uniformly extracting frames from the video. The frame sequence obtained in this way may contain no particularly sharp picture, or the uniform sampling may skip over many sharp pictures, so the resulting cover map is often not sharp, which degrades the user's visual experience.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an apparatus, an electronic device, and a storage medium for determining a video cover map, which can obtain a sharp video cover map and improve the user's visual experience.
In a first aspect, an embodiment of the present invention provides a method for determining a video cover map, where the method includes:
acquiring a video to be processed and an initial cover map of the video to be processed, where the initial cover map includes a target object and is obtained by performing frame extraction on the video to be processed;
determining, from the video to be processed, at least one target video frame similar to the initial cover map, where each of the at least one target video frame includes the target object;
performing the following for each of the at least one target video frame: performing image processing on the target object in the target video frame to obtain a first mask image, and calculating a first sharpness of the target object based on the first mask image, where the first mask image represents a preset region of the target object;
determining a video cover map of the video to be processed from the at least one target video frame based on the at least one first sharpness.
In one possible implementation, determining at least one target video frame similar to the initial cover map from the video to be processed includes:
extracting a first video frame time from the initial cover map;
determining a video frame search time range based on the first video frame time;
determining, from the video to be processed, the video segment corresponding to the video frame search time range;
extracting a plurality of first video frames from the video segment at a preset video frame extraction interval;
determining at least one target video frame similar to the initial cover map based on the plurality of first video frames.
In one possible implementation, determining the video frame search time range based on the first video frame time includes:
taking the first video frame time as a starting point, taking a first time range of a first preset duration before the first video frame time, and/or a second time range of a second preset duration after the first video frame time;
determining the first time range and/or the second time range as the video frame search time range.
In one possible implementation, determining at least one target video frame similar to the initial cover map based on the plurality of first video frames includes:
performing the following for each first video frame: calculating the similarity between the first video frame and the initial cover map;
judging whether the similarity is smaller than a preset similarity;
when the similarity is greater than or equal to the preset similarity, checking whether the first video frame includes the target object;
determining a first video frame that includes the target object as a target video frame.
In one possible implementation, the method further includes:
when the similarity corresponding to every first video frame is smaller than the preset similarity, determining the initial cover map as the video cover map of the video to be processed.
In one possible implementation, performing image processing on the target object in a target video frame to obtain the first mask image includes:
performing key point detection on the preset region of the target object in the target video frame to obtain key points, and connecting the key points to obtain a closed preset region;
setting the pixel values of the pixels inside the preset region to a first preset pixel value and the pixel values of the pixels in the rest of the target video frame to a second preset pixel value, to obtain a binary image;
obtaining the first mask image based on the target video frame and the binary image.
In one possible implementation, calculating the first sharpness of the target object based on the first mask image includes:
for each pixel in the preset region, calculating the absolute difference between its pixel value and that of the pixel two positions away in the same row, to obtain a plurality of first values;
accumulating the plurality of first values to obtain the first sharpness of the target object.
In one possible implementation, determining the video cover map of the video to be processed from the at least one target video frame based on the at least one first sharpness includes:
performing image processing on the target object in the initial cover map to obtain a second mask image, and calculating a second sharpness of the target object based on the second mask image;
judging whether at least one first sharpness is greater than the second sharpness;
when every first sharpness is less than or equal to the second sharpness, determining the initial cover map as the video cover map of the video to be processed;
when at least one first sharpness is greater than the second sharpness, determining the target video frame corresponding to the largest first sharpness as the video cover map of the video to be processed.
In a second aspect, an embodiment of the present invention provides an apparatus for determining a video cover map, where the apparatus includes:
an acquisition module, configured to acquire a video to be processed and an initial cover map of the video to be processed, where the initial cover map includes a target object and is obtained by performing frame extraction on the video to be processed;
a first determining module, configured to determine, from the video to be processed, at least one target video frame similar to the initial cover map, where each of the at least one target video frame includes the target object;
a calculation module, configured to perform the following for each of the at least one target video frame: performing image processing on the target object in the target video frame to obtain a first mask image, and calculating a first sharpness of the target object based on the first mask image, where the first mask image represents a preset region of the target object;
a second determining module, configured to determine a video cover map of the video to be processed from the at least one target video frame based on the at least one first sharpness.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, where the processor is configured to execute a video cover map determination program stored in the memory, so as to implement the method for determining a video cover map described above.
In a fourth aspect, an embodiment of the present invention provides a storage medium storing one or more programs, which are executable by one or more processors to implement the method for determining a video cover map described above.
The method, apparatus, electronic device, and storage medium for determining a video cover map provided by the embodiments of the present invention include: obtaining a video to be processed and an initial cover map of the video, where the initial cover map contains a target object; determining, from the video to be processed, at least one target video frame similar to the initial cover map; performing image processing on the target object in each of the at least one target video frame to obtain a first mask image, and calculating a first sharpness of the target object based on the first mask image; and determining a video cover map of the video to be processed from the at least one target video frame based on the at least one first sharpness. In this way, target video frames similar to the initial cover map can be located in the video to be processed, and a sharp video cover map can then be determined based on the sharpness of the target object in those frames.
Drawings
FIG. 1 is a schematic diagram of a hardware environment of a method for determining a video cover map according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for determining a video cover map according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a first mask image according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a video time axis of a video to be processed according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an image according to an embodiment of the present invention;
FIG. 6 is a block diagram illustrating an embodiment of an apparatus for determining a video cover map according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
To facilitate understanding of the embodiments of the present invention, specific embodiments are further explained below with reference to the accompanying drawings; the embodiments described are not intended to limit the invention.
In this embodiment, the method for determining a video cover map is applied to the application scenario shown in FIG. 1. As shown in FIG. 1, the scenario includes a client 101 and a server 102. The client 101 and the server 102 may communicate through wired technology, for example over a network cable or a serial line, or through wireless technology, for example Bluetooth or wireless fidelity (WiFi); the communication mode is not specifically limited here.
The client 101 generally refers to a device that can provide the video to be processed to the server 102, such as a terminal device, a web page accessible from a terminal device, or a third-party program accessible from a terminal device. The terminal device may be an intelligent traffic device, a camera, a mobile phone, an intelligent voice interaction device, a smart home appliance, a vehicle-mounted terminal, or the like. The server 102 generally refers to a device that can process the video, such as a terminal device or a server, including but not limited to a cloud server, a local server, or an associated third-party server. Both the client 101 and the server 102 may use cloud computing to reduce the occupation of local computing resources, and may also use cloud storage to reduce the occupation of local storage resources.
In one embodiment, the client 101 and the server 102 may be the same device; this is not specifically limited. In this embodiment, the method for determining a video cover map is performed by the server 102. As shown in FIG. 2, the method may include the following steps:
step 201, obtaining a video to be processed and an initial cover diagram of the video to be processed, wherein the initial cover diagram comprises a target object;
the method comprises the steps that an initial cover diagram is obtained by performing frame extraction processing on a video to be processed; specifically, a plurality of video frames are extracted from a video to be processed according to a preset frame extraction time interval, then, a pixel value of an image is obtained for each video frame, image parameters such as image brightness, image sharpness and the like corresponding to each video frame are calculated according to the pixel value, and then, the video frame or frames with the best image parameters are determined to be an initial cover map.
In this embodiment, the preset frame-extracting time interval may be set to a time interval with a large frame-extracting span, such as 10fps and 20fps, so as to greatly shorten the image selecting time of the initial cover image, and facilitate improving the image selecting efficiency.
The obtained one or more initial cover images each include a target object focused by a user, where the target object may be a person, an animal, a plant, etc., and is not limited herein, and the target objects included in different initial cover images may be different, and each initial cover image may perform steps 202 to 204 to determine a clear video cover image from the video to be processed.
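The sparse frame extraction and image-parameter comparison described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: it assumes grayscale frames as NumPy arrays and uses a mean-absolute-gradient proxy for sharpness, and the names `sharpness` and `pick_initial_cover` are hypothetical.

```python
import numpy as np

def sharpness(gray: np.ndarray) -> float:
    # Mean absolute horizontal gradient: a simple proxy for image sharpness.
    return float(np.abs(np.diff(gray.astype(np.float64), axis=1)).mean())

def pick_initial_cover(frames: list) -> int:
    # Return the index of the sparsely extracted frame with the best
    # (here: largest) sharpness; brightness or other image parameters
    # could be combined into the score in the same way.
    return int(np.argmax([sharpness(f) for f in frames]))
```

A frame with more edge detail scores higher, so a featureless (blurry) frame loses to a textured one.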
Step 202, determining at least one target video frame similar to the initial cover map from the video to be processed;
Each of the at least one target video frame includes the target object. Each determined target video frame must contain the target object from the initial cover map because, when the target object is small, one or more video frames in the video may be highly similar to the initial cover map yet not contain the target object the user cares about; such frames are unsuitable and cannot be shown to the user as the video cover map.
Step 203, performing the following for each of the at least one target video frame: performing image processing on the target object in the target video frame to obtain a first mask image, and calculating a first sharpness of the target object based on the first mask image;
The first mask image represents a preset region of the target object, that is, all or part of the area where the target object is located. For example, if the target object is a person, the user may mainly care whether the head is sharp, so the obtained first mask image shows only the head region; if the target object is an animal, the user may care whether the whole animal is sharp, so the first mask image shows the animal's entire body. In other words, the first mask image shows only the preset region, while everything outside it is hidden. For ease of understanding, FIG. 3 shows a schematic diagram of a first mask image: only the face region is visible, and the other regions are black, that is, their pixel values are 0.
Accordingly, the first sharpness calculated based on the first mask image of FIG. 3 represents the sharpness of the preset region of the target object, which in this example is the face region.
Step 204, determining a video cover map of the video to be processed from the at least one target video frame based on the at least one first sharpness.
Based on the first sharpness corresponding to each target video frame, a video cover map with a sharp preset region can be determined for the video to be processed, so that a sharp cover is displayed to the user, greatly improving the user's visual experience and attracting clicks.
The method for determining a video cover map provided by the embodiment of the present invention includes: obtaining a video to be processed and an initial cover map of the video, where the initial cover map contains a target object; determining, from the video to be processed, at least one target video frame similar to the initial cover map; performing image processing on the target object in each of the at least one target video frame to obtain a first mask image, and calculating a first sharpness of the target object based on the first mask image; and determining a video cover map of the video to be processed from the at least one target video frame based on the at least one first sharpness. In this way, target video frames similar to the initial cover map can be located in the video to be processed, and a sharp video cover map can then be determined based on the sharpness of the target object in those frames.
In addition, compared with determining the video cover map by performing image processing on every frame of the video to be processed, the method provided by this embodiment does not need to process each video frame, which reduces selection time and greatly improves selection efficiency.
In some embodiments, step 202 may be implemented by the following steps:
Step A1, extracting a first video frame time from the initial cover map;
The first video frame time is the playback time of the initial cover map on the time axis of the video to be processed.
Step A2, determining a video frame search time range based on the first video frame time;
In general, the probability that the target object appears within this search time range is relatively high, so a video frame similar to the initial cover map is very likely to be found within it, from which the target video frames can then be determined.
The specific process of determining the video frame search time range is: taking the first video frame time as a starting point, take a first time range of a first preset duration before the first video frame time, and/or a second time range of a second preset duration after the first video frame time; then determine the first time range and/or the second time range as the video frame search time range.
For ease of understanding, FIG. 4 shows a schematic diagram of the video time axis of a video to be processed. The total length of the time axis is 10 minutes, meaning the video is 10 minutes long; time increases to the right and decreases to the left. Suppose the first video frame time is the 4-minute mark and both the first and second preset durations are set to 1 minute. Then the first time range is from 3 minutes to 4 minutes, the second time range is from 4 minutes to 5 minutes, and the video frame search time range is from 3 minutes to 5 minutes.
In practice, only the first time range or only the second time range may be used as the video frame search time range, and the first and second preset durations may be set to the same or different lengths; this is not limited here.
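The construction of the search time range can be sketched as a small helper. The function name `search_range` is illustrative, not from the patent; times are assumed to be in seconds, and the range is clamped to the video's duration.

```python
def search_range(t_cover: float, before: float = 60.0, after: float = 60.0,
                 total: float = float("inf")) -> tuple:
    # First range: [t_cover - before, t_cover]; second range:
    # [t_cover, t_cover + after]. Their union, clamped to the video
    # timeline [0, total], is the video frame search time range.
    return (max(0.0, t_cover - before), min(total, t_cover + after))
```

With the FIG. 4 example (a 10-minute video, cover frame at the 4-minute mark, both preset durations 1 minute), this yields the 3-to-5-minute range.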
Step A3, determining, from the video to be processed, the video segment corresponding to the video frame search time range;
Because each video frame in the video to be processed corresponds to a video frame time, the video segment within the search time range can be located in the video.
Step A4, extracting a plurality of first video frames from the video segment at a preset video frame extraction interval;
In order to accurately find target video frames similar to the initial cover map, dense frame extraction is performed within the video segment in this embodiment; that is, the preset video frame extraction interval may be set to a fine interval with a small sampling span, such as 2 fps or 5 fps, so that a plurality of first video frames are extracted from the video segment.
Step A5, determining at least one target video frame similar to the initial cover map based on the plurality of first video frames.
In this embodiment, image similarity is used to decide which first video frames are target video frames. The specific process is: for each first video frame, calculate the similarity between the first video frame and the initial cover map; judge whether the similarity is smaller than a preset similarity; when the similarity is greater than or equal to the preset similarity, check whether the first video frame includes the target object; and determine a first video frame that includes the target object as a target video frame.
Specifically, the similarity between a first video frame and the initial cover map may be calculated using structural similarity, cosine similarity, histogram comparison, or other methods, which are not limited here. The preset similarity may be set according to actual needs; the higher it is set, the more similar a qualifying first video frame must be to the initial cover map. In this embodiment, a first video frame whose similarity exceeds the preset similarity and which includes the target object is determined as a target video frame.
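As one concrete possibility for the similarity step, a histogram-intersection comparison can be sketched as follows. This is an assumed implementation (the patent does not fix a specific formula), operating on grayscale images represented as NumPy arrays.

```python
import numpy as np

def histogram_similarity(img_a: np.ndarray, img_b: np.ndarray,
                         bins: int = 32) -> float:
    # Intersection of normalized grayscale histograms: 1.0 for identical
    # intensity distributions, approaching 0.0 for disjoint ones.
    ha, _ = np.histogram(img_a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(img_b, bins=bins, range=(0, 256))
    ha = ha / ha.sum()
    hb = hb / hb.sum()
    return float(np.minimum(ha, hb).sum())
```

A first video frame whose similarity is at least the preset threshold, and which contains the target object, would then qualify as a target video frame.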
When the calculated similarity of every first video frame is smaller than the preset similarity, every first video frame is dissimilar to the initial cover map; in that case the initial cover map is directly determined as the video cover map of the video to be processed and displayed to the user, without performing the selection in steps 203 to 204.
In some embodiments, performing image processing on the target object in a target video frame in step 203 to obtain the first mask image may be implemented by the following steps:
Step B1, performing key point detection on the preset region of the target object in the target video frame to obtain key points, and connecting the key points to obtain a closed preset region;
In this embodiment, key point detection may be performed on the boundary of the preset region through a neural network model to obtain the key points of the region. To make it easy to connect the key points in order, an identification number is usually also assigned to each key point, so that the key points can be connected in identifier order, delimiting the area occupied by the preset region in the target video frame.
Step B2, setting the pixel values of the pixels inside the preset region to a first preset pixel value and the pixel values of the pixels in the rest of the target video frame to a second preset pixel value, to obtain a binary image;
In general, if the first preset pixel value is 1 and the second preset pixel value is 0, the binary image is a black-and-white image: the preset region is white and the rest of the target video frame is black. For ease of understanding, FIG. 5 shows a schematic diagram of an image: the left image in FIG. 5 is the target video frame, whose target object is a person and whose preset region is the head, so the right image in FIG. 5 is the binary image in which the head region is white and the other regions are black.
Step B3, obtaining the first mask image based on the target video frame and the binary image.
The pixel values of the pixels at corresponding positions in the left and right images of FIG. 5 are multiplied element-wise to obtain the first mask image shown in FIG. 3.
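The dot multiplication of step B3 is an element-wise product, which can be sketched as follows. This is a minimal illustration assuming single-channel NumPy arrays; the rectangular "preset region" is a hypothetical stand-in for the keypoint-derived mask, and the keypoint detection itself is not shown.

```python
import numpy as np

def apply_mask(frame: np.ndarray, binary: np.ndarray) -> np.ndarray:
    # Element-wise product: pixels inside the preset region (binary == 1)
    # keep their values; all other pixels become 0, as in FIG. 3.
    return frame * binary

frame = np.arange(16.0).reshape(4, 4)  # stand-in target video frame
binary = np.zeros((4, 4))
binary[1:3, 1:3] = 1.0                 # hypothetical closed preset region
masked = apply_mask(frame, binary)
```

Only the four pixels inside the region survive; everything outside is zeroed, matching the black background of the first mask image.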
In some embodiments, calculating the first sharpness of the target object based on the first mask image in step 203 may be implemented as follows: for each pixel in the preset region, calculate the absolute difference between its pixel value and that of the pixel two positions away in the same row, to obtain a plurality of first values; accumulate the plurality of first values to obtain the first sharpness of the target object.
From the above description, the calculation of the first sharpness can be expressed by the following formula:
D(B) = Σx Σy |B(x+2, y) − B(x, y)|, where D(B) denotes the first sharpness, B(x, y) denotes the pixel value at position (x, y) in the first mask image, and B(x+2, y) denotes the pixel value at position (x+2, y) in the first mask image.
Besides the sharpness calculation above, other image sharpness measures may also be used; this is not limited here.
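The formula above translates directly into code; a sketch assuming the first mask image is a single-channel NumPy array, with x indexing positions within a row.

```python
import numpy as np

def first_sharpness(mask_img: np.ndarray) -> float:
    # D(B) = sum over x, y of |B(x+2, y) - B(x, y)|: accumulate the
    # absolute difference between each pixel and the pixel two
    # positions away in the same row.
    b = mask_img.astype(np.float64)
    return float(np.abs(b[:, 2:] - b[:, :-2]).sum())
```

Since the masked-out background is uniformly 0, it contributes nothing to the sum except at the region boundary, so the measure is dominated by the preset region.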
In some embodiments, step 204 may be implemented by the following steps:
Step C1, performing image processing on the target object in the initial cover map to obtain a second mask image, and calculating a second sharpness of the target object based on the second mask image;
The calculation of the second sharpness of the initial cover map is the same as the calculation of the first sharpness of a target video frame in step 203, and is not repeated here.
Step C2, judging whether at least one first sharpness is greater than the second sharpness;
That is, judging whether the first sharpness corresponding to any target video frame is greater than the second sharpness corresponding to the initial cover map, to determine whether the target object in that target video frame is sharper than the target object in the initial cover map.
Step C3, when every first sharpness is less than or equal to the second sharpness, determining the initial cover map as the video cover map of the video to be processed;
If no target video frame's first sharpness exceeds the second sharpness of the initial cover map, then no target object in any target video frame is sharper than the one in the initial cover map, so there is no need to select a target video frame as the cover: the initial cover map is directly determined as the video cover map of the video to be processed.
Step C4, when at least one first sharpness is greater than the second sharpness, determining the target video frame corresponding to the largest first sharpness as the video cover map of the video to be processed.
Specifically, if exactly one target video frame has a first sharpness greater than the second sharpness, the target object in that frame is sharper than the target object in the initial cover map, so that frame is determined as the video cover map. If multiple target video frames have a first sharpness greater than the second sharpness, the one with the largest first sharpness, that is, the frame in which the target object is sharpest, is determined as the video cover map so that it is displayed to the user clearly.
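The decision logic of steps C2 to C4 can be sketched as a small selection function. The name `choose_cover` is illustrative; returning `None` stands for "keep the initial cover map".

```python
def choose_cover(first_sharpnesses: list, second_sharpness: float):
    # Return the index of the target video frame with the largest first
    # sharpness if it exceeds the initial cover map's second sharpness;
    # otherwise return None, meaning the initial cover map is kept.
    if not first_sharpnesses:
        return None
    best = max(range(len(first_sharpnesses)),
               key=first_sharpnesses.__getitem__)
    return best if first_sharpnesses[best] > second_sharpness else None
```

The ties-to-initial behavior (less than or equal keeps the initial cover) matches step C3.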
Referring to FIG. 6, a block diagram of an embodiment of an apparatus for determining a video cover map according to an embodiment of the present invention is shown. As shown in FIG. 6, the apparatus may include:
the acquiring module 601 is configured to acquire a video to be processed and an initial cover map of the video to be processed, where the initial cover map includes a target object and is obtained by performing frame extraction processing on the video to be processed;
a first determining module 602, configured to determine at least one target video frame similar to the initial cover map from the video to be processed; wherein each of the at least one target video frame includes a target object;
a calculation module 603 for performing, for each of the at least one target video frame, the following: performing image processing on a target object in a target video frame to obtain a first mask image, and calculating first definition of the target object based on the first mask image; the first mask image is used for representing a preset part area of the target object;
a second determining module 604 is configured to determine a video cover map of the video to be processed from the at least one target video frame based on the at least one first sharpness.
The device for determining a video cover map provided by the embodiment of the invention acquires a video to be processed and an initial cover map of the video to be processed, where the initial cover map includes a target object. At least one target video frame similar to the initial cover map is determined from the video to be processed. For each of the at least one target video frame, image processing is performed on the target object to obtain a first mask image, and a first definition of the target object is calculated based on the first mask image. A video cover map of the video to be processed is then determined from the at least one target video frame based on the at least one first definition. In this way, a target video frame similar to the initial cover map can be found in the video to be processed based on the initial cover map, and a clear video cover map can be determined based on the definition of the target object in that frame.
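The definition metric attributed to the calculation module 603 (absolute differences between same-row pixels a fixed spacing apart, accumulated over the mask region, as recited in claim 7) can be sketched as follows. The `spacing` value and the use of NumPy are assumptions; the patent fixes neither:

```python
import numpy as np

def first_definition(gray, mask, spacing=2):
    """Accumulate |I(r, c) - I(r, c + spacing)| over the masked region.

    gray    -- 2-D grayscale frame as a float array
    mask    -- boolean array, True inside the preset part area (the first mask image)
    spacing -- same-row pixel gap; 2 is an assumed default
    """
    # Same-row differences between each pixel and the pixel `spacing` columns away.
    diffs = np.abs(gray[:, :-spacing] - gray[:, spacing:])
    # Count a difference only when both endpoints lie inside the preset part area.
    region = mask[:, :-spacing] & mask[:, spacing:]
    return float(diffs[region].sum())
```

On this metric, a frame with more high-frequency detail inside the masked region accumulates a larger sum, and hence a higher first definition, than a blurred one.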
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device 500 shown in fig. 7 includes: at least one processor 501, a memory 502, at least one network interface 504, and other user interfaces 503. The various components in the electronic device 500 are coupled together by a bus system 505, which is used to implement connection and communication between these components. In addition to a data bus, the bus system 505 includes a power bus, a control bus, and a status signal bus; for clarity of illustration, however, the various buses are all labeled as the bus system 505 in fig. 7.
The user interface 503 may include, among other things, a display, a keyboard, or a pointing device (e.g., a mouse, a trackball, a touch pad, or a touch screen, etc.).
It will be appreciated that the memory 502 in embodiments of the invention can be volatile memory or non-volatile memory, or can include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The memory 502 described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
In some implementations, the memory 502 stores the following elements, executable units or data structures, or a subset thereof, or an extended set thereof: an operating system 5021 and application programs 5022.
The operating system 5021 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and processing hardware-based tasks. The application 5022 includes various application programs such as a media player (MediaPlayer), a Browser (Browser), and the like for implementing various application services. A program for implementing the method according to the embodiment of the present invention may be included in the application 5022.
In the embodiment of the present invention, the processor 501 is configured to execute the method steps provided in the method embodiments by calling a program or an instruction stored in the memory 502, specifically, a program or an instruction stored in the application 5022.
The method disclosed in the above embodiment of the present invention may be applied to the processor 501 or implemented by the processor 501. The processor 501 may be an integrated circuit chip having signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuitry in hardware or by instructions in the form of software in the processor 501. The processor 501 may be a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly as being executed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 502, and the processor 501 reads the information in the memory 502 and, in combination with its hardware, performs the steps of the method described above.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing units may be implemented within one or more application specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field programmable gate arrays (FPGA), general purpose processors, controllers, microcontrollers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented by means of units that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
The electronic device provided in this embodiment may be the electronic device shown in fig. 7, and may perform all steps of the method for determining a video cover map shown in fig. 2, thereby achieving the technical effects of that method; for brevity, reference is made to the description of fig. 2, and details are not repeated herein.
The embodiment of the invention also provides a storage medium (a computer readable storage medium) storing one or more programs. The storage medium may comprise volatile memory, such as random access memory; it may also comprise non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk; it may also comprise a combination of the above types of memory.
The one or more programs, when executed by the one or more processors, implement the method of video cover map determination described above.
The processor is configured to execute a video cover map determining program stored in the memory to implement the steps of the method for video cover map determination.
Those of skill would further appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The foregoing description is merely illustrative of specific embodiments of the invention and is not intended to limit its scope. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the invention shall be included within the scope of protection of the invention.

Claims (11)

1. A method of video cover map determination, the method comprising:
acquiring a video to be processed and an initial cover diagram of the video to be processed, wherein the initial cover diagram comprises a target object; the initial cover map is obtained by performing frame extraction processing on the video to be processed;
determining at least one target video frame similar to the initial cover map from the video to be processed; wherein each of at least one of the target video frames includes the target object;
performing the following for each of at least one of the target video frames: performing image processing on the target object in the target video frame to obtain a first mask image, and calculating first definition of the target object based on the first mask image; the first mask image is used for representing a preset part area of the target object;
and determining a video cover map of the video to be processed from at least one target video frame based on at least one first definition.
2. The method of claim 1, wherein the determining at least one target video frame from the video to be processed that is similar to the initial cover map comprises:
extracting a first video frame time from the initial cover map;
determining a video frame searching duration range based on the first video frame time;
determining a video segment corresponding to the video frame searching duration range from the video to be processed;
extracting a plurality of first video frames in the video segment according to a preset video frame extraction interval;
at least one target video frame similar to the initial cover map is determined based on a plurality of the first video frames.
3. The method of claim 2, wherein the determining a video frame seek duration range based on the first video frame time comprises:
taking the first video frame time as a starting point, intercepting a first time length range of a first preset time length in the direction earlier than the first video frame time, and/or intercepting a second time length range of a second preset time length in the direction later than the first video frame time;
and determining the first time length range and/or the second time length range as the video frame searching duration range.
4. The method of claim 2, wherein the determining at least one target video frame that is similar to the initial cover map based on the plurality of first video frames comprises:
the following is performed for each of the first video frames: calculating the similarity between the first video frame and the initial cover map;
judging whether the similarity is smaller than a preset similarity or not;
searching whether the first video frame comprises the target object or not under the condition that the similarity is larger than or equal to the preset similarity;
a first video frame including the target object is determined as a target video frame.
5. The method according to claim 4, wherein the method further comprises:
and under the condition that the similarity corresponding to each first video frame is smaller than the preset similarity, determining the initial cover map as the video cover map of the video to be processed.
6. The method of claim 1, wherein the performing image processing on the target object in the target video frame to obtain a first mask image comprises:
performing key point detection on a preset part area of the target object in the target video frame to obtain key points, and connecting the key points to obtain a closed preset part area;
the pixel values of the pixel points in the preset part area are set to a first preset pixel value, and the pixel values of the pixel points in other areas except the preset part area in the target video frame are set to a second preset pixel value, so that a binary image is obtained;
and obtaining a first mask image based on the target video frame and the binary image.
7. The method of claim 1, wherein the calculating the first sharpness of the target object based on the first mask image comprises:
respectively calculating the absolute value of the difference between the pixel value of each pixel point in the preset part area and the pixel value of a pixel point spaced apart from it in the same row, to obtain a plurality of first values;
and accumulating a plurality of the first values to obtain the first definition of the target object.
8. The method of claim 1, wherein the determining the video cover map for the video to be processed from at least one of the target video frames based on at least one of the first sharpness comprises:
performing image processing on the target object in the initial cover map to obtain a second mask image, and calculating second definition of the target object based on the second mask image;
judging whether at least one of the first definition is greater than the second definition;
determining the initial cover map as a video cover map of the video to be processed in the case that each of the at least one first definition is less than or equal to the second definition;
and under the condition that at least one first definition is larger than the second definition, determining the target video frame corresponding to the largest first definition as a video cover diagram of the video to be processed.
9. An apparatus for video cover map determination, the apparatus comprising:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a video to be processed and an initial cover diagram of the video to be processed, and the initial cover diagram comprises a target object; the initial cover map is obtained by performing frame extraction processing on the video to be processed;
a first determining module, configured to determine at least one target video frame similar to the initial cover map from the video to be processed; wherein each of at least one of the target video frames includes the target object;
a computing module for performing, for each of at least one of the target video frames, the following: performing image processing on the target object in the target video frame to obtain a first mask image, and calculating first definition of the target object based on the first mask image; the first mask image is used for representing a preset part area of the target object;
and the second determining module is used for determining a video cover diagram of the video to be processed from at least one target video frame based on at least one first definition.
10. An electronic device, comprising: a processor and a memory, the processor being configured to execute a video cover map determination program stored in the memory to implement the method of video cover map determination of any one of claims 1 to 8.
11. A storage medium storing one or more programs executable by one or more processors to implement the method of video cover map determination of any one of claims 1-8.
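A minimal sketch of the candidate-frame search in claims 2 to 4: a duration window is built around the frame time of the initial cover map, frames sampled from that window are kept only if they are similar enough to the cover and contain the target object. The helper names `compute_similarity` and `contains_target` are hypothetical stand-ins for the similarity measure and object detector, which the claims leave unspecified:

```python
def search_window(cover_time, before=2.0, after=2.0, video_len=1e9):
    """Claim 3: a range extending `before` seconds earlier than the cover's
    frame time and `after` seconds later, clamped to the video's duration."""
    return max(0.0, cover_time - before), min(video_len, cover_time + after)

def target_frames(frames, cover, threshold, compute_similarity, contains_target):
    """Claim 4: keep frames whose similarity to the cover reaches `threshold`
    and which also include the target object."""
    out = []
    for frame in frames:
        if compute_similarity(frame, cover) >= threshold and contains_target(frame):
            out.append(frame)
    return out
```

Per claim 5, if `target_frames` returns an empty list (every sampled frame falls below the similarity threshold), the initial cover map itself is kept as the video cover map.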
CN202211644371.4A 2022-12-15 2022-12-15 Method and device for determining video cover map, electronic equipment and storage medium Pending CN116033182A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211644371.4A CN116033182A (en) 2022-12-15 2022-12-15 Method and device for determining video cover map, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN116033182A true CN116033182A (en) 2023-04-28

Family

ID=86071599

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211644371.4A Pending CN116033182A (en) 2022-12-15 2022-12-15 Method and device for determining video cover map, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116033182A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination