CN111626922A - Picture generation method and device, electronic equipment and computer readable storage medium - Google Patents

Picture generation method and device, electronic equipment and computer readable storage medium

Info

Publication number
CN111626922A
CN111626922A CN202010392055.7A
Authority
CN
China
Prior art keywords
picture
video
video frame
frame
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010392055.7A
Other languages
Chinese (zh)
Other versions
CN111626922B (en)
Inventor
卢永晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN202010392055.7A
Publication of CN111626922A
Application granted
Publication of CN111626922B
Legal status: Active
Anticipated expiration

Classifications

    • G06T3/04
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques

Abstract

The embodiments of the disclosure disclose a picture generation method and apparatus, an electronic device, and a computer-readable storage medium. The picture generation method comprises the following steps: performing frame extraction on a video to obtain video frames; inputting a video frame into a picture classification model to obtain the category of the video frame; in response to the video frame being of a first category, inputting the video frame into an object detection model to obtain the position of a first object in the video frame; and cropping, according to the position of the first object, a partial frame of the video frame that does not include the first object to generate a first picture. By classifying and cropping video frames in this way, the method solves the technical problems of low efficiency and low accuracy in generating video cover pictures in the prior art.

Description

Picture generation method and device, electronic equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of image processing, and in particular, to a method and an apparatus for generating a picture, an electronic device, and a computer-readable storage medium.
Background
With the development of Internet technology, the forms of online communication keep changing. From the early PC era to today's smartphones, accessing the Internet has become ever more convenient; people have entered the mobile Internet era, mobile terminals such as smartphones and tablet computers are increasingly popular, mobile Internet applications have gradually permeated daily life, and people can enjoy the convenience of new technology anytime and anywhere. Amid today's information explosion, plain text and static pictures alone have long lost their appeal; instead, multi-dimensional forms that integrate text, pictures, sound and other elements fully engage the senses of sight and hearing. Among these, long video and short video have become important forms of information dissemination.
In the prior art, a video cover picture is usually set to show the main content of a video when the video is presented. For example, for a restaurant introduction video, a frontal picture of the restaurant in the video can be set as the cover picture. The cover picture can be selected manually by the videographer, but this is time-consuming and labor-intensive; alternatively, the cover picture can be selected at random, in which case the selected picture may not meet the requirement, and in videos with heavy subtitling the selected picture may be marred by subtitles. The prior-art schemes for generating a video cover picture therefore suffer from inaccuracy and low efficiency.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, an embodiment of the present disclosure provides a picture generation method, including:
performing frame extraction on a video to obtain a video frame;
inputting the video frame into a picture classification model to obtain the category of the video frame;
in response to the video frame being of a first category, inputting the video frame into an object detection model to obtain the position of a first object in the video frame;
and cropping, according to the position of the first object, a partial frame of the video frame that does not include the first object to generate a first picture.
In a second aspect, an embodiment of the present disclosure provides a picture generating apparatus, including:
the frame extracting module is used for extracting frames of the video to obtain video frames;
the video frame classification module is used for inputting the video frame into a picture classification model to obtain the category of the video frame;
the first object detection module is used for, in response to the video frame being of a first category, inputting the video frame into an object detection model to obtain the position of a first object in the video frame;
and the first picture generation module is used for cropping, according to the position of the first object, a partial frame of the video frame that does not include the first object to generate a first picture.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor; and
a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the picture generation method of any one of the foregoing first aspects.
In a fourth aspect, the present disclosure provides a non-transitory computer-readable storage medium, which stores computer instructions for causing a computer to execute the picture generation method in any one of the foregoing first aspects.
The embodiments of the disclosure disclose a picture generation method and apparatus, an electronic device, and a computer-readable storage medium. The picture generation method comprises the following steps: performing frame extraction on a video to obtain video frames; inputting a video frame into a picture classification model to obtain the category of the video frame; in response to the video frame being of a first category, inputting the video frame into an object detection model to obtain the position of a first object in the video frame; and cropping, according to the position of the first object, a partial frame of the video frame that does not include the first object to generate a first picture. By classifying and cropping video frames in this way, the method solves the technical problems of low efficiency and low accuracy in generating video cover pictures in the prior art. The foregoing is a summary of the present disclosure; to promote a clear understanding of its technical means, it should be noted that the present disclosure may be embodied in other specific forms without departing from its spirit or essential attributes.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is a schematic flowchart of a picture generation method according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a specific example of step S104 in the picture generation method according to the embodiment of the present disclosure;
fig. 3 is a schematic diagram of another specific example of step S104 in the picture generation method according to the embodiment of the present disclosure;
fig. 4 is a schematic view of a usage scenario of a picture generation method according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an embodiment of a picture generating apparatus provided in the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device provided according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the disclosure are for illustration only and are not intended to limit its protection scope.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It should be noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than limiting; those skilled in the art will understand them to mean "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Fig. 1 is a flowchart of an embodiment of the picture generation method provided by this disclosure. The picture generation method of this embodiment may be executed by a picture generation apparatus, which may be implemented as software or as a combination of software and hardware, and which may be integrated in a device of a picture generation system, such as a picture generation server or a picture generation terminal device. As shown in fig. 1, the method comprises the following steps:
step S101, performing frame extraction on a video to obtain a video frame;
in the present disclosure, the video is a produced video, and an exemplary video is a video uploaded to each large video website or short video application.
A frame extraction process is performed on the video. Illustratively, video frames are extracted at a predetermined frequency, for example once every 0.1 second, in which case the number of extracted frames depends on the frequency and the length of the video. Alternatively, frames are extracted at random up to a total target count: if 100 video frames are required, they are sampled randomly from the video, regardless of whether the intervals between the 100 frames are uniform.
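The two sampling strategies just described (fixed-interval and random extraction) can be sketched as follows, assuming OpenCV is used for decoding; the function name and parameters are illustrative, not part of the patent:

import random
import cv2  # OpenCV, used here only for video decoding

def extract_frames(video_path, interval_s=0.1, max_random=None):
    """Extract frames at a fixed interval, or sample max_random frames at random."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    if max_random is not None:
        # Random strategy: pick frame indices uniformly; intervals need not be even.
        indices = sorted(random.sample(range(total), min(max_random, total)))
    else:
        # Fixed-frequency strategy: one frame every interval_s seconds.
        step = max(1, int(fps * interval_s))
        indices = range(0, total, step)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames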
Illustratively, step S101 further includes: inputting the video into a frame extraction model to obtain video frames. The frame extraction model extracts only video frames that satisfy certain conditions. For example, if a single color occupies more than 70% of a video frame, it can be judged that the frame is unlikely to serve as a cover picture, and the frame is not extracted. Alternatively, the frame extraction model may target a specific object and extract only the video frames that contain that target object, which makes the subsequent cover-picture generation more accurate.
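The color-proportion filter mentioned above can be sketched as follows. This is one possible reading, with an illustrative quantization scheme, since the patent does not specify how the proportion of "a certain color" is measured:

import numpy as np

def dominant_color_ratio(frame, bins=16):
    """Fraction of pixels that fall in the most common quantized color bucket."""
    quantized = (frame // (256 // bins)).reshape(-1, frame.shape[-1])
    _, counts = np.unique(quantized, axis=0, return_counts=True)
    return counts.max() / counts.sum()

def keep_frame(frame, threshold=0.7):
    # Discard frames where one color dominates more than 70% of the pixels,
    # since such frames are unlikely to make a good cover picture.
    return dominant_color_ratio(frame) <= threshold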
Other frame extraction methods can also be used in the technical solution; the present disclosure does not limit this, and they are not described here again.
Step S102, inputting the video frame into a picture classification model to obtain the category of the video frame;
the image classification model is a classification model trained in advance. Illustratively, the picture classification model is a binary classification model, which classifies pictures into two types: a first category of video frames including the first object and the second object and a second category of video frames including only the second object. Illustratively, the first object is text and the second object is a target object such as a store of a restaurant.
The picture classification model is trained in advance on a training picture set that contains pictures labeled as including only the second object and pictures labeled as including both the first object and the second object. Illustratively, the picture classification model uses a resnet18 network trained with a sigmoid loss function. The picture classification model may also be a multi-class model trained with a softmax loss function; in that case, one more category can be added to the types above: pictures that include neither the first object nor the second object. Video frames unsuitable as cover pictures can thereby be filtered out directly.
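A minimal sketch of such a binary classifier, assuming PyTorch and torchvision (the patent names the resnet18 network and a sigmoid loss but no framework); training, trained weights, and image preprocessing are omitted:

import torch
import torch.nn as nn
from torchvision import models

# Binary classifier: does the frame contain the first object (e.g. subtitles)
# in addition to the second object? A single logit trained with a sigmoid
# loss (BCEWithLogitsLoss) matches the two-category setup described above.
model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, 1)
criterion = nn.BCEWithLogitsLoss()  # used during training
model.eval()

def classify(frame_tensor, threshold=0.5):
    """frame_tensor: (1, 3, H, W) normalized image; True means first category."""
    with torch.no_grad():
        logit = model(frame_tensor)
    return torch.sigmoid(logit).item() > threshold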
In one case, a suitable video frame may be selected directly at this step as the cover picture. Optionally, therefore, step S102 is followed by: in response to the video frame being of a second category, taking the video frame as the first picture. If the classification in step S102 yields store pictures of the second category, such a picture can be used directly as the cover picture of the video.
Further, the classification result of S102 may contain multiple video frames all classified into the second category. In this case, the taking the video frame as the first picture in response to the video frame being of the second category further includes: in response to there being multiple video frames of the second category, selecting, as the first picture, the video frame in which the second object occupies the largest proportion. Because the frame with the largest second-object proportion is used as the cover picture, that proportion must be computed; this requires detecting the extent of the second object with object detection and then calculating its area. Alternatively, the earliest of the multiple second-category video frames may be selected directly as the cover picture. It will be understood that, when there are multiple second-category video frames, any policy may be used to select one of them as the first picture.
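This selection policy can be sketched as follows, assuming the object detection model has already produced one bounding box per second-category frame; the helper names and the (x1, y1, x2, y2) box format are illustrative assumptions:

def object_ratio(box, frame_shape):
    """box: (x1, y1, x2, y2) bounding box of the second object; frame_shape: (H, W, C)."""
    x1, y1, x2, y2 = box
    return ((x2 - x1) * (y2 - y1)) / (frame_shape[0] * frame_shape[1])

def pick_cover(frames, boxes):
    # Choose the second-category frame whose detected second object
    # covers the largest fraction of the frame area.
    return max(zip(frames, boxes),
               key=lambda fb: object_ratio(fb[1], fb[0].shape))[0]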
Step S103, in response to the video frame being of the first category, inputting the video frame into an object detection model to obtain the position of a first object in the video frame;
Optionally, the first category consists of video frames that include both the first object and the second object. This step covers the case where no video frame was classified into the second category in step S102. For example, some videos carry subtitles in every frame from the very beginning, so no extracted frame contains only the store image; step S103 is then needed to detect the text position of the subtitles in these video frames. Illustratively, a detection model for the target object, such as the EAST model or the SSD model, may be used in this step to obtain a bounding box around the first object; this bounding box is the position of the first object.
Step S104, cropping, according to the position of the first object, a partial frame of the video frame that does not include the first object to generate a first picture;
in this step, a partial frame is obtained from the video frame by the position of the first object, and the partial frame does not include the first object. Optionally, the step S104 includes: step S201, dividing the video frame into a first partial frame including the first object and a second partial frame not including the first object according to the position of the first object in the video frame, wherein the area of the first partial frame is smaller than that of the second partial frame; step S202, a first picture is generated according to the second partial frame. Taking the first object as the subtitle and the second object as the store as an example, the position of the subtitle may be located at the lower part or the upper part or both the left and right sides of the video frame, and then the video frame may be cut into two parts according to the outer bounding box of the subtitle obtained in step S103, typically, the subtitle is located at the lower part of the video frame, and then the video frame may be cut into two parts according to the upper boundary of the outer bounding box of the subtitle, and then the partial frame of the upper part of the video frame has no subtitle and only has an image of the store, and at this time, the area of the second partial frame with only an image of the store is larger than the area of the first partial frame with only a subtitle. The first picture may then be generated from the second partial frame. Typically, the second partial frame may be directly used as the first picture.
However, if the first picture is to serve as the video's cover picture, using the second partial frame directly may leave part of the cover blank. Optionally, therefore, step S202 includes: in response to the second partial frame satisfying a first condition, processing the second partial frame into a first picture with the same area as the video frame. Optionally, the first condition is: the ratio of the area of the second partial frame to the area of the video frame is greater than a first threshold, illustratively 60%. When the second partial frame is large enough, it is suitable for processing into the first picture. Specifically, the processing may be padding or stretching. If the ratio of the area of the second partial frame to the area of the video frame is greater than a second threshold, the partial frame may be padded directly into the first picture; for example, if the partial frame occupies 90% of the video frame's area, the first picture can be generated by padding 10% of black below it. If the ratio is greater than the first threshold but smaller than the second threshold, where the first threshold is 60% and the second threshold is 90%, directly padding with color would produce a first picture unsuitable as a cover, so the second partial frame may instead be stretched to the same size as the video frame to generate the first picture.
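A sketch of this padding-or-stretching decision under the illustrative 60%/90% thresholds, assuming numpy frames and OpenCV for resizing:

import cv2
import numpy as np

def to_cover(partial, full_shape, t1=0.6, t2=0.9):
    """Turn a subtitle-free partial frame into a full-size cover picture."""
    full_h, full_w = full_shape[:2]
    ratio = (partial.shape[0] * partial.shape[1]) / (full_h * full_w)
    if ratio > t2:
        # Large enough: pad the missing strip below with black.
        pad = np.zeros((full_h - partial.shape[0], full_w, 3), dtype=partial.dtype)
        return np.vstack([partial, pad])
    if ratio > t1:
        # Between the two thresholds: stretch to the original frame size instead.
        return cv2.resize(partial, (full_w, full_h))
    return None  # too small to make a usable cover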
Similarly, the above embodiment selects only one first-category video frame to generate the first picture. If several frames are identified as first-category, the frame with the largest second-object proportion, or the earliest first-category frame, may likewise be selected for cropping out the second partial frame to generate the first picture; this is not repeated here.
In another embodiment, the first picture can also be synthesized from multiple video frames, which achieves a better effect. Optionally, step S104 includes:
step S301, obtaining a plurality of partial frames without the first object according to a plurality of video frames;
step S302, synthesizing a first picture according to the plurality of partial frames not including the first object.
In this embodiment, the position of the first object is not fixed across video frames. For example, if the first object is a subtitle, its position may change from frame to frame; a complete first picture can then be generated from the complementarity between multiple frames. In step S301, multiple second partial frames, each containing only the second object and not the first, may be obtained from multiple video frames. The color values of these second partial frames may then be summed and averaged to obtain the first picture.
Optionally, step S302 further includes: applying a union operation to consecutive partial frames among the plurality of partial frames that do not include the first object to obtain a first picture. Because consecutive frames are more strongly complementary, two consecutive frames among the second partial frames may, for example, be selected and their color values combined by a union operation to obtain the first picture. The union operation is: compare the colors at corresponding positions of the two second partial frames; if they are the same, leave the color unchanged; if one position has a first color and the other does not, use the first color as the color of that position, thereby generating the first picture. Illustratively, across two consecutive video frames the image changes little while the subtitle position shifts up or down, and the synthesis above can produce a first picture that includes the complete second object.
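The union operation is described only loosely, so the following is one possible reading rather than the patent's exact procedure. It assumes the subtitle region of each partial frame has been masked out, so that wherever only one frame is valid its color "wins", and wherever both are valid the (nearly identical) colors are averaged:

import numpy as np

def union_frames(frame_a, frame_b, mask_a, mask_b):
    """Combine two aligned partial frames whose subtitle regions are masked.

    mask_a / mask_b: boolean arrays, True where the pixel is valid (no subtitle).
    """
    out = np.where(mask_a[..., None], frame_a, frame_b)
    # Where both frames are valid, average to suppress noise; consecutive
    # frames change little, so the two colors are expected to be close.
    both = mask_a & mask_b
    out[both] = ((frame_a[both].astype(np.uint16) +
                  frame_b[both].astype(np.uint16)) // 2).astype(frame_a.dtype)
    return out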
Fig. 4 shows a usage scenario of the technical solution of the present disclosure, in which the first object is a subtitle and the second object is a store image. As shown in fig. 4, frame extraction is performed on a video to obtain multiple video frames 401. The video frames 401 are input into the picture classification model 402, which divides them into two kinds: video frames 404 with both subtitles and the store, and video frames 403 with the store only. If frames 403 exist, one of them can be selected directly as the first picture; otherwise the first picture is produced from a frame 404. The video frame 404 is input into the text detection model 405, which outputs the position of the subtitles in the frame; as shown at 406 in fig. 4, the output carries a subtitle position box 407. The subtitles are then cropped away from the video frame along this box, and the cropped frame is processed into a first picture 408 of the same size as the input video frame, which can be used as the cover picture of the video.
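Putting the pieces together, an end-to-end sketch of the scenario in fig. 4 might look as follows. Every helper here (extract_frames, keep_frame, classify, split_by_subtitle, to_cover) is one of the illustrative functions above, and detect_text and preprocess are assumed stand-ins for the EAST/SSD-style text detector and the image normalization, not APIs defined by the patent:

def generate_cover(video_path, detect_text, preprocess):
    """End-to-end sketch; detect_text returns an (x1, y1, x2, y2) subtitle box."""
    frames = [f for f in extract_frames(video_path, interval_s=0.1) if keep_frame(f)]
    second_cat, first_cat = [], []
    for f in frames:
        # classify() returns True for the first category (subtitles + store).
        (first_cat if classify(preprocess(f)) else second_cat).append(f)
    if second_cat:
        return second_cat[0]  # any selection policy works, e.g. pick_cover(...)
    frame = first_cat[0]
    box = detect_text(frame)
    partial, _ = split_by_subtitle(frame, box)
    return to_cover(partial, frame.shape)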
With this scheme, a cover picture suited to the video can be generated automatically, eliminating the time and labor of setting a cover picture manually and overcoming the low efficiency and poor results of randomly selecting a cover picture in the prior art.
The embodiments of the disclosure disclose a picture generation method and apparatus, an electronic device, and a computer-readable storage medium. The picture generation method comprises the following steps: performing frame extraction on a video to obtain video frames; inputting a video frame into a picture classification model to obtain the category of the video frame; in response to the video frame being of a first category, inputting the video frame into an object detection model to obtain the position of a first object in the video frame; and cropping, according to the position of the first object, a partial frame of the video frame that does not include the first object to generate a first picture. By classifying and cropping video frames in this way, the method solves the technical problems of low efficiency and low accuracy in generating video cover pictures in the prior art.
Although the steps in the above method embodiments are described in the order given, it should be clear to those skilled in the art that the steps of the embodiments of the present disclosure need not be performed in that order; they may also be executed in reverse, in parallel, or interleaved, and further steps may be added on the basis of the above. Such obvious variations or equivalent substitutions also fall within the protection scope of the present disclosure and are not repeated here.
Fig. 5 is a schematic structural diagram of an embodiment of a picture generation apparatus according to an embodiment of the present disclosure. As shown in fig. 5, the apparatus 500 includes: a frame extraction module 501, a video frame classification module 502, a first object detection module 503 and a first picture generation module 504. Specifically:
a frame extracting module 501, configured to perform frame extraction on a video to obtain a video frame;
a video frame classification module 502, configured to input the video frame into a picture classification model to obtain a category of the video frame;
a first object detection module 503, configured to, in response to the video frame being of a first category, input the video frame into an object detection model to obtain the position of a first object in the video frame;
a first picture generation module 504, configured to crop, according to the position of the first object, a partial frame of the video frame that does not include the first object to generate a first picture.
Further, the picture generation apparatus further includes:
and the first picture selection module is used for, in response to the video frame being of the second category, taking the video frame as the first picture.
Further, the categories of the video frames include: a first category of video frames that include both the first object and the second object, and a second category of video frames that include only the second object.
Further, the first picture selection module is further configured to: in response to there being a plurality of video frames of the second category, select the video frame in which the second object occupies the largest proportion as the first picture.
Further, the first picture generating module 504 further includes:
the first cropping module is used for dividing the video frame, according to the position of the first object in the video frame, into a first partial frame that includes the first object and a second partial frame that does not, wherein the area of the first partial frame is smaller than that of the second partial frame;
and the generating module is used for generating the first picture according to the second partial frame.
Further, the generating module is further configured to:
in response to the second partial frame satisfying a first condition, processing the second partial frame into a first picture having the same area as the video frame.
Further, the first condition includes: the ratio of the area of the second partial frame to the area of the video frame is greater than a first threshold.
Further, the first picture generating module 504 further includes:
the second cropping module is used for obtaining, from a plurality of video frames, a plurality of partial frames that do not include the first object;
and the synthesizing module is used for synthesizing a first picture from the plurality of partial frames that do not include the first object.
Further, the synthesis module is further configured to:
apply a union operation to consecutive partial frames among the plurality of partial frames that do not include the first object to obtain a first picture.
The apparatus shown in fig. 5 can perform the methods of the embodiments shown in fig. 1 to fig. 4; for parts of this embodiment not described in detail, and for the implementation process and technical effects of the technical solution, refer to the descriptions of those embodiments, which are not repeated here.
Referring now to FIG. 6, a block diagram of an electronic device 600 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: perform frame extraction on a video to obtain video frames; input a video frame into a picture classification model to obtain the category of the video frame; in response to the video frame being of a first category, input the video frame into an object detection model to obtain the position of a first object in the video frame; and crop, according to the position of the first object, a partial frame of the video frame that does not include the first object to generate a first picture.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, there is provided a picture generation method including:
performing frame extraction on a video to obtain a video frame;
inputting the video frame into a picture classification model to obtain the category of the video frame;
in response to the video frame being of a first category, inputting the video frame into an object detection model to obtain the position of a first object in the video frame;
and cropping, according to the position of the first object, a partial frame of the video frame that does not include the first object to generate a first picture.
Further, after the inputting the video frame into the picture classification model to obtain the category of the video frame, the method further includes:
in response to the video frame being of a second category, taking the video frame as the first picture.
Further, the categories of the video frames include:
a first category of video frames that include both the first object and the second object, and a second category of video frames that include only the second object.
Further, the taking the video frame as the first picture in response to the video frame being of the second category includes:
in response to there being a plurality of video frames of the second category, selecting the video frame in which the second object occupies the largest proportion as the first picture.
Further, the cropping, according to the position of the first object, of a partial frame of the video frame that does not include the first object to generate a first picture includes:
dividing the video frame into a first partial frame including the first object and a second partial frame not including the first object according to the position of the first object in the video frame, wherein the area of the first partial frame is smaller than that of the second partial frame;
and generating a first picture according to the second partial frame.
Further, the generating the first picture according to the second partial frame includes:
in response to the second partial frame satisfying a first condition, processing the second partial frame into a first picture having the same area as the video frame.
Further, the first condition includes:
the ratio of the area of the second partial frame to the area of the video frame is greater than a first threshold.
Further, the cropping, according to the position of the first object, of a partial frame of the video frame that does not include the first object to generate a first picture includes:
obtaining, from a plurality of video frames, a plurality of partial frames that do not include the first object;
and synthesizing a first picture from the plurality of partial frames that do not include the first object.
Further, the synthesizing a first picture from the plurality of partial frames that do not include the first object includes:
applying a union operation to consecutive partial frames among the plurality of partial frames that do not include the first object to obtain a first picture.
According to one or more embodiments of the present disclosure, there is provided a picture generation apparatus including:
the frame extracting module is used for extracting frames of the video to obtain video frames;
the video frame classification module is used for inputting the video frame into a picture classification model to obtain the category of the video frame;
the first object detection module is used for, in response to the video frame being of a first category, inputting the video frame into an object detection model to obtain the position of a first object in the video frame;
and the first picture generation module is used for cropping, according to the position of the first object, a partial frame of the video frame that does not include the first object to generate a first picture.
Further, the picture generation apparatus further includes:
and the first picture selection module is used for, in response to the video frame being of the second category, taking the video frame as the first picture.
Further, the categories of the video frames include: a first category of video frames that include both the first object and the second object, and a second category of video frames that include only the second object.
Further, the first picture selection module is further configured to: in response to there being a plurality of video frames of the second category, select the video frame in which the second object occupies the largest proportion as the first picture.
Further, the first picture generation module further includes:
the first cropping module is used for dividing the video frame, according to the position of the first object in the video frame, into a first partial frame that includes the first object and a second partial frame that does not, wherein the area of the first partial frame is smaller than that of the second partial frame;
and the generating module is used for generating the first picture according to the second partial frame.
Further, the generating module is further configured to:
in response to the second partial frame satisfying a first condition, processing the second partial frame into a first picture having the same area as the video frame.
Further, the first condition includes: the ratio of the area of the second partial frame to the area of the video frame is greater than a first threshold.
Further, the first picture generation module further includes:
the second cropping module is used for obtaining, from a plurality of video frames, a plurality of partial frames that do not include the first object;
and the synthesizing module is used for synthesizing a first picture from the plurality of partial frames that do not include the first object.
Further, the synthesis module is further configured to:
apply a union operation to consecutive partial frames among the plurality of partial frames that do not include the first object to obtain a first picture.
According to one or more embodiments of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the picture generation method of any of the preceding first aspects.
According to one or more embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions for causing a computer to execute any one of the picture generation methods of the foregoing first aspect.
The foregoing description is only of the preferred embodiments of the disclosure and an explanation of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combination of features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the above features with (but not limited to) features with similar functions disclosed in this disclosure.

Claims (12)

1. A picture generation method, comprising:
performing frame extraction on a video to obtain a video frame;
inputting the video frame into a picture classification model to obtain the category of the video frame;
in response to the video frame being of a first category, inputting the video frame into an object detection model to obtain the position of a first object in the video frame;
and cropping, according to the position of the first object, a partial frame of the video frame that does not include the first object to generate a first picture.
2. The picture generation method as claimed in claim 1, wherein after the inputting of the video frame into a picture classification model to obtain the category of the video frame, the method further comprises:
in response to the video frame being of a second category, taking the video frame as the first picture.
3. The picture generation method as claimed in claim 1 or 2, wherein the categories of the video frames include:
a first category of video frames that include both the first object and the second object, and a second category of video frames that include only the second object.
4. The picture generation method of claim 3, wherein the taking the video frame as the first picture in response to the video frame being of the second category comprises:
in response to there being a plurality of video frames of the second category, selecting the video frame in which the second object occupies the largest proportion as the first picture.
5. The picture generation method according to claim 1, wherein the cropping, according to the position of the first object, of a partial frame of the video frame that does not include the first object to generate a first picture comprises:
dividing the video frame, according to the position of the first object in the video frame, into a first partial frame that includes the first object and a second partial frame that does not, wherein the area of the first partial frame is smaller than that of the second partial frame;
and generating a first picture according to the second partial frame.
6. The picture generation method of claim 5, wherein the generating the first picture from the second partial frame comprises:
in response to the second partial frame satisfying a first condition, processing the second partial frame into a first picture having the same area as the video frame.
7. The picture generation method as claimed in claim 6, wherein the first condition includes:
the ratio of the area of the second partial frame to the area of the video frame is greater than a first threshold.
8. The picture generation method according to claim 1, wherein the cropping, according to the position of the first object, of a partial frame of the video frame that does not include the first object to generate a first picture comprises:
obtaining, from a plurality of video frames, a plurality of partial frames that do not include the first object;
and synthesizing a first picture from the plurality of partial frames that do not include the first object.
9. The picture generation method according to claim 8, wherein the synthesizing a first picture from the plurality of partial frames that do not include the first object comprises:
applying a union operation to consecutive partial frames among the plurality of partial frames that do not include the first object to obtain a first picture.
10. A picture generation apparatus comprising:
the frame extracting module is used for extracting frames of the video to obtain video frames;
the video frame classification module is used for inputting the video frame into a picture classification model to obtain the category of the video frame;
the first object detection module is used for, in response to the video frame being of a first category, inputting the video frame into an object detection model to obtain the position of a first object in the video frame;
and the first picture generation module is used for cropping, according to the position of the first object, a partial frame of the video frame that does not include the first object to generate a first picture.
11. An electronic device, comprising:
a memory for storing computer readable instructions; and
a processor for executing the computer readable instructions, such that the processor when running implements the picture generation method according to any one of claims 1-9.
12. A non-transitory computer-readable storage medium storing computer-readable instructions that, when executed by a computer, cause the computer to perform the picture generation method of any one of claims 1-9.
CN202010392055.7A 2020-05-11 2020-05-11 Picture generation method and device, electronic equipment and computer readable storage medium Active CN111626922B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010392055.7A CN111626922B (en) 2020-05-11 2020-05-11 Picture generation method and device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010392055.7A CN111626922B (en) 2020-05-11 2020-05-11 Picture generation method and device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111626922A 2020-09-04
CN111626922B CN111626922B (en) 2023-09-15

Family

ID=72272484

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010392055.7A Active CN111626922B (en) 2020-05-11 2020-05-11 Picture generation method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111626922B (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110141251A1 (en) * 2009-12-10 2011-06-16 Marks Tim K Method and System for Segmenting Moving Objects from Images Using Foreground Extraction
US20120123780A1 (en) * 2010-11-15 2012-05-17 Futurewei Technologies, Inc. Method and system for video summarization
CN103106911A (en) * 2011-10-19 2013-05-15 三菱电机株式会社 Video processing device, video display device, video recording device, video processing method, and recording medium
US10163173B1 (en) * 2013-03-06 2018-12-25 Google Llc Methods for generating a cover photo with user provided pictures
US20170140249A1 (en) * 2015-11-13 2017-05-18 Microsoft Technology Licensing, Llc Visual representations of photo albums
CN110324662A (en) * 2019-06-28 2019-10-11 北京奇艺世纪科技有限公司 A kind of video cover generation method and device
CN110287949A (en) * 2019-07-30 2019-09-27 腾讯音乐娱乐科技(深圳)有限公司 Video clip extracting method, device, equipment and storage medium
CN110602554A (en) * 2019-08-16 2019-12-20 华为技术有限公司 Cover image determining method, device and equipment
CN110929070A (en) * 2019-12-09 2020-03-27 北京字节跳动网络技术有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN110991373A (en) * 2019-12-09 2020-04-10 北京字节跳动网络技术有限公司 Image processing method, image processing apparatus, electronic device, and medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177603A (en) * 2021-05-12 2021-07-27 中移智行网络科技有限公司 Training method of classification model, video classification method and related equipment
CN113177603B (en) * 2021-05-12 2022-05-06 中移智行网络科技有限公司 Training method of classification model, video classification method and related equipment
CN114394100A (en) * 2022-01-12 2022-04-26 深圳力维智联技术有限公司 Unmanned prowl car control system and unmanned car
CN114394100B (en) * 2022-01-12 2024-04-05 深圳力维智联技术有限公司 Unmanned patrol car control system and unmanned car

Also Published As

Publication number Publication date
CN111626922B (en) 2023-09-15

Similar Documents

Publication Publication Date Title
CN111445902B (en) Data collection method, device, storage medium and electronic equipment
CN109919244B (en) Method and apparatus for generating a scene recognition model
CN111340131A (en) Image annotation method and device, readable medium and electronic equipment
CN109961032B (en) Method and apparatus for generating classification model
CN111399729A (en) Image drawing method and device, readable medium and electronic equipment
CN114331820A (en) Image processing method, image processing device, electronic equipment and storage medium
CN109815448B (en) Slide generation method and device
CN111626922B (en) Picture generation method and device, electronic equipment and computer readable storage medium
CN113157153A (en) Content sharing method and device, electronic equipment and computer readable storage medium
CN113742025A (en) Page generation method, device, equipment and storage medium
CN111461967B (en) Picture processing method, device, equipment and computer readable medium
CN112052911A (en) Method and device for identifying riot and terrorist content in image, electronic equipment and storage medium
CN110347875B (en) Video scene classification method and device, mobile terminal and storage medium
CN111381819B (en) List creation method and device, electronic equipment and computer-readable storage medium
CN114445813A (en) Character recognition method, device, equipment and medium
CN111246273B (en) Video delivery method and device, electronic equipment and computer readable medium
CN112949430A (en) Video processing method and device, storage medium and electronic equipment
CN113628097A (en) Image special effect configuration method, image recognition method, image special effect configuration device and electronic equipment
CN111797822A (en) Character object evaluation method and device and electronic equipment
CN112672202B (en) Bullet screen processing method, equipment and storage medium
CN113033552B (en) Text recognition method and device and electronic equipment
CN115100492A (en) Yolov3 network training and PCB surface defect detection method and device
CN114612909A (en) Character recognition method and device, readable medium and electronic equipment
CN114416945A (en) Word cloud picture display method, device, equipment and medium
CN113705386A (en) Video classification method and device, readable medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant