CN111626922B - Picture generation method and device, electronic equipment and computer readable storage medium - Google Patents


Info

Publication number
CN111626922B
CN111626922B (application CN202010392055.7A)
Authority
CN
China
Prior art keywords
picture
video
frames
frame
video frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010392055.7A
Other languages
Chinese (zh)
Other versions
CN111626922A (en)
Inventor
卢永晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN202010392055.7A priority Critical patent/CN111626922B/en
Publication of CN111626922A publication Critical patent/CN111626922A/en
Application granted granted Critical
Publication of CN111626922B publication Critical patent/CN111626922B/en

Classifications

    • G06T3/04
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques

Abstract

Embodiments of the present disclosure disclose a picture generation method and apparatus, an electronic device, and a computer-readable storage medium. The picture generation method includes: extracting frames from a video to obtain video frames; inputting the video frames into a picture classification model to obtain the categories of the video frames; in response to a video frame belonging to a first category, inputting the video frame into an object detection model to obtain the position of a first object in the video frame; and cropping, according to the position of the first object, a partial frame that does not include the first object from the video frame to generate a first picture. By classifying and cropping the video frames in this way, the method solves the prior-art technical problems of low efficiency and low accuracy in generating a video cover picture.

Description

Picture generation method and device, electronic equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of image processing, and in particular to a picture generation method and apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development of internet technology, the forms of internet communication keep changing. From the early PC (personal computer) era to today's smartphones, network access has become ever more convenient: people have entered the mobile internet era, mobile terminals represented by smartphones and tablet computers are increasingly popular, and mobile internet applications have gradually permeated daily life, letting people enjoy the convenience of new technology anytime and anywhere. In today's information explosion, plain text and pictures have lost ground; in their place, content that fuses text, pictures, and sound can fully engage people's senses of sight and hearing. Among such content, long videos and short videos have become important carriers of information.
In the prior art, a cover picture is generally set for a video to display its main content when the video is presented. For an introduction video of a restaurant, for example, a shot of the restaurant's storefront in the video can be set as the cover picture. The cover picture may be selected manually by the person who shot the video, which is time-consuming and laborious; or it may be selected randomly, in which case the selected picture may not meet the requirements, and in videos with many subtitles the selected picture may be spoiled by subtitle text. The prior-art schemes for generating a video cover picture therefore suffer from inaccuracy and low efficiency.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, an embodiment of the present disclosure provides a method for generating a picture, including:
extracting frames from the video to obtain video frames;
inputting the video frames into a picture classification model to obtain the categories of the video frames;
in response to a video frame belonging to a first category, inputting the video frame into an object detection model to obtain the position of a first object in the video frame; and
cropping, according to the position of the first object, a partial frame that does not include the first object from the video frame to generate a first picture.
In a second aspect, an embodiment of the present disclosure provides a picture generation apparatus, including:
the frame extraction module is used for extracting frames of the video to obtain video frames;
the video frame classification module is used for inputting the video frames into a picture classification model to obtain the categories of the video frames;
the first object detection module is used for responding to the video frame as a first category, inputting the video frame into an object detection model to obtain the position of a first object in the video frame;
and the first picture generation module is used for intercepting part of frames which do not comprise the first object in the video frames according to the position of the first object to generate a first picture.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor; and
a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions enabling the at least one processor to perform any of the picture generation methods of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform any of the picture generation methods of the first aspect.
Embodiments of the present disclosure disclose a picture generation method and apparatus, an electronic device, and a computer-readable storage medium. The picture generation method includes: extracting frames from a video to obtain video frames; inputting the video frames into a picture classification model to obtain the categories of the video frames; in response to a video frame belonging to a first category, inputting the video frame into an object detection model to obtain the position of a first object in the video frame; and cropping, according to the position of the first object, a partial frame that does not include the first object from the video frame to generate a first picture. By classifying and cropping the video frames in this way, the method solves the prior-art technical problems of low efficiency and low accuracy in generating a video cover picture. The foregoing is only an overview of the technical solution of the present disclosure. To make the above and other objects, features, and advantages of the present disclosure clearer and easier to implement, the preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
Fig. 1 is a schematic flow chart of a picture generation method according to an embodiment of the disclosure;
fig. 2 is a schematic diagram of a specific example of step S104 in the picture generation method according to the embodiment of the present disclosure;
fig. 3 is a schematic diagram of another specific example of step S104 in the picture generation method according to the embodiment of the present disclosure;
fig. 4 is a schematic view of a usage scenario of a picture generation method according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an embodiment of a picture generation device provided by an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration only and are not intended to limit its scope of protection.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "one" and "a plurality of" in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
Fig. 1 is a flowchart of an embodiment of a picture generation method provided by an embodiment of the present disclosure. The picture generation method provided by this embodiment may be performed by a picture generation device, which may be implemented as software or as a combination of software and hardware, and may be integrated into a device in a picture generation system, such as a picture generation server or a picture generation terminal device. As shown in fig. 1, the method includes the following steps:
step S101, frame extraction is carried out on video to obtain video frames;
in this disclosure, the video is a fabricated video, and exemplary videos are uploaded to various large video websites or short video applications.
The frame extraction may extract video frames at a preset frequency, such as one frame every 0.1 seconds, in which case the number of extracted frames depends on the frequency and the length of the video; or it may extract a preset total number of frames at random positions, e.g., if 100 video frames are needed, 100 frames are extracted from random positions in the video, regardless of whether the intervals between them are uniform.
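The two sampling strategies above can be sketched as an index-selection routine; this is a minimal illustration, not part of the patent text, and the function and parameter names are hypothetical:

```python
import random

def sample_frame_indices(total_frames, fps, interval_s=None, count=None, seed=0):
    """Return the frame indices to extract from a video.

    interval_s: extract one frame every `interval_s` seconds (e.g. 0.1), or
    count: extract `count` frames at random (not necessarily uniform) positions.
    """
    if interval_s is not None:
        # Fixed-frequency extraction: frame count follows from fps and length.
        step = max(1, int(round(interval_s * fps)))
        return list(range(0, total_frames, step))
    if count is not None:
        # Random extraction of a fixed total number of frames.
        rng = random.Random(seed)
        return sorted(rng.sample(range(total_frames), min(count, total_frames)))
    raise ValueError("specify interval_s or count")
```

For a 10-second, 30 fps video (300 frames), `interval_s=0.1` yields 100 evenly spaced indices, while `count=100` yields 100 random distinct indices.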
Illustratively, step S101 further includes: inputting the video into a frame extraction model to obtain video frames. The frame extraction model extracts only video frames that meet certain conditions. For example, if a single color occupies more than 70% of a video frame, the frame can be judged unlikely to serve as a cover picture and is not extracted. Alternatively, the frame extraction model may target a specific object and extract only video frames containing that target object, so that the cover picture generated later is more accurate.
Other frame extraction methods may also be used in the technical solution of the present disclosure; the disclosure is not limited in this respect, and details are not repeated here.
Step S102, inputting the video frames into a picture classification model to obtain the categories of the video frames;
the picture classification model is a pre-trained classification model. Illustratively, the picture classification model is a classification model that classifies pictures into two types: a first category of video frames including a first object and a second category of video frames including only the second object. Illustratively, the first object is text and the second object is a target object such as a store of a restaurant.
The picture classification model is trained in advance on a training picture set that includes pictures labeled as containing only the second object and pictures containing both the first object and the second object. Illustratively, the picture classification model uses a resnet18 network trained with a sigmoid loss function. The picture classification model may also be a multi-class model trained with a softmax loss function; in this case a third category can be added to the two above: pictures that include neither the first object nor the second object. Video frames unsuitable as cover pictures can thereby be filtered out directly.
In one case, a suitable video frame may be selected directly as the cover picture at this step. Thus, optionally, after step S102 the method includes: in response to a video frame belonging to the second category, taking the video frame as the first picture. In the example, the second category contains pictures that include only the storefront; since the classification in step S102 yields a storefront picture of this category, it can be used directly as the cover picture of the video.
Further, the classification result of step S102 may contain multiple video frames of the second category. In this case, taking the video frame as the first picture in response to the video frame belonging to the second category further includes: in response to there being multiple video frames of the second category, selecting the video frame in which the second object occupies the largest proportion as the first picture. Since the video frame with the largest second-object proportion must be found, object detection is also needed here to locate the extent of the second object and compute its area. Alternatively, the earliest of the multiple second-category video frames may simply be selected as the cover picture. It will be appreciated that when there are multiple video frames of the second category, any policy may be used to select one of them as the first picture; the present disclosure does not limit this, and details are not repeated here.
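Choosing the candidate with the largest second-object proportion reduces to comparing bounding-box areas from the detector. A minimal sketch, assuming each candidate carries its detected box; all names are illustrative:

```python
def pick_cover_frame(candidates):
    """Pick the frame whose second object covers the largest fraction.

    candidates: list of (frame_id, frame_area, bbox) tuples, where
    bbox = (x1, y1, x2, y2) comes from an object detector.
    """
    def ratio(c):
        _, frame_area, (x1, y1, x2, y2) = c
        return (x2 - x1) * (y2 - y1) / frame_area
    return max(candidates, key=ratio)[0]
```

With three second-category candidates whose objects cover 25%, 48%, and 9% of the frame, the 48% one is selected.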
Step S103, in response to a video frame belonging to the first category, inputting the video frame into an object detection model to obtain the position of a first object in the video frame;
Optionally, the first category consists of video frames that include both the first object and the second object. This step handles the case where no video frame was classified into the second category in step S102. For example, some videos carry subtitles in every frame from the very beginning, so no frame containing only the storefront can be extracted; step S103 is then needed to detect the position of the subtitle text in these video frames. In this step, a target object detection model, such as an EAST model or an SSD model, may be used to obtain a bounding box of the first object; this bounding box is the position of the first object.
Step S104, cropping, according to the position of the first object, a partial frame that does not include the first object from the video frame to generate a first picture;
In this step, a partial frame that excludes the first object can be obtained from the video frame using the position of the first object. Optionally, step S104 includes: step S201, dividing the video frame, according to the position of the first object in it, into a first partial frame that includes the first object and a second partial frame that does not, where the area of the first partial frame is smaller than that of the second partial frame; and step S202, generating the first picture from the second partial frame. Taking a subtitle as the first object and a storefront as the second object: the subtitle may sit at the bottom, the top, or the left or right side of the video frame. Using the subtitle bounding box obtained in step S103, the video frame can be cut into two parts. Typically the subtitle sits at the bottom of the frame, so the frame is cut along the upper edge of the subtitle's bounding box; the upper partial frame then contains no subtitle, only the storefront image, and the area of the second partial frame (storefront only) is larger than that of the first partial frame (subtitle). The first picture is then generated from the second partial frame; typically, the second partial frame may be used directly as the first picture.
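For the typical bottom-subtitle case, step S201 is a single horizontal cut at the top edge of the subtitle's bounding box. A sketch of that split, assuming array frames and a detector-supplied box (names are illustrative):

```python
import numpy as np

def split_on_subtitle(frame, box):
    """Split a frame at the upper edge of a bottom subtitle's bounding box.

    frame: H x W x 3 array; box: (x1, y1, x2, y2) with y1 the top of the
    subtitle. Returns (second_partial, first_partial): the subtitle-free
    upper part and the subtitle-bearing lower part.
    """
    y1 = box[1]
    return frame[:y1], frame[y1:]
```

For subtitles on the left or right side, the analogous cut would be along the vertical edge of the box instead.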
However, if the first picture is to be the video's cover picture, using the second partial frame directly may leave part of the cover blank. Thus, optionally, step S202 includes: in response to the second partial frame meeting a first condition, processing the second partial frame into a first picture of the same size as the video frame. Optionally, the first condition is: the ratio of the area of the second partial frame to the area of the video frame is greater than a first threshold. Illustratively, the first threshold is 60%. When the area of the second partial frame is relatively large, it is suitable for processing into the first picture. Specifically, the processing falls into filling and stretching. If the ratio of the area of the second partial frame to the area of the video frame is greater than a second threshold, the partial frame can be padded directly into the first picture; illustratively, if the partial frame occupies 90% of the video frame's area, 10% of black can be filled in below it to generate the first picture. If the ratio is greater than the first threshold but not greater than the second threshold, the second partial frame is stretched to the same size as the video frame to generate the first picture; illustratively, with a first threshold of 60% and a second threshold of 90%, directly padding with color in this range would make the generated first picture unsuitable as a cover picture.
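The fill-versus-stretch decision above can be sketched as follows. This is an illustrative interpretation with assumed thresholds of 60% and 90% (as in the example) and a simple nearest-neighbor stretch standing in for whatever resizing the implementation actually uses:

```python
import numpy as np

def to_cover(partial, full_h, full_w, low=0.6, high=0.9):
    """Turn a subtitle-free partial frame into a full_h x full_w cover.

    area ratio > high      -> pad the bottom with black;
    low < ratio <= high    -> stretch vertically (nearest-neighbor here);
    ratio <= low           -> unsuitable, return None.
    """
    h, w = partial.shape[:2]
    ratio = (h * w) / (full_h * full_w)
    if ratio > high:
        pad = np.zeros((full_h - h, w, partial.shape[2]), dtype=partial.dtype)
        return np.vstack([partial, pad])
    if ratio > low:
        rows = np.arange(full_h) * h // full_h  # nearest-neighbor row map
        return partial[rows]
    return None
```

A 95%-area partial frame gets 5% black padding; a 70%-area one is stretched; a 50%-area one is rejected.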
In the above embodiment, only one video frame of the first category is selected to generate the first picture. If multiple video frames are classified into the first category, the video frame in which the second object occupies the largest proportion may be selected for cropping the second partial frame, or the earliest first-category video frame may be selected for cropping the second partial frame to generate the first picture; details are not repeated here.
In another embodiment, the first picture may also be synthesized from multiple video frames to achieve a better effect. Optionally, step S104 includes:
step S301, obtaining, from multiple video frames, multiple partial frames that do not include the first object;
step S302, synthesizing the first picture from the multiple partial frames that do not include the first object.
In this embodiment, the position of the first object is not fixed across video frames. Illustratively, the first object is a subtitle whose position may change from frame to frame; in this case a complete first picture can be generated from the complementarity between frames. In step S301, multiple second partial frames that include only the second object and not the first object can be obtained from multiple video frames. The pixel values of these second partial frames can then be summed and averaged to obtain the first picture.
Optionally, step S302 further includes: combining consecutive partial frames among the partial frames that do not include the first object to obtain the first picture. Since complementarity is stronger between consecutive frames, two consecutive frames among the multiple second partial frames can be selected and their color values combined to obtain the first picture. The combining operates as follows: the colors at corresponding positions of the two second partial frames are compared; if they are the same, the color is unchanged; if one is a first color and the other is not, the first color is used as the color of that position, thereby generating the first picture. Illustratively, in two consecutive video frames the image content changes little while the subtitle position shifts up or down; the above synthesis can then produce a first picture containing the complete second object.
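The per-position combine rule described above can be sketched directly; note that the choice and meaning of the "first color" is an assumption here (the patent leaves it abstract), and mismatches where neither frame carries it fall back to the first frame:

```python
import numpy as np

def combine_consecutive(frame_a, frame_b, first_color):
    """Combine two consecutive partial frames pixel by pixel.

    Where the frames agree, the shared color is kept. Where they differ
    and exactly one of them equals `first_color`, that color wins (the
    combine rule described in the patent; `first_color` is an assumed
    parameter). Other mismatches keep frame_a's pixel.
    """
    first = np.asarray(first_color, dtype=frame_a.dtype)
    a_is_first = (frame_a == first).all(axis=-1)
    b_is_first = (frame_b == first).all(axis=-1)
    out = frame_a.copy()
    # Positions where only frame_b carries the first color: take it.
    out[b_is_first & ~a_is_first] = first
    return out
```

With two mostly identical frames whose designated color appears at different positions, the result carries that color at the union of those positions.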
Fig. 4 shows a usage scenario of the technical solution of the present disclosure. In this scene the first object is a subtitle and the second object is a storefront image. As shown in fig. 4, video frames 401 are obtained by extracting frames from the video, and there may be several of them. The video frames 401 are then input into a picture classification model 402 and divided into two kinds: video frames 404 containing both subtitle and storefront, and video frames 403 containing only the storefront. If any video frame 403 is present, one of them can be selected directly as the first picture. Otherwise, the first picture is generated from the video frames 404: a video frame 404 is input into a text detection model 405, which outputs the position of the subtitle in it, shown as 406 in fig. 4 with the subtitle position box 407. The subtitle is then cropped out of the video frame according to the subtitle position box, and the cropped frame is processed into a first picture 408 of the same size as the input video frame, which can be used as the cover picture of the video.
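The Fig. 4 flow can be summarized as a small pipeline with pluggable components standing in for the trained models; everything here (function names, category labels, stubs) is illustrative, not the patent's actual implementation:

```python
def generate_cover(frames, classify, detect, crop_and_fit):
    """End-to-end sketch of the Fig. 4 flow.

    classify(frame) -> "second" (object only), "first" (object + subtitle),
    or "neither"; detect(frame) -> subtitle bounding box;
    crop_and_fit(frame, bbox) -> cover picture sized like the input frame.
    """
    second = [f for f in frames if classify(f) == "second"]
    if second:
        return second[0]   # an object-only frame serves directly as the cover
    first = [f for f in frames if classify(f) == "first"]
    if not first:
        return None        # no frame suitable for a cover
    frame = first[0]
    return crop_and_fit(frame, detect(frame))
```

With stub models, a storefront-only frame wins outright; otherwise a subtitle frame is detected, cropped, and fitted.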
Through the above scheme, a cover picture suitable for the video can be generated automatically, solving the time- and labor-consuming problem of setting the cover picture manually, as well as the prior-art technical problems of low efficiency and poor results when selecting the cover picture at random.
Embodiments of the present disclosure disclose a picture generation method and apparatus, an electronic device, and a computer-readable storage medium. The picture generation method includes: extracting frames from a video to obtain video frames; inputting the video frames into a picture classification model to obtain the categories of the video frames; in response to a video frame belonging to a first category, inputting the video frame into an object detection model to obtain the position of a first object in the video frame; and cropping, according to the position of the first object, a partial frame that does not include the first object from the video frame to generate a first picture. By classifying and cropping the video frames in this way, the method solves the prior-art technical problems of low efficiency and low accuracy in generating a video cover picture.
Although the steps in the above method embodiments are described in the order given, it should be clear to those skilled in the art that the steps in the embodiments of the present disclosure are not necessarily performed in that order; they may also be performed in reverse, in parallel, interleaved, or in other orders, and those skilled in the art may add further steps on this basis. Such obvious variants and equivalents also fall within the protection scope of the present disclosure and are not repeated here.
Fig. 5 is a schematic structural diagram of an embodiment of a picture generation device provided by an embodiment of the present disclosure. As shown in fig. 5, the device 500 includes: a frame extraction module 501, a video frame classification module 502, a first object detection module 503, and a first picture generation module 504. Specifically:
the frame extraction module 501 is configured to extract frames from a video to obtain video frames;
the video frame classification module 502 is configured to input the video frame into a picture classification model to obtain a category of the video frame;
a first object detection module 503, configured to, in response to a video frame belonging to a first category, input the video frame into an object detection model to obtain the position of a first object in the video frame;
a first picture generation module 504, configured to crop, according to the position of the first object, a partial frame that does not include the first object from the video frame to generate a first picture.
Further, the picture generation device further includes:
a first picture selection module, configured to, in response to a video frame belonging to a second category, take the video frame as the first picture.
Further, the categories of the video frames include: a first category of video frames that include a first object, and a second category of video frames that include only a second object.
Further, the first picture selection module is further configured to: in response to there being multiple video frames of the second category, select the video frame in which the second object occupies the largest proportion as the first picture.
Further, the first picture generation module 504 further includes:
a first cropping module, configured to divide the video frame, according to the position of the first object in it, into a first partial frame that includes the first object and a second partial frame that does not, where the area of the first partial frame is smaller than that of the second partial frame;
a generation module, configured to generate the first picture from the second partial frame.
Further, the generating module is further configured to:
in response to the second partial frame meeting a first condition, process the second partial frame into a first picture of the same size as the video frame.
Further, the first condition includes: the ratio of the area of the second partial frame to the area of the video frame is greater than a first threshold.
Further, the first picture generation module 504 further includes:
a second cropping module, configured to obtain, from multiple video frames, multiple partial frames that do not include the first object;
a synthesis module, configured to synthesize the first picture from the multiple partial frames that do not include the first object.
Further, the synthesis module is further configured to:
and taking and operating continuous partial frames in the partial frames which do not comprise the first object to obtain a first picture.
The apparatus of fig. 5 may perform the methods of the embodiments shown in figs. 1 to 3; for parts of this embodiment not described in detail, reference is made to the related descriptions of those embodiments. The implementation process and technical effects of this technical solution are described in the embodiments above and are not repeated here.
Referring now to fig. 6, a schematic diagram of an electronic device 600 suitable for use in implementing embodiments of the present disclosure is shown. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 6 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 6, the electronic device 600 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
In general, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, magnetic tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 shows an electronic device 600 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via communication means 609, or from storage means 608, or from ROM 602. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 601.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some implementations, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: extract frames from the video to obtain video frames; input the video frames into a picture classification model to obtain the categories of the video frames; in response to a video frame being of a first category, input the video frame into an object detection model to obtain the position of a first object in the video frame; and intercept, according to the position of the first object, a part of the video frame that does not include the first object to generate a first picture.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including, but not limited to, object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, there is provided a picture generation method including:
extracting frames from the video to obtain video frames;
inputting the video frames into a picture classification model to obtain the categories of the video frames;
in response to the video frame being of a first category, inputting the video frame into an object detection model to obtain the position of a first object in the video frame;
and intercepting, according to the position of the first object, a part of the video frame that does not include the first object to generate a first picture.
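The four steps above can be sketched as a small pipeline. This is an illustrative sketch only: `classify`, `detect`, and `crop` stand in for the picture classification model, the object detection model, and the cropping step, none of which the disclosure ties to a concrete implementation.

```python
# Hypothetical sketch of the claimed method; the three callables are
# stand-ins for the trained models and the cropping logic.

def generate_first_pictures(video_frames, classify, detect, crop):
    """For each extracted video frame of the first category, locate the
    first object and crop the part of the frame that excludes it."""
    pictures = []
    for frame in video_frames:
        if classify(frame) == "first_category":
            position = detect(frame)  # position of the first object
            pictures.append(crop(frame, position))
    return pictures
```

Frames of other categories simply pass through untouched here; the disclosure handles them separately (e.g., a second-category frame may be used directly as the first picture).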
Further, after the inputting the video frame into the picture classification model to obtain the category of the video frame, the method further includes:
and in response to the video frame being of the second category, taking the video frame as the first picture.
Further, the categories of the video frames include:
a first category of video frames including a first object, and a second category of video frames including only a second object.
Further, the responding to the video frame being of the second category, taking the video frame as the first picture, includes:
and in response to there being a plurality of video frames of the second category, selecting the video frame in which the second object occupies the largest proportion as the first picture.
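The selection rule above can be sketched as a single maximization; `object_area` and `frame_area` are assumed helper functions (not named in the disclosure) that would come from the detection stage.

```python
# Illustrative sketch: pick the second-category frame in which the
# second object occupies the largest share of the frame.

def pick_first_picture(frames, object_area, frame_area):
    """Return the frame with the largest second-object proportion."""
    return max(frames, key=lambda f: object_area(f) / frame_area(f))
```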
Further, the capturing the portion of the video frame excluding the first object according to the position of the first object generates a first picture, including:
dividing the video frame into a first partial frame including the first object and a second partial frame not including the first object according to the position of the first object in the video frame, wherein the area of the first partial frame is smaller than that of the second partial frame;
a first picture is generated from the second partial frame.
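A minimal sketch of the division step, assuming a simple horizontal split into row strips; the disclosure fixes only that the object-containing part is the smaller one, not the split geometry.

```python
# Hypothetical horizontal split: the strip of rows containing the first
# object becomes the first partial frame, and the larger object-free
# strip becomes the second partial frame.

def split_frame(height, obj_top, obj_bottom):
    """Return ((top, bottom), (top, bottom)) row ranges for the first
    and second partial frames of a frame with the given height."""
    first_partial = (obj_top, obj_bottom)
    above = (0, obj_top)
    below = (obj_bottom, height)
    # the second partial frame is the larger of the two object-free strips
    second_partial = max(above, below, key=lambda s: s[1] - s[0])
    return first_partial, second_partial
```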
Further, the generating the first picture from the second partial frame includes:
in response to the second partial frame meeting a first condition, the second partial frame is processed into a first picture having the same area as the video frame.
Further, the first condition includes:
the ratio of the area of the second partial frame to the area of the video frame is greater than a first threshold.
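The first condition reduces to a single ratio test; the threshold value 0.6 below is an illustrative assumption, since the disclosure only requires the ratio to exceed "a first threshold".

```python
# Sketch of the first condition: the object-free partial frame must be
# a large enough share of the video frame to be scaled up into a
# full-size first picture. The default threshold is an assumption.

def meets_first_condition(second_partial_area, video_frame_area, first_threshold=0.6):
    """True when area(second partial frame) / area(video frame)
    exceeds the first threshold."""
    return second_partial_area / video_frame_area > first_threshold
```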
Further, the capturing, according to the position of the first object, of a part of the video frame that does not include the first object to generate a first picture includes:
obtaining a plurality of partial frames which do not comprise the first object according to the plurality of video frames;
and synthesizing a first picture according to the plurality of partial frames which do not comprise the first object.
Further, the synthesizing the first picture according to the plurality of partial frames excluding the first object includes:
and performing a combining operation on consecutive partial frames among the plurality of partial frames that do not include the first object to obtain the first picture.
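Claim 1 spells this combining operation out position by position: identical colors are kept, and where exactly one of two consecutive partial frames shows the first color, the first color wins. A minimal sketch over frames represented as nested lists of color values follows; the fallback when the colors differ and neither is the first color is an assumption, as the disclosure leaves that case open.

```python
def combine_partial_frames(frame_a, frame_b, first_color):
    """Combine two consecutive object-free partial frames pixel by
    pixel: identical colors are unchanged; where exactly one frame
    shows the first color, the first color is used."""
    combined = []
    for row_a, row_b in zip(frame_a, frame_b):
        row = []
        for a, b in zip(row_a, row_b):
            if a == b:
                row.append(a)
            elif a == first_color or b == first_color:
                row.append(first_color)
            else:
                row.append(a)  # unspecified in the claim: keep the earlier frame's color (assumption)
        combined.append(row)
    return combined
```

Applied pairwise along the sequence of consecutive partial frames, this accumulates the first-color content (e.g., overlaid text) into a single first picture.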
According to one or more embodiments of the present disclosure, there is provided a picture generation apparatus including:
the frame extraction module is used for extracting frames of the video to obtain video frames;
the video frame classification module is used for inputting the video frames into a picture classification model to obtain the categories of the video frames;
the first object detection module is used for, in response to the video frame being of a first category, inputting the video frame into an object detection model to obtain the position of a first object in the video frame;
and the first picture generation module is used for intercepting, according to the position of the first object, a part of the video frame that does not include the first object to generate a first picture.
Further, the picture generation apparatus further includes:
and the first picture selection module is used for, in response to the video frame being of a second category, taking the video frame as the first picture.
Further, the categories of the video frames include: a first category of video frames including a first object, and a second category of video frames including only a second object.
Further, the first picture selection module is further configured to: in response to there being a plurality of video frames of the second category, select the video frame in which the second object occupies the largest proportion as the first picture.
Further, the first picture generation module further includes:
the first intercepting module is used for dividing the video frame into a first partial frame including the first object and a second partial frame not including the first object according to the position of the first object in the video frame, wherein the area of the first partial frame is smaller than that of the second partial frame;
and the generation module is used for generating the first picture according to the second partial frame.
Further, the generating module is further configured to:
in response to the second partial frame meeting a first condition, the second partial frame is processed into a first picture having the same area as the video frame.
Further, the first condition includes: the ratio of the area of the second partial frame to the area of the video frame is greater than a first threshold.
Further, the first picture generation module further includes:
the second intercepting module is used for obtaining a plurality of partial frames which do not comprise the first object according to a plurality of video frames;
and the synthesis module is used for synthesizing the first picture according to the plurality of partial frames which do not comprise the first object.
Further, the synthesis module is further configured to:
perform a combining operation on consecutive partial frames among the partial frames that do not include the first object to obtain the first picture.
According to one or more embodiments of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any one of the picture generation methods of the first aspect described above.
According to one or more embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform any of the picture generation methods of the foregoing first aspect.
The foregoing description is only of the preferred embodiments of the present disclosure and an explanation of the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the specific combinations of the features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Claims (10)

1. A picture generation method, characterized by comprising:
extracting frames from the video to obtain video frames;
inputting the video frames into a picture classification model to obtain the categories of the video frames;
in response to the video frame being of a first category, inputting the video frame into an object detection model to obtain the position of a first object in the video frame;
intercepting, according to the position of the first object, a part of the video frame that does not include the first object to generate a first picture; wherein,
the capturing, according to the position of the first object, of a part of the video frame that does not include the first object to generate a first picture includes:
obtaining a plurality of partial frames which do not comprise the first object according to the plurality of video frames;
performing a combining operation on consecutive partial frames among the plurality of partial frames that do not include the first object to obtain the first picture; wherein the combining operation includes:
comparing the colors at corresponding positions of two consecutive partial frames: if the colors are the same, the color is left unchanged; if one is a first color and the other is not, the first color is used as the color at that position, thereby generating the first picture.
2. The picture generation method according to claim 1, wherein after the inputting the video frame into the picture classification model results in the category of the video frame, further comprising:
and in response to the video frame being of the second category, taking the video frame as the first picture.
3. The picture generation method according to any one of claims 1 or 2, wherein the category of the video frame includes:
a first category of video frames including a first object and a second category of video frames including only a second object.
4. A picture generation method according to claim 3, wherein the responding to the video frame being of the second category, taking the video frame as the first picture, comprises:
and in response to there being a plurality of video frames of the second category, selecting the video frame in which the second object occupies the largest proportion as the first picture.
5. The picture generation method according to claim 1, wherein the capturing the portion of the video frame excluding the first object according to the position of the first object generates the first picture, comprises:
dividing the video frame into a first partial frame including the first object and a second partial frame not including the first object according to the position of the first object in the video frame, wherein the area of the first partial frame is smaller than that of the second partial frame;
a first picture is generated from the second partial frame.
6. The picture generation method of claim 5, wherein the generating the first picture from the second partial frame comprises:
in response to the second partial frame meeting a first condition, the second partial frame is processed into a first picture having the same area as the video frame.
7. The picture generation method as claimed in claim 6, wherein the first condition comprises:
the ratio of the area of the second partial frame to the area of the video frame is greater than a first threshold.
8. A picture generation apparatus comprising:
the frame extraction module is used for extracting frames of the video to obtain video frames;
the video frame classification module is used for inputting the video frames into a picture classification model to obtain the categories of the video frames;
the first object detection module is used for responding to the video frame as a first category, inputting the video frame into an object detection model to obtain the position of a first object in the video frame;
the first picture generation module is used for intercepting, according to the position of the first object, a part of the video frame that does not include the first object to generate a first picture;
the first picture generation module further includes:
the second intercepting module is used for obtaining a plurality of partial frames which do not comprise the first object according to a plurality of video frames;
a synthesis module, configured to perform a combining operation on consecutive partial frames among the partial frames that do not include the first object to obtain the first picture; wherein the combining operation includes:
comparing the colors at corresponding positions of two consecutive partial frames: if the colors are the same, the color is left unchanged; if one is a first color and the other is not, the first color is used as the color at that position, thereby generating the first picture.
9. An electronic device, comprising:
a memory for storing computer readable instructions; and
a processor configured to execute the computer readable instructions such that the processor when executed implements the picture generation method according to any one of claims 1-7.
10. A non-transitory computer readable storage medium storing computer readable instructions which, when executed by a computer, cause the computer to perform the picture generation method of any of claims 1-7.
CN202010392055.7A 2020-05-11 2020-05-11 Picture generation method and device, electronic equipment and computer readable storage medium Active CN111626922B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010392055.7A CN111626922B (en) 2020-05-11 2020-05-11 Picture generation method and device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010392055.7A CN111626922B (en) 2020-05-11 2020-05-11 Picture generation method and device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111626922A CN111626922A (en) 2020-09-04
CN111626922B true CN111626922B (en) 2023-09-15

Family

ID=72272484

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010392055.7A Active CN111626922B (en) 2020-05-11 2020-05-11 Picture generation method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111626922B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177603B (en) * 2021-05-12 2022-05-06 中移智行网络科技有限公司 Training method of classification model, video classification method and related equipment
CN114394100B (en) * 2022-01-12 2024-04-05 深圳力维智联技术有限公司 Unmanned patrol car control system and unmanned car

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103106911A (en) * 2011-10-19 2013-05-15 三菱电机株式会社 Video processing device, video display device, video recording device, video processing method, and recording medium
US10163173B1 (en) * 2013-03-06 2018-12-25 Google Llc Methods for generating a cover photo with user provided pictures
CN110287949A (en) * 2019-07-30 2019-09-27 腾讯音乐娱乐科技(深圳)有限公司 Video clip extracting method, device, equipment and storage medium
CN110324662A (en) * 2019-06-28 2019-10-11 北京奇艺世纪科技有限公司 A kind of video cover generation method and device
CN110602554A (en) * 2019-08-16 2019-12-20 华为技术有限公司 Cover image determining method, device and equipment
CN110929070A (en) * 2019-12-09 2020-03-27 北京字节跳动网络技术有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN110991373A (en) * 2019-12-09 2020-04-10 北京字节跳动网络技术有限公司 Image processing method, image processing apparatus, electronic device, and medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8941726B2 (en) * 2009-12-10 2015-01-27 Mitsubishi Electric Research Laboratories, Inc. Method and system for segmenting moving objects from images using foreground extraction
CN103210651B (en) * 2010-11-15 2016-11-09 华为技术有限公司 Method and system for video summary
US9904872B2 (en) * 2015-11-13 2018-02-27 Microsoft Technology Licensing, Llc Visual representations of photo albums

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103106911A (en) * 2011-10-19 2013-05-15 三菱电机株式会社 Video processing device, video display device, video recording device, video processing method, and recording medium
US10163173B1 (en) * 2013-03-06 2018-12-25 Google Llc Methods for generating a cover photo with user provided pictures
CN110324662A (en) * 2019-06-28 2019-10-11 北京奇艺世纪科技有限公司 A kind of video cover generation method and device
CN110287949A (en) * 2019-07-30 2019-09-27 腾讯音乐娱乐科技(深圳)有限公司 Video clip extracting method, device, equipment and storage medium
CN110602554A (en) * 2019-08-16 2019-12-20 华为技术有限公司 Cover image determining method, device and equipment
CN110929070A (en) * 2019-12-09 2020-03-27 北京字节跳动网络技术有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN110991373A (en) * 2019-12-09 2020-04-10 北京字节跳动网络技术有限公司 Image processing method, image processing apparatus, electronic device, and medium

Also Published As

Publication number Publication date
CN111626922A (en) 2020-09-04

Similar Documents

Publication Publication Date Title
US20210019562A1 (en) Image processing method and apparatus and storage medium
CN111445902B (en) Data collection method, device, storage medium and electronic equipment
CN112954450B (en) Video processing method and device, electronic equipment and storage medium
CN110796664B (en) Image processing method, device, electronic equipment and computer readable storage medium
CN104010124A (en) Method and device for displaying filter effect, and mobile terminal
CN109145970B (en) Image-based question and answer processing method and device, electronic equipment and storage medium
CN114331820A (en) Image processing method, image processing device, electronic equipment and storage medium
US11922597B2 (en) Video processing method and apparatus, readable medium, and electronic device
CN111626922B (en) Picture generation method and device, electronic equipment and computer readable storage medium
CN114630057B (en) Method and device for determining special effect video, electronic equipment and storage medium
CN113742025A (en) Page generation method, device, equipment and storage medium
US11871137B2 (en) Method and apparatus for converting picture into video, and device and storage medium
CN110619602B (en) Image generation method and device, electronic equipment and storage medium
CN113628097A (en) Image special effect configuration method, image recognition method, image special effect configuration device and electronic equipment
US11810336B2 (en) Object display method and apparatus, electronic device, and computer readable storage medium
CN113905177B (en) Video generation method, device, equipment and storage medium
CN117319736A (en) Video processing method, device, electronic equipment and storage medium
CN114399696A (en) Target detection method and device, storage medium and electronic equipment
CN111353536B (en) Image labeling method and device, readable medium and electronic equipment
CN113409208A (en) Image processing method, device, equipment and storage medium
CN111626919B (en) Image synthesis method and device, electronic equipment and computer readable storage medium
CN112766285B (en) Image sample generation method and device and electronic equipment
CN112785669B (en) Virtual image synthesis method, device, equipment and storage medium
CN115952315B (en) Campus monitoring video storage method, device, equipment, medium and program product
CN113360797B (en) Information processing method, apparatus, device, storage medium, and computer program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant