CN112419447A - Method and device for generating dynamic graph, electronic equipment and storage medium - Google Patents

Method and device for generating dynamic graph, electronic equipment and storage medium Download PDF

Info

Publication number
CN112419447A
Authority
CN
China
Prior art keywords
motion
image
frame sequence
image frame
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011287439.9A
Other languages
Chinese (zh)
Inventor
谭冲
徐宁
李马丁
戴宇荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202011287439.9A priority Critical patent/CN112419447A/en
Publication of CN112419447A publication Critical patent/CN112419447A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/80 2D [Two Dimensional] animation, e.g. using sprites
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a method, an apparatus, an electronic device, and a storage medium for generating a dynamic graph, wherein the method includes: acquiring an image frame sequence including an object; detecting an image region corresponding to the object in the image frame sequence; determining information related to the motion of the object based on the image frame sequence and the image region, and screening at least a part of the image frame sequence from the image frame sequence according to the information related to the motion of the object; and generating a dynamic graph based on the at least a part of the image frame sequence.

Description

Method and device for generating dynamic graph, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing, and in particular, to a method and an apparatus for generating a dynamic graph, an electronic device, and a storage medium.
Background
A dynamic graph (animated image) is a special medium between video and static images, and is widely used in social media, online news, digital forums, information bulletins and even e-mail to enhance users' emotional expression. As a simple, silent, looping container for recording spatio-temporal changes, with low storage consumption and rich emotional expressiveness, the dynamic graph offers good platform compatibility, portability and low network transmission bandwidth requirements, and has therefore been widely adopted in recent years. However, the dynamic graphs currently circulated on social media are mainly produced manually with professional dynamic-graph production software (e.g., production tools such as gifside, ScreenToGif and Ezgif), so that generating a dynamic graph requires a relatively professional producer and the production process is tedious.
Disclosure of Invention
The present disclosure provides a method, an apparatus, an electronic device, and a storage medium for generating a dynamic graph, so as to at least solve the problem in the related art that the production process for generating a dynamic graph is tedious. The technical solution of the present disclosure is as follows:
According to a first aspect of embodiments of the present disclosure, there is provided a method of generating a dynamic graph, the method including: acquiring an image frame sequence including an object; detecting an image region corresponding to the object in the image frame sequence; determining information related to the motion of the object based on the image frame sequence and the image region, and screening at least a part of the image frame sequence from the image frame sequence according to the information related to the motion of the object; and generating a dynamic graph based on the at least a part of the image frame sequence.
Optionally, the acquiring a sequence of image frames including an object includes: acquiring a video including the object; calculating the similarity between adjacent frames in the video, and performing shot segmentation on the video according to the similarity to acquire a frame sequence under the same shot; and sampling the acquired frame sequence under the same shot, and taking the sampled frame sequence as an image frame sequence comprising the object.
Optionally, the detecting an image region corresponding to the object in the image frame sequence includes: detecting the image region with an object detection model based on the sequence of image frames. Further optionally, the determining information related to the motion of the object based on the sequence of image frames and the image region comprises: determining information about a motion of the object using a motion recognition model based on a sequence of consecutive ones of the sequence of image frames including the image region, wherein the object detection model and the motion recognition model are models based on a deep neural network.
Optionally, the information related to the motion of the object comprises information related to a motion category, or comprises information related to a motion category and information related to a motion amplitude.
Optionally, the information related to the motion of the object includes a motion class of the object, a confidence of the motion class, and a magnitude of motion amplitude, wherein the screening at least a portion of the image frame sequence from the image frame sequence according to the information related to the motion of the object includes: determining whether the motion class of the object belongs to a predetermined motion class and whether the confidence of the motion class and the magnitude of the motion amplitude satisfy predetermined conditions, and screening at least a part of the image frame sequence from the image frame sequence according to the determination result.
Optionally, the generating a dynamic map based on the at least a portion of the image frame sequence includes: obtaining text matching the at least a portion of the image frame sequence, and generating a dynamic map based on the text and the at least a portion of the image frame sequence.
Optionally, the obtaining text matching the at least one partial image frame sequence and generating a dynamic map based on the text and the at least one partial image frame sequence includes: acquiring the characters based on the information related to the action of the object; cropping the at least a portion of the sequence of image frames from the image region; and generating a dynamic graph based on the characters and the image frame sequence after cutting.
Optionally, the obtaining the text based on the information related to the motion of the object includes: and searching a character list corresponding to the action category from a pre-established character library based on the action category of the object, and determining any one character in the character list as a character matched with the at least one part of the image frame sequence.
Optionally, the obtaining text matching the at least one partial image frame sequence and generating a dynamic map based on the text and the at least one partial image frame sequence includes: cropping the at least a portion of the sequence of image frames from the image region; acquiring the characters based on image characteristics of the cropped image frame sequence and information related to the action of the object; and generating a dynamic graph based on the characters and the image frame sequence after cutting.
Optionally, the obtaining the text based on the image features of the cropped image frame sequence and the information related to the motion of the object includes: based on the action category of the object, searching a character list corresponding to the action category from a pre-established character library; and searching the characters corresponding to the image characteristics in the character list to be used as the characters matched with the at least one part of image frame sequence.
Optionally, the generating a dynamic map based on the text and the cropped image frame sequence includes: determining the display attribute of the characters in the dynamic graph to be generated according to the content of the characters and the characteristics of the image frame sequence after cutting; and combining the characters with the clipped image frame sequence according to the determined display attributes to generate a dynamic graph.
Optionally, the display attribute includes at least one of a display size, a display position, and a display color of the text in the dynamic graph to be generated.
Optionally, the determining, according to the content of the text and the characteristics of the cropped image frame sequence, the display attribute of the text in the dynamic graph to be generated includes: and determining the display size and the display position of the characters in the dynamic image to be generated according to the content of the characters and the size of the image frame sequence after cutting, and determining the display color of the characters in the dynamic image to be generated according to the pixel information at the display position in the image frame sequence after cutting.
According to a second aspect of the embodiments of the present disclosure, there is provided an apparatus for generating a dynamic graph, the apparatus including: an acquisition unit configured to acquire an image frame sequence including an object; an object detection unit configured to detect an image region corresponding to the object in the image frame sequence; a motion recognition unit configured to determine information related to the motion of the object based on the image frame sequence and the image region, and to screen at least a part of the image frame sequence from the image frame sequence according to the information related to the motion of the object; and a dynamic graph generating unit configured to generate a dynamic graph based on the at least a part of the image frame sequence.
Optionally, the obtaining unit is configured to: acquiring a video including the object; calculating the similarity between adjacent frames in the video, and performing shot segmentation on the video according to the similarity to acquire a frame sequence under the same shot; and sampling the acquired frame sequence under the same shot, and taking the sampled frame sequence as an image frame sequence comprising the object.
Optionally, the detection unit is configured to: detecting the image region with an object detection model based on the sequence of image frames, the action recognition unit being configured to: determining information about a motion of the object using a motion recognition model based on a sequence of consecutive ones of the sequence of image frames including the image region, wherein the object detection model and the motion recognition model are models based on a deep neural network.
Optionally, the information related to the motion of the object comprises information related to a motion category, or comprises information related to a motion category and information related to a motion amplitude.
Optionally, the information related to the motion of the object includes a motion category of the object, a confidence of the motion category, and a magnitude of the motion amplitude, wherein the motion recognition unit is configured to: determining whether the motion class of the object belongs to a predetermined motion class and whether the confidence of the motion class and the magnitude of the motion amplitude satisfy predetermined conditions, and screening at least a part of the image frame sequence from the image frame sequence according to the determination result.
Optionally, the dynamic graph generating unit is configured to: obtain text matching the at least a portion of the image frame sequence, and generate a dynamic map based on the text and the at least a portion of the image frame sequence.
Optionally, the obtaining text matching the at least one partial image frame sequence and generating a dynamic map based on the text and the at least one partial image frame sequence includes: acquiring the characters based on the information related to the action of the object; cropping the at least a portion of the sequence of image frames from the image region; and generating a dynamic graph based on the characters and the image frame sequence after cutting.
Optionally, the obtaining the text based on the information related to the motion of the object includes: and searching a character list corresponding to the action category from a pre-established character library based on the action category of the object, and determining any one character in the character list as a character matched with the at least one part of the image frame sequence.
Optionally, the obtaining text matching the at least one partial image frame sequence and generating a dynamic map based on the text and the at least one partial image frame sequence includes: cropping the at least a portion of the sequence of image frames from the image region; acquiring the characters based on image characteristics of the cropped image frame sequence and information related to the action of the object; and generating a dynamic graph based on the characters and the image frame sequence after cutting.
Optionally, the obtaining the text based on the image features of the cropped image frame sequence and the information related to the motion of the object includes: based on the action category of the object, searching a character list corresponding to the action category from a pre-established character library; and searching the characters corresponding to the image characteristics in the character list to be used as the characters matched with the at least one part of image frame sequence.
Optionally, the generating a dynamic map based on the text and the cropped image frame sequence includes: determining the display attribute of the characters in the dynamic graph to be generated according to the content of the characters and the characteristics of the image frame sequence after cutting; and combining the characters with the clipped image frame sequence according to the determined display attributes to generate a dynamic graph.
Optionally, the display attribute includes at least one of a display size, a display position, and a display color of the text in the dynamic graph to be generated.
Optionally, the determining, according to the content of the text and the characteristics of the cropped image frame sequence, the display attribute of the text in the dynamic graph to be generated includes: and determining the display size and the display position of the characters in the dynamic image to be generated according to the content of the characters and the size of the image frame sequence after cutting, and determining the display color of the characters in the dynamic image to be generated according to the pixel information at the display position in the image frame sequence after cutting.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device comprising at least one processor; at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform a method of generating a dynamic graph as described above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform the method of generating a dynamic graph as described above.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product, instructions of which are executed by at least one processor in an electronic device to perform the method of generating a dynamic graph as described above.
The technical solution provided by the embodiments of the present disclosure brings at least the following beneficial effects: since at least a part of the image frame sequence can be screened from the image frame sequence according to the information related to the motion of the object, and a dynamic graph can be generated based on that part, a dynamic graph can be generated automatically and conveniently from the acquired image frame sequence without a tedious manual production process, saving a large amount of labor and time cost.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is an exemplary system architecture diagram in which exemplary embodiments of the present disclosure may be applied;
FIG. 2 is a flow chart illustrating a method of generating a dynamic graph in accordance with an exemplary embodiment of the present disclosure;
FIG. 3 is a block diagram illustrating an apparatus for generating a dynamic graph according to an exemplary embodiment of the present disclosure;
fig. 4 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The embodiments described in the following examples do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In the present disclosure, the expression "at least one of the items" covers three parallel cases: "any one of the items", "a combination of any plurality of the items", and "all of the items". For example, "including at least one of A and B" covers the following three parallel cases: (1) including A; (2) including B; (3) including A and B. Likewise, "performing at least one of step one and step two" covers the following three parallel cases: (1) performing step one; (2) performing step two; (3) performing step one and step two.
Fig. 1 illustrates an exemplary system architecture 100 in which exemplary embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables. A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages (e.g., video data upload requests, video data acquisition requests) and the like. Various communication client applications, such as a video recording application, an audio playing application, an instant messaging tool, a mailbox client, and social platform software, may be installed on the terminal devices 101, 102, and 103. The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a display screen and capable of playing and recording audio and video, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above and may be implemented as a plurality of software modules (for example, to provide distributed services) or as a single software module; this is not particularly limited herein.
The terminal devices 101, 102, 103 may be equipped with an image capturing device (e.g., a camera) to capture video data. In practice, the smallest visual unit that makes up a video is a frame: each frame is a static image, and a temporally continuous sequence of frames is composited together to form a dynamic video. Further, the terminal devices 101, 102, 103 may also be equipped with a component (e.g., a speaker) for converting an electric signal into sound in order to play sound, and with a device (e.g., a microphone) for converting an analog audio signal into a digital audio signal in order to pick up sound.
The terminal devices 101, 102, 103 may collect video data by using image collecting devices installed thereon, and may play the video data by using video processing components that support video playing and are installed thereon.
The server 105 may be a server providing various services, such as a background server providing support for video recording type applications installed on the terminal devices 101, 102, 103. The background server can analyze, store and the like the received audio and video data uploading request and other data, and can also receive the audio and video data acquisition request sent by the terminal equipment 101, 102 and 103 and feed back the audio and video data indicated by the audio and video data acquisition request to the terminal equipment 101, 102 and 103.
The terminal devices 101, 102, 103 may send the captured video data to the server 105. The server 105 may process the received video to generate a dynamic graph and send the generated dynamic graph to the terminal devices 101, 102, 103, so that the terminal devices 101, 102, 103 may present the generated dynamic graph to the user on their user interface. Optionally, the terminal device may also process the collected video data by itself to generate a dynamic graph, and perform dynamic graph display.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module; this is not particularly limited herein.
It should be noted that the method for generating a dynamic graph provided in the embodiment of the present disclosure may be executed in the terminal devices 101, 102, and 103, may be executed in the server 105, or may be executed in cooperation between the terminal devices 101, 102, and 103 and the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation, and the disclosure is not limited thereto.
FIG. 2 is a flowchart illustrating a method of generating a dynamic graph in accordance with an exemplary embodiment.
In step S201, an image frame sequence including an object is acquired. Here, the object may be a human or an animal (e.g., an animal with relatively rich body movement, such as a cat or a dog). According to an exemplary embodiment, in step S201, a video including the object may first be acquired; then the similarity between adjacent frames in the video is calculated and the video is shot-segmented according to the similarity to obtain a frame sequence under the same shot; finally, the obtained frame sequence under the same shot is sampled, and the sampled frame sequence is taken as the image frame sequence including the object. As an example, the video including the object may be obtained locally in response to a user input, or may be obtained from an external device (e.g., a server) in response to a user request; the present disclosure does not limit the manner of obtaining the video. After the video including the object is acquired, the video may be decoded, for example, but not limited to, using opencv or ffmpeg. The similarity between adjacent frames in the decoded video is then calculated; for example, image pixel statistics (e.g., image pixel statistical histograms) may be used to compute the similarity, but the computation is not limited thereto. Next, shot segmentation is performed on the video according to the similarity to obtain a frame sequence under the same shot, which ensures that the dynamic graph generated later stays within the same video shot scene. Finally, in order to reduce the processing of redundant video frames, the obtained frame sequence under the same shot may be sampled to obtain the image frame sequence including the object; for example, equidistant sampling or key frame sampling may be performed, but the sampling manner is not limited to these examples. For convenience of description, the image frame sequence including the object acquired at step S201 may be represented as {I1, I2, …, In}, where In is an image frame under the same shot and n is a positive integer greater than or equal to 1.
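By way of illustration only, and not as part of the claimed solution, a minimal Python sketch of the shot segmentation and sampling described above might look as follows; opencv is assumed for decoding and histogram comparison, and the similarity threshold and sampling stride are hypothetical example values:

    import cv2

    def split_and_sample(video_path, sim_threshold=0.7, sample_stride=3):
        """Split a video into shots by adjacent-frame similarity, then sample each shot."""
        cap = cv2.VideoCapture(video_path)
        shots, current_shot, prev_hist = [], [], None
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # Pixel-statistics similarity: compare normalized gray-level histograms.
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            hist = cv2.calcHist([gray], [0], None, [64], [0, 256])
            cv2.normalize(hist, hist)
            if prev_hist is not None:
                sim = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
                if sim < sim_threshold:      # low similarity -> new shot boundary
                    shots.append(current_shot)
                    current_shot = []
            current_shot.append(frame)
            prev_hist = hist
        if current_shot:
            shots.append(current_shot)
        cap.release()
        # Equidistant sampling inside each shot to reduce redundant frames.
        return [shot[::sample_stride] for shot in shots]

Key-frame sampling or another similarity measure could be substituted without changing the overall flow of the sketch.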
After the image frame sequence including the object is acquired, in step S202, an image region corresponding to the object is detected in the image frame sequence. According to an exemplary embodiment, the image region may be detected with an object detection model based on the image frame sequence, but the detection is not limited thereto; for example, the image region corresponding to the object may also be detected in each image frame directly through feature matching. As an example, the object detection model may be a model based on a deep neural network (e.g., YOLO v3, Fast R-CNN, etc.), but is not limited thereto. In fact, in the present disclosure, any type of object detection model may be used as long as it can provide, for each image frame in the image frame sequence, the image region corresponding to the object in that image frame. In the case where the object detection model is a deep-neural-network-based model, the object detection model needs to be trained in advance before being used to detect the image region. Specifically, an image set including the object may first be acquired and the object region of each image in the image set annotated, and the object detection model may then be trained with the annotated image set. For example, if the object is a person, a set of images including persons may be obtained, the region corresponding to the person may be marked with a rectangular frame in each image of the set, and the object detection model may then be trained with the marked image set. After the object detection model is trained, if the image frame sequence including the object is input to it, the object detection model can directly provide the object region information in each image frame, so that the image region corresponding to the object can be detected.
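The following sketch shows, for illustration only, how per-frame object regions might be collected once a detector is available. The detector callable and its return format are assumptions of this example, not an interface defined by the disclosure; any YOLO v3 or Fast R-CNN wrapper exposing labels, scores and boxes could play this role:

    def detect_object_regions(frames, detector, target_label="person", min_score=0.5):
        """Keep, for each frame, the highest-scoring box of the target object (or None)."""
        regions = []
        for frame in frames:
            # detector(frame) is assumed to yield (label, score, (x1, y1, x2, y2)) tuples.
            candidates = [(score, box) for label, score, box in detector(frame)
                          if label == target_label and score >= min_score]
            if not candidates:
                regions.append(None)         # the object was not found in this frame
                continue
            _, best_box = max(candidates, key=lambda c: c[0])
            regions.append(tuple(int(v) for v in best_box))
        return regions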
After the image region corresponding to the object is detected, in step S203, information related to the motion of the object is determined based on the image frame sequence and the image region, and at least a part of the image frame sequence is screened from the image frame sequence according to the information related to the motion of the object. According to an exemplary embodiment, the information related to the motion of the object may be determined with a motion recognition model based on a sequence of consecutive image frames, among the image frame sequence, that include the image region. Here, the sequence of consecutive image frames is a sequence of image frames that are under the same video shot scene and include the object. For example, assume that the frames Ii to Ij in the image frame sequence {I1, I2, …, In} form a sequence of j-i+1 consecutive frames satisfying the aforementioned condition; the j-i+1 consecutive frames may be input to the motion recognition model to determine the information related to the motion of the object. Furthermore, since the operations and computation involved in motion recognition are large, a motion recognition network model typically accepts at most a predetermined number of consecutive image frames, e.g., w frames. In this case, if j-i+1 < w, the j-i+1 frames are input to the network directly; if j-i+1 > w, consecutive w-frame windows are traversed over the j-i+1 frames according to a preset traversal interval, and each traversed group of w frames is input to the motion recognition model. For example, the traversal interval may be set to w/2, but is not limited thereto.
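A compact sketch of this window traversal is given below, assuming the motion recognition model accepts at most w consecutive frames; the stride of w // 2 mirrors the example traversal interval mentioned above and is not a required value:

    def windowed_clips(frames, w, stride=None):
        """Yield clips no longer than the model's maximum input length w."""
        stride = stride or max(1, w // 2)
        if len(frames) <= w:                 # short segment: feed it to the model directly
            yield frames
            return
        for start in range(0, len(frames) - w + 1, stride):
            yield frames[start:start + w]    # consecutive w-frame window

    # Each yielded clip would then be passed to the motion recognition model,
    # e.g. category, confidence, amplitude = recognize(clip)  (hypothetical call).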
By way of example, the motion recognition model may be a model based on a deep neural network (e.g., a SlowFast, X3D, or I3D model), but is not limited thereto. In fact, in the present disclosure, any type of motion recognition model may be used as long as it can recognize the motion information of the object for a consecutive image frame sequence that satisfies the above-described conditions. According to an exemplary embodiment, the information related to the motion of the object includes information related to a motion category, or includes information related to a motion category and information related to a motion amplitude.
In the case where the motion recognition model is a deep-neural-network-based model, the motion recognition model needs to be trained in advance before being used to determine the information related to the motion of the object. Specifically, a motion video data set may be constructed, and the motion recognition model may be trained with the constructed data set. For example, if the object is a human, an open-source motion video data set (e.g., Kinetics400, Kinetics600, AVA, UCF101, etc.) may be used directly as the training data set, or online video data may be retrieved by keyword and annotated, the annotated video data set then being used as the training data set. The annotation content may include the start and stop timestamps of the action and the action category information, or may include the start and stop timestamps, the action category information, and the action amplitude information (e.g., an action amplitude score, i.e., a score used to measure the magnitude of the action amplitude).
As an example, the information related to the motion of the object may include the action category of the object, the confidence of the action category, and the action amplitude. The action category may be, for example, an action category label output by the action recognition model, and the confidence of the action category, which measures the recognition accuracy of the action category, may be, for example, a confidence score output by the action recognition model. In this case, screening at least a part of the image frame sequence from the image frame sequence according to the information related to the motion of the object may include: determining whether the action category of the object belongs to a predetermined action category and whether the confidence of the action category and the action amplitude satisfy predetermined conditions, and screening at least a part of the image frame sequence from the image frame sequence according to the determination result. For example, image frames may be retained when the action category belongs to a preset set of action categories suitable for outputting a dynamic graph, the confidence score of the action category is greater than a preset score, and the action amplitude is greater than a predetermined amplitude; the image frames may be discarded when the action category does not belong to the preset action categories, or the confidence score is less than or equal to the preset score, or the action amplitude is less than or equal to the predetermined amplitude.
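By way of illustration, this screening condition reduces to a single predicate; the category set and both thresholds below are hypothetical example values, not values prescribed by the disclosure:

    # Hypothetical preset categories considered suitable for a dynamic graph.
    ALLOWED_ACTIONS = {"dancing", "drinking", "kneeling", "swimming"}

    def keep_clip(action, confidence, amplitude,
                  min_confidence=0.8, min_amplitude=0.3):
        """Return True if the clip passes the category, confidence and amplitude checks."""
        return (action in ALLOWED_ACTIONS
                and confidence > min_confidence
                and amplitude > min_amplitude)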
After at least a part of the image frame sequence has been screened out, a dynamic graph is generated based on the at least a part of the image frame sequence in step S204. Specifically, according to an exemplary embodiment, in step S204, text matching the at least a part of the image frame sequence may be acquired, and a dynamic graph may be generated based on the text and the at least a part of the image frame sequence. By matching text to the at least a part of the image frame sequence and generating the dynamic graph based on the matched text and those frames, the generated dynamic graph can be made richer and more interesting.
According to an exemplary embodiment, acquiring text matching the at least a part of the image frame sequence and generating a dynamic graph based on the text and the at least a part of the image frame sequence may include the following. First, text matching the at least a part of the image frame sequence is acquired based on the information related to the motion of the object; for example, a text list corresponding to the action category may be looked up in a pre-established text library based on the action category of the object (the action category may be, for example, the action classification label output by the motion recognition model), and any one text in the list is taken as the text matching the at least a part of the image frame sequence. For example, if the action category is dancing, the matching text may be a phrase indicating dancing such as "big move up"; if the action category is drinking, the matching text may be a phrase such as "feeling deep and stuffy mouth"; if the action category is kneeling down, the matching text may be a phrase such as "please accept my knee"; if the action category is swimming, the matching text may be, for example, "Chang Yang at best", or the like. Then, the at least a part of the image frame sequence is cropped according to the image region. Specifically, a cropping range that can contain the object region in every image frame may be calculated from the image region corresponding to the object in each image frame of the at least a part of the image frame sequence, and each image frame may be cropped according to the calculated cropping range. Finally, a dynamic graph is generated based on the text matching the at least a part of the image frame sequence and the cropped image frame sequence.
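The caption lookup and the cropping step can be sketched as follows; the word library content and the crop margin are illustrative assumptions, the frames are assumed to be opencv-style arrays, and the per-frame boxes are those produced by the detection step:

    import random

    # Hypothetical pre-established word library: action category -> candidate captions.
    WORD_LIBRARY = {
        "dancing": ["big move up"],
        "drinking": ["feeling deep and stuffy mouth"],
    }

    def pick_caption(action_category):
        """Pick any caption from the list that corresponds to the action category."""
        candidates = WORD_LIBRARY.get(action_category, [])
        return random.choice(candidates) if candidates else None

    def union_crop(frames, boxes, margin=10):
        """Crop every frame with one rectangle that contains the object in all frames."""
        xs1, ys1, xs2, ys2 = zip(*(b for b in boxes if b is not None))
        h, w = frames[0].shape[:2]
        x1, y1 = max(min(xs1) - margin, 0), max(min(ys1) - margin, 0)
        x2, y2 = min(max(xs2) + margin, w), min(max(ys2) + margin, h)
        return [f[y1:y2, x1:x2] for f in frames]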
Optionally, according to another exemplary embodiment, acquiring text matching the at least a part of the image frame sequence and generating a dynamic graph based on the text and the at least a part of the image frame sequence may include the following. First, the at least a part of the image frame sequence is cropped according to the image region. Then, the text is acquired based on image features of the cropped image frame sequence and the information related to the motion of the object. Specifically, for example, a text list corresponding to the action category may be looked up in a pre-established text library based on the action category of the object, and a text corresponding to the image features may then be searched for in that list as the text matching the at least a part of the image frame sequence. Here, an image feature may be a feature obtained by performing feature extraction on each image frame in the cropped image frame sequence; for example, where the motion recognition model is a deep-neural-network-based model, the image feature may be the convolution feature (e.g., the feature map) of each image frame obtained with the motion recognition model. The correspondence between convolution features and texts in the text library may be established in advance from the convolution features obtained by inputting existing captioned dynamic graphs into the motion recognition model, together with the text recognition results obtained by performing character recognition on those dynamic graphs (optionally assisted by manual annotation and calibration). When searching the text list for the text corresponding to the image features, the convolution feature most similar to the convolution feature of the cropped image frame sequence (e.g., the one with the smallest Euclidean or cosine distance) may be searched for in the convolution feature list corresponding to the text list, and the text corresponding to the found convolution feature is taken as the text matching the at least a part of the image frame sequence. Finally, a dynamic graph is generated based on the text matching the at least a part of the image frame sequence and the cropped image frame sequence.
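A minimal sketch of the feature-based caption lookup is given below, assuming the per-caption convolution features have been collected offline into a NumPy array; the Euclidean distance shown could equally be replaced by a cosine distance:

    import numpy as np

    def match_caption_by_feature(clip_feature, caption_features, captions):
        """Return the caption whose stored convolution feature is closest to the clip's."""
        # caption_features: (N, D) array built offline from existing captioned dynamic graphs.
        dists = np.linalg.norm(caption_features - clip_feature[None, :], axis=1)
        return captions[int(np.argmin(dists))]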
In the above exemplary embodiments, generating a dynamic graph based on the matched text and the cropped image frame sequence may include: determining display attributes of the text in the dynamic graph to be generated according to the content of the text and the characteristics of the cropped image frame sequence, and combining the text with the cropped image frame sequence according to the determined display attributes to generate the dynamic graph. For example, the display attributes may include at least one of the display size, the display position, and the display color of the text in the dynamic graph to be generated, but are not limited thereto. The characteristics of the cropped image frame sequence may include, but are not limited to, the size of the cropped image frames and their pixel information. According to an exemplary embodiment, the display size and the display position of the text in the dynamic graph to be generated may be determined according to the content of the text and the size of the cropped image frames, and the display color of the text may be determined according to the pixel information at the display position in the cropped image frames. For example, if the cropped image frames are rectangular and the text is long, the display position of the text may be determined according to the horizontal and vertical lengths of the rectangle (e.g., the text may be laid out horizontally when the horizontal length is larger), and the text size may be determined according to the horizontal length of the rectangle. In addition, in order to make the displayed text more prominent, the display color of the text in the dynamic graph to be generated may be determined according to the pixel information at the text display position in the cropped image frames: the color information of each pixel of the image region at the display position may be collected, the main color of that region determined from the color statistics, and a foreground color that stands out against that main color as background selected as the display color of the text. For example, if the main color is black, white may be selected as the display color of the text. After the display attributes of the text are determined, the matched text and the cropped image frame sequence can be composited according to the display attributes with a video/image processing package such as ffmpeg or moviepy, and the dynamic graph is output. For example, the text "big move up" may be displayed in a white font below the animation.
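By way of illustration, the overlay and export step might be sketched as below, assuming Pillow for drawing and imageio for writing the GIF (the disclosure itself only mentions packages such as ffmpeg or moviepy); the placement rule and the black/white color choice are simplified stand-ins for the display-attribute logic described above:

    import imageio
    import numpy as np
    from PIL import Image, ImageDraw, ImageFont

    def render_dynamic_graph(cropped_frames, caption, out_path="out.gif", fps=10):
        """Overlay the matched caption on every cropped frame and write an animated GIF."""
        font = ImageFont.load_default()
        rendered = []
        for bgr in cropped_frames:
            rgb = np.ascontiguousarray(bgr[..., ::-1])   # opencv BGR -> RGB
            img = Image.fromarray(rgb)
            h, w = rgb.shape[:2]
            # Put the text along the longer side; here simply near the bottom or the middle.
            pos = (w // 10, int(h * 0.85)) if w >= h else (int(w * 0.1), h // 2)
            # The dominant color behind the text decides a contrasting font color.
            region = rgb[pos[1]:min(pos[1] + 20, h), pos[0]:min(pos[0] + 100, w)]
            luminance = region.mean() if region.size else 0
            color = (0, 0, 0) if luminance > 127 else (255, 255, 255)
            ImageDraw.Draw(img).text(pos, caption, fill=color, font=font)
            rendered.append(np.asarray(img))
        imageio.mimsave(out_path, rendered, duration=1.0 / fps)  # per-frame duration in seconds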
The method of generating a dynamic graph according to the exemplary embodiments of the present disclosure has been described above in conjunction with fig. 2. According to the method, since at least a part of the image frame sequence can be screened from the image frame sequence according to the information related to the motion of the object, and a dynamic graph can be generated based on that part, a dynamic graph can be generated automatically and conveniently from the acquired image frame sequence without a tedious manual production process, saving a large amount of labor and time cost.
Fig. 3 is a block diagram illustrating an apparatus for generating a dynamic graph (hereinafter, simply referred to as "dynamic graph generating apparatus" for convenience of description) according to an exemplary embodiment of the present disclosure.
Referring to fig. 3, the dynamic graph generating apparatus 300 may include an acquisition unit 301, an object detection unit 302, a motion recognition unit 303, and a dynamic graph generating unit 304. Specifically, the acquisition unit 301 may acquire an image frame sequence including an object. The object detection unit 302 may detect an image region corresponding to the object in the image frame sequence. The motion recognition unit 303 may determine information related to the motion of the object based on the image frame sequence and the image region, and screen at least a part of the image frame sequence from the image frame sequence according to the information related to the motion of the object. The dynamic graph generating unit 304 may generate a dynamic graph based on the at least a part of the image frame sequence. Since the method of generating a dynamic graph shown in fig. 2 can be performed by the dynamic graph generating apparatus 300 shown in fig. 3, any relevant details of the operations performed by the units in fig. 3 can be found in the corresponding description of fig. 2 and are not repeated here.
Further, it should be noted that although the dynamic graph generation apparatus 300 is described above as being divided into units for respectively executing corresponding processes, it is clear to those skilled in the art that the processes executed by the units described above may also be executed without any specific unit division or explicit demarcation between units by the dynamic graph generation apparatus 300.
Fig. 4 is a block diagram of an electronic device 400 according to an embodiment of the present disclosure. The electronic device 400 may include at least one memory 401 and at least one processor 402, the at least one memory 401 storing a set of computer-executable instructions that, when executed by the at least one processor 402, perform the method of generating a dynamic graph according to an embodiment of the present disclosure.
By way of example, the electronic device may be a PC, a tablet device, a personal digital assistant, a smartphone, or another device capable of executing the above set of instructions. The electronic device need not be a single electronic device, but can be any collection of devices or circuits that can execute the above instructions (or instruction sets) individually or in combination. The electronic device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
In an electronic device, a processor may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special-purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
The processor may execute instructions or code stored in the memory, which may also store data. The instructions and data may also be transmitted or received over a network via a network interface device, which may employ any known transmission protocol.
The memory may be integral to the processor, e.g., RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, the memory may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The memory and the processor may be operatively coupled or may communicate with each other, such as through an I/O port, a network connection, etc., so that the processor can read files stored in the memory.
In addition, the electronic device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device may be connected to each other via a bus and/or a network.
According to an embodiment of the present disclosure, there may also be provided a computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform the method of generating a dynamic graph according to an exemplary embodiment of the present disclosure. Examples of the computer-readable storage medium include: read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disc memory, hard disk drive (HDD), solid-state drive (SSD), card-type memory (such as a multimedia card, a Secure Digital (SD) card, or an eXtreme Digital (XD) card), magnetic tape, a floppy disk, a magneto-optical data storage device, an optical data storage device, and any other device configured to store a computer program and any associated data, data files, and data structures in a non-transitory manner and to provide them to a processor or computer so that the processor or computer can execute the computer program. The computer program in the computer-readable storage medium described above can run in an environment deployed in computer equipment such as a client, a host, a proxy device, or a server; further, in one example, the computer program and any associated data, data files, and data structures are distributed across networked computer systems so that they are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
In an embodiment according to the present disclosure, there may also be provided a computer program product, instructions of which are executable by at least one processor in an electronic device to implement a method of generating a dynamic graph according to an exemplary embodiment of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method of generating a dynamic graph, comprising:
acquiring a sequence of image frames comprising an object;
detecting an image region corresponding to the object in the sequence of image frames;
determining information related to the motion of the object based on the image frame sequence and the image region, and screening at least a part of the image frame sequence from the image frame sequence according to the information related to the motion of the object;
and generating a dynamic graph based on the at least a part of the image frame sequence.
2. The method of claim 1, wherein said acquiring a sequence of image frames including an object comprises:
acquiring a video including the object;
calculating the similarity between adjacent frames in the video, and performing shot segmentation on the video according to the similarity to acquire a frame sequence under the same shot;
and sampling the acquired frame sequence under the same shot, and taking the sampled frame sequence as an image frame sequence comprising the object.
3. The method of claim 2, wherein the detecting an image region in the sequence of image frames corresponding to the object comprises: detecting the image region with an object detection model based on the sequence of image frames,
the determining information related to the motion of the object based on the sequence of image frames and the image region comprises: determining information related to a motion of the object using a motion recognition model based on a sequence of consecutive image frames of the sequence of image frames including the image region,
wherein the object detection model and the motion recognition model are deep neural network based models.
4. The method of claim 1, wherein the information related to the motion of the object comprises information related to a category of motion or comprises information related to a category of motion and information related to a magnitude of motion.
5. The method of claim 1, wherein the information related to the motion of the object includes a motion class of the object, a confidence level of the motion class, and a magnitude of the motion magnitude, wherein the screening at least a portion of the sequence of image frames from the sequence of image frames according to the information related to the motion of the object comprises:
determining whether the motion class of the object belongs to a predetermined motion class and whether the confidence of the motion class and the magnitude of the motion amplitude satisfy predetermined conditions, and screening at least a part of the image frame sequence from the image frame sequence according to the determination result.
6. The method of claim 1, wherein generating the dynamic map based on the at least a portion of the sequence of image frames comprises:
obtaining text that matches the at least a portion of the image frame sequence, and generating a dynamic map based on the text and the at least a portion of the image frame sequence.
7. The method of claim 6, wherein the obtaining text that matches the at least a portion of the image frame sequence and generating a dynamic map based on the text and the at least a portion of the image frame sequence comprises:
acquiring the characters based on the information related to the action of the object;
cropping the at least a portion of the sequence of image frames from the image region;
and generating a dynamic graph based on the characters and the image frame sequence after cutting.
8. An apparatus for generating a dynamic graph, comprising:
an acquisition unit configured to acquire an image frame sequence including an object;
an object detection unit configured to detect an image region corresponding to the object in the image frame sequence;
a motion recognition unit configured to determine information related to a motion of the object based on the image frame sequence and the image region, and to screen at least a portion of the image frame sequence from the image frame sequence according to the information related to the motion of the object;
a dynamic map generating unit configured to generate a dynamic map based on the at least a portion of the image frame sequence.
9. An electronic device, comprising:
at least one processor;
at least one memory storing computer-executable instructions,
wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the method of any one of claims 1 to 7.
10. A computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform the method of any one of claims 1 to 7.
CN202011287439.9A 2020-11-17 2020-11-17 Method and device for generating dynamic graph, electronic equipment and storage medium Pending CN112419447A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011287439.9A CN112419447A (en) 2020-11-17 2020-11-17 Method and device for generating dynamic graph, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011287439.9A CN112419447A (en) 2020-11-17 2020-11-17 Method and device for generating dynamic graph, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112419447A true CN112419447A (en) 2021-02-26

Family

ID=74832717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011287439.9A Pending CN112419447A (en) 2020-11-17 2020-11-17 Method and device for generating dynamic graph, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112419447A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8630454B1 (en) * 2011-05-31 2014-01-14 Google Inc. Method and system for motion detection in an image
US20180089512A1 (en) * 2016-09-23 2018-03-29 Microsoft Technology Licensing, Llc Automatic selection of cinemagraphs
CN108259769A (en) * 2018-03-30 2018-07-06 广东欧珀移动通信有限公司 Image processing method, device, storage medium and electronic equipment
CN109191548A (en) * 2018-08-28 2019-01-11 百度在线网络技术(北京)有限公司 Animation method, device, equipment and storage medium
CN110084306A (en) * 2019-04-30 2019-08-02 北京字节跳动网络技术有限公司 Method and apparatus for generating dynamic image
CN110309720A (en) * 2019-05-27 2019-10-08 北京奇艺世纪科技有限公司 Video detecting method, device, electronic equipment and computer-readable medium
CN110675473A (en) * 2019-09-17 2020-01-10 Oppo广东移动通信有限公司 Method, device, electronic equipment and medium for generating GIF dynamic graph
CN111882625A (en) * 2020-07-07 2020-11-03 北京达佳互联信息技术有限公司 Method and device for generating dynamic graph, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
US11750875B2 (en) Providing visual content editing functions
US8270684B2 (en) Automatic media sharing via shutter click
US11416544B2 (en) Systems and methods for digitally fetching music content
US8995823B2 (en) Method and system for content relevance score determination
US20180025215A1 (en) Anonymous live image search
CN108182211B (en) Video public opinion acquisition method and device, computer equipment and storage medium
CN108520046B (en) Method and device for searching chat records
US10769196B2 (en) Method and apparatus for displaying electronic photo, and mobile device
CN109408672B (en) Article generation method, article generation device, server and storage medium
CN110740290B (en) Monitoring video previewing method and device
CN112084812A (en) Image processing method, image processing device, computer equipment and storage medium
JP2014153977A (en) Content analysis device, content analysis method, content analysis program, and content reproduction system
CN103475532A (en) Hardware detection method and system thereof
CN114845149B (en) Video clip method, video recommendation method, device, equipment and medium
CN112419447A (en) Method and device for generating dynamic graph, electronic equipment and storage medium
CN112580644A (en) Testing method and device based on video stream cutout time and readable storage medium
CN115858854B (en) Video data sorting method and device, electronic equipment and storage medium
CN116887011A (en) Video marking method, device, equipment and medium
CN118118747A (en) Video profile generation method, device, storage medium and computer equipment
CN113947610A (en) Image processing method and device
CN116597360A (en) Video processing method, system, equipment and medium and program product
CN117851347A (en) File management method and device
CN113886720A (en) Content display method and device, electronic equipment and storage medium
CN111831839A (en) Display method, system, equipment and storage medium based on voice entry tree
CN110781344A (en) Method, device and computer storage medium for voice message synthesis

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination