WO2019128979A1 - Key frame scheduling method and apparatus, electronic device, program and medium - Google Patents

Key frame scheduling method and apparatus, electronic device, program and medium

Info

Publication number
WO2019128979A1
WO2019128979A1 (PCT/CN2018/123445)
Authority
WO
WIPO (PCT)
Prior art keywords
key frame
frame
current
feature
scheduling
Prior art date
Application number
PCT/CN2018/123445
Other languages
English (en)
French (fr)
Inventor
石建萍
李玉乐
林达华
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Priority to KR1020207005376A (KR102305023B1)
Priority to MYPI2020000416A (MY182985A)
Priority to EP18897706.0A (EP3644221A4)
Priority to US16/633,341 (US11164004B2)
Priority to JP2020519444A (JP6932254B2)
Priority to SG11202000578UA
Publication of WO2019128979A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/49 - Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/47 - Detecting features for summarising video content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/26 - Techniques for post-processing, e.g. correcting the recognition result
    • G06V30/262 - Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
    • G06V30/274 - Syntactic or semantic context, e.g. balancing
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 - Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 - Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23418 - Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 - Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 - Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23424 - Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement

Definitions

  • the present application relates to computer vision technology, and more particularly to a key frame scheduling method and apparatus, an electronic device, a program, and a medium.
  • Video semantic segmentation is an important issue in the task of computer vision and video semantic understanding.
  • Video semantic segmentation models have important applications in many areas, such as autonomous driving, video surveillance, and video target analysis.
  • Video semantic segmentation speed is an important aspect of video semantic segmentation tasks.
  • the embodiment of the present application provides a technical solution for key frame scheduling.
  • a key frame scheduling method including:
  • performing feature extraction on a current frame through a first network layer of a neural network to obtain low-level features of the current frame; acquiring a scheduling probability value of the current frame according to low-level features of a previous key frame adjacent to the current frame and the low-level features of the current frame, the scheduling probability value being the probability that the current frame is scheduled as a key frame; determining, according to the scheduling probability value, whether the current frame is scheduled as a key frame; and if it is determined that the current frame is scheduled as a key frame, determining the current frame as the current key frame and performing feature extraction on the low-level features of the current key frame through a second network layer of the neural network to obtain high-level features of the current key frame; wherein, in the neural network, the network depth of the first network layer is shallower than the network depth of the second network layer.
  • the method further includes: determining an initial key frame; performing feature extraction on the initial key frame through the first network layer to obtain and buffer low-level features of the initial key frame; and performing feature extraction on the low-level features of the initial key frame through the second network layer to obtain high-level features of the initial key frame.
  • the method further includes: performing semantic segmentation on the initial key frame, and outputting the semantic label of the initial key frame.
  • after it is determined that the current frame is scheduled as a key frame, the method further includes: buffering the low-level features of the current key frame.
  • acquiring the scheduling probability value of the current frame according to the low-level features of the previous key frame adjacent to the current frame and the low-level features of the current frame includes: splicing the low-level features of the previous key frame and the low-level features of the current frame to obtain a spliced feature; and acquiring the scheduling probability value of the current frame based on the spliced feature through a key frame scheduling network.
  • the method further includes: performing semantic segmentation on the current key frame, and outputting the semantic label of the current key frame.
  • a key frame scheduling apparatus including:
  • a first feature extraction unit including a first network layer of the neural network, configured to perform feature extraction on the current frame to obtain a low-level feature of the current frame;
  • a scheduling unit configured to acquire a scheduling probability value of the current frame according to a lower layer feature of a previous key frame adjacent to the current frame and a lower layer feature of the current frame; wherein, a lower layer of the previous key frame The feature is obtained by performing feature extraction on the previous key frame by the first network layer, where the scheduling probability value is a probability that the current frame is scheduled as a key frame;
  • a determining unit configured to determine, according to the scheduling probability value of the current frame, whether the current frame is scheduled as a key frame;
  • a second feature extraction unit, including a second network layer of the neural network, configured to, if it is determined according to the determination result of the determining unit that the current frame is scheduled as a key frame, determine the current frame as the current key frame and perform feature extraction on the low-level features of the current key frame to obtain high-level features of the current key frame; wherein, in the neural network, the network depth of the first network layer is shallower than the network depth of the second network layer.
  • the previous key frame includes a predetermined initial key frame
  • the device also includes:
  • a cache unit for buffering low-level features and high-level features of the key frame, the key frame including the initial key frame.
  • the first feature extraction unit is further configured to buffer, in the cache unit, the low-level features of the current key frame according to the determination result of the determining unit.
  • the scheduling unit includes:
  • a splicing sub-unit configured to splice the low-level features of the previous key frame and the low-level features of the current frame to obtain a spliced feature;
  • a key frame scheduling network configured to acquire a scheduling probability value of the current frame based on the splicing feature.
  • the device further includes:
  • a semantic segmentation unit configured to perform semantic segmentation on the key frame, and output a semantic tag of the key frame, where the key frame includes: an initial key frame, the previous key frame, or the current key frame.
  • an electronic device including: the key frame scheduling apparatus according to any one of the embodiments of the present application.
  • an electronic device including: a processor and the key frame scheduling apparatus according to any one of the embodiments of the present application;
  • when the processor runs the key frame scheduling apparatus, the units of the key frame scheduling apparatus according to any one of the embodiments of the present application are run.
  • an electronic device includes: a processor and a memory
  • the memory is configured to store at least one executable instruction that causes the processor to perform operations of steps in a key frame scheduling method described in any one of the embodiments of the present application.
  • a computer program including computer readable code, where, when the computer readable code runs on a device, a processor in the device executes instructions for implementing the steps of the key frame scheduling method according to any one of the embodiments of the present application.
  • a computer readable medium configured to store computer readable instructions which, when executed, implement the operations of the steps of the key frame scheduling method according to any one of the embodiments of the present application.
  • the key frame scheduling method and apparatus, electronic device, program, and medium provided by the foregoing embodiments of the present application perform feature extraction on the current frame to obtain low-level features of the current frame; acquire the scheduling probability value of the current frame according to the low-level features of the adjacent previous key frame and the low-level features of the current frame;
  • determine, according to the scheduling probability value of the current frame, whether the current frame is scheduled as a key frame; and, if it is determined that the current frame is scheduled as a key frame, perform feature extraction on the low-level features of the current key frame to obtain high-level features of the current key frame.
  • the embodiments of the present application can measure the change of the current frame relative to the low-level features of the previous key frame from the low-level features of the two frames, exploiting the variation of low-level features between different frames in a video, so that key frame scheduling is fast, accurate, and adaptive, which improves the scheduling efficiency of key frames.
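  • For illustration only, the decision loop described above can be sketched as follows. This is a minimal PyTorch-style sketch rather than the disclosed implementation: `low_net` and `high_net` stand in for the first (shallow) and second (deep) network layers, `scheduler_net` for the key frame scheduling network, and the names and default threshold are assumptions of this sketch.

```python
import torch

def schedule_video(frames, low_net, high_net, scheduler_net, threshold=0.8):
    """Adaptive key frame scheduling over a sequence of video frames.

    frames: iterable of tensors shaped (1, 3, H, W).
    low_net / high_net: the shallow and deep parts of the backbone.
    scheduler_net: maps concatenated low-level features to a probability.
    """
    key_low = None            # cached low-level features of the previous key frame
    key_outputs = []
    for i, frame in enumerate(frames):
        low = low_net(frame)                      # low-level features (step 102)
        if i == 0:
            is_key = True                         # first frame as the initial key frame
        else:
            # scheduling probability from the two low-level features (step 104)
            p = scheduler_net(torch.cat([key_low, low], dim=1))
            is_key = p.item() >= threshold        # threshold decision (step 106)
        if is_key:
            key_low = low                         # buffer the new key frame's features
            high = high_net(low)                  # high-level features (step 108)
            key_outputs.append((i, high))         # deep extraction on key frames only
    return key_outputs
```

  • Non-key frames simply skip the expensive second network layer in this sketch; what is done with them downstream is outside the scope of the scheduling method itself.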
  • FIG. 1 is a schematic flowchart of a key frame scheduling method according to an embodiment of the present application.
  • FIG. 2 is another schematic flowchart of a key frame scheduling method according to an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a key frame scheduling apparatus according to an embodiment of the present application.
  • FIG. 4 is another schematic structural diagram of a key frame scheduling apparatus according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic structural diagram of an application embodiment of an electronic device according to an embodiment of the present application.
  • Embodiments of the present application can be applied to computer systems/servers that can operate with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well-known computing systems, environments, and/or configurations suitable for use with the computer system/server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above, and the like.
  • the computer system/server can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
  • program modules may include routines, programs, target programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
  • program modules may be located on a local or remote computing system storage medium including storage devices.
  • FIG. 1 is a schematic flowchart of a key frame scheduling method according to an embodiment of the present application. As shown in FIG. 1, the method of this embodiment includes:
  • the current frame may be any one of the images in the video.
  • the step 102 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first feature extraction unit executed by the processor.
  • the low-level feature of the previous key frame is obtained by performing feature extraction on the previous key frame by the first network layer.
  • the scheduling probability value proposed in this embodiment is a probability that the current frame is scheduled as a key frame.
  • the step 104 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a scheduling unit that is executed by the processor.
  • whether the current frame is scheduled as a key frame may be determined according to whether the scheduling probability value of the current frame is greater than a preset threshold.
  • For example, if the preset threshold is 80% and the scheduling probability value of the current frame is greater than or equal to the preset threshold, it is determined that the current frame is scheduled as a key frame, that is, the current frame is regarded as a key frame; if the scheduling probability value of the current frame is below the preset threshold, it is determined that the current frame is not scheduled as a key frame.
  • the step 106 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a determining unit that is executed by the processor.
  • the second network layer of the neural network performs feature extraction on the low-level features of the current key frame to obtain a high-level feature of the current key frame.
  • the network depth of the first network layer is shallower than the network depth of the second network layer.
  • the step 108 may be performed by a processor invoking a corresponding instruction stored in the memory or by a second feature extraction unit executed by the processor.
  • the neural network includes two or more network layers with different network depths.
  • Among the network layers included in the neural network, a network layer used for feature extraction may be referred to as a feature layer. After the neural network receives a frame, the first feature layer extracts features of the input frame and passes them to the second feature layer; from the second feature layer onward, each feature layer in turn performs feature extraction on the features it receives and passes the extracted features to the next network layer, until features usable for semantic segmentation are obtained.
  • the network depths of the feature layers in the neural network go from shallow to deep following the order of feature extraction.
  • according to network depth, the feature layers used for feature extraction in the neural network may be divided into two parts: low-level feature layers and high-level feature layers, i.e., the first network layer and the second network layer.
  • the feature finally output after at least one of the low-level feature layers performs feature extraction in turn is called a low-level feature;
  • the feature finally output after at least one of the high-level feature layers performs feature extraction in turn is called a high-level feature.
  • compared with a feature layer with a shallower network depth in the same neural network, a feature layer with a deeper network depth has a larger receptive field and pays more attention to spatial structure information;
  • when the extracted features are used for semantic segmentation, the semantic segmentation is therefore more accurate;
  • however, the deeper the network, the higher the computational difficulty and complexity.
  • In practical applications, the feature layers in the neural network may be divided into low-level feature layers and high-level feature layers according to a preset standard, such as the amount of computation, and the preset standard may be adjusted according to actual needs.
  • For example, for a neural network including 100 sequentially connected feature layers, according to a preset setting, the first 30 feature layers (the 1st to the 30th; other numbers are also possible) may be taken as the low-level feature layers,
  • and the 31st to the 100th feature layers may be taken as the high-level feature layers.
  • For example, for a Pyramid Scene Parsing Network (PSPN), the neural network may include a four-part convolutional network (conv1 to conv4) and a classification layer, where each part of the convolutional network includes multiple convolutional layers. According to the amount of computation, the convolutional layers from conv1 to conv4_3 in the PSPN may be taken as the low-level feature layers, which account for about 1/8 of the computation of the PSPN, and at least one convolutional layer from conv4_4 up to the classification layer may be taken as the high-level feature layers, which account for about 7/8 of the computation of the PSPN; the classification layer is used to perform semantic segmentation on the high-level features output by the high-level feature layers to obtain the semantic label of the frame, that is, the classification of at least one pixel in the frame.
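  • As an illustrative sketch only (not the patent's implementation), splitting one backbone into a cheap low-level part and an expensive high-level part might look as follows; the stage shapes and the split point are placeholders, with the real PSPN split running from conv1 to conv4_3 versus conv4_4 onward.

```python
import torch.nn as nn

class SplitBackbone(nn.Module):
    """A backbone divided into low-level and high-level feature layers,
    so the cheap part can run on every frame and the expensive part
    only on key frames."""

    def __init__(self, stages, split_at):
        super().__init__()
        self.low = nn.Sequential(*stages[:split_at])    # e.g. conv1..conv4_3 analogue
        self.high = nn.Sequential(*stages[split_at:])   # e.g. conv4_4..classifier analogue

    def extract_low(self, frame):
        return self.low(frame)

    def extract_high(self, low_feat):
        return self.high(low_feat)

# Toy stand-in stages; real PSPN stages differ in depth and channel widths.
stages = [
    nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU()),     # ~1/8 of the cost
    nn.Sequential(nn.Conv2d(64, 256, 3, padding=1), nn.ReLU()),   # ~7/8 of the cost
]
backbone = SplitBackbone(stages, split_at=1)
```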
  • Because extracting high-level features requires the second network layer, whose network depth is deep, the computational difficulty and complexity are high, while accurately obtaining the semantic label of a frame requires semantic segmentation based on the frame's high-level features. Therefore, in the embodiments of the present application, high-level feature extraction for semantic segmentation is performed only on key frames; compared with performing high-level feature extraction frame by frame on the video, this not only helps reduce the computational difficulty and complexity, but still yields the semantic segmentation result of the video.
  • Based on the key frame scheduling method above, feature extraction is performed on the current frame to obtain its low-level features, and the scheduling probability value of the current frame is acquired according to the low-level features of the adjacent previous key frame and the low-level features of the current frame;
  • whether the current frame is scheduled as a key frame is determined according to the scheduling probability value of the current frame; if it is determined that the current frame is scheduled as a key frame, feature extraction is performed on the low-level features of the current key frame to obtain its high-level features.
  • the embodiments of the present application can measure the change of the current frame relative to the low-level features of the previous key frame from the low-level features of the two frames, exploiting the variation of low-level features between different frames in a video, so that key frame scheduling is fast, accurate, and adaptive, which improves the scheduling efficiency of key frames.
  • In another embodiment of the key frame scheduling method of the present application, before the embodiment shown in FIG. 1, the method may further include:
  • determining an initial key frame, for example, designating the first frame or any other frame in the video as the initial key frame;
  • performing feature extraction on the initial key frame through the first network layer to obtain and buffer the low-level features of the initial key frame, based on which whether subsequent frames are scheduled as key frames can be determined;
  • performing feature extraction on the low-level features of the initial key frame through the second network layer to obtain the high-level features of the initial key frame for semantic segmentation.
  • Optionally, the method further includes: performing semantic segmentation on the initial key frame, and outputting the semantic label of the initial key frame.
  • After it is determined that the current frame is scheduled as a key frame, the method may further include: taking the current frame as the current key frame and buffering the low-level features of the current key frame, for use in determining whether other frames after the current key frame in the video are scheduled as key frames.
  • After it is determined that the current frame is scheduled as a key frame, the method may further include: taking the current frame as the current key frame, performing semantic segmentation on the current key frame, and outputting the semantic label of the current key frame.
  • For a key frame, a computationally expensive single-frame model, such as PSPN, may be invoked for semantic segmentation to obtain a high-precision semantic segmentation result.
  • In the embodiments of the present application, the key frame and the current frame may share the low-level feature layers of the neural network (i.e., the first network layer) for low-level feature extraction, where the neural network may adopt a Pyramid Scene Parsing Network (PSPN). The neural network may include a four-part convolutional network (conv1 to conv4) and a classification layer, each part of the convolutional network being further divided into multiple convolutional layers, where the low-level feature layers of the neural network may include the convolutional layers from conv1 to conv4_3 in the PSPN, accounting for about 1/8 of the computation of the PSPN; the high-level feature layers of the neural network (i.e., the second network layer) may include at least one convolutional layer from conv4_4 up to the classification layer, accounting for about 7/8 of the computation of the PSPN, and are used to extract the high-level features of the key frame; the classification layer is used to identify, based on the high-level features of the key frame, the category of at least one pixel in the key frame, thereby realizing semantic segmentation of the key frame.
  • FIG. 2 is another schematic flowchart of a key frame scheduling method according to an embodiment of the present application. As shown in FIG. 2, the key frame scheduling method of this embodiment includes:
  • feature extraction may be performed on a current frame by a low-level feature layer of a neural network to obtain a low-level feature of the current frame.
  • the step 202 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first feature extraction unit executed by the processor.
  • the low-level feature of the previous key frame is obtained by performing feature extraction on the previous key frame by the first network layer.
  • the scheduling probability value proposed in this embodiment is a probability that the current frame is scheduled as a key frame.
  • the step 204 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a scheduling unit that is executed by the processor.
  • If it is determined that the current frame is scheduled as a key frame, the current frame is determined as the current key frame, and operation 208 is performed; otherwise, if it is determined that the current frame is not scheduled as a key frame, the subsequent procedure of this embodiment is not performed.
  • In the process of implementing the present application, the applicant found through research that the larger the difference between the low-level features of two frames (defined as the difference value between the low-level features of the two frames), the larger the corresponding difference value of the semantic labels (defined as the proportion of the non-coincident part of the semantic labels of the two frames). The embodiments of the present application therefore use the difference between the low-level features of the previous key frame adjacent to the current frame and the low-level features of the current frame to determine whether the current frame is scheduled as a key frame.
  • When the difference between the low-level features of the two frames is greater than the preset threshold, the current frame may be set as a key frame (i.e., scheduled as a key frame) to obtain a more accurate semantic result.
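  • The two quantities related by this observation can be computed directly, as in the following minimal sketch; the patent does not fix a particular distance measure for the low-level features, so the mean absolute difference used here is an assumption of this sketch.

```python
import torch

def feature_difference(feat_a: torch.Tensor, feat_b: torch.Tensor) -> float:
    # Difference value between the low-level features of two frames;
    # mean absolute difference is an illustrative choice of measure.
    return (feat_a - feat_b).abs().mean().item()

def label_difference(labels_a: torch.Tensor, labels_b: torch.Tensor) -> float:
    # Proportion of the non-coincident part of the semantic labels of
    # two frames, given per-pixel class index tensors of equal shape.
    return (labels_a != labels_b).float().mean().item()
```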
  • the step 206 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a determining unit that is executed by the processor.
  • the step 208 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a second feature extraction unit and a cache unit that are executed by the processor.
  • the step 210 may be performed by a processor invoking a corresponding instruction stored in a memory or by a semantic segmentation unit executed by the processor.
  • In the embodiments of the present application, a deep learning method may be used to obtain feature information of at least one frame in the video. The change of the low-level features is determined according to the difference between the low-level features of the previous key frame adjacent to the current frame and the low-level features of the current frame, and the jitter between frames in the video is analyzed by computing the degree of coincidence between the low-level features of the current frame and those of the adjacent previous key frame: if the low-level features change greatly, the label jitter is large, and otherwise the jitter is small. The degree of jitter of the semantic labels is thus regressed from the low-level features, thereby adaptively scheduling key frames.
  • In an optional example of any of the above embodiments, operation 104 or 204 may include:
  • splicing the low-level features of the previous key frame and the low-level features of the current frame to obtain a spliced feature;
  • obtaining and outputting the scheduling probability value of the current frame based on the spliced feature through the key frame scheduling network.
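  • One possible form of such a key frame scheduling network is sketched below; the patent does not specify its architecture, so the layer choices here (a small convolution, global pooling, and a sigmoid output) are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class KeyFrameScheduler(nn.Module):
    """Maps the spliced (channel-concatenated) low-level features of the
    previous key frame and the current frame to a scheduling probability."""

    def __init__(self, feat_channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * feat_channels, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 1), nn.Sigmoid(),        # probability in [0, 1]
        )

    def forward(self, key_low, cur_low):
        spliced = torch.cat([key_low, cur_low], dim=1)   # the splicing step
        return self.net(spliced)                         # the scheduling probability value
```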
  • The embodiments of the present application can be used in autonomous driving scenarios, video surveillance scenarios, and Internet entertainment products such as portrait segmentation, for example:
  • 1. In an autonomous driving scenario, the embodiments of the present application can be used to quickly segment targets in the video, for example, people and vehicles;
  • 2. In a video surveillance scenario, people can be quickly segmented out;
  • 3. In Internet entertainment products such as portrait segmentation, people can be quickly segmented out of video frames.
  • A person of ordinary skill in the art can understand that all or some of the steps for implementing the above method embodiments may be completed by hardware related to program instructions; the foregoing program may be stored in a computer readable storage medium, and when the program is executed,
  • the steps including those of the foregoing method embodiments are performed; the foregoing storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
  • FIG. 3 is a schematic structural diagram of a key frame scheduling apparatus according to an embodiment of the present application.
  • the key frame scheduling apparatus provided by the embodiment of the present application can be used to implement the key frame scheduling method provided by the foregoing embodiments of the present application.
  • An embodiment of the apparatus includes: a first feature extraction unit, a scheduling unit, a determining unit, and a second feature extraction unit, wherein:
  • the first feature extraction unit includes a first network layer of the neural network, and is configured to perform feature extraction on the current frame to obtain low-level features of the current frame.
  • a scheduling unit configured to acquire a scheduling probability value of the current frame according to a lower layer feature of the previous key frame adjacent to the current frame and a lower layer feature of the current frame.
  • the low-level feature of the previous key frame is obtained by performing feature extraction on the previous key frame by the first network layer.
  • the scheduling probability value proposed in this embodiment is a probability that the current frame is scheduled as a key frame.
  • a determining unit configured to determine, according to a scheduling probability value of the current frame, whether the current frame is scheduled as a key frame.
  • the second feature extraction unit includes a second network layer of the neural network, and is configured to, if it is determined according to the determination result of the determining unit that the current frame is scheduled as a key frame, determine the current frame as the current key frame and perform feature extraction on the low-level features of the current key frame to obtain high-level features of the current key frame.
  • the network depth of the first network layer is shallower than the network depth of the second network layer.
  • Based on the key frame scheduling apparatus above, feature extraction is performed on the current frame to obtain its low-level features, and the scheduling probability value of the current frame is acquired according to the low-level features of the adjacent previous key frame and the low-level features of the current frame;
  • whether the current frame is scheduled as a key frame is determined according to the scheduling probability value of the current frame; if it is determined that the current frame is scheduled as a key frame, feature extraction is performed on the low-level features of the current key frame to obtain its high-level features.
  • the embodiments of the present application can measure the change of the current frame relative to the low-level features of the previous key frame from the low-level features of the two frames, exploiting the variation of low-level features between different frames in a video, so that key frame scheduling is fast, accurate, and adaptive, which improves the scheduling efficiency of key frames.
  • the foregoing previous key frame includes a predetermined initial key frame.
  • FIG. 4 is another schematic structural diagram of a key frame scheduling apparatus according to an embodiment of the present disclosure.
  • As shown in FIG. 4, compared with the embodiment shown in FIG. 3, the key frame scheduling apparatus further includes: a cache unit configured to buffer the low-level features of key frames, where the key frames in the embodiments of the present application include the initial key frame.
  • the first feature extraction unit may be further configured to cache a low-level feature of the current key frame in the cache unit according to the determination result obtained by the determining unit.
  • the scheduling unit may include: a splicing sub-unit, configured to splice the low-level features of the previous key frame and the low-level features of the current frame to obtain a spliced feature; and a key frame scheduling network, configured to acquire the scheduling probability value of the current frame based on the spliced feature.
  • the key frame scheduling apparatus may further include: a semantic segmentation unit, configured to perform semantic segmentation on the key frame and output the semantic label of the key frame, where the key frame in the embodiments of the present application may include: the initial key frame, the previous key frame, or the current key frame.
  • the embodiment of the present application further provides an electronic device, including the key frame scheduling apparatus of any of the foregoing embodiments of the present application.
  • the embodiment of the present application further provides another electronic device, including:
  • the processor runs the key frame scheduling device
  • the units in the key frame scheduling device of any of the above embodiments of the present application are executed.
  • the embodiment of the present application further provides another electronic device, including: a processor and a memory;
  • the memory is configured to store at least one executable instruction that causes the processor to perform the operations of the steps in the key frame scheduling method of any of the above embodiments of the present application.
  • FIG. 5 is a schematic structural diagram of an application embodiment of an electronic device according to an embodiment of the present application.
  • FIG. 5 shows a schematic structural diagram of an electronic device 500 suitable for implementing a terminal device or a server of an embodiment of the present application.
  • the electronic device 500 includes one or more processors and a communication unit.
  • the one or more processors are, for example, one or more central processing units (CPUs) 501 and/or one or more graphics processing units (GPUs) 513, and the processors may perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 502 or executable instructions loaded from a storage portion 508 into a random access memory (RAM) 503.
  • the communication unit 512 can include, but is not limited to, a network card, which can include, but is not limited to, an IB (Infiniband) network card.
  • the processor can communicate with the read-only memory 502 and/or the random access memory 503 to execute executable instructions, is connected to the communication unit 512 via the bus 504, and communicates with other target devices via the communication unit 512, thereby completing the operations corresponding to any method provided by the embodiments of the present application,
  • for example: performing feature extraction on the current frame through the first network layer of the neural network to obtain low-level features of the current frame; acquiring the scheduling probability value of the current frame according to the low-level features of the previous key frame adjacent to the current frame
  • and the low-level features of the current frame, where the low-level features of the previous key frame are obtained by the first network layer performing feature extraction on the previous key frame; determining, according to the scheduling probability value of the current frame, whether the current frame is scheduled as a key frame;
  • if it is determined that the current frame is scheduled as a key frame, determining the current frame as the current key frame, and performing feature extraction on the low-level features of the current key frame through the second network layer of the neural network to obtain high-level features of the current key frame;
  • where, in the neural network, the network depth of the first network layer is shallower than the network depth of the second network layer.
  • In addition, the RAM 503 may also store various programs and data required for the operation of the apparatus.
  • the CPU 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504.
  • Where the RAM 503 is present, the ROM 502 is an optional module.
  • the RAM 503 stores executable instructions, or writes executable instructions into the ROM 502 at runtime, and the executable instructions cause the central processing unit 501 to perform the operations corresponding to the above communication method.
  • An input/output (I/O) interface 505 is also coupled to bus 504.
  • the communication unit 512 may be integrated, or may be provided with multiple sub-modules (for example, multiple IB network cards) that are on the bus link.
  • the following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 508 including a hard disk and the like; and a communication portion 509 including a network interface card such as a LAN card or a modem. The communication portion 509 performs communication processing via a network such as the Internet.
  • Driver 510 is also coupled to I/O interface 505 as needed.
  • a removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like is mounted on the drive 510 as needed so that a computer program read therefrom is installed into the storage portion 508 as needed.
  • It should be noted that the architecture shown in FIG. 5 is only an optional implementation;
  • in a specific practice process, the number and types of the components in FIG. 5 may be selected, deleted, added, or replaced according to actual needs; in the setting of different functional components, implementations such as separate settings or integrated settings may also be adopted,
  • for example, the GPU 513 and the CPU 501 may be provided separately, or the GPU 513 may be integrated on the CPU 501;
  • the communication unit may be provided separately, or may be integrated on the CPU 501 or the GPU 513, and so on. These alternative implementations all fall within the protection scope disclosed by the present application.
  • An embodiment of the present application includes a computer program product, which includes a computer program tangibly embodied on a machine readable medium; the computer program includes program code for executing the method shown in the flowchart, and the program code may include instructions corresponding to the method steps provided by the embodiments of the present application, for example: performing feature extraction on the current frame through the first network layer of the neural network to obtain low-level features of the current frame; acquiring the scheduling probability value of the current frame according to the low-level features of the previous key frame adjacent to the current frame and the low-level features of the current frame,
  • where the low-level features of the previous key frame are obtained by the first network layer performing feature extraction on the previous key frame; determining, according to the scheduling probability value of the current frame, whether the current frame is scheduled as a key frame; if it is determined that the current frame is scheduled as a key frame, determining the current frame as the current key frame, and performing feature extraction on the low-level features of the current key frame through the second network layer of the neural network to obtain high-level features of the current key frame; where, in the neural network, the network depth of the first network layer is shallower than the network depth of the second network layer.
  • the computer program can be downloaded and installed from the network via the communication portion 509, and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, the operations of the above-described functions defined in the method of the present application are performed.
  • the embodiments of the present application further provide a computer storage medium for storing computer readable instructions which, when executed, implement the operations of the key frame scheduling method of any of the foregoing embodiments of the present application.
  • the embodiments of the present application further provide a computer program, including computer readable instructions; when the computer readable instructions run in a device, a processor in the device executes executable instructions for implementing the steps of the key frame scheduling method of any of the foregoing embodiments of the present application.
  • the computer program is specifically a software product, such as a Software Development Kit (SDK), and the like.
  • the embodiments of the present application further provide a computer program product for storing computer readable instructions which, when executed, cause a computer to perform the key frame scheduling method described in any of the above possible implementations.
  • the computer program product can be implemented by means of hardware, software or a combination thereof.
  • In an optional example, the computer program product is embodied as a computer storage medium;
  • in another optional example, the computer program product is embodied as a software product, such as an SDK.
  • the methods and apparatus of the present application may be implemented in a number of ways.
  • the methods and apparatus of the present application can be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware.
  • the above-described sequence of steps for the method is for illustrative purposes only, and the steps of the method of the present application are not limited to the order specifically described above unless otherwise specifically stated.
  • the present application can also be implemented as a program recorded in a recording medium, the programs including machine readable instructions for implementing the method according to the present application.
  • the present application also covers a recording medium storing a program for executing the method according to the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

A key frame scheduling method and apparatus, an electronic device, a program, and a medium, where the method includes: performing feature extraction on a current frame through a first network layer of a neural network to obtain low-level features of the current frame (102); acquiring a scheduling probability value of the current frame according to low-level features of a previous key frame adjacent to the current frame and the low-level features of the current frame (104); determining, according to the scheduling probability value of the current frame, whether the current frame is scheduled as a key frame (106); and if it is determined that the current frame is scheduled as a key frame, performing feature extraction on the low-level features of the current key frame through a second network layer of the neural network to obtain high-level features of the current key frame (108), where, in the neural network, the network depth of the first network layer is shallower than the network depth of the second network layer. The method exploits the variation of low-level features between different frames in a video, so that key frame scheduling can be performed quickly, accurately, and adaptively, which improves the scheduling efficiency of key frames.

Description

Key frame scheduling method and apparatus, electronic device, program and medium
This application claims priority to Chinese Patent Application No. CN201711455838.X, entitled "Key frame scheduling method and apparatus, electronic device, program and medium", filed with the Chinese Patent Office on December 27, 2017, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to computer vision technology, and in particular, to a key frame scheduling method and apparatus, an electronic device, a program, and a medium.
Background
Video semantic segmentation is an important problem in computer vision and video semantic understanding tasks. Video semantic segmentation models have important applications in many fields, such as autonomous driving, video surveillance, and video target analysis. The speed of video semantic segmentation is a relatively important aspect of video semantic segmentation tasks.
Summary
The embodiments of the present application provide a technical solution for key frame scheduling.
According to one aspect of the embodiments of the present application, a key frame scheduling method is provided, including:
performing feature extraction on a current frame through a first network layer of a neural network to obtain low-level features of the current frame;
acquiring a scheduling probability value of the current frame according to low-level features of a previous key frame adjacent to the current frame and the low-level features of the current frame, where the low-level features of the previous key frame are obtained by the first network layer performing feature extraction on the previous key frame, and the scheduling probability value is the probability that the current frame is scheduled as a key frame;
determining, according to the scheduling probability value of the current frame, whether the current frame is scheduled as a key frame;
if it is determined that the current frame is scheduled as a key frame, determining the current frame as the current key frame, and performing feature extraction on the low-level features of the current key frame through a second network layer of the neural network to obtain high-level features of the current key frame, where, in the neural network, the network depth of the first network layer is shallower than the network depth of the second network layer.
Optionally, in any of the above method embodiments of the present application, the method further includes:
determining an initial key frame;
performing feature extraction on the initial key frame through the first network layer to obtain and buffer low-level features of the initial key frame;
performing feature extraction on the low-level features of the initial key frame through the second network layer to obtain high-level features of the initial key frame.
Optionally, in any of the above method embodiments of the present application, the method further includes:
performing semantic segmentation on the initial key frame, and outputting the semantic label of the initial key frame.
Optionally, in any of the above method embodiments of the present application, after it is determined that the current frame is scheduled as a key frame, the method further includes:
buffering the low-level features of the current key frame.
Optionally, in any of the above method embodiments of the present application, acquiring the scheduling probability value of the current frame according to the low-level features of the previous key frame adjacent to the current frame and the low-level features of the current frame includes:
splicing the low-level features of the previous key frame and the low-level features of the current frame to obtain a spliced feature;
acquiring the scheduling probability value of the current frame based on the spliced feature through a key frame scheduling network.
Optionally, in any of the above method embodiments of the present application, the method further includes:
performing semantic segmentation on the current key frame, and outputting the semantic label of the current key frame.
According to another aspect of the embodiments of the present application, a key frame scheduling apparatus is provided, including:
a first feature extraction unit, including a first network layer of a neural network, configured to perform feature extraction on a current frame to obtain low-level features of the current frame;
a scheduling unit, configured to acquire a scheduling probability value of the current frame according to low-level features of a previous key frame adjacent to the current frame and the low-level features of the current frame, where the low-level features of the previous key frame are obtained by the first network layer performing feature extraction on the previous key frame, and the scheduling probability value is the probability that the current frame is scheduled as a key frame;
a determining unit, configured to determine, according to the scheduling probability value of the current frame, whether the current frame is scheduled as a key frame;
a second feature extraction unit, including a second network layer of the neural network, configured to, according to the determination result of the determining unit, if it is determined that the current frame is scheduled as a key frame, determine the current frame as the current key frame, and perform feature extraction on the low-level features of the current key frame to obtain high-level features of the current key frame, where, in the neural network, the network depth of the first network layer is shallower than the network depth of the second network layer.
Optionally, in any of the above apparatus embodiments of the present application, the previous key frame includes a predetermined initial key frame;
the apparatus further includes:
a cache unit, configured to buffer low-level features and high-level features of key frames, the key frames including the initial key frame.
Optionally, in any of the above apparatus embodiments of the present application, the first feature extraction unit is further configured to buffer, in the cache unit, the low-level features of the current key frame according to the determination result of the determining unit.
Optionally, in any of the above apparatus embodiments of the present application, the scheduling unit includes:
a splicing sub-unit, configured to splice the low-level features of the previous key frame and the low-level features of the current frame to obtain a spliced feature;
a key frame scheduling network, configured to acquire the scheduling probability value of the current frame based on the spliced feature.
Optionally, in any of the above apparatus embodiments of the present application, the apparatus further includes:
a semantic segmentation unit, configured to perform semantic segmentation on a key frame and output the semantic label of the key frame, the key frame including: the initial key frame, the previous key frame, or the current key frame.
According to yet another aspect of the embodiments of the present application, an electronic device is provided, including the key frame scheduling apparatus according to any one of the embodiments of the present application.
According to still another aspect of the embodiments of the present application, an electronic device is provided, including:
a processor and the key frame scheduling apparatus according to any one of the embodiments of the present application;
when the processor runs the key frame scheduling apparatus, the units of the key frame scheduling apparatus according to any one of the embodiments of the present application are run.
According to still another aspect of the embodiments of the present application, an electronic device is provided, including: a processor and a memory;
the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations of the steps in the key frame scheduling method according to any one of the embodiments of the present application.
According to still another aspect of the embodiments of the present application, a computer program is provided, including computer readable code, where, when the computer readable code runs on a device, a processor in the device executes instructions for implementing the steps of the key frame scheduling method according to any one of the embodiments of the present application.
According to still another aspect of the embodiments of the present application, a computer readable medium is provided for storing computer readable instructions which, when executed, implement the operations of the steps of the key frame scheduling method according to any one of the embodiments of the present application.
Based on the key frame scheduling method and apparatus, electronic device, program, and medium provided by the above embodiments of the present application, feature extraction is performed on the current frame to obtain low-level features of the current frame; the scheduling probability value of the current frame is acquired according to the low-level features of the adjacent previous key frame and the low-level features of the current frame; whether the current frame is scheduled as a key frame is determined according to the scheduling probability value of the current frame; and if it is determined that the current frame is scheduled as a key frame, feature extraction is performed on the low-level features of the current key frame to obtain high-level features of the current key frame. The embodiments of the present application can measure the change of the current frame relative to the low-level features of the previous key frame from the low-level features of the previous key frame and the low-level features of the current frame, exploiting the variation of low-level features between different frames in a video, so that key frame scheduling can be performed quickly, accurately, and adaptively, which improves the scheduling efficiency of key frames.
The technical solutions of the present application are further described in detail below through the accompanying drawings and embodiments.
Brief Description of the Drawings
The accompanying drawings, which constitute a part of the specification, describe the embodiments of the present application and, together with the description, serve to explain the principles of the present application.
With reference to the accompanying drawings, the present application can be understood more clearly according to the following detailed description, in which:
FIG. 1 is a schematic flowchart of a key frame scheduling method according to an embodiment of the present application.
FIG. 2 is another schematic flowchart of a key frame scheduling method according to an embodiment of the present application.
FIG. 3 is a schematic structural diagram of a key frame scheduling apparatus according to an embodiment of the present application.
FIG. 4 is another schematic structural diagram of a key frame scheduling apparatus according to an embodiment of the present application.
FIG. 5 is a schematic structural diagram of an application embodiment of an electronic device according to an embodiment of the present application.
Detailed Description
Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the relative arrangement of components and steps, numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present application.
Meanwhile, it should be understood that, for convenience of description, the dimensions of the parts shown in the accompanying drawings are not drawn according to actual proportions.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the present application or its application or uses.
Technologies, methods, and devices known to a person of ordinary skill in the related art may not be discussed in detail, but where appropriate, such technologies, methods, and devices should be considered part of the specification.
It should be noted that similar reference numerals and letters denote similar items in the following accompanying drawings; therefore, once an item is defined in one accompanying drawing, it does not need to be further discussed in subsequent accompanying drawings.
The embodiments of the present application can be applied to computer systems/servers, which can operate together with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations suitable for use with the computer system/server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems, and the like.
The computer system/server may be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures, and the like, which perform specific tasks or implement specific abstract data types. The computer system/server may be implemented in a distributed cloud computing environment, where tasks are performed by remote processing devices linked through a communication network. In the distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
FIG. 1 is a schematic flowchart of a key frame scheduling method according to an embodiment of the present application. As shown in FIG. 1, the method of this embodiment includes:
102. Perform feature extraction on a current frame through a first network layer of a neural network to obtain low-level features of the current frame.
Optionally, the current frame may be any frame of image in a video.
In an optional example, step 102 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a first feature extraction unit run by the processor.
104. Acquire a scheduling probability value of the current frame according to low-level features of a previous key frame adjacent to the current frame and the low-level features of the current frame.
The low-level features of the previous key frame are obtained by the above first network layer performing feature extraction on the previous key frame. Optionally, the scheduling probability value proposed in the embodiments of the present application is the probability that the current frame is scheduled as a key frame.
In an optional example, step 104 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a scheduling unit run by the processor.
106. Determine, according to the scheduling probability value of the current frame, whether the current frame is scheduled as a key frame.
In an optional example of the embodiments of the present application, whether the current frame is scheduled as a key frame may be determined according to whether the scheduling probability value of the current frame is greater than a preset threshold. For example, if the preset threshold is 80% and the scheduling probability value of the current frame is greater than or equal to the preset threshold, it is determined that the current frame is scheduled as a key frame, that is, the current frame is regarded as a key frame; if the scheduling probability value of the current frame is less than the preset threshold, it is determined that the current frame is not scheduled as a key frame.
In an optional example, step 106 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a determining unit run by the processor.
108. If it is determined that the current frame is scheduled as a key frame, determine the current frame as the current key frame, and perform feature extraction on the low-level features of the current key frame through a second network layer of the above neural network to obtain high-level features of the current key frame.
In the neural network, the network depth of the above first network layer is shallower than the network depth of the above second network layer.
In an optional example, step 108 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a second feature extraction unit run by the processor.
In the embodiments of the present application, the neural network includes two or more network layers with different network depths. Among the network layers included in the neural network, a network layer used for feature extraction may be called a feature layer. After the neural network receives a frame, the first feature layer extracts features of the input frame and inputs them into the second feature layer; from the second feature layer onward, each feature layer in turn performs feature extraction on the input features and inputs the extracted features into the next network layer for feature extraction, until features usable for semantic segmentation are obtained. The network depths of the feature layers in the neural network go from shallow to deep following the order of feature extraction; according to network depth, the feature layers used for feature extraction in the neural network may be divided into two parts, low-level feature layers and high-level feature layers, that is, the above first network layer and second network layer. The feature finally output after at least one of the low-level feature layers performs feature extraction in turn is called a low-level feature, and the feature finally output after at least one of the high-level feature layers performs feature extraction in turn is called a high-level feature. Compared with a feature layer with a shallower network depth in the same neural network, a feature layer with a deeper network depth has a larger receptive field and pays more attention to spatial structure information, and when the extracted features are used for semantic segmentation, the semantic segmentation is more accurate; however, the deeper the network depth, the higher the computational difficulty and complexity. In practical applications, the feature layers in the neural network may be divided into low-level feature layers and high-level feature layers according to a preset standard, for example, the amount of computation, and the preset standard may be adjusted according to actual needs. For example, for a neural network including 100 sequentially connected feature layers, according to a preset setting, the first 30 (or some other number) of the 100 feature layers, i.e., the 1st to the 30th, may be taken as the low-level feature layers, and the remaining 70, i.e., the 31st to the 100th, may be taken as the high-level feature layers. For example, for a Pyramid Scene Parsing Network (PSPN), the neural network may include a four-part convolutional network (conv1 to conv4) and a classification layer, where each part of the convolutional network in turn includes multiple convolutional layers; according to the amount of computation, the convolutional layers from conv1 to conv4_3 in the PSPN may be taken as the low-level feature layers, which account for about 1/8 of the computation of the PSPN, and at least one convolutional layer from conv4_4 up to the classification layer in the PSPN may be taken as the high-level feature layers, which account for about 7/8 of the computation of the PSPN; the classification layer is used to perform semantic segmentation on the high-level features output by the high-level feature layers to obtain the semantic label of the frame, that is, the classification of at least one pixel in the frame.
Since the extraction of high-level features requires the second network layer, whose network depth is deep, the computational difficulty and complexity are high, while accurately obtaining the semantic label of a frame requires semantic segmentation based on the high-level features of the frame. Therefore, in the embodiments of the present application, high-level feature extraction for semantic segmentation is performed only on key frames; compared with performing high-level feature extraction frame by frame on the video, this not only helps reduce the computational difficulty and complexity, but still allows the semantic segmentation result of the video to be obtained.
Based on the key frame scheduling method provided by the above embodiments of the present application, feature extraction is performed on the current frame to obtain its low-level features; the scheduling probability value of the current frame is acquired according to the low-level features of the adjacent previous key frame and the low-level features of the current frame; whether the current frame is scheduled as a key frame is determined according to the scheduling probability value of the current frame; and if it is determined that the current frame is scheduled as a key frame, feature extraction is performed on the low-level features of the current key frame to obtain its high-level features. The embodiments of the present application can measure the change of the current frame relative to the low-level features of the previous key frame from the low-level features of the two frames, exploiting the variation of low-level features between different frames in a video, so that key frame scheduling can be performed quickly, accurately, and adaptively, which improves the scheduling efficiency of key frames.
In addition, in another embodiment of the key frame scheduling method of the present application, before the embodiment shown in FIG. 1 above, the method may further include:
determining an initial key frame, for example, designating the first frame or any other frame in the video as the initial key frame;
performing feature extraction on the initial key frame through the above first network layer to obtain and buffer the low-level features of the initial key frame, based on which whether other frames are scheduled as key frames can subsequently be determined (refer to the above step 102);
performing feature extraction on the low-level features of the initial key frame through the above second network layer to obtain the high-level features of the initial key frame for semantic segmentation.
Optionally, in yet another embodiment of the key frame scheduling method of the present application, the method may further include: performing semantic segmentation on the above initial key frame, and outputting the semantic label of the initial key frame.
In addition, in yet another embodiment of the key frame scheduling method provided by the embodiments of the present application, after it is determined that the current frame is scheduled as a key frame, the method may further include: taking the current frame as the current key frame and buffering the low-level features of the current key frame, for use in determining whether other frames after the current key frame in the video are scheduled as key frames.
In addition, in still another embodiment of the key frame scheduling method provided by the embodiments of the present application, after it is determined that the current frame is scheduled as a key frame, the method may further include: taking the current frame as the current key frame, performing semantic segmentation on the current key frame, and outputting the semantic label of the current key frame. In the embodiments of the present application, for a key frame, a computationally expensive single-frame model, such as PSPN, may be invoked for semantic segmentation, thereby obtaining a high-precision semantic segmentation result. In the embodiments of the present application, the key frame and the current frame may share the low-level feature layers of the neural network (i.e., the first network layer) for low-level feature extraction, where the neural network may adopt a Pyramid Scene Parsing Network (PSPN); the neural network may include a four-part convolutional network (conv1 to conv4) and a classification layer, each part of the convolutional network being further divided into multiple convolutional layers, where the low-level feature layers of the neural network may include the convolutional layers from conv1 to conv4_3 in the PSPN, accounting for about 1/8 of the computation of the PSPN; the high-level feature layers of the neural network (i.e., the second network layer) may include at least one convolutional layer from conv4_4 up to the classification layer, accounting for about 7/8 of the computation of the PSPN, and are used to extract the high-level features of the key frame; the classification layer is used to identify, based on the high-level features of the key frame, the category of at least one pixel in the key frame, thereby realizing semantic segmentation of the key frame.
FIG. 2 is another schematic flowchart of a key frame scheduling method according to an embodiment of the present application. As shown in FIG. 2, the key frame scheduling method of this embodiment includes:
202. Perform feature extraction on a current frame through a first network layer of a neural network to obtain low-level features of the current frame.
In an example of the embodiments of the present application, feature extraction may be performed on the current frame through the low-level feature layers of the neural network to obtain the low-level features of the current frame.
In an optional example, step 202 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a first feature extraction unit run by the processor.
204. Acquire a scheduling probability value of the current frame according to low-level features of a previous key frame adjacent to the current frame and the low-level features of the current frame.
The low-level features of the previous key frame are obtained by the above first network layer performing feature extraction on the previous key frame. Optionally, the scheduling probability value proposed in the embodiments of the present application is the probability that the current frame is scheduled as a key frame.
In an optional example, step 204 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a scheduling unit run by the processor.
206. Determine, according to the scheduling probability value of the current frame, whether the current frame is scheduled as a key frame.
If it is determined that the current frame is scheduled as a key frame, the current frame is determined as the current key frame, and operation 208 is performed; otherwise, if it is determined that the current frame is not scheduled as a key frame, the subsequent procedure of this embodiment is not performed.
In the process of implementing the present application, the applicant found through research that the larger the difference between the low-level features of two frames (defined as the difference value between the low-level features of the two frames), the larger the corresponding difference value of the semantic labels (defined as the proportion of the non-coincident part of the semantic labels of the two frames). The embodiments of the present application use the difference between the low-level features of the previous key frame adjacent to the current frame and the low-level features of the current frame to confirm whether the current frame is scheduled as a key frame. When the difference between the low-level features of the two frames is greater than the preset threshold, the current frame may be set as a key frame (i.e., scheduled as a key frame), so as to obtain a more accurate semantic result.
In an optional example, step 206 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a determining unit run by the processor.
208. Perform feature extraction on the low-level features of the current key frame through the second network layer of the above neural network to obtain high-level features of the current key frame, and buffer the low-level features of the current key frame.
In an optional example, step 208 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a second feature extraction unit and a cache unit run by the processor.
210. Perform semantic segmentation on the current key frame, and output the semantic label of the current key frame.
In an optional example, step 210 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a semantic segmentation unit run by the processor.
In the process of implementing the present application, the applicant also found through research that when the low-level features between frames in a video change greatly, the jitter between the semantic labels obtained by semantic segmentation is large, and otherwise the jitter is small. In the embodiments of the present application, a deep learning method may be used to obtain feature information of at least one frame in the video; the change of the low-level features is determined according to the difference between the low-level features of the previous key frame adjacent to the current frame and the low-level features of the current frame, and the jitter between frames in the video is analyzed by computing the degree of coincidence between the low-level features of the current frame and those of the adjacent previous key frame: if the low-level features change greatly, the label jitter is large, and otherwise the jitter is small. In this way the degree of jitter of the semantic labels is regressed from the low-level features, and key frames are thereby scheduled adaptively.
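For illustration only, the regression described above might be trained as in the following sketch; this is not the disclosed implementation, and the network shape, the 64 feature channels, and the use of a mean squared error loss are all assumptions of this sketch. The jitter targets would be the observed proportion of non-coincident semantic labels, computed offline with the full segmentation model.

```python
import torch
import torch.nn as nn

# A small scheduling network trained to regress, from spliced low-level
# features alone, the label jitter observed between a key frame and a
# later frame.
scheduler = nn.Sequential(
    nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),   # 128 = 2 x 64 feature channels
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 1), nn.Sigmoid(),                # predicted jitter in [0, 1]
)
optimizer = torch.optim.SGD(scheduler.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(key_low, cur_low, observed_jitter):
    """key_low, cur_low: (N, 64, H, W); observed_jitter: (N,) in [0, 1]."""
    spliced = torch.cat([key_low, cur_low], dim=1)
    pred = scheduler(spliced).squeeze(1)
    loss = loss_fn(pred, observed_jitter)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```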
In an optional example of any of the above embodiments of the present application, operation 104 or 204 may include:
splicing the low-level features of the previous key frame and the low-level features of the current frame to obtain a spliced feature;
obtaining and outputting the scheduling probability value of the current frame based on the spliced feature through a key frame scheduling network.
The embodiments of the present application can be used in autonomous driving scenarios, video surveillance scenarios, and Internet entertainment products such as portrait segmentation, for example:
1. In an autonomous driving scenario, the embodiments of the present application can be used to quickly segment targets in the video, for example, people and vehicles;
2. In a video surveillance scenario, people can be quickly segmented out;
3. In Internet entertainment products such as portrait segmentation, people can be quickly segmented out of video frames.
A person of ordinary skill in the art can understand that all or some of the steps for implementing the above method embodiments may be completed by hardware related to program instructions; the foregoing program may be stored in a computer readable storage medium, and when the program is executed, the steps including those of the above method embodiments are performed; the foregoing storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
FIG. 3 is a schematic structural diagram of a key frame scheduling apparatus according to an embodiment of the present application. The key frame scheduling apparatus provided by the embodiments of the present application can be used to implement the key frame scheduling methods provided by the above embodiments of the present application. As shown in FIG. 3, an embodiment of the key frame scheduling apparatus includes: a first feature extraction unit, a scheduling unit, a determining unit, and a second feature extraction unit, where:
the first feature extraction unit, including a first network layer of a neural network, is configured to perform feature extraction on a current frame to obtain low-level features of the current frame;
the scheduling unit is configured to acquire a scheduling probability value of the current frame according to low-level features of a previous key frame adjacent to the current frame and the low-level features of the current frame, where the low-level features of the previous key frame are obtained by the first network layer performing feature extraction on the previous key frame, and optionally, the scheduling probability value proposed in the embodiments of the present application is the probability that the current frame is scheduled as a key frame;
the determining unit is configured to determine, according to the scheduling probability value of the current frame, whether the current frame is scheduled as a key frame;
the second feature extraction unit, including a second network layer of the neural network, is configured to, according to the determination result of the determining unit, if it is determined that the current frame is scheduled as a key frame, determine the current frame as the current key frame, and perform feature extraction on the low-level features of the current key frame to obtain high-level features of the current key frame, where, in the neural network, the network depth of the above first network layer is shallower than the network depth of the second network layer.
Based on the key frame scheduling apparatus provided by the above embodiments of the present application, feature extraction is performed on the current frame to obtain its low-level features; the scheduling probability value of the current frame is acquired according to the low-level features of the adjacent previous key frame and the low-level features of the current frame; whether the current frame is scheduled as a key frame is determined according to the scheduling probability value of the current frame; and if it is determined that the current frame is scheduled as a key frame, feature extraction is performed on the low-level features of the current key frame to obtain its high-level features. The embodiments of the present application can measure the change of the current frame relative to the low-level features of the previous key frame from the low-level features of the two frames, exploiting the variation of low-level features between different frames in a video, so that key frame scheduling can be performed quickly, accurately, and adaptively, which improves the scheduling efficiency of key frames.
In an optional implementation of the key frame scheduling apparatus provided by the embodiments of the present application, the above previous key frame includes a predetermined initial key frame.
FIG. 4 is another schematic structural diagram of a key frame scheduling apparatus according to an embodiment of the present application. As shown in FIG. 4, compared with the embodiment shown in FIG. 3, in this embodiment the key frame scheduling apparatus further includes: a cache unit, configured to buffer the low-level features of key frames, where the key frames in the embodiments of the present application include the initial key frame.
In addition, in yet another embodiment of the key frame scheduling apparatus provided by the embodiments of the present application, the first feature extraction unit is further configured to buffer, in the cache unit, the low-level features of the current key frame according to the determination result obtained by the determining unit.
In an implementation of the key frame scheduling apparatus provided by the embodiments of the present application, the scheduling unit may include: a splicing sub-unit, configured to splice the low-level features of the previous key frame and the low-level features of the current frame to obtain a spliced feature; and a key frame scheduling network, configured to acquire the scheduling probability value of the current frame based on the spliced feature.
In addition, referring again to FIG. 4, the key frame scheduling apparatus provided by the embodiments of the present application may further include: a semantic segmentation unit, configured to perform semantic segmentation on a key frame and output the semantic label of the key frame, where the key frame in the embodiments of the present application may include: the initial key frame, the previous key frame, or the current key frame.
In addition, the embodiments of the present application further provide an electronic device, including the key frame scheduling apparatus of any of the above embodiments of the present application.
In addition, the embodiments of the present application further provide another electronic device, including:
a processor and the key frame scheduling apparatus of any of the above embodiments of the present application;
when the processor runs the key frame scheduling apparatus, the units in the key frame scheduling apparatus of any of the above embodiments of the present application are run.
In addition, the embodiments of the present application further provide yet another electronic device, including: a processor and a memory;
the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations of the steps in the key frame scheduling method of any of the above embodiments of the present application.
The embodiments of the present application further provide an electronic device, which may be, for example, a mobile terminal, a personal computer (PC), a tablet computer, a server, or the like. FIG. 5 is a schematic structural diagram of an application embodiment of an electronic device according to an embodiment of the present application. Referring now to FIG. 5, which shows a schematic structural diagram of an electronic device 500 suitable for implementing a terminal device or a server of an embodiment of the present application: as shown in FIG. 5, the electronic device 500 includes one or more processors, a communication unit, and the like. The one or more processors are, for example: one or more central processing units (CPUs) 501 and/or one or more graphics processing units (GPUs) 513, etc.; the processors may perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 502 or executable instructions loaded from a storage portion 508 into a random access memory (RAM) 503. The communication unit 512 may include, but is not limited to, a network card, and the network card may include, but is not limited to, an IB (Infiniband) network card.
The processor may communicate with the read-only memory 502 and/or the random access memory 503 to execute executable instructions, is connected to the communication unit 512 via the bus 504, and communicates with other target devices via the communication unit 512, thereby completing the operations corresponding to any method provided by the embodiments of the present application, for example: performing feature extraction on the current frame through the first network layer of the neural network to obtain low-level features of the current frame; acquiring the scheduling probability value of the current frame according to the low-level features of the previous key frame adjacent to the current frame and the low-level features of the current frame, where the low-level features of the previous key frame are obtained by the first network layer performing feature extraction on the previous key frame; determining, according to the scheduling probability value of the current frame, whether the current frame is scheduled as a key frame; if it is determined that the current frame is scheduled as a key frame, determining the current frame as the current key frame, and performing feature extraction on the low-level features of the current key frame through the second network layer of the neural network to obtain high-level features of the current key frame; where, in the neural network, the network depth of the first network layer is shallower than the network depth of the second network layer.
In addition, the RAM 503 may also store various programs and data required for the operation of the apparatus. The CPU 501, the ROM 502, and the RAM 503 are connected to one another through the bus 504. Where the RAM 503 is present, the ROM 502 is an optional module. The RAM 503 stores executable instructions, or writes executable instructions into the ROM 502 at runtime, and the executable instructions cause the central processing unit 501 to perform the operations corresponding to the above communication method. An input/output (I/O) interface 505 is also connected to the bus 504. The communication unit 512 may be integrated, or may be provided with multiple sub-modules (for example, multiple IB network cards) that are on the bus link.
The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 508 including a hard disk and the like; and a communication portion 509 including a network interface card such as a LAN card or a modem. The communication portion 509 performs communication processing via a network such as the Internet. A driver 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the driver 510 as needed, so that a computer program read therefrom is installed into the storage portion 508 as needed.
It should be noted that the architecture shown in FIG. 5 is only an optional implementation; in the specific practice process, the number and types of the components in FIG. 5 may be selected, deleted, added, or replaced according to actual needs; in the setting of different functional components, implementations such as separate settings or integrated settings may also be adopted, for example, the GPU 513 and the CPU 501 may be provided separately, or the GPU 513 may be integrated on the CPU 501; the communication unit may be provided separately, or may be integrated on the CPU 501 or the GPU 513, and so on. These alternative implementations all fall within the protection scope disclosed by the present application.
In particular, according to the embodiments of the present application, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present application includes a computer program product, which includes a computer program tangibly embodied on a machine readable medium; the computer program includes program code for executing the method shown in the flowchart, and the program code may include instructions corresponding to the method steps provided by the embodiments of the present application, for example: performing feature extraction on the current frame through the first network layer of the neural network to obtain low-level features of the current frame; acquiring the scheduling probability value of the current frame according to the low-level features of the previous key frame adjacent to the current frame and the low-level features of the current frame, where the low-level features of the previous key frame are obtained by the first network layer performing feature extraction on the previous key frame; determining, according to the scheduling probability value of the current frame, whether the current frame is scheduled as a key frame; if it is determined that the current frame is scheduled as a key frame, determining the current frame as the current key frame, and performing feature extraction on the low-level features of the current key frame through the second network layer of the neural network to obtain high-level features of the current key frame; where, in the neural network, the network depth of the first network layer is shallower than the network depth of the second network layer. In such an embodiment, the computer program may be downloaded and installed from the network via the communication portion 509, and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, the operations of the above functions defined in the method of the present application are performed.
In addition, the embodiments of the present application further provide a computer storage medium for storing computer readable instructions which, when executed, implement the operations of the key frame scheduling method of any of the above embodiments of the present application.
In addition, the embodiments of the present application further provide a computer program, including computer readable instructions; when the computer readable instructions run in a device, a processor in the device executes executable instructions for implementing the steps in the key frame scheduling method of any of the above embodiments of the present application.
In an optional implementation, the computer program is specifically a software product, for example, a Software Development Kit (SDK), and the like.
In one or more optional implementations, the embodiments of the present application further provide a computer program product for storing computer readable instructions which, when executed, cause a computer to perform the key frame scheduling method described in any of the above possible implementations.
The computer program product may be specifically implemented by means of hardware, software, or a combination thereof. In an optional example, the computer program product is specifically embodied as a computer storage medium; in another optional example, the computer program product is specifically embodied as a software product, such as an SDK.
The embodiments in this specification are all described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and for the same or similar parts among the embodiments, reference may be made to one another. Since the system embodiments basically correspond to the method embodiments, their description is relatively simple, and for related parts, reference may be made to the description of the method embodiments.
The methods and apparatuses of the present application may be implemented in many ways. For example, the methods and apparatuses of the present application may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps of the methods is for illustration only, and the steps of the methods of the present application are not limited to the order specifically described above, unless otherwise specified. In addition, in some embodiments, the present application may also be implemented as programs recorded in a recording medium, and these programs include machine readable instructions for implementing the methods according to the present application. Thus, the present application also covers a recording medium storing programs for executing the methods according to the present application.
The description of the present application is given for the sake of example and description, and is not exhaustive or intended to limit the present application to the disclosed form. Many modifications and variations are obvious to a person of ordinary skill in the art. The embodiments are selected and described to better illustrate the principles and practical applications of the present application, and to enable a person of ordinary skill in the art to understand the present application so as to design various embodiments with various modifications suited to particular uses.

Claims (16)

  1. A key frame scheduling method, comprising:
    performing feature extraction on a current frame through a first network layer of a neural network to obtain low-level features of the current frame;
    acquiring a scheduling probability value of the current frame according to low-level features of a previous key frame adjacent to the current frame and the low-level features of the current frame, wherein the low-level features of the previous key frame are obtained by the first network layer performing feature extraction on the previous key frame, and the scheduling probability value is the probability that the current frame is scheduled as a key frame;
    determining, according to the scheduling probability value of the current frame, whether the current frame is scheduled as a key frame;
    if it is determined that the current frame is scheduled as a key frame, determining the current frame as the current key frame, and performing feature extraction on the low-level features of the current key frame through a second network layer of the neural network to obtain high-level features of the current key frame, wherein, in the neural network, the network depth of the first network layer is shallower than the network depth of the second network layer.
  2. The method according to claim 1, further comprising:
    determining an initial key frame;
    performing feature extraction on the initial key frame through the first network layer to obtain and buffer low-level features of the initial key frame;
    performing feature extraction on the low-level features of the initial key frame through the second network layer to obtain high-level features of the initial key frame.
  3. The method according to claim 2, further comprising:
    performing semantic segmentation on the initial key frame, and outputting the semantic label of the initial key frame.
  4. The method according to any one of claims 1-3, wherein after it is determined that the current frame is scheduled as a key frame, the method further comprises:
    buffering the low-level features of the current key frame.
  5. The method according to any one of claims 1-4, wherein acquiring the scheduling probability value of the current frame according to the low-level features of the previous key frame adjacent to the current frame and the low-level features of the current frame comprises:
    splicing the low-level features of the previous key frame and the low-level features of the current frame to obtain a spliced feature;
    acquiring the scheduling probability value of the current frame based on the spliced feature through a key frame scheduling network.
  6. The method according to any one of claims 1-5, further comprising:
    performing semantic segmentation on the current key frame, and outputting the semantic label of the current key frame.
  7. A key frame scheduling apparatus, comprising:
    a first feature extraction unit, comprising a first network layer of a neural network, configured to perform feature extraction on a current frame to obtain low-level features of the current frame;
    a scheduling unit, configured to acquire a scheduling probability value of the current frame according to low-level features of a previous key frame adjacent to the current frame and the low-level features of the current frame, wherein the low-level features of the previous key frame are obtained by the first network layer performing feature extraction on the previous key frame, and the scheduling probability value is the probability that the current frame is scheduled as a key frame;
    a determining unit, configured to determine, according to the scheduling probability value of the current frame, whether the current frame is scheduled as a key frame;
    a second feature extraction unit, comprising a second network layer of the neural network, configured to, according to the determination result of the determining unit, if it is determined that the current frame is scheduled as a key frame, determine the current frame as the current key frame, and perform feature extraction on the low-level features of the current key frame to obtain high-level features of the current key frame, wherein, in the neural network, the network depth of the first network layer is shallower than the network depth of the second network layer.
  8. The apparatus according to claim 7, wherein the previous key frame comprises a predetermined initial key frame;
    the apparatus further comprises:
    a cache unit, configured to buffer low-level features of key frames, the key frames comprising the initial key frame.
  9. The apparatus according to claim 8, wherein the first feature extraction unit is further configured to buffer, in the cache unit, the low-level features of the current key frame according to the determination result of the determining unit.
  10. The apparatus according to any one of claims 7-9, wherein the scheduling unit comprises:
    a splicing sub-unit, configured to splice the low-level features of the previous key frame and the low-level features of the current frame to obtain a spliced feature;
    a key frame scheduling network, configured to acquire the scheduling probability value of the current frame based on the spliced feature.
  11. The apparatus according to any one of claims 7-10, further comprising:
    a semantic segmentation unit, configured to perform semantic segmentation on a key frame and output the semantic label of the key frame, the key frame comprising: the initial key frame, the previous key frame, or the current key frame.
  12. An electronic device, comprising: the key frame scheduling apparatus according to any one of claims 7-11.
  13. An electronic device, comprising:
    a processor and the key frame scheduling apparatus according to any one of claims 7-11;
    wherein, when the processor runs the key frame scheduling apparatus, the units in the key frame scheduling apparatus according to any one of claims 7-11 are run.
  14. An electronic device, comprising: a processor and a memory;
    wherein the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations of the steps in the key frame scheduling method according to any one of claims 1-6.
  15. A computer program, comprising computer readable code, wherein, when the computer readable code runs on a device, a processor in the device executes instructions for implementing the steps in the key frame scheduling method according to any one of claims 1-6.
  16. A computer readable medium for storing computer readable instructions, wherein, when the instructions are executed, the operations of the steps in the key frame scheduling method according to any one of claims 1-6 are implemented.
PCT/CN2018/123445 2017-12-27 2018-12-25 Key frame scheduling method and apparatus, electronic device, program and medium WO2019128979A1 (zh)

Priority Applications (6)

Application Number Priority Date Filing Date Title
KR1020207005376A KR102305023B1 (ko) 2017-12-27 2018-12-25 Key frame scheduling method and apparatus, electronic device, program and medium
MYPI2020000416A MY182985A (en) 2017-12-27 2018-12-25 Keyframe scheduling method and apparatus, electronic device, program and medium
EP18897706.0A EP3644221A4 (en) 2017-12-27 2018-12-25 KEY IMAGE PLANNING PROCESS AND APPARATUS, ELECTRONIC DEVICE, PROGRAM AND SUPPORT
US16/633,341 US11164004B2 (en) 2017-12-27 2018-12-25 Keyframe scheduling method and apparatus, electronic device, program and medium
JP2020519444A JP6932254B2 (ja) 2017-12-27 2018-12-25 Key frame scheduling method and apparatus, electronic device, program and medium
SG11202000578UA SG11202000578UA (en) 2017-12-27 2018-12-25 Keyframe scheduling method and apparatus, electronic device, program and medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711455838.X 2017-12-27
CN201711455838.XA Key frame scheduling method and apparatus, electronic device, program and medium

Publications (1)

Publication Number Publication Date
WO2019128979A1 true WO2019128979A1 (zh) 2019-07-04

Family

ID=62648208

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/123445 WO2019128979A1 (zh) Key frame scheduling method and apparatus, electronic device, program and medium

Country Status (8)

Country Link
US (1) US11164004B2 (zh)
EP (1) EP3644221A4 (zh)
JP (1) JP6932254B2 (zh)
KR (1) KR102305023B1 (zh)
CN (1) CN108229363A (zh)
MY (1) MY182985A (zh)
SG (1) SG11202000578UA (zh)
WO (1) WO2019128979A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229363A (zh) 2017-12-27 2018-06-29 北京市商汤科技开发有限公司 Key frame scheduling method and apparatus, electronic device, program and medium
JP7257756B2 (ja) * 2018-08-20 2023-04-14 キヤノン株式会社 Image identification apparatus, image identification method, learning apparatus, and neural network
CN111862030B (zh) 2020-07-15 2024-02-09 北京百度网讯科技有限公司 Face synthesis image detection method and apparatus, electronic device, and storage medium
DE102021204846B4 (de) 2021-05-12 2023-07-06 Robert Bosch Gesellschaft mit beschränkter Haftung Method for controlling a robot device
CN114222124B (zh) * 2021-11-29 2022-09-23 广州波视信息科技股份有限公司 An encoding and decoding method and device
CN115908280B (zh) * 2022-11-03 2023-07-18 广东科力新材料有限公司 Method and system for determining the performance of a PVC calcium-zinc stabilizer based on data processing

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102799684A (zh) * 2012-07-27 2012-11-28 成都索贝数码科技股份有限公司 A method for video and audio file cataloguing and indexing, metadata storage indexing, and searching
CN105677735A (zh) * 2015-12-30 2016-06-15 腾讯科技(深圳)有限公司 A video search method and apparatus
US20160358628A1 (en) * 2015-06-05 2016-12-08 Apple Inc. Hierarchical segmentation and quality measurement for video editing
US20170011264A1 (en) * 2015-07-07 2017-01-12 Disney Enterprises, Inc. Systems and methods for automatic key frame extraction and storyboard interface generation for video
CN108229363A (zh) * 2017-12-27 2018-06-29 北京市商汤科技开发有限公司 Key frame scheduling method and apparatus, electronic device, program and medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003134450A (ja) * 2001-10-24 2003-05-09 Ricoh Co Ltd Representative frame image detection apparatus and program therefor
JP4546157B2 (ja) * 2004-06-03 2010-09-15 キヤノン株式会社 Information processing method, information processing apparatus, and imaging apparatus
KR20160083127A (ko) * 2013-11-30 2016-07-11 베이징 센스타임 테크놀로지 디벨롭먼트 컴퍼니 리미티드 Face image recognition method and system
US10387773B2 (en) * 2014-10-27 2019-08-20 Ebay Inc. Hierarchical deep convolutional neural network for image classification
US20160378863A1 (en) 2015-06-24 2016-12-29 Google Inc. Selecting representative video frames for videos
CN105095862B (zh) * 2015-07-10 2018-05-29 南开大学 A human action recognition method based on deep convolutional conditional random fields
WO2017166019A1 (en) * 2016-03-28 2017-10-05 Xiaogang Wang Method and system for pose estimation
CN107484017B (zh) * 2017-07-25 2020-05-26 天津大学 Supervised video summary generation method based on an attention model
US11577388B2 (en) * 2019-06-27 2023-02-14 Intel Corporation Automatic robot perception programming by imitation learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102799684A (zh) * 2012-07-27 2012-11-28 成都索贝数码科技股份有限公司 A method for video and audio file cataloguing and indexing, metadata storage indexing, and searching
US20160358628A1 (en) * 2015-06-05 2016-12-08 Apple Inc. Hierarchical segmentation and quality measurement for video editing
US20170011264A1 (en) * 2015-07-07 2017-01-12 Disney Enterprises, Inc. Systems and methods for automatic key frame extraction and storyboard interface generation for video
CN105677735A (zh) * 2015-12-30 2016-06-15 腾讯科技(深圳)有限公司 A video search method and apparatus
CN108229363A (zh) * 2017-12-27 2018-06-29 北京市商汤科技开发有限公司 Key frame scheduling method and apparatus, electronic device, program and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3644221A4

Also Published As

Publication number Publication date
MY182985A (en) 2021-02-05
KR20200102409A (ko) 2020-08-31
EP3644221A1 (en) 2020-04-29
CN108229363A (zh) 2018-06-29
EP3644221A4 (en) 2020-10-28
KR102305023B1 (ko) 2021-09-24
JP6932254B2 (ja) 2021-09-08
US11164004B2 (en) 2021-11-02
JP2020536332A (ja) 2020-12-10
SG11202000578UA (en) 2020-02-27
US20200394414A1 (en) 2020-12-17

Similar Documents

Publication Publication Date Title
WO2019128979A1 (zh) Key frame scheduling method and apparatus, electronic device, program and medium
US10909380B2 (en) Methods and apparatuses for recognizing video and training, electronic device and medium
WO2018121737A1 (zh) Key point prediction, network training, and image processing methods and apparatuses, and electronic device
CN108229322B (zh) Video-based face recognition method and apparatus, electronic device, and storage medium
US11557085B2 (en) Neural network processing for multi-object 3D modeling
US20210241032A1 (en) Training Text Recognition Systems
CN108235116B (zh) Feature propagation method and apparatus, electronic device, and medium
US11062453B2 (en) Method and system for scene parsing and storage medium
CN108229341B (zh) Classification method and apparatus, electronic device, and computer storage medium
WO2018121777A1 (zh) Face detection method and apparatus, and electronic device
US20190156144A1 (en) Method and apparatus for detecting object, method and apparatus for training neural network, and electronic device
US11270158B2 (en) Instance segmentation methods and apparatuses, electronic devices, programs, and media
WO2019129032A1 (zh) Remote sensing image recognition method and apparatus, storage medium, and electronic device
US11475636B2 (en) Augmented reality and virtual reality engine for virtual desktop infrastucture
WO2022161302A1 (zh) Action recognition method, apparatus, device, storage medium, and computer program product
CN113569740B (zh) Video recognition model training method and apparatus, and video recognition method and apparatus
CN115699096A (zh) Tracking augmented reality devices
CN113139463B (zh) Method, apparatus, device, medium, and program product for training a model
CN114093006A (zh) Training method, apparatus, device, and storage medium for a live face detection model
CN111914850B (zh) Picture feature extraction method, apparatus, server, and medium
CN112348615A (zh) Method and apparatus for reviewing information
CN111311604A (zh) Method and apparatus for segmenting an image
CN114202728B (zh) A video detection method, apparatus, electronic device, and medium
CN113343979B (zh) Method, apparatus, device, medium, and program product for training a model
CN116311271B (zh) Text image processing method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18897706

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2018897706

Country of ref document: EP

Effective date: 20200122

ENP Entry into the national phase

Ref document number: 2020519444

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE