JP7228608B2

JP7228608B2 - Video frame processing method and processing device, electronic device, storage medium and computer program

Info

Publication number: JP7228608B2
Application number: JP2021028626A
Authority: JP
Inventors: Lin Tianwei; Li Xin; Li Fu; He Dongliang; Sun Hao; Zhang Henan
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-04-22
Filing date: 2021-02-25
Publication date: 2023-02-24
Anticipated expiration: 2041-02-25
Also published as: CN111524166A; JP2021174530A; US11748895B2; EP3901909A1; CN111524166B; KR102607137B1; KR20210131225A; US20210334579A1

Description

TECHNICAL FIELD

Embodiments of the present application relate to the field of computer technology, specifically to computer vision, and in particular to a video frame processing method and apparatus, an electronic device, a storage medium, and a computer program.

With the development of Internet technology, various Internet platforms such as video websites and live-streaming platforms have emerged one after another. To diversify how video is presented, frames can undergo various kinds of processing, such as special effects and style transfer.

Consecutive frames in a video are generally a difficult case for image processing. The processing result of each video frame may contain errors, and when those errors are large, the processed results of consecutive frames may exhibit jitter.

A video frame processing method, a processing apparatus, an electronic device, a storage medium, and a computer program are provided.

In a first aspect, a video frame processing method is provided. The method includes: transforming a feature map of a preceding frame using an optical flow generated from adjacent preceding and subsequent frames of a video, to obtain a transformed feature map; determining a weight of the transformed feature map based on an error of the optical flow, and obtaining a fused feature map based on a weighted result of the features of the transformed feature map and the feature map of the subsequent frame; and updating the feature map of the subsequent frame, the updated feature map being the fused feature map.

In a second aspect, a video frame processing apparatus is provided. The apparatus includes: a transformation unit configured to transform a feature map of a preceding frame using an optical flow generated from adjacent preceding and subsequent frames of a video, to obtain a transformed feature map; a fusion unit configured to determine a weight of the transformed feature map based on an error of the optical flow, and to obtain a fused feature map based on a weighted result of the features of the transformed feature map and the feature map of the subsequent frame; and an update unit configured to update the feature map of the subsequent frame, the updated feature map being the fused feature map.

In a third aspect, an electronic device is provided. The device includes one or more processors and a storage device storing one or more programs. When executed by the one or more processors, the one or more programs cause the one or more processors to implement the method according to any embodiment of the video frame processing method.

In a fourth aspect, a computer-readable medium storing a computer program is provided. When executed by a processor, the computer program implements the method according to any embodiment of the video frame processing method.

In a fifth aspect, embodiments of the present application provide a computer program that, when executed by a processor, implements the method according to any embodiment of the video frame processing method.

The solution of the present application uses the optical-flow transformation of the preceding frame to eliminate positional offsets of objects between adjacent video frames. This effectively prevents jitter between adjacent frames after image processing. In addition, determining the weight of the transformed feature map based on the optical flow error helps prevent errors in the optical flow from degrading the accuracy of the fused features.

Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read in conjunction with the drawings.

FIG. 1 shows an exemplary system architecture to which some embodiments of the present application can be applied.
FIG. 2 is a flowchart of an embodiment of the video frame processing method of the present application.
FIG. 3 is a flowchart of an application scenario of the video frame processing method of the present application.
FIG. 4 is a flowchart of an embodiment of determining the weight of the transformed feature map in the video frame processing method of the present application.
FIG. 5 is a schematic structural diagram of an embodiment of the video frame processing apparatus of the present application.
FIG. 6 is a block diagram of an electronic device for implementing the video frame processing method of embodiments of the present application.

Exemplary embodiments of the present application are described below with reference to the drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those skilled in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.

It should be noted that, where no conflict arises, the embodiments of the present application and the features in them may be combined with each other. The present application is described in detail below with reference to the drawings and in combination with the embodiments.

FIG. 1 shows an exemplary system architecture 100 to which embodiments of the video frame processing method or the video frame processing apparatus of the present application can be applied.

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.

Users may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as video applications, live-streaming applications, instant messaging tools, email clients, and social platform software.

The terminal devices 101, 102, 103 may be hardware or software. As hardware, they may be various electronic devices with displays, including but not limited to smartphones, tablets, e-book readers, laptop computers, and desktop computers. As software, they may be installed in the electronic devices listed above. In that case they may be implemented as multiple pieces of software or software modules (for example, to provide a distributed service) or as a single piece of software or software module. No specific limitation is imposed here.

The server 105 may be a server providing various services, such as a back-end server supporting the terminal devices 101, 102, 103. The back-end server may analyze or otherwise process received data, such as adjacent video frames, and return the processing results (for example, the updated feature map) to the terminal devices.

It should be noted that the video frame processing method provided by embodiments of the present application may be executed by the server 105 or by the terminal devices 101, 102, 103. Accordingly, the video frame processing apparatus may be provided in the server 105 or in the terminal devices 101, 102, 103.

It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. Any number of terminal devices, networks, and servers may be provided as the implementation requires.

FIG. 2 shows a flow 200 of an embodiment of the video frame processing method of the present application. The video frame processing method includes steps 201 to 203.

Step 201: transform the feature map of the preceding frame using an optical flow generated from adjacent preceding and subsequent frames of the video, to obtain a transformed feature map.

In this embodiment, the execution entity of the video frame processing method (for example, the server or terminal device shown in FIG. 1) uses an optical flow generated from adjacent preceding and subsequent frames of the video. The preceding frame is the frame at the earlier time (e.g., the 5th frame), and the subsequent frame is the frame at the later time (e.g., the 6th frame, which is the current frame). The execution entity uses this flow to transform the acquired feature map of the preceding frame into a transformed feature map. The transformed feature map is similar to the feature map of the subsequent frame: their similarity exceeds a preset threshold. In practice, this transformation may be called warping. Each pixel is displaced by an offset, and that offset is given by the determined optical flow.
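The following is a minimal sketch of the warping step, not the patent's implementation. It rests on several assumptions the patent does not specify:

- the feature map is single-channel and stored as nested lists;
- the flow gives a per-pixel `(dx, dy)` pair on the subsequent frame's grid, pointing back into the preceding frame (backward warping);
- sampling is nearest-neighbour;
- out-of-range samples are filled with zero.

The name `warp` is hypothetical.

```python
def warp(feature, flow):
    """Backward-warp a 2-D map with a per-pixel optical flow.

    feature -- preceding-frame map, nested lists of shape (h, w)
    flow    -- (h, w) grid of (dx, dy); output pixel (x, y) is read from
               the source at (x + dx, y + dy), rounded to the nearest pixel
    """
    h, w = len(feature), len(feature[0])
    out = [[0.0] * w for _ in range(h)]  # zero fill where no source exists
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = round(x + dx), round(y + dy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = feature[sy][sx]
    return out
```

For example, a uniform flow of `(1, 0)` shifts content one pixel to the left, and the rightmost column, which has no source, becomes zero. Practical systems usually use bilinear sampling on multi-channel tensors instead.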

In practice, the execution entity may obtain the feature maps of the preceding frame and the subsequent frame, that is, of two adjacent frames of the video. Specifically, it may retrieve both feature maps directly, either locally or from another electronic device. Alternatively, it may obtain the subsequent and preceding frames themselves and run detection on them to obtain their feature maps. The optical flow may be a dense optical flow or a sparse optical flow.

Specifically, the optical flow may be obtained in various ways. For example, the execution entity may retrieve it directly, either locally or from another electronic device. Alternatively, it may obtain the preceding and subsequent frames and generate the optical flow itself. For instance, it may generate an initial optical flow between the two frames and apply preset processing to that initial flow to obtain the optical flow.

Step 202: determine a weight of the transformed feature map based on the error of the optical flow, and obtain a fused feature map based on the weighted result of the features of the transformed feature map and the feature map of the subsequent frame.

In this embodiment, the execution entity may determine the weight of the transformed feature map and weight the transformed feature map and the feature map of the subsequent frame to obtain a weighted result. It then obtains the fused feature map from that weighted result.

The optical flow error is the deviation between the generated optical flow and the true optical flow. To estimate it, the execution entity may transform the downscaled preceding frame with the optical flow and compare the result with the downscaled subsequent frame. For example, it may take the difference between them, or the absolute value of that difference, as the optical flow error.
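The error estimate described above can be sketched as follows, again not as the patent's implementation. The sketch assumes:

- grayscale frames already downscaled to the flow's resolution, stored as nested lists;
- the same backward, nearest-neighbour warp assumed for step 201;
- the absolute per-pixel difference as the error.

The function names are hypothetical.

```python
def warp(img, flow):
    """Backward nearest-neighbour warp; out-of-range samples become 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = round(x + dx), round(y + dy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def optical_flow_error(prev_small, next_small, flow):
    """Per-pixel |warp(prev) - next|: large where the flow explains the motion poorly."""
    warped = warp(prev_small, flow)
    return [
        [abs(a - b) for a, b in zip(row_w, row_n)]
        for row_w, row_n in zip(warped, next_small)
    ]
```

A pixel whose warped value matches the subsequent frame gets zero error. A mismatch, including pixels with no source in the preceding frame, gets a large error.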

In practice, the execution entity may determine the weight of the transformed feature map from the optical flow error in various ways. For example, when the optical flow error is small (below a preset error threshold), it may set the weight to a preset value, such as the largest of the candidate weights for the transformed feature map. Alternatively, it may use a correspondence between optical flow error and transformed-feature-map weight, such as a model or a lookup table, to obtain the weight for the determined error.
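The lookup-table option above can be sketched as below. The table's bounds and weights are invented for illustration, not values from the patent, and `ERROR_TO_WEIGHT` and `weight_for_error` are hypothetical names. The single-threshold variant in the text corresponds to a two-row table `[(threshold, max_weight), (math.inf, low_weight)]`; the low weight used above the threshold is an assumption, since the text does not state it.

```python
import math

# Hypothetical correspondence table: (exclusive upper bound on error, weight).
# Rows must be sorted by bound; the final row catches every remaining error.
ERROR_TO_WEIGHT = [
    (0.05, 1.0),
    (0.10, 0.75),
    (0.20, 0.5),
    (math.inf, 0.0),
]


def weight_for_error(error, table=ERROR_TO_WEIGHT):
    """Return the transformed-feature-map weight for one optical-flow error."""
    for bound, weight in table:
        if error < bound:
            return weight
    return 0.0  # reached only for NaN or errors not below the last bound


def weight_map(error_map, table=ERROR_TO_WEIGHT):
    """Apply the lookup to every pixel of a 2-D error map."""
    return [[weight_for_error(e, table) for e in row] for row in error_map]
```

A table keeps the mapping easy to tune. A learned model, the other option the text mentions, could replace `weight_for_error` without changing its callers.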

In practice, the execution entity may obtain the fused feature map from the weighted result in various ways. For example, it may compute the weighted average of the features of the feature maps and use that average as the fused feature map. Alternatively, it may use the weighted result directly as the fused feature map, or apply preset processing to it, such as multiplying it by a preset coefficient. The weights of the feature maps involved in the weighting may sum to 1.
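The weighted-average option can be sketched per pixel as follows. Feature maps are single-channel nested lists with one weight map per feature map; these are assumptions. Dividing by the total weight is a choice made here so that weights need not sum to 1. When they do sum to 1, the result equals the plain weighted sum. `weighted_average` is a hypothetical name.

```python
def weighted_average(maps, weights):
    """Per-pixel weighted average of several equally sized 2-D maps.

    maps    -- list of 2-D maps, e.g. [transformed_map, subsequent_map]
    weights -- list of per-pixel weight maps, one per entry in `maps`
    """
    h, w = len(maps[0]), len(maps[0][0])
    fused = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = sum(wm[y][x] for wm in weights)
            acc = sum(m[y][x] * wm[y][x] for m, wm in zip(maps, weights))
            # Normalise so the weights need not sum to 1; 0 if all weights are 0.
            fused[y][x] = acc / total if total else 0.0
    return fused
```

For instance, fusing the values 2.0 and 6.0 with weights 1 and 3 gives (2 + 18) / 4 = 5.0.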

Step 203: update the feature map of the subsequent frame, where the updated feature map is the fused feature map.

In this embodiment, the execution entity may update the feature map of the subsequent frame to the fused feature map. It may then use the fused feature map as the subsequent frame's feature map in downstream image processing. For example, it may feed the map into a deep neural network, such as a fully connected layer of a convolutional neural network or the generator of a generative adversarial network.

The method provided by the above embodiment uses the optical-flow transformation of the preceding frame to eliminate positional offsets of objects between adjacent frames. This avoids jitter between adjacent frames after image processing. In addition, determining the weight of the transformed feature map based on the optical flow error helps prevent errors in the optical flow from reducing the accuracy of the fused features.

In some optional implementations of this embodiment, the method may further include weighting the features of the transformed feature map and of the subsequent frame's feature map according to their respective weights. This yields the weighted result of the features of the two maps. The larger the weight of the transformed feature map, the smaller the weight of the feature map of the subsequent frame.

In these optional implementations, the method may further include the execution entity weighting the transformed feature map and the feature map of the subsequent frame. Specifically, the weight of the transformed feature map and the weight of the subsequent frame's feature map may constrain each other.

Optionally, the weight of the transformed feature map and the weight of the subsequent frame's feature map may sum to a preset value. More generally, the weights of all feature maps involved in the weighting may sum to a preset value, such as 1. The weight of the subsequent frame's feature map may then equal the preset value minus the weight of the transformed feature map. If the preset value is 1 and the weight of the transformed feature map is 1, for example, the weight of the subsequent frame's feature map is 0. In practice, the weights may take only the values 1 and 0.
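The complementary-weight constraint can be sketched as below, assuming single-channel nested-list maps. The subsequent frame's weight at each pixel is derived as `total - w`, so it shrinks as the transformed map's weight grows. With weights restricted to 1 and 0, the fusion reduces to choosing, per pixel, either the warped feature or the current one. `complementary_fuse` is a hypothetical name.

```python
def complementary_fuse(transformed, subsequent, w_transformed, total=1.0):
    """Fuse two 2-D maps with complementary per-pixel weights.

    The subsequent-frame weight at each pixel is `total - w`, so a larger
    transformed-map weight always implies a smaller subsequent-frame weight.
    """
    return [
        [w * a + (total - w) * b for a, b, w in zip(row_a, row_b, row_w)]
        for row_a, row_b, row_w in zip(transformed, subsequent, w_transformed)
    ]
```

With binary weights `[[1.0, 0.0]]`, the first pixel keeps the warped value from the preceding frame and the second keeps the current frame's value.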

By constraining the relationship between the weights, these implementations can obtain a more accurate fused feature map.

In some optional implementations of this embodiment, step 201 may include generating the feature maps of the subsequent and preceding frames of the video using a generative adversarial network. The method may further include processing the updated feature map with the generative adversarial network to generate a target-domain image corresponding to the subsequent frame.

In these optional implementations, the execution entity may use a generative adversarial network to generate the feature maps of the subsequent and preceding frames. After updating the subsequent frame's feature map to the fused feature map, it may process the updated feature map with the generative adversarial network to generate an image in the target domain. Here, the generative adversarial network is used to generate target-domain images.
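Putting steps 201 to 203 around a generator might look like the sketch below. The patent does not describe the generator's architecture, so the sketch makes several assumptions:

- The generator is split into a hypothetical `encoder` (frame to feature map) and `decoder` (feature map to target-domain image).
- The encoder's output has the same spatial size as the downscaled frames on which the flow is computed.
- Weights are binary. A pixel takes the warped feature only when its flow error is below a hypothetical `threshold` and the pixel is covered by the warp (the all-ones check described later for step 403).

`translate_next_frame` is also a hypothetical name.

```python
def warp(img, flow):
    """Backward nearest-neighbour warp; out-of-range samples become 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = round(x + dx), round(y + dy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def translate_next_frame(prev_small, next_small, flow, encoder, decoder, threshold=0.1):
    """Encode both frames, fuse the warped previous features, decode the result."""
    prev_feat = encoder(prev_small)              # feature map of the preceding frame
    next_feat = encoder(next_small)              # feature map of the subsequent frame
    warped_feat = warp(prev_feat, flow)          # step 201
    warped_prev = warp(prev_small, flow)         # used to measure the flow error
    covered = warp([[1.0] * len(r) for r in prev_small], flow)
    fused = []
    for y, row in enumerate(next_feat):
        fused_row = []
        for x, cur in enumerate(row):
            err = abs(warped_prev[y][x] - next_small[y][x])
            w = 1.0 if (covered[y][x] >= 1.0 and err < threshold) else 0.0
            fused_row.append(w * warped_feat[y][x] + (1.0 - w) * cur)  # step 202
        fused.append(fused_row)
    return decoder(fused)                        # step 203, then generator output
```

With identity stand-ins for `encoder` and `decoder`, a well-matched pixel reuses the previous frame's warped feature. A mismatched or uncovered pixel keeps the current frame's feature.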

These implementations avoid object jitter across consecutive video frames processed by a generative adversarial network. Because a generative adversarial network cannot process multiple video frames jointly, its results for adjacent frames can differ and cause frame jitter. These implementations eliminate that problem and improve the stability of the video picture.

Referring next to FIG. 3, FIG. 3 is a schematic diagram of an application scenario of the video frame processing method of this embodiment. In this scenario, the execution entity 301 uses an optical flow 302 generated from adjacent preceding and subsequent frames of a video (e.g., the 7th and 8th frames). With it, the entity transforms the feature map of the preceding frame to obtain a transformed feature map 303, whose size is the target size of 32×32. The execution entity 301 then determines the weight of the transformed feature map 303 based on the error of the optical flow 302. It obtains a fused feature map 304 from the weighted result of the features of the transformed feature map 303 and the subsequent frame's feature map. Finally, the execution entity 301 updates the feature map 305 of the subsequent frame, the updated feature map being the fused feature map 304.

In some optional implementations of this embodiment, the optical flow may be a dense optical flow. The video frame processing method of the present application may then further include three steps. First, downscale the preceding and subsequent frames to the size of the feature map. Next, determine the dense optical flow between the downscaled preceding frame and the downscaled subsequent frame. Finally, use that dense optical flow as the optical flow generated from the adjacent preceding and subsequent frames of the video.

In these optional implementations, the execution entity of the video frame processing method (for example, the server or terminal device shown in FIG. 1) may downscale the preceding and subsequent frames of the video to the size of the feature maps. The acquired feature maps of the subsequent and preceding frames have the same size. The execution entity may then determine the optical flow between the downscaled preceding frame and the downscaled subsequent frame. Both the preceding frame's feature map and the transformed feature map have the size of the downscaled frames. For example, if the preceding and subsequent frames are the 9th and 10th frames of the video, the execution entity may downscale both to 32×32, the size of the feature map.
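Downscaling a frame to the feature-map size (e.g., 32×32) can be sketched with block averaging. The patent does not name a resampling method. This sketch assumes grayscale nested-list frames whose dimensions are integer multiples of the target size, and `downscale` is a hypothetical name.

```python
def downscale(frame, out_h, out_w):
    """Average-pool a 2-D frame down to (out_h, out_w).

    Assumes the frame's height and width are integer multiples of the target.
    """
    h, w = len(frame), len(frame[0])
    if h % out_h or w % out_w:
        raise ValueError("frame size must be an integer multiple of the target size")
    fy, fx = h // out_h, w // out_w
    out = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            block = [frame[oy * fy + dy][ox * fx + dx]
                     for dy in range(fy) for dx in range(fx)]
            row.append(sum(block) / len(block))  # mean of the fy-by-fx block
        out.append(row)
    return out
```

The dense flow would then be estimated between the two downscaled frames, so that it lines up pixel for pixel with the feature maps. The patent names no flow estimator. A dense method such as the Farneback algorithm in OpenCV is one common choice.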

By fusing features pixel by pixel using the dense optical flow, these implementations can improve the accuracy of the fused feature map.

Further referring to FIG. 4, a flow 400 of an embodiment of determining the weight of the transformed feature map in the video frame processing method is shown. The flow 400 includes steps 401 to 404.

Step 401: transform the downscaled preceding frame using the dense optical flow to obtain a transformed preceding frame.

In this embodiment, the optical flow is a dense optical flow. The execution entity of the video frame processing method (for example, the server or terminal device shown in FIG. 1) may use the dense optical flow to transform the downscaled preceding frame and take the result as the transformed preceding frame. The transformed preceding frame is similar to the downscaled subsequent frame. Dense optical flow is also referred to as compact optical flow. It gives, for each pixel, the positional offset between the downscaled subsequent frame and the downscaled preceding frame.

Step 402: for the pixel at each coordinate, determine the dense optical flow error at that pixel from the difference between the pixel's values in the transformed preceding frame and in the downscaled subsequent frame.

In this embodiment, the execution entity may evaluate the dense optical flow pixel by pixel. For each pixel, it determines the dense optical flow error at that coordinate from the difference between the pixel values at that coordinate in the transformed preceding frame and in the downscaled subsequent frame.

In practice, the execution entity may derive the dense optical flow error from this difference in various ways. For example, it may use the absolute value of the difference as the error, or use the difference directly. Alternatively, it may apply preset processing to the difference, such as multiplying it by a preset coefficient or feeding it into a preset function, and use the result as the dense optical flow error.
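The three variants just listed can be sketched in one helper. It assumes the transformed preceding frame from step 401 and the downscaled subsequent frame are grayscale nested lists of equal size. The difference is taken as warped minus subsequent, a sign convention chosen here. The `"scaled"` mode stands in for "preset processing" with a hypothetical coefficient.

```python
def dense_flow_error(warped_prev, next_small, mode="abs", coeff=1.0):
    """Per-pixel dense-optical-flow error from the frame difference.

    mode -- "abs":    |warped - next|
            "signed": warped - next
            "scaled": coeff * |warped - next|  (one example of preset processing)
    """
    def post(d):
        if mode == "abs":
            return abs(d)
        if mode == "signed":
            return d
        if mode == "scaled":
            return coeff * abs(d)
        raise ValueError(f"unknown mode: {mode}")

    return [
        [post(a - b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(warped_prev, next_small)
    ]
```

The absolute value is usually the more convenient choice for weighting, because a threshold then treats over- and under-estimates alike.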

Step 403: transform a specific image with the dense optical flow. For the pixel at that coordinate in the resulting image, determine whether the specific image contains a pixel showing the same object, to obtain a determination result. The size of the specific image is the target size.

In this embodiment, the execution entity may determine whether the image transformed with the dense optical flow contains the same object as the pixel at that coordinate before transformation. Objects may change position between adjacent frames of a video, and some content in the subsequent frame may be new content that did not appear in the preceding frame. This step identifies the content shared by adjacent frames. The specific image may be any of various images, such as the downscaled preceding frame.

In some optional implementations of this embodiment, step 403 may include the following: obtaining a specific image in which every pixel has the same preset pixel value; transforming the specific image with the dense optical flow to obtain a transformed specific image; for the pixel at each coordinate in the transformed specific image, determining whether its pixel value is greater than or equal to the preset pixel value; if it is, determining that a pixel containing the same object exists; and if it is less than the preset pixel value, determining that no pixel containing the same object exists.

In these optional implementations, the execution entity obtains a specific image in which every pixel point has the same predetermined pixel value, for example 1 or 2, and transforms it with the dense optical flow to obtain a transformed specific image. Relative to the specific image, objects in the transformed specific image may be offset in position. If the pixel value at a coordinate of the transformed image is greater than or equal to the predetermined pixel value, the object at that pixel point exists not only in the transformed specific image but also in the specific image. If the pixel value at that coordinate is less than the predetermined pixel value, the object exists only in the transformed specific image and not in the specific image.

These implementations obtain the determination result from a specific image whose pixels all share a single value, which simplifies the computation and improves the processing efficiency of the solution.
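The same-object check of step 403 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the flow is assumed to be a backward flow of shape (H, W, 2) whose entry (dx, dy) at pixel (x, y) gives the sampling location in the image being transformed, sampling is nearest-neighbour, and samples falling outside the image are filled with 0. The helper names `warp_backward` and `same_object_mask` are hypothetical.

```python
import numpy as np


def warp_backward(src, flow, fill=0.0):
    """Nearest-neighbour backward warp of `src` by a dense flow field.

    flow[y, x] = (dx, dy) means output pixel (x, y) is sampled from
    src at (x + dx, y + dy). Samples outside `src` are set to `fill`.
    """
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    sx = np.rint(xs + flow[..., 0]).astype(int)
    sy = np.rint(ys + flow[..., 1]).astype(int)
    valid = (sx >= 0) & (sx < src.shape[1]) & (sy >= 0) & (sy < src.shape[0])
    out = np.full((h, w) + src.shape[2:], fill, dtype=float)
    out[valid] = src[sy[valid], sx[valid]]
    return out


def same_object_mask(flow, value=1.0):
    """True where the transformed pixel shows content present in both frames.

    `value` is the predetermined pixel value of the specific image; it must
    be larger than the fill value (0) used for out-of-image samples.
    """
    h, w = flow.shape[:2]
    specific = np.full((h, w), value, dtype=float)  # every pixel = value
    transformed = warp_backward(specific, flow, fill=0.0)
    return transformed >= value


# Every output pixel samples one column to the right, so the last
# column has no source content and is reported as new content.
flow = np.zeros((3, 4, 2))
flow[..., 0] = 1.0
mask = same_object_mask(flow)
```

Using a constant image means the check needs only one extra warp and one comparison per pixel, which is the efficiency argument made above.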

Step 404: determine the weight of the pixel point at that coordinate in the transformed feature map according to the error of the dense optical flow and the determination result.

In this embodiment, the execution entity can determine the weight of the pixel point at that coordinate in the transformed feature map based on the dense optical flow error and the determination result, and it can do so in various ways. For example, it can obtain a correspondence between the error, the determination result and the weight (for example, a correspondence table or a model) and take the weight that corresponds to the given error and determination result.

This embodiment uses dense optical flow to fuse features pixel by pixel, which improves the suppression of screen jitter. Moreover, because the weight of the preceding frame's transformed feature map is determined from both the dense optical flow error and the determination result, the fused feature map is more accurate.

In some optional implementations of this embodiment, step 404 may include: in response to determining that the dense optical flow error is less than a predetermined error threshold and that the determination result is that a pixel point containing the same object exists, determining that the weight of the pixel point at that coordinate in the transformed feature map is a first candidate weight, where the larger the pixel value of the pixel point at that coordinate in the reduced subsequent frame, the larger the predetermined error threshold; and in response to determining that the dense optical flow error is greater than or equal to the predetermined error threshold and/or that the determination result is that no pixel point containing the same object exists, determining that the weight is a second candidate weight, where the first candidate weight is greater than the second candidate weight.

In these optional implementations, the weight of the pixel point at that coordinate in the transformed feature map is the first candidate weight only when both conditions hold: the dense optical flow error is less than the predetermined error threshold, and the determination result is that a pixel point containing the same object exists. The first candidate weight is relatively large, for example 1, and the second candidate weight is relatively small, for example 0.

In practice, the predetermined error threshold may depend on the pixel value at that coordinate in the reduced subsequent frame: the larger the pixel value, the larger the threshold. For example, if that pixel value is A, the threshold may be a·A + b, where a is a predetermined coefficient of A, for example between 0 and 1, and b is a predetermined constant greater than 0.

These implementations reuse the transformed processing result of the preceding frame only when the dense optical flow error is small and the content of the pixel point exists in both the preceding and the subsequent frame. This avoids positional offsets of features in the fused feature map caused by excessive optical flow error, and it also avoids screen errors caused by new content in the subsequent frame being replaced with content from the preceding frame during fusion, so the content shown on screen stays correct.
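The weight rule described above can be sketched as follows, using the example values 1 and 0 for the first and second candidate weights and the threshold a·A + b. The coefficients a = 0.5 and b = 10 are hypothetical placeholders chosen only to satisfy 0 ≤ a ≤ 1 and b > 0, and the function name is illustrative.

```python
import numpy as np


def transformed_feature_weight(error, same_object, next_small,
                               a=0.5, b=10.0, first=1.0, second=0.0):
    """Per-pixel weight of the transformed feature map (step 404 sketch).

    error:       dense optical flow error at each coordinate, shape (H, W)
    same_object: boolean determination result of step 403, shape (H, W)
    next_small:  pixel value A of the reduced subsequent frame, shape (H, W)
    a, b:        hypothetical threshold parameters, 0 <= a <= 1 and b > 0
    """
    threshold = a * next_small + b          # brighter pixels tolerate larger error
    keep = (error < threshold) & same_object
    return np.where(keep, first, second)    # first candidate weight > second


error = np.array([[1.0, 50.0, 1.0]])
same = np.array([[True, True, False]])
A = np.full((1, 3), 20.0)                   # threshold = 0.5 * 20 + 10 = 20
weights = transformed_feature_weight(error, same, A)
```

Only the first pixel passes both tests; the second fails the error threshold and the third is new content, so both fall back to the second candidate weight.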

With further reference to FIG. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of a video frame processing apparatus. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2 and, in addition to the features described below, may include features or effects that are the same as or correspond to those of that method embodiment. The apparatus can be applied in various electronic devices.

As shown in FIG. 5, the video frame processing apparatus 500 of this embodiment includes a transform unit 501, a fusion unit 502 and an update unit 503. The transform unit 501 is configured to transform the feature map of a preceding frame using an optical flow generated from adjacent preceding and subsequent frames of a video, obtaining a transformed feature map. The fusion unit 502 is configured to determine the weight of the transformed feature map based on the error of the optical flow, and to obtain a fused feature map from the weighted combination of the features of the transformed feature map and the feature map of the subsequent frame. The update unit 503 is configured to update the feature map of the subsequent frame so that the updated feature map is the fused feature map.

In this embodiment, for the specific processing of the transform unit 501, the fusion unit 502 and the update unit 503 of the video frame processing apparatus 500 and the resulting technical effects, refer to the descriptions of steps 201, 202 and 203 in the embodiment corresponding to FIG. 2; they are not repeated here.
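The three units of apparatus 500 can be sketched as plain functions, assuming channels-last feature maps of shape (H, W, C), a per-pixel weight map of shape (H, W) produced as in step 404, and the same nearest-neighbour backward warp as a stand-in for the transformation. Giving the subsequent frame's map the weight 1 - w is an assumed complementary scheme (the larger one weight, the smaller the other); all function names are hypothetical.

```python
import numpy as np


def warp_backward(src, flow, fill=0.0):
    """Nearest-neighbour backward warp; flow[y, x] = (dx, dy) samples src at (x + dx, y + dy)."""
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    sx = np.rint(xs + flow[..., 0]).astype(int)
    sy = np.rint(ys + flow[..., 1]).astype(int)
    valid = (sx >= 0) & (sx < src.shape[1]) & (sy >= 0) & (sy < src.shape[0])
    out = np.full((h, w) + src.shape[2:], fill, dtype=float)
    out[valid] = src[sy[valid], sx[valid]]
    return out


def transform_unit(prev_feat, flow):
    """Transform unit 501: align the preceding frame's feature map to the subsequent frame."""
    return warp_backward(prev_feat, flow)


def fusion_unit(transformed_feat, next_feat, weight):
    """Fusion unit 502: per-pixel weighted combination of the two feature maps."""
    w = weight[..., np.newaxis]             # broadcast the (H, W) weight over channels
    return w * transformed_feat + (1.0 - w) * next_feat


def update_unit(fused_feat):
    """Update unit 503: the fused map becomes the subsequent frame's feature map."""
    return fused_feat


def process_frame_pair(prev_feat, next_feat, flow, weight):
    transformed = transform_unit(prev_feat, flow)
    fused = fusion_unit(transformed, next_feat, weight)
    return update_unit(fused)


prev_feat = np.ones((2, 3, 4))
next_feat = np.zeros((2, 3, 4))
weight = np.array([[1.0, 0.0, 0.5], [1.0, 1.0, 1.0]])
updated = process_frame_pair(prev_feat, next_feat, np.zeros((2, 3, 2)), weight)
```

With binary weights such as 1 and 0, the fusion reduces to choosing, pixel by pixel, between the aligned preceding-frame feature and the subsequent frame's own feature.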

In some optional implementations of this embodiment, the optical flow is a dense optical flow, and the apparatus further includes an optical flow generation unit configured to reduce the preceding and subsequent frames to the size of the feature map, determine the dense optical flow between the reduced preceding frame and the reduced subsequent frame, and use this dense optical flow as the optical flow generated from the adjacent preceding and subsequent frames of the video.
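The reduction performed by the optical flow generation unit can be sketched as area averaging, assuming the frame height and width are integer multiples of the feature-map size. The text does not name a dense flow estimator; OpenCV's Farneback method (`cv2.calcOpticalFlowFarneback`, which takes 8-bit single-channel images) is one possible choice, and passing the reduced subsequent frame as its first argument gives a flow that points back into the preceding frame, which suits backward warping of the preceding frame. The function name below is hypothetical.

```python
import numpy as np


def reduce_to_feature_size(frame, feat_h, feat_w):
    """Area-average an (H, W) or (H, W, C) frame down to (feat_h, feat_w)."""
    h, w = frame.shape[:2]
    if h % feat_h or w % feat_w:
        raise ValueError("frame size must be a multiple of the feature-map size")
    fy, fx = h // feat_h, w // feat_w
    # Split each axis into (output index, offset inside the block) and average the blocks.
    blocks = frame.reshape(feat_h, fy, feat_w, fx, *frame.shape[2:])
    return blocks.mean(axis=(1, 3))


frame = np.arange(16, dtype=float).reshape(4, 4)
small = reduce_to_feature_size(frame, 2, 2)
```

Here each output pixel is the mean of one 2 x 2 block, so `small` is [[2.5, 4.5], [10.5, 12.5]].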

In some optional implementations of this embodiment, the fusion unit is further configured to determine the weight of the transformed feature map based on the optical flow error as follows: transform the reduced preceding frame with the dense optical flow to obtain a transformed preceding frame; for the pixel point at each coordinate, determine the error of the dense optical flow at that coordinate based on the difference between the pixel values of the transformed preceding frame and the reduced subsequent frame; for the pixel point at that coordinate in the image obtained by transforming a specific image with the dense optical flow, determine whether the specific image contains a pixel point showing the same object and obtain a determination result, where the size of the specific image is the target size; and determine the weight of the pixel point at that coordinate in the transformed feature map based on the dense optical flow error and the determination result.
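The per-pixel flow error used by the fusion unit can be sketched as the absolute difference between the transformed reduced preceding frame and the reduced subsequent frame. Taking the absolute difference of single-channel frames is an assumption (the text only says the error is based on the difference of pixel values), and the nearest-neighbour backward warp with zero fill is again a stand-in; `flow_error` is a hypothetical name.

```python
import numpy as np


def warp_backward(src, flow, fill=0.0):
    """Nearest-neighbour backward warp; flow[y, x] = (dx, dy) samples src at (x + dx, y + dy)."""
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    sx = np.rint(xs + flow[..., 0]).astype(int)
    sy = np.rint(ys + flow[..., 1]).astype(int)
    valid = (sx >= 0) & (sx < src.shape[1]) & (sy >= 0) & (sy < src.shape[0])
    out = np.full((h, w) + src.shape[2:], fill, dtype=float)
    out[valid] = src[sy[valid], sx[valid]]
    return out


def flow_error(prev_small, next_small, flow):
    """Per-pixel dense optical flow error between aligned reduced frames."""
    transformed_prev = warp_backward(prev_small, flow)
    return np.abs(transformed_prev - np.asarray(next_small, dtype=float))


# The subsequent frame is the preceding frame shifted right by one pixel,
# plus a new value (5) entering at the left edge.
prev_small = np.array([[0.0, 10.0, 20.0, 30.0]])
next_small = np.array([[5.0, 0.0, 10.0, 20.0]])
flow = np.zeros((1, 4, 2))
flow[..., 0] = -1.0                         # sample one pixel to the left
err = flow_error(prev_small, next_small, flow)
```

The aligned pixels give zero error; only the left-edge pixel, whose source lies outside the preceding frame, has a nonzero error (5), and that pixel would also be flagged as new content by the same-object check.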

In some optional implementations of this embodiment, the fusion unit is further configured to obtain the determination result (whether the specific image, whose size is the target size, contains a pixel point showing the same object as the pixel point at that coordinate in the image obtained by transforming the specific image with the dense optical flow) as follows: obtain a specific image in which every pixel point has a predetermined pixel value; transform the specific image with the dense optical flow to obtain a transformed specific image; for the pixel point at each coordinate in the transformed specific image, determine whether its pixel value is greater than or equal to the predetermined pixel value; if it is, the determination result is that a pixel point containing the same object exists; if it is less than the predetermined pixel value, the determination result is that no pixel point containing the same object exists.

In some optional implementations of this embodiment, the fusion unit is further configured to determine the weight of the pixel point at that coordinate in the transformed feature map based on the dense optical flow error and the determination result as follows: in response to both the dense optical flow error being less than a predetermined error threshold and the determination result being that a pixel point containing the same object exists, determine that the weight is a first candidate weight, where the larger the pixel value of the pixel point at that coordinate in the reduced subsequent frame, the larger the predetermined error threshold; in response to the dense optical flow error being greater than or equal to the predetermined error threshold and/or the determination result being that no pixel point containing the same object exists, determine that the weight is a second candidate weight, where the first candidate weight is greater than the second candidate weight.

In some optional implementations of this embodiment, the apparatus may further be configured to weight the features of the transformed feature map and of the subsequent frame's feature map according to the weight of the transformed feature map and the weight of the subsequent frame's feature map, obtaining the weighted combination of their features, where the greater the weight of the transformed feature map, the smaller the weight of the subsequent frame's feature map.

In some optional implementations of this embodiment, the apparatus may further include a feature generation unit configured to generate the feature maps of the subsequent frame and the preceding frame of the video using a generative adversarial network. The apparatus may also include a target generation unit configured to process the updated feature map with the generative adversarial network to generate an image in the target domain corresponding to the subsequent frame.

According to embodiments of the present application, the present application further provides an electronic device and a readable storage medium.

FIG. 6 is a block diagram of an electronic device for the video frame processing method according to an embodiment of the present application. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers and other suitable computers. It can also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices and other similar computing devices. The components shown here, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the present application described and/or claimed herein.

As shown in FIG. 6, the electronic device includes one or more processors 601, a storage device 602, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as needed. The processor can process instructions executed within the electronic device, including instructions stored in or on the storage device for displaying graphical information of a graphical user interface (GUI) on an external input/output device, such as a display device coupled to an interface. In other embodiments, multiple processors and/or multiple buses may be used together with multiple storage devices, as needed. Multiple electronic devices may also be connected, each providing part of the necessary operations (for example, as a server array, a group of blade servers, or a multiprocessor system). FIG. 6 takes one processor 601 as an example.

The storage device 602 is the non-transitory computer-readable storage medium provided by the present application. It stores instructions executable by at least one processor, causing the at least one processor to perform the video frame processing method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions used to cause a computer to perform the video frame processing method provided by the present application.

As a non-transitory computer-readable storage medium, the storage device 602 can store non-transitory software programs, non-transitory computer-executable programs and modules, such as the program instructions/modules corresponding to the video frame processing method in the embodiments of the present application (for example, the transform unit 501, the fusion unit 502 and the update unit 503 shown in FIG. 5). By running the non-transitory software programs, instructions and modules stored in the storage device 602, the processor 601 executes the various functional applications and data processing of the server, that is, it implements the video frame processing method of the above method embodiments.

The storage device 602 may include a program storage area, which can store an operating system and an application required by at least one function, and a data storage area, which can store data created through use of the electronic device for video frame processing. The storage device 602 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some embodiments, the storage device 602 optionally includes storage devices located remotely from the processor 601, which may be connected over a network to the electronic device for video frame processing of this embodiment. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device that performs the video frame processing method may further include an input device 603 and an output device 604. The processor 601, the storage device 602, the input device 603 and the output device 604 may be connected by a bus or in other ways; FIG. 6 takes a bus connection as an example.

The input device 603 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device that implements the video frame processing method of this embodiment. Examples of the input device 603 include a touch panel, a keypad, a mouse, a trackpad, a touchpad, a pointing device, one or more mouse buttons, a trackball and a joystick. The output device 604 may include a display device, an auxiliary lighting device (for example, an LED) and a haptic feedback device (for example, a vibration motor). The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display and a plasma display. In some embodiments, the display device may be a touch panel.

Various implementations of the systems and techniques described here can be realized in digital electronic circuit systems, integrated circuit systems, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose and which can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device and at least one output device.

These computer programs, also known as programs, software, software applications or code, include machine instructions for a programmable processor and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used here, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus and/or device (for example, a magnetic disk, an optical disk, a storage device, or a programmable logic device (PLD)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (for example, a cathode ray tube (CRT) or liquid crystal display (LCD) monitor) for displaying information to the user, and a keyboard and a pointing device (for example, a mouse or trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to interact with the user. For example, the feedback provided to the user can be any form of sensory feedback, such as visual, auditory or tactile feedback, and input from the user can be received in any form, including acoustic, speech or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (for example, a data server), a computing system that includes a middleware component (for example, an application server), a computing system that includes a front-end component (for example, a user computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described here), or a computing system that includes any combination of such back-end, middleware or front-end components. The components of the system can be interconnected by digital data communication in any form or medium, such as a communication network. Examples of communication networks include local area networks (LANs), wide area networks (WANs) and the Internet.

A computer system can include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship to each other.

The flowcharts and block diagrams in the drawings illustrate the possible architecture, functionality and operation of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of code that includes one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in a different order from that shown in the drawings; for example, two blocks shown in succession may be executed substantially in parallel, or in reverse order, depending on the functionality involved. Each block in the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including a transform unit, a fusion unit and an update unit. The names of these units do not, in some cases, limit the units themselves; for example, the update unit may also be described as "a unit that updates the feature map of the subsequent frame".

In another aspect, the present application further provides a computer-readable medium, which may be included in the apparatus described in the above embodiments or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: transform the feature map of a preceding frame using an optical flow generated from adjacent preceding and subsequent frames of a video to obtain a transformed feature map; determine the weight of the transformed feature map based on the error of the optical flow, and obtain a fused feature map from the weighted combination of the features of the transformed feature map and the feature map of the subsequent frame; and update the feature map of the subsequent frame so that the updated feature map is the fused feature map.

The above description covers only the preferred embodiments of the present application and the technical principles employed. Those skilled in the art should understand that the scope of the invention referred to in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by mutually replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.

Claims

A method for processing video frames, comprising:
transforming a feature map of a preceding frame using an optical flow generated based on adjacent preceding and subsequent frames in a video, to obtain a transformed feature map;
determining a weight of the transformed feature map based on an error of the optical flow, and obtaining a fused feature map based on a feature weighting result of the transformed feature map and a feature map of the subsequent frame; and
updating the feature map of the subsequent frame to obtain an updated feature map, the updated feature map being the fused feature map;
wherein the optical flow is a dense optical flow;
the method further comprises:
reducing the preceding frame and the subsequent frame to the size of the feature map, determining a dense optical flow between the reduced preceding frame and the reduced subsequent frame, and using the dense optical flow as the optical flow generated based on the adjacent preceding and subsequent frames in the video; and
determining the weight of the transformed feature map based on the error of the optical flow comprises:
transforming the reduced preceding frame using the dense optical flow to obtain a transformed preceding frame;
for the pixel point at each coordinate, determining an error of the dense optical flow at that coordinate based on the pixel value difference between the transformed preceding frame and the reduced subsequent frame;
determining, for the pixel point at that coordinate in an image obtained by transforming a specific image of a target size based on the dense optical flow, whether a pixel point containing the same object exists in the specific image, to obtain a determination result; and
determining a weight of the pixel point at that coordinate in the transformed feature map based on the error of the dense optical flow and the determination result.
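The frame reduction and per-pixel flow error steps of claim 1 can be sketched as follows. The claim does not specify how frames are reduced, how the dense flow is estimated, or how the pixel value difference is measured, so this sketch assumes grayscale frames stored as lists of rows, average pooling by an integer factor, an absolute difference, and a dense flow that is already available (a dense estimator such as Farnebäck's method could supply it; none is implemented here). The names `downscale`, `warp`, and `flow_error` are hypothetical, and `warp` repeats an assumed backward bilinear warp with zero padding so the block runs on its own.

```python
import math


def downscale(frame, factor):
    """Reduce a grayscale frame by average pooling over factor x factor blocks."""
    h, w = len(frame) // factor, len(frame[0]) // factor
    return [[sum(frame[y * factor + i][x * factor + j]
                 for i in range(factor) for j in range(factor)) / factor ** 2
             for x in range(w)]
            for y in range(h)]


def warp(image, flow):
    """Backward bilinear warp; flow[y][x] = (dx, dy) points into the preceding frame."""
    h, w = len(image), len(image[0])

    def at(yy, xx):
        # Zero padding outside the image bounds.
        return image[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0.0

    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x + flow[y][x][0], y + flow[y][x][1]
            x0, y0 = math.floor(sx), math.floor(sy)
            ax, ay = sx - x0, sy - y0
            top = (1 - ax) * at(y0, x0) + ax * at(y0, x0 + 1)
            bottom = (1 - ax) * at(y0 + 1, x0) + ax * at(y0 + 1, x0 + 1)
            out[y][x] = (1 - ay) * top + ay * bottom
    return out


def flow_error(prev_small, next_small, flow):
    """Per-pixel flow error: |warped reduced preceding frame - reduced subsequent frame|."""
    warped_prev = warp(prev_small, flow)
    return [[abs(p - n) for p, n in zip(wp_row, n_row)]
            for wp_row, n_row in zip(warped_prev, next_small)]


# A bright 2x2 block moves one reduced pixel to the left between frames.
prev_frame = [[0, 0, 8, 8], [0, 0, 8, 8], [0, 0, 0, 0], [0, 0, 0, 0]]
next_frame = [[8, 8, 0, 0], [8, 8, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
prev_small, next_small = downscale(prev_frame, 2), downscale(next_frame, 2)
good_flow = [[(1, 0)] * 2 for _ in range(2)]
zero_flow = [[(0, 0)] * 2 for _ in range(2)]
print(flow_error(prev_small, next_small, good_flow))  # [[0.0, 0.0], [0.0, 0.0]]
print(flow_error(prev_small, next_small, zero_flow))  # [[8.0, 8.0], [0.0, 0.0]]
```

A correct flow drives the error toward zero, while a wrong flow (here, assuming no motion) leaves a large error exactly where the moving object is.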

The processing method according to claim 1, wherein determining, for the pixel point at that coordinate in the image obtained by transforming the specific image of the target size based on the dense optical flow, whether a pixel point containing the same object exists in the specific image, to obtain the determination result, comprises:
acquiring a specific image in which the pixel value of every pixel point is a predetermined pixel value;
transforming the specific image using the dense optical flow to obtain a transformed specific image;
for the pixel point at each coordinate in the transformed specific image, determining whether its pixel value is greater than or equal to the predetermined pixel value;
determining that a pixel point containing the same object exists if the pixel value of the pixel point at that coordinate in the transformed specific image is greater than or equal to the predetermined pixel value; and
determining that no pixel point containing the same object exists if the pixel value of the pixel point at that coordinate in the transformed specific image is less than the predetermined pixel value.
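The test in claim 2 can be sketched by warping a constant image. Assuming the same backward bilinear warp with zero padding, a pixel whose sampling position falls partly or wholly outside the frame mixes in padding and drops below the predetermined value, which this sketch treats as "no pixel point containing the same object". The constant value 1.0, the tolerance `eps` (added only to absorb floating-point rounding in the interpolation; the claim compares against the predetermined value directly), and the names `warp` and `coverage_mask` are assumptions.

```python
import math


def warp(image, flow):
    """Backward bilinear warp; flow[y][x] = (dx, dy) points into the preceding frame."""
    h, w = len(image), len(image[0])

    def at(yy, xx):
        # Zero padding outside the image bounds.
        return image[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0.0

    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x + flow[y][x][0], y + flow[y][x][1]
            x0, y0 = math.floor(sx), math.floor(sy)
            ax, ay = sx - x0, sy - y0
            top = (1 - ax) * at(y0, x0) + ax * at(y0, x0 + 1)
            bottom = (1 - ax) * at(y0 + 1, x0) + ax * at(y0 + 1, x0 + 1)
            out[y][x] = (1 - ay) * top + ay * bottom
    return out


def coverage_mask(flow, value=1.0, eps=1e-9):
    """True where the warped constant image still reaches `value` (same object exists)."""
    h, w = len(flow), len(flow[0])
    specific = [[value] * w for _ in range(h)]   # every pixel holds the predetermined value
    warped = warp(specific, flow)
    # eps is an assumed rounding tolerance; the claim itself uses >= value.
    return [[v >= value - eps for v in row] for row in warped]


print(coverage_mask([[(0.5, 0.0), (0.5, 0.0)]]))  # [[True, False]]
```

The right-hand pixel samples halfway past the frame edge, so its warped value is 0.5 and it is flagged as having no matching pixel in the preceding frame.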

The processing method according to claim 1, wherein determining the weight of the pixel point at that coordinate in the transformed feature map based on the error of the dense optical flow and the determination result comprises:
determining that the weight of the pixel point at that coordinate in the transformed feature map is a first candidate weight in response to both the error of the dense optical flow being less than a predetermined error threshold and the determination result being that a pixel point containing the same object exists, wherein the larger the pixel value of the pixel point at that coordinate in the reduced subsequent frame, the larger the predetermined error threshold; and
determining that the weight of the pixel point at that coordinate in the transformed feature map is a second candidate weight, smaller than the first candidate weight, in response to the error of the dense optical flow being greater than or equal to the predetermined error threshold and/or the determination result being that no pixel point containing the same object exists.
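The weight rule of claim 3 can be sketched per pixel. The claim only requires that the error threshold grow with the pixel value of the reduced subsequent frame and that the second candidate weight be smaller than the first; the linear threshold `base + slope * pixel` and the values 0.8, 0.2, 10.0, and 0.1 are hypothetical choices for illustration, as is the name `pixel_weights`. The inputs are an error map, a coverage mask, and the reduced subsequent frame, all stored as lists of rows.

```python
def pixel_weights(error, covered, next_small,
                  w_first=0.8, w_second=0.2, base=10.0, slope=0.1):
    """Weight of each pixel of the transformed feature map (claim 3 rule, assumed constants)."""
    if not w_second < w_first:
        raise ValueError("the second candidate weight must be smaller than the first")
    weights = []
    for err_row, cov_row, px_row in zip(error, covered, next_small):
        row = []
        for err, cov, px in zip(err_row, cov_row, px_row):
            threshold = base + slope * px   # hypothetical: brighter pixels tolerate more error
            # First candidate only if the flow error is small AND the same object exists.
            row.append(w_first if err < threshold and cov else w_second)
        weights.append(row)
    return weights


error = [[5.0, 5.0], [30.0, 30.0]]
covered = [[True, False], [True, True]]
next_small = [[0.0, 0.0], [0.0, 255.0]]
print(pixel_weights(error, covered, next_small))  # [[0.8, 0.2], [0.2, 0.8]]
```

The bottom row shows the brightness-dependent threshold at work: the same error of 30 is rejected on a dark pixel (threshold 10) but accepted on a bright one (threshold 35.5).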

The processing method according to claim 1, further comprising: performing feature weighting on the transformed feature map and the feature map of the subsequent frame, based on the weight of the transformed feature map and the weight of the feature map of the subsequent frame, to obtain the feature weighting result of the transformed feature map and the feature map of the subsequent frame, wherein the larger the weight of the transformed feature map, the smaller the weight of the feature map of the subsequent frame.
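Claim 4 only requires that the weight of the subsequent frame's feature map fall as the weight of the transformed feature map rises. One simple way to satisfy this, assumed here rather than stated in the claim, is to give the subsequent frame the weight `1 - w`, which turns the fusion into a per-pixel convex blend. The sketch treats a feature map as a list of channels, each a list of rows, and applies one weight map to all channels; `fuse` is a hypothetical name.

```python
def fuse(warped_features, next_features, weights):
    """Per-pixel blend for every channel: w * transformed + (1 - w) * subsequent."""
    return [
        [[w * a + (1 - w) * b for a, b, w in zip(a_row, b_row, w_row)]
         for a_row, b_row, w_row in zip(a_ch, b_ch, weights)]
        for a_ch, b_ch in zip(warped_features, next_features)
    ]


warped_features = [[[10.0, 10.0]], [[4.0, 4.0]]]   # 2 channels, 1 x 2 pixels each
next_features = [[[0.0, 0.0]], [[4.0, 4.0]]]
weights = [[0.8, 0.2]]
# The fused map then replaces the subsequent frame's feature map (the update step).
fused = fuse(warped_features, next_features, weights)
```

Where the two feature maps already agree (the second channel), the blend leaves the values unchanged regardless of the weight; where they disagree, a high weight pulls the result toward the warped preceding features and suppresses frame-to-frame jitter.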

The processing method according to claim 1, further comprising:
generating the feature map of the subsequent frame and the feature map of the preceding frame of the video using a generative adversarial network; and
processing the updated feature map using the generative adversarial network to generate an image of a target domain corresponding to the subsequent frame.

A device for processing video frames, comprising:
a transformation unit configured to transform a feature map of a preceding frame using an optical flow generated based on adjacent preceding and subsequent frames in a video, to obtain a transformed feature map;
a fusion unit configured to determine a weight of the transformed feature map based on an error of the optical flow, and to obtain a fused feature map based on a feature weighting result of the transformed feature map and a feature map of the subsequent frame; and
an update unit configured to update the feature map of the subsequent frame to obtain an updated feature map, the updated feature map being the fused feature map;
wherein the optical flow is a dense optical flow;
the processing device further comprises:
an optical flow generation unit configured to reduce the preceding frame and the subsequent frame to the size of the feature map, determine a dense optical flow between the reduced preceding frame and the reduced subsequent frame, and use the dense optical flow as the optical flow generated based on the adjacent preceding and subsequent frames in the video; and
the fusion unit is further configured to:
transform the reduced preceding frame using the dense optical flow to obtain a transformed preceding frame;
for the pixel point at each coordinate, determine an error of the dense optical flow at that coordinate based on the pixel value difference between the transformed preceding frame and the reduced subsequent frame;
determine, for the pixel point at that coordinate in an image obtained by transforming a specific image of a target size based on the dense optical flow, whether a pixel point containing the same object exists in the specific image, to obtain a determination result; and
determine a weight of the pixel point at that coordinate in the transformed feature map based on the error of the dense optical flow and the determination result.

The processing device according to claim 6, wherein the fusion unit is further configured to:
acquire a specific image in which the pixel value of every pixel point is a predetermined pixel value;
transform the specific image using the dense optical flow to obtain a transformed specific image;
for the pixel point at each coordinate in the transformed specific image, determine whether its pixel value is greater than or equal to the predetermined pixel value;
determine that a pixel point containing the same object exists if the pixel value of the pixel point at that coordinate in the transformed specific image is greater than or equal to the predetermined pixel value; and
determine that no pixel point containing the same object exists if the pixel value of the pixel point at that coordinate in the transformed specific image is less than the predetermined pixel value.

The processing device according to claim 6, wherein the fusion unit is further configured to:
determine that the weight of the pixel point at that coordinate in the transformed feature map is a first candidate weight in response to both the error of the dense optical flow being less than a predetermined error threshold and the determination result being that a pixel point containing the same object exists, wherein the larger the pixel value of the pixel point at that coordinate in the reduced subsequent frame, the larger the predetermined error threshold; and
determine that the weight of the pixel point at that coordinate in the transformed feature map is a second candidate weight, smaller than the first candidate weight, in response to the error of the dense optical flow being greater than or equal to the predetermined error threshold and/or the determination result being that no pixel point containing the same object exists.

The processing device according to claim 6, further comprising: a weighted result obtaining unit configured to perform feature weighting on the transformed feature map and the feature map of the subsequent frame, based on the weight of the transformed feature map and the weight of the feature map of the subsequent frame, to obtain the feature weighting result of the transformed feature map and the feature map of the subsequent frame;
wherein the larger the weight of the transformed feature map, the smaller the weight of the feature map of the subsequent frame.

The processing device according to claim 6, further comprising:
a feature generation unit configured to generate the feature map of the subsequent frame and the feature map of the preceding frame of the video using a generative adversarial network; and
a target generation unit configured to process the updated feature map using the generative adversarial network to generate an image of a target domain corresponding to the subsequent frame.

An electronic device, comprising one or more processors and a storage device for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the processing method according to any one of claims 1 to 5.

A computer-readable storage medium storing a computer program,
wherein the computer program, when executed by a processor, implements the processing method according to any one of claims 1 to 5.

A computer program which, when executed by a processor, implements the processing method according to any one of claims 1 to 5.