JP2023504926A

JP2023504926A - Image processing method, device, electronic device and computer-readable storage medium

Info

Publication number: JP2023504926A
Application number: JP2022537296A
Authority: JP
Inventors: Zhuge, Jingjing; Ni, Guangyao; Yang, Hui
Original assignee: Beijing ByteDance Network Technology Co Ltd
Current assignee: Beijing ByteDance Network Technology Co Ltd
Priority date: 2019-12-18
Filing date: 2020-12-16
Publication date: 2023-02-07
Anticipated expiration: 2040-12-16
Also published as: KR102534449B1; JP7214926B1; EP4060611A1; AU2020404293B2; US20220319062A1; BR112022012014A2; WO2021121291A1; KR20220099584A; US11651529B2; EP4060611A4; CN112991147B; CN112991147A; AU2020404293A1; MX2022007700A; CA3162058A1

Abstract

An image processing method and apparatus, an electronic device, and a computer-readable storage medium. The image processing method includes: identifying a first object in a first video frame image and a second object located within the first object (S101); overlaying, based on the position of the first object in the first video frame image, a third object as a foreground image on the first video frame image to obtain a second video frame image, in which the third object covers the first object (S102); and superimposing, based on the position of the second object in the first video frame image, the second object as a foreground image on the third object in the second video frame image to obtain a third video frame image (S103). This method solves a technical problem of the prior art: an effect overlaid on an object cannot show the object's original features, so the result lacks realism. [Selected drawing] FIG. 1

Description

[Cross-reference to related applications]
This disclosure claims priority to Chinese Patent Application No. 201911306421.6, titled "Image processing method, apparatus, electronic device and computer-readable storage medium", filed with the Chinese Patent Office on December 18, 2019, the entire content of which is incorporated herein by reference.

[Technical field]
The present disclosure relates to the field of image processing, and in particular to an image processing method and apparatus, an electronic device, and a computer-readable storage medium.

With the development of computer networks and the popularity of smartphones, ordinary users are no longer satisfied with expressing their feelings through monotonous pictures and text alone. Video offers richer content and forms and a more intuitive experience, so it is strongly favored by users and has become increasingly popular, and producing original videos has gradually become a trend among ordinary users. However, original videos such as selfie videos tend to be plain in expression, whereas film and television productions use video effects ever more extensively and express content in increasingly varied forms. Video effects can thus be said to underpin successful film and television productions.

However, in the prior art, an effect is typically produced by directly covering a target object (for example, a human face) with the effect. The covered part is hidden by the effect and the original features of the target object cannot be shown, so the effect looks unnatural and lacks realism.

This Summary is provided to introduce concepts in a simplified form; these concepts are described in detail in the Detailed Description that follows. This Summary is not intended to identify key or essential features of the claimed technical solution, nor is it intended to limit the scope of the claimed technical solution.

According to a first aspect, an embodiment of the present disclosure provides an image processing method, including:
identifying a first object in a first video frame image and a second object located within the first object;
overlaying, based on the position of the first object in the first video frame image, a third object as a foreground image on the first video frame image to obtain a second video frame image, wherein the third object covers the first object in the second video frame image; and
superimposing, based on the position of the second object in the first video frame image, the second object as a foreground image on the third object in the second video frame image to obtain a third video frame image.
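The three steps can be illustrated with a minimal sketch on small grayscale images stored as nested lists. It assumes step S101 has already produced the positions and opacity masks (the detector is not shown), treats the effect as a flat sprite, and uses hypothetical helper names (`paste`, `crop`, `apply_effect`); it illustrates the compositing order only, not the disclosed implementation.

```python
from typing import List, Tuple

Image = List[List[int]]   # grayscale image, indexed img[y][x]
Mask = List[List[bool]]   # True where the sprite pixel is opaque
Pos = Tuple[int, int]     # (x, y) of a sprite's top-left corner in the frame


def paste(base: Image, sprite: Image, mask: Mask, top_left: Pos) -> Image:
    """Return a copy of `base` with the opaque pixels of `sprite` drawn at `top_left`."""
    out = [row[:] for row in base]
    x0, y0 = top_left
    for dy, (srow, mrow) in enumerate(zip(sprite, mask)):
        for dx, (value, opaque) in enumerate(zip(srow, mrow)):
            x, y = x0 + dx, y0 + dy
            if opaque and 0 <= y < len(out) and 0 <= x < len(out[y]):
                out[y][x] = value
    return out


def crop(img: Image, top_left: Pos, width: int, height: int) -> Image:
    """Cut a width x height patch out of `img`."""
    x0, y0 = top_left
    return [row[x0:x0 + width] for row in img[y0:y0 + height]]


def apply_effect(frame1: Image, effect: Image, effect_mask: Mask, effect_pos: Pos,
                 feature_mask: Mask, feature_pos: Pos) -> Image:
    # S102: overlay the third object (the effect) so that it covers the first object.
    frame2 = paste(frame1, effect, effect_mask, effect_pos)
    # S103: take the second object from the ORIGINAL frame and draw it on top again.
    h, w = len(feature_mask), len(feature_mask[0])
    feature = crop(frame1, feature_pos, w, h)
    return paste(frame2, feature, feature_mask, feature_pos)


# Usage: a 4x4 "face" fully covered by a flat effect, with a 2x2 "feature" restored.
frame1 = [[4 * y + x + 1 for x in range(4)] for y in range(4)]
effect = [[0] * 4 for _ in range(4)]
frame3 = apply_effect(frame1, effect, [[True] * 4 for _ in range(4)], (0, 0),
                      [[True] * 2 for _ in range(2)], (1, 1))
```

The key design point is that the second object is cut from the first (original) frame rather than the second, so the real eyes or mouth remain visible on top of the effect.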

According to a second aspect, an embodiment of the present disclosure provides an image processing apparatus, including:
an object identification module, configured to identify a first object in a first video frame image and a second object located within the first object;
a second video frame image generation module, configured to overlay, based on the position of the first object in the first video frame image, a third object as a foreground image on the first video frame image to obtain a second video frame image in which the third object covers the first object; and
a third video frame generation module, configured to superimpose, based on the position of the second object in the first video frame image, the second object as a foreground image on the third object in the second video frame image to obtain a third video frame image.

According to a third aspect, an embodiment of the present disclosure provides an electronic device, including:
at least one processor; and
a memory communicatively connected to the at least one processor,
wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to perform any image processing method of the first aspect.

According to a fourth aspect, an embodiment of the present disclosure provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform any image processing method of the first aspect.

According to a fifth aspect, an embodiment of the present disclosure provides a computer program that, when executed by a computer, causes the computer to perform any image processing method of the first aspect.

Embodiments of the present disclosure disclose an image processing method and apparatus, an electronic device, and a computer-readable storage medium. The image processing method includes: identifying a first object in a first video frame image and a second object located within the first object; overlaying, based on the position of the first object in the first video frame image, a third object as a foreground image on the first video frame image to obtain a second video frame image in which the third object covers the first object; and superimposing, based on the position of the second object in the first video frame image, the second object as a foreground image on the third object in the second video frame image to obtain a third video frame image. This method solves the prior-art problem that an effect overlaid on an object cannot show the object's original features and therefore lacks realism.

The above description is merely an overview of the technical solution of the present disclosure. So that the technical means of the present disclosure can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the present disclosure become clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent with reference to the following detailed description in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.

FIG. 1 is a flowchart of an embodiment of an image processing method according to the present disclosure;
FIG. 2 is a schematic diagram of calculating the positioning point of a human face in an embodiment of the image processing method according to the present disclosure;
FIG. 3 is a flowchart of a specific example of step S103 in an embodiment of the image processing method according to the present disclosure;
FIG. 4 is a flowchart of a specific example of step S302 in an embodiment of the image processing method according to the present disclosure;
FIG. 5 is a schematic structural diagram of an embodiment of an image processing apparatus according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.

Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although some embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration only and do not limit the protection scope of the present disclosure.

It should be understood that the steps described in the method embodiments of the present disclosure may be performed in a different order and/or in parallel. In addition, method embodiments may include additional steps and/or omit some of the steps shown. The scope of the present disclosure is not limited in this respect.

As used herein, the term "include" and its variants are open-ended, that is, they mean "include but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one other embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms are given in the description below.

It should be noted that concepts such as "first" and "second" in the present disclosure are used only to distinguish different apparatuses, modules, or units, and are not intended to limit the order or interdependence of the functions performed by these apparatuses, modules, or units.

It should be noted that the modifiers "one" and "a plurality of" in the present disclosure are illustrative rather than limiting; those skilled in the art will understand that, unless otherwise specified, they should be read as "one or more".

The names of messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.

FIG. 1 is a flowchart of an embodiment of an image processing method according to the present disclosure. The image processing method of this embodiment may be performed by an image processing apparatus, which may be implemented as software or as a combination of software and hardware. The image processing apparatus may be integrated in a device of an image processing system, such as an image processing server or an image processing terminal device. As shown in FIG. 1, the method includes the following steps.
Step S101: identify a first object in a first video frame image and a second object located within the first object.

In this step, the first video frame image is acquired from an image sensor or a memory. An image sensor is any device capable of capturing images; typical image sensors include video cameras, camera modules, and still cameras. In this embodiment, the image sensor may be a camera on a terminal device, for example the front or rear camera of a smartphone, and the images it captures may be displayed directly on the phone's display screen. The memory may be local storage or network storage, and the video frame image is read from a storage location in local storage or from a storage address in network storage. A video frame image in memory belongs to a pre-recorded video file and can be displayed on the terminal device's display by a player on the terminal device.

The first video frame image includes a first object, which may be any object in the first video frame image. For example, the first object is a human face, and the second object is part of the first object and located within it: when the first object is a human face, the second object is a facial feature such as the eyes or the mouth.

In this embodiment, objects are identified by their keypoints. For example, an object may be identified based on a plurality of preset keypoints: when those keypoints are identified, the object is identified. Optionally, identifying the first object in the first video frame image and the second object located within the first object includes: identifying, in the first video frame image, a plurality of first keypoints of the first object and a plurality of second keypoints of the second object; identifying the first object based on the plurality of first keypoints; and identifying the second object based on the plurality of second keypoints, where the second keypoints are edge keypoints of the second object, that is, keypoints that delineate its contour. For example, the first object is a human face and the second object is the eyes and mouth of the face. The first keypoints are the center keypoints of the two eyes and the nose-tip keypoint; when these are identified, the face can be considered identified. The second keypoints are the edge keypoints of the eyes and of the mouth; likewise, when these edge keypoints are identified, the eyes and mouth are identified.
For example, the first and second keypoints can be identified with a deep learning algorithm: a deep learning network is trained directly on a set of frame images annotated with the first and second keypoints, yielding a network model that regresses the first and second keypoints. The first video frame image is then input to the network model; if the image contains the first and second objects, the model outputs the positions of the first and second keypoints, which both identifies the first and second objects and locates their keypoints. It should be understood that any other keypoint identification algorithm may also be used to identify the first and second keypoints; this is not described further here.
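The keypoint-based identification can be sketched as follows, assuming a keypoint detector (not implemented here) has already returned a dictionary of named keypoints, with `None` for any keypoint it could not find. The keypoint names, the helper names, and the even-odd rasterization of the edge-keypoint contour into a mask are illustrative assumptions, not part of the disclosure.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def is_identified(detected: Dict[str, Optional[Point]], required: List[str]) -> bool:
    """An object counts as identified when every preset keypoint was detected."""
    return all(detected.get(name) is not None for name in required)


def inside_contour(p: Point, contour: List[Point]) -> bool:
    """Even-odd (ray casting) test: is p inside the polygon traced by edge keypoints?"""
    x, y = p
    inside = False
    n = len(contour)
    for i in range(n):
        (x1, y1), (x2, y2) = contour[i], contour[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def contour_mask(contour: List[Point], width: int, height: int) -> List[List[bool]]:
    """Rasterize the edge-keypoint contour into a boolean mask by sampling pixel centers."""
    return [[inside_contour((x + 0.5, y + 0.5), contour) for x in range(width)]
            for y in range(height)]
```

A face-level check would pass the two eye-center keypoints and the nose tip as `required`; a mouth mask would pass the mouth's edge keypoints, in order, as `contour`.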

Step S102: overlay, based on the position of the first object in the first video frame image, a third object as a foreground image on the first video frame image to obtain a second video frame image in which the third object covers the first object.

Optionally, the third object is a three-dimensional virtual object, and the area of its two-dimensional projection in the first video frame image is larger than the area of the first object; that is, the third object can completely cover the first object. For example, the three-dimensional virtual object may be a spherical virtual object such as a 3D orange, a 3D ball, or a 3D watermelon. The third object includes a positioning point used to locate the third object.
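A larger projected area is necessary but not by itself sufficient for the third object to cover the first object completely; placement matters too. The sketch below checks both conditions using axis-aligned bounding boxes, an assumed simplification of the real projected outline; `Box`, `area`, and `covers` are hypothetical names.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in image pixels


def area(b: Box) -> float:
    """Area of a box; degenerate boxes have zero area."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def covers(effect_box: Box, face_box: Box) -> bool:
    """True if the projected effect box fully contains the face box."""
    return (effect_box[0] <= face_box[0] and effect_box[1] <= face_box[1]
            and effect_box[2] >= face_box[2] and effect_box[3] >= face_box[3])
```

Two effect boxes with the same area can give different answers: one centered over the face covers it, while one shifted sideways does not.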

It should be understood that the third object is not limited to the three-dimensional virtual object described above; it may be any object, such as a two-dimensional virtual object or a real object. A real object is an object that appears in an image as a result of photographing a physical object. For example, if the first video frame image contains both a human face image and an animal face image, the animal face image may serve as the third object and be overlaid on the human face image.

In this step, when the third object is overlaid on the first object, the position to be covered by the third object may be determined by single-keypoint alignment. For example, the third object includes a positioning point, and the first object is provided in advance with a positioning keypoint corresponding to it; this positioning keypoint may be one of the first keypoints. When the third object is overlaid as a foreground image on the first video frame image, aligning the third object's positioning point with the position of the first object's positioning keypoint places the third object over the first object.
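Single-keypoint alignment amounts to translating the third object so that its positioning point lands on the chosen keypoint. A minimal sketch, assuming the third object's positioning point is given in the sprite's own pixel coordinates (a hypothetical convention) and `align_top_left` is an illustrative name:

```python
from typing import Tuple

Point = Tuple[float, float]


def align_top_left(anchor_in_sprite: Point, target_in_frame: Point) -> Point:
    """Top-left corner at which to draw the sprite so its positioning point hits the target."""
    ax, ay = anchor_in_sprite
    tx, ty = target_in_frame
    return (tx - ax, ty - ay)
```

For a 100 x 100 sprite anchored at its center (50, 50) and a nose-tip keypoint at (320, 240), the sprite is drawn with its top-left corner at (270, 190).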

Because alignment based on a single keypoint may be imprecise, optionally, overlaying the third object as a foreground image on the first video frame image based on the position of the first object to obtain the second video frame image includes: calculating a positioning point of the first object in the first video frame image from the plurality of first keypoints of the first object; and aligning the positioning point of the third object with the positioning point of the first object in the first video frame image, so that the third object is overlaid as a foreground image on the first video frame image to obtain the second video frame image. In this optional embodiment, the positioning point of the first object is calculated from multiple first keypoints and therefore incorporates their position information, which makes positioning more accurate than with a single first keypoint. For example, as shown in FIG. 2, the first object is a human face, and the first keypoints are the center keypoints 201 and 202 of the left and right eyes and the nose-tip keypoint 203. The positioning point of the face can be calculated by giving each of these three keypoints a weight of 1/3, which yields the center point 204 of the triangle formed by the two eye-center keypoints and the nose-tip keypoint. In this way, a position error in any single keypoint has much less effect on the positioning point, making the positioning point of the first object more stable and accurate.
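The weighted positioning point can be sketched as a weighted average of keypoint coordinates; with equal weights of 1/3 it is the centroid of the eye-eye-nose triangle (point 204 in FIG. 2). The function name and the default-weight handling are illustrative assumptions.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def positioning_point(keypoints: Sequence[Point], weights: Sequence[float] = ()) -> Point:
    """Weighted average of keypoints; with no weights given, each gets 1/n (the centroid)."""
    if not weights:
        weights = [1.0 / len(keypoints)] * len(keypoints)
    total = sum(weights)  # normalizing makes the weights need not sum exactly to 1
    x = sum(w * p[0] for w, p in zip(weights, keypoints)) / total
    y = sum(w * p[1] for w, p in zip(weights, keypoints)) / total
    return (x, y)
```

With eye centers at (30, 40) and (70, 40) and the nose tip at (50, 70), the positioning point is (50, 50). Moving the nose tip 6 px to the right moves the positioning point by only 2 px, which is the damping effect described above.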

After the positioning point of the first object is obtained, the positioning point of the third object is placed at the position of the first object's positioning point, and the first video frame image is rendered with the third object as the foreground to obtain the second video frame image.
Step S103: superimpose, based on the position of the second object in the first video frame image, the second object as a foreground image on the third object in the second video frame image to obtain a third video frame image.

The second object has already been identified in step S101. Its position in the first video frame image may be determined by its center point or by the plurality of second keypoints. For example, when identifying the second object, its positioning point may be identified directly as a keypoint, or it may be calculated from the plurality of second keypoints; in the latter case, a weight may be set for each second keypoint involved, and the position of the second object's positioning point is the weighted average of the positions of those second keypoints. After the position of the second object in the first video frame image is obtained, the second object is superimposed as a foreground image at the corresponding position on the third object in the second video frame image.

Optionally, when the third object is a three-dimensional object, it has a certain depth (Z-axis information), whereas the second object is a two-dimensional object without depth. Superimposing the second object directly on the third object therefore produces some misalignment, so a certain offset should be applied when superimposing the second object on the third object to make the result look relatively natural. In this case, as shown in FIG. 3, step S103 may include the following steps.

Step S301: calculate a first distance between two preset keypoints on the first object.

Step S302: calculate an offset of the second object based on the first distance.

Step S303: based on the position of the second object in the first video frame image and the offset of the second object, superimpose the second object as a foreground image on the third object in the second video frame image to obtain the third video frame image.

The two preset keypoints may be any two keypoints on the first object. For example, they are two left-right symmetric keypoints on the first object; in the embodiment above, where the first object is a human face, they may be the center keypoints of the left and right eyes. For example, calculating the offset of the second object based on the first distance may consist of multiplying the first distance by an offset coefficient that depends on the value of the first distance: the larger the first distance, the larger the coefficient, and hence the larger the offset of the second object. As a result, the closer the first object is to the camera lens, the more the second object is offset, so that it conforms to the three-dimensional surface of the third object.
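Steps S301 and S302 in their approximate form can be sketched as below. The disclosure only states that the offset coefficient grows with the first distance; the linear form `base + slope * dist` and both default constants are hypothetical placeholders, and the result is a scalar magnitude because this paragraph does not specify a direction.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def first_distance(p: Point, q: Point) -> float:
    """S301: Euclidean distance between the two preset keypoints (e.g. the eye centers)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def offset_magnitude(dist: float, base: float = 0.05, slope: float = 0.001) -> float:
    """S302 (approximate): offset = coefficient * dist, with a coefficient that grows with dist.

    `base` and `slope` are hypothetical tuning constants, not values from the disclosure.
    """
    coefficient = base + slope * dist
    return coefficient * dist
```

Because the coefficient itself increases with the eye distance, the offset grows faster than linearly as the face approaches the camera: doubling the distance from 40 px to 80 px raises the offset from 3.6 px to 10.4 px with these placeholder constants.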

The offset calculation above is an approximation. Optionally, the offset may be calculated more accurately; in this case, as shown in FIG. 4, step S302 may include the following steps.

Step S401: obtain the yaw angle and the pitch angle of the first object, where the yaw angle is the horizontal angle between the orientation of the first object in the first video frame image and the shooting direction of the lens, and the pitch angle is the vertical angle between the orientation of the first object in the first video frame image and the shooting direction of the lens.

Step S402: calculate the offset amount of the second object based on the first distance, the yaw angle, and the pitch angle.

Specifically, the calculation in step S402 can be expressed by the following equation:

Here, sign is the sign function; dx is the offset of the second object along the X axis; dy is the offset of the second object along the Y axis; dist_eye is the first distance; yaw is the yaw angle of the first object; pitch is the pitch angle of the first object; θ₁ is the initial relative pitch angle of the second object when it faces forward; and θ₂ is the initial relative yaw angle of the second object when it faces forward. θ₁ and θ₂ are preset angle values. Because the outer surface of the three-dimensional third object is curved, the movement of the second object needs to be attenuated to some extent. Take the case where the first object is a human face and the second objects are the eyes and the mouth as an example: the mouth lies in the lower half of the third object in the vertical direction, so its initial relative pitch angle is large; it lies midway between the left and right sides of the third object, so its initial relative yaw angle is 0. The values of θ₁ and θ₂ for other second objects may be preset differently depending on the third object and are not further described here.
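Step S401 does not say how the yaw and pitch angles are measured. The sketch below assumes the face orientation is available as a 3D direction vector in camera coordinates with the lens looking along +z, and measures both angles against the reversed lens direction so that a face looking straight into the camera has yaw = pitch = 0; these axis conventions and the function names are assumptions. It also includes the `sign` operation used in the definitions above, and does not attempt to reconstruct the step S402 equation.

```python
import math


def sign(x):
    """Sign operation: -1, 0 or +1."""
    return (x > 0) - (x < 0)


def yaw_pitch(facing):
    """Yaw and pitch (degrees) of a facing-direction vector (fx, fy, fz).

    Assumed convention: the camera looks along +z, so a face looking into the lens
    has facing = (0, 0, -1) and both angles are 0.
    """
    fx, fy, fz = facing
    toward = -fz  # component pointing back toward the camera
    yaw = math.degrees(math.atan2(fx, toward))  # horizontal included angle
    pitch = math.degrees(math.atan2(fy, math.hypot(fx, toward)))  # vertical included angle
    return yaw, pitch
```

For example, `yaw_pitch((1.0, 0.0, -1.0))` gives a yaw of about 45 degrees and a pitch of 0, and `sign` of that yaw is 1.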

Optionally, in step S303, superimposing the second object as a foreground image on the third object in the second video frame image, based on the position of the second object in the first video frame image and the offset amount of the second object, to obtain the third video frame image further includes: shifting the initial positioning point of the second object in the first video frame image according to the offset amount of the second object to obtain a target positioning point; and rendering, in the second video frame image, the second object as a foreground image at the position of the target positioning point to obtain the third video frame image. In this step, the position of the second object in the first video frame image is taken as the initial positioning point, which comprises an X-axis coordinate and a Y-axis coordinate. The X-axis and Y-axis coordinates of the initial positioning point are added to the X-axis and Y-axis components of the offset amount, respectively, to obtain the target positioning point. The second object is then rendered as the foreground at the target positioning point in the second video frame image to obtain the third video frame image. In the third video frame image, the second object superimposed on the third object realistically reflects the features of the first object, which makes the effect more realistic.
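The shift-and-render step can be sketched in pure Python. This is a minimal illustration, assuming images are lists of rows of 8-bit RGB tuples, that the second object comes with a per-pixel alpha mask, and that its positioning point is its centre; the disclosure does not fix any of these, and the function names are hypothetical.

```python
def shift_point(initial, offset):
    """Target positioning point: add the (dx, dy) offset to the initial positioning point."""
    return (initial[0] + offset[0], initial[1] + offset[1])


def render_at(frame, sprite, alpha, anchor):
    """Alpha-blend `sprite` into a copy of `frame` so that the sprite's centre lands on `anchor`.

    `frame` and `sprite` are lists of rows of (r, g, b) tuples; `alpha` holds per-pixel
    opacities in [0, 1] with the same shape as `sprite`. Pixels outside the frame are skipped.
    """
    h, w = len(sprite), len(sprite[0])
    top = round(anchor[1]) - h // 2
    left = round(anchor[0]) - w // 2
    out = [row[:] for row in frame]  # leave the input frame untouched
    for r in range(h):
        for c in range(w):
            y, x = top + r, left + c
            if 0 <= y < len(out) and 0 <= x < len(out[0]):
                a = alpha[r][c]
                out[y][x] = tuple(
                    round(a * f + (1 - a) * b) for f, b in zip(sprite[r][c], out[y][x])
                )
    return out
```

For instance, an initial point (1, 1) with offset (2, 1) gives the target point (3, 2), where the second object is then drawn over the second video frame image.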

Optionally, because the third object and the second object may differ in size, the method further includes scaling the second object according to the third object before superimposing the second object as a foreground image on the third object in the second video frame image. The scaling ratio of the second object can be determined from the ratio of the distance between two predetermined keypoints on the third object to the distance between the corresponding two predetermined keypoints on the first object: if the third object is larger than the first object, the second object is enlarged; if the third object is smaller than the first object, the second object is reduced. After this step, the second object superimposed on the third object in the second video frame image is the scaled second object.
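The scaling rule can be sketched as follows. The ratio follows the text directly; the nearest-neighbour resampling is an assumed, deliberately simple choice (any resampling method would do), and the function names are hypothetical.

```python
import math


def scale_factor(third_keypoints, first_keypoints):
    """Ratio of the preset keypoint distance on the third object to that on the first object."""
    return math.dist(*third_keypoints) / math.dist(*first_keypoints)


def resize_nearest(img, factor):
    """Nearest-neighbour resize of a 2D list (rows of pixels) by `factor` (>1 enlarges, <1 reduces)."""
    h, w = len(img), len(img[0])
    nh, nw = max(1, round(h * factor)), max(1, round(w * factor))
    return [
        [img[min(h - 1, int(r / factor))][min(w - 1, int(c / factor))] for c in range(nw)]
        for r in range(nh)
    ]
```

If the eye distance on the third object is 120 pixels and on the first object 60 pixels, the factor is 2.0 and the second object is doubled in size.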

Furthermore, after the second object (the target object) is overlaid on the three-dimensional third object (the 3D virtual object), its colors may differ from those of the third object. Further color processing can eliminate this difference and make the superimposition of the second object on the third object look more natural. To this end, after step S103, the method may further include:
converting the color space of the second object and the third object in the third video frame image from the RGB (Red, Green, Blue) color space to the HSV (Hue, Saturation, Value) color space;
replacing the value of the H channel of the second object in the HSV color space with the value of the H channel of the third object;
converting the color space of the second object and the third object from the HSV space back to the RGB color space to obtain a fourth video frame image.

Through these steps, the RGB space is converted into the HSV space, and the color of the second object is converted to the color of the third object at the target position while the original saturation and value (brightness) of the second object are preserved. This increases the degree of fusion between the second object and the third object, so that the second object appears to be part of the third object and the result looks more realistic.
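The hue-replacement steps can be sketched with Python's standard `colorsys` module, which works on floats in [0, 1]. This assumes a per-pixel reading of "the H channel of the third object" (each second-object pixel takes the hue of the third-object pixel beneath it); the function name is hypothetical.

```python
import colorsys


def transfer_hue(second_pixels, third_pixels):
    """Give each second-object pixel the hue of the third-object pixel beneath it.

    Pixels are 8-bit (r, g, b) tuples; both lists are assumed to be aligned and of equal
    length. Saturation and value of the second object are kept unchanged.
    """
    out = []
    for (r2, g2, b2), (r3, g3, b3) in zip(second_pixels, third_pixels):
        _, s, v = colorsys.rgb_to_hsv(r2 / 255, g2 / 255, b2 / 255)  # keep S and V
        h, _, _ = colorsys.rgb_to_hsv(r3 / 255, g3 / 255, b3 / 255)  # take H from the third object
        r, g, b = colorsys.hsv_to_rgb(h, s, v)  # back to RGB
        out.append((round(r * 255), round(g * 255), round(b * 255)))
    return out
```

A pure red pixel over a green surface becomes green with its original brightness, while a grey pixel (zero saturation) stays grey, since hue has no effect when saturation is zero.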

Embodiments of the present disclosure disclose an image processing method, an apparatus, an electronic device, and a computer-readable storage medium. The image processing method includes: identifying a first object in a first video frame image and a second object located within the first object; overlaying, based on the position of the first object in the first video frame image, a third object as a foreground image on the first video frame image to obtain a second video frame image in which the third object covers the first object; and superimposing, based on the position of the second object in the first video frame image, the second object as a foreground image on the third object in the second video frame image to obtain a third video frame image. This method solves the technical problem in the prior art that an effect overlaid on an object lacks realism because it cannot represent the original features of the object.

Although the steps in the above method embodiments have been described in the above order, it will be obvious to those skilled in the art that the steps in the embodiments of the present disclosure need not be performed in that order; they may also be performed in other orders, such as in reverse, in parallel, or interleaved. Furthermore, on the basis of the above steps, those skilled in the art may add other steps. Such obvious modifications or equivalent substitutions also fall within the protection scope of the present disclosure and are not further described here.

FIG. 5 is a schematic structural diagram of an embodiment of an image processing apparatus according to an embodiment of the present disclosure. As shown in FIG. 5, the apparatus 500 includes an object identification module 501, a second video frame image generation module 502, and a third video frame generation module 503, wherein:
the object identification module 501 is configured to identify a first object in a first video frame image and a second object located within the first object;
the second video frame image generation module 502 is configured to overlay, based on the position of the first object in the first video frame image, a third object as a foreground image on the first video frame image to obtain a second video frame image in which the third object covers the first object;
the third video frame generation module 503 is configured to superimpose, based on the position of the second object in the first video frame image, the second object as a foreground image on the third object in the second video frame image to obtain a third video frame image.

Further, the object identification module 501 is further configured for:
identifying a plurality of first keypoints of the first object and a plurality of second keypoints of the second object in the first video frame image;
identifying the first object based on the plurality of first keypoints of the first object;
identifying the second object based on the plurality of second keypoints of the second object, the second keypoints being edge keypoints of the second object.
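One way to turn the edge keypoints of a second object (for example, the mouth contour) into a pixel region is to treat them as a polygon and fill it. The sketch below assumes the keypoints are ordered along the contour and uses an even-odd ray-casting test at pixel centres; the disclosure does not prescribe this method, and the function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd (ray casting) test: is (x, y) inside the polygon formed by the edge keypoints?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def region_mask(edge_keypoints, width, height):
    """Boolean mask (rows of bools) of the second object, sampled at pixel centres."""
    return [
        [point_in_polygon(c + 0.5, r + 0.5, edge_keypoints) for c in range(width)]
        for r in range(height)
    ]
```

A square contour with corners (1, 1) and (4, 4) marks the 3 x 3 block of pixels whose centres lie inside it.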

Further, the second video frame image generation module 502 is configured for:
calculating a positioning point of the first object in the first video frame image based on the plurality of first keypoints of the first object in the first video frame image;
aligning the positioning point of the third object with the positioning point of the first object in the first video frame image, thereby overlaying the third object as a foreground image on the first video frame image to obtain the second video frame image.
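The alignment can be sketched as a translation between two positioning points. The text does not say how the positioning point is computed from the keypoints; taking their centroid is an assumption made here for illustration, and the function names are hypothetical.

```python
def positioning_point(keypoints):
    """Hypothetical positioning point: the centroid of the (x, y) keypoints."""
    xs = [p[0] for p in keypoints]
    ys = [p[1] for p in keypoints]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def alignment_translation(third_point, first_point):
    """Translation (tx, ty) that moves the third object's positioning point onto the first object's."""
    return (first_point[0] - third_point[0], first_point[1] - third_point[1])
```

Adding the returned translation to every pixel coordinate of the third object places it over the first object before it is composited as the foreground.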

Further, the second video frame image generation module 502 is further configured for:
calculating a first distance between two preset keypoints on the first object;
calculating the offset amount of the second object based on the first distance;
superimposing, based on the position of the second object in the first video frame image and the offset amount of the second object, the second object as a foreground image on the third object in the second video frame image to obtain the third video frame image.

Further, calculating the offset amount of the second object based on the first distance includes:
obtaining the yaw angle and the pitch angle of the first object, where the yaw angle is the horizontal included angle between the orientation of the first object in the first video frame image and the lens shooting direction, and the pitch angle is the vertical included angle between the orientation of the first object in the first video frame image and the lens shooting direction;
calculating the offset amount of the second object based on the first distance, the yaw angle, and the pitch angle.

Further, superimposing, based on the position of the second object in the first video frame image and the offset amount of the second object, the second object as a foreground image on the third object in the second video frame image to obtain the third video frame image includes:
shifting the initial positioning point of the second object in the first video frame image according to the offset amount of the second object to obtain a target positioning point;
rendering, in the second video frame image, the second object as a foreground image at the position of the target positioning point to obtain the third video frame image.

Further, the image processing apparatus 500 further includes a fourth video image generation module configured for: converting the color space of the second object and the third object in the third video frame image from the RGB color space to the HSV color space; replacing the value of the H channel of the second object in the HSV color space with the value of the H channel of the third object; and converting the color space of the second object and the third object from the HSV space back to the RGB color space to obtain a fourth video frame image.
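The module structure of apparatus 500 can be sketched as a small Python class that chains the modules in order. The class name, field names, and the idea of injecting each module as a callable are illustrative assumptions rather than the disclosure's implementation; the optional `recolor` step stands in for the fourth video image generation module.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

Frame = Any  # placeholder for whatever image type the pipeline uses


@dataclass
class ImageProcessingApparatus:
    """Hypothetical sketch of apparatus 500 with each module injected as a callable."""

    # object identification module 501: first frame -> (first object, second object)
    identify: Callable[[Frame], Tuple[Any, Any]]
    # second video frame image generation module 502: (first frame, first object) -> second frame
    overlay_third: Callable[[Frame, Any], Frame]
    # third video frame generation module 503: (second frame, second object) -> third frame
    superimpose_second: Callable[[Frame, Any], Frame]
    # optional fourth video image generation module (hue transfer); identity by default
    recolor: Callable[[Frame], Frame] = lambda frame: frame

    def process(self, first_frame: Frame) -> Frame:
        first_obj, second_obj = self.identify(first_frame)
        second_frame = self.overlay_third(first_frame, first_obj)
        third_frame = self.superimpose_second(second_frame, second_obj)
        return self.recolor(third_frame)
```

Injecting the modules keeps the pipeline order fixed while letting each stage (keypoint detection, overlay, offset rendering, recoloring) be swapped or tested independently.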

The apparatus shown in FIG. 5 can perform the method of the embodiments shown in FIGS. 1 to 4. For parts not described in detail in this embodiment, and for the execution procedure and technical effects of this technical solution, refer to the description of the embodiments shown in FIGS. 1 to 4; they are not repeated here.

Reference is now made to FIG. 6, which shows a schematic structural diagram of an electronic device 600 (e.g., the terminal device or server in FIG. 1) suitable for implementing embodiments of the present disclosure. Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and in-vehicle terminals (e.g., in-vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 6 is merely an example and should not impose any limitation on the functionality or scope of use of the embodiments of the present disclosure.

As shown in FIG. 6, the electronic device 600 may include a processing device 601 (e.g., a central processing unit, a graphics processor, etc.), which can perform various appropriate operations and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600. The processing device 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; output devices 607 including, for example, a liquid crystal display (LCD), a speaker, and an oscillator; storage devices 608 including, for example, a magnetic tape and a hard disk; and a communication device 609. The communication device 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. Although FIG. 6 shows the electronic device 600 with various devices, it should be understood that it is not required to implement or include all of the devices shown; more or fewer devices may alternatively be implemented or included.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the methods illustrated in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communication device 609, installed from the storage device 608, or installed from the ROM 602. When the computer program is executed by the processing device 601, the above functions defined in the methods of the embodiments of the present disclosure are performed.

It should be noted that the computer-readable medium described in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, an electrical connection having one or more wires, a portable computer magnetic disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted over any suitable medium, including, but not limited to, an electrical wire, an optical cable, RF (radio frequency), or any suitable combination thereof.

In some embodiments, clients and servers may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected through digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (e.g., the Internet), end-to-end networks (e.g., ad hoc end-to-end networks), and any currently known or future-developed network.

The above computer-readable medium may be included in the above electronic device, or it may exist separately without being installed in the electronic device.

The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: identify a first object in a first video frame image and a second object located within the first object; overlay, based on the position of the first object in the first video frame image, a third object as a foreground image on the first video frame image to obtain a second video frame image in which the third object covers the first object; and superimpose, based on the position of the second object in the first video frame image, the second object as a foreground image on the third object in the second video frame image to obtain a third video frame image.

Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including, but not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (e.g., through the Internet using an Internet service provider).

The flowcharts and block diagrams in the accompanying drawings illustrate the possible system architectures, functionality, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should further be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units involved in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself.

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and the like.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof.

An embodiment of the present disclosure further provides a computer program which, when executed by a computer, causes the computer to perform the image processing method according to the embodiments of the present disclosure.

According to one or more embodiments of the present disclosure, an image processing method is provided, comprising:
identifying a first object in a first video frame image and a second object located within the first object;
overlaying, based on the position of the first object in the first video frame image, a third object as a foreground image on the first video frame image to obtain a second video frame image, wherein the third object covers the first object in the second video frame image; and
superimposing, based on the position of the second object in the first video frame image, the second object as a foreground image on the third object in the second video frame image to obtain a third video frame image.
Further, identifying the first object in the first video frame image and the second object located within the first object comprises:
identifying a plurality of first keypoints of the first object and a plurality of second keypoints of the second object in the first video frame image;
identifying the first object based on the plurality of first keypoints of the first object; and
identifying the second object based on the plurality of second keypoints of the second object, wherein the plurality of second keypoints are edge keypoints of the second object.
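The text does not say how the second object's region is derived from its edge keypoints. The following is a minimal sketch of one option. It assumes the edge keypoints are ordered along the object's contour, so that they form a closed polygon, and rasterizes that polygon into a binary mask with an even-odd ray-casting test. The function names and the (x, y) pixel-coordinate convention are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # hypothetical convention: (x, y) in pixels


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Even-odd rule: cast a ray from (x, y) toward +x and count edge crossings."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray,
        # which also guarantees y2 != y1 in the division below.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def mask_from_edge_keypoints(edge_keypoints: List[Point],
                             width: int, height: int) -> List[List[int]]:
    """Binary mask (1 = inside the second object), sampled at pixel centres."""
    return [
        [1 if point_in_polygon(c + 0.5, r + 0.5, edge_keypoints) else 0
         for c in range(width)]
        for r in range(height)
    ]
```

Sampling at pixel centres (`c + 0.5`) avoids ambiguous results for pixels whose corners lie exactly on a keypoint.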

Further, overlaying, based on the position of the first object in the first video frame image, the third object as a foreground image on the first video frame image to obtain the second video frame image comprises:
calculating a positioning point of the first object in the first video frame image based on the plurality of first keypoints of the first object in the first video frame image; and
aligning a positioning point of the third object with the positioning point of the first object in the first video frame image, so as to overlay the third object as a foreground image on the first video frame image and obtain the second video frame image.
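The text does not define how the positioning point is calculated from the first keypoints. The sketch below makes two assumptions for illustration:

* the positioning point is the centroid of the keypoints;
* the third object carries its own positioning point in its local coordinates.

Under those assumptions, alignment is an integer translation followed by ordinary alpha compositing. Images are plain grayscale lists of floats to keep the sketch dependency-free. All names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # (x, y)
Image = List[List[float]]     # rows of grayscale values in [0, 1]


def positioning_point(keypoints: List[Point]) -> Point:
    """Hypothetical choice: the centroid (mean) of the keypoints."""
    n = len(keypoints)
    return (sum(p[0] for p in keypoints) / n, sum(p[1] for p in keypoints) / n)


def overlay_aligned(frame: Image, sticker: Image, sticker_alpha: Image,
                    sticker_anchor: Point, frame_anchor: Point) -> Image:
    """Alpha-composite `sticker` onto a copy of `frame` so that `sticker_anchor`
    (sticker coordinates) lands on `frame_anchor` (frame coordinates)."""
    out = [row[:] for row in frame]  # leave the input frame untouched
    dx = round(frame_anchor[0] - sticker_anchor[0])
    dy = round(frame_anchor[1] - sticker_anchor[1])
    for sy, row in enumerate(sticker):
        fy = sy + dy
        if not 0 <= fy < len(out):
            continue  # sticker row falls outside the frame
        for sx, value in enumerate(row):
            fx = sx + dx
            if 0 <= fx < len(out[fy]):
                a = sticker_alpha[sy][sx]
                # Foreground over background: a * fg + (1 - a) * bg
                out[fy][fx] = a * value + (1.0 - a) * out[fy][fx]
    return out
```

A fully opaque alpha makes the third object completely cover the first object inside its footprint, which matches the "covers" wording above.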

Further, superimposing, based on the position of the second object in the first video frame image, the second object as a foreground image on the third object in the second video frame image to obtain the third video frame image comprises:
calculating a first distance between two preset keypoints on the first object;
calculating an offset amount of the second object based on the first distance; and
superimposing, based on the position of the second object in the first video frame image and the offset amount of the second object, the second object as a foreground image on the third object in the second video frame image to obtain the third video frame image.
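The claims refine this step in three ways:

* the offset also depends on the first object's yaw angle and pitch angle;
* the offset shifts the second object's initial positioning point to a target positioning point;
* the second object is rendered at that target point.

No formula is given. The sketch below assumes one simple model purely for illustration. The shift is proportional to the first distance, so it tracks the apparent size of the first object, and to the sine of each angle, so it vanishes when the first object faces the lens. The coefficients `kx` and `ky` are hypothetical tuning constants.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y)


def first_distance(p: Point, q: Point) -> float:
    """Euclidean distance between the two preset keypoints on the first object."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def offset_amount(distance: float, yaw_deg: float = 0.0, pitch_deg: float = 0.0,
                  kx: float = 0.1, ky: float = 0.1) -> Point:
    """Hypothetical model: horizontal shift from yaw, vertical shift from pitch,
    both scaled by the first distance."""
    dx = kx * distance * math.sin(math.radians(yaw_deg))
    dy = ky * distance * math.sin(math.radians(pitch_deg))
    return (dx, dy)


def target_positioning_point(initial: Point, offset: Point) -> Point:
    """Shift the second object's initial positioning point by the offset."""
    return (initial[0] + offset[0], initial[1] + offset[1])
```

Scaling by the first distance rather than by fixed pixel counts keeps the shift consistent when the first object moves closer to or farther from the camera.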

Further, after the third video frame image is obtained, the method further comprises:
converting the color space of the second object and the third object in the third video frame image from the RGB color space to the HSV color space;
replacing the values of the H channel of the second object in the HSV color space with the values of the H channel of the third object; and
converting the color space of the second object and the third object from the HSV color space back to the RGB color space to obtain a fourth video frame image.
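The hue replacement can be sketched per pixel with Python's standard `colorsys` module, which converts between RGB and HSV with all channels as floats in [0, 1]. The sketch assumes the third object's own pixel layer is still available beneath the second object, so each second-object pixel takes its hue from the third-object pixel at the same location. The second object keeps its saturation and value, so its shading survives while its color matches the third object. The function names and the mask input are hypothetical.

```python
import colorsys
from typing import List, Tuple

RGB = Tuple[int, int, int]  # 8-bit channels


def transfer_hue(second_px: RGB, third_px: RGB) -> RGB:
    """Keep S and V of the second object's pixel; take H from the third object's pixel."""
    _, s, v = colorsys.rgb_to_hsv(*(c / 255.0 for c in second_px))
    h, _, _ = colorsys.rgb_to_hsv(*(c / 255.0 for c in third_px))
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return (round(r * 255), round(g * 255), round(b * 255))


def recolor_second_object(frame: List[List[RGB]], third_layer: List[List[RGB]],
                          mask: List[List[int]]) -> List[List[RGB]]:
    """Apply the hue transfer wherever mask == 1 (pixels of the second object)."""
    return [
        [transfer_hue(px, third_layer[y][x]) if mask[y][x] else px
         for x, px in enumerate(row)]
        for y, row in enumerate(frame)
    ]
```

For example, a pure-red second-object pixel over a pure-blue third object becomes pure blue, because red and blue share S = V = 1 and only the hue changes.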

According to one or more embodiments of the present disclosure, an image processing apparatus is provided, comprising:
an object identification module configured to identify a first object in a first video frame image and a second object located within the first object;
a second video frame image generation module configured to overlay, based on the position of the first object in the first video frame image, a third object as a foreground image on the first video frame image to obtain a second video frame image in which the third object covers the first object; and
a third video frame image generation module configured to superimpose, based on the position of the second object in the first video frame image, the second object as a foreground image on the third object in the second video frame image to obtain a third video frame image.
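The data flow between the three modules can be sketched as a small composition of injected callables. The detail worth noticing is that the third module reads the second object's position from the *first* frame but draws into the *second* frame. The class name, field names, and the opaque `Frame` type are hypothetical placeholders, not an API from the disclosure.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

Frame = Any  # placeholder for whatever image type a real pipeline uses


@dataclass
class ImageProcessingApparatus:
    """Hypothetical wiring of the three modules as injected callables."""
    identify: Callable[[Frame], Tuple[Any, Any]]              # object identification module
    overlay_third: Callable[[Frame, Any], Frame]              # second video frame image generation module
    superimpose_second: Callable[[Frame, Frame, Any], Frame]  # third video frame image generation module

    def process(self, first_frame: Frame) -> Frame:
        first_obj, second_obj = self.identify(first_frame)
        second_frame = self.overlay_third(first_frame, first_obj)
        # Position comes from the first frame; pixels are written into the second frame.
        return self.superimpose_second(first_frame, second_frame, second_obj)
```

Injecting the modules keeps each one independently replaceable, for example swapping the keypoint detector without touching the compositing code.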

Further, the object identification module is further configured to:
identify a plurality of first keypoints of the first object and a plurality of second keypoints of the second object in the first video frame image;
identify the first object based on the plurality of first keypoints of the first object; and
identify the second object based on the plurality of second keypoints of the second object, wherein the plurality of second keypoints are edge keypoints of the second object.

Further, the second video frame image generation module is configured to:
calculate a positioning point of the first object in the first video frame image based on the plurality of first keypoints of the first object in the first video frame image; and
align a positioning point of the third object with the positioning point of the first object in the first video frame image, so as to overlay the third object as a foreground image on the first video frame image and obtain the second video frame image.

Further, the second video frame image generation module is further configured to:
calculate a first distance between two preset keypoints on the first object;
calculate an offset amount of the second object based on the first distance; and
superimpose, based on the position of the second object in the first video frame image and the offset amount of the second object, the second object as a foreground image on the third object in the second video frame image to obtain the third video frame image.

Further, the image processing apparatus further comprises a fourth video image generation module configured to:
convert the color space of the second object and the third object in the third video frame image from the RGB color space to the HSV color space;
replace the values of the H channel of the second object in the HSV color space with the values of the H channel of the third object; and
convert the color space of the second object and the third object from the HSV color space back to the RGB color space to obtain a fourth video frame image.

According to one or more embodiments of the present disclosure, a computer program is provided which, when executed by a computer, causes the computer to perform the image processing method according to the embodiments of the present disclosure.
The above is merely a description of the embodiments of the present disclosure and of the technical principles applied. It will be apparent to those skilled in the art that the scope of this disclosure is not limited to technical solutions formed by the specific combination of the above technical features. It should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
In addition, although the operations are described in a particular order, this should not be understood as requiring that they be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above description, they should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims

1. An image processing method, comprising:
identifying a first object in a first video frame image and a second object located within the first object;
overlaying, based on the position of the first object in the first video frame image, a third object as a foreground image on the first video frame image to obtain a second video frame image, wherein the third object covers the first object in the second video frame image; and
superimposing, based on the position of the second object in the first video frame image, the second object as a foreground image on the third object in the second video frame image to obtain a third video frame image.

2. The image processing method according to claim 1, wherein identifying the first object in the first video frame image and the second object located within the first object comprises:
identifying a plurality of first keypoints of the first object and a plurality of second keypoints of the second object in the first video frame image;
identifying the first object based on the plurality of first keypoints of the first object; and
identifying the second object based on the plurality of second keypoints of the second object, wherein the plurality of second keypoints are edge keypoints of the second object.

3. The image processing method according to claim 2, wherein overlaying, based on the position of the first object in the first video frame image, the third object as a foreground image on the first video frame image to obtain the second video frame image comprises:
calculating a positioning point of the first object in the first video frame image based on the plurality of first keypoints of the first object in the first video frame image; and
aligning a positioning point of the third object with the positioning point of the first object in the first video frame image, so as to overlay the third object as a foreground image on the first video frame image and obtain the second video frame image.

4. The image processing method, wherein superimposing, based on the position of the second object in the first video frame image, the second object as a foreground image on the third object in the second video frame image to obtain the third video frame image comprises:
calculating a first distance between two preset keypoints on the first object;
calculating an offset amount of the second object based on the first distance; and
superimposing, based on the position of the second object in the first video frame image and the offset amount of the second object, the second object as a foreground image on the third object in the second video frame image to obtain the third video frame image.

5. The image processing method according to claim 4, wherein calculating the offset amount of the second object based on the first distance comprises:
obtaining a yaw angle and a pitch angle of the first object, wherein the yaw angle is the horizontal included angle between the orientation of the first object in the first video frame image and the shooting direction of the lens, and the pitch angle is the vertical included angle between the orientation of the first object in the first video frame image and the shooting direction of the lens; and
calculating the offset amount of the second object based on the first distance, the yaw angle, and the pitch angle.

6. The image processing method, wherein superimposing, based on the position of the second object in the first video frame image and the offset amount of the second object, the second object as a foreground image on the third object in the second video frame image to obtain the third video frame image comprises:
shifting, based on the offset amount of the second object, an initial positioning point of the second object in the first video frame image to obtain a target positioning point; and
rendering the second object as a foreground image at the position of the target positioning point in the second video frame image to obtain the third video frame image.

7. The image processing method, further comprising, after obtaining the third video frame image:
converting the color space of the second object and the third object in the third video frame image from the RGB color space to the HSV color space;
replacing the values of the H channel of the second object in the HSV color space with the values of the H channel of the third object; and
converting the color space of the second object and the third object from the HSV color space back to the RGB color space to obtain a fourth video frame image.

8. An image processing apparatus, comprising:
an object identification module configured to identify a first object in a first video frame image and a second object located within the first object;
a second video frame image generation module configured to overlay, based on the position of the first object in the first video frame image, a third object as a foreground image on the first video frame image to obtain a second video frame image in which the third object covers the first object; and
a third video frame image generation module configured to superimpose, based on the position of the second object in the first video frame image, the second object as a foreground image on the third object in the second video frame image to obtain a third video frame image.

9. The image processing apparatus according to claim 8, wherein the object identification module is further configured to:
identify a plurality of first keypoints of the first object and a plurality of second keypoints of the second object in the first video frame image;
identify the first object based on the plurality of first keypoints of the first object; and
identify the second object based on the plurality of second keypoints of the second object, wherein the plurality of second keypoints are edge keypoints of the second object.

10. The image processing apparatus according to claim 9, wherein the second video frame image generation module is configured to:
calculate a positioning point of the first object in the first video frame image based on the plurality of first keypoints of the first object in the first video frame image; and
align a positioning point of the third object with the positioning point of the first object in the first video frame image, so as to overlay the third object as a foreground image on the first video frame image and obtain the second video frame image.

11. The image processing apparatus according to any one of claims 8 to 10, wherein the second video frame image generation module is further configured to:
calculate a first distance between two preset keypoints on the first object;
calculate an offset amount of the second object based on the first distance; and
superimpose, based on the position of the second object in the first video frame image and the offset amount of the second object, the second object as a foreground image on the third object in the second video frame image to obtain the third video frame image.

12. The image processing apparatus according to claim 11, wherein calculating the offset amount of the second object based on the first distance comprises:
obtaining a yaw angle and a pitch angle of the first object, wherein the yaw angle is the horizontal included angle between the orientation of the first object in the first video frame image and the shooting direction of the lens, and the pitch angle is the vertical included angle between the orientation of the first object in the first video frame image and the shooting direction of the lens; and
calculating the offset amount of the second object based on the first distance, the yaw angle, and the pitch angle.

13. The image processing apparatus, wherein superimposing, based on the position of the second object in the first video frame image and the offset amount of the second object, the second object as a foreground image on the third object in the second video frame image to obtain the third video frame image comprises:
shifting, based on the offset amount of the second object, an initial positioning point of the second object in the first video frame image to obtain a target positioning point; and
rendering the second object as a foreground image at the location of the target positioning point in the second video frame image to obtain the third video frame image.

14. The image processing apparatus according to any one of claims 8 to 13, further comprising a fourth video image generation module configured to:
convert the color space of the second object and the third object in the third video frame image from the RGB color space to the HSV color space;
replace the values of the H channel of the second object in the HSV color space with the values of the H channel of the third object; and
convert the color space of the second object and the third object from the HSV color space back to the RGB color space to obtain a fourth video frame image.

15. An electronic device, comprising:
a memory configured to store computer-readable instructions; and
a processor configured to execute the computer-readable instructions, wherein the instructions, when executed by the processor, cause the processor to implement the image processing method according to any one of claims 1 to 7.

16. A non-transitory computer-readable storage medium for storing computer-readable instructions which, when executed by a computer, cause the computer to perform the image processing method according to any one of claims 1 to 7.

17. A computer program which, when executed by a computer, causes the computer to perform the image processing method according to any one of claims 1 to 7.