TW202420241A - Image processing method and electronic device - Google Patents
Image processing method and electronic device
- Publication number
- TW202420241A (application TW111142674A)
- Authority
- TW
- Taiwan
- Prior art keywords
- width
- virtual camera
- image
- virtual
- processing method
- Prior art date
Description
The present disclosure relates to an image processing method, and more particularly to an image processing method for a virtual scene and to an electronic device.
In today's environment, the proportion of people asked to work remotely has risen sharply, which has made many practitioners aware of the benefits of remote collaboration. Collaborative settings are popular for many reasons; one is that collaborators no longer need to spend hours or days traveling to meet in person, and can instead communicate and collaborate reliably in a virtual world.
In some cases, however, collaborators must own specific equipment to collaborate effectively with others in the virtual world, for example a virtual-reality head-mounted display that tracks head movement. Such head-mounted displays are typically paired with a set of controllers equipped with tracking sensors.
Furthermore, virtual applications usually require fairly powerful computing hardware to ensure a comfortable experience. Because of these issues, a large gap remains between collaborators joining from the real world and those in the virtual world.
Therefore, how to reduce the need for high-end equipment to analyze head pose, while ensuring that collaborators can work together in a relaxed and immersive collaboration system, is an important issue in this field.
The present disclosure provides an image processing method including the following steps: analyzing a plurality of facial feature points of a facial frame; calculating a feature width from the facial feature points and analyzing a head pose from the facial feature points; updating the feature width according to the head pose to produce an updated width; calculating a scale ratio of the updated width relative to an initial width; controlling a shooting distance of a virtual camera in a virtual scene according to the scale ratio; and sampling a two-dimensional image from the virtual scene according to the shooting distance of the virtual camera.
The present disclosure also provides an electronic device including an image sensor, a processor and a display. The image sensor captures images. The processor is electrically coupled to the image sensor and the display, and performs the following steps: analyzing a plurality of facial feature points of a facial frame in an image; calculating a feature width from the facial feature points and analyzing a head pose from the facial feature points; updating the feature width according to the head pose to produce an updated width; calculating a scale ratio of the updated width relative to an initial width; controlling a shooting distance of a virtual camera in a virtual scene according to the scale ratio; and sampling a two-dimensional image from the virtual scene according to the shooting distance. The display displays the two-dimensional image.
In summary, the image processing method and electronic device of the present disclosure control the shooting distance of a virtual camera by analyzing the facial position in an image, and convert the three-dimensional scene into a two-dimensional image according to the virtual camera's field of view, thereby providing users with an immersive experience in video conferencing, self-portrait imaging, scenery presentation and other interactive image processing applications.
The following embodiments are described in detail with reference to the accompanying drawings, but the embodiments provided are not intended to limit the scope of the disclosure, and the description of structural operation is not intended to limit the order of execution. Any device produced by recombining components that yields equivalent functionality falls within the scope of the disclosure. In addition, the drawings are for illustration only and are not drawn to scale. For ease of understanding, identical or similar elements are labeled with the same reference symbols in the following description.
Unless otherwise noted, terms used throughout the specification and claims have their ordinary meaning in the art, within the context of this disclosure, and in the specific context in which each term is used.
In addition, the terms "comprise", "include", "have", "contain" and the like used herein are open-ended terms meaning "including but not limited to". Furthermore, "and/or" as used herein includes any one of the listed items and all combinations of one or more of them.
Herein, when an element is referred to as being "coupled" or "connected", it may mean "electrically coupled" or "electrically connected". "Coupled" or "connected" may also indicate that two or more elements cooperate or interact with each other. In addition, although the terms "first", "second" and so on are used herein to describe different elements, these terms serve only to distinguish elements or operations described with the same technical terms.
Please refer to FIG. 1 and FIG. 2. FIG. 1 is a schematic diagram of the electronic device 100 capturing an image IMG according to some embodiments of the present disclosure, and FIG. 2 is a schematic diagram of the electronic device 100 according to some embodiments of the present disclosure. In some embodiments, the electronic device 100 may be implemented as a personal computer, tablet, smartphone or other electronic device with image sensing and computing capability. In some embodiments, the electronic device 100 includes an image sensor 110, a processor 120, a memory device 130 and a display 140.
In some embodiments, the present disclosure uses widely deployed webcams to obtain video data from remote collaborators, and enhances the immersive experience by relating the local user's head movement to the virtual world through the electronic device 100.
To achieve this, a facial-feature-detection neural network model is used to estimate the head pose from webcam images of the local user. The estimated head pose is linked to the motion of the virtual camera to reduce the gap between events occurring in real time in the real world and in the virtual world, yielding a more immersive user experience without additionally configured capture or image sensing devices.
In other words, a laptop, mobile phone, tablet, desktop computer or other electronic device with computing, display and image sensing capability that renders the virtual world of a three-dimensional scene for collaboration can serve as a portal or window into the virtual world.
In some embodiments, using the electronic device 100 of the present disclosure as the interface for interacting with the virtual world reduces the need to wear head-mounted equipment that senses head pose, so that a three-dimensional viewing experience can be achieved on the display of a general-purpose electronic device (for example, an ordinary laptop, mobile phone or desktop computer) without a head-mounted device with Virtual Reality or Augmented Reality capability (for example, a head-mounted display).
In some embodiments, the electronic device 100 of the present disclosure may also be used together with Virtual Reality or Augmented Reality technology; the disclosure is not limited in this regard.
The processor 120 may be implemented as a central processing unit, a microprocessor, a graphics processor, a Field-Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), or another hardware device suitable for fetching or executing instructions stored in the memory device 130. The memory device 130 may be implemented as an electrical, magnetic or optical memory device, or another storage device that stores instructions or data.
The memory device 130 may be implemented as volatile or non-volatile memory. In some embodiments, the memory device 130 may be implemented as Random Access Memory (RAM), Dynamic Random Access Memory (DRAM), Magnetoresistive Random Access Memory (MRAM), Phase-Change Random Access Memory (PCRAM), or another storage device. The memory device 130 stores data and instructions for the processor 120 to access and operate on.
The image sensor 110 may be implemented as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor, a CCD (Charge-Coupled Device) image sensor, or another light-sensing component or light-sensing device. The image sensor 110 is electrically coupled to the processor 120.
The image sensor 110 of the electronic device 100 captures the image IMG.
When the user enters the field of view FOV of the image sensor 110, the electronic device 100 uses a neural network to detect the facial frame Ff in the image IMG captured by the image sensor 110, so that the electronic device 100 can analyze the depth variation of the user's face along the axis Ay and its displacement in the vertical plane perpendicular to the axis Ay, and adjust the image shown on the display 140 according to this depth variation and planar displacement. How the processor 120 analyzes the depth variation and displacement of the user's face and adjusts the displayed image is described in detail in the following embodiments.
Please refer to FIG. 1 through FIG. 10B together. FIG. 3 is a flowchart of the image processing method 200 according to some embodiments of the present disclosure. FIG. 4 is a schematic diagram of the convolutional neural network CNN according to some embodiments of the present disclosure. FIG. 5 is a schematic diagram of the specification of the convolutional neural network CNN of FIG. 4. FIG. 6 is a schematic diagram of the user's head pose HP. FIG. 7 is a schematic diagram of the facial feature points FPs of FIG. 4. FIG. 8 is a schematic diagram of the center position f_c and the feature width f_w of the user's face. FIG. 9A and FIG. 9B are schematic diagrams of the feature width f_w of the user's face and the updated width f_w'. FIG. 10A and FIG. 10B are schematic diagrams of the virtual camera in the virtual scene according to some embodiments of the present disclosure.
As shown in FIG. 3, the image processing method 200 includes steps S210 to S270. Step S210 is performed by the image sensor 110, step S270 by the display 140, and steps S220 to S260 by the processor 120.
In step S210, a two-dimensional image is captured by the image sensor 110: the image sensor 110 of the electronic device 100 captures the two-dimensional image IMG, as shown in FIG. 1. When the user enters the field of view of the image sensor 110, the electronic device 100 uses a neural network to detect the facial frame Ff in the captured image IMG.
In step S220, the facial feature points FPs of the facial frame Ff in the image IMG are analyzed. In some embodiments, the processor 120 analyzes the facial feature points FPs of the facial frame Ff in the image IMG using a facial-feature neural network model CNN. In some embodiments, the model CNN is implemented as a convolutional neural network architecture comprising eight convolutional layers cov_1 to cov_8, two fully connected layers dense_1 and dense_2, and four pooling layers pool_1 to pool_4. The input of this architecture is the image IMG, and for one facial frame Ff it outputs the (x, z) coordinates of 68 facial feature points FPs, as shown in FIG. 4, FIG. 5 and FIG. 7.
In step S230, the feature width f_w, the center position f_c and the head pose HP of the facial frame Ff are analyzed from the facial feature points FPs. In some embodiments, the center position f_c is the midpoint between the two eyes and the feature width f_w is the interpupillary distance. In other embodiments, the feature width f_w may instead be computed from other facial feature points FPs as the inner eye distance, the outer eye distance, the palpebral fissure width, the face width, the mandible width, the mouth width or the nose width of the facial frame Ff; the disclosure is not limited in this regard.
For example, if the center position f_c of the face is the midpoint between the two eyes and the feature width f_w is the interpupillary distance, the processor 120 extracts from the facial feature points FPs the inner and outer corner positions p_43 and p_46 of the left eye and the inner and outer corner positions p_37 and p_40 of the right eye.

The processor 120 averages the left-eye corner positions p_43 and p_46 to obtain the left-eye position p_l, and averages the right-eye corner positions p_37 and p_40 to obtain the right-eye position p_r, as expressed by the following formulas:

p_l = (p_43 + p_46) / 2
p_r = (p_37 + p_40) / 2
The processor 120 averages the left-eye position p_l and the right-eye position p_r to obtain the center position f_c of the face, and takes the difference between the left-eye position p_l and the right-eye position p_r as the feature width f_w, as expressed by the following formulas:

f_c = (p_l + p_r) / 2
f_w = ||p_l - p_r||
The processor 120 then computes the displacement d_f of the center position f_c in the two-dimensional plane and the scale ratio r_f of the feature width f_w, as expressed by the following formulas:

d_f = f_c - f_c,0
r_f = f_w / f_w,0
In the above formulas, f_c,0 denotes the initial value of the center position and f_w,0 denotes the initial value of the feature width.
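The landmark geometry above can be sketched as follows. This is a minimal NumPy sketch, assuming the common 1-based 68-point landmark convention (p_37 to p_46 as eye corners, converted here to 0-based array indices) and the Euclidean norm for f_w; the index numbering and the norm choice are assumptions, not taken verbatim from the disclosure.

```python
import numpy as np

def face_metrics(landmarks, f_c0, f_w0):
    """Compute center position, feature width, displacement and scale ratio.

    landmarks: (68, 2) array of (x, z) facial feature points; the indices
    below follow the common 1-based 68-point convention (an assumption),
    converted to 0-based array indices.
    """
    p_r = (landmarks[36] + landmarks[39]) / 2.0   # right eye: corners p_37 and p_40
    p_l = (landmarks[42] + landmarks[45]) / 2.0   # left eye: corners p_43 and p_46
    f_c = (p_l + p_r) / 2.0                       # center position between the eyes
    f_w = float(np.linalg.norm(p_l - p_r))        # interpupillary feature width
    d_f = f_c - np.asarray(f_c0, dtype=float)     # displacement from initial center
    r_f = f_w / f_w0                              # scale ratio vs. initial width
    return f_c, f_w, d_f, r_f
```

A frame where the eyes sit twice as far apart as at setup yields r_f = 2, signaling that the face has moved closer to the sensor.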
Thus, if the user approaches the electronic device 100, the user's face becomes larger in the image IMG captured by the image sensor 110; conversely, if the user moves away from the electronic device 100, the face becomes smaller in the image IMG. The scale ratio r_f of the feature width f_w in the facial frame Ff therefore allows the change in depth/distance between the user and the electronic device 100 to be analyzed and determined.
In some cases, the user does not move toward the electronic device 100 but the head pose HP merely yaws about the vertical axis (for example, the Az axis). Because the image sensor 110 then captures the user's face in profile, the feature width f_w computed from the distance between the two eyes shrinks.
Therefore, in step S240, the feature width f_w is updated according to the head pose HP to produce the updated width f_w', and the updated width f_w' is used to determine the change in depth/distance between the user and the electronic device 100 more accurately.
Specifically, the processor 120 uses a neural network to analyze the facial direction FD of the user's head pose HP from the vector changes between the facial feature points FPs. The yaw of the head pose HP about the vertical axis is converted into the angle θ of the facial direction FD in the horizontal plane relative to the axis Ay, where θ ranges from 90 degrees to -90 degrees.
The processor 120 corrects/updates the feature width f_w of the facial frame Ff according to the yaw angle θ of the user's head pose HP about the vertical axis (for example, the Az axis) to produce the updated width f_w', which can be expressed by the following formula:

f_w' = f_w / cos(θ)
The updated width f_w' is then used in subsequent calculations to determine the distance from the electronic device 100 to the user along the axis Ay more accurately. The processor 120 computes the scale ratio of the updated width f_w' relative to the initial width f_w,0, denoted r_f', as expressed by the following formula:

r_f' = f_w' / f_w,0
In step S243, the shooting distance of the virtual camera VCA in the virtual scene is controlled according to the scale ratio r_f'. If r_f' increases (or is greater than 1), the user's face is relatively close to the electronic device 100, and the processor 120 moves the virtual camera VCA toward the fixed point VP, narrowing the camera's field of view to focus on a local region of the virtual scene VIR, as shown in FIG. 10A. Conversely, if r_f' decreases (or is less than 1), the user's face is relatively far from the electronic device 100, and the processor 120 moves the virtual camera VCA away from the fixed point VP, widening the camera's field of view, as shown in FIG. 10B.
Notably, the virtual scene VIR may be implemented as a three-dimensional virtual scene; in other embodiments, it is implemented as a two-dimensional virtual scene. The disclosure is not limited in this regard.
In some embodiments, the processor 120 may apply a difference calculation with smoothing to the scale ratio r_f', and control the shooting distance of the virtual camera VCA in the virtual scene VIR according to the result of that calculation.
The lateral position and rotational pose of the virtual camera VCA are determined from the center position f_c in the facial frame Ff, as detailed in steps S252 and S253.
For a better understanding, please refer to FIG. 1 through FIG. 3, FIG. 8 and FIG. 11A through FIG. 11C together. FIG. 11A through FIG. 11C are schematic diagrams showing how the displacement d_f of the center position f_c of the user's face relative to the electronic device 100 is mapped to the pose and displacement of the virtual camera VCA in the virtual scene VIR, according to some embodiments of the present disclosure.
In step S252, the displacement d_f of the center position f_c relative to the initial position f_c,0 is computed. In some embodiments, the initial position f_c,0 is set during a setup phase. For example, after entering the setup phase, the electronic device 100 prompts the user to prepare for setting the initial position f_c,0; once the user has adjusted the distance between the face and the electronic device 100 to the most suitable value, the electronic device 100 captures the current frame and computes the center position of the face in it as the initial position f_c,0, as shown in FIG. 11A.
In step S253, the viewing angle of the virtual camera VCA in the virtual scene VIR is controlled according to the displacement d_f. The processor 120 moves the virtual camera VCA along a curved surface built around the fixed point VP according to the displacement, thereby adjusting the camera's viewing angle in the virtual scene VIR.
For example, if the center position f_c in the facial frame Ff moves along the direction Dxp, the virtual camera VCA moves along the direction Dxp and rotates its view counterclockwise about the fixed point VP, adjusting its viewing angle in the virtual scene VIR, as shown in FIG. 11B.
On the other hand, if the center position f_c in the facial frame Ff moves along the direction Dxn, the virtual camera VCA moves along the direction Dxn and rotates its view clockwise about the fixed point VP, adjusting its viewing angle in the virtual scene VIR, as shown in FIG. 11C.
Notably, the processor 120 may apply a difference calculation with smoothing to the displacement d_f of the center position f_c, and control the viewing angle of the virtual camera VCA in the virtual scene VIR according to the result of that calculation.
In the embodiments of FIG. 11B and FIG. 11C, the center position f_c of the face moves along the axis Ax. In some embodiments, the center position f_c may move simultaneously in the vertical plane formed by the axes Ax and Az. The relationship between the displacement of f_c along the axis Az and the shooting angle of the virtual camera VCA is analogous to the relationship between the displacement d_f along the axis Ax and the shooting angle, and is therefore not repeated.
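The arc motion of FIG. 11B/11C can be sketched as follows: the face-center displacement along Ax is mapped to an orbit angle, and the camera is placed on a circle of fixed radius around the fixed point VP while its view stays aimed at VP. The gain, radius and angle clamp are assumed tuning values; the disclosure specifies only the direction of the coupling (Dxp swings the view counterclockwise, Dxn clockwise).

```python
import math

def orbit_camera(d_f_x, vp=(0.0, 0.0), radius=5.0, gain=0.1, max_angle=60.0):
    """Place the virtual camera on an arc about the fixed point VP.

    d_f_x: face-center displacement along Ax. Returns the camera
    position on the arc and its view rotation in degrees; the rotation
    keeps the camera pointed at VP throughout the orbit.
    """
    angle = max(-max_angle, min(max_angle, gain * d_f_x))   # clamp orbit angle
    a = math.radians(angle)
    cam_x = vp[0] + radius * math.sin(a)
    cam_y = vp[1] - radius * math.cos(a)   # camera sits in front of VP at angle 0
    return (cam_x, cam_y), angle           # position and view rotation (deg)
```

At zero displacement the camera sits directly in front of VP; large displacements saturate at the clamp so the camera never orbits behind the scene.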
In step S260, a two-dimensional image is sampled from the virtual scene VIR with the virtual camera VCA. The processor 120 samples the two-dimensional image from the virtual scene VIR according to the aforementioned shooting distance and viewing angle of the virtual camera VCA, using a 3D-to-2D rendering engine that converts the three-dimensional scene into a two-dimensional image.
In step S270, the display 140 shows the two-dimensional image sampled from the virtual scene VIR. In some embodiments, when the field of view of the virtual camera VCA narrows, the image it captures may be enlarged to the size of the screen of the display 140, letting the user focus more clearly on the local region. When the field of view widens, the captured image may be shrunk to the screen size, presenting the global region.
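The rescale-to-screen step can be sketched with a nearest-neighbor resize. This is a minimal stand-in for the rendering-side rescale described above (a real engine would apply filtered resampling); whatever the virtual camera captured is stretched or shrunk to the display resolution.

```python
import numpy as np

def fit_to_display(frame, disp_h, disp_w):
    """Nearest-neighbor rescale of the sampled 2D image to the display size.

    frame: array of shape (h, w) or (h, w, channels). Each display pixel
    is mapped back to the nearest source pixel by integer index scaling.
    """
    h, w = frame.shape[:2]
    rows = np.arange(disp_h) * h // disp_h   # source row for each display row
    cols = np.arange(disp_w) * w // disp_w   # source column for each display column
    return frame[rows][:, cols]
```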
Please refer to FIG. 12A through FIG. 12C, which are schematic diagrams of the user's movement in the image IMG captured by the image sensor 110 and of the image IMG_DIS on the display 140, according to some embodiments of the present disclosure.
As shown in FIG. 12A, when the user moves to the right in the real environment (the user moves to the left in the image IMG because the front camera mirrors its captures), the virtual camera VCA moves to the right and rotates counterclockwise to capture the view in the left part of the space.
As shown in FIG. 12B, when the user has no displacement relative to the initial position f_c,0 in the real environment, the virtual camera VCA captures the view of the central space.
As shown in FIG. 12C, when the user moves to the left in the real environment (the user moves to the right in the image IMG because the front camera mirrors its captures), the virtual camera VCA moves to the left and rotates clockwise to capture the view in the right part of the space.
In summary, the electronic device 100 and the image processing method 200 of the present disclosure control the shooting distance and viewing angle of the virtual camera VCA by analyzing the facial position in a two-dimensional image, and convert the three-dimensional scene into a two-dimensional image according to the field of view of the virtual camera VCA, thereby providing users with an immersive experience in video conferencing, self-portrait imaging, scenery presentation and other interactive image processing applications.
To make the above and other objects, features, advantages and embodiments of the present disclosure more comprehensible, the reference symbols are described as follows:
100: electronic device
110: image sensor
120: processor
130: memory device
140: display
200: image processing method
IMG, IMG_DIS: image
Ff: facial frame
Ax, Ay, Az: axes
FOV: field of view
S210, S220, S230, S241, S242, S243, S252, S253, S260, S270: steps
CNN: facial-feature neural network model
FPs: facial feature points
HP: head pose
p_37: outer corner position of the right eye
p_40: inner corner position of the right eye
p_43: inner corner position of the left eye
p_46: outer corner position of the left eye
p_r: right-eye position
p_l: left-eye position
f_c: center position
f_w: feature width
d_f: displacement
f_w': updated width
FD: facial direction
VP: fixed point
VIR: virtual scene
VCA: virtual camera
Dxp, Dxn: directions
To make the above and other objects, features, advantages and embodiments of the present disclosure more comprehensible, the accompanying drawings are described as follows:
FIG. 1 is a schematic diagram of an electronic device capturing an image according to some embodiments of the present disclosure.
FIG. 2 is a schematic diagram of an electronic device according to some embodiments of the present disclosure.
FIG. 3 is a flowchart of an image processing method according to some embodiments of the present disclosure.
FIG. 4 is a schematic diagram of a convolutional neural network according to some embodiments of the present disclosure.
FIG. 5 is a schematic diagram of the specification of the convolutional neural network of FIG. 4 according to some embodiments of the present disclosure.
FIG. 6 is a schematic diagram of a user's head pose according to some embodiments of the present disclosure.
FIG. 7 is a schematic diagram of the facial feature points of FIG. 4 according to some embodiments of the present disclosure.
FIG. 8 is a schematic diagram of the center position and feature width of a user's face according to some embodiments of the present disclosure.
FIG. 9A and FIG. 9B are schematic diagrams of the feature width of a user's face and the updated width according to some embodiments of the present disclosure.
FIG. 10A and FIG. 10B are schematic diagrams of a virtual camera in a virtual scene according to some embodiments of the present disclosure.
FIG. 11A through FIG. 11C are schematic diagrams of mapping the displacement of the center position of a user's face relative to the electronic device to the pose and displacement of the virtual camera in the virtual scene, according to some embodiments of the present disclosure.
FIG. 12A through FIG. 12C are schematic diagrams of the user's movement in the image captured by the image sensor and of the image on the display, according to some embodiments of the present disclosure.
Domestic deposit information (noted in order of depository institution, date, number): None
Foreign deposit information (noted in order of deposit country, institution, date, number): None
100: electronic device
110: image sensor
120: processor
130: memory device
140: display
IMG: image
Ff: facial frame
Claims (10)
Publications (1)
Publication Number | Publication Date |
---|---|
TW202420241A (en) | 2024-05-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10250800B2 (en) | Computing device having an interactive method for sharing events | |
JP3926837B2 (en) | Display control method and apparatus, program, and portable device | |
EP3057066B1 (en) | Generation of three-dimensional imagery from a two-dimensional image using a depth map | |
US9704299B2 (en) | Interactive three dimensional displays on handheld devices | |
US8937592B2 (en) | Rendition of 3D content on a handheld device | |
US11776142B2 (en) | Structuring visual data | |
US8768043B2 (en) | Image display apparatus, image display method, and program | |
WO2016086492A1 (en) | Immersive video presentation method for intelligent mobile terminal | |
US20120054690A1 (en) | Apparatus and method for displaying three-dimensional (3d) object | |
US9813693B1 (en) | Accounting for perspective effects in images | |
JP2013521544A (en) | Augmented reality pointing device | |
US11044398B2 (en) | Panoramic light field capture, processing, and display | |
JP2022523478A (en) | Damage detection from multi-view visual data | |
JP6294054B2 (en) | Video display device, video presentation method, and program | |
US11490032B2 (en) | Method and apparatus for creating and displaying visual media on a device | |
US20210227195A1 (en) | Creating cinematic video from multi-view capture data | |
KR20190011492A (en) | Device for providing content and method of operating the same | |
US20230117311A1 (en) | Mobile multi-camera multi-view capture | |
KR20120008191A (en) | A method and device for display of mobile device, and mobile device using the same | |
JP6621565B2 (en) | Display control apparatus, display control method, and program | |
TW202420241A (en) | Image processing method and electronic device | |
US20240144718A1 (en) | Image processing method and electronic device | |
KR20210112390A (en) | Filming method, apparatus, electronic device and storage medium | |
EP2421272A2 (en) | Apparatus and method for displaying three-dimensional (3D) object | |
TW201913292A (en) | Mobile device and method for blending display content with environment scene |