CN113673567B - Panoramic image emotion recognition method and system based on multi-angle sub-region self-adaptation - Google Patents
- Publication number
- CN113673567B (application CN202110816786.4A)
- Authority
- CN
- China
- Prior art keywords
- feature
- sub
- emotion
- panorama
- region
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/06—Topological mapping of higher dimensional structures onto lower dimensional surfaces
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/04—Context-preserving transformations, e.g. by using an importance map
- G06T3/047—Fisheye or wide-angle transformations
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/12—Edge-based segmentation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- Computational Mathematics (AREA)
- Evolutionary Biology (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Bioinformatics & Computational Biology (AREA)
- Mathematical Analysis (AREA)
- Computing Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Algebra (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a panoramic image emotion recognition method and system based on multi-angle sub-region adaptation, used to predict user emotion in immersive virtual environments. The system comprises a multi-angle rotation module, a feature extraction module, a sub-region adaptation module, a multi-scale fusion module, and an emotion classification module. A spherical multi-angle rotation algorithm generates a series of equirectangular-projection panoramas, which are fed into a convolutional neural network to exploit the strengths of features at different levels. Global features guide local features to adaptively establish correlations among context features at the current scale, capturing the global and local context dependencies of each level's feature map. Feature maps from different levels are upsampled and concatenated along the channel dimension to fuse features, yielding the user's emotion classification label. The invention can correctly predict users' emotional preferences and their distribution across a variety of scenes, improving the user experience in VR.
Description
Technical Field

The invention relates to the field of emotion recognition, and in particular to a panoramic image emotion recognition method and system based on multi-angle sub-region adaptation.

Background Art

Emotion is a psychological and physiological state that accompanies cognitive and conscious processes; the study of human emotion and cognition is an advanced stage of artificial intelligence. With the rapid development of artificial intelligence and deep learning, it has become possible to build affective models capable of perceiving, recognizing, and understanding human emotion. Endowing machines with the ability to respond to user emotion intelligently, sensitively, and amiably would ultimately create a natural environment in which people coexist harmoniously with each other and with machines; this vision points to a new direction for future computing applications.

Traditional emotion elicitation uses pictures, text, speech, video, and similar stimuli, but the prediction performance achieved on the corresponding emotion recognition datasets has been unsatisfactory. Virtual reality elicits emotion through immersive, realistic, stereoscopic experiences and is therefore a superior elicitation medium. Deep learning has made revolutionary practical progress in recent years, but for affective interaction, emotion-labeled data collected under VR-induced states is scarce, and effective research methods and models are lacking. A panorama stores omnidirectional, real spatial information on a two-dimensional plane and can serve as effective material for analyzing emotion in immersive VR environments.
Summary of the Invention

To overcome the shortcomings and deficiencies of the prior art, the present invention proposes a panoramic image emotion recognition method and system based on multi-angle sub-region adaptation.

Exploiting the display characteristics of panoramic content in head-mounted displays and the equirectangular projection, the invention designs a spherical multi-angle rotation algorithm to obtain panoramas at different angles and combines it with a context-adaptive convolutional neural network, thereby effectively improving the accuracy of emotion classification labels.

The present invention adopts the following technical solution:

A panoramic image emotion recognition method based on multi-angle sub-region adaptation, comprising:

a multi-angle rotation step: using spherical multi-angle rotation and equirectangular projection to convert the three-dimensional omnidirectional view into a two-dimensional planar panorama;

a feature extraction step: extracting features from the two-dimensional panorama with a pre-trained convolutional neural network model to obtain feature maps at different levels;

a sub-region adaptation step: taking the feature maps of different levels as input, finding global-local correlations, adaptively building context features at the current scale, and capturing the global and local context dependencies of each level's feature map;

a multi-scale fusion step: unifying the sizes of the feature maps from different levels by upsampling and concatenating them along the channel dimension to achieve multi-scale feature fusion;

an emotion classification step: determining the target emotion from the complementary strengths of the different feature levels and outputting the corresponding emotion label.
Further, the spherical multi-angle rotation is specifically:

establishing a three-dimensional spherical coordinate system centered on the user's head, and first projecting the 360-degree panorama presented to the user in the head-mounted display onto the surface of the sphere;

rotating the projected image according to the content distribution of the panorama;

the rotation comprises horizontal rotation and vertical rotation: horizontal rotation moves edge content cut at the two sides into the central main viewing area, and vertical rotation moves severely distorted content at the two poles toward the equator.

Further, the equirectangular projection maps meridians to equally spaced vertical lines and parallels to equally spaced horizontal lines, projecting the three-dimensional stereoscopic view onto a two-dimensional panorama.

Further, the three-dimensional spherical coordinate system is right-handed with a 90-degree field of view, and the user's straight-ahead binocular gaze is taken as the horizontal axis. The front viewport center is then [0,0,0]; the right viewport center is [90,0,0]; the rear viewport center is [180,0,0]; the left viewport center is [-90,0,0]; the upper viewport center is [0,90,0]; and the lower viewport center is [0,-90,0], corresponding to the six faces of the cube tangent to the sphere.
Further, the feature extraction step is specifically:

inputting the two-dimensional panorama into a pre-trained convolutional neural network and extracting a hierarchy of feature spaces that generalize across the visual world, forming the feature vector set [X1, X2, ..., Xl], where each element represents the feature map of the corresponding level.

Further, the sub-region adaptation step comprises two branches: a sub-region content representation branch and an emotion contribution representation branch;

the sub-region content representation branch applies adaptive average pooling to an input feature map of size h×w×c to obtain the sub-region content representation ys, where h, w, c, and s denote the feature map's height, width, number of channels, and the preset sub-region grid size, respectively;

the emotion contribution representation branch specifically comprises:

applying global pooling to each element of the feature vector set [X1, X2, ..., Xl] to obtain a global information representation g(Xl) of size 1×1×c;

adding g(Xl) to the input feature map element-wise via broadcasting to form a residual connection, then converting the number of channels to s² with a 1×1 convolution, thereby constructing the adaptive emotion contribution matrix as of size hw×s²;

multiplying the adaptive emotion contribution matrix as by the sub-region content representation ys to obtain the context feature representation vector Zl, which expresses how strongly each pixel i is associated with each sub-region of the s×s grid.

Further, the adaptive average pooling divides the input feature map into s×s sub-regions, yielding a set of sub-region representations Ys×s = [y1, y2, ..., ys×s]; the resulting s×s×c feature map is reshaped into the s²×c sub-region content representation ys.

Further, constructing the emotion contribution matrix as specifically comprises: let ai denote the contribution of the sub-regions to the emotion classification label at point i of the feature map; every point i then corresponds to s×s contribution values forming the vector ai, and the collection of these vectors is reshaped into the emotion contribution matrix as of size hw×s².

Further, the multi-scale fusion step is specifically: using an upsampling operation such as deconvolution or interpolation to bring the multi-scale feature maps of different levels to a common size, then concatenating them along the channel dimension to complete feature fusion, finally obtaining an overall representation of size H×W×(C1+C2+...+Cl) that combines low-level geometric information with high-level semantic information.
A system implementing the panoramic image emotion recognition method based on multi-angle sub-region adaptation, comprising:

a multi-angle rotation module: performing multi-angle rotation and equirectangular projection to convert the three-dimensional panoramic view into a two-dimensional panorama;

a feature extraction module: extracting features from the two-dimensional panorama to obtain feature maps at different levels;

a sub-region adaptation module: associating regions that share the same emotion classification label, with global features guiding local features to adaptively establish the relevance of context features at the current scale and capture long-range dependencies;

a multi-scale fusion module: unifying the sizes of feature maps from different levels and concatenating them along the channel dimension to achieve multi-scale feature fusion;

an emotion classification module: determining the target emotion from the complementary strengths of the different feature levels and outputting the corresponding emotion label.
The present invention has the following beneficial effects:

1. To address the scarcity of emotion-labeled data under VR-induced states, a spherical multi-angle rotation algorithm is proposed for data augmentation. A three-dimensional spherical coordinate system is built for the 360-degree view of the user's virtual environment; the sphere is rotated through multiple angles about different axes, and each rotation is then projected equirectangularly to obtain augmented data samples, effectively improving the model's generalization ability.

2. Equirectangular projection maps meridians and parallels onto a rectangular plane at equal spacing, which severely distorts panoramic content near the upper and lower poles. The data samples produced by the spherical multi-angle rotation algorithm preserve rotational invariance: they alleviate this distortion while rotating the edge information at the two sides into the central main viewing area, so that content features can be better captured and extracted by the emotion model, improving recognition accuracy.

3. A pre-trained convolutional neural network extracts panorama features at different levels, exploiting the complementary strengths of low-level detail and high-level semantics. Global features guide local features to adaptively establish correlations among different regions or objects in the feature maps and to capture long-range dependencies, effectively improving the model's ability to predict the emotion-evoking regions of a panorama.

4. The invention fills a gap in panoramic image emotion recognition and helps interpret user emotion and collect feedback in immersive virtual environments, which is crucial for developing VR applications such as user behavior prediction and VR scene modeling.
Brief Description of the Drawings

Fig. 1 is a flowchart of the overall method.

Fig. 2 is a schematic diagram of a user wearing a head-mounted display in a virtual environment.

Figs. 3(a) and 3(b) are schematic diagrams of the three-dimensional spherical coordinate system and the projected two-dimensional plane, respectively.

Fig. 4 illustrates the effect of the multi-angle rotation algorithm rotating 180 degrees about the x-axis.

Fig. 5 is a schematic diagram of the sub-region adaptation module of the invention.

Fig. 6 is a schematic diagram of the model framework of the overall method.
Detailed Description

The present invention is described in further detail below with reference to the embodiments and drawings, but its implementation is not limited thereto.

Embodiment

As shown in Fig. 1, a panoramic image emotion recognition method based on multi-angle sub-region adaptation, used to recognize and predict user emotion in an immersive virtual environment, comprises the following.
The multi-angle rotation module takes the interactive 360-degree view presented to the user by the immersive virtual environment, as shown in Fig. 2, and applies the spherical multi-angle rotation algorithm to obtain a series of augmented data samples. Equirectangular projection then maps meridians to equally spaced vertical lines and parallels to equally spaced horizontal lines, completing the conversion from the three-dimensional omnidirectional view to a two-dimensional planar panorama.

HMD in Fig. 2 denotes the head-mounted display.

The spherical multi-angle rotation algorithm is as follows: establish a three-dimensional Cartesian coordinate system centered on the user's head. Rotate the sphere about the horizontal axis in fixed increments, so that objects that were severely distorted at the poles are brought toward the equator, mitigating the distortion. Likewise rotate the sphere about the vertical axis in fixed increments, so that edge content cut at the two sides moves into the central main viewing area.

The purpose of the multi-angle rotation algorithm is to rotate the emotion-evoking regions of the panorama, according to its content distribution, into the frontal view near the equator, reducing the adverse effects of projection distortion and making it easier for the model to capture the relevant features.

The rotation comprises horizontal rotation and vertical rotation: horizontal rotation moves edge content cut at the two sides into the central main viewing area, and vertical rotation moves severely distorted content at the two poles toward the equator.

Further, the spherical multi-angle rotation algorithm specifically comprises the following steps:

Construct a right-handed three-dimensional spherical coordinate system with the user's head at the origin o, as shown in Fig. 3(a). Using the spherical multi-angle rotation algorithm, rotate the sphere 90 degrees horizontally, twice, so that edge content cut at the two sides moves into the central main viewing area, as in Fig. 4. Then rotate the sphere 45 degrees vertically, four times, bringing objects that were severely distorted at the poles toward the equator. Each panorama thus yields 2×4 = 8 augmented results.
Let the panorama have height H and width W, let (u, v) be the coordinates of any point on the plane, let (x, y, z) be the corresponding point in three-dimensional spherical coordinates, and let (θ, φ) be its longitude and latitude. The relationship between longitude/latitude and the spherical coordinates follows the standard convention:

x = cos φ · sin θ, y = sin φ, z = cos φ · cos θ.

The same point converts between three-dimensional space and the two-dimensional plane as:

u = (θ/2π + 1/2) · W, v = (1/2 − φ/π) · H.

Meridians are mapped to equally spaced vertical lines and parallels to equally spaced horizontal lines, as shown in Fig. 3(b).

In emotion recognition, the ERP storage format of panoramas distorts content, so to help the model capture the relevant features, the multi-angle algorithm needs to rotate emotion-evoking objects or regions into the frontal view near the equator, so that the equirectangular projection places them at the center of the two-dimensional plane. Different panoramas require different rotation angles, however, and hand-tuning each one is impractical; the invention instead sets a uniform rotation angle and count, enabling batch preprocessing. Typically, rotating the sphere 90 degrees horizontally, twice, and then 45 degrees about the x-axis, four times, yields 2×4 = 8 results per panorama, which essentially satisfies the above requirement.
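The rotation and projection steps above can be sketched in NumPy. This is an illustrative reconstruction, not the patent's implementation: the function names (`rotate_equirect`, `augment`), the exact axis conventions, and the nearest-neighbour resampling are all assumptions made for the sketch.

```python
import numpy as np

def rotate_equirect(img, yaw_deg=0.0, pitch_deg=0.0):
    """Rotate an equirectangular panorama on the sphere (nearest-neighbour remap).

    Assumes a y-up right-handed frame: x = cos(lat)sin(lon), y = sin(lat),
    z = cos(lat)cos(lon), matching the standard relations in the text.
    """
    H, W = img.shape[:2]
    # Longitude/latitude grid of the OUTPUT image.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    lon = (u / W - 0.5) * 2.0 * np.pi            # theta in [-pi, pi)
    lat = (0.5 - v / H) * np.pi                  # phi in [-pi/2, pi/2]
    # Unit direction vectors on the sphere.
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    # Inverse rotation: find the SOURCE direction for each output pixel.
    a = np.deg2rad(-yaw_deg)                     # about the vertical axis
    b = np.deg2rad(-pitch_deg)                   # about the horizontal (x) axis
    x1 = np.cos(a) * x + np.sin(a) * z
    z1 = -np.sin(a) * x + np.cos(a) * z
    y2 = np.cos(b) * y - np.sin(b) * z1
    z2 = np.sin(b) * y + np.cos(b) * z1
    # Back to source pixel coordinates (u, v), wrapping longitude.
    src_lon = np.arctan2(x1, z2)
    src_lat = np.arcsin(np.clip(y2, -1.0, 1.0))
    su = np.rint((src_lon / (2 * np.pi) + 0.5) * W).astype(int) % W
    sv = np.rint((0.5 - src_lat / np.pi) * H).astype(int).clip(0, H - 1)
    return img[sv, su]

def augment(img):
    """2 yaw x 4 pitch rotations -> 8 samples, as in the embodiment."""
    return [rotate_equirect(img, yaw_deg=90 * i, pitch_deg=45 * j)
            for i in range(2) for j in range(4)]
```

A zero-angle rotation reproduces the input panorama, which is a quick sanity check that the forward and inverse mappings agree.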
The feature extraction module uses a convolutional neural network pre-trained on a large-scale image classification task. For an input image I, the formula Xl = f(Σ kl · Xl−1 + bl) extracts a hierarchy of feature spaces that generalize across the visual world, forming the feature map vector set [X1, X2, ..., Xl], where kl is the convolution kernel of layer l, Xl−1 is the feature map output by layer l−1, and bl is the bias term. Each element of the set represents the feature map of its level and serves as input to the sub-region adaptation module, exploiting the complementary strengths of the different levels.
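The layer recurrence Xl = f(Σ kl · Xl−1 + bl) can be illustrated with a naive NumPy convolution. Random weights stand in for the pre-trained network, and the helper names (`conv2d`, `extract_features`) are invented for this sketch; f is taken to be ReLU.

```python
import numpy as np

def conv2d(x, kernels, bias, stride=2):
    """Naive valid convolution: x is (h, w, c_in), kernels is (k, k, c_in, c_out)."""
    k = kernels.shape[0]
    h_out = (x.shape[0] - k) // stride + 1
    w_out = (x.shape[1] - k) // stride + 1
    out = np.empty((h_out, w_out, kernels.shape[3]))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[i * stride:i * stride + k, j * stride:j * stride + k, :]
            out[i, j] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2])) + bias
    return np.maximum(out, 0.0)          # f = ReLU

def extract_features(img, layers):
    """X_l = f(k_l * X_{l-1} + b_l): return the feature map set [X_1, ..., X_L]."""
    feats, x = [], img
    for kern, b in layers:
        x = conv2d(x, kern, b)
        feats.append(x)
    return feats
```

Each successive layer halves the spatial resolution while widening the channels, mimicking the low-level-to-high-level hierarchy the text describes.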
The sub-region adaptation module, shown in Fig. 5, adaptively builds context features at the current scale by finding global-local correlations and captures the global and local context dependencies of feature maps at different levels. The module consists of two branches, a sub-region content representation branch and an emotion contribution representation branch, as follows:

The sub-region content representation branch applies adaptive average pooling to each element of the feature vector set [X1, X2, ..., Xl]. The adaptive average pooling kernel is defined by:

kernel_size = (input_size + 2 × padding) − (output_size − 1) × stride

That is, the input size, output size, padding, and stride determine the current kernel size. A feature map Xl of size h×w×c is converted to s×s×c, where h, w, c, and s denote the height, width, number of channels, and preset grid size, respectively. Adaptive average pooling thus divides the input feature map into s×s sub-regions, giving a set of sub-region representations Ys×s = [y1, y2, ..., ys×s]; the s×s×c map is reshaped into the s²×c sub-region content representation ys.
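The pooling step can be sketched as follows. The function name `adaptive_avg_pool` and the contiguous-bin scheme (floor-based bin edges, similar in spirit to PyTorch's adaptive pooling) are assumptions of this sketch, not the patent's exact operator.

```python
import numpy as np

def adaptive_avg_pool(x, s):
    """Divide an h x w x c feature map into an s x s grid of sub-regions and
    average each, returning the s^2 x c sub-region content representation y_s."""
    h, w, c = x.shape
    # Floor-based bin edges so uneven h/s or w/s are handled too.
    hi = [(i * h) // s for i in range(s + 1)]
    wi = [(j * w) // s for j in range(s + 1)]
    y = np.empty((s * s, c))
    for i in range(s):
        for j in range(s):
            y[i * s + j] = x[hi[i]:hi[i + 1], wi[j]:wi[j + 1]].mean(axis=(0, 1))
    return y
```

For an 8×8×c map and s = 2, each row of the output is the mean of one 4×4 quadrant.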
The emotion contribution representation branch applies global average pooling to each element of [X1, X2, ..., Xl], giving a global information representation g(Xl) of size 1×1×c. Broadcasting adds this 1×1×c representation to the input feature map pixel-wise as a residual connection, producing a feature map of size h×w×c.

Let ai be the contribution of the s×s sub-regions to the emotion classification label at point i of the feature map. A 1×1 convolution converts the channel count to s², so that every point i of the feature map corresponds to s×s contribution values forming the vector ai; the collection of these vectors is reshaped into the hw×s² adaptive emotion contribution matrix as.

The emotion contribution matrix as output by the contribution branch is multiplied by the sub-region content representation ys output by the content branch:

Zl = as · ys

This yields the context feature representation vector Zl, which expresses how strongly each pixel i is associated with each sub-region; the emotion contribution vectors ai implicit in it act as global-local connection weights that are optimized automatically as the network iterates.
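Putting the two branches together, the whole sub-region adaptation step reduces to a few tensor operations. This sketch assumes h and w divisible by s, and the 1×1-convolution weights `w1x1`/`b1x1` are random stand-ins for parameters the network would learn.

```python
import numpy as np

def subregion_adaptive(x, s, w1x1, b1x1):
    """Sub-region adaptive module (sketch): x is h x w x c, w1x1 is c x s^2.

    Content branch:      pool x into an s x s grid      -> y_s  (s^2 x c)
    Contribution branch: global pool + residual + 1x1    -> a_s  (hw x s^2)
    Output:              Z = a_s @ y_s, one context vector per pixel (hw x c)
    """
    h, w, c = x.shape
    # --- content branch: block average pooling into s x s sub-regions
    y_s = x.reshape(s, h // s, s, w // s, c).mean(axis=(1, 3)).reshape(s * s, c)
    # --- contribution branch
    g = x.mean(axis=(0, 1))                    # 1x1xc global representation g(X_l)
    r = x + g                                  # broadcast residual connection
    a_s = r.reshape(h * w, c) @ w1x1 + b1x1    # 1x1 conv == per-pixel matmul, hw x s^2
    # --- context features: relevance of every pixel to every sub-region
    return a_s @ y_s                           # Z_l, hw x c
```

Note that a 1×1 convolution over an h×w×c map is exactly a per-pixel matrix multiply, which is why the branch flattens to hw×c before applying the weights.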
Further, the dependency here refers to the association between two or more emotional subjects. The feature extraction module uses the panorama's global and local features to recognize different regions or objects, such as the emotional subjects "person" and "cat", but this alone is not a sufficient basis for emotion prediction. The sub-region adaptation module must also adaptively establish the association between the person and the cat, i.e., that the person is teasing or petting the kitten, so that the correct positive emotion label can be given.

The multi-scale fusion module fuses the feature maps of different levels. Upsampling brings the feature maps of all levels to a common size, and the resized maps are then concatenated along the channel dimension, finally yielding a representation of size H×W×(C1+C2+...+Cl) that combines low-level geometric information with high-level semantic information.
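The fusion step is a resize-then-concatenate, sketched below with nearest-neighbour upsampling (the text also allows deconvolution or interpolation; the helper names are invented for this sketch).

```python
import numpy as np

def upsample_nearest(x, H, W):
    """Nearest-neighbour upsampling of an h x w x c map to H x W."""
    h, w, _ = x.shape
    rows = np.arange(H) * h // H
    cols = np.arange(W) * w // W
    return x[rows][:, cols]

def multiscale_fuse(feats, H, W):
    """Resize every level to H x W and concatenate on the channel axis,
    giving an H x W x (C1 + ... + Cl) fused representation."""
    return np.concatenate([upsample_nearest(f, H, W) for f in feats], axis=-1)
```

Fusing an 8×8×2 map with a 4×4×3 map at target size 8×8 yields an 8×8×5 tensor, with each source's channels preserved side by side.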
The emotion classification module achieves strong classification for panoramas both with and without a salient subject. Because fully connected layers have redundant parameters, global average pooling replaces the fully connected layer as the "classifier". Deep features, which attend more to abstract semantic information, are used to recognize emotion in panoramas with a salient subject; shallow features, which provide perceptual detail such as edges, stripes, and color, are used for panoramas without one. This yields more accurate emotion classification labels; the overall model framework is shown in Fig. 6.
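The global-average-pooling "classifier" can be sketched as follows; the weight shapes and the two-class softmax head are assumptions of this sketch (the embodiment below predicts positive vs. negative polarity).

```python
import numpy as np

def classify(fused, w_cls, b_cls):
    """Global-average-pool the fused H x W x C map (replacing a fully
    connected layer) and score the emotion classes with a softmax."""
    v = fused.mean(axis=(0, 1))              # C-dim global descriptor
    logits = v @ w_cls + b_cls               # w_cls is C x num_classes
    e = np.exp(logits - logits.max())        # numerically stable softmax
    return e / e.sum()                       # class probabilities
```

The output is a probability vector over the emotion labels; the argmax gives the predicted polarity.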
Different convolutional layers of the feature extraction module extract different features: low-level layers such as conv layer_1 and conv layer_2 extract visual-level features such as color, texture, and contour, while high-level layers such as conv layer_4 and conv layer_5 extract object-level and concept-level features, i.e., abstract semantic information. Predicting the emotional regions of a panorama requires combining the strengths of features at different levels. If the panorama shows a single, straightforward natural scene, low-level color and texture information is the key to correct classification; if it shows a complex scene with multiple interacting objects, high-level semantic information becomes essential. By establishing relations between different regions and objects in the feature map, the sub-region adaptive module better captures the emotion-evoking regions and can thus assign the correct emotion label.
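This excerpt does not give the sub-region adaptive module's equations. One plausible reading, shown purely as an assumption, is to partition the feature map into an S×S grid of sub-regions, describe each region by its mean feature, and compute softmax-normalised pairwise affinities between regions, so that related regions (e.g., "person" and "cat") receive high mutual weight:

```python
import numpy as np

def region_affinities(fmap, s):
    """Split an (H, W, C) map into an s x s grid and return an (s*s, s*s)
    row-stochastic affinity matrix between region descriptors.
    This grid-plus-dot-product formulation is an assumed illustration,
    not the patent's actual sub-region adaptive module."""
    h, w, c = fmap.shape
    cells = []
    for i in range(s):
        for j in range(s):
            cell = fmap[i*h//s:(i+1)*h//s, j*w//s:(j+1)*w//s]
            cells.append(cell.mean(axis=(0, 1)))   # mean-pooled region descriptor
    d = np.stack(cells)                            # (s*s, C)
    sim = d @ d.T / np.sqrt(c)                     # scaled dot-product similarity
    e = np.exp(sim - sim.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)        # each row sums to 1

fmap = np.random.rand(28, 28, 128)
a = region_affinities(fmap, 4)   # one of the scales S = 1, 2, 4
print(a.shape)  # (16, 16)
```

Running the same routine at S = 1, 2, and 4 gives relations at the whole-image, quadrant, and finer sub-region granularities, matching the multi-scale usage described in the embodiment below.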
In this embodiment, the feature extraction module extracts four feature maps, from conv layer_2, 3, 4, and 5. Each of these feature maps is fed into the sub-region adaptive module, which establishes relations between regions at multiple scales S = 1, 2, 4, ..., n (the choice of S is not restricted; in practice the combination of 1, 2, and 4 works best). Because the feature maps from different levels differ in size, the multi-scale fusion module is needed: it first unifies their scale and then concatenates all of the feature maps along the channel dimension. The concatenated features serve as the basis for emotion classification, which finally outputs the emotional polarity of the input panorama, i.e., positive or negative.
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited to it. Any other change, modification, substitution, combination, or simplification made without departing from the spirit and principles of the present invention shall be an equivalent replacement and shall fall within the protection scope of the present invention.
Claims (6)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110816786.4A CN113673567B (en) | 2021-07-20 | 2021-07-20 | Panoramic image emotion recognition method and system based on multi-angle sub-region self-adaptation |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN113673567A CN113673567A (en) | 2021-11-19 |
| CN113673567B true CN113673567B (en) | 2023-07-21 |
Family
ID=78539860
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110816786.4A Active CN113673567B (en) | 2021-07-20 | 2021-07-20 | Panoramic image emotion recognition method and system based on multi-angle sub-region self-adaptation |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN113673567B (en) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114201970B (en) * | 2021-11-23 | 2024-12-27 | 国家电网有限公司华东分部 | A method and device for detecting power grid dispatching events based on semantic feature capture |
| CN114827749A (en) * | 2022-04-21 | 2022-07-29 | 应急管理部天津消防研究所 | Method for seamless switching and playing of multi-view panoramic video |
| CN115619625A (en) * | 2022-10-26 | 2023-01-17 | 华南理工大学 | A panoramic image style transfer method, system and storage medium |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107506722A (en) * | 2017-08-18 | 2017-12-22 | 中国地质大学(武汉) | One kind is based on depth sparse convolution neutral net face emotion identification method |
| CN111832620A (en) * | 2020-06-11 | 2020-10-27 | 桂林电子科技大学 | An image sentiment classification method based on dual attention multi-layer feature fusion |
| CN112784764A (en) * | 2021-01-27 | 2021-05-11 | 南京邮电大学 | Expression recognition method and system based on local and global attention mechanism |
| CN112800875A (en) * | 2021-01-14 | 2021-05-14 | 北京理工大学 | Multi-mode emotion recognition method based on mixed feature fusion and decision fusion |
| CN113011504A (en) * | 2021-03-23 | 2021-06-22 | 华南理工大学 | Virtual reality scene emotion recognition method based on visual angle weight and feature fusion |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||