CN117576573A - Building atmosphere evaluation method, system, equipment and medium based on improved VGG16 model


Info

Publication number: CN117576573A (application CN202410061106.6A)
Authority: CN (China)
Prior art keywords: building, atmosphere, feature, model, layer
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN117576573B (en)
Inventors: 陈纵, 梁海岫, 郑豪, 姜磊, 林泽轩
Original and current assignee: Guangzhou Maritime University
Application filed by Guangzhou Maritime University; priority to CN202410061106.6A; publication of CN117576573A; application granted; publication of CN117576573B

Classifications

    • G06V20/176 — Scenes; scene-specific elements; terrestrial scenes; urban or other man-made structures
    • G06N3/0464 — Neural network architectures; convolutional networks [CNN, ConvNet]
    • G06T7/0002 — Image analysis; inspection of images, e.g. flaw detection
    • G06V10/56 — Extraction of image or video features relating to colour
    • G06V10/764 — Recognition using pattern recognition or machine learning; classification, e.g. of video objects
    • G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/82 — Recognition using pattern recognition or machine learning; neural networks
    • G06T2207/20081 — Special algorithmic details; training; learning
    • G06T2207/20084 — Special algorithmic details; artificial neural networks [ANN]


Abstract

The invention relates to the technical field of building atmosphere classification, and in particular to a building atmosphere evaluation method, system, equipment and medium based on an improved VGG16 model. The method specifically comprises: acquiring a building image dataset; constructing the network structure of an improved VGG16 model and generating an initial building atmosphere classification model; preprocessing the building image dataset, then training the initial building atmosphere classification model on the preprocessed dataset to obtain a target building atmosphere classification model; and, according to the target building atmosphere classification model combined with a gradient-weighted class activation mapping method, obtaining a result indicating whether an input building image conforms to a calm atmosphere and the degree of influence of each feature of the input building image on that result. By processing and analyzing perception data of the building environment, the invention realizes objective, accurate and efficient quantitative prediction of building atmosphere.

Description

Building atmosphere evaluation method, system, equipment and medium based on improved VGG16 model
Technical Field
The invention relates to the technical field of building atmosphere classification, in particular to a building atmosphere evaluation method, system, equipment and medium based on an improved VGG16 model.
Background
The soul of a building lies in the atmosphere it shapes, which is of great importance in the design creation and evaluation process. Although the exterior form of a building cannot be ignored, what matters more deeply is how it touches the user's emotions. The charm of an atmosphere lies in its profound effect on human perception, as it is the most direct and essential experience people have of a space. Even in a hurried glance, the charm of a place may leave an indelible imprint on the heart. A building space is far more than a pile of bricks and tiles; it is full of vitality and breathes with life. Taking the theatre as an example, its spatial layout is intended to promote introspection and prayer. Such designs not only create a deep sense of calm, but also carefully guide the audience into a realm of contemplation and meditation, sometimes even allowing people to gain a completely new understanding of the passage of time. An impressive building experience can stir the mind of the visitor and make them concentrate on their own feelings and existence.
Although the influence of atmosphere is self-evident, a deep understanding and definition of atmosphere remains challenging to date because of its subjective and hard-to-quantify character, which makes quantitative evaluation of atmosphere extremely difficult. In particular, traditional atmosphere quantitative prediction methods rely mainly on perception analysis and are limited by subjective factors and complexity, so the results are not sufficiently accurate and reliable, and the degree of association between the local design features of a building and its atmosphere cannot be deeply and precisely quantified.
Disclosure of Invention
The invention aims to provide a building atmosphere evaluation method, a system, equipment and a medium based on an improved VGG16 model, which realize objective, accurate and efficient quantitative prediction of building atmosphere by processing and analyzing perceived data of a building environment so as to solve at least one of the problems in the prior art.
In a first aspect, the present invention provides a building atmosphere assessment method based on an improved VGG16 model, the method specifically comprising:
acquiring a building image data set, wherein the building image data set comprises a calm atmosphere group building image set and a standard control group building image set;
constructing a network structure of an improved VGG16 model, and generating an initial building atmosphere classification model;
after preprocessing the building image data set, training the initial building atmosphere classification model according to the preprocessed building image data set to obtain a target building atmosphere classification model;
and, according to the target building atmosphere classification model combined with a gradient-weighted class activation mapping method, obtaining a result indicating whether an input building image conforms to a calm atmosphere and the degree of influence of each feature of the input building image on that result.
Further, the constructing of the network structure of the improved VGG16 model specifically comprises:
replacing the original VGG network input layer, which receives 224×224 pixel RGB three-channel images, with a VGG network input layer receiving 1024×1024 pixel RGB three-channel images;
truncating the original VGG16 model and obtaining the truncated content, wherein the truncated content comprises 13 convolution layers and 5 maximum pooling layers, and constructing a feature extraction layer from the truncated content, the feature extraction layer being used for extracting feature information of a building image;
constructing a refined local feature network layer module according to a channel attention mechanism and a spatial attention mechanism, wherein the refined local feature network layer module is used for acquiring key local features of the feature information;
adding a Flatten layer for converting the output feature map of the last maximum pooling layer into a one-dimensional vector;
adding a first full-connection layer and a second full-connection layer, setting the number of neurons of the first full-connection layer to 1024 and the number of neurons of the second full-connection layer to 512, the first and second full-connection layers being used to receive the one-dimensional vector output by the Flatten layer and perform classification processing;
and adding an output layer after the first and second full-connection layers, wherein the number of neurons of the output layer is set to 2 and a sigmoid activation function is adopted, so that the output range is [0, 1].
Further, the feature extraction layer comprises 5 groups of feature extraction structures; the 1st and 2nd groups each comprise 2 convolution layers, and the 3rd, 4th and 5th groups each comprise 3 convolution layers. Each convolution layer uses 3×3 convolution kernels, the number of kernels gradually increases from 64 to 512, and the stride is set to 1. One maximum pooling layer is arranged after each group of feature extraction structures; the kernel size of each maximum pooling layer is set to 2×2, the stride is set to 2, and valid mode is adopted to avoid edge feature loss.
Further, the building of the refined local feature network layer module according to the channel attention mechanism and the spatial attention mechanism specifically comprises the following steps:
creating a channel attention unit based on a pooling technology and an AWGN network, wherein the channel attention unit is used for carrying out feature recalibration on an original input feature map to obtain a first feature map;
creating a spatial attention unit based on an LCM module, wherein the spatial attention unit is used for obtaining a second characteristic diagram after carrying out characteristic recalibration on the first characteristic diagram;
and constructing a refined local feature network layer module from the channel attention unit and the spatial attention unit, and adding the refined local feature network layer module after each group of feature extraction structures.
Further, the creating a channel attention unit based on the pooling technology and the AWGN network, where the channel attention unit is configured to obtain a first feature map after performing feature recalibration on an original input feature map, and specifically includes:
respectively acquiring statistical characteristics of each channel based on global average pooling and global standard deviation pooling, and carrying out characteristic fusion to acquire a first characteristic descriptor;
generating a channel attention weight of each channel through the first feature descriptor based on an AWGN network;
and acquiring an original input feature map, and carrying out feature recalibration on the original input feature map according to the channel attention weight of each channel to obtain a first feature map.
Further, the creating a spatial attention unit based on the LCM module, where the spatial attention unit is configured to obtain a second feature map after performing feature recalibration on the first feature map, and specifically includes:
analyzing the local structure of each spatial position of the original input feature map based on an LCM module to obtain a local correlation feature map of each spatial position;
carrying out cross-channel information integration on the local correlation feature map of each spatial position, and obtaining a spatial attention map through a sigmoid function;
and obtaining a correlation weight of each spatial position based on the spatial attention map, and obtaining a second characteristic map after carrying out characteristic recalibration on the first characteristic map according to the correlation weight of each spatial position.
Further, training the initial building atmosphere classification model according to the preprocessed building image dataset to obtain a target building atmosphere classification model, which specifically comprises:
inputting the preprocessed building image data set into the initial building atmosphere classification model;
performing network training on the initial building atmosphere classification model according to tuning parameters and updating rules of the SGD optimizer, and simultaneously evaluating and calibrating accuracy, recall rate and specificity of an output result of the network training to obtain a target building atmosphere classification model;
the tuning parameters include: the initial learning rate is 0.001, the momentum is 0.9, the learning rate decay is 0.000001, the loss is calculated by using a binary cross entropy loss function, and the total training is 300 rounds;
the update rule satisfies $\theta_{t+1} = \theta_t - \eta \nabla_{\theta} J(\theta_t)$, wherein $\theta_{t+1}$ denotes the parameters at the (t+1)-th iteration, $\theta_t$ denotes the parameters at the t-th iteration, $\eta$ denotes the learning rate, which controls the step size of the parameter update, and $\nabla_{\theta} J(\theta_t)$ denotes the gradient of the loss function $J$ with respect to the parameters $\theta$.
In a second aspect, the present invention provides a building atmosphere assessment system based on an improved VGG16 model, the system comprising in particular:
the image data acquisition module is used for acquiring a building image dataset, wherein the building image dataset comprises a calm atmosphere group building image set and a standard control group building image set;
the first model generation module is used for constructing a network structure for improving the VGG16 model and generating an initial building atmosphere classification model;
the second model generation module is used for training the initial building atmosphere classification model according to the preprocessed building image data set after preprocessing the building image data set to obtain a target building atmosphere classification model;
and the model output result module is used for obtaining, according to the target building atmosphere classification model combined with a gradient-weighted class activation mapping method, a result of whether the input building image conforms to the calm atmosphere and the degree of influence of each feature of the input building image on the calm-atmosphere result.
In a third aspect, the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory, which, when executed on the processor, implements the building atmosphere evaluation method based on an improved VGG16 model as described in any of the above methods.
In a fourth aspect, the present invention provides a computer readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a processor, implements a method for evaluating a building atmosphere based on an improved VGG16 model as described in any of the above methods.
Compared with the prior art, the invention has at least one of the following technical effects:
1. By processing and analyzing perception data of the building environment, objective, accurate and efficient quantitative prediction of building atmosphere is realized; key features in the building environment can be captured more comprehensively, and the accuracy and reliability of quantitative atmosphere prediction are improved.
2. The method overcomes the limitation of traditional building atmosphere perception analysis, which is constrained by subjective factors and complexity. The model's predictions are based on objective characteristics of the data, reducing dependence on subjective factors and improving the consistency and stability of prediction. Because the convolutional neural network can learn complex image features, the model can understand the atmosphere characteristics of a building environment more comprehensively and deeply; this learning capability helps the model adapt to buildings of different styles and scenes, providing wider applicability for various scenarios and applications.
3. By introducing an attention mechanism and designing new channel attention and spatial attention mechanisms in a refined local feature network layer module, the method overcomes the problem that traditional methods cannot accurately capture the atmosphere-related key regions in a building image, which limits model performance.
4. By generating a heat map of the regions of interest, the method intuitively displays the model's degree of attention to different regions of a building image, improving the interpretability of the model. The visualization results provide building designers with information about the architectural features the model attends to, which can serve as guidance for improving architectural design.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a building atmosphere evaluation method based on an improved VGG16 model according to a first embodiment of the invention;
Fig. 2 is a flow chart of a building atmosphere evaluation method based on an improved VGG16 model according to a second embodiment of the invention;
Fig. 3 is a flow chart of a building atmosphere evaluation method based on an improved VGG16 model according to a third embodiment of the invention;
Fig. 4 is a flow chart of a building atmosphere evaluation method based on an improved VGG16 model according to a fourth embodiment of the invention;
Fig. 5 is a schematic structural diagram of a building atmosphere evaluation system based on an improved VGG16 model according to an embodiment of the invention;
Fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted as "when", "upon", "in response to a determination" or "in response to detection", depending on the context. Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as meaning "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
In addition, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
Referring to fig. 1, a first embodiment of the present invention provides a construction atmosphere evaluation method based on an improved VGG16 model, the method specifically comprising:
s101, acquiring a building image data set, wherein the building image data set comprises a calm atmosphere group building image set and a standard control group building image set.
In this embodiment, a number of calm atmosphere group images (keywords: "serene architecture", "peaceful architecture" and "tranquil architecture") and standard control group images (keywords such as "normal UK architecture interior") can be obtained from search engines such as Google and Bing, and repeated pictures and irrelevant pictures (such as building model pictures, renderings, book covers, etc.) are filtered out to obtain the final dataset. The calm atmosphere group building image set is obtained by taking building images at various tranquil places, which may include villages, mountainous areas, lakesides, etc., that generally give a sense of tranquility and relaxation. The standard control group building image set is randomly selected from various building images; these images can come from different environments such as cities, parks and business districts, and have no specific atmosphere characteristics.
S102, constructing a network structure for improving the VGG16 model, and generating an initial building atmosphere classification model.
In this embodiment, VGG16 is a deep convolutional neural network model characterized by stacks of small convolution kernels; it has a deeper hierarchical structure than many other models, so feature information in an image can be better extracted and abstracted, which facilitates the extraction of features relevant to the subjective building atmosphere. The VGG16 model may be modified from the original VGG16 model to suit the building atmosphere classification task, for example: adding convolution layers, i.e., adding 2 convolution layers before the last two fully connected layers of the VGG16 model to improve feature extraction capability; replacing the fully connected layers, i.e., replacing the last two fully connected layers of the VGG16 model with fully connected layers having fewer neurons to reduce model complexity and improve classification performance; adding Dropout layers between the fully connected layers to prevent overfitting; adjusting the activation function, e.g., changing the activation function of the last fully connected layer to a Softmax function for multi-classification tasks; and so on.
In some embodiments, in step S102, the constructing of the network structure of the improved VGG16 model specifically includes:
replacing the original VGG network input layer, which receives 224×224 pixel RGB three-channel images, with a VGG network input layer receiving 1024×1024 pixel RGB three-channel images;
truncating the original VGG16 model and obtaining the truncated content, wherein the truncated content comprises 13 convolution layers and 5 maximum pooling layers, and constructing a feature extraction layer from the truncated content, the feature extraction layer being used for extracting feature information of a building image;
constructing a refined local feature network layer module according to a channel attention mechanism and a spatial attention mechanism, wherein the refined local feature network layer module is used for acquiring key local features of the feature information;
adding a Flatten layer for converting the output feature map of the last maximum pooling layer into a one-dimensional vector;
adding a first full-connection layer and a second full-connection layer, setting the number of neurons of the first full-connection layer to 1024 and the number of neurons of the second full-connection layer to 512, the first and second full-connection layers being used to receive the one-dimensional vector output by the Flatten layer and perform classification processing;
and adding an output layer after the first and second full-connection layers, wherein the number of neurons of the output layer is set to 2 and a sigmoid activation function is adopted, so that the output range is [0, 1].
Specifically, the feature extraction layer comprises 5 groups of feature extraction structures; the 1st and 2nd groups each comprise 2 convolution layers, and the 3rd, 4th and 5th groups each comprise 3 convolution layers. Each convolution layer uses 3×3 convolution kernels, the number of kernels gradually increases from 64 to 512, and the stride is set to 1. One maximum pooling layer is arranged after each group of feature extraction structures; the kernel size of each maximum pooling layer is set to 2×2, the stride is set to 2, and valid mode is adopted to avoid edge feature loss.
In this embodiment, an improved VGG16 network structure is used as the backbone to construct the building atmosphere classification model; the network structure consists of a new input layer, the retained VGG16 feature extraction network layers, refined local feature network layer modules (fusion modules of improved channel attention and spatial attention), and new fully connected layers.
(1) Input layer
The input layer of the original VGG network is replaced so that, instead of receiving 224×224 pixel images, it receives RGB three-channel images with a resolution of 1024×1024 pixels, ensuring sufficient resolution to capture details of the building structure.
(2) Feature extraction part truncated from the original VGG16 model
Adopting part of the classical VGG16 network structure, the classical network is truncated: only the 13 convolution layers and 5 maximum pooling layers are used as the feature extractor, and its fully connected layers are omitted. The 13 convolution layers are divided into 5 groups; the first and second groups each comprise two convolution layers, and the last three groups each comprise three convolution layers. All convolution layers use 3×3 convolution kernels, with the number of kernels gradually increasing from 64 to 512 (64, 128, 256, 512, 512) and the stride set to 1. The feature extraction portion ends with conv5_3 (the third convolution layer of the fifth group). A maximum pooling layer is arranged after each group of convolution layers; the kernel size of the five pooling layers is set to 2×2, the stride of all pooling layers is set to 2, and valid mode is adopted to avoid edge feature loss. The maximum pooling layers downsample the output of the convolution layers while retaining the most important feature information.
The feature extraction layer comprises 5 groups of feature extraction structures, the number of convolution layers in each group of structures is different, and the depth of the convolution layers is gradually increased from 2 convolution layers in the 1 st group to 3 convolution layers in the 5 th group. The design of the multi-group feature extraction structure can capture features of different levels, and higher-level feature representations are gradually abstracted from shallow layers to deep layers.
Each convolution layer uses a convolution kernel of 3×3 size, a design that helps capture local detail information in the image. At the same time, the number of kernels is gradually increased from 64 to 512, which means that the model can gradually learn more feature channels as feature extraction proceeds, enhancing the representation capability of the features.
The stride of each convolution layer is set to 1, which means that the convolution operation slides smoothly across the image without introducing large positional offsets. Furthermore, each feature extraction structure is followed by a maximum pooling layer with a kernel size of 2×2 and a stride of 2. The maximum pooling operation helps reduce the dimensionality of the features and the computational load, while also enhancing the robustness of the model and making it insensitive to small changes in the image. Pooling in valid mode avoids the loss of edge features and ensures that the model extracts meaningful features from the complete input image.
By the design, the feature extraction layer can effectively extract rich and hierarchical features from the input image.
(3) Network layer module for refining local characteristics
Inspired by the Transformer model, for the building atmosphere quantitative prediction task a refined local feature network layer module combining channel and spatial attention is designed, so that the model automatically pays more attention to key information when processing, rather than treating all information equally. After each group of convolution blocks, a refined local feature network layer module is added: attention maps are derived sequentially along two independent dimensions (channel and spatial), the correlations between channels and feature spatial positions are learned, and the attention maps are multiplied with the input feature map for adaptive feature refinement, so that detail information of the regions relevant to building atmosphere is better extracted. The module structure is shown in the figure.
(4) Adding a full connection layer and an output layer
Flatten layer: first, a Flatten layer is added to convert the output feature maps of the last maximum pooling layer into a one-dimensional vector.
Fully connected layers: two fully connected layers are then added. The number of neurons of the first fully connected layer is set to 1024 (more neurons can capture more feature combinations) and the number of neurons of the second fully connected layer is set to 512. A ReLU activation function follows both fully connected layers to mitigate the vanishing gradient problem, and Dropout is used to reduce the risk of overfitting, with the Dropout rate set to 0.5, i.e. 50% of the neurons are randomly ignored during each update to prevent over-adaptation to the training data.
Output layer: for the building atmosphere quantitative prediction task, the number of neurons of the output layer is set to 2, a sigmoid activation function is adopted, and the output range is [0, 1].
In some embodiments, the constructing of a refined local feature network layer module according to the channel attention mechanism and the spatial attention mechanism, as shown in fig. 2, specifically includes:
s201, creating a channel attention unit based on a pooling technology and an AWGN network, wherein the channel attention unit is used for carrying out feature recalibration on an original input feature map to obtain a first feature map.
In this embodiment, the original input feature map is first processed by pooling; the pooling operation reduces the dimensionality of the feature map, lowers the computational load and enhances the robustness of the model. The pooled feature map is then input into an AWGN (Adaptive Weight Generation Network) for feature recalibration: the AWGN computes an importance score for each channel of the feature map and weights the channels accordingly, thereby recalibrating the channel information of the feature map. This calibration mechanism can highlight the feature information of important channels, reduce the influence of irrelevant or redundant channels, and improve feature effectiveness and classification accuracy.
S202, creating a spatial attention unit based on the LCM module, wherein the spatial attention unit is used for obtaining a second characteristic diagram after carrying out characteristic recalibration on the first characteristic diagram.
In this embodiment, the first feature map is input to the LCM (Local Context Module) module for processing, and the LCM module captures the relevance of the spatial positions in the feature map by modeling the local context information, and then performs weighting processing on the spatial positions by calculating the importance score of each spatial position in the feature map, so as to recalibrate the spatial information of the feature map, and obtain the second feature map. The calibration mechanism can highlight the characteristic information of important spatial positions, reduce the influence of irrelevant or redundant spatial positions, and improve the effectiveness and classification accuracy of the characteristics.
S203, constructing a refined local feature network layer module according to the channel attention unit and the space attention unit, and adding the refined local feature network layer module behind each group of feature extraction structures.
In this embodiment, by adding a refined local feature network layer module behind each group of feature extraction structures, the feature representation capability of the model can be further enhanced, the accuracy of classification or recognition tasks is improved, and the feature graph can be subjected to multi-dimensional feature recalibration through the combined action of the channel attention unit and the space attention unit, so that the model is facilitated to better adapt to complex and changeable scenes, and the generalization capability of different data distribution is improved.
Further, the step S201, as shown in fig. 3, further includes:
s2011, respectively acquiring the statistical characteristics of each channel based on global average pooling and global standard deviation pooling, and carrying out characteristic fusion to acquire a first characteristic descriptor.
In this embodiment, unlike the convolution attention module (Convolutional Block Attention Module, CBAM), the present invention employs global averaging pooling (Global Average Pooling, GAP) and global standard deviation pooling (Global Standard Deviation Pooling, GSDP) to obtain statistical features for each channel, the purpose of which is to take into account feature distribution differences inside the channel. Global averaging pooling and global standard deviation pooling extract the characteristic information of each channel from both the mean and volatility aspects, respectively. The multi-dimensional feature extraction method can better capture the internal structure and mode of the input feature map, and provide richer feature representation for subsequent tasks.
GAP converts an (H×W×C) feature map into a (1×1×C) vector, capturing the average information of each channel. For an input feature map $F \in \mathbb{R}^{H \times W \times C}$, where C is the number of channels, H the height and W the width, global average pooling is calculated as follows:
for each channel c, the global average is
$$\mu_c = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} F_{i,j,c}.$$
In this way a vector $\mu \in \mathbb{R}^{C}$ is obtained, each element of which represents the global average of the corresponding channel.
GSDP likewise converts the (H×W×C) feature map into a (1×1×C) vector, capturing the degree of dispersion within each channel.
Using the global average $\mu_c$ calculated above, the standard deviation of each channel is computed:
$$\sigma_c = \sqrt{\frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} \left(F_{i,j,c} - \mu_c\right)^2}.$$
Similar to global average pooling, this yields a vector $\sigma \in \mathbb{R}^{C}$, each element of which represents the global standard deviation of the corresponding channel.
$\mu$ and $\sigma$ are concatenated (stitched) along the channel dimension, generating a fused descriptor $z = [\mu; \sigma] \in \mathbb{R}^{2C}$.
S2012, generating the channel attention weight of each channel through the first feature descriptor based on the AWGN network.
In this embodiment, once the descriptor $z$ is obtained, it can be used to calculate an attention weight for each channel. These statistics are fed into a small network (two fully connected layers) that learns to generate a weight for each channel. The first fully connected layer reduces the descriptor vector to (1×1×(C/r)), where r is the reduction ratio used to control model complexity; the second fully connected layer then increases the dimension from (1×1×(C/r)) back to (1×1×C).
The first fully connected layer reduces $z$ to a feature vector
$$s = \mathrm{ReLU}(W_1 z + b_1) \in \mathbb{R}^{C/r},$$
where $W_1$ is the weight of the fully connected layer, $b_1$ is a bias term, ReLU is a nonlinear activation function, and r is the compression ratio.
The second fully connected layer then increases the dimension of the feature vector $s$ back to the original channel number C, yielding the channel attention weight, and the generated weights are activated with a sigmoid function:
$$w = \mathrm{sigmoid}(W_2 s + b_2),$$
where $W_2$ is the weight of the fully connected layer, $b_2$ is a bias term, and sigmoid is the activation function used to normalize the weights to the (0, 1) interval.
The generated attention weight of the channel can carry out weighting processing on each channel in the feature map, and important information of the feature map can be further highlighted by giving larger weight to important channels and smaller weight to unimportant or redundant channels, so that influence of irrelevant or redundant information is reduced.
S2013, acquiring an original input feature map, and carrying out feature recalibration on the original input feature map according to the channel attention weight of each channel to obtain a first feature map.
In this embodiment, the activated weights are multiplied with the original input feature map to perform channel-level feature recalibration, enhancing the model's attention to information-rich regions, i.e.
$$F' = F \otimes w,$$
where $F'$ denotes the first feature map, $F$ denotes the original input feature map, and $w$ denotes the channel attention weights.
Each channel in the original input feature map may be weighted by using a channel attention weight. The weighting processing can highlight the characteristic information of important channels, inhibit the characteristic information of unimportant or redundant channels, facilitate the model to better understand and extract the internal structure of the characteristic diagram, and improve the accuracy of classification or recognition tasks.
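A minimal PyTorch sketch of the channel attention unit described in S2011-S2013 follows. The reduction ratio r = 16 and the assumption that the fused 2C-dimensional descriptor z feeds the first fully connected layer directly are illustrative choices, not values fixed by the text.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """GAP + GSDP statistics -> two fully connected layers -> sigmoid channel weights."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        # The fused (2C) descriptor is assumed to feed the first FC layer directly.
        self.fc1 = nn.Linear(2 * channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):                          # x: (N, C, H, W)
        mu = x.mean(dim=(2, 3))                    # global average pooling
        sigma = x.std(dim=(2, 3))                  # global standard deviation pooling
        z = torch.cat([mu, sigma], dim=1)          # fused descriptor, shape (N, 2C)
        w = torch.relu(self.fc1(z))                # reduce to C/r
        w = torch.sigmoid(self.fc2(w))             # restore to C, normalize to (0, 1)
        return x * w.view(x.size(0), -1, 1, 1)     # channel-wise recalibration F' = F ⊗ w
```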
Further, the step S202, as shown in fig. 4, further includes:
s2021, analyzing the local structure of each spatial position of the original input feature map based on the LCM module to obtain a local correlation feature map of each spatial position.
In this embodiment, to enhance the performance of the VGG16 model in the building atmosphere quantification prediction task by enhancing the features of locally relevant regions in the image, for spatial attention mechanism design, we employ a local correlation module (Local Correlation Module, LCM) to better capture the dependency between local features by analyzing the local structure of each location in the image, thereby generating a spatial attention map that can be used to emphasize or suppress the feature representation of a particular region.
Unlike CBAM (where the spatial attention is more focused on global context features), the spatial attention of LCM is more focused on correlations between local features for each location, computing its correlation to all locations within the surrounding neighborhood. This is achieved by a convolution operation in which the size of the convolution kernel defines the extent of the local neighborhood, here a 3 x 3 convolution kernel is used, aiming to promote the response to the detail features by strengthening the correlation of the local region. The obtained local features are subjected to autocorrelation operation, and element-by-element squaring operation is utilized to emphasize correlation in the local features.
For an input feature map $F' \in \mathbb{R}^{H \times W \times C}$, we apply a set of 3×3 convolution kernels for local feature extraction. These convolution kernels capture the local pattern and structure of each position:
$$L = \phi(W_{conv} * F' + b_{conv}),$$
where $W_{conv}$ is the weight parameter of the convolution kernel, $b_{conv}$ is a bias parameter, $*$ denotes the convolution operation, and $\phi$ is an activation function. The number of output channels of $L$ is usually set equal to the number of channels of the input feature map (C) to maintain consistency in the channel dimension.
An autocorrelation operation is performed on the local features $L$; this may be an element-wise squaring operation or a multiplication between features. This step strengthens the local features in the image:
$$R = L \odot L,$$
where $\odot$ denotes the Hadamard product (element-wise multiplication), which emphasizes the correlation within the local features.
The LCM module can analyze each spatial position of the original input feature map, extract local structural features of the spatial position, sense local structural information of each spatial position in the feature map, capture detailed information such as edges and textures in an image better, help the model to understand image content better, help the model to understand positions and relative relations of objects in the image better, and improve accuracy of classification or recognition tasks.
S2022, integrating the cross-channel information of the local correlation feature map of each spatial position, and obtaining a spatial attention map through a sigmoid function.
In this embodiment, these normalized weights are used to weight the features of the local neighborhood to strengthen the highly correlated features.
The final step is to generate the spatial attention map: another 1×1 convolution is applied to $R$, followed by a sigmoid activation function. This maps the feature vector at each position to a single scalar value between 0 and 1 representing the importance of that position; these correlation weights are normalized by the sigmoid function:
$$A = \mathrm{sigmoid}(W_{1\times1} * R + b),$$
where $W_{1\times1}$ is the weight of the 1×1 convolution, $b$ is the bias, sigmoid is the activation function, $A$ is the finally activated weight, and $R$ is the correlation feature after cross-channel information integration.
By integrating the cross-channel information of the local related feature graphs of each spatial position, feature information of different channels can be fused, richer and comprehensive feature representations are extracted, the model is facilitated to better understand the internal structure and mode of the input feature graphs, the accuracy of classification or recognition tasks is improved, the attention degree of the model to different spatial positions is reflected, and the model is facilitated to better focus on important areas and details.
S2023, obtaining a correlation weight of each spatial position based on the spatial attention map, and obtaining a second feature map after performing feature recalibration on the first feature map according to the correlation weight of each spatial position.
In this embodiment, the activated weights are multiplied with the first feature map to perform position-wise feature recalibration, enhancing the model's attention to information-rich regions:
$$F'' = F' \otimes A,$$
where $F''$ denotes the second feature map, $F'$ denotes the first feature map, and $A$ denotes the finally activated correlation weights.
By using the correlation weight of each spatial position in the spatial attention map, the features in the first feature map can be weighted, the feature information of important spatial positions can be highlighted, and the feature information of unimportant or redundant spatial positions is restrained, so that the model is facilitated to better understand the global structure and mode in the feature map, and the expressive power of the model is further enhanced.
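Along the same lines, a sketch of the spatial attention unit of S2021-S2023 and of the combined refined local feature module is given below; the module names are illustrative, and ChannelAttention refers to the previous sketch.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """LCM-style spatial attention: 3x3 local convolution, element-wise
    self-correlation, 1x1 cross-channel integration, sigmoid weighting."""
    def __init__(self, channels):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.fuse = nn.Conv2d(channels, 1, kernel_size=1)   # cross-channel integration

    def forward(self, x):                     # x: the first (channel-recalibrated) map
        local = torch.relu(self.local(x))     # local structure at each position
        corr = local * local                  # Hadamard self-correlation R = L ⊙ L
        attn = torch.sigmoid(self.fuse(corr)) # spatial attention map A, shape (N, 1, H, W)
        return x * attn                       # spatial recalibration F'' = F' ⊗ A

class RefinedLocalFeature(nn.Module):
    """Channel attention followed by spatial attention, added after each conv group."""
    def __init__(self, channels):
        super().__init__()
        self.channel = ChannelAttention(channels)   # from the previous sketch
        self.spatial = SpatialAttention(channels)

    def forward(self, x):
        return self.spatial(self.channel(x))
```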
S103, after preprocessing the building image dataset, training the initial building atmosphere classification model according to the preprocessed building image dataset to obtain a target building atmosphere classification model.
In this embodiment, preprocessing the building image dataset specifically includes:
(1) Data enhancement is carried out on the collected building images: the image data is augmented by rotation, scaling, shearing, horizontal flipping and the like, so that the model can learn more robust features and its generalization capability for different environments and conditions is improved.
1.1) Rotation
A random angle is selected and the image is rotated by that angle, with the angle limited to [-30°, 30°]; the rotation operation helps the model learn position invariance. For a rotation angle θ, each point (x, y) on the image is transformed to (x', y'), where $x' = x\cos\theta - y\sin\theta$ and $y' = x\sin\theta + y\cos\theta$.
1.2) Scaling
The image is scaled with a random scaling factor limited to the range [0.8, 1.2]; the scaled image is then readjusted to the network input size.
1.3) Shearing
A shearing transformation is applied to the image to change the perspective of the image content. The shear transformation can be represented by an affine transformation matrix and is typically applied in the horizontal or vertical direction.
1.4) Horizontal flipping
Flipping the image horizontally with a certain probability provides more data diversity.
(2) Performing format conversion and size adjustment on the image: converting the data type of the image into floating point type float32, and resizing the image to ensure that its size matches the model input size (the improved VGG16 requires 1024×1024 pixel inputs).
(3) Mean-standard deviation normalization of the enhanced and resized image data: the mean shift of the data is removed and the mean of the feature data is adjusted to 0. For each color channel (RGB), the global average computed over a large dataset (ImageNet) is subtracted from each pixel value, and the result is divided by the standard deviation. This approach ensures that the distribution of the input data matches the distribution used by the pre-trained model, helping the model generalize better to new data. The mean and standard deviation of the pre-trained VGG16 model on ImageNet are used:
average (RGB order) [123.68, 116.779, 103.939]
Standard deviation (RGB sequence) [58.393, 57.12, 57.375]
For each color channel, the normalization is calculated as follows:
$$x'_c(x, y) = \frac{x_c(x, y) - \mu_c}{\sigma_c},$$
where $x_c(x, y)$ is the pixel value of color channel c (red R, green G or blue B) at position (x, y), $\mu_c$ is the mean of that color channel on the ImageNet dataset, and $\sigma_c$ is the standard deviation of that color channel on the ImageNet dataset.
Through the process, each feature of the building image data set is processed by zero mean value and unit standard deviation, so that the convergence speed of a gradient descent algorithm is accelerated when a convolutional neural network model is trained, the variance of parameter updating is reduced, the consistency of different feature scales is maintained, and adverse effects on the model due to overlarge weights of certain features are prevented; meanwhile, by reducing the correlation among input features, the average value standardization is beneficial to model learning of more robust features, so that the generalization capability of the model is improved.
The data preprocessing is beneficial to ensuring the quality and consistency of input data, and the step is beneficial to improving the stability and generalization capability of the model.
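Assuming a torchvision pipeline is acceptable, the preprocessing described above might be expressed as follows; the shear range and flip probability are not specified in the text and are illustrative assumptions.

```python
import torchvision.transforms as T

# ImageNet channel statistics quoted above, rescaled from [0, 255] to [0, 1]
# because ToTensor() scales pixel values to [0, 1].
IMAGENET_MEAN = [123.68 / 255, 116.779 / 255, 103.939 / 255]
IMAGENET_STD = [58.393 / 255, 57.12 / 255, 57.375 / 255]

train_transform = T.Compose([
    T.RandomRotation(30),                                    # rotation in [-30°, 30°]
    T.RandomAffine(degrees=0, scale=(0.8, 1.2), shear=10),   # scaling and shearing
    T.RandomHorizontalFlip(p=0.5),                           # horizontal flipping
    T.Resize((1024, 1024)),                                  # match the model input size
    T.ToTensor(),                                            # float32 tensor in [0, 1]
    T.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),       # zero mean, unit std
])
```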
Further, in the step S103, training the initial building atmosphere classification model according to the preprocessed building image dataset to obtain a target building atmosphere classification model, which specifically includes:
inputting the preprocessed building image data set into the initial building atmosphere classification model;
Performing network training on the initial building atmosphere classification model according to tuning parameters and updating rules of the SGD optimizer, and simultaneously evaluating and calibrating accuracy, recall rate and specificity of an output result of the network training to obtain a target building atmosphere classification model;
the tuning parameters include: the initial learning rate is 0.001, the momentum is 0.9, the learning rate decay is 0.000001, the loss is calculated by using a binary cross entropy loss function, and the total training is 300 rounds;
the update rule satisfies $\theta_{t+1} = \theta_t - \eta \nabla_{\theta} J(\theta_t)$, wherein $\theta_{t+1}$ denotes the parameters at the (t+1)-th iteration, $\theta_t$ denotes the parameters at the t-th iteration, $\eta$ denotes the learning rate, which controls the step size of the parameter update, and $\nabla_{\theta} J(\theta_t)$ denotes the gradient of the loss function $J$ with respect to the parameters $\theta$.
In this embodiment, the preprocessed building images are input into the network for training and optimized with the SGD optimizer, with an initial learning rate of 0.001, a momentum of 0.9 and a learning rate decay of 0.000001; the loss is calculated using the binary cross entropy loss function, and training runs for 300 rounds in total, so that the building atmosphere quantitative prediction model is learned by training the neural network model.
SGD (Stochastic Gradient Descent) is one of the most basic optimization algorithms in deep learning; it is used to minimize the loss function and thereby update the parameters of the model. The basic idea of SGD is to make the loss function gradually converge to a minimum through iterative parameter updates.
In a specific implementation, momentum is introduced so that a weighted average of past gradients is taken into account in the update; this helps accumulate velocity along the gradient direction and improves convergence. Meanwhile, a learning rate scheduling method is adopted: the learning rate is dynamically adjusted and gradually reduced as training proceeds, allowing finer parameter adjustment. In the choice of loss function, since this is a two-class task, the binary cross entropy (Binary Crossentropy) loss function is adopted, with the expression
$$J = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right],$$
where $N$ is the number of samples, $y_i$ is the actual label (0 or 1) of the i-th sample, and $\hat{y}_i$ is the model's predicted output for the i-th sample.
During the training process, the model will try to minimize the binary cross entropy loss function to optimize the model parameters to better adapt to the building atmosphere quantization prediction task.
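A sketch of this training configuration in PyTorch is shown below. The stand-in train_loader, the float labels, and the LambdaLR reading of the stated learning-rate decay (interpreted as Keras-style per-step decay) are assumptions; ImprovedVGG16 refers to the architecture sketch above.

```python
import torch

model = ImprovedVGG16()                              # from the architecture sketch above
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# The stated learning-rate decay of 0.000001 is read as Keras-style per-step decay,
# approximated here with a LambdaLR schedule.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: 1.0 / (1.0 + 1e-6 * step))
criterion = torch.nn.BCELoss()                       # binary cross entropy on sigmoid outputs

train_loader = [(torch.rand(2, 3, 1024, 1024),       # stand-in for a real DataLoader of
                 torch.rand(2, 2))]                  # preprocessed images and float labels

for epoch in range(300):                             # 300 training rounds in total
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()                             # theta_{t+1} = theta_t - eta * grad J
        scheduler.step()
```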
Meanwhile, Accuracy, Recall and Specificity are used as model performance evaluation indexes, calculated as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}, \quad \mathrm{Specificity} = \frac{TN}{TN + FP},$$
where TP denotes the number of correctly predicted calm-atmosphere images, FP denotes the number of standard control images incorrectly predicted as calm-atmosphere images, TN denotes the number of correctly predicted standard control images, and FN denotes the number of calm-atmosphere images incorrectly predicted as standard control images. Accuracy is the proportion of correctly classified images among all predicted images, and reflects the reliability of the model's quantitative building atmosphere prediction when the class data are balanced; the computed value lies between 0 and 1 and represents the overall classification accuracy of the model, with higher accuracy generally indicating better performance. Recall is the proportion of correctly predicted calm-atmosphere images among all true calm-atmosphere images, and reflects the model's accuracy in predicting calm-atmosphere building images. Specificity is the proportion of correctly predicted standard control images among all true standard control images, and reflects the model's accuracy in predicting standard control building images. On the basis of the Accuracy index, the Recall and Specificity indexes are added to evaluate model performance more comprehensively.
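These three indexes follow directly from the confusion-matrix counts; a small illustrative helper:

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int):
    """Accuracy, Recall and Specificity from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # overall correctness
    recall = tp / (tp + fn)                      # hit rate on calm-atmosphere images
    specificity = tn / (tn + fp)                 # hit rate on standard control images
    return accuracy, recall, specificity

print(classification_metrics(tp=45, fp=5, tn=40, fn=10))  # e.g. (0.85, 0.818..., 0.888...)
```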
S104, according to the target building atmosphere classification model, combined with a gradient-weighted class activation mapping method, obtaining a result of whether the input building image conforms to the calm atmosphere and the degree of influence of each feature of the input building image on that result.
In this embodiment, after a new building environment image is preprocessed, it is input into the trained model to obtain the quantized prediction output of the building atmosphere.
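A sketch of this inference step is shown below, assuming PyTorch and torchvision; the stand-in network, the placeholder image, and the class ordering are illustrative assumptions (in practice the trained improved-VGG16 weights and a real photograph would be used):

```python
import torch
import torch.nn as nn
from PIL import Image
from torchvision import transforms

# Stand-in for the trained building atmosphere classification model.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 1024 * 1024, 2), nn.Softmax(dim=1))

preprocess = transforms.Compose([
    transforms.Resize((1024, 1024)),  # match the 1024x1024 RGB input layer
    transforms.ToTensor(),
])

# In practice: image = Image.open("new_building.jpg").convert("RGB")
image = Image.new("RGB", (1920, 1080))  # placeholder building environment image
batch = preprocess(image).unsqueeze(0)  # shape (1, 3, 1024, 1024)

model.eval()
with torch.no_grad():
    probs = model(batch)[0]  # assumed order: [calm atmosphere, standard control]
print(f"calm atmosphere probability: {probs[0]:.3f}")
```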
After the prediction is output, a Gradient-weighted Class Activation Mapping (Grad-CAM) method is applied, without changing the architecture or retraining, to provide a more detailed model interpretation: gradient information is used to visualize the regions of the input image that the model attends to. The procedure comprises the following steps (a code sketch follows after the summary below):
(1) Gradient information: after a forward pass, the gradient of the output score for a particular class with respect to the feature maps of the target convolutional layer is recorded via back-propagation. These gradients represent how the output score varies with the convolutional feature maps.
(2) Global average pooling: global average pooling is applied to the gradient information to obtain a weight for each feature map, representing the importance of that feature map to the target category.
(3) Weighted sum: each feature map is multiplied by its corresponding weight, and all feature maps are then summed. This weighted sum indicates which regions of the input image play a key role in the model's final classification decision.
(4) ReLU activation: a ReLU activation function is applied to the weighted sum to exclude the influence of negative values on the final visualization.
(5) Upsampling: the resulting map is upsampled to the same size as the input image to obtain the Grad-CAM visualization result.
The output of Grad-CAM is a heat map showing the regions of the input building environment image that the model focuses on. This helps interpret the model's atmosphere perception decisions and provides a visual understanding of the calm atmosphere classification result.
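A minimal sketch of steps (1) to (5), assuming PyTorch, is given below; the tiny two-layer network stands in for the improved VGG16, and in practice the feature maps would be taken from the last convolutional layer of the trained model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in feature extractor and classification head.
conv = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                     nn.Conv2d(8, 8, 3, padding=1), nn.ReLU())
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2))

image = torch.rand(1, 3, 64, 64)  # illustrative input image

features = conv(image)        # feature maps of the target convolutional layer
features.retain_grad()        # (1) keep gradients of the score w.r.t. the maps
score = head(features)[0, 0]  # assumed index 0 = "calm atmosphere" class score
score.backward()

weights = features.grad.mean(dim=(2, 3), keepdim=True)  # (2) global average pooling
cam = (weights * features).sum(dim=1, keepdim=True)     # (3) weighted sum of maps
cam = F.relu(cam)                                       # (4) ReLU on the weighted sum
cam = F.interpolate(cam, size=image.shape[2:],          # (5) upsample to input size
                    mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize heat map to [0, 1]
print(cam.shape)  # torch.Size([1, 1, 64, 64])
```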
The model output thus gives the machine's judgment of whether the building conforms to the calm atmosphere, while the heat map output by the model can be interpreted to determine how strongly different features influence the building atmosphere. Based on the model's quantitative prediction and visualization results, substantial guidance can be provided for building design and for improving building environment quality and user experience.
The above steps constitute the main algorithm flow of the building atmosphere quantitative prediction and evaluation method based on the improved VGG16 convolutional neural network model. By combining quantized prediction with the improved VGG16 model and visualization via the gradient-weighted class activation mapping method, the algorithm analyzes building environment perception data more comprehensively and objectively, improves the accuracy and practicality of atmosphere quantization prediction, and provides guidance for improving building environment quality.
Referring to fig. 5, an embodiment of the present invention provides a building atmosphere assessment system 5 based on an improved VGG16 model, the system 5 specifically comprising:
an image data acquisition module 501, configured to acquire a building image dataset, where the building image dataset includes a calm atmosphere group building image dataset and a standard control group building image dataset;
a first model generating module 502, configured to construct a network structure for improving the VGG16 model, and generate an initial architectural atmosphere classification model;
a second model generating module 503, configured to perform preprocessing on the building image dataset, and train the initial building atmosphere classification model according to the preprocessed building image dataset to obtain a target building atmosphere classification model;
and a model output result module 504, configured to obtain, according to the target building atmosphere classification model combined with a gradient-weighted class activation mapping method, a result of whether the input building image conforms to the calm atmosphere and the degree of influence of each feature of the input building image on that result.
It is understood that the contents of the embodiment of the building atmosphere evaluation method based on the improved VGG16 model shown in fig. 1 are applicable to this embodiment of the building atmosphere evaluation system based on the improved VGG16 model: the functions specifically implemented by the system embodiment are the same as those of the method embodiment shown in fig. 1, and it achieves the same advantageous effects as that method embodiment.
It should be noted that, because the content of information interaction and execution process between the above systems is based on the same concept as the method embodiment of the present invention, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the system is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
Referring to fig. 6, an embodiment of the present invention further provides a computer device 6, including: a memory 602, a processor 601, and a computer program 603 stored on the memory 602; when the computer program 603 is executed on the processor 601, it implements the building atmosphere evaluation method based on the improved VGG16 model described in any of the above methods.
The computer device 6 may be a desktop computer, a notebook computer, a palm computer, a cloud server, or the like. The computer device 6 may include, but is not limited to, a processor 601, a memory 602. It will be appreciated by those skilled in the art that fig. 6 is merely an example of computer device 6 and is not intended to be limiting of computer device 6, and may include more or fewer components than shown, or may combine certain components, or different components, such as may also include input-output devices, network access devices, etc.
The processor 601 may be a central processing unit (Central Processing Unit, CPU), and may also be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 602 may in some embodiments be an internal storage unit of the computer device 6, such as a hard disk or a memory of the computer device 6. The memory 602 may also be an external storage device of the computer device 6 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the computer device 6. Further, the memory 602 may also include both an internal storage unit and an external storage device of the computer device 6. The memory 602 is used to store an operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory 602 may also be used to temporarily store data that has been output or is to be output.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when being run by a processor, implements the building atmosphere evaluation method based on the improved VGG16 model according to any one of the above methods.
In this embodiment, the integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application implements all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program; the computer program may be stored in a computer readable storage medium, and when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing device/terminal apparatus, a recording medium, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer readable media may not include electrical carrier signals or telecommunications signals.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not detailed or illustrated in a particular embodiment, reference may be made to the related descriptions of the other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments disclosed in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Claims (10)

1. The building atmosphere evaluation method based on the improved VGG16 model is characterized by comprising the following specific steps of:
acquiring a building image data set, wherein the building image data set comprises a calm atmosphere group building image set and a standard control group building image set;
constructing a network structure of an improved VGG16 model, and generating an initial building atmosphere classification model;
after preprocessing the building image data set, training the initial building atmosphere classification model according to the preprocessed building image data set to obtain a target building atmosphere classification model;
and, according to the target building atmosphere classification model combined with a gradient-weighted class activation mapping method, obtaining a result of whether the input building image conforms to the calm atmosphere and the degree of influence of each feature of the input building image on that result.
2. The method according to claim 1, wherein the constructing a network structure that improves the VGG16 model specifically comprises:
replacing the original VGG network input layer, which receives 224×224 pixel RGB three-channel images, with a VGG network input layer that receives 1024×1024 pixel RGB three-channel images;
intercepting the original VGG16 model to obtain intercepted content, wherein the intercepted content comprises 13 convolution layers and 5 maximum pooling layers, and constructing a feature extraction layer from the intercepted content, the feature extraction layer being used for extracting feature information of the building image;
constructing a refined local feature network layer module according to a channel attention mechanism and a spatial attention mechanism, wherein the refined local feature network layer module is used for acquiring key local features of the feature information;
adding a Flatten layer for each maximum pooling layer, the Flatten layer being used for converting the output feature map of the maximum pooling layer into a one-dimensional vector;
adding a first full-connection layer and a second full-connection layer, setting the number of neurons of the first full-connection layer to 1024, setting the number of neurons of the second full-connection layer to 512, and using the first full-connection layer and the second full-connection layer to receive one-dimensional vectors output by the Flatten layer and perform classification processing;
adding an output layer after the first full-connection layer and the second full-connection layer, wherein the number of neurons of the output layer is set to 2, and a Softmax activation function with an output range of [0,1] is adopted.
3. The method of claim 2, wherein the feature extraction layer comprises 5 groups of feature extraction structures, the 1st and 2nd groups each comprising 2 convolution layers, and the 3rd, 4th and 5th groups each comprising 3 convolution layers; each convolution layer uses a convolution kernel of size 3×3, the number of kernels gradually increases from 64 to 512, and the step size is set to 1; each group of feature extraction structures is followed by 1 maximum pooling layer, the kernel size of each maximum pooling layer is set to 2×2, the step size is set to 2, and a valid mode is employed to avoid edge feature loss.
4. A method according to claim 3, wherein the building of the refined local feature network layer module according to the channel attention mechanism and the spatial attention mechanism comprises:
creating a channel attention unit based on a pooling technology and an AWGN network, wherein the channel attention unit is used for carrying out feature recalibration on an original input feature map to obtain a first feature map;
creating a spatial attention unit based on an LCM module, wherein the spatial attention unit is used for obtaining a second feature map after carrying out feature recalibration on the first feature map;
and constructing a refined local feature network layer module according to the channel attention unit and the space attention unit, and adding the refined local feature network layer module behind each group of feature extraction structures.
5. The method of claim 4, wherein the creating a channel attention unit based on a pooling technique and an AWGN network, the channel attention unit configured to obtain a first feature map after feature recalibrating an original input feature map, specifically includes:
respectively acquiring statistical characteristics of each channel based on global average pooling and global standard deviation pooling, and carrying out characteristic fusion to acquire a first characteristic descriptor;
generating a channel attention weight of each channel through the first feature descriptor based on an AWGN network;
and acquiring an original input feature map, and carrying out feature recalibration on the original input feature map according to the channel attention weight of each channel to obtain a first feature map.
6. The method of claim 4, wherein the creating a spatial attention unit based on an LCM module, the spatial attention unit being used for obtaining a second feature map after feature recalibration of the first feature map, specifically comprises:
Analyzing the local structure of each spatial position of the original input feature map based on an LCM module to obtain a local correlation feature map of each spatial position;
integrating the cross-channel information of the local correlation characteristic map of each spatial position and passing throughThe function obtains a spatial attention map;
and obtaining a correlation weight of each spatial position based on the spatial attention map, and obtaining a second characteristic map after carrying out characteristic recalibration on the first characteristic map according to the correlation weight of each spatial position.
7. The method according to any one of claims 1 to 6, wherein training the initial building atmosphere classification model according to the preprocessed building image dataset to obtain a target building atmosphere classification model specifically comprises:
inputting the preprocessed building image data set into the initial building atmosphere classification model;
performing network training on the initial building atmosphere classification model according to tuning parameters and updating rules of the SGD optimizer, and simultaneously evaluating and calibrating accuracy, recall rate and specificity of an output result of the network training to obtain a target building atmosphere classification model;
the tuning parameters include: the initial learning rate is 0.001, the momentum is 0.9, the learning rate decay is 0.000001, the loss is calculated by using a binary cross entropy loss function, and the total training is 300 rounds;
the update rule satisfies $\theta_{t+1} = \theta_t - \eta \nabla_{\theta} J(\theta_t)$, where $\theta_{t+1}$ denotes the parameters at the $(t+1)$-th iteration, $\theta_t$ denotes the parameters at the $t$-th iteration, $\eta$ denotes the learning rate, i.e. the step size controlling the parameter update, and $\nabla_{\theta} J(\theta_t)$ denotes the gradient of the loss function $J$ with respect to the parameters.
8. Building atmosphere evaluation system based on improved VGG16 model, characterized in that the system specifically comprises:
the image data acquisition module is used for acquiring a building image dataset, wherein the building image dataset comprises a calm atmosphere group building image set and a standard control group building image set;
the first model generation module is used for constructing a network structure for improving the VGG16 model and generating an initial building atmosphere classification model;
the second model generation module is used for training the initial building atmosphere classification model according to the preprocessed building image data set after preprocessing the building image data set to obtain a target building atmosphere classification model;
and the model output result module is used for acquiring a result of whether the input building image accords with the calm atmosphere or not and the influence degree of each feature of the input building image on the calm atmosphere result according to the target building atmosphere classification model and by combining a gradient weighting type activation mapping method.
9. A computer device, comprising: memory and processor and computer program stored on the memory, which when executed on the processor, implements the construction atmosphere assessment method based on the improved VGG16 model according to any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, implements the building atmosphere assessment method based on the modified VGG16 model as claimed in any one of claims 1 to 7.