CN113591699B - Online visual fatigue detection system and method based on deep learning - Google Patents


Info

Publication number
CN113591699B
Authority
CN
China
Prior art keywords
network
module
data
eye movement
face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110869724.XA
Other languages
Chinese (zh)
Other versions
CN113591699A (en)
Inventor
牛毅
张子楠
马明明
李甫
石光明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202110869724.XA
Publication of CN113591699A
Application granted
Publication of CN113591699B
Legal status: Active

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses an online visual fatigue detection system based on deep learning, which mainly addresses three problems of the prior art: the limited information collected about the operator, poor real-time performance, and low fatigue detection accuracy. The system comprises a data acquisition module, an image data processing module and a fatigue detection module. The data acquisition module, mounted directly below the computer display, collects eye movement data, RGB images and depth information. The image data processing module detects the face position and facial feature points in the image data and extracts the depth information at those feature points. The fatigue detection module extracts, fuses and classifies features from the eye movement data, facial feature point data and depth data, and outputs the operator's fatigue degree. Because the invention uses a non-contact method, it reduces interference with the operator's working state; because it avoids hand-designed features, it improves the accuracy of visual fatigue detection. It can be used to detect an operator's visual fatigue level online in real time.

Description

Online visual fatigue detection system and method based on deep learning
Technical Field
The invention belongs to the technical field of computer vision and video analysis, and further relates to an online visual fatigue detection system and method which can be used for online real-time detection of the visual fatigue level of an operator.
Background
As society develops, computers have come into use across virtually every industry: more and more jobs require practitioners to be skilled with computers and to use them for long periods. Such work is physically light but monotonous and repetitive, which easily induces visual fatigue, reduces the operator's working capacity, and greatly lowers working efficiency. Moreover, when the body remains in a state of visual fatigue, symptoms such as inattention, dry and sore eyes, and dizziness readily appear. How to detect an operator's visual fatigue and intervene effectively in time, so that the operator maintains a good working state and task performance, is therefore a matter of real concern. Existing visual fatigue detection approaches fall into two categories according to the information they use: physiological information, chiefly EEG and heart-rate signals, and visual information, chiefly eye movement and facial data. Methods based on physiological information can reach higher detection accuracy than those based on visual information, but the equipment involved is contact-based or even invasive, which interferes with the operator's working state and itself increases the operator's fatigue.
A fatigue degree detection method and device is disclosed in a patent application from the national microelectronics institute in Kunshan, China (application No. 201811360966.0, publication No. CN109657550A, filed November 15, 2018). The method first captures a video segment, detects the face images in it along the time dimension, extracts feature points from several facial regions, computes the eye closure, mouth opening and nodding frequency at each moment from those feature points, and compares them against thresholds to determine the operator's fatigue degree. Its drawback is that both the features and the thresholds are hand-designed, so the quality of the features and the choice of thresholds directly affect the final fatigue detection result. The disclosed device comprises a shooting module, a detection module, an extraction module and a determination module: the shooting module captures video segments; the detection module detects each face image in them along the time dimension; the extraction module extracts feature points of several regions from each detected face image; and the determination module determines the person's fatigue degree from those feature points. This device has two shortcomings: first, it takes only video as input, a single data type lacking other useful information, so the fatigue judgement is inaccurate and easily affected by illumination; second, it uses only 2D image data without depth information, and therefore cannot represent head pose.
Nanjing University, in a patent application (application No. 202010522475.2, publication No. CN111428699A, filed in 2020), discloses a driving fatigue detection method and system combining a pseudo-3D convolutional neural network with an attention mechanism. The disclosed system comprises a video acquisition and clipping module, a driving fatigue detection module and a display module: the video acquisition and clipping module captures a real-time video stream of the driver's upper body, the driving fatigue detection module estimates the driver's fatigue degree, and the display module shows the input video, the detected fatigue state, and a warning once driving fatigue is detected. Its shortcoming is that only video stream information is collected; this single information type cannot comprehensively reflect the operator's fatigue state. The disclosed method first extracts and preprocesses the driving video frame sequence; then uses a pseudo-3D convolution module for spatio-temporal feature learning; then builds a P3D-Attention module that applies attention to the channels and feature maps; and finally classifies with a 2D global average pooling layer and a Softmax layer. Because the method operates directly on image sequences, the network has many parameters and much redundant information, so real-time performance is poor; meeting real-time requirements places high demands on the hardware.
Disclosure of Invention
To remedy the shortcomings of the prior art, the invention provides an online visual fatigue detection system and method based on deep learning that acquires rich physiological information about the operator, improves computational efficiency, and raises both the real-time performance and the accuracy of fatigue detection.
The idea behind the invention is as follows: because contact-based visual fatigue detection systems increase the operator's fatigue wherever the equipment touches the body, non-contact hardware is adopted to reduce the impact on the operator's working state; because non-contact systems typically collect only a single kind of data, the data acquisition module collects the operator's eye movement, image and depth data, providing richer information for judging the visual fatigue state; because real-time operation is otherwise hard to achieve, computer vision methods convert the image data into text data, improving computational efficiency; and because hand-designing features is difficult, a deep learning method is applied end to end, avoiding manual feature design and greatly improving accuracy.
According to the above thought, the technical scheme of the invention is as follows:
1. An online visual fatigue detection system based on deep learning, comprising a data acquisition module, an image data processing module and a fatigue detection module, characterized in that:
the data acquisition module is arranged directly below the computer display and comprises an eye movement data acquisition sub-module and an RGB image and depth data acquisition sub-module, used respectively to collect eye movement data and RGB image and depth information; the eye movement data collected by the eye movement data acquisition sub-module are input to the fatigue detection module, and the RGB images and depth information collected by the RGB image and depth data acquisition sub-module are input to the image data processing module;
the image data processing module comprises a face detection sub-module, a facial feature point extraction sub-module and a depth information extraction sub-module, used respectively to detect the position of the face in the image data, detect the facial feature points in the image data, and extract the depth information at those feature points; the facial feature point data and depth data output by the image data processing module are input to the fatigue detection module;
the fatigue detection module comprises a time sequence eye movement network, a spatial face network, a spatial depth network, a feature fusion network and a visual fatigue detection network, where the time sequence eye movement network, spatial face network and spatial depth network are connected in parallel and then cascaded in turn with the feature fusion network and the visual fatigue detection network; the module performs feature extraction, feature fusion and classification on the eye movement data, facial feature point data and depth data by deep learning, and outputs the operator's fatigue degree.
Further, the structure and parameters of each network in the fatigue detection module are as follows:
the time sequence eye movement network has the structure of an input layer, a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer and a fifth convolution layer;
the spatial face network has the structure of an input layer, a 1 st convolution layer, a 2 nd convolution layer, a 3 rd convolution layer, a 4 th convolution layer, a 5 th convolution layer and a 6 th convolution layer;
the spatial depth network has the structure of an input layer, an I convolution layer, an II convolution layer and an III convolution layer;
the feature fusion network has the structure of a convolution layer, a first full-connection layer and a second full-connection layer, wherein the size of the first full-connection layer is l 1 The second full connection layer has a size of l 2
The visual fatigue detection network comprises a Softmax function with an input size of l 2 Tensor of (2), output size of l 3 The tensor of the output tensor is taken as the index of the maximum value in the output tensor, namely the fatigue degree of the operator.
Each convolution layer in the time sequence eye movement network, the space facial network and the space depth network uses a one-dimensional convolution kernel with the size of 3, the convolution step length is 1, and the convolution layer in the feature fusion network uses a one-dimensional convolution kernel with the size of 1, and the convolution step length is 1.
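For illustration, the sketch below builds the three parallel branch networks in PyTorch. The kernel size of 3 and stride of 1 follow the description above; the hidden channel width, the ReLU activations, and the input channel counts (14 eye movement channels, 40 facial coordinates and 20 depth values per frame, matching the data sizes 450 × 14, 40 × 75 and 75 × 20 given in the detailed description) are assumptions for the sketch, not values fixed by the patent.

```python
import torch
import torch.nn as nn

def conv1d_stack(in_ch: int, n_layers: int, hidden: int = 32) -> nn.Sequential:
    """n_layers one-dimensional convolutions, kernel size 3, stride 1, ReLU in between."""
    layers = []
    for i in range(n_layers):
        layers += [nn.Conv1d(in_ch if i == 0 else hidden, hidden, kernel_size=3, stride=1),
                   nn.ReLU()]
    return nn.Sequential(*layers)

eye_net = conv1d_stack(in_ch=14, n_layers=5)    # time sequence eye movement network
face_net = conv1d_stack(in_ch=40, n_layers=6)   # spatial face network
depth_net = conv1d_stack(in_ch=20, n_layers=3)  # spatial depth network
```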
2. A method for detecting online visual fatigue using the system of claim 1, comprising:
1) Collecting data:
1a) Acquiring, through the eye movement data acquisition sub-module, eye movement data of size T_E × E, where E is the dimension of the eye movement data, T_E = F_E × n is the number of frames of eye movement data, F_E is the sampling rate of the eye tracker, and n is the acquisition time (for example, F_E = 90 Hz and n = 5 s give T_E = 450 frames);
1b) Acquiring, through the RGB image and depth data acquisition sub-module, an RGB image sequence of size M × N × T_I and depth data of size M × N × T_I, where M is the image width, N is the image height, T_I = F_I × n is the number of frames in the image sequence, F_I is the number of frames per second (FPS), and n is the acquisition time;
2) Processing the image data:
2a) Inputting the RGB image data into the face detection sub-module, which uses the face detection algorithm in the Dlib library, based on the histogram of oriented gradients (HOG) and support vector machine (SVM) methods, to output the upper-left coordinate P_1 and lower-right coordinate P_2 of the rectangular frame marking the position of the face in the image;
2b) Inputting the face position data output in step 2a) into the facial feature point extraction sub-module, which uses the facial feature point detection algorithm in the Dlib library, based on the gradient boosting decision tree (GBDT), to detect the set of 68 facial feature point coordinates and extract 20 feature point positions from it, comprising the feature points of both eyes and of the inner mouth;
2c) Inputting the feature point positions obtained in step 2b) into the depth information extraction sub-module to obtain the depth data at the corresponding feature point positions;
3) Detecting the fatigue degree of an operator:
3a) Inputting the eye movement data acquired in 1a), the facial feature point data output in 2b) and the depth data output in 2c) into the time sequence eye movement network, spatial face network and spatial depth network of the fatigue detection module respectively, and extracting the different features in the data by deep learning, namely the sequential eye movement feature x_g, the spatial facial feature x_i and the spatial depth feature x_d;
3b) Inputting the different features output in the step 3 a) into a feature fusion network in a fatigue detection module, and outputting fusion features;
3c) Inputting the fusion features output in 3 b) into a visual fatigue detection network in a fatigue detection module, and outputting the fatigue degree of an operator.
Compared with the prior art, the invention has the following advantages:
First, normal work of the operator is preserved.
Because the system uses a non-contact data acquisition module mounted directly below the computer display, it collects valid operator data without disturbing the operator's working state, avoiding any interference with the operator's work.
Second, rich physiological information is obtained.
Because the system's eye movement data acquisition sub-module and RGB image and depth data acquisition sub-module collect the operator's eye movement, image and depth data simultaneously, it overcomes the large estimation error of prior-art systems that collect only a single type of data, broadening the categories of information used.
Third, detection accuracy is high.
Because the method uses deep learning in an end-to-end fashion, it overcomes the low accuracy of prior-art visual fatigue detection caused by the difficulty of hand-designing features, improving detection accuracy.
Fourth, real-time performance is strong.
The method converts the image data processing problem into a text data processing problem by image processing, which overcomes the poor real-time performance of prior-art systems that feed large volumes of image data to deep learning models, improving the real-time performance of visual fatigue detection.
Drawings
FIG. 1 is a schematic diagram of the structure of the online visual fatigue detection system based on deep learning;
FIG. 2 is a schematic diagram of the fatigue detection module in the system of the invention;
FIG. 3 is a flowchart of the implementation of the online visual fatigue detection method based on deep learning.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Referring to FIG. 1, the online visual fatigue detection system of the invention comprises a data acquisition module 1, an image data processing module 2 and a fatigue detection module 3.
The data acquisition module 1 is placed directly below the computer display, 65 cm to 85 cm from the operator, and mainly comprises an eye movement data acquisition sub-module 11 and an RGB image and depth data acquisition sub-module 12, wherein:
the eye movement data acquisition sub-module 11 uses a Tobii Eye Tracker with a 90 Hz sampling rate, chosen for the accuracy the visual fatigue detection system requires. It adopts a remote eye tracking technique that improves on the traditional pupil centre corneal reflection (PCCR) method: an image sensor captures the reflections that a near-infrared light source produces on the cornea and pupil of the operator's eyes, and a built-in image processing algorithm accurately computes the spatial position of the eyes and the gaze position. The collected eye movement data comprise the gaze point coordinate P_G(x, y), left eye spatial position P_LE(x, y, z), right eye spatial position P_RE(x, y, z), head position P_H(x, y, z) and head pose R_H(x, y, z);
the RGB image and depth data acquisition sub-module 12 uses an Intel RealSense Depth Camera D435i to acquire RGB image data and depth data at a resolution of 640 × 360.
The image data processing module 2 mainly comprises a face detection sub-module 21, a face feature point extraction sub-module 22 and a depth information extraction sub-module 23, wherein:
the face detection sub-module 21 obtains the position of the operator's face in the RGB image using the HOG- and SVM-based face detection algorithm in the Dlib library, chosen for the real-time performance the visual fatigue detection system requires;
the facial feature point extraction sub-module 22 obtains the operator's set of 68 facial feature points using the GBDT-based facial feature point detection algorithm in the Dlib library, likewise chosen for its real-time performance, and outputs the positions of 20 feature points from the set, comprising the feature points of both eyes, whose position data represent the open or closed state of the eyes, and the inner-mouth feature points, whose data represent the open or closed state of the mouth;
the depth information extraction sub-module 23 outputs depth data of size 75 × 20 corresponding to the extracted feature point positions.
The fatigue detection module 3 uses a one-dimensional convolutional neural network in an end-to-end manner, obtaining the operator's fatigue degree directly from the input data.
Referring to FIG. 2, the fatigue detection module comprises a time sequence eye movement network 31, a spatial face network 32, a spatial depth network 33, a feature fusion network 34 and a visual fatigue detection network 35; the time sequence eye movement network 31, spatial face network 32 and spatial depth network 33 are connected in parallel and then cascaded in turn with the feature fusion network 34 and the visual fatigue detection network 35, wherein:
the time sequence eye movement network 31 is formed by sequentially cascading an input layer and 5 convolution layers, wherein each convolution layer uses a one-dimensional convolution kernel with the size of 3, the convolution step length is 1, and the sequence eye movement characteristic x is output g
The spatial facial network 32 is composed of an input layer and 6 convolution layers which are sequentially cascaded, wherein each convolution layer uses a one-dimensional convolution kernel with the size of 3, the convolution step length is 1, and the output generates a spatial facial feature x i
The spatial depth network 33 is formed by sequentially cascading an input layer and 3 convolution layers, wherein each convolution layer uses a one-dimensional convolution kernel with the size of 3, the convolution step length is 1, and the spatial depth characteristic x is output d
The feature fusion network 34 is formed by sequentially cascading 1 convolution layer and 2 full connection layers, wherein the convolution layer uses a one-dimensional convolution kernel with the size of 1, the convolution step length is 1, and fusion features x are output;
the visual fatigue detection network 35 classifies the visual fatigue with the Softmax function and outputs the fatigue degree of the operator.
The steps of the online visual fatigue detection method of the invention are further described below with reference to FIG. 3.
Step 1, start the data acquisition module.
Eye movement data of size 450 × 14 are collected by the eye movement data acquisition sub-module, comprising the gaze point coordinate P_G(x, y), left eye spatial position P_LE(x, y, z), right eye spatial position P_RE(x, y, z), head position P_H(x, y, z) and head pose R_H(x, y, z), i.e. 2 + 3 + 3 + 3 + 3 = 14 channels per frame over a 5-second window at 90 Hz;
an image sequence of size 640 × 360 × 75 and depth data of size 640 × 360 × 75 are acquired by the RGB image and depth data acquisition sub-module.
Step 2, extract the facial feature points and corresponding depth information from the acquired image data.
The RGB image data are input into the face detection sub-module, which uses the HOG- and SVM-based face detection algorithm in the Dlib library to output the upper-left coordinate P_1 and lower-right coordinate P_2 of the rectangular frame containing the face in the image;
P_1 and P_2 are input into the facial feature point extraction sub-module, which uses the GBDT-based facial feature point detection algorithm in the Dlib library to obtain the set of 68 facial feature point positions, extracts from it the 20 feature points of both eyes and the inner mouth, and outputs facial feature point data of size 40 × 75 (20 points × 2 coordinates per frame over 75 frames);
the facial feature point data are input into the depth information extraction sub-module, which outputs the depth information at the corresponding feature point positions.
Step 3, detect the operator's fatigue degree.
The eye movement data are input into the time sequence eye movement network, which outputs the sequential eye movement feature x_g;
the feature point position data are input into the spatial face network, which outputs the spatial facial feature x_i;
the depth data are input into the spatial depth network, which outputs the spatial depth feature x_d;
the features x_g, x_i and x_d are input into the feature fusion network;
x_g and x_i are spliced, outputting the first splice feature x_gi;
x_gi is passed through a convolution layer and the result is spliced with x_d, outputting the second splice feature x_gid;
x_gid is passed through the two fully connected layers in turn, outputting the fusion feature x;
the fusion feature x is input into the visual fatigue detection network, which outputs the operator's fatigue degree.
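Tying the sketches together, a single hypothetical inference pass over one 5-second window would look like the following; the shapes follow the sizes given above, and eye_net, face_net, depth_net and FusionClassifier are the assumed modules sketched earlier, not components specified by the patent.

```python
import torch

eye = torch.randn(1, 14, 450)    # 450 frames x 14 eye movement channels
face = torch.randn(1, 40, 75)    # 75 frames x 40 feature point coordinates
depth = torch.randn(1, 20, 75)   # 75 frames x 20 feature point depths

fusion = FusionClassifier()
level = fusion(eye_net(eye), face_net(face), depth_net(depth))
print(int(level))                # operator's fatigue degree (class index)
```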
The above description is only one specific example of the present invention and does not constitute any limitation of the present invention. It will be apparent to those skilled in the art that various modifications and changes in form and details can be made without departing from the principles and construction of the invention, but these modifications and changes based on the inventive concept are still within the scope of the appended claims.

Claims (4)

1. An online visual fatigue detection system based on deep learning, comprising a data acquisition module (1), an image data processing module (2) and a fatigue detection module (3), characterized in that:
the data acquisition module (1) is arranged directly below the computer display and comprises an eye movement data acquisition sub-module (11) and an RGB image and depth data acquisition sub-module (12), used respectively to collect eye movement data and RGB image and depth information; the eye movement data collected by the eye movement data acquisition sub-module (11) are input to the fatigue detection module (3), and the RGB images and depth information collected by the RGB image and depth data acquisition sub-module (12) are input to the image data processing module (2);
the image data processing module (2) comprises a face detection sub-module (21), a face characteristic point extraction sub-module (22) and a depth information extraction sub-module (23) which are respectively used for detecting the position of a face in image data, detecting the characteristic point of the face in the image data and extracting the depth information of the characteristic point, and inputting the face characteristic point data and the depth data output by the image data processing module (2) into the fatigue detection module (3);
the fatigue detection module (3) comprises a time sequence eye movement network (31), a space face network (32), a space depth network (33), a feature fusion network (34) and a visual fatigue detection network (35), wherein the time sequence eye movement network (31), the space face network (32) and the space depth network (33) are connected in parallel and then are sequentially cascaded with the feature fusion network (34) and the visual fatigue detection network (35), and the fatigue detection module is used for carrying out feature extraction, feature fusion and classification on eye movement data, face feature point data and depth data by adopting a deep learning method and outputting the fatigue degree of an operator;
the structure and parameters of each network in the fatigue detection module (3) are as follows:
the time sequence eye movement network (31) has the structure of an input layer, a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer and a fifth convolution layer;
the spatial face network (32) has the structure of an input layer, a 1st convolution layer, a 2nd convolution layer, a 3rd convolution layer, a 4th convolution layer, a 5th convolution layer and a 6th convolution layer;
the spatial depth network (33) has the structure of an input layer, an I convolution layer, an II convolution layer and an III convolution layer;
the feature fusion network (34) has the structure of a convolution layer, a first fully connected layer and a second fully connected layer, where the size of the first fully connected layer is l_1 and the size of the second fully connected layer is l_2;
the visual fatigue detection network (35) comprises a Softmax function whose input is a tensor of size l_2 and whose output is a tensor of size l_3; the index of the maximum value in the output tensor is taken as the operator's fatigue degree;
each convolution layer in the time sequence eye movement network (31), the space face network (32) and the space depth network (33) uses a one-dimensional convolution kernel with the size of 3, the convolution step length is 1, and each convolution layer in the feature fusion network (34) uses a one-dimensional convolution kernel with the size of 1, and the convolution step length is 1.
2. A method for detecting online visual fatigue using the system of claim 1, comprising:
(1) Collecting data:
(1a) Acquiring, through the eye movement data acquisition sub-module, eye movement data of size T_E × E, where E is the dimension of the eye movement data, T_E = F_E × n is the number of frames of eye movement data, F_E is the sampling rate of the eye tracker, and n is the acquisition time;
(1b) Acquiring, through the RGB image and depth data acquisition sub-module, an RGB image sequence of size M × N × T_I and depth data of size M × N × T_I, where M is the image width, N is the image height, T_I = F_I × n is the number of frames in the image sequence, F_I is the number of frames per second (FPS), and n is the acquisition time;
(2) Processing the image data:
(2a) Inputting the RGB image data into the face detection sub-module, which uses the face detection algorithm in the Dlib library, based on the histogram of oriented gradients (HOG) and support vector machine (SVM) methods, to output the upper-left coordinate P_1 and lower-right coordinate P_2 of the rectangular frame marking the position of the face in the image;
(2b) Inputting the face position data output in step (2a) into the facial feature point extraction sub-module, which uses the facial feature point detection algorithm in the Dlib library, based on the gradient boosting decision tree (GBDT), to detect the set of 68 facial feature point coordinates and extract 20 feature point positions from it, comprising the feature points of both eyes and of the inner mouth;
(2c) Inputting the feature point positions obtained in the step (2 b) into a depth information extraction sub-module to obtain depth data of the corresponding feature point positions;
(3) Detecting the fatigue degree of an operator:
(3a) Inputting the eye movement data acquired in (1a), the facial feature point data output in (2b) and the depth data output in (2c) into the time sequence eye movement network (31), spatial face network (32) and spatial depth network (33) of the fatigue detection module respectively, and extracting the different features in the eye movement data, facial feature point data and depth data by deep learning, namely the sequential eye movement feature x_g, the spatial facial feature x_i and the spatial depth feature x_d;
(3b) Inputting the different features output in the step (3 a) into a feature fusion network (34) in the fatigue detection module, and outputting fusion features;
(3c) Inputting the fusion characteristic output in the step (3 b) into a visual fatigue detection network (35) in a fatigue detection module, and outputting the fatigue degree of an operator.
3. The method of claim 2, wherein the eye movement data collected in (1a) comprise: the gaze point coordinate P_G(x, y), left eye spatial position P_LE(x, y, z), right eye spatial position P_RE(x, y, z), head position P_H(x, y, z) and head pose R_H(x, y, z).
4. The method of claim 2, wherein in (3b) the different features output in (3a) are input to the feature fusion network (34) in the fatigue detection module as follows: after the three features enter the feature fusion network, the sequential eye movement feature x_g and the spatial facial feature x_i are first spliced to obtain the first splice feature x_gi; the result of passing x_gi through a convolution layer is then spliced with the spatial depth feature x_d to obtain the second splice feature x_gid; finally, x_gid passes through the two fully connected layers in turn to output the fusion feature x.
CN202110869724.XA 2021-07-30 2021-07-30 Online visual fatigue detection system and method based on deep learning Active CN113591699B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110869724.XA CN113591699B (en) 2021-07-30 2021-07-30 Online visual fatigue detection system and method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110869724.XA CN113591699B (en) 2021-07-30 2021-07-30 Online visual fatigue detection system and method based on deep learning

Publications (2)

Publication Number Publication Date
CN113591699A CN113591699A (en) 2021-11-02
CN113591699B true CN113591699B (en) 2024-02-09

Family

ID=78252369

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110869724.XA Active CN113591699B (en) 2021-07-30 2021-07-30 Online visual fatigue detection system and method based on deep learning

Country Status (1)

Country Link
CN (1) CN113591699B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103268479A (en) * 2013-05-29 2013-08-28 电子科技大学 Method for detecting fatigue driving around clock
CN109044363A (en) * 2018-09-04 2018-12-21 华南师范大学 Driver Fatigue Detection based on head pose and eye movement
CN110728241A (en) * 2019-10-14 2020-01-24 湖南大学 Driver fatigue detection method based on deep learning multi-feature fusion
CN112528815A (en) * 2020-12-05 2021-03-19 西安电子科技大学 Fatigue driving detection method based on multi-mode information fusion

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103770733B (en) * 2014-01-15 2017-01-11 中国人民解放军国防科学技术大学 Method and device for detecting safety driving states of driver

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103268479A (en) * 2013-05-29 2013-08-28 电子科技大学 Method for detecting fatigue driving around clock
CN109044363A (en) * 2018-09-04 2018-12-21 华南师范大学 Driver Fatigue Detection based on head pose and eye movement
CN110728241A (en) * 2019-10-14 2020-01-24 湖南大学 Driver fatigue detection method based on deep learning multi-feature fusion
CN112528815A (en) * 2020-12-05 2021-03-19 西安电子科技大学 Fatigue driving detection method based on multi-mode information fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Fatigue state recognition method based on multiple information; Li Changyong; Wu Jinqiang; Fang Aiqing; Laser & Optoelectronics Progress, No. 10, pp. 239-245 *

Also Published As

Publication number Publication date
CN113591699A (en) 2021-11-02

Similar Documents

Publication Publication Date Title
Li et al. Connecting touch and vision via cross-modal prediction
US20220180520A1 (en) Target positioning method, apparatus and system
Wang et al. Blink detection using Adaboost and contour circle for fatigue recognition
JP5225870B2 (en) Emotion analyzer
CN113159227A (en) Acne image recognition method, system and device based on neural network
CN106023151A (en) Traditional Chinese medicine tongue manifestation object detection method in open environment
CN109165630A (en) A kind of fatigue monitoring method based on two-dimentional eye recognition
CN102567734A (en) Specific value based retina thin blood vessel segmentation method
Jingchao et al. Recognition of classroom student state features based on deep learning algorithms and machine learning
CN114333046A (en) Dance action scoring method, device, equipment and storage medium
CN105930793A (en) Human body detection method based on SAE characteristic visual learning
Sengan et al. Cost-effective and efficient 3D human model creation and re-identification application for human digital twins
Jin et al. Face depth prediction by the scene depth
Kumar et al. A novel approach to video-based pupil tracking
Kajiwara Driver-condition detection using a thermal imaging camera and neural networks
Soomro et al. Impact of Novel Image Preprocessing Techniques on Retinal Vessel Segmentation
CN113591699B (en) Online visual fatigue detection system and method based on deep learning
CN112949451A (en) Cross-modal target tracking method and system through modal perception feature learning
CN113591797B (en) Depth video behavior recognition method
Lu et al. Value of virtual reality technology in image inspection and 3D geometric modeling
Zhang et al. An approach of region of interest detection based on visual attention and gaze tracking
WO2022159214A2 (en) Fusion-based sensing intelligence and reporting
CN114067422A (en) Sight line detection method and device for driving assistance and storage medium
CN109508089B (en) Sight line control system and method based on hierarchical random forest
He et al. Human behavior feature representation and recognition based on depth video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant