CN115547488A - Early screening system and method based on VGG convolutional neural network and facial recognition autism - Google Patents


Publication number
CN115547488A
Authority
CN
China
Prior art keywords
autism
data
face
image data
neural network
Prior art date
Legal status (assumed; not a legal conclusion): Pending
Application number
CN202211242844.8A
Other languages
Chinese (zh)
Inventor
吴琴
肖湘民
杨琰
张志蕾
郑常榕
何兵
宇文泰然
Current Assignee (the listed assignees may be inaccurate)
Peking University
Chengdu University of Information Technology
Original Assignee
Peking University
Chengdu University of Information Technology
Priority date
Filing date
Publication date
Application filed by Peking University, Chengdu University of Information Technology filed Critical Peking University
Priority to CN202211242844.8A priority Critical patent/CN115547488A/en
Publication of CN115547488A publication Critical patent/CN115547488A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G16H 50/20 — ICT specially adapted for computer-aided diagnosis, e.g. based on medical expert systems
    • G06N 3/02 — neural networks; G06N 3/08 — learning methods
    • G06T 7/0012 — biomedical image inspection
    • G06V 10/82 — image or video recognition using neural networks
    • G06V 40/172 — human faces: classification, e.g. identification
    • G06T 2207/20032 — median filtering; G06T 2207/20081 — training/learning; G06T 2207/20084 — artificial neural networks [ANN]; G06T 2207/30201 — face

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Public Health (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Pathology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of computers and artificial intelligence, and discloses an early autism screening system and method based on a VGG convolutional neural network and facial recognition. Facial image data of a subject are acquired in a non-contact manner by data collection equipment, while an experiment control program controls the presentation of a visual stimulus video and the working state of the data collection equipment. A data processing program performs preliminary processing on the raw facial image data acquired by the data collection equipment; a data training program feeds the processed facial image data set into a VGG16 model for training, yielding a facial classification model for autistic children. Facial image data are then captured by a camera and fed into the trained classification model for prediction, finally completing the classification of autistic children. By applying histogram equalization to the facial image data, the invention improves contrast and achieves an image-enhancement effect.

Description

Early screening system and method based on VGG convolutional neural network and facial recognition autism
Technical Field
The invention belongs to the technical field of computers and artificial intelligence, and particularly relates to a system, a method, a medium, equipment and a terminal for early screening of autism based on a VGG convolutional neural network and facial recognition.
Background
Autism, also known as autistic disorder, is a serious mental developmental disorder. It is mainly manifested as obvious lag and disorder of speech function, impaired social and communication abilities, and stereotyped behaviors, interests and actions. At present, the incidence of autism among newborns in China is about 1 in 100; if not discovered in time it cannot be cured, placing great economic and mental burdens on society and on individual families.
Currently no drug is available for the treatment of autism, so early screening, diagnosis and intervention are at present the only proven effective means of correction. For early screening of autistic children, China lacks scientific and effective measurement and evaluation tools. In recent years, facial recognition technology has become widely used in autism research owing to advantages such as high accuracy and speed. At the same time, the growing number of ASD cases worldwide drives therapists and scientists to seek more effective screening methods. Facial recognition technology therefore has great development potential in the field of early autism screening and evaluation.
Deep learning is a new research direction in artificial intelligence; with the rapid development of neural networks and their strong ability to extract features from image data, deep learning is now also applied to facial recognition research. Deep learning has achieved many results in search, data mining, machine learning, natural language processing, multimedia learning, speech recognition and related fields, promoting the development of artificial intelligence technology. In the field of early autism screening, however, the application of deep-learning-based facial recognition remains relatively scarce owing to the limitations of traditional screening approaches.
Through the above analysis, the problems and defects of the prior art are as follows. On one hand, the traditional screening method for autism predicts from standardized scale instruments, and the result is affected by the quality and variety of the scales and by the subjective opinions of parents and therapists, reducing accuracy. On the other hand, because the facial image data sets of autistic children are of uneven quality and no preprocessing is performed, training and image classification with a traditional neural network yield low accuracy.
Consequently, applications of deep-learning-based facial recognition in the field of early autism screening are still lacking owing to the limitations of the traditional screening mode.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a system, method, medium, device and terminal for early screening of autism based on a VGG convolutional neural network and facial recognition technology.
The invention is realized by a VGG convolutional neural network and facial recognition autism early screening system, which comprises:
a data collection module, which includes data collection equipment and an experiment control program; the equipment collects facial image data of a subject in a non-contact manner, while the experiment control program controls the presentation of a visual stimulus video and the working state of the data collection equipment;
the data processing module is used for carrying out primary processing on the original facial image data acquired by the data acquisition device through a data processing program, and comprises a first type processing mode and a second type processing mode;
the data training module is used for sending the processed face image data set into the VGG16 model for training through a data training program to obtain an autism child face classification model;
the face recognition module comprises a model classification unit and a camera scheduling unit and is used for capturing face image data through a camera, sending the face image data into the trained face classification model of the autism children for prediction, and finally completing the classification function of the autism children.
Further, the visual stimulus video refers to various stimulus paradigms for distinguishing autistic children from typically developing children, presented in video form; the facial image data refers to the facial expression feature data presented by the subject while watching the visual stimulus video, and serves as the raw data from which training and test samples are generated by subsequent processing.
The data collection equipment is preferably a USB high-definition camera, and the experiment control program is preferably a desktop application program developed based on Python-Opencv; when data collection is carried out, visual stimulation videos are displayed and played for a testee, and meanwhile, facial image data of the testee are collected by the aid of the USB high-definition camera at intervals of 1 second.
Further, the data training program takes the VGG16 convolutional neural network model as its core framework. The facial image data set is the Kaggle autism detection data set, divided into training, validation and test sets, each of which is split into autistic and non-autistic images; the male-to-female ratio of the images is about 3:1 in the autistic group and about 1:1 in the non-autistic group.

The VGG16 convolutional neural network model is used to train the facial image data set. It comprises 1 input layer, 1 output layer, 13 convolutional layers, 3 fully connected layers and 5 pooling layers. The model is formed by stacking convolutional and pooling layers and is divided into blocks, numbered Block1-Block6 from top to bottom: Block1 and Block2 each contain 2 convolutional layers and 1 pooling layer; Block3 and Block4 each contain 3 convolutional layers and 1 pooling layer; Block5 contains 3 convolutional layers; Block6 contains a fully connected layer and a Dropout layer. The input layer accepts data of dimension 64 × 1. The convolutional layers use a kernel size of 3, a stride of 1, padding = same, and ReLU as the activation function. The pooling layers use max pooling with a 2 × 2 pooling kernel and a stride of 2. The fully connected layer uses a Flatten function to flatten the multi-dimensional data into one dimension, providing the transition from the convolutional layers; a Dense function maps the one-dimensional data, and a Dropout layer is added to improve the generalization ability of the model. The output layer is built with a Dense function using softmax as the output function, completing the construction of the network model.
Further, the model classification unit loads the autism child face classification model by calling the model-loading method in TensorFlow 2.0, and classifies the facial image data of the tested child by calling the classification model trained by the data training program;

the camera scheduling unit calls the USB high-definition camera through OpenCV's built-in camera acquisition method, and captures facial features by loading the Haar cascade classification model.
Another object of the present invention is to provide an early autism screening method based on the VGG convolutional neural network and facial recognition, using the above screening system, which includes:
acquiring facial image data of subjects watching different stimulus paradigms in a manner readily accepted by the subject, thereby obtaining samples that can be used for deep learning model training; classifying the facial images of autistic and typically developing children through deep learning, finally completing autism screening.
Further, the VGG convolutional neural network-based and face recognition autism early screening method comprises the following steps:
the method comprises the steps that firstly, facial image data of a testee are collected in a non-contact mode through data collection equipment, presentation of a visual stimulation video is controlled through an experiment control program, and the working state of the data collection equipment is controlled;
performing primary processing on the original facial image data acquired by the data acquisition device through a data processing program, wherein the primary processing comprises a first type processing mode and a second type processing mode;
step three, sending the processed face image data set into a VGG16 model for training through a data training program to obtain an autism child face classification model;
and step four, capturing face image data through a camera, sending the face image data into the trained face classification model of the autism children for prediction, and finally completing the classification function of the autism children.
Further, the first type of processing performs face detection on the original image data: facial feature data are extracted, the face is separated from the image background, and the position of the face is returned in coordinate form together with its size and posture. The first type of processing is preferably realized by the Haar-feature-based face detection classifier in OpenCV. First, Haar-like features of the collected facial image data are extracted. A Haar-like feature reflects gray-level changes in an image by computing difference values over pixel blocks; it is composed of rectangles and is divided into edge features, linear features, center features and diagonal features. Haar-like features are extracted with the Viola-Jones face detection algorithm. The feature value is calculated by dividing the gray image into black and white regions, computing the difference between the pixel sums of the white region W and the black region B, and multiplying by the corresponding weight coefficient T_i to obtain the Haar feature value C_Haar_i of region i:

C_Haar_i = [ ∬_W p(x,y) dx dy − ∬_B p(x,y) dx dy ] × T_i

Finally, a Haar classifier is trained with positive and negative sample images of the features to be recognized, generating a training model; in practice the first type of processing detects faces with the Haar cascade classification model already trained in OpenCV.
The second type of processing is to perform preprocessing operation on the extracted face picture, and the preprocessing operation comprises histogram equalization, denoising and normalization.
The histogram equalization performs gray-level mapping on the input image data to obtain a two-dimensional histogram of the input image, counts the frequency of each gray level, and adjusts the gray histogram to correct the over-bright or over-dark foreground and background caused by overexposure or underexposure of the image;
preferably, the histogram equalization is performed by scanning each pixel of the original image in turn, computing the gray histogram of the image, and computing the cumulative distribution function S_k of the gray histogram; the mapping between input and output is obtained from the cumulative distribution function, and the image is transformed according to this mapping. The mapping is expressed as:

S_k = (L − 1) × Σ_{j=0}^{k} (n_j / n)

where S_k is the value of the current gray level after cumulative-distribution mapping, n is the total number of pixels in the image, n_j is the number of pixels at gray level j, and L is the total number of gray levels in the image.
The denoising is to replace the value of one point in the digital image by the median of each point value in a neighborhood of the point by using a median filtering method, so that the surrounding pixel values are close to the real value, and the isolated noise point is eliminated; the normalization is to divide the pixel value of the image by 255 to obtain a value between 0 and 1 for calculation.
It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the VGG-based convolutional neural network and face recognition autism early screening method.
It is another object of the present invention to provide a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the VGG convolutional neural network-based and face recognition autism prescreening method.
Another object of the present invention is to provide an information data processing terminal for implementing the VGG-based convolutional neural network and the facial recognition autism early screening system.
Combining the above technical solutions with the technical problem to be solved, the advantages and positive effects of the claimed invention are as follows:
according to the invention, through a mode which is easily accepted by a testee, the facial image data of the testee watching different stimulation paradigms is obtained, so that a sample which can be used as a deep learning model training is further obtained, the facial images of the autistic children and normal children are classified through deep learning, and the screening of the autism is finally completed.
In the early autism screening system based on the VGG convolutional neural network and facial recognition, histogram equalization of the collected facial image data corrects the over-bright or over-dark foreground and background caused by overexposure or underexposure, improves contrast, and achieves image enhancement, filling the technical gap left by the lack of deep-learning-based facial recognition applications in early autism screening.
The technical solution of the invention fills a gap in the industry at home and abroad: at present, traditional early autism screening cannot be diagnosed through a specific examination and can only rely on the symptomatic characteristics, medical history and social function of the child. In deep learning, convolutional neural network models are widely applied to image classification, semantic segmentation, natural language processing and other fields, but their application to early autism screening remains to be developed.
The technical solution takes the VGG convolutional neural network model as its core: by constructing a deeper network with smaller convolution kernels and pooling sampling regions, it captures more sample features while keeping the number of parameters under control. In addition, compared with traditional deep learning recognition pipelines, image preprocessing operations such as histogram equalization, denoising and normalization are added, effectively enhancing the images and their contrast, reducing the amount of computation and greatly improving the accuracy of early screening.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a VGG convolutional neural network-based and facial recognition autism-based early screening method provided by an embodiment of the invention;
FIG. 2 is a schematic structural diagram of a VGG-based convolutional neural network and facial recognition autism prescreening system provided by an embodiment of the invention;
FIG. 3 is a basic architecture diagram of a VGG convolutional neural network-based and facial recognition autism prescreening system provided by an embodiment of the present invention;
FIG. 4 is a block diagram of a VGG16 neural network provided by an embodiment of the present invention;
FIG. 5 is a graph of training set and validation set accuracy for each epoch of the VGG16 model provided by an embodiment of the present invention.
In the figure: 1. a computer device; 2. a display screen; 3. a USB high definition camera; 210. a data collection device; 220. a data processing program; 230. a data training program; 240. a facial recognition framework program.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Aiming at the problems in the prior art, the invention provides a system, a method, a medium, equipment and a terminal for early screening of autism based on a VGG convolutional neural network and facial recognition, and the invention is described in detail with reference to the accompanying drawings.
This section presents explanatory embodiments expanding on the claims, so that those skilled in the art can fully understand how the present invention is embodied.
As shown in fig. 1, the VGG convolutional neural network-based and facial recognition autism early screening method provided by the embodiment of the invention comprises the following steps:
s101, acquiring facial image data of a testee in a non-contact mode through data acquisition equipment, controlling the presentation of a visual stimulation video through an experiment control program and controlling the working state of the data acquisition equipment;
s102, performing primary processing on original facial image data acquired by a data acquisition device through a data processing program, wherein the data processing program comprises a first type processing mode and a second type processing mode;
s103, sending the processed face image data set into a VGG16 model through a data training program for training to obtain a face classification model of the autism child;
and S104, capturing the face image data through the camera, sending the face image data into the trained face classification model of the autism children for prediction, and finally completing the classification function of the autism children.
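The four steps S101-S104 can be summarized as a minimal, purely illustrative pipeline. Every function below is a hypothetical stub, not the patent's code: the camera, preprocessing and the trained VGG16 classifier are replaced by trivial placeholders (a mean-intensity threshold stands in for the real model) just to show how the stages chain together.

```python
def collect_face_images(camera, n_frames=5):
    """S101: sample frames from the camera while the stimulus video plays."""
    return [camera() for _ in range(n_frames)]

def preprocess(frames):
    """S102: stand-in for face detection + equalize/denoise; here only the
    normalization step (divide pixel values by 255) is shown."""
    return [[p / 255.0 for p in f] for f in frames]

def train_classifier(dataset):
    """S103: stub for VGG16 training; returns a trivial threshold 'model'."""
    return lambda image: "ASD" if sum(image) / len(image) > 0.5 else "non-ASD"

def screen(model, frames):
    """S104: predict one label per captured frame."""
    return [model(f) for f in frames]
```

In the real system, `preprocess` would run the first- and second-type processing described below, and `train_classifier` would fit the VGG16 network.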
The VGG convolutional neural network and facial recognition autism early screening system based on the combination of the facial recognition technology and the deep learning provided by the embodiment of the invention comprises a data collection device, a data processing program, a data training program and a facial recognition framework program.
(1) Data collection device
Specifically, the data collection device provided by the embodiment of the invention comprises a data collection device and an experiment control program, wherein the data collection device can collect facial image data of a human subject in a non-contact mode, and the experiment control program is used for controlling the presentation of visual stimulation videos and controlling the working state of the data collection device. The visual stimulation video refers to various stimulation paradigms for distinguishing the autistic children from the normal children, and the stimulation paradigms are presented in a video mode. The facial image data refers to facial expression feature data presented by the testee when watching the visual stimulation video, and is used as raw data for generating various training samples and test samples through subsequent processing.
As a preferred embodiment, the data collection device provided in the embodiment of the present invention is a USB high-definition camera, and the experiment control program is a desktop application program developed based on Python-Opencv. When data collection is carried out, visual stimulation videos are displayed and played for a testee, and meanwhile, facial image data of the testee are collected by using the USB high-definition camera at the time interval of 1 second.
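A minimal sketch of the 1-second sampling loop such an experiment control program might use. This is an assumption, not the patent's code: the clock, sleep and capture functions are injected, so the real program could pass a wrapper around OpenCV's `VideoCapture.read()` as `capture_frame`.

```python
import time

def timed_capture(capture_frame, duration_s, interval_s=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Grab one frame every `interval_s` seconds for `duration_s` seconds.
    `capture_frame` is any zero-argument callable returning one frame."""
    frames = []
    start = clock()
    next_tick = start
    while clock() - start < duration_s:
        frames.append(capture_frame())
        next_tick += interval_s          # schedule against absolute ticks
        delay = next_tick - clock()      # so capture time doesn't drift
        if delay > 0:
            sleep(delay)
    return frames
```

Scheduling against absolute tick times (rather than sleeping a fixed interval after each capture) keeps the 1-second cadence even when a capture itself takes time.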
(2) Data processing program
The data processing program provided by the embodiment of the invention is a program for performing preliminary processing on original facial image data collected by a data collection device, and comprises a first type of processing and a second type of processing.
The first type of processing is to perform face detection on original image data, extract face feature data from the original image data, separate a face from an image background, return the position of the face in a coordinate form, and return the size and the posture of the face at the same time.
The second type of processing is to perform a preprocessing operation on the extracted face picture. The preprocessing operations herein include histogram equalization, denoising, normalization, etc., as a preferred embodiment.
As a preferred embodiment, the first type of processing is implemented with the Haar-feature-based face detection classifier in OpenCV. Specifically, Haar-like features of the collected facial image data are first extracted. A Haar-like feature reflects gray-level changes in an image by computing difference values over pixel blocks; it mainly consists of rectangles and is divided into four types: edge features, linear features, center features and diagonal features. Extraction of Haar-like features is realized with the Viola-Jones face detection algorithm. The feature value is then calculated by dividing the gray image into black and white regions, computing the difference between the pixel sums of the white region W and the black region B, and multiplying by the corresponding weight coefficient T to obtain the Haar feature value C_Haar_i of region i:

C_Haar_i = [ ∬_W p(x,y) dx dy − ∬_B p(x,y) dx dy ] × T_i
Finally, a Haar classifier is trained with the positive and negative sample images corresponding to the features to be recognized, generating a training model. In particular, the first type of processing detects faces using the Haar cascade classification models already trained in OpenCV.
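The Haar feature value can be illustrated for the simplest case, a two-rectangle edge feature. The function below is a hedged NumPy sketch: the vertical white-over-black layout and the single weight are assumptions for illustration, and real detectors compute these sums from integral images for speed rather than summing the regions directly.

```python
import numpy as np

def haar_edge_feature(gray, top_left, h, w, weight=1.0):
    """Two-rectangle Haar-like edge feature: white region W on top, black
    region B directly below, each h x w; value = (sum_W - sum_B) * weight,
    matching C_Haar_i = [sum over W - sum over B] * T_i."""
    y, x = top_left
    white = gray[y:y + h, x:x + w].sum()          # pixel sum over W
    black = gray[y + h:y + 2 * h, x:x + w].sum()  # pixel sum over B
    return (float(white) - float(black)) * weight
```

On a patch that is bright on top and dark below, the feature value is large and positive, which is exactly the horizontal-edge response the Viola-Jones cascade thresholds.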
As a preferred embodiment, the second type of processing performs histogram equalization, denoising and normalization on the acquired facial image data in turn. Histogram equalization maps the gray levels of the input image data to obtain a two-dimensional histogram of the input image, counts the frequency of each gray level, and adjusts the gray histogram to correct the over-bright or over-dark foreground and background caused by overexposure or underexposure, improving contrast and achieving image enhancement. Denoising uses median filtering, replacing the value at each point of the digital image with the median of the values in its neighborhood so that surrounding pixel values approach the true value, eliminating isolated noise points. Normalization divides the pixel values of the image by 255 to obtain values between 0 and 1 for calculation, avoiding interference from insufficient image contrast.
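The three preprocessing steps can be sketched in NumPy as follows. These are illustrative re-implementations assuming 8-bit grayscale input; a real pipeline would more likely call `cv2.equalizeHist` and `cv2.medianBlur`, which are not shown here.

```python
import numpy as np

def equalize_hist(gray):
    """Histogram equalization via S_k = (L-1) * sum_{j<=k}(n_j / n), L = 256."""
    hist = np.bincount(gray.ravel(), minlength=256)  # n_j per gray level
    cdf = np.cumsum(hist)                            # cumulative counts
    lut = np.round(255.0 * cdf / gray.size).astype(np.uint8)
    return lut[gray]                                 # apply the mapping

def median_denoise(gray, k=3):
    """k x k median filter: each pixel becomes the median of its neighborhood,
    which removes isolated noise points."""
    pad = k // 2
    padded = np.pad(gray, pad, mode="edge")
    out = np.empty_like(gray)
    for i in range(gray.shape[0]):
        for j in range(gray.shape[1]):
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out

def normalize(gray):
    """Scale 8-bit pixel values into [0, 1] by dividing by 255."""
    return gray.astype(np.float32) / 255.0
```

The Python-loop median filter is for clarity only; it is far slower than the vectorized OpenCV equivalent but implements the same point-by-point neighborhood median described above.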
(3) Data training program
The data training program provided by the embodiment of the invention takes the VGG16 convolutional neural network model as its core framework; the processed facial image data set is fed into the VGG16 model for training, finally yielding the autism child face classification model.
The face image data set provided by the embodiment of the invention is the Kaggle autism child face image data set, which contains about 3,000 face images of children aged 2-8. The data set is divided into a training set, a validation set and a test set, and each of these is further divided into autism images and non-autism images. The ratio of male to female pictures is close to 3:1 in the autism group and close to 1:1 in the non-autism group. The VGG16 convolutional neural network model is used to train on this face image data set.
As a preferred embodiment, the specific structure of the model includes 1 Input Layer, 1 Output Layer, 13 Convolutional Layers, 3 Fully Connected Layers and 5 Pooling Layers. The model is formed by stacking (stack) convolutional and pooling layers and is divided into different blocks (Block), numbered Block1-Block6 from top to bottom. Block1 and Block2 each contain 2 convolutional layers and 1 pooling layer; Block3 and Block4 each contain 3 convolutional layers and 1 pooling layer; Block5 contains 3 convolutional layers; and Block6 contains the fully connected layers (Flatten, Dense) and a Dropout layer. The input layer accepts data of dimension 64 × 1. The convolutional layers use a kernel size of 3 with stride 1, padding = same, and ReLU as the activation function. The pooling layers use max pooling with a 2 × 2 pooling kernel and stride 2. The fully connected stage uses a Flatten function to convert the multi-dimensional data to one dimension, bridging from the convolutional layers to the fully connected layers; a Dense function maps the one-dimensional data, and a Dropout layer is added to improve the generalization ability of the model. The output layer is built with a Dense function using softmax as the output function. This completes the construction of the network model.
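The block structure above can be sanity-checked with a short shape walkthrough: a 3×3 convolution with padding = same and stride 1 preserves spatial size, while each 2×2 max pool with stride 2 halves it. The sketch below (pure Python, illustrative only) traces the standard 224×224×3 VGG16 input through the conv/pool blocks.

```python
# Illustrative shape trace of the VGG16 convolution/pooling stack.
# conv: 3x3 kernel, stride 1, padding=same -> spatial size unchanged, channels change.
# pool: 2x2 max pooling, stride 2          -> spatial size halved.

BLOCKS = [
    (2, 64),    # Block1: 2 conv layers with 64 filters, then a pool
    (2, 128),   # Block2: 2 conv layers with 128 filters, then a pool
    (3, 256),   # Block3: 3 conv layers with 256 filters, then a pool
    (3, 512),   # Block4: 3 conv layers with 512 filters, then a pool
    (3, 512),   # Block5: 3 conv layers with 512 filters, then a pool
]

def trace_shapes(h=224, w=224):
    """Return the feature-map shape after each block's convolutions and pool."""
    shapes = []
    for n_conv, channels in BLOCKS:
        shapes.append((h, w, channels))  # after this block's conv layers
        h, w = h // 2, w // 2            # after the 2x2 / stride-2 max pool
        shapes.append((h, w, channels))
    return shapes

shapes = trace_shapes()
```

Running the trace reproduces the familiar VGG16 progression 224 → 112 → 56 → 28 → 14 → 7, ending at the 7×7×512 map that is flattened for the fully connected layers.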
(4) Face recognition framework program
The facial recognition framework program provided by the embodiment of the invention comprises a model classification module and a camera scheduling module. The camera captures face image data, which is sent to the trained autism child face classification model for prediction, finally completing the classification of autism children.
As a preferred embodiment, the model classification module loads the autism child face classification model by calling the model loading method in TensorFlow 2.0, and classifies the facial image data of the child under test by calling the classification model trained by the data training program.
The camera scheduling module calls the USB high-definition camera through the camera acquisition method built into OpenCV, and captures the facial features by loading the Haar cascade classification model.
According to the embodiment of the invention, facial image data of subjects watching different stimulation paradigms are obtained in a manner the subjects readily accept, yielding samples that can be used to train a deep learning model; the facial images of autistic children and normal children are then classified through deep learning, finally completing the screening for autism.
To demonstrate the inventiveness and technical value of the technical scheme of the invention, this part presents application examples of the claimed technical scheme in specific products or related technologies.
The invention belongs to the technical field of computers and artificial intelligence, and particularly relates to a system, a method, a medium, equipment and a terminal for early screening of autism based on a VGG convolutional neural network and facial recognition, which are applicable to the fields of medical health, children rehabilitation, internet of things application and the like in the future.
The system can be deployed in medical institutions, can assist doctors in collecting facial image data of autism children by combining corresponding computer equipment and hardware equipment, and can feed back an objective and accurate autism screening and evaluating result in time for the reference of the doctors. In addition, the system can also be deployed in an autism children rehabilitation center and is used for assisting a rehabilitation therapist to complete effective evaluation on characteristics of the autism children such as spirit, emotion and behavior and providing a targeted intervention suggestion.
In this embodiment, using the VGG16 convolutional neural network for face recognition of autistic children has the following advantages:
1. Compared with a traditional neural network model, the VGG16 model is built only from 3x3 convolution kernels and 2x2 max pooling layers, giving a simpler and more elegant structure. It also has good extensibility: the performance of the model can be continuously improved by deepening the network.
2. The VGG16 model is trained with the Kaggle autism child face image data set. The data set contains about 3,000 face images covering autistic and non-autistic children aged 2-8 years. It has a large data volume, wide coverage and strong pertinence, so using it for model training effectively guarantees the accuracy of the classification and screening results.
3. The embodiment adds preprocessing operations such as histogram equalization, denoising and normalization to the face image data, effectively reducing data noise and improving contrast. In experiments, training and classifying with data that had not been preprocessed gave a training-set accuracy of about 70% and a validation-set accuracy of about 60%; with preprocessed data, training-set accuracy rose to about 80% and validation-set accuracy to about 70%. Preprocessing the data therefore improves the accuracy of the classification and screening results. FIG. 5 shows the training result of the VGG16 model after preprocessing, i.e. the training-set and validation-set accuracy curves for each epoch.
Based on the facial expression characteristics that autistic children exhibit when facing particular stimulation paradigms, the embodiment of the invention provides a VGG convolutional neural network and facial recognition autism early screening system that combines facial recognition technology with deep learning. The system comprises a data collection device, a data processing program, a data training program and a facial recognition framework program.
(1) Data collection device
Specifically, referring to fig. 2, the data collection apparatus provided by the embodiment of the present invention is composed of a computer device 1, a display screen 2, and a USB high definition camera 3. The display screen provided by the embodiment of the invention is used for displaying the visual stimulation video; the USB high-definition camera is used to collect the facial image of the experimental child when watching the visual stimulation video, and send the facial image data to the experimental control program 212 shown in fig. 3. And the computer equipment runs an experiment control program to complete the collection of the facial image data.
The visual stimulation videos provided by the embodiment of the invention are animated films used in the rehabilitation of autistic children, namely a transport-vehicle cartoon and Teletubbies. It should be appreciated that the experiment control program 212 may be one or more programs developed in any language and running on any platform; the invention is not limited in this respect.
(2) Data processing program
Referring to fig. 3, the data processing program 220 provided in the embodiment of the present invention is a set of programs for processing the face image data collected by the data collection device 210, and the data processing program 220 includes two different processing modes, i.e., a first type processing 221 and a second type processing 222. And performing first type processing on the face image data, performing second type processing on the face image data, and finally outputting the processed data information.
1) Treatment of the first type
The first type processing 221 detects and separates the human face in the face image data. Specifically, the facial image data is passed into the data processing program by calling OpenCV's built-in data transmission method; the Haar-feature face detection cascade classification model is then loaded, and OpenCV's built-in feature detection method is called on this model to detect and extract all faces in the facial image data, saving the coordinates and sizes of all faces (expressed as a matrix). The scaling factor of the image pyramid is set to 1.2, the number of effective points to be detected is set to 3, and the minimum size of the target region is limited to 32 × 32. Finally, the processed face image data is stored in a folder at a specified path.
In this embodiment, the first type processing 221 is implemented based on Python-OpenCV. It is to be understood that the algorithm described in the present embodiment is only one of the ways to implement the first type processing 221, and is not limited.
2) Treatment of the second type
The second type processing 222 is to perform histogram equalization, denoising and normalization operations on the data processed by the first type processing 221 in sequence.
(1) Histogram equalization
The histogram equalization operation provided by the embodiment of the invention is a method for enhancing the contrast of face image data. Specifically, each pixel of the original image is scanned in turn and the gray histogram of the image is calculated; the cumulative distribution function S_k of the gray histogram is then computed; a mapping from input to output gray levels is obtained from the cumulative distribution function; and finally the image is transformed according to this mapping. The mapping relationship is specifically expressed as:
S_k = (L − 1) · Σ_{j=0}^{k} (n_j / n)
where S_k is the value of the current gray level after the cumulative distribution function mapping, n is the total number of pixels in the image, n_j is the number of pixels at gray level j, and L is the total number of gray levels in the image.
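The S_k mapping can be implemented directly from this definition; the sketch below is an illustrative NumPy version for 8-bit images (L = 256), not the patent's code.

```python
import numpy as np

def equalize_hist(gray, levels=256):
    """Histogram equalization via the cumulative distribution mapping S_k."""
    hist = np.bincount(gray.ravel(), minlength=levels)  # n_j for each gray level j
    n = gray.size                                       # total number of pixels
    cdf = np.cumsum(hist) / n                           # sum over j <= k of n_j / n
    s = np.round((levels - 1) * cdf).astype(np.uint8)   # S_k, scaled to [0, L-1]
    return s[gray]                                      # apply the mapping per pixel
```

Because S_k is monotonically non-decreasing, the mapping preserves the ordering of gray levels while spreading frequently occurring levels across the full range, which is what raises the contrast.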
(2) De-noising
The denoising operation provided by the embodiment of the invention adopts median filtering. Specifically, the pixel points of the original image are scanned one by one; for each point, the pixel values of all elements in its neighborhood are sorted from small to large and the resulting median is assigned to the corresponding point in the target image; once all pixels are processed, the result is copied from the memory buffer back into the data area of the source image.
(3) Normalization
The normalization operation provided by the embodiment of the invention extracts the pixels of the face image data after histogram equalization and denoising, divides each pixel value by 255 to obtain a value between 0 and 1, and then performs illumination compensation, finally yielding the normalized face image data.
By the second type of processing, the noise points of the face image data are significantly reduced, and the contrast is significantly increased.
(3) Data training program
Referring to fig. 4, the data training program provided by the embodiment of the present invention is constructed with a VGG16 convolutional neural network model as a core. Specifically, the VGG16 convolutional neural network model is composed of 1 input layer, 1 output layer, 13 convolutional layers, 3 fully-connected layers, and 5 pooling layers.
The specific implementation steps are as follows:
The face image data after the first and second types of processing is sent to the input layer and converted into an input data matrix of size 224 × 224 × 3.

Next, the data matrix passes through two conv64 convolutional layers containing 64 convolution kernels each and is activated by the ReLU activation function, giving feature map data of size 224 × 224 × 64.

The feature map data then passes through a max pooling layer (MaxPooling), halving its spatial size to 112 × 112 × 64.

The pooled feature map data passes through two conv128 convolutional layers containing 128 convolution kernels each with ReLU activation, giving feature map data of size 112 × 112 × 128.

The feature map data again passes through the max pooling layer, halving its spatial size to 56 × 56 × 128.

The pooled feature map data passes through three conv256 convolutional layers containing 256 convolution kernels each with ReLU activation, giving feature map data of size 56 × 56 × 256.

The feature map data again passes through the max pooling layer, halving its spatial size to 28 × 28 × 256.

The pooled feature map data passes through three conv512 convolutional layers containing 512 convolution kernels each with ReLU activation, giving feature map data of size 28 × 28 × 512.

The feature map data again passes through the max pooling layer, halving its spatial size to 14 × 14 × 512.

The pooled feature map data passes through three further conv512 convolutional layers containing 512 convolution kernels each with ReLU activation, giving feature map data of size 14 × 14 × 512.

The feature map data again passes through the max pooling layer, halving its spatial size to 7 × 7 × 512.

The pooled feature map data then passes through two fully connected layers of size 1 × 4096 with ReLU activation, outputting a vector of size 1 × 4096.

This is followed by a fully connected layer of size 1 × 1000, outputting a vector of size 1 × 1000.

Dropout (with probability 0.5) is applied to the one-dimensional vector obtained from the fully connected layers for regularization, and the result is fed into a softmax activation function, converting the prediction into a probability distribution.
After the VGG16 convolutional neural network model is built, the training data is sent to an SGD optimizer for model optimization, with the learning rate set to 0.01, the decay parameter set to 1e-6 and the momentum value set to 0.09. Four fifths of the face image data are taken as the training data set and one fifth as the validation data set. The input features of the training data set are then set, with each batch size set to 8 and the number of iterations (epochs) set to 100. Finally, the configured training data set is fed into the VGG16 convolutional neural network model for training, finally yielding the autism child face classification model.
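The training setup above can be sketched as follows. The 4/5-to-1/5 split is implemented in plain NumPy; the hyperparameter values are reproduced verbatim from the description (a momentum of 0.09 is unusually low, but it is what the text states), and the commented tf.keras calls at the end show roughly how they would be wired up, as an assumption rather than the patent's exact code.

```python
import numpy as np

# Hyperparameters as stated in the description.
TRAIN_CONFIG = {
    "learning_rate": 0.01,
    "decay": 1e-6,
    "momentum": 0.09,  # reproduced verbatim; conventionally this would be ~0.9
    "batch_size": 8,
    "epochs": 100,
}

def split_dataset(samples, train_fraction=0.8, seed=0):
    """Shuffle and split samples: 4/5 for training, 1/5 for validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    cut = int(len(samples) * train_fraction)
    return [samples[i] for i in idx[:cut]], [samples[i] for i in idx[cut:]]

# With tf.keras the optimizer would then be built roughly as:
#   opt = tf.keras.optimizers.SGD(learning_rate=0.01, decay=1e-6, momentum=0.09)
#   model.compile(optimizer=opt, loss="categorical_crossentropy", metrics=["accuracy"])
#   model.fit(x_train, y_train, batch_size=8, epochs=100,
#             validation_data=(x_val, y_val))
```

Fixing the shuffle seed keeps the train/validation partition reproducible across runs, which matters when comparing the preprocessed and non-preprocessed accuracy figures reported above.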
(4) Face recognition framework program
Referring to fig. 3, the facial classification model of autism children and the data after the first type processing and the second type processing are loaded into the facial recognition framework program 240, the camera scheduling module 241 records the facial image data in real time, the model classification module 242 performs system evaluation on the data, and finally outputs the evaluation result.
It should be noted that embodiments of the present invention can be realized in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided on a carrier medium such as a disk, CD-or DVD-ROM, programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier, for example. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.
The above description is only for the purpose of illustrating the present invention and the appended claims are not to be construed as limiting the scope of the invention, which is intended to cover all modifications, equivalents and improvements that are within the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A VGG convolutional neural network and facial recognition autism prescreening system, comprising:
a data collection device including a data collection device and an experiment control program for collecting facial image data of a subject in a non-contact manner by the data collection device; controlling the presentation of a visual stimulation video and controlling the working state of the data acquisition equipment through the experiment control program;
the data processing module is used for carrying out primary processing on the original facial image data acquired by the data acquisition device through a data processing program, and comprises a first type processing mode and a second type processing mode;
the data training module is used for sending the processed face image data set into the VGG16 model for training through a data training program to obtain an autism child face classification model;
the face recognition module comprises a model classification unit and a camera scheduling unit and is used for capturing face image data through a camera, sending the face image data into the trained face classification model of the autism children for prediction, and finally completing the classification function of the autism children.
2. The VGG convolutional neural network and facial recognition autism early screening system of claim 1, wherein the visual stimulus video refers to various stimulus paradigms for differentiating autistic children from normal children, the stimulus paradigms being presented in a video manner; the facial image data refers to the facial expression characteristic data exhibited by a subject while watching a visual stimulation video, serving as the raw data from which various training samples and test samples are generated through subsequent processing;
the data collection equipment is preferably a USB high-definition camera, and the experiment control program is preferably a desktop application developed with Python-OpenCV; during data collection, visual stimulation videos are displayed and played for the subject while the USB high-definition camera collects the subject's facial image data once per second.
3. The VGG convolutional neural network and facial recognition autism early screening system of claim 1, wherein the data training program takes the VGG16 convolutional neural network model as its core framework; the face image data set adopts a Kaggle autism detection data set, divided into a training set, a validation set and a test set, each further divided into autism images and non-autism images, with a 3:1 male-to-female picture ratio in the autism group and a 1:1 ratio in the non-autism group;
the VGG16 convolutional neural network model is used for training the face image data set; the VGG16 convolutional neural network model comprises 1 input layer, 1 output layer, 13 convolutional layers, 3 fully connected layers and 5 pooling layers; the VGG16 convolutional neural network model is formed by stacking convolutional and pooling layers and is divided into blocks numbered Block1-Block6 from top to bottom; Block1 and Block2 each contain 2 convolutional layers and 1 pooling layer, Block3 and Block4 each contain 3 convolutional layers and 1 pooling layer, Block5 contains 3 convolutional layers, and Block6 contains a fully connected layer and a Dropout layer; the input layer has input data dimensions of 64 × 1; the convolutional layers adopt a kernel size of 3, a stride of 1, padding = same, and ReLU as the activation function; the pooling layers adopt max pooling with a 2 × 2 pooling kernel and stride 2; the fully connected layer uses a Flatten function to reduce the multi-dimensional data to one dimension, bridging from the convolutional layers to the fully connected layers; a Dense function maps the one-dimensional data, and a Dropout layer is added to improve the generalization ability of the model; the output layer is built with a Dense function using softmax as the output function, completing the construction of the network model.
4. The VGG convolutional neural network and facial recognition autism early screening system of claim 1, wherein the model classification unit completes the loading of the autism children face classification model by calling a model loading method in TensorFlow 2.0, and performs the operation of classifying the facial image data of the children to be tested by calling the classification model completed through the training of the data training program;
the camera scheduling unit completes calling of the USB high-definition camera by adopting a camera obtaining method built in the OpenCV, and completes capturing of the human face features by loading the Harr cascade classification model.
5. A VGG convolutional neural network and facial recognition autism prescreening method applying the VGG convolutional neural network and facial recognition autism prescreening system as claimed in any one of claims 1 to 4, wherein the VGG convolutional neural network and facial recognition autism prescreening method comprises:
acquiring facial image data of a testee watching different stimulation paradigms in a mode which is easily accepted by the testee, and further acquiring a sample which can be used as a deep learning model training; the facial images of the autism children and the normal children are classified through deep learning, and the screening of the autism is finally completed.
6. The VGG convolutional neural network-based and face recognition autism prescreening method of claim 5, wherein the VGG convolutional neural network-based and face recognition autism prescreening method comprises the steps of:
acquiring facial image data of a testee in a non-contact mode through data acquisition equipment, controlling the presentation of a visual stimulation video through an experiment control program and controlling the working state of the data acquisition equipment;
performing primary processing on the original facial image data acquired by the data acquisition device through a data processing program, wherein the primary processing comprises a first type processing mode and a second type processing mode;
step three, sending the processed face image data set into a VGG16 model for training through a data training program to obtain an autism child face classification model;
and step four, capturing face image data through a camera, sending the face image data into the trained face classification model of the autism children for prediction, and finally completing the classification function of the autism children.
7. The VGG convolutional neural network-based and facial recognition autism prescreening method of claim 6, wherein the first type of processing in step two comprises: carrying out face detection on original image data, extracting face characteristic data from the original image data, and separating a face from an image background; returning the position of the face in a coordinate form, and simultaneously returning the size and the posture of the face;
the first type of processing is preferably realized by a face detection classifier based on Haar features in OpenCV; Haar-like features are extracted from the collected face image data; a Haar-like feature reflects the gray-level change of an image by taking the difference between pixel sums over rectangular modules, is composed of rectangles, and is divided into four categories: edge features, linear features, center features and diagonal features; the Haar-like features are extracted with the Viola-Jones face detection algorithm; the feature value is calculated by dividing the gray image into black and white regions, computing the difference between the sums of the pixel values of the white region W and the black region B, and multiplying the difference by the corresponding weight coefficient T to obtain the Haar feature value C_Haar,i of region i, calculated as follows:

C_Haar,i = [∬_W p(x, y) dx dy − ∬_B p(x, y) dx dy] · T_i
finally, a Haar classifier is trained with positive and negative sample images of the features to be recognized and a training model is generated, the first type of processing detecting faces with the Haar cascade classification model already trained in OpenCV;
the second type of processing is to perform preprocessing operation on the extracted face picture, wherein the preprocessing operation comprises histogram equalization, denoising and normalization;
the histogram equalization is to perform gray mapping on input image data to obtain a two-dimensional histogram of an input image; counting the occurrence frequency of each gray level, and adjusting a gray level histogram to improve the phenomenon that the foreground background is too bright or too dark due to overexposure or underexposure of the image;
the histogram equalization scans each pixel of the original image in turn, calculates the gray histogram of the image, and then calculates the cumulative distribution function S_k of the gray histogram; a mapping from input to output is obtained from the cumulative distribution function, and the image is transformed according to this mapping; the mapping relationship is expressed as:
S_k = (L − 1) · Σ_{j=0}^{k} (n_j / n)
where S_k is the value of the current gray level after the cumulative distribution function mapping, n is the total number of pixels in the image, n_j is the number of pixels at gray level j, and L is the total number of gray levels in the image;
the denoising is to replace the value of one point in the digital image by the median of each point value in a neighborhood of the point by using a median filtering method, so that the surrounding pixel values are close to the real value, and the isolated noise point is eliminated; the normalization is to divide the pixel value of the image by 255 to obtain a value between 0 and 1 for calculation.
8. A computer arrangement, characterized in that the computer arrangement comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of the VGG convolutional neural network-based and face recognition autism prescreening method of any one of claims 5 to 7.
9. A computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the VGG convolutional neural network-based and face recognition autism prescreening method of any one of claims 6-7.
10. An information data processing terminal, characterized in that the information data processing terminal is used for realizing the VGG convolutional neural network and facial recognition autism early screening system as claimed in any one of claims 1 to 4.
CN202211242844.8A 2022-10-11 2022-10-11 Early screening system and method based on VGG convolutional neural network and facial recognition autism Pending CN115547488A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211242844.8A CN115547488A (en) 2022-10-11 2022-10-11 Early screening system and method based on VGG convolutional neural network and facial recognition autism

Publications (1)

Publication Number Publication Date
CN115547488A true CN115547488A (en) 2022-12-30

Family

ID=84733948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211242844.8A Pending CN115547488A (en) 2022-10-11 2022-10-11 Early screening system and method based on VGG convolutional neural network and facial recognition autism

Country Status (1)

Country Link
CN (1) CN115547488A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117473304A (en) * 2023-12-28 2024-01-30 天津大学 Multi-mode image labeling method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
Yang et al. Underwater image enhancement based on conditional generative adversarial network
CN108446730B (en) CT pulmonary nodule detection device based on deep learning
CN108229490B (en) Key point detection method, neural network training method, device and electronic equipment
CN110288597B (en) Attention mechanism-based wireless capsule endoscope video saliency detection method
Lakshminarayanan et al. Deep Learning-Based Hookworm Detection in Wireless Capsule Endoscopic Image Using AdaBoost Classifier.
WO2020024127A1 (en) Bone age assessment and height prediction model, system thereof and prediction method therefor
Rahmon et al. Motion U-Net: Multi-cue encoder-decoder network for motion segmentation
CN111738344A (en) Rapid target detection method based on multi-scale fusion
CN111401293A (en) Gesture recognition method based on Head lightweight Mask scanning R-CNN
CN112836602B (en) Behavior recognition method, device, equipment and medium based on space-time feature fusion
CN112580458A (en) Facial expression recognition method, device, equipment and storage medium
Yan et al. Ghost removal via channel attention in exposure fusion
CN112836653A (en) Face privacy method, device and apparatus and computer storage medium
Krishnan et al. SwiftSRGAN-Rethinking super-resolution for efficient and real-time inference
CN116757986A (en) Infrared and visible light image fusion method and device
CN115547488A (en) Early autism screening system and method based on a VGG convolutional neural network and facial recognition
CN111814682A (en) Face living body detection method and device
CN112541566B (en) Image translation method based on reconstruction loss
Wang et al. SERR-U-Net: squeeze-and-excitation residual and recurrent block-based U-Net for automatic vessel segmentation in retinal image
CN113781468A (en) Tongue image segmentation method based on lightweight convolutional neural network
CN110570425B (en) Pulmonary nodule analysis method and device based on deep reinforcement learning algorithm
CN116884036A (en) Live pig posture detection method, device, equipment and medium based on YOLOv5DA
Peng et al. MND-GAN: A Research on Image Deblurring Algorithm Based on Generative Adversarial Network
Kalbhor et al. CerviCell-detector: An object detection approach for identifying the cancerous cells in pap smear images of cervical cancer
CN111435448B (en) Image saliency object detection method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination