CN115410061B - Image-text emotion analysis system based on natural language processing


Info

Publication number
CN115410061B
CN115410061B (application number CN202210833400.5A)
Authority
CN
China
Prior art keywords
unit
feature
text
characteristic
hur
Prior art date
Legal status
Active
Application number
CN202210833400.5A
Other languages
Chinese (zh)
Other versions
CN115410061A (en)
Inventor
李琰
Current Assignee
Northeast Forestry University
Original Assignee
Northeast Forestry University
Priority date
Filing date
Publication date
Application filed by Northeast Forestry University
Priority to CN202210833400.5A
Publication of CN115410061A
Application granted
Publication of CN115410061B
Legal status: Active

Links

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F40/00 Handling natural language data
            • G06F40/30 Semantic analysis
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V10/00 Arrangements for image or video recognition or understanding
            • G06V10/70 Arrangements using pattern recognition or machine learning
              • G06V10/764 Arrangements using classification, e.g. of video objects
              • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
                • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
                  • G06V10/806 Fusion of extracted features
              • G06V10/82 Arrangements using neural networks
          • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
            • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
              • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
                • G06V40/161 Detection; Localisation; Normalisation
                • G06V40/174 Facial expression recognition
                  • G06V40/175 Static expression
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
          • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
            • G10L25/48 Techniques specially adapted for particular use
              • G10L25/51 Techniques for comparison or discrimination
                • G10L25/63 Techniques for estimating an emotional state

Abstract

The invention provides an image-text emotion analysis system based on natural language processing, and relates to the technical field of image-text emotion analysis. The system comprises an image-text feature acquisition module, an image-text feature analysis module, an expression feature recognition module, a picture feature recognition module, a text feature recognition module, a voice feature recognition module and a feature fusion analysis module; the output end of the image-text feature acquisition module is connected with the image-text feature analysis module through HUR signals. The feature impurity removal and noise reduction unit denoises the extracted voice, corrects miswritten characters in the text and separates shadows from the pictures, so that the several kinds of feature data contained in the pictures and texts are extracted and analyzed, improving the accuracy of emotion analysis and reducing its one-sidedness. The static feature recognition unit recognizes the static expression features after gray level equalization so as to find an entry point for expression recognition, overcoming the difficulty that image-text expression recognition must face a large number of emotion features.

Description

Image-text emotion analysis system based on natural language processing
Technical Field
The invention relates to the technical field of image-text emotion analysis, in particular to an image-text emotion analysis system based on natural language processing.
Background
With the rapid development of the mobile internet in recent years, more and more users express their views and comment on trending events through social media such as microblogs, and social networks have become an important platform for people to publish opinions and express emotions. Social networks intervene in and penetrate every field of society more deeply by the day, and their social influence keeps growing. Mining the emotions behind the massive information that users publish on social networks benefits public-opinion analysis, personalized recommendation and similar applications, and China attaches importance to this direction, so research on recognizing social topics in networks based on deep learning has been carried out within the basic scientific research programs of colleges and universities.
Early emotion research mostly studied a single modality, text or images alone, and mainly adopted traditional machine-learning classification algorithms. In recent years deep learning has shown excellent learning performance, and more and more researchers use deep neural networks to learn feature representations of text or images for emotion classification. However, a single modality carries an insufficient amount of information and is easily interfered with by other factors, and expressions have complex meanings and come in numerous types, so no entry point for recognition can be found.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an image-text emotion analysis system based on natural language processing, which solves the problems that a single modality carries insufficient information and that the existing image-text emotion analysis technology is easily interfered with by other factors.
In order to achieve the above purpose, the invention is realized by the following technical scheme: the image-text emotion analysis system based on natural language processing comprises an image-text feature acquisition module, an image-text feature analysis module, an expression feature recognition module, a picture feature recognition module, a text feature recognition module, a voice feature recognition module and a feature fusion analysis module, wherein the output end of the image-text feature acquisition module is connected with the image-text feature analysis module through HUR signals, the image-text feature analysis module comprises the expression feature recognition module, the picture feature recognition module, the text feature recognition module and the voice feature recognition module, and the output end of the image-text feature analysis module is connected with the feature fusion analysis module through HUR signals;
the image-text feature acquisition module comprises an SVM data acquisition unit, a picture feature acquisition unit, an expression feature acquisition unit, a text feature acquisition unit, a voice feature acquisition unit, a feature impurity removal and noise reduction unit and a feature classification output unit, wherein the SVM data acquisition unit respectively comprises the picture feature acquisition unit, the expression feature acquisition unit, the text feature acquisition unit and the voice feature acquisition unit, the output port of the SVM data acquisition unit is connected with the feature impurity removal and noise reduction unit through HUR signals, and the output port of the feature impurity removal and noise reduction unit is connected with the feature classification output unit through HUR signals.
Preferably, the expression feature recognition module comprises a face feature reading unit, a face feature exposure unit, a feature gray level equalization unit, a static feature recognition unit, a deformation feature recognition unit and an expression emotion classification unit, wherein the input end of the face feature reading unit is connected with the corresponding feature reading unit through an Ethernet signal, the output end of the face feature reading unit is connected with the face feature exposure unit through a HUR signal, the output end of the face feature exposure unit is connected with the feature gray level equalization unit through a HUR signal, the output end of the feature gray level equalization unit is connected with the static feature recognition unit through a HUR signal, the output end of the static feature recognition unit is connected with the deformation feature recognition unit through a HUR signal, and the output end of the deformation feature recognition unit is connected with the expression emotion classification unit through a HUR signal.
Preferably, the picture feature recognition module comprises a picture feature reading unit, a wavelet conversion recognition unit, a color feature recognition unit, a line feature recognition unit and a picture emotion classification unit, wherein the input end of the picture feature reading unit is connected with the corresponding feature reading unit through an Ethernet signal, the output end of the picture feature reading unit is connected with the wavelet conversion recognition unit through a HUR signal, the output end of the wavelet conversion recognition unit is connected with the color feature recognition unit through the HUR signal, the output end of the color feature recognition unit is connected with the line feature recognition unit through the HUR signal, and the output end of the line feature recognition unit is connected with the picture emotion classification unit through the HUR signal.
Preferably, the text feature recognition module comprises a text feature reading unit, a text feature dimension reduction unit, a part-of-speech feature recognition unit, a sentence pattern feature recognition unit, a semantic feature recognition unit and a text emotion classification unit, wherein the input end of the text feature reading unit is connected with the corresponding feature reading unit through an Ethernet signal, the output end of the text feature reading unit is connected with the text feature dimension reduction unit through a HUR signal, the output end of the text feature dimension reduction unit is connected with the part-of-speech feature recognition unit through the HUR signal, the output end of the part-of-speech feature recognition unit is connected with the sentence pattern feature recognition unit through the HUR signal, the output end of the sentence pattern feature recognition unit is connected with the semantic feature recognition unit through the HUR signal, and the output end of the semantic feature recognition unit is connected with the text emotion classification unit through the HUR signal.
Preferably, the voice feature recognition module comprises a voice feature reading unit, a voice translation recognition unit, a prosodic feature recognition unit, an amplitude feature recognition unit, an MFCC feature recognition unit and a voice emotion classification unit, wherein the input end of the voice feature reading unit is connected with the corresponding feature reading unit through an Ethernet signal, the output end of the voice feature reading unit is connected with the voice translation recognition unit through a HUR signal, the output end of the voice translation recognition unit is connected with the prosodic feature recognition unit through the HUR signal, the output end of the prosodic feature recognition unit is connected with the amplitude feature recognition unit through the HUR signal, the output end of the amplitude feature recognition unit is connected with the MFCC feature recognition unit through the HUR signal, and the output end of the MFCC feature recognition unit is connected with the voice emotion classification unit through the HUR signal.
Preferably, the feature fusion analysis module comprises an SVM feature fusion unit, a BP model generation unit and an emotion level recognition unit, wherein the SVM feature fusion unit is connected with the BP model generation unit through an Ethernet signal, and the output end of the BP model generation unit is connected with the emotion level recognition unit through a HUR signal.
Preferably, the input end of the SVM feature fusion unit is connected with the expression emotion classification unit, the picture emotion classification unit, the text emotion classification unit and the voice emotion classification unit through HUR signals.
Working principle: s1, firstly, collecting image-text features through an image-text feature collecting module, classifying and extracting the image-text features through an SVM vector machine through an SVM data collecting unit, separating and extracting the image features in the image through an image feature collecting unit included in the SVM data collecting unit, separating and extracting the portrait expressions in the image through an expression feature collecting unit, extracting texts in image-text files, correspondingly extracting image-text voices through a voice feature collecting unit, carrying out voice noise reduction on the extracted features through a feature impurity removing and noise reducing unit, correcting text misprinting words, separating image shadows, and finally transmitting a feature Ethernet form to an image-text feature analyzing module through a feature classifying and outputting unit;
s2, a picture feature receiving unit included in the image-text feature analysis module directly receives picture features sent by the feature classification output unit, an expression feature receiving unit receives character expression features sent by the feature classification output unit, a text feature receiving unit receives text sentence features sent by the feature classification output unit, a voice feature receiving unit correspondingly receives character voice features of the feature classification output unit and stores the character voice features into an image-text feature database, and a corresponding feature reading unit enables an expression feature recognition module, a picture feature recognition module, a text feature recognition module and a voice feature recognition module to carry out corresponding feature reading;
s3, the facial feature recognition module recognizes the collected character expression features, the facial feature reading unit receives image-text character expression feature data of the corresponding feature reading unit, the face feature exposure unit exposes the faces of the characters to convert the faces into a plurality of mutually overlapped rectangular sets, in order to remove redundant overlapped information, the positions of the faces are accurately positioned, the feature gray balance unit carries out gray processing on the exposed facial expressions, the image subjected to gray balance has the same number of pixel points on each gray level, each gray level of a corresponding gray level histogram has the same height, gray balance also belongs to a method for improving the image, the image subjected to gray balance has the largest information quantity and plays a role of image enhancement, the static feature recognition unit and the deformation feature recognition unit can capture the expression static feature more quickly, the deformation feature recognition unit recognizes the deformation feature of the faces to obtain the large feature of the faces, the large feature of the faces represents the general shape of the faces, the feature of the small feature value is used for describing the specific feature of the faces, the image is more clearly and more easily recognized by the face feature recognition unit, and the image-text feature recognition unit is more clearly deformed in a similar manner, and the image-text-like relation is more clearly recognized, and the image-like is more greatly deformed by the face feature recognition unit;
s4, a picture characteristic reading unit included in the picture characteristic recognition module reads and recognizes the picture and text pictures acquired by the picture characteristic acquisition module, a wavelet conversion recognition unit converts the pictures into texture characteristic pictures which can be recognized, a color characteristic recognition unit recognizes color characteristics of the pictures, the basic attribute saturation, tone, color and brightness of the colors are measured, the emotion expression of the pictures is recognized by utilizing the basic characteristics of the images, the shape line characteristics of the picture content are recognized by a picture emotion classification unit, and finally the emotion level of the corresponding picture and text pictures is judged by a semantic characteristic recognition unit according to the recognition data of the picture emotion classification unit and the line characteristic recognition unit;
s5, a text feature recognition module recognizes a text description contained in the image-text, a text feature dimension reduction unit performs dimension reduction treatment, an part-of-speech feature recognition unit recognizes emotion part-of-speech features of the text, a sentence pattern feature recognition unit recognizes sentence pattern arrangement, a semantic feature recognition unit recognizes emotion data contained in the text meaning, and finally a text emotion classification unit classifies the emotion degree of the image-text;
s6, the voice characteristic recognition module recognizes the voice contained in the image and text, the voice characteristic reading unit included in the voice characteristic recognition module receives the voice characteristic collected by the image and text characteristic collection module, the voice translation recognition unit translates the voice and converts the voice characteristic into voice utterances which can be recognized, the prosody characteristic recognition unit recognizes the prosody of the corresponding voice, the prosody fundamental frequency describes the frequency of the voice vibration and is closely related to the size and the tightness degree of the vocal cords, so that the emotion change under different prosodies is judged, the MFCC characteristic recognition unit recognizes the voice amplitude, the amplitude determines the size of the voice, when the voice is in an angry or surprise state, the volume is increased, and when the voice is in a sad state, the volume is lower, so that the emotion state of the voice is judged, the MFCC characteristic recognition unit carries out emphasis processing on the voice signal, the resolution of the voice pitch frequency part is improved, and the voice recognized by the voice emotion classification unit carries out classification judgment;
s7, the SVM feature fusion unit receives emotion classification of the expression feature recognition module, the picture feature recognition module, the text feature recognition module and the voice feature recognition module, and then fusion is carried out, an emotion model is created by the BP model generation unit, and final image-text emotion judgment is carried out by the emotion classification recognition unit.
The invention provides an image-text emotion analysis system based on natural language processing. The beneficial effects are as follows:
1. The invention collects the image-text features through the image-text feature acquisition module, and the SVM data acquisition unit classifies and extracts them with an SVM (support vector machine): the picture feature acquisition unit included in the SVM data acquisition unit separates and extracts the picture features in the pictures, the expression feature acquisition unit separates and extracts the portrait expressions in the pictures, the text feature acquisition unit extracts the text in the image-text file, and the voice feature acquisition unit correspondingly extracts the image-text voice; the feature impurity removal and noise reduction unit denoises the extracted voice, corrects miswritten characters in the text and separates picture shadows, so the several kinds of feature data contained in the pictures and texts are extracted and analyzed, which improves the accuracy of emotion analysis and reduces its one-sidedness.
2. According to the invention, the face feature exposure unit exposes the face of the person, converting the face region into a number of mutually overlapping rectangles; the redundant overlapping information is removed so that the face position is located accurately. The feature gray level equalization unit performs gray processing on the exposed facial expression: the equalized image has the same number of pixels at every gray level, every level of the corresponding gray histogram has the same height, gray level equalization is itself an image-improvement method, and the equalized image carries the maximum amount of information and is thereby enhanced, so the static feature recognition unit and the deformation feature recognition unit can capture the expression more quickly. The static feature recognition unit recognizes the static expression features after gray equalization and the deformation feature recognition unit recognizes the deformation features of the face, obtaining the larger facial features, so that an entry point for expression recognition is found, solving the difficulty that image-text expression recognition must face a large number of emotion features.
Drawings
FIG. 1 is a schematic diagram of a system architecture according to the present invention;
FIG. 2 is a schematic diagram of an image-text feature acquisition module according to the present invention;
FIG. 3 is a schematic diagram of a graphic feature analysis module according to the present invention;
FIG. 4 is a schematic diagram of an expression feature recognition module architecture according to the present invention;
FIG. 5 is a schematic diagram of a picture feature recognition module according to the present invention;
FIG. 6 is a schematic diagram of a text feature recognition module architecture according to the present invention;
FIG. 7 is a schematic diagram of a speech feature recognition module architecture according to the present invention;
fig. 8 is a schematic diagram of a feature fusion analysis module architecture according to the present invention.
1. an image-text feature acquisition module; 2. an image-text feature analysis module; 3. an expression feature recognition module; 4. a picture feature recognition module; 5. a text feature recognition module; 6. a voice feature recognition module; 7. a feature fusion analysis module; 101. an SVM data acquisition unit; 102. a picture feature acquisition unit; 103. an expression feature acquisition unit; 104. a text feature acquisition unit; 105. a voice feature acquisition unit; 106. a feature impurity removal and noise reduction unit; 107. a feature classification output unit; 201. a picture feature receiving unit; 202. an expression feature receiving unit; 203. a text feature receiving unit; 204. a voice feature receiving unit; 205. an image-text feature database; 206. a corresponding feature reading unit; 301. a face feature reading unit; 302. a face feature exposure unit; 303. a feature gray level equalization unit; 304. a static feature recognition unit; 305. a deformation feature recognition unit; 306. an expression emotion classification unit; 401. a picture feature reading unit; 402. a wavelet conversion recognition unit; 403. a color feature recognition unit; 404. a line feature recognition unit; 405. a picture emotion classification unit; 501. a text feature reading unit; 502. a text feature dimension reduction unit; 503. a part-of-speech feature recognition unit; 504. a sentence pattern feature recognition unit; 505. a semantic feature recognition unit; 506. a text emotion classification unit; 601. a voice feature reading unit; 602. a voice translation recognition unit; 603. a prosodic feature recognition unit; 604. an amplitude feature recognition unit; 605. an MFCC feature recognition unit; 606. a voice emotion classification unit; 701. an SVM feature fusion unit; 702. a BP model generation unit; 703. an emotion level recognition unit.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Examples:
as shown in fig. 1-8, an embodiment of the present invention provides an image-text emotion analysis system based on natural language processing, which includes an image-text feature collection module 1, an image-text feature analysis module 2, an expression feature recognition module 3, a picture feature recognition module 4, a text feature recognition module 5, a voice feature recognition module 6 and a feature fusion analysis module 7, wherein an output end of the image-text feature collection module 1 is connected with the image-text feature analysis module 2 through a HUR signal, the image-text feature analysis module 2 includes the expression feature recognition module 3, the picture feature recognition module 4, the text feature recognition module 5 and the voice feature recognition module 6, and an output end of the image-text feature analysis module 2 is connected with the feature fusion analysis module 7 through the HUR signal;
the image-text feature analysis module 2 comprises a picture feature receiving unit 201, an expression feature receiving unit 202, a text feature receiving unit 203, a voice feature receiving unit 204, an image-text feature database 205 and a corresponding feature reading unit 206, wherein the output ends of the picture feature receiving unit 201, the expression feature receiving unit 202, the text feature receiving unit 203 and the voice feature receiving unit 204 are connected with the image-text feature database 205 through Ethernet signals, the output ends of the image-text feature database 205 are connected with the corresponding feature reading unit 206 through HUR signals, the picture feature receiving unit 201 included in the image-text feature analysis module 2 directly receives the picture features sent by the feature classification output unit 107, the expression feature receiving unit 202 receives the expression features sent by the feature classification output unit 107, the text feature receiving unit 203 receives the text sentence features sent by the feature classification output unit 107, the voice feature receiving unit 204 correspondingly receives the character voice features of the feature classification output unit 107, and stores the character voice features into the image-text feature database 205, and the corresponding feature reading unit 206 enables the expression feature recognition module 3, the picture feature recognition module 4, the text feature recognition module 5 and the voice feature recognition module 6 to read the corresponding features.
The image-text feature acquisition module 1 comprises an SVM data acquisition unit 101, a picture feature acquisition unit 102, an expression feature acquisition unit 103, a text feature acquisition unit 104, a voice feature acquisition unit 105, a feature impurity removal and noise reduction unit 106 and a feature classification output unit 107; the SVM data acquisition unit 101 respectively comprises the picture feature acquisition unit 102, the expression feature acquisition unit 103, the text feature acquisition unit 104 and the voice feature acquisition unit 105; the output port of the SVM data acquisition unit 101 is connected with the feature impurity removal and noise reduction unit 106 through HUR signals, and the output port of the feature impurity removal and noise reduction unit 106 is connected with the feature classification output unit 107 through HUR signals. The image-text feature acquisition module 1 collects the image-text features, and the SVM data acquisition unit 101 classifies and extracts them with an SVM (support vector machine): the picture feature acquisition unit 102 included in the SVM data acquisition unit 101 separates and extracts the picture features in the pictures, the expression feature acquisition unit 103 separates and extracts the human expressions in the pictures, the text feature acquisition unit 104 extracts the text in the image-text file, and the voice feature acquisition unit 105 correspondingly extracts the image-text voice; the feature impurity removal and noise reduction unit 106 denoises the extracted voice, corrects miswritten characters in the text and separates picture shadows, and finally the feature classification output unit 107 transmits the features in Ethernet form to the image-text feature analysis module 2.
The expression feature recognition module 3 comprises a face feature reading unit 301, a face feature exposure unit 302, a feature gray level equalization unit 303, a static feature recognition unit 304, a deformation feature recognition unit 305 and an expression emotion classification unit 306, wherein the input end of the face feature reading unit 301 is connected with the corresponding feature reading unit 206 through an Ethernet signal, the output end of the face feature reading unit 301 is connected with the face feature exposure unit 302 through a HUR signal, the output end of the face feature exposure unit 302 is connected with the feature gray level equalization unit 303 through a HUR signal, the output end of the feature gray level equalization unit 303 is connected with the static feature recognition unit 304 through a HUR signal, the output end of the static feature recognition unit 304 is connected with the deformation feature recognition unit 305 through a HUR signal, and the output end of the deformation feature recognition unit 305 is connected with the expression emotion classification unit 306 through a HUR signal. The expression feature recognition module 3 recognizes the collected character expression features: the face feature reading unit 301 receives the image-text character expression feature data from the corresponding feature reading unit 206, and the face feature exposure unit 302 exposes the face of the character, converting the face region into a number of mutually overlapping rectangles; the redundant overlapping information is removed so that the face position is located accurately. The feature gray level equalization unit 303 performs gray processing on the exposed facial expression: the equalized image has the same number of pixels at every gray level, so every level of the corresponding gray histogram has the same height; gray level equalization is itself an image-improvement method, and the equalized image carries the maximum amount of information and is thereby enhanced, so the static feature recognition unit 304 and the deformation feature recognition unit 305 can capture the expression more quickly. The static feature recognition unit 304 recognizes the static expression features after gray equalization, and the deformation feature recognition unit 305 recognizes the deformation features of the face: the features with larger eigenvalues represent the general shape of the face, while the features with smaller eigenvalues describe its specific details. In the eigenface set, the degree of exaggeration of the various expressions and the recognition rate show a roughly positive correlation: the more exaggerated the expression, the more obvious the facial features. The static and deformation features recognized by the static feature recognition unit 304 and the deformation feature recognition unit 305 therefore let the expression emotion classification unit 306 grade the expression emotion of the image-text characters and obtain the expression emotion value of the corresponding image-text.
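The talk of larger and smaller eigenvalues suggests an eigenface-style principal component analysis; the sketch below illustrates that reading. scikit-learn, the toy face crops and the 5-component split are assumptions, since no training data or decomposition method is disclosed.

```python
# Hypothetical eigenface sketch: components with large eigenvalues capture
# the general face shape, those with small eigenvalues the fine details.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
faces = rng.normal(size=(100, 32 * 32))   # 100 flattened 32x32 face crops

pca = PCA(n_components=20).fit(faces)
shape_axes = pca.components_[:5]          # large-eigenvalue "shape" directions
detail_axes = pca.components_[5:]         # smaller-eigenvalue "detail" directions
print(shape_axes.shape, detail_axes.shape)
print(pca.explained_variance_[:5])        # eigenvalues of the shape axes
```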
The picture feature recognition module 4 comprises a picture feature reading unit 401, a wavelet conversion recognition unit 402, a color feature recognition unit 403, a line feature recognition unit 404 and a picture emotion classification unit 405, wherein the input end of the picture feature reading unit 401 is connected with the corresponding feature reading unit 206 through an Ethernet signal, the output end of the picture feature reading unit 401 is connected with the wavelet conversion recognition unit 402 through a HUR signal, the output end of the wavelet conversion recognition unit 402 is connected with the color feature recognition unit 403 through a HUR signal, the output end of the color feature recognition unit 403 is connected with the line feature recognition unit 404 through a HUR signal, and the output end of the line feature recognition unit 404 is connected with the picture emotion classification unit 405 through a HUR signal. The picture feature reading unit 401 included in the picture feature recognition module 4 reads and recognizes the image-text pictures acquired by the image-text feature acquisition module 1, and the wavelet conversion recognition unit 402 converts each picture into a texture feature image that can be recognized; the color feature recognition unit 403 recognizes the color features of the picture, measuring the basic attributes of color (saturation, hue, color and brightness) and using these basic image features to recognize the emotional expression of the picture; the line feature recognition unit 404 recognizes the shape and line features of the picture content, and finally the picture emotion classification unit 405 judges the emotion level of the corresponding image-text picture according to the recognition data of the color feature recognition unit 403 and the line feature recognition unit 404.
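One plausible reading of the colour measurement, assumed in the sketch below with OpenCV, is per-channel statistics over an HSV conversion of the picture.

```python
# Hypothetical sketch of the color feature recognition unit: mean and spread
# of hue, saturation and value as a compact colour descriptor. OpenCV and the
# input file are assumptions.
import cv2
import numpy as np

img = cv2.imread("post.jpg")                     # hypothetical input picture
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
h, s, v = cv2.split(hsv)

color_feat = np.array([h.mean(), h.std(), s.mean(), s.std(), v.mean(), v.std()])
print(color_feat)
```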
The text feature recognition module 5 comprises a text feature reading unit 501, a text feature dimension reduction unit 502, a part-of-speech feature recognition unit 503, a sentence pattern feature recognition unit 504, a semantic feature recognition unit 505 and a text emotion classification unit 506, wherein the input end of the text feature reading unit 501 is connected with the corresponding feature reading unit 206 through an Ethernet signal, the output end of the text feature reading unit 501 is connected with the text feature dimension reduction unit 502 through a HUR signal, the output end of the text feature dimension reduction unit 502 is connected with the part-of-speech feature recognition unit 503 through a HUR signal, the output end of the part-of-speech feature recognition unit 503 is connected with the sentence pattern feature recognition unit 504 through a HUR signal, the output end of the sentence pattern feature recognition unit 504 is connected with the semantic feature recognition unit 505 through a HUR signal, and the output end of the semantic feature recognition unit 505 is connected with the text emotion classification unit 506 through a HUR signal. The text feature recognition module 5 recognizes the written descriptions contained in the image-text: after the text feature dimension reduction unit 502 performs dimension reduction, the part-of-speech feature recognition unit 503 recognizes the emotional part-of-speech features of the text, the sentence pattern feature recognition unit 504 recognizes the sentence pattern arrangement, the semantic feature recognition unit 505 recognizes the emotion data contained in the meaning of the text, and finally the text emotion classification unit 506 grades the emotion degree of the image-text.
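The part-of-speech step could, for instance, tag the text and score only sentiment-bearing word classes; the sketch below assumes NLTK and a tiny hand-made polarity lexicon, neither of which is part of the disclosure.

```python
# Hypothetical sketch of the part-of-speech feature recognition unit. First
# use requires: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger").
import nltk

LEXICON = {"wonderful": 1.0, "angry": -1.0, "sad": -0.8}  # invented lexicon

def pos_sentiment(sentence: str) -> float:
    tokens = nltk.word_tokenize(sentence.lower())
    tagged = nltk.pos_tag(tokens)
    # Adjectives (JJ*) carry the most explicit sentiment; score only those.
    return sum(LEXICON.get(word, 0.0)
               for word, tag in tagged if tag.startswith("JJ"))

print(pos_sentiment("What a wonderful sunny day"))
```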
The voice feature recognition module 6 comprises a voice feature reading unit 601, a voice translation recognition unit 602, a prosodic feature recognition unit 603, an amplitude feature recognition unit 604, an MFCC feature recognition unit 605 and a voice emotion classification unit 606, wherein the input end of the voice feature reading unit 601 is connected with the corresponding feature reading unit 206 through an Ethernet signal, the output end of the voice feature reading unit 601 is connected with the voice translation recognition unit 602 through a HUR signal, the output end of the voice translation recognition unit 602 is connected with the prosodic feature recognition unit 603 through a HUR signal, the output end of the prosodic feature recognition unit 603 is connected with the amplitude feature recognition unit 604 through a HUR signal, the output end of the amplitude feature recognition unit 604 is connected with the MFCC feature recognition unit 605 through a HUR signal, and the output end of the MFCC feature recognition unit 605 is connected with the voice emotion classification unit 606 through a HUR signal. The voice feature recognition module 6 recognizes the voice contained in the image-text: its voice feature reading unit 601 receives the voice features collected by the image-text feature acquisition module 1, and the voice translation recognition unit 602 translates the voice, converting the voice features into recognizable utterances. The prosodic feature recognition unit 603 recognizes the prosody of the corresponding voice: the fundamental frequency describes how fast the vocal cords vibrate and is closely related to their size and tension, so emotion changes under different prosodies can be judged. The amplitude feature recognition unit 604 recognizes the voice amplitude, which determines loudness: in an angry or surprised state the volume rises, while in a sad state it is lower, so the emotional state of the voice can be judged. The MFCC feature recognition unit 605 pre-emphasizes the voice signal, improving the resolution of its high-frequency part, and the voice emotion classification unit 606 classifies and judges the voice recognized by the voice feature recognition module 6.
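The pre-emphasis described above is conventionally a first-order high-pass filter; the sketch below assumes librosa and the customary 0.97 coefficient before extracting MFCCs.

```python
# Hypothetical sketch of the MFCC feature recognition unit: pre-emphasis to
# raise the resolution of the high-frequency part, then 13 MFCCs per frame.
# librosa, the coefficient and the file name are assumptions.
import librosa

y, sr = librosa.load("clip.wav", sr=16000)       # hypothetical audio clip

# First-order filter y[n] - 0.97*y[n-1] boosts the high frequencies.
y_pre = librosa.effects.preemphasis(y, coef=0.97)

mfcc = librosa.feature.mfcc(y=y_pre, sr=sr, n_mfcc=13)
print(mfcc.shape)                                 # (13, n_frames)
```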
The feature fusion analysis module 7 comprises an SVM feature fusion unit 701, a BP model generation unit 702 and an emotion level recognition unit 703, wherein the SVM feature fusion unit 701 is connected with the BP model generation unit 702 through an Ethernet signal, the output end of the BP model generation unit 702 is connected with the emotion level recognition unit 703 through a HUR signal, and the input end of the SVM feature fusion unit 701 is connected with the expression emotion classification unit 306, the picture emotion classification unit 405, the text emotion classification unit 506 and the voice emotion classification unit 606 through HUR signals. The SVM feature fusion unit 701 receives and fuses the emotion grades from the expression feature recognition module 3, the picture feature recognition module 4, the text feature recognition module 5 and the voice feature recognition module 6; the BP model generation unit 702 creates an emotion model, and the emotion level recognition unit 703 makes the final image-text emotion judgment.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. An image-text emotion analysis system based on natural language processing, comprising an image-text feature acquisition module (1), an image-text feature analysis module (2), an expression feature recognition module (3), a picture feature recognition module (4), a text feature recognition module (5), a voice feature recognition module (6) and a feature fusion analysis module (7), characterized in that: the output end of the image-text feature acquisition module (1) is connected with the image-text feature analysis module (2) through HUR signals, the image-text feature analysis module (2) comprises the expression feature recognition module (3), the picture feature recognition module (4), the text feature recognition module (5) and the voice feature recognition module (6), and the output end of the image-text feature analysis module (2) is connected with the feature fusion analysis module (7) through HUR signals;
the image-text feature analysis module (2) comprises a picture feature receiving unit (201), an expression feature receiving unit (202), a text feature receiving unit (203), a voice feature receiving unit (204), an image-text feature database (205) and a corresponding feature reading unit (206), wherein the output ends of the picture feature receiving unit (201), the expression feature receiving unit (202), the text feature receiving unit (203) and the voice feature receiving unit (204) are connected with the image-text feature database (205) through Ethernet signals, and the output end of the image-text feature database (205) is connected with the corresponding feature reading unit (206) through HUR signals;
the expression feature recognition module (3) comprises a face feature reading unit (301), a face feature exposure unit (302), a feature gray level equalization unit (303), a static feature recognition unit (304), a deformation feature recognition unit (305) and an expression emotion classification unit (306), wherein the input end of the face feature reading unit (301) is connected with the corresponding feature reading unit (206) through an Ethernet signal, the output end of the face feature reading unit (301) is connected with the face feature exposure unit (302) through a HUR signal, the output end of the face feature exposure unit (302) is connected with the feature gray level equalization unit (303) through a HUR signal, the output end of the feature gray level equalization unit (303) is connected with the static feature recognition unit (304) through a HUR signal, the output end of the static feature recognition unit (304) is connected with the deformation feature recognition unit (305) through a HUR signal, and the output end of the deformation feature recognition unit (305) is connected with the expression emotion classification unit (306) through a HUR signal;
the feature fusion analysis module (7) comprises an SVM feature fusion unit (701), a BP model generation unit (702) and an emotion level recognition unit (703), wherein the SVM feature fusion unit (701) is connected with the BP model generation unit (702) through an Ethernet signal, and the output end of the BP model generation unit (702) is connected with the emotion level recognition unit (703) through a HUR signal.
2. The image-text emotion analysis system based on natural language processing of claim 1, wherein: the image-text feature acquisition module (1) comprises an SVM data acquisition unit (101), a picture feature acquisition unit (102), an expression feature acquisition unit (103), a text feature acquisition unit (104), a voice feature acquisition unit (105), a feature impurity removal and noise reduction unit (106) and a feature classification output unit (107), wherein the SVM data acquisition unit (101) respectively comprises the picture feature acquisition unit (102), the expression feature acquisition unit (103), the text feature acquisition unit (104) and the voice feature acquisition unit (105), the output port of the SVM data acquisition unit (101) is connected with the feature impurity removal and noise reduction unit (106) through HUR signals, and the output port of the feature impurity removal and noise reduction unit (106) is connected with the feature classification output unit (107) through HUR signals.
3. The image-text emotion analysis system based on natural language processing of claim 2, wherein: the picture feature recognition module (4) comprises a picture feature reading unit (401), a wavelet conversion recognition unit (402), a color feature recognition unit (403), a line feature recognition unit (404) and a picture emotion classification unit (405), wherein the input end of the picture feature reading unit (401) is connected with the corresponding feature reading unit (206) through an Ethernet signal, the output end of the picture feature reading unit (401) is connected with the wavelet conversion recognition unit (402) through a HUR signal, the output end of the wavelet conversion recognition unit (402) is connected with the color feature recognition unit (403) through a HUR signal, the output end of the color feature recognition unit (403) is connected with the line feature recognition unit (404) through a HUR signal, and the output end of the line feature recognition unit (404) is connected with the picture emotion classification unit (405) through a HUR signal.
4. The image-text emotion analysis system based on natural language processing of claim 3, wherein: the text feature recognition module (5) comprises a text feature reading unit (501), a text feature dimension reduction unit (502), a part-of-speech feature recognition unit (503), a sentence pattern feature recognition unit (504), a semantic feature recognition unit (505) and a text emotion classification unit (506), wherein the input end of the text feature reading unit (501) is connected with the corresponding feature reading unit (206) through an Ethernet signal, the output end of the text feature reading unit (501) is connected with the text feature dimension reduction unit (502) through a HUR signal, the output end of the text feature dimension reduction unit (502) is connected with the part-of-speech feature recognition unit (503) through the HUR signal, the output end of the part-of-speech feature recognition unit (503) is connected with the sentence pattern feature recognition unit (504) through the HUR signal, the output end of the sentence pattern feature recognition unit (504) is connected with the semantic feature recognition unit (505) through the HUR signal, and the output end of the semantic feature recognition unit (505) is connected with the text emotion classification unit (506) through the HUR signal.
5. The image-text emotion analysis system based on natural language processing of claim 4, wherein: the voice feature recognition module (6) comprises a voice feature reading unit (601), a voice translation recognition unit (602), a prosodic feature recognition unit (603), an amplitude feature recognition unit (604), an MFCC feature recognition unit (605) and a voice emotion classification unit (606), wherein the input end of the voice feature reading unit (601) is connected with the corresponding feature reading unit (206) through an Ethernet signal, the output end of the voice feature reading unit (601) is connected with the voice translation recognition unit (602) through a HUR signal, the output end of the voice translation recognition unit (602) is connected with the prosodic feature recognition unit (603) through a HUR signal, the output end of the prosodic feature recognition unit (603) is connected with the amplitude feature recognition unit (604) through a HUR signal, the output end of the amplitude feature recognition unit (604) is connected with the MFCC feature recognition unit (605) through a HUR signal, and the output end of the MFCC feature recognition unit (605) is connected with the voice emotion classification unit (606) through a HUR signal.
6. The image-text emotion analysis system based on natural language processing of claim 5, wherein: the input end of the SVM feature fusion unit (701) is connected with the expression emotion classification unit (306), the picture emotion classification unit (405), the text emotion classification unit (506) and the voice emotion classification unit (606) through HUR signals.
CN202210833400.5A 2022-07-14 2022-07-14 Image-text emotion analysis system based on natural language processing Active CN115410061B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210833400.5A CN115410061B (en) 2022-07-14 2022-07-14 Image-text emotion analysis system based on natural language processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210833400.5A CN115410061B (en) 2022-07-14 2022-07-14 Image-text emotion analysis system based on natural language processing

Publications (2)

Publication Number Publication Date
CN115410061A CN115410061A (en) 2022-11-29
CN115410061B (en) 2024-02-09

Family

ID=84157623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210833400.5A Active CN115410061B (en) 2022-07-14 2022-07-14 Image-text emotion analysis system based on natural language processing

Country Status (1)

Country Link
CN (1) CN115410061B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105976809A (en) * 2016-05-25 2016-09-28 中国地质大学(武汉) Voice-and-facial-expression-based identification method and system for dual-modal emotion fusion
CN111694959A (en) * 2020-06-08 2020-09-22 谢沛然 Network public opinion multi-mode emotion recognition method and system based on facial expressions and text information
US11194972B1 (en) * 2021-02-19 2021-12-07 Institute Of Automation, Chinese Academy Of Sciences Semantic sentiment analysis method fusing in-depth features and time sequence models
CN114495217A (en) * 2022-01-14 2022-05-13 建信金融科技有限责任公司 Scene analysis method, device and system based on natural language and expression analysis



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant