CN115410061A - Image-text emotion analysis system based on natural language processing - Google Patents

Image-text emotion analysis system based on natural language processing

Info

Publication number
CN115410061A
Authority
CN
China
Prior art keywords
unit
feature
text
characteristic
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210833400.5A
Other languages
Chinese (zh)
Other versions
CN115410061B (en)
Inventor
李琰 (Li Yan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeast Forestry University
Original Assignee
Northeast Forestry University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeast Forestry University
Priority to CN202210833400.5A
Publication of CN115410061A
Application granted
Publication of CN115410061B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 - Facial expression recognition
    • G06V40/175 - Static expression
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Human Computer Interaction (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Child & Adolescent Psychology (AREA)
  • Psychiatry (AREA)
  • Signal Processing (AREA)
  • Hospice & Palliative Care (AREA)
  • Acoustics & Sound (AREA)
  • General Engineering & Computer Science (AREA)
  • Machine Translation (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image-text emotion analysis system based on natural language processing, and relates to the technical field of image-text emotion analysis. The system comprises an image-text feature acquisition module, an image-text feature analysis module, an expression feature recognition module, a picture feature recognition module, a text feature recognition module, a voice feature recognition module and a feature fusion analysis module; the output end of the image-text feature acquisition module is connected with the image-text feature analysis module through an HUR signal. A feature impurity-removal and noise-reduction unit performs voice noise reduction, correction of wrongly written characters in the text and separation of shadows from the pictures on the extracted features, so that the multiple kinds of feature data contained in the image-text material are extracted and analyzed; this improves the accuracy of emotion analysis and reduces reliance on any single modality. A static feature recognition unit recognizes static expression features after gray-level equalization so as to find an entry point for expression recognition, which addresses the difficulty that image-text expression recognition must cope with a large number of expression features.

Description

Image-text emotion analysis system based on natural language processing
Technical Field
The invention relates to the technical field of image-text emotion analysis, in particular to an image-text emotion analysis system based on natural language processing.
Background
With the rapid development of the mobile internet in recent years, more and more users express their opinions on and comment about trending events through social media such as microblogs. Social networks have become important platforms on which people publish opinions and express moods, and their influence on many areas of society grows daily. Mining the emotion implied by the mass of information users publish on social networks therefore facilitates public-opinion analysis, personalized recommendation and the like. China attaches importance to this direction, and research on deep-learning-based identification of social topics in networks has been carried out under the basic scientific research programs of colleges and universities.
Early emotion research mainly studied a single modality, text or images, and mainly adopted traditional machine-learning classification algorithms such as the K-nearest-neighbor algorithm, support vector machines, maximum-entropy classifiers and Bayesian classifiers. In recent years, however, deep learning has demonstrated excellent learning performance, and more and more researchers tend to use deep neural networks to learn feature representations of text or images for emotion classification. Such single-modality methods still suffer from an insufficient amount of information, are easily interfered with by other factors, struggle with the complex meanings that expressions convey, and cannot find an entry point among the many types of expressions.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an image-text emotion analysis system based on natural language processing, which solves the problems that the single-modality information of existing image-text emotion analysis technology is insufficient and easily interfered with by other factors.
In order to achieve the above purpose, the invention adopts the following technical scheme: an image-text emotion analysis system based on natural language processing comprises an image-text feature acquisition module, an image-text feature analysis module, an expression feature recognition module, a picture feature recognition module, a text feature recognition module, a voice feature recognition module and a feature fusion analysis module; the output end of the image-text feature acquisition module is connected with the image-text feature analysis module through an HUR signal, the image-text feature analysis module comprises the expression feature recognition module, the picture feature recognition module, the text feature recognition module and the voice feature recognition module, and the output end of the image-text feature analysis module is connected with the feature fusion analysis module through an HUR signal;
the image-text feature analysis module comprises a picture feature receiving unit, an expression feature receiving unit, a text feature receiving unit, a voice feature receiving unit, an image-text feature database and a corresponding feature reading unit; the output ends of the picture feature receiving unit, the expression feature receiving unit, the text feature receiving unit and the voice feature receiving unit are connected with the image-text feature database through Ethernet signals, and the output end of the image-text feature database is connected with the corresponding feature reading unit through an HUR signal.
Preferably, the image-text feature acquisition module comprises an SVM data acquisition unit, a picture feature acquisition unit, an expression feature acquisition unit, a text feature acquisition unit, a voice feature acquisition unit, a feature impurity-removal and noise-reduction unit and a feature classification output unit; the SVM data acquisition unit comprises the picture feature acquisition unit, the expression feature acquisition unit, the text feature acquisition unit and the voice feature acquisition unit; the output port of the SVM data acquisition unit is connected with the feature impurity-removal and noise-reduction unit through an HUR signal, and the output port of the feature impurity-removal and noise-reduction unit is connected with the feature classification output unit through an HUR signal.
Preferably, the expression feature recognition module comprises a face feature reading unit, a face feature exposure unit, a feature gray-level equalization unit, a static feature recognition unit, a deformation feature recognition unit and an expression emotion grading unit; the input end of the face feature reading unit is connected with the corresponding feature reading unit through an Ethernet signal, the output end of the face feature reading unit is connected with the face feature exposure unit through an HUR signal, the output end of the face feature exposure unit is connected with the feature gray-level equalization unit through an HUR signal, the output end of the feature gray-level equalization unit is connected with the static feature recognition unit through an HUR signal, the output end of the static feature recognition unit is connected with the deformation feature recognition unit through an HUR signal, and the output end of the deformation feature recognition unit is connected with the expression emotion grading unit through an HUR signal.
Preferably, the picture feature recognition module comprises a picture feature reading unit, a wavelet conversion recognition unit, a color feature recognition unit, a line feature recognition unit and a picture emotion classification unit; the input end of the picture feature reading unit is connected with the corresponding feature reading unit through an Ethernet signal, the output end of the picture feature reading unit is connected with the wavelet conversion recognition unit through an HUR signal, the output end of the wavelet conversion recognition unit is connected with the color feature recognition unit through an HUR signal, the output end of the color feature recognition unit is connected with the line feature recognition unit through an HUR signal, and the output end of the line feature recognition unit is connected with the picture emotion classification unit through an HUR signal.
Preferably, the text feature recognition module comprises a text feature acquisition unit, a text feature dimension-reduction unit, a part-of-speech feature recognition unit, a sentence-pattern feature recognition unit, a semantic feature recognition unit and a text emotion classification unit; the input end of the text feature acquisition unit is connected with the corresponding feature reading unit through an Ethernet signal, the output end of the text feature acquisition unit is connected with the text feature dimension-reduction unit through an HUR signal, the output end of the text feature dimension-reduction unit is connected with the part-of-speech feature recognition unit through an HUR signal, the output end of the part-of-speech feature recognition unit is connected with the sentence-pattern feature recognition unit through an HUR signal, the output end of the sentence-pattern feature recognition unit is connected with the semantic feature recognition unit through an HUR signal, and the output end of the semantic feature recognition unit is connected with the text emotion classification unit through an HUR signal.
Preferably, the voice feature recognition module comprises a voice feature reading unit, a voice translation recognition unit, a prosodic feature recognition unit, an amplitude feature recognition unit, an MFCC feature recognition unit and a voice emotion classification unit; the input end of the voice feature reading unit is connected with the corresponding feature reading unit through an Ethernet signal, the output end of the voice feature reading unit is connected with the voice translation recognition unit through an HUR signal, the output end of the voice translation recognition unit is connected with the prosodic feature recognition unit through an HUR signal, the output end of the prosodic feature recognition unit is connected with the amplitude feature recognition unit through an HUR signal, the output end of the amplitude feature recognition unit is connected with the MFCC feature recognition unit through an HUR signal, and the output end of the MFCC feature recognition unit is connected with the voice emotion classification unit through an HUR signal.
Preferably, the feature fusion analysis module comprises an SVM feature fusion unit, a BP model generation unit and an emotion grade recognition unit; the SVM feature fusion unit is connected with the BP model generation unit through an Ethernet signal, and the output end of the BP model generation unit is connected with the emotion grade recognition unit through an HUR signal.
Preferably, the input end of the SVM feature fusion unit is connected with the expression emotion grading unit, the picture emotion classification unit, the text emotion classification unit and the voice emotion classification unit through HUR signals.
The working principle is as follows: s1, firstly, acquiring image-text characteristics through an image-text characteristic acquisition module, classifying and extracting the image-text characteristics through an SVM (support vector machine) data acquisition unit, separating and extracting the image characteristics in the image-text through an image characteristic acquisition unit included in the SVM data acquisition unit, separating and extracting human-image expressions in the image through an expression characteristic acquisition unit, extracting texts in an image-text file, correspondingly extracting image-text voices through a voice characteristic acquisition unit, performing voice noise reduction on the extracted characteristics through a characteristic impurity removal and noise reduction unit, correcting wrongly written texts, separating image shadows, and finally sending the characteristics to an image-text characteristic analysis module through a characteristic classification output unit in an Ethernet form;
S2, the picture feature receiving unit included in the image-text feature analysis module directly receives the picture features sent by the feature classification output unit, the expression feature receiving unit receives the character expression features, the text feature receiving unit receives the text sentence features, and the voice feature receiving unit correspondingly receives the character voice features; the received features are stored in the image-text feature database, and the corresponding feature reading unit starts the expression feature recognition module, the picture feature recognition module, the text feature recognition module and the voice feature recognition module to read the corresponding features.
S3, the expression feature recognition module recognizes the acquired character expression features. After the face feature reading unit receives the image-text character expression feature data from the corresponding feature reading unit, the face feature exposure unit exposes the face of the character so that the face region is converted into a set of mutually overlapping rectangles, and the face position is located accurately in order to remove redundant overlapping information. The feature gray-level equalization unit then performs gray-level processing on the exposed facial expression: after gray-level equalization the image has the same number of pixels at every gray level, so every level of the gray histogram has the same height. Gray-level equalization is also an image-improvement method; the equalized image carries the maximum amount of information and is effectively enhanced, which lets the static feature recognition unit and the deformation feature recognition unit capture the expression more quickly. The static feature recognition unit recognizes the static expression features after gray-level equalization, i.e. the smaller features of the expression, while the deformation feature recognition unit recognizes the deformation features of the face, i.e. the larger features that represent the general shape of the face; the features with smaller eigenvalues describe the specific details of the face. Within a set of eigenfaces, the degree of exaggeration of an expression is positively correlated with its recognition rate: the more exaggerated the expression, the more obvious the facial features. From the static and deformation features recognized by these two units, the expression emotion grading unit grades the expression emotion of the image-text material and obtains the corresponding expression emotion value.
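As an illustration only: the overlapping-rectangle face localization and gray-level equalization described in S3 resemble Haar-cascade detection followed by histogram equalization. A minimal OpenCV sketch under that assumption (file paths are placeholders) could be:

```python
# Sketch of S3 preprocessing: locate the face as overlapping rectangles,
# then equalize the gray levels of the face region. Paths are placeholders.
import cv2

img = cv2.imread("portrait.jpg")                      # hypothetical input
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# A Haar cascade proposes a set of (possibly overlapping) face rectangles;
# detectMultiScale merges redundant overlaps via minNeighbors.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    roi = gray[y:y + h, x:x + w]
    # Histogram equalization: every gray level ends up with roughly the
    # same pixel count, maximizing the information in the region.
    gray[y:y + h, x:x + w] = cv2.equalizeHist(roi)

cv2.imwrite("equalized_face.jpg", gray)
```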
S4, the picture feature reading unit included in the picture feature recognition module reads the image-text pictures acquired by the image-text feature acquisition module. The wavelet conversion recognition unit converts each picture into a texture feature map that can be recognized; the color feature recognition unit recognizes the color features of the picture, measuring the basic attributes of color (saturation, hue and brightness) and using these basic picture features to recognize the emotion the picture expresses; the line feature recognition unit recognizes the shape and line features of the picture content; and finally the picture emotion classification unit judges the emotion grade of the corresponding image-text picture from the recognition data of the color feature recognition unit and the line feature recognition unit.
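A rough sketch of the S4 feature chain under stated assumptions: PyWavelets with a Haar basis (one common choice, not specified by the patent) yields texture detail maps, and simple HSV statistics stand in for the saturation, hue and brightness attributes:

```python
# Sketch of S4: wavelet texture map plus basic color attributes.
# The Haar wavelet and the mean statistics are assumptions.
import cv2
import numpy as np
import pywt

img = cv2.imread("picture.jpg")                       # hypothetical input
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(float)

# Single-level 2-D DWT: approximation plus horizontal/vertical/diagonal
# detail coefficients; the detail bands act as a texture feature map.
cA, (cH, cV, cD) = pywt.dwt2(gray, "haar")
texture_energy = float(np.mean(cH**2 + cV**2 + cD**2))

# Hue, saturation, value (brightness) as coarse color-emotion features.
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
hue, sat, val = (hsv[..., i].mean() for i in range(3))

feature_vector = np.array([texture_energy, hue, sat, val])
print(feature_vector)
```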
S5, the text feature recognition module recognizes the text narration contained in the image-text material. After the text feature dimension-reduction unit performs dimension reduction, the part-of-speech feature recognition unit recognizes the emotional part-of-speech features of the text, the sentence-pattern feature recognition unit recognizes the sentence-pattern arrangement, the semantic feature recognition unit recognizes the emotional data contained in the meaning of the text, and finally the text emotion classification unit grades the emotion degree of the image-text characters.
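The patent does not specify a text representation. As one hedged possibility, TF-IDF features reduced with truncated SVD and graded by a linear classifier approximate the S5 pipeline; the toy corpus and grading labels below are hypothetical:

```python
# Sketch of S5: text features -> dimension reduction -> emotion grade.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

texts = ["what a wonderful day", "this is terrible news",
         "an ordinary afternoon", "absolutely delighted with it"]
grades = ["positive", "negative", "neutral", "positive"]

pipeline = make_pipeline(
    TfidfVectorizer(),             # text feature acquisition
    TruncatedSVD(n_components=2),  # text feature dimension reduction
    LinearSVC())                   # emotion grading
pipeline.fit(texts, grades)

print(pipeline.predict(["a truly wonderful afternoon"]))
```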
S6, the voice feature recognition module recognizes the voice contained in the image-text material. The voice feature reading unit included in the module receives the voice features collected by the image-text feature acquisition module, and the voice translation recognition unit translates the voice and converts it into recognizable speech. The prosodic feature recognition unit recognizes the prosody of the corresponding voice; the prosodic fundamental frequency describes the vibration frequency of the voice and is closely related to the size and tension of the vocal cords, so it can be used to judge emotional changes under different prosody. The amplitude feature recognition unit recognizes the voice amplitude; the amplitude determines the loudness of the voice, which rises in an angry or surprised state and falls in a sad state, so it can be used to judge the emotional state of the voice. The MFCC feature recognition unit applies pre-emphasis to the voice signal to raise the resolution of its high-frequency part, and the voice emotion classification unit classifies and judges the voice recognized by the voice feature recognition module.
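As an illustration under stated assumptions (the patent names no toolkit), the prosody, amplitude and pre-emphasized MFCC features of S6 might be computed with librosa roughly as follows; the file path is a placeholder:

```python
# Sketch of S6 acoustic features: fundamental frequency (prosody),
# amplitude, and MFCCs after pre-emphasis.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=None)        # hypothetical input

# Prosody: fundamental frequency via the pYIN tracker.
f0, voiced_flag, voiced_probs = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr)
mean_f0 = float(np.nanmean(f0))

# Amplitude: RMS energy, a proxy for loudness (anger/surprise raise it,
# sadness lowers it, per the description above).
rms = float(librosa.feature.rms(y=y).mean())

# MFCCs after pre-emphasis, which boosts the high-frequency part.
y_pre = librosa.effects.preemphasis(y, coef=0.97)
mfcc = librosa.feature.mfcc(y=y_pre, sr=sr, n_mfcc=13).mean(axis=1)

print(mean_f0, rms, mfcc.shape)
```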
S7, the SVM feature fusion unit receives the emotion grades output by the expression feature recognition module, the picture feature recognition module, the text feature recognition module and the voice feature recognition module and fuses them; the BP model generation unit creates an emotion model, and the emotion grade recognition unit makes the final image-text emotion judgment.
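A minimal sketch of the S7 fusion stage, with per-modality scores and labels invented for illustration: the four modality grades are concatenated and a back-propagation (BP) network, here scikit-learn's MLPClassifier, produces the final judgment:

```python
# Sketch of S7: fuse four per-modality emotion scores and let a
# back-propagation neural network decide the overall emotion grade.
# The scores and labels below are hypothetical training data.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
# Columns: expression, picture, text, voice emotion scores in [0, 1].
X_train = rng.random((300, 4))
# Toy rule standing in for labeled data: mean score -> grade.
y_train = np.where(X_train.mean(axis=1) > 0.5, "positive", "negative")

bp_model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                         random_state=0)
bp_model.fit(X_train, y_train)

fused = np.array([[0.8, 0.7, 0.6, 0.9]])   # one fused sample
print("final emotion judgment:", bp_model.predict(fused)[0])
```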
The invention provides an image-text emotion analysis system based on natural language processing. The system has the following beneficial effects:
1. Image-text features are acquired by the image-text feature acquisition module and classified and extracted by the SVM data acquisition unit using a support vector machine. The picture feature acquisition unit included in the SVM data acquisition unit separates and extracts the picture features in the image-text material, the expression feature acquisition unit separates and extracts the portrait expressions in the pictures, the text feature acquisition unit extracts the text in the image-text file, and the voice feature acquisition unit correspondingly extracts the image-text voice. The feature impurity-removal and noise-reduction unit performs voice noise reduction, correction of wrongly written characters and shadow separation on the extracted features, so that the multiple kinds of feature data contained in the image-text material are extracted and analyzed; this improves the accuracy of emotion analysis and reduces reliance on a single modality.
2. The face feature exposure unit exposes the face of the character so that the face region is converted into a set of mutually overlapping rectangles, and the face position is located accurately in order to remove redundant overlapping information. The feature gray-level equalization unit performs gray-level processing on the exposed facial expression: after gray-level equalization the image has the same number of pixels at every gray level, so every level of the gray histogram has the same height; gray-level equalization is also an image-improvement method, and the equalized image carries the maximum amount of information and is effectively enhanced, which lets the static feature recognition unit and the deformation feature recognition unit capture the expression more quickly. The static feature recognition unit recognizes the static expression features after gray-level equalization, i.e. the smaller features of the expression, and the deformation feature recognition unit recognizes the deformation features of the face, i.e. its larger features, so an entry point for image-text expression recognition is found, which overcomes the difficulty that expression recognition must cope with a large number of expression features.
Drawings
FIG. 1 is a system architecture diagram of the present invention;
FIG. 2 is a schematic diagram of an image-text feature acquisition module according to the present invention;
FIG. 3 is a schematic diagram of an image-text feature analysis module according to the present invention;
FIG. 4 is a block diagram of an expression feature recognition module according to the present invention;
FIG. 5 is a schematic diagram of an architecture of a picture feature recognition module according to the present invention;
FIG. 6 is a block diagram of a text feature recognition module according to the present invention;
FIG. 7 is a block diagram of a speech feature recognition module according to the present invention;
FIG. 8 is a schematic diagram of a feature fusion analysis module according to the present invention.
Wherein: 1. image-text feature acquisition module; 2. image-text feature analysis module; 3. expression feature recognition module; 4. picture feature recognition module; 5. text feature recognition module; 6. voice feature recognition module; 7. feature fusion analysis module; 101. SVM data acquisition unit; 102. picture feature acquisition unit; 103. expression feature acquisition unit; 104. text feature acquisition unit; 105. voice feature acquisition unit; 106. feature impurity-removal and noise-reduction unit; 107. feature classification output unit; 201. picture feature receiving unit; 202. expression feature receiving unit; 203. text feature receiving unit; 204. voice feature receiving unit; 205. image-text feature database; 206. corresponding feature reading unit; 301. face feature reading unit; 302. face feature exposure unit; 303. feature gray-level equalization unit; 304. static feature recognition unit; 305. deformation feature recognition unit; 306. expression emotion grading unit; 401. picture feature reading unit; 402. wavelet conversion recognition unit; 403. color feature recognition unit; 404. line feature recognition unit; 405. picture emotion classification unit; 501. text feature acquisition unit; 502. text feature dimension-reduction unit; 503. part-of-speech feature recognition unit; 504. sentence-pattern feature recognition unit; 505. semantic feature recognition unit; 506. text emotion classification unit; 601. voice feature reading unit; 602. voice translation recognition unit; 603. prosodic feature recognition unit; 604. amplitude feature recognition unit; 605. MFCC feature recognition unit; 606. voice emotion classification unit; 701. SVM feature fusion unit; 702. BP model generation unit; 703. emotion grade recognition unit.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment is as follows:
As shown in figs. 1-8, an embodiment of the present invention provides an image-text emotion analysis system based on natural language processing, comprising an image-text feature acquisition module 1, an image-text feature analysis module 2, an expression feature recognition module 3, a picture feature recognition module 4, a text feature recognition module 5, a voice feature recognition module 6 and a feature fusion analysis module 7; the output end of the image-text feature acquisition module 1 is connected with the image-text feature analysis module 2 through an HUR signal, the image-text feature analysis module 2 comprises the expression feature recognition module 3, the picture feature recognition module 4, the text feature recognition module 5 and the voice feature recognition module 6, and the output end of the image-text feature analysis module 2 is connected with the feature fusion analysis module 7 through an HUR signal;
the image-text characteristic analysis module 2 comprises an image characteristic receiving unit 201, an expression characteristic receiving unit 202, a text characteristic receiving unit 203, a voice characteristic receiving unit 204, an image-text characteristic database 205 and a corresponding characteristic reading unit 206, the output ends of the image characteristic receiving unit 201, the expression characteristic receiving unit 202, the text characteristic receiving unit 203 and the voice characteristic receiving unit 204 are connected with the image-text characteristic database 205 through Ethernet signals, the output end of the image-text characteristic database 205 is connected with the corresponding characteristic reading unit 206 through HUR signals, the image characteristic receiving unit 201 included in the image-text characteristic analysis module 2 directly receives image characteristics sent by the characteristic classification output unit 107, the expression characteristic receiving unit 202 receives character expression characteristics sent by the characteristic classification output unit 107, the text characteristic receiving unit 203 receives text sentence characteristics sent by the characteristic classification output unit 107, the voice characteristic receiving unit 204 correspondingly receives character voice characteristics of the characteristic classification output unit 107 and stores the character voice characteristics into the image-text characteristic database 205, and the corresponding characteristic reading unit 206 starts the expression characteristic identification module 3, the image characteristic identification module 4, the text characteristic identification module 5 and the voice characteristic reading module 6 to read characteristics.
The image-text feature acquisition module 1 comprises an SVM data acquisition unit 101, a picture feature acquisition unit 102, an expression feature acquisition unit 103, a text feature acquisition unit 104, a voice feature acquisition unit 105, a feature impurity-removal and noise-reduction unit 106 and a feature classification output unit 107. The SVM data acquisition unit 101 comprises the picture feature acquisition unit 102, the expression feature acquisition unit 103, the text feature acquisition unit 104 and the voice feature acquisition unit 105; the output port of the SVM data acquisition unit 101 is connected with the feature impurity-removal and noise-reduction unit 106 through an HUR signal, and the output port of the feature impurity-removal and noise-reduction unit 106 is connected with the feature classification output unit 107 through an HUR signal. Image-text features are acquired by the image-text feature acquisition module 1 and classified and extracted by the SVM data acquisition unit 101 using a support vector machine: the picture feature acquisition unit 102 separates and extracts the picture features in the image-text material, the expression feature acquisition unit 103 separates and extracts the portrait expressions in the pictures, the text feature acquisition unit 104 extracts the text in the image-text file, and the voice feature acquisition unit 105 correspondingly extracts the image-text voice. The feature impurity-removal and noise-reduction unit 106 performs voice noise reduction, correction of wrongly written characters and separation of picture shadows on the extracted features, and the feature classification output unit 107 finally sends the features to the image-text feature analysis module 2 in Ethernet form.
The expression feature recognition module 3 comprises a face feature reading unit 301, a face feature exposure unit 302, a feature gray-level equalization unit 303, a static feature recognition unit 304, a deformation feature recognition unit 305 and an expression emotion grading unit 306. The input end of the face feature reading unit 301 is connected with the corresponding feature reading unit 206 through an Ethernet signal, the output end of the face feature reading unit 301 is connected with the face feature exposure unit 302 through an HUR signal, the output end of the face feature exposure unit 302 is connected with the feature gray-level equalization unit 303 through an HUR signal, the output end of the feature gray-level equalization unit 303 is connected with the static feature recognition unit 304 through an HUR signal, the output end of the static feature recognition unit 304 is connected with the deformation feature recognition unit 305 through an HUR signal, and the output end of the deformation feature recognition unit 305 is connected with the expression emotion grading unit 306 through an HUR signal. The expression feature recognition module 3 recognizes the collected character expression features: after the face feature reading unit 301 receives the image-text character expression feature data from the corresponding feature reading unit 206, the face feature exposure unit 302 exposes the face of the character so that the face region is converted into a set of mutually overlapping rectangles, and the face position is located accurately in order to remove redundant overlapping information. The feature gray-level equalization unit 303 performs gray-level processing on the exposed facial expression; after gray-level equalization the image has the same number of pixels at every gray level, so every level of the gray histogram has the same height. Gray-level equalization is also an image-improvement method: the equalized image carries the maximum amount of information and is effectively enhanced, which lets the static feature recognition unit 304 and the deformation feature recognition unit 305 capture the expression more quickly. The static feature recognition unit 304 recognizes the static expression features after gray-level equalization, i.e. the smaller features of the expression, while the deformation feature recognition unit 305 recognizes the deformation features of the face, i.e. the larger features that represent the general shape of the face; the features with smaller eigenvalues describe the specific details of the face. Within the eigenface set, the degree of exaggeration of an expression is positively correlated with its recognition rate: the more exaggerated the expression, the more obvious the facial features. From the static and deformation features recognized by the two units, the expression emotion grading unit 306 grades the expression emotion of the image-text material and obtains the corresponding expression emotion value.
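The eigenface and eigenvalue language above suggests an eigenface-style decomposition; a minimal PCA sketch under that assumption (synthetic images stand in for real face data) is:

```python
# Sketch of the static/deformation feature idea in eigenface terms:
# principal components with large eigenvalues capture the coarse face
# shape ("deformation" features), while those with small eigenvalues
# capture fine expression details ("static" features). Data is synthetic.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
faces = rng.random((100, 32 * 32))      # 100 fake 32x32 gray face images

pca = PCA(n_components=20).fit(faces)
weights = pca.transform(faces)

coarse = weights[:, :5]    # large-eigenvalue components: general shape
detail = weights[:, 5:]    # small-eigenvalue components: fine details
print(pca.explained_variance_[:5], coarse.shape, detail.shape)
```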
The picture feature recognition module 4 comprises a picture feature reading unit 401, a wavelet conversion recognition unit 402, a color feature recognition unit 403, a line feature recognition unit 404 and a picture emotion classification unit 405. The input end of the picture feature reading unit 401 is connected with the corresponding feature reading unit 206 through an Ethernet signal, the output end of the picture feature reading unit 401 is connected with the wavelet conversion recognition unit 402 through an HUR signal, the output end of the wavelet conversion recognition unit 402 is connected with the color feature recognition unit 403 through an HUR signal, the output end of the color feature recognition unit 403 is connected with the line feature recognition unit 404 through an HUR signal, and the output end of the line feature recognition unit 404 is connected with the picture emotion classification unit 405 through an HUR signal. The picture feature reading unit 401 reads the image-text pictures acquired by the image-text feature acquisition module 1; the wavelet conversion recognition unit 402 converts each picture into a recognizable texture feature map; the color feature recognition unit 403 recognizes the color features of the picture, measuring the basic attributes of color (saturation, hue and brightness) and using these basic picture features to recognize the emotion the picture expresses; the line feature recognition unit 404 recognizes the shape and line features of the picture content; and the picture emotion classification unit 405 judges the emotion grade of the corresponding image-text picture from the recognition data of the color feature recognition unit 403 and the line feature recognition unit 404.
The text feature recognition module 5 comprises a text feature acquisition unit 501, a text feature dimension-reduction unit 502, a part-of-speech feature recognition unit 503, a sentence-pattern feature recognition unit 504, a semantic feature recognition unit 505 and a text emotion classification unit 506. The input end of the text feature acquisition unit 501 is connected with the corresponding feature reading unit 206 through an Ethernet signal, the output end of the text feature acquisition unit 501 is connected with the text feature dimension-reduction unit 502 through an HUR signal, the output end of the text feature dimension-reduction unit 502 is connected with the part-of-speech feature recognition unit 503 through an HUR signal, the output end of the part-of-speech feature recognition unit 503 is connected with the sentence-pattern feature recognition unit 504 through an HUR signal, the output end of the sentence-pattern feature recognition unit 504 is connected with the semantic feature recognition unit 505 through an HUR signal, and the output end of the semantic feature recognition unit 505 is connected with the text emotion classification unit 506 through an HUR signal. The text feature recognition module 5 recognizes the text narration contained in the image-text material: after the text feature dimension-reduction unit 502 performs dimension reduction, the part-of-speech feature recognition unit 503 recognizes the emotional part-of-speech features of the text, the sentence-pattern feature recognition unit 504 recognizes the sentence-pattern arrangement, the semantic feature recognition unit 505 recognizes the emotional data contained in the meaning of the text, and finally the text emotion classification unit 506 grades the emotion degree of the image-text characters.
The voice feature recognition module 6 comprises a voice feature reading unit 601, a voice translation recognition unit 602, a prosodic feature recognition unit 603, an amplitude feature recognition unit 604, an MFCC feature recognition unit 605 and a voice emotion classification unit 606. The input end of the voice feature reading unit 601 is connected with the corresponding feature reading unit 206 through an Ethernet signal, the output end of the voice feature reading unit 601 is connected with the voice translation recognition unit 602 through an HUR signal, the output end of the voice translation recognition unit 602 is connected with the prosodic feature recognition unit 603 through an HUR signal, the output end of the prosodic feature recognition unit 603 is connected with the amplitude feature recognition unit 604 through an HUR signal, the output end of the amplitude feature recognition unit 604 is connected with the MFCC feature recognition unit 605 through an HUR signal, and the output end of the MFCC feature recognition unit 605 is connected with the voice emotion classification unit 606 through an HUR signal. The voice feature recognition module 6 recognizes the voice contained in the image-text material: the voice feature reading unit 601 receives the voice features collected by the image-text feature acquisition module 1, and the voice translation recognition unit 602 translates the voice and converts it into recognizable speech. The prosodic feature recognition unit 603 recognizes the prosody of the corresponding voice; the prosodic fundamental frequency describes the vibration frequency of the voice and is closely related to the size and tension of the vocal cords, so it can be used to judge emotional changes under different prosody. The amplitude feature recognition unit 604 recognizes the voice amplitude; the amplitude determines the loudness of the voice, which rises in an angry or surprised state and falls in a sad state, so it can be used to judge the emotional state of the voice. The MFCC feature recognition unit 605 applies pre-emphasis to the voice signal to raise the resolution of its high-frequency part, and the voice emotion classification unit 606 classifies and judges the voice recognized by the voice feature recognition module 6.
The feature fusion analysis module 7 comprises an SVM feature fusion unit 701, a BP model generation unit 702 and an emotion grade recognition unit 703. The SVM feature fusion unit 701 is connected with the BP model generation unit 702 through an Ethernet signal, the output end of the BP model generation unit 702 is connected with the emotion grade recognition unit 703 through an HUR signal, and the input end of the SVM feature fusion unit 701 is connected with the expression emotion grading unit 306, the picture emotion classification unit 405, the text emotion classification unit 506 and the voice emotion classification unit 606 through HUR signals. The SVM feature fusion unit 701 receives the emotion grades from the expression feature recognition module 3, the picture feature recognition module 4, the text feature recognition module 5 and the voice feature recognition module 6 and fuses them; the BP model generation unit 702 creates an emotion model, and the emotion grade recognition unit 703 makes the final image-text emotion judgment.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (8)

1. An image-text emotion analysis system based on natural language processing, comprising an image-text feature acquisition module (1), an image-text feature analysis module (2), an expression feature recognition module (3), a picture feature recognition module (4), a text feature recognition module (5), a voice feature recognition module (6) and a feature fusion analysis module (7), characterized in that: the output end of the image-text feature acquisition module (1) is connected with the image-text feature analysis module (2) through an HUR signal, the image-text feature analysis module (2) comprises the expression feature recognition module (3), the picture feature recognition module (4), the text feature recognition module (5) and the voice feature recognition module (6), and the output end of the image-text feature analysis module (2) is connected with the feature fusion analysis module (7) through an HUR signal;
the image-text feature analysis module (2) comprises a picture feature receiving unit (201), an expression feature receiving unit (202), a text feature receiving unit (203), a voice feature receiving unit (204), an image-text feature database (205) and a corresponding feature reading unit (206); the output ends of the picture feature receiving unit (201), the expression feature receiving unit (202), the text feature receiving unit (203) and the voice feature receiving unit (204) are connected with the image-text feature database (205) through Ethernet signals, and the output end of the image-text feature database (205) is connected with the corresponding feature reading unit (206) through an HUR signal.
2. The image-text emotion analysis system based on natural language processing according to claim 1, characterized in that: the image-text feature acquisition module (1) comprises an SVM data acquisition unit (101), a picture feature acquisition unit (102), an expression feature acquisition unit (103), a text feature acquisition unit (104), a voice feature acquisition unit (105), a feature impurity-removal and noise-reduction unit (106) and a feature classification output unit (107); the SVM data acquisition unit (101) comprises the picture feature acquisition unit (102), the expression feature acquisition unit (103), the text feature acquisition unit (104) and the voice feature acquisition unit (105); the output port of the SVM data acquisition unit (101) is connected with the feature impurity-removal and noise-reduction unit (106) through an HUR signal, and the output port of the feature impurity-removal and noise-reduction unit (106) is connected with the feature classification output unit (107) through an HUR signal.
3. The image-text emotion analysis system based on natural language processing according to claim 1, characterized in that: the expression feature recognition module (3) comprises a face feature reading unit (301), a face feature exposure unit (302), a feature gray-level equalization unit (303), a static feature recognition unit (304), a deformation feature recognition unit (305) and an expression emotion grading unit (306); the input end of the face feature reading unit (301) is connected with the corresponding feature reading unit (206) through an Ethernet signal, the output end of the face feature reading unit (301) is connected with the face feature exposure unit (302) through an HUR signal, the output end of the face feature exposure unit (302) is connected with the feature gray-level equalization unit (303) through an HUR signal, the output end of the feature gray-level equalization unit (303) is connected with the static feature recognition unit (304) through an HUR signal, the output end of the static feature recognition unit (304) is connected with the deformation feature recognition unit (305) through an HUR signal, and the output end of the deformation feature recognition unit (305) is connected with the expression emotion grading unit (306) through an HUR signal.
4. The image-text emotion analysis system based on natural language processing according to claim 1, characterized in that: the picture feature recognition module (4) comprises a picture feature reading unit (401), a wavelet conversion recognition unit (402), a color feature recognition unit (403), a line feature recognition unit (404) and a picture emotion classification unit (405); the input end of the picture feature reading unit (401) is connected with the corresponding feature reading unit (206) through an Ethernet signal, the output end of the picture feature reading unit (401) is connected with the wavelet conversion recognition unit (402) through an HUR signal, the output end of the wavelet conversion recognition unit (402) is connected with the color feature recognition unit (403) through an HUR signal, the output end of the color feature recognition unit (403) is connected with the line feature recognition unit (404) through an HUR signal, and the output end of the line feature recognition unit (404) is connected with the picture emotion classification unit (405) through an HUR signal.
5. The image-text emotion analysis system based on natural language processing according to claim 1, characterized in that: the text feature recognition module (5) comprises a text feature acquisition unit (501), a text feature dimension-reduction unit (502), a part-of-speech feature recognition unit (503), a sentence-pattern feature recognition unit (504), a semantic feature recognition unit (505) and a text emotion classification unit (506); the input end of the text feature acquisition unit (501) is connected with the corresponding feature reading unit (206) through an Ethernet signal, the output end of the text feature acquisition unit (501) is connected with the text feature dimension-reduction unit (502) through an HUR signal, the output end of the text feature dimension-reduction unit (502) is connected with the part-of-speech feature recognition unit (503) through an HUR signal, the output end of the part-of-speech feature recognition unit (503) is connected with the sentence-pattern feature recognition unit (504) through an HUR signal, the output end of the sentence-pattern feature recognition unit (504) is connected with the semantic feature recognition unit (505) through an HUR signal, and the output end of the semantic feature recognition unit (505) is connected with the text emotion classification unit (506) through an HUR signal.
6. The image-text emotion analysis system based on natural language processing according to claim 1, characterized in that: the voice feature recognition module (6) comprises a voice feature reading unit (601), a voice translation recognition unit (602), a prosodic feature recognition unit (603), an amplitude feature recognition unit (604), an MFCC feature recognition unit (605) and a voice emotion classification unit (606); the input end of the voice feature reading unit (601) is connected with the corresponding feature reading unit (206) through an Ethernet signal, the output end of the voice feature reading unit (601) is connected with the voice translation recognition unit (602) through an HUR signal, the output end of the voice translation recognition unit (602) is connected with the prosodic feature recognition unit (603) through an HUR signal, the output end of the prosodic feature recognition unit (603) is connected with the amplitude feature recognition unit (604) through an HUR signal, the output end of the amplitude feature recognition unit (604) is connected with the MFCC feature recognition unit (605) through an HUR signal, and the output end of the MFCC feature recognition unit (605) is connected with the voice emotion classification unit (606) through an HUR signal.
7. The image-text emotion analysis system based on natural language processing according to claim 1, characterized in that: the feature fusion analysis module (7) comprises an SVM feature fusion unit (701), a BP model generation unit (702) and an emotion grade recognition unit (703); the SVM feature fusion unit (701) is connected with the BP model generation unit (702) through an Ethernet signal, and the output end of the BP model generation unit (702) is connected with the emotion grade recognition unit (703) through an HUR signal.
8. The image-text emotion analysis system based on natural language processing according to claim 7, characterized in that: the input end of the SVM feature fusion unit (701) is connected with the expression emotion grading unit (306), the picture emotion classification unit (405), the text emotion classification unit (506) and the voice emotion classification unit (606) through HUR signals.
CN202210833400.5A (priority 2022-07-14, filed 2022-07-14) | Image-text emotion analysis system based on natural language processing | Status: Active | Granted as CN115410061B (en)

Priority Applications (1)

Application Number: CN202210833400.5A (granted as CN115410061B) | Priority Date: 2022-07-14 | Filing Date: 2022-07-14 | Title: Image-text emotion analysis system based on natural language processing


Publications (2)

Publication Number | Publication Date
CN115410061A (this application) | 2022-11-29
CN115410061B (grant) | 2024-02-09

Family

ID: 84157623

Family Applications (1)

Application Number: CN202210833400.5A (Active; granted as CN115410061B) | Title: Image-text emotion analysis system based on natural language processing | Priority Date: 2022-07-14 | Filing Date: 2022-07-14

Country Status (1)

Country Link
CN (1) CN115410061B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105976809A (en) * 2016-05-25 2016-09-28 中国地质大学(武汉) Voice-and-facial-expression-based identification method and system for dual-modal emotion fusion
CN111694959A (en) * 2020-06-08 2020-09-22 谢沛然 Network public opinion multi-mode emotion recognition method and system based on facial expressions and text information
US11194972B1 (en) * 2021-02-19 2021-12-07 Institute Of Automation, Chinese Academy Of Sciences Semantic sentiment analysis method fusing in-depth features and time sequence models
CN114495217A (en) * 2022-01-14 2022-05-13 建信金融科技有限责任公司 Scene analysis method, device and system based on natural language and expression analysis

Also Published As

Publication number Publication date
CN115410061B (en) 2024-02-09

Similar Documents

Publication Publication Date Title
CN105976809B (en) Identification method and system based on speech and facial expression bimodal emotion fusion
CN108629338B (en) Face beauty prediction method based on LBP and convolutional neural network
WO2023065617A1 (en) Cross-modal retrieval system and method based on pre-training model and recall and ranking
CN115994230A (en) Intelligent archive construction method integrating artificial intelligence and knowledge graph technology
CN111666831B (en) Method for generating face video of speaker based on decoupling expression learning
CN113657115B (en) Multi-mode Mongolian emotion analysis method based on ironic recognition and fine granularity feature fusion
CN112269868A (en) Use method of machine reading understanding model based on multi-task joint training
CN112818951A (en) Ticket identification method
CN112836702B (en) Text recognition method based on multi-scale feature extraction
CN114724222A (en) AI digital human emotion analysis method based on multiple modes
CN111914734A (en) Theme emotion analysis method for short video scene
CN114972847A (en) Image processing method and device
CN113920561A (en) Facial expression recognition method and device based on zero sample learning
CN114022923A (en) Intelligent collecting and editing system
CN111274891B (en) Method and system for extracting pitch and corresponding lyrics of numbered musical notation image
CN112738555A (en) Video processing method and device
CN112163605A (en) Multi-domain image translation method based on attention network generation
CN115410061B (en) Image-text emotion analysis system based on natural language processing
CN116758451A (en) Audio-visual emotion recognition method and system based on multi-scale and global cross attention
CN115588227A (en) Emotion recognition method and device, electronic equipment and storage medium
CN115661904A (en) Data labeling and domain adaptation model training method, device, equipment and medium
CN112699236B (en) Deepfake detection method based on emotion recognition and pupil size calculation
CN115472182A (en) Attention feature fusion-based voice emotion recognition method and device of multi-channel self-encoder
CN115455136A (en) Intelligent digital human marketing interaction method and device, computer equipment and storage medium
CN204856534U (en) System of looking that helps is read to low eyesight based on OCR and TTS

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant