CN116340511B

CN116340511B - Public opinion analysis method combining deep learning and language logic reasoning

Info

Publication number: CN116340511B
Application number: CN202310165134.8A
Authority: CN
Inventors: 肖林; 黄国柱; 杨洲杰
Original assignee: Shenzhen Shenyi Technology Co ltd
Current assignee: Shenzhen Shenyi Technology Co ltd
Priority date: 2023-02-16
Filing date: 2023-02-16
Publication date: 2023-09-15
Anticipated expiration: 2043-02-16
Also published as: CN116340511A

Abstract

The application provides a public opinion analysis method combining deep learning and language logic reasoning, which comprises the following steps: topic data are obtained, identified and subjected to format conversion, and text data are extracted from the topic data; performing text classification and word vector modeling on the text data to extract first related information of the text data; carrying out structural analysis on the first related information to obtain first relation data of each subject term; determining a first keyword set consisting of a plurality of first keywords and first attribute data of each first keyword according to the first relation data; carrying out emotion classification on the first keyword set to obtain first emotion classification data and analyzing the first emotion classification data to obtain a first public opinion analysis result; carrying out validity verification on the first public opinion analysis result to obtain a first verification result; and correcting the first public opinion analysis result according to the first verification result. According to the scheme provided by the application, the public opinion analysis can be accurately performed by utilizing the deep learning technology and natural language logic reasoning.

Description

Public opinion analysis method combining deep learning and language logic reasoning

Technical Field

The application relates to the technical field of industrial control, in particular to a public opinion analysis method combining deep learning and language logic reasoning.

Background

With the rapid development of network technology, the internet has become an important platform for the public to obtain information and express views. Network public opinion is the body of opinions and statements, carrying a certain influence and tendency, that the public expresses about hot issues spreading on the internet; through it, people publish their views on social problems or voice influential, strongly held positions. The state of network public opinion reflects the state of society: effective public opinion monitoring and analysis helps identify hot topics, quickly understand how netizens' emotions are developing and clarify the current public opinion situation, while also helping to guide public opinion trends and avoid public opinion crises. Descriptions of public opinion events come mainly from news texts on online media and from social platforms such as Sina Weibo, where people learn about an event directly, or indirectly from others, by reading, reposting, commenting and the like. A public opinion system is therefore needed that can extract features from this event information and accurately analyze the current state and propagation trend of public opinion.

Disclosure of Invention

To address the above problems, the application provides a public opinion analysis method combining deep learning and language logic reasoning.

In view of the above, an aspect of the present application provides a public opinion analysis method combining deep learning and language logic reasoning, including:

acquiring topic data related to a specific topic according to a preset trigger rule;

identifying the topic data, converting the format of the topic data, and extracting text data from the topic data;

performing text classification and word vector modeling on the text data by using a pre-trained first neural network to extract first related information of the text data;

carrying out structural analysis on the first related information by using a preset natural language logic reasoning model to obtain first relation data of each subject term;

processing the first relation data by using a pre-trained keyword determination model, so as to determine a first keyword set consisting of a plurality of first keywords and first attribute data of each first keyword in the plurality of first keywords from the subject words;

carrying out emotion classification on the first keyword set by using the trained emotion analysis model to obtain first emotion classification data;

analyzing the first emotion classification data to obtain a first public opinion analysis result;

performing validity verification on the first public opinion analysis result by using a clustering analysis, statistical analysis and accuracy test method to obtain a first verification result;

and correspondingly correcting the first public opinion analysis result according to the first verification result.
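The last three steps (analysis, validity verification, correction) are not specified in detail. The following is a minimal sketch under stated assumptions: the "first public opinion analysis result" is taken to be the distribution of emotion labels, the "accuracy test" compares predictions against a small hand-labelled spot-check sample, and the correction rule (override spot-checked items and re-aggregate when accuracy falls below a threshold) is hypothetical. Clustering and statistical checks are omitted.

```python
from collections import Counter


def analyze(labels):
    """Aggregate per-item emotion labels into a distribution (assumed analysis result)."""
    counts = Counter(labels)
    total = sum(counts.values())
    dist = {label: n / total for label, n in counts.items()}
    dominant = max(dist, key=dist.get)
    return {"distribution": dist, "dominant": dominant}


def verify(predicted, reference, min_accuracy=0.8):
    """Accuracy test on a spot-check sample mapping item index -> true label.

    min_accuracy is a hypothetical threshold, not taken from the source."""
    hits = sum(predicted[i] == label for i, label in reference.items())
    accuracy = hits / len(reference)
    return {"accuracy": accuracy, "valid": accuracy >= min_accuracy}


def correct(predicted, reference, verification):
    """Hypothetical correction: if verification fails, replace spot-checked
    predictions with their reference labels and re-run the analysis."""
    if verification["valid"]:
        return analyze(predicted)
    fixed = list(predicted)
    for i, label in reference.items():
        fixed[i] = label
    return analyze(fixed)
```

For example, with predictions `["neg", "neg", "pos", "neu", "neg"]` and spot-check labels `{0: "neg", 2: "neg", 3: "neu"}`, the accuracy is 2/3, verification fails, and the corrected distribution assigns 0.8 to `"neg"`.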

Optionally, the pre-trained first neural network is obtained by training a deep neural network on a corpus using machine learning techniques, and is used to classify the text data, thereby analyzing first related information related to different public opinion categories.

Optionally, the step of performing structural analysis on the first related information by using a preset natural language logic inference model to obtain first relationship data of each subject term includes:

the preset natural language logic reasoning model uses natural language processing techniques to identify each subject word in the first related information and to perform statistical analysis on the topic data, thereby obtaining an accurate public opinion analysis conclusion.

Optionally, the step of acquiring topic data related to a specific topic according to a preset trigger rule includes:

extracting association data of the specific topic from the preset trigger rule and extracting related words from the association data;

performing semantic similarity analysis based on word vector technology to obtain derived related words whose word vectors are similar to those of the related words;

and acquiring related texts, audios, images and videos according to the related words and the derived related words to serve as the topic data.
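The word-vector expansion step can be illustrated with cosine similarity. This is a minimal sketch, assuming pre-computed word vectors held in a plain dictionary (in practice they would come from a trained embedding model); the function name `derive_related_words`, the similarity threshold and the `top_k` cut-off are hypothetical choices, not parameters given in the source.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def derive_related_words(seed_words, vectors, threshold=0.8, top_k=3):
    """For each seed (related) word, return up to top_k other vocabulary words
    whose word vectors have cosine similarity >= threshold, most similar first."""
    derived = {}
    for seed in seed_words:
        if seed not in vectors:
            continue  # no vector available for this seed word
        scored = [
            (word, cosine(vectors[seed], vec))
            for word, vec in vectors.items()
            if word not in seed_words
        ]
        scored = [(w, s) for w, s in scored if s >= threshold]
        scored.sort(key=lambda ws: ws[1], reverse=True)
        derived[seed] = [w for w, _ in scored[:top_k]]
    return derived
```

With toy vectors in which "rainstorm" and "inundation" point in nearly the same direction as "flood" while "football" does not, `derive_related_words(["flood"], vectors)` returns `{"flood": ["rainstorm", "inundation"]}`.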

Optionally, the step of extracting text data from the topic data after identifying and format converting includes:

recognizing first voice data and first tone data in the audio, and obtaining audio description text data through a voice recognition algorithm and a semantic recognition algorithm;

recognizing first text data, first facial expression data and first expression symbol data in the image, and combining an expression recognition algorithm to obtain image description text data;

identifying second voice data, second text data, second facial expression data and second expression symbol data in the video, and combining a voice recognition algorithm, a semantic recognition algorithm and an expression recognition algorithm to obtain video description text data;

converting the text, the audio description text data, the image description text data and the video description text data into a unified standardized format to obtain initial text data;

extracting the text data from the initial text data.

Optionally, the step of converting the text, the audio description text data, the image description text data and the video description text data into a unified standardized format to obtain initial text data includes:

performing word segmentation, emoticon recognition, meaningless-symbol removal and stop word removal on the text, the audio description text data, the image description text data and the video description text data by using a word segmentation model, an emoticon recognition model and a stop word recognition model to obtain text data to be processed;

and carrying out standardization processing on the text data to be processed to obtain the initial text data.
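The preprocessing chain can be sketched as follows. This is a simplified illustration, not the patent's models: a regular expression stands in for the word segmentation model (a real Chinese pipeline would use a trained segmenter, since `\w+` keeps a run of Chinese characters as one token), and the emoticon table and stop word list are small hypothetical lexicons. Lower-casing stands in for "standardization processing".

```python
import re

# Hypothetical lexicons standing in for the emoticon and stop word recognition models.
EMOTICONS = {":)": "<happy>", ":(": "<sad>", ":D": "<laugh>"}
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to"}

# Emoticons are matched before word characters; anything else (e.g. "!!!", "#") is dropped.
TOKEN_RE = re.compile(r":\)|:\(|:D|\w+")


def preprocess(text):
    """Tokenize, map emoticons to emotion tags, drop meaningless symbols and
    stop words, and lower-case the remaining words."""
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if tok in EMOTICONS:
            tokens.append(EMOTICONS[tok])
            continue
        word = tok.lower()
        if word in STOP_WORDS:
            continue
        tokens.append(word)
    return tokens
```

For instance, `preprocess("The flood is terrible!!! :( #help")` yields `["flood", "terrible", "<sad>", "help"]`.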

Optionally, after the step of acquiring topic data related to the specific topic according to the preset triggering rule, the method further includes:

acquiring a network address, a user account and user identity feature information corresponding to the topic data to generate a unique source identifier corresponding to the topic data.
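The source does not say how the unique source identifier is derived. One plausible approach, shown here purely as an assumption, is to serialize the three inputs canonically and hash them with SHA-256, so the same source always receives the same identifier regardless of field order. The function name and the JSON field names are hypothetical.

```python
import hashlib
import json


def source_identifier(network_address, user_account, identity_features):
    """Hypothetical scheme: SHA-256 over a canonical JSON encoding of the inputs.

    sort_keys=True also sorts nested dict keys, so the order of identity
    features does not change the resulting identifier."""
    payload = json.dumps(
        {"addr": network_address, "account": user_account, "identity": identity_features},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

A hash keeps the identifier fixed-length and avoids storing raw account data in downstream groupings; whether that suits a given deployment is a design decision the source leaves open.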

Optionally, the step of normalizing the text data to be processed to obtain the initial text data includes:

grouping the text data to be processed according to the source identifier to obtain a plurality of grouped text data subgroups;

classifying the text data subgroups along the dimensions of original generation time, language, region and source user information to obtain a plurality of text data groups;

and carrying out standardization processing on the plurality of text data groups to obtain the initial text data.

for any one first text data subgroup in the plurality of text data subgroups, taking a first single word after word segmentation of the first text data subgroup as a reference word;

establishing a description structure for each individual word after word segmentation, specifically:

creating a description structure file;

acquiring, for each individual word, a start word, an intermediate word, an end word, the distance between the end word and the reference word, and the number of occurrences of the word, and recording the start word, the intermediate word and the end word in the description structure file;

repeating the above steps until all the text data subgroups have been traversed.

for each first text data group, performing statistical analysis on all the individual words according to their number of occurrences and interval distance, and constructing feature structure data of the first text data group from "individual word, number of occurrences, interval distance" triples.

By adopting the technical scheme of the application, the public opinion analysis method combining deep learning and language logic reasoning comprises the following steps: acquiring topic data related to a specific topic according to a preset trigger rule; identifying the topic data, converting the format of the topic data, and extracting text data from the topic data; performing text classification and word vector modeling on the text data by using a pre-trained first neural network to extract first related information of the text data; carrying out structural analysis on the first related information by using a preset natural language logic reasoning model to obtain first relation data of each subject term; processing the first relation data by using a pre-trained keyword determination model, so as to determine a first keyword set consisting of a plurality of first keywords and first attribute data of each first keyword in the plurality of first keywords from the subject words; carrying out emotion classification on the first keyword set by using the trained emotion analysis model to obtain first emotion classification data; analyzing the first emotion classification data to obtain a first public opinion analysis result; performing validity verification on the first public opinion analysis result by using a clustering analysis, statistical analysis and accuracy test method to obtain a first verification result; and correspondingly correcting the first public opinion analysis result according to the first verification result. According to the scheme provided by the application, the public opinion analysis can be accurately performed by utilizing the deep learning technology and natural language logic reasoning.

Drawings

FIG. 1 is a flow chart of a method for public opinion analysis combining deep learning and language logic reasoning according to an embodiment of the present application.

Detailed Description

In order that the above-recited objects, features and advantages of the present application will be more clearly understood, a more particular description of the application will be rendered by reference to the appended drawings and appended detailed description. It should be noted that, without conflict, the embodiments of the present application and features in the embodiments may be combined with each other.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, however, the present application may be practiced otherwise than as described herein, and therefore the scope of the present application is not limited to the specific embodiments disclosed below.

The terms first, second and the like in the description and in the claims and in the above-described figures are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.

A public opinion analysis method combining deep learning and language logic reasoning provided according to some embodiments of the present application is described below with reference to fig. 1.

As shown in FIG. 1, one embodiment of the present application provides a public opinion analysis method combining deep learning and language logic reasoning, comprising:

In this step, the pre-trained first neural network is obtained by training a deep neural network on a corpus using machine learning techniques, and is used to classify the text data, thereby analyzing first related information related to different public opinion categories.

In this step, a preset natural language logic reasoning model performs structural analysis on the first related information to obtain first relationship data of each subject term, which helps ensure the accuracy of the data.

In some possible embodiments of the present application, the step of performing structural analysis on the first related information by using a preset natural language logical inference model to obtain first relationship data of each subject term includes:

In this step, the first relationship data is processed by a keyword determination model pre-trained with a neural network, so that a first keyword set composed of a plurality of first keywords, together with first attribute data of each of the first keywords, is determined from the subject words.
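The keyword determination model itself is a pre-trained neural network whose architecture is not disclosed. As a rough, openly substituted stand-in, TF-IDF ranking shows what a "keyword plus attribute data" output could look like: each selected keyword carries its term frequency, document frequency and score as attributes. This is not the patent's model; the attribute names and the smoothed IDF formula are assumptions.

```python
import math
from collections import Counter


def rank_keywords(documents, top_n=3):
    """TF-IDF stand-in for the keyword determination model.

    documents: list of token lists (one per text).
    Returns [(keyword, {"tf", "df", "score"}), ...], highest score first."""
    n_docs = len(documents)
    tf = Counter(tok for doc in documents for tok in doc)
    df = Counter(tok for doc in documents for tok in set(doc))
    scored = []
    for term, freq in tf.items():
        idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0  # smoothed IDF
        scored.append((term, {"tf": freq, "df": df[term], "score": freq * idf}))
    # Python's sort is stable, so equal scores keep first-seen order.
    scored.sort(key=lambda item: item[1]["score"], reverse=True)
    return scored[:top_n]
```

On the three documents `[["flood", "rescue", "city"], ["flood", "rescue"], ["flood", "price"]]`, the top keywords are "flood" (tf 3, df 3) and then "rescue".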

In this step, the trained emotion analysis model performs emotion classification on the first keyword set to obtain first emotion classification data, which can improve efficiency and accuracy. The emotion analysis model may be trained as follows: constructing a first emotion dictionary (generally comprising general emotion words, degree adverbs, negation words, domain-specific words and the like); calculating the semantic similarity between words and a reference emotion word set by using a semantic similarity calculation method; using the first emotion dictionary to analyze special sentence structures and emotion-bearing words in text sentences, and performing emotion classification with a weighting algorithm; assigning different weights to the emotion words according to their emotion intensity to obtain a second emotion dictionary; dividing the second emotion dictionary into a training set and a test set; extracting text emotion features from the training set by using a neural network to construct a basic emotion analysis model; and testing the basic emotion analysis model on the test set and correcting it according to the test result to obtain the emotion analysis model.
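The dictionary-and-weight part of this procedure can be sketched as a lexicon scorer: emotion words carry intensity weights (the role of the second emotion dictionary), degree adverbs scale the next emotion word, and negation words flip its sign. This is a minimal sketch, not the trained model; the tiny lexicons, their weights and the neutral `margin` are all hypothetical, and the neural feature-extraction stage is omitted.

```python
# Hypothetical mini-lexicons; weights stand in for emotion intensity.
EMOTION_WORDS = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
DEGREE_ADVERBS = {"very": 1.5, "slightly": 0.5}
NEGATIONS = {"not", "never", "no"}


def emotion_score(tokens):
    """Sum weighted emotion words. Negations and degree adverbs accumulate into
    a modifier that is applied to the next emotion word and then reset."""
    score = 0.0
    modifier = 1.0
    for tok in tokens:
        if tok in NEGATIONS:
            modifier *= -1.0
        elif tok in DEGREE_ADVERBS:
            modifier *= DEGREE_ADVERBS[tok]
        elif tok in EMOTION_WORDS:
            score += modifier * EMOTION_WORDS[tok]
            modifier = 1.0
    return score


def classify(tokens, margin=0.5):
    """Map the score to a label; scores within +/- margin count as neutral."""
    s = emotion_score(tokens)
    if s > margin:
        return "positive"
    if s < -margin:
        return "negative"
    return "neutral"
```

For example, "not very good" scores -1.5 and is classified as negative, while "slightly bad" scores -0.5 and falls inside the neutral band.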

It can be understood that, in the embodiment of the present application, topic data related to a specific topic may be obtained according to the preset trigger rule by using crawler technology to collect data related to the topic from internet platforms. Specifically, association data of the specific topic can be extracted from the preset trigger rule, and related words can be extracted from the association data; semantic similarity analysis based on word vector technology is performed to obtain derived related words whose word vectors are similar to those of the related words; and related texts, audio, images and videos are acquired according to the related words and the derived related words to serve as the topic data. For example, hot-search topics containing the related words and the derived related words are searched on internet platforms such as the Douyin hot search list, the Baidu hot list, the Weibo hot search list and the Toutiao hot search list; the most popular content under each topic, such as Douyin videos, audio, Weibo posts, Toutiao articles and pictures, is then selected, and the corresponding comment data is obtained. In one embodiment of the application, the acquired content mainly consists of: web links to articles, videos, pictures and the like of public opinion news, posting accounts, publisher information, source websites, titles, body texts, posting times, repost counts, comment counts, like counts and the like. To improve accuracy and timeliness, in the embodiment of the present application, the preset trigger rule may be executed once every preset time period.
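The selection step, keeping the most popular matching hot-search entry on each platform, can be sketched independently of any crawler. This assumes the hot lists have already been fetched into simple records; the `HotItem` fields and the `heat` value are hypothetical, and no real platform API is implied. Running it once every preset time period would be left to an external scheduler.

```python
from dataclasses import dataclass


@dataclass
class HotItem:
    platform: str  # e.g. "Weibo", "Douyin" (labels only; no API is called)
    title: str
    heat: int      # hypothetical popularity value scraped from the hot list


def select_hottest(items, words):
    """Per platform, keep the highest-heat entry whose title contains any of the
    related or derived related words (case-insensitive substring match)."""
    words_lower = [w.lower() for w in words]
    best = {}
    for item in items:
        title = item.title.lower()
        if not any(w in title for w in words_lower):
            continue
        current = best.get(item.platform)
        if current is None or item.heat > current.heat:
            best[item.platform] = item
    return best
```

Substring matching is crude; a deployment handling Chinese titles would more likely match against segmented tokens.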


In some possible embodiments of the present application, the step of extracting text data from the topic data after identifying and format converting includes:

extracting the text data from the initial text data.

It can be understood that, to make public opinion analysis more accurate, data of different formats from different platforms needs to be acquired (that is, related text, audio, images and videos are acquired as the topic data according to the related words and the derived related words). Because these data differ in format and source platform, they must be standardized in advance. In this embodiment, first voice data and first tone data in the audio can be recognized (tone data can express the speaker's emotion), and audio description text data is obtained through a speech recognition algorithm and a semantic recognition algorithm; first text data, first facial expression data and first emoticon data (such as WeChat emoji) in the image are recognized and combined with an expression recognition algorithm (which can be trained with a neural network algorithm) to obtain image description text data; second voice data, second text data, second facial expression data and second emoticon data in the video are identified and combined with a speech recognition algorithm, a semantic recognition algorithm and an expression recognition algorithm to obtain video description text data; the text, the audio description text data, the image description text data and the video description text data are converted into a unified standardized format to obtain initial text data; and the text data is extracted from the initial text data.
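The source does not define the "unified standardized format". As one hypothetical reading, every modality could be reduced to the same record: the source identifier, the modality, a plain-text body (the original text or the generated description text) and a list of emotion cues such as tone or facial-expression labels. The recognition algorithms themselves are out of scope here; the record layout and the whitespace normalization below are assumptions.

```python
from dataclasses import dataclass, field
from typing import List

ALLOWED_MODALITIES = {"text", "audio", "image", "video"}


@dataclass
class UnifiedRecord:
    source_id: str
    modality: str   # one of ALLOWED_MODALITIES
    text: str       # original text or generated description text
    emotion_cues: List[str] = field(default_factory=list)  # e.g. "tone:anxious"


def to_unified(source_id, modality, text, emotion_cues=()):
    """Wrap recognizer output in the hypothetical unified record."""
    if modality not in ALLOWED_MODALITIES:
        raise ValueError(f"unsupported modality: {modality}")
    # Collapse runs of whitespace and newlines so all modalities share one layout.
    clean = " ".join(text.split())
    return UnifiedRecord(source_id, modality, clean, list(emotion_cues))
```

Keeping emotion cues beside, rather than inside, the text lets later emotion classification use them as features without polluting word segmentation.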

In some possible embodiments of the present application, the step of converting the text, the audio description text data, the image description text data and the video description text data into a unified standardized format to obtain initial text data includes:

It can be understood that, to improve the accuracy and efficiency of data analysis, in this embodiment word segmentation, emoticon recognition, meaningless-symbol removal and stop word removal are performed on the text, the audio description text data, the image description text data and the video description text data by using a word segmentation model, an emoticon recognition model and a stop word recognition model, so as to obtain text data to be processed; the text data to be processed is then standardized to obtain the initial text data.

In some possible embodiments of the present application, after the step of obtaining topic data related to a specific topic according to a preset triggering rule, the method further includes:

It can be understood that in the embodiment of the present application, the network address, the user account and the user identity feature information corresponding to the topic data are obtained to generate the unique source identifier corresponding to the topic data, so as to facilitate classification processing of the data.

In some possible embodiments of the present application, the step of normalizing the text data to be processed to obtain the initial text data includes:

It can be appreciated that, to analyze public opinion from different angles and ensure the comprehensiveness of the analysis, in this embodiment the text data to be processed is grouped according to the source identifier to obtain a plurality of text data subgroups; the text data subgroups are classified along the dimensions of original generation time, language, region and source user information to obtain a plurality of text data groups; and the plurality of text data groups are standardized to obtain the initial text data.
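The two-stage grouping can be sketched with dictionaries. This is a minimal sketch, assuming each piece of text data is a dict carrying its source identifier and metadata; the key names (`date`, `language`, `region`) are hypothetical stand-ins for the generation-time, language and region dimensions, and source user information is left out for brevity. Here the second stage buckets records from all subgroups by their metadata tuple, which is one reading of "classifying the text data subgroups".

```python
from collections import defaultdict


def group_by_source(records):
    """Stage 1: group records by source identifier (the text data subgroups)."""
    subgroups = defaultdict(list)
    for rec in records:
        subgroups[rec["source_id"]].append(rec)
    return dict(subgroups)


def classify_subgroups(subgroups, dims=("date", "language", "region")):
    """Stage 2: bucket the records of every subgroup by a tuple of metadata
    dimensions (the text data groups). Records keep their source_id."""
    groups = defaultdict(list)
    for recs in subgroups.values():
        for rec in recs:
            key = tuple(rec.get(d) for d in dims)
            groups[key].append(rec)
    return dict(groups)
```

Because each record keeps its `source_id`, later steps can still trace any group member back to its origin.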

creating a description structure file;

repeating the above steps until all the text data subgroups have been traversed.

It can be understood that, to accurately analyze the relationships between the individual words and subject words, in this embodiment a description structure is established for each individual word after word segmentation, specifically: creating a description structure file; acquiring, for each individual word, the start word, the intermediate word, the end word, the interval distance between the end word and the reference word, and the number of occurrences of the word; and recording the acquired start word, intermediate word and end word in the description structure file.
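The meaning of "start word, intermediate word, end word" is not spelled out. One possible reading, used in this hypothetical sketch, is that the reference word (the first token of the subgroup) is the start, each later word is the end, and the tokens between them are the intermediate words; the interval distance is then the token offset from the reference word. The JSON Lines layout of the "description structure file" is also an assumption.

```python
import json
from collections import Counter


def build_description_structures(tokens):
    """Hypothetical reading: reference word = tokens[0]; for every later word,
    record the span reference -> word, its distance and its occurrence count."""
    if not tokens:
        return []
    reference = tokens[0]
    counts = Counter(tokens)
    structures = []
    for idx, word in enumerate(tokens[1:], start=1):
        structures.append({
            "word": word,
            "start": reference,
            "intermediate": tokens[1:idx],  # tokens strictly between start and end
            "end": word,
            "distance": idx,                # offset from the reference word
            "occurrences": counts[word],
        })
    return structures


def write_description_file(path, structures):
    """Write one JSON object per line (assumed file layout)."""
    with open(path, "w", encoding="utf-8") as fh:
        for s in structures:
            fh.write(json.dumps(s, ensure_ascii=False) + "\n")
```

For the tokens `["flood", "city", "rescue", "flood"]`, the entry for "rescue" has intermediate words `["city"]` and distance 2.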

It will be appreciated that, to facilitate analysis of the structure of the text data, in this embodiment, for each first text data subgroup, statistical analysis is performed on all the individual words according to their number of occurrences and separation distance, and feature structure data of the first text data subgroup is constructed from "individual word, number of occurrences, separation distance" triples.
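The feature structure data can be sketched as a list of (individual word, number of occurrences, separation distance) triples. This assumes, as a hypothetical choice, that the reference word is the first token of the subgroup, that occurrences are counted after the reference position, and that repeated occurrences are summarized by their mean distance; the sort order (most frequent first, then nearest) is likewise an illustrative choice.

```python
from collections import defaultdict


def feature_structure(tokens):
    """Build (word, occurrence count, mean distance to reference word) triples.
    The reference word is assumed to be tokens[0]; positions after it are used."""
    positions = defaultdict(list)
    for idx, word in enumerate(tokens[1:], start=1):
        positions[word].append(idx)
    triples = [(w, len(p), sum(p) / len(p)) for w, p in positions.items()]
    # Most frequent first; ties broken by proximity to the reference word.
    triples.sort(key=lambda t: (-t[1], t[2]))
    return triples
```

For `["flood", "rescue", "city", "rescue", "flood"]`, this yields `[("rescue", 2, 2.0), ("city", 1, 2.0), ("flood", 1, 4.0)]`.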

It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.

In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

In the several embodiments provided by the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, such as the above-described division of units, merely a division of logic functions, and there may be additional manners of dividing in actual implementation, such as multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, or may be in electrical or other forms.

The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated units described above, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable memory. Based on such an understanding, the technical solution of the present application in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory and comprising several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the various embodiments of the present application. The aforementioned memory includes a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media capable of storing program code.

Those of ordinary skill in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the relevant hardware, and the program may be stored in a computer-readable memory, which may include a flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or the like.

The embodiments of the application have been described in detail above, and specific examples have been used to explain the principles and implementations of the application; these examples are provided solely to facilitate understanding of the method and core concepts of the application. Meanwhile, those skilled in the art may vary the specific embodiments and application scope in accordance with the ideas of the present application; therefore, this description should not be construed as limiting the present application.

Although the present application is disclosed above, the present application is not limited thereto. Variations and modifications, including combinations of the different functions and implementation steps, as well as embodiments of the software and hardware, may be readily apparent to those skilled in the art without departing from the spirit and scope of the application.

Claims

1. A public opinion analysis method combining deep learning and language logic reasoning is characterized by comprising the following steps:

2. The public opinion analysis method of claim 1, wherein the pre-trained first neural network is obtained by training a deep neural network on a corpus using machine learning techniques, and is used to classify the text data, thereby analyzing first related information related to different public opinion categories.

3. The public opinion analysis method of claim 2, wherein the step of performing structural analysis on the first related information using a preset natural language logical inference model to obtain first relationship data of each subject term comprises:

4. The public opinion analysis method of claim 3, wherein the step of obtaining topic data related to a specific topic according to a preset trigger rule comprises:

5. The public opinion analysis method of claim 4, wherein the step of extracting text data from the topic data after identifying and format converting comprises:

extracting the text data from the initial text data.

6. The public opinion analysis method of claim 5, wherein the step of converting the text, the audio description text data, the image description text data and the video description text data into a unified standardized format to obtain initial text data comprises:

7. The public opinion analysis method according to claim 6, further comprising, after the step of obtaining topic data related to a specific topic according to a preset trigger rule:

8. The public opinion analysis method of claim 7, wherein the step of normalizing the text data to be processed to obtain the initial text data comprises:

9. The public opinion analysis method of claim 8, wherein the step of normalizing the text data to be processed to obtain the initial text data comprises:

creating a description structure file;

repeating the above steps until all the text data subgroups have been traversed.

10. The public opinion analysis method of claim 9, wherein the step of normalizing the text data to be processed to obtain the initial text data comprises:

for each first text data group, performing statistical analysis on all the individual words according to their number of occurrences and interval distance, and constructing feature structure data of the first text data group from the individual words, the numbers of occurrences and the interval distances.