CN112966161A - Public opinion category determination method, device and equipment - Google Patents

Info

Publication number
CN112966161A
Authority
CN
China
Prior art keywords
target
public opinion
opinion information
characteristic
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110174033.8A
Other languages
Chinese (zh)
Inventor
郭宏
崔洋
马格
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9032 Query formulation
    • G06F16/90332 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the specification provides a public opinion category determination method, device and equipment. The method comprises: acquiring target public opinion information; determining a target classification label and a target characteristic word vector of the target public opinion information; determining the weight of the target public opinion information by using a target prediction model according to the target classification label and the target characteristic word vector, the target prediction model being obtained by training with a language representation model, a fully connected layer and an attention mechanism; calculating the TF-IDF value of each characteristic word in the target characteristic word vector; and determining a characteristic value of the target public opinion information based on the weight of the target public opinion information and the TF-IDF value of each characteristic word in the target characteristic word vector, the characteristic value being used for representing the public opinion category of the target public opinion information. In the embodiment of the specification, the characteristic value of the target public opinion information can be determined by combining two dimensions, the individual characteristic words and the text as a whole, so that the public opinion category can be determined objectively and accurately.

Description

Public opinion category determination method, device and equipment
Technical Field
The embodiment of the specification relates to the technical field of big data, in particular to a public opinion category determining method, device and equipment.
Background
With the continuous development of Internet applications, public opinion information about companies grows continuously; its volume is huge and the data changes dynamically. Determining a company's public opinion situation in time plays a vital role in controlling investment risk.
In the prior art, public opinion information in big data is generally classified at a coarse granularity according to the industry field to which it belongs. The classified information is then sorted into positive and negative public opinion based on human experience, the positive and negative items are counted, and the counts and similar data are displayed directly in visual form. Processing public opinion information based on human experience places high demands on the working experience and professional competence of business personnel, cannot objectively predict a company's current public opinion risk, and has low accuracy. Therefore, the prior-art technical solutions cannot accurately predict the current public opinion situation, and investment cannot be adjusted accordingly.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the specification provides a method, a device and equipment for determining public opinion categories, so as to solve the problem in the prior art that the current public opinion category cannot be accurately predicted and investment cannot be adjusted accordingly.
An embodiment of the present specification provides a method for determining a public opinion category, including: acquiring target public opinion information; determining a target classification label and a target characteristic word vector of the target public opinion information; the target characteristic word vector comprises a plurality of characteristic words of the target public opinion information; determining the weight of the target public opinion information by using a target prediction model according to the target classification label and the target characteristic word vector; the target prediction model is a model for predicting the weight of the public opinion information obtained by utilizing a language representation model, a full connection layer and attention mechanism training, and the weight of the target public opinion information is used for representing the importance of the target public opinion information; calculating TF-IDF values of all the feature words in the target feature word vector; determining a characteristic value of the target public opinion information based on the weight of the target public opinion information and the TF-IDF value of each characteristic word in the target characteristic word vector; the characteristic value is used for representing a public opinion category of the target public opinion information.
An embodiment of the present specification further provides a public opinion category determining apparatus, including: the acquisition module is used for acquiring target public opinion information; the first determination module is used for determining a target classification label and a target characteristic word vector of the target public opinion information; the target characteristic word vector comprises a plurality of characteristic words of the target public opinion information; the second determination module is used for determining the weight of the target public opinion information by using a target prediction model according to the target classification label and the target characteristic word vector; the target prediction model is a model for predicting the weight of the public opinion information obtained by utilizing a language representation model, a full connection layer and attention mechanism training, and the weight of the target public opinion information is used for representing the importance of the target public opinion information; the calculation module is used for calculating TF-IDF values of all the feature words in the target feature word vector; a third determining module, configured to determine a feature value of the target public opinion information based on a weight of the target public opinion information and a TF-IDF value of each feature word in the target feature word vector; the characteristic value is used for representing a public opinion category of the target public opinion information.
The embodiment of the specification also provides a public opinion category determining device, which comprises a processor and a memory for storing processor executable instructions, wherein the processor executes the instructions to realize the steps of the public opinion category determining method.
The embodiments of the present specification also provide a computer readable storage medium, on which computer instructions are stored, and when executed, the instructions implement the steps of the method for determining the public opinion category.
The embodiment of the specification provides a method for determining a public opinion category. By acquiring the target public opinion information and determining its target classification label and target characteristic word vector, a plurality of characteristic words and the category of the target public opinion information can be determined. Further, the weight of the target public opinion information can be determined by using a target prediction model according to the target classification label and the target characteristic word vector so as to determine the importance of the target public opinion information. The target prediction model is obtained by training with a combination of a language representation model, a fully connected layer and an attention mechanism, and can efficiently and accurately predict the weight of the public opinion information. In order to determine the importance of each feature word in the target public opinion information, the TF-IDF value of each feature word in the target feature word vector can be calculated, and the feature value of the target public opinion information is determined based on the weight of the target public opinion information and the TF-IDF value of each feature word in the target feature word vector, wherein the feature value is used for representing the public opinion category of the target public opinion information. Therefore, the characteristic value of the target public opinion information can be determined by integrating two dimensions, the single characteristic word and the target public opinion information as a whole, and the determined result is more objective and accurate.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the disclosure, are incorporated in and constitute a part of this specification, and are not intended to limit the embodiments of the disclosure. In the drawings:
FIG. 1 is a schematic diagram illustrating a method for determining a public opinion category according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a structure of a target prediction model provided in accordance with an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a public opinion category determining apparatus according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a public opinion category determination device provided in an embodiment of the present specification.
Detailed Description
The principles and spirit of the embodiments of the present specification will be described with reference to a number of exemplary embodiments. It should be understood that these embodiments are presented merely to enable those skilled in the art to better understand and to implement the embodiments of the present description, and are not intended to limit the scope of the embodiments of the present description in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, implementations of the embodiments of the present description may be embodied as a system, an apparatus, a method, or a computer program product. Therefore, the disclosure of the embodiments of the present specification can be embodied in the following forms: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
Although the flow described below includes operations that occur in a particular order, it should be appreciated that the processes may include more or fewer operations, which may be performed sequentially or in parallel (e.g., using parallel processors or a multi-threaded environment).
Referring to FIG. 1, the present embodiment provides a method for determining a public opinion category. The public opinion category determination method can be used for accurately predicting the current public opinion category. The method for determining the public opinion category may include the following steps.
S101: acquiring target public opinion information.
In this embodiment, target public opinion information may be acquired. The target public opinion information may be currently generated public opinion information used for predicting the current public opinion category. It is understood that the target public opinion information may also be historical public opinion information, which may be determined according to actual situations, and the embodiment of the present specification does not limit this.
In the present embodiment, the target public opinion information may be any piece of information published in channels such as Wind, Bloomberg, and a company's internal system, for example: company announcements, third party research reports, news reviews, etc. It should be understood that, in some embodiments, all information published for the same event may be integrated as one piece of public opinion information, which may be determined according to actual situations, and this is not limited by the embodiments of this specification.
In the present embodiment, the public opinion information can be obtained in a targeted manner. For example, the public opinion information of a target organization can be obtained, where the target organization can be a company, an institution, or the like. The specific situation can be determined according to actual situations, and the embodiment of the present specification does not limit this.
In this embodiment, a manner of acquiring the target public opinion information may include: pulling the data from a preset database, or querying according to a preset path. It is understood that the target public opinion information may also be obtained in other possible manners. For example, it may be collected from multiple channels such as Wind, Bloomberg, and a company's internal system by using crawler technology under certain conditions, which may be determined according to actual situations, and this is not limited in this specification.
S102: determining a target classification label and a target characteristic word vector of target public opinion information; the target characteristic word vector comprises a plurality of characteristic words of target public opinion information.
In this embodiment, since the target public opinion information may include some redundant information, a target feature word vector may be determined, where the target feature word vector may include a plurality of feature words of the target public opinion information, and the feature words may be keywords extracted from the target public opinion information and used for representing the target public opinion information, for example: default, overdue, etc.
In this embodiment, different pieces of public opinion information may relate to different fields, and the sensitive keywords that characterize the public opinion situation differ from field to field, so the target classification label of the target public opinion information can be determined. The target classification label can be used for representing the domain category to which the target public opinion information belongs, such as: sports, finance, etc. It is understood that, in some embodiments, the information may be further subdivided within the related field into company announcements, news comments and the like, which may be determined according to actual situations, and the embodiments of this specification do not limit this.
S103: determining the weight of the target public opinion information by using a target prediction model according to the target classification label and the target characteristic word vector; the target prediction model is a model for predicting the weight of the public opinion information obtained by utilizing a language representation model, a full connection layer and attention mechanism training, and the weight of the target public opinion information is used for representing the importance of the target public opinion information.
In this embodiment, the weight of the target public opinion information may be determined by using a target prediction model according to the target classification label and the target feature word vector. The weight represents the importance of the target public opinion information and may be a value greater than 0. The target prediction model may be a model for predicting the weight of public opinion information, obtained by training with a language representation model (BERT), fully connected layers (FC), and an attention mechanism.
In this embodiment, the input of the target prediction model may be the classification label and the feature word vector of a piece of public opinion information, and the output may be the weight of that piece of public opinion information. The structure of the target prediction model may be as shown in FIG. 2: the model may include BERT, a fully connected layer, an attention layer (Attention), and a Sigmoid function. The input of BERT may include the feature word vector (Tok 1, Tok 2, ..., Tok N) and the classification label (CLS). The fully connected layer computes a weighted sum of the features output by BERT, the attention layer judges which of the features supplied by the fully connected layer are more important, and the Sigmoid function serves as the activation function that maps the variable to between 0 and 1, thereby yielding the weight of the piece of public opinion information.
In this embodiment, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers, so that the pre-trained BERT representation can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks without substantial task-specific modifications to the model structure. The Sigmoid function is an S-shaped function common in biology, also called the S-shaped growth curve. In information science, because both the function and its inverse are monotonically increasing, the Sigmoid function is often used as an activation function of a neural network, mapping variables to between 0 and 1.
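The data flow described above (features, fully connected layer, attention, Sigmoid) can be sketched in plain Python. This is only an illustrative sketch under stated assumptions, not the model of FIG. 2: `token_features` stands in for the per-token outputs of BERT, and `fc_weights`, `fc_bias`, `attn_query` and `out_weights` are hypothetical parameters that a real model would learn during training.

```python
import math


def sigmoid(x: float) -> float:
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores: list[float]) -> list[float]:
    """Normalise raw attention scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def predict_weight(token_features, fc_weights, fc_bias, attn_query, out_weights):
    """Return a weight in (0, 1) for one piece of public opinion information.

    token_features is a stand-in for BERT outputs; every parameter here is
    a hypothetical placeholder, not a value taken from the specification.
    """
    # Fully connected layer: weighted sum of each token's features.
    hidden = [
        [dot(row, feats) + b for row, b in zip(fc_weights, fc_bias)]
        for feats in token_features
    ]
    # Attention layer: score each token, then normalise the scores.
    attn = softmax([dot(attn_query, vec) for vec in hidden])
    # Pool the tokens using the attention weights.
    pooled = [
        sum(a * vec[k] for a, vec in zip(attn, hidden))
        for k in range(len(hidden[0]))
    ]
    # Sigmoid activation maps the resulting score to between 0 and 1.
    return sigmoid(dot(out_weights, pooled))
```

In a real system the stand-in features would come from a pre-trained BERT encoder, and all parameters would be fitted on labelled public opinion information.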
S104: calculating the TF-IDF value of each feature word in the target feature word vector.
In this embodiment, the TF-IDF value of each feature word in the target feature word vector may be calculated. TF-IDF (Term Frequency-Inverse Document Frequency) is a commonly used weighting technique in information retrieval and data mining, where TF denotes term frequency and IDF denotes inverse document frequency. TF-IDF is a statistical method for assessing how important a word is to one document in a document set or corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases with the frequency at which it appears across the corpus.
In this embodiment, the importance of each feature word in the target public opinion information can be determined according to the TF-IDF value of each feature word in the target feature word vector, thereby being beneficial to better determining the public opinion category reflected by the target public opinion information.
S105: determining a characteristic value of the target public opinion information based on the weight of the target public opinion information and the TF-IDF value of each characteristic word in the target characteristic word vector; the characteristic value is used for representing the public opinion category of the target public opinion information.
In the present embodiment, since the TF-IDF value of each feature word in the target feature word vector reflects the importance of that feature word in the target public opinion information, the feature value of the target public opinion information can be determined based on the weight of the target public opinion information and the TF-IDF value of each feature word in the target feature word vector. The characteristic value can be used for representing the public opinion category of the target public opinion information.
In the embodiment, the characteristic value of the target public opinion information can be determined by integrating two dimensions of a single characteristic word and the whole target public opinion information, so that the determined result is more objective and accurate. The characteristic value can be a value larger than 0, and the public opinion category of the target public opinion information can be determined according to the size of the characteristic value. The public opinion category may include positive public opinion emotion and negative public opinion emotion, and of course, the public opinion category may also include other possible values, which may be determined according to actual situations, and this is not limited in this specification.
In this embodiment, the determined characteristic value of the target public opinion information, the target classification label, the target characteristic word vector, and the target public opinion information may be stored together in distributed cloud storage, thereby ensuring traceability. Furthermore, the determined characteristic value of the target public opinion information can be displayed, so that investors can adjust investment strategies in time according to possible risks.
From the above description, it can be seen that the embodiments of the present specification achieve the following technical effects: by acquiring the target public opinion information and determining its target classification label and target characteristic word vector, a plurality of characteristic words and the category of the target public opinion information can be determined. Further, the weight of the target public opinion information can be determined by using a target prediction model according to the target classification label and the target characteristic word vector so as to determine the importance of the target public opinion information. The target prediction model is obtained by training with a combination of a language representation model, a fully connected layer and an attention mechanism, and can efficiently and accurately predict the weight of the public opinion information. In order to determine the importance of each feature word in the target public opinion information, the TF-IDF value of each feature word in the target feature word vector can be calculated, and the feature value of the target public opinion information is determined based on the weight of the target public opinion information and the TF-IDF value of each feature word in the target feature word vector, wherein the feature value is used for representing the public opinion category of the target public opinion information. Therefore, the characteristic value of the target public opinion information can be determined by integrating two dimensions, the single characteristic word and the target public opinion information as a whole, and the determined result is more objective and accurate.
In one embodiment, obtaining target public opinion information may include: calling a web crawler according to a preset time interval to acquire a first public opinion information set, wherein the first public opinion information set comprises at least one piece of initial public opinion information. Furthermore, data cleaning can be carried out on the first public opinion information set to obtain a second public opinion information set, and the public opinion information in the second public opinion information set is used as the target public opinion information.
In the present embodiment, since public opinion information is time-sensitive, it is necessary to ensure that the target public opinion information can be acquired in time. The corresponding web crawler can be called to collect public opinion information in real time. The preset time interval can be 10 minutes, 30 minutes, 1 hour, or the like, and the specific interval can be determined according to the actual situation, which is not limited in the embodiment of the specification. A web crawler, also called a web spider or web robot, is a program or script that automatically browses the World Wide Web; while crawling, it can validate hyperlinks and HTML (HyperText Markup Language) code.
In this embodiment, a piece of public opinion information may be used as a public opinion information set, or a plurality of pieces of public opinion information collected within a preset time interval may be used as a public opinion information set, and the public opinion information may be collected according to actual needs to avoid collecting irrelevant information. For example: public opinion risks of company A need to be monitored in real time, and public opinion information related to company A can be collected.
In this embodiment, since the initial public opinion information collected by the web crawler may include redundant information, data cleaning may be performed on the first public opinion information set to remove non-text content such as special characters and images, thereby ensuring the validity of the target public opinion information.
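The cleaning step can be illustrated with a short Python sketch. It is a minimal example under stated assumptions: the function names `clean_item` and `clean_set`, the regular expressions, and the set of characters treated as text are all hypothetical choices, not rules given in the specification.

```python
import html
import re

# Matches HTML tags such as <p> or <img src="..."/>.
TAG_RE = re.compile(r"<[^>]+>")
# Keeps word characters (including CJK), whitespace and common punctuation;
# anything else is treated as non-text content. This character set is an assumption.
NON_TEXT_RE = re.compile(r"[^\w\s.,;:!?%&()\-，。；：！？（）、]")
SPACE_RE = re.compile(r"\s+")


def clean_item(raw: str) -> str:
    """Strip markup and non-text characters from one crawled item."""
    text = html.unescape(raw)          # decode entities such as &amp;
    text = TAG_RE.sub(" ", text)       # drop tags, including image tags
    text = NON_TEXT_RE.sub(" ", text)  # drop symbols that are not text
    return SPACE_RE.sub(" ", text).strip()


def clean_set(first_set: list[str]) -> list[str]:
    """Build the second public opinion information set from the first."""
    second_set = []
    for raw in first_set:
        cleaned = clean_item(raw)
        if cleaned:  # discard items that contain no text after cleaning
            second_set.append(cleaned)
    return second_set
```

A crawler invoked at the preset time interval would pass its raw results to `clean_set`, and each returned item would then serve as target public opinion information.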
In one embodiment, determining a target classification label and a target feature word vector of target public opinion information may include: determining the target classification label from the target public opinion information, and extracting at least one feature word of the target public opinion information by using a target recognition model according to the target public opinion information and the target classification label. The target recognition model is a model, obtained by training a deep learning network, for extracting feature words from public opinion information according to classification labels. Further, the target feature word vector may be generated based on the at least one feature word of the target public opinion information.
In this embodiment, the target public opinion information may be classified by using a Bayesian classification method, a decision tree classification method, a neural network classification algorithm, or the like to obtain the classification label. It will of course be appreciated that other possible classification methods may also be used, for example a classification method based on fuzzy mathematics. The specific method can be determined according to actual conditions, and the embodiment of the present specification does not limit it.
In the present embodiment, in order to efficiently and accurately determine the feature words of the target public opinion information, at least one feature word included in the target public opinion information may be extracted by using a target recognition model trained by a deep learning network in advance. The input data of the target recognition model is a classification label of public opinion information and a text of the public opinion information, and the output data is at least one characteristic word contained in the public opinion information.
In this embodiment, since the sensitive keywords related to different domain categories are different, the feature words of the target public opinion information can be determined by using the target recognition model in combination with the classification label of the target public opinion information and the text of the target public opinion information, so that the feature word vector of the target public opinion information can be determined efficiently and accurately.
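The interface of this step (text plus classification label in, feature words out) can be sketched as follows. The sketch deliberately swaps the trained target recognition model for a simple per-label keyword lookup, purely to show how the classification label conditions the extraction. The `LEXICONS` table and the function name are hypothetical, and only "default" and "overdue" come from the examples in this specification.

```python
# Hypothetical per-label keyword lists. A trained target recognition model
# would replace this lookup in a real system.
LEXICONS = {
    "finance": ["default", "overdue", "bankruptcy"],
    "sports": ["injury", "transfer"],
}


def extract_feature_words(text: str, label: str) -> list[str]:
    """Return the feature words for `label` that occur in `text`.

    The returned list plays the role of the target feature word vector.
    Matching is a case-insensitive substring test, which is an assumption
    made for brevity.
    """
    lowered = text.lower()
    return [word for word in LEXICONS.get(label, []) if word in lowered]
```

The same text yields different feature words under different labels, which is the behaviour the paragraph above attributes to combining the classification label with the text.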
In one embodiment, calculating the TF-IDF value of each feature word in the target feature word vector may include: counting the number of times a target feature word in the target feature word vector appears in the target public opinion information, and determining the number of occurrences of the feature word that appears most frequently in the target public opinion information, so that the word frequency of the target feature word can be calculated from these two counts. Furthermore, the total number of pieces of public opinion information recorded in the target database can be obtained, and the number of pieces of public opinion information in the target database that contain the target feature word can be determined, so that the inverse document frequency of the target feature word can be calculated from these two quantities. Further, the product of the word frequency and the inverse document frequency of the target feature word may be used as the TF-IDF value of the target feature word.
In this embodiment, TF-IDF is a statistical method for evaluating the importance of a word to one document in a document set or corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases with the frequency at which it appears across the corpus. The word frequency and the inverse document frequency can be calculated separately, and the TF-IDF value is calculated from them, so that the importance of each characteristic word in the target public opinion information can be determined.
In one embodiment, the word frequency of the target characteristic word can be calculated according to the following formula according to the frequency of the target characteristic word appearing in the target public opinion information and the frequency of the characteristic word appearing in the target public opinion information most frequently:
Figure BDA0002939924700000081
where TF_i is the word frequency of the target feature word; n_i is the number of times the target feature word occurs in the target public opinion information; and n_j is the number of occurrences of the feature word that occurs most often in the target public opinion information.
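The word frequency step can be sketched in Python as follows. The formula itself is given only as a figure that is not reproduced in this text, so the sketch assumes the common max-normalized form TF_i = n_i / n_j, which matches the definitions of n_i and n_j above. The function name `term_frequencies` and the use of a pre-tokenized text are hypothetical choices made for illustration.

```python
from collections import Counter


def term_frequencies(tokens, feature_words):
    """Return an assumed max-normalized word frequency for each feature word.

    tokens: the target public opinion text, already split into words.
    feature_words: the feature words of the target feature word vector.
    Assumption: TF_i = n_i / n_j, where n_j is the count of the most
    frequent feature word in the text.
    """
    vocab = set(feature_words)
    # n_i for every feature word that actually occurs in the text
    counts = Counter(t for t in tokens if t in vocab)
    if not counts:
        # No feature word occurs at all, so every word frequency is zero.
        return {w: 0.0 for w in feature_words}
    n_max = max(counts.values())  # n_j
    return {w: counts[w] / n_max for w in feature_words}
```

Under this assumed form, the most frequent feature word always receives a word frequency of 1, and a feature word that does not occur in the text receives 0.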
In one embodiment, the inverse document frequency of the target feature word can be calculated from the total number of pieces of public opinion information recorded in the target database and the number of pieces of public opinion information in the target database that contain the target feature word, according to the following formula:
Figure BDA0002939924700000082
where IDF_i is the inverse document frequency of the target feature word; N is the total number of pieces of public opinion information recorded in the target database; and M_i is the number of pieces of public opinion information in the target database that contain the target feature word.
In this embodiment, the target database may be a database for recording public opinion information, and the target database may include all the historically collected public opinion information, so that the inverse document frequency may be calculated based on the data recorded in the target database.
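As a minimal sketch of the inverse document frequency and the final TF-IDF product, the code below assumes the common form IDF_i = log(N / M_i); the exact formula appears only as a figure that is not reproduced in this text, and it may differ (for example by smoothing the denominator). The function names and the representation of the target database as a list of token lists are hypothetical.

```python
import math


def inverse_document_frequency(word, documents):
    """Assumed form: IDF_i = log(N / M_i).

    documents: the public opinion records of the target database, each one
    given as a list of words (a hypothetical in-memory stand-in).
    """
    n_total = len(documents)                            # N
    m_i = sum(1 for doc in documents if word in doc)    # M_i
    if m_i == 0:
        # Assumption: a word absent from the database gets IDF 0 instead of
        # raising a division error; the source does not cover this case.
        return 0.0
    return math.log(n_total / m_i)


def tf_idf(word_frequency, word, documents):
    """TF-IDF value as the product of word frequency and inverse document frequency."""
    return word_frequency * inverse_document_frequency(word, documents)
```

With this form, a feature word that appears in every record has an inverse document frequency of 0, so its TF-IDF value is 0 regardless of how often it occurs in the target text.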
In one embodiment, determining the feature value of the target public opinion information based on the weight of the target public opinion information and the TF-IDF value of each feature word in the target feature word vector may include: taking the product of the weight of the target public opinion information and the TF-IDF value of each feature word in the target feature word vector as the public opinion weight of that feature word, and calculating the feature value of each feature word from its public opinion weight. The sum of the feature values of the feature words may then be used as the feature value of the target public opinion information.
In this embodiment, the public opinion weight of each feature word is calculated by combining two dimensions, the word and the sentence, so that the resulting public opinion weight is more comprehensive and objective. The feature value of each feature word can then be calculated from its public opinion weight, and the overall feature value of the target public opinion information is obtained from these.
In one embodiment, the feature value of each feature word may be calculated from the public opinion weight of that feature word according to the following formula:
Figure BDA0002939924700000091
where S(α_i) is the feature value of the i-th feature word; α_i is the public opinion weight of the i-th feature word; and e is Euler's number, an irrational constant approximately equal to 2.71828.
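The aggregation described above can be sketched as follows. The formula for S(α_i) is given only as a figure that is not reproduced in this text; because its definition mentions Euler's number e, the sketch assumes a logistic (sigmoid) function, S(α) = 1 / (1 + e^(-α)). That choice is an assumption, not the formula confirmed by the source, and the function names are hypothetical.

```python
import math


def word_feature_value(alpha):
    """Assumed form of S(alpha): the logistic function 1 / (1 + e^(-alpha))."""
    return 1.0 / (1.0 + math.exp(-alpha))


def information_feature_value(info_weight, tf_idf_values):
    """Feature value of one piece of public opinion information.

    info_weight: weight of the public opinion information predicted by the
        target prediction model.
    tf_idf_values: TF-IDF value of each feature word in the feature word vector.
    """
    total = 0.0
    for value in tf_idf_values:
        alpha = info_weight * value          # public opinion weight of the word
        total += word_feature_value(alpha)   # feature value of the word
    return total
```

Only the multiplication and the final sum follow directly from the text; if the actual S differs from the sigmoid, only `word_feature_value` would need to change.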
In one scenario example, based on the calculated feature value of each piece of public opinion information, public opinion sentiment can be displayed directly on the market quotation page of each listed company in financial terminals such as Wind (Wande) and Choice, so that investors can more conveniently adjust their investment strategies in time according to each company's risk. The Choice financial terminal is a professional financial data product of East Money (Oriental Wealth). After new public opinion information is acquired and its feature value is calculated, the data in the corresponding interface can be updated in time.
Based on the same inventive concept, an embodiment of this specification further provides a public opinion category determination device, as described in the following embodiments. Because the device solves the problem on a principle similar to that of the public opinion category determination method, the implementation of the device may refer to the implementation of the method, and details already described are not repeated here. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated. Fig. 3 is a block diagram of a public opinion category determination device according to an embodiment of this specification. As shown in Fig. 3, the device may include an acquisition module 301, a first determination module 302, a second determination module 303, a calculation module 304, and a third determination module 305, whose structure is described below.
an acquisition module 301, which may be configured to acquire target public opinion information;
a first determination module 302, which may be configured to determine a target classification label and a target feature word vector of the target public opinion information, where the target feature word vector includes a plurality of feature words of the target public opinion information;
a second determination module 303, which may be configured to determine the weight of the target public opinion information by using a target prediction model according to the target classification label and the target feature word vector, where the target prediction model is a model for predicting the weight of public opinion information, obtained by training with a language representation model, a fully connected layer, and an attention mechanism, and the weight of the target public opinion information represents the importance of the target public opinion information;
a calculation module 304, which may be configured to calculate the TF-IDF value of each feature word in the target feature word vector;
a third determination module 305, which may be configured to determine a feature value of the target public opinion information based on the weight of the target public opinion information and the TF-IDF value of each feature word in the target feature word vector, where the feature value represents the public opinion category of the target public opinion information.
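The five-module structure can be illustrated with a small Python sketch in which each module is an injectable callable. The class name `OpinionCategoryDevice`, the field names, and the call signatures are hypothetical; the source specifies only what each module is responsible for, not how the modules are wired together in software.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class OpinionCategoryDevice:
    # Each field stands in for one module of the device (names are hypothetical).
    acquire: Callable[[], str]                                        # module 301
    determine_label_and_words: Callable[[str], Tuple[str, List[str]]]  # module 302
    predict_weight: Callable[[str, List[str]], float]                 # module 303
    compute_tf_idf: Callable[[str, List[str]], Dict[str, float]]      # module 304
    determine_feature_value: Callable[[float, Dict[str, float]], float]  # module 305

    def run(self) -> float:
        """Run the modules in the order given in the description."""
        text = self.acquire()
        label, words = self.determine_label_and_words(text)
        weight = self.predict_weight(label, words)
        tf_idf_values = self.compute_tf_idf(text, words)
        return self.determine_feature_value(weight, tf_idf_values)
```

Passing the modules in as callables keeps the sketch independent of any particular crawler, recognition model, or prediction model, none of which are specified at the code level in the source.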
An embodiment of this specification further provides an electronic device based on the public opinion category determination method provided in the embodiments of this specification. Referring to the schematic structural diagram of the electronic device shown in Fig. 4, the electronic device may include an input device 41, a processor 42, and a memory 43. The input device 41 may be used to input target public opinion information. The processor 42 may be used to: determine a target classification label and a target feature word vector of the target public opinion information, where the target feature word vector includes a plurality of feature words of the target public opinion information; determine the weight of the target public opinion information by using a target prediction model according to the target classification label and the target feature word vector, where the target prediction model is a model for predicting the weight of public opinion information, obtained by training with a language representation model, a fully connected layer, and an attention mechanism, and the weight of the target public opinion information represents the importance of the target public opinion information; calculate the TF-IDF value of each feature word in the target feature word vector; and determine a feature value of the target public opinion information based on the weight of the target public opinion information and the TF-IDF value of each feature word in the target feature word vector, where the feature value represents the public opinion category of the target public opinion information. The memory 43 may be used to store parameters such as the TF-IDF values of the feature words and the feature value of the target public opinion information.
In this embodiment, the input device may be one of the main devices for exchanging information between a user and a computer system. The input device may include a keyboard, a mouse, a camera, a scanner, a light pen, a handwriting input panel, a voice input device, and the like; it is used to input raw data, and the programs that process the data, into the computer. The input device may also acquire and receive data transmitted by other modules, units, and devices. The processor may be implemented in any suitable way. For example, the processor may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth. The memory may be a memory device used in modern information technology for storing information. The memory may exist at multiple levels: in a digital system, anything that can store binary data may serve as memory; in an integrated circuit, a circuit with a storage function but no physical form of its own, such as a RAM or a FIFO, is also called a memory; and in a system, a storage device in physical form, such as a memory module or a TF card, is also called a memory.
In this embodiment, the functions and effects specifically realized by the electronic device can be understood by reference to the other embodiments and are not described again here.
The embodiment of the present specification further provides a computer storage medium based on a public opinion category determination method, where the computer storage medium stores computer program instructions, and when the computer program instructions are executed, the computer storage medium may implement: acquiring target public opinion information; determining a target classification label and a target characteristic word vector of target public opinion information; the target characteristic word vector comprises a plurality of characteristic words of target public opinion information; determining the weight of the target public opinion information by using a target prediction model according to the target classification label and the target characteristic word vector; the target prediction model is a model for predicting the weight of the public opinion information obtained by utilizing a language representation model, a full connection layer and attention mechanism training, and the weight of the target public opinion information is used for representing the importance of the target public opinion information; calculating TF-IDF values of all the feature words in the target feature word vector; determining a characteristic value of the target public opinion information based on the weight of the target public opinion information and the TF-IDF value of each characteristic word in the target characteristic word vector; the characteristic value is used for representing the public opinion category of the target public opinion information.
In this embodiment, the storage medium includes, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Cache (Cache), a Hard Disk Drive (HDD), or a Memory Card (Memory Card). The memory may be used to store computer program instructions. The network communication unit may be an interface for performing network connection communication, which is set in accordance with a standard prescribed by a communication protocol.
In this embodiment, the functions and effects specifically realized by the program instructions stored in the computer storage medium can be understood by reference to the other embodiments and are not described again here.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the present specification described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed over a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different from that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, embodiments of the present description are not limited to any specific combination of hardware and software.
Although the embodiments herein provide the method steps as described in the above embodiments or flowcharts, more or fewer steps may be included in the method based on conventional or non-inventive efforts. In the case of steps where no causal relationship is logically necessary, the order of execution of the steps is not limited to that provided by the embodiments of the present description. When the method is executed in an actual device or end product, the method can be executed sequentially or in parallel according to the embodiment or the method shown in the figure (for example, in the environment of a parallel processor or a multi-thread processing).
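As one illustration of parallel execution, the weight prediction step and the TF-IDF calculation step both depend on the feature word vector but not on each other, so they could be run concurrently. The sketch below uses Python's standard `concurrent.futures` thread pool; the function `run_independent_steps` and the two callables passed to it are hypothetical placeholders, not part of the source.

```python
from concurrent.futures import ThreadPoolExecutor


def run_independent_steps(predict_weight, compute_tf_idf, label, words, text):
    """Run the two steps that have no dependency on each other in parallel.

    predict_weight(label, words) and compute_tf_idf(text, words) are
    hypothetical stand-ins for the weight prediction and TF-IDF steps.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        weight_future = pool.submit(predict_weight, label, words)
        tf_idf_future = pool.submit(compute_tf_idf, text, words)
        # result() blocks until each step has finished and re-raises its errors.
        return weight_future.result(), tf_idf_future.result()
```

The final feature value still has to be computed after both results are available, since it depends on the weight and on the TF-IDF values.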
It is to be understood that the above description is intended to be illustrative, and not restrictive. Many embodiments and many applications other than the examples provided will be apparent to those of skill in the art upon reading the above description. The scope of embodiments of the present specification should, therefore, be determined not with reference to the above description, but should instead be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
The above description is only a preferred embodiment of the embodiments of the present disclosure, and is not intended to limit the embodiments of the present disclosure, and it will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present disclosure. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the embodiments of the present disclosure should be included in the protection scope of the embodiments of the present disclosure.

Claims (11)

1. A public opinion category determination method is characterized by comprising the following steps:
acquiring target public opinion information;
determining a target classification label and a target characteristic word vector of the target public opinion information; the target characteristic word vector comprises a plurality of characteristic words of the target public opinion information;
determining the weight of the target public opinion information by using a target prediction model according to the target classification label and the target characteristic word vector; the target prediction model is a model for predicting the weight of the public opinion information obtained by utilizing a language representation model, a full connection layer and attention mechanism training, and the weight of the target public opinion information is used for representing the importance of the target public opinion information;
calculating TF-IDF values of all the feature words in the target feature word vector;
determining a characteristic value of the target public opinion information based on the weight of the target public opinion information and the TF-IDF value of each characteristic word in the target characteristic word vector; the characteristic value is used for representing a public opinion category of the target public opinion information.
2. The method of claim 1, wherein obtaining target public opinion information comprises:
calling a web crawler to collect a first public opinion information set according to a preset time interval; the first public opinion information set comprises at least one piece of initial public opinion information;
performing data cleaning on the first public opinion information set to obtain a second public opinion information set;
and using the public opinion information in the second public opinion information set as target public opinion information.
3. The method of claim 1, wherein determining a target classification label and a target feature word vector for the target public opinion information comprises:
determining the target classification label by using the target public opinion information;
extracting at least one characteristic word of the target public opinion information by using a target recognition model according to the target public opinion information and the classification label; the target recognition model is a model obtained by deep learning network training and used for extracting feature words in public sentiment information according to category labels;
generating the target characteristic word vector based on at least one characteristic word of the target public opinion information.
4. The method of claim 1, wherein calculating the TF-IDF value of each feature word in the target feature word vector comprises:
counting the occurrence times of the target characteristic words in the target characteristic word vector in the target public opinion information;
determining the occurrence frequency of the feature words with the maximum occurrence frequency in the target public opinion information;
calculating the word frequency of the target characteristic word according to the frequency of the target characteristic word appearing in the target public opinion information and the frequency of the characteristic word appearing in the target public opinion information with the maximum frequency;
acquiring the total quantity of public opinion information recorded in a target database;
determining the quantity of public opinion information containing the target characteristic words in the target database;
calculating the inverse document frequency of the target characteristic words according to the total quantity of the public sentiment information recorded in the target database and the quantity of the public sentiment information containing the target characteristic words in the target database;
and taking the product of the word frequency of the target characteristic word and the inverse document frequency as the TF-IDF value of the target characteristic word.
5. The method of claim 4, wherein the word frequency of the target feature word is calculated according to the following formula according to the number of occurrences of the target feature word in the target public opinion information and the number of occurrences of the feature word with the largest number of occurrences in the target public opinion information:
Figure FDA0002939924690000021
where TF_i is the word frequency of the target feature word; n_i is the number of times the target feature word occurs in the target public opinion information; and n_j is the number of occurrences of the feature word that occurs most often in the target public opinion information.
6. The method according to claim 4, wherein the inverse document frequency of the target feature word is calculated according to the following formula according to the total number of public sentiment information recorded in the target database and the number of public sentiment information containing the target feature word in the target database:
Figure FDA0002939924690000022
where IDF_i is the inverse document frequency of the target feature word; N is the total number of pieces of public opinion information recorded in the target database; and M_i is the number of pieces of public opinion information in the target database that contain the target feature word.
7. The method of claim 1, wherein determining the feature value of the target public opinion information based on the weight of the target public opinion information and the TF-IDF value of each feature word in the target feature word vector comprises:
respectively taking the product of the weight of the target public opinion information and the TF-IDF value of each feature word in the target feature word vector as the public opinion weight of each feature word;
calculating the characteristic value of each characteristic word according to the public opinion weight of each characteristic word;
and taking the sum of the characteristic values of the characteristic words as the characteristic value of the target public opinion information.
8. The method according to claim 7, wherein the feature value of each feature word is calculated from the public opinion weight of each feature word according to the following formula:
Figure FDA0002939924690000031
where S(α_i) is the feature value of the i-th feature word; and α_i is the public opinion weight of the i-th feature word.
9. An apparatus for determining public opinion category, comprising:
the acquisition module is used for acquiring target public opinion information;
the first determination module is used for determining a target classification label and a target characteristic word vector of the target public opinion information; the target characteristic word vector comprises a plurality of characteristic words of the target public opinion information;
the second determination module is used for determining the weight of the target public opinion information by using a target prediction model according to the target classification label and the target characteristic word vector; the target prediction model is a model for predicting the weight of the public opinion information obtained by utilizing a language representation model, a full connection layer and attention mechanism training, and the weight of the target public opinion information is used for representing the importance of the target public opinion information;
the calculation module is used for calculating TF-IDF values of all the feature words in the target feature word vector;
a third determining module, configured to determine a feature value of the target public opinion information based on a weight of the target public opinion information and a TF-IDF value of each feature word in the target feature word vector; the characteristic value is used for representing a public opinion category of the target public opinion information.
10. A public opinion category determination device comprising a processor and a memory for storing processor-executable instructions that, when executed by the processor, implement the steps of the method of any one of claims 1 to 8.
11. A computer-readable storage medium having stored thereon computer instructions which, when executed, implement the steps of the method of any one of claims 1 to 8.
CN202110174033.8A 2021-02-07 2021-02-07 Public opinion category determination method, device and equipment Pending CN112966161A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110174033.8A CN112966161A (en) 2021-02-07 2021-02-07 Public opinion category determination method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110174033.8A CN112966161A (en) 2021-02-07 2021-02-07 Public opinion category determination method, device and equipment

Publications (1)

Publication Number Publication Date
CN112966161A true CN112966161A (en) 2021-06-15

Family

ID=76284262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110174033.8A Pending CN112966161A (en) 2021-02-07 2021-02-07 Public opinion category determination method, device and equipment

Country Status (1)

Country Link
CN (1) CN112966161A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination