CN115329173A - Method and device for determining enterprise credit based on public opinion monitoring - Google Patents

Info

Publication number
CN115329173A
Authority
CN
China
Prior art keywords
credit
enterprise
model
machine learning
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211008504.9A
Other languages
Chinese (zh)
Inventor
陈博远
陈永录
张儒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202211008504.9A
Publication of CN115329173A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention provides a method and a device for determining enterprise credit based on public opinion monitoring. The method comprises: acquiring, through a web crawler, news text information about enterprises from online public opinion; preprocessing the acquired news text information to generate a preprocessed file; inputting the preprocessed file into a pre-trained machine learning model to obtain the enterprise's credit or operating condition; and determining the credit of the enterprise according to that credit or operating condition and a pre-established mapping relation between enterprises and credit levels. The advantage of the method and device is that the real operating condition of an enterprise is learned from information about loan enterprises that is available on the network. A large volume of text is obtained from websites by web crawling and preprocessed; information relevant to loan enterprises is screened out of the processed data by text classification; and key information is finally extracted by entity extraction to serve as an important basis for evaluation, so that the credit condition of an enterprise is reflected truthfully and objectively.

Description

Public opinion monitoring-based enterprise credit determination method and device
Technical Field
The application belongs to the technical field of emotion recognition, and particularly relates to a method and a device for determining enterprise credit based on public opinion monitoring.
Background
When a bank grants a loan to an enterprise, it must understand the enterprise's operating condition, which it generally does through financial statements and similar documents. However, an enterprise may falsify its financial statements in order to obtain a loan. If the authenticity of the financial statements cannot be guaranteed, the reliability of any conclusion drawn from them is also in doubt.
Disclosure of Invention
The application provides a method and a device for determining enterprise credit based on public opinion monitoring, which at least solve the problem that, at present, conclusions about an enterprise's operating condition can only be drawn from its financial statements.
According to a first aspect of the application, a method for determining enterprise credit based on public opinion monitoring is provided, which comprises the following steps:
acquiring news text information about enterprises in the network public sentiment through a network crawler;
preprocessing the acquired news text information to generate a preprocessed file;
and inputting the preprocessed file into a pre-trained machine learning model to obtain the mapping relation between the enterprises and the credit level.
In an embodiment, the preprocessing the obtained news text information to generate a preprocessed file includes:
converting the news text information into digital information;
and dividing the digital information by taking the vocabulary as a unit, and removing pronouns, prepositions and stop words.
In one embodiment, the method for determining the credit of the enterprise based on the public opinion monitoring further comprises the following steps:
screening the selected multiple machine learning models; the plurality of machine learning models includes: SVM model, LGBM model, CNN model, and LSTM model.
In one embodiment, the screening of the selected plurality of machine learning models includes:
respectively training an SVM model, an LGBM model, a CNN model and an LSTM model;
calculating the accuracy, recall and F1 value of the SVM model, LGBM model, CNN model and LSTM model from the number of correctly identified positive examples;
and comparing the accuracy, the recall and the F1 value to determine the model with the best overall performance as the final machine learning model.
In one embodiment, the training process of the machine learning model comprises:
establishing a training set according to the acquired historical news text information;
inputting the training set into the machine learning model for training so that the machine learning model selects information related to the credit or the business condition of the enterprise from the training set.
In one embodiment, the method for determining the credit of the enterprise based on the public opinion monitoring further comprises the following steps:
obtaining entities from the information related to enterprise credit or operating condition by applying an entity extraction technique;
and establishing a mapping relation between the enterprise and the credit level according to the obtained entities.
In one embodiment, the method for determining the credit of the enterprise based on the public opinion monitoring further comprises the following steps:
in the training process of the machine learning model, performing parameter tuning on the machine learning model through a particle swarm algorithm according to a training feedback result;
and retraining the adjusted machine learning model.
According to another aspect of the present application, there is also provided a credit determination device for an enterprise based on public opinion monitoring, including:
the information crawling unit is used for acquiring news text information about enterprises in the network public sentiment through a network crawler;
the preprocessing unit is used for preprocessing the acquired news text information to generate a preprocessed file;
and the enterprise credit evaluation unit is used for inputting the preprocessed files into the pre-trained machine learning model to obtain the mapping relation between the enterprise and the credit level.
In one embodiment, the pre-processing unit comprises:
the conversion module is used for converting the news text information into digital information;
and the word segmentation and elimination module is used for segmenting the digital information by taking the vocabulary as a unit and removing pronouns, prepositions and stop words.
In one embodiment, the apparatus for determining enterprise credit based on public opinion monitoring further comprises:
the model screening unit is used for screening the selected multiple machine learning models; the plurality of machine learning models includes: SVM model, LGBM model, CNN model, and LSTM model.
In one embodiment, the model screening unit includes:
the training module is used for respectively training the SVM model, the LGBM model, the CNN model and the LSTM model;
the calculation module is used for calculating the accuracy, the recall and the F1 value of the SVM model, the LGBM model, the CNN model and the LSTM model from the number of correctly identified positive examples;
and the comparison module is used for comparing the accuracy, the recall rate and the F1 value to determine a model with the optimal comprehensive performance as a final machine learning model.
In one embodiment, the training process of the machine learning model includes:
establishing a training set according to the acquired historical news text information;
inputting the training set into the machine learning model for training, so that the machine learning model selects information related to the credit or the business condition of the enterprise from the training set.
In one embodiment, the apparatus for determining enterprise credit based on public opinion monitoring further comprises:
the entity extraction module is used for acquiring an entity from information related to enterprise credit or operation conditions by applying an entity extraction technology;
and the mapping relation establishing module is used for establishing the mapping relation between the enterprises and the credit level according to the obtained entities.
In one embodiment, the apparatus for determining enterprise credit based on public opinion monitoring further comprises:
the parameter tuning module is used for carrying out parameter tuning on the machine learning model through a particle swarm algorithm according to a training feedback result in the training process of the machine learning model;
and the retraining module is used for retraining the adjusted machine learning model.
The advantage of the method and device is that the real operating condition of an enterprise is learned from information about loan enterprises that is available on the network. A large volume of text is obtained from websites by web crawling and preprocessed; information relevant to loan enterprises is screened out of the processed data by text classification; and key information is finally extracted by entity extraction to serve as an important basis for evaluation, so that the credit condition of an enterprise is reflected truthfully and objectively.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 shows the method for determining enterprise credit based on public opinion monitoring provided by the present application.
Fig. 2 is a flowchart of a method for preprocessing acquired news text information to generate a preprocessed file in the embodiment of the present application.
Fig. 3 is a flowchart of a method for screening a plurality of selected machine learning models in the embodiment of the present application.
Fig. 4 is a flowchart of a training process of a machine learning model in the embodiment of the present application.
Fig. 5 is a flowchart illustrating a method for determining enterprise credit based on public opinion monitoring according to another embodiment of the present application.
Fig. 6 is a flowchart of a method for determining enterprise credit based on public opinion monitoring according to another embodiment of the present application.
FIG. 7 shows two basic models of Word2vec in the embodiment of the present application.
FIG. 8 is a simplified block diagram of an LSTM memory cell according to an embodiment of the present invention.
Fig. 9 is a diagram showing the result of the procedure after initialization in the embodiment of the present application.
FIG. 10 is a diagram of hidden Markov models in an embodiment of the present application.
Fig. 11 is a block diagram illustrating a structure of an enterprise credit determination device based on public opinion monitoring according to the present application.
Fig. 12 is a block diagram of a preprocessing unit in the embodiment of the present application.
Fig. 13 is a block diagram of a structure of a model screening unit in the embodiment of the present application.
Fig. 14 is a specific implementation of an electronic device in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In view of the problems described in the background, and because most information on a network is stored as text, natural language processing techniques are used to analyze and process that information. Natural language processing studies the theories and methods that allow people and computers to communicate effectively in natural language. The application mainly uses text classification and entity extraction. Text classification means using a computer or other machine to label a text data set automatically according to given criteria. To make the most of network resources, text classification is now applied in fields such as information retrieval, digital libraries, public opinion analysis and sentiment analysis. Entity extraction aims to identify named entities in a corpus, such as the names of people, places and organizations. Named entity recognition is a basic task for many natural language processing applications, including information retrieval, knowledge graphs, machine translation, sentiment analysis and question-answering systems.
According to an aspect of the present application, there is provided a method for determining enterprise credit based on public opinion monitoring, as shown in fig. 1, including:
S101: Acquire news text information about enterprises from online public opinion through a web crawler.
S102: Preprocess the acquired news text information to generate a preprocessed file.
S103: Input the preprocessed file into a pre-trained machine learning model to obtain the enterprise's credit or operating condition.
S104: Determine the credit of the enterprise according to its credit or operating condition and the pre-established mapping relation between enterprises and credit levels.
In an embodiment, the preprocessing the obtained news text information to generate a preprocessed file, as shown in fig. 2, includes:
S201: Convert the news text information into digital information.
S202: Segment the digital information into words and remove pronouns, prepositions and stop words.
In one embodiment, the text information is first obtained by a web crawler. Specifically, the Scrapy crawler framework is selected to obtain online news information, because it is technically mature, widely applicable and handles concurrency well, and because users can flexibly customize crawling rules.
Secondly, the preprocessing consists mainly of text word segmentation and stop-word removal. Text information is written for human reading, and a computer cannot recognize it directly, so text that humans can read is converted into digital information through text preprocessing. Because Chinese text is continuous apart from punctuation marks, it must be split into words by technical means so that a computer can process it. The invention segments the text data with the Jieba word segmentation tool, which is the most widely used word segmentation tool in China. To compensate for Jieba's limited ability to recognize out-of-vocabulary words, a user-defined dictionary, consisting mainly of enterprise names, financial terms and the like, is added to Jieba's default dictionary. Stop-word removal means that, after the original text has been segmented, words that have no influence on subsequent processing are removed, such as the pronouns "you", "I", "he" and "that" and the prepositions "at", "as" and "let". Such words have no value for classification; they also distort the classification result and lower the keyword density, so they are removed to bring out the features of the text. The stop-word removal algorithm checks whether each segmented word is a stop word; if it is, the word is discarded and is not added to the word set.
Several stop-word lists are already available on the internet. This application merges and deduplicates them to construct a stop-word list suited to the text data set, consisting mainly of punctuation marks, meaningless characters and high-frequency single Chinese characters, and stores it in a file named 'stopword. Part of the stop-word list is shown in Table 1:
TABLE 1
[Table 1 appears as images in the original publication (BDA0003809963900000051, BDA0003809963900000061); its contents are not available in this text.]
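The merge, deduplicate and filter procedure described above can be sketched as follows. This is a minimal illustration that assumes the text has already been segmented into tokens; the function names `build_stop_word_set` and `remove_stop_words` and the two small stop-word lists are hypothetical and are not taken from the application.

```python
from typing import Iterable, List, Set


def build_stop_word_set(*stop_lists: Iterable[str]) -> Set[str]:
    """Merge several stop-word lists, dropping duplicates and blank entries."""
    merged: Set[str] = set()
    for stop_list in stop_lists:
        for word in stop_list:
            word = word.strip()
            if word:
                merged.add(word)
    return merged


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Keep only the segmented tokens that are not stop words."""
    return [tok for tok in tokens if tok.strip() and tok not in stop_words]


# Hypothetical stop-word lists (illustrative only, not Table 1).
list_a = ["的", "了", "，", "在"]
list_b = ["在", "他", "。", "的"]
stop_words = build_stop_word_set(list_a, list_b)

# An already segmented sentence: "he applied for a loan at the bank."
tokens = ["他", "在", "银行", "申请", "了", "贷款", "。"]
filtered = remove_stop_words(tokens, stop_words)
print(filtered)
```

Holding the merged list in a set makes each membership check a constant-time lookup, which matters when the crawler returns a large volume of text.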
Text is a sequence of symbols made up of words and phrases, which a mathematical model cannot process directly, so the symbolic form must be converted into a numerical form; this can also be viewed as embedding the text in a mathematical space. Specifically, the Word2Vec model is chosen to convert text information into vectors. Since its introduction, Word2Vec has become the most widely used word vector model owing to its mature technology, fast running speed and good generality. The two basic models of Word2Vec are shown in Fig. 7: CBOW predicts the centre word from the words before and after it, and Skip-Gram predicts the words before and after from the centre word.
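The difference between the two models is easiest to see in the training samples each one draws from a sentence. The sketch below only builds those samples and does not train any vectors; the function names, the window size and the example sentence are assumptions made for illustration.

```python
from typing import List, Tuple


def cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW samples: (surrounding context words, centre word to predict)."""
    pairs = []
    for i, centre in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, centre))
    return pairs


def skip_gram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-Gram samples: (centre word, one surrounding word to predict)."""
    pairs = []
    for i, centre in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs


# Hypothetical, already segmented sentence.
tokens = ["bank", "approves", "enterprise", "loan"]
print(cbow_pairs(tokens, window=1))
print(skip_gram_pairs(tokens, window=1))
```

CBOW yields one sample per centre word with the whole context as input, whereas Skip-Gram yields one sample per (centre, neighbour) pair, so it produces more samples from the same text.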
In one embodiment, the method for determining the credit of the enterprise based on public opinion monitoring further comprises the following steps:
screening the selected multiple machine learning models; the plurality of machine learning models includes: SVM model, LGBM model, CNN model, and LSTM model.
In one embodiment, the screening of the selected plurality of machine learning models, as shown in fig. 3, includes:
S301: Train the SVM model, the LGBM model, the CNN model and the LSTM model respectively.
S302: Calculate the accuracy, recall and F1 value of the SVM model, the LGBM model, the CNN model and the LSTM model from the number of correctly identified positive examples.
S303: Compare the accuracy, the recall and the F1 value to determine the model with the best overall performance as the final machine learning model.
Because there are many kinds of machine learning models, four representative models are selected, and the best-performing one is chosen as the classification model and then optimized. The four representative models are the support vector machine (SVM), the gradient boosting decision tree model (LGBM), the convolutional neural network (CNN) and the long short-term memory neural network (LSTM).
The accuracy (computed as precision, P), the recall and the F1 value of the four models are compared to evaluate their classification results. The indices are calculated as follows.
Precision (P) = TP/(TP + FP)
Recall (R) = TP/(TP + FN)
F1 value = (2 × P × R)/(P + R)
TP (true positives): the number of examples correctly classified as positive; FP (false positives): the number of examples wrongly classified as positive; FN (false negatives): the number of examples wrongly classified as negative; TN (true negatives): the number of examples correctly classified as negative. Precision indicates how accurate the text classification results are, recall indicates how complete they are, and the F1 value takes both into account. The comparison results are shown in Table 2:
TABLE 2
[Table 2 appears as an image in the original publication (BDA0003809963900000071); its contents are not available in this text.]
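The three indices and the model comparison can be sketched as below. The confusion counts and the model names `model_a` and `model_b` are hypothetical and are not the values of Table 2; choosing the winner by F1 value alone is also an assumption, since the application only says that the model with the best overall performance is selected.

```python
from typing import Dict, Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical (TP, FP, FN) counts for two candidate models.
counts: Dict[str, Tuple[int, int, int]] = {
    "model_a": (80, 20, 20),
    "model_b": (90, 10, 10),
}

scores = {name: precision_recall_f1(*c) for name, c in counts.items()}

# Assumption: rank the candidates by F1 value.
best_model = max(scores, key=lambda name: scores[name][2])
print(best_model, scores[best_model])
```

The zero-denominator guards cover the case where a model predicts no positives at all, which would otherwise raise a division error.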
In one embodiment, Table 2 shows that the long short-term memory neural network performs best. The LSTM network memorizes information through LSTM units, each of which consists of a forget gate, an input gate and an output gate. The simplified structure of the LSTM memory cell is shown in Fig. 8.
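How the three gates interact can be illustrated with a single time step of an LSTM cell. The sketch below uses the standard LSTM formulation with scalar inputs and states for readability; the application does not give its exact equations, and the parameter names and values here are hypothetical.

```python
import math
from typing import Dict, Tuple


def sigmoid(x: float) -> float:
    """Logistic function, used by all three gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x: float, h_prev: float, c_prev: float,
              p: Dict[str, float]) -> Tuple[float, float]:
    """One time step of a scalar LSTM cell; returns (hidden state, cell state)."""
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])    # forget gate
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])    # input gate
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])    # output gate
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])  # candidate memory
    c = f * c_prev + i * g   # keep part of the old memory, add part of the new
    h = o * math.tanh(c)     # expose part of the memory as the output
    return h, c


# Hypothetical parameters; a trained network would learn these values.
params = {name: 0.5 for name in (
    "w_f", "u_f", "b_f", "w_i", "u_i", "b_i",
    "w_o", "u_o", "b_o", "w_g", "u_g", "b_g")}

h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:  # a toy input sequence
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

In a real text classifier the inputs would be word vectors and the weights would be matrices, but the gate logic is the same.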
In one embodiment, as shown in fig. 4, the training process of the machine learning model includes:
S401: Establish a training set from the acquired historical news text information.
S402: Input the training set into the machine learning model for training, so that the machine learning model selects information related to the credit or operating condition of the enterprise from the training set.
In an embodiment, as shown in fig. 5, the method for determining enterprise credit based on public opinion monitoring further includes:
S501: Apply an entity extraction technique to obtain entities from the information related to enterprise credit or operating condition.
S502: Establish a mapping relation between enterprises and credit levels according to the obtained entities.
In one embodiment, entity extraction is applied to obtain the entities in the information, such as enterprise names, together with what those entities did; the extraction results are then analyzed to determine whether the events affect the credit level and operating condition of the enterprise. Entity extraction is implemented with a hidden Markov model (HMM), a probabilistic model of sequences which assumes that the data has a hidden state sequence with a Markov chain structure and that the observation sequence is generated from that state sequence. The hidden Markov model is shown in Fig. 10.
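Recovering the hidden state sequence of an HMM from an observed word sequence is commonly done with Viterbi decoding. The application does not name a decoding algorithm, so the sketch below is an assumption, and the two-tag scheme (`ORG` for organization words, `O` for everything else), the probabilities, the `unseen` floor and the example sentence are all hypothetical.

```python
from typing import Dict, List


def viterbi(obs: List[str], states: List[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]],
            unseen: float = 1e-6) -> List[str]:
    """Return the most probable hidden state sequence for the observations."""
    # best[t][s]: probability of the best path that ends in state s at step t.
    best = [{s: start_p[s] * emit_p[s].get(obs[0], unseen) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, prob = max(
                ((r, best[t - 1][r] * trans_p[r][s]) for r in states),
                key=lambda item: item[1],
            )
            best[t][s] = prob * emit_p[s].get(obs[t], unseen)
            back[t][s] = prev
    # Trace the best path backwards from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical model parameters.
states = ["ORG", "O"]
start_p = {"ORG": 0.4, "O": 0.6}
trans_p = {"ORG": {"ORG": 0.5, "O": 0.5}, "O": {"ORG": 0.2, "O": 0.8}}
emit_p = {
    "ORG": {"Acme": 0.5, "Bank": 0.4},
    "O": {"defaulted": 0.3, "on": 0.3, "loan": 0.3, "Bank": 0.05},
}

words = ["Acme", "Bank", "defaulted", "on", "loan"]
tags = viterbi(words, states, start_p, trans_p, emit_p)
print(list(zip(words, tags)))
```

Consecutive `ORG` tags can then be joined into one enterprise name. For long sequences the products should be replaced by sums of log-probabilities to avoid numerical underflow.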
In an embodiment, as shown in fig. 6, the method for determining enterprise credit based on public opinion monitoring further includes:
S601: During training of the machine learning model, tune the parameters of the machine learning model through a particle swarm optimization algorithm according to the training feedback.
S602: Retrain the adjusted machine learning model.
In one embodiment, the parameters of the LSTM are tuned by a particle swarm optimization (PSO) algorithm. PSO derives from the study of bird flocks: scientists found that when one bird in a flock finds a food-rich area, it summons the other birds to head there together. A particle of zero mass is therefore treated as analogous to a bird; when one particle finds an optimal value during the search, the other particles gather toward it, and the PSO algorithm eventually finds the optimal value. PSO is currently widely used in fields such as function optimization and machine learning model training because it is easy to implement and operate. PSO first initializes many random particles, each of which has only two attributes: velocity and position. The velocity and position of each particle are updated iteratively to find the optimal solution:
$$v_i^{t+1} = w\,v_i^{t} + c_1 r_1\,(\mathrm{pbest}_i - x_i^{t}) + c_2 r_2\,(\mathrm{gbest} - x_i^{t}), \qquad x_i^{t+1} = x_i^{t} + v_i^{t+1}$$
In the above formula, $w$ represents the inertia factor. $x_i^{t}$, $x_i^{t+1}$ and $v_i^{t}$, $v_i^{t+1}$ represent the position and velocity of particle $i$ at the current and next time instants. $\mathrm{pbest}_i$ and $\mathrm{gbest}$ respectively represent the best position recorded by the particle during the iterations and the best position of the whole particle swarm during the iterations. $c_1$ and $c_2$ are the acceleration factors, for which 2 is the usual value. $r_1$ and $r_2$ are random numbers generated by the computer, with values between 0 and 1. After initialization, the program produces the final result shown in Fig. 9.
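The iterative update, in which each particle's velocity combines its inertia, a pull toward its own best position and a pull toward the swarm's best position, can be sketched in plain Python. This is an illustration under stated assumptions and not the application's implementation: the objective `sphere` merely stands in for the validation loss of an LSTM as a function of its hyperparameters, and the inertia value `w=0.4`, the swarm size, the iteration count and the clamping of positions to the search bounds are all choices made here.

```python
import random
from typing import Callable, List, Tuple


def pso(objective: Callable[[List[float]], float], dim: int,
        bounds: Tuple[float, float], n_particles: int = 20, n_iter: int = 100,
        w: float = 0.4, c1: float = 2.0, c2: float = 2.0,
        seed: int = 0) -> Tuple[List[float], float]:
    """Minimise `objective` with particle swarm optimisation."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_val[k])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull toward pbest + pull toward gbest.
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                # Position update, clamped to the search bounds (an assumption).
                pos[k][d] = min(hi, max(lo, pos[k][d] + vel[k][d]))
            val = objective(pos[k])
            if val < pbest_val[k]:
                pbest[k], pbest_val[k] = pos[k][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[k][:], val
    return gbest, gbest_val


def sphere(p: List[float]) -> float:
    """Stand-in objective with its minimum of 0 at the origin."""
    return sum(v * v for v in p)


best_pos, best_val = pso(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_pos, best_val)
```

To tune a model, each particle position would encode a set of hyperparameters and the objective would train and score the model with them, which is why each evaluation is expensive and the swarm is kept small.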
Based on the same inventive concept, the embodiment of the present application further provides an enterprise credit determination apparatus based on public opinion monitoring, which can be used to implement the method described in the above embodiments, as described in the following embodiments. The principle of solving the problems of the enterprise credit determination device based on the public opinion monitoring is similar to that of the enterprise credit determination method based on the public opinion monitoring. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. While the system described in the embodiments below is preferably implemented in software, implementations in hardware, or a combination of software and hardware are also possible and contemplated.
According to another aspect of the present application, there is also provided a credit determination apparatus for an enterprise based on public opinion monitoring, as shown in fig. 11, including:
an information crawling unit 1101, configured to obtain, by a web crawler, news text information about an enterprise in a network public opinion;
the preprocessing unit 1102 is configured to preprocess the acquired news text information to generate a preprocessed file;
and the enterprise credit evaluation unit 1103 is used for inputting the preprocessed files into the pre-trained machine learning model to obtain the mapping relation between the enterprise and the credit level.
In one embodiment, as shown in fig. 12, the preprocessing unit 1102 includes:
a conversion module 1201, configured to convert the news text information into digital information;
and the word segmentation removing module 1202 is used for segmenting the digital information by taking the vocabulary as a unit and removing pronouns, prepositions and stop words.
In one embodiment, the apparatus for determining enterprise credit based on public opinion monitoring further comprises:
the model screening unit is used for screening the selected multiple machine learning models; the plurality of machine learning models includes: SVM model, LGBM model, CNN model, and LSTM model.
In one embodiment, as shown in fig. 13, the model screening unit includes:
a training module 1301 for training the SVM model, the LGBM model, the CNN model, and the LSTM model, respectively;
a calculating module 1302, configured to calculate the accuracy, recall and F1 value of the SVM model, the LGBM model, the CNN model and the LSTM model from the number of correctly identified positive examples;
and the comparison module 1303 is used for comparing the accuracy, the recall rate and the F1 value to determine a model with optimal comprehensive performance as a final machine learning model.
In one embodiment, the training process of the machine learning model includes:
establishing a training set according to the acquired historical news text information;
inputting the training set into the machine learning model for training, so that the machine learning model selects information related to the credit or the business condition of the enterprise from the training set.
In one embodiment, the apparatus for determining enterprise credit based on public opinion monitoring further comprises:
the entity extraction module is used for acquiring an entity from information related to enterprise credit or operation conditions by applying an entity extraction technology;
and the mapping relation establishing module is used for establishing the mapping relation between the enterprise and the credit level according to the obtained entity.
In one embodiment, the apparatus for determining enterprise credit based on public opinion monitoring further comprises:
the parameter tuning module is used for performing parameter tuning on the machine learning model through a particle swarm algorithm according to a training feedback result in the training process of the machine learning model;
and the retraining module is used for retraining the adjusted machine learning model.
An embodiment of the present application further provides a specific implementation manner of an electronic device capable of implementing all steps in the method in the foregoing embodiment, and referring to fig. 14, the electronic device specifically includes the following contents:
a processor (processor) 1401, a memory 1402, a communication Interface (Communications Interface) 1403, a bus 1404, and a nonvolatile memory 1405;
the processor 1401, the memory 1402 and the communication interface 1403 complete communication with each other through the bus 1404;
the processor 1401 is configured to invoke the computer programs in the memory 1402 and the non-volatile storage 1405, and when executing the computer programs, the processor implements all the steps of the method in the above embodiments, for example, when executing the computer programs, the processor implements the following steps:
S101: acquiring news text information about enterprises from online public opinion through a web crawler.
S102: preprocessing the acquired news text information to generate a preprocessed file.
S103: inputting the preprocessed file into a pre-trained machine learning model to obtain the mapping relationship between enterprises and credit levels.
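The flow of steps S101 to S103 may, for instance, be wired together as in the following sketch. The page fetcher is injected as a function so that no network access is needed; the URLs, the canned page texts, and the keyword rule used in place of a pre-trained model are hypothetical placeholders rather than parts of the embodiments.

```python
def crawl_news(fetch_page, urls):
    """S101: collect raw news text for each URL via the supplied fetcher."""
    return [fetch_page(url) for url in urls]


def preprocess(raw_texts):
    """S102: lower-case, collapse whitespace, and drop empty items."""
    cleaned = [" ".join(text.lower().split()) for text in raw_texts]
    return [text for text in cleaned if text]


def evaluate_credit(model, preprocessed_texts):
    """S103: apply the model to each item and collect enterprise -> level."""
    mapping = {}
    for text in preprocessed_texts:
        enterprise, level = model(text)
        mapping[enterprise] = level
    return mapping


# Hypothetical stand-ins for a real crawler target and a trained model.
FAKE_PAGES = {
    "https://example.com/a": "  AcmeSteel faces DEFAULT on bonds ",
    "https://example.com/b": "RiverFoods posts record profit",
    "https://example.com/c": "   ",
}


def keyword_model(text):
    """Toy rule: first word is the enterprise; 'default' means low credit."""
    enterprise = text.split()[0]
    level = "low" if "default" in text else "high"
    return enterprise, level


raw = crawl_news(FAKE_PAGES.__getitem__, sorted(FAKE_PAGES))
preprocessed = preprocess(raw)
result = evaluate_credit(keyword_model, preprocessed)
print(result)
```

Keeping each step as a separate function mirrors the division into crawling, preprocessing, and evaluation, so a real crawler or trained model can replace a placeholder without touching the other steps.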
Embodiments of the present application also provide a computer-readable storage medium capable of implementing all the steps of the method in the above embodiments. The computer-readable storage medium stores a computer program which, when executed by a processor, implements all the steps of the method in the above embodiments, for example the following steps:
S101: acquiring news text information about enterprises from online public opinion through a web crawler.
S102: preprocessing the acquired news text information to generate a preprocessed file.
S103: inputting the preprocessed file into a pre-trained machine learning model to obtain the mapping relationship between enterprises and credit levels.
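The preprocessing of step S102 is detailed in claim 2 as converting the text into digital information, dividing it into words, and removing pronouns, prepositions and stop words. A minimal sketch of that idea follows; the English word lists, the regular-expression tokenizer, and the integer-id encoding are assumptions for illustration, and the sketch filters words before encoding them for simplicity. Chinese news text would additionally require a word segmenter, which is not shown.

```python
import re

# Small illustrative word lists; a real system would use fuller lexicons.
PRONOUNS = {"it", "its", "they", "their", "we", "our"}
PREPOSITIONS = {"on", "in", "of", "to", "for", "with", "by", "at"}
STOP_WORDS = {"the", "a", "an", "and", "is", "has"}
REMOVED = PRONOUNS | PREPOSITIONS | STOP_WORDS


def split_into_words(text):
    """Lower-case the text and split it into alphanumeric word units."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_function_words(words):
    """Drop pronouns, prepositions and stop words."""
    return [word for word in words if word not in REMOVED]


def encode(words, vocabulary):
    """Map each word to an integer id, extending the vocabulary in place."""
    return [vocabulary.setdefault(word, len(vocabulary)) for word in words]


sentence = "The company has defaulted on its bonds, and the company is in court."
vocabulary = {}
kept = remove_function_words(split_into_words(sentence))
ids = encode(kept, vocabulary)
print(kept)
print(ids)
```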
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the hardware-plus-program embodiments are substantially similar to the method embodiments, they are described briefly, and for relevant details reference may be made to the corresponding description of the method embodiments. Likewise, because the system embodiment is substantially similar to the method embodiment, it is described briefly, and for relevant details reference may be made to the corresponding description of the method embodiment.
Although the embodiments of this specification provide method steps as described in the embodiments or flowcharts, more or fewer steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is only one of many possible execution orders and does not represent the only order of execution. When implemented in an actual device or end product, the steps may be executed sequentially or in parallel according to the methods shown in the embodiments or figures (e.g., in parallel-processor or multi-threaded processing environments, or even distributed data processing environments).
The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that includes the recited elements is not excluded.
For convenience of description, the above devices are described as being divided into various modules by function, with each module described separately. Of course, in implementing the embodiments of this specification, the functions of the modules may be implemented in one or more pieces of software and/or hardware, or a module implementing a given function may be implemented by a combination of multiple sub-modules or sub-units. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a logical division, and other divisions are possible in practice: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
As will be appreciated by one skilled in the art, embodiments of this specification may be provided as a method, system, or computer program product. Accordingly, embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of this specification may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
In the description of this specification, reference to "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of this specification. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example. Furthermore, those skilled in the art may combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided they do not contradict one another.
The above description is only an example of the embodiments of this specification and is not intended to limit them. Various modifications and variations to the embodiments described herein will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement or the like made within the spirit and principle of the embodiments of the present invention should be included in the scope of the claims of the embodiments of the present invention.

Claims (10)

1. A method for determining enterprise credit based on public opinion monitoring, characterized by comprising the following steps:
acquiring news text information about enterprises from online public opinion through a web crawler;
preprocessing the acquired news text information to generate a preprocessed file;
inputting the preprocessed file into a pre-trained machine learning model to obtain enterprise credit or business conditions;
and determining the credit of the enterprise according to the enterprise credit or business conditions and a pre-established mapping relationship between enterprises and credit levels.
2. The method for determining enterprise credit based on public opinion monitoring as claimed in claim 1, wherein preprocessing the acquired news text information to generate a preprocessed file comprises:
converting the news text information into digital information;
and dividing the digital information into words and removing pronouns, prepositions and stop words.
3. The method for determining enterprise credit based on public opinion monitoring as claimed in claim 1, further comprising:
screening a plurality of selected machine learning models, the plurality of machine learning models comprising: an SVM model, an LGBM model, a CNN model, and an LSTM model.
4. The method for determining enterprise credit based on public opinion monitoring according to claim 3, wherein screening the plurality of selected machine learning models comprises:
respectively obtaining the numbers of correct judgments of the trained SVM model, LGBM model, CNN model and LSTM model;
respectively calculating the accuracy, the recall rate and the F1 value of the four models according to their numbers of correct judgments;
and comparing the accuracy, the recall rate and the F1 value to determine the model with the best overall performance as the final machine learning model.
5. The method of claim 1, wherein the training process of the machine learning model comprises:
establishing a training set according to the acquired historical news text information;
and inputting the training set into the machine learning model for training, so that the machine learning model selects information related to the credit or the business condition of the enterprise from the training set.
6. The method of claim 5, further comprising establishing a mapping relationship between enterprises and credit levels, which comprises:
extracting entities from the information related to enterprise credit or business conditions by applying an entity extraction technique;
and establishing the mapping relationship between enterprises and credit levels according to the extracted entities.
7. The method of claim 5, wherein the method further comprises:
during training of the machine learning model, tuning the parameters of the machine learning model with a particle swarm optimization algorithm according to the training feedback;
and retraining the tuned machine learning model.
8. A device for determining enterprise credit based on public opinion monitoring, comprising:
an information crawling unit, configured to acquire news text information about enterprises from online public opinion through a web crawler;
a preprocessing unit, configured to preprocess the acquired news text information to generate a preprocessed file;
and an enterprise credit evaluation unit, configured to input the preprocessed file into a pre-trained machine learning model to obtain a mapping relationship between enterprises and credit levels, and to determine the enterprise credit according to the mapping relationship.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method for determining enterprise credit based on public opinion monitoring according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for determining enterprise credit based on public opinion monitoring as claimed in any one of claims 1 to 7.
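As a non-limiting illustration of the model screening recited in claims 3 and 4, the comparison may be sketched as follows. The confusion counts are invented solely for this example and are not results from the application. Precision is computed alongside accuracy because the F1 value is the harmonic mean of precision and recall, and selecting the model with the highest F1 value is only one assumed reading of "best overall performance".

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical (tp, fp, fn, tn) counts on a 200-item test set.
COUNTS = {
    "SVM": (80, 20, 20, 80),
    "LGBM": (85, 15, 15, 85),
    "CNN": (90, 10, 20, 80),
    "LSTM": (88, 12, 12, 88),
}

metrics = {name: classification_metrics(*counts)
           for name, counts in COUNTS.items()}

# Assumed selection rule: the highest F1 value wins.
best_model = max(metrics, key=lambda name: metrics[name]["f1"])

for name, values in metrics.items():
    print(name, {key: round(value, 3) for key, value in values.items()})
print("selected:", best_model)
```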
CN202211008504.9A 2022-08-22 2022-08-22 Method and device for determining enterprise credit based on public opinion monitoring Pending CN115329173A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211008504.9A CN115329173A (en) 2022-08-22 2022-08-22 Method and device for determining enterprise credit based on public opinion monitoring

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211008504.9A CN115329173A (en) 2022-08-22 2022-08-22 Method and device for determining enterprise credit based on public opinion monitoring

Publications (1)

Publication Number Publication Date
CN115329173A true CN115329173A (en) 2022-11-11

Family

ID=83925102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211008504.9A Pending CN115329173A (en) 2022-08-22 2022-08-22 Method and device for determining enterprise credit based on public opinion monitoring

Country Status (1)

Country Link
CN (1) CN115329173A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117151653A (en) * 2023-10-25 2023-12-01 辰风策划(深圳)有限公司 Enterprise information processing method and system based on artificial intelligence
CN117151653B (en) * 2023-10-25 2023-12-29 辰风策划(深圳)有限公司 Enterprise information processing method and system based on artificial intelligence

Similar Documents

Publication Publication Date Title
RU2628431C1 (en) Selection of text classifier parameter based on semantic characteristics
US8630989B2 (en) Systems and methods for information extraction using contextual pattern discovery
US11468342B2 (en) Systems and methods for generating and using knowledge graphs
CN111581949B (en) Method and device for disambiguating name of learner, storage medium and terminal
Nagamanjula et al. A novel framework based on bi-objective optimization and LAN2FIS for Twitter sentiment analysis
CN108027814B (en) Stop word recognition method and device
CN113254643B (en) Text classification method and device, electronic equipment and text classification program
KR102334236B1 (en) Method and application of meaningful keyword extraction from speech-converted text data
CN110162771A (en) The recognition methods of event trigger word, device, electronic equipment
CN110334343B (en) Method and system for extracting personal privacy information in contract
US20220358379A1 (en) System, apparatus and method of managing knowledge generated from technical data
KR102334255B1 (en) Text data collection platform construction and integrated management method for AI-based voice service
CN114090787A (en) Knowledge graph construction method based on internet power policy information
Ahmad 40 Algorithms Every Programmer Should Know: Hone your problem-solving skills by learning different algorithms and their implementation in Python
Dommati et al. Bug Classification: Feature Extraction and Comparison of Event Model using Naïve Bayes Approach
CN114676346A (en) News event processing method and device, computer equipment and storage medium
CN115329173A (en) Method and device for determining enterprise credit based on public opinion monitoring
WO2016093839A1 (en) Structuring of semi-structured log messages
CN112835798A (en) Cluster learning method, test step clustering method and related device
CN117271558A (en) Language query model construction method, query language acquisition method and related devices
CN114547257B (en) Class matching method and device, computer equipment and storage medium
CN113420153B (en) Topic making method, device and equipment based on topic library and event library
CN113741864B (en) Automatic semantic service interface design method and system based on natural language processing
CN114610576A (en) Log generation monitoring method and device
Gupta et al. Community trolling: an active learning approach for topic based community detection in big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination