CN115640802A - Evaluation classification method, device, equipment and storage medium for enterprise entities - Google Patents

Evaluation classification method, device, equipment and storage medium for enterprise entities

Info

Publication number
CN115640802A
Authority
CN
China
Prior art keywords
text
features
key
evaluation
enterprise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211184499.7A
Other languages
Chinese (zh)
Inventor
康祖荫
李冠萍
胡颖
陈伟杰
陈青山
谷绒霞
李环宇
王康宇
严旭婕
徐霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp and CCB Finetech Co Ltd
Priority to CN202211184499.7A
Publication of CN115640802A
Legal status: Pending

Abstract

The invention discloses a method, a device, equipment and a storage medium for evaluating and classifying enterprise entities. The method comprises: acquiring a text to be analyzed, identifying an enterprise entity in the text to be analyzed, and determining a sentence set corresponding to the enterprise entity based on the text to be analyzed; and, for the sentence set corresponding to any enterprise entity, evaluating and classifying the enterprise entity based on basic text features of the text information in the sentence set and/or key text features corresponding to the text information, to obtain the evaluation category of the enterprise entity in the text to be analyzed. The sentences corresponding to each enterprise entity in the text to be analyzed are extracted to form a sentence set, key features and basic features are extracted separately from the text information in that sentence set, and the evaluation category of the enterprise entity is identified based on both; the enterprise entities can thus be evaluated and classified more comprehensively, and the accuracy of the evaluation classification is improved.

Description

Enterprise entity evaluation classification method, device, equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for evaluating and classifying business entities.
Background
As the financial industry's need to capture financial events and market public opinion grows more urgent, public opinion monitoring and analysis in support of marketing and risk control is receiving increasing attention.
At present, there are two main approaches to identifying the subject of a news item and classifying the evaluation of that subject. The first works at sentence granularity: each sentence is classified separately, and the per-sentence results are aggregated into a result for the whole news item. The second classifies the whole news item based on only the first N characters of the full text.
However, the first approach cannot capture semantic relations between sentences, and the second cannot capture the full-text information about each subject; for example, recognition based on a BERT model supports an input of at most 512 characters. As a result, the accuracy of the evaluation classification is low.
Disclosure of Invention
The invention provides a method, a device, equipment and a storage medium for evaluating and classifying enterprise entities, which are used for improving the accuracy of evaluating and classifying the enterprise entities in texts.
According to an aspect of the present invention, there is provided a method for classifying evaluations of business entities, comprising:
acquiring a text to be analyzed, identifying an enterprise entity in the text to be analyzed, and determining a statement set corresponding to the enterprise entity based on the text to be analyzed;
and for a statement set corresponding to any enterprise entity, performing evaluation classification on the enterprise entity based on basic text features of text information in the statement set and/or key text features corresponding to the text information to obtain the evaluation category of the enterprise entity in the text to be analyzed.
Further, the performing evaluation classification on the business entity based on the basic text features of the text information in the sentence set and/or the key text features corresponding to the text information includes:
inputting the text information in the statement set into a multitask prediction model to obtain the evaluation category of the enterprise entity in the text to be analyzed, wherein the multitask prediction model comprises a text processing submodel and a classification submodel, the classification submodel is used for extracting the basic features of the input text information, receiving the key text features transmitted by the text processing submodel and evaluating and classifying the enterprise entity based on the key text features and the basic text features.
Further, the text processing sub-model comprises a key text feature extraction module and a key text generation module; the classification submodel comprises a basic text feature extraction module and an evaluation classification module, wherein the evaluation classification module is connected with the key text feature extraction module, receives the key text features transmitted by the key text feature extraction module, splices the key text features with the basic text features, and evaluates and classifies the enterprise entities based on the spliced features.
Further, the text processing sub-model is an abstract generation sub-model.
Further, the evaluating and classifying the business entity based on the basic text features of the text information in the sentence set and/or the key text features corresponding to the text information includes:
determining key text information corresponding to the text information in the sentence set;
and performing evaluation classification on the enterprise entity based on basic text features of text information in the statement set and/or key text features of the key text information.
Further, the determining key text information corresponding to the text information in the sentence set includes:
generating a text abstract based on the text information in the sentence set, and taking the text abstract as key text information; or
extracting at least one key sentence from the text information in the sentence set, and taking the at least one key sentence as key text information.
Further, the performing evaluation classification on the enterprise entity based on the basic text features of the text information in the sentence set and/or the key text features of the key text information includes:
extracting basic text features of text information in the sentence set, extracting key text features of key text information, performing feature splicing on the key text features and the basic text features, and evaluating and classifying the spliced features.
Further, the performing evaluation classification on the enterprise entity based on the basic text features of the text information in the sentence set and/or the key text features of the key text information includes:
extracting key text features of the key text information, and performing evaluation classification based on the key text features to obtain a first evaluation classification result;
extracting basic text features of text information in the sentence set, and performing evaluation classification based on the basic text features to obtain a second evaluation classification result;
determining a rating category for the business entity based on the first rating classification result and/or the second rating classification result.
Further, the performing evaluation classification based on the text information in the sentence set to obtain a second evaluation classification result includes:
splicing the sentences in the sentence set according to their order in the text to be analyzed to form a long text, extracting basic text features of the long text, and performing evaluation classification based on the basic text features to obtain a second evaluation classification result; or
extracting basic text features of each sentence in the sentence set separately, and evaluating and classifying the sentences based on the basic text features of each sentence to obtain a second evaluation classification result, wherein the second evaluation classification result comprises the evaluation category of each sentence.
Further, the identifying the business entity in the text to be analyzed includes:
and performing sentence division processing on the text to be analyzed, and inputting each obtained sentence into an enterprise entity recognition model respectively to obtain an enterprise entity corresponding to each sentence.
Further, after obtaining the business entity corresponding to each statement, the method further includes:
matching the obtained enterprise entities against a white list, and removing the enterprise entities which are not successfully matched;
and/or
counting the number of occurrences of each enterprise entity in the text to be analyzed, and taking the enterprise entity with the most occurrences as the enterprise entity for evaluation and classification.
According to another aspect of the present invention, there is provided an apparatus for classifying evaluations of business entities, comprising:
the sentence set determining module is used for acquiring a text to be analyzed, identifying an enterprise entity in the text to be analyzed, and determining a sentence set corresponding to the enterprise entity based on the text to be analyzed;
and the evaluation classification module is used for carrying out evaluation classification on the enterprise entities based on the basic text features of the text information in the sentence sets and/or the key text features corresponding to the text information for the sentence sets corresponding to any enterprise entities to obtain the evaluation categories of the enterprise entities in the text to be analyzed.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform a method for ratings classification of business entities according to any of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions for causing a processor to implement a method for classifying evaluations of business entities according to any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer program product, comprising a computer program which, when executed by a processor, implements the method for rating and classifying a business entity according to any embodiment of the present invention.
According to the technical scheme, the sentence set is formed by extracting the sentences corresponding to the enterprise entities in the text to be analyzed, the key features and the basic features are respectively extracted from the text information in the sentence set of the enterprise entities, the evaluation categories of the enterprise entities are identified based on the key features and the basic features, strong correlation exists between the key features and the basic features, the enterprise entities can be evaluated and classified more comprehensively, and the accuracy of the evaluation and classification of the enterprise entities is improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flowchart of an evaluation classification method for business entities according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a multi-task prediction model according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an evaluation classification apparatus for business entities according to a second embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first feature data", "second feature data", and the like in the description and the claims of the present invention and the drawings described above are used for distinguishing similar objects and are not necessarily used for describing a particular sequence or order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in other sequences than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The acquisition, storage and/or processing of data involved in the technical solution of this application complies with the relevant provisions of national laws and regulations.
Example one
Fig. 1 is a flowchart of an evaluation and classification method for an enterprise entity according to an embodiment of the present invention, where this embodiment is applicable to a case of performing evaluation and classification on an enterprise entity in news, and the method may be executed by an evaluation and classification device for an enterprise entity, where the evaluation and classification device for an enterprise entity may be implemented in a form of hardware and/or software, and the evaluation and classification device for an enterprise entity may be configured in an electronic device according to an embodiment of the present invention. As shown in fig. 1, the method includes:
S110, obtaining a text to be analyzed, identifying an enterprise entity in the text to be analyzed, and determining a sentence set corresponding to the enterprise entity based on the text to be analyzed.
The text to be analyzed is a text on which enterprise entity evaluation classification is to be performed; it includes, but is not limited to, news, financial public opinion, and similar data texts. In this embodiment, the text to be analyzed is obtained and the enterprise entities mentioned in it are identified; for the identified enterprise entities, the text to be analyzed is split into sentences, and the sentences corresponding to each enterprise entity are extracted to form a sentence set. An enterprise entity may appear as any expression that denotes the enterprise, including but not limited to its name, an abbreviation, or a substitute name.
On the basis of the foregoing embodiment, optionally, the identifying the business entity in the text to be analyzed includes: and performing sentence division processing on the text to be analyzed, and inputting each obtained sentence into an enterprise entity recognition model respectively to obtain an enterprise entity corresponding to each sentence respectively.
In this embodiment, the text to be analyzed is split into sentences, and each sentence is input into the enterprise entity recognition model to identify the enterprise entities in that sentence. The enterprise entity recognition model includes, but is not limited to, a BERT model, a BiLSTM model, a CRF model, and the like. Identifying the enterprise entities in the text to be analyzed with the enterprise entity recognition model improves the efficiency of enterprise entity identification. The enterprise entity recognition model can be trained in advance: the enterprise entities in the training texts are labeled beforehand, and during training the undetermined parameters of the model are adjusted based on the enterprise entities recognized by the model and the labeled enterprise entities, until the trained enterprise entity recognition model is obtained.
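The sentence-splitting and sentence-set steps described above can be sketched as follows. This is a minimal illustration only: the alias table `ALIASES` and the function `recognize_entities` are hypothetical stand-ins for the trained enterprise entity recognition model, and the punctuation-based splitter is an assumed simplification that the patent does not specify.

```python
import re
from collections import defaultdict

# Hypothetical alias table standing in for a trained recognition model.
# It maps each surface form (name, abbreviation, substitute name) to one
# canonical enterprise entity.
ALIASES = {
    "Acme Bank": "Acme Bank",
    "Acme": "Acme Bank",
    "Globex Corp": "Globex Corp",
}


def split_sentences(text):
    """Split after Chinese or Western sentence-ending punctuation (assumed rule)."""
    parts = re.split(r"(?<=[。！？.!?])\s*", text)
    return [part.strip() for part in parts if part.strip()]


def recognize_entities(sentence):
    """Stand-in recognizer: canonical entities whose alias occurs in the sentence."""
    found = []
    for alias, canonical in ALIASES.items():
        if alias in sentence and canonical not in found:
            found.append(canonical)
    return found


def build_sentence_sets(text):
    """Map each recognized entity to its (position, sentence) pairs."""
    sentence_sets = defaultdict(list)
    for position, sentence in enumerate(split_sentences(text)):
        for entity in recognize_entities(sentence):
            sentence_sets[entity].append((position, sentence))
    return dict(sentence_sets)
```

Keeping the sentence position alongside each sentence makes it possible to restore the original order later, when the sentences of one entity are joined into a long text.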
On the basis of the foregoing embodiment, optionally, after obtaining the business entities respectively corresponding to each statement, the method further includes: matching the obtained enterprise entities in a white list, and removing the enterprise entities which are not successfully matched; and/or counting the occurrence frequency of the enterprise entities in the text to be analyzed, and taking the enterprise entity with the maximum occurrence frequency as the enterprise entity for evaluation and classification.
In this embodiment, the enterprise entities identified by the enterprise entity recognition model are further screened. One screening method is to match the obtained enterprise entities against a white list and remove the enterprise entities that are not successfully matched. The white list is a list of the enterprise entities for which evaluation classification is to be performed; for example, the white list may be a target enterprise list. If a queried enterprise entity exists in the target enterprise list, it is determined to be an enterprise entity for evaluation classification; if it is not in the target enterprise list, it is discarded, which avoids wasted processing on enterprise entities that do not need to be handled. Another screening method is to count the number of occurrences of each enterprise entity in the text to be analyzed and take the enterprise entity with the most occurrences as the enterprise entity for evaluation classification, so that only the most central enterprise entity in the text is evaluated and classified, which reduces the waste of computing resources. Screening the identified enterprise entities filters out those that do not need evaluation classification and improves the efficiency of enterprise entity evaluation classification.
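The two screening methods can be sketched as below. The function names and the example white list are illustrative assumptions; the input is assumed to be one list entry per recognized mention, so that repeated mentions can be counted.

```python
from collections import Counter


def filter_by_whitelist(entity_mentions, whitelist):
    """Keep only mentions whose entity appears in the white list."""
    return [entity for entity in entity_mentions if entity in whitelist]


def most_frequent_entity(entity_mentions):
    """Return the entity mentioned most often, or None if there are no mentions."""
    if not entity_mentions:
        return None
    return Counter(entity_mentions).most_common(1)[0][0]
```

The two functions can be applied separately or chained, matching the "and/or" wording of the embodiment: filtering first and then counting restricts the classification to the most frequently mentioned white-listed entity.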
In some embodiments, to facilitate operations such as enterprise entity identification and sentence extraction, the text to be analyzed is cleaned after it is obtained, and its title and body are spliced to obtain an analysis text that is convenient to process. Data cleaning refers to checking and verifying the data, and may include, but is not limited to, deleting useless data and correcting erroneous data; for example, symbols, picture data, and the like in the text to be analyzed are deleted.
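A minimal sketch of such cleaning and title/body splicing is given below. The specific rules (stripping markup tags, a short list of decorative symbols, collapsing whitespace, and closing the title with a period so it forms its own sentence) are assumptions chosen for illustration; the patent does not prescribe them.

```python
import re


def clean_text(raw):
    """Remove markup remnants and decorative symbols, then normalize whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)        # drop markup such as <img ...> tags
    text = re.sub(r"[★■◆●▲※]+", " ", text)     # drop decorative symbols (illustrative list)
    text = re.sub(r"\s+", " ", text)           # collapse runs of whitespace
    return text.strip()


def build_analysis_text(title, body):
    """Splice the cleaned title and body into one text for later sentence splitting."""
    title = clean_text(title)
    body = clean_text(body)
    if title and title[-1] not in "。！？.!?":
        title += "."                           # assumed: make the title a full sentence
    return (title + " " + body).strip()
```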
S120, for a statement set corresponding to any enterprise entity, evaluating and classifying the enterprise entity based on basic text features of text information in the statement set and/or key text features corresponding to the text information to obtain an evaluation category of the enterprise entity in the text to be analyzed.
The basic text features refer to semantic features of original text information in a text to be analyzed, the key text features refer to semantic features of key text information corresponding to the original text information, for example, abstract text information is obtained by extracting text information in a sentence set, the abstract text information is used as key text information, and correspondingly, the key text features are semantic features of the abstract text information. In this embodiment, for a statement set corresponding to any identified enterprise entity, extracting basic text features of text information in the statement set and/or extracting key text features corresponding to the text information, and performing evaluation classification on the enterprise entity based on the basic text features and/or the key text features to obtain an evaluation category of the enterprise entity; the evaluation categories include positive evaluation, negative evaluation and neutral evaluation. In this embodiment, the evaluation types of the enterprise entities in the large amount of texts to be analyzed can be further counted by the evaluation classification of the enterprise entities in the large amount of texts to be analyzed, and the enterprise entity risk data is determined based on the statistical result of the evaluation types. The business entity risk data is positively correlated with the statistical data of the negative evaluation, for example, the statistical data of the negative evaluation may be the number of texts of the negative evaluation, or the text proportion of the negative evaluation.
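The statistics mentioned above, the number and the proportion of negatively evaluated texts per enterprise entity, can be computed as in the following sketch. The input format (one `(entity, category)` pair per analyzed text) and the category label `"negative"` are assumptions; how the statistics are mapped to risk data is not specified in the source, so the sketch stops at the statistics themselves.

```python
from collections import Counter, defaultdict


def negative_statistics(results):
    """Aggregate per-text evaluation categories into per-entity negative statistics.

    results: iterable of (entity, category) pairs, one pair per analyzed text,
    where category is "positive", "negative" or "neutral".
    """
    counts = defaultdict(Counter)
    for entity, category in results:
        counts[entity][category] += 1

    stats = {}
    for entity, counter in counts.items():
        total = sum(counter.values())
        negative = counter["negative"]  # Counter returns 0 for a missing key
        stats[entity] = {
            "negative_count": negative,
            "negative_ratio": negative / total,
        }
    return stats
```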
On the basis of the foregoing embodiment, optionally, the performing, based on the basic text feature of the text information in the sentence set and/or the key text feature corresponding to the text information, the evaluation and classification of the business entity includes: inputting the text information in the statement set into a multitask prediction model to obtain the evaluation category of the enterprise entity in the text to be analyzed, wherein the multitask prediction model comprises a text processing submodel and a classification submodel, the classification submodel is used for extracting the basic features of the input text information, receiving the key text features transmitted by the text processing submodel and evaluating and classifying the enterprise entity based on the key text features and the basic text features.
Before being input to the multi-task prediction model, the sentences in the sentence set may be joined, in the order of their positions in the text, into one long text, and the long text is input to the multi-task prediction model. In this embodiment, a multi-task prediction model is constructed and trained; its tasks comprise a key text generation task and an evaluation classification task. The text information in the sentence set is fed as model input to the trained multi-task prediction model to obtain the evaluation category of the enterprise entity corresponding to the sentence set. The multi-task prediction model comprises two subtask models, namely a text processing sub-model and a classification sub-model. The text processing sub-model extracts key features of the input text information to obtain key text features and generates a key text. The classification sub-model extracts basic features of the input text information, receives the key text features transmitted by the text processing sub-model, and evaluates and classifies the enterprise entity based on the basic text features and the key text features. Constructing the multi-task prediction model and classifying the enterprise entity on both basic and key text features allows the evaluation category to be identified more comprehensively and improves the accuracy of the evaluation classification.
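Forming the model input from a sentence set can be sketched as follows, assuming each sentence is stored together with its position in the text to be analyzed. The function name and the `separator` parameter are illustrative; an empty separator suits Chinese text, while a space suits English.

```python
def join_in_document_order(sentence_set, separator=""):
    """Join (position, sentence) pairs into one long text in document order."""
    ordered = sorted(sentence_set, key=lambda pair: pair[0])
    return separator.join(sentence for _, sentence in ordered)
```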
On the basis of the above embodiment, optionally, the text processing sub-model includes a key text feature extraction module and a key text generation module; the classification submodel comprises a basic text feature extraction module and an evaluation classification module, wherein the evaluation classification module is connected with the key text feature extraction module, receives the key text features transmitted by the key text feature extraction module, splices the key text features and the basic text features, and evaluates and classifies the enterprise entities based on the spliced features.
Fig. 2 is a schematic structural diagram of a multi-task prediction model according to an embodiment of the invention. As shown in Fig. 2, the text information in the sentence set is input to the multi-task prediction model. The text processing sub-model extracts key text features with the key text feature extraction module and, based on the key text features, generates a key text with the key text generation module. The classification sub-model extracts basic text features with the basic text feature extraction module and splices them with the key text features transmitted by the key text feature extraction module of the text processing sub-model to obtain spliced features. Based on the spliced features, the evaluation classification module evaluates and classifies the enterprise entity, outputs the probability distribution over the evaluation categories of the enterprise entity, and the evaluation category corresponding to the enterprise entity is determined from that distribution.
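The splice-then-classify step of the evaluation classification module can be illustrated with a small pure-Python sketch. It assumes a single linear layer followed by softmax, with hand-supplied `weights` and `biases`; in the described model these would be learned, and the feature vectors would come from the two feature extraction modules rather than being passed in directly.

```python
import math

CATEGORIES = ("positive", "negative", "neutral")


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def classify(basic_features, key_features, weights, biases):
    """Splice the two feature vectors and classify with one linear layer + softmax.

    weights holds one row per category; each row has len(basic) + len(key) entries.
    """
    spliced = list(basic_features) + list(key_features)
    scores = [
        sum(w * x for w, x in zip(row, spliced)) + bias
        for row, bias in zip(weights, biases)
    ]
    probabilities = softmax(scores)
    best = max(range(len(CATEGORIES)), key=lambda index: probabilities[index])
    return CATEGORIES[best], probabilities
```

The category with the highest probability is taken as the evaluation category here; other decision rules over the distribution are equally compatible with the description.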
It should be noted that the parameters of the key text feature extraction module and the basic text feature extraction module are not completely identical. The key text feature extraction module comprises an encoder unit, Encoder-1, and a decoder unit, Decoder-1, whereas the basic text feature extraction module has only an encoder, Encoder-2. The two modules share only part of the encoder parameters at the bottom layer; Encoder-1 and Encoder-2 are kept similar by regularizing the distance between their parameters, which realizes soft sharing of the parameters. Accordingly, the Encoder-1 unit in the key text feature extraction module is connected with Encoder-2 in the basic text feature extraction module, and information can be transmitted between the two in both directions. Sharing parameters between Encoder-1 and Encoder-2 gives the model a more general representation and helps avoid overfitting; it also lets the model take on the characteristics of both tasks, which helps it focus attention on important features.
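One common way to regularize the distance between two sets of parameters is a squared L2 penalty added to the training loss. The sketch below assumes that form; the source only states that the distance between the parameters is regularized, so the squared L2 distance, the `strength` coefficient, and the unweighted sum of the task losses are all illustrative assumptions.

```python
def soft_sharing_penalty(encoder1_params, encoder2_params, strength=0.01):
    """Squared L2 distance between corresponding Encoder-1 and Encoder-2 parameters.

    Both arguments map a layer name to a flat list of parameter values.
    """
    penalty = 0.0
    for name, values1 in encoder1_params.items():
        values2 = encoder2_params[name]
        penalty += sum((a - b) ** 2 for a, b in zip(values1, values2))
    return strength * penalty


def total_loss(generation_loss, classification_loss, penalty):
    """Combine both task losses with the soft-sharing penalty (assumed unweighted sum)."""
    return generation_loss + classification_loss + penalty
```

Minimizing the penalty pulls the two encoders toward each other without forcing them to be identical, which is what distinguishes soft sharing from hard sharing of a single encoder.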
In some embodiments, optionally, the basic text features are fused with the key text features transmitted by the key text feature extraction module of the text processing sub-model to obtain fusion features, and the evaluation classification module evaluates and classifies the enterprise entity based on the fusion features.
Illustratively, the structure of the multitask predictive model is as follows:
(1) Input layer
Both tasks of the model use a Transformer model to extract the underlying semantic features of the text. In the input layer, a vector Vi is obtained for each word Wi of a sentence L; Vi is the sum of the word vector Ci of the word Wi and the position vector Pi, and both Ci and Pi can be obtained through random initialization.
(2) Soft shared layer
The key text feature extraction module of the key text generation task has an Encoder-1 unit and a Decoder-1 unit, while the basic text feature extraction module of the evaluation classification task has only an Encoder-2 unit. The parameters of the two modules are not completely identical: part of the encoder parameters are shared at the bottom layer, while the remaining encoder and decoder parameters are specific to each sub-model and are not shared. Encoder-1 and Encoder-2 are kept similar by regularizing the distance between their parameters, which realizes soft sharing of the parameters.
(3) Independent layer
In the independent layer, the key text generation module of the key text generation task and the evaluation classification module of the evaluation classification task do not share parameters, which keeps the tasks independent. However, the Decoder-1 result vector of the key text feature extraction module and the Encoder-2 result vector of the basic text feature extraction module are spliced and used as the input vector of the evaluation classification module, so that the module learns both the semantic features of the basic text and those of the key text; this strengthens the learning capability of the classification sub-model and improves the discrimination of the evaluation category of the entity. The key text generation module, Tower-1, may comprise a fully connected layer and a softmax layer, and the evaluation classification module, Tower-2, may likewise comprise a fully connected layer and a softmax layer.
(4) Output layer
Output-1 of the key text generation sub-model and Output-2 of the classification sub-model output their results independently; Output-2 outputs the probability distribution over the evaluation categories of the enterprise entity.
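The input-layer computation in item (1), where each vector Vi is the sum of a word vector Ci and a position vector Pi that are both randomly initialized, can be sketched as below. The table sizes, the uniform initialization range, and the function names are assumptions made for illustration.

```python
import random


def init_table(num_rows, dim, rng):
    """Randomly initialize an embedding table with num_rows vectors of size dim."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_rows)]


def embed(token_ids, word_table, position_table):
    """Compute Vi = Ci + Pi for each token id at position i."""
    return [
        [c + p for c, p in zip(word_table[token_id], position_table[position])]
        for position, token_id in enumerate(token_ids)
    ]
```

Because the position vector differs at each index, the same word appearing twice in a sentence receives two different input vectors, which lets the Transformer layers distinguish word order.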
On the basis of the above embodiment, optionally, the text processing sub-model is an abstract generation sub-model, and correspondingly, the key text features are abstract text features.
In this embodiment, the text processing sub-model is an abstract generation sub-model. The text information in the sentence set is input into the multi-task prediction model; the abstract generation sub-model extracts abstract text features from the text information and generates an abstract based on them. The classification sub-model performs basic feature extraction on the text information to obtain basic text features, receives the abstract text features transmitted by the abstract generation sub-model, and evaluates and classifies the enterprise entity based on the abstract text features and the basic text features. Learning both kinds of text features enhances the learning capability of the classification sub-model and improves the accuracy with which the evaluation category of the enterprise entity is judged.
On the basis of the foregoing embodiment, optionally, performing evaluation classification on the enterprise entity based on the basic text features of the text information in the sentence set and/or the key text features corresponding to the text information includes: determining key text information corresponding to the text information in the sentence set; and performing evaluation classification on the enterprise entity based on the basic text features of the text information in the sentence set and/or the key text features of the key text information.
The key text information refers to text information obtained by extracting the key information from the text information. In this embodiment, the key text information is determined based on the text information in the sentence set, basic feature extraction is performed on the text information to obtain the basic text features, key feature extraction is performed on the key text information to obtain the key text features, and evaluation classification is performed on the enterprise entity based on the basic text features and/or the key text features to obtain the evaluation category of the enterprise entity.
On the basis of the foregoing embodiment, optionally, the determining key text information corresponding to text information in the sentence set includes: generating a text abstract based on the text information in the sentence set, and taking the text abstract as key text information; or extracting at least one key sentence in the text information in the sentence set, and taking the at least one key sentence as the key text information.
In this embodiment, the manner of determining the key text information may be: processing the text information in the sentence set with an abstract extraction model to generate a text abstract, taking the text abstract as the key text information corresponding to the text information, and evaluating and classifying the enterprise entity based on the basic text features of the text information and/or the key text features of the key text information. The abstract extraction model may be any established model for extracting an abstract from text information.
Optionally, the manner of determining the key text information may also be: performing a key sentence extraction operation on the text information in the sentence set, extracting at least one key sentence from the text information, taking the extracted key sentence as the key text information, and performing evaluation classification on the enterprise entity based on the basic text features of the text information and/or the key text features of the key text information. The extraction operation may examine each sentence in the text information, assign a weight to each sentence based on its criticality, sort the sentences by weight, and extract the sentences whose weight is greater than a preset weight.
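A minimal sketch of the key sentence extraction operation is given below. The embodiment does not state how the criticality of a sentence is measured; the keyword-ratio weight, the keyword list and the threshold used here are purely illustrative assumptions.

```python
def extract_key_sentences(sentences, weight_fn, preset_weight):
    """Weight each sentence, sort by weight (descending), keep those above the threshold."""
    scored = [(weight_fn(s), i, s) for i, s in enumerate(sentences)]
    scored.sort(key=lambda t: (-t[0], t[1]))  # ties keep the original text order
    return [s for w, _, s in scored if w > preset_weight]

KEYWORDS = {"profit", "loss", "default", "growth"}  # hypothetical criticality cues

def keyword_weight(sentence):
    """Assumed criticality measure: share of words that are keywords."""
    words = [w.strip(".,;:!?").lower() for w in sentence.split()]
    if not words:
        return 0.0
    return sum(1 for w in words if w in KEYWORDS) / len(words)

sentences = [
    "Company A reported record profit and strong growth.",
    "The meeting was held on Tuesday.",
    "Company A avoided a default last year.",
]
key_sentences = extract_key_sentences(sentences, keyword_weight, 0.1)
```

Any other weighting function with the same signature could be passed in place of `keyword_weight`.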
On the basis of the foregoing embodiment, optionally, the performing, by using the basic text features of the text information in the sentence set and/or the key text features of the key text information, evaluation and classification on the business entity includes: extracting basic text features of text information in the sentence set, extracting key text features of the key text information, performing feature splicing on the key text features and the basic text features, and evaluating and classifying the spliced features.
In this embodiment, basic feature extraction is performed on the text information to obtain basic text features, key feature extraction is performed on the key text information to obtain key text features, feature splicing is performed on the key text features and the basic text features to obtain spliced features, and evaluation classification is performed on the spliced features. By splicing the basic features and the key features and performing evaluation classification on the spliced features, comprehensiveness is ensured while the efficiency of evaluation classification is improved.
In some embodiments, basic feature extraction is performed on text information to obtain basic text features, key feature extraction is performed on key text information to obtain key text features, feature fusion is performed on the key text features and the basic text features to obtain fusion features, and evaluation and classification are performed on the fusion features.
On the basis of the foregoing embodiment, optionally, the performing, by the evaluation and classification on the business entity based on the basic text features of the text information in the sentence set and/or the key text features of the key text information, includes: extracting key text features of the key text information, and performing evaluation classification based on the key text features to obtain a first evaluation classification result; extracting basic text features of text information in the sentence set, and performing evaluation classification based on the basic text features to obtain a second evaluation classification result; determining a rating category for the business entity based on the first rating classification result and/or the second rating classification result.
In this embodiment, key feature extraction is performed on the key text information to obtain key text features, and evaluation classification is performed based on the key text features to obtain a first evaluation classification result; basic feature extraction is performed on the text information to obtain basic text features, and evaluation classification is performed based on the basic text features to obtain a second evaluation classification result; and the evaluation category of the enterprise entity is determined based on the first evaluation classification result and/or the second evaluation classification result, where the weight of the first evaluation classification result is greater than that of the second evaluation classification result. Because the key text features and the basic text features are evaluated and classified separately and the evaluation category of the enterprise entity is then determined from the two evaluation classification results, the manner of evaluating and classifying the enterprise entity is more flexible.
In some embodiments, the evaluation classification result is a distribution probability over the evaluation categories. When the evaluation category of the enterprise entity is determined based on only one of the first evaluation classification result and the second evaluation classification result, the distribution probabilities of the evaluation categories in that result are compared and, illustratively, the evaluation category with the largest distribution probability is selected as the evaluation category of the enterprise entity. When the evaluation category of the enterprise entity is determined based on both the first evaluation classification result and the second evaluation classification result, it is determined based on the two results and their respective weights. For example, the distribution probability of each evaluation category in the first evaluation classification result is multiplied by the weight of the first evaluation classification result, the distribution probability of each evaluation category in the second evaluation classification result is likewise multiplied by the weight of the second evaluation classification result, the weighted distribution probabilities of the same evaluation category are added, the summed distribution probabilities are compared, and the evaluation category with the largest summed distribution probability is selected as the evaluation category of the enterprise entity.
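The weighted combination of the two evaluation classification results can be sketched as follows. The category names, the probabilities and the weights 0.7 and 0.3 are hypothetical; the only property taken from the embodiment is that the first result is weighted more heavily than the second.

```python
def fuse_results(first, second, w_first=0.7, w_second=0.3):
    """Multiply each distribution by its weight, add per category, pick the largest."""
    categories = set(first) | set(second)
    combined = {
        c: w_first * first.get(c, 0.0) + w_second * second.get(c, 0.0)
        for c in categories
    }
    # sorted() makes the choice deterministic if two categories tie exactly
    best = max(sorted(combined), key=lambda c: combined[c])
    return best, combined

first_result = {"positive": 0.2, "neutral": 0.3, "negative": 0.5}   # from key text features
second_result = {"positive": 0.6, "neutral": 0.3, "negative": 0.1}  # from basic text features
category, combined = fuse_results(first_result, second_result)
```

With these example values the combined score of "negative" is 0.7 × 0.5 + 0.3 × 0.1 = 0.38, which exceeds the scores of the other two categories, so the first result prevails even though the second result favours "positive".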
On the basis of the foregoing embodiment, optionally, the performing evaluation classification based on text information in the sentence set to obtain a second evaluation classification result includes: splicing the sentences in the sentence set according to the sequence in the text to be analyzed to form a long text, extracting basic text features of the long text, and performing evaluation classification based on the basic text features to obtain a second evaluation classification result; or respectively extracting basic text features of each sentence in the sentence set, and performing evaluation classification on the sentences based on the basic text features of each sentence to obtain a second evaluation classification result, wherein the second evaluation classification result comprises the evaluation category of each sentence.
In this embodiment, the first way of performing evaluation classification based on the text information in the sentence set to obtain the second evaluation classification result is to splice the sentences in the sentence set according to their order in the text to be analyzed to form long text information, perform basic feature extraction on the long text information to obtain the corresponding basic text features, and perform evaluation classification based on those basic text features to obtain the second evaluation classification result. The second way is to perform basic feature extraction on each sentence in the sentence set to obtain the basic text features corresponding to each sentence, and to perform evaluation classification on each sentence based on its basic text features to obtain the second evaluation classification result, where the second evaluation classification result includes the evaluation category of each sentence.
In some embodiments, when the evaluation category of the enterprise entity is determined based on the second evaluation classification result, the frequency of occurrence of each evaluation category in the evaluation classification result is counted, and the evaluation category with the highest frequency is determined as the evaluation category of the enterprise entity. If the evaluation category with the highest frequency is not unique, the confidences of each of the tied evaluation categories are summed, and the evaluation category with the largest summed confidence is determined as the evaluation category of the enterprise entity. If the evaluation category with the largest confidence is still not unique, the positions of those evaluation categories in the text to be analyzed are analyzed, and the evaluation category that appears earliest is determined as the evaluation category of the enterprise entity. The confidence is a weight corresponding to the evaluation category, that is, the degree of belief that the evaluation category of the text to be analyzed is that evaluation category; it is determined by a person skilled in the art through analysis of information related to the enterprise entity, and is not limited herein.
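The three-stage decision (frequency, then summed confidence, then earliest position) can be sketched as below. The sketch assumes that the second evaluation classification result is available as a list of (category, confidence) pairs, one per sentence, in the order in which the sentences occur in the text to be analyzed; this data layout and the function name are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def vote_category(sentence_results):
    """Pick a category by frequency, then summed confidence, then earliest position."""
    if not sentence_results:
        raise ValueError("at least one sentence result is required")
    counts = Counter(cat for cat, _ in sentence_results)
    top = max(counts.values())
    tied = [c for c in counts if counts[c] == top]
    if len(tied) == 1:
        return tied[0]
    # Tie on frequency: sum the confidences of each tied category.
    conf_sum = defaultdict(float)
    for cat, conf in sentence_results:
        if cat in tied:
            conf_sum[cat] += conf
    best = max(conf_sum.values())
    tied = [c for c in tied if conf_sum[c] == best]
    if len(tied) == 1:
        return tied[0]
    # Tie on confidence as well: take the category that appears earliest in the text.
    for cat, _ in sentence_results:
        if cat in tied:
            return cat

by_frequency = vote_category([("positive", 0.9), ("negative", 0.8), ("positive", 0.7)])
by_confidence = vote_category([("positive", 0.6), ("negative", 0.9)])
by_position = vote_category([("neutral", 0.5), ("positive", 0.5)])
```

The exact float comparison in the second stage is adequate for a sketch; a tolerance could be used if confidences are computed rather than assigned.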
According to the technical scheme, the sentence set is formed by extracting the sentences corresponding to the enterprise entities in the text to be analyzed, the key features and the basic features are respectively extracted from the text information in the sentence set of the enterprise entities, the evaluation categories of the enterprise entities are identified based on the key features and the basic features, the key features and the basic features have strong relevance, the enterprise entities can be evaluated and classified more comprehensively, and the accuracy of the evaluation and classification of the enterprise entities is improved.
Example two
Fig. 3 is a schematic structural diagram of an evaluation classification apparatus for business entities according to a second embodiment of the present invention. As shown in fig. 3, the apparatus includes:
The statement set determining module 210 is configured to obtain a text to be analyzed, identify an enterprise entity in the text to be analyzed, and determine a statement set corresponding to the enterprise entity based on the text to be analyzed.
The evaluation classification module 220 is configured to, for a statement set corresponding to any business entity, perform evaluation classification on the business entity based on a basic text feature of text information in the statement set and/or a key text feature corresponding to the text information, so as to obtain an evaluation category of the business entity in the text to be analyzed.
Optionally, the evaluation classification module 220 is configured to input the text information in the sentence set into a multitask prediction model to obtain an evaluation category of the enterprise entity in the text to be analyzed, where the multitask prediction model includes a text processing submodel and a classification submodel, where the classification submodel is configured to perform basic feature extraction on the input text information, receive a key text feature transmitted by the text processing submodel, and perform evaluation classification on the enterprise entity based on the key text feature and the basic text feature.
Optionally, the text processing submodel includes a key text feature extraction module and a key text generation module; the classification submodel comprises a basic text feature extraction module and an evaluation classification module, wherein the evaluation classification module is connected with the key text feature extraction module, receives the key text features transmitted by the key text feature extraction module, splices the key text features and the basic text features, and evaluates and classifies the enterprise entities based on the spliced features.
Optionally, the text processing sub-model is an abstract generation sub-model.
Optionally, the evaluation classification module 220 includes a key text information determination unit and an evaluation classification unit; the key text information determining unit is used for determining key text information corresponding to the text information in the sentence set; and the evaluation classification unit is used for carrying out evaluation classification on the enterprise entity based on the basic text features of the text information in the statement set and/or the key text features of the key text information.
Optionally, the key text information determining unit is configured to generate a text summary based on the text information in the sentence set, and use the text summary as the key text information; or extracting at least one key sentence in the text information in the sentence set, and taking the at least one key sentence as the key text information.
Optionally, the evaluation and classification unit is configured to extract basic text features of the text information in the sentence set, extract key text features of the key text information, perform feature concatenation on the key text features and the basic text features, and perform evaluation and classification on the concatenation features.
Optionally, the evaluation classification unit is configured to extract a key text feature of the key text information, and perform evaluation classification based on the key text feature to obtain a first evaluation classification result; extracting basic text features of text information in the sentence set, and performing evaluation classification based on the basic text features to obtain a second evaluation classification result; determining a rating category for the business entity based on the first rating classification result and/or the second rating classification result.
Optionally, the evaluation classification unit is further configured to splice the sentences in the sentence set according to the sequence in the text to be analyzed to form a long text, extract basic text features of the long text, and perform evaluation classification based on the basic text features to obtain a second evaluation classification result; or respectively extracting basic text features of each sentence in the sentence set, and performing evaluation classification on the sentences based on the basic text features of each sentence to obtain a second evaluation classification result, wherein the second evaluation classification result comprises the evaluation category of each sentence.
Optionally, the sentence set determining module 210 includes an enterprise entity identifying unit, where the enterprise entity identifying unit is configured to perform sentence division processing on the text to be analyzed, and input each obtained sentence into the enterprise entity identifying model to obtain an enterprise entity corresponding to each sentence.
Optionally, the apparatus further includes an enterprise entity selecting unit which, after the enterprise entities corresponding to each sentence are obtained, is configured to match the obtained enterprise entities against a white list and remove enterprise entities that are not successfully matched; and/or to count the frequency of occurrence of each enterprise entity in the text to be analyzed and take the enterprise entity with the highest frequency of occurrence as the enterprise entity for evaluation classification.
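The behaviour of the enterprise entity selecting unit can be illustrated with a short sketch that applies both options in sequence (white list filtering followed by frequency counting). The entity names and the function name `select_entity` are hypothetical, and applying both steps together is only one of the combinations permitted by the "and/or" wording.

```python
from collections import Counter

def select_entity(entities_per_sentence, whitelist):
    """Drop entities that are not on the white list, then return the most frequent one."""
    kept = [e for sent in entities_per_sentence for e in sent if e in whitelist]
    if not kept:
        return None  # no recognized entity matched the white list
    # most_common orders equal counts by first occurrence
    return Counter(kept).most_common(1)[0][0]

recognized = [["Acme Corp", "Foo Ltd"], ["Acme Corp"], ["Bar Inc"]]  # per-sentence entities
whitelist = {"Acme Corp", "Bar Inc"}
selected = select_entity(recognized, whitelist)
```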
The evaluation classification device for the enterprise entity provided by the embodiment of the invention can execute the evaluation classification method for the enterprise entity provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Example three
Fig. 4 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention. The electronic device 10 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 4, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from a storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 can also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
Processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. Processor 11 performs the various methods and processes described above, such as the evaluation classification method for enterprise entities.
In some embodiments, the evaluation classification method for enterprise entities may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When loaded into RAM 13 and executed by processor 11, the computer program may perform one or more of the steps of the above-described evaluation classification method for enterprise entities. Alternatively, in other embodiments, the processor 11 may be configured in any other suitable manner (e.g., by means of firmware) to perform the evaluation classification method for enterprise entities.
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for implementing the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
Example four
The fourth embodiment of the present invention further provides a computer-readable storage medium, where a computer instruction is stored in the computer-readable storage medium, and the computer instruction is used to enable a processor to execute the method for evaluating and classifying business entities, where the method includes:
acquiring a text to be analyzed, identifying an enterprise entity in the text to be analyzed, and determining a statement set corresponding to the enterprise entity based on the text to be analyzed; and for a statement set corresponding to any enterprise entity, performing evaluation classification on the enterprise entity based on basic text features of text information in the statement set and/or key text features corresponding to the text information to obtain the evaluation category of the enterprise entity in the text to be analyzed.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service are overcome.
It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present invention may be executed in parallel, sequentially, or in a different order, which is not limited herein as long as the desired results of the technical solution of the present invention can be achieved.
Example five
The fifth embodiment of the present invention further provides a computer program product, which includes a computer program; when the computer program is executed by a processor, the evaluation classification method for enterprise entities according to any embodiment of the present invention is implemented.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (15)

1. A method for rating and classifying business entities, comprising:
acquiring a text to be analyzed, identifying an enterprise entity in the text to be analyzed, and determining a statement set corresponding to the enterprise entity based on the text to be analyzed;
and for a statement set corresponding to any enterprise entity, performing evaluation classification on the enterprise entity based on basic text features of text information in the statement set and/or key text features corresponding to the text information to obtain the evaluation category of the enterprise entity in the text to be analyzed.
2. The method according to claim 1, wherein the performing of rating classification on the business entity based on the basic text features of the text information in the sentence collection and/or the key text features corresponding to the text information comprises:
inputting the text information in the statement set into a multitask prediction model to obtain the evaluation category of the enterprise entity in the text to be analyzed, wherein the multitask prediction model comprises a text processing submodel and a classification submodel, the classification submodel is used for extracting basic features of the input text information, receiving key text features transmitted by the text processing submodel, and evaluating and classifying the enterprise entity based on the key text features and the basic text features.
3. The method of claim 2, wherein the text processing submodel comprises a key text feature extraction module and a key text generation module;
the classification submodel comprises a basic text feature extraction module and an evaluation classification module, wherein the evaluation classification module is connected with the key text feature extraction module, receives the key text features transmitted by the key text feature extraction module, splices the key text features with the basic text features, and evaluates and classifies the enterprise entities based on the spliced features.
4. The method of claim 3, wherein the text processing submodel is a summary generation submodel.
5. The method of claim 1, wherein the performing the evaluation classification on the business entity based on the basic text features of the text information in the sentence collection and/or the key text features corresponding to the text information comprises:
determining key text information corresponding to the text information in the sentence set;
and performing evaluation classification on the enterprise entity based on basic text features of text information in the sentence set and/or key text features of the key text information.
6. The method of claim 5, wherein the determining key text information corresponding to the text information in the sentence set comprises:
generating a text abstract based on the text information in the sentence set, and taking the text abstract as the key text information; or,
extracting at least one key sentence from the text information in the sentence set, and taking the at least one key sentence as the key text information.
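The key-sentence branch of claim 6 can be illustrated with a minimal sketch. The claim does not specify how key sentences are selected, so the average-word-frequency score below, along with the names `extract_key_sentences` and `top_k`, is an assumption made purely for illustration.

```python
import re
from collections import Counter


def extract_key_sentences(sentences, top_k=1):
    """Return the top_k sentences of a sentence set, in their original order.

    Scoring by average word frequency is a hypothetical stand-in for
    whatever key-sentence extractor an implementation actually uses.
    """
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    # Word frequencies over the whole sentence set.
    freq = Counter(word for words in tokenized for word in words)

    def score(words):
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(tokenized[i]), reverse=True)
    chosen = sorted(ranked[:top_k])  # restore document order
    return [sentences[i] for i in chosen]


sentence_set = [
    "Acme profit rose.",
    "Acme profit rose sharply on strong profit.",
    "Weather was mild.",
]
key_text = extract_key_sentences(sentence_set, top_k=1)
```

The returned sentences would then serve as the key text information from which key text features are extracted.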
7. The method of claim 5, wherein the performing evaluation classification on the enterprise entity based on basic text features of text information in the sentence set and/or key text features of the key text information comprises:
extracting basic text features of text information in the sentence set, extracting key text features of the key text information, performing feature splicing on the key text features and the basic text features, and evaluating and classifying the spliced features.
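The feature splicing of claim 7 amounts to concatenating two feature vectors and classifying the result. The sketch below assumes the features are plain lists of floats and uses a linear layer with softmax as the classifier; the weights, labels and function names are hypothetical and are not taken from the patent.

```python
import math


def splice_features(key_features, basic_features):
    """Concatenate key text features and basic text features."""
    return list(key_features) + list(basic_features)


def classify(spliced, weights, biases, labels):
    """Linear layer plus softmax over the spliced features (assumed classifier)."""
    logits = [sum(w * x for w, x in zip(row, spliced)) + b
              for row, b in zip(weights, biases)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs


# Hypothetical feature values and weights, for illustration only.
spliced = splice_features([1.0, 0.0], [0.5])
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
category, probs = classify(spliced, weights, [0.0, 0.0, 0.0],
                           ["positive", "neutral", "negative"])
```

In practice both feature vectors would come from learned encoders; only the concatenate-then-classify structure is meant to mirror the claim.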
8. The method of claim 5, wherein the performing evaluation classification on the enterprise entity based on basic text features of text information in the sentence set and/or key text features of the key text information comprises:
extracting key text features of the key text information, and performing evaluation classification based on the key text features to obtain a first evaluation classification result;
extracting basic text features of text information in the sentence set, and performing evaluation classification based on the basic text features to obtain a second evaluation classification result;
determining the evaluation category of the enterprise entity based on the first evaluation classification result and/or the second evaluation classification result.
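Claim 8 leaves open how the first and second evaluation classification results are combined. One possible reading, sketched below, treats each result as a mapping from category to probability and takes a weighted average; the weighting scheme and the names `combine_results` and `key_weight` are assumptions, not the patent's stated method.

```python
def combine_results(first, second, key_weight=0.5):
    """Fuse two classification results given as {category: probability}.

    first:  result based on key text features
    second: result based on basic text features
    key_weight: hypothetical weight given to the first result (0.0 to 1.0)
    """
    categories = sorted(set(first) | set(second))
    combined = {
        c: key_weight * first.get(c, 0.0) + (1.0 - key_weight) * second.get(c, 0.0)
        for c in categories
    }
    return max(categories, key=combined.get), combined


first_result = {"positive": 0.7, "negative": 0.3}
second_result = {"positive": 0.4, "negative": 0.6}
category, scores = combine_results(first_result, second_result)
```

Setting `key_weight` to 1.0 or 0.0 reduces this to using only the first or only the second result, which corresponds to the "or" alternatives in the claim.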
9. The method of claim 8, wherein the performing evaluation classification based on the text information in the sentence set to obtain the second evaluation classification result comprises:
splicing the sentences in the sentence set according to their order in the text to be analyzed to form a long text, extracting basic text features of the long text, and performing evaluation classification based on the basic text features to obtain the second evaluation classification result; or,
extracting basic text features of each sentence in the sentence set respectively, and performing evaluation classification on each sentence based on its basic text features to obtain the second evaluation classification result, wherein the second evaluation classification result comprises the evaluation category of each sentence.
10. The method according to any one of claims 1-9, wherein the identifying the enterprise entity in the text to be analyzed comprises:
splitting the text to be analyzed into sentences, and inputting each obtained sentence into an enterprise entity recognition model to obtain the enterprise entity corresponding to each sentence.
11. The method of claim 10, wherein after obtaining the enterprise entity corresponding to each sentence, the method further comprises:
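The sentence splitting and per-sentence entity recognition of claim 10 can be sketched as follows. The enterprise entity recognition model is replaced here by a simple substring lookup against a list of known names, which is an assumption for illustration only; `split_sentences`, `recognize_entities` and `build_sentence_sets` are hypothetical names.

```python
import re
from collections import defaultdict

# A sentence is a run of non-terminator characters plus an optional
# terminator (Western or Chinese punctuation).
SENTENCE_PATTERN = re.compile(r"[^.!?。！？]+[.!?。！？]?")


def split_sentences(text):
    """Split the text to be analyzed into sentences."""
    return [s.strip() for s in SENTENCE_PATTERN.findall(text) if s.strip()]


def recognize_entities(sentence, known_entities):
    """Stand-in for the entity recognition model: substring lookup."""
    return [name for name in known_entities if name in sentence]


def build_sentence_sets(text, known_entities):
    """Group sentences by the enterprise entity they mention."""
    sentence_sets = defaultdict(list)
    for sentence in split_sentences(text):
        for entity in recognize_entities(sentence, known_entities):
            sentence_sets[entity].append(sentence)
    return dict(sentence_sets)


text = "Acme posted record profit. Beta faces a lawsuit. Acme will expand."
sets = build_sentence_sets(text, ["Acme", "Beta"])
```

A real system would substitute a trained named-entity recognizer for `recognize_entities`; the grouping step that yields one sentence set per entity stays the same.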
matching the obtained enterprise entities against a white list, and removing the enterprise entities that are not successfully matched;
and/or,
counting the number of occurrences of each enterprise entity in the text to be analyzed, and taking the enterprise entity with the largest number of occurrences as the enterprise entity for evaluation classification.
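The two post-processing steps of claim 11 can be sketched with standard-library tools. The function names and the sample white list are hypothetical; the claim does not say how ties between equally frequent entities are resolved, so this sketch simply keeps the one encountered first.

```python
from collections import Counter


def filter_by_whitelist(entities, whitelist):
    """Drop recognized entities that do not match the white list."""
    return [e for e in entities if e in whitelist]


def most_frequent_entity(entities):
    """Return the entity with the most occurrences, or None if there are none."""
    if not entities:
        return None
    return Counter(entities).most_common(1)[0][0]


recognized = ["Acme", "Beta", "Acme", "Gamma"]   # one entry per mention
whitelist = {"Acme", "Beta"}                     # hypothetical white list
kept = filter_by_whitelist(recognized, whitelist)
target = most_frequent_entity(kept)
```

Because the claim joins the two steps with "and/or", either function can also be applied on its own.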
12. An evaluation classification apparatus for enterprise entities, comprising:
a sentence set determining module, configured to acquire a text to be analyzed, identify an enterprise entity in the text to be analyzed, and determine a sentence set corresponding to the enterprise entity based on the text to be analyzed;
and an evaluation classification module, configured to, for the sentence set corresponding to any enterprise entity, perform evaluation classification on the enterprise entity based on basic text features of text information in the sentence set and/or key text features corresponding to the text information, to obtain the evaluation category of the enterprise entity in the text to be analyzed.
13. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program being executed by the at least one processor to enable the at least one processor to perform the evaluation classification method for enterprise entities of any one of claims 1-11.
14. A computer-readable storage medium storing computer instructions which, when executed, cause a processor to perform the evaluation classification method for enterprise entities of any one of claims 1-11.
15. A computer program product, characterized in that the computer program product comprises a computer program which, when executed by a processor, implements the evaluation classification method for enterprise entities according to any one of claims 1-11.
CN202211184499.7A 2022-09-27 2022-09-27 Evaluation classification method, device, equipment and storage medium for enterprise entities Pending CN115640802A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211184499.7A CN115640802A (en) 2022-09-27 2022-09-27 Evaluation classification method, device, equipment and storage medium for enterprise entities

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211184499.7A CN115640802A (en) 2022-09-27 2022-09-27 Evaluation classification method, device, equipment and storage medium for enterprise entities

Publications (1)

Publication Number Publication Date
CN115640802A true CN115640802A (en) 2023-01-24

Family

ID=84942261

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211184499.7A Pending CN115640802A (en) 2022-09-27 2022-09-27 Evaluation classification method, device, equipment and storage medium for enterprise entities

Country Status (1)

Country Link
CN (1) CN115640802A (en)

Similar Documents

Publication Publication Date Title
CN110909165A (en) Data processing method, device, medium and electronic equipment
CN115099239B (en) Resource identification method, device, equipment and storage medium
CN113051380A (en) Information generation method and device, electronic equipment and storage medium
CN111782793A (en) Intelligent customer service processing method, system and equipment
CN112148841A (en) Object classification and classification model construction method and device
CN114880498B (en) Event information display method and device, equipment and medium
CN114417974B (en) Model training method, information processing device, electronic equipment and medium
CN114118049B (en) Information acquisition method, device, electronic equipment and storage medium
CN113360672B (en) Method, apparatus, device, medium and product for generating knowledge graph
CN115034762A (en) Post recommendation method and device, storage medium, electronic equipment and product
CN115640802A (en) Evaluation classification method, device, equipment and storage medium for enterprise entities
CN114417029A (en) Model training method and device, electronic equipment and storage medium
CN114116688A (en) Data processing and data quality inspection method, device and readable storage medium
CN113806541A (en) Emotion classification method and emotion classification model training method and device
CN113051911A (en) Method, apparatus, device, medium, and program product for extracting sensitive word
CN113239273A (en) Method, device, equipment and storage medium for generating text
CN112818972A (en) Method and device for detecting interest point image, electronic equipment and storage medium
CN114492409B (en) Method and device for evaluating file content, electronic equipment and program product
CN116244740B (en) Log desensitization method and device, electronic equipment and storage medium
CN117633226A (en) Classification method and device, storage medium and electronic equipment
CN115619412A (en) Risk management and control method, device, equipment and storage medium
CN115935054A (en) Information pushing method and device, electronic equipment and storage medium
CN117574168A (en) Information report generation method and device
CN117708678A (en) Text classification grading method, device, equipment and storage medium
CN114218478A (en) Recommendation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination