CN115906830A - Financial information feature extraction method and system based on feature theme and storage medium - Google Patents

Financial information feature extraction method and system based on feature theme and storage medium

Info

Publication number
CN115906830A
Authority
CN
China
Prior art keywords
financial information
characteristic
financial
feature
library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211310184.2A
Other languages
Chinese (zh)
Inventor
王擎
董青马
宋磊
顾见军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Digital Technology Co ltd
Original Assignee
Chengdu Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Digital Technology Co ltd
Priority to CN202211310184.2A
Publication of CN115906830A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a method, a system and a storage medium for extracting financial information features based on feature topics. Financial information is first preprocessed; word segmentation filtering, syntax filtering and feature topic model filtering are then applied according to the multi-dimensional features held in a financial information feature topic library; finally, keywords are calculated and extracted to obtain the features of the financial information. The whole extraction process is based on the feature topics of financial information, and accurate feature extraction is achieved through repeated filtering and calculation. First, without damaging the core content of the financial information, the method greatly reduces the number of word segments to be processed, which lowers the dimension of the word segment vector space, simplifies calculation, and improves the speed and efficiency of financial information processing. Second, relying on the feature topic library, the feature values pre-extracted from the financial information are evaluated efficiently by a feature evaluation function based on factor analysis, which better meets the needs of financial information analysis.

Description

Financial information feature extraction method and system based on feature theme and storage medium
Technical Field
The invention relates to the technical field of financial data processing, in particular to a method and a system for extracting financial information characteristics based on characteristic subjects and a storage medium.
Background
With economic globalization and the rapid development of financial innovation, financial institutions face increasingly diverse and complex financial information, and analyzing it efficiently has become increasingly important.
Financial information analysis is the process of comprehensively evaluating, analyzing and mining financial information data from all sources, thereby converting the data into useful information that meets the needs of financial users. Throughout this process, valuable clues can be found only by extracting and analyzing the features of the financial information data; doing so uncovers the implicit laws of financial market change and supports decision making. The features of financial information are therefore crucial to its mining and analysis.
Existing feature extraction for all kinds of information or data is based on feature extraction for ordinary text: feature words extracted from the text are quantified so that the text is represented by them. The text is thereby converted from unstructured raw text into structured information that a computer can recognize and process; that is, the information is abstracted scientifically and a mathematical model is built to describe and stand in for the text.
Current research on text representation focuses mainly on the choice of representation model and the choice of feature word selection algorithm. The basic unit used to represent text is generally called a text feature or feature item. A feature item must have the following properties: 1. it must genuinely identify the text content; 2. it must be able to distinguish the target text from other texts; 3. the number of feature items must not be too large; 4. feature items must be easy to separate.
The existing text information feature extraction methods mainly include the following two types:
(1) The feature extraction method based on statistics comprises the following steps: the method evaluates each feature in the feature set by constructing an evaluation function, and scores each feature, so that each word obtains an evaluation value, also called weight. And then sorting all the features according to the weight value, and extracting a preset number of optimal features as a feature subset of the extraction result. Obviously, for this type of algorithm, the main factor that determines the effect of text feature extraction is the quality of the evaluation function.
(2) The feature extraction method based on semantics: this mainly includes text feature extraction based on a contextual framework, in which semantic analysis is integrated into a statistical algorithm while the basic approach remains statistics followed by extraction; text feature extraction based on ontology; and concept feature extraction based on HowNet.
These two kinds of feature extraction methods approach the extraction of text features from different angles.
With the development of society and the progress of technology, financial information has become vast and rich, with a multi-source, heterogeneous structure. This massive, heterogeneous and distributed financial information lacks computer-understandable semantics, so it cannot be used or understood directly and must be mined. Traditional information mining handles structured data, whereas financial information today is semi-structured or unstructured. The first problem in financial information analysis is therefore how to represent the information reasonably in a computer, so that the representation contains enough information to reflect the characteristics of the financial information without being too complex for analysis algorithms to process. Feature extraction for financial information is thus of great importance.
However, current feature extraction for financial information cannot meet the new analysis requirements, for the following reasons:
(1) Most current text classification systems use words as feature items, called feature words. Feature words serve as an intermediate representation of a document and are used to compute the similarity between documents and between a document and a user target. If all words are used as feature items, the dimension of the feature vector becomes too large and the computation too heavy, making text classification almost impossible to complete. Feature extraction therefore commonly uses a vector space model to describe the text vector. The vector space model, however, usually contains a large number of common words that cannot serve as feature items. To improve operating efficiency, the system needs a specialized word segmentation table built for the mining target, which can markedly improve efficiency while preserving the accuracy of feature extraction. A general lexicon usually does not contain feature words specific to finance, so dimension reduction cannot be achieved and the excessive computation seriously hampers the calculation and extraction of text features. Given today's massive multi-source heterogeneous financial information, traditional feature extraction can meet neither the requirement of high-speed computation nor that of real-time extraction.
(2) To extract features faster, traditional feature extraction requires feature dimension reduction, which relies on a traditional classifier algorithm to classify the financial information text. Because traditional classifier algorithms are built on traditional corpora, they do not capture the semantic features specific to financial information. They therefore can neither classify the specialized content of financial information nor reduce the feature dimension, and cannot meet the requirements of efficient calculation and analysis of financial information.
(3) In traditional feature extraction it is increasingly common to select features with an evaluation function: the feature selection algorithm constructs an evaluation function and selects a predetermined number of the best features as the feature subset. Each traditional evaluation method has its own word selection criterion, by which a feature word set of limited size is selected from all the words in the text set. Because these evaluation functions are structurally simple, they do not fit the particular environment of financial information, and suitable features cannot be extracted by the traditional methods.
(4) Traditional evaluation functions take into account the case in which a word does not occur. That case contributes little to judging the text category and introduces unnecessary interference. Selection accuracy drops in particular when processing data whose class distribution and feature value distribution are highly unbalanced, so the requirements of financial information feature extraction cannot be met.
Disclosure of Invention
To solve the above technical problems, the present application provides a method and a system for extracting financial information features based on feature topics, and a storage medium. After the financial information is preprocessed, word segmentation filtering, syntax filtering and feature topic model filtering are carried out according to the multi-dimensional features in a financial information feature topic library, and keywords are then calculated and extracted to obtain the features of the financial information. The whole extraction process is based on the feature topics of financial information, and accurate extraction of financial information features is achieved through repeated filtering and calculation.
The application is realized by the following technical scheme:
The application provides a financial information feature extraction method based on feature topics, comprising the following steps. The input financial information is preprocessed to handle data noise and data inconsistency and to detect data anomalies. The method maintains a financial information feature topic library, which holds the features of existing financial information and is iteratively updated each time financial information features are extracted. Word segmentation of the financial information is performed according to the word segmentation features in the library, followed by word segmentation filtering; syntax filtering is performed according to the syntactic features in the library; and feature topic model filtering is performed according to the feature topic models in the library. Keywords are then calculated and extracted from the filtered financial information, the features of the financial information are generated from the extracted keywords, and the generated features are fed back into the financial information feature topic library to iterate and update it.
The input financial information may be structured, semi-structured or unstructured data.
The application provides a financial information feature extraction system based on feature topics, comprising:
the input module is used for inputting financial information;
the financial information preprocessing module is used for preprocessing financial information, preprocessing data noise and data inconsistency of the financial information and detecting abnormal points;
the financial information characteristic subject library module comprises the characteristics of financial information, and the characteristic library can continuously iterate along with the extraction of the financial information characteristics each time;
the financial information word segmentation filtering module is used for carrying out financial information word segmentation according to word segmentation characteristics in the financial information characteristic subject database and carrying out word segmentation filtering;
the financial information syntactic filtering module is used for carrying out financial information syntactic filtering according to syntactic characteristics in the financial information characteristic subject library;
the characteristic theme model filtering module is used for filtering the characteristic theme model according to the characteristic theme model in the financial information characteristic theme library;
the keyword extraction module is used for calculating and extracting keywords of the filtered financial information;
and the financial information characteristic module is used for generating financial information characteristics according to the extracted keywords and inputting the generated financial information characteristics into the financial information characteristic library again to realize the updating of the financial information characteristic library.
The present application provides a readable storage medium storing a computer program which, when executed by a processor, implements the financial intelligence feature extraction method.
Compared with the prior art, the method has the following beneficial effects:
First, without damaging the core content of the financial information, the financial information feature topic library greatly reduces the number of word segments to be processed, which lowers the dimension of the word segment vector space, simplifies calculation, and improves the speed and efficiency of financial information processing. Second, relying on the feature topic library, the feature values pre-extracted from the financial information can be evaluated efficiently by a feature evaluation function based on factor analysis, which better meets the needs of financial information analysis. Third, the method remedies the market's lack of specialized feature topic analysis for financial information and provides a professional support tool for financial information analysis.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention.
FIG. 1 is a flow chart of the financial intelligence feature extraction based on feature topics in an embodiment of the present invention;
FIG. 2 is a flow chart of financial information preprocessing according to an embodiment of the present invention;
FIG. 3 is a flow chart of the financial intelligence feature topic library in an embodiment of the present invention;
FIG. 4 is a flow chart of financial information segmentation filtering according to an embodiment of the present invention;
FIG. 5 is a flow chart of syntax filtering of financial intelligence in an embodiment of the present invention;
FIG. 6 is a flow chart of the financial intelligence topic model filtering in an embodiment of the present invention;
FIG. 7 is a flow chart of keyword calculation and extraction in an embodiment of the present invention;
FIG. 8 is a flow chart of financial intelligence characterization in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments. It is to be understood that the described embodiments are only a few embodiments of the present invention, and not all embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be obtained by a person skilled in the art without inventive efforts based on the embodiments of the present invention, are within the scope of protection of the present invention.
In addition, the embodiments of the present invention and the features of the embodiments may be combined with each other without conflict. It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
As shown in fig. 1, the method for extracting financial information features based on feature topics disclosed in this embodiment includes the following steps:
S101, financial information is input; the data structure of the financial information includes structured data, semi-structured data and unstructured data;
S102, the financial information is preprocessed: data noise and data inconsistency are handled and abnormal points are detected;
S103, word segmentation of the financial information is performed according to the word segmentation features in the financial information feature topic library, followed by word segmentation filtering;
syntax filtering of the financial information is performed according to the syntactic features in the financial information feature topic library;
feature topic model filtering is performed according to the feature topic models in the financial information feature topic library;
the financial information feature topic library holds the features of all existing financial information and iterates continuously with each extraction of financial information features;
S104, keywords are calculated and extracted from the filtered financial information;
S105, the features of the financial information are generated from the extracted keywords, and the generated features are fed back into the financial information feature topic library to update it.
It should be noted that the word segmentation filtering, syntax filtering and feature topic model filtering in step S103 may be performed sequentially in any order, or simultaneously.
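Since the three filters of step S103 may run in any order or at the same time, one way to picture this is to give every filter the same preprocessed input and collect the results by name. The sketch below is only an illustration under that assumption: `run_filters`, the `Filter` signature and the toy filters are hypothetical names, not part of the application.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical filter signature: takes preprocessed text, returns candidate features.
Filter = Callable[[str], list[str]]

def run_filters(text: str, filters: dict[str, Filter], parallel: bool = True) -> dict[str, list[str]]:
    """Apply every S103 filter to the same input, sequentially or simultaneously."""
    if not parallel:
        return {name: fn(text) for name, fn in filters.items()}
    with ThreadPoolExecutor(max_workers=max(1, len(filters))) as pool:
        futures = {name: pool.submit(fn, text) for name, fn in filters.items()}
        return {name: future.result() for name, future in futures.items()}

# Toy stand-ins for the three filters (assumed for illustration only).
demo_filters: dict[str, Filter] = {
    "word": lambda t: [w for w in t.split() if w in {"bond", "yield"}],
    "syntax": lambda t: [t] if " rises" in t else [],
    "model": lambda t: sorted(set(t.split()) & {"yield", "default"}),
}
sequential = run_filters("bond yield rises", demo_filters, parallel=False)
simultaneous = run_filters("bond yield rises", demo_filters, parallel=True)
```

Because each filter reads the same input and writes to its own key, both modes give the same result, which is what makes the ordering in S103 free.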
Optionally, as shown in fig. 2, in some embodiments, step S102 specifically includes:
S10201, on entering the financial information preprocessing module, the input financial information is first cleaned, which handles data noise and data inconsistency in the financial information;
S10202, data anomaly detection is then performed on the input financial information so that it meets the format requirements of subsequent financial information analysis and processing.
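The application does not specify how cleaning or anomaly detection is carried out. As a rough sketch of what S10201 and S10202 could look like for text input, the following assumes that "noise" means leftover markup and irregular whitespace, that "inconsistency" means character width variants, and that an "abnormal" record is one that is too short or mostly symbols. The function names and thresholds are hypothetical.

```python
import re
import unicodedata

def clean(text: str) -> str:
    """S10201 sketch: normalize character variants, strip markup, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)   # e.g. full-width letters -> ASCII letters
    text = re.sub(r"<[^>]+>", " ", text)         # drop leftover markup tags (assumed noise)
    return re.sub(r"\s+", " ", text).strip()

def is_abnormal(text: str, min_chars: int = 10, max_symbol_ratio: float = 0.5) -> bool:
    """S10202 sketch: flag records that are too short or mostly non-alphanumeric."""
    if len(text) < min_chars:
        return True
    symbols = sum(1 for ch in text if not (ch.isalnum() or ch.isspace()))
    return symbols / len(text) > max_symbol_ratio

record = clean("  <p>Bond   yield</p> rises ")
```

Records flagged by `is_abnormal` would be set aside before the filtering steps; what happens to them afterwards is left open here.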
As shown in fig. 3, the financial information feature topic library is the core of the present application, and it possesses the relevant features of the existing financial information and the feature topic formed by these features. The financial information characteristic theme library mainly comprises a financial information characteristic theme word library, a financial information characteristic theme syntax library and a financial information characteristic theme model library.
The financial information characteristic subject thesaurus comprises various participles related to the existing financial information; the finance information characteristic theme syntax library contains various syntaxes about the existing finance information; the financial intelligence characteristic topic model library contains various topic models related to the existing financial intelligence.
Each time financial information enters the financial information feature topic library, a feature topic retrieval is performed first to quickly look up any existing feature topic for that information. If one is found, the relevant feature topic library is entered to match feature topics; otherwise the step ends directly.
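The library structure and the retrieve-then-match step can be pictured with a small data structure. This is a minimal sketch, assuming that each of the three sub-libraries is keyed by topic name and that retrieval means finding topics whose word library overlaps the input tokens. `FeatureTopicLibrary` and its field names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class FeatureTopicLibrary:
    """Hypothetical container mirroring the three sub-libraries named in the text."""
    thesaurus: dict[str, set[str]] = field(default_factory=dict)       # topic -> word segments
    syntax: dict[str, list[str]] = field(default_factory=dict)         # topic -> syntax patterns
    models: dict[str, dict[str, float]] = field(default_factory=dict)  # topic -> term weights

    def retrieve(self, tokens: list[str]) -> list[str]:
        """Return the existing topics whose thesaurus shares a term with the tokens."""
        present = set(tokens)
        return sorted(t for t, terms in self.thesaurus.items() if terms & present)

lib = FeatureTopicLibrary(thesaurus={"credit": {"default", "bond"}, "fx": {"forex"}})
matched = lib.retrieve(["bond", "rally"])   # topics found: matching proceeds
unmatched = lib.retrieve(["weather"])       # nothing found: the step ends
```

An empty result corresponds to the "otherwise the step ends directly" branch.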
Optionally, as shown in fig. 4, in some embodiments, the word segmentation filtering in step S103 mainly includes:
first, calling the financial information feature topic library;
then, retrieving the relevant word segmentation feature topics and segmenting the financial information according to the retrieval result;
finally, performing word segmentation filtering according to the topic word segmentation and retaining the word segments that meet the requirements.
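The application does not name a segmentation algorithm. One common lexicon-driven choice is forward maximum matching, shown below as an assumed stand-in: at each position the longest entry of the topic word library is taken, and the filter then keeps only segments found in that library. The names `segment` and `filter_tokens` and the sample lexicon are hypothetical.

```python
def segment(text: str, lexicon: set[str]) -> list[str]:
    """Forward maximum matching: at each position take the longest lexicon entry."""
    max_len = max([len(w) for w in lexicon] + [1])
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:   # fall back to a single character
                tokens.append(piece)
                i += size
                break
    return tokens

def filter_tokens(tokens: list[str], lexicon: set[str]) -> list[str]:
    """Keep only the segments that appear in the topic word library."""
    return [t for t in tokens if t in lexicon]

topic_words = {"bond", "yield"}                # assumed excerpt of a topic word library
tokens = segment("bondyieldrises", topic_words)
kept = filter_tokens(tokens, topic_words)
```

Unsegmented text is used in the example because the method targets text without word delimiters; characters that match no entry fall out at the filtering stage, which is where the dimension reduction comes from.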
Optionally, as shown in fig. 5, in some embodiments, the syntax filtering in step S103 mainly includes:
first, calling the financial information feature topic library;
then, retrieving the relevant syntactic feature topics and performing syntactic analysis of the financial information according to the retrieval result;
finally, performing syntax filtering according to the topic syntax and retaining the financial information sentences that meet the requirements.
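How syntactic features are stored is not described. As a deliberately simplified sketch, the example below assumes each topic syntax is a regular expression over a sentence and keeps the sentences that match at least one pattern. A real system might use a parser instead; `syntax_filter` and the patterns are hypothetical.

```python
import re

def syntax_filter(text: str, patterns: list[str]) -> list[str]:
    """Keep the sentences that match at least one topic syntax pattern."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return [s for s in sentences if any(c.search(s) for c in compiled)]

topic_patterns = [r"\braised\b.*\brates?\b", r"\bdefault(ed)?\s+on\b"]  # assumed patterns
kept_sentences = syntax_filter(
    "The bank raised rates. The weather was fine. Issuer X defaulted on its bond.",
    topic_patterns,
)
```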
Optionally, as shown in fig. 6, in some embodiments, the feature topic model filtering in step S103 mainly includes:
first, calling the financial information feature topic library;
then, retrieving the relevant feature topics of the financial information and performing feature topic model analysis according to the retrieval result;
finally, performing model filtering according to the feature topic model and retaining the financial information features that meet the requirements.
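The form of a feature topic model is left open in the text. The sketch below assumes the simplest possible form, a mapping from terms to weights per topic: the best-matching topic is the one giving the tokens the highest total weight, and only tokens whose weight under that topic reaches a threshold are kept. The function names, weights and threshold are hypothetical.

```python
from typing import Optional

def best_topic(tokens: list[str], models: dict[str, dict[str, float]]) -> Optional[str]:
    """Pick the topic whose term weights give the tokens the highest total weight."""
    if not models:
        return None
    return max(sorted(models), key=lambda t: sum(models[t].get(tok, 0.0) for tok in tokens))

def topic_model_filter(tokens: list[str], models: dict[str, dict[str, float]],
                       threshold: float = 0.1) -> list[str]:
    """Keep the tokens that carry enough weight under the best-matching topic."""
    topic = best_topic(tokens, models)
    if topic is None:
        return []
    weights = models[topic]
    return [tok for tok in tokens if weights.get(tok, 0.0) >= threshold]

demo_models = {
    "credit": {"bond": 0.4, "default": 0.5, "issuer": 0.05},
    "fx": {"forex": 0.6, "bond": 0.05},
}
candidates = ["bond", "issuer", "default", "rally"]
kept_features = topic_model_filter(candidates, demo_models)
```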
As shown in fig. 1, after word segmentation filtering, syntax filtering and model filtering, the financial information undergoes keyword calculation and extraction. Optionally, as shown in fig. 7, in some embodiments, step S104 mainly includes:
S10401, pooling the features generated by word segmentation filtering, syntax filtering and model filtering;
S10402, calling the feature evaluation function to calculate and evaluate each pooled feature and obtain its score;
S10403, extracting a preset number of features in descending order of score.
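Steps S10401 to S10403 amount to pool, score, and take the top N. The sketch below shows that flow with the evaluation function passed in as a parameter, so it makes no claim about how scores are computed. `select_features` and the sample scores are hypothetical.

```python
from typing import Callable

def select_features(word_feats: list[str], syntax_feats: list[str], model_feats: list[str],
                    evaluate: Callable[[str], float], n: int) -> list[tuple[str, float]]:
    """Pool the three filter outputs, score each feature, return the n best."""
    pooled = list(dict.fromkeys([*word_feats, *syntax_feats, *model_feats]))  # S10401, de-duplicated
    scores = {f: evaluate(f) for f in pooled}                                 # S10402
    ranked = sorted(pooled, key=lambda f: -scores[f])                         # ties keep pooled order
    return [(f, scores[f]) for f in ranked[:n]]                               # S10403

demo_scores = {"bond": 0.9, "yield": 0.4, "default": 0.7, "rating": 0.4}  # assumed scores
top = select_features(["bond", "yield"], ["default"], ["bond", "rating"],
                      evaluate=demo_scores.__getitem__, n=3)
```

The preset feature count `n` is the knob that fixes the dimension of the resulting feature vector.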
Optionally, the feature evaluation function is as follows:
Y=A*F+B
In the above formula, Y is the feature evaluation function, A is the evaluation coefficient based on the feature topic, F is the evaluation factor based on the feature topic, and B is the special factor based on the feature topic.
The core of financial information feature extraction is to perform factor analysis on multiple feature topic indexes and extract evaluation factors. Because the feature topic is represented by a p × k matrix, Y is also a vector.
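Written out in matrix form, Y = A*F + B is:

$$
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{pmatrix}
=
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1k} \\
a_{21} & a_{22} & \cdots & a_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
a_{p1} & a_{p2} & \cdots & a_{pk}
\end{pmatrix}
\begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_k \end{pmatrix}
+
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_p \end{pmatrix}
$$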
In the above formula, $Y = (y_1, y_2, y_3, \ldots, y_p)$ is the vector of the feature evaluation function Y, with p components. It expresses that one financial information feature is composed of several local financial information features, where $y_1, y_2, y_3, \ldots, y_p$ are the local financial information features.
$A = (a_{ij})$ is the coefficient matrix of the evaluation factors $F = (f_1, f_2, f_3, \ldots, f_k)$ and is a p × k matrix. Each $a_{ij}$ ($i = 1, 2, \ldots, p$; $j = 1, 2, \ldots, k$) is called a local feature weight coefficient: it is the feature weight of the i-th variable on the j-th common factor.
$F = (f_1, f_2, f_3, \ldots, f_k)$ is the vector of evaluation factors of $Y = (y_1, y_2, y_3, \ldots, y_p)$ and has k components. $f_1, f_2, f_3, \ldots, f_k$ are the evaluation factors of the local financial information features $y_1, y_2, y_3, \ldots, y_p$; they are mutually independent and distinct.
$B = (b_1, b_2, b_3, \ldots, b_p)$ is the vector of feature-topic-based special factors of $Y = (y_1, y_2, y_3, \ldots, y_p)$ and also has p components. It is the part of Y that the k evaluation factors $f_1, f_2, f_3, \ldots, f_k$ do not account for; the special factors are likewise mutually independent and distinct.
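To make the dimensions concrete, the evaluation function can be computed component by component as $y_i = \sum_j a_{ij} f_j + b_i$. The sketch below does this in plain Python; the function name `evaluate` and the numbers are illustrative assumptions, not values from the application.

```python
def evaluate(A: list[list[float]], F: list[float], B: list[float]) -> list[float]:
    """Return Y with y_i = sum_j a_ij * f_j + b_i (A is p x k, F has k entries, B has p)."""
    if len(A) != len(B) or any(len(row) != len(F) for row in A):
        raise ValueError("shape mismatch: A must be p x k, F of length k, B of length p")
    return [sum(a * f for a, f in zip(row, F)) + b for row, b in zip(A, B)]

# p = 3 local features, k = 2 evaluation factors (made-up numbers).
A = [[0.5, 0.25],
     [1.0, 0.0],
     [0.0, 2.0]]
F = [2.0, 4.0]
B = [0.5, 0.0, -1.0]
Y = evaluate(A, F, B)   # -> [2.5, 2.0, 7.0]
```

The shape check reflects the definitions above: A has one row per local feature and one column per evaluation factor.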
Factor analysis is a multivariate statistical method that starts from the internal correlations among the feature variables of the pooled financial information and reduces variables with intricate relationships to a few composite factors. The original pooled feature data can thus be classified and merged: closely related feature variables are grouped together, yielding several composite indexes that are uncorrelated with one another, i.e. whose information does not overlap.
More specifically, factor analysis starts from the dependency structure within the correlation matrix of the indexes and summarizes variables with overlapping information and complex relationships into a few uncorrelated composite factors. The basic idea is to group the variables by the strength of their correlations, so that variables in the same group are highly correlated while variables in different groups are uncorrelated or only weakly correlated; each group of variables represents one underlying structure, namely a common factor.
The core of the factor analysis method is to perform factor analysis on several composite indexes and extract the common factors, and then to construct a score function as the weighted sum of the factor scores, using the variance contribution rate of each factor as its weight. In this embodiment, the factor analysis method is expressed mathematically in the matrix form Y = A*F + B described above.
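A minimal sketch of such a score function follows; it is an illustration, not the application's own formula. It assumes an orthogonal factor model, in which the variance contributed by factor j is the sum of squared loadings in column j of A, and it assumes the contribution rates are normalized over the extracted factors. The names and numbers are hypothetical.

```python
def variance_contributions(A: list[list[float]]) -> list[float]:
    """g_j = sum_i a_ij ** 2: the variance explained by common factor j."""
    k = len(A[0])
    return [sum(row[j] ** 2 for row in A) for j in range(k)]

def composite_score(A: list[list[float]], factor_scores: list[float]) -> float:
    """Weighted sum of factor scores, weights = normalized variance contributions."""
    g = variance_contributions(A)
    total = sum(g)
    weights = [x / total for x in g]   # contribution rates, assumed to sum to 1
    return sum(w * s for w, s in zip(weights, factor_scores))

loadings = [[0.5, 0.5],
            [0.5, -0.5],
            [0.5, 0.0]]                # p = 3 variables, k = 2 factors (made-up)
contrib = variance_contributions(loadings)
score = composite_score(loadings, [2.0, 1.0])
```

Here the first factor explains 0.75 of the variance and the second 0.5, so the weights are 0.6 and 0.4.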
As shown in fig. 8, after keyword calculation and extraction, the corresponding features are extracted to form the financial information features. The extracted features then undergo iterative feature optimization, and the newly extracted features are fed back into the financial information feature topic library to update it.
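The feedback loop of fig. 8 can be sketched as a merge of newly extracted features into the library. The example assumes the library maps a topic name to a set of features and that "updating" means adding features not yet present. `update_library` is a hypothetical name, and the optimization step is not modeled.

```python
def update_library(library: dict[str, set[str]], topic: str, features: list[str]) -> list[str]:
    """Merge newly extracted features into a topic and return the ones actually added."""
    known = library.setdefault(topic, set())
    added = [f for f in dict.fromkeys(features) if f not in known]  # de-duplicate, keep order
    known.update(added)
    return added

topic_library = {"credit": {"bond"}}
first = update_library(topic_library, "credit", ["bond", "default", "default"])
second = update_library(topic_library, "fx", ["forex"])
```

Returning only the newly added features makes each iteration's effect on the library easy to inspect.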
The method and system solve the problems of traditional financial information feature extraction, namely excessive computation, heavy feature noise and few specialized features. The financial information is preprocessed; word segmentation filtering, syntax filtering and feature topic model filtering are carried out according to the multi-dimensional feature topics in the financial information feature topic library; and keywords are then calculated and extracted to obtain the features of the financial information. The whole extraction process is based on the feature topics of financial information, and accurate extraction of financial information features is achieved through repeated topic-based filtering and calculation.
Based on the above method, the present application provides a financial information feature extraction system based on feature topics, which can be used to implement the method. The system includes:
an input module, configured to input financial information;
a financial information preprocessing module, configured to preprocess the financial information, handling data noise and data inconsistency and detecting anomalous points;
a financial information feature theme library module, which contains the features of financial information and is continuously iterated with each extraction of financial information features;
a financial information word segmentation filtering module, configured to segment the financial information according to the word segmentation features in the financial information feature theme library and to perform word segmentation filtering;
a financial information syntactic filtering module, configured to perform syntactic filtering on the financial information according to the syntactic features in the financial information feature theme library;
a feature theme model filtering module, configured to perform feature theme model filtering according to the feature theme models in the financial information feature theme library;
a keyword extraction module, configured to calculate and extract keywords from the filtered financial information;
and a financial information feature module, configured to generate financial information features from the extracted keywords and to feed the generated features back into the financial information feature theme library so as to update the library.
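By way of non-limiting illustration, the modules listed above might be wired together as in the following sketch. Every name in it (`ThemeLibrary`, `word_filter`, `syntax_filter`, `theme_filter`, `extract_features`) is hypothetical, and the filters are deliberately simplistic: regular expressions stand in for syntactic features, and a plain term set stands in for a theme model. The application does not specify any of these details.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ThemeLibrary:
    """Toy stand-in for the feature theme library; every field is an assumption."""
    words: set[str] = field(default_factory=set)                # word segmentation features
    sentence_patterns: list[str] = field(default_factory=list)  # syntactic features (regexes)
    theme_terms: set[str] = field(default_factory=set)          # terms admitted by a theme model
    feature_counts: Counter = field(default_factory=Counter)    # features learned so far


def preprocess(text: str) -> str:
    # Collapse whitespace and lowercase; real cleaning would do far more.
    return re.sub(r"\s+", " ", text).strip().lower()


def word_filter(text: str, lib: ThemeLibrary) -> list[tuple[str, list[str]]]:
    # Split into sentences, tokenise, and keep only tokens known to the library.
    kept = []
    for sentence in re.split(r"[.!?]", text):
        tokens = [t for t in re.findall(r"[a-z]+", sentence) if t in lib.words]
        if tokens:
            kept.append((sentence.strip(), tokens))
    return kept


def syntax_filter(items, lib: ThemeLibrary):
    # Keep sentences that match at least one syntactic pattern.
    return [(s, toks) for s, toks in items
            if any(re.search(p, s) for p in lib.sentence_patterns)]


def theme_filter(items, lib: ThemeLibrary) -> list[str]:
    # Flatten the surviving tokens and keep those the theme model admits.
    return [t for _, toks in items for t in toks if t in lib.theme_terms]


def extract_features(text: str, lib: ThemeLibrary, top_k: int = 2) -> list[str]:
    items = syntax_filter(word_filter(preprocess(text), lib), lib)
    candidates = theme_filter(items, lib)
    # Placeholder keyword extraction: the most frequent candidates win.
    features = [w for w, _ in Counter(candidates).most_common(top_k)]
    lib.feature_counts.update(features)  # feed the result back into the library
    return features


lib = ThemeLibrary(
    words={"bank", "rate", "loan", "raised", "weather"},
    sentence_patterns=[r"\b(raised|cut)\b"],
    theme_terms={"bank", "rate", "loan"},
)
features = extract_features(
    "The bank raised the rate. The rate on each loan follows. Weather was fine.", lib
)
```

In this toy run only the first sentence passes the syntactic filter, so the extracted features are drawn from "bank" and "rate", and both are recorded in `lib.feature_counts`.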
The present application further provides a readable storage medium storing a computer program which, when executed by a processor, implements the financial information feature extraction method described above.
With this method, as few key features as possible are extracted without damaging the core content of the financial information. A score is calculated for each feature using a feature evaluation function, the features are ranked by score, and the highest-scoring words are selected as feature words. This reduces the dimensionality of the feature vector space, simplifies computation, and improves the speed and efficiency of financial information processing.
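By way of non-limiting illustration, the score-rank-select procedure can be sketched as follows. The evaluation function used here (relative frequency multiplied by a per-feature theme weight) is an assumption chosen only to make the example concrete; the names `score_features` and `select_top_k`, the weights and the candidate list are all hypothetical.

```python
from collections import Counter


def score_features(candidates: list[str],
                   theme_weights: dict[str, float],
                   default_weight: float = 0.1) -> dict[str, float]:
    """Assumed evaluation function: relative frequency times a theme weight."""
    counts = Counter(candidates)
    total = sum(counts.values())
    return {
        feature: (count / total) * theme_weights.get(feature, default_weight)
        for feature, count in counts.items()
    }


def select_top_k(scores: dict[str, float], k: int) -> list[str]:
    """Rank by descending score (ties broken alphabetically) and keep the first k."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [feature for feature, _ in ranked[:k]]


candidates = ["yield", "yield", "bond", "bond", "bond", "market", "today"]
weights = {"yield": 1.0, "bond": 0.8, "market": 0.5}  # made-up theme weights

scores = score_features(candidates, weights)
print(select_top_k(scores, 2))  # ['bond', 'yield']
```

Keeping only the top `k` features is what shrinks the feature vector: seven candidate tokens are reduced here to two feature words.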
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
If the functions are implemented in the form of software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element.
The above embodiments are provided to explain the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above embodiments are merely exemplary embodiments of the present invention and are not intended to limit the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A financial information feature extraction method based on feature themes, characterized in that the method comprises the following steps:
preprocessing input financial information;
performing word segmentation on the financial information according to word segmentation features in the financial information feature theme library, and performing word segmentation filtering;
performing syntactic filtering on the financial information according to syntactic features in the financial information feature theme library;
performing feature theme model filtering according to the feature theme models in the financial information feature theme library;
calculating and extracting keywords from the filtered financial information;
generating financial information features from the extracted keywords, and feeding the generated features back into the financial information feature theme library to iterate and update the library;
wherein the financial information feature theme library contains the relevant features of existing financial information and the feature themes formed by those features;
and the financial information feature theme library contains the features of existing financial information and is continuously iterated and updated with each extraction of financial information features.
2. The financial information feature extraction method based on feature themes according to claim 1, characterized in that preprocessing the input financial information comprises:
first, performing data cleaning on the financial information to handle data noise and data inconsistency;
then, performing data anomaly detection on the financial information, so that the data meets the format requirements of subsequent financial information analysis and processing.
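By way of non-limiting illustration, the two preprocessing stages might look like the following sketch. It assumes the input is a list of raw text records; the cleaning rules (markup stripping, whitespace normalisation, case-insensitive de-duplication) and the anomaly test (a modified z-score on record length, using the median absolute deviation) are assumptions chosen for the example, and the names `clean` and `flag_length_outliers` are hypothetical.

```python
import re
import statistics


def clean(records: list[str]) -> list[str]:
    """Remove markup noise, normalise whitespace, and drop empty or duplicate records."""
    seen, cleaned = set(), []
    for record in records:
        text = re.sub(r"<[^>]+>", " ", record)    # strip HTML-like tags (noise)
        text = re.sub(r"\s+", " ", text).strip()  # normalise whitespace (inconsistency)
        key = text.lower()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned


def flag_length_outliers(records: list[str], threshold: float = 3.5) -> list[str]:
    """Flag records whose length has a modified z-score above the threshold."""
    if not records:
        return []
    lengths = [len(r) for r in records]
    median = statistics.median(lengths)
    mad = statistics.median(abs(n - median) for n in lengths)
    if mad == 0:
        return []  # all lengths (nearly) identical: nothing stands out
    return [r for r, n in zip(records, lengths)
            if 0.6745 * abs(n - median) / mad > threshold]


raw = ["  Bank A raises rates ", "bank a raises rates", "<p>Fund B cuts fees</p>", ""]
cleaned = clean(raw)
print(cleaned)  # ['Bank A raises rates', 'Fund B cuts fees']
suspicious = flag_length_outliers(cleaned)
```

Record length is only a stand-in signal; any numeric property of a record could be passed through the same median-based test.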
3. The financial information feature extraction method based on feature themes according to claim 1, characterized in that the data structures of the input financial information include structured data, semi-structured data and unstructured data.
4. The financial information feature extraction method based on feature themes according to claim 1, characterized in that the financial information feature theme library comprises a financial information feature theme word library, a financial information feature theme syntax library and a financial information feature theme model library;
the financial information feature theme word library contains the various word segments relating to existing financial information;
the financial information feature theme syntax library contains the various syntactic structures relating to existing financial information;
the financial information feature theme model library contains the various theme models relating to existing financial information;
each time financial information enters the financial information feature theme library, a feature theme retrieval is performed to determine whether a feature theme already exists for that financial information; if so, the relevant feature theme library is entered to match the feature theme; otherwise, the step ends directly.
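By way of non-limiting illustration, the retrieve-then-match logic can be sketched as a lookup over three sub-libraries. The nested-dictionary layout, the sub-library keys (`"word"`, `"syntax"`, `"model"`), the function name `match_feature_theme` and the sample entries are all assumptions made for this example.

```python
from typing import Optional

# Hypothetical layout: sub-library name -> feature theme -> stored features.
THEME_LIBRARY: dict[str, dict[str, set[str]]] = {
    "word": {"monetary policy": {"rate", "central bank"}},
    "syntax": {"monetary policy": {"SUBJECT raised OBJECT"}},
    "model": {"monetary policy": {"policy-rate theme"}},
}


def match_feature_theme(library: dict[str, dict[str, set[str]]],
                        sub_library: str,
                        theme: str) -> Optional[set[str]]:
    """Return the stored features for a theme, or None if the theme is unknown."""
    themes = library.get(sub_library, {})
    if theme not in themes:
        return None  # no existing feature theme: the step ends directly
    return themes[theme]  # theme exists: enter the sub-library and match


hit = match_feature_theme(THEME_LIBRARY, "word", "monetary policy")
miss = match_feature_theme(THEME_LIBRARY, "word", "commodity prices")
```

Here `hit` holds the word features stored for the theme, while `miss` is `None`, which corresponds to ending the step without matching.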
5. The financial information feature extraction method based on feature themes according to claim 1, characterized in that the financial information word segmentation filtering comprises: first, calling the financial information feature theme library; then, searching for the related word segmentation feature themes and segmenting the financial information according to the search result; and then, performing word segmentation filtering according to the theme word segments, selecting the word segments that meet the requirements;
the financial information syntactic filtering comprises: first, calling the financial information feature theme library; then, searching for the related syntactic feature themes and performing syntactic analysis of the financial information according to the search result; and then, performing syntactic filtering according to the theme syntax, selecting the financial information sentences that meet the requirements;
the feature theme model filtering comprises: first, calling the financial information feature theme library; then, searching for the related feature themes of the financial information and performing feature theme model analysis according to the search result; and then, performing model filtering according to the feature theme model, selecting the financial information features that meet the requirements.
6. The financial information feature extraction method based on feature themes according to claim 1, characterized in that calculating and extracting the keywords comprises:
first, pooling the features generated by the word segmentation filtering, the syntactic filtering and the feature theme model filtering;
then, calling a feature evaluation function to evaluate each pooled feature and obtain a score for each feature;
and then, extracting a preset number of features according to the scores.
7. The financial information feature extraction method based on feature themes according to claim 6, characterized in that the feature evaluation function is implemented using factor analysis.
8. The financial information feature extraction method based on feature themes according to claim 6, characterized in that the feature evaluation function is:
[Formula image not reproduced in this text.]
In the above formula [second formula image not reproduced in this text], A is the evaluation coefficient based on the feature theme, F is the evaluation factor based on the feature theme, and B is the special factor based on the feature theme.
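The formula of claim 8 survives in this text only as an image placeholder, so its exact form cannot be recovered from it. Purely as an assumption, if the function follows the standard factor-analysis model suggested by claim 7 and by the three named quantities, it might take the following form, where the left-hand symbol X is hypothetical and denotes the evaluated feature (or vector of feature scores):

```latex
X = A\,F + B
```

Under this reading, A would play the role of the factor loadings, F that of the common factors, and B that of the specific (unique) factors of standard factor analysis. The application itself may define the function differently.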
9. A financial information feature extraction system based on feature themes, characterized in that the system is used to implement the financial information feature extraction method according to any one of claims 1-8, the system comprising:
an input module, configured to input financial information;
a financial information preprocessing module, configured to preprocess the financial information, handling data noise and data inconsistency and detecting anomalous points;
a financial information feature theme library module, which contains the features of financial information and is continuously iterated with each extraction of financial information features;
a financial information word segmentation filtering module, configured to segment the financial information according to the word segmentation features in the financial information feature theme library and to perform word segmentation filtering;
a financial information syntactic filtering module, configured to perform syntactic filtering on the financial information according to the syntactic features in the financial information feature theme library;
a feature theme model filtering module, configured to perform feature theme model filtering according to the feature theme models in the financial information feature theme library;
a keyword extraction module, configured to calculate and extract keywords from the filtered financial information;
and a financial information feature module, configured to generate financial information features from the extracted keywords and to feed the generated features back into the financial information feature theme library so as to update the library.
10. A readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the financial information feature extraction method according to any one of claims 1-8.
CN202211310184.2A 2022-10-25 2022-10-25 Financial information feature extraction method and system based on feature theme and storage medium Pending CN115906830A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211310184.2A CN115906830A (en) 2022-10-25 2022-10-25 Financial information feature extraction method and system based on feature theme and storage medium

Publications (1)

Publication Number Publication Date
CN115906830A (en) 2023-04-04

Family

ID=86496507

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211310184.2A Pending CN115906830A (en) 2022-10-25 2022-10-25 Financial information feature extraction method and system based on feature theme and storage medium

Country Status (1)

Country Link
CN (1) CN115906830A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination