CN113283461A - Financial big data processing system and method based on block chain

Info

Publication number: CN113283461A
Application number: CN202110260389.3A
Authority: CN
Inventors: 张金海
Original assignee: Shanghai Maimi Technology Development Co ltd
Current assignee: Shanghai Maimi Technology Development Co ltd
Priority date: 2021-03-10
Filing date: 2021-03-10
Publication date: 2021-08-20

Abstract

A blockchain-based financial big data processing method comprises: collecting an original text set containing financial big data, wherein the original text set comprises a plurality of financial big data subsets, and each financial big data subset comprises a plurality of financial data initial texts; performing batch model-based text classification on the original text set to generate a relation classification result of the text to be classified; and hashing the generated relation classification result of the text to be classified and storing it on the chain based on blockchain technology. The invention improves the accuracy of data relation classification and of the classification task: cross-entropy costs are calculated separately for the first probability distribution and the second probability distribution and then added, and the sum of the costs is minimized, which makes the classification task more accurate.

Description

Financial big data processing system and method based on block chain

Technical Field

The application relates to the technical field of computers, in particular to a financial big data processing system and method based on a block chain.

Background

From a technical perspective, blockchain involves many scientific and technical fields such as mathematics, cryptography, the Internet and computer programming. From an application perspective, a blockchain is, simply put, a distributed shared ledger and database characterized by decentralization, tamper resistance, full-process record keeping, traceability, collective maintenance, openness and transparency. These characteristics ensure the honesty and transparency of the blockchain and lay the foundation for the trust placed in it. The rich application scenarios of blockchain rest on its ability to address information asymmetry and to enable cooperative trust and concerted action among multiple parties.

Blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. Blockchain is an important concept of Bitcoin: it is essentially a decentralized database and, as the underlying technology of Bitcoin, is a chain of data blocks linked to one another by cryptographic methods.

At present, blockchain is gradually being applied to the field of financial big data. For example, the financial data traceability method based on blockchain and identification technology disclosed in the invention patent with application number CN202010370723.6 stores financial data in a blockchain for data tracing; because data in the blockchain cannot be tampered with, the security and reliability of the traced data and of the tracing process are effectively improved. Specifically, the second device may send the first encrypted data to the first device, and the first device stores the first encrypted data in the blockchain. When doing so, the first device may obtain a first identifier and store the first identifier together with the first encrypted data in the blockchain. After storing them, the first device may send the second device a first message carrying the first identifier, which indicates that the first encrypted data has been successfully stored in the blockchain. The first identifier may be used as an index to the first encrypted data to facilitate tracing it.

It can be seen that the use of blockchain for tracing the source of financial big data is already mature. Applying blockchain to the processing of financial text data, however, still faces considerable technical barriers and problems; for example, when financial text big data is processed and classified in a blockchain-based system, data relation classification is inaccurate and the accuracy of the classification task is low.

Disclosure of Invention

Therefore, it is necessary to provide a system and a method for processing financial big data based on a block chain, which can improve the accuracy of data relation classification and the accuracy of classification task.

The technical scheme of the invention is as follows:

a financial big data processing method based on a blockchain, the method comprising:

step S100: collecting an original text set containing financial big data, wherein the original text set comprises a plurality of financial big data subsets, and each financial big data subset comprises a plurality of financial data initial texts;

step S200: performing batch model-based text classification on the original text set, and generating a relation classification result of the text to be classified;

step S300: hashing the generated relation classification result of the text to be classified and storing it on the chain based on blockchain technology.

Specifically, step S200, performing batch model-based text classification on the original text set and generating a relation classification result of the text to be classified, comprises the following steps:

step S210: collecting an original text set, and preliminarily processing the texts and relation labels in the original text set to obtain a labeled representation, wherein the original text set comprises discourse texts and the relation labels corresponding to the discourse texts; the preliminary processing of the discourse texts containing implicit relations and of their relation labels comprises extracting each pair of texts that has an implicit relation together with the corresponding relation, and processing them into a series of ordered, fixed-format, discourse-level inputs required by the network model; finally, dividing the processed inputs into a network model learning set and a network model checking set;

step S220: randomly selecting, according to a preset batch size, a pair of sample sentences from the network model learning set and the network model checking set as the network model input; before the selected pair of sample sentences is input into the pre-trained language model, segmenting the first sentence and the second sentence into words, adding a first dynamic vector bit before the first sentence, and adding a second dynamic vector bit between the first sentence and the second sentence and after the second sentence; then inputting the whole into the pre-trained language model, and generating hidden-layer vectors for the whole sentence pair using the model parameters of the pre-trained language model, wherein each segmented word input into the pre-trained language model corresponds to a dynamic word vector representation, and the output vector corresponding to the first dynamic vector bit contains the relation between the two sentences and global information; finally, combining the vectors obtained for the first sentence and the second sentence into a sequence to obtain the initial matrix-vector representation of the whole sentence-pair sequence;

step S230: acquiring the sequence information specific to each sentence with a time-recurrent neural network and establishing the context relation; specifically, inputting the word vector matrices of the first sentence and the second sentence separately into the time-recurrent neural network, obtaining a forward representation and a backward representation for each as output, and then combining the forward representation and the backward representation to obtain the representation of the sequence information features;

step S240: inputting the combined representation of the sequence information features into a graph convolutional neural model, and modeling the relations between the words of the sentence pair by graph convolution, so that the representation of each word output by the graph convolutional neural model incorporates word-pair information across the two sentences; then inputting the representations of all words of the two sentences into a pooling layer to obtain the feature representation of the inter-sentence relation modeled by the graph convolutional neural model;

step S250: using a first classifier to classify the output features of the first dynamic vector bit of the pre-trained language model, wherein the vector representation corresponding to the first dynamic vector bit is converted, through a feed-forward network and a classification layer, into a first probability distribution over the relations; using a second classifier to input the feature representation of the inter-sentence relation modeled by the graph convolutional neural model into a feed-forward network and a classification layer, and converting it into a second probability distribution over the inter-sentence relations; calculating a cross-entropy cost for the first probability distribution and for the second probability distribution separately, adding the two cross-entropy costs to obtain a cost sum, and then minimizing the cost sum; finally, inputting the network model learning samples and the network model checking samples into the network model in random batches, continuously updating the parameter values of the network model by incremental gradient descent, and at the same time calculating evaluation metrics on the network model checking set, the evaluation metrics being accuracy, recall and the macro-average value; when the metrics on the network model checking set no longer improve, or the network model has iterated a set number of times, stopping the learning and archiving the network model that performs best on the network model checking set;

step S260: loading the archived network model that performed best on the network model checking set, fixing the parameters of the network model, inputting the texts to be classified into the network model in batches, and outputting the relation classification results of the texts to be classified through the operation of the network model.
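The stopping rule at the end of step S250 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: `train_one_round`, `evaluate`, `patience` and `max_iters` are hypothetical names, and "no longer improves" is interpreted here as no improvement for `patience` consecutive evaluations on the checking set.

```python
import copy
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def fit_with_early_stopping(
    params: Params,
    train_one_round: Callable[[Params], Params],  # hypothetical: one pass of parameter updates
    evaluate: Callable[[Params], float],          # hypothetical: metric on the checking set
    max_iters: int = 100,
    patience: int = 3,
) -> Tuple[Params, float, int]:
    """Return (best parameters, best checking-set score, rounds actually run)."""
    best_params = copy.deepcopy(params)
    best_score = evaluate(params)
    stale_rounds = 0
    rounds = 0
    while rounds < max_iters and stale_rounds < patience:
        params = train_one_round(params)
        rounds += 1
        score = evaluate(params)
        if score > best_score:
            # Archive the parameters that perform best on the checking set.
            best_score, best_params, stale_rounds = score, copy.deepcopy(params), 0
        else:
            stale_rounds += 1
    return best_params, best_score, rounds


# Toy usage: the "model" is one number and the metric peaks at w = 3.
def toy_round(p: Params) -> Params:
    return {"w": p["w"] + 1.0}


def toy_metric(p: Params) -> float:
    return -((p["w"] - 3.0) ** 2)


best, best_score, rounds_run = fit_with_early_stopping(
    {"w": 0.0}, toy_round, toy_metric, max_iters=50, patience=2
)
```

Keeping a copy of the best parameters separately from the current ones is what allows step S260 to load the best model rather than the last one trained.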

Specifically, step S260, loading the archived network model that performed best on the network model checking set, fixing the parameters of the network model, inputting the texts to be classified into the network model in batches, and outputting the relation classification results of the texts to be classified through the operation of the network model, comprises the following steps:

step S261: loading the archived network model that performed best on the network model checking set, fixing the parameters of the network model, and acquiring the actual acquisition time nodes of the texts to be classified, wherein each text to be classified corresponds to one actual acquisition time node;

step S262: sorting the actual acquisition time nodes in chronological order, inputting the corresponding texts to be classified into the network model in batches according to the sorted actual acquisition time nodes, and outputting the relation classification results of the texts to be classified through the operation of the network model.
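The time ordering and batching of steps S261 and S262 can be sketched as below. The record layout (a timestamp paired with a text) and the function name are assumptions made for illustration; the source does not specify how acquisition time nodes are stored.

```python
from datetime import datetime
from typing import Iterator, List, Tuple

# Assumed record layout: (actual acquisition time node, text to be classified).
TimedText = Tuple[datetime, str]


def batches_in_time_order(items: List[TimedText], batch_size: int) -> Iterator[List[str]]:
    """Sort texts by acquisition time, then yield them in fixed-size batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    # sorted() is stable, so texts with equal timestamps keep their input order.
    ordered = sorted(items, key=lambda item: item[0])
    for start in range(0, len(ordered), batch_size):
        yield [text for _, text in ordered[start:start + batch_size]]


samples = [
    (datetime(2021, 3, 10, 9, 30), "text C"),
    (datetime(2021, 3, 10, 9, 0), "text A"),
    (datetime(2021, 3, 10, 9, 15), "text B"),
]
ordered_batches = list(batches_in_time_order(samples, batch_size=2))
```

Each yielded batch would then be passed to the network model with its parameters fixed.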

Specifically, step S300, hashing the generated relation classification result of the text to be classified and storing it on the chain based on blockchain technology, comprises the following steps:

step S310: based on blockchain technology, performing hash value extraction on the generated relation classification result of the text to be classified to compute its current actual hash value;

step S320: acquiring the computed current actual hash value of the relation classification result of the text to be classified;

step S330: according to the acquired current actual hash value, hashing the relation classification result of the text to be classified onto the chain and storing it.
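Steps S310 to S330 can be illustrated with a small in-memory hash chain. The source names neither a hash function nor a block layout, so SHA-256, the JSON encoding, the all-zero genesis value and every field name below are assumptions; a Python list stands in for a real distributed ledger.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_HASH = "0" * 64  # assumed predecessor value for the first block


def result_hash(result: Dict[str, str]) -> str:
    """S310: hash one classification result via a canonical JSON encoding."""
    payload = json.dumps(result, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: List[Dict[str, Any]], result: Dict[str, str]) -> Dict[str, Any]:
    """S320 and S330: take the result's hash, link it to the previous block, store it."""
    prev_hash = chain[-1]["block_hash"] if chain else GENESIS_HASH
    data_hash = result_hash(result)
    block_hash = hashlib.sha256((prev_hash + data_hash).encode("ascii")).hexdigest()
    block = {
        "index": len(chain),
        "prev_hash": prev_hash,
        "data_hash": data_hash,
        "block_hash": block_hash,
        "result": result,
    }
    chain.append(block)
    return block


chain: List[Dict[str, Any]] = []
append_block(chain, {"text_id": "t1", "relation": "Cause"})      # made-up sample results
append_block(chain, {"text_id": "t2", "relation": "Contrast"})
```

Sorting the JSON keys makes the hash independent of the order in which the result's fields were written.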

Specifically, step S100, collecting an original text set containing financial big data, wherein the original text set comprises a plurality of financial big data subsets and each financial big data subset comprises a plurality of financial data initial texts, comprises the following steps:

step S110: acquiring a loaded financial big data initial text in real time;

step S120: collecting the initial text of the financial big data acquired in real time and respectively generating a financial big data subset;

step S130: and generating the original text set according to the generated financial big data subset.
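Steps S110 to S130 amount to grouping incoming texts into subsets and treating the collection of subsets as the original text set. The sketch below assumes that each incoming record carries a source key that decides its subset; the source does not say how subsets are delimited, so this key, the function name and the sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_original_text_set(stream: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (source, text) records into subsets keyed by source."""
    subsets: Dict[str, List[str]] = defaultdict(list)
    for source, text in stream:          # S110: texts arrive one at a time
        cleaned = text.strip()
        if cleaned:                      # empty records are skipped
            subsets[source].append(cleaned)  # S120: each source forms one subset
    return dict(subsets)                 # S130: the subsets together form the set


incoming = [  # made-up records for illustration
    ("announcements", "Company A raises guidance."),
    ("news", "Central bank holds rates."),
    ("announcements", "  Company B issues bonds.  "),
    ("news", ""),
]
text_set = build_original_text_set(incoming)
```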

A blockchain-based financial big data processing system, the system comprising:

an original text set acquisition module, used for collecting an original text set containing financial big data, wherein the original text set comprises a plurality of financial big data subsets, and each financial big data subset comprises a plurality of financial data initial texts;

a relation classification result generation module, used for performing batch model-based text classification on the original text set and generating a relation classification result of the text to be classified;

and a hash chaining module, used for hashing the generated relation classification result of the text to be classified and storing it on the chain based on blockchain technology.

Specifically, the relationship classification result generation module includes:

an original text collection module, used for collecting an original text set and preliminarily processing the texts and relation labels in the original text set to obtain a labeled representation, wherein the original text set comprises discourse texts and the relation labels corresponding to the discourse texts; the preliminary processing of the discourse texts containing implicit relations and of their relation labels comprises extracting each pair of texts that has an implicit relation together with the corresponding relation, and processing them into a series of ordered, fixed-format, discourse-level inputs required by the network model; finally, the processed inputs are divided into a network model learning set and a network model checking set;

a global information obtaining module, used for randomly selecting, according to a preset batch size, a pair of sample sentences from the network model learning set and the network model checking set as the network model input; before the selected pair of sample sentences is input into the pre-trained language model, segmenting the first sentence and the second sentence into words, adding a first dynamic vector bit before the first sentence, and adding a second dynamic vector bit between the first sentence and the second sentence and after the second sentence; then inputting the whole into the pre-trained language model, and generating hidden-layer vectors for the whole sentence pair using the model parameters of the pre-trained language model, wherein each segmented word input into the pre-trained language model corresponds to a dynamic word vector representation, and the output vector corresponding to the first dynamic vector bit contains the relation between the two sentences and global information; finally, combining the vectors obtained for the first sentence and the second sentence into a sequence to obtain the initial matrix-vector representation of the whole sentence-pair sequence;

a sequence information feature representation module, used for acquiring the sequence information specific to each sentence with a time-recurrent neural network and establishing the context relation, wherein the word vector matrices of the first sentence and the second sentence are input separately into the time-recurrent neural network, a forward representation and a backward representation are obtained for each as output, and the forward representation and the backward representation are then combined to obtain the representation of the sequence information features;

an inter-sentence relation feature representation module, used for inputting the combined representation of the sequence information features into a graph convolutional neural model and modeling the relations between the words of the sentence pair by graph convolution, so that the representation of each word output by the graph convolutional neural model incorporates word-pair information across the two sentences; the representations of all words of the two sentences are then input into a pooling layer to obtain the feature representation of the inter-sentence relation modeled by the graph convolutional neural model;

a network model learning sample module, used for classifying, with a first classifier, the output features of the first dynamic vector bit of the pre-trained language model, wherein the vector representation corresponding to the first dynamic vector bit is converted, through a feed-forward network and a classification layer, into a first probability distribution over the relations; inputting, with a second classifier, the feature representation of the inter-sentence relation modeled by the graph convolutional neural model into a feed-forward network and a classification layer, and converting it into a second probability distribution over the inter-sentence relations; calculating a cross-entropy cost for the first probability distribution and for the second probability distribution separately, adding the two cross-entropy costs to obtain a cost sum, and then minimizing the cost sum; finally, inputting the network model learning samples and the network model checking samples into the network model in random batches, continuously updating the parameter values of the network model by incremental gradient descent, and at the same time calculating evaluation metrics on the network model checking set, the evaluation metrics being accuracy, recall and the macro-average value; when the metrics on the network model checking set no longer improve, or the network model has iterated a set number of times, the learning is stopped and the network model that performs best on the network model checking set is archived;

and an operation output module, used for loading the archived network model that performed best on the network model checking set, fixing the parameters of the network model, inputting the texts to be classified into the network model in batches, and outputting the relation classification results of the texts to be classified through the operation of the network model.
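The forward and backward passes of the sequence information module, and their combination per word, can be sketched as follows. This is a toy illustration only: an element-wise tanh recurrence with two scalar weights stands in for the time-recurrent neural network, whose actual cell type and parameters the source does not give, and all names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def rnn_pass(inputs: List[Vector], w_in: float, w_rec: float) -> List[Vector]:
    """Toy element-wise tanh recurrence; returns one hidden vector per word."""
    hidden = [0.0] * len(inputs[0])
    outputs = []
    for x in inputs:
        hidden = [math.tanh(w_in * xi + w_rec * hi) for xi, hi in zip(x, hidden)]
        outputs.append(hidden)
    return outputs


def bidirectional_features(inputs: List[Vector], w_in: float = 0.5, w_rec: float = 0.5) -> List[Vector]:
    """Concatenate, for each word, its forward and backward hidden vectors."""
    forward = rnn_pass(inputs, w_in, w_rec)
    # Run the same recurrence right to left, then restore the original word order.
    backward = rnn_pass(inputs[::-1], w_in, w_rec)[::-1]
    return [f + b for f, b in zip(forward, backward)]


sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three words, two dimensions each
features = bidirectional_features(sentence)
```

Each word ends up with a vector twice the input width: the first half summarizes the words before it, the second half the words after it.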

Specifically, the hash chaining module further includes:

a current actual hash value acquisition module, used for performing hash value extraction on the generated relation classification result of the text to be classified based on blockchain technology to compute its current actual hash value;

a relation classification result acquisition module, used for acquiring the current actual hash value of the relation classification result of the text to be classified;

and a hash chaining storage module, used for hashing the relation classification result of the text to be classified onto the chain and storing it, according to the acquired current actual hash value.

A computer device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the above-mentioned financial big data processing method based on block chain when executing the computer program.

A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the above-mentioned blockchain-based financial big data processing method.

The invention has the following technical effects:

1. First, an original text set containing financial big data is collected, wherein the original text set comprises a plurality of financial big data subsets, and each financial big data subset comprises a plurality of financial data initial texts; then, batch model-based text classification is performed on the original text set and a relation classification result of the text to be classified is generated; finally, the generated relation classification result is hashed and stored on the chain based on blockchain technology. This solves the problems of inaccurate data relation classification and low classification task accuracy that arise when financial text big data is processed and classified in a blockchain-based system, and improves both the accuracy of data relation classification and the accuracy of the classification task;

2. The pre-trained language model yields better dynamic word vector representations, which improves the overall representation of sentences in the big data text; the graph convolutional neural model strengthens the interaction between the first sentence and the second sentence, which improves the accuracy of relation classification; in addition, the cross-entropy costs are calculated separately for the first probability distribution and the second probability distribution and then added, and the sum of the costs is minimized, which makes the classification task more accurate.

Drawings

FIG. 1 is a flow diagram illustrating a method for processing financial big data based on a blockchain in one embodiment;

FIG. 2 is a block diagram of a blockchain-based financial big data processing system in one embodiment;

FIG. 3 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

In one embodiment, as shown in fig. 1, there is provided a method for processing financial big data based on a blockchain, the method including:

First, an original text set containing financial big data is collected, wherein the original text set comprises a plurality of financial big data subsets, and each financial big data subset comprises a plurality of financial data initial texts; then, batch model-based text classification is performed on the original text set and a relation classification result of the text to be classified is generated; finally, the generated relation classification result is hashed and stored on the chain based on blockchain technology. This solves the problems of inaccurate data relation classification and low classification task accuracy that arise when financial text big data is processed and classified in a blockchain-based system, and improves both the accuracy of data relation classification and the accuracy of the classification task.

In one embodiment, step S200, performing batch model-based text classification on the original text set and generating a relation classification result of the text to be classified, comprises the following steps:

Specifically, in step S210, the preliminary processing of the discourse texts containing implicit relations and of their relation labels comprises extracting each pair of texts that has an implicit relation together with the corresponding relation, and processing them into a series of ordered, fixed-format, discourse-level inputs required by the network model, so that financial big data is entered in an orderly manner and data processing efficiency is improved.
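The fixed-format input and the division into a learning set and a checking set can be sketched as below. The record type, the 80/20 split, the seed and the relation label are assumptions for illustration; the source states only that the processed inputs are divided into the two sets.

```python
import random
from typing import List, NamedTuple, Sequence, Tuple


class RelationSample(NamedTuple):
    """One fixed-format input: a text pair and its relation label."""
    first_sentence: str
    second_sentence: str
    relation: str


def split_samples(
    samples: Sequence[RelationSample],
    check_fraction: float = 0.2,  # assumed share held out for checking
    seed: int = 0,                # assumed seed, for a reproducible shuffle
) -> Tuple[List[RelationSample], List[RelationSample]]:
    """Shuffle the samples and divide them into (learning set, checking set)."""
    if not 0.0 < check_fraction < 1.0:
        raise ValueError("check_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_check = max(1, int(len(shuffled) * check_fraction))
    return shuffled[n_check:], shuffled[:n_check]


samples = [RelationSample(f"first {i}", f"second {i}", "Cause") for i in range(10)]
learning_set, checking_set = split_samples(samples)
```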

Specifically, in step S220, the vectors obtained for the first sentence and the second sentence are combined into a sequence to obtain the initial matrix-vector representation of the whole sentence-pair sequence, and the word vector corresponding to the first dynamic vector bit output by the pre-trained language model contains the relation between the two sentences and global information. The pre-trained language model thus yields better dynamic word vector representations, which improves the overall representation of sentences in the big data text.
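The placement of the two kinds of dynamic vector bits around a segmented sentence pair can be sketched as follows. The marker strings `[CLS]` and `[SEP]` and the segment ids follow a common convention for pre-trained language models; the source does not name them, so they are assumptions here.

```python
from typing import List, Tuple

FIRST_MARK = "[CLS]"   # assumed token for the first dynamic vector bit
SECOND_MARK = "[SEP]"  # assumed token for the second dynamic vector bit


def pack_sentence_pair(first: List[str], second: List[str]) -> Tuple[List[str], List[int]]:
    """Build the model input: mark, first sentence, mark, second sentence, mark."""
    tokens = [FIRST_MARK] + first + [SECOND_MARK] + second + [SECOND_MARK]
    # Segment ids (an assumption) tell the model which sentence each position belongs to.
    segment_ids = [0] * (len(first) + 2) + [1] * (len(second) + 1)
    return tokens, segment_ids


tokens, segment_ids = pack_sentence_pair(["rates", "rose"], ["bond", "prices", "fell"])
```

The model's output at position 0, where the first mark sits, is the vector that the first classifier later consumes.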

Specifically, in step S240, the graph convolutional neural model strengthens the interaction between the first sentence and the second sentence, which improves the accuracy of relation classification.
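One graph convolution over the words of a sentence pair, followed by pooling, can be sketched in plain Python. The source does not say how the graph is built, so the choices below are assumptions: every word is linked to itself and to every word of the other sentence, rows are normalized to sum to one, the activation is ReLU, and the pooling is a per-dimension maximum.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def cross_sentence_adjacency(n_first: int, n_second: int) -> Matrix:
    """Link each word to itself and to all words of the other sentence; row-normalize."""
    n = n_first + n_second
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            same_sentence = (i < n_first) == (j < n_first)
            if i == j or not same_sentence:
                adj[i][j] = 1.0
    normalized = []
    for row in adj:
        total = sum(row)
        normalized.append([value / total for value in row])
    return normalized


def gcn_layer(adj: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """One graph convolution: ReLU(adj @ features @ weight)."""
    mixed = matmul(matmul(adj, features), weight)
    return [[max(0.0, value) for value in row] for row in mixed]


def max_pool(features: Matrix) -> List[float]:
    """Collapse all word vectors into one vector, dimension by dimension."""
    return [max(column) for column in zip(*features)]


word_features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # two words, then one word
identity = [[1.0, 0.0], [0.0, 1.0]]                   # stand-in for learned weights
adjacency = cross_sentence_adjacency(n_first=2, n_second=1)
word_outputs = gcn_layer(adjacency, word_features, identity)
pair_feature = max_pool(word_outputs)
```

After the layer, each word's vector is an average of itself and the words of the other sentence, which is the word-pair mixing the step describes.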

In step S250, the cross-entropy costs are calculated separately for the first probability distribution and the second probability distribution and then added, and the sum of the costs is minimized, which makes the classification task more accurate.
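The summed cost can be written out for a single sample as below. This is a minimal sketch: the function names are hypothetical, softmax is assumed as the classification layer, and the two costs are added with equal weight as the text describes.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn classifier scores into a probability distribution over relations."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def cross_entropy(probs: List[float], gold: int) -> float:
    """Negative log-probability assigned to the gold relation."""
    return -math.log(probs[gold])


def joint_cost(first_logits: List[float], second_logits: List[float], gold: int) -> float:
    """Sum of the cross-entropy costs of the two classifiers for one sample."""
    first_probs = softmax(first_logits)    # from the first dynamic vector bit
    second_probs = softmax(second_logits)  # from the graph convolution features
    return cross_entropy(first_probs, gold) + cross_entropy(second_probs, gold)


cost = joint_cost([2.0, 0.5, -1.0], [1.0, 1.5, 0.0], gold=0)
uniform_cost = joint_cost([0.0, 0.0], [0.0, 0.0], gold=0)
```

Because the two costs are added before minimization, a gradient step improves both classifiers at once, and the shared lower layers receive signal from both.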

In one embodiment, step S260, loading the archived network model that performed best on the network model checking set, fixing the parameters of the network model, inputting the texts to be classified into the network model in batches, and outputting the relation classification results of the texts to be classified through the operation of the network model, comprises the following steps:

Specifically, in step S261, the actual acquisition time nodes of the texts to be classified are acquired, which makes it possible to arrange the acquired texts to be classified in chronological order.

In one embodiment, step S300, hashing the generated relation classification result of the text to be classified and storing it on the chain based on blockchain technology, comprises the following steps:

Specifically, in steps S310 to S330, the current actual hash value of the relation classification result of the text to be classified is acquired and used for chaining, which keeps the stored data tamper-resistant and stable.
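The tamper-resistance property can be demonstrated by recomputing the links of a small hash chain. The sketch below is an illustration under assumptions (SHA-256, an all-zero starting value, results stored as plain strings, made-up sample results); it is not the ledger format of any particular blockchain.

```python
import hashlib
from typing import List, Tuple

START = "0" * 64  # assumed starting value before the first link


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_chain(results: List[str]) -> List[Tuple[str, str]]:
    """Return (result, link) pairs; each link covers the previous link and the result's hash."""
    chain = []
    previous = START
    for result in results:
        previous = sha256_hex(previous + sha256_hex(result))
        chain.append((result, previous))
    return chain


def first_tampered_index(chain: List[Tuple[str, str]]) -> int:
    """Recompute every link; return the index of the first mismatch, or -1 if intact."""
    previous = START
    for index, (result, stored_link) in enumerate(chain):
        if sha256_hex(previous + sha256_hex(result)) != stored_link:
            return index
        previous = stored_link
    return -1


intact = build_chain(["t1: Cause", "t2: Contrast", "t3: Expansion"])
tampered = list(intact)
tampered[1] = ("t2: Cause", intact[1][1])  # change a stored result, keep the old link
```

Changing a stored result without recomputing its link, and every link after it, is detected at the first recomputation.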

In one embodiment, step S100, collecting an original text set containing financial big data, wherein the original text set comprises a plurality of financial big data subsets and each financial big data subset comprises a plurality of financial data initial texts, comprises the following steps:

step S110: acquiring a loaded financial big data initial text in real time;

In one embodiment, as shown in fig. 2, there is provided a blockchain-based financial big data processing system, the system comprising:

In one embodiment, the relational classification result generation module includes:

Specifically, the hash chaining module further includes:

In one embodiment, the system is further used for acquiring the loaded financial big data initial texts in real time; collecting the financial big data initial texts acquired in real time and generating the respective financial big data subsets; and generating the original text set from the generated financial big data subsets.

In one embodiment, as shown in fig. 3, a computer device includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above-mentioned financial big data processing method based on block chain when executing the computer program.

It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this specification.

The above embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims

1. A financial big data processing method based on a block chain is characterized by comprising the following steps:

2. The method for processing financial big data based on block chain according to claim 1, wherein step S200, performing batch model-based text classification on the original text set and generating a relation classification result of the text to be classified, specifically comprises the following steps:

3. The method for processing financial big data based on block chain according to claim 2, wherein step S260, loading the archived network model that performed best on the network model checking set, fixing the parameters of the network model, inputting the texts to be classified into the network model in batches, and outputting the relation classification results of the texts to be classified through the operation of the network model, specifically comprises the following steps:

4. The blockchain-based financial big data processing method according to any one of claims 1 to 3, wherein step S300, hashing the generated relation classification result of the text to be classified and storing it on the chain based on blockchain technology, specifically comprises the following steps:

5. The blockchain-based financial big data processing method according to any one of claims 1 to 3, wherein step S100, collecting an original text set containing financial big data, wherein the original text set comprises a plurality of financial big data subsets and each financial big data subset comprises a plurality of financial data initial texts, specifically comprises the following steps:

step S110: acquiring a loaded financial big data initial text in real time;

6. A blockchain-based financial big data processing system, the system comprising:

7. The blockchain-based financial big data processing system according to claim 6, wherein the relational classification result generation module includes:

8. The blockchain-based financial big data processing system according to claim 1, wherein the hash chaining module further comprises:

9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 5 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.