CN116150371A - Asset repayment plan mass data processing method based on ShardingJDBC - Google Patents
Asset repayment plan mass data processing method based on ShardingJDBC
- Publication number
- CN116150371A (application CN202310141878.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- feature
- semantic
- semantic understanding
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The application relates to the field of data processing, and specifically discloses a ShardingJDBC-based method for processing mass data of an asset repayment plan. Using an artificial intelligence technique based on deep learning, the method segments the mass asset repayment plan data according to an alternative segmentation scheme and extracts the global semantic understanding features of the segmented sub-data; the expression of each sub-data's semantic understanding features is then strengthened through the semantic topological association features among them. This improves the semantic understanding accuracy for the mass asset repayment plan data and, in turn, the accuracy with which the rationality of the alternative segmentation scheme is judged. As a result, the mass asset repayment plan data can be sharded reasonably, improving the efficiency of the downstream database.
Description
Technical Field
The present application relates to the field of data processing, and more particularly, to a ShardingJDBC-based method for processing mass data of an asset repayment plan.
Background
Apache ShardingSphere is a distributed database ecosystem that can transform any database into a distributed database and enhance the original database with capabilities such as data sharding, elastic scaling, and encryption. ShardingJDBC, as one product of Apache ShardingSphere, can be deployed independently and also supports hybrid deployment in combination with other products. It provides standardized incremental capabilities on top of the database as the storage node, and suits a wide range of application scenarios such as homogeneous Java environments, heterogeneous languages, and cloud-native deployments.
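To make the sharding idea concrete, the routing of a logical table to identically structured physical tables can be sketched as follows. This is a minimal Python illustration of an inline modulo strategy, not the actual ShardingJDBC API; the function and table names are hypothetical.

```python
# Hypothetical sketch (not the ShardingJDBC API): route each row of a
# logical table to one of several identically structured physical tables
# by taking the sharding key modulo the number of shards.

def route_physical_table(logical_table: str, sharding_key: int, shard_count: int) -> str:
    """Map a logical table name plus a sharding key to a physical table name."""
    shard_index = sharding_key % shard_count  # inline modulo sharding strategy
    return f"{logical_table}_{shard_index}"

# Rows with different repayment-plan IDs land in different physical tables.
table = route_physical_table("repayment_plan", 202310141878, 4)
```

In ShardingJDBC itself this mapping is expressed declaratively in the configuration file rather than in application code, which is what makes the approach non-intrusive to business logic.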
Although introducing ShardingJDBC is non-intrusive to the service code (no business logic needs to be modified, and sharding is completed simply by introducing a JAR package and modifying a configuration file), how to shard the data is still an important technical problem in concrete data processing. The existing data-sharding strategy splits data based on manual experience, but when facing unfamiliar data, manual experience cannot shard reasonably based on the internal information and structure of the data, which affects the use of the downstream database.
Accordingly, an optimized ShardingJDBC-based scheme for processing mass data of an asset repayment plan is desired.
Disclosure of Invention
The present application has been made to solve the above technical problems. Embodiments of the application provide a ShardingJDBC-based method for processing mass data of an asset repayment plan. Using an artificial intelligence technique based on deep learning, the method segments the mass asset repayment plan data according to an alternative segmentation scheme, extracts the global semantic understanding features of the segmented sub-data, and strengthens the expression of each sub-data's semantic understanding features through the semantic topological association features among them, thereby improving the semantic understanding accuracy for the mass asset repayment plan data and, in turn, the accuracy of the rationality judgment of the alternative segmentation scheme. As a result, the mass asset repayment plan data can be sharded reasonably, improving the efficiency of the downstream database.
According to one aspect of the present application, there is provided a ShardingJDBC-based method for processing mass data of an asset repayment plan, including:
acquiring mass data of an asset repayment plan to be segmented;
segmenting the mass data of the asset repayment plan to be segmented based on a first alternative segmentation scheme to obtain a plurality of segmented sub-data;
passing each of the plurality of segmented sub-data through a Transformer-based context encoder to obtain a plurality of segmented sub-data semantic understanding feature vectors;
calculating the Euclidean distance between every two of the plurality of segmented sub-data semantic understanding feature vectors to obtain a semantic space topology matrix;
passing the semantic space topology matrix through a convolutional neural network model serving as a feature extractor to obtain a semantic space distribution topology feature matrix;
performing two-dimensional matrixing on the plurality of segmented sub-data semantic understanding feature vectors to obtain a global segmented sub-data semantic understanding feature matrix;
passing the global segmented sub-data semantic understanding feature matrix and the semantic space distribution topology feature matrix through a graph neural network model to obtain a topological global segmented sub-data semantic understanding feature matrix;
performing feature distribution optimization on the topological global segmented sub-data semantic understanding feature matrix to obtain a classification feature vector; and
passing the classification feature vector through a classifier to obtain a classification result, where the classification result is used to indicate whether the first alternative segmentation scheme is reasonable.
In the above ShardingJDBC-based asset repayment plan mass data processing method, passing each of the plurality of segmented sub-data through a Transformer-based context encoder to obtain a plurality of segmented sub-data semantic understanding feature vectors includes: performing word segmentation on each of the plurality of segmented sub-data to convert it into a word sequence composed of a plurality of words; mapping each word in the word sequence into a word embedding vector using an embedding layer of the Transformer-based context encoder to obtain a sequence of word embedding vectors; performing Transformer-style global context semantic encoding on the sequence of word embedding vectors using the Transformer of the context encoder to obtain a plurality of global context semantic feature vectors; and concatenating the plurality of global context semantic feature vectors to obtain the plurality of segmented sub-data semantic understanding feature vectors.
In the above ShardingJDBC-based asset repayment plan mass data processing method, performing Transformer-style global context semantic encoding on the sequence of word embedding vectors to obtain a plurality of global context semantic feature vectors includes: arranging the sequence of word embedding vectors one-dimensionally to obtain a global word feature vector; calculating the product between the global word feature vector and the transpose of each word vector in the sequence of word embedding vectors to obtain a plurality of self-attention association matrices; standardizing each of the plurality of self-attention association matrices to obtain a plurality of standardized self-attention association matrices; passing each standardized self-attention association matrix through a Softmax classification function to obtain a plurality of probability values; weighting each word vector in the sequence of word embedding vectors by the corresponding probability value to obtain a plurality of context semantic feature vectors; and concatenating the plurality of context semantic feature vectors to obtain the plurality of global context semantic feature vectors.
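The attention-style weighting described above can be sketched in numpy. This is a toy illustration under simplifying assumptions: the global word feature vector is stood in for by mean pooling, and each word receives a scalar score rather than a full association matrix; it is not the patented encoder.

```python
import numpy as np

def attention_weighted(word_vecs: np.ndarray) -> np.ndarray:
    """Toy version of the weighting described above. Assumptions: the
    global word feature vector is approximated by mean pooling, and each
    word gets a scalar score instead of an association matrix."""
    global_vec = word_vecs.mean(axis=0)                        # stand-in for the one-dimensional arrangement
    scores = word_vecs @ global_vec                            # product with each word vector
    scores = (scores - scores.mean()) / (scores.std() + 1e-9)  # standardization
    probs = np.exp(scores) / np.exp(scores).sum()              # Softmax -> probability values
    return probs[:, None] * word_vecs                          # weight each word vector

word_embeddings = np.arange(12, dtype=float).reshape(4, 3)     # 4 words, 3-dim embeddings
weighted = attention_weighted(word_embeddings)
```

Each output row is the original word embedding scaled by its attention probability, so words whose embeddings align most with the global context dominate the concatenated feature representation.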
In the above ShardingJDBC-based asset repayment plan mass data processing method, calculating the Euclidean distance between every two of the plurality of segmented sub-data semantic understanding feature vectors to obtain a semantic space topology matrix includes: calculating the Euclidean distances between every two of the segmented sub-data semantic understanding feature vectors according to the following formula to obtain a plurality of Euclidean distances:

$$d(V_a, V_b) = \sqrt{\sum_{i=1}^{n} \left(v_a^{(i)} - v_b^{(i)}\right)^2}$$

where $V_a$ and $V_b$ denote any two of the plurality of segmented sub-data semantic understanding feature vectors, $d(V_a, V_b)$ denotes the Euclidean distance between them, and $v_a^{(i)}$ and $v_b^{(i)}$ denote the feature values at each position $i$ of the two vectors; and
matrixing the plurality of Euclidean distances to obtain the semantic space topology matrix.
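The pairwise-distance matrixing can be written directly with numpy broadcasting; the vector contents below are illustrative, chosen so the distances are easy to verify by hand.

```python
import numpy as np

def semantic_space_topology(vectors: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between the semantic understanding
    feature vectors, arranged as a square semantic space topology matrix."""
    diff = vectors[:, None, :] - vectors[None, :, :]   # broadcast all pairs
    return np.sqrt((diff ** 2).sum(axis=-1))

# Illustrative 2-D "feature vectors" chosen so distances are easy to check.
v = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
topology = semantic_space_topology(v)   # topology[0, 1] is the 3-4-5 distance, 5.0
```

The resulting matrix is symmetric with a zero diagonal, which is what allows it to be treated as an (edge-weight) topology over the segmented sub-data in the later graph step.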
In the above ShardingJDBC-based asset repayment plan mass data processing method, passing the semantic space topology matrix through a convolutional neural network model serving as a feature extractor to obtain a semantic space distribution topology feature matrix includes: in its forward pass, each layer of the convolutional neural network model performs on the input data: convolution processing to obtain a convolution feature map; pooling of the convolution feature map along the channel dimension to obtain a pooled feature map; and nonlinear activation of the pooled feature map to obtain an activated feature map. The output of the last layer of the convolutional neural network serving as the feature extractor is the semantic space distribution topology feature matrix, and the input of its first layer is the semantic space topology matrix.
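One such layer can be sketched as follows. This is a toy numpy implementation of the three steps named above; the ReLU activation and valid padding are assumptions, since the patent does not name specific choices.

```python
import numpy as np

def cnn_layer(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """One toy feature-extractor layer following the three steps described:
    2D convolution (valid padding assumed), pooling along the channel
    dimension, then nonlinear activation (ReLU assumed)."""
    kh, kw = kernels.shape[1:]
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    conv = np.zeros((kernels.shape[0], h, w))            # convolution feature map
    for c, k in enumerate(kernels):
        for i in range(h):
            for j in range(w):
                conv[c, i, j] = (x[i:i + kh, j:j + kw] * k).sum()
    pooled = conv.mean(axis=0)                           # pooled feature map (channel dimension)
    return np.maximum(pooled, 0.0)                       # activated feature map

semantic_topology = np.abs(np.random.default_rng(1).standard_normal((6, 6)))
features = cnn_layer(semantic_topology, np.ones((2, 3, 3)))
```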
In the above ShardingJDBC-based asset repayment plan mass data processing method, passing the global segmented sub-data semantic understanding feature matrix and the semantic space distribution topology feature matrix through a graph neural network model to obtain a topological global segmented sub-data semantic understanding feature matrix includes: the graph neural network processes the two matrices through learnable neural network parameters to obtain the topological global segmented sub-data semantic understanding feature matrix, which contains both the irregular semantic space topological association features and the high-dimensional semantic understanding feature information of each segmented sub-data.
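A minimal sketch of one such graph step follows. A GCN-style mean-propagation update is an assumption here; the patent does not specify the exact graph operator or parameterization.

```python
import numpy as np

def gcn_layer(x: np.ndarray, adj: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Toy graph step: node features X (the global segmented sub-data
    semantic understanding feature matrix), edge weights A (the semantic
    space distribution topology feature matrix), learnable parameters W.
    The GCN-style update below is an assumption, not the patented model."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)        # node degrees for normalization
    return np.tanh((a_hat / deg) @ x @ w)         # propagate, project, activate

rng = np.random.default_rng(2)
x = rng.standard_normal((6, 8))                   # 6 segments, 8-dim semantic features
adj = np.abs(rng.standard_normal((6, 6)))         # positive edge weights
topo_features = gcn_layer(x, adj, rng.standard_normal((8, 8)))
```

Each output row mixes a segment's own semantic features with those of its topological neighbors, which is how the topological association features "enhance" the per-segment representations.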
In the above ShardingJDBC-based asset repayment plan mass data processing method, performing feature distribution optimization on the topological global segmented sub-data semantic understanding feature matrix to obtain a classification feature vector includes: performing matrix unfolding on the topological global segmented sub-data semantic understanding feature matrix to obtain an unfolded feature vector; and performing vector-normed Hilbert probability spatialization on the unfolded feature vector to obtain the classification feature vector according to the following formula:

$$\hat{v}_i = \frac{v_i}{\|V\|_2} \cdot \exp\left(\frac{v_i}{\|V\|_2^2}\right)$$

where $V$ is the unfolded feature vector, $\|V\|_2$ denotes its two-norm, $\|V\|_2^2$ denotes the square of its two-norm, $v_i$ is the $i$-th feature value of the unfolded feature vector, $\exp(\cdot)$ denotes the position-wise natural exponential operation on a vector, and $\hat{v}_i$ is the $i$-th feature value of the classification feature vector.
In the above ShardingJDBC-based asset repayment plan mass data processing method, performing matrix unfolding on the topological global segmented sub-data semantic understanding feature matrix to obtain an unfolded feature vector includes: unfolding the matrix along its row vectors or column vectors to obtain the unfolded feature vector.
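Since the original formula survives only as symbol descriptions, the sketch below is one plausible reading of the unfold-then-spatialize step: two-norm normalization plus an exponential re-weighting. The exact weighting is an assumption.

```python
import numpy as np

def hilbert_probability_spatialization(matrix: np.ndarray) -> np.ndarray:
    """One plausible reading (an assumption) of the optimization step:
    unfold the matrix row-wise, normalize by the two-norm, and re-weight
    each value by an exponential of its norm-square-scaled magnitude."""
    v = matrix.ravel()                     # matrix unfolding along row vectors
    norm = np.linalg.norm(v)               # two-norm of the unfolded vector
    return (v / norm) * np.exp(v / norm ** 2)

m = np.array([[1.0, 2.0], [3.0, 4.0]])
classification_vector = hilbert_probability_spatialization(m)
```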
In the above ShardingJDBC-based asset repayment plan mass data processing method, passing the classification feature vector through a classifier to obtain a classification result, where the classification result is used to indicate whether the first alternative segmentation scheme is reasonable, includes: performing full-connection encoding on the classification feature vector using a plurality of fully connected layers of the classifier to obtain an encoded classification feature vector; and passing the encoded classification feature vector through a Softmax classification function of the classifier to obtain the classification result.
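The classifier head can be sketched as follows; the layer sizes and tanh nonlinearity are illustrative assumptions, and the two classes correspond to the "reasonable" / "unreasonable" judgment of the segmentation scheme.

```python
import numpy as np

def classify(v: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Toy classifier following the two steps described: full-connection
    encoding of the classification feature vector, then a Softmax over the
    two classes ("reasonable" vs "unreasonable" segmentation scheme)."""
    encoded = np.tanh(w1 @ v)              # fully connected encoding layer
    logits = w2 @ encoded                  # projection to two class logits
    e = np.exp(logits - logits.max())      # numerically stable Softmax
    return e / e.sum()

rng = np.random.default_rng(3)
probs = classify(rng.standard_normal(16),
                 rng.standard_normal((8, 16)),
                 rng.standard_normal((2, 8)))
```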
According to another aspect of the present application, there is provided a ShardingJDBC-based system for processing mass data of an asset repayment plan, comprising:
a data acquisition module for acquiring mass data of an asset repayment plan to be segmented;
a segmentation module for segmenting the mass data of the asset repayment plan to be segmented based on a first alternative segmentation scheme to obtain a plurality of segmented sub-data;
a context encoding module for passing each of the plurality of segmented sub-data through a Transformer-based context encoder to obtain a plurality of segmented sub-data semantic understanding feature vectors;
a Euclidean distance calculation module for calculating the Euclidean distance between every two of the plurality of segmented sub-data semantic understanding feature vectors to obtain a semantic space topology matrix;
a convolution module for passing the semantic space topology matrix through a convolutional neural network model serving as a feature extractor to obtain a semantic space distribution topology feature matrix;
a two-dimensional matrixing module for performing two-dimensional matrixing on the plurality of segmented sub-data semantic understanding feature vectors to obtain a global segmented sub-data semantic understanding feature matrix;
a graph neural network module for passing the global segmented sub-data semantic understanding feature matrix and the semantic space distribution topology feature matrix through a graph neural network model to obtain a topological global segmented sub-data semantic understanding feature matrix;
a feature distribution optimization module for performing feature distribution optimization on the topological global segmented sub-data semantic understanding feature matrix to obtain a classification feature vector; and
a classification result generation module for passing the classification feature vector through a classifier to obtain a classification result, where the classification result is used to indicate whether the first alternative segmentation scheme is reasonable.
According to still another aspect of the present application, there is provided an electronic apparatus including: a processor; and a memory having stored therein computer program instructions that, when executed by the processor, cause the processor to perform the ShardingJDBC-based asset repayment plan mass data processing method described above.
According to yet another aspect of the present application, there is provided a computer readable medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the ShardingJDBC-based asset repayment plan mass data processing method described above.
Compared with the prior art, the ShardingJDBC-based asset repayment plan mass data processing method uses an artificial intelligence technique based on deep learning to segment the mass asset repayment plan data according to an alternative segmentation scheme, extracts the global semantic understanding features of the segmented sub-data, and strengthens the expression of each sub-data's semantic understanding features through the semantic topological association features among them, thereby improving the semantic understanding accuracy for the mass asset repayment plan data and, in turn, the accuracy of the rationality judgment of the alternative segmentation scheme. As a result, the mass asset repayment plan data can be sharded reasonably, improving the efficiency of the downstream database.
Drawings
The foregoing and other objects, features and advantages of the present application will become more apparent from the following more particular description of embodiments of the present application, as illustrated in the accompanying drawings. The accompanying drawings are included to provide a further understanding of embodiments of the application and are incorporated in and constitute a part of this specification, illustrate the application and not constitute a limitation to the application. In the drawings, like reference numerals generally refer to like parts or steps.
FIG. 1 is a flowchart of a ShardingJDBC-based method for processing mass data of an asset repayment plan according to an embodiment of the present application;
FIG. 2 is a schematic architecture diagram of a ShardingJDBC-based method for processing mass data of an asset repayment plan according to an embodiment of the present application;
FIG. 3 is a flowchart of context encoding in a ShardingJDBC-based method for processing mass data of an asset repayment plan according to an embodiment of the present application;
FIG. 4 is a flowchart of convolutional neural network encoding in a ShardingJDBC-based method for processing mass data of an asset repayment plan according to an embodiment of the present application;
FIG. 5 is a flowchart of the classification process in a ShardingJDBC-based method for processing mass data of an asset repayment plan according to an embodiment of the present application;
FIG. 6 is a block diagram of a ShardingJDBC-based system for processing mass data of an asset repayment plan according to an embodiment of the present application;
fig. 7 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application and not all of the embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein.
Scene overview
As described in the background, the existing data-sharding strategy splits data based on manual experience; when facing unfamiliar data, manual experience cannot shard reasonably based on the internal information and structure of the data, which affects the use of the downstream database. Accordingly, an optimized ShardingJDBC-based scheme for processing mass data of an asset repayment plan is desired.
Specifically, the technical scheme of the application provides a ShardingJDBC-based method for managing mass asset repayment plan data, suitable as a general solution to database performance bottlenecks caused by a large single-table data volume. Introducing ShardingJDBC is non-intrusive to the service code: no business logic needs to be modified, and sharding is completed simply by introducing a JAR package and modifying a configuration file. The database and tables are split such that the split tables share the same structure, and the mapping between logical tables and physical tables is handled through parsing and routing. Once the sub-database and sub-table strategy is set in the configuration file, the split services are clearly delineated, dedicating each private database to its own purpose. Reducing the data volume of a single database (table) improves system performance as well as the stability and load capacity of the system. For high-concurrency scenarios, read-write separation is controlled by configuration policies in order to further reduce the pressure on the server.
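The read-write separation policy mentioned above can be illustrated with a toy router. This is a hypothetical sketch, not ShardingJDBC's configuration mechanism: writes go to the primary data source and reads rotate across the replicas.

```python
import itertools

class ReadWriteSplitter:
    """Hypothetical sketch of a read-write separation policy (not
    ShardingJDBC's configuration mechanism): writes go to the primary
    data source, reads rotate across the replicas."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.primary

router = ReadWriteSplitter("ds_primary", ["ds_replica_0", "ds_replica_1"])
```

In ShardingJDBC this routing is driven by the declared data-source rules rather than by application code, so the split stays invisible to business logic.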
Accordingly, although introducing ShardingJDBC is non-intrusive to the service code (sharding is completed simply by introducing a JAR package and modifying a configuration file, without modifying any business logic), how to shard the data remains an important technical problem in concrete data processing. The existing data-sharding strategy splits data based on manual experience, but when facing unfamiliar data, manual experience cannot shard reasonably based on the internal information and structure of the data, which affects the use of the downstream database.
In recent years, deep learning and neural networks have been widely used in the fields of computer vision, natural language processing, text signal processing, and the like. In addition, deep learning and neural networks have also shown levels approaching and even exceeding humans in the fields of image classification, object detection, semantic segmentation, text translation, and the like.
The development of deep learning and neural networks provides new solutions and schemes for reasonable data slicing based on internal information of data and the structure of the data itself.
Specifically, the technical scheme of the application adopts an artificial intelligence technique based on deep learning: the mass asset repayment plan data is segmented according to an alternative segmentation scheme, the global semantic understanding features of the segmented sub-data are extracted, and the expression of each sub-data's semantic understanding features is strengthened through the semantic topological association features among them, so that the semantic understanding accuracy for the mass asset repayment plan data is improved and, in turn, the accuracy of the rationality judgment of the alternative segmentation scheme. As a result, the mass asset repayment plan data can be sharded reasonably, improving the efficiency of the downstream database.
More specifically, in the technical scheme of the application, mass data of the asset repayment plan to be segmented is first obtained. Then, in order to examine the rationality of sharding this data and thereby improve the efficiency of the downstream database, the mass data of the asset repayment plan to be segmented is further segmented based on a first alternative segmentation scheme to obtain a plurality of segmented sub-data.
Next, considering that each of the plurality of segmented sub-data is composed of a number of words and data items, each carrying correlated semantic understanding features, the technical scheme of the application encodes each segmented sub-data with a Transformer-based context encoder to extract the feature information of each sub-data under global context semantic understanding, yielding a plurality of segmented sub-data semantic understanding feature vectors. That is, following the Transformer idea, the encoder captures long-distance context dependencies and performs global context semantic encoding on each segmented sub-data, taking the overall semantic association of the words within it as the context background, to obtain the context semantic association feature representations, i.e., the plurality of segmented sub-data semantic understanding feature vectors. It should be understood that the Transformer-based context encoder can capture, for each word in a segmented sub-data, the semantic understanding features of that word relative to all other words in the same sub-data, that is, the global high-dimensional semantic understanding feature information of each segmented sub-data.
Further, considering that the semantic understanding features of the plurality of segmented sub-data are correlated with one another, and in order to improve the accuracy of judging the rationality of the segmentation scheme, the semantic space topological association features among the segmented sub-data are further used to enhance the expression of the semantic understanding features of each segmented sub-data in the mass asset repayment plan data to be segmented. Specifically, the Euclidean distance between every two of the plurality of segmented sub-data semantic understanding feature vectors is calculated, so as to represent the similarity-association feature distribution among the semantic understanding features of the segmented sub-data, thereby obtaining a semantic space topology matrix. Then, feature mining is performed on the semantic space topology matrix in a convolutional neural network model serving as a feature extractor, so as to extract the semantic space topological association features among the semantic understanding features of the segmented sub-data, thereby obtaining a semantic space distribution topology feature matrix.
Then, taking the plurality of segmented sub-data semantic understanding feature vectors as the feature representations of nodes, and the semantic space distribution topology feature matrix as the feature representation of edges between nodes, the global segmented sub-data semantic understanding feature matrix obtained by two-dimensional arrangement of the segmented sub-data semantic understanding feature vectors is passed, together with the semantic space distribution topology feature matrix, through a graph neural network model to obtain a topological global segmented sub-data semantic understanding feature matrix. Specifically, the graph neural network model encodes this graph-structured data through learnable neural network parameters, yielding a topological global segmented sub-data semantic understanding feature matrix that contains both the irregular semantic space topological association features and the high-dimensional semantic understanding feature information of each segmented sub-data.
Then, the classification feature vector is further classified by a classifier to obtain a classification result indicating whether the first alternative segmentation scheme is reasonable. That is, in the technical solution of the present application, the labels of the classifier are "the first alternative segmentation scheme is reasonable" and "the first alternative segmentation scheme is not reasonable", and the classifier determines, through a Softmax function, which label the classification feature vector belongs to. In this way, the rationality of the alternative segmentation scheme can be accurately judged, reasonable data slicing of the mass asset repayment plan data can be performed based on the internal information and structure of the data, and the use efficiency of the subsequent database can be improved.
In particular, in the technical solution of the present application, when the global segmented sub-data semantic understanding feature matrix and the semantic space distribution topology feature matrix are passed through the graph neural network model, each row vector of the resulting topological global segmented sub-data semantic understanding feature matrix represents the context-encoded semantics of a single segmented sub-data under the semantic space topology. Consequently, the spliced topological global segmented sub-data semantic understanding feature matrix may exhibit poor dependence on a single classification result when classified by the classifier, which affects the accuracy of the classification result.
Therefore, vector-norm Hilbert probability spatialization is applied to the topological global segmented sub-data semantic understanding feature matrix, which is specifically expressed as follows:

v_i' = exp(v_i) / ||V||_2^2

wherein V is the feature vector obtained by unfolding the topological global segmented sub-data semantic understanding feature matrix, ||V||_2 denotes its two-norm, ||V||_2^2 denotes the square thereof, i.e., the inner product of the feature vector with itself, v_i is the i-th feature value of V, and v_i' is the i-th feature value of the optimized feature vector V'.

Here, the vector-norm Hilbert probability spatialization maps the unfolded feature vector V into the Hilbert space defined by the vector inner product <V, V>, and reduces the hidden disturbance that the special local distribution of V exerts on the class representation of the overall Hilbert space topology, thereby increasing the robustness with which the feature distribution of V converges to the classification regression of a predetermined classification probability, while promoting, through the establishment of a metric-induced probability space structure, the long-range dependence of the feature distribution of V across the classification results of the classifier. The optimized feature vector V' is then directly classified by the classifier, which improves the dependence of the topological global segmented sub-data semantic understanding feature matrix on the classification result during classification and improves the accuracy of the classification result. In this way, the rationality of the alternative segmentation scheme can be accurately judged, reasonable data slicing of the mass asset repayment plan data can be performed, and the use efficiency of the subsequent database can be improved.
Based on this, the present application provides a method for processing mass asset repayment plan data based on ShardingJDBC, which comprises the following steps: acquiring mass asset repayment plan data to be segmented; segmenting the mass asset repayment plan data to be segmented based on a first alternative segmentation scheme to obtain a plurality of segmented sub-data; passing each of the plurality of segmented sub-data through a transformer-based context encoder to obtain a plurality of segmented sub-data semantic understanding feature vectors; calculating the Euclidean distance between every two of the plurality of segmented sub-data semantic understanding feature vectors to obtain a semantic space topology matrix; passing the semantic space topology matrix through a convolutional neural network model serving as a feature extractor to obtain a semantic space distribution topology feature matrix; performing two-dimensional matrixing on the plurality of segmented sub-data semantic understanding feature vectors to obtain a global segmented sub-data semantic understanding feature matrix; passing the global segmented sub-data semantic understanding feature matrix and the semantic space distribution topology feature matrix through a graph neural network model to obtain a topological global segmented sub-data semantic understanding feature matrix; performing feature distribution optimization on the topological global segmented sub-data semantic understanding feature matrix to obtain a classification feature vector; and passing the classification feature vector through a classifier to obtain a classification result, wherein the classification result is used for indicating whether the first alternative segmentation scheme is reasonable.
Having described the basic principles of the present application, various non-limiting embodiments of the present application will now be described in detail with reference to the accompanying drawings.
Exemplary method
Fig. 1 is a flowchart of a method for processing mass asset repayment plan data based on ShardingJDBC according to an embodiment of the present application. As shown in fig. 1, the method comprises the steps of: S110, acquiring mass asset repayment plan data to be segmented; S120, segmenting the mass asset repayment plan data to be segmented based on a first alternative segmentation scheme to obtain a plurality of segmented sub-data; S130, passing each of the plurality of segmented sub-data through a transformer-based context encoder to obtain a plurality of segmented sub-data semantic understanding feature vectors; S140, calculating the Euclidean distance between every two of the plurality of segmented sub-data semantic understanding feature vectors to obtain a semantic space topology matrix; S150, passing the semantic space topology matrix through a convolutional neural network model serving as a feature extractor to obtain a semantic space distribution topology feature matrix; S160, performing two-dimensional matrixing on the plurality of segmented sub-data semantic understanding feature vectors to obtain a global segmented sub-data semantic understanding feature matrix; S170, passing the global segmented sub-data semantic understanding feature matrix and the semantic space distribution topology feature matrix through a graph neural network model to obtain a topological global segmented sub-data semantic understanding feature matrix; S180, performing feature distribution optimization on the topological global segmented sub-data semantic understanding feature matrix to obtain a classification feature vector; and S190, passing the classification feature vector through a classifier to obtain a classification result, wherein the classification result is used for indicating whether the first alternative segmentation scheme is reasonable.
Fig. 2 is a schematic architecture diagram of the method for processing mass asset repayment plan data based on ShardingJDBC according to an embodiment of the present application. As shown in fig. 2, in the network structure, mass asset repayment plan data to be segmented is first acquired and segmented based on a first alternative segmentation scheme to obtain a plurality of segmented sub-data. Each segmented sub-data is then passed through a transformer-based context encoder to obtain a plurality of segmented sub-data semantic understanding feature vectors. Next, the Euclidean distance between every two of these feature vectors is calculated to obtain a semantic space topology matrix, which is passed through a convolutional neural network model serving as a feature extractor to obtain a semantic space distribution topology feature matrix. The plurality of segmented sub-data semantic understanding feature vectors are then two-dimensionally matrixed into a global segmented sub-data semantic understanding feature matrix, which, together with the semantic space distribution topology feature matrix, is passed through a graph neural network model to obtain a topological global segmented sub-data semantic understanding feature matrix. Feature distribution optimization is performed on this matrix to obtain a classification feature vector, which is finally passed through a classifier to obtain a classification result indicating whether the first alternative segmentation scheme is reasonable.
Specifically, in step S110 and step S120, mass asset repayment plan data to be segmented is obtained, and the data is segmented based on a first alternative segmentation scheme to obtain a plurality of segmented sub-data. It should be appreciated that the introduction of ShardingJDBC is non-intrusive to the service code: splitting can be accomplished simply by introducing the jar package and modifying the configuration file, without modifying any service code logic. How to perform the data slicing itself, however, remains an important technical problem. The existing data slicing strategy slices data based on human experience, but when facing unfamiliar data, human experience cannot slice the data reasonably based on its internal information and structure, which affects the use of the subsequent database. Therefore, in order to evaluate the rationality of slicing the mass asset repayment plan data to be segmented, and thereby improve the use efficiency of the subsequent database, the technical solution of the present application segments the data based on a first alternative segmentation scheme to obtain a plurality of segmented sub-data.
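As a hedged illustration of "introducing the jar package and modifying the configuration file", a candidate segmentation scheme such as the first alternative scheme above might be expressed as a sharding rule. The table name, data-source names, and sharding columns below are hypothetical, and the key layout follows the Apache ShardingSphere-JDBC 5.x YAML convention, which may differ in other versions:

```yaml
# Hypothetical sharding rule: 2 data sources x 4 tables for a repayment-plan table.
rules:
- !SHARDING
  tables:
    t_repayment_plan:
      actualDataNodes: ds_${0..1}.t_repayment_plan_${0..3}
      databaseStrategy:
        standard:
          shardingColumn: asset_id
          shardingAlgorithmName: db_inline
      tableStrategy:
        standard:
          shardingColumn: plan_id
          shardingAlgorithmName: table_inline
  shardingAlgorithms:
    db_inline:
      type: INLINE
      props:
        algorithm-expression: ds_${asset_id % 2}
    table_inline:
      type: INLINE
      props:
        algorithm-expression: t_repayment_plan_${plan_id % 4}
```

The method described in this application is what would judge whether such a candidate rule (how many shards, on which columns) is reasonable for the data at hand.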
Specifically, in step S130, each of the plurality of segmented sub-data is passed through a transformer-based context encoder to obtain a plurality of segmented sub-data semantic understanding feature vectors. In the technical solution of the present application, each segmented sub-data is encoded by the transformer-based context encoder so as to extract the associated feature information based on global context semantic understanding in each segmented sub-data, thereby obtaining the plurality of segmented sub-data semantic understanding feature vectors. That is, following the Transformer paradigm, the encoder captures long-distance context dependencies and performs global context semantic encoding on each segmented sub-data, using the overall semantic association of all words in that sub-data as the context background, to obtain a context semantic association feature representation, i.e., the plurality of segmented sub-data semantic understanding feature vectors. It should be understood that the transformer-based context encoder captures, for each word in a given segmented sub-data, the contextual semantic association of that word's semantic understanding features relative to those of every other word in the same sub-data, that is, the global high-dimensional semantic understanding feature information of that segmented sub-data.
Fig. 3 is a flowchart of context encoding in the method for processing mass asset repayment plan data based on ShardingJDBC according to an embodiment of the present application. As shown in fig. 3, the context encoding process includes: S210, performing word segmentation on each of the plurality of segmented sub-data to convert it into a word sequence composed of a plurality of words; S220, mapping each word in the word sequence to a word embedding vector using the embedding layer of the transformer-based context encoder to obtain a sequence of word embedding vectors; S230, performing Transformer-style global context semantic encoding on the sequence of word embedding vectors using the transformer of the transformer-based context encoder to obtain a plurality of global context semantic feature vectors; and S240, concatenating the plurality of global context semantic feature vectors to obtain the plurality of segmented sub-data semantic understanding feature vectors.
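A minimal numpy sketch of steps S210 and S220, assuming a toy whitespace tokenizer and a randomly initialized embedding table (both are illustrative stand-ins, not the patent's actual tokenizer or trained embedding layer):

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 8
vocab = {}          # word -> row index of its embedding
embeddings = []     # embedding table, grown lazily as new words appear

def embed_sequence(text):
    """S210: split the sub-data text into words; S220: map each word
    to its embedding vector, yielding a sequence of word embedding vectors."""
    rows = []
    for word in text.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab)
            embeddings.append(rng.standard_normal(EMBED_DIM))
        rows.append(embeddings[vocab[word]])
    return np.stack(rows)               # shape: (num_words, EMBED_DIM)

# A hypothetical repayment-plan record rendered as text.
seq = embed_sequence("repay 500 on 2023-06-01 then 500 on 2023-07-01")
print(seq.shape)                        # (8, 8): eight words, eight dims each
```

Repeated words (here "500" and "on") map to the same embedding row, which is what lets the later attention step relate recurring amounts and dates across the record.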
Wherein performing Transformer-style global context semantic encoding on the sequence of word embedding vectors using the transformer of the transformer-based context encoder to obtain the plurality of global context semantic feature vectors includes: arranging the sequence of word embedding vectors one-dimensionally to obtain a global word feature vector; calculating the product between the global word feature vector and the transpose of each word vector in the sequence of word embedding vectors to obtain a plurality of self-attention association matrices; performing standardization on each of the plurality of self-attention association matrices to obtain a plurality of standardized self-attention association matrices; passing each standardized self-attention association matrix through a Softmax classification function to obtain a plurality of probability values; weighting each word vector in the sequence of word embedding vectors with the corresponding probability value to obtain a plurality of context semantic feature vectors; and concatenating the plurality of context semantic feature vectors to obtain the plurality of global context semantic feature vectors.
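The steps above are a patent-style paraphrase of self-attention. One common concrete reading (a sketch, not necessarily the exact computation intended) is single-head scaled dot-product self-attention over the word embedding sequence, without learned projections:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """X: (num_words, dim) word embeddings.
    scores[i, j] = <x_i, x_j> / sqrt(dim)  -- products between word vectors,
    standardized by sqrt(dim); Softmax turns each row into probabilities;
    each output row is the probability-weighted sum of all word vectors,
    i.e. a context semantic feature vector for that word."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)
    weights = softmax(scores, axis=1)
    return weights @ X

X = np.random.default_rng(1).standard_normal((5, 8))  # 5 words, 8-dim embeddings
ctx = self_attention(X)
print(ctx.shape)                                      # (5, 8)
```

A production transformer adds learned query/key/value projections, multiple heads, and feed-forward layers on top of this core.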
Specifically, in step S140, the Euclidean distance between every two of the plurality of segmented sub-data semantic understanding feature vectors is calculated to obtain a semantic space topology matrix. In the technical solution of the present application, in order to improve the accuracy of judging the rationality of the segmentation scheme, the semantic understanding features of each segmented sub-data in the mass asset repayment plan data to be segmented are further enhanced by the semantic space topological association features among the segmented sub-data. Specifically, the Euclidean distance between every two of the plurality of segmented sub-data semantic understanding feature vectors is calculated, so as to represent the similarity-association feature distribution among the semantic understanding features of the segmented sub-data, thereby obtaining the semantic space topology matrix. In a specific example of the present application, this includes: calculating the Euclidean distances between every two of the plurality of segmented sub-data semantic understanding feature vectors according to the following formula to obtain a plurality of Euclidean distances;
wherein ,representation and->Respectively representing any two of the plurality of sub-data semantic understanding feature vectors, +.>Representing calculating Euclidean distance between any two of the plurality of sub-data semantic understanding feature vectors,/for the sub-data semantic understanding feature vectors> and />Respectively representing the characteristic values of all positions of any two of the plurality of sub-data semantic understanding characteristic vectors; and matrixing the plurality of Euclidean distances to obtain the semantic space topology matrix.
Specifically, in step S150, the semantic space topology matrix is passed through a convolutional neural network model serving as a feature extractor to obtain a semantic space distribution topology feature matrix. That is, feature mining is performed on the semantic space topology matrix in the convolutional neural network model, so as to extract the semantic space topological association features among the semantic understanding features of the segmented sub-data, thereby obtaining the semantic space distribution topology feature matrix. In one specific example, the convolutional neural network includes a plurality of cascaded neural network layers, each comprising a convolution layer, a pooling layer, and an activation layer. During encoding, each layer of the convolutional neural network, in its forward pass, performs convolution-kernel-based convolution on the input data using the convolution layer, pools the convolution feature map output by the convolution layer using the pooling layer, and activates the pooled feature map output by the pooling layer using the activation layer.
Fig. 4 is a flowchart of convolutional neural network encoding in the method for processing mass asset repayment plan data based on ShardingJDBC according to an embodiment of the present application. As shown in fig. 4, each layer of the convolutional neural network model serving as the feature extractor performs, in its forward pass, the following operations on the input data: S310, performing convolution on the input data to obtain a convolution feature map; S320, pooling the convolution feature map along the channel dimension to obtain a pooled feature map; and S330, performing nonlinear activation on the pooled feature map to obtain an activation feature map; wherein the output of the last layer of the convolutional neural network serving as the feature extractor is the semantic space distribution topology feature matrix, and the input of the first layer is the semantic space topology matrix.
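A minimal numpy rendering of one conv → pool → activation layer from Fig. 4. The 3×3 kernel, 2×2 mean pooling, and ReLU activation are assumptions for illustration; the patent does not fix kernel sizes, pooling type, or the activation function:

```python
import numpy as np

def conv2d_valid(x, kernel):
    """'Valid' 2-D convolution (cross-correlation, as in most DL libraries)."""
    kh, kw = kernel.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (x[i:i + kh, j:j + kw] * kernel).sum()
    return out

def pool2x2(x):
    """2x2 mean pooling (truncating odd edges)."""
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x[:2 * h, :2 * w].reshape(h, 2, w, 2).mean(axis=(1, 3))

def cnn_layer(x, kernel):
    fmap = conv2d_valid(x, kernel)   # S310: convolution feature map
    pooled = pool2x2(fmap)           # S320: pooled feature map
    return np.maximum(pooled, 0.0)   # S330: nonlinear activation (ReLU, assumed)

rng = np.random.default_rng(3)
topo = rng.standard_normal((9, 9))   # stand-in semantic space topology matrix
out = cnn_layer(topo, rng.standard_normal((3, 3)))
print(out.shape)                     # conv: (7, 7) -> pool: (3, 3)
```

Stacking several such layers, with the last layer's output taken as the semantic space distribution topology feature matrix, matches the cascade described above.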
Specifically, in step S160 and step S170, the plurality of segmented sub-data semantic understanding feature vectors are two-dimensionally matrixed to obtain a global segmented sub-data semantic understanding feature matrix, and the global segmented sub-data semantic understanding feature matrix and the semantic space distribution topology feature matrix are passed through a graph neural network model to obtain a topological global segmented sub-data semantic understanding feature matrix. In the technical solution of the present application, the segmented sub-data semantic understanding feature vectors serve as the feature representations of nodes, and the semantic space distribution topology feature matrix serves as the feature representation of edges between nodes; the global segmented sub-data semantic understanding feature matrix obtained by two-dimensional arrangement of the feature vectors is passed, together with the semantic space distribution topology feature matrix, through the graph neural network model to obtain the topological global segmented sub-data semantic understanding feature matrix. Specifically, the graph neural network model encodes this graph-structured data through learnable neural network parameters, yielding a topological global segmented sub-data semantic understanding feature matrix that contains both the irregular semantic space topological association features and the high-dimensional semantic understanding feature information of each segmented sub-data.
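One common concrete form of "encoding graph-structured data through learnable neural network parameters" is a single graph-convolution step Z = σ(Â X W). This is an assumed reading (the patent does not name a specific GNN), and converting the edge feature matrix into affinities via exp(−D) is likewise an illustrative choice:

```python
import numpy as np

def graph_encode(X, D, W):
    """X: (n, d) node features (sub-data semantic understanding vectors);
    D: (n, n) edge features (semantic space distances between nodes);
    W: (d, d_out) learnable neural network parameter.
    Closer nodes receive larger edge weights, so each node's new feature
    mixes in the features of its semantically nearest neighbours."""
    A = np.exp(-D)                               # distance -> affinity (assumption)
    A_hat = A / A.sum(axis=1, keepdims=True)     # row-normalized adjacency
    return np.tanh(A_hat @ X @ W)                # one graph-convolution step

rng = np.random.default_rng(5)
X = rng.standard_normal((4, 8))                  # 4 nodes, 8-dim features
D = np.abs(rng.standard_normal((4, 4)))
D = (D + D.T) / 2                                # symmetric distances
np.fill_diagonal(D, 0)
W = rng.standard_normal((8, 8))
Z = graph_encode(X, D, W)                        # topological global feature matrix
print(Z.shape)                                   # (4, 8)
```

Stacking such steps, or using message passing with explicit edge embeddings, are the usual ways to deepen this encoding.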
Specifically, in step S180, feature distribution optimization is performed on the topological global segmented sub-data semantic understanding feature matrix to obtain a classification feature vector. In particular, when the global segmented sub-data semantic understanding feature matrix and the semantic space distribution topology feature matrix are passed through the graph neural network model, each row vector of the resulting topological global segmented sub-data semantic understanding feature matrix represents the context-encoded semantics of a single segmented sub-data under the semantic space topology. Consequently, the spliced topological global segmented sub-data semantic understanding feature matrix may exhibit poor dependence on a single classification result when classified by the classifier, which affects the accuracy of the classification result. Therefore, vector-norm Hilbert probability spatialization is applied to the topological global segmented sub-data semantic understanding feature matrix, which is specifically expressed as follows:
v_i' = exp(v_i) / ||V||_2^2

wherein V is the unfolded feature vector, ||V||_2 denotes the two-norm of the unfolded feature vector, ||V||_2^2 denotes the square of the two-norm, v_i is the i-th feature value of the unfolded feature vector, exp(·) denotes the exponential operation on a vector, i.e., calculating the natural exponential function value raised to the power of the feature value at each position of the vector, and v_i' is the i-th feature value of the classification feature vector. Here, the vector-norm Hilbert probability spatialization maps the feature vector V obtained by unfolding the topological global segmented sub-data semantic understanding feature matrix into the Hilbert space defined by the vector inner product <V, V>, and reduces the hidden disturbance that the special local distribution of V exerts on the class representation of the overall Hilbert space topology, thereby increasing the robustness with which the feature distribution of V converges to the classification regression of a predetermined classification probability, while promoting, through the establishment of a metric-induced probability space structure, the long-range dependence of the feature distribution of V across the classification results of the classifier. The optimized feature vector V' is then directly classified by the classifier, which improves the dependence of the topological global segmented sub-data semantic understanding feature matrix on the classification result during classification and improves the accuracy of the classification result. In this way, the rationality of the alternative segmentation scheme can be accurately judged, reasonable data slicing of the mass asset repayment plan data can be performed, and the use efficiency of the subsequent database can be improved.
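The optimization v_i' = exp(v_i) / ||V||_2^2 (the exact formula is reassembled from the variable definitions in the text, so this form is an inference) reduces to a one-liner over the unfolded matrix:

```python
import numpy as np

def hilbert_prob_spatialization(v):
    """v'_i = exp(v_i) / ||v||_2^2: scale the exponentiated feature values by
    the self inner product <v, v> (the squared two-norm) of the unfolded vector."""
    return np.exp(v) / float(v @ v)

M = np.random.default_rng(6).standard_normal((4, 8))  # topological feature matrix
v = M.reshape(-1)                                     # unfold matrix to a vector
v_opt = hilbert_prob_spatialization(v)                # classification feature vector
print(v_opt.shape)                                    # (32,), all entries positive
```

All entries of the optimized vector are strictly positive, consistent with mapping the features toward a probability-like representation before classification.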
Specifically, in step S190, the classification feature vector is passed through a classifier to obtain a classification result, where the classification result is used to indicate whether the first alternative segmentation scheme is reasonable. That is, the classification feature vector is classified in the classifier to obtain the classification result. In a specific example of the present application, this includes: processing the classification feature vector using the classifier to obtain the classification result with the following formula:
O = softmax{(W_n, B_n) : ⋯ : (W_1, B_1) | X}, wherein W_1 to W_n are weight matrices, B_1 to B_n are bias vectors, and X is the classification feature vector. Specifically, the classifier includes a plurality of fully connected layers and a Softmax layer cascaded with the last of the fully connected layers. In the classification processing, the classification feature vector is first fully-connection encoded by the plurality of fully connected layers to obtain an encoded classification feature vector; the encoded classification feature vector is then input into the Softmax layer of the classifier, that is, classified by the Softmax classification function, to obtain a classification result indicating whether the first alternative segmentation scheme is reasonable. That is, in the technical solution of the present application, the labels of the classifier are "the first alternative segmentation scheme is reasonable" and "the first alternative segmentation scheme is not reasonable", and the classifier determines, through the Softmax function, which label the classification feature vector belongs to. In this way, the rationality of the alternative segmentation scheme can be accurately judged, reasonable data slicing can then be performed on the mass asset repayment plan data based on the internal information and structure of the data, and the use efficiency of the subsequent database is improved.
Fig. 5 is a flowchart of the classification procedure in the method for processing mass asset repayment plan data based on ShardingJDBC according to an embodiment of the present application. As shown in fig. 5, the classification process includes: S410, performing full-connection encoding on the classification feature vector using the plurality of fully connected layers of the classifier to obtain an encoded classification feature vector; and S420, passing the encoded classification feature vector through the Softmax classification function of the classifier to obtain the classification result.
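Steps S410 and S420 reduce to a stack of fully connected layers followed by a Softmax over the two labels. The two-layer depth, tanh nonlinearity, and dimensions here are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def classify(v, layers):
    """layers: list of (W, b) pairs. Hidden layers use tanh (assumed);
    the last layer's output goes through Softmax to give class probabilities."""
    *hidden, (W_out, b_out) = layers
    for W, b in hidden:
        v = np.tanh(W @ v + b)          # S410: full-connection encoding
    return softmax(W_out @ v + b_out)   # S420: Softmax over the two labels

rng = np.random.default_rng(7)
v = rng.standard_normal(32)             # classification feature vector
layers = [(rng.standard_normal((16, 32)), rng.standard_normal(16)),
          (rng.standard_normal((2, 16)), rng.standard_normal(2))]
probs = classify(v, layers)             # [P(reasonable), P(not reasonable)]
print(probs.shape)                      # (2,), entries sum to 1
```

The predicted label is simply the argmax of the two probabilities, matching the two-label scheme ("reasonable" / "not reasonable") described above.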
In summary, the ShardingJDBC-based method for processing mass asset repayment plan data according to the embodiment of the present application has been clarified. By adopting a deep-learning-based artificial intelligence technique, an alternative segmentation scheme is used to segment the mass asset repayment plan data so as to extract its global segmented sub-data semantic understanding features, and the expression of the semantic understanding features of each segmented sub-data is further enhanced by the semantic topological association features among them, thereby improving the semantic understanding accuracy of the mass asset repayment plan data and, in turn, the accuracy of judging the rationality of the alternative segmentation scheme. In this way, reasonable data slicing can be performed on the mass asset repayment plan data, further improving the use efficiency of the subsequent database.
Exemplary System
FIG. 6 is a block diagram of a system for processing mass asset repayment plan data based on ShardingJDBC according to an embodiment of the present application. As shown in fig. 6, the system 300 for processing mass asset repayment plan data based on ShardingJDBC according to an embodiment of the present application includes: a data acquisition module 310; a segmentation module 320; a context encoding module 330; a Euclidean distance calculation module 340; a convolution module 350; a two-dimensional matrixing module 360; a graph neural network module 370; a feature distribution optimization module 380; and a classification result generation module 390.
The data acquisition module 310 is configured to obtain mass asset repayment plan data to be segmented; the segmentation module 320 is configured to segment the mass asset repayment plan data to be segmented based on a first alternative segmentation scheme to obtain a plurality of segmented sub-data; the context encoding module 330 is configured to pass each of the plurality of segmented sub-data through a transformer-based context encoder to obtain a plurality of segmented sub-data semantic understanding feature vectors; the Euclidean distance calculation module 340 is configured to calculate the Euclidean distance between every two of the plurality of segmented sub-data semantic understanding feature vectors to obtain a semantic space topology matrix; the convolution module 350 is configured to pass the semantic space topology matrix through a convolutional neural network model serving as a feature extractor to obtain a semantic space distribution topology feature matrix; the two-dimensional matrixing module 360 is configured to two-dimensionally matrix the plurality of segmented sub-data semantic understanding feature vectors to obtain a global segmented sub-data semantic understanding feature matrix; the graph neural network module 370 is configured to pass the global segmented sub-data semantic understanding feature matrix and the semantic space distribution topology feature matrix through a graph neural network model to obtain a topological global segmented sub-data semantic understanding feature matrix; the feature distribution optimization module 380 is configured to perform feature distribution optimization on the topological global segmented sub-data semantic understanding feature matrix to obtain a classification feature vector; and the classification result generation module 390 is configured to pass the classification feature vector through a classifier to obtain a classification result, wherein the classification result is used to indicate whether the first alternative segmentation scheme is reasonable.
In one example, in the above-described sharingJDBC-based asset repayment plan mass data processing system 300, the context encoding module 330 is further configured to: perform word segmentation processing on each of the plurality of segmentation sub-data to convert each segmentation sub-data into a word sequence composed of a plurality of words; map each word in the word sequence into a word embedding vector by using an embedding layer of the Transformer-based context encoder to obtain a sequence of word embedding vectors; perform Transformer-style global context semantic encoding on the sequence of word embedding vectors by using the Transformer of the Transformer-based context encoder to obtain a plurality of global context semantic feature vectors; and cascade (concatenate) the plurality of global context semantic feature vectors to obtain the plurality of segmentation sub-data semantic understanding feature vectors. Here, the Transformer-style global context semantic encoding includes: arranging the sequence of word embedding vectors one-dimensionally to obtain a global word feature vector; calculating the product between the global word feature vector and the transpose vector of each word embedding vector in the sequence to obtain a plurality of self-attention association matrices; standardizing each of the plurality of self-attention association matrices to obtain a plurality of standardized self-attention association matrices; passing each of the plurality of standardized self-attention association matrices through a Softmax classification function to obtain a plurality of probability values; weighting each word embedding vector in the sequence by the corresponding probability value to obtain a plurality of context semantic feature vectors; and cascading the plurality of context semantic feature vectors to obtain the plurality of global context semantic feature vectors.
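The self-attention steps above (arranging the embeddings, scoring against transposed word vectors, standardizing, Softmax weighting, cascading) can be sketched in NumPy. This is a simplified single-head illustration, not the patent's actual implementation; all names are illustrative, and the division by the square root of the embedding dimension is one common standardization choice assumed here:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def context_encode(word_embeddings):
    """Simplified global-context encoding of a sequence of word
    embedding vectors: score each word against the whole sequence,
    standardize, map the scores to probability values with softmax,
    re-weight the word vectors, and cascade into one feature vector."""
    X = np.asarray(word_embeddings)        # (seq_len, dim)
    scores = X @ X.T                       # products with transposed word vectors
    scores = scores / np.sqrt(X.shape[1])  # standardization (assumed: scale by sqrt(dim))
    weights = softmax(scores)              # probability values per position
    contextual = weights @ X               # weighted word vectors
    return contextual.reshape(-1)          # cascade (concatenate) into one vector

# illustrative usage with a random 5-word, 8-dimensional embedding sequence
emb = np.random.default_rng(0).normal(size=(5, 8))
vec = context_encode(emb)
```

A production encoder would add learned query/key/value projections and positional encodings; the sketch keeps only the scoring-and-weighting core described in the text.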
In one example, in the above-mentioned sharingJDBC-based asset repayment plan mass data processing system 300, the Euclidean distance calculation module 340 is further configured to: calculate the Euclidean distance between every two of the plurality of segmentation sub-data semantic understanding feature vectors according to the following formula to obtain a plurality of Euclidean distances;
$$d(V_a, V_b) = \sqrt{\sum_{k}\left(v_{a,k} - v_{b,k}\right)^{2}}$$

wherein $V_a$ and $V_b$ respectively represent any two of the plurality of segmentation sub-data semantic understanding feature vectors, $d(V_a, V_b)$ represents the Euclidean distance between the two, and $v_{a,k}$ and $v_{b,k}$ respectively represent the feature values of each position $k$ of the two feature vectors; and matrixing the plurality of Euclidean distances to obtain the semantic space topology matrix.
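The pairwise-distance construction of the semantic space topology matrix can be sketched as follows; the function and variable names are illustrative:

```python
import numpy as np

def semantic_space_topology(vectors):
    """Pairwise Euclidean distances between the segmentation sub-data
    semantic understanding feature vectors, matrixed into the
    semantic space topology matrix described in the text."""
    V = np.asarray(vectors, dtype=float)      # (n, dim)
    diff = V[:, None, :] - V[None, :, :]      # broadcasted pairwise differences
    return np.sqrt((diff ** 2).sum(axis=-1))  # (n, n) symmetric distance matrix

# illustrative usage: three 2-D "semantic" vectors on a line
vecs = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
topo = semantic_space_topology(vecs)
# topo[0, 1] == 5.0; the matrix is symmetric with a zero diagonal
```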
In one example, in the above-described sharingJDBC-based asset repayment plan mass data processing system 300, the convolution module 350 is further configured to: use each layer of the convolutional neural network model serving as the feature extractor to perform, in the forward pass of the layer, the following operations on the input data: convolving the input data to obtain a convolution feature map; pooling the convolution feature map along the channel dimension to obtain a pooled feature map; and applying a nonlinear activation to the pooled feature map to obtain an activation feature map; wherein the output of the last layer of the convolutional neural network serving as the feature extractor is the semantic space distribution topology feature matrix, and the input of the first layer of the convolutional neural network serving as the feature extractor is the semantic space topology matrix.
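One layer of the described forward pass (convolution, channel-dimension pooling, nonlinear activation) might look like the sketch below. This is a minimal NumPy illustration with valid padding and stride 1, not the patent's architecture; mean pooling and ReLU are assumed choices:

```python
import numpy as np

def conv_layer_forward(x, kernels):
    """One forward pass of the described layer: convolution, pooling
    along the channel dimension, then nonlinear activation.
    x: (C_in, H, W); kernels: (C_out, C_in, k, k)."""
    c_out, c_in, k, _ = kernels.shape
    h, w = x.shape[1] - k + 1, x.shape[2] - k + 1
    feat = np.zeros((c_out, h, w))
    for o in range(c_out):                      # convolution feature map
        for i in range(h):
            for j in range(w):
                feat[o, i, j] = (x[:, i:i + k, j:j + k] * kernels[o]).sum()
    pooled = feat.mean(axis=0, keepdims=True)   # pool along the channel dimension
    return np.maximum(pooled, 0.0)              # nonlinear activation (ReLU)

# illustrative usage on a 6x6 single-channel "topology matrix" patch
rng = np.random.default_rng(1)
out = conv_layer_forward(rng.normal(size=(1, 6, 6)), rng.normal(size=(4, 1, 3, 3)))
```

A real feature extractor would use an optimized convolution from a deep-learning framework; the explicit loops here only make the arithmetic visible.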
In one example, in the above-described sharingJDBC-based asset repayment plan mass data processing system 300, the graph neural network module 370 is further configured to: process the global segmentation sub-data semantic understanding feature matrix and the semantic space distribution topology feature matrix with the learnable neural network parameters of the graph neural network to obtain the topological global segmentation sub-data semantic understanding feature matrix, which contains both the irregular semantic space topological association features and the high-dimensional semantic understanding feature information of each segmentation sub-data.
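A graph-convolution-style update is one plausible reading of this step: each node (one segmentation sub-data feature vector) aggregates its neighbors' features under the topology matrix, through a learnable weight. The patent does not specify the exact graph neural network variant, so the GCN-style sketch below, with its adjacency construction and tanh nonlinearity, is an assumption:

```python
import numpy as np

def gnn_layer(node_features, adjacency, weight):
    """One GCN-style update: row-normalized adjacency (with self-loops)
    aggregates neighbor features, then a learnable linear transform and
    nonlinearity produce the topology-aware feature matrix."""
    A = adjacency + np.eye(adjacency.shape[0])       # add self-loops
    A_norm = A / A.sum(axis=1, keepdims=True)        # row-normalize
    return np.tanh(A_norm @ node_features @ weight)  # aggregate + transform

# illustrative usage: 4 sub-data nodes with 8-D semantic features
rng = np.random.default_rng(2)
F = rng.normal(size=(4, 8))                      # global feature matrix
A = (rng.random((4, 4)) < 0.5).astype(float)     # e.g. thresholded distance matrix
A = np.maximum(A, A.T)                           # symmetrize
W = rng.normal(size=(8, 8))                      # learnable parameters
out = gnn_layer(F, A, W)
```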
In one example, in the above-described sharingJDBC-based asset repayment plan mass data processing system 300, the feature distribution optimization module 380 is further configured to: perform matrix expansion on the topological global segmentation sub-data semantic understanding feature matrix to obtain an expansion feature vector; and perform vector-normed Hilbert probability spatialization on the expansion feature vector according to the following formula to obtain the classification feature vector:
$$v_i' = \frac{\exp\!\left(v_i / \lVert V \rVert_2^2\right)}{\sum_{j}\exp\!\left(v_j / \lVert V \rVert_2^2\right)}$$

wherein $V$ is the expansion feature vector, $\lVert V \rVert_2$ represents the two-norm of the expansion feature vector and $\lVert V \rVert_2^2$ the square of the two-norm, $v_i$ is the $i$-th feature value of the expansion feature vector, $\exp(\cdot)$ denotes the exponential operation of a vector, that is, calculating a natural exponential function value raised to the power of the feature value of each position in the vector, and $v_i'$ is the $i$-th feature value of the classification feature vector.
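The normalization can be sketched directly from the symbols defined above. Because the formula placeholders are garbled in the source, the code follows one plausible reading (an exponential map over norm-scaled feature values, renormalized into a probability distribution); the function name is illustrative:

```python
import numpy as np

def hilbert_probability_spatialization(v):
    """Scale each feature value by the squared two-norm of the
    expansion feature vector, then exponentiate and renormalize so the
    classification feature vector forms a probability distribution."""
    v = np.asarray(v, dtype=float)
    scaled = v / (np.linalg.norm(v) ** 2)  # divide by ||V||_2 squared
    e = np.exp(scaled - scaled.max())      # stable exponential operation
    return e / e.sum()                     # probability spatialization

# illustrative usage
out = hilbert_probability_spatialization([1.0, 2.0, 3.0])
# out sums to 1 and preserves the ordering of the inputs
```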
In one example, in the above-mentioned sharingJDBC-based asset repayment plan mass data processing system 300, the classification result generating module 390 is further configured to: perform full-connection coding on the classification feature vector using a plurality of fully connected layers of the classifier to obtain a coded classification feature vector; and pass the coded classification feature vector through a Softmax classification function of the classifier to obtain the classification result.
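The classifier stage (fully connected encoding followed by Softmax) can be sketched as follows; the layer shapes and ReLU between layers are illustrative stand-ins, not the trained model:

```python
import numpy as np

def classify(feature_vec, layers):
    """Full-connection coding through a list of (W, b) layer pairs,
    then a softmax producing the two-class result: whether the first
    alternative segmentation scheme is reasonable (1) or not (0)."""
    h = np.asarray(feature_vec, dtype=float)
    for W, b in layers[:-1]:
        h = np.maximum(h @ W + b, 0.0)  # fully connected layer + ReLU
    W, b = layers[-1]
    logits = h @ W + b
    e = np.exp(logits - logits.max())
    probs = e / e.sum()                 # Softmax classification function
    return int(probs.argmax()), probs

# illustrative usage: an 8-D classification feature vector, two classes
rng = np.random.default_rng(3)
layers = [(rng.normal(size=(8, 8)), np.zeros(8)),
          (rng.normal(size=(8, 2)), np.zeros(2))]
label, probs = classify(rng.normal(size=8), layers)
```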
In summary, the sharingJDBC-based asset repayment plan mass data processing system 300 according to the embodiment of the present application has been illustrated. By adopting a deep-learning-based artificial intelligence technique, it segments the mass data of the asset repayment plan with an alternative segmentation scheme and extracts the global segmentation sub-data semantic understanding features of that data; the expression of each segmentation sub-data's semantic understanding features is further strengthened through the semantic topological association features among them, which improves the semantic understanding accuracy of the mass data and, in turn, the accuracy of judging whether the alternative segmentation scheme is reasonable. In this way, the mass data of the asset repayment plan can be reasonably sliced, improving the efficiency of subsequent database use.
As described above, the asset repayment plan mass data processing system based on the sharingjdbc according to the embodiment of the present application may be implemented in various terminal devices. In one example, the sharingjdbc-based asset repayment planning mass data processing system 300 according to an embodiment of the present application may be integrated into the terminal device as a software module and/or a hardware module. For example, the sharingjdbc-based asset repayment plan mass data processing system 300 may be a software module in the operating system of the terminal device, or may be an application developed for the terminal device; of course, the sharingjdbc-based asset repayment planning mass data processing system 300 may also be one of a number of hardware modules of the terminal device.
Alternatively, in another example, the sharingJDBC-based asset repayment plan mass data processing system 300 and the terminal device may be separate devices, in which case the system 300 may be connected to the terminal device through a wired and/or wireless network and exchange interaction information in an agreed data format.
Exemplary electronic device
Next, an electronic device according to an embodiment of the present application is described with reference to fig. 7.
Fig. 7 illustrates a block diagram of an electronic device according to an embodiment of the present application.
As shown in fig. 7, the electronic device 10 includes one or more processors 11 and a memory 12.
The processor 11 may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device 10 to perform desired functions.
In one example, the electronic device 10 may further include: an input device 13 and an output device 14, which are interconnected by a bus system and/or other forms of connection mechanisms (not shown).
The input device 13 may include, for example, a keyboard, a mouse, and the like.
The output device 14 may output various information to the outside, including the classification result and the like. The output device 14 may include, for example, a display, speakers, a printer, a communication network and the remote output devices connected to it, and the like.
Of course, for simplicity, only some of the components of the electronic device 10 that are relevant to the present application are shown in fig. 7; components such as buses and input/output interfaces are omitted. In addition, the electronic device 10 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer-readable storage medium
In addition to the methods and apparatus described above, embodiments of the present application may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform steps in the functions of a sharingjdbc-based asset repayment plan mass data processing method described in the above "exemplary methods" section of the present specification, according to various embodiments of the present application.
The computer program product may write program code for performing the operations of embodiments of the present application in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" programming language. The program code may execute entirely on the user's computing device, partly on the user's computing device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium, having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform steps in the functions of a sharingjdbc-based asset repayment plan mass data processing method according to various embodiments of the present application described in the above-described "exemplary methods" section of the present specification.
The computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present application have been described above in connection with specific embodiments, however, it should be noted that the advantages, benefits, effects, etc. mentioned in the present application are merely examples and not limiting, and these advantages, benefits, effects, etc. are not to be considered as necessarily possessed by the various embodiments of the present application. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, as the application is not intended to be limited to the details disclosed herein as such.
The block diagrams of the devices, apparatuses, equipment, and systems referred to in this application are only illustrative examples and are not intended to require or imply that the connections, arrangements, or configurations must be made in the manner shown in the block diagrams. As will be appreciated by one of skill in the art, these devices, apparatuses, equipment, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," and "having" are open-ended and mean "including but not limited to," and are used interchangeably therewith. The term "or" as used herein refers to, and is used interchangeably with, the term "and/or," unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to."
It is also noted that in the apparatus, devices and methods of the present application, the components or steps may be disassembled and/or assembled. Such decomposition and/or recombination should be considered as equivalent to the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the application to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.
Claims (9)
1. An asset repayment plan mass data processing method based on sharingJDBC, characterized by comprising the following steps:
acquiring mass data of an asset repayment plan to be segmented;
segmenting the mass data of the asset repayment plan to be segmented based on a first alternative segmentation scheme to obtain a plurality of segmentation sub-data;
passing each of the plurality of segmentation sub-data through a Transformer-based context encoder to obtain a plurality of segmentation sub-data semantic understanding feature vectors;
calculating the Euclidean distance between every two of the plurality of segmentation sub-data semantic understanding feature vectors to obtain a semantic space topology matrix;
passing the semantic space topology matrix through a convolutional neural network model serving as a feature extractor to obtain a semantic space distribution topology feature matrix;
performing two-dimensional matrixing on the plurality of segmentation sub-data semantic understanding feature vectors to obtain a global segmentation sub-data semantic understanding feature matrix;
passing the global segmentation sub-data semantic understanding feature matrix and the semantic space distribution topology feature matrix through a graph neural network model to obtain a topological global segmentation sub-data semantic understanding feature matrix;
performing feature distribution optimization on the topological global segmentation sub-data semantic understanding feature matrix to obtain a classification feature vector; and
passing the classification feature vector through a classifier to obtain a classification result, wherein the classification result is used to indicate whether the first alternative segmentation scheme is reasonable.
2. The sharingJDBC-based asset repayment plan mass data processing method according to claim 1, wherein passing each of the plurality of segmentation sub-data through a Transformer-based context encoder to obtain a plurality of segmentation sub-data semantic understanding feature vectors comprises:
performing word segmentation processing on each of the plurality of segmentation sub-data to convert each segmentation sub-data into a word sequence composed of a plurality of words;
mapping each word in the word sequence into a word embedding vector by using an embedding layer of the Transformer-based context encoder to obtain a sequence of word embedding vectors;
performing Transformer-style global context semantic encoding on the sequence of word embedding vectors by using the Transformer of the Transformer-based context encoder to obtain a plurality of global context semantic feature vectors; and
cascading the plurality of global context semantic feature vectors to obtain the plurality of segmentation sub-data semantic understanding feature vectors.
3. The sharingJDBC-based asset repayment plan mass data processing method according to claim 2, wherein performing Transformer-style global context semantic encoding on the sequence of word embedding vectors by using the Transformer of the Transformer-based context encoder to obtain a plurality of global context semantic feature vectors comprises:
arranging the sequence of word embedding vectors one-dimensionally to obtain a global word feature vector;
calculating the product between the global word feature vector and the transpose vector of each word embedding vector in the sequence to obtain a plurality of self-attention association matrices;
standardizing each of the plurality of self-attention association matrices to obtain a plurality of standardized self-attention association matrices;
passing each of the plurality of standardized self-attention association matrices through a Softmax classification function to obtain a plurality of probability values;
weighting each word embedding vector in the sequence by the corresponding probability value to obtain a plurality of context semantic feature vectors; and
cascading the plurality of context semantic feature vectors to obtain the plurality of global context semantic feature vectors.
4. The sharingJDBC-based asset repayment plan mass data processing method according to claim 3, wherein calculating the Euclidean distance between every two of the plurality of segmentation sub-data semantic understanding feature vectors to obtain a semantic space topology matrix comprises:
calculating the Euclidean distance between every two of the plurality of segmentation sub-data semantic understanding feature vectors according to the following formula to obtain a plurality of Euclidean distances;
$$d(V_a, V_b) = \sqrt{\sum_{k}\left(v_{a,k} - v_{b,k}\right)^{2}}$$

wherein $V_a$ and $V_b$ respectively represent any two of the plurality of segmentation sub-data semantic understanding feature vectors, $d(V_a, V_b)$ represents the Euclidean distance between the two, and $v_{a,k}$ and $v_{b,k}$ respectively represent the feature values of each position $k$ of the two feature vectors; and
And matrixing the Euclidean distances to obtain the semantic space topology matrix.
5. The sharingJDBC-based asset repayment plan mass data processing method according to claim 4, wherein passing the semantic space topology matrix through a convolutional neural network model serving as a feature extractor to obtain a semantic space distribution topology feature matrix comprises: using each layer of the convolutional neural network model serving as the feature extractor to perform, in the forward pass of the layer, the following operations on the input data:
convolving the input data to obtain a convolution feature map;
pooling the convolution feature map along the channel dimension to obtain a pooled feature map; and
applying a nonlinear activation to the pooled feature map to obtain an activation feature map;
wherein the output of the last layer of the convolutional neural network serving as the feature extractor is the semantic space distribution topology feature matrix, and the input of the first layer of the convolutional neural network serving as the feature extractor is the semantic space topology matrix.
6. The sharingJDBC-based asset repayment plan mass data processing method according to claim 5, wherein passing the global segmentation sub-data semantic understanding feature matrix and the semantic space distribution topology feature matrix through a graph neural network model to obtain a topological global segmentation sub-data semantic understanding feature matrix comprises: processing the global segmentation sub-data semantic understanding feature matrix and the semantic space distribution topology feature matrix with the learnable neural network parameters of the graph neural network to obtain the topological global segmentation sub-data semantic understanding feature matrix, which contains both the irregular semantic space topological association features and the high-dimensional semantic understanding feature information of each segmentation sub-data.
7. The sharingJDBC-based asset repayment plan mass data processing method according to claim 6, wherein performing feature distribution optimization on the topological global segmentation sub-data semantic understanding feature matrix to obtain a classification feature vector comprises:
performing matrix expansion on the topological global segmentation sub-data semantic understanding feature matrix to obtain an expansion feature vector; and
performing vector-normed Hilbert probability spatialization on the expansion feature vector according to the following formula to obtain the classification feature vector:
$$v_i' = \frac{\exp\!\left(v_i / \lVert V \rVert_2^2\right)}{\sum_{j}\exp\!\left(v_j / \lVert V \rVert_2^2\right)}$$

wherein $V$ is the expansion feature vector, $\lVert V \rVert_2$ represents the two-norm of the expansion feature vector and $\lVert V \rVert_2^2$ the square of the two-norm, $v_i$ is the $i$-th feature value of the expansion feature vector, $\exp(\cdot)$ denotes the exponential operation of a vector, that is, calculating a natural exponential function value raised to the power of the feature value of each position in the vector, and $v_i'$ is the $i$-th feature value of the classification feature vector.
8. The sharingJDBC-based asset repayment plan mass data processing method according to claim 7, wherein performing matrix expansion on the topological global segmentation sub-data semantic understanding feature matrix to obtain an expansion feature vector comprises: expanding the topological global segmentation sub-data semantic understanding feature matrix along a row vector or a column vector to obtain the expansion feature vector.
9. The sharingJDBC-based asset repayment plan mass data processing method according to claim 8, wherein passing the classification feature vector through a classifier to obtain a classification result, the classification result being used to indicate whether the first alternative segmentation scheme is reasonable, comprises:
performing full-connection coding on the classification feature vector using a plurality of fully connected layers of the classifier to obtain a coded classification feature vector; and
passing the coded classification feature vector through a Softmax classification function of the classifier to obtain the classification result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310141878.6A CN116150371A (en) | 2023-02-21 | 2023-02-21 | Asset repayment plan mass data processing method based on sharingJDBC |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116150371A true CN116150371A (en) | 2023-05-23 |
Family
ID=86355960
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116454772A (en) * | 2023-06-14 | 2023-07-18 | 浙江浙能迈领环境科技有限公司 | Decompression device and method for medium-voltage distribution cabinet of container |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20230523 |