CN116150371A - ShardingJDBC-based asset repayment plan mass data processing method - Google Patents

Info

Publication number
CN116150371A
CN116150371A
Authority
CN
China
Prior art keywords
data
feature
semantic
semantic understanding
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202310141878.6A
Other languages
Chinese (zh)
Inventor
陈粤龙
朱振华
张献力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangyin Consumer Finance Co ltd
Original Assignee
Hangyin Consumer Finance Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangyin Consumer Finance Co ltd filed Critical Hangyin Consumer Finance Co ltd
Priority to CN202310141878.6A priority Critical patent/CN116150371A/en
Publication of CN116150371A publication Critical patent/CN116150371A/en
Withdrawn legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/353 — Information retrieval of unstructured textual data; clustering or classification into predefined classes
    • G06F 17/16 — Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F 40/289 — Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/30 — Semantic analysis
    • G06N 3/04 — Neural network architecture, e.g. interconnection topology
    • Y02P 90/30 — Computing systems specially adapted for manufacturing


Abstract

The application relates to the field of data processing, and specifically discloses a ShardingJDBC-based method for processing mass asset repayment plan data. The method uses an alternative segmentation scheme to segment the mass asset repayment plan data with deep-learning-based artificial intelligence technology, extracts the global segmentation sub-data semantic understanding features of the data, and further enhances the expression of each segmentation sub-data's semantic understanding features through the semantic topological association features among them, thereby improving the accuracy of semantic understanding of the mass asset repayment plan data and, in turn, the accuracy of judging whether the alternative segmentation scheme is reasonable. In this way, the mass asset repayment plan data can be sliced reasonably, and the efficiency of subsequent database use is improved.

Description

ShardingJDBC-based asset repayment plan mass data processing method
Technical Field
The present application relates to the field of data processing, and more particularly, to a ShardingJDBC-based method for processing mass asset repayment plan data.
Background
Apache ShardingSphere is a distributed database ecosystem that can transform any database into a distributed database, enhancing the original database with capabilities such as data sharding, elastic scaling, and encryption. ShardingJDBC, as a product of Apache ShardingSphere, can be deployed independently and also supports hybrid deployment in combination with other products. It provides standardized incremental functionality using the database as the storage node, and is applicable to a variety of scenarios such as homogeneous Java applications, heterogeneous languages, and cloud-native environments.
Although the introduction of ShardingJDBC is non-intrusive to service code (no service code logic needs to be modified, and sharding can be completed simply by introducing a jar package and modifying a configuration file), how to slice the data is an important technical problem in concrete data processing. The existing data slicing strategy splits data based on manual experience, but when facing unfamiliar data, manual experience cannot slice reasonably based on the internal information of the data and the data's own structure, which affects the use of the subsequent database.
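For context, the kind of slicing rule ShardingJDBC applies once a scheme is configured can be sketched as a simple key-to-physical-table routing function. This is an illustrative sketch only; the table names and the modulo strategy are assumptions, not part of the claimed method:

```python
# Hypothetical sketch of sharding-style table routing: a sharding key is
# hashed (here, by modulo) to map a logical table to one of several
# physical tables. All names and the table count are illustrative.

def route_physical_table(logical_table: str, sharding_key: int, table_count: int = 4) -> str:
    """Map a row of a logical table to a physical shard by key modulo."""
    shard_index = sharding_key % table_count
    return f"{logical_table}_{shard_index}"

# A repayment-plan row with key 10007 lands in shard 3 of 4:
print(route_physical_table("t_repayment_plan", 10007))  # t_repayment_plan_3
```

In ShardingJDBC itself this mapping is declared in the configuration file rather than written in service code, which is what makes the introduction non-intrusive.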
Accordingly, an optimized ShardingJDBC-based asset repayment plan mass data processing scheme is desired.
Disclosure of Invention
The present application has been made in order to solve the above technical problems. Embodiments of the application provide a ShardingJDBC-based method for processing mass asset repayment plan data. The method uses an alternative segmentation scheme to segment the mass asset repayment plan data with deep-learning-based artificial intelligence technology, extracts the global segmentation sub-data semantic understanding features of the data, and further enhances the expression of each segmentation sub-data's semantic understanding features through the semantic topological association features among them, so as to improve the accuracy of semantic understanding of the mass asset repayment plan data and, in turn, the accuracy of judging whether the alternative segmentation scheme is reasonable. In this way, the mass asset repayment plan data can be sliced reasonably, and the efficiency of subsequent database use is improved.
According to one aspect of the present application, there is provided a ShardingJDBC-based asset repayment plan mass data processing method, including:
acquiring mass data of an asset repayment plan to be segmented;
segmenting the mass data of the asset repayment plan to be segmented based on a first alternative segmentation scheme to obtain a plurality of segmented sub data;
passing each of the plurality of segmentation sub-data through a transformer-based context encoder to obtain a plurality of segmentation sub-data semantic understanding feature vectors;
calculating the Euclidean distance between every two of the plurality of segmentation sub-data semantic understanding feature vectors to obtain a semantic space topology matrix;
the semantic space topology matrix is passed through a convolutional neural network model serving as a feature extractor to obtain a semantic space distribution topology feature matrix;
performing two-dimensional matrixing on the plurality of segmentation sub-data semantic understanding feature vectors to obtain a global segmentation sub-data semantic understanding feature matrix;
passing the global segmentation sub-data semantic understanding feature matrix and the semantic space distribution topology feature matrix through a graph neural network model to obtain a topological global segmentation sub-data semantic understanding feature matrix;
performing feature distribution optimization on the topological global segmentation sub-data semantic understanding feature matrix to obtain a classification feature vector; and
passing the classification feature vector through a classifier to obtain a classification result, where the classification result is used to indicate whether the first alternative segmentation scheme is reasonable.
In the above ShardingJDBC-based asset repayment plan mass data processing method, passing each of the plurality of segmentation sub-data through a transformer-based context encoder to obtain a plurality of segmentation sub-data semantic understanding feature vectors includes: performing word segmentation on each of the plurality of segmentation sub-data to convert it into a word sequence composed of a plurality of words; mapping each word in the word sequence to a word embedding vector using an embedding layer of the transformer-based context encoder to obtain a sequence of word embedding vectors; performing transformer-style global context semantic encoding on the sequence of word embedding vectors using the transformer of the transformer-based context encoder to obtain a plurality of global context semantic feature vectors; and concatenating the plurality of global context semantic feature vectors to obtain the plurality of segmentation sub-data semantic understanding feature vectors.
In the above ShardingJDBC-based asset repayment plan mass data processing method, performing transformer-style global context semantic encoding on the sequence of word embedding vectors using the transformer of the transformer-based context encoder to obtain a plurality of global context semantic feature vectors includes: arranging the sequence of word embedding vectors one-dimensionally to obtain a global word feature vector; computing the product between the global word feature vector and the transpose of each word vector in the sequence of word embedding vectors to obtain a plurality of self-attention association matrices; normalizing each of the plurality of self-attention association matrices to obtain a plurality of normalized self-attention association matrices; passing each of the normalized self-attention association matrices through a Softmax classification function to obtain a plurality of probability values; weighting each word vector in the sequence of word embedding vectors by the corresponding probability value to obtain a plurality of context semantic feature vectors; and concatenating the plurality of context semantic feature vectors to obtain the plurality of global context semantic feature vectors.
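The self-attention computation described above can be sketched in NumPy as follows. This is an illustrative sketch with assumed dimensions; the use of scaled dot-product attention as the concrete normalization is an assumption, not a limitation of the claims:

```python
import numpy as np

# Minimal self-attention sketch: pairwise products between word embedding
# vectors give attention scores, which are normalized and passed through
# softmax to obtain probability values, which then re-weight the word
# vectors into context-aware feature vectors.

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(embeddings: np.ndarray) -> np.ndarray:
    """embeddings: (seq_len, dim) word embedding vectors."""
    seq_len, dim = embeddings.shape
    scores = embeddings @ embeddings.T / np.sqrt(dim)  # pairwise products, scaled
    weights = softmax(scores, axis=-1)                 # probability values per word
    return weights @ embeddings                        # weighted context vectors

rng = np.random.default_rng(0)
words = rng.normal(size=(6, 8))       # 6 words, 8-dim embeddings (illustrative)
context = self_attention(words)
print(context.shape)  # (6, 8)
```

Each output row is the semantic feature vector of one word conditioned on the whole sequence, which is the "global context" property the encoder relies on.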
In the above ShardingJDBC-based asset repayment plan mass data processing method, calculating the Euclidean distance between every two of the plurality of segmentation sub-data semantic understanding feature vectors to obtain a semantic space topology matrix includes: calculating the Euclidean distance between every two of the plurality of segmentation sub-data semantic understanding feature vectors according to the following formula to obtain a plurality of Euclidean distances;
d(V_a, V_b) = √( Σ_k ( v_{a,k} − v_{b,k} )² )

where V_a and V_b respectively denote any two of the plurality of segmentation sub-data semantic understanding feature vectors, d(V_a, V_b) denotes the Euclidean distance between them, and v_{a,k} and v_{b,k} respectively denote the feature values at each position k of the two vectors; and
and matrixing the Euclidean distances to obtain the semantic space topology matrix.
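The pairwise Euclidean distance computation and the matrixing step may be sketched as follows (the vector contents are illustrative):

```python
import numpy as np

# Sketch of the semantic space topology matrix: the Euclidean distance
# between every pair of segmentation sub-data semantic understanding
# feature vectors, arranged as an N x N matrix.

def topology_matrix(features: np.ndarray) -> np.ndarray:
    """features: (n, dim) — one semantic understanding vector per sub-data."""
    diff = features[:, None, :] - features[None, :, :]  # (n, n, dim) pairwise differences
    return np.sqrt((diff ** 2).sum(axis=-1))            # (n, n) Euclidean distances

feats = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
topo = topology_matrix(feats)
print(topo[0, 1])  # 5.0 (3-4-5 triangle)
```

The resulting matrix is symmetric with a zero diagonal, which is what makes it usable as an edge-feature representation later in the pipeline.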
In the above ShardingJDBC-based asset repayment plan mass data processing method, passing the semantic space topology matrix through a convolutional neural network model serving as the feature extractor to obtain a semantic space distribution topology feature matrix includes: performing, by each layer of the convolutional neural network model in its forward pass, the following operations on input data: convolving the input data to obtain a convolution feature map; pooling the convolution feature map along the channel dimension to obtain a pooled feature map; and applying a nonlinear activation to the pooled feature map to obtain an activated feature map; where the output of the last layer of the convolutional neural network serving as the feature extractor is the semantic space distribution topology feature matrix, and the input of the first layer is the semantic space topology matrix.
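One such layer (convolution, pooling along the channel dimension, nonlinear activation) may be sketched as follows. The kernel values, the choice of mean pooling and ReLU, and all sizes are illustrative assumptions:

```python
import numpy as np

# Sketch of one feature-extractor layer as described in the text:
# convolution, pooling along the channel dimension, nonlinear activation.

def conv2d(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Valid 2-D convolution of an (H, W) map with a (kh, kw) kernel."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + kh, j:j + kw] * k).sum()
    return out

def layer_forward(x: np.ndarray, kernels: list) -> np.ndarray:
    feature_maps = np.stack([conv2d(x, k) for k in kernels])  # (C, H', W')
    pooled = feature_maps.mean(axis=0)                        # pool along channels
    return np.maximum(pooled, 0.0)                            # ReLU activation

topo = np.abs(np.random.default_rng(1).normal(size=(6, 6)))   # stand-in topology matrix
kernels = [np.ones((3, 3)) / 9.0, np.eye(3) / 3.0]
out = layer_forward(topo, kernels)
print(out.shape)  # (4, 4)
```

A real extractor would stack several such layers with learned kernels; this sketch only shows the per-layer operations the claim enumerates.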
In the above ShardingJDBC-based asset repayment plan mass data processing method, passing the global segmentation sub-data semantic understanding feature matrix and the semantic space distribution topology feature matrix through a graph neural network model to obtain a topological global segmentation sub-data semantic understanding feature matrix includes: processing, by the graph neural network, the global segmentation sub-data semantic understanding feature matrix and the semantic space distribution topology feature matrix with learnable neural network parameters to obtain the topological global segmentation sub-data semantic understanding feature matrix, which contains the irregular semantic space topological association features together with the high-dimensional semantic understanding feature information of each segmentation sub-data.
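The fusion of node features and edge features may be sketched as a GCN-style update H = ReLU(Â X W). The patent does not fix a particular graph neural network architecture, so the symmetric normalization and the shapes below are assumptions:

```python
import numpy as np

# GCN-style sketch of the fusion step: node features X (the global semantic
# understanding feature matrix) are propagated over edge features A (the
# semantic space distribution topology feature matrix) through a learnable
# weight W.

def gcn_layer(A: np.ndarray, X: np.ndarray, W: np.ndarray) -> np.ndarray:
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    deg = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))   # symmetric degree normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

rng = np.random.default_rng(2)
A = rng.random((5, 5)); A = (A + A.T) / 2      # symmetric edge-feature matrix
X = rng.normal(size=(5, 8))                    # 5 sub-data nodes, 8-dim features
W = rng.normal(size=(8, 8))                    # learnable parameters
H = gcn_layer(A, X, W)
print(H.shape)  # (5, 8)
```

The output H plays the role of the topological global segmentation sub-data semantic understanding feature matrix: each node's features are enriched by its topological neighbors.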
In the above ShardingJDBC-based asset repayment plan mass data processing method, performing feature distribution optimization on the topological global segmentation sub-data semantic understanding feature matrix to obtain a classification feature vector includes: performing matrix expansion on the topological global segmentation sub-data semantic understanding feature matrix to obtain an expanded feature vector; and performing vector-normalized Hilbert probability spatialization on the expanded feature vector according to the following formula to obtain the classification feature vector:
v′_i = ( v_i / ‖V‖₂² ) · exp( v_i / ‖V‖₂ )

where V is the expanded feature vector, ‖V‖₂ denotes the two-norm of the expanded feature vector, ‖V‖₂² denotes the square of the two-norm of the expanded feature vector, v_i is the i-th feature value of the expanded feature vector, exp(·) denotes the elementwise exponential operation on a vector, that is, computing the natural exponential of the feature value at each position of the vector, and v′_i is the i-th feature value of the classification feature vector.
In the above ShardingJDBC-based asset repayment plan mass data processing method, performing matrix expansion on the topological global segmentation sub-data semantic understanding feature matrix to obtain an expanded feature vector includes: expanding the topological global segmentation sub-data semantic understanding feature matrix along its row vectors or column vectors to obtain the expanded feature vector.
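The feature distribution optimization step (matrix expansion followed by two-norm-based rescaling with an elementwise exponential) may be sketched as follows. Since the published formula is garbled in the source, the exact rescaling used here is a reconstruction from the surrounding definitions and should be treated as an assumption:

```python
import numpy as np

# Hedged sketch of feature distribution optimization: flatten the feature
# matrix to a vector V, then rescale each value using the two-norm of V
# and an elementwise exponential, per the enumerated quantities in the text.

def optimize_features(M: np.ndarray) -> np.ndarray:
    V = M.reshape(-1)                        # matrix expansion along row vectors
    norm = np.linalg.norm(V)                 # two-norm of the expanded vector
    return (V / norm**2) * np.exp(V / norm)  # reconstructed rescaling (assumption)

M = np.array([[1.0, 2.0], [3.0, 4.0]])
v = optimize_features(M)
print(v.shape)  # (4,)
```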
In the above ShardingJDBC-based asset repayment plan mass data processing method, passing the classification feature vector through a classifier to obtain a classification result, where the classification result is used to indicate whether the first alternative segmentation scheme is reasonable, includes: performing full-connection encoding on the classification feature vector using a plurality of fully connected layers of the classifier to obtain an encoded classification feature vector; and passing the encoded classification feature vector through a Softmax classification function of the classifier to obtain the classification result.
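The classifier step (fully connected encoding followed by Softmax over the two labels) may be sketched as follows; the weights and layer sizes are illustrative:

```python
import numpy as np

# Sketch of the final classifier: fully connected encoding of the
# classification feature vector followed by a softmax over the two labels
# ("scheme reasonable" vs. "scheme not reasonable").

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def classify(v, W1, b1, W2, b2) -> int:
    h = np.maximum(v @ W1 + b1, 0.0)   # fully connected layer + ReLU
    logits = h @ W2 + b2               # fully connected output layer
    probs = softmax(logits)            # two-class probabilities
    return int(probs.argmax())         # 0: reasonable, 1: not reasonable

rng = np.random.default_rng(3)
v = rng.normal(size=16)                        # stand-in classification feature vector
W1, b1 = rng.normal(size=(16, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)
label = classify(v, W1, b1, W2, b2)
print(label in (0, 1))  # True
```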
According to another aspect of the present application, there is provided a ShardingJDBC-based asset repayment plan mass data processing system, comprising:
the data acquisition module is used for acquiring mass data of the asset repayment plan to be segmented;
the segmentation module is used for segmenting the mass data of the asset repayment plan to be segmented based on a first alternative segmentation scheme to obtain a plurality of segmentation sub data;
the context encoding module is used for passing each of the plurality of segmentation sub-data through a transformer-based context encoder to obtain a plurality of segmentation sub-data semantic understanding feature vectors;
the Euclidean distance calculation module is used for calculating the Euclidean distance between every two of the plurality of segmentation sub-data semantic understanding feature vectors to obtain a semantic space topology matrix;
the convolution module is used for enabling the semantic space topology matrix to pass through a convolution neural network model serving as a feature extractor to obtain a semantic space distribution topology feature matrix;
the two-dimensional matrixing module is used for performing two-dimensional matrixing on the plurality of segmentation sub-data semantic understanding feature vectors to obtain a global segmentation sub-data semantic understanding feature matrix;
the graph neural network module is used for passing the global segmentation sub-data semantic understanding feature matrix and the semantic space distribution topology feature matrix through a graph neural network model to obtain a topological global segmentation sub-data semantic understanding feature matrix;
the feature distribution optimization module is used for performing feature distribution optimization on the topological global segmentation sub-data semantic understanding feature matrix to obtain a classification feature vector; and
and the classification result generation module is used for enabling the classification feature vector to pass through a classifier to obtain a classification result, wherein the classification result is used for indicating whether the first alternative segmentation scheme is reasonable or not.
According to still another aspect of the present application, there is provided an electronic apparatus including: a processor; and a memory having stored therein computer program instructions that, when executed by the processor, cause the processor to perform the ShardingJDBC-based asset repayment plan mass data processing method as described above.
According to yet another aspect of the present application, there is provided a computer readable medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the ShardingJDBC-based asset repayment plan mass data processing method as described above.
Compared with the prior art, the ShardingJDBC-based asset repayment plan mass data processing method provided by the application uses an alternative segmentation scheme to segment the mass asset repayment plan data with deep-learning-based artificial intelligence technology, extracts the global segmentation sub-data semantic understanding features of the data, and further enhances the expression of each segmentation sub-data's semantic understanding features through the semantic topological association features among them, so as to improve the accuracy of semantic understanding of the mass asset repayment plan data and, in turn, the accuracy of judging whether the alternative segmentation scheme is reasonable. In this way, the mass asset repayment plan data can be sliced reasonably, and the efficiency of subsequent database use is improved.
Drawings
The foregoing and other objects, features and advantages of the present application will become more apparent from the following more particular description of embodiments of the present application, as illustrated in the accompanying drawings. The accompanying drawings are included to provide a further understanding of embodiments of the application and are incorporated in and constitute a part of this specification, illustrate the application and not constitute a limitation to the application. In the drawings, like reference numerals generally refer to like parts or steps.
FIG. 1 is a flowchart of a ShardingJDBC-based asset repayment plan mass data processing method according to an embodiment of the present application;
FIG. 2 is a schematic architecture diagram of a ShardingJDBC-based asset repayment plan mass data processing method according to an embodiment of the present application;
FIG. 3 is a flowchart of context encoding in a ShardingJDBC-based asset repayment plan mass data processing method according to an embodiment of the present application;
FIG. 4 is a flowchart of convolutional neural network encoding in a ShardingJDBC-based asset repayment plan mass data processing method according to an embodiment of the present application;
FIG. 5 is a flowchart of the classification process in a ShardingJDBC-based asset repayment plan mass data processing method according to an embodiment of the present application;
FIG. 6 is a block diagram of a ShardingJDBC-based asset repayment plan mass data processing system according to an embodiment of the present application;
fig. 7 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application and not all of the embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein.
Scene overview
As described in the background art, the existing data slicing strategy splits data based on manual experience, but when facing unfamiliar data, manual experience cannot slice reasonably based on the internal information of the data and the data's own structure, which affects the use of the subsequent database. Accordingly, an optimized ShardingJDBC-based asset repayment plan mass data processing scheme is desired.
Specifically, the technical scheme of the application provides a ShardingJDBC-based method for managing mass asset repayment plan data, a general solution to the database performance bottleneck caused by an excessively large single-table data volume. The introduction of ShardingJDBC is non-intrusive to service code: no service code logic needs to be modified, and sharding is completed simply by introducing a jar package and modifying a configuration file. The database and tables are split, the split tables share the same structure, and the mapping between logical tables and physical tables is handled through parsing and routing. Once the sharding strategy is configured in the configuration file, the split services are clearly delineated, achieving dedicated databases for dedicated purposes. Reducing the data volume of a single database (table) improves system performance and enhances the stability and load capacity of the system. For high-concurrency scenarios, read-write separation is controlled through configured policies to further reduce the pressure on the server.
Accordingly, although the introduction of ShardingJDBC is non-intrusive to service code and sharding can be completed simply by introducing a jar package and modifying a configuration file without modifying any service code logic, how to slice the data remains an important technical problem in concrete data processing. The existing data slicing strategy splits data based on manual experience, but when facing unfamiliar data, manual experience cannot slice reasonably based on the internal information of the data and the data's own structure, which affects the use of the subsequent database.
In recent years, deep learning and neural networks have been widely used in the fields of computer vision, natural language processing, text signal processing, and the like. In addition, deep learning and neural networks have also shown levels approaching and even exceeding humans in the fields of image classification, object detection, semantic segmentation, text translation, and the like.
The development of deep learning and neural networks provides new solutions and schemes for reasonable data slicing based on internal information of data and the structure of the data itself.
Specifically, in the technical scheme of the application, deep-learning-based artificial intelligence technology is adopted: an alternative segmentation scheme is used to segment the mass asset repayment plan data, the global segmentation sub-data semantic understanding features of the data are extracted, and the expression of each segmentation sub-data's semantic understanding features is further enhanced through the semantic topological association features among them, so as to improve the accuracy of semantic understanding of the mass asset repayment plan data and, in turn, the accuracy of judging whether the alternative segmentation scheme is reasonable. In this way, the mass asset repayment plan data can be sliced reasonably, and the efficiency of subsequent database use is improved.
More specifically, in the technical scheme of the application, first, mass data of an asset repayment plan to be segmented is obtained. Then, in order to explore the rationality of data slicing of the mass data of the asset repayment plan to be segmented, so as to improve the use efficiency of a subsequent database, in the technical scheme of the application, the mass data of the asset repayment plan to be segmented is further segmented based on a first alternative segmentation scheme so as to obtain a plurality of segmentation sub data.
Then, considering that each of the plurality of segmentation sub-data is composed of a plurality of words and data items whose semantic understanding features are associated with one another, in the technical scheme of the application each segmentation sub-data is encoded by a transformer-based context encoder, so that the semantic understanding feature information based on the global context in each segmentation sub-data is extracted, yielding a plurality of segmentation sub-data semantic understanding feature vectors. That is, based on the transformer concept, the encoder captures long-range context dependencies and performs global context semantic encoding on each of the plurality of segmentation sub-data, using the overall semantic association of each word in the sub-data as the context background, to obtain context-aware semantic association feature representations, i.e., the plurality of segmentation sub-data semantic understanding feature vectors. It should be understood that the transformer-based context encoder can capture, for each word in each segmentation sub-data, the semantic understanding features of that word relative to those of every other word in the same sub-data, that is, the global high-dimensional semantic understanding feature information of each segmentation sub-data.
Further, considering that the semantic understanding features of the plurality of segmentation sub-data are correlated with one another, in the technical scheme of the application, to improve the accuracy of judging the rationality of the segmentation scheme, the semantic space topological association features among the segmentation sub-data are further used to enhance the expression of each segmentation sub-data's semantic understanding features in the mass asset repayment plan data to be segmented. Specifically, the Euclidean distance between every two of the plurality of segmentation sub-data semantic understanding feature vectors is calculated to represent the similarity association feature distribution among the semantic understanding features of the segmentation sub-data, yielding a semantic space topology matrix. Feature mining is then performed on the semantic space topology matrix with a convolutional neural network model serving as a feature extractor, so as to extract the semantic space topological association features among the semantic understanding features of the segmentation sub-data, thereby obtaining a semantic space distribution topology feature matrix.
Then, taking the plurality of segmented sub-data semantic understanding feature vectors as the feature representations of nodes and the semantic space distribution topological feature matrix as the feature representation of the edges between the nodes, the global segmented sub-data semantic understanding feature matrix obtained by two-dimensional arrangement of the plurality of segmented sub-data semantic understanding feature vectors and the semantic space distribution topological feature matrix are passed through a graph neural network model to obtain a topological global segmented sub-data semantic understanding feature matrix. Specifically, the graph neural network model performs graph-structure data encoding on the global segmented sub-data semantic understanding feature matrix and the semantic space distribution topological feature matrix through learnable neural network parameters, so as to obtain the topological global segmented sub-data semantic understanding feature matrix containing the irregular semantic-space topological association features and the high-dimensional semantic understanding feature information of each segmented sub-data.
Then, the classification feature vector is further classified by a classifier to obtain a classification result used for indicating whether the first alternative segmentation scheme is reasonable. That is, in the technical scheme of the application, the labels of the classifier include "the first alternative segmentation scheme is reasonable" and "the first alternative segmentation scheme is not reasonable", and the classifier determines, through a softmax function, to which classification label the classification feature vector belongs. In this way, the rationality of the alternative segmentation scheme can be accurately judged, reasonable data slicing can be performed on the asset repayment plan mass data based on the internal information and structure of the data, and the use efficiency of the subsequent database is improved.
In particular, in the technical scheme of the application, when the global segmented sub-data semantic understanding feature matrix and the semantic space distribution topological feature matrix are passed through the graph neural network model, each topological global segmented sub-data semantic understanding feature vector of the resulting topological global segmented sub-data semantic understanding feature matrix, for example each row vector, expresses a feature representation of the context-encoded semantics of a single segmented sub-data under the semantic-space topology, so that the concatenated topological global segmented sub-data semantic understanding feature matrix may have poor dependence on a single classification result when classified by the classifier, which affects the accuracy of the classification result.
Therefore, vector-normed Hilbert probability spatialization is applied to the topological global segmented sub-data semantic understanding feature matrix, which is specifically expressed as:

$v_i' = \frac{\exp\left(v_i / \|V\|_2^2\right)}{\sum_j \exp\left(v_j / \|V\|_2^2\right)}$

where $V$ is the feature vector obtained by unfolding the topological global segmented sub-data semantic understanding feature matrix, $\|V\|_2$ denotes the two-norm of the feature vector and $\|V\|_2^2$ its square, i.e. the inner product of the feature vector with itself, $v_i$ is the $i$-th feature value of the feature vector $V$, and $v_i'$ is the $i$-th feature value of the optimized feature vector $V'$.
Here, the vector-normed Hilbert probability spatialization endows the feature vector $V$ obtained by unfolding the topological global segmented sub-data semantic understanding feature matrix with a probabilistic interpretation within the Hilbert space that defines the inner product of vectors, and reduces the hidden perturbation that the special local distribution of the feature vector $V$ exerts on the class representation of the overall Hilbert-space topology, thereby improving the robustness with which the feature distribution of $V$ converges to the classification regression of the predetermined classification probability, while promoting, through the establishment of a metric-induced probability space structure, the long-range dependence of the feature distribution of $V$ across the classification results of the classifier. Then, the optimized feature vector $V'$ is directly classified by the classifier, which improves the dependence of the topological global segmented sub-data semantic understanding feature matrix on the classification result when it is classified by the classifier, and improves the accuracy of the classification result. In this way, the rationality of the alternative segmentation scheme can be accurately judged, and reasonable data slicing can further be performed on the asset repayment plan mass data, thereby improving the use efficiency of the subsequent database.
Based on this, the application provides a method for processing asset repayment plan mass data based on ShardingJDBC, which comprises the following steps: acquiring asset repayment plan mass data to be segmented; segmenting the asset repayment plan mass data to be segmented based on a first alternative segmentation scheme to obtain a plurality of segmented sub-data; passing each of the plurality of segmented sub-data through a transformer-based context encoder to obtain a plurality of segmented sub-data semantic understanding feature vectors; calculating the Euclidean distance between every two of the plurality of segmented sub-data semantic understanding feature vectors to obtain a semantic space topology matrix; passing the semantic space topology matrix through a convolutional neural network model serving as a feature extractor to obtain a semantic space distribution topological feature matrix; performing two-dimensional matrixing on the plurality of segmented sub-data semantic understanding feature vectors to obtain a global segmented sub-data semantic understanding feature matrix; passing the global segmented sub-data semantic understanding feature matrix and the semantic space distribution topological feature matrix through a graph neural network model to obtain a topological global segmented sub-data semantic understanding feature matrix; performing feature distribution optimization on the topological global segmented sub-data semantic understanding feature matrix to obtain a classification feature vector; and passing the classification feature vector through a classifier to obtain a classification result used for indicating whether the first alternative segmentation scheme is reasonable.
Having described the basic principles of the present application, various non-limiting embodiments of the present application will now be described in detail with reference to the accompanying drawings.
Exemplary method
Fig. 1 is a flowchart of a method for processing asset repayment plan mass data based on ShardingJDBC according to an embodiment of the present application. As shown in fig. 1, the method for processing asset repayment plan mass data based on ShardingJDBC according to an embodiment of the present application includes the steps of: S110, acquiring asset repayment plan mass data to be segmented; S120, segmenting the asset repayment plan mass data to be segmented based on a first alternative segmentation scheme to obtain a plurality of segmented sub-data; S130, passing each of the plurality of segmented sub-data through a transformer-based context encoder to obtain a plurality of segmented sub-data semantic understanding feature vectors; S140, calculating the Euclidean distance between every two of the plurality of segmented sub-data semantic understanding feature vectors to obtain a semantic space topology matrix; S150, passing the semantic space topology matrix through a convolutional neural network model serving as a feature extractor to obtain a semantic space distribution topological feature matrix; S160, performing two-dimensional matrixing on the plurality of segmented sub-data semantic understanding feature vectors to obtain a global segmented sub-data semantic understanding feature matrix; S170, passing the global segmented sub-data semantic understanding feature matrix and the semantic space distribution topological feature matrix through a graph neural network model to obtain a topological global segmented sub-data semantic understanding feature matrix; S180, performing feature distribution optimization on the topological global segmented sub-data semantic understanding feature matrix to obtain a classification feature vector; and S190, passing the classification feature vector through a classifier to obtain a classification result used for indicating whether the first alternative segmentation scheme is reasonable.
Fig. 2 is a schematic architecture diagram of the method for processing asset repayment plan mass data based on ShardingJDBC according to an embodiment of the present application. In this network architecture, as shown in fig. 2, asset repayment plan mass data to be segmented is first acquired and segmented based on a first alternative segmentation scheme to obtain a plurality of segmented sub-data; each of the plurality of segmented sub-data is then passed through a transformer-based context encoder to obtain a plurality of segmented sub-data semantic understanding feature vectors; next, the Euclidean distance between every two of these feature vectors is calculated to obtain a semantic space topology matrix, which is passed through a convolutional neural network model serving as a feature extractor to obtain a semantic space distribution topological feature matrix; the plurality of segmented sub-data semantic understanding feature vectors are two-dimensionally matrixed into a global segmented sub-data semantic understanding feature matrix, which, together with the semantic space distribution topological feature matrix, is passed through a graph neural network model to obtain a topological global segmented sub-data semantic understanding feature matrix; feature distribution optimization is performed on the topological global segmented sub-data semantic understanding feature matrix to obtain a classification feature vector; and finally, the classification feature vector is passed through a classifier to obtain a classification result used for indicating whether the first alternative segmentation scheme is reasonable.
Specifically, in step S110 and step S120, asset repayment plan mass data to be segmented is acquired, and the asset repayment plan mass data to be segmented is segmented based on a first alternative segmentation scheme to obtain a plurality of segmented sub-data. It should be appreciated that the introduction of ShardingJDBC is non-intrusive to the service code: sharding can be accomplished simply by introducing the jar package and modifying the configuration file, without modifying any service code logic. How to shard the data, however, remains an important technical problem in the specific data processing. The existing data sharding strategy shards data based on human experience, but when facing unfamiliar data, human experience cannot shard the data reasonably based on the internal information and structure of the data, which affects the use of the subsequent database. Therefore, in order to evaluate the rationality of sharding the asset repayment plan mass data to be segmented, and thereby improve the use efficiency of the subsequent database, the technical scheme of the application first segments the asset repayment plan mass data to be segmented based on a first alternative segmentation scheme to obtain a plurality of segmented sub-data.
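For context, sharding in ShardingJDBC (Apache ShardingSphere-JDBC) is indeed driven purely by configuration. Below is a minimal sketch of such a sharding rule, assuming ShardingSphere 5.x YAML syntax; the data source, table, and column names are illustrative assumptions, not taken from the application:

```yaml
# Hypothetical sharding rule: split a repayment-plan table across
# 2 data sources x 4 tables, routed by plan_id modulo 4.
rules:
- !SHARDING
  tables:
    t_repayment_plan:
      actualDataNodes: ds_${0..1}.t_repayment_plan_${0..3}
      tableStrategy:
        standard:
          shardingColumn: plan_id
          shardingAlgorithmName: plan_mod
  shardingAlgorithms:
    plan_mod:
      type: MOD
      props:
        sharding-count: 4
```

The method described in this application can be seen as a way to judge whether a candidate rule of this kind splits the data reasonably before it is adopted.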
Specifically, in step S130, each of the plurality of segmented sub-data is passed through a transformer-based context encoder to obtain a plurality of segmented sub-data semantic understanding feature vectors. In the technical scheme of the application, each of the plurality of segmented sub-data is encoded by the transformer-based context encoder so as to extract the global-context semantic understanding associated feature information in each segmented sub-data, thereby obtaining the plurality of segmented sub-data semantic understanding feature vectors. That is, based on the transformer idea, the transformer's ability to capture long-distance context dependence is used to perform global context semantic encoding on each of the plurality of segmented sub-data, taking the overall semantic association of all words in each segmented sub-data as the context background, so as to obtain a context-semantic-association feature representation, namely the plurality of segmented sub-data semantic understanding feature vectors. It should be understood that the transformer-based context encoder can capture, for each word in a given segmented sub-data, the context-dependent semantic association of that word relative to all the other words in the same segmented sub-data, that is, the global high-dimensional semantic understanding feature information of each segmented sub-data.
Fig. 3 is a flowchart of context encoding in the method for processing asset repayment plan mass data based on ShardingJDBC according to an embodiment of the present application. As shown in fig. 3, the context encoding process includes: S210, performing word segmentation on each of the plurality of segmented sub-data to convert each segmented sub-data into a word sequence composed of a plurality of words; S220, mapping each word in the word sequence to a word embedding vector using an embedding layer of the transformer-based context encoder to obtain a sequence of word embedding vectors; S230, performing transformer-based global context semantic encoding on the sequence of word embedding vectors using the transformer of the transformer-based context encoder to obtain a plurality of global context semantic feature vectors; and S240, concatenating the plurality of global context semantic feature vectors to obtain the plurality of segmented sub-data semantic understanding feature vectors.
Performing transformer-based global context semantic encoding on the sequence of word embedding vectors using the transformer of the transformer-based context encoder to obtain the plurality of global context semantic feature vectors includes: arranging the sequence of word embedding vectors one-dimensionally to obtain a global word feature vector; calculating the product of the global word feature vector and the transpose of each word vector in the sequence of word embedding vectors to obtain a plurality of self-attention association matrices; normalizing each of the plurality of self-attention association matrices to obtain a plurality of normalized self-attention association matrices; passing each of the normalized self-attention association matrices through a Softmax classification function to obtain a plurality of probability values; weighting each word vector in the sequence of word embedding vectors with the corresponding probability value to obtain a plurality of context semantic feature vectors; and concatenating the plurality of context semantic feature vectors to obtain the plurality of global context semantic feature vectors.
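The self-attention steps above can be sketched in a few lines of NumPy. This is a simplified illustration, not the application's exact encoder: the division by the square root of the embedding dimension stands in for the normalization step, and the row-wise softmax produces the probability values used to weight the word vectors.

```python
import numpy as np

def self_attention_encode(word_embeddings: np.ndarray) -> np.ndarray:
    """Simplified self-attention over a (seq_len, dim) sequence of word
    embedding vectors, concatenated into one semantic feature vector."""
    scores = word_embeddings @ word_embeddings.T          # pairwise association matrix
    scores = scores / np.sqrt(word_embeddings.shape[1])   # normalization step
    # row-wise softmax to obtain attention probability values
    exp = np.exp(scores - scores.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)
    context = probs @ word_embeddings                     # weight the word vectors
    return context.reshape(-1)                            # concatenate (cascade)

emb = np.random.default_rng(0).normal(size=(5, 8))        # 5 words, 8-dim embeddings
vec = self_attention_encode(emb)
print(vec.shape)  # (40,)
```

Each segmented sub-data would be encoded this way independently, yielding one semantic understanding feature vector per sub-data block.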
Specifically, in step S140, the Euclidean distance between every two of the plurality of segmented sub-data semantic understanding feature vectors is calculated to obtain a semantic space topology matrix. In the technical scheme of the application, in order to improve the accuracy of the rationality judgment of the segmentation scheme, the semantic understanding features of each segmented sub-data in the asset repayment plan mass data to be segmented are further enhanced with the semantic-space topological association features among the segmented sub-data. Specifically, the Euclidean distance between every two of the plurality of segmented sub-data semantic understanding feature vectors is calculated to represent the similarity-association feature distribution information among the semantic understanding features of the segmented sub-data, thereby obtaining the semantic space topology matrix. In a specific example of the application, this includes: calculating the Euclidean distances between every two of the plurality of segmented sub-data semantic understanding feature vectors according to the following formula to obtain a plurality of Euclidean distances;
$d(V_a, V_b) = \sqrt{\sum_k \left(v_a^{(k)} - v_b^{(k)}\right)^2}$

where $V_a$ and $V_b$ denote any two of the plurality of segmented sub-data semantic understanding feature vectors, $d(V_a, V_b)$ denotes the Euclidean distance between them, and $v_a^{(k)}$ and $v_b^{(k)}$ denote the feature values at each position of the two feature vectors; and matrixing the plurality of Euclidean distances to obtain the semantic space topology matrix.
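The pairwise-distance computation of step S140 can be sketched as follows; a NumPy illustration with arbitrary sample vectors:

```python
import numpy as np

def semantic_topology_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between the (n, dim) sub-data semantic
    feature vectors, arranged into the (n, n) semantic space topology matrix."""
    diff = vectors[:, None, :] - vectors[None, :, :]   # broadcasted differences
    return np.sqrt((diff ** 2).sum(axis=-1))           # zeros on the diagonal

feats = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])  # illustrative feature vectors
topo = semantic_topology_matrix(feats)
print(topo[0, 1], topo[0, 2])  # 5.0 10.0
```

The matrix is symmetric with a zero diagonal, which is what the subsequent convolutional feature extractor mines for similarity-association structure.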
Specifically, in step S150, the semantic space topology matrix is passed through a convolutional neural network model serving as a feature extractor to obtain a semantic space distribution topological feature matrix. That is, feature mining is performed on the semantic space topology matrix by the convolutional neural network model so as to extract the semantic-space topological association features among the semantic understanding features of the segmented sub-data, thereby obtaining the semantic space distribution topological feature matrix. In a specific example, the convolutional neural network includes a plurality of mutually cascaded neural network layers, each comprising a convolutional layer, a pooling layer and an activation layer. In the encoding process of the convolutional neural network, each layer, during its forward pass, performs kernel-based convolution on the input data with its convolutional layer, pools the convolutional feature map output by the convolutional layer with its pooling layer, and activates the pooled feature map output by the pooling layer with its activation layer.
Fig. 4 is a flowchart of convolutional neural network encoding in the method for processing asset repayment plan mass data based on ShardingJDBC according to an embodiment of the present application. As shown in fig. 4, the convolutional neural network encoding process includes performing, by each layer of the convolutional neural network model serving as the feature extractor, the following on the input data in the forward pass of that layer: S310, convolving the input data to obtain a convolutional feature map; S320, pooling the convolutional feature map along the channel dimension to obtain a pooled feature map; and S330, applying a nonlinear activation to the pooled feature map to obtain an activated feature map; where the output of the last layer of the convolutional neural network serving as the feature extractor is the semantic space distribution topological feature matrix, and the input of the first layer is the semantic space topology matrix.
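The per-layer convolution, pooling and activation of steps S310-S330 can be illustrated with a single-channel sketch. This is not the actual feature extractor (which operates on multi-channel feature maps and pools along the channel dimension); here the pooling is a spatial 2x2 max pool for simplicity:

```python
import numpy as np

def conv_pool_activate(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """One illustrative feature-extractor layer: valid convolution,
    2x2 max pooling, then ReLU activation."""
    kh, kw = kernel.shape
    h, w = x.shape
    conv = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(conv.shape[0]):                 # kernel-based convolution
        for j in range(conv.shape[1]):
            conv[i, j] = (x[i:i+kh, j:j+kw] * kernel).sum()
    ph, pw = conv.shape[0] // 2, conv.shape[1] // 2
    pooled = conv[:ph*2, :pw*2].reshape(ph, 2, pw, 2).max(axis=(1, 3))
    return np.maximum(pooled, 0.0)                 # ReLU

topo = np.random.default_rng(1).normal(size=(8, 8))   # stand-in topology matrix
kernel = np.random.default_rng(2).normal(size=(3, 3))
out = conv_pool_activate(topo, kernel)
print(out.shape)  # (3, 3)
```

Stacking several such layers yields the semantic space distribution topological feature matrix from the semantic space topology matrix.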
Specifically, in step S160 and step S170, the plurality of segmented sub-data semantic understanding feature vectors are two-dimensionally matrixed to obtain a global segmented sub-data semantic understanding feature matrix, and the global segmented sub-data semantic understanding feature matrix and the semantic space distribution topological feature matrix are passed through a graph neural network model to obtain a topological global segmented sub-data semantic understanding feature matrix. In the technical scheme of the application, the plurality of segmented sub-data semantic understanding feature vectors serve as the feature representations of nodes, and the semantic space distribution topological feature matrix serves as the feature representation of the edges between the nodes; the global segmented sub-data semantic understanding feature matrix obtained by two-dimensional arrangement of the feature vectors and the semantic space distribution topological feature matrix are then passed through the graph neural network model. Specifically, the graph neural network model performs graph-structure data encoding on the two matrices through learnable neural network parameters, so as to obtain the topological global segmented sub-data semantic understanding feature matrix containing the irregular semantic-space topological association features and the high-dimensional semantic understanding feature information of each segmented sub-data.
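The graph-structure encoding can be illustrated with a single graph-convolution layer, where the node features play the role of the global segmented sub-data semantic understanding feature matrix and the edge weights play the role of the semantic space distribution topological feature matrix. A minimal sketch under standard GCN conventions, not the application's actual graph neural network model:

```python
import numpy as np

def gcn_layer(node_feats: np.ndarray, adj: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph-convolution step: symmetric normalization of the edge
    matrix, propagation of node features, learnable projection, ReLU."""
    adj = adj + np.eye(adj.shape[0])                     # add self-loops
    deg_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    norm_adj = adj * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]
    return np.maximum(norm_adj @ node_feats @ weight, 0.0)

rng = np.random.default_rng(3)
nodes = rng.normal(size=(4, 6))          # 4 sub-data blocks, 6-dim semantic vectors
edges = np.abs(rng.normal(size=(4, 4)))  # stand-in edge-feature matrix
edges = (edges + edges.T) / 2            # make it symmetric
w = rng.normal(size=(6, 6))              # learnable neural network parameters
out = gcn_layer(nodes, edges, w)
print(out.shape)  # (4, 6)
```

The output matrix fuses each node's own semantics with those of its topological neighbours, which matches the role of the topological global feature matrix described above.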
Specifically, in step S180, feature distribution optimization is performed on the topological global segmented sub-data semantic understanding feature matrix to obtain a classification feature vector. In particular, in the technical scheme of the application, when the global segmented sub-data semantic understanding feature matrix and the semantic space distribution topological feature matrix are passed through the graph neural network model, each topological global segmented sub-data semantic understanding feature vector of the resulting matrix, for example each row vector, expresses a feature representation of the context-encoded semantics of a single segmented sub-data under the semantic-space topology, so that the concatenated topological global segmented sub-data semantic understanding feature matrix may have poor dependence on a single classification result when classified by the classifier, which affects the accuracy of the classification result. Therefore, vector-normed Hilbert probability spatialization is applied to the topological global segmented sub-data semantic understanding feature matrix, which is specifically expressed as follows:
$v_i' = \frac{\exp\left(v_i / \|V\|_2^2\right)}{\sum_j \exp\left(v_j / \|V\|_2^2\right)}$

where $V$ is the expansion feature vector obtained by unfolding the topological global segmented sub-data semantic understanding feature matrix, $\|V\|_2$ denotes the two-norm of the expansion feature vector and $\|V\|_2^2$ the square of that two-norm, i.e. the inner product of the expansion feature vector with itself, $v_i$ is the $i$-th feature value of the expansion feature vector, $\exp(\cdot)$ denotes the exponential operation on a vector, i.e. calculating the natural exponential function value raised to the power of the feature value at each position of the vector, and $v_i'$ is the $i$-th feature value of the classification feature vector. Here, the vector-normed Hilbert probability spatialization endows the expansion feature vector $V$ with a probabilistic interpretation within the Hilbert space that defines the inner product of vectors, and reduces the hidden perturbation that the special local distribution of $V$ exerts on the class representation of the overall Hilbert-space topology, thereby improving the robustness with which the feature distribution of $V$ converges to the classification regression of the predetermined classification probability, while promoting, through the establishment of a metric-induced probability space structure, the long-range dependence of the feature distribution of $V$ across the classification results of the classifier. Then, the optimized classification feature vector is directly classified by the classifier, which improves the dependence of the topological global segmented sub-data semantic understanding feature matrix on the classification result when it is classified by the classifier, and improves the accuracy of the classification result. In this way, the rationality of the alternative segmentation scheme can be accurately judged, and reasonable data slicing can further be performed on the asset repayment plan mass data, thereby improving the use efficiency of the subsequent database.
Specifically, in step S190, the classification feature vector is passed through a classifier to obtain a classification result, where the classification result is used to indicate whether the first alternative segmentation scheme is reasonable. That is, the classification feature vector is subjected to classification processing in the classifier to obtain the classification result. In a specific example of the application, passing the classification feature vector through the classifier to obtain the classification result includes: processing the classification feature vector with the classifier according to the following formula:

$O = \mathrm{softmax}\{(W_n, B_n) : \cdots : (W_1, B_1) \mid V\}$

where $W_1$ to $W_n$ are weight matrices, $B_1$ to $B_n$ are bias vectors, and $V$ is the classification feature vector. Specifically, the classifier includes a plurality of fully connected layers and a Softmax layer cascaded with the last of the fully connected layers. In the classification processing of the classifier, multiple rounds of full-connection encoding are first performed on the classification feature vector using the plurality of fully connected layers to obtain an encoded classification feature vector; the encoded classification feature vector is then input into the Softmax layer of the classifier, that is, classified using the Softmax classification function, to obtain the classification result used for indicating whether the first alternative segmentation scheme is reasonable. That is, in the technical scheme of the application, the labels of the classifier include "the first alternative segmentation scheme is reasonable" and "the first alternative segmentation scheme is not reasonable", and the classifier determines, through the softmax function, to which classification label the classification feature vector belongs. In this way, the rationality of the alternative segmentation scheme can be accurately judged, reasonable data slicing can further be performed on the asset repayment plan mass data based on the internal information and structure of the data, and the use efficiency of the subsequent database is improved.
Fig. 5 is a flowchart of the classification process in the method for processing asset repayment plan mass data based on ShardingJDBC according to an embodiment of the present application. As shown in fig. 5, the classification process includes: S410, performing full-connection encoding on the classification feature vector using the plurality of fully connected layers of the classifier to obtain an encoded classification feature vector; and S420, passing the encoded classification feature vector through the Softmax classification function of the classifier to obtain the classification result.
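Steps S410-S420 can be sketched as a plain fully-connected stack followed by a softmax. The layer sizes and random parameters below are illustrative assumptions, not trained weights:

```python
import numpy as np

def classify(v: np.ndarray, weights: list, biases: list) -> int:
    """Fully-connected encoding followed by softmax classification.
    Returns the winning label index (e.g. 0: scheme reasonable, 1: not)."""
    h = v
    for w, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(w @ h + b, 0.0)        # full-connection encoding + ReLU
    logits = weights[-1] @ h + biases[-1]
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()                   # Softmax classification function
    return int(np.argmax(probs))

rng = np.random.default_rng(5)
ws = [rng.normal(size=(8, 12)), rng.normal(size=(2, 8))]  # 12 -> 8 -> 2 labels
bs = [rng.normal(size=8), rng.normal(size=2)]
label = classify(rng.normal(size=12), ws, bs)
print(label in (0, 1))  # True
```

In the method, the label index would then be mapped back to the verdict on whether the first alternative segmentation scheme is reasonable.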
In summary, the method for processing asset repayment plan mass data based on ShardingJDBC according to the embodiments of the application has been clarified. By adopting an artificial intelligence technology based on deep learning, an alternative segmentation scheme is used to segment the asset repayment plan mass data, the global segmented sub-data semantic understanding features of the asset repayment plan mass data are extracted, and the expression of the semantic understanding features of each segmented sub-data is further enhanced through the semantic topological association features among them, so that the semantic understanding accuracy of the asset repayment plan mass data is improved, and the accuracy of the rationality judgment of the alternative segmentation scheme is improved in turn. In this way, reasonable data slicing can be performed on the asset repayment plan mass data, further improving the use efficiency of the subsequent database.
Exemplary System
FIG. 6 is a block diagram of a system for processing asset repayment plan mass data based on ShardingJDBC according to an embodiment of the present application. As shown in fig. 6, the system 300 for processing asset repayment plan mass data based on ShardingJDBC according to an embodiment of the present application includes: a data acquisition module 310; a segmentation module 320; a context encoding module 330; a Euclidean distance calculation module 340; a convolution module 350; a two-dimensional matrixing module 360; a graph neural network module 370; a feature distribution optimization module 380; and a classification result generation module 390.
The data obtaining module 310 is configured to obtain mass data of an asset repayment plan to be sliced; the slicing module 320 is configured to slice the mass data of the asset repayment plan to be sliced based on a first alternative slicing scheme to obtain a plurality of sliced sub-data; the context encoding module 330 is configured to pass each of the plurality of sliced sub-data through a transformer-based context encoder to obtain a plurality of sliced sub-data semantic understanding feature vectors; the Euclidean distance calculating module 340 is configured to calculate the Euclidean distance between every two of the plurality of sliced sub-data semantic understanding feature vectors to obtain a semantic space topology matrix; the convolution module 350 is configured to pass the semantic space topology matrix through a convolutional neural network model serving as a feature extractor to obtain a semantic space distribution topology feature matrix; the two-dimensional matrixing module 360 is configured to perform two-dimensional matrixing on the plurality of sliced sub-data semantic understanding feature vectors to obtain a global sliced sub-data semantic understanding feature matrix; the graph neural network module 370 is configured to pass the global sliced sub-data semantic understanding feature matrix and the semantic space distribution topology feature matrix through a graph neural network model to obtain a topological global sliced sub-data semantic understanding feature matrix; the feature distribution optimization module 380 is configured to perform feature distribution optimization on the topological global sliced sub-data semantic understanding feature matrix to obtain a classification feature vector; and the classification result generating module 390 is configured to pass the classification feature vector through a classifier to obtain a classification result, where the classification result is used to indicate whether the first alternative slicing scheme is reasonable.
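As a toy illustration of the slicing step (module 320), the sketch below applies a hypothetical fixed-size chunking scheme to a list of records. The patent does not spell out the candidate slicing schemes (in practice they would be ShardingJDBC table-sharding rules), so the chunk-based scheme and the function name here are purely illustrative assumptions:

```python
# Hypothetical stand-in for the slicing module 320: split a list of
# repayment-plan records into sub-data according to a candidate scheme
# (here simply fixed-size chunks; the real scheme is not given in the text).
def slice_records(records, chunk_size):
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]

records = list(range(10))              # placeholder for repayment-plan rows
sub_data = slice_records(records, 4)   # slices of sizes 4, 4, 2
```

Each resulting slice would then be fed to the downstream context encoder as one unit of sub-data.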
In one example, in the above sharingJDBC-based asset repayment plan mass data processing system 300, the context encoding module 330 is further configured to: perform word segmentation processing on each of the plurality of sliced sub-data to convert each of the plurality of sliced sub-data into a word sequence composed of a plurality of words; map each word in the word sequence into a word embedding vector by using an embedding layer of the transformer-based context encoder to obtain a sequence of word embedding vectors; perform transformer-based global context semantic encoding on the sequence of word embedding vectors by using a transformer of the transformer-based context encoder to obtain a plurality of global context semantic feature vectors; and cascade the plurality of global context semantic feature vectors to obtain the plurality of sliced sub-data semantic understanding feature vectors. Performing the transformer-based global context semantic encoding on the sequence of word embedding vectors by the transformer of the transformer-based context encoder to obtain the plurality of global context semantic feature vectors includes: performing one-dimensional arrangement on the sequence of word embedding vectors to obtain a global word feature vector; calculating the product between the global word feature vector and the transpose vector of each word vector in the sequence of word embedding vectors to obtain a plurality of self-attention association matrices; performing standardization processing on each of the plurality of self-attention association matrices to obtain a plurality of standardized self-attention association matrices; passing each of the plurality of standardized self-attention association matrices through a Softmax classification function to obtain a plurality of probability values; weighting each word vector in the sequence of word embedding vectors by taking each of the plurality of probability values as a weight to obtain a plurality of context semantic feature vectors; and cascading the plurality of context semantic feature vectors to obtain the plurality of global context semantic feature vectors.
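The attention-style weighting described above can be sketched in NumPy. This is a literal, toy rendering of the listed steps, not standard scaled-dot-product transformer attention; in particular, reducing each standardized association matrix to a scalar score by summation is our assumption, since the text does not say how a matrix yields one probability value:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_weighted(E):
    """E: (n_words, d) sequence of word-embedding vectors."""
    g = E.reshape(-1)                                   # one-dimensional arrangement -> global word feature vector
    A = [np.outer(g, w) for w in E]                     # one association matrix per word
    A = [(m - m.mean()) / (m.std() + 1e-8) for m in A]  # standardize each matrix
    scores = np.array([m.sum() for m in A])             # scalar summary per matrix (our assumption)
    p = softmax(scores)                                 # one probability value per word
    C = E * p[:, None]                                  # weight each word vector
    return C.reshape(-1)                                # cascade into one feature vector

E = np.arange(6, dtype=float).reshape(3, 2)  # toy embeddings: 3 words, dim 2
v = attention_weighted(E)
```

The cascaded vector `v` plays the role of one sliced sub-data semantic understanding feature vector.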
In one example, in the above sharingJDBC-based asset repayment plan mass data processing system 300, the Euclidean distance calculating module 340 is further configured to: calculate the Euclidean distance between every two of the plurality of sliced sub-data semantic understanding feature vectors according to the following formula to obtain a plurality of Euclidean distances:
$$ d(V_1, V_2) = \sqrt{\sum_{k}\left(v_{1k} - v_{2k}\right)^2} $$

wherein $V_1$ and $V_2$ respectively represent any two of the plurality of sliced sub-data semantic understanding feature vectors, $d(V_1, V_2)$ represents the Euclidean distance between the two feature vectors, and $v_{1k}$ and $v_{2k}$ respectively represent the feature values of each position of the two feature vectors; and matrixing the plurality of Euclidean distances to obtain the semantic space topology matrix.
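The pairwise-distance computation above can be sketched with NumPy (a toy illustration, not the patent's implementation):

```python
import numpy as np

# Sketch of module 340: pairwise Euclidean distances between the slice
# feature vectors, arranged as the semantic space topology matrix.
def semantic_topology_matrix(F):
    diff = F[:, None, :] - F[None, :, :]       # (n, n, d) pairwise differences
    return np.sqrt((diff ** 2).sum(axis=-1))   # (n, n) distance matrix

F = np.array([[0.0, 0.0], [3.0, 4.0]])  # two toy feature vectors
D = semantic_topology_matrix(F)         # D[0, 1] is the 3-4-5 distance, 5.0
```

The matrix `D` is symmetric with a zero diagonal, which is what makes it usable as a topology (adjacency-like) matrix downstream.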
In one example, in the above sharingJDBC-based asset repayment plan mass data processing system 300, the convolution module 350 is further configured so that each layer of the convolutional neural network model serving as the feature extractor performs the following operations on input data in the forward pass of the layer: performing convolution processing on the input data to obtain a convolution feature map; pooling the convolution feature map along the channel dimension to obtain a pooled feature map; and performing nonlinear activation on the pooled feature map to obtain an activated feature map; wherein the output of the last layer of the convolutional neural network serving as the feature extractor is the semantic space distribution topology feature matrix, and the input of the first layer of the convolutional neural network serving as the feature extractor is the semantic space topology matrix.
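One such layer can be sketched in plain NumPy. The kernel values, mean pooling over the channel dimension, and ReLU activation are our assumptions (the text names the three operations but fixes none of their hyperparameters):

```python
import numpy as np

def conv2d_single(x, k):
    """Valid 2-D cross-correlation of one single-channel map with one kernel."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + kh, j:j + kw] * k).sum()
    return out

def cnn_layer(x, kernels):
    """One layer as described: convolution -> pooling along the channel
    dimension (here: mean over kernels) -> nonlinear activation (ReLU)."""
    maps = np.stack([conv2d_single(x, k) for k in kernels])  # (C, H', W')
    pooled = maps.mean(axis=0)                               # channel-wise pooling
    return np.maximum(pooled, 0.0)                           # activation

x = np.arange(16, dtype=float).reshape(4, 4)   # toy semantic space topology matrix
kernels = [np.ones((3, 3)), -np.ones((3, 3))]  # toy learned kernels
y = cnn_layer(x, kernels)                      # (2, 2) activated feature map
```

Stacking several such layers, with the topology matrix as the first input, yields the semantic space distribution topology feature matrix.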
In one example, in the above sharingJDBC-based asset repayment plan mass data processing system 300, the graph neural network module 370 is further configured to: process, by the graph neural network, the global sliced sub-data semantic understanding feature matrix and the semantic space distribution topology feature matrix through learnable neural network parameters to obtain the topological global sliced sub-data semantic understanding feature matrix containing the irregular semantic space topological association features and the high-dimensional semantic understanding feature information of each sliced sub-data.
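The patent does not specify the graph neural network's form, only that learnable parameters fuse the two matrices. A minimal sketch, assuming a single graph-convolution-style update F' = tanh(Â·F·W) in which the topology feature matrix plays the role of a (row-normalized) adjacency Â and W is the learnable parameter matrix:

```python
import numpy as np

def gnn_fuse(A, F, W):
    """One hedged graph-convolution step: row-normalize the topology matrix,
    propagate the node features along it, and project with learnable W."""
    A_hat = A / (A.sum(axis=1, keepdims=True) + 1e-8)  # row-normalized adjacency
    return np.tanh(A_hat @ F @ W)

n, d_in, d_out = 4, 3, 2
rng = np.random.default_rng(0)
A = np.abs(rng.standard_normal((n, n)))   # toy topology feature matrix
F = rng.standard_normal((n, d_in))        # toy global sliced sub-data feature matrix
W = rng.standard_normal((d_in, d_out))    # learnable parameters
out = gnn_fuse(A, F, W)                   # fused (n, d_out) feature matrix
```

Each row of `out` mixes a slice's own semantic features with those of topologically nearby slices, which is the fusion the module is described as performing.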
In one example, in the above sharingJDBC-based asset repayment plan mass data processing system 300, the feature distribution optimization module 380 is further configured to: perform matrix expansion on the topological global sliced sub-data semantic understanding feature matrix to obtain an expanded feature vector; and perform vector-normalized Hilbert probability spatialization on the expanded feature vector according to the following formula to obtain the classification feature vector:
$$ v_i' = \frac{1}{\lVert V \rVert_2}\exp\!\left(\frac{v_i}{\lVert V \rVert_2^2}\right) $$

wherein $V$ is the expanded feature vector, $\lVert V \rVert_2$ represents the two-norm of the expanded feature vector, $\lVert V \rVert_2^2$ represents the square of the two-norm of the expanded feature vector, $v_i$ is the $i$-th feature value of the expanded feature vector, $\exp(\cdot)$ represents the position-wise natural exponential operation on a vector, and $v_i'$ is the $i$-th feature value of the classification feature vector.
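The original formula is rendered only as images, so its exact form is uncertain; the sketch below implements one reading consistent with the symbols the text describes (the two-norm, the squared two-norm, the i-th feature value, and a position-wise natural exponential), and should be taken as an assumption rather than the patent's definitive formula:

```python
import numpy as np

# Hedged reconstruction of the feature-distribution optimization:
# v'_i = exp(v_i / ||V||_2^2) / ||V||_2   (one plausible reading).
def hilbert_probability_spatialization(v):
    norm = np.linalg.norm(v)           # two-norm of the expanded feature vector
    return np.exp(v / norm ** 2) / norm

v = np.array([3.0, 4.0])               # toy expanded feature vector, ||v||_2 = 5
out = hilbert_probability_spatialization(v)
```

Whatever its exact form, the stated intent is to rescale the unfolded features into a better-conditioned distribution before classification.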
In one example, in the above sharingJDBC-based asset repayment plan mass data processing system 300, the classification result generating module 390 is further configured to: perform full-connection encoding on the classification feature vector by using a plurality of fully connected layers of the classifier to obtain an encoded classification feature vector; and pass the encoded classification feature vector through a Softmax classification function of the classifier to obtain the classification result.
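The classifier stage can be sketched as two toy fully connected layers followed by Softmax over the two classes ("reasonable" / "not reasonable"). The layer sizes, ReLU nonlinearity, and random weights are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(v, W1, b1, W2, b2):
    h = np.maximum(W1 @ v + b1, 0.0)   # full-connection encoding + ReLU
    p = softmax(W2 @ h + b2)           # probability over the two classes
    return int(p.argmax()), p

rng = np.random.default_rng(1)
v = rng.standard_normal(6)             # toy classification feature vector
W1, b1 = rng.standard_normal((4, 6)), np.zeros(4)
W2, b2 = rng.standard_normal((2, 4)), np.zeros(2)
label, p = classify(v, W1, b1, W2, b2)
```

`label` is the classification result indicating whether the candidate slicing scheme is judged reasonable.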
In summary, the sharingJDBC-based asset repayment plan mass data processing system 300 according to the embodiment of the present application has been illustrated. Adopting a deep-learning-based artificial intelligence technique, it uses an alternative slicing scheme to slice the mass data of the asset repayment plan, extracts the global semantic understanding features of the sliced sub-data, and further strengthens the expression of each sliced sub-data's semantic understanding features through the semantic topological association features among them. This improves the semantic understanding accuracy of the asset repayment plan mass data and, in turn, the accuracy of the rationality judgment of the alternative slicing scheme. In this way, the mass data of the asset repayment plan can be sliced reasonably, improving the usage efficiency of the subsequent database.
As described above, the asset repayment plan mass data processing system based on the sharingjdbc according to the embodiment of the present application may be implemented in various terminal devices. In one example, the sharingjdbc-based asset repayment planning mass data processing system 300 according to an embodiment of the present application may be integrated into the terminal device as a software module and/or a hardware module. For example, the sharingjdbc-based asset repayment plan mass data processing system 300 may be a software module in the operating system of the terminal device, or may be an application developed for the terminal device; of course, the sharingjdbc-based asset repayment planning mass data processing system 300 may also be one of a number of hardware modules of the terminal device.
Alternatively, in another example, the sharingJDBC-based asset repayment plan mass data processing system 300 and the terminal device may be separate devices, and the system 300 may be connected to the terminal device through a wired and/or wireless network and exchange interaction information in an agreed data format.
Exemplary electronic device
Next, an electronic device according to an embodiment of the present application is described with reference to fig. 7.
Fig. 7 illustrates a block diagram of an electronic device according to an embodiment of the present application.
As shown in fig. 7, the electronic device 10 includes one or more processors 11 and a memory 12.
The processor 11 may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device 10 to perform desired functions.
Memory 12 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory (cache). The non-volatile memory may include, for example, Read-Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 11 to implement the functions of the sharingJDBC-based asset repayment plan mass data processing method of the various embodiments of the present application described above and/or other desired functions. Various contents such as the classification feature vector may also be stored in the computer-readable storage medium.
In one example, the electronic device 10 may further include: an input device 13 and an output device 14, which are interconnected by a bus system and/or other forms of connection mechanisms (not shown).
The input means 13 may comprise, for example, a keyboard, a mouse, etc.
The output device 14 may output various information including the classification result and the like to the outside. The output means 14 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, etc.
Of course, for simplicity, only some of the components of the electronic device 10 that are relevant to the present application are shown in fig. 7; components such as buses and input/output interfaces are omitted. In addition, the electronic device 10 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer readable storage Medium
In addition to the methods and apparatus described above, embodiments of the present application may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps of the sharingJDBC-based asset repayment plan mass data processing method according to the various embodiments of the present application described in the "Exemplary Methods" section of this specification.
The computer program product may write program code for performing the operations of the embodiments of the present application in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the steps of the sharingJDBC-based asset repayment plan mass data processing method according to the various embodiments of the present application described in the "Exemplary Methods" section of this specification.
The computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, Random Access Memory (RAM), Read-Only Memory (ROM), Erasable Programmable Read-Only Memory (EPROM or flash memory), optical fiber, portable Compact Disk Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present application have been described above in connection with specific embodiments. However, it should be noted that the advantages, benefits, and effects mentioned in the present application are merely examples and are not limiting; they are not to be considered as necessarily possessed by the various embodiments of the present application. Furthermore, the specific details disclosed above are for purposes of illustration and ease of understanding only, and are not intended to limit the application to those details.
The block diagrams of the devices, apparatuses, equipment, and systems referred to in this application are only illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by those skilled in the art, the devices, apparatuses, equipment, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," and "having" are open-ended words that mean "including but not limited to" and are used interchangeably therewith. The term "or" as used herein refers to, and is used interchangeably with, the term "and/or," unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to."
It is also noted that in the apparatuses, devices, and methods of the present application, the components or steps may be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalent solutions of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the application to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims (9)

1. The asset repayment plan mass data processing method based on sharingJDBC is characterized by comprising the following steps:
acquiring mass data of an asset repayment plan to be sliced;
slicing the mass data of the asset repayment plan to be sliced based on a first alternative slicing scheme to obtain a plurality of sliced sub-data;
passing each of the plurality of sliced sub-data through a transformer-based context encoder to obtain a plurality of sliced sub-data semantic understanding feature vectors;
calculating the Euclidean distance between every two of the plurality of sliced sub-data semantic understanding feature vectors to obtain a semantic space topology matrix;
passing the semantic space topology matrix through a convolutional neural network model serving as a feature extractor to obtain a semantic space distribution topology feature matrix;
performing two-dimensional matrixing on the plurality of sliced sub-data semantic understanding feature vectors to obtain a global sliced sub-data semantic understanding feature matrix;
passing the global sliced sub-data semantic understanding feature matrix and the semantic space distribution topology feature matrix through a graph neural network model to obtain a topological global sliced sub-data semantic understanding feature matrix;
performing feature distribution optimization on the topological global sliced sub-data semantic understanding feature matrix to obtain a classification feature vector; and
passing the classification feature vector through a classifier to obtain a classification result, wherein the classification result is used for indicating whether the first alternative slicing scheme is reasonable.
2. The asset repayment plan mass data processing method based on sharingJDBC according to claim 1, wherein passing each of the plurality of sliced sub-data through the transformer-based context encoder to obtain the plurality of sliced sub-data semantic understanding feature vectors comprises:
performing word segmentation processing on each of the plurality of sliced sub-data to convert each of the plurality of sliced sub-data into a word sequence composed of a plurality of words;
mapping each word in the word sequence into a word embedding vector by using an embedding layer of the transformer-based context encoder to obtain a sequence of word embedding vectors;
performing transformer-based global context semantic encoding on the sequence of word embedding vectors by using a transformer of the transformer-based context encoder to obtain a plurality of global context semantic feature vectors; and
cascading the plurality of global context semantic feature vectors to obtain the plurality of sliced sub-data semantic understanding feature vectors.
3. The asset repayment plan mass data processing method based on sharingJDBC according to claim 2, wherein performing the transformer-based global context semantic encoding on the sequence of word embedding vectors by using the transformer of the transformer-based context encoder to obtain the plurality of global context semantic feature vectors comprises:
performing one-dimensional arrangement on the sequence of word embedding vectors to obtain a global word feature vector;
calculating the product between the global word feature vector and the transpose vector of each word vector in the sequence of word embedding vectors to obtain a plurality of self-attention association matrices;
performing standardization processing on each of the plurality of self-attention association matrices to obtain a plurality of standardized self-attention association matrices;
passing each of the plurality of standardized self-attention association matrices through a Softmax classification function to obtain a plurality of probability values;
weighting each word vector in the sequence of word embedding vectors by taking each of the plurality of probability values as a weight to obtain a plurality of context semantic feature vectors; and
cascading the plurality of context semantic feature vectors to obtain the plurality of global context semantic feature vectors.
4. The asset repayment plan mass data processing method based on sharingJDBC according to claim 3, wherein calculating the Euclidean distance between every two of the plurality of sliced sub-data semantic understanding feature vectors to obtain the semantic space topology matrix comprises:
calculating the Euclidean distance between every two of the plurality of sliced sub-data semantic understanding feature vectors according to the following formula to obtain a plurality of Euclidean distances:

$$ d(V_1, V_2) = \sqrt{\sum_{k}\left(v_{1k} - v_{2k}\right)^2} $$

wherein $V_1$ and $V_2$ respectively represent any two of the plurality of sliced sub-data semantic understanding feature vectors, $d(V_1, V_2)$ represents the Euclidean distance between the two feature vectors, and $v_{1k}$ and $v_{2k}$ respectively represent the feature values of each position of the two feature vectors; and
matrixing the plurality of Euclidean distances to obtain the semantic space topology matrix.
5. The asset repayment plan mass data processing method based on sharingJDBC according to claim 4, wherein passing the semantic space topology matrix through the convolutional neural network model serving as the feature extractor to obtain the semantic space distribution topology feature matrix comprises: performing, by each layer of the convolutional neural network model serving as the feature extractor, the following operations on input data in the forward pass of the layer:
performing convolution processing on the input data to obtain a convolution feature map;
pooling the convolution feature map along the channel dimension to obtain a pooled feature map; and
performing nonlinear activation on the pooled feature map to obtain an activated feature map;
wherein the output of the last layer of the convolutional neural network serving as the feature extractor is the semantic space distribution topology feature matrix, and the input of the first layer of the convolutional neural network serving as the feature extractor is the semantic space topology matrix.
6. The asset repayment plan mass data processing method based on sharingJDBC according to claim 5, wherein passing the global sliced sub-data semantic understanding feature matrix and the semantic space distribution topology feature matrix through the graph neural network model to obtain the topological global sliced sub-data semantic understanding feature matrix comprises: processing, by the graph neural network, the global sliced sub-data semantic understanding feature matrix and the semantic space distribution topology feature matrix through learnable neural network parameters to obtain the topological global sliced sub-data semantic understanding feature matrix containing the irregular semantic space topological association features and the high-dimensional semantic understanding feature information of each sliced sub-data.
7. The asset repayment plan mass data processing method based on sharingJDBC according to claim 6, wherein performing feature distribution optimization on the topological global sliced sub-data semantic understanding feature matrix to obtain the classification feature vector comprises:
performing matrix expansion on the topological global sliced sub-data semantic understanding feature matrix to obtain an expanded feature vector; and
performing vector-normalized Hilbert probability spatialization on the expanded feature vector according to the following formula to obtain the classification feature vector:

$$ v_i' = \frac{1}{\lVert V \rVert_2}\exp\!\left(\frac{v_i}{\lVert V \rVert_2^2}\right) $$

wherein $V$ is the expanded feature vector, $\lVert V \rVert_2$ represents the two-norm of the expanded feature vector, $\lVert V \rVert_2^2$ represents the square of the two-norm of the expanded feature vector, $v_i$ is the $i$-th feature value of the expanded feature vector, $\exp(\cdot)$ represents the position-wise natural exponential operation on a vector, and $v_i'$ is the $i$-th feature value of the classification feature vector.
8. The asset repayment plan mass data processing method based on sharingJDBC according to claim 7, wherein performing matrix expansion on the topological global sliced sub-data semantic understanding feature matrix to obtain the expanded feature vector comprises: expanding the topological global sliced sub-data semantic understanding feature matrix along a row vector or a column vector to obtain the expanded feature vector.
9. The asset repayment plan mass data processing method based on sharingJDBC according to claim 8, wherein passing the classification feature vector through the classifier to obtain the classification result, the classification result being used for indicating whether the first alternative slicing scheme is reasonable, comprises:
performing full-connection encoding on the classification feature vector by using a plurality of fully connected layers of the classifier to obtain an encoded classification feature vector; and
passing the encoded classification feature vector through a Softmax classification function of the classifier to obtain the classification result.
CN202310141878.6A 2023-02-21 2023-02-21 Asset repayment plan mass data processing method based on sharingJDBC Withdrawn CN116150371A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310141878.6A CN116150371A (en) 2023-02-21 2023-02-21 Asset repayment plan mass data processing method based on sharingJDBC


Publications (1)

Publication Number Publication Date
CN116150371A true CN116150371A (en) 2023-05-23

Family

ID=86355960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310141878.6A Withdrawn CN116150371A (en) 2023-02-21 2023-02-21 Asset repayment plan mass data processing method based on sharingJDBC

Country Status (1)

Country Link
CN (1) CN116150371A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116454772A (en) * 2023-06-14 2023-07-18 浙江浙能迈领环境科技有限公司 Decompression device and method for medium-voltage distribution cabinet of container
CN116454772B (en) * 2023-06-14 2023-08-25 浙江浙能迈领环境科技有限公司 Decompression device and method for medium-voltage distribution cabinet of container

Similar Documents

Publication Publication Date Title
CN108959246B (en) Answer selection method and device based on improved attention mechanism and electronic equipment
US20220147715A1 (en) Text processing method, model training method, and apparatus
CN109697451B (en) Similar image clustering method and device, storage medium and electronic equipment
CN115203380A (en) Text processing system and method based on multi-mode data fusion
KR20200019824A (en) Entity relationship data generating method, apparatus, equipment and storage medium
US20160275196A1 (en) Semantic search apparatus and method using mobile terminal
CN115994177B (en) Intellectual property management method and system based on data lake
CN115834433B (en) Data processing method and system based on Internet of things technology
KR20180129001A (en) Method and System for Entity summarization based on multilingual projected entity space
CN116150371A (en) Asset repayment plan mass data processing method based on sharingJDBC
Dourado et al. Bag of textual graphs (BoTG): A general graph‐based text representation model
CN116579618A (en) Data processing method, device, equipment and storage medium based on risk management
KR20120047622A (en) System and method for managing digital contents
CN107341152B (en) Parameter input method and device
CN117290478A (en) Knowledge graph question-answering method, device, equipment and storage medium
CN109670071B (en) Serialized multi-feature guided cross-media Hash retrieval method and system
CN116796288A (en) Industrial document-oriented multi-mode information extraction method and system
CN116975340A (en) Information retrieval method, apparatus, device, program product, and storage medium
CN116069953A (en) MDATA knowledge representation method based on knowledge graph superposition space-time attribute
CN116186708A (en) Class identification model generation method, device, computer equipment and storage medium
WO2022262632A1 (en) Webpage search method and apparatus, and storage medium
US20220156336A1 (en) Projecting queries into a content item embedding space
CN114398980A (en) Cross-modal Hash model training method, encoding method, device and electronic equipment
CN114201957A (en) Text emotion analysis method and device and computer readable storage medium
CN113704108A (en) Similar code detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20230523