CN115374284B - Data mining method and server based on artificial intelligence

Info

Publication number: CN115374284B
Application number: CN202211314538.0A
Authority: CN
Inventors: 张志华
Original assignee: Jiangsu Yibirui Information Technology Co ltd
Current assignee: Jiangsu Yibirui Information Technology Co ltd
Priority date: 2022-10-26
Filing date: 2022-10-26
Publication date: 2023-04-07
Anticipated expiration: 2042-10-26
Also published as: CN115374284A

Abstract

The application provides a data mining method and a server based on artificial intelligence. A text set to be processed is disassembled to obtain a plurality of disassembled texts; discrete text expression knowledge mining and distributed text expression knowledge mining are respectively carried out on the plurality of disassembled texts; expression knowledge collision is carried out to obtain target text blending expression knowledge; and text ideographic knowledge corresponding to the plurality of disassembled texts is then obtained. The mined text ideographic knowledge therefore takes both discrete content and distributed content into account and preserves the initial characteristics of the texts as far as possible. Key text support degrees corresponding to the plurality of disassembled texts are obtained from the text ideographic knowledge, which improves the accuracy and reliability of text type prediction. A plurality of text sequences are then determined from the text set, the corresponding integrated ideographic knowledge is determined, and the same text sequence set is obtained, which improves the accuracy and reliability of text sequence type prediction and of the resulting same text sequence set.

Description

Data mining method and server based on artificial intelligence

Technical Field

The application relates to the field of data mining and artificial intelligence, in particular to a data mining method and a server based on artificial intelligence.

Background

With the development of internet technology, key information in long texts often needs to be organized and combined. For example, on an e-commerce platform, e-commerce log texts contain a large amount of information, and information about a specific commodity needs to be mined from them and then integrated to form information about the commodity, such as commodity evaluations; on a government and enterprise business platform, information about a specific event needs to be mined from lengthy business logs and then integrated to obtain the associated information of that event. For a platform with heavy traffic and strict timeliness requirements, the accuracy and efficiency of text mining are important considerations, and at present the accuracy and efficiency of these types of text mining cannot meet the requirements.

Disclosure of Invention

The invention aims to provide a data mining method and a server based on artificial intelligence so as to alleviate the above problems.

In order to achieve the above purpose, the embodiments of the present application are implemented as follows:

In a first aspect, an embodiment of the application provides a data mining method based on artificial intelligence, which is applied to a server and comprises the following steps: acquiring a text set to be processed, and disassembling the text set to be processed to obtain a plurality of disassembled texts; respectively carrying out discrete text expression knowledge mining on the disassembled texts to obtain discrete text expression knowledge corresponding to the disassembled texts, wherein the discrete text expression knowledge comprises transitional discrete text expression knowledge and convergence discrete text expression knowledge; respectively carrying out distributed text expression knowledge mining on the disassembled texts to obtain distributed text expression knowledge corresponding to the disassembled texts, wherein the distributed text expression knowledge comprises transitional distributed text expression knowledge and convergence distributed text expression knowledge; carrying out expression knowledge collision according to the transitional discrete text expression knowledge and the transitional distributed text expression knowledge corresponding to the disassembled texts to obtain target text blending expression knowledge corresponding to the disassembled texts; performing ideographic knowledge mining according to the convergence discrete text expression knowledge, the convergence distributed text expression knowledge and the target text blending expression knowledge corresponding to the disassembled texts to obtain text ideographic knowledge corresponding to the disassembled texts, and performing text type prediction according to the text ideographic knowledge to obtain key text support degrees corresponding to the disassembled texts; determining a plurality of text sequences from the text set to be processed according to the key text support degrees, and determining integrated ideographic knowledge corresponding to the text sequences according to the text ideographic knowledge; and performing text sequence type prediction according to the integrated ideographic knowledge corresponding to the text sequences to obtain the same text sequence set.

Further, the predicting the text sequence type according to the integrated ideographic knowledge corresponding to the plurality of text sequences to obtain a set of the same text sequences comprises: hidden mapping is carried out according to the integrated ideographic knowledge corresponding to the text sequences to obtain hidden mapping knowledge; performing reduction mapping through the hidden mapping knowledge and the support degrees of the key texts corresponding to the disassembled texts to obtain target integrated ideographic knowledge corresponding to the text sequences; and performing type prediction on the plurality of text sequences according to the target integrated ideographic knowledge corresponding to the plurality of text sequences to obtain the same text sequence set.

Further, the hidden mapping is performed according to the integrated ideographic knowledge corresponding to the plurality of text sequences to obtain hidden mapping knowledge, including: extracting initial text expression knowledge corresponding to the disassembled texts respectively, and determining text sequence initial expression knowledge corresponding to the text sequences from the initial text expression knowledge corresponding to the disassembled texts respectively; fusing the text sequence initial expression knowledge corresponding to the text sequences with the corresponding integrated ideographic knowledge respectively to obtain target text fusion expression knowledge corresponding to the text sequences; and loading the target text fusion expression knowledge corresponding to the plurality of text sequences into a hidden module of a mapping network for processing to obtain target hidden mapping knowledge.

Further, the performing type prediction on the plurality of text sequences according to the target integrated ideographic knowledge corresponding to the plurality of text sequences to obtain the same text sequence set includes: determining a result of commonality measurement among the plurality of text sequences through target integrated ideographic knowledge corresponding to the plurality of text sequences; and clustering according to the result of the commonality measurement among the text sequences to obtain the same text sequence set.

Further, the mining discrete text expression knowledge of the disassembled texts to obtain discrete text expression knowledge corresponding to the disassembled texts, where the discrete text expression knowledge includes transition discrete text expression knowledge and convergence discrete text expression knowledge, includes: respectively carrying out discrete knowledge extraction operation on the disassembled texts to obtain a plurality of transitional linear knowledge and convergent linear knowledge corresponding to the disassembled texts; performing distributed dimension unified processing on the plurality of transitional linear knowledge to obtain a plurality of transitional discrete text expression knowledge corresponding to the plurality of disassembled texts; and carrying out distribution dimension unified processing on the convergence linear knowledge to obtain convergence discrete text expression knowledge corresponding to the plurality of disassembled texts.

Further, the mining of distributed text expression knowledge of the disassembled texts to obtain distributed text expression knowledge corresponding to the disassembled texts, where the distributed text expression knowledge includes transition distributed text expression knowledge and convergence distributed text expression knowledge, includes: extracting initial text expression knowledge corresponding to the disassembled texts respectively; and carrying out distributed knowledge extraction operation on the initial text expression knowledge corresponding to the disassembled texts respectively to obtain a plurality of transitional distributed text expression knowledge and convergence distributed text expression knowledge corresponding to the disassembled texts.

Further, the transitional discrete text expression knowledge comprises a plurality of pieces of discrete text expression knowledge, and the transitional distributed text expression knowledge comprises a plurality of pieces of distributed text expression knowledge; the carrying out expression knowledge collision according to the transitional discrete text expression knowledge and the transitional distributed text expression knowledge corresponding to the disassembled texts to obtain target text blending expression knowledge corresponding to the disassembled texts comprises the following steps: fusing first transitional discrete text expression knowledge in the transitional discrete text expression knowledge with the corresponding first transitional distributed text expression knowledge in the transitional distributed text expression knowledge to obtain first text fusion expression knowledge, and performing a knowledge extraction operation according to the first text fusion expression knowledge to obtain first text blending expression knowledge; fusing the first text blending expression knowledge, second transitional discrete text expression knowledge in the transitional discrete text expression knowledge and the corresponding second transitional distributed text expression knowledge in the transitional distributed text expression knowledge to obtain second text fusion expression knowledge, and performing a knowledge extraction operation according to the second text fusion expression knowledge to obtain second text blending expression knowledge; and when all of the transitional discrete text expression knowledge and the transitional distributed text expression knowledge have been processed, obtaining the target text blending expression knowledge.

Further, the performing ideographic knowledge mining according to the convergence discrete text expression knowledge, the convergence distributed text expression knowledge and the target text blending expression knowledge corresponding to the disassembled texts to obtain the text ideographic knowledge corresponding to the disassembled texts, and performing text type prediction according to the text ideographic knowledge to obtain the key text support degrees corresponding to the disassembled texts, includes: fusing the convergence discrete text expression knowledge, the convergence distributed text expression knowledge and the target text blending expression knowledge corresponding to the disassembled texts to obtain target text fusion expression knowledge corresponding to the disassembled texts; performing a knowledge extraction operation according to the target text fusion expression knowledge corresponding to the disassembled texts to obtain linear expression knowledge corresponding to the disassembled texts; determining, according to the linear expression knowledge corresponding to the disassembled texts, a knowledge vector maximum value and a knowledge vector mean value corresponding to each dimension in the linear expression knowledge; summing the knowledge vector maximum value and the knowledge vector mean value to obtain an ideographic mining knowledge vector for each dimension in the linear expression knowledge, and obtaining ideographic mining knowledge corresponding to the disassembled texts according to the ideographic mining knowledge vectors for the dimensions; activating the ideographic mining knowledge corresponding to the disassembled texts to obtain the text ideographic knowledge corresponding to the disassembled texts; and performing prediction between the key text type and the non-key text type through the text ideographic knowledge corresponding to the disassembled texts to obtain the key text support degrees corresponding to the disassembled texts.

Further, the method further comprises: loading the text set to be processed into a text type prediction module, and disassembling the text set to be processed through the text type prediction module to obtain a plurality of disassembled texts; respectively carrying out discrete text expression knowledge mining on the disassembled texts through the text type prediction module to obtain discrete text expression knowledge corresponding to the disassembled texts, wherein the discrete text expression knowledge comprises transitional discrete text expression knowledge and convergence discrete text expression knowledge; respectively carrying out distributed text expression knowledge mining on the disassembled texts to obtain distributed text expression knowledge corresponding to the disassembled texts, wherein the distributed text expression knowledge comprises transitional distributed text expression knowledge and convergence distributed text expression knowledge; performing expression knowledge collision on the transitional discrete text expression knowledge and the transitional distributed text expression knowledge corresponding to the disassembled texts through the text type prediction module to obtain target text blending expression knowledge corresponding to the disassembled texts; performing ideographic knowledge mining on the convergence discrete text expression knowledge, the convergence distributed text expression knowledge and the target text blending expression knowledge corresponding to the disassembled texts through the text type prediction module to obtain text ideographic knowledge corresponding to the disassembled texts, and performing text type prediction according to the text ideographic knowledge to obtain key text support degrees corresponding to the disassembled texts; wherein the text type prediction module comprises a discrete text expression knowledge mining sub-module, a distributed text expression knowledge mining sub-module, an expression knowledge collision module, a text ideographic knowledge mining module and a type prediction module; and the method further comprises the following steps: loading the text set to be processed into the text type prediction module, and disassembling the text set to be processed through the text type prediction module to obtain a plurality of disassembled texts; loading the plurality of disassembled texts into the discrete text expression knowledge mining sub-module for discrete text expression knowledge mining to obtain transitional discrete text expression knowledge and convergence discrete text expression knowledge; loading the plurality of disassembled texts into the distributed text expression knowledge mining sub-module for distributed text expression knowledge mining to obtain transitional distributed text expression knowledge and convergence distributed text expression knowledge; loading the transitional discrete text expression knowledge and the transitional distributed text expression knowledge corresponding to the plurality of disassembled texts into the expression knowledge collision module for expression knowledge collision to obtain target text blending expression knowledge corresponding to the plurality of disassembled texts; and loading the convergence discrete text expression knowledge, the convergence distributed text expression knowledge and the target text blending expression knowledge corresponding to the disassembled texts into the text ideographic knowledge mining module for ideographic knowledge mining to obtain text ideographic knowledge corresponding to the disassembled texts, and loading the text ideographic knowledge into the type prediction module for text type prediction to obtain the key text support degrees corresponding to the disassembled texts.

A second aspect of embodiments of the present application provides a server, comprising a processor and a memory, the memory storing a computer program, and the processor executing the computer program to perform the method described above.

According to the data mining method and the server based on artificial intelligence provided by the embodiments of the application, a text set to be processed is disassembled to obtain a plurality of disassembled texts; discrete text expression knowledge mining is then carried out on each of the disassembled texts to obtain transitional discrete text expression knowledge and convergence discrete text expression knowledge, and distributed text expression knowledge mining is carried out on each of the disassembled texts to obtain transitional distributed text expression knowledge and convergence distributed text expression knowledge. Expression knowledge collision is carried out through the transitional discrete text expression knowledge and the transitional distributed text expression knowledge corresponding to the disassembled texts to obtain target text blending expression knowledge corresponding to the disassembled texts; because expression knowledge collision is adopted, the obtained target text blending expression knowledge covers information in which the discrete and distributed representations complement each other. Then, ideographic knowledge mining is performed through the convergence discrete text expression knowledge, the convergence distributed text expression knowledge and the target text blending expression knowledge corresponding to the disassembled texts to obtain the text ideographic knowledge corresponding to the disassembled texts, so that the mined text ideographic knowledge takes both discrete content and distributed content into account and preserves the initial characteristics of the texts as far as possible.
And then, text type prediction is carried out according to the text ideographic knowledge, and the support degrees of the key texts corresponding to a plurality of disassembled texts are obtained, so that the accuracy and the reliability of the text type prediction can be improved. Then, determining a plurality of text sequences from the text set to be processed according to the support degree of the key text, and determining integrated ideographic knowledge corresponding to the text sequences according to the text ideographic knowledge; and predicting the text sequence types according to the integrated ideographic knowledge corresponding to the text sequences to obtain the same text sequence set, so that the accuracy and reliability of text sequence type prediction are improved, and the accuracy and reliability of the obtained same text sequence set are improved.

In the description that follows, additional features will be set forth, in part, in the description. These features will be in part apparent to those skilled in the art upon examination of the following and the accompanying drawings, or may be learned by production or use. The features of the present application may be realized and attained by practice or use of various aspects of the methodologies, instrumentalities and combinations particularly pointed out in the detailed examples that follow.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:

Fig. 1 is a flowchart of a data mining method based on artificial intelligence according to an embodiment of the present disclosure.

Fig. 2 is a schematic diagram of a functional module architecture of a data mining device according to an embodiment of the present disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations or operations have not been shown or described in detail to avoid obscuring aspects of the invention.

The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.

In the embodiment of the present application, the execution subject of the artificial intelligence based data mining method is a server, for example, a single network server, a server group consisting of a plurality of network servers, or a cloud consisting of a large number of computers or network servers based on cloud computing, where cloud computing is a kind of distributed computing in which a group of loosely coupled computers forms a super virtual computer. The server comprises a computer-readable storage medium having stored thereon a computer program which, when run on a processor, enables the processor to perform the respective content of the aforementioned method embodiments. The server further comprises a processor and a memory, wherein the memory is used for storing executable instructions of the processor, and the processor executes the executable instructions to perform the artificial intelligence based data mining method provided by the embodiment of the application.

Referring to fig. 1, a flowchart of an artificial intelligence based data mining method provided in an embodiment of the present application is shown, where the method includes the following steps:

S1, acquiring a text set to be processed, and disassembling the text set to be processed to obtain a plurality of disassembled texts.

The text set to be processed is a text set on which text mining and arrangement need to be performed to obtain the same texts. It should be noted that a text set here means a collection formed by a plurality of text chapters, paragraphs or the like. For example, for an e-commerce platform, the text set to be processed may be a combined text of e-commerce text logs collected by date; for a business platform, it may be a text combination constructed from a series of business logs; in the field of data security, it may be a set formed by combining a plurality of data security report texts; and for a review platform, it may be a set of review texts in which a plurality of users comment on different dishes of the same shop. A disassembled text is a result of disassembling the text set to be processed, for example, each text group obtained by disassembling according to log date, text chapter or text paragraph. The texts may be collected and stored by the server from terminal devices according to a preset period, or may be sent to the server by a party that requires the above processing. The server may disassemble the text set to be processed according to log date, chapter or paragraph, or may divide it equally according to a preset text length, which is not limited in this embodiment of the present application.
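
By way of illustration only, step S1 can be sketched as follows. The function name `disassemble` and the parameter `max_len` are hypothetical and do not appear in the application; splitting on blank lines stands in for paragraph-based disassembly, and fixed-length chunking stands in for division by a preset text length.

```python
import re
from typing import List, Optional


def disassemble(text_set: str, max_len: Optional[int] = None) -> List[str]:
    """Split a text set into disassembled texts.

    With max_len=None the text set is split into paragraphs (blank-line
    separated). Otherwise it is cut into chunks of at most max_len characters.
    """
    if max_len is not None:
        if max_len <= 0:
            raise ValueError("max_len must be positive")
        return [text_set[i:i + max_len] for i in range(0, len(text_set), max_len)]
    # One or more blank lines mark a paragraph boundary.
    parts = re.split(r"\n\s*\n", text_set)
    return [p.strip() for p in parts if p.strip()]
```

Disassembly by log date or chapter would follow the same pattern with a different delimiter.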

S2, respectively carrying out discrete text expression knowledge mining on the plurality of disassembled texts to obtain discrete text expression knowledge corresponding to the plurality of disassembled texts, wherein the discrete text expression knowledge comprises transition discrete text expression knowledge and convergence discrete text expression knowledge.

The discrete text expression knowledge is text expression knowledge mined in a discrete representation mode, and the text expression knowledge is a feature vector mined by an AI (artificial intelligence) expert model. The discrete representation mode may be a text mining mode based on models such as one-hot encoding, the BOW (bag-of-words) model and the TF-IDF (term frequency-inverse document frequency) model. The discrete representation mode encodes the text as a whole, so the obtained text expression knowledge represents feature information of the whole text and cannot measure the relationship between words in the text. The transitional discrete text expression knowledge is the ideographic knowledge obtained in the process of mining the convergence discrete text expression knowledge, the convergence discrete text expression knowledge is the discrete text expression knowledge corresponding to the disassembled text that is finally mined, and ideographic knowledge is feature information that explains the meaning of a text.

For example, a plurality of knowledge extraction operations (such as convolution calculations) may be performed on the disassembled text in sequence. Each knowledge extraction operation yields transitional discrete text expression knowledge, which serves as the input data of the next knowledge extraction operation, until all knowledge extraction operations are completed; the result of the last knowledge extraction operation is determined as the convergence discrete text expression knowledge. Discrete text expression knowledge mining is performed on each disassembled text in this way to obtain the transitional discrete text expression knowledge and the convergence discrete text expression knowledge corresponding to each disassembled text.
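
The chained extraction described above can be sketched as follows, as a minimal illustration rather than the claimed implementation. It assumes a bag-of-words count vector as the discrete representation, a zero-padded 1-D convolution followed by ReLU as each knowledge extraction operation, and treats every output except the last as transitional knowledge; the names `bag_of_words`, `conv1d_same`, `mine_discrete` and the kernel values are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Vector = List[float]


def bag_of_words(text: str, vocab: Sequence[str]) -> Vector:
    """Discrete representation: word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]


def conv1d_same(vec: Vector, kernel: Sequence[float]) -> Vector:
    """Zero-padded 1-D convolution followed by ReLU; output length = input length."""
    half = len(kernel) // 2
    out = []
    for i in range(len(vec)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(vec):
                acc += w * vec[j]
        out.append(max(acc, 0.0))
    return out


def mine_discrete(text: str, vocab: Sequence[str],
                  kernels: Sequence[Sequence[float]]) -> Tuple[List[Vector], Vector]:
    """Return (transitional knowledge list, convergence knowledge)."""
    if not kernels:
        raise ValueError("at least one knowledge extraction operation is required")
    x = bag_of_words(text, vocab)
    outputs = []
    for kernel in kernels:
        x = conv1d_same(x, kernel)  # output of one operation feeds the next
        outputs.append(x)
    return outputs[:-1], outputs[-1]
```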

S3, mining distributed text expression knowledge of the disassembled texts respectively to obtain distributed text expression knowledge corresponding to the disassembled texts, wherein the distributed text expression knowledge comprises transitional distributed text expression knowledge and convergence distributed text expression knowledge.

The distributed text expression knowledge is ideographic knowledge mined in a distributed representation mode. The distributed representation mode may be a text mining mode based on algorithms such as n-gram, word2vec, GloVe and ELMo; because it takes context information in the text, such as word order within a sentence, into account, the distributed text expression knowledge has a stronger logical expression capability in the text mining process. The transitional distributed text expression knowledge is the ideographic knowledge mined in the process of mining the convergence distributed text expression knowledge, and the convergence distributed text expression knowledge is the ideographic knowledge corresponding to the disassembled text that is finally mined.
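
The role of word order can be seen in a small sketch. It uses n-gram counts, one of the algorithms named above, and compares them with plain word counts; the helper name `ngrams` is hypothetical, and learned embeddings such as word2vec would be used in the same place in a fuller system.

```python
from collections import Counter


def ngrams(text: str, n: int = 2) -> Counter:
    """Count the n-grams (tuples of n consecutive tokens) in a text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


a, b = "dog bites man", "man bites dog"

# Word counts alone cannot tell the two sentences apart ...
same_bag = Counter(a.split()) == Counter(b.split())
# ... whereas bigram counts, which retain word order, can.
same_bigrams = ngrams(a) == ngrams(b)
```

Here `same_bag` is true and `same_bigrams` is false, which is the distinction between the discrete and the order-aware representations drawn in the description.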

For example, a plurality of knowledge extraction operations may be performed on the disassembled text in sequence. Each knowledge extraction operation outputs transitional distributed text expression knowledge, which serves as the input data of the next knowledge extraction operation, until all knowledge extraction operations are completed; the result of the last knowledge extraction operation is determined as the convergence distributed text expression knowledge. Distributed text expression knowledge mining is performed on each disassembled text in this way to obtain the transitional distributed text expression knowledge and the convergence distributed text expression knowledge corresponding to each disassembled text.

S4, carrying out expression knowledge collision according to the transition discrete text expression knowledge and the transition distribution text expression knowledge corresponding to the disassembled texts to obtain target text blending expression knowledge corresponding to the disassembled texts.

The expression knowledge collision performs text knowledge interaction between the transitional discrete text expression knowledge and the corresponding transitional distributed text expression knowledge, so that the two complement each other, which increases the reliability of text analysis and allows more complete ideographic knowledge to be mined. The target text blending expression knowledge is the ideographic knowledge obtained after the collision of the discrete ideographic knowledge and the distributed ideographic knowledge. For example, the transitional discrete text expression knowledge and the transitional distributed text expression knowledge corresponding to a disassembled text are fused to obtain the target text blending expression knowledge corresponding to that disassembled text, and this is done for each disassembled text to obtain the target text blending expression knowledge corresponding to each disassembled text.
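
One possible reading of the collision, following the stage-by-stage fusion set out in the Disclosure of Invention, is sketched below. Element-wise addition as the fusion and a halving ReLU as the knowledge extraction operation are assumptions made for the example, and `fuse`, `relu_halve` and `collide` are hypothetical names.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def fuse(*vectors: Sequence[float]) -> Vector:
    """Fuse equally sized vectors by element-wise addition (an assumption)."""
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(vals) for vals in zip(*vectors)]


def relu_halve(v: Sequence[float]) -> Vector:
    """Stand-in knowledge extraction operation: scale by 0.5, then ReLU."""
    return [max(0.5 * x, 0.0) for x in v]


def collide(trans_discrete: Sequence[Vector],
            trans_distributed: Sequence[Vector],
            extract: Callable[[Sequence[float]], Vector] = relu_halve) -> Vector:
    """Fuse stage by stage; the last blended result is the target knowledge."""
    if not trans_discrete or len(trans_discrete) != len(trans_distributed):
        raise ValueError("need the same non-zero number of transitional stages")
    blended: Optional[Vector] = None
    for d, s in zip(trans_discrete, trans_distributed):
        # The first stage fuses two inputs; later stages also fuse the
        # blending knowledge produced by the previous stage.
        fused = fuse(d, s) if blended is None else fuse(blended, d, s)
        blended = extract(fused)
    return blended
```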

S5, performing ideographic knowledge mining according to the convergence discrete text expression knowledge, the convergence distributed text expression knowledge and the target text blending expression knowledge corresponding to the disassembled texts to obtain text ideographic knowledge corresponding to the disassembled texts, and performing text type prediction according to the text ideographic knowledge to obtain the key text support degrees corresponding to the disassembled texts.

The text ideographic knowledge is obtained by integrating the discrete ideographic knowledge, the distributed ideographic knowledge and the text blending expression knowledge, and each disassembled text has corresponding text ideographic knowledge. The text type prediction predicts whether a text is a key text, and the prediction result is either key text or non-key text. The key text support degree indicates the probability that the corresponding disassembled text is a key text: the larger the key text support degree, the larger that probability. For example, an ideographic knowledge integration calculation is performed on the convergence discrete text expression knowledge, the convergence distributed text expression knowledge and the target text blending expression knowledge corresponding to each disassembled text, and the integrated result is the text ideographic knowledge corresponding to that disassembled text. Type prediction is then performed through the text ideographic knowledge to determine whether each disassembled text is a key text or a non-key text and to determine the key text support degree corresponding to each disassembled text.
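
The pooling set out in the Disclosure of Invention (per-dimension maximum plus per-dimension mean, followed by activation and type prediction) can be sketched as follows. The sketch assumes that the linear expression knowledge is a list of equally sized vectors, uses tanh as the activation and a logistic unit as the predictor; `ideographic_knowledge`, `key_text_support`, `weights` and `bias` are hypothetical names and parameters.

```python
import math
from typing import List, Sequence

Vector = List[float]


def ideographic_knowledge(rows: Sequence[Sequence[float]]) -> Vector:
    """Per dimension: maximum + mean over the rows, then tanh activation."""
    if not rows:
        raise ValueError("rows must not be empty")
    pooled = [max(col) + sum(col) / len(col) for col in zip(*rows)]
    return [math.tanh(x) for x in pooled]


def key_text_support(knowledge: Sequence[float],
                     weights: Sequence[float], bias: float = 0.0) -> float:
    """Logistic score in (0, 1), read as the probability of being a key text."""
    z = sum(w * k for w, k in zip(weights, knowledge)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

In practice the weights would be learned by the type prediction module; they are passed in explicitly here only to keep the example self-contained.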

And S6, determining a plurality of text sequences from the text set to be processed according to the support degree of the key text, and determining integrated ideographic knowledge corresponding to the text sequences according to the text ideographic knowledge.

A text sequence is a text combination obtained by fusing a plurality of consecutive key texts. A key text is a disassembled text whose key text support degree is greater than a preset key text support degree, and the preset key text support degree is a threshold, determined in advance, above which a disassembled text is treated as a key text. The integrated ideographic knowledge represents the ideographic knowledge of a text sequence and is obtained by fusing the text ideographic knowledge corresponding to each key text in the sequence. For example, the key text support degree corresponding to each disassembled text is compared with the preset key text support degree, and when it exceeds the preset key text support degree, the corresponding disassembled text is a key text. The key texts that are adjacent in the text set to be processed are then fused into text sequences according to their position order, giving a plurality of text sequences. The text ideographic knowledge corresponding to each key text in a text sequence is fused to obtain the integrated ideographic knowledge corresponding to that text sequence, and this is repeated for every text sequence.
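As an illustrative sketch only, the thresholding and grouping described for step S6 could look as follows in Python. The function name `build_text_sequences`, the use of an element-wise mean as the fusion rule, and the treatment of an isolated key text as a sequence of length one are assumptions made for the example; the description does not fix any of them.

```python
from typing import List, Sequence, Tuple


def build_text_sequences(
    supports: Sequence[float],
    knowledge: Sequence[Sequence[float]],
    threshold: float,
) -> List[Tuple[List[int], List[float]]]:
    """Group consecutive key texts and fuse their ideographic knowledge.

    Returns (indices, integrated_knowledge) pairs. The fusion rule used here
    is an element-wise mean, which is an assumption for illustration.
    """
    runs: List[List[int]] = []
    current: List[int] = []
    for i, support in enumerate(supports):
        if support > threshold:      # key text: support exceeds the preset value
            current.append(i)
        elif current:                # a non-key text closes the running sequence
            runs.append(current)
            current = []
    if current:
        runs.append(current)

    result = []
    for run in runs:
        dim = len(knowledge[run[0]])
        fused = [sum(knowledge[i][d] for i in run) / len(run) for d in range(dim)]
        result.append((run, fused))
    return result


supports = [0.9, 0.8, 0.2, 0.7, 0.1]
knowledge = [[1.0, 0.0], [3.0, 2.0], [0.0, 0.0], [4.0, 4.0], [0.0, 0.0]]
sequences = build_text_sequences(supports, knowledge, threshold=0.5)
# sequences == [([0, 1], [2.0, 1.0]), ([3], [4.0, 4.0])]
```

Here disassembled texts 0 and 1 are adjacent key texts and form one text sequence, while text 3 forms a second one.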

And S7, predicting the text sequence type according to the integrated ideographic knowledge corresponding to the text sequences to obtain the same text sequence set.

Text sequence type prediction is used to determine whether text sequences are same text sequences. The same text sequence set includes each of the same text sequences, and same text sequences are text sequences whose matching degree is greater than a preset matching degree. For example, a same text sequence set may be the set of text paragraphs in which different commenters comment on the same dish of the same shop on a commenting platform, the set of texts in which different consumers evaluate the same commodity of the same seller on an e-commerce platform, or the set of text information obtained by integrating the commodity description information of the same single product of the same series from different periods on such a platform.

Clustering can be performed on a plurality of text sequences through the integrated ideographic knowledge corresponding to the plurality of text sequences (for example, similarity division is performed by obtaining a vector distance between the knowledge based on a preset algorithm, and the preset algorithm can refer to a clustering algorithm such as k-means), so that one or more same text sequence sets are obtained.
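The clustering mentioned above can be illustrated with a minimal k-means sketch in pure Python. The seeding rule (the first k vectors act as the initial centers), the fixed iteration count and the sample vectors are assumptions for the example, not part of the described method.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Sequence[float]], k: int, iterations: int = 20) -> List[int]:
    """Plain k-means; returns one cluster label per input vector."""
    # Assumption: the first k points seed the centroids (deterministic for the demo).
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical integrated ideographic knowledge of four text sequences.
integrated = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9]]
labels = kmeans(integrated, k=2)
# labels == [0, 1, 0, 1]
```

Text sequences that share a label would then be placed in the same text sequence set.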

According to the data mining method based on artificial intelligence, a plurality of disassembled texts are obtained by disassembling a text set to be processed. Discrete text expression knowledge mining is then carried out on each of the disassembled texts to obtain transition discrete text expression knowledge and convergence discrete text expression knowledge, and distributed text expression knowledge mining is carried out on each of the disassembled texts to obtain transition distributed text expression knowledge and convergence distributed text expression knowledge. Expression knowledge collision is carried out on the transition discrete text expression knowledge and the transition distributed text expression knowledge corresponding to the disassembled texts to obtain the corresponding target text blending expression knowledge; because expression knowledge collision is used, the target text blending expression knowledge covers the information with which the discrete and distributed knowledge complement each other. Ideographic knowledge mining is then performed on the convergence discrete text expression knowledge, the convergence distributed text expression knowledge and the target text blending expression knowledge corresponding to the disassembled texts to obtain the corresponding text ideographic knowledge, so that the mined text ideographic knowledge takes both discrete content and distributed content into account and preserves the initial characteristics of the texts as far as possible. Text type prediction is then carried out according to the text ideographic knowledge to obtain the key text support degrees corresponding to the disassembled texts, which improves the accuracy and reliability of text type prediction. Next, a plurality of text sequences are determined from the text set to be processed according to the key text support degrees, and the integrated ideographic knowledge corresponding to the text sequences is determined according to the text ideographic knowledge. Finally, text sequence type prediction is performed according to the integrated ideographic knowledge corresponding to the text sequences to obtain the same text sequence set, which improves the accuracy and reliability of both the text sequence type prediction and the resulting same text sequence set.

As an executable implementation manner, for the step S7, performing text sequence type prediction according to the integrated ideographic knowledge corresponding to the plurality of text sequences to obtain the same text sequence set, the method may specifically include the following steps:

Step S71, performing hidden mapping according to the integrated ideographic knowledge corresponding to the text sequences to obtain hidden mapping knowledge.

Hidden mapping is the encoding performed by the Encoder module of a mapping network. The mapping network provided in the embodiment of the present application is built on the framework of a machine translation model, and the hidden mapping knowledge is the encoding vector of the integrated ideographic knowledge obtained after hidden mapping. The mapping network may be debugged (that is, trained) by optimizing the parameters of a preset mapping network until a convergence condition is reached. The optimization uses sample data and the mark information (labels) corresponding to the sample data: the sample data is loaded into the mapping network for encoding, the loss between the result and the mark information is obtained, the corresponding parameters are optimized according to the loss, and this is repeated until the network converges. Of course, in the embodiment of the present application, an existing, publicly available mapping network may also be used as the mapping network.

And step S72, carrying out reduction mapping through the hidden mapping knowledge and the support degrees of the key texts corresponding to the plurality of disassembled texts to obtain target integrated ideographic knowledge corresponding to the plurality of text sequences.

Reduction mapping is the decoding operation performed by the Decoder module of the mapping network. For example, the key text support degrees of the disassembled texts that make up the current text sequence are obtained from the key text support degrees corresponding to the plurality of disassembled texts, and the hidden mapping knowledge corresponding to the current text sequence is then loaded, together with those key text support degrees, into the Decoder module of the mapping network for decoding, so as to obtain the target integrated ideographic knowledge corresponding to the current text sequence. This is repeated for each text sequence to obtain the target integrated ideographic knowledge corresponding to all the text sequences.

And S73, performing type prediction on the plurality of text sequences according to the target integrated ideographic knowledge corresponding to the plurality of text sequences to obtain the same text sequence set.

For example, the target integrated ideographic knowledge corresponding to the plurality of text sequences may be clustered by a kmeans algorithm to obtain a plurality of clustered text sequences, and each type of text sequence is determined to be the same text sequence to obtain a text sequence set of the type.

As an embodiment, the mapping network may follow a classic machine translation model architecture and may specifically include an input, an encoding module, a decoding module and an output. In the classical configuration, the encoding module, i.e. the above-mentioned hidden module, includes 6 Encoders and, correspondingly, the decoding module includes 6 Decoders. Each Encoder comprises a Multi-Head Attention module and a Feed Forward Network module, and each Decoder comprises a Masked Multi-Head Attention module, a Multi-Head Attention module and a Feed Forward Network module. A residual (ResNet-style) module for preventing network degradation and a Normalize module for improving debugging speed are inserted between the modules. The integrated ideographic knowledge corresponding to the text sequences is loaded into the hidden module for processing to obtain the hidden mapping knowledge corresponding to the text sequences, and the hidden mapping knowledge corresponding to the text sequences and the key text support degrees corresponding to the disassembled texts are loaded into the decoding module for decoding to obtain the target integrated ideographic knowledge corresponding to the text sequences. That is to say, by also feeding the key text support degrees corresponding to the plurality of disassembled texts into the decoding module as input data, the network can learn directly from the data produced by text type prediction, which improves the ideographic capability of the ideographic knowledge output by the mapping network.
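To make the attention modules named above more concrete, the following pure-Python sketch shows single-head scaled dot-product attention with an optional causal mask. It is a simplification under stated assumptions: there are no learned projection matrices, no multiple heads, no residual or normalization layers, and the `states` vectors are made-up values. It only illustrates the difference between the unmasked attention of an Encoder and the masked attention of a Decoder.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax(scores: Vector) -> List[float]:
    peak = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(
    queries: Sequence[Vector],
    keys: Sequence[Vector],
    values: Sequence[Vector],
    masked: bool = False,
) -> List[List[float]]:
    """Scaled dot-product attention; masked=True hides later positions."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for i, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        if masked:
            # Position i may only attend to positions 0..i.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(values[0]))]
        )
    return outputs


# Hypothetical knowledge vectors for three positions.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
encoded = attention(states, states, states)               # Encoder-style
decoded = attention(states, states, states, masked=True)  # Decoder-style
```

With the mask enabled, the first position can only attend to itself, so `decoded[0]` equals `states[0]`.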

As an executable implementation manner, step S71 of performing hidden mapping according to the integrated ideographic knowledge corresponding to the plurality of text sequences to obtain hidden mapping knowledge may specifically include: mining initial text expression knowledge corresponding to the disassembled texts, and determining text sequence initial expression knowledge corresponding to the text sequences from the initial text expression knowledge corresponding to the disassembled texts; fusing the text sequence initial expression knowledge corresponding to the text sequences with the corresponding integrated ideographic knowledge respectively to obtain target text fusion expression knowledge corresponding to the text sequences; and loading the target text fusion expression knowledge corresponding to the plurality of text sequences into the hidden module of the mapping network for processing to obtain target hidden mapping knowledge.

The initial text expression knowledge is the most original feature information of the text, the initial text expression knowledge of the text sequence is the initial text expression knowledge corresponding to the text sequence, and is obtained by fusing the initial text expression knowledge of a plurality of disassembled texts corresponding to the text sequence, the target text fusion expression knowledge is a knowledge vector after fusing the original feature information, and the target hidden mapping knowledge is hidden mapping knowledge after fusing the original feature information.

For example, the initial text expression knowledge corresponding to each of the disassembled texts is mined, and the initial text expression knowledge of the disassembled texts belonging to each text sequence is then fused to obtain the text sequence initial expression knowledge corresponding to each text sequence. The text sequence initial expression knowledge corresponding to each text sequence is then fused with the integrated ideographic knowledge corresponding to that text sequence to obtain the target text fusion expression knowledge corresponding to each text sequence, and the target text fusion expression knowledge corresponding to each text sequence is loaded one by one into the hidden module of the mapping network for processing to obtain the target hidden mapping knowledge.

Based on this, the text sequence initial expression knowledge and the corresponding integrated ideographic knowledge are fused and then processed, so that the accuracy and reliability of the output target hidden mapping knowledge can be improved, and the accuracy and reliability of the obtained target integrated ideographic knowledge can be improved.

As an executable implementation manner, step S73 of performing type prediction on a plurality of text sequences according to the target integrated ideographic knowledge corresponding to the plurality of text sequences to obtain a same text sequence set may specifically include: determining commonality measurement results among the text sequences through the target integrated ideographic knowledge corresponding to the text sequences; and clustering according to the commonality measurement results among the text sequences to obtain the same text sequence set.

The commonality measurement result reflects the degree of similarity between text sequences. It can be evaluated based on the distance or included angle between the corresponding vectors: the smaller the distance or included angle, the higher the similarity of the two, i.e. the larger the commonality measurement result. For example, a first target integrated ideographic knowledge and a second target integrated ideographic knowledge are taken from the target integrated ideographic knowledge corresponding to the plurality of text sequences, and the commonality measurement result between them is determined; this is repeated until the commonality measurement results between all pairs of target integrated ideographic knowledge have been determined. All commonality measurement results are then clustered (classified), and the text sequences whose target integrated ideographic knowledge has a commonality measurement result larger than a preset value are merged into a same text sequence set. Because clustering is carried out on the commonality measurement results, it is not bound to the process of determining a center vector, and the speed, accuracy and reliability with which the same text sequence set is obtained can be improved.
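A pairwise measurement followed by threshold-based grouping can be sketched as below. Cosine similarity is used as the angle-based measure; merging transitively linked sequences with a union-find structure, as well as the function names and sample vectors, are assumptions for the example rather than details given in the description.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_by_similarity(vectors: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Merge text sequences whose pairwise similarity exceeds the threshold."""
    parent = list(range(len(vectors)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine_similarity(vectors[i], vectors[j]) > threshold:
                parent[find(j)] = find(i)   # union the two groups

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Hypothetical target integrated ideographic knowledge of four text sequences.
targets = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
groups = group_by_similarity(targets, threshold=0.9)
# groups == [[0, 1], [2, 3]]
```

No center vector is computed at any point, which matches the motivation given above for clustering on pairwise results.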

As an executable implementation manner, in step S2, discrete text expression knowledge mining is performed on the multiple disassembled texts, so as to obtain discrete text expression knowledge corresponding to the multiple disassembled texts, where the discrete text expression knowledge includes transition discrete text expression knowledge and convergence discrete text expression knowledge, and the method specifically includes: respectively carrying out discrete knowledge extraction operation on the disassembled texts to obtain a plurality of transition linear knowledge and convergence linear knowledge corresponding to the disassembled texts; uniformly processing the distribution dimensions of the plurality of transitional linear knowledge to obtain a plurality of transitional discrete text expression knowledge corresponding to a plurality of disassembled texts; and carrying out distribution dimension unified processing on the convergence linear knowledge to obtain convergence discrete text expression knowledge corresponding to a plurality of disassembled texts.

In the embodiment of the present application, the discrete knowledge extraction operation is a knowledge extraction operation (for example, convolution calculation) for obtaining discrete text expression knowledge. The convergence linear knowledge is the knowledge vector obtained by the last knowledge extraction operation, and the transition linear knowledge is the knowledge vectors obtained by the knowledge extraction operations other than the last one. The distribution dimension unified processing converts the discrete text expression knowledge into the same dimension as the distributed text expression knowledge.

For example, the discrete knowledge extraction operation is performed on each disassembled text several times in succession to obtain a plurality of transition linear knowledge corresponding to each disassembled text, together with the convergence linear knowledge produced by the last knowledge extraction operation. Distribution dimension unified processing is then performed on each transition linear knowledge to obtain a plurality of transition discrete text expression knowledge corresponding to the disassembled texts, and on the convergence linear knowledge to obtain the convergence discrete text expression knowledge corresponding to the disassembled texts.
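The relation between transition and convergence knowledge can be sketched with a small stack of one-dimensional convolutions. The kernels, the input features and the use of zero-padding or truncation as the dimension unification step are all assumptions for illustration; a debugged network would learn its kernels, and the description does not specify how dimensions are unified.

```python
from typing import List, Sequence, Tuple


def conv1d(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Valid 1-D convolution (cross-correlation form) without padding."""
    n = len(signal) - len(kernel) + 1
    return [sum(signal[i + j] * kernel[j] for j in range(len(kernel))) for i in range(n)]


def unify_dimension(vector: Sequence[float], size: int) -> List[float]:
    """Assumed unification rule: zero-pad or truncate to a fixed length."""
    return (list(vector) + [0.0] * size)[:size]


def extract_discrete_knowledge(
    features: Sequence[float],
    kernels: Sequence[Sequence[float]],
    size: int,
) -> Tuple[List[List[float]], List[float]]:
    """Return (transition knowledge list, convergence knowledge)."""
    outputs = []
    current = list(features)
    for kernel in kernels:
        current = conv1d(current, kernel)        # one knowledge extraction operation
        outputs.append(unify_dimension(current, size))
    # Every output except the last is transition knowledge; the last is convergence.
    return outputs[:-1], outputs[-1]


features = [1.0, 2.0, 3.0, 4.0, 5.0]              # hypothetical text features
kernels = [[1.0, 1.0], [0.5, 0.5], [1.0, -1.0]]   # hypothetical kernels
transition, convergence = extract_discrete_knowledge(features, kernels, size=4)
# transition == [[3.0, 5.0, 7.0, 9.0], [4.0, 6.0, 8.0, 0.0]]
# convergence == [-2.0, -2.0, 0.0, 0.0]
```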

As an executable implementation manner, step S3 of mining distributed text expression knowledge of the plurality of disassembled texts to obtain distributed text expression knowledge corresponding to the plurality of disassembled texts, where the distributed text expression knowledge includes transition distributed text expression knowledge and convergence distributed text expression knowledge, may specifically include: mining initial text expression knowledge corresponding to each of the disassembled texts; and carrying out distributed knowledge extraction operations on the initial text expression knowledge corresponding to the disassembled texts to obtain a plurality of transition distributed text expression knowledge and convergence distributed text expression knowledge corresponding to the disassembled texts. The distributed knowledge extraction operation is a knowledge extraction operation for obtaining distributed text expression knowledge. For example, the initial text expression knowledge corresponding to each disassembled text is mined, and the distributed knowledge extraction operation (such as convolution calculation) is then performed on each initial text expression knowledge multiple times, the number of distributed knowledge extraction operations being consistent with the number of discrete knowledge extraction operations. The last distributed knowledge extraction operation yields the convergence distributed text expression knowledge, and the remaining distributed knowledge extraction operations yield the transition distributed text expression knowledge, so that a plurality of transition distributed text expression knowledge and the convergence distributed text expression knowledge corresponding to the disassembled texts are finally obtained. Mining the initial text expression knowledge corresponding to the disassembled texts first, and then performing the distributed knowledge extraction operations on it, improves the accuracy and reliability of the obtained distributed text expression knowledge.

As an executable implementation manner, the transition discrete text expression knowledge comprises a plurality of items of transition discrete text expression knowledge, and the transition distributed text expression knowledge comprises a plurality of items of transition distributed text expression knowledge; step S4 of performing expression knowledge collision according to the transition discrete text expression knowledge and the transition distributed text expression knowledge corresponding to the multiple disassembled texts to obtain target text blending expression knowledge corresponding to the multiple disassembled texts may specifically include:

Step S41, fusing a first transition discrete text expression knowledge in the transition discrete text expression knowledge and the corresponding first transition distributed text expression knowledge in the transition distributed text expression knowledge to obtain a first text fusion expression knowledge, and performing a knowledge extraction operation according to the first text fusion expression knowledge to obtain a first text blending expression knowledge.

The text fusion expression knowledge is obtained by splicing or adding knowledge vectors, and the text blending expression knowledge is what is obtained after expression knowledge collision. For example, a first transition discrete text expression knowledge and the corresponding first transition distributed text expression knowledge are obtained, both of which come from the first knowledge extraction operation (for example, from a first convolution unit). The two are then spliced (for example, along a dimension) to obtain the first text fusion expression knowledge, and finally the knowledge extraction operation is performed on the first text fusion expression knowledge to obtain the first text blending expression knowledge.

Step S42, fusing the first text blending expression knowledge, the second transition discrete text expression knowledge in the transition discrete text expression knowledge and the corresponding second transition distributed text expression knowledge in the transition distributed text expression knowledge to obtain a second text fusion expression knowledge, and performing a knowledge extraction operation according to the second text fusion expression knowledge to obtain a second text blending expression knowledge.

For example, when the second transition discrete text expression knowledge and the second transition distributed text expression knowledge are fused, the first text blending expression knowledge obtained in the previous step is fused in at the same time to obtain the second text fusion expression knowledge, and the knowledge extraction operation (for example, based on convolution) is then performed on the second text fusion expression knowledge to obtain the second text blending expression knowledge.

And S43, when the execution of the plurality of transitional discrete text expression knowledge and the plurality of transitional distribution text expression knowledge is finished, acquiring target text blending expression knowledge.

For example, expression knowledge collision is carried out on each transition discrete text expression knowledge and the corresponding transition distributed text expression knowledge one by one. At each step, the text blending expression knowledge obtained in the previous step is fused with the current transition discrete text expression knowledge and the current transition distributed text expression knowledge to obtain the current text fusion expression knowledge, and a knowledge extraction operation is then performed on it using convolution parameters to obtain the current text blending expression knowledge. In the last expression knowledge collision, the previous text blending expression knowledge is fused with the final transition discrete text expression knowledge and the final transition distributed text expression knowledge to obtain the final text fusion expression knowledge, and the knowledge extraction operation is performed on the final text fusion expression knowledge to obtain the target text blending expression knowledge. Because expression knowledge collision is carried out on the transition discrete text expression knowledge and the corresponding transition distributed text expression knowledge, the discrete and distributed knowledge complement each other and the upper layer module acquires the information of the lower layer module, which makes the obtained target text blending expression knowledge more accurate and reliable.
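The step-by-step collision of S41 to S43 can be sketched as a loop that carries the previous blending result forward. Fusion is modelled as concatenation (splicing), as the description suggests. The `extract` function is only a stand-in for a convolution unit: averaging equal-width chunks is an assumption chosen to keep the example self-contained, and the input vectors are made up.

```python
from typing import List, Optional, Sequence


def fuse(*parts: Sequence[float]) -> List[float]:
    """Fusion by splicing (concatenating) knowledge vectors."""
    return [x for part in parts for x in part]


def extract(fused: Sequence[float], width: int) -> List[float]:
    """Stand-in knowledge extraction: element-wise mean of width-sized chunks.

    Assumption: a debugged convolution unit would be used in practice.
    """
    chunks = [fused[i:i + width] for i in range(0, len(fused), width)]
    return [sum(col) / len(chunks) for col in zip(*chunks)]


def collide(
    discrete_levels: Sequence[Sequence[float]],
    distributed_levels: Sequence[Sequence[float]],
) -> List[float]:
    """Run the collision level by level and return the target blending knowledge."""
    width = len(discrete_levels[0])
    blended: Optional[List[float]] = None
    for disc, dist in zip(discrete_levels, distributed_levels):
        # From the second level on, the previous blending result is fused in as well.
        parts = (disc, dist) if blended is None else (blended, disc, dist)
        blended = extract(fuse(*parts), width)
    return blended


# Hypothetical transition knowledge from two extraction levels.
discrete_levels = [[1.0, 3.0], [2.0, 2.0]]
distributed_levels = [[3.0, 1.0], [5.0, 8.0]]
target = collide(discrete_levels, distributed_levels)
# First level:  fuse -> [1, 3, 3, 1],       extract -> [2.0, 2.0]
# Second level: fuse -> [2, 2, 2, 2, 5, 8], extract -> [3.0, 4.0]
```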

As an executable implementation manner, step S5 of performing ideographic knowledge mining according to the convergence discrete text expression knowledge, the convergence distributed text expression knowledge and the target text blending expression knowledge corresponding to the multiple disassembled texts to obtain text ideographic knowledge corresponding to the multiple disassembled texts, and performing text type prediction according to the text ideographic knowledge to obtain key text support degrees corresponding to the multiple disassembled texts, may specifically include the following steps:

Step S51, fusing the convergence discrete text expression knowledge, the convergence distributed text expression knowledge and the target text blending expression knowledge corresponding to the disassembled texts to obtain the target text fusion expression knowledge corresponding to the disassembled texts.

And S52, performing knowledge extraction operation according to the target text fusion expression knowledge corresponding to the disassembled texts to obtain linear expression knowledge corresponding to the disassembled texts.

The target text fusion expression knowledge is obtained by fusing the convergence discrete text expression knowledge, the convergence distributed text expression knowledge and the target text blending expression knowledge, and the linear expression knowledge is obtained by performing the knowledge extraction operation on the target text fusion expression knowledge.

For example, the convergence discrete text expression knowledge, the convergence distributed text expression knowledge and the target text blending expression knowledge corresponding to each disassembled text are spliced along a dimension, one disassembled text at a time, to obtain the target text fusion expression knowledge corresponding to each disassembled text. The target text fusion expression knowledge corresponding to each disassembled text is then loaded into a convolution unit for the knowledge extraction operation to obtain the linear expression knowledge corresponding to each disassembled text.

And S53, determining the maximum value of the knowledge vector and the mean value of the knowledge vector corresponding to each dimension in the linear expression knowledge according to the linear expression knowledge corresponding to the plurality of disassembled texts.

And S54, carrying out sum operation on the maximum value of the knowledge vector and the mean value of the knowledge vector to obtain an ideographic mining knowledge vector corresponding to each dimension in the linear expression knowledge, and obtaining ideographic mining knowledge corresponding to a plurality of disassembled texts according to the ideographic mining knowledge vector corresponding to each dimension in the linear expression knowledge.

The maximum value of the knowledge vector is the maximum value of the knowledge vector in all the corresponding knowledge vectors under the dimension, the mean value of the knowledge vector is the mean value of all the corresponding knowledge vectors under the dimension, and the ideographic mining knowledge vector is a knowledge vector which is obtained by mining and represents the ideographic knowledge of the text.

For example, the ideographic mining knowledge corresponding to each disassembled text is determined one by one, the linear expression knowledge corresponding to the disassembled text to be determined at present is obtained, and then the maximum value of the knowledge vector and the mean value of the knowledge vector corresponding to each dimension in the linear expression knowledge are determined, that is, the mean value of the knowledge vector and the maximum value of the knowledge vector of all the knowledge vectors corresponding to each dimension are determined. And then, carrying out sum operation on the maximum value of the knowledge vector and the mean value of the knowledge vector to obtain an ideographic mining knowledge vector corresponding to each dimension in the linear expression knowledge, and determining the ideographic mining knowledge vector corresponding to each dimension as the ideographic mining knowledge corresponding to the current disassembled text.
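Steps S53 and S54 amount to adding a per-dimension maximum to a per-dimension mean. A minimal sketch, assuming the linear expression knowledge of one disassembled text is laid out as a list of knowledge vectors (rows) over dimensions (columns), with made-up values:

```python
from typing import List, Sequence


def ideographic_mining(linear_knowledge: Sequence[Sequence[float]]) -> List[float]:
    """For each dimension, add the maximum and the mean over all knowledge vectors."""
    columns = list(zip(*linear_knowledge))   # one tuple of values per dimension
    return [max(col) + sum(col) / len(col) for col in columns]


# Hypothetical linear expression knowledge: 3 knowledge vectors, 2 dimensions.
linear_knowledge = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]
mined = ideographic_mining(linear_knowledge)
# Dimension 0: max 3.0 + mean 2.0 = 5.0; dimension 1: max 4.0 + mean 2.0 = 6.0
```

The maximum keeps the strongest response in each dimension, while the mean keeps the overall level, so the sum reflects both.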

And step S55, activating the ideographic mining knowledge corresponding to the disassembled texts to obtain the text ideographic knowledge corresponding to the disassembled texts.

And S56, performing key text and non-key text type prediction through text ideographic knowledge corresponding to the plurality of disassembled texts to obtain key text support degrees corresponding to the plurality of disassembled texts.

For example, the ideographic mining knowledge corresponding to each disassembled text is activated one by one through an activation function (such as the ReLU function) to obtain the text ideographic knowledge corresponding to the disassembled texts, and a normalized exponential function (such as the softmax function) is then applied to the text ideographic knowledge to predict the key text and non-key text types, giving the key text support degrees corresponding to the disassembled texts. The ideographic mining knowledge is thus built from the maximum value and the mean value of the knowledge vectors: the maximum value represents the most salient information and the mean value represents the overall balance of information, so the mined text ideographic knowledge has high accuracy and reliability. Type prediction is finally carried out on the text ideographic knowledge, which improves the accuracy and reliability of the obtained key text support degrees.
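The activation and prediction of steps S55 and S56 can be sketched as ReLU followed by a two-class softmax. The linear layer between them (`weights` and `bias`), the class order and all numeric values are assumptions for the example; the description names only the activation function and the normalized exponential function.

```python
import math
from typing import List, Sequence


def relu(vector: Sequence[float]) -> List[float]:
    return [max(0.0, x) for x in vector]


def softmax(logits: Sequence[float]) -> List[float]:
    peak = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def key_text_support(
    mined: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> float:
    """Return the probability of the 'key text' class (index 0 by assumption)."""
    ideographic = relu(mined)                # text ideographic knowledge
    logits = [
        sum(w * x for w, x in zip(row, ideographic)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)[0]


mined = [5.0, -1.0]                          # hypothetical ideographic mining knowledge
weights = [[0.2, 0.3], [-0.2, 0.1]]          # hypothetical parameters: key / non-key
bias = [0.0, 0.0]
support = key_text_support(mined, weights, bias)
# The logits are [1.0, -1.0], so support is 1 / (1 + e^-2), roughly 0.88.
```

A disassembled text with this support degree would count as a key text under any preset key text support degree below about 0.88.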

As an executable implementation, the artificial intelligence based data mining method further comprises:

Step S10, loading the text set to be processed into a text type prediction module, and disassembling the text set to be processed through the text type prediction module to obtain a plurality of disassembled texts.

Step S20, respectively carrying out discrete text expression knowledge mining on a plurality of disassembled texts through a text type prediction module to obtain discrete text expression knowledge corresponding to the plurality of disassembled texts, wherein the discrete text expression knowledge comprises transition discrete text expression knowledge and convergence discrete text expression knowledge; and respectively mining the distributed text expression knowledge of the plurality of disassembled texts to obtain the distributed text expression knowledge corresponding to the plurality of disassembled texts, wherein the distributed text expression knowledge comprises transition distributed text expression knowledge and convergence distributed text expression knowledge.

Step S30, performing expression knowledge collision on the transition discrete text expression knowledge and the transition distributed text expression knowledge corresponding to the disassembled texts through the text type prediction module to obtain target text blending expression knowledge corresponding to the disassembled texts.

Step S40, performing ideographic knowledge mining on the convergence discrete text expression knowledge, the convergence distributed text expression knowledge and the target text blending expression knowledge corresponding to the disassembled texts through the text type prediction module to obtain text ideographic knowledge corresponding to the disassembled texts, and performing text type prediction according to the text ideographic knowledge to obtain key text support degrees corresponding to the disassembled texts.

The text type prediction module is used to predict the key information and non-key information types of a text set, and is obtained by debugging in advance. Its basic framework may be a CNN, an RNN, an FCNN, or the like, and it may be debugged using a text set sample and the mark information the sample carries. For example, a text set to be processed is obtained and loaded into the text type prediction module, which may comprise two sub-modules. The two sub-modules simultaneously mine the convergence distributed text expression knowledge and the convergence discrete text expression knowledge corresponding to the text set to be processed, while expression knowledge collision is performed on the mined transition distributed text expression knowledge and transition discrete text expression knowledge to obtain the target text blending expression knowledge. Ideographic knowledge is then mined from the convergence distributed text expression knowledge, the convergence discrete text expression knowledge and the target text blending expression knowledge, and text type prediction is performed on the mined ideographic knowledge. On this basis, performing text type prediction through the text type prediction module to obtain the key text support degrees corresponding to the disassembled texts can improve the speed of text type prediction.

As an executable implementation, the text type prediction module comprises a discrete text expression knowledge mining sub-module, a distributed text expression knowledge mining sub-module, an expression knowledge collision module, a text ideographic knowledge mining module and a type prediction module; the artificial intelligence based data mining method may further include:

Step S100, loading the text set to be processed into a text type prediction module, and disassembling the text set to be processed through the text type prediction module to obtain a plurality of disassembled texts.

Step S200, loading the plurality of disassembled texts into the discrete text expression knowledge mining submodule to mine discrete text expression knowledge, and obtaining transition discrete text expression knowledge and convergence discrete text expression knowledge.

Step S300, loading the plurality of disassembled texts into the distributed text expression knowledge mining submodule to mine distributed text expression knowledge, and obtaining transition distributed text expression knowledge and convergence distributed text expression knowledge.

Step S400, loading the transition discrete text expression knowledge and the transition distributed text expression knowledge corresponding to the disassembled texts into the expression knowledge collision module for expression knowledge collision to obtain target text blending expression knowledge corresponding to the disassembled texts.

Step S500, loading the convergence discrete text expression knowledge, the convergence distributed text expression knowledge and the target text blending expression knowledge corresponding to the disassembled texts into the text ideographic knowledge mining module for ideographic knowledge mining to obtain the text ideographic knowledge corresponding to the disassembled texts, and loading the text ideographic knowledge into the type prediction module for text type prediction to obtain the key text support degrees corresponding to the disassembled texts.

The discrete text expression knowledge mining submodule is used for mining the discrete text expression knowledge of the text, and the distributed text expression knowledge mining submodule is used for mining the distributed text expression knowledge of the text. The expression knowledge collision module is used for performing expression knowledge collision on the transition distributed text expression knowledge and the transition discrete text expression knowledge. The text ideographic knowledge mining module is used for mining the ideographic knowledge of the text, and the type prediction module is used for predicting the key information and non-key information types.

For example, the plurality of disassembled texts are loaded into the discrete text expression knowledge mining submodule for discrete text expression knowledge mining; that is, the convolution units in the discrete text expression knowledge mining submodule output the discrete text expression knowledge, where the last convolution unit outputs the convergence discrete text expression knowledge and the remaining convolution units output the transition discrete text expression knowledge. Likewise, the plurality of disassembled texts are loaded into the distributed text expression knowledge mining submodule for distributed text expression knowledge mining; that is, the convolution units in the distributed text expression knowledge mining submodule output the distributed text expression knowledge, where the last convolution unit outputs the convergence distributed text expression knowledge and the remaining convolution units output the transition distributed text expression knowledge. It can be understood that the number of convolutions is the same in the discrete text expression knowledge mining submodule and the distributed text expression knowledge mining submodule. The expression knowledge collision module then performs expression knowledge collision on the transition discrete text expression knowledge and the transition distributed text expression knowledge to obtain the target text blending expression knowledge, the text ideographic knowledge mining module performs text ideographic knowledge mining, and the type prediction module performs text type prediction to obtain the key text support degrees corresponding to the plurality of disassembled texts.
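The two-branch arrangement just described can be illustrated with a small sketch. It is a simplified stand-in under stated assumptions, not the network of the application: each "convolution unit" is replaced by a toy function on a vector, and the collision rule (element-wise sum of stage-aligned transition outputs, accumulated across stages) is a hypothetical choice. The names `run_branch` and `collide` are likewise introduced only for illustration.

```python
def run_branch(x, stages):
    """Apply the stages in order.

    Returns (transition_outputs, convergence_output): every stage except
    the last contributes a transition output, and the last stage yields
    the convergence output.
    """
    outputs = []
    for stage in stages:
        x = stage(x)
        outputs.append(x)
    return outputs[:-1], outputs[-1]

def collide(transition_a, transition_b):
    """Blend stage-aligned transition outputs of the two branches.

    Assumed rule: element-wise sum per stage, accumulated over stages.
    Both branches must have the same number of stages.
    """
    if len(transition_a) != len(transition_b):
        raise ValueError("both branches need the same number of stages")
    blended = None
    for a, b in zip(transition_a, transition_b):
        fused = [p + q for p, q in zip(a, b)]
        if blended is None:
            blended = fused
        else:
            blended = [u + v for u, v in zip(blended, fused)]
    return blended

# Toy stages standing in for convolution units (three per branch).
discrete_stages = [
    lambda v: [2 * t for t in v],
    lambda v: [t + 1 for t in v],
    lambda v: [t * t for t in v],
]
distributed_stages = [
    lambda v: [t + 3 for t in v],
    lambda v: [t - 1 for t in v],
    lambda v: [-t for t in v],
]

text_vec = [1.0, 2.0]
disc_trans, disc_conv = run_branch(text_vec, discrete_stages)
dist_trans, dist_conv = run_branch(text_vec, distributed_stages)
blended = collide(disc_trans, dist_trans)
```

Keeping the number of stages equal in both branches is what makes the stage-by-stage pairing in `collide` well defined, which mirrors the requirement that the convolution counts be consistent.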

As an implementation manner, the text type prediction module may include a trunk network and two branch networks corresponding to the discrete and distributed expressions, respectively. The text type prediction module obtains a text set to be processed, loads it into the two branch networks, and performs convolution and pooling multiple times to obtain the convergence discrete text expression knowledge and the convergence distributed text expression knowledge, which have the same dimension.

As an executable implementation, the debugging process of the text type prediction module includes the following steps:

Step T1, acquiring a text set sample and the mark information it carries.

The text set sample is a text set used during debugging, and the mark information is indicative information corresponding to the text set sample, for example, indicating key information or non-key information; each text in the text set sample may carry mark information.

Step T2, loading the text set sample into the text type prediction module to be debugged, and disassembling the text set sample through the text type prediction module to be debugged to obtain debugging disassembled texts.

Step T3, respectively carrying out discrete text expression knowledge mining on each debugging disassembled text through the text type prediction module to be debugged to obtain original discrete text expression knowledge corresponding to each debugging disassembled text, wherein the original discrete text expression knowledge comprises original transition discrete text expression knowledge and original convergence discrete text expression knowledge; and respectively carrying out distributed text expression knowledge mining on each debugging disassembled text to obtain original distributed text expression knowledge corresponding to each debugging disassembled text, wherein the original distributed text expression knowledge comprises original transition distributed text expression knowledge and original convergence distributed text expression knowledge.

Step T4, performing expression knowledge collision on the original transition discrete text expression knowledge and the original transition distributed text expression knowledge corresponding to each debugging disassembled text through the text type prediction module to be debugged to obtain the original text blending expression knowledge corresponding to each debugging disassembled text.

Step T5, performing ideographic knowledge mining on the original convergence discrete text expression knowledge, the original convergence distributed text expression knowledge and the original text blending expression knowledge corresponding to each debugging disassembled text through the text type prediction module to be debugged to obtain the original text ideographic knowledge corresponding to each debugging disassembled text, and performing text type prediction according to the original text ideographic knowledge to obtain the original key text support degree corresponding to each debugging disassembled text.

The debugging disassembled text is a disassembled text obtained by disassembling during debugging. The original discrete text expression knowledge is the discrete text expression knowledge mined using the coefficients to be optimized, the original distributed text expression knowledge is the distributed text expression knowledge mined using the coefficients to be optimized, and the original key text support degree is the key text support degree predicted using the coefficients to be optimized. For example, a text type prediction module to be debugged is established based on a neural network, and is then used to perform a first text type prediction on the text set sample to obtain the original key text support degree corresponding to each debugging disassembled text. The process by which the text type prediction module to be debugged performs text type prediction is the same as that of the debugged text type prediction module.

Step T6, determining error information according to the original key text support degree corresponding to each debugging disassembled text and the mark information carried by the text set sample to obtain an error result, and optimizing the text type prediction module to be debugged according to the error result to obtain an optimized text type prediction module.

Step T7, determining the optimized text type prediction module as the text type prediction module to be debugged, and iterating the optimization process until convergence to obtain the text type prediction module.

On this basis, the text type prediction module to be debugged is debugged using the text set sample and the mark information it carries to obtain the text type prediction module. Because the text type prediction module is set up and debugged independently, this targeted debugging helps ensure debugging accuracy, which improves the accuracy and reliability of the resulting text type prediction module and, ultimately, of text set processing.
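The iterate-until-convergence structure of steps T1 to T7 can be sketched with a deliberately small model. The sketch below is an assumption-laden illustration, not the debugging procedure of the application: the module is reduced to a single logistic unit over precomputed feature vectors, the error information is taken to be cross-entropy, the optimizer is plain gradient descent, and the names `predict_support`, `error` and `debug_module` as well as the learning rate and tolerance are hypothetical.

```python
import math

def predict_support(w, b, x):
    """Key text support degree of one sample (logistic output in (0, 1))."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def error(w, b, samples, labels):
    """Mean cross-entropy between predicted support and mark information."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(samples, labels):
        p = predict_support(w, b, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(samples)

def debug_module(samples, labels, lr=0.5, tol=1e-6, max_iters=5000):
    """Repeat predict -> error -> optimize until the error stops improving."""
    w = [0.0] * len(samples[0])
    b = 0.0
    n = len(samples)
    prev = error(w, b, samples, labels)
    for _ in range(max_iters):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            diff = predict_support(w, b, x) - y
            for i, xi in enumerate(x):
                grad_w[i] += diff * xi
            grad_b += diff
        # Optimize the coefficients according to the error result.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
        cur = error(w, b, samples, labels)
        if abs(prev - cur) < tol:  # treated here as "convergence"
            break
        prev = cur
    return w, b

# Toy text set sample: label 1 marks key information, 0 non-key information.
samples = [[0.0, 1.0], [1.0, 0.0], [0.2, 0.9], [0.9, 0.1]]
labels = [1, 0, 1, 0]
initial_error = error([0.0, 0.0], 0.0, samples, labels)
w, b = debug_module(samples, labels)
final_error = error(w, b, samples, labels)
```

In this sketch the optimized coefficients simply overwrite the previous ones on each pass, which corresponds to treating the optimized module as the next module to be debugged in step T7.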

For example, a text set processing network to be debugged is established and debugged with debugging samples to obtain a text set processing network. The text set to be processed is disassembled by the text set processing network to obtain a plurality of disassembled texts. Discrete text expression knowledge mining is performed on the plurality of disassembled texts respectively to obtain the corresponding discrete text expression knowledge, which comprises transition discrete text expression knowledge and convergence discrete text expression knowledge; distributed text expression knowledge mining is performed on the plurality of disassembled texts respectively to obtain the corresponding distributed text expression knowledge, which comprises transition distributed text expression knowledge and convergence distributed text expression knowledge. Expression knowledge collision is performed according to the transition discrete text expression knowledge and the transition distributed text expression knowledge corresponding to the plurality of disassembled texts to obtain the corresponding target text blending expression knowledge. Ideographic knowledge mining is performed according to the convergence discrete text expression knowledge, the convergence distributed text expression knowledge and the target text blending expression knowledge to obtain the text ideographic knowledge corresponding to the plurality of disassembled texts, and text type prediction is performed according to the text ideographic knowledge to obtain the corresponding key text support degrees. A plurality of text sequences are then determined from the text set to be processed according to the key text support degrees, the integrated ideographic knowledge corresponding to the text sequences is determined according to the text ideographic knowledge, and text sequence type prediction is performed according to the integrated ideographic knowledge to obtain the same text sequence set.

As one embodiment, the artificial intelligence based data mining method is performed by a server, and the method includes:

Step S1000, a text set to be processed is obtained, the text set to be processed is loaded into a text type prediction module, and the text set to be processed is disassembled through the text type prediction module to obtain a plurality of disassembled texts, wherein the text type prediction module comprises a discrete text expression knowledge mining submodule, a distributed text expression knowledge mining submodule, an expression knowledge collision module, a text ideographic knowledge mining module and a type prediction module.

Step S2000, loading the plurality of disassembled texts into the discrete text expression knowledge mining submodule to perform a discrete knowledge extraction operation, so as to obtain transition linear knowledge and convergence linear knowledge corresponding to the plurality of disassembled texts, and performing distribution dimension unified processing on the transition linear knowledge and the convergence linear knowledge, so as to obtain transition discrete text expression knowledge and convergence discrete text expression knowledge corresponding to the plurality of disassembled texts.

Step S3000, extracting initial text expression knowledge corresponding to the disassembled texts, and loading the initial text expression knowledge corresponding to the disassembled texts into the distributed text expression knowledge mining submodule to perform a distributed knowledge extraction operation, so as to obtain transition distributed text expression knowledge and convergence distributed text expression knowledge corresponding to the disassembled texts. Meanwhile, fusing the transition discrete text expression knowledge and the transition distributed text expression knowledge to obtain first text fusion expression knowledge, and performing a knowledge extraction operation according to the first text fusion expression knowledge to obtain target text blending expression knowledge.

Step S4000, loading the convergence discrete text expression knowledge, the convergence distributed text expression knowledge and the target text blending expression knowledge corresponding to the disassembled texts into the text ideographic knowledge mining module for fusion to obtain target text fusion expression knowledge corresponding to the disassembled texts; performing a knowledge extraction operation according to the target text fusion expression knowledge to obtain linear expression knowledge corresponding to the disassembled texts; determining, according to the linear expression knowledge, a maximum knowledge vector value and a mean knowledge vector value corresponding to each dimension in the linear expression knowledge; performing a sum operation on the maximum knowledge vector value and the mean knowledge vector value to obtain an ideographic knowledge mining vector corresponding to each dimension in the linear expression knowledge; and obtaining the text ideographic knowledge corresponding to the disassembled texts according to the ideographic knowledge mining vectors corresponding to the dimensions in the linear expression knowledge.

Step S5000, loading the text ideographic knowledge into the type prediction module to predict the key text and non-key text types, so as to obtain the key text support degrees corresponding to the plurality of disassembled texts; determining a plurality of text sequences from the text set to be processed according to the key text support degrees corresponding to the disassembled texts, and determining the integrated ideographic knowledge corresponding to the text sequences according to the text ideographic knowledge.

Step S6000, loading the integrated ideographic knowledge corresponding to the text sequences to a hidden module of a mapping network for hidden mapping to obtain hidden mapping knowledge corresponding to the text sequences, and loading the hidden mapping knowledge corresponding to the text sequences and the corresponding key text support degree to a decoding module of the mapping network for reduction mapping to obtain target integrated ideographic knowledge corresponding to the text sequences.

Step S7000, determining a commonality measurement result among the text sequences through the target integrated ideographic knowledge corresponding to the text sequences, and clustering according to the commonality measurement result among the text sequences to obtain the same text sequence set.

The related technical contents have already been described in the foregoing other embodiments, and are not described herein again.
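As an illustration of step S7000, the commonality measurement and clustering can be sketched as follows. The description does not fix a particular measure or clustering rule, so the choices here are assumptions: cosine similarity as the commonality measure, a fixed threshold, and grouping by connected components (two sequences land in the same set if a chain of sufficiently similar pairs links them). The names `cosine` and `cluster_sequences` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity, used here as an assumed commonality measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def cluster_sequences(vectors, threshold=0.9):
    """Group sequence indices whose vectors are linked by similarity >= threshold."""
    parent = list(range(len(vectors)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy target integrated ideographic knowledge for four text sequences.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
same_sequence_sets = cluster_sequences(vectors)
```

With these toy vectors, sequences 0 and 1 point in nearly the same direction, as do sequences 2 and 3, so two sets result; a different threshold or measure would change the grouping.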

Based on the same principle as the method shown in fig. 1, the embodiment of the present application further provides a data mining apparatus 10, where the data mining apparatus 10 may be a computer program (including program code) running in a server, or may be a physical apparatus included in the server, as shown in fig. 2, and the apparatus 10 includes:

The text disassembling module 11 is configured to obtain a text set to be processed, and disassemble the text set to be processed to obtain a plurality of disassembled texts.

The discrete knowledge mining module 12 is configured to perform discrete text expression knowledge mining on the plurality of disassembled texts, respectively, to obtain discrete text expression knowledge corresponding to the plurality of disassembled texts, where the discrete text expression knowledge includes transition discrete text expression knowledge and convergence discrete text expression knowledge.

The distributed knowledge mining module 13 is configured to perform distributed text expression knowledge mining on the plurality of disassembled texts, respectively, to obtain distributed text expression knowledge corresponding to the plurality of disassembled texts, where the distributed text expression knowledge includes transition distributed text expression knowledge and convergence distributed text expression knowledge.

The knowledge collision module 14 is configured to perform expression knowledge collision according to the transition discrete text expression knowledge and the transition distributed text expression knowledge corresponding to the plurality of disassembled texts, so as to obtain target text blending expression knowledge corresponding to the plurality of disassembled texts.

The support degree determining module 15 is configured to perform ideographic knowledge mining according to the convergence discrete text expression knowledge, the convergence distributed text expression knowledge, and the target text blending expression knowledge corresponding to the plurality of disassembled texts, to obtain text ideographic knowledge corresponding to the plurality of disassembled texts, and perform text type prediction according to the text ideographic knowledge, to obtain key text support degrees corresponding to the plurality of disassembled texts.

The integrating module 16 is configured to determine a plurality of text sequences from the text set to be processed according to the key text support degrees, and determine integrated ideographic knowledge corresponding to the plurality of text sequences according to the text ideographic knowledge.

The prediction module 17 is configured to perform text sequence type prediction according to the integrated ideographic knowledge corresponding to the plurality of text sequences to obtain a same text sequence set.

The data mining device 10 can be used to execute the artificial intelligence-based data mining method, and the specific principle and implementation process thereof have been described in the above embodiments and will not be described herein again.
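To make the role of the integrating module 16 more concrete, the following sketch shows one possible way to go from key text support degrees to text sequences and their integrated ideographic knowledge. The rule used here is purely an assumption for illustration and is not stated in this part of the description: consecutive disassembled texts whose support degree reaches a threshold are grouped into one text sequence, and the integrated ideographic knowledge of a sequence is the per-dimension mean of its members' text ideographic knowledge. The names `determine_sequences` and `integrate` and the threshold value are hypothetical.

```python
def determine_sequences(supports, threshold=0.5):
    """Group indices of consecutive texts whose support >= threshold.

    Assumed rule: a run of adjacent key texts forms one text sequence.
    """
    sequences = []
    current = []
    for idx, support in enumerate(supports):
        if support >= threshold:
            current.append(idx)
        elif current:
            sequences.append(current)
            current = []
    if current:
        sequences.append(current)
    return sequences

def integrate(sequence, ideographic):
    """Per-dimension mean of the ideographic vectors in one sequence (assumed)."""
    dims = len(ideographic[0])
    return [sum(ideographic[i][d] for i in sequence) / len(sequence)
            for d in range(dims)]

# Toy key text support degrees and text ideographic knowledge for six texts.
supports = [0.9, 0.8, 0.1, 0.7, 0.2, 0.95]
ideographic = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0],
               [5.0, 5.0], [0.0, 0.0], [7.0, 1.0]]

sequences = determine_sequences(supports)
integrated = [integrate(seq, ideographic) for seq in sequences]
```

The output of `integrate` is what a downstream component such as the prediction module 17 would consume when comparing text sequences with one another.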

It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

The foregoing describes only some embodiments of the present application. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered to fall within the protection scope of the present application.

Embodiments of the present application further provide a computer-readable storage medium containing instructions to be executed by a processor of a data mining server to implement the artificial intelligence based data mining method in the above method embodiments.

For example, the processor may be a Central Processing Unit (CPU), and may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.

In one example implementation, the memory described above may be either volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).

The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone, where A and B may each be singular or plural. In addition, the character "/" herein generally indicates an "or" relationship between the preceding and following associated objects, but may also indicate an "and/or" relationship, which can be understood from the context.

In the present application, "a plurality" means two or more. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple.

In the embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part thereof that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computing device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The above describes only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A data mining method based on artificial intelligence is characterized by being applied to a server, and the method comprises the following steps:

acquiring a text set to be processed, and disassembling the text set to be processed to obtain a plurality of disassembled texts;

respectively carrying out discrete text expression knowledge mining on the disassembled texts to obtain discrete text expression knowledge corresponding to the disassembled texts, wherein the discrete text expression knowledge comprises transition discrete text expression knowledge and convergence discrete text expression knowledge;

respectively mining distributed text expression knowledge of the disassembled texts to obtain distributed text expression knowledge corresponding to the disassembled texts, wherein the distributed text expression knowledge comprises transition distributed text expression knowledge and convergence distributed text expression knowledge;

performing expression knowledge collision according to the transition discrete text expression knowledge and the transition distributed text expression knowledge corresponding to the disassembled texts to obtain target text blending expression knowledge corresponding to the disassembled texts;

performing ideographic knowledge mining according to the convergence discrete text expression knowledge, the convergence distributed text expression knowledge and the target text blending expression knowledge corresponding to the disassembled texts to obtain text ideographic knowledge corresponding to the disassembled texts, and performing text type prediction according to the text ideographic knowledge to obtain key text support degrees corresponding to the disassembled texts;

determining a plurality of text sequences from the text set to be processed according to the support degree of the key text, and determining integrated ideographic knowledge corresponding to the text sequences according to the text ideographic knowledge;

and predicting the text sequence type according to the integrated ideographic knowledge corresponding to the text sequences to obtain the same text sequence set.

2. The method of claim 1, wherein the performing text sequence type prediction according to the integrated ideographic knowledge corresponding to the plurality of text sequences to obtain a same text sequence set comprises:

hidden mapping is carried out according to the integrated ideographic knowledge corresponding to the text sequences to obtain hidden mapping knowledge;

performing reduction mapping through the hidden mapping knowledge and the support degrees of the key texts corresponding to the disassembled texts to obtain target integrated ideographic knowledge corresponding to the text sequences;

and performing type prediction on the plurality of text sequences according to the target integrated ideographic knowledge corresponding to the plurality of text sequences to obtain the same text sequence set.

3. The method of claim 2, wherein the hidden mapping according to the integrated ideographic knowledge corresponding to the plurality of text sequences to obtain hidden mapping knowledge comprises:

extracting initial text expression knowledge corresponding to the disassembled texts respectively, and determining text sequence initial expression knowledge corresponding to the text sequences from the initial text expression knowledge corresponding to the disassembled texts respectively;

fusing the text sequence initial expression knowledge corresponding to the text sequences with the corresponding integrated ideographic knowledge respectively to obtain target text fusion expression knowledge corresponding to the text sequences;

and loading the target text fusion expression knowledge corresponding to the plurality of text sequences into a hidden module of a mapping network for processing to obtain target hidden mapping knowledge.

4. The method of claim 2, wherein the type prediction of the text sequences according to the target integrated ideographic knowledge corresponding to the text sequences to obtain the same text sequence set comprises:

determining a result of commonality measurement among the plurality of text sequences through target integrated ideographic knowledge corresponding to the plurality of text sequences;

and clustering according to the result of the commonality measurement among the text sequences to obtain the same text sequence set.

5. The method of claim 1, wherein respectively mining discrete text expression knowledge from the plurality of disassembled texts to obtain discrete text expression knowledge corresponding to the plurality of disassembled texts, the discrete text expression knowledge comprising transition discrete text expression knowledge and convergence discrete text expression knowledge, comprises:

respectively performing a discrete knowledge extraction operation on the plurality of disassembled texts to obtain a plurality of pieces of transition linear knowledge and convergence linear knowledge corresponding to the plurality of disassembled texts;

performing distribution dimension unified processing on the plurality of pieces of transition linear knowledge to obtain a plurality of pieces of transition discrete text expression knowledge corresponding to the plurality of disassembled texts;

and performing distribution dimension unified processing on the convergence linear knowledge to obtain the convergence discrete text expression knowledge corresponding to the plurality of disassembled texts.
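
One possible reading of the "distribution dimension unified processing" in claim 5 is that linear knowledge vectors of differing lengths are brought to a common dimension so that they can later be fused with the distributed knowledge. The sketch below assumes zero-padding or truncation for this purpose; the claim does not specify the operation, and a learned projection would be an equally plausible choice. The function name and toy vectors are hypothetical.

```python
def unify_dimension(vector, target_dim):
    """Zero-pad or truncate a vector to target_dim entries (assumed unification rule)."""
    return (list(vector) + [0.0] * target_dim)[:target_dim]

# Toy linear knowledge for one disassembled text: two transition levels and one convergence level.
transition_linear = [[1.0, 2.0], [3.0, 4.0, 5.0, 6.0, 7.0]]
convergence_linear = [0.5, 0.5, 0.5]

target_dim = 4  # hypothetical common dimension
transition_discrete = [unify_dimension(v, target_dim) for v in transition_linear]
convergence_discrete = unify_dimension(convergence_linear, target_dim)
```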

6. The method of claim 1, wherein respectively mining distributed text expression knowledge from the plurality of disassembled texts to obtain distributed text expression knowledge corresponding to the plurality of disassembled texts, the distributed text expression knowledge comprising transition distributed text expression knowledge and convergence distributed text expression knowledge, comprises:

extracting initial text expression knowledge corresponding to each of the plurality of disassembled texts;

and respectively performing a distributed knowledge extraction operation on the initial text expression knowledge corresponding to the plurality of disassembled texts to obtain a plurality of pieces of transition distributed text expression knowledge and convergence distributed text expression knowledge corresponding to the plurality of disassembled texts.

7. The method of claim 1, wherein there are a plurality of pieces of transition discrete text expression knowledge and a plurality of pieces of transition distributed text expression knowledge;

the performing expression knowledge collision according to the transition discrete text expression knowledge and the transition distributed text expression knowledge corresponding to the plurality of disassembled texts to obtain the target text blending expression knowledge corresponding to the plurality of disassembled texts comprises:

fusing first transition discrete text expression knowledge in the transition discrete text expression knowledge with corresponding first transition distributed text expression knowledge in the transition distributed text expression knowledge to obtain first text fusion expression knowledge, and performing a knowledge extraction operation according to the first text fusion expression knowledge to obtain first text blending expression knowledge;

fusing the first text blending expression knowledge, second transition discrete text expression knowledge in the transition discrete text expression knowledge and corresponding second transition distributed text expression knowledge in the transition distributed text expression knowledge to obtain second text fusion expression knowledge, and performing a knowledge extraction operation according to the second text fusion expression knowledge to obtain second text blending expression knowledge;

and, once all of the plurality of pieces of transition discrete text expression knowledge and the plurality of pieces of transition distributed text expression knowledge have been processed in this manner, obtaining the target text blending expression knowledge.
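
The level-by-level collision of claim 7 can be sketched as a loop that carries the previous blending result into the next fusion. The sketch assumes element-wise addition as the fusion and an element-wise tanh as a stand-in for the knowledge extraction operation; neither is stated in the claim, and all names and values are hypothetical.

```python
import math

def fuse(*vectors):
    """Fuse equal-length vectors by element-wise addition (assumption)."""
    return [sum(values) for values in zip(*vectors)]

def extract(vector):
    """Stand-in for the knowledge extraction operation: element-wise tanh (assumption)."""
    return [math.tanh(x) for x in vector]

def collide(discrete_levels, distributed_levels):
    """Fuse level by level, feeding each blending result into the next fusion."""
    if len(discrete_levels) != len(distributed_levels):
        raise ValueError("each discrete level needs a corresponding distributed level")
    blended = None
    for disc, dist in zip(discrete_levels, distributed_levels):
        parts = [disc, dist] if blended is None else [blended, disc, dist]
        fused = fuse(*parts)       # i-th text fusion expression knowledge
        blended = extract(fused)   # i-th text blending expression knowledge
    return blended                 # target text blending expression knowledge

# Toy transition knowledge with two levels for one disassembled text.
discrete_levels = [[1.0, 0.0], [0.0, 0.0]]
distributed_levels = [[0.0, 1.0], [0.0, 0.0]]
target_blending = collide(discrete_levels, distributed_levels)
```

The first level has no earlier blending result, so only the discrete and distributed knowledge are fused; every later level also receives the preceding blending result.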

8. The method of claim 1, wherein performing ideographic knowledge mining according to the convergence discrete text expression knowledge, the convergence distributed text expression knowledge and the target text blending expression knowledge corresponding to the plurality of disassembled texts to obtain text ideographic knowledge corresponding to the plurality of disassembled texts, and performing text type prediction according to the text ideographic knowledge to obtain key text support degrees corresponding to the plurality of disassembled texts, comprises:

fusing the convergence discrete text expression knowledge, the convergence distributed text expression knowledge and the target text blending expression knowledge corresponding to the plurality of disassembled texts to obtain target text fusion expression knowledge corresponding to the plurality of disassembled texts;

performing a knowledge extraction operation according to the target text fusion expression knowledge corresponding to the plurality of disassembled texts to obtain linear expression knowledge corresponding to the plurality of disassembled texts;

determining, from the linear expression knowledge corresponding to the plurality of disassembled texts, a maximum value and a mean value of the knowledge vector for each dimension of the linear expression knowledge;

summing the maximum value and the mean value of the knowledge vector to obtain an ideographic mining knowledge vector for each dimension of the linear expression knowledge, and obtaining ideographic mining knowledge corresponding to the plurality of disassembled texts from the ideographic mining knowledge vectors of the respective dimensions;

activating the ideographic mining knowledge corresponding to the plurality of disassembled texts to obtain the text ideographic knowledge corresponding to the plurality of disassembled texts;

and performing key text versus non-key text type prediction using the text ideographic knowledge corresponding to the plurality of disassembled texts to obtain the key text support degrees corresponding to the plurality of disassembled texts.
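
The pooling, activation and prediction steps of claim 8 can be sketched as follows. The per-dimension sum of maximum and mean follows the claim; the ReLU activation, the logistic output used as the key text support degree, and the weights and toy values are assumptions made only for illustration.

```python
import math

def pool_max_plus_mean(rows):
    """For each dimension, add the maximum and the mean over all rows."""
    dims = len(rows[0])
    return [
        max(row[d] for row in rows) + sum(row[d] for row in rows) / len(rows)
        for d in range(dims)
    ]

def relu(vector):
    """Activation step (ReLU is an assumption; the claim only says 'activating')."""
    return [max(0.0, x) for x in vector]

def key_text_support(vector, weights, bias):
    """Logistic score in (0, 1), read here as the key text support degree (assumption)."""
    z = sum(w * x for w, x in zip(weights, vector)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Toy linear expression knowledge for one disassembled text: two rows, two dimensions.
linear_knowledge = [[1.0, -2.0], [3.0, -4.0]]

pooled = pool_max_plus_mean(linear_knowledge)   # ideographic mining knowledge
activated = relu(pooled)                        # text ideographic knowledge
support = key_text_support(activated, weights=[0.2, 0.5], bias=-1.0)
```

A text would then be treated as key text when its support degree exceeds a chosen cut-off; the cut-off itself is not given in the claim.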

9. The method of claim 1, further comprising:

loading the text set to be processed into a text type prediction module, and disassembling the text set to be processed through the text type prediction module to obtain a plurality of disassembled texts;

respectively carrying out discrete text expression knowledge mining on the disassembled texts through the text type prediction module to obtain discrete text expression knowledge corresponding to the disassembled texts, wherein the discrete text expression knowledge comprises transition discrete text expression knowledge and convergence discrete text expression knowledge;

performing expression knowledge collision on the transition discrete text expression knowledge and the transition distributed text expression knowledge corresponding to the disassembled texts through the text type prediction module to obtain target text blending expression knowledge corresponding to the disassembled texts;

performing ideographic knowledge mining on the convergence discrete text expression knowledge, convergence distributed text expression knowledge and target text blending expression knowledge corresponding to the disassembled texts through the text type prediction module to obtain text ideographic knowledge corresponding to the disassembled texts, and performing text type prediction according to the text ideographic knowledge to obtain key text support degrees corresponding to the disassembled texts;

the text type prediction module comprises a discrete text expression knowledge mining submodule, a distributed text expression knowledge mining submodule, an expression knowledge collision module, a text ideographic knowledge mining module and a type prediction module;

the method further comprises the following steps:

loading the plurality of disassembled texts into the discrete text expression knowledge mining submodule to carry out discrete text expression knowledge mining, and obtaining transition discrete text expression knowledge and convergence discrete text expression knowledge;

loading the plurality of disassembled texts into the distributed text expression knowledge mining submodule to mine the distributed text expression knowledge, and obtaining transition distributed text expression knowledge and convergence distributed text expression knowledge;

loading the transition discrete text expression knowledge and the transition distributed text expression knowledge corresponding to the plurality of disassembled texts into the expression knowledge collision module for expression knowledge collision to obtain target text blending expression knowledge corresponding to the plurality of disassembled texts;

loading the convergence discrete text expression knowledge, convergence distributed text expression knowledge and target text blending expression knowledge corresponding to the disassembled texts into the text ideographic knowledge mining module for ideographic knowledge mining to obtain text ideographic knowledge corresponding to the disassembled texts, and loading the text ideographic knowledge into the type prediction module for text type prediction to obtain the key text support degrees corresponding to the disassembled texts.

10. A server comprising a processor and a memory, the memory storing a computer program that, when executed by the processor, performs the method of any one of claims 1 to 9.