Disclosure of Invention
The invention provides a big data anomaly AI analysis method based on online service interaction, and a corresponding server. The technical solution is as follows.
A first aspect provides a big data anomaly AI analysis method based on online service interaction, applied to an artificial intelligence server, the method comprising:
performing a text extraction operation on the online service interaction text to be analyzed through a first artificial intelligence network, to obtain a text unit to be analyzed in that text;
performing a text knowledge mining operation on the text unit to be analyzed through a second artificial intelligence network, to obtain an interactive text knowledge vector of the text unit to be analyzed;
acquiring the interactive text knowledge vector corresponding to each abnormal text unit in a text unit pool, and determining the knowledge vector difference between the interactive text knowledge vector of the text unit to be analyzed and that of each abnormal text unit;
and taking the data anomaly label of the abnormal text unit corresponding to a target knowledge vector difference among the knowledge vector differences as the data anomaly label of the text unit to be analyzed.
In some optional embodiments, performing the text extraction operation on the online service interaction text to be analyzed through the first artificial intelligence network to obtain the text unit to be analyzed comprises:
performing, through the first artificial intelligence network, a multi-stage text detail adjustment operation on the online service interaction text to be analyzed to obtain a text detail adjustment vector set for each stage, wherein the text detail adjustment vector set of every stage except the last serves as the input to the text detail adjustment operation of the next stage;
performing a vector aggregation operation on the text detail adjustment vector sets to obtain a text detail aggregation vector set of the online service interaction text to be analyzed;
performing a text window analysis operation on the online service interaction text to be analyzed according to the text detail aggregation vector set, to obtain a text window of the annotation to be analyzed;
and disassembling the online service interaction text to be analyzed according to the text window to obtain the text unit to be analyzed.
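For illustration only, the window analysis and disassembly steps can be sketched in Python. The sketch assumes that the text window analysis subnetwork outputs one score per character and that a text window is a maximal run of characters whose score reaches a threshold. The function names, the 0.5 threshold and the character-level granularity are hypothetical and are not specified by the method.

```python
from typing import List, Optional, Tuple

Window = Tuple[int, int]  # half-open [start, end) character offsets


def windows_from_scores(scores: List[float], threshold: float = 0.5) -> List[Window]:
    """Merge consecutive positions whose score >= threshold into text windows (hypothetical rule)."""
    windows: List[Window] = []
    start: Optional[int] = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # a window opens at position i
        elif score < threshold and start is not None:
            windows.append((start, i))  # the window closes just before position i
            start = None
    if start is not None:
        windows.append((start, len(scores)))  # the window runs to the end of the text
    return windows


def disassemble(text: str, windows: List[Window]) -> List[str]:
    """Cut the interaction text into text units, one unit per predicted window."""
    return [text[start:end] for start, end in windows]
```

Word-level or sentence-level units would follow the same pattern, using token offsets in place of character offsets.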
In some optional embodiments, the interactive text knowledge vector comprises a highlight annotation knowledge vector, and performing the text knowledge mining operation on the text unit to be analyzed through the second artificial intelligence network to obtain its interactive text knowledge vector comprises:
performing a content characterization knowledge mining operation on the text unit to be analyzed through the second artificial intelligence network to obtain the text content characterization knowledge of the text unit to be analyzed, and converting the text content characterization knowledge from a first output mode to a second output mode to obtain the highlight annotation knowledge vector of the text unit to be analyzed.
In some optional embodiments, the interactive text knowledge vector comprises a highlight annotation knowledge vector, and acquiring the interactive text knowledge vector corresponding to each abnormal text unit in the text unit pool and determining the knowledge vector difference between the interactive text knowledge vector of the text unit to be analyzed and that of each abnormal text unit comprises:
acquiring the highlight annotation knowledge vector corresponding to each abnormal text unit in the text unit pool and, for each abnormal text unit, performing a feature operation on the highlight annotation knowledge vector of the text unit to be analyzed and the highlight annotation knowledge vector of the abnormal text unit to obtain a highlight annotation comparison vector, and determining the knowledge vector difference according to the highlight annotation comparison vector.
In some optional embodiments, the interactive text knowledge vector comprises a highlight annotation knowledge vector and an annotation category, and taking the data anomaly label of the abnormal text unit corresponding to the target knowledge vector difference among the knowledge vector differences as the data anomaly label of the text unit to be analyzed comprises:
when the target knowledge vector difference is not greater than a knowledge vector difference threshold, taking the data anomaly label of the abnormal text unit corresponding to the target knowledge vector difference as the data anomaly label of the text unit to be analyzed;
when the target knowledge vector difference is greater than the knowledge vector difference threshold, capturing user operation behaviors in the text unit to be analyzed through the second artificial intelligence network to obtain user operation behavior information, and determining the behavior tendency of the user operation behavior information to obtain the annotation category of the text unit to be analyzed;
and matching the annotation category of the text unit to be analyzed against the annotation category of each abnormal text unit in the text unit pool, taking the target abnormal text unit whose annotation category matches that of the text unit to be analyzed in place of the abnormal text unit corresponding to the target knowledge vector difference, and taking the data anomaly label of the target abnormal text unit as the data anomaly label of the text unit to be analyzed.
In some optional embodiments, the interactive text knowledge vector comprises a highlight annotation knowledge vector and an annotation category, and after the data anomaly label of the abnormal text unit corresponding to the target knowledge vector difference is taken as the data anomaly label of the text unit to be analyzed, the method further comprises:
integrating the distribution features of the text window, the annotation category of the text unit to be analyzed and the data anomaly label of the text unit to be analyzed into a big data anomaly analysis report.
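A minimal sketch of the report integration step follows. It assumes that the distribution feature of a text window is its (start, end) offset pair and that the report is serialized as JSON. The class name, the field names and the JSON layout are hypothetical choices made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class AnomalyReportEntry:
    """One analysed text unit; all field names are illustrative assumptions."""
    window: Tuple[int, int]   # distribution feature of the text window (start, end offsets)
    annotation_category: str  # annotation category of the text unit to be analyzed
    data_anomaly_label: str   # data anomaly label assigned to the text unit


def build_report(entries: List[AnomalyReportEntry]) -> str:
    """Integrate per-unit results into one JSON report (hypothetical format)."""
    return json.dumps({"units": [asdict(e) for e in entries]}, ensure_ascii=False)
```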
In some optional embodiments, before acquiring the interactive text knowledge vector corresponding to each abnormal text unit in the text unit pool, the method further comprises:
performing the text knowledge mining operation on each abnormal text unit through the second artificial intelligence network to obtain the interactive text knowledge vector of each abnormal text unit;
performing label discrimination on each abnormal text unit according to its interactive text knowledge vector to obtain the data anomaly label to which each abnormal text unit belongs;
and recording, in the text unit pool, each abnormal text unit, its interactive text knowledge vector, and a mapping chain between the abnormal text unit and its data anomaly label.
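The text unit pool can be pictured as a small in-memory store. The sketch below assumes that knowledge vectors are plain lists of floats and that the mapping chain is a dictionary from unit index to data anomaly label. `TextUnitPool` and its methods are hypothetical names; a production pool might instead live in a database or a vector index.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class TextUnitPool:
    """Hypothetical in-memory text unit pool (a sample database of abnormal text units)."""
    units: List[str] = field(default_factory=list)
    vectors: List[List[float]] = field(default_factory=list)
    label_of: Dict[int, str] = field(default_factory=dict)  # mapping chain: unit index -> label

    def record(self, unit: str, vector: Sequence[float], label: str) -> int:
        """Record a unit, its interactive text knowledge vector and its label; return its index."""
        index = len(self.units)
        self.units.append(unit)
        self.vectors.append([float(x) for x in vector])
        self.label_of[index] = label
        return index

    def labels(self) -> List[str]:
        """Data anomaly labels in the same order as the stored units."""
        return [self.label_of[i] for i in range(len(self.units))]
```

Extended abnormal text units can be added through the same `record` call once their labels have been determined.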
In some optional embodiments, after acquiring the interactive text knowledge vector corresponding to each abnormal text unit in the text unit pool, the method further comprises:
acquiring a plurality of extended abnormal text units, and performing the text knowledge mining operation on each extended abnormal text unit through the second artificial intelligence network to obtain the interactive text knowledge vector corresponding to each extended abnormal text unit;
determining the data anomaly label to which each extended abnormal text unit belongs according to its interactive text knowledge vector, wherein the data anomaly labels comprise a first anomaly label and a second anomaly label;
and recording, in the text unit pool, each extended abnormal text unit, its interactive text knowledge vector, and a mapping chain between the extended abnormal text unit and its data anomaly label.
In some optional embodiments, before performing the text extraction operation on the online service interaction text to be analyzed through the first artificial intelligence network, the method further comprises:
performing the text extraction operation on each online service reconstruction text in an online service reconstruction text set through a base first artificial intelligence network to obtain a text extraction result;
acquiring a window prediction cost and a label discrimination cost of the base first artificial intelligence network according to the text extraction result, and tuning the base first artificial intelligence network according to the label discrimination cost and the window prediction cost;
performing the text knowledge mining operation on each abnormal text unit in an abnormal text unit set through a base second artificial intelligence network to obtain the highlight annotation knowledge vector of each abnormal text unit;
and acquiring an iteration cost of the base second artificial intelligence network according to the highlight annotation knowledge vectors, and tuning the base second artificial intelligence network according to the iteration cost.
In some optional embodiments, the method further comprises: acquiring a plurality of candidate abnormal text units and performing text unit induction on them to obtain the abnormal text unit set; and acquiring a plurality of candidate example session texts, cleaning out the example session texts that do not meet a set requirement, and integrating the retained example session texts into an example session text set, wherein the set requirement is that an example session text contains no abnormal text unit;
and generating the online service reconstruction text set according to the abnormal text unit set and the example session text set, wherein each online service reconstruction text in the online service reconstruction text set comprises one example session text from the example session text set and at least one abnormal text unit from the abnormal text unit set.
In addition, in some independent embodiments, acquiring the iteration cost of the base second artificial intelligence network according to each highlight annotation knowledge vector and tuning the base second artificial intelligence network according to the iteration cost comprises:
determining, according to the highlight annotation knowledge vector of each abnormal text unit, first commonality measures between abnormal text units having the same data anomaly label and second commonality measures between abnormal text units having different data anomaly labels;
determining an eccentricity coefficient corresponding to each first commonality measure and an eccentricity coefficient corresponding to each second commonality measure;
determining the iteration cost of the base second artificial intelligence network according to each first commonality measure and its eccentricity coefficient, and each second commonality measure and its eccentricity coefficient;
and performing feedback tuning on the base second artificial intelligence network according to the iteration cost to obtain improved parameters, and replacing the parameters of the base second artificial intelligence network with the improved parameters to obtain the tuned second artificial intelligence network.
In addition, in some independent embodiments, determining the eccentricity coefficient corresponding to each first commonality measure and each second commonality measure comprises:
acquiring a first reference commonality measure from among the first commonality measures, and taking the comparison result between the first reference commonality measure and each first commonality measure as the eccentricity coefficient of that first commonality measure;
and acquiring a second reference commonality measure from among the second commonality measures, and taking the comparison result between the second reference commonality measure and each second commonality measure as the eccentricity coefficient of that second commonality measure.
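The method does not fix a formula for the iteration cost, so the following is only one possible sketch. It assumes that a commonality measure is the cosine similarity of two highlight annotation knowledge vectors, that the first reference measure is the largest same-label similarity, that the second reference measure is the smallest different-label similarity, and that the "comparison result" is a difference. Each eccentricity coefficient is then used as a pair weight, offset by 1 so that every pair contributes. All of these choices, and the function names, are assumptions.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Commonality measure (assumed): cosine similarity of two knowledge vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def iteration_cost(vectors: List[Sequence[float]], labels: List[str]) -> float:
    first: List[float] = []   # first commonality measures: same data anomaly label
    second: List[float] = []  # second commonality measures: different data anomaly labels
    for i, j in combinations(range(len(vectors)), 2):
        s = cosine(vectors[i], vectors[j])
        (first if labels[i] == labels[j] else second).append(s)
    if not first or not second:
        raise ValueError("need at least one same-label pair and one different-label pair")

    ref_first = max(first)    # assumed first reference commonality measure
    ref_second = min(second)  # assumed second reference commonality measure
    # Eccentricity coefficients: difference from the reference, offset by 1.
    w_first = [1.0 + (ref_first - s) for s in first]
    w_second = [1.0 + (s - ref_second) for s in second]

    pull = sum(w * (1.0 - s) for w, s in zip(w_first, first)) / len(first)      # same-label pairs
    push = sum(w * max(0.0, s) for w, s in zip(w_second, second)) / len(second)  # different-label pairs
    return pull + push
```

With this weighting, the least similar same-label pairs and the most similar different-label pairs dominate the cost, which is one plausible reading of an eccentricity coefficient. In a gradient-based framework, one option is to treat the coefficients as constants during back-propagation.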
In addition, in some independent embodiments, performing text unit induction on the plurality of abnormal text units to obtain the abnormal text unit set comprises: pre-assigning a label to each abnormal text unit according to its annotation category to obtain a preliminary data anomaly label for each abnormal text unit; determining redundant abnormal text units according to the detail pairing result between every two abnormal text units; performing text unit induction on the preliminary data anomaly labels corresponding to the redundant abnormal text units; and integrating the abnormal text units obtained after the induction into the abnormal text unit set;
cleaning out the example session texts that do not meet the set requirement comprises: performing the text extraction operation on each example session text to obtain target example session texts, each containing an abnormal text unit, together with their confidence coefficients; and cleaning out the target example session texts whose confidence coefficient is greater than a confidence coefficient threshold.
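The two filtering steps can be sketched as follows. The detector passed to `clean_session_texts` stands in for the text extraction operation and is assumed to return whether an abnormal text unit was found, together with its confidence coefficient. `drop_redundant` keeps the first unit of each duplicate group; it is a simplified stand-in for text unit induction, which in the method also merges the preliminary labels of redundant units. The function names and the default 0.8 threshold are hypothetical.

```python
from typing import Callable, List, Tuple


def clean_session_texts(
    texts: List[str],
    detect: Callable[[str], Tuple[bool, float]],  # hypothetical detector: (has_anomaly, confidence)
    confidence_threshold: float = 0.8,
) -> List[str]:
    """Drop session texts in which an abnormal text unit is detected above the threshold."""
    kept: List[str] = []
    for text in texts:
        has_anomaly, confidence = detect(text)
        if has_anomaly and confidence > confidence_threshold:
            continue  # target example session text: cleaned out
        kept.append(text)
    return kept


def drop_redundant(units: List[str], is_duplicate: Callable[[str, str], bool]) -> List[str]:
    """Keep only the first unit among those that pairwise matching marks as duplicates."""
    kept: List[str] = []
    for unit in units:
        if not any(is_duplicate(unit, other) for other in kept):
            kept.append(unit)
    return kept
```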
In addition, in some independent embodiments, generating the online service reconstruction text set from the abnormal text unit set and the example session text set comprises:
performing the following operations on each example session text in the example session text set:
disassembling the text content of the example session text to obtain a target session text within the example session text;
performing the following operations for the target session text of each example session text:
acquiring at least one abnormal text unit from the abnormal text unit set, and performing text reconstruction processing on the at least one abnormal text unit and the example session text according to the target session text to obtain an online service reconstruction text corresponding to the example session text;
and integrating the online service reconstruction texts obtained from all example session texts into the online service reconstruction text set.
In addition, in some independent embodiments, performing text reconstruction processing on the at least one abnormal text unit and the example session text according to the target session text to obtain the online service reconstruction text corresponding to the example session text comprises:
scaling the at least one abnormal text unit according to the scale of the example session text;
selecting among the scaled abnormal text units according to a set probability and assigning weights to the selected units;
determining at least one reconstruction distribution feature in the target session text according to a set distribution hit variable, and adding the at least one abnormal text unit to the example session text at each reconstruction distribution feature to obtain the online service reconstruction text corresponding to the example session text, wherein the feature distance between each reconstruction distribution feature and a reference distribution feature of the example session text has a set quantitative relationship with the set distribution hit variable corresponding to that reconstruction distribution feature.
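A simplified sketch of one reconstruction pass follows. It assumes that the scale of a text is its length, so scaling truncates each abnormal text unit to a fraction of the session text length. It also assumes that the set probability is an independent keep probability per unit, and that reconstruction distribution features are word-start positions chosen uniformly at random. The distance relationship with a reference distribution feature described in the method is not modeled. The function returns the reconstructed text together with the offsets and labels of the inserted units, which is the kind of prior guidance used when training the first network. All names and defaults are hypothetical.

```python
import random
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, data anomaly label) of an inserted unit


def reconstruct(
    session_text: str,
    units: List[Tuple[str, str]],  # (abnormal text unit, data anomaly label)
    rng: random.Random,
    max_ratio: float = 0.3,        # hypothetical scale limit relative to the session text
    keep_prob: float = 0.5,        # hypothetical "set probability"
) -> Tuple[str, List[Span]]:
    """Insert abnormal text units into a clean session text and record where they went."""
    # Scaling (assumed): truncate each unit to a fraction of the session text length.
    budget = max(1, int(len(session_text) * max_ratio))
    scaled = [(text[:budget], label) for text, label in units]
    # Selection (assumed): keep each unit independently; fall back to the first unit.
    chosen = [pair for pair in scaled if rng.random() < keep_prob] or scaled[:1]
    # Candidate positions (assumed): the start of every word in the session text.
    word_starts = [0] + [i + 1 for i, ch in enumerate(session_text) if ch == " "]
    positions = sorted(rng.choice(word_starts) for _ in chosen)

    pieces: List[str] = []
    spans: List[Span] = []
    cursor = 0  # how much of the session text has been copied so far
    length = 0  # length of the reconstructed text built so far
    for pos, (unit, label) in zip(positions, chosen):
        pieces.append(session_text[cursor:pos])
        length += pos - cursor
        spans.append((length, length + len(unit), label))
        pieces.append(unit + " ")
        length += len(unit) + 1
        cursor = pos
    pieces.append(session_text[cursor:])
    return "".join(pieces), spans
```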
In addition, in some independent embodiments, acquiring the window prediction cost and the label discrimination cost of the base first artificial intelligence network according to the text extraction result, and tuning the base first artificial intelligence network according to these costs, comprises:
acquiring the prior guidance corresponding to each online service reconstruction text in the online service reconstruction text set and taking the prior guidance as the discrimination expectation, wherein the prior guidance of each online service reconstruction text comprises the abnormal text units in that reconstruction text and the distribution feature of each abnormal text unit;
determining the label discrimination cost and the window prediction cost of the base first artificial intelligence network according to the discrimination expectation and the text extraction result;
and determining improved parameters of the base first artificial intelligence network according to the label discrimination cost and the window prediction cost, and replacing the corresponding parameters of the base first artificial intelligence network with the improved parameters to obtain the tuned first artificial intelligence network.
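The two costs can be sketched with common stand-ins, assuming predicted windows are already aligned one-to-one with the expected windows. The window prediction cost is taken as the mean of 1 minus the interval IoU, and the label discrimination cost as the mean cross-entropy of the probability assigned to the expected label. These are assumed choices rather than the method's stated formulas. The parameter update itself depends on the training framework and is omitted.

```python
import math
from typing import Dict, List, Tuple

Window = Tuple[int, int]  # half-open [start, end) offsets


def span_iou(pred: Window, gold: Window) -> float:
    """Intersection over union of two 1-D intervals."""
    inter = max(0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    union = (pred[1] - pred[0]) + (gold[1] - gold[0]) - inter
    return inter / union if union > 0 else 0.0


def window_prediction_cost(pred: List[Window], gold: List[Window]) -> float:
    """Mean (1 - IoU) over aligned predicted/expected window pairs."""
    return sum(1.0 - span_iou(p, g) for p, g in zip(pred, gold)) / len(gold)


def label_discrimination_cost(probs: List[Dict[str, float]], gold_labels: List[str]) -> float:
    """Mean cross-entropy of the probability given to each expected label."""
    eps = 1e-12  # avoids log(0) when the expected label gets no probability
    return -sum(math.log(max(p.get(y, 0.0), eps)) for p, y in zip(probs, gold_labels)) / len(gold_labels)


def total_cost(pred, gold, probs, gold_labels, window_weight: float = 1.0) -> float:
    """Weighted sum of both costs (the weighting is a hypothetical hyperparameter)."""
    return window_weight * window_prediction_cost(pred, gold) + label_discrimination_cost(probs, gold_labels)
```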
A second aspect provides an artificial intelligence server comprising a memory and a processor coupled to each other; the memory stores computer program code comprising computer instructions which, when executed by the processor, cause the artificial intelligence server to perform the method of the first aspect.
A third aspect provides a computer-readable storage medium storing a computer program which, when run, performs the method of the first aspect.
According to the embodiments of the invention, the first artificial intelligence network obtains the distribution features of the text units in the online service interaction text to be analyzed, and the second artificial intelligence network determines the data anomaly labels of those text units. Assigning these tasks to different networks allows the label decision to be processed cooperatively and spreads the computation cost across the two networks. Because the data anomaly label of the text unit to be analyzed is determined from the knowledge vector difference between it and the abnormal text units in the text unit pool, the accuracy of data anomaly label matching can be improved. Using the abnormal text units in the pool as samples also lets the second artificial intelligence network handle a wider range of label matching requirements, which improves the reliability of the data anomaly label matching analysis of the text units to be analyzed.
Detailed Description
Hereinafter, the terms "first," "second," "third," and so on are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of the technical features indicated. Thus, a feature defined by "first," "second," "third," and so on may explicitly or implicitly include one or more such features.
FIG. 1 shows a flowchart of a big data anomaly AI analysis method based on online service interaction according to an embodiment of the invention. The method can be implemented by an artificial intelligence server comprising a memory and a processor coupled to each other; the memory stores computer program code comprising computer instructions, and when the processor executes the computer instructions, the artificial intelligence server performs S101 to S104.
S101, performing a text extraction operation on the online service interaction text to be analyzed through the first artificial intelligence network to obtain the text units to be analyzed in that text.
In this embodiment, the first artificial intelligence network extracts and identifies text units in the online service interaction text: it segments the online service interaction text to be analyzed into the content sets corresponding to text units, thereby obtaining the text units to be analyzed. Text units include, but are not limited to, words, sentences and paragraphs.
In some exemplary implementations, S101 proceeds as follows. A backbone network of the first artificial intelligence network performs a multi-stage text detail adjustment operation on the online service interaction text to be analyzed to obtain a text detail adjustment vector set for each stage, wherein the set of every stage except the last serves as the input to the next stage. A vector aggregation subnetwork of the first artificial intelligence network aggregates the text detail adjustment vector sets into a text detail aggregation vector set of the online service interaction text to be analyzed. A text window analysis subnetwork of the first artificial intelligence network performs a text window analysis operation on the text according to the text detail aggregation vector set to obtain a text window of the annotation to be analyzed. Finally, the online service interaction text to be analyzed is disassembled according to the text window to obtain the text units to be analyzed.
The multi-stage text detail adjustment operation can be understood as a multi-level (multi-layer) text detail up-sampling process, and each resulting text detail adjustment vector set can be understood as a text up-sampling feature map. The vector aggregation operation fuses the text detail adjustment vector sets into the text detail aggregation vector set. The text window analysis operation predicts text windows; the text window of the annotation to be analyzed is a text window carrying a highlight annotation, and the content set contained in that window can be understood as the text unit to be analyzed.
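The chained stages and the aggregation step can be sketched in plain Python. The sketch assumes nearest-neighbour up-sampling by a factor of 2 per stage, with an arbitrary per-stage `refine` function standing in for the learned layers. Aggregation is assumed to resize every stage to the finest resolution and average position-wise. These choices and all names are hypothetical.

```python
from typing import Callable, List

VectorSet = List[List[float]]  # one feature vector per text position


def upsample(vs: VectorSet, factor: int = 2) -> VectorSet:
    """Nearest-neighbour up-sampling along the text axis."""
    return [list(v) for v in vs for _ in range(factor)]


def multi_stage_adjust(base: VectorSet, stages: int, refine: Callable[[VectorSet], VectorSet]) -> List[VectorSet]:
    """Each stage up-samples and refines the previous stage's output."""
    outputs: List[VectorSet] = []
    current = base
    for _ in range(stages):
        current = refine(upsample(current))  # this stage's output feeds the next stage
        outputs.append(current)
    return outputs


def aggregate(sets: List[VectorSet]) -> VectorSet:
    """Bring every stage to the finest resolution and average position-wise."""
    target = len(sets[-1])
    resized = [upsample(s, target // len(s)) for s in sets]  # lengths divide evenly (factor 2 per stage)
    dim = len(sets[-1][0])
    return [[sum(r[i][d] for r in resized) / len(resized) for d in range(dim)] for i in range(target)]
```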
S102, performing a text knowledge mining operation on the text units to be analyzed through the second artificial intelligence network to obtain the interactive text knowledge vectors of the text units to be analyzed.
In this embodiment, the interactive text knowledge vector (interactive text feature) comprises a highlight annotation knowledge vector (annotation feature) and an annotation category (category of the annotation).
In this embodiment, the second artificial intelligence network can mine the highlight annotation knowledge vector of a text unit and determine the data anomaly label of the text unit from that vector; it can also analyze the user operation behavior information in the text unit to obtain the annotation category of the text unit.
In some exemplary implementations, S102 may be implemented through S1021 and S1022.
S1021, performing a content characterization knowledge mining operation on the text unit to be analyzed through the second artificial intelligence network to obtain the text content characterization knowledge of the text unit to be analyzed.
In this embodiment, text content characterization knowledge can be understood as a semantic vector of the text. The second artificial intelligence network comprises a knowledge mining layer and a query layer; the knowledge mining layer performs the text knowledge mining operation on the text unit to be analyzed to obtain its highlight annotation knowledge vector and annotation category. The knowledge mining layer comprises a coding layer and a feature selection layer. The coding layer may be a residual layer that extracts the text content characterization knowledge of the text unit to be analyzed, and this knowledge may be output in a discrete mode.
S1022, converting the text content characterization knowledge from the first output mode to the second output mode to obtain the highlight annotation knowledge vector of the text unit to be analyzed.
In this embodiment, the first output mode may be a discrete numerical mode and the second output mode may be a continuous feature mode. The text content characterization knowledge can be converted into the continuous feature mode by projecting it into a feature space. The feature selection layer of the second artificial intelligence network may be a feature projection layer comprising a fully connected module; this layer projects the text content characterization knowledge into the feature space, thereby converting it from the first output mode to the second output mode, and uses the result as the highlight annotation knowledge vector of the text unit to be analyzed.
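A fully connected projection from a discrete representation to a continuous vector can be sketched as follows. The sketch assumes that the text content characterization knowledge arrives as a list of discrete code indices. These are counted into a bag-of-codes vector, passed through a linear layer, and L2-normalized. The class name, the random initialization standing in for trained weights, and the normalization step are assumptions.

```python
import math
import random
from typing import List


class FeatureProjection:
    """Hypothetical fully connected projection layer: discrete codes -> continuous vector."""

    def __init__(self, vocab_size: int, dim: int, seed: int = 0):
        rng = random.Random(seed)
        # Randomly initialised weights stand in for trained parameters.
        self.weight = [[rng.gauss(0.0, 0.1) for _ in range(vocab_size)] for _ in range(dim)]
        self.bias = [0.0] * dim

    def __call__(self, codes: List[int]) -> List[float]:
        counts = [0] * len(self.weight[0])
        for c in codes:
            counts[c] += 1  # discrete representation (first output mode)
        out = [sum(w * x for w, x in zip(row, counts)) + b for row, b in zip(self.weight, self.bias)]
        norm = math.sqrt(sum(v * v for v in out)) or 1.0  # guard against an all-zero output
        return [v / norm for v in out]  # continuous, unit-length vector (second output mode)
```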
S103, acquiring the interactive text knowledge vector corresponding to each abnormal text unit in the text unit pool, and determining the knowledge vector difference between the interactive text knowledge vector of the text unit to be analyzed and that of each abnormal text unit.
In this embodiment, the interactive text knowledge vector comprises a highlight annotation knowledge vector and an annotation category. The highlight annotation knowledge vector is expressed in the second output mode, and the feature distance between an abnormal text unit and the text unit to be analyzed can be determined from the highlight annotation knowledge vectors. The abnormal text units in the text unit pool can be collected from the Internet by a text collection thread, and the text unit pool contains the abnormal text units and their corresponding data anomaly labels. The time at which the highlight annotation knowledge vectors of the abnormal text units are computed is not limited. Accordingly, the text unit pool can be understood as a sample database of text units, and an abnormal text unit as a text unit sample.
In some exemplary implementations, S103 proceeds as follows: the highlight annotation knowledge vector corresponding to each abnormal text unit in the text unit pool is acquired and, for each abnormal text unit, a feature operation is performed on the highlight annotation knowledge vector of the text unit to be analyzed and that of the abnormal text unit to obtain a highlight annotation comparison vector, from which the knowledge vector difference is determined. For example, the feature operation may be a subtraction of the two feature vectors, the highlight annotation comparison vector may be the resulting difference vector, and the knowledge vector difference may be a vector distance computed from the highlight annotation comparison vector.
In this embodiment, the query layer computes the knowledge vector difference between the highlight annotation knowledge vector of the text unit to be analyzed and that of each abnormal text unit in the text unit pool. The knowledge vector difference reflects the similarity between an abnormal text unit and the text unit to be analyzed and is negatively correlated with it: the smaller the knowledge vector difference, the higher the similarity.
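The subtraction-based feature operation and the resulting knowledge vector difference can be sketched as follows. The sketch assumes the vector distance is the Euclidean norm of the highlight annotation comparison vector; the method leaves the exact distance open. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def comparison_vector(query: Sequence[float], sample: Sequence[float]) -> List[float]:
    """Feature operation: element-wise subtraction (highlight annotation comparison vector)."""
    return [q - s for q, s in zip(query, sample)]


def knowledge_vector_differences(query: Sequence[float], pool_vectors: List[Sequence[float]]) -> List[float]:
    """Euclidean length of each comparison vector; a smaller value means a more similar unit."""
    return [math.sqrt(sum(d * d for d in comparison_vector(query, v))) for v in pool_vectors]
```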
S104, taking the data anomaly label of the abnormal text unit corresponding to the target knowledge vector difference among the knowledge vector differences as the data anomaly label of the text unit to be analyzed.
In this embodiment, the data anomaly label of a text unit can be determined based on the topic of the text unit and the anomaly event it represents. For example, when the text unit belongs to a set of government and enterprise business texts, multi-level data anomaly label discrimination can be performed, such as three-level label discrimination of the text unit to be analyzed combined with its annotation category, so that the data anomaly label of the text unit to be analyzed may be "user information security-data stealing-phishing". As another example, when the text unit belongs to a set of cross-border e-commerce texts, the corresponding data anomaly label may be "funds flow anomaly-no-op period".
The target knowledge vector difference in the knowledge vector differences may be the smallest knowledge vector difference.
In this embodiment, the interactive text knowledge vector comprises a highlight annotation knowledge vector and an annotation category. The knowledge vector differences are obtained from the highlight annotation knowledge vectors, and the data anomaly label of the abnormal text unit with the minimum knowledge vector difference can be used as the data anomaly label of the text unit to be analyzed. When the minimum knowledge vector difference is greater than the knowledge vector difference threshold, the data anomaly label of the text unit to be analyzed can instead be determined by matching its annotation category against the annotation category of each abnormal text unit.
In some exemplary implementations, S104 may be implemented through S1041 to S1043.
S1041, when the minimum knowledge vector difference is not greater than the knowledge vector difference threshold, taking the data exception label of the exception text unit corresponding to the minimum knowledge vector difference as the data exception label of the text unit to be analyzed.
In the embodiment of the invention, the smaller the knowledge vector difference is, the higher the similarity between the abnormal text unit and the text unit to be analyzed is, when the minimum knowledge vector difference is smaller than or equal to the knowledge vector difference threshold value, the similarity between the abnormal text unit corresponding to the minimum knowledge vector difference and the text unit to be analyzed is higher, the abnormal text unit and the text unit to be analyzed belong to the same data abnormal label, and the data abnormal label to which the abnormal text unit belongs can be used as the data abnormal label of the text unit to be analyzed.
S1042, when the minimum knowledge vector difference is greater than the knowledge vector difference threshold, capturing user operation behaviors in the text unit to be analyzed through the second artificial intelligence network to obtain the user operation behavior information in the text unit to be analyzed, and judging the behavior trend of that information to obtain the annotation category of the text unit to be analyzed.
In the embodiment of the invention, the knowledge mining layer further comprises a behavior capturing node and a behavior analyzing node. The behavior capturing node captures the user operation behaviors in the text unit to be analyzed and determines the user operation behavior information containing them; the behavior analyzing node then analyzes the user operation behavior preferences in that information, and the analyzed preferences are taken as the annotation category of the text unit to be analyzed.
In the embodiment of the invention, when the minimum knowledge vector difference is larger than the knowledge vector difference threshold, the highlight annotation knowledge vectors of the abnormal text units in the text unit pool are only weakly similar to that of the text unit to be analyzed. In this case, the annotation category of the text unit to be analyzed can be mined and used as the reference for determining its data exception label.
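The behavior analyzing node described above turns captured behaviors into a preference. A minimal sketch, assuming the capturing node has already produced a list of behavior names and that "preference" simply means the most frequent behavior; both the input format and the frequency rule are hypothetical simplifications.

```python
from collections import Counter
from typing import Iterable, Optional


def annotation_category_from_behaviors(behaviors: Iterable[str]) -> Optional[str]:
    """Hypothetical behavior analysis: treat the most frequent captured user
    operation behavior as the preference and use it as the annotation category."""
    counts = Counter(behaviors)
    if not counts:
        return None  # nothing captured, so no category can be mined
    category, _ = counts.most_common(1)[0]
    return category
```

A real behavior analyzing node would likely be a learned component; the counter only shows where its output feeds into S1043.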
S1043, pairing the annotation category of the text unit to be analyzed against the annotation category of each abnormal text unit in the text unit pool, taking the target abnormal text unit whose annotation category pairs with that of the text unit to be analyzed as the abnormal text unit corresponding to the minimum knowledge vector difference, and taking the data exception label of the target abnormal text unit as the data exception label of the text unit to be analyzed.
In the embodiment of the invention, the pairing processing can be realized as follows: the content elements of the annotation category of the text unit to be analyzed are paired with the content elements of the annotation category of each abnormal text unit, and either an annotation category that contains the content elements of the text unit to be analyzed's annotation category, or an annotation category identical to it, is obtained as the pairing result. In other words, the data exception label of the abnormal text unit with the paired annotation category is used as the data exception label of the text unit to be analyzed.
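The two pairing criteria (identical category, or a category containing all of the query's content elements) can be sketched as follows. The assumption that content elements are the `-`-separated parts of a category string is purely illustrative; the text does not define how content elements are extracted.

```python
from typing import List, Optional, Tuple


def content_elements(category: str) -> frozenset:
    """Assumption: content elements are the '-'-separated parts of a category."""
    return frozenset(p.strip() for p in category.split("-") if p.strip())


def pair_by_annotation_category(query_category: str,
                                pool: List[Tuple[str, str]]) -> Optional[str]:
    """S1043 sketch. `pool` holds (annotation category, data exception label)
    pairs. A pool category pairs with the query if it is identical to it or
    contains all of the query's content elements; the first match wins."""
    query_elems = content_elements(query_category)
    for category, label in pool:
        if category == query_category:
            return label
        if query_elems and query_elems <= content_elements(category):
            return label
    return None  # no pairing: the caller falls back to the first anomaly tag
```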
In the embodiment of the invention, if the pairing result shows that no annotation category pairs with the annotation category of the text unit to be analyzed, the data exception label of the text unit to be analyzed can be determined as the first anomaly tag. The first anomaly tag, the annotation category and the highlight annotation knowledge vector of the text unit to be analyzed can then be recorded in the text unit pool, so that the text unit and its data exception label are added to the text unit pool.
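The fallback just described, assigning the first anomaly tag and recording the unit so that later queries can match it, might look like this. `FIRST_ANOMALY_TAG` and the record fields are hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional

FIRST_ANOMALY_TAG = "first_anomaly_tag"  # hypothetical placeholder value


@dataclass
class PoolRecord:
    category: str        # annotation category
    vector: List[float]  # highlight annotation knowledge vector
    label: str           # data exception label


@dataclass
class TextUnitPool:
    records: List[PoolRecord] = field(default_factory=list)

    def resolve_or_register(self, category: str, vector: List[float],
                            paired_label: Optional[str]) -> str:
        """If category pairing found a label, use it; otherwise label the unit
        with the first anomaly tag and record it in the pool."""
        if paired_label is not None:
            return paired_label
        self.records.append(PoolRecord(category, list(vector), FIRST_ANOMALY_TAG))
        return FIRST_ANOMALY_TAG
```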
Under some exemplary design ideas, the big data anomaly analysis report comprises the distribution characteristics (position information) of the text window of the text unit to be analyzed within the online business interaction text to be analyzed, the data exception label of the text unit to be analyzed, and the annotation category of the text unit to be analyzed.
Under some exemplary design considerations, the text unit pool may be generated before S102 as follows: performing the text knowledge mining operation on each abnormal text unit through the second artificial intelligence network to obtain the interactive text knowledge vector of each abnormal text unit; performing label discrimination on each abnormal text unit according to its interactive text knowledge vector to obtain the data exception label to which it belongs; and recording, in the text unit pool, each abnormal text unit, its interactive text knowledge vector, and the mapping chain between the abnormal text unit and its data exception label.
In the embodiment of the invention, abnormal text units can be acquired from the Internet through a text acquisition thread; the first artificial intelligence network identifies the whitelist text units among the acquired abnormal text units, the whitelist text units are cleaned out, and the retained abnormal text units are recorded in the text unit pool. When an abnormal text unit is collected, information corresponding to it (such as its data exception label or annotation category) can also be collected from the Internet, which effectively reduces the operation cost.
Under some exemplary design considerations, the text unit pool may also be updated as follows: acquiring a plurality of extended abnormal text units, and performing the text knowledge mining operation on each extended abnormal text unit through the second artificial intelligence network to obtain its interactive text knowledge vector; determining the data exception label of each extended abnormal text unit according to its interactive text knowledge vector, wherein the data exception labels comprise a first anomaly tag and a second anomaly tag; and recording, in the text unit pool, each extended abnormal text unit, its interactive text knowledge vector, and the mapping chain between the extended abnormal text unit and its data exception label.
In the embodiment of the invention, an extended abnormal text unit is a text unit not previously recorded in the text unit pool. Recording extended abnormal text units in the text unit pool broadens the label processing capability of the second artificial intelligence network, so that it can determine more data exception labels.
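Pool construction and pool extension follow the same mine, discriminate, record pattern, so one helper can serve both. In this sketch the pool is a plain dict keyed by unit text (a hypothetical layout), and `mine` and `discriminate` are caller-supplied stand-ins for the second network's mining step and the label discrimination step.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
# Assumed pool layout: unit text -> (interactive text knowledge vector, data exception label)
Pool = Dict[str, Tuple[Vector, str]]


def record_units(pool: Pool, units: Sequence[str],
                 mine: Callable[[str], Vector],
                 discriminate: Callable[[Vector], str]) -> int:
    """Mine a knowledge vector for each unit, discriminate its label, and record
    the unit -> (vector, label) mapping chain. Units already in the pool are
    skipped, so the same call covers initial construction and extension with
    extended abnormal text units. Returns the number of units added."""
    added = 0
    for unit in units:
        if unit in pool:
            continue  # not an extended unit: already recorded
        vector = mine(unit)
        pool[unit] = (vector, discriminate(vector))
        added += 1
    return added
```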
Under some exemplary design considerations, prior to S101, neural network debugging may be performed through S105-S108 to obtain a first artificial intelligence network and a second artificial intelligence network.
S105, performing the text extraction operation on each online service reconstruction text in the online service reconstruction text set through the basic first artificial intelligence network to obtain a text extraction result.
In the embodiment of the present invention, the basic first artificial intelligence network (an untrained, initialized, general-purpose first artificial intelligence network) may be a CNN, and the text extraction result may be obtained in the same way as in S101.
S106, acquiring the window prediction cost and the label discrimination cost of the basic first artificial intelligence network according to the text extraction result, and debugging the basic first artificial intelligence network according to the label discrimination cost and the window prediction cost.
In the embodiment of the invention, the window prediction cost reflects the difference between the text window of the correct text unit and the text window of the text unit predicted by the basic first artificial intelligence network, and the label discrimination cost reflects the difference between the correct annotation and the annotation predicted by the basic first artificial intelligence network.
Under some exemplary design considerations, S106 is implemented as follows: acquiring the prior guidance corresponding to each online service reconstruction text in the online service reconstruction text set, taking the prior guidance as the discrimination expectation, and determining the label discrimination cost and the window prediction cost of the basic first artificial intelligence network according to the discrimination expectation and the text extraction result; then determining improved variables of the basic first artificial intelligence network according to the label discrimination cost and the window prediction cost, and replacing the corresponding variables in the basic first artificial intelligence network with the improved variables to obtain the debugged first artificial intelligence network.
In the embodiment of the present invention, the prior guidance (which those skilled in the art will understand as labeling data) corresponding to each online service reconstruction text includes the abnormal text units in the online service reconstruction text and the distribution characteristics of each abnormal text unit. The basic first artificial intelligence network may be a CNN; its backbone network, vector aggregation sub-network and text window analysis sub-network can be feedback-debugged according to the label discrimination cost and the window prediction cost to obtain improved (updated) variables that take into account both the text extraction results of the network being debugged and the online service reconstruction texts. The corresponding variables in the basic first artificial intelligence network are then replaced with the improved variables to obtain the debugged first artificial intelligence network.
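The two costs of S106 can be illustrated on a single prediction. The text only says that the costs measure differences; the concrete forms here are assumptions: 1 minus the intersection-over-union of 1-D character windows for the window prediction cost, and cross-entropy for the label discrimination cost. The `window_weight` parameter is hypothetical.

```python
import math
from typing import Sequence, Tuple

Window = Tuple[int, int]  # (start, end) character offsets, end exclusive (assumed)


def window_prediction_cost(pred: Window, truth: Window) -> float:
    """Assumed window cost: 1 - IoU of two 1-D text windows."""
    inter = max(0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - inter
    return 1.0 - (inter / union if union > 0 else 0.0)


def label_discrimination_cost(probs: Sequence[float], true_index: int) -> float:
    """Assumed label cost: cross-entropy of the predicted class distribution."""
    return -math.log(max(probs[true_index], 1e-12))


def total_cost(pred: Window, truth: Window, probs: Sequence[float],
               true_index: int, window_weight: float = 1.0) -> float:
    """Combined cost that feedback debugging (back-propagation) would minimise."""
    return (label_discrimination_cost(probs, true_index)
            + window_weight * window_prediction_cost(pred, truth))
```

In a real training loop these would be computed on framework tensors so gradients can flow; plain floats are used here only to show the arithmetic.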
S107, performing the text knowledge mining operation on each abnormal text unit in the abnormal text unit set through the basic second artificial intelligence network to obtain the highlight annotation knowledge vector of each abnormal text unit.
In an embodiment of the present invention, the basic second artificial intelligence network may include a backbone network and a feature projection layer. The highlight annotation knowledge vector may be extracted by the basic second artificial intelligence network in the same way as in S102.
S108, acquiring the iteration cost of the basic second artificial intelligence network according to each highlight annotation knowledge vector, and debugging the basic second artificial intelligence network according to the iteration cost.
Under some exemplary design considerations, acquiring the iteration cost and debugging the second artificial intelligence network according to it may be achieved through S1081 to S1084.
S1081, determining, according to the highlight annotation knowledge vector of each abnormal text unit, a first commonality measure between abnormal text units with the same data exception label and a second commonality measure between abnormal text units with different data exception labels.
In the embodiment of the invention, the first commonality measure is the similarity between text units under the same data exception label, and the second commonality measure is the similarity between text units under different data exception labels; both are computed from the highlight annotation knowledge vectors.
S1082, determining an eccentricity coefficient corresponding to each first commonality measure and an eccentricity coefficient corresponding to each second commonality measure.
In the embodiment of the invention, the eccentricity coefficients (weights) can be obtained as follows: acquiring a first reference commonality measure among the first commonality measures, and taking the comparison result between the first reference commonality measure and each first commonality measure as the eccentricity coefficient of that first commonality measure; similarly, acquiring a second reference commonality measure among the second commonality measures, and taking the comparison result between the second reference commonality measure and each second commonality measure as the eccentricity coefficient of that second commonality measure. The first reference commonality measure may be the center similarity of the first commonality measures, and the second reference commonality measure may be the center similarity of the second commonality measures.
S1083, determining the iteration cost of the basic second artificial intelligent network according to each second commonality measure, the eccentricity coefficient corresponding to each second commonality measure, each first commonality measure and the eccentricity coefficient corresponding to each first commonality measure.
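S1081 to S1083 can be sketched end to end. Several forms are assumptions because the text leaves them open: cosine similarity as the commonality measure, the median as the "center similarity", `1 + |measure - reference|` as the comparison result, and "weighted cross-label similarity minus weighted same-label similarity" as the way the weighted measures combine into the iteration cost.

```python
import math
import statistics
from itertools import combinations
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def commonality_measures(vectors: Sequence[Sequence[float]],
                         labels: Sequence[str]) -> Tuple[List[float], List[float]]:
    """S1081: pairwise similarities split into same-label (first) and
    different-label (second) commonality measures."""
    first: List[float] = []
    second: List[float] = []
    for i, j in combinations(range(len(vectors)), 2):
        sim = cosine(vectors[i], vectors[j])
        (first if labels[i] == labels[j] else second).append(sim)
    return first, second


def eccentricity(measures: List[float]) -> List[float]:
    """S1082 (assumed form): reference = median ("center") similarity; each
    coefficient is 1 + |measure - reference|, so outlying pairs weigh more."""
    ref = statistics.median(measures)
    return [1.0 + abs(m - ref) for m in measures]


def weighted_mean(values: List[float], weights: List[float]) -> float:
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def iteration_cost(vectors: Sequence[Sequence[float]], labels: Sequence[str]) -> float:
    """S1083 (assumed form): weighted different-label similarity minus weighted
    same-label similarity; better-separated vectors give a lower cost."""
    first, second = commonality_measures(vectors, labels)
    cost = 0.0
    if second:
        cost += weighted_mean(second, eccentricity(second))
    if first:
        cost -= weighted_mean(first, eccentricity(first))
    return cost
```

For example, vectors where same-label units coincide and different labels are orthogonal reach the lowest cost of -1.0 under this form.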
In the embodiment of the invention, the second artificial intelligence network can be debugged over repeated iterations, improving its label processing performance on text units so that it mines the highlight annotation knowledge vectors in text units as accurately and reliably as possible.
Under some exemplary design considerations, the text sets used for debugging the neural networks (including the abnormal text unit set and the online service reconstruction text set) may be obtained through S109 to S111.
S109, acquiring a plurality of candidate abnormal text units, and performing text unit generalization on them to obtain the abnormal text unit set.
Under some exemplary design considerations, text unit generalization may be performed as follows: pre-assigning a label to each abnormal text unit according to its annotation category to obtain a preliminary data exception label for each abnormal text unit; and performing the text knowledge mining operation on each abnormal text unit, determining redundant abnormal text units according to the detail pairing result between every two abnormal text units, merging (inducing) the preliminary data exception labels of the redundant abnormal text units, and integrating the abnormal text units obtained after this induction into the abnormal text unit set.
In the embodiment of the invention, the name of the data exception label of a text unit can be assigned from its annotation category, so that different data exception labels can be distinguished. However, the candidate abnormal text units may contain many redundant text units whose annotation categories differ somewhat. For example, two government-enterprise business text sets may be identical, yet one is annotated with a numeric category and the other with a letter category; because of the differing annotation categories they are assigned to different data exception labels, which duplicates information across labels, so the redundant data exception labels need to be generalized and de-duplicated. Concretely, when the text units are government-enterprise business text sets and hundreds of abnormal text units have been acquired, the annotation category of each abnormal text unit is determined and each abnormal text unit is provisionally assigned its own data exception label according to that category, so that abnormal text units and data exception labels correspond one to one. The text knowledge mining operation is then performed on each abnormal text unit; when the detail pairing result (similarity) between two abnormal text units falls within 0.9 to 1, the two units are moved under the same data exception label and their preliminary data exception labels are merged. The abnormal text units obtained after this generalization are integrated into the abnormal text unit set.
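The merge rule above (similarity of 0.9 or more means the preliminary labels collapse into one) is naturally expressed with a union-find over label names, which also makes merges transitive. Cosine similarity as the "detail pairing result" and keeping the lexicographically smaller name as the merged label are assumptions of this sketch.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def generalize_labels(vectors: Sequence[Sequence[float]], preliminary: Sequence[str],
                      threshold: float = 0.9) -> List[str]:
    """Merge the preliminary labels of abnormal text units whose detail pairing
    result (assumed: cosine similarity) is at least `threshold`. Each unit gets
    the representative label of its merged group."""
    parent: Dict[str, str] = {lab: lab for lab in preliminary}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(vectors)), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            ri, rj = find(preliminary[i]), find(preliminary[j])
            if ri != rj:
                # Keep the lexicographically smaller name as the merged label.
                parent[max(ri, rj)] = min(ri, rj)
    return [find(lab) for lab in preliminary]
```

The all-pairs loop is quadratic, which is acceptable for the "hundreds of abnormal text units" scale mentioned above.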
S110, acquiring a plurality of candidate example session texts, cleaning out the example session texts that do not meet the setting requirement, and integrating the retained example session texts into an example session text set.
In the embodiment of the invention, the setting requirement is that the example session text does not contain abnormal text units.
Under some exemplary design considerations, because an abnormal text unit already contained in an example session text may negatively affect neural network debugging, the candidate example session texts that do not meet the setting requirement may be cleaned as follows: performing the text extraction operation on each example session text through the original first artificial intelligence network to obtain target example session texts and their corresponding credibility coefficients, and cleaning out the target example session texts whose credibility coefficient reaches the credibility coefficient threshold.
In the embodiment of the invention, the credibility coefficient is the confidence weight of the extraction result of the text extraction operation, and the extraction result covers two types of texts: example session texts that do not contain abnormal text units, and target example session texts that do contain abnormal text units. The credibility coefficient threshold may be 0.8: when the credibility coefficient of a target example session text is greater than or equal to 0.8, the target example session text is taken to contain abnormal text units and is washed out, and the retained example session texts are integrated into the example session text set.
S111, generating an online service reconstruction text set according to the abnormal text unit set and the example session text set.
In the embodiment of the invention, each online service reconstruction text in the online service reconstruction text set comprises one example session text from the example session text set and at least one abnormal text unit from the abnormal text unit set. Those skilled in the art will recognize that terms such as "example" are to be understood as training examples; for instance, the example session text set can be understood as a set of session text examples used for training.
Under some exemplary design considerations, S111 may be implemented by the following S1111 to S1113.
In the embodiment of the present invention, S1111 to S1113 are performed on each example session text in the example session text set to obtain the online service reconstruction texts.
S1111, performing text content disassembly on the example session text to obtain the target session texts in the example session text.
In the embodiment of the invention, the portions other than the substantive text content can be used as target session texts, into which abnormal text units can be added.
S1112, for each target session text of the example session text, performing the following operations: acquiring at least one abnormal text unit from the abnormal text unit set, and performing text reconstruction processing on the at least one abnormal text unit and the example session text according to the target session text to obtain an online service reconstruction text.
In the embodiment of the invention, at least one abnormal text unit is acquired arbitrarily from the abnormal text unit set. Assuming that the online business interaction text to be analyzed is a user portrait text, the at least one reconstructed distribution feature in the example session text at which abnormal text units are added may be determined according to the distribution features of the text units in user portrait texts, and the at least one abnormal text unit is then added to the example session text.
Under some exemplary design ideas, one example session text and a plurality of abnormal text units are taken as a sample set, so that sufficient online service reconstruction texts can be obtained as tuning samples.
Under some exemplary design considerations, S1112 may be implemented as follows: resizing (text sizing) the at least one abnormal text unit according to the size of the example session text; selecting resized abnormal text units with a set probability and performing weight assignment on them; and determining at least one reconstructed distribution feature in the target session text according to the set distribution hit variables, and adding the at least one abnormal text unit to the example session text at each reconstructed distribution feature to obtain the online service reconstruction text corresponding to the example session text. The weight assignment is used to adjust the saliency of the abnormal text units.
In the embodiment of the invention, the feature distance between each reconstructed distribution feature and the reference distribution feature of the example session text has a set quantization relationship with the set distribution hit variable of that reconstructed distribution feature: the closer a reconstructed distribution feature lies to the middle of the example session text, the higher its set distribution hit variable, so the hit variable correlates positively with proximity to the middle. Because there is at least one abnormal text unit, there is correspondingly at least one reconstructed distribution feature.
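The S1112 reconstruction (resize, probabilistic weight assignment, then position sampling whose hit variable rises toward the middle of the session text) can be sketched on plain strings. All parameters are hypothetical: `max_ratio` caps unit length, `emphasis_prob` is the set probability, candidate positions are assumed to be spaces, and the hit variable is taken as `1 / (1 + distance to the middle)`. The returned `(start, end, saliency)` annotations illustrate how prior guidance could be produced alongside the text.

```python
import random
from typing import List, Tuple


def reconstruct(session: str, units: List[str], rng: random.Random,
                max_ratio: float = 0.3, emphasis_prob: float = 0.5
                ) -> Tuple[str, List[Tuple[int, int, float]]]:
    """Insert abnormal text units into one example session text and return the
    reconstruction text with (start, end, saliency) annotations."""
    # Resize: cap each unit's length relative to the session length.
    cap = max(1, int(len(session) * max_ratio))
    units = [u[:cap] for u in units]
    # Weight assignment: with probability `emphasis_prob`, mark a unit as salient.
    saliency = [1.0 if rng.random() < emphasis_prob else 0.5 for _ in units]
    # Candidate positions are word boundaries; the hit weight rises toward the middle.
    candidates = [i for i, ch in enumerate(session) if ch == " "] or [len(session)]
    mid = len(session) / 2
    hit = [1.0 / (1.0 + abs(p - mid)) for p in candidates]
    positions = sorted(rng.choices(candidates, weights=hit, k=len(units)))
    # Insert left to right, shifting annotation offsets by what was already added.
    pieces: List[str] = []
    annotations: List[Tuple[int, int, float]] = []
    prev = shift = 0
    for pos, unit, sal in zip(positions, units, saliency):
        pieces.append(session[prev:pos])
        start = pos + shift + 1  # unit begins after the inserted space
        pieces.append(" " + unit)
        annotations.append((start, start + len(unit), sal))
        shift += len(unit) + 1
        prev = pos
    pieces.append(session[prev:])
    return "".join(pieces), annotations
```

Seeding the `random.Random` instance makes a given reconstruction reproducible, which helps when the annotations are reused as prior guidance.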
According to the embodiment of the invention, weight assignment allows the abnormal text units to be blended into the example session text more naturally, so that the online service reconstruction texts are as close as possible to actual online service interaction texts to be analyzed. This improves the debugging efficiency of the first artificial intelligence network and enables it to acquire the text units to be analyzed in the online service interaction text to be analyzed more accurately.
S1113, integrating the online service reconstruction text obtained according to each example session text into an online service reconstruction text set.
In embodiments of the present invention, each example session text may correspond to a plurality of online service reconstruction texts, while each online service reconstruction text corresponds to one example session text. Each online service reconstruction text can be annotated according to its abnormal text units and example session text; its prior guidance comprises the distribution characteristics, number and scale of the text units in it, the data exception labels of those text units, and the corresponding example session text. In this way sufficient online service reconstruction texts are obtained as tuning samples.
By applying the embodiment of the invention, the distribution characteristics of the text units in the online business interaction text to be analyzed are obtained through the first artificial intelligence network, and the data exception labels of the text units are determined through the second artificial intelligence network. Dividing the task of determining data exception labels between the two networks lets them process it cooperatively and balances the operation cost. Determining the data exception label of the text unit to be analyzed from the knowledge vector differences between it and the abnormal text units in the text unit pool improves the accuracy of data exception label matching, and using the abnormal text units in the text unit pool as samples lets the second artificial intelligence network handle a wider range of data exception label matching requirements, improving the reliability of the data exception label analysis of the text unit to be analyzed.
In the embodiment of the present invention, the basic second artificial intelligence network may be debugged as follows: performing feedback debugging on the basic second artificial intelligence network according to the iteration cost, determining its gradient information, and updating the corresponding variables in the basic second artificial intelligence network according to the gradient information to obtain the debugged second artificial intelligence network.
In combination with the foregoing, in some independent embodiments, after the data exception label of the abnormal text unit corresponding to the target knowledge vector difference is taken as the data exception label of the text unit to be analyzed, the method further includes: generating an abnormal risk protection guide text corresponding to the text unit to be analyzed based on the data exception label.
In combination with the foregoing, in some independent embodiments, generating the abnormal risk protection guide text corresponding to the text unit to be analyzed based on the data exception label includes S201 to S204.
S201, acquiring a first risk tendency text vector and a first abnormal feedback semantic vector of the text unit to be analyzed, and acquiring a second risk tendency text vector and a second abnormal feedback semantic vector of a reference text unit, where the data exception label of the reference text unit is the same as the data exception label of the text unit to be analyzed.
The risk tendency text vector reflects the tendency characteristics of the data anomaly risks of the corresponding text unit, and the abnormal feedback semantic vector reflects the viewpoints on those anomaly risks expressed in the user feedback information of the corresponding text unit.
S202, determining whether the risk triggering conditions of the text unit to be analyzed and the reference text unit match according to the first risk tendency text vector and the second risk tendency text vector, and determining whether the business scenario of the text unit to be analyzed and that of the reference text unit match according to the first abnormal feedback semantic vector and the second abnormal feedback semantic vector.
The risk triggering condition characterizes a risk hit feature or a risk activation feature corresponding to the text unit.
S203, if the risk triggering condition of the text unit to be analyzed is matched with the risk triggering condition of the reference text unit and the service scene of the text unit to be analyzed is matched with the service scene of the reference text unit, judging that the text unit to be analyzed is matched with the reference text unit.
S204, determining an abnormal risk protection guide text of the text unit to be analyzed by using the abnormal risk protection reference text of the reference text unit.
It can be understood that a first mapping relation between the reference text unit and its abnormal risk protection reference text, and a second mapping relation between the reference text unit and the text unit to be analyzed, can be established; based on these two mapping relations, the abnormal risk protection reference text is migrated to the abnormal risk protection guide text, so that the guide text of the text unit to be analyzed can be determined quickly and accurately. For example, during migration some detailed guidance items may be adapted and optimized for the text unit to be analyzed.
In combination with the foregoing, in some independent embodiments, determining in S202 whether the risk triggering condition of the text unit to be analyzed matches that of the reference text unit according to the first and second risk tendency text vectors includes: determining a first matching score between the first risk tendency text vector and the second risk tendency text vector; and if the first matching score is larger than a first set score, determining that the risk triggering condition of the text unit to be analyzed matches that of the reference text unit.
In combination with the foregoing, in some independent embodiments, determining in S202 whether the business scenario of the text unit to be analyzed matches that of the reference text unit according to the first and second abnormal feedback semantic vectors includes: determining a second matching score between the first abnormal feedback semantic vector and the second abnormal feedback semantic vector; and if the second matching score is larger than a second set score, determining that the business scenario of the text unit to be analyzed matches that of the reference text unit.
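S202 to S204 can be sketched as two threshold tests followed by a lookup of the matched reference unit's protection reference text. Cosine similarity as the matching score, the 0.8 set scores, and the dictionary layout of reference units are all assumptions; the candidate references are assumed to be pre-filtered to the same data exception label, as S201 requires.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed matching score: cosine similarity (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class UnitVectors:
    risk_tendency: List[float]  # risk tendency text vector
    feedback: List[float]       # abnormal feedback semantic vector


def units_match(query: UnitVectors, ref: UnitVectors,
                first_set_score: float = 0.8, second_set_score: float = 0.8) -> bool:
    """S202/S203: both the risk triggering condition and the business scenario must match."""
    trigger_ok = cosine(query.risk_tendency, ref.risk_tendency) > first_set_score
    scene_ok = cosine(query.feedback, ref.feedback) > second_set_score
    return trigger_ok and scene_ok


def migrate_guidance(query: UnitVectors, references: Dict[str, UnitVectors],
                     reference_texts: Dict[str, str]) -> Optional[str]:
    """S204 sketch: return the protection reference text of the first matching
    reference unit (references assumed to share the query's data exception label)."""
    for ref_id, ref in references.items():
        if units_match(query, ref):
            return reference_texts[ref_id]
    return None
```

The returned reference text would then be adapted for the text unit to be analyzed, as the migration description above notes.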
The foregoing is only a specific embodiment of the present invention. Variations and alternatives will occur to those skilled in the art based on the detailed description provided herein and are intended to be included within the scope of the invention.