CN112541360A

CN112541360A - Cross-platform anomaly identification and translation method, device, processor and storage medium using hyper-parameter self-adaptive DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering

Info

Publication number: CN112541360A
Application number: CN202011417622.6A
Authority: CN
Inventors: 俞枫; 黄韦; 周素珍; 詹婷婷; 方优
Original assignee: Guotai Junan Securities Co Ltd
Current assignee: Guotai Junan Securities Co Ltd
Priority date: 2020-12-07
Filing date: 2020-12-07
Publication date: 2021-03-23

Abstract

The invention relates to a method for cross-platform anomaly recognition and translation using a hyper-parameter self-adaptive DBSCAN clustering algorithm. The method comprises the following steps. When an operation reports an error, the intelligent service platform is connected. Text analysis is performed on the error message to judge whether a relatively close match exists. If so, the result is displayed to guide the client to operate correctly, and the method judges whether the client is satisfied: if so, it ends; otherwise, the client is reassured, questions are answered and feedback is collected. The error content is then recorded. If no relatively close match exists, a clustering algorithm model is established, the error messages are clustered, and the different error clusters are converted into back-end operation knowledge points. The invention also relates to a corresponding device, processor and computer readable storage medium. With this technical solution, text clustering and semantic analysis techniques from natural language processing provide an integrated process of automatic error clustering, information recognition and guided translation, helping users obtain a better experience in real retail business processes such as trading and business handling.

Description

Cross-platform anomaly identification and translation method, device, processor and storage medium using hyper-parameter self-adaptive DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering

Technical Field

The invention relates to the field of natural language processing, in particular to text clustering and semantic parsing, and more specifically to a method, device, processor and computer readable storage medium for cross-platform anomaly recognition and translation using a hyper-parameter self-adaptive DBSCAN clustering algorithm.

Background

Natural language processing is developing rapidly both in China and abroad, and the internet economy creates an urgent need for intelligent services, which gives their development strong market momentum. Text clustering, the automatic grouping of a document set, is one of the techniques of natural language processing. Many text clustering algorithms have been studied in China and abroad; they are generally classified as partition-based, hierarchical, density-based, grid-based and model-based, and different algorithms suit different application scenarios. Although existing technologies and applications are numerous, little work actually applies them to clustering error texts, combines them with intelligent customer service, and closes the loop of terminal-oriented, intelligent cross-platform anomaly recognition and translation, for the following reasons:

1. At present, artificial intelligence in the stock market is applied mainly in two forms, robo-advisory and quantitative investment, whereas translating pop-up errors embeds artificial intelligence into a specific business operation process, which is a niche area.

2. Clustering algorithms have a limitation: the quality of most algorithms depends heavily on knowing the number of text categories in advance, and the terminal cannot predict how many categories its daily pop-up error messages will fall into.

In the era of big data, as the securities trading market matures, trading products are innovated and feature applications keep emerging, the systems behind these applications (counter, market quotation, bus and other systems) become huge and complex to build, and later operation and maintenance produce massive data logs, among which business error reports (exceptions) are especially important. At present, most error exceptions simply display the back-end system's error message in a pop-up window. Because the back-end platforms are heterogeneous and their mapping to front-end applications is hard to unify, the pop-up message usually contains only system error codes and text written by a particular developer, which is too technical and obscure for clients to understand. Attempting to translate these messages into friendly guidance that clients can understand reveals two main problems:

1. error messages fall into complex categories and are so numerous that they are difficult to enumerate;

2. even errors of the same type cannot be processed uniformly, because each message embeds user-specific information or non-critical characters.

Therefore, the main purpose of the invention is to use text clustering and semantic parsing techniques from natural language processing to integrate automatic error clustering, information recognition and guided translation, helping users obtain a better experience in actual retail business processes such as trading and business handling.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing a method, device, processor and computer readable storage medium for cross-platform anomaly recognition and translation using a hyper-parameter self-adaptive DBSCAN clustering algorithm, which offers high accuracy, easy-to-understand output and a wide range of application.

To achieve the above object, the method, device, processor and computer readable storage medium of the present invention for cross-platform anomaly recognition and translation using a hyper-parameter self-adaptive DBSCAN clustering algorithm are as follows:

The method for cross-platform anomaly recognition and translation using the hyper-parameter self-adaptive DBSCAN clustering algorithm is mainly characterized in that it comprises multi-platform interactive guidance steps, specifically the following procedure:

(1) when an operation reports an error, connect to the intelligent service platform;

(2) perform text analysis on the error message and judge whether a relatively close match exists; if so, continue with step (3); otherwise, continue with step (5);

(3) display the result to guide the client to operate correctly;

(4) judge whether the client is satisfied; if so, end; otherwise, reassure the client, answer questions and collect feedback, and continue with step (5);

(5) record the error content;

(6) establish a clustering algorithm model, cluster the error messages, and convert the different error clusters into back-end operation knowledge points.
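Steps (1) to (5) can be illustrated with a minimal Python sketch. It is not the invention's matcher: the knowledge base contents, the 0.6 threshold and the use of `difflib.SequenceMatcher` character similarity as a stand-in for the platform's text analysis are all assumptions made for illustration, as are the names `KNOWLEDGE_BASE`, `best_match`, `handle_error` and `error_log`.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Tuple

# Hypothetical knowledge base: normalized error text -> client-friendly guidance.
KNOWLEDGE_BASE = {
    "insufficient funds available for this order":
        "Your available balance is too low. Reduce the quantity or deposit funds first.",
    "order price exceeds the daily price limit":
        "The price is outside today's limit range. Enter a price within the limits.",
}
MATCH_THRESHOLD = 0.6          # assumed cut-off for a "relatively close" match
error_log: List[str] = []      # step (5): errors kept for later clustering in step (6)


def best_match(error_text: str) -> Tuple[Optional[str], float]:
    """Step (2) stand-in: find the most similar known error by character similarity."""
    best_key, best_score = None, 0.0
    for key in KNOWLEDGE_BASE:
        score = SequenceMatcher(None, error_text.lower(), key).ratio()
        if score > best_score:
            best_key, best_score = key, score
    return best_key, best_score


def handle_error(error_text: str,
                 client_satisfied: Callable[[str], bool] = lambda guidance: True) -> Optional[str]:
    """Steps (1)-(5): return guidance for the pop-up, or None when nothing matched."""
    key, score = best_match(error_text)
    if key is None or score < MATCH_THRESHOLD:
        error_log.append(error_text)          # no close match: go straight to step (5)
        return None
    guidance = KNOWLEDGE_BASE[key]            # step (3): show the guidance
    if not client_satisfied(guidance):        # step (4): client is not satisfied
        error_log.append(error_text)          # step (5): record it for clustering
    return guidance
```

Everything that ends up in `error_log` is the input that step (6) would later cluster.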

Preferably, step (6) specifically comprises the following steps:

(6.1) establish a clustering algorithm model;

(6.2) periodically cluster the collected error messages by category and display the results;

(6.3) judge whether the error cluster can be resolved; if so, continue with step (6.4); otherwise, have the responsible contact person assist, then continue with step (6.4);

(6.4) fill in a solution, and convert the different error clusters into back-end operation knowledge points of the intelligent service platform.

Preferably, step (6.4) specifically comprises the following steps:

(6.4.1) fill in a solution;

(6.4.2) check the knowledge points for duplicates: if a duplicate exists, perform semantic tuning; otherwise, create a new knowledge point and add backtracking of its effect.
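The duplicate check of step (6.4.2) might look like the following Python sketch. The patent does not say how duplicates are detected or what "semantic tuning" does, so both are assumptions here: similarity is approximated with `difflib.SequenceMatcher`, the 0.8 threshold is a placeholder, and tuning is modelled as recording the new phrasing as a variant of the existing knowledge point. `KnowledgePoint` and `add_solution` are hypothetical names.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import List

DUPLICATE_THRESHOLD = 0.8  # assumed cut-off; would need tuning on real data


@dataclass
class KnowledgePoint:
    question: str                                       # representative error text of a cluster
    answer: str                                         # solution filled in at step (6.4.1)
    variants: List[str] = field(default_factory=list)  # extra phrasings added by tuning


def add_solution(kb: List[KnowledgePoint], question: str, answer: str) -> str:
    """Step (6.4.2): return "tuned" if a duplicate was found, otherwise "created"."""
    for kp in kb:
        phrasings = [kp.question] + kp.variants
        similarity = max(SequenceMatcher(None, p, question).ratio() for p in phrasings)
        if similarity >= DUPLICATE_THRESHOLD:
            # Duplicate: "semantic tuning" approximated by remembering the new phrasing.
            if question not in phrasings:
                kp.variants.append(question)
            return "tuned"
    kb.append(KnowledgePoint(question, answer))         # no duplicate: new knowledge point
    return "created"
```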

Preferably, the method further comprises a clustering algorithm step, specifically the following procedure:

(1-1) acquire the data set currently to be clustered;

(1-2) delete the historical model;

(1-3) perform word segmentation to generate training data;

(1-4) train a doc2vec sentence vector model;

(1-5) convert the data set into vectors with the current model;

(1-6) traverse the hyper-parameters, using the silhouette coefficient as the tuning criterion, to obtain the optimal clustering.
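Step (1-3) turns raw error messages into token lists for training. The sketch below is only a stdlib stand-in and assumes a preprocessing policy the patent does not specify: numbers (amounts, account numbers, codes) are masked as `<NUM>` so that messages of the same type collapse together, Latin words are kept whole, and runs of Chinese characters are split into overlapping bigrams in place of a real segmenter such as jieba. The function name `to_training_tokens` is hypothetical.

```python
import re
from typing import List

NUM_RE = re.compile(r"\d+(?:\.\d+)?")                      # amounts, IDs, error codes
TOKEN_RE = re.compile(r"<NUM>|[a-z]+|[\u4e00-\u9fff]+")   # placeholder, Latin words, CJK runs


def to_training_tokens(message: str) -> List[str]:
    """Step (1-3) stand-in: mask variable numbers, then split the text into tokens."""
    masked = NUM_RE.sub("<NUM>", message.lower())
    tokens: List[str] = []
    for piece in TOKEN_RE.findall(masked):
        if "\u4e00" <= piece[0] <= "\u9fff" and len(piece) > 1:
            # Crude substitute for Chinese word segmentation: character bigrams.
            tokens.extend(piece[i:i + 2] for i in range(len(piece) - 1))
        else:
            tokens.append(piece)
    return tokens
```

With gensim, each token list would typically be wrapped as `TaggedDocument(words=tokens, tags=[i])` before training the doc2vec model of step (1-4).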

Preferably, the silhouette coefficient in step (1-6) is specifically:

the silhouette coefficient is calculated according to the following formula:

$$s(i)=\frac{b(i)-a(i)}{\max\{a(i),\,b(i)\}}$$

where a(i) is the average dissimilarity of vector i to the other points in the same cluster, and b(i) is the minimum, taken over the other clusters, of the average dissimilarity of vector i to the points of that cluster.
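As a purely illustrative worked example (the distances are made-up numbers, not data from the invention), take the standard silhouette definition s(i) = (b(i) - a(i)) / max{a(i), b(i)}. A point that sits close to its own cluster and far from the nearest other cluster scores near 1, a misassigned point scores below 0, and s(i) always lies in [-1, 1]:

$$a(i)=0.2,\ b(i)=0.8:\quad s(i)=\frac{0.8-0.2}{\max\{0.2,\,0.8\}}=0.75$$

$$a(i)=0.8,\ b(i)=0.2:\quad s(i)=\frac{0.2-0.8}{\max\{0.8,\,0.2\}}=-0.75$$

Averaging s(i) over all points gives the score that step (1-6) maximizes.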

The device for cross-platform anomaly recognition and translation using the hyper-parameter self-adaptive DBSCAN clustering algorithm is mainly characterized in that it comprises:

a processor configured to execute computer-executable instructions; and

a memory storing one or more computer-executable instructions which, when executed by the processor, implement the steps of the above method for cross-platform anomaly recognition and translation using the hyper-parameter self-adaptive DBSCAN clustering algorithm.

The processor for cross-platform anomaly recognition and translation using the hyper-parameter self-adaptive DBSCAN clustering algorithm is mainly characterized in that it is configured to execute computer-executable instructions which, when executed by the processor, implement the steps of the above method for cross-platform anomaly recognition and translation using the hyper-parameter self-adaptive DBSCAN clustering algorithm.

The computer readable storage medium is mainly characterized in that a computer program is stored thereon, the computer program being executable by a processor to implement the steps of the above method for cross-platform anomaly recognition and translation using the hyper-parameter self-adaptive DBSCAN clustering algorithm.

With the method, device, processor and computer readable storage medium of the present invention for cross-platform anomaly recognition and translation using the hyper-parameter self-adaptive DBSCAN clustering algorithm, the silhouette coefficients are very close when there is a large amount of clustered data, and the hyper-parameter self-adaptive DBSCAN clustering model meets the requirements of error clustering well. Through text clustering and semantic analysis techniques from natural language processing, the invention realizes an integrated process of automatic error clustering, information recognition and guided translation, helping users obtain a better experience in actual retail business processes such as trading and business handling.

Drawings

Fig. 1 is a schematic processing flow diagram of the error translation system implementing the method of the present invention for cross-platform anomaly recognition and translation using a hyper-parameter self-adaptive DBSCAN clustering algorithm.

Fig. 2 is a schematic flow diagram of the clustering algorithm of the method of the present invention for cross-platform anomaly recognition and translation using a hyper-parameter self-adaptive DBSCAN clustering algorithm.

Fig. 3 is a comparison diagram showing how the method of the present invention for cross-platform anomaly recognition and translation using a hyper-parameter self-adaptive DBSCAN clustering algorithm automatically obtains the optimal parameters.

Fig. 4 is a diagram of the clustering experiment results of the method of the present invention for cross-platform anomaly recognition and translation using a hyper-parameter self-adaptive DBSCAN clustering algorithm.

Fig. 5 is a schematic page view of the error translation system implementing the method of the present invention for cross-platform anomaly recognition and translation using a hyper-parameter self-adaptive DBSCAN clustering algorithm.

Detailed Description

In order to more clearly describe the technical contents of the present invention, the following further description is given in conjunction with specific embodiments.

The method of the invention for cross-platform anomaly recognition and translation using a hyper-parameter self-adaptive DBSCAN clustering algorithm comprises the following steps:

(1) when an operation reports an error, connect to the intelligent service platform;

(2) perform text analysis on the error message and judge whether a relatively close match exists; if so, continue with step (3); otherwise, continue with step (5);

(3) display the result to guide the client to operate correctly;

(4) judge whether the client is satisfied; if so, end; otherwise, reassure the client, answer questions and collect feedback, and continue with step (5);

(5) record the error content;

(6) establish a clustering algorithm model, cluster the error messages, and convert the different error clusters into back-end operation knowledge points;

(6.1) establish a clustering algorithm model;

(6.2) periodically cluster the collected error messages by category and display the results;

(6.3) judge whether the error cluster can be resolved; if so, continue with step (6.4); otherwise, have the responsible contact person assist, then continue with step (6.4);

(6.4) fill in a solution, and convert the different error clusters into back-end operation knowledge points of the intelligent service platform;

(6.4.1) fill in a solution;

(6.4.2) check the knowledge points for duplicates: if a duplicate exists, perform semantic tuning; otherwise, create a new knowledge point and add backtracking of its effect.

As a preferred embodiment of the present invention, the method further includes a clustering algorithm step, specifically the following procedure:

(1-1) acquire the data set currently to be clustered;

(1-2) delete the historical model;

(1-3) perform word segmentation to generate training data;

(1-4) train a doc2vec sentence vector model;

(1-5) convert the data set into vectors with the current model;

(1-6) traverse the hyper-parameters, using the silhouette coefficient as the tuning criterion, to obtain the optimal clustering.

As a preferred embodiment of the present invention, the silhouette coefficient in step (1-6) is specifically:

the silhouette coefficient is calculated according to the following formula, with a(i) and b(i) defined as above:

$$s(i)=\frac{b(i)-a(i)}{\max\{a(i),\,b(i)\}}$$

The device of the invention for cross-platform anomaly recognition and translation using the hyper-parameter self-adaptive DBSCAN clustering algorithm comprises:

a processor configured to execute computer-executable instructions; and

a memory storing one or more computer-executable instructions which, when executed by the processor, implement the steps of the above method for cross-platform anomaly recognition and translation using the hyper-parameter self-adaptive DBSCAN clustering algorithm.

The processor of the invention for cross-platform anomaly recognition and translation using the hyper-parameter self-adaptive DBSCAN clustering algorithm is configured to execute computer-executable instructions which, when executed by the processor, implement the steps of the above method.

The computer readable storage medium has stored thereon a computer program executable by a processor to perform the steps of the above method for cross-platform anomaly recognition and translation using a hyper-parameter self-adaptive DBSCAN clustering algorithm.

In a specific implementation of the present invention, the technical problem to be solved is to provide a terminal-oriented, cross-platform error translation system that recognizes abnormal information arising in business operations on the terminal and translates it into client-friendly wording and follow-up business guidance, helping clients, especially inexperienced new clients, carry out business operations such as browsing market quotations and trading stocks smoothly.

The error translation system consists of two main parts: a multi-platform interactive guidance process and a clustering algorithm process.

I. Multi-platform interactive guidance process

In use, the error translation system involves several business parties, such as clients, product managers, operators and customer service staff; in the back end it involves several systems, such as mobile phone and computer terminals, the intelligent service platform's operation back end, a user center and a marketing center. The data flow design is shown in Fig. 1.

When a user encounters a pop-up error, the system first connects to the intelligent service platform and performs text analysis on the error message; when a relatively close match exists, the answer is fed back to the client in a pop-up window to guide the client to operate correctly. When the client is not satisfied with the answer, or the intelligent service platform fails to analyze the error, the error information is collected.

The intelligent service platform may be Guotai Junan's intelligent service platform, i.e. its smart customer service system, or any other service platform with corresponding functions; such platforms are commonly used in the field, their technical details are well known, and they are not described further here.

The error translation system periodically clusters the collected error information by category and displays the results to the product manager, who works with customer service operators to convert the different error clusters into knowledge points in the intelligent service platform's operation back end.

II. Clustering algorithm process

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based spatial clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel et al. in 1996. It rests on the following definitions.

1. Eps neighborhood: the Eps neighborhood of an object p is the region centered on p with radius Eps, that is:

$$N_{Eps}(p)=\{\,q\in D \mid \mathrm{Dist}(p,q)\le Eps\,\} \tag{1}$$

where D is the data set, Dist(p, q) is the distance between objects p and q, and N_Eps(p) contains all objects in D whose distance from p is at most Eps.

2. Core object: given a data set D and a neighborhood density threshold MinPts, an object p ∈ D is a core object if it satisfies equation (2):

$$|N_{Eps}(p)|\ge MinPts \tag{2}$$

where |N_Eps(p)| is the number of objects in the Eps neighborhood of p.
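Definitions (1) and (2) are enough to build the whole DBSCAN procedure. The pure-Python sketch below is a textbook version written for illustration, not code from the invention: it uses Euclidean distance, counts p itself in its own neighborhood (as equation (1) implies, since Dist(p, p) = 0), and marks noise with -1. The names `region_query` and `dbscan` are hypothetical.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1


def region_query(data: List[Sequence[float]], p: int, eps: float) -> List[int]:
    """Equation (1): indices of all objects within distance eps of object p (p included)."""
    return [q for q in range(len(data)) if math.dist(data[p], data[q]) <= eps]


def dbscan(data: List[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Label each object with a cluster id (0, 1, ...) or NOISE."""
    labels: List[Optional[int]] = [None] * len(data)
    cluster = 0
    for p in range(len(data)):
        if labels[p] is not None:
            continue
        neighbours = region_query(data, p, eps)
        if len(neighbours) < min_pts:        # equation (2) fails: p is not a core object
            labels[p] = NOISE                # may later become a border point
            continue
        labels[p] = cluster
        seeds, i = list(neighbours), 0
        while i < len(seeds):                # expand the cluster from core objects
            q = seeds[i]
            i += 1
            if labels[q] == NOISE:
                labels[q] = cluster          # border point reachable from a core object
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbours = region_query(data, q, eps)
            if len(q_neighbours) >= min_pts: # q is also a core object: keep expanding
                seeds.extend(q_neighbours)
        cluster += 1
    return labels
```

Unlike partition-based methods, nothing here asks for the number of clusters, which is why DBSCAN suits error messages whose category count is unknown; the price is that Eps and MinPts must be chosen instead.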

The silhouette coefficient is a method for evaluating clustering quality. It was originally proposed by Peter J. It combines two factors, cohesion and separation, and on the same original data it can be used to evaluate how different algorithms, or different settings of the same algorithm, affect the clustering result. It is calculated as follows:

$$s(i)=\frac{b(i)-a(i)}{\max\{a(i),\,b(i)\}}$$

where a(i) is the average dissimilarity of vector i to the other points in the same cluster, and b(i) is the minimum, taken over the other clusters, of the average dissimilarity of vector i to the points of that cluster.
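A direct Python transcription of this formula is sketched below. It assumes Euclidean distance as the dissimilarity and the common convention that a point alone in its cluster scores 0; both are assumptions, since the patent states neither. Every distinct label, including a DBSCAN noise label of -1, is treated as its own cluster here. The function name `silhouette` is hypothetical.

```python
import math
from statistics import mean
from typing import List, Sequence


def silhouette(data: List[Sequence[float]], labels: List[int]) -> float:
    """Mean of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all points."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for i, (x, li) in enumerate(zip(data, labels)):
        same = [math.dist(x, y) for j, (y, lj) in enumerate(zip(data, labels))
                if lj == li and j != i]
        if not same:                       # singleton cluster: s(i) = 0 by convention
            scores.append(0.0)
            continue
        a = mean(same)                     # a(i): cohesion within the own cluster
        b = min(mean(math.dist(x, y) for y, lj in zip(data, labels) if lj == c)
                for c in clusters if c != li)  # b(i): nearest other cluster
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return mean(scores)
```

A good partition of two well-separated pairs of points scores about 0.9, while swapping the labels so each cluster straddles both pairs gives a negative score.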

A density-based clustering algorithm essentially looks for high-density regions in a data set, i.e. groups of data points with small average distances between them, separated by low-density regions. DBSCAN uses the parameters Eps and MinPts to set the threshold for delimiting these high-density regions. The original DBSCAN approach to text clustering uses word-frequency-based TF-IDF vectors, which lack the semantic information of the error messages. In the algorithm flow shown in Fig. 2, sentence vectors from a doc2vec model trained on the data set replace the TF-IDF word-frequency vectors, so that both the word-frequency and the semantic information of the text are captured; the silhouette coefficient is then used as the tuning criterion to obtain the optimal parameters automatically and thus a relatively optimal clustering result. Fig. 3 shows the comparison when the optimal parameters are obtained automatically in the algorithm flow.
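The hyper-parameter traversal can be sketched with scikit-learn, assuming NumPy and scikit-learn are available. This is an illustration, not the invention's code: the grids are placeholders, cosine distance is an assumed choice for doc2vec vectors, noise points (label -1) are excluded from the score, and settings that give fewer than two clusters are skipped because the silhouette is undefined there. `adaptive_dbscan` is a hypothetical name; `X` stands in for the sentence vectors of step (1-5), which with gensim could be produced by `Doc2Vec.infer_vector` on each token list.

```python
import itertools

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score


def adaptive_dbscan(X, eps_grid, min_samples_grid, metric="cosine"):
    """Try every (eps, min_samples) pair and keep the best silhouette.

    Returns (score, eps, min_samples, labels), or None if no setting was scorable.
    """
    X = np.asarray(X, dtype=float)
    best = None
    for eps, min_samples in itertools.product(eps_grid, min_samples_grid):
        labels = DBSCAN(eps=eps, min_samples=min_samples, metric=metric).fit_predict(X)
        mask = labels != -1                         # leave noise points out of the score
        n_clusters = len(set(labels[mask]))
        # silhouette_score needs 2 <= n_clusters <= n_samples - 1
        if n_clusters < 2 or n_clusters >= mask.sum():
            continue
        score = silhouette_score(X[mask], labels[mask], metric=metric)
        if best is None or score > best[0]:         # ties keep the earlier setting
            best = (score, eps, min_samples, labels)
    return best
```

Dropping noise before scoring can flatter settings that discard many points; a production version might also require a minimum fraction of clustered points.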

Fig. 4 and Fig. 5 show the clustering results on the Iris data set and on the error information actually collected by the system. Fig. 4 shows that the silhouette coefficients are very close when there is more clustered data. Because the error set has many categories and their colors are hard to tell apart by eye, Fig. 5 shows a screenshot of the real system page instead; it shows that the hyper-parameter self-adaptive DBSCAN clustering model meets the requirements of error clustering well.

Hardware requirements of the invention:

System deployment requires one server with 8 GB of memory, a 16-core CPU and a 500 GB hard disk, running CentOS 7.

Data are stored in a MySQL database; the system is developed with Django (Python) and Spring Cloud (Java and AngularJS), and model training and the clustering algorithm are deployed with Docker.

For a specific implementation of this embodiment, reference may be made to the relevant description in the above embodiments, which is not described herein again.

It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.

It should be noted that the terms "first," "second," and the like in the description of the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present invention, the meaning of "a plurality" means at least two unless otherwise specified.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.

It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by suitable instruction execution devices. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

It will be understood by those skilled in the art that all or part of the steps of the above method embodiments may be carried out by a program instructing the relevant hardware; the program may be stored in a computer readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.

In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.

In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

In this specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims

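1. A method for cross-platform anomaly recognition and translation using a hyper-parameter adaptive DBSCAN clustering algorithm, characterized by comprising multi-platform interactive guidance steps, specifically the following procedure:

(1) when an operation reports an error, connecting to an intelligent service platform;

(2) performing text analysis on the error message and judging whether a relatively close match exists; if so, continuing with step (3); otherwise, continuing with step (5);

(3) displaying the result to guide the client to operate correctly;

(4) judging whether the client is satisfied; if so, ending; otherwise, reassuring the client, answering questions and collecting feedback, and continuing with step (5);

(5) recording the error content;

(6) establishing a clustering algorithm model, clustering the error messages, and converting the different error clusters into back-end operation knowledge points.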

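2. The method for cross-platform anomaly recognition and translation using the hyper-parameter adaptive DBSCAN clustering algorithm according to claim 1, wherein step (6) specifically comprises the following steps:

(6.1) establishing a clustering algorithm model;

(6.2) periodically clustering the collected error messages by category and displaying the results;

(6.3) judging whether the error cluster can be resolved; if so, continuing with step (6.4); otherwise, having the responsible contact person assist, then continuing with step (6.4);

(6.4) filling in a solution, and converting the different error clusters into back-end operation knowledge points of the intelligent service platform.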

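3. The method for cross-platform anomaly recognition and translation using the hyper-parameter adaptive DBSCAN clustering algorithm according to claim 2, wherein step (6.4) specifically comprises the following steps:

(6.4.1) filling in a solution;

(6.4.2) checking the knowledge points for duplicates: if a duplicate exists, performing semantic tuning; otherwise, creating a new knowledge point and adding backtracking of its effect.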

4. The method for cross-platform anomaly recognition and translation using the hyper-parameter adaptive DBSCAN clustering algorithm according to claim 1, wherein the method further comprises a clustering algorithm step, specifically the following procedure:

(1-1) acquiring the data set currently to be clustered;

(1-2) deleting the historical model;

(1-3) performing word segmentation to generate training data;

(1-4) training a doc2vec sentence vector model;

(1-5) converting the data set into vectors with the current model;

(1-6) traversing the hyper-parameters, using the silhouette coefficient as the tuning criterion, to obtain the optimal clustering.

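5. The method for cross-platform anomaly recognition and translation using the hyper-parameter adaptive DBSCAN clustering algorithm according to claim 4, wherein the silhouette coefficient in step (1-6) is specifically:

the silhouette coefficient is calculated according to the following formula:

$$s(i)=\frac{b(i)-a(i)}{\max\{a(i),\,b(i)\}}$$

where a(i) is the average dissimilarity of vector i to the other points in the same cluster, and b(i) is the minimum, taken over the other clusters, of the average dissimilarity of vector i to the points of that cluster.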

6. An apparatus for implementing cross-platform anomaly recognition translation processing by using a hyper-parametric adaptive DBSCAN clustering algorithm, the apparatus comprising:

a processor configured to execute computer-executable instructions;

a memory storing one or more computer-executable instructions that, when executed by the processor, perform the steps of the method for cross-platform anomaly recognition translation processing using a hyper-parametric adaptive DBSCAN clustering algorithm of any of claims 1 to 5.

7. A processor for implementing cross-platform anomaly recognition translation processing using a hyper-parametric adaptive DBSCAN clustering algorithm, wherein the processor is configured to execute computer executable instructions which, when executed by the processor, implement the steps of the method for implementing cross-platform anomaly recognition translation processing using a hyper-parametric adaptive DBSCAN clustering algorithm of any of claims 1 to 5.

8. A computer readable storage medium having stored thereon a computer program executable by a processor to perform the steps of the method for cross-platform anomaly recognition translation using a hyperparametric adaptive DBSCAN clustering algorithm of any one of claims 1 to 5.