CN115329745A - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN115329745A
Authority
CN
China
Prior art keywords
text data
risk
data
event
user
Legal status
Pending
Application number
CN202210899406.2A
Other languages
Chinese (zh)
Inventor
罗仕漳
高芸
张腾飞
Current Assignee
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Application filed by China Telecom Corp Ltd
Priority to CN202210899406.2A
Publication of CN115329745A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of the invention provide a data processing method and device. The method comprises: during real-time data processing, collecting text data submitted by users through a distributed publish-subscribe messaging system and storing the text data in a preset queue; performing event positioning on the text data in the preset queue with a natural language processing model to obtain event positioning information of the text data; and performing risk analysis according to the event positioning information to obtain a risk analysis result. The embodiments automatically locate user requirements and analyze their risk, which improves the timeliness and accuracy of locating user requirements, reduces potential risks, avoids extensive manual involvement, and improves efficiency.

Description

Data processing method and device
Technical Field
The present invention relates to the field of computer technology, and in particular to a data processing method and device.
Background
Users are increasingly aware of how they use products and of their consumer rights. If user concerns are not addressed and responded to promptly, risks such as customer churn may arise.
In the prior art, after a user reports a problem, the reported problem is usually located by periodic quantitative statistics or manual spot checks. This introduces a delay, so user requirements cannot be located and responded to promptly and accurately, and efficiency is low.
Disclosure of Invention
In view of the above problems, a data processing method and device are proposed that overcome, or at least partially solve, them, as follows:
A data processing method, comprising:
during real-time data processing, collecting text data submitted by a user through a distributed publish-subscribe messaging system, and storing the text data in a preset queue;
performing event positioning on the text data in the preset queue using a natural language processing model to obtain event positioning information of the text data; and
performing risk analysis according to the event positioning information to obtain a risk analysis result.
Optionally, the method further comprises:
optimizing the distributed publish-subscribe messaging system according to the risk analysis result.
Optionally, collecting, during real-time data processing, text data submitted by a user through the distributed publish-subscribe messaging system and storing the text data in a preset queue comprises:
during real-time data processing, collecting text data submitted by users through the distributed publish-subscribe messaging system, and sending text data from different data sources to different topics of the distributed publish-subscribe messaging system, so that the text data is stored in the preset queue corresponding to each topic.
Optionally, the event positioning information comprises event category information, the natural language processing model is based on a convolutional neural network, and performing event positioning on the text data in the preset queue using the natural language processing model to obtain the event positioning information of the text data comprises:
acquiring text data from the preset queue;
converting the acquired text data into word embedding vectors using the natural language processing model to obtain a word embedding matrix, and performing convolution on the word embedding matrix; and
classifying the convolution result to obtain event category information.
Optionally, performing risk analysis according to the event positioning information to obtain a risk analysis result comprises:
determining a first risk value from the event positioning information along the event dimension;
determining a second risk value from the event positioning information along the user dimension; and
determining a third risk value from the first risk value and the second risk value.
Optionally, the method further comprises:
performing risk early warning and/or risk preprocessing according to the risk analysis result.
Optionally, before collecting text data submitted by a user through the distributed publish-subscribe messaging system, the method further comprises:
denoising the text data submitted by the user.
A data processing device, comprising:
a text data acquisition module, configured to collect, during real-time data processing, text data submitted by a user through a distributed publish-subscribe messaging system and to store the text data in a preset queue;
a text data processing module, configured to perform event positioning on the text data in the preset queue using a natural language processing model to obtain event positioning information of the text data; and
a risk analysis module, configured to perform risk analysis according to the event positioning information to obtain a risk analysis result.
An electronic device, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the data processing method described above.
A computer-readable storage medium storing a computer program that, when executed by a processor, implements the data processing method described above.
The embodiments of the invention have the following advantages:
In the embodiments of the invention, during real-time data processing, text data submitted by users is collected through a distributed publish-subscribe messaging system and stored in a preset queue; a natural language processing model performs event positioning on the text data in the preset queue to obtain event positioning information; and risk analysis based on that information yields a risk analysis result. User requirements are thus located and their risks analyzed automatically, which improves the timeliness and accuracy of locating user requirements, reduces potential risks, avoids extensive manual involvement, and improves efficiency.
Drawings
To illustrate the technical solutions of the present invention more clearly, the drawings used in the description are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flowchart of the steps of a data processing method according to an embodiment of the present invention;
FIG. 2a is a system architecture diagram according to an embodiment of the present invention;
FIG. 2b is a diagram of a data processing example according to an embodiment of the present invention;
FIG. 3 is a flowchart of the steps of another data processing method according to an embodiment of the present invention;
FIG. 4 is a flowchart of the steps of another data processing method according to an embodiment of the present invention;
FIG. 5 is a block diagram of a data processing device according to an embodiment of the present invention.
Detailed Description
To make the above objects, features, and advantages of the present invention clearer and easier to understand, the invention is described in further detail below with reference to the drawings and specific embodiments. The embodiments described are only some, not all, of the embodiments of the invention. All other embodiments that a person skilled in the art can derive from these embodiments without creative effort fall within the protection scope of the invention.
For an enterprise, accurately locating user requirements in the large volume of feedback that users actively submit, and immediately warning about and preprocessing potential risk events, helps the enterprise fix product problems promptly, maintain good customer relationships, and strengthen its competitiveness.
In one existing technical solution, events are classified with a hybrid algorithm combining K-means clustering and a Bayesian network, a risk analysis model is trained on event data, and user requirements are then located and predicted. However, the algorithm must repeatedly reassign data objects and recompute the cluster centers after each adjustment. When the data volume is very large, its time cost becomes very high, which makes it unsuitable for immediate preprocessing of large numbers of events.
In another technical solution, data from different sources is layered according to preset rules to obtain multiple data source layers, and different processing methods are applied according to the risk level of each source. However, because this approach relies on preset rules, the layering rules still have to be set manually when facing large amounts of unstructured text data, which is labor-intensive and hard to standardize.
In the embodiments of the present invention, a natural language processing (NLP) model based on a convolutional neural network (CNN) can be combined with real-time data processing on a distributed publish-subscribe messaging system (e.g., Kafka) to locate potential risk points and issue early warnings for them.
Specifically, text data from the users of each data source flows into the distributed publish-subscribe messaging system in real time and is stored in an early warning processing queue (hereinafter, the preset queue). The NLP model classifies and locates events related to users' unstructured text appeals, and its output is applied to the early warning model, which issues risk escalation warnings and preprocessing reminders to managers. This achieves accurate location and real-time warning of risk events, helps the enterprise address potential problems in advance, prevents events from escalating, and improves service quality and user satisfaction.
On the one hand, an NLP algorithm automatically summarizes text information and locates user appeals, which reduces dependence on staff skill and improves the ability to analyze user intent correctly.
Moreover, combining the real-time data aggregation of the distributed publish-subscribe messaging system with the fast CNN-based NLP model allows early warning processing of potential risk points, so that potential risk events can be located and warned about efficiently and promptly.
On the other hand, in a real-time environment, the NLP model classifies and locates events related to users' unstructured text appeals, and its output is applied to the early warning model, which issues risk escalation warnings and preprocessing reminders to managers, so that potential risk events are warned about and preprocessed in real time.
In addition, together with the risk early warning model, a self-feedback mechanism (optimizing the distributed publish-subscribe messaging system according to the risk analysis result) is added to real-time data processing. This reduces manual intervention, improves the efficiency of problem analysis, prevents events from escalating because they were not handled in time, and improves the user experience.
Further details are provided below.
Referring to FIG. 1, a flowchart of the steps of a data processing method according to an embodiment of the present invention is shown. The method may include the following steps:
Step 101: during real-time data processing, collect text data submitted by a user through a distributed publish-subscribe messaging system, and store the text data in a preset queue.
The distributed publish-subscribe messaging system can handle all of a consumer's activity stream data on a website, such as page views, searches, and messages. It can unify online and offline message processing through Hadoop's parallel loading mechanism, and can also deliver real-time messages through a cluster. For example, the distributed publish-subscribe messaging system may be Kafka.
When a product launches a new service, or at a specific point in time that deserves attention, such as a sales promotion, user requirements need to be located and responded to promptly to avoid the adverse effects of escalating potential risks. Text data submitted by users through web pages can therefore be collected through the distributed publish-subscribe messaging system.
For example, the text data may be submitted by a user through a message channel published on a web page, such as a complaint or feedback message. It may also be other text submitted on a web page, such as a search term the user enters in a search box.
After the text data submitted by the user is collected, the distributed publish-subscribe messaging system can store it in a preset queue, such as a queue dedicated to user messages, for subsequent processing.
In the embodiments of the invention, the distributed publish-subscribe messaging system is responsible for real-time data transmission. This suits real-time processing scenarios, provides real-time data aggregation, keeps the data current, and thus improves the timeliness of subsequent processing.
In an embodiment of the present invention, step 101 may include:
during real-time data processing, collecting text data submitted by users through the distributed publish-subscribe messaging system, and sending text data from different data sources to different topics of the system so that it is stored in the preset queue corresponding to each topic.
In practice, the distributed publish-subscribe messaging system may have multiple topics, each with a corresponding preset queue.
Each topic may also have multiple partitions, for example one partition per day along the time dimension. With a multithreaded producer and a multi-partition strategy, each piece of text data is sent to the topic corresponding to its data source and then stored in that topic's preset queue for subsequent processing.
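The topic-and-partition routing above can be illustrated with a minimal in-memory sketch. It assumes a hypothetical mapping from data source to topic name (`SOURCE_TO_TOPIC`) and keys partitions by calendar day; `InMemoryBroker` is an illustrative stand-in for a Kafka cluster, not Kafka's API, and it runs single-threaded rather than with the multithreaded producer described in the text.

```python
from collections import defaultdict, deque
from datetime import datetime

# Hypothetical source-to-topic mapping; the patent does not name its topics.
SOURCE_TO_TOPIC = {
    "web_complaint": "feedback.complaints",
    "web_search": "feedback.search",
}


class InMemoryBroker:
    """Toy stand-in for a Kafka cluster: topic -> partition -> FIFO preset queue."""

    def __init__(self):
        # Nested defaultdicts create topics and partitions on first use.
        self.topics = defaultdict(lambda: defaultdict(deque))

    def publish(self, source, text, ts):
        """Route one message to its data source's topic and a per-day partition."""
        topic = SOURCE_TO_TOPIC[source]       # one topic per data source
        partition = ts.strftime("%Y-%m-%d")   # one partition per day (time dimension)
        self.topics[topic][partition].append(text)
        return topic, partition

    def queue(self, topic, partition):
        """Return the preset queue backing one topic partition."""
        return self.topics[topic][partition]


broker = InMemoryBroker()
broker.publish("web_complaint", "Signal drops every evening", datetime(2022, 8, 1, 20, 5))
```

In a real Kafka deployment, partitions are integer-indexed, so a producer would map each day to a partition number; kafka-python's `KafkaProducer.send`, for example, accepts an explicit `partition` argument.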
In an embodiment of the present invention, before text data submitted by a user is collected through the distributed publish-subscribe messaging system, the method may further include:
denoising the text data submitted by the user.
In practice, noise in the data can be identified and cleaned, for example by filtering special characters, removing redundant whitespace, and de-duplicating repeated messages.
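The three denoising operations named above can be sketched with Python's `re` module. The set of characters treated as "special" is an assumption (here, anything other than word characters, whitespace, and basic punctuation), de-duplication is an exact match on the cleaned text, and `denoise` is a hypothetical helper name.

```python
import re


def denoise(messages):
    """Clean raw user text: filter special characters, collapse whitespace, de-duplicate."""
    seen = set()
    cleaned = []
    for msg in messages:
        # Assumed filter: keep word characters (Unicode-aware, so CJK survives),
        # whitespace, and basic punctuation; drop everything else.
        text = re.sub(r"[^\w\s.,!?;:]", "", msg)
        # Collapse runs of whitespace to one space and trim both ends.
        text = re.sub(r"\s+", " ", text).strip()
        # Drop messages that became empty and exact duplicates of earlier ones.
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned
```

For example, `denoise(["  Bill  too high!!! @@ ", "Bill too high!!!", "###"])` keeps a single message, `"Bill too high!!!"`.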
Step 102: perform event positioning on the text data in the preset queue using a natural language processing model to obtain event positioning information of the text data.
The natural language processing model is based on a convolutional neural network.
The text data stored in the preset queue is processed in queue order, for example ordered by the time it entered the queue, so that data that entered the queue earlier is processed first.
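Queue-order processing can be sketched as a merge of partitions by enqueue time. This assumes each record is a hypothetical `(enqueue_time, text)` pair and that each partition is already in enqueue order, which holds for an append-only queue. Kafka itself only guarantees ordering within a single partition, so consuming several partitions in global enqueue order needs an explicit ordering key like this one.

```python
import heapq


def in_enqueue_order(*partitions):
    """Yield (enqueue_time, text) records oldest-first across several partitions.

    Each partition must already be sorted by enqueue time (true for an
    append-only queue); heapq.merge then interleaves them lazily.
    """
    return heapq.merge(*partitions, key=lambda record: record[0])


day1 = [(1, "no signal"), (4, "refund please")]
day2 = [(2, "slow network"), (3, "bill too high")]
ordered = [text for _, text in in_enqueue_order(day1, day2)]
```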
In a specific implementation, the natural language processing model performs event positioning on the text data in the preset queue, that is, it locates the key information in the user's feedback, to obtain the event positioning information.
In the embodiments of the invention, an NLP algorithm automatically summarizes and locates text information, which reduces dependence on staff skill and improves the ability to analyze user intent correctly.
The distributed publish-subscribe messaging system handles real-time data transmission, while the NLP model handles data processing. Combining real-time data aggregation with a fast CNN-based NLP model allows potential risk events to be located and warned about efficiently and promptly.
In one example, the NLP model finishes processing and analyzing the current data before the next data arrives, which keeps data processing real-time.
In an embodiment of the present invention, the event positioning information may include event category information; accordingly, step 102 may include:
Substep 11: obtain text data from the preset queue.
In a specific implementation, the stored text data may be retrieved from the preset queue in order.
Substep 12: convert the obtained text data into word embedding vectors using the natural language processing model to obtain a word embedding matrix, and perform convolution on the word embedding matrix.
After the text data is obtained, it can be converted into word embeddings, that is, each word is converted into a fixed-length vector, yielding a word embedding matrix composed of multiple word embedding vectors.
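A minimal sketch of the word embedding step, assuming whitespace tokenization (Chinese text would first need word segmentation) and a randomly initialized table standing in for trained embeddings such as word2vec. `build_embedding_table` and `embed` are hypothetical helpers. The result is an n × k matrix: one row per token, k values per row.

```python
import random


def build_embedding_table(vocab, k, seed=0):
    """Hypothetical embedding table: random k-dim vectors standing in for trained ones."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-0.5, 0.5) for _ in range(k)] for word in vocab}


def embed(tokens, table, k):
    """Stack one k-dim vector per token into an n x k word embedding matrix."""
    # Out-of-vocabulary tokens map to a fresh zero vector.
    return [table[t] if t in table else [0.0] * k for t in tokens]


table = build_embedding_table(["bill", "too", "high"], k=4)
matrix = embed("my bill too high".split(), table, k=4)  # 4 tokens -> 4 x 4 matrix
```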
When the natural language processing model is based on a convolutional neural network, it may have multiple convolution kernels. Kernels of different widths and strides can asynchronously perform a large number of convolution operations on the stacked word embedding matrix, and the outputs that different kernels produce, which capture information along different dimensions, together form the convolution result.
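The multi-kernel convolution can be sketched in plain Python. Under the usual text-CNN convention (an assumption; the patent gives no layer details), a kernel of width h spans h consecutive word vectors across the full embedding width k and slides along the word axis with a given stride, so an n-word input yields a feature map of length (n - h) // stride + 1. The ReLU activation and the name `conv_feature_map` are illustrative choices.

```python
def conv_feature_map(matrix, kernel, bias=0.0, stride=1):
    """Slide an (h x k) kernel over an (n x k) embedding matrix along the word axis.

    Returns a feature map of length (n - h) // stride + 1 after ReLU;
    returns an empty list if the kernel is wider than the sentence.
    """
    n, h = len(matrix), len(kernel)
    out = []
    for start in range(0, n - h + 1, stride):
        window = matrix[start:start + h]
        # Element-wise product of kernel and window, summed over all h x k cells.
        s = sum(w * x for krow, mrow in zip(kernel, window) for w, x in zip(krow, mrow))
        out.append(max(0.0, s + bias))  # ReLU activation (assumed)
    return out


m = [[1, 0], [0, 1], [1, 1], [2, 0]]              # n = 4 words, k = 2 dims
kernels = [([[1, 1], [1, 1]], 1),                 # width 2, stride 1
           ([[1, 1], [1, 1], [1, 1]], 1)]         # width 3, stride 1
feature_maps = [conv_feature_map(m, w, stride=s) for w, s in kernels]
```

Using several widths lets the model respond to two-word and three-word phrases in the same pass, which is one reading of "information along different dimensions".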
Substep 13: classify the convolution result to obtain event category information.
After the convolution result is obtained, it can be fed into a pooling layer to reduce the data dimensionality and then classified to obtain event category information.
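Pooling and the layer before classification can be sketched as follows, assuming max-over-time pooling (a common choice for text CNNs; the text only says the pooling layer reduces dimensionality) and a fully connected layer that produces one score per event category. The weights in the example are illustrative, not trained values.

```python
def max_pool(feature_maps):
    """Max-over-time pooling: keep each kernel's strongest response (one number per kernel)."""
    # Each feature map must be non-empty, i.e. no kernel wider than the sentence.
    return [max(fm) for fm in feature_maps]


def linear_scores(features, weights, biases):
    """Fully connected layer: one score (logit) per event category."""
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


pooled = max_pool([[2.0, 3.0, 4.0], [4.0, 5.0]])   # variable-length maps -> fixed-size vector
logits = linear_scores(pooled, [[1, 0], [0, 1], [0.5, 0.5]], [0, 0, 0])
```

Pooling turns feature maps of different lengths into a vector whose size equals the number of kernels, which is what makes a fixed-size classifier possible regardless of sentence length.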
For example, when a Softmax classifier is used, the Softmax function is applied to an n-dimensional input tensor and rescales it so that the elements of the n-dimensional output tensor lie in the range [0, 1] and sum to 1.
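The Softmax step can be written directly from its definition, softmax(z)_j = exp(z_j) / Σ_m exp(z_m). Subtracting the maximum logit first is a standard numerical-stability trick that leaves the output unchanged. The category labels in the example are hypothetical.

```python
import math


def softmax(logits):
    """Map raw scores to probabilities in [0, 1] that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability; result is unchanged
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, categories):
    """Return the most probable category together with the full distribution."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return categories[best], probs


label, probs = classify([0.1, 2.0, 0.3], ["billing", "network", "service"])  # hypothetical labels
```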
Step 103: perform risk analysis according to the event positioning information to obtain a risk analysis result.
After the event positioning information is obtained, it can be fed into a preset risk early warning model, and risk analysis can then be performed against a preset threshold to obtain the risk analysis result.
In an embodiment of the present invention, step 103 may include:
Substep 21: determine a first risk value from the event positioning information along the event dimension.
Along the event dimension, different time periods can be distinguished for different types of events, and the frequency of each event type in each period is counted continuously to obtain the first risk value.
In one example, events that reach an early warning threshold may also be marked as risks.
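The event-dimension statistic can be sketched as a sliding-window counter per event category. The text does not say how counts become a risk value, so this sketch assumes the first risk value is the count in the most recent window divided by the early-warning threshold, capped at 1.0, and marks categories whose count reaches the threshold. The class name, window length, and threshold are hypothetical.

```python
from collections import defaultdict, deque


class EventWindowCounter:
    """Per-category event counts over a sliding time window (event dimension).

    Assumes timestamps (in seconds) arrive in non-decreasing order per category.
    """

    def __init__(self, window_seconds, warn_threshold):
        self.window = window_seconds
        self.threshold = warn_threshold
        self.events = defaultdict(deque)  # category -> timestamps inside the window

    def _evict(self, q, now):
        # Drop timestamps that have slid out of the window (now - window, now].
        while q and q[0] <= now - self.window:
            q.popleft()

    def add(self, category, ts):
        q = self.events[category]
        q.append(ts)
        self._evict(q, ts)

    def count(self, category, now):
        q = self.events[category]
        self._evict(q, now)
        return len(q)

    def risk(self, category, now):
        # Assumed scoring rule: windowed count / threshold, capped at 1.0.
        return min(1.0, self.count(category, now) / self.threshold)

    def flagged(self, now):
        # Risk marking: categories whose windowed count reaches the threshold.
        return {c for c in list(self.events) if self.count(c, now) >= self.threshold}


counter = EventWindowCounter(window_seconds=3600, warn_threshold=5)
```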
Substep 22: determine a second risk value from the event positioning information along the user dimension.
Along the user dimension, the number of feedback submissions each user makes within a fixed period can be counted to obtain the second risk value.
In one example, users who reach an early warning threshold may also be marked as risks.
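Similarly, the user-dimension statistic can be sketched with one fixed (tumbling) window. As with the event dimension, the normalization into a risk value (count divided by threshold, capped at 1.0) and the function name `user_risk` are assumptions for illustration; `submissions` is a hypothetical list of `(user_id, timestamp)` pairs.

```python
from collections import Counter


def user_risk(submissions, window_start, window_len, warn_threshold):
    """Second risk value per user over one fixed window [window_start, window_start + window_len).

    Assumed scoring rule: submissions in the window / warn_threshold, capped at 1.0.
    Returns (risk per user, set of users marked as risks).
    """
    counts = Counter(
        user for user, ts in submissions
        if window_start <= ts < window_start + window_len
    )
    risks = {user: min(1.0, n / warn_threshold) for user, n in counts.items()}
    flagged = {user for user, n in counts.items() if n >= warn_threshold}
    return risks, flagged


subs = [("u1", 10), ("u1", 20), ("u1", 30), ("u2", 15), ("u1", 5000)]
risks, flagged = user_risk(subs, window_start=0, window_len=3600, warn_threshold=3)
```

Here the submission at t = 5000 falls outside the one-hour window, so user `u1` has exactly three submissions and reaches the threshold.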
Substep 23: determine a third risk value from the first risk value and the second risk value.
After the first and second risk values are obtained, their weighted average can be taken as the third risk value, which can serve as the risk analysis result.
In the embodiments of the invention, the first risk value is computed along the event dimension and the second along the user dimension, and the two are combined into a total early warning score for the time interval. This enables risk analysis of abnormal data and helps ensure the accuracy of the risk analysis result.
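The weighted average can be made concrete in a few lines. The weight split is not given in the text, so `w_event` is a hypothetical tuning parameter; with equal weights, an event risk of 0.8 and a user risk of 0.4 combine to 0.5 × 0.8 + 0.5 × 0.4 = 0.6.

```python
def combined_risk(event_risk, user_risk, w_event=0.5):
    """Third risk value as a weighted average of the event- and user-dimension risk values.

    w_event is a hypothetical weight; the user dimension gets 1 - w_event.
    """
    if not 0.0 <= w_event <= 1.0:
        raise ValueError("w_event must lie in [0, 1]")
    return w_event * event_risk + (1.0 - w_event) * user_risk
```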
In an embodiment of the present invention, the method may further include:
performing risk early warning and/or risk preprocessing according to the risk analysis result.
After the risk analysis result is obtained, potential risk events that exceed the threshold can be pushed as alerts to professionals for risk early warning; for example, a risk early warning is issued if the third risk value is greater than the threshold. The potential risk event can also be preprocessed, for example by generating a corresponding solution in advance.
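A minimal sketch of the threshold check and preprocessing, assuming each scored event arrives as a hypothetical `(event_id, category, third_risk_value)` tuple and that generating a solution in advance is modelled as looking up a pre-drafted response in a hypothetical `PLAYBOOK` table. Sorting alerts so the highest-risk events reach staff first is a design choice, not something the text specifies.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    event_id: str
    risk: float
    suggested_action: str


# Hypothetical playbook of pre-drafted responses per event category.
PLAYBOOK = {
    "network": "Dispatch field engineer and send outage notice",
    "billing": "Open billing review and offer account check",
}
DEFAULT_ACTION = "Escalate to on-duty manager"


def early_warning(scored_events, threshold):
    """Raise an alert, with a pre-drafted response, for each event whose risk exceeds threshold."""
    alerts = [
        Alert(event_id, risk, PLAYBOOK.get(category, DEFAULT_ACTION))
        for event_id, category, risk in scored_events
        if risk > threshold  # strictly greater than the threshold
    ]
    # Highest-risk events first so staff see them at the top of the push list.
    return sorted(alerts, key=lambda a: a.risk, reverse=True)
```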
In an embodiment of the present invention, the method may further include:
optimizing the distributed publish-subscribe messaging system according to the risk analysis result.
After the risk analysis result is obtained, a self-feedback mechanism can use it to optimize the distributed publish-subscribe messaging system. This reduces manual intervention, improves the efficiency of problem analysis, prevents events from escalating because they were not handled in time, and improves the user experience.
For example, when the distributed publish-subscribe messaging system is a Kafka cluster, the self-feedback mechanism can optimize the topic configuration of the cluster, and can also optimize its nodes, data transmission paths, and other aspects.
For the Kafka cluster, the data packet $P_i$ generated by the $i$-th node $v_i$ can be represented by a quadruple:
$$P_i = (T_i, H_i, D_i, S_i)$$
where $T_i$ is the packet generation period (failure period), $H_i$ is the total number of hops from the source node to the sink node (the Kafka node that processes the data), $D_i$ is the packet transmission deadline (acquisition deadline), and $S_i$ is the transmission path of the packet; $T_i$ and $D_i$ are expressed in numbers of time slots.
At any time slot $t$, a data packet has three attributes $(c_i, h_i, t_i)$, where $c_i$ is the node at which the packet is currently located, $h_i$ is the remaining number of hops from the current node to the destination node ($0 < h_i < H_i$), and $t_i$ is the number of time slots remaining before the packet's deadline ($0 < t_i < T_i$).
For data in transit, if its state satisfies $t_i > h_i$, the data can in theory still be delivered to the destination.
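The packet model and the deliverability condition translate directly into small data types. Field names mirror the symbols in the text; `state_in_bounds` checks the stated ranges 0 < h_i < H_i and 0 < t_i < T_i, and `at_risk` is a hypothetical helper that selects packets which can no longer meet the condition t_i > h_i, the kind of signal a self-feedback mechanism might use when adjusting nodes or transmission paths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Quadruple P_i = (T_i, H_i, D_i, S_i) for a packet generated by node v_i."""
    T: int      # packet generation period, in time slots
    H: int      # total hops from source node to sink (Kafka) node
    D: int      # transmission deadline, in time slots
    S: tuple    # transmission path, as a sequence of node ids


@dataclass(frozen=True)
class PacketState:
    """Per-time-slot attributes (c_i, h_i, t_i) of a packet in transit."""
    c: str      # node where the packet currently is
    h: int      # remaining hops to the destination node
    t: int      # time slots left before the packet's deadline


def state_in_bounds(packet, state):
    """Check the ranges given in the text: 0 < h_i < H_i and 0 < t_i < T_i."""
    return 0 < state.h < packet.H and 0 < state.t < packet.T


def can_still_arrive(state):
    """Condition from the text: the packet can in theory still arrive if t_i > h_i."""
    return state.t > state.h


def at_risk(states):
    """Hypothetical helper: packets that can no longer satisfy t_i > h_i."""
    return [s for s in states if not can_still_arrive(s)]


p = Packet(T=10, H=3, D=8, S=("v1", "v2", "v3", "kafka"))
s1 = PacketState(c="v2", h=2, t=5)   # 5 slots left for 2 hops
s2 = PacketState(c="v3", h=1, t=1)   # 1 slot left for 1 hop: fails t_i > h_i
```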
In the embodiments of the invention, during real-time data processing, text data submitted by users is collected through a distributed publish-subscribe messaging system and stored in a preset queue; a natural language processing model performs event positioning on the text data in the preset queue to obtain event positioning information; and risk analysis based on that information yields a risk analysis result. User requirements are thus located and their risks analyzed automatically, which improves the timeliness and accuracy of locating user requirements, reduces potential risks, avoids extensive manual involvement, and improves efficiency.
Embodiments of the present invention are described below with reference to FIG. 2a and FIG. 2b:
As shown in FIG. 2a, the system may include a data access layer, a real-time processing layer, a middle layer, and a background management layer. The data access layer ingests and denoises text data; the real-time processing layer provides data transmission through the distributed publish-subscribe messaging system and calls the middle layer for natural language processing and risk value calculation. The calculation result is, on the one hand, handled by a self-feedback mechanism that optimizes the real-time processing layer and, on the other hand, fed back to the background management layer.
As shown in FIG. 2b, the distributed publish-subscribe messaging system may be a Kafka cluster, and the natural language processing model may be based on a convolutional neural network.
During real-time data processing, the data access layer first denoises the incoming text data. The Kafka cluster in the real-time processing layer then stores the text data in the early warning processing queue (i.e., the preset queue), and the middle layer is called to perform event positioning on the text data in that queue.
During event positioning, the text data is converted into an n × k word embedding matrix (n words by k embedding dimensions). The convolution layer applies convolution kernels to this matrix, the pooling layer and classification yield the event category information, and the event category information is passed through a fully connected layer into the risk early warning model, which performs risk analysis to obtain the risk analysis result.
After the risk analysis result is obtained, potential risk events that reach the early warning value can be reported, and a corresponding feedback message is sent to the background management layer. In addition, the self-feedback mechanism feeds the risk analysis result back to the Kafka cluster in the real-time processing layer to optimize the cluster.
Referring to FIG. 3, a flowchart of the steps of another data processing method according to an embodiment of the present invention is shown. The method may include the following steps:
Step 301: during real-time data processing, collect text data submitted by users through a distributed publish-subscribe messaging system, and send text data from different data sources to different topics of the system so that it is stored in the preset queue corresponding to each topic.
In practice, the distributed publish-subscribe messaging system may have multiple topics, each with a corresponding preset queue.
Each topic may also have multiple partitions, for example one partition per time period. With a multithreaded producer and a multi-partition strategy, each piece of text data is sent to the topic corresponding to its data source and then stored in that topic's preset queue for subsequent processing.
Step 302, obtaining text data from the preset queue.
In a specific implementation, the stored text data may be retrieved sequentially from a preset queue.
Step 303, converting the obtained text data into word embedding vectors by using a natural language processing model to obtain a word embedding vector matrix, and performing convolution processing on the word embedding vector matrix.
After the text data is obtained, it can be converted into word embedding vectors, that is, each word is converted into a fixed-length vector representation, and the vectors are stacked into a word embedding vector matrix.
When the natural language processing model is based on a convolutional neural network, it can be provided with multiple convolution kernels. Kernels of different widths and strides can perform many convolution operations asynchronously over the stacked word embedding vector matrix, and the information of different dimensions produced by the different kernels forms the convolution processing result.
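As a rough illustration of the word-to-vector step, the sketch below looks up a fixed-length vector for each token and stacks the rows into a matrix. It rests on several assumptions: seeded random vectors stand in for trained embeddings, whitespace tokenization stands in for a real word segmenter (Chinese text would need one), and EMBED_DIM and both function names are hypothetical.

```python
import random
from typing import Dict, List

EMBED_DIM = 4  # assumption: tiny fixed vector length for illustration

def build_vocab_embeddings(tokens: List[str], dim: int = EMBED_DIM, seed: int = 0) -> Dict[str, List[float]]:
    """Give each distinct token a fixed-length vector (random here; learned in a real model)."""
    rng = random.Random(seed)
    table: Dict[str, List[float]] = {}
    for tok in tokens:
        if tok not in table:
            table[tok] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return table

def to_embedding_matrix(tokens: List[str], table: Dict[str, List[float]], dim: int = EMBED_DIM) -> List[List[float]]:
    """Stack one row per token; tokens missing from the table map to a zero vector."""
    zero = [0.0] * dim
    return [table.get(tok, zero) for tok in tokens]

tokens = "payment failed after upgrade".split()  # hypothetical whitespace tokenization
table = build_vocab_embeddings(tokens)
matrix = to_embedding_matrix(tokens, table)  # 4 rows, each of length EMBED_DIM
```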
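The multi-kernel convolution can be sketched in plain Python in the style of a TextCNN, assuming each kernel spans the full embedding width and slides over consecutive word vectors with a given stride. The kernel values, the ReLU activation, and the function names are illustrative assumptions; a real model would learn the kernels and run on a tensor library. Because the kernels do not depend on one another, they could be evaluated concurrently, which is one plausible reading of the asynchronous convolution described above.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def conv1d_text(matrix: Matrix, kernel: Matrix, stride: int = 1, bias: float = 0.0) -> List[float]:
    """Slide a kernel covering `len(kernel)` consecutive word vectors down the matrix.

    Each output is the sum of element-wise products plus bias, passed through ReLU.
    """
    width = len(kernel)
    out = []
    for start in range(0, len(matrix) - width + 1, stride):
        acc = bias
        for i in range(width):
            acc += sum(r * k for r, k in zip(matrix[start + i], kernel[i]))
        out.append(max(0.0, acc))  # ReLU activation
    return out

def multi_kernel_conv(matrix: Matrix, kernels: List[Tuple[Matrix, int]]) -> List[List[float]]:
    """Apply several (kernel, stride) pairs; each yields one feature map."""
    return [conv1d_text(matrix, k, stride) for k, stride in kernels]

m = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # 4 words, 2-dim embeddings
maps = multi_kernel_conv(m, [([[1, 1], [1, 1]], 1), ([[1, 0], [0, 1], [1, 0]], 1)])
print(maps)  # [[2.0, 3.0, 2.0], [3.0, 1.0]]
```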
Step 304, classifying the convolution processing result to obtain event category information.
After the convolution processing result is obtained, it can be fed into a pooling layer to reduce the data dimensionality and then classified to obtain the event category information.
Step 305, performing risk analysis according to the event category information to obtain a risk analysis result.
After the event category information is obtained, it can be input into a preset risk early-warning model, which performs risk analysis against a preset threshold to obtain the risk analysis result.
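The pooling and classification step might look like the following sketch, which assumes max-over-time pooling and a single fully connected layer with softmax. The category names, weights, and function names are hypothetical; a trained model would supply its own categories and parameters.

```python
import math
from typing import Dict, List

def max_pool(feature_maps: List[List[float]]) -> List[float]:
    """Max-over-time pooling: keep one value per feature map, reducing dimensionality."""
    return [max(fm) if fm else 0.0 for fm in feature_maps]

def classify(features: List[float], weights: Dict[str, List[float]], biases: Dict[str, float]) -> Dict[str, float]:
    """Fully connected layer followed by softmax over event categories."""
    scores = {
        category: biases.get(category, 0.0) + sum(w * f for w, f in zip(ws, features))
        for category, ws in weights.items()
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

# Hypothetical categories and weights for illustration only.
pooled = max_pool([[2.0, 3.0, 2.0], [3.0, 1.0]])  # [3.0, 3.0]
probs = classify(
    pooled,
    {"billing": [1.0, 0.0], "network_outage": [0.0, 2.0], "other": [0.0, 0.0]},
    {},
)
event_category = max(probs, key=probs.get)  # "network_outage"
```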
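The text does not specify the preset risk early-warning model, so the sketch below substitutes a deliberately simple one: it counts classified events per category and flags any category whose count reaches a per-category threshold. The threshold values, the default, and the analyze_risk name are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable

def analyze_risk(categories: Iterable[str], thresholds: Dict[str, int], default_threshold: int = 10) -> Dict[str, dict]:
    """Hypothetical threshold model: alert when a category's event count reaches its threshold."""
    counts = Counter(categories)
    result = {}
    for category, count in counts.items():
        limit = thresholds.get(category, default_threshold)
        result[category] = {"count": count, "threshold": limit, "alert": count >= limit}
    return result

report = analyze_risk(["billing", "billing", "outage", "billing"], {"billing": 3, "outage": 2})
# billing: 3 events >= 3, so alert; outage: 1 event < 2, so no alert
```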
In this embodiment of the invention, during real-time data processing, text data submitted by users is collected through a distributed publish-subscribe message system, and text data from different data sources is sent to different topic objects of that system and stored in the preset queues corresponding to those topic objects. Text data is then taken from the preset queues and converted by a natural language processing model into word embedding vectors that form a word embedding vector matrix, and the matrix is convolved; the convolution result is classified to obtain event category information, and risk analysis is performed on that information to obtain a risk analysis result. Combining real-time data aggregation based on a distributed publish-subscribe message system with a fast natural language processing model based on a convolutional neural network allows potential risk events to be located and warned about efficiently and promptly.
Referring to fig. 4, a flowchart of the steps of another data processing method according to an embodiment of the present invention is shown; the method may include the following steps:
Step 401, during real-time data processing, collecting text data submitted by a user through a distributed publish-subscribe message system, and storing the text data in a preset queue.
When a product launches a new service, or at a specific time of interest such as a sales promotion, user requirements need to be identified and responded to promptly so that potential risks do not escalate and cause adverse effects. To this end, text data that users submit through web pages can be collected through the distributed publish-subscribe message system.
After the text data submitted by users is collected, the distributed publish-subscribe message system can store it in a preset queue, such as a queue dedicated to user messages, for subsequent data processing.
Step 402, performing event positioning on the text data in the preset queue by using a natural language processing model to obtain event positioning information of the text data.
The text data stored in the preset queue is processed in its queue order, for example the order in which it entered the queue, so that text data that enters the queue first is processed first.
In a specific implementation, a natural language processing model may be used to perform event positioning on the text data in the preset queue, that is, to locate the key information in the user's feedback, thereby obtaining the event positioning information.
Step 403, determining a first risk value from the event positioning information along the event dimension.
For different types of events, the event dimension can be divided into different time periods, and the frequency of each type of event in each period is counted continuously to obtain the first risk value.
Step 404, determining a second risk value from the event positioning information along the user dimension.
Along the user dimension, the frequency with which each user submits feedback within a fixed time window can be counted to obtain the second risk value.
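First-in, first-out processing of the preset queue can be sketched with a deque. The UserMessage and PresetQueue names are hypothetical, and a single in-process queue stands in for the message system's dedicated user-message queue.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterator

@dataclass
class UserMessage:
    text: str
    enqueued_at: float  # time the message entered the queue

class PresetQueue:
    """Hypothetical queue dedicated to user messages: first in, first processed."""

    def __init__(self) -> None:
        self._items: Deque[UserMessage] = deque()

    def put(self, text: str, enqueued_at: float) -> None:
        self._items.append(UserMessage(text, enqueued_at))

    def drain(self) -> Iterator[UserMessage]:
        """Yield messages in arrival order, removing each one as it is consumed."""
        while self._items:
            yield self._items.popleft()

    def __len__(self) -> int:
        return len(self._items)

q = PresetQueue()
q.put("cannot log in", 1.0)
q.put("bill is wrong", 2.0)
processed = [m.text for m in q.drain()]  # ["cannot log in", "bill is wrong"]
```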
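One way to realise continuous per-event counting is a sliding time window per event type, sketched below. The source does not say how counts become the first risk value, so the normalisation (window count divided by a reference count, capped at 1) and all names here are assumptions.

```python
from collections import defaultdict, deque
from typing import Deque, Dict

class EventWindowCounter:
    """Continuously counts events per type over a sliding time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self._times: Dict[str, Deque[float]] = defaultdict(deque)

    def add(self, event_type: str, ts: float) -> None:
        # Assumes events arrive with non-decreasing timestamps.
        self._times[event_type].append(ts)

    def count(self, event_type: str, now: float) -> int:
        q = self._times[event_type]
        # Drop events outside the window (now - window, now].
        while q and q[0] <= now - self.window:
            q.popleft()
        return len(q)

def first_risk_value(counter: EventWindowCounter, event_type: str, now: float, reference_count: int) -> float:
    """Hypothetical normalisation: windowed count relative to a reference count, capped at 1."""
    return min(1.0, counter.count(event_type, now) / reference_count)

c = EventWindowCounter(window_seconds=60)
for t in (0, 10, 50, 70):
    c.add("outage", t)
r1 = first_risk_value(c, "outage", now=80, reference_count=4)  # 2 events in (20, 80] -> 0.5
```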
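The per-user count within a fixed time window can be sketched with tumbling windows: every submission falls into the bucket timestamp // window. As with the event dimension, the conversion into a second risk value (count over a reference count, capped at 1) and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

def user_counts_in_window(submissions: Iterable[Tuple[str, float]], window_seconds: float, now: float) -> Counter:
    """Count submissions per user inside the fixed (tumbling) window that contains `now`."""
    bucket = now // window_seconds
    return Counter(user for user, ts in submissions if ts // window_seconds == bucket)

def second_risk_value(counts: Counter, user: str, reference_count: int) -> float:
    """Hypothetical normalisation: the user's count relative to a reference count, capped at 1."""
    return min(1.0, counts[user] / reference_count)

subs = [("u1", 5), ("u1", 20), ("u2", 30), ("u1", 65)]
counts = user_counts_in_window(subs, window_seconds=60, now=40)  # window [0, 60)
r2 = second_risk_value(counts, "u1", reference_count=4)  # 2 submissions -> 0.5
```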
Step 405, determining a third risk value according to the first risk value and the second risk value.
After the first risk value and the second risk value are obtained, a weighted average of the two can be computed as the third risk value, which can serve as the risk analysis result.
In this embodiment of the invention, during real-time data processing, text data submitted by users is collected through a distributed publish-subscribe message system and stored in a preset queue; a natural language processing model performs event positioning on the text data in the queue to obtain its event positioning information; a first risk value is determined from the event positioning information along the event dimension, a second risk value along the user dimension, and a third risk value from the first and second. Because the first risk value is computed per event and the second per user, combining the two yields an overall early-warning score for the time period, which enables risk analysis of abnormal data and improves the accuracy of the risk analysis result.
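The weighted average can be written as R3 = (w1·R1 + w2·R2) / (w1 + w2), where R1 and R2 are the first and second risk values. The sketch below assumes equal default weights and a simple comparison against an early-warning value; neither the weights nor the threshold are given in the source, and the function names are hypothetical.

```python
def third_risk_value(first: float, second: float, w_event: float = 0.5, w_user: float = 0.5) -> float:
    """Weighted average of the event-dimension and user-dimension risk values."""
    total = w_event + w_user
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return (w_event * first + w_user * second) / total

def should_warn(score: float, early_warning_value: float) -> bool:
    """Flag a potential risk event when the combined score reaches the early-warning value."""
    return score >= early_warning_value

r3 = third_risk_value(1.0, 0.0, w_event=0.6, w_user=0.4)  # 0.6
warn = should_warn(r3, early_warning_value=0.5)  # True
```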
It should be noted that, for simplicity of description, the method embodiments are described as a series of combined acts, but those skilled in the art will recognize that the embodiments of the present invention are not limited by the described order of acts, because according to the embodiments some steps may be performed in other orders or concurrently. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and the acts involved are not necessarily all required by the embodiments of the invention.
Referring to fig. 5, a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention is shown; the apparatus may include the following modules:
the text data collection module 501 is configured to collect text data submitted by a user through a distributed publish-subscribe message system during real-time data processing, and store the text data in a preset queue;
the text data processing module 502 is configured to perform event positioning on the text data in the preset queue by using a natural language processing model, so as to obtain event positioning information of the text data;
and the risk analysis module 503 is configured to perform risk analysis according to the event positioning information to obtain a risk analysis result.
In an embodiment of the present invention, the apparatus further includes:
a distributed publish-subscribe message system optimization module, configured to optimize the distributed publish-subscribe message system according to the risk analysis result.
In an embodiment of the present invention, the text data collection module 501 includes:
a topic-based storage submodule, configured to collect, during real-time data processing, text data submitted by a user through the distributed publish-subscribe message system, and send text data from different data sources to different topic objects of the system so that it is stored in the preset queue corresponding to each topic object.
In an embodiment of the present invention, the event positioning information includes event category information, the natural language processing model is a natural language processing model based on a convolutional neural network, and the text data processing module 502 includes:
the text data acquisition submodule is used for acquiring text data from the preset queue;
the convolution processing submodule is used for converting the acquired text data into word embedding vectors by adopting a natural language processing model to obtain a word embedding vector matrix and carrying out convolution processing on the word embedding vector matrix;
and the data classification submodule is used for classifying the convolution processing result to obtain event category information.
In an embodiment of the present invention, the risk analysis module 503 includes:
the first risk value determining submodule is used for determining a first risk value according to the event dimension and the event positioning information;
the second risk value determining submodule is used for determining a second risk value according to the user dimension and the event positioning information;
and the third risk value determining submodule is used for determining a third risk value according to the first risk value and the second risk value.
In an embodiment of the present invention, the apparatus further includes:
an early warning and preprocessing module, configured to perform risk early warning and/or risk preprocessing according to the risk analysis result.
In an embodiment of the present invention, the apparatus further includes:
a denoising module, configured to denoise the text data submitted by the user.
In this embodiment of the invention, during real-time data processing, text data submitted by users is collected through a distributed publish-subscribe message system and stored in a preset queue; a natural language processing model performs event positioning on the text data in the queue to obtain its event positioning information; and risk analysis is performed according to that information to obtain a risk analysis result. User requirements are thus located and analyzed for risk automatically, which improves the timeliness and accuracy of locating user requirements, reduces potential risks, and improves efficiency without requiring extensive manual involvement.
An embodiment of the present invention further provides an electronic device, which may include a processor, a memory, and a computer program stored in the memory and executable on the processor; when executed by the processor, the computer program implements the data processing method described above.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the above data processing method.
The device embodiment is basically similar to the method embodiment, so it is described relatively briefly; for related details, refer to the corresponding description of the method embodiment.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "include", "including" or any other variations thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device including a series of elements includes not only those elements but also other elements not explicitly listed or inherent to such process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or terminal apparatus that comprises the element.
The data processing method and apparatus provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the invention, and the description of the embodiments is intended only to help understand the method and its core ideas. Meanwhile, those skilled in the art may, following the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A method of data processing, the method comprising:
in the process of real-time data processing, text data submitted by a user is collected through a distributed publish-subscribe message system, and the text data is stored in a preset queue;
adopting a natural language processing model to perform event positioning on the text data in the preset queue to obtain event positioning information of the text data;
and performing risk analysis according to the event positioning information to obtain a risk analysis result.
2. The method of claim 1, further comprising:
and optimizing the distributed publish-subscribe message system according to the risk analysis result.
3. The method according to claim 1 or 2, wherein during the real-time data processing, collecting text data submitted by a user through a distributed publish-subscribe message system and storing the text data in a preset queue, comprises:
in the process of real-time data processing, text data submitted by a user is collected through a distributed publish-subscribe message system, and the text data of different data sources is sent to different topic objects of the distributed publish-subscribe message system to be stored in the preset queue corresponding to each topic object.
4. The method according to claim 1 or 2, wherein the event positioning information includes event category information, the natural language processing model is a convolutional neural network-based natural language processing model, and the event positioning of the text data in the preset queue by using the natural language processing model to obtain the event positioning information of the text data includes:
acquiring text data from the preset queue;
converting the acquired text data into word embedding vectors by adopting a natural language processing model to obtain a word embedding vector matrix, and performing convolution processing on the word embedding vector matrix;
and classifying the convolution processing result to obtain event category information.
5. The method according to claim 1 or 2, wherein the performing risk analysis according to the event positioning information to obtain a risk analysis result comprises:
determining, along the event dimension, a first risk value according to the event positioning information;
determining, along the user dimension, a second risk value according to the event positioning information;
determining a third risk value according to the first risk value and the second risk value.
6. The method of claim 1, further comprising:
and carrying out risk early warning and/or risk preprocessing according to the risk analysis result.
7. The method of claim 1, further comprising, prior to the collecting text data submitted by a user via a distributed publish-subscribe message system:
and denoising the text data submitted by the user.
8. An apparatus for data processing, the apparatus comprising:
the text data acquisition module is used for collecting text data submitted by a user through a distributed publish-subscribe message system in the process of real-time data processing, and storing the text data in a preset queue;
the text data processing module is used for carrying out event positioning on the text data in the preset queue by adopting a natural language processing model to obtain event positioning information of the text data;
and the risk analysis module is used for carrying out risk analysis according to the event positioning information to obtain a risk analysis result.
9. An electronic device comprising a processor, a memory and a computer program stored on the memory and capable of running on the processor, the computer program, when executed by the processor, implementing a method of data processing according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method of data processing according to any one of claims 1 to 7.
CN202210899406.2A 2022-07-28 2022-07-28 Data processing method and device Pending CN115329745A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210899406.2A CN115329745A (en) 2022-07-28 2022-07-28 Data processing method and device

Publications (1)

Publication Number Publication Date
CN115329745A true CN115329745A (en) 2022-11-11

Family

ID=83920169

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination