CN117520410A - Service data processing method, device, electronic equipment and computer readable medium - Google Patents


Info

Publication number
CN117520410A
Authority
CN
China
Prior art keywords
monitored
service data
data
updated
sequence
Legal status
Pending
Application number
CN202311453968.5A
Other languages
Chinese (zh)
Inventor
胡江南
Current Assignee
Fusionskye Beijing Software Co ltd
Original Assignee
Fusionskye Beijing Software Co ltd
Application filed by Fusionskye Beijing Software Co ltd
Priority to CN202311453968.5A
Publication of CN117520410A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiments of the disclosure disclose a service data processing method and apparatus, an electronic device, and a computer readable medium. One embodiment of the method comprises the following steps: receiving a service data set to be monitored; updating the array values of an initial bitmap array to obtain an updated bitmap array; determining a target number; generating an updated service data sequence to be monitored; generating a service data feature set to be monitored; performing anomaly detection on the updated service data sequence to be monitored to obtain an abnormal data set; performing data repair on the abnormal data set to obtain a repaired data set; and generating a processed service data sequence to be monitored according to the repaired data set and the updated service data sequence to be monitored. This embodiment enables effective detection of missing service data, avoids the service data discontinuity caused by directly storing the service data reported by subordinate institutions, and indirectly reduces the cost of subsequent data retrieval on the service data.

Description

Service data processing method, device, electronic equipment and computer readable medium
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular, to a service data processing method, apparatus, electronic device, and computer readable medium.
Background
With the development of computer-related technology, the electronization of data has become widespread. Meanwhile, as the number of subordinate institutions and the volume of real-time service data they generate keep growing, making efficient use of massive service data has become important. At present, service data is generally processed as follows: the service data reported by subordinate institutions is stored directly.
However, the inventors found that when the above manner is adopted, there are often the following technical problems:
First, time-series service data is usually warehoused in time order, but when the service data reported by subordinate institutions is stored directly, missing service data is difficult to detect effectively, which may leave the service data discontinuous and increases the cost of subsequent data retrieval on the service data;
Second, large volumes of service data often contain noise and outliers that can degrade the performance of abnormal data detection, so that abnormal service data cannot be accurately detected and repaired;
Third, the global features of a large amount of service data may be affected by the overall data distribution, while its local features may be disturbed by noise or local changes. A single feature processing approach therefore makes it difficult to accurately predict the circulation amount of an article, which in turn may prevent effective and accurate data repair of abnormal service data.
The above information disclosed in this background section is only for enhancement of understanding of the background of the inventive concept and, therefore, may contain information that does not form the prior art that is already known to those of ordinary skill in the art in this country.
Disclosure of Invention
This section is provided to introduce, in simplified form, concepts that are further described in the detailed description below. It is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose service data processing methods, apparatuses, electronic devices, and computer readable media to solve one or more of the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a service data processing method, the method comprising: receiving a service data set to be monitored, wherein each service data to be monitored in the set is service data that is reported in real time by a subordinate institution and corresponds to identification information; updating the array values of an initial bitmap array according to the identification information corresponding to the service data to be monitored in the service data set to be monitored, to obtain an updated bitmap array; determining a target number according to the updated bitmap array, wherein the target number represents the number of array values in the updated bitmap array that satisfy a first screening condition; in response to determining that the target number is smaller than the identification information corresponding to target service data to be monitored, generating an updated service data sequence to be monitored according to the service data set to be monitored, wherein the target service data to be monitored is the service data to be monitored in the set whose corresponding identification information satisfies a second screening condition; generating a service data feature set to be monitored according to a pre-trained data feature extraction model and the updated service data sequence to be monitored; performing, by using a pre-trained abnormal data detection model, anomaly detection on each updated service data to be monitored in the updated service data sequence to be monitored to generate abnormal data, so as to obtain an abnormal data set; performing, based on the service data feature set to be monitored, data repair on each abnormal data in the abnormal data set to obtain a repaired data set; and generating a processed service data sequence to be monitored according to the repaired data set and the updated service data sequence to be monitored.
In a second aspect, some embodiments of the present disclosure provide a service data processing apparatus, the apparatus including: a receiving unit configured to receive a service data set to be monitored, wherein each service data to be monitored in the set is service data that is reported in real time by a subordinate institution and corresponds to identification information; an updating unit configured to update the array values of an initial bitmap array according to the identification information corresponding to the service data to be monitored in the service data set to be monitored, to obtain an updated bitmap array; a determining unit configured to determine a target number according to the updated bitmap array, where the target number represents the number of array values in the updated bitmap array that satisfy a first screening condition; a first generating unit configured to generate, in response to determining that the target number is smaller than the identification information corresponding to target service data to be monitored, an updated service data sequence to be monitored according to the service data set to be monitored, wherein the target service data to be monitored is the service data to be monitored in the set whose corresponding identification information satisfies a second screening condition; a second generating unit configured to generate a service data feature set to be monitored according to a pre-trained data feature extraction model and the updated service data sequence to be monitored; an anomaly detection unit configured to perform, by using a pre-trained abnormal data detection model, anomaly detection on each updated service data to be monitored in the updated service data sequence to be monitored to generate abnormal data, so as to obtain an abnormal data set; a data repair unit configured to perform, based on the service data feature set to be monitored, data repair on each abnormal data in the abnormal data set to obtain a repaired data set; and a third generating unit configured to generate a processed service data sequence to be monitored according to the repaired data set and the updated service data sequence to be monitored.
In a third aspect, some embodiments of the present disclosure provide an electronic device comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors causes the one or more processors to implement the method described in any of the implementations of the first aspect above.
In a fourth aspect, some embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method described in any of the implementations of the first aspect above.
The above embodiments of the present disclosure have the following beneficial effects: the service data processing method of some embodiments of the present disclosure achieves effective detection of missing service data, avoids the service data discontinuity caused by directly storing the service data reported by subordinate institutions, and indirectly reduces the cost of subsequent data retrieval on the service data. Specifically, the retrieval cost increases for the following reason: time-series service data is usually warehoused in time order, but when the service data reported by subordinate institutions is stored directly, missing service data is difficult to detect effectively, which may leave the service data discontinuous and increase the cost of subsequent data retrieval. Based on this, in the service data processing method of some embodiments of the present disclosure, first, a service data set to be monitored is received, where each service data to be monitored in the set is service data that corresponds to identification information and is reported in real time by a subordinate institution. Second, the array values of an initial bitmap array are updated according to the identification information corresponding to the service data to be monitored in the set, to obtain an updated bitmap array. Mapping a large amount of service data onto a binary bitmap for the subsequent judgment of whether the service data is continuous effectively improves operating efficiency and greatly reduces the consumption of computing resources compared with a traversal approach.
For example, when the identifiers to be traversed are stored as integers (int), service data on the order of hundreds of millions of records consumes a large amount of memory, and the intermediate data generated during traversal consumes further storage resources (e.g., memory or external storage). Then, a target number is determined according to the updated bitmap array, where the target number represents the number of array values in the updated bitmap array that satisfy the first screening condition, and whether the data is continuous is judged from the obtained target number. Next, in response to determining that the target number is smaller than the identification information corresponding to the target service data to be monitored, an updated service data sequence to be monitored is generated according to the service data set to be monitored, where the target service data to be monitored is the service data to be monitored in the set whose corresponding identification information satisfies the second screening condition. Because time-series service data is warehoused in time order, discontinuous data affects subsequent data processing. For example, when the primary keys of the service data are discontinuous, additional data query statements must be designed, which indirectly increases the retrieval cost. By generating a continuous updated service data sequence to be monitored, the method avoids the increase in retrieval cost caused by data discontinuity. Further, a service data feature set to be monitored is generated according to the pre-trained data feature extraction model and the updated service data sequence to be monitored, so as to obtain the data features of the service data to be monitored for the subsequent repair of abnormal data.
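The memory argument above can be made concrete with simple arithmetic. The sketch below is illustrative only and assumes 10^8 identifiers, 4-byte (32-bit) integers for the integer approach, and one bit per identifier for the bitmap. It ignores container overhead and the intermediate traversal data mentioned in the text, and the function names are hypothetical.

```python
# Rough memory comparison for tracking N identifiers (illustrative assumptions only).

def int_array_bytes(n: int, bytes_per_int: int = 4) -> int:
    """Bytes needed to hold n identifiers as fixed-width integers (assumed 4 bytes each)."""
    return n * bytes_per_int


def bitmap_bytes(n: int) -> int:
    """Bytes needed for a bitmap with one bit per identifier, rounded up to whole bytes."""
    return (n + 7) // 8


n = 100_000_000  # one hundred million identifiers, chosen for illustration
print(int_array_bytes(n) / 1e6, "MB as 32-bit integers")  # 400.0 MB
print(bitmap_bytes(n) / 1e6, "MB as a bitmap")            # 12.5 MB
```

Under these assumptions the bitmap is 32 times smaller, because each identifier costs 1 bit instead of 32.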
In addition, through the pre-trained abnormal data detection model, anomaly detection is performed on each updated service data to be monitored in the updated service data sequence to generate abnormal data, yielding an abnormal data set. Then, based on the service data feature set to be monitored, data repair is performed on each abnormal data in the abnormal data set to obtain a repaired data set, thereby repairing the abnormal service data. Finally, a processed service data sequence to be monitored is generated according to the repaired data set and the updated service data sequence to be monitored. In this way, the discontinuity caused by directly storing the service data reported by subordinate institutions is avoided, the continuity of the service data is ensured, and the cost of subsequent data retrieval on the service data is indirectly reduced.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a flow chart of some embodiments of a service data processing method according to the present disclosure;
FIG. 2 is a schematic structural diagram of some embodiments of a service data processing apparatus according to the present disclosure;
FIG. 3 is a schematic structural diagram of an electronic device suitable for implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that the modifiers "a", "an" and "a plurality of" in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Referring to fig. 1, a flow 100 of some embodiments of a business data processing method according to the present disclosure is shown. The service data processing method comprises the following steps:
Step 101, receiving a service data set to be monitored.
In some embodiments, the execution body of the service data processing method (for example, a computing device) may receive a service data set to be monitored through a wired or wireless connection, where each service data to be monitored in the set is service data that corresponds to identification information and is reported in real time by a subordinate institution. A subordinate institution is an institution that provides the service data to be monitored; in practice, the subordinate institution may be a stock exchange. The identification information is information that uniquely identifies the service data to be monitored.
It should be noted that the wireless connection may include, but is not limited to, 3G/4G/5G connections, WiFi connections, Bluetooth connections, WiMAX connections, ZigBee connections, UWB (ultra wideband) connections, and other wireless connections now known or developed in the future.
The computing device may be hardware or software. When the computing device is hardware, it may be implemented as a distributed cluster of multiple servers or terminal devices, or as a single server or a single terminal device. When the computing device is software, it may be installed in the hardware devices listed above, and may be implemented as multiple pieces of software or software modules (for example, for providing distributed services) or as a single piece of software or software module. No specific limitation is made here. It should be understood that there may be any number of computing devices, as required by the implementation.
Step 102, updating the array values of the initial bitmap array according to the identification information corresponding to the service data to be monitored in the service data set to be monitored, to obtain an updated bitmap array.
In some embodiments, the execution body may update the array values of the initial bitmap array according to the identification information corresponding to the service data to be monitored in the service data set to be monitored, so as to obtain an updated bitmap array. The initial bitmap array is an initialized array composed of binary bits.
As an example, the executing body may perform an array value assignment on the initial bitmap array according to the identification information corresponding to the to-be-monitored service data in the to-be-monitored service data set, to obtain an updated bitmap array.
In some optional implementations of some embodiments, the executing body updates an array value of the initial bitmap array according to the identification information corresponding to each service data to be monitored in the service data set to be monitored, to obtain an updated bitmap array, and may include the following steps:
For each service data to be monitored in the service data set to be monitored, update the array value in the initial bitmap array that matches the identification information corresponding to that service data to a preset array value, to obtain the updated bitmap array.
Here, an array value is a value in the initial bitmap array; in practice, it is a binary value. The preset array value may be "1".
In practice, the execution body updates the array value of the initial bitmap array at the index equal to the identification information corresponding to the service data to be monitored to the preset array value, yielding the updated bitmap array.
As an example, if the identification information corresponding to the service data to be monitored is "2", the execution body may update the array value at index "2" in the initial bitmap array to "1".
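The index-based update can be sketched with a byte-packed bitmap. This is a minimal sketch under stated assumptions, not the patented implementation: the class name `Bitmap` is hypothetical, identifiers are assumed to be non-negative integers below a size fixed in advance, and index 0 is simply left unused so that identifiers 1 to 12 map directly to bit positions as in the example.

```python
class Bitmap:
    """Hypothetical fixed-size bitmap backed by a bytearray; bit i stands for identifier i."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.bits = bytearray((size + 7) // 8)  # the initial bitmap array: every bit is 0

    def set(self, index: int) -> None:
        """Set the bit at `index` to the preset array value 1."""
        if not 0 <= index < self.size:
            raise IndexError(index)
        self.bits[index >> 3] |= 1 << (index & 7)  # byte index = index // 8, bit = index % 8

    def get(self, index: int) -> int:
        """Return the array value (0 or 1) stored at `index`."""
        return (self.bits[index >> 3] >> (index & 7)) & 1


bitmap = Bitmap(13)  # positions 0..12; position 0 is unused in this sketch
bitmap.set(2)        # identification information "2" -> array value at index 2 becomes 1
print(bitmap.get(2), bitmap.get(3))  # 1 0
```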
Step 103, determining the target number according to the updated bitmap array.
In some embodiments, the execution body may determine a target number according to the updated bitmap array, where the target number represents the number of array values in the updated bitmap array that satisfy the first screening condition. The first screening condition is "the array value is '1'".
In practice, the execution body may traverse the updated bitmap array to determine the number of array values that satisfy the first filtering condition.
As an example, the identification information set corresponding to the service data set to be monitored is "{2,4,7,10,3,5,6,1}", and the updated bitmap array (positions 1 to 12) may be "{1,1,1,1,1,1,1,0,0,1,0,0}". Thus, the target number may be "8".
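Counting the array values that equal 1 is a population count. The sketch below is self-contained and illustrative: `build_bitmap` and `count_set_bits` are hypothetical names, the bitmap is packed into a `bytearray` with index 0 unused, and the count is taken one byte at a time with `bin(byte).count("1")` instead of inspecting every bit individually. On Python 3.10+ `int.bit_count()` could be used instead.

```python
def build_bitmap(ids, size):
    """Pack identifiers into a bytearray bitmap; bit i is set when identifier i was received."""
    bits = bytearray((size + 7) // 8)
    for i in ids:
        bits[i >> 3] |= 1 << (i & 7)
    return bits


def count_set_bits(bits: bytearray) -> int:
    """Target number: how many array values satisfy the first screening condition (value is 1)."""
    return sum(bin(byte).count("1") for byte in bits)


ids = [2, 4, 7, 10, 3, 5, 6, 1]
bitmap = build_bitmap(ids, size=13)  # positions 1..12 as in the example; 0 unused
print(count_set_bits(bitmap))  # 8
```

A side effect of the bitmap is that a duplicated identifier sets the same bit twice, so it is counted only once.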
Step 104, in response to determining that the target number is smaller than the identification information corresponding to the target service data to be monitored, generating an updated service data sequence to be monitored according to the service data set to be monitored.
In some embodiments, in response to determining that the target number is smaller than the identification information corresponding to the target service data to be monitored, the execution body may generate an updated service data sequence to be monitored according to the service data set to be monitored, where the target service data to be monitored is the service data to be monitored in the set whose corresponding identification information satisfies the second screening condition. The second screening condition is that the target service data to be monitored is the service data to be monitored with the largest corresponding identification information in the service data set to be monitored.
In practice, in response to determining that the target number is smaller than the identification information corresponding to the target service data to be monitored, the execution subject may traverse the service data set to be monitored to generate an updated service data sequence to be monitored.
As an example, the identification information set corresponding to the service data set to be monitored is "{2,4,7,10,3,5,6,1}", the updated bitmap array may be "{1,1,1,1,1,1,1,0,0,1,0,0}", the target number may be "8", the identification information corresponding to the target service data to be monitored may be "10", and the service data set to be monitored may be:
{id=2:2,200,97,55,44;
id=4:10,1000,82,55,11;
id=7:8,800,53,66,66;
id=10:2,200,43,66,66;
id=3:5,500,92,66,33;
id=5:12,1200,86,66,880000;
id=6:15,1500,61,55,44;
id=1:1,100,99,55,66;}
Thus, the updated service data sequence to be monitored may be:
{id=1:1,100,99,55,66;
id=2:2,200,97,55,44;
id=3:5,500,92,66,33;
id=4:10,1000,82,55,11;
id=5:12,1200,86,66,880000;
id=6:15,1500,61,55,44;
id=7:8,800,53,66,66;
id=8:3,300,50,55,55;
id=9:5,500,45,88,99;
id=10:2,200,43,66,66;
id=11:3,300,40,45,46;
id=12:10,1000,40,48,66;}
Here, "id" denotes the identification information.
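The trigger condition of step 104 amounts to a counting argument. If identifiers are positive integers starting at 1 and the largest received identifier is m, then fewer than m distinct identifiers means at least one value in 1..m is missing. The sketch below is hypothetical: `is_discontinuous` is an illustrative name, and `len(set(ids))` stands in for the bitmap's count of 1-bits.

```python
def is_discontinuous(ids) -> bool:
    """Return True when the received identifiers do not cover 1..max(ids) completely.

    target_number: count of distinct identifiers (what the bitmap's 1-bits count).
    max_id: identifier of the target service data (second screening condition: the largest).
    Assumes identifiers are positive integers starting at 1.
    """
    target_number = len(set(ids))
    max_id = max(ids)
    return target_number < max_id


print(is_discontinuous([2, 4, 7, 10, 3, 5, 6, 1]))  # True: 8 < 10, so identifiers are missing
print(is_discontinuous([3, 1, 2]))                  # False: 1..3 are all present
```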
In some optional implementations of some embodiments, the generating, by the executing body, an updated service data sequence to be monitored according to the service data set to be monitored may include the following steps:
First, sort the service data to be monitored in the service data set to be monitored to obtain a sorted service data sequence to be monitored.
In practice, the execution body may bubble-sort the service data set to be monitored by the identification information corresponding to each service data to be monitored, to obtain the sorted service data sequence to be monitored.
As an example, the service data set to be monitored may be:
{id=2:2,200,97,55,44;
id=4:10,1000,82,55,11;
id=7:8,800,53,66,66;
id=10:2,200,43,66,66;
id=3:5,500,92,66,33;
id=5:12,1200,86,66,880000;
id=6:15,1500,61,55,44;
id=1:1,100,99,55,66;}
The sorted service data sequence to be monitored may be:
{id=1:1,100,99,55,66;
id=2:2,200,97,55,44;
id=3:5,500,92,66,33;
id=4:10,1000,82,55,11;
id=5:12,1200,86,66,880000;
id=6:15,1500,61,55,44;
id=7:8,800,53,66,66;
id=10:2,200,43,66,66;}
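The bubble sort by identification information can be sketched as follows. This is an illustrative sketch, not the author's code. Each record is assumed to be an `(id, values)` pair, and the five values are copied from the example above without interpretation, since the text does not say what each field means. The early exit when a pass makes no swaps is a common bubble-sort optimization, not something stated in the source.

```python
def bubble_sort_by_id(records):
    """Bubble sort (id, values) pairs by id in ascending order; returns a new list."""
    items = list(records)
    n = len(items)
    for i in range(n - 1):
        swapped = False
        # After pass i, the largest i + 1 ids are already in their final positions.
        for j in range(n - 1 - i):
            if items[j][0] > items[j + 1][0]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:  # no swaps in a full pass: the list is sorted
            break
    return items


received = [
    (2, (2, 200, 97, 55, 44)),
    (4, (10, 1000, 82, 55, 11)),
    (7, (8, 800, 53, 66, 66)),
    (10, (2, 200, 43, 66, 66)),
    (3, (5, 500, 92, 66, 33)),
    (5, (12, 1200, 86, 66, 880000)),
    (6, (15, 1500, 61, 55, 44)),
    (1, (1, 100, 99, 55, 66)),
]
print([rid for rid, _ in bubble_sort_by_id(received)])  # [1, 2, 3, 4, 5, 6, 7, 10]
```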
Second, determine an abnormal identification information sequence based on the sorted service data sequence to be monitored.
Each piece of abnormal identification information in the abnormal identification information sequence is identification information that is missing from the service data set to be monitored, that is, it corresponds to service data to be monitored that the subordinate institution needs to report again.
In practice, the execution body may first create a complete sequence containing 1 to 12 as an initial value sequence. Then, for each initial value in the initial value sequence, if the initial value does not exist in the identification information sequence corresponding to the sorted service data sequence to be monitored, the execution body determines that initial value as abnormal identification information in the abnormal identification information sequence.
As an example, the initial value sequence may be "{1,2,3,4,5,6,7,8,9,10,11,12}", and the identification information sequence corresponding to the sorted service data sequence to be monitored may be "{1,2,3,4,5,6,7,10}". Thus, the abnormal identification information sequence may be "{8,9,11,12}".
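The membership test over the initial value sequence can be sketched as a set lookup. In this hypothetical sketch, `abnormal_ids` is an illustrative name, and the upper bound (12) is passed in explicitly because the text does not state how the length of the initial value sequence is chosen. Converting the received identifiers to a `set` makes each "does this initial value exist" check constant time on average.

```python
def abnormal_ids(sorted_ids, upper):
    """Return the values in 1..upper absent from sorted_ids, in ascending order."""
    present = set(sorted_ids)
    return [value for value in range(1, upper + 1) if value not in present]


print(abnormal_ids([1, 2, 3, 4, 5, 6, 7, 10], 12))  # [8, 9, 11, 12]
```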
Third, send the abnormal identification information sequence to the subordinate institution, so that the subordinate institution can re-report the missing data according to the abnormal identification information sequence.
In practice, the execution body may send the abnormal identification information sequence to the subordinate institution through a wired or wireless connection.
It should be noted that the wireless connection may include, but is not limited to, 3G/4G/5G connections, WiFi connections, Bluetooth connections, WiMAX connections, ZigBee connections, UWB (ultra wideband) connections, and other wireless connections now known or developed in the future.
Fourth, in response to receiving a candidate service data sequence to be monitored that the subordinate institution uploads for the abnormal identification information sequence, concatenate the sorted service data sequence to be monitored and the candidate service data sequence to be monitored to obtain a spliced service data sequence to be monitored.
As an example, the sorted service data sequence to be monitored may be:
{id=1:1,100,99,55,66;
id=2:2,200,97,55,44;
id=3:5,500,92,66,33;
id=4:10,1000,82,55,11;
id=5:12,1200,86,66,880000;
id=6:15,1500,61,55,44;
id=7:8,800,53,66,66;
id=10:2,200,43,66,66;}
The candidate service data sequence to be monitored may be:
{id=8:3,300,50,55,55;
id=9:5,500,45,88,99;
id=11:3,300,40,45,46;
id=12:10,1000,40,48,66;}
Thus, the spliced service data sequence to be monitored may be:
{id=1:1,100,99,55,66;
id=2:2,200,97,55,44;
id=3:5,500,92,66,33;
id=4:10,1000,82,55,11;
id=5:12,1200,86,66,880000;
id=6:15,1500,61,55,44;
id=7:8,800,53,66,66;
id=10:2,200,43,66,66;
id=8:3,300,50,55,55;
id=9:5,500,45,88,99;
id=11:3,300,40,45,46;
id=12:10,1000,40,48,66;}
Fifth, sort the spliced service data sequence to be monitored to obtain the updated service data sequence to be monitored.
In practice, the execution subject may apply a selection sort to the spliced service data sequence to be monitored according to its corresponding identification information set, to obtain the updated service data sequence to be monitored.
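Steps four and five can be sketched as a concatenation followed by a sort keyed on the identification information. This is a minimal sketch, assuming each record is an `(id, fields)` pair and that the sort is a selection sort (any comparison sort yields the same order); the function names are hypothetical.

```python
def splice(sorted_records, candidate_records):
    """Step four: append the resupplied candidate records to the sorted records."""
    return list(sorted_records) + list(candidate_records)


def selection_sort_by_id(records):
    """Step five: selection sort on (id, fields) pairs, keyed on the id."""
    result = list(records)
    n = len(result)
    for i in range(n):
        # Locate the smallest id in the unsorted tail result[i:].
        min_index = i
        for j in range(i + 1, n):
            if result[j][0] < result[min_index][0]:
                min_index = j
        # Move it to position i.
        result[i], result[min_index] = result[min_index], result[i]
    return result


# A subset of the example records, as (identification information, fields) pairs.
sorted_sequence = [
    (1, [1, 100, 99, 55, 66]),
    (7, [8, 800, 53, 66, 66]),
    (10, [2, 200, 43, 66, 66]),
]
candidate_sequence = [
    (8, [3, 300, 50, 55, 55]),
    (9, [5, 500, 45, 88, 99]),
    (12, [10, 1000, 40, 48, 66]),
]
updated = selection_sort_by_id(splice(sorted_sequence, candidate_sequence))
print([record_id for record_id, _ in updated])  # [1, 7, 8, 9, 10, 12]
```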
Step 105: generate a set of service data features to be monitored according to the pre-trained data feature extraction model and the updated service data sequence to be monitored.
In some embodiments, the execution subject may generate the set of service data features to be monitored according to the pre-trained data feature extraction model and the updated service data sequence to be monitored. The data feature extraction model is used to extract features from the updated service data sequence to be monitored. In practice, the data feature extraction model may be a convolutional neural network model.
In practice, the execution body may take the updated service data sequence to be monitored as an input of the data feature extraction model, so as to generate a service data feature set to be monitored.
Optionally, the updated service data to be monitored in the updated service data sequence to be monitored includes a historical virtual article value and a historical virtual article circulation amount, and the data feature extraction model includes a historical virtual article value feature extraction model, a historical virtual article circulation amount feature extraction model, and a feature fusion model. The historical virtual article value may be the historical value corresponding to a virtual article, for example, a historical stock price. The historical virtual article circulation amount may be the historical circulation amount corresponding to a virtual article, for example, a historical stock trading volume.
In some optional implementations of some embodiments, generating, by the execution subject, the set of service data features to be monitored according to the pre-trained data feature extraction model and the updated service data sequence to be monitored may include the following steps:
First, perform data smoothing on the historical virtual article value included in each piece of updated service data to be monitored in the updated service data sequence to be monitored, to obtain a smoothed historical virtual article value sequence.
In practice, the execution subject may apply a moving average to the historical virtual article value included in each piece of updated service data to be monitored in the updated service data sequence, to obtain the smoothed historical virtual article value sequence.
Second, perform data conversion on the historical virtual article circulation amount included in each piece of updated service data to be monitored in the updated service data sequence to be monitored, to obtain a data-converted historical virtual article circulation amount sequence.
In practice, the execution subject may apply a logarithmic transformation to the historical virtual article circulation amount included in each piece of updated service data to be monitored in the updated service data sequence to be monitored, to obtain the data-converted historical virtual article circulation amount sequence.
Third, input the smoothed historical virtual article value sequence into the historical virtual article value feature extraction model to generate a first data feature.
The historical virtual article value feature extraction model is used to extract features from the smoothed historical virtual article value sequence. In practice, the historical virtual article value feature extraction model may include at least one convolution layer, with the convolution layers connected in series.
Fourth, extract data features from the data-converted historical virtual article circulation amount sequence through the historical virtual article circulation amount feature extraction model to generate a second data feature, where the first data feature and the second data feature are time-series features.
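The first two steps can be sketched as follows. This is a minimal sketch under stated assumptions: the source says only "moving average" and "logarithmic transformation", so the trailing window of 3 and the use of `log(1 + x)` (which keeps zero amounts finite) are illustrative choices, and the input numbers are illustrative rather than taken from the patent.

```python
import math


def moving_average(values, window=3):
    """Trailing simple moving average; early points average what is available."""
    smoothed = []
    running_sum = 0.0
    for i, v in enumerate(values):
        running_sum += v
        if i >= window:
            running_sum -= values[i - window]  # drop the value leaving the window
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


def log_transform(amounts):
    """Data conversion (assumed): log(1 + x) on each circulation amount."""
    return [math.log1p(a) for a in amounts]


values = [99, 97, 92, 82, 86]           # historical virtual article values
amounts = [100, 200, 500, 1000, 1200]   # historical circulation amounts
print(moving_average(values))
print(log_transform(amounts))
```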
The historical virtual article circulation quantity feature extraction model is used for extracting features of the historical virtual article circulation quantity sequence after data conversion. In practice, the model structure of the historical virtual article circulation quantity feature extraction model may be identical to the model structure of the historical virtual article value feature extraction model.
Fifth, input the first data feature and the second data feature into the feature fusion model to generate the set of service data features to be monitored.
The feature fusion model fuses the first data feature extracted by the historical virtual article value feature extraction model with the second data feature extracted by the historical virtual article circulation amount feature extraction model. The feature fusion model includes a feature concatenation layer, through which the execution subject may concatenate the first data feature and the second data feature to obtain the set of service data features to be monitored.
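The extraction and fusion steps can be sketched with NumPy. This is a minimal sketch, assuming 1-D inputs and fixed hand-picked kernels standing in for trained convolution weights; the kernels, the ReLU after each layer, and the function names are hypothetical. Both extractors share one structure, as the text allows, and fusion is plain concatenation.

```python
import numpy as np


def conv_stack(series, kernels):
    """Apply 1-D convolution layers in series, each followed by ReLU.

    'valid' mode is used, so each layer shortens the sequence by len(kernel) - 1.
    """
    x = np.asarray(series, dtype=float)
    for kernel in kernels:
        x = np.maximum(np.convolve(x, kernel, mode="valid"), 0.0)
    return x


def fuse(first_feature, second_feature):
    """Feature concatenation layer: join the two feature vectors end to end."""
    return np.concatenate([first_feature, second_feature])


# Hypothetical kernels; both extractors use the same structure.
kernels = [np.array([0.25, 0.5, 0.25]), np.array([1.0, -1.0])]

smoothed_values = [99.0, 98.0, 96.0, 90.0, 87.0]
log_amounts = [4.6, 5.3, 6.2, 6.9, 7.1]
first = conv_stack(smoothed_values, kernels)   # first data feature, length 2
second = conv_stack(log_amounts, kernels)      # second data feature, length 2
features = fuse(first, second)
print(features.shape)  # (4,)
```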
Step 106: perform anomaly detection on each piece of updated service data to be monitored in the updated service data sequence through a pre-trained abnormal data detection model to generate abnormal data, obtaining an abnormal data set.
In some embodiments, the execution subject may perform anomaly detection on each piece of updated service data to be monitored in the updated service data sequence through the pre-trained abnormal data detection model to generate abnormal data, obtaining an abnormal data set. The abnormal data represents the anomalous portion of the updated service data to be monitored in the updated service data sequence to be monitored. The number of pieces of abnormal data in the abnormal data set is less than or equal to the number of pieces of updated service data to be monitored in the updated service data sequence to be monitored. In practice, the execution subject inputs each piece of updated service data to be monitored in the updated service data sequence into the pre-trained abnormal data detection model to generate abnormal data, obtaining the abnormal data set.
Optionally, the abnormal data detection model includes a numerical data anomaly detection model and a discrete data anomaly detection model, and the abnormal data in the abnormal data set includes an abnormal historical virtual article value and an abnormal historical virtual article circulation amount. The numerical data anomaly detection model takes the service data to be monitored after the secondary cleaning as input and outputs the abnormal historical virtual article value included in the abnormal data corresponding to the updated service data to be monitored. The discrete data anomaly detection model takes the service data to be monitored after the secondary cleaning as input and outputs the abnormal historical virtual article circulation amount included in the abnormal data corresponding to the updated service data to be monitored. In practice, both models may be time-series models, for example, ARIMA (AutoRegressive Integrated Moving Average) models.
In some optional implementations of some embodiments, performing, by the execution subject, anomaly detection on each piece of updated service data to be monitored in the updated service data sequence through the pre-trained abnormal data detection model to generate abnormal data may include the following steps:
First, perform first data cleaning on the historical virtual article value included in the updated service data to be monitored, to obtain cleaned service data to be monitored.
In practice, the execution subject may normalize the historical virtual article value included in the updated service data to be monitored to obtain the cleaned service data to be monitored.
Second, perform second data cleaning on the historical virtual article circulation amount included in the cleaned service data to be monitored, to obtain the service data to be monitored after the secondary cleaning.
In practice, the execution subject may obtain the service data to be monitored after the secondary cleaning by performing data conversion on the historical virtual article circulation amount included in the cleaned service data to be monitored.
Third, input the service data to be monitored after the secondary cleaning into the numerical data anomaly detection model to obtain the abnormal historical virtual article value included in the abnormal data corresponding to the updated service data to be monitored.
Fourth, perform anomaly detection on the service data to be monitored after the secondary cleaning through the discrete data anomaly detection model to generate the abnormal historical virtual article circulation amount included in the abnormal data corresponding to the updated service data to be monitored.
In practice, the execution subject inputs the service data to be monitored after the secondary cleaning into the discrete data anomaly detection model to generate the abnormal historical virtual article circulation amount included in the abnormal data corresponding to the updated service data to be monitored.
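A residual-based ARIMA detector can be sketched as follows. In practice a library such as `statsmodels.tsa.arima.model.ARIMA` would fit the model; to stay dependency-light, this minimal sketch hand-fits ARIMA(1,1,0) without a constant by conditional least squares and flags points whose one-step residual lies more than `k` robust standard deviations (via the median absolute deviation) from the median. The order (1,1,0), `k=3`, and the MAD rule are assumptions, not the patent's parameters. Note that a single spike also disturbs the next one or two residuals, so neighbouring points are flagged too.

```python
import numpy as np


def arima110_residuals(series):
    """Fit ARIMA(1,1,0) without a constant by conditional least squares.

    The series is differenced once, d[t] = phi * d[t-1] + e[t] is fitted by
    least squares, and phi is returned with the one-step residuals e[t].
    Residual j lines up with series[j + 2].
    """
    y = np.asarray(series, dtype=float)
    d = np.diff(y)
    prev, curr = d[:-1], d[1:]
    denom = float(np.dot(prev, prev))
    phi = float(np.dot(prev, curr)) / denom if denom > 0 else 0.0
    return phi, curr - phi * prev


def flag_anomalies(series, k=3.0):
    """Indices of points whose residual lies more than k robust sigmas from the median."""
    _, resid = arima110_residuals(series)
    median = np.median(resid)
    # Median absolute deviation, scaled to estimate sigma; floored to avoid zero.
    mad = max(float(np.median(np.abs(resid - median))) * 1.4826, 1e-9)
    flagged = np.nonzero(np.abs(resid - median) > k * mad)[0]
    return [int(j) + 2 for j in flagged]


values = [10, 11, 10, 11, 10, 11, 10, 11, 50, 11, 10, 11, 10]
print(flag_anomalies(values))  # [8, 9, 10]
```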
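The two cleaning passes can be sketched as follows. This is a minimal sketch, assuming min-max normalization for the first pass and, because the text does not name the "data conversion", a `log(1 + x)` transform for the second; records are hypothetical `(value, circulation_amount)` pairs.

```python
import math


def min_max_normalize(values):
    """First cleaning pass: scale values into [0, 1]; a constant series maps to 0.0."""
    low, high = min(values), max(values)
    span = high - low
    return [0.0 if span == 0 else (v - low) / span for v in values]


def clean_records(records):
    """Apply both cleaning passes to (value, circulation_amount) pairs."""
    values = min_max_normalize([value for value, _ in records])
    # Second cleaning pass (assumed): log(1 + x) on the circulation amount.
    amounts = [math.log1p(amount) for _, amount in records]
    return list(zip(values, amounts))


records = [(99, 100), (97, 200), (92, 500), (82, 1000), (86, 1200)]
cleaned = clean_records(records)
print(cleaned[0])
```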
Step 107: perform data repair on each piece of abnormal data in the abnormal data set based on the set of service data features to be monitored, to obtain a repaired data set.
In some embodiments, the executing body may perform data repair on each abnormal data in the abnormal data set based on the to-be-monitored service data feature set, to obtain a repaired data set.
In practice, the execution subject may first send the abnormal data set to the subordinate institution, then receive the data resupplied by the subordinate institution, and finally repair each piece of abnormal data in the abnormal data set according to the resupplied data, to obtain the repaired data set.
Optionally, the repaired data in the repaired data set includes a repaired historical virtual article value and a repaired historical virtual article circulation amount. The repaired historical virtual article value is obtained by repairing the abnormal historical virtual article value, and the repaired historical virtual article circulation amount is obtained by repairing the abnormal historical virtual article circulation amount.
In some optional implementations of some embodiments, performing, by the execution subject, data repair on each piece of abnormal data in the abnormal data set based on the set of service data features to be monitored to obtain the repaired data set may include the following steps:
For each piece of abnormal data in the abnormal data set, perform the following data repair steps:
First, select, from the updated service data sequence to be monitored, the updated service data to be monitored whose identification information is smaller than that of the abnormal data as candidate service data to be monitored, to obtain a candidate service data sequence to be monitored.
In practice, the executing body determines the updated service data to be monitored, of which the corresponding identification information is smaller than the identification information corresponding to the abnormal data, in the updated service data sequence to be monitored as candidate service data to obtain a candidate service data sequence to be monitored.
As an example, the updated service data sequence to be monitored may be "{id=1: 1, 100, 99, 55, 66;
id=2: 2, 200, 97, 55, 44;
id=3: 5, 500, 92, 66, 33;
id=4: 10, 1000, 82, 55, 11;
id=5: 12, 1200, 86, 66, 880000;
id=6: 15, 1500, 61, 55, 44;
id=7: 8, 800, 53, 66, 66;
id=8: 3, 300, 50, 55, 55;
id=9: 5, 500, 45, 88, 99;
id=10: 2, 200, 43, 66, 66;
id=11: 3, 300, 40, 45, 46;
id=12: 10, 1000, 40, 48, 66;}". The abnormal data may be "{id=5: 12, 1200, 86, 66, 880000}", whose identification information is "5". Thus, the candidate service data sequence to be monitored may be "{id=1: 1, 100, 99, 55, 66;
id=2: 2, 200, 97, 55, 44;
id=3: 5, 500, 92, 66, 33;
id=4: 10, 1000, 82, 55, 11;}".
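Selecting the candidate service data amounts to filtering on the identifier. A minimal sketch, assuming records are `(id, fields)` pairs; `select_candidates` is a hypothetical name, and the data is a prefix of the example above.

```python
def select_candidates(updated_sequence, abnormal_id):
    """Keep records whose identification information precedes the abnormal record's."""
    return [(rid, fields) for rid, fields in updated_sequence if rid < abnormal_id]


updated_sequence = [
    (1, [1, 100, 99, 55, 66]),
    (2, [2, 200, 97, 55, 44]),
    (3, [5, 500, 92, 66, 33]),
    (4, [10, 1000, 82, 55, 11]),
    (5, [12, 1200, 86, 66, 880000]),
    (6, [15, 1500, 61, 55, 44]),
]
candidates = select_candidates(updated_sequence, abnormal_id=5)
print([rid for rid, _ in candidates])  # [1, 2, 3, 4]
```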
Second, perform the following first processing step on the abnormal historical virtual article value included in the abnormal data:
Step 1: cluster the candidate service data sequence to be monitored together with the abnormal historical virtual article value included in the abnormal data to obtain a first clustering result, which includes at least one first cluster.
The first clustering result is the result of clustering the candidate service data sequence to be monitored together with the abnormal historical virtual article value included in the abnormal data.
In practice, the execution subject performs hierarchical clustering on the candidate service data sequence to be monitored and the abnormal historical virtual article value included in the abnormal data to obtain the first clustering result.
Step 2: according to the first clustering result, select, from the at least one first cluster, the first cluster containing the abnormal historical virtual article value included in the abnormal data as the screened service data sequence to be monitored.
In practice, according to the first clustering result, the execution subject determines the first cluster, among the at least one first cluster, that contains the abnormal historical virtual article value included in the abnormal data as the screened service data sequence to be monitored.
Step 3: generate a first data matrix according to the screened service data sequence to be monitored.
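Steps 1 and 2 can be sketched for scalar values. The text specifies hierarchical clustering but not the linkage, distance, or cut-off, so this minimal sketch assumes single linkage on one-dimensional values cut at a distance `threshold`; for multi-dimensional records, `scipy.cluster.hierarchy.linkage` with `fcluster(..., criterion="distance")` would be the usual route. Function names and numbers are illustrative.

```python
def single_linkage_1d(values, threshold):
    """Single-linkage agglomerative clustering of scalars, cut at `threshold`.

    In one dimension, cutting the single-linkage dendrogram at `threshold` is
    equivalent to sorting the values and starting a new cluster wherever the
    gap between neighbours exceeds `threshold`. Returns lists of indices.
    """
    if not values:
        return []
    order = sorted(range(len(values)), key=lambda i: values[i])
    clusters = [[order[0]]]
    for prev, curr in zip(order, order[1:]):
        if values[curr] - values[prev] > threshold:
            clusters.append([])  # gap too large: start a new cluster
        clusters[-1].append(curr)
    return clusters


def cluster_containing(values, target_index, threshold):
    """Indices of the cluster that contains values[target_index]."""
    for cluster in single_linkage_1d(values, threshold):
        if target_index in cluster:
            return sorted(cluster)
    return []


# Candidate values followed by the abnormal value at index 4 (illustrative numbers).
values = [10.0, 10.5, 11.0, 30.0, 11.8]
print(cluster_containing(values, target_index=4, threshold=1.0))  # [0, 1, 2, 4]
```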
The first data matrix may refer to a data matrix containing the service data sequence to be monitored after the screening and zero elements.
In practice, the execution subject may first remove the identification information from the screened service data sequence to be monitored. Then, the execution subject converts the resulting sequence, without identification information, into a matrix. Finally, the execution subject fills the part of the matrix to be predicted with the element "0" to obtain the first data matrix.
As an example, the screened service data sequence to be monitored may be "{id=2: 2, 200, 97, 55, 44;
id=4: 10, 1000, 82, 55, 11;
id=5: 12, 1200, 86, 66, 880000;}". Thus, the first data matrix may be "[2, 200, 97, 55, 66;
10, 1000, 82, 22, 44;
12, 1200, 86, 66, 0]".
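Building the data matrix can be sketched as follows. This is a minimal sketch, assuming records are `(id, fields)` pairs and that the part to be predicted is a set of column positions in the abnormal record's row (in the example, the last field, 880000, becomes 0); `build_data_matrix` is a hypothetical name.

```python
def build_data_matrix(screened, abnormal_id, predict_columns):
    """Drop identifiers, stack rows into a matrix, and zero the cells to predict.

    screened: list of (id, fields) pairs. In the row whose id equals
    abnormal_id, the entries at predict_columns are replaced by 0.
    """
    matrix = []
    for rid, fields in screened:
        row = list(fields)  # copy so the input records are not modified
        if rid == abnormal_id:
            for col in predict_columns:
                row[col] = 0
        matrix.append(row)
    return matrix


screened = [
    (2, [2, 200, 97, 55, 44]),
    (4, [10, 1000, 82, 55, 11]),
    (5, [12, 1200, 86, 66, 880000]),
]
matrix = build_data_matrix(screened, abnormal_id=5, predict_columns=[4])
print(matrix)
```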
Step 4: input the first data matrix and the set of service data features to be monitored into a pre-trained historical virtual article circulation amount prediction model to generate the repaired historical virtual article circulation amount included in the repaired data corresponding to the abnormal data.
The historical virtual article circulation amount prediction model predicts the historical virtual article circulation amount from the first data matrix and the set of service data features to be monitored.
Optionally, the historical virtual article circulation amount prediction model includes a historical virtual article circulation amount global prediction model and a historical virtual article circulation amount local prediction model. The global prediction model predicts the historical virtual article circulation amount globally from the set of service data features to be monitored. The local prediction model adjusts the globally predicted historical virtual article circulation amount according to the first data matrix. In practice, the network structure of the global prediction model may include a convolution layer, a pooling layer, a recurrent layer, and a fully connected layer. The convolution layer convolves the set of service data features to be monitored and may consist of a predetermined number (for example, 256) of 3 x 3 convolution kernels followed by a ReLU activation function. The pooling layer applies average pooling to the output of the convolution layer. The recurrent layer may be a recurrent neural network, for example, an LSTM (Long Short-Term Memory) layer. The fully connected layer converts the output of the recurrent layer into the predicted historical virtual article circulation amount. The local prediction model may reuse the network structure of the global prediction model.
In some optional implementations of some embodiments, inputting, by the execution subject, the first data matrix and the set of service data features to be monitored into the pre-trained historical virtual article circulation amount prediction model to generate the repaired historical virtual article circulation amount included in the repaired data corresponding to the abnormal data may include the following sub-steps:
First sub-step: input the set of service data features to be monitored into the historical virtual article circulation amount global prediction model to generate a predicted historical virtual article circulation amount.
Second sub-step: adjust the predicted historical virtual article circulation amount with the historical virtual article circulation amount local prediction model to obtain the repaired historical virtual article circulation amount included in the repaired data corresponding to the abnormal data.
In practice, the execution body inputs the predicted historical virtual article circulation amount into a historical virtual article circulation amount local prediction model to adjust, and obtains the repaired historical virtual article circulation amount included in the repaired data corresponding to the abnormal data.
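The two sub-steps compose as "predict globally, then adjust locally". The patent's models are neural networks (convolution, pooling, LSTM, fully connected); this minimal sketch replaces both with trivial hypothetical stand-ins, a linear read-out for the global model and a blend toward the mean of the known (non-zero) entries of the target column of the first data matrix for the local model, purely to show the data flow. The weights, `alpha`, and function names are assumptions.

```python
def global_predict(features, weights, bias):
    """Stand-in global model: a linear read-out over the whole feature set."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def local_adjust(global_prediction, data_matrix, column, alpha=0.5):
    """Stand-in local model: blend the global prediction with the mean of the
    known (non-zero) entries of the target column in the data matrix."""
    known = [row[column] for row in data_matrix if row[column] != 0]
    if not known:
        return global_prediction
    local_estimate = sum(known) / len(known)
    return (1 - alpha) * global_prediction + alpha * local_estimate


features = [0.2, 0.4, 0.1, 0.3]        # hypothetical fused feature vector
weights = [100.0, 50.0, 20.0, 10.0]    # hypothetical learned weights
predicted = global_predict(features, weights, bias=5.0)
matrix = [[2, 200, 97, 55, 44], [10, 1000, 82, 55, 11], [12, 1200, 86, 66, 0]]
repaired = local_adjust(predicted, matrix, column=4)
print(predicted, repaired)
```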
The content of the foregoing "in some optional implementations of some embodiments" is an inventive point of the present disclosure and solves the third technical problem mentioned in the background, namely: "global features corresponding to large batches of service data may be affected by the overall data distribution, while local features corresponding to large batches of service data may be disturbed by noise or local changes, so a single feature-processing approach makes it difficult to accurately predict the commodity circulation amount and may prevent abnormal service data from being repaired effectively and accurately." In practice, as the volume of service data grows, global feature extraction over the service data corresponding to the set of service data features to be monitored has a coarse receptive field and cannot attend to the detailed (local) features of specific service data features to be monitored, so the data repair is not accurate enough. Conversely, local feature extraction has a fine receptive field and cannot attend to the overall (global) features of all service data features to be monitored, which likewise makes the repair insufficiently accurate. Based on this, the present disclosure first inputs the set of service data features to be monitored into the historical virtual article circulation amount global prediction model to generate a predicted historical virtual article circulation amount from a global perspective.
Second, the predicted historical virtual article circulation amount is adjusted by the historical virtual article circulation amount local prediction model to obtain the repaired historical virtual article circulation amount included in the repaired data corresponding to the abnormal data; that is, local features are used to refine the globally predicted circulation amount. By combining global and local features, the method repairs abnormal service data effectively and accurately.
Third, perform the following second processing step on the abnormal historical virtual article circulation amount included in the abnormal data:
Step 1: cluster the candidate service data sequence to be monitored together with the abnormal historical virtual article circulation amount included in the abnormal data to obtain a second clustering result.
The second clustering result includes at least one second cluster and is the result of clustering the candidate service data sequence to be monitored together with the abnormal historical virtual article circulation amount included in the abnormal data.
In practice, the execution main body performs hierarchical clustering on the candidate service data sequence to be monitored and the abnormal historical virtual article flow quantity included in the abnormal data to obtain a second clustering result.
Step 2: according to the second clustering result, select, from the at least one second cluster, the second cluster containing the abnormal historical virtual article circulation amount included in the abnormal data as the service data sequence to be monitored after the second screening.
In practice, according to the second clustering result, the execution body determines a second cluster including the abnormal historical virtual article flow amount included in the abnormal data in the at least one second cluster as a service data sequence to be monitored after the second screening.
Step 3: generate a second data matrix according to the service data sequence to be monitored after the second screening.
The second data matrix may refer to a data matrix containing the second screened service data sequence to be monitored and zero elements.
In practice, the execution subject may first remove the identification information from the service data sequence to be monitored after the second screening. Then, the execution subject converts the resulting sequence, without identification information, into a matrix. Finally, the execution subject fills the part of the matrix to be predicted with "0" to obtain the second data matrix.
As an example, the service data sequence to be monitored after the second screening may be "{id=2: 2, 200, 97, 55, 44;
id=4: 10, 1000, 82, 55, 11;
id=5: 12, 1200000, 86, 6600, 88;}". Thus, the second data matrix may be "[2, 200, 97, 55, 66;
10, 1000, 82, 22, 44;
0, 0, 0, 0, 88]".
Step 4: input the second data matrix and the set of service data features to be monitored into a pre-trained historical virtual article value prediction model to generate the repaired historical virtual article value included in the repaired data corresponding to the abnormal data.
The historical virtual article value prediction model predicts the historical virtual article value from the second data matrix and the set of service data features to be monitored. In practice, the historical virtual article value prediction model may reuse the model structure of the historical virtual article circulation amount prediction model.
The first to third steps above are an inventive point of the present disclosure and solve the second technical problem mentioned in the background, namely: "large volumes of service data often contain noise and outliers, which may interfere with the detection of abnormal data, so that abnormal service data cannot be accurately detected and effectively repaired." In practice, such noise and outliers may degrade the performance of the anomaly detection algorithm and reduce its accuracy, preventing abnormal service data from being accurately detected and repaired. Based on this, the present disclosure first selects, from the updated service data sequence to be monitored, the updated service data to be monitored whose identification information is smaller than that of the abnormal data as candidate service data to be monitored, obtaining a candidate service data sequence to be monitored. Because service data is time-series data and historical observations influence current ones, selecting the records with smaller identification information screens out the historical service data to be monitored.
Secondly, because the abnormal historical virtual article value and the abnormal historical virtual article circulation amount carry different data meanings, clustering them together cannot effectively identify outliers and noise; the present disclosure therefore applies a separate clustering procedure to each. Specifically, the first processing step is executed for the abnormal historical virtual article value included in the abnormal data: first, the candidate service data sequence to be monitored and the abnormal historical virtual article value are clustered to obtain a first clustering result including at least one first cluster; second, according to the first clustering result, the first cluster containing the abnormal historical virtual article value is selected as the screened service data sequence to be monitored. In practice, noise and outliers usually behave as outlying points, while clustering groups service data to be monitored with similar characteristics, so clustering the candidate service data sequence together with the abnormal historical virtual article value and using the resulting first clustering result separates outliers and noise from the relevant data (yielding, for example, the screened service data sequence to be monitored). Third, a first data matrix is generated according to the screened service data sequence to be monitored.
Converting the screened service data sequence to be monitored into the first data matrix puts it into the input format of the historical virtual article circulation amount prediction model. Fourth, the first data matrix and the set of service data features to be monitored are input into the pre-trained historical virtual article circulation amount prediction model to generate the repaired historical virtual article circulation amount included in the repaired data corresponding to the abnormal data, thereby repairing the abnormal historical virtual article circulation amount. The second processing step is executed for the abnormal historical virtual article circulation amount included in the abnormal data: first, the candidate service data sequence to be monitored and the abnormal historical virtual article circulation amount are clustered to obtain a second clustering result including at least one second cluster; second, according to the second clustering result, the second cluster containing the abnormal historical virtual article circulation amount is selected as the service data sequence to be monitored after the second screening. As before, because noise and outliers behave as outlying points and clustering groups similar service data to be monitored, the second clustering result separates outliers and noise from the relevant data (yielding, for example, the service data sequence to be monitored after the second screening).
Third, a second data matrix is generated according to the service data sequence to be monitored after the second screening, which converts that sequence into the input format of the historical virtual article value prediction model. Fourth, the second data matrix and the set of service data features to be monitored are input into the pre-trained historical virtual article value prediction model to generate the repaired historical virtual article value included in the repaired data corresponding to the abnormal data, thereby repairing the abnormal historical virtual article value. In this way, noise and outliers contained in large volumes of service data no longer interfere with abnormal data detection, and abnormal service data is repaired effectively and accurately.
Step 108: generate a processed service data sequence to be monitored according to the repaired data set and the updated service data sequence to be monitored.
In some embodiments, the executing entity may generate the processed service data sequence to be monitored according to the repaired data set and the updated service data sequence to be monitored.
In practice, the execution subject first splices the repaired data set and the updated service data sequence to be monitored to obtain a spliced set of service data to be monitored, and then quicksorts the spliced set according to the identification information corresponding to each piece of spliced service data to be monitored, to obtain the processed service data sequence to be monitored.
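Step 108 can be sketched as a merge followed by a quicksort on the identifier. This is a minimal sketch under an assumption the text does not state: each repaired record carries the identifier of the abnormal record it repairs and replaces that record, so identifiers stay unique. The function names and the repaired field value are hypothetical.

```python
def quicksort(items, key):
    """Non-in-place quicksort: partition around the middle element's key."""
    if len(items) <= 1:
        return list(items)
    pivot = key(items[len(items) // 2])
    less = [x for x in items if key(x) < pivot]
    equal = [x for x in items if key(x) == pivot]
    greater = [x for x in items if key(x) > pivot]
    return quicksort(less, key) + equal + quicksort(greater, key)


def merge_repaired(updated_sequence, repaired):
    """Splice repaired records into the updated sequence, then quicksort by id.

    Assumes `repaired` maps the id of each abnormal record to its repaired
    fields, and that a repaired record replaces the abnormal one.
    """
    by_id = dict(updated_sequence)
    by_id.update(repaired)
    return quicksort(list(by_id.items()), key=lambda record: record[0])


updated_sequence = [
    (3, [5, 500, 92, 66, 33]),
    (1, [1, 100, 99, 55, 66]),
    (5, [12, 1200, 86, 66, 880000]),
]
repaired = {5: [12, 1200, 86, 66, 44]}  # hypothetical repaired fields
processed = merge_repaired(updated_sequence, repaired)
print([rid for rid, _ in processed])  # [1, 3, 5]
```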
The above embodiments of the present disclosure have the following advantageous effects: by the service data processing method of some embodiments of the present disclosure, effective detection of missing service data is achieved, the problem of service data discontinuity caused by directly storing service data reported by subordinate institutions is avoided, and the subsequent data retrieval cost for service data is reduced laterally. Specifically, the reason for increasing the cost of subsequent data retrieval for the service data is that: the service data aiming at time sequence is often stored in warehouse according to the time sequence, but the service data reported by subordinate institutions is directly stored, so that the missing service data is difficult to effectively detect, the problem of discontinuous service data is possibly caused, and the subsequent data retrieval cost aiming at the service data is increased. Based on this, in the service data processing method of some embodiments of the present disclosure, firstly, a to-be-monitored service data set is received, where to-be-monitored service data in the to-be-monitored service data set is service data corresponding to identification information and reported in real time by a subordinate institution. And secondly, updating the array value of the initial bitmap array according to the identification information corresponding to the service data to be monitored in the service data set to be monitored, and obtaining an updated bitmap array. For a large amount of service data, the service data are converted into binary groups for subsequent judgment of whether the service data are continuous, so that the operation efficiency is effectively improved and the consumption of calculation resources is greatly reduced compared with a traversing mode. 
For example, if traversal identifiers are stored as integers (int), then for service data on the order of hundreds of millions of records the identifiers alone occupy substantial memory, and the intermediate data generated during traversal consumes further storage resources (e.g., memory or external storage). Next, a target number is determined from the updated bitmap array; it represents the number of array values in the updated bitmap array that satisfy a first screening condition and is used to judge whether the data is continuous. Then, in response to determining that the target number is smaller than the identification information corresponding to the target service data to be monitored (the service data to be monitored whose identification information satisfies a second screening condition), an updated service data sequence to be monitored is generated from the service data set to be monitored. Because time-series service data is stored in chronological order, gaps in the data affect subsequent processing. For example, when the primary keys of the service data are discontinuous, additional retrieval statements must be designed, which indirectly increases retrieval cost. By generating a continuous updated service data sequence to be monitored, the method avoids the increase in retrieval cost caused by discontinuous data. Further, a service data feature set to be monitored is generated from a pre-trained data feature extraction model and the updated service data sequence to be monitored, providing data features of the service data to be monitored for the subsequent repair of abnormal data.
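The bitmap-based continuity check described above can be sketched in a few lines. This is a hedged illustration under stated assumptions, not the patent's exact procedure: identification numbers are assumed to be consecutive integers starting at 1, the "preset array value" is assumed to be 1, the first screening condition is assumed to be "bit equals 1", and the second screening condition is assumed to select the record with the largest identification number. The function names are hypothetical.

```python
from typing import Iterable, List


def build_bitmap(ids: Iterable[int], max_id: int) -> bytearray:
    """Initial bitmap is all zeros; set bit i for each reported identification number i."""
    bitmap = bytearray(max_id // 8 + 1)  # one bit per possible id in [0, max_id]
    for i in ids:
        if 1 <= i <= max_id:
            bitmap[i // 8] |= 1 << (i % 8)  # assumed "preset array value" = 1
    return bitmap


def target_number(bitmap: bytearray) -> int:
    """Count bits equal to 1 (assumed first screening condition)."""
    return sum(bin(byte).count("1") for byte in bitmap)


def missing_ids(bitmap: bytearray, max_id: int) -> List[int]:
    """Identification numbers in [1, max_id] whose bit is still 0 (never reported)."""
    return [i for i in range(1, max_id + 1) if ((bitmap[i // 8] >> (i % 8)) & 1) == 0]


def check_continuity(ids: List[int]) -> List[int]:
    """Return the gaps; an empty list means ids 1..max_id are all present."""
    max_id = max(ids)  # assumed second screening condition: the record with the largest id
    bitmap = build_bitmap(ids, max_id)
    if target_number(bitmap) < max_id:  # fewer distinct ids than the largest id: gaps exist
        return missing_ids(bitmap, max_id)
    return []


if __name__ == "__main__":
    print(check_continuity([1, 2, 4, 5, 5]))  # [3]
    print(check_continuity([3, 1, 2]))        # []
```

Duplicate reports collapse onto the same bit, so they cannot mask a gap. The memory argument is simple arithmetic: 100,000,000 identification numbers stored as a 4-byte int array take 400,000,000 bytes (about 400 MB), while a bitmap needs one bit per number, about 12.5 MB.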
In addition, a pre-trained abnormal data detection model performs anomaly detection on each piece of updated service data to be monitored in the updated sequence to generate abnormal data, yielding an abnormal data set. Then, based on the service data feature set to be monitored, each piece of abnormal data in the abnormal data set is repaired to obtain a repaired data set, so that abnormal service data is corrected. Finally, a processed service data sequence to be monitored is generated from the repaired data set and the updated service data sequence to be monitored. In this way, the discontinuity caused by directly storing service data reported by subordinate institutions is avoided, the continuity of the service data is ensured, and the cost of subsequent retrieval over the service data is indirectly reduced.
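The patent relies on pre-trained models for detection and leaves the cleaning steps and the repair rule unspecified. The sketch below only illustrates the shape of the detect-then-repair step by swapping in simple statistical stand-ins, all of them assumptions: a modified z-score (median/MAD) rule for the numeric historical virtual article value, a "must be a non-negative whole number" rule for the discrete circulation quantity, and repair by averaging the nearest unflagged neighbours. The function names are hypothetical.

```python
from statistics import median
from typing import List


def value_outliers(values: List[float], threshold: float = 3.5) -> List[int]:
    """Flag indices whose modified z-score, 0.6745 * |x - median| / MAD, exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:  # no spread, so the rule cannot separate outliers
        return []
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]


def quantity_outliers(quantities: List[float]) -> List[int]:
    """Flag circulation quantities that are negative or not whole numbers."""
    return [i for i, q in enumerate(quantities) if q < 0 or q != int(q)]


def repair(values: List[float], bad: List[int]) -> List[float]:
    """Replace each flagged value with the mean of its nearest unflagged neighbours."""
    bad_set = set(bad)
    good = [i for i in range(len(values)) if i not in bad_set]
    fixed = list(values)
    for i in bad_set:
        left = [j for j in good if j < i]
        right = [j for j in good if j > i]
        neighbours = [values[left[-1]]] if left else []
        neighbours += [values[right[0]]] if right else []
        if neighbours:  # leave the value unchanged if every point was flagged
            fixed[i] = sum(neighbours) / len(neighbours)
    return fixed


if __name__ == "__main__":
    prices = [10, 11, 10, 12, 500, 11]
    bad = value_outliers(prices)
    print(bad)                  # [4]
    print(repair(prices, bad))  # [10, 11, 10, 12, 11.5, 11]
    print(quantity_outliers([3, 4, -1, 2.5, 5]))  # [2, 3]
```

A learned detector would replace `value_outliers` and `quantity_outliers`, and a feature-driven repair would replace the neighbour average; the surrounding flow of flagging indices and then rewriting only those entries stays the same.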
With further reference to fig. 2, as an implementation of the method shown in the above figures, the present disclosure provides some embodiments of a service data processing apparatus. These apparatus embodiments correspond to the method embodiments shown in fig. 1, and the apparatus can be applied to various electronic devices.
As shown in fig. 2, the service data processing apparatus 200 of some embodiments includes: a receiving unit 201, an updating unit 202, a determining unit 203, a first generating unit 204, a second generating unit 205, an anomaly detection unit 206, a data repairing unit 207, and a third generating unit 208. The receiving unit 201 is configured to receive a service data set to be monitored, where each piece of service data to be monitored in the set is reported in real time by a subordinate institution and corresponds to identification information; the updating unit 202 is configured to update the array values of an initial bitmap array according to the identification information corresponding to the service data to be monitored, to obtain an updated bitmap array; the determining unit 203 is configured to determine a target number according to the updated bitmap array, where the target number represents the number of array values in the updated bitmap array that satisfy a first screening condition; the first generating unit 204 is configured to generate an updated service data sequence to be monitored from the service data set to be monitored in response to determining that the target number is smaller than the identification information corresponding to the target service data to be monitored, where the target service data to be monitored is the service data to be monitored whose identification information satisfies a second screening condition; the second generating unit 205 is configured to generate a service data feature set to be monitored according to a pre-trained data feature extraction model and the updated service data sequence to be monitored; the anomaly detection unit 206 is configured to perform, through a pre-trained anomaly data detection model, anomaly detection on each piece of updated service data to be monitored in the updated sequence to generate anomaly data, obtaining an anomaly data set; the data repairing unit 207 is configured to repair each piece of anomaly data in the anomaly data set based on the service data feature set to be monitored, to obtain a repaired data set; and the third generating unit 208 is configured to generate a processed service data sequence to be monitored according to the repaired data set and the updated service data sequence to be monitored.
It will be understood that the units of the service data processing apparatus 200 correspond to the steps of the method described with reference to fig. 1. Therefore, the operations, features, and beneficial effects described above for the method also apply to the service data processing apparatus 200 and the units it contains, and are not repeated here.
Referring now to fig. 3, a schematic diagram of an electronic device (e.g., computing device) 300 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 3 is merely an example and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 3, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various suitable actions and processes in accordance with programs stored in a read-only memory 302 or programs loaded from a storage 308 into a random access memory 303. In the random access memory 303, various programs and data necessary for the operation of the electronic device 300 are also stored. The processing means 301, the read only memory 302 and the random access memory 303 are connected to each other by a bus 304. An input/output interface 305 is also connected to the bus 304.
In general, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 308 including, for example, magnetic tape, hard disk, etc.; and communication means 309. The communication means 309 may allow the electronic device 300 to communicate with other devices wirelessly or by wire to exchange data. While fig. 3 shows an electronic device 300 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 3 may represent one device or a plurality of devices as needed.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via communications device 309, or from storage device 308, or from read only memory 302. The above-described functions defined in the methods of some embodiments of the present disclosure are performed when the computer program is executed by the processing means 301.
It should be noted that, the computer readable medium described in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some implementations, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected by digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be contained in the electronic device, or it may exist separately without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receive a service data set to be monitored, wherein each piece of service data to be monitored in the set is reported in real time by a subordinate institution and corresponds to identification information; update the array values of an initial bitmap array according to the identification information corresponding to the service data to be monitored, to obtain an updated bitmap array; determine a target number according to the updated bitmap array, wherein the target number represents the number of array values in the updated bitmap array that satisfy a first screening condition; in response to determining that the target number is smaller than the identification information corresponding to the target service data to be monitored, generate an updated service data sequence to be monitored according to the service data set to be monitored, wherein the target service data to be monitored is the service data to be monitored whose identification information satisfies a second screening condition; generate a service data feature set to be monitored according to a pre-trained data feature extraction model and the updated service data sequence to be monitored; perform, through a pre-trained anomaly data detection model, anomaly detection on each piece of updated service data to be monitored in the updated sequence to generate anomaly data, obtaining an anomaly data set; repair each piece of anomaly data in the anomaly data set based on the service data feature set to be monitored, to obtain a repaired data set; and generate a processed service data sequence to be monitored according to the repaired data set and the updated service data sequence to be monitored.
Computer program code for carrying out operations of some embodiments of the present disclosure may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including a receiving unit, an updating unit, a determining unit, a first generating unit, a second generating unit, an anomaly detection unit, a data repairing unit, and a third generating unit. The names of these units do not, in some cases, limit the units themselves; for example, the receiving unit may also be described as "a unit that receives the service data set to be monitored".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
The foregoing description covers only preferred embodiments of the present disclosure and explains the technical principles employed. Those skilled in the art will appreciate that the scope of the invention involved in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.

Claims (8)

1. A business data processing method, comprising:
receiving a to-be-monitored service data set, wherein the to-be-monitored service data in the to-be-monitored service data set is service data which is reported by subordinate institutions in real time and corresponds to identification information;
according to the identification information corresponding to the service data to be monitored in the service data set to be monitored, updating the array value of the initial bitmap array to obtain an updated bitmap array;
determining a target number according to the updated bitmap array, wherein the target number represents the number of array values meeting a first screening condition in the updated bitmap array;
generating an updated service data sequence to be monitored according to the service data set to be monitored in response to determining that the target number is smaller than the identification information corresponding to the target service data to be monitored, wherein the target service data to be monitored is the service data to be monitored, in the service data set to be monitored, whose corresponding identification information satisfies a second screening condition;
generating a service data feature set to be monitored according to a pre-trained data feature extraction model and the updated service data sequence to be monitored;
performing, through a pre-trained anomaly data detection model, anomaly detection on each piece of updated service data to be monitored in the updated service data sequence to be monitored to generate anomaly data, so as to obtain an anomaly data set;
performing data repair on each piece of anomaly data in the anomaly data set based on the service data feature set to be monitored, to obtain a repaired data set;
and generating a processed service data sequence to be monitored according to the repaired data set and the updated service data sequence to be monitored.
2. The method of claim 1, wherein updating the array value of the initial bitmap array according to the identification information corresponding to each service data to be monitored in the service data set to be monitored to obtain an updated bitmap array, comprises:
for each piece of service data to be monitored in the service data set to be monitored, updating the array value in the initial bitmap array that matches the identification information corresponding to that service data to a preset array value, to obtain the updated bitmap array.
3. The method of claim 2, wherein the generating an updated service data sequence to be monitored from the service data set to be monitored comprises:
sorting the pieces of service data to be monitored in the service data set to be monitored to obtain a sorted service data sequence to be monitored;
determining an abnormal identification information sequence based on the sorted service data sequence to be monitored;
sending the abnormal identification information sequence to the subordinate institution, so that the subordinate institution supplements the missing data according to the abnormal identification information sequence;
in response to receiving a candidate service data sequence to be monitored that is uploaded by the subordinate institution for the abnormal identification information sequence, splicing the sorted service data sequence to be monitored and the candidate service data sequence to be monitored to obtain a spliced service data sequence to be monitored;
and sorting the spliced service data sequence to be monitored to obtain the updated service data sequence to be monitored.
4. The method according to claim 3, wherein the updated service data to be monitored in the updated service data sequence to be monitored comprises a historical virtual article value and a historical virtual article circulation quantity, and the data feature extraction model comprises: a historical virtual article value feature extraction model, a historical virtual article circulation quantity feature extraction model, and a feature fusion model; and
the generating a service data feature set to be monitored according to the pre-trained data feature extraction model and the updated service data sequence to be monitored comprises:
performing data smoothing on the historical virtual article value included in each updated service data to be monitored in the updated service data sequence to be monitored to obtain a smoothed historical virtual article value sequence;
performing data conversion processing on the historical virtual article circulation quantity included in each piece of updated service data to be monitored in the updated service data sequence to be monitored, to obtain a data-converted historical virtual article circulation quantity sequence;
inputting the smoothed historical virtual article value sequence into the historical virtual article value feature extraction model to generate a first data feature;
extracting data features from the data-converted historical virtual article circulation quantity sequence through the historical virtual article circulation quantity feature extraction model to generate a second data feature, wherein the first data feature and the second data feature are time-series features;
and inputting the first data feature and the second data feature into the feature fusion model to generate the service data feature set to be monitored.
5. The method of claim 4, wherein the anomaly data detection model comprises a numerical data anomaly detection model and a discrete data anomaly detection model, and the anomaly data in the anomaly data set comprises an abnormal historical virtual article value and an abnormal historical virtual article circulation quantity; and
the performing anomaly detection on each updated service data to be monitored in the updated service data sequence to generate anomaly data through a pre-trained anomaly data detection model comprises the following steps:
performing first data cleaning on the historical virtual article value included in the updated service data to be monitored, to obtain cleaned service data to be monitored;
performing second data cleaning on the historical virtual article circulation quantity included in the cleaned service data to be monitored, to obtain twice-cleaned service data to be monitored;
inputting the twice-cleaned service data to be monitored into the numerical data anomaly detection model, to obtain the abnormal historical virtual article value included in the anomaly data corresponding to the updated service data to be monitored;
and performing abnormal data detection on the twice-cleaned service data to be monitored through the discrete data anomaly detection model, to generate the abnormal historical virtual article circulation quantity included in the anomaly data corresponding to the updated service data to be monitored.
6. A service data processing apparatus, comprising:
a receiving unit configured to receive a service data set to be monitored, wherein the service data to be monitored in the set is service data which is reported by subordinate institutions in real time and corresponds to identification information;
an updating unit configured to update the array values of an initial bitmap array according to the identification information corresponding to the service data to be monitored in the service data set to be monitored, to obtain an updated bitmap array;
a determining unit configured to determine a target number according to the updated bitmap array, wherein the target number represents the number of array values in the updated bitmap array that satisfy a first screening condition;
a first generating unit configured to generate an updated service data sequence to be monitored according to the service data set to be monitored in response to determining that the target number is smaller than the identification information corresponding to the target service data to be monitored, wherein the target service data to be monitored is the service data to be monitored, in the service data set to be monitored, whose corresponding identification information satisfies a second screening condition;
a second generating unit configured to generate a service data feature set to be monitored according to a pre-trained data feature extraction model and the updated service data sequence to be monitored;
an anomaly detection unit configured to perform anomaly detection on each piece of updated service data to be monitored in the updated service data sequence to be monitored through a pre-trained anomaly data detection model, to generate anomaly data and obtain an anomaly data set;
a data repairing unit configured to repair each piece of anomaly data in the anomaly data set based on the service data feature set to be monitored, to obtain a repaired data set;
and a third generating unit configured to generate a processed service data sequence to be monitored according to the repaired data set and the updated service data sequence to be monitored.
7. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1 to 5.
8. A computer readable medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method of any of claims 1 to 5.
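Claim 4 above describes preprocessing before feature extraction: smoothing the historical virtual article values and applying a data conversion to the circulation quantities, then fusing the two time-series features. The claim names neither the smoothing method, the conversion, nor the model architectures, so the sketch below is only an assumption-laden illustration: simple exponential smoothing stands in for "data smoothing", `log(1 + q)` stands in for "data conversion", the learned extraction models are omitted, and "fusion" is reduced to pairing the two sequences step by step. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def exponential_smoothing(values: List[float], alpha: float = 0.5) -> List[float]:
    """Simple exponential smoothing: s_0 = x_0, s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed: List[float] = []
    for x in values:
        smoothed.append(x if not smoothed else alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def log_transform(quantities: List[float]) -> List[float]:
    """Compress heavy-tailed, non-negative circulation quantities with log(1 + q)."""
    return [math.log1p(q) for q in quantities]


def fuse(first: List[float], second: List[float]) -> List[Tuple[float, float]]:
    """Stand-in for the feature fusion model: pair the two features time step by time step."""
    if len(first) != len(second):
        raise ValueError("feature sequences must be aligned in time")
    return list(zip(first, second))


if __name__ == "__main__":
    values = exponential_smoothing([10, 20, 30], alpha=0.5)
    print(values)  # [10, 15.0, 22.5]
    print(fuse(values, log_transform([0, 0, 0])))  # [(10, 0.0), (15.0, 0.0), (22.5, 0.0)]
```

In an actual system, each preprocessed sequence would be fed to its own trained extractor (for example, a recurrent or convolutional time-series encoder) and the fusion model would combine the resulting embeddings; the sketch only fixes the data flow the claim describes.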
CN202311453968.5A 2023-11-03 2023-11-03 Service data processing method, device, electronic equipment and computer readable medium Pending CN117520410A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311453968.5A CN117520410A (en) 2023-11-03 2023-11-03 Service data processing method, device, electronic equipment and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311453968.5A CN117520410A (en) 2023-11-03 2023-11-03 Service data processing method, device, electronic equipment and computer readable medium

Publications (1)

Publication Number Publication Date
CN117520410A (en) 2024-02-06

Family

ID=89741003

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311453968.5A Pending CN117520410A (en) 2023-11-03 2023-11-03 Service data processing method, device, electronic equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN117520410A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222267A (en) * 2019-06-06 2019-09-10 中山大学 A kind of gaming platform information-pushing method, system, storage medium and equipment
CN112084056A (en) * 2020-08-25 2020-12-15 腾讯科技(深圳)有限公司 Abnormality detection method, apparatus, device and storage medium
CN112463783A (en) * 2020-12-04 2021-03-09 广州品唯软件有限公司 Index data monitoring method and device, computer equipment and storage medium
CN113468226A (en) * 2021-06-24 2021-10-01 北京达佳互联信息技术有限公司 Service processing method, device, electronic equipment and storage medium
CN113900886A (en) * 2021-09-06 2022-01-07 浪潮软件股份有限公司 Abnormal log monitoring method
CN115062079A (en) * 2022-06-21 2022-09-16 平安银行股份有限公司 Transaction flow processing method, device, equipment and storage medium
WO2023045829A1 (en) * 2021-09-24 2023-03-30 中兴通讯股份有限公司 Service abnormality prediction method and device, storage medium, and electronic device
CN116720202A (en) * 2023-05-19 2023-09-08 国网物资有限公司 Service information detection method, device, electronic equipment and computer readable medium


Similar Documents

Publication Publication Date Title
CN115085196B (en) Power load predicted value determination method, device, equipment and computer readable medium
CN114202130A (en) Flow transfer amount prediction multitask model generation method, scheduling method, device and equipment
CN114780338A (en) Host information processing method and device, electronic equipment and computer readable medium
CN116562600B (en) Water supply control method, device, electronic equipment and computer readable medium
CN116388112B (en) Abnormal supply end power-off method, device, electronic equipment and computer readable medium
CN113255950B (en) Method and device for optimizing logistics network
CN117520410A (en) Service data processing method, device, electronic equipment and computer readable medium
CN114639072A (en) People flow information generation method and device, electronic equipment and computer readable medium
CN117235535B (en) Abnormal supply end power-off method and device, electronic equipment and medium
CN110633596A (en) Method and device for predicting vehicle direction angle
CN116703263B (en) Power equipment distribution method, device, electronic equipment and computer readable medium
CN116881097B (en) User terminal alarm method, device, electronic equipment and computer readable medium
CN115393652B (en) Artificial intelligence model updating method, identification method and equipment based on countermeasure network
CN116862319B (en) Power index information generation method, device, electronic equipment and medium
CN116800834B (en) Virtual gift merging method, device, electronic equipment and computer readable medium
CN115034769B (en) Power information generation method and device, electronic equipment and computer readable medium
CN116186020B (en) Feature information processing method, device, electronic equipment and computer readable medium
CN116894163B (en) Charging and discharging facility load prediction information generation method and device based on information security
CN111767085B (en) Storm platform parameter configuration method and apparatus
CN113095756B (en) Abnormal waybill feature determination method, device, equipment and computer readable medium
CN116451168B (en) Abnormal power information generation method, device, electronic equipment and readable medium
CN111582482B (en) Method, apparatus, device and medium for generating network model information
CN117196199A (en) Article scheduling method, apparatus, electronic device and computer readable medium
Wang et al. Multi-Agent Systems for Collaborative Inference Based on Deep Policy Q-Inference Network
CN117993709A (en) Value risk information generation method, device, equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination