CN112671932B - Data processing method based on big data and cloud computing node

Info

Publication number: CN112671932B
Application number: CN202110099666.7A
Authority: CN
Inventors: 梁志彬
Original assignee: Zhonglin Yunxin Shanghai Network Technology Co Ltd
Current assignee: Zhonglin Yunxin (Shanghai) Network Technology Co., Ltd
Priority date: 2021-01-25
Filing date: 2021-01-25
Publication date: 2021-12-03
Anticipated expiration: 2041-01-25
Also published as: CN112671932A

Abstract

The application provides a data processing method based on big data and cloud computing, and a cloud computing node, and relates to the technical field of cloud computing. For an obtained data processing request, a first data processing cache space is created at a first cloud computing node and a second data processing cache space is created at a second cloud computing node, and a data transmission link and data transmission signaling are constructed between the first data processing cache space and the second data processing cache space. The first cloud computing node and the second cloud computing node then perform access data interaction through the data transmission link and the data transmission signaling, and an access data processing result between the first cloud computing node and a client is computed based on the interacted access data. The security of data access processing can thereby be improved.

Description

Data processing method based on big data and cloud computing node

Technical Field

The application relates to the technical field of cloud computing, in particular to a data processing method based on big data and cloud computing and a cloud computing node.

Background

With the development of the internet and cloud computing technology, service providers can offer services such as ride hailing, shopping and live streaming to users through the service platforms they develop, making people's lives more convenient.

A service provider can adopt cloud computing technology and use a plurality of devices as cloud computing nodes, thereby constructing a service network that provides a large amount of computing power.

However, in the prior art, access data exchanged between different cloud computing nodes usually undergoes only simple processing, so security is poor.

Disclosure of Invention

The present application aims to provide a data processing method based on big data and cloud computing and a cloud computing node, so as to solve at least some of the above technical problems.

In order to achieve the purpose, the technical scheme adopted by the application is as follows:

In a first aspect, the present application provides a data processing method based on big data and cloud computing, the method including: acquiring a data processing request for access data, where the data processing request indicates that there is a second cloud computing node that needs to perform access data interaction with a first cloud computing node; according to the data processing request, creating a first data processing cache space at the first cloud computing node, and causing the second cloud computing node to create a second data processing cache space locally; when the space configuration information of the first data processing cache space and the second data processing cache space satisfies a pre-configured cache control policy, constructing a data transmission link and data transmission signaling between the first data processing cache space and the second data processing cache space; and performing access data interaction with the second cloud computing node through the data transmission link and the data transmission signaling, and computing an access data processing result between the first cloud computing node and the second cloud computing node based on the interacted access data.

In a second aspect, the present application provides a cloud computing node comprising a memory for storing one or more programs, and a processor; when the one or more programs are executed by the processor, the data processing method based on big data and cloud computing described above is implemented.

According to the data processing method based on big data and cloud computing and the cloud computing node provided by the application, for an obtained data processing request, a first data processing cache space is created at a first cloud computing node, a second data processing cache space is created at a second cloud computing node, and a data transmission link and data transmission signaling are constructed between the first data processing cache space and the second data processing cache space. The first cloud computing node and the second cloud computing node then perform access data interaction through the data transmission link and the data transmission signaling, and an access data processing result between the first cloud computing node and a client is computed based on the interacted access data. Because a dedicated data processing cache space is created at each of the two nodes and the interaction is confined to the link and signaling constructed between those cache spaces, the security of data access processing can be improved.

Drawings

Fig. 1 is a structural block diagram of a cloud computing node provided in the present application.

Fig. 2 is a flowchart of a data processing method based on big data and cloud computing according to the present application.

Fig. 3 is another flowchart of a data processing method based on big data and cloud computing according to the present application.

Fig. 4 is a flowchart of a data processing apparatus based on big data and cloud computing according to the present application.

Detailed Description

It can be understood that the data processing method based on big data and cloud computing provided by the application can be applied to various scenarios, such as an urban intelligent cluster system, a large-scale new energy vehicle management system, a large-scale network platform, a cloud game system, a live streaming server system, a live streaming shopping system, a cloud office system using cloud computing technology, a financial data management system using cloud computing technology, and the like.

Referring to fig. 1, fig. 1 is a structural block diagram of a cloud computing node 100 provided in the present application, where the cloud computing node 100 includes a memory 101, a processor 102, and a communication interface 103, and the memory 101, the processor 102, and the communication interface 103 are electrically connected to each other directly or indirectly to implement data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines.

The memory 101 may be used to store software programs and modules, and the processor 102 executes the software programs and modules stored in the memory 101 to execute various functional applications and data processing, so as to execute the steps of the data processing method based on big data and cloud computing provided by the present application. The communication interface 103 may be used for communicating signaling or data with other node devices.

The memory 101 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.

The processor 102 may be an integrated circuit chip having signal processing capabilities. The processor 102 may be a general-purpose processor, such as a Central Processing Unit (CPU) or a Network Processor (NP); it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

Referring to fig. 2, fig. 2 is a flowchart of a data processing method based on big data and cloud computing according to the present application, where the data processing method includes the following steps:

S210, acquiring a data processing request for access data, where the data processing request indicates that there is a second cloud computing node that needs to perform access data interaction with the first cloud computing node.

In this embodiment, taking the first cloud computing node as the execution subject, the first cloud computing node may receive a data processing request sent by the second cloud computing node, where the data processing request indicates that access data of the first cloud computing node is to be processed.

S220, according to the data processing request, creating a first data processing cache space at the first cloud computing node, and causing the second cloud computing node to create a second data processing cache space locally.

In this embodiment, in response to the data processing request, the first cloud computing node and the second cloud computing node may respectively create a data processing cache space locally, that is, the first cloud computing node creates a first data processing cache space locally, the second cloud computing node creates a second data processing cache space locally, and both the first data processing cache space and the second data processing cache space are used for processing data.

S230, when the space configuration information of the first data processing cache space and the second data processing cache space meets a pre-configured cache control policy, constructing a data transmission link and a data transmission signaling between the first data processing cache space and the second data processing cache space.

In this embodiment, once the first data processing cache space and the second data processing cache space have been created, the first cloud computing node and the second cloud computing node may each verify their own cache space against a pre-configured policy, for example by performing security verification, stability verification, and the like. Only when the space configuration information of both the first data processing cache space and the second data processing cache space satisfies the pre-configured cache control policy are the data transmission link and the data transmission signaling between the two cache spaces constructed.
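The gating condition in step S230 can be illustrated with a minimal sketch. The application does not say what the space configuration information or the cache control policy contain, so the fields below (`capacity_bytes`, `encrypted`, `isolated`) and the names `SpaceConfig`, `CacheControlPolicy` and `may_build_link` are hypothetical, chosen only to show one security-style check and one stability-style check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpaceConfig:
    """Hypothetical space configuration information of one cache space."""
    capacity_bytes: int
    encrypted: bool
    isolated: bool


@dataclass(frozen=True)
class CacheControlPolicy:
    """Hypothetical pre-configured cache control policy."""
    min_capacity_bytes: int
    require_encryption: bool = True
    require_isolation: bool = True

    def is_satisfied_by(self, cfg: SpaceConfig) -> bool:
        # Stability-style check: the space must be large enough.
        if cfg.capacity_bytes < self.min_capacity_bytes:
            return False
        # Security-style checks: encryption and isolation, if required.
        if self.require_encryption and not cfg.encrypted:
            return False
        if self.require_isolation and not cfg.isolated:
            return False
        return True


def may_build_link(policy: CacheControlPolicy,
                   first: SpaceConfig, second: SpaceConfig) -> bool:
    """The link and signaling are built only if BOTH spaces pass."""
    return policy.is_satisfied_by(first) and policy.is_satisfied_by(second)
```

Under these assumptions, a failure on either node's cache space is enough to prevent the link from being constructed.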

S240, performing access data interaction with the second cloud computing node through the data transmission link and the data transmission signaling, and computing an access data processing result between the first cloud computing node and the second cloud computing node based on the interacted access data.

In this embodiment, the first cloud computing node may perform access data interaction with the second cloud computing node based on the created data transmission link and the created data transmission signaling, so as to compute an access data processing result between the first cloud computing node and the second cloud computing node based on the interacted access data.

Therefore, according to the technical scheme provided by the application, for the acquired data processing request, a first data processing cache space is created at the first cloud computing node, a second data processing cache space is created at the second cloud computing node, and a data transmission link and data transmission signaling are constructed between the first data processing cache space and the second data processing cache space. The first cloud computing node and the second cloud computing node then perform access data interaction through the data transmission link and the data transmission signaling, and an access data processing result between the first cloud computing node and the client is computed based on the interacted access data. Because a dedicated data processing cache space is created at each of the two nodes and the interaction is confined to the link and signaling constructed between those cache spaces, the security of data access processing can be improved.
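Steps S210 to S240 can be sketched end to end as a toy, in-process simulation. Everything here is an assumption made for illustration: the class `CloudNode`, the function `process_request`, the dictionary used as a cache space, and in particular the choice of "records held by both nodes" as the access data processing result, which the application leaves unspecified.

```python
from typing import Callable, Dict, List, Optional


class CloudNode:
    """Toy stand-in for a cloud computing node (hypothetical)."""

    def __init__(self, name: str, access_data: List[str]) -> None:
        self.name = name
        self.access_data = access_data
        self.cache_space: Optional[Dict] = None

    def create_cache_space(self) -> Dict:
        # S220: each node creates its own cache space locally.
        self.cache_space = {"owner": self.name, "received": []}
        return self.cache_space


def process_request(first: CloudNode, second: CloudNode,
                    policy_ok: Callable[[Dict], bool]) -> Optional[List[str]]:
    # S210 is assumed to have happened: a request naming `second` arrived.
    first_space = first.create_cache_space()
    second_space = second.create_cache_space()

    # S230: the link is built only if both spaces satisfy the policy.
    if not (policy_ok(first_space) and policy_ok(second_space)):
        return None

    # S240: exchange access data through the cache spaces only.
    first_space["received"] = list(second.access_data)
    second_space["received"] = list(first.access_data)

    # Assumed result: the records that both nodes hold.
    return sorted(set(first.access_data) & set(first_space["received"]))
```

For example, with first-node data `["a", "b", "c"]`, second-node data `["b", "c", "d"]` and a policy that accepts every space, the sketch returns `["b", "c"]`; with a policy that rejects the spaces it returns `None` and no data is exchanged.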

In this embodiment, when executing step S240 to obtain the access data processing result, the first cloud computing node may first exchange metadata information of the stored access data with the second cloud computing node through the data transmission link, where the metadata information includes the data packet size of the first access data of the first cloud computing node and the data packet size of the second access data of the second cloud computing node. The first cloud computing node then compares the data packet size of the second access data with the data packet size of the first access data to determine the data management mode of the second cloud computing node. Finally, according to the data transmission link, the first data processing cache space and the data transmission signaling, it performs access data interaction with the second cloud computing node using the access data processing strategy corresponding to that data management mode, and computes the access data processing result between the first cloud computing node and the second cloud computing node based on the interacted access data. In this way, second cloud computing nodes in different data management modes can be handled in a targeted manner, which improves the reliability of data access processing.

To determine the data management mode of the second cloud computing node from the two data packet sizes, the first cloud computing node may first calculate a data packet difference ratio between the data packet size of the second access data and the data packet size of the first access data. When the data packet difference ratio does not exceed a preset difference ratio threshold, the data management mode of the second cloud computing node is determined to be a parallel computing mode; for example, the parallel computing mode may instruct the second cloud computing node to compute with the same computing power as the first cloud computing node. When the data packet difference ratio exceeds the preset difference ratio threshold, the data management mode of the second cloud computing node is determined to be a differential computing mode; for example, the differential computing mode may instruct the second cloud computing node to compute with a computing power different from that of the first cloud computing node.
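The threshold rule above can be written as a short sketch. The application does not give a formula for the data packet difference ratio, so the definition used here (absolute size difference divided by the first node's packet size) and the default threshold of 0.2 are assumptions, as are the names `packet_difference_ratio` and `determine_mode`.

```python
from enum import Enum


class DataManagementMode(Enum):
    PARALLEL = "parallel"
    DIFFERENTIAL = "differential"


def packet_difference_ratio(first_size: int, second_size: int) -> float:
    """Assumed definition: |second - first| / first."""
    if first_size <= 0:
        raise ValueError("first_size must be positive")
    return abs(second_size - first_size) / first_size


def determine_mode(first_size: int, second_size: int,
                   threshold: float = 0.2) -> DataManagementMode:
    """Parallel if the ratio does not exceed the threshold, else differential."""
    ratio = packet_difference_ratio(first_size, second_size)
    if ratio <= threshold:
        return DataManagementMode.PARALLEL
    return DataManagementMode.DIFFERENTIAL
```

With these assumptions, packet sizes of 1000 and 1100 give a ratio of 0.1 and select the parallel computing mode, while sizes of 1000 and 2000 give a ratio of 1.0 and select the differential computing mode.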

In addition, in this embodiment, the interaction and computation described above may proceed in two stages. First, according to the data management mode of the second cloud computing node, the first cloud computing node determines, within the second cloud computing node, a target data interaction unit for performing access data interaction with the first cloud computing node, and determines the access data processing strategy of the target data interaction unit. Then, according to the access data processing strategy and the data transmission signaling, it performs access data interaction between the first access data and the second access data of the target data interaction unit through the data transmission link and the first data processing cache space, and computes the access data processing result between the first cloud computing node and the second cloud computing node based on the interacted access data.

The target data interaction unit and its access data processing strategy may be determined as follows. When the data management mode of the second cloud computing node is the parallel computing mode, the data interaction units used by the second cloud computing node in the parallel computing mode serve as the target data interaction units, and the access data processing strategy is determined to be parallel access data interaction. When the data management mode of the second cloud computing node is the differential computing mode, a preset number of data interaction units are selected as the target data interaction units from all data interaction units used by the second cloud computing node in the differential computing mode, and the access data processing strategy is determined to be differential access data interaction. When the data management mode of the second cloud computing node includes both the parallel computing mode and the differential computing mode, a preset number of data interaction units are selected from all data interaction units used by the second cloud computing node in the differential computing mode; these units, together with all data interaction units used by the second cloud computing node in the parallel computing mode, serve as the target data interaction units, and the access data processing strategy is determined to be hybrid access data interaction, in which differential access data interaction in the differential computing mode is performed after the parallel access data interaction is finished.
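The three selection cases can be sketched as a single function. The application does not say how the preset number of differential-mode units is chosen, so taking the first `preset_count` units is an assumption, and `select_target_units` and the string strategy labels are hypothetical names.

```python
from typing import List, Sequence, Tuple


def select_target_units(parallel_units: Sequence[str],
                        differential_units: Sequence[str],
                        preset_count: int) -> Tuple[List[str], str]:
    """Return (target data interaction units, access data processing strategy).

    `parallel_units` / `differential_units` are the units the second node
    uses in each mode; an empty sequence means the mode is not in use.
    """
    # Assumption: the "preset number" of units is simply the first N.
    chosen = list(differential_units[:preset_count])

    if parallel_units and differential_units:
        # Hybrid: all parallel units plus the chosen differential units;
        # the parallel interaction runs first, the differential one after.
        return list(parallel_units) + chosen, "hybrid"
    if parallel_units:
        return list(parallel_units), "parallel"
    if differential_units:
        return chosen, "differential"
    raise ValueError("the second node uses no data interaction units")
```

Keeping the parallel units ahead of the differential ones in the hybrid case mirrors the ordering stated in the text, where differential interaction follows parallel interaction.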

In addition, in this embodiment, the way the first cloud computing node performs the interaction and computes the access data processing result depends on the access data processing strategy. When the strategy is parallel access data interaction, first access data of a preset data volume and the second access data in the parallel computing mode are exchanged through the data transmission link and the data transmission signaling, and the association degree between the first access data and the second access data in the parallel computing mode is calculated in the first data processing cache space and taken as the access data processing result between the first cloud computing node and the second cloud computing node. When the strategy is differential access data interaction, the first cloud computing node receives to-be-computed access data sent by the second cloud computing node in the differential computing mode and computes the access data processing result between the first cloud computing node and the second cloud computing node from the to-be-computed access data. When the strategy is hybrid access data interaction, first access data of a preset data volume and the second access data in the parallel computing mode are exchanged, and the association degree between them is calculated in the first data processing cache space to obtain an intermediate access processing result; the first cloud computing node then receives the second access data sent by the second cloud computing node in the differential computing mode and computes the access data processing result between the first cloud computing node and the second cloud computing node from the intermediate access processing result and that second access data. In this way, second cloud computing nodes in different modes are handled adaptively, which improves the flexibility of data access processing.

It can be appreciated that, in some scenarios, the data transmission signaling includes interaction verification information, link verification information, and data encryption verification information. On this basis, in parallel access data interaction, the first cloud computing node may proceed as follows. According to the metadata information stored locally by the first cloud computing node in the parallel computing mode, it determines the preset data volume for access data interaction in the parallel computing mode and the number of data storage units used to store access data. It then creates that number of first data storage units in a cache area of the first cloud computing node's cache space other than the first data processing cache space. Next, it verifies the interaction verification information to obtain an interaction verification result, which remains active for the duration of one access data interaction, and uses the interaction verification result to mark the first access data. The marked first access data is then segmented into as many pieces of sub access data as there are storage units, a storage unit serial number is determined for each piece of sub access data, and each piece is stored in the first data storage unit with the corresponding serial number. First access data amounting to the preset data volume is then selected from all the first data storage units as target first access data and stored in the first data processing cache space. There, the target first access data is marked with the link verification information and the data encryption verification information, and the marked target first access data is sent to the second cloud computing node in the parallel computing mode through the data transmission link. Next, the first cloud computing node receives target second access data sent by the second cloud computing node in the parallel computing mode and performs decryption verification on it in the first data processing cache space using the data encryption verification information; when the decryption verification passes, the target second access data is marked with the link verification information. The first cloud computing node then calculates the association degree between the marked target second access data and the remaining first access data that was not sent to the second cloud computing node in the parallel computing mode, obtaining a first intermediate access processing result. It sends the first intermediate access processing result to the second cloud computing node in the parallel computing mode and receives a second intermediate access processing result from that node, where the second intermediate access processing result indicates the association degree, calculated by the second cloud computing node in the parallel computing mode, between the target first access data and the remaining second access data that was not sent to the first cloud computing node. Finally, the first intermediate access processing result and the second intermediate access processing result are combined to obtain the access data processing result between the first cloud computing node and the second cloud computing node. In this way, the access data processing result between the first cloud computing node and the second cloud computing node can be calculated comprehensively, which improves its reliability.
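The segmentation and cross-computation steps of the parallel flow can be sketched as follows. Several things here are assumptions rather than part of the application: the association degree is not defined in the text, so Jaccard similarity is used as a stand-in; the combined result is returned as a plain pair; and all marking, verification and encryption steps are omitted. The names `split_into_units`, `association_degree` and `parallel_exchange` are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_into_units(records: Sequence[str], unit_count: int) -> List[List[str]]:
    """Segment records into `unit_count` contiguous pieces.

    The list index plays the role of the storage unit serial number.
    """
    if unit_count <= 0:
        raise ValueError("unit_count must be positive")
    size, extra = divmod(len(records), unit_count)
    units, start = [], 0
    for i in range(unit_count):
        end = start + size + (1 if i < extra else 0)
        units.append(list(records[start:end]))
        start = end
    return units


def association_degree(a: Sequence[str], b: Sequence[str]) -> float:
    """Stand-in metric (assumption): Jaccard similarity of the two sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def parallel_exchange(first: Sequence[str], second: Sequence[str],
                      preset: int) -> Tuple[float, float]:
    """Each node sends `preset` records and keeps the rest locally."""
    target_first, remaining_first = first[:preset], first[preset:]
    target_second, remaining_second = second[:preset], second[preset:]
    # Computed on the first node: received data vs. its unsent remainder.
    first_intermediate = association_degree(target_second, remaining_first)
    # Computed on the second node: received data vs. its unsent remainder.
    second_intermediate = association_degree(target_first, remaining_second)
    # Assumed combination: the pair of intermediate results.
    return first_intermediate, second_intermediate
```

In this sketch neither node ever sends its remaining records; each only sees the preset portion from its peer plus the peer's intermediate result, which matches the division of work described in the paragraph above.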

Additionally, in some embodiments, the to-be-computed access data includes first to-be-computed access data and second to-be-computed access data, and the data transmission signaling includes data encryption verification information, link verification information, and interaction verification information. On this basis, differential access data interaction may proceed in one of two ways, depending on the data packet sizes. When the data packet size of the first access data is larger than the data packet size of the second access data of the second cloud computing node in the differential computing mode, the first cloud computing node receives the first to-be-computed access data sent by the second cloud computing node in the differential computing mode and performs decryption verification on it in the first data processing cache space using the data encryption verification information. When the decryption verification passes, the first to-be-computed access data is marked with the link verification information in the first data processing cache space. The association degree between the marked first to-be-computed access data and the first access data is then calculated in the first data processing cache space to obtain a target access data processing result, which is taken as the access data processing result between the first cloud computing node and the second cloud computing node. Here, the first to-be-computed access data is the second access data obtained after the second cloud computing node in the differential computing mode performs data cleaning according to a preset filtering strategy. Otherwise, when the data packet size of the first access data is smaller than or equal to the data packet size of the second access data in the differential computing mode, the first access data is marked with the interaction verification information, and the marked first access data is stored in a data storage unit. The data storage unit is then stored in the first data processing cache space, and the first access data in the data storage unit is marked with the link verification information and the data encryption verification information to obtain marked first access data. The marked first access data is sent to the second cloud computing node in the differential computing mode, so that the second cloud computing node in the differential computing mode combines the marked first access data with the second access data it stores to obtain the second to-be-computed access data. Finally, the first cloud computing node receives the second to-be-computed access data sent by the second cloud computing node in the differential computing mode and takes it as the access data processing result between the first cloud computing node and the second cloud computing node.
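The application does not specify how data marking or decryption verification is carried out. As one possible reading, the sketch below uses an HMAC-SHA256 tag from the Python standard library as a stand-in for a mark that the receiver can verify; it provides integrity checking only, not encryption, and the shared `key` stands in for the verification information carried by the data transmission signaling. The names `mark`, `verify` and `differential_path` are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def mark(payload: bytes, key: bytes) -> bytes:
    """Prefix the payload with an HMAC-SHA256 tag (stand-in for data marking)."""
    return hmac.new(key, payload, hashlib.sha256).digest() + payload


def verify(marked: bytes, key: bytes) -> bytes:
    """Check the tag and return the payload; raise if verification fails."""
    tag, payload = marked[:TAG_LEN], marked[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("verification failed")
    return payload


def differential_path(first_size: int, second_size: int) -> str:
    """Pick the branch of the differential flow from the two packet sizes."""
    if first_size > second_size:
        # The second node sends cleaned data; the first node computes.
        return "first_node_computes"
    # The first node sends marked data; the second node combines.
    return "second_node_combines"
```

The branch function reflects the size comparison in the text: the node holding the larger data set does the computation, so the smaller data set is the one that crosses the link.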

In addition, in some embodiments, hybrid access data interaction may proceed as follows. The first cloud computing node first determines, within the first access data, first target to-be-computed access data for access data interaction and marks it using the data transmission signaling. It then sends the marked first target to-be-computed access data to the second cloud computing node in the parallel computing mode through the data transmission link and receives second target to-be-computed access data sent by the second cloud computing node in the parallel computing mode. Next, in the first data processing cache space, it calculates the association degree between the second target to-be-computed access data and the first access data other than the first target to-be-computed access data, obtaining an intermediate access processing result between the first access data and the second access data sent by the second cloud computing node in the parallel computing mode. When the data packet size of the access data in the intermediate access processing result is larger than the data packet size of the second access data of the second cloud computing node in the differential computing mode, the first cloud computing node receives the current second access data sent by the second cloud computing node in the differential computing mode, calculates the association degree between the intermediate access processing result and the current second access data to obtain a current access data processing result, and takes the current access data processing result as the access data processing result between the first cloud computing node and the second cloud computing node. When the data packet size of the access data in the intermediate access processing result is smaller than or equal to the data packet size of the second access data of the second cloud computing node in the differential computing mode, the first cloud computing node marks the intermediate access processing result according to the data transmission signaling, sends the marked intermediate access processing result to the second cloud computing node in the differential computing mode, receives the access data association result that the second cloud computing node in the differential computing mode returns after calculating the association degree for the intermediate access processing result, and takes the access data association result as the access data processing result between the first cloud computing node and the second cloud computing node. In this way, the scheme provided by the application can improve the reliability of the access data processing result.

In addition, based on the access data processing result obtained by the above scheme provided by the application, the first cloud computing node may further store the access data processing result.

Therefore, as an embodiment, as shown in fig. 3, the data processing method described above may further include the following steps:

S310, packaging the access data processing result to obtain a target data packet to be put into storage.

S320, respectively performing service data extraction and protocol data extraction on the plurality of data blocks in the target data packet to obtain a service data sequence and a protocol data sequence.

In this embodiment, all data in the target data packet may be divided into a plurality of data blocks according to the order in which the data was received, and each data block may include service data and protocol data. For example, the service data may be a surveillance video code stream shot by a surveillance camera, or goods sales data generated by an intelligent shelf; the protocol data may be a heartbeat message or an ARP (Address Resolution Protocol) message.

In this embodiment, in the service data sequence and the protocol data sequence obtained by performing step S320, the service data sequence includes all service data in the corresponding data block, and the protocol data sequence includes all protocol data in the corresponding data block.
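As a minimal sketch of step S320, assume that each data block is a list of records already tagged as service or protocol data. The `DataBlock` layout and the `split_blocks` helper are hypothetical and are not defined by the application.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DataBlock:
    # Hypothetical layout: arrival time plus records tagged "service" or "protocol".
    received_at: float
    records: List[Tuple[str, bytes]] = field(default_factory=list)


def split_blocks(blocks):
    """Return (service_sequence, protocol_sequence), one entry per block, in arrival order."""
    ordered = sorted(blocks, key=lambda b: b.received_at)
    service_seq = [[p for kind, p in b.records if kind == "service"] for b in ordered]
    protocol_seq = [[p for kind, p in b.records if kind == "protocol"] for b in ordered]
    return service_seq, protocol_seq


blocks = [
    DataBlock(2.0, [("protocol", b"heartbeat")]),
    DataBlock(1.0, [("service", b"video-frame"), ("protocol", b"arp")]),
]
service_seq, protocol_seq = split_blocks(blocks)
```

Each sequence keeps one entry per data block, so a block without service data yields an empty entry rather than being dropped.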

S330, performing first key information extraction on the service data sequence through a first key information extraction strategy to obtain a first key information data set comprising service data.

S340, performing second key information extraction on the protocol data sequence through a second key information extraction strategy to obtain a second key information data set comprising the protocol data.

In this embodiment, key information extraction is performed on the extracted service data sequence through a first key information extraction strategy configured in advance for the service data, and on the extracted protocol data sequence through a second key information extraction strategy configured in advance for the protocol data, so as to obtain a first key information data set and a second key information data set respectively. It is understood that the first key information data set obtained in step S330 is a data set including service data, and the second key information data set obtained in step S340 is a data set including protocol data.

S350, performing target information matching based on the first key information data set and the second key information data set to obtain a target extraction data set corresponding to target extraction content in the target data packet.

In this embodiment, the target extraction content may be extraction content input by a user, that is, the content that the user needs to extract. The target extraction content includes at least one of service data and protocol data; that is, the user may extract part of the service data, part of the protocol data, or part of both. In this way, after the first key information data set and the second key information data set are obtained, target information matching is performed on them based on the target extraction content to obtain the target extraction data set corresponding to the target extraction content in the target data packet, and the target data packet may then be put into storage based on the target extraction data set. For example, the target data packet may be compressed and encrypted by using a hash value corresponding to the target extraction data set as the key, and the compressed and encrypted target data packet is then put into storage.
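The hash-derived key in the example above can be sketched as follows. SHA-256, the JSON serialization and the zlib compression are assumed choices, since the application names no algorithm. The encryption step itself is left out of the sketch on purpose and would be performed with a vetted cipher library; `derive_key` and `prepare_for_storage` are hypothetical names.

```python
import hashlib
import json
import zlib


def derive_key(target_extraction_set):
    """Hash a canonical serialization of the target extraction data set."""
    canonical = json.dumps(sorted(target_extraction_set)).encode("utf-8")
    return hashlib.sha256(canonical).digest()


def prepare_for_storage(packet: bytes, target_extraction_set):
    """Return (key, compressed packet); encrypting with the key is not shown."""
    key = derive_key(target_extraction_set)
    compressed = zlib.compress(packet)
    return key, compressed


key, compressed = prepare_for_storage(b"payload" * 10, {"video", "heartbeat"})
```

Sorting the set before hashing makes the key independent of the order in which the extracted items were collected.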

Therefore, by adopting the technical scheme provided by the application, service data extraction and protocol data extraction are respectively performed on the plurality of data blocks of the target data packet to be put into storage to obtain the service data sequence and the protocol data sequence, that is, the data in the data blocks of the target data packet are classified into a service data class and a protocol data class; key information extraction is then performed on the service data sequence and the protocol data sequence through the first key information extraction strategy and the second key information extraction strategy respectively, to obtain a first key information data set containing service data and a second key information data set containing protocol data; target information matching is then performed based on the first key information data set and the second key information data set to obtain the target extraction data set corresponding to the target extraction content, so that the target data packet is put into storage based on the target extraction data set. Compared with the prior art, the security of data packet storage can be improved.

As an embodiment, when the service data sequence and the protocol data sequence are extracted in step S320, the following scheme may be adopted in order to improve the accuracy of the service data and the protocol data. First, service data extraction is respectively performed on the plurality of data blocks in the target data packet to obtain the service data extraction windows in each data block and the initial service data corresponding to each service data extraction window; for example, in this embodiment, each service data extraction window may be an extraction window corresponding to one kind of service data, such as video data or audio data. Then, the service data sequence is determined based on the service data extraction windows and the corresponding initial service data in each data block. Next, protocol data identification is respectively performed on the plurality of data blocks in the target data packet to obtain the protocol data content corresponding to each data block, and protocol type identification is respectively performed on the plurality of data blocks to obtain the target protocol type followed by each data block. The protocol data content is then associated with the target protocol type, and protocol data extraction is performed based on the data packets, in the target data packet, of the target protocol type associated with preset target protocol data content, to obtain the protocol data sequence.

In addition, as an embodiment, when step S330 is executed to extract the first key information data set, the following scheme may be adopted. For each data block corresponding to the service data sequence, when the data block has at least two pieces of initial service data, a service tag value of each piece of initial service data may be obtained, where the service tag value indicates the number of times the corresponding service data is counted within a preset time interval. When exactly one piece of initial service data has the highest service tag value, that initial service data is taken as the target service data of the corresponding data block. When at least two pieces of initial service data share the highest service tag value, the service type priority of the service data extraction window corresponding to each of them is acquired, and the initial service data whose service type priority is highest is taken as the target service data of the corresponding data block. Then, for each data block, a target window ratio of the service data extraction window corresponding to the target service data is acquired, where the target window ratio indicates the ratio of the length of that service data extraction window to the length of all the service data extraction windows. When the target window ratio is within a preset window ratio interval, the corresponding service data extraction result is retained, the retained result comprising the service data extraction window and its target service data; when the target window ratio is not within the preset window ratio interval, the service data extraction result of the corresponding data block is set to an empty service data set. Next, an updated service data sequence is obtained based on the service data extraction result corresponding to each data block, and timestamp label identification is performed on the updated service data sequence to obtain multiple groups of service start data and service end data. The service data duration between each group of service start data and service end data is then determined, and when the service data duration is greater than or equal to a first set duration threshold, the key information data set formed by the service start data and service end data of the corresponding group is taken as a first candidate key information data set. For each first candidate key information data set, the characteristic service category with the largest occurrence count is determined according to the updated target service data corresponding to each data block in that set, and the characteristic service category is taken as the service category to which the service data in that set belongs. Finally, when at least two first candidate key information data sets that are adjacent in time sequence belong to the same service category, they are combined to obtain the first key information data set corresponding to that service category. Therefore, by adopting the scheme provided by the application, the first key information data set can be extracted accurately and pollution by dirty data is avoided.
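Three parts of the scheme above lend themselves to a short sketch: the tie-break on the service tag value, the window ratio filter, and the merging of adjacent candidate sets. The `Candidate` record, the inclusive interval bounds and the tuple layout `(start, end, category)` are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    data: str          # initial service data (hypothetical representation)
    tag_value: int     # times counted within the preset time interval
    priority: int      # service type priority of its extraction window (higher wins)
    window_len: float  # length of its service data extraction window


def pick_target(cands: List[Candidate]) -> Candidate:
    """Highest tag value wins; ties are broken by service type priority."""
    top = max(c.tag_value for c in cands)
    tied = [c for c in cands if c.tag_value == top]
    return max(tied, key=lambda c: c.priority)


def extract_block(cands, ratio_lo, ratio_hi) -> Optional[Candidate]:
    """Return the retained target, or None to represent an empty service data set."""
    if not cands:
        return None
    target = pick_target(cands)
    ratio = target.window_len / sum(c.window_len for c in cands)
    # Assumption: the preset window ratio interval is closed on both ends.
    return target if ratio_lo <= ratio <= ratio_hi else None


def merge_adjacent(sets):
    """Merge time-adjacent candidate sets (start, end, category) that share a category."""
    merged = []
    for start, end, cat in sets:
        if merged and merged[-1][2] == cat:
            merged[-1] = (merged[-1][0], end, cat)
        else:
            merged.append((start, end, cat))
    return merged


cands = [
    Candidate("video", 5, 2, 6.0),
    Candidate("audio", 5, 1, 2.0),
    Candidate("text", 3, 9, 2.0),
]
kept = extract_block(cands, 0.5, 0.9)
dropped = extract_block(cands, 0.7, 0.9)
merged = merge_adjacent([(0, 10, "video"), (10, 20, "video"), (20, 30, "audio")])
```

Here "video" and "audio" tie on the tag value, so the priority decides; the window ratio of the winner is 6 / 10.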

In some scenarios, the service data extraction results in the service data sequence include both empty service data sets and non-empty service data sets; that is, some extraction results in the service data sequence are empty sets containing no service data.

Based on this, when performing timestamp label identification on the updated service data sequence to obtain multiple groups of service start data and service end data, the data block corresponding to the first non-empty service data set in the current identification cycle in the updated service data sequence may be used as the service start data of the current group. The data blocks after the service start data of the current group are then traversed; when the service data extraction result corresponding to the traversed current data block is an empty service data set and the service data extraction results corresponding to the data blocks within a second set duration threshold from the current data block are all empty service data sets, the current data block is taken as the service end data of the current group. Then, the data block corresponding to the first non-empty service data set after the service end data of the current group is taken as the service start data of the next identification cycle, and the process returns to the step of traversing the data blocks after the service start data until multiple groups of service start data and service end data are obtained.
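The traversal described above can be sketched as follows, assuming that each data block is reduced to a pair of its timestamp and a flag saying whether its extraction result is non-empty. The helper name `group_runs` and the handling of a run that is still open when the sequence ends are assumptions, not part of the application.

```python
def group_runs(blocks, gap):
    """blocks: list of (timestamp, is_non_empty) in time order.

    Returns a list of (start_timestamp, end_timestamp) pairs, where `gap`
    plays the role of the second set duration threshold.
    """
    groups = []
    i, n = 0, len(blocks)
    while i < n:
        # Service start data: first non-empty block of this identification cycle.
        while i < n and not blocks[i][1]:
            i += 1
        if i == n:
            break
        start = blocks[i][0]
        end = None
        j = i + 1
        while j < n:
            ts, non_empty = blocks[j]
            if not non_empty:
                # Blocks that follow the current one within the threshold.
                window = [b for b in blocks[j + 1:] if b[0] - ts <= gap]
                if all(not flag for _, flag in window):
                    end = ts
                    break
            j += 1
        if end is None:
            # Assumption: a run still open at the end closes at the last block.
            end = blocks[-1][0]
            j = n
        groups.append((start, end))
        i = j + 1
    return groups


blocks = [(0, False), (1, True), (2, True), (3, False), (4, True),
          (5, False), (6, False), (7, False), (8, True), (9, False)]
groups = group_runs(blocks, gap=2)
```

The empty block at timestamp 3 does not end the first group because a non-empty block follows within the threshold; the block at timestamp 5 does.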

In addition, in order to improve the integrity of data processing and avoid data omission, before the current data block is taken as the service end data of the current group (that is, when the service data extraction result corresponding to the traversed current data block is an empty service data set and the service data extraction results corresponding to the data blocks within the second set duration threshold from the current data block are all empty service data sets), the following may be performed. When the duration of the key information data set determined by the traversed current data block and the service start data of the current group is less than a third set duration threshold, whether the service data extraction result corresponding to the current data block is an empty service data set is determined first. When the service data extraction result corresponding to the current data block is a non-empty service data set, the current data block is taken as part of the key information data set corresponding to the current group. When the current data block corresponds to an empty service data set and the service data extraction results within the second set duration threshold from the current data block include a non-empty service data set, the data block corresponding to the first non-empty service data set within the second set duration threshold from the current data block is taken as the next current data block to be traversed, and the process returns to the step of determining, when the duration of the key information data set determined by the traversed current data block and the service start data of the current group is less than the third set duration threshold, whether the service data extraction result corresponding to the current data block is an empty service data set.

Optionally, in this embodiment, when a data block corresponding to a first non-empty service data set in a current identification cycle in the updated service data sequence is used as service start data of a current group, a target data packet corresponding to the first non-empty service data set in the current identification cycle in the updated service data sequence may be obtained first; then, when the service data extraction result corresponding to the next data packet of the target data packet is an empty service data set, setting the service data extraction result corresponding to the target data packet as an empty service data set; or, when the service data extraction result corresponding to the next data packet of the target data packet is a non-empty service data set, using the target data packet as the service start data of the current group.

In addition, in this embodiment, as an implementation manner, when performing second key information extraction on the protocol data sequence through the second key information extraction strategy to obtain a second key information data set including protocol data, timestamp label identification may be performed on each piece of protocol data in the protocol data sequence to obtain a plurality of second candidate key information data sets including protocol data; the second candidate key information data sets belonging to the same protocol type are then merged according to the protocol type corresponding to each second candidate key information data set, to obtain a second key information data set including protocol data. It can be understood that, unlike the wide variety of service data, protocol data generally follows a strict protocol standard, and a simpler extraction method can therefore be adopted.
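The merge by protocol type can be sketched as a simple grouping. The representation of a candidate set as a pair of protocol type and timestamped items, and the name `merge_by_protocol`, are hypothetical.

```python
from collections import defaultdict


def merge_by_protocol(candidate_sets):
    """candidate_sets: list of (protocol_type, [(timestamp, item), ...]).

    Returns {protocol_type: merged items sorted by timestamp}.
    """
    merged = defaultdict(list)
    for protocol_type, items in candidate_sets:
        merged[protocol_type].extend(items)
    return {ptype: sorted(items) for ptype, items in merged.items()}


second_sets = merge_by_protocol([
    ("ARP", [(3, "a")]),
    ("heartbeat", [(1, "h")]),
    ("ARP", [(2, "b")]),
])
```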

In addition, in this embodiment, as an implementation manner, when performing target information matching based on the first key information data set and the second key information data set to obtain a target extraction data set corresponding to target extraction content in the target data packet, core key information in the target extraction content sent by a maintenance device may be obtained first, and it is understood that the maintenance device is a device used by a manager; then, performing keyword screening on the core key information to obtain screened core key information, and adding the screened core key information to a corresponding information processing node; next, performing key information matching on the first key information data set and the second key information data set based on the screened core key information and the information processing node to obtain at least one piece of initial matching key information; and then, determining at least one target matching key information which accords with a preset information screening strategy in all the initial matching key information, and constructing all the target matching key information into a target extraction data set.

Optionally, as an embodiment, the core key information includes a plurality of core keywords. Based on this, when keyword screening is performed on the core key information to obtain the screened core key information, the historical statistical count and the threshold statistical count of each core keyword in the core key information may be obtained first; the core keywords for which the difference between the historical statistical count and the threshold statistical count exceeds a set count threshold are then deleted, and the remaining core keywords are taken as the screened core key information.
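A minimal sketch of this screening step follows. It assumes that the "difference" is the absolute difference between the two counts, which the application does not specify; `screen_keywords` is a hypothetical name.

```python
def screen_keywords(stats, max_diff):
    """stats: {keyword: (historical_count, threshold_count)}.

    Keeps the keywords whose absolute count difference does not exceed max_diff.
    """
    return [kw for kw, (hist, thr) in stats.items() if abs(hist - thr) <= max_diff]


screened = screen_keywords(
    {"camera": (12, 10), "shelf": (40, 10), "arp": (9, 10)},
    max_diff=5,
)
```

With these sample counts, "shelf" is deleted because its historical count differs from the threshold count by 30.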

In addition, in this embodiment, as an implementation manner, when performing key information matching on the first key information data set and the second key information data set based on the screened core key information and the information processing node to obtain at least one piece of initial matching key information, the information processing node may first be used to calculate, for each piece of screened core key information, a first key information association degree with the first key information data set and a second key information association degree with the second key information data set. Then, for each piece of screened core key information, when the corresponding first key information association degree and the corresponding second key information association degree are both greater than a preset association degree threshold, the corresponding core key information is determined as initial matching key information; when the corresponding first key information association degree and the corresponding second key information association degree are both smaller than or equal to the preset association degree threshold, the corresponding core key information is determined as initial matching key information; and when one of the corresponding first key information association degree and the corresponding second key information association degree is greater than the preset association degree threshold and the other is smaller than or equal to the preset association degree threshold, the corresponding core key information is discarded. Therefore, according to the scheme provided by the application, the accuracy of acquiring the key information can be improved.

As an implementation manner, when determining at least one piece of target matching key information meeting a preset information screening strategy among all the initial matching key information, each feature matching value of each piece of initial matching key information may first be calculated based on at least one pre-configured feature value calculation strategy, and an initial evaluation score of each piece of initial matching key information is calculated based on its feature matching values; then, an evaluation score threshold corresponding to each piece of initial matching key information is determined based on the key information type to which it belongs; next, whether the initial evaluation score of each piece of initial matching key information is greater than the corresponding evaluation score threshold is judged; finally, each piece of initial matching key information whose initial evaluation score is greater than its corresponding evaluation score threshold is determined as target matching key information.
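The screening above can be sketched as follows. The application does not say how the feature matching values are combined into an initial evaluation score, so the sketch assumes a plain average; the two feature functions and the threshold table are invented sample values.

```python
def select_targets(items, feature_fns, thresholds):
    """items: list of (info, info_type); thresholds: {info_type: score threshold}.

    Assumption: the initial evaluation score is the mean of the feature matching values.
    """
    selected = []
    for info, info_type in items:
        values = [fn(info) for fn in feature_fns]
        score = sum(values) / len(values)
        if score > thresholds[info_type]:
            selected.append(info)
    return selected


# Hypothetical feature value calculation strategies.
feature_fns = [
    lambda s: len(s) / 10,
    lambda s: 1.0 if s.islower() else 0.0,
]
targets = select_targets(
    [("heartbeat", "protocol"), ("Video", "service")],
    feature_fns,
    {"protocol": 0.8, "service": 0.5},
)
```

Keeping the thresholds in a table keyed by key information type mirrors the per-type threshold described in the text.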

In addition, referring to fig. 4, fig. 4 is a block diagram of a data processing apparatus 400 based on big data and cloud computing according to the present application, where the data processing apparatus 400 includes an obtaining module 410 and a processing module 420.

An obtaining module 410, configured to obtain a data processing request for accessing data; the data processing request indicates that a second cloud computing node which needs to perform access data interaction with the first cloud computing node exists;

a processing module 420, configured to create a first data processing cache space at a first cloud computing node according to the data processing request, and enable the second cloud computing node to create a second data processing cache space locally at the second cloud computing node;

the processing module 420 is further configured to, when the space configuration information of the first data processing cache space and the second data processing cache space meets a pre-configured cache control policy, construct a data transmission link and a data transmission signaling between the first data processing cache space and the second data processing cache space;

the processing module 420 is further configured to perform access data interaction with the second cloud computing node through the data transmission link and the data transmission signaling, and calculate an access data processing result between the first cloud computing node and the second cloud computing node based on the interacted access data.

Optionally, when the processing module 420 performs access data interaction with the second cloud computing node through the data transmission link and the data transmission signaling, and calculates an access data processing result between the first cloud computing node and the second cloud computing node based on the interacted access data, the processing module may be configured to:

exchanging metadata information of the access data stored in the second cloud computing node with the second cloud computing node through the data transmission link; wherein the metadata information comprises a packet size of first access data of a first cloud computing node and a packet size of second access data of a second cloud computing node;

comparing the packet size of the second access data with the packet size of the first access data to determine a data management mode of the second cloud computing node;

and according to the data transmission link, the first data processing cache space and the data transmission signaling, performing access data interaction with the second cloud computing node by adopting an access data processing strategy corresponding to the data management mode of the second cloud computing node, and calculating an access data processing result between the first cloud computing node and the second cloud computing node based on the interacted access data.

Optionally, when comparing the packet size of the second access data with the packet size of the first access data to determine the data management mode of the second cloud computing node, the processing module 420 may be configured to:

calculating a data packet difference ratio of the data packet size of the second access data to the data packet size of the first access data of the first cloud computing node;

when the data packet difference ratio does not exceed a preset difference ratio threshold, determining that the data management mode of the second cloud computing node is a parallel computing mode;

and when the data packet difference ratio exceeds the preset difference ratio threshold, determining that the data management mode of the second cloud computing node is a difference computing mode.
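The mode decision above can be sketched as follows. This passage does not give a formula for the data packet difference ratio, so the sketch assumes the relative difference with respect to the packet size of the first access data; `data_management_mode` is a hypothetical name.

```python
def data_management_mode(first_size: int, second_size: int, ratio_threshold: float) -> str:
    """Return "parallel" or "difference" for the second cloud computing node."""
    # Assumed definition: |second - first| / first.
    diff_ratio = abs(second_size - first_size) / first_size
    return "parallel" if diff_ratio <= ratio_threshold else "difference"


mode_close = data_management_mode(1000, 1100, 0.2)
mode_far = data_management_mode(1000, 1500, 0.2)
```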

Optionally, when performing access data interaction with the second cloud computing node by using the access data processing policy corresponding to the data management mode of the second cloud computing node according to the data transmission link, the first data processing cache space, and the data transmission signaling, and calculating an access data processing result between the first cloud computing node and the second cloud computing node based on the interacted access data, the processing module 420 may be configured to:

according to the data management mode of the second cloud computing node, determining a target data interaction unit for performing access data interaction with the first cloud computing node in the second cloud computing node, and determining an access data processing strategy of the target data interaction unit;

according to the access data processing strategy and the data transmission signaling, performing access data interaction on the first access data and second access data of the target data interaction unit through the data transmission link and the first data processing cache space, and calculating an access data processing result between the first cloud computing node and the second cloud computing node based on the interacted access data;

when determining, in the second cloud computing node, a target data interaction unit for performing access data interaction with the first cloud computing node according to the data management mode of the second cloud computing node, and determining an access data processing policy of the target data interaction unit, the processing module 420 may be configured to:

when the data management mode of the second cloud computing node is a parallel computing mode, taking the data interaction unit used by the second cloud computing node in the parallel computing mode as the target data interaction unit for performing access data interaction with the first cloud computing node, and determining that the access data processing strategy of the target data interaction unit is parallel access data interaction;

when the data management mode of the second cloud computing node is a difference computing mode, determining a preset number of data interaction units as target data interaction units from all data interaction units used by the second cloud computing node in the difference computing mode, and determining an access data processing strategy of the target data interaction units as difference access data interaction;

when the data management mode of the second cloud computing node comprises both the parallel computing mode and the difference computing mode, determining a preset number of data interaction units among all data interaction units used by the second cloud computing node in the difference computing mode, taking the preset number of data interaction units together with all data interaction units used by the second cloud computing node in the parallel computing mode as target data interaction units, and determining that the access data processing strategy of the target data interaction units is mixed access data interaction; in the mixed access data interaction, the parallel access data interaction is first carried out in the parallel computing mode, and after the parallel access data interaction is finished, the difference access data interaction is carried out in the difference computing mode;
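The three cases above can be sketched as one selection function. How the preset number of difference-mode units is chosen is not stated, so the sketch simply takes the first units in the list; the function name and the string labels are hypothetical.

```python
def choose_targets(modes, parallel_units, difference_units, preset_count):
    """modes: set containing "parallel" and/or "difference".

    Returns (target data interaction units, access data processing strategy).
    """
    # Assumption: the preset number of units is taken from the front of the list.
    picked = difference_units[:preset_count]
    if modes == {"parallel"}:
        return list(parallel_units), "parallel"
    if modes == {"difference"}:
        return picked, "difference"
    if modes == {"parallel", "difference"}:
        return list(parallel_units) + picked, "mixed"
    raise ValueError("unknown data management mode: %r" % (modes,))


units, strategy = choose_targets(
    {"parallel", "difference"}, ["p1", "p2"], ["d1", "d2", "d3"], preset_count=2
)
```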

when performing access data interaction on the first access data and the second access data of the target data interaction unit through the data transmission link and the first data processing cache space according to the access data processing policy and the data transmission signaling, and calculating an access data processing result between the first cloud computing node and the second cloud computing node based on the interacted access data, the processing module 420 may be configured to:

when the access data processing strategy is parallel access data interaction, performing access data interaction on the first access data with a preset data volume and the second access data in the parallel computing mode through the data transmission link and the data transmission signaling, and calculating in the first data processing cache space to obtain a correlation degree between the first access data and the second access data in the parallel computing mode to serve as an access data processing result between the first cloud computing node and the second cloud computing node;

when the access data processing strategy is difference access data interaction, receiving access data to be calculated sent by the second cloud computing node in the difference computing mode, and calculating an access data processing result between the first cloud computing node and the second cloud computing node according to the access data to be calculated;

when the access data processing strategy is mixed access data interaction, performing access data interaction on the first access data with a preset data volume and the second access data in the parallel computing mode, calculating in the first data processing cache space the association degree between the first access data and the second access data in the parallel computing mode to obtain an intermediate access processing result, receiving the second access data sent by the second cloud computing node in the difference computing mode, and calculating an access data processing result between the first cloud computing node and the second cloud computing node according to the intermediate access processing result and the second access data sent by the second cloud computing node in the difference computing mode.

Optionally, the data transmission signaling includes interaction verification information, link verification information, and data encryption verification information; when performing access data interaction on the first access data with a preset data size and the second access data in the parallel computing mode through the data transmission link and the data transmission signaling, and calculating in the first data processing cache space the association degree between the first access data and the second access data in the parallel computing mode as the access data processing result between the first cloud computing node and the second cloud computing node, the processing module 420 may be specifically configured to:

according to metadata information locally stored by the first cloud computing node in the parallel computing mode, determining the size of a preset data volume of access data interaction of the first cloud computing node in the parallel computing mode and the number of storage units of a data storage unit for storing the access data;

creating first data storage units corresponding to the number of the storage units in a cache area except the first data processing cache space in the cache space of the first cloud computing node;

verifying the interactive verification information to obtain an interactive verification result; wherein the interaction verification result keeps an activated state in one access data interaction;

performing data marking on the first access data by adopting the interactive verification result;

performing data segmentation on the data-marked first access data to obtain sub access data corresponding to the number of the storage units, and determining the storage unit serial number of the first data storage unit in which each sub access data is to be stored;

storing each sub access data into a first data storage unit corresponding to the storage unit serial number;

determining first access data corresponding to the preset data size in all the first data storage units as target first access data, and storing the target first access data to the first data processing cache space;

data marking is carried out on the target first access data in the first data processing cache space by adopting the link verification information and the data encryption verification information, and the target first access data after data marking is sent to a second cloud computing node in the parallel computing mode through the data transmission link;

receiving target second access data sent by a second cloud computing node in the parallel computing mode, and performing decryption verification on the target second access data by using the data encryption verification information in the first data processing cache space;

when the decryption verification of the target second access data passes, marking the target second access data with the link verification information;

calculating the association degree between the marked target second access data and the remaining first access data that have not been sent to the second cloud computing node in the parallel computing mode, to obtain a first intermediate access processing result;

sending the first intermediate access processing result to the second cloud computing node in the parallel computing mode, and receiving a second intermediate access processing result sent by the second cloud computing node in the parallel computing mode; the second intermediate access processing result indicates the association degree, calculated by the second cloud computing node in the parallel computing mode, between the target first access data and the remaining second access data that have not been sent to the first cloud computing node;

and combining the first intermediate access processing result and the second intermediate access processing result to obtain an access data processing result between the first cloud computing node and the second cloud computing node.
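The segmentation and target-selection steps above can be sketched as follows. This is a minimal illustration only: the function names `split_into_units` and `select_target`, the near-equal split, and the choice to take the first `preset_volume` bytes in serial-number order are all assumptions, since the source does not specify how sub access data are sized or how the target portion is chosen.

```python
from typing import List, Tuple

Unit = Tuple[int, bytes]  # (storage unit serial number, sub access data)


def split_into_units(data: bytes, unit_count: int) -> List[Unit]:
    """Split marked access data into unit_count near-equal pieces.

    Each piece is tagged with the serial number of the storage unit
    that will hold it (hypothetical layout, not taken from the source).
    """
    if unit_count <= 0:
        raise ValueError("unit_count must be positive")
    base, extra = divmod(len(data), unit_count)
    units: List[Unit] = []
    offset = 0
    for serial in range(unit_count):
        size = base + (1 if serial < extra else 0)
        units.append((serial, data[offset:offset + size]))
        offset += size
    return units


def select_target(units: List[Unit], preset_volume: int) -> Tuple[bytes, bytes]:
    """Return (target, remaining) access data.

    Assumption: the target is the first preset_volume bytes when the
    storage units are read back in serial-number order.
    """
    joined = b"".join(chunk for _, chunk in sorted(units, key=lambda u: u[0]))
    return joined[:preset_volume], joined[preset_volume:]
```

In this sketch the target portion would be the part exchanged with the second cloud computing node, while the remaining portion stays local for the association-degree calculation.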

Optionally, the access data to be calculated includes first access data to be calculated and second access data to be calculated; the data transmission signaling comprises data encryption verification information, link verification information and interaction verification information;

when receiving the to-be-computed access data sent by the second cloud computing node in the differential computing mode and computing the access data processing result between the first cloud computing node and the second cloud computing node according to the to-be-computed access data, the processing module 420 may be configured to:

when the data packet size of the first access data is larger than that of the second access data of the second cloud computing node in the differential computing mode, receiving the first to-be-computed access data sent by the second cloud computing node in the differential computing mode, and performing decryption verification on the first to-be-computed access data in the first data processing cache space by using the data encryption verification information;

when the decryption verification of the first to-be-computed access data passes, marking the first to-be-computed access data with the link verification information in the first data processing cache space;

calculating, in the first data processing cache space, the association degree between the marked first to-be-computed access data and the first access data to obtain a target access data processing result, and taking the target access data processing result as the access data processing result between the first cloud computing node and the second cloud computing node; the first to-be-computed access data is the second access data obtained after the second cloud computing node in the differential computing mode performs data cleaning according to a preset filtering strategy;

when the data packet size of the first access data is smaller than or equal to the data packet size of the second access data in the differential computing mode, marking the first access data with the interaction verification information, and storing the marked first access data in a data storage unit;

storing the data storage unit to the first data processing cache space, and marking the first access data in the data storage unit with the link verification information and the data encryption verification information to obtain marked first access data;

sending the marked first access data to the second cloud computing node in the differential computing mode, so that the marked first access data and the second access data stored by the second cloud computing node in the differential computing mode are combined to obtain second to-be-computed access data;

and receiving the second to-be-computed access data sent by the second cloud computing node in the differential computing mode, and using the second to-be-computed access data as the access data processing result between the first cloud computing node and the second cloud computing node.
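The size comparison that drives the two branches above can be summarized in a small sketch. The function name `differential_plan` and the returned labels are hypothetical and serve only to make the branch condition explicit.

```python
def differential_plan(first_packet_size: int, second_packet_size: int) -> str:
    """Decide which node produces the final result in the differential mode.

    Labels are illustrative; the source only describes the two branches.
    """
    if first_packet_size > second_packet_size:
        # The first node receives the cleaned second access data, verifies and
        # marks it, then computes the association degree locally.
        return "first_node_computes"
    # Otherwise the first node marks and sends its own data; the second node
    # merges it with its stored data and returns the combined result.
    return "second_node_merges"
```

Under this reading, the smaller data set is the one that travels over the data transmission link.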

Optionally, when the processing module 420 performs access data interaction on the first access data with a preset data size and the second access data in the parallel computing mode, and calculates in the first data processing cache space to obtain a correlation degree between the first access data and the second access data in the parallel computing mode, to obtain an intermediate access processing result, and receives the second access data sent by the second cloud computing node in the differential computing mode, and calculates an access data processing result between the first cloud computing node and the second cloud computing node according to the intermediate access processing result and the second access data sent by the second cloud computing node in the differential computing mode, the processing module may be configured to:

determining first target to-be-calculated access data used for access data interaction in the first access data, and performing data marking on the first target to-be-calculated access data by adopting the data transmission signaling;

sending the first target to-be-calculated access data after data marking to a second cloud computing node in the parallel computing mode through the data transmission link, and receiving second target to-be-calculated access data sent by the second cloud computing node in the parallel computing mode;

calculating, in the first data processing cache space, the association degree between the second target to-be-calculated access data and the first access data other than the first target to-be-calculated access data, to obtain an intermediate access processing result between the first access data and the second access data sent by the second cloud computing node in the parallel computing mode;

when the size of a data packet of access data in the intermediate access processing result is larger than that of a data packet of second access data in the differential computing mode, receiving current second access data sent by a second cloud computing node in the differential computing mode, calculating the association degree between the intermediate access processing result and the current second access data to obtain a current access data processing result, and then taking the current access data processing result as an access data processing result between a first cloud computing node and the second cloud computing node;

when the size of a data packet of access data in the intermediate access processing result is smaller than or equal to the size of a data packet of the second access data of the second cloud computing node in the differential computing mode, marking the intermediate access processing result according to the data transmission signaling, sending the marked intermediate access processing result to the second cloud computing node in the differential computing mode, receiving an access data association result sent by the second cloud computing node in the differential computing mode after it calculates the association degree for the intermediate access processing result, and taking the access data association result as the access data processing result between the first cloud computing node and the second cloud computing node.

Optionally, the processing module 420 is further configured to:

packaging the access data processing result to obtain a target data packet to be stored;

respectively extracting service data and protocol data from a plurality of data blocks in the target data packet to obtain a service data sequence and a protocol data sequence;

performing first key information extraction on the service data sequence through a first key information extraction strategy to obtain a first key information data set comprising service data;

performing second key information extraction on the protocol data sequence through a second key information extraction strategy to obtain a second key information data set comprising protocol data;

performing target information matching based on the first key information data set and the second key information data set to obtain a target extraction data set corresponding to target extraction content in the target data packet; the target extraction content comprises at least one of service data and protocol data, and the target extraction data set is used for storing the target data packet.

Optionally, when extracting service data and protocol data from a plurality of data blocks in the target data packet respectively to obtain a service data sequence and a protocol data sequence, the processing module 420 may be configured to:

respectively extracting service data from a plurality of data blocks in the target data packet to obtain service data extraction windows in the data blocks and initial service data corresponding to the service data extraction windows;

determining a service data sequence based on the service data extraction window and corresponding initial service data in each data block;

respectively identifying protocol data of a plurality of data blocks in the target data packet to obtain protocol data contents corresponding to the data blocks;

respectively identifying protocol types of a plurality of data blocks in the target data packet to obtain target protocol types followed by the data blocks;

associating the protocol data content with the target protocol type;

and extracting protocol data based on a data packet of a target protocol type associated with preset target protocol data content in the target data packet to obtain a protocol data sequence.

Optionally, when performing first key information extraction on the service data sequence through a first key information extraction policy to obtain a first key information data set including service data, the processing module 420 may be configured to:

for each data block corresponding to the service data sequence, when the number of the initial service data of the data block is at least two, acquiring a service marking value of each initial service data; the service marking value indicates the number of times the corresponding service data is counted in a preset time interval;

when exactly one initial service data has the highest service marking value, taking that initial service data as the target service data of the corresponding data block;

when at least two initial service data share the highest service marking value, acquiring, for each of them, the service type priority of the corresponding service data extraction window;

taking the initial service data whose service type priority is the highest as the target service data of the corresponding data block;

for each data block, acquiring a target window ratio of the service data extraction window corresponding to the target service data of that data block; the target window ratio indicates the ratio of the length of that service data extraction window to the total length of all the service data extraction windows;

when the target window ratio is within a preset window ratio interval, retaining the corresponding service data extraction result; the retained service data extraction result comprises a service data extraction window and the target service data corresponding to the service data extraction window;

when the target window ratio is not within the preset window ratio interval, setting the service data extraction result of the corresponding data block as an empty service data set;

obtaining an updated service data sequence based on the service data extraction result corresponding to each data block;

performing timestamp label identification on the updated service data sequence to obtain multiple groups of service start data and service end data;

determining the service data duration between each group of service start data and service end data;

when the service data duration is greater than or equal to a first set duration threshold, taking the key information data set formed by the service start data and service end data of the corresponding group as a first alternative key information data set;

for each first alternative key information data set, determining the characteristic service category with the largest occurrence count according to the updated target service data corresponding to each data block in the first alternative key information data set;

taking the characteristic service category as a service category to which service data included in the corresponding first alternative key information data set belongs;

determining the service category to which each first alternative key information data set belongs;

when at least two first alternative key information data sets which are adjacent in time sequence all belong to the same service category, merging the at least two first alternative key information data sets to obtain a first key information data set corresponding to the same service category.
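Two of the steps above lend themselves to a short sketch: choosing the target service data of a data block (highest service marking value, with ties broken by service type priority) and merging time-adjacent alternative key information data sets of the same service category. The class and function names are hypothetical, and the convention that a larger `priority` number means higher priority is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    value: str      # initial service data
    tag_value: int  # times counted in the preset time interval
    priority: int   # service type priority of its window (assumed: larger wins)


def pick_target(candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick the target service data of one data block."""
    if not candidates:
        return None
    top = max(c.tag_value for c in candidates)
    tied = [c for c in candidates if c.tag_value == top]
    if len(tied) == 1:
        return tied[0]
    # Several candidates share the highest marking value: use priority.
    return max(tied, key=lambda c: c.priority)


@dataclass
class KeySet:
    start: float   # timestamp of the service start data
    end: float     # timestamp of the service end data
    category: str  # service category of the set


def merge_adjacent(sets: List[KeySet]) -> List[KeySet]:
    """Merge sets that are neighbours in time order and share a category."""
    merged: List[KeySet] = []
    for s in sorted(sets, key=lambda k: k.start):
        if merged and merged[-1].category == s.category:
            last = merged[-1]
            merged[-1] = KeySet(last.start, max(last.end, s.end), s.category)
        else:
            merged.append(KeySet(s.start, s.end, s.category))
    return merged
```

Only neighbouring sets are merged here; two sets of the same category separated by a set of another category stay separate, which matches the "adjacent in time sequence" wording.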

Optionally, the service data extraction result in the service data sequence includes an empty service data set and a non-empty service data set;

when the processing module 420 performs timestamp label identification on the updated service data sequence to obtain multiple sets of service start data and service end data, it may be configured to:

taking the data block corresponding to the first non-empty service data set in the current identification cycle in the updated service data sequence as the service start data of the current group;

traversing the data blocks after the service start data of the current group;

when the service data extraction result corresponding to the traversed current data block is an empty service data set and the service data extraction results corresponding to the data blocks within a second set duration threshold from the current data block are all empty service data sets, taking the current data block as the service end data of the current group;

and taking the data block corresponding to the first non-empty service data set after the service end data of the current group as the service start data of the current group in the next identification cycle, and returning to the step of traversing the data blocks after the service start data of the current group, until multiple groups of service start data and service end data are obtained.
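The identification cycle described above can be sketched as a single pass over timestamped data blocks. The representation of a block as a `(timestamp, has_service_data)` pair, the function name `group_start_end`, and the decision to drop a group whose end is never found are assumptions made for illustration.

```python
from typing import List, Tuple

Block = Tuple[float, bool]  # (timestamp, True if the extraction result is non-empty)


def group_start_end(blocks: List[Block], gap: float) -> List[Tuple[int, int]]:
    """Return (start_index, end_index) pairs, one per identified group.

    gap plays the role of the second set duration threshold.
    """
    groups: List[Tuple[int, int]] = []
    i, n = 0, len(blocks)
    while i < n:
        # Service start data: first non-empty block of this cycle.
        while i < n and not blocks[i][1]:
            i += 1
        if i >= n:
            break
        start, end = i, None
        for j in range(start + 1, n):
            t, non_empty = blocks[j]
            if non_empty:
                continue
            # Blocks that follow the current one within the gap window.
            window = [b for b in blocks[j + 1:] if b[0] - t <= gap]
            if all(not b[1] for b in window):
                end = j  # service end data of the current group
                break
        if end is None:
            break  # assumption: an unterminated group is not reported
        groups.append((start, end))
        i = end + 1
    return groups
```

A short run of empty blocks that is followed by non-empty data inside the gap window does not close the group, so brief interruptions stay inside one group.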

Optionally, when the service data extraction result corresponding to the traversed current data block is an empty service data set and the service data extraction results corresponding to the data blocks within the second set duration threshold from the current data block are all empty service data sets, before the current data block is used as the service end data of the current group, the processing module 420 is further configured to:

when the duration of the key information data set determined by the traversed current data block and the service start data of the current group is less than a third set duration threshold, determining whether the service data extraction result corresponding to the current data block is an empty service data set;

when the service data extraction result corresponding to the current data block is a non-empty service data set, taking the current data block as part of the key information data set corresponding to the current group;

and when the current data block corresponds to an empty service data set and the service data extraction results within the second set duration threshold from the current data block include a non-empty service data set, taking the data block corresponding to the first non-empty service data set within the second set duration threshold from the current data block as the next traversed current data block, and returning to the step of determining, when the duration of the key information data set determined by the traversed current data block and the service start data of the current group is less than the third set duration threshold, whether the service data extraction result corresponding to the current data block is an empty service data set.

Optionally, when taking the data block corresponding to the first non-empty service data set in the current identification cycle in the updated service data sequence as the service start data of the current group, the processing module 420 may be configured to:

acquiring a target data packet corresponding to the first non-empty service data set in the current identification cycle in the updated service data sequence;

when the service data extraction result corresponding to the next data packet of the target data packet is an empty service data set, setting the service data extraction result corresponding to the target data packet as the empty service data set;

and when the service data extraction result corresponding to the next data packet of the target data packet is a non-empty service data set, taking the target data packet as the service start data of the current group.
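The start-data check above behaves like a simple debounce: an isolated non-empty result is discarded, and only a non-empty result followed by another non-empty result is accepted as a start. A minimal sketch follows; the function name `find_start`, the boolean-list representation, continuing the search after discarding a candidate, and treating a final candidate with no successor as isolated are all assumptions.

```python
from typing import List, Optional


def find_start(non_empty: List[bool], begin: int = 0) -> Optional[int]:
    """Return the index of the confirmed service start data, or None.

    non_empty[i] is True when the extraction result at position i is a
    non-empty service data set. The list is modified in place: isolated
    candidates are reset to empty, as in the steps above.
    """
    i = begin
    while i < len(non_empty):
        if non_empty[i]:
            has_next = i + 1 < len(non_empty)
            if has_next and non_empty[i + 1]:
                return i  # next result is also non-empty: confirmed start
            non_empty[i] = False  # isolated candidate: set to empty
        i += 1
    return None
```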

Optionally, when the processing module 420 performs second key information extraction on the protocol data sequence through a second key information extraction policy to obtain a second key information data set including protocol data, the processing module may be configured to:

performing timestamp label identification on each protocol data in the protocol data sequence to obtain a plurality of second alternative key information data sets comprising the protocol data;

and merging the second alternative key information data sets belonging to the same protocol type according to the protocol type corresponding to each second alternative key information data set to obtain a second key information data set comprising protocol data.
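The merge by protocol type can be sketched as a grouping step. The function name `merge_by_protocol` and the representation of each second alternative key information data set as a `(protocol_type, items)` pair are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def merge_by_protocol(candidates: List[Tuple[str, List[str]]]) -> Dict[str, List[str]]:
    """Merge alternative data sets that share a protocol type.

    Items keep the order in which their sets were encountered.
    """
    merged: Dict[str, List[str]] = defaultdict(list)
    for protocol_type, items in candidates:
        merged[protocol_type].extend(items)
    return dict(merged)
```

Unlike the service data case, this sketch does not require the merged sets to be adjacent in time, since the step above only mentions a shared protocol type.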

Optionally, when the target information matching is performed based on the first key information data set and the second key information data set to obtain a target extraction data set corresponding to target extraction content in the target data packet, the processing module 420 may be configured to:

obtaining core key information in target extraction content sent by maintenance equipment;

performing keyword screening on the core key information to obtain screened core key information, and adding the screened core key information to a corresponding information processing node;

performing key information matching on the first key information data set and the second key information data set based on the screened core key information and the information processing node to obtain at least one piece of initial matching key information;

and determining at least one target matching key information which accords with a preset information screening strategy in all the initial matching key information, and constructing all the target matching key information into a target extraction data set.

Optionally, the core key information includes a plurality of core keywords;

the processing module 420 may be configured to, when performing keyword screening on the core key information to obtain the screened core key information:

acquiring the historical statistical count and the threshold statistical count of each core keyword in the core key information;

and deleting the core keywords for which the difference between the historical statistical count and the threshold statistical count exceeds a set count threshold, and taking the remaining core keywords as the screened core key information.
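The keyword screening rule can be sketched as a filter over per-keyword counts. The function name `screen_keywords` and the sample keywords are hypothetical, and using the absolute difference is an assumption, since the source does not say whether the difference is signed.

```python
from typing import Dict, List, Tuple


def screen_keywords(stats: Dict[str, Tuple[int, int]], max_diff: int) -> List[str]:
    """Keep keywords whose counts stay close to their threshold count.

    stats maps keyword -> (historical count, threshold count).
    Assumption: the difference is compared as an absolute value.
    """
    return [
        keyword
        for keyword, (historical, threshold) in stats.items()
        if abs(historical - threshold) <= max_diff
    ]
```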

Optionally, when the processing module 420 performs key information matching on the first key information data set and the second key information data set based on the screened core key information and the information processing node to obtain at least one initial matching key information, it may be configured to:

calculating, by using the information processing node, for each piece of screened core key information, a first key information association degree with the first key information data set and a second key information association degree with the second key information data set;

for each piece of screened core key information, when any one of the corresponding first key information relevancy and the corresponding second key information relevancy is greater than a preset relevancy threshold, determining the corresponding core key information as initial matching key information;

for each piece of screened core key information, when the corresponding first key information relevance degree and the corresponding second key information relevance degree are both smaller than or equal to a preset relevance degree threshold value, determining the corresponding core key information as initial matching key information;

and for each piece of screened core key information, when one of the corresponding first key information relevance degree and the corresponding second key information relevance degree is greater than a preset relevance degree threshold value, and the other one is less than or equal to the preset relevance degree threshold value, discarding the corresponding core key information.

Optionally, when determining, among all the initial matching key information, at least one target matching key information that satisfies the preset information screening policy, the processing module 420 may be configured to:

calculating each feature matching value of each piece of initial matching key information based on at least one pre-configured feature value calculation strategy, and calculating an initial evaluation score of each piece of initial matching key information based on each feature matching value;

determining an evaluation score threshold value corresponding to each initial matching key information based on the key information type to which each initial matching key information belongs;

judging whether the initial evaluation score of each piece of initial matching key information is larger than the corresponding evaluation score threshold value;

and determining at least one piece of initial matching key information with the corresponding initial evaluation score larger than the corresponding evaluation score threshold value as target matching key information.
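The scoring steps above can be sketched as follows. The function name `select_targets`, the representation of each feature value calculation strategy as a callable, and averaging the feature matching values into the initial evaluation score are assumptions; the source does not state how the score is aggregated.

```python
from typing import Callable, Dict, List, Tuple

Scorer = Callable[[str], float]  # one feature value calculation strategy


def select_targets(
    items: List[Tuple[str, str]],   # (initial matching key information, type)
    scorers: List[Scorer],
    thresholds: Dict[str, float],   # evaluation score threshold per type
) -> List[str]:
    """Keep items whose score exceeds the threshold of their type."""
    if not scorers:
        raise ValueError("at least one scorer is required")
    selected: List[str] = []
    for text, kind in items:
        values = [scorer(text) for scorer in scorers]
        score = sum(values) / len(values)  # assumption: plain average
        if score > thresholds[kind]:
            selected.append(text)
    return selected
```

Because the threshold is looked up by key information type, the same score can pass for one type and fail for another.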

In addition, the application also provides a data processing system based on big data and cloud computing, and the data processing system comprises an acquisition node and a computing node.

The acquisition node is used for acquiring a data processing request aiming at the access data; the data processing request indicates that a second cloud computing node which needs to perform access data interaction with the first cloud computing node exists;

the computing node is used for creating a first data processing cache space in a first cloud computing node according to the data processing request, and causing the second cloud computing node to create a second data processing cache space locally at the second cloud computing node;

the computing node is further configured to, when the space configuration information of the first data processing cache space and the second data processing cache space satisfies a pre-configured cache control policy, construct a data transmission link and a data transmission signaling between the first data processing cache space and the second data processing cache space;

the computing node is further configured to perform access data interaction with the second cloud computing node through the data transmission link and the data transmission signaling, and compute an access data processing result between the first cloud computing node and the second cloud computing node based on the interacted access data.

Optionally, when the computing node performs access data interaction with the second cloud computing node through the data transmission link and the data transmission signaling, and calculates an access data processing result between the first cloud computing node and the second cloud computing node based on the interacted access data, the computing node may be configured to:

Optionally, when comparing the packet size of the second access data with the packet size of the first access data to determine the data management mode of the second cloud computing node, the computing node may be configured to:

Optionally, when the computing node performs access data interaction with the second cloud computing node by using an access data processing policy corresponding to the data management mode of the second cloud computing node according to the data transmission link, the first data processing cache space, and the data transmission signaling, and calculates an access data processing result between the first cloud computing node and the second cloud computing node based on the interacted access data, the computing node may be configured to:

the computing node can be used for determining a target data interaction unit for performing access data interaction with a first cloud computing node in a second cloud computing node according to a data management mode of the second cloud computing node and determining an access data processing strategy of the target data interaction unit

when the computing node performs access data interaction on the first access data and the second access data of the target data interaction unit through the data transmission link and the first data processing cache space according to the access data processing policy and the data transmission signaling, and calculates an access data processing result between the first cloud computing node and the second cloud computing node based on the interacted access data, the computing node may be configured to:

Optionally, the data transmission signaling includes interaction check information, link check information, and data encryption check information; the computing node performs access data interaction on the first access data with a preset data size and the second access data in the parallel computing mode through the data transmission link and the data transmission signaling, and calculates in the first data processing cache space to obtain a correlation degree between the first access data and the second access data in the parallel computing mode, so that when the correlation degree is used as an access data processing result between the first cloud computing node and the second cloud computing node, the computing node is specifically configured to:

when the computing node receives the to-be-computed access data sent by the second cloud computing node in the differential computing mode and computes the access data processing result between the first cloud computing node and the second cloud computing node according to the to-be-computed access data, the computing node may be configured to:

Optionally, when the computing node performs access data interaction on the first access data with a preset data size and the second access data in the parallel computing mode, calculates in the first data processing cache space to obtain a correlation between the first access data and the second access data in the parallel computing mode to obtain an intermediate access processing result, receives the second access data sent by the second cloud computing node in the differential computing mode, and calculates an access data processing result between the first cloud computing node and the second cloud computing node according to the intermediate access processing result and the second access data sent by the second cloud computing node in the differential computing mode, the computing node may be configured to:

Optionally, the computing node is further configured to: packaging the access data processing result to obtain a target data packet to be stored;

Optionally, when extracting service data and protocol data from the plurality of data blocks in the target data packet to obtain a service data sequence and a protocol data sequence, the computing node may be configured to:

associating the protocol data content with the target protocol type;

Optionally, when performing first key information extraction on the service data sequence through a first key information extraction policy to obtain a first key information data set including the service data, the computing node may be configured to:

when the computing node performs timestamp label identification on the updated service data sequence to obtain multiple sets of service start data and service end data, the computing node may be configured to:

Optionally, when the service data extraction result corresponding to the traversed current data block is an empty service data set and the service data extraction results corresponding to the data blocks within the second set duration threshold from the current data block are all empty service data sets, before the current data block is used as the service end data of the current group, the computing node is further configured to:

Optionally, when taking the data block corresponding to the first non-empty service data set in the current identification cycle in the updated service data sequence as the service start data of the current group, the computing node may be configured to:

Optionally, when the computing node performs second key information extraction on the protocol data sequence through a second key information extraction policy to obtain a second key information data set including protocol data, the computing node may be configured to:

Optionally, when the computing node performs target information matching based on the first key information data set and the second key information data set to obtain a target extraction data set corresponding to target extraction content in the target data packet, the computing node may be configured to:

Optionally, the core key information includes a plurality of core keywords;

the computing node may be configured to, when performing keyword screening on the core key information to obtain screened core key information:

Optionally, when the computing node performs key information matching on the first key information data set and the second key information data set based on the screened core key information and the information processing node to obtain at least one initial matching key information, the computing node may be configured to:

Optionally, when determining that at least one target matching key information in all the initial matching key information meets the preset information screening policy, the computing node may be configured to:

Claims

1. A data processing method based on big data and cloud computing is characterized by comprising the following steps:

acquiring a data processing request aiming at access data; the data processing request indicates that a second cloud computing node which needs to perform access data interaction with the first cloud computing node exists;

creating, according to the data processing request, a first data processing cache space in a first cloud computing node, and causing the second cloud computing node to create a second data processing cache space locally at the second cloud computing node;

when the space configuration information of the first data processing cache space and the second data processing cache space meets a pre-configured cache control strategy, constructing a data transmission link and a data transmission signaling between the first data processing cache space and the second data processing cache space;

performing access data interaction with the second cloud computing node through the data transmission link and the data transmission signaling, and computing an access data processing result between the first cloud computing node and the second cloud computing node based on the interacted access data;

packaging the access data processing result to obtain a target data packet to be stored;

performing target information matching based on the first key information data set and the second key information data set to obtain a target extraction data set corresponding to target extraction content in the target data packet; the target extraction content comprises at least one of service data and protocol data, and the target extraction data set is used for storing the target data packet in a storage mode;

wherein performing first key information extraction on the service data sequence through a first key information extraction strategy to obtain a first key information data set including service data comprises:

when at least two first alternative key information data sets which are adjacent in time sequence all belong to the same service category, merging the at least two first alternative key information data sets to obtain a first key information data set corresponding to the same service category;

wherein performing second key information extraction on the protocol data sequence through a second key information extraction strategy to obtain a second key information data set including protocol data comprises:

merging second alternative key information data sets belonging to the same protocol type according to the protocol type corresponding to each second alternative key information data set to obtain a second key information data set comprising protocol data;

the performing access data interaction with the second cloud computing node through the data transmission link and the data transmission signaling, and computing an access data processing result between the first cloud computing node and the second cloud computing node based on the interacted access data includes:

2. The method of claim 1, wherein comparing the packet size of the second access data with the packet size of the first access data to determine the data management mode of the second cloud computing node comprises:

3. The method according to claim 2, wherein the performing access data interaction with the second cloud computing node by using an access data processing policy corresponding to the data management mode of the second cloud computing node according to the data transmission link, the first data processing cache space and the data transmission signaling, and calculating an access data processing result between the first cloud computing node and the second cloud computing node based on the interacted access data comprises:

the determining, in the second cloud computing node, a target data interaction unit for performing access data interaction with the first cloud computing node and determining an access data processing policy of the target data interaction unit according to the data management mode of the second cloud computing node includes:

the performing access data interaction between the first access data and the second access data of the target data interaction unit through the data transmission link and the first data processing cache space according to the access data processing policy and the data transmission signaling, and calculating an access data processing result between the first cloud computing node and the second cloud computing node based on the interacted access data, includes:

4. The method of claim 3, wherein the data transmission signaling comprises interactive check information, link check information, and data encryption check information;

the performing, by the data transmission link and the data transmission signaling, access data interaction between the first access data with a preset data size and the second access data in the parallel computing mode, and obtaining a correlation degree between the first access data and the second access data in the parallel computing mode by computing in the first data processing cache space, so as to serve as an access data processing result between the first cloud computing node and the second cloud computing node, includes:

5. The method according to claim 4, wherein the access data to be computed comprises a first access data to be computed and a second access data to be computed; the data transmission signaling comprises data encryption verification information, link verification information and interaction verification information;

the receiving access data to be calculated, which is sent by the second cloud computing node in the differential computing mode, and calculating an access data processing result between the first cloud computing node and the second cloud computing node according to the access data to be calculated includes:

6. The method according to claim 4, wherein performing access data interaction on the first access data with a preset data size and the second access data in the parallel computing mode, calculating a correlation degree between the first access data and the second access data in the parallel computing mode in the first data processing cache space to obtain an intermediate access processing result, receiving the second access data sent by the second cloud computing node in the differential computing mode, and calculating an access data processing result between the first cloud computing node and the second cloud computing node according to the intermediate access processing result and the second access data sent by the second cloud computing node in the differential computing mode, includes:

7. A cloud computing node, comprising:

a memory for storing one or more programs;

a processor;

wherein the one or more programs, when executed by the processor, implement the data processing method based on big data and cloud computing according to any one of claims 1 to 6.
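
By way of non-limiting illustration only, the two merging steps recited in claim 1 might be sketched as follows. The first step merges candidate key information data sets that are adjacent in time sequence and share a service category; the second merges candidate sets by protocol type regardless of adjacency. The `CandidateSet` type, its field names, and the example categories ("taxi", "shopping", "http", "tcp") are assumptions introduced for this sketch and do not appear in the application, which does not specify how the data sets are represented.

```python
from dataclasses import dataclass, field
from itertools import groupby


@dataclass
class CandidateSet:
    """Hypothetical candidate key information data set (not from the source)."""
    timestamp: int   # assumed: position of the set in the time sequence
    category: str    # assumed: service category or protocol type label
    items: list = field(default_factory=list)


def merge_adjacent_by_category(candidates):
    """Merge runs of time-adjacent sets that share one service category."""
    ordered = sorted(candidates, key=lambda c: c.timestamp)
    merged = []
    # groupby only groups consecutive equal keys, which models "adjacent".
    for category, run in groupby(ordered, key=lambda c: c.category):
        run = list(run)
        merged.append(CandidateSet(
            timestamp=run[0].timestamp,
            category=category,
            items=[item for c in run for item in c.items],
        ))
    return merged


def merge_by_protocol_type(candidates):
    """Merge every set with the same protocol type, adjacent or not."""
    merged = {}
    for c in sorted(candidates, key=lambda c: c.timestamp):
        target = merged.setdefault(
            c.category, CandidateSet(c.timestamp, c.category))
        target.items.extend(c.items)
    return list(merged.values())


# Assumed example data, for illustration only.
service_sets = [
    CandidateSet(1, "taxi", ["pickup"]),
    CandidateSet(2, "taxi", ["dropoff"]),
    CandidateSet(3, "shopping", ["cart"]),
    CandidateSet(4, "taxi", ["rating"]),
]
protocol_sets = [
    CandidateSet(1, "http", ["GET"]),
    CandidateSet(2, "tcp", ["SYN"]),
    CandidateSet(3, "http", ["POST"]),
]

first_sets = merge_adjacent_by_category(service_sets)
second_sets = merge_by_protocol_type(protocol_sets)
```

In this sketch the two "taxi" sets at positions 1 and 2 are merged because they are adjacent, while the "taxi" set at position 4 stays separate because a "shopping" set intervenes. The protocol merge, by contrast, combines both "http" sets even though they are not adjacent, which reflects the difference in wording between the two steps of claim 1.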