Disclosure of Invention
In view of this, the present invention provides a data security tamper-proofing method, system and cloud platform based on big data, so as to address the problem of low reliability of data security verification in the prior art.
In order to achieve the above purpose, the embodiments of the present invention adopt the following technical solutions:
a data security tamper-proofing method based on big data is applied to a big data processing cloud platform, the big data processing cloud platform is in communication connection with a plurality of data access terminal devices, and the data security tamper-proofing method based on the big data comprises the following steps:
for any data access terminal equipment in the plurality of data access terminal equipment, carrying out equipment identity determination processing on the data access terminal equipment to generate an equipment identity determination result corresponding to the data access terminal equipment, wherein the equipment identity determination result is used for reflecting whether the corresponding data access terminal equipment belongs to first data access terminal equipment or not;
under the condition that an equipment identity determination result corresponding to each data access terminal equipment is formed, for each determined first data access terminal equipment, equipment correlation coefficient determination processing is carried out on the first data access terminal equipment and each other data access terminal equipment except the first data access terminal equipment so as to generate equipment correlation coefficients between the first data access terminal equipment and each other data access terminal equipment;
for each first data access terminal device, matching at least one second data access terminal device corresponding to the first data access terminal device from other data access terminal devices through a device correlation coefficient between the first data access terminal device and each other data access terminal device, and performing data security verification processing on data access of the first data access terminal device through the at least one second data access terminal device.
In some preferred embodiments, in the above big data-based data security tamper-proofing method, the step of performing, for any data access terminal device in the plurality of data access terminal devices, device identity determination processing on the data access terminal device to generate a device identity determination result corresponding to the data access terminal device includes:
for any data access terminal equipment in the plurality of data access terminal equipment, performing information sending monitoring processing on the data access terminal equipment to form an information sending monitoring result corresponding to the data access terminal equipment, wherein the information sending monitoring result is used for reflecting whether the corresponding data access terminal equipment sends a data access request to the big data processing cloud platform or not;
and for any data access terminal equipment in the plurality of data access terminal equipment, taking an information sending monitoring result corresponding to the data access terminal equipment as a basis for equipment identity analysis, and performing equipment identity determination processing on the data access terminal equipment to generate an equipment identity determination result corresponding to the data access terminal equipment.
In some preferred embodiments, in the above big data-based data security tamper-proofing method, the step of performing, for any data access terminal device in the plurality of data access terminal devices, device identity determination processing on the data access terminal device by using the information sending monitoring result corresponding to the data access terminal device as a basis for device identity analysis, to generate a device identity determination result corresponding to the data access terminal device, includes:
for any data access terminal equipment in the plurality of data access terminal equipment, under the condition that an information sending monitoring result corresponding to the data access terminal equipment reflects that the data access terminal equipment sends a data access request to the big data processing cloud platform, analyzing the data access request to output equipment identity information corresponding to the data access terminal equipment;
for any data access terminal equipment in the plurality of data access terminal equipment, comparing and analyzing equipment identity information corresponding to the data access terminal equipment with a preset reference equipment identity information set to output a comparison and analysis result corresponding to the data access terminal equipment, wherein the comparison and analysis result is used for reflecting whether the equipment identity information corresponding to the data access terminal equipment is the same as one piece of reference equipment identity information in the reference equipment identity information set or not;
for any data access terminal device in the plurality of data access terminal devices, forming a device identity determination result corresponding to the data access terminal device according to the comparison and analysis result corresponding to the data access terminal device, wherein: in the case that the comparison and analysis result reflects that the device identity information corresponding to the data access terminal device is different from every piece of reference device identity information in the reference device identity information set, the device identity determination result reflects that the data access terminal device belongs to the first data access terminal device; and in the case that the comparison and analysis result reflects that the device identity information is the same as one piece of reference device identity information in the reference device identity information set, the device identity determination result reflects that the data access terminal device does not belong to the first data access terminal device.
In some preferred embodiments, in the above big data-based data security tamper-proofing method, the step of performing, in the case that a device identity determination result corresponding to each of the data access terminal devices is formed, for each determined first data access terminal device, device correlation coefficient determination processing on the first data access terminal device and each of the other data access terminal devices except the first data access terminal device, to generate a device correlation coefficient between the first data access terminal device and each of the other data access terminal devices, includes:
under the condition that an equipment identity determination result corresponding to each data access terminal equipment is formed, for each data access terminal equipment in the plurality of data access terminal equipment, performing data extraction processing on equipment interaction performed by the data access terminal equipment in history to form an equipment interaction data set corresponding to the data access terminal equipment;
and for each determined first data access terminal device, respectively performing device correlation coefficient determination processing on the first data access terminal device and each other data access terminal device except the first data access terminal device according to the device interaction data set corresponding to the first data access terminal device and the device interaction data set corresponding to each other data access terminal device except the first data access terminal device, so as to generate a device correlation coefficient between the first data access terminal device and each other data access terminal device.
In some preferred embodiments, in the above big data-based data security tamper-proofing method, the step of performing, for each determined first data access terminal device, device correlation coefficient determination processing on the first data access terminal device and each other data access terminal device except the first data access terminal device, according to the device interaction data set corresponding to the first data access terminal device and the device interaction data set corresponding to each other data access terminal device except the first data access terminal device, to generate a device correlation coefficient between the first data access terminal device and each other data access terminal device, includes:
for each determined first data access terminal device, respectively performing set correlation calculation processing on a device interaction data set corresponding to the first data access terminal device and a device interaction data set corresponding to each other data access terminal device except the first data access terminal device, so as to output set correlation between the first data access terminal device and each other data access terminal device except the first data access terminal device;
and for each determined first data access terminal device, performing device correlation coefficient determination processing on the first data access terminal device and each other data access terminal device according to the set correlation between the first data access terminal device and each other data access terminal device except the first data access terminal device, so as to generate a device correlation coefficient between the first data access terminal device and each other data access terminal device.
In some preferred embodiments, in the above big data-based data security tamper-proofing method, the step of performing, for each determined first data access terminal device, set correlation calculation processing on the device interaction data set corresponding to the first data access terminal device and the device interaction data set corresponding to each other data access terminal device except the first data access terminal device, so as to output a set correlation between the first data access terminal device and each other data access terminal device except the first data access terminal device, includes:
for each piece of first equipment interaction data in an equipment interaction data set corresponding to the first data access terminal equipment, performing text keyword extraction processing on the first equipment interaction data to form a corresponding first text keyword set, and for each piece of second equipment interaction data in an equipment interaction data set corresponding to the other data access terminal equipment, performing text keyword extraction processing on the second equipment interaction data to form a corresponding second text keyword set;
for each piece of first device interaction data, performing set overlap ratio calculation processing on a first text keyword set corresponding to the first device interaction data and a first text keyword set corresponding to previous adjacent first device interaction data of the first device interaction data to output first text set overlap ratio corresponding to the first device interaction data, and for each piece of second device interaction data, performing set overlap ratio calculation processing on a second text keyword set corresponding to the second device interaction data and a second text keyword set corresponding to previous adjacent second device interaction data of the second device interaction data to output second text set overlap ratio corresponding to the second device interaction data;
according to the first text set overlap ratio corresponding to each piece of first device interaction data, splitting the device interaction data set corresponding to the first data access terminal device to form a plurality of first device interaction data subsets, wherein the device interaction data set is an ordered set, and the splitting positions of the device interaction data set are each piece of first device interaction data whose corresponding first text set overlap ratio is less than or equal to a preset text set overlap ratio;
according to the second text set overlap ratio corresponding to each piece of second device interaction data, splitting the device interaction data set corresponding to the other data access terminal device to form a plurality of second device interaction data subsets, wherein the splitting positions of the device interaction data set are each piece of second device interaction data whose corresponding second text set overlap ratio is less than or equal to the preset text set overlap ratio;
for each first device interactive data subset, merging a first text keyword set corresponding to each piece of first device interactive data included in the first device interactive data subset to form a first text keyword merged set corresponding to the first device interactive data subset, and for each second device interactive data subset, merging a second text keyword set corresponding to each piece of second device interactive data included in the second device interactive data subset to form a second text keyword merged set corresponding to the second device interactive data subset;
respectively calculating the text set overlap ratio between each first text keyword merging set and each second text keyword merging set, and then carrying out fusion processing on the text set overlap ratio between each first text keyword merging set and each second text keyword merging set to form first set correlation;
forming a first object set corresponding to the first data access terminal device according to a data interaction object corresponding to each piece of first device interaction data in the device interaction data set corresponding to the first data access terminal device, and forming a second object set corresponding to the other data access terminal devices according to a data interaction object corresponding to each piece of second device interaction data in the device interaction data set corresponding to the other data access terminal devices;
and performing set overlap ratio calculation processing on the first object set and the second object set to output a second set correlation, and performing fusion processing on the second set correlation and the first set correlation to output the set correlation between the first data access terminal device and the other data access terminal device.
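The set correlation calculation described in the steps above can be illustrated with a minimal Python sketch. It is not the claimed implementation. Several points that the text leaves open are filled in as assumptions: the "set overlap ratio" is taken to be the Jaccard index, keyword extraction is replaced by a trivial word filter, a record at a splitting position is taken to start a new subset, both fusion steps are simple averages, and the names `Record`, `overlap`, `keywords`, `merged_keyword_sets`, `set_correlation`, `threshold` and `text_weight` are hypothetical.

```python
from itertools import product
from typing import List, Set, Tuple

# One interaction record: (interaction text, interaction object).
# A device's interaction data set is a time-ordered list of such records.
Record = Tuple[str, str]


def overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard index, used here as an assumed 'set overlap ratio'."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def keywords(text: str) -> Set[str]:
    """Placeholder keyword extraction: lowercase words longer than 3 letters."""
    return {w for w in text.lower().split() if len(w) > 3}


def merged_keyword_sets(records: List[Record], threshold: float) -> List[Set[str]]:
    """Split the ordered records wherever the overlap with the previous
    record is <= threshold, then merge the keyword sets of each subset."""
    merged: List[Set[str]] = []
    previous: Set[str] = set()
    for index, (text, _) in enumerate(records):
        current = keywords(text)
        if index == 0 or overlap(current, previous) <= threshold:
            merged.append(set())  # assumption: this record starts a new subset
        merged[-1] |= current
        previous = current
    return merged


def set_correlation(
    first: List[Record],
    second: List[Record],
    threshold: float = 0.2,
    text_weight: float = 0.5,
) -> float:
    groups_a = merged_keyword_sets(first, threshold)
    groups_b = merged_keyword_sets(second, threshold)
    pairs = list(product(groups_a, groups_b))
    # First set correlation: mean pairwise overlap (assumed fusion rule).
    text_corr = sum(overlap(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    # Second set correlation: overlap of the two interaction-object sets.
    object_corr = overlap({o for _, o in first}, {o for _, o in second})
    # Final fusion: weighted average (assumed fusion rule).
    return text_weight * text_corr + (1 - text_weight) * object_corr
```

Because the first set correlation is averaged over every pair of merged keyword sets, a device with several unrelated interaction topics does not reach a correlation of 1 even with itself; a different fusion rule (for example, the maximum over pairs) would behave differently.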
In some preferred embodiments, in the above big data-based data security tamper-proofing method, the step of matching, for each first data access terminal device, at least one second data access terminal device corresponding to the first data access terminal device from the other data access terminal devices through the device correlation coefficient between the first data access terminal device and each other data access terminal device, so as to perform data security verification processing on data access of the first data access terminal device through the at least one second data access terminal device, includes:
for each first data access terminal device, screening out a target number of other data access terminal devices with the minimum device correlation coefficient with the first data access terminal device from other data access terminal devices except the first data access terminal device, and respectively marking each other data access terminal device in the target number of other data access terminal devices as a second data access terminal device corresponding to the first data access terminal device;
and for each first data access terminal device, respectively performing data security verification processing on the data access of the first data access terminal device through each second data access terminal device corresponding to the first data access terminal device.
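As an illustration of the screening step, the following Python sketch picks the target number of other devices with the smallest device correlation coefficient. The function name `select_second_devices` and the representation of coefficients as a dictionary keyed by device identifier are assumptions made for this example only.

```python
import heapq
from typing import Dict, List


def select_second_devices(coefficients: Dict[str, float], target_number: int) -> List[str]:
    """Return the ids of the target_number other devices whose device
    correlation coefficient with the first device is smallest."""
    return heapq.nsmallest(target_number, coefficients, key=coefficients.get)
```

Selecting the least correlated devices is what gives the verifying devices their diversity: devices with little shared interaction history are less likely to be manipulated together with the first device.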
In some preferred embodiments, in the above big data-based data security tamper-proofing method, the step of performing, for each first data access terminal device, data security verification processing on data access of the first data access terminal device through each second data access terminal device corresponding to the first data access terminal device includes:
for each first data access terminal device, respectively sending a data access request corresponding to data access of the first data access terminal device to each second data access terminal device corresponding to the first data access terminal device for data security verification processing to form a data security verification result corresponding to each second data access terminal device, wherein the data security verification result is used for representing whether the corresponding data access request belongs to a secure data access request;
for each first data access terminal device, performing fusion processing on the data security verification result corresponding to each second data access terminal device according to the device correlation coefficient between the first data access terminal device and each second data access terminal device corresponding to the first data access terminal device, to form a target data security verification result corresponding to the first data access terminal device, wherein the target data security verification result is used for reflecting whether the data access request of the corresponding first data access terminal device is a secure data access request, and the big data processing cloud platform is further used for refusing to execute an insecure data access request.
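The fusion of the individual verification results can be sketched as a weighted vote. The text only states that the fusion is performed according to the device correlation coefficients, so the weighting used here is an assumption: coefficients are taken to lie in [0, 1], a less correlated second device receives the larger weight `1 - coefficient`, and the request is treated as secure when the weighted share of "secure" verdicts reaches a hypothetical `threshold`.

```python
from typing import Dict


def fuse_verification_results(
    results: Dict[str, bool],
    coefficients: Dict[str, float],
    threshold: float = 0.5,
) -> bool:
    """Weighted vote over the second devices' verdicts (True means secure)."""
    # Assumption: coefficients lie in [0, 1]; lower correlation means more weight.
    weights = {device: 1.0 - coefficients[device] for device in results}
    total = sum(weights.values())
    if total == 0:
        return False  # no usable verifier, so the request is not treated as secure
    secure_share = sum(weights[d] for d, ok in results.items() if ok) / total
    return secure_share >= threshold
```

A platform following this sketch would refuse to execute the data access request whenever the function returns `False`.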
The embodiment of the invention also provides a data security tamper-proof system based on big data, which is applied to a big data processing cloud platform, wherein the big data processing cloud platform is in communication connection with a plurality of data access terminal devices, and the data security tamper-proof system based on big data comprises:
the device identity determining module is used for determining the identity of any data access terminal device in the data access terminal devices to generate a device identity determining result corresponding to the data access terminal device, wherein the device identity determining result is used for reflecting whether the corresponding data access terminal device belongs to the first data access terminal device;
a correlation coefficient determining module, configured to, in a case where a device identity determination result corresponding to each of the data access terminal devices is formed, perform, for each of the determined first data access terminal devices, device correlation coefficient determination processing on the first data access terminal device and each of the other data access terminal devices other than the first data access terminal device, so as to generate a device correlation coefficient between the first data access terminal device and each of the other data access terminal devices;
and the data security verification module is used for matching at least one second data access terminal device corresponding to the first data access terminal device from other data access terminal devices through the device correlation coefficient between the first data access terminal device and each other data access terminal device for each first data access terminal device, so as to perform data security verification processing on the data access of the first data access terminal device through the at least one second data access terminal device.
The embodiment of the invention also provides a big data processing cloud platform, and the big data processing cloud platform is used for executing the big data-based data security tamper-proof method.
According to the data security tamper-proofing method, system and cloud platform based on big data provided by the present invention, first, for each first data access terminal device, device correlation coefficient determination processing is performed on the first data access terminal device and each other data access terminal device except the first data access terminal device, so that device correlation coefficients are generated. Second, for each first data access terminal device, at least one second data access terminal device corresponding to the first data access terminal device is matched from the other data access terminal devices through the device correlation coefficients, so that data security verification processing is performed on the data access of the first data access terminal device through the at least one second data access terminal device. On this basis, the diversity of the devices performing the data security verification processing can be increased to a certain extent, which makes it more difficult for a malicious access terminal device to attack (manipulate) the devices performing the verification processing. The data security verification processing can therefore be performed more reliably, which addresses the problem of low reliability of data security verification in the prior art.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, an embodiment of the present invention provides a big data processing cloud platform. The big data processing cloud platform may include a memory and a processor.
For example, the memory and the processor are electrically connected, directly or indirectly, to enable transmission or interaction of data. For example, they may be electrically connected to each other via one or more communication buses or signal lines. The memory can have at least one software functional module (computer program) stored therein, which can be in the form of software or firmware. The processor may be configured to execute the executable computer program stored in the memory, so as to implement the big data based data security tamper-proof method provided by the embodiment of the present invention.
Further, in some embodiments, the memory may be, for example, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), a System on Chip (SoC), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Further, for example, in some embodiments, the big data processing cloud platform may be a server with data processing capabilities. The big data processing cloud platform can be in communication connection with a plurality of data access terminal devices such as mobile phones and computers.
With reference to FIG. 2, an embodiment of the present invention further provides a data security tamper-proofing method based on big data, which is applicable to the big data processing cloud platform. The method steps defined by the flow of the big data-based data security tamper-proofing method can be implemented by the big data processing cloud platform. The specific flow shown in FIG. 2 is described in detail below.
Step S110, for any data access terminal device in the plurality of data access terminal devices, performing device identity determination processing on the data access terminal device to generate a device identity determination result corresponding to the data access terminal device.
In the embodiment of the present invention, the big data processing cloud platform may perform, for any one of the plurality of data access terminal devices, device identity determination processing on the data access terminal device, so as to generate a device identity determination result corresponding to the data access terminal device. The device identity determination result is used for reflecting whether the corresponding data access terminal device is a first data access terminal device.
Step S120, in a case where a device identity determination result corresponding to each of the data access terminal devices is formed, for each of the determined first data access terminal devices, performing device correlation coefficient determination processing on the first data access terminal device and each of the other data access terminal devices other than the first data access terminal device, so as to generate a device correlation coefficient between the first data access terminal device and each of the other data access terminal devices.
In this embodiment of the present invention, in the case that a device identity determination result corresponding to each data access terminal device is formed, for each determined first data access terminal device, the big data processing cloud platform may perform device correlation coefficient determination processing on the first data access terminal device and each other data access terminal device except the first data access terminal device, so as to generate a device correlation coefficient between the first data access terminal device and each other data access terminal device.
Step S130, for each of the first data access terminal devices, matching at least one second data access terminal device corresponding to the first data access terminal device from the other data access terminal devices through the device correlation coefficient between the first data access terminal device and each of the other data access terminal devices, so as to perform data security verification processing on the data access of the first data access terminal device through the at least one second data access terminal device.
In this embodiment of the present invention, for each first data access terminal device, the big data processing cloud platform may match, from other data access terminal devices, at least one second data access terminal device corresponding to the first data access terminal device through a device correlation coefficient between the first data access terminal device and each other data access terminal device, so as to perform data security verification processing on data access of the first data access terminal device through the at least one second data access terminal device.
Based on the foregoing, the diversity of the devices performing the data security verification processing can be increased to a certain extent, which makes it more difficult for a malicious access terminal device to attack (manipulate) the devices performing the verification processing. The data security verification processing can therefore be performed more reliably, which addresses the problem of low reliability of data security verification in the prior art.
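The overall flow of steps S110 to S130 can be outlined with the following Python sketch. It is only a skeleton under stated assumptions: the three callables `is_first_device`, `correlation` and `verify` are hypothetical stand-ins for the processing of each step, devices are represented by string identifiers, and second devices are matched by smallest coefficient, which is one option described in the preferred embodiments rather than a requirement of step S130.

```python
from typing import Callable, Dict, Iterable, List


def run_pipeline(
    devices: Iterable[str],
    is_first_device: Callable[[str], bool],
    correlation: Callable[[str, str], float],
    verify: Callable[[str, List[str], Dict[str, float]], bool],
    target_number: int = 3,
) -> Dict[str, bool]:
    """Return, for each first device, whether its data access is judged secure."""
    devices = list(devices)
    # Step S110: decide for every device whether it is a first device.
    first_devices = [d for d in devices if is_first_device(d)]
    verdicts: Dict[str, bool] = {}
    for first in first_devices:
        # Step S120: coefficient between the first device and every other device.
        coeffs = {other: correlation(first, other) for other in devices if other != first}
        # Step S130: match second devices (assumed: smallest coefficients),
        # then verify the first device's data access through them.
        seconds = sorted(coeffs, key=coeffs.get)[:target_number]
        verdicts[first] = verify(first, seconds, coeffs)
    return verdicts
```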
Further, for example, in some specific embodiments, the step S110 recited above may include the following more detailed steps:
for any data access terminal equipment in the plurality of data access terminal equipment, performing information sending monitoring processing on the data access terminal equipment to form an information sending monitoring result corresponding to the data access terminal equipment, wherein the information sending monitoring result is used for reflecting whether the corresponding data access terminal equipment sends a data access request to the big data processing cloud platform or not;
and for any data access terminal equipment in the plurality of data access terminal equipment, taking an information sending monitoring result corresponding to the data access terminal equipment as a basis for equipment identity analysis, and performing equipment identity determination processing on the data access terminal equipment to generate an equipment identity determination result corresponding to the data access terminal equipment.
Further, for example, in some specific embodiments, the step of performing, for any data access terminal device in the plurality of data access terminal devices, device identity determination processing on the data access terminal device by using the information sending monitoring result corresponding to the data access terminal device as a basis for device identity analysis to generate a device identity determination result corresponding to the data access terminal device may include the following more detailed steps:
for any data access terminal equipment in the plurality of data access terminal equipment, under the condition that an information sending monitoring result corresponding to the data access terminal equipment reflects that the data access terminal equipment sends a data access request to the big data processing cloud platform, analyzing the data access request to output equipment identity information corresponding to the data access terminal equipment;
for any data access terminal equipment in the plurality of data access terminal equipment, comparing and analyzing equipment identity information corresponding to the data access terminal equipment with a preset reference equipment identity information set to output a comparison and analysis result corresponding to the data access terminal equipment, wherein the comparison and analysis result is used for reflecting whether the equipment identity information corresponding to the data access terminal equipment is the same as one piece of reference equipment identity information in the reference equipment identity information set or not;
for any data access terminal device in the plurality of data access terminal devices, forming a device identity determination result corresponding to the data access terminal device according to the comparison and analysis result corresponding to the data access terminal device, wherein: in the case that the comparison and analysis result reflects that the device identity information corresponding to the data access terminal device is different from every piece of reference device identity information in the reference device identity information set, the device identity determination result reflects that the data access terminal device belongs to the first data access terminal device; and in the case that the comparison and analysis result reflects that the device identity information is the same as one piece of reference device identity information in the reference device identity information set, the device identity determination result reflects that the data access terminal device does not belong to the first data access terminal device.
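The identity determination described in the steps above amounts to a membership test against the reference set, as the following minimal Python sketch shows. The request format (a dictionary with a `device_id` field), the function name `determine_device_identity`, and the choice to return `None` when no data access request was observed are assumptions for illustration; the text does not specify them.

```python
from typing import Optional, Set


def determine_device_identity(
    request: Optional[dict],
    reference_identities: Set[str],
) -> Optional[bool]:
    """Return True if the device is a first data access terminal device.

    request: the parsed data access request, or None when monitoring found
        that the device sent no request (hypothetical representation).
    reference_identities: the preset reference device identity information set.
    """
    if request is None:
        # No data access request was observed, so no identity can be parsed.
        return None
    # Assumption: the device identity is carried in a "device_id" field.
    identity = request.get("device_id")
    # A device is a first device when its identity matches no reference entry.
    return identity not in reference_identities
```

Under this sketch, a device whose identity is not in the reference set is the one whose data access is subsequently verified through second devices.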
Further, for example, in some specific embodiments, the above-mentioned step S120 may include the following more detailed steps:
under the condition that an equipment identity determination result corresponding to each data access terminal equipment is formed, for each data access terminal equipment in the plurality of data access terminal equipment, performing data extraction processing on historical equipment interaction of the data access terminal equipment to form an equipment interaction data set corresponding to the data access terminal equipment;
and for each determined first data access terminal device, respectively performing device correlation coefficient determination processing on the first data access terminal device and each other data access terminal device except the first data access terminal device according to the device interaction data set corresponding to the first data access terminal device and the device interaction data set corresponding to each other data access terminal device except the first data access terminal device, so as to generate a device correlation coefficient between the first data access terminal device and each other data access terminal device.
Further, for example, in some specific embodiments, the above-mentioned step of performing, for each determined first data access terminal device, device correlation coefficient determination processing on the first data access terminal device and each other data access terminal device except the first data access terminal device according to the device interaction data set corresponding to the first data access terminal device and the device interaction data set corresponding to each other data access terminal device except the first data access terminal device, respectively, to generate the device correlation coefficient between the first data access terminal device and each other data access terminal device includes the following more detailed steps:
for each determined first data access terminal device, respectively performing set correlation calculation processing on a device interaction data set corresponding to the first data access terminal device and a device interaction data set corresponding to each other data access terminal device except the first data access terminal device to output set correlation between the first data access terminal device and each other data access terminal device except the first data access terminal device;
for each determined first data access terminal device, performing device correlation coefficient determination processing on the first data access terminal device and each other data access terminal device according to the set correlation between the first data access terminal device and each other data access terminal device except the first data access terminal device, so as to generate a device correlation coefficient between the first data access terminal device and each other data access terminal device (the device correlation coefficient and the set correlation may have a positive correlation corresponding relationship).
Further, for example, in some specific embodiments, the above-mentioned step of, for each determined first data access terminal device, performing set correlation calculation processing on the device interaction data set corresponding to the first data access terminal device and the device interaction data set corresponding to each other data access terminal device except the first data access terminal device, respectively, to output the set correlation between the first data access terminal device and each other data access terminal device except the first data access terminal device may include the following more detailed steps:
for each piece of first equipment interaction data in an equipment interaction data set corresponding to the first data access terminal equipment, performing text keyword extraction processing on the first equipment interaction data to form a corresponding first text keyword set, and for each piece of second equipment interaction data in an equipment interaction data set corresponding to the other data access terminal equipment, performing text keyword extraction processing on the second equipment interaction data to form a corresponding second text keyword set;
for each piece of first device interaction data, performing set overlap ratio calculation processing on the first text keyword set corresponding to the first device interaction data and the first text keyword set corresponding to the immediately preceding piece of first device interaction data (namely, calculating the proportion of first text keywords that the two sets have in common) to output a first text set overlap ratio corresponding to the first device interaction data, and for each piece of second device interaction data, performing set overlap ratio calculation processing on the second text keyword set corresponding to the second device interaction data and the second text keyword set corresponding to the immediately preceding piece of second device interaction data to output a second text set overlap ratio corresponding to the second device interaction data;
splitting the device interaction data set corresponding to the first data access terminal device according to the first text set overlap ratio corresponding to each piece of first device interaction data to form a plurality of first device interaction data subsets, wherein the device interaction data set is an ordered set, and the splitting positions of the device interaction data set are each piece of first device interaction data whose corresponding first text set overlap ratio is less than or equal to a preset text set overlap ratio;
splitting the device interaction data set corresponding to the other data access terminal device according to the second text set overlap ratio corresponding to each piece of second device interaction data to form a plurality of second device interaction data subsets, wherein the splitting positions of the device interaction data set are each piece of second device interaction data whose corresponding second text set overlap ratio is less than or equal to the preset text set overlap ratio;
for each first device interaction data subset, merging the first text keyword sets corresponding to each piece of first device interaction data included in the first device interaction data subset to form a first text keyword merged set corresponding to the first device interaction data subset, and for each second device interaction data subset, merging the second text keyword sets corresponding to each piece of second device interaction data included in the second device interaction data subset to form a second text keyword merged set corresponding to the second device interaction data subset;
respectively calculating the text set overlap ratio between each first text keyword merged set and each second text keyword merged set, and then performing fusion processing (such as mean value calculation) on the calculated text set overlap ratios to form a first set correlation;
forming a first object set corresponding to the first data access terminal device according to a data interaction object corresponding to each piece of first device interaction data in a device interaction data set corresponding to the first data access terminal device (the data interaction object refers to another terminal device performing interaction), and forming a second object set corresponding to the other data access terminal device according to a data interaction object corresponding to each piece of second device interaction data in a device interaction data set corresponding to the other data access terminal device;
and performing set overlap ratio calculation processing on the first object set and the second object set to output a second set correlation, and performing fusion processing (such as weighted sum calculation) on the second set correlation and the first set correlation to output the set correlation between the first data access terminal device and the other data access terminal device.
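The steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the definitive implementation: the overlap ratio is assumed here to be the Jaccard ratio (shared elements divided by all distinct elements), the keyword sets are assumed to have been extracted already, and the threshold and weight values, like all function names, are hypothetical.

```python
from itertools import product
from typing import List, Set


def overlap_ratio(a: Set[str], b: Set[str]) -> float:
    """Overlap ratio of two sets (assumption: Jaccard ratio)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def split_by_overlap(keyword_sets: List[Set[str]], threshold: float) -> List[List[Set[str]]]:
    """Split an ordered list of keyword sets into contiguous subsets.

    A new subset starts at every entry whose overlap ratio with the
    immediately preceding entry is less than or equal to the threshold.
    """
    subsets: List[List[Set[str]]] = []
    for i, keywords in enumerate(keyword_sets):
        if i == 0 or overlap_ratio(keywords, keyword_sets[i - 1]) <= threshold:
            subsets.append([])
        subsets[-1].append(keywords)
    return subsets


def merged_sets(keyword_sets: List[Set[str]], threshold: float) -> List[Set[str]]:
    """Merge the keyword sets inside each subset into one merged set."""
    return [set().union(*subset) for subset in split_by_overlap(keyword_sets, threshold)]


def set_correlation(
    first_keywords: List[Set[str]],
    second_keywords: List[Set[str]],
    first_objects: Set[str],
    second_objects: Set[str],
    threshold: float = 0.3,    # hypothetical preset text set overlap ratio
    text_weight: float = 0.5,  # hypothetical weight for the weighted sum
) -> float:
    """Fuse the keyword-based and object-based correlations by weighted sum."""
    pairs = list(product(merged_sets(first_keywords, threshold),
                         merged_sets(second_keywords, threshold)))
    # First set correlation: mean overlap over all pairs of merged sets.
    first_corr = sum(overlap_ratio(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    # Second set correlation: overlap of the data interaction object sets.
    second_corr = overlap_ratio(first_objects, second_objects)
    return text_weight * first_corr + (1 - text_weight) * second_corr
```

Because every intermediate value lies between 0 and 1, the fused set correlation also lies between 0 and 1 as long as the weight does; the device correlation coefficient can then be derived from it by any monotonically increasing mapping.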
Further, for example, in some other specific embodiments, the above-mentioned step of, for each determined first data access terminal device, performing set correlation calculation processing on the device interaction data set corresponding to the first data access terminal device and the device interaction data set corresponding to each other data access terminal device except the first data access terminal device, respectively, to output the set correlation between the first data access terminal device and each other data access terminal device except the first data access terminal device may alternatively include the following more detailed steps:
for each piece of first equipment interaction data in an equipment interaction data set corresponding to the first data access terminal equipment, performing text keyword extraction processing on the first equipment interaction data to form a corresponding first text keyword set, and for each piece of second equipment interaction data in an equipment interaction data set corresponding to the other data access terminal equipment, performing text keyword extraction processing on the second equipment interaction data to form a corresponding second text keyword set;
randomly splitting the device interaction data set corresponding to the first data access terminal device to form a plurality of first device interaction data subsets, and then randomly splitting the device interaction data sets corresponding to the other data access terminal devices to form a plurality of second device interaction data subsets, wherein the device interaction data sets belong to an ordered set;
for each first device interaction data subset in the multiple first device interaction data subsets, merging the first text keyword sets corresponding to each piece of first device interaction data included in the first device interaction data subsets to form a first text keyword merged set corresponding to the first device interaction data subsets;
for each second device interaction data subset in the multiple second device interaction data subsets, merging a second text keyword set corresponding to each piece of second device interaction data included in the second device interaction data subset to form a second text keyword merged set corresponding to the second device interaction data subset;
for each first text keyword included in each first text keyword merged set, performing word frequency statistical processing on the first text keyword according to each piece of first device interaction data included in the device interaction data set corresponding to the first data access terminal device and each piece of second device interaction data included in the device interaction data set corresponding to the other data access terminal device to output a first word frequency statistical value (namely, the number of occurrences) corresponding to the first text keyword, and for each second text keyword included in each second text keyword merged set, performing word frequency statistical processing on the second text keyword according to each piece of first device interaction data included in the device interaction data set corresponding to the first data access terminal device and each piece of second device interaction data included in the device interaction data set corresponding to the other data access terminal device to output a second word frequency statistical value corresponding to the second text keyword;
for each first text keyword merged set, according to the first word frequency statistical value corresponding to each first text keyword included in the first text keyword merged set, performing importance determination processing on the first device interaction data subset corresponding to the first text keyword merged set (for example, the sum of the first word frequency statistical values corresponding to the first text keywords may be calculated first, and the first importance is then determined according to the sum, the first importance being positively correlated with the sum), so as to output the first importance corresponding to the first device interaction data subset;
for each second text keyword merged set, according to the second word frequency statistical value corresponding to each second text keyword included in the second text keyword merged set, performing importance determination processing on the second device interaction data subset corresponding to the second text keyword merged set to output a second importance corresponding to the second device interaction data subset;
screening the plurality of first device interaction data subsets according to a first importance degree corresponding to each first device interaction data subset (for example, a first device interaction data subset with a larger first importance degree can be screened out to be used as a target first device interaction data subset), so as to output at least one target first device interaction data subset corresponding to the first data access terminal device, and then screening the plurality of second device interaction data subsets according to a second importance degree corresponding to each second device interaction data subset, so as to output at least one target second device interaction data subset corresponding to the other data access terminal devices;
respectively calculating the text set overlap ratio between the first text keyword merged set corresponding to each target first device interaction data subset and the second text keyword merged set corresponding to each target second device interaction data subset, and then performing fusion processing (such as mean value calculation) on the calculated text set overlap ratios to form a first set correlation;
respectively calculating object set overlap ratio between a first data interaction object set corresponding to each target first device interaction data subset and a second data interaction object set corresponding to each target second device interaction data subset, and then performing fusion processing (such as mean value calculation and the like) on the object set overlap ratio between each first data interaction object set and each second data interaction object set to form second set correlation;
and performing fusion processing (for example, calculation of a weighted sum value and the like) on the second set correlation and the first set correlation to output the set correlation between the first data access terminal device and the other data access terminal devices.
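The alternative steps above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the `Record` structure, the Jaccard-style overlap ratio, contiguous random splitting, importance taken directly as the sum of word frequencies, and the parameters `parts`, `keep`, `text_weight` and `seed` are all hypothetical choices made for the example.

```python
import random
from collections import Counter
from dataclasses import dataclass
from itertools import product
from typing import List, Set


@dataclass
class Record:
    tokens: List[str]   # tokenized text of one piece of interaction data
    keywords: Set[str]  # text keywords extracted from it
    peer: str           # data interaction object (the other terminal)


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def random_split(records: list, parts: int, rng: random.Random) -> List[list]:
    """Cut an ordered list into contiguous chunks at random positions."""
    parts = max(1, min(parts, len(records)))
    cuts = sorted(rng.sample(range(1, len(records)), parts - 1))
    bounds = [0, *cuts, len(records)]
    return [records[start:end] for start, end in zip(bounds, bounds[1:])]


def merged_keywords(chunk: List[Record]) -> Set[str]:
    return set().union(*(r.keywords for r in chunk))


def top_subsets(chunks: List[List[Record]], freq: Counter, keep: int) -> List[List[Record]]:
    """Keep the chunks whose merged keywords have the largest summed frequency."""
    def importance(chunk: List[Record]) -> int:
        return sum(freq[k] for k in merged_keywords(chunk))
    return sorted(chunks, key=importance, reverse=True)[:keep]


def mean_pairwise(xs: List[Set[str]], ys: List[Set[str]]) -> float:
    pairs = list(product(xs, ys))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


def set_correlation(first: List[Record], second: List[Record], parts: int = 2,
                    keep: int = 1, text_weight: float = 0.5, seed: int = 0) -> float:
    rng = random.Random(seed)
    # Word frequencies are counted over the interaction data of both devices.
    freq = Counter(t for r in first + second for t in r.tokens)
    first_top = top_subsets(random_split(first, parts, rng), freq, keep)
    second_top = top_subsets(random_split(second, parts, rng), freq, keep)
    text_corr = mean_pairwise([merged_keywords(c) for c in first_top],
                              [merged_keywords(c) for c in second_top])
    object_corr = mean_pairwise([{r.peer for r in c} for c in first_top],
                                [{r.peer for r in c} for c in second_top])
    return text_weight * text_corr + (1 - text_weight) * object_corr
```

Fixing the seed makes the random splitting reproducible, which may be convenient when the same pair of devices needs to be compared more than once.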
Further, for example, in some specific embodiments, the step S130 described above may include the following more detailed steps:
for each first data access terminal device, screening out a target number of other data access terminal devices with the minimum device correlation coefficient from the other data access terminal devices except the first data access terminal device, and marking each other data access terminal device in the target number of other data access terminal devices as a second data access terminal device corresponding to the first data access terminal device;
and for each first data access terminal device, respectively performing data security verification processing on the data access of the first data access terminal device through each second data access terminal device corresponding to the first data access terminal device.
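The matching step above can be sketched as follows. This is a minimal illustration: the function name and the dictionary of device correlation coefficients keyed by terminal name are assumptions made for the example.

```python
import heapq
from typing import Dict, List


def match_second_devices(coefficients: Dict[str, float], target_number: int) -> List[str]:
    """Pick the target number of other devices with the smallest coefficients.

    `coefficients` maps each other data access terminal device (hypothetical
    names) to its device correlation coefficient with the first device.
    """
    return heapq.nsmallest(target_number, coefficients, key=coefficients.get)


# Hypothetical coefficients between one first device and three other devices.
second_devices = match_second_devices({"t2": 0.8, "t3": 0.1, "t4": 0.4}, target_number=2)
```

Selecting the least correlated devices matches the stated aim of increasing the diversity of the devices that perform the verification.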
Further, for example, in some specific embodiments, the step of performing, for each first data access terminal device, data security verification processing on data access of the first data access terminal device through each second data access terminal device corresponding to the first data access terminal device may include the following more detailed steps:
for each first data access terminal device, respectively sending a data access request corresponding to data access of the first data access terminal device to each second data access terminal device corresponding to the first data access terminal device for data security verification processing to form a data security verification result corresponding to each second data access terminal device, wherein the data security verification result is used for representing whether the corresponding data access request belongs to a secure data access request;
for each first data access terminal device, performing fusion processing on the data security verification result corresponding to each second data access terminal device according to the device correlation coefficient between the first data access terminal device and each second data access terminal device corresponding to the first data access terminal device (in the fusion processing, the greater the device correlation coefficient is, the greater the importance of the corresponding data security verification result is), so as to form a target data security verification result corresponding to the first data access terminal device, wherein the target data security verification result is used to reflect whether the data access request corresponding to the first data access terminal device belongs to a secure data access request, and the big data processing cloud platform is further used to refuse to execute an insecure data access request.
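The fusion processing above can be sketched as follows. This is a sketch under stated assumptions only: the fusion is assumed to be a weighted vote in which each verification result is weighted by its device correlation coefficient, and the decision threshold, the fallback for all-zero weights and the function name are hypothetical.

```python
from typing import Dict


def fuse_verification(
    results: Dict[str, bool],
    coefficients: Dict[str, float],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> bool:
    """Fuse per-device verification results into a target result.

    `results` maps each second device to True (secure) or False (insecure);
    a larger device correlation coefficient gives that result more weight.
    """
    if not results:
        return False  # assumption: no verifier means the request is not accepted
    total = sum(coefficients[device] for device in results)
    if total == 0:
        # Assumption: fall back to an unweighted vote when all weights are zero.
        return sum(results.values()) / len(results) >= threshold
    secure_weight = sum(coefficients[d] for d, secure in results.items() if secure)
    return secure_weight / total >= threshold


# Hypothetical example: the platform would refuse the request when this is False.
accepted = fuse_verification({"t3": True, "t4": False}, {"t3": 0.1, "t4": 0.4})
```

In this example the insecure result from "t4" carries four times the weight of the secure result from "t3", so the request is treated as insecure.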
With reference to FIG. 3, an embodiment of the present invention further provides a data security tamper-proofing system based on big data, which is applicable to the big data processing cloud platform. The data security tamper-proofing system may include a device identity determining module, a correlation coefficient determining module and a data security verification module.
Further, for example, in some specific embodiments, the device identity determining module is configured to perform, for any data access terminal device in the multiple data access terminal devices, device identity determining processing on the data access terminal device to generate a device identity determining result corresponding to the data access terminal device, where the device identity determining result is used to reflect whether the corresponding data access terminal device belongs to the first data access terminal device. The correlation coefficient determining module is configured to, in a case where an apparatus identity determination result corresponding to each of the data access terminal apparatuses is formed, perform, for each of the determined first data access terminal apparatuses, apparatus correlation coefficient determination processing on the first data access terminal apparatus and each of the other data access terminal apparatuses other than the first data access terminal apparatus, so as to generate an apparatus correlation coefficient between the first data access terminal apparatus and each of the other data access terminal apparatuses. The data security verification module is configured to, for each first data access terminal device, match at least one second data access terminal device corresponding to the first data access terminal device from the other data access terminal devices through a device correlation coefficient between the first data access terminal device and each other data access terminal device, and perform data security verification processing on data access of the first data access terminal device through the at least one second data access terminal device.
In summary, according to the data security tamper-proofing method, system and cloud platform based on big data provided by the present invention, first, for each first data access terminal device, device correlation coefficient determination processing is performed on the first data access terminal device and each other data access terminal device except the first data access terminal device, so as to generate a device correlation coefficient. Second, for each first data access terminal device, at least one second data access terminal device corresponding to the first data access terminal device is matched from the other data access terminal devices through the device correlation coefficient, so that data security verification processing is performed on data access of the first data access terminal device through the at least one second data access terminal device. On this basis, the diversity of the devices that perform data security verification processing can be improved to a certain extent, which makes it more difficult for a malicious access terminal device to attack (manipulate) the devices that perform the verification processing. The corresponding data security verification processing can therefore be performed more reliably, which alleviates the problem of low reliability of data security verification in the prior art.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.