CN115828012A - Internet data analysis method and system based on big data - Google Patents

Info

Publication number
CN115828012A
Authority
CN
China
Prior art keywords
internet
interaction data
data
cluster
interactive
Legal status
Pending
Application number
CN202211271926.5A
Other languages
Chinese (zh)
Inventor
丁浩冉
王梦琦
Current Assignee
Xuzhou Hai Qing Mdt Infotech Ltd
Original Assignee
Xuzhou Hai Qing Mdt Infotech Ltd
Application filed by Xuzhou Hai Qing Mdt Infotech Ltd
Priority to CN202211271926.5A
Publication of CN115828012A
Legal status: Pending

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The big data-based internet data analysis method and system analyze the target internet interaction data in an internet interaction data cluster to be analyzed a second time and generate a final analysis result. The target internet interaction data in the cluster are first analyzed one by one using a plurality of analysis methods to generate a plurality of first analysis results. The first analysis results are then optimized to determine a current interactive label cluster for each target internet interaction data, and the target internet interaction data in the cluster are analyzed again based on those current interactive label clusters. This can improve the accuracy and reliability of the analysis result for the target internet interaction data in the cluster.

Description

Internet data analysis method and system based on big data
Technical Field
The application relates to the technical field of data analysis, in particular to an internet data analysis method and system based on big data.
Background
Data analysis means analyzing a large amount of collected data with appropriate statistical methods, and summarizing, understanding and digesting the data so as to make the fullest use of it. It is the process of studying and summarizing data in detail to extract useful information and form conclusions.
At present, when big data is combined with internet data analysis technology, the analysis technology is not yet mature, so internet interaction data may not be analyzed accurately and reliably. It is therefore difficult to guarantee the accuracy and reliability of the analysis result for target internet interaction data.
Disclosure of Invention
In order to solve the technical problems in the related art, the present disclosure provides a big data based internet data analysis method and system.
In a first aspect, a big data-based internet data analysis method is provided. The method is applied to a data analysis system and includes at least: obtaining an internet interaction data cluster to be analyzed, the cluster including a plurality of target internet interaction data; pre-analyzing the target internet interaction data in the cluster one by one based on at least two analysis methods to generate first analysis results in one-to-one correspondence with the at least two analysis methods; determining a current interactive label cluster of each target internet interaction data by combining the first analysis results; and analyzing the target internet interaction data in the cluster again, using the current interactive label cluster of each target internet interaction data, to generate a final analysis result.
In an independently implemented embodiment, pre-analyzing the target internet interaction data one by one based on at least two analysis methods to generate the first analysis results includes: randomly identifying target internet interaction data in the internet interaction data cluster to be analyzed to generate at least two local interaction data clusters; and pre-analyzing the target internet interaction data in each local interaction data cluster one by one with the at least two analysis methods to generate first analysis results in one-to-one correspondence with each pairing of an analysis method and a local interaction data cluster.
In an independently implemented embodiment, before the pre-analyzing, the method further includes: performing knowledge derivation information selection on the target internet interaction data in the internet interaction data cluster to be analyzed to generate first knowledge derivation information of each target internet interaction data; and simplifying the first knowledge derivation information of each target internet interaction data to generate second knowledge derivation information of each target internet interaction data. The pre-analyzing then includes: pre-analyzing the target internet interaction data in the cluster one by one with the at least two analysis methods, using the second knowledge derivation information of each target internet interaction data, to generate the first analysis results in one-to-one correspondence with the at least two analysis methods.
In an independently implemented embodiment, simplifying the first knowledge derivation information of each target internet interaction data to generate the second knowledge derivation information includes: simplifying the first knowledge derivation information of each target internet interaction data one by one using principal component analysis to generate the second knowledge derivation information of each target internet interaction data.
In an independently implemented embodiment, determining the current interactive label cluster of each target internet interaction data by combining the first analysis results includes: traversing each target internet interaction data in the internet interaction data cluster to be analyzed and, for each screened target internet interaction data, determining a corresponding neighbor internet interaction data cluster based on the common score between the screened target internet interaction data and the remaining target internet interaction data in the cluster, wherein the neighbor internet interaction data cluster includes a specified number of neighbor internet interaction data having the highest common scores with the screened target internet interaction data; and determining the current interactive label cluster of the target internet interaction data by combining its neighbor internet interaction data cluster with the first analysis results, wherein the current interactive label cluster is a staged set of the neighbor internet interaction data cluster.
In an independently implemented embodiment, determining the current interactive label cluster of the target internet interaction data by combining its neighbor internet interaction data cluster with the first analysis results includes: determining whether the target internet interaction data and each neighbor internet interaction data belong to the same label, based on the credible weight of the two in the first analysis results; and determining the current interactive label cluster of the target internet interaction data from the neighbor internet interaction data in the neighbor internet interaction data cluster that belong to the same label as the target internet interaction data.
In an independently implemented embodiment, each first analysis result includes an analysis manner of target internet interaction data in the internet interaction data cluster to be analyzed. Determining whether the target internet interaction data and the neighbor internet interaction data belong to the same label includes: screening a neighbor internet interaction data from the neighbor internet interaction data cluster corresponding to the target internet interaction data; determining, among all the first analysis results, the number of first analysis results that cover both the target internet interaction data and the screened neighbor internet interaction data as a first number; determining, among those first analysis results, the number in which the target internet interaction data and the screened neighbor internet interaction data have the same analysis manner as a second number; and determining whether the target internet interaction data and the screened neighbor internet interaction data belong to the same label based on the credible weight of the second number in the first number.
In an independently implemented embodiment, determining whether the target internet interaction data and the screened neighbor internet interaction data belong to the same label based on the credible weight of the second number in the first number includes: in response to the credible weight of the second number in the first number exceeding a specified vector, determining that the target internet interaction data and the screened neighbor internet interaction data belong to the same label; and in response to the credible weight of the second number in the first number not exceeding the specified vector, determining that the target internet interaction data and the screened neighbor internet interaction data do not belong to the same label.
In an independently implemented embodiment, analyzing the target internet interaction data in the internet interaction data cluster to be analyzed again, using the current interactive label cluster of each target internet interaction data, to generate the final analysis result includes: determining the relevance between any two target internet interaction data based on the current interactive label cluster of each target internet interaction data; dividing the target internet interaction data in the cluster into at least one label internet interaction data cluster based on that relevance, wherein each label internet interaction data cluster includes at least two target internet interaction data whose relevance exceeds a relevance target value; and integrating the at least one label internet interaction data cluster to generate the final analysis result of the internet interaction data cluster to be analyzed.
In an independently implemented embodiment, the relevance between any two target internet interaction data is determined as follows: selecting any two target internet interaction data, one being first internet interaction data and the other being second internet interaction data; determining a first neighbor set based on the sharing characteristic between the complementary queue of the current interactive label cluster of the second internet interaction data and the current interactive label cluster of the first internet interaction data, the first neighbor set being a staged set of the current interactive label cluster of the first internet interaction data; determining a second neighbor set based on the sharing characteristic between the complementary queue of the current interactive label cluster of the first internet interaction data and the current interactive label cluster of the second internet interaction data, the second neighbor set being a staged set of the current interactive label cluster of the second internet interaction data; and determining the relevance between the first internet interaction data and the second internet interaction data by combining the first neighbor set and the second neighbor set, in turn, with the current interactive label cluster of the first internet interaction data and the current interactive label cluster of the second internet interaction data.
In an independently implemented embodiment, determining the relevance between the first internet interaction data and the second internet interaction data includes: determining a common coefficient comparison result for each of the first internet interaction data and the second internet interaction data by combining the first neighbor set and the second neighbor set, in turn, with the current interactive label cluster of the first internet interaction data and the current interactive label cluster of the second internet interaction data; and determining the relevance between the first internet interaction data and the second internet interaction data based on whichever of the two common coefficient comparison results has the largest comparison vector.
In an independently implemented embodiment, determining the common coefficient comparison results for the first internet interaction data and the second internet interaction data includes: determining correlation metric values between each of the first internet interaction data and the second internet interaction data and each of the first neighbor set and the second neighbor set, based on the current interactive label clusters of the first and second internet interaction data and on the first and second neighbor sets; determining the common coefficient comparison result of the first internet interaction data based on the comparison vector between the correlation metric value of the first internet interaction data with the first neighbor set and the correlation metric value of the first internet interaction data with the second neighbor set; and determining the common coefficient comparison result of the second internet interaction data based on the comparison vector between the correlation metric value of the second internet interaction data with the first neighbor set and the correlation metric value of the second internet interaction data with the second neighbor set.
In an independently implemented embodiment, the final analysis result includes a plurality of key labels, each key label including at least one target internet interaction data. Integrating the at least one label internet interaction data cluster to generate the final analysis result of the internet interaction data cluster to be analyzed includes: integrating the at least one label internet interaction data cluster according to whether any two label internet interaction data clusters cover the same target internet interaction data, to generate at least one key label, wherein each key label includes at least one label internet interaction data cluster and, when a key label includes a plurality of label internet interaction data clusters, any two of those label internet interaction data clusters cover the same target internet interaction data; and, in response to two label internet interaction data clusters covering the same target internet interaction data, merging all target internet interaction data covered by the two label internet interaction data clusters into one key label, so that they belong to the same label.
In a second aspect, a big data-based internet data analysis system is provided, which includes a processor and a memory that communicate with each other, wherein the processor is configured to read a computer program from the memory and execute it to implement the method described above.
According to the big data-based internet data analysis method and system provided by the embodiments of the present disclosure, an internet interaction data cluster to be analyzed is obtained, the cluster including a plurality of target internet interaction data; the target internet interaction data in the cluster are pre-analyzed one by one based on at least two analysis methods to generate first analysis results in one-to-one correspondence with the at least two analysis methods; a current interactive label cluster of each target internet interaction data is determined based on the first analysis results; and the target internet interaction data in the cluster are analyzed again based on the current interactive label cluster of each target internet interaction data to generate a final analysis result. Because the target internet interaction data are analyzed one by one with a plurality of analysis methods, the resulting first analysis results are optimized to determine the current interactive label cluster of each target internet interaction data, and the data are then analyzed again based on those current interactive label clusters, the accuracy and reliability of the analysis result for the target internet interaction data in the cluster can be improved.
Drawings
To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present disclosure and should therefore not be regarded as limiting its scope; those skilled in the art may obtain other related drawings from them without inventive effort.
Fig. 1 is a flowchart of an internet data analysis method based on big data according to an embodiment of the present disclosure.
Fig. 2 is a block diagram of an internet data analysis device based on big data according to an embodiment of the present disclosure.
Fig. 3 is an architecture diagram of an internet data analysis system based on big data according to an embodiment of the present disclosure.
Detailed Description
For a better understanding of the technical solutions of the present disclosure, they are described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific features of the embodiments and examples are detailed descriptions of the technical solutions of the present disclosure rather than limitations on them, and that the technical features of the embodiments and examples may be combined with each other where they do not conflict.
Referring to Fig. 1, a big data-based internet data analysis method is shown, which may include the technical solutions described in steps 11 to 14 below.
Step 11: obtain an internet interaction data cluster to be analyzed, the cluster including a plurality of target internet interaction data.
Illustratively, knowledge derivation information selection is performed on the target internet interaction data in the internet interaction data cluster to be analyzed to generate first knowledge derivation information of each target internet interaction data, and the first knowledge derivation information of each target internet interaction data is then simplified to generate second knowledge derivation information of each target internet interaction data. For example, the first knowledge derivation information of each target internet interaction data may be simplified one by one using principal component analysis to generate the second knowledge derivation information.
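The simplification step can be sketched as follows. This is a minimal illustration only, under several assumptions that the text does not state: the first knowledge derivation information is taken to be a numeric vector per target internet interaction data, only the leading principal component is kept, and the function names (`center`, `covariance`, `leading_component`, `simplify`) are hypothetical. The leading component is found by power iteration so that the sketch needs nothing beyond the standard library.

```python
import math

def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows (needs >= 2 rows)."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_component(cov, iterations=200):
    """Approximate the top eigenvector of cov by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d  # assumed start vector, not orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all-zero covariance: keep the current vector
        v = [x / norm for x in w]
    return v

def simplify(first_info):
    """Project each vector onto the leading principal component (assumed 1-D output)."""
    centered = center(first_info)
    axis = leading_component(covariance(centered))
    return [[sum(a * b for a, b in zip(row, axis))] for row in centered]

first_info = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
second_info = simplify(first_info)
```

Keeping a single component is a choice made for brevity; a fuller version would retain as many components as needed to preserve a chosen share of the variance.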
Step 12: pre-analyze the target internet interaction data in the internet interaction data cluster to be analyzed one by one based on at least two analysis methods to generate first analysis results in one-to-one correspondence with the at least two analysis methods.
Illustratively, target internet interaction data in the internet interaction data cluster to be analyzed are randomly identified to generate at least two local interaction data clusters; the target internet interaction data in each local interaction data cluster are then pre-analyzed one by one with the at least two analysis methods to generate first analysis results in one-to-one correspondence with each pairing of an analysis method and a local interaction data cluster.
The at least two analysis methods pre-analyze the target internet interaction data in the internet interaction data cluster to be analyzed one by one based on the second knowledge derivation information of each target internet interaction data, generating first analysis results in one-to-one correspondence with the at least two analysis methods.
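A minimal sketch of the pre-analysis in Step 12 follows. The patent does not name the analysis methods, so the two used here (`by_sign`, `by_magnitude`) are hypothetical stand-ins, as are the function names and the representation of a first analysis result as a mapping from a data identifier to its analysis manner. Reading "randomly identifying" as random sampling without replacement is also an assumption.

```python
import random

def split_into_local_clusters(item_ids, n_clusters, sample_size, seed=0):
    """Draw n_clusters random subsets of the data identifiers (assumed sampling scheme)."""
    rng = random.Random(seed)
    return [sorted(rng.sample(item_ids, sample_size)) for _ in range(n_clusters)]

def by_sign(value):
    """Hypothetical analysis method 1: label by sign."""
    return "pos" if value >= 0 else "neg"

def by_magnitude(value):
    """Hypothetical analysis method 2: label by absolute size."""
    return "large" if abs(value) >= 1.0 else "small"

def pre_analyze(features, local_clusters, methods):
    """Produce one first analysis result per (method, local cluster) pairing."""
    results = []
    for method in methods:
        for cluster in local_clusters:
            results.append({item: method(features[item]) for item in cluster})
    return results

features = {"a": -2.0, "b": 0.5, "c": 1.5, "d": -0.2}
local_clusters = split_into_local_clusters(list(features), n_clusters=2, sample_size=3)
first_results = pre_analyze(features, local_clusters, [by_sign, by_magnitude])
```

With two methods and two local clusters this yields four first analysis results, each covering only the data that fell into its local cluster.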
Step 13: determine the current interactive label cluster of each target internet interaction data based on the first analysis results in one-to-one correspondence with the at least two analysis methods.
Illustratively, each target internet interaction data in the internet interaction data cluster to be analyzed is traversed and, for each screened target internet interaction data, a corresponding neighbor internet interaction data cluster is determined based on the common score between the screened target internet interaction data and the remaining target internet interaction data in the cluster, wherein the neighbor internet interaction data cluster includes a specified number of neighbor internet interaction data having the highest common scores with the screened target internet interaction data. The current interactive label cluster of the target internet interaction data is then determined based on its neighbor internet interaction data cluster and the first analysis results, wherein the current interactive label cluster is a staged set of the neighbor internet interaction data cluster.
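The neighbor selection described above can be sketched as a top-k search. The text does not define the common score, so cosine similarity is swapped in here purely as an assumption; the function names and the dictionary-of-vectors input are likewise hypothetical.

```python
import math

def common_score(u, v):
    """Assumed common score: cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def neighbor_cluster(target, features, k):
    """Return the k other items with the highest common score to the target."""
    others = [item for item in features if item != target]
    others.sort(key=lambda item: common_score(features[target], features[item]),
                reverse=True)
    return others[:k]

features = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
neighbors_of_a = neighbor_cluster("a", features, k=2)
```

Any other similarity measure could be substituted for `common_score` without changing the selection logic.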
In this embodiment, whether the target internet interaction data and a neighbor internet interaction data belong to the same label is determined based on the credible weight of the two in the first analysis results; the current interactive label cluster of the target internet interaction data is then determined from the neighbor internet interaction data in the neighbor internet interaction data cluster that belong to the same label as the target internet interaction data.
Step 14: analyze the target internet interaction data in the internet interaction data cluster to be analyzed again, based on the current interactive label cluster of each target internet interaction data, to generate a final analysis result.
Illustratively, to decide whether two data belong to the same label, a neighbor internet interaction data is screened from the neighbor internet interaction data cluster corresponding to the target internet interaction data; among all the first analysis results, the number of first analysis results that cover both the target internet interaction data and the screened neighbor internet interaction data is determined as a first number; among those first analysis results, the number in which the target internet interaction data and the screened neighbor internet interaction data have the same analysis manner is determined as a second number; and whether the target internet interaction data and the screened neighbor internet interaction data belong to the same label is determined based on the credible weight of the second number in the first number.
In response to the credible weight of the second number in the first number exceeding a specified vector, it is determined that the target internet interaction data and the screened neighbor internet interaction data belong to the same label; in response to the credible weight of the second number in the first number not exceeding the specified vector, it is determined that the target internet interaction data and the screened neighbor internet interaction data do not belong to the same label.
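The same-label test can be sketched as a co-occurrence ratio. Two readings are assumed here and are not confirmed by the text: the "credible weight of the second number in the first number" is taken as the ratio second number / first number, and the "specified vector" is taken as a scalar threshold. Each first analysis result is represented, hypothetically, as a mapping from a data identifier to its analysis manner.

```python
def same_label(target, neighbor, first_results, threshold=0.5):
    """Decide whether two items share a label from their agreement across results."""
    # First number: results that cover both items.
    covering = [r for r in first_results if target in r and neighbor in r]
    first_number = len(covering)
    if first_number == 0:
        return False  # assumed behaviour when the pair never co-occurs
    # Second number: covering results that give both items the same analysis manner.
    second_number = sum(1 for r in covering if r[target] == r[neighbor])
    return second_number / first_number > threshold

def current_label_cluster(target, neighbors, first_results, threshold=0.5):
    """Keep only the neighbors judged to share a label with the target."""
    return [n for n in neighbors if same_label(target, n, first_results, threshold)]

first_results = [
    {"a": 0, "b": 0, "c": 1},
    {"a": 1, "b": 1},
    {"a": 0, "b": 1, "c": 0},
    {"b": 0, "c": 0},
]
cluster_of_a = current_label_cluster("a", ["b", "c"], first_results)
```

Here "a" and "b" agree in two of the three results that cover both, so "b" is kept; "a" and "c" agree in only one of two, which does not exceed the assumed threshold of 0.5, so "c" is dropped.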
In this embodiment, the relevance between any two target internet interaction data is determined based on the current interactive label cluster of each target internet interaction data; based on that relevance, the target internet interaction data in the internet interaction data cluster to be analyzed are divided into at least one label internet interaction data cluster, each of which includes at least two target internet interaction data whose relevance exceeds a relevance target value; and the at least one label internet interaction data cluster is integrated to generate the final analysis result of the internet interaction data cluster to be analyzed.
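The division into label internet interaction data clusters can be sketched as a pairwise filter. The relevance function is passed in as a parameter because its formula is described separately; the lookup table used below is made-up illustration data, and forming one two-element cluster per qualifying pair is an assumed reading of "at least two target internet interaction data whose relevance exceeds a relevance target value".

```python
from itertools import combinations

def label_clusters(item_ids, relevance, target_value):
    """Form one label cluster per pair whose relevance exceeds the target value."""
    return [{a, b} for a, b in combinations(item_ids, 2)
            if relevance(a, b) > target_value]

# Hypothetical pairwise relevance values, for illustration only.
scores = {
    frozenset(("a", "b")): 0.9,
    frozenset(("b", "c")): 0.8,
    frozenset(("a", "c")): 0.2,
}

def relevance(x, y):
    """Look up the relevance of an unordered pair, defaulting to 0.0."""
    return scores.get(frozenset((x, y)), 0.0)

clusters = label_clusters(["a", "b", "c", "d"], relevance, target_value=0.5)
```

Overlapping pairs such as {"a", "b"} and {"b", "c"} are left separate at this stage; merging them is the job of the later integration step.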
In one possible implementation, a first neighbor set is determined based on the sharing characteristic between the complementary queue of the current interactive label cluster of the second internet interaction data and the current interactive label cluster of the first internet interaction data, the first neighbor set being a staged set of the current interactive label cluster of the first internet interaction data. A second neighbor set is determined based on the sharing characteristic between the complementary queue of the current interactive label cluster of the first internet interaction data and the current interactive label cluster of the second internet interaction data, the second neighbor set being a staged set of the current interactive label cluster of the second internet interaction data. The relevance between the first internet interaction data and the second internet interaction data is then determined based on the first neighbor set and the second neighbor set, combined in turn with the current interactive label cluster of the first internet interaction data and the current interactive label cluster of the second internet interaction data.
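One possible reading of the two neighbor sets is sketched below. It rests on three interpretive assumptions that the translated text does not confirm: "complementary queue" is read as the set complement, "sharing characteristic" as the intersection, and "staged set" as a subset. Under that reading the first neighbor set is the part of the first data's current interactive label cluster that the second data's cluster does not contain, and vice versa. The function name is hypothetical.

```python
def neighbor_sets(labels_first, labels_second):
    """Return (first neighbor set, second neighbor set) as set differences.

    Assumed reading: intersecting a cluster with the complement of the other
    cluster is the same as removing the other cluster's members from it.
    """
    first_neighbor_set = labels_first - labels_second
    second_neighbor_set = labels_second - labels_first
    return first_neighbor_set, second_neighbor_set

labels_first = {"b", "c", "d"}
labels_second = {"c", "d", "e"}
first_set, second_set = neighbor_sets(labels_first, labels_second)
```

Each result is by construction a subset of the corresponding current interactive label cluster, which matches the text's statement about staged sets under this reading.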
A common coefficient comparison result is determined for each of the first internet interaction data and the second internet interaction data based on the first neighbor set and the second neighbor set, combined in turn with the current interactive label cluster of the first internet interaction data and the current interactive label cluster of the second internet interaction data; the relevance between the first internet interaction data and the second internet interaction data is then determined based on whichever of the two common coefficient comparison results has the largest comparison vector.
Illustratively, correlation metric values between each of the first internet interaction data and the second internet interaction data and each of the first neighbor set and the second neighbor set are determined based on the current interactive label clusters of the first and second internet interaction data and on the first and second neighbor sets. The common coefficient comparison result of the first internet interaction data is determined based on the comparison vector between the correlation metric value of the first internet interaction data with the first neighbor set and the correlation metric value of the first internet interaction data with the second neighbor set. The common coefficient comparison result of the second internet interaction data is determined based on the comparison vector between the correlation metric value of the second internet interaction data with the first neighbor set and the correlation metric value of the second internet interaction data with the second neighbor set.
In this embodiment, the final analysis result includes a plurality of key labels, and each key label includes at least one piece of target internet interaction data. At least one label internet interaction data cluster is integrated to generate at least one key label, based on whether any two label internet interaction data clusters cover the same target internet interaction data. Each key label comprises at least one label internet interaction data cluster; when a key label comprises a plurality of label internet interaction data clusters, any two of those clusters cover the same target internet interaction data. In response to two label internet interaction data clusters covering the same target internet interaction data, all target internet interaction data covered in the two label internet interaction data clusters are combined to form one key label, that is, they are attributed to the same label.
The internet data analysis method based on big data provided by this embodiment comprises: obtaining an internet interaction data cluster that needs to be analyzed, the cluster comprising a plurality of target internet interaction data; pre-analyzing the target internet interaction data in the cluster based on at least two analysis methods to generate first analysis results in one-to-one correspondence with the at least two analysis methods; determining a current interactive label cluster of each target internet interaction data based on those first analysis results; and analyzing the target internet interaction data in the cluster again based on the current interactive label cluster of each target internet interaction data to generate a final analysis result. Because the target internet interaction data are first analyzed by a plurality of analysis methods to generate a plurality of first analysis results, the first analysis results are then optimized to determine the current interactive label cluster of each target internet interaction data, and the target internet interaction data are analyzed again based on those current interactive label clusters, the accuracy and reliability of the analysis result of the target internet interaction data in the internet interaction data cluster that needs to be analyzed can be improved.
The present embodiment provides a big data based internet data analysis method, and the content described in the big data based internet data analysis method may specifically include the following steps.
Step201: obtaining the internet interaction data cluster that needs to be analyzed.
Illustratively, the internet interaction data cluster to be analyzed includes several target internet interaction data, such as first internet interaction data, second internet interaction data, and so on. Other internet interaction data covering the same attribute part may also be used. In one possible implementation example, X pieces of first internet interaction data form the internet interaction data cluster that needs to be analyzed. The X pieces of first internet interaction data may correspond to Y different indications, with each piece covering only one of them. X and Y are integers greater than 0, and X is greater than or equal to Y.
Step202: selecting knowledge derivation information of the target internet interaction data in the internet interaction data cluster that needs to be analyzed, and generating first knowledge derivation information of the target internet interaction data.
Illustratively, knowledge derivation information selection is performed on the target internet interaction data in the internet interaction data cluster to be analyzed through a knowledge derivation information selection model, so as to generate the first knowledge derivation information of each target internet interaction data. The first knowledge derivation information of all target internet interaction data covered in the internet interaction data cluster to be analyzed is a-dimensional. For example, the first knowledge derivation information of all target internet interaction data included in the cluster is represented as z1, z2, z3, ..., zn, each belonging to R^a.
Step203: simplifying the first knowledge derivation information of each target internet interaction data to generate second knowledge derivation information of each target internet interaction data.
For example, in order to reduce the operation time and the proportion of memory occupied during the analysis process, the first knowledge derivation information of the target internet interaction data may be simplified (reduced in dimension) to generate the second knowledge derivation information of the target internet interaction data.
In this embodiment, a system analysis method (systematic analysis method) is adopted to simplify the first knowledge derivation information of each target internet interaction data one by one, and generate second knowledge derivation information of each target internet interaction data.
The second knowledge derivation information is b-dimensional, where b is less than a. For example, the second knowledge derivation information of all target internet interaction data included in the internet interaction data cluster to be analyzed is represented as z1, z2, z3, ..., zn, each belonging to R^b.
Step204: randomly selecting target internet interaction data from the internet interaction data cluster that needs to be analyzed to generate at least two local interaction data clusters.
Illustratively, in order to improve the accuracy and reliability of the analysis result, target internet interaction data are randomly selected from the internet interaction data cluster that needs to be analyzed, and a plurality of local interaction data clusters are determined.
In this embodiment, the local interaction data clusters are built by screening with replacement. Each time, target internet interaction data are screened from the internet interaction data cluster to be analyzed and assigned to a local interaction data cluster; the selected target internet interaction data are then placed back into the internet interaction data cluster to be analyzed, and screening is repeated until the target internet interaction data in the local interaction data cluster reach the specified number, at which point one round of screening is finished. As a result, some target internet interaction data in the internet interaction data cluster to be analyzed may be screened several times, while others may not be selected even once.
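The screening with replacement described above can be sketched as follows. This is a minimal illustration rather than the embodiment's implementation: the function name `draw_local_clusters`, its parameters, and the use of integer identifiers for target internet interaction data are all assumptions made for the example.

```python
import random
from typing import List, Sequence


def draw_local_clusters(data_ids: Sequence[int], cluster_size: int,
                        num_clusters: int, seed: int = 0) -> List[List[int]]:
    """Build local interaction data clusters by sampling with replacement.

    data_ids     -- hypothetical identifiers of the target internet interaction data
    cluster_size -- the "specified number" of items per local cluster
    num_clusters -- how many local clusters to draw (the method needs at least two)
    """
    if num_clusters < 2:
        raise ValueError("at least two local interaction data clusters are required")
    rng = random.Random(seed)
    # Random.choices samples with replacement, so one item may be drawn
    # several times for a local cluster, or not at all.
    return [rng.choices(data_ids, k=cluster_size) for _ in range(num_clusters)]


if __name__ == "__main__":
    for cluster in draw_local_clusters(list(range(10)), cluster_size=6, num_clusters=3):
        print(cluster)
```

Fixing the seed makes the draw repeatable, which is convenient when comparing the first analysis results of different analysis methods on the same local clusters.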
Step205: pre-analyzing the target internet interaction data in the local interaction data clusters by at least two analysis methods, based on the second knowledge derivation information of each target internet interaction data, to generate first analysis results in one-to-one correspondence with the at least two analysis methods.
Illustratively, in order to improve the accuracy of the analysis result, a plurality of analysis methods are each used to pre-analyze the target internet interaction data in the local interaction data clusters, and a first analysis result is generated for each analysis method. In other words, H analysis methods each analyze the local interaction data clusters, generating H kinds of first analysis results of the local interaction data clusters. Each first analysis result comprises the analysis mode of the target internet interaction data contained in the internet interaction data cluster that needs to be analyzed.
Step206: traversing each target internet interaction data in the internet interaction data cluster that needs to be analyzed, and determining the neighbor internet interaction data cluster corresponding to the screened target internet interaction data based on the commonality score between the screened target internet interaction data and the remaining target internet interaction data in the internet interaction data cluster that needs to be analyzed.
Illustratively, the neighbor internet interaction data cluster includes a specified number of neighbor internet interaction data having the largest commonality score with the filtered target internet interaction data.
The commonality score between target internet interaction data is determined from the commonality score between the first knowledge derivation information of the respective target internet interaction data in the internet interaction data cluster that needs to be analyzed. The commonality scores between the screened target internet interaction data and each of the remaining target internet interaction data are sorted in a certain order (for example, from small to large or from large to small); the remaining target internet interaction data corresponding to the u largest commonality scores are determined as the neighbor internet interaction data of the screened target internet interaction data, and these u neighbor internet interaction data form the neighbor internet interaction data cluster of the screened target internet interaction data.
Each target internet interaction data in the internet interaction data cluster to be analyzed is traversed in this way to generate a neighbor internet interaction data cluster for each target internet interaction data.
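A minimal sketch of the neighbor selection in Step206 is given below. The source does not say how the commonality score is computed, so cosine similarity between feature vectors is used here purely as an assumption; the names `cosine_score` and `neighbor_clusters` are likewise hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_score(x: Sequence[float], y: Sequence[float]) -> float:
    """Assumed commonality score: cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def neighbor_clusters(features: Sequence[Sequence[float]], u: int) -> Dict[int, List[int]]:
    """For each item, return the indices of the u items with the largest score."""
    result: Dict[int, List[int]] = {}
    for i, fi in enumerate(features):
        scored = [(cosine_score(fi, fj), j)
                  for j, fj in enumerate(features) if j != i]
        # Sort from large to small; ties are broken by index for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        result[i] = [j for _, j in scored[:u]]
    return result


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(neighbor_clusters(feats, u=1))
```

This brute-force version compares every pair of items; for a large cluster an approximate nearest-neighbor index would normally replace the inner loop.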
Step207: determining a current interactive label cluster of the target internet interaction data based on the neighbor internet interaction data cluster of the target internet interaction data and the first analysis results in one-to-one correspondence with the at least two analysis methods.
For example, to further determine whether the target internet interaction data and each neighbor internet interaction data in the neighbor internet interaction data cluster belong to the same tag, it is determined whether the target internet interaction data and the neighbor internet interaction data belong to the same person.
Step207 of the internet data analysis method based on big data is further explained below; specifically, it may include the following steps.
Step207a: determining whether the target internet interaction data and the neighbor internet interaction data belong to the same label based on the credibility weight of the target internet interaction data and the neighbor internet interaction data in the first analysis results.
Illustratively, one piece of neighbor internet interaction data is screened from the neighbor internet interaction data cluster corresponding to the target internet interaction data. Among all first analysis results, the number of first analysis results that cover both the target internet interaction data and the screened neighbor internet interaction data is determined as a first number. Among those first analysis results that cover both, the number in which the target internet interaction data and the screened neighbor internet interaction data have the same analysis mode is determined as a second number. Whether the target internet interaction data and the screened neighbor internet interaction data belong to the same label is then determined based on the credibility weight (that is, the proportion) of the second number in the first number.
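The first number and second number described above can be computed as in the following sketch. It assumes, for illustration only, that each first analysis result is a mapping from a data identifier to the analysis mode assigned to it, and that a result covers only the items of the local cluster it was computed on; the function name `same_label_weight` is hypothetical.

```python
from typing import Dict, List


def same_label_weight(results: List[Dict[int, int]], target: int, neighbor: int) -> float:
    """Proportion of the second number in the first number.

    results -- one dict per first analysis result: data id -> assigned analysis mode
    """
    # First number: results that cover both the target and the neighbor.
    covering = [r for r in results if target in r and neighbor in r]
    first_number = len(covering)
    # Second number: among those, results giving both the same analysis mode.
    second_number = sum(1 for r in covering if r[target] == r[neighbor])
    return second_number / first_number if first_number else 0.0


if __name__ == "__main__":
    first_results = [
        {1: 0, 2: 0, 3: 1},
        {1: 1, 2: 1},
        {1: 0, 2: 1, 3: 1},
        {1: 0, 3: 0},
    ]
    print(same_label_weight(first_results, 1, 2))  # 2 of the 3 covering results agree
```

Returning 0.0 when no result covers both items is a design choice of this sketch; the source does not say how that case is handled.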
Step207b: determining the current interactive label cluster of the target internet interaction data based on the neighbor internet interaction data in the neighbor internet interaction data cluster that belong to the same label as the target internet interaction data.
Illustratively, in response to the credibility weight of the second number in the first number exceeding the specified vector (that is, a specified threshold), the target internet interaction data is determined to belong to the same label as the screened neighbor internet interaction data. In other words, if the probability that the target internet interaction data z1 and the neighbor internet interaction data belong to the same label exceeds the specified vector, the neighbor internet interaction data is determined as current neighbor internet interaction data of the target internet interaction data z1.
In response to the credibility weight of the second number in the first number not being greater than the specified vector, it is determined that the target internet interaction data and the screened neighbor internet interaction data do not belong to the same label. In other words, if the probability that the target internet interaction data z1 and the neighbor internet interaction data belong to the same label is not greater than the specified vector, the neighbor internet interaction data is not current neighbor internet interaction data of the target internet interaction data z1.
The probability between the target internet interaction data z1 and each neighbor internet interaction data in its neighbor internet interaction data cluster is traversed in this way; all neighbor internet interaction data in the neighbor internet interaction data cluster that belong to the same label as the target internet interaction data z1 are screened out and determined as current neighbor internet interaction data, and the current interactive label cluster of the target internet interaction data z1 is formed from these current neighbor internet interaction data. For example, u1 denotes the number of current neighbor internet interaction data covered in the current interactive label cluster of the target internet interaction data z1.
Compared with the neighbor internet interaction data cluster of the target internet interaction data z1, the current interactive label cluster of the target internet interaction data z1 covers less interference, and the probability that the target internet interaction data z1 and the current neighbor internet interaction data covered in the current interactive label cluster belong to one and the same label is higher.
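The filtering in Step207b can be sketched as below. The weight function is passed in as a parameter so the example stays independent of how the credibility weight is obtained; `current_label_cluster`, `weight`, and `threshold` are hypothetical names, and the strict "greater than" comparison follows the wording of the step.

```python
from typing import Callable, Iterable, List


def current_label_cluster(target: int, neighbors: Iterable[int],
                          weight: Callable[[int, int], float],
                          threshold: float) -> List[int]:
    """Keep only neighbors whose same-label weight strictly exceeds the threshold."""
    return [n for n in neighbors if weight(target, n) > threshold]


if __name__ == "__main__":
    # Hypothetical precomputed weights between target 1 and its neighbors.
    weights = {(1, 2): 0.9, (1, 3): 0.5, (1, 4): 0.2}
    cluster = current_label_cluster(1, [2, 3, 4],
                                    lambda t, n: weights[(t, n)], threshold=0.5)
    print(cluster, len(cluster))  # the length plays the role of u1
```

A neighbor whose weight equals the threshold is dropped, matching the "not greater than" branch of the step.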
In this embodiment, two target internet interaction data are randomly screened from a plurality of target internet interaction data, one target internet interaction data is first internet interaction data, and the other target internet interaction data is second internet interaction data. For some possible embodiments, the internet interaction data cluster to be analyzed includes at least a first internet interaction data, a second internet interaction data and a third internet interaction data.
Step208: determining the relevance between the first internet interaction data and the second internet interaction data based on the current interactive label cluster of the first internet interaction data and the current interactive label cluster of the second internet interaction data.
Step208 of the internet data analysis method based on big data is further described below; specifically, it includes the following steps.
Step208a: determining a first neighbor set based on the commonality characteristic between the complementary queue of the current interactive label cluster of the second internet interaction data and the current interactive label cluster of the first internet interaction data.
Illustratively, the first neighbor set M1 is determined by the commonality characteristic (that is, the intersection) between the complementary queue (that is, the complement) of the current interactive label cluster N of the second internet interaction data and the current interactive label cluster M of the first internet interaction data. The first neighbor set M1 consists of the elements of the current interactive label cluster M of the first internet interaction data that are not covered in the current interactive label cluster N of the second internet interaction data. In other words, the first neighbor set M1 may be represented as the intersection of M with the complement of N, that is, M1 = M \ N. The first neighbor set is a staged set (a subset) of the current interactive label cluster of the first internet interaction data.
Step208b: determining a second neighbor set based on the commonality characteristic between the complementary queue of the current interactive label cluster of the first internet interaction data and the current interactive label cluster of the second internet interaction data.
Illustratively, the second neighbor set N1 is determined by the commonality characteristic between the complementary queue of the current interactive label cluster M of the first internet interaction data and the current interactive label cluster N of the second internet interaction data. The second neighbor set N1 consists of the elements of the current interactive label cluster N of the second internet interaction data that are not covered in the current interactive label cluster M of the first internet interaction data. In other words, the second neighbor set N1 may be represented as the intersection of N with the complement of M, that is, N1 = N \ M. The second neighbor set is a staged set (a subset) of the current interactive label cluster of the second internet interaction data.
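As a small worked example of Step208a and Step208b, the two neighbor sets are plain set differences. The function name `neighbor_sets` and the integer element identifiers are assumptions made for the illustration.

```python
from typing import Set, Tuple


def neighbor_sets(m: Set[int], n: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Return (M1, N1): the elements of M not in N, and the elements of N not in M."""
    m1 = m - n  # intersection of M with the complement of N
    n1 = n - m  # intersection of N with the complement of M
    return m1, n1


if __name__ == "__main__":
    M = {1, 2, 3, 4}
    N = {3, 4, 5}
    print(neighbor_sets(M, N))
```

By construction M1 is a subset of M, N1 is a subset of N, and the two sets never overlap.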
Step208c: determining the relevance between the first internet interaction data and the second internet interaction data based on the first neighbor set and the second neighbor set together with, respectively, the current interactive label cluster of the first internet interaction data and the current interactive label cluster of the second internet interaction data.
Illustratively, Step208c of the internet data analysis method based on big data is further described below, and may specifically include the following steps.
Step208c1: determining the correlation metric values between each of the first internet interaction data and the second internet interaction data and each of the first neighbor set and the second neighbor set, based on the current interactive label cluster of the first internet interaction data, the current interactive label cluster of the second internet interaction data, the first neighbor set and the second neighbor set.
In one possible implementation, for the current interactive label cluster M of the first internet interaction data, the correlation metric value between M and the first neighbor set M1 and the correlation metric value between M and the second neighbor set N1 are calculated respectively.
Step208c2: determining the commonality coefficient comparison results in one-to-one correspondence with the first internet interaction data and the second internet interaction data, based on the first neighbor set and the second neighbor set together with, respectively, the current interactive label cluster of the first internet interaction data and the current interactive label cluster of the second internet interaction data.
Illustratively, the result of the comparison of the commonality coefficient corresponding to the first internet interaction data is determined by a comparison vector between the correlation metric value between the current interactive label cluster M of the first internet interaction data and the first neighbor set M1 and the correlation metric value between the current interactive label cluster M of the first internet interaction data and the second neighbor set N1.
And determining a common coefficient comparison result corresponding to the second internet interaction data through a comparison vector between a correlation metric value between the current interaction label cluster N of the second internet interaction data and the first neighbor set M1 and a correlation metric value between the current interaction label cluster N of the second internet interaction data and the second neighbor set N1.
If the comparison vector between the commonality coefficient comparison result StepA of the first internet interaction data and the commonality coefficient comparison result StepB of the second internet interaction data tends to 0, the probability that the first internet interaction data and the second internet interaction data belong to the same person is higher.
Step208c3: determining the relevance between the first internet interaction data and the second internet interaction data based on whichever of the commonality coefficient comparison results in one-to-one correspondence with the first internet interaction data and the second internet interaction data has the largest comparison vector.
Exemplarily, a result of the comparison of the common coefficient of the first internet interaction data is determined based on a comparison vector between a correlation metric value corresponding to the first internet interaction data and the first neighbor set and a correlation metric value corresponding to the first internet interaction data and the second neighbor set; and determining a common coefficient comparison result of the second internet interaction data based on a comparison vector between the correlation metric value corresponding to the second internet interaction data and the first neighbor set and the correlation metric value corresponding to the second internet interaction data and the second neighbor set.
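The following sketch shows one way Step208c1 to Step208c3 could fit together. The source defines neither the correlation metric, nor the comparison vector, nor the mapping from comparison results to relevance, so every formula here is an assumption: the metric is taken as the mean pairwise commonality score, the comparison vector as an absolute difference, and the relevance as one minus the larger comparison result. All names are hypothetical, and scores are assumed to lie in [0, 1].

```python
from typing import Dict, FrozenSet, Set

Scores = Dict[FrozenSet[int], float]


def correlation_metric(scores: Scores, cluster: Set[int], neighbor_set: Set[int]) -> float:
    """Assumed metric: mean pairwise score between a label cluster and a neighbor set."""
    pairs = [(a, b) for a in cluster for b in neighbor_set if a != b]
    if not pairs:
        return 0.0
    return sum(scores.get(frozenset(p), 0.0) for p in pairs) / len(pairs)


def relevance(scores: Scores, m: Set[int], n: Set[int]) -> float:
    """Assumed relevance between two items with current label clusters M and N."""
    m1, n1 = m - n, n - m  # first and second neighbor sets
    # Comparison result per item: absolute difference of its two metric values.
    step_a = abs(correlation_metric(scores, m, m1) - correlation_metric(scores, m, n1))
    step_b = abs(correlation_metric(scores, n, m1) - correlation_metric(scores, n, n1))
    # The larger comparison result decides; a smaller gap means higher relevance.
    return 1.0 - max(step_a, step_b)


if __name__ == "__main__":
    pair_scores = {frozenset((1, 2)): 1.0, frozenset((3, 4)): 1.0}
    print(relevance(pair_scores, {1, 2}, {1, 2}))  # identical clusters
    print(relevance(pair_scores, {1, 2}, {3, 4}))  # disjoint, unrelated clusters
```

Under these assumptions, two items whose current interactive label clusters coincide get the highest relevance, and items whose clusters are disjoint and mutually dissimilar get the lowest.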
Step209: determining whether the first internet interaction data and the second internet interaction data belong to the same label internet interaction data cluster based on the relevance between the first internet interaction data and the second internet interaction data.
Illustratively, the relevance between the first internet interaction data and the second internet interaction data is compared with a relevance target value. In response to the relevance exceeding the relevance target value, it is determined that the first internet interaction data and the second internet interaction data belong to the same label internet interaction data cluster.
Through Step208 and Step209, it is likewise determined whether the second internet interaction data and the third internet interaction data belong to the same label internet interaction data cluster, and whether the first internet interaction data and the third internet interaction data belong to the same label internet interaction data cluster.
Step210: integrating at least one label internet interaction data cluster to generate a final analysis result of the internet interaction data cluster that needs to be analyzed.
Illustratively, at least one label internet interaction data cluster is integrated to generate at least one key label, based on whether any two label internet interaction data clusters cover the same target internet interaction data. Each key label comprises at least one label internet interaction data cluster; when a key label comprises a plurality of label internet interaction data clusters, any two of those clusters cover the same target internet interaction data. In response to two label internet interaction data clusters containing the same target internet interaction data, all target internet interaction data contained in the two label internet interaction data clusters are combined to form one key label, that is, they are attributed to the same label.
In one possible implementation, the first internet interaction data and the second internet interaction data belong to the same label internet interaction data cluster, and the second internet interaction data and the third internet interaction data belong to the same label internet interaction data cluster, while the first internet interaction data and the third internet interaction data, taken as a pair, are attributed to different labels.
Because the first internet interaction data and the second internet interaction data are attributed to the same label, and the third internet interaction data and the second internet interaction data are attributed to the same label internet interaction data cluster, the two label internet interaction data clusters cover the same target internet interaction data; based on the correlation between the target internet interaction data, they are therefore combined under the integration rule above, so that the first, second and third internet interaction data fall under one key label.
All target internet interaction data in the internet interaction data cluster to be analyzed are traversed through the above steps, and the final analysis result corresponding to the internet interaction data cluster to be analyzed is generated.
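The integration in Step210 amounts to merging label internet interaction data clusters that share a member. A minimal sketch using a union-find structure is shown below; the source does not name a data structure, so this choice, the function name `merge_label_clusters`, and the integer identifiers are assumptions.

```python
from typing import Dict, Iterable, List, Set


def merge_label_clusters(clusters: Iterable[Set[int]]) -> List[Set[int]]:
    """Merge clusters that share at least one item into key labels."""
    parent: Dict[int, int] = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for cluster in clusters:
        members = sorted(cluster)
        for member in members:
            # Link every member of the cluster to the root of its first member.
            parent[find(member)] = find(members[0])

    groups: Dict[int, Set[int]] = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(groups.values(), key=min)


if __name__ == "__main__":
    # Items 1 and 2 share a cluster, items 2 and 3 share a cluster, item 4 is alone.
    print(merge_label_clusters([{1, 2}, {2, 3}, {4}]))
```

In the three-item case discussed above, the first and third items end up under one key label because both share a label internet interaction data cluster with the second item.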
On this basis, referring to FIG. 2, an internet data analysis device 200 based on big data is provided, which is applied to an internet data analysis cloud platform based on big data. The device includes:
the interactive data obtaining module 210 is configured to obtain an internet interactive data cluster that needs to be analyzed, where the internet interactive data cluster that needs to be analyzed includes a plurality of target internet interactive data;
the first result analysis module 220 is configured to perform pre-analysis on the target internet interaction data in the internet interaction data cluster to be analyzed one by one based on at least two analysis methods, so as to generate first analysis results corresponding to the at least two analysis methods one by one;
an interactive tag determining module 230, configured to determine, by combining the first analysis results corresponding to the at least two analysis methods one by one, a current interactive tag cluster of each target internet interactive data;
and a final result analysis module 240, configured to perform reanalysis on the target internet interaction data in the internet interaction data cluster that needs to be analyzed and processed through the current interaction tag cluster of each target internet interaction data, so as to generate a final analysis result.
On the basis of the above, referring to FIG. 3, an internet data analysis system 300 based on big data is shown, which includes a processor 310 and a memory 320 that communicate with each other, wherein the processor 310 is configured to read a computer program from the memory 320 and execute it to implement the above method.
On the basis of the above, a computer-readable storage medium is also provided, on which a computer program is stored; when executed, the computer program implements the above-described method.
In summary, based on the above scheme, an internet interaction data cluster that needs to be analyzed is obtained, the cluster comprising a plurality of target internet interaction data; the target internet interaction data in the cluster are pre-analyzed based on at least two analysis methods to generate first analysis results in one-to-one correspondence with the at least two analysis methods; a current interactive label cluster of each target internet interaction data is determined based on those first analysis results; and the target internet interaction data in the cluster are analyzed again based on the current interactive label cluster of each target internet interaction data to generate a final analysis result. Because the target internet interaction data are first analyzed by a plurality of analysis methods to generate a plurality of first analysis results, the first analysis results are then optimized to determine the current interactive label cluster of each target internet interaction data, and the target internet interaction data are analyzed again based on those current interactive label clusters, the accuracy and reliability of the analysis results of the target internet interaction data in the internet interaction data cluster that needs to be analyzed can be improved.
It should be appreciated that the system and its modules shown above may be implemented in a variety of ways. For example, in some embodiments, the system and its modules may be implemented in hardware, software, or a combination of software and hardware. Wherein the hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory for execution by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a diskette, CD-or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system and its modules of the present disclosure may be implemented not only by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., but also by software executed by various types of processors, for example, or by a combination of the above hardware circuits and software (e.g., firmware).
It is to be noted that different embodiments may produce different advantages, and in different embodiments, any one or combination of the above advantages may be produced, or any other advantages may be obtained.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be regarded as illustrative only and not as limiting of the disclosure. Various modifications, improvements and adaptations to the present disclosure may occur to those skilled in the art, although not explicitly described herein. Such alterations, modifications, and improvements are intended to be suggested in this disclosure, and are intended to be within the spirit and scope of the exemplary embodiments of this disclosure.
Also, this disclosure uses specific language to describe its embodiments. Terms such as "one embodiment," "an embodiment," and/or "some embodiments" mean that a particular feature, structure, or characteristic is described in connection with at least one embodiment of the disclosure. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment," "one embodiment," or "an alternative embodiment" in various places throughout this specification do not necessarily all refer to the same embodiment. Furthermore, certain features, structures, or characteristics of one or more embodiments of the disclosure may be combined as appropriate.
Further, those skilled in the art will appreciate that aspects of the present disclosure may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufacture, or materials, or any new and useful improvement thereon. Accordingly, various aspects of the present disclosure may be carried out entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present disclosure may be embodied as a computer product, located in one or more computer-readable media, comprising computer-readable program code.
The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
Computer program code required for the operation of various portions of the present disclosure may be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python; conventional programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP; dynamic programming languages such as Python, Ruby, and Groovy; or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or offered as a service, such as software as a service (SaaS).
Additionally, the order in which the elements and sequences of the present disclosure are processed, the use of numbers and letters, and the use of other names are not intended to limit the order of the processes and methods of the present disclosure, unless explicitly recited in the claims. While certain presently contemplated useful embodiments have been discussed in the foregoing disclosure by way of examples, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the disclosed embodiments. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that in the foregoing description of embodiments of the disclosure, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not intended to require more features than are expressly recited in the claims. Indeed, the embodiments may be characterized as having fewer than all of the features of a single embodiment disclosed above.
Numerals describing the number of components, attributes, etc. are used in some embodiments, it being understood that such numerals used in the description of the embodiments are modified in some instances by the use of the modifier "about", "approximately" or "substantially". Unless otherwise indicated, "about", "approximately" or "substantially" indicates that the numbers allow for adaptive variation. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, the numerical parameter should take into account the specified significant digits and employ a general digit preserving approach. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the disclosure are approximations, in particular embodiments, such numerical values are set forth as precisely as possible within the scope of the disclosure.
Each patent, patent application publication, and other material, such as articles, books, specifications, publications, documents, and the like, cited in connection with this disclosure is hereby incorporated by reference in its entirety into this disclosure, except for application history documents that are inconsistent with or conflict with the present disclosure, and except for documents (currently or later appended to the present disclosure) that limit the broadest scope of the claims of the present disclosure. It is to be understood that if the descriptions, definitions, and/or uses of terms in the material attached to the present disclosure are inconsistent with or contrary to those in the present disclosure, the descriptions, definitions, and/or uses of terms in the present disclosure shall control.
Finally, it should be understood that the embodiments described in this disclosure are merely illustrative of the principles of the embodiments of the disclosure. Other variations are also possible within the scope of the disclosure. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the disclosure can be viewed as being consistent with the teachings of the disclosure. Accordingly, embodiments of the disclosure are not limited to only those explicitly described and depicted.
The above are merely examples of the present disclosure, and are not intended to limit the present disclosure. Various modifications and variations of this disclosure will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present disclosure should be included in the scope of the claims of the present disclosure.

Claims (10)

1. An internet data analysis method based on big data, applied to a data analysis system, the method comprising at least the following steps:
obtaining an internet interactive data cluster needing to be analyzed, wherein the internet interactive data cluster needing to be analyzed comprises a plurality of target internet interactive data;
pre-analyzing target internet interaction data in the internet interaction data cluster needing to be analyzed one by one based on at least two analysis methods to generate first analysis results corresponding to the at least two analysis methods one by one;
determining the current interactive label cluster of each target internet interactive data by combining the first analysis results corresponding to the at least two analysis methods one by one;
and analyzing the target internet interaction data in the internet interaction data cluster needing to be analyzed again through the current interaction label cluster of each target internet interaction data to generate a final analysis result.
2. The method for analyzing internet data based on big data as claimed in claim 1, wherein the pre-analyzing the target internet interaction data in the internet interaction data cluster to be analyzed one by one based on at least two analysis methods to generate the first analysis result corresponding to the at least two analysis methods one by one, comprises:
randomly dividing the target internet interaction data in the internet interaction data cluster needing to be analyzed and processed to generate at least two local interaction data clusters;
and pre-analyzing the target internet interaction data in each local interaction data cluster one by one according to the at least two analysis methods to generate one first analysis result for each pairing of an analysis method with a local interaction data cluster.
3. The internet data analysis method based on big data according to claim 1 or 2, wherein, before the target internet interaction data in the internet interaction data cluster to be analyzed is pre-analyzed one by one based on at least two analysis methods to generate first analysis results corresponding to the at least two analysis methods one by one, the method further comprises:
carrying out knowledge derivation information selection on the target internet interactive data in the internet interactive data cluster needing to be analyzed and processed to generate first knowledge derivation information of the target internet interactive data;
simplifying the first knowledge derivation information of each target internet interaction data respectively to generate second knowledge derivation information of each target internet interaction data;
and wherein the pre-analyzing the target internet interaction data in the internet interaction data cluster needing to be analyzed and processed one by one based on at least two analysis methods to generate first analysis results corresponding to the at least two analysis methods one by one comprises: pre-analyzing the target internet interaction data in the internet interaction data cluster needing to be analyzed one by one, using the at least two analysis methods and the second knowledge derivation information of each target internet interaction data, to generate first analysis results corresponding to the at least two analysis methods one by one.
4. The method as claimed in claim 3, wherein the step of simplifying the first knowledge derivation information of each target internet interaction data to generate the second knowledge derivation information of each target internet interaction data comprises: simplifying the first knowledge derivation information of each target internet interaction data one by one using a principal component analysis method to generate the second knowledge derivation information of each target internet interaction data.
5. The method for analyzing internet data based on big data as claimed in claim 1, wherein said determining the current interactive label cluster of each of the target internet interactive data by combining the first analysis results corresponding to the at least two analysis methods one by one comprises:
traversing each target internet interaction data in the internet interaction data cluster needing to be analyzed and processed, and determining a neighbor internet interaction data cluster corresponding to the screened target internet interaction data based on a common score between the screened target internet interaction data and the remaining target internet interaction data in the internet interaction data cluster needing to be analyzed and processed, wherein the neighbor internet interaction data cluster comprises a specified number of neighbor internet interaction data with the maximum common score of the screened target internet interaction data;
and determining a current interactive label cluster of the target internet interactive data by combining a neighbor internet interactive data cluster of the target internet interactive data and the first analysis results corresponding to the at least two analysis methods one by one, wherein the current interactive label cluster is a staged set of the neighbor internet interactive data cluster.
6. The big-data-based internet data analysis method according to claim 5, wherein the determining the current interactive label cluster of the target internet interactive data by combining the neighbor internet interactive data cluster of the target internet interactive data and the first analysis result corresponding to the at least two analysis methods one by one comprises:
determining whether the target internet interaction data and the neighbor internet interaction data belong to the same label or not by combining the credibility weights of the target internet interaction data and the neighbor internet interaction data in the first analysis result;
and determining the current interactive label cluster of the target internet interactive data by combining the neighbor internet interactive data which belongs to the same label as the target internet interactive data in the neighbor internet interactive data cluster.
7. The big-data-based internet data analysis method according to claim 6, wherein the first analysis result comprises an analysis manner of the target internet interaction data included in the internet interaction data cluster to be analyzed; the determining whether the target internet interaction data and the neighbor internet interaction data belong to the same label or not by combining the credible weight of the target internet interaction data and the neighbor internet interaction data in the first analysis result includes:
screening the neighbor internet interaction data in the neighbor internet interaction data cluster corresponding to the target internet interaction data;
determining the quantity of the first analysis results covering the target internet interaction data and the screened neighbor internet interaction data simultaneously in all the first analysis results to be a first quantity;
determining the number of the first analysis results with the same analysis mode of the target internet interaction data and the screened neighbor internet interaction data as a second number in all the first analysis results covering the target internet interaction data and the screened neighbor internet interaction data at the same time;
and determining whether the target internet interaction data and the screened neighbor internet interaction data belong to the same label or not by combining the credible weights of the second number in the first number.
8. The big-data-based internet data analysis method of claim 7, wherein the determining whether the target internet interaction data and the screened neighbor internet interaction data belong to the same label by combining the credible weight of the second number in the first number comprises:
in response to the credible weight of the second number in the first number being greater than a designated vector, determining that the target internet interaction data and the screened neighbor internet interaction data belong to the same label;
in response to the credible weight of the second number in the first number not being greater than the designated vector, determining that the target internet interaction data and the screened neighbor internet interaction data do not belong to the same label.
9. The big data-based internet data analysis method of claim 5, wherein the re-analyzing the target internet interaction data in the internet interaction data cluster to be analyzed and processed through the current interaction tag cluster of each target internet interaction data to generate a final analysis result comprises:
determining the relevance between two random target internet interactive data through the current interactive label clusters of the target internet interactive data;
based on the relevance between two random target internet interaction data, distinguishing the target internet interaction data in the internet interaction data cluster needing to be analyzed and processed into at least one label internet interaction data cluster; the label internet interactive data cluster comprises not less than two target internet interactive data of which the relevance exceeds a relevance target value;
integrating the at least one label Internet interactive data cluster to generate a final analysis result of the Internet interactive data cluster needing to be analyzed;
wherein the relevance between two random target internet interaction data is determined as follows:
randomly screening two target internet interaction data from the target internet interaction data, wherein one target internet interaction data is first internet interaction data, and the other target internet interaction data is second internet interaction data;
determining a first neighbor set by combining the characteristic of the sharing between the complementary queue of the current interactive label cluster of the second internet interactive data and the current interactive label cluster of the first internet interactive data;
the first neighbor set is a staged set of a current interactive label cluster of the first internet interactive data;
determining a second neighbor set by combining the characteristic of the sharing between the complementary queue of the current interactive label cluster of the first internet interactive data and the current interactive label cluster of the second internet interactive data;
the second neighbor set is a staged set of a current interactive label cluster of the second internet interactive data;
determining the relevance between the first internet interaction data and the second internet interaction data by combining the first neighbor set and the second neighbor set with the current interaction tag cluster of the first internet interaction data and the current interaction tag cluster of the second internet interaction data in sequence;
wherein, the determining the association between the first internet interaction data and the second internet interaction data by combining the first neighbor set and the second neighbor set with the current interaction tag cluster of the first internet interaction data and the current interaction tag cluster of the second internet interaction data in sequence comprises:
combining the first neighbor set and the second neighbor set with the current interactive label cluster of the first internet interactive data and the current interactive label cluster of the second internet interactive data in sequence, and determining a common coefficient comparison result corresponding to the first internet interactive data and the second internet interactive data one by one;
determining the relevance between the first internet interaction data and the second internet interaction data based on the common coefficient comparison result with the maximum comparison vector in the common coefficient comparison results corresponding to the first internet interaction data and the second internet interaction data one by one;
wherein the determining, by combining the first neighbor set and the second neighbor set in sequence with the current interactive label cluster of the first internet interaction data and the current interactive label cluster of the second internet interaction data, of the common coefficient comparison results corresponding to the first internet interaction data and the second internet interaction data one by one comprises:
determining a corresponding correlation metric value between the first internet interaction data and the second internet interaction data and between the first neighbor set and the second neighbor set in turn by combining the current interaction tag cluster of the first internet interaction data and the current interaction tag cluster of the second internet interaction data and the first neighbor set and the second neighbor set in turn;
determining a common coefficient comparison result of the first internet interaction data by combining the correlation metric value corresponding to the first internet interaction data and the first neighbor set and a comparison vector between the correlation metric value corresponding to the first internet interaction data and the second neighbor set;
determining a common coefficient comparison result of the second internet interaction data by combining the correlation metric value corresponding to the second internet interaction data and the first neighbor set and a comparison vector between the correlation metric value corresponding to the second internet interaction data and the second neighbor set;
the final analysis result comprises a plurality of key labels, and each key label comprises not less than one target internet interaction data; the integrating the at least one label internet interactive data cluster to generate a final analysis result of the internet interactive data cluster needing to be analyzed further comprises:
integrating the at least one label internet interaction data cluster, based on whether two random label internet interaction data clusters cover the same target internet interaction data, to generate at least one key label; each key label comprises not less than one label internet interaction data cluster;
when the key label comprises a plurality of label internet interaction data clusters, two random label internet interaction data clusters among them cover the same target internet interaction data;
and in response to two label internet interaction data clusters containing the same target internet interaction data, combining all the target internet interaction data contained in the two label internet interaction data clusters to form the key label, all of the combined target internet interaction data belonging to the same label.
10. An internet data analysis system based on big data, comprising a processor and a memory communicating with each other, the processor being configured to read a computer program from the memory and execute the computer program to implement the method of any one of claims 1 to 9.
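The neighbor selection, same-label test, and cluster integration recited in claims 5 to 9 can be sketched as follows. This is a hedged illustration only, and several points are assumptions rather than the claimed method: the "common score" is read as a similarity (here the negative absolute difference of toy numeric values), the "credible weight of the second number in the first number" is read as the ratio second/first, the "designated vector" is read as a scalar threshold, and every function name is hypothetical. The relevance computation of claim 9 based on neighbor sets is not reproduced; only the merging of clusters that share an item is shown.

```python
from typing import Dict, List, Set

def top_k_neighbors(i: int, data: List[float], k: int) -> List[int]:
    """Claim 5: the k other items closest to item i (assumed score: negative distance)."""
    others = [j for j in range(len(data)) if j != i]
    return sorted(others, key=lambda j: abs(data[i] - data[j]))[:k]

def same_label(i: int, j: int, results: List[Dict[int, str]], threshold: float) -> bool:
    """Claims 7-8: compare the agreement ratio with a threshold.

    first  = number of first analysis results that cover both items
    second = number of those results that give both items the same label
    """
    covering = [r for r in results if i in r and j in r]
    first = len(covering)
    second = sum(1 for r in covering if r[i] == r[j])
    return first > 0 and second / first > threshold

def label_cluster(i: int, data: List[float], results: List[Dict[int, str]],
                  k: int, threshold: float) -> Set[int]:
    """Claims 5-6: the neighbors of item i judged to share its label."""
    return {j for j in top_k_neighbors(i, data, k) if same_label(i, j, results, threshold)}

def merge_overlapping(clusters: List[Set[int]]) -> List[List[int]]:
    """End of claim 9: merge clusters that have at least one item in common."""
    merged: List[Set[int]] = []
    for cluster in clusters:
        current = set(cluster)
        disjoint = []
        for existing in merged:
            if existing & current:
                current |= existing   # absorb every cluster that overlaps
            else:
                disjoint.append(existing)
        merged = disjoint + [current]
    return sorted(sorted(m) for m in merged)

data = [1.0, 1.2, 1.1, 9.0, 9.3]
results = [
    {0: "a", 1: "a", 2: "a", 3: "b", 4: "b"},
    {0: "x", 1: "x", 3: "y", 4: "y"},          # covers only part of the data (claim 2)
    {0: "p", 1: "q", 2: "p", 3: "r", 4: "r"},
]
groups = [{i} | label_cluster(i, data, results, k=2, threshold=0.5) for i in range(len(data))]
print(merge_overlapping(groups))  # [[0, 1, 2], [3, 4]]
```

Items 1 and 2 agree in only one of the two results that cover both, so they are not directly linked, yet both are linked to item 0 and therefore land in the same merged group. The strict comparison `second / first > threshold` mirrors the "greater than" wording of claim 8.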
CN202211271926.5A 2022-10-18 2022-10-18 Internet data analysis method and system based on big data Pending CN115828012A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211271926.5A CN115828012A (en) 2022-10-18 2022-10-18 Internet data analysis method and system based on big data

Publications (1)

Publication Number Publication Date
CN115828012A (en) 2023-03-21

Family

ID=85524936

Country Status (1)

Country Link
CN (1) CN115828012A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination