CN118300851B - A network information detection method and system based on big data - Google Patents

A network information detection method and system based on big data

Info

Publication number: CN118300851B
Authority: CN (China)
Prior art keywords: information data, information, bad, text, data
Prior art date:
Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN202410426391.7A
Other languages: Chinese (zh)
Other versions: CN118300851A (en
Inventor: 方韩伟
Current Assignee: Beijing Anqianxingke Technology Co ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Beijing Anqianxingke Technology Co ltd
Priority date: (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date:
Publication date:
Application filed by Beijing Anqianxingke Technology Co ltd
Priority to CN202410426391.7A
Publication of CN118300851A
Application granted
Publication of CN118300851B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425: Traffic logging, e.g. anomaly detection
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40: Network security protocols

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a network information detection method and system based on big data, and relates to the field of network information detection. The method obtains network information data to be detected, the network information data comprising text information data, file information data, network traffic information data, and image information data, and performs feature extraction on that data to obtain text features, file features, network traffic features, and image features respectively. By extracting features of multiple types (text, file, network traffic, and image) from the data to be detected, the invention reflects the attributes and characteristics of the information more comprehensively, which improves the accuracy and robustness of information detection. By comprehensively analyzing the features of the different information types, it can judge more accurately whether information is bad information, thereby effectively curbing the spread of bad information in the network.

Description

Network information detection method and system based on big data
Technical Field
The invention relates to the field of network information detection, in particular to a network information detection method and system based on big data.
Background
In the present digital age, network information is growing explosively, and bad information on the network is growing with it. Network information detection methods based on big data have emerged in response to this challenge and have become an important means of dealing with bad information. Big data technology provides powerful data processing and analysis capabilities, so that massive amounts of network information can be collected, stored, processed, and analyzed efficiently. With big data technology, the information flow on the network can be monitored in real time and the bad content in it identified and filtered out, keeping cyberspace clean and safe.
In the prior art, the invention patent application with publication number CN102880636A discloses a method and a server for detecting bad information. The server receives an information release request sent by a client, performs word segmentation matching between the information carried by the request and a preset filtering word library, calculates the relevance, and filters the information when the relevance is greater than a preset threshold. That invention maintains a filtering word library, performs word segmentation matching on each piece of information, calculates the relevance between the information and the sensitive words in the library, and compares the relevance with a preset threshold. When the relevance exceeds the threshold, the information is filtered and marked accordingly in a database, so that it is visible only to its publisher and not to other users. This effectively prevents the spread of bad information and improves the maintainability and interactivity of the website; it also reduces manual involvement, which improves the efficiency of website information management.
In view of the above technical solution, current network information detection methods are limited in that they detect only a single type of acquired information, whereas network information comes in many types, so these methods have difficulty processing multiple types of network information data.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides a network information detection method and system based on big data, which solve the problem that current network information detection methods detect only a single type of acquired information, whereas network information comes in many types, so that these methods have difficulty processing multiple types of network information data.
The network information detection method based on big data comprises: obtaining network information data to be detected, the network information data comprising text information data, file information data, network traffic information data, and image information data; performing feature extraction on the network information data to be detected to obtain text features, file features, network traffic features, and image features respectively; performing analysis processing based on the text features, the file features, the network traffic features, and the image features respectively, and judging according to the analysis processing results whether the text information data, the file information data, and the network traffic information data are bad information; and taking corresponding processing measures for the information data judged to be bad information.
The text feature is specifically all the word information data in the text information data. The specific process of judging whether the text information data is bad information is as follows: the text information data is read and segmented; stop words are removed from the segmented text information data; lemmatization is performed on the text information data after stop-word removal to obtain all the word information in the text information data; all the word information is matched against a bad word library to obtain the number of bad words in the text information data; a text bad coefficient is calculated based on all the word information and the number of bad words; and the text bad coefficient is compared with a preset text bad coefficient threshold. If the text bad coefficient is higher than the preset threshold, the text information data is regarded as bad information.
Further, the text bad coefficient is calculated by a specific formula (the formula itself is not reproduced in this text) whose quantities are: the bad coefficient of the text information data; the number of bad words obtained after matching the text information data against the bad word library; the total number of words in the text information data; and a matching correction factor. The matching correction factor is obtained as follows: word information data of a plurality of text information data are obtained; the word information data of each text information data is matched against the bad word library to obtain a measured quantity value of bad words for each text information data; the actual quantity value of bad words of each text information data is also obtained; and a comprehensive analysis based on the measured quantity values and the actual quantity values, combined with the exponential moving average method, yields the matching correction factor.
The file feature is the file type in the file information data, the file type being a text file, an image file, an audio file, or a binary file. The specific process of judging whether the file information data is bad information is as follows.
If the file feature is a text file, the content of the text file is read into memory using a file operation function of a programming language and recorded as a character string; after reading is complete, the file handle is closed and resources are released; a hash calculation is performed on the character string using a hash algorithm to obtain a text hash value; the text hash value is matched against a bad text hash library; and if the matching succeeds, the file information data is regarded as bad information.
If the file feature is an image file, the image information in the image file is adjusted to a preset size; grayscale processing is performed on the adjusted image information; a pixel mean is calculated from the image information; a character string, namely the image hash value, is constructed based on the pixel mean; the image hash value is matched against a bad image hash library; and if the matching succeeds, the file information data is regarded as bad information.
If the file feature is an audio file, the spectrogram in the audio file is extracted and converted into an audio character string; a hash calculation is performed on the audio character string using a hash algorithm to obtain an audio hash value; the audio hash value is matched against a bad audio hash library; and if the matching succeeds, the file information data is regarded as bad information.
If the file feature is a binary file, the content of the binary file is read into memory using a file operation function of a programming language and recorded as a character string; after reading is complete, the file handle is closed and resources are released; a hash calculation is performed on the character string using a hash algorithm to obtain a binary hash value; the binary hash value is matched against a bad binary hash library; and if the matching succeeds, the file information data is regarded as bad information.
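The text-file and binary-file branches above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the source does not name a hash algorithm, so SHA-256 is an assumption, the file is read as bytes so one routine covers both branches, and the function names and the representation of the bad hash library as a Python set are hypothetical.

```python
import hashlib


def content_hash(content: bytes) -> str:
    """Hash file content; SHA-256 is an assumed choice of hash algorithm."""
    return hashlib.sha256(content).hexdigest()


def read_file(path: str) -> bytes:
    """Read the whole file into memory.

    Leaving the with-block closes the file handle and releases it,
    which corresponds to the "close the file handle" step.
    """
    with open(path, "rb") as handle:
        return handle.read()


def is_bad_content(content: bytes, bad_hash_library: set) -> bool:
    """Exact-match lookup of the content hash in the bad hash library."""
    return content_hash(content) in bad_hash_library


def is_bad_file(path: str, bad_hash_library: set) -> bool:
    """Convenience wrapper: read, hash, and match a file on disk."""
    return is_bad_content(read_file(path), bad_hash_library)
```

An exact-match lookup like this only flags files that are byte-for-byte identical to a known bad file; any near-duplicate detection would need a different hashing scheme.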
The network traffic features are specifically the number of bytes of a data packet, the data transmission rate, the source and destination IP addresses of the data packet, and the source and destination port numbers of the data packet in the network traffic information data. The specific process of judging whether the network traffic information data is bad information is as follows: the extracted byte count, data transmission rate, source and destination IP addresses, and source and destination port numbers are normalized; a comprehensive analysis calculation is performed on the normalized values to obtain a network traffic feature value; and the network traffic feature value is compared with a preset network traffic feature threshold. If the network traffic feature value is higher than the preset threshold, the network traffic information data is regarded as bad information.
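The source does not state how the features are normalized or combined, so the following is only a sketch under stated assumptions: min-max scaling against fixed upper bounds, IPv4 addresses converted to integers, and a weighted sum as the "comprehensive analysis calculation". The bounds, weights, dictionary keys, and function names are all hypothetical.

```python
import ipaddress

# Hypothetical upper bounds used for min-max normalization.
MAX_BYTES = 65535
MAX_RATE = 1_000_000_000  # bytes per second (assumed unit)
MAX_IPV4 = 2**32 - 1
MAX_PORT = 65535


def normalize(packet: dict) -> list:
    """Scale each of the six traffic features into the range [0, 1]."""
    return [
        min(packet["bytes"] / MAX_BYTES, 1.0),
        min(packet["rate"] / MAX_RATE, 1.0),
        int(ipaddress.IPv4Address(packet["src_ip"])) / MAX_IPV4,
        int(ipaddress.IPv4Address(packet["dst_ip"])) / MAX_IPV4,
        packet["src_port"] / MAX_PORT,
        packet["dst_port"] / MAX_PORT,
    ]


def traffic_feature_value(packet: dict, weights: list) -> float:
    """Weighted sum of the normalized features (an assumed combination rule)."""
    return sum(w * x for w, x in zip(weights, normalize(packet)))


def is_bad_traffic(packet: dict, weights: list, threshold: float) -> bool:
    """Traffic is flagged when the feature value exceeds the preset threshold."""
    return traffic_feature_value(packet, weights) > threshold
```

In practice the weights and the threshold would have to be tuned on labeled traffic; the values used in any example call are placeholders.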
The image information data comprises picture information data and video information data, and the image features comprise picture features extracted from the picture information data and video features extracted from the video information data. The picture features are specifically the color histogram, edge features, texture features, and depth features of the picture information data. The specific process of judging whether the picture information data is bad information is as follows: the extracted color histogram, edge features, texture features, and depth features of the picture information data are fused to obtain a picture feature comprehensive vector; the picture feature comprehensive vector is input into a classification model for classification; and whether the picture information data is bad information is judged according to the classification result.
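The fusion step can be illustrated with a small sketch. The source does not say how the features are fused or which classification model is used, so concatenation is an assumed fusion strategy, the histogram is computed over single-channel intensities for brevity, and the model is any hypothetical callable that returns a truthy value for bad pictures.

```python
def color_histogram(pixels: list, bins: int = 4) -> list:
    """Normalized histogram of intensity values in the range 0 to 255."""
    counts = [0] * bins
    for value in pixels:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(pixels) or 1  # avoid division by zero for empty input
    return [count / total for count in counts]


def fuse_features(histogram: list, edge: list, texture: list, depth: list) -> list:
    """Fuse the four feature groups by concatenation (assumed strategy)."""
    return [*histogram, *edge, *texture, *depth]


def classify_picture(vector: list, model) -> bool:
    """Apply a classification model; `model` is a hypothetical callable."""
    return bool(model(vector))
```

Concatenation keeps every feature visible to the classifier but makes the vector sensitive to the relative scale of the groups, so each group would normally be normalized before fusion.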
The video features are specifically the color histogram, edge features, texture features, and depth features of the picture information data of each key frame in the video information data. The specific process of judging whether the video information data is bad information is as follows: the extracted color histogram, edge features, texture features, and depth features of the picture information data of each key frame are input into a classification model for classification to obtain the classification of each key frame's picture information data; it is then judged whether any key frame has picture information data that is bad information; and if so, the video information data is regarded as bad information.
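The video rule reduces to "bad if any key frame is bad", which a short sketch makes explicit. Key-frame extraction and the classification model are outside this sketch; `model` is a hypothetical callable that returns a truthy value for a bad frame.

```python
def is_bad_video(key_frame_vectors: list, model) -> bool:
    """A video is bad as soon as one key frame is classified as bad.

    `key_frame_vectors` holds one feature vector per key frame.
    """
    return any(model(vector) for vector in key_frame_vectors)
```

Because `any` stops at the first bad frame, long videos with early bad content are rejected without classifying the remaining key frames.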
Further, corresponding processing measures are taken for the information data judged to be bad information, as follows. If the bad information is text information data or image information data, the text data and image information data containing the bad information are deleted to prevent further spread, and the text content and image content subsequently uploaded by the user are monitored in real time so that bad information is found and handled promptly. If the bad information is file information data, the file containing the bad information is isolated to prevent further spread, and the file content subsequently uploaded by the user is monitored in real time so that bad files are found and handled promptly. If the bad information is network traffic information data, network security equipment is used to block the transmission of the bad traffic, and access control measures are applied to the source of the bad traffic to limit its access rights to network resources.
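The mapping from information type to processing measure can be written as a lookup table. The enum, the action names, and the function below are hypothetical labels for the measures described above; the actual deletion, isolation, and blocking operations are not shown.

```python
from enum import Enum


class InfoType(Enum):
    TEXT = "text"
    IMAGE = "image"
    FILE = "file"
    TRAFFIC = "traffic"


# Measures per information type, following the paragraph above.
MEASURES = {
    InfoType.TEXT: ["delete", "monitor_uploads"],
    InfoType.IMAGE: ["delete", "monitor_uploads"],
    InfoType.FILE: ["isolate", "monitor_uploads"],
    InfoType.TRAFFIC: ["block_traffic", "restrict_source_access"],
}


def measures_for(info_type: InfoType) -> list:
    """Return a copy of the measures to apply to bad information of this type."""
    return list(MEASURES[info_type])
```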
A network information detection system based on big data comprises a data acquisition subsystem, a feature extraction subsystem, an analysis processing subsystem, and a processing measure subsystem. The data acquisition subsystem is used for acquiring network information data to be detected by means of network monitoring software, the network information data comprising text information data, file information data, network traffic information data, and image information data. The feature extraction subsystem is used for performing feature extraction on the network information data to be detected to obtain text features, file features, network traffic features, and image features respectively. The analysis processing subsystem is used for performing analysis processing based on the text features, the file features, the network traffic features, and the image features respectively, and for judging according to the analysis processing results whether the text information data, the file information data, and the network traffic information data are bad information. The processing measure subsystem is used for taking corresponding processing measures for the information data judged to be bad information.
The analysis processing subsystem comprises a text analysis processing module, a file analysis processing module, a network traffic analysis processing module, and an image analysis processing module. The text analysis processing module performs analysis processing according to the text features and judges whether the text information data is bad information. The file analysis processing module performs analysis processing according to the file features and judges whether the file information data is bad information. The network traffic analysis processing module performs analysis processing according to the network traffic features and judges whether the network traffic information data is bad information. The image analysis processing module performs analysis processing according to the image features and judges whether the image information data is bad information.
The invention has the following beneficial effects:
(1) The network information detection method based on big data performs multi-dimensional feature extraction on the network information data to be detected, covering features of multiple types such as text, file, network traffic, and image. This reflects the attributes and characteristics of the information more comprehensively and improves the accuracy and robustness of information detection. By comprehensively analyzing the features of the different information types, the method can judge more accurately whether information is bad information, thereby effectively curbing the spread of bad information in the network.
(2) The network information detection method based on big data not only detects network information but also monitors in real time the data judged to be bad information and takes targeted processing measures. A corresponding processing strategy is adopted for each type of bad information, such as deleting text information and image information, isolating file information, and blocking network traffic. This mechanism of real-time monitoring and targeted processing effectively prevents the continued spread of bad information and protects network security and user interests.
(3) In the network information detection system based on big data, the analysis processing subsystem is divided into text, file, network traffic, and image analysis processing modules. This realizes modular processing in which each module focuses on analyzing one specific type of information, which enhances the flexibility and extensibility of the system. The modular design allows the functions and configuration of each module to be adjusted dynamically according to actual requirements, so that the system is better suited to network information detection tasks in different scenarios.
(4) In the network information detection system based on big data, the text, file, network traffic, and image analysis processing modules are each responsible for analyzing the corresponding type of information. This refines the task so that the system detects information more efficiently, improving processing speed and accuracy. Because the processing task is decomposed across several modules that run in parallel, the system can make full use of computing resources, complete the information detection task faster, and effectively meet the processing requirements of large-scale data.
Of course, it is not necessary for any one product to practice the invention to achieve all of the advantages set forth above at the same time.
Drawings
Fig. 1 is a flowchart of a network information detection method based on big data in the present invention.
Fig. 2 is a flowchart showing specific steps for determining whether text information data is bad information in the network information detection method based on big data according to the present invention.
Fig. 3 is a flowchart showing specific steps for determining whether file information data is bad information in the network information detection method based on big data according to the present invention.
Fig. 4 is a flowchart illustrating specific steps for determining whether network traffic information data is bad information in the network information detection method based on big data according to the present invention.
Fig. 5 is a flowchart illustrating a specific step of determining whether picture information data is bad information in a network information detection method based on big data according to the present invention.
Fig. 6 is a flowchart showing specific steps for determining whether video information data is bad information in the network information detection method based on big data according to the present invention.
Fig. 7 is a block diagram of a network information detection system based on big data according to the present invention.
Fig. 8 is a block diagram of an analysis processing subsystem in a network information detection system based on big data according to the present invention.
Detailed Description
The embodiment of the application solves the problem that current network information detection methods detect only a single type of acquired information, whereas network information comes in many types, so that these methods have difficulty processing multiple types of network information data.
The general idea for solving the above problem in the embodiment of the application is as follows:
First, network information data to be detected, comprising text information data, file information data, network traffic information data, and image information data, is obtained from the network. Feature extraction is then performed on the obtained network information data to obtain text features, file features, network traffic features, and image features respectively. Next, the text information data, file information data, network traffic information data, and image information data are each analyzed based on the extracted features: a text bad coefficient is calculated from the word information; bad information judgment is performed on the file information data according to the file type and features; bad information judgment is performed on the network traffic information data according to the features of the data packets; and bad information judgment is performed on the image information data according to the features of the images. Finally, corresponding processing measures are taken for the data judged to be bad information: text information data and image information data are deleted to prevent further spread, and subsequently uploaded content is monitored in real time; file information data is handled by measures such as isolation or deletion; and for network traffic information data, network security equipment is used to block transmission, and access control measures are applied to limit access rights to network resources.
Referring to Fig. 1, an embodiment of the invention provides a network information detection method based on big data, which comprises: acquiring network information data to be detected by means of network monitoring software, the network information data comprising text information data, file information data, network traffic information data, and image information data; performing feature extraction on the network information data to be detected to obtain text features, file features, network traffic features, and image features respectively; performing analysis processing based on the text features, the file features, the network traffic features, and the image features respectively, and judging according to the analysis processing results whether the text information data, the file information data, and the network traffic information data are bad information; and taking corresponding processing measures for the information data judged to be bad information.
The text feature is all the word information data in the text information data. As shown in Fig. 2, the specific process of judging whether the text information data is bad information is as follows. The text information data is read in the form of characters and segmented, which splits the text into discrete words or tokens that serve as the basic units of the feature. Stop words are then removed from the segmented text information data; these are words, such as "is", that occur frequently in a language but carry little information. Lemmatization is performed on the text information data after stop-word removal to obtain all the word information in the text information data; lemmatization merges the different forms of a word into its base form, that is, the form under which the word appears in a dictionary, for example reducing an inflected verb form to "be". All the word information in the text information data is then matched against a bad word library, which is an information database containing all bad words, to obtain the number of bad words in the text information data. The text bad coefficient is calculated by a formula (the formula itself is not reproduced in this text) whose quantities are: the bad coefficient of the text information data; the number of bad words obtained after matching the text information data against the bad word library; the total number of words in the text information data; and the matching correction factor.
The matching correction factor is obtained as follows: word information data of a plurality of text information data are obtained; the word information data of each text information data is matched against the bad word library to obtain a measured quantity value of bad words for each text information data; the actual quantity value of bad words of each text information data is also obtained; and a comprehensive analysis based on the measured quantity values and the actual quantity values, combined with the exponential moving average method, yields the matching correction factor. In addition to being obtained by text processing techniques, the matching correction factor can thus be calculated more accurately by means of the exponential moving average method. The quantities in its calculation formula (which is not reproduced in this text) are: the measured quantity value of bad words obtained after matching an indexed text information data against the bad word library, and the corresponding actual quantity value, each defined at two index positions; the number of text information data, over which the index runs; and two proportional coefficients applied to quantities in the formula.
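Since the formula is not available here, the following sketch shows only one plausible way to combine measured and actual counts with an exponential moving average. It assumes the factor is the smoothed ratio of actual to measured bad-word counts; that form, the smoothing coefficient `alpha`, and the function name are assumptions rather than the patent's formula.

```python
def matching_correction_factor(measured: list, actual: list, alpha: float = 0.5) -> float:
    """Exponentially smooth the per-text ratio actual / measured (assumed form).

    measured[i] is the bad-word count reported by library matching for text i;
    actual[i] is the true bad-word count for the same text.
    """
    if len(measured) != len(actual) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    factor = None
    for m, a in zip(measured, actual):
        if m == 0:
            continue  # ratio is undefined when nothing was matched; skip the sample
        ratio = a / m
        factor = ratio if factor is None else alpha * ratio + (1 - alpha) * factor
    # With no usable sample, fall back to a neutral factor of 1.
    return 1.0 if factor is None else factor
```

A factor below 1 under this assumption would mean that library matching tends to over-count bad words, so the coefficient is scaled down accordingly.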
In this embodiment, the text information data is split into words or tokens that serve as the basic units of the feature, so the features of the text information are captured more comprehensively. Lemmatization merges the different forms of a word into its base form, making feature extraction more accurate and complete. This comprehensiveness and flexibility allow the system to handle text information data of different types and forms, improving the accuracy and applicability of detection. In addition, calculating a text bad coefficient and comparing it with a preset text bad coefficient threshold quantifies how bad the text information data is, and the data is judged to be bad information or not on that basis. Introducing the matching correction factor further improves accuracy: the comprehensive analysis of measured and actual quantity values reflects the matching of bad words more faithfully and reduces the likelihood of misjudgment. The method is practical and helps the network information detection system identify and filter out bad information efficiently, thereby protecting the safety and health of the network environment.
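The text branch can be sketched end to end as below. Because the coefficient formula is not reproduced in this text, the sketch assumes the simplest form consistent with the listed quantities: correction factor times bad-word count divided by total word count. The stop-word list, lemma table, bad word library, and threshold are toy, hypothetical values.

```python
import re

STOP_WORDS = {"the", "is", "a", "of", "and"}       # hypothetical stop-word list
LEMMAS = {"was": "be", "scams": "scam"}            # hypothetical lemma table
BAD_WORDS = {"scam", "fraud"}                      # hypothetical bad word library


def text_bad_coefficient(text: str, correction: float = 1.0) -> float:
    """Assumed form: correction * bad-word count / total word count."""
    tokens = re.findall(r"[a-z']+", text.lower())          # segmentation
    tokens = [t for t in tokens if t not in STOP_WORDS]    # stop-word removal
    words = [LEMMAS.get(t, t) for t in tokens]             # lemmatization
    if not words:
        return 0.0
    bad_count = sum(1 for w in words if w in BAD_WORDS)    # library matching
    return correction * bad_count / len(words)


def is_bad_text(text: str, threshold: float = 0.2, correction: float = 1.0) -> bool:
    """Text is flagged when its coefficient exceeds the preset threshold."""
    return text_bad_coefficient(text, correction) > threshold
```

A production system would replace the toy lemma table with a real lemmatizer and, for Chinese text, the regular expression with a word segmentation tool.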
The specific process of judging whether the file information data is bad information is as follows. If the file feature is a text file, the content of the text file is read into memory using a file operation function of the programming language and recorded as a character string; after the read completes, the file handle is closed, that is, the text file is closed and the system resources it occupied are released; a hash algorithm is applied to the character string to obtain a text hash value; and the text hash value is matched against the bad text hash library. If the match succeeds, the file information data is regarded as bad information. The bad text hash library is an information database containing all bad text hash values, and a bad text hash value is a specific value exceeding a preset bad text hash threshold. If the file feature is an image file, the image information in the image file is adjusted to a preset size and converted to grayscale, and the pixel mean is calculated from the image information, the image information being the pixel value information of all pixel points in the image file. All pixels larger than the mean are set to 1 and all pixels smaller than the mean are set to 0, and a character string is constructed from the result, that is, the binary values of all pixels are concatenated into one long binary character string, which is the image hash value. The image hash value is matched against the bad image hash library; if the match succeeds, the file information data is regarded as bad information. The bad image hash library is an information database containing all bad image hash values, and a bad image hash value is a value exceeding a preset bad image hash threshold. If the file feature is an audio file, the spectrogram of the audio file is extracted and converted into an audio character string, a hash algorithm is applied to the audio character string to obtain an audio hash value, and the audio hash value is matched against the bad audio hash library; if the match succeeds, the file information data is regarded as bad information. The bad audio hash library is an information database containing all bad audio hash values, and a bad audio hash value is a value exceeding a preset bad audio hash threshold. If the file feature is a binary file, the content of the binary file is read into memory using a file operation function of the programming language and recorded as a binary character string; after the read completes, the file handle is closed, that is, the binary file is closed and the system resources it occupied are released; a hash algorithm is applied to the binary character string to obtain a binary hash value; and the binary hash value is matched against the bad binary hash library. If the match succeeds, the file information data is regarded as bad information. The bad binary hash library is an information database containing all bad binary hash values, and a bad binary hash value is a value exceeding a preset bad binary hash threshold.
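The text-file and binary-file branches described above can be sketched as follows. This is a minimal illustration, not the patented implementation: SHA-256 is an assumed choice of hash algorithm, and `BAD_HASHES`, `file_hash` and `is_bad_file` are hypothetical names standing in for a library of known-bad hash values and its lookup.

```python
import hashlib

# Hypothetical stand-in for a library of known-bad hash values.
BAD_HASHES = {
    hashlib.sha256("known bad sample".encode("utf-8")).hexdigest(),
}


def file_hash(path, binary=False):
    """Read a file into memory, close its handle, and hash the content.

    SHA-256 is an assumption; the source only says "a hash algorithm".
    """
    if binary:
        # The with-statement closes the handle and releases the resource.
        with open(path, "rb") as handle:
            data = handle.read()
    else:
        with open(path, "r", encoding="utf-8") as handle:
            data = handle.read().encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def is_bad_file(path, bad_hashes, binary=False):
    """Return True when the file's hash is found in the bad hash library."""
    return file_hash(path, binary) in bad_hashes
```

Exact-match lookup in a set is the simplest reading of "matching"; a deployment that needs near-duplicate detection would require a different kind of hash.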
The pixel mean is calculated as follows: pixel points are first eliminated from the image file based on their pixel value information, and the pixel mean is then calculated from the image information of the image file after elimination. Besides being obtained through computer vision techniques, the pixel mean can be calculated more precisely with the harmonic mean method, using the formula $\bar{p} = \dfrac{n}{\sum_{i=1}^{n} \frac{1}{p_i}}$, where $\bar{p}$ is the pixel mean, $p_i$ is the pixel value information of the $i$-th pixel point in the image file after elimination, $i = 1, 2, \ldots, n$, and $n$ is the number of pixel points in the image file after elimination.
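A minimal sketch of the image hash built on a harmonic pixel mean is given below. It assumes the image has already been resized and converted to grayscale, and it assumes that the eliminated pixels are those with value 0 (the harmonic mean is undefined for zero values); the source does not state which pixels are eliminated. Pixels equal to the mean are mapped to 0 here, which is also an assumption.

```python
def harmonic_mean(values):
    """Harmonic mean n / sum(1 / p_i); every value must be positive."""
    if not values:
        raise ValueError("no pixels left after elimination")
    return len(values) / sum(1.0 / v for v in values)


def image_hash(gray):
    """Build a binary-string hash from a 2D list of grayscale pixel values."""
    pixels = [p for row in gray for p in row]
    # Assumption: zero-valued pixels are the ones eliminated before averaging.
    kept = [p for p in pixels if p > 0]
    mean = harmonic_mean(kept)
    # Pixels above the mean become "1"; all others (including ties) become "0".
    return "".join("1" if p > mean else "0" for p in pixels)
```

For the 2x2 image `[[1, 2], [4, 4]]` the harmonic mean is 4 / (1 + 1/2 + 1/4 + 1/4) = 2, so the resulting hash is `"0011"`.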
The specific process of extracting the spectrogram from the audio file is as follows. The audio signal is divided into short-time frames, usually with overlapping windows, each window being, for example, 20 to 30 milliseconds long. A window function, such as a Hanning window or a Hamming window, is applied to each frame to reduce boundary effects. A Fourier transform is applied to the audio signal in each window to convert the time-domain signal into a frequency-domain signal. The frequency-domain signal in each window is squared to obtain the energy spectrum, and the energy spectra are then summed and the logarithm is taken to obtain the power spectral density of each time segment. The power spectral density is rendered as an image, namely the spectrogram, in which the horizontal axis is time, the vertical axis is frequency, and the color depth represents the power value.
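The framing, windowing, transform and log-power steps can be sketched in pure Python as below. This is an illustrative sketch only: it uses a naive discrete Fourier transform for clarity (a real system would use an FFT library), it keeps one log-power value per frequency bin rather than summing across bins, and `frame_len` and `hop` are hypothetical parameter names. For a 20 to 30 millisecond window, `frame_len` would be roughly 0.025 times the sample rate.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann (Hanning) window of length n, n >= 2."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft(frame):
    """Naive DFT returning the non-negative frequency bins 0..n/2."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal, frame_len, hop):
    """Return rows of log power: one row per time frame, one column per bin."""
    if frame_len < 2:
        raise ValueError("frame_len must be at least 2")
    window = hann(frame_len)
    rows = []
    # Overlapping frames: hop < frame_len makes consecutive windows overlap.
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        power = [abs(c) ** 2 for c in dft(frame)]  # squared magnitude
        rows.append([10 * math.log10(p + 1e-12) for p in power])
    return rows
```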
The specific steps of converting the spectrogram into a character string are as follows. The spectrogram is first compressed with a chosen compression algorithm to reduce the data volume; common choices include the Fourier transform, the wavelet transform and the like. The compressed data is then encoded to obtain a character string representation. The encoding may convert each data point into a character or convert a segment of data into a substring; all the encoded strings are then concatenated into one long character string, which is the character string representation of the spectrogram.
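One possible reading of the compress-then-encode step is sketched below. Block averaging is used here as a stand-in for the compression step and 16-level quantization to hexadecimal characters as the encoding; both are assumptions chosen for brevity, not the transforms named in the text. SHA-256 is likewise an assumed hash algorithm, and the function names are hypothetical.

```python
import hashlib


def spectrogram_to_string(spec, block=2):
    """Encode a spectrogram (list of rows of floats) as a hex-character string."""
    flat = [v for row in spec for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant spectrogram
    chars = []
    for row in spec:
        for i in range(0, len(row), block):
            chunk = row[i:i + block]
            avg = sum(chunk) / len(chunk)  # "compression": average each block
            level = min(15, int((avg - lo) / span * 16))  # quantize to 0..15
            chars.append(format(level, "x"))  # one hex character per block
    return "".join(chars)


def audio_hash(spec):
    """Hash the string form of the spectrogram (SHA-256 is an assumption)."""
    return hashlib.sha256(spectrogram_to_string(spec).encode("ascii")).hexdigest()
```

Because a cryptographic hash changes completely on any small difference in the string, this scheme only matches audio whose quantized spectrograms are identical.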
In this embodiment, different types of files, such as text files, image files, audio files and binary files, are identified, so the method applies to information data in many forms, which improves the applicability and diversity of the system. Because each type of file is processed in its own way, the system can effectively deal with different types of bad information and provides all-round protection for network security. A targeted processing method is used for each file type: hash calculation for text files, gray processing and pixel mean calculation for image files, spectrogram extraction for audio files, and hash calculation for binary files. File features are therefore extracted more accurately, and matching against the bad hash libraries reliably determines whether a file is bad information, which reduces the possibility of misjudgment and improves the accuracy and reliability of detection. The judgment is carried out by a computer program that processes files automatically, which improves processing efficiency and practicability. Through the file operation functions of the programming language, file content can be quickly read into memory and then processed and computed, so file information is analyzed and judged rapidly. This efficiency and practicability allow the system to identify and filter out bad information in time, ensuring the safety and health of the network environment.
Specifically, as shown in fig. 4, the network traffic features are the byte count of the data packets, the data transmission rate, the source IP address and destination IP address of the data packets, and the source port number and destination port number of the data packets in the network traffic information data. The specific process of judging whether the network traffic information data is bad information is as follows. The extracted byte count, data transmission rate, source and destination IP addresses, and source and destination port numbers are normalized, and the normalized values are comprehensively analyzed to obtain the network traffic characteristic value. For normalization, an IPv4 address is split into four bytes and each byte value is divided by 255, giving a value in the range [0, 1]; IPv6 addresses are normalized similarly; and a port number is divided by 65535, likewise giving a value in the range [0, 1]. The network traffic characteristic value is then compared with a preset network traffic characteristic threshold; if it exceeds the threshold, the network traffic information data is regarded as bad information.
Besides being obtained with a network packet capture tool such as Wireshark, the network traffic characteristic value can be calculated more precisely with the weighted average method, using the formula $F = \dfrac{\alpha_1 B + \alpha_2 V + \alpha_3 S_{ip} + \alpha_4 D_{ip} + \alpha_5 S_{p} + \alpha_6 D_{p}}{\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 + \alpha_5 + \alpha_6}$, where $F$ is the network traffic characteristic value, $B$ is the normalized byte count of the data packet, $V$ is the normalized data transmission rate, $S_{ip}$ is the normalized source IP address value of the data packet, $D_{ip}$ is the normalized destination IP address value of the data packet, $S_{p}$ is the normalized source port number value of the data packet, $D_{p}$ is the normalized destination port number value of the data packet, and $\alpha_1$, $\alpha_2$, $\alpha_3$, $\alpha_4$, $\alpha_5$ and $\alpha_6$ are the proportion coefficients of $B$, $V$, $S_{ip}$, $D_{ip}$, $S_{p}$ and $D_{p}$ respectively.
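The normalization and weighted average described above can be sketched as follows. Several details are assumptions made only for illustration: the per-byte values of an IP address are averaged into a single number, the byte count and rate are scaled by hypothetical upper bounds `max_bytes` and `max_rate`, and the weights are divided by their sum so that the result stays in [0, 1].

```python
import ipaddress


def normalize_ip(addr):
    """Divide each address byte by 255, then average (averaging is assumed)."""
    packed = ipaddress.ip_address(addr).packed  # 4 bytes IPv4, 16 bytes IPv6
    return sum(b / 255 for b in packed) / len(packed)


def scale(value, upper):
    """Scale a non-negative value into [0, 1] against an assumed upper bound."""
    return min(max(value / upper, 0.0), 1.0)


def traffic_feature_value(byte_count, rate, src_ip, dst_ip, src_port, dst_port,
                          weights, max_bytes=65535, max_rate=1e9):
    """Weighted average of the six normalized traffic features."""
    features = [
        scale(byte_count, max_bytes),
        scale(rate, max_rate),
        normalize_ip(src_ip),
        normalize_ip(dst_ip),
        src_port / 65535,
        dst_port / 65535,
    ]
    return sum(w * x for w, x in zip(weights, features)) / sum(weights)


def is_bad_traffic(value, threshold):
    """Traffic is treated as bad when the value exceeds the preset threshold."""
    return value > threshold
```

With equal weights, the features `[1, 0, 1, 0, 1, 0]` give a characteristic value of 0.5, which a threshold of 0.4 would flag and a threshold of 0.6 would not.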
In this embodiment, the network traffic is comprehensively analyzed by extracting multiple features of the network traffic data, such as the byte count of the data packets, the transmission rate, the source IP address, the destination IP address, the source port number and the destination port number. This comprehensive feature extraction gives a fuller picture of the network traffic, helps to find and identify bad information, and improves the accuracy and reliability of detection. Normalization puts the network traffic features on a common scale, which facilitates comprehensive analysis and comparison: by mapping data such as IPv4 addresses and port numbers into the range 0 to 1, the order-of-magnitude differences between feature values are eliminated, so the comparison of feature values is fairer and more accurate and the comparability and reliability of detection are improved. The approach is flexible and can be adjusted and optimized according to the characteristics of the network traffic. Meanwhile, by setting the preset network traffic characteristic threshold, different types of network traffic can be detected in a customized way according to the specific situation, meeting the security requirements of different scenarios and improving the applicability and flexibility of the system. The judgment of whether the network traffic information data is bad information is carried out by automatically processing the network traffic data, which improves processing efficiency and practicability. The network traffic features are automatically extracted and normalized by a computer program, so the network traffic data can be analyzed quickly and accurately, abnormal conditions can be found in time, and corresponding measures can be taken. This efficiency and practicability enable the system to respond to network security threats in time and ensure the safety and stability of the network environment.
Specifically, as shown in fig. 5, the image information data comprises picture information data and video information data, and the image features comprise picture features extracted from the picture information data and video features extracted from the video information data. The picture features are the color histogram, edge features, texture features and depth features of the picture information data; the video features are the color histogram, edge features, texture features and depth features of the picture information data of each key frame in the video information data. The specific process of judging whether the picture information data is bad information is as follows. The extracted color histogram, edge features, texture features and depth features of the picture information data are fused by weighted summation to obtain a picture feature comprehensive vector, which describes the comprehensive features of the picture information data. The picture feature comprehensive vector is read and input into a classification model for classification, and whether the picture information data is bad information is judged from the classification result. The classification result is either normal or bad; if it is bad, the picture information data is regarded as bad information.
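Feature fusion by weighted summation can be sketched as below. The sketch assumes the four feature vectors have already been brought to the same length (the source does not say how features of different sizes are aligned), and the function name and weights are hypothetical.

```python
def fuse_features(features, weights):
    """Element-wise weighted sum of equally long feature vectors."""
    if len(features) != len(weights):
        raise ValueError("one weight is required per feature vector")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all feature vectors must have the same length")
    return [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(length)
    ]


# Example call with made-up two-element feature vectors:
# color histogram, edge, texture and depth features, in that order.
fused = fuse_features(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]],
    [0.4, 0.3, 0.2, 0.1],
)
```

The resulting list plays the role of the picture feature comprehensive vector that is passed on to the classification model.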
The specific process of extracting the picture features is that the picture information data is input into a picture feature extraction model to perform feature extraction, and a color histogram, edge features, texture features and depth features of the picture information data are obtained.
The specific training process of the picture feature extraction model is as follows. A plurality of picture information data to be extracted is acquired and a picture information data set is established. The picture information data set is divided into a picture information training set and a picture information verification set. The picture information training set is read to train the picture feature extraction model, and a picture loss function is calculated. Model performance is evaluated on the picture information verification set, and the model parameters are adjusted according to the verification results until the color histogram, edge features, texture features and depth features predicted by the model agree with the actual color histogram, edge features, texture features and depth features. The input data of the picture feature extraction model is the picture information data to be extracted, and its output data is the color histogram, edge features, texture features and depth features of that picture information data. The picture feature extraction model comprises: an input layer for receiving the picture information data to be extracted; a plurality of convolution layers for extracting features from the picture information data, each convolution layer comprising a filter and an activation function, the activation function introducing nonlinearity and increasing the expressive capacity of the model; pooling layers for preserving important information while reducing the amount of computation; a fully connected layer for mapping the features extracted by the convolution layers onto the final outputs; an output layer for outputting the color histogram, edge features, texture features and depth features finally predicted by the model; a loss function for measuring the difference between the model output and the actual color histogram, edge features, texture features and depth features; and an optimizer for adjusting the model parameters to minimize the loss function.
The training process of the classification model is as follows. A plurality of picture feature comprehensive vectors is acquired and a picture feature comprehensive vector data set is established, each picture feature comprehensive vector carrying a label, namely its picture category. The data set is divided into a picture feature training set and a picture feature verification set. The picture feature training set is read to train the classification model, and a picture feature loss function is calculated. Model performance is evaluated on the picture feature verification set, and the model parameters are adjusted according to the verification results until the label predicted by the model for a picture feature comprehensive vector agrees with its actual label. The gradient of the loss function is propagated back through the model by the back propagation algorithm to update the model parameters; training generally comprises a plurality of iteration cycles, each consisting of a forward propagation stage and a back propagation stage. The classification model comprises: an input layer for receiving the feature vector, each element of which is an input node; a plurality of hidden layers for applying nonlinear transformations to the input features and extracting features, each hidden layer comprising a plurality of neurons, each neuron being connected to its inputs through weights and having a bias; an output layer for outputting the predicted category, which is either normal or bad; a loss function for measuring the difference between the category predicted by the model and the actual label; and an optimizer for adjusting the model parameters to minimize the loss function.
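The forward propagation and back propagation cycle can be illustrated with a very small classifier. The sketch below assumes a single hidden layer with sigmoid activations, a cross-entropy loss and plain gradient descent as the optimizer; the source does not fix any of these choices, and the two-element training vectors are made-up stand-ins for picture feature comprehensive vectors (label 1 meaning bad, 0 meaning normal).

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, b1, w2, b2):
    """Forward propagation: hidden activations and output probability."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    prob = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return hidden, prob


def train(samples, labels, n_hidden=4, epochs=2000, lr=0.5, seed=0):
    """Train with per-sample gradient descent; return (predict, losses)."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in zip(samples, labels):
            hidden, prob = forward(x, w1, b1, w2, b2)
            # Cross-entropy loss for this sample.
            total -= y * math.log(prob + 1e-12) + (1 - y) * math.log(1 - prob + 1e-12)
            # Back propagation: gradients at the output and hidden layers.
            d_out = prob - y
            d_hid = [d_out * w2[j] * hidden[j] * (1 - hidden[j])
                     for j in range(n_hidden)]
            for j in range(n_hidden):
                w2[j] -= lr * d_out * hidden[j]
                b1[j] -= lr * d_hid[j]
                for i in range(n_in):
                    w1[j][i] -= lr * d_hid[j] * x[i]
            b2 -= lr * d_out
        losses.append(total / len(samples))

    def predict(x):
        return 1 if forward(x, w1, b1, w2, b2)[1] >= 0.5 else 0

    return predict, losses


# Made-up training data: label 1 = bad, label 0 = normal.
predict, losses = train([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]],
                        [0, 0, 1, 1])
```

A real system would hold out a verification set, as the text describes, and would typically rely on a deep learning framework rather than hand-written gradients.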
The color histogram is statistical information of pixel color distribution in an image, divides a color space in the image into a plurality of areas, counts the number of pixels in each area, and represents the result as a histogram, and the color histogram can reflect the distribution condition of different colors in the image and is helpful for describing the color characteristics of the image.
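A color histogram of this kind can be sketched as follows, assuming 8-bit RGB pixels and an illustrative choice of four intervals per channel (64 color regions in total); the bin layout is an assumption, not taken from the source.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Count pixels per color region; pixels are (r, g, b) tuples in 0..255."""
    step = 256 // bins_per_channel
    last = bins_per_channel - 1
    hist = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each channel to its interval, clamping to the last interval.
        ri = min(last, r // step)
        gi = min(last, g // step)
        bi = min(last, b // step)
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1
    return hist
```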
Edge features describe edge information in an image, i.e., regions where the gray values change sharply. Common edge detection algorithms include Sobel and Canny, which detect the positions and directions of edges in the image; the extracted edge information describes the contour and shape features of the image.
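As an illustration of the Sobel operator mentioned above, the sketch below computes the gradient magnitude of a small grayscale image given as a 2D list. Border pixels are left at zero for simplicity, which is a choice made for this sketch.

```python
import math

KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient kernel
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient kernel


def sobel_magnitude(gray):
    """Return the Sobel gradient magnitude for each interior pixel."""
    h, w = len(gray), len(gray[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(KX[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(KY[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            mag[y][x] = math.hypot(gx, gy)
    return mag
```

For an image whose left half is 0 and right half is 10, the interior pixels next to the vertical boundary get a magnitude of 40, while a uniform image gives 0 everywhere.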
Texture features describe texture information in an image, i.e., the spatial distribution of color and gray level in local regions of the image. Commonly used texture descriptors include local binary patterns (LBP) and histograms of oriented gradients (HOG), which can describe the coarseness, direction and frequency of the image texture.
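The local binary pattern descriptor can be sketched as follows for the basic 8-neighbour case. The neighbour ordering and the "greater than or equal" comparison are conventional choices assumed here.

```python
# Clockwise 8-neighbourhood offsets (dy, dx), starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(gray, y, x):
    """8-bit LBP code: a bit is set when the neighbour >= the centre pixel."""
    centre = gray[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(OFFSETS):
        if gray[y + dy][x + dx] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(gray):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            hist[lbp_code(gray, y, x)] += 1
    return hist
```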
Depth features (deep features) are high-level semantic features learned from images by a deep learning model. They have strong representation capability and capture abstract information in the image, such as the objects and scenes it contains. Common deep learning models include VGG, ResNet and Inception, and the convolution layers in these models can extract the depth features of the image.
In this embodiment, multi-dimensional features of the picture information data, such as the color histogram, edge features, texture features and depth features, are extracted and comprehensively analyzed through feature fusion and the classification model, so the image is described more comprehensively and the accuracy of identifying bad information is improved. The trained model distinguishes normal pictures from bad pictures, so bad pictures are detected and filtered efficiently. The approach applies not only to static picture data but also to the key frames in video information data; different features are extracted for different types of image data and classified with the trained model, so image data in many forms can be handled, giving high flexibility and applicability and meeting diverse network security requirements. Building the picture feature extraction model and the classification model makes the processing automatic, so large amounts of image data can be processed quickly and bad information can be found and filtered in time, effectively protecting the safety of the network environment. By continuously collecting and labeling picture information data and training and verifying the models, model performance can be continuously optimized to adapt to the dynamic changes of the network environment and to evolving security threats. This continuous optimization gives the model stronger robustness and adaptability, so it maintains a stable detection effect in complex network environments. Identifying and filtering bad picture information in time helps prevent the security threats that bad information poses to the network environment, thereby raising the overall security level of the network and protecting the legitimate rights and interests of users and the healthy development of cyberspace.
Specifically, as shown in fig. 6, the video features are the color histogram, edge features, texture features and depth features of the picture information data of each key frame in the video information data. The specific process of judging whether the video information data is bad information is as follows. The color histogram, edge features, texture features and depth features extracted from the picture information data of each key frame in the video information data are read and input into the classification model for classification, giving the category of the picture information data of each key frame. Whether any key frame has picture information data of the bad category is then determined; if such a key frame exists, the video information data is judged to be bad information.
The specific extraction process of the key frames is as follows. The video information data is read and split to obtain the picture information data of all its frames. Based on the picture information data of all frames, the color histogram of the picture information data of each frame is computed; each color histogram comprises a plurality of intervals, each interval represents a color range, and the number of pixel points in the image falling into each color range is counted. Based on the color histograms of the picture information data of each frame, the similarity of the picture information data between frames is calculated, and the frame numbers of the picture information data whose similarity exceeds a similarity threshold are taken as the key frames. Besides being obtained through image processing and computer vision techniques, the similarity can be calculated more precisely by a formula over the following quantities: $S_{i,j}$ is the similarity between the picture information data of the $i$-th frame and the picture information data of the $j$-th frame; $h_i(k)$ is the statistic of the $k$-th color interval of the color histogram in the picture information data of the $i$-th frame, namely the number of pixel points in that color interval; $h_j(k)$ is the statistic of the $k$-th color interval of the color histogram in the picture information data of the $j$-th frame; $i, j = 1, 2, \ldots, N$, where $N$ is the number of frames of the video information data; and $k = 1, 2, \ldots, K$, where $K$ is the number of color intervals of the color histogram in the picture information data of each frame.
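Key frame selection from per-frame color histograms can be sketched as below. The similarity formula of the source is not reproduced here; cosine similarity between histograms is an assumed stand-in, and comparing each frame only with its predecessor is also an assumption. The selection rule follows the wording above (similarity exceeding the threshold); pipelines that instead look for scene changes would flip the comparison. The last helper applies the rule that a video is bad if any key frame is classified as bad.

```python
import math


def cosine_similarity(h1, h2):
    """Cosine similarity of two histograms (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(h1, h2))
    norm = math.sqrt(sum(a * a for a in h1)) * math.sqrt(sum(b * b for b in h2))
    return dot / norm if norm else 0.0


def select_key_frames(histograms, threshold):
    """Return indices of frames whose similarity to the previous frame
    exceeds the threshold, following the rule as worded in the text."""
    return [
        j for j in range(1, len(histograms))
        if cosine_similarity(histograms[j - 1], histograms[j]) > threshold
    ]


def video_is_bad(key_frame_labels):
    """A video is judged bad when any key frame is classified as 'bad'."""
    return any(label == "bad" for label in key_frame_labels)
```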
In this embodiment, multi-dimensional features of the picture information data of each key frame, such as the color histogram, edge features, texture features and depth features, are extracted and classified by the classification model, so the video content is comprehensively analyzed and the accuracy of identifying bad information is improved. Compared with analyzing the video as a whole, performing feature extraction and classification on the key frames captures bad content in the video more precisely. Extracting key frames also greatly reduces the cost of processing the video content, because the complex computation of analyzing the whole video frame by frame is avoided. Feature extraction and classification on the key frames therefore improves processing efficiency while maintaining detection accuracy, saving computing resources and time. Building the classification model and the key frame extraction algorithm realizes automatic processing and intelligent analysis of video content. This automatic processing handles large amounts of video data quickly, finds and filters bad information in time, and raises processing efficiency and the level of intelligence. Identifying and filtering bad information in video content in time helps prevent the security threats that bad information poses to the network environment, thereby raising the overall security level of the network and protecting the legitimate rights and interests of users and the healthy development of cyberspace.
The specific processing measures are as follows. If the information data judged to be bad information is text information data or image information data, the text information data and image information data containing the bad information are deleted to prevent further propagation, and the text content and image content subsequently uploaded by the user are monitored in real time so that bad information is found and handled promptly. If it is file information data, the file containing the bad information is isolated to prevent further propagation, and the file content subsequently uploaded by the user is monitored in real time so that bad files are found and handled promptly. If it is network traffic information data, network security devices such as firewalls and intrusion detection systems are used to block transmission of the bad traffic, and access control measures are applied to the source of the bad traffic to restrict its access to network resources.
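The type-dependent handling can be sketched as a simple dispatch table. The enum members and measure names below are hypothetical labels chosen for illustration; the actual deletion, isolation and blocking would be carried out by the surrounding system and by network security devices.

```python
from enum import Enum


class DataType(Enum):
    TEXT = "text"
    IMAGE = "image"
    FILE = "file"
    TRAFFIC = "traffic"


# Measure per data type, following the text: delete text and images,
# isolate files, block traffic and restrict its source.
MEASURES = {
    DataType.TEXT: ["delete", "monitor_uploads"],
    DataType.IMAGE: ["delete", "monitor_uploads"],
    DataType.FILE: ["isolate", "monitor_uploads"],
    DataType.TRAFFIC: ["block", "restrict_source_access"],
}


def measures_for(data_type):
    """Return the list of handling measures for bad data of the given type."""
    return MEASURES[data_type]
```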
In this embodiment, timely and effective measures are taken against text data and image data containing bad information to prevent its further propagation. By monitoring in real time the text content and image content subsequently uploaded by users, bad information can be found and handled promptly, which stops it from spreading, protects users from bad content, and maintains the health and order of the network environment. For files containing bad information, isolation is an effective measure: it prevents the bad information from propagating without affecting the use of other legitimate content. Monitoring in real time the file content subsequently uploaded by users allows bad files to be found and handled promptly, protecting the security and stability of the network environment. For bad traffic, network security devices such as firewalls and intrusion detection systems block and filter the traffic, and access control measures applied to its source restrict that source's access to network resources, which effectively prevents the propagation of bad information and attacks on the network and maintains the stability and security of the network. Taken together, deleting, isolating or blocking bad information according to its type effectively prevents it from spreading in the network environment, protects users from it, and improves the overall security of the network.
Referring to fig. 7, an embodiment of the invention provides a technical scheme: a network information detection system based on big data, comprising a data acquisition subsystem, a feature extraction subsystem, an analysis processing subsystem and a processing measure subsystem. The data acquisition subsystem is used for acquiring network information data to be detected based on network monitoring software, the network information data comprising text information data, file information data, network traffic information data and image information data. Specifically, it collects text information data from the network, including text content on social media, news articles and the like; collects file information data from the network, including various types of files such as text files, image files, audio files and binary files; captures and records network traffic data, including the byte count of data packets, transmission rate, IP addresses, port numbers and the like; and collects image information data from the network, including pictures and videos. The feature extraction subsystem is used for performing feature extraction on the network information data to be detected to obtain text features, file features, network traffic features and image features respectively. Specifically, it extracts vocabulary information and other relevant features from the text information data; extracts the features corresponding to the file type, such as the text hash value, image hash value and audio hash value, from the file information data; extracts the relevant features of the data packets, such as byte count, transmission rate and IP addresses, from the network traffic information data; and extracts the color histogram, edge features, texture features and depth features from the image information data. The analysis processing subsystem is used for performing analysis processing according to the extracted features and judging whether the corresponding information data is bad information. The processing measure subsystem is used for taking corresponding processing measures on the information data judged to be bad information. Specifically, it takes measures such as deleting and shielding for text information data judged to be bad information, measures such as isolating and deleting for file information data judged to be bad information, measures such as traffic filtering and access control for network traffic information data judged to be bad information, and measures such as deleting and reviewing for image information data judged to be bad information.
Specifically, as shown in fig. 8, the analysis processing subsystem comprises a text analysis processing module, a file analysis processing module, a network flow analysis processing module and an image analysis processing module, wherein the text analysis processing module is used for performing analysis processing according to text characteristics and judging whether text information data is bad information, the file analysis processing module is used for performing analysis processing according to file characteristics and judging whether the file information data is bad information, the network flow analysis processing module is used for performing analysis processing according to the network flow characteristics and judging whether the network flow information data is bad information, and the image analysis processing module is used for performing analysis processing according to image characteristics and judging whether the image information data is bad information.
In this embodiment, the text analysis processing module focuses on analyzing and processing text information data. By analyzing text features it identifies keywords and semantic information in the text and then judges whether the text information data is bad information; for example, word frequency statistics and semantic analysis can reveal sensitive words or bad content in the text so that corresponding measures are taken in time. The file analysis processing module processes file information data. By extracting and analyzing file features it detects abnormal content or malicious code in files and then judges whether the file information data is bad information; for example, features such as file format, file size and file structure allow potentially bad files to be identified quickly, protecting the safety of the system and its users. The network flow analysis processing module analyzes and processes network traffic information data. By analyzing network traffic features it detects abnormal network behavior and traffic data and identifies whether bad information is being transmitted; for example, monitoring and analyzing network traffic can reveal abnormal data packets, malicious network connections and the like, so that defensive measures are taken in time to protect the safety and stability of the network. The image analysis processing module processes image information data. By extracting and analyzing image features it identifies abnormal content or bad information in the image and then judges whether the image information data is bad information; for example, bad content in an image can be detected from features such as color, texture and shape and handled in time. With this refined analysis and processing of different types of information data, the system identifies and handles bad information more accurately and effectively protects the data security of users and the stability of the network environment. Finding and handling bad information in time purifies the network environment, improves the user experience on the network and strengthens users' trust in the system. Monitoring and analyzing network traffic allows malicious attacks and network intrusions to be found and stopped in time, raising the overall security level of the network and ensuring the security of the network system and user data.
In summary, the present application has at least the following effects:
Multi-dimensional feature extraction is performed on the network information data to be detected, covering text, file, network flow and image features, so the attributes and characteristics of the information are reflected more comprehensively and the accuracy and robustness of information detection are improved. By comprehensively analyzing the features of the different types of information, the method judges more accurately whether information is bad information, which effectively restrains the propagation of bad information in the network.
While the network information is being detected, data judged to be bad information can be monitored in real time and handled with targeted processing measures. A corresponding processing strategy is adopted for each type of bad information, such as deleting text information and image information, isolating file information, and blocking network traffic. This mechanism of real-time monitoring and targeted processing effectively prevents the continued propagation of bad information and protects network safety and user interests.
Dividing the analysis processing subsystem into text, file, network flow and image analysis processing modules achieves modular processing: each module focuses on the analysis processing of one specific type of information, which enhances the flexibility and expandability of the system. The modular design lets the system dynamically adjust the functions and configuration of each module according to actual requirements, so it adapts better to network information detection tasks in different scenarios.
Because the text, file, network flow and image analysis processing modules are each responsible for one type of information, the task is processed at a finer granularity, so the system detects information more efficiently, with higher processing speed and accuracy. Decomposing the processing task into several modules that run in parallel lets the system make full use of computing resources, complete information detection tasks faster, and meet the processing demands of large-scale data.
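The modular, parallel arrangement described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the four analyzer functions, their placeholder rules, the `ANALYZERS` registry and `detect_all` are hypothetical names that do not appear in the application.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict


# Hypothetical placeholder analyzers; each returns True when its input is judged bad.
def analyze_text(text: str) -> bool:
    return "badword" in text.lower()


def analyze_file(content: bytes) -> bool:
    return content.startswith(b"MALWARE")


def analyze_traffic(score: float) -> bool:
    return score > 0.8


def analyze_image(mean_brightness: float) -> bool:
    return mean_brightness < 0.1


# Registry keyed by data type, so a module can be added or swapped independently.
ANALYZERS: Dict[str, Callable[[Any], bool]] = {
    "text": analyze_text,
    "file": analyze_file,
    "traffic": analyze_traffic,
    "image": analyze_image,
}


def detect_all(batch: Dict[str, Any]) -> Dict[str, bool]:
    """Submit each item to the analyzer for its data type and collect the verdicts."""
    with ThreadPoolExecutor(max_workers=len(ANALYZERS)) as pool:
        futures = {kind: pool.submit(ANALYZERS[kind], data) for kind, data in batch.items()}
        return {kind: future.result() for kind, future in futures.items()}
```

A thread pool is used here only for brevity; for CPU-bound analysis in CPython a process pool would be the more usual choice.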
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (6)

1. The network information detection method based on big data is characterized by comprising the following steps:
acquiring network information data to be detected, wherein the network information data comprises text information data, file information data, network flow information data and image information data;
Extracting characteristics of network information data to be detected to respectively obtain text characteristics, file characteristics, network flow characteristics and image characteristics;
Respectively analyzing and processing based on the text characteristics, the file characteristics, the network flow characteristics and the image characteristics, and respectively judging whether the text information data, the file information data and the network flow information data are bad information according to analysis and processing results;
and corresponding processing measures are adopted for the corresponding information data which is judged to be the bad information;
The text features are all word information data in the text information data, and the specific process for judging whether the text information data is bad information is as follows:
reading the text information data and segmenting it into words;
removing stop words from the segmented text information data;
performing lemmatization on the text information data from which the stop words have been removed to obtain all word information in the text information data;
matching all word information in the text information data against a bad word library to obtain the number of bad words in the text information data;
calculating a bad coefficient of the text information data based on all word information and the number of bad words of the text information data, and comparing the bad coefficient of the text information data with a preset bad coefficient threshold;
if the bad coefficient of the text information data is higher than the preset bad coefficient threshold, regarding the text information data as bad information;
the specific calculation formula of the bad coefficient is as follows:
[formula not reproduced in this text];
wherein the formula relates the following quantities: the bad coefficient of the text information data; the number of bad words obtained after the text information data is matched against the bad word library; the total number of words in the text information data; and the matching correction factor;
acquiring word information data of a plurality of text information data, and matching the word information data of each text information data against the bad word library to obtain a measured quantity value of bad words for each text information data;
simultaneously acquiring the actual quantity value of bad words of each text information data whose word information data has been matched against the bad word library;
comprehensively analyzing the measured quantity values and the actual quantity values of the bad words of each text information data in combination with an exponential moving average method to obtain the matching correction factor;
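The text judgment steps above could be sketched as follows. The calculation formula itself does not survive in this text, so the sketch assumes the simplest form consistent with the listed quantities: coefficient = correction factor × number of bad words / total number of words. The stop word list, the toy `lemmatize` rule, the `smoothing` parameter and the exact moving-average update are likewise assumptions made only for illustration.

```python
import re
from typing import List, Sequence, Set

STOP_WORDS: Set[str] = {"the", "a", "an", "is", "of", "and", "to"}  # illustrative list


def lemmatize(word: str) -> str:
    """Toy stand-in for lemmatization: strips a trailing plural 's'."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def extract_words(text: str) -> List[str]:
    """Segment the text, drop stop words, then lemmatize the remaining words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [lemmatize(t) for t in tokens if t not in STOP_WORDS]


def bad_coefficient(words: Sequence[str], bad_words: Set[str], correction: float = 1.0) -> float:
    """Assumed form: correction * (bad word count / total word count)."""
    if not words:
        return 0.0
    n_bad = sum(1 for w in words if w in bad_words)
    return correction * n_bad / len(words)


def correction_factor(measured: Sequence[int], actual: Sequence[int], smoothing: float = 0.5) -> float:
    """Exponential moving average of the actual/measured ratio over sample texts."""
    factor = 1.0
    for m, a in zip(measured, actual):
        if m > 0:  # skip samples where nothing was matched
            factor = smoothing * (a / m) + (1 - smoothing) * factor
    return factor


def is_bad_text(text: str, bad_words: Set[str], threshold: float, correction: float = 1.0) -> bool:
    """Text is regarded as bad when its coefficient exceeds the preset threshold."""
    return bad_coefficient(extract_words(text), bad_words, correction) > threshold
```

For instance, `is_bad_text("The scams and the spam is a scam", {"scam"}, threshold=0.5)` evaluates two bad words out of three retained words.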
The file characteristics are specifically file types in file information data, the file types comprise text files, image files, audio files and binary files, and the specific process for judging whether the file information data is bad information is as follows:
if the file feature is a text file, reading the content of the text file into memory based on a file operation function of a programming language, recording the content as a character string, closing the file handle after the file reading is completed and releasing resources, carrying out hash calculation on the character string based on a hash algorithm to obtain a text hash value, matching the text hash value against a bad text hash library, and if the matching is successful, regarding the file information data as bad information;
if the file feature is an image file, adjusting the image information in the image file to a preset size, carrying out grayscale processing on the adjusted image information, calculating a pixel mean value based on the image information, constructing a character string based on the pixel mean value, namely an image hash value, matching the image hash value against a bad image hash library, and if the matching is successful, regarding the file information data as bad information;
if the file feature is an audio file, extracting a spectrogram from the audio file, converting the spectrogram into an audio character string, carrying out hash calculation on the audio character string based on a hash algorithm to obtain an audio hash value, matching the audio hash value against a bad audio hash library, and if the matching is successful, regarding the file information data as bad information;
if the file feature is a binary file, reading the content of the binary file into memory based on a file operation function of a programming language, recording the content as a character string, closing the file handle after the file reading is completed and releasing resources, carrying out hash calculation on the character string based on a hash algorithm to obtain a binary hash value, matching the binary hash value against a bad binary hash library, and if the matching is successful, regarding the file information data as bad information;
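A minimal sketch of the hash-matching branches above, under stated assumptions: the claim does not name a hash algorithm, so SHA-256 is assumed for the text and binary branches; the image branch is illustrated with a simple average hash over a grayscale pixel grid that is taken to be already resized; the audio branch is omitted. The function names are hypothetical.

```python
import hashlib
from typing import List, Set


def hash_bytes(data: bytes) -> str:
    """Hex digest of the content (SHA-256 is an assumed choice of algorithm)."""
    return hashlib.sha256(data).hexdigest()


def file_hash(path: str) -> str:
    """Read a text or binary file into memory and hash it.

    The 'with' block closes the file handle and releases it after reading.
    """
    with open(path, "rb") as handle:
        data = handle.read()
    return hash_bytes(data)


def average_hash(gray: List[List[int]]) -> str:
    """Build a bit string from a grayscale grid: '1' where a pixel >= the pixel mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return "".join("1" if p >= mean else "0" for p in pixels)


def is_bad_file(digest: str, bad_hash_library: Set[str]) -> bool:
    """A successful match against the bad hash library marks the file as bad."""
    return digest in bad_hash_library
```

Exact matching of a cryptographic digest only catches byte-identical files, whereas the average hash tolerates small image changes if compared by bit distance rather than equality; which comparison the application intends is not stated.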
The network flow characteristics are specifically the byte number of a data packet in the network flow information data, the data transmission speed, the source IP address and the target IP address of the data packet, the source port number and the target port number of the data packet, and the specific process of judging whether the network flow information data is bad information is as follows:
Normalizing based on the byte number of the extracted data packet, the data transmission speed, the source IP address and the target IP address of the data packet, the source port number and the target port number of the data packet;
Based on byte number of the data packet after normalization processing, data transmission speed, source IP address and destination IP address of the data packet, source port number and destination port number of the data packet, carrying out comprehensive analysis and calculation to obtain a network flow characteristic value;
And comparing the network flow characteristic value with a preset network flow characteristic threshold value, and if the network flow characteristic value is higher than the preset network flow characteristic threshold value, regarding the network flow information data as bad information.
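The network flow branch could be sketched as follows. The claim leaves the "comprehensive analysis and calculation" unspecified, so this sketch assumes a weighted sum of normalized features; the `Flow` record, the scaling bounds `MAX_BYTES` and `MAX_SPEED`, the `WEIGHTS` and the IPv4-only handling are all illustrative assumptions.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class Flow:
    packet_bytes: int
    speed_bps: float
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


MAX_BYTES = 65535       # assumed upper bound for packet size scaling
MAX_SPEED = 1e9         # assumed upper bound for transmission speed scaling
MAX_IPV4 = 0xFFFFFFFF
MAX_PORT = 65535
WEIGHTS = [0.3, 0.3, 0.1, 0.1, 0.1, 0.1]  # assumed weights, summing to 1


def normalize(flow: Flow) -> List[float]:
    """Scale every feature into the range [0, 1]."""
    return [
        min(flow.packet_bytes / MAX_BYTES, 1.0),
        min(flow.speed_bps / MAX_SPEED, 1.0),
        int(ipaddress.IPv4Address(flow.src_ip)) / MAX_IPV4,
        int(ipaddress.IPv4Address(flow.dst_ip)) / MAX_IPV4,
        flow.src_port / MAX_PORT,
        flow.dst_port / MAX_PORT,
    ]


def traffic_feature_value(flow: Flow) -> float:
    """Weighted sum of the normalized features (assumed combination rule)."""
    return sum(w * x for w, x in zip(WEIGHTS, normalize(flow)))


def is_bad_traffic(flow: Flow, threshold: float) -> bool:
    """Traffic is regarded as bad when its feature value exceeds the preset threshold."""
    return traffic_feature_value(flow) > threshold
```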
2. The method for detecting network information based on big data according to claim 1, wherein the image information data comprises picture information data and video information data, the image features comprise picture features extracted based on the picture information data and video features extracted based on the video information data, the picture features are specifically color histograms, edge features, texture features and depth features of the picture information data, and the specific process of judging whether the picture information data is bad information is as follows:
carrying out feature fusion on the color histogram, edge features, texture features and depth features of the extracted picture information data to obtain a picture feature comprehensive vector;
and reading the picture feature comprehensive vector, inputting it into a classification model for classification, and judging whether the picture information data is bad information according to the classification result.
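Feature fusion followed by classification could look roughly like the sketch below. The claim does not specify the fusion rule or the classification model, so the sketch assumes per-group L2 normalization followed by concatenation, and a hypothetical linear model with a sigmoid output standing in for the classifier.

```python
import math
from typing import List, Sequence


def fuse_features(*feature_groups: Sequence[float]) -> List[float]:
    """L2-normalize each feature group, then concatenate into one vector."""
    fused: List[float] = []
    for group in feature_groups:
        norm = math.sqrt(sum(v * v for v in group)) or 1.0  # avoid division by zero
        fused.extend(v / norm for v in group)
    return fused


def classify(vector: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> bool:
    """Hypothetical linear classifier: True means the picture is judged bad."""
    score = sum(w * x for w, x in zip(weights, vector)) + bias
    probability = 1.0 / (1.0 + math.exp(-score))
    return probability >= 0.5
```

In use, the color histogram, edge, texture and depth features would each be passed as one group, for example `fuse_features(color_hist, edges, texture, depth)`, where those four names are placeholders for already extracted feature lists.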
3. The method for detecting network information based on big data as set forth in claim 2, wherein the video features are color histogram, edge feature, texture feature, depth feature of picture information data of each key frame in the video information data, and the specific process for judging whether the video information data is bad information is as follows:
reading a color histogram, edge characteristics, texture characteristics and depth characteristics of the picture information data of each key frame in the extracted video information data, and inputting the color histogram, the edge characteristics, the texture characteristics and the depth characteristics into a classification model for classification to obtain the type of the picture information data of each key frame;
judging whether there is a key frame whose picture information data is classified as bad;
If so, the video information data is regarded as bad information.
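The key frame rule in this claim reduces to an "any frame is bad" check, sketched below. The `classify_frame` callable and the category label `"bad"` are assumptions; the claim only states that each key frame receives a category.

```python
from typing import Callable, Iterable, Sequence


def is_bad_video(
    key_frame_features: Iterable[Sequence[float]],
    classify_frame: Callable[[Sequence[float]], str],
) -> bool:
    """The video is regarded as bad if any key frame is classified as 'bad'."""
    return any(classify_frame(features) == "bad" for features in key_frame_features)
```

Because `any` stops at the first bad key frame, the remaining frames need not be classified.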
4. The method for detecting network information based on big data according to claim 1, wherein the step of taking corresponding processing measures for the corresponding information data judged to be bad information is specifically:
if the bad information is text information data or image information data, deleting the text information data or image information data containing the bad information to prevent continued propagation, and monitoring the text content and image content uploaded by users in real time to discover and process bad information in time;
If the bad information is file information data, adopting an isolation mode for the file containing the bad information to prevent continuous transmission, monitoring the content of the file uploaded by a user in real time, and finding and processing the bad file in time;
If the bad information is network traffic information data, using network security equipment to prevent the propagation of the bad traffic, and implementing access control measures for the source of the bad traffic to limit the access authority of the bad traffic to network resources.
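The mapping from data type to processing measure in this claim can be expressed as a small lookup table. The enum and function names below are hypothetical, and the sketch only selects the measure; it does not perform deletion, isolation or blocking.

```python
from enum import Enum


class DataKind(Enum):
    TEXT = "text"
    IMAGE = "image"
    FILE = "file"
    TRAFFIC = "traffic"


class Measure(Enum):
    DELETE = "delete"    # text and image data are deleted
    ISOLATE = "isolate"  # files are isolated
    BLOCK = "block"      # traffic is blocked and its source access-controlled


MEASURES = {
    DataKind.TEXT: Measure.DELETE,
    DataKind.IMAGE: Measure.DELETE,
    DataKind.FILE: Measure.ISOLATE,
    DataKind.TRAFFIC: Measure.BLOCK,
}


def choose_measure(kind: DataKind) -> Measure:
    """Return the processing measure for data of the given kind judged to be bad."""
    return MEASURES[kind]
```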
5. The network information detection system based on big data, which is applied to the network information detection method based on big data according to any one of claims 1-4, is characterized by comprising a data acquisition subsystem, a feature extraction subsystem, an analysis processing subsystem and a processing measure subsystem;
The data acquisition subsystem is used for acquiring network information data to be detected based on network monitoring software, wherein the network information data comprises text information data, file information data, network flow information data and image information data;
the feature extraction subsystem is used for extracting features of the network information data to be detected to respectively obtain text features, file features, network flow features and image features;
The analysis processing subsystem is used for respectively carrying out analysis processing based on the text characteristics, the file characteristics, the network flow characteristics and the image characteristics, and respectively judging whether the text information data, the file information data and the network flow information data are bad information according to analysis processing results;
and the processing measure subsystem is used for taking corresponding processing measures for the corresponding information data which is judged to be the bad information.
6. The big data based network information detection system of claim 5, wherein the analysis processing subsystem comprises a text analysis processing module, a file analysis processing module, a network traffic analysis processing module, and an image analysis processing module;
The text analysis processing module is used for carrying out analysis processing according to the text characteristics and judging whether the text information data is bad information or not;
the file analysis processing module is used for carrying out analysis processing according to file characteristics and judging whether file information data are bad information or not;
the network flow analysis processing module is used for carrying out analysis processing according to the network flow characteristics and judging whether the network flow information data is bad information or not;
the image analysis processing module is used for carrying out analysis processing according to the image characteristics and judging whether the image information data is bad information or not.
CN202410426391.7A 2024-04-10 2024-04-10 A network information detection method and system based on big data Active CN118300851B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410426391.7A CN118300851B (en) 2024-04-10 2024-04-10 A network information detection method and system based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410426391.7A CN118300851B (en) 2024-04-10 2024-04-10 A network information detection method and system based on big data

Publications (2)

Publication Number Publication Date
CN118300851A CN118300851A (en) 2024-07-05
CN118300851B CN118300851B (en) 2025-03-21

Family

ID=91677675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410426391.7A Active CN118300851B (en) 2024-04-10 2024-04-10 A network information detection method and system based on big data

Country Status (1)

Country Link
CN (1) CN118300851B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119172754A (en) * 2024-08-02 2024-12-20 中国电信股份有限公司 Spam message blocking method, device, electronic device and computer readable medium
CN119046613B (en) * 2024-08-20 2025-05-30 学科网(北京)股份有限公司 Sensitive content detection-based material preprocessing method and system
CN119520161B (en) * 2025-01-15 2025-04-08 浙江云针信息科技有限公司 Deep feature aggregation guided light-weight network attack detection method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112003850A (en) * 2020-08-14 2020-11-27 北京浪潮数据技术有限公司 Flow monitoring method, device, equipment and storage medium based on cloud network
CN113254577A (en) * 2021-05-11 2021-08-13 北京鸿腾智能科技有限公司 Sensitive file detection method, device, equipment and storage medium
CN115827870A (en) * 2022-12-26 2023-03-21 中移信息技术有限公司 Data processing method, device, equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107547555B (en) * 2017-09-11 2021-04-16 北京匠数科技有限公司 Website security monitoring method and device
CN109753987B (en) * 2018-04-18 2021-08-06 新华三信息安全技术有限公司 File recognition method and feature extraction method
CN116955522A (en) * 2023-07-26 2023-10-27 招商银行股份有限公司 Sensitive word detection method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN118300851A (en) 2024-07-05

Similar Documents

Publication Publication Date Title
CN118300851B (en) A network information detection method and system based on big data
CN110351244A (en) A kind of network inbreak detection method and system based on multireel product neural network fusion
CN109309675A (en) A network intrusion detection method based on convolutional neural network
CN111818198B (en) Domain name detection method, domain name detection device, equipment and medium
CN111460441A (en) A network intrusion detection method based on batch normalized convolutional neural network
CN116910752B (en) Malicious code detection method based on big data
CN110991246A (en) Video detection method and system
CN119484017A (en) A method and system for identifying and analyzing power sensitive data
CN115019121B (en) A method for defending against adversarial sample attacks for network digital image recognition
CN117807590B (en) Information security prediction and monitoring system and method based on artificial intelligence
CN111680286B (en) Refinement method of Internet of things equipment fingerprint library
CN110995713A (en) Botnet detection system and method based on convolutional neural network
CN119312224B (en) Abnormal data detection and classification method based on deep learning
CN119832673B (en) A campus security system based on the Internet of Things and its service terminal
CN119046481B (en) Multimedia video stream management system and method based on artificial intelligence
CN120433991A (en) A network security vulnerability detection method
CN114268484A (en) Malicious encrypted flow detection method and device, electronic equipment and storage medium
CN119276563A (en) A network threat monitoring and early warning method based on YOLO algorithm
CN118474043A (en) SD-WAN application identification method and system based on deep learning
CN117527295A (en) Self-adaptive network threat detection system based on artificial intelligence
CN113869182B (en) A video anomaly detection network and its training method
CN117938494A (en) Cloud service-oriented abnormal attack detection method and system based on 1D-TextCNN
CN117372804A (en) An adversarial sample defense method based on image gradient calculation
CN100363943C (en) Color Image Matching Analysis Method Based on Color Content and Distribution
CN114462510A (en) Equipment classification method and system for precise protection of Internet of things

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20250228

Address after: Room A-T2321, Building 3, No. 20 Yong'an Road, Shilong Economic Development Zone, Mentougou District, Beijing, 102300 (Cluster Registration)

Applicant after: Beijing Anqianxingke Technology Co.,Ltd.

Country or region after: China

Address before: 801, Floor 8, Xincheng Power Industry Building, No. 398, Ganquan Road, Shushan District, Hefei City, Anhui Province, 230000

Applicant before: Hefei cloud Technology Co.,Ltd.

Country or region before: China

GR01 Patent grant