CN114760124B

CN114760124B - Big data based computer network security intelligent analysis system and method

Info

Publication number: CN114760124B
Application number: CN202210364704.1A
Authority: CN
Inventors: 娄存恺; 金旭佳
Original assignee: Yabang Management Technology Beijing Co ltd
Current assignee: Yabang Management Technology Beijing Co ltd
Priority date: 2022-04-07
Filing date: 2022-04-07
Publication date: 2022-10-04
Anticipated expiration: 2042-04-07
Also published as: CN114760124A

Abstract

The invention discloses a big data based computer network security intelligent analysis system and method. The intelligent analysis system comprises an authentication database, an operation information monitoring module, a website judging module and an access analysis module. The authentication database is used for storing the web addresses of authenticated websites. The operation information monitoring module is used for monitoring the operation information of the current user of a computer; when it is detected that the current user opens a new website, the new website is taken as the website to be detected and its web address as the web address to be detected. The website judging module is used for judging whether the web address to be detected is a web address in the authentication database. If it is, the current user is allowed to access the website directly; if it is not, the access analysis module obtains the characteristic information of the website to be detected and the historical operation information of the current user, and judges accordingly whether to send access early-warning information.

Description

Big data based computer network security intelligent analysis system and method

Technical Field

The invention relates to the technical field of computers, in particular to a computer network security intelligent analysis system and method based on big data.

Background

With the development of internet technology, more and more websites have come into operation, and people routinely handle everyday matters by visiting websites, such as transferring money through online banking or shopping on e-commerce websites, which makes daily life more convenient. At the same time, many illegal websites have sprung up. These illegal websites either steal private information submitted by users, such as bank accounts and passwords, or lead users astray; once a user enters such a website by mistake, the physical and mental health and the property safety of the user may be seriously threatened.

In the prior art, websites that a user visits for the first time cannot be effectively monitored, so the user can easily access an illegal website by mistake.

Disclosure of Invention

The invention aims to provide a big data based computer network security intelligent analysis system and method, so as to solve the problems described in the Background section.

In order to solve the above technical problems, the invention provides the following technical scheme: a big data based computer network security intelligent analysis system comprises an authentication database, an operation information monitoring module, a website judging module and an access analysis module. The authentication database is used for storing the web addresses of authenticated websites. The operation information monitoring module is used for monitoring the operation information of the current user of a computer; when it is detected that the current user opens a new website, the new website is taken as the website to be detected and its web address as the web address to be detected. The website judging module is used for judging whether the web address to be detected is a web address in the authentication database. If it is, the current user is allowed to access the website directly; if it is not, the access analysis module obtains the characteristic information of the website to be detected and the historical operation information of the current user, and judges accordingly whether to send access early-warning information.

Further, the access analysis module comprises a website analysis module and a user analysis module. The website analysis module comprises a font parameter acquisition module, a color parameter acquisition module, a matching parameter acquisition module, an in-doubt parameter calculation module and an in-doubt parameter comparison module. The font parameter acquisition module acquires the font parameter S of the website to be detected according to the font information on the page of the website to be detected. The color parameter acquisition module divides the homepage of the website to be detected into m detection areas according to the different background colors of the areas, sorts the background colors of the homepage by the area each occupies in the homepage from largest to smallest, selects the first background color as the central color, and calculates the color parameter C of the website to be detected

Wherein t is the number of detection areas whose background color is the central color among the m detection areas, and e is the ratio of the area of those detection areas to the total area of all detection areas of the website to be detected. The matching parameter acquisition module calculates the matching parameter Z of the homepage of the website to be detected according to the font information and the color matching information of the page to be detected. The in-doubt parameter calculation module calculates the in-doubt parameter R = 0.62*S + 0.22*C + 0.16*Z of the website to be detected. The in-doubt parameter comparison module compares the in-doubt parameter of the website to be detected with the in-doubt threshold: if the in-doubt parameter is greater than the in-doubt threshold, the current user is allowed to access the website directly; if the in-doubt parameter is smaller than the in-doubt threshold, the historical operation information of the current user is analyzed.

Further, the font parameter acquisition module includes a center size selecting module, a size classifying module, a ranking coefficient obtaining module and a font parameter calculating module. The center size selecting module sorts the font sizes on the homepage of the website to be detected from large to small to obtain an analysis sorting, obtains the number of fonts of each font size on the homepage, and selects the font size with the largest number of fonts as the center size. The size classifying module takes the font sizes to the left of the center size in the analysis sorting as first sizes and the font sizes to the right of the center size as second sizes; the number of font size categories among the first sizes is the first category number, and the number of font size categories among the second sizes is the second category number. The ranking coefficient obtaining module calculates the ranking coefficient b = d1/d2 of the center size, wherein d1 is the smaller of the first category number and the second category number, and d2 is the larger of the two. The font parameter calculating module calculates the font parameter S of the website to be detected

Wherein h is the number of fonts corresponding to the center size, g1 is the sum of the numbers of fonts corresponding to the first sizes and the center size, and g2 is the sum of the numbers of fonts corresponding to the second sizes and the center size. The matching parameter acquisition module calculates the matching parameter Z = Ns/h of the homepage of the website to be detected, wherein Ns is the number of fonts on the homepage whose font size is the center size and whose background color is the central color.

Further, the user analysis module includes a feature average value calculation module and an average value comparison module. The feature average value calculation module obtains the average value, over a recent period of time, of the feature index of each login session in which the current user logged in to the computer and accessed websites, wherein the feature index of a login session is w = Ys/Yz, Yz is the total number of websites the current user accessed during that login session, and Ys is the number of illegal websites among the websites accessed during that login session. The average value comparison module compares the average value of the feature index of the current user with a feature threshold: if the average value is smaller than the feature threshold, the current user is allowed to access the website directly; otherwise, early-warning information that the website to be detected is suspected to be dangerous is sent to the user.

A big data-based computer network security intelligent analysis method comprises the following steps:

pre-establishing an authentication database for storing the website address of the authentication website,

monitoring the operation information of the current user of the computer, and, when it is detected that the current user opens a new website, taking the new website as the website to be detected and its web address as the web address to be detected,

if the web address to be detected is a web address in the authentication database, the current user is allowed to directly access,

and if the web address to be detected is not a web address in the authentication database, acquiring the characteristic information of the website to be detected and the historical operation information of the current user, and judging accordingly whether to send access early-warning information.
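The steps above can be sketched as a single decision function. This is only an illustrative sketch, not the patented implementation: the function name `should_warn` and the two callbacks `site_looks_safe` and `user_is_low_risk` are hypothetical stand-ins for the website analysis and the user analysis described later in the text.

```python
from typing import Callable, Set


def should_warn(
    url: str,
    authenticated_urls: Set[str],
    site_looks_safe: Callable[[str], bool],
    user_is_low_risk: Callable[[], bool],
) -> bool:
    """Return True when access early-warning information should be sent.

    All names here are hypothetical; the source only describes the order
    of the checks, not an API.
    """
    # Step 1: a web address already in the authentication database
    # is allowed directly.
    if url in authenticated_urls:
        return False
    # Step 2: otherwise analyse the characteristic information of the site
    # (in-doubt parameter above the threshold means direct access).
    if site_looks_safe(url):
        return False
    # Step 3: otherwise fall back to the user's historical operation
    # information; warn unless the user's history is low risk.
    return not user_is_low_risk()
```

Passing the two analyses as callbacks keeps the control flow separate from how the in-doubt parameter and the feature index are actually computed.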

Further, the acquiring the characteristic information of the website to be detected includes:

collecting page information of a website to be detected,

sorting the font sizes of the website homepage to be detected according to the size from large to small to obtain analysis sorting, respectively obtaining the number of fonts of each font size in the website homepage to be detected, selecting the font size with the largest number of fonts as the center size,

obtaining the font size on the left side of the center size in the analysis sorting as a first size, the font size on the right side of the center size as a second size, the font size category number in the first size as a first category number, the font size category number in the second size as a second category number,

then the center size ranking factor b = d1/d2, where d1 is the smaller of the first number of classes and the second number of classes, d2 is the larger of the first number of classes and the second number of classes,

calculating font parameters of to-be-detected website

Wherein h is the number of the fonts corresponding to the center size, g1 is the sum of the number of the fonts corresponding to the first size and the center size, g2 is the sum of the number of the fonts corresponding to the second size and the center size,

dividing the homepage of the website to be detected into m detection areas according to the different background colors of the areas, sorting the background colors of the homepage by the area each occupies in the homepage from largest to smallest, and selecting the first background color as the central color,

calculating color parameters of to-be-detected website

Wherein t is the number of detection regions with the background color as the central color in the m detection regions, e is the ratio of the area of the detection region with the background color as the central color in the website to be detected to the total area of all the detection regions of the website to be detected,

calculating the matching parameter Z = Ns/h of the homepage of the website to be detected, wherein Ns is the number of fonts on the homepage whose font size is the center size and whose background color is the central color;

then the in-doubt parameter of the website to be detected is R = 0.62*S + 0.22*C + 0.16*Z,

if the doubt parameter of the website to be detected is greater than the doubt threshold value, allowing the current user to directly access;

and if the doubt parameter of the website to be detected is smaller than the doubt threshold value, analyzing the historical operation information of the current user.
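The last steps of this list can be illustrated with a short sketch. It assumes the font parameter S and the color parameter C have already been computed, because their formulas are not reproduced in this text. The function names are hypothetical, and treating equality with the threshold as "analyze the user's history" is an assumption, since the text only covers the strictly greater and strictly smaller cases.

```python
def matching_parameter(ns: int, h: int) -> float:
    """Z = Ns / h, where h is the number of fonts at the center size and
    Ns is how many of them sit on the central background color."""
    if h <= 0:
        raise ValueError("h must be positive")
    return ns / h


def in_doubt_parameter(s: float, c: float, z: float) -> float:
    """R = 0.62*S + 0.22*C + 0.16*Z, with the weights given in the text."""
    return 0.62 * s + 0.22 * c + 0.16 * z


def needs_history_analysis(r: float, in_doubt_threshold: float) -> bool:
    """True when the user's historical operation information must be analysed.

    R greater than the threshold means direct access. The equality case is
    not specified in the source; it is treated conservatively here.
    """
    return r <= in_doubt_threshold
```

For instance, with the hypothetical inputs S = 0.5, C = 0.5, Ns = 60 and h = 80, Z is 0.75 and R is about 0.54, so a threshold of 0.6 would trigger the history analysis while a threshold of 0.5 would not.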

Further, the analyzing the historical operation information of the current user includes:

acquiring the average value, over a recent period of time, of the feature index of each login session in which the current user logged in to the computer and accessed websites, wherein the feature index of a login session is w = Ys/Yz, Yz is the total number of websites the current user accessed during that login session, and Ys is the number of illegal websites among the websites accessed during that login session,

if the average value of the feature index of the current user is less than the feature threshold, allowing the current user direct access,

and otherwise, sending early warning information suspected of danger in the website to be detected to the user.
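A minimal sketch of this user analysis follows. The representation of a login session as a `(Ys, Yz)` pair and all function names are assumptions made for illustration; sessions in which no website was visited are rejected here because w = Ys/Yz is undefined for Yz = 0, a case the text does not address.

```python
from typing import Iterable, Tuple


def feature_index(illegal_count: int, total_count: int) -> float:
    """w = Ys / Yz for one login session."""
    if total_count <= 0:
        raise ValueError("a session must contain at least one visited website")
    return illegal_count / total_count


def average_feature_index(sessions: Iterable[Tuple[int, int]]) -> float:
    """Average of w over the recent login sessions, given as (Ys, Yz) pairs."""
    values = [feature_index(ys, yz) for ys, yz in sessions]
    if not values:
        raise ValueError("no sessions in the recent period")
    return sum(values) / len(values)


def should_send_warning(
    sessions: Iterable[Tuple[int, int]], feature_threshold: float
) -> bool:
    """Direct access when the average is below the threshold, else warn."""
    return average_feature_index(sessions) >= feature_threshold
```

With the hypothetical sessions `[(1, 10), (0, 20), (2, 10)]`, the per-session indexes are 0.1, 0 and 0.2, so the average is 0.1.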

Further, the pre-established authentication database includes:

when the number of times that the user accesses a certain website is larger than the threshold number of times, the website is an authentication website.
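This rule can be sketched with a simple visit counter. The function name and the flat list of visited addresses are hypothetical; the source only states that a website becomes an authenticated website once the number of visits exceeds a threshold.

```python
from collections import Counter
from typing import Iterable, Set


def build_authentication_database(
    visited_urls: Iterable[str], count_threshold: int
) -> Set[str]:
    """Return the web addresses visited more than count_threshold times."""
    counts = Counter(visited_urls)
    # Strictly greater than the threshold, as stated in the text.
    return {url for url, n in counts.items() if n > count_threshold}
```

For example, a history with three visits to one address, two to another and one to a third yields only the first address when the threshold is 2.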

Further, the illegal website includes a phishing website, a gambling website, and a marketing website.

Compared with the prior art, the invention has the following beneficial effects: when a user accesses a new website, whether the website carries a potential risk is judged from the distribution of font sizes on the website and the background colors of the areas of the website; when the website is judged to carry a potential risk, the historical operation information of the user is analyzed; and when the user is judged likely to be deceived by the website, early-warning information is sent as a reminder. This reduces the probability that the user is deceived by the website and protects the personal and property safety of the user while browsing the internet.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

FIG. 1 is a block diagram of a big data-based computer network security intelligent analysis system according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to FIG. 1, the present invention provides a technical solution: a big data based computer network security intelligent analysis system comprises an authentication database, an operation information monitoring module, a website judging module and an access analysis module. The authentication database is used for storing the web addresses of authenticated websites. The operation information monitoring module is used for monitoring the operation information of the current user of a computer; when it is detected that the current user opens a new website, the new website is taken as the website to be detected and its web address as the web address to be detected. The website judging module is used for judging whether the web address to be detected is a web address in the authentication database. If it is, the current user is allowed to access the website directly; if it is not, the access analysis module obtains the characteristic information of the website to be detected and the historical operation information of the current user, and judges accordingly whether to send access early-warning information.

The access analysis module comprises a website analysis module and a user analysis module. The website analysis module comprises a font parameter acquisition module, a color parameter acquisition module, a matching parameter acquisition module, an in-doubt parameter calculation module and an in-doubt parameter comparison module. The font parameter acquisition module acquires the font parameter S of the website to be detected according to the font information on the page of the website to be detected. The color parameter acquisition module divides the homepage of the website to be detected into m detection areas according to the different background colors of the areas, sorts the background colors of the homepage by the area each occupies in the homepage from largest to smallest, selects the first background color as the central color, and calculates the color parameter C of the website to be detected

Wherein t is the number of detection areas whose background color is the central color among the m detection areas, and e is the ratio of the area of those detection areas to the total area of all detection areas of the website to be detected. The matching parameter acquisition module calculates the matching parameter Z of the homepage of the website to be detected according to the font information and the color matching information of the page to be detected. The in-doubt parameter calculation module calculates the in-doubt parameter R = 0.62*S + 0.22*C + 0.16*Z of the website to be detected. The in-doubt parameter comparison module compares the in-doubt parameter of the website to be detected with the in-doubt threshold: if the in-doubt parameter is greater than the in-doubt threshold, the current user is allowed to access the website directly; if the in-doubt parameter is smaller than the in-doubt threshold, the historical operation information of the current user is analyzed.

The font parameter acquisition module comprises a center size selecting module, a size classifying module, a ranking coefficient obtaining module and a font parameter calculating module. The center size selecting module sorts the font sizes on the homepage of the website to be detected from large to small to obtain an analysis sorting, obtains the number of fonts of each font size on the homepage, and selects the font size with the largest number of fonts as the center size. The size classifying module takes the font sizes to the left of the center size in the analysis sorting as first sizes and the font sizes to the right of the center size as second sizes; the number of font size categories among the first sizes is the first category number, and the number of font size categories among the second sizes is the second category number. The ranking coefficient obtaining module calculates the ranking coefficient b = d1/d2 of the center size, wherein d1 is the smaller of the first category number and the second category number, and d2 is the larger of the two. The font parameter calculating module calculates the font parameter S of the website to be detected

Wherein h is the number of fonts corresponding to the center size, g1 is the sum of the numbers of fonts corresponding to the first sizes and the center size, and g2 is the sum of the numbers of fonts corresponding to the second sizes and the center size. The matching parameter acquisition module calculates the matching parameter Z = Ns/h of the homepage of the website to be detected, wherein Ns is the number of fonts on the homepage whose font size is the center size and whose background color is the central color.

The user analysis module comprises a feature average value calculation module and an average value comparison module. The feature average value calculation module obtains the average value, over a recent period of time, of the feature index of each login session in which the current user logged in to the computer and accessed websites, wherein the feature index of a login session is w = Ys/Yz, Yz is the total number of websites the current user accessed during that login session, and Ys is the number of illegal websites among the websites accessed during that login session. The average value comparison module compares the average value of the feature index of the current user with a feature threshold: if the average value is smaller than the feature threshold, the current user is allowed to access the website directly; otherwise, early-warning information that the website to be detected is suspected to be dangerous is sent to the user.

pre-establishing an authentication database, wherein the authentication database is used for storing the website address of an authentication website, when the number of times that a user accesses a certain website is greater than a number threshold value, the website is the authentication website,

if the web address to be detected is a web address other than the web address in the authentication database,

collecting page information of a website to be detected,

obtaining the font size on the left side of the center size in the analysis sorting as a first size, the font size on the right side of the center size in the analysis sorting as a second size, the font size category number in the first size as a first category number, the font size category number in the second size as a second category number,

calculating font parameters of to-be-detected website

Wherein h is the number of fonts corresponding to the center size, g1 is the sum of the number of fonts corresponding to the first size and the center size, g2 is the sum of the number of fonts corresponding to the second size and the center size,

for example, the analysis sorting is size 1, size 2, size 3, size 4, size 5 and size 6, and the numbers of fonts corresponding to these sizes are 10, 25, 80, 50, 20 and 8 respectively,

then the center size is size 3, size 1 and size 2 are first sizes, and size 4, size 5 and size 6 are second sizes; the first category number is 2 and the second category number is 3, so the ranking coefficient of the center size is b = 2/3, and the font parameter of the website to be detected is then

The method takes into account that many illegal websites, in order to attract clicks, set the font sizes on their homepage larger, so that large-size fonts are more numerous, whereas on a normal website large-size and small-size fonts are relatively few and medium-size fonts are more numerous. The font parameter is therefore calculated from the proportion of fonts at the center size and the ranking coefficient of the center size, and is used to judge the probability that the website is an illegal website: the larger the font parameter, the smaller the probability that the website is an illegal website;
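The font statistics of the worked example (font counts 10, 25, 80, 50, 20 and 8 for sizes 1 to 6) can be reproduced with the sketch below. It computes only the quantities defined in the text (center size, category numbers, b, h, g1, g2); the font parameter formula itself is not reproduced in this text, so it is deliberately left out. The names are hypothetical, and returning b = 0.0 when one side of the center size is empty is an assumption, since the text does not cover that case.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FontStats:
    center_index: int  # position of the center size in the analysis sorting
    h: int             # number of fonts at the center size
    first_categories: int
    second_categories: int
    b: float           # ranking coefficient d1/d2
    g1: int            # fonts at the first sizes plus the center size
    g2: int            # fonts at the second sizes plus the center size


def font_statistics(counts_large_to_small: List[int]) -> FontStats:
    """counts_large_to_small[i] is the number of fonts at the i-th size,
    with sizes already sorted from large to small."""
    counts = counts_large_to_small
    if not counts:
        raise ValueError("at least one font size is required")
    # Center size: the size with the most fonts (first one on ties).
    center = max(range(len(counts)), key=counts.__getitem__)
    first, second = counts[:center], counts[center + 1:]
    d1, d2 = sorted((len(first), len(second)))
    # Assumption: b is set to 0.0 when the larger category number is zero.
    b = d1 / d2 if d2 else 0.0
    h = counts[center]
    return FontStats(center, h, len(first), len(second), b,
                     sum(first) + h, sum(second) + h)
```

Called as `font_statistics([10, 25, 80, 50, 20, 8])`, this selects size 3 as the center size with h = 80, gives category numbers 2 and 3 and b = 2/3 as in the example, and yields g1 = 115 and g2 = 158 from the definitions.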

dividing the homepage of the website to be detected into m detection areas according to the different background colors of the areas, sorting the background colors of the homepage by the area each occupies in the homepage from largest to smallest, and selecting the first background color as the central color, wherein the background colors of the areas do not include the blank margins on the two sides of the web page;

calculating color parameters of to-be-detected website

Wherein t is the number of detection areas whose background color is the central color among the m detection areas, and e is the ratio of the area of those detection areas to the total area of all detection areas of the website to be detected. The method takes into account that many illegal websites, in order to attract clicks, use garish, jumbled colors on their homepage, whereas a normal website usually has one main background color that occupies a large area. On this basis, the color parameter is used to judge the probability that the website is an illegal website: the smaller the color parameter, the less the website has a main background color and the more mixed the colors of its areas, and the greater the probability that it is an illegal website;
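The inputs to the color parameter (m, t and e) can be gathered as in the sketch below. It assumes the homepage has already been segmented into detection areas, each given as a hypothetical `(background_color, area)` pair with the blank side margins excluded. The color parameter formula itself is not reproduced in this text, so only m, t and e are computed.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ColorStats:
    central_color: str
    m: int    # number of detection areas
    t: int    # detection areas whose background is the central color
    e: float  # share of the total area covered by the central color


def color_statistics(regions: List[Tuple[str, float]]) -> ColorStats:
    """regions is a list of (background_color, area) pairs, one per area."""
    if not regions:
        raise ValueError("at least one detection area is required")
    area_by_color: Dict[str, float] = defaultdict(float)
    for color, area in regions:
        area_by_color[color] += area
    total = sum(area_by_color.values())
    if total <= 0:
        raise ValueError("total area must be positive")
    # Central color: the background color occupying the largest area.
    central = max(area_by_color, key=area_by_color.__getitem__)
    t = sum(1 for color, _ in regions if color == central)
    return ColorStats(central, len(regions), t, area_by_color[central] / total)
```

For the hypothetical areas `[("#336699", 50.0), ("#ff0000", 20.0), ("#336699", 10.0), ("#0000ff", 20.0)]`, the central color is `#336699`, with m = 4, t = 2 and e = 0.6.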

calculating the matching parameter Z = Ns/h of the homepage of the website to be detected, wherein Ns is the number of fonts on the homepage whose font size is the center size and whose background color is the central color; on a normal website, the fonts on the main background color are usually set in the font size that has the largest number of fonts on the whole page, so the larger the matching parameter, the lower the probability that the website to be detected is an illegal website;

if the doubt parameter of the website to be detected is smaller than the doubt threshold value, analyzing the historical operation information of the current user; when the doubt parameter is smaller, the probability that the website to be detected is an illegal website is higher;

the analyzing the historical operation information of the current user comprises:

acquiring the average value, over a recent period of time, of the feature index of each login session in which the current user logged in to the computer and accessed websites, wherein the feature index of a login session is w = Ys/Yz, Yz is the total number of websites the current user accessed during that login session, and Ys is the number of illegal websites among the websites accessed during that login session, the illegal websites comprising phishing websites, gambling websites and marketing websites;

if the average value of the characteristic indexes of the current user is less than the characteristic threshold value, allowing the current user to directly access,

and if the average value of the characteristic indexes of the current user is larger than or equal to the characteristic threshold, sending early warning information of suspected danger of the website to be detected to the user. When the average value of the characteristic indexes of the user is larger, the probability that the user accesses an illegal website in the internet surfing process is higher, and therefore, the user needs to be reminded and early warned in advance.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A big data based computer network security intelligent analysis system, characterized by comprising an authentication database, an operation information monitoring module, a website judging module and an access analysis module, wherein the authentication database is used for storing the web addresses of authenticated websites; the operation information monitoring module is used for monitoring the operation information of the current user of a computer, and when it is detected that the current user opens a new website, the new website is taken as the website to be detected and its web address as the web address to be detected; the website judging module is used for judging whether the web address to be detected is a web address in the authentication database; if the web address to be detected is a web address in the authentication database, the current user is allowed to access the website directly; and if the web address to be detected is not a web address in the authentication database, the access analysis module obtains the characteristic information of the website to be detected and the historical operation information of the current user, and judges whether to send access early-warning information according to the characteristic information and the historical operation information of the current user;

the access analysis module comprises a website analysis module and a user analysis module, the website analysis module comprises a font parameter acquisition module, a color parameter acquisition module, a matching parameter acquisition module, an in-doubt parameter calculation module and an in-doubt parameter comparison module, the font parameter acquisition module acquires the font parameter S of the website to be detected according to the font information on the page of the website to be detected, the color parameter acquisition module divides the homepage of the website to be detected into m detection areas according to the different background colors of the areas, sorts the background colors of the homepage by the area each occupies in the homepage from largest to smallest, selects the first background color as the central color, and calculates the color parameter C of the website to be detected

Wherein t is the number of detection areas whose background color is the central color among the m detection areas, and e is the ratio of the area of those detection areas to the total area of all detection areas of the website to be detected; the matching parameter acquisition module calculates the matching parameter Z of the homepage of the website to be detected according to the font information and the color matching information of the page to be detected; the in-doubt parameter calculation module calculates the in-doubt parameter R = 0.62*S + 0.22*C + 0.16*Z of the website to be detected; the in-doubt parameter comparison module compares the in-doubt parameter of the website to be detected with the in-doubt threshold, and if the in-doubt parameter of the website to be detected is greater than the in-doubt threshold, the current user is allowed to access the website directly; if the in-doubt parameter of the website to be detected is smaller than the in-doubt threshold, the historical operation information of the current user is analyzed;

wherein h is the number of fonts corresponding to the center size, g1 is the sum of the numbers of fonts corresponding to the first size and the center size, and g2 is the sum of the numbers of fonts corresponding to the second size and the center size; the matching parameter acquisition module calculates the matching parameter Z = Ns/h of the homepage of the website to be detected, wherein Ns is the number of fonts on the homepage of the website to be detected whose font size is the center size and whose background color is the central color.
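As an illustration only, the matching parameter Z = Ns/h and the in-doubt parameter R = 0.62S + 0.22C + 0.16Z recited above can be sketched in Python as follows. The function names, the handling of h = 0 and the handling of R equal to the threshold are assumptions made for this sketch; the claim does not specify them. The font parameter S and the color parameter C are taken as inputs because their formulas are not reproduced in the text.

```python
# Weights recited in the claim: R = 0.62S + 0.22C + 0.16Z
W_FONT, W_COLOR, W_MATCH = 0.62, 0.22, 0.16


def matching_parameter(ns: int, h: int) -> float:
    """Z = Ns / h.

    ns: number of fonts with the center size on the central background color.
    h:  number of fonts with the center size.
    Returning 0.0 when h == 0 is an assumption; the claim is silent on it.
    """
    return ns / h if h else 0.0


def in_doubt_parameter(s: float, c: float, z: float) -> float:
    """R = 0.62S + 0.22C + 0.16Z, with S and C supplied by the caller."""
    return W_FONT * s + W_COLOR * c + W_MATCH * z


def needs_history_analysis(r: float, in_doubt_threshold: float) -> bool:
    """True when the user's historical operation information must be analyzed.

    The claim allows direct access only when R is greater than the threshold.
    Treating R == threshold as "analyze" is an assumption of this sketch.
    """
    return not (r > in_doubt_threshold)
```

For example, with hypothetical values S = 0.5, C = 0.5, Ns = 30 and h = 40, `matching_parameter(30, 40)` gives Z = 0.75, and `in_doubt_parameter(0.5, 0.5, 0.75)` gives R of about 0.54.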

2. The big data based computer network security intelligent analysis system of claim 1, wherein: the user analysis module comprises a characteristic average value calculation module and an average value comparison module; the characteristic average value calculation module obtains the average value of the characteristic index over the logins in which the current user accessed websites on the computer during a recent period of time, the characteristic index of each login being w = Ys/Yz, wherein Yz is the total number of websites accessed by the current user during that login and Ys is the number of illegal websites among the websites accessed during that login; the average value comparison module compares the average value of the characteristic index of the current user with a characteristic threshold; if the average value of the characteristic index of the current user is smaller than the characteristic threshold, the current user is allowed to access the website directly; otherwise, early-warning information indicating that the website to be detected is suspected of being dangerous is sent to the user.
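As an illustration only, the per-login characteristic index w = Ys/Yz and the comparison of its average with the characteristic threshold can be sketched as follows. The function names, the representation of each login as a (Ys, Yz) pair, and the value 0.0 returned for a login with no visited websites or for an empty history are assumptions of this sketch, not part of the claim.

```python
from typing import Iterable, Tuple


def characteristic_index(ys: int, yz: int) -> float:
    """w = Ys / Yz for a single login.

    ys: number of illegal websites accessed during the login.
    yz: total number of websites accessed during the login.
    Returning 0.0 when yz == 0 is an assumption.
    """
    return ys / yz if yz else 0.0


def mean_characteristic_index(logins: Iterable[Tuple[int, int]]) -> float:
    """Average of w over the recent logins, each given as (Ys, Yz)."""
    values = [characteristic_index(ys, yz) for ys, yz in logins]
    # An empty history yields 0.0 (assumption; the claim does not cover it).
    return sum(values) / len(values) if values else 0.0


def should_warn(logins: Iterable[Tuple[int, int]], characteristic_threshold: float) -> bool:
    """True when early-warning information should be sent.

    Per the claim, access is allowed when the average is smaller than the
    threshold; every other case (including equality) triggers the warning.
    """
    return not (mean_characteristic_index(logins) < characteristic_threshold)
```

With a hypothetical history of three logins, (0, 10), (2, 10) and (1, 5), the per-login indexes are 0, 0.2 and 0.2, so the average is about 0.133; a characteristic threshold of 0.2 would allow access, while a threshold of 0.1 would trigger the warning.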

3. A big data based computer network security intelligent analysis method, characterized in that the intelligent analysis method comprises the following steps:

pre-establishing an authentication database for storing a website address of an authentication website,

monitoring the operation information of the current user of the computer, acquiring a new website as a website to be detected when detecting that the current user opens the new website,

if the website to be detected is a website other than the websites in the authentication database, acquiring characteristic information of the website to be detected and historical operation information of the current user, and judging accordingly whether to send access early-warning information;

the acquiring of the characteristic information of the website to be detected comprises the following steps:

collecting page information of a website to be detected,

calculating the font parameter of the website to be detected

calculating the color parameter of the website to be detected
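As an illustration only, the order of checks in the method (authentication database first, then website characteristics, then user history) can be sketched as follows. The two threshold comparisons are borrowed from claims 1 and 2; the names `Decision` and `decide_access`, and the choice to pass the in-doubt parameter and the average characteristic index as precomputed numbers, are assumptions of this sketch.

```python
from enum import Enum
from typing import Set


class Decision(Enum):
    ALLOW = "allow"  # the current user may access the website directly
    WARN = "warn"    # access early-warning information is sent


def decide_access(
    url: str,
    authenticated_urls: Set[str],
    in_doubt_parameter: float,
    in_doubt_threshold: float,
    mean_characteristic_index: float,
    characteristic_threshold: float,
) -> Decision:
    """Apply the three checks in the order the method recites them."""
    # Step 1: websites stored in the authentication database are allowed.
    if url in authenticated_urls:
        return Decision.ALLOW
    # Step 2: website characteristics (comparison as in claim 1).
    if in_doubt_parameter > in_doubt_threshold:
        return Decision.ALLOW
    # Step 3: historical operation information of the user (as in claim 2).
    if mean_characteristic_index < characteristic_threshold:
        return Decision.ALLOW
    return Decision.WARN
```

A website outside the authentication database therefore produces a warning only when both the website check and the user-history check fail to clear it.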

4. The big data-based computer network security intelligent analysis method according to claim 3, wherein: the analyzing the historical operation information of the current user comprises:

5. The big data-based computer network security intelligent analysis method according to claim 3, wherein: the pre-established authentication database comprises:

6. The big data-based computer network security intelligent analysis method according to claim 4, wherein: the illegal websites comprise phishing websites, gambling websites and marketing websites.