CN118094049A - Portal website dynamic management system based on big data - Google Patents

Info

Publication number
CN118094049A
Authority
CN
China
Prior art keywords
partition
data
user
access
management system
Prior art date
Legal status
Pending
Application number
CN202410471580.6A
Other languages
Chinese (zh)
Inventor
兰佳福
黄小能
高璇
Current Assignee
Fujian Provincial Government Portal Website Operation Management Co ltd
Original Assignee
Fujian Provincial Government Portal Website Operation Management Co ltd
Priority date
2024-04-19
Filing date
2024-04-19
Publication date
2024-05-28
Application filed by Fujian Provincial Government Portal Website Operation Management Co ltd
Priority to CN202410471580.6A
Publication of CN118094049A
Legal status: Pending

Landscapes

  • Storage Device Security (AREA)

Abstract

The invention discloses a big-data-based portal website dynamic management system, belonging to the technical field of website management. The system acquires the collection data of website users, including documents, music, web pages and videos; counts the number of views and the browsing duration of each item, marks items that exceed the preset thresholds as key data, generates brief introductions, and deletes invalid data; stores the key data in an independent partition divided into a first partition and a second partition, where the key data are stored only in the first partition, the data ranked lowest by number of views and browsing duration are stored in the second partition, and both partitions hold the same amount of data; and, when an IP address other than the original user's accesses the independent partition, verifies the account and password and refuses access if they are wrong. If they are correct, the original user's identity information and verification questions are further checked: a visitor who answers incorrectly may access only the second partition, and a visitor who answers correctly may access only the first partition. The invention thereby protects the content accessed by website users.

Description

Portal website dynamic management system based on big data
Technical Field
The invention relates to the technical field of website management, in particular to a portal dynamic management system based on big data.
Background
Today, with internet technology advancing rapidly, portal websites have gradually become important platforms where people obtain information, share views, and interact. As information hubs of the digital age they play a vital role. The quality of a portal's dynamic management directly affects user satisfaction and the efficiency of information delivery. In practice, however, many conventional portal management systems have obvious shortcomings in data updating, user behavior analysis, data security, and other respects.
Conventional portal management systems often fail to update data promptly. In an age of information explosion, the timeliness and accuracy of information are paramount. Yet many portals cannot update in real time because of technical limitations or administrative negligence, so the information users obtain may be outdated or may have lost its value. This harms the user experience and damages the portal's reputation and credibility.
The accuracy of user behavior analysis is another important aspect of portal management. With a deep understanding of user preferences, behavior patterns, and consumption habits, a portal can recommend more personalized and accurate content, improving user stickiness and satisfaction. Many existing systems perform poorly here: they often fail to capture users' actual needs and behavioral characteristics, so the quality of recommended content is uneven.
Data security is a part of portal management that cannot be neglected. As network attacks and data leaks become more frequent, users demand ever stronger protection of personal information. Portal websites are important platforms for storing and transmitting user information, and strict security measures must be taken to keep user data safe. In practice, however, many portals have vulnerabilities in data protection that expose user data to the risk of leakage and abuse.
Disclosure of Invention
The invention aims to provide a big-data-based portal website dynamic management system that solves the following technical problem:
Portal websites are important platforms for storing and transmitting user information, and strict security measures must be taken to keep user data safe. In practice, however, many portals have vulnerabilities in data protection that expose user data to the risk of leakage and abuse.
The aim of the invention can be achieved by the following technical solution:
A big-data-based portal website dynamic management system, comprising:
A content acquisition module, configured to acquire the collection data of website users according to website URLs, the collection data including but not limited to documents, music, web pages, and videos;
A content analysis module, configured to count the number of views and the browsing duration of each item of collection data for a user, mark collection data whose number of views and browsing duration exceed the corresponding preset thresholds as key data, generate brief introductions for documents and web pages, and delete invalid collection data;
A data management module, configured to transfer the key data to an independent partition and divide the independent partition into a first partition and a second partition, wherein the key data are stored only in the first partition, the collection data ranked lowest by the user's number of views and browsing duration are selected and stored in the second partition, and the first partition and the second partition hold the same amount of collection data;
A normal-state monitoring module, configured to verify the visitor's account and password for the independent partition whenever any IP address other than the original user's attempts to access the independent partition; if the account and password verification fails, access is refused; if it succeeds, the visitor is asked for the original user's identity information and the answers to the verification questions; if the identity information and verification questions are answered incorrectly, the visitor is allowed to access only the second partition; if they are answered correctly, the visitor is allowed to access only the first partition.
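The tiered decision made by the normal-state monitoring module can be sketched as a small function. This is only an illustration: the names `Access`, `Attempt`, and `decide_access` are hypothetical, and the handling of requests from the original user's IP is an assumption, since the text specifies checks only for other IP addresses.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    DENIED = "denied"
    SECOND_PARTITION = "second partition only"
    FIRST_PARTITION = "first partition only"


@dataclass
class Attempt:
    from_original_ip: bool  # True when the request comes from the original user's IP
    password_ok: bool       # result of the account and password check
    identity_ok: bool       # result of the identity and verification-question check


def decide_access(attempt: Attempt) -> Access:
    """Return which part of the independent partition the visitor may open."""
    if attempt.from_original_ip:
        # Assumption of this sketch: the original IP is not challenged.
        return Access.FIRST_PARTITION
    if not attempt.password_ok:
        return Access.DENIED
    if not attempt.identity_ok:
        # Correct password but failed identity check: only the second partition.
        return Access.SECOND_PARTITION
    return Access.FIRST_PARTITION
```

A visitor who knows the password but fails the identity check is therefore not told that anything went wrong; the visitor simply sees the second partition, which holds the lowest-ranked collection data rather than the key data.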
As a further scheme of the invention: in the data management module, if the fields of the documents in the collection data of the second partition are exactly the same as the fields of the documents in the collection data of the first partition, where a field is a subject classification, 50% of the collection data in the second partition are deleted, and documents whose fields differ from those of the first partition are collected in order starting from the lowest-ranked.
As a further scheme of the invention: in the data management module, if the number of documents in the second partition cannot reach the number of documents in the first partition, documents whose fields differ from those of the first partition are randomly selected from the website to fill the second partition.
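One way to read the two filling rules above is sketched below. The names `Item` and `build_second_partition` are hypothetical, and several details are assumptions the text leaves open: "lowest-ranked" is taken to mean fewest views and then shortest duration, the half that is kept after the 50% deletion is the lowest-ranked half, and `others` is assumed to contain only non-key collection data.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    field: str      # subject classification
    views: int
    seconds: float  # browsing duration


def build_second_partition(first, others, site_docs, rng=None):
    """Fill the second partition to the same size as the first partition."""
    rng = rng or random.Random(0)
    target = len(first)
    first_fields = {d.field for d in first}
    # Lowest-ranked items first: fewest views, then shortest browsing duration.
    ranked = sorted(others, key=lambda d: (d.views, d.seconds))
    second = ranked[:target]
    if second and {d.field for d in second} == first_fields:
        # Same fields as the first partition: drop half, then refill from the
        # bottom of the ranking with documents from other fields.
        second = second[: target // 2]
        for d in ranked:
            if len(second) >= target:
                break
            if d.field not in first_fields and d not in second:
                second.append(d)
    if len(second) < target:
        # Still short: pad with random website documents from other fields.
        pool = [d for d in site_docs
                if d.field not in first_fields and d not in second]
        second += rng.sample(pool, min(len(pool), target - len(second)))
    return second
```

The random generator is passed in so that the padding step can be made reproducible when the behavior needs to be checked.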
As a further scheme of the invention: when an access record with an incorrect account password, incorrect identity information, or a wrong answer to a verification question is captured, the corresponding access record is sent to the user side.
As a further scheme of the invention: in the content analysis module, the brief introduction is generated as follows:
The content of a document or web page is split into paragraphs and sentences, and interference elements are removed. Using natural language processing methods, the first three paragraphs of the body text are examined (all paragraphs if the body text has fewer than three), and the remaining paragraphs are searched for paragraphs whose text format differs, the text format including but not limited to font, font size, and color. Keywords are extracted from these paragraphs by an entity recognition model, the sentences containing the keywords are combined, the context is corrected by NLP (natural language processing), and the brief introduction is generated.
As a further scheme of the invention: the interference elements include headers, footers, advertisements, and dock bars.
As a further scheme of the invention: the keywords are extracted as follows:
The paragraph content is position-encoded and input into the entity recognition model, which comprises a BiLSTM layer and a CRF layer. The BiLSTM layer performs deep learning on the contextual feature information of the word vectors and outputs, for each piece of text, the probability of each tag, generating a tag sequence. The tag sequence is input into the CRF layer for ordering, yielding the tag sequence that optimizes the objective function. Content features are extracted from that tag sequence and clustered to produce candidate keywords, the candidate keywords are used in back-propagation to update the model, and the keywords are obtained after multiple rounds of updating.
As a further scheme of the invention: in the normal-state monitoring module, a user whose access fails is marked as a pending user; for any pending user, the ratio of the total number of times the user has accessed all personal partitions to the user's registration duration is calculated; if the ratio is smaller than a preset threshold and the user has not created a personal partition of their own, the user is judged to be an abnormal user and the user's permission to access personal partitions is restricted.
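The abnormal-user test reduces to one ratio and two conditions. The sketch below assumes the registration duration is measured in days and that the threshold is supplied by the operator; the function name `is_abnormal` and its parameters are hypothetical.

```python
def is_abnormal(total_partition_accesses, registered_days,
                has_own_partition, threshold):
    """Apply the ratio test to a pending user (one whose access has failed)."""
    if registered_days <= 0:
        raise ValueError("registration duration must be positive")
    # Accesses to all personal partitions per day of registration.
    ratio = total_partition_accesses / registered_days
    return ratio < threshold and not has_own_partition
```

For example, a pending user with 3 partition accesses over 300 days (ratio 0.01) and no partition of their own is flagged under a threshold of 0.05, while the same user with a personal partition is not.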
The invention has the following beneficial effects:
By analyzing the number of views and browsing duration of a user's collection data, the invention identifies the key content the user is most interested in; the key data are stored in an independent partition and managed separately from ordinary data, which improves data processing efficiency and system response speed; an access verification mechanism comprising account and password verification and identity information verification strengthens system security and protects the user's private data from unauthorized access; failed access attempts are monitored, recorded, and reported to the user, so the user can learn of the account's security status in time.
Drawings
The invention is further described below with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of the present invention.
Detailed Description
The embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention.
Referring to Fig. 1, the invention is a big-data-based portal website dynamic management system comprising:
A content acquisition module, configured to acquire the collection data of website users according to website URLs, the collection data including but not limited to documents, music, web pages, and videos;
A content analysis module, configured to count the number of views and the browsing duration of each item of collection data for a user, mark collection data whose number of views and browsing duration exceed the corresponding preset thresholds as key data, generate brief introductions for documents and web pages, and delete invalid collection data;
A data management module, configured to transfer the key data to an independent partition and divide the independent partition into a first partition and a second partition, wherein the key data are stored only in the first partition, the collection data ranked lowest by the user's number of views and browsing duration are selected and stored in the second partition, and the first partition and the second partition hold the same amount of collection data;
A normal-state monitoring module, configured to verify the visitor's account and password for the independent partition whenever any IP address other than the original user's attempts to access the independent partition; if the account and password verification fails, access is refused; if it succeeds, the visitor is asked for the original user's identity information and the answers to the verification questions; if the identity information and verification questions are answered incorrectly, the visitor is allowed to access only the second partition; if they are answered correctly, the visitor is allowed to access only the first partition.
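The marking step of the content analysis module can be illustrated as follows. This is a minimal sketch with hypothetical names (`Record`, `split_key_data`); it assumes that an item counts as key data only when both the number of views and the browsing duration exceed their thresholds, and that invalid items carry a flag set elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Record:
    name: str
    views: int          # number of times the user opened the item
    seconds: float      # total browsing duration
    valid: bool = True  # False once the item is invalid, e.g. no longer reachable


def split_key_data(records, min_views, min_seconds):
    """Return (key, ordinary) lists after discarding invalid records."""
    key, ordinary = [], []
    for r in records:
        if not r.valid:
            continue  # invalid collection data is deleted, not stored
        if r.views > min_views and r.seconds > min_seconds:
            key.append(r)
        else:
            ordinary.append(r)
    return key, ordinary
```

The `key` list corresponds to what would be moved into the first partition; the `ordinary` list is the pool from which the lowest-ranked items for the second partition would be chosen.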
By analyzing the number of views and browsing duration of a user's collection data, the invention identifies the key content the user is most interested in; the key data are stored in an independent partition and managed separately from ordinary data, which improves data processing efficiency and system response speed; an access verification mechanism comprising account and password verification and identity information verification strengthens system security and protects the user's private data from unauthorized access; failed access attempts are monitored, recorded, and reported to the user, so the user can learn of the account's security status in time.
In another preferred embodiment of the invention, in the data management module, if the fields of the documents in the collection data of the second partition are exactly the same as the fields of the documents in the collection data of the first partition, where a field is a subject classification, 50% of the collection data in the second partition are deleted, and documents whose fields differ from those of the first partition are collected in order starting from the lowest-ranked.
In another preferred embodiment of the invention, in the data management module, if the number of documents in the second partition cannot reach the number of documents in the first partition, documents whose fields differ from those of the first partition are randomly selected from the website to fill the second partition.
In another preferred embodiment of the invention, when an access record with an incorrect account password, incorrect identity information, or a wrong answer to a verification question is captured, the corresponding access record is sent to the user side.
In another preferred embodiment of the invention, in the content analysis module, the brief introduction is generated as follows:
The content of a document or web page is split into paragraphs and sentences, and interference elements are removed. Using natural language processing methods, the first three paragraphs of the body text are examined (all paragraphs if the body text has fewer than three), and the remaining paragraphs are searched for paragraphs whose text format differs, the text format including but not limited to font, font size, and color. Keywords are extracted from these paragraphs by an entity recognition model, the sentences containing the keywords are combined, the context is corrected by NLP (natural language processing), and the brief introduction is generated.
In another preferred embodiment of the invention, the interference elements include headers, footers, advertisements, and dock bars.
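The paragraph-selection part of the brief-introduction process can be sketched as below. The names `Paragraph` and `candidate_paragraphs` are hypothetical, and the sketch assumes that each paragraph already carries a `kind` label and a `(font, size, color)` style, and that "different text format" means different from the most common style in the body text. Keyword extraction and sentence combination are not shown.

```python
from collections import Counter
from dataclasses import dataclass

# Interference elements named in the text.
INTERFERENCE = {"header", "footer", "advertisement", "dock"}


@dataclass(frozen=True)
class Paragraph:
    text: str
    kind: str = "body"
    style: tuple = ("serif", 12, "black")  # (font, size, color)


def candidate_paragraphs(paragraphs):
    """Pick the paragraphs from which keywords would be extracted."""
    body = [p for p in paragraphs if p.kind not in INTERFERENCE]
    if not body:
        return []
    # First three body paragraphs; all of them if there are fewer than three.
    head, rest = body[:3], body[3:]
    # Assumption: the most common style is the ordinary body format.
    usual = Counter(p.style for p in body).most_common(1)[0][0]
    return head + [p for p in rest if p.style != usual]
```

A later paragraph set in bold or in a different color is kept alongside the opening paragraphs, on the reasoning that authors tend to format key statements differently.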
In another preferred embodiment of the invention, the keywords are extracted as follows:
The paragraph content is position-encoded and input into the entity recognition model, which comprises a BiLSTM layer and a CRF layer. The BiLSTM layer performs deep learning on the contextual feature information of the word vectors and outputs, for each piece of text, the probability of each tag, generating a tag sequence. The tag sequence is input into the CRF layer for ordering, yielding the tag sequence that optimizes the objective function. Content features are extracted from that tag sequence and clustered to produce candidate keywords, the candidate keywords are used in back-propagation to update the model, and the keywords are obtained after multiple rounds of updating.
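In a BiLSTM-CRF tagger, the tag sequence that optimizes the CRF objective is commonly found with Viterbi decoding, and that single step is sketched here in plain Python. The text does not name the decoding algorithm, so its use is an assumption; the function name `viterbi` and the score layout are hypothetical, and the BiLSTM itself, training, and clustering are not shown.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring tag index sequence.

    emissions[t][k]   score of tag k at position t (the BiLSTM output)
    transitions[j][k] score of tag j being followed by tag k (the CRF layer)
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    back = []  # back[t][k]: best previous tag when position t + 1 has tag k
    for emit in emissions[1:]:
        prev = score
        score, pointers = [], []
        for k in range(n_tags):
            best_j = max(range(n_tags),
                         key=lambda j: prev[j] + transitions[j][k])
            pointers.append(best_j)
            score.append(prev[best_j] + transitions[best_j][k] + emit[k])
        back.append(pointers)
    # Trace back from the best final tag.
    path = [max(range(n_tags), key=lambda k: score[k])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

The transition scores are what let the CRF layer overrule the per-token scores: with tags O, B, I and a heavy penalty on O followed by I, a token whose best individual tag is I will be tagged B instead when the previous token is O.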
In another preferred embodiment of the invention, in the normal-state monitoring module, a user whose access fails is marked as a pending user; for any pending user, the ratio of the total number of times the user has accessed all personal partitions to the user's registration duration is calculated; if the ratio is smaller than a preset threshold and the user has not created a personal partition of their own, the user is judged to be an abnormal user and the user's permission to access personal partitions is restricted.
The foregoing describes one embodiment of the invention in detail, but it is only a preferred embodiment and should not be construed as limiting the scope of the invention. All equivalent changes and modifications made within the scope of the invention are intended to be covered by the invention.

Claims (8)

1. A big-data-based portal website dynamic management system, comprising:
A content acquisition module, configured to acquire the collection data of website users according to website URLs, the collection data including but not limited to documents, music, web pages, and videos;
A content analysis module, configured to count the number of views and the browsing duration of each item of collection data for a user, mark collection data whose number of views and browsing duration exceed the corresponding preset thresholds as key data, generate brief introductions for documents and web pages, and delete invalid collection data;
A data management module, configured to transfer the key data to an independent partition and divide the independent partition into a first partition and a second partition, wherein the key data are stored only in the first partition, the collection data ranked lowest by the user's number of views and browsing duration are selected and stored in the second partition, and the first partition and the second partition hold the same amount of collection data;
A normal-state monitoring module, configured to verify the visitor's account and password for the independent partition whenever any IP address other than the original user's attempts to access the independent partition; if the account and password verification fails, access is refused; if it succeeds, the visitor is asked for the original user's identity information and the answers to the verification questions; if the identity information and verification questions are answered incorrectly, the visitor is allowed to access only the second partition; if they are answered correctly, the visitor is allowed to access only the first partition.
2. The big-data-based portal website dynamic management system according to claim 1, wherein, in the data management module, if the fields of the documents in the collection data of the second partition are exactly the same as the fields of the documents in the collection data of the first partition, where a field is a subject classification, 50% of the collection data in the second partition are deleted, and documents whose fields differ from those of the first partition are collected in order starting from the lowest-ranked.
3. The big-data-based portal website dynamic management system according to claim 2, wherein, in the data management module, if the number of documents in the second partition cannot reach the number of documents in the first partition, documents whose fields differ from those of the first partition are randomly selected from the website to fill the second partition.
4. The big-data-based portal website dynamic management system according to claim 2, wherein, when an access record with an incorrect account password, incorrect identity information, or a wrong answer to a verification question is captured, the corresponding access record is sent to the user side.
5. The big-data-based portal website dynamic management system according to claim 1, wherein the content analysis module generates the brief introduction as follows:
The content of a document or web page is split into paragraphs and sentences, and interference elements are removed. Using natural language processing methods, the first three paragraphs of the body text are examined (all paragraphs if the body text has fewer than three), and the remaining paragraphs are searched for paragraphs whose text format differs, the text format including but not limited to font, font size, and color. Keywords are extracted from these paragraphs by an entity recognition model, the sentences containing the keywords are combined, the context is corrected by NLP (natural language processing), and the brief introduction is generated.
6. The big-data-based portal website dynamic management system according to claim 5, wherein the interference elements include headers, footers, advertisements, and dock bars.
7. The big-data-based portal website dynamic management system according to claim 5, wherein the keywords are extracted as follows:
The paragraph content is position-encoded and input into the entity recognition model, which comprises a BiLSTM layer and a CRF layer. The BiLSTM layer performs deep learning on the contextual feature information of the word vectors and outputs, for each piece of text, the probability of each tag, generating a tag sequence. The tag sequence is input into the CRF layer for ordering, yielding the tag sequence that optimizes the objective function. Content features are extracted from that tag sequence and clustered to produce candidate keywords, the candidate keywords are used in back-propagation to update the model, and the keywords are obtained after multiple rounds of updating.
8. The big-data-based portal website dynamic management system according to claim 1, wherein, in the normal-state monitoring module, a user whose access fails is marked as a pending user; for any pending user, the ratio of the total number of times the user has accessed all personal partitions to the user's registration duration is calculated; if the ratio is smaller than a preset threshold and the user has not created a personal partition of their own, the user is judged to be an abnormal user and the user's permission to access personal partitions is restricted.
CN202410471580.6A 2024-04-19 2024-04-19 Portal website dynamic management system based on big data Pending CN118094049A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410471580.6A CN118094049A (en) 2024-04-19 2024-04-19 Portal website dynamic management system based on big data

Publications (1)

Publication Number Publication Date
CN118094049A true CN118094049A (en) 2024-05-28

Family

ID=91142309

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410471580.6A Pending CN118094049A (en) 2024-04-19 2024-04-19 Portal website dynamic management system based on big data

Country Status (1)

Country Link
CN (1) CN118094049A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090258637A1 (en) * 2008-04-11 2009-10-15 Beijing Focus Wireless Media Technology Co., ltd. Method for user identity tracking
CN108133016A (en) * 2017-12-22 2018-06-08 大连景竣科技有限公司 One kind does public document alignment system and method
CN112613020A (en) * 2020-12-31 2021-04-06 中国农业银行股份有限公司 Identity verification method and device
CN113542232A (en) * 2021-06-23 2021-10-22 广州欢享网络科技有限公司 Website data safety protection system based on big data
CN113626704A (en) * 2021-08-10 2021-11-09 平安国际智慧城市科技股份有限公司 Method, device and equipment for recommending information based on word2vec model
CN117312711A (en) * 2023-09-26 2023-12-29 珍岛信息技术(上海)股份有限公司 Search engine optimization method and system based on AI analysis

Similar Documents

Publication Publication Date Title
US9928301B2 (en) Classifying uniform resource locators
US9449271B2 (en) Classifying resources using a deep network
CN103546446B (en) Phishing website detection method, device and terminal
CN108038173B (en) Webpage classification method and system and webpage classification equipment
CN108876058B (en) News event influence prediction method based on microblog
CN111753171B (en) Malicious website identification method and device
CN107679075B (en) Network monitoring method and equipment
CN111181922A (en) Fishing link detection method and system
CN110569350A (en) Legal recommendation method, equipment and storage medium
CN113065330A (en) Method for extracting sensitive information from unstructured data
Han et al. CBR‐Based Decision Support Methodology for Cybercrime Investigation: Focused on the Data‐Driven Website Defacement Analysis
CN114595689A (en) Data processing method, data processing device, storage medium and computer equipment
Krokos et al. A look into twitter hashtag discovery and generation
CN116976435B (en) Knowledge graph construction method based on network security
Shah et al. Web pages credibility scores for improving accuracy of answers in web-based question answering systems
KR101556714B1 (en) Method, system and computer readable recording medium for providing search results
Korsgaard et al. Reengineering the Wikipedia for reputation
CN118094049A (en) Portal website dynamic management system based on big data
CN106547780A (en) Article reprints statistics of variables method and device
CN105701232B (en) Hypertext link list pushing system based on APP information data
CN114064893A (en) Abnormal data auditing method, device, equipment and storage medium
CN113076453A (en) Domain name classification method, device and computer readable storage medium
Lapteva et al. Rationale for principles of developing control and protection of web content using CMS Drupal
CN111814643A (en) Black and gray URL (Uniform resource locator) identification method and device, electronic equipment and medium
CN103116760A (en) Method and device for identifying text-missing web pages

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination