CN117194754A - Computer network data acquisition, analysis and management method, equipment and storage medium - Google Patents


Info

Publication number
CN117194754A
Authority
CN
China
Prior art keywords
network data
network
keywords
data
analysis
Prior art date
Legal status
Pending
Application number
CN202311233107.6A
Other languages
Chinese (zh)
Inventor
王�锋
Current Assignee
Guangdong Huike Information Technology Co ltd
Original Assignee
Guangdong Huike Information Technology Co ltd
Priority date
2023-09-22
Filing date
2023-09-22
Publication date
2023-12-08
2023-09-22: Application filed by Guangdong Huike Information Technology Co ltd
2023-09-22: Priority to CN202311233107.6A
2023-12-08: Publication of CN117194754A
Pending (current legal status)


Abstract

The invention relates to the technical field of data acquisition, and in particular to a computer network data acquisition, analysis and management method, equipment and storage medium. The invention makes network data convenient to analyze and manage and improves the authenticity of the acquired network data.

Description

Computer network data acquisition, analysis and management method, equipment and storage medium
Technical Field
The invention relates to the field of computer network data acquisition, and in particular to a computer network data acquisition, analysis and management method, equipment and storage medium.
Background
Current network data acquisition is largely built on vertical search engine technology, combining web spiders (data acquisition robots), word segmentation systems, and task and indexing systems. With the development of internet technology and the rapid growth of information online, the need to acquire and organize that information keeps increasing.
As computer networks develop and network information grows explosively, prior-art acquisition typically collects a large volume of data, which makes manual screening of the results cumbersome. In addition, untrustworthy items among the collected data can skew the final acquisition results, which hinders the acquisition and use of network data.
Based on the above reasons, the invention provides a computer network data acquisition, analysis and management method, equipment and a storage medium for solving the problems in the prior art.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a computer network data acquisition, analysis and management method, equipment and a storage medium.
In order to solve the technical problems, the invention provides the following technical scheme:
A computer network data acquisition, analysis and management method comprises the following steps:
S1: Input the keywords for which network data is to be acquired, and start the data mining and acquisition operation;
S2: Aggregate the data acquired from the network for the keywords into a network database, designated K1;
S3: Integrate the data in network database K1, analyze it by mutual verification, and split the analyzed network data into sub-databases designated K1-1, K1-2 and K1-3;
S4: Automatically integrate and analyze sub-databases K1-1, K1-2 and K1-3 to obtain a keyword-based summary for each, and output the three keyword-based summaries;
S5: Manually analyze the three output keyword-based summaries, identify the two with the larger deviation, and feed them back to the system;
S6: Based on the feedback, delete from network database K1 the network data collected in the two sub-databases whose summaries deviate more, rebuild a database from the network data remaining in K1, designate it K1-A, and thereby complete the management of the network data;
S7: Use the network data in database K1-A for subsequent reference.
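Step S6 amounts to a filter over K1. The sketch below is a minimal illustration only, assuming each record carries the sub-database label assigned in step S3; the `Record` type and the `rebuild_k1a` function are hypothetical names, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Record:
    """Hypothetical record type: one collected item plus its S3 sub-database label."""
    record_id: int
    sub_database: str  # "K1-1", "K1-2" or "K1-3"


def rebuild_k1a(k1: List[Record], flagged: Set[str]) -> List[Record]:
    """Sketch of step S6: drop every record whose sub-database was flagged in S5."""
    if len(flagged) != 2:
        # Step S5 feeds back exactly two deviating summaries.
        raise ValueError("expected exactly two flagged sub-databases")
    # The surviving records, in their original order, form database K1-A.
    return [r for r in k1 if r.sub_database not in flagged]


k1 = [Record(1, "K1-1"), Record(2, "K1-2"), Record(3, "K1-3"), Record(4, "K1-1")]
k1a = rebuild_k1a(k1, {"K1-2", "K1-3"})
print([r.record_id for r in k1a])  # [1, 4]
```

If step S3 places every record in exactly one sub-database, as the thresholds suggest, K1-A coincides with the one sub-database whose summary was not flagged in step S5.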
As a preferred technical solution of the invention, the keywords for network data acquisition in step S1 may be Chinese characters, English letters, pictures, Arabic numerals or character strings, or any combination of one or more of these, and the data search in step S1 can be implemented with prior-art search engines or web crawler programs.
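For text keywords, one way to see which of these component types a keyword combines is to classify it character by character. This is a minimal sketch under stated assumptions: `keyword_components` is a hypothetical helper, Chinese characters are detected only in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), any other non-whitespace character is lumped into a generic symbol class, and picture keywords are out of scope.

```python
from typing import Set


def keyword_components(keyword: str) -> Set[str]:
    """Hypothetical helper: report which component types a text keyword mixes."""
    kinds: Set[str] = set()
    for ch in keyword:
        if "\u4e00" <= ch <= "\u9fff":  # basic CJK Unified Ideographs block only
            kinds.add("chinese")
        elif ch.isascii() and ch.isalpha():  # English letters A-Z, a-z
            kinds.add("english")
        elif ch.isascii() and ch.isdigit():  # Arabic numerals 0-9
            kinds.add("digit")
        elif not ch.isspace():  # punctuation and other symbols
            kinds.add("symbol")
    return kinds


print(sorted(keyword_components("网络数据2023 API")))  # ['chinese', 'digit', 'english']
```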
As a preferred technical solution of the invention, the network data collected from the computer network for the keywords in step S2 consists of the context surrounding the keyword and the provenance information of that data. For example, when a keyword is found in a paper, the collected network data includes the keyword-related content of the paper and the website where the paper is published.
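A collected item therefore pairs a context snippet with its source. The following is a minimal sketch, assuming the context is a fixed-size character window around the first keyword hit; the disclosure does not specify the window size, and `CollectedItem`, `extract_context` and `collect` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CollectedItem:
    """Hypothetical S2 record: keyword context plus provenance."""
    keyword: str
    context: str     # text surrounding the keyword hit
    source_url: str  # where the text came from, e.g. the page publishing a paper


def extract_context(text: str, keyword: str, window: int = 40) -> Optional[str]:
    """Return up to `window` characters on each side of the first keyword hit."""
    pos = text.find(keyword)
    if pos == -1:
        return None
    start = max(0, pos - window)
    end = min(len(text), pos + len(keyword) + window)
    return text[start:end]


def collect(text: str, keyword: str, source_url: str,
            window: int = 40) -> Optional[CollectedItem]:
    """Build a record only when the keyword actually occurs in the text."""
    context = extract_context(text, keyword, window)
    if context is None:
        return None
    return CollectedItem(keyword, context, source_url)


item = collect("abcdefghij KEY klmnop", "KEY", "https://example.org/paper", window=3)
print(item.context)  # ij KEY kl
```

Keeping `source_url` on every item is what allows the later verification described in the beneficial effects: a reviewer can open the source instead of trusting the snippet.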
As a preferred technical solution of the invention, the number of data items to be collected from the network for the keywords in step S2 is at least 10 and at most 10000. The exact number can be adjusted manually in steps of 10, and if fewer than 10 items can be collected from the computer network, the actual number collected is displayed directly.
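The quantity rule can be expressed as a clamp plus a cap by availability. This sketch assumes that a manual request which is not a multiple of 10 is rounded down; the disclosure only states the step size, and the function names are hypothetical.

```python
MIN_ITEMS = 10
MAX_ITEMS = 10_000
STEP = 10


def normalize_target(requested: int) -> int:
    """Round a manual request down to a multiple of 10, then clamp to 10..10000."""
    snapped = (requested // STEP) * STEP
    return max(MIN_ITEMS, min(MAX_ITEMS, snapped))


def items_to_collect(requested: int, available: int) -> int:
    """Number of items actually collected; below 10 this is simply what exists."""
    return min(normalize_target(requested), available)


print(normalize_target(95), items_to_collect(95, 500), items_to_collect(200, 7))  # 90 90 7
```

Because the normalized target is never below 10, taking the minimum with the available count automatically yields the "display the actual number" behavior when fewer than 10 items exist.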
As a preferred technical solution of the invention, the verification analysis of network database K1 in step S3 is based on the proportion of identical or similar content among the acquired network data: sub-database K1-1 consists of network data with a proportion above 70%, K1-2 of network data with a proportion between 30% and 70%, and K1-3 of network data with a proportion below 30%.
As a preferred technical solution of the invention, the manual analysis of the three summaries in step S5 is based on one or more of: common general knowledge in the art, existing prior-art knowledge, and analysis and practical research carried out by visiting the sources of specific network data.
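The disclosure does not define the similarity measure or how the "proportion" is counted. One plausible reading, sketched below under explicit assumptions, is: for each record, the share of the other records whose similarity to it reaches a cutoff. Token-set Jaccard similarity and the 0.5 cutoff are stand-ins chosen for illustration, values of exactly 30% or 70% are assigned to K1-2 by assumption, and `split_k1` is a hypothetical name.

```python
from typing import Callable, Dict, List


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity, a stand-in for the unspecified measure."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def split_k1(records: List[str], sim_cutoff: float = 0.5,
             sim: Callable[[str, str], float] = jaccard) -> Dict[str, List[str]]:
    """Assign each record to K1-1, K1-2 or K1-3 by how many peers resemble it."""
    libs: Dict[str, List[str]] = {"K1-1": [], "K1-2": [], "K1-3": []}
    for i, rec in enumerate(records):
        peers = [other for j, other in enumerate(records) if j != i]
        # Share of the other records that count as "identical or similar".
        share = (sum(sim(rec, p) >= sim_cutoff for p in peers) / len(peers)) if peers else 0.0
        if share > 0.7:
            libs["K1-1"].append(rec)
        elif share >= 0.3:  # assumption: exactly 30% and 70% go to K1-2
            libs["K1-2"].append(rec)
        else:
            libs["K1-3"].append(rec)
    return libs


docs = ["network data acquisition method", "network data acquisition method",
        "network data acquisition system", "network data acquisition tool",
        "cooking recipes for dinner"]
print({k: len(v) for k, v in split_k1(docs).items()})  # {'K1-1': 4, 'K1-2': 0, 'K1-3': 1}
```

The pairwise comparison is quadratic in the number of records; near the 10000-item upper limit, an approximate near-duplicate technique such as MinHash would likely be more practical.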
A computer network data acquisition, analysis and management apparatus, comprising:
a memory for storing a computer program; and
a processor for implementing the steps of the computer network data acquisition, analysis and management method when executing the computer program.
A computer storage medium for computer network data acquisition, analysis and management, the readable storage medium storing a computer program which, when executed by a processor, implements the steps of the computer network data acquisition, analysis and management method.
The embodiment of the invention provides a computer network data acquisition, analysis and management method, equipment and a storage medium, which have the following beneficial effects:
1. The invention controls the amount of network data collected during acquisition, builds a database from the collected data, and analyzes the network data automatically by integrating and mutually verifying the data in the database. This lets acquisition personnel manage large volumes of data quickly and improves the efficiency of analyzing and managing acquired computer network data;
2. During acquisition, the network data collected from the computer network consists of the context surrounding the keyword together with the provenance information of the data, so each item carries its source as well as its content. During later verification, a user can check the collected data by consulting the source directly, which makes the authenticity of the network data easy to verify, ensures its accuracy in later applications, and facilitates subsequent use of the acquired data.
Drawings
The accompanying drawings are included to provide a further understanding of the invention, are incorporated in and constitute a part of this specification, and together with the embodiments serve to explain the invention. In the drawings:
FIG. 1 is a flow chart of the computer network data acquisition, analysis and management method of the invention;
FIG. 2 is a diagram of the similarity information structure of the network data in the database in the computer network data acquisition, analysis and management method of the invention.
Detailed Description
The preferred embodiments of the invention are described below with reference to the accompanying drawings. It should be understood that they are intended only to illustrate and explain the invention, not to limit it.
Embodiment: as shown in FIGS. 1-2, a computer network data acquisition, analysis and management method includes the following steps:
S1: Input the keywords for which network data is to be acquired, and start the data mining and acquisition operation. The keywords may be Chinese characters, English letters, pictures, Arabic numerals or character strings, or any combination of one or more of these, and the data search can be implemented with prior-art search engines or web crawler programs;
S2: Aggregate the data acquired from the network for the keywords into a network database, designated K1. The network data collected from the computer network for the keywords consists of the context surrounding the keyword and the provenance information of the data; when a keyword is found in a paper, the collected network data includes the keyword-related content of the paper and the website where the paper is published. The number of data items to be collected is at least 10 and at most 10000 and can be adjusted manually in steps of 10; if fewer than 10 items can be collected from the computer network, the actual number collected is displayed directly;
S3: Integrate the data in network database K1, analyze it by mutual verification, and split the analyzed network data into sub-databases designated K1-1, K1-2 and K1-3. The verification analysis of K1 is based on the proportion of identical or similar content among the acquired network data: K1-1 consists of network data with a proportion above 70%, K1-2 of network data with a proportion between 30% and 70%, and K1-3 of network data with a proportion below 30%;
S4: Automatically integrate and analyze sub-databases K1-1, K1-2 and K1-3 to obtain a keyword-based summary for each, and output the three keyword-based summaries;
S5: Manually analyze the three output keyword-based summaries, identify the two with the larger deviation, and feed them back to the system. The manual analysis is based on one or more of common knowledge in the art, existing prior-art knowledge, and analysis and practical research carried out by visiting the sources of specific network data;
S6: Based on the feedback, delete from network database K1 the network data collected in the two sub-databases whose summaries deviate more, rebuild a database from the network data remaining in K1, designate it K1-A, and thereby complete the management of the network data;
S7: Use the network data in database K1-A for subsequent reference.
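The disclosure does not say how the automatic integration analysis of step S4 produces a summary. One simple stand-in, sketched here under stated assumptions, ranks the terms that co-occur with the keyword in each sub-database. `keyword_summary` and `summarize_sub_databases` are hypothetical names, and whitespace tokenization only works for space-delimited text; Chinese text would first need a word segmentation system such as the one mentioned in the Background.

```python
from collections import Counter
from typing import Dict, List


def keyword_summary(records: List[str], keyword: str, top_n: int = 5) -> List[str]:
    """Hypothetical S4 summary: terms that co-occur most often with the keyword."""
    kw = keyword.lower()
    counts: Counter = Counter()
    for text in records:
        tokens = set(text.lower().split())  # whitespace tokens; Chinese needs a segmenter
        if kw in tokens:
            counts.update(tokens - {kw})  # count each term once per record
    # Highest count first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


def summarize_sub_databases(libs: Dict[str, List[str]], keyword: str,
                            top_n: int = 5) -> Dict[str, List[str]]:
    """Produce one keyword-based summary per sub-database (K1-1, K1-2, K1-3)."""
    return {name: keyword_summary(recs, keyword, top_n) for name, recs in libs.items()}


libs = {"K1-1": ["data crawler python", "data crawler java", "data index"],
        "K1-2": ["data spider"], "K1-3": ["unrelated text"]}
print(summarize_sub_databases(libs, "data", top_n=2))
# {'K1-1': ['crawler', 'index'], 'K1-2': ['spider'], 'K1-3': []}
```

The three summaries produced this way are what a reviewer would compare in step S5.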
A computer network data acquisition, analysis and management apparatus, comprising:
a memory for storing a computer program; and
a processor for implementing the steps of the computer network data acquisition, analysis and management method when executing the computer program.
A computer storage medium for computer network data acquisition, analysis and management, the readable storage medium storing a computer program which, when executed by a processor, implements the steps of the computer network data acquisition, analysis and management method.
The foregoing describes only preferred embodiments of the invention and does not limit it. Although the invention has been described in detail with reference to these embodiments, those skilled in the art may still modify the technical solutions described in them or replace some of their technical features with equivalents. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the invention shall fall within its scope of protection.

Claims (8)

1. A computer network data acquisition, analysis and management method, characterized by comprising the following steps:
S1: Input the keywords for which network data is to be acquired, and start the data mining and acquisition operation;
S2: Aggregate the data acquired from the network for the keywords into a network database, designated K1;
S3: Integrate the data in network database K1, analyze it by mutual verification, and split the analyzed network data into sub-databases designated K1-1, K1-2 and K1-3;
S4: Automatically integrate and analyze sub-databases K1-1, K1-2 and K1-3 to obtain a keyword-based summary for each, and output the three keyword-based summaries;
S5: Manually analyze the three output keyword-based summaries, identify the two with the larger deviation, and feed them back to the system;
S6: Based on the feedback, delete from network database K1 the network data collected in the two sub-databases whose summaries deviate more, rebuild a database from the network data remaining in K1, designate it K1-A, and thereby complete the management of the network data;
S7: Use the network data in database K1-A for subsequent reference.
2. The method according to claim 1, wherein the keywords for network data acquisition in step S1 may be Chinese characters, English letters, pictures, Arabic numerals or character strings, or any combination of one or more of these, and the data search in step S1 may be implemented with a prior-art search engine or web crawler programs.
3. The method according to claim 1, wherein the network data collected from the computer network for the keywords in step S2 consists of the context surrounding the keyword and the provenance information of the network data; for example, when a keyword is found in a paper, the collected network data includes the keyword-related content of the paper and the website where the paper is published.
4. The method according to claim 1, wherein the number of data items to be collected from the network for the keywords in step S2 is at least 10 and at most 10000 and can be adjusted manually in steps of 10, and if fewer than 10 items are collected from the computer network, the actual number collected is displayed directly.
5. The computer network data acquisition, analysis and management method according to claim 1, wherein the verification, analysis and splitting of network database K1 in step S3 is based on the proportion of identical or similar content among the collected network data: sub-database K1-1 consists of network data with a proportion above 70%, K1-2 of network data with a proportion between 30% and 70%, and K1-3 of network data with a proportion below 30%.
6. The method according to claim 1, wherein the manual analysis of the three summaries in step S5 is based on one or more of common knowledge in the art, existing prior-art knowledge, and analysis and practical research carried out by visiting the sources of specific network data.
7. A computer network data acquisition, analysis and management apparatus, comprising:
a memory for storing a computer program; and
a processor for implementing the steps of the computer network data acquisition, analysis and management method according to any one of claims 1 to 6 when executing the computer program.
8. A computer storage medium for computer network data acquisition, analysis and management, characterized in that the readable storage medium stores a computer program which, when executed by a processor, implements the steps of the computer network data acquisition, analysis and management method according to any one of claims 1 to 6.
CN202311233107.6A 2023-09-22 2023-09-22 Computer network data acquisition, analysis and management method, equipment and storage medium Pending CN117194754A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311233107.6A CN117194754A (en) 2023-09-22 2023-09-22 Computer network data acquisition, analysis and management method, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311233107.6A CN117194754A (en) 2023-09-22 2023-09-22 Computer network data acquisition, analysis and management method, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117194754A 2023-12-08

Family

ID=89003230

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311233107.6A Pending CN117194754A (en) 2023-09-22 2023-09-22 Computer network data acquisition, analysis and management method, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117194754A (en)

Similar Documents

Publication Publication Date Title
US7424421B2 (en) Word collection method and system for use in word-breaking
US10360229B2 (en) Systems and methods for enterprise data search and analysis
US10915543B2 (en) Systems and methods for enterprise data search and analysis
CN111967761A (en) Monitoring and early warning method and device based on knowledge graph and electronic equipment
US6374270B1 (en) Corporate disclosure and repository system utilizing inference synthesis as applied to a database
US8090720B2 (en) Method for merging document clusters
Wu et al. Efficient near-duplicate detection for q&a forum
US20220147526A1 (en) Keyword and business tag extraction
WO2020155749A1 (en) Method and apparatus for constructing personal knowledge graph, computer device, and storage medium
CN107945092A (en) Big data integrated management approach and system for audit field
CN105550169A (en) Method and device for identifying point of interest names based on character length
CN115757689A (en) Information query system, method and equipment
CN113220672A (en) Military and civil fusion policy information database system
CN112948429B (en) Data reporting method, device and equipment
Knap Towards Odalic, a Semantic Table Interpretation Tool in the ADEQUATe Project.
CN117171650A (en) Document data processing method, system and medium based on web crawler technology
CN106682107B (en) Method and device for determining incidence relation of database table
CN117194754A (en) Computer network data acquisition, analysis and management method, equipment and storage medium
Doerr et al. A method for estimating the precision of placename matching
CN115239060A (en) Airworthiness approval risk assessment system and method based on big data analysis
CN114385794A (en) Method, device, equipment and storage medium for generating enterprise knowledge graph
Lieberman et al. Spatio-textual spreadsheets: Geotagging via spatial coherence
US20180121502A1 (en) User Search Query Processing
TW202022771A (en) Species data analysis method, system and computer program product
US20230222145A1 (en) Information search system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination