CN117194754A - Computer network data acquisition, analysis and management method, equipment and storage medium
- Publication number
- CN117194754A
- Authority
- CN
- China
- Prior art keywords
- network data
- network
- keywords
- data
- analysis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention relates to the technical field of data acquisition, and in particular to a computer network data acquisition, analysis and management method, equipment and a storage medium. The invention enables convenient analysis and management of network data and improves the authenticity of the acquired network data.
Description
Technical Field
The invention relates to the technical field of data acquisition, and in particular to a computer network data acquisition, analysis and management method, equipment and a storage medium.
Background
Current network data acquisition is essentially built on vertical search engine technology, combining a web spider (or data acquisition robot), a word segmentation system, a task and index system and similar components. With the development of internet technology and the growth of massive online information, acquiring and sorting information has become an increasing demand.
With the development of computer networks and the explosion of network information, prior-art computer network data acquisition generally collects a large amount of data information. Manually screening such massive network data is inconvenient, and untrue data within the collected network data skews later acquisition results, which hinders the acquisition and use of network data.
For the above reasons, the invention provides a computer network data acquisition, analysis and management method, equipment and a storage medium to solve the problems in the prior art.
Disclosure of Invention
To address the defects of the prior art, the invention provides a computer network data acquisition, analysis and management method, equipment and a storage medium.
To solve the above technical problems, the invention provides the following technical scheme:
A computer network data acquisition, analysis and management method comprises the following steps:
S1: inputting the keywords for which network data is to be acquired, and starting the data mining and acquisition operation;
S2: summarizing the data information acquired from the network according to the keywords to form a network database, numbered K1;
S3: integrating and mutually verifying the data in network database K1, and splitting the analyzed network data into sub-databases numbered K1-1, K1-2 and K1-3;
S4: automatically integrating and analyzing sub-databases K1-1, K1-2 and K1-3 to obtain keyword-based summary information for each sub-database, and outputting the three resulting pieces of keyword-based summary information;
S5: manually analyzing the three output pieces of keyword-based summary information, identifying the two with the larger deviation with respect to the keywords, and feeding those two back to the system;
S6: based on the feedback information, deleting from network database K1 the network data held in the sub-databases of the two pieces of summary information with the larger deviation, rebuilding a database from the network data remaining in K1, numbering it K1-A, and thereby completing the management of the network data;
S7: performing reference operations based on the network data in database K1-A.
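The feedback and rebuild of steps S5 and S6 can be sketched as follows. This is only an illustration under assumptions that the text does not state: each record is a dictionary with a hypothetical unique `id` field, sub-databases are held in memory as lists, and the human feedback arrives as a list of sub-database names.

```python
from typing import Dict, List


def rebuild_database(
    k1: List[dict],
    sub_libraries: Dict[str, List[dict]],
    rejected: List[str],
) -> List[dict]:
    """Return K1-A: the records of K1 that are not held in any rejected sub-database.

    `rejected` holds the names of the two sub-databases whose summaries the
    human reviewer flagged as deviating (the feedback of step S5).
    The "id" field is a hypothetical record key, not named in the source.
    """
    rejected_ids = {rec["id"] for name in rejected for rec in sub_libraries[name]}
    return [rec for rec in k1 if rec["id"] not in rejected_ids]


# Usage: three records, one per sub-database; the reviewer rejects K1-2 and K1-3.
k1 = [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}, {"id": 3, "text": "c"}]
subs = {"K1-1": [k1[0]], "K1-2": [k1[1]], "K1-3": [k1[2]]}
k1_a = rebuild_database(k1, subs, rejected=["K1-2", "K1-3"])
```

Filtering K1 rather than copying the surviving sub-database keeps the original ordering of K1, which is one possible reading of "rebuilding a database from the remaining network data".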
As a preferred technical scheme of the invention, the keyword formats for network data acquisition in step S1 include Chinese characters, English letters, pictures, Arabic numerals and character strings; a keyword may be any one of these or a combination of several, and the data search in step S1 may be implemented with a prior-art search engine or any of various web crawler programs.
As a preferred technical scheme of the invention, the network data collected from the computer network based on the keywords in step S2 consists of the context surrounding the keywords and the provenance information of the network data; for example, when the keywords are found in a paper, the collected network data comprises the keyword-related content of the paper and the website of the paper.
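A collected item that pairs keyword context with its provenance could be represented as shown below. The class name `NetworkRecord`, the field names, the fixed character window and the example URL are all assumptions made for illustration; the text only requires that context and source travel together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkRecord:
    """One collected item: the keyword, its surrounding context, and its source."""
    keyword: str
    context: str
    source_url: str


def extract_context(text: str, keyword: str, window: int = 40) -> str:
    """Return the keyword plus up to `window` characters on each side.

    Returns an empty string when the keyword does not occur in the text.
    The window size is an assumed parameter, not specified in the source.
    """
    pos = text.find(keyword)
    if pos < 0:
        return ""
    start = max(0, pos - window)
    end = min(len(text), pos + len(keyword) + window)
    return text[start:end]


def make_record(text: str, keyword: str, source_url: str, window: int = 40) -> NetworkRecord:
    """Build a record so that the data and its provenance are stored together."""
    return NetworkRecord(keyword, extract_context(text, keyword, window), source_url)


# Usage with a placeholder URL.
record = make_record("abc network data xyz", "network", "https://example.com/paper", window=2)
```

Keeping `source_url` on every record is what later allows a user to verify an item by calling up its source directly.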
As a preferred technical scheme of the invention, the number of pieces of data information acquired from the network according to the keywords in step S2 is at least 10 and at most 10000; the specific number to be acquired can be adjusted manually in increments of 10, and if fewer than 10 pieces of data are acquired from the computer network, the actual number acquired is displayed directly.
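The acquisition limits described above can be sketched as a small helper. Rounding a requested count down to a multiple of 10 and capping the displayed count at what is actually available are assumptions; the text states only the bounds of 10 and 10000, an adjustment granularity of 10, and direct display when fewer than 10 items are found.

```python
MIN_COUNT = 10      # lower bound stated in the text
MAX_COUNT = 10_000  # upper bound stated in the text
STEP = 10           # adjustment granularity stated in the text


def normalize_target(requested: int) -> int:
    """Clamp a requested count to [10, 10000] and round it down to a multiple of 10."""
    clamped = max(MIN_COUNT, min(MAX_COUNT, requested))
    return (clamped // STEP) * STEP


def displayed_count(requested: int, available: int) -> int:
    """Number of items to show for a request, given how many were actually found.

    When fewer than 10 items are available, the actual number is shown directly.
    """
    if available < MIN_COUNT:
        return available
    return min(normalize_target(requested), available)
```

For example, a request for 37 items is normalized to 30, and a request for 50 items when only 7 were found displays 7.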
As a preferred technical scheme of the invention, the verification analysis and splitting of network database K1 in step S3 is based on the proportion of identical or similar content among the acquired network data: sub-database K1-1 consists of network data whose identical-or-similar proportion is more than 70%, sub-database K1-2 consists of network data whose proportion is less than 70% and more than 30%, and sub-database K1-3 consists of network data whose proportion is less than 30%.
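The threshold split can be sketched as below, assuming each item already carries its identical-or-similar proportion as a number between 0 and 1. The text does not say where a proportion of exactly 70% or exactly 30% belongs; this sketch places such boundary values in the lower sub-database, which is an assumption.

```python
from typing import Dict, List, Tuple


def partition_by_similarity(records: List[Tuple[str, float]]) -> Dict[str, List[str]]:
    """Split (text, proportion) pairs into sub-databases K1-1, K1-2 and K1-3.

    K1-1: proportion above 0.70
    K1-2: proportion above 0.30 and up to 0.70 (boundary handling is assumed)
    K1-3: proportion up to 0.30 (boundary handling is assumed)
    """
    libraries: Dict[str, List[str]] = {"K1-1": [], "K1-2": [], "K1-3": []}
    for text, proportion in records:
        if proportion > 0.70:
            libraries["K1-1"].append(text)
        elif proportion > 0.30:
            libraries["K1-2"].append(text)
        else:
            libraries["K1-3"].append(text)
    return libraries


# Usage: five items, two of them on the unspecified boundaries.
result = partition_by_similarity(
    [("a", 0.9), ("b", 0.5), ("c", 0.1), ("d", 0.7), ("e", 0.3)]
)
```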
As a preferred technical scheme of the invention, the manual analysis of the three pieces of summary information in step S5 is based on one or more of: common general knowledge in the prior art, the analyst's existing knowledge, and analysis and practical investigation carried out by visiting the source of the specific collected network data.
A computer network data acquisition, analysis and management equipment, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the computer network data acquisition, analysis and management method when executing the computer program.
A computer storage medium for computer network data acquisition, analysis and management, the storage medium storing a computer program which, when executed by a processor, implements the steps of the computer network data acquisition, analysis and management method.
The embodiments of the invention provide a computer network data acquisition, analysis and management method, equipment and a storage medium, with the following beneficial effects:
1. The invention makes it possible to control the amount of network data acquired during computer network data acquisition, builds a database from the data acquired through the network, and performs self-analysis of the network data by integrating and mutually verifying the data in the database. This makes it easier for data acquisition personnel to manage massive data quickly and improves the efficiency of analyzing and managing acquired computer network data;
2. When the invention is used for computer network data acquisition, the network data acquired from the computer network consists of the context surrounding the keywords and the provenance information of the network data, so each item carries both the data and its source. During later verification, a user can check the acquired network data by calling up the data source directly, which makes the authenticity of the network data easy to verify, ensures its accuracy in later applications, and facilitates calculation and use based on the acquired network data.
Drawings
The accompanying drawings provide a further understanding of the invention and constitute a part of this specification; together with the embodiments of the invention, they serve to illustrate and explain the invention. In the drawings:
FIG. 1 is a flow chart of the computer network data acquisition, analysis and management method of the invention;
FIG. 2 is a structural diagram of the network data similarity information of the databases in the computer network data acquisition, analysis and management method of the invention.
Detailed Description
The preferred embodiments of the invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described herein serve only to illustrate and explain the invention and are not intended to limit it.
Embodiment: as shown in FIGS. 1-2, a computer network data acquisition, analysis and management method comprises the following steps:
S1: inputting the keywords for which network data is to be acquired, and starting the data mining and acquisition operation, wherein the keyword formats include Chinese characters, English letters, pictures, Arabic numerals and character strings, a keyword may be any one of these or a combination of several, and the data search may be implemented with a prior-art search engine or any of various web crawler programs;
S2: summarizing the data information acquired from the network according to the keywords to form a network database, numbered K1, wherein the network data acquired from the computer network based on the keywords consists of the context surrounding the keywords and the provenance information of the network data; when the keywords are found in a paper, the acquired network data comprises the keyword-related content of the paper and the website of the paper; the number of pieces of data information acquired from the network according to the keywords is at least 10 and at most 10000, the specific number to be acquired can be adjusted manually in increments of 10, and if fewer than 10 pieces of data are acquired from the computer network, the actual number acquired is displayed directly;
S3: integrating and mutually verifying the data in network database K1, and splitting the analyzed network data into sub-databases numbered K1-1, K1-2 and K1-3, wherein the verification analysis and splitting of network database K1 is based on the proportion of identical or similar content among the acquired network data: sub-database K1-1 consists of network data whose identical-or-similar proportion is more than 70%, sub-database K1-2 consists of network data whose proportion is less than 70% and more than 30%, and sub-database K1-3 consists of network data whose proportion is less than 30%;
S4: automatically integrating and analyzing sub-databases K1-1, K1-2 and K1-3 to obtain keyword-based summary information for each sub-database, and outputting the three resulting pieces of keyword-based summary information;
S5: manually analyzing the three output pieces of keyword-based summary information, identifying the two with the larger deviation with respect to the keywords, and feeding those two back to the system, wherein the manual analysis of the three pieces of summary information is based on one or more of: common general knowledge in the prior art, the analyst's existing knowledge, and analysis and practical investigation carried out by visiting the source of the specific network data;
S6: based on the feedback information, deleting from network database K1 the network data held in the sub-databases of the two pieces of summary information with the larger deviation, rebuilding a database from the network data remaining in K1, numbering it K1-A, and thereby completing the management of the network data;
S7: performing reference operations based on the network data in database K1-A.
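Step S3 of the embodiment leaves open how the proportion of identical or similar content is measured. One possible reading, sketched here purely as an assumption, is to count for each item the fraction of the other items in K1 that are textually similar to it. The sketch uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure, and the 0.8 similarity threshold is a hypothetical parameter.

```python
from difflib import SequenceMatcher
from typing import List


def is_similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Treat two texts as identical-or-similar when their match ratio reaches the threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def agreement_proportions(texts: List[str], threshold: float = 0.8) -> List[float]:
    """For each text, return the fraction of the other texts that are similar to it.

    A high value means many independently collected items agree with this one,
    which is one way to read "mutual verification" in step S3.
    """
    n = len(texts)
    if n < 2:
        return [0.0] * n
    proportions = []
    for i, a in enumerate(texts):
        hits = sum(
            1 for j, b in enumerate(texts) if j != i and is_similar(a, b, threshold)
        )
        proportions.append(hits / (n - 1))
    return proportions


# Usage: three agreeing items and one outlier.
proportions = agreement_proportions(["same text", "same text", "same text", "zzz"])
```

The pairwise comparison is quadratic in the number of items, which is workable for the stated maximum of 10000 items only with a cheap similarity measure; a production system would likely need an indexed near-duplicate technique instead.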
A computer network data acquisition, analysis and management equipment, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the computer network data acquisition, analysis and management method when executing the computer program.
A computer storage medium for computer network data acquisition, analysis and management, the storage medium storing a computer program which, when executed by a processor, implements the steps of the computer network data acquisition, analysis and management method.
The foregoing is only a preferred embodiment of the invention and does not limit the invention. Although the invention has been described in detail with reference to the foregoing embodiment, those skilled in the art may still modify the technical schemes described in it or make equivalent replacements of some of its technical features. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the invention shall fall within the protection scope of the invention.
Claims (8)
1. A computer network data acquisition, analysis and management method, characterized by comprising the following steps:
S1: inputting the keywords for which network data is to be acquired, and starting the data mining and acquisition operation;
S2: summarizing the data information acquired from the network according to the keywords to form a network database, numbered K1;
S3: integrating and mutually verifying the data in network database K1, and splitting the analyzed network data into sub-databases numbered K1-1, K1-2 and K1-3;
S4: automatically integrating and analyzing sub-databases K1-1, K1-2 and K1-3 to obtain keyword-based summary information for each sub-database, and outputting the three resulting pieces of keyword-based summary information;
S5: manually analyzing the three output pieces of keyword-based summary information, identifying the two with the larger deviation with respect to the keywords, and feeding those two back to the system;
S6: based on the feedback information, deleting from network database K1 the network data held in the sub-databases of the two pieces of summary information with the larger deviation, rebuilding a database from the network data remaining in K1, numbering it K1-A, and thereby completing the management of the network data;
S7: performing reference operations based on the network data in database K1-A.
2. The method according to claim 1, wherein the keyword formats for network data acquisition in step S1 include Chinese characters, English letters, pictures, Arabic numerals and character strings, a keyword may be any one of these or a combination of several, and the data search in step S1 may be implemented with a prior-art search engine or any of various web crawler programs.
3. The method according to claim 1, wherein the network data collected from the computer network based on the keywords in step S2 consists of the context surrounding the keywords and the provenance information of the network data; for example, when the keywords are found in a paper, the collected network data comprises the keyword-related content of the paper and the website of the paper.
4. The method according to claim 1, wherein the number of pieces of data information acquired from the network according to the keywords in step S2 is at least 10 and at most 10000, the specific number to be acquired can be adjusted manually in increments of 10, and if fewer than 10 pieces of data are acquired from the computer network, the actual number acquired is displayed directly.
5. The method according to claim 1, wherein the verification analysis and splitting of network database K1 in step S3 is based on the proportion of identical or similar content among the acquired network data, wherein sub-database K1-1 consists of network data whose identical-or-similar proportion is more than 70%, sub-database K1-2 consists of network data whose proportion is less than 70% and more than 30%, and sub-database K1-3 consists of network data whose proportion is less than 30%.
6. The method according to claim 1, wherein the manual analysis of the three pieces of summary information in step S5 is based on one or more of: common general knowledge in the art, the analyst's existing knowledge, and analysis and practical investigation carried out by visiting the source of the specific collected network data.
7. A computer network data acquisition, analysis and management equipment, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the computer network data acquisition, analysis and management method according to any one of claims 1 to 6 when executing the computer program.
8. A computer storage medium for computer network data acquisition, analysis and management, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the steps of the computer network data acquisition, analysis and management method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311233107.6A CN117194754A (en) | 2023-09-22 | 2023-09-22 | Computer network data acquisition, analysis and management method, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117194754A (en) | 2023-12-08 |
Family
ID=89003230
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311233107.6A Pending CN117194754A (en) | 2023-09-22 | 2023-09-22 | Computer network data acquisition, analysis and management method, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117194754A (en) |
- 2023-09-22: CN application CN202311233107.6A, published as CN117194754A, status Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||