CN112256986A - Method and device for monitoring virtual currency website, electronic equipment and storage medium - Google Patents
- Publication number
- CN112256986A (application CN202011120060.9A)
- Authority
- CN
- China
- Prior art keywords
- monitoring
- virtual currency
- target object
- preset
- service type
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the invention provide a method and a device for monitoring virtual currency websites, an electronic device and a storage medium. The method comprises: collecting monitoring objects, which comprise a plurality of websites and their html documents; screening target objects from the monitoring objects according to preset conditions; and inputting the html document of each target object into a preset virtual currency business discrimination model to determine the type of business conducted by the target object. In the embodiments, the monitoring objects (that is, a list of suspicious websites) are collected by automatically searching the keywords in a preset keyword library, periodically crawling information websites suspected of carrying illegal virtual currency content, and accepting designated website lists; a pre-trained virtual currency business discrimination model then identifies websites suspected of conducting illegal virtual currency business with improved accuracy.
Description
Technical Field
The invention belongs to the technical field of internet monitoring, and particularly relates to a method and a device for monitoring a virtual currency website, electronic equipment and a storage medium.
Background
Regulatory authorities pay close attention to websites that conduct virtual-currency-related business and actively monitor them. The methods used so far fall into two stages. In the first stage, investigators manually search relevant keywords with a search engine, visit virtual currency information websites, and collect websites through report clues and other channels; the collected websites are then manually checked and verified one by one, their website filing records are queried, and data forms are filled in. In the second stage, crawler technology is introduced: a search engine is called automatically to search specified keywords, and custom crawler programs are written for some information websites to obtain related websites. The html document of each website is then crawled, and whether the website conducts virtual-currency-related business is judged by whether it contains a small number of specific keywords (such as 'bitcoin trading'). Websites suspected of conducting related business are then visited and checked manually, and third-party interface data is introduced to supplement website filing information and the like.
The first-stage method is entirely manual, so the monitoring cycle is long and overall efficiency is low; because it is limited by human throughput, few websites are collected and monitoring coverage is limited. The second-stage method uses automated crawling, which collects more websites, widens monitoring coverage, and performs a preliminary screening of the collected websites. However, because the screening rule is too simple, the hit rate of virtual-currency-related business websites after screening remains low: it never exceeds 40% and generally fluctuates around 25%. As a result, a large number of unrelated websites enter the subsequent manual investigation step, increasing the workload of later monitoring.
Apart from these methods, no device or method dedicated to monitoring websites related to illegal virtual currency transactions is currently available on the market, and no model has been customized for the types of business related to illegal virtual currency transactions. This leads to a heavy manual investigation workload, low identification accuracy and low monitoring efficiency.
Disclosure of Invention
The embodiments of the invention provide a method and a device for monitoring a virtual currency website, an electronic device and a storage medium, which improve identification accuracy, reduce the manual investigation workload and improve monitoring efficiency.
The embodiment of the invention provides a method for monitoring a virtual currency website, which comprises the following steps:
S1: collecting monitoring objects, wherein the monitoring objects comprise a plurality of websites and their html documents;
S2: screening target objects from the monitoring objects according to preset conditions;
S3: inputting the html document of each target object into a preset virtual currency business discrimination model and determining the business type conducted by the target object, specifically comprising:
S31: checking, by regular-expression matching, whether any preset threshold keyword occurs in the input html document; if so, proceeding to S32; if not, outputting the business type as an invalid sample;
S32: classifying and scoring the html document according to a preset keyword dictionary, and calculating the final score of the html document for each business type according to the standard score corresponding to that business type, wherein the business types comprise a target business type, information, and invalid sample;
S33: determining the business type conducted by the target object based on the classification result of the html document and its final score for each business type.
In the method for monitoring a virtual currency website according to the embodiment of the invention, S1 specifically includes collecting the monitoring objects in one or more of the following ways:
calling a search engine with a preset keyword library and crawling the search results as monitoring objects, wherein the preset keyword library comprises preset keywords and newly added keywords;
or crawling information websites related to virtual currency as a source of monitoring objects;
or using manually entered or batch-imported websites as monitoring objects.
In the method for monitoring a virtual currency website according to the embodiment of the invention, S2 specifically includes:
S21: screening the monitoring objects against a whitelist and excluding websites on the whitelist to obtain the remaining monitoring objects;
S22: crawling the html documents of the remaining monitoring objects, and excluding abnormal websites according to the crawling outcome and an analysis of the html document content, to obtain the target objects.
In the method for monitoring a virtual currency website according to the embodiment of the invention, S32 specifically includes:
S321: classifying and scoring the html document according to a preset keyword dictionary, wherein the dictionary is indexed by classification keywords, and the value of each classification keyword records whether it is enabled, the business type it belongs to, its assigned value, the calculation rule it uses, its occurrence frequency, and the score calculated from the assigned value, the occurrence frequency and the calculation rule;
S322: grouping the scores by the business type to which each classification keyword belongs and summing them per business type to obtain the initial scores of the html document for the three business types, and subtracting the standard score of each business type from the corresponding initial score to obtain the final score of the html document for that business type.
In the method for monitoring a virtual currency website according to the embodiment of the invention, S33 specifically includes:
if the final score of the target business type is the highest, checking whether it is greater than 0; if so, the business type conducted by the target object is determined to be the target business type, and if not, an invalid sample;
if the final score of information is the highest, checking whether the final score of the target business type is greater than 0; if so, the business type conducted by the target object is determined to be virtual currency information, and if not, general information;
if the final score of invalid sample is the highest, the business type conducted by the target object is determined to be an invalid sample.
In the method for monitoring a virtual currency website according to the embodiment of the invention, step S3 is followed by:
S4: determining whether the business type conducted by the target object is the target business type; if so, executing S5; if not, recording the business type conducted by the target object;
S5: introducing third-party interface data to supplement third-party information about the target object, the third-party information comprising ICP filing information and IP address information; and extracting and analyzing the html document to obtain website information of the target object, the website information comprising the copyright information and ICP filing information displayed on the web page;
S6: summarizing the process data from S1 to S5 and storing it in a database.
In the method for monitoring a virtual currency website according to the embodiment of the invention, the training process of the preset virtual currency business discrimination model specifically includes:
selecting a plurality of preset samples as target objects, inputting each of them into the virtual currency business discrimination model, executing steps S31 to S33, and outputting the business types of the preset samples;
computing the accuracy of the output business types, using the actual business types of the preset samples as the reference;
correcting the preset threshold keywords and the content of the preset keyword dictionary of the virtual currency business discrimination model according to the accuracy.
The embodiment of the invention provides a device for monitoring a virtual currency website, comprising:
a monitoring object acquisition module, configured to acquire monitoring objects, the monitoring objects comprising a plurality of websites and their html documents;
a target object screening module, connected to the monitoring object acquisition module and configured to screen target objects from the monitoring objects according to preset conditions;
a business type discrimination module, connected to the target object screening module and configured to input the html document of each target object into a preset virtual currency business discrimination model and determine the business type conducted by the target object, specifically by:
S31: checking, by regular-expression matching, whether any preset threshold keyword occurs in the input html document; if so, proceeding to S32; if not, outputting the business type as an invalid sample;
S32: classifying and scoring the html document according to a preset keyword dictionary, and calculating the final score of the html document for each business type according to the standard score corresponding to that business type, wherein the business types comprise a target business type, information, and invalid sample;
S33: determining the business type conducted by the target object based on the classification result of the html document and its final score for each business type.
The embodiment of the invention provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method for monitoring a virtual currency website.
Embodiments of the invention provide a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for monitoring a virtual currency website.
With the method and device for monitoring a virtual currency website, the electronic device and the storage medium provided by the embodiments of the invention, the monitoring objects (that is, a list of suspicious websites) are collected by automatically searching the keywords in a preset keyword library, periodically crawling information websites suspected of carrying illegal virtual currency content, and accepting designated website lists, and the pre-trained virtual currency business discrimination model identifies websites suspected of conducting illegal virtual currency business with improved accuracy. Compared with the prior art, this greatly reduces the manual investigation workload and improves monitoring efficiency.
Drawings
To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for monitoring a virtual currency website according to an embodiment of the invention;
FIG. 2 is a partial flowchart of a method for monitoring a virtual currency website according to an embodiment of the invention;
FIG. 3 is a schematic diagram of the operation of a virtual currency business discrimination model according to an embodiment of the invention;
FIG. 4 is a schematic diagram of the overall operation of a method for monitoring a virtual currency website according to an embodiment of the invention;
FIG. 5 is a schematic diagram of an apparatus for monitoring a virtual currency website according to an embodiment of the invention;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIGS. 1 and 2 are schematic flowcharts of a method for monitoring a virtual currency website according to an embodiment of the invention. As shown in FIGS. 1 and 2, the method comprises:
S1: Collect monitoring objects, wherein the monitoring objects comprise a plurality of websites and their html documents.
In S1, websites suspected of being related to virtual currency are collected as monitoring objects. Specifically, any of the following three approaches can be used.
Calling a search engine with a preset keyword library and crawling the monitoring objects, wherein the preset keyword library comprises preset keywords and newly added keywords. That is, a dedicated preset keyword library is built from the experience of frontline manual investigation; new keywords selected ad hoc during the current investigation can be added; the search engine is called with the preset and newly added keywords; and the websites found are crawled. The preset keyword library is updated from time to time according to changes in suspicious websites and the experience of investigators.
Crawling information websites related to virtual currency as a source of monitoring objects. That is, virtual currency information platforms are monitored, and the websites they point to are crawled periodically.
Taking manually entered or batch-imported websites as monitoring objects. That is, websites are entered directly or imported as a website list in batches; this suits report clues and also supports targeted investigation tasks. Websites obtained through all three approaches proceed to S2. Here, a virtual currency website means a virtual currency trading website, and the monitoring objects are mainly virtual currency websites suspected of illegal trading.
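The three collection channels feed one list of monitoring objects. A minimal Python sketch of that merge step follows; the function names, the URL normalization (lower-cased host, fragment dropped, path and query kept) and the source labels are illustrative assumptions, not taken from the patent, and the search, crawling and import steps themselves are represented simply as input lists.

```python
from urllib.parse import urlsplit

def normalize_url(url: str) -> str:
    """Hypothetical normalization so that trivially different URLs collapse to one key."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url  # assume http when no scheme is given
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # .hostname already drops the port and lower-cases
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{host}{path}{query}"  # fragment deliberately dropped

def collect_monitoring_objects(search_hits, info_site_links, imported_sites):
    """Merge the three S1 channels; the first channel that yields a URL is kept as its source."""
    objects = {}
    channels = (("search", search_hits), ("info_site", info_site_links), ("import", imported_sites))
    for source, urls in channels:
        for url in urls:
            objects.setdefault(normalize_url(url), source)
    return objects  # dicts keep insertion order, so first-seen order is preserved
```

Keeping one source label per URL makes it easy to report where each site came from in the S6 summary; whether the described system records only the first source or all of them is not stated.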
S2: Screen the target objects from the monitoring objects according to preset conditions.
Specifically, S2 includes:
S21: Screen the monitoring objects against a whitelist and exclude websites on the whitelist to obtain the remaining monitoring objects. Whitelist screening is intended to exclude a portion of irrelevant websites. The whitelist mainly contains specific website domain names, such as those of government departments, colleges and universities, and mainstream large media websites. Websites on the whitelist are recorded as whitelisted for the summary data; after comparison and screening by main domain name, the remaining monitoring objects proceed to S22.
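The whitelist step can be sketched as a domain-suffix check. The whitelist entries below are hypothetical, and the matching rule (a host matches when it equals an entry or ends with "." plus the entry) is an assumption standing in for the patent's "main domain name comparison"; a production system would more likely resolve registrable domains against the Public Suffix List.

```python
from urllib.parse import urlsplit

# Hypothetical whitelist of domain suffixes (government, universities, mainstream media).
WHITELIST = {"gov.cn", "edu.cn", "people.com.cn"}

def host_of(url: str) -> str:
    """Extract the lower-cased host, assuming http when no scheme is given."""
    return (urlsplit(url if "://" in url else "http://" + url).hostname or "").lower()

def is_whitelisted(url: str, whitelist=WHITELIST) -> bool:
    host = host_of(url)
    # Match whole labels only, so "notpeople.com.cn" does not match "people.com.cn".
    return any(host == d or host.endswith("." + d) for d in whitelist)

def screen_whitelist(urls, whitelist=WHITELIST):
    """S21: split URLs into remaining monitoring objects and whitelisted sites (kept for the summary)."""
    remaining, whitelisted = [], []
    for url in urls:
        (whitelisted if is_whitelisted(url, whitelist) else remaining).append(url)
    return remaining, whitelisted
```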
S22: Crawl the html documents of the remaining monitoring objects, and exclude abnormal websites according to the crawling outcome and an analysis of the html document content to obtain the target objects.
Abnormal websites are those that cannot be opened or that open abnormally. Such a website cannot conduct business normally, so it is of no monitoring interest and is excluded. By crawling each website's html document, these abnormal cases can be identified from the crawling outcome and a simple analysis of the html document content. For each website excluded this way, the specific abnormal case is recorded for the summary data, and the remaining websites and their html documents proceed to S3.
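A minimal sketch of the S22 exclusion step, assuming each crawl produces a status code and the page text. The `FetchResult` type, the three abnormal cases and the 50-character minimum page length are hypothetical choices for illustration; the patent only says that unopenable or abnormally opening sites are excluded and their case recorded.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FetchResult:
    url: str
    status: Optional[int]  # None when the request failed outright (site cannot be opened)
    html: str = ""

def abnormal_reason(r: FetchResult, min_text_len: int = 50) -> Optional[str]:
    """Return a short description of the abnormal case, or None if the page looks normal."""
    if r.status is None:
        return "cannot be opened"
    if r.status != 200:
        return f"HTTP {r.status}"
    if len(r.html.strip()) < min_text_len:  # hypothetical threshold for an 'empty' page
        return "empty or near-empty page"
    return None

def eliminate_abnormal(results):
    """S22: keep normal pages as target objects; record the reason for every excluded site."""
    targets, records = [], {}
    for r in results:
        reason = abnormal_reason(r)
        if reason is None:
            targets.append(r)
        else:
            records[r.url] = reason
    return targets, records
```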
FIG. 3 is a schematic diagram of the operation of the virtual currency business discrimination model in the method for monitoring a virtual currency website according to the embodiment of the invention. As shown in FIG. 3, S3 (inputting the html document of the target object into the preset virtual currency business discrimination model and determining the business type conducted by the target object) specifically comprises:
S31: Check, by regular-expression matching, whether any preset threshold keyword occurs in the input html document; if so, proceed to S32; if not, output the business type as an invalid sample. That is, an html document that contains no threshold keyword is output as an invalid sample, and one that contains any threshold keyword proceeds to S32.
Regular matching here means matching with a regular expression (regex or RE), a computer-science concept generally used to retrieve and replace text that conforms to a certain pattern (rule). Threshold keywords can be selected based on multi-sample model training and practical investigation experience: the model is debugged repeatedly with discrimination accuracy as the objective, threshold keywords are added or removed, the preset keyword dictionary is modified, and the assigned values and assignment rules are determined. The discrimination result falls into three business types, namely target business type, information, and invalid sample; each website is classified according to its scores for these three types, and its business type is recorded. Threshold keywords may include, for example, bitcoin, virtual, currency and transaction.
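The S31 gate amounts to a single alternation regex over the threshold keywords. This sketch uses the English example keywords from the text as stand-ins (a real deployment would use the Chinese terms it actually monitors) and matches against the raw html, as the text describes; the return labels are hypothetical.

```python
import re

# Hypothetical threshold keywords, taken from the examples in the text.
THRESHOLD_KEYWORDS = ["bitcoin", "virtual", "currency", "transaction"]
# re.escape keeps any regex metacharacters in a keyword from being interpreted.
THRESHOLD_RE = re.compile("|".join(map(re.escape, THRESHOLD_KEYWORDS)), re.IGNORECASE)

def passes_threshold(html: str) -> bool:
    """True if at least one threshold keyword occurs anywhere in the html document."""
    return THRESHOLD_RE.search(html) is not None

def s31(html: str) -> str:
    """S31: documents without any threshold keyword are output as invalid samples."""
    return "continue to S32" if passes_threshold(html) else "invalid sample"
```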
S32: Classify and score the html document according to the preset keyword dictionary, and calculate the final score of the html document for each business type according to the standard score corresponding to that business type, wherein the business types comprise a target business type, information, and invalid sample.
S32 specifically includes:
S321: Classify and score the html document according to the preset keyword dictionary, wherein the dictionary is indexed by classification keywords, and the value of each classification keyword records whether it is enabled, the business type it belongs to, its assigned value, the calculation rule it uses, its occurrence frequency, and the score calculated from the assigned value, the occurrence frequency and the calculation rule. Classification keywords may include, for example, bitcoin, virtual, currency and transaction.
The preset keyword dictionary has the format {keyword A: (enabled or not, business type, assigned score, calculation rule, occurrence frequency, score), keyword B: (enabled or not, business type, assigned score, calculation rule, occurrence frequency, score), ...}. Only keywords whose enabled flag is 'yes' take part in the subsequent statistics; the flag makes repeated debugging of the model easier, because keywords that discriminate poorly can be disabled and helpful ones re-enabled. For each enabled keyword, the occurrence frequency is obtained by regular-expression matching and written over the initial frequency value in the dictionary, and the score is calculated from the assigned score, the frequency and the calculation rule and written over the initial score value.
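The dictionary layout and the frequency/score update described above can be sketched as follows. The keywords, business types, assigned scores and the two rule names ("linear", "once") are hypothetical examples; the patent does not publish its dictionary or its rule set. Each document is scored on a fresh copy so the preset template keeps its initial values.

```python
import copy
import re

# Hypothetical entries following the {keyword: (enabled, business type, assigned score,
# calculation rule, frequency, score)} layout; frequency and score start at 0.
KEYWORD_DICT = {
    "exchange": {"enabled": True,  "type": "target",      "assigned": 3.0, "rule": "linear", "freq": 0, "score": 0.0},
    "news":     {"enabled": True,  "type": "information", "assigned": 1.0, "rule": "once",   "freq": 0, "score": 0.0},
    "game":     {"enabled": False, "type": "invalid",     "assigned": 2.0, "rule": "linear", "freq": 0, "score": 0.0},
}

# Hypothetical calculation rules keyed by name.
RULES = {
    "linear": lambda assigned, freq: assigned * freq,            # every occurrence adds the assigned score
    "once":   lambda assigned, freq: assigned if freq else 0.0,  # counted at most once
}

def score_document(html: str, template=KEYWORD_DICT) -> dict:
    """S321: fill in frequency and score for each enabled keyword on a copy of the dictionary."""
    kd = copy.deepcopy(template)
    for keyword, entry in kd.items():
        if not entry["enabled"]:
            continue  # disabled keywords take no part in the statistics
        entry["freq"] = len(re.findall(re.escape(keyword), html, re.IGNORECASE))
        entry["score"] = RULES[entry["rule"]](entry["assigned"], entry["freq"])
    return kd
```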
S322: Group the scores by the business type to which each classification keyword belongs and sum them per business type to obtain the initial scores of the html document for the three business types; then subtract the standard score of each business type from the corresponding initial score to obtain the final score of the html document for that business type.
Each business type has a different number of keywords and different assignment and calculation rules. The information type in particular covers a wider range and has more classification keywords, so its score tends to be higher, which would give it an unreasonable advantage in the subsequent comparison. Setting a standard score therefore reduces the influence of differences in the number of classification keywords on the result, and prevents a website that does not reach the standard score from being classified as the target business type merely because it contains a very few classification keywords. The classification keywords may be identical to, overlap with, or differ from the threshold keywords.
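The S322 arithmetic is a per-type sum followed by subtraction of that type's standard score. In this sketch the standard scores are made-up numbers chosen only to show the effect described above (a higher bar for the keyword-rich information type); the patent does not give actual values.

```python
from collections import defaultdict

BUSINESS_TYPES = ("target", "information", "invalid")
# Hypothetical standard scores; information gets a higher bar because it has more keywords.
STANDARD_SCORES = {"target": 5.0, "information": 8.0, "invalid": 4.0}

def final_scores(keyword_scores, standard=STANDARD_SCORES):
    """S322: keyword_scores is an iterable of (business_type, score) pairs for enabled keywords."""
    initial = defaultdict(float)
    for biz_type, score in keyword_scores:
        initial[biz_type] += score  # initial score per business type
    # final score = initial score - standard score of that type
    return {t: initial[t] - standard[t] for t in BUSINESS_TYPES}
```

With inputs of 6 target points and 7 information points, the information type scores higher before normalization but ends at -1.0 against +1.0 for the target type once the standard scores are subtracted.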
S33: Determine the business type conducted by the target object based on the classification result of the html document and its final score for each business type.
S33 specifically includes:
if the final score of the target business type is the highest, check whether it is greater than 0; if so, the business type conducted by the target object is the target business type, and if not, an invalid sample;
if the final score of information is the highest, check whether the final score of the target business type is greater than 0; if so, the business type conducted by the target object is virtual currency information, and if not, general information;
if the final score of invalid sample is the highest, the business type conducted by the target object is an invalid sample.
The target business type is the business type that monitoring mainly focuses on, here business related to illegal virtual currency transactions. Information mainly refers to media websites, including both virtual currency information websites and general information websites. Because virtual currency information websites cover a wide range of content and often introduce target-type business, their business-discriminating keywords overlap heavily with those of target-type websites; information is therefore treated as a separate class and judged by its own characteristics. The invalid sample class covers websites unrelated to the target business type, such as gaming websites. In addition, monitoring websites that conduct business related to suspected illegal virtual currency transactions also requires attention to virtual currency information websites. Therefore, if an information website has a positive target business type score, it is recorded as 'virtual currency information' business (that is, information about suspected illegal virtual currency transactions, as distinct from 'general information' business) and proceeds to the next step together with the websites suspected of illegal virtual currency transactions, whereas websites judged to be general information or invalid samples are only recorded for data summarization.
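The S33 rules translate directly into a small decision function. The type labels are hypothetical strings, and the tie-breaking order (target, then information, then invalid) is an assumption because the text does not say how equal top scores are handled. Note that the information branch tests the target score, not the information score, exactly as the rules state.

```python
BUSINESS_TYPES = ("target", "information", "invalid")

def decide(final: dict) -> str:
    """S33: map final scores for the three business types to the recorded business type."""
    # max() returns the first maximal element, so ties go to the earlier type in BUSINESS_TYPES.
    best = max(BUSINESS_TYPES, key=lambda t: final[t])
    if best == "target":
        return "target business type" if final["target"] > 0 else "invalid sample"
    if best == "information":
        # The information branch checks the *target* score to separate the two information kinds.
        return "virtual currency information" if final["target"] > 0 else "general information"
    return "invalid sample"
```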
FIG. 4 is a schematic diagram of the overall operation of the method for monitoring a virtual currency website according to the embodiment of the invention. As shown in FIG. 4, the method further comprises, after S3:
S4: Determine whether the business type conducted by the target object is the target business type; if so, execute S5; if not, record the business type conducted by the target object.
S5: Introduce third-party interface data to supplement third-party information about the target object, the third-party information comprising ICP filing information and IP address information; and extract and analyze the html document to obtain website information of the target object, the website information comprising the copyright information and ICP (Internet Content Provider) filing information displayed on the web page. (Some web pages display expired ICP filing information that can no longer be obtained through a third-party data interface; the displayed ICP filing information is useful for tracing the operating entity.)
S6: Summarize the process data from S1 to S5 and store it in a database. The process data are the recorded results for all websites involved in these steps, used for subsequent display or export.
Preferably, the training process of the preset virtual currency business discrimination model specifically includes:
selecting a plurality of preset samples as target objects, respectively inputting the plurality of preset samples into the virtual currency business discrimination model, executing the steps from S31 to S33, and outputting the business types of the plurality of preset samples;
counting the accuracy of the service types of the output preset samples by taking the actual service types of the preset samples as reference;
and correcting the preset threshold keywords and the preset keyword dictionary content of the virtual currency business discrimination model according to the accuracy.
Several points apply to building this model. First, the preset virtual currency business discrimination model is refined through repeated training: errors in the training samples are analysed and summarised, the model is adjusted accordingly, and the training process is repeated until its accuracy meets the required standard. Only then is the model used to monitor virtual currency websites in this embodiment, which ensures a certain level of accuracy. Second, because suspicious websites are found by searching with keywords from the keyword library, those search keywords usually cannot distinguish between service types. For example, the HTML documents of websites found by searching for "Bitcoin" generally all contain the keyword "Bitcoin". If the model gave "Bitcoin" a large weight toward the target service type, the overall result would be biased toward the target service type: the keyword would not only fail to distinguish service types but could also cause samples to be misclassified as the target service type. The selection and weighting of model keywords therefore require careful consideration. Third, the rule for calculating each keyword's score may differ from keyword to keyword, depending on how well that keyword discriminates. For example, "finance" leans toward the information category, but a target-service-type website may also have a news section, so the score for this word should have a modest upper limit. "United Nations" leans more strongly toward the information category, and the more often it appears, the more likely the site is an information website, so its score can be tied to its frequency.
Finally, some keywords are ambiguous: when they occur only a few times they point toward the target service type, but when they occur very often they point toward information or an invalid sample.
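The per-keyword calculation rules described above (a capped score, a frequency-linked score, and a frequency-dependent switch between service types) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `KeywordRule` fields, the rule names `"capped"`, `"linear"` and `"split"`, and all weights and thresholds are hypothetical. A pure search keyword such as "Bitcoin" would, following the reasoning above, receive a low weight or be left out entirely.

```python
from dataclasses import dataclass


@dataclass
class KeywordRule:
    """One hypothetical keyword-dictionary entry (all values illustrative)."""
    service_type: str            # service type the keyword counts toward
    weight: float                # points per occurrence (the "assigned value")
    rule: str = "linear"         # "capped", "linear" or "split" (assumed names)
    cap: float = float("inf")    # score ceiling used by the "capped" rule
    split_at: int = 0            # frequency threshold used by the "split" rule
    high_type: str = ""          # type favoured when frequency exceeds split_at


def score_keyword(entry: KeywordRule, freq: int) -> tuple[str, float]:
    """Return (service_type, score) contributed by a keyword seen `freq` times."""
    if freq == 0:
        return entry.service_type, 0.0
    if entry.rule == "capped":
        # e.g. "finance": leans toward information, but target sites may also
        # carry a news area, so the contribution is limited to a modest ceiling
        return entry.service_type, min(entry.weight * freq, entry.cap)
    if entry.rule == "split":
        # ambiguous keyword: few occurrences favour one type, many another
        if freq > entry.split_at:
            return entry.high_type, entry.weight * freq
        return entry.service_type, entry.weight * freq
    # "linear": e.g. "United Nations", the score grows with frequency
    return entry.service_type, entry.weight * freq
```

For example, `score_keyword(KeywordRule("information", 2.0, "capped", cap=6.0), 10)` returns `("information", 6.0)`: ten occurrences of a capped keyword contribute no more than its ceiling.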
The service type discrimination model is optimised for accuracy. Thousands of website HTML documents are selected for training, and accuracy is measured against the results of manually reviewing those websites. Each result is classified as correct or erroneous, and errors are further divided into general errors and serious errors. A correct result means the service type was determined without error; a general error means that information and invalid samples were confused with each other; and a serious error means that a target-service-type website was judged as a non-target service type (information or invalid sample), or that a non-target website was wrongly judged as the target service type. To prevent overfitting, the finally selected model has an accuracy above 80% and a serious error rate within 5%.
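The evaluation scheme above (correct, general error, serious error, with an acceptance bar of accuracy above 80% and serious error rate within 5%) can be expressed as a short sketch. The label strings `"target"`, `"information"` and `"invalid"` and the function names are assumptions made for illustration.

```python
from collections import Counter

TARGET = "target"  # assumed label for the target service type


def classify_outcome(predicted: str, actual: str) -> str:
    """Label one prediction against the manually verified service type."""
    if predicted == actual:
        return "correct"
    if predicted == TARGET or actual == TARGET:
        # target judged as non-target, or non-target judged as target
        return "serious"
    # remaining case: information and invalid sample confused with each other
    return "general"


def evaluate(pairs):
    """pairs: iterable of (predicted, actual). Returns (accuracy, serious_rate)."""
    counts = Counter(classify_outcome(p, a) for p, a in pairs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no samples to evaluate")
    return counts["correct"] / total, counts["serious"] / total


def meets_standard(accuracy, serious_rate, min_acc=0.80, max_serious=0.05):
    """Acceptance bar quoted in the text: accuracy > 80%, serious errors <= 5%."""
    return accuracy > min_acc and serious_rate <= max_serious
```

With 17 correct results, 2 general errors and 1 serious error out of 20 samples, `evaluate` gives an accuracy of 0.85 and a serious error rate of 0.05, which just meets the standard.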
In this embodiment, the monitoring objects (that is, the list of suspicious websites) are collected by automatically searching with keywords from the preset keyword library, by regularly crawling information websites suspected of carrying illegal virtual currency content, and from designated website lists. The pre-trained virtual currency business discrimination model then improves identification accuracy when identifying websites suspected of conducting illegal virtual currency business, which improves monitoring efficiency. In practical tests, the embodiment of the invention ran with a better and more stable overall effect; compared with prior-art methods, it greatly improved accuracy, greatly reduced the burden of manual checking, and improved monitoring efficiency.
Fig. 5 is a schematic diagram of an apparatus for monitoring a virtual currency website according to an embodiment of the present invention. As shown in Fig. 5, the apparatus includes:
the monitoring object acquisition module 10, configured to collect monitoring objects, wherein the monitoring objects comprise a plurality of websites and their HTML documents;
the target object screening module 20, connected to the monitoring object acquisition module 10 and configured to screen target objects from the monitoring objects according to preset conditions;
the service type discrimination module 30, connected to the target object screening module 20 and configured to input the HTML document of the target object into a preset virtual currency business discrimination model and determine the service type conducted by the target object, specifically comprising:
S31: using regular-expression matching to check whether the input HTML document contains any preset threshold keyword; if so, go to S32; if not, output the service type as invalid sample;
S32: classifying and scoring the HTML document according to a preset keyword dictionary, and calculating the final score of the HTML document for each service type according to the standard score corresponding to that service type, wherein the service types comprise the target service type, information and invalid sample;
S33: determining the service type conducted by the target object based on the classification result of the HTML document and its final score for each service type.
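Step S31 acts as a cheap regular-expression gate in front of the scoring in S32 and the decision in S33. The sketch below assumes a hypothetical, ASCII-only threshold keyword list (a real deployment would use the model's own, likely Chinese, list) and takes S32 and S33 as caller-supplied functions, so only the control flow of S31 to S33 is shown.

```python
import re

# Hypothetical threshold keywords; the real list belongs to the trained model.
THRESHOLD_KEYWORDS = ["bitcoin", "ethereum", "usdt", "blockchain", "digital currency"]

# One alternation pattern; re.escape keeps keywords from acting as regex syntax.
THRESHOLD_RE = re.compile(
    "|".join(re.escape(k) for k in THRESHOLD_KEYWORDS), re.IGNORECASE
)


def passes_threshold(html: str) -> bool:
    """S31: True if the HTML document contains at least one threshold keyword."""
    return THRESHOLD_RE.search(html) is not None


def discriminate(html, score_fn, decide_fn):
    """Wire S31 to S33 together; score_fn stands in for S32, decide_fn for S33."""
    if not passes_threshold(html):
        return "invalid"            # S31 failed: output invalid sample
    final_scores = score_fn(html)   # S32: per-service-type final scores
    return decide_fn(final_scores)  # S33: pick the service type
```

Rejecting documents early in S31 means most unrelated pages never reach the more expensive dictionary scoring.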
The working principle of the device for monitoring a virtual currency website in the embodiment of the present application is corresponding to that of the method for monitoring a virtual currency website in the embodiment described above, and details are not repeated here.
Fig. 6 illustrates the physical structure of an electronic device, which may include a processor 810, a communications interface 820, a memory 830 and a communication bus 840, wherein the processor 810, the communications interface 820 and the memory 830 communicate with one another via the communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform a method of monitoring a virtual currency website, the method comprising:
S1: collecting monitoring objects, wherein the monitoring objects comprise a plurality of websites and their HTML documents;
S2: screening target objects from the monitoring objects according to preset conditions;
S3: inputting the HTML document of the target object into a preset virtual currency business discrimination model and determining the service type conducted by the target object, specifically comprising:
S31: using regular-expression matching to check whether the input HTML document contains any preset threshold keyword; if so, go to S32; if not, output the service type as invalid sample;
S32: classifying and scoring the HTML document according to a preset keyword dictionary, and calculating the final score of the HTML document for each service type according to the standard score corresponding to that service type, wherein the service types comprise the target service type, information and invalid sample;
S33: determining the service type conducted by the target object based on the classification result of the HTML document and its final score for each service type.
In addition, the logic instructions in the memory 830 may be implemented in the form of software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
In another aspect, an embodiment of the present invention further provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform a method of monitoring a virtual currency website, the method comprising:
S1: collecting monitoring objects, wherein the monitoring objects comprise a plurality of websites and their HTML documents;
S2: screening target objects from the monitoring objects according to preset conditions;
S3: inputting the HTML document of the target object into a preset virtual currency business discrimination model and determining the service type conducted by the target object, specifically comprising:
S31: using regular-expression matching to check whether the input HTML document contains any preset threshold keyword; if so, go to S32; if not, output the service type as invalid sample;
S32: classifying and scoring the HTML document according to a preset keyword dictionary, and calculating the final score of the HTML document for each service type according to the standard score corresponding to that service type, wherein the service types comprise the target service type, information and invalid sample;
S33: determining the service type conducted by the target object based on the classification result of the HTML document and its final score for each service type.
In yet another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs a method of monitoring a virtual currency website, the method comprising:
S1: collecting monitoring objects, wherein the monitoring objects comprise a plurality of websites and their HTML documents;
S2: screening target objects from the monitoring objects according to preset conditions;
S3: inputting the HTML document of the target object into a preset virtual currency business discrimination model and determining the service type conducted by the target object, specifically comprising:
S31: using regular-expression matching to check whether the input HTML document contains any preset threshold keyword; if so, go to S32; if not, output the service type as invalid sample;
S32: classifying and scoring the HTML document according to a preset keyword dictionary, and calculating the final score of the HTML document for each service type according to the standard score corresponding to that service type, wherein the service types comprise the target service type, information and invalid sample;
S33: determining the service type conducted by the target object based on the classification result of the HTML document and its final score for each service type.
The apparatus embodiments described above are merely illustrative. Units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A method of monitoring a virtual currency website, comprising:
S1: collecting monitoring objects, wherein the monitoring objects comprise a plurality of websites and their HTML documents;
S2: screening target objects from the monitoring objects according to preset conditions;
S3: inputting the HTML document of the target object into a preset virtual currency business discrimination model and determining the service type conducted by the target object, specifically comprising:
S31: using regular-expression matching to check whether the input HTML document contains any preset threshold keyword; if so, go to S32; if not, output the service type as invalid sample;
S32: classifying and scoring the HTML document according to a preset keyword dictionary, and calculating the final score of the HTML document for each service type according to the standard score corresponding to that service type, wherein the service types comprise the target service type, information and invalid sample;
S33: determining the service type conducted by the target object based on the classification result of the HTML document and its final score for each service type.
2. The method for monitoring a virtual currency website according to claim 1, wherein S1 specifically comprises:
calling a search engine with keywords from a preset keyword library to crawl monitoring objects, wherein the preset keyword library comprises preset keywords and newly added keywords;
or crawling information websites related to virtual currency as monitoring objects;
or using manually entered websites or batch-imported websites as monitoring objects.
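The three collection channels of claim 2 all yield URLs that must be merged into one monitoring list. The sketch below assumes the search-engine calls and crawling have already produced URL lists, and it de-duplicates by host name; host-level de-duplication and the function names are assumptions, since the claim does not say how overlapping sources are merged.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to a lowercase host so each site is monitored only once."""
    if "://" not in url:
        url = "http://" + url          # bare domains from manual/batch input
    host = urlsplit(url.strip()).hostname or ""   # hostname is lowercased
    return host.removeprefix("www.")


def collect_monitoring_objects(search_results, info_site_links, imported):
    """Merge the three S1 sources into one de-duplicated list of hosts.

    search_results:  URLs returned by a search engine for keyword-library queries
    info_site_links: URLs crawled from virtual-currency information websites
    imported:        manually entered or batch-imported URLs
    """
    seen, merged = set(), []
    for source in (search_results, info_site_links, imported):
        for url in source:
            host = normalize(url)
            if host and host not in seen:
                seen.add(host)
                merged.append(host)
    return merged
```

Keeping the first-seen order preserves which channel discovered a site first, which can be useful when the process data is later summarised.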
3. The method for monitoring a virtual currency website according to claim 1, wherein S2 specifically comprises:
S21: screening the monitoring objects against a whitelist and eliminating websites on the whitelist to obtain the remaining monitoring objects;
S22: crawling the HTML documents of the remaining monitoring objects, and eliminating abnormal websites according to the crawl results and an analysis of the HTML document contents to obtain the target objects.
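Steps S21 and S22 can be sketched as a two-stage filter. The claim does not define "abnormal"; the checks below (non-200 status, a near-empty page, a parked-domain phrase), the `FetchResult` type and the `min_text_len` threshold are all hypothetical stand-ins, and `fetch` is a caller-supplied crawler.

```python
from dataclasses import dataclass


@dataclass
class FetchResult:
    """Hypothetical crawl result for one host."""
    host: str
    status: int   # HTTP status code of the crawl (0 if the request failed)
    html: str


def screen_targets(hosts, whitelist, fetch, min_text_len=200):
    """S21 + S22: drop whitelisted hosts, then drop sites whose crawl looks abnormal."""
    remaining = [h for h in hosts if h not in whitelist]   # S21: whitelist screen
    targets = []
    for host in remaining:                                 # S22: crawl and inspect
        result = fetch(host)
        if result.status != 200:
            continue            # unreachable or error page
        if len(result.html) < min_text_len:
            continue            # empty or placeholder page
        if "domain is for sale" in result.html.lower():
            continue            # parked domain
        targets.append(result)
    return targets
```

Applying the whitelist before crawling avoids fetching known legitimate sites at all.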
4. The method for monitoring a virtual currency website according to claim 1, wherein S32 specifically comprises:
S321: classifying and scoring the HTML document according to a preset keyword dictionary, wherein the preset keyword dictionary is indexed by classification keyword, and the value of each classification keyword comprises whether the keyword is enabled, the service type(s) to which it belongs, its assigned value, the calculation rule used, its occurrence frequency, and the score calculated from the assigned value, the occurrence frequency and the calculation rule;
S322: classifying the HTML document according to the service types to which the classification keywords belong, summing the scores for each service type to obtain initial scores of the HTML document for the three service types, and subtracting the standard score of each service type from the corresponding initial score to obtain the final score of the HTML document for that service type.
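The dictionary layout of S321 and the summation in S322 can be sketched as follows. The keywords, assigned values, caps and standard scores are invented for illustration, each keyword is assumed to belong to exactly one service type, and only two calculation rules (linear and capped) are modelled.

```python
import re

# Hypothetical keyword dictionary indexed by keyword; fields mirror claim 4.
KEYWORD_DICT = {
    "exchange": {"enabled": True,  "type": "target",      "value": 3.0, "rule": "linear"},
    "recharge": {"enabled": True,  "type": "target",      "value": 2.0, "rule": "capped", "cap": 6.0},
    "news":     {"enabled": True,  "type": "information", "value": 1.0, "rule": "linear"},
    "for sale": {"enabled": True,  "type": "invalid",     "value": 2.0, "rule": "linear"},
    # a search keyword: disabled because it does not separate service types
    "bitcoin":  {"enabled": False, "type": "target",      "value": 1.0, "rule": "linear"},
}
# Hypothetical standard score per service type.
STANDARD_SCORES = {"target": 5.0, "information": 3.0, "invalid": 0.0}


def s321_score_keywords(html):
    """Fill in occurrence frequency and score for every enabled keyword."""
    text = html.lower()
    scored = {}
    for kw, entry in KEYWORD_DICT.items():
        if not entry["enabled"]:
            continue
        freq = len(re.findall(re.escape(kw), text))
        raw = entry["value"] * freq
        score = min(raw, entry["cap"]) if entry["rule"] == "capped" else raw
        scored[kw] = {**entry, "freq": freq, "score": score}
    return scored


def s322_final_scores(scored):
    """Sum scores per service type, then subtract that type's standard score."""
    initial = {t: 0.0 for t in STANDARD_SCORES}
    for entry in scored.values():
        initial[entry["type"]] += entry["score"]
    return {t: initial[t] - STANDARD_SCORES[t] for t in STANDARD_SCORES}
```

For the text `"Exchange exchange recharge recharge recharge recharge news bitcoin"`, "exchange" scores 6.0, "recharge" is capped at 6.0, "news" scores 1.0 and the disabled "bitcoin" is ignored, giving final scores of 7.0 (target), -2.0 (information) and 0.0 (invalid).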
5. The method for monitoring a virtual currency website according to claim 1, wherein S33 specifically comprises:
if the final score of the target service type is the highest, judging whether it is greater than 0; if so, determining that the service type conducted by the target object is the target service type, and if not, determining that it is an invalid sample;
if the final score of information is the highest, judging whether the final score of the target service type is greater than 0; if so, determining that the service type conducted by the target object is virtual currency information, and if not, determining that it is general information;
and if the final score of the invalid sample is the highest, determining that the service type conducted by the target object is an invalid sample.
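The decision rules of claim 5 translate directly into a small function over the per-type final scores. The label strings are assumptions. The claim does not say how ties are broken; here `max` returns the first key with the highest score, so the dictionary order (target, information, invalid) decides ties.

```python
def s33_decide(final):
    """Apply the claim-5 decision rules to per-service-type final scores."""
    best = max(final, key=final.get)   # ties resolved by dict order (assumption)
    if best == "target":
        return "target" if final["target"] > 0 else "invalid"
    if best == "information":
        # information wins, but a positive target score marks it as
        # virtual-currency-related information
        return ("virtual currency information" if final["target"] > 0
                else "general information")
    return "invalid"
```

For example, final scores of 1.0 (target), 2.0 (information) and 0.0 (invalid) yield "virtual currency information".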
6. The method for monitoring a virtual currency website according to claim 1, further comprising, after S3:
S4: judging whether the service type conducted by the target object is the target service type; if so, executing S5; if not, recording the service type conducted by the target object;
S5: introducing third-party interface data to supplement third-party information about the target object, wherein the third-party information comprises ICP filing information and IP address information; and extracting and analysing the HTML document to obtain website information of the target object, wherein the website information comprises the copyright information and ICP filing information displayed on the web page;
S6: summarising the process data of S1 to S5 and storing it in a database.
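Steps S4 to S6 can be sketched with Python's built-in `sqlite3` module. The table name `monitoring_log`, the JSON record layout and the two enrichment callables (`lookup_third_party` for ICP filing and IP data, `extract_site_info` for page-level copyright and ICP text) are hypothetical; the claim does not name the third-party interface or the database.

```python
import json
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table holding one JSON record per monitored site."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS monitoring_log ("
        "host TEXT, service_type TEXT, record TEXT)"
    )


def s4_to_s6(conn, host, service_type, process_data,
             lookup_third_party, extract_site_info):
    """S4-S6: enrich target-type sites, then store the whole process record.

    lookup_third_party(host) -> dict  e.g. ICP filing and IP address (S5)
    extract_site_info(html)  -> dict  e.g. copyright and ICP text on the page (S5)
    """
    record = dict(process_data, host=host, service_type=service_type)
    if service_type == "target":                 # S4: only target sites go to S5
        record["third_party"] = lookup_third_party(host)
        record["site_info"] = extract_site_info(process_data.get("html", ""))
    # S6: summarise the process data and persist it (non-target types are
    # still recorded, as S4 requires)
    conn.execute(
        "INSERT INTO monitoring_log (host, service_type, record) VALUES (?, ?, ?)",
        (host, service_type, json.dumps(record, ensure_ascii=False)),
    )
    conn.commit()
```

Storing the record as JSON keeps the schema stable while the set of collected fields evolves.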
7. The method for monitoring a virtual currency website according to any one of claims 1 to 6, wherein the training process of the preset virtual currency business discrimination model specifically comprises:
selecting a plurality of preset samples as target objects, inputting each of them into the virtual currency business discrimination model, executing S31 to S33, and outputting the service types of the preset samples;
counting the accuracy of the output service types of the preset samples, using the actual service types of the preset samples as reference;
and correcting the preset threshold keywords and the preset keyword dictionary content of the virtual currency business discrimination model according to the accuracy.
8. An apparatus for monitoring a virtual currency website, comprising:
a monitoring object acquisition module, configured to collect monitoring objects, wherein the monitoring objects comprise a plurality of websites and their HTML documents;
a target object screening module, connected to the monitoring object acquisition module and configured to screen target objects from the monitoring objects according to preset conditions;
a service type discrimination module, connected to the target object screening module and configured to input the HTML document of the target object into a preset virtual currency business discrimination model and determine the service type conducted by the target object, specifically comprising:
S31: using regular-expression matching to check whether the input HTML document contains any preset threshold keyword; if so, go to S32; if not, output the service type as invalid sample;
S32: classifying and scoring the HTML document according to a preset keyword dictionary, and calculating the final score of the HTML document for each service type according to the standard score corresponding to that service type, wherein the service types comprise the target service type, information and invalid sample;
S33: determining the service type conducted by the target object based on the classification result of the HTML document and its final score for each service type.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, carries out the steps of the method of monitoring a virtual currency website according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the method of monitoring a virtual currency website according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011120060.9A (CN112256986A) | 2020-10-19 | 2020-10-19 | Method and device for monitoring virtual currency website, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112256986A | 2021-01-22 |
Family ID: 74244907
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011120060.9A (CN112256986A), Withdrawn | Method and device for monitoring virtual currency website, electronic equipment and storage medium | 2020-10-19 | 2020-10-19 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112256986A |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090006391A1 * | 2007-06-27 | 2009-01-01 | T Reghu Ram | Automatic categorization of document through tagging |
CN101593200A * | 2009-06-19 | 2009-12-02 | 淮海工学院 | Chinese Web page classification method based on keyword frequency analysis |
CN103927302A * | 2013-01-10 | 2014-07-16 | 阿里巴巴集团控股有限公司 | Text classification method and system |
CN106484919A * | 2016-11-15 | 2017-03-08 | 任子行网络技术股份有限公司 | Industrial sustainability sorting technique based on webpage autonomous word and system |
CN107766481A * | 2017-10-13 | 2018-03-06 | 国家计算机网络与信息安全管理中心 | Method and system for finding Internet financial platforms |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20210122 |