CN112256987A - Method, device, equipment and storage medium for monitoring overseas stock trading website - Google Patents


Publication number
CN112256987A
Authority
CN
China
Prior art keywords
monitoring
target object
preset
website
service type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011121231.XA
Other languages
Chinese (zh)
Inventor
张黎娜
黄琨
葛子川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Internet Finance Association
Original Assignee
China Internet Finance Association
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Internet Finance Association
Priority to CN202011121231.XA
Publication of CN112256987A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides a method, a device, equipment and a storage medium for monitoring overseas stock trading websites. The method comprises: collecting monitoring objects, wherein the monitoring objects comprise a plurality of websites and their HTML documents; screening target objects from the monitoring objects according to preset conditions; and inputting the HTML document of each target object into a preset overseas stock trading business discrimination model to discriminate the business type conducted by the target object. The monitoring objects, i.e., a list of suspicious websites, are collected by automatically searching for keywords from a preset keyword library, periodically crawling information websites suspected of covering overseas stock trading, and accepting designated website lists. The pre-trained overseas stock trading business discrimination model improves identification accuracy and identifies websites suspected of conducting overseas stock trading business. Compared with the prior art, the manual investigation workload is greatly reduced and monitoring efficiency is improved.

Description

Method, device, equipment and storage medium for monitoring overseas stock trading website
Technical Field
The invention belongs to the technical field of internet monitoring, and particularly relates to a method, a device, equipment and a storage medium for monitoring an overseas stock trading website.
Background
In the prior art, regulators pay close attention to websites that conduct overseas stock trading related business and actively carry out monitoring work. The methods used so far fall into two stages. In the first stage, relevant keywords are searched manually with a search engine to collect relevant websites; the websites obtained are then opened manually one by one for checking and verification, their ICP filing information and the like are looked up, and a data information form is filled in. The second stage introduces crawler technology: a search engine is called automatically to search specified keywords, and customized crawler programs crawl a small number of information websites to obtain relevant websites; the HTML document of each website is then crawled, and whether the website conducts overseas stock trading related business is judged by whether the document contains a small number of specific keywords (such as "stock trading"); websites suspected of conducting such business are then visited and checked manually, and third-party interface data is introduced to supplement the ICP filing information and the like.
The first-stage method is entirely manual: the monitoring cycle is long and overall efficiency is low, and because the method is limited by human throughput, few websites are collected and monitoring coverage is limited. The second-stage method uses automated crawling, so it can collect more websites, widen monitoring coverage and perform a preliminary screening of the collected websites. However, because the screening rule is too simple, the hit rate for overseas stock trading related websites after screening remains low: it never exceeds 40% and generally fluctuates around 25%. A large number of irrelevant websites therefore enter the subsequent manual investigation stage, which increases the workload of subsequent monitoring.
Besides the above methods, there is currently no product or method on the market dedicated to monitoring websites that conduct overseas stock trading related business. Although related technologies help improve monitoring efficiency, they involve no dedicated research or model customization for overseas stock trading business types and do not incorporate front-line manual investigation experience, so they cannot meet regulatory requirements.
Disclosure of Invention
The embodiment of the invention provides a method, a device, equipment and a storage medium for monitoring overseas stock trading websites, which can improve the identification accuracy, reduce the manual investigation pressure and improve the monitoring efficiency.
The embodiment of the invention provides a method for monitoring an overseas stock trading website, which comprises the following steps:
S1: collecting monitoring objects, wherein the monitoring objects comprise a plurality of websites and their HTML documents;
S2: screening target objects from the monitoring objects according to preset conditions;
S3: inputting the HTML document of the target object into a preset overseas stock trading business discrimination model and discriminating the business type conducted by the target object, which specifically comprises:
S31: using regular matching to search the input HTML document for preset threshold keywords; if any is found, proceeding to S32; if not, outputting the business type as an invalid sample;
S32: classifying and scoring the HTML document according to a preset keyword dictionary, and calculating the final score of the HTML document for each business type according to the standard score corresponding to that business type, wherein the business types comprise a target business type, information, and an invalid sample;
S33: discriminating the business type conducted by the target object based on the classification result of the HTML document and the final score for each business type.
In the method for monitoring overseas stock trading websites according to the embodiment of the present invention, S1 specifically includes collecting the monitoring objects in any of the following ways:
calling a search engine according to a preset keyword library and crawling the monitoring objects, wherein the preset keyword library comprises preset keywords and newly added keywords;
or crawling information websites related to overseas stock trading websites to obtain the monitoring objects;
or taking manually entered websites or batch-imported websites as the monitoring objects.
In the method for monitoring overseas stock trading websites according to the embodiment of the present invention, S2 specifically includes:
S21: screening the monitoring objects against a whitelist and excluding websites on the whitelist to obtain the remaining monitoring objects;
S22: crawling the HTML documents of the remaining monitoring objects and excluding abnormal websites according to the crawl outcome and an analysis of the HTML document content, to obtain the target objects.
In the method for monitoring overseas stock trading websites according to the embodiment of the present invention, S32 specifically includes:
S321: classifying and scoring the HTML document according to a preset keyword dictionary, wherein the preset keyword dictionary is indexed by classification keyword, and the value for each classification keyword comprises whether the keyword is enabled, the business type it belongs to, its assigned score, the calculation rule used, its occurrence frequency, and the score calculated from the assigned score, the occurrence frequency and the calculation rule;
S322: grouping the classification keywords by the business type to which they belong and summing the scores for each business type to obtain the initial scores of the HTML document for the three business types, then subtracting the standard score of each business type from the corresponding initial score to obtain the final score of the HTML document for that business type.
In the method for monitoring overseas stock trading websites according to the embodiment of the present invention, S33 specifically includes:
if the final score of the target business type is the highest, determining whether that final score is greater than 0; if so, the business type conducted by the target object is determined to be the target business type, and if not, an invalid sample;
if the final score of the information type is the highest, determining whether the final score of the target business type is greater than 0; if so, the business type conducted by the target object is determined to be overseas stock trading information, and if not, general information;
and if the final score of the invalid sample type is the highest, the business type conducted by the target object is determined to be an invalid sample.
In the method for monitoring overseas stock trading websites according to the embodiment of the present invention, the method further includes, after S3:
S4: determining whether the business type conducted by the target object is the target business type; if so, executing S5; if not, recording the business type conducted by the target object;
S5: introducing third-party interface data to supplement third-party information on the target object, wherein the third-party information comprises ICP filing information and IP address information; and extracting and parsing the HTML document to obtain website information on the target object, wherein the website information comprises copyright information and the ICP filing information displayed on the webpage;
S6: summarizing the process data from S1 to S5 and storing it in a database.
In the method for monitoring overseas stock trading websites according to the embodiment of the present invention, the training process of the preset overseas stock trading business discrimination model specifically includes:
selecting a plurality of preset samples as target objects, inputting each of them into the overseas stock trading business discrimination model, executing S31 to S33, and outputting the business types of the preset samples;
computing the accuracy of the output business types, taking the actual business types of the preset samples as the reference;
and revising the preset threshold keywords and the content of the preset keyword dictionary of the overseas stock trading business discrimination model according to the accuracy.
An embodiment of the invention provides a device for monitoring overseas stock trading websites, comprising:
a monitoring object collection module, configured to collect monitoring objects, wherein the monitoring objects comprise a plurality of websites and their HTML documents;
a target object screening module, connected to the monitoring object collection module and configured to screen target objects from the monitoring objects according to preset conditions;
a business type discrimination module, connected to the target object screening module and configured to input the HTML document of the target object into a preset overseas stock trading business discrimination model and discriminate the business type conducted by the target object, specifically by:
S31: using regular matching to search the input HTML document for preset threshold keywords; if any is found, proceeding to S32; if not, outputting the business type as an invalid sample;
S32: classifying and scoring the HTML document according to a preset keyword dictionary, and calculating the final score of the HTML document for each business type according to the standard score corresponding to that business type, wherein the business types comprise a target business type, information, and an invalid sample;
S33: discriminating the business type conducted by the target object based on the classification result of the HTML document and the final score for each business type.
The embodiment of the invention provides electronic equipment, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor executes the program to realize the steps of the method for monitoring the overseas stock trading website.
Embodiments of the present invention provide a non-transitory computer readable storage medium having stored thereon a computer program that, when executed by a processor, performs the steps of the method for monitoring an overseas stock trading website.
The embodiments of this scheme collect the monitoring objects, i.e., a list of suspicious websites, by automatically searching for keywords in the preset keyword library, periodically crawling overseas stock trading information websites, and accepting designated website lists. They use the pre-trained overseas stock trading business discrimination model to improve identification accuracy and to identify websites suspected of conducting overseas stock trading business, which improves monitoring efficiency. In practical tests, the embodiments of the invention ran better and more stably overall; compared with the prior-art methods, accuracy is greatly improved, the manual investigation workload is greatly reduced, and monitoring efficiency is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a flowchart of a method for monitoring overseas stock trading websites according to an embodiment of the present invention;
Fig. 2 is a partial flowchart of a method for monitoring overseas stock trading websites according to an embodiment of the present invention;
Fig. 3 is a schematic diagram illustrating the operation of the overseas stock trading business discrimination model according to an embodiment of the present invention;
Fig. 4 is a schematic overall operation diagram of a method for monitoring overseas stock trading websites according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a device for monitoring overseas stock trading websites according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Figs. 1-2 are schematic flowcharts of a method for monitoring overseas stock trading websites according to an embodiment of the present invention. As shown in Figs. 1-2, the method includes:
S1: collecting monitoring objects, wherein the monitoring objects comprise a plurality of websites and their HTML documents.
In S1, websites suspected of being related to overseas stock trading are collected. Specifically, the following three approaches can be adopted:
Calling a search engine according to a preset keyword library and crawling the monitoring objects, wherein the preset keyword library comprises preset keywords and newly added keywords. That is, a dedicated preset keyword library is built from front-line manual investigation experience, and keywords chosen ad hoc during the current investigation can be added; a search engine is called with the preset keywords and the newly added keywords, and the websites found are crawled. The preset keyword library is updated from time to time according to developments among suspicious websites and the experience of the investigators.
Crawling information websites related to overseas stock trading websites to obtain monitoring objects. That is, overseas stock trading information platforms are monitored, and the websites that these platforms point to are crawled periodically.
Taking manually entered websites or batch-imported websites as monitoring objects. That is, a website is entered directly or a website list is imported in batches; this approach suits reported leads and also supports targeted investigation tasks. The websites obtained by all three approaches enter S2. The monitoring objects are mainly websites suspected of being overseas stock trading websites.
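The merging of the three collection channels can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the `source` labels, and the choice to de-duplicate by lowercase host name are all assumptions, and the actual search-engine calls and crawling are left out.

```python
from urllib.parse import urlparse


def normalize_site(url):
    """Reduce a URL or bare domain to a lowercase host name (assumed de-duplication key)."""
    if "://" not in url:
        url = "http://" + url  # let urlparse treat bare domains as hosts
    return (urlparse(url).hostname or "").lower()


def collect_monitoring_objects(search_results, info_site_links, imported_sites):
    """Merge the three channels described in S1 into one ordered, de-duplicated list."""
    channels = (
        ("search", search_results),      # keyword library + search engine
        ("info_site", info_site_links),  # links found on information platforms
        ("import", imported_sites),      # manually entered or batch-imported sites
    )
    seen = set()
    collected = []
    for source, urls in channels:
        for url in urls:
            host = normalize_site(url)
            if host and host not in seen:
                seen.add(host)
                collected.append({"site": host, "source": source})
    return collected
```

Recording the channel alongside each site keeps the information needed later when the process data is summarized.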
S2: screening target objects from the monitoring objects according to preset conditions.
Specifically, S2 includes:
S21: screening the monitoring objects against a whitelist and excluding websites on the whitelist to obtain the remaining monitoring objects. In particular, the collected websites are screened against a whitelist in order to exclude some irrelevant websites. The whitelist mainly contains specific website domain names, such as those of government departments, colleges and universities, and large mainstream media websites. Websites on the whitelist are recorded as whitelisted websites for the data summary; after the comparison and screening by main domain name, the remaining monitoring objects enter S22.
S22: crawling the HTML documents of the remaining monitoring objects and excluding abnormal websites according to the crawl outcome and an analysis of the HTML document content, to obtain the target objects.
Excluding abnormal websites covers websites that cannot be opened or that open abnormally. If a webpage cannot be opened or opens abnormally, the website cannot conduct business normally. Websites in these situations are not of interest for monitoring and therefore need to be excluded. By crawling a website's HTML document, these special cases can be excluded according to the crawl outcome and a simple analysis of the HTML document content. For a website where a special case occurs, the specific case is recorded for the data summary; the remaining websites and their HTML documents enter S3.
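A minimal sketch of the two screening steps, under stated assumptions: suffix matching stands in for the main-domain comparison (a production system would more likely extract the registrable domain with the Public Suffix List), and the crawl result is a hypothetical record with `site`, `status` and `html` fields. The 200-character threshold for a "near-empty page" is likewise an illustrative value, not one given in the source.

```python
def is_whitelisted(host, whitelist):
    """True if the host equals a whitelisted domain or is one of its subdomains."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in whitelist)


def apply_whitelist(sites, whitelist):
    """S21: split sites into (remaining, whitelisted)."""
    remaining, whitelisted = [], []
    for site in sites:
        (whitelisted if is_whitelisted(site, whitelist) else remaining).append(site)
    return remaining, whitelisted


def exclude_abnormal(fetched, min_html_chars=200):
    """S22: drop sites that could not be opened or returned a near-empty page.

    `fetched` is a list of dicts with hypothetical keys 'site', 'status'
    (HTTP status code, or None when the request failed) and 'html'.
    """
    targets, abnormal = [], []
    for item in fetched:
        if item["status"] != 200:
            abnormal.append((item["site"], "cannot_open"))
        elif len(item["html"].strip()) < min_html_chars:
            abnormal.append((item["site"], "abnormal_page"))
        else:
            targets.append(item)
    return targets, abnormal
```

Both functions return the excluded sites together with a reason so that they can still be recorded in the summarized data.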
Fig. 3 is a schematic diagram illustrating the operation of the overseas stock trading business discrimination model in the method for monitoring overseas stock trading websites according to an embodiment of the present invention. As shown in Fig. 3, S3: inputting the HTML document of the target object into a preset overseas stock trading business discrimination model and discriminating the business type conducted by the target object, which specifically includes:
S31: using regular matching to search the input HTML document for preset threshold keywords; if any is found, proceeding to S32; if not, outputting the business type as an invalid sample. That is, if the HTML document contains no threshold keyword, the output business type is an invalid sample; if it contains any threshold keyword, the process proceeds to S32.
Specifically, regular matching means matching with a regular expression (regex, RE), a computer science concept generally used to retrieve and replace text that conforms to a certain pattern (rule). The threshold keywords can be selected based on training the model on many samples and on actual investigation experience: with discrimination accuracy as the goal, the model is debugged repeatedly, threshold keywords are added or removed, the preset keyword dictionary is modified, and the assigned scores and scoring rules are determined. The discrimination result falls into three business types (target business type, information, and invalid sample); a website is classified according to its scores on the three business types, and its business type is recorded. The threshold keywords may be, for example, "shares", "stocks" and "trades".
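The threshold check in S31 can be sketched with Python's standard `re` module. The keyword list below reuses the examples named in the text; the function names and the case-insensitive matching are assumptions made for illustration.

```python
import re


def build_threshold_pattern(keywords):
    """Compile one alternation pattern from the threshold keywords."""
    if not keywords:
        raise ValueError("at least one threshold keyword is required")
    # re.escape keeps keywords literal even if they contain regex metacharacters.
    return re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)


def passes_threshold(html, pattern):
    """S31: True if any threshold keyword occurs in the HTML document."""
    return pattern.search(html) is not None


THRESHOLD = build_threshold_pattern(["shares", "stocks", "trades"])
```

A document for which `passes_threshold` returns `False` would be output as an invalid sample without any further scoring.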
S32: classifying and scoring the HTML document according to a preset keyword dictionary, and calculating the final score of the HTML document for each business type according to the standard score corresponding to that business type, wherein the business types comprise a target business type, information, and an invalid sample.
S32 specifically includes:
S321: classifying and scoring the HTML document according to a preset keyword dictionary, wherein the preset keyword dictionary is indexed by classification keyword, and the value for each classification keyword comprises whether the keyword is enabled, the business type it belongs to, its assigned score, the calculation rule used, its occurrence frequency, and the score calculated from the assigned score, the occurrence frequency and the calculation rule. The classification keywords may be, for example, "shares", "stocks" and "trades".
The format of the preset keyword dictionary is {keyword A: (whether enabled, business type of the keyword, assigned score, calculation rule used, occurrence frequency, score), keyword B: (whether enabled, business type of the keyword, assigned score, calculation rule used, occurrence frequency, score), ...}. For each keyword whose enabled flag is "yes", the occurrence frequency is obtained by regular matching and the initial frequency value in the dictionary is updated; the score is then calculated from the assigned score, the frequency and the calculation rule, and the initial score value is updated. A keyword whose enabled flag is "no" does not take part in subsequent statistics; this parameter makes repeated debugging of the model easier, since keywords that discriminate poorly can be disabled and keywords that help discrimination can be re-enabled.
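One way to represent the keyword dictionary and the per-keyword scoring of S321 is sketched below. The field layout follows the format described above, but the three rule names (`presence`, `per_occurrence`, `capped`), the `cap` field and all sample values are hypothetical; the source does not enumerate its calculation rules.

```python
import re
from dataclasses import dataclass


@dataclass
class KeywordEntry:
    enabled: bool
    business_type: str     # "target", "information" or "invalid"
    assigned_score: float
    rule: str              # hypothetical: "presence", "per_occurrence", "capped"
    cap: float = 0.0       # upper limit, used only by the "capped" rule
    frequency: int = 0     # initial value, updated by score_keywords
    score: float = 0.0     # initial value, updated by score_keywords


def apply_rule(entry):
    """Turn assigned score and frequency into a score under the entry's rule."""
    if entry.frequency == 0:
        return 0.0
    if entry.rule == "presence":
        return entry.assigned_score
    raw = entry.assigned_score * entry.frequency
    if entry.rule == "per_occurrence":
        return raw
    if entry.rule == "capped":
        return min(raw, entry.cap)
    raise ValueError(f"unknown rule: {entry.rule}")


def score_keywords(html, dictionary):
    """S321: update frequency and score of every enabled keyword in place."""
    for keyword, entry in dictionary.items():
        if not entry.enabled:
            continue  # disabled keywords take no part in the statistics
        entry.frequency = len(re.findall(re.escape(keyword), html, flags=re.IGNORECASE))
        entry.score = apply_rule(entry)
    return dictionary
```

Keeping the enabled flag inside each entry mirrors the debugging workflow described above: a keyword can be switched off and on again without losing its assigned score or rule.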
S322: grouping the classification keywords by the business type to which they belong and summing the scores for each business type to obtain the initial scores of the HTML document for the three business types, then subtracting the standard score of each business type from the corresponding initial score to obtain the final score of the HTML document for that business type.
Because the business types differ in their number of keywords and in their scoring and calculation rules, and in particular the information type covers a wider range and has more classification keywords, its score tends to be relatively higher, which would give it an unreasonable advantage in the subsequent comparison. Setting standard scores therefore reduces the influence of differences in the number of classification keywords on the result, and also prevents a website whose score does not reach the standard score from being judged as the target business type merely because of a very few classification keywords. The classification keywords may be identical to, overlap with, or differ from the threshold keywords.
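The arithmetic of S322 is a per-type sum followed by a subtraction, sketched below. The function name is an assumption, and the standard scores in the example are invented values; the source does not disclose its actual standard scores.

```python
def final_scores(keyword_scores, standard_scores):
    """S322: sum keyword scores per business type, then subtract the standard score.

    keyword_scores: iterable of (business_type, score) pairs from S321.
    standard_scores: dict mapping each business type to its standard score.
    """
    initial = {business_type: 0.0 for business_type in standard_scores}
    for business_type, score in keyword_scores:
        initial[business_type] += score
    return {t: initial[t] - standard_scores[t] for t in standard_scores}


# Illustrative values only.
example = final_scores(
    [("target", 10.0), ("target", 4.0), ("information", 30.0)],
    {"target": 8.0, "information": 25.0, "invalid": 5.0},
)
```

In this example the information type has the larger initial score (30 against 14), yet after subtracting the standard scores the target type leads (6 against 5), which is the correction for unequal keyword counts that the standard score is meant to provide.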
S33: discriminating the business type conducted by the target object based on the classification result of the HTML document and the final score for each business type.
S33 specifically includes:
if the final score of the target business type is the highest, determining whether that final score is greater than 0; if so, the business type conducted by the target object is determined to be the target business type, and if not, an invalid sample;
if the final score of the information type is the highest, determining whether the final score of the target business type is greater than 0; if so, the business type conducted by the target object is determined to be overseas stock trading information, and if not, general information;
and if the final score of the invalid sample type is the highest, the business type conducted by the target object is determined to be an invalid sample.
The target business type is the business type that monitoring focuses on, here business suspected of being related to overseas stock trading. The information type mainly refers to media websites and comprises overseas stock trading information websites and general information websites. Information websites cover a wide range of content and often contain introductions to the target business type, so their discriminating keywords overlap heavily with those of target business type websites; they are therefore classified separately and distinguished by the characteristics of information content. The invalid sample type refers to websites unrelated to the target business type, such as gaming websites. In addition, monitoring is concerned both with websites suspected of conducting overseas stock trading related business and with overseas stock trading information websites. Therefore, if an information website scores high on the target business type, it is recorded as overseas stock trading information business (i.e., overseas stock trading information, as distinct from general information business) and, together with overseas stock trading business websites, enters the next step; websites judged to be general information or invalid samples are only recorded for the data summary.
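The decision rules of S33 translate directly into a small function. The output labels and the tie-breaking order (target, then information, then invalid) are assumptions; the source does not say how equal top scores are resolved.

```python
def discriminate(final):
    """S33: map the final scores of the three business types to an output type.

    `final` is a dict with keys "target", "information" and "invalid".
    On ties, max() returns the first type in the tuple below (an assumption).
    """
    top = max(("target", "information", "invalid"), key=lambda t: final[t])
    if top == "target":
        return "target" if final["target"] > 0 else "invalid"
    if top == "information":
        if final["target"] > 0:
            return "overseas_stock_information"
        return "general_information"
    return "invalid"
```

Note that the information branch tests the target score, not the information score: an information website is singled out as overseas stock trading information only when it also clears the standard score for the target business type.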
Fig. 4 is a schematic overall operation diagram of a method for monitoring overseas stock trading websites according to an embodiment of the present invention. As shown in Fig. 4, the method further includes, after S3:
S4: determining whether the business type conducted by the target object is the target business type; if so, executing S5; if not, recording the business type conducted by the target object.
S5: introducing third-party interface data to supplement third-party information on the target object, wherein the third-party information comprises ICP filing information and IP address information; and extracting and parsing the HTML document to obtain website information on the target object, wherein the website information comprises copyright information and the ICP (Internet Content Provider) filing information displayed on the webpage. Some webpages display expired ICP filing information that cannot be obtained through a third-party data interface, and the displayed ICP filing information is of some value in identifying the operating entity.
S6: summarizing the process data from S1 to S5 and storing it in a database. The process data refers to the recorded results for all the websites involved in the above steps, which are used for subsequent display or export.
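Extracting the information displayed on the page itself could look like the sketch below. The ICP pattern assumes the common display form of a filing number (one Chinese character for the province, "ICP备" or "ICP证", digits, "号", and an optional numeric suffix) and will miss variants with extra spacing; the copyright pattern and the function name are likewise illustrative assumptions. The third-party interface lookup is not shown.

```python
import re

# Simplified patterns; real pages vary in spacing and wording.
ICP_PATTERN = re.compile(r"[\u4e00-\u9fa5]ICP[备证]\d+号(?:-\d+)?")
COPYRIGHT_PATTERN = re.compile(r"(?:©|&copy;|copyright)[^<\r\n]{0,80}", re.IGNORECASE)


def extract_displayed_info(html):
    """Pull the displayed ICP filing number and copyright line out of raw HTML."""
    icp = ICP_PATTERN.search(html)
    copyright_line = COPYRIGHT_PATTERN.search(html)
    return {
        "displayed_icp": icp.group(0) if icp else None,
        "copyright": copyright_line.group(0).strip() if copyright_line else None,
    }
```

Because the value comes from the page rather than from a registry, it can differ from the third-party result, which is exactly the case the text highlights for expired filings.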
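As a rough illustration of S6, the summarized records could be written to a relational table. The source does not name a database or a schema, so SQLite (via Python's standard `sqlite3` module), the table name and every column below are assumptions.

```python
import sqlite3


def save_results(conn, records):
    """S6: store one summary row per website; re-running replaces existing rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS monitoring_result (
               site TEXT PRIMARY KEY,
               source TEXT,
               screening_note TEXT,
               business_type TEXT,
               icp TEXT,
               ip TEXT
           )"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO monitoring_result "
        "VALUES (:site, :source, :screening_note, :business_type, :icp, :ip)",
        records,
    )
    conn.commit()
```

Each record is a dict carrying all six keys; websites excluded early (for example by the whitelist) simply leave the later fields as `None`.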
Preferably, the training process of the preset overseas stock transaction service discrimination model specifically includes:
selecting a plurality of preset samples as target objects, respectively inputting the plurality of preset samples into the overseas stock transaction service discrimination model, executing the steps from S31 to S33, and outputting service types of the plurality of preset samples;
counting the accuracy of the service types of the output preset samples by taking the actual service types of the preset samples as reference;
and correcting the content of the preset threshold keywords and the preset keyword dictionary of the overseas stock transaction service discrimination model according to the accuracy.
The preset overseas stock transaction service discrimination model is trained by repeatedly summarizing and adjusting for the errors found in the training samples; that is, the training process is executed repeatedly and the model is optimized until the accuracy reaches the standard, so that the method for monitoring overseas stock trading websites achieves a certain accuracy. When suspicious websites are collected, the keywords taken from the keyword library for searching often cannot distinguish service types. For example, the html documents of websites found by searching for "US stocks" generally contain the keyword "US stocks"; if "US stocks" were given a large score under the target service type, the overall result would be biased toward the target service type. The keyword would then not only fail to distinguish service types but could also wrongly push classification results toward the target service type, so the selection and scoring of model keywords requires careful thought. In addition, the score calculation rule can differ from keyword to keyword according to the keyword's distinguishing effect. For example, "finance" leans toward the information type, but a target service type website may also have a news section, so a moderate upper limit should be set on the score of this word; "United Nations" leans more strongly toward the information type, and the more often it appears, the more likely the website is an information website, so its score can be tied to its frequency.
Finally, some keywords may carry more than one meaning: for instance, a small number of occurrences favors the target service type, whereas a very large number of occurrences favors information or an invalid sample.
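The three kinds of calculation rule described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the multiplicative form of the score, and the use of a single count threshold for a keyword with two meanings are hypothetical, and the patent gives no concrete formulas or values.

```python
def capped_score(assigned: float, count: int, cap: float) -> float:
    """Score grows with frequency but never exceeds the upper limit `cap`."""
    return min(assigned * count, cap)


def frequency_score(assigned: float, count: int) -> float:
    """Score is proportional to how often the keyword occurs."""
    return assigned * count


def switching_type(count: int, threshold: int) -> str:
    """Keyword with two meanings: few hits favor the target service type,
    many hits favor information (assumed here; it could also be invalid sample)."""
    return "target service type" if count <= threshold else "information"
```

For instance, a word such as "finance" might use `capped_score`, while a word whose frequency itself is informative might use `frequency_score`.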
The service type discrimination model is optimized for accuracy. Thousands of website html documents were selected for training, and accuracy was measured against the results of manual checks of the websites. Results are classified as correct or erroneous, and errors are further divided into general errors and serious errors. A correct result means that the service type was judged correctly; a general error means that information and invalid sample were mistaken for each other; a serious error means that the target service type was judged as a non-target service type (information or invalid sample), or that a non-target service type was judged as the target service type. To prevent overfitting, the finally selected model has an accuracy above 80% and a serious error rate within 5%.
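The error categories above can be expressed as a short evaluation routine. The sketch below is an assumption-laden illustration: the label strings, the function names `error_kind` and `evaluate`, and the choice to compute both rates over all samples are not taken from the patent.

```python
from collections import Counter

TARGET = "target service type"
NON_TARGET = {"information", "invalid sample"}


def error_kind(predicted: str, actual: str) -> str:
    """Classify one prediction as correct, a general error, or a serious error."""
    if predicted == actual:
        return "correct"
    if predicted in NON_TARGET and actual in NON_TARGET:
        return "general"  # information and invalid sample mistaken for each other
    return "serious"      # target and non-target confused, in either direction


def evaluate(pairs: list) -> dict:
    """Compute accuracy and serious error rate from (predicted, actual) pairs."""
    if not pairs:
        raise ValueError("at least one sample is required")
    counts = Counter(error_kind(predicted, actual) for predicted, actual in pairs)
    total = len(pairs)
    return {
        "accuracy": counts["correct"] / total,
        "serious_error_rate": counts["serious"] / total,
    }
```

Under the criteria stated above, a candidate model would be accepted when `accuracy` exceeds 0.80 and `serious_error_rate` does not exceed 0.05.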
This embodiment collects monitoring objects, namely suspicious website URLs, by automatically searching for keywords in the preset keyword library, periodically crawling overseas stock transaction information websites, and accepting designated URL lists. It uses the pre-trained overseas stock transaction service discrimination model to improve identification accuracy, identify websites suspected of conducting overseas stock trading business, and improve monitoring efficiency. In practical tests, the embodiment ran well and stably overall; compared with the methods used in the prior art, accuracy is greatly improved and the burden of manual checking is greatly reduced, which improves monitoring efficiency.
Fig. 5 is a schematic diagram of an apparatus for monitoring an overseas stock trading website according to an embodiment of the present invention. As shown in Fig. 5, the apparatus includes:
the monitoring object acquisition module 10 is used for acquiring monitoring objects, and the monitoring objects comprise a plurality of websites and html documents thereof;
the target object screening module 20 is connected with the monitored object acquisition module 10 and is used for screening the target object from the monitored objects according to preset conditions;
a service type discrimination module 30, connected to the target object screening module 20, for inputting the html document of the target object into a preset overseas stock transaction service discrimination model and discriminating the service type developed by the target object, which specifically comprises:
S31: searching the input html document by regular-expression matching for preset threshold keywords; if any are found, going to S32; if not, outputting the service type as an invalid sample;
S32: classifying and scoring the html document according to a preset keyword dictionary, and calculating the final score of the html document for each service type according to the standard score corresponding to that service type, wherein the service types comprise the target service type, information, and invalid sample;
S33: judging the service type developed by the target object based on the classification result of the html document and the final score for each service type.
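Steps S31 to S33 can be sketched end to end as follows, using the S33 decision rule as set out in claim 5 and the subtraction of a standard score as set out in claim 4. Every concrete value here is hypothetical: the threshold keywords, the dictionary entries, the assigned values, the standard scores, and the name `discriminate` are assumptions for illustration, and scores are simply the assigned value times the number of occurrences.

```python
import re

TARGET = "target service type"
INFO = "information"
INVALID = "invalid sample"

# Hypothetical threshold keywords for the S31 gate.
THRESHOLD_PATTERN = re.compile(r"stock|securities", re.IGNORECASE)

# Hypothetical dictionary: keyword -> (service type, assigned value).
KEYWORD_DICT = {
    "open an account": (TARGET, 3.0),
    "deposit": (TARGET, 2.0),
    "news": (INFO, 2.0),
    "casino": (INVALID, 4.0),
}

# Hypothetical standard score subtracted from each type's initial score.
STANDARD_SCORE = {TARGET: 2.0, INFO: 2.0, INVALID: 2.0}


def discriminate(html: str) -> str:
    """Return the service type judged for one html document."""
    # S31: without any threshold keyword the document is an invalid sample.
    if not THRESHOLD_PATTERN.search(html):
        return INVALID

    # S32: sum keyword scores per type, then subtract the standard score.
    text = html.lower()
    final = {service_type: -standard for service_type, standard in STANDARD_SCORE.items()}
    for keyword, (service_type, assigned) in KEYWORD_DICT.items():
        final[service_type] += assigned * text.count(keyword)

    # S33: pick the highest final score (ties resolve in dictionary order).
    best = max(final, key=final.get)
    if best == TARGET:
        return TARGET if final[TARGET] > 0 else INVALID
    if best == INFO:
        if final[TARGET] > 0:
            return "overseas stock transaction information"
        return "general information"
    return INVALID
```

The tie-breaking order is a design choice of this sketch only; the source does not say how equal final scores are resolved.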
The working principle of the apparatus for monitoring an overseas stock trading website of this embodiment corresponds to that of the method described above and is not repeated here. The apparatus can be applied in developing a risk monitoring system for illegal Internet foreign exchange activity, as a dedicated module that can be selected and called within the system.
Fig. 6 illustrates the physical structure of an electronic device, which may include a processor 810, a communication interface 820, a memory 830 and a communication bus 840, wherein the processor 810, the communication interface 820 and the memory 830 communicate with each other via the communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform a method of monitoring an overseas stock trading website, the method comprising:
S1: collecting monitoring objects, wherein the monitoring objects comprise a plurality of websites and their html documents;
S2: screening a target object from the monitoring objects according to a preset condition;
S3: inputting the html document of the target object into a preset overseas stock transaction service discrimination model, and discriminating the service type developed by the target object, which specifically comprises:
S31: searching the input html document by regular-expression matching for preset threshold keywords; if any are found, going to S32; if not, outputting the service type as an invalid sample;
S32: classifying and scoring the html document according to a preset keyword dictionary, and calculating the final score of the html document for each service type according to the standard score corresponding to that service type, wherein the service types comprise the target service type, information, and invalid sample;
S33: judging the service type developed by the target object based on the classification result of the html document and the final score for each service type.
In addition, the logic instructions in the memory 830 may be implemented in the form of software functional units and, when sold or used as independent products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
In another aspect, embodiments of the present invention also provide a computer program product including a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions which, when executed by a computer, enable the computer to perform a method of monitoring an overseas stock trading website, the method comprising:
S1: collecting monitoring objects, wherein the monitoring objects comprise a plurality of websites and their html documents;
S2: screening a target object from the monitoring objects according to a preset condition;
S3: inputting the html document of the target object into a preset overseas stock transaction service discrimination model, and discriminating the service type developed by the target object, which specifically comprises:
S31: searching the input html document by regular-expression matching for preset threshold keywords; if any are found, going to S32; if not, outputting the service type as an invalid sample;
S32: classifying and scoring the html document according to a preset keyword dictionary, and calculating the final score of the html document for each service type according to the standard score corresponding to that service type, wherein the service types comprise the target service type, information, and invalid sample;
S33: judging the service type developed by the target object based on the classification result of the html document and the final score for each service type.
In yet another aspect, embodiments of the present invention also provide a non-transitory computer-readable storage medium having stored thereon a computer program that, when executed by a processor, performs a method of monitoring an overseas stock trading website, the method comprising:
S1: collecting monitoring objects, wherein the monitoring objects comprise a plurality of websites and their html documents;
S2: screening a target object from the monitoring objects according to a preset condition;
S3: inputting the html document of the target object into a preset overseas stock transaction service discrimination model, and discriminating the service type developed by the target object, which specifically comprises:
S31: searching the input html document by regular-expression matching for preset threshold keywords; if any are found, going to S32; if not, outputting the service type as an invalid sample;
S32: classifying and scoring the html document according to a preset keyword dictionary, and calculating the final score of the html document for each service type according to the standard score corresponding to that service type, wherein the service types comprise the target service type, information, and invalid sample;
S33: judging the service type developed by the target object based on the classification result of the html document and the final score for each service type.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of monitoring an overseas stock trading website, comprising:
S1: collecting monitoring objects, wherein the monitoring objects comprise a plurality of websites and their html documents;
S2: screening a target object from the monitoring objects according to a preset condition;
S3: inputting the html document of the target object into a preset overseas stock transaction service discrimination model, and discriminating the service type developed by the target object, which specifically comprises:
S31: searching the input html document by regular-expression matching for preset threshold keywords; if any are found, going to S32; if not, outputting the service type as an invalid sample;
S32: classifying and scoring the html document according to a preset keyword dictionary, and calculating the final score of the html document for each service type according to the standard score corresponding to that service type, wherein the service types comprise the target service type, information, and invalid sample;
S33: judging the service type developed by the target object based on the classification result of the html document and the final score for each service type.
2. The method of monitoring an overseas stock trading website as claimed in claim 1, wherein the S1 specifically includes:
calling a search engine according to a preset keyword library, and crawling a monitored object, wherein the preset keyword library comprises preset keywords and newly added keywords;
or crawling an information website related to the overseas stock trading website as a monitoring object;
or the input websites or the batch imported websites are used as monitoring objects.
3. The method of monitoring an overseas stock trading website as claimed in claim 1, wherein the S2 specifically includes:
S21: screening the monitoring objects using a white list, and eliminating websites that belong to the white list to obtain the remaining monitoring objects;
S22: crawling the html documents of the remaining monitoring objects, and eliminating abnormal websites according to the acquisition result and an analysis of the html document contents to obtain the target object.
4. The method of monitoring an overseas stock trading website as claimed in claim 1, wherein the S32 specifically includes:
S321: classifying and scoring the html document according to a preset keyword dictionary, wherein the preset keyword dictionary is indexed by classification keyword, and the value for each classification keyword comprises whether the keyword is enabled, which service type it belongs to, its assigned value, the calculation rule used, its occurrence frequency, and the score calculated from the assigned value, the occurrence frequency and the calculation rule;
S322: classifying the html document according to the service types to which the classification keywords belong, summing the scores for each service type to obtain the initial scores of the html document for the three service types, and subtracting the standard score corresponding to each service type from the initial score for that service type to obtain the final score of the html document for that service type.
5. The method of monitoring an overseas stock exchange website as claimed in claim 1, wherein the S33 particularly includes:
if the final score of the target service type is the highest, judging whether the final score of the target service type is greater than 0, if so, judging that the service type developed by the target object is the target service type, and if not, judging that the service type developed by the target object is an invalid sample;
if the final score of the information is the highest, judging whether the final score of the target business type is larger than 0, if so, judging that the business type developed by the target object is oversea stock transaction information, and if not, judging that the business type developed by the target object is general information;
and if the final score of the invalid sample is the highest, judging the type of the business developed by the target object as the invalid sample.
6. The method of monitoring an overseas stock trading website as claimed in claim 1, further comprising after the S3:
S4: judging whether the service type developed by the target object is the target service type; if so, executing S5; if not, recording the service type developed by the target object;
S5: introducing third-party interface data to supplement third-party related information of the target object, wherein the third-party related information comprises ICP filing information and IP address information; and extracting and parsing the html document to obtain website related information of the target object, wherein the website related information comprises copyright information and the ICP filing information displayed on the web page;
S6: summarizing the process data from S1 to S5 and storing it in a database.
7. The method for monitoring overseas stock exchange website as claimed in any one of claims 1 to 6, wherein the training process of the preset overseas stock exchange business discriminant model specifically comprises:
selecting a plurality of preset samples as target objects, respectively inputting the plurality of preset samples into the overseas stock transaction service discrimination model, executing the steps from S31 to S33, and outputting service types of the plurality of preset samples;
counting the accuracy of the service types of the output preset samples by taking the actual service types of the preset samples as reference;
and correcting the content of the preset threshold keywords and the preset keyword dictionary of the overseas stock transaction service discrimination model according to the accuracy.
8. An apparatus for monitoring an overseas stock trading website, comprising:
the monitoring object acquisition module is used for acquiring monitoring objects, and the monitoring objects comprise a plurality of websites and html documents thereof;
the target object screening module is connected with the monitoring object acquisition module and used for screening the target object from the monitoring objects according to preset conditions;
a service type discrimination module, connected to the target object screening module, for inputting the html document of the target object into a preset overseas stock transaction service discrimination model, and discriminating the service type developed by the target object, specifically including:
S31: searching the input html document by regular-expression matching for preset threshold keywords; if any are found, going to S32; if not, outputting the service type as an invalid sample;
S32: classifying and scoring the html document according to a preset keyword dictionary, and calculating the final score of the html document for each service type according to the standard score corresponding to that service type, wherein the service types comprise the target service type, information, and invalid sample;
S33: judging the service type developed by the target object based on the classification result of the html document and the final score for each service type.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method of monitoring an overseas stock trading website of any one of claims 1-7 are implemented when the program is executed by the processor.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, performs the steps of the method of monitoring an overseas stock exchange website as claimed in any one of claims 1 to 7.
CN202011121231.XA 2020-10-19 2020-10-19 Method, device, equipment and storage medium for monitoring overseas stock trading website Withdrawn CN112256987A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011121231.XA CN112256987A (en) 2020-10-19 2020-10-19 Method, device, equipment and storage medium for monitoring overseas stock trading website

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011121231.XA CN112256987A (en) 2020-10-19 2020-10-19 Method, device, equipment and storage medium for monitoring overseas stock trading website

Publications (1)

Publication Number Publication Date
CN112256987A true CN112256987A (en) 2021-01-22

Family

ID=74245222

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011121231.XA Withdrawn CN112256987A (en) 2020-10-19 2020-10-19 Method, device, equipment and storage medium for monitoring overseas stock trading website

Country Status (1)

Country Link
CN (1) CN112256987A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210122

WW01 Invention patent application withdrawn after publication