CN112417329A - Method and device for monitoring illegal internet foreign exchange deposit transaction platform - Google Patents

Info

Publication number
CN112417329A
Authority
CN
China
Prior art keywords
foreign exchange
monitoring
target object
service type
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011523198.3A
Other languages
Chinese (zh)
Inventor
张黎娜
冯晓飞
刘泓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Internet Finance Association
Original Assignee
China Internet Finance Association
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/955Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/958Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An embodiment of the invention provides a method and a device for monitoring illegal internet foreign exchange deposit transaction platforms. The method comprises: collecting monitoring objects, wherein the monitoring objects comprise a plurality of websites and their html documents; screening target objects from the monitoring objects according to preset conditions; and inputting the html document of each target object into a preset illegal internet foreign exchange deposit transaction service discrimination model to discriminate the service type carried out by the target object. In the embodiment, the monitoring objects, namely a list of suspicious websites, are collected by automatically searching the keywords in a preset keyword library, periodically crawling information websites related to suspected illegal internet foreign exchange deposit transactions, and accepting specified website lists. The pre-trained discrimination model improves identification accuracy when identifying websites suspected of carrying out illegal internet foreign exchange deposit transaction services, which greatly reduces the manual investigation workload and improves monitoring efficiency.

Description

Method and device for monitoring illegal internet foreign exchange deposit transaction platform
Technical Field
The invention belongs to the technical field of internet monitoring, and particularly relates to a method and a device for monitoring an illegal internet foreign exchange deposit transaction platform.
Background
With the development and popularization of the internet, some platforms carry out illegal foreign exchange trading services over the internet, in particular foreign exchange deposit (margin) transactions, which are prohibited in China. Some institutions hold foreign licenses but no domestic license, and provide cross-border illegal financial services such as foreign exchange deposit transactions and stock trading to domestic users through websites. This violates the relevant national laws and regulations, exposes investors to huge risks, and hinders the healthy development of internet finance.
The "Notice on Strictly Investigating and Punishing Illegal Foreign Exchange Futures and Foreign Exchange Margin Trading Activities" issued by the China Securities Regulatory Commission (CSRC) states that foreign exchange futures and foreign exchange margin trading carried out by financial institutions, futures brokerage companies and other institutions without the approval of the CSRC and the State Administration of Foreign Exchange and without registration with the State Administration for Industry and Commerce is illegal; customers (organizations and individuals) who entrust such unapproved and unregistered institutions to conduct foreign exchange futures and foreign exchange margin trading, whether the margin is paid in foreign currency or RMB, also act illegally. The meeting summary on implementing that notice, issued by the CSRC and other departments, mentions that since 1980 the relevant departments of the State Council have only approved designated foreign exchange banks and a small number of non-bank financial institutions to buy and sell spot foreign exchange, and have never approved any entity to conduct foreign exchange futures or foreign exchange margin trading; all institutions carrying out such business are operating illegally. Customers entrusting these institutions with foreign exchange futures and foreign exchange margin trading is not permitted under current Chinese law and is therefore illegal.
The relevant regulatory authorities pay close attention to websites carrying out illegal internet foreign exchange deposit transaction services and actively carry out monitoring work. The methods used so far fall into two stages. In the first stage, websites were collected by manually searching relevant keywords with a search engine, visiting information websites related to illegal internet foreign exchange deposit transactions, receiving reported leads, and the like; the collected websites were then opened manually one by one for checking and verification, their filing (record) information was queried, and a data information form was filled in. In the second stage, crawler technology was introduced: a search engine is called automatically to search for specified keywords, and customized crawler programs crawl some information websites to obtain related websites; the html document of each website is then crawled, and whether the website carries out services related to illegal internet foreign exchange deposit transactions is judged according to whether it contains a small number of specific keywords (such as "foreign exchange deposit transaction"); websites suspected of carrying out related services are then visited and investigated manually, and third-party interface data is introduced to supplement website filing information and the like.
The first-stage method is entirely manual, so the monitoring cycle is long and overall efficiency is low; limited by human throughput, it collects few websites and its monitoring coverage is limited. The second-stage method uses automatic crawling, so it can collect more websites, expand monitoring coverage, and perform a preliminary screening of the collected websites. However, because the screening rule is too simple, the hit rate for websites related to illegal internet foreign exchange deposit transactions remains low: it never exceeds 40% and generally fluctuates around 25%. As a result, a large number of unrelated websites enter the subsequent manual investigation stage, which increases the monitoring workload.
Apart from the above methods, there is currently no device or method on the market dedicated to monitoring websites that carry out services related to illegal internet foreign exchange deposit transactions, and no model has been customized for these service types, which leads to problems such as heavy manual investigation workload, low identification accuracy and low monitoring efficiency.
Disclosure of Invention
The embodiment of the invention provides a method and a device for monitoring an illegal internet foreign exchange deposit transaction platform, which can improve the identification accuracy, reduce the manual investigation pressure and improve the monitoring efficiency.
An embodiment of the invention provides a method for monitoring an illegal internet foreign exchange deposit transaction platform, comprising the following steps:
S1: collecting monitoring objects, wherein the monitoring objects comprise a plurality of websites and their html documents;
S2: screening a target object from the monitoring objects according to preset conditions;
S3: inputting the html document of the target object into a preset illegal internet foreign exchange deposit transaction service discrimination model, and discriminating the service type carried out by the target object, specifically comprising:
S31: checking by regular-expression matching whether any preset threshold keyword exists in the input html document; if yes, proceeding to S32; if not, outputting the service type as an invalid sample;
S32: classifying and scoring the html document according to a preset keyword dictionary, and calculating the final score of the html document for each service type according to the standard score corresponding to each service type, wherein the service types comprise a target service type, information, and an invalid sample;
S33: discriminating the service type carried out by the target object based on the classification result of the html document and the final score for each service type.
In the method for monitoring an illegal internet foreign exchange deposit transaction platform according to the embodiment of the invention, step S1 specifically comprises:
calling a search engine according to a preset keyword library and crawling monitoring objects, wherein the preset keyword library comprises preset keywords and newly added keywords;
or crawling information websites related to foreign exchange deposit transaction websites to obtain monitoring objects;
or taking entered websites or batch-imported websites as monitoring objects.
In the method for monitoring an illegal internet foreign exchange deposit transaction platform according to the embodiment of the invention, step S2 specifically comprises:
S21: screening the monitoring objects against a white list, and excluding the websites that belong to the white list to obtain the remaining monitoring objects;
S22: crawling the html documents of the remaining monitoring objects, and excluding abnormal websites according to the acquisition result and an analysis of the html document content to obtain the target object.
In the method for monitoring an illegal internet foreign exchange deposit transaction platform according to the embodiment of the invention, step S32 specifically comprises:
S321: classifying and scoring the html document according to a preset keyword dictionary, wherein the preset keyword dictionary is indexed by classification keyword, and the value for each classification keyword comprises whether the keyword is enabled, the service type to which it belongs, its assigned score, the calculation rule used, its occurrence frequency, and the score calculated from the assigned score, the occurrence frequency and the calculation rule;
S322: classifying the html document according to the service types to which the classification keywords belong, summing the scores for each service type to obtain the initial scores of the html document for the three service types, and subtracting the standard score corresponding to each service type from the initial score for that service type to obtain the final score of the html document for that service type.
In the method for monitoring an illegal internet foreign exchange deposit transaction platform according to the embodiment of the invention, step S33 specifically comprises:
if the final score for the target service type is the highest, judging whether that final score is greater than 0; if so, judging that the service type carried out by the target object is the target service type, and if not, judging that it is an invalid sample;
if the final score for information is the highest, judging whether the final score for the target service type is greater than 0; if so, judging that the service type carried out by the target object is foreign exchange information, and if not, judging that it is general information;
and if the final score for the invalid sample is the highest, judging that the service type carried out by the target object is an invalid sample.
In the method for monitoring an illegal internet foreign exchange deposit transaction platform according to the embodiment of the invention, the following steps are performed after step S3:
S4: judging whether the service type carried out by the target object is the target service type; if so, executing S5; if not, recording the service type carried out by the target object;
S5: introducing third-party interface data to supplement third-party related information of the target object, wherein the third-party related information comprises ICP filing information and IP address information; and extracting and parsing the html document to obtain website-related information of the target object, wherein the website-related information comprises the copyright information and ICP filing information displayed on the web page;
S6: summarizing the process data from S1 to S5 and storing it in a database.
The method for monitoring an illegal internet foreign exchange deposit transaction platform according to the embodiment of the invention further comprises correcting the discrimination model through the following steps:
selecting a plurality of preset samples as target objects, inputting each of the preset samples into the illegal internet foreign exchange deposit transaction service discrimination model, executing steps S31 to S33, and outputting the service types of the preset samples;
computing the accuracy of the output service types, taking the actual service types of the preset samples as the reference;
and correcting the preset threshold keywords and the content of the preset keyword dictionary of the illegal internet foreign exchange deposit transaction service discrimination model according to the accuracy.
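The accuracy statistic described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `evaluate` and the label strings are hypothetical, and the tally of misclassified pairs is an added convenience for deciding which keywords to adjust.

```python
from collections import Counter

def evaluate(predicted, actual):
    """Return (accuracy, error tally) for model outputs against labelled samples.

    `predicted` and `actual` are equal-length lists of service-type labels.
    The label names used by callers are hypothetical placeholders.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    # Tally each (actual, predicted) pair that disagrees, to guide tuning of
    # the threshold keywords and the keyword dictionary.
    errors = Counter((a, p) for p, a in zip(predicted, actual) if p != a)
    return correct / len(actual), errors
```

A large count for one (actual, predicted) pair suggests that the keywords or assigned scores for those two service types deserve attention first.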
An embodiment of the invention provides a device for monitoring an illegal internet foreign exchange deposit transaction platform, comprising:
a monitoring object acquisition module, configured to collect monitoring objects, wherein the monitoring objects comprise a plurality of websites and their html documents;
a target object screening module, connected to the monitoring object acquisition module and configured to screen a target object from the monitoring objects according to preset conditions;
a service type discrimination module, connected to the target object screening module and configured to input the html document of the target object into a preset illegal internet foreign exchange deposit transaction service discrimination model and discriminate the service type carried out by the target object, specifically comprising:
S31: checking by regular-expression matching whether any preset threshold keyword exists in the input html document; if yes, proceeding to S32; if not, outputting the service type as an invalid sample;
S32: classifying and scoring the html document according to a preset keyword dictionary, and calculating the final score of the html document for each service type according to the standard score corresponding to each service type, wherein the service types comprise a target service type, information, and an invalid sample;
S33: discriminating the service type carried out by the target object based on the classification result of the html document and the final score for each service type.
An embodiment of the invention provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method for monitoring an illegal internet foreign exchange deposit transaction platform.
An embodiment of the invention provides a non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for monitoring an illegal internet foreign exchange deposit transaction platform.
According to the method and the device for monitoring an illegal internet foreign exchange deposit transaction platform provided by the embodiments of the invention, the monitoring objects, namely a list of suspicious websites, are collected by automatically searching the keywords in a preset keyword library, periodically crawling foreign exchange information websites, and accepting specified website lists; the pre-trained illegal internet foreign exchange deposit transaction service discrimination model improves identification accuracy, and websites suspected of carrying out illegal internet foreign exchange deposit transactions are identified.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a flowchart of a method for monitoring an illegal internet foreign exchange deposit transaction platform according to an embodiment of the present invention;
Fig. 2 is a partial flowchart of a method for monitoring an illegal internet foreign exchange deposit transaction platform according to an embodiment of the present invention;
Fig. 3 is a schematic operation diagram of a discrimination model of illegal internet foreign exchange deposit transaction service according to an embodiment of the present invention;
Fig. 4 is a schematic overall operation diagram of a method for monitoring an illegal internet foreign exchange deposit transaction platform according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of an apparatus for monitoring an illegal internet foreign exchange deposit transaction platform according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Figs. 1 and 2 are schematic flowcharts of a method for monitoring an illegal internet foreign exchange deposit transaction platform according to an embodiment of the present invention. As shown in Figs. 1 and 2, the method comprises the following steps:
S1: Collecting monitoring objects, wherein the monitoring objects comprise a plurality of websites and their html documents.
In S1, monitoring objects are collected, that is, the URLs of suspicious websites are obtained. Specifically, the following three approaches can be used:
Calling a search engine according to a preset keyword library and crawling monitoring objects, wherein the preset keyword library comprises preset keywords and newly added keywords. That is, a dedicated preset keyword library is built from the experience of front-line manual investigation, and keywords temporarily selected during the current investigation can be added as newly added keywords; a search engine is called with the preset keywords and the newly added keywords, and the websites found are crawled. The preset keyword library is updated from time to time according to the behavior of suspicious websites and the experience of investigators.
Crawling information websites related to foreign exchange deposit transaction websites to obtain monitoring objects. That is, foreign exchange information platforms are monitored, and the websites they link to are crawled periodically.
Taking entered websites or batch-imported websites as monitoring objects. That is, a website is entered directly or a website list is imported in batches; this approach is suitable for reported leads and also supports targeted investigation tasks. The websites obtained through all three approaches proceed to S2.
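Merging the three collection approaches into one list of monitoring objects might look like the sketch below. It is only an illustration under stated assumptions: the function names, the source labels ("search", "info_site", "import") and the normalization rule (keep only scheme, host and path) are hypothetical and not taken from the patent. The search-engine and crawler calls themselves are left out; the function receives their results as plain lists.

```python
from urllib.parse import urlsplit

def normalize_url(url: str) -> str:
    """Reduce a URL to lower-cased scheme and host plus path so duplicates collapse."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url  # assumption: bare domains default to http
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    return f"{parts.scheme.lower()}://{host}{path}"

def collect_monitoring_objects(search_hits, info_site_links, imported_urls):
    """Merge the three URL sources, remembering the first source of each URL."""
    collected = {}
    for source, urls in (("search", search_hits),
                         ("info_site", info_site_links),
                         ("import", imported_urls)):
        for url in urls:
            collected.setdefault(normalize_url(url), source)
    return collected
```

Recording the first source of each URL keeps the provenance available for the summary data without storing duplicates.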
S2: Screening target objects from the monitoring objects according to preset conditions.
Specifically, S2 comprises:
S21: Screening the monitoring objects against a white list and excluding the websites that belong to the white list to obtain the remaining monitoring objects. The purpose of white-list screening is to exclude some irrelevant websites. The white list mainly contains specific website domain names, such as those of government departments, colleges and universities, and large mainstream media websites. Websites belonging to the white list are recorded as white-list websites for the summary data; after comparison and screening by main domain name, the remaining monitoring objects proceed to S22.
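White-list screening by main domain name can be sketched as below. The white-list entries are hypothetical examples, and matching a host against a domain suffix is an assumption about how "main domain name comparison" might be done; URLs are assumed to include a scheme.

```python
from urllib.parse import urlsplit

# Hypothetical white-list entries, for illustration only.
WHITELIST = {"gov.cn", "edu.cn", "example-news.com"}

def is_whitelisted(url, whitelist=WHITELIST):
    """True if the URL's host equals a white-list domain or is a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in whitelist)

def screen_by_whitelist(urls, whitelist=WHITELIST):
    """Split URLs into (remaining monitoring objects, white-list websites)."""
    excluded = [u for u in urls if is_whitelisted(u, whitelist)]
    remaining = [u for u in urls if not is_whitelisted(u, whitelist)]
    return remaining, excluded
```

Requiring a leading dot in the suffix check avoids treating a host such as `fakegov.cn` as belonging to `gov.cn`.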
S22: Crawling the html documents of the remaining monitoring objects, and excluding abnormal websites according to the acquisition result and an analysis of the html document content to obtain the target objects.
Abnormal websites are websites that cannot be opened or open abnormally. A website whose pages cannot be opened or open abnormally cannot carry out business normally, so such websites are not of interest for monitoring and need to be excluded. By crawling the html document of a website, these special cases can be ruled out based on the acquisition result and a simple analysis of the html document content. For a website where a special case occurs, the specific case is recorded for the summary data; the remaining websites and their html documents proceed to S3.
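The exclusion of abnormal websites could be sketched as follows. The patent does not specify the checks, so everything here is an assumption: the reason labels, the treatment of HTTP status codes of 400 and above as errors, and the hypothetical `min_length` threshold for a page that is effectively empty. Fetching is left out; the function takes already-fetched results.

```python
def abnormal_reason(status_code, html, min_length=200):
    """Return a reason string if the fetch looks abnormal, else None.

    status_code is None when the site could not be reached at all.
    min_length is a hypothetical threshold, not a value from the patent.
    """
    if status_code is None:
        return "unreachable"
    if status_code >= 400:
        return f"http_error_{status_code}"
    if html is None or len(html.strip()) < min_length:
        return "empty_or_too_short"
    return None

def exclude_abnormal(fetched):
    """Split {url: (status_code, html)} into target objects and recorded anomalies."""
    targets, abnormal = {}, {}
    for url, (status, html) in fetched.items():
        reason = abnormal_reason(status, html)
        if reason is None:
            targets[url] = html
        else:
            abnormal[url] = reason  # kept for the summary data
    return targets, abnormal
```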
Fig. 3 is a schematic operation diagram of the illegal internet foreign exchange deposit transaction service discrimination model in the method for monitoring an illegal internet foreign exchange deposit transaction platform according to the embodiment of the present invention. As shown in Fig. 3, S3 is: inputting the html document of the target object into a preset illegal internet foreign exchange deposit transaction service discrimination model, and discriminating the service type carried out by the target object, specifically comprising:
S31: Checking by regular-expression matching whether any preset threshold keyword exists in the input html document; if yes, proceeding to S32; if not, outputting the service type as an invalid sample. That is, if the html document contains no threshold keyword, the output service type is an invalid sample; if it contains any threshold keyword, the process proceeds to S32.
Specifically, regular matching means matching with a regular expression (RE), a computer science concept generally used to retrieve and replace text that conforms to a certain pattern (rule). Threshold keywords can be selected based on training the model on many samples and on actual investigation experience: with discrimination accuracy as the goal, the model is debugged repeatedly, threshold keywords are added or removed, the preset keyword dictionary is modified, and the assigned scores and calculation rules are determined. The discrimination result falls into three classes, namely target service type, information, and invalid sample; a website is classified according to its scores for the three service types, and the service type is recorded. Threshold keywords may include, for example, "foreign exchange", "deposit", "financial" and "transaction".
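A minimal sketch of the S31 threshold check with Python's `re` module follows. The keyword list reuses the English examples given above; a real deployment would presumably use terms in the language of the monitored websites. The function name and the case-insensitive matching are assumptions.

```python
import re

# Example threshold keywords from the description, in their translated form.
THRESHOLD_KEYWORDS = ["foreign exchange", "deposit", "financial", "transaction"]

# One alternation pattern; re.escape guards against keywords with regex metacharacters.
THRESHOLD_PATTERN = re.compile(
    "|".join(re.escape(k) for k in THRESHOLD_KEYWORDS), re.IGNORECASE
)

def passes_threshold(html: str) -> bool:
    """True if the html document contains at least one threshold keyword."""
    return THRESHOLD_PATTERN.search(html) is not None
```

A document for which `passes_threshold` returns False would be output as an invalid sample without any further scoring.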
S32: Classifying and scoring the html document according to a preset keyword dictionary, and calculating the final score of the html document for each service type according to the standard score corresponding to each service type, wherein the service types comprise a target service type, information, and an invalid sample.
S32 specifically comprises:
S321: Classifying and scoring the html document according to a preset keyword dictionary, wherein the preset keyword dictionary is indexed by classification keyword, and the value for each classification keyword comprises whether the keyword is enabled, the service type to which it belongs, its assigned score, the calculation rule used, its occurrence frequency, and the score calculated from the assigned score, the occurrence frequency and the calculation rule. The classification keywords may be, for example, "foreign exchange", "deposit", "financial" and "transaction".
The preset keyword dictionary has the format {keyword A: (enabled or not, service type of the keyword, assigned score, calculation rule used, occurrence frequency, score), keyword B: (enabled or not, service type of the keyword, assigned score, calculation rule used, occurrence frequency, score), ...}. The "enabled" parameter makes repeated debugging of the model easier: keywords with poor discriminative effect can be disabled, and keywords that help discrimination can be re-enabled. A keyword whose "enabled" value is "no" does not take part in the subsequent statistics and calculation. For each keyword whose "enabled" value is "yes", the occurrence frequency is obtained by regular matching and replaces the initial frequency value in the dictionary; the score is then calculated from the assigned score, the frequency and the calculation rule, and replaces the initial score value.
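The dictionary format and the S321 update can be illustrated with the sketch below. The keywords, assigned scores and service-type labels are hypothetical, and the two calculation rules ("linear" and "capped") are invented for illustration, since the source does not define the rules.

```python
import re

def make_entry(service_type, points, rule, enabled=True):
    """Build one dictionary value; frequency and score start at their initial values."""
    return {"enabled": enabled, "type": service_type, "points": points,
            "rule": rule, "frequency": 0, "score": 0.0}

# Hypothetical keywords, assigned scores and rules, for illustration only.
keyword_dict = {
    "foreign exchange": make_entry("target", 2.0, "linear"),
    "deposit": make_entry("target", 3.0, "capped"),
    "news": make_entry("information", 1.0, "linear"),
    "casino": make_entry("invalid", 4.0, "linear", enabled=False),
}

# Assumed calculation rules; the source does not specify them.
RULES = {
    "linear": lambda points, freq: points * freq,
    "capped": lambda points, freq: points * min(freq, 3),
}

def score_keywords(html, dictionary):
    """Update frequency and score in place for every enabled keyword."""
    for keyword, entry in dictionary.items():
        if not entry["enabled"]:
            continue  # disabled keywords take no part in the calculation
        entry["frequency"] = len(re.findall(re.escape(keyword), html, re.IGNORECASE))
        entry["score"] = RULES[entry["rule"]](entry["points"], entry["frequency"])
    return dictionary
```

Keeping the "enabled" flag inside each entry means a keyword can be switched off during debugging without deleting its assigned score or rule.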
S322: Classifying the html document according to the service types to which the classification keywords belong, summing the scores for each service type to obtain the initial scores of the html document for the three service types, and subtracting the standard score corresponding to each service type from the initial score for that service type to obtain the final score of the html document for that service type.
The number of keywords, the assigned scores and the calculation rules differ between service types. The information type in particular covers a wider range and has more classification keywords, so its score tends to be higher, which would give it an unfair advantage in the subsequent comparison. Setting standard scores therefore reduces the influence of differences in the number of classification keywords on the result, and also prevents a website that does not reach the standard score from being judged as the target service type merely because a very small number of classification keywords appear. The classification keywords may be identical to, overlap with, or differ from the threshold keywords.
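The S322 arithmetic (final score = sum of keyword scores for a service type minus that type's standard score) can be sketched as follows. The entry layout, the service-type labels and the standard-score values are hypothetical.

```python
def final_scores(scored_entries, standard_scores):
    """Sum enabled keyword scores per service type, then subtract the standard score.

    scored_entries: iterable of dicts with "enabled", "type" and "score" keys.
    standard_scores: {service_type: standard score}; the values are assumptions.
    """
    initial = {service_type: 0.0 for service_type in standard_scores}
    for entry in scored_entries:
        if entry["enabled"]:
            initial[entry["type"]] += entry["score"]
    return {t: initial[t] - standard_scores[t] for t in standard_scores}

# Worked example with made-up numbers.
entries = [
    {"enabled": True, "type": "target", "score": 2.0},
    {"enabled": True, "type": "target", "score": 9.0},
    {"enabled": True, "type": "information", "score": 6.0},
    {"enabled": False, "type": "invalid", "score": 8.0},
]
standards = {"target": 5.0, "information": 8.0, "invalid": 3.0}
example = final_scores(entries, standards)
# target: 11.0 - 5.0 = 6.0; information: 6.0 - 8.0 = -2.0; invalid: 0.0 - 3.0 = -3.0
```

In this example the information score falls below zero because it does not reach its standard score, which is the effect the standard score is meant to have on service types with many keywords.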
S33: judging the service type carried out by the target object based on the classification result of the html document and the final score on each service type.
S33 specifically includes:
if the final score of the target service type is the highest, judging whether the final score of the target service type is greater than 0; if so, judging that the service type carried out by the target object is the target service type; if not, judging that it is an invalid sample;
if the final score of information is the highest, judging whether the final score of the target service type is greater than 0; if so, judging that the service type carried out by the target object is foreign exchange information; if not, judging that it is general information;
and if the final score of the invalid sample is the highest, judging that the service type carried out by the target object is an invalid sample.
The target service type is the service type on which the monitoring focuses, here the service type related to illegal internet foreign exchange deposit transactions. The information type mainly refers to media websites and comprises foreign exchange information websites and general information websites. Foreign exchange information websites have broad content and often introduce the target service type, so their service-distinguishing keywords overlap heavily with those of target service type websites; they are therefore classified separately and distinguished by their information characteristics. The invalid sample type refers to websites unrelated to the target service type, such as gaming websites. Besides websites that carry out services related to illegal internet foreign exchange deposit transactions, foreign exchange information websites also need attention. Therefore, if an information website has a high score on the target service type (in S33, a final score greater than 0), it is recorded as a foreign exchange information service (that is, information about illegal internet foreign exchange deposit transactions, as distinct from a general information service) and proceeds to the next step together with the illegal internet foreign exchange deposit transaction service websites, whereas websites judged to be general information or invalid samples are only recorded for summary data.
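The decision rules of S33 can be written as a short Python sketch. The function name, the dictionary keys, and the returned labels are hypothetical. The source does not say how ties between types are resolved; here `max` simply returns the first key in dictionary order, which is an assumption.

```python
def judge_service_type(final):
    """Apply the S33 rules to a dict of final scores with the
    (assumed) keys 'target', 'information' and 'invalid'."""
    top = max(final, key=final.get)  # tie-breaking is an assumption
    if top == "target":
        return "target" if final["target"] > 0 else "invalid"
    if top == "information":
        # An information site that also scores on the target type is
        # recorded as foreign exchange information.
        return "forex_information" if final["target"] > 0 else "general_information"
    return "invalid"

print(judge_service_type({"target": 2.0, "information": 7.0, "invalid": -4.0}))
```

Note that a website whose highest score belongs to the target type is still judged an invalid sample when that score is not above 0, which is the case the standard score is meant to catch.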
Fig. 4 is a schematic diagram of the overall operation of the method for monitoring an illegal internet foreign exchange deposit transaction platform according to an embodiment of the present invention. As shown in Fig. 4, after step S3 the method further includes:
S4: judging whether the service type carried out by the target object is the target service type; if so, executing S5; if not, recording the service type carried out by the target object.
S5: introducing third-party interface data to supplement third-party related information of the target object, wherein the third-party related information comprises ICP (Internet Content Provider) filing information and IP address information; and extracting and analyzing the html document to obtain website related information of the target object, wherein the website related information comprises copyright information and the ICP filing information displayed on the webpage. (Some webpages display expired ICP filing information that cannot be obtained through a third-party data interface, and the displayed ICP filing information is of some value in tracing the operating entity.)
S6: summarizing the process data from S1 to S5 and storing it in a database. The process data refers to the recorded results for all websites involved in the above steps, and is used for subsequent display or export.
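As an illustration of the html extraction in S5, the following Python sketch pulls a displayed ICP filing number and a copyright notice out of a page with regular expressions. Both patterns, the function name, and the sample markup are assumptions; the source does not describe how the extraction is implemented, and real pages vary widely.

```python
import re

# Assumed shape of a displayed ICP filing number, e.g. "京ICP备12345678号-1".
ICP_PATTERN = re.compile(r"[\u4e00-\u9fff]ICP[备证]\d+号(?:-\d+)?")
# Assumed shape of a copyright notice: from "Copyright" or the © sign to the next tag.
COPYRIGHT_PATTERN = re.compile(r"(?:Copyright|©|&copy;)[^<]*", re.IGNORECASE)

def extract_site_info(html):
    """Return the displayed ICP filing number and copyright notice, or None."""
    icp = ICP_PATTERN.search(html)
    notice = COPYRIGHT_PATTERN.search(html)
    return {
        "icp_filing": icp.group(0) if icp else None,
        "copyright": notice.group(0).strip() if notice else None,
    }

# Hypothetical footer markup for demonstration.
sample = '<footer>Copyright © 2020 Example Ltd. <a href="#">京ICP备12345678号-1</a></footer>'
info = extract_site_info(sample)
print(info)
```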
Preferably, the training process of the preset illegal internet foreign exchange deposit transaction service discrimination model specifically includes:
selecting a plurality of preset samples as target objects, respectively inputting the plurality of preset samples into the illegal internet foreign exchange deposit transaction service discrimination model, executing steps S31 to S33, and outputting the service types of the plurality of preset samples;
counting the accuracy of the output service types of the preset samples, taking the actual service types of the preset samples as reference;
and correcting the preset threshold keywords and the content of the preset keyword dictionary of the illegal internet foreign exchange deposit transaction service discrimination model according to the accuracy.
The preset illegal internet foreign exchange deposit transaction service discrimination model is trained iteratively: the errors found in the training samples are summarized, the model is adjusted accordingly, and the training process is repeated until the accuracy reaches the standard, so that the method for monitoring the illegal internet foreign exchange deposit transaction platform achieves a certain accuracy.
Because the websites were collected as suspicious in the first place, the keywords of the keyword library used for searching often cannot distinguish service types. For example, the html document of a website found by searching for "foreign exchange" generally contains the keyword "foreign exchange". If "foreign exchange" were assigned to the target service type with a large score in the model, it would not help to distinguish service types and could wrongly bias the classification result toward the target service type. The selection and scoring of model keywords therefore require careful consideration.
In addition, the calculation rule for a keyword's score can differ from keyword to keyword, depending on how the keyword discriminates. For example, "financial" leans toward the information type, but a target service type website may also have a news and information section, so a moderate upper limit should be set on the score of this word. "United Nations" leans more strongly toward the information type, and the more often it appears, the more likely the website is an information website, so its score can be tied to its frequency.
Finally, some keywords may carry more than one meaning: when such a keyword appears only a few times it favors the target service type, and when it appears very many times it favors the information type or the invalid sample type.
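The per-keyword calculation rules described above (a capped score for a word such as "financial", a frequency-dependent score for a word such as "United Nations") might be sketched in Python as follows. The `KeywordEntry` fields loosely mirror the dictionary values listed for the keyword dictionary (enabled flag, service type, assigned value, calculation rule), but the rule names `fixed`, `per_occurrence` and `capped`, the `cap` field, and all numbers are assumptions. The multi-meaning case is not covered by this sketch.

```python
from dataclasses import dataclass

@dataclass
class KeywordEntry:
    enabled: bool
    service_type: str
    value: float        # assigned value
    rule: str           # 'fixed', 'per_occurrence' or 'capped' (hypothetical names)
    cap: float = 0.0    # only used by the 'capped' rule

def keyword_score(entry, frequency):
    """Compute one keyword's score from its assigned value, its
    occurrence frequency and its calculation rule."""
    if not entry.enabled or frequency == 0:
        return 0.0
    if entry.rule == "fixed":
        return entry.value                     # presence alone counts
    if entry.rule == "per_occurrence":
        return entry.value * frequency         # score grows with frequency
    if entry.rule == "capped":
        return min(entry.value * frequency, entry.cap)
    raise ValueError(f"unknown rule: {entry.rule}")

# Illustrative entries; values are not from the source.
financial = KeywordEntry(True, "information", 2.0, "capped", cap=6.0)
united_nations = KeywordEntry(True, "information", 1.5, "per_occurrence")
print(keyword_score(financial, 10), keyword_score(united_nations, 4))
```

Storing the rule as data in each entry means that correcting the model after a training round only requires editing the dictionary, not the scoring code.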
The service type discrimination model is optimized for accuracy. Thousands of website html documents are selected for training, and accuracy is measured against the results of manually checking the websites. Results are classified as correct or erroneous, and errors are further divided into general errors and serious errors. A correct result means the service type was judged without error; a general error means that information and invalid sample were confused with each other; a serious error means that the target service type was judged as a non-target service type (information or invalid sample), or that a non-target service type was misjudged as the target service type. To prevent overfitting, the finally selected model has an accuracy above 80% and a serious error rate within 5%.
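The three result categories can be computed as in the Python sketch below. The function names and type labels are hypothetical, and the sample pairs are invented for illustration, not data from the source.

```python
def error_kind(predicted, actual):
    """Classify one result as 'correct', 'general' or 'serious'."""
    if predicted == actual:
        return "correct"
    # Any confusion that involves the target type is a serious error.
    if predicted == "target" or actual == "target":
        return "serious"
    return "general"  # information and invalid sample confused

def evaluate(pairs):
    """Return (accuracy, serious_error_rate) for (predicted, actual) pairs."""
    counts = {"correct": 0, "general": 0, "serious": 0}
    for predicted, actual in pairs:
        counts[error_kind(predicted, actual)] += 1
    total = len(pairs)
    return counts["correct"] / total, counts["serious"] / total

# Invented sample: three correct, one general error, one serious error.
pairs = [
    ("target", "target"),
    ("information", "invalid"),
    ("invalid", "target"),
    ("information", "information"),
    ("invalid", "invalid"),
]
accuracy, serious_rate = evaluate(pairs)
print(accuracy, serious_rate)
```

A model would be accepted under the stated criteria only if `accuracy` exceeds 0.8 and `serious_rate` is at most 0.05.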
In this embodiment, the monitoring objects, namely the list of suspicious websites, are collected by automatically searching with the keywords in the preset keyword library, regularly crawling foreign exchange information websites, and importing designated website lists. The pre-trained illegal internet foreign exchange deposit transaction service discrimination model then identifies websites suspected of carrying out illegal internet foreign exchange deposit transaction services, which improves both identification accuracy and monitoring efficiency. In practical tests, the embodiment operated well and stably overall; compared with methods used in the prior art, accuracy is greatly improved and the burden of manual checking is greatly reduced.
Fig. 5 is a schematic diagram of a device for monitoring an illegal internet foreign exchange deposit transaction platform according to an embodiment of the present invention. As shown in Fig. 5, the device includes:
the monitoring object acquisition module 10 is used for acquiring monitoring objects, and the monitoring objects comprise a plurality of websites and html documents thereof;
the target object screening module 20 is connected with the monitored object acquisition module 10 and is used for screening the target object from the monitored objects according to preset conditions;
a service type discriminating module 30, connected to the target object screening module 20, configured to input the html document of the target object into a preset illegal internet foreign exchange deposit transaction service discriminating model, and discriminate the service type developed by the target object, which specifically includes:
S31: determining, through regular expression matching, whether a preset threshold keyword exists in the input html document; if yes, executing S32; if not, outputting the service type as an invalid sample;
S32: classifying and scoring the html document according to a preset keyword dictionary, and calculating the final score of the html document on each service type according to the standard score corresponding to each service type, wherein the service types comprise a target service type, information, and an invalid sample;
S33: judging the service type carried out by the target object based on the classification result of the html document and the final score on each service type.
The working principle of the device for monitoring the illegal internet foreign exchange deposit transaction platform according to the embodiment of the present application is corresponding to the method for monitoring the illegal internet foreign exchange deposit transaction platform according to the embodiment, and the details are not repeated here.
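The threshold check of step S31 amounts to a single regular expression search over the html document. A minimal Python sketch follows; the function name and the example keywords are assumptions, since the source does not list the actual threshold keywords.

```python
import re

def passes_threshold(html, threshold_keywords):
    """Return True if any threshold keyword occurs in the html document.
    A document that fails this check is output as an invalid sample."""
    if not threshold_keywords:
        return False  # an empty pattern would otherwise match everything
    pattern = re.compile("|".join(re.escape(k) for k in threshold_keywords))
    return pattern.search(html) is not None

# Hypothetical threshold keywords for demonstration.
keywords = ["leverage", "MT4"]
print(passes_threshold("<p>margin trading, leverage 1:400</p>", keywords))
print(passes_threshold("<p>cooking recipes</p>", keywords))
```

Escaping each keyword with `re.escape` keeps characters such as `.` or `+` in a keyword from being interpreted as regular expression syntax.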
Fig. 6 illustrates the physical structure of an electronic device, which may include a processor 810, a communication interface 820, a memory 830, and a communication bus 840, wherein the processor 810, the communication interface 820, and the memory 830 communicate with one another via the communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform a method of monitoring an illegal internet foreign exchange deposit transaction platform, the method comprising:
S1: collecting monitoring objects, wherein the monitoring objects comprise a plurality of websites and html documents thereof;
S2: screening a target object from the monitoring objects through a preset condition;
S3: inputting the html document of the target object into a preset illegal internet foreign exchange deposit transaction service discrimination model, and discriminating the service type carried out by the target object, specifically comprising:
S31: determining, through regular expression matching, whether a preset threshold keyword exists in the input html document; if yes, executing S32; if not, outputting the service type as an invalid sample;
S32: classifying and scoring the html document according to a preset keyword dictionary, and calculating the final score of the html document on each service type according to the standard score corresponding to each service type, wherein the service types comprise a target service type, information, and an invalid sample;
S33: judging the service type carried out by the target object based on the classification result of the html document and the final score on each service type.
In addition, the logic instructions in the memory 830 may be implemented as software functional units and, when sold or used as independent products, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
In another aspect, an embodiment of the present invention further provides a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions which, when executed by a computer, enable the computer to execute a method for monitoring an illegal internet foreign exchange deposit transaction platform, the method including:
S1: collecting monitoring objects, wherein the monitoring objects comprise a plurality of websites and html documents thereof;
S2: screening a target object from the monitoring objects through a preset condition;
S3: inputting the html document of the target object into a preset illegal internet foreign exchange deposit transaction service discrimination model, and discriminating the service type carried out by the target object, specifically comprising:
S31: determining, through regular expression matching, whether a preset threshold keyword exists in the input html document; if yes, executing S32; if not, outputting the service type as an invalid sample;
S32: classifying and scoring the html document according to a preset keyword dictionary, and calculating the final score of the html document on each service type according to the standard score corresponding to each service type, wherein the service types comprise a target service type, information, and an invalid sample;
S33: judging the service type carried out by the target object based on the classification result of the html document and the final score on each service type.
In yet another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, performing a method for monitoring an illegal internet foreign exchange deposit transaction platform, the method including:
S1: collecting monitoring objects, wherein the monitoring objects comprise a plurality of websites and html documents thereof;
S2: screening a target object from the monitoring objects through a preset condition;
S3: inputting the html document of the target object into a preset illegal internet foreign exchange deposit transaction service discrimination model, and discriminating the service type carried out by the target object, specifically comprising:
S31: determining, through regular expression matching, whether a preset threshold keyword exists in the input html document; if yes, executing S32; if not, outputting the service type as an invalid sample;
S32: classifying and scoring the html document according to a preset keyword dictionary, and calculating the final score of the html document on each service type according to the standard score corresponding to each service type, wherein the service types comprise a target service type, information, and an invalid sample;
S33: judging the service type carried out by the target object based on the classification result of the html document and the final score on each service type.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for monitoring an illegal Internet foreign exchange deposit transaction platform is characterized by comprising the following steps:
S1: collecting monitoring objects, wherein the monitoring objects comprise a plurality of websites and html documents thereof;
S2: screening a target object from the monitoring objects through a preset condition;
S3: inputting the html document of the target object into a preset illegal internet foreign exchange deposit transaction service discrimination model, and discriminating the service type developed by the target object, specifically comprising:
S31: searching and matching whether preset threshold keywords exist in the input html document through regular matching; if yes, go to S32; if not, the output service type is an invalid sample;
S32: classifying and scoring the html documents according to a preset keyword dictionary, and calculating the final score of the html documents in each service type according to the standard score corresponding to each service type; wherein, the service type comprises a target service type, information and an invalid sample;
S33: judging the business type developed by the target object based on the classification result of the html document and the final score of each business type.
2. The method for monitoring illegal internet foreign exchange deposit transaction platform according to claim 1, wherein the step S1 specifically comprises:
calling a search engine according to a preset keyword library, and crawling a monitored object, wherein the preset keyword library comprises preset keywords and newly added keywords;
or crawling an information website related to the foreign exchange deposit transaction website as a monitoring object;
or the input websites or the batch imported websites are used as monitoring objects.
3. The method for monitoring illegal internet foreign exchange deposit transaction platform according to claim 1, wherein the step S2 specifically comprises:
S21: screening the monitoring objects by using a white list, and eliminating websites belonging to the white list to obtain residual monitoring objects;
S22: crawling html documents of the remaining monitoring objects, and eliminating abnormal websites according to the acquisition condition and the analysis of html document contents to obtain the target object.
4. The method for monitoring illegal internet foreign exchange deposit transaction platform according to claim 1, wherein the step S32 specifically comprises:
s321: classifying and scoring the html documents according to a preset keyword dictionary, wherein the preset keyword dictionary takes classified keywords as indexes, and the values of the classified keywords comprise whether the classified keywords are enabled, which service types the classified keywords belong to, assigned values, used calculation rules and occurrence frequencies, and scores calculated according to the assigned values, the occurrence frequencies and the used calculation rules;
s322: classifying the html documents according to the service types to which the classification keywords belong, respectively summing up the scores of each service type to obtain initial scores of the html documents on the three service types, and subtracting the standard scores corresponding to the service types from the initial scores on the service types to obtain the final scores of the html documents on the service types.
5. The method for monitoring illegal internet foreign exchange deposit transaction platform according to claim 1, wherein the step S33 specifically comprises:
if the final score of the target service type is the highest, judging whether the final score of the target service type is greater than 0, if so, judging that the service type developed by the target object is the target service type, and if not, judging that the service type developed by the target object is an invalid sample;
if the final score of the information is the highest, judging whether the final score of the target business type is larger than 0, if so, judging that the business type developed by the target object is foreign exchange information, and if not, judging that the business type developed by the target object is general information;
and if the final score of the invalid sample is the highest, judging the type of the business developed by the target object as the invalid sample.
6. The method for monitoring illegal internet foreign exchange deposit transaction platform according to claim 1, wherein said step S3 is followed by further comprising:
S4: judging whether the service type developed by the target object is a target service type, if so, executing S5; if not, recording the service type developed by the target object;
S5: introducing third party interface data to supplement third party related information of the target object, wherein the third party related information comprises ICP filing information and IP address information; extracting and analyzing the html document to obtain website related information of the target object, wherein the website related information comprises copyright information and ICP filing information displayed by a webpage;
S6: summarizing the process data from the S1 to the S5 and storing the process data in a database.
7. The method for monitoring the illegal internet foreign exchange deposit transaction platform according to any one of claims 1 to 6, wherein the training process of the preset illegal internet foreign exchange deposit transaction service discrimination model specifically comprises:
selecting a plurality of preset samples as target objects, respectively inputting the plurality of preset samples into the illegal internet foreign exchange deposit transaction service discrimination model, executing the steps from S31 to S33, and outputting service types of the plurality of preset samples;
counting the accuracy of the service types of the output preset samples by taking the actual service types of the preset samples as reference;
and correcting the content of a preset threshold keyword and a preset keyword dictionary of the illegal internet foreign exchange deposit transaction service discrimination model according to the accuracy.
8. A device for monitoring illegal Internet foreign exchange deposit transaction platform is characterized by comprising:
the monitoring object acquisition module is used for acquiring monitoring objects, and the monitoring objects comprise a plurality of websites and html documents thereof;
the target object screening module is connected with the monitoring object acquisition module and used for screening the target object from the monitoring objects according to preset conditions;
a service type discrimination module, connected to the target object screening module, and configured to input the html document of the target object to a preset illegal internet foreign exchange deposit transaction service discrimination model, and discriminate the service type developed by the target object, which specifically includes:
S31: searching and matching whether preset threshold keywords exist in the input html document through regular matching; if yes, go to S32; if not, the output service type is an invalid sample;
S32: classifying and scoring the html documents according to a preset keyword dictionary, and calculating the final score of the html documents in each service type according to the standard score corresponding to each service type; wherein, the service type comprises a target service type, information and an invalid sample;
S33: judging the business type developed by the target object based on the classification result of the html document and the final score of each business type.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the method of monitoring an illegal internet foreign exchange deposit transaction platform according to any of claims 1-7.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the method of monitoring an illegal internet foreign exchange deposit trading platform according to any one of claims 1-7.
CN202011523198.3A 2020-10-19 2020-12-21 Method and device for monitoring illegal internet foreign exchange deposit transaction platform Withdrawn CN112417329A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2020111212521 2020-10-19
CN202011121252 2020-10-19

Publications (1)

Publication Number Publication Date
CN112417329A true CN112417329A (en) 2021-02-26

Family

ID=74782437

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011523198.3A Withdrawn CN112417329A (en) 2020-10-19 2020-12-21 Method and device for monitoring illegal internet foreign exchange deposit transaction platform

Country Status (1)

Country Link
CN (1) CN112417329A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113129120A (en) * 2021-04-16 2021-07-16 建信金融科技有限责任公司 Financial institution data supervision method and device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100082673A1 (en) * 2008-09-30 2010-04-01 Kabushiki Kaisha Toshiba Apparatus, method and program product for classifying web browsing purposes
US20100293062A1 (en) * 2009-05-14 2010-11-18 Rajan Lukose Advertisement selection based on key words
CN106484919A (en) * 2016-11-15 2017-03-08 任子行网络技术股份有限公司 A kind of industrial sustainability sorting technique based on webpage autonomous word and system
CN106960063A (en) * 2017-04-20 2017-07-18 广州优亚信息技术有限公司 A kind of internet information crawl and commending system for field of inviting outside investment
CN107766481A (en) * 2017-10-13 2018-03-06 国家计算机网络与信息安全管理中心 A kind of method and system for finding internet financial platform
CN109274632A (en) * 2017-07-12 2019-01-25 中国移动通信集团广东有限公司 A kind of recognition methods of website and device
CN109308330A (en) * 2018-07-24 2019-02-05 国家计算机网络与信息安全管理中心 The method of enterprise's leakage information extraction, analysis and classification Internet-based
CN110413908A (en) * 2018-04-26 2019-11-05 维布络有限公司 The method and apparatus classified based on web site contents to uniform resource locator
CN110825998A (en) * 2019-08-09 2020-02-21 国家计算机网络与信息安全管理中心 Website identification method and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210226
