CN115688107B - Fraud-related APP detection system and method - Google Patents
- Publication number: CN115688107B (application number CN202211692329.XA)
- Authority: CN (China)
- Prior art keywords
- fraud
- app
- module
- information
- monitoring module
- Legal status: Active (as listed by Google, which notes that the status is an assumption and not a legal conclusion)
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
A fraud-related APP (application) detection system and method detects whether an APP running on a smart device is fraud-related. The system comprises an anti-fraud monitoring module, which includes a characteristic data information monitoring module, a screen information monitoring module and a result output module. The characteristic data information monitoring module identifies first-stage suspected fraud-related APPs from AndroidManifest information and application names, and determines second-stage suspected fraud-related APPs by comparing the first-stage candidates against the signature certificates of whitelisted genuine APPs. Image recognition is performed on interface images of the running APP, text information is extracted, and the text is analyzed to obtain a likelihood value that the APP is fraud-related. The result output module outputs a list of APPs with a high likelihood of fraud involvement. In short, suspected fraud-related APP samples are screened by 'AndroidManifest feature matching + application name similarity comparison + whitelist genuine-APP signature certificate information filtering'; text is then extracted from web page screenshots by OCR, and an algorithm judges whether a fraud web page is involved.
Description
Technical Field
The application belongs to the technical field of computer security, and particularly relates to a fraud-related APP detection system and method.
Background
AndroidManifest.xml is a required file in every Android application. It is located in the root directory of the project and describes the components exposed by the package (activities, services, etc.), their implementation classes, the data they can handle, and where they are launched. Besides declaring the Activities, Content Providers, Services and Intent Receivers of the program, it can also specify permissions and instrumentation.
TF-IDF (Term Frequency-Inverse Document Frequency) is a statistic mainly used to estimate how important a word is to a document within a collection of documents.
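As a reference point, one common TF-IDF formulation is shown below. The patent does not say which variant it uses, so the count-normalized term frequency and the +1 smoothing in the IDF denominator are assumptions. Here n_{t,d} is the number of occurrences of term t in document d, N is the number of documents in the collection, and df(t) is the number of documents that contain t.

```latex
\mathrm{tf}(t,d) = \frac{n_{t,d}}{\sum_{k} n_{k,d}}, \qquad
\mathrm{idf}(t) = \log \frac{N}{1 + \mathrm{df}(t)}, \qquad
\mathrm{tfidf}(t,d) = \mathrm{tf}(t,d) \cdot \mathrm{idf}(t)
```

A term scores highly when it is frequent in one page but rare across the collection, which is why fraud-specific vocabulary stands out against ordinary web page text.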
In recent years, fraud using APPs has become one of the main criminal means in telecommunication fraud cases. Phishing APPs for online part-time order brushing and quick loans are especially common, and APPs imitating banks and financial platforms are particularly confusing and deceptive.
Such fraud-related APPs are usually built from 'third-party mobile application rapid development platform framework code + an integrated H5 website domain name', so their development cost is extremely low. Moreover, the fraud is carried out mainly through the integrated H5 web pages: there is almost no malicious static code, no sensitive permission, and no malicious behavior such as sending SMS messages or reading the contact list. Common mobile malware detection techniques based on static code and dynamic behavior analysis therefore cannot identify these APPs effectively.
At present, common methods for detecting malicious mobile applications include static code analysis (e.g., Chinese patent application No. 202011536663.7), dynamic behavior analysis (e.g., Chinese patent application No. 201310309568.7), and combined static code and dynamic behavior analysis (e.g., Chinese patent application No. 201910968202.8).
Detection based on static code analysis has the following defect: for a fraud-related APP built from third-party mobile application rapid development platform framework code plus an integrated H5 website domain name, only the platform's framework code can be scanned. That code also appears in legitimate applications built on the same platform, so no malicious static code feature specific to the fraud-related APP can be extracted, and the APP cannot be identified.
Detection based on dynamic behavior analysis has the following defect: fraud-related APPs built with 'third-party mobile application rapid development platform framework code + integrated H5 website domain name' generally carry out the fraud through H5 web pages. For example, a fake loan APP induces the victim to upload sensitive personal data through an integrated fake loan H5 page, then communicates with the victim through an integrated chat page and induces the victim to transfer money under pretexts such as paying a loan guarantee deposit. In this scenario the APP shows no malicious behavior such as sending SMS messages or stealing the contact list, so dynamic behavior analysis cannot detect it effectively.
Detection based on combined static code and dynamic behavior analysis has the following defect: for a fraud-related APP built from 'third-party mobile application rapid development platform framework code + integrated H5 website domain name', neither static code features nor dynamic behavior features can be extracted, so the APP cannot be detected effectively.
Disclosure of Invention
To solve these problems, suspected fraud-related APP samples are first screened by 'AndroidManifest feature matching + application name similarity comparison + whitelist genuine-APP signature certificate information filtering'. Web page screenshots are then taken, text is extracted from them by OCR, and an algorithm judges whether the web page text is fraud-related, giving the ability to judge fraud-related APPs automatically.
The technical solution to the above problem is a fraud-related APP detection system for detecting whether an APP is a fraud-related APP developed with 'third-party mobile application rapid development platform framework code + integrated H5 website domain name' technology. The system comprises an anti-fraud monitoring module, which includes a characteristic data information monitoring module, a screen information monitoring module and a result output module. The characteristic data information monitoring module identifies first-stage suspected fraud-related APPs according to AndroidManifest information and/or application names, compares and filters them against whitelisted genuine-APP signature certificates, and determines second-stage suspected fraud-related APPs. The screen information monitoring module captures the screen of a running second-stage suspected fraud-related APP to obtain interface images, performs image recognition on them, extracts text information, and analyzes the text to obtain a likelihood value that the APP is fraud-related. The screen information monitoring module comprises a screen capture module and an image recognition and analysis module: the screen capture module outputs prompt information so that the user can trigger screen capture manually, and records or captures the interface of the running APP; the image recognition and analysis module performs image recognition on the captured APP interface images;
the result output module outputs a list of APPs with a high likelihood of fraud involvement.
The technical solution may also be a fraud-related APP detection system for detecting whether an APP is a fraud-related APP developed with 'third-party mobile application rapid development platform framework code + integrated H5 website domain name' technology, comprising an anti-fraud monitoring module, which includes a characteristic data information monitoring module, a screen information monitoring module and a result output module;
the characteristic data information monitoring module identifies first-stage suspected fraud-related APPs according to AndroidManifest information and/or application names, compares and filters them against whitelisted genuine-APP signature certificates, and determines second-stage suspected fraud-related APPs;
the screen information monitoring module captures the screen of a second-stage suspected fraud-related APP to obtain interface images of the running APP, performs image recognition on them, extracts text information, and analyzes the text to obtain a likelihood value that the APP is fraud-related; the result output module outputs a list of APPs with a high likelihood of fraud involvement;
the screen information monitoring module comprises a screen capture module and an image recognition and analysis module; the screen capture module records or captures the interface of the running APP, and the image recognition and analysis module performs image recognition on the captured APP interface images;
the anti-fraud monitoring module tests more than two APPs according to an input test list, identifies first-stage suspected fraud-related APPs by screening application names against configured keywords, compares and filters them against whitelisted genuine-APP signature certificates, and determines second-stage suspected fraud-related APPs; the text analysis algorithm comprises TF-IDF, WORD2VEC and/or BERT.
The screen information monitoring module may also comprise a screen capture module and an image recognition and analysis module: the screen capture module records or captures the interface of the running APP, the image recognition and analysis module performs image recognition on the captured APP interface images, and the screen capture module outputs prompt information, which may be a pop-up window, a floating window or a fixed operating button, so that the user can trigger screen capture manually.
The image recognition and analysis module may comprise a text information extraction module, a word segmentation module, a fraud-related web page TF-IDF feature dictionary module, a TF-IDF vector calculation module and a classification machine learning module. The text information extraction module processes the image recognition output to obtain text information; the word segmentation module splits the text into phrases; the TF-IDF vector calculation module computes TF-IDF vectors for the phrases according to the fraud-related web page TF-IDF feature dictionary module; and the classification machine learning module processes the phrase TF-IDF vectors to obtain a likelihood value that the APP is fraud-related.
The fraud-related web page TF-IDF feature dictionary module may also update the TF-IDF feature dictionary through a network server.
The characteristic data information monitoring module may also comprise a to-be-detected sample information extraction module and a whitelist genuine-APP signature certificate feature comparison module.
The whitelist genuine-APP signature certificate feature comparison module may also update the whitelist digital certificate features through a network server.
The technical solution may also be a fraud-related APP detection method for detecting whether an APP is a fraud-related APP developed with 'third-party mobile application rapid development platform framework code + integrated H5 website domain name' technology, comprising:
installing the fraud-related APP detection system directly on the smart device with the user's permission or through the user's manual operation;
step 100: identifying first-stage suspected fraud-related APPs according to AndroidManifest information, application names and/or signature certificates, and comparing them against whitelisted genuine-APP signature certificates to determine second-stage suspected fraud-related APPs;
step 200: running the second-stage suspected fraud-related APP and outputting prompt information so that the user can trigger screen capture manually, obtaining interface images of the running APP, performing image recognition on them, extracting text information, and analyzing the text to obtain a likelihood value that the APP is fraud-related.
For centralized testing, a list of APPs with a high likelihood of fraud involvement may be output.
The technical solution may also be that step 100 includes:
step 110: acquiring the AndroidManifest information and/or application name of the sample to be detected;
step 120: acquiring the signature certificate of the sample to be detected, where the certificate information includes the owner, validity start time, validity end time and/or serial number;
step 130: determining first-stage suspected fraud-related APPs based on an AndroidManifest matching rule feature library and an application name matching rule feature library;
step 140: comparing against and filtering by whitelisted genuine-APP signature certificates, excluding whitelisted samples, and determining second-stage suspected fraud-related APPs.
The technical solution may further be that step 200 includes:
step 210: capturing the screen of the running APP to obtain interface images of the running APP;
step 220: performing image recognition on the interface images and extracting text information;
step 230: segmenting the text information into phrases and analyzing the phrases to obtain a likelihood value that the APP is fraud-related, where the analysis algorithm comprises TF-IDF, WORD2VEC and/or BERT.
Step 230 may further include:
step 231: computing TF-IDF vectors for the phrases according to the fraud-related web page TF-IDF feature dictionary;
step 232: processing the phrase TF-IDF vectors with a classification machine learning model to obtain a likelihood value that the APP is fraud-related;
step 233: outputting a list of APPs with a high likelihood of fraud involvement.
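The patent does not name the word segmenter used in step 230. As an illustration only, the sketch below uses forward maximum matching, a classic dictionary-based technique for segmenting Chinese text; the function name `fmm_segment`, the sample dictionary and the 4-character window are hypothetical choices, and a trained segmenter could be used instead.

```python
from __future__ import annotations


def fmm_segment(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Forward maximum matching: at each position take the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        # Try the widest window first and shrink it until a dictionary hit,
        # falling back to a single character when nothing matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical fraud-vocabulary dictionary
print(fmm_segment("缴纳保证金贷款", {"缴纳", "保证金", "贷款"}))
```

With that dictionary the text splits into the phrases 缴纳 / 保证金 / 贷款, which are then the inputs to the TF-IDF calculation in step 231.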
A readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described fraud-related APP detection method.
The first technical effect of this solution: first-stage suspected fraud-related APPs are preliminarily screened out by AndroidManifest information and application names, and legitimate software is then excluded using whitelisted genuine-APP signature certificates to obtain second-stage suspected fraud-related APPs. This greatly reduces the work of running APPs to capture interface images and speeds up detection.
The second technical effect: the anti-fraud monitoring module can be installed directly on the user's mobile phone or smart device with the user's permission or through the user's manual operation, which dispels any suspicion that it covertly captures the phone's screen in the background.
The third technical effect: the image recognition and analysis module comprises a text information extraction module and a word segmentation module, so the information on H5 web pages can be extracted and fraud-related APPs of the H5 web page type can be recognized and detected.
The fourth technical effect: with the automated test framework, APPs can be tested in batches.
The fifth technical effect: legitimate software is excluded using whitelisted genuine-APP signature certificates to obtain second-stage suspected fraud-related APPs, which greatly reduces the work of capturing interface images of running APPs and speeds up detection.
The sixth technical effect: the TF-IDF feature dictionary is updated through the network server, so the latest feature dictionary is available and the anti-fraud monitoring module can target the latest key vocabulary in real time.
The seventh technical effect: the whitelisted genuine-APP signature certificate features are updated through the network server, so APPs of legitimate financial institutions can be excluded promptly.
Drawings
FIG. 1 is a schematic block diagram of a fraud-related APP detection system;
FIG. 2 is a schematic block diagram including an Appium automated test framework;
FIG. 3 is a schematic block diagram of the anti-fraud monitoring module internal modules;
FIG. 4 is a schematic diagram of the internal modules of the screen information monitoring module;
FIG. 5 is a schematic diagram of the internal modules of the image recognition analysis module;
FIG. 6 is a schematic diagram of the internal modules of the characteristic data information monitoring module;
FIG. 7 is a schematic flow chart of a method for detecting a fraud-related APP;
FIG. 8 is a schematic flow chart illustrating the process of determining suspected fraud-related APPs of the first and second levels;
FIG. 9 is a schematic flow diagram of screen shot information monitoring analysis;
FIG. 10 is a flow chart of the TF-IDF algorithm.
Detailed Description
The present disclosure is described in further detail below with reference to the drawings.
It should be noted that the following description is of the preferred embodiments of the present invention and should not be construed as limiting the invention in any way. The description of the preferred embodiments of the present invention is made merely for the purpose of illustrating the general principles of the invention. The embodiments described in this application are only some embodiments of the invention and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present application, it is to be understood that the terms "center," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," "clockwise," "counterclockwise," and the like are used in the orientations and positional relationships indicated in the drawings for convenience in describing the invention and to simplify the description, and are not intended to indicate or imply that the device or element so referred to must have a particular orientation, be constructed and operated in a particular orientation, and are therefore not to be construed as limiting the invention. Furthermore, the terms "first", "second", and technical features numbered with Arabic numerals 1, 2, 3, etc., and such numbers as "A" and "B", are used for descriptive purposes only and are not intended to represent a temporal or spatial ordering; are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, features defined as "first", "second", and numbered with an arabic numeral 1, 2, 3, etc., may explicitly or implicitly include one or more of the features. In the description of the present invention, "a plurality" means two or more unless specifically limited otherwise.
Referring to FIG. 1, a fraud-related APP detection system running on a smart device includes an anti-fraud monitoring module. As shown in FIG. 3, the anti-fraud monitoring module comprises a characteristic data information monitoring module, a screen information monitoring module and a result output module;
the characteristic data information monitoring module identifies first-stage suspected fraud-related APPs according to AndroidManifest information, application names and/or signature certificates, compares them against whitelisted genuine-APP signature certificates, and determines second-stage suspected fraud-related APPs; the first-stage suspected fraud-related APPs can be found by keyword screening.
The screen information monitoring module captures the screen of a second-stage suspected fraud-related APP to obtain interface images of the running APP, performs image recognition on them, extracts text information, and analyzes the text to obtain a likelihood value that the APP is fraud-related;
the result output module outputs a list of APPs with a high likelihood of fraud involvement.
After image recognition yields the text of the APP's running interface, various methods, such as neural network or other artificial intelligence algorithms, can be used to judge whether the text lures users into taking out loans. The algorithms express their results as a likelihood, for example from 0 to 100%, and the output list of high-likelihood APPs (for example, above 80%) is then judged manually.
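The result-output step described above reduces to a threshold filter. The sketch below is a minimal illustration that uses the example 80% cut-off from the text as its default; the package names in the usage line are hypothetical, and the scores are assumed to come from whichever classification algorithm is chosen.

```python
from __future__ import annotations


def high_likelihood_list(scores: dict[str, float], threshold: float = 0.8) -> list[tuple[str, float]]:
    """Keep APPs whose fraud likelihood exceeds the threshold, highest first, for manual review."""
    flagged = [(pkg, score) for pkg, score in scores.items() if score > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical scores produced by the text classifier
print(high_likelihood_list({"com.example.loan": 0.93, "com.example.bank": 0.12, "com.example.chat": 0.85}))
```

Sorting by likelihood puts the most suspicious APPs at the top of the list that a human reviewer works through.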
Running an APP and capturing its interface takes a long time and a large amount of computation, so not every APP can be tested in a short time. Therefore, first-stage suspected fraud-related APPs are first found by keywords, and APPs of legitimate financial institutions are then excluded using whitelisted genuine-APP signature certificates. This greatly reduces the number of APPs that need image recognition and greatly improves efficiency.
The anti-fraud monitoring module can be a software module embedded in the smart device or an APP installed on it later. It has elevated system permissions, so while other APPs are running it can obtain their information and capture their running interfaces.
The AndroidManifest information and application name of the sample to be detected can be obtained with the aapt tool, and its signature certificate with the keytool tool. The AndroidManifest.xml information is obtained from the APK to be detected with the 'aapt dump xmltree xxx.' command.
The system obtains the application name information (application-label) from the APK to be checked with the 'aapt dump badging xxx.' command.
The system obtains the signature certificate information from the APK to be checked with the 'keytool -printcert -jarfile d:\18i6ic.apk' command; the signature certificate information includes the owner, validity start time, validity end time, serial number and so on.
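The extraction step can be scripted around the two command-line tools. The sketch below is a minimal illustration, assuming `aapt` and `keytool` are on PATH and that keytool prints English field labels (its output is localized, so other locales would need different patterns); only the first signer's certificate is parsed, and the helper names are hypothetical.

```python
import re
import subprocess
from typing import Dict, List, Optional


def run_tool(args: List[str]) -> str:
    """Run aapt or keytool (assumed to be on PATH) and return its standard output."""
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout


def parse_app_label(badging: str) -> Optional[str]:
    """Return the default application-label from `aapt dump badging` output, if present."""
    match = re.search(r"^application-label:'(.*)'$", badging, re.MULTILINE)
    return match.group(1) if match else None


def parse_cert(printcert: str) -> Dict[str, str]:
    """Extract owner, serial number and validity window from `keytool -printcert` output."""
    patterns = {
        "owner": r"^Owner:\s*(.+)$",
        "serial": r"^Serial number:\s*(\S+)$",
        "valid_from": r"^Valid from:\s*(.+?)\s+until:",
        "valid_until": r"until:\s*(.+)$",
    }
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, printcert, re.MULTILINE)  # first signer only
        if match:
            fields[name] = match.group(1).strip()
    return fields
```

Typical use on a hypothetical `sample.apk` would be `parse_app_label(run_tool(["aapt", "dump", "badging", "sample.apk"]))` and `parse_cert(run_tool(["keytool", "-printcert", "-jarfile", "sample.apk"]))`. Keeping parsing separate from execution lets the parsers be tested on captured output without the tools installed.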
The AndroidManifest information and application name information of the sample to be detected are compared against the AndroidManifest matching rule feature library, the application name matching rule feature library and the counterfeited-enterprise genuine APP digital certificate feature library to screen first-stage fraud-related APP samples.
Security experts compile the genuine APP certificate information of commonly counterfeited enterprises and enter it into the counterfeited-enterprise genuine APP digital certificate feature library, forming the whitelist samples.
Based on the AndroidManifest matching rule feature library, the application name matching rule feature library and the counterfeited-enterprise genuine APP digital certificate feature library, the system compares the AndroidManifest information, application name information and signature certificate information of the sample to be detected and screens out suspected fraud-related APP samples. AndroidManifest information is matched by keywords; application names first have punctuation marks and special characters filtered out (current phishing APPs mix punctuation or special characters into their names, such as 'Jing, east, jin, bar') and are then matched by regular expressions; and signature certificates are matched by serial number. If the sample to be detected hits both the AndroidManifest matching rules and the application name matching rules, and its signature certificate does not exist in the counterfeited-enterprise genuine APP digital certificate feature library, it is determined to be a second-stage suspected fraud APP.
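The decision rule above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the manifest keywords, name patterns and whitelist serial numbers are hypothetical placeholders, and "special characters" is approximated as Unicode punctuation, symbol and separator categories.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class Sample:
    manifest_text: str  # decoded AndroidManifest.xml content, e.g. from aapt
    app_name: str       # application-label
    cert_serial: str    # signing certificate serial number


def normalize_name(name: str) -> str:
    """Drop punctuation (P*), symbols (S*) and separators (Z*) mixed into app names."""
    return "".join(ch for ch in name if unicodedata.category(ch)[0] not in "PSZ")


def is_second_stage_suspect(
    sample: Sample,
    manifest_keywords: Iterable[str],
    name_patterns: Iterable[str],
    whitelist_serials: Set[str],
) -> bool:
    manifest_hit = any(kw in sample.manifest_text for kw in manifest_keywords)
    name = normalize_name(sample.app_name)
    name_hit = any(re.search(pattern, name) for pattern in name_patterns)
    if not (manifest_hit and name_hit):
        return False  # not a first-stage suspect: both rule sets must hit
    # First-stage suspect: it stays suspected only if its certificate is not whitelisted.
    return sample.cert_serial not in whitelist_serials
```

For example, a name written as "京·东，金 融" normalizes to "京东金融" before the regular expression is applied, so inserted separators no longer defeat the name rule.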
As shown in FIG. 2, the system further includes an APP automated testing framework in which the APPs run, and the anti-fraud monitoring module tests more than two APPs according to the input test list. The APP automated testing framework can automatically launch and operate many APPs and test them in batches; this mode can be used in dedicated fraud-related APP detection tools. Many automated test frameworks are suitable: any test software that can automatically drive an APP to run can be chosen.
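A minimal sketch of the batch flow, assuming the automated test framework of FIG. 2 is wrapped behind two hypothetical callables: `is_suspect` (the two-stage static screening) and `capture_and_score` (launching the APP through the framework, capturing screenshots, OCR and text scoring). It shows why screening comes first: the expensive callable runs only for suspects.

```python
from typing import Callable, Iterable, List


def batch_detect(
    test_list: Iterable[str],
    is_suspect: Callable[[str], bool],
    capture_and_score: Callable[[str], float],
    threshold: float = 0.8,
) -> List[str]:
    """Screen every APK cheaply; run the costly capture/OCR/scoring only on suspects."""
    flagged = []
    for apk in test_list:
        if not is_suspect(apk):
            continue  # no rule hit or whitelisted: skip the screen-capture stage
        if capture_and_score(apk) > threshold:
            flagged.append(apk)
    return flagged
```

Passing the framework-specific steps in as callables keeps the ordering logic independent of which test framework drives the APPs.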
First-stage suspected fraud-related APPs are preliminarily screened by AndroidManifest information and/or application names; whitelisted genuine-APP signature certificates are then used to filter out legitimate software, yielding second-stage suspected fraud-related APPs. This greatly reduces the work of capturing interface images of running APPs and speeds up detection. The text analysis algorithm can be chosen from several options, including TF-IDF, WORD2VEC and/or BERT.
As shown in FIG. 4, the screen information monitoring module includes a screen capture module and an image recognition and analysis module. The screen capture module records or captures the interface of the running APP, the image recognition and analysis module performs image recognition on the captured APP interface images, and the screen capture module outputs prompt information, which can be a pop-up window or a fixed or floating control button, so that the user can trigger screen capture manually.
If capturing the screen of a user's smart device such as a mobile phone requires elevated permissions, prompt information can be given, for example a notice that the screen is being captured or a pop-up window, so that the user can trigger screen capture manually. The anti-fraud monitoring module can be installed directly on the user's mobile phone or smart device with the user's permission or through the user's manual operation.
As shown in FIG. 5, the image recognition and analysis module includes a text information extraction module, a word segmentation module, a fraud-related web page TF-IDF feature dictionary module, a TF-IDF vector calculation module and a classification machine learning module. The text information extraction module processes the image recognition output to obtain text information; the word segmentation module splits the text into phrases; the TF-IDF vector calculation module computes TF-IDF vectors for the phrases according to the fraud-related web page TF-IDF feature dictionary module; and the classification machine learning module processes the phrase TF-IDF vectors to obtain a likelihood value that the APP is fraud-related.
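The vector calculation and classification stages can be sketched under stated assumptions: the feature dictionary is modeled as a hypothetical term-to-IDF mapping, features are ordered by sorted term, vectors are L2-normalized, and a logistic (sigmoid) scorer with pre-trained weights stands in for the unspecified classification machine learning model. Phrases absent from the dictionary are ignored.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tfidf_vector(phrases: List[str], idf: Dict[str, float]) -> List[float]:
    """TF-IDF over a fixed feature dictionary; features follow sorted(idf) order."""
    counts = Counter(phrases)
    total = len(phrases) or 1  # avoid division by zero on empty OCR output
    vec = [counts[term] / total * idf[term] for term in sorted(idf)]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec  # L2-normalize when non-zero


def fraud_likelihood(vec: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic score in (0, 1); weights must be aligned with sorted(idf)."""
    z = bias + sum(w * v for w, v in zip(weights, vec))
    return 1.0 / (1.0 + math.exp(-z))
```

Any other classifier that maps the fixed-length vector to a probability, such as a support vector machine with calibrated outputs, could replace `fraud_likelihood` without changing the vectorizer.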
As shown in fig. 5, the fraud-related webpage TF-IDF feature dictionary module updates the TF-IDF feature dictionary through a network server.
Updating the TF-IDF feature dictionary through the network server keeps it current, so the anti-fraud monitoring module can target the latest key vocabulary in real time.
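The description does not say how the dictionary update is delivered. The sketch below assumes, hypothetically, that the network server returns a JSON payload of the form `{"version": n, "idf": {term: weight}}` and that the client keeps its cached dictionary unless the server version is newer. `FeatureDictionary` and `apply_server_update` are illustrative names, not part of the patent.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class FeatureDictionary:
    """Local copy of the fraud-related webpage TF-IDF feature dictionary (hypothetical structure)."""
    version: int = 0
    idf: dict[str, float] = field(default_factory=dict)


def apply_server_update(local: FeatureDictionary, payload: str) -> bool:
    """Replace the local dictionary if the server payload carries a newer version.

    The payload format {"version": int, "idf": {term: weight}} is an assumption
    for illustration; the patent does not specify a wire format.
    Returns True if the local dictionary was replaced.
    """
    data = json.loads(payload)
    remote_version = int(data["version"])
    if remote_version <= local.version:
        return False  # already up to date: keep the cached dictionary
    local.version = remote_version
    local.idf = {str(term): float(weight) for term, weight in data["idf"].items()}
    return True
```

Comparing versions before replacing the dictionary avoids rolling the client back if an older payload is served from a cache.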
As shown in fig. 6, the characteristic data information monitoring module includes a to-be-detected sample information extraction module and a whitelist genuine-APP signature certificate feature comparison module.
As shown in fig. 6, the whitelist genuine-APP signature certificate feature comparison module updates the whitelist digital certificate features through the network server.
Updating the whitelist genuine-APP signature certificate features through the network server allows APPs published by legitimate financial institutions to be excluded.
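A minimal sketch of the whitelist comparison, assuming the certificate features are the owner, validity window and serial number listed for step 120, and that a certificate counts as genuine only when owner, serial number and validity window all match a whitelist entry. The class and method names are hypothetical; a production system might compare full certificate fingerprints instead.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CertFeatures:
    """Signature-certificate features: owner, validity window, serial number."""
    owner: str
    valid_from: date
    valid_to: date
    serial: str


class CertWhitelist:
    """Whitelist of genuine-APP certificate features (hypothetical design).

    Entries are keyed on (owner, lower-cased serial) so hex serials compare
    case-insensitively.
    """

    def __init__(self) -> None:
        self._entries: dict[tuple[str, str], CertFeatures] = {}

    def update(self, certs: list[CertFeatures]) -> None:
        # Merge certificate features pushed from the network server.
        for cert in certs:
            self._entries[(cert.owner, cert.serial.lower())] = cert

    def is_whitelisted(self, cert: CertFeatures) -> bool:
        known = self._entries.get((cert.owner, cert.serial.lower()))
        # Also require the validity window to match, so a copied owner/serial
        # pair with different dates is not trusted.
        return known is not None and (known.valid_from, known.valid_to) == (
            cert.valid_from,
            cert.valid_to,
        )
```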
As shown in fig. 7, a fraud-related APP detection method, used to detect whether an APP running on a smart device is fraud-related, includes:
step 100: identifying first-stage suspected fraud-related APPs according to the AndroidManifest information, the application name and/or the signature certificate, and comparing the first-stage suspected fraud-related APPs against a whitelist to determine second-stage suspected fraud-related APPs;
step 200: running each second-stage suspected fraud-related APP, capturing its screen to obtain an interface image of the running APP, performing image recognition on the interface image to extract text information, analyzing the text information to obtain a probability score that the APP is fraud-related, and outputting a list of APPs with a high probability of being fraud-related.
As shown in fig. 8, step 100 includes:
step 110: acquiring the AndroidManifest information and/or the application name of the sample to be detected;
step 120: acquiring the signature certificate of the sample to be detected, where the signature certificate information includes the owner, validity start time, validity end time and/or serial number;
step 130: determining first-stage suspected fraud-related APPs based on an AndroidManifest matching-rule feature library and an application-name matching-rule feature library;
step 140: comparing against the whitelist of genuine-APP signature certificates, eliminating whitelisted samples, and determining the second-stage suspected fraud-related APPs.
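Steps 130 and 140 can be sketched as a two-stage filter. The rule libraries below are placeholders (the package prefix `hypothetical.h5framework` and the name keywords are invented for illustration), the manifest is assumed to be already decoded from binary XML to text, and whitelist membership is simplified to an (owner, serial number) pair.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """A to-be-detected APP sample (fields are illustrative)."""
    app_name: str
    manifest_xml: str  # AndroidManifest.xml already decoded to text
    cert_owner: str
    cert_serial: str


# Hypothetical rule feature libraries; real rules would be maintained server-side.
MANIFEST_RULES = [re.compile(r"hypothetical\.h5framework\.[\w.]+")]
NAME_RULES = [re.compile(p) for p in (r"loan", r"invest", r"rebate")]


def stage_one(samples: list[Sample]) -> list[Sample]:
    """Step 130: flag samples whose manifest or application name matches any rule."""
    flagged: list[Sample] = []
    for s in samples:
        manifest_hit = any(r.search(s.manifest_xml) for r in MANIFEST_RULES)
        name_hit = any(r.search(s.app_name.lower()) for r in NAME_RULES)
        if manifest_hit or name_hit:
            flagged.append(s)
    return flagged


def stage_two(flagged: list[Sample], whitelist: set[tuple[str, str]]) -> list[Sample]:
    """Step 140: drop samples signed with a whitelisted genuine-APP certificate."""
    return [s for s in flagged if (s.cert_owner, s.cert_serial) not in whitelist]
```

Running the cheap keyword rules first keeps the expensive screen-capture stage limited to a small candidate set, while the whitelist removes legitimate apps that happen to use the same vocabulary.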
As shown in fig. 9, step 200 includes:
step 210: capturing the screen of the running APP to obtain an interface image of the running APP;
step 220: performing image recognition on the interface image and extracting text information;
step 230: segmenting the text information into phrases and analyzing the phrases to obtain a probability score that the APP is fraud-related, where the analysis algorithm includes TF-IDF, WORD2VEC and/or BERT.
Step 230 includes:
step 231: computing TF-IDF vectors for the phrases according to the fraud-related webpage TF-IDF feature dictionary;
step 232: processing the phrase TF-IDF vectors with a classification machine learning model to obtain the probability score that the APP is fraud-related;
step 233: outputting a list of APPs with a high probability of being fraud-related.
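Step 231 maps the phrases from step 230 onto the fixed fraud-related webpage feature dictionary so that every APP yields a vector with the same dimensions. This sketch assumes segmentation has already produced a list of phrases and that the dictionary stores one IDF weight per term; the function name is hypothetical.

```python
from __future__ import annotations

from collections import Counter


def tfidf_vector(phrases: list[str], idf: dict[str, float]) -> list[float]:
    """Map segmented phrases onto a fixed feature dictionary (term -> IDF weight).

    Phrases outside the dictionary still count toward the TF denominator but get
    no dimension, so the vector length always equals len(idf).
    """
    counts = Counter(phrases)
    total = sum(counts.values())
    vocab = sorted(idf)  # fixed dimension order shared with the classifier
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[term] / total * idf[term] for term in vocab]
```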
TF-IDF (Term Frequency-Inverse Document Frequency) is mainly used to estimate how important a word is to a document within a document collection.
Description of the symbols:
document set: $D = \{d_1, d_2, d_3, \ldots, d_n\}$
$n_{w,d}$: number of occurrences of word $w$ in document $d$
$\{w_d\}$: set of all words in document $d$
$n_w$: number of documents containing word $w$
In step 231, the term frequency TF is calculated as:
$$\mathrm{TF}(w,d) = \frac{n_{w,d}}{\sum_{k \in \{w_d\}} n_{k,d}}$$
The inverse document frequency IDF is calculated as:
$$\mathrm{IDF}(w) = \log\frac{|D|}{n_w}$$
TF-IDF is calculated as:
$$\mathrm{TF\text{-}IDF}(w,d) = \mathrm{TF}(w,d) \times \mathrm{IDF}(w)$$
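The three formulas translate directly into code. In this sketch each document is a list of already-segmented words, and `idf` returns 0.0 for a word that appears in no document (where the formula is undefined); that is a convention chosen here, not one stated in the patent. The function names are illustrative.

```python
from __future__ import annotations

import math


def tf(word: str, doc: list[str]) -> float:
    """TF(w, d) = n_{w,d} / sum of n_{k,d} over all words k in d (i.e. len(doc))."""
    return doc.count(word) / len(doc) if doc else 0.0


def idf(word: str, corpus: list[list[str]]) -> float:
    """IDF(w) = log(|D| / n_w), where n_w counts documents containing w."""
    n_w = sum(1 for doc in corpus if word in doc)
    return math.log(len(corpus) / n_w) if n_w else 0.0


def tf_idf(word: str, doc: list[str], corpus: list[list[str]]) -> float:
    """TF-IDF(w, d) = TF(w, d) * IDF(w)."""
    return tf(word, doc) * idf(word, corpus)


def build_feature_dictionary(corpus: list[list[str]]) -> dict[str, float]:
    """IDF weight for every word seen in a fraud-related webpage corpus."""
    vocab = {w for doc in corpus for w in doc}
    return {w: idf(w, corpus) for w in sorted(vocab)}
```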
In step 232, a trained fraud-related webpage text classification machine learning model (a LinearSVC linear support vector machine, which is a supervised learning algorithm) takes the TF-IDF vector of the screenshot text as input and determines whether the sample to be detected is a fraud-related APP and, if so, its fraud type.
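The patent uses LinearSVC. To keep the example dependency-free, the sketch below substitutes a Pegasos-style stochastic subgradient solver for a hinge-loss linear SVM. It is not scikit-learn's LinearSVC (which by default minimizes the squared hinge loss), but it learns the same kind of linear decision function. Labels are +1 for fraud-related and -1 for normal pages; the bias is folded in as a constant feature and is therefore also regularized, which is a simplification.

```python
from __future__ import annotations

import random


def train_linear_svm(
    X: list[list[float]],
    y: list[int],
    lam: float = 0.01,
    epochs: int = 200,
    seed: int = 0,
) -> tuple[list[float], float]:
    """Pegasos-style training of a linear SVM (hinge loss + L2 penalty lam).

    Returns (weights, bias). Labels must be +1 or -1.
    """
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)  # last slot holds the bias
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):  # shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            xi = X[i] + [1.0]      # append constant feature for the bias
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            # Gradient step for the L2 term: shrink all weights.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss subgradient applies only when the margin is violated.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w[:-1], w[-1]


def decision(w: list[float], b: float, x: list[float]) -> float:
    """Signed distance-like score; positive means fraud-related."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b
```

In practice the input vectors would be the TF-IDF vectors from step 231, and a multi-class variant (one-vs-rest) would be needed to output a fraud type as well as a yes/no decision.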
With the TF-IDF vector as input, the classification machine learning model computes how likely the sample is to be fraud-related; samples whose score exceeds a set value are output in a fraud-related APP list, and the final determination is made manually.
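A linear SVM outputs a signed decision value rather than a probability. One common way to obtain the probability described above is Platt scaling, a sigmoid fitted to the decision values. The sketch below uses placeholder sigmoid parameters (so the result is only a monotone score until they are fitted) and a hypothetical threshold of 0.8 for the set value; the function names are illustrative.

```python
from __future__ import annotations

import math


def fraud_probability(score: float, a: float = -1.0, b: float = 0.0) -> float:
    """Platt-style mapping P = 1 / (1 + exp(a * score + b)) into (0, 1).

    a and b would normally be fitted on held-out data; the defaults are placeholders.
    """
    z = max(-700.0, min(700.0, a * score + b))  # keep math.exp within float range
    return 1.0 / (1.0 + math.exp(z))


def review_list(scores: dict[str, float], threshold: float = 0.8) -> list[tuple[str, float]]:
    """APPs whose probability exceeds the set value, highest first, for manual review."""
    probs = {app: fraud_probability(s) for app, s in scores.items()}
    flagged = [(app, p) for app, p in probs.items() if p > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Sorting the output by probability lets reviewers handle the most suspicious APPs first.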
A readable storage medium storing a computer program which, when executed by a processor, performs the above method.
While the invention has been illustrated and described in terms of a preferred embodiment and several alternatives, the invention is not limited to the specific description in this specification. Other alternative or equivalent components may also be used in practicing the invention.
Claims (11)
1. A fraud-related APP detection system for detecting whether an APP developed with third-party mobile application rapid development platform framework code and integrated H5 website domain name technology is fraud-related, characterized by comprising an anti-fraud monitoring module, the anti-fraud monitoring module comprising a characteristic data information monitoring module, a screen information monitoring module and a result output module;
the fraud-related APP detection system is installed directly on the smart device with the user's permission or through the user's manual operation;
the characteristic data information monitoring module identifies first-stage suspected fraud-related APPs according to the AndroidManifest information and/or the application name, compares and filters the first-stage suspected fraud-related APPs against the whitelist genuine-APP signature certificates, and determines second-stage suspected fraud-related APPs;
the screen information monitoring module captures the screen of each second-stage suspected fraud-related APP to obtain an interface image of the running APP, performs image recognition on the interface image, extracts text information, and analyzes the text information to obtain a probability score that the APP is fraud-related;
the screen information monitoring module comprises a screen capture module and an image recognition and analysis module; the screen capture module outputs prompt information so that the user can trigger screen capture manually, and records or captures the interface of the running APP; the image recognition and analysis module performs image recognition on the obtained APP interface image;
the result output module outputs a list of APPs with a high probability of being fraud-related.
2. A fraud-related APP detection system for detecting whether an APP developed with third-party mobile application rapid development platform framework code and integrated H5 website domain name technology is fraud-related, characterized by comprising an anti-fraud monitoring module, the anti-fraud monitoring module comprising a characteristic data information monitoring module, a screen information monitoring module and a result output module;
the characteristic data information monitoring module identifies first-stage suspected fraud-related APPs according to the AndroidManifest information and/or the application name, compares and filters the first-stage suspected fraud-related APPs against the whitelist genuine-APP signature certificates, and determines second-stage suspected fraud-related APPs;
the screen information monitoring module captures the screen of each second-stage suspected fraud-related APP to obtain an interface image of the running APP, performs image recognition on the interface image, extracts text information, and analyzes the text information to obtain a probability score that the APP is fraud-related; the result output module outputs a list of APPs with a high probability of being fraud-related;
the screen information monitoring module comprises a screen capture module and an image recognition and analysis module; the screen capture module records or captures the interface of the running APP, and the image recognition and analysis module performs image recognition on the obtained APP interface image;
the anti-fraud monitoring module tests two or more APPs according to an input test list; the first-stage suspected fraud-related APPs are identified by screening application names against set keywords; the first-stage suspected fraud-related APPs are compared and filtered against the whitelist genuine-APP signature certificates to determine the second-stage suspected fraud-related APPs; and the text information analysis algorithm includes TF-IDF, WORD2VEC and/or BERT.
3. The fraud-related APP detection system of claim 1 or 2, wherein the image recognition and analysis module comprises a text information extraction module, a word segmentation module, a fraud-related webpage TF-IDF feature dictionary module, a TF-IDF vector calculation module and a classification machine learning module; the text information extraction module processes the image recognition output to obtain text information; the word segmentation module splits the text information into phrases; the TF-IDF vector calculation module computes TF-IDF vectors for the phrases according to the fraud-related webpage TF-IDF feature dictionary module; and the classification machine learning module processes the phrase TF-IDF vectors to obtain a probability score that the APP is fraud-related.
4. The fraud-related APP detection system of claim 3, wherein the fraud-related webpage TF-IDF feature dictionary module updates the TF-IDF feature dictionary through a network server.
5. The fraud-related APP detection system of claim 3, wherein the characteristic data information monitoring module comprises a to-be-detected sample information extraction module and a whitelist genuine-APP signature certificate feature comparison module.
6. The fraud-related APP detection system of claim 5, wherein the whitelist genuine-APP signature certificate feature comparison module updates the whitelist genuine-APP signature certificate features through a network server.
7. A fraud-related APP detection method for detecting whether an APP developed with third-party mobile application rapid development platform framework code and integrated H5 website domain name technology is fraud-related, characterized by comprising the following:
the fraud-related APP detection system is installed directly on the smart device with the user's permission or through the user's manual operation;
step 100: identifying first-stage suspected fraud-related APPs according to the AndroidManifest information and/or the application name, comparing and filtering the first-stage suspected fraud-related APPs against the whitelist genuine-APP signature certificates, and determining second-stage suspected fraud-related APPs;
step 200: running each second-stage suspected fraud-related APP, outputting prompt information so that the user can trigger screen capture manually to obtain an interface image of the running APP, performing image recognition on the interface image, extracting text information, and analyzing the text information to obtain a probability score that the APP is fraud-related.
8. The fraud-related APP detection method of claim 7, wherein step 100 comprises:
step 110: acquiring the AndroidManifest information and/or the application name of the sample to be detected;
step 120: acquiring the signature certificate of the sample to be detected, where the signature certificate information includes the owner, validity start time, validity end time and/or serial number;
step 130: determining first-stage suspected fraud-related APPs based on an AndroidManifest matching-rule feature library and an application-name matching-rule feature library;
step 140: comparing and filtering against the whitelist genuine-APP signature certificates, eliminating whitelisted samples, and determining the second-stage suspected fraud-related APPs.
9. The fraud-related APP detection method of claim 8, wherein step 200 comprises:
step 210: capturing the screen of the running APP to obtain an interface image of the running APP;
step 220: performing image recognition on the interface image and extracting text information;
step 230: performing word segmentation on the text information to obtain phrases, and analyzing the phrases to obtain a probability score that the APP is fraud-related, where the analysis algorithm includes TF-IDF, WORD2VEC and/or BERT.
10. The fraud-related APP detection method of claim 9, wherein step 230 comprises:
step 231: computing TF-IDF vectors for the phrases according to the fraud-related webpage TF-IDF feature dictionary;
step 232: processing the phrase TF-IDF vectors with a classification machine learning model to obtain the probability score that the APP is fraud-related;
step 233: outputting a list of APPs with a high probability of being fraud-related.
11. A readable storage medium storing a computer program, characterized in that
the program, when executed by a processor, implements the fraud-related APP detection method of any one of claims 7 to 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211692329.XA (CN115688107B) | 2022-12-28 | 2022-12-28 | Fraud-related APP detection system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115688107A | 2023-02-03 |
CN115688107B | 2023-04-11 |
Family ID: 85055081
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211692329.XA (CN115688107B, Active) | Fraud-related APP detection system and method | 2022-12-28 | 2022-12-28 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115688107B |
- 2022-12-28: CN application CN202211692329.XA (CN115688107B), status Active
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |