CN115688107B - Fraud-related APP detection system and method - Google Patents

Info

Publication number: CN115688107B
Authority: CN (China)
Prior art keywords: fraud, app, module, information, monitoring module
Legal status: Active (as listed by Google Patents; not a legal conclusion)
Application number: CN202211692329.XA
Other languages: Chinese (zh)
Other versions: CN115688107A
Inventors: 周宇飞, 马洪晓, 胡铁, 熊瑛, 叶蕴芳, 潘淼
Current assignee: Aspire Technologies Shenzhen Ltd
Original assignee: Aspire Technologies Shenzhen Ltd
Events:
  • Application filed by Aspire Technologies Shenzhen Ltd
  • Priority to CN202211692329.XA
  • Publication of CN115688107A
  • Application granted
  • Publication of CN115688107B

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A fraud-related APP (application) detection system and method detects whether an APP running on a smart device is fraud-related. The system comprises an anti-fraud monitoring module, which includes a characteristic data information monitoring module, a screen information monitoring module and a result output module. The characteristic data information monitoring module identifies first-stage suspected fraud-related APPs from AndroidManifest information and application names, and determines second-stage suspected fraud-related APPs by comparing the first-stage candidates against the signature certificates of whitelisted genuine APPs. Image recognition is performed on interface images of the running APP, text information is extracted, and the text is analyzed to obtain a fraud-likelihood value for the APP. The result output module outputs a list of APPs with a high likelihood of being fraud-related. In short, suspected fraud-related APP samples are screened by "AndroidManifest feature matching + application name similarity comparison + whitelist genuine APP signature certificate filtering"; text is then extracted from web page screenshots by OCR, and an algorithm judges whether the pages are fraud-related.

Description

Fraud-related APP detection system and method
Technical Field
This application belongs to the technical field of computer security, and specifically relates to a fraud-related APP detection system and method.
Background
AndroidManifest.xml is a required file in every Android application. It is located in the root directory of the project and describes the components exposed in the package (activities, services, etc.), their respective implementation classes, the data they can handle, and where they are launched. In addition to declaring the program's activities, content providers, services and intent (broadcast) receivers, it can also specify permissions and instrumentation.
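As an illustration of what the manifest declares, the following Python sketch lists the package name, requested permissions and declared components of a manifest. It assumes the manifest has already been decoded to plain-text XML (the copy inside an APK is binary XML and must be decoded first); the sample manifest and package name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Manifest attributes such as android:name live in the Android XML namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"
COMPONENT_TAGS = ("activity", "service", "receiver", "provider")

def summarize_manifest(manifest_xml: str) -> dict:
    """Return the package, requested permissions and declared components of a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    permissions = [el.get(ANDROID_NS + "name") for el in root.findall("uses-permission")]
    components = {tag: [] for tag in COMPONENT_TAGS}
    app = root.find("application")
    if app is not None:
        for tag in COMPONENT_TAGS:
            components[tag] = [el.get(ANDROID_NS + "name") for el in app.findall(tag)]
    return {"package": root.get("package"), "permissions": permissions, "components": components}

# Hypothetical decoded manifest of a sample under inspection.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.quickloan">
  <uses-permission android:name="android.permission.INTERNET"/>
  <application android:label="Quick Loan">
    <activity android:name=".MainActivity"/>
    <service android:name=".SyncService"/>
  </application>
</manifest>"""

summary = summarize_manifest(SAMPLE_MANIFEST)
print(summary["package"], summary["components"]["activity"])
```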
TF-IDF (Term Frequency-Inverse Document Frequency) is mainly used to estimate how important a word is to a document within a collection of documents.
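For reference, one common formulation of TF-IDF is given below. The document does not state which variant it uses, so the +1 smoothing term in the IDF is an assumption. Here n_{t,d} is the number of occurrences of term t in document d, D is the document collection and N = |D|.

```latex
\mathrm{tf}(t,d) = \frac{n_{t,d}}{\sum_{k} n_{k,d}}, \qquad
\mathrm{idf}(t) = \log \frac{N}{1 + \lvert \{ d \in D : t \in d \} \rvert}, \qquad
\text{tf-idf}(t,d) = \mathrm{tf}(t,d) \cdot \mathrm{idf}(t)
```

A term scores highly when it is frequent in the given page but rare across the collection.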
In recent years, fraud carried out through APPs has become one of the main criminal methods in telecom fraud cases. Phishing APPs for online part-time order brushing and quick loans are especially common, and APPs that imitate banks and financial platforms are particularly confusing and deceptive.
Such fraud-related APPs are usually built from "third-party mobile application rapid development platform framework code + integrated H5 website domain name", so their development cost is extremely low. Moreover, they carry out the scam mainly through the integrated H5 web pages: they contain almost no malicious static code, request no sensitive permissions, and show no malicious behaviors such as sending SMS messages or reading the contact list. As a result, common mobile malware detection techniques based on static code analysis and dynamic behavior analysis cannot effectively identify them.
At present, common methods for detecting malicious mobile applications include static code analysis (e.g., Chinese patent application No. 202011536663.7), dynamic behavior analysis (e.g., Chinese patent application No. 201310309568.7), and combined static code and dynamic behavior analysis (e.g., Chinese patent application No. 201910968202.8).
Malicious mobile application detection based on static code analysis has the following defect: when it examines a fraud-related APP built from third-party mobile application rapid development platform framework code and an integrated H5 website domain name, it can only scan the platform's framework code. That code may also appear in legitimate applications built on the same platform, so no malicious static code features of the fraud-related APP can be extracted, and the APP cannot be identified.
Malicious mobile application detection based on dynamic behavior analysis has the following defect: fraud-related APPs developed with the "third-party mobile application rapid development platform framework code + integrated H5 website domain name" technique generally defraud victims through H5 web pages. For example, a fake-loan fraud APP induces the victim to upload personal sensitive data through an integrated fake-loan H5 page, then communicates with the victim through an integrated chat page and, under pretexts such as a required loan guarantee deposit, induces the victim to transfer money. In this scenario, the fraud-related APP performs no malicious behaviors such as sending SMS messages or stealing the address book, so detection based on dynamic behavior analysis cannot effectively detect it.
Malicious mobile application detection based on combined static code and dynamic behavior analysis has the following defect: for a fraud-related APP built from "third-party mobile application rapid development platform framework code + integrated H5 website domain name", neither static code features nor dynamic behavior features can be extracted, so the APP cannot be effectively detected.
Disclosure of Invention
To solve these problems, suspected fraud-related APP samples are first screened by "AndroidManifest feature matching + application name similarity comparison + whitelist genuine APP signature certificate filtering". Web page screenshots are then taken, text is extracted from them by OCR, and an algorithm judges whether the page text is fraud-related, giving the system the ability to identify fraud-related APPs automatically.
The technical solution for solving the above technical problem is a fraud-related APP detection system for detecting whether an APP developed with third-party mobile application rapid development platform framework code and integrated H5 website domain name technology is fraud-related. The system includes an anti-fraud monitoring module, which comprises a characteristic data information monitoring module, a screen information monitoring module and a result output module. The characteristic data information monitoring module finds first-stage suspected fraud-related APPs according to AndroidManifest information and/or application names, compares and filters them against the signature certificates of whitelisted genuine APPs, and determines second-stage suspected fraud-related APPs. The screen information monitoring module captures the screen of a running second-stage suspected fraud-related APP to obtain interface images, performs image recognition on them, extracts text information, and analyzes the text to obtain the APP's fraud-likelihood value. The screen information monitoring module comprises a screen capture module and an image recognition and analysis module: the screen capture module outputs prompt information so that the user can trigger the screen capture manually, and records or captures the interface of the running APP; the image recognition and analysis module performs image recognition on the captured APP interface images;
the result output module outputs a list of APPs with a high likelihood of being fraud-related.
The technical solution may also be a fraud-related APP detection system for detecting whether an APP developed with third-party mobile application rapid development platform framework code and integrated H5 website domain name technology is fraud-related. It includes an anti-fraud monitoring module, which comprises a characteristic data information monitoring module, a screen information monitoring module and a result output module;
the characteristic data information monitoring module finds first-stage suspected fraud-related APPs according to AndroidManifest information and/or application names, compares and filters them against the signature certificates of whitelisted genuine APPs, and determines second-stage suspected fraud-related APPs;
the screen information monitoring module captures the screen of a second-stage suspected fraud-related APP, obtains interface images of the running APP, performs image recognition on them, extracts text information, and analyzes the text to obtain the APP's fraud-likelihood value; the result output module outputs a list of APPs with a high likelihood of being fraud-related;
the screen information monitoring module comprises a screen capture module and an image recognition and analysis module; the screen capture module records or captures the interface of the running APP, and the image recognition and analysis module performs image recognition on the captured APP interface images;
the anti-fraud monitoring module tests two or more APPs according to an input test list, finds first-stage suspected fraud-related APPs by screening application names with configured keywords, compares and filters them against the signature certificates of whitelisted genuine APPs, and determines second-stage suspected fraud-related APPs; the text information analysis algorithm includes TF-IDF, WORD2VEC and/or BERT.
The technical solution may also be that the screen information monitoring module includes a screen capture module and an image recognition and analysis module. The screen capture module records or captures the interface of the running APP, and the image recognition and analysis module performs image recognition on the captured APP interface images. The screen capture module outputs prompt information, which may be a pop-up window, a floating window or a fixed operation button, so that the user can trigger the screen capture manually.
The technical solution may also be that the image recognition and analysis module includes a text information extraction module, a word segmentation module, a fraud-related webpage TF-IDF feature dictionary module, a TF-IDF vector calculation module and a classification machine learning module. The text information extraction module processes the image recognition output to obtain text information; the word segmentation module splits the text into phrases; the TF-IDF vector calculation module computes TF-IDF vectors of the phrases according to the fraud-related webpage TF-IDF feature dictionary module; and the classification machine learning module processes the phrase TF-IDF vectors to obtain the APP's fraud-likelihood value.
The technical solution may also be that the fraud-related webpage TF-IDF feature dictionary module updates the TF-IDF feature dictionary through a network server.
The technical solution may also be that the characteristic data information monitoring module includes a to-be-detected sample information extraction module and a whitelist genuine APP signature certificate feature comparison module.
The technical solution may also be that the whitelist genuine APP signature certificate feature comparison module updates the whitelist digital certificate features through a network server.
The technical solution may also be a fraud-related APP detection method for detecting whether an APP developed with third-party mobile application rapid development platform framework code and integrated H5 website domain name technology is fraud-related, including:
the fraud-related APP detection system is installed directly on the smart device with the user's permission or through the user's manual operation;
step 100: finding first-stage suspected fraud-related APPs according to AndroidManifest information, application names and/or signature certificates, and comparing them against the signature certificates of whitelisted genuine APPs to determine second-stage suspected fraud-related APPs;
step 200: running the second-stage suspected fraud-related APPs, outputting prompt information so that the user can trigger the screen capture manually, obtaining interface images of the running APP, performing image recognition on them, extracting text information, and analyzing the text to obtain the APP's fraud-likelihood value.
For centralized testing, a list of APPs with a high likelihood of being fraud-related may be output.
The technical solution may also be that step 100 includes:
step 110: obtaining the AndroidManifest information and/or application name of the sample to be detected;
step 120: obtaining the signature certificate of the sample to be detected, where the signature certificate information includes the owner, validity start time, validity end time and/or serial number;
step 130: determining first-stage suspected fraud-related APPs based on an AndroidManifest matching rule feature library and an application name matching rule feature library;
step 140: comparing and filtering against the signature certificates of whitelisted genuine APPs, removing whitelist samples, and determining second-stage suspected fraud-related APPs.
The technical solution may further be that step 200 includes:
step 210: capturing the screen of the running APP to obtain interface images of the running APP;
step 220: performing image recognition on the interface images and extracting text information;
step 230: segmenting the text information into phrases and analyzing the phrases to obtain the APP's fraud-likelihood value, where the analysis algorithm includes TF-IDF, WORD2VEC and/or BERT.
Step 230 may further include:
step 231: computing TF-IDF vectors of the phrases according to the fraud-related webpage TF-IDF feature dictionary;
step 232: processing the phrase TF-IDF vectors with a machine-learning classifier to obtain the APP's fraud-likelihood value;
step 233: outputting a list of APPs with a high likelihood of being fraud-related.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the above fraud-related APP detection method.
One technical effect of this solution is as follows: first-stage suspected fraud-related APPs are pre-screened by AndroidManifest information and application names, and legitimate software is then excluded using the signature certificates of whitelisted genuine APPs to obtain second-stage suspected fraud-related APPs. This greatly reduces the number of running APPs whose interface images must be captured and speeds up detection.
The second technical effect is as follows: the anti-fraud monitoring module can be installed directly on the user's mobile phone or smart device with the user's permission or through the user's manual operation, which removes any suspicion that the phone's interface is being captured covertly in the background.
The third technical effect is as follows: the image recognition and analysis module includes a text information extraction module and a word segmentation module, so information on H5 web pages can be extracted and H5-webpage-based fraud-related APPs can be identified.
The fourth technical effect is as follows: an automated testing framework allows APPs to be tested in batches.
The fifth technical effect is as follows: legitimate software is excluded using the signature certificates of whitelisted genuine APPs to obtain second-stage suspected fraud-related APPs, which greatly reduces the number of running APPs whose interface images must be captured and speeds up detection.
The sixth technical effect is as follows: the TF-IDF feature dictionary is updated through a network server, so the latest feature dictionary is available and the anti-fraud monitoring module can target the latest key vocabulary in real time.
The seventh technical effect is as follows: the whitelist genuine APP signature certificate features are updated through a network server, so APPs of legitimate financial institutions can be excluded promptly.
Drawings
FIG. 1 is a schematic block diagram of the fraud-related APP detection system;
FIG. 2 is a schematic block diagram including the Appium automated testing framework;
FIG. 3 is a schematic block diagram of the internal modules of the anti-fraud monitoring module;
FIG. 4 is a schematic diagram of the internal modules of the screen information monitoring module;
FIG. 5 is a schematic diagram of the internal modules of the image recognition and analysis module;
FIG. 6 is a schematic diagram of the internal modules of the characteristic data information monitoring module;
FIG. 7 is a schematic flow chart of the fraud-related APP detection method;
FIG. 8 is a schematic flow chart of determining the first-stage and second-stage suspected fraud-related APPs;
FIG. 9 is a schematic flow chart of screen capture information monitoring and analysis;
FIG. 10 is a flow chart of the TF-IDF algorithm.
Detailed Description
The present disclosure is described in further detail below with reference to the drawings.
It should be noted that the following description is of the preferred embodiments of the present invention and should not be construed as limiting the invention in any way. The description of the preferred embodiments of the present invention is made merely for the purpose of illustrating the general principles of the invention. The embodiments described in this application are only some embodiments of the invention and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present application, it is to be understood that the terms "center," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," "clockwise," "counterclockwise," and the like are used in the orientations and positional relationships indicated in the drawings for convenience in describing the invention and to simplify the description, and are not intended to indicate or imply that the device or element so referred to must have a particular orientation, be constructed and operated in a particular orientation, and are therefore not to be construed as limiting the invention. Furthermore, the terms "first", "second", technical features numbered with Arabic numerals 1, 2, 3, etc., and labels such as "A" and "B" are used for descriptive purposes only; they do not represent a temporal or spatial ordering, and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, features defined as "first", "second", or numbered with Arabic numerals 1, 2, 3, etc., may explicitly or implicitly include one or more of the features. In the description of the present invention, "a plurality" means two or more unless specifically limited otherwise.
Referring to FIG. 1, a fraud-related APP detection system running on a smart device includes an anti-fraud monitoring module. As shown in FIG. 3, the anti-fraud monitoring module includes a characteristic data information monitoring module, a screen information monitoring module and a result output module;
the characteristic data information monitoring module finds first-stage suspected fraud-related APPs according to AndroidManifest information, application names and/or signature certificates, compares them against the signature certificates of whitelisted genuine APPs, and determines second-stage suspected fraud-related APPs. The first-stage suspected fraud-related APPs can be found by keyword screening.
The screen information monitoring module captures the screen of a second-stage suspected fraud-related APP, obtains interface images of the running APP, performs image recognition on them, extracts text information, and analyzes the text to obtain the APP's fraud-likelihood value;
the result output module outputs a list of APPs with a high likelihood of being fraud-related.
After image recognition yields the text of the APP's running interface, various methods, such as neural network or other artificial intelligence algorithms, can be used to judge whether the text involves luring users into borrowing or taking out loans. The algorithms express their results as a likelihood, for example from 0 to 100%, and APPs with a high likelihood, for example above 80%, are output in a list for manual review.
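A minimal Python sketch of this output step, assuming each APP already has a likelihood score in [0, 1]. The 0.80 threshold mirrors the 80% example above; the function name and package names are hypothetical.

```python
from typing import Dict, List, Tuple

def high_likelihood_list(scores: Dict[str, float],
                         threshold: float = 0.80) -> List[Tuple[str, float]]:
    """Return APPs whose fraud likelihood is above the threshold,
    highest score first, for manual review."""
    flagged = [(app, score) for app, score in scores.items() if score > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Hypothetical scores produced by the text-analysis step.
scores = {"com.example.a": 0.93, "com.example.b": 0.42, "com.example.c": 0.81}
review_list = high_likelihood_list(scores)
print(review_list)
```

The comparison is strict (`>`) to match "higher than 80%"; sorting by score lets reviewers start with the most suspicious APPs.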
Running an APP and capturing its interface takes a long time and a large amount of computation, so not all APPs can be tested in a short time. Therefore, first-stage suspected fraud-related APPs are first found by keywords, and APPs of legitimate financial institutions are then excluded using the signature certificates of whitelisted genuine APPs. This greatly reduces the number of APPs that must go through image recognition and greatly improves efficiency.
The anti-fraud monitoring module may be a software module embedded in the smart device or an APP installed on it later. It has elevated system privileges, so it can obtain information about other APPs and capture their interfaces while they run.
The AndroidManifest information and application name of the sample to be detected can be obtained with the aapt tool, and its signature certificate with the keytool tool. The AndroidManifest.xml information is obtained from the APK to be checked through an aapt dump xmltree xxx.
The system obtains the application name information (application-label) from the APK to be checked through an aapt dump badging xxx.
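A Python sketch of extracting the application name from aapt's badging dump. It assumes the aapt binary from the Android SDK build-tools is on PATH and that its output contains lines of the form application-label:'<name>', plus locale variants such as application-label-zh-CN:'<name>'. The parser is exercised on a hypothetical sample rather than a real APK.

```python
import re
import subprocess

# Matches "application-label:'Name'" and locale variants like "application-label-zh-CN:'Name'".
LABEL_RE = re.compile(r"^application-label(?:-([\w-]+))?:'(.*)'$")

def dump_badging(apk_path: str) -> str:
    """Run 'aapt dump badging' on an APK and return its text output (requires aapt on PATH)."""
    result = subprocess.run(["aapt", "dump", "badging", apk_path],
                            capture_output=True, text=True, check=True)
    return result.stdout

def parse_labels(badging_output: str) -> dict:
    """Map locale ('default' for the unqualified label) to application label."""
    labels = {}
    for line in badging_output.splitlines():
        match = LABEL_RE.match(line.strip())
        if match:
            labels[match.group(1) or "default"] = match.group(2)
    return labels

# Hypothetical badging output of a sample under inspection.
SAMPLE_BADGING = """package: name='com.example.quickloan' versionCode='1' versionName='1.0'
application-label:'Quick Loan'
application-label-zh-CN:'快速贷款'
application: label='Quick Loan' icon='res/icon.png'"""

labels = parse_labels(SAMPLE_BADGING)
print(labels)
```

Collecting the locale-specific labels as well matters because an impersonating APP may show a brand name only in its Chinese label.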
The system obtains the signature certificate information from the APK to be checked with the command 'keytool -printcert -jarfile d:\18i6ic.apk'. The signature certificate information includes the owner, validity start time, validity end time, serial number, and so on.
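A Python sketch of turning keytool's output into the fields named above. It assumes a JDK keytool on PATH and English-locale output with "Owner:", "Serial number:" and "Valid from: ... until: ..." lines; only the first certificate of the first signer is read. The sample output is hypothetical, and an argument such as d:\18i6ic.apk from the command above would be passed to `print_cert`.

```python
import re
import subprocess

# Patterns for English-locale "keytool -printcert" output; only the first match is used.
OWNER_RE = re.compile(r"^[ \t]*Owner:[ \t]*(.+)$", re.M)
SERIAL_RE = re.compile(r"^[ \t]*Serial number:[ \t]*(\S+)", re.M)
VALIDITY_RE = re.compile(r"^[ \t]*Valid from:[ \t]*(.+?)[ \t]+until:[ \t]*(.+)$", re.M)

def print_cert(apk_path: str) -> str:
    """Run keytool on a signed APK and return its text output (requires keytool on PATH)."""
    result = subprocess.run(["keytool", "-printcert", "-jarfile", apk_path],
                            capture_output=True, text=True, check=True)
    return result.stdout

def parse_cert(output: str) -> dict:
    """Extract owner, validity period and serial number (lower-cased for comparison)."""
    info = {}
    owner = OWNER_RE.search(output)
    if owner:
        info["owner"] = owner.group(1).strip()
    serial = SERIAL_RE.search(output)
    if serial:
        info["serial"] = serial.group(1).lower()
    validity = VALIDITY_RE.search(output)
    if validity:
        info["valid_from"] = validity.group(1).strip()
        info["valid_until"] = validity.group(2).strip()
    return info

# Hypothetical keytool output for a sample under inspection.
SAMPLE_PRINTCERT = """Signer #1:

Signature:

Owner: CN=Example Dev, O=Example Co, C=CN
Issuer: CN=Example Dev, O=Example Co, C=CN
Serial number: 1A2B3C4D
Valid from: Mon Jan 01 08:00:00 CST 2018 until: Fri Dec 26 08:00:00 CST 2042
"""

cert = parse_cert(SAMPLE_PRINTCERT)
print(cert)
```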
Based on the AndroidManifest matching rule feature library, the application name matching rule feature library and the impersonated-enterprise genuine APP digital certificate feature library, the AndroidManifest information and application name information of the sample to be detected are compared in order to screen first-stage fraud-related APP samples.
Security experts compile the genuine APP certificate information of commonly impersonated enterprises and enter it into the impersonated-enterprise genuine APP digital certificate feature library, forming the whitelist samples.
Based on the AndroidManifest matching rule feature library, the application name matching rule feature library and the impersonated-enterprise genuine APP digital certificate feature library, the system compares the AndroidManifest information, application name information and signature certificate information of the sample to be detected and screens suspected fraud-related APP samples. AndroidManifest information is matched by keywords; application names first have punctuation and special characters filtered out (current phishing APPs sometimes mix punctuation or special characters into their names, such as 'Jing, east, jin, bar') and are then matched with regular expressions; signature certificates are matched by serial number. If a sample hits both the AndroidManifest matching rule and the application name matching rule, and its signature certificate is not found in the impersonated-enterprise genuine APP digital certificate feature library, it is determined to be a second-stage suspected fraud-related APP.
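The screening rule above can be sketched in Python as follows. The keyword list, name patterns and whitelist serials are hypothetical stand-ins for the expert-curated feature libraries; the Unicode-category filter is one assumed way to drop inserted punctuation and special characters before regular-expression matching.

```python
import re
import unicodedata

# Hypothetical rule libraries; real ones are curated by security experts and updated from a server.
MANIFEST_KEYWORDS = ["com.example.h5shell"]   # hypothetical framework identifier
NAME_PATTERNS = [re.compile(r"examplebank", re.IGNORECASE), re.compile(r"贷款")]
WHITELIST_SERIALS = {"1a2b3c4d"}              # serial numbers of genuine APP certificates

def strip_symbols(name: str) -> str:
    """Keep only letters and digits, dropping punctuation, symbols and spaces inserted into names."""
    return "".join(ch for ch in name if unicodedata.category(ch)[0] in ("L", "N"))

def screen_sample(manifest_text: str, app_name: str, cert_serial: str) -> str:
    manifest_hit = any(k in manifest_text for k in MANIFEST_KEYWORDS)
    name_hit = any(p.search(strip_symbols(app_name)) for p in NAME_PATTERNS)
    if not (manifest_hit and name_hit):
        return "not_suspected"            # fails first-stage screening
    if cert_serial.lower() in WHITELIST_SERIALS:
        return "whitelisted_genuine"      # signed with a genuine enterprise certificate
    return "second_stage_suspect"

manifest = '<manifest package="com.example.h5shell.app"/>'
print(screen_sample(manifest, "Example·Bank", "ffff01"))
```

Following the description, the whitelist is keyed on serial number only. Because whoever generates a self-signed certificate can choose its serial number, also comparing the owner or the certificate fingerprint would make the whitelist harder to spoof.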
As shown in FIG. 2, the system further includes an APP automated testing framework in which the APPs run, and the anti-fraud monitoring module tests two or more APPs according to the input test list. The APP automated testing framework can launch and run many APPs automatically and test them in batches; this mode suits dedicated fraud-related APP detection tools. Various automated testing frameworks may be used: any test software that can automatically drive APPs to run is suitable.
First-stage suspected fraud-related APPs are pre-screened by AndroidManifest information and/or application names, and legitimate software is then removed by comparing against the signature certificates of whitelisted genuine APPs to obtain second-stage suspected fraud-related APPs. This greatly reduces the number of running APPs whose interface images must be captured and speeds up detection. Several text information analysis algorithms can be used, including TF-IDF, WORD2VEC and/or BERT.
As shown in FIG. 4, the screen information monitoring module includes a screen capture module and an image recognition and analysis module. The screen capture module records or captures the interface of the running APP, and the image recognition and analysis module performs image recognition on the captured APP interface images. The screen capture module outputs prompt information, which may be a pop-up window or a fixed or floating control button, so that the user can trigger the screen capture manually.
If capturing the screen on a smart device such as the user's mobile phone requires higher privileges, prompt information can be shown to tell the user that the screen is currently being captured, or a window can pop up so that the user can trigger the capture manually. The anti-fraud monitoring module can be installed directly on the user's mobile phone or smart device with the user's permission or through the user's manual operation.
As shown in FIG. 5, the image recognition and analysis module includes a text information extraction module, a word segmentation module, a fraud-related webpage TF-IDF feature dictionary module, a TF-IDF vector calculation module and a classification machine learning module. The text information extraction module processes the image recognition output to obtain text information; the word segmentation module splits the text into phrases; the TF-IDF vector calculation module computes TF-IDF vectors of the phrases according to the fraud-related webpage TF-IDF feature dictionary module; and the classification machine learning module processes the phrase TF-IDF vectors to obtain the APP's fraud-likelihood value.
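A minimal Python sketch of the TF-IDF vector calculation and classification modules. The description does not name its classifier, so this sketch substitutes a logistic-regression scorer. The feature dictionary (term to IDF weight), weights and bias are all hypothetical, and terms are shown in English for readability although real pages would yield segmented Chinese phrases.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical fraud-related webpage TF-IDF feature dictionary: feature term -> IDF weight.
FEATURE_IDF: Dict[str, float] = {"loan": 1.2, "deposit": 2.0, "transfer": 1.5, "service": 0.8}
# Hypothetical weights and bias of an already-trained logistic-regression classifier.
WEIGHTS: Dict[str, float] = {"loan": 1.0, "deposit": 3.0, "transfer": 2.0, "service": 0.5}
BIAS = -2.0

def tfidf_vector(phrases: List[str]) -> Dict[str, float]:
    """TF-IDF over the dictionary's terms: (term count / phrase count) * IDF."""
    counts = Counter(phrases)
    total = len(phrases) or 1
    return {term: counts[term] / total * idf for term, idf in FEATURE_IDF.items()}

def fraud_likelihood(phrases: List[str]) -> float:
    """Logistic-regression score in (0, 1); higher means more likely fraud-related."""
    vector = tfidf_vector(phrases)
    z = BIAS + sum(WEIGHTS[term] * value for term, value in vector.items())
    return 1.0 / (1.0 + math.exp(-z))

fraud_page = ["loan", "requires", "deposit", "transfer"]
benign_page = ["weather", "forecast", "sunny"]
print(fraud_likelihood(fraud_page), fraud_likelihood(benign_page))
```

Restricting the vector to the dictionary's terms keeps its dimension fixed, so the classifier's weights stay valid when the dictionary is updated from the server only if the classifier is retrained alongside it.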
As shown in FIG. 5, the TF-IDF characteristic dictionary module related to the fraud webpage updates the TF-IDF characteristic dictionary through the network server.
The TF-IDF feature dictionary is updated through the network server, the latest feature dictionary can be obtained, and the anti-fraud monitoring module can aim at the latest key vocabulary in real time.
As shown in fig. 6, the characteristic data information monitoring module includes a to-be-detected sample information extraction module and a white list positive version APP signature certificate characteristic comparison module.
As shown in fig. 6, the whitelist positive version APP signature certificate feature comparison module updates the whitelist digital certificate feature through the web server.
By updating the white list positive version APP signature certificate characteristics through the network server, the APP of the normal financial institution can be excluded.
As shown in Fig. 7, a fraud-related APP detection method, used to detect whether an APP running on a smart device is fraud-related, includes:
step 100: identifying first-stage suspected fraud-related APPs according to the AndroidManifest information, the application name and/or the signature certificate, and comparing the first-stage suspects against a whitelist to determine second-stage suspected fraud-related APPs;
step 200: running the second-stage suspected fraud-related APPs and capturing the screen to obtain interface images of the running APP, performing image recognition on the interface images to extract text information, analyzing the text information to obtain a value indicating how likely the APP is to be fraud-related, and outputting a list of APPs that are highly likely to be fraud-related.
As shown in Fig. 8, step 100 includes:
step 110: acquiring the AndroidManifest information and/or the application name of the sample to be detected;
step 120: acquiring the signature certificate of the sample to be detected, where the certificate information includes the owner, validity start time, validity end time and/or serial number;
step 130: determining first-stage suspected fraud-related APPs based on the AndroidManifest matching-rule feature library and the application-name matching-rule feature library;
step 140: comparing the samples against the whitelist of genuine APP signature certificates, removing whitelisted samples, and determining the second-stage suspected fraud-related APPs.
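Steps 130 and 140 can be illustrated with a small sketch of first-stage screening. The rule libraries are not published in the patent, so `MANIFEST_RULES`, `NAME_KEYWORDS`, `GENUINE_NAMES`, the 0.8 similarity threshold and the OR combination of the two libraries are hypothetical. Application-name similarity uses Python's standard `difflib.SequenceMatcher`, a swapped-in choice since the patent names no similarity measure.

```python
from difflib import SequenceMatcher

# Hypothetical rule libraries; the patent does not publish its feature libraries.
MANIFEST_RULES = {"io.example.h5shell.MainActivity"}  # e.g. a rapid-dev H5 shell entry point
NAME_KEYWORDS = {"loan", "rebate", "invest"}
GENUINE_NAMES = {"example bank"}  # names that look-alike APPs tend to imitate


def manifest_hit(manifest_features):
    """True if any extracted AndroidManifest feature is in the rule library."""
    return not MANIFEST_RULES.isdisjoint(manifest_features)


def name_hit(app_name, threshold=0.8):
    """True if the name contains a keyword or closely resembles a genuine APP name."""
    name = app_name.lower()
    if any(keyword in name for keyword in NAME_KEYWORDS):
        return True
    # Similarity to a genuine name catches look-alikes such as "Examp1e Bank".
    return any(SequenceMatcher(None, name, g).ratio() >= threshold for g in GENUINE_NAMES)


def first_stage(samples):
    """Step 130 sketch: flag a sample if either rule library matches (OR is an assumption)."""
    return [s["package"] for s in samples if manifest_hit(s["manifest"]) or name_hit(s["name"])]


samples = [
    {"package": "com.example.bank", "name": "Example Bank", "manifest": []},
    {"package": "com.examp1e.bank", "name": "Examp1e Bank", "manifest": []},
    {"package": "com.example.notes", "name": "Notes", "manifest": ["android.permission.CAMERA"]},
    {"package": "com.shell.app", "name": "Helper", "manifest": ["io.example.h5shell.MainActivity"]},
]
print(first_stage(samples))
```

The genuine "Example Bank" APP is itself flagged by name similarity here; this is expected, because the whitelist certificate comparison of step 140 is what removes it at the second stage.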
As shown in Fig. 9, step 200 includes:
step 210: capturing the screen of the running APP to obtain interface images of the APP in operation;
step 220: performing image recognition on the interface images and extracting text information;
step 230: segmenting the text information into phrases and analyzing the phrases to obtain a value indicating how likely the APP is to be fraud-related, where the analysis algorithm includes TF-IDF, WORD2VEC and/or BERT.
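The patent does not name the word segmentation method used in step 230. As a stand-in, the following sketch uses forward maximum matching, a classic dictionary-based technique for unspaced Chinese text. The vocabulary and sample sentence are hypothetical, and a production system would more likely use a dedicated segmenter.

```python
def fmm_segment(text, vocab, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary word.

    Characters not covered by any dictionary word are emitted one at a time.
    """
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical vocabulary: "high-amount", "rebate", "order brushing", "high-amount rebate"
vocab = {"高额", "返利", "刷单", "高额返利"}
print(fmm_segment("立即高额返利刷单", vocab))  # "immediately" + "high-amount rebate" + "order brushing"
```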
Step 230 includes:
step 231: computing TF-IDF vectors for the phrases using the fraud-related webpage TF-IDF feature dictionary;
step 232: processing the phrase TF-IDF vectors with a classification machine learning model to obtain a value indicating how likely the APP is to be fraud-related;
step 233: outputting a list of APPs that are highly likely to be fraud-related.
TF-IDF (Term Frequency-Inverse Document Frequency) is mainly used to estimate how important a word is to a document within a document collection.
Description of the symbols:
document set: $D = \{d_1, d_2, d_3, \ldots, d_n\}$
$n_{w,d}$: number of occurrences of word $w$ in document $d$
$\{w_d\}$: set of all words in document $d$
$n_w$: number of documents containing word $w$
In step 231, the term frequency TF is calculated as:
$$TF_{w,d} = \frac{n_{w,d}}{\sum_{k \in \{w_d\}} n_{k,d}}$$
The inverse document frequency IDF is calculated as:
$$IDF_w = \log \frac{|D|}{n_w}$$
TF-IDF is calculated as:
$$TF\text{-}IDF_{w,d} = TF_{w,d} \times IDF_w$$
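A minimal sketch of step 231, assuming the standard forms $TF_{w,d} = n_{w,d} / \sum_{k \in \{w_d\}} n_{k,d}$ and $IDF_w = \log(|D| / n_w)$ (the original formulas are images, so any smoothing term the patent may use is not reflected here). The IDF map plays the role of the feature dictionary, and `feature_order` fixes the vector dimensions. Function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def build_idf(corpus):
    """IDF_w = log(|D| / n_w) for every word w that occurs in the corpus D."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))  # count each word at most once per document
    return {w: math.log(n_docs / n_w) for w, n_w in doc_freq.items()}


def tf_idf_vector(doc_tokens, idf, feature_order):
    """TF_{w,d} * IDF_w for each dictionary feature, in a fixed feature order."""
    counts = Counter(doc_tokens)
    total = sum(counts.values())  # all words in d, including non-dictionary ones
    if total == 0:
        return [0.0] * len(feature_order)
    return [counts[w] / total * idf.get(w, 0.0) for w in feature_order]


corpus = [["rebate", "task"], ["rebate", "news"], ["weather", "news"]]
idf = build_idf(corpus)
features = ["rebate", "task", "news", "weather"]
vec = tf_idf_vector(["rebate", "rebate", "task", "hello"], idf, features)
print([round(v, 4) for v in vec])
```

Words outside the dictionary (here "hello") still count toward the TF denominator, matching the definition of $\{w_d\}$ as all words in the document, but contribute no feature of their own.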
In step 232, a trained text classification model for fraud-related webpages (built with LinearSVC, a supervised linear Support Vector Machine (SVM) classifier) takes the TF-IDF vector of the screenshot text as input and determines whether the sample to be detected is a fraud-related APP and, if so, which fraud type it belongs to.
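The following sketch shows how a trained one-vs-rest linear classifier could be applied to a TF-IDF vector. The class names, weights and biases are invented for illustration and do not come from the patent. A linear SVM outputs decision values rather than probabilities (scikit-learn's `LinearSVC` provides `decision_function` but no `predict_proba`; `CalibratedClassifierCV` is one documented way to obtain probabilities), so the logistic squash used here is an uncalibrated assumption.

```python
import math

# Hypothetical one-vs-rest linear model (weights per class plus bias), in the same
# feature order as the TF-IDF vector: ["rebate", "task", "news", "weather"].
MODEL = {
    "normal": ([-1.0, -0.5, 0.8, 0.6], 0.2),
    "loan_fraud": ([2.0, 0.3, -0.4, -0.2], -0.3),
    "rebate_fraud": ([1.5, 2.2, -0.6, -0.1], -0.4),
}


def classify(x):
    """Return (predicted label, fraud score in (0, 1)) for one TF-IDF vector."""
    scores = {
        label: sum(wi * xi for wi, xi in zip(w, x)) + b
        for label, (w, b) in MODEL.items()
    }
    label = max(scores, key=scores.get)  # one-vs-rest: highest decision value wins
    best_fraud = max(s for lbl, s in scores.items() if lbl != "normal")
    # Logistic squash of the best fraud margin; uncalibrated stand-in for Platt scaling.
    return label, 1.0 / (1.0 + math.exp(-best_fraud))


print(classify([1.0, 0.0, 0.0, 0.0]))
```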
With the TF-IDF vector as input, the classification model computes how likely the sample is to be fraud-related. Samples whose value exceeds a set threshold are output in a fraud-related APP list, and the final determination is made manually.
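Step 233 and the manual-review hand-off can be sketched as a threshold filter. The patent says only "a set value", so the 0.7 default, the `(label, score)` result format and the package names are hypothetical.

```python
def fraud_review_list(results, threshold=0.7):
    """Return (package, label, score) for APPs above the threshold, highest score first."""
    flagged = [
        (pkg, label, score)
        for pkg, (label, score) in results.items()
        if score > threshold
    ]
    return sorted(flagged, key=lambda item: item[2], reverse=True)


results = {
    "com.examp1e.bank": ("loan_fraud", 0.91),
    "com.example.notes": ("normal", 0.20),
    "com.shell.app": ("rebate_fraud", 0.75),
}
for pkg, label, score in fraud_review_list(results):
    print(f"{pkg}\t{label}\t{score:.2f}")  # queued for manual review
```

Sorting by score puts the most suspicious APPs at the top of the manual review queue.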
A readable storage medium stores a computer program that, when executed by a processor, performs the above method.
While the invention has been illustrated and described in terms of a preferred embodiment and several alternatives, the invention is not limited by the specific description in this specification. Other alternative or equivalent components may also be used in practicing the invention.

Claims (11)

1. A fraud-related APP detection system for detecting whether an APP developed with third-party mobile application rapid development platform framework code and integrated H5 website domain name technology is fraud-related, characterized by comprising an anti-fraud monitoring module, wherein the anti-fraud monitoring module comprises a characteristic data information monitoring module, a screen information monitoring module and a result output module;
the fraud-related APP detection system is installed directly on the smart device with the user's permission or manually by the user;
the characteristic data information monitoring module identifies first-stage suspected fraud-related APPs according to the AndroidManifest information and/or the application name, and compares the first-stage suspected fraud-related APPs against the whitelist genuine APP signature certificates to filter them and determine second-stage suspected fraud-related APPs;
the screen information monitoring module captures the screen of a second-stage suspected fraud-related APP to obtain interface images of the running APP, performs image recognition on the interface images to extract text information, and analyzes the text information to obtain a value indicating how likely the APP is to be fraud-related;
the screen information monitoring module comprises a screen capture module and an image recognition and analysis module; the screen capture module outputs prompt information so that the user can trigger the screen capture manually, and records or captures the interface of the running APP; the image recognition and analysis module performs image recognition on the obtained APP interface images;
the result output module outputs a list of APPs that are highly likely to be fraud-related.
2. A fraud-related APP detection system for detecting whether an APP developed with third-party mobile application rapid development platform framework code and integrated H5 website domain name technology is fraud-related, characterized by comprising an anti-fraud monitoring module, wherein the anti-fraud monitoring module comprises a characteristic data information monitoring module, a screen information monitoring module and a result output module;
the characteristic data information monitoring module identifies first-stage suspected fraud-related APPs according to the AndroidManifest information and/or the application name, and compares the first-stage suspected fraud-related APPs against the whitelist genuine APP signature certificates to filter them and determine second-stage suspected fraud-related APPs;
the screen information monitoring module captures the screen of a second-stage suspected fraud-related APP to obtain interface images of the running APP, performs image recognition on the interface images to extract text information, and analyzes the text information to obtain a value indicating how likely the APP is to be fraud-related; the result output module outputs a list of APPs that are highly likely to be fraud-related;
the screen information monitoring module comprises a screen capture module and an image recognition and analysis module; the screen capture module records or captures the interface of the running APP, and the image recognition and analysis module performs image recognition on the obtained APP interface images;
the anti-fraud monitoring module tests more than two APPs according to an input test list; the first-stage suspected fraud-related APPs are identified by screening application names against preset keywords; the first-stage suspected fraud-related APPs are compared against the whitelist genuine APP signature certificates to filter them and determine second-stage suspected fraud-related APPs; and the text information analysis algorithm includes TF-IDF, WORD2VEC and/or BERT.
3. The fraud-related APP detection system of claim 1 or 2, wherein the image recognition and analysis module comprises a text information extraction module, a word segmentation module, a fraud-related webpage TF-IDF feature dictionary module, a TF-IDF vector calculation module and a classification machine learning module; the text information extraction module processes the image recognition output to obtain text information; the word segmentation module segments the text information into phrases; the TF-IDF vector calculation module computes TF-IDF vectors for the phrases according to the fraud-related webpage TF-IDF feature dictionary module; and the classification machine learning module processes the phrase TF-IDF vectors to obtain a value indicating how likely the APP is to be fraud-related.
4. The fraud-related APP detection system of claim 3, wherein the fraud-related webpage TF-IDF feature dictionary module updates the TF-IDF feature dictionary from a network server.
5. The fraud-related APP detection system of claim 3, wherein the characteristic data information monitoring module comprises a to-be-detected sample information extraction module and a whitelist genuine APP signature certificate feature comparison module.
6. The fraud-related APP detection system of claim 5, wherein the whitelist genuine APP signature certificate feature comparison module updates the whitelist genuine APP signature certificate features from a network server.
7. A fraud-related APP detection method for detecting whether an APP developed with third-party mobile application rapid development platform framework code and integrated H5 website domain name technology is fraud-related, characterized by comprising:
installing the fraud-related APP detection system directly on the smart device with the user's permission or manually by the user;
step 100: identifying first-stage suspected fraud-related APPs according to the AndroidManifest information and/or the application name, and comparing the first-stage suspected fraud-related APPs against the whitelist genuine APP signature certificates to filter them and determine second-stage suspected fraud-related APPs;
step 200: running the second-stage suspected fraud-related APPs, outputting prompt information so that the user can trigger the screen capture manually to obtain interface images of the running APP, performing image recognition on the interface images to extract text information, and analyzing the text information to obtain a value indicating how likely the APP is to be fraud-related.
8. The fraud-related APP detection method of claim 7, wherein step 100 comprises:
step 110: acquiring the AndroidManifest information and/or the application name of the sample to be detected;
step 120: acquiring the signature certificate of the sample to be detected, where the certificate information includes the owner, validity start time, validity end time and/or serial number;
step 130: determining first-stage suspected fraud-related APPs based on the AndroidManifest matching-rule feature library and the application-name matching-rule feature library;
step 140: comparing the samples against the whitelist genuine APP signature certificates, removing whitelisted samples, and determining the second-stage suspected fraud-related APPs.
9. The fraud-related APP detection method of claim 8, wherein step 200 comprises:
step 210: capturing the screen of the running APP to obtain interface images of the APP in operation;
step 220: performing image recognition on the interface images and extracting text information;
step 230: performing word segmentation on the text information to obtain phrases, and analyzing the phrases to obtain a value indicating how likely the APP is to be fraud-related, where the analysis algorithm includes TF-IDF, WORD2VEC and/or BERT.
10. The fraud-related APP detection method of claim 9, wherein step 230 comprises:
step 231: computing TF-IDF vectors for the phrases according to the fraud-related webpage TF-IDF feature dictionary;
step 232: processing the phrase TF-IDF vectors with a classification machine learning model to obtain a value indicating how likely the APP is to be fraud-related;
step 233: outputting a list of APPs that are highly likely to be fraud-related.
11. A readable storage medium having stored thereon a computer program, characterized in that the program, when executed by a processor, implements the fraud-related APP detection method of any one of claims 7 to 10.

Priority Applications (1)

Application number: CN202211692329.XA (CN115688107B); priority date: 2022-12-28; filing date: 2022-12-28; title: Fraud-related APP detection system and method

Publications (2)

CN115688107A: published 2023-02-03
CN115688107B: published 2023-04-11

Family ID: 85055081

Country Status (1)

Country Link
CN (1) CN115688107B (en)

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant