CN113918949A - Recognition method of fraud APP based on multi-mode fusion - Google Patents

Publication number
CN113918949A
Authority
CN
China
Prior art keywords
fraud
app
installation program
program file
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111515201.1A
Other languages
Chinese (zh)
Inventor
罗峰
谢东岳
卢永强
王翔
袁振龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Fule Technology Co ltd
Original Assignee
Beijing Fule Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Fule Technology Co ltd
Priority to CN202111515201.1A
Publication of CN113918949A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55: Detecting local intrusion or implementing counter-measures
    • G06F21/56: Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562: Static detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

Embodiments of the present disclosure provide a method, apparatus, device and computer-readable storage medium for identifying a fraud APP based on multimodal fusion. The method comprises: obtaining an installation program file of an APP; performing feature extraction on the installation program file to obtain its features; performing multimodal fusion on those features, and identifying the APP based on the fused features and a fraud-related feature library, wherein the fraud-related feature library comprises original-value feature sets and vector fusion features of fraud APPs. In this way, the efficiency of identifying fraud APPs is improved.

Description

Recognition method of fraud APP based on multi-mode fusion
Technical Field
Embodiments of the present disclosure relate generally to the field of data analysis, and more particularly, to a method, apparatus, device and computer-readable storage medium for identifying a fraud APP based on multimodal fusion.
Background
Against the background of rapidly developing information and communication technology, telecom and online fraud has become one of the crimes with the greatest social impact. Using communication and network technologies, fraudsters who remain hidden can carry out remote, contactless fraud against large numbers of unspecified people, seriously infringing on the property and personal rights of the public and seriously threatening social security in China.
With the development of mobile internet technology, fraudsters keep upgrading their methods, which have shifted from traditional telecom fraud to new forms of online fraud. Among these, APP fraud is growing explosively because fraud APPs are cheap to produce, quick to update, have a low barrier to acceptance, and are hard to detect, so fraud activity is gradually shifting to APPs. Traditional anti-fraud measures cover only website fraud and cannot adapt to the rise of APP fraud; in addition, they rely mostly on manual labeling of fraud websites, which is costly and inefficient and cannot keep up with the update cycle of fraud APPs.
Current techniques for identifying fraud APPs generally rely on matching package names and APP names and on extracting URLs by running the APP in a sandbox. They use few feature dimensions and do not take the circumstances of actual fraud cases into account.
Disclosure of Invention
According to an embodiment of the present disclosure, a recognition scheme for a fraud APP based on multimodal fusion is provided.
In a first aspect of the present disclosure, a method of identifying a fraud APP based on multimodal fusion is provided. The method comprises the following steps:
obtaining an installation program file of an APP;
performing feature extraction on the installation program file to obtain the features of the installation program file;
performing multimodal fusion on the features of the installation program file, and identifying the APP based on the fused features and a fraud-related feature library; the fraud-related feature library comprises original-value feature sets and vector fusion features of fraud APPs.
Further, the obtaining of the installation program file of the APP includes:
obtaining application downloading clue information from an internet behavior log, threat information, application market and/or forum community comments;
and filtering the application downloading clue information, and crawling the installation program file of the APP.
Further, the extracting the features of the installation program file to obtain the features of the installation program file includes:
performing feature extraction according to the type of the installation program file to obtain the features of the installation program file, wherein:
extracting metadata features in the installation program file through a metadata processing model; the metadata features include APP name, file size, certificate, and/or developer information;
extracting content class characteristics in the installation program file through a content processing model; the content class characteristics comprise icons and internal texts;
extracting dynamic characteristics of the installation program file through a dynamic extraction model; the dynamic characteristics comprise memory data and geographical position information;
and extracting the reverse class characteristics of the installation program file through a reverse extraction model, and determining the registration logic, the login logic and/or the SDK information of the installation program file.
Further, extracting the reverse-engineering features of the installation program file through the reverse extraction model, and determining the registration logic and the login logic of the installation program file, includes:
reverse-engineering the installation program file through the reverse extraction model to restore the source code of the installation program file;
and identifying the registration logic and the login logic in the source code by means of enumeration types.
Further, the multi-modal fusion of the features of the installation program file, and based on the fused features and the fraud-related feature library, the identifying the APP includes:
carrying out original value combination on the non-numerical value features in the installation program file to obtain an original value feature set;
and comparing the original value feature set with a fraud-related feature library to complete the identification of the APP.
Further, the multi-modal fusion of the features of the installation program file, and based on the fused features and the fraud-related feature library, the identifying the APP includes:
converting text form features in the installation program file into vector representation through a natural language model;
converting the picture form characteristics in the installation program file into vector representation through a preset coding form;
splicing the vector corresponding to the text form characteristic and the vector corresponding to the picture form characteristic to obtain a vector fusion characteristic;
and carrying out cluster analysis on the vector fusion characteristics and the fraud-related characteristic library, determining the probability and the type of the APP belonging to the fraud APP, and finishing the identification of the APP.
Further, the method also includes:
if the APP belongs to a fraud APP, extracting the network fingerprint of the APP, identifying the crowd using the APP, and sending anti-fraud information to the crowd; at the same time, the network fingerprint is updated to the fraud-related feature library.
In a second aspect of the present disclosure, a device for recognizing a fraud APP based on multimodal fusion is provided. The device includes:
the acquisition module is used for acquiring an installation program file of the APP;
the extraction module is used for extracting the characteristics of the installation program file to obtain the characteristics of the installation program file;
the recognition module is used for performing multimodal fusion on the features of the installation program file and identifying the APP based on the fused features and the fraud-related feature library; the fraud-related feature library comprises original-value feature sets and vector fusion features of fraud APPs.
In a third aspect of the disclosure, an electronic device is provided. The electronic device includes: a memory having a computer program stored thereon and a processor implementing the method as described above when executing the program.
In a fourth aspect of the present disclosure, a computer readable storage medium is provided, having stored thereon a computer program, which when executed by a processor, implements a method as in accordance with the first aspect of the present disclosure.
According to the method for identifying a fraud APP based on multimodal fusion provided by the embodiments of the present disclosure, an installation program file of an APP is obtained; feature extraction is performed on the installation program file to obtain its features; multimodal fusion is performed on those features, and the APP is identified based on the fused features and the fraud-related feature library, which comprises original-value feature sets and vector fusion features of fraud APPs. Efficient identification of fraud APPs is thereby achieved.
It should be understood that the statements herein reciting aspects are not intended to limit the critical or essential features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, like or similar reference characters designate like or similar elements, and wherein:
FIG. 1 illustrates a schematic diagram of an exemplary operating environment in which embodiments of the present disclosure can be implemented;
FIG. 2 shows a flow chart of a method of identification of a fraud APP based on multimodal fusion, according to an embodiment of the present disclosure;
FIG. 3 illustrates a feature extraction flow diagram for an installer file according to an embodiment of the disclosure;
FIG. 4 shows a block diagram of an apparatus for identifying a fraud APP based on multimodal fusion, according to an embodiment of the present disclosure;
FIG. 5 illustrates a block diagram of an exemplary electronic device capable of implementing embodiments of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
In addition, the term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean that A exists alone, that A and B exist simultaneously, or that B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the present multimodal fusion based fraud APP identification method or multimodal fusion based fraud APP identification apparatus may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as a model training application, a video recognition application, a web browser application, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices with a display screen, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above and may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. This is not particularly limited herein.
When the terminals 101, 102, 103 are hardware, a video capture device may also be installed thereon. The video acquisition equipment can be various equipment capable of realizing the function of acquiring video, such as a camera, a sensor and the like. The user may capture video using a video capture device on the terminal 101, 102, 103.
The server 105 may be a server that provides various services, such as a background server that processes data displayed on the terminal devices 101, 102, 103. The background server may perform processing such as analysis on the received data, and may feed back a processing result (e.g., an identification result) to the terminal device.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. This is not particularly limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. In particular, in the case where the target data does not need to be acquired from a remote place, the above system architecture may not include a network but only a terminal device or a server.
Fig. 2 is a flowchart of a method for identifying a fraud APP based on multimodal fusion according to an embodiment of the present disclosure. As can be seen from fig. 2, the method of this embodiment includes the following steps:
S210, obtaining an installation program file of the APP.
The APP installation program files described in the present disclosure are usually batches of installation program files that are not on a white list, that is, files that may belong to fraud APPs; the file type may be APK or the like.
In this embodiment, the execution body of the method (e.g., the server shown in fig. 1) may acquire the installation program file of an APP through a wired or wireless connection.
Further, the execution body may obtain an installation program file of an APP sent by an electronic device (for example, the terminal device shown in fig. 1) communicatively connected to it, or may obtain an installation program file of an APP stored locally in advance.
In some embodiments, the APP installer file may be obtained by:
obtaining application download clues in batches from internet behavior logs, threat intelligence, application markets and forum or community comments;
filtering and screening users' internet logs to pick out access links that point to an APP installation program (such as an apk);
continuously monitoring the comment sections of high-risk forums (determined by manual calibration or big data analysis) and communities (mainly in categories prone to illegal activity, such as stocks, investment and financing) to obtain download links to APP installation programs posted by individual users;
applying white-list filtering to the download links that point to APP installation programs, removing the download addresses of normal APPs to reduce the amount of data to be processed;
crawling the filtered download links to obtain the installation program files of the APPs.
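The filtering steps above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the white-list hosts, the function names and the rule "a path ending in .apk is an installer link" are all assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical white list of hosts known to serve legitimate APPs.
WHITELIST_HOSTS = {"play.google.com", "appstore.example.com"}


def is_installer_link(url: str) -> bool:
    """Return True when the URL path looks like an APK download (assumed rule)."""
    return urlparse(url).path.lower().endswith(".apk")


def filter_download_links(urls, whitelist=WHITELIST_HOSTS):
    """Keep installer links whose host is not white-listed, dropping duplicates."""
    seen = set()
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if not is_installer_link(url) or host in whitelist or url in seen:
            continue
        seen.add(url)
        kept.append(url)
    return kept
```

Only the links returned by `filter_download_links` would then be handed to the crawler, which keeps the amount of downloaded data small.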
S220, extracting the characteristics of the installation program file to obtain the characteristics of the installation program file.
In some embodiments, different extraction models are applied according to the type of the installation program file to extract its features;
specifically, metadata features in the installation program file are extracted through a metadata processing model; the metadata features include the package name, APP name, file size, declared permissions, certificate, developer information, whether the package is hardened (packed), Service and/or Receiver component information, and the like; metadata is a basic element of an installation program file and can be parsed out directly by the metadata processing model;
content features in the installation program file are extracted through a content processing model; when the installation program file is an APK file, the content features include the APK icon, text inside the APK, and/or APK resource files, and the like; icons of installation files of the same type generally share common traits, e.g., APK icons in the investment and financing category tend to contain currency symbols, and icons of some other categories of APPs tend to carry suggestive imagery; the content processing model includes a parsing tool corresponding to the installation program file, such as a parsing tool for APK files;
dynamic features of the installation program file are extracted through a dynamic extraction model: the installation program file is run in a sandbox, the IP addresses, URLs and hosts accessed while the program runs are extracted, and memory data and I/O data produced during the run are collected; the geographical location of the server and the like are obtained from the server's IP address and host;
reverse-engineering features of the installation program file are extracted through a reverse extraction model, and the registration logic, login logic and/or SDK information of the installation program file are determined;
the installation program file is reverse-engineered (decompiled) to restore its source code, from which the program's registration logic, login logic and called SDK information can be obtained by analysis; the registration and login logic of a fraud program is often less strict than that of a normal program, for example, no mobile phone verification code is required;
the registration logic and the login logic may be identified by enumeration types, such as face recognition, bank card registration, mobile phone number verification, verification code verification, invitation code verification, and/or the like, and an APP may carry one or more enumeration identifiers. The installer of the fraud APP will often call a special SDK interface to service its fraud process.
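One way to represent the enumeration identifiers described above is a flag enumeration, so that a single APP can carry several identifiers at once. The sketch below is illustrative only: the class name, the keyword table and the substring matching are assumptions, and real decompiled code would need far more robust analysis.

```python
from enum import Flag, auto


class AuthCheck(Flag):
    """Verification steps found in an APP's registration or login logic."""
    NONE = 0
    FACE_RECOGNITION = auto()
    BANK_CARD = auto()
    PHONE_NUMBER = auto()
    VERIFICATION_CODE = auto()
    INVITATION_CODE = auto()


# Hypothetical keyword-to-identifier mapping, used only for this example.
KEYWORDS = {
    "faceverify": AuthCheck.FACE_RECOGNITION,
    "bankcard": AuthCheck.BANK_CARD,
    "phonenumber": AuthCheck.PHONE_NUMBER,
    "smscode": AuthCheck.VERIFICATION_CODE,
    "invitecode": AuthCheck.INVITATION_CODE,
}


def identify_auth_checks(source_code: str) -> AuthCheck:
    """Combine the identifiers whose keyword occurs in the restored source."""
    checks = AuthCheck.NONE
    lowered = source_code.lower()
    for keyword, flag in KEYWORDS.items():
        if keyword in lowered:
            checks |= flag
    return checks
```

An APP whose result lacks `AuthCheck.VERIFICATION_CODE` would match the "no mobile phone verification code" pattern mentioned for fraud programs.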
The extraction models used in this step may be obtained by training with a machine learning method, or may be existing extraction tools and the like; this is not limited in the present disclosure.
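As an illustration of the "existing extraction tools" option for metadata, the sketch below reads a few basic properties of an APK using only the Python standard library, relying on the fact that an APK is a ZIP archive. The function name and the returned field names are assumptions. Fields such as the package name and declared permissions live in the binary AndroidManifest.xml and would need a dedicated parser, which is not shown here.

```python
import hashlib
import io
import zipfile


def extract_basic_metadata(apk_bytes: bytes) -> dict:
    """Collect metadata that is readable without decoding the binary manifest."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as archive:
        names = archive.namelist()
    # JAR-style (v1) signature blocks are stored under META-INF/.
    cert_suffixes = (".RSA", ".DSA", ".EC")
    return {
        "file_size": len(apk_bytes),
        "sha256": hashlib.sha256(apk_bytes).hexdigest(),
        "has_manifest": "AndroidManifest.xml" in names,
        "signature_files": sorted(
            n for n in names
            if n.startswith("META-INF/") and n.upper().endswith(cert_suffixes)
        ),
        "dex_count": sum(1 for n in names if n.endswith(".dex")),
    }
```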
S230, performing multimodal fusion on the features of the installation program file, and identifying the APP based on the fused features and the fraud-related feature library; the fraud-related feature library comprises original-value feature sets and vector fusion features of fraud APPs.
In some embodiments, referring to fig. 3, original-value merging is performed on the non-numerical features of the installation program file to obtain original-value feature sets; for example, the package names of fraud-related APPs are collected into a fraud-related package name set, their APP names into a fraud-related name set, their certificates into a fraud-related certificate set, their developer information into a fraud-related developer set, and so on;
further, the original-value feature set is compared with the fraud-related feature library; when certain features of the installation program file (which features can be preset according to the application scenario) coincide with or are close to features in the fraud-related feature library, the installation program file is directly determined to be a fraud APK, and its fraud type is then determined from the coinciding or similar features.
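The comparison against the original-value sets can be sketched as a lookup per feature. The library contents, feature names and fraud-type labels below are made up for illustration, and only exact coincidence is checked; the "close" matching mentioned above would need an additional similarity measure.

```python
# Hypothetical fraud-related library: feature name -> {value: fraud type}.
FRAUD_LIBRARY = {
    "package_name": {"com.fake.loan": "loan fraud"},
    "certificate": {"ab12cd34": "order-brushing fraud"},
    "developer": {"Shady Dev Ltd": "loan fraud"},
}


def match_original_values(app_features: dict, library: dict = FRAUD_LIBRARY):
    """Return (feature, value, fraud_type) for every exact overlap."""
    hits = []
    for feature, value in app_features.items():
        fraud_type = library.get(feature, {}).get(value)
        if fraud_type is not None:
            hits.append((feature, value, fraud_type))
    return hits
```

A non-empty result would mark the file as a fraud APK directly, with the fraud type taken from the matching entries; an empty result would pass the file on to the vector-based analysis.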
In some embodiments, referring to fig. 3, if an APK cannot be determined by the set-type features, vector feature clustering is used to determine whether it is a fraud-related APP;
specifically, text features in the installation program file, such as the package name and the APP name, are converted into vector representations through a natural language model, and the cosine distance between vectors is used to represent the degree of similarity between two texts;
picture features in the installation program file are converted into vector representations through a preset encoding;
for example, the registration logic and the login logic are One-Hot encoded and thereby converted into vector form;
when an icon is processed, a convolutional neural network is used to reduce the dimensionality of the picture and extract its vectorized features, and the cosine distance between vectors is used to represent the degree of similarity between two pictures;
and the resulting vector features are spliced (concatenated) to obtain the fusion feature.
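The encoding and splicing steps can be sketched in a few lines. In this illustration the text and icon vectors are assumed to come from models that are not shown, the list of verification steps is hypothetical, and the logic encoding is a multi-hot variant of One-Hot because one APP may carry several identifiers.

```python
import math

# Hypothetical, fixed ordering of verification steps for the encoding.
AUTH_CHECKS = ["face", "bank_card", "phone", "sms_code", "invite_code"]


def encode_checks(present: set) -> list:
    """Encode which verification steps are present as a 0/1 vector."""
    return [1.0 if name in present else 0.0 for name in AUTH_CHECKS]


def fuse(text_vec, icon_vec, logic_vec) -> list:
    """Splice (concatenate) the per-modality vectors into one fused vector."""
    return list(text_vec) + list(icon_vec) + list(logic_vec)


def cosine_distance(a, b) -> float:
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm
```

For example, `fuse([0.1, 0.2], [0.3], encode_checks({"phone"}))` yields an eight-dimensional vector whose last five entries describe the registration and login logic.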
Further, cluster analysis is performed on the fusion feature together with the fusion features in the fraud-related feature library to determine the probability that the APP is a fraud APP and the corresponding fraud type;
the cluster analysis may use the K-means clustering algorithm with the fraud-related feature library as reference data; the algorithm obtains the distance between the installation program to be identified and each set of fraud-related installation programs, and the smallest distance corresponds to the highest similarity;
further, a distance threshold may be set in the above process: an installation program whose distance is below the threshold is a fraud installation program, and the type of the nearest fraud installation program set is taken as the category of the installation program to be identified; the fraud categories of installation programs may include financial loan fraud, order-brushing fraud, fraud by impersonating public security, procuratorate or court officials, internet-terminal fraud, "pig-butchering" fraud, and the like, and are not further limited in this disclosure.
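The threshold decision can be illustrated with a nearest-centroid rule, which is a simplification of K-means: each fraud type in the reference library is reduced to the mean of its sample vectors, and the APP under test is assigned to the closest type only if the distance is below the threshold. The function names, the use of Euclidean distance and the sample data in the usage note are assumptions for the sketch.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(fused_vec, library, threshold):
    """Return (fraud_type or None, distance to the nearest fraud cluster)."""
    best_type, best_dist = None, math.inf
    for fraud_type, samples in library.items():
        dist = euclidean(fused_vec, centroid(samples))
        if dist < best_dist:
            best_type, best_dist = fraud_type, dist
    if best_dist < threshold:
        return best_type, best_dist
    return None, best_dist
```

With a library of `{"loan fraud": [[0.0, 0.0], [0.0, 2.0]]}` and a threshold of 1.0, the vector `[0.0, 1.5]` lies 0.5 from the centroid `[0.0, 1.0]` and is labeled "loan fraud", whereas a distant vector returns `None`.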
In some embodiments, the fraud-related feature library may be constructed by:
obtaining fraud-related APP samples through big data analysis and/or manual labeling, for example APK files and/or visited websites provided by victims when reporting a case; if a victim provides the website on which they were defrauded, the website is accessed manually to download the APK involved in the case;
extracting the features of the APK file as in step S220; for the original-value feature sets and vector fusion features of fraud APPs, refer to the corresponding parts of step S230, which are not repeated here;
and constructing a fraud-related feature library based on the extracted features, the original value feature set and the vector fusion features.
Further, the method also includes:
if the APP belongs to a fraud APP, extracting the network fingerprint of the APP, identifying the crowd using the APP, and sending anti-fraud information to the crowd; at the same time, updating the network fingerprint to the fraud-related feature library;
the network fingerprint of the installation program can be acquired in a mode of combining static analysis and a dynamic sandbox; the network fingerprint comprises a server domain name, a server IP and/or an interface link and the like;
based on the network fingerprint, people who use the fraud APP are identified and dissuasion and loss-prevention measures are taken, for example, reporting the case and sending warning information to the relevant personnel, thereby curbing the occurrence of fraud cases.
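Identifying the affected users from a network fingerprint can be sketched as a match between the fingerprint and access-log records. The record layout `(user_id, url, server_ip)`, the fingerprint contents and the function name are assumptions made for this example.

```python
from urllib.parse import urlparse

# Hypothetical fingerprint of a confirmed fraud APP.
FINGERPRINT = {
    "domains": {"api.fake-loan.example"},
    "ips": {"203.0.113.7"},
}


def users_contacting(log_records, fingerprint=FINGERPRINT):
    """Return the set of user IDs whose traffic matches the fingerprint."""
    users = set()
    for user_id, url, server_ip in log_records:
        host = (urlparse(url).hostname or "").lower()
        if host in fingerprint["domains"] or server_ip in fingerprint["ips"]:
            users.add(user_id)
    return users
```

The returned user set is the group to which anti-fraud warnings would be sent.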
According to the embodiment of the disclosure, the following technical effects are achieved:
through multimodal fusion, the APP is identified from multiple dimensions, fraud APPs are identified efficiently, and the property and personal rights of the public are protected.
It is noted that while for simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present disclosure is not limited by the order of acts, as some steps may, in accordance with the present disclosure, occur in other orders and concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that acts and modules referred to are not necessarily required by the disclosure.
The above is a description of embodiments of the method, and the embodiments of the apparatus are further described below.
Fig. 4 shows a block diagram of an apparatus 400 for recognizing a fraud APP based on multi-modal fusion according to an embodiment of the present disclosure. As shown in fig. 4, the apparatus 400 includes:
an obtaining module 410, configured to obtain an installation program file of an APP;
an extraction module 420, configured to perform feature extraction on the installer file to obtain features of the installer file;
the recognition module 430 is configured to perform multi-modal fusion on the features of the installation program file, and recognize the APP based on the fused features and the fraud-related feature library; the fraud-related feature library comprises original-value feature sets and vector fusion features of fraud APPs.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the described module may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
FIG. 5 illustrates a schematic block diagram of an electronic device 500 that may be used to implement embodiments of the present disclosure. As shown, the device 500 includes a Central Processing Unit (CPU) 501 that may perform various appropriate actions and processes in accordance with computer program instructions stored in a Read Only Memory (ROM) 502 or loaded from a storage unit 508 into a Random Access Memory (RAM) 503. The RAM 503 can also store various programs and data required for the operation of the device 500. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The processing unit 501 performs the various methods and processes described above, such as the method 200. For example, in some embodiments, the method 200 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM502 and/or the communication unit 509. When the computer program is loaded into RAM 703 and executed by CPU 501, one or more steps of method 200 described above may be performed. Alternatively, in other embodiments, CPU 501 may be configured to perform method 200 in any other suitable manner (e.g., by way of firmware).
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a load programmable logic device (CPLD), and the like.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (10)

1. A method for identifying a fraud APP based on multi-modal fusion, characterized by comprising the following steps:
obtaining an installation program file of an APP;
extracting the characteristics of the installation program file to obtain the characteristics of the installation program file;
performing multi-mode fusion on the characteristics of the installation program file, and identifying the APP based on the fused characteristics and the fraud-related characteristic library; the fraud-related feature library comprises original-value feature sets and vector fusion features of fraud APPs.
2. The method of claim 1, wherein obtaining the installer file for the APP comprises:
obtaining application downloading clue information from an internet behavior log, threat information, application market and/or forum community comments;
and filtering the application downloading clue information, and crawling the installation program file of the APP.
3. The method of claim 2, wherein the extracting the features of the installer file comprises:
performing feature extraction according to the type of the installation program file to obtain the features of the installation program file; wherein:
extracting metadata features in the installation program file through a metadata processing model; the metadata features include APP name, file size, certificate, and/or developer information;
extracting content class characteristics in the installation program file through a content processing model; the content class characteristics comprise icons and internal texts;
extracting dynamic characteristics of the installation program file through a dynamic extraction model; the dynamic characteristics comprise memory data and geographical position information;
and extracting the reverse class characteristics of the installation program file through a reverse extraction model, and determining the registration logic, the login logic and/or the SDK information of the installation program file.
4. The method according to claim 3, wherein the extracting reverse class features of the installer file through a reverse extraction model, and the determining the registration logic and the login logic of the installer file comprises:
reverse-engineering the installation program file through a reverse extraction model, and restoring a source code of the installation program file;
and identifying the source code through an enumeration type to obtain the registration logic and the login logic.
5. The method as claimed in claim 4, wherein the multimodal fusing of the features of the installer file, and identifying the APP based on the fused features and the fraud-related feature library comprises:
carrying out original value combination on the non-numerical value features in the installation program file to obtain an original value feature set;
and comparing the original value feature set with a fraud-related feature library to complete the identification of the APP.
6. The method as claimed in claim 5, wherein the multimodal fusing of the features of the installer file, and identifying the APP based on the fused features and the fraud-related feature library comprises:
converting text form features in the installation program file into vector representation through a natural language model;
converting the picture form characteristics in the installation program file into vector representation through a preset coding form;
splicing the vector corresponding to the text form characteristic and the vector corresponding to the picture form characteristic to obtain a vector fusion characteristic;
and carrying out cluster analysis on the vector fusion characteristics and the fraud-related characteristic library, determining the probability and the type of the APP belonging to the fraud APP, and finishing the identification of the APP.
7. The method of claim 6, further comprising:
if the APP belongs to a fraud APP, extracting the network fingerprint of the APP, identifying the crowd using the APP, and sending anti-fraud information to the crowd; at the same time, the network fingerprint is updated to the fraud-related feature library.
8. An apparatus for recognizing fraud APP based on multi-modal fusion, comprising:
the acquisition module is used for acquiring an installation program file of the APP;
the extraction module is used for extracting the characteristics of the installation program file to obtain the characteristics of the installation program file;
the recognition module is used for performing multi-mode fusion on the characteristics of the installation program file and recognizing the APP based on the fused characteristics and the fraud-related characteristic library; the fraud-related feature library comprises original-value feature sets and vector fusion features of fraud APPs.
9. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program, wherein the processor, when executing the program, implements the method of any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
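
The comparison in claim 5 and the vector splicing in claim 6 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. Every function name, feature name, and data value below is hypothetical. The claims do not specify how the original value feature set is compared with the library, so Jaccard overlap is assumed here. Claim 6 recites cluster analysis; this sketch substitutes a simpler nearest-centroid lookup by cosine similarity against assumed per-type centroids, and the resulting similarity is only a stand-in for the claimed probability.

```python
import math


def raw_value_set(features):
    """Collect the non-numeric (string) features as a set of (name, value) pairs."""
    return frozenset((k, v) for k, v in features.items() if isinstance(v, str))


def raw_overlap(candidate, library_entry):
    """Jaccard overlap of two raw-value sets (an assumed comparison measure)."""
    union = candidate | library_entry
    return len(candidate & library_entry) / len(union) if union else 0.0


def fuse(text_vec, image_vec):
    """Splice the text vector and the picture vector into one fused vector."""
    return list(text_vec) + list(image_vec)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_fraud_type(fused, centroids):
    """Return (fraud_type, similarity) for the most similar library centroid."""
    return max(((t, cosine(fused, c)) for t, c in centroids.items()),
               key=lambda pair: pair[1])


# Hypothetical data for illustration only.
app = {"app_name": "QuickLoan", "file_size": 1024, "certificate": "CN=example"}
known = frozenset({("app_name", "QuickLoan"), ("certificate", "CN=other")})
print(raw_overlap(raw_value_set(app), known))

centroids = {"loan": [1.0, 0.0, 0.0, 1.0], "gambling": [0.0, 1.0, 1.0, 0.0]}
print(nearest_fraud_type(fuse([1.0, 0.0], [0.0, 1.0]), centroids))
```

In practice the text vector would come from a natural language model and the picture vector from the preset coding form named in claim 6; the short hand-written vectors above only stand in for those outputs.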

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111515201.1A, CN113918949A (en) | 2021-12-13 | 2021-12-13 | Recognition method of fraud APP based on multi-mode fusion

Publications (1)

Publication Number | Publication Date
CN113918949A (en) | 2022-01-11

Family ID: 79249098

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202111515201.1A, Pending, CN113918949A (en) | Recognition method of fraud APP based on multi-mode fusion | 2021-12-13 | 2021-12-13

Country Status (1)

Country | Link
CN (1) | CN113918949A (en)

Cited By (2)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN114598777A (en) * | 2022-02-25 | 2022-06-07 | 马上消费金融股份有限公司 | Intention detection method, device, electronic equipment and storage medium
CN115859292A (en) * | 2023-02-20 | 2023-03-28 | 卓望数码技术(深圳)有限公司 | Fraud-related APP detection system, judgment method and storage medium

Citations (4)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
US20190149575A1 (en) * | 2017-11-13 | 2019-05-16 | International Business Machines Corporation | System to prevent scams
CN111222131A (en) * | 2020-01-07 | 2020-06-02 | 上海欣方智能系统有限公司 | Internet fraud APK (android Package) identification method
CN113034331A (en) * | 2021-05-06 | 2021-06-25 | 国家计算机网络与信息安全管理中心上海分中心 | Android gambling application identification method and system based on multi-mode fusion
CN113067820A (en) * | 2021-03-19 | 2021-07-02 | 深圳市安络科技有限公司 | Method, device and equipment for early warning abnormal webpage and/or APP

Similar Documents

Publication | Title
CN112148987B (en) | Message pushing method based on target object activity and related equipment
CN107730389A (en) | Electronic installation, insurance products recommend method and computer-readable recording medium
CN108614970B (en) | Virus program detection method, model training method, device and equipment
CN110929799B (en) | Method, electronic device, and computer-readable medium for detecting abnormal user
CN113918949A (en) | Recognition method of fraud APP based on multi-mode fusion
CN112995414B (en) | Behavior quality inspection method, device, equipment and storage medium based on voice call
CN112394908A (en) | Method and device for automatically generating embedded point page, computer equipment and storage medium
CN112330331A (en) | Identity verification method, device and equipment based on face recognition and storage medium
CN107437088B (en) | File identification method and device
CN110895811B (en) | Image tampering detection method and device
CN111191677B (en) | User characteristic data generation method and device and electronic equipment
CN111639360A (en) | Intelligent data desensitization method and device, computer equipment and storage medium
CN113869789A (en) | Risk monitoring method and device, computer equipment and storage medium
CN111275071B (en) | Prediction model training method, prediction device and electronic equipment
CN113518075A (en) | Phishing early warning method and device, electronic equipment and storage medium
CN112182520B (en) | Identification method and device of illegal account number, readable medium and electronic equipment
CN112307464A (en) | Fraud identification method and device and electronic equipment
CN114219664A (en) | Product recommendation method and device, computer equipment and storage medium
CN113901817A (en) | Document classification method and device, computer equipment and storage medium
CN113362069A (en) | Dynamic adjustment method, device and equipment of wind control model and readable storage medium
CN107368597B (en) | Information output method and device
CN112733645A (en) | Handwritten signature verification method and device, computer equipment and storage medium
CN113792342B (en) | Desensitization data reduction method, device, computer equipment and storage medium
CN116911304B (en) | Text recommendation method and device
CN110674497B (en) | Malicious program similarity calculation method and device

Legal Events

Code | Title | Description
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20220111