CN111400705B - Application program detection method, device and equipment - Google Patents

Info

Publication number
CN111400705B
Authority
CN
China
Prior art keywords
page
preset
user
information
word segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010143430.4A
Other languages
Chinese (zh)
Other versions
CN111400705A (en)
Inventor
金璐
黄继堂
张鸿翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010143430.4A
Publication of CN111400705A
Application granted
Publication of CN111400705B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033 Test or assess software

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of this specification disclose a method, an apparatus, and a device for detecting an application program. The method includes: when it is detected that a target application program in a host program is running, acquiring the page content of a preset page opened while the target application program runs and the input information of a user on the preset page; performing word segmentation processing on the page content of the preset page and on the input information of the user to obtain the word segmentation information contained in each; inputting the word segmentation information contained in the page content and the word segmentation information contained in the input information into a pre-trained page recognition model to obtain a target probability that the preset page is an induced acquisition page that maliciously infringes on the privacy information of the user; and determining, based on the target probability, whether the preset page of the target application program is an induced acquisition page that maliciously infringes on the user privacy information, so as to protect the user privacy information.

Description

Application program detection method, device and equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method, an apparatus, and a device for detecting an application program.
Background
To meet the diversified usage requirements of both electronic device memory and users, applets (that is, application programs that must be installed in a host program in order to run) have emerged. A malicious applet, however, can leak a user's privacy information and cause property loss. How to detect whether an applet is a malicious application program that maliciously infringes on user privacy information, and thereby avoid privacy leaks and economic loss, has therefore become a focus of attention for program developers.
At present, in the risk prevention and control of application programs, whether an application program is a malicious application program that induces users to provide privacy information and collects it can be determined from information reported by users. Relying on user reports to decide whether an application program is risky, however, means that the risk is perceived only after a delay.
Disclosure of Invention
An object of the embodiments of the present specification is to provide a method, an apparatus, and a device for detecting an application program, so as to provide a solution that can timely and accurately detect whether the application program is a malicious application program that maliciously violates user privacy information, and process the malicious application program.
In order to implement the above technical solution, the embodiments of the present specification are implemented as follows:
in a first aspect, an embodiment of the present specification provides a method for detecting an application program, where the method includes: when detecting that a target application program in a host program runs, acquiring page content of a preset page opened when the target application program runs and input information of a user on the preset page, wherein the target application program is an applet loaded in the host program; performing word segmentation processing on the page content of the preset page and the input information of the user to obtain word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user; inputting word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user into a pre-trained page recognition model to obtain a target probability that the preset page is an induced acquisition page with malicious invasion on the privacy information of the user, wherein the page recognition model is obtained by training based on a preset page content sample and a preset user input information sample and is used for judging the probability of the induced acquisition page; and determining whether a preset page of the target application program is an induced acquisition page which maliciously invades the user privacy information or not based on the target probability so as to protect the user privacy information and prevent the target application program from maliciously invading the user privacy information.
In a second aspect, an embodiment of the present disclosure provides a method for detecting an application, where the method is applied to a host program, and the method includes: when detecting that a target application program in the host program runs, respectively acquiring page features corresponding to page contents of a preset page and input features corresponding to input information of users in the preset page based on a preset feature extraction algorithm, wherein the target application program is a small program carried in the host program; determining whether the preset page is a suspected induced acquisition page or not based on a preset feature matching algorithm, the page features and the input features; under the condition that the preset page is determined to be a suspected induced acquisition page, sending the page content of the preset page and the input information of the user to a server, and receiving a target probability that the preset page sent by the server is an induced acquisition page which maliciously invades the privacy information of the user, wherein the target probability is a probability obtained by the server based on the page content of the preset page, the input information of the user and a pre-trained page recognition model, and the page recognition model is a model obtained by training based on a preset page content sample and a preset user input information sample and used for judging the probability of the induced acquisition page; and determining whether a preset page of the target application program is an induced acquisition page which maliciously invades the user privacy information or not based on the target probability so as to protect the user privacy information and prevent the target application program from maliciously invading the user privacy information.
In a third aspect, an embodiment of the present specification provides a method for detecting an application program, where the method is applied to a server and includes: acquiring page content of a preset page of a target application program provided by a host program and input information of a user on the preset page, wherein the target application program is an applet loaded in the host program; performing word segmentation processing on the page content of the preset page and the input information of the user to obtain word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user; inputting word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user into a pre-trained page recognition model to obtain a target probability that the preset page is an induced acquisition page which maliciously invades the privacy information of the user, wherein the page recognition model is obtained by training based on a preset page content sample and a preset user input information sample and is used for judging the probability that a page is an induced acquisition page; and sending the target probability to the host program so that the host program determines whether the preset page is an induced acquisition page which maliciously invades the user privacy information or not based on the target probability, so as to protect the user privacy information and prevent the target application program from maliciously invading the user privacy information.
In a fourth aspect, an embodiment of the present specification provides an apparatus for detecting an application, where the apparatus includes: the information acquisition module is used for acquiring page content of a preset page opened when a target application program in a host program runs and input information of a user on the preset page when the target application program is detected to run, wherein the target application program is a small program carried in the host program; the first processing module is used for performing word segmentation processing on the page content of the preset page and the input information of the user to obtain word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user; the probability acquisition module is used for inputting word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user into a pre-trained page recognition model to obtain the target probability that the preset page is an induced acquisition page with malicious infringement on the privacy information of the user, and the page recognition model is a model which is obtained by training based on a preset page content sample and a preset user input information sample and is used for judging the probability of the induced acquisition page; and the page detection module is used for determining whether a preset page of the target application program is an induced acquisition page which has malicious infringement on the user privacy information or not based on the target probability so as to protect the user privacy information and prevent the target application program from maliciously infringing the user privacy information.
In a fifth aspect, an embodiment of the present specification provides an apparatus for detecting an application, where the apparatus includes: the feature extraction module is used for respectively acquiring page features corresponding to the page content of the preset page and input features corresponding to the input information of the user in the preset page based on a preset feature extraction algorithm when the target application program in the host program is detected to run, wherein the target application program is an applet loaded in the host program; the page determining module is used for determining whether the preset page is a suspected induced acquisition page or not based on a preset feature matching algorithm, the page features and the input features; the information sending module is used for sending the page content of the preset page and the input information of the user to a server under the condition that the preset page is determined to be a suspected induced acquisition page, and receiving the target probability, sent by the server, that the preset page is an induced acquisition page which maliciously infringes on user privacy information, wherein the target probability is the probability obtained by the server based on the page content of the preset page, the input information of the user and a pre-trained page recognition model, and the page recognition model is a model which is obtained by training based on a preset page content sample and a preset user input information sample and is used for judging the probability that a page is an induced acquisition page; and the page detection module is used for determining whether a preset page of the target application program is an induced acquisition page which maliciously infringes on the user privacy information or not based on the target probability so as to protect the user privacy information and prevent the target application program from maliciously infringing the user privacy information.
In a sixth aspect, an embodiment of the present specification provides an apparatus for detecting an application, where the apparatus includes: the information receiving module is used for acquiring page content of a preset page of a target application program provided by a host program and input information of a user on the preset page, wherein the target application program is an applet carried in the host program; the word segmentation processing module is used for carrying out word segmentation processing on the page content of the preset page and the input information of the user to obtain word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user; the probability acquisition module is used for inputting word segmentation information contained in page content of the preset page and word segmentation information contained in input information of the user into a pre-trained page recognition model to obtain target probability that the preset page is an induced acquisition page with malicious invasion on user privacy information, and the page recognition model is obtained by training based on a preset page content sample and a preset user input information sample and is used for judging the probability of the induced acquisition page; and the probability sending module is used for sending the target probability to the host program so that the host program determines whether the preset page is an induced acquisition page which maliciously invades the user privacy information or not based on the target probability, so as to protect the user privacy information and prevent the target application program from maliciously invading the user privacy information.
In a seventh aspect, an embodiment of the present specification provides an apparatus for detecting an application, where the apparatus for detecting an application includes: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to: when detecting that a target application program in a host program runs, acquiring page content of a preset page opened when the target application program runs and input information of a user on the preset page, wherein the target application program is an applet loaded in the host program; performing word segmentation processing on the page content of the preset page and the input information of the user to obtain word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user; inputting word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user into a pre-trained page recognition model to obtain a target probability that the preset page is an induced acquisition page with malicious invasion on the privacy information of the user, wherein the page recognition model is obtained by training based on a preset page content sample and a preset user input information sample and is used for judging the probability of the induced acquisition page; and determining whether a preset page of the target application program is an induced acquisition page which has malicious infringement on the user privacy information or not based on the target probability so as to protect the user privacy information and prevent the target application program from maliciously infringing the user privacy information.
In an eighth aspect, an embodiment of the present specification provides an apparatus for detecting an application, where the apparatus for detecting an application includes: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to: when detecting that a target application program in the host program runs, respectively acquiring page features corresponding to page contents of a preset page and input features corresponding to input information of users in the preset page based on a preset feature extraction algorithm, wherein the target application program is a small program carried in the host program; determining whether the preset page is a suspected induced acquisition page or not based on a preset feature matching algorithm, the page features and the input features; under the condition that the preset page is determined to be a suspected induced acquisition page, sending the page content of the preset page and the input information of the user to a server, and receiving a target probability that the preset page sent by the server is an induced acquisition page which maliciously invades the privacy information of the user, wherein the target probability is a probability obtained by the server based on the page content of the preset page, the input information of the user and a pre-trained page recognition model, and the page recognition model is a model obtained by training based on a preset page content sample and a preset user input information sample and used for judging the probability of the induced acquisition page; and determining whether a preset page of the target application program is an induced acquisition page which has malicious infringement on the user privacy information or not based on the target probability so as to protect the user privacy information and prevent the target application program from maliciously infringing the user privacy information.
In a ninth aspect, an embodiment of the present specification provides an apparatus for detecting an application program, where the apparatus for detecting an application program includes: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to: acquiring page content of a preset page of a target application program provided by a host program and input information of a user on the preset page, wherein the target application program is an applet loaded in the host program; performing word segmentation processing on the page content of the preset page and the input information of the user to obtain word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user; inputting word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user into a pre-trained page recognition model to obtain a target probability that the preset page is an induced acquisition page with malicious invasion on the privacy information of the user, wherein the page recognition model is obtained by training based on a preset page content sample and a preset user input information sample and is used for judging the probability of the induced acquisition page; and sending the target probability to the host program so that the host program determines whether the preset page is an induced acquisition page which maliciously invades the user privacy information or not based on the target probability, so as to protect the user privacy information and prevent the target application program from maliciously invading the user privacy information.
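The division of work between the host program and the server described in the second and third aspects can be illustrated with a minimal sketch. It is not the claimed implementation: the keyword set, the regular expressions, the function names, the `score_on_server` callback, and the 0.5 threshold are all hypothetical stand-ins for the "preset feature extraction algorithm", the "preset feature matching algorithm", and the decision "based on the target probability", none of which the specification fixes here.

```python
import re

# Hypothetical inducing keywords and sensitive-input patterns (illustrative only).
INDUCING_KEYWORDS = {"lucky draw", "first prize", "congratulations"}
SENSITIVE_PATTERNS = {
    "phone_number": re.compile(r"^1\d{10}$"),
    "id_number": re.compile(r"^\d{17}[\dXx]$"),
}


def extract_page_features(page_content):
    """Return the inducing keywords that occur in the page content."""
    text = page_content.lower()
    return {kw for kw in INDUCING_KEYWORDS if kw in text}


def extract_input_features(user_inputs):
    """Return the kinds of sensitive information found in the user's inputs."""
    return {
        name
        for value in user_inputs
        for name, pattern in SENSITIVE_PATTERNS.items()
        if pattern.match(value)
    }


def is_suspected(page_features, input_features):
    """Assumed matching rule: inducing wording AND sensitive input both present."""
    return bool(page_features) and bool(input_features)


def detect(page_content, user_inputs, score_on_server, threshold=0.5):
    """Host-side flow: pre-filter locally, ask the server only for suspected pages."""
    page_features = extract_page_features(page_content)
    input_features = extract_input_features(user_inputs)
    if not is_suspected(page_features, input_features):
        return False  # nothing is sent to the server
    target_probability = score_on_server(page_content, user_inputs)
    return target_probability >= threshold
```

Under these assumptions, the local pre-filter keeps ordinary pages from being sent to the server at all, and only suspected pages incur a model call.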
Drawings
To illustrate the embodiments of the present specification or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some of the embodiments described in the present specification, and a person skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of an embodiment of a method for detecting an application program;
FIG. 2 is a schematic view of a preset page of the present specification;
FIG. 3 is a flow chart of another embodiment of a method for detecting applications;
FIG. 4 is a flow chart of another embodiment of a method for detecting applications;
FIG. 5 is a flow chart of another embodiment of a method for detecting applications;
FIG. 6 is a flow chart of another embodiment of a method for detecting applications;
FIG. 7 is a flow chart of another embodiment of a method for detecting applications;
FIG. 8 is a flow chart of another embodiment of a method for detecting applications;
FIG. 9 is a schematic diagram of a detection process of an application of the present disclosure;
FIG. 10 is a schematic structural diagram of an embodiment of an apparatus for detecting an application according to the present disclosure;
FIG. 11 is a schematic structural diagram of another embodiment of a detection device for application programs in the present specification;
FIG. 12 is a schematic structural diagram of another embodiment of a detection apparatus for an application program according to the present disclosure;
FIG. 13 is a schematic structural diagram of a detection device for an application according to the present disclosure;
FIG. 14 is a schematic diagram of an application detection device according to the present disclosure;
FIG. 15 is a schematic structural diagram of a detection device of another application program in the present specification.
Detailed Description
The embodiment of the specification provides a method, a device and equipment for detecting an application program.
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present specification, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present specification without any creative effort shall fall within the protection scope of the present specification.
Example one
As shown in FIG. 1, an embodiment of the present specification provides a method for detecting an application program. The method may be executed by a terminal device or a server where a host program is located. The terminal device may be a device such as a personal computer, or a mobile terminal device such as a mobile phone or a tablet computer, and the server may be an independent server or a server cluster composed of multiple servers. The method may specifically comprise the following steps:
In S102, when it is detected that a target application program in a host program runs, page content of a preset page opened when the target application program runs and input information of a user on the preset page are acquired.
In practical applications, the host program and the applet generally do not belong to the same developer; that is, the applet is generally a third-party application program relative to the host program. For example, the host program may be an instant messaging application program, and the target application program may be an applet, developed by a game development organization, that can be loaded in the instant messaging application program. The preset page may be a page that requires the user to input information. For example, the preset page may be a lottery page in the target application program, on which the user must input personal information such as an identification number and a mobile phone number in order to take part in the lottery.
In implementation, to meet the diversified usage requirements of both electronic device memory and users, applets (that is, application programs that must be installed in a host program in order to run) have emerged. A malicious applet, however, can leak a user's privacy information and cause property loss, so how to detect whether an applet is a malicious application program that maliciously infringes on user privacy information, and thereby avoid privacy leaks and economic loss, has become a focus of attention for program developers. At present, in the risk prevention and control of application programs, whether an application program is a malicious application program that induces users to provide privacy information and collects it can be determined from information reported by users. Relying on user reports, however, means that the risk is perceived only after a delay. The embodiments of the present specification therefore provide a technical solution to the above problems, as described in detail below.
The host program can monitor the running states of multiple applets. When it detects that a user has opened an applet and is continuously inputting information on a preset page, that applet is the target application program, and the host program can obtain the page content of the preset page and the input information of the user. In addition, a designated page in each applet may be set as the preset page: if the host program detects that a user runs an applet (that is, the target application program) and continuously inputs information on its designated page (that is, the preset page), the host program may obtain the page content of the preset page and the input information of the user on the preset page.
In addition, when detecting that the user inputs information in the preset page of the target application, the host program may trigger the terminal device to obtain the page content of the preset page and the input information of the user on the preset page. Or, the host program may also send a page detection request to the server, and send the page content of the preset page and the input information of the user on the preset page to the server, that is, the server may obtain the page content of the preset page and the input information of the user on the preset page.
Taking a host program as an application program capable of providing financial services (such as transfer services and payment services) for a user as an example, a plurality of applets (such as a game applet developed by a game development mechanism and a financial applet developed by a financial institution) may be loaded in the host program, the user may open the game applet in the host program and input personal information such as a phone number and a gender of the user in a page 1 of the game applet, and the host program may trigger a terminal device to acquire page content of the page 1 (which is a preset page) and input information of the user when detecting that the user inputs information in the page 1 of the running game applet. Or, the host program may also send the page content of the page 1 (that is, the preset page) and the input information of the user to the server, that is, the server may obtain the page content of the page 1 (that is, the preset page) and the input information of the user.
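The monitoring behaviour described for S102 can be sketched as follows. This is an illustrative assumption rather than the specification's implementation: the class and method names are hypothetical, and "continuous information input" is interpreted here as at least `min_inputs` consecutive inputs on the same preset page, a threshold the specification does not state.

```python
from dataclasses import dataclass, field


@dataclass
class PageCapture:
    """What the host program collects for one preset page."""
    applet_id: str
    page_id: str
    page_content: str
    user_inputs: list = field(default_factory=list)


class HostMonitor:
    """Hypothetical host-side monitor for applets running in the host program."""

    def __init__(self, preset_pages, min_inputs=2):
        # preset_pages maps an applet id to the set of its designated (preset) pages.
        self.preset_pages = preset_pages
        self.min_inputs = min_inputs
        self._pending = {}

    def on_input(self, applet_id, page_id, page_content, text):
        """Record one user input; return a PageCapture once input is 'continuous'."""
        if page_id not in self.preset_pages.get(applet_id, set()):
            return None  # not a preset page, nothing is collected
        key = (applet_id, page_id)
        capture = self._pending.setdefault(
            key, PageCapture(applet_id, page_id, page_content)
        )
        capture.user_inputs.append(text)
        if len(capture.user_inputs) >= self.min_inputs:
            return self._pending.pop(key)  # hand over for detection
        return None
```

The returned `PageCapture` could then be processed on the terminal device or sent to the server together with a page detection request, matching the two alternatives described above.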
In S104, word segmentation processing is performed on the page content of the preset page and the input information of the user, so as to obtain word segmentation information included in the page content of the preset page and word segmentation information included in the input information of the user.
In implementation, as shown in FIG. 2, taking the case where the target application program is a game applet in an instant messaging application program as an example, the preset page may be a lottery page, and the input information of the user may be a name, a mobile phone number, and an address input on the lottery page. After the page content of the preset page and the input information of the user are obtained, word segmentation processing can be performed on both based on a preset sample library. The word segmentation information obtained from the page content of the preset page may include "lucky", "lucky draw", "congratulations", "first prize", "prize", and the like, and the word segmentation information obtained from the input information of the user may include "zhangsan", "130XXXXXXXX", and "a city B-cell C-cell".
In addition, different target application programs can correspond to different sample libraries, and different sample libraries can be selected according to different target application programs so as to perform word segmentation processing on the page content of the preset page and the input information of the user.
In addition, the word segmentation method described above is one optional, realizable method. Other word segmentation methods may be used depending on the actual application scenario, which is not specifically limited in the embodiments of this specification.
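As an illustration of dictionary-based word segmentation with a per-application sample library, the following is a minimal sketch using greedy forward maximum matching. The specification does not name a segmentation algorithm, so the algorithm, the `segment` function, the `SAMPLE_LIBRARIES` table, and its contents are all assumptions made for illustration. Unspaced English text stands in for page text that has no word delimiters.

```python
# Hypothetical sample libraries, keyed by the category of the target application.
SAMPLE_LIBRARIES = {
    "lottery": {"lucky", "draw", "first", "prize", "win"},
    "finance": {"loan", "card", "bank", "rate"},
}


def segment(text, lexicon, max_len=5):
    """Greedy forward maximum matching against a word list.

    At each position the longest lexicon entry (up to max_len characters)
    is taken; if nothing matches, a single character is emitted.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Select the library for the target application, then segment the page text.
page_tokens = segment("luckydrawfirstprize", SAMPLE_LIBRARIES["lottery"])
```

A production system would more likely rely on an established segmentation library; the sketch only shows how the choice of sample library changes the resulting tokens.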
In S106, the word segmentation information included in the page content of the preset page and the word segmentation information included in the input information of the user are input into the pre-trained page recognition model, so as to obtain a target probability that the preset page is an induced acquisition page in which the privacy information of the user is maliciously invaded.
The page recognition model can be a model that is trained on preset page content samples and preset user input information samples and is used to judge the probability that a page is an induced collection page. The user privacy information may be of multiple types, for example, a bank card account number and password, an identity card number, facial data, fingerprint data, and friend information and chat information of a predetermined instant messaging application. In practical applications, the user privacy information may include other information besides the above, which may be set according to the actual situation and is not limited in the embodiments of this specification.
In implementation, the page recognition model may be a model trained based on a predetermined page content sample and a predetermined user input information sample, for example, the page recognition model may be a model constructed based on a preset text recognition algorithm, trained based on a predetermined page content sample containing induced keywords and non-induced keywords, and a user input information sample containing predetermined sensitive information and non-sensitive information.
The word segmentation information contained in the page content of the preset page obtained by word segmentation processing and the word segmentation information contained in the input information of the user are input into a page recognition model trained in advance, so that the page content contained in the preset page and the input information of the user can be accurately recognized, and the target probability that the preset page is an induction collection page can be obtained.
In S108, based on the target probability, it is determined whether the preset page of the target application is an induced collection page that maliciously invades the user privacy information, so as to protect the user privacy information and prevent the target application from maliciously invading the user privacy information.
In implementation, whether the preset page is an induced collection page may be determined based on a preset risk probability threshold, for example, if the target probability that the preset page of the target application is an induced collection page having malicious invasion on the user privacy information is 90%, and the preset risk probability threshold is assumed to be 80%, it may be determined that the preset page of the target application is an induced collection page having malicious invasion on the user privacy information.
The above method of determining whether the preset page is an induced collection page is one optional, realizable method. In an actual application scenario, other determination methods may be used. For example, different preset risk probability thresholds may be set according to the category of the target application program, or according to the category of the host program. The determination method may be selected according to the actual application scenario, which is not specifically limited in the embodiments of this specification.
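The threshold comparison in S108, including category-specific thresholds, can be sketched as follows. The 80% default and the 90% example probability come from the text above. The category names, the 60% value, and the strict "greater than" comparison are assumptions made for illustration.

```python
DEFAULT_RISK_THRESHOLD = 0.80  # the 80% example threshold from the text

# Hypothetical per-category thresholds; the values are illustrative only.
THRESHOLDS_BY_CATEGORY = {
    "game": 0.80,
    "finance": 0.60,
}


def is_induced_collection_page(target_probability, app_category=None):
    """Compare the target probability with the threshold for the category.

    Unknown or missing categories fall back to the default threshold.
    """
    threshold = THRESHOLDS_BY_CATEGORY.get(app_category, DEFAULT_RISK_THRESHOLD)
    return target_probability > threshold


verdict = is_induced_collection_page(0.90, "game")
```

Keeping the thresholds in a lookup table lets a stricter value be applied to higher-risk categories without changing the decision logic.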
The embodiment of the specification provides an application program detection method, which includes the steps of obtaining page content of a preset page opened when a target application program in a host program runs and input information of a user in the preset page when the target application program in the host program runs, carrying out word segmentation processing on the page content of the preset page and the input information of the user to obtain word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user, inputting word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user into a page recognition model trained in advance to obtain target probability that the preset page is an induced acquisition page which has malicious infringement on user privacy information, and determining whether the preset page of the target application program is an induced acquisition page which has malicious infringement on the user privacy information or not based on the target probability to protect the user privacy information and prevent the target application program from maliciously infringement on the user privacy information. Therefore, the target probability that the preset page is the induced acquisition page with malicious invasion to the user privacy information is determined based on the pre-trained page recognition model, so that the detection accuracy and the detection efficiency of the preset page can be improved, namely, whether the preset page of the target application program is the induced acquisition page with malicious invasion to the user privacy information or not can be timely and accurately determined according to the target probability, and the detection accuracy and the detection efficiency of the application program are improved.
Example two
As shown in fig. 3, an execution subject of the method may be a terminal device or a server where a host program is located, where the terminal device may be a device such as a personal computer, or may also be a mobile terminal device such as a mobile phone and a tablet computer, and the server may be an independent server or a server cluster composed of multiple servers. The method specifically comprises the following steps:
In S302, when it is detected that a target application in a host program runs, page content of a preset page opened when the target application runs and input information of a user on the preset page are obtained.
For the specific processing procedure of S302, reference may be made to relevant contents of S102 in the first embodiment, which is not described herein again.
In S304, based on a preset feature extraction algorithm, a page feature corresponding to the page content of the preset page and an input feature corresponding to the input information of the user are respectively obtained.
In implementation, for example, the preset page may be a page as shown in fig. 2, and after the preset page is obtained, text feature extraction may be performed on the preset page based on the preset lexicon to obtain a page feature corresponding to the page content of the preset page and an input feature corresponding to the input information of the user. Or, the word segmentation processing may be performed on the page content of the preset page and the input information of the user, and the word segmentation information included in the page content of the preset page is used as the page feature, and the word segmentation information included in the input information of the user is used as the input feature.
The method for acquiring the page feature and the input feature is an optional and realizable acquisition method, and in an actual application scenario, there may be a plurality of different acquisition methods, which may be different according to different actual application scenarios, and this is not specifically limited in this embodiment of the present specification.
In S306, it is determined whether the preset page is a suspected induced collection page based on the preset feature matching algorithm, the page feature, and the input feature.
In practice, the processing manner of S306 may be various, and specifically may include the following steps one to three.
Step one, carrying out induction matching detection on the page characteristics based on a preset induction keyword characteristic matching rule to obtain a first detection result.
In implementation, a matching rate between the page features and the preset induction keyword features may be determined based on a preset regular-expression matching rule, and the first detection result may be determined based on the matching rate. For example, the preset induction keyword features may include "draw", "congratulations", "first prize", and the like. A corresponding regular expression may be designed based on the preset induction keyword features, and the matching rate between the page features and the preset induction keyword features may be determined based on the regular expression.
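A minimal sketch of step one follows. The specification does not define how the matching rate is computed, so this sketch assumes it is the fraction of preset induction keywords found in the page features. The function name and the keyword list are illustrative.

```python
import re

# Illustrative induction keywords; a real deployment would maintain its own list.
INDUCTION_KEYWORDS = ["draw", "congratulations", "first prize"]


def induction_matching_rate(page_features, keywords=INDUCTION_KEYWORDS):
    """Return the share of keywords that occur in the joined page features.

    Each keyword is escaped and searched case-insensitively, so keywords
    containing regex metacharacters are matched literally.
    """
    if not keywords:
        return 0.0
    text = " ".join(page_features)
    hits = sum(
        1 for keyword in keywords
        if re.search(re.escape(keyword), text, flags=re.IGNORECASE)
    )
    return hits / len(keywords)


rate = induction_matching_rate(["lucky", "draw", "Congratulations", "prize"])
```

In this usage, two of the three keywords appear, so the rate is two thirds. The first detection result could then record this rate for comparison with a preset matching rate threshold.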
And secondly, performing sensitive information matching detection on the input features based on a preset sensitive information feature matching algorithm to obtain a second detection result.
In implementation, for example, it may be detected whether the input features include sensitive information features, such as an 18-digit identity card number feature or an 11-digit mobile phone number feature, and the second detection result may be determined according to the number of items of sensitive information features included in the input features. For example, the second detection result may be that the input features include 3 items of sensitive information features. Alternatively, the second detection result may be determined according to the sensitivity coefficients of the sensitive information features. For example, based on a preset sensitive information feature matching algorithm, it may be detected that the input features include three sensitive information features, namely an identity card number, a bank card number, and a mobile phone number. If the sensitivity coefficient of the identity card number feature is 1, that of the bank card number feature is 1.2, and that of the mobile phone number feature is 0.4, the second detection result may be that the sensitivity corresponding to the input features is 1 + 1.2 + 0.4 = 2.6.
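Step two can be sketched as follows, returning both the matched sensitive items and the summed sensitivity. The coefficients 1, 1.2, and 0.4 are taken from the example above. The regular expressions are simplified assumptions (for instance, a bank card number is assumed to be exactly 16 or 19 digits) and are not validation rules from the specification.

```python
import re

# name -> (pattern, sensitivity coefficient). Patterns are illustrative only.
SENSITIVE_PATTERNS = {
    "id_number": (re.compile(r"\d{17}[\dXx]"), 1.0),
    "bank_card_number": (re.compile(r"\d{16}|\d{19}"), 1.2),
    "mobile_number": (re.compile(r"1\d{10}"), 0.4),
}


def detect_sensitive(input_features):
    """Return (sorted names of matched sensitive items, total sensitivity).

    Each sensitive type is counted once, however many inputs match it.
    """
    found = set()
    for value in input_features:
        for name, (pattern, _coefficient) in SENSITIVE_PATTERNS.items():
            if pattern.fullmatch(value.strip()):
                found.add(name)
    sensitivity = sum(SENSITIVE_PATTERNS[name][1] for name in found)
    return sorted(found), sensitivity


items, sensitivity = detect_sensitive([
    "11010519900101123X",    # 18-character identity card number
    "622202" + "0" * 13,     # 19-digit bank card number
    "13012345678",           # 11-digit mobile phone number
])
```

With the three inputs shown, all three types are matched and the sensitivity sums to about 2.6, up to floating-point rounding, which matches the worked example in the text.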
And step three, determining whether the preset page is a suspected induced acquisition page or not based on the first detection result and the second detection result.
In implementation, whether the preset page is a suspected induced collection page or not can be determined according to the matching rate in the first detection result and the number of items or the sensitivity of the sensitive information features in the second detection result. For example, if the matching rate in the first detection result is greater than a preset matching rate threshold, and the number of items of the sensitive information features in the second detection result is greater than a preset item number threshold or the sensitivity is greater than a preset sensitivity threshold, it may be determined that the preset page is the suspected induction collecting page.
The method for determining the suspected induction collection page is an optional and realizable determination method, and in an actual application scenario, there may be a plurality of different determination methods, which may be different according to different actual application scenarios, and this is not specifically limited in the embodiment of the present specification.
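The combination rule in step three can be written as a single boolean expression: the matching rate must exceed its threshold, and either the item count or the sensitivity must exceed its own threshold. All three default threshold values below are hypothetical placeholders.

```python
def is_suspected_page(matching_rate, item_count, sensitivity,
                      rate_threshold=0.5, item_threshold=2,
                      sensitivity_threshold=2.0):
    """Combine the first and second detection results.

    The page is flagged when the keyword matching rate is high enough AND
    the sensitive-information evidence (item count OR sensitivity) is too.
    """
    enough_keywords = matching_rate > rate_threshold
    enough_sensitive = (item_count > item_threshold
                        or sensitivity > sensitivity_threshold)
    return enough_keywords and enough_sensitive


suspected = is_suspected_page(matching_rate=2 / 3, item_count=3, sensitivity=2.6)
```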
In S308, under the condition that the preset page is determined to be the suspected induction collecting page, performing word segmentation processing on the preset page content and the input information of the user to obtain word segmentation information included in the preset page content and word segmentation information included in the input information of the user.
For the specific processing procedure of S308, reference may be made to the related content of S104 in the first embodiment, which is not described herein again.
In S310, a predetermined page content sample and a predetermined user input information sample are obtained within a preset time period.
The preset time period can be one month, three months, and the like, and different preset time periods can be set according to the detection requirements of the actual application scenario. The page content samples can include page content samples of historical induced collection pages and page content samples of historical non-induced collection pages, and the input information samples are the information entered by users on the pages to which the page content samples belong.
In S312, the word segmentation processing is performed on the page content sample and the input information sample to obtain word segmentation information included in the page content sample and word segmentation information included in the input information sample.
In implementation, referring to the method for performing word segmentation processing on the preset page content and the input information of the user in S104 in the above embodiment, word segmentation processing may be performed on the page content sample and the input information sample, which is not described herein again.
In S314, a page recognition model including the named entity recognition model is trained based on the word segmentation information included in the page content sample and the word segmentation information included in the input information sample, so as to obtain a trained page recognition model.
The page recognition model may include a named entity recognition (NER) model, and the named entity recognition model may be constructed based on the BERT (Bidirectional Encoder Representations from Transformers) model.
In implementation, the BERT model is a bidirectional encoder representation model based on Transformers, and a masked language model can be used to achieve the bidirectionality of the BERT model; that is, when the BERT model is trained, each piece of word segmentation information input into the model can use the context on both sides of it. Therefore, training the BERT model based on the word segmentation information contained in the page content samples and the word segmentation information contained in the input information samples can yield a word segmentation result with high accuracy and a good training effect, and the trained page recognition model (namely the BERT model) can be accurately used to judge the probability that a page is an induced collection page.
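To show the train-then-score data flow of S310 to S316 without depending on a BERT implementation, the sketch below substitutes a small bag-of-words naive Bayes classifier for the BERT-based model. It is a stand-in for illustration, not the model described in this embodiment. The class name, the `<MOBILE>` and `<ID>` placeholder tokens, and the training samples are all hypothetical, and both labels are assumed to occur in the training data.

```python
import math
from collections import Counter


class TinyPageModel:
    """Multinomial naive Bayes with add-one smoothing (a stand-in model)."""

    def fit(self, samples):
        # samples: list of (tokens, label); label 1 = induced collection page.
        self.token_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter()
        for tokens, label in samples:
            self.token_counts[label].update(tokens)
            self.doc_counts[label] += 1
        self.vocab = set(self.token_counts[0]) | set(self.token_counts[1])
        return self

    def predict_proba(self, tokens):
        """Return the probability that the tokens describe an induced page."""
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in (0, 1):
            total_tokens = sum(self.token_counts[label].values())
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokens:
                if token in self.vocab:  # tokens never seen in training are skipped
                    count = self.token_counts[label][token] + 1
                    score += math.log(count / (total_tokens + len(self.vocab)))
            log_scores[label] = score
        # Normalize in log space to avoid underflow.
        peak = max(log_scores.values())
        weights = {k: math.exp(v - peak) for k, v in log_scores.items()}
        return weights[1] / (weights[0] + weights[1])


# Each sample concatenates page tokens with typed placeholders for user input.
training_samples = [
    (["lucky", "draw", "prize", "<MOBILE>", "<ID>"], 1),
    (["congratulations", "first", "prize", "<ID>"], 1),
    (["weather", "today", "city"], 0),
    (["news", "sports", "score"], 0),
]
model = TinyPageModel().fit(training_samples)
target_probability = model.predict_proba(["lucky", "prize", "<ID>"])
```

Replacing raw inputs such as phone numbers with typed placeholders before training is a design choice of this sketch: it keeps personal data out of the model vocabulary while preserving the signal that a sensitive item was requested.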
In S316, the word segmentation information included in the page content of the preset page and the word segmentation information included in the input information of the user are input into the pre-trained page recognition model, so as to obtain a target probability that the preset page is an induced collection page in which the privacy information of the user is maliciously invaded.
In S318, based on the target probability, it is determined whether the preset page of the target application is an induced collection page that maliciously violates the user privacy information, so as to protect the user privacy information and prevent the target application from maliciously violating the user privacy information.
For the specific processing procedures of S316 to S318, reference may be made to the relevant contents of S106 to S108 in the first embodiment, which are not described herein again.
In S320, in a case where the preset page is determined to be the induced collection page, outputting preset alarm information.
In implementation, under the condition that the preset page is determined as the induced acquisition page, the host program can trigger the terminal device to output preset alarm information so as to prompt the user that the target application program has a risk, and prevent the user from continuously inputting privacy information in the target application program so as to protect the privacy information of the user. Or, the server may output preset alarm information to the terminal device to perform a safety prompt on the user when the preset page is determined as the induced acquisition page, or the server may output the preset alarm information and the page content of the preset page of the target application program to the relevant application program processing platform when the preset page is determined as the induced acquisition page, so that the relevant application program processing platform performs secondary detection on the page content of the preset page to perform risk control on the target application program.
The embodiment of the specification provides an application program detection method, which includes the steps of obtaining page content of a preset page opened when a target application program in a host program runs and input information of a user in the preset page when the target application program in the host program runs, carrying out word segmentation processing on the page content of the preset page and the input information of the user to obtain word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user, inputting word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user into a page recognition model trained in advance to obtain target probability that the preset page is an induced acquisition page which has malicious infringement on user privacy information, and determining whether the preset page of the target application program is an induced acquisition page which has malicious infringement on the user privacy information or not based on the target probability to protect the user privacy information and prevent the target application program from maliciously infringement on the user privacy information. Therefore, the target probability that the preset page is the induced acquisition page which has malicious invasion on the user privacy information is determined based on the pre-trained page recognition model, so that the detection accuracy and the detection efficiency of the preset page can be improved, namely whether the preset page of the target application program is the induced acquisition page which has malicious invasion on the user privacy information can be timely and accurately determined according to the target probability, and the detection accuracy and the detection efficiency of the application program are improved.
Example three
As shown in fig. 4, an execution subject of the method may be a terminal device where a host program is located, where the terminal device may be a device such as a personal computer, or may be a mobile terminal device such as a mobile phone and a tablet computer. The method specifically comprises the following steps:
In S402, when it is detected that the target application in the host program runs, based on a preset feature extraction algorithm, a page feature corresponding to the page content of the preset page and an input feature corresponding to the input information of the user in the preset page are respectively obtained.
The target application may be an applet loaded in the host program.
In implementation, when the host program detects that the target application program runs, the host program may trigger the terminal device to monitor whether the user continuously inputs information in the preset page. Under the condition that the user is detected to continuously input information in the preset page, the host program can trigger the terminal device to acquire the page content of the preset page and the input information of the user in the preset page, and trigger the terminal device to respectively acquire the page features corresponding to the page content of the preset page and the input features corresponding to the input information of the user in the preset page based on a preset feature extraction algorithm.
In S404, it is determined whether the preset page is a suspected induced collection page based on the preset feature matching algorithm, the page feature, and the input feature.
For the specific processing procedure of S404, reference may be made to the relevant content of S306 in the second embodiment, which is not described herein again.
In S406, when it is determined that the preset page is a suspected induced collection page, the page content of the preset page and the input information of the user are sent to the server, and the target probability, sent by the server, that the preset page is an induced collection page that maliciously infringes on the user privacy information is received.
The target probability can be the probability, judged by the server based on the page content of the preset page, the input information of the user, and a pre-trained page recognition model, that the preset page is an induced collection page that maliciously infringes on the user privacy information. The page recognition model is a model that is trained based on preset page content samples and preset user input information samples and is used to judge the probability that a page is an induced collection page.
In implementation, the host program sends the page content of the preset page and the input information of the user to the server only when the preset page is determined to be a suspected induced collection page, so that the data processing pressure of the server can be relieved and the detection accuracy and detection efficiency of the application program can be improved.
In S408, based on the target probability, it is determined whether the preset page of the target application is an induced collection page that has malicious infringement on the user privacy information, so as to protect the user privacy information and prevent the target application from maliciously infringing the user privacy information.
The specific processing procedure of S408 may refer to the related content of S108 in the first embodiment, which is not described herein again.
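The division of work between the terminal device and the server in S402 to S408 can be sketched as follows: a cheap local check runs first, and the server is contacted only for suspected pages. The function `detect_on_terminal`, the injected callables, and the two stubs are hypothetical; a real host program would supply its own feature matching and network call.

```python
def detect_on_terminal(page_content, user_input, is_suspected,
                       request_probability, risk_threshold=0.8):
    """Return (verdict, sent_to_server) for one preset page.

    is_suspected:        local feature-matching check (S402 to S404).
    request_probability: call that sends data to the server and returns
                         the target probability (S406).
    """
    if not is_suspected(page_content, user_input):
        return False, False  # nothing leaves the terminal device
    target_probability = request_probability(page_content, user_input)
    return target_probability > risk_threshold, True  # S408


# Stubs standing in for the local check and the server round trip.
def fake_prefilter(page_content, user_input):
    return "prize" in page_content and len(user_input) >= 2


def fake_server(page_content, user_input):
    return 0.9


verdict, sent = detect_on_terminal(
    "win a prize", ["13012345678", "Zhang San"], fake_prefilter, fake_server
)
```

Passing the two checks in as callables keeps the control flow testable without a network connection and makes the server-load saving explicit: pages that fail the local check never trigger a request.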
The embodiment of the specification provides an application program detection method. When it is detected that a target application program in a host program runs, page features corresponding to the page content of a preset page and input features corresponding to the input information of a user on the preset page are respectively obtained based on a preset feature extraction algorithm. Whether the preset page is a suspected induced collection page is determined based on a preset feature matching algorithm, the page features, and the input features. When the preset page is determined to be a suspected induced collection page, the page content of the preset page and the input information of the user are sent to a server, and the target probability, sent by the server, that the preset page is an induced collection page that maliciously infringes on the user privacy information is received. The target probability is obtained based on the page content of the preset page, the input information of the user, and a pre-trained page recognition model, and the page recognition model is a model that is trained based on preset page content samples and preset user input information samples and is used to judge the probability that a page is an induced collection page. Based on the target probability, whether the preset page of the target application program is an induced collection page that maliciously infringes on the user privacy information is determined, so as to protect the user privacy information and prevent the target application program from maliciously infringing on it.
In this way, the host program can process the page content of the preset page and the input information of the user on the preset page based on the preset feature extraction algorithm and the preset feature matching algorithm, determine whether the preset page is a suspected induced acquisition page, send the page content of the preset page and the input information of the user to the server under the condition that the preset page is determined to be the suspected induced acquisition page, and determine whether the preset page is the induced acquisition page which maliciously invades the privacy information of the user according to the target probability determined by the server, so that the data processing pressure of the server can be reduced, the detection efficiency and the detection accuracy of the preset page can be improved, and the detection accuracy and the detection efficiency of the corresponding application program can be improved.
Example four
As shown in fig. 5, an execution subject of the method may be a terminal device where a host program is located, where the terminal device may be a device such as a personal computer, or may be a mobile terminal device such as a mobile phone and a tablet computer. The method specifically comprises the following steps:
In S502, when it is detected that the target application program in the host program runs, based on a preset feature extraction algorithm, a page feature corresponding to the page content of the preset page and an input feature corresponding to the input information of the user in the preset page are respectively obtained.
In S504, it is determined whether the preset page is a suspected induced collection page based on the preset feature matching algorithm, the page feature, and the input feature.
In S506, when it is determined that the preset page is a suspected induced collection page, the page content of the preset page and the input information of the user are sent to the server, and the target probability, sent by the server, that the preset page is an induced collection page that maliciously infringes on the user privacy information is received.
In S508, based on the target probability, it is determined whether the preset page of the target application is an induced collection page in which malicious infringement exists on the user privacy information.
For the specific processing procedures of S502 to S508, reference may be made to the relevant contents of S402 to S408 in the third embodiment, which are not described herein again.
In S510, in a case where it is determined that the preset page is the induced acquisition page, preset alarm information is output.
In implementation, the host program may trigger the terminal device to display the preset alarm information when determining that the preset page is the induced acquisition page, so as to prompt the user that the target application program may have an induced acquisition risk, and avoid the problem that the user continues to input personal information in the target application program, which causes personal privacy disclosure, and the like.
The embodiment of the specification provides an application program detection method. When it is detected that a target application program in a host program runs, page features corresponding to the page content of a preset page and input features corresponding to the input information of a user on the preset page are respectively obtained based on a preset feature extraction algorithm. Whether the preset page is a suspected induced collection page is determined based on a preset feature matching algorithm, the page features, and the input features. When the preset page is determined to be a suspected induced collection page, the page content of the preset page and the input information of the user are sent to a server, and the target probability, sent by the server, that the preset page is an induced collection page that maliciously infringes on the user privacy information is received. The target probability is obtained based on the page content of the preset page, the input information of the user, and a pre-trained page recognition model, and the page recognition model is a model that is trained based on preset page content samples and preset user input information samples and is used to judge the probability that a page is an induced collection page. Based on the target probability, whether the preset page of the target application program is an induced collection page that maliciously infringes on the user privacy information is determined, so as to protect the user privacy information and prevent the target application program from maliciously infringing on it.
In this way, the host program can process the page content of the preset page and the input information of the user on the preset page based on the preset feature extraction algorithm and the preset feature matching algorithm, determine whether the preset page is a suspected induced acquisition page, send the page content of the preset page and the input information of the user to the server under the condition that the preset page is determined to be the suspected induced acquisition page, and determine whether the preset page is the induced acquisition page which maliciously invades the privacy information of the user according to the target probability determined by the server, so that the data processing pressure of the server can be reduced, the detection efficiency and the detection accuracy of the preset page can be improved, and the detection accuracy and the detection efficiency of the corresponding application program can be improved.
Example five
As shown in fig. 6, an execution subject of the method may be a server, which may be an independent server or a server cluster composed of multiple servers. The method may specifically comprise the steps of:
In S602, page content of a preset page of the target application provided by the host program and input information of the user on the preset page are obtained.
The target application may be an applet loaded in the host program.
In S604, the page content of the preset page and the input information of the user are subjected to word segmentation processing, so as to obtain word segmentation information included in the page content of the preset page and word segmentation information included in the input information of the user.
In S606, the word segmentation information included in the page content of the preset page and the word segmentation information included in the input information of the user are input into the pre-trained page recognition model, so as to obtain a target probability that the preset page is an induced acquisition page in which the privacy information of the user is maliciously invaded.
For the specific processing procedures of S604 to S606, reference may be made to the relevant contents of S104 to S106 in the first embodiment, which are not described herein again.
In S608, the target probability is sent to the host program, so that the host program determines whether the preset page is an induced collection page that maliciously invades the user privacy information based on the target probability, so as to protect the user privacy information and prevent the target application program from maliciously invading the user privacy information.
The embodiment of the specification provides an application program detection method. Page content of a preset page of a target application program provided by a host program and input information of a user on the preset page are obtained, where the target application program is an applet loaded in the host program. Word segmentation processing is performed on the page content of the preset page and the input information of the user to obtain word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user. The word segmentation information contained in the page content of the preset page and the word segmentation information contained in the input information of the user are input into a pre-trained page recognition model to obtain the target probability that the preset page is an induced collection page that maliciously infringes on the user privacy information, where the page recognition model is a model that is trained based on preset page content samples and preset user input information samples and is used to judge the probability that a page is an induced collection page. The target probability is sent to the host program, so that the host program determines, based on the target probability, whether the preset page is an induced collection page that maliciously infringes on the user privacy information, so as to protect the user privacy information and prevent the target application program from maliciously infringing on it.
Therefore, the target probability that the preset page is the induced acquisition page with malicious infringement on the user privacy information is determined based on the pre-trained page recognition model, so that the detection accuracy and the detection efficiency of the preset page can be improved, and meanwhile, according to the target probability, whether the preset page of the target application program is the induced acquisition page with malicious infringement on the user privacy information or not can be accurately determined, so that the detection accuracy and the detection efficiency of the application program can be improved.
Example six
As shown in fig. 7, an embodiment of the present specification provides a method for detecting an application program. An execution subject of the method may be a server, which may be an independent server or a server cluster composed of multiple servers. The method may specifically comprise the following steps:
In S702, encrypted information provided by the host program is acquired.
The encrypted information may be information obtained by encrypting, by the host program, the page content of the preset page and the input information of the user on the preset page based on a preset encryption algorithm.
In implementation, the host program and the server may determine a corresponding encryption key and decryption key based on the target application program. The server may receive the encrypted information obtained by the host program encrypting the page content of the preset page and the input information of the user with the encryption key, so that the user privacy information is protected during transmission.
In addition, there may be multiple preset encryption algorithms, and the algorithm used may differ according to the actual application scenario, which is not specifically limited in this embodiment of the present specification.
In S704, based on a preset decryption algorithm, the encrypted information is decrypted to obtain the page content of the preset page and the input information of the user on the preset page.
In implementation, after receiving the encrypted information, the server may decrypt the encrypted information based on the decryption key to obtain a decryption result including the page content of the preset page and the input information of the user, and then the server may determine the target probability based on the page content of the preset page, the input information of the user, and the pre-trained page recognition model, and send the target probability to the host program.
In S706, word segmentation processing is performed on the page content of the preset page and the input information of the user, so as to obtain word segmentation information included in the page content of the preset page and word segmentation information included in the input information of the user.
In S708, the word segmentation information included in the page content of the preset page and the word segmentation information included in the input information of the user are input into a pre-trained page recognition model, so as to obtain a target probability that the preset page is an induced collection page in which the privacy information of the user is maliciously invaded.
For the specific processing procedures of S706 to S708, reference may be made to the relevant contents of S602 to S606 in the fifth embodiment, which are not described herein again.
After obtaining the target probability that the preset page is the induced collection page having the malicious infringement on the user privacy information, the server may determine, based on the target probability, whether the preset page is the induced collection page having the malicious infringement on the user privacy information, that is, after executing S708, may continue to execute S712. Or, the server may send the target probability to the host program, so that the host program determines, based on the target probability, whether the preset page is an induced collection page in which malicious infringement exists on the user privacy information, that is, after S708 is executed, S710 may be continuously executed.
In S710, the target probability is sent to the host program, so that the host program determines, based on the target probability, whether the preset page is an induced acquisition page in which malicious invasion exists on the user privacy information, so as to protect the user privacy information and prevent the target application program from maliciously invading the user privacy information.
In S712, it is determined whether the preset page is an induced collection page in which there is a malicious infringement on the user privacy information, based on the target probability.
In addition, after the preset page is determined to be an induced collection page with malicious infringement on the user privacy information, S714 or S716 to S718 may be continuously performed.
In S714, in a case where it is determined that the preset page is the induced collection page, alarm information that the preset page is the induced collection page is sent to the preset management device.
The preset management device may be set by a developer of the host program, and may be, for example, a device used to perform risk control on applets in the host program.
In implementation, when the preset page is determined to be an induced collection page, the server may output alarm information to the preset management device, so that the preset management device performs risk control on the target application program. For example, the preset management device may acquire information on the users of the target application program in the host program and, after receiving the alarm information indicating that the preset page is an induced collection page, send alarm information to the users who are using the target application program, so as to prevent them from continuing to use the target application program and leaking their privacy information. Alternatively, the preset management device may determine, according to the alarm information, whether to prohibit the target application program from being installed in the host program, and the like.
In S716, link information of the preset page is obtained when the preset page is determined as the induced acquisition page.
In S718, the link information of the preset page is sent to the preset induced page processing platform, so that the preset induced page processing platform processes the target application to which the preset page belongs.
In implementation, the preset induced page processing platform may be a commodity inspection platform or the like, and the link information of the preset page may be sent to the preset induced page processing platform, so that the preset induced page processing platform obtains the page content of the preset page according to the link information of the preset page, and detects the preset page, thereby avoiding misjudgment on the preset page and improving the accuracy of page detection. In addition, under the condition that the preset page is determined as an induced acquisition page which maliciously invades the user privacy information, the preset induced page processing platform can process a target application program to which the preset page belongs, for example, link information of the preset page is stored in a link black list, so that the protection of the user privacy information and the risk control of the application program are realized.
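The handling in S712 to S718 can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the threshold value, the `Outcome` record, and the function name are hypothetical, the alarm is modelled as a string, and the induced page processing platform is modelled as a link blacklist held in memory. The specification describes S714 and S716 to S718 as alternatives; the sketch performs both only for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical decision threshold; the specification does not fix a value.
INDUCED_THRESHOLD = 0.8


@dataclass
class Outcome:
    """Records what the server did for one preset page."""
    is_induced: bool
    alarms: List[str] = field(default_factory=list)
    blacklisted_links: Set[str] = field(default_factory=set)


def handle_target_probability(target_probability: float,
                              page_link: str,
                              threshold: float = INDUCED_THRESHOLD) -> Outcome:
    """S712 decides from the probability; S714 and S716-S718 are follow-ups."""
    outcome = Outcome(is_induced=target_probability >= threshold)
    if outcome.is_induced:
        # S714: alarm information intended for the preset management device.
        outcome.alarms.append(f"induced collection page detected: {page_link}")
        # S716-S718: link information handed to the processing platform,
        # modelled here as adding the link to a blacklist.
        outcome.blacklisted_links.add(page_link)
    return outcome
```

A page scored below the threshold produces no alarm and no blacklist entry, so the follow-up steps run only after a positive decision in S712.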
In addition, the link information of the preset page can be stored, so that the page content of the preset page and the input information of the user can be obtained according to the link information, and a preset page content sample and a preset user input information sample are formed, so that the page recognition model can be trained in the next model training period.
The embodiment of the specification provides a detection method of an application program. The method comprises: obtaining page content of a preset page of a target application program provided by a host program and input information of a user on the preset page, the target application program being an applet loaded in the host program; performing word segmentation processing on the page content of the preset page and the input information of the user to obtain word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user; inputting the word segmentation information contained in the page content of the preset page and the word segmentation information contained in the input information of the user into a pre-trained page recognition model to obtain a target probability that the preset page is an induced collection page that maliciously infringes on user privacy information, the page recognition model being a model trained based on a preset page content sample and a preset user input information sample and used for determining the probability of an induced collection page; and sending the target probability to the host program, so that the host program determines, based on the target probability, whether the preset page is an induced collection page that maliciously infringes on user privacy information, so as to protect the user privacy information and prevent the target application program from maliciously infringing on it.
Therefore, the target probability that the preset page is the induced acquisition page which has malicious infringement on the user privacy information is determined based on the pre-trained page recognition model, so that the detection accuracy and the detection efficiency of the preset page can be improved, and whether the preset page of the target application program is the induced acquisition page which has malicious infringement on the user privacy information can be timely and accurately determined according to the target probability, so that the detection accuracy and the detection efficiency of the application program can be improved.
Example seven
As shown in fig. 8, an embodiment of the present specification provides a method for detecting an application program, which specifically includes the following steps:
In S802, when detecting that a target application program in the host program runs, the host program obtains, based on a preset feature extraction algorithm, a page feature corresponding to the page content of a preset page and an input feature corresponding to the input information of a user on the preset page.
In S804, the host program determines whether the preset page is a suspected induced collection page based on the preset feature matching algorithm, the page feature, and the input feature.
In S806, the host program sends the page content of the preset page and the input information of the user to the server when determining that the preset page is the suspected induced collection page.
In S808, the server performs a word segmentation process on the page content of the preset page and the input information of the user to obtain word segmentation information included in the page content of the preset page and word segmentation information included in the input information of the user.
In S810, the server inputs the word segmentation information included in the page content of the preset page and the word segmentation information included in the input information of the user into the pre-trained page recognition model, so as to obtain a target probability that the preset page is an induced acquisition page in which the privacy information of the user is maliciously violated.
In S812, the server transmits the target probability to the host program.
In S814, the host program determines, based on the target probability, whether the preset page of the target application program is an induced collection page in which malicious infringement exists on the user privacy information.
In S816, the server determines, based on the target probability, whether the preset page is an induced collection page that has a malicious infringement on the user privacy information.
In S818, the server sends alarm information that the preset page is the induced collection page to the preset management device when determining that the preset page is the induced collection page.
As shown in fig. 9, when the host program detects that the target application program is running and that the user has continuously input information on the preset page, the host program may trigger the terminal device to obtain, based on a preset feature extraction algorithm, the page features corresponding to the page content of the preset page and the input features corresponding to the input information of the user on the preset page. The host program may then trigger the terminal device to determine, based on the preset feature matching algorithm, the page features, and the input features, whether the preset page is a suspected induced collection page; that is, the host program may trigger the terminal device to perform induction matching detection on the page content of the preset page and sensitive information matching detection on the input information of the user.
The host program can send the page content of the preset page and the input information of the user to the server under the condition that the preset page is determined to be the suspected induced acquisition page, and the detection pressure of the server on the application program can be relieved by preliminarily screening the suspected induced acquisition page through the host program, so that the detection efficiency and the detection accuracy of the application program are improved. Meanwhile, the host program can encrypt the page content of the preset page and the input information of the user under the condition that the preset page is determined to be the suspected induced acquisition page, and send an encryption result to the server, so that privacy protection in the information transmission process is realized.
After the server obtains the page content of the preset page and the input information of the user (or decrypts the received encryption result to obtain them), the server may perform word segmentation on the page content of the preset page and the input information of the user to obtain the word segmentation information contained in the page content of the preset page and the word segmentation information contained in the input information of the user. The server may then input this word segmentation information into a pre-trained page recognition model to obtain the target probability that the preset page is an induced collection page that maliciously infringes on the privacy information of the user.
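The server-side segmentation and scoring described above can be illustrated with the following sketch. It is a stand-in under stated assumptions, not the page recognition model of the specification: `segment` is a toy tokenizer for Latin-script text (Chinese text would need a proper word segmenter), and `TOKEN_WEIGHTS`, `BIAS`, and the digit-run rule are hypothetical values chosen only to show how word segmentation information from both sources feeds one probability.

```python
import math
import re
from typing import Dict, List


def segment(text: str) -> List[str]:
    """Toy word segmentation: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Hypothetical token weights standing in for a trained page recognition model.
TOKEN_WEIGHTS: Dict[str, float] = {
    "password": 1.5,
    "bank": 1.2,
    "prize": 1.0,
    "verify": 0.8,
}
BIAS = -2.0


def target_probability(page_tokens: List[str], input_tokens: List[str]) -> float:
    """Map both token lists to a probability with a logistic function."""
    score = BIAS + sum(TOKEN_WEIGHTS.get(tok, 0.0) for tok in page_tokens)
    # Assumption: a long digit run in the user input looks like a card number.
    score += sum(1.0 for tok in input_tokens if tok.isdigit() and len(tok) >= 16)
    return 1.0 / (1.0 + math.exp(-score))


page = "Enter your password to claim your prize"
typed = "6222020200112233445"
probability = target_probability(segment(page), segment(typed))
```

Here the page tokens and the input tokens are kept separate until scoring, which mirrors the two kinds of word segmentation information named in S808 and S810.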
After the target probability is obtained, the server may send the target probability to the host program, so that the host program determines, based on the target probability, whether the preset page is an induced collection page that maliciously infringes on the user privacy information. The server may also make the same determination itself based on the target probability. When the preset page is determined to be such an induced collection page, the server may send alarm information indicating that the preset page is an induced collection page to the preset management device, or the server may obtain link information of the preset page and send it to a preset induced page processing platform, so that the platform processes the target application program to which the preset page belongs.
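The division of work between the host program and the server in S802 to S814 can be sketched as below. The function name, the callable parameters, and the threshold are hypothetical; the preliminary screening and the server scoring are injected as callables so that the sketch shows only the control flow, in which nothing is sent to the server unless the page is first judged to be suspected.

```python
from typing import Callable, Optional


def run_detection(page_content: str,
                  user_input: str,
                  prescreen: Callable[[str, str], bool],
                  score_on_server: Callable[[str, str], float],
                  threshold: float = 0.8) -> Optional[bool]:
    """Return None when the page is not suspected (the server is not called),
    otherwise whether the page is judged to be an induced collection page."""
    # S802-S804: preliminary screening on the host program side.
    if not prescreen(page_content, user_input):
        return None
    # S806-S812: only suspected pages are sent to the server for scoring.
    probability = score_on_server(page_content, user_input)
    # S814: the host program decides from the returned target probability.
    return probability >= threshold
```

Returning `None` for unsuspected pages keeps "not checked by the server" distinct from "checked and found benign", which is the distinction that lets the screening step reduce the load on the server.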
The embodiment of the specification provides an application program detection method. When the host program detects that a target application program in the host program runs, the host program obtains, based on a preset feature extraction algorithm, page features corresponding to the page content of a preset page and input features corresponding to the input information of a user on the preset page; the host program determines, based on a preset feature matching algorithm, the page features, and the input features, whether the preset page is a suspected induced collection page; when the preset page is determined to be a suspected induced collection page, the host program sends the page content of the preset page and the input information of the user to a server; the server performs word segmentation processing on the page content of the preset page and the input information of the user to obtain the word segmentation information contained in the page content of the preset page and the word segmentation information contained in the input information of the user; the server inputs this word segmentation information into a pre-trained page recognition model to obtain a target probability that the preset page is an induced collection page that maliciously infringes on the privacy information of the user; the server sends the target probability to the host program; the host program determines, based on the target probability, whether the preset page of the target application program is an induced collection page that maliciously infringes on the privacy information of the user; the server also determines, based on the target probability, whether the preset page is such an induced collection page; and, when the preset page is determined to be an induced collection page, the server sends alarm information indicating that the preset page is an induced collection page to the preset management device.
Therefore, the host program can preliminarily screen whether the preset page is a suspected induced collection page, which can improve the accuracy and efficiency with which the server determines whether the preset page is an induced collection page that maliciously infringes on the user privacy information. Meanwhile, the server determines the target probability based on a pre-trained page recognition model, which can also improve the detection accuracy and the detection efficiency of the application program.
Example eight
Based on the same idea as the application detection method provided above, the embodiment of the present specification further provides an application detection apparatus, as shown in fig. 10.
The detection device of the application program comprises: an information obtaining module 1001, a first processing module 1002, a probability obtaining module 1003 and a page detecting module 1004, wherein:
the information acquisition module 1001 is configured to, when it is detected that a target application program in a host program runs, acquire page content of a preset page opened when the target application program runs and input information of a user on the preset page, where the target application program is an applet loaded in the host program;
a first processing module 1002, configured to perform word segmentation on the page content of the preset page and the input information of the user, so as to obtain word segmentation information included in the page content of the preset page and word segmentation information included in the input information of the user;
a probability obtaining module 1003, configured to input the word segmentation information contained in the page content of the preset page and the word segmentation information contained in the input information of the user into a pre-trained page recognition model, so as to obtain a target probability that the preset page is an induced collection page in which malicious invasion exists on the privacy information of the user, where the page recognition model is a model obtained by training based on a preset page content sample and a preset user input information sample and used for determining the probability of the induced collection page;
the page detection module 1004 is configured to determine, based on the target probability, whether a preset page of the target application is an induced collection page in which malicious invasion exists on user privacy information, so as to protect the user privacy information and prevent the target application from maliciously invading the user privacy information.
In an embodiment of the present specification, the page recognition model includes a named entity recognition model, and the named entity recognition model is constructed based on a BERT model.
In an embodiment of this specification, the apparatus further includes:
the sample acquisition module is used for acquiring the preset page content sample and an input information sample of a preset user in a preset time period, wherein the page content sample comprises a page content sample of a historical induced acquisition page and a page content sample of a historical non-induced acquisition page, and the input information sample is input information of the user in a page to which the preset page content belongs;
the second processing module is used for performing word segmentation processing on the page content sample and the input information sample to obtain word segmentation information contained in the page content sample and word segmentation information contained in the input information sample;
and the model training module is used for training a page recognition model comprising the named entity recognition model based on the word segmentation information contained in the page content sample and the word segmentation information contained in the input information sample to obtain the trained page recognition model.
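The three modules above (sample acquisition, word segmentation, model training) can be illustrated with the sketch below. It deliberately swaps in a small multinomial Naive Bayes classifier with add-one smoothing in place of the BERT-based named entity recognition model of the specification, purely so that the example stays self-contained; the sample texts, the `segment` tokenizer, and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Set, Tuple

# (page content sample, input information sample, label); label 1 = induced.
Sample = Tuple[str, str, int]
Model = Tuple[Dict[int, Counter], Counter, Set[str]]


def segment(text: str) -> List[str]:
    """Toy word segmentation: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(samples: List[Sample]) -> Model:
    """Count tokens per label; assumes both labels occur in the samples."""
    counts: Dict[int, Counter] = {0: Counter(), 1: Counter()}
    docs: Counter = Counter()
    for page, user_input, label in samples:
        counts[label].update(segment(page) + segment(user_input))
        docs[label] += 1
    vocab = set(counts[0]) | set(counts[1])
    return counts, docs, vocab


def predict_probability(model: Model, page: str, user_input: str) -> float:
    """Probability that the page is induced, from smoothed log odds."""
    counts, docs, vocab = model
    tokens = [t for t in segment(page) + segment(user_input) if t in vocab]
    log_odds = math.log(docs[1] / docs[0])
    for label, sign in ((1, 1.0), (0, -1.0)):
        total = sum(counts[label].values()) + len(vocab)
        for tok in tokens:
            log_odds += sign * math.log((counts[label][tok] + 1) / total)
    return 1.0 / (1.0 + math.exp(-log_odds))


samples: List[Sample] = [
    ("enter your password to claim prize", "123456", 1),
    ("verify your bank card password", "6222", 1),
    ("weather forecast for today", "hangzhou", 0),
    ("search for nearby restaurants", "noodles", 0),
]
model = train(samples)
```

Historical induced and non-induced pages appear as labels 1 and 0, matching the two kinds of page content sample named for the sample acquisition module.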
In this embodiment of the present specification, the first processing module 1002 includes:
the feature extraction unit is used for respectively acquiring page features corresponding to the page content of the preset page and input features corresponding to the input information of the user based on a preset feature extraction algorithm;
the page determining unit is used for determining whether the preset page is a suspected induced acquisition page or not based on a preset feature matching algorithm, the page features and the input features;
and the word segmentation processing unit is used for performing word segmentation processing on the preset page content and the input information of the user under the condition that the preset page is determined to be a suspected induced collection page, so as to obtain word segmentation information contained in the preset page content and word segmentation information contained in the input information of the user.
In an embodiment of this specification, the page determining unit is configured to:
performing induction matching detection on the page features based on a preset induction keyword feature matching rule to obtain a first detection result;
performing sensitive information matching detection on the input features based on a preset sensitive information feature matching algorithm to obtain a second detection result;
and determining whether the preset page is a suspected induced acquisition page or not based on the first detection result and the second detection result.
In an embodiment of this specification, the page determining unit is configured to:
and determining the matching rate of the page features and preset induction keyword features based on a preset regular matching rule, and determining the first detection result based on the matching rate.
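The two detection results and the matching rate described above can be sketched as follows. The regular expressions, the minimum matching rate, and the rule that both results must be positive are assumptions made for illustration; the specification does not state the patterns or how the first and second detection results are combined.

```python
import re
from typing import Iterable

# Hypothetical induction keyword patterns (first detection result).
INDUCTION_PATTERNS = [
    re.compile(r"enter your (password|pin)", re.IGNORECASE),
    re.compile(r"verify your (identity|bank card)", re.IGNORECASE),
    re.compile(r"claim your (prize|reward)", re.IGNORECASE),
]
# Hypothetical sensitive information patterns (second detection result).
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{16,19}\b"),     # looks like a bank card number
    re.compile(r"\b\d{17}[\dXx]\b"),  # looks like an 18-character ID number
]


def matching_rate(page_features: Iterable[str]) -> float:
    """Fraction of induction patterns matched by at least one page feature."""
    features = list(page_features)
    hits = sum(1 for pattern in INDUCTION_PATTERNS
               if any(pattern.search(f) for f in features))
    return hits / len(INDUCTION_PATTERNS)


def first_detection_result(page_features: Iterable[str],
                           min_rate: float = 0.3) -> bool:
    """Induction matching detection based on the matching rate."""
    return matching_rate(page_features) >= min_rate


def second_detection_result(input_features: Iterable[str]) -> bool:
    """Sensitive information matching detection on the user input."""
    features = list(input_features)
    return any(p.search(f) for p in SENSITIVE_PATTERNS for f in features)


def is_suspected_induced_page(page_features: Iterable[str],
                              input_features: Iterable[str]) -> bool:
    """Assumption: the page is suspected only if both results are positive."""
    return (first_detection_result(page_features)
            and second_detection_result(input_features))
```

Requiring both results keeps pages with inducing wording but no sensitive input, and pages with sensitive input but no inducing wording, out of the suspected set; a deployment could equally weight the two results differently.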
In an embodiment of this specification, the apparatus further includes:
and the alarm module is used for outputting preset alarm information under the condition that the preset page is determined to be the induced acquisition page.
The embodiment of the specification provides an application program detection device. When it is detected that a target application program in a host program runs, the device obtains page content of a preset page opened when the target application program runs and input information of a user on the preset page; performs word segmentation processing on the page content of the preset page and the input information of the user to obtain the word segmentation information contained in the page content of the preset page and the word segmentation information contained in the input information of the user; inputs this word segmentation information into a pre-trained page recognition model to obtain a target probability that the preset page is an induced collection page that maliciously infringes on user privacy information; and determines, based on the target probability, whether the preset page of the target application program is such an induced collection page, so as to protect the user privacy information and prevent the target application program from maliciously infringing on it. Therefore, because the target probability is determined based on the pre-trained page recognition model, the detection accuracy and the detection efficiency of the preset page can be improved; that is, whether the preset page of the target application program is an induced collection page that maliciously infringes on the user privacy information can be determined in a timely and accurate manner according to the target probability, which improves the detection accuracy and the detection efficiency of the application program.
Example nine
Based on the same idea, embodiments of the present specification further provide an apparatus for detecting an application program, as shown in fig. 11.
The detection device of the application program comprises: a feature extraction module 1101, a page determination module 1102, an information sending module 1103 and a page detection module 1104, wherein:
the feature extraction module 1101 is configured to, when it is detected that a target application program in the host program runs, respectively obtain, based on a preset feature extraction algorithm, a page feature corresponding to page content of the preset page and an input feature corresponding to input information of a user in the preset page, where the target application program is an applet loaded in the host program;
a page determining module 1102, configured to determine whether the preset page is a suspected induced collection page based on a preset feature matching algorithm, the page feature, and the input feature;
an information sending module 1103, configured to send, to a server, page content of a preset page and input information of a user when it is determined that the preset page is a suspected induced acquisition page, and receive a target probability that the preset page sent by the server is an induced acquisition page in which malicious infringement exists on user privacy information, where the target probability is obtained by the server based on the page content of the preset page, the input information of the user, and a pre-trained page recognition model, and the page recognition model is a model trained based on a predetermined page content sample and a predetermined user input information sample and used for determining a probability of an induced acquisition page;
and the page detection module 1104 is configured to determine whether the preset page of the target application is an induced collection page in which malicious invasion exists on the user privacy information based on the target probability, so as to protect the user privacy information and prevent the target application from maliciously invading the user privacy information.
In an embodiment of this specification, the apparatus further includes:
and the alarm module is used for outputting preset alarm information under the condition that the preset page is determined to be the induced acquisition page.
The embodiment of the specification provides a detection device of an application program. When it is detected that a target application program in a host program runs, the device obtains, based on a preset feature extraction algorithm, page features corresponding to the page content of a preset page and input features corresponding to the input information of a user on the preset page, the target application program being an applet loaded in the host program; determines, based on a preset feature matching algorithm, the page features, and the input features, whether the preset page is a suspected induced collection page; when the preset page is determined to be a suspected induced collection page, sends the page content of the preset page and the input information of the user to a server, and receives from the server a target probability that the preset page is an induced collection page that maliciously infringes on user privacy information, the target probability being obtained by the server based on the page content of the preset page, the input information of the user, and a pre-trained page recognition model, and the page recognition model being a model trained based on a preset page content sample and a preset user input information sample and used for determining the probability of an induced collection page; and determines, based on the target probability, whether the preset page of the target application program is an induced collection page that maliciously infringes on user privacy information, so as to protect the user privacy information and prevent the target application program from maliciously infringing on it.
In this way, the host program can process the page content of the preset page and the input information of the user on the preset page based on the preset feature extraction algorithm and the preset feature matching algorithm, determine whether the preset page is a suspected induced acquisition page, send the page content of the preset page and the input information of the user to the server under the condition that the preset page is determined to be the suspected induced acquisition page, and determine whether the preset page is the induced acquisition page which maliciously invades the privacy information of the user according to the target probability determined by the server, so that the data processing pressure of the server can be reduced, the detection efficiency and the detection accuracy of the preset page can be improved, and the detection accuracy and the detection efficiency of the corresponding application program can be improved.
Example ten
Based on the same idea, the embodiments of the present specification further provide a detection apparatus for an application program, as shown in fig. 12.
The detection device of the application program comprises: an information receiving module 1201, a word segmentation processing module 1202, a probability obtaining module 1203, and a probability sending module 1204, wherein:
the information receiving module 1201 is configured to obtain page content of a preset page of a target application program provided by a host program and input information of a user on the preset page, where the target application program is an applet loaded in the host program;
a word segmentation processing module 1202, configured to perform word segmentation processing on the page content of the preset page and the input information of the user to obtain word segmentation information included in the page content of the preset page and word segmentation information included in the input information of the user;
a probability obtaining module 1203, configured to input the word segmentation information included in the page content of the preset page and the word segmentation information included in the input information of the user into a pre-trained page recognition model, so as to obtain a target probability that the preset page is an induced collection page that maliciously infringes user privacy information, where the page recognition model is a model obtained by training based on a preset page content sample and a preset user input information sample and used for determining the probability of an induced collection page;
a probability sending module 1204, configured to send the target probability to the host program, so that the host program determines, based on the target probability, whether the preset page is an induced collection page that maliciously invades user privacy information, so as to protect the user privacy information and prevent the target application program from maliciously invading the user privacy information.
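The four modules listed above form a simple pipeline. The sketch below wires them together under stated assumptions: `DetectionRequest` and `ServerDetector` are hypothetical names, and the segmenter, model and transport are injected as callables because the specification does not prescribe their implementations.

```python
from dataclasses import dataclass
from typing import Callable, List

Tokens = List[str]


@dataclass
class DetectionRequest:
    """What the information receiving module (1201) obtains from the host."""
    page_content: str
    user_input: str


class ServerDetector:
    def __init__(
        self,
        segment: Callable[[str], Tokens],
        model: Callable[[Tokens, Tokens], float],
        send: Callable[[float], None],
    ) -> None:
        self.segment = segment  # word segmentation processing module (1202)
        self.model = model      # probability obtaining module (1203)
        self.send = send        # probability sending module (1204)

    def handle(self, request: DetectionRequest) -> float:
        page_tokens = self.segment(request.page_content)
        input_tokens = self.segment(request.user_input)
        probability = self.model(page_tokens, input_tokens)
        self.send(probability)  # the host program makes the final decision
        return probability
```

Keeping the three stages as injected callables lets the same wiring be exercised with a stub model before a trained page recognition model is available.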
In this embodiment of the present specification, the information receiving module 1201 includes:
the information receiving unit is used for acquiring encrypted information provided by the host program, wherein the encrypted information is information obtained by encrypting the page content of the preset page and the input information of the user on the preset page by the host program based on a preset encryption algorithm;
and the decryption processing unit is used for decrypting the encrypted information based on a preset decryption algorithm to obtain the page content of the preset page and the input information of the user on the preset page.
In an embodiment of this specification, the apparatus further includes:
the page detection module is used for determining whether the preset page is an induced acquisition page which has malicious infringement on the user privacy information or not based on the target probability;
and the alarm module is used for sending alarm information that the preset page is an induction acquisition page to a preset management device under the condition that the preset page is determined to be the induction acquisition page.
In an embodiment of this specification, the apparatus further includes:
the link acquisition module is used for acquiring link information of the preset page under the condition that the preset page is determined to be an induced acquisition page;
and the link sending module is used for sending the link information of the preset page to a preset induced page processing platform so that the preset induced page processing platform processes the target application program to which the preset page belongs.
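The alarm and link-forwarding modules can be illustrated with a short sketch. Everything named here is hypothetical: `Outbox` stands in for the preset management device and the preset induced page processing platform, and the 0.8 threshold is an assumed value.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Outbox:
    """In-memory stand-in for the management device and processing platform."""
    alerts: List[Dict[str, object]] = field(default_factory=list)
    links: List[str] = field(default_factory=list)


def report_if_induced(
    probability: float,
    page_link: str,
    applet_id: str,
    outbox: Outbox,
    threshold: float = 0.8,  # assumed; the specification gives no value
) -> bool:
    """Raise an alarm and forward the page link when the page is judged induced."""
    if probability < threshold:
        return False
    outbox.alerts.append(
        {
            "applet_id": applet_id,
            "probability": probability,
            "message": "preset page is an induced acquisition page",
        }
    )
    # The platform uses the link to act on the applet that owns the page.
    outbox.links.append(page_link)
    return True
```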
The embodiment of the specification provides a detection device for an application program. The device obtains page content of a preset page of a target application program provided by a host program and input information of a user on the preset page, where the target application program is an applet loaded in the host program. It performs word segmentation processing on the page content of the preset page and the input information of the user to obtain the word segmentation information contained in each, and inputs that word segmentation information into a pre-trained page recognition model to obtain a target probability that the preset page is an induced collection page that maliciously infringes user privacy information. The page recognition model is trained on a preset page content sample and a preset user input information sample and is used to judge the probability of an induced collection page. The device sends the target probability to the host program so that the host program determines, based on the target probability, whether the preset page is an induced collection page that maliciously infringes user privacy information, so as to protect the user privacy information and prevent the target application program from maliciously infringing it.
Therefore, the target probability that the preset page is the induced collection page which has malicious invasion on the user privacy information is determined based on the pre-trained page recognition model, so that the detection accuracy and the detection efficiency of the preset page can be improved, and meanwhile, according to the target probability, whether the preset page of the target application program is the induced collection page which has malicious invasion on the user privacy information can be accurately determined, so that the detection accuracy and the detection efficiency of the application program can be improved.
Example eleven
Based on the same idea, embodiments of the present specification further provide a detection device for an application program, as shown in fig. 13.
The detection device of the application program may be the terminal device or the server provided in the above embodiments.
The detection device of the application program may vary considerably depending on configuration or performance, and may include one or more processors 1301 and a memory 1302, where the memory 1302 may store one or more application programs or data. The memory 1302 may be transient storage or persistent storage. An application program stored in the memory 1302 may include one or more modules (not shown), each of which may include a series of computer-executable instructions for the detection device of the application program. Further, the processor 1301 may be configured to communicate with the memory 1302 and to execute, on the detection device of the application program, the series of computer-executable instructions in the memory 1302. The detection device of the application program may also include one or more power supplies 1303, one or more wired or wireless network interfaces 1304, one or more input-output interfaces 1305, and one or more keyboards 1306.
In particular, in this embodiment, the detection device of the application includes a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs may include one or more modules, and each module may include a series of computer-executable instructions for the detection device of the application, and the one or more programs configured to be executed by the one or more processors include computer-executable instructions for:
when detecting that a target application program in a host program runs, acquiring page content of a preset page opened when the target application program runs and input information of a user on the preset page, wherein the target application program is an applet loaded in the host program;
performing word segmentation processing on the page content of the preset page and the input information of the user to obtain word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user;
inputting word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user into a pre-trained page recognition model to obtain a target probability that the preset page is an induced acquisition page with malicious invasion on the privacy information of the user, wherein the page recognition model is obtained by training based on a preset page content sample and a preset user input information sample and is used for judging the probability of the induced acquisition page;
and determining whether a preset page of the target application program is an induced acquisition page which has malicious infringement on the user privacy information or not based on the target probability so as to protect the user privacy information and prevent the target application program from maliciously infringing the user privacy information.
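The word segmentation step in the instructions above can be sketched with the standard library alone. This is an illustrative stand-in, not the segmenter the specification has in mind: `segment_text` and `build_model_input` are hypothetical names, and the rule used (each CJK ideograph becomes its own token, Latin letters and digits are grouped into runs) is an assumption. A production system would more likely use a dictionary-based segmenter.

```python
import re
from typing import Dict, List

# One CJK ideograph, or a run of Latin letters, or a run of digits.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z]+|\d+")


def segment_text(text: str) -> List[str]:
    """Split text into lower-cased tokens; punctuation and spaces are dropped."""
    return [token.lower() for token in _TOKEN.findall(text)]


def build_model_input(page_content: str, user_input: str) -> Dict[str, List[str]]:
    """Segment both sources separately so the model can tell them apart."""
    return {
        "page_tokens": segment_text(page_content),
        "input_tokens": segment_text(user_input),
    }
```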
Optionally, the page recognition model comprises a named entity recognition model, and the named entity recognition model is constructed based on a BERT model.
Optionally, before inputting the word segmentation information included in the page content of the preset page and the word segmentation information included in the input information of the user into a pre-trained page recognition model to obtain a target probability that the preset page is an induced collection page with malicious infringement on the privacy information of the user, the method further includes:
within a preset time period, obtaining the preset page content sample and an input information sample of a preset user, wherein the page content sample comprises a page content sample of a historical induced acquisition page and a page content sample of a historical non-induced acquisition page, and the input information sample is input information of the user on a page to which the preset page content belongs;
performing word segmentation processing on the page content sample and the input information sample to obtain word segmentation information contained in the page content sample and word segmentation information contained in the input information sample;
and training a page recognition model comprising the named entity recognition model based on the word segmentation information contained in the page content sample and the word segmentation information contained in the input information sample to obtain the trained page recognition model.
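The training flow above (collect labelled samples, segment them, fit a model that outputs a probability) can be shown end to end with a deliberately simple classifier. Note the substitution: the specification names a BERT-based named entity recognition model, whereas the sketch below uses a multinomial naive Bayes classifier with Laplace smoothing purely so the example stays self-contained. The names `train` and `predict_probability` and the sample format are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# (tokens from page content plus user input, label), where 1 = induced page.
Sample = Tuple[List[str], int]


def train(samples: List[Sample]) -> Dict[str, object]:
    """Count token and document frequencies per class."""
    counts = {0: Counter(), 1: Counter()}
    docs = Counter()
    for tokens, label in samples:
        counts[label].update(tokens)
        docs[label] += 1
    vocab = set(counts[0]) | set(counts[1])
    return {"counts": counts, "docs": docs, "vocab": vocab}


def predict_probability(model: Dict[str, object], tokens: List[str]) -> float:
    """Return P(induced page | tokens) under the naive Bayes stand-in."""
    counts, docs = model["counts"], model["docs"]
    vocab_size = max(len(model["vocab"]), 1)
    total_docs = sum(docs.values())
    log_scores = {}
    for label in (0, 1):
        total_tokens = sum(counts[label].values())
        # Smoothed class prior, then smoothed per-token likelihoods.
        score = math.log((docs[label] + 1) / (total_docs + 2))
        for token in tokens:
            score += math.log((counts[label][token] + 1) / (total_tokens + vocab_size))
        log_scores[label] = score
    # Normalise the two log scores into a probability for class 1.
    peak = max(log_scores.values())
    exp = {label: math.exp(s - peak) for label, s in log_scores.items()}
    return exp[1] / (exp[0] + exp[1])
```

The historical induced and non-induced page samples map onto labels 1 and 0; swapping in a stronger model changes only `train` and `predict_probability`, not the surrounding flow.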
Optionally, the performing word segmentation processing on the page content of the preset page and the input information of the user to obtain word segmentation information included in the page content of the preset page and word segmentation information included in the input information of the user includes:
respectively acquiring page features corresponding to the page content of the preset page and input features corresponding to the input information of the user based on a preset feature extraction algorithm;
determining whether the preset page is a suspected induced acquisition page or not based on a preset feature matching algorithm, the page features and the input features;
and under the condition that the preset page is determined to be a suspected induced collection page, performing word segmentation processing on the preset page content and the input information of the user to obtain word segmentation information contained in the preset page content and word segmentation information contained in the input information of the user.
Optionally, the determining whether the preset page is a suspected induced collection page based on a preset feature matching algorithm, the page feature and the input feature includes:
performing induction matching detection on the page features based on a preset induction keyword feature matching rule to obtain a first detection result;
performing sensitive information matching detection on the input features based on a preset sensitive information feature matching algorithm to obtain a second detection result;
and determining whether the preset page is a suspected induced acquisition page or not based on the first detection result and the second detection result.
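The second detection result and the way the two results are combined can be sketched as follows. The regular expressions (a mainland-China mobile number, an 18-character resident ID number, a 16 to 19 digit bank card number) and the rule that both results must be positive are assumptions made for illustration; the specification leaves the sensitive-information patterns and the combination rule open.

```python
import re
from typing import List

# Hypothetical sensitive-information patterns.
SENSITIVE_PATTERNS = {
    "mobile": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    "id_card": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
    "bank_card": re.compile(r"(?<!\d)\d{16,19}(?![\dXx])"),
}


def detect_sensitive(user_input: str) -> List[str]:
    """Second detection result: names of the sensitive patterns found."""
    return sorted(
        name for name, pattern in SENSITIVE_PATTERNS.items()
        if pattern.search(user_input)
    )


def is_suspected(first_result: bool, second_result: List[str]) -> bool:
    """Assumed rule: inducement wording AND sensitive input must both be present."""
    return first_result and bool(second_result)
```

Requiring both signals keeps ordinary forms (sensitive input, no inducement wording) and ordinary promotions (inducement wording, no sensitive input) from being escalated.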
Optionally, the performing induction matching detection on the page features based on a preset induction keyword feature matching rule to obtain a first detection result, including:
and determining the matching rate of the page features and preset induction keyword features based on a preset regular matching rule, and determining the first detection result based on the matching rate.
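One way to read "matching rate" is the fraction of preset inducement keyword patterns that occur in the page features. The sketch below adopts that reading as an assumption; the function names, the patterns in the usage note and the 0.5 cut-off are likewise hypothetical.

```python
import re
from typing import Iterable, List


def matching_rate(page_features: List[str], keyword_patterns: Iterable[str]) -> float:
    """Fraction of keyword patterns that match somewhere in the page features."""
    compiled = [re.compile(p, re.IGNORECASE) for p in keyword_patterns]
    if not compiled:
        return 0.0
    text = " ".join(page_features)
    hits = sum(1 for pattern in compiled if pattern.search(text))
    return hits / len(compiled)


def first_detection_result(rate: float, min_rate: float = 0.5) -> bool:
    """Assumed rule: the page looks inducing once the rate reaches min_rate."""
    return rate >= min_rate
```

For example, with the patterns `free\s+gift`, `claim\s+.*prize`, `verify\s+identity` and `limited\s+time`, a page whose features are "Claim your prize" and "FREE gift inside" matches two of the four patterns.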
Optionally, after determining whether the preset page of the target application is an induced collection page that has a malicious infringement on the user privacy information based on the target probability, the method further includes:
and outputting preset alarm information under the condition that the preset page is determined to be an induced acquisition page.
The embodiment of the specification provides detection equipment for an application program, which is characterized in that when a target application program in a host program is detected to run, page content of a preset page opened when the target application program runs and input information of a user in the preset page are obtained, word segmentation processing is performed on the page content of the preset page and the input information of the user to obtain word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user, word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user are input into a page recognition model trained in advance to obtain a target probability that the preset page is an induced collection page with malicious infringement on the privacy information of the user, and based on the target probability, whether the preset page of the target application program is an induced collection page with malicious infringement on the privacy information of the user is determined, so that the privacy information of the user is protected, and the target application program is prevented from maliciously infringement on the privacy information of the user. 
Therefore, the target probability that the preset page is the induced acquisition page with malicious invasion to the user privacy information is determined based on the pre-trained page recognition model, so that the detection accuracy and the detection efficiency of the preset page can be improved, namely, whether the preset page of the target application program is the induced acquisition page with malicious invasion to the user privacy information or not can be timely and accurately determined according to the target probability, and the detection accuracy and the detection efficiency of the application program are improved.
Example twelve
Based on the same idea, embodiments of the present specification further provide a detection device for an application program, as shown in fig. 14.
The detection device of the application program may be the terminal device where the host program provided in the above embodiment is located.
The detection device of the application program may vary considerably depending on configuration or performance, and may include one or more processors 1401 and a memory 1402, where the memory 1402 may store one or more application programs or data. The memory 1402 may be transient storage or persistent storage. An application program stored in the memory 1402 may include one or more modules (not shown), each of which may include a series of computer-executable instructions for the detection device of the application program. Further, the processor 1401 may be configured to communicate with the memory 1402 and to execute, on the detection device of the application program, the series of computer-executable instructions in the memory 1402. The detection device of the application program may also include one or more power supplies 1403, one or more wired or wireless network interfaces 1404, one or more input-output interfaces 1405, and one or more keyboards 1406.
In particular, in this embodiment, the detection device of the application includes a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs may include one or more modules, and each module may include a series of computer-executable instructions for the detection device of the application, and the one or more programs configured to be executed by the one or more processors include computer-executable instructions for:
when detecting that a target application program in the host program runs, respectively acquiring, based on a preset feature extraction algorithm, page features corresponding to the page content of a preset page and input features corresponding to the input information of the user on the preset page, wherein the target application program is an applet loaded in the host program;
determining whether the preset page is a suspected induced acquisition page or not based on a preset feature matching algorithm, the page features and the input features;
under the condition that the preset page is determined to be a suspected induced acquisition page, sending the page content of the preset page and the input information of the user to a server, and receiving a target probability that the preset page sent by the server is an induced acquisition page which has malicious infringement on user privacy information, wherein the target probability is obtained by the server based on the page content of the preset page, the input information of the user and a pre-trained page recognition model, and the page recognition model is a model which is obtained by training based on a preset page content sample and a preset user input information sample and is used for judging the probability of the induced acquisition page;
and determining whether a preset page of the target application program is an induced acquisition page which has malicious infringement on the user privacy information or not based on the target probability so as to protect the user privacy information and prevent the target application program from maliciously infringing the user privacy information.
Optionally, after determining whether the preset page of the target application is an induced collection page that has a malicious infringement on the user privacy information based on the target probability, the method further includes:
and outputting preset alarm information under the condition that the preset page is determined to be an induced acquisition page.
The embodiment of the specification provides detection equipment for an application program. When a target application program in a host program is detected to be running, the equipment obtains, based on a preset feature extraction algorithm, page features corresponding to the page content of a preset page and input features corresponding to the input information of the user on the preset page, where the target application program is an applet loaded in the host program. Based on a preset feature matching algorithm, the page features and the input features, the equipment determines whether the preset page is a suspected induced acquisition page. If it is, the equipment sends the page content of the preset page and the input information of the user to a server, and receives from the server a target probability that the preset page is an induced acquisition page that maliciously infringes user privacy information. The target probability is obtained by the server based on the page content of the preset page, the input information of the user and a pre-trained page recognition model, where the page recognition model is trained on a preset page content sample and a preset user input information sample and is used to judge the probability of an induced acquisition page. Based on the target probability, the equipment determines whether the preset page of the target application program is an induced acquisition page that maliciously infringes user privacy information, so as to protect the user privacy information and prevent the target application program from maliciously infringing it.
In this way, the host program can process the page content of the preset page and the input information of the user on the preset page based on the preset feature extraction algorithm and the preset feature matching algorithm, determine whether the preset page is a suspected induced acquisition page, send the page content of the preset page and the input information of the user to the server under the condition that the preset page is determined to be the suspected induced acquisition page, and determine whether the preset page is an induced acquisition page which has malicious infringement on the privacy information of the user according to the target probability determined by the server, so that the data processing pressure of the server can be reduced, the detection efficiency and the detection accuracy of the preset page can be improved, and the detection accuracy and the detection efficiency of the corresponding application program can be improved.
Example thirteen
Based on the same idea, embodiments of the present specification further provide a detection device for an application program, as shown in fig. 15.
The detection device of the application program may be the server provided in the above embodiment.
The detection device of the application program may vary considerably depending on configuration or performance, and may include one or more processors 1501 and a memory 1502, where the memory 1502 may store one or more application programs or data. The memory 1502 may be transient storage or persistent storage. An application program stored in the memory 1502 may include one or more modules (not shown), each of which may include a series of computer-executable instructions for the detection device of the application program. Further, the processor 1501 may be configured to communicate with the memory 1502 and to execute, on the detection device of the application program, the series of computer-executable instructions in the memory 1502. The detection device of the application program may also include one or more power supplies 1503, one or more wired or wireless network interfaces 1504, one or more input-output interfaces 1505, and one or more keyboards 1506.
In particular, in this embodiment, the detection device of the application includes a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs may include one or more modules, and each module may include a series of computer-executable instructions for the detection device of the application, and the one or more programs configured to be executed by the one or more processors include computer-executable instructions for:
acquiring page content of a preset page of a target application program provided by a host program and input information of a user on the preset page, wherein the target application program is an applet loaded in the host program;
performing word segmentation processing on the page content of the preset page and the input information of the user to obtain word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user;
inputting word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user into a pre-trained page recognition model to obtain a target probability that the preset page is an induced acquisition page with malicious invasion on the privacy information of the user, wherein the page recognition model is obtained by training based on a preset page content sample and a preset user input information sample and is used for judging the probability of the induced acquisition page;
and sending the target probability to the host program so that the host program determines whether the preset page is an induced acquisition page which maliciously invades the user privacy information or not based on the target probability, so as to protect the user privacy information and prevent the target application program from maliciously invading the user privacy information.
Optionally, the acquiring the page content of the preset page of the target application program provided by the host program and the input information of the user on the preset page includes:
acquiring encrypted information provided by the host program, wherein the encrypted information is information obtained by encrypting the page content of the preset page and the input information of the user on the preset page by the host program based on a preset encryption algorithm;
and decrypting the encrypted information based on a preset decryption algorithm to obtain the page content of the preset page and the input information of the user on the preset page.
Optionally, after the word segmentation information included in the page content of the preset page and the word segmentation information included in the input information of the user are input into the pre-trained page recognition model to obtain a target probability that the preset page is an induced acquisition page that maliciously infringes user privacy information, the method further includes:
determining whether the preset page is an induced acquisition page which has malicious infringement on the user privacy information or not based on the target probability;
and sending alarm information that the preset page is an induced acquisition page to a preset management device under the condition that the preset page is determined to be the induced acquisition page.
Optionally, the method further comprises:
acquiring link information of the preset page under the condition that the preset page is determined to be an induced acquisition page;
and sending the link information of the preset page to a preset induced page processing platform so that the preset induced page processing platform processes a target application program to which the preset page belongs.
The embodiment of the specification provides detection equipment for an application program. The equipment obtains page content of a preset page of a target application program provided by a host program and input information of a user on the preset page, where the target application program is an applet loaded in the host program. It performs word segmentation processing on the page content of the preset page and the input information of the user to obtain the word segmentation information contained in each, and inputs that word segmentation information into a pre-trained page recognition model to obtain a target probability that the preset page is an induced acquisition page that maliciously infringes user privacy information. The page recognition model is trained on a preset page content sample and a preset user input information sample and is used to judge the probability of an induced acquisition page. The equipment sends the target probability to the host program so that the host program determines, based on the target probability, whether the preset page is an induced acquisition page that maliciously infringes user privacy information, so as to protect the user privacy information and prevent the target application program from maliciously infringing it.
Therefore, the target probability that the preset page is the induced collection page which has malicious invasion on the user privacy information is determined based on the pre-trained page recognition model, so that the detection accuracy and the detection efficiency of the preset page can be improved, and meanwhile, according to the target probability, whether the preset page of the target application program is the induced collection page which has malicious invasion on the user privacy information can be accurately determined, so that the detection accuracy and the detection efficiency of the application program can be improved.
Example fourteen
The embodiments of the present disclosure further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements the processes of the detection method embodiment of the application program, and can achieve the same technical effects, and in order to avoid repetition, the descriptions are omitted here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The embodiment of the specification provides a computer-readable storage medium, which is used for obtaining page content of a preset page opened when a target application program in a host program runs and input information of a user in the preset page when the target application program in the host program is detected to run, performing word segmentation processing on the page content of the preset page and the input information of the user to obtain word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user, inputting word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user into a pre-trained page recognition model to obtain a target probability that the preset page is an induced acquisition page with malicious infringement on the privacy information of the user, and determining whether the preset page of the target application program is an induced acquisition page with malicious infringement on the privacy information of the user based on the target probability to protect the privacy information of the user and prevent the target application program from maliciously infringement on the privacy information of the user. Therefore, the target probability that the preset page is the induced acquisition page with malicious invasion to the user privacy information is determined based on the pre-trained page recognition model, so that the detection accuracy and the detection efficiency of the preset page can be improved, namely, whether the preset page of the target application program is the induced acquisition page with malicious invasion to the user privacy information or not can be timely and accurately determined according to the target probability, and the detection accuracy and the detection efficiency of the application program are improved.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
In the 90 s of the 20 th century, improvements in a technology could clearly distinguish between improvements in hardware (e.g., improvements in circuit structures such as diodes, transistors, switches, etc.) and improvements in software (improvements in process flow). However, as technology advances, many of today's process flow improvements have been seen as direct improvements in hardware circuit architecture. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into the hardware circuit. Thus, it cannot be said that an improvement in the process flow cannot be realized by hardware physical modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose Logic functions are determined by programming the Device by a user. A digital system is "integrated" on a PLD by the designer's own programming without requiring the chip manufacturer to design and fabricate application-specific integrated circuit chips. Furthermore, nowadays, instead of manually manufacturing an Integrated Circuit chip, such Programming is often implemented by "logic compiler" software, which is similar to a software compiler used in program development, but the original code before compiling is also written in a specific Programming Language, which is called Hardware Description Language (HDL), and the HDL is not only one kind but many kinds, such as abll (Advanced boot Expression Language), AHDL (alternate hard Description Language), traffic, CUPL (computer universal Programming Language), HDCal (Java hard Description Language), lava, lola, HDL, PALASM, rhydl (Hardware Description Language), VHDL (Hardware Description Language), and vhul-Language, which is currently used most commonly. 
It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained simply by expressing the method flow in one of the hardware description languages described above and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer-readable program code, the same functionality can be achieved by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing the various functions may also be regarded as structures within the hardware component. The means for performing the various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and the units are described separately. Of course, when implementing one or more embodiments of the present description, the functions of the various units may be implemented in the same one or more pieces of software and/or hardware.
As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
Embodiments of the present description are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read-Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, Phase-change Random Access Memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises that element.
One or more embodiments of the specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner; for parts that are the same or similar among the embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiment is substantially similar to the method embodiment, it is described relatively briefly, and for the relevant points reference may be made to the corresponding parts of the description of the method embodiment.
The above description is only an example of the present disclosure, and is not intended to limit the present disclosure. Various modifications and alterations to this description will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the scope of the claims of the present specification.

Claims (17)

1. A method for detecting an application program, the method comprising:
when detecting that a target application program in a host program runs, acquiring page content of a preset page opened when the target application program runs and input information of a user on the preset page, wherein the target application program is an applet carried in the host program;
performing word segmentation processing on the page content of the preset page and the input information of the user to obtain word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user;
inputting the word segmentation information contained in the page content of the preset page and the word segmentation information contained in the input information of the user into a pre-trained page recognition model to obtain a target probability that the preset page is an induced acquisition page with malicious infringement on user privacy information, wherein the page recognition model is obtained by training based on a preset page content sample and a preset user input information sample and is used for determining the probability that a page is an induced acquisition page, and the page recognition model comprises a named entity recognition model constructed based on a BERT model;
determining whether a preset page of the target application program is an induced acquisition page which has malicious infringement on the user privacy information or not based on the target probability so as to protect the user privacy information and prevent the target application program from maliciously infringing the user privacy information;
the performing word segmentation processing on the page content of the preset page and the input information of the user to obtain word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user includes:
determining, based on a preset feature matching algorithm, page features, and input features, whether the preset page is a suspected induced acquisition page, and, if the preset page is determined to be a suspected induced acquisition page, performing word segmentation processing on the page content of the preset page and the input information of the user to obtain the word segmentation information contained in the page content of the preset page and the word segmentation information contained in the input information of the user, wherein the preset feature matching algorithm is used for determining whether the preset page is a suspected induced acquisition page based on matching results of a preset induction keyword feature matching rule and a preset sensitive information feature matching algorithm, the preset induction keyword feature matching rule is used for performing induction matching detection on the page features, and the preset sensitive information feature matching algorithm is used for performing sensitive information matching detection on the input features;
the page features are features corresponding to the page contents of the preset page and acquired based on a preset feature extraction algorithm, and the input features are input features corresponding to the input information of the user and acquired based on the preset feature extraction algorithm.
2. The method according to claim 1, before inputting word segmentation information included in page content of the preset page and word segmentation information included in the input information of the user into a pre-trained page recognition model to obtain a target probability that the preset page is an induced acquisition page in which malicious infringement exists on user privacy information, further comprising:
within a preset time period, acquiring the preset page content sample and an input information sample of a preset user, wherein the page content sample comprises a page content sample of a historical induced acquisition page and a page content sample of a historical non-induced acquisition page, and the input information sample is input information of the user on a page to which the preset page content belongs;
performing word segmentation processing on the page content sample and the input information sample to obtain word segmentation information contained in the page content sample and word segmentation information contained in the input information sample;
and training a page recognition model comprising the named entity recognition model based on the word segmentation information contained in the page content sample and the word segmentation information contained in the input information sample to obtain the trained page recognition model.
3. The method of claim 2, wherein determining whether the preset page is a suspected inducement capture page based on a preset feature matching algorithm, the page features, and the input features comprises:
performing induction matching detection on the page features based on the preset induction keyword feature matching rule to obtain a first detection result;
performing sensitive information matching detection on the input features based on the preset sensitive information feature matching algorithm to obtain a second detection result;
and determining whether the preset page is a suspected induced acquisition page or not based on the first detection result and the second detection result.
4. The method according to claim 3, wherein the performing induction matching detection on the page features based on a preset induction keyword feature matching rule to obtain a first detection result comprises:
and determining the matching rate of the page features and the preset induction keyword features based on a preset regular matching rule, and determining the first detection result based on the matching rate.
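The pre-filter recited in claims 3 and 4 can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not specified by the claims: the regular expressions in `INDUCEMENT_PATTERNS` and `SENSITIVE_PATTERNS`, the definition of the matching rate as the fraction of patterns that match, the `0.3` threshold, and the choice to require both detection results to be positive.

```python
import re

# Hypothetical rule sets; the claims do not disclose concrete keywords or patterns.
INDUCEMENT_PATTERNS = [
    re.compile(r"claim your (prize|reward)", re.IGNORECASE),
    re.compile(r"verify your identity", re.IGNORECASE),
    re.compile(r"enter your (password|card number)", re.IGNORECASE),
]
SENSITIVE_PATTERNS = [
    re.compile(r"\d{16,19}"),    # shaped like a bank card number
    re.compile(r"\d{17}[\dXx]"), # shaped like an 18-character ID number
]


def matching_rate(page_features, patterns):
    """Fraction of patterns matched by at least one page feature (assumed definition)."""
    if not patterns:
        return 0.0
    hits = sum(1 for p in patterns if any(p.search(f) for f in page_features))
    return hits / len(patterns)


def first_detection(page_features, threshold=0.3):
    """Induction matching detection on the page features (threshold is an assumption)."""
    return matching_rate(page_features, INDUCEMENT_PATTERNS) >= threshold


def second_detection(input_features):
    """Sensitive information matching detection on the user's input features."""
    return any(p.fullmatch(f) for f in input_features for p in SENSITIVE_PATTERNS)


def is_suspected_page(page_features, input_features):
    """Combine both results; requiring both to be positive is an assumed policy."""
    return first_detection(page_features) and second_detection(input_features)
```

Only pages for which `is_suspected_page` returns `True` would go on to word segmentation and the page recognition model, which keeps the more expensive model off pages that show no sign of inducement.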
5. The method of claim 1, further comprising, after the determining whether the preset page of the target application is an induced collection page having malicious infringement on user privacy information based on the target probability:
and outputting preset alarm information under the condition that the preset page is determined to be an induced acquisition page.
6. A method for detecting an application program, applied to a host program, the method comprising:
when detecting that a target application program in the host program runs, respectively acquiring, based on a preset feature extraction algorithm, page features corresponding to the page content of a preset page and input features corresponding to the input information of a user on the preset page, wherein the target application program is an applet carried in the host program;
determining whether the preset page is a suspected induced acquisition page or not based on a preset feature matching algorithm, the page features and the input features, wherein the preset feature matching algorithm is used for determining whether the preset page is the suspected induced acquisition page or not based on a matching result of a preset induced keyword feature matching rule and a preset sensitive information feature matching algorithm, the preset induced keyword feature matching rule is used for carrying out induced matching detection on the page features, and the preset sensitive information feature matching algorithm is used for carrying out sensitive information matching detection on the input features;
under the condition that the preset page is determined to be a suspected induced acquisition page, sending the page content of the preset page and the input information of the user to a server, and receiving a target probability that the preset page sent by the server is an induced acquisition page which has malicious infringement on user privacy information, wherein the target probability is obtained by the server based on the page content of the preset page, the input information of the user and a pre-trained page recognition model, the page recognition model is obtained by training based on a preset page content sample and a preset user input information sample and is used for judging the probability of the induced acquisition page, and the page recognition model comprises a named entity recognition model constructed based on a BERT model;
and determining whether a preset page of the target application program is an induced acquisition page which maliciously invades the user privacy information or not based on the target probability so as to protect the user privacy information and prevent the target application program from maliciously invading the user privacy information.
7. The method of claim 6, after the determining whether the preset page of the target application is an induced acquisition page with malicious infringement on user privacy information based on the target probability, further comprising:
and outputting preset alarm information under the condition that the preset page is determined to be an induced acquisition page.
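The host-side flow of claims 6 and 7 can be sketched as follows. This is a sketch under stated assumptions, not the claimed implementation: `detect_on_host`, `DetectionResult`, the `prefilter` and `query_server` callables, the alarm text, and the `0.8` decision threshold are all hypothetical names and values, and the server round trip is represented by a plain function call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class DetectionResult:
    suspected: bool                # outcome of the local feature-matching pre-filter
    probability: Optional[float]   # target probability from the server, if requested
    induced: bool                  # final decision for the preset page
    alarms: List[str] = field(default_factory=list)


def detect_on_host(
    page_content: str,
    user_input: str,
    prefilter: Callable[[str, str], bool],
    query_server: Callable[[str, str], float],
    threshold: float = 0.8,  # assumed value; the claims only say "based on the target probability"
) -> DetectionResult:
    # Step 1: local pre-filter; non-suspected pages are never sent to the server.
    if not prefilter(page_content, user_input):
        return DetectionResult(suspected=False, probability=None, induced=False)

    # Step 2: send page content and user input, receive the target probability.
    probability = query_server(page_content, user_input)

    # Step 3: decide, and output preset alarm information if the page is induced.
    induced = probability >= threshold
    alarms = []
    if induced:
        alarms.append("Warning: this page may be collecting your private information.")
    return DetectionResult(True, probability, induced, alarms)
```

Passing the pre-filter and the server call in as parameters keeps the sketch self-contained; in a real host program they would be the feature matching step and the network request to the server that holds the page recognition model.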
8. A method for detecting an application program, applied to a server, the method comprising:
acquiring page content of a preset page of a target application program provided by a host program and input information of a user on the preset page, wherein the target application program is an applet loaded in the host program;
performing word segmentation processing on the page content of the preset page and the input information of the user to obtain word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user;
inputting word segmentation information contained in page content of the preset page and word segmentation information contained in input information of the user into a page recognition model trained in advance to obtain a target probability that the preset page is an induced acquisition page with malicious invasion to privacy information of the user, wherein the page recognition model is obtained by training based on a preset page content sample and an input information sample of a preset user and is used for judging the probability of the induced acquisition page, and the page recognition model comprises a named entity recognition model constructed based on a BERT model;
sending the target probability to the host program so that the host program determines whether the preset page is an induced acquisition page which has malicious infringement on the user privacy information based on the target probability to protect the user privacy information and prevent the target application program from maliciously infringing the user privacy information;
the performing word segmentation processing on the page content of the preset page and the input information of the user to obtain word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user includes:
determining, based on a preset feature matching algorithm, page features, and input features, whether the preset page is a suspected induced acquisition page, and, if the preset page is determined to be a suspected induced acquisition page, performing word segmentation processing on the page content of the preset page and the input information of the user to obtain the word segmentation information contained in the page content of the preset page and the word segmentation information contained in the input information of the user, wherein the preset feature matching algorithm is used for determining whether the preset page is a suspected induced acquisition page based on matching results of a preset induction keyword feature matching rule and a preset sensitive information feature matching algorithm, the preset induction keyword feature matching rule is used for performing induction matching detection on the page features, and the preset sensitive information feature matching algorithm is used for performing sensitive information matching detection on the input features;
the page features are features corresponding to the page content of the preset page and acquired based on a preset feature extraction algorithm, and the input features are input features corresponding to the input information of the user and acquired based on the preset feature extraction algorithm.
9. The method according to claim 8, wherein the acquiring the page content of the preset page of the target application provided by the host program and the input information of the user on the preset page comprises:
acquiring encrypted information provided by the host program, wherein the encrypted information is information obtained by encrypting the page content of the preset page and the input information of the user on the preset page by the host program based on a preset encryption algorithm;
and decrypting the encrypted information based on a preset decryption algorithm to obtain the page content of the preset page and the input information of the user on the preset page.
10. The method according to claim 8, after the inputting of the word segmentation information contained in the page content of the preset page and the word segmentation information contained in the input information of the user into the pre-trained page recognition model to obtain the target probability that the preset page is an induced acquisition page with malicious infringement on user privacy information, the method further comprising:
determining whether the preset page is an induced acquisition page which has malicious infringement on the user privacy information or not based on the target probability;
and sending alarm information that the preset page is an induced acquisition page to a preset management device under the condition that the preset page is determined to be the induced acquisition page.
11. The method of claim 10, further comprising:
acquiring link information of the preset page under the condition that the preset page is determined to be an induced acquisition page;
and sending the link information of the preset page to a preset induced page processing platform so that the preset induced page processing platform processes a target application program to which the preset page belongs.
12. An apparatus for detecting an application, the apparatus comprising:
the information acquisition module is used for acquiring page content of a preset page opened when a target application program in a host program runs and input information of a user on the preset page when the target application program is detected to run, wherein the target application program is a small program carried in the host program;
the first processing module is used for performing word segmentation processing on the page content of the preset page and the input information of the user to obtain word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user;
the probability obtaining module is used for inputting word segmentation information contained in page content of the preset page and word segmentation information contained in input information of the user into a pre-trained page recognition model to obtain target probability that the preset page is an induced acquisition page with malicious invasion on user privacy information, the page recognition model is obtained by training based on a preset page content sample and a preset user input information sample and is used for judging the probability of the induced acquisition page, and the page recognition model comprises a named entity recognition model constructed based on a BERT model;
the page detection module is used for determining whether a preset page of the target application program is an induced acquisition page which has malicious infringement on the user privacy information or not based on the target probability so as to protect the user privacy information and prevent the target application program from maliciously infringing the user privacy information;
the first processing module is configured to determine whether a preset page is a suspected induced acquisition page or not based on a preset feature matching algorithm, a page feature and an input feature, and perform word segmentation processing on the preset page content and the input information of the user under the condition that the preset page is determined to be the suspected induced acquisition page, so as to obtain word segmentation information contained in the preset page content and word segmentation information contained in the input information of the user, where the preset feature matching algorithm is configured to determine whether the preset page is the suspected induced acquisition page or not based on a preset induced keyword feature matching rule and a matching result of a preset sensitive information feature matching algorithm, the preset induced keyword feature matching rule is configured to perform induced matching detection on the page feature, and the preset sensitive information feature matching algorithm is configured to perform sensitive information matching detection on the input feature; the page features are features corresponding to the page content of the preset page and acquired based on a preset feature extraction algorithm, and the input features are input features corresponding to the input information of the user and acquired based on the preset feature extraction algorithm.
13. An apparatus for detecting an application, the apparatus comprising:
the feature extraction module is used for respectively acquiring page features corresponding to the page content of a preset page and input features corresponding to input information of a user in the preset page based on a preset feature extraction algorithm when a target application program in a host program is detected to run, wherein the target application program is a small program carried in the host program;
the page determining module is used for determining whether the preset page is a suspected induced acquisition page or not based on a preset feature matching algorithm, the page features and the input features, the preset feature matching algorithm is used for determining whether the preset page is the suspected induced acquisition page or not based on a matching result of a preset induced keyword feature matching rule and a preset sensitive information feature matching algorithm, the preset induced keyword feature matching rule is used for carrying out induced matching detection on the page features, and the preset sensitive information feature matching algorithm is used for carrying out sensitive information matching detection on the input features;
the information sending module is used for sending the page content of the preset page and the input information of the user to a server under the condition that the preset page is determined to be a suspected induced acquisition page, and receiving the target probability that the preset page sent by the server is an induced acquisition page which has malicious infringement on user privacy information, wherein the target probability is obtained by the server based on the page content of the preset page, the input information of the user and a pre-trained page recognition model, the page recognition model is obtained by training based on a preset page content sample and a preset user input information sample and is used for judging the probability of the induced acquisition page, and the page recognition model comprises a named entity recognition model constructed based on a BERT model;
and the page detection module is used for determining whether a preset page of the target application program is an induced acquisition page which maliciously invades the user privacy information or not based on the target probability so as to protect the user privacy information and prevent the target application program from maliciously invading the user privacy information.
14. An apparatus for detecting an application, the apparatus comprising:
the information receiving module is used for acquiring page content of a preset page of a target application program provided by a host program and input information of a user on the preset page, wherein the target application program is an applet loaded in the host program;
the word segmentation processing module is used for carrying out word segmentation processing on the page content of the preset page and the input information of the user to obtain word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user;
the probability acquisition module is used for inputting word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user into a pre-trained page recognition model to obtain the target probability that the preset page is an induced acquisition page with malicious infringement on the privacy information of the user, the page recognition model is obtained by training based on a preset page content sample and a preset user input information sample and is used for judging the probability of the induced acquisition page, and the page recognition model comprises a named entity recognition model constructed based on a BERT model;
the probability sending module is used for sending the target probability to the host program so that the host program determines whether the preset page is an induced acquisition page which maliciously invades the user privacy information or not based on the target probability, so as to protect the user privacy information and prevent the target application program from maliciously invading the user privacy information;
the word segmentation processing module is used for determining whether the preset page is a suspected induction collection page or not based on a preset feature matching algorithm, page features and input features, and performing word segmentation processing on the preset page content and the input information of the user under the condition that the preset page is determined to be the suspected induction collection page to obtain word segmentation information contained in the preset page content and word segmentation information contained in the input information of the user, wherein the preset feature matching algorithm is used for determining whether the preset page is the suspected induction collection page or not based on a preset induction keyword feature matching rule and a matching result of a preset sensitive information feature matching algorithm, the preset induction keyword feature matching rule is used for performing induction matching detection on the page features, and the preset sensitive information feature matching algorithm is used for performing sensitive information matching detection on the input features; the page features are features corresponding to the page content of the preset page and acquired based on a preset feature extraction algorithm, and the input features are input features corresponding to the input information of the user and acquired based on the preset feature extraction algorithm.
15. An application detection apparatus, the application detection apparatus comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
when detecting that a target application program in a host program runs, acquiring page content of a preset page opened when the target application program runs and input information of a user on the preset page, wherein the target application program is an applet carried in the host program;
performing word segmentation processing on the page content of the preset page and the input information of the user to obtain word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user;
inputting the word segmentation information contained in the page content of the preset page and the word segmentation information contained in the input information of the user into a pre-trained page recognition model to obtain a target probability that the preset page is an induced acquisition page that maliciously infringes on the privacy information of the user, wherein the page recognition model is obtained by training based on a preset page content sample and a preset user input information sample and is used for judging the probability of the induced acquisition page, and the page recognition model comprises a named entity recognition model constructed based on a BERT model;
determining whether a preset page of the target application program is an induced acquisition page which has malicious infringement on the user privacy information or not based on the target probability so as to protect the user privacy information and prevent the target application program from maliciously infringing the user privacy information;
the performing word segmentation processing on the page content of the preset page and the input information of the user to obtain word segmentation information included in the page content of the preset page and word segmentation information included in the input information of the user includes:
determining whether the preset page is a suspected induced acquisition page based on a preset feature matching algorithm, page features and input features, and, under the condition that the preset page is determined to be a suspected induced acquisition page, performing word segmentation processing on the page content of the preset page and the input information of the user to obtain the word segmentation information contained in the page content of the preset page and the word segmentation information contained in the input information of the user, wherein the preset feature matching algorithm is used for determining whether the preset page is a suspected induced acquisition page based on a preset induced keyword feature matching rule and a matching result of a preset sensitive information feature matching algorithm, the preset induced keyword feature matching rule is used for performing induced matching detection on the page features, and the preset sensitive information feature matching algorithm is used for performing sensitive information matching detection on the input features;
the page features are features corresponding to the page contents of the preset page and acquired based on a preset feature extraction algorithm, and the input features are input features corresponding to the input information of the user and acquired based on the preset feature extraction algorithm.
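The suspected-page check recited above can be illustrated with a minimal sketch. The claims do not disclose the keyword list, the sensitive-information patterns, or how the two matching results are combined, so `INDUCEMENT_KEYWORDS`, `SENSITIVE_PATTERNS`, and the rule that both checks must hit are all assumptions made purely for illustration.

```python
import re
from typing import Set

# Hypothetical induction keywords; the claims do not list any.
INDUCEMENT_KEYWORDS = ("prize", "reward", "free gift", "verify your identity")

# Hypothetical sensitive-information patterns; real rules would be locale-specific.
SENSITIVE_PATTERNS = {
    "phone": re.compile(r"\b1\d{10}\b"),        # 11-digit mobile number
    "id_card": re.compile(r"\b\d{17}[\dXx]\b"),  # 18-character ID number
}


def matches_inducement(page_text: str) -> bool:
    """Induced matching detection on the page features (keyword rule)."""
    lowered = page_text.lower()
    return any(keyword in lowered for keyword in INDUCEMENT_KEYWORDS)


def sensitive_kinds(user_input: str) -> Set[str]:
    """Sensitive information matching detection on the input features."""
    return {
        name
        for name, pattern in SENSITIVE_PATTERNS.items()
        if pattern.search(user_input)
    }


def is_suspected_page(page_text: str, user_input: str) -> bool:
    """Assumed combination rule: both checks must produce a match."""
    return matches_inducement(page_text) and bool(sensitive_kinds(user_input))
```

Only pages for which `is_suspected_page` returns `True` would go on to word segmentation and model scoring, which keeps the more expensive model off pages that show no sign of inducement.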
16. An apparatus for detecting an application, the apparatus comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
when detecting that a target application program in a host program runs, respectively acquiring page features corresponding to page content of a preset page and input features corresponding to input information of a user in the preset page based on a preset feature extraction algorithm, wherein the target application program is an applet carried in the host program;
determining whether the preset page is a suspected induced acquisition page or not based on a preset feature matching algorithm, the page features and the input features, wherein the preset feature matching algorithm is used for determining whether the preset page is the suspected induced acquisition page or not based on a preset induced keyword feature matching rule and a matching result of a preset sensitive information feature matching algorithm, the preset induced keyword feature matching rule is used for carrying out induced matching detection on the page features, and the preset sensitive information feature matching algorithm is used for carrying out sensitive information matching detection on the input features;
under the condition that the preset page is determined to be a suspected induced acquisition page, sending the page content of the preset page and the input information of the user to a server, and receiving a target probability that the preset page sent by the server is an induced acquisition page which has malicious infringement on user privacy information, wherein the target probability is obtained by the server based on the page content of the preset page, the input information of the user and a pre-trained page recognition model, the page recognition model is obtained by training based on a preset page content sample and a preset user input information sample and is used for judging the probability of the induced acquisition page, and the page recognition model comprises a named entity recognition model constructed based on a BERT model;
and determining whether a preset page of the target application program is an induced acquisition page which has malicious infringement on the user privacy information or not based on the target probability so as to protect the user privacy information and prevent the target application program from maliciously infringing the user privacy information.
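The host-side flow of claim 16 (pre-filter locally, ask the server for the target probability, then decide) might be sketched as follows. The claims do not state how the target probability is turned into a decision, so the threshold comparison and the value of `RISK_THRESHOLD` are assumptions; `is_suspected` and `request_probability` are hypothetical callables standing in for the feature matching algorithm and the server round trip.

```python
from typing import Callable

# Assumed decision threshold; the claims do not specify a value.
RISK_THRESHOLD = 0.8


def detect_page(
    page_content: str,
    user_input: str,
    is_suspected: Callable[[str, str], bool],
    request_probability: Callable[[str, str], float],
    threshold: float = RISK_THRESHOLD,
) -> bool:
    """Return True if the page is judged to be an induced acquisition page."""
    # Step 1: local feature matching; unsuspicious pages never leave the device.
    if not is_suspected(page_content, user_input):
        return False
    # Step 2: the server scores the page with the page recognition model.
    probability = request_probability(page_content, user_input)
    # Step 3: assumed rule, flag the page when the probability reaches the threshold.
    return probability >= threshold
```

Passing the server call in as a parameter keeps the sketch free of any particular transport and makes the flow easy to exercise with a stub.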
17. An apparatus for detecting an application, the apparatus comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
acquiring page content of a preset page of a target application program provided by a host program and input information of a user on the preset page, wherein the target application program is an applet loaded in the host program;
performing word segmentation processing on the page content of the preset page and the input information of the user to obtain word segmentation information contained in the page content of the preset page and word segmentation information contained in the input information of the user;
inputting the word segmentation information contained in the page content of the preset page and the word segmentation information contained in the input information of the user into a pre-trained page recognition model to obtain a target probability that the preset page is an induced acquisition page that maliciously infringes on the privacy information of the user, wherein the page recognition model is obtained by training based on a preset page content sample and a preset user input information sample and is used for judging the probability of the induced acquisition page, and the page recognition model comprises a named entity recognition model constructed based on a BERT model;
sending the target probability to the host program so that the host program determines whether the preset page is an induced acquisition page which has malicious infringement on the user privacy information based on the target probability to protect the user privacy information and prevent the target application program from maliciously infringing the user privacy information;
the performing word segmentation processing on the page content of the preset page and the input information of the user to obtain word segmentation information included in the page content of the preset page and word segmentation information included in the input information of the user includes:
determining whether the preset page is a suspected induced acquisition page based on a preset feature matching algorithm, page features and input features, and, under the condition that the preset page is determined to be a suspected induced acquisition page, performing word segmentation processing on the page content of the preset page and the input information of the user to obtain the word segmentation information contained in the page content of the preset page and the word segmentation information contained in the input information of the user, wherein the preset feature matching algorithm is used for determining whether the preset page is a suspected induced acquisition page based on a preset induced keyword feature matching rule and a matching result of a preset sensitive information feature matching algorithm, the preset induced keyword feature matching rule is used for performing induced matching detection on the page features, and the preset sensitive information feature matching algorithm is used for performing sensitive information matching detection on the input features;
the page features are features corresponding to the page contents of the preset page and acquired based on a preset feature extraction algorithm, and the input features are input features corresponding to the input information of the user and acquired based on the preset feature extraction algorithm.
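The server-side steps of claim 17 (segment both texts, feed the segments to the page recognition model, obtain the target probability) can be sketched as below. This is not the BERT-based model of the claims: `segment` is a simple stand-in segmenter, `model` is any hypothetical callable that maps a token list to a probability, and packing the two token lists in the `[CLS] ... [SEP] ... [SEP]` sentence-pair layout commonly used with BERT is an assumption, since the claims do not say how the two inputs are combined.

```python
import re
from typing import Callable, List


def segment(text: str) -> List[str]:
    """Stand-in word segmentation: each CJK character is one token,
    and each run of Latin letters or digits is kept whole."""
    return re.findall(r"[\u4e00-\u9fff]|[A-Za-z0-9]+", text)


def build_model_input(page_tokens: List[str], input_tokens: List[str]) -> List[str]:
    """Assumed sentence-pair layout: page tokens first, user input second."""
    return ["[CLS]", *page_tokens, "[SEP]", *input_tokens, "[SEP]"]


def score_page(
    page_content: str,
    user_input: str,
    model: Callable[[List[str]], float],
) -> float:
    """Return the target probability produced by the supplied model."""
    tokens = build_model_input(segment(page_content), segment(user_input))
    probability = model(tokens)
    if not 0.0 <= probability <= 1.0:
        raise ValueError("model must return a probability in [0, 1]")
    return probability
```

The returned value corresponds to the target probability that the server sends back to the host program, which then makes the final determination.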
CN202010143430.4A 2020-03-04 2020-03-04 Application program detection method, device and equipment Active CN111400705B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010143430.4A CN111400705B (en) 2020-03-04 2020-03-04 Application program detection method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010143430.4A CN111400705B (en) 2020-03-04 2020-03-04 Application program detection method, device and equipment

Publications (2)

Publication Number Publication Date
CN111400705A CN111400705A (en) 2020-07-10
CN111400705B CN111400705B (en) 2023-03-14

Family

ID=71430492

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010143430.4A Active CN111400705B (en) 2020-03-04 2020-03-04 Application program detection method, device and equipment

Country Status (1)

Country Link
CN (1) CN111400705B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112149404A (en) * 2020-09-18 2020-12-29 支付宝(杭州)信息技术有限公司 Method, device and system for identifying risk content of user privacy data
CN112307371B (en) * 2020-10-27 2024-03-22 支付宝(杭州)信息技术有限公司 Applet sub-service identification method, device, equipment and storage medium
CN112364367A (en) * 2020-11-27 2021-02-12 支付宝(杭州)信息技术有限公司 Object processing method, device and equipment based on privacy protection
CN112600834B (en) * 2020-12-10 2023-03-24 同盾控股有限公司 Content security identification method and device, storage medium and electronic equipment
CN112860566B (en) * 2021-03-02 2024-04-30 百度在线网络技术(北京)有限公司 Applet detection method, device, electronic equipment and readable medium
CN113010892B (en) * 2021-03-26 2022-09-20 支付宝(杭州)信息技术有限公司 Method and device for detecting malicious behavior of small program
CN112948835B (en) * 2021-03-26 2022-07-19 支付宝(杭州)信息技术有限公司 Applet risk detection method and device
CN113284516A (en) * 2021-03-29 2021-08-20 威凯检测技术有限公司 Intelligent voice product privacy invasion detection method based on energy consumption
CN113283232A (en) * 2021-05-31 2021-08-20 支付宝(杭州)信息技术有限公司 Method and device for automatically analyzing private information in text
CN114330331B (en) * 2021-12-27 2022-09-16 北京天融信网络安全技术有限公司 Method and device for determining importance of word segmentation in link

Citations (6)

Publication number Priority date Publication date Assignee Title
CN102938041A (en) * 2012-10-30 2013-02-20 北京神州绿盟信息安全科技股份有限公司 Comprehensive detection method and system for page tampering
CN103324615A (en) * 2012-03-19 2013-09-25 哈尔滨安天科技股份有限公司 Method and system for detecting phishing website based on SEO (search engine optimization)
CN105338001A (en) * 2015-12-04 2016-02-17 北京奇虎科技有限公司 Method and device for recognizing phishing website
CN106453351A (en) * 2016-10-31 2017-02-22 重庆邮电大学 Financial phishing webpage detection method based on Web page characteristics
CN110059468A (en) * 2019-04-02 2019-07-26 阿里巴巴集团控股有限公司 Applet risk identification method and device
CN110826006A (en) * 2019-11-22 2020-02-21 支付宝(杭州)信息技术有限公司 Abnormal collection behavior identification method and device based on privacy data protection

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
CN103324615A (en) * 2012-03-19 2013-09-25 哈尔滨安天科技股份有限公司 Method and system for detecting phishing website based on SEO (search engine optimization)
CN102938041A (en) * 2012-10-30 2013-02-20 北京神州绿盟信息安全科技股份有限公司 Comprehensive detection method and system for page tampering
CN105338001A (en) * 2015-12-04 2016-02-17 北京奇虎科技有限公司 Method and device for recognizing phishing website
CN106453351A (en) * 2016-10-31 2017-02-22 重庆邮电大学 Financial phishing webpage detection method based on Web page characteristics
CN110059468A (en) * 2019-04-02 2019-07-26 阿里巴巴集团控股有限公司 Applet risk identification method and device
CN110826006A (en) * 2019-11-22 2020-02-21 支付宝(杭州)信息技术有限公司 Abnormal collection behavior identification method and device based on privacy data protection

Also Published As

Publication number Publication date
CN111400705A (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN111400705B (en) Application program detection method, device and equipment
EP3780541B1 (en) Identity information identification method and device
CN111539021A (en) Data privacy type identification method, device and equipment
CN109426732B (en) Data processing method and device
CN111159697B (en) Key detection method and device and electronic equipment
EP3945696B1 (en) Blockchain data processing method, apparatus, and device
CN112035881B (en) Privacy protection-based application program identification method, device and equipment
CN112182506A (en) Data compliance detection method, device and equipment
CN112199661A (en) Privacy protection-based equipment identity processing method, device and equipment
CN110235141B (en) Biometric feature recognition method and electronic device
CN114896603A (en) Service processing method, device and equipment
CN112837202B (en) Watermark image generation and attack tracing method and device based on privacy protection
CN112819156A (en) Data processing method, device and equipment
CN114553516B (en) Data processing method, device and equipment
CN115544555A (en) Data processing method and device, storage medium and electronic equipment
CN113239852B (en) Privacy image processing method, device and equipment based on privacy protection
CN112818400B (en) Biological identification method, device and equipment based on privacy protection
CN112364367A (en) Object processing method, device and equipment based on privacy protection
CN113239851B (en) Privacy image processing method, device and equipment based on privacy protection
CN112199731A (en) Data processing method, device and equipment
CN112818389B (en) Data processing method, device and equipment based on privacy protection
CN115828171B (en) Method, device, medium and equipment for executing service cooperatively by end cloud
CN115688130B (en) Data processing method, device and equipment
CN117290879A (en) Risk assessment method, device and equipment for model features
CN115310085A (en) SDK risk detection method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40033191; Country of ref document: HK)

GR01 Patent grant