CN115114587A

CN115114587A - Automatic identification method, system, equipment and storage medium of counterfeit applet

Info

Publication number: CN115114587A
Application number: CN202210855326.7A
Authority: CN
Inventors: 何晔; 邓薇; 高思雨
Original assignee: China Telecom Corp Ltd
Current assignee: China Telecom Corp Ltd
Priority date: 2022-07-19
Filing date: 2022-07-19
Publication date: 2022-09-27

Abstract

The invention provides an automatic identification method, system, device and storage medium for counterfeit applets. The method comprises the following steps: performing a fuzzy search according to the name of the applet to be tested to obtain a set of target applets; capturing pages of the applet to be tested and of the target applets and performing image-text recognition on them to acquire applet feature information and the static image information and dynamic character string information of different pages; filtering the set of target applets according to a preset white list and a preset black list; and obtaining, for each remaining target applet and the applet to be tested, a first similarity based on the applet feature information, a second similarity based on the static image information and a third similarity based on the dynamic character string information, and determining that the target applet is a counterfeit applet if the preset thresholds are met. The method can be effectively applied to the field of applet security analysis: counterfeit and pirated applets are identified, and risks such as information leakage and property loss are avoided.

Description

Automatic identification method, system, equipment and storage medium of counterfeit applet

Technical Field

The invention relates to the field of network security, in particular to an automatic identification method, a system, equipment and a storage medium of counterfeit applets.

Background

A mini program (applet) is an application that can be used without being downloaded and installed. It realizes the vision of applications being "within reach": a user can open one simply by scanning a code or searching. It also embodies the idea of "use it and leave": the user need not worry about having installed too many applications. Applications become ubiquitous and readily available, with no installation or uninstallation. Because applets have low development cost, short development cycles, cross-platform compatibility and no download requirement, their number has increased sharply. As with apps, as applets become more widely used, pirated applets are also increasing, and users cannot tell them apart. This damages the rights and interests of genuine applets, and users of counterfeit applets are exposed to risks such as leakage of sensitive information and property loss.

For example, the WeChat applet (WeChat Mini Program) is one kind of applet. It is an application that can be used without being downloaded and installed, and the user can open it by scanning a code or searching. Since registration was fully opened, developers whose subject type is an enterprise, government body, media outlet, other organization or individual can apply to register applets. WeChat applets, WeChat subscription accounts, WeChat service accounts and WeChat enterprise accounts are parallel systems. Over nearly two years of development, a new WeChat applet development environment and developer ecosystem have been built.

In view of the above, the present invention provides an automatic identification method, system, device and storage medium for counterfeit applets.

It is noted that the information disclosed in the background section above is only for enhancement of understanding of the background of the invention and therefore may comprise information that does not form the prior art that is already known to a person of ordinary skill in the art.

Disclosure of Invention

Aiming at the problems in the prior art, the invention aims to provide an automatic identification method, a system, equipment and a storage medium of counterfeit applets, which overcome the difficulties in the prior art and can be effectively applied to the field of applet security analysis.

The embodiment of the invention provides an automatic identification method of counterfeit applets, which comprises the following steps:

performing a fuzzy search according to keywords of the name of the applet to be tested to obtain a set of target applets;

capturing pages of the applet to be tested and of each target applet while they are running and performing image-text recognition on them, to acquire applet feature information and the static image information and dynamic character string information of different pages, wherein the applet feature information at least comprises account main body information;

filtering the set of target applets according to a preset white list and a preset black list based on the applet feature information;

obtaining, for each remaining target applet and the applet to be tested, a first similarity based on the applet feature information, a second similarity based on the static image information and a third similarity based on the dynamic character string information;

and when the first similarity, the second similarity and the third similarity all meet preset thresholds, determining that the target applet is a counterfeit applet.

Preferably, the performing fuzzy search according to the keyword of the name of the applet to be tested to obtain the set of target applets includes:

extracting keywords according to the name of the small program to be detected;

and carrying out fuzzy search in the applet library according to the keywords, and establishing a set of the obtained target applets.
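The two sub-steps above (keyword extraction, then fuzzy search in an applet library) might be sketched as follows. This is only an illustration under stated assumptions: the patent does not specify the keyword extraction or matching algorithm, so the stopword list, the `cutoff` value, the use of `difflib.SequenceMatcher` and the sample library are all hypothetical.

```python
import difflib

def extract_keywords(name, stopwords=("official", "mini", "program", "app")):
    """Split an applet name into lowercase keywords, dropping generic words.

    The stopword list is an assumption made for this sketch.
    """
    tokens = name.lower().replace("-", " ").split()
    return [t for t in tokens if t not in stopwords]

def fuzzy_search(name, library, cutoff=0.6):
    """Return library names that contain a keyword or are close to the full name."""
    keywords = extract_keywords(name)
    targets = []
    for candidate in library:
        lowered = candidate.lower()
        keyword_hit = any(k in lowered for k in keywords)
        # SequenceMatcher ratio stands in for whatever fuzzy matcher is used.
        ratio = difflib.SequenceMatcher(None, name.lower(), lowered).ratio()
        if keyword_hit or ratio >= cutoff:
            targets.append(candidate)
    return targets

# Hypothetical applet library; "Acrne Bank" imitates "Acme Bank".
library = ["Acme Bank", "Acme Bank Pro", "Acrne Bank", "Weather Now"]
targets = fuzzy_search("Acme Bank", library)
```

The resulting set deliberately still contains the genuine applet itself; removing it is the job of the later white list filtering step.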

Preferably, the obtaining of the applet feature information and the static image information and the dynamic character string information of different pages by capturing pages in the running process of the applet to be tested and each target applet for image-text recognition, where the applet feature information at least includes account main information includes:

capturing the small program to be detected and the page of each target small program in the running process;

acquiring, through image-text recognition, the applet feature information and the static image information and dynamic character string information of different pages, wherein the applet feature information at least comprises: account main body information, the authenticated applet application number and the service category; the static image information comprises at least one of an icon, a page loading picture and a front-end page of the applet; the dynamic character string information comprises at least one of a dynamic uniform resource locator (URL) character string, an IP address character string and a domain name character string.
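Once image-text recognition has produced plain text for a captured page, the three kinds of dynamic character strings named above could be pulled out with regular expressions. The sketch below assumes the recognized text is already available as a string; the patterns, the function name `extract_dynamic_strings` and the sample text are illustrative assumptions, not part of the patent.

```python
import re

# Simplified patterns chosen for this sketch; real traffic may need stricter ones.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)

def extract_dynamic_strings(text):
    """Collect URL, IPv4 address and domain name strings from recognized text."""
    urls = URL_RE.findall(text)
    ips = [
        ip for ip in IPV4_RE.findall(text)
        if all(int(part) <= 255 for part in ip.split("."))
    ]
    domains = sorted({d.lower() for d in DOMAIN_RE.findall(text)})
    return {"urls": urls, "ips": ips, "domains": domains}

# Hypothetical recognized text from one captured page.
sample = "GET https://api.example.com/v1/pay from 203.0.113.7 host cdn.example.net"
info = extract_dynamic_strings(sample)
```

Keeping the three string types in separate lists makes it straightforward to compare each type independently when the third similarity is computed.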

Preferably, the filtering the set of target applets according to a preset white list and a preset black list based on the applet feature information includes:

filtering the legal small programs in the target small programs according to a preset white list based on the small program feature information;

and filtering pirated applets in the target applets according to a preset blacklist based on the applet feature information.
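The list filtering can be pictured as two membership checks. In this sketch the lists are keyed on an application number field; that key, the `Applet` record and the function name `filter_targets` are assumptions made for illustration, since the text only says the filtering is based on the applet feature information.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Applet:
    """Minimal, hypothetical record of the recognized applet feature information."""
    name: str
    app_id: str
    account_subject: str

def filter_targets(targets, whitelist_ids, blacklist_ids):
    """Drop white-listed (genuine) and black-listed (known pirated) applets.

    Returns the applets that still need similarity analysis, plus the
    applets that were already known to be counterfeit.
    """
    known_counterfeit = [t for t in targets if t.app_id in blacklist_ids]
    remaining = [
        t for t in targets
        if t.app_id not in whitelist_ids and t.app_id not in blacklist_ids
    ]
    return remaining, known_counterfeit

targets = [
    Applet("Acme Bank", "wx001", "Acme Bank Co."),
    Applet("Acme Bank Pro", "wx666", "Unknown"),
    Applet("Acme Bnak", "wx777", "Somebody"),
]
remaining, flagged = filter_targets(targets, {"wx001"}, {"wx666"})
```

Only the applets in `remaining` go on to the similarity computation, which keeps the more expensive image comparison off applets whose status is already known.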

Preferably, the obtaining a first similarity of each remaining target applet and the applet to be tested based on the applet feature information, a second similarity of the static image information and a third similarity based on the dynamic character string information comprises:

obtaining a character string editing distance between each remaining target small program and the small program to be tested based on the characteristic information of the small program to obtain a first similarity;

obtaining a second similarity between each remaining target applet and the applet to be tested based on the cosine distance of picture features extracted from the applet icon and the page loading picture in the static image information, and/or based on the structural similarity of feature vectors extracted from the front-end page;

and obtaining a third similarity between each residual target small program and the small program to be tested based on the character string editing distance of the dynamic character string information.
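The first and third similarities both rest on a string edit distance. A minimal sketch, assuming the Levenshtein distance is meant and assuming the distance is normalized by the longer string's length to obtain a similarity in [0, 1] (the patent does not state the normalization):

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]

def string_similarity(a, b):
    """Map the edit distance to a similarity in [0, 1] (assumed normalization)."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))
```

For example, `edit_distance("kitten", "sitting")` is 3, so `string_similarity("kitten", "sitting")` is 1 - 3/7.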

Preferably, the obtaining of the second similarity between each remaining target applet and the applet to be tested based on the icon of the applet in the static image information, the cosine distance of the image feature extracted from the page loading image, and/or the second similarity obtained by extracting the feature vector through the structural similarity in the front-end page includes:

uniformly scaling the height and width of the icons and page loading pictures of each remaining target applet and of the applet to be tested to 64 × 64;

extracting picture characteristics through a trained picture comparison neural network;

calculating the similarity between the feature vectors by cosine similarity, where the feature vectors of the two pictures are denoted $A = [a_1, \ldots, a_n]$ and $B = [b_1, \ldots, b_n]$, and the cosine distance between the two vectors is

$$\cos(A, B) = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$

where $a_i$ is an element of the feature vector A, $b_i$ is an element of the feature vector B, and $i \le n$.
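The cosine comparison of two feature vectors described above can be sketched in plain Python. The feature vectors are assumed to have already been produced by the picture comparison network; returning 0.0 for a zero vector is a convention chosen for this sketch.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for an all-zero vector
    return dot / (norm_a * norm_b)
```

Vectors that point in the same direction give a value close to 1, and orthogonal vectors give 0, regardless of their magnitudes.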

Preferably, the obtaining a second similarity between each remaining target applet and the applet to be tested based on the icon of the applet in the static image information, a cosine distance of an image feature extracted from a page loading image, and/or a second similarity obtained by extracting a feature vector through structural similarity in the front-end page further includes:

uniformly scaling the height and width of the front end page of each of the rest target small programs and the small program to be tested to 768 × 256;

obtaining the structural similarity parameter between the extracted feature vectors; denoting the pictures of the two front-end pages by x and y, the simplified formula for the structural similarity of the two pictures is:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

where $\mu_x$ and $\mu_y$ are the mean pixel values of the two pictures, $\sigma_x^2$ is the variance of the pixel values of the first front-end page, $\sigma_y^2$ is the variance of the pixel values of the second front-end page, $\sigma_{xy}$ is the covariance of the first and second front-end pages, $C_1 = (K_1 L)^2$ and $C_2 = (K_2 L)^2$ are constants for maintaining stability, $L$ is the dynamic range of the pixel values, $K_1 = 0.01$ and $K_2 = 0.03$.
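The simplified structural similarity can be evaluated directly from the mean, variance and covariance of the two pictures. The sketch below treats each picture as a flat list of grayscale pixel values and computes a single global score; using one global window and a default dynamic range of 255 are assumptions of this sketch, while $K_1 = 0.01$ and $K_2 = 0.03$ follow the text.

```python
def global_ssim(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM over two equal-length lists of pixel values."""
    if len(x) != len(y) or not x:
        raise ValueError("pictures must be non-empty and the same size")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((p - mu_x) ** 2 for p in x) / n
    var_y = sum((q - mu_y) ** 2 for q in y) / n
    cov_xy = sum((p - mu_x) * (q - mu_y) for p, q in zip(x, y)) / n
    c1 = (k1 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * dynamic_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Identical pictures score 1, and the score drops as luminance, contrast or structure diverge. Practical SSIM implementations usually average the score over local windows, which this sketch omits.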

Preferably, when the first similarity, the second similarity, and the third similarity all satisfy a preset threshold, the target applet is a counterfeit applet, including:

when the first similarity and the second similarity are both larger than a preset first threshold value, and meanwhile, when the third similarity is smaller than a preset second threshold value, the target applet is a counterfeit applet; the value range of the preset first threshold is 70% to 90%, and the value range of the preset second threshold is 30% to 50%.
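The decision rule above can be written as a small predicate. Similarities are expressed here as fractions in [0, 1]; the default threshold values of 0.8 and 0.4 are arbitrary picks from inside the stated ranges, and the function name is hypothetical.

```python
def is_counterfeit(first, second, third, first_threshold=0.8, second_threshold=0.4):
    """Apply the rule: first and second above T1, third below T2."""
    if not 0.7 <= first_threshold <= 0.9:
        raise ValueError("first threshold is expected to lie in 70% to 90%")
    if not 0.3 <= second_threshold <= 0.5:
        raise ValueError("second threshold is expected to lie in 30% to 50%")
    return (
        first > first_threshold
        and second > first_threshold
        and third < second_threshold
    )
```

The rule captures an applet that looks like the genuine one (similar feature information and images) while talking to different back-end addresses (dissimilar URLs, IP addresses and domain names).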

Preferably, the method further comprises the following steps:

when the edit distance between the account main body information of the target applet and the account main body information of an applet in the preset white list that has an affiliated-company relationship is greater than a preset threshold, and the similarity of the account main body information is lower than a preset third threshold, the target applet is a counterfeit applet; the value range of the preset third threshold is 30% to 50%.
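One possible reading of this additional check is that a suspected applet is cleared if its account main body closely resembles that of a white-listed affiliated company, and confirmed as counterfeit otherwise. The sketch below follows that reading. It uses `difflib.SequenceMatcher.ratio()` as a stand-in for an edit-distance-based similarity, and the function names, the 0.4 threshold and the sample account names are all assumptions.

```python
import difflib

def subject_similarity(a, b):
    """Stand-in string similarity in [0, 1]; not the patent's exact measure."""
    return difflib.SequenceMatcher(None, a, b).ratio()

def fails_affiliate_check(target_subject, affiliated_subjects, third_threshold=0.4):
    """True if the target's account subject resembles no affiliated subject."""
    best = max(
        (subject_similarity(target_subject, s) for s in affiliated_subjects),
        default=0.0,
    )
    return best < third_threshold

# Hypothetical white-listed affiliated account main bodies.
affiliates = ["Acme Bank Co., Ltd."]
branch = fails_affiliate_check("Acme Bank Shanghai Branch", affiliates)
stranger = fails_affiliate_check("Quick Loans 888", affiliates)
```

Here the branch-like account name is not flagged, while the unrelated account name is, which mirrors the intent of distinguishing branch or affiliated companies from genuine counterfeiters.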

The embodiment of the present invention further provides an automatic identification system of a counterfeit applet, which is used to implement the above automatic identification method of a counterfeit applet, and the automatic identification system of a counterfeit applet includes:

the fuzzy search module is used for carrying out fuzzy search according to the keywords of the name of the small program to be detected to obtain a set of target small programs;

the main body information module is used for acquiring the small program feature information and the static image information and the dynamic character string information of different pages by capturing the pages of the small program to be detected and each target small program in the running process to perform image-text recognition, wherein the small program feature information at least comprises account main body information;

the list filtering module is used for filtering the set of the target small programs according to a preset white list and a preset black list based on the small program characteristic information;

the similarity module is used for obtaining the first similarity of each residual target small program and the small program to be tested based on the small program characteristic information, the second similarity of the static image information and the third similarity based on the dynamic character string information;

and the counterfeit judgment module is used for judging that the target applet is a counterfeit applet when the first similarity, the second similarity and the third similarity all meet a preset threshold value.

An embodiment of the present invention further provides an automatic identification device for a counterfeit applet, including:

a processor;

a memory having stored therein executable instructions of the processor;

wherein the processor is configured to perform the steps of the above-described automated identification method of a mock applet, via execution of the executable instructions.

Embodiments of the present invention also provide a computer-readable storage medium for storing a program that, when executed, implements the steps of the above-described method for automatically identifying a counterfeit applet.

The invention aims to provide an automatic identification method, a system, equipment and a storage medium of counterfeit applets, which can be effectively applied to the field of applet security analysis and can avoid the risks of information leakage, property loss and the like caused by using counterfeit pirate applets by identifying counterfeit applets.

Drawings

Other features, objects and advantages of the present invention will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, with reference to the accompanying drawings.

FIG. 1 is a flow chart of the method of automatic identification of a mock applet of the present invention.

Fig. 2 is a flowchart illustrating step S110 in the embodiment of the method for automatically identifying a counterfeit applet according to the present invention.

Fig. 3 is a flowchart illustrating step S120 in the embodiment of the method for automatically identifying a counterfeit applet according to the present invention.

Fig. 4 is a flowchart illustrating step S130 in the embodiment of the method for automatically identifying a counterfeit applet according to the present invention.

Fig. 5 is a flowchart illustrating step S140 in the embodiment of the method for automatically identifying a counterfeit applet according to the present invention.

Figure 6 is a block schematic diagram of an automatic identification system of a mock applet, in accordance with the present invention.

FIG. 7 is a block diagram of the fuzzy search module in an embodiment of the automated identification system of a mock applet, in accordance with the present invention.

FIG. 8 is a block diagram of the subject information module in an embodiment of the automated identification system of a mock applet, in accordance with the present invention.

FIG. 9 is a block diagram of a roster filter module in an embodiment of the automated identification system of a mock applet, in accordance with the present invention.

FIG. 10 is a block diagram of a similarity module in an embodiment of the automated identification system of a mock applet, in accordance with the present invention.

Fig. 11, 12 and 13 are schematic diagrams of the implementation process of the automatic identification method of the counterfeit applet in the invention.

Figure 14 is a schematic diagram of an automatic identification device of a mock applet according to the present invention.

Detailed Description

The following description of the embodiments of the present application is provided by way of specific examples, and other advantages and effects of the present application will be readily apparent to those skilled in the art from the disclosure herein. The present application is capable of other and different embodiments and of being practiced or being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present application. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

Embodiments of the present application will be described in detail below with reference to the accompanying drawings so that those skilled in the art to which the present application pertains can easily carry out the present application. The present application may be embodied in many different forms and is not limited to the embodiments described herein.

Reference throughout this specification to "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," or the like, means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. Furthermore, the particular features, structures, materials, or characteristics shown may be combined in any suitable manner in any one or more embodiments or examples. Moreover, the various embodiments or examples and features of the various embodiments or examples presented herein can be combined and combined by those skilled in the art without being mutually inconsistent.

Furthermore, the terms "first", "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the expressions of the present application, "plurality" means two or more unless specifically defined otherwise.

In order to clearly explain the present application, components that are not related to the description are omitted, and the same reference numerals are given to the same or similar components throughout the specification.

Throughout the specification, when a device is referred to as being "connected" to another device, this includes not only the case of being "directly connected" but also the case of being "indirectly connected" with another element interposed therebetween. In addition, when a device "includes" a certain component, unless otherwise stated, the device does not exclude other components, but may include other components.

When a device is said to be "on" another device, this may be directly on the other device, but may also be accompanied by other devices in between. When a device is said to be "directly on" another device, there are no other devices in between.

Although the terms first, second, etc. may be used herein to describe various elements in some instances, these elements should not be limited by these terms. These terms are only used to distinguish one element from another, for example a first interface from a second interface. Also, as used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used in this specification, specify the presence of stated features, steps, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, items, species, and/or groups thereof. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means any of the following: A; B; C; A and B; A and C; B and C; A, B and C. An exception to this definition will occur only when a combination of elements, functions, steps or operations is inherently mutually exclusive in some way.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the singular forms "a", "an" and "the" include plural forms as long as the words do not expressly indicate a contrary meaning. The term "comprises/comprising" when used in this specification is taken to specify the presence of stated features, regions, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of other features, regions, integers, steps, operations, elements, and/or components.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. Terms defined in commonly used dictionaries are to be interpreted as having meanings consistent with the related technical literature and the present disclosure, and must not be interpreted in an idealized or overly formal sense unless so defined.

In order to solve the problem of counterfeit of the small programs on the market at present, the invention provides an automatic identification method of counterfeit small programs, which can effectively improve the efficiency of manual identification. FIG. 1 is a flow chart of the method of automatic identification of a mock applet of the present invention. As shown in fig. 1, the present invention relates to the field of network configuration, and is a method for automatically identifying a counterfeit applet applied to a mobile terminal, and the flow of the present invention includes:

S110, performing a fuzzy search according to keywords of the name of the applet to be tested to obtain a set of target applets, wherein the applet to be tested is a genuine applet.

S120, capturing pages of the applet to be tested and of each target applet while they are running and performing image-text recognition on them, to obtain the applet feature information and the static image information and dynamic character string information of different pages, wherein the applet feature information at least comprises account main body information.

S130, filtering the set of target applets according to a preset white list and a preset black list based on the applet feature information.

S140, obtaining, for each remaining target applet and the applet to be tested, a first similarity based on the applet feature information, a second similarity based on the static image information and a third similarity based on the dynamic character string information.

S150, when the first similarity, the second similarity and the third similarity all meet preset thresholds, determining that the target applet is a counterfeit applet.

The implementation process of the invention mainly comprises the following steps. Target applets to be examined are obtained by keyword fuzzy-matching search. Information such as the icon, account main body, front-end page screenshots, interface IP addresses and URLs is extracted from the applets under examination. The genuine applet among the target applets is identified using its AppID and account main body information, and known counterfeit applets are quickly identified using a counterfeit-applet blacklist library. Similarity analysis is performed on the remaining applets, and those with high similarity are listed as suspected counterfeit applets. The account main body of each suspected counterfeit applet is then compared with that of the genuine applet, and applets that may belong to affiliated companies or branch companies and are closely associated with the genuine account main body are screened out. The final counterfeit applets are identified through these steps.

The object of counterfeit identification in this patent is the applet, and the static and dynamic features to be recognized and collected are determined specifically for applets. Considering that applets of different branches of the same company may have a certain similarity, the patent proposes an identification method based on account main body features.

The patent provides a process for a counterfeit applet identification system and a counterfeit identification method for applets: static basic information such as the account main body and service category is extracted according to the characteristics of applets, and similarity identification is performed comprehensively in combination with dynamic data acquired while the applet is running, such as home page pictures, IP addresses and URLs, thereby improving detection efficiency.

The invention provides a method for identifying counterfeit and pirated applets. As the number of applets grows, the piracy problem grows with it. The method can be effectively applied to the field of applet security analysis: by identifying counterfeit applets, it avoids risks such as information leakage and property loss caused by users using counterfeit applets.

FIGS. 2 to 5 are flowcharts of steps S110, S120, S130 and S140, respectively, in an embodiment of the method for automatically identifying a counterfeit applet according to the present invention. As shown in FIGS. 2 to 5, on the basis of steps S110, S120, S130 and S140 of the embodiment of FIG. 1, step S110 is replaced by S111 and S112, step S120 is replaced by S121 and S122, step S130 is replaced by S131 and S132, step S140 is replaced by S141, S142 and S143, step S150 is replaced by S151, and steps S160 and S170 are added. Each step is described below:

S111, extracting keywords from the name of the applet to be tested, wherein the applet to be tested is a genuine applet.

S112, performing a fuzzy search in the applet library according to the keywords, and building a set of the target applets obtained. The fuzzy search yields the target applets that are similar to the applet to be tested.

S121, capturing pages of the applet to be tested and of each target applet while they are running.

S122, acquiring, through image-text recognition, the applet feature information and the static image information and dynamic character string information of different pages, wherein the applet feature information at least comprises: account main body information, the authenticated applet application number and the service category; the static image information comprises at least one of an icon of the applet, a page loading picture and a front-end page; the dynamic character string information comprises at least one of a dynamic uniform resource locator (URL) character string, an IP address character string and a domain name character string. In this embodiment, an existing image-text recognition algorithm is used to obtain the applet feature information, the static image information and the dynamic character string information of different pages from the captured pages, and is not described further here.

S131, filtering out the genuine applets among the target applets according to a preset white list, based on the applet feature information.

S132, filtering out the pirated applets among the target applets according to a preset black list, based on the applet feature information.

S141, obtaining the string edit distance between each remaining target applet and the applet to be tested based on the applet feature information, to obtain the first similarity.

S142, obtaining a second similarity between each remaining target applet and the applet to be tested based on the cosine distance of picture features extracted from the applet icon and the page loading picture in the static image information, and/or based on the structural similarity of feature vectors extracted from the front-end page.

In a preferred embodiment, step S142 includes:

S1421, uniformly scaling the height and width of the icons and page loading pictures of each remaining target applet and of the applet to be tested to 64 × 64.

S1422, extracting picture features through the trained picture comparison neural network.

S1423, calculating the similarity between the feature vectors by cosine similarity, where the feature vectors of the two pictures are denoted $A = [a_1, \ldots, a_n]$ and $B = [b_1, \ldots, b_n]$, and the cosine distance between the two vectors is

$$\cos(A, B) = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$

where $a_i$ is an element of the feature vector A, $b_i$ is an element of the feature vector B, and $i \le n$. In this embodiment, the similarity between feature vectors is obtained using the existing cosine similarity calculation method, which is not described further here.

S1424, uniformly scaling the height and width of the front-end page of each remaining target applet and of the applet to be tested to 768 × 256.

S1425, extracting picture features through the trained picture comparison neural network.

S1426, obtaining the structural similarity parameter between the extracted feature vectors; denoting the pictures of the two front-end pages by x and y, the simplified formula for the structural similarity of the two pictures is:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

where $\mu_x$ and $\mu_y$ are the mean pixel values of the two pictures, $\sigma_x^2$ is the variance of the pixel values of the first front-end page, $\sigma_y^2$ is the variance of the pixel values of the second front-end page, $\sigma_{xy}$ is the covariance of the first and second front-end pages, $C_1 = (K_1 L)^2$ and $C_2 = (K_2 L)^2$ are constants for maintaining stability, $L$ is the dynamic range of the pixel values, $K_1 = 0.01$ and $K_2 = 0.03$. The structural similarity parameter (SSIM, Structural Similarity) is an index for measuring the similarity between two images. The index was first proposed by the Laboratory for Image and Video Engineering at the University of Texas at Austin. Of the two images used by SSIM, one is an uncompressed, undistorted image and the other is a distorted image. As an implementation of structural similarity theory, the index defines structural information, from the perspective of image composition, as being independent of brightness and contrast and as reflecting the attributes of object structures in a scene, and it models distortion as a combination of three factors: brightness, contrast and structure. The mean is used as an estimate of brightness, the standard deviation as an estimate of contrast, and the covariance as a measure of structural similarity.

S143, obtaining a third similarity from the string edit distance between each remaining target applet and the applet to be tested, based on the dynamic string information.

S151, when the first similarity and the second similarity are both greater than a preset first threshold and the third similarity is less than a preset second threshold, the target applet is a counterfeit applet; the preset first threshold ranges from 70% to 90%, and the preset second threshold ranges from 30% to 50%.

S160, when the edit distance between the account main body information of the target applet and the account main body information of an applet having an associated-company relationship in the preset white list is greater than a preset threshold, and the similarity of the account main body information is lower than a preset third threshold, the target applet is a counterfeit applet; the preset third threshold ranges from 30% to 50%.

S170, the applet to be tested which is judged to be the counterfeit applet

The invention provides a method for identifying counterfeit applets on the market. It automatically extracts the dynamic and static features of the applets to be tested, analyzes the similarity between each extracted feature and the corresponding feature of the genuine applet, and identifies suspected counterfeit applets according to the similarity. It then further analyzes the account main body features to make the final identification of counterfeit applets, which improves detection accuracy.

The invention aims at screening counterfeit and pirated WeChat applets. Its key point is a counterfeit identification method designed for applets, which extracts dynamic and static features in light of the characteristics of applets. The static features cover basic information such as the applet account main body, applet name and service category; the dynamic features cover information such as pictures, URLs (uniform resource locators) and IPs (Internet Protocol addresses) observed while the applet runs. The dynamic and static features are processed together, a different similarity analysis method is selected according to the nature of each feature, and a workflow for the counterfeit applet identification system is provided. In addition, the method takes into account that applets of other branch companies of the same company may reuse functions and may therefore show a certain similarity. When identifying suspected counterfeit applets, the method distinguishes such applets through similarity analysis of the account main body features, which improves the accuracy of counterfeit applet identification.

FIG. 6 is a block diagram of the automatic identification system of a counterfeit applet of the present invention. As shown in FIG. 6, the automatic identification system of a counterfeit applet of the present invention includes, but is not limited to:

the fuzzy search module 51, which performs a fuzzy search according to the keywords of the name of the applet to be tested and obtains a set of target applets.

The main body information module 52, which captures pages of the applet to be tested and of each target applet during operation and performs image-text recognition on them to obtain the applet feature information and the static image information and dynamic string information of different pages, wherein the applet feature information at least comprises account main body information.

The list filtering module 53, which filters the set of target applets according to a preset white list and a preset black list based on the applet feature information.

The similarity module 54, which obtains, for each remaining target applet and the applet to be tested, a first similarity based on the applet feature information, a second similarity based on the static image information and a third similarity based on the dynamic string information.

The counterfeit judgment module 55, which determines that the target applet is a counterfeit applet when the first similarity, the second similarity and the third similarity all satisfy the preset thresholds.

The implementation principle of the above modules is described in the related description of the automatic identification method of the counterfeit applet, and will not be described herein again.

The automatic identification system of the counterfeit applet can be effectively applied to the field of applet security analysis, and information leakage, property loss and other risks caused by the use of the counterfeit applet by a user are avoided by identifying the counterfeit applet.

FIG. 7 is a block diagram of the fuzzy search module in an embodiment of the automatic identification system of a counterfeit applet of the present invention. FIG. 8 is a block diagram of the main body information module in an embodiment of the system. FIG. 9 is a block diagram of the list filtering module in an embodiment of the system. FIG. 10 is a block diagram of the similarity module in an embodiment of the system. As shown in FIGS. 7 to 10, on the basis of the system embodiment of FIG. 6, the automatic identification system of a counterfeit applet of the present invention replaces the fuzzy search module 51 with a keyword extraction module 511 and a set establishing module 512; replaces the main body information module 52 with a page capturing module 521 and an image-text recognition module 522; replaces the list filtering module 53 with a white list module 531 and a black list module 532; replaces the similarity module 54 with a first similarity module 541, a second similarity module 542 and a third similarity module 543; and replaces the counterfeit judgment module 55 with a counterfeit detection module 551. A supplementary judgment module 56 is also added. Each module is described below:

the keyword extraction module 511, which extracts keywords according to the name of the applet to be tested.

The set establishing module 512, which performs a fuzzy search in the applet library according to the keywords and establishes a set of the obtained target applets.

The page capturing module 521, which captures pages of the applet to be tested and of each target applet during operation.

The image-text recognition module 522, which obtains the applet feature information and the static image information and dynamic string information of different pages by performing image-text recognition, where the applet feature information at least includes account main body information, the authenticated program application number and the service category; the static image information includes at least one of an icon, a page loading picture and a front-end page of the applet; and the dynamic string information includes at least one of a dynamic uniform resource locator string, an IP address string and a domain name string.

The white list module 531, which filters out the genuine applets among the target applets according to a preset white list based on the applet feature information.

The black list module 532, which filters out the pirated applets among the target applets according to a preset black list based on the applet feature information.

The first similarity module 541, which obtains a first similarity between each remaining target applet and the applet to be tested based on the string edit distance of the applet feature information.

The second similarity module 542 obtains a second similarity between each remaining target applet and the applet to be tested based on the static image information: a second similarity obtained from the cosine distance of picture features extracted from the applet icons and page loading pictures, and/or a second similarity obtained through the structural similarity of feature vectors extracted from the front-end page. In a preferred embodiment, the second similarity module 542 is configured to uniformly scale the height and width of the icons and page loading pictures of each remaining target applet and of the applet to be tested to 64 × 64; extract picture features through a trained picture comparison neural network; and calculate the similarity between the feature vectors by cosine similarity. Denoting the feature vectors of the two pictures as $A = [a_1, \ldots, a_n]$ and $B = [b_1, \ldots, b_n]$, the cosine distance between the two vectors is

$$\cos(A, B) = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$

The module also uniformly scales the height and width of the front-end page of each remaining target applet and of the applet to be tested to 768 × 256 and obtains the structural similarity of the two front-end page pictures, denoted x and y:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

where $\mu_x$ and $\mu_y$ are the mean pixel values of the two pictures, $\sigma_x^2$ is the variance of the pixel values of the first front-end page, $\sigma_y^2$ is the variance of the pixel values of the second front-end page, $\sigma_{xy}$ is the covariance of the first and second front-end pages, $C_1 = (K_1 L)^2$ and $C_2 = (K_2 L)^2$ are constants for maintaining stability, $L$ is the dynamic range of the pixel values, $K_1 = 0.01$ and $K_2 = 0.03$.

The third similarity module 543, which obtains a third similarity between each remaining target applet and the applet to be tested based on the string edit distance of the dynamic string information.

The counterfeit detection module 551, which determines that the target applet is a counterfeit applet when the first similarity and the second similarity are both greater than a preset first threshold and the third similarity is less than a preset second threshold; the preset first threshold ranges from 70% to 90%, and the preset second threshold ranges from 30% to 50%.

The supplementary judgment module 56, which determines that the target applet is a counterfeit applet when the edit distance between the account main body information of the target applet and the account main body information of an applet having an associated-company relationship in the preset white list is greater than a preset threshold and the similarity of the account main body information is lower than a preset third threshold; the preset third threshold ranges from 30% to 50%.

The implementation principle of the above steps is described in the related description of the automatic identification method of the counterfeit applet, and will not be described herein again.

Fig. 11, 12 and 13 are schematic diagrams of the implementation process of the automatic identification method of the counterfeit applet in the invention. Referring to fig. 11, 12 and 13, the specific embodiment of the present invention is as follows:

(1) First, a keyword search is carried out: using the automated testing tool Appium, the invention performs a fuzzy search according to the keywords of the name of the applet to be tested and obtains information on a series of target applets. A counterfeit applet generally has a name highly similar to that of the genuine applet, so this step yields a series of target applets that includes the counterfeit applets.
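The source performs this search through the applet platform's own search box, driven by Appium. As a rough offline illustration only, the same idea can be sketched as case-insensitive keyword matching against a local list of applet names. The function name `fuzzy_search`, the keyword list and the sample library are all hypothetical; the source does not specify how keywords are extracted or matched.

```python
def fuzzy_search(keywords, applet_names):
    """Return the applet names that contain at least one keyword.

    Matching is a plain case-insensitive substring test, which is an
    assumption; the platform's real fuzzy search may behave differently.
    """
    lowered = [k.lower() for k in keywords if k]
    return [
        name for name in applet_names
        if any(k in name.lower() for k in lowered)
    ]


# Hypothetical applet library and keywords, for illustration only.
library = [
    "Telecom Pay",
    "Telecom Pay Official",
    "Weather Daily",
    "telecom-pay helper",
]
targets = fuzzy_search(["telecom", "pay"], library)
```

The resulting `targets` list plays the role of the set of target applets that the later steps filter and compare.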

(2) Then, the features of the applet to be tested and of the target applets are extracted: the icon, service category, AppID, account main body, front-end page, IP, URL and other features of each target applet. (A URL, uniform resource locator, is a representation of the location of information on a Web service on the Internet.)

Extracting basic feature information: using the Appium automated testing tool, a screenshot of the page is taken through the save_screenshot() function, the characters in the screenshot are recognized with OCR, and the basic feature information, including the applet account main body, AppID and service category, is obtained according to corresponding rules (see FIG. 12).

Appium is one of the mainstream automated testing tools on current mobile platforms. The name is a compound of the first three letters of "application" and the last three letters of "Selenium". Applications on mobile platforms are commonly called apps for short, and Selenium is currently the mainstream Web UI automated testing tool. Appium inherits from Selenium, and the name can be read as "the Selenium automated testing tool for mobile". Appium is open source and supports native, Web and hybrid applications on the iOS and Android platforms.

The AppID is the identity number of the applet, that is, the applet ID on the WeChat public platform; with the AppID, the client can determine the identity of the applet and use the advanced interfaces provided. "App ID" combines "App" and "ID" and means application number: a string of letters and digits commonly used as the unique identification code of an application, so that developers and the platform can tell applications apart. An AppID is not required to develop an applet, but it is required to test the applet on a real device and to release it. This is similar to an Apple developer account: without paying for a developer account, an app can only be run on the iOS simulator. The applet AppID itself is free of charge; a registrant who meets the qualification requirements can register an applet and obtain an AppID. Initiating a payment transaction depends on the binding between an official account, applet or mobile application (the APPID) and a merchant number (the MCHID), so after a merchant completes signing, the binding between the current merchant number and the APPID must be confirmed before it can be used.

Acquiring the icons and page loading pictures of the applet to be tested and of the target applets: when an applet runs, its related resource pictures are automatically downloaded to a local folder (see FIG. 13). The file path is Applet/Applet AppID/store/images.

Acquiring the front-end pages of the applet to be tested and of the target applets: the automated testing tool runs the applet and takes screenshots. Because a popup box may appear in the applet asking the user to log in or grant authorization, the tool's built-in popup-handling function, given as driver _ switch _ to _ alert _ disturb(), is used to confirm or cancel the popup box. Button elements are located by keywords such as "always allow", "OK" and "cancel" to bypass the login page or accept the authorization page, after which the relevant page is captured. Only the home page is captured here.

Extracting dynamic URL and IP features: the applet's interface data is captured with a packet capture tool, and information such as IPs and URLs is obtained through regular expression matching.
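The regular expression matching step can be sketched as follows. This is a minimal sketch assuming the packet capture has already been exported as text; the patterns, the function name `extract_dynamic_strings` and the sample capture are illustrative assumptions, not the rules used in the source.

```python
import re
from urllib.parse import urlparse

# Illustrative patterns (assumptions): http/https URLs and dotted IPv4 addresses.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)


def extract_dynamic_strings(capture_text):
    """Return sorted, de-duplicated URLs, domain names and IPv4 addresses."""
    urls = sorted(set(URL_RE.findall(capture_text)))
    hosts = {urlparse(u).hostname for u in urls}
    domains = sorted(h for h in hosts if h)
    ips = sorted(set(IPV4_RE.findall(capture_text)))
    return urls, domains, ips


# Hypothetical capture text, for illustration only.
capture = (
    "GET https://api.example.com/v1/login HTTP/1.1\n"
    "Host: api.example.com\n"
    "X-Forwarded-For: 203.0.113.7\n"
)
urls, domains, ips = extract_dynamic_strings(capture)
```

The three lists correspond to the dynamic URL, domain name and IP address strings that the later edit-distance comparison takes as input.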

(3) Because the applet AppID is unique and the account main body of an applet is information authenticated by WeChat, these two items can serve as the basis for identifying a genuine applet or a known pirated applet. The invention filters the target applets using the white list of genuine applets and the accumulated account main body and AppID information of known pirated applets, removing the known genuine and pirated applets and thereby reducing the amount of data for subsequent analysis.
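The filtering step can be illustrated with a short sketch. It assumes each applet is a dictionary with `app_id` and `account_body` keys and that both lists hold (AppID, account main body) pairs; these data shapes, the function name `filter_targets` and the sample values are hypothetical, since the source does not describe how the lists are stored or matched.

```python
def filter_targets(targets, whitelist, blacklist):
    """Split targets into known genuine, known pirated and remaining applets.

    An applet matches a list when its (app_id, account_body) pair is in it.
    Matching on the pair is an assumption made for this sketch.
    """
    genuine, pirated, remaining = [], [], []
    for applet in targets:
        key = (applet["app_id"], applet["account_body"])
        if key in whitelist:
            genuine.append(applet)
        elif key in blacklist:
            pirated.append(applet)
        else:
            remaining.append(applet)
    return genuine, pirated, remaining


# Hypothetical sample data, for illustration only.
whitelist = {("id-genuine-001", "Example Telecom Co.")}
blacklist = {("id-pirate-009", "Unknown Trading Ltd.")}
targets = [
    {"name": "Telecom Pay", "app_id": "id-genuine-001", "account_body": "Example Telecom Co."},
    {"name": "Telecom Pay+", "app_id": "id-pirate-009", "account_body": "Unknown Trading Ltd."},
    {"name": "Telecom Pay Pro", "app_id": "id-new-123", "account_body": "Another Firm"},
]
genuine, pirated, remaining = filter_targets(targets, whitelist, blacklist)
```

Only the applets in `remaining` go on to the similarity analysis, which is what reduces the data volume for the later steps.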

(4) Similarity analysis is performed between the feature information of the remaining target applets and that of the genuine applet to identify counterfeit applets.

Basic feature information similarity analysis: the basic feature information includes the applet name, applet account main body, service category and similar information. Its similarity is calculated from the edit distance between strings of the same category.
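A minimal sketch of an edit-distance-based similarity is shown below. The Levenshtein distance itself is standard; converting it to a similarity by dividing by the longer string's length is an assumption of this sketch, because the source does not state how the distance is normalized.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep when equal
            ))
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """Similarity in [0, 1]; the normalization is an assumption of this sketch."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest
```

The same function could be applied per category (name against name, service category against service category) and, later, to the dynamic strings such as URLs, IPs and domain names.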

Icon and page loading picture similarity analysis: because the pictures differ in size, the invention uniformly scales their height and width to 64 × 64, extracts picture features with a neural network, and calculates the similarity between the feature vectors by cosine similarity.

Denoting the feature vectors of the two pictures as $A = [a_1, \ldots, a_n]$ and $B = [b_1, \ldots, b_n]$, the cosine distance between the two vectors is

$$\cos(A, B) = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$
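The cosine similarity of two feature vectors A and B can be sketched in a few lines. The vectors here are plain Python lists standing in for the output of the picture comparison network, which is not reproduced; returning 0.0 for a zero vector is a choice made for this sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # undefined for a zero vector; 0.0 is this sketch's convention
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score close to 1 regardless of magnitude, which is why the measure suits comparing feature vectors of pictures that differ mainly in overall intensity.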

Front-end page similarity analysis: the invention uniformly scales the height and width of the home page screenshot to 768 × 256 and extracts picture features with a neural network. Because the similarity between the front-end page of a counterfeit applet and that of the genuine applet lies mainly in picture structure, the similarity between the extracted feature vectors is calculated with SSIM so as to analyze the structural similarity of the pictures.

Assuming the two pictures are denoted x and y, the simplified structural similarity formula for the two pictures is:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

where $\mu_x$ and $\mu_y$ are the mean pixel values of the two pictures, $\sigma_x^2$ is the variance of the pixel values of the first front-end page, $\sigma_y^2$ is the variance of the pixel values of the second front-end page, $\sigma_{xy}$ is the covariance of the first and second front-end pages, $C_1 = (K_1 L)^2$ and $C_2 = (K_2 L)^2$ are constants for maintaining stability, $L$ is the dynamic range of the pixel values, $K_1 = 0.01$ and $K_2 = 0.03$, but not limited thereto.
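The simplified structural similarity can be sketched directly from the quantities defined above: means, variances, covariance and the two stabilizing constants. This sketch treats each picture as a flat list of pixel values and computes one global statistic; that single-window treatment and the default dynamic range of 255 are assumptions, as common SSIM implementations average over local windows.

```python
def ssim(x, y, dynamic_range=255, k1=0.01, k2=0.03):
    """Global simplified SSIM of two equal-length pixel sequences."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("pictures must be non-empty and the same size")
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((p - mu_x) ** 2 for p in x) / n
    var_y = sum((q - mu_y) ** 2 for q in y) / n
    cov_xy = sum((p - mu_x) * (q - mu_y) for p, q in zip(x, y)) / n
    c1 = (k1 * dynamic_range) ** 2  # C1 = (K1 * L)^2
    c2 = (k2 * dynamic_range) ** 2  # C2 = (K2 * L)^2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical 4-pixel pictures, for illustration only.
page_a = [0, 64, 128, 255]
page_b = [255, 191, 127, 0]
```

Comparing a picture with itself gives a value of 1, while `page_a` against its inverse `page_b` gives a much lower value because the covariance term becomes negative.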

Dynamic URL, IP, domain name and other feature analysis: for features such as the URL, IP and domain name, similarity is again calculated from the string edit distance. In general, the service URL, IP, domain name and similar features of a counterfeit applet differ from those of the genuine one. After the target applets have been filtered with the black and white lists in step (3), similarity analysis is performed on the remaining applets.

(5) When the similarity of the home page picture and of the locally loaded pictures, and the similarity of the service category, applet name and the like, all exceed 80%, while the similarity of information such as the IP and URL to that of the genuine applet is below 40%, the target applet is judged to be a suspected counterfeit applet.

(6) Sometimes an applet operated by a branch company resembles the applets of other branch companies of the same company because of shared business, reused applet functions and the like, so such applets are highly similar in information such as icons, introductions and front-end pages. When identifying counterfeit applets, such applets are distinguished mainly by account main body features. The account main bodies of applets of other branch companies or associated companies should be correlated with the account main body of the genuine applet. If the edit distance between the account main body of the target applet and that of the genuine applet is large, so that the similarity of the account main bodies is below 40%, the target applet is judged to be a counterfeit applet.
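Steps (5) and (6) together amount to a two-stage decision, which can be sketched as below. The similarity values are assumed to have been computed already and scaled to the range 0 to 1; the function names, the returned labels and the use of strict comparisons are assumptions of this sketch, with the 80% and 40% values taken from the embodiment above.

```python
def is_suspected_counterfeit(first, second, third, high=0.80, low=0.40):
    """Step (5): feature and picture similarity high, dynamic-string similarity low."""
    return first > high and second > high and third < low


def classify(first, second, third, account_similarity,
             high=0.80, low=0.40, account_threshold=0.40):
    """Step (6): separate counterfeits from likely branch or associated companies.

    The three labels returned are hypothetical names used only in this sketch.
    """
    if not is_suspected_counterfeit(first, second, third, high, low):
        return "not suspected"
    if account_similarity < account_threshold:
        return "counterfeit"
    return "possible affiliate"
```

For example, `classify(0.9, 0.85, 0.2, 0.1)` yields `"counterfeit"`, while the same applet with an account main body similarity of 0.7 is set aside as a possible affiliate rather than flagged.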

The invention screens applets by extracting dynamic and static features and performs similarity analysis on the remaining applets. Compared with the prior art, the invention has the following distinguishing technical features:

(1) The extracted static features of the target applet include the icon, service category, AppID, account main body and the like, and the extracted dynamic data include the front-end page, interface IP, URL and the like.

(2) The target applets are filtered through the white list of genuine applets and the accumulated account main body and AppID information of known pirated applets.

(3) Applets with high similarity are listed as suspected counterfeit applets; the account main body of each suspected counterfeit applet is then compared with the account main body of the genuine applet, and applets that may belong to associated companies or branch companies and are closely related to the genuine account main body are screened out.

The embodiment of the invention also provides automatic identification equipment for counterfeit applets, comprising a processor and a memory in which executable instructions of the processor are stored, wherein the processor is configured to perform the steps of the automatic identification method of a counterfeit applet by executing the executable instructions.

As shown above, the automatic identification system for counterfeit applets of the present invention can be effectively applied in the field of applet security analysis, and can avoid the risks of information leakage, property loss, and the like caused by using counterfeit applets by identifying counterfeit applets.

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module" or "platform."

Figure 14 is a schematic diagram of an automatic identification device of a mock applet according to the present invention. An electronic device 600 according to this embodiment of the invention is described below with reference to fig. 14. The electronic device 600 shown in fig. 14 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.

As shown in fig. 14, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one memory unit 620, a bus 630 connecting the different platform components (including the memory unit 620 and the processing unit 610), a display unit 640, etc.

The storage unit stores program code executable by the processing unit 610 to cause the processing unit 610 to perform the steps according to various exemplary embodiments of the present invention described in the automatic identification method of a counterfeit applet section of this specification. For example, the processing unit 610 may perform the steps shown in FIG. 1.

The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.

The memory unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: a processing system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

Bus 630 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.

The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 via the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage platforms, to name a few.

The embodiment of the invention also provides a computer readable storage medium for storing a program, and the steps of the automatic identification method of a counterfeit applet are realized when the program is executed. In some possible embodiments, aspects of the present invention may also be implemented in the form of a program product comprising program code which, when the program product is run on a terminal device, causes the terminal device to perform the steps according to various exemplary embodiments of the present invention described in the automatic identification method of a counterfeit applet section of this specification.

As shown above, the automatic identification system for counterfeit applets of the present invention can be effectively applied in the field of applet security analysis, and can avoid information leakage and property loss caused by the use of counterfeit applets by identifying counterfeit applets.

The program product 800 for implementing the above method according to an embodiment of the present invention may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out processes of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).

In summary, the present invention is directed to provide an automatic identification method, system, device and storage medium for counterfeit applets, which can be effectively applied in the field of applet security analysis, and avoid the risks of information leakage and property loss caused by using counterfeit applets by identifying counterfeit applets.

The foregoing is a further detailed description of the invention in connection with specific preferred embodiments and it is not intended to limit the invention to the specific embodiments described. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims

1. An automatic identification method of counterfeit applets is characterized by comprising the following steps:

filtering the set of the target small programs according to a preset white list and a preset black list based on the small program feature information;

2. The method of claim 1, wherein the performing fuzzy search according to the keyword of the name of the applet to be tested to obtain the set of target applets comprises:

extracting keywords according to the name of the small program to be detected;

3. The method for automatically identifying counterfeit applets according to claim 1, wherein the image-text identification is performed by capturing pages in the running process of the to-be-tested applet and each target applet to obtain applet feature information and static image information and dynamic character string information of different pages, wherein the applet feature information at least includes account number main information, and the method comprises the following steps:

4. The method of automatically identifying counterfeit applets of claim 3, wherein said filtering the set of target applets according to a preset white list and a preset black list based on said applet feature information comprises:

5. The method of automatically identifying a mock applet according to claim 3, wherein said obtaining a first similarity of each of said remaining target applet to said applet to be tested based on said applet feature information, a second similarity of said static image information and a third similarity based on said dynamic string information comprises:

6. The method for automatically identifying counterfeit applets of claim 5, wherein obtaining the second similarity between each of the remaining target applets and the applet to be tested, based on the icon of the applet in the static image information, a cosine distance of picture features extracted from page-loaded pictures, and/or feature vectors extracted through structural similarity of the front-end page, comprises:

extracting picture features through a trained picture-comparison neural network;
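By way of illustration only, the cosine comparison of two extracted picture feature vectors may be sketched as follows. The neural network itself is not reproduced; the vectors are assumed to be plain sequences of floats already produced by it, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: an all-zero vector carries no information, so it is
        # treated as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance, defined here as 1 minus the cosine similarity."""
    return 1.0 - cosine_similarity(a, b)
```

A small cosine distance between the feature vectors of two icons or page pictures would indicate that the pictures are visually close.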

7. The method of claim 6, wherein obtaining the second similarity between each of the remaining target applets and the applet to be tested, based on the icon of the applet in the static image information, the cosine distance of picture features extracted from page-loaded pictures, and/or feature vectors extracted through structural similarity of the front-end page, further comprises:

uniformly scaling the height and width of the front-end page of each remaining target applet and of the applet to be tested to 768 x 256;
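By way of illustration only, the scaling step and a subsequent structural comparison may be sketched as follows. The sketch assumes that 768 x 256 means a height of 768 and a width of 256, that page screenshots are grayscale pixel grids with values 0 to 255, and that nearest-neighbour resampling is acceptable; none of this is stated in the claim. The function `global_ssim` is a single-window simplification of the structural similarity index (the usual formulation averages over local windows), and both function names are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale pixels, row-major

TARGET_HEIGHT = 768  # assumed reading of "768 x 256"
TARGET_WIDTH = 256


def resize_nearest(image: Image, height: int = TARGET_HEIGHT,
                   width: int = TARGET_WIDTH) -> Image:
    """Scale an image to height x width with nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[r * src_h // height][c * src_w // width] for c in range(width)]
        for r in range(height)
    ]


def global_ssim(a: Image, b: Image, dynamic_range: float = 255.0) -> float:
    """Structural similarity computed over the whole image as one window."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys):
        raise ValueError("images must have the same size")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) * (x - mean_x) for x in xs) / n
    var_y = sum((y - mean_y) * (y - mean_y) for y in ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Standard stabilising constants of the SSIM formula.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x * mean_x + mean_y * mean_y + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Scaling both pages to the same size first is what makes a pixel-wise structural comparison of this kind well defined.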

8. The method for automatically identifying a counterfeit applet according to claim 1, wherein the step of determining that the target applet is a counterfeit applet when the first similarity, the second similarity and the third similarity all satisfy a predetermined threshold comprises:

9. The method for automatic identification of a counterfeit applet as claimed in claim 1, further comprising the steps of:

when the edit distance between the account subject information of the target applet and the account subject information of an applet having an associated-company relationship in the preset white list is larger than a preset threshold, and the similarity of the account subject information is lower than a preset third threshold, the preset third threshold ranging from 30% to 50%, determining that the target applet is a counterfeit applet.
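By way of illustration only, the edit-distance comparison of two account information strings may be sketched as follows. The sketch assumes the Levenshtein distance and a similarity defined as 1 minus the distance divided by the longer string length; the claim does not state how the distance is converted into a similarity. The function names and the default threshold of 0.4 (chosen inside the claimed 30-50% range) are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def account_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1.0 for identical strings, 0.0 for strings
    that share nothing position-wise."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_counterfeit_by_account(target: str, trusted: str,
                              third_threshold: float = 0.4) -> bool:
    """Flag the target when its account similarity to the white-listed
    account falls below the third threshold (claimed range 30-50%)."""
    if not 0.30 <= third_threshold <= 0.50:
        raise ValueError("third threshold is expected to lie in 30-50%")
    return account_similarity(target, trusted) < third_threshold
```

In this sketch a larger edit distance always yields a lower similarity, so the two conditions recited in the claim move together.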

10. An automatic identification system for counterfeit applets, comprising:

11. An automatic identification device for counterfeit applets, comprising:

a processor;

a memory having stored therein executable instructions of the processor;

wherein the processor is configured to perform, by executing the executable instructions, the steps of the method for automatically identifying counterfeit applets according to any one of claims 1 to 9.

12. A computer-readable storage medium for storing a program, characterized in that the program, when executed by a processor, carries out the steps of the method for automatically identifying counterfeit applets according to any one of claims 1 to 9.