CN113569861A

CN113569861A - Mobile application illegal content scanning method, system, equipment and medium

Info

Publication number: CN113569861A
Application number: CN202110884177.2A
Authority: CN
Inventors: 郝德禄; 肖冠正; 甘心; 王伟; 曾荣
Original assignee: iMusic Culture and Technology Co Ltd
Current assignee: iMusic Culture and Technology Co Ltd
Priority date: 2021-08-03
Filing date: 2021-08-03
Publication date: 2021-10-29
Anticipated expiration: 2041-08-03
Also published as: CN113569861B

Abstract

The invention discloses a method, system, device, and medium for scanning illegal content in mobile applications. The method comprises: capturing a screenshot of the home page of the mobile application; performing region segmentation on the home page screenshot with a convolutional network segmentation model to determine clickable regions and a region information file; traversing and clicking the clickable regions according to the region information file, judging whether a page jump has occurred by a pixel matching method, and, in combination with a depth-first search algorithm, clicking through and capturing all pages in the application to obtain application page screenshots; performing optical character recognition on the application page screenshots and detecting whether they contain sensitive words, thereby scanning the mobile application for illegal text; and performing image content detection on the application page screenshots and detecting whether they contain illegal images, thereby scanning the mobile application for illegal images. The invention can perform full-coverage scanning of in-application pages and is broadly applicable in the technical field of computer applications.

Description

Mobile application illegal content scanning method, system, equipment and medium

Technical Field

The invention relates to the technical field of computer application, in particular to a method, a system, equipment and a medium for scanning illegal contents of mobile application.

Background

At present, detection of objectionable information in mobile applications mainly works by decompiling the application to inspect its static content and combining this with network packet capture to inspect the text and image content returned dynamically by its interfaces. However, when the application uses code obfuscation, resource obfuscation, and interface encryption, this approach cannot function normally. Alternatively, illegal content can be detected from screenshots of application pages, which requires traversing the pages. The traversal is usually implemented with hand-written scripts that control the click coordinates on each page; such a script works for only one application, and if the application's interface changes substantially, the script must be adjusted as well. Another approach obtains information about the clickable UI components of the interface through development tools and clicks through those components to drive page jumps and thereby traverse the pages. However, when the mobile application is a Hybrid application, or a native application with embedded WebView pages, this approach fails because the clickable UI component information cannot be obtained. All of these methods therefore have limitations to varying degrees.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method, system, device, and medium for scanning illegal content in mobile applications, so as to achieve full-coverage scanning of the content of any application page and improve the accuracy of objectionable information detection.

In one aspect, the present invention provides a method for scanning illegal contents of mobile applications, including:

capturing a screenshot of the home page of the mobile application to obtain a home page screenshot;

performing region segmentation on the home page screenshot through a convolutional network segmentation model, and determining a clickable region and a region information file;

traversing and clicking the clickable area according to the area information file, judging whether a page is jumped or not by a pixel point matching method, and performing click traversal and screen capture on all pages in the application by combining a depth-first search algorithm to determine application page screen capture;

performing image character recognition on the application page screenshot, detecting whether sensitive words exist in the application page screenshot, and determining illegal character scanning of the mobile application;

and detecting the image content of the application page screenshot, detecting whether an illegal image exists in the application page screenshot, and determining mobile application illegal image scanning.

Optionally, the performing region segmentation on the home page screenshot through a convolutional network segmentation model to determine a clickable region and a region information file includes:

training the convolutional network segmentation model to determine a pre-trained convolutional network segmentation model;

inputting the home page screenshot to the pre-trained convolutional network segmentation model;

performing, by the pre-trained convolutional network segmentation model, semantic segmentation on the home page screenshot to obtain text areas, image areas, and area information, wherein the area information represents the starting coordinates and the width and height of each text area and image area in the screenshot;

and determining the text areas and the image areas as clickable areas, and saving the area information to a file to obtain the area information file.

Optionally, the traversing and clicking the clickable area according to the area information file, determining whether a page jumps or not by using a pixel point matching method, and determining application page screen capture by performing click traversal and screen capture on all pages in an application in combination with a depth-first search algorithm, including:

analyzing the initial coordinate and the width and height information of the clickable area from the area information file, performing traversal clicking on the central point of the clickable area according to the information, performing screen capture on a clicked application page, and determining a first application screen capture;

comparing the home page screenshot with the first application screenshot, and determining that a page jump has occurred when the pixel proportion between the two screenshots exceeds a first threshold, wherein the pixel proportion is the proportion of pixel positions at which the two screenshots have different pixel values;

and performing area segmentation and traversal clicking on the jump page, performing screen capture on the clicked page, performing traversal clicking and screen capture on all application pages by using a depth-first algorithm, completing screen capture on all pages of the mobile application, and determining screen capture of the application pages.
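The page-jump judgment in the steps above can be sketched as follows. This is a minimal illustration, assuming the two screenshots have already been decoded into equally sized grids (lists of rows) of pixel values. The function names `changed_pixel_ratio` and `is_page_jump` are hypothetical, and the default threshold of 0.05 simply reuses the five-percent figure given in the detailed description.

```python
def changed_pixel_ratio(before, after):
    """Fraction of pixel positions whose values differ between two screenshots.

    `before` and `after` are lists of rows; each row is a list of pixel values
    (for example (r, g, b) tuples). Both must have the same dimensions.
    """
    if len(before) != len(after) or any(
        len(row_b) != len(row_a) for row_b, row_a in zip(before, after)
    ):
        raise ValueError("screenshots must have identical dimensions")
    total = sum(len(row) for row in before)
    if total == 0:
        raise ValueError("screenshots must not be empty")
    changed = sum(
        1
        for row_b, row_a in zip(before, after)
        for pixel_b, pixel_a in zip(row_b, row_a)
        if pixel_b != pixel_a
    )
    return changed / total


def is_page_jump(before, after, threshold=0.05):
    """A jump is assumed when the changed-pixel ratio exceeds the threshold."""
    return changed_pixel_ratio(before, after) > threshold
```

With a 10 x 10 screenshot, changing five pixels gives a ratio of exactly 0.05, which does not exceed the threshold; changing six pixels does.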

Optionally, the image and text recognition of the screenshot of the application page, detecting whether a sensitive word exists in the screenshot of the application page, and determining that the illegal text of the mobile application is scanned includes:

carrying out optical character recognition on the application page screen capture to determine character information;

inputting the character information into a sensitive word detection system for sensitive word detection, performing character matching according to a sensitive word database, and determining a matching result;

and when the matching result is that the illegal information is contained, marking the screen capture of the application page as an illegal page, and determining illegal character scanning of the mobile application page.

Optionally, the image content detection on the screenshot of the application page, detecting whether an illegal image exists in the screenshot of the application page, and determining that the illegal image scanning of the mobile application is performed includes:

inputting the screenshot of the application page into an image content detection system, detecting whether the screenshot of the application page contains illegal content, and determining a detection result;

and when the detection result is that the illegal content is contained, marking the screenshot of the application page as an illegal page, and determining the illegal image detection of the mobile application page.

Optionally, the training the convolutional network segmentation model to determine a pre-trained convolutional network segmentation model includes:

acquiring an application screenshot;

manually marking the application screenshot or generating a single-channel mask image according to page component information;

setting the pixel value in the single-channel mask image as a first numerical value according to a character region, setting the pixel value in the single-channel mask image as a second numerical value according to an image region, setting the pixel value in the single-channel mask image as a third numerical value according to a background region, and determining the single-channel mask image with the changed pixel value as annotation data;

and inputting the annotation data into a semantic segmentation algorithm model for training to obtain the pre-trained convolutional network segmentation model.

Optionally, the determining the text area and the image area as clickable areas, saving the area information in a file format, and determining an area information file includes:

determining the character area and the image area as clickable areas, saving the area information in a file format, and determining an initial area information file;

and modifying the initial area information file, deleting clickable areas which do not need click skipping, and determining a target area information file.

On the other hand, the embodiment of the invention also discloses a mobile application illegal content scanning system, which comprises:

the first module is used for capturing a screen of a home page of the mobile application and determining home page capture;

the second module is used for carrying out region segmentation on the home page screenshot through a convolutional network segmentation model, and determining a clickable region and a region information file;

the third module is used for traversing and clicking the clickable area according to the area information file, judging whether the page jumps or not by a pixel point matching method, and performing click traversal and screen capture on all pages in the application by combining a depth-first search algorithm to determine the screen capture of the application page;

the fourth module is used for carrying out image character recognition on the application page screenshot, detecting whether sensitive words exist in the application page screenshot or not, and determining illegal character scanning of the mobile application;

and the fifth module is used for detecting the image content of the application page screenshot, detecting whether an illegal image exists in the application page screenshot or not, and determining mobile application illegal image scanning.

On the other hand, the embodiment of the invention also discloses an electronic device, which comprises a processor and a memory;

the memory is used for storing programs;

the processor executes the program to implement the method as described above.

On the other hand, the embodiment of the invention also discloses a computer readable storage medium, wherein the storage medium stores a program, and the program is executed by a processor to realize the method.

In another aspect, an embodiment of the present invention further discloses a computer program product or a computer program, where the computer program product or the computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and the computer instructions executed by the processor cause the computer device to perform the foregoing method.

Compared with the prior art, the invention adopting the technical scheme has the following technical effects: according to the method, a convolutional network segmentation model is used for carrying out region segmentation on the home page screenshot, and a clickable region and a region information file are determined; the clickable area can be identified in the application page, and the click coordinate of the clickable area can be automatically calculated, so that the application range is wider, and the use is more convenient and fast; in addition, the invention carries out image character recognition on the screenshot of the application page, detects whether sensitive words exist in the screenshot of the application page and determines illegal character scanning of the mobile application; detecting image content of the application page screenshot, detecting whether an illegal image exists in the application page screenshot, and determining mobile application illegal image scanning; illegal content detection can be performed on the text and the image, and accuracy rate of bad information detection is improved.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a detailed flow chart of an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

Referring to fig. 1, an embodiment of the present invention provides a method for scanning illegal contents of a mobile application, including:

S1, capturing a screenshot of the home page of the mobile application to obtain a home page screenshot;

S2, performing area segmentation on the home page screenshot through a convolutional network segmentation model to determine clickable areas and an area information file;

S3, traversing and clicking the clickable areas according to the area information file, judging whether a page jump has occurred by a pixel matching method, and, in combination with a depth-first search algorithm, clicking through and capturing all pages in the application to obtain application page screenshots;

S4, performing optical character recognition on the application page screenshots and detecting whether they contain sensitive words, thereby scanning the mobile application for illegal text;

S5, performing image content detection on the application page screenshots and detecting whether they contain illegal images, thereby scanning the mobile application for illegal images.

Further preferably, in step S2, performing area segmentation on the home page screenshot through the convolutional network segmentation model to determine the clickable areas and the area information file includes:

The convolutional network segmentation model is trained to obtain a pre-trained convolutional network segmentation model. After the home page screenshot is input to the pre-trained model, the model performs semantic segmentation on it to obtain the text areas and image areas, together with the starting coordinates and the width and height of each text area and image area in the screenshot. The segmented text areas and image areas are taken as the clickable areas of the page, and the information for these areas is saved to a file in JSON (JavaScript Object Notation) format, yielding the clickable areas and the area information file.
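As an illustration of how a segmentation result could be turned into an area information file, the sketch below extracts one bounding box per connected text or image area from a label mask and serializes the boxes to JSON. It assumes the model output is available as a rectangular grid of labels (0 for background, 1 for text, 2 for image, following the mask convention used for the training data). The connected-component grouping, the function names, and the JSON keys (`regions`, `type`, `x`, `y`, `width`, `height`) are assumptions made for this example; the description does not specify the file layout.

```python
import json
from collections import deque

LABELS = {1: "text", 2: "image"}  # 0 is background


def extract_regions(mask):
    """Return bounding boxes of 4-connected text/image areas in a label mask."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    regions = []
    for y in range(height):
        for x in range(width):
            label = mask[y][x]
            if label not in LABELS or seen[y][x]:
                continue
            # Flood-fill one connected component and track its extent.
            queue = deque([(x, y)])
            seen[y][x] = True
            min_x = max_x = x
            min_y = max_y = y
            while queue:
                cx, cy = queue.popleft()
                min_x, max_x = min(min_x, cx), max(max_x, cx)
                min_y, max_y = min(min_y, cy), max(max_y, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (
                        0 <= nx < width
                        and 0 <= ny < height
                        and not seen[ny][nx]
                        and mask[ny][nx] == label
                    ):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            regions.append(
                {
                    "type": LABELS[label],
                    "x": min_x,
                    "y": min_y,
                    "width": max_x - min_x + 1,
                    "height": max_y - min_y + 1,
                }
            )
    return regions


def regions_to_json(regions):
    """Serialize the area list in the (assumed) area information file layout."""
    return json.dumps({"regions": regions}, indent=2)
```

Each entry carries the starting coordinates and the width and height of one area, which is the information the later click traversal needs.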

As a further optional implementation manner, in step S3, the step of performing traversal clicking on the clickable area according to the area information file, determining whether a page jumps or not by using a pixel point matching method, performing click traversal and screen capture on all pages in the application by combining with a depth-first search algorithm, and determining screen capture of the application page includes:

The starting coordinates and the width and height of each clickable area are parsed from the JSON file of clickable areas for the screenshot, and the center point of each clickable area is calculated from its starting coordinates, width, and height. The center point of each clickable area is then clicked in turn, and a screenshot of the application is captured after each click. In a mobile application, the elements that trigger a page jump when clicked are generally text or images, which is why the text areas and image areas segmented by the convolutional network segmentation model are treated as clickable areas. Clicking through the clickable areas can cause page jumps, but not every click causes one, so the screenshot taken before the click must be compared with the screenshot taken after it to judge whether a jump occurred. Every pixel in a page screenshot has a pixel value; the two screenshots are compared position by position, and if the proportion of positions whose pixel values differ exceeds five percent, the page is confirmed to have jumped. The page reached by the jump is semantically segmented by the convolutional network segmentation model to obtain its clickable areas, those areas are clicked and screenshots are captured, and the operation is repeated.

The traversal uses a depth-first search algorithm: the first page reached by a jump is semantically segmented to obtain its clickable areas, a clickable area is clicked, the resulting page is captured, and a page-jump judgment is made; the same segmentation, click, and screenshot operations are then performed on the second page reached by a jump, and so on. When no further jump occurs, the traversal returns to the clickable areas that have not yet been clicked, and the operations are repeated until every clickable area has been clicked, so that all application pages are clicked and captured. To drive the interface, only one general-purpose script needs to be written by hand; it parses the JSON file to obtain the center coordinates of each clickable area and calls the click simulation command of the corresponding platform, such as the adb tool on Android or the UIAutomation framework on iOS.
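The traversal described above might be driven by a script along the following lines. This is a sketch under several assumptions rather than the implementation of the embodiment: the segmentation model and the page-jump test are passed in as the callables `segment` and `jumped`; areas are dictionaries with the hypothetical keys `x`, `y`, `width`, and `height`; returning to the previous page is done with the Android back key, which the description does not specify; and a `visited` set plus a `max_depth` limit are added to keep the search from looping. `AdbDevice` uses only the standard adb commands `exec-out screencap -p`, `shell input tap`, and `shell input keyevent`.

```python
import subprocess


class AdbDevice:
    """Thin wrapper around standard adb commands (assumes adb is on PATH)."""

    def capture(self):
        # `screencap -p` writes a PNG image to stdout.
        result = subprocess.run(
            ["adb", "exec-out", "screencap", "-p"], check=True, capture_output=True
        )
        return result.stdout

    def tap(self, x, y):
        subprocess.run(["adb", "shell", "input", "tap", str(x), str(y)], check=True)

    def back(self):
        subprocess.run(
            ["adb", "shell", "input", "keyevent", "KEYCODE_BACK"], check=True
        )


def center(region):
    """Center point of an area from its starting coordinates, width and height."""
    return (region["x"] + region["width"] // 2, region["y"] + region["height"] // 2)


def traverse(device, segment, jumped, max_depth=20):
    """Depth-first click traversal; returns the screenshot of every page reached."""
    screenshots = []
    visited = set()  # assumption: screenshots are hashable (for example PNG bytes)

    def visit(page, depth):
        visited.add(page)
        screenshots.append(page)
        if depth >= max_depth:
            return
        for region in segment(page):
            device.tap(*center(region))
            after = device.capture()
            if not jumped(page, after):
                continue  # the click did not change the page
            if after not in visited:
                visit(after, depth + 1)  # go deeper before trying sibling areas
            device.back()  # return to this page for the remaining areas

    visit(device.capture(), 0)
    return screenshots
```

Exact equality of screenshots is a crude page identity (a clock in the status bar would defeat it); a real script would more likely reuse the pixel-proportion comparison to decide whether a page has been seen before.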

Further as a preferred implementation manner, in step S4, the performing image-character recognition on the screenshot of the application page, detecting whether there is a sensitive word in the screenshot of the application page, and determining that the mobile application scans the illegal character includes:

For every captured image, that is, every page screenshot, the text in the screenshot is recognized by OCR (optical character recognition) and input to a sensitive word detection system. The system matches the text against a sensitive word database to detect whether it contains illegal information; if illegal text is found, the page corresponding to the screenshot is marked as an illegal page, completing the scan of the mobile application for illegal text.
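The matching step can be illustrated with a small sketch. It assumes the optical character recognition stage has already produced plain text for each page screenshot, and it uses case-insensitive substring matching as a stand-in for the sensitive word detection system, whose actual matching rules are not described. The function names and the example words are hypothetical.

```python
def find_sensitive_words(text, sensitive_words):
    """Return the sensitive words that occur in the recognized text (sorted)."""
    normalized = text.casefold()
    return sorted(
        {word for word in sensitive_words if word and word.casefold() in normalized}
    )


def flag_illegal_pages(page_texts, sensitive_words):
    """Map each page whose text contains a sensitive word to the words found.

    `page_texts` maps a page identifier (for example a screenshot file name)
    to the text recognized on that page.
    """
    flagged = {}
    for page_id, text in page_texts.items():
        hits = find_sensitive_words(text, sensitive_words)
        if hits:
            flagged[page_id] = hits
    return flagged
```

Plain substring matching can over-match (a short word inside a longer harmless one), so a production system would typically add tokenization or an allow list.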

Further as a preferred embodiment, in step S5, performing image content detection on the application page screenshots, detecting whether they contain illegal images, and thereby scanning the mobile application for illegal images includes:

The application page screenshot is input to an AI image content detection model, which detects whether the screenshot contains objectionable content such as pornography-related or terrorism-related material. If an illegal image is found, the page corresponding to the screenshot is marked as an illegal page, completing the scan of the mobile application for illegal images.

Further as a preferred embodiment, the training the convolutional network segmentation model to determine a pre-trained convolutional network segmentation model includes:

acquiring an application screenshot;

The training data of the convolutional network segmentation model is a large number of single-channel mask images. Screenshots of applications are captured first; a single-channel mask image can then be generated for each screenshot either by manual annotation or from the page's UI component information. The mask image has the same size as the screenshot. A mask pixel value of 1 indicates that the pixel belongs to a text area, 2 indicates an image area, and 0 indicates the background area. For example, when the area of width and height (w1, h1) starting at coordinate (x1, y1) in the screenshot is annotated as a text area, the mask pixels in the area of width and height (w1, h1) starting at (x1, y1) are set to 1; pixels in image areas and in the background area are likewise set to 2 and 0, respectively, yielding the training data. The training data is input to a network model with DeepLabV3+ or another semantic segmentation algorithm as its backbone network, and training yields a convolutional network segmentation model that can segment text and images.
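The mask construction just described can be sketched as follows. The example assumes annotations are given as `(kind, x, y, w, h)` tuples, where `kind` is `"text"` or `"image"`; this tuple format and the function name `build_mask` are illustrative, while the pixel values 1, 2, and 0 follow the convention stated above.

```python
TEXT, IMAGE, BACKGROUND = 1, 2, 0
VALUES = {"text": TEXT, "image": IMAGE}


def build_mask(width, height, annotations):
    """Build a single-channel mask (list of rows) the same size as the screenshot.

    Every pixel starts as background (0). Each annotated rectangle starting at
    (x, y) with size (w, h) is filled with 1 for text or 2 for image.
    Rectangles are clipped to the screenshot bounds.
    """
    mask = [[BACKGROUND] * width for _ in range(height)]
    for kind, x, y, w, h in annotations:
        value = VALUES[kind]
        for row in range(max(y, 0), min(y + h, height)):
            for col in range(max(x, 0), min(x + w, width)):
                mask[row][col] = value
    return mask
```

For a 4 x 3 screenshot with a text area at (1, 0) of size (2, 1) and an image area at (0, 1) of size (2, 2), the rows of the mask are `[0, 1, 1, 0]`, `[2, 2, 0, 0]`, and `[2, 2, 0, 0]`.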

Further preferably, the determining the text area and the image area as clickable areas, storing the area information in a file format, and determining an area information file includes:

The character area and the image area are determined as clickable areas, and the area information is stored in a JSON file format to obtain an initial area information file. The JSON file of the clickable area list can be modified manually, and the area nodes which do not need to be clicked and jumped are deleted, so that the target area information file is obtained. The clickable regions which cannot be skipped are deleted by manually modifying the JSON codes, so that the number of traversals of the clickable regions is reduced, and the scanning speed can be increased.
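The description leaves this pruning to manual editing of the JSON file; the same effect could also be scripted, as in the sketch below. It assumes a file layout with a top-level `regions` list, which is a hypothetical layout chosen for this example, and removes the entries at the given positions. The function name `prune_regions` is likewise illustrative.

```python
import json


def prune_regions(region_json, skip_indices):
    """Return the area information JSON with the listed entries removed.

    `region_json` is the text of the initial area information file and
    `skip_indices` are the positions of areas that should not be clicked.
    """
    data = json.loads(region_json)
    skip = set(skip_indices)
    data["regions"] = [
        region for index, region in enumerate(data["regions"]) if index not in skip
    ]
    return json.dumps(data, indent=2)
```

The returned text plays the role of the target area information file: the traversal then visits only the remaining areas.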

Referring to FIG. 1, the overall process of the invention is as follows: a screenshot of the application home page is captured; the home page screenshot is input to the convolutional network segmentation model for semantic segmentation to obtain the clickable areas and area information; the clickable areas are clicked and screenshots are captured according to the area information, and the pixel values of the screenshots are compared to judge whether a page jump occurred; if a jump occurred, the screenshot of the new page is input to the segmentation model to obtain its clickable areas; and a depth-first search algorithm is used to click through and capture every page of the mobile application. Illegal text is then detected in the page screenshots by OCR, and illegal images are detected in the page screenshots by the AI image content detection model, completing the scan of the mobile application for illegal content.

Corresponding to the method of fig. 1, an embodiment of the present invention further provides an electronic device, including a processor and a memory; the memory is used for storing programs; the processor executes the program to implement the method as described above.

Corresponding to the method of fig. 1, the embodiment of the present invention also provides a computer-readable storage medium, which stores a program, and the program is executed by a processor to implement the method as described above.

The embodiment of the invention also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and executed by the processor to cause the computer device to perform the method illustrated in fig. 1.

In summary, the embodiments of the present invention have the following advantages:

(1) the embodiment of the invention obtains screenshots of all pages of the application by page scanning, and can therefore capture the content of applications that use code obfuscation, code hardening, and interface encryption;

(2) the embodiment of the invention obtains clickable areas by segmenting the interface with the convolutional network segmentation model, and can therefore identify clickable areas in Hybrid applications and in WebView pages of native applications;

(3) the embodiment of the invention detects illegal content on application pages by combining OCR-based text detection with an AI image content detection model, improving the accuracy of illegal content detection.

In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.

Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A mobile application illegal content scanning method is characterized by comprising the following steps:

2. The method for scanning illegal contents of mobile applications according to claim 1, wherein the step of performing area segmentation on the home page screenshot through a convolutional network segmentation model to determine a clickable area and an area information file comprises the following steps:

3. The method for scanning the illegal contents of the mobile application according to claim 1, wherein the step of performing traversal clicking on the clickable area according to the area information file, judging whether a page jumps or not by using a pixel point matching method, performing click traversal and screen capture on all pages in the application by combining a depth-first search algorithm, and determining screen capture of the page of the application comprises the following steps:

4. The method for scanning illegal mobile application content according to claim 1, wherein the steps of performing image character recognition on the screenshot of the application page, detecting whether sensitive words exist in the screenshot of the application page, and determining scanning illegal mobile application characters comprise:

5. The method for scanning the illegal mobile application content according to claim 1, wherein the steps of detecting the image content of the screenshot of the application page, detecting whether an illegal image exists in the screenshot of the application page, and determining the scanning of the illegal mobile application image comprise:

6. The method according to claim 2, wherein the training the convolutional network segmentation model to determine a pre-trained convolutional network segmentation model comprises:

acquiring an application screenshot;

7. The method for scanning illegal contents of mobile application according to claim 2, wherein the step of determining the text area and the image area as clickable areas, the step of saving the area information in a file format, and the step of determining an area information file comprises the steps of:

8. A mobile application offending content scanning system, comprising:

9. An electronic device comprising a processor and a memory;

the memory is used for storing programs;

the processor executing the program realizes the method according to any one of claims 1-7.

10. A computer-readable storage medium, characterized in that the storage medium stores a program, which is executed by a processor to implement the method according to any one of claims 1-7.