CN113742559A - Keyword detection method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN113742559A
Authority
CN
China
Prior art keywords
login
target
page
information
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111044573.0A
Other languages
Chinese (zh)
Inventor
俞皓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202111044573.0A priority Critical patent/CN113742559A/en
Publication of CN113742559A publication Critical patent/CN113742559A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9532 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04845 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range for image manipulation, e.g. dragging, rotation, expansion or change of colour

Abstract

The embodiments provide a keyword detection method and apparatus, an electronic device, and a storage medium, and belong to the technical field of artificial intelligence. In the method, the login page of a query system is identified according to path information in a preset login script, and the login information in the login script is entered into the login page to log in to the query system. The user therefore needs neither to find the login page manually nor to enter the login information manually, so that automatic login to the query system is achieved.

Description

Keyword detection method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a keyword detection method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development of Internet technology, Internet information in the big data era is growing exponentially, and a large amount of information resources is shared through network services, among which browsing system websites is currently one of the most heavily used. To further optimize a system, for example a query system, keyword detection needs to be performed on the pages of the query system. At present, keyword detection on the pages of a query system is mainly performed manually: a user has to log in to the query system manually, manually find the page on which keyword detection is to be performed, and manually check whether the page contains the keyword information, so keyword detection efficiency is low. A more efficient keyword detection method is therefore needed.
Disclosure of Invention
The embodiment of the disclosure provides a keyword detection method and device, an electronic device, and a storage medium, which can improve the efficiency of keyword detection.
In order to achieve the above object, a first aspect of the embodiments of the present disclosure provides a keyword detection method, including:
identifying a login page of the query system according to path information in a preset login script, wherein the path information is used for indicating a path for logging in the login page of the query system;
logging in the query system according to the login information in the login script;
performing screenshot processing on a browsing page of the query system to obtain a target page screenshot;
performing character recognition on the screenshot of the target page to recognize the target characters;
and matching the target characters with preset keywords to obtain a detection result corresponding to the keywords.
In some embodiments, identifying a login page of the query system according to path information in a preset login script includes:
identifying a login button of a login page according to the path information;
and logging in to the query system according to the login information in the login script comprises:
calling the login information in the login script and filling the login information into a preset login box in the login page;
and triggering the login button through the login script to log in to the query system.
In some embodiments, identifying a login page of the query system according to path information in a preset login script includes:
identifying a login button of a login page according to the path information;
and logging in to the query system according to the login information in the login script comprises:
when the login information in the login script has been entered into the login page, triggering the login button through the login script to log in to the query system.
In some embodiments, the login page includes an account entry box and a password entry box, the login information includes a login account and a login password, and the query system is logged in according to the login information in the login script, including:
and respectively inputting the login account and the login password into an account input box and a password input box to log in the query system.
In some embodiments, the login page further comprises a verification code input box, and before logging in to the query system, the method further comprises:
entering a preset universal verification code into the verification code input box.
In some embodiments, screenshot processing is performed on a browse page of a query system to obtain a target page screenshot, including:
entering a first-level menu interface of the query system;
determining whether a current menu in a primary menu interface is browsed;
if the current menu is not browsed, opening all secondary menus under the current menu;
and taking all the page screenshots of the secondary menu under the current menu as target page screenshots.
In some embodiments, performing text recognition on the target page screenshot to identify the target text comprises:
and performing character recognition on all the page screenshots of the secondary menu of the current menu to recognize target characters.
In some embodiments, performing text recognition on the target page screenshot to identify the target text comprises:
carrying out binarization processing on the target page screenshot to obtain a binarization image;
extracting target character information from the binary image according to the pixel gray value;
comparing the target character information with preset character information to obtain a comparison result;
and if the character information is the same as the preset character information, converting the character information into the target characters.
In some embodiments, extracting the target character information from the binarized image based on pixel grayscale values includes:
dividing the binary image into a plurality of initial areas according to the pixel gray value; wherein the pixel gray values of all pixels of the initial region are the same;
extracting areas with zero pixel gray values from the plurality of initial areas as character areas;
performing character recognition on the character area to obtain initial character information;
and according to the character interval condition of the character area, carrying out segmentation processing on the initial character information to obtain target character information.
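As a non-limiting illustration, the extraction of zero-valued character areas and their segmentation by character interval might be sketched as follows. The sketch assumes a column-projection approach, which the text does not prescribe, and omits the character recognition step; the names `text_columns`, `split_by_gap`, and `min_gap` are hypothetical.

```python
def text_columns(binary):
    """Return (start, end) column ranges, end exclusive, that contain
    at least one pixel with gray value 0 (assumed to be text)."""
    width = len(binary[0])
    has_ink = [any(row[x] == 0 for row in binary) for x in range(width)]
    runs, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                    # a character area begins
        elif not ink and start is not None:
            runs.append((start, x))      # the character area ends
            start = None
    if start is not None:
        runs.append((start, width))
    return runs


def split_by_gap(runs, min_gap):
    """Group neighbouring character areas; a gap of at least min_gap
    columns (hypothetical parameter) starts a new group."""
    groups = []
    for run in runs:
        if groups and run[0] - groups[-1][-1][1] < min_gap:
            groups[-1].append(run)
        else:
            groups.append([run])
    return groups


# Tiny 2 x 10 binarized image: 0 = text pixel, 255 = background pixel.
image = [
    [0, 0, 255, 0, 255, 255, 255, 0, 0, 255],
    [255, 0, 255, 0, 255, 255, 255, 255, 0, 255],
]
runs = text_columns(image)
groups = split_by_gap(runs, min_gap=2)
```

Here the first two character areas are one column apart and stay in the same group, while the three-column gap before the last area starts a new group.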
In some embodiments, matching the target text with a preset keyword to obtain a detection result corresponding to the keyword includes:
acquiring a preset keyword;
sequentially acquiring single characters of the target characters according to the character sequencing sequence of the target characters;
matching each single character of the target character with the keyword to obtain a matching result;
and if the matching result indicates that target characters consistent with the keyword have been found, obtaining the target characters corresponding to the keyword.
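By way of example only, the sequential, character-by-character matching described above might look like the following sketch. The function names `find_keyword` and `detect` and the sample strings are illustrative assumptions, not part of the disclosure.

```python
def find_keyword(target_text, keyword):
    """Return the start index of every occurrence of keyword in target_text."""
    hits = []
    if not keyword:
        return hits
    for start in range(len(target_text) - len(keyword) + 1):
        # Take the characters of the target text in reading order and
        # compare them one by one with the characters of the keyword.
        if all(target_text[start + k] == ch for k, ch in enumerate(keyword)):
            hits.append(start)
    return hits


def detect(target_text, keywords):
    """Map each preset keyword to the positions where it was found."""
    return {kw: find_keyword(target_text, kw) for kw in keywords}


result = detect("trademark query: acme trademark", ["trademark", "patent"])
```

An empty position list for a keyword corresponds to a detection result in which no target characters consistent with that keyword were found.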
A second aspect of the embodiments of the present disclosure provides a keyword detection apparatus, including a login module and a detection module, wherein the login module is specifically configured to:
identifying a login page of the query system according to path information in a preset login script, wherein the path information is used for indicating a path for logging in the login page of the query system;
logging in the query system according to the login information in the login script;
the detection module is specifically configured to:
performing screenshot processing on a browsing page of the query system to obtain a target page screenshot;
performing character recognition on the screenshot of the target page to recognize the target characters;
and matching the target characters with preset keywords to obtain a detection result corresponding to the keywords.
A third aspect of the embodiments of the present disclosure provides an electronic device, which includes a memory and a processor, where the memory stores a program, and the processor, when executing the program, performs the method according to any one of the embodiments of the first aspect of the present disclosure.
A fourth aspect of the embodiments of the present disclosure provides a computer-readable storage medium storing a program which, when executed by a processor, performs the method according to any one of the embodiments of the first aspect of the present disclosure.
The embodiments of the present disclosure provide a keyword detection method and apparatus, an electronic device, and a storage medium, and belong to the technical field of artificial intelligence. In the method, the login page of the query system is identified according to path information in a preset login script, and the login information in the login script is entered into the login page to log in to the query system. The user therefore needs neither to find the login page manually nor to enter the login information manually, so that automatic login to the query system is achieved. That is, in the solution of the present application, the query system is logged in to automatically, page screenshots are acquired automatically after entering the query system, and the target characters are recognized automatically, which greatly improves keyword detection efficiency.
Drawings
FIG. 1 is a flowchart of a keyword detection method provided by an embodiment of the present disclosure;
FIG. 2 is a flowchart of step S300 in FIG. 1;
FIG. 3 is a flowchart of step S400 in FIG. 1;
FIG. 4 is a schematic diagram of an initial image provided by an embodiment of the present disclosure;
FIG. 5 is a schematic illustration of a binarized image resulting from processing the initial image of FIG. 4;
FIG. 6 is a flowchart of step S420 in FIG. 3;
FIG. 7 is a schematic diagram of a label image resulting from processing the binarized image of FIG. 5;
FIG. 8 is a block diagram of a module structure of a keyword detection apparatus according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a hardware structure of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It should be noted that although functional blocks are partitioned in a schematic diagram of an apparatus and a logical order is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the partitioning of blocks in the apparatus or the order in the flowchart. The terms first, second and the like in the description and in the claims, and the drawings described above, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
First, several terms referred to in the present application are explained:
UI automated testing: using a tool or script to run a system or application on the front-end interface of the software under test, under preset conditions and with known test data, acquiring the data displayed on the front-end page, verifying it, and evaluating it to reach a test conclusion.
OCR (optical character recognition): the process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper and translates their shapes into computer text using a character recognition method; that is, the process of scanning text material and then analyzing and processing the image file to obtain the characters and layout information.
String: a character string in programming languages such as C++, Java and VB, that is, a sequence of characters enclosed in double quotation marks, such as "Abc" or "day". A string is a special object and belongs to the reference types. In Java and C#, a String object cannot be changed once it has been created and initialized: all String values are constants whose data cannot be modified, and because String objects are immutable they can be shared. Any apparent change to a String returns a new String object. In the C++ standard library, strings are encapsulated as a type by the string class, which provides the operations for processing a character sequence.
Java keywords: specially defined identifiers, sometimes called reserved words. Java keywords have a special meaning to the Java compiler; they denote a data type, the structure of the program, and so on, and they cannot be used as variable names, method names, class names, package names or parameter names.
XML Path Language (XPath): a language for locating parts of an XML document. Based on the tree structure of XML, XPath provides the ability to find nodes in the data structure tree.
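For illustration only, the following sketch locates nodes with path expressions using Python's standard `xml.etree.ElementTree` module, which supports a limited subset of XPath. The markup and the element identifiers (`login-form`, `account`, `password`, `login-btn`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical login-page markup, used only for this illustration.
PAGE = """
<html>
  <body>
    <form id="login-form">
      <input id="account" type="text" />
      <input id="password" type="password" />
      <button id="login-btn">Log in</button>
    </form>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# [@attr='value'] predicates are part of the XPath subset ElementTree supports.
button = root.find(".//button[@id='login-btn']")
inputs = root.findall(".//form[@id='login-form']/input")

print(button.text)
print([el.get("id") for el in inputs])
```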
Script: a scripting language, also known as an extension language or dynamic language, is a programming language used to control software applications. Scripts are usually stored as text and are interpreted or compiled only when they are called.
Web application: an application that can be accessed through the Web. Its greatest advantage is ease of access: the user needs only a browser and does not have to install any other software.
Otsu's method (Otsu): an algorithm for determining the threshold for image binarization. It is also called the maximum between-class variance method, because after the image is segmented by the threshold obtained with Otsu's method, the between-class variance between foreground and background is maximal. The method divides the image into background and foreground according to its gray-level characteristics. Since variance measures the uniformity of the gray-level distribution, a larger between-class variance between background and foreground means a larger difference between the two parts of the image, and misclassifying part of the foreground as background (or part of the background as foreground) reduces that difference. A segmentation that maximizes the between-class variance therefore minimizes the probability of misclassification.
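The criterion described above can be written compactly as follows. The notation is introduced here for illustration and does not appear in the original text: for a candidate threshold t, ω₀(t) and ω₁(t) denote the fractions of pixels in the two classes, and μ₀(t) and μ₁(t) denote their mean gray values, assuming 8-bit gray levels.

```latex
\sigma_B^2(t) = \omega_0(t)\,\omega_1(t)\,\bigl[\mu_0(t) - \mu_1(t)\bigr]^2,
\qquad
t^{*} = \arg\max_{0 \le t \le 255} \sigma_B^2(t)
```

The selected threshold t* is the gray level at which the between-class variance is largest.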
The embodiment of the application can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision, robotics, biometric recognition, speech processing, natural language processing, and machine learning/deep learning.
With the rapid development of Internet technology, Internet information in the big data era is growing exponentially, and a large amount of information resources is shared through network services, among which browsing system websites is currently one of the most heavily used. To further optimize a system, for example a query system, keyword detection needs to be performed on the pages of the query system. At present, keyword detection on the pages of a query system is mainly performed manually: a user has to log in to the query system manually, manually find the page on which keyword detection is to be performed, and manually check whether the page contains the keyword information, so keyword detection efficiency is low. A more efficient keyword detection method is therefore needed.
Based on this, the embodiments of the present disclosure provide a keyword detection method and apparatus, an electronic device, and a storage medium, which can improve the detection efficiency of keywords.
The method and apparatus for detecting a keyword, the electronic device, and the storage medium provided in the embodiments of the present disclosure are specifically described in the following embodiments, and first, the method for detecting a keyword in the embodiments of the present disclosure is described.
The embodiment of the disclosure provides a keyword detection method, and relates to the technical field of artificial intelligence. The keyword detection method provided by the embodiment of the disclosure can be applied to a terminal, a server side and software running in the terminal or the server side. In some embodiments, the terminal may be a smartphone, tablet, laptop, desktop computer, smart watch, or the like; the server side can be configured as an independent physical server, or a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, a cloud database, cloud computing, cloud functions, cloud storage, Network service, cloud communication, middleware service, domain name service, security service, Content Delivery Network (CDN) and a big data and artificial intelligence platform; the software may be an application or the like implementing the keyword detection method, but is not limited to the above form.
The disclosed embodiments are operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Referring to fig. 1, a keyword detection method according to an embodiment of the present disclosure includes, but is not limited to, steps S100 to S500.
S100, identifying a login page of the query system according to path information in a preset login script;
S200, logging in to the query system according to the login information in the login script;
S300, performing screenshot processing on a browsing page of the query system to obtain a target page screenshot;
S400, performing character recognition on the target page screenshot to recognize the target characters;
S500, matching the target characters with preset keywords to obtain a detection result corresponding to the keywords.
In step S100, a login page of the query system is identified according to path information in a preset login script. A script is a sequence of human-readable text commands; when the script is executed, the system's interpreter translates the commands one by one into machine-recognizable instructions and executes them in program order. For example, the login commands in the login script may be: acquire the web address of the login page through the XPath stored in the login script, enter that web address, and jump to the login page of the query system. The query system may be a trademark query system, which may be the trademark query system of a trademark office or one developed by a third-party company; performing keyword detection on the trademark query system can help prevent trademark piracy.
In step S200, the login information in the login script is entered into the login page to log in to the query system. The login information is the information that must be filled into the page elements of the login page before logging in to the query system. Page elements include, but are not limited to, all elements displayed on the login page, such as images, text boxes, buttons, drop-down lists and videos.
In some embodiments, logging in to the query system according to the login script proceeds as follows: a login button of the login page is identified according to the path information; the login information in the login script is called and filled into a preset login box in the login page; and the login button is triggered through the login script to log in to the query system. Identifying the login button according to the path information specifically comprises: acquiring attribute information of the login button, that is, information that uniquely identifies the login button, such as its ID or name; acquiring the button path of the login button from the path information according to the attribute information; and identifying the login button according to the button path.
In some embodiments, the path information mentioned in step S100 indicates the position of the login button on the login page, so the login button can be identified according to its path information. Once the login information in the login script has been entered into the login page, a text command of the login script triggers the login button to complete the login; the triggered operation is equivalent to a user's click.
In some embodiments, the login page includes an account entry box and a password entry box, the login information includes a login account and a login password, and the login account and the login password are entered into the account entry box and the password entry box, respectively, in the login page to log in the query system.
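As a non-limiting illustration, the login flow described above might be sketched as follows. This is an in-memory stand-in rather than a browser automation script: the login page is modelled as a small XML document, and the structure of `LOGIN_SCRIPT`, the element identifiers, and the function `run_login` are all hypothetical. In practice a UI automation tool would perform the same locate, fill, and click steps against a real browser.

```python
import xml.etree.ElementTree as ET

# Hypothetical login script: path information plus login information.
LOGIN_SCRIPT = {
    "paths": {
        "account": ".//input[@id='account']",
        "password": ".//input[@id='password']",
        "button": ".//button[@id='login-btn']",
    },
    "login_info": {"account": "tester", "password": "secret"},
}

# Hypothetical login page, reduced to the elements the script needs.
LOGIN_PAGE = (
    "<form><input id='account'/><input id='password'/>"
    "<button id='login-btn'>Log in</button></form>"
)


def run_login(script, page_source):
    """Locate the input boxes and the login button by path, fill in the
    login information, and report a simulated click on the button."""
    page = ET.fromstring(page_source)
    paths, info = script["paths"], script["login_info"]
    for field in ("account", "password"):
        box = page.find(paths[field])
        if box is None:
            raise LookupError(f"no element at {paths[field]}")
        box.set("value", info[field])
    button = page.find(paths["button"])
    if button is None:
        raise LookupError("login button not found")
    # Stand-in for the click: collect the values that would be submitted.
    submitted = {box.get("id"): box.get("value") for box in page.iter("input")}
    return {"clicked": button.get("id"), "submitted": submitted}


result = run_login(LOGIN_SCRIPT, LOGIN_PAGE)
```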
In some embodiments, before logging in to the query system, in addition to entering the login account and the login password on the login page, a preset universal verification code is entered into the verification code input box. Most web applications require the user to enter a verification code at login. Verification codes take many forms, such as letters, digits and Chinese characters, and they effectively prevent passwords from being cracked by automated guessing, which increases security to a certain extent. However, because most verification codes are randomly generated rather than fixed, recognizing the verification code on the login page in the conventional manner would make automatic login cumbersome, while removing the verification code altogether would reduce the security of the system. In view of this, the embodiment of the present application performs verification with a preset universal verification code, that is, a verification code set in advance to a specific value, for example "ab13c". When the preset universal verification code "ab13c" is entered into the input box, the program treats the verification as passed; when any other value is entered, the program falls back to the original verification mode. In this way the security of the query system is preserved while unnecessary verification code recognition steps are avoided.
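By way of example only, the server-side check implied by the universal verification code might be sketched as follows. The function name `verify_code` and its parameters are assumptions; the default value reuses the example value from the text with its spacing normalised.

```python
UNIVERSAL_CODE = "ab13c"  # example value; any preset value could be configured


def verify_code(entered, expected_random_code, universal_code=UNIVERSAL_CODE):
    """Pass if the preset universal code was entered; otherwise fall back
    to the original check against the randomly generated code."""
    if entered == universal_code:
        return True
    return entered == expected_random_code


print(verify_code("ab13c", "7Xk2"))   # universal code entered
print(verify_code("7Xk2", "7Xk2"))    # ordinary user, correct random code
print(verify_code("0000", "7Xk2"))    # wrong code
```

Because ordinary users still go through the original comparison, the randomly generated code keeps its protective effect for everyone who does not know the preset value.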
In step S300, a screenshot is taken of a browsing page of the query system to obtain a target page screenshot, where a browsing page is a web page that can be displayed in the query system. In practical applications, the screenshot may be taken as follows: a built-in screenshot method of the program is used to capture the entire browsing page, and the resulting target page screenshot is saved to a specified path. To avoid duplicate image names, the image may be named in a date-and-time format; for example, a target page screenshot may be named "20210720-163656.png". It should be noted that the naming scheme is not limited to the above; a person skilled in the art may also name images according to their size, type and so on as required, and details are not described herein.
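For illustration, the date-and-time naming scheme can be produced with the Python standard library as in the sketch below. The function name `screenshot_path` and the directory name `screenshots` are hypothetical, and the screenshot capture itself is not shown.

```python
from datetime import datetime
from pathlib import Path


def screenshot_path(save_dir, taken_at):
    """Build a file path whose name encodes the capture date and time."""
    return Path(save_dir) / (taken_at.strftime("%Y%m%d-%H%M%S") + ".png")


example = screenshot_path("screenshots", datetime(2021, 7, 20, 16, 36, 56))
print(example.name)  # 20210720-163656.png
```

Names at one-second resolution can still collide if two screenshots are taken within the same second, so a counter or finer-grained timestamp may be preferable in that case.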
In some embodiments, as shown in fig. 2, step S300 includes, but is not limited to, steps S310 to S340:
S310, entering a primary menu interface of the query system;
S320, determining whether the current menu in the primary menu interface is browsed;
S330, if the current menu is not browsed, opening all secondary menus under the current menu;
S340, acquiring all page screenshots of the secondary menu under the current menu.
In steps S310 to S340, the primary menu interface of the query system is entered, where the primary menu interface is the main menu of the query system, and it is determined whether the current menu in the primary menu interface has been browsed. If the current menu has not been browsed, all secondary menus under the current menu are opened and page screenshots of all of them are acquired, where a secondary menu is a submenu of the main menu. It should be noted that a menu bar is in fact a tree structure: a submenu is a branch of the menu bar and consists of the options contained in an item of the main menu. For example, suppose menu 1 has four options: "option 1", "option 2", "option 3" and "option 4". Opening "option 2" reveals "submenu 1", "submenu 2", "submenu 3" and "submenu 4", which together form the submenu of "option 2". If the current menu has not been browsed, none of the secondary menus under it is considered browsed, and screenshots of all pages of the secondary menus must be taken. If the current menu has been browsed, it is determined whether each secondary menu has been browsed, and screenshots are taken only of the secondary menus that have not. It should be noted that when a menu interface is browsed, its browsing state can be marked as browsed, so that whether a given menu interface has been browsed can be determined from its browsing state.
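As a non-limiting illustration, the traversal with browsing-state marks might be sketched as follows. The `Menu` class, the `take_screenshot` callback, and the file-name convention are hypothetical stand-ins for the real menu interface and screenshot routine.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Menu:
    name: str
    children: List["Menu"] = field(default_factory=list)
    browsed: bool = False


def capture_unbrowsed(primary_menus, take_screenshot):
    """Screenshot every secondary menu that has not been browsed yet and
    mark each visited menu as browsed."""
    shots = []
    for menu in primary_menus:
        for sub in menu.children:
            # If the primary menu itself is unbrowsed, all of its secondary
            # menus are treated as unbrowsed and are captured.
            if menu.browsed and sub.browsed:
                continue
            shots.append(take_screenshot(sub))
            sub.browsed = True
        menu.browsed = True
    return shots


option2 = Menu("option 2", [Menu(f"submenu {i}") for i in range(1, 5)])
shots = capture_unbrowsed([option2], lambda m: m.name + ".png")
```

Running the traversal a second time on the same menus yields no new screenshots, because every menu has been marked as browsed.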
In step S400, character recognition is performed on the target page screenshot to recognize the target characters, that is, the characters obtained from the target page screenshot by character recognition. In practical applications, OCR (optical character recognition) technology may be used to perform character recognition on the target page screenshot.
In some embodiments, the page screenshots of all secondary menus of the current menu mentioned in step S340 are subjected to character recognition, so as to obtain target characters corresponding to each page.
In some embodiments, as shown in fig. 3, step S400 includes, but is not limited to, steps S410 to S440:
S410, performing binarization processing on the target page screenshot to obtain a binarized image;
S420, extracting target character information from the binarized image according to the pixel gray values;
S430, comparing the target character information with preset character information to obtain a comparison result;
S440, if the target character information is the same as the preset character information, converting the target character information into the target characters.
In step S410, the target page screenshot is binarized to obtain a binarized image; this is a form of image preprocessing. The Otsu algorithm may be used for the binarization: a suitable segmentation threshold divides the pixels of the target page screenshot into two groups, one group (set, for example, to a gray value of 0) is taken as the target, the other group (set, for example, to a gray value of 255) is taken as the background, and the binarized image is obtained from the target and the background.
In some embodiments, in practical applications, the target page screenshot is processed into the binarized image as follows. It should be understood that the following description is only an exemplary illustration and is not a specific limitation of the present application.
Step one: initializing the parameters of the target page screenshot, specifically: let the segmentation threshold between the target and the background of the target page screenshot be t, let the proportion of target points in the target page screenshot be W0 with average gray level U0, and let the proportion of background points in the target page screenshot be W1 with average gray level U1.
Step two: calculating the proportion of target points and the proportion of background points in the target page screenshot, specifically: if the size of the target page screenshot is M × N, the number of pixels whose gray level is less than t is N0, and the number of pixels whose gray level is greater than or equal to t is N1, then W0 = N0/(M × N) and W1 = N1/(M × N).
Step three: calculating the total average gray level of the target page screenshot and the variance between the target and the background, specifically: the total average gray level of the target page screenshot is U = W0 × U0 + W1 × U1, and the variance between the target and the background is g = W0 × (U0 - U)² + W1 × (U1 - U)²; substituting U = W0 × U0 + W1 × U1 gives g = W0 × W1 × (U0 - U1)².
Step four: finding, by traversal, the value of t at which the variance g between the target and the background of the target page screenshot reaches its maximum.
Step five: setting the gray value of pixels in the target page screenshot that are smaller than t to 0 (corresponding to the target), and setting the gray value of the remaining pixels, which are greater than or equal to t, to 255 (corresponding to the background).
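The substitution in step three can be written out term by term. The derivation below is a sketch that assumes every pixel belongs to exactly one of the two classes, so that W0 + W1 = 1; the subscripted symbols W_0, W_1, U_0, U_1 stand for W0, W1, U0, U1 in the text.

```latex
\begin{aligned}
U &= W_0 U_0 + W_1 U_1, \qquad W_0 + W_1 = 1 \\
U_0 - U &= (1 - W_0)\,U_0 - W_1 U_1 = W_1\,(U_0 - U_1) \\
U_1 - U &= (1 - W_1)\,U_1 - W_0 U_0 = -W_0\,(U_0 - U_1) \\
g &= W_0\,(U_0 - U)^2 + W_1\,(U_1 - U)^2 \\
  &= W_0 W_1^2\,(U_0 - U_1)^2 + W_1 W_0^2\,(U_0 - U_1)^2 \\
  &= W_0 W_1\,(W_0 + W_1)\,(U_0 - U_1)^2 \\
  &= W_0 W_1\,(U_0 - U_1)^2
\end{aligned}
```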
The following describes a specific embodiment of the process of preprocessing a target page screenshot. For convenience of description, the target page screenshot is referred to below as the initial image. It should be understood that the following description is only an exemplary illustration and not a specific limitation of the present application.
Step one: gray-scale detection is performed on the 6 × 6-pixel initial image in fig. 4. Table 1 lists the gray value of each pixel in the initial image:
0 0 51 204 204 255
0 51 153 204 153 204
51 153 204 102 51 153
204 204 153 51 0 0
255 204 102 51 0 0
255 255 204 153 51 0
TABLE 1
Step two: the segmentation threshold is set to 153, i.e. t = 153, and the following quantities are calculated:
the proportion W0 of target points in the initial image: W0 = N0/(M × N) = 17/36 = 0.4722; the average gray level U0 of the target points in the initial image: U0 = (0 × 8 + 51 × 7 + 102 × 2)/17 = 33; the proportion W1 of background points in the initial image: W1 = N1/(M × N) = 19/36 = 0.5278; the average gray level U1 of the background points in the initial image: U1 = (153 × 6 + 204 × 9 + 255 × 4)/19 = 198.6316; the variance g between the target and the background of the initial image: g = W0 × W1 × (U0 - U1) × (U0 - U1) = 0.4722 × 0.5278 × 165.6316 × 165.6316 = 6837.254.
Step three: according to the data obtained in step two, g reaches its maximum value of 6837.254 when t is 153. The gray values of pixels smaller than 153 are therefore all set to 0 (corresponding to the target), and the gray values of the remaining pixels, which are greater than or equal to 153, are set to 255 (corresponding to the background), giving the binarized image shown in fig. 5.
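The worked example can be reproduced with a short script. It is a minimal sketch, not the implementation of the application: the function names are hypothetical, pixels equal to t are counted as background (as in the example above, where the six pixels of gray value 153 are among the 19 background points), and only gray levels that actually occur in the image are tried as candidate thresholds.

```python
from typing import List

Image = List[List[int]]

# Gray values of the 6 x 6 initial image (Table 1).
TABLE_1: Image = [
    [0, 0, 51, 204, 204, 255],
    [0, 51, 153, 204, 153, 204],
    [51, 153, 204, 102, 51, 153],
    [204, 204, 153, 51, 0, 0],
    [255, 204, 102, 51, 0, 0],
    [255, 255, 204, 153, 51, 0],
]


def between_class_variance(img: Image, t: int) -> float:
    """g = W0 * W1 * (U0 - U1)^2, with gray < t as target, gray >= t as background."""
    pixels = [p for row in img for p in row]
    target = [p for p in pixels if p < t]
    background = [p for p in pixels if p >= t]
    if not target or not background:
        return 0.0  # one class is empty, so there is no separation to measure
    w0 = len(target) / len(pixels)
    w1 = len(background) / len(pixels)
    u0 = sum(target) / len(target)
    u1 = sum(background) / len(background)
    return w0 * w1 * (u0 - u1) ** 2


def otsu_threshold(img: Image) -> int:
    """Traverse the candidate thresholds and return the one that maximizes g."""
    # Assumption: candidates are the gray levels present in the image.
    candidates = sorted({p for row in img for p in row})
    return max(candidates, key=lambda t: between_class_variance(img, t))


def binarize(img: Image, t: int) -> Image:
    """Set target pixels (gray < t) to 0 and background pixels to 255."""
    return [[0 if p < t else 255 for p in row] for row in img]


t = otsu_threshold(TABLE_1)
binary = binarize(TABLE_1, t)
```

For Table 1 the traversal selects t = 153. The variance computed without intermediate rounding is about 6837.29, which differs from the 6837.254 above only because the text rounds W0, W1 and U1 to four decimal places before multiplying.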
In some embodiments, as shown in fig. 6, step S420 includes, but is not limited to, steps S421 to S424:
S421, dividing the binarized image into a plurality of initial regions according to the pixel gray values;
S422, extracting the regions with a pixel gray value of zero from the plurality of initial regions as text regions;
S423, performing character recognition on the text regions to obtain initial character information;
S424, segmenting the initial character information according to the character spacing of the text regions to obtain the target character information.
In steps S421 to S422, the text regions of the binarized image are obtained from the gray value of each pixel in the binarized image. All pixels within one initial region have the same gray value, and every pixel in a text region has a gray value of zero. Specifically, the binarized image is scanned, connected regions of equal gray value are found, every pixel within a region is assigned the same label value, and different connected regions are distinguished by different label values. The connected regions referred to in this embodiment of the present application are the text regions of the binarized image.
The following is a description of a process of dividing a text region of a binarized image by way of example, and it should be understood that the following description is only exemplary and not a specific limitation of the present application.
Step one: a first scan is performed row by row on the pixels of the binarized image in fig. 5 (the binarized image in fig. 5 contains 6 rows of pixels). Here [x, y] denotes the positions of a run of pixels within a row of the binarized image; for example, when x is 1 and y is 3, [1,3] denotes the 1st, 2nd and 3rd pixels of the row. Label is the label value. The scan results for the binarized image are as follows.
First row: one blob in total (that is, among the pixels of the first row there is only one connected run of pixels with a gray value of 0), namely [1,3], with Label 1.
Second row: one blob in total, namely [1,2]. Since this blob is contiguous with the blob in the first row, its Label is also 1, meaning that the two blobs form one connected region.
Third row: two blobs in total, namely [1,1] and [4,5]. The first blob is contiguous with the blob in the second row, so its Label is 1; the second blob is not contiguous with any other blob, so its Label is set to 2 to distinguish it.
Fourth row: one blob in total, namely [4,6]. Since it is contiguous with the second blob in the third row, its Label is 2.
Fifth row: two blobs in total, namely [3,3] and [4,6]. The first blob [3,3] is not contiguous with the previous row, so its Label is 3; the second blob [4,6] is contiguous with the previous row, so its Label is 2.
Sixth row: one blob in total, namely [5,6]. Since it is contiguous with the second blob in the fifth row, its Label is 2.
This completes the first scan of the binarized image.
Step two: a second scan is performed on the result of step one to refine the result of the first scan. It is determined whether the blobs found in the first scan are connected to one another, and the label values of blobs that are connected but carry different label values are corrected. For example, in the first scan the first blob in the fifth row was given Label 3; the second scan detects that this blob is connected to the second blob in the same row although their label values differ, so the Label of the first blob in the fifth row is changed to 2. This yields the accurate label image shown in fig. 7, and the two text regions contained in the binarized image can be determined from the label values.
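The two scans described above correspond to the classic two-pass connected-component labeling scheme. The sketch below is one possible pixel-wise version using 4-connectivity and a small union-find table, neither of which is specified in the application; the function name `label_regions` is hypothetical, and the `BINARY` array is the binarization of Table 1 at t = 153, which is assumed to match fig. 5.

```python
from typing import Dict, List

# Binarization of Table 1 at t = 153 (assumed to correspond to fig. 5).
BINARY = [
    [0, 0, 0, 255, 255, 255],
    [0, 0, 255, 255, 255, 255],
    [0, 255, 255, 0, 0, 255],
    [255, 255, 255, 0, 0, 0],
    [255, 255, 0, 0, 0, 0],
    [255, 255, 255, 255, 0, 0],
]


def label_regions(binary: List[List[int]]) -> List[List[int]]:
    """Label zero-valued pixels; 0 in the result marks background pixels."""
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    parent: Dict[int, int] = {}  # equivalence table between provisional labels

    def find(x: int) -> int:
        while parent[x] != x:
            x = parent[x]
        return x

    next_label = 1
    # First scan: assign provisional labels and record equivalences.
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 0:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            neighbors = [n for n in (up, left) if n]
            if not neighbors:
                parent[next_label] = next_label
                labels[r][c] = next_label
                next_label += 1
                continue
            labels[r][c] = min(neighbors)
            root_a, root_b = find(neighbors[0]), find(neighbors[-1])
            if root_a != root_b:
                # Connected but differently labeled: merge into the smaller label.
                parent[max(root_a, root_b)] = min(root_a, root_b)
    # Second scan: replace every provisional label by its representative.
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                labels[r][c] = find(labels[r][c])
    return labels


label_image = label_regions(BINARY)
```

On this input the first scan gives the third pixel of the fifth row the provisional label 3, and the second scan rewrites it to 2, leaving two regions labeled 1 and 2.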
In steps S423 to S424, character recognition is performed on the text region to obtain the initial character information. Specifically, the start position and end position of each line of text in the text region from step S421 are detected, and a baseline is fitted by the least squares method from the coordinates of the pixels in the text region, so that the baseline lies where the pixels are densest. This gives the arrangement of the text lines in the binarized image, and the characters in each line are then recognized by OCR to obtain the initial character information. The initial character information is segmented according to the character spacing of the text region to obtain the target character information.
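The application does not say how the character spacing is evaluated. One common approach, shown here purely as an assumed illustration, is a vertical projection: a column of a binarized text line counts as ink if it contains at least one zero-valued pixel, and runs of ink-free columns are treated as the spacing between characters. The function name `split_by_gaps` and the `min_gap` parameter are hypothetical.

```python
from typing import List, Optional, Tuple


def split_by_gaps(line: List[List[int]], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) column spans of characters, with end exclusive.

    A run of at least `min_gap` ink-free columns separates two characters.
    """
    width = len(line[0])
    ink = [any(row[c] == 0 for row in line) for c in range(width)]
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    gap = 0
    for c, has_ink in enumerate(ink):
        if has_ink:
            if start is None:
                start = c  # first ink column of a new character
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                spans.append((start, c - gap + 1))  # close the span before the gap
                start = None
                gap = 0
    if start is not None:
        spans.append((start, width - gap))  # character touching the right edge
    return spans
```

A larger `min_gap` merges strokes that are separated by narrow gaps into one span, which is one way to avoid splitting a single character into several pieces.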
In steps S430 to S440, the target character information is compared with the preset character information to obtain a comparison result, and the target characters are obtained according to the comparison result. The preset character information may be a character library storing various characters, against which the target character information is compared. If the comparison finds a match, the target character information is consistent with a character in the character library and can be converted into the target character. For example, if the target character information is "good" and the character library stores "good", the comparison finds the two consistent, and the target character corresponding to the target character information is taken to be "good". If no character information consistent with the target character information is found in the preset character information, the target character information is segmented a second time: the wrongly segmented characters are recombined and then finely segmented, and after the fine segmentation the result is compared with the characters in the character library again and re-recognized. This completes the OCR-based image recognition process.
In step S500, the target characters are matched with the preset keyword to obtain the detection result corresponding to the keyword. The preset keyword is set in advance by the user. For example, if the preset keyword is set to "test" and the target characters contain "test", the target characters are considered to contain the keyword. In addition, the embodiment of the present application can locate the position of the keyword within the target characters, so that the user can find the corresponding page in the query system from the position of the keyword and modify the keyword there. The preset keyword in the embodiment of the present application also supports global configuration, that is, the keyword can be changed by modifying a global parameter. Dynamic configuration is supported: after the keyword is modified, the system still performs automatic detection according to the modified keyword.
In some embodiments, step S500 specifically includes the following steps: acquiring the preset keyword; sequentially acquiring the single characters of the target characters in the order in which the target characters are arranged; and matching each single character of the target characters with the keyword to obtain a matching result. If the matching result is that target characters consistent with the keyword are found, the target characters corresponding to the keyword are obtained. It should be noted that, in this embodiment of the application, if target characters consistent with the keyword can be found in the target characters, the detection result is the target characters consistent with the keyword; if no target characters consistent with the keyword can be found, the detection result is failure, that is, the target characters do not contain the keyword.
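The matching step can be sketched as a scan over the target characters in reading order that records where the keyword occurs, which also gives the position used to locate the keyword. This is a minimal sketch under the assumption that matching means exact substring comparison; the function name `find_keyword` is hypothetical and does not come from the application.

```python
from typing import List


def find_keyword(target_text: str, keyword: str) -> List[int]:
    """Return the start index of every occurrence of `keyword` in `target_text`.

    An empty list corresponds to the "failure" detection result.
    """
    positions: List[int] = []
    if not keyword:
        return positions
    # Walk the target characters one by one, in their arrangement order.
    for i in range(len(target_text) - len(keyword) + 1):
        if target_text[i:i + len(keyword)] == keyword:
            positions.append(i)
    return positions
```

Because the keyword is passed in as an argument, changing a globally configured keyword only changes the value supplied to the function, and detection continues to work with the new value.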
The embodiments of the present disclosure provide a keyword detection method. A login page of the query system is identified according to path information in a preset login script, and the login information in the login script is entered into the login page to log in to the query system. The user therefore does not need to find the login page and enter the login information manually, and automatic login to the query system is achieved. After login, screenshot processing is automatically performed on the browsing pages of the query system to obtain the target page screenshot, character recognition is performed on the target page screenshot to recognize the target characters, and finally the target characters are matched with the preset keyword to obtain the detection result corresponding to the keyword. From the detection result the user can tell whether the query system contains the keyword, which improves the efficiency of keyword detection.
The embodiment of the present disclosure further provides a keyword detection apparatus 600, which can implement the keyword detection method, and as shown in fig. 8, the apparatus includes: a login module 601 and a detection module 602;
the login module 601 is specifically configured to: identifying a login page of the query system according to path information in a preset login script, wherein the path information is used for indicating a path for logging in the login page of the query system; inputting login information in the login script into a login page to log in the query system;
the detection module 602 is specifically configured to: performing screenshot processing on a browsing page of the query system to obtain a target page screenshot; performing character recognition on the screenshot of the target page to recognize the target characters; and matching the target characters with preset keywords to obtain a detection result corresponding to the keywords.
By implementing the keyword detection method described above, the keyword detection apparatus 600 according to the embodiment of the present disclosure identifies the login page of the query system according to the path information in the preset login script and enters the login information in the login script into the login page to log in to the query system. The user therefore does not need to find the login page and enter the login information manually, and automatic login to the query system is achieved. After login, the apparatus automatically performs screenshot processing on the browsing pages of the query system to obtain the target page screenshot, performs character recognition on the target page screenshot to recognize the target characters, and finally matches the target characters with the preset keyword to obtain the detection result corresponding to the keyword. From the detection result the user can tell whether the query system contains the keyword, which improves the efficiency of keyword detection.
An embodiment of the present disclosure further provides an electronic device comprising a memory and a processor, wherein the memory stores a program which, when executed by the processor, causes the processor to perform the method according to any of the embodiments of the present application.
The hardware structure of the electronic device, which may include a processor 701 and a memory 702, is described in detail below in conjunction with fig. 9.
The processor 701 may be implemented by a general Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute a relevant program to implement the technical solution provided by the embodiment of the present disclosure;
the memory 702 may be implemented in a form of a Read Only Memory (ROM), a static storage device, a dynamic storage device, or a Random Access Memory (RAM). The memory 702 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present disclosure is implemented by software or firmware, the relevant program codes are stored in the memory 702 and called by the processor 701 to execute the keyword detection method according to the embodiments of the present disclosure;
the electronic device may also include an input/output interface 703, a communication interface 704, and a bus 705.
An input/output interface 703 for realizing information input and output;
the communication interface 704 is used for realizing communication interaction between the device and other devices, and can realize communication in a wired manner (for example, USB, network cable, etc.) or in a wireless manner (for example, mobile network, WIFI, bluetooth, etc.); and
a bus 705 that transfers information between the various components of the device (e.g., the processor 701, the memory 702, the input/output interface 703, and the communication interface 704);
wherein the processor 701, the memory 702, the input/output interface 703 and the communication interface 704 are communicatively connected to each other within the device via a bus 705.
An embodiment of the present disclosure further provides a storage medium storing a program which, when executed by a processor, causes the processor to perform the method according to any one of the embodiments of the present disclosure.
The keyword detection method and apparatus, the electronic device, and the storage medium provided by the embodiments of the present disclosure identify the login page of the query system according to the path information in the preset login script and enter the login information in the login script into the login page to log in to the query system. The user therefore does not need to find the login page and enter the login information manually, and automatic login to the query system is achieved. After login, screenshot processing is automatically performed on the browsing pages of the query system to obtain the target page screenshot, character recognition is performed on the target page screenshot to recognize the target characters, and finally the target characters are matched with the preset keyword to obtain the detection result corresponding to the keyword. From the detection result the user can tell whether the query system contains the keyword, which improves the efficiency of keyword detection.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The embodiments described in the embodiments of the present disclosure are for more clearly illustrating the technical solutions of the embodiments of the present disclosure, and do not constitute a limitation to the technical solutions provided in the embodiments of the present disclosure, and it is obvious to those skilled in the art that the technical solutions provided in the embodiments of the present disclosure are also applicable to similar technical problems with the evolution of technology and the emergence of new application scenarios.
Those skilled in the art will appreciate that the solutions shown in fig. 1-3 and 6 are not intended to limit the embodiments of the present disclosure, and may include more or less steps than those shown, or may combine some of the steps, or may be different steps.
The above-described embodiments of the apparatus are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may also be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
One of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one" means one or more, "a plurality" means two or more. "and/or" for describing an association relationship of associated objects, indicating that there may be three relationships, e.g., "a and/or B" may indicate: only A, only B and both A and B are present, wherein A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of single item(s) or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the part of the technical solution of the present application that in essence contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes multiple instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing programs, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The preferred embodiments of the present disclosure have been described above with reference to the accompanying drawings; this description does not limit the scope of the claims of the embodiments of the present disclosure. Any modifications, equivalents and improvements made by those skilled in the art within the scope and spirit of the embodiments of the present disclosure shall fall within the scope of the claims of the embodiments of the present disclosure.

Claims (10)

1. A keyword detection method is characterized by comprising the following steps:
identifying a login page of a query system according to path information in a preset login script, wherein the path information is used for indicating a path of the login page of the query system;
logging in the query system according to the login information in the login script;
performing screenshot processing on a browsing page of the query system to obtain a target page screenshot;
carrying out character recognition on the screenshot of the target page to recognize target characters;
and matching the target characters with preset keywords to obtain a detection result corresponding to the keywords.
2. The method of claim 1, wherein the identifying a login page of a query system according to path information in a preset login script comprises:
identifying a login button of the login page according to the path information;
the logging in the query system according to the logging information in the logging script comprises the following steps:
calling login information in the login script and filling the login information into a preset login frame in the login page;
and triggering the login button through the login script so as to log in the query system.
3. The method of claim 2, wherein identifying the login button of the login page according to the path information comprises:
acquiring attribute information of the login button; wherein the attribute information is information capable of uniquely identifying the login button;
acquiring a button path of the login button from the path information according to the attribute information;
and identifying the login button according to the button path.
4. The method of claim 1, wherein the performing screenshot processing on a browse page of the query system to obtain a target page screenshot comprises:
entering a first-level menu interface of the query system;
determining whether a current menu in the primary menu interface is browsed;
if the current menu is not browsed, opening all secondary menus under the current menu;
and taking the page screenshots of all secondary menus under the current menu as the target page screenshots.
5. The method of any one of claims 1 to 4, wherein the performing text recognition on the target page screenshot to identify a target text comprises:
carrying out binarization processing on the target page screenshot to obtain a binarization image;
extracting target character information from the binary image according to the pixel gray value;
comparing the target character information with preset character information to obtain a comparison result;
and if the character information is the same as the preset character information, converting the character information into the target characters.
6. The method according to claim 5, wherein the extracting target character information from the binarized image according to pixel gray-scale values comprises:
dividing the binary image into a plurality of initial areas according to the pixel gray value; wherein the pixel grey values of all pixels of the initial region are the same;
extracting regions with zero pixel gray values from the plurality of initial regions as character regions;
performing character recognition on the character area to obtain initial character information;
and according to the character interval condition of the character area, carrying out segmentation processing on the initial character information to obtain the target character information.
7. The method according to any one of claims 1 to 4, wherein the matching the target text with a preset keyword to obtain a detection result corresponding to the keyword comprises:
acquiring a preset keyword;
sequentially acquiring single characters of the target characters according to the character sequencing sequence of the target characters;
matching each single character of the target character with the keyword to obtain a matching result;
and if the matching result is that the target character consistent with the keyword is found, obtaining the target character of the corresponding keyword.
8. A keyword detection apparatus, comprising:
a login module configured to:
identify a login page of a query system according to path information in a preset login script, wherein the path information indicates the path of the login page of the query system;
and log in to the query system according to login information in the login script;
and a detection module configured to:
take a screenshot of a browsing page of the query system to obtain a target page screenshot;
perform text recognition on the target page screenshot to identify a target text;
and match the target text with a preset keyword to obtain a detection result corresponding to the keyword.
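The two-module split of claim 8 might be organized as below. This is a structural sketch only: `LoginScript`, `LoginModule`, `DetectionModule` and the injected callables (`open_page`, `submit`, `take_screenshot`, `recognize_text`) are hypothetical names, and the browser automation and text recognition are left as caller-supplied functions because the claim does not name any particular tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LoginScript:
    path: str                   # path information for the login page
    login_info: Dict[str, str]  # login information, e.g. user name and password


class LoginModule:
    """Opens the login page named in the script and submits the login information."""

    def __init__(self, open_page: Callable[[str], None],
                 submit: Callable[[Dict[str, str]], bool]):
        self._open_page = open_page
        self._submit = submit

    def login(self, script: LoginScript) -> bool:
        self._open_page(script.path)
        return self._submit(script.login_info)


class DetectionModule:
    """Screenshots the current page, recognizes its text, and checks keywords."""

    def __init__(self, take_screenshot: Callable[[], bytes],
                 recognize_text: Callable[[bytes], str]):
        self._take_screenshot = take_screenshot
        self._recognize_text = recognize_text

    def detect(self, keywords: List[str]) -> Dict[str, bool]:
        text = self._recognize_text(self._take_screenshot())
        return {keyword: keyword in text for keyword in keywords}


# Stand-in callables so the sketch runs without a browser or OCR engine.
detector = DetectionModule(lambda: b"img", lambda image: "query failed: timeout")
print(detector.detect(["timeout", "success"]))
```

Passing the page and recognition operations in as callables keeps the sketch independent of any specific automation or OCR library.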
9. An electronic device, comprising a memory and a processor, wherein the memory stores a program which, when executed by the processor, causes the processor to carry out the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a computer, performs the method of any one of claims 1 to 7.
CN202111044573.0A 2021-09-07 2021-09-07 Keyword detection method and device, electronic equipment and storage medium Pending CN113742559A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111044573.0A CN113742559A (en) 2021-09-07 2021-09-07 Keyword detection method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111044573.0A CN113742559A (en) 2021-09-07 2021-09-07 Keyword detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113742559A (en) 2021-12-03

Family

ID=78736598

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111044573.0A Pending CN113742559A (en) 2021-09-07 2021-09-07 Keyword detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113742559A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115826991A (en) * 2023-02-14 2023-03-21 江西曼荼罗软件有限公司 Software script generation method, system, computer and readable storage medium

Similar Documents

Publication Publication Date Title
CA2917256C (en) Screenshot-based e-commerce
CN111767228B (en) Interface testing method, device, equipment and medium based on artificial intelligence
US20190188729A1 (en) System and method for detecting counterfeit product based on deep learning
CN107798001B (en) Webpage processing method, device and equipment
CN109977337B (en) Webpage design comparison method, device and equipment and readable storage medium
CN114120299A (en) Information acquisition method, device, storage medium and equipment
JP2019079347A (en) Character estimation system, character estimation method, and character estimation program
CN114005126A (en) Table reconstruction method and device, computer equipment and readable storage medium
CN113742559A (en) Keyword detection method and device, electronic equipment and storage medium
CN116610304B (en) Page code generation method, device, equipment and storage medium
US20200364034A1 (en) System and Method for Automated Code Development and Construction
CN113094287A (en) Page compatibility detection method, device, equipment and storage medium
CN112613367A (en) Bill information text box acquisition method, system, equipment and storage medium
CN116703526A (en) Article recommendation method, device, equipment and storage medium
CN113806667B (en) Method and system for supporting webpage classification
CN111459774A (en) Method, device and equipment for acquiring flow of application program and storage medium
CN116185812A (en) Automatic testing method, device and medium for software system functions
CN113886906A (en) CAD drawing loading method, font file replacing method, device and storage medium
CN113705559A (en) Character recognition method and device based on artificial intelligence and electronic equipment
CN111291738A (en) Element extraction method and device in front-end page image and electronic equipment
CN112070092A (en) Verification code parameter acquisition method and device
CN112784189A (en) Method and device for identifying page image
CN110851349A (en) Page abnormal display detection method, terminal equipment and storage medium
CN116594916B (en) Page control positioning method, device and storage medium
CN115688083B (en) Method, device and equipment for identifying image-text verification code and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination