CN115410201A - Method, device and related equipment for processing verification code characters - Google Patents


Info

Publication number
CN115410201A
Authority
CN
China
Prior art keywords
target
picture
character
verification code
recognition model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110571650.1A
Other languages
Chinese (zh)
Inventor
郑少胤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Cloud Computing Beijing Co Ltd
Original Assignee
Tencent Cloud Computing Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Cloud Computing Beijing Co Ltd
Priority to CN202110571650.1A
Publication of CN115410201A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9558 Details of hyperlinks; Management of linked annotations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G06F21/36 User authentication by graphic or iconic representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The embodiments of the present application disclose a method, an apparatus and related equipment for processing verification code characters. The method comprises: determining a target verification code picture to be verified by means of a webpage crawler strategy and a service link of a target service object captured from a service platform; inputting the target verification code picture into a target character recognition model, and recognizing, by the target character recognition model, the target verification code characters in the target verification code picture; inputting the target verification code characters into a character input area in a simulated manner, and, in response to a simulated submission operation for the target verification code characters, performing character verification on the target verification code characters to obtain a character verification result; and, if the character verification result indicates that the verification is successful, acquiring an operation certificate picture bound to the service link, and acquiring, from the operation certificate picture, the main body information of the operation main body to which the target service object belongs. The present application can improve the real-time performance and accuracy of character extraction, and can improve the efficiency of acquiring the main body information.

Description

Method, device and related equipment for processing verification code characters
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for processing verification code characters, and a related device.
Background
At present, in some service scenarios (for example, a main body information lookup scenario), if a user (for example, user A) needs to look up certain main body information through a user terminal, the verification code on the service display interface currently displayed by the user terminal can be solved automatically by means of a manual coding platform; once the verification passes, the user terminal is allowed to load the main body information that user A needs to query.
For example, when the verification code is solved through a third-party manual coding platform, external personnel employed by the third party usually have to recognize and enter the verification code manually, so the main body information required by user A can only be acquired after a certain manual interaction time. It should be understood that, during manual recognition, once the characters of the verification code to be recognized are distorted or stuck together, the subjectivity of manual recognition may lead to misrecognition, which reduces the accuracy of verification code recognition. In addition, when user A needs to look up a large amount of main body information in batches, the manual interaction time of the third party inevitably increases, which reduces the efficiency with which user A acquires the main body information.
Disclosure of Invention
The application provides a method, a device and related equipment for processing verification code characters, which can improve the real-time performance and accuracy of character extraction, and further improve the acquisition efficiency of main body information.
An embodiment of the present application provides a method for processing a verification code character, including:
capturing a service link of a target service object on a service platform through a webpage crawler strategy, and determining a target verification code picture to be verified based on the service link and the webpage crawler strategy;
acquiring a target character recognition model associated with the service platform, inputting the target verification code picture into the target character recognition model, and recognizing the target verification code picture by the target character recognition model to obtain the target verification code characters in the target verification code picture;
inputting the target verification code characters, in a simulated manner, into a character input area corresponding to the target verification code picture, and, in response to a simulated submission operation for the target verification code characters in the character input area, performing character verification on the target verification code characters to obtain a character verification result;
and if the character verification result indicates that the verification is successful, acquiring an operation certificate picture bound with the service link, and acquiring the main body information of the operation main body to which the target service object belongs from the operation certificate picture.
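The four steps above can be read as a short pipeline. The following Python sketch only illustrates the control flow; the `PipelineSteps` container, its field names, and `process_service_link` are hypothetical names introduced here, and each callable stands in for a component (crawler, recognition model, simulated submission, certificate recognition) whose real implementation is not given at this point in the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PipelineSteps:
    """Hypothetical callables standing in for the components of the method."""
    fetch_captcha: Callable[[str], bytes]      # service link -> verification code picture
    recognize: Callable[[bytes], str]          # picture -> verification code characters
    submit: Callable[[str, str], bool]         # (service link, characters) -> verified?
    fetch_certificate: Callable[[str], bytes]  # service link -> operation certificate picture
    extract_subject: Callable[[bytes], dict]   # certificate picture -> main body information


def process_service_link(link: str, steps: PipelineSteps) -> Optional[dict]:
    """Run the steps in order; return None when character verification fails."""
    captcha_picture = steps.fetch_captcha(link)
    characters = steps.recognize(captcha_picture)
    if not steps.submit(link, characters):
        return None
    certificate_picture = steps.fetch_certificate(link)
    return steps.extract_subject(certificate_picture)
```

Passing the components in as callables keeps the sketch independent of any particular crawler or model library; a caller would supply its own implementations for each field.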
An embodiment of the present application provides an apparatus for processing a verification code character, including:
the target picture determining module is used for capturing a service link of a target service object on the service platform through a webpage crawler strategy and determining a target verification code picture to be verified based on the service link and the webpage crawler strategy;
the target character recognition module is used for acquiring a target character recognition model associated with the service platform, inputting the target verification code picture into the target character recognition model, and recognizing the target verification code picture by the target character recognition model to obtain the target verification code characters corresponding to the target verification code picture;
the character submitting module is used for inputting the target verification code characters to a character input area corresponding to the target verification code picture in a simulation mode, responding to the simulation submitting operation aiming at the target verification code characters in the character input area, and performing character verification on the target verification code characters to obtain a character verification result;
and the main body information acquisition module is used for acquiring an operation certificate picture bound with the service link if the character verification result indicates that the verification is successful, and acquiring the main body information of the operation main body to which the target service object belongs from the operation certificate picture.
Wherein, the target picture determining module comprises:
the link capturing unit is used for acquiring a webpage crawler strategy associated with the service platform and capturing link information of the N service objects from the service platform through the webpage crawler strategy; n is a positive integer;
the link determining unit is used for acquiring a target business object from the N business objects and using the link information of the target business object as the business link of the target business object;
the link analysis unit is used for analyzing the service link to obtain an operation link of an operation subject to which the target service object belongs, and outputting an operation display interface of the operation subject based on the operation link; the operation display interface comprises a query interface for querying an operation certificate picture of an operation subject;
and the verification interface output unit is used for responding to the simulation trigger operation aiming at the query interface, outputting a simulation verification interface corresponding to the operation display interface, and acquiring a target verification code picture to be verified from the simulation verification interface through a webpage crawler strategy.
Wherein, the verification interface output unit includes:
the interface triggering subunit is used for responding to the simulation triggering operation aiming at the query interface and switching the operation display interface into a simulation verification interface; the simulation verification interface comprises a target display area; the target display area is used for displaying a target verification code picture associated with the operation certificate picture;
the first query subunit is used for querying, through a first picture acquisition rule in the webpage crawler strategy, the data structure tree corresponding to the simulation verification interface for a verification picture address character that matches a first key element in the first picture acquisition rule;
and the verification picture extracting subunit is used for extracting the target verification code picture from the target display area based on the verification picture address character.
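As an illustration of the query described above, the sketch below walks the tags of a page and returns the address of the first picture whose attribute matches a key element. It assumes that the "data structure tree" is the page's HTML element tree and that the key element is an attribute name and value on an `img` tag; both are assumptions made for this example, as are the names `ImageAddressFinder` and `find_picture_address`. Only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser
from typing import Optional


class ImageAddressFinder(HTMLParser):
    """Records the src of the first <img> whose key attribute has the key value."""

    def __init__(self, key_attr: str, key_value: str) -> None:
        super().__init__()
        self.key_attr = key_attr
        self.key_value = key_value
        self.address: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Skip non-image tags, and stop looking once a match has been found.
        if tag != "img" or self.address is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get(self.key_attr) == self.key_value:
            self.address = attr_map.get("src")


def find_picture_address(html: str, key_attr: str, key_value: str) -> Optional[str]:
    """Return the picture address matching the rule, or None if nothing matches."""
    finder = ImageAddressFinder(key_attr, key_value)
    finder.feed(html)
    return finder.address
```

For a page fragment such as `<div><img id="captcha-img" src="/captcha/123.png"></div>`, calling `find_picture_address(fragment, "id", "captcha-img")` yields the `src` value, which the crawler could then download as the target verification code picture.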
Wherein the target character recognition module includes:
the target image acquisition unit is used for acquiring a target character recognition model associated with the service platform, denoising the target verification code image and taking the denoised target verification code image as a to-be-processed image;
the convolution characteristic extraction unit is used for inputting the picture to be processed into a convolution neural network in the target character recognition model, extracting the image convolution characteristic of the picture to be processed by the convolution neural network in the target character recognition model, and taking the extracted image convolution characteristic as the target image convolution characteristic of the target verification code picture;
the sequence feature extraction unit is used for inputting the convolution features of the target image into a recurrent neural network in the target character recognition model, extracting the sequence features in the convolution features of the target image by the recurrent neural network in the target character recognition model, and taking the extracted character sequence features as the target character sequence features corresponding to the target verification code image;
and the sequence feature alignment unit is used for inputting the target character sequence features into a character classification network in the target character recognition model, aligning the target character sequence features through a connectionist temporal classification (CTC) network in the character classification network, and obtaining the target verification code characters in the target verification code picture based on the aligned target character sequence features.
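The alignment step can be illustrated with greedy decoding of per-time-step class scores, a common way of turning the output of a temporal classification layer into a character string. The text does not state which decoding strategy is used, so the greedy approach, the blank index 0, and the function name `ctc_greedy_decode` are assumptions made for this sketch.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumed index of the blank class; characters start at index 1


def ctc_greedy_decode(scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Pick the best class per time step, merge repeats, then drop blanks."""
    best_path: List[int] = [
        max(range(len(step)), key=step.__getitem__) for step in scores
    ]
    characters: List[str] = []
    previous: Optional[int] = None
    for index in best_path:
        # A class is emitted only when it is not blank and differs from the
        # class chosen at the previous time step.
        if index != BLANK and index != previous:
            characters.append(alphabet[index - 1])
        previous = index
    return "".join(characters)
```

A blank between two identical classes is what allows a doubled character to survive the merge: the path `a a blank a b` decodes to `aab`, whereas `a a a b` decodes to `ab`.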
The target character recognition model comprises a recurrent neural network, and the recurrent neural network in the target character recognition model is a bidirectional long short-term memory (LSTM) network; the bidirectional LSTM network comprises a forward LSTM network and a reverse LSTM network; the forward LSTM network comprises a memory network B_i and a memory network B_{i+1}, the memory network B_{i+1} being the next memory network after the memory network B_i; the reverse LSTM network comprises a memory network C_{i+1} and a memory network C_i, the memory network C_{i+1} being the previous memory network before the memory network C_i; i is a positive integer less than or equal to M; the number of memory networks in each of the forward LSTM network and the reverse LSTM network is M;
the sequence feature extraction unit includes:
the forward feature extraction subunit is used for acquiring a forward history hidden feature h_{i-1} associated with the memory network B_i in the forward LSTM network, inputting the target image convolution feature and the forward history hidden feature h_{i-1} into the memory network B_i, extracting, by the memory network B_i, a forward target hidden feature h_i at the i-th moment, inputting the forward target hidden feature h_i and the target image convolution feature into the memory network B_{i+1}, and extracting, by the memory network B_{i+1}, a forward target hidden feature h_{i+1} at the (i+1)-th moment;
the reverse feature extraction subunit is used for acquiring a reverse history hidden feature k_{i+1} associated with the memory network C_{i+1} in the reverse LSTM network, inputting the target image convolution feature and the reverse history hidden feature k_{i+1} into the memory network C_{i+1}, extracting, by the memory network C_{i+1}, a reverse target hidden feature k_i at the (i+1)-th moment, inputting the reverse target hidden feature k_i and the target image convolution feature into the memory network C_i, and extracting, by the memory network C_i, a reverse target hidden feature k_{i-1} at the i-th moment;
the feature splicing subunit is used for splicing the forward target hidden feature h_i extracted by the memory network B_i at the i-th moment with the reverse target hidden feature k_{i-1} extracted by the memory network C_i at the i-th moment to obtain a first splicing feature, and splicing the forward target hidden feature h_{i+1} extracted by the memory network B_{i+1} at the (i+1)-th moment with the reverse target hidden feature k_i extracted by the memory network C_{i+1} at the (i+1)-th moment to obtain a second splicing feature;
and the sequence feature determining subunit is used for determining, based on the first splicing feature and the second splicing feature, the target character sequence feature corresponding to the target verification code picture extracted from the target image convolution feature.
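The forward pass, reverse pass, and splicing described above can be sketched with a toy recurrent cell. This is not an LSTM: the scalar `tanh` cell below is a deliberately simplified stand-in so that only the data flow is visible, and the assumption that each moment receives its own slice of the convolution feature (`xs[i]`) is made for illustration. The names `toy_cell` and `bidirectional_features` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Cell = Callable[[float, float], float]  # (input feature, previous hidden) -> new hidden


def toy_cell(weight_x: float, weight_h: float) -> Cell:
    """Simplified stand-in for one memory network (not a real LSTM cell)."""
    def step(x: float, hidden_prev: float) -> float:
        return math.tanh(weight_x * x + weight_h * hidden_prev)
    return step


def bidirectional_features(
    xs: Sequence[float], forward_cell: Cell, reverse_cell: Cell
) -> List[Tuple[float, float]]:
    """Return, for each moment, the (forward, reverse) hidden features spliced together."""
    m = len(xs)
    forward = [0.0] * m
    hidden = 0.0
    for i in range(m):              # forward network: moment 1 up to moment M
        hidden = forward_cell(xs[i], hidden)
        forward[i] = hidden
    reverse = [0.0] * m
    hidden = 0.0
    for i in range(m - 1, -1, -1):  # reverse network: moment M down to moment 1
        hidden = reverse_cell(xs[i], hidden)
        reverse[i] = hidden
    return list(zip(forward, reverse))
```

In the sketch both hidden features of a moment share the list index of that moment, whereas the text above names the reverse feature produced at the i-th moment k_{i-1}; the pairing at each moment is the same, only the labelling differs.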
Wherein, the character submitting module comprises:
the anti-crawler unit is used for acquiring an anti-crawler strategy for the target verification code characters, and acquiring the dormancy duration and the character input interval duration indicated by the anti-crawler strategy;
the character input unit is used for inputting the characters of the target verification code to a character input area corresponding to the target verification code picture in a simulated mode according to the character input interval duration within the dormancy duration; the character input area comprises a character input box and a character submission control;
and the character submitting unit is used for responding to the simulated submitting operation aiming at the character submitting control, and performing character verification on the target verification code characters displayed in the character input box to obtain a character verification result.
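One possible reading of the timing rule above is that the characters are typed one at a time, spaced by the character input interval, and that the whole sequence must fit inside the dormancy duration. The sketch below implements that reading only; the interpretation, the function names `plan_keystrokes` and `type_with_plan`, and the `send_key` callback are all assumptions made for this example.

```python
import time
from typing import Callable, List, Tuple


def plan_keystrokes(
    characters: str, dormancy_seconds: float, interval_seconds: float
) -> List[Tuple[float, str]]:
    """Return (offset in seconds, character) pairs spaced by the input interval."""
    if dormancy_seconds < 0 or interval_seconds < 0:
        raise ValueError("durations must be non-negative")
    plan = [(i * interval_seconds, ch) for i, ch in enumerate(characters)]
    if plan and plan[-1][0] > dormancy_seconds:
        raise ValueError("characters do not fit within the dormancy duration")
    return plan


def type_with_plan(
    plan: List[Tuple[float, str]],
    send_key: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Wait until each planned offset, then hand the character to send_key."""
    elapsed = 0.0
    for offset, ch in plan:
        sleep(offset - elapsed)
        elapsed = offset
        send_key(ch)
```

The `sleep` parameter defaults to `time.sleep` but can be replaced with a recording function, which keeps the timing logic testable without real delays; `send_key` would wrap whatever mechanism performs the simulated input.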
Wherein, the main body information acquisition module includes:
the operation certificate display unit is used for outputting a certificate picture display interface associated with the service platform and displaying an operation certificate picture bound with the service link in the certificate picture display interface if the character verification result indicates that the verification is successful;
the certificate interface analysis unit is used for analyzing the certificate picture display interface through a webpage crawler strategy associated with the service platform to obtain an operation certificate picture in the certificate picture display interface;
the optical model calling unit is used for taking the operation certificate picture as a picture to be acquired and calling the optical character recognition model through the optical character recognition interface;
and the main body information acquisition unit is used for identifying the picture to be acquired through the optical character identification model, and taking the optical character information identified from the picture to be acquired as the main body information of the operation main body to which the target business object acquired from the operation certificate picture belongs.
Wherein, the voucher interface analyzing unit comprises:
the second query subunit is used for searching a certificate picture address character matched with a second key element in a second picture acquisition rule in a data structure tree of the certificate picture display interface through the second picture acquisition rule in the webpage crawler policy associated with the service platform;
and the operation picture extracting subunit is used for extracting the operation certificate picture from the certificate display area in the certificate picture display interface based on the certificate picture address character.
Wherein, the apparatus further includes:
the association relationship establishing module is used for establishing an association relationship between the main body information and the service link in a main body information base associated with the operation main body, and updating the association relationship into the main body information base;
and the service link updating module is used for taking the service link carrying the association relationship as a webpage updating link and updating the service link of the target service object into the webpage updating link on the service platform.
Wherein, the apparatus further includes:
the illegal object detection module is used for, when it is detected that the target service object under the webpage updating link belongs to an illegal object in a blacklist, taking the operation main body indicated by the main body information as an illegal operation main body based on the association relationship between the main body information and the service link;
and the notification information generation module is used for generating notification information associated with the illegal operation main body and sending the notification information to the supervision terminal corresponding to the platform supervision personnel associated with the service platform.
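The association and blacklist modules above amount to a lookup from service link to main body information, followed by a filter on blacklisted objects. The sketch below uses an in-memory dictionary as a stand-in for the main body information base; the class `SubjectInfoBase`, the function `find_violations`, and the notification fields are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SubjectInfoBase:
    """In-memory stand-in for the main body information base."""
    by_link: Dict[str, dict] = field(default_factory=dict)

    def associate(self, service_link: str, subject_info: dict) -> None:
        """Record the association between a service link and its main body information."""
        self.by_link[service_link] = subject_info


def find_violations(
    base: SubjectInfoBase, link_to_object: Dict[str, str], blacklist: Set[str]
) -> List[dict]:
    """Build one notification per blacklisted object whose link has an association."""
    notifications = []
    for link, service_object in link_to_object.items():
        if service_object in blacklist and link in base.by_link:
            notifications.append(
                {
                    "service_link": link,
                    "service_object": service_object,
                    "operation_subject": base.by_link[link],
                }
            )
    return notifications
```

Each returned dictionary carries the data a notification to the supervision terminal would need; how the notification is actually delivered is outside the scope of the sketch.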
An embodiment of the present application provides a method for processing a verification code character, including:
acquiring an original sample picture for training an initial character recognition model; the original sample picture is determined based on the sample link and the webpage crawler policy; the sample link is captured from the service platform through a webpage crawler strategy;
acquiring a data enhancement strategy for enhancing data of an original sample picture, and performing data enhancement processing on the original sample picture based on the data enhancement strategy to obtain at least one enhanced sample picture associated with the original sample picture;
taking an original sample picture and at least one enhanced sample picture as target sample pictures, and taking marked verification code characters corresponding to the original sample picture as sample labels of the target sample pictures;
inputting a target sample picture into an initial character recognition model, recognizing the target sample picture by the initial character recognition model, and taking a sample verification code character recognized from the target sample picture as a prediction label;
performing iterative training on the initial character recognition model based on the prediction label and the sample label to obtain a target character recognition model for recognizing target verification code characters in a target verification code picture; the target verification code character is used for acquiring an operation certificate picture bound with a service link associated with the target verification code picture after character verification is carried out; the operation certificate picture is used for acquiring the subject information of the operation subject to which the target business object indicated by the business link belongs.
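The sample preparation steps above (enhance the original picture, then reuse the annotated characters of the original as the label of every resulting picture) can be sketched as follows. The two enhancement operations shown, a horizontal shift and random black/white noise, are assumptions chosen for illustration, since the data enhancement strategy is not specified at this point; pictures are modelled as nested lists of grey values, and all function names are hypothetical.

```python
import random
from typing import List, Tuple

Picture = List[List[int]]  # rows of grey values in the range 0..255


def shift_right(picture: Picture, pixels: int, fill: int = 255) -> Picture:
    """Shift every row to the right, padding the left edge with the fill value."""
    shifted = []
    for row in picture:
        n = min(pixels, len(row))
        shifted.append([fill] * n + row[: len(row) - n])
    return shifted


def add_noise(picture: Picture, rate: float, rng: random.Random) -> Picture:
    """Replace each pixel with black or white with probability `rate`."""
    return [
        [rng.choice((0, 255)) if rng.random() < rate else value for value in row]
        for row in picture
    ]


def build_target_samples(
    original: Picture, label: str, rng: random.Random
) -> List[Tuple[Picture, str]]:
    """Pair the original and each enhanced picture with the original's label."""
    enhanced = [shift_right(original, 1), add_noise(original, 0.1, rng)]
    return [(picture, label) for picture in [original] + enhanced]
```

Because every enhanced picture inherits the label of its original, only the originals need manual annotation; the training loop itself (comparing prediction labels with sample labels and updating the model) is not shown.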
An embodiment of the present application provides an apparatus for processing a verification code character, including:
the original image acquisition module is used for acquiring an original sample image used for training an initial character recognition model; the original sample picture is determined based on the sample link and the webpage crawler policy; the sample link is captured from the service platform through a webpage crawler strategy;
the enhanced picture generation module is used for acquiring a data enhancement strategy for performing data enhancement on the original sample picture, and performing data enhancement processing on the original sample picture based on the data enhancement strategy to obtain at least one enhanced sample picture associated with the original sample picture;
the sample picture determining module is used for taking the original sample picture and at least one enhanced sample picture as target sample pictures and taking the marked verification code characters corresponding to the original sample picture as sample labels of the target sample pictures;
the sample character recognition module is used for inputting the target sample picture into the initial character recognition model, recognizing the target sample picture by the initial character recognition model and taking the sample verification code characters recognized from the target sample picture as a prediction label;
the model training module is used for carrying out iterative training on the initial character recognition model based on the prediction label and the sample label to obtain a target character recognition model for recognizing target verification code characters in a target verification code picture; the target verification code character is used for acquiring an operation certificate picture bound with a service link associated with the target verification code picture after character verification is carried out; the operation certificate picture is used for acquiring the subject information of the operation subject to which the target business object indicated by the business link belongs.
An aspect of an embodiment of the present application provides a computer device, where the computer device includes: a processor and a memory;
a processor is connected to the memory, wherein the memory is used for storing the computer program and the processor is used for calling the computer program to make the computer device execute the method in any aspect of the embodiment of the present application.
In one aspect, embodiments of the present application provide a computer-readable storage medium, in which a computer program is stored, the computer program being adapted to be loaded and executed by a processor, so as to enable a computer device having the processor to execute the method in any aspect of the embodiments of the present application.
An aspect of embodiments of the present application provides a computer program product or a computer program, which includes computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes the method in any aspect of the embodiment of the present application.
In the embodiments of the present application, by introducing the target character recognition model, the computer device can intelligently extract the target verification code characters from the target verification code picture, which improves the real-time performance and accuracy of character extraction. Further, the computer device can input the target verification code characters, in a simulated manner, into the character input area corresponding to the target verification code picture, and can perform character verification on the target verification code characters in response to a simulated submission operation for the target verification code characters in the character input area, to obtain a character verification result. If the character verification result indicates that the verification is successful, the computer device can acquire the operation certificate picture bound to the service link, and acquire, from the operation certificate picture, the main body information of the operation main body to which the target service object belongs. Therefore, once the target verification code characters have been recognized, the embodiments of the present application can intelligently verify them, so that main body information (for example, business license main body information) can be quickly extracted from the operation certificate picture (for example, a business license picture) when the verification succeeds, which improves the efficiency of acquiring the main body information.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic structural diagram of a network architecture according to an embodiment of the present application;
FIG. 2 is a schematic view of a scene of intelligently recognizing verification code characters and intelligently acquiring main body information according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating a method for processing verification code characters according to an embodiment of the present disclosure;
FIG. 4 is a schematic view of a scene for acquiring a target verification code picture according to an embodiment of the present disclosure;
FIG. 5 is a schematic view of a target character recognition model according to an embodiment of the present application;
FIG. 6 is a flowchart illustrating a method for processing verification code characters according to an embodiment of the present disclosure;
FIG. 7 is a schematic view of a scene of a data collection scheme based on a web crawler according to an embodiment of the present application;
FIG. 8 is a block diagram of a method for processing verification code characters according to an embodiment of the present disclosure;
FIG. 9 is a schematic view of a scene in which an initial character recognition model is trained to obtain a target character recognition model according to an embodiment of the present application;
FIG. 10 is a schematic view of a scene in which a sample label is obtained through an attention mechanism network according to an embodiment of the present application;
FIG. 11 is a block diagram of an apparatus for processing verification code characters according to an embodiment of the present disclosure;
FIG. 12 is a block diagram of an apparatus for processing verification code characters according to an embodiment of the present disclosure;
FIG. 13 is a schematic diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly comprises computer vision technology, voice processing technology, natural language processing technology, machine learning/deep learning, and the like.
The scheme provided by the embodiments of the present application belongs to Machine Learning (ML) in the field of artificial intelligence. Machine learning is an interdisciplinary subject that draws on probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other fields.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a network architecture according to an embodiment of the present disclosure. As shown in fig. 1, the network architecture is suitable for an e-commerce operation system, and the e-commerce operation system may specifically include a computer device 2000, a first service server 10a, a first service platform user terminal cluster, a second service server 10b, and a second service platform user terminal cluster. It can be understood that the first platform user terminal cluster herein may be a user terminal cluster under a first service platform, and the second platform user terminal cluster herein may be a user terminal cluster under a second service platform. It should be understood that the first platform user terminal cluster and the second platform user terminal cluster may each include one or more user terminals, and here, the number of the user terminals in the user terminal cluster corresponding to each service platform will not be limited.
As shown in fig. 1, the first platform user terminal cluster may include a plurality of user terminals under the first service platform, and the plurality of user terminals under the first service platform may specifically include a user terminal 3000a, a user terminal 3000b, a user terminal 3000c, …, and a user terminal 3000n shown in fig. 1. As shown in fig. 1, the user terminal 3000a, the user terminal 3000b, the user terminal 3000c, …, and the user terminal 3000n may each be in network connection with the first service server 10a, so that each user terminal in the first platform user terminal cluster may perform data interaction with the first service server 10a through the network connection. For example, each user terminal in the first platform user terminal cluster may perform data interaction with the first service server 10a through the first application client (e.g., shopping client A), so that when a user terminal accesses the first service server 10a through the shopping client A, the operation display interface of the operation subject (e.g., a virtual shop) registered by that user terminal on the first service platform can be returned to it. It can be understood that each user terminal in the first platform user terminal cluster may be an intelligent terminal with document loading and displaying functions, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, or a smart television.
Similarly, as shown in fig. 1, the second platform user terminal cluster may include a plurality of user terminals under the second service platform, and the plurality of user terminals under the second service platform may specifically include the user terminal 4000a, the user terminal 4000b, the user terminal 4000c, …, and the user terminal 4000n shown in fig. 1. As shown in fig. 1, the user terminal 4000a, the user terminal 4000b, the user terminal 4000c, …, and the user terminal 4000n may each be in network connection with the second service server 10b, so that each user terminal in the second platform user terminal cluster may perform data interaction with the second service server 10b through the network connection. For example, each user terminal in the second platform user terminal cluster may perform data interaction with the second service server 10b through the second application client (e.g., shopping client B), so that when a user terminal accesses the second service server 10b through the shopping client B, the operation display interface of the operation subject (e.g., a virtual shop) registered by that user terminal on the second service platform can be returned to it. It can be understood that each user terminal in the second platform user terminal cluster may be an intelligent terminal with document loading and displaying functions, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, or a smart television.
As shown in fig. 1, both the first service server 10a corresponding to the first service platform and the second service server 10b corresponding to the second service platform may be connected to the computer device 2000 via a network, so that the computer device 2000 may perform data interaction with the first service server 10a and the second service server 10b via a commodity clue (e.g., a commodity type) indicated by the web crawler policy, so as to capture link information of a service object matching the commodity clue from the service servers corresponding to the service platforms, for example, the computer device 2000 may capture a commodity link of a commodity matching the specified commodity type from the first service platform and/or the second service platform, respectively.
It is understood that the merchandise links captured by the computer device 2000 may be from the same service platform or from different service platforms. For convenience of understanding, in the embodiment of the present application, link information of a service object (for example, a product S1) currently captured from a certain service platform (for example, a first service platform) is taken as an example to illustrate a specific process of determining a target verification code picture to be verified through the captured link information of the service object (for example, the product S1) and a web crawler policy.
It should be understood that after the computer device 2000 acquires the target captcha picture, the target character recognition model may be further acquired according to an intelligent data acquisition policy, so as to intelligently recognize the target captcha character in the target captcha picture through the target character recognition model, and then the recognized target captcha character may be input to the character input area in a simulated manner, so as to simulate submission of the target captcha character. At this time, the computer device 2000 may perform character verification on the submitted target authentication code character, and may further intelligently obtain an operation credential picture (e.g., a business license picture) bound with the link information of the business object (e.g., the product S1) when the verification is successful, so as to acquire the subject information of an operation subject (e.g., a virtual shop) to which the business object (e.g., the product S1) belongs from the operation credential picture (e.g., the business license picture). Because the execution process (for example, the character extraction process, the character input process and the character verification process) of the whole data acquisition scheme does not need manual participation, the manual interaction duration can be further reduced fundamentally, and the acquisition efficiency of the main body information can be improved.
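The unattended flow described above (acquire the picture, recognize the characters, simulate input and submission, then fetch the credential picture on success) can be sketched as follows. This is only an illustration under stated assumptions: the function `collect_subject_info`, its callable parameters, and the bounded retry loop are hypothetical and are not specified by the present application.

```python
from typing import Callable, Optional


def collect_subject_info(
    fetch_captcha: Callable[[], bytes],
    recognize: Callable[[bytes], str],
    submit: Callable[[str], bool],
    fetch_credential: Callable[[], bytes],
    extract_subject: Callable[[bytes], dict],
    max_attempts: int = 3,  # assumption: retry count is not given in the source
) -> Optional[dict]:
    """Run the recognize -> submit -> verify flow with no manual interaction."""
    for _ in range(max_attempts):
        picture = fetch_captcha()        # target verification code picture
        characters = recognize(picture)  # target verification code characters
        # Simulated input and submission; True means character verification succeeded.
        if submit(characters):
            credential = fetch_credential()  # operation credential picture
            return extract_subject(credential)  # subject information
    return None  # verification never succeeded within the allowed attempts
```

Passing the individual stages in as callables keeps the sketch independent of any particular crawler, recognition model, or service platform.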
The computer device 2000 shown in fig. 1 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform, which will not be limited herein.
For easy understanding, please refer to fig. 2, and fig. 2 is a schematic view of a scene for intelligently identifying the captcha characters and intelligently acquiring the subject information according to an embodiment of the present application. The service server 20b shown in fig. 2 may be the first service server 10a corresponding to the first service platform. As shown in fig. 2, the first service platform may be the service platform P shown in fig. 2. It is understood that the business server 20b may provide online operation services (e.g., store registration services, product listing services) for registered merchant users (e.g., enterprise organizations or individual users accessing the business platform P shown in fig. 2 through the shopping client a).
For example, when merchant users such as business organizations or individual users access a first business platform (e.g., the business platform P shown in fig. 2) through the shopping client a, store registration services may be provided for the merchant users, that is, the merchant users may perform store registration through the provided store registration services to apply for creating corresponding virtual stores on the first business platform (e.g., the business platform P shown in fig. 2). In this case, the business server 20b may configure corresponding store links for the virtual stores requested by the merchant users on the first business platform (for example, the business platform P shown in fig. 2) so that one virtual store corresponds to one store link. In this way, the merchant users can also perform product listing in the virtual stores corresponding to the respective store links by the product listing service, so that one product link can be configured for each of the S products (i.e., one virtual object) that are listed in the corresponding virtual stores. Where S is a positive integer.
It can be seen that the first business platform (for example, the business platform P shown in fig. 2) can display the virtual commodities in different virtual stores thereon, and the commodity link of the virtual commodity in each virtual store has uniqueness, so that other users (for example, purchasing users) on the first business platform (for example, the business platform P shown in fig. 2) can accurately access the virtual commodity in a certain virtual store through the commodity link with uniqueness.
In order to facilitate product security supervision on the virtual goods presented on the first service platform (e.g., the service platform P shown in fig. 2) in the e-commerce operation system, the embodiment of the present application provides that goods links of N virtual goods can be grabbed from the first service platform (e.g., the service platform P shown in fig. 2) through a goods clue (e.g., a goods type) indicated by a web crawler policy, and the grabbed N virtual goods can be collectively referred to as N service objects, so that the goods links of the N virtual goods can be collectively referred to as link information of the N service objects, where N is a positive integer.
For convenience of understanding, in the embodiment of the present application, link information of one service object is selected from link information of N service objects as a service link of a target service object, where the service link of the target service object may be a service link L1 of a service object N1 shown in fig. 2. As shown in fig. 2, the computer device 20a may capture the business link L1 of the business object N1 from the business platform P through a web crawler policy, and may further parse the business link L1 to obtain a store link of the virtual store to which the business object N1 (i.e. the target business object) belongs. It can be understood that, in the embodiment of the present application, the store links of the virtual stores analyzed by the business link L1 may be collectively referred to as operation links of the operation subject to which the target business object belongs. At this time, the computer device 20a can obtain an operation display interface (for example, a store top page of the virtual store) of the operation subject through the operation link, and it should be noted that the operation display interface includes an inquiry interface that can be used for inquiring about the subject information. The query interface may be triggered by the simulation to access the simulation verification interface. The target display area of the simulated verification interface may be used to display a verification code picture associated with operation credential picture 204a shown in fig. 2.
For convenience of understanding, the captcha pictures displayed in the target display area of the simulation verification interface may be collectively referred to as target captcha pictures. At this time, the computer device 20a may further acquire a target verification code picture from the simulated verification interface according to the first picture acquisition rule indicated by the web crawler policy. As shown in fig. 2, the acquired target verification code picture may be the verification code picture 201a shown in fig. 2. To ensure that the captcha characters 203a in the captcha picture 201a can be recognized accurately, the embodiment of the application proposes that the verification code characters in the verification code picture 201a be intelligently extracted through the character recognition model 202a shown in fig. 2.
The character recognition model 202a shown in fig. 2 may specifically include a first network and a second network. The first network specifically refers to a convolutional recurrent neural network (i.e., a CRNN network) composed of a convolutional neural network and a recurrent neural network. The convolutional neural network in the CRNN network may be a CNN (Convolutional Neural Network), and the recurrent neural network in the CRNN network may be a bidirectional long short-term memory network (Bi-directional Long Short-Term Memory network, Bi-LSTM network for short). The second network specifically refers to a character classification network having a character classification function; for example, the character classification network may be a connectionist temporal classification network (i.e., a Connectionist Temporal Classification (CTC) network) or an attention mechanism network.
For example, by introducing an end-to-end character recognition method based on the CRNN network and the CTC network in the embodiment of the present application, in the execution process of the whole data acquisition policy, a target verification code character (for example, the verification code character 203a shown in fig. 2, where the verification code character 203a may be specifically the character "3 nHf") in a target verification code picture (i.e., the verification code picture 201a shown in fig. 2) may be intelligently recognized, and as shown in fig. 2, the computer device 20a may further perform character verification on the verification code character 203a under the condition that the recognized verification code character 203a is intelligently submitted, and further may quickly acquire the operation credential picture 204a shown in fig. 2 when the character verification is successful. It should be understood that the operation certificate picture 204a may specifically be a photo of a license bound to the service link L1. At this time, the computer device may quickly and intelligently acquire the subject information of the operating subject to which the business object N1 belongs from the license photo (i.e., the operation voucher picture 204 a), and the subject information of the operating subject to which the business object N1 belongs may be the subject information 205a shown in fig. 2.
It should be understood that, once the computer device 20a has acquired the subject information of the operation subject (i.e., the subject information 205a shown in fig. 2), it may further perform legality monitoring on the subject information 205a to obtain a legality monitoring result. If the legality monitoring result indicates that the subject information 205a is not legitimate, the operation subject is an illegal operation subject. This helps a platform supervisor promptly and accurately identify illegal operation subjects on the service platform, so that violation penalties can subsequently be imposed on them in time, thereby ensuring the security and reliability of the whole e-commerce operation system. Therefore, the computer device 20a can not only perform character recognition intelligently and in real time, but also help the platform supervisor to realize centralized monitoring and management of the operation subjects on the service platform P.
Furthermore, it is understood that the computer device 20a may also add the target captcha pictures and the target captcha characters identified in the target captcha pictures to the prior knowledge base, so that the added target captcha pictures may be subsequently obtained from the prior knowledge base as the original sample pictures, and the target captcha characters in the added target captcha pictures may be obtained from the prior knowledge base as the annotated captcha characters for annotating the original sample pictures, and further, the obtained original sample pictures and the annotated captcha characters may be used as initial training sample information for further training the character recognition model 202a (i.e., the target character recognition model). Then, the computer device 20a may use the initial training sample information and the enhanced sample information fitted by the initial training sample information as target training sample information to train a new target character recognition model. The target training sample information may include a small number of labeled original sample pictures and enhanced sample pictures obtained by fitting during data enhancement, where the labeled captcha characters indicated by the sample labels are captcha characters obtained during data labeling on the original sample pictures, for example, the target captcha characters.
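How a small set of labeled original sample pictures might be expanded into target training sample information can be sketched as follows. The present application does not specify the data enhancement method, so the pixel-flipping noise used here is an assumption, as are the picture format (rows of 0/1 pixels) and the names `add_noise` and `build_training_samples`.

```python
import random
from typing import List, Tuple

Picture = List[List[int]]     # hypothetical format: rows of 0/1 pixels
Sample = Tuple[Picture, str]  # (sample picture, sample label)


def add_noise(picture: Picture, rate: float, rng: random.Random) -> Picture:
    """Return a copy of the picture with each pixel flipped with probability `rate`."""
    return [[1 - p if rng.random() < rate else p for p in row] for row in picture]


def build_training_samples(
    knowledge_base: List[Sample],
    copies: int = 2,     # enhanced pictures generated per original (assumed value)
    rate: float = 0.05,  # per-pixel flip probability (assumed value)
    seed: int = 0,
) -> List[Sample]:
    """Combine original samples with enhanced samples that reuse the same labels."""
    rng = random.Random(seed)
    samples = list(knowledge_base)  # initial training sample information
    for picture, label in knowledge_base:
        for _ in range(copies):
            # Each enhanced picture keeps the label of the original it was derived from.
            samples.append((add_noise(picture, rate, rng), label))
    return samples
```

Because every enhanced picture inherits the label of its original, no additional manual annotation is needed for the enlarged sample set.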
For convenience of understanding, in the embodiments of the present application, the character recognition models before training may be collectively referred to as initial character recognition models, and new character recognition models obtained after the initial character recognition models are trained may be collectively referred to as the aforementioned target character recognition models.
Optionally, it should be understood that, in a blockchain scenario, the computer device 20a may serve as a blockchain node. When the blockchain node trains a new target character recognition model based on the target training sample information, it may notify the other consensus nodes in the blockchain network where it is located to reach consensus on the model parameters of the newly generated target character recognition model and, once consensus is reached, upload those model parameters to the blockchain. In this way, when a new target verification code picture is obtained, the computer device 20a may obtain the model parameters of the newly generated target character recognition model from the blockchain in real time, update the model parameters of the target character recognition model in the computer device 20a accordingly, and then recognize the verification code characters in the new target verification code picture based on the updated target character recognition model.
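The consensus step can be illustrated with a deliberately simple sketch. The present application does not name a consensus algorithm, so the simple-majority vote over a SHA-256 digest of the parameters is an assumption, and `parameter_digest` and `consensus_reached` are hypothetical names.

```python
import hashlib
import json
from typing import Dict, Iterable, List


def parameter_digest(parameters: Dict[str, List[float]]) -> str:
    """Hash the model parameters in a key-order-independent way."""
    payload = json.dumps(parameters, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def consensus_reached(proposed_digest: str, votes: Iterable[str]) -> bool:
    """Return True when a strict majority of nodes report the proposed digest."""
    vote_list = list(votes)
    agreeing = sum(1 for vote in vote_list if vote == proposed_digest)
    return agreeing * 2 > len(vote_list)
```

Only when `consensus_reached` returns True would the parameters be uploaded in this sketch; a production blockchain network would rely on its own consensus protocol instead.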
The specific implementation manner of the computer device 20a obtaining the target character recognition model by training the initial character recognition model, and obtaining the target verification code characters in the target verification code picture and the main body information in the operation certificate picture by collecting through the target character recognition model may refer to the following description of the embodiments corresponding to fig. 3 to fig. 10.
Further, please refer to fig. 3, which is a schematic flowchart of a method for processing a verification code character according to an embodiment of the present application. It can be understood that the method provided in the embodiment of the present application may be executed by a computer device, where the computer device includes, but is not limited to, a user terminal or a service server running a violation detection module. For convenience of understanding, the embodiment of the present application takes the case where the computer device is a user terminal as an example, so as to illustrate a specific process of performing character extraction through a trained target character recognition model in the user terminal. As shown in fig. 3, the method may include at least the following steps S101 to S104:
Step S101, capturing a service link of a target service object on a service platform through a web crawler policy, and determining a target verification code picture to be verified based on the service link and the web crawler policy;
Specifically, the computer device may obtain a web crawler policy associated with the service platform, and capture link information of N service objects from the service platform through the web crawler policy, where N is a positive integer. Further, the computer device may obtain a target business object from the N business objects, and use the link information of the target business object as the business link of the target business object. Further, the computer device may parse the service link to obtain an operation link of the operation subject to which the target service object belongs, and may output an operation display interface of the operation subject based on the operation link, where the operation display interface comprises a query interface for querying an operation credential picture of the operation subject. Furthermore, the computer device may respond to a simulation trigger operation on the query interface, output the simulation verification interface corresponding to the operation display interface, and then obtain the target verification code picture to be verified from the simulation verification interface through the web crawler policy.
The specific process of acquiring the target verification code picture to be verified from the simulation verification interface by the computer device can be described as follows: the computer device may switch the operation display interface to the simulation verification interface in response to a simulation trigger operation for the query interface. Note that the simulation verification interface herein may contain a target display area; the target display area can be used for displaying a target verification code picture associated with the operation certificate picture; further, the computer device may query, through a first picture acquisition rule in the web crawler policy, a verification picture address symbol matched with a first key element in the first picture acquisition rule in a data structure tree corresponding to the simulation verification interface, and may further extract, based on the verification picture address symbol, a target verification code picture from the target display region.
For easy understanding, please refer to fig. 4, and fig. 4 is a schematic view of a scene for obtaining a picture of a target verification code according to an embodiment of the present application. The service link L2 shown in fig. 4 may be link information of another service object (for example, the service object N2 shown in fig. 4) captured from the service platform P shown in fig. 4 (i.e., the service platform P in the embodiment corresponding to fig. 2). For example, as shown in FIG. 4, the business link L2 may be a merchandise link capable of directly pointing to the merchandise detail display interface of the business object N2, i.e., the merchandise link may be "https:// detail.
It should be understood that, since the business object N2 is a commodity in the virtual store D shown in fig. 4, that is, there is an affiliation between the business object N2 and the virtual store D (i.e., the operation subject) shown in fig. 4, once the commodity link of the business object N2 has been captured by the web crawler policy, the computer device may parse the commodity link (i.e., the business link L2) according to the affiliation to obtain the operation link of the virtual store D to which the business object N2 belongs.
It should be understood that, as shown in fig. 4, the operation link here may be the shop link 41a shown in fig. 4. At this time, the computer apparatus can intelligently access the virtual store D through the store link 41a to output the store top page of the virtual store D. As shown in fig. 4, the shop front page of the virtual shop D may be the operation display interface 400a shown in fig. 4.
The operation display interface 400a shown in fig. 4 includes an inquiry interface for inquiring the operation voucher picture of the operation subject. Wherein, it is understood that the query interface may be the voucher viewing portal 42a for viewing enterprise qualifications shown in fig. 4; at this time, the computer device may respond to the simulation trigger operation for the credential viewing portal 42a, output the simulation verification interface 400b shown in fig. 4, and further may obtain the verification code picture to be verified from the target display area 43a of the simulation verification interface 400b through the web crawler policy.
Therein, it is understood that the simulation verification interface 400b shown in fig. 4 may include a target display area 43a, a character input area 44a, and a character submission area 45a. The target display area 43a shown in fig. 4 may be used to display a target verification code picture associated with the operation voucher picture to be viewed. The character input area 44a shown in fig. 4 may be used to simulate entry of characters of the target captcha extracted from the picture of the target captcha. Therein, the character submission area 45a shown in fig. 4 may be used to simulate submission of characters that have been entered into the character input area. It should be understood that the character input area 44a and the character submission area 45a may be displayed in a combined manner in the same area, or may be displayed in different areas in a distributed manner, which is not limited herein.
It should be appreciated that, for the business platform P (e.g., the QQ mall shopping platform) shown in fig. 4 above, the position and the code of the target verification code picture displayed in the simulation verification interface 400b are relatively fixed. Therefore, as shown in fig. 4, the computer device may identify, by the first picture acquisition rule in the web crawler policy, the position region of the target captcha picture in the simulation verification interface 400b, where the position region may be the target display region 43a shown in fig. 4. The computer device may then capture, by the first picture acquisition rule (for example, a captcha picture extraction rule, whose element extraction syntax may be the XPath path language or a CSS selector), the specific element of that position region in the data structure tree corresponding to the simulation verification interface 400b.
For example, in the data structure tree corresponding to the simulation verification interface 400b (e.g., an XML document queried with the XPath path language), the computer device may refer to the node path expression corresponding to the position region (i.e., the specific position region) as the first key element in the first picture acquisition rule, and may then query the data structure tree for the verification picture address symbol matching the first key element. It should be understood that the verification picture address symbol uniquely points to the specific position region (i.e., the target display area 43a shown in fig. 4) in the simulation verification interface 400b, so that the computer device can accurately capture the target verification code picture shown in fig. 4 from the target display area 43a in real time based on the verification picture address symbol.
For example, when the service platform P is a panning shopping platform, the specific element (i.e., the first key element) collected through the XPath path language may be div.id = "nc_1_imgcaptcha_img", and the target verification code picture shown in fig. 4 can be extracted from the target display area 43a shown in fig. 4 based on the verification picture address symbol.
For another example, optionally, when the service platform P is a small red book shopping platform, the specific element (i.e., the first key element) collected through the XPath path language may be div. It should be understood that the service platform P herein may include, but is not limited to, a QQ mall shopping platform, a naughty shopping platform, a small red book shopping platform, a kyoto shopping platform, a one-shop shopping platform, a guild shopping platform, a multi-pin shopping platform, a furs shopping platform, a game mall shopping platform, etc., and the element types of the specific elements (i.e., the first key elements required to be queried by each service platform) specified by each service platform will not be limited herein.
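The element lookup described above can be sketched with the limited XPath support in Python's standard library. The markup, the element id `captcha_area`, the example URL, and the function name `find_captcha_address` are all hypothetical, and the sketch assumes the page is well-formed XML; a real page would typically need an HTML parser.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, simplified markup for a simulation verification interface.
PAGE = """
<html>
  <body>
    <div id="captcha_area">
      <img src="https://example.com/captcha/abc123.png" />
    </div>
    <input id="captcha_input" type="text" />
    <button id="captcha_submit">Submit</button>
  </body>
</html>
"""


def find_captcha_address(page: str, element_id: str) -> Optional[str]:
    """Query the data structure tree for the picture address under the given div."""
    root = ET.fromstring(page.strip())
    # The path expression plays the role of the first key element.
    img = root.find(f".//div[@id='{element_id}']/img")
    return None if img is None else img.get("src")
```

Calling `find_captcha_address(PAGE, "captcha_area")` returns the `src` attribute of the image inside the matching region, and an id that matches nothing yields `None`.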
It is understood that the online store operator in the simulation verification interface 400b shown in fig. 4 may be the merchant user who registered the virtual store D on the service platform P. The operation credential information (e.g., the operation credential picture) of the merchant user may specifically be the business license picture, in the embodiment corresponding to fig. 2, that the merchant user recorded with and uploaded to the quality administration (e.g., the national business administration).
Step S102, acquiring a target character recognition model associated with the service platform, inputting the target verification code picture into the target character recognition model, and recognizing the target verification code picture by the target character recognition model to obtain the target verification code character in the target verification code picture;
specifically, the computer device may obtain a target character recognition model associated with the service platform, perform denoising processing on the target verification code picture, and use the denoised target verification code picture as a picture to be processed; further, the computer device can input the picture to be processed into the convolutional neural network in the target character recognition model, extract the image convolution feature of the picture to be processed through the convolutional neural network, and use the extracted image convolution feature as the target image convolution feature of the target verification code picture; further, the computer device may input the target image convolution feature into the recurrent neural network in the target character recognition model, extract the character sequence feature in the target image convolution feature through the recurrent neural network, and use the extracted character sequence feature as the target character sequence feature corresponding to the target verification code picture; further, the computer device may input the target character sequence feature into the character classification network in the target character recognition model, perform alignment processing on the target character sequence feature through the connectionist temporal classification (CTC) network in the character classification network, and obtain the target verification code character in the target verification code picture based on the aligned target character sequence feature.
Optionally, it can be understood that the specific process by which the computer device obtains the target verification code character through the target character recognition model may also be described as follows: when acquiring the target character recognition model associated with the service platform, the computer device may recognize the target verification code character in the target verification code picture by combining the first network and the second network in the target character recognition model. It should be understood that the first network herein may be the above-mentioned convolutional recurrent neural network (i.e., the above-mentioned CRNN network), and the CRNN network may specifically include the above-mentioned convolutional neural network (e.g., the above-mentioned CNN network) and the recurrent neural network (e.g., the above-mentioned Bi-LSTM). The second network may specifically be the above-mentioned character classification network; for example, the character classification network may include either the above-mentioned connectionist temporal classification network (e.g., the above-mentioned CTC network) or the above-mentioned attention mechanism network having a character classification function.
For ease of understanding, the second network is taken to be the above-mentioned connectionist temporal classification network (e.g., the above-mentioned CTC network) as an example, to illustrate the process of recognizing a variable-length verification code in the target verification code picture through the above-mentioned CRNN network and CTC network. Further, please refer to fig. 5, where fig. 5 is a scene schematic diagram of a target character recognition model according to an embodiment of the present application. As shown in fig. 5, the target verification code picture extracted by the computer device in step S101 may be the verification code picture 51a shown in fig. 5. The network 52a shown in fig. 5 may be the CNN network described above, and the CNN network may be configured to extract the image convolution feature of the verification code picture 51a; for example, the image convolution feature extracted from the verification code picture 51a by the network 52a may be the image convolution feature 53a shown in fig. 5.
Optionally, in order to improve the reliability of feature extraction, the computer device may further perform denoising processing on the verification code picture 51a (target verification code picture) shown in fig. 5 according to an image denoising strategy in advance, and further may use the verification code picture 51a after denoising processing (i.e., the target verification code picture after denoising processing) as a picture to be processed; further, the computer device may input the picture to be processed to the network 52a shown in fig. 5 (i.e., the CNN network described above), and extract the image convolution feature of the picture to be processed by the network 52a, and may further use the extracted image convolution feature (e.g., the image convolution feature 53a shown in fig. 5) as the target image convolution feature of the target verification code picture.
It should be understood that the image denoising strategy according to the embodiment of the present application may specifically include a binarization algorithm corresponding to the binarization operation and a graying algorithm corresponding to the graying operation. For example, in the embodiment of the present application, noise interference such as background texture in the verification code picture 51a can be filtered by performing binarization operation or graying operation on the verification code picture 51a shown in fig. 5.
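The graying and binarization operations can be sketched in a few lines. This is only an illustration: the luminance weights are the common ITU-R BT.601 coefficients and the threshold of 128 is an assumed value; the source does not specify either, and the tiny pixel grid is made up for the example.

```python
def to_gray(pixels):
    """Graying: convert rows of (R, G, B) tuples to rows of 0-255 luminance values."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in pixels]

def binarize(gray, threshold=128):
    """Binarization: map each gray value to 0 (below threshold) or 255 (otherwise)."""
    return [[255 if v >= threshold else 0 for v in row] for row in gray]

# Made-up 2x3 picture: dark strokes, light background, and a mid-gray texture pixel.
image = [
    [(20, 20, 20), (200, 190, 210), (90, 100, 95)],
    [(250, 250, 250), (30, 10, 40), (180, 170, 175)],
]
clean = binarize(to_gray(image))
print(clean)  # [[0, 255, 0], [255, 0, 255]]
```

After thresholding, every pixel is either fully dark or fully light, so faint background texture no longer produces intermediate values for the CNN network to pick up.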
It should be understood that, in the case where the target character recognition model is a character recognition model formed by the CRNN network and the CTC network, in order to work around the inherent limitations of the CTC network (for example, the dimension of the output feature of the network 52c should be consistent with the dimension of the input feature of the network 52c), the embodiment of the present application proposes reducing the number of pooling layers of the CNN network in the CRNN network. For example, if the CNN network originally has 3 pooling layers, then, to prevent the features from shrinking too quickly because of too many pooling layers, the embodiment of the present application may retain a CNN network with fewer pooling layers, for example, a CNN network with a single pooling layer.
Optionally, it should be understood that, if the target character recognition model is a character recognition model formed by the CRNN network and the attention mechanism network, the number of pooling layers of the CNN network in the CRNN network does not need to be reduced. In other words, the CNN network in the CRNN network may keep the usual three pooling layers.
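One way to read the pooling-layer trade-off is in terms of how many time steps remain for the CTC network. The sketch below assumes each pooling layer halves the feature width (stride 2) and uses the general CTC property that the input sequence must be at least as long as the label plus one blank between each pair of identical adjacent characters; the picture width and the label are made-up values.

```python
def time_steps(width: int, pooling_layers: int, stride: int = 2) -> int:
    """Feature width (number of time steps) left after the given pooling layers."""
    for _ in range(pooling_layers):
        width //= stride
    return width

def ctc_min_steps(label: str) -> int:
    """Fewest time steps CTC needs: one per character, plus a blank between equal neighbours."""
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats

width, label = 64, "aa11bb"            # hypothetical picture width and verification code
needed = ctc_min_steps(label)          # 6 characters + 3 separating blanks
for layers in (3, 1):
    steps = time_steps(width, layers)
    print(layers, steps, steps >= needed)
```

With three pooling layers only 8 time steps remain, which is fewer than the 9 this label needs; with a single pooling layer 32 remain, which leaves ample room for alignment.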
The network 52b shown in fig. 5 may be the above-mentioned recurrent neural network; for example, the recurrent neural network herein mainly refers to a bidirectional long short-term memory network (Bi-LSTM). The bidirectional long short-term memory network may specifically include a forward long short-term memory network and a reverse long short-term memory network. As shown in fig. 5, the forward long short-term memory network may be configured to calculate the forward target hidden feature of the image convolution feature 53a at each time along a first feature calculation direction from left to right; for example, the forward hidden features extracted by the computer device at any two adjacent times (e.g., the i-th time and the (i+1)-th time) may include the forward target hidden feature h_i at the i-th time and the forward target hidden feature h_{i+1} at the (i+1)-th time. It should be understood that, since the forward long short-term memory network is essentially a recurrent neural network, the forward hidden feature extracted by the computer device at the i-th time (e.g., the forward target hidden feature h_i) essentially serves as an input feature at the time next to the i-th time (i.e., the (i+1)-th time). Likewise, the forward hidden feature extracted at the i-th time (e.g., the forward target hidden feature h_i) is essentially determined by the previously extracted forward history hidden feature h_{i-1} in conjunction with the image convolution feature 53a. Similarly, the forward hidden feature extracted at the (i+1)-th time (e.g., the forward target hidden feature h_{i+1}) is essentially determined by the forward target hidden feature h_i extracted at the time immediately preceding the (i+1)-th time (i.e., the i-th time) in conjunction with the image convolution feature 53a.
It should be understood that the forward long short-term memory network may include M memory networks, and any two adjacent memory networks among the M memory networks may include a memory network B_i and a memory network B_{i+1}; in the above-mentioned first feature calculation direction, memory network B_{i+1} is the next memory network after memory network B_i. The memory network used by the computer device at the i-th time may be memory network B_i. Similarly, the memory network used by the computer device at the (i+1)-th time may be memory network B_{i+1}.
Similarly, as shown in fig. 5, the reverse long short-term memory network may be used to calculate the reverse target hidden feature of the image convolution feature 53a at each time along a second feature calculation direction from right to left; for example, the reverse hidden features extracted by the computer device at any two adjacent times (e.g., the i-th time and the (i+1)-th time) may include the reverse hidden feature k_i at the (i+1)-th time and the reverse hidden feature k_{i-1} at the i-th time. It should be understood that, since the reverse long short-term memory network is essentially a recurrent neural network, the reverse hidden feature extracted by the computer device at the i-th time (e.g., the reverse target hidden feature k_{i-1}) is essentially determined by the reverse target hidden feature k_i extracted at the time next to the i-th time (i.e., the (i+1)-th time) in conjunction with the image convolution feature 53a. Similarly, the reverse hidden feature extracted by the computer device at the (i+1)-th time (e.g., the reverse target hidden feature k_i) is essentially determined by the reverse history hidden feature k_{i+1} extracted at the time next to the (i+1)-th time (i.e., the (i+2)-th time) in conjunction with the image convolution feature 53a. It should be understood that the reverse long short-term memory network may also include M memory networks, and any two adjacent memory networks among these M memory networks may include a memory network C_i and a memory network C_{i+1}; in the second feature calculation direction, memory network C_{i+1} is the memory network immediately preceding memory network C_i. The memory network used by the computer device at the i-th time may be memory network C_i. Similarly, the memory network used by the computer device at the (i+1)-th time may be memory network C_{i+1}.
Therefore, the computer device can acquire the forward history hidden feature h_{i-1} associated with memory network B_i in the forward long short-term memory network, input the target image convolution feature (e.g., the image convolution feature shown in fig. 5) and the forward history hidden feature h_{i-1} into memory network B_i, and extract the forward target hidden feature h_i at the i-th time through memory network B_i. Further, the computer device may input the forward target hidden feature h_i and the target image convolution feature into memory network B_{i+1} (i.e., the next memory network after memory network B_i), and extract the forward target hidden feature h_{i+1} at the (i+1)-th time through memory network B_{i+1}.
Similarly, the computer device may also acquire the reverse history hidden feature k_{i+1} associated with memory network C_{i+1} in the reverse long short-term memory network, input the target image convolution feature (e.g., the image convolution feature shown in fig. 5) and the reverse history hidden feature k_{i+1} into memory network C_{i+1}, and extract the reverse target hidden feature k_i at the (i+1)-th time through memory network C_{i+1}. Further, the computer device may input the reverse target hidden feature k_i and the target image convolution feature into memory network C_i, and extract the reverse target hidden feature k_{i-1} at the i-th time through memory network C_i.
Further, as shown in fig. 5, the computer device may splice the forward hidden feature of memory network B_i at the i-th time (e.g., the forward target hidden feature h_i described above) with the reverse hidden feature of memory network C_i at the i-th time (e.g., the reverse target hidden feature k_{i-1}) to obtain the splicing feature at the i-th time; it should be understood that, in this embodiment of the application, the splicing feature obtained at the i-th time may be referred to as the first splicing feature. Similarly, the computer device may splice the forward hidden feature of memory network B_{i+1} at the (i+1)-th time (e.g., the forward target hidden feature h_{i+1} described above) with the reverse hidden feature of memory network C_{i+1} at the (i+1)-th time (e.g., the reverse target hidden feature k_i) to obtain the splicing feature at the (i+1)-th time; it should be understood that, in this embodiment of the application, the splicing feature obtained at the (i+1)-th time may be referred to as the second splicing feature.
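The forward pass, the reverse pass and the per-time splicing can be sketched as follows. This is a deliberately simplified illustration: a scalar tanh recurrence stands in for each LSTM memory network, the weights are arbitrary, and each time step receives one scalar slice of the convolution feature; none of these choices comes from the source.

```python
import math

def rnn_step(x, h_prev, w_x=0.5, w_h=0.8):
    """Toy recurrent cell standing in for one memory network (not a real LSTM cell)."""
    return math.tanh(w_x * x + w_h * h_prev)

def bidirectional(features):
    """Return, for each time step, the spliced pair (forward hidden, reverse hidden)."""
    forward, h = [], 0.0
    for x in features:                  # first direction: left to right
        h = rnn_step(x, h)              # current state depends on the previous state
        forward.append(h)
    backward, k = [], 0.0
    for x in reversed(features):        # second direction: right to left
        k = rnn_step(x, k)              # current state depends on the state to its right
        backward.append(k)
    backward.reverse()                  # re-align reverse states with time steps 0..T-1
    return list(zip(forward, backward)) # splice the two states at each time step

spliced = bidirectional([0.2, -0.1, 0.7, 0.4])
for step, pair in enumerate(spliced):
    print(step, pair)
```

Each spliced pair therefore carries context from both sides of its position, which is what lets the later classification stage use characters to the left and to the right of the current one.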
As shown in fig. 5, the computer device may determine, based on the first splicing feature and the second splicing feature, the target character sequence feature corresponding to the target verification code picture extracted from the target image convolution feature. For example, as shown in fig. 5, the computer device may use the first splicing feature as a new image convolution feature and input it into memory network B_i and memory network C_i. Similarly, the computer device may use the second splicing feature as another new image convolution feature and input it into memory network B_{i+1} and memory network C_{i+1}. For the specific implementation of feature extraction on the new image convolution feature by memory network B_i and memory network C_i, reference may be made to the above description of memory network B_i and memory network C_i, which will not be repeated here. Likewise, for the specific implementation of feature extraction on the other new image convolution feature by memory network B_{i+1} and memory network C_{i+1}, reference may be made to the above description of memory network B_{i+1} and memory network C_{i+1}, which will not be repeated here.
As shown in fig. 5, the computer device may extract, through the network 52b shown in fig. 5, the sequence feature in the target image convolution feature (i.e., the image convolution feature 53a shown in fig. 5), and may use the extracted character sequence feature as the target character sequence feature corresponding to the target verification code picture. The target character sequence feature may be the sequence feature 53b shown in fig. 5; the character feature dimension of the sequence feature 53b (i.e., the character length extracted for the target verification code characters) may be 10, and the 10 characters may specifically be the character string "<space>3<space>nn<space>HH<space>f" shown in fig. 10. Further, the computer device inputs the sequence feature 53b into the network 52c shown in fig. 5, where the network 52c may be the connectionist temporal classification (CTC) network in the character classification network. At this time, the computer device may perform alignment processing on the sequence feature 53b through the CTC network, and may then obtain the verification code character 51b in the verification code picture 51a shown in fig. 5 based on the sequence feature 53b after the alignment processing, so that the following step S103 can subsequently be performed. The character "<space>" may be used to represent the separator "-".
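The alignment processing can be illustrated with the standard greedy CTC collapsing rule: merge consecutive repeated characters, then drop the separator. The sketch assumes the ten-position example is read as "-3-nn-HH-f", with "-" standing for the separator; the function name ctc_collapse is hypothetical.

```python
def ctc_collapse(frames: str, blank: str = "-") -> str:
    """Merge consecutive duplicates, then remove blanks (greedy CTC decoding rule)."""
    out = []
    prev = None
    for ch in frames:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)

print(ctc_collapse("-3-nn-HH-f"))  # 3nHf
print(ctc_collapse("aa-a"))        # aa
```

The second call shows why the separator matters: a genuinely doubled character survives only if a separator sits between its two occurrences.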
Step S103, inputting the target identifying code characters to a character input area corresponding to the target identifying code picture in a simulation mode, responding to the simulation submission operation aiming at the target identifying code characters in the character input area, and performing character verification on the target identifying code characters to obtain a character verification result;
specifically, in order to improve the success rate of data acquisition, the computer device may obtain an anti-crawler strategy for the target verification code character, and may further obtain the sleep duration (e.g., 1 to 2 s) and the character entry interval duration (e.g., a random time interval that simulates a user entering characters) indicated by the anti-crawler strategy; further, the computer device can, within the sleep duration, input the target verification code characters in a simulated manner into the character input area corresponding to the target verification code picture according to the character entry interval duration; further, the computer device may, in response to the simulated submission operation for the character submission control, perform character verification on the target verification code character displayed in the character entry box to obtain a character verification result.
It can be understood that the character input area herein may specifically include a character entry box and a character submission control; the character entry boxes may be distributed in the character input area 44a in the embodiment corresponding to fig. 4, and the character submission controls may be distributed in the character input area 44a in the embodiment corresponding to fig. 4, or may be independently distributed in the character submission area 45a in the embodiment corresponding to fig. 4. The specific distribution position of the character submission control will not be limited herein.
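The timing behaviour of the simulated entry can be sketched as below. The sketch treats the sleep duration as a pause before typing starts and draws a random gap before each character; the gap range of 0.05 to 0.3 s and the names plan_keystrokes and replay are assumptions, and the callbacks stand in for whatever browser automation the crawler actually uses.

```python
import random

def plan_keystrokes(chars, sleep_range=(1.0, 2.0), gap_range=(0.05, 0.3), rng=None):
    """Return (initial_sleep, [(char, delay_before_char), ...]) with random delays."""
    rng = rng or random.Random()
    initial_sleep = rng.uniform(*sleep_range)
    strokes = [(c, rng.uniform(*gap_range)) for c in chars]
    return initial_sleep, strokes

def replay(plan, type_char, sleep):
    """Execute a plan using injected callbacks, e.g. time.sleep and a browser driver."""
    initial_sleep, strokes = plan
    sleep(initial_sleep)
    for ch, delay in strokes:
        sleep(delay)
        type_char(ch)

# Dry run: record what would be typed and how long each wait would be.
typed, waits = [], []
replay(plan_keystrokes("3nHf", rng=random.Random(0)), typed.append, waits.append)
print("".join(typed), [round(w, 2) for w in waits])
```

Injecting the sleep and typing callbacks keeps the timing logic testable without a browser; in a real run, time.sleep and the automation tool's key-entry call would be passed in instead.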
It should be understood that, after the computer device simulates a user submitting the target verification code character, the computer device may perform character verification on the target verification code character displayed in the character entry box. For example, the computer device may compare the character features of the real verification code character of the target verification code picture with those of the currently recognized target verification code character, and may obtain a first character feature comparison result when the similarity between the two satisfies the character verification condition; the first character feature comparison result indirectly reflects that the currently recognized target verification code character has high accuracy. Conversely, the computer device obtains a second character feature comparison result when the similarity between the two does not satisfy the character verification condition, and the second character feature comparison result can be used to intelligently correct the currently recognized target verification code character.
It should be understood that, in the embodiment of the present application, the first character feature comparison result or the second character feature comparison result may be collectively referred to as a character verification result obtained after performing character verification on the target verification code character. Therefore, if the character verification result is the first character feature comparison result, the following step S104 may be further performed.
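The two comparison results can be sketched as a similarity check against a threshold. This is an assumption-laden illustration: the source does not say how similarity is measured or what the character verification condition is, so the sketch uses difflib's ratio and requires an exact match by default.

```python
from difflib import SequenceMatcher

def verify(recognized: str, expected: str, min_similarity: float = 1.0) -> dict:
    """Compare recognized and real characters; 'success' plays the role of the first result."""
    similarity = SequenceMatcher(None, recognized, expected).ratio()
    return {"success": similarity >= min_similarity, "similarity": similarity}

print(verify("3nHf", "3nHf"))  # {'success': True, 'similarity': 1.0}
print(verify("3nHf", "3nHt"))  # {'success': False, 'similarity': 0.75}
```

A failed check corresponds to the second character feature comparison result, at which point the recognized characters would be corrected or the verification code re-acquired before step S104 is attempted.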
Step S104, if the character checking result indicates that the checking is successful, acquiring an operation certificate picture bound with the service link, and acquiring the main body information of the operation main body to which the target service object belongs from the operation certificate picture.
Specifically, if the character verification result indicates that the verification is successful, the computer device may output a certificate picture display interface associated with the service platform, and display an operation certificate picture bound with the service link in the certificate picture display interface; further, the computer equipment can analyze and process the certificate picture display interface through a webpage crawler strategy associated with the service platform to obtain an operation certificate picture in the certificate picture display interface; taking the operation certificate picture as a picture to be acquired, and calling an optical character recognition model through an optical character recognition interface; further, the computer device may identify the picture to be acquired through the optical character identification model, and use the optical character information identified from the picture to be acquired as the subject information of the operation subject to which the target business object acquired from the operation certificate picture belongs.
It can be understood that the specific process by which the computer device extracts the operation certificate picture from the certificate picture display interface may be described as follows: the computer device may search the data structure tree of the certificate picture display interface for the certificate picture address character matching the second key element in the second picture acquisition rule (e.g., a license picture extraction rule) of the web crawler policy associated with the service platform, and may then extract the operation certificate picture from the certificate display area in the certificate picture display interface based on that certificate picture address character; for example, the operation certificate picture may be the operation certificate picture 204a in the embodiment corresponding to fig. 2.
It can be understood that, for a specific implementation manner of obtaining the operation credential picture by the computer device through the second picture collecting rule, reference may be made to the description of the specific process of extracting the target verification picture in the embodiment corresponding to fig. 4, and details will not be further described here.
Therefore, the computer device in the embodiment of the application can capture the service link of the target service object on the service platform through the webpage crawler strategy and can determine the target verification code picture to be verified based on the service link and the webpage crawler strategy; further, the computer device may obtain a target character recognition model associated with the service platform, may input a target verification code image into the target character recognition model, and recognizes the target verification code image by the target character recognition model to obtain a target verification code character in the target verification code image; it should be understood that, by introducing the target character recognition model, the target verification code characters in the target verification code picture can be intelligently extracted, and thus the real-time performance and accuracy of character extraction can be improved. Further, the computer device can input the target verification code characters to a character input area corresponding to the target verification code picture in a simulation mode, and further can perform character verification on the target verification code characters in response to a simulation submission operation aiming at the target verification code characters in the character input area to obtain a character verification result; further, if the character verification result indicates that the verification is successful, the computer device may acquire an operation certificate picture bound with the service link, and acquire the subject information of the operation subject to which the target service object belongs from the operation certificate picture. 
Therefore, in the embodiment of the present application, once the target verification code character has been recognized, the recognized target verification code character can be intelligently verified, so that, when the verification succeeds, the subject information (for example, business license subject information) can be quickly extracted from the operation certificate picture (for example, a business license picture), which improves the efficiency of obtaining the subject information.
Further, please refer to fig. 6, where fig. 6 is a schematic flowchart of a method for processing verification code characters according to an embodiment of the present application. It is understood that the method provided by the embodiments of the present application can be executed by a computer device, where the computer device includes, but is not limited to, a user terminal or a server. As shown in fig. 6, the method may include at least the following steps S201 to S211:
step S201, acquiring a webpage crawler strategy associated with a service platform, and capturing link information of N service objects from the service platform through the webpage crawler strategy;
wherein N is a positive integer;
step S202, obtaining a target business object from N business objects, and using the link information of the target business object as the business link of the target business object;
step S203, analyzing the business link to obtain the operation link of the operation subject to which the target business object belongs, and outputting an operation display interface of the operation subject based on the operation link;
the operation display interface comprises a query interface for querying an operation certificate picture of an operation subject;
step S204, responding to the simulation trigger operation aiming at the query interface, outputting a simulation verification interface corresponding to the operation display interface, and acquiring a target verification code picture to be verified from the simulation verification interface through a webpage crawler strategy.
For a specific implementation manner of steps S201 to S204, reference may be made to the description of step S101 in the embodiment corresponding to fig. 3, and details will not be further described here.
Step S205, acquiring a target character recognition model associated with the service platform, inputting the target verification code picture into the target character recognition model, and recognizing the target verification code picture by the target character recognition model to obtain the target verification code character in the target verification code picture;
step S206, the characters of the target verification code are input to a character input area corresponding to the picture of the target verification code in a simulation mode, and in response to the simulation submission operation aiming at the characters of the target verification code in the character input area, the characters of the target verification code are checked to obtain a character checking result;
step S207, if the character check result indicates that the check is successful, acquiring an operation certificate picture bound with the service link, and acquiring the subject information of the operation subject to which the target service object belongs from the operation certificate picture.
For a specific implementation manner of steps S205 to S207, reference may be made to the description of steps S102 to S104 in the embodiment corresponding to fig. 3, and details will not be further described here.
Step S208, establishing an association relation between the subject information and the service link in a subject information base associated with the operation subject, and updating the association relation to the subject information base.
Step S209, the business link carrying the association relationship is used as a web page updating link, and the business link of the target business object is updated to the web page updating link on the business platform.
Step S210, when the target service object under the webpage updating link is checked to belong to the illegal object in the blacklist, the operation subject indicated by the subject information is taken as the illegal operation subject based on the incidence relation between the subject information and the service link;
and step S211, generating notification information associated with the illegal operation subject, and sending the notification information to a supervision terminal corresponding to a platform supervision person associated with the service platform.
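Steps S210 and S211 can be sketched as a lookup over the stored association relationships. The record layout, the field names and the sample data below are all hypothetical; the sketch only shows how a blacklisted business object leads, through its business link, to the operation subject that should be reported.

```python
def find_illegal_subjects(associations: dict, blacklist: set) -> list:
    """associations maps a business link to {'object': ..., 'subject': ...}.

    Returns one notification record per link whose business object is blacklisted.
    """
    notices = []
    for link, record in associations.items():
        if record["object"] in blacklist:
            notices.append({"subject": record["subject"],
                            "link": link,
                            "object": record["object"]})
    return notices

associations = {
    "https://shop.example.com/item/1": {"object": "toy", "subject": "Store D Ltd."},
    "https://shop.example.com/item/2": {"object": "banned-item", "subject": "Store E Ltd."},
}
notices = find_illegal_subjects(associations, {"banned-item"})
print(notices)
```

Each returned record contains what a supervision terminal would need in the notification information: the operation subject, the offending business link and the blacklisted object.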
For easy understanding, please refer to fig. 7, and fig. 7 is a schematic view of a scenario of an overall data collection scheme based on a web crawler according to an embodiment of the present application. The acquisition process at all steps of the data acquisition scheme shown in fig. 7 needs to rely on web crawler policies. It should be understood that the web crawler policy herein specifically refers to a program or script for automatically crawling web information according to certain rules. Therefore, in the service scenario of performing intelligent supervision on the e-commerce operation system according to the embodiment of the present application, in the process of executing step S11 shown in fig. 7, link information of one or more service objects may be collected from the platform 71a to be collected through a web crawler policy, and then the collected link information of one or more service objects may be collectively referred to as link information of N service objects captured from the service platform. For example, in the e-commerce operation system, the link information of a business object may be a commodity link of a commodity. This means that the computer device can capture the commodity links of N commodities on the shelf from the platform 71a to be collected (e.g., the QQ mall shopping platform, etc.) through the link acquisition rule (e.g., the above commodity clue) related to the web crawler policy, and can use each captured commodity as a target business object, so that the commodity links of the N commodities can be collectively referred to as the business links of the target business object, and further, the operating subject of a large quantity of commodities on the platform 71a to be collected can be legally monitored, so as to prevent illegal operation of the illegal operating subject.
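The link-capturing part of step S11 can be sketched with the standard-library HTML parser. The listing markup, the URL prefix used as a link acquisition rule and the class name LinkCollector are hypothetical; a production crawler would also handle pagination, request throttling and the platform's access rules.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect unique hrefs of <a> tags whose URL starts with a given prefix."""

    def __init__(self, prefix: str):
        super().__init__()
        self.prefix = prefix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.startswith(self.prefix) and href not in self.links:
            self.links.append(href)

# Hypothetical listing page containing two commodity links, one duplicate and one other link.
LISTING = """
<ul>
  <li><a href="https://shop.example.com/item/1">Item 1</a></li>
  <li><a href="https://shop.example.com/item/2">Item 2</a></li>
  <li><a href="https://shop.example.com/help">Help</a></li>
  <li><a href="https://shop.example.com/item/1">Item 1 again</a></li>
</ul>
"""

collector = LinkCollector("https://shop.example.com/item/")
collector.feed(LISTING)
commodity_links = collector.links
print(commodity_links)
```

The resulting list corresponds to the link information of the N business objects, with N equal to 2 in this made-up listing.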
The specific process of acquiring the verification code picture by the computer device shown in fig. 7 can be referred to the description of the specific process of acquiring the verification code picture in the embodiment corresponding to fig. 3 or fig. 6. Further, the computer device may input the currently acquired verification code picture (i.e., the target verification code picture) into the target character recognition model 72a shown in fig. 7 to execute step S12 shown in fig. 7, so that the verification code characters in the currently acquired verification code picture (i.e., the target verification code picture) can be recognized by the target character recognition model.
It should be understood that, in the process of inputting the currently acquired verification code picture (i.e., the target verification code picture) into the target character recognition model 72a shown in fig. 7 for verification code recognition, the computer device needs to be positioned from the simulation verification interface to the target verification code picture in the specific location area in advance through the first picture acquisition rule related to the web crawler policy, and further, the target verification code picture can be extracted from the simulation verification interface. For a specific implementation manner of extracting the target verification code picture from the simulation verification interface by the computer device, reference may be made to the description of the first picture acquisition rule in the embodiment corresponding to fig. 3, which will not be described again.
Further, when the computer device identifies the target verification code characters in the target verification code picture through the target character recognition model, character verification can be performed on the target verification code characters, and the operation certificate picture bound to each commodity link (namely, the service link) can then be obtained if the verification succeeds. At this time, the computer device may extract, from the certificate picture display interface, the operation certificate picture bound to each service link according to the second picture acquisition rule in the web crawler policy, and may input the extracted operation certificate picture to the OCR interface shown in fig. 7 to perform step S13. The computer device may then acquire the corresponding optical character information from each operation certificate picture through the OCR interface, and use that optical character information as the subject information of the operation subject to which each target service object belongs.
Further, as shown in fig. 7, when obtaining the subject information of each operation subject, the computer device may add or update the subject information of the operation subject associated with the corresponding service link in the list column in which each service link is located in the subject information database 74a shown in fig. 7, so as to establish an association relationship between the subject information of each operation subject and each service link.
It should be understood that, if a commodity link (for example, the business link L3) of a certain commodity captured by the computer device from the platform to be collected is already associated with the operating subject (for example, the virtual store D) to which the commodity belongs, the computer device does not need to establish the association relationship between the business link L3 and the subject information of the virtual store D in the subject information database 74a again. Conversely, if the commodity link (e.g., the business link L4) of a certain commodity captured by the computer device from the platform to be collected is not yet associated with the operation subject (e.g., the virtual store D) to which the commodity belongs, then when the computer device obtains the subject information of the virtual store D, the computer device establishes an association relationship between the business link L4 and the subject information of the virtual store D in the subject information database 74a shown in fig. 7, so that the commodity link L4 is directly associated with the subject information of the virtual store. These association relationships between business links and subject information can help the supervision department associated with the platform 71a to be collected to lawfully monitor operation subjects and to quickly and effectively find illegal operation subjects on the platform 71a. The established association relationships thus fundamentally solve the problem that the subject information of an operation subject cannot be obtained directly from commodity clues, and help the supervision platform supervise operation subjects efficiently.
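The association logic described above can be sketched as follows. This is a minimal illustration only: the class `SubjectInfoStore`, its method names, and the use of an in-memory dictionary in place of the subject information database 74a are all assumptions made for the example and are not taken from the application.

```python
from typing import Dict, Optional


class SubjectInfoStore:
    """Hypothetical in-memory stand-in for the subject information database 74a."""

    def __init__(self) -> None:
        # Maps a business link to the subject information of its operating subject.
        self._by_link: Dict[str, dict] = {}

    def associate(self, link: str, subject_info: dict) -> bool:
        """Associate a business link with subject information.

        Returns True if a new association was created, and False if the
        link was already associated (nothing is re-established).
        """
        if link in self._by_link:
            return False  # e.g. link L3 is already tied to virtual store D
        self._by_link[link] = dict(subject_info)
        return True

    def lookup(self, link: str) -> Optional[dict]:
        """Return the subject information for a link, or None if unknown."""
        return self._by_link.get(link)


store = SubjectInfoStore()
store_d = {"name": "virtual store D"}
created_l3 = store.associate("L3", store_d)   # first capture of L3
repeated_l3 = store.associate("L3", store_d)  # L3 captured again
created_l4 = store.associate("L4", store_d)   # L4 not yet associated
```

A real deployment would presumably persist these associations in a database table keyed by the business link; the dictionary is used here only to keep the sketch self-contained.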
Therefore, the target character recognition model is introduced, the target verification code characters in the target verification code picture can be intelligently extracted, and the real-time performance and accuracy of character extraction can be improved. Furthermore, the embodiment of the application can intelligently perform character verification on the recognized target verification code character under the condition of recognizing the target verification code character, so that business subject information (e.g., business license subject information) can be quickly extracted from an operation certificate picture (e.g., a watermarked business license picture) when verification is successful, the acquisition efficiency of the subject information can be further improved, and the problem that the subject information of an operation subject cannot be directly obtained through commodity clues in a related manner can be fundamentally solved.
Further, referring to fig. 8, fig. 8 shows a method for processing verification code characters according to an embodiment of the present application. The method may be executed by the foregoing computer device and may include the following steps S301 to S305:
step S301, obtaining an original sample picture for training an initial character recognition model;
wherein the original sample picture is determined based on the sample link and the web crawler policy; the sample link is captured from the service platform through a webpage crawler strategy;
specifically, when the computer device captures a commodity link from the service platform through the web crawler policy, the captured commodity link may be used as a sample link, and the historical verification code pictures recorded in history that were determined through the web crawler policy and the sample link may be looked up, so that the historical verification code pictures may be used as original sample pictures for training the initial character recognition model. It should be understood that, when an original sample picture is obtained, the computer device may label the verification code characters in the original sample picture (for example, the verification code characters identified from the original sample picture may be used as labeled verification code characters with which the original sample picture is labeled), and may then perform the following step S302 on the labeled original sample picture, so that a large number of enhanced sample pictures with different verification code patterns can be quickly generated from a small number of currently labeled verification code pictures (that is, the labeled original sample pictures).
Step S302, acquiring a data enhancement strategy for enhancing data of an original sample picture, and performing data enhancement processing on the original sample picture based on the data enhancement strategy to obtain at least one enhanced sample picture associated with the original sample picture;
note that the data enhancement policy here may specifically include a character warping policy, a character enlarging policy, a character rotating policy, a character shifting policy, and the like. Specifically, when the data enhancement policies are acquired, the computer device may perform data enhancement processing on the currently labeled original sample picture through the data enhancement policies, and then may quickly obtain enhanced sample pictures of different verification code patterns, so as to generate at least one enhanced sample picture associated with the original sample picture.
The verification code pattern herein may specifically include: the character distortion mode corresponding to the character distortion strategy, the character enlargement mode corresponding to the character amplification strategy, the character rotation mode corresponding to the character rotation strategy, and the character position horizontal offset mode or the character position vertical offset mode corresponding to the character offset strategy. It should be understood that, in the embodiment of the present application, after data enhancement processing is performed on an obtained original sample picture based on the data enhancement policy, enhanced sample pictures with different verification code patterns may be obtained. A large amount of enhancement data that can participate in training (i.e., the aforementioned enhanced sample pictures with different verification code patterns) can thus be generated from a small amount of labeled data (i.e., the labeled original sample pictures), so that a target character recognition model that can recognize different verification code patterns can be obtained through training.
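As an illustration of how several of these verification code patterns could be produced from a single picture, the following sketch applies horizontal offset, vertical offset, enlargement and rotation to a tiny grayscale image stored as nested lists. The function names, the integer enlargement factor, and the use of a 90-degree rotation as a coarse stand-in for small-angle rotation are assumptions of this example; character warping is omitted for brevity.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, row-major


def shift_horizontal(img: Image, dx: int, fill: int = 0) -> Image:
    """Shift every row right by dx pixels (left if dx is negative)."""
    width = len(img[0])
    out = []
    for row in img:
        if dx >= 0:
            out.append(([fill] * dx + row)[:width])
        else:
            out.append((row + [fill] * (-dx))[-width:])
    return out


def shift_vertical(img: Image, dy: int, fill: int = 0) -> Image:
    """Shift the picture down by dy rows (up if dy is negative)."""
    height, width = len(img), len(img[0])
    blank = [[fill] * width for _ in range(abs(dy))]
    rows = [list(r) for r in img]
    if dy >= 0:
        return (blank + rows)[:height]
    return (rows + blank)[-height:]


def enlarge(img: Image, factor: int) -> Image:
    """Nearest-neighbour enlargement by an integer factor."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def rotate_90(img: Image) -> Image:
    """Rotate clockwise by 90 degrees."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img: Image) -> List[Image]:
    """Return one enhanced picture per illustrated pattern."""
    return [shift_horizontal(img, 1), shift_vertical(img, 1),
            enlarge(img, 2), rotate_90(img)]


original = [[1, 2], [3, 4]]
enhanced = augment(original)
```

In practice an image library would typically be used for small-angle rotation and warping; the pure-Python version is meant only to make the idea concrete.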
Step S303, taking an original sample picture and at least one enhanced sample picture as target sample pictures, and taking marked verification code characters corresponding to the original sample picture as sample labels of the target sample pictures;
it can be understood that, when the target sample pictures used for training the initial character recognition model are acquired, corresponding sample labels need to be assigned to the target sample pictures. Since the enhanced sample pictures are obtained by performing character distortion, character amplification, character rotation or character offset on the original sample pictures, the labeled verification code characters of the labeled original sample pictures can be used as the sample labels of the enhanced sample pictures, so that the cost of model learning can be fundamentally reduced.
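The label reuse described in step S303 can be sketched in a few lines. The function name `build_target_samples` and the file names are hypothetical and serve only to show that every enhanced picture inherits the label of the original picture it was derived from.

```python
from typing import List, Tuple


def build_target_samples(original: str, label: str,
                         enhanced_pictures: List[str]) -> List[Tuple[str, str]]:
    """Pair the original picture and each enhanced picture with the same label."""
    samples = [(original, label)]
    samples.extend((picture, label) for picture in enhanced_pictures)
    return samples


# Hypothetical file names; "3nHf" is the labeled verification code string.
samples = build_target_samples("orig.png", "3nHf", ["warp.png", "rot.png"])
```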
Step S304, inputting the target sample picture into an initial character recognition model, recognizing the target sample picture by the initial character recognition model, and taking the sample verification code character recognized from the target sample picture as a prediction label;
step S305, performing iterative training on the initial character recognition model based on the prediction label and the sample label to obtain a target character recognition model for recognizing target verification code characters in a target verification code picture;
the target verification code characters are used for acquiring operation certificate pictures bound by service links related to the target verification code pictures after character verification is carried out; the operation certificate picture is used for acquiring the subject information of the operation subject to which the target business object indicated by the business link belongs.
Specifically, the computer device may determine an initial loss function for adjusting a model parameter in the initial character recognition model based on the predicted probability value corresponding to the predicted tag and the real probability value corresponding to the sample tag, where the initial loss function is determined by a loss function of the first network and a loss function of the second network; further, if the computer device determines that the value of the initial loss function does not satisfy the model convergence condition, adjusting model parameters of each network in the initial character recognition model based on the value of the initial loss function, and then performing iterative training on the adjusted initial character recognition model through the target sample picture to obtain a target loss function of the initial character recognition model after iterative training; further, if the value of the target loss function satisfies the model convergence condition, the computer device may determine the initial character recognition model after the iterative training that satisfies the model convergence condition as the target character recognition model. The target character recognition model can be used for recognizing the target verification code characters in the target verification code picture in real time.
It can be understood that, after the prediction label is obtained, the computer device can compare the prediction label with the real label (i.e., the sample label) of the target sample picture to obtain the initial loss function value of the model loss function. If the initial loss function value does not satisfy the above model convergence condition (for example, the initial loss function value is not the minimum loss function value in the model training phase), the computer device may adjust the model parameters of each network in the initial character recognition model in the reverse direction based on the initial loss function value, and iteratively train the adjusted initial character recognition model with new target sample pictures until the target loss function value of the iteratively trained initial character recognition model satisfies the model convergence condition. The iteratively trained initial character recognition model that satisfies the model convergence condition is then determined as the target character recognition model.
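The train-until-convergence loop can be illustrated with a deliberately tiny stand-in model. In the sketch below, a single weight fitted by gradient descent on a mean squared error replaces the character recognition model and its loss function, and the convergence condition is assumed to be "loss below a tolerance"; none of these choices comes from the application, which leaves the loss and the convergence condition open.

```python
from typing import List, Tuple


def train_until_converged(samples: List[Tuple[float, float]],
                          lr: float = 0.1,
                          tol: float = 1e-6,
                          max_iters: int = 1000) -> Tuple[float, float, int]:
    """Fit y = w * x by gradient descent until the loss satisfies the condition.

    Returns the trained weight, the final loss and the number of updates.
    """
    w = 0.0  # initial model parameter
    loss = float("inf")
    for step in range(max_iters):
        # Compare predictions with the sample labels to obtain the loss value.
        loss = sum((w * x - y) ** 2 for x, y in samples) / len(samples)
        if loss < tol:  # assumed model convergence condition
            return w, loss, step
        # Otherwise adjust the parameter against the gradient and iterate again.
        grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= lr * grad
    return w, loss, max_iters


samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w, loss, steps = train_until_converged(samples)
```

The same control flow applies to a neural network: compute the loss, stop if the convergence condition holds, otherwise update the parameters of each network and continue.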
For easy understanding, please refer to fig. 9, and fig. 9 is a schematic view of a scene in which an initial character recognition model is trained to obtain a target character recognition model according to an embodiment of the present application. Wherein, steps S921 to S927 shown in fig. 9 are model training stages. As shown in fig. 9, in the model training phase, the small number of captcha pictures marked in step S921 may be the marked original sample pictures. As shown in fig. 9, the computer device may further execute step S922 to perform data enhancement processing on the original sample picture by means of warping, rotating, and the like, so as to obtain an enhanced sample picture associated with the marked small number of captcha pictures and having different captcha patterns.
Further, as shown in fig. 9, to reduce the model learning cost, the computer device may execute step S923 shown in fig. 9 to perform image denoising processing on the target sample pictures participating in the training (e.g., the small number of labeled captcha pictures and the enhanced sample pictures with different captcha patterns), and may then use the denoised target sample pictures as sample pictures to be trained. At this time, the computer device may further perform step S924 to input the sample picture to be trained into a convolutional neural network (e.g., a CNN network) in the initial character recognition model, so as to extract the image convolution feature of the sample picture to be trained (i.e., the sample image convolution feature of the target sample picture) through the convolutional neural network. Further, the computer device may perform step S925 to extract the character sequence features in the sample image convolution features through a recurrent neural network (e.g., a Bi-LSTM network) in the initial character recognition model, and the extracted character sequence features may be collectively referred to as the sample character sequence features corresponding to the target sample picture. Further, the computer device may execute step S926 to align the sample character sequence features obtained in step S925 through a connectionist temporal classification (CTC) network, and may then output the sample verification code characters in the target sample picture according to the aligned sample character sequence features. It should be understood that, in the model training stage, the verification code characters output by the initial character recognition model are the prediction labels.
At this time, the computer device may perform step S927 to train the initial character recognition model based on the prediction label and the sample label, and may then use the trained initial character recognition model as the target character recognition model in the model application stage.
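The alignment performed by a connectionist temporal classification network at recognition time is commonly realized by greedy decoding: take the most probable symbol in each frame, collapse consecutive repeats, and drop the blank symbol. The sketch below shows that decoding rule only (not the CTC training loss); the blank symbol "-", the alphabet and the per-frame probabilities are made-up values for illustration.

```python
from typing import List, Sequence

BLANK = "-"  # assumed blank symbol, stored at index 0 of the alphabet


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Greedy CTC decoding: argmax per frame, merge repeats, remove blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars: List[str] = []
    prev = None
    for idx in best:
        # Emit a character only when it differs from the previous frame
        # and is not the blank symbol.
        if idx != prev and alphabet[idx] != BLANK:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)


alphabet = "-3nH"
frames = [
    [0.10, 0.70, 0.10, 0.10],  # "3"
    [0.20, 0.60, 0.10, 0.10],  # "3" again, merged with the previous frame
    [0.80, 0.10, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.70, 0.10],  # "n"
    [0.90, 0.05, 0.03, 0.02],  # blank separates the two "n" characters
    [0.10, 0.10, 0.60, 0.20],  # "n"
    [0.10, 0.10, 0.10, 0.70],  # "H"
]
decoded = ctc_greedy_decode(frames, alphabet)
```

Because blanks separate genuine repeats, the number of frames does not have to match the number of characters, which is what allows verification codes of indefinite length to be recognized.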
As shown in fig. 9, in the model application stage, the computer device may execute step S911 to output the simulation verification interface corresponding to the operation display interface by parsing the store link, and may then obtain the verification code picture to be verified (i.e., the target verification code picture) from the simulation verification interface. It should be understood that the target character recognition model here is a character recognition model based on an end-to-end algorithm. Therefore, by executing step S912, the computer device may recognize the target verification code characters in the target verification code picture in real time through the target character recognition model, and may then execute step S913 shown in fig. 9 according to the anti-crawler policy; that is, the computer device needs to simulate a user entering the characters one by one at randomly set character entry intervals within the set delay of 1-2 seconds (i.e., the sleep duration), so that character verification can subsequently be performed on the entered characters. If the character verification succeeds, step S914 shown in fig. 9 is executed to obtain the subject information in the operation certificate picture. For the manner of obtaining the subject information in the operation certificate picture, reference may be made to the description of the specific process for obtaining the subject information in the embodiment corresponding to fig. 3, which will not be further described here.
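One possible reading of step S913 is that a total delay of 1-2 seconds is drawn at random and split into random per-character entry intervals. The sketch below follows that reading, which is an assumption; the helper names `entry_intervals` and `type_like_user`, and the injectable `send_key` and `sleep` callbacks, are hypothetical and exist only so the example can run without a browser.

```python
import random
import time
from typing import Callable, List, Optional


def entry_intervals(n_chars: int, total_delay: float, rng: random.Random) -> List[float]:
    """Split total_delay into n_chars random positive intervals."""
    if n_chars == 0:
        return []
    weights = [rng.random() + 0.1 for _ in range(n_chars)]
    scale = total_delay / sum(weights)
    return [w * scale for w in weights]


def type_like_user(chars: str,
                   send_key: Callable[[str], None],
                   sleep: Callable[[float], None] = time.sleep,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Enter characters one by one, pausing a random interval before each."""
    rng = rng or random.Random()
    total_delay = rng.uniform(1.0, 2.0)  # the 1-2 second sleep duration
    waits = entry_intervals(len(chars), total_delay, rng)
    for ch, wait in zip(chars, waits):
        sleep(wait)
        send_key(ch)
    return waits


typed: List[str] = []
slept: List[float] = []
# Record the sleeps instead of actually waiting, so the demo runs instantly.
waits = type_like_user("3nHf", typed.append, sleep=slept.append, rng=random.Random(0))
```

In a real crawler, `send_key` would be whatever call the browser automation tool provides for typing into the character input area.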
Optionally, when the computer device executes step S926, the sample captcha characters may be output serially through an attention mechanism network. For ease of understanding, refer to fig. 10, which is a schematic diagram of a scenario in which a sample label is obtained through an attention mechanism network according to an embodiment of the present application. It should be understood that, in the model training phase, the verification code picture 101a shown in fig. 10 may be the above target sample picture. The convolutional recurrent neural network shown in fig. 10 may specifically include the convolutional neural network and the recurrent neural network shown in fig. 9. When the computer device outputs the character sequence feature 103a shown in fig. 10 through the convolutional recurrent neural network, the character sequence feature 103a can be regarded as the above sample character sequence feature. In this way, when the sample character sequence features are input to the attention mechanism network 104a of fig. 10, the attention mechanism network encodes the sample character sequence features to obtain the attention map feature at each moment. At this time, the computer device may perform character feature extraction on the labeled captcha characters corresponding to the sample label through a speech extraction network in the attention mechanism network, so as to obtain a captcha character feature corresponding to each character element in the labeled captcha characters.
At this time, the computer device may decode and output the prediction element corresponding to each character element through the decoding functions (e.g., the decoding function C1, the decoding function C2, and the decoding function C3) in the attention mechanism network, based on the extracted captcha character features and the attention map features. For example, the prediction elements may specifically include the prediction element "3", the prediction element "n", the prediction element "H", and the prediction element "f", and the prediction label of the captcha picture 101a shown in fig. 10 may be determined based on the character string formed by these prediction elements; for example, the sample captcha character corresponding to the prediction label may be the captcha character 105a shown in fig. 10.
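The application does not specify how the attention mechanism network weights the sequence features. As a generic illustration, the sketch below uses plain dot-product attention, which is a substituted, well-known technique and not necessarily the one used in fig. 10: a query vector (standing in for the decoder state of one character) scores every sequence feature, the scores are normalized with a softmax, and the weighted sum forms the context used to decode that character. All names and numbers are made up.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: List[float],
           features: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Dot-product attention over a sequence of feature vectors.

    Returns the attention weights and the weighted context vector.
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, features))
               for d in range(dim)]
    return weights, context


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy character sequence features
weights, context = attend([0.0, 5.0], features)
```

Here the query points along the second feature dimension, so the second and third positions receive almost all of the attention weight.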
Therefore, before obtaining the target character recognition model, the computer device according to the embodiment of the present application may further perform model training on the initial character recognition model through the character classification network in the embodiment corresponding to fig. 9 or fig. 10. It should be understood that, in the embodiment of the present application, when an original sample picture used for training an initial character recognition model is acquired, a computer device may label characters of the currently acquired original sample picture, and then may perform data enhancement processing on the currently labeled original sample picture through a data enhancement strategy, so as to quickly generate at least one enhanced sample picture associated with the original sample picture, so that only fewer original sample pictures need to be labeled in a model training stage, and thus labeling work on a large number of target sample pictures participating in training may be fundamentally reduced, and further, a cost of model training may be reduced. In other words, in the embodiment of the application, enhanced sample pictures with different identifying code patterns can be quickly generated through a small number of marked original sample pictures, so that in the process of performing model training on the initial character recognition model, target sample pictures with different identifying code patterns can be quickly obtained based on the original sample pictures and at least one enhanced sample picture, at this time, the computer device can directly use the identifying code characters of the marked original sample pictures as sample labels of the target sample pictures, and then can quickly and accurately configure corresponding sample labels for the target sample pictures participating in training. 
Further, after inputting the target sample picture into the initial character recognition model, the computer device may have the initial character recognition model intelligently recognize the target sample picture, and may use the sample captcha characters recognized from the target sample picture as a prediction label; at this time, the computer device may perform iterative training on the initial character recognition model according to the real label (i.e., the above sample label) and the prediction label of the target sample picture to obtain a target character recognition model for recognizing the target verification code characters in the target verification code picture. The target character recognition model is applicable to variously changing verification code scenes, and can also solve the recognition problem of indefinite-length verification codes through the convolutional recurrent neural network and the character classification network in the target character recognition model, so as to improve the accuracy and reliability of character recognition at low cost and with high efficiency. In addition, it should be understood that, for a specific implementation manner in which the computer device uses the trained target character recognition model to recognize the target verification code characters in the target verification code picture, reference may be made to the description of the target character recognition model in the embodiment corresponding to fig. 3 or fig. 6, and details will not be further described here.
Further, refer to fig. 11, which is a schematic structural diagram of an apparatus for processing captcha characters according to an embodiment of the present application. The captcha character processing apparatus 1 may be a computer program (including program code) running in a computer device; for example, the captcha character processing apparatus 1 may be application software. The apparatus may be used to perform the corresponding steps in the methods provided by the embodiments of the present application. The captcha character processing apparatus 1 may include: a target picture determining module 11, a target character recognition module 12, a character submitting module 13 and a subject information acquisition module 14; optionally, the captcha character processing apparatus 1 may further include: an association relationship establishing module 15, a service link updating module 16, an illegal object detecting module 17 and a notification information generating module 18.
The target picture determining module 11 is used for capturing a service link of a target service object on the service platform through a web crawler strategy and determining a target verification code picture to be verified based on the service link and the web crawler strategy;
the target picture determining module 11 includes: a link grasping unit 111, a link determining unit 112, a link parsing unit 113, and a verification interface output unit 114;
the link capturing unit 111 is configured to acquire a web crawler policy associated with the service platform, and capture link information of the N service objects from the service platform through the web crawler policy; n is a positive integer;
a link determining unit 112, configured to obtain a target service object from the N service objects, and use link information of the target service object as a service link of the target service object;
a link analysis unit 113, configured to analyze the service link to obtain an operation link of an operation subject to which the target service object belongs, and output an operation display interface of the operation subject based on the operation link; the operation display interface comprises a query interface for querying an operation certificate picture of an operation subject;
and the verification interface output unit 114 is configured to respond to a simulation trigger operation for the query interface, output a simulation verification interface corresponding to the operation display interface, and obtain a target verification code picture to be verified from the simulation verification interface through a web crawler policy.
Among them, the verification interface output unit 114 includes: an interface triggering sub-unit 1141, a first inquiry sub-unit 1142 and a verification picture extracting sub-unit 1143;
an interface triggering subunit 1141, configured to respond to a simulation triggering operation for the query interface, and switch the operation display interface to a simulation verification interface; the simulation verification interface comprises a target display area; the target display area is used for displaying a target verification code picture associated with the operation certificate picture;
a first querying subunit 1142, configured to query, through a first picture collecting rule in the web crawler policy, a verification picture address identifier matching a first key element in the first picture collecting rule in a data structure tree corresponding to the simulation verification interface;
and a verification picture extracting subunit 1143, configured to extract the target verification code picture from the target display area based on the verification picture address identifier.
For a specific implementation manner of the interface triggering subunit 1141, the first querying subunit 1142, and the verification image extracting subunit 1143, reference may be made to the description of the specific process for extracting the target verification code image in the embodiment corresponding to fig. 3, which will not be further described here.
For specific implementation manners of the link capturing unit 111, the link determining unit 112, the link analyzing unit 113, and the verification interface output unit 114, reference may be made to the description of step S101 in the embodiment corresponding to fig. 3, and details will not be further described here.
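The query performed by the first querying subunit 1142 can be pictured as a search of the page's element tree for the element named by the key element, followed by reading its picture address. The sketch below uses Python's standard `html.parser` module; treating the first key element as the `id` of an `img` tag, the sample markup, and the names `CaptchaImageFinder` and `find_captcha_address` are all assumptions made for this example.

```python
from html.parser import HTMLParser
from typing import Optional


class CaptchaImageFinder(HTMLParser):
    """Find the src of the first <img> whose id matches the key element."""

    def __init__(self, key_element_id: str) -> None:
        super().__init__()
        self.key_element_id = key_element_id
        self.picture_address: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if (tag == "img"
                and attr_map.get("id") == self.key_element_id
                and self.picture_address is None):
            self.picture_address = attr_map.get("src")


def find_captcha_address(page_html: str, key_element_id: str) -> Optional[str]:
    """Return the verification picture address, or None if no element matches."""
    finder = CaptchaImageFinder(key_element_id)
    finder.feed(page_html)
    return finder.picture_address


# Hypothetical markup of a simulation verification interface.
PAGE = """
<div class="verify-area">
  <img id="logo" src="/static/logo.png">
  <img id="captcha-img" src="/captcha/abc123.png">
  <input id="captcha-input" type="text">
</div>
"""
address = find_captcha_address(PAGE, "captcha-img")
```

The returned address would then be used to download or crop the target verification code picture from the target display area.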
The target character recognition module 12 is used for acquiring a target character recognition model associated with the service platform, inputting a target verification code picture into the target character recognition model, recognizing the target verification code picture by using the target character recognition model, and recognizing to obtain a target verification code character corresponding to the target verification code picture;
the target character recognition module 12 includes: a target picture acquiring unit 121, a convolution feature extracting unit 122, a sequence feature extracting unit 123 and a sequence feature aligning unit 124;
the target image obtaining unit 121 is configured to obtain a target character recognition model associated with the service platform, perform denoising processing on the target verification code image, and use the denoised target verification code image as an image to be processed;
the convolution feature extraction unit 122 is configured to input the picture to be processed to a convolution neural network in the target character recognition model, extract an image convolution feature of the picture to be processed by the convolution neural network in the target character recognition model, and use the extracted image convolution feature as a target image convolution feature of the target verification code picture;
the sequence feature extraction unit 123 is configured to input the convolution feature of the target image to a recurrent neural network in the target character recognition model, extract a sequence feature in the convolution feature of the target image by the recurrent neural network in the target character recognition model, and use the extracted character sequence feature as a target character sequence feature corresponding to the target verification code image;
the target character recognition model comprises a recurrent neural network, and the recurrent neural network in the target character recognition model is a bidirectional long short-term memory (Bi-LSTM) network; the Bi-LSTM network comprises a forward long short-term memory network and a reverse long short-term memory network; the forward long short-term memory network comprises a memory network B_i and a memory network B_{i+1}, where the memory network B_{i+1} is the next memory network after the memory network B_i; the reverse long short-term memory network comprises a memory network C_{i+1} and a memory network C_i, where the memory network C_{i+1} is the previous memory network before the memory network C_i; i is a positive integer less than or equal to M; the number of memory networks in each of the forward long short-term memory network and the reverse long short-term memory network is M;
the sequence feature extraction unit 123 includes: a forward feature extraction subunit 1231, a reverse feature extraction subunit 1232, a feature concatenation subunit 1233, and a sequence feature determination subunit 1234;
a forward feature extraction subunit 1231, configured to obtain the forward history hidden feature h_{i-1} associated with the memory network B_i in the forward long short-term memory network, input the target image convolution feature and the forward history hidden feature h_{i-1} into the memory network B_i, extract the forward target hidden feature h_i at the i-th moment by the memory network B_i, input the forward target hidden feature h_i and the target image convolution feature into the memory network B_{i+1}, and extract the forward target hidden feature h_{i+1} at the (i+1)-th moment by the memory network B_{i+1};
a reverse feature extraction subunit 1232, configured to obtain the reverse history hidden feature k_{i+1} associated with the memory network C_{i+1} in the reverse long short-term memory network, input the target image convolution feature and the reverse history hidden feature k_{i+1} into the memory network C_{i+1}, extract the reverse target hidden feature k_i at the (i+1)-th moment by the memory network C_{i+1}, input the reverse target hidden feature k_i and the target image convolution feature into the memory network C_i, and extract the reverse target hidden feature k_{i-1} at the i-th moment by the memory network C_i;
a feature splicing subunit 1233, configured to splice the forward target hidden feature h_i extracted by the memory network B_i at the i-th moment with the reverse target hidden feature k_{i-1} extracted by the memory network C_i at the i-th moment to obtain a first splicing feature, and splice the forward target hidden feature h_{i+1} extracted by the memory network B_{i+1} at the (i+1)-th moment with the reverse target hidden feature k_i extracted by the memory network C_{i+1} at the (i+1)-th moment to obtain a second splicing feature;
the sequence feature determining subunit 1234 is configured to determine, based on the first splicing feature and the second splicing feature, a target character sequence feature corresponding to the target verification code image extracted from the target image convolution feature.
For specific implementation manners of the forward feature extraction subunit 1231, the reverse feature extraction subunit 1232, the feature splicing subunit 1233, and the sequence feature determination subunit 1234, reference may be made to the description of the bidirectional long short-term memory network in the embodiment corresponding to fig. 3, which will not be further described here.
The sequence feature alignment unit 124 is configured to input the target character sequence features into a character classification network in the target character recognition model, align the target character sequence features through a connectionist temporal classification (CTC) network in the character classification network, and obtain the target verification code characters in the target verification code picture based on the aligned target character sequence features.
For specific implementation manners of the target picture obtaining unit 121, the convolution feature extracting unit 122, the sequence feature extracting unit 123, and the sequence feature aligning unit 124, reference may be made to the description of step S102 in the embodiment corresponding to fig. 3, and details will not be further described here.
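As an illustration only, the sequence feature extraction and alignment performed by units 123 and 124 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: the function names are hypothetical, the `cell` function is a toy tanh recurrence standing in for a real LSTM memory network, and the alignment step uses greedy CTC decoding (merge repeated labels, then drop blanks), with index 0 of `alphabet` assumed to be the blank.

```python
import math

def cell(x, h_prev, w_x=0.5, w_h=0.5):
    """Toy tanh cell standing in for one LSTM memory network (an assumption)."""
    return [math.tanh(w_x * a + w_h * b) for a, b in zip(x, h_prev)]

def bidirectional_features(conv_features):
    """Return one spliced [forward; reverse] vector per time step."""
    steps, dim = len(conv_features), len(conv_features[0])
    # Forward chain: each step reuses the hidden feature of the previous step.
    forward, h = [], [0.0] * dim
    for x in conv_features:
        h = cell(x, h)
        forward.append(h)
    # Reverse chain: each step reuses the hidden feature of the following step.
    reverse, k = [None] * steps, [0.0] * dim
    for i in range(steps - 1, -1, -1):
        k = cell(conv_features[i], k)
        reverse[i] = k
    # Feature splicing: concatenate forward and reverse features per step.
    return [f + r for f, r in zip(forward, reverse)]

def ctc_greedy_decode(scores, alphabet, blank=0):
    """Greedy CTC decoding: merge repeated labels, then drop blanks."""
    best = [max(range(len(step)), key=step.__getitem__) for step in scores]
    chars, prev = [], None
    for idx in best:
        if idx != prev and idx != blank:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)
```

In a real model the per-step class scores passed to `ctc_greedy_decode` would come from a classifier applied to the spliced features; here the two functions are kept independent so that each step can be inspected on its own.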
The character submitting module 13 is configured to input a target verification code character to a character input area corresponding to the target verification code image in a simulation manner, and perform character verification on the target verification code character in response to a simulation submitting operation for the target verification code character in the character input area to obtain a character verification result;
wherein, the character submitting module 13 includes: an anti-crawler unit 131, a character input unit 132, and a character submission unit 133;
the anti-crawler unit 131 is configured to acquire an anti-crawler policy for a target captcha character, and acquire a dormancy duration and a character entry interval duration indicated by the anti-crawler policy;
the character input unit 132 is configured to, within the dormancy duration, input the target verification code characters in a simulated manner into the character input area corresponding to the target verification code picture at the character entry interval duration; the character input area comprises a character entry box and a character submission control;
and a character submitting unit 133, configured to perform character verification on the target verification code characters displayed in the character entry box in response to the simulated submission operation for the character submitting control, so as to obtain a character verification result.
For specific implementation manners of the anti-crawler unit 131, the character input unit 132, and the character submitting unit 133, reference may be made to the description of step S103 in the embodiment corresponding to fig. 3, and details will not be further described here.
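The paced entry described for units 131 to 133 can be sketched as below. This is one possible reading, offered as an assumption: the routine first waits for the dormancy duration, then enters the characters one at a time with the character entry interval between them, and finally triggers the submission control. The names `simulate_entry`, `type_char`, and `submit` are hypothetical callbacks, not interfaces defined by the application.

```python
import time

def simulate_entry(chars, type_char, submit, dormancy_s, interval_s,
                   sleep=time.sleep):
    """Enter characters with human-like pacing, then submit.

    type_char and submit are hypothetical callbacks supplied by the caller,
    for example wrappers around a browser automation tool.
    """
    # Wait for the dormancy duration before touching the input box.
    sleep(dormancy_s)
    for position, ch in enumerate(chars):
        # Pause for the entry interval between consecutive characters.
        if position > 0:
            sleep(interval_s)
        type_char(ch)
    # Trigger the character submission control and return its result.
    return submit()
```

Passing `sleep` as a parameter keeps the pacing logic separate from the clock, so the same routine can be exercised without real delays.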
And the main body information acquisition module 14 is configured to, if the character verification result indicates that the verification is successful, acquire an operation certificate picture bound with the service link, and acquire the main body information of the operation main body to which the target service object belongs from the operation certificate picture.
Wherein, the main body information collecting module 14 includes: an operation certificate display unit 141, a certificate interface analysis unit 142, an optical model calling unit 143 and a subject information acquisition unit 144;
an operation certificate display unit 141, configured to output a certificate picture display interface associated with the service platform if the character verification result indicates that the verification is successful, and display an operation certificate picture bound with the service link in the certificate picture display interface;
the voucher interface analyzing unit 142 is used for analyzing the voucher picture display interface through a webpage crawler policy associated with the service platform to obtain an operation voucher picture in the voucher picture display interface;
the credential interface parsing unit 142 includes: a second query sub-unit 1421 and an operation picture extraction sub-unit 1422;
a second query subunit 1421, configured to search, in the data structure tree of the credential image display interface, a credential image address identifier matching a second key element in a second image acquisition rule according to the second image acquisition rule in the web crawler policy associated with the service platform;
an operation picture extracting subunit 1422, configured to extract the operation certificate picture from the certificate display area in the certificate picture display interface based on the credential image address identifier.
For specific implementation manners of the second querying subunit 1421 and the operation picture extracting subunit 1422, reference may be made to the description of the specific process for obtaining the operation credential picture in the embodiment corresponding to fig. 3, which will not be further described here.
The optical model calling unit 143 is configured to take the operation certificate picture as a picture to be acquired, and call an optical character recognition model through an optical character recognition interface;
and a main body information acquisition unit 144, configured to recognize the picture to be acquired through the optical character recognition model, and use the optical character information recognized from the picture to be acquired as the main body information, acquired from the operation certificate picture, of the operation main body to which the target business object belongs.
For specific implementation manners of the operation credential display unit 141, the credential interface parsing unit 142, the optical model invoking unit 143, and the subject information collecting unit 144, reference may be made to the description of step S104 in the embodiment corresponding to fig. 3, and details will not be further described here.
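The lookup performed by the second query subunit 1421 can be illustrated with a small sketch. It assumes that the data structure tree is an HTML document, that the second key element is an attribute name and value pair on an `img` tag, and that the address identifier is that tag's `src` attribute; none of these details is fixed by the text above, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

class ImageAddressFinder(HTMLParser):
    """Collect the src of the first <img> whose attribute matches the key."""

    def __init__(self, key_attr, key_value):
        super().__init__()
        self.key_attr = key_attr
        self.key_value = key_value
        self.address = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (tag == "img" and self.address is None
                and attributes.get(self.key_attr) == self.key_value):
            self.address = attributes.get("src")

def find_image_address(html, key_attr, key_value):
    """Return the matching picture address, or None if nothing matches."""
    finder = ImageAddressFinder(key_attr, key_value)
    finder.feed(html)
    return finder.address
```

The returned address would then be used to download the operation certificate picture before it is handed to the optical character recognition model.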
Optionally, the association relationship establishing module 15 is configured to establish an association relationship between the subject information and the service link in a subject information base associated with the operation subject, and update the association relationship to the subject information base.
Optionally, the service link updating module 16 is configured to use the service link carrying the association relationship as a web page update link, and update the service link of the target service object as the web page update link on the service platform.
Optionally, the illegal object detection module 17 is configured to, when it is detected that the target service object under the web page update link belongs to an illegal object in the blacklist, use an operation subject indicated by the subject information as an illegal operation subject based on an association relationship between the subject information and the service link;
and the notification information generating module 18 is configured to generate notification information associated with an illegal operation subject, and send the notification information to a monitoring terminal corresponding to a platform monitoring person associated with the service platform.
For specific implementation manners of the target image determining module 11, the target character recognizing module 12, the character submitting module 13, and the main body information acquiring module 14, reference may be made to the description of steps S101 to S104 in the embodiment corresponding to fig. 3, and details will not be further described here. In addition, for specific implementation manners of the association relationship establishing module 15, the service link updating module 16, the illegal object detecting module 17 and the notification information generating module 18, reference may be made to the description of step S201 to step S211 in the embodiment corresponding to fig. 6, and details will not be further described here. In addition, the beneficial effects of the same method are not described in detail.
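The optional modules 15, 17, and 18 amount to keeping a mapping from service links to subject information and consulting it when a blacklisted object is found. The sketch below is a minimal illustration under assumptions: the subject information base is modeled as an in-memory dictionary, and the function names and the layout of the notification record are hypothetical.

```python
def bind_subject(info_base, service_link, subject_info):
    """Record the association between a service link and its subject info."""
    info_base[service_link] = subject_info
    return info_base

def detect_illegal_subjects(info_base, link_objects, blacklist):
    """Build one notification per link whose business object is blacklisted.

    link_objects maps each service link to the business object it offers.
    """
    notices = []
    for link, business_object in link_objects.items():
        if business_object in blacklist and link in info_base:
            notices.append({
                "subject": info_base[link],
                "link": link,
                "object": business_object,
            })
    return notices
```

Each returned record corresponds to the notification information that would be sent to the monitoring terminal of the platform monitoring person.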
Further, please refer to fig. 12, which is a schematic structural diagram of a verification code character processing apparatus according to an embodiment of the present application. The verification code character processing apparatus 2 may be a computer program (including program code) running in a computer device; for example, the verification code character processing apparatus 2 may be application software. The apparatus may be used to perform the corresponding steps in the methods provided by the embodiments of the present application. The verification code character processing apparatus 2 may include: an original picture acquisition module 21, an enhanced picture generation module 22, a sample picture determination module 23, a sample character recognition module 24 and a model training module 25;
an original image obtaining module 21, configured to obtain an original sample image used for training an initial character recognition model; the original sample picture is determined based on the sample link and the webpage crawler policy; the sample link is captured from the service platform through a webpage crawler strategy;
the enhanced picture generation module 22 is configured to obtain a data enhancement policy for performing data enhancement on the original sample picture, and perform data enhancement processing on the original sample picture based on the data enhancement policy to obtain at least one enhanced sample picture associated with the original sample picture;
the sample picture determining module 23 is configured to use the original sample picture and the at least one enhanced sample picture as target sample pictures, and use the labeled verification code character corresponding to the original sample picture as a sample label of the target sample picture;
the sample character recognition module 24 is configured to input the target sample picture into the initial character recognition model, recognize the target sample picture by using the initial character recognition model, and use a sample verification code character recognized from the target sample picture as a prediction tag;
the model training module 25 is configured to perform iterative training on the initial character recognition model based on the prediction label and the sample label to obtain a target character recognition model for recognizing target verification code characters in a target verification code picture; the target verification code character is used for acquiring an operation certificate picture bound with a service link associated with the target verification code picture after character verification is carried out; the operation certificate picture is used for acquiring the subject information of the operation subject to which the target business object indicated by the business link belongs.
For specific implementation manners of the original picture obtaining module 21, the enhanced picture generating module 22, the sample picture determining module 23, the sample character recognizing module 24, and the model training module 25, reference may be made to the description of step S301 to step S305 in the embodiment corresponding to fig. 8, and details will not be further described here. In addition, the beneficial effects of the same method are not described in detail.
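The sample preparation carried out by modules 22 and 23 can be sketched as follows. The concrete enhancement operations are not specified in this passage, so the two used here (random pixel noise and a one-pixel horizontal shift) are assumptions chosen for illustration, as are the function names and the representation of a picture as a list of rows of grayscale values.

```python
import random

def add_noise(picture, amount, rng):
    """Perturb each grayscale pixel by at most `amount`, clamped to 0..255."""
    return [[min(255, max(0, p + rng.randint(-amount, amount))) for p in row]
            for row in picture]

def shift_right(picture, pixels, fill=255):
    """Shift every row to the right, padding the left edge with `fill`."""
    return [[fill] * pixels + row[:len(row) - pixels] for row in picture]

def build_target_samples(original, label, seed=0):
    """Return the original and enhanced pictures, all sharing one label."""
    rng = random.Random(seed)
    enhanced = [add_noise(original, 20, rng), shift_right(original, 1)]
    # The labeled characters of the original picture serve as the sample
    # label of every target sample picture derived from it.
    return [(picture, label) for picture in [original] + enhanced]
```

Because the enhancements do not change which characters are shown, the enhanced pictures need no additional labeling, which is what allows the training set to grow without extra annotation effort.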
Further, please refer to fig. 13, fig. 13 is a schematic diagram of a computer device according to an embodiment of the present application. The computer device 1000 as shown in fig. 13 may include: at least one processor 1001, such as a CPU, at least one network interface 1004, a user interface 1003, memory 1005, at least one communication bus 1002. Wherein a communication bus 1002 is used to enable connective communication between these components. The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others. The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. The memory 1005 may optionally also be at least one storage device located remotely from the aforementioned processor 1001. As shown in fig. 13, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 1000 shown in fig. 13, the network interface 1004 is mainly used to provide a network communication function; the user interface 1003 is mainly used to provide an input interface for a user; and the processor 1001 may be configured to call the device control application stored in the memory 1005, so as to execute the verification code character processing method described in the embodiment corresponding to fig. 3, fig. 6, or fig. 8, and to implement the verification code character processing apparatus 1 described in the embodiment corresponding to fig. 11 and the verification code character processing apparatus 2 described in the embodiment corresponding to fig. 12, which are not described herein again. In addition, the beneficial effects of the same method are not described in detail.
Furthermore, it should be noted that an embodiment of the present application further provides a computer-readable storage medium, which stores a computer program executed by the aforementioned computer device 1000. The computer program includes program instructions, and when the processor executes the program instructions, the processor can perform the verification code character processing method described in the embodiment corresponding to fig. 3, fig. 6, or fig. 8; details are therefore not repeated here. In addition, the beneficial effects of the same method are not described in detail. For technical details not disclosed in the embodiments of the computer-readable storage medium of the present application, refer to the description of the method embodiments of the present application.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure describes only preferred embodiments of the present application and should not be taken as limiting the scope of the claims of the present application; equivalent changes made according to the claims of the present application therefore still fall within the scope covered by the present application.

Claims (15)

1. A method for processing a verification code character, comprising:
capturing a service link of a target service object on a service platform through a webpage crawler strategy, and determining a target verification code picture to be verified based on the service link and the webpage crawler strategy;
acquiring a target character recognition model associated with the service platform, inputting the target verification code picture into the target character recognition model, recognizing the target verification code picture by the target character recognition model, and recognizing to obtain target verification code characters in the target verification code picture;
inputting the target verification code characters, in a simulated manner, into a character input area corresponding to the target verification code picture, and in response to a simulated submission operation for the target verification code characters in the character input area, performing character verification on the target verification code characters to obtain a character verification result;
if the character verification result indicates that the verification is successful, acquiring an operation certificate picture bound with the service link, and acquiring the main body information of the operation main body to which the target service object belongs from the operation certificate picture.
2. The method of claim 1, wherein the capturing a service link of a target service object on a service platform through a web crawler policy and determining a target verification code picture to be verified based on the service link and the web crawler policy comprise:
acquiring a webpage crawler strategy associated with a service platform, and capturing link information of N service objects from the service platform through the webpage crawler strategy; n is a positive integer;
acquiring a target business object from the N business objects, and using the link information of the target business object as a business link of the target business object;
analyzing the service link to obtain an operation link of an operation subject to which the target service object belongs, and outputting an operation display interface of the operation subject based on the operation link; the operation display interface comprises a query interface for querying an operation certificate picture of the operation subject;
responding to the simulation trigger operation aiming at the query interface, outputting a simulation verification interface corresponding to the operation display interface, and acquiring a target verification code picture to be verified from the simulation verification interface through the webpage crawler strategy.
3. The method according to claim 2, wherein the responding to the simulation trigger operation for the query interface outputs a simulation verification interface corresponding to the operation display interface, and the obtaining of the target verification code picture to be verified from the simulation verification interface through the web crawler policy includes:
responding to the simulation trigger operation aiming at the query interface, and switching the operation display interface into a simulation verification interface; the simulation verification interface comprises a target display area; the target display area is used for displaying a target verification code picture associated with the operation certificate picture;
inquiring a verification picture address character matched with a first key element in the first picture acquisition rule in a data structure tree corresponding to the simulation verification interface through a first picture acquisition rule in the webpage crawler strategy;
and extracting the target verification code picture from the target display area based on the verification picture address character.
4. The method according to claim 1, wherein the obtaining a target character recognition model associated with the service platform, inputting the target verification code picture into the target character recognition model, and recognizing the target verification code picture by the target character recognition model to obtain the target verification code character in the target verification code picture, comprises:
acquiring a target character recognition model associated with the service platform, denoising the target verification code image, and taking the denoised target verification code image as a to-be-processed image;
inputting the picture to be processed into a convolutional neural network in the target character recognition model, extracting the image convolution characteristic of the picture to be processed by the convolutional neural network in the target character recognition model, and taking the extracted image convolution characteristic as the target image convolution characteristic of the target verification code picture;
inputting the target image convolution characteristics to a recurrent neural network in the target character recognition model, extracting sequence characteristics in the target image convolution characteristics by the recurrent neural network in the target character recognition model, and taking the extracted character sequence characteristics as target character sequence characteristics corresponding to the target verification code image;
inputting the target character sequence features into a character classification network in the target character recognition model, aligning the target character sequence features through a connectionist temporal classification (CTC) network in the character classification network, and obtaining the target verification code characters in the target verification code picture based on the aligned target character sequence features.
5. The method of claim 4, wherein the recurrent neural network in the target character recognition model is a bidirectional long short-term memory network; the bidirectional long short-term memory network comprises a forward long short-term memory network and a reverse long short-term memory network; the forward long short-term memory network comprises a memory network B_i and a memory network B_{i+1}, the memory network B_{i+1} being the next memory network after the memory network B_i; the reverse long short-term memory network comprises a memory network C_{i+1} and a memory network C_i, the memory network C_{i+1} being the previous memory network before the memory network C_i; i is a positive integer less than or equal to M; the number of memory networks in the forward long short-term memory network and in the reverse long short-term memory network is M;
the inputting the convolution feature of the target image into a recurrent neural network in the target character recognition model, extracting a sequence feature in the convolution feature of the target image by the recurrent neural network in the target character recognition model, and taking the extracted character sequence feature as a target character sequence feature corresponding to the target verification code image includes:
obtaining a forward history hidden feature h_{i-1} associated with the memory network B_i in the forward long short-term memory network, inputting the target image convolution feature and the forward history hidden feature h_{i-1} into the memory network B_i, extracting, by the memory network B_i, the forward target hidden feature h_i at the i-th moment, inputting the forward target hidden feature h_i and the target image convolution feature into the memory network B_{i+1}, and extracting, by the memory network B_{i+1}, the forward target hidden feature h_{i+1} at the (i+1)-th moment;
obtaining a reverse history hidden feature k_{i+1} associated with the memory network C_{i+1} in the reverse long short-term memory network, inputting the target image convolution feature and the reverse history hidden feature k_{i+1} into the memory network C_{i+1}, extracting, by the memory network C_{i+1}, the reverse target hidden feature k_i at the (i+1)-th moment, inputting the reverse target hidden feature k_i and the target image convolution feature into the memory network C_i, and extracting, by the memory network C_i, the reverse target hidden feature k_{i-1} at the i-th moment;
splicing the forward target hidden feature h_i extracted by the memory network B_i at the i-th moment with the reverse target hidden feature k_{i-1} extracted by the memory network C_i at the i-th moment to obtain a first splicing feature, and splicing the forward target hidden feature h_{i+1} extracted by the memory network B_{i+1} at the (i+1)-th moment with the reverse target hidden feature k_i extracted by the memory network C_{i+1} at the (i+1)-th moment to obtain a second splicing feature;
and determining target character sequence characteristics corresponding to the target verification code picture extracted from the target image convolution characteristics based on the first splicing characteristics and the second splicing characteristics.
6. The method according to claim 1, wherein the inputting the target verification code characters, in a simulated manner, into a character input area corresponding to the target verification code picture, and in response to a simulated submission operation for the target verification code characters in the character input area, performing character verification on the target verification code characters to obtain a character verification result, comprises:
acquiring a crawler-resisting strategy aiming at the target verification code characters, and acquiring the dormancy duration and the character input interval duration indicated by the crawler-resisting strategy;
within the dormancy duration, inputting the target verification code characters, in a simulated manner, into the character input area corresponding to the target verification code picture at the character input interval duration; the character input area comprises a character input box and a character submission control;
and responding to the simulated submission operation aiming at the character submission control, and performing character verification on the target verification code character displayed in the character input box to obtain a character verification result.
7. The method according to claim 1, wherein if the character verification result indicates that the verification is successful, acquiring an operation certificate picture bound to the service link, and acquiring subject information of an operation subject to which the target service object belongs from the operation certificate picture, includes:
if the character verification result indicates that the verification is successful, outputting a certificate picture display interface associated with the service platform, and displaying an operation certificate picture bound with the service link in the certificate picture display interface;
analyzing the voucher picture display interface through a webpage crawler strategy associated with the service platform to obtain the operation voucher picture in the voucher picture display interface;
taking the operation certificate picture as a picture to be collected, and calling an optical character recognition model through an optical character recognition interface;
and recognizing the picture to be acquired through the optical character recognition model, and taking the optical character information recognized from the picture to be acquired as the main body information, acquired from the operation certificate picture, of the operation main body to which the target business object belongs.
8. The method of claim 7, wherein the parsing the credential image display interface through a web crawler policy associated with the service platform to obtain the operation credential image in the credential image display interface comprises:
searching a certificate picture address character matched with a second key element in a second picture acquisition rule in a data structure tree of the certificate picture display interface through the second picture acquisition rule in the webpage crawler strategy associated with the service platform;
and extracting the operation certificate picture from a certificate display area in the certificate picture display interface based on the certificate picture address character.
9. The method of claim 1, further comprising:
establishing an association relation between the subject information and the service link in a subject information base associated with the operation subject, and updating the association relation to the subject information base;
and taking the service link carrying the association relation as a webpage updating link, and updating the service link of the target service object into the webpage updating link on the service platform.
10. The method of claim 9, further comprising:
when the target business object under the webpage updating link is detected to belong to an illegal object in a blacklist, taking an operation subject indicated by the subject information as an illegal operation subject based on the association relation between the subject information and the business link;
and generating notification information associated with the illegal operation subject, and sending the notification information to a supervision terminal corresponding to a platform supervision person associated with the service platform.
11. A method for processing a verification code character, comprising:
acquiring an original sample picture for training an initial character recognition model; the original sample picture is determined based on a sample link and a web crawler policy; the sample link is grabbed from a service platform through the webpage crawler strategy;
acquiring a data enhancement strategy for performing data enhancement on the original sample picture, and performing data enhancement processing on the original sample picture based on the data enhancement strategy to obtain at least one enhanced sample picture associated with the original sample picture;
taking the original sample picture and the at least one enhanced sample picture as target sample pictures, and taking marked verification code characters corresponding to the original sample picture as sample labels of the target sample pictures;
inputting the target sample picture into the initial character recognition model, recognizing the target sample picture by the initial character recognition model, and taking the sample verification code character recognized from the target sample picture as a prediction label;
performing iterative training on the initial character recognition model based on the prediction label and the sample label to obtain a target character recognition model for recognizing target verification code characters in a target verification code picture; the target verification code characters are used for acquiring operation certificate pictures bound by service links related to the target verification code pictures after character verification is carried out; and the operation certificate picture is used for acquiring the subject information of the operation subject to which the target service object indicated by the service link belongs.
12. An apparatus for processing a captcha character, comprising:
the target picture determining module is used for capturing a service link of a target service object on a service platform through a webpage crawler strategy and determining a target verification code picture to be verified based on the service link and the webpage crawler strategy;
the target character recognition module is used for acquiring a target character recognition model associated with the service platform, inputting the target verification code picture into the target character recognition model, recognizing the target verification code picture by the target character recognition model, and recognizing to obtain a target verification code character corresponding to the target verification code picture;
the character submitting module is used for inputting the target verification code characters to a character input area corresponding to the target verification code picture in a simulation mode, responding to the simulation submitting operation aiming at the target verification code characters in the character input area, and performing character verification on the target verification code characters to obtain a character verification result;
and the main body information acquisition module is used for acquiring an operation certificate picture bound with the service link if the character verification result indicates that the verification is successful, and acquiring the main body information of the operation main body to which the target service object belongs from the operation certificate picture.
13. An apparatus for processing a captcha character, comprising:
the original picture acquisition module is used for acquiring an original sample picture used for training an initial character recognition model; the original sample picture is determined based on a sample link and a webpage crawler strategy; the sample link is captured from a service platform through the webpage crawler strategy;
the enhanced picture generation module is used for acquiring a data enhancement strategy for performing data enhancement on the original sample picture, and performing data enhancement processing on the original sample picture based on the data enhancement strategy to obtain at least one enhanced sample picture associated with the original sample picture;
a sample picture determining module, configured to use the original sample picture and the at least one enhanced sample picture as target sample pictures, and use the annotated verification code characters corresponding to the original sample picture as the sample label of the target sample pictures;
the sample character recognition module is used for inputting the target sample picture into the initial character recognition model, recognizing the target sample picture by the initial character recognition model, and taking a sample verification code character recognized from the target sample picture as a prediction label;
the model training module is used for carrying out iterative training on the initial character recognition model based on the prediction label and the sample label to obtain a target character recognition model for recognizing target verification code characters in a target verification code picture; the target verification code characters are used, after passing character verification, for acquiring an operation certificate picture bound to the service link associated with the target verification code picture; and the operation certificate picture is used for acquiring the subject information of the operation subject to which the target service object indicated by the service link belongs.
14. A computer device, comprising: a processor and a memory;
the processor is connected to the memory, the memory being used for storing a computer program, and the processor being configured to invoke the computer program to cause the computer device to perform the method of any one of claims 1 to 11.
15. A computer-readable storage medium, in which a computer program is stored which is adapted to be loaded and executed by a processor to cause a computer device having said processor to carry out the method of any one of claims 1 to 11.
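As an illustration of the enhanced picture generation module and the sample picture determining module recited in claim 13, the following is a minimal sketch of how one annotated original sample picture might be expanded into several target sample pictures that share a single sample label. The claims do not specify any particular data enhancement strategy; the three transforms used here (`add_noise`, `shift_right`, `invert`), the nested-list grayscale image representation, and the helper `build_target_samples` are all assumptions introduced for illustration only, and the recognition model and its training loop are deliberately omitted.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical representation: a grayscale picture as rows of 0-255 pixel values.
Image = List[List[int]]
Strategy = Callable[[Image, random.Random], Image]


def add_noise(img: Image, rng: random.Random, amount: int = 20) -> Image:
    """Perturb every pixel by a random offset, clamped to the 0-255 range."""
    return [[min(255, max(0, p + rng.randint(-amount, amount))) for p in row]
            for row in img]


def shift_right(img: Image, rng: random.Random, max_shift: int = 2) -> Image:
    """Shift each row right by a random number of pixels, padding with white."""
    shifted = []
    for row in img:
        k = min(rng.randint(1, max_shift), len(row))
        shifted.append([255] * k + row[:len(row) - k])
    return shifted


def invert(img: Image, rng: random.Random) -> Image:
    """Invert pixel intensities (rng is unused; kept for a uniform signature)."""
    return [[255 - p for p in row] for row in img]


def build_target_samples(
    original: Image,
    label: str,
    strategies: List[Strategy],
    seed: int = 0,
) -> List[Tuple[Image, str]]:
    """Return the original picture plus one enhanced picture per strategy.

    Every returned picture carries the label annotated on the original,
    mirroring the claim language in which the annotated characters of the
    original sample picture serve as the sample label of all target samples.
    """
    rng = random.Random(seed)
    samples = [(original, label)]
    for strategy in strategies:
        samples.append((strategy(original, rng), label))
    return samples


original_picture = [[0, 128, 255, 64], [10, 20, 30, 40]]
target_samples = build_target_samples(
    original_picture, "A7", [add_noise, shift_right, invert]
)
```

In this sketch the label is attached once, at annotation time, and never re-derived from the enhanced pictures, so the enhancement step adds training variety without additional labeling effort.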
CN202110571650.1A 2021-05-25 2021-05-25 Method, device and related equipment for processing verification code characters Pending CN115410201A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110571650.1A CN115410201A (en) 2021-05-25 2021-05-25 Method, device and related equipment for processing verification code characters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110571650.1A CN115410201A (en) 2021-05-25 2021-05-25 Method, device and related equipment for processing verification code characters

Publications (1)

Publication Number Publication Date
CN115410201A true CN115410201A (en) 2022-11-29

Family

ID=84155963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110571650.1A Pending CN115410201A (en) 2021-05-25 2021-05-25 Method, device and related equipment for processing verification code characters

Country Status (1)

Country Link
CN (1) CN115410201A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117132989A (en) * 2023-10-23 2023-11-28 山东大学 Character verification code identification method, system and equipment based on convolutional neural network
CN117132989B (en) * 2023-10-23 2024-01-26 山东大学 Character verification code identification method, system and equipment based on convolutional neural network

Similar Documents

Publication Publication Date Title
CN112381104B (en) Image recognition method, device, computer equipment and storage medium
CN112101304B (en) Data processing method, device, storage medium and equipment
CN111931188B (en) Vulnerability testing method and system in login scene
CN108108711B (en) Face control method, electronic device and storage medium
CN114331829A (en) Countermeasure sample generation method, device, equipment and readable storage medium
CN109194689B (en) Abnormal behavior recognition method, device, server and storage medium
CN110795714A (en) Identity authentication method and device, computer equipment and storage medium
CN111049786A (en) Network attack detection method, device, equipment and storage medium
CN112733057A (en) Network content security detection method, electronic device and storage medium
CN114422271B (en) Data processing method, device, equipment and readable storage medium
CN113591603A (en) Certificate verification method and device, electronic equipment and storage medium
CN112307464A (en) Fraud identification method and device and electronic equipment
CN113079157A (en) Method and device for acquiring network attacker position and electronic equipment
CN114842411A (en) Group behavior identification method based on complementary space-time information modeling
CN115563600A (en) Data auditing method and device, electronic equipment and storage medium
CN115564000A (en) Two-dimensional code generation method and device, computer equipment and storage medium
CN111881740A (en) Face recognition method, face recognition device, electronic equipment and medium
CN115410201A (en) Method, device and related equipment for processing verification code characters
CN106126067A (en) A kind of method, device and mobile terminal triggering the unlatching of augmented reality function
CN111597361B (en) Multimedia data processing method, device, storage medium and equipment
CN113642519A (en) Face recognition system and face recognition method
Fowdur et al. Performance analysis of edge, fog and cloud computing paradigms for real-time video quality assessment and phishing detection
CN113762040B (en) Video identification method, device, storage medium and computer equipment
CN114495188B (en) Image data processing method and device and related equipment
CN113469138A (en) Object detection method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination