CN117690117A - Image recognition method and device combining semantic analysis and serving data acquisition - Google Patents

Image recognition method and device combining semantic analysis and serving data acquisition

Info

Publication number
CN117690117A
CN117690117A
Authority
CN
China
Prior art keywords
data
image
text
indication
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311640501.1A
Other languages
Chinese (zh)
Inventor
杨亮山
何冠枢
徐徵
凌俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Zhongsituo Big Data Research Institute Co ltd
Original Assignee
Guangdong Zhongsituo Big Data Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Zhongsituo Big Data Research Institute Co ltd filed Critical Guangdong Zhongsituo Big Data Research Institute Co ltd
Priority to CN202311640501.1A priority Critical patent/CN117690117A/en
Publication of CN117690117A publication Critical patent/CN117690117A/en
Pending legal-status Critical Current


Abstract

The application relates to an image recognition method, apparatus, device, and medium combining semantic analysis and serving data acquisition. The method comprises the following steps: acquiring an image to be identified, the image comprising an indication text and a plurality of candidate subgraphs; performing feature recognition on each candidate subgraph based on a preset image recognition model to obtain image feature data for each candidate subgraph; performing semantic analysis on the indication text based on a preset text recognition model to obtain text semantic data for the indication text, where the text semantic data characterize an indication type of the indication content, and the indication type is associated with at least one of type data, azimuth data, area data, orientation data, and color data; and logically matching the text semantic data with the image feature data of each candidate subgraph to determine, from the plurality of candidate subgraphs, a target candidate subgraph logically matching the indication content. With this method, recognition of verification code (CAPTCHA) images can be realized, and the efficiency and accuracy of verification code image recognition are improved.

Description

Image recognition method and device combining semantic analysis and serving data acquisition
Technical Field
The present invention relates to the field of computer technology, and in particular to an image recognition method, an image recognition apparatus, a computer device, a computer-readable storage medium, and a computer program product that combine semantic analysis to serve data acquisition.
Background
Verification codes are an important means of protecting websites against malicious attacks, and they come in many forms. Among them, the image verification code is a particularly good design: compared with the traditional typed-character verification code, it offers higher security and is friendlier to users.
However, while the image verification code effectively blocks malicious attacks, it also severely hampers non-malicious automated programs, forcing processes that could otherwise run automatically to be interrupted and reducing working efficiency. Meanwhile, in order to evaluate the security of an image verification code, a corresponding verification code image recognition and processing method often needs to be designed as an adversarial test.
Existing image verification code recognition techniques are most widely applied to character recognition, slider puzzles, word selection, scene recognition, and the like, and with the development of machine learning these verification codes can be recognized effectively after large amounts of training. However, for complex verification codes whose scenes require combining multiple spatial features and logical judgments, such as semantic analysis, color recognition, geometric shape recognition, and azimuth judgment, existing recognition techniques cannot achieve both high accuracy and high efficiency.
Disclosure of Invention
In view of the foregoing, the present disclosure provides an image recognition method combining semantic analysis and serving data acquisition, a corresponding image recognition apparatus, a computer device, a computer-readable storage medium, and a computer program product. The technical solution of the present disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided an image recognition method for combining semantic analysis for data acquisition, comprising:
acquiring an image to be identified; the image to be identified is a verification code image; the image to be identified comprises an indication text expressing indication content in natural language form and a plurality of candidate subgraphs representing object spatial forms;
performing feature recognition on each candidate subgraph based on a preset image recognition model to obtain image feature data for each candidate subgraph; the image feature data comprise type data, azimuth data, area data, orientation data, and color data of the object displayed in the candidate subgraph; and
performing semantic analysis on the indication text based on a preset text recognition model to obtain text semantic data for the indication text; the text semantic data are used to characterize an indication type of the indication content, and the indication type is associated with at least one of the type data, the azimuth data, the area data, the orientation data, and the color data;
logically matching the text semantic data with the image feature data of each candidate subgraph to determine, from the plurality of candidate subgraphs, a target candidate subgraph logically matching the indication content;
wherein the target candidate subgraph serves as reference service information when a user performs big data information collection.
In an exemplary embodiment, the image recognition model comprises a spatial object shape recognition model pre-trained based on the YOLO target detection algorithm;
performing feature recognition on each candidate subgraph based on a preset image recognition model to obtain image feature data for each candidate subgraph comprises:
inputting the image to be identified into the spatial object shape recognition model to respectively identify the object type, center point coordinates, and vertex coordinates of the target object in each candidate subgraph;
determining type data of each target object based on the object type of each target object; and
determining area data of each target object based on the vertex coordinates of each target object; and
determining azimuth data of each target object based on the differences between the center point coordinates of the target objects;
wherein the azimuth data characterize the positional differences of the target objects in the image to be identified.
In an exemplary embodiment, the image recognition model comprises a spatial object angle recognition model pre-trained based on the YOLO target detection algorithm;
performing feature recognition on each candidate subgraph based on a preset image recognition model to obtain image feature data for each candidate subgraph comprises:
inputting the image to be identified into the spatial object angle recognition model to respectively identify the display angle of the target object in each candidate subgraph; the display angle is used to characterize the viewing-angle difference of a target object in three-dimensional space when the image to be identified is displayed in a two-dimensional plane;
determining orientation data of each target object based on the display angle of each target object;
wherein the orientation data characterize the orientation of a target object in three-dimensional space when the image to be identified is displayed in a two-dimensional plane.
In an exemplary embodiment, the image recognition model includes a color clusterer pre-trained based on the Kmeans clustering algorithm;
performing feature recognition on each candidate subgraph based on a preset image recognition model to obtain image feature data for each candidate subgraph comprises:
inputting the image to be identified into the color clusterer to respectively identify the cluster color of the target object in each candidate subgraph;
and determining color data of each target object based on the cluster color of each target object.
In an exemplary embodiment, before the semantic analysis is performed on the indication text based on the preset text recognition model, the method further comprises:
performing regular matching on the indication text to determine irrelevant words in the indication text;
deleting the irrelevant words from the indication text to obtain a processed indication text;
performing word segmentation on the processed indication text to obtain a word segmentation word list for the indication text; and
based on a preset word bag dictionary, sequentially converting each word in the word segmentation word list into a corpus vector in a preset format to obtain a corpus vector list for the indication text;
wherein the word bag dictionary comprises a plurality of word-number combinations, each word-number combination comprising a target word and a flag bit number bound to that target word;
and the corpus vector in the preset format comprises the flag bit number of a segmented word and the frequency with which that word appears in the indication text.
In an exemplary embodiment, the text recognition model comprises a corpus pre-trained based on the LSI semantic detection algorithm; the corpus comprises a plurality of reference corpus vector lists;
the semantic analysis of the indication text based on the preset text recognition model comprises the following steps:
performing similarity matching on the corpus vector list of the indicated text through the corpus to match a target corpus vector list with highest similarity with the corpus vector list from the plurality of reference corpus vector lists;
and determining text semantics corresponding to each corpus vector in the target corpus vector list based on the word bag dictionary so as to obtain text semantic data aiming at the indication text.
In an exemplary embodiment, after determining a target candidate sub-graph logically matching the indicated content from the plurality of candidate sub-graphs, the method further includes:
generating a verification code identification result for the image to be identified based on the center point coordinates of the corresponding target object in the target candidate subgraph;
wherein the verification code identification result serves as parameter information when a user performs a verification code identification test on a target webpage, or as reference information when the user performs data acquisition on the target webpage through a legally authenticated data acquisition program.
According to a second aspect of embodiments of the present disclosure, there is provided an image recognition apparatus for combined semantic analysis serving data acquisition, comprising:
an image acquisition unit configured to acquire an image to be identified; the image to be identified is a verification code image; the image to be identified comprises an indication text expressing indication content in natural language form and a plurality of candidate subgraphs representing object spatial forms;
a feature recognition unit configured to perform feature recognition on each candidate subgraph based on a preset image recognition model to obtain image feature data for each candidate subgraph; the image feature data comprise type data, azimuth data, area data, orientation data, and color data of the object displayed in the candidate subgraph;
a semantic analysis unit configured to perform semantic analysis on the indication text based on a preset text recognition model to obtain text semantic data for the indication text; the text semantic data are used to characterize an indication type of the indication content, and the indication type is associated with at least one of the type data, the azimuth data, the area data, the orientation data, and the color data; and
a logic matching unit configured to logically match the text semantic data with the image feature data of each candidate subgraph, so as to determine, from the plurality of candidate subgraphs, a target candidate subgraph logically matching the indication content;
wherein the target candidate subgraph serves as reference service information when a user performs big data information collection.
According to a third aspect of embodiments of the present disclosure, there is provided a computer device comprising:
a processor;
a memory for storing executable instructions of the processor;
wherein the processor is configured to execute the executable instructions to implement the method of any one of the above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium comprising program instructions which, when executed by a processor of a computer device, enable the computer device to perform any one of the methods described above.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising program instructions which, when executed by a processor of a computer device, enable the computer device to perform any one of the methods described above.
The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects:
firstly, an image to be identified is acquired; the image to be identified is a verification code image comprising an indication text expressing indication content in natural language form and a plurality of candidate subgraphs representing object spatial forms. Feature recognition is then performed on each candidate subgraph based on a preset image recognition model to obtain image feature data for each candidate subgraph, the image feature data comprising type data, azimuth data, area data, orientation data, and color data of the object displayed in the candidate subgraph. Semantic analysis is performed on the indication text based on a preset text recognition model to obtain text semantic data for the indication text, the text semantic data characterizing an indication type of the indication content, the indication type being associated with at least one of the type data, azimuth data, area data, orientation data, and color data. Finally, the text semantic data are logically matched with the image feature data of each candidate subgraph to determine, from the plurality of candidate subgraphs, a target candidate subgraph logically matching the indication content; the target candidate subgraph serves as reference service information when a user performs big data information collection. On the one hand, unlike the prior art, the method performs feature recognition on each candidate subgraph based on a preset image recognition model to obtain image feature data, performs semantic analysis on the indication text based on a preset text recognition model to obtain text semantic data, and then logically matches the text semantic data with the image feature data of each candidate subgraph to match the target candidate subgraph; this optimizes the image recognition processing flow, improves the efficiency of verification code image recognition, and reduces the resource occupancy and labor cost of recognizing verification code images. On the other hand, the target candidate subgraph logically matching the indication content is determined from the plurality of candidate subgraphs using the type data, azimuth data, area data, orientation data, and color data of the objects displayed in the candidate subgraphs together with the text semantic data characterizing the indication type of the indication content; the image recognition process thus effectively combines multiple spatial features and logical judgments such as semantic analysis, color recognition, geometric shape recognition, and azimuth judgment, which improves the flexibility and accuracy of the verification code image recognition scheme and allows the system to run its other application functions normally.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
FIG. 1 is a diagram illustrating an application environment for an image recognition method incorporating semantic analysis that serves data acquisition according to an exemplary embodiment.
FIG. 2 is a flowchart illustrating a method of image recognition in conjunction with semantic analysis to service data acquisition according to an exemplary embodiment.
Fig. 3 is a schematic diagram of a captcha image, shown in accordance with an exemplary embodiment.
FIG. 4 is a flowchart illustrating a step of indicating text to corpus vector conversion according to an exemplary embodiment.
FIG. 5 is a block diagram of an image recognition processing device incorporating semantic analysis, according to an example embodiment.
FIG. 6 is a block diagram illustrating a computer device for image recognition, according to an example embodiment.
FIG. 7 is a block diagram illustrating a computer-readable storage medium for image recognition according to an example embodiment.
FIG. 8 is a block diagram illustrating a computer program product for image recognition, according to an example embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The term "and/or" in embodiments of the present application refers to any and all possible combinations including one or more of the associated listed items. Also described are: as used in this specification, the terms "comprises/comprising" and/or "includes" specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, and/or components, and/or groups thereof.
The terms "first," "second," and the like in this application are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
In addition, although the terms "first," "second," etc. may be used several times in this application to describe various operations (or various elements or various applications or various instructions or various data) etc., these operations (or elements or applications or instructions or data) should not be limited by these terms. These terms are only used to distinguish one operation (or element or application or instruction or data) from another operation (or element or application or instruction or data).
The image recognition method combining semantic analysis for data acquisition can be applied to an application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a communication network. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be located on a cloud or other network server.
In some embodiments, referring to fig. 1, the server 104 acquires an image to be identified, where the image to be identified is a verification code image comprising an indication text expressing indication content in natural language form and a plurality of candidate subgraphs representing object spatial forms; performs feature recognition on each candidate subgraph based on a preset image recognition model to obtain image feature data for each candidate subgraph, where the image feature data comprise type data, azimuth data, area data, orientation data, and color data of the object displayed in the candidate subgraph; performs semantic analysis on the indication text based on a preset text recognition model to obtain text semantic data for the indication text, where the text semantic data characterize an indication type of the indication content and the indication type is associated with at least one of the type data, azimuth data, area data, orientation data, and color data; and logically matches the text semantic data with the image feature data of each candidate subgraph to determine, from the plurality of candidate subgraphs, a target candidate subgraph logically matching the indication content; the target candidate subgraph serves as reference service information when a user performs big data information collection.
In some embodiments, the terminal 102 (e.g., a mobile terminal or a fixed terminal) may be implemented in various forms. The terminal 102 may be a mobile terminal such as a mobile phone, a smartphone, a notebook computer, a portable handheld device, a personal digital assistant (PDA), or a tablet computer (PAD), or the terminal 102 may be a fixed terminal such as an automated teller machine (ATM), an automatic kiosk, a digital TV, or a desktop computer.
In the following, it is assumed that the terminal 102 is a fixed terminal. However, it will be understood by those skilled in the art that the configuration according to the embodiments disclosed herein can also be applied to a mobile type terminal 102 if there are operations or elements specifically for the purpose of movement.
In some embodiments, the data processing components running on the server 104 may load and execute any of a variety of additional server applications and/or mid-tier applications, including, for example, HTTP (hypertext transfer protocol) servers, FTP (file transfer protocol) servers, CGI (common gateway interface) servers, RDBMS (relational database management system) servers, and the like.
In some embodiments, the server 104 may be implemented as a stand-alone server or as a cluster of servers. The server 104 may be adapted to run one or more application services or software components that provide services to the terminal 102 described in the foregoing disclosure.
In some embodiments, the operating systems on which the application services or software components run may include various versions of the Microsoft Windows, Apple Macintosh, and/or Linux operating systems, various commercial or UNIX-like operating systems (including but not limited to the various GNU/Linux operating systems, Google Chrome OS, etc.), and/or mobile operating systems such as iOS, Windows Phone, Android OS, BlackBerry, Palm OS, and other online or offline operating systems, which are not specifically limited herein.
In some embodiments, as shown in fig. 2, an image recognition method combining semantic analysis and serving data acquisition is provided. Taking the application of the method to the server 104 in fig. 1 as an example, the method includes the following steps:
step S11: and acquiring an image to be identified.
In one embodiment, the image to be identified is a verification code image, and the image to be identified comprises an indication text for expressing indication content in a natural language form and a plurality of candidate subgraphs for representing the spatial form of the object.
The indication text is used for indicating a user or a computer to select a candidate sub-graph of which the corresponding object space morphology is matched with the indication content from the image to be identified.
In a specific application scenario, referring to fig. 3, fig. 3 is a schematic page diagram of an embodiment of a verification code image in the present application. The page is a Web page; when it is logged in to, a verification code image used for testing pops up in the page for the server to perform an identification test. The verification code image comprises the indication text "please click on the object below the letter" and a plurality of candidate subgraphs each occupying a certain spatial form: "dark capital letter E", "dark cube", "dark lowercase letter y", "dark numeral 4", "light lowercase letter e", "light lowercase letter y", "dark lowercase letter e", and "dark cone".
Step S12: performing feature recognition on each candidate subgraph based on a preset image recognition model to obtain image feature data for each candidate subgraph.
The image characteristic data comprises object name data, type data, azimuth data, area data, orientation data and color data of an object displayed in the candidate subgraph.
In some embodiments, the image recognition model is trained based on a plurality of verification code images carrying annotation data. Each verification code image is annotated by engineers with its corresponding object type data, object azimuth data, object area data, object orientation data, and object color data.
In one embodiment, the server first performs object recognition on the image to be recognized to identify individual object regions in the image to be recognized; then, image segmentation is carried out on each object area to obtain a plurality of candidate subgraphs; and finally, naming the object of each candidate sub-graph to obtain object name data about the displayed object.
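A minimal sketch of this region detection and cropping step is given below; the ultralytics-style YOLO interface, the weight file name, and the returned fields are assumptions for illustration, not part of the patent text.

```python
# Hedged sketch: cropping candidate subgraphs out of the verification code
# image with a YOLO-style detector. "shape_model.pt" and the dictionary
# field names are illustrative assumptions only.
import cv2
from ultralytics import YOLO

def extract_candidate_subgraphs(image_path, weights="shape_model.pt"):
    model = YOLO(weights)                  # assumed pre-trained detector
    image = cv2.imread(image_path)
    result = model(image)[0]
    candidates = []
    for box in result.boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0])         # object region
        candidates.append({
            "name": result.names[int(box.cls)],        # object name data
            "crop": image[y1:y2, x1:x2],               # candidate subgraph
            "center": ((x1 + x2) / 2, (y1 + y2) / 2),  # center point coordinate
        })
    return candidates
```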
In one embodiment, the image recognition model includes a spatial object shape recognition model pre-trained based on the YOLO target detection algorithm. The spatial object shape recognition model is used to recognize the type data, azimuth data, and area data of the object displayed in each candidate subgraph.
Specifically, the server inputs the image to be recognized into the spatial object shape recognition model to output the type data, azimuth data, and area data corresponding to each candidate subgraph.
In one embodiment, the image recognition model includes a spatial object angle recognition model pre-trained based on the YOLO target detection algorithm. The spatial object angle recognition model is used to recognize the orientation data of the object displayed in each candidate subgraph.
Specifically, the server inputs the image to be identified into the spatial object angle recognition model to output the orientation angle corresponding to each candidate subgraph, so as to obtain the orientation data.
In one embodiment, the image recognition model includes a color clusterer pre-trained based on the Kmeans clustering algorithm. The color clusterer is used to recognize the color data of the object displayed in each candidate subgraph.
Specifically, the server inputs the image to be identified into the color clusterer to output the cluster color corresponding to each candidate subgraph, so as to obtain the color data.
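Putting the three models together, the assembly of the image feature data in step S12 might look like the following sketch; the recognizer interfaces (shape_model.recognize and so on) are hypothetical placeholders, not an API defined by the patent.

```python
# Illustrative glue code only: shape_model, angle_model, and color_clusterer
# stand in for the three pre-trained models above; their interfaces are assumed.
def image_feature_data(candidate, shape_model, angle_model, color_clusterer):
    type_data, azimuth, area = shape_model.recognize(candidate)  # assumed call
    orientation = angle_model.recognize(candidate)               # assumed call
    color = color_clusterer.recognize(candidate)                 # assumed call
    return {
        "type": type_data,           # e.g. a classification name such as "sl_A"
        "azimuth": azimuth,          # positional relation to the other objects
        "area": area,                # graphic area in the image
        "orientation": orientation,  # front / back / left side / right side
        "color": color,              # cluster color, e.g. "dark"
    }
```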
Step S13: performing semantic analysis on the indication text based on a preset text recognition model to obtain text semantic data for the indication text.
In one embodiment, the text recognition model comprises a corpus pre-trained based on the LSI semantic detection algorithm.
Specifically, the server may train the corpus through the following process:
(1) A plurality of example texts are obtained.
For example, the example texts include: example 1 "please find the object with the same color as the largest cone", example 2 "please find the uppercase letter corresponding to the green letter", example 3 "please click on the number '6' facing you", example 4 "please select the object below the number", and example 5 "please find the letter above the cylinder".
(2) Irrelevant-word elimination is performed on each example text in turn to obtain a plurality of corresponding training texts.
For example, the training texts include: text 1 "object with the same color as the largest cone", text 2 "uppercase letter corresponding to the green letter", text 3 "number '6' facing you", text 4 "object below the number", and text 5 "letter above the cylinder".
(3) Word segmentation processing is carried out on the training texts to obtain a plurality of corresponding text word sets.
Wherein the text word set may also be a word pool.
For example, the word pools include: set 1 (with, largest, cone, color, same, 的, object), set 2 (green, letter, corresponding, uppercase), set 3 (facing, you, 的, number, ", 6, "), set 4 (number, below, object), and set 5 (cylinder, above, letter).
(4) Based on the plurality of sets of text words, a bag of words is generated.
The word bag is formed by arranging the words into word-number combinations, each combining a word (key) with a flag bit number (value).
For example, the word-number combinations in the word bag include: (with, 0), (cone, 1), (largest, 2), (object, 3), (的, 4), (same, 5), (color, 6), (uppercase, 7), (letter, 8), (corresponding, 9), (green, 10), (", 11), ("6", 12), (you, 13), (number, 14), (facing, 15), (below, 16), (above, 17), (cylinder, 18). In each word-number combination, the former element is the word (key) and the latter is the flag bit number (value) bound to that word.
(5) A corpus is generated based on the word bag and the training texts.
The corpus comprises a plurality of reference corpus vector lists, and each reference corpus vector list comprises the flag bit numbers of segmented words and the frequencies with which those words appear in the training text.
For example, for training text 3, the corresponding reference corpus vector list is [(4, 1), (11, 2), (12, 1), (13, 1), (14, 1), (15, 1)]. In (4, 1), the "4" identifies the word-number combination whose flag bit number is 4, and the "1" indicates that the corresponding word appears once in the training text; in (11, 2), the "11" identifies the word-number combination whose flag bit number is 11, and the "2" indicates that the corresponding word appears twice. The other vectors are expressed in the same way and are not repeated here.
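A minimal sketch of steps (3)-(5) is given below using the gensim library; gensim itself, the English token lists, and the topic count are illustrative assumptions, as the patent only names the LSI semantic detection algorithm.

```python
# Hedged sketch of corpus training with gensim; token lists are illustrative.
from gensim import corpora, models

training_token_sets = [
    ["with", "largest", "cone", "color", "same", "object"],  # set 1 (illustrative)
    ["green", "letter", "corresponding", "uppercase"],       # set 2 (illustrative)
    ["facing", "you", "number", "6"],                        # set 3 (illustrative)
]

# word bag: each word (key) is bound to a flag bit number (value)
bag_of_words = corpora.Dictionary(training_token_sets)
# reference corpus vector lists: (flag bit number, frequency) pairs
reference_corpus = [bag_of_words.doc2bow(t) for t in training_token_sets]
# LSI model trained over the reference corpus
lsi_model = models.LsiModel(reference_corpus, id2word=bag_of_words, num_topics=2)
```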
In one embodiment, the server performs semantic analysis on the indication text based on a preset text recognition model, including the following steps:
step one: and performing similarity matching on the corpus vector list of the indicated text through the corpus to match a target corpus vector list with the highest similarity with the corpus vector list from a plurality of reference corpus vector lists.
Specifically, the server firstly converts the indication text into a corpus vector list with the same format as each reference corpus vector list of the corpus; and then, respectively carrying out similarity calculation on vector matrixes by the corpus vector list of the instruction text and each reference corpus vector list to obtain a target corpus vector list with the highest similarity with the corpus vector list.
Step two: based on the word bag dictionary, text semantics corresponding to each corpus vector in the target corpus vector list are determined, so that text semantic data aiming at the indicated text is obtained.
Wherein the text semantic data is used to characterize an indication type of the indication content and the indication type is associated with at least one of type data, orientation data, area data, orientation data, and color data.
As an example, for the target corpus vector list [(0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1)], the text meaning corresponding to the corpus vectors is determined from the word bag dictionary as ["with", "largest", "cone", "color", "same", "的", "object"]. Here, "largest" characterizes that the indication type includes a size type, which is associated with the area data; "cone" characterizes that the indication type includes a shape type, which is associated with the type data; and "color" characterizes that the indication type includes a color type, which is associated with the color data.
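Continuing the gensim objects from the training sketch above, a hedged sketch of this two-step matching could look as follows; MatrixSimilarity is one plausible realization of the similarity matching and is not named by the patent.

```python
# Sketch: match the indication text's corpus vector list against the
# reference lists, then decode the best match through the word bag.
from gensim import similarities

index = similarities.MatrixSimilarity(lsi_model[reference_corpus])

def text_semantic_data(indication_tokens):
    query = bag_of_words.doc2bow(indication_tokens)   # corpus vector list
    sims = index[lsi_model[query]]                    # step one: similarities
    best = int(sims.argmax())                         # target corpus vector list
    # step two: decode each (flag bit, frequency) pair back into its word
    return [bag_of_words[flag] for flag, _ in reference_corpus[best]]
```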
Step S14: the text semantic data is logically matched with the image feature information of each candidate sub-image to determine a target candidate sub-image logically matched with the indicated content from the plurality of candidate sub-images.
Specifically, the server first analyzes the logical association between semantics and features for the text semantic data and the image feature information of each candidate subgraph, so as to obtain the degree of logical association between the image feature information of each candidate subgraph and the text semantic data; it then determines, based on these degrees of logical association, the target candidate subgraph with the highest degree of logical association from the plurality of candidate subgraphs.
In an embodiment, the server may calculate a linear or nonlinear relationship between the text semantic data and the image feature information based on correlation coefficients (such as the Pearson correlation coefficient or the Spearman rank correlation coefficient), analysis of variance, mutual information, linear regression, and the like, so as to obtain the corresponding degree of logical association.
The degree of logical association is an index measuring the logical correlation between the text semantic data and the image feature information. Taking the Pearson correlation coefficient as an example, it measures the strength and direction of the linear correlation between two variables and ranges from -1 to 1: a value close to 1 indicates positive correlation, a value close to -1 indicates negative correlation, and a value close to 0 indicates no correlation.
In one embodiment, the server first sorts the plurality of candidate subgraphs in descending order of their corresponding degrees of logical association, and then takes the candidate subgraph ranked first as the target candidate subgraph.
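As a simple illustration of this ranking, the sketch below scores each candidate subgraph by how many semantic constraints its features satisfy and takes the top-ranked one; this counting rule is an assumed stand-in for the correlation-based degree of logical association described above.

```python
# Assumed scoring rule standing in for the degree of logical association.
def pick_target_subgraph(candidates, constraints):
    """candidates: dicts with a 'features' mapping (type/azimuth/area/
    orientation/color); constraints: feature key -> required value."""
    def association(candidate):
        return sum(candidate["features"].get(k) == v
                   for k, v in constraints.items())
    # candidate subgraph with the highest association degree ranks first
    return max(candidates, key=association)
```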
In an embodiment, the target candidate subgraph is used by the user as reference service information when collecting information about big data.
After determining, from the plurality of candidate subgraphs, the target candidate subgraph logically matching the indicated content, the server further generates a verification code identification result for the image to be identified based on the center point coordinates of the corresponding target object in the target candidate subgraph.
The verification code identification result serves as parameter information when a user performs a verification code identification test on a target webpage, or as reference information when the user performs data acquisition on the target webpage through a legally authenticated data acquisition program.
Take, as an example, an image to be identified that is a click-type verification code image encountered when an engineer performs an identification test on a target webpage: after the server matches the corresponding target candidate subgraph from the plurality of candidate subgraphs of the verification code image, it takes the center point coordinates of the target object in the target candidate subgraph as the output test click coordinates; that is, the identification test result of the verification code image is used to click the center point coordinates of the target object in the verification code image, thereby completing the click test of the verification code image.
Take, as another example, an image to be identified that is a login verification code encountered when a user performs data acquisition on a target webpage: after the server matches the corresponding target candidate subgraph from the plurality of candidate subgraphs of the verification code image, it takes the center point coordinates of the target object in the target candidate subgraph as the output login click coordinates; that is, the identification result of the verification code image is used to click the center point coordinates of the target object in the verification code image, thereby completing the click login through the verification code image.
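For illustration only, turning the matched target candidate subgraph into the click coordinate returned as the identification result might look like the following; the dictionary layout follows the earlier sketches and is an assumption.

```python
# Illustrative only: the identification result is the center point
# coordinate of the target object, to be clicked on the target webpage.
def identification_result(target_subgraph):
    cx, cy = target_subgraph["center"]   # center point coordinate
    return {"click_x": round(cx), "click_y": round(cy)}
```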
In the above image recognition process combining semantic analysis to serve data acquisition, the server first acquires an image to be identified; the image to be identified is a verification code image comprising an indication text expressing indication content in natural language form and a plurality of candidate subgraphs representing object spatial forms. The server then performs feature recognition on each candidate subgraph based on a preset image recognition model to obtain image feature data for each candidate subgraph, the image feature data comprising type data, azimuth data, area data, orientation data, and color data of the object displayed in the candidate subgraph; and performs semantic analysis on the indication text based on a preset text recognition model to obtain text semantic data for the indication text, the text semantic data characterizing an indication type of the indication content, the indication type being associated with at least one of the type data, azimuth data, area data, orientation data, and color data. Finally, the server logically matches the text semantic data with the image feature data of each candidate subgraph to determine, from the plurality of candidate subgraphs, a target candidate subgraph logically matching the indication content; the target candidate subgraph serves as reference service information when a user performs big data information collection. On the one hand, unlike the prior art, the method performs feature recognition on each candidate subgraph based on a preset image recognition model to obtain image feature data, performs semantic analysis on the indication text based on a preset text recognition model to obtain text semantic data, and then logically matches the text semantic data with the image feature data of each candidate subgraph to match the target candidate subgraph; this optimizes the image recognition processing flow, improves the efficiency of verification code image recognition, and reduces the resource occupancy and labor cost of recognizing verification code images. On the other hand, the target candidate subgraph logically matching the indication content is determined from the plurality of candidate subgraphs using the type data, azimuth data, area data, orientation data, and color data of the objects displayed in the candidate subgraphs together with the text semantic data characterizing the indication type of the indication content; the image recognition process thus effectively combines multiple spatial features and logical judgments such as semantic analysis, color recognition, geometric shape recognition, and azimuth judgment, which improves the flexibility and accuracy of the verification code image recognition scheme and allows the system to run its other application functions normally.
It will be appreciated by those skilled in the art that the methods of the above embodiments may be implemented in more specific ways. For example, the embodiment in which the server performs feature recognition on each candidate subgraph based on a preset image recognition model to obtain image feature data for each candidate subgraph is merely illustrative.
In an exemplary embodiment, in step S12, in which the server performs feature recognition on each candidate subgraph based on the preset image recognition model to obtain the type data, area data, and azimuth data for each candidate subgraph, the following may specifically be executed:
Step one: inputting the image to be identified into the spatial object shape recognition model to respectively identify the object type, center point coordinates, and vertex coordinates of the target object in each candidate subgraph.
The object type is the preset type of the target object, the center point coordinates are the center coordinates of the target object in the image to be identified, and the vertex coordinates are the edge vertex coordinates of the target object in the image to be identified.
In some embodiments, the object types of the target object include four major classes, with a plurality of minor classes under each major class. For example, the object types include the four major classes "number", "lowercase letter", "uppercase letter", and "geometry". The "number" type includes 9 subclasses corresponding to the digits 1 to 9, with classification names nb_1, nb_2, ..., nb_9; the "lowercase letter" type includes 26 subclasses corresponding to the English lowercase letters a-z, with classification names sl_a, sl_b, ..., sl_z; the "uppercase letter" type includes 26 subclasses corresponding to the English uppercase letters A-Z, with classification names sl_A, sl_B, ..., sl_Z; and the "geometry" type includes the 4 subclasses cylinder, sphere, cube, and cone, with classification names geo_cylinder, geo_sphere, geo_cube, and geo_cone.
Step two: determining type data of each target object based on the object type of each target object; and determining area data of each target object based on the vertex coordinates of each target object; and determining azimuth data of each target object based on a difference between coordinates of the center points of each target object.
In one embodiment, the server calculates the graphic area of a target object in the image from its vertex coordinates (e.g., four vertex coordinates), so that the area sizes of different target objects in the image can be compared pairwise.
In an embodiment, the server obtains the azimuth data of the target objects by comparing, using their center point coordinates, the positional differences of different target objects along the X axis/Y axis of the image. The azimuth data are used to characterize the positional differences of the target objects in the image to be identified, i.e., the positional relations of the different target objects along the X axis/Y axis of the image.
As an example, a positional difference may be a positional and directional relation between objects, such as object A being above object B, or object B being below and to the right of object C.
Specifically, the server determines the type data of each target object from its identified object type (e.g., for the object type "English uppercase letter A", the type data is the classification name sl_A); the server calculates the area data (i.e., the area size) of each target object from the vertex coordinates of its edges in the image to be identified; and the server calculates the azimuth data between the target objects (i.e., the pairwise positional and directional relations) from the center point coordinates of the identified target objects in the image to be identified.
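A short sketch of these computations follows; the shoelace formula is one standard way to obtain a polygon area from vertex coordinates, and rendering the azimuth as "above/below, left of/right of" is an illustrative choice.

```python
# Sketch: area from the edge vertex coordinates and pairwise azimuth from
# center point coordinates; the image Y axis is assumed to grow downward.
def polygon_area(vertices):
    n = len(vertices)
    s = sum(vertices[i][0] * vertices[(i + 1) % n][1]
            - vertices[(i + 1) % n][0] * vertices[i][1]
            for i in range(n))
    return abs(s) / 2                       # shoelace formula

def azimuth(center_a, center_b):
    dx = center_b[0] - center_a[0]
    dy = center_b[1] - center_a[1]
    horiz = "right of" if dx > 0 else "left of"
    vert = "below" if dy > 0 else "above"
    return f"B is {vert} and {horiz} A"     # positional and directional relation
```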
In an exemplary embodiment, in step S12, in which the server performs feature recognition on each candidate subgraph based on the preset image recognition model to obtain the orientation data for each candidate subgraph, the following may specifically be executed:
Step one: inputting the image to be identified into the spatial object angle recognition model to respectively identify the display angle of the target object in each candidate subgraph.
In an embodiment, the display angle is used to characterize the perspective difference value of a target object in three-dimensional space when displaying an image to be identified in a two-dimensional plane.
Step two: and determining the orientation data of each target object based on the display angle of each target object.
In an embodiment, the orientation data is used to characterize the orientation of the target object in three-dimensional space when the image to be identified is shown in a two-dimensional plane. The orientation comprises four types of front, back, left side and right side.
As an example, take the "dark numeral 4" in fig. 3. The viewing-angle difference between the "dark numeral 4" and the front-view plane in three-dimensional space is 45 degrees clockwise; that is, the front face of the "dark numeral 4" is rotated 45 degrees clockwise out of the front-view plane, so its display angle is 45 degrees toward the left side, and therefore the orientation of the "dark numeral 4" when the image to be identified is displayed in a two-dimensional plane is the left side.
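One way to map the recognized display angle to the four orientations is sketched below; the 90-degree bins are an assumption consistent with the 45-degree example above.

```python
# Assumed binning of the display angle into the four orientations.
def orientation_from_angle(display_angle):
    a = display_angle % 360
    if a < 45 or a >= 315:
        return "front"
    if a < 135:
        return "left side"   # clockwise rotation turns the face leftward
    if a < 225:
        return "back"
    return "right side"
```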
In an exemplary embodiment, in step S12, in which the server performs feature recognition on each candidate subgraph based on the preset image recognition model to obtain the color data for each candidate subgraph, the following may specifically be executed:
Step one: inputting the image to be identified into the color clusterer to respectively identify the cluster color of the target object in each candidate subgraph.
In one embodiment, the color clusterer is built from a large number of training images trained with the Kmeans clustering algorithm.
The training process may include: extracting a feature vector over the R, G, B dimensions from each training image; then performing feature clustering on the feature vectors through the Kmeans clustering algorithm to obtain a preset number of feature cluster sets, where the number of feature cluster sets may be 5, i.e., feature cluster sets for five main color categories; and finally extracting the cluster center feature of each feature cluster set as the reference feature vector of that set, and taking the color category corresponding to each feature cluster set as its center color.
Specifically, the color clusterer first extracts feature data on the R, G, B three-dimensional channels of a candidate subgraph to obtain a target feature vector over the R, G, B dimensions; then computes the difference between the target feature vector and each of the preset number of reference feature vectors; and finally takes the center color corresponding to the reference feature vector with the smallest difference as the cluster color of the candidate subgraph.
Step two: color data of each target object is determined based on the cluster color of each target object.
Specifically, the server directly uses the cluster colors of the candidate subgraphs as the object colors of the corresponding target objects to obtain corresponding color data.
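A minimal sketch of the color clusterer follows, using scikit-learn's KMeans over mean R, G, B feature vectors; the five clusters and the center color names are illustrative assumptions.

```python
# Hedged sketch: fit five color clusters on training images, then assign
# each candidate subgraph the center color of its nearest cluster.
import numpy as np
from sklearn.cluster import KMeans

def mean_rgb(image):
    """Mean feature vector over the R, G, B channels of an HxWx3 image."""
    return image.reshape(-1, 3).mean(axis=0)

def fit_color_clusterer(training_images, n_colors=5):
    features = np.array([mean_rgb(img) for img in training_images])
    return KMeans(n_clusters=n_colors, n_init=10).fit(features)

def cluster_color(clusterer, subgraph, center_colors):
    target = mean_rgb(subgraph)                 # target feature vector
    diffs = np.linalg.norm(clusterer.cluster_centers_ - target, axis=1)
    return center_colors[int(diffs.argmin())]   # nearest reference vector
```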
In an embodiment, before the server performs semantic analysis on the indication text based on a preset text recognition model, the indication text needs to be converted into a corresponding corpus vector list, so that similarity matching can be performed on the corpus vector list of the indication text through the corpus to obtain text semantic data for the indication text.
In an exemplary embodiment, referring to fig. 4, fig. 4 is a schematic flowchart of an embodiment of converting the indication text into corpus vectors in the present application. Specifically, the server converting the indication text into a corresponding corpus vector list includes the following steps:
Step a1: performing regular matching on the indication text to determine irrelevant words in the indication text.
The irrelevant words are words which are irrelevant to the target object in the image to be identified, such as "please", "click", "find", "input", and the like in the indication text.
Step a2: deleting the irrelevant words from the indication text to obtain a processed indication text.
Step a3: performing word segmentation on the processed indication text to obtain a word segmentation word list for the indication text.
As an example, an indication text is "please find the object with the same color as the largest cone in the figure". The server performs regular matching and irrelevant-word deletion on the indication text to obtain the processed indication text "object with the same color as the largest cone in the figure"; word segmentation is then performed on it to obtain the word segmentation word list (figure, with, largest, cone, color, same, object).
Step a4: based on a preset word bag dictionary, each word segmentation word in the word segmentation word list is sequentially converted into a corpus vector in a preset format, and a corpus vector list aiming at the indication text is obtained.
The word bag dictionary comprises a plurality of word-number combinations, and each word-number combination comprises a target word and a flag bit number bound to that target word.
As an example, the word-number combinations in the word bag dictionary include: (with, 0), (cone, 1), (largest, 2), (object, 3), (的, 4), (same, 5), (color, 6), (uppercase, 7), (letter, 8), (corresponding, 9), (green, 10), (", 11), ("6", 12), (you, 13), (number, 14), (facing, 15), (below, 16), (above, 17), (cylinder, 18). In each word-number combination, the former element is the target word and the latter is the flag bit number bound to that word.
The corpus vector in the preset format comprises the flag bit number of a segmented word and the frequency with which that word appears in the indication text.
As an example, for the word segmentation word list corresponding to "number '6' facing you", the corpus vector list in the preset format is [(4, 1), (11, 2), (12, 1), (13, 1), (14, 1), (15, 1)]. In (4, 1), the "4" identifies the word-number combination whose flag bit number is 4, and the "1" indicates that the corresponding word appears once in the word segmentation word list; in (11, 2), the "11" identifies the word-number combination whose flag bit number is 11, and the "2" indicates that the corresponding word appears twice. The other vectors are expressed in the same way and are not repeated here.
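The whole conversion of steps a1-a4 can be sketched as follows; jieba as the segmenter and the irrelevant-word pattern are assumptions for Chinese indication texts, and bag_of_words is a gensim Dictionary as in the earlier training sketch.

```python
# Hedged end-to-end sketch of steps a1-a4 for one indication text.
import re
import jieba

IRRELEVANT = re.compile("请|点击|找出|输入")  # "please", "click", "find", "input"

def indication_to_corpus_vectors(indication_text, bag_of_words):
    processed = IRRELEVANT.sub("", indication_text)          # steps a1-a2
    tokens = [w for w in jieba.cut(processed) if w.strip()]  # step a3
    # step a4: corpus vector list of (flag bit number, frequency) pairs
    return bag_of_words.doc2bow(tokens)
```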
On the one hand, unlike the prior art, the method performs feature recognition on each candidate subgraph based on a preset image recognition model to obtain image feature data, performs semantic analysis on the indication text based on a preset text recognition model to obtain text semantic data, and then logically matches the text semantic data with the image feature data of each candidate subgraph to match the target candidate subgraph; this optimizes the image recognition processing flow, improves the efficiency of verification code image recognition, and reduces the resource occupancy and labor cost of recognizing verification code images. On the other hand, the target candidate subgraph logically matching the indication content is determined from the plurality of candidate subgraphs using the type data, azimuth data, area data, orientation data, and color data of the objects displayed in the candidate subgraphs together with the text semantic data characterizing the indication type of the indication content; the image recognition process thus effectively combines multiple spatial features and logical judgments such as semantic analysis, color recognition, geometric shape recognition, and azimuth judgment, which improves the flexibility and accuracy of the verification code image recognition scheme and allows the system to run its other application functions normally.
It should be understood that, although the steps in the flowcharts of figs. 2-4 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be performed in other orders. Moreover, at least a portion of the steps in figs. 2-4 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be executed at different times, and which are not necessarily executed sequentially but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
It should be understood that the same or similar parts of the method embodiments described above in this specification may be referred to one another; each embodiment focuses on its differences from the other embodiments, and for related parts, reference may be made to the descriptions of the other method embodiments.
Fig. 5 is a block diagram of an image recognition device with semantic analysis for data acquisition according to an embodiment of the present application. Referring to fig. 5, the image recognition apparatus 10 for combining semantic analysis serving data collection includes: an image acquisition unit 11, a feature recognition unit 12, a semantic analysis unit 13, and a logical matching unit 14.
Wherein the image acquisition unit 11 is configured to acquire an image to be identified; the image to be identified is a verification code image; the image to be identified comprises an indication text expressing indication content in natural language form and a plurality of candidate subgraphs representing object spatial forms;
wherein the feature recognition unit 12 is configured to perform feature recognition on each candidate sub-graph based on a preset image recognition model, so as to obtain image feature data for each candidate sub-graph; the image characteristic data comprises type data, azimuth data, area data, orientation data and color data of an object displayed in the candidate subgraph;
wherein the semantic analysis unit 13 is configured to perform semantic analysis on the indication text based on a preset text recognition model to obtain text semantic data for the indication text; the text semantic data is used to characterize an indication type of the indication content, and the indication type is associated with at least one of the type data, the azimuth data, the area data, the orientation data, and the color data;
wherein the logic matching unit 14 is configured to perform logic matching of the text semantic data with the image feature data of each of the candidate subgraphs, so as to determine a target candidate subgraph logically matched with the indicated content from the plurality of candidate subgraphs;
wherein the target candidate subgraph serves as reference service information when the user performs big-data information collection.
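For illustration, the following is a minimal Python sketch of how the four units of fig. 5 could be wired together into one recognition pipeline. The class name, the split_captcha helper, and the semantics.matches interface are illustrative assumptions inferred from the unit descriptions, not the apparatus's actual implementation.

```python
# Minimal sketch of the four-unit pipeline in fig. 5; all names are
# illustrative assumptions, not the apparatus's actual interfaces.
class ImageRecognitionDevice:
    def __init__(self, image_model, text_model):
        self.image_model = image_model  # backend of the feature recognition unit 12
        self.text_model = text_model    # backend of the semantic analysis unit 13

    def recognize(self, captcha_image):
        # Image acquisition unit 11: split the verification code image into the
        # indication text and the candidate subgraphs (split_captcha is a
        # hypothetical helper).
        indication_text, subgraphs = split_captcha(captcha_image)
        # Feature recognition unit 12: image feature data per candidate subgraph.
        features = [self.image_model(sub) for sub in subgraphs]
        # Semantic analysis unit 13: text semantic data for the indication text.
        semantics = self.text_model(indication_text)
        # Logic matching unit 14: keep the subgraphs whose feature data
        # logically satisfy the indicated content.
        return [sub for sub, feat in zip(subgraphs, features)
                if semantics.matches(feat)]
```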
In some embodiments, the image recognition model comprises a spatial object shape recognition model pre-trained based on a Yolo target detection algorithm;
performing feature recognition on each candidate sub-graph based on a preset image recognition model to obtain image feature data for each candidate sub-graph, including:
inputting the image to be identified into the spatial object shape recognition model, so as to respectively identify, through the spatial object shape recognition model, the object type, the center point coordinates and the vertex coordinates of the target object in each candidate subgraph;
determining type data of each target object based on the object type of each target object; and
determining area data of each target object based on vertex coordinates of each target object; and
determining azimuth data of each target object based on differences between center point coordinates of each target object;
the azimuth data are used for representing the position difference of each target object in the image to be identified.
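As a concrete illustration of this embodiment, the sketch below derives type, area, and azimuth data from the detections of a YOLO-style model. It assumes the ultralytics API and a hypothetical weight file, and the left/right azimuth convention is likewise an assumption, since the embodiment only requires that azimuth data reflect positional differences between center point coordinates.

```python
# A minimal sketch, assuming an ultralytics-style YOLO model; the weight file
# and the left/right azimuth convention are illustrative assumptions.
from ultralytics import YOLO

model = YOLO("shape_recognizer.pt")  # hypothetical pre-trained shape model

def extract_shape_features(image_path: str) -> list[dict]:
    result = model(image_path)[0]
    features = []
    for box in result.boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()        # vertex coordinates
        features.append({
            "type": result.names[int(box.cls)],      # type data from the object class
            "area": (x2 - x1) * (y2 - y1),           # area data from the vertices
            "center": ((x1 + x2) / 2, (y1 + y2) / 2),  # center point coordinates
        })
    # Azimuth data: each object's position relative to the others, taken from
    # the difference between center point x-coordinates.
    for i, f in enumerate(features):
        f["azimuth"] = {
            g["type"]: "left" if f["center"][0] < g["center"][0] else "right"
            for j, g in enumerate(features) if j != i
        }
    return features
```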
In some embodiments, the image recognition model comprises a spatial object angle recognition model pre-trained based on a Yolo target detection algorithm;
performing feature recognition on each candidate sub-graph based on a preset image recognition model to obtain image feature data for each candidate sub-graph, including:
inputting the image to be identified into the spatial object angle recognition model, so as to respectively identify, through the spatial object angle recognition model, the display angles of the target objects in the candidate subgraphs; the display angle is used for characterizing the viewing-angle difference of a target object in three-dimensional space when the image to be identified is displayed in a two-dimensional plane;
determining orientation data of each target object based on the display angle of each target object;
wherein the orientation data is used for characterizing the orientation of a target object in three-dimensional space when the image to be identified is displayed in a two-dimensional plane.
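The mapping from display angle to orientation data can be as simple as binning the predicted angle. The bins and labels below are illustrative assumptions, since the embodiment only specifies that the angle model outputs a viewing-angle difference for each target object.

```python
# A minimal sketch: binning a display angle (in degrees) into orientation data.
# The four bins and their labels are illustrative assumptions.
def angle_to_orientation(display_angle: float) -> str:
    angle = display_angle % 360
    if angle < 45 or angle >= 315:
        return "facing front"
    if angle < 135:
        return "facing right"
    if angle < 225:
        return "facing back"
    return "facing left"

# E.g., an object rendered with a 160-degree viewing-angle difference
# would be recorded as oriented toward the back.
assert angle_to_orientation(160.0) == "facing back"
```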
In some embodiments, the image recognition model includes a color clusterer pre-trained based on a Kmeans clustering algorithm;
performing feature recognition on each candidate sub-graph based on a preset image recognition model to obtain image feature data for each candidate sub-graph, including:
inputting the image to be identified into the color clusterer, so as to respectively identify the cluster colors of the target objects in the candidate subgraphs through the color clusterer;
and determining color data of each target object based on the cluster color of each target object.
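A common way to realize such a color clusterer is to run K-means over a subgraph's pixels and map the dominant centroid to the nearest named color. The sketch below assumes scikit-learn, an RGB palette, and k = 3; all three are illustrative choices rather than the patent's fixed design.

```python
# A minimal sketch of a K-means color clusterer; the palette and k value
# are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

PALETTE = {"red": (220, 40, 40), "green": (40, 170, 70), "blue": (40, 90, 220),
           "yellow": (230, 210, 50), "white": (245, 245, 245), "black": (25, 25, 25)}

def cluster_color(subgraph_rgb: np.ndarray, k: int = 3) -> str:
    """subgraph_rgb: an H x W x 3 pixel array of one candidate subgraph."""
    pixels = subgraph_rgb.reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    # The centroid of the largest cluster is taken as the object's cluster color.
    dominant = km.cluster_centers_[np.bincount(km.labels_).argmax()]
    # Color data: the nearest named color by Euclidean distance in RGB space.
    return min(PALETTE,
               key=lambda name: np.linalg.norm(dominant - np.array(PALETTE[name])))
```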
In some embodiments, before the semantic analysis of the indication text based on the preset text recognition model, the method further includes:
performing regular-expression matching on the indication text to determine irrelevant words in the indication text;
deleting the irrelevant words in the indication text to obtain a processed indication text;
performing word segmentation processing on the processed indication text to obtain a word segmentation word list for the indication text;
based on a preset bag-of-words dictionary, sequentially converting each word segmentation word in the word segmentation word list into a corpus vector in a preset format to obtain a corpus vector list for the indication text;
wherein the bag-of-words dictionary comprises a plurality of word-number combinations, and each word-number combination comprises a target word and a flag-bit number bound to the target word's sense;
wherein the corpus vector in the preset format comprises the flag-bit numbers associated with the word segmentation words and the frequency with which each word segmentation word correspondingly appears in the indication text.
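For illustration, this preprocessing chain maps naturally onto common NLP tooling. The sketch below assumes a jieba word segmenter and a gensim bag-of-words dictionary, and the stop-word pattern is a hypothetical stand-in for whatever regular expressions a deployment actually uses.

```python
# A minimal sketch of the preprocessing chain: regular-expression removal of
# irrelevant words, word segmentation, and conversion into corpus vectors of
# (flag-bit number, frequency) pairs. jieba, gensim, and the stop-word
# pattern are illustrative assumptions.
import re
import jieba
from gensim import corpora

IRRELEVANT = re.compile(r"please|click|the\b|a\b|an\b", flags=re.IGNORECASE)

def to_corpus_vector(indication_text: str, bow_dictionary: corpora.Dictionary):
    cleaned = IRRELEVANT.sub("", indication_text)           # delete irrelevant words
    tokens = [t for t in jieba.lcut(cleaned) if t.strip()]  # word segmentation list
    # Each entry pairs the flag-bit number bound to a word with the frequency
    # of that word's occurrence in the indication text.
    return bow_dictionary.doc2bow(tokens)

# The bag-of-words dictionary itself would be built once from segmented
# reference indication texts:
#   bow_dictionary = corpora.Dictionary(segmented_reference_texts)
```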
In some embodiments, the text recognition model comprises a corpus pre-trained based on an Lsi semantic detection algorithm; the corpus comprises a plurality of reference corpus vector lists;
the semantic analysis of the indication text based on the preset text recognition model comprises the following steps:
performing similarity matching on the corpus vector list of the indication text through the corpus, so as to match, from the plurality of reference corpus vector lists, a target corpus vector list with the highest similarity to the corpus vector list;
and determining, based on the bag-of-words dictionary, the text semantics corresponding to each corpus vector in the target corpus vector list, so as to obtain text semantic data for the indication text.
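This matching step corresponds closely to gensim's LSI workflow. The sketch below assumes the bow_dictionary and a reference_corpus built by the preprocessing sketch above, and the topic count of 50 is an arbitrary illustrative choice.

```python
# A minimal sketch of LSI similarity matching, assuming gensim and the
# bow_dictionary / reference_corpus from the preprocessing sketch above.
from gensim import models, similarities

# Offline: train the LSI corpus from the reference corpus-vector lists.
lsi = models.LsiModel(reference_corpus, id2word=bow_dictionary, num_topics=50)
index = similarities.MatrixSimilarity(lsi[reference_corpus])

def best_matching_reference(query_vector) -> int:
    """Return the index of the reference corpus-vector list with the
    highest similarity to the query corpus-vector list."""
    sims = index[lsi[query_vector]]  # cosine similarity against every reference
    return int(sims.argmax())        # the target corpus vector list
```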
In some embodiments, after determining a target candidate sub-graph logically matching the indicated content from the plurality of candidate sub-graphs, the method further comprises:
generating a verification code recognition result for the image to be identified based on the center point coordinates of the corresponding target objects in the target candidate subgraph;
wherein the verification code recognition result is used as parameter information when a user performs a verification code recognition test on a target webpage, or as reference information when the user performs data acquisition on the target webpage through a legally authenticated data acquisition program.
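Concretely, the recognition result can be as small as a list of click coordinates derived from the matched subgraphs' center points. The schema below is an illustrative assumption about what a downstream, legally authenticated acquisition program might consume.

```python
# A minimal sketch: packaging the matched subgraphs' center point coordinates
# as a verification code recognition result. The result schema is an
# illustrative assumption.
def build_captcha_result(target_subgraphs: list[dict]) -> dict:
    return {"clicks": [{"x": round(sub["center"][0]), "y": round(sub["center"][1])}
                       for sub in target_subgraphs]}
```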
Fig. 6 is a block diagram of a computer device 20 provided in an embodiment of the present application. For example, the computer device 20 may be an electronic device, an electronic component, a server array, or the like. Referring to fig. 6, the computer device 20 includes a processor 21; the processor 21 may be a processor set including one or more processors. The computer device 20 also includes memory resources represented by a memory 22, on which a computer program, such as an application program, is stored. The computer program stored in the memory 22 may include one or more modules, each corresponding to a set of executable instructions. Furthermore, the processor 21 is configured to implement, when executing the computer program, the image recognition method combining semantic analysis and serving data acquisition as described above.
In some embodiments, the computer device 20 is an electronic device whose computing system may run one or more operating systems, including any of the operating systems discussed above as well as any commercially available server operating system. The computer device 20 may also run any of a variety of additional server applications and/or middle-tier applications, including HTTP (hypertext transfer protocol) servers, FTP (file transfer protocol) servers, CGI (common gateway interface) servers, super servers, database servers, and the like. Exemplary database servers include, but are not limited to, those commercially available from IBM (International Business Machines) and the like.
In some embodiments, the processor 21 generally controls the overall operation of the computer device 20, such as operations associated with display, data processing, data communication, and recording. The processor 21 may comprise one or more processor components to execute the computer program so as to perform all or part of the steps of the methods described above. Further, the processor component may include one or more modules that facilitate interaction between the processor component and other components. For example, the processor component may include a multimedia module to facilitate interaction between the multimedia component of the computer device 20 and the processor 21.
In some embodiments, the processor components in the processor 21 may also be referred to as CPUs (Central Processing Units). The processor component may be an electronic chip with signal processing capabilities. The processor may also be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor components may be collectively implemented by an integrated circuit chip.
In some embodiments, memory 22 is configured to store various types of data to support operations at computer device 20. Examples of such data include instructions, acquisition data, messages, pictures, video, and the like for any application or method operating on computer device 20. The memory 22 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, optical disk, or graphene memory.
In some embodiments, the memory 22 may be a memory stick, a TF card, or the like, and may store all information in the computer device 20, including input raw data, the computer program, intermediate running results, and final running results. In some embodiments, it stores and retrieves information based on the locations specified by the processor. With the memory 22, the computer device 20 has memory capabilities that ensure its normal operation. In some embodiments, the memory 22 of the computer device 20 may be divided by purpose into main memory (internal memory) and auxiliary memory (external memory). The external memory is usually a magnetic medium, an optical disk, or the like, and can store information for a long period of time. The internal memory refers to the storage component on the motherboard that holds the data and programs currently being executed; it is used only for temporary storage, and its contents are lost when the power is turned off.
In some embodiments, the computer device 20 may further comprise: a power supply assembly 23 configured to perform power management of the computer device 20, a wired or wireless network interface 24 configured to connect the computer device 20 to a network, and an input/output (I/O) interface 25. The computer device 20 may operate based on an operating system stored in the memory 22, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, or the like.
In some embodiments, power supply component 23 provides power to the various components of computer device 20. The power supply components 23 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the computer device 20.
In some embodiments, the wired or wireless network interface 24 is configured to facilitate communication between the computer device 20 and other devices, either wired or wireless. The computer device 20 may access a wireless network based on a communication standard, such as WiFi, an operator network (e.g., 2G, 3G, 4G, or 5G), or a combination thereof.
In some embodiments, the wired or wireless network interface 24 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the wired or wireless network interface 24 also includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In some embodiments, input output (I/O) interface 25 provides an interface between processor 21 and peripheral interface modules, which may be keyboards, click wheels, buttons, and the like. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
Fig. 7 is a block diagram of a computer-readable storage medium 30 provided in an embodiment of the present application. The computer-readable storage medium 30 has stored thereon a computer program 31, wherein the computer program 31, when executed by a processor, implements the image recognition method as described above that serves data acquisition in combination with semantic analysis.
The functional units in the various embodiments of the present application may be integrated into one unit; if implemented in the form of a software functional unit and sold or used as an independent product, the integrated unit may be stored in the computer-readable storage medium 30. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer-readable storage medium 30 includes several instructions in a computer program 31 for causing a computer device (which may be a personal computer, a system server, a network device, or the like), an electronic device (such as an MP3 or MP4 player, a smart terminal such as a mobile phone, a tablet computer, or a wearable device, or a desktop computer), or a processor to perform all or part of the steps of the methods of the embodiments of the present application.
Fig. 8 is a block diagram of a computer program product 40 provided by an embodiment of the present application. Included in the computer program product 40 are program instructions 41, the program instructions 41 being executable by a processor of the computer device 20 to implement the image recognition method as described above that serves data acquisition in combination with semantic analysis.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as the image recognition method combining semantic analysis, the image recognition apparatus 10 combining semantic analysis, the computer device 20, the computer-readable storage medium 30, or the computer program product 40. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product 40 embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart and/or block diagram illustrations of the image recognition method combining semantic analysis and serving data acquisition, the image recognition apparatus 10, the computer device 20, the computer-readable storage medium 30, and the computer program product 40 according to embodiments of the application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by the computer program product 40. These computer program products 40 may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the program instructions 41, executed via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program products 40 may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the program instructions 41 stored in the computer program product 40 produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These program instructions 41 may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the program instructions 41 which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that the descriptions of the above methods, apparatuses, electronic devices, computer-readable storage media, computer program products and the like according to the method embodiments may further include other implementations, and specific implementations may refer to descriptions of related method embodiments, which are not described herein in detail.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following its general principles and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method of image recognition in combination with semantic analysis for data acquisition, the method comprising:
acquiring an image to be identified; the image to be identified is an identifying code image; the image to be identified comprises an indication text for expressing indication content in a natural language form and a plurality of candidate subgraphs for representing the spatial form of the object;
performing feature recognition on each candidate sub-graph based on a preset image recognition model to obtain image feature data aiming at each candidate sub-graph; the image characteristic data comprises type data, azimuth data, area data, orientation data and color data of an object displayed in the candidate subgraph; and
carrying out semantic analysis on the indication text based on a preset text recognition model to obtain text semantic data for the indication text; the text semantic data is used to characterize an indication type of the indication content, and the indication type is associated with at least one of the type data, the azimuth data, the area data, the orientation data, and the color data;
logically matching the text semantic data with the image characteristic data of each candidate sub-image to determine a target candidate sub-image logically matched with the indication content from the plurality of candidate sub-images;
wherein the target candidate subgraph serves as reference service information when the user performs big-data information collection.
2. The method of claim 1, wherein the image recognition model comprises a spatial object shape recognition model pre-trained based on a Yolo target detection algorithm;
performing feature recognition on each candidate sub-graph based on a preset image recognition model to obtain image feature data for each candidate sub-graph, including:
inputting the image to be identified into the spatial object shape recognition model, so as to respectively identify, through the spatial object shape recognition model, the object type, the center point coordinates and the vertex coordinates of the target object in each candidate subgraph;
determining type data of each target object based on the object type of each target object; and
determining area data of each target object based on vertex coordinates of each target object; and
determining azimuth data of each target object based on differences between center point coordinates of each target object;
the azimuth data are used for representing the position difference of each target object in the image to be identified.
3. The method of claim 1, wherein the image recognition model comprises a spatial object angle recognition model pre-trained based on a Yolo target detection algorithm;
performing feature recognition on each candidate sub-graph based on a preset image recognition model to obtain image feature data for each candidate sub-graph, including:
inputting the image to be identified into the spatial object angle recognition model, so as to respectively identify, through the spatial object angle recognition model, the display angles of the target objects in the candidate subgraphs; the display angle is used for characterizing the viewing-angle difference of a target object in three-dimensional space when the image to be identified is displayed in a two-dimensional plane;
determining orientation data of each target object based on the display angle of each target object;
wherein the orientation data is used for characterizing the orientation of a target object in three-dimensional space when the image to be identified is displayed in a two-dimensional plane.
4. The method of claim 1, wherein the image recognition model comprises a color clusterer pre-trained based on a Kmeans clustering algorithm;
performing feature recognition on each candidate sub-graph based on a preset image recognition model to obtain image feature data for each candidate sub-graph, including:
inputting the image to be identified into the color clusterer, so as to respectively identify the cluster colors of the target objects in the candidate subgraphs through the color clusterer;
and determining color data of each target object based on the cluster color of each target object.
5. The method of claim 1, further comprising, prior to the semantically analyzing the indicated text based on the preset text recognition model:
performing regular-expression matching on the indication text to determine irrelevant words in the indication text;
deleting the irrelevant words in the indication text to obtain a processed indication text;
performing word segmentation processing on the processed indication text to obtain a word segmentation word list for the indication text;
based on a preset bag-of-words dictionary, sequentially converting each word segmentation word in the word segmentation word list into a corpus vector in a preset format to obtain a corpus vector list for the indication text;
wherein the bag-of-words dictionary comprises a plurality of word-number combinations, and each word-number combination comprises a target word and a flag-bit number bound to the target word's sense;
wherein the corpus vector in the preset format comprises the flag-bit numbers associated with the word segmentation words and the frequency with which each word segmentation word correspondingly appears in the indication text.
6. The method of claim 5, wherein the text recognition model comprises a corpus pre-trained based on an Lsi semantic detection algorithm; the corpus comprises a plurality of reference corpus vector lists;
the semantic analysis of the indication text based on the preset text recognition model comprises the following steps:
performing similarity matching on the corpus vector list of the indication text through the corpus, so as to match, from the plurality of reference corpus vector lists, a target corpus vector list with the highest similarity to the corpus vector list;
and determining, based on the bag-of-words dictionary, the text semantics corresponding to each corpus vector in the target corpus vector list, so as to obtain text semantic data for the indication text.
7. The method of claim 1, further comprising, after said determining a target candidate subgraph logically matching said indicated content from said plurality of candidate subgraphs:
generating a verification code recognition result for the image to be identified based on the center point coordinates of the corresponding target objects in the target candidate subgraph;
wherein the verification code recognition result is used as parameter information when a user performs a verification code recognition test on a target webpage, or as reference information when the user performs data acquisition on the target webpage through a legally authenticated data acquisition program.
8. An image recognition device for data acquisition in combination with semantic analysis, the device comprising:
an image acquisition unit configured to perform acquisition of an image to be recognized; the image to be identified is an identifying code image; the image to be identified comprises an indication text for expressing indication content in a natural language form and a plurality of candidate subgraphs for representing the spatial form of the object;
The feature recognition unit is configured to perform feature recognition on each candidate sub-graph based on a preset image recognition model to obtain image feature data for each candidate sub-graph; the image characteristic data comprises type data, azimuth data, area data, orientation data and color data of an object displayed in the candidate subgraph;
the semantic analysis unit is configured to perform semantic analysis on the indication text based on a preset text recognition model to obtain text semantic data for the indication text; the text semantic data is used to characterize an indication type of the indication content, and the indication type is associated with at least one of the type data, the azimuth data, the area data, the orientation data, and the color data;
a logic matching unit configured to perform logic matching of the text semantic data with the image feature data of each of the candidate subgraphs, so as to determine a target candidate subgraph logically matched with the indicated content from the plurality of candidate subgraphs;
wherein the target candidate subgraph serves as reference service information when the user performs big-data information collection.
9. A computer device, comprising:
a processor;
a memory for storing executable instructions of the processor;
wherein the processor is configured to execute the executable instructions to implement the method of any one of claims 1 to 7.
10. A computer-readable storage medium comprising a computer program which, when executed by a processor of a computer device, enables the computer device to perform the method of any one of claims 1 to 7.
CN202311640501.1A 2023-12-04 2023-12-04 Image recognition method and device combining semantic analysis and serving data acquisition Pending CN117690117A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311640501.1A CN117690117A (en) 2023-12-04 2023-12-04 Image recognition method and device combining semantic analysis and serving data acquisition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311640501.1A CN117690117A (en) 2023-12-04 2023-12-04 Image recognition method and device combining semantic analysis and serving data acquisition

Publications (1)

Publication Number Publication Date
CN117690117A true CN117690117A (en) 2024-03-12

Family

ID=90136404

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311640501.1A Pending CN117690117A (en) 2023-12-04 2023-12-04 Image recognition method and device combining semantic analysis and serving data acquisition

Country Status (1)

Country Link
CN (1) CN117690117A (en)

Similar Documents

Publication Publication Date Title
CN109961009B (en) Pedestrian detection method, system, device and storage medium based on deep learning
US9349076B1 (en) Template-based target object detection in an image
CN109284729B (en) Method, device and medium for acquiring face recognition model training data based on video
EP3028184B1 (en) Method and system for searching images
CN112052186B (en) Target detection method, device, equipment and storage medium
US10540378B1 (en) Visual search suggestions
US10380461B1 (en) Object recognition
CN108664364B (en) Terminal testing method and device
US20120083294A1 (en) Integrated image detection and contextual commands
KR101835333B1 (en) Method for providing face recognition service in order to find out aging point
CN102999635A (en) Semantic visual search engine
CN105302849A (en) Annotation display assistance device and method of assisting annotation display
US8417038B2 (en) Image processing apparatus, processing method therefor, and non-transitory computer-readable storage medium
CN115935344A (en) Abnormal equipment identification method and device and electronic equipment
Lahiani et al. Hand pose estimation system based on Viola-Jones algorithm for android devices
US20160203222A1 (en) Search method, search system, and search engine
CN113869063A (en) Data recommendation method and device, electronic equipment and storage medium
CN110363206B (en) Clustering of data objects, data processing and data identification method
CN113657087A (en) Information matching method and device
Lei et al. A new clothing image retrieval algorithm based on sketch component segmentation in mobile visual sensors
CN111552829A (en) Method and apparatus for analyzing image material
CN110472121A (en) Card information searching method, device, electronic equipment and computer readable storage medium
CN117690117A (en) Image recognition method and device combining semantic analysis and serving data acquisition
Jiao et al. Deep combining of local phase quantization and histogram of oriented gradients for indoor positioning based on smartphone camera
KR20150101846A (en) Image classification service system based on a sketch user equipment, service equipment, service method based on sketch and computer readable medium having computer program recorded therefor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination