CN111368838A - Method and device for identifying reported screenshot

Method and device for identifying reported screenshot

Info

Publication number
CN111368838A
Authority
CN
China
Prior art keywords
image
preset
determining
screenshot
chat area
Prior art date
Legal status
Pending
Application number
CN201811605039.0A
Other languages
Chinese (zh)
Inventor
余建兴
李政龙
余敏雄
余赢超
王焜
冯毅
Current Assignee
Zhuhai Kingsoft Online Game Technology Co Ltd
Original Assignee
Zhuhai Kingsoft Online Game Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhuhai Kingsoft Online Game Technology Co Ltd
Priority to CN201811605039.0A
Publication of CN111368838A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10: Character recognition
    • G06V 30/14: Image acquisition
    • G06V 30/148: Segmentation of character regions
    • G06V 30/153: Segmentation of character regions using recognition of characters or words
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10: Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application provides a method and a device for identifying a report screenshot, relating to the technical field of network security. The method comprises the following steps: obtaining a report screenshot to be recognized; determining the image in the chat frame in the report screenshot according to a preset chat area recognition algorithm to obtain a chat area image; recognizing text information in the chat area image; and inputting the text information into a pre-trained recognition model and outputting a recognition result corresponding to the report screenshot. By adopting the method and the device, the efficiency of identifying malicious information can be improved.

Description

Method and device for identifying reported screenshot
Technical Field
The application relates to the technical field of network security, in particular to a method and a device for identifying a reported screenshot.
Background
The chat system of a game is an open platform on which users can speak freely. However, lawbreakers may use the chat system to issue malicious information (such as advertisements that induce consumption, or fraud), compromising the interests of normal players. To protect the interests of normal users, chat systems provide a reporting mechanism.
When a user finds malicious information in the chat area, the user can take a screenshot of the chat area to obtain a report screenshot and then click a report option; the user terminal then sends a report request to the server, where the report request includes the report screenshot captured by the user. The server displays the report screenshot through a display component. Customer service staff review the report screenshot and judge whether malicious information really exists in it, so as to perform security management (for example, banning the account that issued the malicious information).
However, judging whether malicious information exists by manually reviewing pictures is inefficient, and when the number of report requests is large, this approach cannot meet actual demand.
Disclosure of Invention
The embodiment of the application aims to provide a method and a device for identifying a report screenshot so as to improve the efficiency of identifying malicious information. The specific technical scheme is as follows:
in a first aspect, a method for identifying a report screenshot is provided, where the method includes:
acquiring a report screenshot to be identified;
determining an image in a chat frame in the report screenshot according to a preset chat area recognition algorithm to obtain a chat area image;
identifying text information in the chat area image;
and inputting the text information into a pre-trained recognition model, and outputting a recognition result corresponding to the report screenshot.
Optionally, the determining, according to a preset chat area recognition algorithm, an image in a chat frame in the report screenshot to obtain a chat area image includes:
detecting edges of all elements in the report screenshot according to a preset edge detection algorithm;
extracting candidate contours meeting preset contour conditions according to a preset contour extraction algorithm and the detected edges of all elements;
and respectively determining sub-images contained in each candidate contour, and determining the sub-images meeting the preset similarity condition with a preset image to obtain the chat area image.
Optionally, the detecting, according to a preset edge detection algorithm, the edges of the elements contained in the report screenshot includes:
performing Gaussian blur preprocessing on the reported screenshot to obtain a processed image;
carrying out grayscale processing on the processed image to obtain a grayscale image corresponding to the processed image;
and carrying out a convolution transformation on the grayscale image, and determining the edges of the elements contained in the report screenshot.
Optionally, the extracting, according to a preset contour extraction algorithm and the detected edges of each element, a candidate contour that meets a preset contour condition includes:
carrying out binarization on the convolution-transformed image to obtain a binary image;
performing closing operation on the binary image;
determining a rectangular connected domain formed by pixel points with high pixel values in the image after the closing operation;
and determining, among the rectangular connected domains, the connected domain meeting a preset aspect ratio to obtain the candidate contour.
Optionally, the determining the sub-image meeting the preset similarity condition with the preset image to obtain the chat area image includes:
respectively calculating the color feature vector of each sub-image;
matching the color feature vector of each sub-image with the color feature vector of a preset image;
and determining the sub-image whose matching degree is greater than a preset threshold, to obtain the chat area image.
Optionally, the identifying text information in the image of the chat area includes:
determining character images each containing a single character in the chat area image through a preset character segmentation algorithm;
recognizing the characters contained in the character images according to a preset character recognition model;
and composing the recognized characters into sentences according to the arrangement order of the character images in the chat area image, to obtain the text information in the chat area image.
Optionally, the inputting the text information into a pre-trained recognition model, and outputting a recognition result corresponding to the report screenshot includes:
generating a feature vector of the text information through a word vector algorithm;
and inputting the feature vector into a pre-trained recognition model, and outputting the recognition result corresponding to the report screenshot.
In a second aspect, an apparatus for identifying a report screenshot is provided, the apparatus comprising:
the acquisition module is used for acquiring the report screenshot to be identified;
the determining module is used for determining the image in the chat frame in the report screenshot according to a preset chat area recognition algorithm to obtain a chat area image;
the identification module is used for identifying text information in the chat area image;
and the output module is used for inputting the text information into a pre-trained recognition model and outputting a recognition result corresponding to the report screenshot.
Optionally, the determining module is specifically configured to:
detecting edges of all elements in the report screenshot according to a preset edge detection algorithm;
extracting candidate contours meeting preset contour conditions according to a preset contour extraction algorithm and the detected edges of all elements;
and respectively determining sub-images contained in each candidate contour, and determining the sub-images meeting the preset similarity condition with a preset image to obtain the chat area image.
Optionally, the determining module is specifically configured to:
performing Gaussian blur preprocessing on the reported screenshot to obtain a processed image;
carrying out grayscale processing on the processed image to obtain a grayscale image corresponding to the processed image;
and carrying out a convolution transformation on the grayscale image, and determining the edges of the elements contained in the report screenshot.
Optionally, the determining module is specifically configured to:
carrying out binarization on the convolution-transformed image to obtain a binary image;
performing closing operation on the binary image;
determining a rectangular connected domain formed by pixel points with high pixel values in the image after the closing operation;
and determining, among the rectangular connected domains, the connected domain meeting a preset aspect ratio to obtain the candidate contour.
Optionally, the determining module is specifically configured to:
respectively calculating the color feature vector of each sub-image;
matching the color feature vector of each sub-image with the color feature vector of a preset image;
and determining the sub-image whose matching degree is greater than a preset threshold, to obtain the chat area image.
Optionally, the identification module is specifically configured to:
determining character images each containing a single character in the chat area image through a preset character segmentation algorithm;
recognizing the characters contained in the character images according to a preset character recognition model;
and composing the recognized characters into sentences according to the arrangement order of the character images in the chat area image, to obtain the text information in the chat area image.
Optionally, the output module is specifically configured to:
generating a feature vector of the text information through a word vector algorithm;
and inputting the feature vector into a pre-trained recognition model, and outputting the recognition result corresponding to the report screenshot.
In a third aspect, an electronic device is provided, comprising a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of the first aspect when executing the program stored in the memory.
In a fourth aspect, a computer-readable storage medium is provided, having stored thereon a computer program which, when being executed by a processor, carries out the method steps of the first aspect.
In a fifth aspect, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the above methods of identifying a reporting screenshot.
According to the method and the device for identifying a report screenshot provided by the embodiments of the application, after the server obtains the report screenshot to be identified, it can determine the image in the chat frame in the report screenshot according to a preset chat area identification algorithm to obtain a chat area image, identify the text information in the chat area image, input the text information into a pre-trained identification model, and output the identification result corresponding to the report screenshot, so as to judge whether the report is valid or a false report. Therefore, manual review of report screenshots is not needed, the efficiency of identifying malicious information is improved, and actual demand can be met.
Of course, not all advantages described above need to be achieved at the same time in the practice of any one product or method of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flow of an identification method for reporting a screenshot according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a report screenshot provided in an embodiment of the present invention;
fig. 3 is a flowchart of a method for determining an image of a chat area according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a gray scale image according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a convolution-transformed image according to an embodiment of the present invention;
fig. 6 is a schematic diagram of a binary image according to an embodiment of the present invention;
fig. 7 is a schematic diagram of an image of a chat area according to an embodiment of the present invention;
fig. 8 is a flowchart of a method for identifying text information in an image of a chat area according to an embodiment of the present invention;
fig. 9 is a schematic diagram of a text area image according to an embodiment of the present invention;
fig. 10 is a flowchart of a method for determining an identification result corresponding to a report screenshot according to an embodiment of the present invention;
FIG. 11 is a flowchart of a training method for identifying a model according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of an identification apparatus for reporting a screenshot according to an embodiment of the present invention;
fig. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
The embodiment of the application provides a method for identifying a report screenshot, applied to an electronic device. The electronic device may be a device with data processing and computing capabilities; for example, it may be a server, such as a backend server of an instant messaging application. In the embodiments of the present application, a server is taken as an example for description; other cases are similar. After the server obtains the report screenshot to be identified, it can determine the image in the chat frame in the report screenshot according to a preset chat area identification algorithm to obtain a chat area image, then identify the text information in the chat area image, input the text information into a pre-trained identification model, and output the identification result corresponding to the report screenshot, so as to judge whether the report is valid or a false report. Therefore, manual review of report screenshots is not needed, the efficiency of identifying malicious information is improved, and actual demand can be met.
As shown in fig. 1, a flow diagram of an identification method for reporting a screenshot provided in an embodiment of the present application includes the following steps:
and step 110, acquiring a report screenshot to be identified.
In implementation, when a user finds malicious information in the chat area, the user can take a screenshot of the chat area and then click a report option; the user terminal then sends a report request to the server, where the report request includes the picture captured by the user (namely, the report screenshot). After receiving the report request, the server may parse the report request to obtain the report screenshot it contains. Fig. 2 is a schematic diagram of a report screenshot.
And step 120, determining images in the chat frame in the report screenshot according to a preset chat area recognition algorithm to obtain chat area images.
In implementation, the server may store a chat area recognition algorithm in advance; according to this algorithm, the server may determine the chat frame in the report screenshot and then obtain the image inside the chat frame, namely the chat area image. In one possible implementation, the server may use prior features of the chat box to clean background noise and coarsely locate the reported text content region. These prior features include: the chat frame contains many characters, the character spacing is consistent, the chat outline is square, and the aspect ratio is consistent. In general terms, the server may first detect the edges of the various elements in the image, which may include, for example, text, contours, or the edges of some background image. Then, the connected regions formed by the edges are calculated, and the rectangular outlines of the various elements in the image are determined from these connected regions.
Considering that the chat area is a regular square area containing many characters, and that its background tone is consistent and clearly different from the tones of other areas, square outlines can be selected from the connected regions as candidate contours, and the chat frame can then be screened out from the candidate contours to obtain the chat area image.
Optionally, as shown in fig. 3, the specific processing procedure of step 120 may include the following steps:
and step 121, detecting the edges of the elements in the report screenshot according to a preset edge detection algorithm.
In implementation, the server may store an edge detection algorithm in advance, and the edge detection algorithm may adopt any algorithm or processing method having an edge recognition function in the prior art. The server can detect the edges of all elements in the report screenshot according to an edge detection algorithm.
Optionally, the process of detecting the edges of the elements contained in the report screenshot according to a preset edge detection algorithm may specifically include the following steps:
step one, performing Gaussian blur preprocessing on the reported screenshot to obtain a processed image.
In implementation, the server may perform Gaussian blur preprocessing on the report screenshot to obtain the preprocessed image (i.e., the processed image). Gaussian blur preprocessing smooths away image detail and removes noise, so that edges can be detected more accurately in the subsequent steps.
In the embodiment of the present application, color-based Gaussian blur processing may be adopted; edge points in an image are more easily detected with color-based Gaussian blur than with Gaussian blur applied to a grayscale image.
The color-based Gaussian blur process may be as follows: for each pixel in the report screenshot, the server transforms the pixel value P(i, j) of the pixel in each color channel to obtain a new pixel value P'(i, j) in that channel, where P'(i, j) is computed by aggregating, through the blur weight matrix W, the pixel values of the pixel and its eight surrounding neighbours. The transformation formula is shown in formula 1-1.
P'(i, j) = \sum_{m=-1}^{1} \sum_{n=-1}^{1} W(m+1, n+1) \, P(i+m, j+n)    (1-1)
The blur weight matrix W may be a weight matrix with a radius of 1, as shown in formula 1-2.
W = \frac{1}{16} \begin{pmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{pmatrix}    (1-2)
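A minimal numpy sketch of this color-based blur, assuming the radius-1 Gaussian kernel shown in formula 1-2 and edge padding at the image border:

```python
import numpy as np

# Blur weight matrix of formula 1-2 (assumed standard radius-1 Gaussian kernel).
W = np.array([[1, 2, 1],
              [2, 4, 2],
              [1, 2, 1]], dtype=np.float64) / 16.0

def gaussian_blur(img: np.ndarray) -> np.ndarray:
    """Color-based Gaussian blur per formula 1-1; img is an H x W x 3 array."""
    h, w, _ = img.shape
    padded = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for m in range(3):          # aggregate each pixel with its eight neighbours,
        for n in range(3):      # weighted by W, independently per color channel
            out += W[m, n] * padded[m:m + h, n:n + w, :]
    return out.astype(np.uint8)
```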
And step two, carrying out gray level processing on the processed image to obtain a gray level image corresponding to the processed image.
In implementation, after obtaining the processed image, the server may perform grayscale processing on it to obtain a corresponding grayscale image (referred to as the first grayscale image for convenience of distinction). In this way, the color information of the processed image is removed, reducing the amount of computation for edge detection. The first grayscale image reflects the distribution and characteristics of the overall and local chromaticity and luminance levels of the image. The grayscale transformation is shown in formula 1-3.
Gray(i, j) = 0.11 \cdot B(i, j) + 0.59 \cdot G(i, j) + 0.3 \cdot R(i, j)    (1-3)
Here, Gray(i, j) is the gray value of pixel (i, j) after grayscale processing; B(i, j), G(i, j) and R(i, j) are the blue, green and red RGB component values of pixel (i, j), respectively.
The first grayscale image obtained after grayscale processing may be as shown in fig. 4.
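A small numpy sketch of the grayscale transformation of formula 1-3, assuming the processed image is stored in BGR channel order:

```python
import numpy as np

def to_gray(processed: np.ndarray) -> np.ndarray:
    """Weighted grayscale transformation of formula 1-3 (BGR order assumed)."""
    b, g, r = processed[..., 0], processed[..., 1], processed[..., 2]
    return (0.11 * b + 0.59 * g + 0.3 * r).astype(np.uint8)
```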
And step three, carrying out a convolution transformation on the grayscale image, and determining the edges of the elements contained in the report screenshot.
In implementation, the places where the gray value changes drastically in the first grayscale image are usually the edges of elements, so the gray value of an edge differs significantly from that of its neighborhood. Therefore, after obtaining the first grayscale image, the server can perform a convolution transformation on it to determine the edges of the elements contained in the report screenshot. The specific process may be as follows: the server performs a convolution transformation on the first grayscale image to obtain a new gray value for each pixel, and then applies a threshold operation to the new gray values to determine the edge information.
Convolution is a weighted-sum method that accounts for the influence of surrounding pixel values and their distance. Generally, the farther apart two pixels are, the smaller the influence and the smaller the weight; for example, an edge pixel differs greatly from, and is strongly correlated with, its neighboring pixels, but not with pixels in distant regions.
Specifically, given a grayscale image A, a first-order gradient approximation of the luminance function of A is computed by convolution to obtain G_x and G_y, where G_x and G_y represent the images detected by the horizontal and vertical edge operators, respectively; the specific transformation formulas are given in formulas 1-4 and 1-5.
G_x = \begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix} * A    (1-4)
G_y = \begin{pmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{pmatrix} * A    (1-5)
Optionally, the image obtained after the convolution transformation of the first grayscale image may be as shown in fig. 5. As can be seen from fig. 5, the chat frame and the characters it contains are clearly outlined, and the distinguishing effect is significant.
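Continuing the sketches above, the convolution step can be written with OpenCV's Sobel operator, on the assumption that formulas 1-4 and 1-5 are the standard Sobel kernels; img is the report screenshot loaded as an H x W x 3 array:

```python
import cv2
import numpy as np

gray = to_gray(gaussian_blur(img))               # first grayscale image
gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)  # horizontal gradient, formula 1-4
gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)  # vertical gradient, formula 1-5
edges = cv2.convertScaleAbs(np.sqrt(gx ** 2 + gy ** 2))  # combined edge magnitude
```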
And step 122, extracting candidate contours meeting preset contour conditions according to a preset contour extraction algorithm and the detected edges of the elements.
In implementation, after the server determines the edges of the elements contained in the report screenshot, it may extract contours according to a preset contour extraction algorithm and the detected edges, obtaining a number of contours, and then extract from them the candidate contours satisfying a preset contour condition. The preset contour condition can be set by technicians according to actual requirements and is used to find contours that match the shape of the chat frame. For example, if the chat frame is rectangular, rectangular contours can be selected from the extracted contours; if the chat frame is circular, circular contours can be selected. The specific algorithm may be any prior-art algorithm capable of determining such shapes, and the embodiment of the present application is not limited thereto.
Optionally, the processing procedure of extracting the candidate contour meeting the preset contour condition according to the preset contour extraction algorithm and the detected edges of each element may specifically include the following steps:
and step one, carrying out binarization on the image after the convolution change to obtain a binary image.
In implementation, considering that the convolution-transformed image (a grayscale image, called the second grayscale image for convenience of distinction) contains relatively complex background elements, the second grayscale image may first be converted into a binary image in order to identify contours accurately and efficiently, removing the light and dark intensity information of the second grayscale image. After binarization, the binary image contains only foreground information and background information: the foreground is white (i.e., foreground pixels have a high pixel value, such as 1) and the background is black (i.e., background pixels have a low pixel value, such as 0). In this way, the amount of computation in the subsequent steps is significantly reduced, and the robustness of the contour extraction algorithm is improved.
In the second grayscale image, the value of each pixel is a number between 0 and 255, representing its darkness. In the binary image, each pixel takes only one of the two values {0, 1}, where 0 represents black and 1 represents white.
The formula for the binarization transform is given in formula 1-6.
Bi(x, y) = \begin{cases} 1, & Gray(x, y) \ge T \\ 0, & Gray(x, y) < T \end{cases}    (1-6)
Here, Bi(x, y) is the value of pixel (x, y) after binarization; Gray(x, y) is the gray value of the pixel; T is a threshold, e.g., T may be 128.
And step two, performing closing operation on the binary image.
In implementation, after obtaining the binary image, the server can use image closing operations, i.e., dilation and erosion, to connect the edges of small character blocks that are close to each other into a connected domain without protrusions, namely an approximate outline of the chat frame. Specifically, the server may perform a dilation operation on the binary image to merge the edges of adjacent but separate character image blocks, and then perform an erosion operation to flatten the edges and protrusions of the connected domain, obtaining a connected domain without protrusions.
Optionally, the binary image after the closing operation may be as shown in fig. 6.
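Continuing the sketch, the binarization of formula 1-6 and the closing operation (dilation, then erosion) can be written as follows; the threshold T = 128 from the example above and the 5x5 structuring element are illustrative choices:

```python
import cv2

T = 128                                                     # illustrative threshold
_, binary = cv2.threshold(edges, T, 1, cv2.THRESH_BINARY)   # Bi(x, y) of formula 1-6
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))  # illustrative element
dilated = cv2.dilate(binary, kernel)  # merge edges of adjacent character blocks
closed = cv2.erode(dilated, kernel)   # flatten protrusions on the connected domain
```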
And step three, determining a rectangular connected domain formed by the pixel points with high pixel values in the image after the closing operation.
In implementation, the server may extract the candidate contours using the prior features that the chat box region contains many characters and that its contour is a regular square. Specifically, the server may determine the connected domains composed of pixels with high pixel values in the image after the closing operation, and then, for each connected domain, determine its circumscribed rectangle to obtain a rectangular connected domain.
And step four, determining, among the rectangular connected domains, the connected domain meeting the preset aspect ratio, to obtain the candidate contour.
In implementation, the server may determine, among the rectangular connected domains, the connected domains satisfying a preset aspect ratio, obtaining the candidate contours. The preset aspect ratio is the aspect ratio of the standard chat frame; in this way, candidate contours matching the aspect ratio of the standard chat box are screened out. The server may also record the four vertex coordinates corresponding to the minimum bounding rectangle of each candidate contour, which may be the pixel positions of the vertices in the binary image.
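A sketch of steps three and four, assuming the OpenCV 4 findContours return signature and illustrative aspect-ratio bounds standing in for the standard chat frame's ratio:

```python
import cv2

contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
candidates = []
for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)   # circumscribed rectangle of the domain
    if h > 0 and 1.2 <= w / h <= 2.0:    # preset aspect-ratio condition (assumed bounds)
        candidates.append((x, y, w, h))  # records the rectangle's vertex coordinates
```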
And step 123, respectively determining sub-images contained in each candidate contour in the candidate contours, and determining the sub-images meeting the preset similarity condition with the preset image to obtain the chat area image.
In implementation, after the server determines the candidate contours, for each candidate contour it may crop, according to the vertex coordinates corresponding to that contour, the image in the rectangular area formed by those coordinates from the report screenshot, obtaining the sub-image corresponding to the candidate contour. Then, the sub-image satisfying a preset similarity condition with a preset image can be determined among the cropped sub-images, to obtain the chat area image.
Optionally, the determining the sub-images satisfying the preset similarity condition with the preset image to obtain the chat area image may specifically include the following steps:
step one, respectively calculating the color characteristic vector of each sub-image.
In implementation, the server may use prior features of the chat frame to distinguish the chat area among the multiple candidate contours. These prior features include: the chat box has a consistent color tone, which differs greatly from the tones of other areas. Specifically, the server may calculate the color feature vector of each sub-image according to a preset color feature vector algorithm.
Optionally, in this embodiment of the application, HSV (Hue, Saturation, Value) may be used as the color space: the conventional RGB color space is converted into the visually balanced HSV color space, each HSV cell is used as a vector dimension, and the number of pixels whose color falls in each HSV cell is used as the value of that dimension. HSV is a color space created by A. R. Smith based on the intuitive characteristics of color, namely Hue, Saturation, and Value, and is used to measure the intuitive perception of color, such as which color it is and how dark or bright it is.
And step two, matching the color characteristic vector of each sub-image with the color characteristic vector of the preset image.
The preset image is a preset chat area image, namely a color image contained in the chat frame.
In implementation, the server calculates and stores the color feature vector of the preset image according to the color feature vector algorithm. The server may then match the color feature vector of each sub-image against the color feature vector of the preset image; for example, the degree of fit between the two vectors may be calculated. The calculation of the degree of fit is prior art and is not described in detail here.
And step three, determining the sub-images with the matching degree larger than a preset threshold value to obtain the chat area images.
In implementation, after calculating the matching degree (e.g., the degree of fit) of each sub-image, the server may judge whether the matching degree is greater than a preset threshold; if so, the sub-image is determined to be the chat area image, and if not, it is not. For example, the cosine similarity of the vectors may be taken as the degree of fit, with a preset threshold of 0.87.
Optionally, the identified chat area image may be as shown in fig. 7.
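A sketch of the color-feature matching described above, with illustrative HSV bin counts and the cosine-similarity threshold of 0.87; sub_images (cropped from the candidate contours) and preset_image are assumed to be available:

```python
import cv2
import numpy as np

def hsv_feature(img_bgr: np.ndarray) -> np.ndarray:
    """One dimension per HSV cell; value = number of pixels falling in that cell."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, [8, 4, 4],
                        [0, 180, 0, 256, 0, 256])
    return hist.flatten()

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

preset_vec = hsv_feature(preset_image)   # color feature vector of the preset image
chat_areas = [s for s in sub_images if cosine(hsv_feature(s), preset_vec) > 0.87]
```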
Step 130, identifying text information in the chat area image.
In implementation, the server may also have a text recognition algorithm pre-stored therein. After the server identifies the chat area image, the text information in the chat area image can be identified through the text identification algorithm. The text recognition algorithm may be a recognition algorithm in the prior art, and the embodiment of the present application is not limited.
Optionally, an embodiment of the present application provides a processing procedure for recognizing text information in an image of a chat area, as shown in fig. 8, specifically, the processing procedure may include the following steps:
and 131, determining a character image containing a single character in the chatting area image through a preset character segmentation algorithm.
In implementation, the server may store a character segmentation algorithm in advance, and the server may determine a character image containing a single character in the chat area image through the character segmentation algorithm. A specific process may include the following steps.
Step one, preprocessing the chat area image.
In implementation, the server may perform grayscale processing on the chat area image to reduce noise and prevent interference information in the color image from affecting character segmentation and judgment; this processing is similar to that in step 121 and is not repeated here.
Then, the server can binarize the chat area image to divide the chat-frame background and the fonts into black and white. Specifically, the server may traverse the image pixels, count the number of occurrences of each gray value, and project the gray values and their counts onto a histogram. Using the prior knowledge that there is a large contrast between the chat-frame background and the chat characters, two obvious peaks form on the gray-level histogram; the gray value at the lowest valley between the two peaks is taken as the binarization threshold.
And step two, the server performs character string segmentation.
In implementation, considering that the characters in the chat frame are standard characters, that line spacing and character spacing are approximately equal, and that characters rarely stick together, these prior features can be used to segment text lines by horizontal projection, finding the upper and lower bounds of each line and thus segmenting the character strings. Specifically, the server may traverse the pixels row by row and count the number of white pixels in each row, then project the counts onto a histogram whose abscissa is the image row coordinate and whose ordinate is the number of white pixels counted in that row. In the histogram, peaks alternate with valleys, which mark the upper and lower starting and ending bounds of the characters. The intersection points of a peak with the valleys on its two sides are taken as candidate starting points for line segmentation, and the peak region is a candidate region containing a line of characters. The server screens the character string regions out of the candidates using two rules: first, the distance between two peaks should satisfy the shortest and longest line spacing set for the game chat frame; second, the amplitude of a peak should satisfy a minimum peak value.
Thus, the server can obtain a character image including only the character string.
And step three, the server performs single-character segmentation on the character images containing only character strings.
In implementation, the server may vertically project the text image, find the left and right boundaries of each character, and cut single characters out of the character string. Specifically, the server may traverse the pixels of the text image column by column from top to bottom and count the number of white pixels, then project the counts onto a histogram whose abscissa is the column coordinate of the line image and whose ordinate is the number of white pixels counted in that column; the number of columns occupied by each single peak is counted and compared with the preset minimum width of a single standard character.
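Both the horizontal and the vertical projection can be sketched with one helper, assuming a {0, 1} binary image with white characters and an illustrative minimum-width threshold; the peak-merging rules described next are omitted:

```python
import numpy as np

def projection_segments(binary: np.ndarray, axis: int, min_width: int = 2):
    """Return (start, end) bounds of the peaks in a row/column white-pixel histogram."""
    counts = binary.sum(axis=axis)  # white pixels per row (axis=1) or column (axis=0)
    segments, start = [], None
    for i, on in enumerate(counts > 0):
        if on and start is None:
            start = i                    # a peak begins: upper/left bound
        elif not on and start is not None:
            if i - start >= min_width:   # drop peaks narrower than min_width
                segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(counts)))
    return segments

for top, bottom in projection_segments(binary, axis=1):      # text lines
    chars = projection_segments(binary[top:bottom], axis=0)  # characters per line
```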
Considering that many Chinese characters have left-right or top-bottom structures, over-cutting easily occurs, i.e., a character is cut into two parts along its left and right components, as with the Chinese character meaning "then". The embodiment of the application therefore provides a new three-class character-cutting rule, as follows.
Firstly, when the number of columns occupied by a single peak is smaller than the minimum width of a standard character, the peak is considered to be only part of a complete character, and it is merged with whichever of its left or right neighbouring peaks occupies fewer columns into a larger peak region, which is taken as a single-character patch.
Secondly, if the number of columns occupied by a single peak is larger than the minimum width of a standard character, it is taken directly as a single-character patch.
Thirdly, an erosion operation is performed on the pictures remaining after the first and second rules are applied, so that the strokes stick together and form single-character patches.
After peak division, the intersection points of each peak with its left and right valleys are taken as the starting points of character segmentation, and a small picture of a single character (which may be called a character region image) is cut out, as shown in fig. 9.
Step 132, recognizing the characters contained in the character images according to a preset character recognition model.
In implementation, the server may store a character recognition model in advance, and then recognize the characters contained in the character images according to that model.
The embodiment of the application provides a training method for the character recognition model, which may use the Convolutional Neural Network (CNN) approach, whose performance is superior in the industry. The specific process may be as follows: first, a CNN optical recognition model is trained; then, each single-character image is discriminated using the model; finally, a language model identifies the content of the whole sentence from the associations among the single characters, improving accuracy.
The CNN method is a supervised image character recognition method. Through local receptive fields and weight sharing, it reduces the complexity of the network model and the number of weights, greatly reducing computational overhead. Specifically, the CNN comprises five layers: an input layer, a convolutional layer, an activation layer, a pooling layer, and a fully connected layer. Among them:
the input is the pixel matrix of the original character image, thus eliminating the subjective operations of manually extracting the characteristics and the like of the traditional model
The convolutional layer extracts local features of the image: local perceptual features are extracted from the picture and then synthesized at a higher level to obtain global information. On the other hand, through weight sharing, the connection parameters of neurons in the same layer relate only to feature extraction and not to specific positions, guaranteeing that connections at all positions within the same layer share weights.
The activation layer applies a nonlinear mapping to the output of the convolutional layer, giving the model nonlinear expressive power. In experiments, the ReLU function was selected as the activation function and was found to be remarkably superior to traditional activation functions such as sigmoid and tanh.
The pooling layer screens features by down-sampling: it compresses the data, reduces the number of parameters, and filters out non-significant features without affecting image quality, so as to reduce overfitting and improve the fault tolerance of the model.
The fully connected layer builds a multi-class neural network. Dropout is adopted to prevent overfitting: some neurons are put to sleep at random, preventing the overfitting that may occur at some nodes.
In the experiment, because the structure of Chinese characters is more complex than that of digits and English letters, 48×48 grayscale images were used as model input in order to model detail information. To ensure the practicality of the model, 3620 high-frequency Chinese characters were counted from in-game chat logs; adding 26 letters (both uppercase and lowercase, 52 in total) and 10 digits gives 3682 characters as the model output.
The server may compute the most probable character using the softmax function. In the experiment, the output is a 3682-dimensional vector, and each output dimension is the probability that the picture belongs to that character class.
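A minimal Keras sketch of the five-layer network described above, with 48×48 grayscale input and a 3682-way softmax output; the filter counts, the dense-layer size and the optimizer are illustrative assumptions:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, padding="same",
                           input_shape=(48, 48, 1)),   # input: 48x48 grayscale pixels
    tf.keras.layers.Activation("relu"),                # activation layer (ReLU)
    tf.keras.layers.MaxPooling2D(2),                   # pooling layer (down-sampling)
    tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation="relu"),     # fully connected layer
    tf.keras.layers.Dropout(0.5),                      # randomly sleep some neurons
    tf.keras.layers.Dense(3682, activation="softmax"), # probability per character class
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```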
Step 133, forming the recognized characters into sentences according to the arrangement order of the character images in the chat area image, so as to obtain the text information in the chat area image.
In implementation, the server may recognize the characters in each character image separately, and then compose the recognized characters into sentences according to the arrangement order of the character images in the chat area image, obtaining the text information in the chat area image. Alternatively, the server may input the character images into the character recognition model one by one in their arrangement order in the chat area image, to obtain the text information.
Optionally, since the image may be unclear, a character such as "recruit" may be misrecognized as a visually similar character. To improve performance, the embodiments of the application can use a language model to resolve this by exploiting the collocation relations between characters. For example, from a large amount of text data, the frequency of "recruit" in its context may be counted as much higher than that of the similar-looking alternative, so the character can be taken to be "recruit". Specifically, conditional probabilities between characters, P(S1|S2), are counted from a large number of chat texts, e.g., P("recruit" | "team") = frequency of "team recruit" / frequency of "team"; then a Viterbi dynamic programming algorithm is used to find the character collocation with the maximum probability in the given character string, e.g., the combination "team recruit" has the maximum probability, higher than the alternatives.
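A toy sketch of the Viterbi collocation step, assuming the character recognition model supplies several candidate characters with confidences for each position and that bigram probabilities have been counted from chat text:

```python
import math

def viterbi_correct(candidates, bigram_prob):
    """candidates: one {char: ocr_confidence} dict per position in the string."""
    best = {c: math.log(p) for c, p in candidates[0].items()}
    paths = {c: [c] for c in candidates[0]}
    for position in candidates[1:]:
        new_best, new_paths = {}, {}
        for c2, p2 in position.items():
            # pick the predecessor maximising OCR confidence x bigram probability
            score, c1 = max((best[c] + math.log(bigram_prob.get((c, c2), 1e-8))
                             + math.log(p2), c) for c in best)
            new_best[c2], new_paths[c2] = score, paths[c1] + [c2]
        best, paths = new_best, new_paths
    return "".join(paths[max(best, key=best.get)])
```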
And step 140, inputting the text information into a pre-trained recognition model, and outputting a recognition result corresponding to the report screenshot.
In implementation, after the server obtains the text information, the text information can be input into a pre-trained recognition model, and a recognition result corresponding to the reported screenshot is output. As shown in fig. 10, a specific process may include the following steps.
And step 141, generating a feature vector of the text information through a word vector algorithm.
In implementation, the server may store a word vector algorithm in advance, through which it may generate the feature vector of the text information. Word vectors are a language modeling and feature learning technique in Natural Language Processing (NLP): a mathematical embedding from a space with one dimension per word into a continuous vector space of much lower dimension. In the experiment, in-game chat text was collected, and word vectors of dimension 200 were trained from it using word2vec, an embedding method in general industry use. word2vec is a shallow two-layer neural network trained to reconstruct the linguistic context of words. After training, the word2vec model can map each word to a vector representing the relations between words; this vector is the hidden layer of the neural network. Given a piece of text information, each word has a 200-dimensional vector, and the vectors of all words in the text are added to obtain the feature vector of the text information.
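A sketch of the feature generation using gensim's word2vec implementation, assuming the gensim 4.x parameter name vector_size (older versions use size) and illustrative pre-tokenised sentences:

```python
from gensim.models import Word2Vec

sentences = [["team", "recruit", "now"], ["cheap", "gold", "for", "sale"]]  # illustrative
w2v = Word2Vec(sentences, vector_size=200, window=5, min_count=1)

def text_feature(tokens):
    """Sum the 200-dimensional vectors of all words in the text information."""
    return sum(w2v.wv[t] for t in tokens if t in w2v.wv)
```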
And 142, inputting the feature vector into a pre-trained recognition model.
And step 143, outputting the recognition result corresponding to the report screenshot.
In implementation, the server may input the feature vector of the text information to a pre-trained recognition model, and may output a recognition result corresponding to the report screenshot.
The embodiment of the present application further provides a training process of the recognition model, as shown in fig. 11, which specifically includes the following steps.
Step 1101, training samples are obtained.
The training samples include spam text and non-spam text. Spam text is text that does not contain reportable content, such as harassing text submitted by a malicious reporter or other text without malicious information; non-spam text is text that contains reportable content.
Step 1102, generating feature vectors for the training samples using the word vectors.
Word vectors are a language modeling and feature learning technique in Natural Language Processing (NLP): a mathematical embedding from a space with one dimension per word into a continuous vector space of much lower dimension. In the experiment, in-game chat texts were collected, and word vectors of dimension 200 were trained from them using word2vec, an embedding method in general industry use. After training, the word2vec model can map each word to a vector representing the relations between words; this vector is the hidden layer of the neural network. Given a text, each word has a 200-dimensional vector; the vectors of all words in the text are added to obtain the feature vector of the text, which is then used for training.
Step 1103, training the recognition model through a decision tree.
In practice, the decision tree is a very efficient supervised classification method in the industry. Given a number of samples, each with a set of attributes (i.e., a feature vector) and a class, training produces a decision tree. The tree is composed of nodes and directed edges; nodes are divided into internal nodes, each representing a feature or attribute, and leaf nodes, each representing a class (i.e., a recognition result). This classification tree can give the correct classification for newly appearing objects.
Thus, given recognized text information, the corresponding feature vector can be generated using the word vectors. The feature vector is then input into the decision tree (i.e., the recognition model) for classification. Specifically, the judging process starts from the root node of the tree: a certain feature dimension of the feature vector is judged according to a preset rule, and the vector is then dispatched to one of the child nodes according to the result, each child node corresponding to a value of that dimension. The process recurses downward until a leaf node is reached, and the feature vector is finally assigned the class of that leaf node, which is the recognition result corresponding to the text information. In the embodiment of the application, the recognition results fall into two classes, banned and false report, i.e., the report passes or is rejected.
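A sketch of the classification step with scikit-learn's decision tree, assuming X_train and y_train have been prepared from the labeled spam and non-spam feature vectors and that text_feature is the word-vector summation sketched earlier:

```python
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier()  # the recognition model
clf.fit(X_train, y_train)       # 200-dim feature vectors; labels: 1 = banned, 0 = false report
result = clf.predict([text_feature(recognized_tokens)])  # recognition result
```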
Optionally, the recognition effect on report screenshots was verified in the embodiment of the application. In the verification, 1000 user report screenshots were randomly sampled and manually audited and labeled by customer service staff; the label data was then compared with the results produced by the identification method provided by the embodiment of the application, and the accuracy and coverage of the method were counted. Accuracy is defined as the number of matched samples divided by the total number of predicted samples, and coverage as the number of matched samples divided by the total number of labeled samples. On actual data, the accuracy of the algorithm is 73.2% and the coverage is 67.3%. In contrast, the traditional character recognition approach, which recognizes characters directly from the image and then audits them by rules, achieves only 14.1% accuracy and 11.5% coverage; that is, the recognition accuracy of the new algorithm is remarkably improved over the traditional method. The prediction results were applied in the game customer service system to support account security audits, significantly improving the efficiency of customer service staff: before going online, each customer service person processed 200 report screenshots per day; after going online, daily throughput rose to 4530 screenshots, which is of huge business value.
In the embodiment of the application, after the server obtains the report screenshot to be identified, it can determine the image in the chat frame in the report screenshot according to a preset chat area identification algorithm to obtain the chat area image, then identify the text information in the chat area image, input the text information into a pre-trained identification model, and output the identification result corresponding to the report screenshot, so as to judge whether the report is valid or a false report. Therefore, manual review of report screenshots is not needed, the efficiency of identifying malicious information is improved, and actual demand can be met.
Based on the same technical concept, as shown in fig. 12, an embodiment of the present application further provides an identification apparatus for reporting a screenshot, where the apparatus includes:
an obtaining module 1210, configured to obtain a report screenshot to be identified;
the determining module 1220 is configured to determine, according to a preset chat area recognition algorithm, an image in a chat frame in the report screenshot to obtain a chat area image;
the recognition module 1230 is used for recognizing the text information in the chat area image;
and the output module 1240 is used for inputting the text information into the pre-trained recognition model and outputting the recognition result corresponding to the report screenshot.
Optionally, the determining module 1220 is specifically configured to:
detecting edges of all elements in the report screenshot according to a preset edge detection algorithm;
extracting candidate contours meeting preset contour conditions according to a preset contour extraction algorithm and the detected edges of all elements;
and respectively determining sub-images contained in each candidate contour, and determining the sub-images meeting the preset similarity condition with a preset image to obtain the chat area image.
Optionally, the determining module 1220 is specifically configured to:
performing Gaussian blur preprocessing on the reported screenshot to obtain a processed image;
carrying out grayscale processing on the processed image to obtain a grayscale image corresponding to the processed image;
and carrying out a convolution transformation on the grayscale image, and determining the edges of the elements contained in the report screenshot.
Optionally, the determining module 1220 is specifically configured to:
carrying out binarization on the convolution-transformed image to obtain a binary image;
performing closing operation on the binary image;
determining a rectangular connected domain formed by pixel points with high pixel values in the image after the closing operation;
and determining, among the rectangular connected domains, the connected domain meeting a preset aspect ratio to obtain the candidate contour.
Optionally, the determining module 1220 is specifically configured to:
respectively calculating the color feature vector of each sub-image;
matching the color feature vector of each sub-image with the color feature vector of a preset image;
and determining the sub-image whose matching degree is greater than a preset threshold, to obtain the chat area image.
Optionally, the identifying module 1230 is specifically configured to:
determining character images each containing a single character in the chat area image through a preset character segmentation algorithm;
recognizing the characters contained in the character images according to a preset character recognition model;
and composing the recognized characters into sentences according to the arrangement order of the character images in the chat area image, to obtain the text information in the chat area image.
Optionally, the output module 1240 is specifically configured to:
generating a feature vector of the text information through a word vector algorithm;
and inputting the feature vector into the pre-trained recognition model, and outputting the recognition result corresponding to the report screenshot.
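A minimal sketch of this classification step, assuming the feature vector is an average of word2vec-style token vectors and the recognition model exposes a scikit-learn-style predict interface; both are assumptions, as the embodiment names neither:

```python
# Illustrative sketch only: averaged token vectors and a predict()-style
# model are assumptions; word_vectors and classifier are placeholders.
import numpy as np

def classify_report(text, word_vectors, classifier):
    tokens = list(text)  # character-level tokens as a simple stand-in
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return "unknown"  # no token has a vector; nothing to classify
    feature = np.mean(vecs, axis=0)  # feature vector of the text information
    # The model outputs the recognition result for the report screenshot,
    # e.g. whether the reported chat content is malicious
    return classifier.predict(feature.reshape(1, -1))[0]
```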
In the embodiment of the application, after obtaining the report screenshot to be identified, the server can determine the image in the chat box in the report screenshot according to a preset chat area recognition algorithm to obtain a chat area image, identify the text information in the chat area image, input the text information into a pre-trained recognition model, and output the recognition result corresponding to the report screenshot, so as to determine whether the report is valid or mistaken. Manual review of report screenshots is therefore not needed, the efficiency of identifying malicious information is improved, and practical demands can be met.
The embodiment of the present application further provides an electronic device, as shown in fig. 13, which includes a processor 1301, a communication interface 1302, a memory 1303, and a communication bus 1304, where the processor 1301, the communication interface 1302, and the memory 1303 communicate with one another through the communication bus 1304:
a memory 1303 for storing a computer program;
the processor 1301 is configured to implement the following steps when executing the program stored in the memory 1303:
acquiring a report screenshot to be identified;
determining the image in the chat box in the report screenshot according to a preset chat area recognition algorithm to obtain a chat area image;
identifying text information in the chat area image;
and inputting the text information into a pre-trained recognition model, and outputting a recognition result corresponding to the report screenshot.
Optionally, the determining, according to a preset chat area recognition algorithm, the image in the chat box in the report screenshot to obtain a chat area image includes:
detecting edges of all elements in the report screenshot according to a preset edge detection algorithm;
extracting candidate contours meeting preset contour conditions according to a preset contour extraction algorithm and the detected edges of all elements;
and respectively determining the sub-images contained in each candidate contour, and determining the sub-image whose similarity to a preset image satisfies a preset similarity condition, to obtain the chat area image.
Optionally, the detecting, according to a preset edge detection algorithm, edges of the elements contained in the report screenshot includes:
performing Gaussian blur preprocessing on the report screenshot to obtain a processed image;
converting the processed image to grayscale to obtain a grayscale image corresponding to the processed image;
and applying a convolution transform to the grayscale image to determine the edges of the elements contained in the report screenshot.
Optionally, the extracting, according to a preset contour extraction algorithm and the detected edges of each element, a candidate contour that meets a preset contour condition includes:
binarizing the convolution-transformed image to obtain a binary image;
performing a morphological closing operation on the binary image;
determining rectangular connected domains formed by high-valued pixels in the image after the closing operation;
and determining, among the rectangular connected domains, those meeting a preset aspect ratio to obtain the candidate contours.
Optionally, the determining the sub-image whose similarity to the preset image satisfies the preset similarity condition to obtain the chat area image includes:
respectively calculating the color feature vector of each sub-image;
matching the color feature vector of each sub-image with the color feature vector of a preset image;
and determining the sub-images whose matching degree is greater than a preset threshold to obtain the chat area image.
Optionally, the identifying text information in the chat area image includes:
determining character images each containing a single character in the chat area image through a preset character segmentation algorithm;
recognizing the character contained in each character image according to a preset character recognition model;
and forming the recognized characters into sentences according to the arrangement order of the character images in the chat area image to obtain the text information in the chat area image.
Optionally, the inputting the text information into a pre-trained recognition model, and outputting a recognition result corresponding to the report screenshot includes:
generating a feature vector of the text information through a word vector algorithm;
and inputting the feature vector into the pre-trained recognition model, and outputting the recognition result corresponding to the report screenshot.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In another embodiment provided by the present application, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the above methods for identifying a report screenshot.
In yet another embodiment provided by the present application, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform the method for identifying a report screenshot according to any one of the above embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element.
The embodiments in this specification are described in a related manner; identical or similar parts among the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the apparatus embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, reference may be made to the description of the method embodiment.
The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (16)

1. A method for identifying a reported screenshot, the method comprising:
acquiring a report screenshot to be identified;
determining the image in the chat box in the report screenshot according to a preset chat area recognition algorithm to obtain a chat area image;
identifying text information in the chat area image;
and inputting the text information into a pre-trained recognition model, and outputting a recognition result corresponding to the report screenshot.
2. The method of claim 1, wherein determining the image in the chat box in the report screenshot according to a preset chat area recognition algorithm to obtain a chat area image comprises:
detecting edges of all elements in the report screenshot according to a preset edge detection algorithm;
extracting candidate contours meeting preset contour conditions according to a preset contour extraction algorithm and the detected edges of all elements;
and respectively determining the sub-images contained in each candidate contour, and determining the sub-image whose similarity to a preset image satisfies a preset similarity condition, to obtain the chat area image.
3. The method according to claim 2, wherein the detecting, according to a preset edge detection algorithm, edges of the elements contained in the report screenshot comprises:
performing Gaussian blur preprocessing on the report screenshot to obtain a processed image;
converting the processed image to grayscale to obtain a grayscale image corresponding to the processed image;
and applying a convolution transform to the grayscale image to determine the edges of the elements contained in the report screenshot.
4. The method according to claim 3, wherein the extracting candidate contours satisfying the preset contour condition according to a preset contour extraction algorithm and the detected edges of each element comprises:
binarizing the convolution-transformed image to obtain a binary image;
performing a morphological closing operation on the binary image;
determining rectangular connected domains formed by high-valued pixels in the image after the closing operation;
and determining, among the rectangular connected domains, those meeting a preset aspect ratio to obtain the candidate contours.
5. The method according to claim 2, wherein the determining the sub-image whose similarity to the preset image satisfies the preset similarity condition to obtain the chat area image comprises:
respectively calculating the color feature vector of each sub-image;
matching the color feature vector of each sub-image with the color feature vector of a preset image;
and determining the sub-images whose matching degree is greater than a preset threshold to obtain the chat area image.
6. The method of claim 1, wherein the identifying text information in the chat area image comprises:
determining character images each containing a single character in the chat area image through a preset character segmentation algorithm;
recognizing the character contained in each character image according to a preset character recognition model;
and forming the recognized characters into sentences according to the arrangement order of the character images in the chat area image to obtain the text information in the chat area image.
7. The method of claim 1, wherein the inputting the text information into a pre-trained recognition model and outputting a recognition result corresponding to the report screenshot comprises:
generating a feature vector of the text information through a word vector algorithm;
and inputting the feature vector into the pre-trained recognition model, and outputting the recognition result corresponding to the report screenshot.
8. An apparatus for identifying a reported screenshot, the apparatus comprising:
the acquisition module is used for acquiring the report screenshot to be identified;
the determining module is used for determining the image in the chat box in the report screenshot according to a preset chat area recognition algorithm to obtain a chat area image;
the recognition module is used for identifying text information in the chat area image;
and the output module is used for inputting the text information into a pre-trained recognition model and outputting a recognition result corresponding to the report screenshot.
9. The apparatus of claim 8, wherein the determining module is specifically configured to:
detecting edges of all elements in the report screenshot according to a preset edge detection algorithm;
extracting candidate contours meeting preset contour conditions according to a preset contour extraction algorithm and the detected edges of all elements;
and respectively determining the sub-images contained in each candidate contour, and determining the sub-image whose similarity to a preset image satisfies a preset similarity condition, to obtain the chat area image.
10. The apparatus of claim 9, wherein the determining module is specifically configured to:
performing Gaussian blur preprocessing on the report screenshot to obtain a processed image;
converting the processed image to grayscale to obtain a grayscale image corresponding to the processed image;
and applying a convolution transform to the grayscale image to determine the edges of the elements contained in the report screenshot.
11. The apparatus of claim 10, wherein the determining module is specifically configured to:
binarizing the convolution-transformed image to obtain a binary image;
performing a morphological closing operation on the binary image;
determining rectangular connected domains formed by high-valued pixels in the image after the closing operation;
and determining, among the rectangular connected domains, those meeting a preset aspect ratio to obtain the candidate contours.
12. The apparatus of claim 9, wherein the determining module is specifically configured to:
respectively calculating the color feature vector of each sub-image;
matching the color feature vector of each sub-image with the color feature vector of a preset image;
and determining the sub-images whose matching degree is greater than a preset threshold to obtain the chat area image.
13. The apparatus according to claim 8, wherein the recognition module is specifically configured to:
determining character images each containing a single character in the chat area image through a preset character segmentation algorithm;
recognizing the character contained in each character image according to a preset character recognition model;
and forming the recognized characters into sentences according to the arrangement order of the character images in the chat area image to obtain the text information in the chat area image.
14. The apparatus of claim 8, wherein the output module is specifically configured to:
generating a feature vector of the text information through a word vector algorithm;
and inputting the feature vector into the pre-trained recognition model, and outputting the recognition result corresponding to the report screenshot.
15. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1 to 7 when executing a program stored in the memory.
16. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 7.
CN201811605039.0A 2018-12-26 2018-12-26 Method and device for identifying reported screenshot Pending CN111368838A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811605039.0A CN111368838A (en) 2018-12-26 2018-12-26 Method and device for identifying reported screenshot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811605039.0A CN111368838A (en) 2018-12-26 2018-12-26 Method and device for identifying reported screenshot

Publications (1)

Publication Number Publication Date
CN111368838A true CN111368838A (en) 2020-07-03

Family

ID=71209590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811605039.0A Pending CN111368838A (en) 2018-12-26 2018-12-26 Method and device for identifying reported screenshot

Country Status (1)

Country Link
CN (1) CN111368838A (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040255249A1 (en) * 2001-12-06 2004-12-16 Shih-Fu Chang System and method for extracting text captions from video and generating video summaries
US20130195361A1 (en) * 2012-01-17 2013-08-01 Alibaba Group Holding Limited Image index generation based on similarities of image features
US20180276493A1 (en) * 2016-03-02 2018-09-27 Ping An Technology (Shenzhen) Co., Ltd. Method, device, system, and storage medium for automatically extracting a validity period of a driving license
CN107515873A (en) * 2016-06-16 2017-12-26 阿里巴巴集团控股有限公司 A kind of junk information recognition methods and equipment
CN106570456A (en) * 2016-10-13 2017-04-19 华南理工大学 Handwritten Chinese character recognition method based on full-convolution recursive network
WO2018086519A1 (en) * 2016-11-08 2018-05-17 北京国双科技有限公司 Method and device for identifying specific text information
CN108171104A (en) * 2016-12-08 2018-06-15 腾讯科技(深圳)有限公司 A kind of character detecting method and device
CN107168952A (en) * 2017-05-15 2017-09-15 北京百度网讯科技有限公司 Information generating method and device based on artificial intelligence
CN107610138A (en) * 2017-10-20 2018-01-19 四川长虹电器股份有限公司 A kind of bill seal regional sequence dividing method
CN108764226A (en) * 2018-04-13 2018-11-06 顺丰科技有限公司 Image text recognition methods, device, equipment and its storage medium
CN108874777A (en) * 2018-06-11 2018-11-23 北京奇艺世纪科技有限公司 A kind of method and device of text anti-spam
CN108874776A (en) * 2018-06-11 2018-11-23 北京奇艺世纪科技有限公司 A kind of recognition methods of rubbish text and device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CUN-ZHAO SHI ET AL.: "Scene Text Recognition Using Structure-Guided Character Detection and Linguistic Knowledge", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, vol. 24, no. 7, 31 July 2014 (2014-07-31), pages 1235 - 1250, XP011552740, DOI: 10.1109/TCSVT.2014.2302522 *
TOSHIO SATO ET AL.: "Video OCR: indexing digital news libraries by recognition of superimposed captions", MULTIMEDIA SYSTEMS, vol. 7, 30 September 1999 (1999-09-30), pages 385 - 395, XP000956148, DOI: 10.1007/s005300050140 *
ZHOU JING: "Chinese entity relation extraction based on conditional random field models", COMPUTER ENGINEERING, vol. 36, no. 24, 20 December 2010 (2010-12-20), pages 192 - 194 *
FAN CHONGJUN ET AL.: "Big Data Analysis and Applications", vol. 1, 31 January 2016, LIXIN ACCOUNTING PRESS, pages: 267 - 269 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113194106A (en) * 2021-07-02 2021-07-30 北京易华录信息技术股份有限公司 Network data security identification system and method
CN113194106B (en) * 2021-07-02 2021-09-17 北京易华录信息技术股份有限公司 Network data security identification system and method
CN115019151A (en) * 2022-08-05 2022-09-06 成都图影视讯科技有限公司 Non-salient feature region accelerated neural network architecture, method and apparatus

Similar Documents

Publication Title
WO2019232853A1 (en) Chinese model training method, chinese image recognition method, device, apparatus and medium
CN111563495B (en) Method and device for recognizing characters in image and electronic equipment
CN104463101B (en) Answer recognition methods and system for character property examination question
CN109740606B (en) Image identification method and device
CN109284371B (en) Anti-fraud method, electronic device, and computer-readable storage medium
CN111488826A (en) Text recognition method and device, electronic equipment and storage medium
CN110503103B (en) Character segmentation method in text line based on full convolution neural network
WO2019232850A1 (en) Method and apparatus for recognizing handwritten chinese character image, computer device, and storage medium
CN108734159B (en) Method and system for detecting sensitive information in image
CN109284372A (en) User's operation behavior analysis method, electronic device and computer readable storage medium
CN112101386B (en) Text detection method, device, computer equipment and storage medium
WO2021232670A1 (en) Pcb component identification method and device
CN110427819A (en) The method and relevant device of PPT frame in a kind of identification image
CN109389110B (en) Region determination method and device
CN111723815A (en) Model training method, image processing method, device, computer system, and medium
CN111507344A (en) Method and device for recognizing characters from image
CN111178146A (en) Method and device for identifying anchor based on face features
US20220156756A1 (en) Fraud detection via automated handwriting clustering
CN111611774A (en) Operation and maintenance operation instruction security analysis method, system and storage medium
Lyu et al. The early Japanese books reorganization by combining image processing and deep learning
Li et al. Multilingual text detection with nonlinear neural network
CN111368838A (en) Method and device for identifying reported screenshot
CN114841974A (en) Nondestructive testing method and system for internal structure of fruit, electronic equipment and medium
CN114971294A (en) Data acquisition method, device, equipment and storage medium
CN114581928A (en) Form identification method and system

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination