CN115661585A - Image recognition method and related device


Info

Publication number
CN115661585A
Authority: CN (China)
Prior art keywords: frame, model, display, feature, picture
Legal status: Granted
Application number: CN202211567344.1A
Other languages: Chinese (zh)
Other versions: CN115661585B (en)
Inventor: 黄超
Current Assignee: Tencent Technology (Shenzhen) Co., Ltd.
Original Assignee: Tencent Technology (Shenzhen) Co., Ltd.
Application filed by Tencent Technology (Shenzhen) Co., Ltd.
Priority to CN202211567344.1A
Publication of CN115661585A
Application granted
Publication of CN115661585B
Legal status: Active
Anticipated expiration

Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses an image recognition method and a related apparatus. A frame set of a target application program is first obtained, and an initial self-coding model is trained on the display picture frames in the frame set. Because the self-coding model restores a predicted picture frame from features extracted from the input display picture frame, and is trained in an unsupervised manner on the difference between the display picture frame and the predicted picture frame, a feature extraction model with high extraction precision can be obtained without labeling any sample. A probability model is then constructed, by fitting a normal distribution, from the feature vectors that the feature extraction model produces for the display picture frames. When a picture frame to be detected of the target application program is obtained, the feature extraction model extracts its target feature vector and the probability model determines the prediction probability corresponding to that target feature vector, which identifies whether the picture frame to be detected has an abnormal display picture. This greatly reduces sample labeling cost, allows brand-new abnormal display conditions to be recognized, and improves anomaly recognition accuracy.

Description

Image recognition method and related device
Technical Field
The present application relates to the field of image recognition, and in particular, to an image recognition method and a related apparatus.
Background
Applications configured on a terminal device can provide different services for the user when they run, such as playing games or reading text. While an application runs, it needs to display normal images in order to provide the corresponding service; for a game application, for example, the correct game scene and special effects should be displayed in response to the player's control. If abnormal display occurs, such as map misplacement or failed material rendering, service quality is directly affected.
Abnormal display conditions in images therefore need to be detected accurately so that the display problems they reveal can be identified and resolved in time. In the related art, an image detection model is mainly obtained through supervised training, and whether the images displayed while an application program runs are abnormal is identified by this detection model.
To guarantee the accuracy of such a detection model, a large number of labeled abnormal display samples are needed, covering all the abnormal display conditions that might occur; the labeling cost is very high, and once training is complete the detection model has difficulty recognizing newly appearing abnormal display conditions.
Disclosure of Invention
To solve this technical problem, the present application provides an image recognition method and a related apparatus, which achieve high anomaly recognition accuracy without labeling any sample.
The embodiment of the application discloses the following technical scheme:
in one aspect, the present application provides an image recognition method, including:
acquiring a frame set of a target application program, wherein the frame set comprises a plurality of display picture frames corresponding to the target application program;
inputting the display picture frames into an initial self-coding model, wherein the initial self-coding model comprises an initial feature extraction sub-model and an initial image restoration sub-model, the initial feature extraction sub-model is used for extracting an initial feature vector of a display picture frame, and the initial image restoration sub-model is used for restoring a predicted picture frame from the initial feature vector;
training the initial self-coding model based on the difference between the display picture frame and the corresponding predicted picture frame, so that the initial feature extraction sub-model is trained into a feature extraction model;
acquiring feature vectors corresponding to the display picture frames with the feature extraction model, and constructing a probability model from the feature vectors by fitting a normal distribution;
when a picture frame to be detected of the target application program is obtained, acquiring a target feature vector of the picture frame to be detected with the feature extraction model, and determining a prediction probability corresponding to the target feature vector with the probability model, wherein the prediction probability is used for identifying whether the picture frame to be detected has an abnormal display picture.
In another aspect, the present application provides an image recognition apparatus, comprising:
the frame acquisition unit is used for acquiring a frame set of a target application program, the frame set comprising a plurality of display picture frames corresponding to the target application program;
an input unit, configured to input the display picture frames into an initial self-coding model, wherein the initial self-coding model comprises an initial feature extraction sub-model and an initial image restoration sub-model, the initial feature extraction sub-model is used for extracting an initial feature vector of a display picture frame, and the initial image restoration sub-model is used for restoring a predicted picture frame from the initial feature vector;
a training unit, configured to train the initial self-coding model based on the difference between the display picture frame and the corresponding predicted picture frame, so that the initial feature extraction sub-model is trained into a feature extraction model;
the construction unit is used for acquiring feature vectors corresponding to the display picture frames with the feature extraction model, and constructing a probability model from the feature vectors by fitting a normal distribution;
and the detection unit is used for, when a picture frame to be detected of the target application program is obtained, acquiring a target feature vector of the picture frame to be detected with the feature extraction model and determining a prediction probability corresponding to the target feature vector with the probability model, wherein the prediction probability is used for identifying whether the picture frame to be detected has an abnormal display picture.
In another aspect, the present application provides a computer device comprising a processor and a memory:
the memory is used for storing a computer program and transmitting the computer program to the processor;
the processor is configured to execute the image recognition method according to the above aspect according to instructions in the computer program.
In another aspect, an embodiment of the present application provides a computer-readable storage medium for storing a computer program, where the computer program is configured to execute the image recognition method according to the above aspect.
In another aspect, the present application provides a computer program product including a computer program, which when run on a computer device, causes the computer device to execute the image recognition method.
According to the above technical solution, before abnormal display recognition is performed on a picture frame to be detected of the target application program, a frame set of the target application program can be obtained and an initial self-coding model can be trained on the display picture frames in the frame set. Because the self-coding model restores a predicted picture frame from features extracted from the input display picture frame, and is trained in an unsupervised manner on the difference between the display picture frame and the predicted picture frame, a feature extraction model with high extraction precision can be obtained without labeling any sample. A probability model is then constructed, by fitting a normal distribution, from the feature vectors that the feature extraction model produces for the display picture frames. Since abnormal display pictures generally occur with low probability, normally displayed picture frames account for a very high proportion of the display picture frames in the frame set, and in some scenarios the frame set may contain no abnormal picture frame at all, so the normal-distribution law learned by the probability model can, without any sample labeling, describe the feature characteristics of normally displayed picture frames essentially accurately. Therefore, when a picture frame to be detected of the target application program is obtained, the feature extraction model extracts its target feature vector and the probability model determines the prediction probability corresponding to that vector, so the prediction probability used to identify whether the frame has an abnormal display picture can be obtained accurately, which greatly saves sample labeling cost. Moreover, because the probability model has learned the overall feature characteristics of normally displayed picture frames, even if a brand-new abnormal display condition appears in a frame to be detected during prediction, the probability model can still recognize the abnormality based on the learned feature characteristics, improving anomaly recognition accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present application, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic view of an application scenario of an image recognition method according to an embodiment of the present application;
fig. 2 is a flowchart of an image recognition method according to an embodiment of the present application;
FIG. 3 is a diagram illustrating a frame of a display screen in a game application according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a game map according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an initial self-coding model according to an embodiment of the present application;
fig. 6 is a schematic diagram of an anomaly detection process according to an embodiment of the present application;
fig. 7 is a block diagram of an image recognition apparatus according to an embodiment of the present application;
fig. 8 is a structural diagram of a terminal device according to an embodiment of the present application;
fig. 9 is a block diagram of a server according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings.
At present, to ensure the accuracy of a detection model, a large number of labeled abnormal display samples are needed, covering all the abnormal display conditions that might occur; the labeling cost is very high, and once training is complete the detection model has difficulty recognizing newly appearing abnormal display conditions.
To solve the above technical problem, embodiments of the present application provide an image recognition method and a related apparatus, which achieve high anomaly recognition accuracy without labeling any sample and can accurately recognize a brand-new abnormal display condition even when it first appears in a picture frame to be detected during prediction.
The image recognition method provided by the embodiments of the present application can be implemented by a computer device, which may be a terminal device or a server. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. Terminal devices include, but are not limited to, mobile phones, computers, intelligent voice interaction devices, smart appliances, vehicle-mounted terminals, aircraft, and the like. The terminal device and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in this application.
In order to facilitate understanding of the technical solution provided by the present application, an image recognition method provided by the embodiment of the present application will be introduced in combination with an actual application scenario.
Referring to fig. 1, fig. 1 is a schematic view of an application scenario of the image recognition method provided in an embodiment of the present application. The scenario shown in fig. 1 includes a terminal device 10 and a server 20. A target application providing a service is installed on the terminal device 10, and the server 20 corresponding to the target application can interact with the terminal device 10 through a network. While the terminal device 10 delivers the service through the target application, the server 20 may provide the terminal device 10 with the picture frames of the service, and before providing a picture frame the server 20 may execute the image recognition method to detect it.
Before performing abnormal display recognition on a picture frame to be detected of the target application, the server 20 may first obtain a frame set of the target application and train an initial self-coding model on the plurality of display picture frames corresponding to the target application in the frame set. Because the self-coding model restores a predicted picture frame from features extracted from the input display picture frame and is trained in an unsupervised manner on the difference between the display picture frame and the predicted picture frame, no sample label needs to be annotated, and a feature extraction model with high extraction precision can be obtained.
The initial self-coding model comprises an initial feature extraction sub-model and an initial image restoration sub-model. The initial feature extraction sub-model is used for extracting an initial feature vector of a display picture frame, and the initial image restoration sub-model is used for restoring a predicted picture frame from the initial feature vector, so the initial self-coding model can be trained based on the difference between the display picture frame and the corresponding predicted picture frame, whereby the initial feature extraction sub-model is trained into a feature extraction model.
The server 20 then constructs a probability model, by fitting a normal distribution, from the feature vectors that the feature extraction model produces for the display picture frames. Because abnormal display pictures generally occur with low probability, normally displayed picture frames account for a very high proportion of the display picture frames in the frame set, and in some scenarios the frame set may contain no abnormal picture frame at all, so the normal-distribution law learned by the probability model can, without any sample labeling, describe the feature characteristics of normally displayed picture frames essentially accurately.
When a picture frame to be detected of the target application is obtained, the server 20 extracts the target feature vector of the picture frame to be detected with the feature extraction model and determines the prediction probability corresponding to the target feature vector with the probability model. The prediction probability used to identify whether the frame has an abnormal display picture can therefore be obtained accurately through the feature extraction model and the probability model, which greatly saves sample labeling cost; and because the probability model has learned the overall feature characteristics of normally displayed picture frames, even if a brand-new abnormal display condition appears in a frame to be detected during prediction, the probability model can still recognize the abnormality based on the learned feature characteristics, improving anomaly recognition accuracy.
Next, an image recognition method provided in an embodiment of the present application will be described with reference to the drawings. Referring to fig. 2, fig. 2 is a flowchart of an image recognition method provided in an embodiment of the present application, where a server is used as an execution subject to describe the method, and the method includes:
s101, acquiring a frame set of the target application program, wherein the frame set comprises a plurality of display frame corresponding to the target application program.
In the embodiment of the application, the target application program is used for providing services for the user, different target application programs can provide different services for the user, and the target application program can be a game application program or a reading program and the like. The target application program needs to display normal images to provide services when running, a plurality of images in the game application program sequentially display game animations forming the game application program, and a frame set of the target application program is an image set which needs to be displayed when the target application program runs and can comprise a plurality of display frame corresponding to the target application program as data supporting the services. The frame is the minimum unit of data transmission in the network, the display frame is the minimum unit of the video required by the service, one image can be used as one frame when being used as a part of the video required by the service, and the plurality of display frames can be all the frame in the target application program or part of the frame in the target application program.
When the target application program is a game application, the display picture frames are the pictures that need to be displayed during the game. A picture may be displayed in landscape mode or in portrait mode: in landscape mode the horizontal dimension of the terminal device screen is larger than the vertical dimension, and in portrait mode the vertical dimension is larger than the horizontal dimension. A display picture frame may include at least one virtual object, such as a game character or a game item; game items may include items used to attack, such as a virtual gun or a virtual knife, and when the game item includes a virtual gun the game scene is a gun-battle scene. Referring to fig. 3, which is a schematic diagram of a display picture frame in a game application provided in an embodiment of the present application, the display picture frame 10 may include a game background, a game character 11, operation controls 13 and 14, and a game item 12, the game item being a virtual gun.
When the target application program is a reading application, the display picture frames are the pictures shown during reading, and a picture may include a reading background, text, and the like.
A picture frame displayed while the target application program runs may suffer from abnormal display, which affects the service quality of the target application program; an abnormally displayed picture frame produces an abnormal display picture. When the target application program is a game application, the abnormal display picture is at least one of an abnormal illumination picture, an abnormal model collision picture, or an abnormal material rendering picture. An abnormal illumination picture may show problems such as screen corruption caused by uneven illumination. An abnormal model collision picture, which may also be called a model clipping picture, may show one virtual model passing through another, or two virtual models that should be in contact separated by a clear gap: for example, a game character or game item passes through a virtual wall in the game scene, or a game character and the game item it controls fail to make correct contact so that the item appears to float. An abnormal material rendering picture may show abnormal materials on a virtual model, generally caused by abnormal interaction between virtual illumination and a virtual object, such as an apparent depression in the ground of the virtual scene, in which case the picture is an abnormal ground-depression picture. When the target application program is a reading application, abnormal display pictures include pictures with abnormal colors, pictures with overlapping text, and the like.
Detecting such abnormal display pictures is a very challenging problem. A target application program usually has a great many display pictures, so abnormal display pictures are likely to occur somewhere; they come in many types, the scenes in which they appear are random, abnormal display pictures that have occurred before will not necessarily occur again later, and image samples of abnormal display pictures are scarce, all of which makes abnormal picture detection difficult. Manually traversing the many display pictures of the target application program to detect problems often consumes a great deal of time, so an automatic abnormal picture detection method can greatly improve detection efficiency.
In the related art, normal display pictures may be used as positive samples and abnormal display pictures as negative samples, and a deep model, such as a Convolutional Neural Network (CNN), may be trained on a large number of labeled positive and negative samples so that it can distinguish them. However, this approach requires a large number of labeled samples, wasting labor and time, and the model can only be trained for specific types of abnormal display pictures: the trained model can recognize those specific types but not other types of abnormal display pictures.
Also in the related art, normal display pictures may be prerecorded and numbered; when a picture to be detected is obtained, it is compared with the normal display picture of the corresponding number to judge whether it is an abnormal display picture, the normal display pictures being pictures from a normal version of the target application program.
In order to detect abnormal display picture frames, in this embodiment of the application a frame set of the target application program may be used as the training samples for the subsequent Artificial Intelligence (AI) model. The display picture frames in the frame set of the target application program may all be normally displayed picture frames, i.e. they serve as positive samples, so that the model trained on them captures the characteristics of normally displayed pictures; more display picture frames can be obtained to improve the accuracy of the model. In this embodiment no negative samples need to be obtained and no manual labeling of positive and negative samples is needed. Compared with manually labeling a large number of positive and negative samples to train a model, this saves labor cost, allows anomaly detection to be performed even when no large number of abnormal display pictures is available, and widens the scenarios in which abnormal display pictures can be detected.
In addition, abnormal display pictures are few in practical applications and their abnormality types are diverse, so not having to obtain negative samples also saves the computing resources and time that would be needed to construct specific types of abnormal display pictures algorithmically. The model trained on positive samples judges the features of the picture to be detected: if those features match the features of the positive samples, the picture is determined to be a normal display picture, so abnormal display pictures of all kinds can be recognized rather than only one abnormality type, making the detection of abnormal display pictures more general. Moreover, because the trained model has learned the features of the positive samples, it can detect all kinds of pictures to be detected without pre-storing corresponding normal display pictures, improving the reliability and accuracy of picture detection while reducing the redundancy of information storage.
Specifically, the plurality of display picture frames in the frame set of the target application program may be obtained from a first version of the target application program, where the first version may be a version for which abnormal display detection has already been completed, so that the display picture frames corresponding to the first version of the target application program are all normally displayed picture frames and can serve as positive samples for the subsequently trained model. The "first version" here is not necessarily the earliest version of the target application program; it is simply a version defined to distinguish it from the version of the picture frames to be detected. It may be the earliest version of the target application program or any other version, as long as abnormal display detection has been completed for it.
When the target application program is a game application, a game map of the target application program can be acquired, and display picture frames can be collected in the game map by traversing it, so that the acquisition of display picture frames can be carried out automatically. Specifically, the vertex set of the geometric polygons that make up the game map can be determined, the geometric polygons being triangles, rectangles, hexagons, and so on; a map traversal path is then determined from the positional relationship of the vertices in the vertex set within the game map, a game character is controlled to move along the map traversal path, and the game pictures captured during the movement are taken as display picture frames. In this path-traversal manner as many display picture frames as possible can be obtained, giving more training samples, so that the trained model is suitable for anomaly recognition of images at all positions of the game map.
The map traversal path can be determined through a game navigation framework, for example a Navmesh component, which may be provided by the game engine. The game resources can be baked with the game navigation framework to determine the feasible region in the game map, i.e. the region where game characters can stand; the feasible region is represented by a plurality of geometric polygons, so the geometric polygons making up the game map are determined through the game navigation framework. Referring to fig. 4, which is a schematic diagram of a game map provided in an embodiment of the present application, the gray area corresponds to the feasible region 21 of the game map, the white areas are infeasible regions 22, such as areas occupied by obstacles or walls, the feasible region 21 is composed of a plurality of triangles, and the boundaries of the triangles are drawn as black lines within the feasible region.
Then the vertex set of the geometric polygons is obtained through the game navigation framework, and the map traversal path is determined from the positional relationship of the vertices in the vertex set within the game map. Specifically, the vertex set of the geometric polygons can be used as the set of target points to traverse; the initial position of the game character is obtained through a game interface provided by the game engine, and the target point closest to the game character is selected from the target point set as the current target point. A movement path from the initial position of the character to the current target point is generated through the game navigation framework, and the game character is controlled, through a movement and steering interface also provided by the game engine, to move to the current target point, with game pictures captured during the movement as display picture frames. After the game character reaches the current target point, that point becomes a history target point, the unreached target point closest to the history target point is selected as the new current target point, and the game character is controlled to move from the history target point to the new current target point; determining the current target point and controlling the character to move are repeated until all target points have been traversed, as shown in the sketch below.
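A sketch of this greedy traversal under stated assumptions: get_character_position and move_and_capture are hypothetical wrappers around the engine's game interface, navigation framework, and movement and steering interface, and straight-line distance is used to pick the nearest target point.

```python
import math

def traverse_map(vertices, get_character_position, move_and_capture):
    """Visit every navmesh polygon vertex, always moving to the nearest unreached one,
    and collect the game pictures captured on the way as display picture frames.

    vertices               -- vertex set of the geometric polygons (the target points)
    get_character_position -- assumed wrapper around the engine's game interface
    move_and_capture       -- assumed wrapper that walks the character along the
                              engine-generated navigation path to a target point and
                              returns the pictures captured while moving
    """
    remaining = list(vertices)
    frames = []
    position = get_character_position()            # initial position of the game character
    while remaining:
        # pick the unreached target point closest to the current position
        target = min(remaining, key=lambda v: math.dist(position, v))
        remaining.remove(target)
        frames.extend(move_and_capture(target))    # move and collect display picture frames
        position = target                          # the reached point becomes the history target point
    return frames
```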
Game pictures can be captured while the game character moves. All of the captured pictures can be used as display picture frames, or only part of them can be selected, because adjacent frames are highly similar: even if only part of the pictures is selected, the characteristics of the game pictures are not lost excessively, and more positive-sample characteristics are retained while keeping as few display picture frames as possible. As an example, one frame may be cut out as a display picture frame for every 5 frames of the full game picture stream; when 30 frames are displayed per second, 6 display picture frames are cut out per second.
S102, inputting the display frame into an initial self-coding model, wherein the initial self-coding model comprises an initial feature extraction sub-model and an initial image restoration sub-model, the initial feature extraction sub-model is used for extracting an initial feature vector of the display frame, and the initial image restoration sub-model is used for restoring to obtain a predicted frame based on the initial feature vector.
In this embodiment of the application, after the display picture frames are acquired, they may be input into the initial self-coding model, so that predicted picture frames can be obtained with the initial self-coding model. The initial self-coding model is a neural network model, such as a convolutional neural network model, whose core purpose is to learn a deep representation of the input data; it can be trained in an unsupervised or supervised manner. Specifically, in the unsupervised manner, the initial self-coding model extracts features from an input display picture frame and predicts a picture frame from the extracted features, achieving self-supervision from the input display picture frame and the predicted picture frame, so that no training-sample labels are needed for supervision during training of the initial self-coding model; training samples therefore do not need to be manually annotated, which saves labor cost. The deep representation of a display picture frame is an abstract feature of the frame that can represent it to a certain extent, so the influence of frame details on the accuracy of the model can be reduced.
As described with reference to fig. 1, the initial self-coding model includes an initial feature extraction sub-model and an initial image restoration sub-model. The initial feature extraction sub-model is used to extract an initial feature vector of a display picture frame, and the initial image restoration sub-model is used to restore a predicted picture frame from the initial feature vector. When the initial feature vector is sufficiently representative, the predicted picture frame is highly similar to the corresponding display picture frame, i.e. the image restoration effect is good, and the feature extraction capability of the initial feature extraction sub-model is correspondingly good.
The initial feature extraction sub-model may include convolutional layers and a fully connected layer connected in sequence, and there may be several convolutional layers; as an example, there may be 4. A convolutional layer contains a plurality of convolution units, each of which performs a convolution operation according to its parameters to obtain deeper features; each convolution unit performs a convolution operation on the display picture frame to obtain sub convolution features, and the fully connected layer, connected to all convolution units of the convolutional layers, integrates these sub convolution features into the convolution features of the display picture frame.
The initial image restoration sub-model may include a fully connected layer, a conversion layer, and upsampling layers connected in sequence, and there may be several upsampling layers; as an example, there may be 5. The fully connected layer obtains the convolution features of the display picture frame from the initial feature extraction sub-model, through its connection with the fully connected layer of that sub-model, and preprocesses them; the conversion layer converts the feature vector into a two-dimensional matrix with multiple channels; the upsampling layers enlarge the scale of the features and reduce their depth through deconvolution operations, thereby restoring the features, and the restored predicted picture frame, represented as a depth map, is output. The size of the predicted picture frame may be identical to the size of the input data.
Before a display picture frame is input into the initial self-coding model, it may be preprocessed and converted into a fixed format to reduce its complexity. For example, the display picture frame is scaled to 256 × 256 × 3: its height and width are 256 pixels and the number of channels is 3, i.e. each pixel has pixel values for the three red (r), green (g) and blue (b) channels. This reduces the number of features contained in the display picture frame and the number of features the initial self-coding model has to process, simplifying the structure of the initial self-coding model and keeping it lightweight.
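A minimal sketch of this preprocessing step; the use of OpenCV and the scaling of pixel values to [0, 1] are assumptions of the sketch, only the 256 × 256 × 3 target format comes from the embodiment.

```python
import cv2
import numpy as np

def preprocess_frame(path: str) -> np.ndarray:
    """Load a display picture frame and convert it to the fixed 256 x 256 x 3 format."""
    img = cv2.imread(path)                    # image of shape (H, W, 3)
    img = cv2.resize(img, (256, 256))         # scale height and width to 256 pixels
    img = img.astype(np.float32) / 255.0      # scale pixel values to [0, 1] (assumed)
    return img
```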
Referring to fig. 5, which is a schematic structural diagram of an initial self-coding model provided in this embodiment of the present application, input data input to the initial feature extraction sub-model may be features of a display frame, where the size of a convolution kernel of a convolution layer is 4, and a step size is 2, then a depth of an output feature of a first convolution layer is 32, a depth of an output feature of a second convolution layer is 48, a depth of an output feature of a third convolution layer is 64, a depth of an output feature of a fourth convolution layer is 128, and a depth of an output feature of a full connection layer is 512.
The depth of the features of the fully-connected layer input to the initial image restoration submodel may be 1024. The convolution kernel size of the upsampling layer is 4, the step size is 2, the depth of the output feature of the first upsampling layer is 128, the depth of the output feature of the second upsampling layer is 64, the depth of the output feature of the third upsampling layer is 48, the depth of the output feature of the fourth upsampling layer is 32, and the depth of the output feature of the fifth upsampling layer is 3. The size of the prediction picture frame may be 256 × 256 × 3.
In addition, the initial self-coding model further includes activation layers (not shown). An activation layer may be placed after each layer, for example after each convolutional layer, fully connected layer, conversion layer, and upsampling layer, to introduce non-linear characteristics into the output features of that layer and improve the data-processing accuracy of the initial self-coding model. That is, the initial feature extraction sub-model may consist of convolutional layers each followed by an activation layer, then a fully connected layer followed by an activation layer, connected in sequence; and the initial image restoration sub-model may consist of a fully connected layer followed by an activation layer, a conversion layer followed by an activation layer, and then the upsampling layers, each followed by an activation layer, connected in sequence.
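The architecture above can be sketched in PyTorch as follows. This is a minimal illustration, not the patented implementation: the padding values, the ReLU/Sigmoid activations standing in for the activation layers, and the reshape of the 1024-dimensional vector to a 16 × 8 × 8 tensor in the conversion layer are assumptions chosen so that a 256 × 256 × 3 input is restored to the same size.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Initial feature extraction sub-model: 4 conv layers (kernel 4, stride 2) + FC to 512 dims."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),    # 256 -> 128
            nn.Conv2d(32, 48, kernel_size=4, stride=2, padding=1), nn.ReLU(),   # 128 -> 64
            nn.Conv2d(48, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
        )
        self.fc = nn.Sequential(nn.Linear(128 * 16 * 16, 512), nn.ReLU())

    def forward(self, x):
        h = self.conv(x).flatten(1)
        return self.fc(h)                         # feature vector of the display picture frame

class Decoder(nn.Module):
    """Initial image restoration sub-model: FC + conversion (reshape) + 5 upsampling (deconv) layers."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(512, 1024), nn.ReLU())
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(16, 128, kernel_size=4, stride=2, padding=1), nn.ReLU(),  # 8 -> 16
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),  # 16 -> 32
            nn.ConvTranspose2d(64, 48, kernel_size=4, stride=2, padding=1), nn.ReLU(),   # 32 -> 64
            nn.ConvTranspose2d(48, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),   # 64 -> 128
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1), nn.Sigmoid(), # 128 -> 256
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 16, 8, 8)         # conversion layer: vector -> multi-channel 2-D matrix
        return self.deconv(h)                     # predicted picture frame, 256 x 256 x 3
```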
S103, training an initial self-coding model based on the difference between the display picture frame and the corresponding prediction picture frame, and training an initial feature extraction sub-model into a feature extraction model.
In the embodiment of the application, the initial self-coding model can be trained based on the difference between the display picture frame and the corresponding prediction picture frame, so that the initial feature extraction sub-model is trained as the feature extraction model, and the feature extraction model has accurate deep feature extraction capability. Therefore, in the process of training the initial self-coding model based on the display picture frames in the frame set, the initial self-coding model restores the predicted picture frames after feature extraction based on the input display picture frames, and carries out unsupervised training through the difference between the display picture frames and the predicted picture frames, so that the feature extraction model with high extraction precision can be obtained by training without labeling sample labels.
Specifically, the initial self-coding model can be trained by minimizing the difference between the display picture frame and the corresponding predicted picture frame, so that the initial feature extraction sub-model is trained into the feature extraction model; the predicted picture frames restored from its features come closer to the display picture frames, which means the feature extraction model has accurate feature extraction capability. During training of the initial self-coding model, the model can be optimized by backward propagation of gradients; when the difference between a display picture frame and the corresponding predicted picture frame is smaller than a preset value, the initial self-coding model can be considered to meet the quality requirement, model training can be stopped, and the initial feature extraction sub-model has then been trained into the feature extraction model. Backward propagation of gradients means that during training the weights are updated starting from the last layer, and the weights of the preceding layer are updated based on the updated weights of the current layer.
The difference between the display frame and the corresponding prediction frame can be represented by the difference in pixel values of the same pixel position in the display frame and the corresponding prediction frame, that is, after the difference in pixel values of the same pixel position in the display frame and the corresponding prediction frame is determined, the initial self-encoding model can be trained based on the optimization target for minimizing the difference in pixel values. The difference between the display screen frame and the corresponding prediction screen frame may be represented by other differences, such as a luminance difference.
The difference between the pixel values of the same pixel position in the display frame and the corresponding prediction frame can be represented by the pixel value difference of three channels, the three channels are red, green and blue channels, respectively, so that the pixel value difference of each pixel point can be the sum of the pixel value differences of the three channels of the pixel point, and the difference between the display frame and the corresponding prediction frame can be represented as the average value of the pixel value differences of each pixel point, referring to the following formula:
L = \frac{1}{m} \sum_{p=1}^{m} \left( \left| y_{p,r} - y_{p,r1} \right| + \left| y_{p,g} - y_{p,g1} \right| + \left| y_{p,b} - y_{p,b1} \right| \right)
wherein L is the difference between the display picture frame and the corresponding predicted picture frame, m is the number of pixel points in the display picture frame (the corresponding predicted picture frame also has m pixel points), y_{p,r}, y_{p,g} and y_{p,b} are the pixel values of the red, green and blue channels of the p-th pixel point in the display picture frame, y_{p,r1}, y_{p,g1} and y_{p,b1} are the pixel values of the red, green and blue channels of the p-th pixel point in the predicted picture frame, and the p-th pixel point in the display picture frame and the p-th pixel point in the predicted picture frame are at the same pixel position.
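A sketch of the unsupervised training described in S103, using the per-pixel sum of channel differences above as the loss; the Adam optimizer, learning rate, epoch count, and stopping threshold are illustrative assumptions.

```python
import torch

def reconstruction_loss(display, predicted):
    """Mean over pixels of the summed absolute per-channel differences (the formula above).
    display, predicted: tensors of shape (batch, 3, 256, 256)."""
    per_pixel = (display - predicted).abs().sum(dim=1)   # sum of R/G/B differences per pixel
    return per_pixel.mean()

def train(encoder, decoder, frames, epochs=50, threshold=0.01):
    """Unsupervised training on batches of display picture frames; no labels are used."""
    params = list(encoder.parameters()) + list(decoder.parameters())
    optimizer = torch.optim.Adam(params, lr=1e-3)        # assumed optimizer settings
    for _ in range(epochs):
        for batch in frames:                             # batches of display picture frames
            predicted = decoder(encoder(batch))
            loss = reconstruction_loss(batch, predicted)
            optimizer.zero_grad()
            loss.backward()                              # backward propagation of gradients
            optimizer.step()
        if loss.item() < threshold:                      # stop once the difference is small enough
            break
    return encoder                                       # the trained feature extraction model
```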
And S104, acquiring a feature vector corresponding to the display frame according to the feature extraction model, and constructing a probability model based on a fitting normal distribution mode according to the feature vector.
After the initial feature extraction sub-model has been trained into the feature extraction model, it has accurate feature extraction capability: the extracted features can represent a display picture frame while having a much lower feature dimensionality than the frame itself. The feature vectors corresponding to the display picture frames can therefore be obtained with the feature extraction model, and a probability model can be constructed from those feature vectors by fitting a normal distribution. Because abnormal display pictures generally occur with low probability, normally displayed picture frames account for a very high proportion of the display picture frames in the frame set, and in some scenarios the frame set may contain no abnormal picture frame at all, so the normal-distribution law learned by the probability model can, without any sample labeling, describe the feature characteristics of normally displayed picture frames essentially accurately.
The normal distribution is also called the Gaussian distribution. It is the distribution of a random variable characterized by two parameters, the mean μ and the variance σ², where μ is the mean of the normally distributed random variable and σ² is its variance, so a normal distribution can also be written as N(μ, σ²). The probability law of a normally distributed random variable is that values near μ have high probability and values farther from μ have lower probability; the smaller σ is, the more the distribution concentrates near μ, and the larger σ is, the more dispersed it is. The density function of the normal distribution is symmetric about μ, reaches its maximum at μ, tends to 0 at positive and negative infinity, and has inflection points at μ ± σ.
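For reference, the density function characterized above is the standard normal (Gaussian) density determined by μ and σ²:

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(x-\mu)^{2}}{2\sigma^{2}} \right)

so a value x close to μ receives a high density and a value far from μ receives a low one.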
Obtaining the feature vector corresponding to a display picture frame with the feature extraction model can be implemented by inputting the display picture frame into the feature extraction model, which then outputs the feature vector. The feature vector may include a plurality of feature dimensions, each feature dimension corresponding to one sub-feature, and each sub-feature may include a plurality of feature values.
Constructing the probability model from the feature vectors by fitting a normal distribution can specifically be done as follows: after the sub-features corresponding to the plurality of feature dimensions are each normalized, the mean and variance of each feature dimension are computed in the manner of fitting a normal distribution, the sub-probability models corresponding to the feature dimensions are determined from those means and variances, and the probability model is constructed from the sub-probability models of the feature dimensions. The sub-probability model of a feature dimension is configured with the normal distribution determined by the mean and variance of that dimension and can simulate the feature distribution of normal images in that dimension; every value of the distribution corresponds to a probability determined by the density function, which is in turn determined by the mean and variance. The probability model may also be called a density estimation model. It may be a combination of the multiple sub-probability models, in which the probability corresponding to a given feature dimension is determined by the corresponding sub-probability model, or it may be a single model obtained by merging the multiple sub-probability models, in which case the probability corresponding to a feature dimension is determined by the probability model itself.
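A minimal sketch of this construction, assuming the feature matrix has already been normalized and screened as described below. Fitting one Gaussian per feature dimension follows the embodiment, while combining the per-dimension densities by summing their logarithms is an assumption of the sketch, since the embodiment does not fix how the sub-probability models are merged.

```python
import numpy as np
from scipy.stats import norm

def fit_probability_model(features):
    """features: (num_frames, num_dims) matrix of feature vectors from normal display frames.
    Each feature dimension gets one sub-probability model N(mu_d, sigma_d^2)."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0) + 1e-8          # avoid zero variance in any dimension
    return mu, sigma

def log_probability(mu, sigma, target_vector):
    """Combine the per-dimension densities; summing log-densities is an assumption of this sketch."""
    return norm.logpdf(target_vector, loc=mu, scale=sigma).sum()
```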
Before the probability model is constructed from the feature vectors by fitting a normal distribution, the feature vectors can be screened. Specifically, if a feature vector includes a plurality of feature dimensions, the degrees of dispersion of the sub-features of the feature vectors in those feature dimensions are determined first; the feature dimensions whose degree of dispersion satisfies a dispersion condition are taken as target feature dimensions, and the sub-features corresponding to the target feature dimensions are deleted from the feature vectors. The degree of dispersion describes how much the observed values differ from one another: the larger it is, the more the feature values of a sub-feature differ and the more scattered the sub-feature is, making it hard for the sub-feature to reflect a consistent characteristic; ideally each sub-feature should approximately follow a Gaussian distribution, which makes the subsequent construction of the probability model easier. The dispersion condition can be that the degree of dispersion is larger than a preset value, and the degree of dispersion can be expressed by the standard deviation or the variance. By deleting from the feature vectors the sub-features of the target feature dimensions that satisfy the dispersion condition, sub-features of poor reliability are removed, the feature vectors are screened, and the fitting precision is improved.
In a specific implementation, the degree of dispersion can be expressed by the variance and the preset degree of dispersion by a preset variance value. The sub-features of the feature vectors in each feature dimension are normalized and their variance is then computed; when the variance is larger than the preset variance value, the degree of dispersion of that feature dimension is considered large and the dispersion condition is satisfied, so the sub-feature corresponding to that dimension can be deleted from the feature vectors. If normalization and variance statistics have already been performed on every sub-feature during this screening, they do not need to be repeated when the probability model is constructed from the feature vectors; the variances determined during screening can be used directly.
The sub-features of each dimension of the feature vector are feature vectors of a normal sample set, and normalization processing can be performed by the following formula to normalize the sub-features to be within a specific range:
Figure 31166DEST_PATH_IMAGE002
wherein, y i For the ith eigenvalue in the sub-feature of one eigen dimension in the eigenvector, each eigenvalue y i Sub-features, y, constituting the feature dimension min Is the smallest eigenvalue, y, of the sub-features of the feature dimension max Is the largest eigenvalue, y, of the sub-features of the feature dimension i11 For the ith eigenvalue in the normalized sub-feature of the feature dimension, each normalized eigenvalue y i11 And forming the normalized sub-features of the feature dimension, so that each feature value can be normalized to be within the range of 0 to 1, the variance obtained based on the statistics is equivalent to the normalized variance, the variance can be compared with a preset variance value, and if the variance is greater than the preset variance value, the discrete degree of the sub-features of the feature dimension is larger, and the discrete condition is met.
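A minimal sketch of this min-max normalization combined with the variance-based screening of S104, applied to the normal-sample feature matrix; the concrete variance threshold used as the preset variance value is an illustrative assumption.

```python
import numpy as np

def normalize_and_screen(features, var_threshold=0.05):
    """features: (num_frames, num_dims) sub-features of the normal sample set.
    Normalize each feature dimension to [0, 1], then drop dimensions whose
    normalized variance exceeds the threshold (too dispersed to fit well)."""
    y_min = features.min(axis=0)
    y_max = features.max(axis=0)
    normalized = (features - y_min) / (y_max - y_min + 1e-8)   # the formula above
    keep = normalized.var(axis=0) <= var_threshold             # screening by degree of dispersion
    return normalized[:, keep], keep, y_min, y_max
```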
Before the probability model is built according to the feature vectors and based on the mode of fitting normal distribution, other features of the display frame can be determined, so that the probability model is built based on the mode of fitting normal distribution by combining the feature vectors and the other features in the process of building the probability model. Other characteristics may be, for example, a brightness distribution parameter, etc.
Specifically, before the probability model is constructed based on the fitting normal distribution mode according to the feature vector, the display frame can be subjected to grid division to obtain a plurality of picture subframes, and brightness distribution parameters corresponding to the plurality of picture subframes are determined respectively.
In a specific implementation, during the grid division of the display picture frame, the display picture frame may first be converted to gray scale to obtain a gray-scale image, and the gray-scale image is then divided into a grid to obtain the plurality of picture subframes. The luminance mean and variance of each picture subframe can be used as the brightness distribution parameters, since the mean and variance reflect the brightness level and the degree of brightness dispersion of the subframes. The grid of picture subframes may be 3 × 3, giving 9 luminance means and 9 variances.
After the brightness distribution parameters are obtained, the probability model is constructed from the feature vectors and the brightness distribution parameters by fitting a normal distribution as follows: Gaussian blur kernels corresponding to the picture subframes are determined from the brightness distribution parameters, the brightness features of the corresponding picture subframes are convolved with these Gaussian blur kernels to obtain the corresponding convolution features, the feature vector and the convolution features are then concatenated into an image feature, and the probability model is constructed from the image features by fitting a normal distribution. Convolving the brightness features blurs them, reducing the negative effects of slight differences.
During the convolution of the brightness features, the weights applied to the brightness features of each picture subframe decrease gradually from the centre to the boundary, forming a Gaussian distribution, and a convolution kernel with such weights is a Gaussian blur kernel. The brightness features at boundary positions therefore have lower weights, so the resulting convolution features weaken the influence of the grid boundaries and prevent an object whose image moves across the boundary between two grid cells from severely disturbing the feature vector, as illustrated by the sketch below. Concatenating the feature vector and the convolution features stacks them into a sub-feature set with more dimensions: the number of feature dimensions of the image feature is the sum of the number of feature dimensions of the feature vector and of the convolution features, part of the sub-features of the image feature being the sub-features of the feature vector and the rest being the sub-features of the convolution features.
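A sketch of the grid-based luminance features under stated assumptions: the 3 × 3 grid and the use of a per-cell luminance mean and variance follow the embodiment, while the concrete per-cell Gaussian weight map (and its width sigma_scale) is only one way to realize weights that decrease from the centre of a subframe towards its boundary.

```python
import cv2
import numpy as np

def gaussian_weights(h, w, sigma_scale=0.5):
    """Weight map that decreases from the cell centre towards its boundary (Gaussian)."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    d2 = (ys - cy) ** 2 + (xs - cx) ** 2
    sigma2 = (sigma_scale * min(h, w)) ** 2
    w_map = np.exp(-d2 / (2.0 * sigma2))
    return w_map / w_map.sum()

def luminance_features(frame_bgr, grid=3):
    """Grey the frame, split it into grid x grid picture subframes, and return the
    Gaussian-weighted luminance mean and variance of every subframe (convolution features)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    h, w = gray.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            cell = gray[i * h // grid:(i + 1) * h // grid, j * w // grid:(j + 1) * w // grid]
            wmap = gaussian_weights(*cell.shape)
            mean = (cell * wmap).sum()                  # weighted luminance mean
            var = ((cell - mean) ** 2 * wmap).sum()     # weighted luminance variance
            feats.extend([mean, var])
    return np.array(feats)  # concatenated with the feature vector to form the image feature
```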
The probability model is constructed based on a fitting normal distribution mode according to the image characteristics, and specifically, after sub-characteristics corresponding to a plurality of characteristic dimensions in the image characteristics are respectively normalized, the mean value and the variance of each characteristic dimension are counted based on the fitting normal distribution mode, the sub-probability models corresponding to the plurality of characteristic dimensions are determined based on the mean value and the variance, and the probability model is constructed according to the sub-probability models corresponding to the plurality of characteristic dimensions. The sub-probability model corresponding to a certain characteristic dimension is configured with normal distribution determined by the mean and the variance of the characteristic dimension, and each value in the normal distribution corresponds to the probability determined according to the mean and the variance.
Before the probability model is constructed based on the fitting normal distribution mode according to the image features, the image features may be screened. Specifically, the degrees of dispersion respectively corresponding to the sub-features of the image features under the multiple feature dimensions are determined, the feature dimensions whose degree of dispersion meets the dispersion condition are determined as target feature dimensions, and the sub-features corresponding to the target feature dimensions are deleted from the image features. In this way, unreliable sub-features are removed, the feature vector is screened, and the fitting accuracy is improved.
In specific implementation, the degree of dispersion may be represented by a variance and the preset degree of dispersion by a preset variance value. The sub-features of the image features in each feature dimension may be normalized and their variance then computed; when the variance is larger than the preset variance value, the degree of dispersion of that feature dimension is considered large, the dispersion condition is met, and the sub-features corresponding to that feature dimension may be deleted from the image features. If normalization and variance statistics have already been performed on each sub-feature during screening, the variances obtained during screening can be reused directly, so that normalization and variance statistics need not be repeated when constructing the probability model from the image features.
The sub-features of each dimension of the image feature are feature vectors or convolution features of a normal sample set, and the normalization process can be performed by the following formula to normalize the sub-features to be within a specific range:
$$y_{i2} = \frac{y_{i1} - y_{\min 1}}{y_{\max 1} - y_{\min 1}}$$

wherein $y_{i1}$ is the i-th feature value in the sub-feature of one feature dimension of the image features, all such values $y_{i1}$ together constituting the sub-feature of that feature dimension; $y_{\min 1}$ is the smallest feature value in the sub-feature of that feature dimension and $y_{\max 1}$ the largest; and $y_{i2}$ is the i-th feature value of the normalized sub-feature, all normalized values $y_{i2}$ together constituting the normalized sub-feature of that feature dimension. Each feature value is thus normalized into the range 0 to 1, so the variance computed from the normalized values is a normalized variance that can be compared with the preset variance value; if this variance is greater than the preset variance value, the degree of dispersion of the sub-feature of that feature dimension is large and the dispersion condition is met.
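The normalization and variance-based screening described above could be sketched as follows; the threshold value is purely illustrative, since the patent only refers to a preset variance value without fixing it.

import numpy as np

def screen_dimensions(features, var_threshold=0.05):   # features: (num_frames, num_dims)
    f_min = features.min(axis=0)
    f_max = features.max(axis=0)
    # y_i2 = (y_i1 - y_min1) / (y_max1 - y_min1), scaling every dimension into [0, 1].
    normalized = (features - f_min) / (f_max - f_min + 1e-8)
    variances = normalized.var(axis=0)
    # Dimensions whose normalized variance exceeds the preset value meet the
    # dispersion condition and are deleted; the remaining dimensions are kept.
    keep = variances <= var_threshold
    return features[:, keep], keep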
S105, when the frame to be detected of the target application program is obtained, the target feature vector of the frame to be detected is obtained through the feature extraction model, the prediction probability corresponding to the target feature vector is determined through the probability model, and the prediction probability is used for identifying whether the frame to be detected has an abnormal display picture.
In the embodiment of the application, a picture frame to be detected of the target application program may be obtained. The frame to be detected may be a normal display picture or an abnormal display picture, and it need not belong to the same version of the target application program as the display picture frames. The version to which the display picture frames belong is recorded as the first version, and the version to which the frame to be detected belongs is recorded as the second version; that is, the frame to be detected is obtained from the second version of the target application program. The terms first version and second version merely distinguish different versions and do not indicate the order of versions in the development process. The second version may be later than the first version: the first version is an old version for which abnormal display detection has already been completed, and the second version may be a new version, so that detection of the frames to be detected in the second version can be performed based on the display picture frames of the first version. The abnormal display detection of the first version may be performed in the manner of this embodiment or carried out manually, so abnormal display detection does not need to be done manually for every version.
The process of acquiring the frame to be detected of the target application program may follow the process of acquiring the display picture frames: the game map of the target application program is acquired, and the frames to be detected are acquired automatically by traversing the game map. In specific implementation, a vertex set of the geometric polygons forming the game map may be determined, a map traversal path is determined according to the positional relation of the vertices of this set within the game map, a game character is controlled to move along the map traversal path, and the game pictures captured during the movement serve as the frames to be detected, so that enough frames to be detected can be obtained automatically for abnormality identification.
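The patent does not prescribe a particular path-planning strategy, so the following sketch uses a simple nearest-neighbour ordering of the polygon vertices purely for illustration; the function name and the coordinate format are assumptions.

import numpy as np

def traversal_path(vertices):                 # vertices: (num_vertices, 2) map coordinates
    remaining = list(range(len(vertices)))
    path = [remaining.pop(0)]                  # start from an arbitrary vertex
    while remaining:
        last = vertices[path[-1]]
        dists = [np.linalg.norm(vertices[i] - last) for i in remaining]
        path.append(remaining.pop(int(np.argmin(dists))))
    # Waypoints for the game character to move along while frames are captured.
    return [vertices[i] for i in path]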
The frames to be detected may be all of the game pictures captured while the game character moves, or only a part of them: adjacent frames are highly similar, so selecting a subset of pictures does not lose too many characteristics of the game pictures, and detection of the second version of the target application program can be achieved while monitoring as few frames to be detected as possible. For example, one frame may be captured out of every 5 frames of all the game pictures as a frame to be detected, so that when 30 frames are displayed per second, 6 frames to be detected are captured from each second of game pictures.
When the frame to be detected of the target application program is obtained, the target feature vector of the frame to be detected is obtained through the feature extraction model, and the prediction probability corresponding to the target feature vector is determined through the probability model. In this way, the prediction probability used to identify whether the frame contains an abnormal display picture can be obtained accurately through the feature extraction model and the probability model, which greatly saves sample labeling cost. Moreover, because the probability model has learned the overall feature characteristics of normally displayed frames, even if a brand-new abnormal display condition appears in the frame to be detected during prediction, the probability model can still recognize the abnormality based on the learned feature characteristics, which improves the abnormality recognition precision.
The feature extraction model has an accurate deep feature extraction capability, so the target feature vector of the frame to be detected can be obtained through the feature extraction model. The target feature vector represents the frame to be detected while having a much lower dimensionality than the frame itself, and the prediction probability corresponding to the target feature vector can then be determined through the probability model.
Specifically, if the target feature vector includes a plurality of feature dimensions, then in the process of determining the prediction probability, the sub-prediction probabilities corresponding to the respective feature dimensions may be determined through the probability model, and the prediction probability is determined from these sub-prediction probabilities; because the prediction probability is derived from the sub-features of all feature dimensions, the accuracy is high. Each feature dimension corresponds to a sub-probability model, and the sub-prediction probability of each feature dimension can be determined using the function of its corresponding sub-probability model. The sub-prediction probability of a feature dimension is the probability assigned by the normal distribution to the sub-feature of that dimension, i.e., the probability, as indicated by that sub-feature, that the frame to be detected is a normal display picture. The prediction probability is determined from the multiple sub-prediction probabilities and is positively correlated with them: the larger the sub-prediction probabilities, the larger the prediction probability.
In specific implementation, the prediction probability may be the sum of logarithms (log values) of the sub-prediction probabilities, and the manner of determining the prediction probability according to the sub-prediction probabilities may be expressed by the following formula:
$$p(x) = \sum_{n=1}^{N} \log p(x_n)$$

wherein $N$ is the number of feature dimensions, $x_n$ is the sub-feature corresponding to the n-th feature dimension, $p(x_n)$ is the sub-prediction probability corresponding to the n-th feature dimension, and $p(x)$ is the prediction probability of the picture frame to be detected.
After the prediction probability is determined, whether the frame to be detected is an abnormal display picture can be decided according to the prediction probability: if the prediction probability is smaller than a preset probability value, the frame to be detected is determined to be an abnormal display picture; otherwise it is a normal display picture. In this way, the frames to be detected can be acquired automatically and their abnormality detection performed automatically. The probability model captures the feature distribution of normally displayed frames and the prediction probability of a frame to be detected is computed from that distribution, so the abnormality type does not need to be considered, abnormal conditions beyond the game's expectations can be identified automatically, no labeled comparison or negative samples are required, and the method is insensitive to differences between scenes.
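Using the per-dimension Gaussian model sketched earlier, the detection step could be illustrated as follows; the preset probability value used here is an assumption and would in practice be calibrated on normally displayed frames.

import numpy as np

def predict(model, target_feature, threshold=-200.0):
    sub_probs = model.sub_probabilities(target_feature)   # p(x_n) for each feature dimension
    log_prob = float(np.sum(np.log(sub_probs + 1e-12)))   # p(x) = sum over n of log p(x_n)
    is_abnormal = log_prob < threshold                     # below the preset value: abnormal display
    return log_prob, is_abnormal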
In a game scene, the game map may be traversed to obtain display picture frames, the initial self-coding model is trained with these display picture frames to obtain a feature extraction model with a feature extraction capability, the feature extraction model is then used to extract feature vectors from the display picture frames, and after feature screening the probability model may be trained with the feature vectors, completing the training of the feature extraction model and the probability model.
Then a frame to be detected is obtained. If the frame to be detected is an abnormal display picture, for example at least one of an illumination abnormality, a model collision abnormality, or a material rendering abnormality, its target feature vector can be extracted with the feature extraction model. The target feature vector represents the frame to be detected, and its pixel value distribution necessarily reflects the characteristics of the illumination abnormality, model collision abnormality, material rendering abnormality, and so on: for example, uneven illumination in an illumination abnormality, an improper distance between two virtual models in a model clipping (collision) abnormality, or improper material of a virtual model all lead to abnormal pixel value distributions.
The prediction probability corresponding to the target feature vector is then determined with the probability model. The normal distributions configured in the probability model assign higher probabilities to values typical of normal pixel value distributions; if the target feature vector carries an abnormal pixel value distribution caused by such problems, it matches those high-probability values poorly and the corresponding prediction probability is low. Conversely, if the target feature vector has no pixel value distribution abnormality, it matches the high-probability values well and the corresponding prediction probability is high. Whether the frame to be detected is an abnormal display picture can then be determined from the prediction probability, realizing abnormality detection of the picture.
Based on the image recognition method provided in the embodiments of the present application, an embodiment of the present application further provides an image recognition apparatus. Fig. 7 is a structural block diagram of the image recognition apparatus provided in the embodiment of the present application; the image recognition apparatus 1300 may include:
a frame acquiring unit 1301, configured to acquire a frame set of a target application, where the frame set includes a plurality of display picture frames corresponding to the target application;
an input unit 1302, configured to input the display frame into an initial self-coding model, where the initial self-coding model includes an initial feature extraction sub-model and an initial image reduction sub-model, the initial feature extraction sub-model is used to extract an initial feature vector of the display frame, and the initial image reduction sub-model is used to obtain a predicted frame based on the initial feature vector;
a training unit 1303, configured to train the initial self-coding model based on a difference between the display picture frame and the corresponding prediction picture frame, and train the initial feature extraction submodel as a feature extraction model;
a constructing unit 1304, configured to obtain a feature vector corresponding to the display frame according to the feature extraction model, and construct a probability model based on a fitting normal distribution manner according to the feature vector;
the detecting unit 1305 is configured to, when a to-be-detected picture frame of the target application program is obtained, obtain a target feature vector of the to-be-detected picture frame through the feature extraction model, and determine, through the probability model, a prediction probability corresponding to the target feature vector, where the prediction probability is used to identify whether the to-be-detected picture frame has an abnormal display picture.
Optionally, the apparatus further comprises:
the dividing unit is used for carrying out grid division on the display picture frame to obtain a plurality of picture subframes;
a parameter determining unit, configured to determine brightness distribution parameters corresponding to the plurality of picture subframes respectively;
the building unit 1304 includes:
the characteristic acquisition unit is used for acquiring a characteristic vector corresponding to the display frame according to the characteristic extraction model;
and the construction subunit is used for constructing a probability model based on a fitting normal distribution mode according to the feature vector and the brightness distribution parameter.
Optionally, the dividing unit includes:
the gray processing unit is used for carrying out gray processing on the display picture frame to obtain a gray image corresponding to the display picture frame;
the dividing subunit is used for carrying out grid division on the gray level image to obtain a plurality of picture subframes;
the parameter determining unit is specifically configured to:
and determining the brightness mean value and the variance corresponding to the plurality of picture subframes respectively as the brightness distribution parameters.
Optionally, the building subunit includes:
the kernel determining unit is used for determining Gaussian blur kernels corresponding to the plurality of picture subframes according to the brightness distribution parameters;
the convolution unit is used for carrying out convolution processing on the brightness characteristic of the corresponding picture subframe by utilizing the Gaussian blur kernel to obtain a corresponding convolution characteristic;
the splicing unit is used for splicing the feature vector and the convolution feature to obtain an image feature;
and the model building unit is used for building the probability model based on a fitting normal distribution mode according to the image characteristics.
Optionally, the feature vector includes a plurality of feature dimensions, and the apparatus further includes:
the discrete degree determining unit is used for determining discrete degrees corresponding to the sub-features of the feature vectors under the multiple feature dimensions respectively before the probability model is constructed based on a fitting normal distribution mode according to the feature vectors;
the screening unit is used for determining that the dispersion degree in the plurality of feature dimensions meets the dispersion condition as a target feature dimension;
and the characteristic deleting unit is used for deleting the sub-characteristics corresponding to the target characteristic dimension from the characteristic vector.
Optionally, when the target application is a game application, the frame obtaining unit 1301 includes:
a game map acquisition unit for acquiring a game map of the target application;
a vertex set determination unit for determining a set of vertices of geometric polygons constituting the game map;
the traversal path determining unit is used for determining a map traversal path according to the position relation of the vertexes in the vertex set in the game map;
and the frame acquisition subunit is used for controlling the game role to move along the map traversal path and acquiring the game picture in the moving process as the display picture frame.
Optionally, the training unit 1303 includes:
a difference determining unit, configured to determine a difference between pixel values of a same pixel position in the display frame and the corresponding prediction frame;
a training subunit, configured to train the initial self-coding model based on an optimization objective of minimizing the difference of the pixel values.
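As an illustrative sketch only (PyTorch is assumed; the patent does not name a framework, and the self-coding model definition is outside this snippet), one training step driven by the pixel-value difference between the display picture frame and the predicted picture frame could look as follows.

import torch.nn.functional as F

def train_step(autoencoder, optimizer, display_frames):   # display_frames: (B, C, H, W) tensor
    optimizer.zero_grad()
    predicted_frames = autoencoder(display_frames)         # encode, then restore the frame
    loss = F.mse_loss(predicted_frames, display_frames)    # per-pixel value difference
    loss.backward()                                        # minimize the difference
    optimizer.step()
    return loss.item()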
Optionally, the detecting unit 1305 includes an extracting unit and a probability predicting unit;
the extraction unit is used for acquiring a target feature vector of the picture frame to be detected through the feature extraction model when the picture frame to be detected of the target application program is acquired;
the probability prediction unit comprises:
a sub-prediction probability determining unit, configured to determine, through the probability model, sub-prediction probabilities respectively corresponding to the multiple feature dimensions;
and the prediction subunit is used for determining the prediction probability according to the sub-prediction probabilities respectively corresponding to the plurality of characteristic dimensions.
Optionally, the display frame is obtained based on a first version of the target application program, the frame to be detected is obtained based on a second version of the target application program, the first version is a version in which abnormal display detection is completed, and the second version is later than the first version.
Optionally, when the target application program is a game application program, the abnormal display screen includes at least one of an illumination abnormal screen, a model collision abnormal screen, or a material rendering abnormal screen.
Therefore, before abnormal display recognition is carried out on the frame to be detected of the target application program, the frame set of the target application program can be obtained and the initial self-coding model trained on the display picture frames in that set. Because the self-coding model restores a predicted picture frame from the input display picture frame after feature extraction and is trained unsupervised on the difference between the display picture frame and the predicted picture frame, a feature extraction model with high extraction precision can be obtained without labeling samples. A probability model is then constructed, based on the fitting normal distribution mode, from the feature vectors of the display picture frames obtained with the feature extraction model. Since abnormal display pictures are generally rare, the proportion of normally displayed frames among the display picture frames in the frame set is very high, and in some scenes the frame set may contain no abnormal frames at all, so the normal distribution law learned by the probability model can describe the feature characteristics of normally displayed frames essentially accurately without any sample labeling. Consequently, when the frame to be detected of the target application program is obtained, the target feature vector of the frame to be detected is obtained through the feature extraction model and the prediction probability corresponding to the target feature vector is determined through the probability model, so that the prediction probability used to identify whether the frame contains an abnormal display picture can be obtained accurately through the feature extraction model and the probability model, greatly saving sample labeling cost. Moreover, because the probability model has learned the overall feature characteristics of normally displayed frames, even if a brand-new abnormal display condition appears in the frame to be detected during prediction, the probability model can recognize the abnormality based on the learned feature characteristics, improving the abnormality recognition precision.
An embodiment of the present application further provides a computer device, where the computer device is the computer device described above, and may include a terminal device or a server, and the image recognition apparatus described above may be configured in the computer device. The computer apparatus is described below with reference to the accompanying drawings.
If the computer device is a terminal device, please refer to fig. 8, an embodiment of the present application provides a terminal device, taking the terminal device as a mobile phone as an example:
fig. 8 is a block diagram illustrating a partial structure of a mobile phone related to a terminal device provided in an embodiment of the present application. Referring to fig. 8, the handset includes: radio Frequency (RF) circuitry 1410, memory 1420, input unit 1430, display unit 1440, sensor 1450, audio circuitry 1460, wireless fidelity (WiFi) module 1470, processor 1480, and power supply 1490. Those skilled in the art will appreciate that the handset configuration shown in fig. 8 is not intended to be limiting and may include more or fewer components than shown, or some components may be combined, or a different arrangement of components.
The following describes each component of the mobile phone in detail with reference to fig. 8:
The RF circuit 1410 may be used for receiving and transmitting signals during information transmission or a call; in particular, received downlink information of a base station is delivered to the processor 1480 for processing, and uplink data is transmitted to the base station.
The memory 1420 may be used to store software programs and modules, and the processor 1480 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 1420. The memory 1420 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the mobile phone (such as audio data, a phonebook, etc.), and the like. Further, the memory 1420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The input unit 1430 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the cellular phone. In particular, the input unit 1430 may include a touch panel 1431 and other input devices 1432.
The display unit 1440 may be used to display information input by or provided to the user and various menus of the cellular phone. The display unit 1440 may include a display panel 1441.
The handset may also include at least one sensor 1450, such as light sensors, motion sensors, and other sensors.
The audio circuitry 1460, speaker 1461, and microphone 1462 may provide an audio interface between the user and the mobile phone.
WiFi belongs to short-distance wireless transmission technology, and the mobile phone can help a user to receive and send e-mails, browse webpages, access streaming media and the like through a WiFi module 1470, and provides wireless broadband internet access for the user.
The processor 1480 is the control center of the mobile phone, connects the various parts of the entire mobile phone by various interfaces and lines, and performs various functions of the mobile phone and processes data by operating or executing software programs and/or modules stored in the memory 1420 and calling data stored in the memory 1420.
The handset also includes a power supply 1490 (such as a battery) that powers the various components.
In this embodiment, the processor 1480 included in the terminal device further has the following functions:
acquiring a frame set of a target application program, wherein the frame set comprises a plurality of display picture frames corresponding to the target application program;
inputting the display frame into an initial self-coding model, wherein the initial self-coding model comprises an initial feature extraction sub-model and an initial image reduction sub-model, the initial feature extraction sub-model is used for extracting an initial feature vector of the display frame, and the initial image reduction sub-model is used for obtaining a prediction frame based on the initial feature vector reduction;
training the initial self-coding model based on the difference between the display picture frame and the corresponding prediction picture frame, and training the initial feature extraction sub-model as a feature extraction model;
acquiring a feature vector corresponding to the display frame according to the feature extraction model, and constructing a probability model based on a fitting normal distribution mode according to the feature vector;
when a to-be-detected picture frame of the target application program is obtained, a target feature vector of the to-be-detected picture frame is obtained through the feature extraction model, and a prediction probability corresponding to the target feature vector is determined through the probability model, wherein the prediction probability is used for identifying whether the to-be-detected picture frame has an abnormal display picture.
If the computer device is a server, the embodiment of the present application further provides a server. Please refer to fig. 9, which is a structural diagram of the server 1500 provided in the embodiment of the present application. The server 1500 may vary considerably in configuration or performance and may include one or more processors 1522, such as central processing units (CPUs), a memory 1532, and one or more storage media 1530 (e.g., one or more mass storage devices) storing applications 1542 or data 1544. The memory 1532 and the storage medium 1530 may be transient or persistent storage. The program stored on the storage medium 1530 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the processor 1522 may be configured to communicate with the storage medium 1530 to execute the series of instruction operations in the storage medium 1530 on the server 1500.
Server 1500 may also include one or more power supplies 1526, one or more wired or wireless network interfaces 1550, one or more input-output interfaces 1558, and/or one or more operating systems 1541, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The steps performed by the server in the above embodiments may be based on the server structure shown in fig. 9.
In addition, a storage medium is provided in an embodiment of the present application, and the storage medium is used for storing a computer program, and the computer program is used for executing the method provided in the embodiment.
The embodiment of the present application also provides a computer program product including instructions, which when run on a computer, causes the computer to execute the method provided by the above embodiment.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium may be at least one of the following media: various media that can store program codes, such as a Read-only Memory (ROM), a RAM, a magnetic disk, or an optical disk.
It should be noted that, in this specification, each embodiment is described in a progressive manner, and the same and similar parts between the embodiments are referred to each other, and each embodiment focuses on differences from other embodiments. In particular, the apparatus and system embodiments, because they are substantially similar to the method embodiments, are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described embodiments of the apparatus and system are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement without inventive effort.
The above description is only one specific embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered by the scope of the present application. Moreover, the present application may be further combined to provide more implementation manners on the basis of the implementation manners provided by the above aspects. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (15)

1. An image recognition method, characterized in that the method comprises:
acquiring a frame set of a target application program, wherein the frame set comprises a plurality of display picture frames corresponding to the target application program;
inputting the display frame into an initial self-coding model, wherein the initial self-coding model comprises an initial feature extraction sub-model and an initial image reduction sub-model, the initial feature extraction sub-model is used for extracting an initial feature vector of the display frame, and the initial image reduction sub-model is used for obtaining a prediction frame based on the initial feature vector reduction;
training the initial self-coding model based on the difference between the display picture frame and the corresponding prediction picture frame, and training the initial feature extraction sub-model as a feature extraction model;
acquiring a feature vector corresponding to the display frame according to the feature extraction model, and constructing a probability model based on a fitting normal distribution mode according to the feature vector;
when a to-be-detected picture frame of the target application program is obtained, a target feature vector of the to-be-detected picture frame is obtained through the feature extraction model, and a prediction probability corresponding to the target feature vector is determined through the probability model, wherein the prediction probability is used for identifying whether the to-be-detected picture frame has an abnormal display picture.
2. The method of claim 1, further comprising:
carrying out grid division on the display picture frame to obtain a plurality of picture subframes;
determining brightness distribution parameters corresponding to the plurality of picture subframes respectively;
the method for constructing a probability model based on fitting normal distribution according to the feature vector comprises the following steps:
and constructing a probability model based on a fitting normal distribution mode according to the feature vector and the brightness distribution parameter.
3. The method of claim 2, wherein said gridding said display picture frame into a plurality of picture sub-frames comprises:
carrying out gray level processing on the display picture frame to obtain a gray level image corresponding to the display picture frame;
carrying out grid division on the gray level image to obtain a plurality of picture subframes;
the determining the brightness distribution parameters corresponding to the plurality of picture subframes respectively includes:
and determining the brightness mean value and the variance corresponding to the plurality of picture subframes respectively as the brightness distribution parameters.
4. The method according to claim 3, wherein the constructing a probability model based on fitting a normal distribution according to the feature vector and the brightness distribution parameter comprises:
determining Gaussian blur kernels respectively corresponding to the plurality of picture subframes according to the brightness distribution parameters;
performing convolution processing on the brightness characteristic of the corresponding picture subframe by using the Gaussian blur kernel to obtain a corresponding convolution characteristic;
splicing the feature vector and the convolution feature to obtain an image feature;
and constructing the probability model based on a fitting normal distribution mode according to the image characteristics.
5. The method according to any one of claims 1-4, wherein the feature vector comprises a plurality of feature dimensions, and before the constructing the probabilistic model based on fitting a normal distribution according to the feature vector, the method further comprises:
determining the discrete degrees respectively corresponding to the sub-features of the feature vector under the multiple feature dimensions;
determining that the dispersion degree in the plurality of feature dimensions meets a dispersion condition as a target feature dimension;
and deleting the sub-features corresponding to the target feature dimension from the feature vector.
6. The method of any of claims 1-4, wherein when the target application is a game application, the obtaining a frame set of the target application comprises:
acquiring a game map of the target application program;
determining a set of vertices of geometric polygons that make up the game map;
determining a map traversal path according to the position relation of the vertexes in the vertex set in the game map;
and controlling the game role to move along the map traversal path, and acquiring the game picture in the moving process as the display picture frame.
7. The method according to any of claims 1-4, wherein the training of the initial self-coding model based on the difference between the display picture frame and the corresponding prediction picture frame comprises:
determining the difference of pixel values of the same pixel position in the display picture frame and the corresponding prediction picture frame;
training the initial self-encoding model based on an optimization objective that minimizes the pixel value difference.
8. The method according to any one of claims 1-4, wherein the target feature vector comprises a plurality of feature dimensions, and the determining, by the probability model, the prediction probability corresponding to the target feature vector comprises:
determining sub-prediction probabilities respectively corresponding to the plurality of feature dimensions through the probability model;
and determining the prediction probability according to the sub-prediction probabilities respectively corresponding to the plurality of characteristic dimensions.
9. The method according to any one of claims 1 to 4, wherein the display frame is obtained based on a first version of the target application, and the frame to be detected is obtained based on a second version of the target application, wherein the first version is a version in which abnormal display detection is completed, and the second version is later than the first version.
10. The method of any of claims 1-4, wherein when the target application is a gaming application, the exception display comprises at least one of a lighting exception, a model collision exception, or a material rendering exception.
11. An image recognition apparatus, characterized in that the apparatus comprises:
the frame acquisition unit is used for acquiring a frame set of a target application program, and the frame set comprises a plurality of display picture frames corresponding to the target application program;
the input unit is used for inputting the display picture frame into an initial self-coding model, the initial self-coding model comprises an initial feature extraction sub-model and an initial image restoration sub-model, the initial feature extraction sub-model is used for extracting an initial feature vector of the display picture frame, and the initial image restoration sub-model is used for restoring based on the initial feature vector to obtain a prediction picture frame;
a training unit for training the initial self-coding model based on the difference between the display picture frame and the corresponding prediction picture frame, and training the initial feature extraction sub-model as a feature extraction model;
the construction unit is used for acquiring a feature vector corresponding to the display frame according to the feature extraction model and constructing a probability model based on a fitting normal distribution mode according to the feature vector;
and the detection unit is used for acquiring a target feature vector of the picture frame to be detected through the feature extraction model when the picture frame to be detected of the target application program is acquired, and determining a prediction probability corresponding to the target feature vector through the probability model, wherein the prediction probability is used for identifying whether the picture frame to be detected has an abnormal display picture.
12. The apparatus of claim 11, further comprising:
the dividing unit is used for carrying out grid division on the display picture frame to obtain a plurality of picture subframes;
a parameter determining unit, configured to determine brightness distribution parameters corresponding to the plurality of picture subframes respectively;
the construction unit includes:
the characteristic acquisition unit is used for acquiring a characteristic vector corresponding to the display picture frame according to the characteristic extraction model;
and the construction subunit is used for constructing a probability model based on a fitting normal distribution mode according to the feature vector and the brightness distribution parameter.
13. A computer device, the computer device comprising a processor and a memory:
the memory is used for storing a computer program and transmitting the computer program to the processor;
the processor is adapted to perform the method of any of claims 1-10 according to instructions in the computer program.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium is used to store a computer program for performing the method of any one of claims 1-10.
15. A computer program product comprising instructions that, when run on a computer device, cause the computer device to perform the method of any one of claims 1-10.
CN202211567344.1A 2022-12-07 2022-12-07 Image recognition method and related device Active CN115661585B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211567344.1A CN115661585B (en) 2022-12-07 2022-12-07 Image recognition method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211567344.1A CN115661585B (en) 2022-12-07 2022-12-07 Image recognition method and related device

Publications (2)

Publication Number Publication Date
CN115661585A true CN115661585A (en) 2023-01-31
CN115661585B CN115661585B (en) 2023-03-10

Family

ID=85020277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211567344.1A Active CN115661585B (en) 2022-12-07 2022-12-07 Image recognition method and related device

Country Status (1)

Country Link
CN (1) CN115661585B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6064958A (en) * 1996-09-20 2000-05-16 Nippon Telegraph And Telephone Corporation Pattern recognition scheme using probabilistic models based on mixtures distribution of discrete distribution
CN111598169A (en) * 2020-05-18 2020-08-28 腾讯科技(深圳)有限公司 Model training method, game testing method, simulation operation method and simulation operation device
CN111683256A (en) * 2020-08-11 2020-09-18 蔻斯科技(上海)有限公司 Video frame prediction method, video frame prediction device, computer equipment and storage medium
CN112206541A (en) * 2020-10-27 2021-01-12 网易(杭州)网络有限公司 Game plug-in identification method and device, storage medium and computer equipment
CN113238972A (en) * 2021-07-12 2021-08-10 腾讯科技(深圳)有限公司 Image detection method, device, equipment and storage medium
CN113962274A (en) * 2021-11-18 2022-01-21 腾讯科技(深圳)有限公司 Abnormity identification method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN115661585B (en) 2023-03-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant