CN113516146A - Data classification method, computer and readable storage medium - Google Patents

Data classification method, computer and readable storage medium

Info

Publication number
CN113516146A
Authority
CN
China
Prior art keywords
image
sample
features
detection
data classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011520177.6A
Other languages
Chinese (zh)
Inventor
吴涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202011520177.6A priority Critical patent/CN113516146A/en
Publication of CN113516146A publication Critical patent/CN113516146A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The embodiment of the application discloses a data classification method, a computer and a readable storage medium, which relate to the field of artificial intelligence. The method includes: acquiring a visual identifier corresponding to an original image, identifying a detection position coordinate used for extracting an attention object in the original image, acquiring a target detection image containing the attention object from the original image based on the detection position coordinate, and determining detection position information corresponding to the original image according to the visual identifier and the detection position coordinate, where the visual identifier is used for representing the visual angle of the attention object in the article corresponding to the original image; acquiring image features of the target detection image, and performing scale transformation on the detection position information to obtain position features of the target detection image; and performing feature splicing on the image features and the position features to obtain fusion features, and classifying the fusion features to obtain the image category to which the target detection image belongs. By means of the method and the device, the accuracy of data identification and classification can be improved.

Description

Data classification method, computer and readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data classification method, a computer, and a readable storage medium.
Background
With the arrival of the information and intelligent society, industrial production is gradually moving toward intelligent manufacturing, which greatly improves productivity. However, as manual labor is freed on a large scale, how to improve product quality and yield, and thereby reduce raw material consumption and labor cost, has become a core problem for factories undergoing digital and intelligent transformation. The detection of products or parts is therefore very important, and defect detection has become an industrially important application. In the field of industrial defect detection, manual inspection is generally used, but manual inspection is costly and easily affected by the proficiency of workers, so detection accuracy and efficiency vary greatly. Alternatively, industrial defects are identified by first detecting (or segmenting) and then performing secondary classification, that is, industrial defects are identified directly by a target detection model or the like, and the accuracy of the identification result is low.
Disclosure of Invention
The embodiment of the application provides a data classification method, a computer and a readable storage medium, which can improve the accuracy of data identification and classification.
One aspect of the embodiments of the present application provides a data classification method, including:
acquiring a visual identifier corresponding to an original image, identifying a detection position coordinate used for extracting an attention object in the original image, acquiring a target detection image containing the attention object from the original image based on the detection position coordinate, and determining detection position information corresponding to the original image according to the visual identifier and the detection position coordinate; the visual identification is used for representing the visual angle of the attention object in the article corresponding to the original image;
acquiring image characteristics of a target detection image, and carrying out scale transformation on detection position information to obtain position characteristics of the target detection image;
and performing feature splicing on the image features and the position features to obtain fusion features, determining key features in the image features based on the position features, and classifying the fusion features based on the key features to obtain the image category to which the target detection image belongs.
The method for identifying the detection position coordinates used for extracting the attention object in the original image and acquiring the target detection image containing the attention object from the original image based on the detection position coordinates comprises the following steps:
acquiring an original image, carrying out object detection on the original image, acquiring a first prediction frame containing an attention object, and determining a detection position coordinate according to first position information of the first prediction frame;
and determining a second prediction frame according to the detection position coordinates, and determining the area indicated by the second prediction frame in the original image as a target detection image.
Wherein, according to the first position information of the first prediction frame, determining the detection position coordinate comprises:
determining the frame width and the frame height of the first prediction frame according to the first position information of the first prediction frame;
and adjusting the size of the first prediction frame based on the frame width and the frame height, and determining the detection position coordinate according to the adjusted first prediction frame.
Wherein performing size adjustment on the first prediction frame based on the frame width and the frame height, and determining the detection position coordinate according to the adjusted first prediction frame, includes:
according to the difference between the frame width and the frame height and the first position information, carrying out size adjustment on the first prediction frame, and determining second position information and the adjusted width and the adjusted height of the adjusted first prediction frame;
and acquiring a frame expansion coefficient, carrying out size transformation on the adjustment width and the adjustment height based on the frame expansion coefficient, and determining a detection position coordinate according to the second position information and the transformed adjustment width and adjustment height.
The method for acquiring the image characteristics of the target detection image comprises the following steps:
and inputting the target detection image into a convolutional neural network in the data classification model, and extracting the features of the target detection image based on a convolutional layer in the convolutional neural network to obtain the image features of the target detection image.
The method for carrying out scale transformation on the detection position information to obtain the position characteristics of the target detection image comprises the following steps:
acquiring the image width and the image height of a target detection image, and normalizing the detection position coordinate based on the image width and the image height to obtain a normalized position coordinate;
acquiring the total number of visual angles, and performing normalization processing on the visual identifier based on the total number of visual angles to obtain a normalized visual identifier;
and generating a perception input feature according to the normalized position coordinate and the normalized visual identification, and performing scale transformation on the perception input feature by adopting a multilayer perceptron in the data classification model to obtain the position feature of the target detection image.
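As an illustration of this step, a minimal sketch follows, assuming PyTorch, a 5-dimensional perception input feature (the normalized visual identifier followed by the four normalized coordinates), and hidden/output widths of 32/64; the layer widths of the multilayer perceptron are not specified by the application and are assumptions here.

```python
import torch
import torch.nn as nn

class PositionBranch(nn.Module):
    """Hypothetical multilayer perceptron that maps the normalized
    (visual identifier, box coordinates) vector to a position feature."""
    def __init__(self, out_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(5, 32), nn.ReLU(),
            nn.Linear(32, out_dim), nn.ReLU(),
        )

    def forward(self, view_id, box, img_w, img_h, num_views):
        xmin, ymin, xmax, ymax = box
        # Normalize the coordinates by the image size and the visual
        # identifier by the total number of visual angles.
        perception_input = torch.tensor([
            view_id / num_views,
            xmin / img_w, ymin / img_h,
            xmax / img_w, ymax / img_h,
        ], dtype=torch.float32)
        return self.mlp(perception_input)  # position feature of the target detection image
```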
Wherein performing feature splicing on the image features and the position features to obtain the fusion features includes:
normalizing the image characteristics to obtain normalized image characteristics, and normalizing the position characteristics to obtain normalized position characteristics;
and carrying out feature splicing on the normalized image features and the normalized position features to obtain fusion features.
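A possible sketch of this splicing step is shown below; the claim only states "normalization", so the use of L2 normalization here is an assumption.

```python
import torch
import torch.nn.functional as F

def fuse_features(image_feat: torch.Tensor, pos_feat: torch.Tensor) -> torch.Tensor:
    """Normalize both feature vectors, then splice them into one fusion feature."""
    image_feat = F.normalize(image_feat, dim=-1)  # normalized image features
    pos_feat = F.normalize(pos_feat, dim=-1)      # normalized position features
    return torch.cat([image_feat, pos_feat], dim=-1)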
The method for classifying the fusion features based on the key features to obtain the image category to which the target detection image belongs includes:
obtaining a scale factor in a classifier of the data classification model, and carrying out scale transformation on the fusion characteristics based on the scale factor; the scale factor is obtained by training a classifier in the data classification model;
and performing key identification on the features corresponding to the key features in the fusion features after the scale transformation based on the classifier, and performing classification processing on the fusion features after the scale transformation based on the key identification result to obtain the image category to which the target detection image belongs.
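A classifier head of this kind could be sketched as follows; the single scalar scale factor and the softmax output are assumptions, since the application does not fix the concrete form of the classifier.

```python
import torch
import torch.nn as nn

class ScaledClassifier(nn.Module):
    """Hypothetical classifier: a trainable scale factor rescales the fusion
    feature before a fully connected layer produces the class probabilities."""
    def __init__(self, fused_dim, num_classes):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(1))   # learned while training the classifier
        self.fc = nn.Linear(fused_dim, num_classes)

    def forward(self, fused_feat):
        logits = self.fc(self.scale * fused_feat)
        return torch.softmax(logits, dim=-1)       # probability value per predicted label
```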
The method for classifying the fusion features based on the key features to obtain the image category to which the target detection image belongs includes:
classifying the fusion features based on the key features to obtain at least two prediction labels and a prediction probability value of each prediction label;
and determining the prediction label with the maximum prediction probability value as the image category to which the target detection image belongs.
Wherein, the method also comprises:
if the image category belongs to the image abnormal category, acquiring a communication mode associated with the image category;
sending an article abnormal message to the terminal equipment based on the communication mode so that the terminal equipment detects the article corresponding to the original image based on the article abnormal message; the item exception message includes an image category.
One aspect of the embodiments of the present application provides a data classification method, including:
acquiring a detection image sample, a sample visual identification of the detection image sample, a detection sample position coordinate and a target sample label, and determining detection sample position information of the detection image sample according to the sample visual identification and the detection sample position coordinate; the sample visual identification is used for representing the visual angle of the detection image sample in the article corresponding to the detection image sample;
adopting an initial convolutional neural network in an initial data classification model to obtain sample image characteristics of a detection image sample, and adopting an initial multilayer perceptron in the initial data classification model to perform scale transformation on detection sample position information to obtain sample position characteristics of the detection image sample;
performing feature splicing on the sample image features and the sample position features to obtain sample fusion features, determining sample key features of the sample image features based on the sample position features, and performing classification processing on the sample fusion features based on the sample key features to obtain image sample categories to which the detected image samples belong;
and training the initial data classification model based on the loss function between the image sample class and the target sample label to obtain the data classification model.
Wherein classifying the sample fusion features based on the sample key features to obtain the image sample category to which the detected image sample belongs includes:
classifying the sample fusion characteristics based on the sample key characteristics to obtain at least two sample labels and a sample prediction probability value of each sample label; the at least two sample tags comprise a target sample tag;
and determining the sample label with the maximum sample prediction probability value as the image sample class to which the detected image sample belongs.
Wherein the loss function comprises a first loss function and a second loss function;
training the initial data classification model based on the loss function between the image sample class and the target sample label to obtain a data classification model, which comprises the following steps:
generating a first loss function according to the image sample category and the target sample label;
generating a label distribution function according to a target sample label, generating a prediction distribution function according to at least two sample labels and the sample prediction probability value of each sample label, and generating a second loss function according to the label distribution function and the prediction distribution function;
and training the initial data classification model according to the first loss function and the second loss function to obtain the data classification model.
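A sketch of this two-part objective follows; treating the first loss as a cross-entropy and the second loss as a KL divergence between a one-hot label distribution and the prediction distribution are assumptions, and kl_weight is a hypothetical balancing coefficient.

```python
import torch
import torch.nn.functional as F

def training_loss(pred_probs, target, num_classes, kl_weight=1.0):
    """pred_probs: (batch, num_classes) sample prediction probability values;
    target: (batch,) target sample labels."""
    log_probs = torch.log(pred_probs + 1e-12)
    # First loss: cross-entropy between the image sample category and the target label.
    first_loss = F.nll_loss(log_probs, target)
    # Second loss: divergence between the label distribution and the prediction distribution.
    label_dist = F.one_hot(target, num_classes).float()
    second_loss = F.kl_div(log_probs, label_dist, reduction="batchmean")
    return first_loss + kl_weight * second_loss
```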
An aspect of an embodiment of the present application provides a data classification apparatus, including:
the input acquisition module is used for acquiring a visual identifier corresponding to the original image, identifying a detection position coordinate used for extracting the attention object in the original image, acquiring a target detection image containing the attention object from the original image based on the detection position coordinate, and determining detection position information corresponding to the original image according to the visual identifier and the detection position coordinate; the visual identification is used for representing the visual angle of the attention object in the article corresponding to the original image;
the characteristic acquisition module is used for acquiring the image characteristics of the target detection image and carrying out scale transformation on the detection position information to obtain the position characteristics of the target detection image;
and the data classification module is used for performing feature splicing on the image features and the position features to obtain fusion features, determining key features in the image features based on the position features, and performing classification processing on the fusion features based on the key features to obtain the image category to which the target detection image belongs.
Wherein, this input acquisition module includes:
the position determining unit is used for acquiring an original image, performing object detection on the original image, acquiring a first prediction frame containing an attention object, and determining a detection position coordinate according to first position information of the first prediction frame;
and the image determining unit is used for determining a second prediction frame according to the detection position coordinates, and determining the area indicated by the second prediction frame in the original image as the target detection image.
Wherein, in determining the detection position coordinates based on the first position information of the first prediction frame, the position determination unit includes:
the position obtaining subunit is configured to determine, according to first position information of the first predicted frame, a frame width and a frame height of the first predicted frame;
and the size adjusting subunit is used for adjusting the size of the first predicted frame based on the frame width and the frame height, and determining the detection position coordinate according to the adjusted first predicted frame.
Wherein, this size adjustment subunit includes:
the position adjusting subunit is used for adjusting the size of the first predicted frame according to the difference between the frame width and the frame height and the first position information, and determining the second position information and the adjusted width and the adjusted height of the adjusted first predicted frame;
and the size conversion subunit is used for acquiring the frame expansion coefficient, performing size conversion on the adjustment width and the adjustment height based on the frame expansion coefficient, and determining the detection position coordinate according to the second position information and the converted adjustment width and adjustment height.
In the aspect of obtaining image features of a target detection image, the feature obtaining module is specifically configured to:
and inputting the target detection image into a convolutional neural network in the data classification model, and extracting the features of the target detection image based on a convolutional layer in the convolutional neural network to obtain the image features of the target detection image.
In the aspect of carrying out scale transformation on the detection position information to obtain the position characteristics of the target detection image, the characteristic acquisition module comprises:
the position normalization unit is used for acquiring the image width and the image height of the target detection image and normalizing the detection position coordinate based on the image width and the image height to obtain a normalized position coordinate;
the visual normalization unit is used for acquiring the total number of visual angles and normalizing the visual identifier based on the total number of visual angles to obtain a normalized visual identifier;
and the position characteristic acquisition unit is used for generating perception input characteristics according to the normalized position coordinates and the normalized visual identification, and carrying out scale transformation on the perception input characteristics by adopting a multilayer perceptron in the data classification model to obtain the position characteristics of the target detection image.
Wherein, in the aspect of carrying out feature splicing on image features and position features to obtain fusion features, the data classification module comprises:
the characteristic normalization unit is used for normalizing the image characteristics to obtain normalized image characteristics and normalizing the position characteristics to obtain normalized position characteristics;
and the characteristic splicing unit is used for carrying out characteristic splicing on the normalized image characteristic and the normalized position characteristic to obtain a fusion characteristic.
In the aspect of classifying the fusion features based on the key features to obtain the image category to which the target detection image belongs, the data classification module includes:
the factor processing unit is used for acquiring scale factors in a classifier of the data classification model and carrying out scale transformation on the fusion characteristics based on the scale factors; the scale factor is obtained by training a classifier in the data classification model;
and the category determining unit is used for carrying out key identification on the features corresponding to the key features in the fusion features after the scale transformation based on the classifier, and carrying out classification processing on the fusion features after the scale transformation based on the key identification result to obtain the image category to which the target detection image belongs.
In the aspect of classifying the fusion features based on the key features to obtain the image category to which the target detection image belongs, the data classification module includes:
the result obtaining unit is used for classifying the fusion features based on the key features to obtain at least two prediction labels and the prediction probability value of each prediction label;
the class determination unit is configured to determine the prediction label having the largest prediction probability value as an image class to which the target detection image belongs.
Wherein, the apparatus further includes:
the communication acquisition module is used for acquiring a communication mode associated with the image category if the image category belongs to the image abnormal category;
the message sending module is used for sending an article abnormal message to the terminal equipment based on the communication mode so that the terminal equipment can detect the article corresponding to the original image based on the article abnormal message; the item exception message includes an image category.
An aspect of an embodiment of the present application provides a data classification apparatus, including:
the sample acquisition module is used for acquiring a detection image sample, a sample visual identification of the detection image sample, a detection sample position coordinate and a target sample label, and determining the detection sample position information of the detection image sample according to the sample visual identification and the detection sample position coordinate; the sample visual identification is used for representing the visual angle of the detection image sample in the article corresponding to the detection image sample;
the sample characteristic acquisition module is used for acquiring sample image characteristics of the detected image sample by adopting an initial convolutional neural network in an initial data classification model, and carrying out scale transformation on the position information of the detected sample by adopting an initial multilayer perceptron in the initial data classification model to obtain the sample position characteristics of the detected image sample;
the sample classification module is used for performing feature splicing on the sample image features and the sample position features to obtain sample fusion features, determining sample key features of the sample image features based on the sample position features, and performing classification processing on the sample fusion features based on the sample key features to obtain image sample classes to which the detected image samples belong;
and the model training module is used for training the initial data classification model based on the loss function between the image sample class and the target sample label to obtain the data classification model.
In the aspect of classifying the sample fusion features based on the sample key features to obtain the image sample category to which the detected image sample belongs, the sample classification module includes:
the probability obtaining unit is used for classifying the sample fusion characteristics based on the sample key characteristics to obtain at least two sample labels and a sample prediction probability value of each sample label; the at least two sample tags comprise a target sample tag;
and the sample type determining unit is used for determining the sample label with the maximum sample prediction probability value as the image sample type to which the detected image sample belongs.
Wherein the loss function comprises a first loss function and a second loss function;
the model training module comprises:
the first loss generating unit is used for generating a first loss function according to the image sample category and the target sample label;
a second loss generating unit, configured to generate a label distribution function according to the target sample label, generate a prediction distribution function according to the at least two sample labels and the sample prediction probability value of each sample label, and generate a second loss function according to the label distribution function and the prediction distribution function;
and the model generation unit is used for training the initial data classification model according to the first loss function and the second loss function to obtain the data classification model.
One aspect of the embodiments of the present application provides a computer device, including a processor, a memory, and an input/output interface;
the processor is connected to the memory and the input/output interface, respectively, where the input/output interface is configured to receive data and output data, the memory is configured to store a computer program, and the processor is configured to call the computer program to execute the data classification method in one aspect of the embodiments of the present application.
An aspect of the embodiments of the present application provides a computer-readable storage medium, in which a computer program is stored, where the computer program is adapted to be loaded by a processor and execute a data classification method in an aspect of the embodiments of the present application.
An aspect of an embodiment of the present application provides a computer program product or a computer program, which includes computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternatives in one aspect of the embodiments of the application.
The embodiment of the application has the following beneficial effects:
In the embodiment of the application, computer equipment acquires a visual identifier corresponding to an original image, identifies a detection position coordinate used for extracting an attention object in the original image, acquires a target detection image containing the attention object from the original image based on the detection position coordinate, and determines detection position information corresponding to the original image according to the visual identifier and the detection position coordinate; the visual identifier is used for representing the visual angle of the attention object in the article corresponding to the original image. The computer equipment acquires image features of the target detection image and performs scale transformation on the detection position information to obtain position features of the target detection image; it then performs feature splicing on the image features and the position features to obtain fusion features, determines key features in the image features based on the position features, and classifies the fusion features based on the key features to obtain the image category to which the target detection image belongs. By combining the target detection image with its detection position information in the original image, the target detection image can be identified and classified according to the position information, so that the emphasis of identification differs under different position information and the position corresponding to the obtained image category is known, thereby improving the accuracy of data identification and classification.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a diagram of a network interaction architecture for data classification provided by an embodiment of the present application;
FIG. 2 is a diagram of a data classification model architecture provided by an embodiment of the present application;
FIG. 3 is a flow chart of a method for data classification according to an embodiment of the present application;
fig. 4a to fig. 4b are schematic diagrams of an object recognition scene according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an object detection image acquisition scene provided in an embodiment of the present application;
fig. 6 is a schematic view of an image recognition scene provided in an embodiment of the present application;
FIG. 7 is a flowchart illustrating a data classification model training method according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a data classification apparatus according to an embodiment of the present application;
FIG. 9 is a schematic diagram of another data classification apparatus provided in an embodiment of the present application;
fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Optionally, the application may adopt a computer vision technology, a deep learning technology, and the like in the field of artificial intelligence, so as to implement identification of the target detection image in the original image, identification of the image category of the target detection image, and the like.
Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. For example, in the present application, it is considered that acquisition of a target detection image in an original image and detection position information of the target detection image, identification of an image type of the target detection image, and the like are all achieved by artificial intelligence.
Artificial intelligence technology is a comprehensive discipline involving a wide range of fields, covering both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning, and the like. The present application mainly relates to directions such as computer vision technology (for example, identification and acquisition of the detection position information and the target detection image in the original image) and machine learning/deep learning (for example, training and use of the data classification model). Any one of the artificial intelligence techniques may be used alone, or the techniques may be used in combination; for example, the computer vision technique may be used alone, or the computer vision technique and the deep learning technique may be used in combination, which is not limited herein. Through the use of related artificial intelligence technologies, the efficiency of data identification and classification in the present application is improved.
Computer Vision technology (CV) generally includes technologies such as image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, synchronous positioning, map construction, and the like, and also includes technologies of common biometric features such as face recognition, fingerprint recognition, and the like. Deep Learning (DL) is a new research direction in the field of Machine Learning (ML). Deep learning is a complex machine learning algorithm for learning the internal rules and the expression levels of sample data, has an effect on the aspects of voice and image recognition far exceeding the prior related technologies, and generally comprises technologies such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, formal education learning and the like.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical services, smart customer service, and the like.
The scheme provided by the embodiment of the application relates to technologies such as computer vision and deep learning in the field of artificial intelligence, and is specifically explained by the following embodiments:
in the embodiment of the present application, please refer to fig. 1, where fig. 1 is a network interaction architecture diagram for data classification provided in the embodiment of the present application, and the embodiment of the present application may be implemented by a computer device. The computer device 101 may obtain an original image from the terminal device, and identify the original image to obtain an image category to which the target detection image belongs in the original image. The terminal device may be the terminal device 102a, the terminal device 102b, or the terminal device 102c, the computer device may perform data interaction with any one terminal device, and data interaction may be performed between each terminal device. The computer device 101 may obtain an original image from the terminal device, and obtain detection position information and a target detection image corresponding to the detection position information from the original image; or the terminal equipment acquires the detection position information and a target detection image corresponding to the detection position information from the original image and sends the detection position information and the target detection image to the computer equipment; alternatively, the computer device may directly acquire the original image, and acquire the detection position information and the target detection image corresponding to the detection position information from the original image, and the like, which is not limited herein.
Specifically, please refer to fig. 2, fig. 2 is a diagram of a data classification model architecture according to an embodiment of the present application. As shown in fig. 2, the computer device obtains the coordinates of the detection position for extracting the attention object in the original image, obtains the visual identifier corresponding to the original image, and generates the detection position information 2012 according to the visual identifier and the coordinates of the detection position. The computer device may acquire a target detection image 2011 including the attention object from the original image based on the detection position coordinates, and input the target detection image 2011 and the detection position information 2012 into the data classification model 202, where the data classification model 202 includes a convolutional neural network 2021 and a multi-layer perceptron 2022. Specifically, the target detection image 2011 is input into the convolutional neural network 2021, and the image features of the target detection image 2011 are acquired based on the convolutional neural network 2021; the detection position information 2012 is input to the multilayer perceptron 2022, and the detection position information is subjected to scale conversion based on the multilayer perceptron 2022, so as to obtain the position feature of the target detection image 2011. The computer device performs feature stitching on the image features and the position features to obtain fusion features, may determine key features in the image features based on the position features, and performs classification processing on the fusion features based on the key features to obtain an image category to which the target detection image 2011 belongs. By combining the detection position information and the target detection image, the target detection image can be identified and detected based on the emphasis (namely key features) of the detection position information, and the corresponding position of the image category in the original image is utilized, so that the accuracy of data classification and identification can be improved.
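To make the data flow of fig. 2 concrete, the sketch below runs an end-to-end forward pass under assumed layer sizes: a small convolutional stack stands in for the convolutional neural network 2021, a two-layer perceptron for the multi-layer perceptron 2022, and a fully connected layer for the classifier; the actual backbone, feature dimensions and number of image categories are not fixed by this embodiment.

```python
import torch
import torch.nn as nn

class DataClassificationModel(nn.Module):
    def __init__(self, num_classes=5, pos_dim=5):
        super().__init__()
        self.cnn = nn.Sequential(                       # image-feature branch (2021)
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.mlp = nn.Sequential(                       # position-feature branch (2022)
            nn.Linear(pos_dim, 32), nn.ReLU(),
            nn.Linear(32, 32), nn.ReLU(),
        )
        self.classifier = nn.Linear(32 + 32, num_classes)

    def forward(self, detection_image, detection_position):
        img_feat = self.cnn(detection_image)            # image features
        pos_feat = self.mlp(detection_position)         # position features
        fused = torch.cat([img_feat, pos_feat], dim=1)  # feature splicing
        return self.classifier(fused)                   # scores per image category

# Usage with assumed shapes: one 224x224 crop and one
# (visual identifier, xmin, ymin, xmax, ymax) vector.
model = DataClassificationModel()
scores = model(torch.randn(1, 3, 224, 224), torch.randn(1, 5))
```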
It is understood that the computer device mentioned in the embodiments of the present application includes, but is not limited to, a terminal device or a server. In other words, the computer device may be a server or a terminal device, or may be a system of a server and a terminal device. The above-mentioned terminal device may be an electronic device, including but not limited to a mobile phone, a tablet computer, a desktop computer, a notebook computer, a palm-top computer, an Augmented Reality/Virtual Reality (AR/VR) device, a helmet-mounted display, a wearable device, a smart speaker, a digital camera, a camera, and other Mobile Internet Devices (MID) with network access capability, and the like, where the client has a display function. The above-mentioned server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), a big data and artificial intelligence platform, and the like.
Optionally, the data related in the embodiment of the present application may be stored in a computer device, or the data may be stored based on a cloud storage technology, which is not limited herein. The computer equipment can cache the original image or the detection position information and the target detection image after acquiring the original image or the detection position information and the target detection image, and process the cached original image or the detection position information and the target detection image based on a data classification period; or the obtained original image or the detection position information and the target detection image may be processed in real time, which is not limited herein.
Further, please refer to fig. 3, fig. 3 is a flowchart of a data classification method according to an embodiment of the present application. As shown in fig. 3, an original image is taken as an example for description, in other words, in the embodiment of the method described in fig. 3, the data classification process includes the following steps:
step S301, acquiring a visual identifier corresponding to the original image, identifying a detection position coordinate used for extracting the attention object in the original image, acquiring a target detection image containing the attention object from the original image based on the detection position information, and determining the detection position information corresponding to the original image according to the visual identifier and the detection position coordinate.
In the embodiment of the present application, the computer device may acquire an original image, acquire detection position coordinates for extracting an object of interest from the original image, and acquire a target detection image including the object of interest from the original image based on the detection position coordinates, where the object of interest refers to an object to be identified in the present application. The computer equipment can also obtain a visual identification corresponding to the original image, and determine the detection position information corresponding to the original image according to the visual identification and the detection position coordinates. For example, in the field of industrial defect detection, the object of interest may be an industrial defect; in the field of object recognition, the object of interest may be a person, an animal, an item, or the like; in other words, the corresponding attention object is different in different fields, and the attention object may be changed as needed. For example, the article corresponding to the original image has four visual angles, which respectively correspond to the visual angle 1, the visual angle 2, the visual angle 3, and the visual angle 4, when the visual angle 2 corresponding to the article is photographed to generate the original image, the visual identifier corresponding to the original image is the identifier of the visual angle 2, and the attention object is the shape of the article presented under the visual angle 2.
Further, the computer device may obtain an original image, perform object detection on the original image, obtain a first prediction frame including the attention object, and determine a detection position coordinate according to first position information of the first prediction frame; and determining a second prediction frame according to the detection position coordinates, and determining the area indicated by the second prediction frame in the original image as a target detection image.
Fig. 4a to 4b are schematic diagrams of an object recognition scene provided in an embodiment of the present application, where fig. 4a is a schematic diagram of an object recognition point determination scene provided in the embodiment of the present application, and fig. 4b is a schematic diagram of a frame determination scene provided in the embodiment of the present application. As shown in fig. 4a, a computer device may acquire an original image 401, divide the original image 401 into a plurality of equal identification frames, that is, acquire at least two identification frames, acquire a degree of association between each identification frame and an object of interest, determine an identification point based on the degree of association, and assume that three possible objects of interest are identified from the original image, and record the three possible objects of interest as an object of interest 1, an object of interest 2, and an object of interest 3, where the identification point corresponding to the object of interest 1 is determined to be an identification point 4021, the identification point corresponding to the object of interest 2 is determined to be an identification point 4022, and the identification point corresponding to the object of interest 3 is determined to be an identification point 4023. Further, as shown in fig. 4b, size transformation, shape transformation, and the like are performed on the identification points 4021 to obtain at least two associated borders for the possible attention object 1, including but not limited to the associated border 4031, the associated border 4032, the associated border 4033, and the like; performing size transformation, shape transformation and the like on the identification points 4022 to obtain at least two associated frames for the possible attention object 2, including but not limited to the associated frames 4041, 4042 and the like; the identification points 4023 are subjected to size transformation, shape transformation, and the like, so as to obtain at least two associated frames for the possible attention object 3, including but not limited to the associated frame 4051, the associated frame 4052, and the like. Obtaining the confidence degrees corresponding to the associated border 4031, the associated border 4032 and the associated border 4033 respectively, and determining the associated border 4032 as a first predicted border corresponding to the attention object 1 on the assumption that the confidence degree of the associated border 4032 > the confidence degree of the associated border 4031 > the confidence degree of the associated border 4033; obtaining the confidence degrees corresponding to the associated border 4041 and the associated border 4042, and assuming that the confidence degree of the associated border 4042 is greater than the confidence degree of the associated border 4041, determining the associated border 4042 as a first predicted border corresponding to the attention object 2; obtaining the confidence degrees corresponding to the associated border 4051 and the associated border 4052, and assuming that the confidence degree of the associated border 4052 is greater than the confidence degree of the associated border 4051, determining the associated border 4052 as a first predicted border corresponding to the attention object 3. 
According to the first prediction frame of the attention object 1, the first position information of the attention object 1 is determined, and the detection position coordinate of the attention object 1 is determined according to the first position information; similarly, the detection position coordinate of the attention object 2 and the detection position coordinate of the attention object 3 are obtained. The confidence is used for representing the probability of the attention object appearing in the corresponding associated frame.
Optionally, the computer device may acquire the original image, generate at least two preselected frames based on the original image, wherein overlapping portions may occur between two of the at least two preselected frames, acquire a confidence of each preselected frame, and determine the preselected frame with the confidence greater than a preset confidence threshold as the first predicted frame. For example, in the field of industrial defect detection, where the object of interest is an industrial defect, the computer device may obtain a confidence level for each preselected frame, the confidence level being indicative of a likelihood that the industrial defect is present in the corresponding preselected frame, and determine a first predicted bounding box from among the at least two preselected frames that the industrial defect is likely to be present based on the confidence level for each preselected frame.
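This optional confidence-based selection might be sketched as follows, assuming each preselected frame is a (xmin, ymin, xmax, ymax, confidence) tuple; the threshold value is hypothetical.

```python
def select_first_prediction_frames(preselected_frames, confidence_threshold=0.5):
    """Keep each preselected frame whose confidence exceeds the preset
    confidence threshold as a first prediction frame."""
    return [frame for frame in preselected_frames if frame[4] > confidence_threshold]
```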
Optionally, the computer device may further identify the original image based on a target detection model, identify the detection position coordinate used for extracting the attention object in the original image, and acquire the target detection image containing the attention object from the original image based on the detection position coordinate. The target detection model may be a Faster Region-based Convolutional Neural Network (Faster R-CNN), a Single Shot multibox Detector (SSD), a single-stage detection model (You Only Look Once, YOLO), and the like, which is not limited herein.
Further, after the computer device acquires the first prediction frame in the original image, the computer device may determine the detection position coordinate according to the first position information of the first prediction frame. Specifically, the computer device may determine the frame width and the frame height of the first prediction frame according to the first position information of the first prediction frame, adjust the size of the first prediction frame based on the frame width and the frame height, and determine the detection position coordinate according to the adjusted first prediction frame. The region indicated by the detection position coordinate in the original image may be referred to as a Region of Interest (ROI), that is, the region to be identified in the present application. A moderate amount of surrounding background is generally needed to set off the attention object, so the size of the first prediction frame may be adjusted such that the region indicated by the adjusted first prediction frame contains both the attention object and the background used to set it off, thereby improving the accuracy of data identification and classification. Optionally, the data classification model uses a fully connected layer to obtain the image category to which the target detection image belongs, so by performing size adjustment, shape adjustment and the like on the first prediction frame, the size of the obtained target detection image can be made to conform to the size that the data classification model can identify, ensuring that the target detection image is not distorted when scaled in the data classification model and thereby improving the accuracy of data identification and classification. Based on this, when the computer device performs size adjustment on the first prediction frame, it may convert the first prediction frame into a square and then scale the square to obtain the detection position coordinate; alternatively, the computer device may convert the first prediction frame into a square and obtain the detection position coordinate based on the square; alternatively, the computer device may scale the first prediction frame to obtain the detection position coordinate.
Taking the example that the computer device converts the first prediction frame into a square and then scales the square to obtain the detection position coordinate, the computer device can adjust the size of the first prediction frame according to the difference between the frame width and the frame height and the first position information, and determine the second position information and the adjusted width and height of the adjusted first prediction frame; and acquiring a frame expansion coefficient, carrying out size transformation on the adjustment width and the adjustment height based on the frame expansion coefficient, and determining a detection position coordinate according to the second position information and the transformed adjustment width and adjustment height.
Wherein the first position information of the first prediction frame is recorded as (xdmin, ydmin, xdmax, ydmax), where (xdmin, ydmin) represents the position coordinate of the upper-left vertex of the first prediction frame in the original image and (xdmax, ydmax) represents the position coordinate of the lower-right vertex of the first prediction frame in the original image. Specifically, please refer to fig. 5; fig. 5 is a schematic view of a target detection image acquisition scene according to an embodiment of the present application. As shown in fig. 5, after the computer device performs object detection on the original image 501, a first prediction frame 5011 containing an attention object is obtained, and the four vertices of the first prediction frame 5011 are respectively denoted by ad, bd, cd and dd, where the position coordinate of the upper-left vertex ad in the original image 501 is (xdmin, ydmin) and the position coordinate of the lower-right vertex cd in the original image 501 is (xdmax, ydmax). The frame width wd and the frame height hd of the first prediction frame 5011 are determined according to the first position information of the first prediction frame 5011, where the frame width wd can be obtained as shown in formula (1) and the frame height hd as shown in formula (2):

wd=xdmax-xdmin+1 (1)

hd=ydmax-ydmin+1 (2)
Wherein, if the frame height hd is greater than the frame width wd, the frame width wd is expanded; if the frame width wd is greater than the frame height hd, the frame height hd is expanded. The number of pixels expanded on each side is p=abs(wd-hd)/2, where abs refers to the operation of taking the absolute value. As shown in fig. 5, if the frame height hd is greater than the frame width wd, the frame width wd is expanded by p pixels on each of the two sides to obtain the second position information and the adjusted width and adjusted height of the adjusted first prediction frame 502. The second position information is recorded as (x′min, y′min, x′max, y′max) and, for the case shown in fig. 5, may be obtained as shown in equations (3) to (6):

x′min=xdmin-p (3)

y′min=ydmin (4)

x′max=xdmax+p (5)

y′max=ydmax (6)
The adjusted width and the adjusted height of the adjusted first prediction frame 502 are determined according to the second position information. The adjusted width is denoted by w′ and the adjusted height by h′; the four vertices of the adjusted first prediction frame 502 are denoted by a′, b′, c′ and d′, respectively, the position coordinate of the upper-left vertex of the adjusted first prediction frame 502 is (x′min, y′min), and the position coordinate of the lower-right vertex of the adjusted first prediction frame 502 is (x′max, y′max). The adjusted width w′ can be obtained as shown in formula (7), and the adjusted height h′ as shown in formula (8):
w′=x′max-x′min+1 (7)
h′=y′max-y′min+1 (8)
Further, the computer device may obtain a frame expansion coefficient and perform size transformation on the adjusted width w′ and the adjusted height h′ based on the frame expansion coefficient. The frame expansion coefficient is recorded as r and is a trainable parameter determined in the process of training the data classification model; after testing, the performance of the data classification model is optimal when the frame expansion coefficient r is 0.4. Of course, with continuous optimization of the data classification model, the value of the frame expansion coefficient r may also be updated so that the data classification model obtains better performance. The detection position coordinate is determined based on the second position information and the transformed adjusted width w′ and adjusted height h′, and is recorded as (xmin, ymin, xmax, ymax); it may be obtained as shown in equations (9) to (12):
xmin=x′min-w′*r (9)
ymin=y′min-h′*r (10)
xmax=x′max+w′*r (11)
ymax=y′max+h′*r (12)
A second prediction frame 503 is determined according to the detection position information. The four vertices of the second prediction frame 503 are denoted as a, b, c and d, respectively; the position coordinates of the upper left vertex a of the second prediction frame 503 are (xmin, ymin), and the position coordinates of the lower right vertex c are (xmax, ymax). The width of the second prediction frame 503 may be denoted as w and the height as h. The area indicated by the second prediction frame 503 in the original image 501 is determined as the target detection image 504, whose image width is w and image height is h.
Optionally, the second prediction frame 503 may exceed the image boundary of the original image 501. Assuming that the original image 501 uses its upper left vertex as the origin of coordinates and has width wp and height hp, the second prediction frame 503 exceeds the image boundary of the original image 501 when xmin < 0, or ymin < 0, or xmax > wp, or ymax > hp. In this case, the part of the area indicated by the second prediction frame 503 that exceeds the image boundary of the original image 501 is recorded as an out-of-bounds area, and a default value is filled in the out-of-bounds area to obtain the target detection image.
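The box-expansion and cropping procedure of formulas (1) to (12) can be sketched as follows; this is only an illustrative sketch, and the function name, the rounding of coordinates and the H×W×C image layout are assumptions rather than part of the embodiment:

```python
import numpy as np

def expand_and_crop(original_image, first_box, r=0.4, default_value=0):
    """Illustrative sketch of formulas (1)-(12): make the first prediction
    frame square, enlarge it by the frame expansion coefficient r, and crop
    the target detection image, padding out-of-bounds areas with a default
    value. original_image is assumed to be an H x W x C numpy array."""
    xd_min, yd_min, xd_max, yd_max = first_box      # first prediction frame
    wd = xd_max - xd_min + 1                        # formula (1)
    hd = yd_max - yd_min + 1                        # formula (2)
    p = abs(wd - hd) / 2                            # pixels expanded per side

    if hd > wd:                                     # expand the width
        x_min2, x_max2 = xd_min - p, xd_max + p     # formulas (3), (5)
        y_min2, y_max2 = yd_min, yd_max             # formulas (4), (6)
    else:                                           # expand the height
        x_min2, x_max2 = xd_min, xd_max
        y_min2, y_max2 = yd_min - p, yd_max + p

    w_adj = x_max2 - x_min2 + 1                     # formula (7)
    h_adj = y_max2 - y_min2 + 1                     # formula (8)

    x_min = int(round(x_min2 - w_adj * r))          # formula (9)
    y_min = int(round(y_min2 - h_adj * r))          # formula (10)
    x_max = int(round(x_max2 + w_adj * r))          # formula (11)
    y_max = int(round(y_max2 + h_adj * r))          # formula (12)

    hp, wp, c = original_image.shape
    out = np.full((y_max - y_min + 1, x_max - x_min + 1, c),
                  default_value, dtype=original_image.dtype)
    # copy the in-bounds part; the out-of-bounds area keeps the default value
    sx0, sy0 = max(x_min, 0), max(y_min, 0)
    sx1, sy1 = min(x_max, wp - 1), min(y_max, hp - 1)
    out[sy0 - y_min:sy1 - y_min + 1, sx0 - x_min:sx1 - x_min + 1] = \
        original_image[sy0:sy1 + 1, sx0:sx1 + 1]
    return out, (x_min, y_min, x_max, y_max)
```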
Optionally, the computer device may generate the detection position information according to the visual identifier and the detection position coordinates. The computer device may determine the detection position coordinates according to the second position information and the transformed adjusted width and adjusted height, where the detection position coordinates are (xmin, ymin, xmax, ymax); the computer device may then obtain the visual identifier and determine the detection position information according to the visual identifier and the detection position coordinates. The visual identifier may be an identifier corresponding to the original image. For example, all cameras participating in shooting are numbered, e.g. the numbers corresponding to N cameras are 1, 2, 3, …, N, respectively, and the number of the camera associated with the vision corresponding to the original image may be determined as the visual identifier of the original image. For instance, if the original image is shot by camera 2, the visual identifier of the original image may be determined as 2, and the detection position information determined from the visual identifier and the detection position coordinates is (2, xmin, ymin, xmax, ymax). Here N is a positive integer. Optionally, the visual identifier may also be determined according to the visual position. For example, one object is photographed from N visual positions to obtain N original images; if the N original images include an original image i, the visual identifier of the original image i is the identifier corresponding to visual position i, where i is a positive integer and i is less than or equal to N. The above are two optional ways of determining the visual identifier; other identifiers that can represent the shooting vision of the original image may also be used as the visual identifier of the original image, which is not limited herein. There may be a plurality of original images; in the embodiment shown in fig. 3, one original image is taken as an example for description.
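As a small illustration of how the detection position information could be assembled (the helper name and the example numbers are hypothetical):

```python
def build_detection_position_info(camera_id, det_coords):
    """Prepend the visual identifier (e.g. a camera number) to the detection
    position coordinates (xmin, ymin, xmax, ymax)."""
    x_min, y_min, x_max, y_max = det_coords
    return (camera_id, x_min, y_min, x_max, y_max)

# e.g. an image shot by camera 2 with detection position coordinates (35, 20, 163, 148)
info = build_detection_position_info(2, (35, 20, 163, 148))   # (2, 35, 20, 163, 148)
```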
Step S302, obtaining the image characteristics of the target detection image, and carrying out scale transformation on the detection position information to obtain the position characteristics of the target detection image.
In this embodiment of the application, the computer device may input the target detection image into a Convolutional Neural Network (CNN) in the data classification model, and perform feature extraction on the target detection image based on the convolutional layers in the CNN to obtain the image features of the target detection image. The CNN may be any neural network capable of acquiring image features, such as a ResNet18 network, which is not limited herein. The number in the ResNet18 network indicates the depth of the network, i.e. the number of layers with weights, which include convolutional layers and fully connected layers but exclude pooling layers, Batch Normalization (BN) layers, and the like.
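For illustration only, a ResNet18 backbone with its classification head removed could serve as the CNN that produces a 512-dimensional image feature; the embodiment does not prescribe a specific framework, and the preprocessing shown here is an assumption:

```python
import torch
import torchvision.models as models

# Sketch only: a ResNet18 backbone acting as the CNN of the data classification
# model; the final fully connected layer is replaced by an identity mapping so
# that the output is a 512-dimensional image feature per target detection image.
backbone = models.resnet18()              # pretrained weights could also be loaded
backbone.fc = torch.nn.Identity()
backbone.eval()

with torch.no_grad():
    # target_image stands for a cropped target detection image, resized and
    # normalized as the backbone expects (these preprocessing steps are assumed).
    target_image = torch.rand(1, 3, 224, 224)
    image_feature = backbone(target_image)    # shape (1, 512)
```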
The detection position information may include the visual identifier and the detection position coordinates. Performing scale transformation on the detection position information to obtain the position features of the target detection image includes: acquiring the image width and the image height of the target detection image, and normalizing the detection position coordinates based on the image width and the image height to obtain normalized position coordinates; acquiring the total number of visions, and normalizing the visual identifier based on the total number of visions to obtain a normalized visual identifier; and generating a perception input feature according to the normalized position coordinates and the normalized visual identifier, and performing scale transformation on the perception input feature by using a multilayer perceptron in the data classification model to obtain the position features of the target detection image. A scale is a measurement standard; the visual identifier and the detection position coordinates have different scales, that is, different measurement standards, so if they were directly combined and input into the data classification model for prediction, prediction would be difficult for the data classification model and could be inaccurate. The normalization process of the visual identifier can be shown in formula (13):
IDcam_norm=IDcam/N (13)

where IDcam refers to the visual identifier of the original image; the visual identifier is taken as the camera number corresponding to the original image, i.e. IDcam can take any value from 1 to N, and N is the total number of visions. The visual identifier is normalized based on the total number of visions to obtain the normalized visual identifier IDcam_norm.
Further, the normalization process of the detection position coordinates can be shown in formulas (14) to (17):

xmin_norm=xmin/w (14)

ymin_norm=ymin/h (15)

xmax_norm=xmax/w (16)

ymax_norm=ymax/h (17)

where w is the image width of the target detection image, h is the image height of the target detection image, and (xmin, ymin, xmax, ymax) are the detection position coordinates. The detection position coordinates are normalized based on the image width and the image height to obtain the normalized position coordinates (xmin_norm, ymin_norm, xmax_norm, ymax_norm). The perception input feature is then generated from the normalized position coordinates and the normalized visual identifier, i.e. it may be written as (IDcam_norm, xmin_norm, ymin_norm, xmax_norm, ymax_norm).
Scale transformation is then performed on the perception input feature by using a Multi-layer Perceptron (MLP) in the data classification model to obtain the position features of the target detection image.
The image features are, for example, 512-dimensional, while the detection position information is only a 5-dimensional feature (the visual identifier plus the detection position coordinates); there is a large scale difference and a large dimension difference between the two, so without transformation the position features would be almost ignored during prediction in the data classification model. Therefore, the computer device may obtain the image dimension information of the image features, determine a position dimension range according to the image dimension information, perform scale transformation on the perception input feature by using the MLP, and obtain the position features of the target detection image based on the position dimension range, where the position dimension information of the position features belongs to the position dimension range. For example, the position dimension range may be greater than a first ratio (e.g. 1/4) of the image dimension information and smaller than the image dimension information, i.e. the ratio of the position dimension information of the position features to the image dimension information is greater than the first ratio (1/4) and smaller than 1. Because the features of the image itself are more important when the target detection image is recognized and classified, this improves the scale adaptation between the position features and the image features while ensuring that the image features remain dominant in the prediction process, and the perception input feature can be scale-transformed to an appropriate degree based on the position dimension range. Optionally, the position dimension range may also be updated as needed. For example, when the importance of the position features is higher than that of the image features in the prediction of the target detection image, the position dimension range may be greater than the image dimension information and smaller than a second ratio (e.g. 5/4) of the image dimension information, or the position dimension range may be greater than the first ratio of the image dimension information and smaller than the second ratio of the image dimension information, which is not limited herein. For example, in the field of industrial defect detection, to ensure that industrial defects can be detected, the image features may be considered more important than the position features, and the position dimension range may be greater than the first ratio of the image dimension information and smaller than the image dimension information.
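A minimal sketch of the normalization and MLP scale transformation described above, assuming a 512-dimensional image feature so that a 128-dimensional position feature falls inside the position dimension range (between 512/4 and 512); the layer sizes and names are illustrative assumptions:

```python
import torch
import torch.nn as nn

class PositionMLP(nn.Module):
    """Sketch: maps the 5-d perception input feature (normalized visual
    identifier plus normalized detection position coordinates) to a position
    feature whose dimension lies inside the position dimension range."""
    def __init__(self, in_dim=5, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, out_dim))

    def forward(self, x):
        return self.net(x)

def perception_input(camera_id, n_cameras, coords, img_w, img_h):
    """Formulas (13)-(17): normalize the visual identifier and coordinates."""
    x_min, y_min, x_max, y_max = coords
    return torch.tensor([[camera_id / n_cameras,
                          x_min / img_w, y_min / img_h,
                          x_max / img_w, y_max / img_h]], dtype=torch.float32)

mlp = PositionMLP()
f_in = perception_input(2, 8, (35, 20, 163, 148), img_w=129, img_h=129)
position_feature = mlp(f_in)        # shape (1, 128)
```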
Step S303, performing feature splicing on the image features and the position features to obtain fusion features, determining key features in the image features based on the position features, and performing classification processing on the fusion features based on the key features to obtain the image category to which the target detection image belongs.
In the embodiment of the application, the computer device can perform normalization processing on the image features to obtain normalized image features, and perform normalization processing on the position features to obtain normalized position features; and carrying out feature splicing on the normalized image features and the normalized position features to obtain fusion features. By carrying out normalization processing on the image features and the position features, the scale difference between the image features and the position features is further reduced, and the prediction accuracy of the data classification model is improved. The process of normalizing the image features can be shown in formula (18):
FCNN_norm=normalize(FCNN) (18)

where FCNN represents the image features and normalize represents the normalization operation. Noting the position features as FMLP, the process of normalizing the position features can be shown in formula (19):

FMLP_norm=normalize(FMLP) (19)
further, the computer device can perform feature splicing on the normalized image features and the normalized position features to obtain fusion features. Optionally, the computer device may also perform feature stitching on the normalized image features and the normalized position features, and perform normalization processing on the stitching result to obtain a fusion feature, in this case, the obtaining manner of the fusion feature may be shown in formula (20):
Ffusion=normalize(concat(FCNN_norm,FMLP_norm)) (20)

where concat represents the feature splicing operation, and the normalization algorithm may be an L1 normalization algorithm, an L2 normalization algorithm, or a linear function (min-max) normalization algorithm, which is not limited herein. Defining the input feature of normalize as X=(x1, x2, …, xm) and taking the L2 normalization algorithm as an example, the normalization can be shown in formula (21):

normalize(X)=X/sqrt(x1^2+x2^2+…+xm^2) (21)
where m is a positive integer, m may represent the dimension of the input feature, e.g., m represents the dimension of the image feature in equation (18), m represents the dimension of the location feature in equation (19), and m may represent the dimension of the fused feature in equation (20).
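Formulas (18) to (21) can be sketched as follows, assuming the illustrative feature dimensions used above:

```python
import torch
import torch.nn.functional as F

def fuse(f_cnn, f_mlp):
    """Sketch of formulas (18)-(21): L2-normalize each feature, splice them,
    and L2-normalize the spliced result to obtain the fusion feature."""
    f_cnn_norm = F.normalize(f_cnn, p=2, dim=1)     # formula (18)
    f_mlp_norm = F.normalize(f_mlp, p=2, dim=1)     # formula (19)
    fused = torch.cat([f_cnn_norm, f_mlp_norm], dim=1)
    return F.normalize(fused, p=2, dim=1)           # formula (20), using (21)

image_feature = torch.rand(1, 512)
position_feature = torch.rand(1, 128)
fusion_feature = fuse(image_feature, position_feature)   # shape (1, 640)
```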
Further, the computer device may obtain a scale factor in the classifier of the data classification model and perform scale transformation on the fusion feature based on the scale factor, where the scale factor is obtained by training the classifier in the data classification model. The classifier then classifies the scale-transformed fusion feature based on the key features determined from the position features; in other words, key identification refers to emphasizing the features corresponding to the key features, for example by increasing the weight of those features. A normalized feature has a modular length of 1, so if the fusion feature is input into the classifier and predicted directly with an activation function, the output values may be too small, making the model difficult to train or even preventing convergence; therefore, a scale factor s can be added to the classifier of the data classification model, and the fusion feature is then predicted based on the classifier, which improves the accuracy of data recognition and classification. If the value of the scale factor s is too small, performing scale transformation on the fusion feature based on s does not help the training of the data classification model; if the value of s is too large, the prediction probability value of the output prediction label becomes too large, even approaching 1, which easily causes over-confidence in the classification result. Therefore, the scale factor s can be defined as a learnable parameter rather than a manually adjusted hyper-parameter, so that it can optimize the data classification model and improve the accuracy with which the data classification model recognizes and classifies data. The scale factor s is learned during the training of the data classification model; experiments show that the data classification model performs well when the value of s is between 2 and 5. Optionally, as the data classification model is continuously learned and optimized, the value range of the scale factor s may also be optimized and updated.
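A hedged sketch of a classifier with a learnable scale factor s; the linear form of the classifier and the class count are assumptions for illustration only:

```python
import torch
import torch.nn as nn

class ScaledClassifier(nn.Module):
    """Sketch: a linear classifier whose logits are multiplied by a learnable
    scale factor s, compensating for the unit length of the fusion feature."""
    def __init__(self, feature_dim=640, num_classes=7, init_s=3.0):
        super().__init__()
        self.fc = nn.Linear(feature_dim, num_classes)
        self.s = nn.Parameter(torch.tensor(init_s))   # learnable, not a hand-tuned hyper-parameter

    def forward(self, fusion_feature):
        return self.s * self.fc(fusion_feature)

classifier = ScaledClassifier()
logits = classifier(torch.rand(1, 640))
probs = torch.softmax(logits, dim=1)      # prediction probability values
image_category = probs.argmax(dim=1)      # label with the maximum probability
```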
When the computer equipment classifies the fusion features to obtain the image category to which the target detection image belongs, the computer equipment can directly classify the fusion features based on the key features without carrying out scale transformation on the fusion features to obtain at least two prediction labels and the prediction probability value of each prediction label; and determining the prediction label with the maximum prediction probability value as the image category to which the target detection image belongs. Or, the computer device may classify the scale-transformed fusion features based on the classifier to obtain at least two prediction labels and a prediction probability value of each prediction label, and determine the prediction label with the maximum prediction probability value as the image category to which the target detection image belongs.
Further, if the image category belongs to an image abnormality category, a communication mode associated with the image category is acquired, and an item exception message is sent to the terminal device based on the communication mode, so that the terminal device detects the item corresponding to the original image based on the item exception message; the item exception message includes the image category. For example, please refer to fig. 6, which is a schematic view of an image recognition scene provided in an embodiment of the present application. As shown in fig. 6, the computer device inputs a target detection image 601 into the convolutional neural network of a data classification model 602 and performs feature extraction on the target detection image 601 based on the convolutional neural network to obtain the image features of the target detection image 601; the detection position information is input into the multilayer perceptron of the data classification model 602, and scale transformation is performed on the detection position information based on the multilayer perceptron to obtain the position features of the target detection image. The computer device performs feature splicing on the image features and the position features to obtain a fusion feature 6021, classifies the fusion feature 6021 with the classifier in the data classification model 602 to obtain at least two prediction labels 6022 and a prediction probability value of each prediction label, and determines the prediction label with the maximum prediction probability value as the image category 603 to which the target detection image belongs. It is assumed that the at least two prediction labels 6022 include (K+1) prediction labels, denoted as (prediction label 1, prediction label 2, …, prediction label K, prediction label (K+1)), comprising a normal label and K kinds of image abnormality labels. If the image category 603 belongs to the K kinds of image abnormality labels, the image category 603 belongs to an image abnormality category, and an item exception message is sent to the terminal device 604. Taking the industrial defect detection field as an example, assume that the K kinds of image abnormality labels are industrial crack, inclusion, pressed-in oxide scale, pit, patch, oil stain interference and the like, and that the image category 603 is the image abnormality label corresponding to "industrial crack"; then a communication mode for handling "industrial crack" is acquired, and an item exception message is sent, based on the communication mode, to the terminal device indicated by the communication mode, so that the terminal device detects the item corresponding to the original image based on the item exception message. If the terminal device is an automatic processing device, the terminal device detects the item corresponding to the original image and repairs the "industrial crack"; alternatively, the communication mode is associated with a worker, and after the worker acquires the item exception message from the terminal device, the item corresponding to the original image is detected and repaired.
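The dispatch logic described above could look roughly like the following sketch; the category names, the mapping from abnormal categories to communication modes, and the message fields are hypothetical:

```python
# Hypothetical category names and routing table; none of these identifiers come
# from the embodiment, they only illustrate the dispatch flow.
ABNORMAL_CATEGORIES = {"crack", "inclusion", "oxide_scale", "pit", "patch", "oil_stain"}
COMMUNICATION_MODES = {"crack": "repair-station-01", "inclusion": "repair-station-02"}

def dispatch(image_category, original_image_id, send_fn):
    """If the predicted category is abnormal, send an item exception message
    (which includes the image category) via the associated communication mode."""
    if image_category not in ABNORMAL_CATEGORIES:
        return None
    message = {"image_category": image_category, "original_image": original_image_id}
    mode = COMMUNICATION_MODES.get(image_category)
    if mode is not None:
        send_fn(mode, message)   # send_fn is an assumed transport callback
    return message
```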
In the embodiment of the application, computer equipment acquires a visual identifier corresponding to an original image, identifies a detection position coordinate used for extracting an attention object in the original image, acquires a target detection image containing the attention object from the original image based on the detection position coordinate, and determines detection position information corresponding to the original image according to the visual identifier and the detection position coordinate; the visual identification is used for representing the visual angle of the attention object in the article corresponding to the original image; acquiring image characteristics of a target detection image, and carrying out scale transformation on detection position information to obtain position characteristics of the target detection image; and performing feature splicing on the image features and the position features to obtain fusion features, determining key features in the image features based on the position features, and classifying the fusion features based on the key features to obtain the image category to which the target detection image belongs. By combining the target detection image and the detection position information of the target detection image in the original image, the target detection image can be identified and classified according to the position information, so that the identification emphasis of the detection image is different under different position information, the position corresponding to the acquired image category can be obtained, and the accuracy of data identification and classification is improved.
Further, please refer to fig. 7, and fig. 7 is a schematic flowchart of a data classification model training method according to an embodiment of the present application. As shown in fig. 7, the method includes the steps of:
step S701, obtaining a detection image sample, a sample visual identification of the detection image sample, a detection sample position coordinate and a target sample label, and determining detection sample position information of the detection image sample according to the sample visual identification and the detection sample position coordinate.
In this embodiment of the present application, a computer device obtains a training sample, where the training sample may include a detection image sample, a sample visual identifier of the detection image sample, a detection sample position coordinate, and a target sample label, and determines detection sample position information of the detection image sample according to the sample visual identifier and the detection sample position coordinate, where the target sample label is used to represent a sample label actually corresponding to the detection image sample. Optionally, the computer device may perform data splicing on the sample visual identifier and the detection sample position coordinate to obtain the detection sample position information.
Step S702, an initial convolutional neural network in the initial data classification model is adopted to obtain sample image characteristics of the detected image sample, and the initial multilayer perceptron in the initial data classification model is adopted to carry out scale transformation on the position information of the detected sample to obtain the sample position characteristics of the detected image sample.
In this embodiment of the application, the process of obtaining the sample image features may refer to the process of obtaining the image features in step S302 in fig. 3, and the process of obtaining the sample position features may refer to the process of obtaining the position features in step S302 in fig. 3, which is not described herein again.
Step S703, performing feature splicing on the sample image features and the sample position features to obtain sample fusion features, determining sample key features of the sample image features based on the sample position features, and performing classification processing on the sample fusion features based on the sample key features to obtain image sample categories to which the detected image samples belong.
In the embodiment of the application, the computer device can classify the sample fusion features based on the sample key features to obtain at least two sample labels and a sample prediction probability value of each sample label; the at least two sample tags comprise a target sample tag; and determining the sample label with the maximum sample prediction probability value as the image sample class to which the detected image sample belongs. The process can be specifically referred to as step S303 in fig. 3.
Step S704, training the initial data classification model based on the loss function between the image sample class and the target sample label to obtain the data classification model.
In the embodiment of the present application, the loss function includes a first loss function and a second loss function. The computer equipment can generate a first loss function according to the image sample category and the target sample label; generating a label distribution function according to a target sample label, generating a prediction distribution function according to at least two sample labels and the sample prediction probability value of each sample label, and generating a second loss function according to the label distribution function and the prediction distribution function; and training the initial data classification model according to the first loss function and the second loss function to obtain the data classification model. Wherein the first loss function can be seen in equation (22):
L1=CrossEntropyLoss(logits) (22)
for example, if there are C sample labels to be classified, logits is a vector containing C elements, and CrossEntropyLoss is the first loss function used for classification learning of the data classification model; the first loss function may be a cross-entropy loss function or the like, which is not limited herein. The second loss function can be shown in formula (23):
L2=KLDivLoss(logits) (23)
the KLDivLoss is a second loss function used for performing distribution learning on the data classification model, and the second loss function may be a KL divergence (Kullback-Leibler divergence) loss function used for calculating a difference degree between two random variables. Wherein the loss function can be seen in equation (24):
L=L1+λ*L2 (24)
the λ is a weight coefficient between the first loss function and the second loss function, and the weight coefficient may be an empirical value or determined by optimization during the process of training the data classification model.
Further, the process of training the data classification model specifically includes the following steps:
in a model deployment stage of the data classification model, firstly, a loss function layer is removed, steps S701 to S703 are executed, and logits are converted into probability distribution through an initial classifier, so that at least two sample labels and a sample prediction probability value of each sample label are obtained, and the at least two sample labels and the sample prediction probability value of each sample label can be represented through a vector probs. Wherein the classifier can convert logits into probability distributions by an activation function, which can be a logistic regression (softmax) function, as shown in equation (25):
probs=softmax(logits) (25)
determining the sample label with the maximum sample prediction probability value as the image sample class to which the detected image sample belongs, wherein the process is shown as a formula (26):
label=argmax(probs) (26)
wherein argmax represents that an index corresponding to a maximum element in the vector is taken, that is, a sample label corresponding to a maximum sample prediction probability value is determined as an image sample class label to which the detected image sample belongs, and a sample prediction probability value corresponding to the image sample class label is a confidence of the image sample class, which can be expressed as formula (27):
score=probs[label] (27)
where probs [ label ] represents the corresponding value of the image sample class label in the vector probs.
Further, the computer device may set an initial distribution function. If the distribution of the at least two sample labels is ordered (e.g., age prediction from a human face), the initial distribution function may be a conventional probability distribution function, such as a Gaussian function; the sample category is substituted into the initial distribution function to generate the sample distribution function, and the probability mean and probability variance of the sample prediction probability values of the at least two sample labels are obtained, so that the training distribution function is determined according to the probability mean and the probability variance. If the distribution of the at least two sample labels is unordered, the number of times each sample label serves as the image sample category in the training set is counted, the training proportion of each sample label in the training set is determined according to that number, and the training proportions of the sample labels are taken as the training distribution function. For example, assuming there are 4 sample labels, and in the training set sample label 1 serves as the image sample category 10 times, sample label 2 serves 30 times, sample label 3 serves 35 times, and sample label 4 serves 25 times, the training distribution function may be determined to be (0.1, 0.3, 0.35, 0.25). A first loss value between the image sample category and the target sample label is determined through the first loss function, a second loss value between the sample distribution function and the training distribution function is determined through the second loss function, a total loss value is determined from the first loss value and the second loss value through formula (24), and the initial data classification model is trained based on the total loss value to obtain the data classification model.
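The two ways of building the training distribution function could be sketched as follows; the Gaussian width and the helper names are assumptions made only for illustration:

```python
import numpy as np

def ordered_label_distribution(target_class, num_classes, sigma=1.0):
    """Ordered labels (e.g. age prediction): a Gaussian centered on the target
    sample label, normalized into a probability distribution."""
    classes = np.arange(num_classes)
    dist = np.exp(-0.5 * ((classes - target_class) / sigma) ** 2)
    return dist / dist.sum()

def unordered_label_distribution(image_sample_categories, num_classes):
    """Unordered labels: the training proportion of each sample label, i.e. how
    often it appears as the image sample category in the training set."""
    counts = np.bincount(image_sample_categories, minlength=num_classes)
    return counts / counts.sum()

# Example from the text: 4 sample labels occurring 10, 30, 35 and 25 times.
cats = np.repeat([0, 1, 2, 3], [10, 30, 35, 25])
print(unordered_label_distribution(cats, 4))   # [0.1  0.3  0.35 0.25]
```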
In the embodiment of the application, the computer equipment can train the data classification model by combining classification learning and distribution learning, the predicted class confidence of the data classification model is truly reflected through the classification learning, the model can be stably trained through the distribution learning guarantee, and the training reliability of the data classification model is improved.
Referring to table 1, table 1 shows performance of a defect classification, where the first performance is a performance when the visual identifier and the ROI detection position coordinate are not used, and the second performance is a performance when the visual identifier and the detection position coordinate are fused in this application, and a larger index value indicates a better performance.
TABLE 1
[Table 1: defect classification performance with and without fusing the visual identifier and the detection position coordinates; the numerical values are provided as an image in the original publication]
It can be seen that the performance of the method is improved to a certain extent on the basis of the prior art, and the accuracy of data identification and classification can be improved.
The training process and the prediction process of the data classification model can be executed by the same computer device or different computer devices.
Further, please refer to fig. 8, which is a schematic diagram of a data classification apparatus according to an embodiment of the present application. The data classification apparatus may be a computer program (comprising program code, etc.) running on a computer device; for example, the data classification apparatus may be application software. The apparatus may be used to perform the corresponding steps in the methods provided by the embodiments of the present application. As shown in fig. 8, the data classification apparatus 800 may be used in a computer device in the embodiment corresponding to fig. 3, and specifically, the apparatus may include: an input acquisition module 11, a feature acquisition module 12 and a data classification module 13.
The input acquisition module 11 is configured to acquire a visual identifier corresponding to an original image, identify a detection position coordinate used for extracting an attention object in the original image, acquire a target detection image including the attention object from the original image based on the detection position coordinate, and determine detection position information corresponding to the original image according to the visual identifier and the detection position coordinate; the visual identification is used for representing the visual angle of the attention object in the article corresponding to the original image;
the feature obtaining module 12 is configured to obtain image features of a target detection image, and perform scale transformation on detection position information to obtain position features of the target detection image;
and the data classification module 13 is configured to perform feature splicing on the image features and the position features to obtain fusion features, determine key features in the image features based on the position features, and perform classification processing on the fusion features based on the key features to obtain an image category to which the target detection image belongs.
Wherein, the input obtaining module 11 includes:
a position determining unit 111, configured to obtain an original image, perform object detection on the original image, obtain a first prediction frame including an attention object, and determine a detection position coordinate according to first position information of the first prediction frame;
and an image determining unit 112, configured to determine a second prediction frame according to the detection position coordinates, and determine an area indicated by the second prediction frame in the original image as the target detection image.
Wherein, in determining the detection position coordinates based on the first position information of the first prediction frame, the position determination unit 111 includes:
a position obtaining subunit 1111, configured to determine, according to first position information of the first predicted frame, a frame width and a frame height of the first predicted frame;
a size adjusting sub-unit 1112, configured to perform size adjustment on the first predicted frame based on the frame width and the frame height, and determine the detection position coordinate according to the adjusted first predicted frame.
The size adjustment subunit 1112 includes:
a position adjusting subunit 111a, configured to perform size adjustment on the first predicted frame according to the difference between the frame width and the frame height and the first position information, and determine the second position information and an adjusted width and an adjusted height of the adjusted first predicted frame;
and a size conversion subunit 111b, configured to obtain a frame expansion coefficient, perform size conversion on the adjustment width and the adjustment height based on the frame expansion coefficient, and determine a detection position coordinate according to the second position information and the converted adjustment width and adjustment height.
In terms of acquiring image features of the target detection image, the feature acquisition module 12 is specifically configured to:
and inputting the target detection image into a convolutional neural network in the data classification model, and extracting the features of the target detection image based on a convolutional layer in the convolutional neural network to obtain the image features of the target detection image.
In the aspect of performing scale transformation on the detected position information to obtain the position feature of the target detection image, the feature obtaining module 12 includes:
a position normalization unit 121, configured to obtain an image width and an image height of the target detection image, and perform normalization processing on the detection position coordinates based on the image width and the image height to obtain normalized position coordinates;
the visual normalization unit 122 is configured to obtain a total visual number, and perform normalization processing on the visual identifier based on the total visual number to obtain a normalized visual identifier;
and the position feature obtaining unit 123 is configured to generate a perception input feature according to the normalized position coordinates and the normalized visual identifiers, and perform scale transformation on the perception input feature by using a multilayer perceptron in the data classification model to obtain a position feature of the target detection image.
Wherein, in the aspect of performing feature splicing on the image features and the position features to obtain fusion features, the data classification module 13 includes:
the feature normalization unit 131 is configured to perform normalization processing on the image features to obtain normalized image features, and perform normalization processing on the position features to obtain normalized position features;
and the feature splicing unit 132 is configured to perform feature splicing on the normalized image features and the normalized position features to obtain fusion features.
In the aspect of classifying the fusion features based on the key features to obtain the image category to which the target detection image belongs, the data classification module 13 includes:
the factor processing unit 133 is configured to obtain a scale factor in a classifier of the data classification model, and perform scale transformation on the fusion feature based on the scale factor; the scale factor is obtained by training a classifier in the data classification model;
the category determining unit 134 is configured to perform key identification on features corresponding to key features in the fusion features after the scale transformation based on the classifier, and perform classification processing on the fusion features after the scale transformation based on a key identification result to obtain an image category to which the target detection image belongs.
In the aspect of classifying the fusion features based on the key features to obtain the image category to which the target detection image belongs, the data classification module 13 includes:
a result obtaining unit 135, configured to classify the fusion features based on the key features to obtain at least two prediction tags and a prediction probability value of each prediction tag;
the class determining unit 134 is configured to determine the prediction label with the largest prediction probability value as the image class to which the target detection image belongs.
Wherein the apparatus 800 further comprises:
the communication acquisition module 14 is configured to acquire a communication mode associated with the image category if the image category belongs to the image abnormal category;
the message sending module 15 is configured to send an article exception message to the terminal device based on the communication mode, so that the terminal device detects an article corresponding to the original image based on the article exception message; the item exception message includes an image category.
The embodiment of the application provides a data classification device, which acquires a visual identifier corresponding to an original image, identifies a detection position coordinate used for extracting an attention object in the original image, acquires a target detection image containing the attention object from the original image based on the detection position coordinate, and determines detection position information corresponding to the original image according to the visual identifier and the detection position coordinate; the visual identification is used for representing the visual angle of the attention object in the article corresponding to the original image; acquiring image characteristics of a target detection image, and carrying out scale transformation on detection position information to obtain position characteristics of the target detection image; and performing feature splicing on the image features and the position features to obtain fusion features, determining key features in the image features based on the position features, and classifying the fusion features based on the key features to obtain the image category to which the target detection image belongs. By combining the target detection image and the detection position information of the target detection image in the original image, the target detection image can be identified and classified according to the position information, so that the identification emphasis of the detection image is different under different position information, the position corresponding to the acquired image category can be obtained, and the accuracy of data identification and classification is improved.
Further, please refer to fig. 9, which is a schematic diagram of another data classification apparatus provided in an embodiment of the present application. The data classification apparatus may be a computer program (comprising program code, etc.) running on a computer device; for example, the data classification apparatus may be application software. The apparatus may be used to perform the corresponding steps in the methods provided by the embodiments of the present application. As shown in fig. 9, the data classification apparatus 900 may be used in the computer device in the embodiment corresponding to fig. 7, and specifically, the apparatus may include: a sample acquisition module 16, a sample feature acquisition module 17, a sample classification module 18, and a model training module 19.
The sample acquisition module 16 is configured to acquire a detection image sample, a sample visual identifier of the detection image sample, a detection sample position coordinate, and a target sample label, and determine detection sample position information of the detection image sample according to the sample visual identifier and the detection sample position coordinate; the sample visual identification is used for representing the visual angle of the detection image sample in the article corresponding to the detection image sample;
a sample feature obtaining module 17, configured to obtain a sample image feature of the detected image sample by using an initial convolutional neural network in an initial data classification model, and perform scale transformation on the position information of the detected sample by using an initial multilayer perceptron in the initial data classification model to obtain a sample position feature of the detected image sample;
the sample classification module 18 is configured to perform feature splicing on the sample image features and the sample position features to obtain sample fusion features, determine sample key features of the sample image features based on the sample position features, and perform classification processing on the sample fusion features based on the sample key features to obtain image sample categories to which the detected image samples belong;
and the model training module 19 is configured to train the initial data classification model based on the loss function between the image sample class and the target sample label to obtain a data classification model.
In the aspect of classifying the sample fusion features based on the sample key features to obtain the image sample category to which the detected image sample belongs, the sample classification module 18 includes:
the probability obtaining unit 181 is configured to classify the sample fusion features based on the sample key features to obtain at least two sample labels and a sample prediction probability value of each sample label; the at least two sample tags comprise a target sample tag;
a sample class determination unit 182, configured to determine the sample label with the largest sample prediction probability value as the image sample class to which the detected image sample belongs.
Wherein the loss function comprises a first loss function and a second loss function;
the model training module 19 includes:
a first loss generating unit 191 configured to generate a first loss function according to the image sample class and the target sample label;
a second loss generating unit 192, configured to generate a label distribution function according to the target sample label, generate a prediction distribution function according to at least two sample labels and the sample prediction probability value of each sample label, and generate a second loss function according to the label distribution function and the prediction distribution function;
the model generating unit 193 is configured to train the initial data classification model according to the first loss function and the second loss function, so as to obtain a data classification model.
The embodiment of the application provides a data classification device, which can be used for training a data classification model by combining classification learning and distribution learning, truly reflects the predicted class confidence of the data classification model through the classification learning, ensures the model to be stably trained through the distribution learning and improves the training reliability of the data classification model.
Referring to fig. 10, fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 10, the computer device in the embodiment of the present application may include: one or more processors 1001, memory 1002, and input-output interface 1003. The processor 1001, the memory 1002, and the input/output interface 1003 are connected by a bus 1004. The memory 1002 is used for storing a computer program, which includes program instructions, and the input/output interface 1003 is used for receiving data and outputting data, such as data interaction between a computer device and a terminal device; the processor 1001 is used to execute program instructions stored by the memory 1002.
The processor 1001 is configured to perform the following operations when performing prediction by using a data classification model:
acquiring a visual identifier corresponding to an original image, identifying a detection position coordinate used for extracting an attention object in the original image, acquiring a target detection image containing the attention object from the original image based on the detection position coordinate, and determining detection position information corresponding to the original image according to the visual identifier and the detection position coordinate; the visual identification is used for representing the visual angle of the attention object in the article corresponding to the original image;
acquiring image characteristics of a target detection image, and carrying out scale transformation on detection position information to obtain position characteristics of the target detection image;
and performing feature splicing on the image features and the position features to obtain fusion features, determining key features in the image features based on the position features, and classifying the fusion features based on the key features to obtain the image category to which the target detection image belongs.
The processor 1001 is configured to perform the following operations when training a data classification model:
acquiring a detection image sample, a sample visual identification of the detection image sample, a detection sample position coordinate and a target sample label, and determining detection sample position information of the detection image sample according to the sample visual identification and the detection sample position coordinate; the sample visual identification is used for representing the visual angle of the detection image sample in the article corresponding to the detection image sample;
adopting an initial convolutional neural network in an initial data classification model to obtain sample image characteristics of a detection image sample, and adopting an initial multilayer perceptron in the initial data classification model to perform scale transformation on detection sample position information to obtain sample position characteristics of the detection image sample;
performing feature splicing on the sample image features and the sample position features to obtain sample fusion features, determining sample key features of the sample image features based on the sample position features, and performing classification processing on the sample fusion features based on the sample key features to obtain image sample categories to which the detected image samples belong;
and training the initial data classification model based on the loss function between the image sample class and the target sample label to obtain the data classification model.
In some possible embodiments, the processor 1001 may be a Central Processing Unit (CPU), and may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 1002 may include both read-only memory and random-access memory, and provides instructions and data to the processor 1001 and the input/output interface 1003. A portion of the memory 1002 may also include non-volatile random access memory. For example, the memory 1002 may also store device type information.
In a specific implementation, the computer device may execute, through each built-in functional module thereof, the implementation manner provided in each step in fig. 3 or fig. 7, which may be referred to specifically for the implementation manner provided in each step in fig. 3 or fig. 7, and is not described herein again.
The embodiment of the present application provides a computer device, including a processor, an input/output interface and a memory, where the processor acquires a computer program in the memory, executes each step of the method shown in fig. 3, and performs the data classification operation. In this way, detection position information used for extracting an object of interest in an original image is obtained, and a target detection image containing the object of interest is obtained from the original image based on the detection position information; the image features of the target detection image are acquired, and scale transformation is performed on the detection position information to obtain the position features of the target detection image; the image features and the position features are spliced to obtain fusion features, and the fusion features are classified to obtain the image category to which the target detection image belongs. By combining the target detection image and the detection position information of the target detection image in the original image, the target detection image can be recognized and classified according to the position information, so that the recognition emphasis of the detection image differs under different position information, the position corresponding to the obtained image category can be obtained, and the accuracy of data recognition and classification is improved.
The embodiment of the present application provides a computer device, including: the data classification model training method comprises a processor, an input/output interface and a memory, wherein the processor acquires a computer program in the memory, executes each step of the method shown in the figure 7 and carries out training operation of the data classification model. The embodiment of the application realizes the training process of the data classification model, the data classification model is trained by combining classification learning and distribution learning, the predicted class confidence of the data classification model is truly reflected by the classification learning, the model can be stably trained by the distribution learning guarantee, and the training reliability of the data classification model is improved.
An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, where the computer program is suitable for being loaded by the processor and executing the data classification method provided in each step in fig. 3, and for details, reference may be made to implementation manners provided in each step in fig. 3, and details are not described here again. Alternatively, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and the computer program is suitable for being loaded by the processor and executing the data classification method provided in each step in fig. 7, which may specifically refer to an implementation manner provided in each step in fig. 7, and is not described herein again. In addition, the beneficial effects of the same method are not described in detail. For technical details not disclosed in embodiments of the computer-readable storage medium referred to in the present application, reference is made to the description of embodiments of the method of the present application. By way of example, a computer program can be deployed to be executed on one computer device or on multiple computer devices at one site or distributed across multiple sites and interconnected by a communication network.
The computer-readable storage medium may be the data classification apparatus provided in any of the foregoing embodiments or an internal storage unit of the computer device, such as a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Memory Card (SMC), a Secure Digital (SD) card, a flash card (flash card), and the like, provided on the computer device. Further, the computer-readable storage medium may also include both an internal storage unit and an external storage device of the computer device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the computer device. The computer readable storage medium may also be used to temporarily store data that has been output or is to be output.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device executes the method provided in the various optional manners in fig. 3 or fig. 7. By combining the target detection image with the detection position information of the target detection image in the original image, the target detection image is recognized and classified according to the position information, so that the recognition emphasis of the detection image differs under different position information, the position corresponding to the obtained image category can be obtained, and the accuracy of data recognition and classification is improved.
The terms "first," "second," and the like in the description and in the claims and drawings of the embodiments of the present application are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "comprises" and any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, apparatus, product, or apparatus that comprises a list of steps or elements is not limited to the listed steps or modules, but may alternatively include other steps or modules not listed or inherent to such process, method, apparatus, product, or apparatus.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described in a functional general in the specification for the purpose of clearly illustrating the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The method and the related apparatus provided by the embodiments of the present application are described with reference to the flowcharts and/or structural diagrams of the methods provided by the embodiments of the present application. Each flow and/or block of the flowcharts and/or structural diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing apparatus create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural diagrams. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural diagrams. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus, so that a series of operational steps are performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural diagrams.
The above disclosure describes only preferred embodiments of the present application and certainly cannot be taken as limiting the scope of rights of the present application; therefore, equivalent variations and modifications made in accordance with the claims of the present application still fall within the scope covered by the present application.

Claims (15)

1. A method of data classification, the method comprising:
acquiring a visual identifier corresponding to an original image, identifying detection position coordinates used for extracting an object of interest in the original image, acquiring a target detection image containing the object of interest from the original image based on the detection position coordinates, and determining detection position information corresponding to the original image according to the visual identifier and the detection position coordinates; the visual identifier is used for representing the viewing angle of the object of interest in the item corresponding to the original image;
acquiring image features of the target detection image, and performing scale transformation on the detection position information to obtain position features of the target detection image;
and performing feature splicing on the image features and the position features to obtain fusion features, determining key features in the image features based on the position features, and classifying the fusion features based on the key features to obtain the image category to which the target detection image belongs.
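For readers who want a concrete picture of the pipeline recited in claim 1, the following is a minimal PyTorch sketch. The backbone, the layer sizes, the 5-dimensional position vector (view identifier plus box coordinates), and all names are illustrative assumptions and are not part of the claimed method.

    import torch
    import torch.nn as nn

    class FusionClassifier(nn.Module):
        def __init__(self, num_classes, pos_dim=5, pos_feat_dim=64):
            super().__init__()
            # Backbone extracting image features from the cropped target detection image.
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())              # -> (B, 32)
            # Multilayer perceptron that scale-transforms the detection position
            # information (view identifier + box coordinates) into position features.
            self.pos_mlp = nn.Sequential(
                nn.Linear(pos_dim, pos_feat_dim), nn.ReLU(),
                nn.Linear(pos_feat_dim, pos_feat_dim))
            self.classifier = nn.Linear(32 + pos_feat_dim, num_classes)

        def forward(self, crop, pos_info):
            img_feat = self.backbone(crop)                  # image features
            pos_feat = self.pos_mlp(pos_info)               # position features
            fused = torch.cat([img_feat, pos_feat], dim=1)  # feature splicing
            return self.classifier(fused)                   # category logits

    # Usage with a 224x224 crop and a 5-dimensional position vector.
    model = FusionClassifier(num_classes=10)
    logits = model(torch.randn(1, 3, 224, 224), torch.randn(1, 5))
    category = logits.argmax(dim=1)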
2. The method of claim 1, wherein the identifying detection position coordinates used for extracting an object of interest in the original image, and acquiring a target detection image containing the object of interest from the original image based on the detection position coordinates comprises:
acquiring the original image, performing object detection on the original image, acquiring a first prediction box containing the object of interest, and determining the detection position coordinates according to first position information of the first prediction box;
and determining a second prediction box according to the detection position coordinates, and determining the area indicated by the second prediction box in the original image as the target detection image.
3. The method of claim 2, wherein the determining the detection position coordinates according to the first position information of the first prediction box comprises:
determining a box width and a box height of the first prediction box according to the first position information of the first prediction box;
and adjusting the size of the first prediction box based on the box width and the box height, and determining the detection position coordinates according to the adjusted first prediction box.
4. The method of claim 3, wherein the adjusting the size of the first prediction box based on the box width and the box height, and determining the detection position coordinates according to the adjusted first prediction box comprises:
adjusting the size of the first prediction box according to the difference between the box width and the box height and the first position information, and determining second position information and an adjusted width and an adjusted height of the adjusted first prediction box;
and acquiring a box expansion coefficient, performing size transformation on the adjusted width and the adjusted height based on the box expansion coefficient, and determining the detection position coordinates according to the second position information and the transformed adjusted width and adjusted height.
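One plausible reading of the box adjustment in claims 3 and 4 is sketched below: the shorter side of the first prediction box is padded with half of the width/height difference, and the squared box is then enlarged around its center by a box expansion coefficient. The function name, coordinate convention, and default coefficient are assumptions.

    def adjust_box(x1, y1, x2, y2, expand=1.2):
        # Equalize width and height by padding the shorter side with half the difference.
        w, h = x2 - x1, y2 - y1
        diff = abs(w - h)
        if w < h:
            x1, x2 = x1 - diff / 2, x2 + diff / 2
        else:
            y1, y2 = y1 - diff / 2, y2 + diff / 2
        # Enlarge the adjusted width and height by the box expansion coefficient.
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        half_w = (x2 - x1) * expand / 2
        half_h = (y2 - y1) * expand / 2
        return cx - half_w, cy - half_h, cx + half_w, cy + half_h

    # Example: a 160x120 box becomes a square box enlarged by 20%.
    print(adjust_box(40, 60, 200, 180))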
5. The method of claim 1, wherein the acquiring image features of the target detection image comprises:
inputting the target detection image into a convolutional neural network in a data classification model, and performing feature extraction on the target detection image based on a convolutional layer in the convolutional neural network to obtain the image features of the target detection image.
6. The method of claim 1, wherein the performing scale transformation on the detection position information to obtain position features of the target detection image comprises:
acquiring an image width and an image height of the target detection image, and normalizing the detection position coordinates based on the image width and the image height to obtain normalized position coordinates;
acquiring a total number of views, and normalizing the visual identifier based on the total number of views to obtain a normalized visual identifier;
and generating a perception input feature according to the normalized position coordinates and the normalized visual identifier, and performing scale transformation on the perception input feature by using a multilayer perceptron in a data classification model to obtain the position features of the target detection image.
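A possible construction of the perception input feature of claim 6 is sketched below: the detection position coordinates are normalized by the image width and height, the visual identifier by the total number of views, and the resulting vector is passed through a multilayer perceptron. The dimensions, MLP depth, and names are assumptions.

    import torch
    import torch.nn as nn

    def perception_input(box, view_id, img_w, img_h, num_views):
        x1, y1, x2, y2 = box
        norm_box = [x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h]   # normalized coordinates
        norm_view = view_id / num_views                               # normalized visual identifier
        return torch.tensor(norm_box + [norm_view], dtype=torch.float32)

    pos_mlp = nn.Sequential(nn.Linear(5, 64), nn.ReLU(), nn.Linear(64, 64))
    position_feature = pos_mlp(perception_input((40, 60, 200, 180), 2, 640, 480, 4))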
7. The method of claim 1, wherein the performing feature splicing on the image features and the position features to obtain fusion features comprises:
normalizing the image features to obtain normalized image features, and normalizing the position features to obtain normalized position features;
and performing feature splicing on the normalized image features and the normalized position features to obtain fusion features.
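Claim 7 can be illustrated with the short sketch below, which L2-normalizes each feature vector before concatenation so that neither modality dominates the fused representation; the choice of L2 normalization is an assumption, since the claim does not fix a particular normalization.

    import torch
    import torch.nn.functional as F

    def fuse(img_feat, pos_feat):
        # Normalize both feature vectors, then splice them along the feature dimension.
        return torch.cat([F.normalize(img_feat, dim=-1),
                          F.normalize(pos_feat, dim=-1)], dim=-1)

    fused = fuse(torch.randn(1, 32), torch.randn(1, 64))   # -> shape (1, 96)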
8. The method of claim 1, wherein the classifying the fusion features based on the key features to obtain an image category to which the target detection image belongs comprises:
obtaining a scale factor in a classifier of the data classification model, and performing scale transformation on the fusion features based on the scale factor; the scale factor is obtained by training the classifier in the data classification model;
and performing key identification on the features corresponding to the key features in the scale-transformed fusion features based on the classifier, and classifying the scale-transformed fusion features based on the key identification result to obtain the image category to which the target detection image belongs.
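A minimal reading of the scale factor in claim 8 is sketched below: the classifier holds a learned per-dimension scale vector that re-weights the fused features before the final linear layer, which is one way the key, position-derived dimensions could be emphasized. This is an illustrative guess at the mechanism, not the patented implementation.

    import torch
    import torch.nn as nn

    class ScaledClassifier(nn.Module):
        def __init__(self, feat_dim, num_classes):
            super().__init__()
            # Scale factor learned together with the classifier weights.
            self.scale = nn.Parameter(torch.ones(feat_dim))
            self.fc = nn.Linear(feat_dim, num_classes)

        def forward(self, fused):
            return self.fc(fused * self.scale)   # scale-transformed fusion features -> logits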
9. The method of claim 1, wherein the classifying the fusion features based on the key features to obtain an image category to which the target detection image belongs comprises:
classifying the fusion features based on the key features to obtain at least two prediction labels and a prediction probability value of each prediction label;
and determining the prediction label with the maximum prediction probability value as the image category to which the target detection image belongs.
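Claim 9 amounts to taking the label with the largest prediction probability value, for example:

    import torch

    logits = torch.tensor([[1.2, 0.3, 2.5]])     # scores for three candidate prediction labels
    probs = torch.softmax(logits, dim=1)         # prediction probability values
    category = probs.argmax(dim=1)               # index of the predicted image category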
10. The method of claim 1, wherein the method further comprises:
if the image category belongs to an abnormal image category, acquiring a communication means associated with the image category;
and sending an item anomaly message to a terminal device based on the communication means, so that the terminal device detects the item corresponding to the original image based on the item anomaly message; the item anomaly message includes the image category.
11. A method of data classification, the method comprising:
acquiring a detection image sample, a sample visual identifier of the detection image sample, detection sample position coordinates, and a target sample label, and determining detection sample position information of the detection image sample according to the sample visual identifier and the detection sample position coordinates; the sample visual identifier is used for representing the viewing angle of the detection image sample in the item corresponding to the detection image sample;
acquiring sample image features of the detection image sample by using an initial convolutional neural network in an initial data classification model, and performing scale transformation on the detection sample position information by using an initial multilayer perceptron in the initial data classification model to obtain sample position features of the detection image sample;
performing feature splicing on the sample image features and the sample position features to obtain sample fusion features, determining sample key features of the sample image features based on the sample position features, and classifying the sample fusion features based on the sample key features to obtain an image sample category to which the detection image sample belongs;
and training the initial data classification model based on a loss function between the image sample category and the target sample label to obtain a data classification model.
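A schematic training step for claim 11 might look as follows; the toy batch, layer sizes, and the Adam optimizer are assumptions, and the loss is simplified to plain cross-entropy here (claim 13 refines it into two loss functions).

    import torch
    import torch.nn as nn

    cnn = nn.Sequential(nn.Conv2d(3, 8, 3, 2, 1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())   # initial convolutional neural network
    mlp = nn.Sequential(nn.Linear(5, 16), nn.ReLU())             # initial multilayer perceptron
    head = nn.Linear(8 + 16, 10)                                 # classifier over 10 sample categories
    params = list(cnn.parameters()) + list(mlp.parameters()) + list(head.parameters())
    optimizer = torch.optim.Adam(params, lr=1e-3)

    # One toy batch of detection image samples, sample position information, and target labels.
    crops = torch.randn(4, 3, 64, 64)
    pos = torch.randn(4, 5)
    labels = torch.randint(0, 10, (4,))

    fused = torch.cat([cnn(crops), mlp(pos)], dim=1)             # sample fusion features
    loss = nn.functional.cross_entropy(head(fused), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()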
12. The method of claim 11, wherein the classifying the sample fusion features based on the sample key features to obtain an image sample category to which the detection image sample belongs comprises:
classifying the sample fusion features based on the sample key features to obtain at least two sample labels and a sample prediction probability value of each sample label; the at least two sample labels comprise the target sample label;
and determining the sample label with the maximum sample prediction probability value as the image sample category to which the detection image sample belongs.
13. The method of claim 12, wherein the loss function comprises a first loss function and a second loss function;
the training the initial data classification model based on the loss function between the image sample category and the target sample label to obtain a data classification model comprises:
generating the first loss function according to the image sample category and the target sample label;
generating a label distribution function according to the target sample label, generating a prediction distribution function according to the at least two sample labels and the sample prediction probability value of each sample label, and generating the second loss function according to the label distribution function and the prediction distribution function;
and training the initial data classification model according to the first loss function and the second loss function to obtain a data classification model.
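One plausible reading of the two loss functions in claim 13 is sketched below: a cross-entropy loss on the predicted category as the first loss, and a KL divergence between a label distribution built from the target sample label and the prediction distribution as the second loss. The label smoothing and the weighting of the two terms are assumptions not taken from the patent.

    import torch
    import torch.nn.functional as F

    def training_loss(logits, target, num_classes, smooth=0.1, kl_weight=1.0):
        first_loss = F.cross_entropy(logits, target)
        # Label distribution function: a smoothed one-hot distribution over the sample labels.
        label_dist = torch.full_like(logits, smooth / (num_classes - 1))
        label_dist.scatter_(1, target.unsqueeze(1), 1.0 - smooth)
        # Prediction distribution function in log space, as expected by kl_div.
        pred_log_dist = F.log_softmax(logits, dim=1)
        second_loss = F.kl_div(pred_log_dist, label_dist, reduction='batchmean')
        return first_loss + kl_weight * second_loss

    logits = torch.randn(4, 5, requires_grad=True)
    loss = training_loss(logits, torch.tensor([0, 2, 1, 4]), num_classes=5)
    loss.backward()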
14. A computer device, comprising a processor, a memory, and an input/output interface;
the processor is connected to the memory and the input/output interface, respectively, wherein the input/output interface is configured to receive data and output data, the memory is configured to store a computer program, and the processor is configured to call the computer program to perform the method according to any one of claims 1 to 10 or to perform the method according to any one of claims 11 to 13.
15. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program adapted to be loaded by a processor and to perform the method of any of claims 1-10 or to perform the method of any of claims 11-13.
CN202011520177.6A 2020-12-21 2020-12-21 Data classification method, computer and readable storage medium Pending CN113516146A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011520177.6A CN113516146A (en) 2020-12-21 2020-12-21 Data classification method, computer and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011520177.6A CN113516146A (en) 2020-12-21 2020-12-21 Data classification method, computer and readable storage medium

Publications (1)

Publication Number Publication Date
CN113516146A true CN113516146A (en) 2021-10-19

Family

ID=78060949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011520177.6A Pending CN113516146A (en) 2020-12-21 2020-12-21 Data classification method, computer and readable storage medium

Country Status (1)

Country Link
CN (1) CN113516146A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114399442A (en) * 2022-01-15 2022-04-26 任介平 Nonlinear image enhancement method and system based on parameter self-adaption
CN114399442B (en) * 2022-01-15 2023-09-12 石坚 Nonlinear image enhancement method and system based on parameter self-adaption
CN114241249A (en) * 2022-02-24 2022-03-25 北京猫猫狗狗科技有限公司 Image classification method and system based on target detection algorithm and convolutional neural network
CN114241249B (en) * 2022-02-24 2022-05-31 北京猫猫狗狗科技有限公司 Image classification method and system based on target detection algorithm and convolutional neural network
CN114677573A (en) * 2022-05-30 2022-06-28 上海捷勃特机器人有限公司 Visual classification method, system, device and computer readable medium
CN115391443A (en) * 2022-08-30 2022-11-25 特斯联科技集团有限公司 Method, device and system for providing artificial intelligence data of Internet of things equipment and terminal equipment
CN115391443B (en) * 2022-08-30 2023-06-16 特斯联科技集团有限公司 Method, device and system for providing artificial intelligent data of Internet of things equipment and terminal equipment

Similar Documents

Publication Publication Date Title
Zhao et al. A surface defect detection method based on positive samples
CN109508688B (en) Skeleton-based behavior detection method, terminal equipment and computer storage medium
CN107358149B (en) Human body posture detection method and device
CN110728209B (en) Gesture recognition method and device, electronic equipment and storage medium
CN113516146A (en) Data classification method, computer and readable storage medium
WO2021139324A1 (en) Image recognition method and apparatus, computer-readable storage medium and electronic device
JP7007829B2 (en) Information processing equipment, information processing methods and programs
CN105574550A (en) Vehicle identification method and device
US9008440B2 (en) Component recognizing apparatus and component recognizing method
CN112836734A (en) Heterogeneous data fusion method and device and storage medium
CN109034694B (en) Production raw material intelligent storage method and system based on intelligent manufacturing
CN111144425B (en) Method and device for detecting shot screen picture, electronic equipment and storage medium
Ravi et al. Sign language recognition with multi feature fusion and ANN classifier
WO2023279799A1 (en) Object identification method and apparatus, and electronic system
CN112200056A (en) Face living body detection method and device, electronic equipment and storage medium
CN115205247A (en) Method, device and equipment for detecting defects of battery pole piece and storage medium
CN113192017A (en) Package defect identification method, device, equipment and storage medium
CN113570615A (en) Image processing method based on deep learning, electronic equipment and storage medium
CN112149528A (en) Panorama target detection method, system, medium and equipment
CN116824641A (en) Gesture classification method, device, equipment and computer storage medium
CN112016592A (en) Domain adaptive semantic segmentation method and device based on cross domain category perception
CN115358981A (en) Glue defect determining method, device, equipment and storage medium
CN115457620A (en) User expression recognition method and device, computer equipment and storage medium
CN113920055A (en) Defect detection method
CN113516148A (en) Image processing method, device and equipment based on artificial intelligence and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40054509

Country of ref document: HK

SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination