CN116958624A - Method, device, equipment, medium and program product for identifying a specified material

Info

Publication number
CN116958624A
Authority
CN
China
Prior art keywords
semantic
feature
prediction
point
feature representation
Prior art date
Legal status
Pending
Application number
CN202310036648.3A
Other languages
Chinese (zh)
Inventor
王昌安
Current Assignee
Tencent Cloud Computing Beijing Co Ltd
Original Assignee
Tencent Cloud Computing Beijing Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Cloud Computing Beijing Co Ltd filed Critical Tencent Cloud Computing Beijing Co Ltd
Priority to CN202310036648.3A
Publication of CN116958624A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/765 - Recognition using pattern recognition or machine learning, using classification, with rules for classification or partitioning the feature space
    • G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/761 - Image or video pattern matching; proximity, similarity or dissimilarity measures in feature spaces
    • G06V 10/806 - Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V 10/82 - Recognition using pattern recognition or machine learning, using neural networks
    • Y02P 90/30 - Computing systems specially adapted for manufacturing (climate change mitigation technologies in the production or processing of goods)


Abstract

The application discloses a method, an apparatus, a device, a medium and a program product for identifying a specified material, and relates to the field of computer vision. The method comprises the following steps: obtaining a prediction semantic graph and a prediction boundary map corresponding to a target image, wherein the target image comprises a specified material region to be identified; taking the classification entropy of first pixel points in the prediction semantic graph as a first screening condition, determining a plurality of first point feature representations from the first semantic feature representation; taking the boundary confidence of second pixel points in the prediction boundary map as a second screening condition, determining a plurality of second point feature representations from the first semantic feature representation; performing feature updating on the first semantic feature representation based on the plurality of first point feature representations and the plurality of second point feature representations to obtain a second semantic feature representation; and predicting the specified material region in the target image based on the second semantic feature representation to obtain a target recognition result. The method improves the accuracy of image recognition for objects made of materials such as glass.

Description

Method, device, equipment, medium and program product for identifying a specified material
Technical Field
The present application relates to the field of computer vision, and in particular, to a method, apparatus, device, medium, and program product for identifying a specified material.
Background
Glass objects are very common in daily life, for example windows, doors, glasses, and the like. Such objects are typically transparent, so the appearance of a glass object has no fixed texture information of its own; its apparent texture is shared with the scene background, and the boundaries of the glass object therefore tend to appear quite blurred.
In the related art, in order to strengthen boundary features, semantic features and boundary features are usually extracted from an image separately, feature points with high boundary confidence are selected and connected pairwise to form a graph structure, and a graph convolution network aggregates neighbor information among the feature points, thereby optimizing the contour features.
However, this method only optimizes points at boundary positions, while glass objects often exhibit surface highlights in addition to blurred boundaries, so its recognition performance is poor.
Disclosure of Invention
The embodiments of the application provide a method, an apparatus, a device, a medium and a program product for identifying a specified material, which can improve the accuracy of image recognition for objects made of materials such as glass. The technical scheme is as follows:
In one aspect, a method for identifying a specified material is provided, the method comprising:
obtaining a prediction semantic graph and a prediction boundary graph corresponding to a target image, wherein the target image comprises a specified material area to be identified, the prediction semantic graph is used for indicating the specified material area obtained by prediction through the first semantic feature representation, and the prediction boundary graph is used for indicating the boundary of the specified material area obtained by prediction;
taking the classification entropy of a first pixel point in the prediction semantic graph as a first screening condition, determining a plurality of first point feature representations from the first semantic feature representations, wherein the classification entropy is used for indicating the determination degree of the prediction category corresponding to the first pixel point;
determining a plurality of second point feature representations from the first semantic feature representations by taking the boundary confidence coefficient of the second pixel point in the prediction boundary map as a second screening condition, wherein the boundary confidence coefficient is used for indicating the probability that the second pixel point is a boundary pixel point;
performing feature updating on the first semantic feature representation based on the plurality of first point feature representations and the plurality of second point feature representations to obtain a second semantic feature representation;
and predicting the specified material region in the target image based on the second semantic feature representation to obtain a target recognition result.
In another aspect, there is provided an apparatus for identifying a specified material, the apparatus comprising:
the acquisition module is used for acquiring a prediction semantic graph and a prediction boundary graph corresponding to a target image, wherein the target image comprises a specified material region to be identified, the prediction semantic graph is used for indicating the specified material region obtained through prediction by means of the first semantic feature representation, and the prediction boundary graph is used for indicating the boundary of the specified material region obtained through prediction;
the first determining module is used for determining a plurality of first point feature representations from the first semantic feature representations by taking the classification entropy of a first pixel point in the prediction semantic graph as a first screening condition, wherein the classification entropy is used for indicating the determination degree of the prediction category corresponding to the first pixel point;
the second determining module is used for determining a plurality of second point feature representations from the first semantic feature representations by taking the boundary confidence coefficient of a second pixel point in the prediction boundary map as a second screening condition, wherein the boundary confidence coefficient is used for indicating the probability that the second pixel point is a boundary pixel point;
the updating module is used for performing feature updating on the first semantic feature representation based on the plurality of first point feature representations and the plurality of second point feature representations to obtain a second semantic feature representation;
and the prediction module is used for predicting the specified material area in the target image based on the second semantic feature representation to obtain a target recognition result.
In another aspect, a computer device is provided, the computer device comprising a processor and a memory, wherein the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the method for identifying a specified material according to any one of the embodiments of the present application.
In another aspect, a computer-readable storage medium is provided, in which at least one program code is stored, the program code being loaded and executed by a processor to implement the method for identifying a specified material according to any one of the embodiments of the present application.
In another aspect, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method for identifying a specified material according to any one of the above embodiments.
The technical scheme provided by the application at least comprises the following beneficial effects:
when the region of the specified material in the target image is identified, the first semantic feature representation is screened by the classification entropy corresponding to the pixel points in the prediction semantic graph of the target image to obtain first point feature representations, and is screened by the boundary confidence corresponding to the pixel points in the prediction boundary map to obtain second point feature representations; the first semantic feature representation is then updated through the first point feature representations and the second point feature representations to obtain a new second semantic feature representation, and prediction is performed on the second semantic feature representation to obtain a target recognition result for the specified material region. By taking the points with high boundary confidence in the prediction boundary map as contour prior knowledge of the region, the method optimizes the uncertain points in the prediction semantic graph, thereby improving the accuracy of image recognition for objects made of materials such as glass.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of defects in the related art when glass recognition is performed;
FIG. 2 is a schematic diagram of an implementation environment provided by an exemplary embodiment of the present application;
FIG. 3 is a flowchart of a method for identifying a specified material according to an exemplary embodiment of the present application;
FIG. 4 is a schematic illustration of a prediction semantic graph and a prediction boundary graph provided by an exemplary embodiment of the present application;
FIG. 5 is a schematic diagram of a target recognition result provided by an exemplary embodiment of the present application;
FIG. 6 is a flowchart of a method for identifying a specified material according to an exemplary embodiment of the present application;
FIG. 7 is a schematic diagram of a structured attention optimization module in an image recognition model, according to an exemplary embodiment of the present application;
FIG. 8 is a flowchart of a method for identifying a specified material according to an exemplary embodiment of the present application;
FIG. 9 is a schematic diagram of the overall structure of an image recognition model provided by an exemplary embodiment of the present application;
FIG. 10 is a schematic diagram of the recognition effect of an image recognition model provided by an exemplary embodiment of the present application;
FIG. 11 is a block diagram illustrating a structure of an identification device for a specified material according to an exemplary embodiment of the present application;
FIG. 12 is a block diagram illustrating a structure of an identification device for a specified material according to an exemplary embodiment of the present application;
Fig. 13 is a schematic diagram of a server according to an exemplary embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
First, the terms involved in the embodiments of the present application will be briefly described:
Artificial Intelligence (AI): a theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML): a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence; it is applied throughout all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and demonstration learning.
Computer Vision (CV): a science that studies how to make machines "see"; more specifically, it uses cameras and computers instead of human eyes to identify, track, and measure targets, and further performs graphic processing so that the result becomes an image more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision technologies typically include image processing, image recognition, image semantic understanding, image retrieval, optical character recognition (Optical Character Recognition, OCR), video processing, video semantic understanding, video content recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, map construction, automatic driving, intelligent transportation, and the like.
Glass objects are quite common in daily life, and images containing glass objects have the following characteristics: the appearance of a glass object has no fixed texture information of its own, its apparent texture being shared with the scene background; the boundary of the glass object is blurred; and local highlights appear on the glass surface.
All of these phenomena prevent points in parts of the glass object from exhibiting clear semantics, making it difficult for a machine to judge points at certain positions (such as corner points, edge points, and highlight points) when segmenting the glass object.
In the related art, a picture is input into a network, and semantic features and boundary features are extracted separately. To strengthen boundary recognition, feature points with high boundary confidence are selected and connected pairwise to form a graph structure, and a graph convolution network aggregates neighbor information among the feature points to optimize the contour features. However, although structural information is exploited by extracting contour features, the method still essentially optimizes only points at boundary positions, and recognition of the interior region of the glass object remains poor. As shown in fig. 1, which illustrates defects of the related art in glass recognition, comparing the glass region mask map 120 and the prediction map 130 corresponding to the target image 110 reveals obvious recognition defects.
In the embodiments of the application, the region or article of the specified material in the image is identified through computer vision technology, and the points with high boundary confidence in the prediction boundary map are used as contour prior knowledge of the region to optimize the uncertain points in the prediction semantic graph, thereby improving the accuracy of image recognition for objects made of materials such as glass.
Referring to fig. 2, a schematic diagram of an implementation environment provided by an exemplary embodiment of the present application is shown. The computer system of the implementation environment comprises: a terminal 210, a server 220, and a communication network 230.
Alternatively, the terminal 210 includes various forms of devices such as a mobile phone, a tablet computer, a desktop computer, a portable notebook computer, a smart home appliance, a vehicle-mounted terminal, an aircraft, a robot, and the like. Data communication is achieved between the terminal 210 and the server 220 via a communication network 230.
Alternatively, the server 220 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server that provides cloud services, cloud databases, cloud security, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content distribution network (Content Delivery Network, CDN), and basic cloud computing services such as big data and an artificial intelligence platform.
Cloud Technology refers to a hosting technology that unifies resources such as hardware, software, and networks in a wide area network or a local area network to realize computing, storage, processing, and sharing of data. It is the general term for the network, information, integration, management platform, and application technologies applied under the cloud computing business model; these resources can form a resource pool and be used flexibly on demand. Cloud computing technology will become an important support: background services of technical network systems, such as video websites, picture websites, and portal websites, require large amounts of computing and storage resources. With the rapid development and application of the internet industry, each article may have its own identification mark in the future, which will need to be transmitted to a background system for logical processing; data at different levels will be processed separately, and all kinds of industry data require strong system backing, which can only be realized through cloud computing.
In some embodiments, the server 220 described above may also be implemented as a node in a blockchain system. Blockchain (Blockchain) is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, encryption algorithms, and the like.
In some embodiments, the method provided by the embodiments of the present application is schematically described by taking interaction between the terminal 210 and the server 220 as an example. In one example, when the terminal 210 needs to identify a specified material region in a target image, an image recognition application in the terminal 210 uploads the target image to the server 220; the server 220 invokes an image recognition module and obtains a target recognition result through the method for identifying a specified material provided by the embodiments of the present application; the server 220 then feeds the target recognition result back to the terminal 210, and the terminal 210 displays the target recognition result after receiving it.
Alternatively, the above application may be implemented as an e-commerce application, a social application, a video application, or the like, which is not specifically limited herein.
In other embodiments, the method provided in the embodiments of the present application may also be implemented independently by the terminal 210. In one example, the terminal 210 is configured with an AI component, the AI component stores an image recognition model capable of completing the corresponding image recognition task, and the AI component outputs a target recognition result according to the input target image and displays the target recognition result.
Referring to fig. 3, a flowchart of a method for identifying a specified material according to an exemplary embodiment of the present application is shown, and in an embodiment of the present application, a schematic description is given by taking the method applied to a server as shown in fig. 2 as an example, and the method may also be applied to a terminal, which is not limited herein specifically. The method comprises the following steps:
step 310, a prediction semantic graph and a prediction boundary graph corresponding to the target image are obtained.
Illustratively, the target image includes a specified material region to be identified.
In some embodiments, the specified texture includes texture that shares a texture or shares a portion of a texture with the environment. Alternatively, the above specified materials may include glass, transparent/translucent plastic, light-transmitting silica gel, light-transmitting resin, reflective mirror, etc., and are not particularly limited herein.
Illustratively, the prediction semantic graph is used for indicating a predicted specified material area through the first semantic feature representation, and the prediction boundary graph is used for indicating the boundary of the predicted specified material area.
In some embodiments, feature extraction is performed on the target image through a first feature extraction network to obtain a first semantic feature representation, and classification results of each pixel point in the image serving as a specified material area are predicted through a first semantic prediction network according to the first semantic feature representation to obtain the prediction semantic graph.
Illustratively, the prediction semantic graph is an n-channel prediction probability map, where n represents the number of categories and n is a positive integer. In one example, when only the specified material region in the target image is identified, the prediction semantic graph is a single-channel prediction probability map. In another example, when different types of specified material regions in the target image are identified, the prediction semantic graph is a multi-channel prediction probability map, and each type of specified material region corresponds to one channel color; for example, if the types of specified material regions include a window type, a cup type, a glasses type, and the like, the different types of specified material regions present different colors on the prediction semantic graph, and likewise, if the types include a glass type, a plastic type, a silica gel type, a resin type, and the like, the different material types present different colors on the prediction semantic graph.
In some embodiments, feature extraction is performed on the target image through a second feature extraction network to obtain a boundary feature representation, and prediction is performed on classification results of each pixel point in the image as a boundary according to the boundary feature representation through a boundary prediction network to obtain the prediction boundary map.
Illustratively, the prediction boundary map is a prediction probability map of m channels, where m represents the number of categories, and m is a positive integer. In one example, m may be implemented as 1, that is, the prediction boundary map is a single-channel prediction probability map; in this case the prediction boundary map only indicates the boundary of the specified material region and does not distinguish the type of the specified material region. In another example, m may be implemented as a positive integer greater than or equal to 2, that is, the boundary channel color in the prediction boundary map is associated with the type of the specified material region to which the boundary belongs.
Alternatively, the first feature extraction network and the second feature extraction network may be different feature extraction networks; alternatively, the first feature extraction network and the second feature extraction network are implemented by different combinations of specified feature extraction networks, without limitation.
In other embodiments, the first semantic feature representation is obtained by extracting features of the target image, the first semantic feature representation is input to a first semantic prediction network, a predicted semantic graph is output, and a boundary of a specified material region in the predicted semantic graph is extracted by a preset boundary detection operator (e.g., canny) to obtain a predicted boundary graph.
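As an illustrative sketch of this alternative (not code from the patent), the following assumes a single-channel predicted semantic probability map held as a NumPy array and uses OpenCV's Canny operator; the binarization and hysteresis thresholds are placeholder assumptions:

```python
import cv2
import numpy as np

def boundary_from_semantic(pred_semantic: np.ndarray) -> np.ndarray:
    """Derive a prediction boundary map from a single-channel semantic map.

    pred_semantic: float array in [0, 1] of shape (H, W).
    Returns a binary boundary map of shape (H, W) with values {0, 255}.
    """
    # Binarize the predicted specified-material region (0.5 is an assumed threshold).
    mask = (pred_semantic > 0.5).astype(np.uint8) * 255
    # Canny extracts the contour of the binarized region; 100/200 are
    # conventional hysteresis thresholds, not values fixed by the patent.
    return cv2.Canny(mask, 100, 200)
```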
In one example, as shown in FIG. 4, which illustrates a schematic diagram of a prediction semantic graph and a prediction boundary graph provided by an exemplary embodiment of the present application, a target image 410 generates a prediction semantic graph 420 from a first semantic feature representation and a prediction boundary graph 430 from a boundary feature representation.
Alternatively, the target image may be uploaded by the terminal with authorization; alternatively, the target image may be obtained from a database.
Step 320, determining a plurality of first point feature representations from the first semantic feature representations by using the classification entropy of the first pixel point in the prediction semantic graph as a first filtering condition.
Illustratively, the above-mentioned classification entropy is used for indicating the determination degree of the prediction category corresponding to the first pixel point, where the above-mentioned classification entropy and the determination degree of the prediction category have a negative correlation, that is, the greater the classification entropy, the higher the uncertainty degree of the prediction result of the first pixel point on the prediction semantic graph.
Alternatively, the first pixel points may be implemented as all the pixel points in the prediction semantic graph; alternatively, the first pixel points may be implemented as a part of the pixel points in the prediction semantic graph.
In some embodiments, when the first pixel points are implemented as a part of the pixel points in the prediction semantic graph, optionally, the part of pixel points are pixel points in a first designated area in the prediction semantic graph, where the first designated area includes the specified material region; for example, an area covering a range of 10 pixels around the specified material region is determined as the first designated area, and the pixel points in the first designated area are determined as the first pixel points.
In some embodiments, a classification entropy corresponding to the first pixel point is obtained from the prediction semantic graph, and a first target pixel point meeting a first screening condition is screened from the first pixel point based on the classification entropy.
Optionally, sorting the first pixel points according to the classification entropy to obtain a first pixel list, and determining N first pixel points with highest classification entropy in the first pixel list as the first target pixel points, wherein N is a positive integer; or in response to the classification entropy of the first pixel point reaching a preset entropy threshold, determining the first pixel point as a first target pixel point.
Illustratively, after the first target pixel point is determined, a first point feature representation corresponding to the first target pixel point is intercepted from the first semantic feature representation according to the pixel position of the first target pixel point in the prediction semantic graph.
Alternatively, the manner of intercepting the first point feature representation may be implemented as at least one of the following:
first, according to a first pixel position of a first target pixel point in a prediction semantic graph, determining a matched target pixel position from a target image, and intercepting a feature representation corresponding to a pixel of the target pixel position in a first semantic feature representation to obtain the first point feature representation.
And secondly, determining a target area comprising a target pixel position matched with the first pixel position from the target image according to the first pixel position of the first target pixel point in the prediction semantic graph, and intercepting a feature representation corresponding to the first semantic feature representation of pixels in the target area to obtain the first point feature representation.
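The following is a minimal PyTorch sketch of this screening-and-interception step, assuming an n-channel semantic probability map and a C-channel first semantic feature representation of the same spatial size; the helper names and the value of N are illustrative assumptions, not identifiers from the patent:

```python
import torch

def classification_entropy(probs: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Per-pixel entropy H = -sum_j p_j * log(p_j) of an (n, H, W) probability map."""
    return -(probs * (probs + eps).log()).sum(dim=0)        # (H, W)

def topk_point_features(score_map: torch.Tensor,
                        features: torch.Tensor,
                        k: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Select the k pixels with the highest score and intercept their features.

    score_map: (H, W) per-pixel score, e.g. classification entropy.
    features:  (C, H, W) first semantic feature representation.
    Returns (point features of shape (k, C), flat pixel indices of shape (k,)).
    """
    _, idx = torch.topk(score_map.flatten(), k)             # top-k pixel positions
    flat_feats = features.flatten(start_dim=1)              # (C, H*W)
    return flat_feats[:, idx].t(), idx                      # (k, C), (k,)

# Usage: first point feature representations from the N most uncertain pixels.
# probs: (n, H, W) prediction semantic map; sem_feats: (C, H, W); N assumed, e.g. 96.
# first_points, first_idx = topk_point_features(classification_entropy(probs), sem_feats, k=96)
```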
Step 330, determining a plurality of second point feature representations from the first semantic feature representations by taking the boundary confidence of the second pixel point in the prediction boundary map as a second screening condition.
Illustratively, the above-mentioned boundary confidence is used for indicating the probability that the second pixel point is the boundary pixel point, where the above-mentioned boundary confidence and the probability that the second pixel point is the boundary pixel point have a positive correlation, that is, the greater the boundary confidence, the higher the probability that the second pixel point is the boundary pixel point on the prediction boundary map.
Alternatively, the second pixel points may be implemented as all the pixel points in the prediction boundary map; alternatively, the second pixel points may be implemented as a part of the pixel points in the prediction boundary map.
In some embodiments, when the second pixel points are implemented as a part of the pixel points in the prediction boundary map, optionally, the part of pixel points are pixel points in a second designated area in the prediction boundary map, where the second designated area includes the boundary of the specified material region; for example, an area covering a range of 5 pixels around the boundary of the specified material region is determined as the second designated area, and the pixel points in the second designated area are determined as the second pixel points.
In some embodiments, a boundary confidence corresponding to the second pixel point is obtained from the prediction boundary map, and a second target pixel point meeting a second screening condition is screened from the second pixel point based on the boundary confidence.
Optionally, sorting the second pixel points according to the boundary confidence coefficient to obtain a second pixel list, and determining M second pixel points with the highest boundary confidence coefficient in the second pixel list as the second target pixel points, wherein M is a positive integer; or, in response to the boundary confidence of the second pixel point reaching the preset confidence threshold, determining the second pixel point as the second target pixel point.
Illustratively, after determining the second target pixel, intercepting a second point feature representation corresponding to the second target pixel from the first semantic feature representation according to the pixel position of the second target pixel in the prediction boundary map.
Alternatively, the manner of intercepting the second point feature representation may be implemented as at least one of the following:
first, according to a second pixel position of a second target pixel point in a prediction boundary diagram, determining a matched target pixel position from a target image, and intercepting a feature representation corresponding to a pixel of the target pixel position in a first semantic feature representation to obtain the second point feature representation.
And secondly, determining a target area comprising a target pixel position matched with the second pixel position from the target image according to the second pixel position of the second target pixel point in the prediction semantic graph, and intercepting the feature representation corresponding to the first semantic feature representation of the pixels in the target area to obtain the second point feature representation.
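Under the same assumptions, the second screening condition can reuse the topk_point_features helper sketched above, with the boundary confidence map as the score:

```python
# boundary_conf: (H, W) boundary confidence from the prediction boundary map;
# sem_feats: (C, H, W) first semantic feature representation; M assumed, e.g. 96.
second_points, second_idx = topk_point_features(boundary_conf, sem_feats, k=96)
```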
Step 340, performing feature updating on the first semantic feature representation based on the plurality of first point feature representations and the plurality of second point feature representations to obtain a second semantic feature representation.
In some embodiments, the first point feature representation is feature updated with the second point feature representation to obtain a third point feature representation, and the first semantic feature representation is feature updated with the third point feature representation to obtain a second semantic feature representation.
Alternatively, the feature update manner of the second point feature representation to the first point feature representation may be implemented as at least one of the following manners:
first, the first point feature representation is updated based on feature similarities between the first point feature representation and the second point feature representation to obtain a third point feature representation.
Illustratively, calculating the feature similarity between the first point feature representation and the second point feature representation, and adding the first point feature representation and the second point feature representation to obtain a third point feature representation in response to the feature similarity between the first point feature representation and the second point feature representation reaching a preset similarity threshold, wherein the feature dimensions of the first point feature representation and the second point feature representation are the same.
Alternatively, the above feature similarity may be indicated by at least one distance data of a euclidean distance, a cosine distance, a mahalanobis distance, a hamming distance, or the like of the first point feature representation and the second point feature representation in the feature space.
And secondly, carrying out feature updating on the first point feature representation based on the clustering results of the plurality of second point feature representations to obtain a third point feature representation.
Schematically, clustering the plurality of second point feature representations to obtain at least two clusters, determining a target cluster from the at least two clusters according to the feature similarity between the clusters and the first point feature representation, and performing feature fusion on the second point feature representation and the first point feature representation in the target cluster to obtain the third point feature representation.
In some embodiments, each cluster corresponds to a second point feature representation as a cluster center, and the target cluster is determined from at least two clusters by calculating a similarity between the second point feature representation and the first point feature representation as the cluster center.
In some embodiments, feature similarities between the second point feature representation and the first point feature representation in the target cluster are calculated respectively, the feature similarities are used as weights to carry out weighted summation on the second point feature representation to obtain cluster feature representations corresponding to the target cluster, and the cluster feature representations and the first point feature representations are added to obtain the third point feature representation.
Thirdly, feature fusion is carried out on the second point feature representations based on the attention mechanism so as to update the features of the first point feature representations and obtain third point feature representations.
Illustratively, the attention mechanism is used for establishing a relation among a Query (Query), a Key (Key) and a Value (Value), and the process includes calculating an attention score (weight coefficient) through the Query and the Key, and carrying out weighted summation on the Value according to the attention score.
In the embodiment of the application, the first point feature representation is taken as Q, the second point feature representation is taken as K and V, so that the attention feature representation is determined, and the attention feature representation and the first point feature representation are added to obtain the third point feature representation.
In some embodiments, the attention feature representation described above may be generated by a multi-head attention mechanism.
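A sketch of this attention-based update under the same assumptions, using PyTorch's built-in multi-head attention; the embedding dimension and head count are illustrative, and the residual addition follows the description above:

```python
import torch
import torch.nn as nn

C = 256                                      # assumed feature dimension
attn = nn.MultiheadAttention(embed_dim=C, num_heads=8, batch_first=True)

def update_first_points(first_points: torch.Tensor,
                        second_points: torch.Tensor) -> torch.Tensor:
    """first_points: (N, C) used as queries; second_points: (M, C) as keys/values."""
    q = first_points.unsqueeze(0)            # (1, N, C)
    kv = second_points.unsqueeze(0)          # (1, M, C)
    attn_out, _ = attn(q, kv, kv)            # attention feature representation
    # Adding the attention features to the original query features yields
    # the third point feature representation.
    return attn_out.squeeze(0) + first_points    # (N, C)
```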
Optionally, feature replacement is performed at the designated feature positions in the first semantic feature representation using the third point feature representation; alternatively, the feature at the designated feature position in the first semantic feature representation and the third point feature representation are added.
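A sketch of the feature-replacement variant, assuming the flat pixel indices retained during screening (see the selection sketch earlier):

```python
import torch

def scatter_points(features: torch.Tensor,
                   point_feats: torch.Tensor,
                   idx: torch.Tensor) -> torch.Tensor:
    """Replace features at the designated pixel positions.

    features:    (C, H, W) first semantic feature representation.
    point_feats: (N, C) third point feature representations.
    idx:         (N,) flat pixel indices from the screening step.
    """
    C, H, W = features.shape
    flat = features.flatten(start_dim=1).clone()    # (C, H*W)
    flat[:, idx] = point_feats.t()                  # feature replacement
    return flat.view(C, H, W)                       # second semantic feature representation
```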
Step 350, predicting the specified material area in the target image based on the second semantic feature representation to obtain a target recognition result.
In some embodiments, the second semantic feature representation is input into a second semantic prediction network and output results in a target recognition result.
Alternatively, the network structure between the second semantic prediction network and the first semantic prediction network may be the same or different; alternatively, when the network structures between the second semantic prediction network and the first semantic prediction network are the same, the network weight information between the second semantic prediction network and the first semantic prediction network may be the same or different.
Referring to fig. 5, a schematic diagram of a target recognition result 520 provided by an exemplary embodiment of the present application is shown, and a glass region in a target image 510 is recognized to obtain a target recognition result 520 corresponding to the glass region.
In summary, according to the method for identifying a specified material provided by the embodiments of the present application, when the region of the specified material in the target image is identified, the first semantic feature representation is screened by the classification entropy corresponding to the pixel points in the prediction semantic graph of the target image to obtain first point feature representations, and is screened by the boundary confidence corresponding to the pixel points in the prediction boundary map to obtain second point feature representations; the first semantic feature representation is then updated through the first point feature representations and the second point feature representations to obtain a new second semantic feature representation, and prediction is performed on the second semantic feature representation to obtain a target recognition result for the specified material region. By taking the points with high boundary confidence in the prediction boundary map as contour prior knowledge of the region, the method optimizes the uncertain points in the prediction semantic graph, thereby improving the accuracy of image recognition for objects made of materials such as glass.
Referring to fig. 6, a flowchart of a method for identifying a specified material according to an exemplary embodiment of the present application is shown, in which a process for implementing structured attention optimization by a first point feature representation and a second point feature representation is schematically illustrated, the method includes:
step 610, obtaining a prediction semantic graph and a prediction boundary graph corresponding to the target image.
Illustratively, the target image includes a specified material region to be identified.
In some embodiments, the specified texture includes texture that shares a texture or shares a portion of a texture with the environment. Alternatively, the above specified materials may include glass, transparent/translucent plastic, light-transmitting silica gel, light-transmitting resin, reflective mirror, etc., and are not particularly limited herein.
Illustratively, the prediction semantic graph is used for indicating a predicted specified material area through the first semantic feature representation, and the prediction boundary graph is used for indicating the boundary of the predicted specified material area.
Step 620, determining a plurality of first point feature representations from the first semantic feature representations by using the classification entropy of the first pixel point in the prediction semantic graph as a first filtering condition.
Illustratively, the classification entropy corresponding to each of a plurality of first pixel points in the prediction semantic graph is obtained; the plurality of first pixel points are sorted by classification entropy to obtain a first pixel list; the N first pixel points with the highest classification entropy in the first pixel list are determined as first target pixel points meeting the first screening condition, N being a positive integer; and the first point feature representations corresponding to the first target pixel points are determined from the first semantic feature representation based on the pixel positions of the first target pixel points in the prediction semantic graph.
In the embodiment of the application, each first pixel point in the prediction semantic graph is traversed, the classification entropy corresponding to each first pixel point is determined, N first pixel points with the largest classification entropy are selected according to the descending order of the classification entropy, and the corresponding first point feature representation is determined from the first semantic feature representation according to the positions of the first pixel points.
In the embodiment of the application, since the prediction semantic graph holds, for each pixel point of the target image, the classification result of whether that pixel point belongs to the specified material region, that is, the prediction semantic graph indicates for each pixel the predicted probability of forming part of the specified material region, the classification entropy is determined from the classification probabilities indicated by the prediction semantic graph for each pixel point. In one example, the classification entropy H of the ith first pixel point is calculated by formula one.
Formula one: $H = -\sum_{j \in [0, n]} p(x_j)\log p(x_j)$

where $p(x_j)$ is the classification probability of the ith first pixel point for the jth category, and $n$ indicates the number of channels of the prediction semantic graph, that is, the number of categories of the classification prediction.
In other embodiments, the classification entropy corresponding to the ith first pixel point may also be determined by the classification probabilities corresponding to the plurality of first pixel points in the designated pixel area centered on the ith first pixel point.
Step 630, determining a plurality of second point feature representations from the first semantic feature representations by taking the boundary confidence of the second pixel point in the prediction boundary map as a second filtering condition.
Schematically, the boundary confidence corresponding to each of a plurality of second pixel points in the prediction boundary map is obtained; the plurality of second pixel points are sorted by boundary confidence to obtain a second pixel list; the M second pixel points with the highest boundary confidence in the second pixel list are determined as second target pixel points meeting the second screening condition, M being a positive integer; and the second point feature representations corresponding to the second target pixel points are determined from the first semantic feature representation based on the pixel positions of the second target pixel points in the prediction boundary map.
In the embodiment of the application, each second pixel point in the prediction boundary map is traversed, the boundary confidence corresponding to each second pixel point is determined, the M second pixel points with the largest boundary confidence are selected in descending order of boundary confidence, and the corresponding second point feature representations are determined from the first semantic feature representation according to the positions of these second pixel points.
Step 641, updating the first point feature representation with the plurality of second point feature representations based on the multi-head attention mechanism to obtain a third point feature representation.
Illustratively, a first point feature representation is used as a Query feature (Query), a second point feature representation is used as a Key feature (Key) and a Value feature (Value), an attention feature representation is generated based on a multi-head attention mechanism, and the attention feature representation and the first point feature representation are added to obtain a third point feature representation.
In one example, the attention feature representation determined by the multi-head attention mechanism is shown in formula two.

Formula two: $\mathrm{Attention}(q, k, v) = \mathrm{softmax}\left(\frac{q k^{T}}{\sqrt{d_k}}\right) v$

where $q$ is the query feature, $k$ is the key feature, $v$ is the value feature, and $d_k$ represents the dimension of the input features $q$, $k$, $v$.
In the embodiment of the application, after the attention characteristic representation is obtained through a multi-head attention mechanism, the attention characteristic representation and the original query characteristic (first point characteristic representation) are added, so that an optimized third point characteristic representation is obtained.
In some embodiments, when the prediction semantic graph includes specified material regions of a plurality of categories, the specified material regions of each category correspond to at least two first point feature representations. Before generating the third point feature representation, the plurality of second point feature representations are clustered to obtain at least two clusters, and the first feature distributions corresponding to the at least two clusters are determined; the second feature distribution corresponding to the at least two first point feature representations belonging to the same category is obtained; a target cluster is determined from the at least two clusters based on the distribution similarity between the first feature distributions and the second feature distribution; and the second point feature representations in the target cluster are used to update the first point feature representations, that is, the first point feature representations are feature-updated through the second point feature representations in the target cluster based on the multi-head attention mechanism to obtain the third point feature representations.
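A hedged sketch of this clustering variant, using k-means from scikit-learn on the second point feature representations and cosine similarity of mean features as the distribution-similarity measure; the measure and the cluster count are assumptions, since the patent does not fix them:

```python
import numpy as np
from sklearn.cluster import KMeans

def select_target_cluster(second_points: np.ndarray,
                          first_points: np.ndarray,
                          n_clusters: int = 2) -> np.ndarray:
    """second_points: (M, C); first_points: (N, C), all of one category.

    Returns the second point feature representations in the cluster whose
    feature distribution is closest to that of the first point features;
    these are then used as keys/values in the attention update.
    """
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(second_points)
    query_mean = first_points.mean(axis=0)                 # second feature distribution (mean)
    best_c, best_sim = 0, -np.inf
    for c in range(n_clusters):
        center = second_points[labels == c].mean(axis=0)   # cluster's feature distribution
        sim = float(center @ query_mean /
                    (np.linalg.norm(center) * np.linalg.norm(query_mean) + 1e-8))
        if sim > best_sim:
            best_c, best_sim = c, sim
    return second_points[labels == best_c]
```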
Step 642, performing feature updating on the first semantic feature representation based on the third point feature representation to obtain a second semantic feature representation.
Schematically, a target pixel position corresponding to the third point feature representation is determined, and the feature corresponding to the target pixel position in the first semantic feature representation is replaced by the third point feature representation, so that a second semantic feature representation is obtained. Since the third point feature representation is updated from the first point feature representation, the target pixel position corresponding to the third point feature representation is the pixel position of the first pixel point corresponding to the first point feature representation.
Step 650, predicting the specified material area in the target image based on the second semantic feature representation, so as to obtain a target recognition result.
In some embodiments, the second semantic feature representation is input into a second semantic prediction network and output results in a target recognition result.
Referring to fig. 7, a schematic structural diagram of a structured attention optimization module in an image recognition model according to an exemplary embodiment of the present application is shown. In the structured attention optimization module, the first semantic feature representation 701 is passed through a first convolutional semantic prediction network to obtain the prediction semantic graph 702, and a plurality of first point feature representations 703 are determined from the first semantic feature representation 701 according to the prediction semantic graph 702. The boundary feature representation 704 is passed through a convolutional boundary prediction network to obtain the prediction boundary map 705, and a plurality of second point feature representations 706 are determined from the first semantic feature representation 701 according to the prediction boundary map 705. An attention feature representation 707 is obtained through the attention mechanism module 710 and is inserted into the first semantic feature representation 701 to obtain the second semantic feature representation 708; the second semantic feature representation 708 is input into a second convolutional semantic prediction network, and the target recognition result 709 is output. The first convolutional semantic prediction network and the second convolutional semantic prediction network share network weight information.
In summary, in the method for identifying a specified material according to the embodiment of the present application, when the area of the specified material in a target image is identified, the first semantic feature representation is screened by the classification entropy of the pixel points in the predicted semantic graph of the target image to obtain first point feature representations, and by the boundary confidence of the pixel points in the prediction boundary map to obtain second point feature representations. The first semantic feature representation is then updated with the first and second point feature representations to obtain a new second semantic feature representation, and prediction is performed on the second semantic feature representation to obtain the target recognition result for the specified material area. In this way, points with high boundary confidence in the prediction boundary map serve as contour prior knowledge of the region, so that uncertain points in the predicted semantic graph are refined and the image recognition accuracy for objects of materials such as glass is improved.
Referring to fig. 8, a flowchart of a method for identifying a specified material according to an exemplary embodiment of the present application is shown. This embodiment schematically describes how the prediction semantic graph and the prediction boundary map of the target image are obtained, where steps 811 to 813 are sub-steps of step 310 or step 610. The method includes:
Step 811, performing feature extraction on the target image to obtain a first semantic feature representation and a boundary feature representation, respectively.
In some embodiments, feature extraction is performed on the target image through different feature extraction networks, resulting in a first semantic feature representation and a boundary feature representation, respectively. Illustratively, the target image is input to a first feature extraction network to obtain a first semantic feature representation, and the target image is input to a second feature extraction network to obtain a boundary feature representation.
In other embodiments, the first semantic feature representation and the boundary feature representation are different outputs obtained from different combinations of layers in the same feature extraction network. In one example, feature extraction is performed on the target image to obtain an intermediate feature representation, multi-scale feature extraction is performed on the intermediate feature representation to obtain the first semantic feature representation, and feature fusion is performed on the intermediate feature representation and the first semantic feature representation to obtain the boundary feature representation.
Alternatively, the above-described feature extraction network may be implemented as at least one of a deep neural network (Deep Neural Network, DNN), a recurrent neural network (Recurrent Neural Network, RNN), a convolutional neural network (Convolutional Neural Network, CNN), a Transformer, or the like.
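A minimal sketch of the shared-backbone variant described above, in which one intermediate representation feeds both branches. The backbone and ASPP modules are stand-ins (assumptions) for the concrete networks of fig. 9, and the channel sizes are illustrative.

```python
import torch
import torch.nn as nn

class TwoBranchExtractor(nn.Module):
    """One intermediate feature feeds the semantic branch (multi-scale
    extraction) and the boundary branch (fusion of intermediate + semantic)."""
    def __init__(self, backbone, aspp, mid_ch=256, sem_ch=256):
        super().__init__()
        self.backbone = backbone            # -> intermediate representation
        self.aspp = aspp                    # multi-scale feature extraction
        self.fuse = nn.Sequential(          # Concat+Conv fusion, as in fig. 9
            nn.Conv2d(mid_ch + sem_ch, sem_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True))

    def forward(self, image):
        mid = self.backbone(image)          # intermediate feature representation
        sem = self.aspp(mid)                # first semantic feature representation
        sem_up = nn.functional.interpolate( # match spatial sizes before fusion
            sem, size=mid.shape[-2:], mode="bilinear", align_corners=False)
        boundary = self.fuse(torch.cat([mid, sem_up], dim=1))  # boundary features
        return sem, boundary
```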
Step 812, inputting the first semantic feature representation into a first semantic prediction network and outputting the predicted semantic graph.
Alternatively, the first semantic prediction network described above may be implemented as at least one of a DNN, an RNN, a CNN, a Transformer, or the like.
In some embodiments, the first semantic prediction network includes neurons for completing decision distribution generation. Illustratively, the first semantic feature representation is a hidden-layer feature of the data to be identified: a high-dimensional vector that is further used to produce the decision result. In the decision distribution generation process, a logits vector is first generated by a fully connected layer, as shown in formula three, where W is the decision-layer parameter matrix and h is the first semantic feature representation.
Formula three: $z = W \cdot h$
The logits vector output by the fully connected layer is input into a logistic regression (softmax) layer to obtain the prediction probability of each category, where the prediction probability $p_i$ of the i-th category is given by formula four, $z_i$ is the logit corresponding to the i-th category, i is a positive integer, and when the prediction semantic graph only distinguishes whether a pixel point belongs to the specified material area, i = 1.

Formula four: $p_i = \dfrac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}$
After the prediction probability of each category is determined, the decision distribution generation part outputs the category with the largest prediction probability as the first classification result, as shown in formula five, where the model has C classification categories and $p_i$ is the prediction probability of the i-th category.

Formula five: $\mathrm{conf} = \max p_i,\quad i = 1, 2, \dots, C$
The recognition process of the first semantic prediction network on the first semantic feature representation can be expressed as formula six, where p is the output predicted semantic graph, $p_i$ is the prediction probability of the i-th category, $i \in [1, C]$, x is the target image, $\theta$ denotes the model parameters of the feature extraction network, and W denotes the model parameters of the first semantic prediction network.

Formula six: $p = (p_1, \dots, p_C) = g(x; \theta, W)$
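The following sketch transcribes formulas three to five per pixel, with a 1×1 convolution applying the decision matrix W at every spatial position; the channel sizes and the two-class setup are illustrative assumptions.

```python
import torch
import torch.nn as nn

C = 2                                    # e.g. background vs. specified material
head = nn.Conv2d(256, C, kernel_size=1)  # per-pixel W·h (formula three)

h = torch.randn(1, 256, 64, 64)          # first semantic feature representation
z = head(h)                              # logits, shape (1, C, H, W)
p = torch.softmax(z, dim=1)              # formula four: p_i = e^{z_i} / sum_j e^{z_j}
conf, cls = p.max(dim=1)                 # formula five: conf = max_i p_i
# p plays the role of the predicted semantic graph of formula six
```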
Step 813, inputting the boundary feature representation into a boundary prediction network and outputting the prediction boundary map.
In some embodiments, the boundary prediction network includes a fully connected layer and a logistic regression (softmax) layer. After the boundary feature representation is input into the boundary prediction network, the fully connected layer generates a logits vector, which is input into the softmax layer to obtain the prediction probability that a second pixel point is a point on the boundary of the specified material area; this prediction probability is used as the boundary confidence.
In one example, please refer to fig. 9, which is a schematic diagram of the overall structure of an image recognition model 900 provided by an exemplary embodiment of the present application. The image recognition model 900 includes a feature extraction portion and a prediction portion, where the feature extraction portion includes a residual network (ResNet-50) 910, an atrous spatial pyramid pooling (Atrous Spatial Pyramid Pooling, ASPP) layer 920, and a merging convolution (Concat+Conv) layer 930, and the prediction portion includes a spatial autoregressive model (SAR) 940. The ASPP layer 920 extracts features covering different receptive field sizes by introducing convolutions with different atrous rates together with pooling operations.
The target image 901 is input into the image recognition model 900 and passes through multi-layer feature extraction in the residual network 910. The residual network 910 feeds its output into the ASPP layer 920, which outputs the first semantic feature representation F_s; F_s is input both into the SAR module 940 and into the merging convolution layer 930. The residual network 910 includes multiple feature extraction layers and also feeds the intermediate feature F_1 of its first feature extraction layer into the merging convolution layer 930. The merging convolution layer 930 fuses the intermediate feature F_1 with the first semantic feature representation F_s and outputs the boundary feature representation F_b, which is input into the SAR module 940.
In the SAR module 940, the boundary feature representation F_b is used for prediction to obtain the prediction boundary map 902, and the first semantic feature representation F_s undergoes feature updating and prediction to obtain the target recognition result 903. The boundary loss L_b between the prediction boundary map 902 and the labeled boundary map 904 and the semantic loss L_s between the target recognition result 903 and the labeled semantic graph 905 are calculated to train the image recognition model.
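A minimal sketch of the training objective of fig. 9; cross-entropy for both terms and the simple weighted sum are illustrative assumptions, since the text names L_b and L_s but does not fix their form or weighting.

```python
import torch
import torch.nn.functional as F

def training_loss(pred_boundary, gt_boundary, pred_semantic, gt_semantic,
                  boundary_weight=1.0):
    """Joint objective of fig. 9: semantic loss L_s plus boundary loss L_b."""
    l_b = F.cross_entropy(pred_boundary, gt_boundary)  # boundary map vs. labeled boundary map
    l_s = F.cross_entropy(pred_semantic, gt_semantic)  # recognition result vs. labeled semantic graph
    return l_s + boundary_weight * l_b
```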
In summary, in the method for identifying a specified material according to the embodiment of the present application, when the area of the specified material in a target image is identified, the first semantic feature representation is screened by the classification entropy of the pixel points in the predicted semantic graph of the target image to obtain first point feature representations, and by the boundary confidence of the pixel points in the prediction boundary map to obtain second point feature representations. The first semantic feature representation is then updated with the first and second point feature representations to obtain a new second semantic feature representation, and prediction is performed on the second semantic feature representation to obtain the target recognition result for the specified material area. In this way, points with high boundary confidence in the prediction boundary map serve as contour prior knowledge of the region, so that uncertain points in the predicted semantic graph are refined and the image recognition accuracy for objects of materials such as glass is improved.
In one example, please refer to fig. 10, which schematically illustrates the recognition effect of the image recognition model provided by the method according to an exemplary embodiment of the present application. The input image 1001 corresponds to the mask image 1002 of the glass region; the recognition effect achieved by an algorithm in the related art is shown as the first recognition result 1003, and the recognition effect achieved by the method provided by the embodiment of the present application is shown as the second recognition result 1004. In the first recognition result 1003, the recognition defects caused by highlights, blurred edges, and similar problems of the glass region are clearly noticeable.
Referring to fig. 11, a block diagram of a device for identifying a specified material according to an exemplary embodiment of the present application is shown, where the device includes the following modules:
an obtaining module 1110, configured to obtain a prediction semantic graph and a prediction boundary graph corresponding to a target image, where the target image includes a specified material area to be identified, the prediction semantic graph is used to indicate the specified material area obtained by prediction through a first semantic feature representation, and the prediction boundary graph is used to indicate a boundary of the specified material area obtained by prediction;
The first determining module 1120 is configured to determine a plurality of first point feature representations from the first semantic feature representations by using a classification entropy of a first pixel point in the prediction semantic graph as a first filtering condition, where the classification entropy is used to indicate a determination degree of a prediction category corresponding to the first pixel point;
a second determining module 1130, configured to determine a plurality of second point feature representations from the first semantic feature representations using a boundary confidence level of a second pixel point in the prediction boundary map as a second filtering condition, where the boundary confidence level is used to indicate a probability that the second pixel point is a boundary pixel point;
an updating module 1140, configured to perform feature updating on the first semantic feature representation based on the plurality of first point feature representations and the plurality of second point feature representations to obtain a second semantic feature representation;
and a prediction module 1150, configured to predict the specified material region in the target image based on the second semantic feature representation, so as to obtain a target recognition result.
In some alternative embodiments, as shown in fig. 12, the updating module 1140 includes:
a first updating unit 1141, configured to perform feature updating on the first point feature representation through the plurality of second point feature representations based on a multi-head attention mechanism, to obtain a third point feature representation;
And a second updating unit 1142, configured to perform feature updating on the first semantic feature representation based on the third point feature representation, to obtain the second semantic feature representation.
In some alternative embodiments, the updating module 1140 further comprises:
a generating unit 1143, configured to generate an attention feature representation based on the multi-headed attention mechanism with the first point feature representation as a query feature and the second point feature representation as a key feature and a value feature;
the first updating unit 1141 is further configured to add the attention feature representation and the first point feature representation to obtain the third point feature representation.
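As a sketch of this query/key-value arrangement with a residual add, the following assumes PyTorch's nn.MultiheadAttention and a feature dimension divisible by the head count; it is illustrative, not the patent's fixed implementation.

```python
import torch
import torch.nn as nn

def update_first_points(first_pts, second_pts, num_heads=4):
    """Multi-head attention update: first point features as queries, second
    point features as keys and values; third = attention output + first.

    first_pts:  (N, C) query features; C must be divisible by num_heads
    second_pts: (M, C) key and value features
    """
    dim = first_pts.size(-1)
    attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
    q = first_pts.unsqueeze(0)        # (1, N, C) query
    kv = second_pts.unsqueeze(0)      # (1, M, C) key and value
    attn_out, _ = attn(q, kv, kv)     # attention feature representation
    return (attn_out + q).squeeze(0)  # third point feature representation
```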
In some alternative embodiments, the updating module 1140 further comprises:
a determining unit 1144, configured to determine a target pixel position corresponding to the third point feature representation;
the second updating unit 1142 is further configured to replace a feature corresponding to the target pixel position in the first semantic feature representation with the third point feature representation, to obtain the second semantic feature representation.
In some optional embodiments, the prediction semantic graph includes a plurality of classes of the specified material areas, where each class of the specified material area corresponds to at least two first point feature representations;
The first updating unit 1141 is further configured to cluster the plurality of second point feature representations to obtain at least two clusters and determine first feature distributions corresponding to the at least two clusters respectively; acquiring second feature distribution corresponding to the at least two first point feature representations belonging to the same category; determining a target cluster from the at least two clusters based on the distribution similarity between the first feature distribution and the second feature distribution; and carrying out feature updating on the first point feature representation through the second point feature representation in the target cluster based on the multi-head attention mechanism to obtain the third point feature representation.
In some alternative embodiments, the first determining module 1120 includes:
a first obtaining unit 1121, configured to obtain the classification entropy corresponding to each of the plurality of first pixel points in the prediction semantic graph;
a first sorting unit 1122, configured to sort the plurality of first pixel points by the classification entropy to obtain a first pixel list;
a first filtering unit 1123, configured to determine N first pixel points with highest classification entropy in the first pixel list as first target pixel points that meet the first filtering condition, where the classification entropy and the determination degree of the prediction class are in a negative correlation, and N is a positive integer;
The first filtering unit 1123 is further configured to determine, from the first semantic feature representations, a first point feature representation corresponding to the first target pixel point based on a pixel position of the first target pixel point in the prediction semantic graph.
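Both screening modules reduce to the same top-k selection followed by a gather from the first semantic feature representation; a minimal sketch, with the per-pixel score map (classification entropy here, boundary confidence for the second module) assumed to be precomputed:

```python
import torch

def topk_point_features(score_map, sem_feat, k):
    """Select the k pixels with the highest score and gather their features.

    score_map: (H, W) per-pixel score — classification entropy for the first
               screening condition, boundary confidence for the second
    sem_feat:  (C, H, W) first semantic feature representation
    Returns (k, C) point feature representations and their flat pixel indices.
    """
    C, H, W = sem_feat.shape
    _, idx = score_map.view(-1).topk(k)          # highest-scoring pixel positions
    feats = sem_feat.view(C, H * W)[:, idx].t()  # gather point features
    return feats, idx

# classification entropy of a predicted semantic graph p of shape (C, H, W):
# entropy = -(p * p.clamp_min(1e-8).log()).sum(dim=0)
```

The same helper serves the second determining module by passing the boundary confidence map as the score.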
In some alternative embodiments, the second determining module 1130 includes:
a second obtaining unit 1131, configured to obtain the boundary confidence degrees corresponding to the plurality of second pixel points in the prediction boundary map respectively;
a second sorting unit 1132, configured to sort the plurality of second pixel points by the boundary confidence to obtain a second pixel list;
a second filtering unit 1133, configured to determine M second pixel points with the highest boundary confidence in the second pixel list as second target pixel points that meet the second filtering condition, where M is a positive integer;
the second filtering unit 1133 is further configured to determine, from the first semantic feature representations, a second point feature representation corresponding to the second target pixel point based on a pixel position of the second target pixel point in the prediction boundary map.
In some alternative embodiments, the acquiring module 1110 includes:
An extraction unit 1111, configured to perform feature extraction on the target image to obtain the first semantic feature representation and boundary feature representation respectively;
a first prediction unit 1112, configured to input the first semantic feature representation into a first semantic prediction network and output the predicted semantic graph;

the second prediction unit 1113 is configured to input the boundary feature representation into a boundary prediction network and output the prediction boundary map.
In some optional embodiments, the extracting unit 1111 is further configured to perform feature extraction on the target image to obtain an intermediate feature representation; performing multi-scale feature extraction on the intermediate feature representation to obtain the first semantic feature representation; and carrying out feature fusion on the intermediate feature representation and the first semantic feature representation to obtain the boundary feature representation.
In some optional embodiments, the prediction module 1150 is configured to input the second semantic feature representation into a second semantic prediction network, and output the target recognition result;
wherein the first semantic prediction network and the second semantic prediction network share network weight information.
In summary, when the identification device for a specified material according to the embodiment of the present application identifies the area of the specified material in a target image, the first semantic feature representation is screened by the classification entropy of the pixel points in the predicted semantic graph of the target image to obtain first point feature representations, and by the boundary confidence of the pixel points in the prediction boundary map to obtain second point feature representations. The first semantic feature representation is then updated with the first and second point feature representations to obtain a new second semantic feature representation, and prediction is performed on the second semantic feature representation to obtain the target recognition result for the specified material area. In this way, points with high boundary confidence in the prediction boundary map serve as contour prior knowledge of the region, so that uncertain points in the predicted semantic graph are refined and the image recognition accuracy for objects of materials such as glass is improved.
It should be noted that the division into the above functional modules in the identification device provided by the foregoing embodiment is merely an example; in practical applications, the above functions may be allocated to different functional modules as required, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the device for identifying a specified material provided by the foregoing embodiment belongs to the same concept as the method embodiments for identifying a specified material; its specific implementation process is detailed in the method embodiments and is not repeated here.
Fig. 13 is a schematic diagram showing a structure of a server according to an exemplary embodiment of the present application. Specifically, the following structure is included.
The server 1300 includes a central processing unit (Central Processing Unit, CPU) 1301, a system Memory 1304 including a random access Memory (Random Access Memory, RAM) 1302 and a Read Only Memory (ROM) 1303, and a system bus 1305 connecting the system Memory 1304 and the central processing unit 1301. The server 1300 also includes a mass storage device 1306 for storing an operating system 1313, application programs 1314, and other program modules 1315.
The mass storage device 1306 is connected to the central processing unit 1301 through a mass storage controller (not shown) connected to the system bus 1305. The mass storage device 1306 and its associated computer-readable media provide non-volatile storage for the server 1300. That is, the mass storage device 1306 may include a computer readable medium (not shown) such as a hard disk or compact disc read only memory (Compact Disc Read Only Memory, CD-ROM) drive.
Computer readable media may include computer storage media and communication media without loss of generality. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, erasable programmable read-only memory (Erasable Programmable Read Only Memory, EPROM), charged erasable programmable read-only memory (Electrically Erasable Programmable Read Only Memory, EEPROM), flash memory or other solid state memory technology, CD-ROM, digital versatile disks (Digital Versatile Disc, DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will recognize that computer storage media are not limited to the ones described above. The system memory 1304 and mass storage device 1306 described above may be collectively referred to as memory.
According to various embodiments of the present application, the server 1300 may also be operated through a remote computer connected to a network, such as the Internet. That is, the server 1300 may be connected to the network 1312 through a network interface unit 1311 coupled to the system bus 1305, or the network interface unit 1311 may be used to connect to other types of networks or remote computer systems (not shown).
The memory also includes one or more programs, which are stored in the memory and configured to be executed by the CPU.
The embodiment of the application also provides a computer device, which comprises a processor and a memory, wherein at least one instruction, at least one section of program, code set or instruction set is stored in the memory, and the at least one instruction, the at least one section of program, the code set or the instruction set is loaded and executed by the processor to realize the identification method of the specified materials provided by the above method embodiments. Alternatively, the computer device may be a terminal or a server.
Embodiments of the present application further provide a computer readable storage medium having at least one instruction, at least one program, a code set, or an instruction set stored thereon, where the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor, so as to implement the method for identifying a specified material provided by the foregoing method embodiments.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method for identifying a specified material according to any one of the above embodiments.
Alternatively, the computer-readable storage medium may include: read-only memory (ROM), random access memory (RAM), solid-state drive (SSD), optical disc, or the like. The random access memory may include resistive random access memory (ReRAM) and dynamic random access memory (DRAM). The foregoing embodiment numbers of the present application are merely for description and do not represent the relative merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing descriptions are merely preferred embodiments of the present application and are not intended to limit the present application; the scope of protection of the present application is defined by the appended claims.

Claims (14)

1. A method for identifying a specified material, the method comprising:
obtaining a prediction semantic graph and a prediction boundary graph corresponding to a target image, wherein the target image comprises a specified material area to be identified, the prediction semantic graph is used for indicating the specified material area obtained by prediction through a first semantic feature representation, and the prediction boundary graph is used for indicating the boundary of the specified material area obtained by prediction;
taking the classification entropy of a first pixel point in the prediction semantic graph as a first screening condition, determining a plurality of first point feature representations from the first semantic feature representations, wherein the classification entropy is used for indicating the determination degree of the prediction category corresponding to the first pixel point;
determining a plurality of second point feature representations from the first semantic feature representations by taking the boundary confidence coefficient of the second pixel point in the prediction boundary map as a second screening condition, wherein the boundary confidence coefficient is used for indicating the probability that the second pixel point is a boundary pixel point;
Performing feature updating on the first semantic feature representation based on the plurality of first point feature representations and the plurality of second point feature representations to obtain a second semantic feature representation;
and predicting the appointed material region in the target image based on the second semantic feature representation to obtain a target recognition result.
2. The method of claim 1, wherein the performing feature updating on the first semantic feature representation based on the plurality of first point feature representations and the plurality of second point feature representations to obtain a second semantic feature representation comprises:
performing feature updating on the first point feature representation through the plurality of second point feature representations based on a multi-head attention mechanism to obtain a third point feature representation;
and carrying out feature updating on the first semantic feature representation based on the third point feature representation to obtain the second semantic feature representation.
3. The method of claim 2, wherein the performing feature updating on the first point feature representation through the plurality of second point feature representations based on the multi-head attention mechanism to obtain a third point feature representation comprises:
generating an attention feature representation based on the multi-headed attention mechanism with the first point feature representation as a query feature and the second point feature representation as a key feature and a value feature;
And adding the attention characteristic representation and the first point characteristic representation to obtain the third point characteristic representation.
4. The method of claim 2, wherein the performing feature updating on the first semantic feature representation based on the third point feature representation to obtain the second semantic feature representation comprises:
determining a corresponding target pixel position of the third point feature representation;
and replacing the feature corresponding to the target pixel position in the first semantic feature representation with the third point feature representation to obtain the second semantic feature representation.
5. The method according to any one of claims 2 to 4, wherein the prediction semantic graph includes the specified material areas of a plurality of categories, and the specified material area of each category corresponds to at least two first point feature representations;
the method further comprises the steps of:
clustering the plurality of second point feature representations to obtain at least two clusters and determining first feature distribution corresponding to the at least two clusters respectively;
acquiring second feature distribution corresponding to the at least two first point feature representations belonging to the same category;
determining a target cluster from the at least two clusters based on the distribution similarity between the first feature distribution and the second feature distribution;
the performing feature updating on the first point feature representation through the plurality of second point feature representations based on the multi-head attention mechanism to obtain a third point feature representation comprises:
and carrying out feature updating on the first point feature representation through the second point feature representation in the target cluster based on the multi-head attention mechanism to obtain the third point feature representation.
6. The method according to any one of claims 1 to 4, wherein determining a plurality of first point feature representations from the first semantic feature representations using a classification entropy of a first pixel point in the prediction semantic graph as a first filtering condition includes:
acquiring the classification entropy corresponding to each of a plurality of first pixel points in the prediction semantic graph;
sorting the plurality of first pixel points by the classification entropy to obtain a first pixel list;
determining N first pixel points with highest classification entropy in the first pixel list as first target pixel points meeting the first screening condition, wherein the classification entropy and the determination degree of the prediction category are in a negative correlation relationship, and N is a positive integer;
and determining a first point feature representation corresponding to the first target pixel point from the first semantic feature representation based on the pixel position of the first target pixel point in the prediction semantic graph.
7. The method according to any one of claims 1 to 4, wherein determining a plurality of second point feature representations from the first semantic feature representations using the boundary confidence of the second pixel point in the prediction boundary map as a second filtering condition includes:
acquiring the boundary confidence degrees respectively corresponding to a plurality of second pixel points in the prediction boundary map;
sorting the plurality of second pixel points by the boundary confidence to obtain a second pixel list;
determining M second pixel points with highest boundary confidence in the second pixel list as second target pixel points meeting the second screening conditions, wherein M is a positive integer;
and determining a second point characteristic representation corresponding to the second target pixel point from the first semantic characteristic representation based on the pixel position of the second target pixel point in the prediction boundary diagram.
8. The method according to any one of claims 1 to 4, wherein the obtaining a prediction semantic map and a prediction boundary map corresponding to the target image includes:
extracting features of the target image to obtain the first semantic feature representation and boundary feature representation respectively;
Inputting the first semantic feature representation into a first semantic prediction network, and outputting to obtain the prediction semantic graph;
and inputting the boundary characteristic representation into a boundary prediction network, and outputting to obtain the prediction boundary map.
9. The method of claim 8, wherein the feature extraction of the target image to obtain the first semantic feature representation and the boundary feature representation, respectively, comprises:
extracting features of the target image to obtain intermediate feature representation;
performing multi-scale feature extraction on the intermediate feature representation to obtain the first semantic feature representation;
and carrying out feature fusion on the intermediate feature representation and the first semantic feature representation to obtain the boundary feature representation.
10. The method of claim 8, wherein the predicting the specified material region in the target image based on the second semantic feature representation to obtain a target recognition result comprises:
inputting the second semantic feature representation to a second semantic prediction network, and outputting to obtain the target recognition result;
wherein the first semantic prediction network and the second semantic prediction network share network weight information.
11. An apparatus for identifying a specified material, the apparatus comprising:
the image processing device comprises an acquisition module, a prediction semantic graph and a prediction boundary graph, wherein the acquisition module is used for acquiring a prediction semantic graph and a prediction boundary graph corresponding to a target image, the target image comprises a specified material region to be identified, the prediction semantic graph is used for indicating the specified material region obtained through prediction by means of first semantic feature representation, and the prediction boundary graph is used for indicating the boundary of the specified material region obtained through prediction;
the first determining module is used for determining a plurality of first point feature representations from the first semantic feature representations by taking the classification entropy of a first pixel point in the prediction semantic graph as a first screening condition, wherein the classification entropy is used for indicating the determination degree of the prediction category corresponding to the first pixel point;
the second determining module is used for determining a plurality of second point feature representations from the first semantic feature representations by taking the boundary confidence coefficient of a second pixel point in the prediction boundary map as a second screening condition, wherein the boundary confidence coefficient is used for indicating the probability that the second pixel point is a boundary pixel point;
the updating module is used for carrying out feature updating on the first semantic feature representation based on the plurality of first point feature representations and the plurality of second point feature representations to obtain a second semantic feature representation;
And the prediction module is used for predicting the specified material area in the target image based on the second semantic feature representation to obtain a target recognition result.
12. A computer device comprising a processor and a memory, wherein the memory has stored therein at least one program that is loaded and executed by the processor to implement the method of identifying a specified material according to any one of claims 1 to 10.
13. A computer readable storage medium having at least one program code stored therein, the program code being loaded and executed by a processor to implement the method for identifying a specified material according to any one of claims 1 to 10.
14. A computer program product comprising a computer program or instructions which, when executed by a processor, implement the method for identifying a specified material according to any one of claims 1 to 10.