CN113762238A - Positioning identification method, device, equipment, system and computer storage medium
- Publication number: CN113762238A
- Application number: CN202110587092.8A
- Authority: CN (China)
- Prior art keywords: image, matching, preset, template, scene
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/22: Pattern recognition; Matching criteria, e.g. proximity measures
- G06F18/24: Pattern recognition; Classification techniques
- G06N3/045: Neural networks; Combinations of networks
- G06N3/08: Neural networks; Learning methods
Abstract
The application provides a positioning identification method, apparatus, device, system, and computer storage medium, relating to the field of artificial intelligence. The method comprises the following steps: acquiring preset common image features corresponding to at least one object in a scene image, and generating a matching template; sliding the matching template over the scene image, and obtaining at least one piece of predicted position information corresponding to the at least one object according to the degree of matching between the matching template and the image portion it covers at each of at least one preset sliding position in the scene image; and, in the scene image, performing target classification and identification on at least one pre-classified image corresponding to the at least one piece of predicted position information to obtain a classification and identification result for the at least one object, the at least one object belonging to at least one object class. Through the method and the apparatus, the precision of positioning and identification can be improved.
Description
Technical Field
The present application relates to artificial intelligence technologies, and in particular, to a positioning identification method, apparatus, device, system, and computer storage medium.
Background
In recent years, with the development of deep learning, the mainstream approach to the positioning and recognition task in image processing has been to feed a picture directly into a neural network, which outputs the positions of all potential objects together with classification and recognition information. This approach suits scenes with multi-scale, multi-type object recognition. However, neural network recognition is prone to false recognition and missed recognition, and the object center detected by a neural network can be shifted by the background, so the positioning precision of existing methods is low, the precision of classification and recognition based on that positioning is correspondingly low, and the approach is unsuitable for scenes that demand high precision and high accuracy, such as a robot playing chess.
Disclosure of Invention
The embodiment of the application provides a positioning identification method, a positioning identification device, positioning identification equipment, a positioning identification system and a computer storage medium, which can improve the accuracy of object positioning.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides a positioning identification method, which comprises the following steps:
acquiring preset common image features corresponding to at least one object in a scene image, and generating a matching template;
sliding the matching template over the scene image, and obtaining at least one piece of predicted position information corresponding to the at least one object according to the degree of matching between the matching template and the image portion it covers at each of at least one preset sliding position in the scene image;
in the scene image, performing target classification and identification on at least one pre-classified image corresponding to the at least one piece of predicted position information to obtain a classification and identification result of the at least one object; the at least one object belongs to at least one object class.
In the above method, the sliding of the matching template in the scene image, and the obtaining of at least one piece of predicted position information corresponding to the at least one object according to the degree of matching between the matching template and the image portion it covers at at least one preset sliding position in the scene image, includes:
aligning the center position of the matching template with the at least one preset sliding position one by one to obtain a corresponding region to be matched of the matching template at each preset sliding position;
calculating the matching degree of the matching template and the image part in the region to be matched to obtain a matching score corresponding to each preset sliding position;
and determining the at least one piece of predicted position information from the at least one preset sliding position according to a preset matching strategy and the matching score.
In the above method, the performing, in the scene image, target classification and identification on at least one pre-classified image corresponding to the at least one piece of predicted location information to obtain a classification and identification result of the at least one object includes:
generating at least one candidate region on the at least one piece of predicted position information according to a preset region size;
taking the image part in the at least one candidate area as the at least one pre-classified image, and performing classified prediction of the at least one object class on each pre-classified image in the at least one pre-classified image to obtain a prediction result of each object class corresponding to each pre-classified image;
and obtaining a classification identification result of the at least one object according to the prediction result of each object class corresponding to each pre-classification image.
In the above method, the obtaining of the preset generic image feature corresponding to the at least one object in the scene image to generate the matching template includes:
acquiring an image of a scene containing at least one object from a preset acquisition position through image acquisition equipment to obtain a scene image;
extracting an image part corresponding to a single object from the scene image to be used as a template image; and carrying out image segmentation on the template image according to the preset common image characteristics to obtain the matching template.
In the above method, the matching degree calculation includes:
a squared-difference matching algorithm, a correlation matching algorithm, or a normalized matching algorithm.
In the above method, the preset common image feature includes:
at least one of contour features, pattern features, color features, texture features.
The embodiment of the application provides a positioning and identifying device, including:
The generating module is used for acquiring corresponding preset common image characteristics of at least one object in the scene image and generating a matching template;
the positioning module is used for sliding the matching template in the scene image and obtaining at least one piece of predicted position information corresponding to the at least one object according to the matching degree of the matching template and the corresponding image part of the matching template at least one preset sliding position in the scene image;
the identification module is used for carrying out target classification identification on at least one pre-classified image corresponding to the at least one piece of predicted position information in the scene image to obtain a classification identification result of the at least one object; the at least one object belongs to at least one object class.
In the above apparatus, the positioning module is further configured to align the center position of the matching template with the at least one preset sliding position one by one, so as to obtain a region to be matched corresponding to the matching template at each preset sliding position; calculating the matching degree of the matching template and the image part in the region to be matched to obtain a matching score corresponding to each preset sliding position; and determining the at least one piece of predicted position information from the at least one preset sliding position according to a preset matching strategy and the matching score.
In the above apparatus, the identifying module is further configured to generate at least one candidate region based on a preset region size on the at least one predicted position information; taking the image part in the at least one candidate area as the at least one pre-classified image, and performing classified prediction of the at least one object class on each pre-classified image in the at least one pre-classified image to obtain a prediction result of each object class corresponding to each pre-classified image; and obtaining a classification identification result of the at least one object according to the prediction result of each object class corresponding to each pre-classification image.
In the above apparatus, the generating module is further configured to perform image acquisition on a scene including at least one object from a preset acquisition position through an image acquisition device to obtain the scene image; extracting an image part corresponding to a single object from the scene image to be used as a template image; and carrying out image segmentation on the template image according to the preset common image characteristics to obtain the matching template.
In the above apparatus, the matching degree calculation includes:
a squared-difference matching algorithm, a correlation matching algorithm, or a normalized matching algorithm.
In the above apparatus, the preset common image feature includes:
at least one of contour features, pattern features, color features, texture features.
The embodiment of the application provides a positioning identification system, including:
the image acquisition equipment is used for acquiring an image of a scene containing at least one object from a preset acquisition position to obtain a scene image;
the positioning identification device is used for extracting a template image corresponding to a single object from the scene image, and performing image segmentation on the template image according to the preset common image characteristics to obtain a matching template; sliding the matching template in the scene image, and obtaining at least one piece of predicted position information corresponding to the at least one object according to the matching degree of the matching template and the corresponding image part of the matching template on at least one preset sliding position in the scene image; in the scene image, performing target classification and identification on at least one pre-classified image corresponding to the at least one piece of predicted position information to obtain a classification and identification result of the at least one object; the at least one object belongs to at least one object class;
a control device for generating an operation instruction for a target object in the at least one object based on the at least one predicted position information and a classification recognition result of the at least one object;
and the execution equipment is used for operating the target object according to the operation instruction.
The embodiment of the application provides a positioning identification device, including:
a memory for storing executable instructions;
and the processor is used for realizing the positioning identification method provided by the embodiment of the application when the processor executes the executable instructions stored in the memory.
The embodiment of the present application provides a computer storage medium storing executable instructions which, when executed by a processor, implement the positioning identification method provided by the embodiment of the present application.
The embodiment of the application has the following beneficial effects:
in the embodiment of the application, a matching template is generated according to the preset common image characteristics of at least one object, and at least one piece of predicted position information of the at least one object in a scene image is obtained through a template matching method, so that the object positioning precision can be improved; and the target classification and identification are carried out on at least one pre-classified image corresponding to at least one piece of predicted position information, so that the interference of irrelevant information on the classification and identification of the object is reduced, and the accuracy of the classification and identification is improved.
Drawings
Fig. 1 is an alternative structural diagram of a positioning identification system architecture provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of an alternative position of an image capturing device in an architecture of a location identification system according to an embodiment of the present application;
fig. 3 is an alternative structural diagram of a positioning identification device provided in an embodiment of the present application;
fig. 4 is an alternative flow chart of a positioning identification method provided in the embodiment of the present application;
fig. 5 is an alternative flow chart of a positioning identification method provided in the embodiment of the present application;
fig. 6 is an alternative schematic diagram of a chessboard scene image of the chinese chess provided by the embodiment of the present application;
figure 7 is an alternative schematic view of a pawn image as provided by an embodiment of the application;
FIG. 8 is an alternative diagram of a matching template extracted from a pawn image according to an embodiment of the present application;
fig. 9 is an alternative flowchart of a positioning identification method according to an embodiment of the present application;
FIG. 10 is a schematic diagram illustrating a process of sliding a matching template on a scene image according to an embodiment of the present application;
fig. 11 is an alternative flowchart of a positioning identification method according to an embodiment of the present application;
FIG. 12 is a diagram illustrating the effect of a matching score distribution provided by an embodiment of the present application;
FIG. 13 is a schematic diagram illustrating an effect of generating at least one candidate region according to an embodiment of the present application;
fig. 14 is a schematic diagram of a classification recognition result of at least one image to be classified according to an embodiment of the present application.
Detailed Description
In order to make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the attached drawings. The described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Before further detailed description of the embodiments of the present application, terms and expressions referred to in the embodiments of the present application will be described, and the terms and expressions referred to in the embodiments of the present application will be used for the following explanation.
1) Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines can perceive, reason, and make decisions.
Artificial intelligence is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly comprises computer vision, speech processing, natural language processing, and machine learning/deep learning.
2) Computer Vision (CV) is a science that studies how to make a machine "see": it uses cameras and computers in place of human eyes to recognize, track, and measure targets, and further processes the images so that they become more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision research on related theories and techniques attempts to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, and simultaneous localization and mapping, and also include common biometric technologies such as face recognition and fingerprint recognition.
3) Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence, is the fundamental way to make computers intelligent, and is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
4) A Robot is a machine that performs work automatically, and generally includes an actuator, a driving device, a detecting device, a control system, and other complex machinery. It can accept human commands, run pre-programmed routines, and act according to principles formulated with artificial intelligence technology. Its task is to assist or replace human work. A chess-playing robot is a specific application of robots to chess games and can autonomously complete the whole chess-playing process like a human.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.
The scheme provided by the embodiment of the application relates to the computer vision technology of artificial intelligence and the like, and is specifically explained by the following embodiment:
currently, for an object positioning scene, a common positioning method mainly obtains a position of an object in an acquired picture through a neural network, then measures a distance by using a depth camera, obtains a spatial three-dimensional position of the object in the camera, and finally obtains the three-dimensional spatial position of the object by combining the position of the object in the picture and the spatial three-dimensional position of the object. However, the neural network recognition is prone to problems such as false recognition and missing recognition, and the center of an object detected by the neural network may be shifted by a certain amount due to the background. And the depth camera also has a very large error in ranging, thereby reducing the accuracy of object positioning. Especially for scenes requiring high precision and high accuracy, such as scenes of playing chess by a robot, the positioning effect is very poor, and the accuracy of further image identification based on positioning is influenced.
Embodiments of the present application provide a positioning identification method, apparatus, device, system, and computer storage medium, which can improve the precision of positioning and identification. An exemplary application of the positioning identification device provided in the embodiments of the present application is described below, taking the case in which the positioning identification device is implemented as a server.
Referring to fig. 1, fig. 1 is an alternative architecture diagram of a positioning recognition system 100 according to an embodiment of the present application, in order to support a positioning recognition task, such as a robot chess playing task, an image capturing device 400 is connected to a server 200 through a network 300, the server 200 is connected to a control device 600, the control device 600 is connected to an execution device 500, and the network 300 may be a wide area network or a local area network, or a combination of the two.
The image capturing device 400 is configured to capture an image of a scene including at least one object from a preset capturing position, obtain a scene image, and transmit the scene image to the server 200. For the robot chess-playing task, the at least one object may be a plurality of chess pieces, and the scene image may be a board image comprising a plurality of chess piece images. In some embodiments, the preset capturing position may be directly above the scene, such as directly above a chessboard.
The server 200 is configured to extract a template image corresponding to a single object from the scene image, and perform image segmentation on the template image according to a preset common image feature to obtain a matching template; and to slide the matching template over the scene image and obtain at least one piece of predicted position information corresponding to the at least one object according to the degree of matching between the matching template and the image portion it covers at each of at least one preset sliding position in the scene image. Here, the preset common image feature may be a feature shared by a plurality of chess pieces; for example, if the pieces all have a circular outline of the same size, the shape outline of the pieces may be used as the preset common image feature, and the circular border of a piece may be taken as the matching template. The server 200 transmits the at least one piece of predicted position information and the classification and identification result of the at least one object to the control device 600, where the server 200 and the control device 600 may be connected through a network or through other connection methods, which is not specifically limited herein.
The control device 600 is configured to generate an operation instruction for a target object among the at least one object based on the at least one piece of predicted position information and the classification and identification result of the at least one object. Here, the control device can obtain the positions of the various types of chess pieces on the chessboard from the at least one piece of predicted position information and the classification and identification result, and then generate, according to preset chess rules and policy logic, an operation instruction for a target piece among the at least one piece, that is, an instruction to move the target piece to a target position on the chessboard.
And the execution device 500 is used for operating the target object according to the operation instruction. Here, the execution device may include a gripper of the robot, and the gripper may grip the target chess piece and place the target chess piece to the target position according to the operation instruction, thereby completing one chess playing operation of the robot.
In some embodiments, the server 200 may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform. The control device 600 may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The control device and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the embodiment of the present invention.
Here, it should be noted that in fig. 1 the image capturing device 400 is connected to the executing device 500, and the default starting position of the executing device is the preset capturing position of the image capturing device 400, such as the position directly above the chessboard; the image capturing device 400 can therefore capture images of the chessboard from the preset capturing position, avoiding distortion of the captured scene image caused by deviation of the capturing angle, which would degrade positioning accuracy. In some embodiments, after each movement and operation of the target object, for example after each grasp-and-place of the target chess piece to the target position, the executing device may return to its default starting position, so that the image capturing device is again at the preset capturing position when the next chess-playing operation is performed.
In some embodiments, the image capturing device may also be fixed at a preset capturing position for capturing a scene through the supporting component, and fig. 2 shows a schematic diagram of the image capturing device 110 fixed at a preset capturing position directly above the chessboard through the supporting component 111 for capturing a scene image on the chessboard.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a server 200 according to an embodiment of the present application, where the server 200 shown in fig. 3 includes: at least one processor 210, memory 250, at least one network interface 220, and a user interface 230. The various components in server 200 are coupled together by a bus system 240. It is understood that the bus system 240 is used to enable communications among the components. The bus system 240 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 240 in fig. 3.
The processor 210 may be an integrated circuit chip having signal processing capabilities, such as a general-purpose processor, a Digital Signal Processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, where the general-purpose processor may be a microprocessor or any conventional processor.
The user interface 230 includes one or more output devices 231, including one or more speakers and/or one or more visual display screens, that enable the presentation of media content. The user interface 230 also includes one or more input devices 232, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 250 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 250 optionally includes one or more storage devices physically located remotely from processor 210.
The memory 250 includes volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read Only Memory (ROM), and the volatile memory may be a Random Access Memory (RAM). The memory 250 described in embodiments herein is intended to comprise any suitable type of memory.
In some embodiments, memory 250 is capable of storing data, examples of which include programs, modules, and data structures, or a subset or superset thereof, to support various operations, as exemplified below.
An operating system 251 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a network communication module 252 for communicating with other computing devices via one or more (wired or wireless) network interfaces 220, exemplary network interfaces 220 including: Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), etc.;
a presentation module 253 to enable presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 231 (e.g., a display screen, speakers, etc.) associated with the user interface 230;
an input processing module 254 for detecting one or more user inputs or interactions from one of the one or more input devices 232 and translating the detected inputs or interactions.
In some embodiments, the apparatus provided by the embodiments of the present application may be implemented in software, and fig. 3 illustrates a location identification apparatus 255 stored in the memory 250, which may be software in the form of programs and plug-ins, and the like, and includes the following software modules: a generating module 2551, a positioning module 2552 and an identifying module 2553, which are logical and therefore can be arbitrarily combined or further split depending on the functions implemented.
The functions of the respective modules will be explained below.
In other embodiments, the apparatus provided in the embodiments of the present Application may be implemented in hardware, and for example, the apparatus provided in the embodiments of the present Application may be a processor in the form of a hardware decoding processor, which is programmed to execute the positioning identification method provided in the embodiments of the present Application, for example, the processor in the form of the hardware decoding processor may be one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
The positioning identification method provided by the embodiment of the present application will be described in conjunction with an exemplary application and implementation of the server provided by the embodiment of the present application.
Referring to fig. 4, fig. 4 is an alternative flowchart of a positioning identification method provided in the embodiment of the present application, which will be described with reference to the steps shown in fig. 4.
S101, acquiring corresponding preset common image characteristics of at least one object in a scene image, and generating a matching template.
The positioning identification method is suitable for scenes in which the objects are relatively fixed in appearance but extremely high positioning and classification accuracy is required, such as products of fixed appearance but different types on a production line, a robot playing chess, or other positioning and identification scenes where the objects share stable common features.
In the embodiment of the application, the positioning and identifying device acquires a scene image containing at least one object, and then generates a matching template according to a corresponding preset common image characteristic of the at least one object in the scene image.
In some embodiments, the preset common image feature may be an image feature, common to the at least one object image corresponding to the at least one object in the scene image, extracted based on prior knowledge of the at least one object. The preset common image feature may also be extracted by an artificial-intelligence-based image detection method that compares the at least one object image and extracts the image feature they share. The specific choice is made according to the actual situation, and the embodiments of the present application are not limited thereto.
In some embodiments, the pre-set commonality image features comprise: at least one of contour features, pattern features, color features, texture features.
In the embodiment of the application, the contour feature may be an appearance or shape feature common to the at least one object image; for example, for chess pieces, the contour feature may be the circular outline of a piece. The pattern feature may be a distinctive pattern common to the at least one object image; for example, for products on an assembly line, it may be a pattern identifier shared by the products. The color feature may be the color distribution, color composition, and inter-color relationships common to the at least one object image; the texture feature may be an image texture common to the at least one object image.
In some embodiments, the preset common image feature may also be another type of common visual feature of the at least one object, which is specifically selected according to the actual situation, and the embodiment of the present application is not limited thereto.
S102, sliding the matching template in the scene image, and obtaining at least one piece of predicted position information corresponding to at least one object according to the matching degree of the matching template and the corresponding image part of the matching template in at least one preset sliding position in the scene image.
In the embodiment of the application, the scene image includes at least one preset sliding position, that is, preset sliding point coordinates. The positioning identification device can slide the matching template over the scene image; at each preset sliding position it calculates the degree of matching between the matching template and the image portion in the region the template covers, and by traversing the at least one preset sliding position it obtains the matching degree corresponding to each preset sliding position.
In this embodiment of the application, the positioning identification apparatus may further determine, according to the matching degree corresponding to each preset sliding position, at least one piece of predicted position information corresponding to at least one object in the scene image.
In some embodiments, the positioning identification device may take the preset sliding positions with the highest matching degrees as the at least one piece of predicted position information.
S103, performing target classification and identification on at least one pre-classified image corresponding to at least one piece of predicted position information in the scene image to obtain a classification and identification result of at least one object; the at least one object belongs to at least one object class.
In the embodiment of the application, once the positioning identification device has obtained the at least one piece of predicted position information, it has a positioning result for the at least one object in the scene image. The positioning identification device can locate, based on the at least one piece of predicted position information, at least one prediction region corresponding to the at least one object in the scene image, and then perform target classification and identification on the images within the at least one prediction region, that is, the at least one pre-classified image, instead of classifying the whole image; such targeted identification greatly improves the precision of classification and identification.
In the embodiment of the application, the positioning identification device correspondingly takes the classification identification result of at least one pre-classification image as the classification identification result of at least one object.
It can be understood that, in the embodiment of the present application, a matching template is generated according to preset generic image features of at least one object, and at least one piece of predicted position information of the at least one object in a scene image is obtained by a template matching method, so that the accuracy of object positioning can be improved; and the target classification and identification are carried out on at least one pre-classified image corresponding to at least one piece of predicted position information, so that the interference of irrelevant information on the classification and identification of the object is reduced, and the accuracy of the classification and identification is improved.
In some embodiments, referring to fig. 5, fig. 5 is an optional flowchart of the positioning identification method provided in the embodiments of the present application, based on fig. 4, S101 may be implemented by performing S1011-S1013,
and S1011, acquiring an image of a scene containing at least one object from a preset acquisition position through image acquisition equipment to obtain a scene image.
In the embodiment of the application, the positioning and identifying device can acquire images of real scenes needing positioning and identifying from a preset acquisition position through image acquisition equipment such as a camera, an image sensor and the like to obtain scene images. Wherein the real scene image contains at least one object.
And S1012, extracting an image part corresponding to a single object from the scene image to be used as a template image.
In this embodiment of the application, the positioning recognition device may extract an image portion corresponding to a single object from a scene image, and exemplarily extract an image of any single chess piece from a scene image of a chessboard including a plurality of chess piece images, as a template image.
In some embodiments, for a scene image of the chessboard as shown in fig. 6, the positioning identification device may extract from it an image of one chess piece, exemplarily the "scholar" piece, as the template image, as shown in fig. 7.
And S1013, performing image segmentation on the template image according to the preset common image characteristics to obtain a matched template.
In the embodiment of the application, the positioning and identifying device may preset common image features according to at least one object, for example, when the at least one object is a chess piece, the preset common image features may be a ring on the piece image, and the template image is subjected to image segmentation to obtain the matching template.
In some embodiments, for the template image shown in fig. 7, the positioning identification device may perform image segmentation on the template image using a color segmentation method to obtain a "scholar" pattern and a circular ring border pattern, and then take the segmented circular ring border pattern as the matching template in accordance with the preset common image feature, i.e., the circular ring, as shown in fig. 8.
In some embodiments, after the positioning and identifying device performs image segmentation on the template image, the segmented image may be further processed by manually removing noise points, so as to improve the image definition and obtain a matching template.
It can be understood that, in the embodiment of the present application, the template image of a single object is subjected to image segmentation according to the preset common image feature of at least one object to obtain the matching template, so that at least one object in the scene image can be positioned through the matching template including the preset common image feature, and thus, the positioning accuracy is improved.
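As an illustration of this template-generation step, the following is a minimal sketch using OpenCV color segmentation; the function name, the HSV threshold values, and the morphological opening standing in for manual noise removal are all assumptions for illustration, not the patent's exact procedure.

```python
import cv2
import numpy as np

def make_ring_template(piece_bgr):
    """Segment the circular ring border out of a single-piece template image
    by color (sketch of the segmentation step; the HSV thresholds are
    illustrative and would be tuned for the actual piece colors)."""
    hsv = cv2.cvtColor(piece_bgr, cv2.COLOR_BGR2HSV)
    # Keep only pixels in the ring's assumed color range; everything else goes black.
    ring_mask = cv2.inRange(hsv, (0, 60, 60), (12, 255, 255))
    # Light morphological opening stands in for the manual noise removal
    # mentioned above.
    kernel = np.ones((3, 3), np.uint8)
    ring_mask = cv2.morphologyEx(ring_mask, cv2.MORPH_OPEN, kernel)
    # The masked image, containing only the ring border, is the matching template.
    return cv2.bitwise_and(piece_bgr, piece_bgr, mask=ring_mask)
```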
In some embodiments, referring to fig. 9, fig. 9 is an optional flowchart of the positioning identification method provided in the embodiment of the present application, and based on fig. 4 or fig. 5, S102 may be implemented by performing S1021-S1023, which will be described with reference to each step.
And S1021, aligning the center position of the matching template with at least one preset sliding position one by one to obtain a region to be matched corresponding to the matching template at each preset sliding position.
In the embodiment of the application, the positioning and identifying device aligns the center position of the matching template with at least one preset sliding position one by one to obtain the to-be-matched area corresponding to each preset sliding position of the matching template.
In some embodiments, the at least one preset sliding position may be the coordinates of each pixel point in the scene image. The positioning identification device may slide the matching template along a preset sliding track, for example from left to right and from top to bottom in the scene image, aligning the center position of the matching template with the coordinates of each pixel point one by one and traversing the entire scene image, to obtain the region to be matched corresponding to the matching template at each preset sliding position, as shown in fig. 10.
In some embodiments, the at least one preset sliding position may also be pre-specified in all pixel points included in the scene image, for example, pixel points within one or more ranges are pre-specified in the scene image as the at least one preset sliding position, or at least one pixel coordinate point screened out by a preset screening policy is used as the at least one preset sliding position, so as to reduce the amount of computation of the positioning recognition device during template sliding matching and improve the matching speed. The specific selection is performed according to actual conditions, and the embodiments of the present application are not limited.
In some embodiments, the positioning and recognizing device may also align the center position of the matching template with at least one preset sliding position one by one in a parallel processing manner, so as to obtain the to-be-matched region corresponding to each preset sliding position of the matching template.
And S1022, calculating the matching degree of the matching template and the image part in the region to be matched to obtain a matching score corresponding to each preset sliding position.
In the embodiment of the application, the positioning identification device calculates the matching degree of the matching template and the image part in the region to be matched at each sliding position to obtain the matching score corresponding to each preset sliding position.
In some embodiments, the positioning identification device may calculate the degree of image matching between the matching template and the image portion in the region to be matched by a matching degree calculation method such as squared-difference matching, correlation matching, or normalized matching, so as to obtain the matching score corresponding to each preset sliding position.
In some embodiments, when a correlation matching algorithm is used for the matching degree calculation, the matching score may be a value in the interval [-1, 1], where a score of 1 indicates a perfect positive-correlation match, a score of -1 indicates a negative-correlation match, and a score of 0 indicates zero correlation, i.e., no correlation.
It can be seen that the matching score represents the degree of matching between the image portion in the region to be matched and the preset common image feature; a high positive correlation indicates a high likelihood that the image portion is the image of an object.
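The patent text does not spell out the formula; for concreteness, a standard normalized correlation coefficient of the kind that yields scores in [-1, 1] (for example, the TM_CCOEFF_NORMED method in OpenCV) can be written as:

$$
R(u,v)=\frac{\sum_{x,y}\tilde{T}(x,y)\,\tilde{I}(u+x,\,v+y)}{\sqrt{\sum_{x,y}\tilde{T}(x,y)^{2}\,\sum_{x,y}\tilde{I}(u+x,\,v+y)^{2}}}
$$

where \(\tilde{T}\) is the mean-subtracted template and \(\tilde{I}\) is the mean-subtracted image patch under the template at sliding position \((u,v)\); \(R(u,v)=1\) is a perfect positive-correlation match and \(R(u,v)=-1\) a perfect negative-correlation match.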
And S1023, determining at least one piece of predicted position information from at least one preset sliding position according to a preset matching strategy and a matching score.
In the embodiment of the application, when the positioning and recognizing device obtains the matching score corresponding to each preset sliding position, the matching scores can be screened according to a preset matching strategy, at least one target matching score meeting the preset matching strategy is determined, and the preset sliding position corresponding to the at least one target matching score is used as at least one piece of predicted position information, so that the at least one piece of predicted position information is determined from the at least one preset sliding position.
In this embodiment of the application, the preset matching policy may be that, with N denoting the number of the at least one object, the preset sliding positions corresponding to the top N matching scores, ranked from high to low, are used as the at least one piece of predicted position information. Other preset matching strategies may also be chosen according to the actual situation, which is not limited in the embodiments of the present application.
It can be understood that, by using the matching template to perform sliding matching in the scene image, the position of at least one object can be accurately located in the scene image, and the accuracy of object location is improved.
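A minimal sketch of this sliding matching and top-N selection follows, assuming OpenCV; the function and parameter names, and the greedy suppression step used to keep one peak per expected object, are illustrative assumptions rather than the patent's exact algorithm.

```python
import cv2

def locate_objects(scene_bgr, template_bgr, num_objects, min_distance=40):
    """Slide the matching template over the scene image and return the
    top-scoring positions as predicted object centers (sketch of S1021-S1023)."""
    # Matching-score map: one score per sliding position; TM_CCOEFF_NORMED
    # yields scores in [-1, 1], with 1 a perfect positive-correlation match.
    scores = cv2.matchTemplate(scene_bgr, template_bgr, cv2.TM_CCOEFF_NORMED)
    h, w = template_bgr.shape[:2]

    centers = []
    # Greedy peak picking: take the best score, suppress its neighborhood,
    # and repeat until one position per expected object has been found.
    for _ in range(num_objects):
        _, max_val, _, max_loc = cv2.minMaxLoc(scores)
        x, y = max_loc
        centers.append((x + w // 2, y + h // 2, max_val))  # center of the match
        y0, x0 = max(0, y - min_distance), max(0, x - min_distance)
        scores[y0:y + min_distance, x0:x + min_distance] = -1.0  # suppress nearby positions
    return centers
```

For the chess scene described below, num_objects would be 32 and min_distance roughly one piece diameter.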
In some embodiments, referring to fig. 11, fig. 11 is an optional flowchart of the positioning identification method provided in the embodiments of the present application, and based on fig. 4, fig. 5, or fig. 9, S103 may be implemented by performing S1031 to S1033, which will be described with reference to each step.
And S1031, generating at least one candidate region on the at least one piece of predicted position information according to a preset region size.
In this embodiment of the application, the positioning identification apparatus may generate at least one candidate region on at least one piece of predicted position information in the scene image according to a preset region size.
In some embodiments, the positioning identification apparatus may generate a candidate region corresponding to each predicted position according to a preset region size by using each predicted position information of the at least one predicted position information as a central point, so as to obtain the at least one candidate region.
In some embodiments, for a chess-piece positioning scene, because the size of every chess piece is fixed, once the positioning identification device has obtained the at least one piece of predicted position information, the chess-piece size can be used as the preset region size, and for the target detection neural network in the artificial intelligence technique, at least one candidate frame of the preset region size is generated with each piece of predicted position information as its center point, serving as the at least one candidate region.
S1032, taking the image part in the at least one candidate area as at least one pre-classified image, and performing classified prediction of at least one object class on each pre-classified image in the at least one pre-classified image to obtain a prediction result of each object class corresponding to each pre-classified image.
In the embodiment of the application, the positioning and identifying device may use an image portion in at least one candidate region as at least one pre-classified image, classify and identify the at least one pre-classified image through a convolutional neural network, obtain a probability that each pre-classified image belongs to each object class, and use the probability as a prediction result of each pre-classified image corresponding to each object class.
Here, the convolutional neural network may be a multi-target detection neural network for outputting a probability that each pre-classified image belongs to at least one object class. The positioning identification device can predict the object class of each pre-classified image through the probability that each pre-classified image output by the convolutional neural network belongs to at least one object class, and the identification result of at least one object is obtained.
In some embodiments, the convolutional neural network may be a multi-target classification recognition network model obtained by training an initial convolutional neural network model through a machine learning method using a sample image set of at least one object in advance. For example, the positioning identification device may collect an image of each chess piece and correspondingly mark the image of each chess piece with a chess piece category thereof as a sample image set; and then training by using the sample image set to obtain the multi-target classification recognition network model.
In some embodiments, the multi-target classification detection network model may be a You Only Look Once (YOLO) model or another multi-target detection model, selected according to the actual situation; the embodiments of the present application are not limited thereto.
S1033, obtaining a classification recognition result of at least one object according to the prediction result of each object class corresponding to each pre-classification image.
In this embodiment of the application, the classification and identification device may determine, from the prediction result of each pre-classified image for each object class, the object class to which each pre-classified image finally belongs, take that class as the class of the object corresponding to the pre-classified image, and thereby identify the object class of each object, obtaining the classification and identification result of the at least one object.
It can be understood that, in the embodiment of the present application, by classifying and identifying the at least one pre-classified image corresponding to the at least one predicted position, the range of the target detection processing performed by the neural network is narrowed and limited to the pre-classified image around each predicted position, so that interference from the background image is reduced and the precision of positioning and identification is improved.
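The following is a minimal sketch of the candidate-region cropping and classification steps, assuming a PyTorch classifier; the ResNet18 backbone matches the chess example below, but the helper names, preprocessing, and 14-class head are illustrative assumptions, not the patent's exact network.

```python
import torch
import torch.nn.functional as F
from torchvision import models

NUM_CLASSES = 14  # e.g. the 14 xiangqi piece categories in the example below

def classify_candidates(scene_rgb, centers, box_size, model, class_names):
    """Crop a fixed-size candidate region around each predicted center and
    classify it (sketch of S1031-S1033)."""
    half = box_size // 2
    results = []
    model.eval()
    with torch.no_grad():
        for cx, cy, _score in centers:
            # Candidate region of the preset size, centered on the prediction.
            patch = scene_rgb[cy - half:cy + half + 1, cx - half:cx + half + 1]
            x = torch.from_numpy(patch.copy()).permute(2, 0, 1).float() / 255.0
            logits = model(x.unsqueeze(0))       # one score per object class
            probs = F.softmax(logits, dim=1)[0]  # prediction result per class
            best = int(probs.argmax())           # highest-confidence class wins
            results.append(((cx, cy), class_names[best], float(probs[best])))
    return results

# A ResNet18 backbone with a 14-way head, as in the xiangqi example below:
model = models.resnet18(num_classes=NUM_CLASSES)
```

In practice the crop would be padded or clipped at the image border, and the class names would match the categories listed in the chess example below.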
Next, an exemplary application of the embodiment of the present application will be described by taking an example in which the positioning identification method in the embodiment of the present application is applied to a positioning identification scene in which a robot plays chess.
In the embodiment of the application, the Chinese chess scene mainly comprises a chessboard and chess pieces bearing circular rings. The positioning identification equipment first captures a scene image of the chess scene through a camera suspended directly above the chessboard, extracts from the scene image a piece picture of a single chess piece, and then obtains the circular ring border of the piece as the matching template by color segmentation and manual noise removal.
In the embodiment of the application, before the robot executes each chess-playing operation, the positioning identification device captures a current scene image through the camera so as to obtain the latest arrangement of pieces on the chessboard. The positioning identification device can use the template matching method to slide the matching template from left to right and from top to bottom over the current scene image, calculating at each pixel point of the current scene image the degree of matching between the matching template and the local picture region, i.e., the region to be matched, and thereby obtain the positions of all the circular rings in the picture. After the complete image has been traversed, a matching score distribution map is obtained; for example, performing template matching on the chess scene shown in fig. 6 yields the matching score distribution map shown in fig. 12, where a whiter circle indicates a higher degree of matching between the matching template and the region to be matched. The positioning identification device can determine the top 32 matching scores from the matching score distribution map according to the number of chess pieces, and use the preset sliding positions corresponding to those 32 scores as the at least one predicted position, thereby locating the center point of every piece in the current scene image.
In the embodiment of the present application, since the piece size is fixed by default, the candidate box size is also fixed; here 79 x 79 may be used as the preset candidate box size, that is, the preset region size, and a candidate box to be classified of size 79 x 79 is generated at the center point of each piece as the at least one candidate region, such as the box regions shown in fig. 13. The positioning and identifying device feeds the current scene image containing the candidate boxes to be classified into a convolutional neural network with ResNet18 as the backbone, converts the feature tensor output by the ResNet18 backbone into a one-dimensional vector through a fully connected network, and finally maps the 79 x 79 x 3-dimensional pre-classified image in each candidate box to a 1 x 14-dimensional vector, where each dimension corresponds to the confidence of one chess piece category. Here, the 79 x 79 x 3 dimensions correspond to the height, width, and RGB channels of the pre-classified image, and each dimension of the 1 x 14-dimensional vector corresponds to one object category. When the at least one object is a set of chess pieces, the dimensions of the 1 x 14-dimensional vector may be as shown in fig. 14, including: w_chariot, the white "chariot"; w_horse, the white "horse"; w_elepha, the white "elephant"; w_general, the white "general"; w_advisor, the white "advisor"; w_cannon, the white "cannon"; w_soldier, the white "soldier"; and the corresponding red categories r_chariot, r_horse, r_elepha, r_general, r_advisor, r_cannon, and r_soldier. For the 1 x 14-dimensional vector of each candidate box to be classified, the positioning and identifying device selects the dimension with the highest value as the object category of that box, finally obtaining the piece category at the corresponding position, as shown in fig. 14. In this way, the device first obtains the pre-classified images by template matching on the original current scene image of the chessboard, then obtains the object category of each pre-classified image with the neural network, and finally obtains the state information of all pieces on the chessboard picture.
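The classification stage can be sketched as follows, assuming PyTorch/torchvision; the class order, the untrained weights, and the helper name classify_crop are illustrative assumptions rather than details given in the embodiment:

```python
# Sketch: a ResNet18 backbone whose final layer is replaced with a 14-way
# head, applied to each 79x79x3 candidate crop to get a 1x14 confidence
# vector; the highest-scoring dimension is taken as the piece category.
import torch
import torchvision

CLASSES = ["w_chariot", "w_horse", "w_elepha", "w_general", "w_advisor",
           "w_cannon", "w_soldier", "r_chariot", "r_horse", "r_elepha",
           "r_general", "r_advisor", "r_cannon", "r_soldier"]

model = torchvision.models.resnet18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, len(CLASSES))  # 1x14 output
model.eval()

def classify_crop(crop: torch.Tensor) -> str:
    """crop: float tensor of shape (3, 79, 79), values normalized to [0, 1]."""
    with torch.no_grad():
        logits = model(crop.unsqueeze(0))           # shape (1, 14)
        return CLASSES[int(logits.argmax(dim=1))]   # highest-confidence class
```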
In some embodiments, since the state of the pieces may change during play, the positioning and identifying device may capture the current scene image of the chessboard once every preset time interval and apply the method of the embodiments of the present application to the most recently captured image. Alternatively, the device may capture the current scene image once before each move and apply the method to that latest image. The choice depends on the actual situation and is not limited in the embodiments of the present application.
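The interval-based variant could be driven by a polling loop such as the sketch below; the camera index, the 2-second interval, and the recognize() stub are hypothetical stand-ins for the pipeline sketched above:

```python
# Illustrative polling loop: re-capture the scene at a preset interval and
# re-run localization and recognition on the latest image.
import time
import cv2

def recognize(scene):
    # Stand-in for the full pipeline: template matching for piece centers,
    # then per-candidate classification (see the sketches above).
    return []

cap = cv2.VideoCapture(0)                  # camera above the chessboard
while True:
    ok, scene = cap.read()
    if ok:
        board_state = recognize(scene)     # latest piece arrangement
    time.sleep(2.0)                        # preset acquisition interval
```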
It can be understood that the Chinese chess recognition and positioning method provided by the embodiment of the application achieves accurate positioning of Chinese chess pieces, thereby providing visual support for a chess-playing robot; ordinary Chinese chess pieces bearing a ring meet the requirements without any modification of the pieces on the board. Experiments show that, when applied to the robot chess-playing scene, the method reaches a recognition and positioning precision of 1 mm, which satisfies the grasping requirement of a mechanical gripper.
Continuing with the exemplary structure in which the positioning recognition device 255 provided in the embodiments of the present application is implemented as software modules, in some embodiments, as shown in fig. 3, the software modules of the positioning recognition device 255 stored in the memory 250 may include:
a generating module 2551, configured to obtain preset common image features of at least one object in a scene image, and generate a matching template;
a positioning module 2552, configured to slide the matching template in the scene image, and obtain at least one piece of predicted position information corresponding to the at least one object according to a matching degree between the matching template and a corresponding image portion of the matching template at least one preset slide position in the scene image;
an identifying module 2553, configured to perform target classification and identification on at least one pre-classified image corresponding to the at least one piece of predicted position information in the scene image, to obtain a classification and identification result of the at least one object; the at least one object belongs to at least one object class.
In some embodiments, the positioning module 2552 is further configured to align the center position of the matching template with the at least one preset sliding position one by one, so as to obtain the corresponding region to be matched of the matching template at each preset sliding position; calculate the matching degree between the matching template and the image part in the region to be matched to obtain a matching score corresponding to each preset sliding position; and determine the at least one piece of predicted position information from the at least one preset sliding position according to a preset matching strategy and the matching scores.
In some embodiments, the identifying module 2553 is further configured to generate at least one candidate region according to a preset region size on the at least one piece of predicted position information; take the image part in the at least one candidate region as the at least one pre-classified image, and perform classified prediction of the at least one object class on each pre-classified image to obtain a prediction result of each object class corresponding to each pre-classified image; and obtain a classification identification result of the at least one object according to the prediction result of each object class corresponding to each pre-classified image.
In some embodiments, the generating module 2551 is further configured to perform image acquisition on a scene containing at least one object from a preset acquisition position through an image acquisition device, so as to obtain the scene image; extract an image part corresponding to a single object from the scene image as a template image; and perform image segmentation on the template image according to the preset common image features to obtain the matching template.
In some embodiments, the matching degree calculation includes:
a square error matching algorithm, a correlation matching algorithm, or a standard matching algorithm.
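Under one common reading, these three criteria correspond to the standard template-matching measures shown below via OpenCV's matching modes; this mapping is an assumption made for illustration, not a definition given in the embodiment:

```python
# One possible mapping from the criteria named above to concrete
# template-matching measures, expressed through OpenCV's modes.
import cv2

MATCH_MODES = {
    "square error": cv2.TM_SQDIFF_NORMED,   # lower score = better match
    "correlation":  cv2.TM_CCORR_NORMED,    # higher score = better match
    "standard":     cv2.TM_CCOEFF_NORMED,   # mean-subtracted correlation
}

def match_score_map(scene, template, method: str):
    """Score map of `template` slid over `scene` with the chosen measure."""
    return cv2.matchTemplate(scene, template, MATCH_MODES[method])
```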
In some embodiments, the predetermined commonality image characteristic comprises:
at least one of contour features, pattern features, color features, texture features.
It should be noted that the above description of the apparatus embodiment is similar to the description of the method embodiments and has similar beneficial effects. For technical details not disclosed in the apparatus embodiments, refer to the description of the method embodiments of the present application.
The embodiment of the present application provides a computer storage medium, which is a computer-readable storage medium storing executable instructions; when executed by a processor, the executable instructions cause the processor to execute the method provided by the embodiments of the present application, for example, the methods shown in figs. 4, 5, 9, and 11.
In some embodiments, the computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, an optical disk, or a CD-ROM; it may also be any device including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may, but need not, correspond to files in a file system, and may be stored in a portion of a file that holds other programs or data, e.g., in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
In summary, in the embodiments of the present application, the matching template is generated from the preset common image features of the at least one object, and at least one piece of predicted position information of the at least one object in the scene image is obtained by template matching, which improves the accuracy of object positioning. Target classification and identification are then performed on the at least one pre-classified image corresponding to the at least one piece of predicted position information, which reduces the interference of irrelevant information and improves the accuracy of classification and identification. Besides the robot chess-playing scene, the positioning identification method also performs well in positioning other objects with common stable features, for example in the automatic detection and identification of products on a production line.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.
Claims (10)
1. A method for location identification, comprising:
acquiring corresponding preset common image characteristics of at least one object in a scene image, and generating a matching template;
sliding the matching template in the scene image, and obtaining at least one piece of predicted position information corresponding to the at least one object according to the matching degree of the matching template and the corresponding image part of the matching template on at least one preset sliding position in the scene image;
in the scene image, performing target classification and identification on at least one pre-classified image corresponding to the at least one piece of predicted position information to obtain a classification and identification result of the at least one object; the at least one object belongs to at least one object class.
2. The method according to claim 1, wherein the sliding the matching template in the scene image, and obtaining at least one predicted position information corresponding to the at least one object according to a matching degree between the matching template and a corresponding image portion of the matching template at least one preset sliding position in the scene image, comprises:
aligning the center position of the matching template with the at least one preset sliding position one by one to obtain a corresponding region to be matched of the matching template at each preset sliding position;
calculating the matching degree of the matching template and the image part in the region to be matched to obtain a matching score corresponding to each preset sliding position;
and determining the at least one piece of predicted position information from the at least one preset sliding position according to a preset matching strategy and the matching score.
3. The method according to claim 1, wherein performing target classification recognition on at least one pre-classified image corresponding to the at least one predicted position information in the scene image to obtain a classification recognition result of the at least one object includes:
generating at least one candidate region on the at least one piece of predicted position information according to a preset region size;
taking the image part in the at least one candidate area as the at least one pre-classified image, and performing classified prediction of the at least one object class on each pre-classified image in the at least one pre-classified image to obtain a prediction result of each object class corresponding to each pre-classified image;
and obtaining a classification identification result of the at least one object according to the prediction result of each object class corresponding to each pre-classification image.
4. The method according to claim 1, wherein the obtaining of the corresponding preset commonality image feature of the at least one object in the scene image and the generating of the matching template comprise:
acquiring an image of a scene containing at least one object from a preset acquisition position through image acquisition equipment to obtain a scene image;
extracting an image part corresponding to a single object from the scene image to be used as a template image;
and carrying out image segmentation on the template image according to the preset common image characteristics to obtain the matching template.
5. The method of claim 2, wherein the degree of match calculation comprises:
a square error matching algorithm, a correlation matching algorithm, or a standard matching algorithm.
6. The method according to any one of claims 1-5, wherein the pre-defined commonality image features comprise:
at least one of contour features, pattern features, color features, texture features.
7. A location identification system, comprising:
the image acquisition equipment is used for acquiring an image of a scene containing at least one object from a preset acquisition position to obtain a scene image;
the positioning identification device is used for extracting a template image corresponding to a single object from the scene image, and performing image segmentation on the template image according to the preset common image characteristics to obtain a matching template; sliding the matching template in the scene image, and obtaining at least one piece of predicted position information corresponding to the at least one object according to the matching degree of the matching template and the corresponding image part of the matching template on at least one preset sliding position in the scene image; in the scene image, performing target classification and identification on at least one pre-classified image corresponding to the at least one piece of predicted position information to obtain a classification and identification result of the at least one object; the at least one object belongs to at least one object class;
a control device for generating an operation instruction for a target object in the at least one object based on the at least one predicted position information and a classification recognition result of the at least one object;
and the execution equipment is used for operating the target object according to the operation instruction.
8. A position recognition apparatus, comprising:
the generating module is used for acquiring corresponding preset common image characteristics of at least one object in the scene image and generating a matching template;
the positioning module is used for sliding the matching template in the scene image and obtaining at least one piece of predicted position information corresponding to the at least one object according to the matching degree of the matching template and the corresponding image part of the matching template at least one preset sliding position in the scene image;
the identification module is used for carrying out target classification identification on at least one pre-classified image corresponding to the at least one piece of predicted position information in the scene image to obtain a classification identification result of the at least one object; the at least one object belongs to at least one object class.
9. A location identification device, comprising:
a memory for storing executable instructions;
a processor for implementing the method of any one of claims 1 to 6 when executing executable instructions stored in the memory.
10. A computer storage medium having stored thereon executable instructions for, when executed by a processor, implementing the method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110587092.8A CN113762238A (en) | 2021-05-27 | 2021-05-27 | Positioning identification method, device, equipment, system and computer storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113762238A (en) | 2021-12-07 |
Family
ID=78787246
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110587092.8A Pending CN113762238A (en) | 2021-05-27 | 2021-05-27 | Positioning identification method, device, equipment, system and computer storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113762238A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |