CN112633384B - Object recognition method and device based on image recognition model and electronic equipment

Info

Publication number: CN112633384B
Application number: CN202011566951.7A
Authority: CN
Inventors: 余志良; 吕雪莹; 赵乔; 蒋佳军; 陈泽裕; 赖宝华; 罗倩慧; 高松鹤; 李康宇; 朱玉石; 王成; 徐铭远; 侯继旭
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-12-25
Filing date: 2020-12-25
Publication date: 2022-11-01
Anticipated expiration: 2040-12-25
Also published as: CN112633384A

Abstract

The application discloses an object recognition method and device based on an image recognition model, and electronic equipment, and relates to the technical fields of computer vision and deep learning. The specific implementation scheme is as follows: confidence thresholds of a plurality of candidate types are obtained; an acquired image is recognized by an image recognition model to obtain a target region and a confidence for each candidate type, where the target region is a region of the acquired image that contains an object and the confidence indicates the probability that the object belongs to the corresponding candidate type; a target type is determined from the candidate types according to their confidence thresholds, the target type being a candidate type whose confidence is greater than its confidence threshold; and the object contained in the target region is determined to belong to the target type. Because a separate confidence threshold is set for each type, the accuracy of object recognition is improved and the accuracy requirements of different recognition scenes are met.

Description

Object identification method and device based on image identification model and electronic equipment

Technical Field

The application relates to the technical field of artificial intelligence, in particular to computer vision and deep learning, and more specifically to an object recognition method and device based on an image recognition model, and to electronic equipment.

Background

At present, deep learning has achieved notable results in image recognition, speech recognition, natural language processing, computational biology, recommendation systems, and similar fields. Deploying the deep learning model is the final step in putting an AI application into practice.

In the related art, when the samples of different types in a recognition scene are unevenly distributed, the recall rate of object recognition is poor for the types with fewer samples, and the recognition accuracy requirements of different scenes cannot be met.

Disclosure of Invention

The application provides an object recognition method and device based on an image recognition model, and electronic equipment, for improving the accuracy of object recognition.

According to an aspect of the present application, there is provided an object recognition method based on an image recognition model, including:

obtaining confidence thresholds of a plurality of candidate types;

identifying the acquired image by adopting an image identification model to obtain a target area and obtain the confidence coefficient of each candidate type; wherein the target region is a region in the captured image that contains an object, the confidence level being indicative of a probability that the object belongs to a corresponding candidate type;

determining a target type from a plurality of candidate types according to the confidence thresholds of the candidate types, wherein the target type is a candidate type of which the corresponding confidence is greater than the corresponding confidence threshold;

and determining that the object contained in the target area belongs to the target type.

According to another aspect of the present application, there is provided an object recognition apparatus based on an image recognition model, including:

an obtaining module for obtaining confidence threshold values of a plurality of candidate types;

the processing module is used for identifying the acquired image by adopting an image identification model to obtain a target area and obtain the confidence coefficient of each candidate type; wherein the target region is a region of the captured image that contains an object, the confidence level being indicative of a probability that the object belongs to a corresponding candidate type;

a first determining module, configured to determine a target type from the multiple candidate types according to confidence thresholds of the multiple candidate types, where the target type is a candidate type whose corresponding confidence is greater than the corresponding confidence threshold;

and the identification module is used for determining that the object contained in the target area belongs to the target type.

According to another aspect of the present application, there is provided an electronic device including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of image recognition model-based object recognition of the first aspect.

According to another aspect of the present application, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the image recognition model-based object recognition method of the first aspect.

According to another aspect of the application, a computer program product is provided, comprising a computer program which, when being executed by a processor, implements the image recognition model based object recognition method according to the first aspect.

It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present application, nor are they intended to limit the scope of the present application. Other features of the present application will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:

Fig. 1 is a schematic flowchart of an object recognition method based on an image recognition model according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of another object recognition method based on an image recognition model according to an embodiment of the present application;

Fig. 3 is a schematic diagram of a user interaction interface according to an embodiment of the present application;

Fig. 4 is a first schematic diagram of camera image recognition according to an embodiment of the present application;

Fig. 5 is a second schematic diagram of camera image recognition according to an embodiment of the present application;

Fig. 6 is a third schematic diagram of camera image recognition according to an embodiment of the present application;

Fig. 7 is a schematic structural diagram of an object recognition apparatus based on an image recognition model according to an embodiment of the present application;

Fig. 8 is a schematic block diagram of an example electronic device 800 according to an embodiment of the present application.

Detailed Description

The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

An object recognition method, an object recognition device and an electronic device based on an image recognition model according to the embodiments of the present application are described below with reference to the accompanying drawings.

Fig. 1 is a schematic flowchart of an object recognition method based on an image recognition model according to an embodiment of the present disclosure.

As shown in Fig. 1, the method includes the following steps:

Step 101, obtaining confidence thresholds of a plurality of candidate types.

In this embodiment, each candidate type has its own confidence threshold; that is, different candidate types may have different confidence thresholds. The types depend on the classification scene. For example, in a production scene where screws are classified, the types include hexagonal screws, self-tapping screws and the like. In a vehicle type recognition scene, the types include cars, vans, off-road vehicles and the like.

The confidence threshold of a candidate type in this embodiment is the value that the confidence of that type must exceed for the object to be determined as belonging to the type. Setting a separate confidence threshold for each candidate type meets the type recognition accuracy requirements of different recognition scenes. It thereby addresses the problem that, when samples are unevenly distributed, the recognition model is trained to different degrees on types with few samples and types with many samples, so that object types with few training samples are recognized with low accuracy.

Step 102, recognizing the acquired image by using an image recognition model to obtain a target region and the confidence of each candidate type.

The image recognition model in this embodiment is trained by deep learning on corresponding training samples. The model is used to recognize the acquired image to obtain a target region and the confidence of each candidate type. The target region is a region of the acquired image that contains an object, and the confidence indicates the probability that the object belongs to the corresponding candidate type. For example, the object is a hexagonal screw; recognition by the image recognition model determines a target region, which is the region of the acquired image containing the hexagonal screw, together with a confidence of 0.88 that the object is a hexagonal screw, a confidence of 0.4 that the object is a self-tapping screw, and so on.

Step 103, determining a target type from the multiple candidate types according to the confidence threshold values of the multiple candidate types.

Wherein the target type is a candidate type with a corresponding confidence greater than a corresponding confidence threshold.

In this embodiment, the confidence with which the recognition model assigns the object to each candidate type is compared with the confidence threshold set for that candidate type, and a candidate type whose confidence is greater than its confidence threshold is taken as the target type of the object, which improves the reliability of determining the type of the object in the image.

For example, in this embodiment the confidence threshold of the hexagonal screw is set to 0.7 and the confidence threshold of the self-tapping screw is set to 0.5. Given the confidences output by the image recognition model in the above step, the confidence 0.88 for the hexagonal screw is greater than the corresponding threshold 0.7, whereas the confidence 0.4 for the self-tapping screw is less than the corresponding threshold 0.5. Therefore, the target type of the object is determined to be the hexagonal screw.
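The comparison in step 103 can be illustrated with a minimal Python sketch. The function name `select_target_types` and the use of dictionaries keyed by type name are assumptions made for illustration only; the embodiment does not prescribe any particular data structure.

```python
from typing import Dict, List


def select_target_types(confidences: Dict[str, float],
                        thresholds: Dict[str, float]) -> List[str]:
    """Return the candidate types whose confidence exceeds their own threshold."""
    return [
        name
        for name, score in confidences.items()
        # Strictly greater than, matching the wording of step 103.
        if name in thresholds and score > thresholds[name]
    ]


# Values taken from the example in the text.
confidences = {"hexagonal screw": 0.88, "self-tapping screw": 0.4}
thresholds = {"hexagonal screw": 0.7, "self-tapping screw": 0.5}
print(select_target_types(confidences, thresholds))  # ['hexagonal screw']
```

A type with no configured threshold is skipped here; that is a choice of this sketch rather than something stated in the embodiment.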

Step 104, determining that the object contained in the target region belongs to the target type.

Furthermore, in the embodiment, the object included in the target area identified in the acquired image is determined to belong to the target type, so that the reliability of determining the object type in the image is improved.

In the object recognition method based on the image recognition model of this embodiment, confidence thresholds of a plurality of candidate types are obtained, and the image recognition model is used to recognize the acquired image to obtain a target region and the confidence of each candidate type, where the target region is a region of the acquired image that contains an object and the confidence indicates the probability that the object belongs to the corresponding candidate type. The target type is then determined from the candidate types according to their confidence thresholds, the target type being a candidate type whose confidence is greater than its confidence threshold, and the object contained in the target region is determined to belong to the target type.

Based on the previous embodiment, this embodiment provides another object recognition method based on an image recognition model, in which setting a size range improves the accuracy of determining that the object recognized in the target region is of the target type.

Fig. 2 is a schematic flowchart of another object recognition method based on an image recognition model according to an embodiment of the present application. As shown in fig. 2, the method comprises the following steps:

step 201, obtaining confidence thresholds of a plurality of candidate types.

In one implementation of this embodiment, the confidence thresholds are set in response to a setting operation. Specifically, the confidence thresholds of the plurality of candidate types are set in response to a first setting operation, which may be a click operation or a slide operation. In this case the user sets the thresholds manually, based on the number of training samples of the image recognition model. When a type has few training samples, the model is insufficiently trained on it and recognizes it less accurately, that is, the confidence that the model outputs for the type tends to be low, so the corresponding confidence threshold is set to a small value, for example 0.5. When a type has many training samples, the trained model recognizes it accurately, so the corresponding confidence threshold is set to a high value, for example 0.8. In this way the recognition accuracy requirements of different classification scenes are met.

For example, Fig. 3 is a schematic diagram of a user interaction interface according to an embodiment of the present application. As shown in Fig. 3, a user may set the confidence thresholds of different candidate types on the interaction interface through a confidence adjustment button.

In another implementation of this embodiment, the confidence thresholds are set automatically. Specifically, the electronic device determines the confidence threshold of each candidate type according to the number of training samples of that candidate type in the training sample set of the image recognition model. The confidence threshold is directly proportional to the number of samples. When a type has few training samples, the model is insufficiently trained on it and recognizes it less accurately, that is, the confidence that the model outputs for the type tends to be low, so the corresponding confidence threshold is set to a small value, for example 0.5. When a type has many training samples, the trained model recognizes it accurately, so the corresponding confidence threshold is set to a high value, for example 0.8. In this way the recognition accuracy requirements of different classification scenes are met.
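One possible way to derive thresholds automatically from sample counts is sketched below in Python. The embodiment states only that more samples lead to a higher threshold; the linear mapping between the smallest and largest counts, the function name `thresholds_from_sample_counts`, and the default bounds 0.5 and 0.8 (borrowed from the example values above) are assumptions of this sketch.

```python
from typing import Dict


def thresholds_from_sample_counts(sample_counts: Dict[str, int],
                                  low: float = 0.5,
                                  high: float = 0.8) -> Dict[str, float]:
    """Map each type's training-sample count to a threshold in [low, high]."""
    if not sample_counts:
        return {}
    smallest = min(sample_counts.values())
    largest = max(sample_counts.values())
    if largest == smallest:
        # All types are equally represented; using `high` is an arbitrary choice.
        return {name: high for name in sample_counts}
    span = largest - smallest
    return {
        # Assumed linear interpolation: fewest samples -> low, most -> high.
        name: round(low + (high - low) * (count - smallest) / span, 3)
        for name, count in sample_counts.items()
    }


counts = {"hexagonal screw": 5000, "self-tapping screw": 500}
print(thresholds_from_sample_counts(counts))
```

With these hypothetical counts, the well-represented type receives the upper bound and the under-represented type the lower bound, which mirrors the 0.8 and 0.5 values used in the text.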

Step 202, recognizing the acquired image by using an image recognition model to obtain a target region and the confidence of each candidate type. In this embodiment, one or more industrial cameras can be connected and the images they acquire can be recognized directly. Multiple camera adaptation schemes are provided, and no driver needs to be installed.

In one implementation of this embodiment, a video stream containing multiple frames of acquired images is obtained from an image sensor, the video stream is converted into data in an array format, and the data in the array format is input to the image recognition model for recognition. Fig. 4 shows a schematic diagram of this recognition. It should be noted that Fig. 4 may involve one industrial camera or multiple industrial cameras; with multiple industrial cameras, the image recognition models corresponding to the individual cameras may be the same or different. The model for each camera may be selected through the user interface (UI) shown in Fig. 3, which is not limited in this embodiment.
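The conversion of frames into array-format data might look like the following Python sketch. It assumes that each frame arrives as packed 8-bit bytes with interleaved channels and known dimensions; the embodiment does not specify the frame layout, and in practice a numerical array library would typically replace the nested lists used here. The names `frame_to_array` and `stream_to_arrays` are hypothetical.

```python
from typing import Iterable, Iterator, List

Frame = List[List[List[int]]]  # rows x columns x channels


def frame_to_array(raw: bytes, width: int, height: int, channels: int = 3) -> Frame:
    """Reshape one packed frame into a height x width x channels nested list."""
    expected = width * height * channels
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    row_stride = width * channels
    return [
        [
            list(raw[y * row_stride + x * channels:
                     y * row_stride + (x + 1) * channels])
            for x in range(width)
        ]
        for y in range(height)
    ]


def stream_to_arrays(frames: Iterable[bytes], width: int, height: int) -> Iterator[Frame]:
    """Lazily convert every frame of a video stream into array form."""
    for raw in frames:
        yield frame_to_array(raw, width, height)


# A 2 x 2 frame with three channels, filled with the byte values 0..11.
print(frame_to_array(bytes(range(12)), width=2, height=2))
```

Each array produced by `stream_to_arrays` could then be passed to the recognition model one frame at a time.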

In another implementation of this embodiment, a video stream containing multiple frames of acquired images is obtained from an image sensor, the video stream is converted into a video file in a predetermined file format, and the video file is input to the image recognition model for recognition through a predetermined interface. As shown in Fig. 5, the acquired video stream is not transmitted directly. Instead, it is converted into a base64-encoded file; base64 is one of the most common encodings for transmitting 8-bit byte data and can be used to transmit a long video file. The video file in the set format is then input to the image recognition model for recognition through a gRPC interface, which improves transmission performance and efficiency.
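The base64 step can be sketched in Python with the standard library `base64` module. Only the encoding and decoding of the payload is shown; the gRPC service definition is not described in the text and is therefore omitted. The function names are hypothetical.

```python
import base64


def encode_video_bytes(video: bytes) -> str:
    """Encode raw video bytes as base64 text suitable for a text-safe message field."""
    return base64.b64encode(video).decode("ascii")


def decode_video_bytes(payload: str) -> bytes:
    """Recover the original bytes on the receiving side, rejecting malformed input."""
    return base64.b64decode(payload, validate=True)


payload = encode_video_bytes(b"abc")
print(payload)                      # YWJj
print(decode_video_bytes(payload))  # b'abc'
```

Note that base64 output is roughly one third larger than the input bytes, so any efficiency gain described in the embodiment would come from the transport as a whole rather than from the encoding itself.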

In yet another implementation manner of this embodiment, a video stream including multiple frames of captured images is acquired from at least one image sensor, and the video stream of each image sensor is input into a corresponding image recognition model for recognition.

As shown in Fig. 6, a video stream of multiple frames of images acquired from the image sensor of one camera may be input to the corresponding image recognition model for recognition, or the video streams output by a plurality of cameras may each be recognized by the corresponding image recognition model, where the image recognition models corresponding to different cameras may be the same or different. In a scene with multiple cameras, the video streams output by the cameras can be converted into the corresponding base64 format and then transmitted to the corresponding image recognition models for recognition, so as to improve transmission efficiency.
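Routing each camera's stream to its own model can be sketched as a lookup from camera to model. In the Python sketch below, the camera identifiers, the `Model` callable type, and the placeholder models are all hypothetical; a real deployment would substitute trained image recognition models.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A model is represented here as any callable that maps one frame to a label.
Model = Callable[[bytes], str]


def recognize_streams(streams: Dict[str, Iterable[bytes]],
                      models: Dict[str, Model]) -> List[Tuple[str, str]]:
    """Run every frame of every camera through that camera's own model."""
    results: List[Tuple[str, str]] = []
    for camera_id, frames in streams.items():
        model = models[camera_id]  # the same model object may serve several cameras
        for frame in frames:
            results.append((camera_id, model(frame)))
    return results


# Placeholder models standing in for trained recognizers.
def screw_model(frame: bytes) -> str:
    return "hexagonal screw"


def vehicle_model(frame: bytes) -> str:
    return "van"


streams = {"camera-1": [b"frame-a", b"frame-b"], "camera-2": [b"frame-c"]}
models = {"camera-1": screw_model, "camera-2": vehicle_model}
print(recognize_streams(streams, models))
```

Assigning the same callable to two camera identifiers covers the case in which different cameras share one model.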

For other explanations of step 202, refer to the explanations in the foregoing embodiments, which are not repeated in this embodiment.

Step 203, determining a target type from the multiple candidate types according to the confidence thresholds of the multiple candidate types.

For details, reference may be made to the explanations in the above embodiments, and the principles are the same, which are not described herein again.

Step 204, setting a size range in response to a second setting operation.

The second setting operation is a click operation or a slide operation that the user performs in a UI to set the size range of the object.

In the related art, when an object is recognized, the detection result output by the recognition model includes the size of a rectangular frame, which does not quantify the size of the recognized object. Therefore, in this embodiment, a size range of the object can be set in response to a user operation, where the size range is the range of sizes that the object occupies when mapped into the image.

For example, in this embodiment the object to be recognized may be a short hexagonal screw. Both long and short hexagonal screws may be recognized as hexagonal screws according to the threshold. To improve recognition accuracy further, a size range corresponding to the short hexagonal screw can be set, so that short hexagonal screws are selected and long hexagonal screws are excluded, which improves the reliability of object recognition.

Step 205, determining that the object displayed in the target area conforms to the size range.

In one implementation of the embodiment of the present application, the largest local area within the target region that conforms to a set shape is taken as an object area, and when the size of the object area is within the size range, it is determined that the object displayed in the target region conforms to the size range, which improves the accuracy of object size determination. The set shape is the preset shape of the object to be recognized.

In another possible implementation of the embodiment of the present application, the area of the target region is corrected according to a set coefficient to obtain a corrected area, and when the corrected area is within the size range, it is determined that the object displayed in the target region conforms to the size range, which improves the accuracy of object size determination. The set coefficient may be determined from a statistical value of recognized target regions.

In yet another possible implementation of the embodiment of the present application, the size of the target region is determined according to the number of horizontal and/or vertical pixels in the target region, and when the size of the target region is within the size range, it is determined that the object displayed in the target region conforms to the size range. Screening the object further through the set size range improves the accuracy of object size determination.
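Two of the size checks described above, the corrected-area check and the pixel-count check, can be sketched in Python as follows. The representation of a size range as an inclusive `(minimum, maximum)` pair, the function names, and the numeric values are assumptions for illustration. The shape-based check is omitted because it depends on an image-processing step that the text does not detail.

```python
from typing import Tuple

SizeRange = Tuple[float, float]  # (minimum, maximum), treated as inclusive


def within(value: float, size_range: SizeRange) -> bool:
    """Return True when value lies inside the inclusive size range."""
    low, high = size_range
    return low <= value <= high


def fits_by_pixel_counts(width_px: int, height_px: int,
                         width_range: SizeRange,
                         height_range: SizeRange) -> bool:
    """Check the target region's horizontal and vertical pixel counts."""
    return within(width_px, width_range) and within(height_px, height_range)


def fits_by_corrected_area(width_px: int, height_px: int,
                           coefficient: float,
                           area_range: SizeRange) -> bool:
    """Scale the region's area by a set coefficient before checking it."""
    corrected_area = width_px * height_px * coefficient
    return within(corrected_area, area_range)


# Hypothetical numbers: a 120 x 40 pixel target region.
print(fits_by_pixel_counts(120, 40, (100, 150), (30, 50)))  # True
print(fits_by_pixel_counts(300, 40, (100, 150), (30, 50)))  # False
print(fits_by_corrected_area(120, 40, 0.8, (3000, 4000)))   # True
```

The coefficient 0.8 stands in for a value that, according to the text, may be derived from statistics of recognized target regions.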

In the embodiment, the object in the target area is further identified through the set size range, so that the reliability of object identification in the target area is improved.

Step 206, determining that the object contained in the target area belongs to the target type.

In this embodiment, when the object displayed in the target area conforms to the size range, it is determined that the object included in the target area belongs to the target type, and the reliability of object identification of the target type is improved.
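Steps 203 to 206 can be combined into a single decision, sketched below in Python. The `Detection` class, the use of the region's length in pixels as the measured size, and the tie-breaking rule of taking the highest-confidence type when several types pass their thresholds are all assumptions of this sketch; the embodiment does not state how multiple passing types are resolved.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Detection:
    length_px: int                 # assumed size measure of the target region
    confidences: Dict[str, float]  # confidence per candidate type


def recognize(detection: Detection,
              thresholds: Dict[str, float],
              length_range: Tuple[int, int]) -> Optional[str]:
    """Return the target type, or None if the threshold or size check fails."""
    # Step 203: keep candidate types whose confidence exceeds their threshold.
    passing = {
        name: score
        for name, score in detection.confidences.items()
        if name in thresholds and score > thresholds[name]
    }
    if not passing:
        return None
    # Step 205: the object must also conform to the configured size range.
    low, high = length_range
    if not (low <= detection.length_px <= high):
        return None
    # Step 206: assumed tie-break, pick the passing type with highest confidence.
    return max(passing, key=passing.get)


thresholds = {"hexagonal screw": 0.7, "self-tapping screw": 0.5}
scores = {"hexagonal screw": 0.88, "self-tapping screw": 0.4}
print(recognize(Detection(60, scores), thresholds, (40, 80)))   # hexagonal screw
print(recognize(Detection(150, scores), thresholds, (40, 80)))  # None
```

In the second call the region is too long for the configured range, which corresponds to excluding a long hexagonal screw while keeping a short one.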

As shown in fig. 3, the image recognition results of a plurality of cameras, for example, two cameras, can be displayed in the user interaction interface, which increases the intuitiveness.

In the object recognition method based on the image recognition model of this embodiment, confidence thresholds of a plurality of candidate types are obtained, and the image recognition model is used to recognize the acquired image, so that a target region in which an object is displayed and the confidence that the object belongs to each candidate type are determined from the acquired image. The target type is determined from the candidate types according to their confidence thresholds, where the confidence of the target type is greater than the confidence threshold of the target type, and the object contained in the target region is determined to belong to the target type. In addition, a size range of the object is set, and the object in the target region is further screened by size based on the set size range, which improves the reliability of object recognition in the target region.

In order to implement the above embodiments, the present embodiment provides an object recognition apparatus based on an image recognition model.

Fig. 7 is a schematic structural diagram of an object recognition apparatus based on an image recognition model according to an embodiment of the present application. As shown in Fig. 7, the apparatus includes:

an obtaining module 71, configured to obtain confidence thresholds of multiple candidate types.

The processing module 72 is configured to identify the acquired image by using an image identification model to obtain a target region and obtain confidence levels of the candidate types; wherein the target region is a region of the acquired image containing an object, the confidence level being indicative of a probability that the object belongs to a corresponding candidate type.

The first determining module 73 is configured to determine a target type from a plurality of candidate types according to confidence thresholds of the candidate types, where the target type is a candidate type whose corresponding confidence is greater than the corresponding confidence threshold.

And the identifying module 74 is configured to determine that the object included in the target area belongs to the target type.

Further, in one implementation of the present application, the obtaining module 71 is specifically configured to: set the confidence thresholds of the plurality of candidate types in response to a first setting operation; or determine the confidence threshold of each candidate type according to the number of training samples of that candidate type in the training sample set of the image recognition model.

In one implementation of the present application, the apparatus further includes:

a setting module for setting a size range in response to a second setting operation;

the identification module 74 is specifically configured to: and determining that the object contained in the target area belongs to the target type under the condition that the object displayed in the target area conforms to the size range.

In one implementation manner of the present application, the apparatus further includes:

the second determining module is used for taking the maximum local area which is in line with the set shape in the target area as an object area; determining that the object presented within the target area conforms to the size range if the size of the object area is within the size range.

In an implementation manner of the present application, the second determining module is further configured to correct the area of the target region according to a set coefficient to obtain a corrected area; determining that the object displayed in the target region conforms to the size range if the corrected area is within the size range.

In an implementation manner of the present application, the second determining module is further configured to determine a size of the target region according to the number of horizontal and/or vertical pixels in the target region; determining that an object presented within the target area conforms to the size range if the size of the target area is within the size range.

In one implementation of the present application, the processing module 72 is specifically configured to: acquire a video stream containing multiple frames of acquired images from an image sensor; convert the acquired video stream into data in an array format; and input the data in the array format to the image recognition model for recognition.

In one implementation of the present application, the processing module 72 is further specifically configured to: acquire a video stream containing multiple frames of acquired images from an image sensor; convert the video stream into a video file in a predetermined file format; and input the video file in the predetermined file format to the image recognition model for recognition through a predetermined interface.

In one implementation of the present application, the processing module 72 is further specifically configured to: acquire a video stream containing multiple frames of acquired images from at least one image sensor; and input the video stream of each image sensor to the corresponding image recognition model for recognition.

It should be noted that the explanation of the foregoing method embodiment is also applicable to the apparatus of this embodiment, and the principle is the same, and is not repeated here.

In the object recognition device based on the image recognition model of this embodiment, confidence thresholds of a plurality of candidate types are obtained, and the image recognition model is used to recognize the acquired image to obtain a target region and the confidence of each candidate type, where the target region is a region of the acquired image that contains an object and the confidence indicates the probability that the object belongs to the corresponding candidate type. The target type is then determined from the candidate types according to their confidence thresholds, the target type being a candidate type whose confidence is greater than its confidence threshold, and the object contained in the target region is determined to belong to the target type.

In order to implement the above embodiments, this embodiment provides an electronic device, including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the image recognition model-based object recognition method of the foregoing method embodiments.

In order to implement the above embodiments, the present embodiment provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the image recognition model-based object recognition method according to the foregoing method embodiment.

In order to implement the above embodiments, the present embodiment provides a computer program product comprising a computer program which, when being executed by a processor, implements the image recognition model-based object recognition method as described in the foregoing method embodiments.

According to embodiments of the present application, an electronic device, a readable storage medium, and a computer program product are also provided.

Fig. 8 is a schematic block diagram of an example electronic device 800 of an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.

As shown in Fig. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes in accordance with a computer program stored in a ROM (Read-Only Memory) 802 or a computer program loaded from a storage unit 808 into a RAM (Random Access Memory) 803. The RAM 803 can also store various programs and data required for the operation of the device 800. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An I/O (Input/Output) interface 805 is also connected to the bus 804.

A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806 such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The computing unit 801 may be any of a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), various dedicated AI (Artificial Intelligence) computing chips, various computing units running machine learning model algorithms, a DSP (Digital Signal Processor), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 performs the methods and processes described above, such as the object recognition method based on an image recognition model. For example, in some embodiments, the image recognition model-based object recognition method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the image recognition model-based object recognition method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the image recognition model-based object recognition method in any other suitable way (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, FPGAs (Field Programmable Gate Arrays), ASICs (Application-Specific Integrated Circuits), ASSPs (Application-Specific Standard Products), SOCs (Systems on Chip), CPLDs (Complex Programmable Logic Devices), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present application may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this application, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a RAM, a ROM, an EPROM (Erasable Programmable Read-Only Memory) or flash memory, an optical fiber, a CD-ROM (Compact Disc Read-Only Memory), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (Cathode Ray Tube) or LCD (Liquid Crystal Display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a LAN (Local Area Network), a WAN (Wide Area Network), the Internet, and a blockchain network.

The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and addresses the high management difficulty and weak service scalability of traditional physical hosts and VPS (Virtual Private Server) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.

It should be noted that artificial intelligence is the discipline of enabling computers to simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and involves both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technology, and the like.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.

The above-described embodiments are not intended to limit the scope of the present disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. An object recognition method based on an image recognition model comprises the following steps:

obtaining confidence threshold values of a plurality of candidate types;

identifying an acquired image by adopting an image recognition model to obtain a target area and the confidence coefficient of each candidate type; the target area is an area containing an object in the acquired image, the confidence coefficient is used for indicating the probability that the object belongs to the corresponding candidate type, adaptation to a variety of cameras is provided, and the acquired image is obtained by connecting to one or more industrial cameras and capturing the image; in a scenario with a plurality of industrial cameras, the image recognition models corresponding to the respective industrial cameras are the same or different, and a user can select the image recognition model of the corresponding camera through a user interaction interface;

determining that the object contained in the target area belongs to the target type;

wherein the method further comprises: setting a size range, which is a size range in which the object is mapped into the image, in response to a second setting operation including a click operation performed by a user or a slide operation for setting the size range of the object in a UI interface with which the user interacts; taking the maximum local area in the target area, which conforms to a set shape, as an object area, wherein the set shape is the set shape of the object to be identified; determining that the object displayed in the target area conforms to the size range if the size of the object area is within the size range;

the determining that the object included in the target area belongs to the target type includes:

determining that the object contained in the target area belongs to the target type if the object displayed in the target area conforms to the size range;

wherein the identifying the acquired image by adopting the image recognition model comprises:

acquiring a video stream containing a plurality of frames of captured images from at least one image sensor;

converting the video stream into a video file in a predetermined file format;

and inputting the video file in the predetermined file format into the image recognition model through a preset interface for recognition, and displaying a plurality of recognition results in the user interaction interface.
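
As a non-limiting illustration of the method of claim 1, the per-type threshold comparison and the size-range check can be sketched in Python as follows. The rule that a target type is a candidate type whose confidence exceeds its own threshold follows the abstract. The function names, the dictionary representation of confidences and thresholds, and the inclusive bounds of the size range are assumptions made for this sketch and do not appear in the original text.

```python
from typing import Dict, List, Tuple


def select_target_types(
    confidences: Dict[str, float],
    thresholds: Dict[str, float],
) -> List[str]:
    """Return the candidate types whose confidence exceeds their own threshold."""
    return [
        name
        for name, score in confidences.items()
        # Assumption: a type with no configured threshold is never selected.
        if name in thresholds and score > thresholds[name]
    ]


def recognize_object(
    confidences: Dict[str, float],
    thresholds: Dict[str, float],
    object_area_size: float,
    size_range: Tuple[float, float],
) -> List[str]:
    """Report target types only when the object area fits the size range."""
    low, high = size_range
    # Assumption: both ends of the size range are inclusive.
    if not (low <= object_area_size <= high):
        return []
    return select_target_types(confidences, thresholds)
```

With hypothetical confidences of 0.62 for a type "scratch" and 0.80 for a type "dent", and thresholds of 0.5 and 0.9 respectively, only "scratch" is reported, even though "dent" has the higher raw confidence. This is the effect of setting a separate threshold per type.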

2. The object recognition method of claim 1, wherein the obtaining confidence thresholds for a plurality of candidate types comprises:

setting the confidence thresholds of a plurality of the candidate types in response to a first setting operation;

or determining the confidence threshold of each candidate type according to the number of training samples of each candidate type in the training sample set of the image recognition model.
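
The second alternative of claim 2 derives each threshold from the number of training samples of the corresponding type, but the claim does not state the mapping. The Python sketch below uses a purely hypothetical linear rule in which types with fewer training samples receive a higher threshold. The function name, the `base` and `span` parameters, and the direction of the rule are all assumptions for illustration; an implementation could equally use the opposite direction or a different formula.

```python
from typing import Dict


def thresholds_from_sample_counts(
    sample_counts: Dict[str, int],
    base: float = 0.5,
    span: float = 0.4,
) -> Dict[str, float]:
    """Map per-type training sample counts to per-type confidence thresholds.

    Hypothetical rule: the best-represented type gets `base`, and a type
    with no samples would get `base + span`.
    """
    if not sample_counts:
        return {}
    largest = max(sample_counts.values())
    if largest <= 0:
        raise ValueError("at least one candidate type needs training samples")
    return {
        name: base + span * (1 - count / largest)
        for name, count in sample_counts.items()
    }
```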

3. The object recognition method of claim 1, wherein the method further comprises:

correcting the area of the target area according to a set coefficient to obtain a corrected area;

determining that the object displayed in the target region conforms to the size range if the corrected area is within the size range.
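
The area correction of claim 3 can be illustrated with the short Python sketch below. The claim states only that the area is corrected "according to a set coefficient"; treating the correction as a multiplication, the function names, and the inclusive range bounds are assumptions of this sketch.

```python
from typing import Tuple


def corrected_area(width_px: int, height_px: int, coefficient: float) -> float:
    """Scale the target area's pixel area by a set coefficient (assumed multiplicative)."""
    return width_px * height_px * coefficient


def area_conforms(area: float, size_range: Tuple[float, float]) -> bool:
    """Check whether a corrected area lies within the size range (bounds assumed inclusive)."""
    low, high = size_range
    return low <= area <= high
```

A coefficient below 1 would suit a case where the object typically fills only part of its target area, so that the corrected area approximates the object rather than the whole area.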

4. The object recognition method of claim 1, wherein the method further comprises:

determining the size of the target area according to the number of horizontal and/or longitudinal pixel points in the target area;

determining that an object presented within the target area conforms to the size range if the size of the target area is within the size range.
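
One possible reading of claim 4 is sketched below in Python: the size of the target area is taken from its horizontal pixel count, its vertical pixel count, or both. The `Box` type, its exclusive right and bottom edges, the `mode` parameter, and the use of the product of width and height for the "both" case are assumptions made for illustration only.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned target area in pixel coordinates (right/bottom exclusive)."""
    left: int
    top: int
    right: int
    bottom: int


def region_size(box: Box, mode: str = "both") -> int:
    """Measure a target area by its horizontal and/or vertical pixel counts."""
    width = box.right - box.left   # number of horizontal pixel points
    height = box.bottom - box.top  # number of vertical pixel points
    if mode == "horizontal":
        return width
    if mode == "vertical":
        return height
    if mode == "both":
        # Assumption: combine the two counts as a pixel area.
        return width * height
    raise ValueError(f"unknown mode: {mode!r}")
```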

5. The object recognition method according to any one of claims 1 to 4, wherein the recognizing the captured image using the image recognition model includes:

acquiring a video stream containing a plurality of frames of collected images from an image sensor;

converting the acquired video stream into data in an array format;

and inputting the data in the array format to the image recognition model for recognition.
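
The conversion of claim 5 from a video stream to data in an array format can be sketched as follows. The claim does not fix the array layout; this Python example assumes that each frame arrives as raw interleaved bytes (for example, 3 bytes per pixel) and converts it to a nested height x width x channels list. The function names and the raw-byte input format are assumptions, and a practical implementation would more likely use an array library.

```python
from typing import Iterable, Iterator, List

Frame = List[List[List[int]]]  # height x width x channels


def frame_to_array(raw: bytes, width: int, height: int, channels: int = 3) -> Frame:
    """Convert one raw interleaved frame into a nested array."""
    expected = width * height * channels
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    row_stride = width * channels
    return [
        [
            # Slice out the channel values of the pixel at (row, col).
            list(raw[row * row_stride + col * channels:
                     row * row_stride + (col + 1) * channels])
            for col in range(width)
        ]
        for row in range(height)
    ]


def stream_to_arrays(
    frames: Iterable[bytes], width: int, height: int, channels: int = 3
) -> Iterator[Frame]:
    """Lazily convert each frame of a stream, ready to pass to a model."""
    for raw in frames:
        yield frame_to_array(raw, width, height, channels)
```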

6. An object recognition apparatus based on an image recognition model, comprising:

an obtaining module, configured to obtain confidence thresholds of multiple candidate types;

the processing module is used for identifying the acquired image by adopting an image identification model to obtain a target area and obtain the confidence coefficient of each candidate type; the target area is an area containing an object in the acquired image, the confidence coefficient is used for indicating the probability that the object belongs to the corresponding candidate type, and the acquired image is obtained by accessing one or more industrial cameras and acquiring the image; under the scenes of a plurality of industrial cameras, the image recognition models corresponding to each industrial camera are the same or different, and a user can select the image recognition models of the corresponding cameras through a user interaction interface;

the identification module is used for determining that the object contained in the target area belongs to the target type;

wherein the apparatus further comprises:

a setting module, configured to set a size range in response to a second setting operation, where the size range is a size range in which the object is mapped into the image, and the second setting operation includes a click operation performed by a user or a slide operation for setting the size range of the object in a UI interface interacting with the user;

the second determining module is used for taking the maximum local area which accords with a set shape in the target area as an object area, wherein the set shape is the set shape of an object to be identified; determining that the object presented within the target area conforms to the size range if the size of the object area is within the size range;

the identification module is specifically configured to:

the processing module is specifically configured to:

converting the video stream into a video file in a predetermined file format;

7. The object identifying apparatus of claim 6, wherein the obtaining module is specifically configured to:

setting the confidence thresholds for a plurality of the candidate types in response to a first setting operation;

or determining the confidence threshold of each candidate type according to the number of training samples of each candidate type in a training sample set of the image recognition model.

8. The object identifying apparatus according to claim 6,

the second determining module is further configured to correct the area of the target region according to a set coefficient to obtain a corrected area; determining that the object displayed in the target region conforms to the size range if the corrected area is within the size range.

9. The object identifying apparatus according to claim 6,

the second determining module is further configured to determine the size of the target region according to the number of horizontal and/or vertical pixels in the target region; determining that an object presented within the target area conforms to the size range if the size of the target area is within the size range.

10. The object identifying apparatus according to any one of claims 6 to 9, wherein the processing module is specifically configured to:

converting the acquired video stream into data in an array format;

11. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.

12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-5.