CN115424157A - Target identification method, device and system - Google Patents

Target identification method, device and system

Info

Publication number
CN115424157A
Authority
CN
China
Prior art keywords
target
image frame
video
image
identification information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110600131.3A
Other languages
Chinese (zh)
Inventor
章柏永
朱强
叶小卫
梅艳芳
姜嘉豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Public Information Industry Co ltd
Original Assignee
Zhejiang Public Information Industry Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Public Information Industry Co ltd filed Critical Zhejiang Public Information Industry Co ltd
Priority to CN202110600131.3A
Publication of CN115424157A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a target identification method, device and system, relating to the field of artificial intelligence. The method comprises the following steps: in response to a first request to perform target recognition on a first video, inputting a plurality of consecutive first image frames of the first video into a pre-trained machine learning model to obtain identification information of a recognized target in each first image frame; adding the identification information of the recognized target in each first image frame to that first image frame to obtain a plurality of consecutive second image frames; and playing a second video comprising the plurality of second image frames.

Description

Target identification method, device and system
Technical Field
The disclosure relates to the field of artificial intelligence, in particular to a target identification method, device and system.
Background
In the related art, an object in an image may be identified based on a machine learning model. However, techniques for identifying objects in a video based on a machine learning model remain imperfect.
Disclosure of Invention
In view of this, the disclosed embodiments propose the following solution suitable for identifying objects in video.
According to an aspect of the embodiments of the present disclosure, there is provided a target identification method, including: in response to a first request to perform target recognition on a first video, inputting a plurality of consecutive first image frames of the first video into a pre-trained machine learning model to obtain identification information of a recognized target in each first image frame; adding the identification information of the recognized target in each first image frame to that first image frame to obtain a plurality of consecutive second image frames; and playing a second video comprising the plurality of second image frames.
In some embodiments, the first request carries an identification of a camera from which the first video is obtained in real-time.
In some embodiments, the second video begins playing from a first time, the first time being between a start time and an end time of the first video.
In some embodiments, adding identification information of the identified target in each first image frame to the first image frame comprises: determining a position of a recognized target in the first image frame; adding identification information of the identified target at the location.
In some embodiments, at least one of the plurality of first image frames includes an unidentified target; the method further comprises the following steps: in each of the at least one first image frame, marking an unidentified target in the first image frame to obtain a third image frame; displaying the third image frame to query the third image frame for identification information of the tagged unidentified target; receiving a response message, wherein the response message carries identification information of an unidentified target marked in the third image frame; training the machine learning model with the third image frame as an input and identification information of an unidentified target marked in the third image frame as an output.
In some embodiments, the method further comprises: in response to a second request for target recognition of the image, inputting the first image to the machine learning model to obtain identification information of a recognized target in the first image; and adding the identification information of the identified target in the first image to obtain a second image.
In some embodiments, the number of identified targets is greater than 1.
According to another aspect of the embodiments of the present disclosure, there is provided a target recognition apparatus, including: an input module configured to, in response to a first request to perform target recognition on a first video, input a plurality of consecutive first image frames of the first video into a pre-trained machine learning model to obtain identification information of a recognized target in each first image frame; an adding module configured to add the identification information of the recognized target in each first image frame to that first image frame to obtain a plurality of consecutive second image frames; and a playing module configured to play a second video comprising the plurality of second image frames.
According to still another aspect of the embodiments of the present disclosure, there is provided an object recognition apparatus including: a memory; and a processor coupled to the memory, the processor configured to perform the method of any of the above embodiments based on instructions stored in the memory.
According to still another aspect of the embodiments of the present disclosure, there is provided a target recognition system including: the object recognition apparatus according to any one of the above embodiments; and the camera is configured to send the acquired first video to the target recognition device in real time.
According to yet another aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium comprising computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the method of any one of the above embodiments.
In the embodiment of the present disclosure, identification information of a recognized target obtained based on a machine learning model is added to a first image frame of a first video to obtain a second image frame, and a second video including a plurality of second image frames is played. In this way, the played second video can be made to intuitively present the identification information of the recognized target.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below obviously show only some embodiments of the present disclosure, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow diagram of a target identification method according to some embodiments of the present disclosure;
FIG. 2 is a flow diagram of a method of object recognition according to further embodiments of the present disclosure;
FIG. 3 is a schematic block diagram of an object recognition device according to some embodiments of the present disclosure;
FIG. 4 is a schematic block diagram of an object recognition device according to further embodiments of the present disclosure;
FIG. 5 is a schematic diagram of a structure of a target recognition system according to some embodiments of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present disclosure. All other embodiments obtained by a person skilled in the art from the disclosed embodiments without creative effort shall fall within the protection scope of the present disclosure.
The relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
Meanwhile, it should be understood that, for ease of description, the sizes of the various parts shown in the drawings are not drawn to actual scale.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, they are to be considered part of the specification.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
FIG. 1 is a flow diagram of a target identification method according to some embodiments of the present disclosure.
As shown in fig. 1, the object recognition method includes steps 102 to 106.
In step 102, in response to a first request for target recognition of a video, a plurality of consecutive first image frames in the first video are input to a machine learning model trained in advance to obtain identification information of a recognized target in each first image frame.
In some embodiments, the first video is a selected one of a plurality of videos. For example, a user may select one of a plurality of local videos and then click an OK button to use the selected video as the first video.
In still other embodiments, the first request carries an identification of a camera. For example, there may be multiple cameras, each with a corresponding identification. When the first request carries the identification of a certain camera, the first video is acquired from that camera in real time. For instance, a user may input the identification of a camera at a preset position, so that the video acquired by that camera in real time is used as the first video.
In some embodiments, each first image frame is a preset size image frame.
For example, a pre-trained machine learning model may identify a variety of targets. Each type of object has corresponding identification information. By inputting a plurality of consecutive first image frames in the first video to a machine learning model trained in advance, identification information of the recognized target in each first image frame can be obtained.
In some embodiments, the pre-trained machine learning model is based on a multi-scale residual convolutional neural network, e.g., a Darknet-53 neural network. Thus, efficient multi-target recognition can be achieved.
For example, the number of identified objects in a certain first image frame is greater than 1 (i.e., there are multiple objects in the first image frame that the machine learning model can identify, e.g., two people and a vehicle). After the first image frame is input to the machine learning model, identification information (e.g., "person", and "vehicle") of the plurality of recognized targets may be obtained.
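The per-frame, multi-target recognition of step 102 can be sketched as follows. This is an illustrative sketch only: `stub_model`, `Detection`, and `recognize_frames` are hypothetical names, and the stub stands in for the pre-trained detector (e.g. Darknet-53 based) rather than implementing it.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Detection:
    label: str                      # identification information, e.g. "person"
    box: Tuple[int, int, int, int]  # (x, y, w, h) position in the frame

def stub_model(frame) -> List[Detection]:
    # Hypothetical stand-in for the pre-trained model of step 102; a real
    # system would run a detector on the frame and return its findings.
    return [Detection("person", (10, 20, 30, 60)),
            Detection("person", (50, 20, 30, 60)),
            Detection("vehicle", (100, 40, 80, 50))]

def recognize_frames(frames, model: Callable) -> List[List[Detection]]:
    # Feed each consecutive first image frame to the model and collect the
    # identification information of every recognized target per frame.
    return [model(f) for f in frames]

results = recognize_frames(["frame0", "frame1"], stub_model)
print([d.label for d in results[0]])  # multiple targets per frame are allowed
```

Note that the result for a single frame is a list, reflecting that the number of recognized targets may be greater than 1.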
In step 104, identification information of the identified target in each first image frame is added to the first image frame to obtain a plurality of consecutive second image frames.
For example, identification information of the identified target in each first image frame may be added to the first image frame as a watermark or annotation.
In some embodiments, the location of the identified target in the first image frame is first determined. Then, identification information of the recognized target is added at the position of the recognized target in the first image frame. In this way, the presentation of the identified object in the second image frame may be made more intuitive, so that a tracking of the identified object may be achieved by means of the second video.
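A minimal sketch of step 104, under the assumption that a frame is a NumPy grayscale array: the recognized target's position is marked by burning a box border into a copy of the frame. A real implementation would also render the identification text, for example with OpenCV's cv2.rectangle and cv2.putText; the function name and box format here are assumptions for illustration.

```python
import numpy as np

def add_identification(frame: np.ndarray, box) -> np.ndarray:
    # Mark the recognized target's position by burning a box border into a
    # copy of the frame; the first image frame itself is left untouched.
    x, y, w, h = box
    out = frame.copy()
    out[y, x:x + w] = 255          # top edge
    out[y + h - 1, x:x + w] = 255  # bottom edge
    out[y:y + h, x] = 255          # left edge
    out[y:y + h, x + w - 1] = 255  # right edge
    return out

first_frame = np.zeros((64, 64), dtype=np.uint8)   # dummy grayscale frame
second_frame = add_identification(first_frame, (8, 8, 16, 16))
```

Applying this to every first image frame yields the consecutive second image frames that make up the second video.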
At step 106, a second video comprising a plurality of second image frames is played.
In some embodiments, the first video is acquired in real time from a camera, which captures it between the start time and the end time of the first video. The second video is played from a first time that lies between this start time and end time. In this manner, the second video can be played while the first video is still being captured.
When the target recognition efficiency of the pre-trained machine learning model is high enough, playback of the second video closely tracks the real-time acquisition of the first video. The second video, which intuitively presents the recognized targets, can therefore be played essentially in real time during monitoring, so that recognized objects in the first video captured by the camera can be tracked in real time.
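One way to realize such near-real-time behavior is a producer-consumer pipeline in which each frame is labeled and "played" as soon as it arrives. This is an illustrative sketch, not the patented implementation; `label_frame` is a hypothetical stand-in for model inference plus adding identification information.

```python
import queue
import threading

frame_queue: "queue.Queue" = queue.Queue(maxsize=8)

def label_frame(frame):
    # Stand-in for model inference plus adding identification information.
    return f"{frame}+labels"

def producer(frames):
    # Simulates the camera delivering first image frames in real time.
    for f in frames:
        frame_queue.put(f)
    frame_queue.put(None)  # end-of-stream sentinel

def consumer(played):
    # Converts each first image frame into a second image frame and "plays"
    # it immediately, so playback can begin before capture has finished.
    while True:
        f = frame_queue.get()
        if f is None:
            break
        played.append(label_frame(f))

played = []
t = threading.Thread(target=producer, args=(["f0", "f1", "f2"],))
t.start()
consumer(played)
t.join()
```

The bounded queue also gives back-pressure: if recognition lags behind capture, the producer blocks rather than accumulating unbounded frames.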
In some embodiments, the second video also carries information such as the frame rate (frames transmitted per second) and the current time. For example, the cv2 module of Python (OpenCV) may be used to add this information to the second video.
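The frame-rate figure itself can be derived from per-frame timestamps; the sketch below shows only this computation, with `frames_per_second` as a hypothetical helper name (the actual overlay would use, e.g., cv2.putText, as the document suggests).

```python
def frames_per_second(frame_times):
    # Derive the displayed frame-rate figure from per-frame timestamps;
    # a real pipeline would overlay the value on each frame afterwards.
    if len(frame_times) < 2:
        return 0.0
    elapsed = frame_times[-1] - frame_times[0]
    return (len(frame_times) - 1) / elapsed if elapsed > 0 else 0.0

# Ten frames spaced 40 ms apart correspond to roughly 25 frames per second.
times = [i * 0.04 for i in range(10)]
fps = frames_per_second(times)
```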
In the above-described embodiment, the identification information of the recognized target obtained based on the machine learning model is added to the first image frame of the first video to obtain the second image frame, and the second video including a plurality of the second image frames is played. In this way, the played second video can be made to intuitively present the identification information of the recognized target.
In some embodiments, at least one of the plurality of first image frames includes an unrecognized target (i.e., a target that cannot be recognized by the machine learning model). For this case, the present disclosure also proposes a solution as shown in fig. 2.
FIG. 2 is a flow diagram of a method of object recognition according to further embodiments of the present disclosure.
As shown in fig. 2, the object recognition method further includes steps 202 to 208.
In step 202, in each of at least one first image frame comprising an unidentified object, the unidentified object in the first image frame is marked to obtain a third image frame.
For example, the first image frame includes an unidentified traffic signal light. At this time, the traffic signal lamp may be framed with a marker symbol in the first image frame to obtain a third image frame.
In step 204, the third image frame is displayed in order to query for the identification information of the marked unidentified target.
For example, the third image frame may be displayed while the second video is being played, in order to query for the identification information of the marked unidentified target. The user may then respond, for example by entering the identification information of the unrecognized target.
At step 206, a response message is received carrying identification information of the unidentified target marked in the third image frame.
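Steps 202 through 206 can be illustrated with a toy representation in which a frame is a plain dict. The helper names and the dict layout are assumptions for illustration only; a real system would mark and display actual image data.

```python
def mark_unidentified(frame: dict, region) -> dict:
    # Step 202: copy the first image frame and frame the unidentified target
    # with a marker; the original frame and its mark list stay untouched.
    third = {**frame, "marks": list(frame.get("marks", []))}
    third["marks"].append({"region": region, "label": None})
    return third

def apply_response(third: dict, identification: str) -> dict:
    # Steps 204-206: after displaying the third image frame, the response
    # message supplies the identification information for the marked target.
    third["marks"][-1]["label"] = identification
    return third

first = {"marks": []}
third = mark_unidentified(first, (4, 4, 10, 10))   # e.g. a traffic light
third = apply_response(third, "traffic light")
```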
At step 208, the machine learning model is trained with the third image frame as an input and the identification information of the unidentified target marked in the third image frame as an output.
It should be appreciated that the training in step 208 is continued training for the pre-trained machine learning model in step 102. In this way, the weight of the machine learning model can be optimized, thereby improving the recognition capability of the machine learning model.
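The effect of continuing to train with user-labeled samples can be illustrated by a toy gradient-style update. This is not the actual training procedure of the pre-trained model, merely a self-contained sketch of how repeated weight updates move a model's prediction toward the queried label.

```python
def train_step(weights, features, target, lr=0.1):
    # One gradient-style update of the weights toward the supplied label.
    pred = sum(w * x for w, x in zip(weights, features))
    err = target - pred
    return [w + lr * err * x for w, x in zip(weights, features)]

weights = [0.0, 0.0]
features, label = [1.0, 0.5], 1.0  # stand-ins for the third image frame
for _ in range(50):
    weights = train_step(weights, features, label)
pred = sum(w * x for w, x in zip(weights, features))  # approaches 1.0
```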
In the above-described embodiment, the third image frame in which the unrecognized target is marked is displayed to query for the identification information of the unrecognized target, and the machine learning model is trained using the third image frame and that identification information as input and output, respectively. Because unidentified targets encountered during use of the machine learning model serve as new training samples, the recognition capability of the model improves over time, and the effort of collecting training samples separately is saved.
By continuously training the machine learning model in this way, the target identification method provided by the embodiment of the disclosure can quickly adapt to the target identification requirements of different industries.
The object recognition method shown in fig. 1 and 2 is further described below with reference to some embodiments.
In some embodiments, the target recognition method further comprises, in response to a second request to perform target recognition on an image, inputting a first image into the pre-trained machine learning model to obtain identification information of a recognized target in the first image. The identification information of the recognized target is then added to the first image to obtain a second image. In this way, recognition of the target in a single image is achieved.
Similarly, the first image may, for example, be an image selected from a plurality of images. As another example, the second request may carry an identification of a camera, in which case the first image is acquired from that camera.
Likewise, the number of recognized targets in the first image may be greater than 1, and the first image may contain unrecognized targets, in which case operations similar to steps 202 to 208 are performed.
FIG. 3 is a schematic diagram of a structure of an object recognition device, according to some embodiments of the present disclosure.
As shown in fig. 3, the object recognition apparatus 300 includes an input module 301, an adding module 302, and a playing module 303.
The input module 301 is configured to, in response to a first request to perform target recognition on a first video, input a plurality of consecutive first image frames of the first video into a pre-trained machine learning model to obtain identification information of a recognized target in each first image frame.
The adding module 302 is configured to add identification information of the identified target in each first image frame to the first image frame to obtain a plurality of consecutive second image frames.
The playing module 303 is configured to play a second video comprising a plurality of second image frames.
In some embodiments, at least one of the plurality of first image frames includes an unidentified target. The object recognition device 300 further includes a marking module, a display module, a receiving module, and a training module (not shown).
The marking module is configured to mark, in each of at least one first image frame including an unidentified target, the unidentified target in the first image frame to obtain a third image frame.
The display module is configured to display the third image frame in order to query for the identification information of the marked unidentified target.
The receiving module is configured to receive a response message carrying identification information of an unidentified target marked in the third image frame.
The training module is configured to train the machine learning model trained in advance with the third image frame as an input and the identification information of the unidentified target marked in the third image frame as an output.
FIG. 4 is a schematic diagram of an object recognition device, according to further embodiments of the present disclosure.
As shown in fig. 4, the object recognition apparatus 400 includes a memory 401 and a processor 402 coupled to the memory 401, wherein the processor 402 is configured to execute the object recognition method according to any one of the foregoing embodiments based on instructions stored in the memory 401.
The memory 401 may include, for example, a system memory, a fixed non-volatile storage medium, and the like. The system memory may store, for example, an operating system, application programs, a Boot Loader (Boot Loader), and other programs.
The object recognition apparatus 400 may further include an input-output interface 403, a network interface 404, a storage interface 405, and the like. The interfaces 403, 404, 405 may be connected to the memory 401 and the processor 402 by, for example, a bus 406. The input-output interface 403 provides a connection interface for input-output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 404 provides a connection interface for various networking devices. The storage interface 405 provides a connection interface for external storage devices such as an SD card or a USB flash drive.
It should be understood that the object recognition device 300/400 of the present disclosure may enable near real-time object recognition. The target recognition device 300/400 can also be applied to scenes such as intelligent buildings, intelligent cities and the like, and multi-target recognition under complex environments is realized.
FIG. 5 is a schematic diagram of a structure of a target recognition system according to some embodiments of the present disclosure.
As shown in fig. 5, the object recognition system includes an object recognition device 300/400 and a camera 501.
The camera 501 is configured to transmit the captured first video to the object recognition device 300/400 in real time.
In some embodiments, the camera 501 is configured to send the acquired first image to the object recognition device 300/400.
The disclosed embodiments also provide a computer-readable storage medium having stored thereon computer-executable instructions that, when executed by one or more processors, implement the method of any one of the above embodiments.
Thus, various embodiments of the present disclosure have been described in detail. Some details that are well known in the art have not been described in order to avoid obscuring the concepts of the present disclosure. It will be fully apparent to those skilled in the art from the foregoing description how to practice the presently disclosed embodiments.
In the present specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. For the embodiments of the target recognition device and the target recognition system, since they basically correspond to the method embodiments, the description is relatively brief; for relevant details, refer to the description of the method embodiments.
In addition, in the description of the present disclosure, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or order.
As will be appreciated by one of skill in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that the functions specified in one or more of the flows in the flowcharts and/or one or more of the blocks in the block diagrams can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are for purposes of illustration only and are not intended to limit the scope of the present disclosure. It will be understood by those skilled in the art that various changes may be made in the above embodiments or equivalents may be substituted for elements thereof without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (11)

1. An object recognition method, comprising:
in response to a first request to perform target recognition on a first video, inputting a plurality of consecutive first image frames of the first video into a pre-trained machine learning model to obtain identification information of a recognized target in each first image frame;
adding identification information of the identified target in each first image frame to the first image frame to obtain a plurality of continuous second image frames;
playing a second video, the second video comprising the plurality of second image frames.
2. The method of claim 1, wherein the first request carries an identification of a camera from which the first video was obtained in real-time.
3. The method of claim 2, wherein the second video is played from a first time, the first time being between a start time and an end time of the first video.
4. The method of claim 1, wherein adding identification information of the identified target in each first image frame to the first image frame comprises:
determining a position of a recognized target in the first image frame;
adding identification information of the identified target at the location.
5. The method of claim 1, wherein at least one of the plurality of first image frames includes an unidentified target;
the method further comprises the following steps:
in each of the at least one first image frame, marking an unidentified target in the first image frame to obtain a third image frame;
displaying the third image frame to query the third image frame for identification information of the tagged unidentified target;
receiving a response message, wherein the response message carries identification information of an unidentified target marked in the third image frame;
training the machine learning model with the third image frame as an input and identification information of an unidentified target marked in the third image frame as an output.
6. The method of claim 1, further comprising:
in response to a second request for target recognition of the image, inputting the first image to the machine learning model to obtain identification information of a recognized target in the first image;
and adding the identification information of the identified target in the first image to obtain a second image.
7. The method of any of claims 1-6, wherein the number of identified targets is greater than 1.
8. An object recognition apparatus comprising:
an input module configured to, in response to a first request to perform target recognition on a first video, input a plurality of consecutive first image frames of the first video into a pre-trained machine learning model to obtain identification information of a recognized target in each first image frame;
an adding module configured to add identification information of the identified target in each first image frame to the first image frame to obtain a plurality of continuous second image frames;
a playback module configured to play a second video, the second video including the plurality of second image frames.
9. An object recognition apparatus comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the method of any of claims 1-7 based on instructions stored in the memory.
10. An object recognition system comprising:
the object recognition device of claim 8 or 9; and
the camera is configured to send the acquired first video to the target recognition device in real time.
11. A computer-readable storage medium comprising computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the method of any one of claims 1-7.
CN202110600131.3A (priority and filing date 2021-05-31) — Target identification method, device and system — CN115424157A, Pending

Priority Applications (1)

Application Number: CN202110600131.3A · Priority Date: 2021-05-31 · Filing Date: 2021-05-31 · Title: Target identification method, device and system


Publications (1)

Publication Number: CN115424157A · Publication Date: 2022-12-02

Family

ID=84195504

Family Applications (1)

Application Number: CN202110600131.3A · Publication: CN115424157A · Priority/Filing Date: 2021-05-31 · Status: Pending · Title: Target identification method, device and system

Country Status (1)

Country Link
CN (1) CN115424157A (en)

Similar Documents

Publication Publication Date Title
US11321583B2 (en) Image annotating method and electronic device
CN108632530B (en) Data processing method, device and equipment for vehicle damage assessment, client and electronic equipment
CN108846365B (en) Detection method and device for fighting behavior in video, storage medium and processor
CN109284729B (en) Method, device and medium for acquiring face recognition model training data based on video
CN109727275B (en) Object detection method, device, system and computer readable storage medium
CN108012202B (en) Video concentration method, device, computer readable storage medium and computer device
CN109754009B (en) Article identification method, article identification device, vending system and storage medium
CN111581423B (en) Target retrieval method and device
US20140099028A1 (en) System and method for video recognition based on visual image matching
CN116188821B (en) Copyright detection method, system, electronic device and storage medium
CN108563651B (en) Multi-video target searching method, device and equipment
CN112381104A (en) Image identification method and device, computer equipment and storage medium
CN112150514A (en) Pedestrian trajectory tracking method, device and equipment of video and storage medium
CN113011403B (en) Gesture recognition method, system, medium and device
CN111429476B (en) Method and device for determining action track of target person
CN112307864A (en) Method and device for determining target object and man-machine interaction system
CN113627402A (en) Image identification method and related device
CN112149690A (en) Tracing method and tracing system based on biological image feature recognition
US20180336243A1 (en) Image Search Method, Apparatus and Storage Medium
CN113470013A (en) Method and device for detecting moved article
CN111753766A (en) Image processing method, device, equipment and medium
CN115424157A (en) Target identification method, device and system
CN106527714B (en) Image recognition system and method based on virtual reality
CN112183284B (en) Safety information verification and designated driving order receiving control method and device
CN115131826A (en) Article detection and identification method, and network model training method and device

Legal Events

Date Code Title Description
PB01 — Publication
SE01 — Entry into force of request for substantive examination