CN113435359A - Image recognition method - Google Patents

Info

Publication number: CN113435359A
Application number: CN202110740438.3A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 李思雨, 刘金立, 林成友, 赵晨翔
Current assignee: Individual (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original assignee: Individual
Legal status: Withdrawn (the legal status is an assumption and is not a legal conclusion)
Prior art keywords: target, monitoring, monitoring video, frame, frames
Events: application filed by Individual; priority to CN202110740438.3A; publication of CN113435359A; withdrawn

Landscapes

  • Closed-Circuit Television Systems (AREA)

Abstract

The application provides an image recognition method, and relates to the technical field of image processing. In the application, firstly, a target monitoring video sent by target monitoring equipment is obtained; secondly, the multiple frames of monitoring video frames included in the target monitoring video are screened to obtain multiple frames of target monitoring video frames; and then, the target monitoring video frames are recognized to obtain a corresponding recognition result, wherein the recognition result comprises whether the behavior of the target monitoring object meets a preset monitoring condition. Based on the method, the problem of high computing resource consumption in the existing image processing technology can be solved.

Description

Image recognition method
Technical Field
The application relates to the technical field of image processing, and in particular to an image recognition method.
Background
Image recognition is the basis of many fields, such as monitoring; for example, monitoring measures can be determined based on image recognition results. However, the inventors have found that conventional image recognition technology consumes a large amount of computing resources.
Disclosure of Invention
In view of the above, an object of the present application is to provide an image recognition method to solve the problem of high computational resource consumption in the existing image processing technology.
In order to achieve the above purpose, the embodiment of the present application adopts the following technical solutions:
an image recognition method is applied to an image recognition device, the image recognition device is in communication connection with a target monitoring device, and the image recognition method comprises the following steps:
acquiring a target monitoring video sent by the target monitoring equipment, wherein the target monitoring video is obtained by shooting a target monitoring object based on the target monitoring equipment, and the target monitoring video comprises a plurality of frames of monitoring video frames which have a sequential relation in time;
screening multiple frames of monitoring video frames included in the target monitoring video to obtain multiple frames of target monitoring video frames, wherein the target monitoring video frames belong to the target monitoring video;
and identifying the target monitoring video frame to obtain a corresponding identification result, wherein the identification result comprises whether the behavior of the target monitoring object meets a preset monitoring condition.
In a possible embodiment, in the above image recognition method, the step of obtaining the target monitoring video sent by the target monitoring device includes:
acquiring a monitoring video packet currently sent by the target monitoring equipment, wherein the monitoring video packet is formed by the target monitoring equipment packaging the monitoring video frames obtained by shooting a target monitoring object;
analyzing the monitoring video packet to obtain the number of monitoring video frames included in the monitoring video packet, so as to obtain a first frame number;
judging whether the first frame number is larger than a predetermined target frame number threshold value, wherein the target frame number threshold value is generated based on the configuration operation of the image identification equipment responding to the corresponding user;
and if the first frame number is greater than the target frame number threshold, determining the monitoring video packet as a target monitoring video comprising multiple monitoring video frames.
In a possible embodiment, in the image recognition method, the step of obtaining the target monitoring video sent by the target monitoring device further includes:
if the first frame number is less than or equal to the target frame number threshold, executing a waiting operation;
determining a waiting time length for executing waiting operation, and judging whether the waiting time length is greater than a predetermined target time length threshold value, wherein the target time length threshold value is generated based on the configuration operation of the image recognition equipment responding to the corresponding user;
and if the waiting time length is greater than the target time length threshold value, determining the surveillance video packet as a target surveillance video comprising a plurality of surveillance video frames.
In a possible embodiment, in the image recognition method, the step of obtaining the target monitoring video sent by the target monitoring device further includes:
if the waiting time length is less than or equal to the target time length threshold value, continuing to execute the waiting operation until the current waiting time length is greater than the target time length threshold value, or the sum of the number of the monitoring video frames included in the monitoring video packet and the number of the monitoring video frames included in the new monitoring video packet sent by the target monitoring equipment is greater than the target frame number threshold value;
when the waiting operation is continuously executed until the current waiting time length is greater than the target time length threshold value, determining the surveillance video packet as a target surveillance video comprising a plurality of frames of surveillance video frames;
and when the sum of the number of the surveillance video frames included by the surveillance video packet and the number of the surveillance video frames included by the new surveillance video packet sent by the target surveillance device is greater than the target frame number threshold, determining the surveillance video packet and the new surveillance video packet as a target surveillance video including multiple frames of surveillance video frames.
In a possible embodiment, in the image identification method, the step of performing a screening process on multiple frames of surveillance video frames included in the target surveillance video to obtain multiple frames of target surveillance video frames includes:
calculating the similarity between each monitoring video frame and other monitoring video frames of each frame aiming at each monitoring video frame included in the target monitoring video to obtain corresponding similarity information;
for each monitoring video frame, determining whether the monitoring video frame has an association relation with each other monitoring video frame based on the similarity information between the monitoring video frame and each other monitoring video frame;
and screening the multiple frames of monitoring video frames included in the target monitoring video based on the association relation to obtain the multiple frames of target monitoring video frames.
In a possible embodiment, in the above image recognition method, the step of calculating, for each frame of surveillance video included in the target surveillance video, a similarity between the frame of surveillance video and each of other frames of surveillance video to obtain corresponding similarity information includes:
respectively calculating the pixel mean value of each frame of monitoring video frame included in the target monitoring video;
and calculating the similarity of the pixel mean value between the monitoring video frame and each other monitoring video frame aiming at each monitoring video frame included in the target monitoring video to obtain the similarity information between the monitoring video frame and each other monitoring video frame.
In a possible embodiment, in the above image recognition method, the step of determining, for each of the surveillance video frames, whether there is an association relationship between the surveillance video frame and each of the other surveillance video frames based on similarity information between the surveillance video frame and each of the other surveillance video frames includes:
aiming at each monitoring video frame, executing target operation based on the similarity information between the monitoring video frame and each other monitoring video frame to determine whether the monitoring video frame and each other monitoring video frame have an association relation;
wherein the target operation comprises:
determining other monitoring video frames of which the similarity information belongs to the target similarity interval in the other monitoring video frames of the plurality of frames as first other monitoring video frames;
and screening the first other monitoring video frames of the plurality of frames based on a preset screening rule to determine whether the monitoring video frames and each other monitoring video frame have an association relationship.
In a possible embodiment, in the image identification method, the step of obtaining multiple target surveillance video frames by performing screening processing on multiple surveillance video frames included in the target surveillance video based on the association relationship includes:
determining the number of other monitoring video frames having an association relation with each monitoring video frame to obtain a corresponding first number aiming at each monitoring video frame included in the target monitoring video;
and determining whether to reserve the surveillance video frame as a target surveillance video frame based on the first number corresponding to the surveillance video frame for each frame of the surveillance video frame.
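A minimal sketch of this retention step, assuming a simple threshold on the first number (the claim leaves the exact decision rule open, so both the function name and the threshold rule are illustrative assumptions):

```python
def screen_by_association(associations, min_associations):
    """Keep a frame as a target frame when its 'first number' (the count of
    other frames it is associated with) reaches min_associations.
    associations: {frame_index: set of associated frame indices}."""
    return [f for f, linked in associations.items()
            if len(linked) >= min_associations]
```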
In a possible embodiment, in the image recognition method, the step of performing recognition processing on the target surveillance video frame to obtain a corresponding recognition result includes:
obtaining a pre-trained action recognition model, wherein the action recognition model is obtained based on neural network training;
and identifying the target monitoring video frame based on the action recognition model to obtain a corresponding recognition result, wherein the recognition result comprises whether the behavior of the target monitoring object meets a preset monitoring condition.
In a possible embodiment, in the image recognition method, the step of performing recognition processing on the target surveillance video frame based on the motion recognition model to obtain a corresponding recognition result includes:
identifying the target monitoring video frame based on the action identification model to obtain an identification result of whether corresponding target action characteristic information is matched with action characteristic reference information;
if the target action characteristic information is matched with the action characteristic reference information, determining that the behavior of the target monitoring object meets the preset monitoring condition, and if the target action characteristic information is not matched with the action characteristic reference information, determining that the behavior of the target monitoring object does not meet the preset monitoring condition.
According to the image identification method, after the target monitoring video is obtained, screening processing is carried out first, and then identification processing is carried out, so that the number of frames of the monitoring video frames needing to be processed during identification processing can be reduced to a certain extent, less computing resources are consumed, and the problem that computing resources in the existing image processing technology are consumed greatly is solved.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
Fig. 1 is a block diagram of an image recognition apparatus according to an embodiment of the present application.
Fig. 2 is a schematic flowchart of an image recognition method according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As shown in fig. 1, an embodiment of the present application provides an image recognition apparatus. Wherein the image recognition device may include a memory and a processor.
In detail, the memory and the processor are electrically connected directly or indirectly to realize data transmission or interaction. For example, they may be electrically connected to each other via one or more communication buses or signal lines. The memory can have stored therein at least one software function (computer program) which can be present in the form of software or firmware. The processor may be configured to execute the executable computer program stored in the memory, so as to implement the image recognition method provided by the embodiment of the present application (as described later).
Alternatively, the Memory may be, but is not limited to, a Random Access Memory (RAM), a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Read-Only Memory (EPROM), an electrically Erasable Read-Only Memory (EEPROM), and the like. The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), a System on Chip (SoC), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components.
Also, the structure shown in fig. 1 is only an illustration, and the image recognition device may further include more or less components than those shown in fig. 1, or have a different configuration from that shown in fig. 1, for example, may include a communication unit for information interaction with other devices (such as an object monitoring device).
Wherein, in an alternative example, the image recognition device may be a server with data processing capability.
With reference to fig. 2, an embodiment of the present application further provides an image recognition method, which is applicable to the image recognition apparatus. Wherein, the method steps defined by the flow related to the image recognition method can be realized by the image recognition device. The specific process shown in FIG. 2 will be described in detail below.
Step S110, obtaining a target monitoring video sent by the target monitoring device.
In this embodiment, the image recognition device may first acquire the target monitoring video sent by the target monitoring device. The target monitoring video is obtained by the target monitoring device (an image acquisition device such as a camera) shooting the target monitoring object, and includes multiple frames of monitoring video that are sequential in time.
Step S120, the multi-frame monitoring video frames included in the target monitoring video are screened to obtain multi-frame target monitoring video frames.
In this embodiment, after the target surveillance video is acquired based on step S110, the image recognition device may perform screening processing on multiple surveillance video frames included in the target surveillance video to obtain multiple target surveillance video frames.
Wherein the target surveillance video frame belongs to the target surveillance video. That is, a portion of the surveillance video frames in the target surveillance video may be screened out as target surveillance video frames.
Step S130, the target monitoring video frame is identified to obtain a corresponding identification result.
In this embodiment, after the multiple frames of target surveillance video frames are acquired based on step S120, the image recognition device may perform recognition processing on the target surveillance video frames, so that corresponding recognition results may be obtained. And the identification result comprises whether the behavior of the target monitoring object meets a preset monitoring condition.
Based on the method, the target monitoring video is obtained, then the screening processing is carried out, and then the identification processing is carried out, so that the number of frames of the monitoring video frames needing to be processed during the identification processing can be reduced to a certain extent, less computing resources are consumed, and the problem of higher computing resource consumption in the existing image processing technology is solved.
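The three steps above can be sketched as a minimal pipeline. The function names and the placeholder screening/recognition bodies are illustrative assumptions; the real internals are described in the following sections:

```python
def screen_frames(frames):
    # Placeholder screening (step S120): keep every other frame; the
    # similarity-based screening of the application is elaborated below.
    return frames[::2]

def recognize(frames, meets_condition):
    # Placeholder recognition (step S130): meets_condition is a predicate
    # standing in for the action recognition model.
    return any(meets_condition(f) for f in frames)

def image_recognition(target_video, meets_condition):
    """Acquire (S110) -> screen (S120) -> recognize (S130)."""
    screened = screen_frames(target_video)
    return recognize(screened, meets_condition)
```

Because screening runs before recognition, the recognition step only ever sees the reduced frame set, which is where the claimed saving in computing resources comes from.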
It is understood that, in an alternative example, when step S110 is executed, the target surveillance video may be obtained based on the following steps:
first, a surveillance video packet currently sent by the target surveillance device is obtained, where the target surveillance device packages the surveillance video packet after shooting a target surveillance object to obtain a surveillance video frame (in an alternative example, the surveillance video packet may be encrypted when being sent);
secondly, analyzing the monitoring video packet to obtain the number of monitoring video frames included in the monitoring video packet, so as to obtain a first frame number, such as 10, 20 or 30;
then, judging whether the first frame number is larger than a predetermined target frame number threshold, wherein the target frame number threshold can be generated based on the configuration operation of the image recognition device responding to the corresponding user according to the actual application scene;
and finally, if the first frame number is greater than the target frame number threshold, determining the surveillance video packet as a target surveillance video comprising multiple surveillance video frames.
It is to be understood that, on the basis of the above example, in an alternative example, when the step S110 is executed, the target monitoring video may also be obtained based on the following steps:
firstly, if the first frame number is less than or equal to the target frame number threshold, executing a waiting operation (for example, starting a timer);
secondly, determining a waiting time length for executing the waiting operation, and judging whether the waiting time length is greater than a predetermined target time length threshold value, wherein the target time length threshold value can be generated based on the configuration operation of the image recognition equipment responding to the corresponding user according to the actual application scene;
and if the waiting time length is greater than the target time length threshold value, determining the surveillance video packet as a target surveillance video comprising a plurality of surveillance video frames.
It is to be understood that, on the basis of the above example, in an alternative example, when the step S110 is executed, the target monitoring video may also be obtained based on the following steps:
if the waiting time length is less than or equal to the target time length threshold value, continuing to execute the waiting operation until the current waiting time length is greater than the target time length threshold value, or the sum of the number of the monitoring video frames included in the monitoring video packet and the number of the monitoring video frames included in the new monitoring video packet sent by the target monitoring equipment is greater than the target frame number threshold value;
when the waiting operation is continuously executed until the current waiting time length is greater than the target time length threshold value, determining the surveillance video packet as a target surveillance video comprising a plurality of frames of surveillance video frames;
and when the sum of the number of the surveillance video frames included in the surveillance video packet and the number of the surveillance video frames included in the new surveillance video packet sent by the target surveillance device is greater than the target frame number threshold, determining the surveillance video packet and the new surveillance video packet as a target surveillance video including multiple frames of surveillance video frames.
Based on this, through the setting of the waiting time threshold and the target frame number threshold, on one hand, the problem that the timeliness of the monitoring operation performed based on the identification result is poor due to the overlong waiting time can be avoided, and on the other hand, the problem that the identification result is inaccurate due to the fact that the number of the monitoring video frames used for identification processing is small can be avoided.
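The packet-accumulation logic above can be sketched as follows. `recv_packet` is a hypothetical interface returning a list of frames or `None`; accumulating across more than one new packet is an assumption where the text only mentions a single new packet:

```python
import time

def collect_target_video(recv_packet, frame_threshold, wait_threshold_s,
                         poll_s=0.01):
    """Accumulate surveillance packets into a target video: stop once the
    total frame count exceeds frame_threshold, or once the waiting time
    exceeds wait_threshold_s (whichever comes first)."""
    frames = list(recv_packet() or [])
    if len(frames) > frame_threshold:
        return frames                      # first packet already suffices
    start = time.monotonic()               # begin the waiting operation
    while True:
        if time.monotonic() - start > wait_threshold_s:
            return frames                  # timeout: use what we have
        new_packet = recv_packet()
        if new_packet:
            frames = frames + list(new_packet)
            if len(frames) > frame_threshold:
                return frames              # merged packets form the video
        time.sleep(poll_s)
```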
It is understood that, in an alternative example, when step S120 is executed, the multi-frame target surveillance video frame may be obtained through the screening process based on the following steps:
firstly, calculating the similarity between each monitoring video frame and other monitoring video frames of each frame aiming at each monitoring video frame included in the target monitoring video to obtain corresponding similarity information;
secondly, determining whether the monitoring video frame and each other monitoring video frame have an association relation or not based on the similarity information between the monitoring video frame and each other monitoring video frame aiming at each monitoring video frame;
and then, screening the multiple frames of monitoring video frames included in the target monitoring video based on the association relations to obtain the multiple frames of target monitoring video frames.
Based on this, because the association relations among the monitoring video frames are considered during screening, the screened target monitoring video frames used for recognition better reflect the obtained target monitoring video, thereby ensuring the accuracy of the recognition result.
It will be appreciated that in an alternative example, the similarity information between the surveillance video frames may be derived based on the following steps:
firstly, respectively calculating the pixel mean value of each frame of monitoring video frame included in the target monitoring video (namely calculating the mean value of the pixel values of all pixel points);
then, for each frame of the surveillance video frame included in the target surveillance video, the similarity between the surveillance video frame and each other frame of the surveillance video frame with respect to the pixel mean is calculated, and the similarity information between the surveillance video frame and each other frame of the surveillance video frame is obtained (for example, the difference between the pixel mean of two frames of the surveillance video frame and the ratio of 255 may be used as the similarity information).
Based on the method, the step of calculating the similarity information can be simplified, the consumption of computing resources is reduced, and the similarity information obtained by the method has higher accuracy due to the fact that the scene is fixed and the similarity relation between the monitoring video frames can be better reflected through the pixel mean value.
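A sketch of this pixel-mean similarity, assuming grayscale frames given as 2D lists. The text proposes the ratio of the mean difference to 255; it is inverted here so that 1.0 means identical means, which is one possible convention:

```python
def pixel_mean(frame):
    # Mean of the pixel values of all pixel points in one frame
    # (frame: 2D list of grayscale values in [0, 255]).
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)

def mean_similarity(frame_a, frame_b):
    """Similarity information from pixel means: 1 - |mean_a - mean_b| / 255,
    so 1.0 means identical means and 0.0 means maximally different."""
    return 1.0 - abs(pixel_mean(frame_a) - pixel_mean(frame_b)) / 255.0
```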
It will be appreciated that in an alternative example, whether the surveillance video frame has an association with each of the other surveillance video frames may be determined based on the following steps:
aiming at each monitoring video frame, executing target operation based on the similarity information between the monitoring video frame and each other monitoring video frame to determine whether the monitoring video frame and each other monitoring video frame have an association relation;
wherein the target operation may include:
firstly, determining other surveillance video frames with similarity information belonging to a target similarity interval in the other surveillance video frames as first other surveillance video frames, where the target similarity interval may be generated based on a configuration operation performed by the image recognition device in response to a corresponding user according to an actual application scene, and it may be understood that the target similarity interval may include a maximum value calculated based on the foregoing calculation method, such as 1;
secondly, screening the plurality of frames of the first other surveillance video frame based on a preset screening rule to determine whether the surveillance video frame and each frame of the other surveillance video frame have an association relationship, so that after the first screening is performed based on the target similarity interval, further screening can be performed based on the screening rule to ensure that the determined association relationship has higher reliability.
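The first part of the target operation — keeping only the other frames whose similarity information falls in the target similarity interval — can be sketched as follows (treating the interval bounds as inclusive is an assumption):

```python
def first_other_frames(similarities, target_interval):
    """similarities: {other_frame_index: similarity_info};
    returns the indices whose similarity falls in the target interval,
    i.e. the 'first other surveillance video frames'."""
    low, high = target_interval
    return [i for i, s in similarities.items() if low <= s <= high]
```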
It is understood that, in an alternative example, whether the surveillance video frame has an association relationship with other surveillance video frames may be determined based on the screening rule based on the following steps:
first, for multiple frames of the first other surveillance video frames, respectively determining the number of first other surveillance video frames in each of the similarity subintervals of the target similarity interval to which the similarity information belongs, to obtain a first video frame number, where the interval widths of each of the similarity subintervals are the same, and the sum of the interval widths of each of the similarity subintervals is equal to the interval width of the target similarity interval, for example, when determining the similarity subintervals, the image recognition apparatus may first generate an interval number in response to a configuration operation performed by a corresponding user, and then divide the target similarity interval into multiple similarity subintervals based on the interval number;
secondly, determining a video frame number variation value of a first video frame number of a first other monitoring video frame of which the similarity information belongs to each similarity subinterval relative to a first video frame number of a first other monitoring video frame of which the similarity information belongs to a previous similarity subinterval from a second similarity subinterval in the target similarity interval to obtain a video frame number variation sequence (for example, the first video frame number corresponding to the second similarity subinterval subtracts the first video frame number corresponding to the first similarity subinterval to obtain the first video frame number variation value in the video frame number variation sequence);
thirdly, based on the order of the video frame number change values in the video frame number change sequence, performing the following judgment on each change value in turn until a change value smaller than a video frame number change threshold is found: judging whether the current video frame number change value is smaller than the video frame number change threshold, and recording the number of judgments performed so far; the number of judgments made when the search stops is taken as the target judgment times, where the video frame number change threshold may be generated based on a configuration operation performed by the image recognition device in response to the corresponding user;
fourthly, for each of the first target-judgment-times similarity subintervals, determining the number of second other monitoring video frames in the subinterval to obtain a corresponding second video frame number, where a second other monitoring video frame is one whose pixel mean value belongs to a predetermined target pixel value interval, and the first target-judgment-times similarity subintervals are those corresponding to the first target-judgment-times video frame number change values in the video frame number change sequence; the target pixel value interval may be determined based on the pixel mean value of the monitoring video frame, for example, its lower limit may be the product of that pixel mean value and a coefficient smaller than 1 and its upper limit the product of that pixel mean value and a coefficient larger than 1, where both coefficients may be generated based on a configuration operation performed by the image recognition device in response to the corresponding user;
fifthly, determining second other monitoring video frames in the similarity subinterval corresponding to each second video frame number larger than a preset frame number threshold as third other monitoring video frames, wherein the preset frame number threshold can be generated based on configuration operation of the image recognition device responding to a corresponding user, such as 10 frames or 20 frames;
sixthly, for each of the similarity subintervals (in an alternative example, only the similarity subintervals with the third other surveillance video frames may be targeted, so that the calculation amount may be reduced), determining the number of the third other surveillance video frames belonging to the similarity subintervals to obtain third video frame numbers, and determining whether each of the third video frame numbers is greater than a preset number threshold, where the preset number threshold may be generated based on a configuration operation performed by the image recognition device in response to a corresponding user, and the preset number threshold may be greater than the preset frame number threshold;
seventhly, for each third video frame number greater than or equal to the preset number threshold, establishing a correspondence between a predetermined first weight coefficient and that third video frame number;
eighthly, for each third video frame number smaller than the preset number threshold, establishing a correspondence between a predetermined second weight coefficient and that third video frame number, wherein the second weight coefficient is smaller than the first weight coefficient;
ninthly, for each of the similarity subintervals (again, in an alternative example, only the subintervals containing third other monitoring video frames need be considered, further reducing the amount of calculation), fusing (e.g., multiplying) the similarity information between each third other monitoring video frame in the subinterval and the monitoring video frame with the first or second weight coefficient corresponding to that subinterval's third video frame number, to obtain a fused weight coefficient for that third other monitoring video frame;
and tenthly, sorting the third other monitoring video frames by their fused weight coefficients to obtain a third other monitoring video frame sequence, taking the first several frames of that sequence (the specific number may be generated by the image recognition device in response to a configuration operation performed by a corresponding user; a larger number yields a more accurate recognition result, while a smaller number consumes fewer computing resources during recognition), and determining the taken frames as the other monitoring video frames having an association relationship with the monitoring video frame.
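The weighting-and-sorting portion of the steps above (steps seven through ten) can be sketched as follows. The function name, subinterval contents, coefficients, and thresholds here are illustrative assumptions, not values specified by this application:

```python
# Sketch of steps seven to ten: assign a first or second weight coefficient to each
# similarity subinterval based on how many frames it holds, fuse (multiply) that
# coefficient with each frame's similarity, then keep the top-N frames overall.

def select_associated_frames(subintervals, first_w, second_w, count_threshold, top_n):
    """subintervals: list of lists of (frame_id, similarity) pairs,
    one inner list per similarity subinterval."""
    scored = []
    for frames in subintervals:
        # the "third video frame number" is the count of frames in this subinterval
        weight = first_w if len(frames) >= count_threshold else second_w
        for frame_id, similarity in frames:
            scored.append((frame_id, similarity * weight))  # fused weight coefficient
    # sort by fused weight coefficient, descending, and keep the first top_n frames
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [frame_id for frame_id, _ in scored[:top_n]]

picked = select_associated_frames(
    subintervals=[[(1, 0.9), (2, 0.8), (3, 0.7)], [(4, 0.95)]],
    first_w=1.0, second_w=0.5, count_threshold=2, top_n=2)
print(picked)  # frames 1 and 2 score highest after weighting
```

Note that frame 4 has the highest raw similarity (0.95) but lands in a sparse subinterval, so its fused coefficient (0.475) drops it out of the top two; this is exactly the effect the second, smaller weight coefficient is meant to have.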
It is understood that, in an alternative example, the screening processing may be performed on the multiple frames of monitoring video frames included in the target monitoring video based on the following steps:
firstly, aiming at each frame of monitoring video frame included in the target monitoring video, determining the number of other monitoring video frames having an association relation with the monitoring video frame to obtain a corresponding first number;
secondly, for each monitoring video frame, determining, based on the corresponding first number, whether to retain the monitoring video frame as a target monitoring video frame. For example, in an alternative example, if the first number corresponding to a monitoring video frame is comparatively large, such as larger than the average of the first numbers, that frame may be discarded; if it is comparatively small, such as smaller than or equal to the average, that frame may be retained as a target monitoring video frame.
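A minimal sketch of this screening rule follows; the frame identifiers and association counts are made up for illustration:

```python
# Keep a frame as a target frame when its association count ("first number")
# is at most the average count across all frames; discard above-average frames.

def screen_frames(first_numbers):
    """first_numbers: dict mapping frame_id -> number of associated frames."""
    avg = sum(first_numbers.values()) / len(first_numbers)
    # discard frames with an above-average association count, retain the rest
    return [fid for fid, n in first_numbers.items() if n <= avg]

kept = screen_frames({"f1": 10, "f2": 2, "f3": 3, "f4": 1})
print(kept)  # f1 is discarded: 10 exceeds the average of 4
```

The intuition matches the application's goal: a frame associated with many near-duplicates carries little extra information, so dropping it reduces the recognition workload.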
It is understood that, in an alternative example, when step S130 is executed, the target surveillance video frame may be subjected to the identification process based on the following steps to obtain a corresponding identification result:
firstly, obtaining a pre-trained action recognition model, wherein the action recognition model is obtained based on neural network training; the specific training mode is not limited here, and reference may be made to existing neural network training methods;
secondly, identifying the target monitoring video frame based on the action identification model to obtain a corresponding identification result, wherein the identification result comprises whether the behavior action of the target monitoring object meets a preset monitoring condition.
It is understood that, in an alternative example, the target monitoring video frame may be identified based on the action recognition model through the following steps:
the target monitoring video frame is identified based on the action recognition model to obtain a recognition result indicating whether corresponding target action characteristic information matches action characteristic reference information; for example, the action recognition model may be trained on positive sample video frames having the action characteristic reference information and negative sample video frames lacking it;
if the target action characteristic information matches the action characteristic reference information, it is determined that the behavior of the target monitoring object satisfies the preset monitoring condition; if it does not match, it is determined that the behavior does not satisfy the condition. Alternatively, the mapping may be inverted: a match means the condition is not satisfied, and a mismatch means it is satisfied. The specific configuration may be selected according to the actual application scenario; for example, if the action characteristic reference information describes a dangerous action, then a match between the target action characteristic information and the reference information means the behavior of the target monitoring object does not satisfy the preset monitoring condition.
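This configurable polarity can be made explicit with a single flag. The function and parameter names below are assumptions introduced for illustration, not terms from this application:

```python
# Map a match/no-match result to "monitoring condition satisfied", with the
# polarity controlled by whether the reference describes a dangerous action.

def condition_satisfied(matched, reference_is_dangerous=False):
    # with a dangerous-action reference, a match violates the monitoring condition
    return not matched if reference_is_dangerous else matched

print(condition_satisfied(True))                               # True: normal reference
print(condition_satisfied(True, reference_is_dangerous=True))  # False: danger matched
```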
It will be appreciated that, in an alternative example, the target action characteristic information may be a continuous dynamic action characteristic, such as jumping; correspondingly, the action characteristic reference information may also be a continuous dynamic action characteristic, such as jumping. The target action characteristic information may also be a static action characteristic, such as lying on the ground; correspondingly, the action characteristic reference information may also be a static action characteristic, such as lying on the ground.
It is understood that, when the target action characteristic information is a static action characteristic, it may be considered matched with the action characteristic reference information as soon as any one frame of the target monitoring video frames has the target action characteristic information; alternatively, a match may be declared only when all of the target monitoring video frames have the target action characteristic information.
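The two matching policies for a static action characteristic reduce to Python's built-in `any`/`all` semantics; the per-frame boolean flags below stand in for a real per-frame detector:

```python
# "any": matched if a single target frame shows the static feature;
# "all": matched only if every target frame shows it.

def matches_reference(frame_has_feature, policy="any"):
    """frame_has_feature: list of booleans, one per target monitoring video frame."""
    return any(frame_has_feature) if policy == "any" else all(frame_has_feature)

flags = [False, True, False]  # e.g. only the middle frame shows "lying on the ground"
print(matches_reference(flags, "any"))  # one frame suffices under the first policy
print(matches_reference(flags, "all"))  # fails under the stricter policy
```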
In summary, according to the image recognition method provided by the application, after the target monitoring video is obtained, screening processing is performed first and recognition processing afterwards. This reduces, to a certain extent, the number of monitoring video frames to be processed during recognition, so that fewer computing resources are consumed, thereby improving on the high computing-resource consumption of existing image processing technology.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus and method embodiments described above are illustrative only, as the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program codes. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. An image recognition method is applied to an image recognition device, the image recognition device is in communication connection with a target monitoring device, and the image recognition method comprises the following steps:
acquiring a target monitoring video sent by the target monitoring equipment, wherein the target monitoring video is obtained by shooting a target monitoring object based on the target monitoring equipment, and the target monitoring video comprises a plurality of frames of monitoring video frames which have a sequential relation in time;
screening multiple frames of monitoring video frames included in the target monitoring video to obtain multiple frames of target monitoring video frames, wherein the target monitoring video frames belong to the target monitoring video;
and identifying the target monitoring video frame to obtain a corresponding identification result, wherein the identification result comprises whether the behavior of the target monitoring object meets a preset monitoring condition.
2. The image recognition method according to claim 1, wherein the step of obtaining the target monitoring video sent by the target monitoring device comprises:
acquiring a monitoring video packet currently sent by the target monitoring device, wherein the target monitoring device packages monitoring video frames, obtained by shooting a target monitoring object, to form the monitoring video packet;
analyzing the monitoring video packet to obtain the number of frames of monitoring video frames included in the monitoring video packet, so as to obtain a first frame number;
judging whether the first frame number is larger than a predetermined target frame number threshold, wherein the target frame number threshold is generated by the image recognition device in response to a configuration operation performed by a corresponding user;
and if the first frame number is greater than the target frame number threshold, determining the monitoring video packet as a target monitoring video comprising multiple monitoring video frames.
3. The image recognition method according to claim 2, wherein the step of obtaining the target monitoring video sent by the target monitoring device further comprises:
if the first frame number is less than or equal to the target frame number threshold, executing a waiting operation;
determining a waiting time length for executing the waiting operation, and judging whether the waiting time length is greater than a predetermined target time length threshold, wherein the target time length threshold is generated by the image recognition device in response to a configuration operation performed by a corresponding user;
and if the waiting time length is greater than the target time length threshold value, determining the surveillance video packet as a target surveillance video comprising a plurality of surveillance video frames.
4. The image recognition method according to claim 3, wherein the step of obtaining the target monitoring video sent by the target monitoring device further comprises:
if the waiting time length is less than or equal to the target time length threshold value, continuing to execute the waiting operation until the current waiting time length is greater than the target time length threshold value, or the sum of the number of the monitoring video frames included in the monitoring video packet and the number of the monitoring video frames included in the new monitoring video packet sent by the target monitoring equipment is greater than the target frame number threshold value;
when the waiting operation is continuously executed until the current waiting time length is greater than the target time length threshold value, determining the surveillance video packet as a target surveillance video comprising a plurality of frames of surveillance video frames;
and when the sum of the number of the surveillance video frames included by the surveillance video packet and the number of the surveillance video frames included by the new surveillance video packet sent by the target surveillance device is greater than the target frame number threshold, determining the surveillance video packet and the new surveillance video packet as a target surveillance video including multiple frames of surveillance video frames.
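The packet-handling flow of claims 2 to 4 can be sketched as a single decision function. The function name, return strings, and numeric values below are illustrative assumptions, not part of the claimed method:

```python
# Accept a packet as the target video once its frame count exceeds the threshold;
# otherwise wait, and either time out (use the packet as-is) or merge it with a
# newly arrived packet whose frames push the total over the threshold.

def resolve_target_video(packet_frames, frame_threshold,
                         waited_seconds, wait_threshold, new_packet_frames=0):
    if packet_frames > frame_threshold:
        return "packet alone"                     # claim 2: enough frames already
    if waited_seconds > wait_threshold:
        return "packet alone (timeout)"           # claim 3: waited long enough
    if packet_frames + new_packet_frames > frame_threshold:
        return "packet + new packet"              # claim 4: merge with new packet
    return "keep waiting"                         # claim 4: neither condition met yet

print(resolve_target_video(30, 25, 0, 5))        # enough frames already
print(resolve_target_video(10, 25, 6, 5))        # timed out while waiting
print(resolve_target_video(10, 25, 2, 5, 20))    # merged with a new packet
```

The ordering of the checks mirrors the claims: the frame-count test is primary, the timeout is the fallback, and merging applies only while the wait is still within bounds.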
5. The image recognition method according to claim 1, wherein the step of performing a screening process on multiple frames of surveillance video frames included in the target surveillance video to obtain multiple frames of target surveillance video frames comprises:
for each frame of monitoring video frame included in the target monitoring video, calculating the similarity between that monitoring video frame and each other monitoring video frame, to obtain corresponding similarity information;
for each monitoring video frame, determining whether the monitoring video frame has an association relation with each other monitoring video frame based on the similarity information between the monitoring video frame and each other monitoring video frame;
and screening the multi-frame monitoring video frames included in the target monitoring video based on the incidence relation to obtain the multi-frame target monitoring video frames.
6. The image recognition method according to claim 5, wherein the step of calculating, for each frame of the surveillance video frame included in the target surveillance video, a similarity between the surveillance video frame and each of the other surveillance video frames to obtain corresponding similarity information includes:
respectively calculating the pixel mean value of each frame of monitoring video frame included in the target monitoring video;
and calculating the similarity of the pixel mean value between the monitoring video frame and each other monitoring video frame aiming at each monitoring video frame included in the target monitoring video to obtain the similarity information between the monitoring video frame and each other monitoring video frame.
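Claim 6 can be sketched as follows. The similarity function used here (one minus the absolute difference of the means, scaled into [0, 1]) is one plausible choice and is an assumption; the claim does not fix a particular formula:

```python
# Compute a per-frame pixel mean, then score similarity between two frames
# from how close their pixel means are.

def pixel_mean(frame):
    """frame: 2D list of grayscale pixel values."""
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)

def mean_similarity(frame_a, frame_b, scale=255.0):
    diff = abs(pixel_mean(frame_a) - pixel_mean(frame_b))
    return 1.0 - diff / scale  # identical means -> 1.0, maximal difference -> 0.0

a = [[10, 20], [30, 40]]   # pixel mean 25
b = [[12, 22], [32, 42]]   # pixel mean 27
print(round(mean_similarity(a, b), 4))
```

Comparing scalar means is far cheaper than a pixel-wise comparison, which is consistent with the application's stated aim of reducing computing-resource consumption; the trade-off is that two visually different frames with equal brightness would score as identical.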
7. The image recognition method of claim 5, wherein the step of determining whether the monitored video frame has an association relationship with each other monitored video frame based on the similarity information between the monitored video frame and each other monitored video frame for each frame of the monitored video frames comprises:
aiming at each monitoring video frame, executing target operation based on the similarity information between the monitoring video frame and each other monitoring video frame to determine whether the monitoring video frame and each other monitoring video frame have an association relation;
wherein the target operation comprises:
determining other monitoring video frames of which the similarity information belongs to the target similarity interval in the other monitoring video frames of the plurality of frames as first other monitoring video frames;
and screening the first other monitoring video frames of the plurality of frames based on a preset screening rule to determine whether the monitoring video frames and each other monitoring video frame have an association relationship.
8. The image recognition method according to claim 5, wherein the step of obtaining multiple target surveillance video frames by screening multiple surveillance video frames included in the target surveillance video based on the association relationship comprises:
for each frame of monitoring video frame included in the target monitoring video, determining the number of other monitoring video frames having an association relationship with that monitoring video frame, to obtain a corresponding first number;
and determining whether to reserve the surveillance video frame as a target surveillance video frame based on the first number corresponding to the surveillance video frame for each frame of the surveillance video frame.
9. The image recognition method according to any one of claims 1 to 8, wherein the step of performing recognition processing on the target surveillance video frame to obtain a corresponding recognition result comprises:
obtaining a pre-trained action recognition model, wherein the action recognition model is obtained based on neural network training;
and identifying the target monitoring video frame based on the action identification model to obtain a corresponding identification result, wherein the identification result comprises whether the behavior action of the target monitoring object meets a preset monitoring condition.
10. The image recognition method according to claim 9, wherein the step of performing recognition processing on the target surveillance video frame based on the motion recognition model to obtain a corresponding recognition result comprises:
identifying the target monitoring video frame based on the action identification model to obtain an identification result of whether corresponding target action characteristic information is matched with action characteristic reference information;
if the target action characteristic information is matched with the action characteristic reference information, determining that the behavior of the target monitoring object meets the preset monitoring condition, and if the target action characteristic information is not matched with the action characteristic reference information, determining that the behavior of the target monitoring object does not meet the preset monitoring condition.
CN202110740438.3A 2021-06-30 2021-06-30 Image recognition method Withdrawn CN113435359A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110740438.3A CN113435359A (en) 2021-06-30 2021-06-30 Image recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110740438.3A CN113435359A (en) 2021-06-30 2021-06-30 Image recognition method

Publications (1)

Publication Number Publication Date
CN113435359A true CN113435359A (en) 2021-09-24

Family

ID=77758258

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110740438.3A Withdrawn CN113435359A (en) 2021-06-30 2021-06-30 Image recognition method

Country Status (1)

Country Link
CN (1) CN113435359A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114863350A (en) * 2022-07-06 2022-08-05 江苏开放大学(江苏城市职业学院) Forest monitoring method and system based on image recognition
CN115457432A (en) * 2022-08-25 2022-12-09 埃洛克航空科技(北京)有限公司 Data processing method and device for video frame extraction
CN115457432B (en) * 2022-08-25 2023-10-27 埃洛克航空科技(北京)有限公司 Data processing method and device for video frame extraction

Similar Documents

Publication Publication Date Title
CN110211119B (en) Image quality evaluation method and device, electronic equipment and readable storage medium
US6810145B2 (en) Process for detecting a change of shot in a succession of video images
CN111917740B (en) Abnormal flow alarm log detection method, device, equipment and medium
CN111898581B (en) Animal detection method, apparatus, electronic device, and readable storage medium
CN110263733B (en) Image processing method, nomination evaluation method and related device
CN113435359A (en) Image recognition method
CN107977638B (en) Video monitoring alarm method, device, computer equipment and storage medium
CN110475124B (en) Video jamming detection method and device
CN108923972B (en) Weight-reducing flow prompting method, device, server and storage medium
CN112689132B (en) Target object monitoring method and monitoring equipment
CN113553942A (en) Image processing method
CN111611944A (en) Identity recognition method and device, electronic equipment and storage medium
CN113569965A (en) User behavior analysis method and system based on Internet of things
CN111507268A (en) Alarm method and device, storage medium and electronic device
CN110414544B (en) Target state classification method, device and system
CN114139016A (en) Data processing method and system for intelligent cell
CN115620243B (en) Pollution source monitoring method and system based on artificial intelligence and cloud platform
CN113407750A (en) Image distributed storage method and system
CN113537087A (en) Intelligent traffic information processing method and device and server
EP4332910A1 (en) Behavior detection method, electronic device, and computer readable storage medium
CN113435360A (en) Object monitoring method and system based on image recognition
CN115375886A (en) Data acquisition method and system based on cloud computing service
CN113259213A (en) Intelligent home information monitoring method based on edge computing intelligent gateway
CN113672443A (en) User behavior analysis method and system based on chip detection
CN113505677A (en) Equipment control method and system based on image recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210924