CN116189028A - Image recognition method, device, electronic equipment and storage medium - Google Patents

Image recognition method, device, electronic equipment and storage medium

Info

Publication number
CN116189028A
CN116189028A (application CN202211533716.9A)
Authority
CN
China
Prior art keywords
target
feature
target object
processing
obtaining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211533716.9A
Other languages
Chinese (zh)
Other versions
CN116189028B (en)
Inventor
施依欣
王冠中
牛志博
倪烽
张亚娴
陈建业
吕雪莹
赵乔
江左
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211533716.9A priority Critical patent/CN116189028B/en
Publication of CN116189028A publication Critical patent/CN116189028A/en
Application granted granted Critical
Publication of CN116189028B publication Critical patent/CN116189028B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides an image recognition method, an image recognition apparatus, an electronic device, and a storage medium, relating to the field of artificial intelligence technology, in particular to deep learning and computer vision. The specific implementation scheme includes: obtaining a time-series image sequence to be recognized according to a video image; obtaining a target processing strategy in response to receiving a selection operation for a target recognition task; and processing the time-series image sequence to be recognized based on the target processing strategy to obtain a recognition result.

Description

Image recognition method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technology, in particular to deep learning and computer vision as applied to image recognition, and more particularly to an image recognition method, apparatus, electronic device, and storage medium.
Background
Computer vision technology uses an image acquisition device and a computer in place of human eyes to identify, track, and measure targets, and to carry out further image processing. The images processed by computer vision technology may include still images and moving images.
With the development of artificial intelligence, its application in computer vision has become increasingly broad, for example image recognition based on artificial intelligence techniques.
Disclosure of Invention
The disclosure provides an image recognition method, an image recognition device, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided an image recognition method including:
obtaining a time sequence image sequence to be identified according to the video image;
responding to the received selection operation aiming at the target identification task, and obtaining a target processing strategy; and
and processing the time sequence image sequence to be identified based on the target processing strategy to obtain an identification result.
According to another aspect of the present disclosure, there is provided an image recognition apparatus including: the device comprises a first obtaining module, a second obtaining module and a third obtaining module.
The first obtaining module is used for obtaining a time-series image sequence to be recognized according to the video image;
the second obtaining module is used for responding to the received selection operation aiming at the target identification task to obtain a target processing strategy; and
and the third obtaining module is used for processing the time sequence image sequence to be identified based on the target processing strategy to obtain an identification result.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method as described above.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which image recognition methods and apparatus may be applied, according to embodiments of the present disclosure;
FIG. 2 schematically illustrates a flow chart of an image recognition method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a diagram of deriving a target processing policy in response to receiving a selection operation for a target recognition task, in accordance with an embodiment of the present disclosure;
FIG. 4 schematically illustrates a schematic diagram of processing spatial relationship features to obtain recognition results, according to an embodiment of the disclosure;
FIG. 5 schematically illustrates a schematic diagram of processing motion features of a first target object to obtain a recognition result according to an embodiment of the disclosure;
FIG. 6 schematically illustrates a schematic diagram of processing a location feature of a target area and a location feature of a first target object to obtain a recognition result according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a schematic diagram of processing a location feature of a first target object and a location feature of a second target object to obtain a recognition result according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates a schematic diagram of processing features of a first target object and features of a third target object to obtain a recognition result according to an embodiment of the disclosure;
FIG. 9 schematically illustrates an overall exemplary system architecture diagram corresponding to different processing strategies in accordance with an embodiment of the present disclosure;
FIG. 10 schematically illustrates an exemplary system architecture deployment diagram according to an embodiment of the present disclosure;
fig. 11 schematically illustrates a block diagram of an image recognition apparatus according to an embodiment of the present disclosure; and
fig. 12 schematically illustrates a block diagram of an electronic device adapted to implement an image recognition method according to an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
With the development of artificial intelligence technology, a model for a specific task can be obtained by training a network architecture designed for that task and then used for target recognition, for example a target detection model. However, such task-specific models are limited to their own task scenario and have difficulty meeting the requirements of complex application scenarios. For example, in an intelligent security scenario, the requirements may include recognizing a target object, recognizing abnormal behavior of the target object, recognizing the movement trajectory of the target object, and the like.
In view of this, an embodiment of the present disclosure provides an image recognition method, including: obtaining a time sequence image sequence to be identified according to the video image; responding to the received selection operation aiming at the target identification task, and obtaining a target processing strategy; and processing the time sequence image sequence to be identified based on the target processing strategy to obtain an identification result.
Fig. 1 schematically illustrates an exemplary system architecture to which image recognition methods and apparatuses may be applied according to embodiments of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios. For example, in another embodiment, an exemplary system architecture to which the image recognition method and apparatus may be applied may include a terminal device, but the terminal device may implement the image recognition method and apparatus provided by the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, a system architecture 100 according to this embodiment may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is a medium used to provide a communication link between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links, and the like.
The user may interact with the server 105 via the network 104 using the first terminal device 101, the second terminal device 102, and the third terminal device 103, to receive or send messages, etc. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as knowledge-reading applications, web browser applications, search applications, instant messaging tools, email clients, and/or social platform software (examples only).
The first terminal device 101, the second terminal device 102, the third terminal device 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (merely an example) providing support for content browsed by the user with the first terminal apparatus 101, the second terminal apparatus 102, the third terminal apparatus 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that, the image recognition method provided by the embodiment of the present disclosure may be generally performed by the server 105. Accordingly, the image recognition apparatus provided by the embodiments of the present disclosure may be generally provided in the server 105. The image recognition method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105. Accordingly, the image recognition apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105.
For example, when the user selects the target recognition task online, the first terminal device 101, the second terminal device 102 and the third terminal device 103 may acquire identification information of the target recognition task, then send the identification information of the target recognition task and the video image to the server 105, and the server 105 obtains a time sequence image sequence to be recognized according to the video image; analyzing the identification information of the target identification task to determine a target processing strategy; and processing the time sequence image sequence to be identified based on the target processing strategy to obtain an identification result. Or the server cluster capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103 and/or the server 105 determines a target processing policy, and processes the time-series image sequence to be identified based on the target processing policy, so as to obtain the identification result.
The image recognition method provided by the embodiments of the present disclosure may also be generally performed by the first terminal device 101, the second terminal device 102, and the third terminal device 103. Accordingly, the image recognition apparatus provided by the embodiments of the present disclosure may be generally disposed in the first terminal device 101, the second terminal device 102, and the third terminal device 103.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
In the technical scheme of the disclosure, the related processes of collecting, storing, using, processing, transmitting, providing, disclosing, applying and the like of the personal information of the user all conform to the regulations of related laws and regulations, necessary security measures are adopted, and the public order harmony is not violated.
In the technical scheme of the disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or acquired.
Fig. 2 schematically illustrates a flowchart of an image recognition method according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S210 to S230.
In operation S210, a sequence of time-sequential images to be identified is obtained from the video images.
In operation S220, a target processing strategy is obtained in response to receiving a selection operation for a target recognition task.
In operation S230, the time-series image sequence to be recognized is processed based on the target processing policy, and a recognition result is obtained.
According to embodiments of the present disclosure, the video image may be an offline captured video image or a real-time captured video image.
According to an embodiment of the present disclosure, the time-series image sequence to be recognized can be obtained by collecting every image frame of the video image. The acquisition period may also be set according to actual service requirements, for example, sampling once every 3 to 10 frames to obtain the time-series image sequence to be recognized.
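As a minimal sketch of such periodic frame sampling (assuming OpenCV is available; the file name and step value are illustrative, not taken from the disclosure):

import cv2

def sample_frames(video_path, step=5):
    # Collect every `step`-th frame of the video into a time-ordered list.
    cap = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:          # end of stream
            break
        if index % step == 0:
            frames.append(frame)
        index += 1
    cap.release()
    return frames

# One frame out of every five, within the 3-10 frame range mentioned above.
sequence = sample_frames("surveillance.mp4", step=5)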
According to embodiments of the present disclosure, the target recognition task may include any one of the following: abnormal behavior recognition task, target attribute feature recognition task, behavior track recognition task of target object, etc.
According to an embodiment of the present disclosure, a user can interact with the server through the visual interface of a client to obtain a target processing strategy. For example: the user selects the recognition task for abnormal behavior Db₁ on the visual interface of the client, and the server side can obtain the target processing strategy corresponding to abnormal behavior Db₁ according to that recognition task.
According to the embodiment of the disclosure, the target processing path can be determined based on the target processing policy, and the image feature sequence to be identified is processed based on the target processing path, so that the identification result of the abnormal behavior is obtained.
For example: and abnormal behavior Db 1 The target processing policy corresponding to the identified task of (c) may include an image feature extraction path and a feature processing path. The method can be used for extracting the image characteristics of the time sequence image sequence to be identified and then processing the image characteristics to obtain an identification result.
For example: the target processing policy corresponding to the recognition task of the behavior trace of the target object may include an extraction path of the object feature, a recognition path of the object feature, an extraction path of the position feature of the target object, a recognition path of the position feature of the target object, and the like. Object features can be extracted first, and a target object is identified; and identifying the position characteristics of the target object to obtain the identification result of the action track of the target object.
According to the embodiment of the disclosure, since the target processing strategy is obtained in response to the received selection operation for the target recognition task, automatic matching of the target recognition task and the target processing strategy is realized based on data interaction between systems so as to adapt to complex application scene requirements. The method solves the problem that a model obtained by training a network structure algorithm based on a specific task in the related art needs secondary development to be applied to an actual application scene.
Operations S210 to S230 may be performed by an electronic device according to an embodiment of the present disclosure. The electronic device may be a server or a terminal device. The server may be the server 105 in fig. 1. The terminal device may be the first terminal device 101, the second terminal device 102 or the third terminal device 103 in fig. 1.
According to an embodiment of the present disclosure, operation S220 may include the following operations:
and determining the identification information of the target identification task according to the selection operation. And obtaining target processing strategy information from the mapping relation between the identification task and the processing strategy according to the identification information of the target identification task.
The method shown in fig. 2 is further described below with reference to fig. 3-10 in conjunction with the exemplary embodiment.
FIG. 3 schematically illustrates a diagram of deriving a target processing policy in response to receiving a selection operation for a target recognition task, according to an embodiment of the disclosure.
As shown in FIG. 3, in 300, the recognition tasks 321 may include recognition task T₁ (321_1), recognition task T₂ (321_2), ..., and recognition task Tₘ (321_m). The mapping 323 between recognition tasks and processing strategies may include the mapping relation 323_1 between recognition task T₁ and processing strategy TA₁, the mapping relation 323_2 between recognition task T₂ and processing strategy TA₂, ..., and the mapping relation 323_n between recognition task Tₙ and processing strategy TAₙ.
According to an embodiment of the present disclosure, the identification of the target recognition task can be determined from the selection operation to be that of recognition task T₂, indicating that the target recognition task 322 is recognition task T₂. According to recognition task T₂, the target processing strategy 324 can be determined from the mapping 323 between recognition tasks and processing strategies to be the processing strategy TA₂ corresponding to recognition task T₂.
According to an embodiment of the present disclosure, obtaining target processing policy information from a mapping relationship between an identification task and a processing policy according to identification information of the target identification task may include the following operations:
and obtaining the information of the processing strategy to be selected from the mapping relation between the identification task and the processing strategy according to the identification information of the target identification task. And responding to the received selection operation aiming at the to-be-selected processing strategy information, and obtaining the target processing strategy.
According to an embodiment of the present disclosure, there may be a plurality of processing strategies corresponding to recognition task T₂, for example: processing strategy TA₂₋₁, processing strategy TA₂₋₂, ..., processing strategy TA₂₋ᵢ. The candidate processing strategy information may then include processing strategy TA₂₋₁, processing strategy TA₂₋₂, ..., and processing strategy TA₂₋ᵢ.
According to an embodiment of the present disclosure, the candidate processing strategy information can be displayed on the client through the visual interface, which is convenient for the user's selection operation. Based on the user's selection from the candidate processing strategy information, the target processing strategy can be obtained. For example: the user may select processing strategy TA₂₋₂; the target processing strategy corresponding to recognition task T₂ is then processing strategy TA₂₋₂.
According to an embodiment of the present disclosure, when selecting from the candidate processing strategy information, the user may also select two or more processing strategies, in which case the obtained target processing strategy is a combination of those two or more processing strategies.
For example: user-selected may be a processing policy TA 2-2 And processing policy TA 2-3 The resulting target processing policy may be a processing policy TA 2-2 And processing policy TA 2-3 Is a combination of the processing strategies of (a).
According to embodiments of the present disclosure, for a combined processing policy, in combinationThe order of the processing strategies can be determined based on the selection order, and can also be determined based on the association relation of processing logic among the processing strategies in the combination. For example: processing strategy TA 2-2 Is based on a processing policy TA 2-3 The order in the combined processing strategy can be determined as processing strategy TA 2-3 Processing strategy TA 2-2
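A small sketch of one way such dependency-based ordering could be realized (the strategy names and the dependency table are illustrative assumptions, not taken from the disclosure):

def order_strategies(selected, depends_on):
    # Order a combination so that each strategy runs after the ones it depends on.
    ordered = []
    def visit(s):
        if s in ordered:
            return
        for dep in depends_on.get(s, ()):
            if dep in selected:
                visit(dep)
        ordered.append(s)
    for s in selected:
        visit(s)
    return ordered

# TA_2_2 consumes the output of TA_2_3, so TA_2_3 is scheduled first.
print(order_strategies(["TA_2_2", "TA_2_3"], {"TA_2_2": ["TA_2_3"]}))
# -> ['TA_2_3', 'TA_2_2']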
According to embodiments of the present disclosure, the target processing policy may include a target feature extraction policy and a target feature processing policy. Operation S230 may include the following operations:
and extracting target features of the time sequence image sequence to be identified based on a target feature extraction strategy. And processing the target features based on the target feature processing strategy to obtain a recognition result.
Fig. 4 schematically illustrates a schematic diagram of processing spatial relationship features to obtain recognition results according to an embodiment of the disclosure.
As shown in fig. 4, in 400, a time-series image sequence 432 to be recognized is obtained from video data 431. The sequence 432 may include the image P₁ (432_1) corresponding to time t₁, the image P₂ (432_2) corresponding to time t₂, ..., and the image Pᵢ (432_i) corresponding to time tᵢ.
According to an embodiment of the present disclosure, based on the target feature extraction strategy, the spatial relationship features of each image in the sequence 432 are extracted in turn, resulting in the spatial relationship features 433. The spatial relationship features 433 may include the spatial relationship feature 433_1 corresponding to time t₁, the spatial relationship feature 433_2 corresponding to time t₂, ..., and the spatial relationship feature 433_i corresponding to time tᵢ. A spatial relationship feature characterizes the relative motion between the foreground features and background features of an image. For example: an image may include a street, a house, and a plurality of moving objects. The moving objects may be the foreground features of the image, and the street and house its background features. The spatial relative relationship between foreground and background features can then characterize the relative motion between the moving objects and the street and house.
According to an embodiment of the present disclosure, a time-varying feature 434 of the spatial relationship feature is derived based on the target feature processing strategy. Based on the time-varying characteristics of the spatial relationship characteristics, recognition results 435 are obtained.
For example: at t 1 ~t i At the moment, the relative motion characteristics among the plurality of movable objects, the street and the house can be gradually gathered in the same direction along the time change, the recognition result can be determined to be that the plurality of movable objects have abnormal behaviors, and the abnormal behaviors can be fighting behaviors of the masses.
Fig. 5 schematically illustrates a schematic diagram of processing an action feature of a first target object to obtain a recognition result according to an embodiment of the disclosure.
As shown in fig. 5, in 500, a time-series image sequence 532 to be recognized is obtained from video data 531. The sequence 532 may include the image P₁ (532_1) corresponding to time t₁, the image P₂ (532_2) corresponding to time t₂, ..., and the image Pᵢ (532_i) corresponding to time tᵢ.
According to an embodiment of the present disclosure, based on the target feature extraction strategy, the action features of the first target object are extracted in turn from each image in the sequence 532, resulting in the action features 533 of the first target object. The action features 533 may include the action feature 533_1 corresponding to time t₁, the action feature 533_2 corresponding to time t₂, ..., and the action feature 533_i corresponding to time tᵢ.
According to embodiments of the present disclosure, the motion features may include a motion direction feature and a motion change feature. The motion direction feature may characterize a motion direction feature of the motion of the first target object relative to the ground. The motion change feature may characterize a motion change feature of the first target object.
According to an embodiment of the present disclosure, a time-varying feature 534 of the motion feature of the first target object is derived based on the target feature processing policy. The recognition result 535 is obtained from the time-varying characteristics of the motion characteristics of the first target object.
For example: at t 1 ~t i At this time, the movement direction characteristic of the first target object may be the same with respect to the ground as time goes by, and gradually shorten the distance from the ground. The motion change feature of the first target object may be that the motion of the first target object changes from a knee bending motion to a body leaning forward motion, and finally changes to a creeping motion of the ground supported by both hands. It may be determined that an abnormal behavior has occurred in the current first target object, and the abnormal behavior may be a falling behavior.
According to embodiments of the present disclosure, the motion characteristics of the first target object may characterize target skeletal point characteristics of the first target object. Based on the target feature processing strategy, the processing of the action features of the first target object to obtain the time-varying features of the action features of the first target object may include the following operations:
The target skeleton point features are processed based on the target feature processing strategy to obtain the trend of the target skeleton points over time, and the time-varying features of the action features of the first target object are obtained from that trend.
For example: the target skeletal points may include skeletal points of the torso, limbs, head, etc. of a human body. The change in skeletal point characteristics may more accurately characterize the change in motion characteristics of the first target object. And processing the target skeleton point characteristics based on a target characteristic processing strategy, so that the accuracy of image recognition can be improved.
Fig. 6 schematically illustrates a schematic diagram of processing a location feature of a target area and a location feature of a first target object to obtain a recognition result according to an embodiment of the present disclosure.
As shown in fig. 6, in 600, a time-series image sequence 632 to be recognized is obtained from video data 631. The sequence 632 may include the image P₁ (632_1) corresponding to time t₁, the image P₂ (632_2) corresponding to time t₂, ..., and the image Pᵢ (632_i) corresponding to time tᵢ.
According to an embodiment of the present disclosure, based on the target feature extraction strategy, the position features of the first target object and the position features of the target area are extracted in turn from each image in the sequence 632, resulting in the position features 633 of the first target object and the position features 634 of the target area. The position features 633 may include the position feature 633_1 corresponding to time t₁, the position feature 633_2 corresponding to time t₂, ..., and the position feature 633_i corresponding to time tᵢ. The position features 634 of the target area may include the position feature 634_1 corresponding to time t₁, the position feature 634_2 corresponding to time t₂, ..., and the position feature 634_i corresponding to time tᵢ.
According to an embodiment of the present disclosure, the location feature of the target area and the location feature of the first target object are processed based on the target feature processing policy to obtain a first change feature 635, where the first change feature characterizes a feature of a first relative location relationship of the first target object and the target area that changes over time. Based on the first variation characteristic 635, a recognition result 636 is obtained.
For example: the position feature of the first target object may be a pixel coordinate feature, and the feature that the first relative position relationship between the first target object and the target area changes with time may include that a distance between the pixel coordinate of the first target object and an edge coordinate of the target area is gradually reduced until the pixel coordinate of the first target object is within the target area. Indicating that the first target object enters the target area, the obtained recognition result may be that the first object has already performed a behavior of entering the target area.
FIG. 7 schematically illustrates a schematic diagram of processing a location feature of a first target object and a location feature of a second target object to obtain a recognition result according to an embodiment of the present disclosure.
as shown in fig. 7, in 700, a sequence of sequential images 732 to be identified is derived from video data 731, the sequence of sequential images 732 to be identified may include a sequence equal to t 1 Image P corresponding to time 1 (732_1) and t 2 Image P corresponding to time 2 (732_2),. And t i Image P corresponding to time i (732_i)。
According to an embodiment of the present disclosure, based on a target feature extraction policy, a position feature of a first target object and a position feature of a second target object of each image in a sequence 732 of time-series images to be identified are sequentially extracted, resulting in a position feature 733 of the first target object and a position feature 734 of the second target object. The location feature 733 of the first target object may include a value equal to t 1 Position features 733_1 and t of the first target object corresponding to time 2 Position features 733_2, & gt, and t of the first target object corresponding to the time i The position feature 733—i of the first target object corresponding to the moment. The location feature 734 of the second target object may include the sum t 1 Position features 733_1 and t of the second target object corresponding to the time 2 Location characteristics 734_2, # and t of the second target object corresponding to the time instant i Position feature 734_i of the second target object corresponding to the moment.
According to an embodiment of the present disclosure, the location feature of the first target object and the location feature of the second target object are processed based on the target feature processing policy, resulting in a second change feature 735. Based on the second variation feature 735, a recognition result 736 is obtained.
According to an embodiment of the present disclosure, the second variation feature characterizes a second relative positional relationship of the first target object and the second target object over time. The first target object may represent a person and parts of the person such as the ear, head or mouth, and the second target object may represent any item, such as: cell phones, cigarettes, etc. The time-varying feature of the relative positional relationship of the first target object and the second target object may be a time-varying feature of the relative position between the person and the handset.
For example: the relative position relationship between the first target object and the second target object gradually approaches, which means that the distance between the mobile phone and the ear or the head of the person gradually shortens, so that the behavior of the first target object can be determined as the behavior that the person uses the mobile phone to make a call, and the obtained recognition result is that the first target object has made a call.
Fig. 8 schematically illustrates a schematic diagram of processing features of a first target object and features of a third target object to obtain a recognition result according to an embodiment of the disclosure.
As shown in fig. 8, in 800, a time-series image sequence 832 to be recognized is obtained from video data 831. The sequence 832 may include the image P₁ (832_1) corresponding to time t₁, the image P₂ (832_2) corresponding to time t₂, ..., and the image Pᵢ (832_i) corresponding to time tᵢ.
According to an embodiment of the present disclosure, based on the target feature extraction strategy, the features of the first target object and of the third target object are extracted in turn from each image in the sequence 832, resulting in the features 833 of the first target object and the features 834 of the third target object. The features 833 may include the feature 833_1 corresponding to time t₁, the feature 833_2 corresponding to time t₂, ..., and the feature 833_i corresponding to time tᵢ. The features 834 of the third target object may include the feature 834_1 corresponding to time t₁, the feature 834_2 corresponding to time t₂, ..., and the feature 834_i corresponding to time tᵢ.
According to an embodiment of the present disclosure, the video image 831 may be acquired by an acquisition device in a different acquisition direction, or may be acquired by an acquisition device in the same acquisition direction.
According to an embodiment of the present disclosure, based on a target processing policy, the following operations may be performed:
the features of the first target object and the features of the third target object are matched to obtain a feature matching result 835. In the case that the feature matching result satisfies the predetermined threshold, a recognition result 836 is obtained from the features of the first target object and the features of the third target object.
According to an embodiment of the present disclosure, in a case where the feature matching result satisfies a predetermined threshold, obtaining the recognition result according to the feature of the first target object and the feature of the third target object may include the following operations:
and under the condition that the feature matching result meets a preset threshold value, determining that the first target object and the third target object are the same target object. And obtaining the attribute characteristics of the target object according to the characteristics of the first target object and the characteristics of the third target object. And obtaining a recognition result according to the attribute characteristics of the target object.
According to an embodiment of the present disclosure, the characteristics of the first target object and the characteristics of the third target object may each include characteristics and action characteristics of different parts of the human body, for example: facial features, clothing color features, walking posture features, and the like.
According to an embodiment of the present disclosure, the feature matching result is obtained from the similarity between the features of the first target object and the features of the third target object. When the similarity satisfies a predetermined threshold, the features of the first target object match the features of the third target object, and it may be determined that the first target object and the third target object are the same object.
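A sketch with cosine similarity as the matching score, assuming the features are fixed-length appearance embedding vectors (the 0.8 threshold is illustrative; the disclosure does not fix a similarity measure):

import numpy as np

def same_object(feat_a, feat_b, threshold=0.8):
    # Cosine similarity between the two appearance embeddings.
    sim = float(feat_a @ feat_b) / (np.linalg.norm(feat_a) * np.linalg.norm(feat_b))
    return sim >= threshold     # same object when the similarity meets the threshold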
According to the embodiment of the disclosure, in the case that the first target object and the third target object are determined to be the same object, the method can be used for counting the crowd flow in a certain time period.
According to an embodiment of the present disclosure, the features of the first target object may further comprise a relative motion direction feature of the first target object and a background object in the image.
For example: the relative movement direction of the first target object in the image relative to the background object (such as house) is far away from the house, and the crowd flow leaving the house in a certain time period can be counted according to the relative movement direction characteristics of the first target object and the background object in the image. Similarly, when the relative movement direction of the first target object relative to the background object (for example, house) is that the first target object enters the house, the crowd flow entering the house in a certain time period can be counted according to the relative movement direction characteristics of the first target object and the background object in the image.
Fig. 9 schematically illustrates an overall exemplary system architecture diagram corresponding to different processing strategies in accordance with an embodiment of the present disclosure.
As shown in fig. 9, in 900, processing modules corresponding to different processing strategies are configured. In the case that the target recognition task is first abnormal behavior recognition, the processing modules corresponding to the target processing strategy may include: a detection tracking module 903, a keypoint detection module 904, and a first abnormal behavior recognition module 905. The video image 901 from the first acquisition device is processed based on the target processing strategy corresponding to the first abnormal behavior recognition task to obtain the time-series image sequence 902 to be recognized. The detection tracking module 903 processes the sequence 902 to obtain the skeleton point features of the first target object. The keypoint detection module 904 processes the skeleton point features of the first target object to obtain the target skeleton point features. The first abnormal behavior recognition module 905 processes the target skeleton point features to obtain the recognition result of the first abnormal behavior.
According to an embodiment of the present disclosure, in the case of changing the target recognition task, the target processing policy is changed accordingly. The time-series image sequence 902 to be recognized is processed by using other behavior recognition modules corresponding to the target recognition task, which will not be described herein.
According to an embodiment of the present disclosure, in the case where the target recognition task is an in-out counting task or an attribute recognition task, the video image 910 of the second acquisition device and the video image 901 of the first acquisition device may be processed jointly. For example: the time-series image sequence 911 to be recognized is processed by the detection tracking module 912 to obtain the features of the third target object, and the detection tracking module 903 processes the time-series image sequence 902 to obtain the features of the first target object. The feature matching module 913 processes the features of the first target object and of the third target object to obtain a feature matching result. The attribute recognition module 914 can then process the features of the first and third target objects to obtain an attribute recognition result, or the in-out counting module 915 can count the same target object to obtain an in-out statistics result.
Fig. 10 schematically illustrates an exemplary system architecture deployment diagram according to an embodiment of the present disclosure.
As shown in fig. 10, the exemplary system architecture deployment diagram 1000 includes an algorithm architecture layer 1001, an application layer 1002, and a deployment layer 1003. In the algorithm architecture layer 1001, the input data 10011 may include image files, single-camera video images, and multi-camera video images. The algorithm principle 10012 is to obtain target features by performing target detection on the input data. The target features are then processed using technologies such as multi-target tracking, feature association, facial recognition, and trajectory fusion to obtain the output data 10013. The specific processing strategy is determined by the target recognition task, and the output data 10013 may include attribute recognition results, abnormal behavior recognition results, trajectory/flow counts, etc.
The application layer 1002 deploys various application functional modules, including an abnormal behavior early warning functional module 10021, a traffic density monitoring functional module 10022, an in-out traffic control functional module 10023, a video structuring functional module 10024, an attribute analysis functional module 10025, and the like.
The deployment layer 1003 includes the native inference library Paddle Inference 10031, the serviced deployment framework Paddle Serving 10032, and the deep learning inference optimizer TensorRT 10033.
Fig. 11 schematically illustrates a block diagram of an image recognition apparatus according to an embodiment of the present disclosure.
As shown in fig. 11, the image recognition apparatus 1100 includes a first obtaining module 1101, a second obtaining module 1102, and a third obtaining module 1103.
The first obtaining module is used for obtaining a time sequence image sequence to be identified according to the video image.
And the second obtaining module is used for obtaining the target processing strategy in response to receiving the selection operation aiming at the target identification task.
And the third obtaining module is used for processing the time sequence image sequence to be identified based on the target processing strategy to obtain an identification result.
According to an embodiment of the present disclosure, the second obtaining module 1102 may include a determining unit and a first obtaining unit.
And the determining unit is used for determining the identification information of the target identification task according to the selection operation.
The first obtaining unit is used for obtaining target processing strategy information from the mapping relation between the identification task and the processing strategy according to the identification information of the target identification task.
According to an embodiment of the present disclosure, the first obtaining unit may include a first obtaining subunit and a second obtaining subunit.
The first obtaining subunit is configured to obtain candidate processing strategy information from the mapping relationship between recognition tasks and processing strategies according to the identification information of the target recognition task.
The second obtaining subunit is configured to obtain the target processing strategy in response to receiving a selection operation for the candidate processing strategy information.
According to an embodiment of the present disclosure, the target processing policy includes a target feature extraction policy and a target feature processing policy, and the third obtaining module 1103 may include: a second obtaining unit and a third obtaining unit.
And the second obtaining unit is used for extracting the target characteristics of the time sequence image sequence to be identified based on the target characteristic extraction strategy.
And the third obtaining unit is used for processing the target characteristics based on the target characteristic processing strategy to obtain the identification result.
According to an embodiment of the present disclosure, the target feature includes a spatial relationship feature, and the third obtaining unit includes: a third obtaining subunit and a fourth obtaining subunit.
And the third obtaining subunit is used for processing the spatial relationship features based on the target feature processing strategy to obtain the features of the spatial relationship features changing along with time.
And the fourth obtaining subunit is used for obtaining the identification result according to the time-varying characteristics of the spatial relationship characteristics.
According to an embodiment of the present disclosure, the target feature includes an action feature of the first target object, and the third obtaining unit includes: a fifth obtaining subunit and a sixth obtaining subunit.
And a fifth obtaining subunit, configured to process the motion feature of the first target object based on the target feature processing policy, to obtain a feature that the motion feature of the first target object changes with time.
And a sixth obtaining subunit, configured to obtain a recognition result according to the feature of the motion feature of the first target object that changes with time.
According to an embodiment of the disclosure, the motion features of the first target object characterize target skeleton point features of the first target object, and the fifth obtaining subunit is configured to process the target skeleton point features based on the target feature processing strategy to obtain the trend of the target skeleton points over time, and to obtain the time-varying features of the action features of the first target object from that trend.
According to an embodiment of the present disclosure, the target feature includes a position feature of the target region and a position feature of the first target object, and the third obtaining unit includes: a seventh obtaining subunit and an eighth obtaining subunit.
And a seventh obtaining subunit, configured to process, based on the target feature processing policy, the position feature of the target area and the position feature of the first target object to obtain a first change feature, where the first change feature characterizes a feature that a first relative position relationship between the first target object and the target area changes with time. And an eighth obtaining subunit, configured to obtain a recognition result according to the first variation feature.
According to an embodiment of the present disclosure, the target feature includes a position feature of the first target object and a position feature of the second target object, and the third obtaining unit includes: a ninth obtaining subunit and a tenth obtaining subunit.
And a ninth obtaining subunit, configured to process, based on the target feature processing policy, the position feature of the first target object and the position feature of the second target object to obtain a second change feature, where the second change feature characterizes a feature of a second relative position relationship between the first target object and the second target object that changes with time. And tenth obtaining a subunit, configured to obtain a recognition result according to the second variation feature.
According to an embodiment of the present disclosure, the target features include: the characteristics of the first target object and the characteristics of the third target object are extracted from at least two images with different acquisition directions; the third obtaining unit includes: an eleventh obtaining subunit and a twelfth obtaining subunit.
And the eleventh obtaining subunit is configured to match the features of the first target object with the features of the third target object based on the target feature processing policy, so as to obtain a feature matching result.
A twelfth obtaining subunit, configured to obtain, when the feature matching result meets the predetermined threshold, a recognition result according to the feature of the first target object and the feature of the third target object.
According to an embodiment of the present disclosure, the twelfth obtaining subunit is configured to: and under the condition that the feature matching result meets a preset threshold value, determining that the first target object and the third target object are the same target object. And obtaining the attribute characteristics of the target object according to the characteristics of the first target object and the characteristics of the third target object. And obtaining a recognition result according to the attribute characteristics of the target object.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method as described above.
According to an embodiment of the present disclosure, a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
Fig. 12 shows a schematic block diagram of an example electronic device 1200 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 12, the device 1200 includes a computing unit 1201, which may perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1202 or a computer program loaded from a storage unit 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data required for the operation of the device 1200 may also be stored. The computing unit 1201, the ROM 1202, and the RAM 1203 are connected to each other by a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.
Various components in device 1200 are connected to I/O interface 1205, including: an input unit 1206 such as a keyboard, mouse, etc.; an output unit 1207 such as various types of displays, speakers, and the like; a storage unit 1208 such as a magnetic disk, an optical disk, or the like; and a communication unit 1209, such as a network card, modem, wireless communication transceiver, etc. The communication unit 1209 allows the device 1200 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The computing unit 1201 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1201 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1201 performs the respective methods and processes described above, for example, the image recognition method. For example, in some embodiments, the image recognition method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1208. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1200 via the ROM 1202 and/or the communication unit 1209. When the computer program is loaded into the RAM 1203 and executed by the computing unit 1201, one or more steps of the image recognition method described above may be performed. Alternatively, in other embodiments, the computing unit 1201 may be configured to perform the image recognition method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that the various forms of flow shown above may be used, with steps reordered, added, or deleted. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (25)

1. An image recognition method, comprising:
obtaining a time sequence image sequence to be identified according to the video image;
responding to the received selection operation aiming at the target identification task, and obtaining a target processing strategy; and
and processing the time sequence image sequence to be identified based on the target processing strategy to obtain an identification result.
2. The method of claim 1, wherein the obtaining the target processing strategy in response to receiving the selection operation for the target identification task comprises:
determining identification information of the target identification task according to the selection operation; and
and obtaining the target processing strategy information from the mapping relation between the identification task and the processing strategy according to the identification information of the target identification task.
3. The method according to claim 2, wherein the obtaining the target processing strategy information from the mapping relation between the identification task and the processing strategy according to the identification information of the target identification task includes:
obtaining the information of the processing strategy to be selected from the mapping relation between the identification task and the processing strategy according to the identification information of the target identification task; and
and responding to the received selection operation aiming at the processing strategy information to be selected, and obtaining the target processing strategy.
4. The method of claim 1, wherein the target processing strategy includes a target feature extraction strategy and a target feature processing strategy, and the processing the time sequence image sequence to be identified based on the target processing strategy to obtain the identification result includes:
extracting target features of the time sequence image sequence to be identified based on the target feature extraction strategy; and
And processing the target feature based on the target feature processing strategy to obtain the identification result.
5. The method of claim 4, wherein the target feature comprises a spatial relationship feature, and the processing the target feature based on the target feature processing strategy to obtain the identification result comprises:
processing the spatial relationship feature based on the target feature processing strategy to obtain a time-varying feature of the spatial relationship feature; and
and obtaining the identification result according to the time-varying feature of the spatial relationship feature.
6. The method of claim 4, wherein the target feature comprises an action feature of a first target object, and the processing the target feature based on the target feature processing strategy to obtain the identification result comprises:
processing the action feature of the first target object based on the target feature processing strategy to obtain a time-varying feature of the action feature of the first target object; and
and obtaining the identification result according to the time-varying feature of the action feature of the first target object.
7. The method of claim 6, wherein the action feature of the first target object characterizes a target skeleton point feature of the first target object, and the processing the action feature of the first target object based on the target feature processing strategy to obtain the time-varying feature of the action feature of the first target object comprises:
processing the target skeleton point feature based on the target feature processing strategy to obtain a change trend feature of the target skeleton points over time; and
and obtaining the time-varying feature of the action feature of the first target object according to the change trend feature of the target skeleton points over time.
8. The method of claim 4, wherein the target features include a position feature of a target region and a position feature of a first target object, and the processing the target features based on the target feature processing strategy to obtain the identification result comprises:
processing the position feature of the target region and the position feature of the first target object based on the target feature processing strategy to obtain a first change feature, wherein the first change feature characterizes how a first relative positional relationship between the first target object and the target region changes over time; and
and obtaining the identification result according to the first change feature.
9. The method of claim 4, wherein the target features include a position feature of a first target object and a position feature of a second target object, and the processing the target features based on the target feature processing strategy to obtain the identification result comprises:
processing the position feature of the first target object and the position feature of the second target object based on the target feature processing strategy to obtain a second change feature, wherein the second change feature characterizes how a second relative positional relationship between the first target object and the second target object changes over time; and
and obtaining the identification result according to the second change feature.
10. The method of claim 4, wherein the target features comprise features of a first target object and features of a third target object, which are extracted from at least two images with different acquisition directions; and the processing the target features based on the target feature processing strategy to obtain the identification result includes:
based on the target feature processing strategy, matching the features of the first target object with the features of the third target object to obtain a feature matching result; and
and under the condition that the feature matching result meets a predetermined threshold value, obtaining the identification result according to the features of the first target object and the features of the third target object.
11. The method according to claim 10, wherein the obtaining the identification result according to the features of the first target object and the features of the third target object if the feature matching result satisfies the predetermined threshold value includes:
under the condition that the feature matching result meets the predetermined threshold value, determining that the first target object and the third target object are the same target object;
obtaining attribute features of the target object according to the features of the first target object and the features of the third target object; and
and obtaining the identification result according to the attribute features of the target object.
12. An image recognition apparatus comprising:
the first obtaining module is used for obtaining a time sequence image sequence to be identified according to the video image;
the second obtaining module is used for responding to the received selection operation aiming at the target identification task to obtain a target processing strategy; and
And the third obtaining module is used for processing the time sequence image sequence to be identified based on the target processing strategy to obtain an identification result.
13. The apparatus of claim 12, wherein the second obtaining module comprises:
the determining unit is used for determining the identification information of the target identification task according to the selection operation; and
the first obtaining unit is used for obtaining the target processing strategy information from the mapping relation between the identification task and the processing strategy according to the identification information of the target identification task.
14. The apparatus of claim 13, wherein the first obtaining unit comprises:
the first obtaining subunit is used for obtaining the information of the processing strategy to be selected from the mapping relation between the identification task and the processing strategy according to the identification information of the target identification task; and
and the second obtaining subunit is used for responding to the received selection operation aiming at the processing strategy information to be selected to obtain the target processing strategy.
15. The apparatus of claim 12, wherein the target processing strategy comprises a target feature extraction strategy and a target feature processing strategy, the third obtaining module comprising:
The second obtaining unit is used for extracting target features of the time sequence image sequence to be identified based on the target feature extraction strategy; and
and the third obtaining unit is used for processing the target feature based on the target feature processing strategy to obtain the identification result.
16. The apparatus of claim 15, wherein the target feature comprises a spatial relationship feature, the third obtaining unit comprising:
a third obtaining subunit, configured to process the spatial relationship feature based on the target feature processing strategy to obtain a time-varying feature of the spatial relationship feature; and
and a fourth obtaining subunit, configured to obtain the identification result according to the time-varying feature of the spatial relationship feature.
17. The apparatus of claim 15, wherein the target feature comprises an action feature of a first target object, the third obtaining unit comprising:
a fifth obtaining subunit, configured to process, based on the target feature processing strategy, the action feature of the first target object to obtain a time-varying feature of the action feature of the first target object; and
And a sixth obtaining subunit, configured to obtain the identification result according to the time-varying feature of the action feature of the first target object.
18. The apparatus of claim 17, wherein the action feature of the first target object characterizes a target skeleton point feature of the first target object, and the fifth obtaining subunit is configured to:
process the target skeleton point feature based on the target feature processing strategy to obtain a change trend feature of the target skeleton points over time; and
and obtain the time-varying feature of the action feature of the first target object according to the change trend feature of the target skeleton points over time.
19. The apparatus of claim 15, wherein the target features include a position feature of a target region and a position feature of a first target object, the third obtaining unit comprising:
a seventh obtaining subunit, configured to process, based on the target feature processing strategy, the position feature of the target region and the position feature of the first target object to obtain a first change feature, where the first change feature characterizes how a first relative positional relationship between the first target object and the target region changes over time; and
And an eighth obtaining subunit, configured to obtain the identification result according to the first variation feature.
20. The apparatus of claim 15, wherein the target features include a position feature of a first target object and a position feature of a second target object, the third obtaining unit comprising:
a ninth obtaining subunit, configured to process, based on the target feature processing strategy, the position feature of the first target object and the position feature of the second target object to obtain a second change feature, where the second change feature characterizes how a second relative positional relationship between the first target object and the second target object changes over time; and
and a tenth obtaining subunit, configured to obtain the identification result according to the second change feature.
21. The apparatus of claim 15, wherein the target features comprise features of a first target object and features of a third target object, which are extracted from at least two images with different acquisition directions; and the third obtaining unit includes:
an eleventh obtaining subunit, configured to match, based on the target feature processing strategy, the features of the first target object with the features of the third target object to obtain a feature matching result; and
A twelfth obtaining subunit, configured to obtain the identification result according to the feature of the first target object and the feature of the third target object when the feature matching result meets a predetermined threshold.
22. The apparatus of claim 21, wherein the twelfth obtaining subunit is configured to:
determine, under the condition that the feature matching result meets the predetermined threshold value, that the first target object and the third target object are the same target object;
obtain attribute features of the target object according to the features of the first target object and the features of the third target object; and
and obtain the identification result according to the attribute features of the target object.
23. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-11.
24. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-11.
25. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-11.
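For illustration only, the dispatch structure of claims 1-3 can be sketched in a few lines of Python. The policy names, the mapping table, and the frame-sampling stride below are hypothetical assumptions, not part of the claims:

```python
from typing import Callable, Dict, List

# Hypothetical mapping relation between identification tasks and processing
# strategies (claim 2); a real entry would pair a feature-extraction strategy
# with a feature-processing strategy as in claim 4.
POLICY_TABLE: Dict[str, Callable[[List[object]], str]] = {
    "action_recognition": lambda seq: f"action result over {len(seq)} frames",
    "cross_view_identity": lambda seq: f"identity result over {len(seq)} frames",
}

def recognize(video_frames: List[object], task_id: str, stride: int = 5) -> str:
    """Sketch of claim 1: sample a time sequence image sequence from the
    video, look up the target processing strategy by the task's
    identification information, and apply it to obtain the result."""
    sequence = video_frames[::stride]  # time sequence image sequence
    strategy = POLICY_TABLE[task_id]   # mapping: task -> processing strategy
    return strategy(sequence)          # identification result

print(recognize(list(range(100)), "action_recognition"))
# -> action result over 20 frames
```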
CN202211533716.9A 2022-11-29 2022-11-29 Image recognition method, device, electronic equipment and storage medium Active CN116189028B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211533716.9A CN116189028B (en) 2022-11-29 2022-11-29 Image recognition method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211533716.9A CN116189028B (en) 2022-11-29 2022-11-29 Image recognition method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116189028A true CN116189028A (en) 2023-05-30
CN116189028B CN116189028B (en) 2024-06-21

Family

ID=86451238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211533716.9A Active CN116189028B (en) 2022-11-29 2022-11-29 Image recognition method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116189028B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150023602A1 (en) * 2013-07-19 2015-01-22 Kamil Wnuk Fast recognition algorithm processing, systems and methods
US20200151240A1 (en) * 2018-11-13 2020-05-14 International Business Machines Corporation Contextually adjusting device notifications
CN110472554A (en) * 2019-08-12 2019-11-19 南京邮电大学 Table tennis action identification method and system based on posture segmentation and crucial point feature
WO2022134983A1 (en) * 2020-12-25 2022-06-30 深圳市优必选科技股份有限公司 Behavior recognition method and apparatus, terminal device, and readable storage medium
CN114005053A (en) * 2021-09-17 2022-02-01 阿里巴巴达摩院(杭州)科技有限公司 Video processing method, video processing device, computer equipment and computer-readable storage medium
CN114332670A (en) * 2021-10-15 2022-04-12 腾讯科技(深圳)有限公司 Video behavior recognition method and device, computer equipment and storage medium
CN114723966A (en) * 2022-03-30 2022-07-08 北京百度网讯科技有限公司 Multi-task recognition method, training method, device, electronic equipment and storage medium
CN114863320A (en) * 2022-04-06 2022-08-05 斑马网络技术有限公司 Target object behavior identification method and device, electronic equipment and medium
CN114779937A (en) * 2022-04-28 2022-07-22 脑陆(重庆)智能科技研究院有限公司 Multimedia interactive imaging method, apparatus, storage medium and computer program product

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BESBES, BASSEM: "Evidential combination of SVM road obstacle classifiers in visible and far infrared images", 2011 IEEE Intelligent Vehicles Symposium (IV), 31 December 2011 (2011-12-31) *
FANG, LEI: "Data partitioning and path optimization algorithms for parallel processing of remote sensing images", Acta Geodaetica et Cartographica Sinica, vol. 48, no. 4, 31 December 2019 (2019-12-31) *
WANG, FENG; JIN, XIAOBO; YU, JUNWEI; WANG, GUICAI: "Active recognition algorithm for fake-plate vehicles based on a fusion strategy", Journal of Optoelectronics · Laser, no. 11, 15 November 2015 (2015-11-15) *

Also Published As

Publication number Publication date
CN116189028B (en) 2024-06-21

Similar Documents

Publication Publication Date Title
CN112528850B (en) Human body identification method, device, equipment and storage medium
CN108304758B (en) Face characteristic point tracking method and device
CN111598164B (en) Method, device, electronic equipment and storage medium for identifying attribute of target object
CN111259751A (en) Video-based human behavior recognition method, device, equipment and storage medium
CN107918688B (en) Scene model dynamic estimation method, data analysis method and device and electronic equipment
CN113361603A (en) Training method, class recognition device, electronic device and storage medium
CN112784760B (en) Human behavior recognition method, device, equipment and storage medium
CN116453221B (en) Target object posture determining method, training device and storage medium
CN113326773A (en) Recognition model training method, recognition method, device, equipment and storage medium
CN115565101A (en) Production safety abnormity identification method and device, electronic equipment and storage medium
CN112507833A (en) Face recognition and model training method, device, equipment and storage medium
CN113792876B (en) Backbone network generation method, device, equipment and storage medium
CN116453222B (en) Target object posture determining method, training device and storage medium
CN111563541B (en) Training method and device of image detection model
CN116189028B (en) Image recognition method, device, electronic equipment and storage medium
CN116403285A (en) Action recognition method, device, electronic equipment and storage medium
CN114387651B (en) Face recognition method, device, equipment and storage medium
CN113989568A (en) Target detection method, training method, device, electronic device and storage medium
CN116301361B (en) Target selection method and device based on intelligent glasses and electronic equipment
CN116433939B (en) Sample image generation method, training method, recognition method and device
CN116453220B (en) Target object posture determining method, training device and electronic equipment
CN114140851B (en) Image detection method and method for training image detection model
CN112700657B (en) Method and device for generating detection information, road side equipment and cloud control platform
CN114494818B (en) Image processing method, model training method, related device and electronic equipment
CN115880776B (en) Determination method of key point information and generation method and device of offline action library

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant