CN114842416A - Method for counting number of targets in region, and method and device for training region recognition model - Google Patents

Method for counting number of targets in region, and method and device for training region recognition model

Info

Publication number
CN114842416A
Authority
CN
China
Prior art keywords
image
target
target object
target scene
scene
Prior art date
Legal status
Pending
Application number
CN202210454214.0A
Other languages
Chinese (zh)
Inventor
陈士辉
刘坤
翁力帆
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202210454214.0A
Publication of CN114842416A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide a method for counting the number of targets in a region, and a method and device for training a region recognition model. The application relates to the technical field of image processing and is used to automatically recognize one or more regions of a target scene, thereby improving the efficiency of counting the target objects within a region. The specific scheme is as follows: acquiring an image to be recognized, where the image to be recognized is an image obtained by shooting a target scene; inputting the image to be recognized into a region recognition model to obtain feature information of the target scene, where the feature information is used to indicate one or more target regions of the target scene; dividing the image to be recognized into a plurality of image blocks; determining, according to the obtained plurality of image blocks, whether a first target object in the image to be recognized is in a target region; and determining the number of target objects within the target region.

Description

Method for counting number of targets in region, and method and device for training region recognition model
Technical Field
The present application relates to the technical field of image processing, and in particular to a method for counting the number of targets in a region, and a method and device for training a region recognition model.
Background
Counting the number of target objects (such as people or vehicles) in public places can be applied in many fields, including video surveillance, smart cities, and public safety, and has high practical value. At present, large numbers of video surveillance devices are deployed in public places such as stations, museums, squares, banks, and supermarkets. Using the image information collected by these devices to effectively monitor and analyze the number of target objects in different areas of a public place (such as the fresh-produce area of a supermarket, the ticket office of a station, or a sidewalk at an intersection) is an indispensable part of building and managing such places.
However, in existing methods for counting the number of target objects, the boundaries of each area in a scene are generally configured manually before the number of target objects in each area is counted. This consumes a large amount of manpower and limits the efficiency of the counting.
Disclosure of Invention
Embodiments of the present application provide a method for counting the number of targets in an area, and a method and device for training a region recognition model, which can improve the accuracy of counting the number of target objects in an area.
In a first aspect, an embodiment of the present application provides a method for counting the number of targets in an area. The method specifically includes: acquiring an image to be recognized, where the image to be recognized is an image obtained by shooting a target scene; inputting the image to be recognized into a region recognition model to obtain feature information of the target scene, where the feature information is used to indicate one or more target regions of the target scene; dividing the image to be recognized into a plurality of image blocks; determining, according to the obtained plurality of image blocks, whether a first target object in the image to be recognized is in a target area; and determining the number of target objects within the target area.
The technical scheme provided by the application can produce at least the following beneficial effects. The method is based on a trained region recognition model: an image to be recognized of the target scene is input into the model, and one or more target areas of the target scene are recognized automatically. Each area therefore no longer needs to be divided manually, which saves a large amount of manpower and improves the efficiency of identifying the areas of the target scene. Furthermore, the method can check, target object by target object, whether each one in the image to be recognized lies within an area of the target scene, so the area where each target object is located can be judged accurately, the number of target objects in each area can be counted, and the accuracy of the count is improved.
In a possible implementation, determining, according to the plurality of image blocks, whether the first target object in the image to be recognized is in the target area includes: determining a repetition ratio, that is, the ratio of the number of image blocks shared by the first target object and the target area in the image to be recognized to the number of image blocks included in the first target object; and, when the obtained repetition ratio is greater than or equal to a preset threshold, determining that the first target object is in the target area.
It can be understood that in some images to be recognized, if the first target object is in the target area, all image blocks included in the first target object should belong to the image blocks included in the target area. In other images to be recognized, however, when the first target object is in the target area, one part of its image blocks belongs to the image blocks of the target area while another part may not; even so, the ratio of the shared image blocks to the total number of image blocks included in the first target object is usually greater than or equal to a preset threshold. Thus, in the above implementation, the method may determine whether the first target object is within the target area by calculating this repetition ratio.
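To make the rule concrete, the following is a minimal Python sketch of the repetition-ratio test; the set representation of image blocks, the function names, and the 0.5 default threshold are illustrative assumptions rather than a prescribed implementation.

```python
def repetition_ratio(object_blocks: set[int], region_blocks: set[int]) -> float:
    """Share of the first target object's image blocks that also belong
    to the image blocks of the target area."""
    if not object_blocks:
        return 0.0
    return len(object_blocks & region_blocks) / len(object_blocks)

def object_in_region(object_blocks: set[int], region_blocks: set[int],
                     threshold: float = 0.5) -> bool:
    """Decision rule: in the area iff the ratio meets the preset threshold."""
    return repetition_ratio(object_blocks, region_blocks) >= threshold

# e.g. 9 of an object's 12 blocks fall inside the area: 0.75 >= 0.5
assert object_in_region(set(range(12)), set(range(3, 20)))
```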
In another possible implementation, before determining the repetition ratio, the method further includes: determining the image blocks included in the first target object and the image blocks included in the target area in the image to be recognized.
In another possible implementation, determining the image blocks included in the first target object and in the target area specifically includes: for any image block of the plurality of image blocks, when the center point of the image block is within the identification range of the first target object, determining that the first target object in the image to be recognized includes the image block; and/or, when the center point of the image block is within the identification range of the target area, determining that the target area in the image to be recognized includes the image block.
In yet another possible implementation, the method further includes: when the repetition ratio is smaller than the preset threshold, determining that the first target object is not in the target area.
In yet another possible implementation, the feature information further includes one or more boundary lines of the target scene, and the method further includes: acquiring trajectory information of the first target object in a preset time period, where the preset time period includes a first moment at which the first target object enters the target area; determining, in the trajectory information of the preset time period, a connecting line between the track point at the first moment and the track point at a second moment, where the second moment is before the first moment and belongs to the preset time period; and, if the connecting line has an intersection with the one or more boundary lines, determining that the first target object is not in the target area.
It can be understood that if a transparent obstacle (e.g., a glass wall or a transparent shelf) exists in the target scene and the first target object (e.g., a person) is near the glass wall, the image to be recognized includes the person's image information whether the person is inside or outside the wall. However, the areas of a target scene are usually divided along the intersection lines of obstacles and the ground (that is, according to the plane layout), and the inside and the outside of the glass wall belong to different areas. Therefore, when the repetition ratio is greater than or equal to the preset threshold, the statistical device can only conclude that the person is near the glass wall. It may then check, from the person's trajectory information, whether the connecting line between the track point at the first moment and the track point at the second moment intersects one or more boundary lines of the target scene. Since a glass wall cannot be crossed directly, if an intersection exists, the person is determined not to be within the area inside the wall.
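As an illustrative sketch of this trajectory check (the orientation-based segment test is standard computational geometry; the point and polyline representations are assumptions):

```python
Point = tuple[float, float]

def _orient(p: Point, q: Point, r: Point) -> int:
    """Sign of the cross product (q - p) x (r - p)."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def segments_intersect(a: Point, b: Point, c: Point, d: Point) -> bool:
    """True if segment ab properly crosses segment cd (collinear touching
    is ignored, which is enough for this sketch)."""
    return (_orient(a, b, c) != _orient(a, b, d)
            and _orient(c, d, a) != _orient(c, d, b))

# connecting line between the track point at the second moment and the
# track point at the first moment, tested against one wall segment:
print(segments_intersect((0, 0), (4, 4), (2, 0), (2, 6)))  # True: crossed
```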
In a second aspect, an embodiment of the present application further provides a method for training a region recognition model. The method includes: acquiring a training sample set, where the training sample set includes one or more images annotated with feature information of a target scene, and the feature information includes one or more regions and one or more boundary lines of the target scene; and training an initial model according to the acquired training sample set to obtain a region recognition model, where the region recognition model is used to perform region recognition on images of the target scene.
It can be understood that, for a target scene requiring region identification, the training device uses one or more images annotated with feature information of the target scene as a training sample set to train the initial model, obtaining a region recognition model that can be used for region identification of that target scene. With the trained region recognition model, one or more regions of the target scene can then be identified automatically.
In a possible implementation, obtaining the training sample set specifically includes: acquiring a plurality of images of the target scene; for a first image of the plurality of images shot at a first shooting angle, converting the first image into an image with an overhead shooting angle, where the first shooting angle is a level shot or an upward shot; acquiring the feature information annotated on the image with the overhead shooting angle; converting the first image annotated with the feature information back into a first image shot at the first shooting angle; and determining the first image, shot at the first shooting angle and annotated with the feature information, as an image in the training sample set.
It is to be understood that, in the embodiments of the present application, the shooting angle of the image to be recognized may be an upward shot, a level shot, or an overhead shot. Typically, for a target scene (e.g., a restaurant), the regions are divided according to the specific placement of items in the scene (e.g., the placement of tables and chairs in a dining area), the positions of walls, and so on; that is, the regions are usually divided according to the plane layout of the target scene. In an image shot at the first shooting angle, the actual placement positions of objects in a target area may therefore not be captured, which makes it inconvenient to annotate the regions on that image. In this implementation, the first image is converted into an image with an overhead shooting angle, and the annotated regions are then acquired on it, so that more accurate feature information of the target scene is obtained. Furthermore, because the first image, shot at the first shooting angle and carrying the feature information, is determined as an image in the training sample set, the recognition accuracy of the resulting region recognition model can be improved.
In another possible implementation, the method further includes: when the regions included in the target scene change, acquiring one or more images annotated with the changed feature information of the target scene; and retraining the region recognition model according to these images to obtain a retrained region recognition model.
In a third aspect, the present application provides an apparatus for counting the number of targets in a region, the apparatus comprising: the receiving and sending module is used for acquiring an image to be identified, wherein the image to be identified is an image obtained by shooting a target scene; the recognition module is used for inputting the image to be recognized into the region recognition model to obtain the characteristic information of the target scene; the feature information is used for indicating one or more target areas of the target scene; the processing module is used for dividing the image to be identified into a plurality of image blocks; the processing module is further used for determining whether a first target object in the image to be identified is in the target area according to the plurality of image blocks; and determining the number of target objects within the target area.
In a possible implementation, the processing module is specifically configured to: determine the repetition ratio, that is, the ratio of the number of image blocks shared by the first target object and the target area in the image to be recognized to the number of image blocks included in the first target object; and, when the repetition ratio is greater than or equal to a preset threshold, determine that the first target object is in the target area.
In another possible implementation manner, the processing module is further configured to determine an image block included in the first target object and an image block included in the target area in the image to be recognized.
In another possible implementation, the processing module is specifically configured to: for any image block of the plurality of image blocks, when the center point of the image block is within the identification range of the first target object, determine that the first target object in the image to be recognized includes the image block; and/or, when the center point of the image block is within the identification range of the target area, determine that the target area in the image to be recognized includes the image block.
In another possible implementation manner, the processing module is further configured to determine that the first target object is not located in the target area when the repetition rate is smaller than a preset threshold.
In another possible implementation, the feature information further includes one or more boundary lines of the target scene. The transceiver module is further configured to acquire trajectory information of the first target object in a preset time period, where the preset time period includes a first moment at which the first target object enters the target area. The processing module is further configured to: determine, in the trajectory information of the preset time period, a connecting line between the track point at the first moment and the track point at a second moment, where the second moment is before the first moment and belongs to the preset time period; and determine that the first target object is not within the target area when the connecting line has an intersection with the one or more boundary lines.
In a fourth aspect, an embodiment of the present application provides a training apparatus for a region recognition model, where the apparatus includes: the system comprises a receiving and sending module, a processing module and a processing module, wherein the receiving and sending module is used for acquiring a training sample set, the training sample set comprises one or more images marked with characteristic information of a target scene, and the characteristic information comprises one or more areas and one or more boundary lines of the target scene; and the training module is used for training the initial model according to the training sample set so as to obtain a region identification model, and the region identification model is used for carrying out region identification on the image of the target scene.
In a possible implementation, the apparatus further includes a processing module. The transceiver module is specifically configured to acquire a plurality of images of the target scene. For a first image of the plurality of images shot at a first shooting angle, the processing module is configured to convert the first image into a first image with an overhead shooting angle, where the first shooting angle is a level shot or an upward shot. The transceiver module is further configured to acquire the feature information annotated on the image with the overhead shooting angle. The processing module is further configured to convert the first image annotated with the feature information back into a first image shot at the first shooting angle, and to determine the first image, shot at the first shooting angle and annotated with the feature information, as an image in the training sample set.
In another possible implementation manner, the transceiver module is further configured to acquire one or more images marked with changed target scene feature information when a region included in the target scene changes; the training module is further configured to retrain the region identification model according to the one or more images labeled with the changed target scene feature information, so as to obtain a retrained region identification model.
In a fifth aspect, the present application provides an electronic device, comprising: one or more processors; one or more memories; wherein the one or more memories are for storing computer program code comprising computer instructions which, when executed by the one or more processors, cause the electronic device to perform the method as provided in the first aspect above or the method as provided in the second aspect above.
In a sixth aspect, the present application provides a chip system, which is applied to an electronic device; the chip system includes one or more interface circuits, and one or more processors. The interface circuit and the processor are interconnected through a line; the interface circuit is configured to receive signals from a memory of the electronic device and to send signals to the processor, the signals including computer instructions stored in the memory. The computer instructions, when executed by the processor, cause the electronic device to perform the method as provided in the first aspect above, or the method as provided in the second aspect above.
In a seventh aspect, the present application provides a computer-readable storage medium storing computer instructions that, when executed on an electronic device, cause the electronic device to perform the method as provided in the first aspect above or the method as provided in the second aspect above.
In an eighth aspect, the present application provides a computer program product comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the method as provided in the first aspect above or the method as provided in the second aspect above.
For a detailed description of the third to eighth aspects and various implementations thereof in the present application, reference may be made to the detailed description of the first or second aspect and various implementations thereof; in addition, for the beneficial effects of the third aspect to the eighth aspect and various implementation manners thereof, reference may be made to beneficial effect analysis in the first aspect or the second aspect and various implementation manners thereof, and details are not described here.
Drawings
Fig. 1 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application;
Fig. 2 is a flowchart of a method for counting the number of targets in a region according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a target scene according to an embodiment of the present application;
Fig. 4 is a schematic diagram of trajectory information according to an embodiment of the present application;
Fig. 5 is a flowchart of another method for counting the number of targets in a region according to an embodiment of the present application;
Fig. 6 is a schematic diagram of another target scene according to an embodiment of the present application;
Fig. 7 is a flowchart of a method for training a region recognition model according to an embodiment of the present application;
Fig. 8 is a schematic composition diagram of a statistical apparatus according to an embodiment of the present application;
Fig. 9 is a schematic composition diagram of a training apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the description of this application, "/" means "or" unless otherwise stated; for example, A/B may mean A or B. "And/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. Further, "at least one" means one or more, and "a plurality" means two or more. The terms "first", "second", and the like are used only to distinguish between similar objects and do not limit their number or order of execution.
It is noted that, in the present application, words such as "exemplary" or "for example" are used to indicate an example, illustration, or description. Any embodiment or design described herein as "exemplary" or "for example" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, these words are intended to present related concepts in a concrete fashion.
At present, methods for counting the number of targets in an area are applied more and more widely; for example, they can be used to count the number of people in the areas of public places such as stations, museums, squares, banks, and supermarkets. Taking a supermarket as an example, the display of commodities can be adjusted according to the people-count statistics for different areas, and the layout of the shelves can be adjusted likewise.
At present, to count the number of people in each area, crowd heat data can be obtained periodically and matched against a dot-matrix data table pre-constructed by unit area, to obtain the unit area to which each item of crowd heat data belongs. On the one hand, the crowd heat data in this method is determined from client-device location information provided by a location based service (LBS) network provider, but the number of client devices in an area does not fully reflect the number of people there, since some people in the area may not carry a device. Moreover, satellite positioning error is large, so the method is not suitable for counting people in areas with a small footprint, such as the fresh-produce area of a supermarket or the ordering area of a coffee shop. On the other hand, as the layout of the scene is adjusted or changed, the size of a region may also change, but this method cannot automatically correct the boundaries of the unit areas in the pre-constructed dot-matrix data table, which may cause errors in the count of people per area.
In view of this, an embodiment of the present application provides a method for counting the number of targets in an area. The method specifically includes: acquiring an image to be recognized, where the image to be recognized is an image obtained by shooting a target scene; inputting the image to be recognized into a region recognition model to obtain feature information of the target scene, where the feature information is used to indicate one or more target regions of the target scene; dividing the image to be recognized into a plurality of image blocks; determining, according to the obtained plurality of image blocks, whether a first target object in the image to be recognized is in a target area; and determining the number of target objects within the target area. The method can automatically identify one or more target regions of the target scene based on the trained region recognition model, which improves the efficiency of region identification in the target scene.
An embodiment of the present application further provides an apparatus for counting the number of target objects in an area, which can be used to execute the above method for counting the number of targets in an area. Optionally, the counting apparatus may be an electronic device with data processing capability, or a functional module in such an electronic device, which is not limited here. For example, the electronic device may be a server, either a single server or a server cluster composed of multiple servers. As another example, the electronic device may be a mobile phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, or a terminal device such as a cellular phone, a personal digital assistant (PDA), an augmented reality (AR) device, or a virtual reality (VR) device. As yet another example, the electronic device may be a video recording device, a video surveillance device, or another device that can be used to capture the image to be recognized of the target scene. The present application does not specifically limit the form of the electronic device.
Optionally, taking regional population counting for a supermarket as an example, a trained region identification model suitable for the supermarket may be stored in the target object counting device in the region in advance, a user may input an image to be identified of the supermarket to the target object counting device in the region, and the target object counting device in the region may identify one or more regions of the supermarket through the region identification model, and further determine the number of people in each region.
Taking the case where the target object counting apparatus in the area is an electronic device as an example, fig. 1 shows a hardware structure of an electronic device 100.
As shown in fig. 1, the electronic device 100 includes a processor 110, a communication line 120, and a communication interface 130.
Optionally, the electronic device 100 may further include a memory 140. The processor 110, the memory 140 and the communication interface 130 may be connected via a communication line 120.
The processor 110 may be a central processing unit (CPU), a general-purpose processor, a network processor (NP), a digital signal processor (DSP), a microprocessor, a microcontroller, a programmable logic device (PLD), or any combination thereof. The processor 110 may also be any other device with a processing function, such as a circuit, a device, or a software module, without limitation.
In one example, the processor 110 may include one or more CPUs, such as CPU0 and CPU1 in fig. 1.
As an alternative implementation, the electronic device 100 comprises multiple processors, for example, the processor 170 may be included in addition to the processor 110. A communication line 120 for transmitting information between the respective components included in the electronic apparatus 100.
A communication interface 130 for communicating with other devices or other communication networks. The other communication network may be an ethernet, a Radio Access Network (RAN), a Wireless Local Area Network (WLAN), or the like. The communication interface 130 may be a module, a circuit, a transceiver, or any device capable of enabling communication.
A memory 140 for storing instructions. Wherein the instructions may be a computer program.
The memory 140 may be a read-only memory (ROM) or another type of static storage device that can store static information and/or instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and/or instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium, or another magnetic storage device, without limitation.
It should be noted that the memory 140 may exist independently from the processor 110 or may be integrated with the processor 110. The memory 140 may be used for storing instructions or program code or some data etc. The memory 140 may be located inside the electronic device 100 or outside the electronic device 100, which is not limited.
The processor 110 is configured to execute the instructions stored in the memory 140 to implement the methods provided by the following embodiments of the present application. For example, when the electronic device 100 is a terminal or a chip in a terminal, the processor 110 may execute the instructions stored in the memory 140 to implement the steps performed by the statistical device in the embodiments described below.
As an alternative implementation, the electronic device 100 further comprises an output device 150 and an input device 160. The output device 150 may be a display screen, a speaker, or the like capable of outputting data of the electronic apparatus 100 to a user. The input device 160 may be a device capable of inputting data to the electronic apparatus 100, such as a keyboard, a mouse, a microphone, or a joystick.
It should be noted that the structure shown in fig. 1 does not constitute a limitation on the electronic device 100, which may include more or fewer components than those shown in fig. 1, may combine some components, or may arrange the components differently.
In addition, for the region recognition model, an embodiment of the present application further provides a training apparatus for the region recognition model (hereinafter referred to as the training device for short). The training device may be used to train the region recognition model and provide a trained region recognition model suited to a specific scene (for example, supermarket A). The training device may likewise be an electronic device with data processing capability, or a functional module in an electronic device, which is not limited here. Optionally, the hardware structure of the training device may be as shown in fig. 1.
In some embodiments, the target object statistics apparatus and the training apparatus in the region may be integrated into one device; alternatively, the target object counting device and the training device in the area may be two independent devices.
The embodiments provided in the present application will be described in detail below with reference to the accompanying drawings.
An embodiment of the present application provides a method for counting the number of targets in an area. As shown in fig. 2, the method is applied to a target object counting apparatus in an area (hereinafter referred to as the statistical device for short) having the hardware structure shown in fig. 1, and specifically includes the following steps:
s101, a statistical device obtains an image to be identified.
The image to be recognized is an image obtained by shooting a target scene. The target scene is a scene in which the number of target objects in the region needs to be counted. Alternatively, the target scene may be a public place such as a supermarket a, a station B, a coffee shop C, or the like.
Therefore, the image to be recognized is one or more images of the target scene. Moreover, to improve the accuracy of the area people count, the image to be recognized may include images of the target scene from several shooting angles. In an actual scene, an area may be determined by the specific layout of the scene, such as the placement of shelves, the position of cash registers, or the placement of tables and chairs. The image to be recognized should therefore be an image that shows the overall decoration and layout of the specific scene.
In the present embodiment, the shooting angles include an upward shot, a level shot, and an overhead shot. An upward shot is taken from a low position toward a higher one; during shooting, the camera is lower than the subject. A level shot is taken with the camera and the subject on the same horizontal line. An overhead shot is taken from a high position toward a lower one; during shooting, the camera is higher than the subject.
It should be noted that, for a target scene (e.g., a restaurant), the regions are generally divided according to the specific placement of items in the scene (e.g., the placement of tables and chairs in a dining area), the positions of walls, and the like; that is, the regions are usually divided according to the plane layout of the target scene. In an image to be recognized that is shot level or upward, the actual placement positions of objects in a target area may not be captured, whereas an image shot at an overhead angle can clearly show the overall decoration and layout of the target scene, and hence its division into areas.
S102, the statistical device inputs the image to be recognized into the region recognition model to obtain the characteristic information of the target scene.
The region recognition model is a pre-trained model that can be used to identify one or more regions included in the target scene.
Further, the feature information includes one or more target regions of the target scene. Optionally, the feature information may further include one or more boundary lines in the target scene.
The boundary line is an intersection between an obstacle that cannot be directly crossed and the ground. For example, the obstacle may be a wall, a cabinet, a shelf, etc. in the target scene.
One or more regions of the target scene may be determined according to a specific layout in the scene, such as a wall position, a shelf arrangement position, a cash register arrangement position, a desk and chair arrangement position, and the like. Thus, the boundary line may be a boundary line of one or more regions.
In addition, when identifying the number of people in the areas of the target scene, if a transparent obstacle that cannot be directly crossed (such as a glass wall or a transparent shelf) exists in the target scene and a first target object (such as a person) is near the glass wall, the image to be recognized includes the person's image information whether the person is inside or outside the wall. But the inside and the outside of the glass wall may belong to different areas, or the outside of the glass wall may not belong to the target scene at all. Therefore, the feature information may further include one or more boundary lines of the target scene, so that when such transparent, impassable obstacles exist, the area where a target object is located can be determined more accurately according to the boundary lines. Illustratively, fig. 3 shows an image to be recognized of a coffee shop X together with its feature information: the feature information includes the regions 11, 12, 13, 14, 15, and 16, and the boundary lines 21, 22, and 23 in fig. 3.
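Purely as an illustration (the patent does not prescribe any data format), the feature information output by the region recognition model could be represented as polygons and polylines in image coordinates; the following Python sketch and every name in it are assumptions:

```python
from dataclasses import dataclass, field

Point = tuple[float, float]  # (x, y) in image-pixel coordinates

@dataclass
class SceneFeatures:
    """Hypothetical container for the model output: target regions as
    closed polygons, and boundary lines as open polylines along the
    intersections of impassable obstacles with the ground."""
    regions: dict[str, list[Point]] = field(default_factory=dict)
    boundary_lines: list[list[Point]] = field(default_factory=list)

# e.g. a toy version of the coffee-shop scene of fig. 3 (coordinates invented)
features = SceneFeatures(
    regions={"region_11": [(0, 0), (200, 0), (200, 150), (0, 150)]},
    boundary_lines=[[(200, 0), (200, 300)]],
)
```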
In one implementation, the statistical device may identify feature information of the target scene at a first preset frequency. Furthermore, based on the feature information that has been identified, statistics of the number of regional target objects of the target scene may be performed a plurality of times.
The first preset frequency may be weekly, monthly, etc.
It should be noted that, under the condition that the overall decoration and layout of the target scene do not change, the area division of the target scene does not change, and therefore, the statistical device may acquire the image to be recognized of the target scene at the first preset frequency, so as to reduce the frequency of area recognition and save the computing resources.
In another implementation, the statistical device may acquire an image to be recognized of the target scene each time the number of target objects in the areas of the target scene needs to be counted, and then recognize the feature information of the target scene according to that image.
It should be noted that, by acquiring the image to be recognized each time the count is needed, the statistical device can obtain more accurate feature information recognized in real time.
S103, dividing the image to be recognized into a plurality of image blocks by a statistical device.
Specifically, the statistical device may equally divide the image to be recognized into a plurality of image blocks. An image block may be a square, a rectangle, or other possible shapes, which is not limited in this application.
For example, the image blocks may be obtained by performing grid segmentation on the image in the form of X rows by Y columns, yielding X × Y image blocks for the complete image, where one image block may include one or more pixel points of the image to be recognized. The statistical device may perform the grid segmentation according to the resolution or size of the image; for example, it may divide an image to be recognized into 1984 image blocks arranged in 32 rows by 62 columns.
The pixel refers to a minimum unit in an image, and a pixel point may be regarded as an indivisible unit or element in the entire image. The complete image is composed of a plurality of pixel points, and the color and the position of all the pixel points of the image can determine the whole pattern and the size of the image.
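Illustratively, the 32-row by 62-column segmentation above could be sketched as follows in Python (the image size, the tuple layout, and the function name are assumptions):

```python
def split_into_blocks(width: int, height: int,
                      rows: int = 32, cols: int = 62):
    """Divide an image of the given pixel size into rows x cols equal
    grid blocks; return each block's bounds and its center point."""
    blocks = []
    bw, bh = width / cols, height / rows
    for r in range(rows):
        for c in range(cols):
            x0, y0 = c * bw, r * bh
            blocks.append((x0, y0, x0 + bw, y0 + bh,      # bounds
                           x0 + bw / 2, y0 + bh / 2))     # center point
    return blocks

blocks = split_into_blocks(1920, 1080)
print(len(blocks))  # 1984 image blocks, as in the example above
```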
And S104, the statistical device determines whether the first target object in the image to be identified is in the target area according to the plurality of image blocks.
The target area is any area in the target scene. The first target object may be any target object in the target scene, or any target object in the image to be recognized whose position coincides with that of the target area. Alternatively, the first target object may be a target object whose trajectory information is close to the target area.
For example, taking the first target object as a person, when the person appears in the field of view of the monitoring device installed in the target scene, the monitoring device may track the person and record the position of the person at a second preset frequency. For example, the trajectory information of the person in a period of time may be an image of the person in the period of time at a plurality of times, as shown in fig. 4 (a). The person image 31 may be track information of the person at a time. For another example, the trajectory information of the person during a period of time may be a plurality of trajectory points of the person at a plurality of times during the period of time, as shown in fig. 4 (b). The track point 32 may be track information of the person at a time of a preset time period. Alternatively, the locus point 32 may be the position of the center of gravity of the person at that time, or the position of the feet of the person, or the like. For another example, the track information of the person in a period of time may be a plurality of images obtained by the monitoring device capturing the person at a plurality of times in the period of time, that is, the plurality of images may be a plurality of images to be identified.
In some embodiments, the statistical device may determine whether the first target object is within the target area according to the trajectory information of the first target object at the current time.
Alternatively, taking the trajectory information as shown in (b) in fig. 4 as an example, the statistical device may determine whether the trajectory point 32 is within the target area based on one or more target areas of the identified target scene.
Optionally, as shown in fig. 5, the statistical apparatus may perform the following steps S1041 to S1043, and determine whether the first target object is in the target area according to the plurality of image blocks in the image to be recognized:
s1041, the counting device determines the number of the same image blocks in the image blocks included by the first target object and the image blocks included by the target area in the image to be recognized, and the repetition proportion of the number of the image blocks included by the first target object.
In some embodiments, the statistical device may determine an image block included in the first target object and an image block included in the target area in the image to be recognized, and then calculate the repetition ratio.
It should be understood that any image block among the plurality of image blocks in the image to be recognized may be an image block included in the first target object, an image block included in the target area, an image block included in both, or an image block included in neither.
Optionally, for any image block of the plurality of image blocks in the image to be recognized, when the center point of the image block is within the identification range of the first target object, the statistical device may determine that the first target object in the image to be recognized includes the image block; and/or, when the center point of the image block is within the identification range of the target area, it may determine that the target area in the image to be recognized includes the image block.
For example, as shown in fig. 6, taking a first target object as a person 40 as an example, the identification range of the first target object may be a range included in a person identification frame 41 of the person 40 in an image to be identified, and all image blocks whose center points are in the person identification frame 41 are image blocks included in the first target object. The identification range of the target area may be a range included in the area identification frame 17 in the image to be identified, and all image blocks whose central points are in the area identification frame 17 are the image blocks included in the target area.
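Reusing the split_into_blocks sketch above, the center-point test could look like this (the axis-aligned identification frames and all names are assumptions; the scheme only requires that a block's center point lie within the identification range):

```python
Box = tuple[float, float, float, float]  # (x0, y0, x1, y1)

def blocks_in_frame(blocks, frame: Box) -> set[int]:
    """Indices of grid blocks whose center point falls inside an
    identification frame such as the person frame 41 or region frame 17."""
    x0, y0, x1, y1 = frame
    return {i for i, (*_bounds, cx, cy) in enumerate(blocks)
            if x0 <= cx <= x1 and y0 <= cy <= y1}

person_blocks = blocks_in_frame(blocks, (300, 400, 420, 700))
region_blocks = blocks_in_frame(blocks, (250, 350, 900, 1000))
# these two sets feed the repetition-ratio decision sketched earlier
```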
Further, according to the image blocks included in the first target object and the image blocks included in the target area in the image to be recognized, the statistical device may determine the number of image blocks included in the first target object and the number of image blocks shared by the first target object and the target area.
The repetition ratio is then the ratio of the number of shared image blocks to the number of image blocks included in the first target object in the image to be recognized.
S1042, when the repetition ratio is greater than or equal to a preset threshold, the statistical device determines that the first target object is in the target area.
The preset threshold may be 50% or another reasonable proportion.
It can be understood that in some images to be recognized, for example an image shot as a top-down overhead shot, if the first target object is in the target area, all image blocks included in the first target object should belong to the image blocks included in the target area. In other images to be recognized, for example an image shot level, when the first target object is in the target area, one part of its image blocks belongs to the image blocks of the target area while another part may not; even so, the ratio of the shared image blocks to the total number of image blocks included in the first target object is usually greater than or equal to the preset threshold. Therefore, the statistical device may determine whether the first target object is within the target area by calculating the repetition ratio described above.
A top-down shot is an overhead shot taken vertically: during shooting, the camera is higher than the subject and its shooting direction is perpendicular to the horizontal plane.
In some embodiments, when the repetition ratio is greater than or equal to the preset threshold, the statistical device may acquire the trajectory information of the first target object in a preset time period, determine the connecting line between the track point at the first moment and the track point at the second moment in that trajectory information, and determine that the first target object is in the target area if the connecting line does not intersect any of the one or more boundary lines.
The preset time period includes the first moment at which the first target object enters the target area. The duration of the preset time period may be short, for example 5 or 10 seconds. The preset time period may be a period whose ending moment is the first moment, or a period whose middle moment is the first moment.
The second moment is before the first moment and belongs to the preset time period.
It can be understood that, taking the first target object as a person, under normal conditions a person cannot pass directly through a solid wall, so the person's movement trajectory has no intersection with the one or more boundary lines. Therefore, if the connecting line intersects one or more boundary lines, it is determined that the first target object is not in the target area.
It should be noted that, in the image to be recognized, if a transparent obstacle that cannot be directly crossed (e.g., a glass wall or a transparent shelf) exists in the target scene and the first target object (e.g., a person) is near the glass wall, the image to be recognized includes the person's image information whether the person is inside or outside the wall. However, the areas of the target scene are usually divided along the intersection lines of obstacles and the ground (that is, according to the plane layout), and the inside and the outside of the glass wall belong to different areas. Therefore, when the repetition ratio is greater than or equal to the preset threshold, the statistical device can only conclude that the person is near the glass wall. It may then check, from the person's trajectory information, whether the connecting line between the track point at the first moment and the track point at the second moment intersects one or more boundary lines of the target scene; since the glass wall cannot be crossed directly, if an intersection exists, the person is determined not to be within the area inside the wall. For example, if the outer wall of restaurant E is a glass wall, the inside of the wall is the dining area of restaurant E and the outside does not belong to restaurant E, yet in the image to be recognized the repetition ratio between the image blocks of a first target object standing outside the glass wall and the image blocks of the dining area may still exceed the preset threshold.
Therefore, when the repetition ratio is greater than or equal to the preset threshold, the statistical device may further determine whether an intersection point exists between a connecting line between the track point at the first time and the track point at the second time and one or more boundary lines of the target scene, and determine whether the first target object is located in the target area.
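Combining the two checks, a sketch under the same assumptions as the earlier snippets (segments_intersect is the helper sketched earlier; the track is a hypothetical list of track points ending at the first moment):

```python
def confirm_in_region(ratio: float, threshold: float,
                      track: list[Point],
                      boundary_lines: list[list[Point]]) -> bool:
    """Final decision for one object and one area: the repetition ratio
    must reach the threshold, AND the connecting line between an earlier
    track point and the entry point must not cross any boundary line."""
    if ratio < threshold:
        return False
    earlier, entry = track[0], track[-1]  # second moment, first moment
    for line in boundary_lines:
        for p, q in zip(line, line[1:]):  # each segment of the polyline
            if segments_intersect(earlier, entry, p, q):
                return False              # would have crossed a glass wall
    return True
```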
And S1043, when the repetition ratio is smaller than a preset threshold, the statistical device determines that the first target object is not in the target area.
S105, the statistical device determines the number of target objects in the target area. Optionally, for all target objects in the target scene, the statistical device may determine the area where each target object is located in turn, thereby obtaining the number of target objects included in the target area of the image.
The technical scheme provided by the application can produce at least the following beneficial effects. The method is based on a trained region recognition model: an image to be recognized of the target scene is input into the model, and one or more target areas of the target scene are recognized automatically. Each area therefore no longer needs to be divided manually, which saves a large amount of manpower and improves the efficiency of identifying the areas of the target scene. Furthermore, the method can check, target object by target object, whether each one in the image to be recognized lies within an area of the target scene, so the area where each target object is located can be judged accurately, the number of target objects in each area can be counted, and the accuracy of the count is improved.
In some embodiments, the present application further provides a training method for a region identification model, which is applied to the training apparatus described above, and as shown in fig. 7, the method includes:
s201, the training device obtains a training sample set.
The training sample set comprises a plurality of images of a target scene.
Optionally, the target scene is the scene that requires region identification, or a scene of the same type with a similar layout. For example, if the scene requiring region identification is supermarket A, the target scene may be supermarket A itself and a supermarket A1 of the same type with a layout similar to supermarket A.
It should be noted that, in practice, supermarket A and supermarket A1 may be chain supermarkets of the same brand opened at different locations; their layouts are similar, so their area divisions are also similar. Therefore, to collect a large number of samples for training the region recognition model, the plurality of images in the training sample set may also include images of supermarket A1.
Optionally, the multiple images of the target scene may be images of multiple viewing angles of the target scene.
The images in the training sample set are labeled with feature information of the target scene. The feature information includes one or more regions of the target scene. Illustratively, the one or more regions marked on each image may be region 12, region 13, region 14, region 15, and region 16 shown in fig. 3.
The feature information further includes one or more boundary lines of the target scene, where a boundary line is the intersection line between an obstacle that cannot be directly crossed and the ground. Illustratively, the one or more boundary lines marked on each image may be the boundary line 21, the boundary line 22, and the boundary line 23 shown in fig. 3.
In some embodiments, the training device may directly acquire a plurality of images labeled with the target scene feature information, and determine the acquired plurality of images as the training sample set.
Alternatively, the training device may first acquire a plurality of images of the target scene that are not labeled with feature information, display the acquired images to the user, and then receive the plurality of images after the user has labeled them with feature information, thereby obtaining the training sample set.
Optionally, after acquiring the plurality of unlabeled images of the target scene, the training device may display them to the user through a built-in or external display device, and acquire the feature information labeled on each image by the user through a built-in or external input device, such as a mouse, a keyboard, or a touch device of the display device, thereby obtaining the training sample set.
Alternatively, the training apparatus may send the plurality of unlabeled images of the target scene to the user's terminal device. The terminal device can then acquire the feature information that the user labels on each image and send the plurality of labeled images of the target scene back to the training device. Correspondingly, the training device receives these images to obtain the training sample set.
In some embodiments, when the training device acquires a plurality of images of the target scene that are not labeled with feature information, it may determine, from the plurality of images, a first image captured at a first shooting angle and convert the first image into an image captured at a top-down shooting angle, where the first shooting angle is a flat (horizontal) shot or a tilted shot.
For the shooting angle, reference may be made to the related description in step S101, which is not described herein again.
Alternatively, the training device may convert the image of the first shooting angle into a top-down image by means of projective transformation.
The projective transformation may also be referred to as perspective transformation: under the condition that three points, i.e., the perspective center, the image point, and the target point, are collinear, the bearing plane (perspective plane) is rotated around the trace line (perspective axis) by a certain angle according to the perspective rotation law, so that the image is projected from its original plane onto a new plane (e.g., a horizontal plane).
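As an illustrative sketch only, such a transformation can be computed with OpenCV's homography utilities, assuming four reference points on the ground plane are known both in the original view and in the desired top-down view (all coordinates and file names below are placeholders, not values from the embodiments):

```python
import cv2
import numpy as np

# Four ground-plane reference points in the first-shooting-angle view ...
src_pts = np.float32([[420, 830], [1490, 845], [1180, 460], [640, 455]])
# ... and where those same points should land in the top-down view.
dst_pts = np.float32([[300, 900], [900, 900], [900, 300], [300, 300]])

H = cv2.getPerspectiveTransform(src_pts, dst_pts)  # 3x3 homography
img = cv2.imread("scene_first_angle.jpg")
top_down = cv2.warpPerspective(img, H, (1200, 1200))

# Feature information labeled on the top-down image can be mapped back to
# the first shooting angle with the inverse homography (cf. the step below).
H_inv = np.linalg.inv(H)
restored = cv2.warpPerspective(top_down, H_inv, (img.shape[1], img.shape[0]))
```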
It should be noted that a top-down image visually shows the horizontal layout of the target scene, making it convenient for the user to determine the boundaries of each region of the target scene on the horizontal plane as well as the boundary lines of the scene. The user can also judge the positions of these boundaries and boundary lines more accurately, so the one or more regions and one or more boundary lines labeled by the user on the top-down image are more accurate than those labeled on the image at the first shooting angle. Training the region identification model on such labels therefore yields a model with higher recognition accuracy.
Further, the training device may acquire the feature information labeled on the top-down image and convert the labeled top-down image back into an image at the first shooting angle. In this way, the training device can determine the first image, captured at the first shooting angle and labeled with the feature information, as an image in the training sample set.
S202, the training device trains an initial model according to the training sample set to obtain the region identification model.
The region identification model is used for identifying regions of the image of the target scene.
In some embodiments, the process of training the region identification model according to the training samples includes:
S1, the training device obtains an initial model to be trained and the training sample set.
S2, the training device inputs a first image in the training sample set into the initial model to be trained to obtain feature information of the first image.
S3, the training device compares the feature information of the first image output by the initial model with the feature information labeled on the first image, and determines a loss value for the image.
S4, the training device adjusts the model parameters of the initial model according to the loss value.
S5, the training device takes another image of the training sample set as a new first image, and repeats S2 to S4 until the model converges.
The other image may be the same first image again, or any image in the training sample set other than the first image; this is not limited here.
The training device may determine whether the model converges based on the loss value output at each iteration of training the initial model, or based on the number of training iterations, which is not limited in the embodiments of the present application.
For example, the training device may determine that the model has converged when the loss value is less than a loss value threshold.

For another example, the training apparatus may determine that the model has converged when the number of training iterations exceeds a count threshold.
In this way, the training apparatus can determine the converged model as the trained region identification model.
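Purely as an illustration of steps S1 to S5, the loop can be sketched in a generic deep-learning framework (PyTorch is used here; the embodiments do not prescribe a framework, and the model, loss function, and hyperparameters below are placeholders):

```python
import torch
from torch.utils.data import DataLoader

def train_region_model(model, dataset, loss_threshold=0.01, max_iters=10000):
    # `dataset` yields (image, labeled feature information) pairs.
    loader = DataLoader(dataset, batch_size=1, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = torch.nn.CrossEntropyLoss()          # placeholder loss
    iters = 0
    while iters < max_iters:                         # convergence by count
        for image, labeled_features in loader:
            pred_features = model(image)                       # S2
            loss = criterion(pred_features, labeled_features)  # S3
            optimizer.zero_grad()
            loss.backward()                                    # S4
            optimizer.step()
            iters += 1
            if loss.item() < loss_threshold:         # convergence by loss
                return model                         # model has converged
    return model
```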
In some embodiments, the training method of the region identification model further includes: the training device acquires test sample images labeled with the feature information of the target scene, inputs the test sample images into the region identification model, and tests the region identification model. Illustratively, the testing process may follow steps S2 and S3 above, without the parameter adjustment of S4, and is not described again here.
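Correspondingly, a test pass might run the model on held-out labeled test sample images without any parameter adjustment; the following is again only a sketch under the same placeholder assumptions as the training loop above:

```python
import torch
from torch.utils.data import DataLoader

@torch.no_grad()
def test_region_model(model, test_dataset):
    # Forward pass only; the mean placeholder loss over the test sample
    # images serves as the test metric.
    model.eval()
    criterion = torch.nn.CrossEntropyLoss()
    losses = [criterion(model(image), labeled_features).item()
              for image, labeled_features in DataLoader(test_dataset)]
    return sum(losses) / max(len(losses), 1)
```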
In some embodiments, when the regions included in the target scene change, the training device may acquire one or more images labeled with the changed feature information of the target scene, and retrain the region recognition model according to these images to obtain a retrained region recognition model.
The process of retraining the region identification model by the training apparatus follows steps S1 to S5 above.
It should be noted that, as the layout of the target scene is adjusted or changed, the feature information of the target scene may change as well. In that case the training device may acquire a plurality of images of the target scene together with the corrected feature information on each image, and retrain the region identification model. It should be understood that if the boundaries of one or more regions in the feature information change with the layout of the target scene but the region identification model is not corrected in time, the region identification results produced by the model will have large errors, affecting the accuracy of counting people in each region.
The corrected feature information marked on each image of the target scene may be input again by the user.
Based on the above embodiment, for a target scene that needs region identification, the training device takes one or more images labeled with feature information of the target scene as a training sample set to train the initial model, so as to obtain a region identification model that can be used for region identification of the target scene. In this way, with the trained region identification model, automatic identification of one or more regions of the target scene can be achieved.
It can be seen that the foregoing describes the solution provided by the embodiments of the present application primarily from the perspective of the method. To implement the above functions, the embodiments of the present application provide corresponding hardware structures and/or software modules for performing the respective functions. Those skilled in the art will readily appreciate that the modules and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented as hardware or as a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be considered beyond the scope of the present application.
In the embodiments of the present application, the statistical device or the training device may be divided into functional modules according to the above method examples. For example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. Optionally, the division of modules in the embodiments of the present application is schematic and is merely a division by logical function; there may be other division manners in actual implementation.
Fig. 8 is a schematic structural diagram of a statistical apparatus according to an embodiment of the present application. The statistical apparatus 200 may include: a transceiver module 201, an identification module 202 and a processing module 203.
The transceiver module 201 is configured to acquire an image to be recognized, where the image to be recognized is an image obtained by shooting a target scene.
The identification module 202 is configured to input the image to be identified into the region identification model to obtain feature information of the target scene; the feature information is used to indicate one or more target regions of the target scene.
The processing module 203 is used for dividing the image to be recognized into a plurality of image blocks; the processing module 203 is further configured to determine whether a first target object in the image to be identified is in the target area according to the plurality of image blocks; and determining the number of target objects within the target area.
In a possible implementation manner, the processing module 203 is specifically configured to: determine the repetition ratio, i.e., the ratio of the number of image blocks shared by the first target object and the target area in the image to be recognized to the number of image blocks included in the first target object; and when the repetition ratio is greater than or equal to a preset threshold, determine that the first target object is in the target area.
In another possible implementation manner, the processing module 203 is further configured to determine the image blocks included in the first target object and the image blocks included in the target area in the image to be recognized.
In another possible implementation manner, the processing module 203 is specifically configured to: for any image block of the plurality of image blocks, when the central point of the image block is within the identification range of the first target object, determine that the first target object in the image to be recognized includes that image block; and/or, when the central point of the image block is within the identification range of the target area, determine that the target area in the image to be recognized includes that image block.
In another possible implementation manner, the processing module 203 is further configured to determine that the first target object is not located in the target area when the repetition ratio is smaller than the preset threshold.
In another possible implementation manner, the feature information further includes one or more boundary lines of the target scene. The transceiver module 201 is further configured to obtain trajectory information of the first target object within a preset time period, where the preset time period includes a first time at which the first target object enters the target area. The processing module 203 is further configured to determine, in the trajectory information of the preset time period, the connecting line between the track point at the first time and the track point at a second time, where the second time is before the first time and belongs to the preset time period; and to determine that the first target object is not within the target area when the connecting line intersects the one or more boundary lines.
For the detailed description of the above alternative modes, reference may be made to the foregoing method embodiments, which are not described herein again. In addition, any explanation and beneficial effect description of the statistical device 200 provided above can refer to the corresponding method embodiment described above, and are not repeated.
As an example, in conjunction with fig. 1, the functions implemented by the processing module 203 in the statistical apparatus 200 may be implemented by the processor 110 or the processor 170 in fig. 1 executing the program code in the memory 140 in fig. 1. The functions performed by the transceiver module 201 may be implemented by the communication line 120 in fig. 1, but are not limited thereto.
Fig. 9 is a schematic structural diagram of a training apparatus 300 according to an embodiment of the present disclosure. The apparatus 300 may include: a transceiver module 301 and a training module 302. Optionally, the apparatus 300 may further include a processing module 303.
The transceiver module 301 is configured to obtain a training sample set, where the training sample set includes one or more images labeled with feature information of a target scene, where the feature information includes one or more regions and one or more boundary lines of the target scene.
The training module 302 is configured to train the initial model according to the training sample set to obtain a region identification model, where the region identification model is used to perform region identification on an image of a target scene.
In a possible implementation manner, the apparatus further includes the processing module 303. The transceiver module 301 is specifically configured to acquire a plurality of images of the target scene. For a first image shot at a first shooting angle among the plurality of images, the processing module 303 is configured to convert the first image into an image shot at a top-down angle, where the first shooting angle is a flat (horizontal) shot or a tilted shot. The transceiver module is further specifically configured to acquire the feature information labeled on the top-down image. The processing module 303 is further configured to convert the labeled image back into the first image shot at the first shooting angle, and to determine the first image, shot at the first shooting angle and labeled with the feature information, as an image in the training sample set.

In another possible implementation manner, the transceiver module 301 is further configured to acquire one or more images labeled with the changed feature information of the target scene when the regions included in the target scene change. The training module 302 is further configured to retrain the region identification model according to these images, so as to obtain a retrained region identification model.
For the detailed description of the above alternative modes, reference may be made to the foregoing method embodiments, which are not described herein again. In addition, for any explanation and beneficial effects of the training apparatus 300 provided above, reference may be made to the corresponding method embodiments, and details are not repeated.
As an example, in connection with fig. 1, the functions implemented by the processing module 303 in the training apparatus may be implemented by the processor 110 or the processor 170 in fig. 1 executing the program code in the memory 140 in fig. 1. The functions implemented by the transceiving module 301 may be implemented by the communication line 120 in fig. 1, but are not limited thereto.
Those of skill in the art would readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is performed as hardware or computer software drives hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It should be noted that the division of the modules in fig. 8 or fig. 9 is schematic, and is only one logic function division, and there may be another division manner in actual implementation. For example, two or more functions may also be integrated in one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
The embodiments of the present application further provide a computer-readable storage medium comprising computer-executable instructions which, when run on a computer, cause the computer to execute any one of the methods provided by the above embodiments. For example, one or more features of S101 to S105 in fig. 2 may be undertaken by one or more computer-executable instructions stored in the computer-readable storage medium.
The embodiments of the present application also provide a computer program product containing instructions which, when run on a computer, cause the computer to perform any one of the methods provided by the above embodiments.
An embodiment of the present application further provides a chip, including a processor and an interface, the processor being coupled to a memory through the interface; when the processor executes a computer program or computer-executable instructions in the memory, any one of the methods provided by the above embodiments is performed.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When a software program is used, the implementation may take the form, in whole or in part, of a computer program product. The computer program product includes one or more computer instructions. When the computer-executable instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another via a wired link (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or a wireless link (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
The above description is only an embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for counting the number of objects in a region, the method comprising:
acquiring an image to be identified, wherein the image to be identified is an image obtained by shooting a target scene;
inputting an image to be recognized into a region recognition model to obtain characteristic information of a target scene; the feature information is used for indicating one or more target areas of the target scene;
dividing the image to be identified into a plurality of image blocks;
determining whether a first target object in the image to be identified is in the target area or not according to the plurality of image blocks;
determining a number of the target objects within the target area.
2. The method of claim 1, wherein the determining whether a first target object in the image to be recognized is within the target area according to the plurality of image blocks comprises:
determining a repetition proportion of the number of identical image blocks among the image blocks included in a first target object and the image blocks included in the target area in the image to be recognized to the number of image blocks included in the first target object;
when the repetition proportion is larger than or equal to a preset threshold value, determining that the first target object is in the target area; or,
and when the repetition proportion is smaller than the preset threshold value, determining that the first target object is not in the target area.
3. The method according to claim 2, wherein before determining the repetition proportion of the number of identical image blocks among the image blocks included in the first target object and the image blocks included in the target area in the image to be recognized to the number of image blocks included in the first target object, the method further comprises:
for any image block in the plurality of image blocks, when the central point of the any image block is within the identification range of the first target object, determining that the first target object in the image to be identified comprises the any image block; and/or,
and when the central point of any image block is in the identification range of the target area, determining that the target area in the image to be identified comprises any image block.
4. The method of any of claims 1 to 3, wherein the feature information further comprises one or more boundary lines of the target scene, the method further comprising:
acquiring track information of the first target object in a preset time period, wherein the preset time period comprises a first moment when the first target object enters the target area;
determining a connecting line between the track point at the first moment and the track point at the second moment in the track information of the preset time period, wherein the second moment is before the first moment, and the second moment belongs to the preset time period;
and if the connecting line and the one or more boundary lines have intersection points, determining that the first target object is not in the target area.
5. A region recognition model training method is characterized by comprising the following steps:
acquiring a training sample set, wherein the training sample set comprises one or more images marked with feature information of a target scene, and the feature information comprises one or more areas and one or more boundary lines of the target scene;
and training an initial model according to the training sample set to obtain a region identification model, wherein the region identification model is used for performing region identification on the image of the target scene.
6. The method of claim 5, wherein the obtaining a training sample set comprises:
acquiring a plurality of images of a target scene;
for a first image shot at a first shooting angle in the plurality of images, converting the first image into an image with a shooting angle of a top-down shot, wherein the first shooting angle is a horizontal shot or an upward shot;
acquiring characteristic information marked on the image with the shooting angle of the top-down shot;
converting the image marked with the characteristic information into the first image shot at the first shooting angle;
and determining the first image, which is shot at the first shooting angle and marked with the characteristic information, as an image in the training sample set.
7. The method of claim 5, further comprising:
when the area included in the target scene changes, acquiring one or more images marked with changed target scene characteristic information;
and retraining the region recognition model according to one or more images marked with the changed target scene characteristic information to obtain the retrained region recognition model.
8. An apparatus for counting the number of objects in a region, the apparatus comprising:
the system comprises a receiving and sending module, a processing module and a processing module, wherein the receiving and sending module is used for acquiring an image to be identified, and the image to be identified is an image obtained by shooting a target scene;
the recognition module is used for inputting the image to be recognized into the region recognition model to obtain the characteristic information of the target scene; the feature information is used for indicating one or more target areas of the target scene;
the processing module is used for dividing the image to be identified into a plurality of image blocks;
the processing module is further configured to determine whether a first target object in the image to be identified is in the target area according to the plurality of image blocks; and determining the number of the target objects within the target area;
the processing module is specifically configured to:
determining a repetition proportion of the number of identical image blocks among the image blocks included in a first target object and the image blocks included in the target area in the image to be recognized to the number of image blocks included in the first target object;
when the repetition proportion is larger than or equal to a preset threshold value, determining that the first target object is in the target area;
when the repetition proportion is smaller than the preset threshold value, determining that the first target object is not in the target area;
the processing module is specifically configured to:
for any image block in the plurality of image blocks, when the central point of the any image block is within the identification range of the first target object, determining that the first target object in the image to be identified comprises the any image block; and/or,
when the central point of any image block is in the identification range of the target area, determining that the target area in the image to be identified comprises any image block;
the characteristic information further includes one or more boundary lines of the target scene, and the transceiver module is further configured to acquire trajectory information of the first target object in a preset time period, where the preset time period includes a first time when the first target object enters the target area;
the processing module is further configured to determine a connection line between the track point at the first time and the track point at a second time in the track information of the preset time period, where the second time is before the first time, and the second time belongs to the preset time period; and determining that the first target object is not within the target area when the connecting line has an intersection with the one or more boundary lines.
9. An area recognition model training apparatus, comprising:
the system comprises a receiving and sending module, a processing module and a processing module, wherein the receiving and sending module is used for acquiring a training sample set, the training sample set comprises one or more images marked with characteristic information of a target scene, and the characteristic information comprises one or more areas and one or more boundary lines of the target scene;
the training module is used for training an initial model according to the training sample set to obtain a region identification model, and the region identification model is used for performing region identification on the image of the target scene;
the device further comprises a processing module, wherein the transceiver module is specifically used for acquiring a plurality of images of the target scene;
for a first image shot at a first shooting angle in the plurality of images, the processing module is configured to convert the first image into an image with a shooting angle of a top-down shot, and the first shooting angle is a horizontal shot or an upward shot;
the transceiver module is further specifically configured to acquire feature information marked on the image with the shooting angle of the top-down shot;
the processing module is further configured to convert the first image labeled with the feature information into the first image shot at the first shooting angle;
the processing module is further configured to determine the first image, which is shot at the first shooting angle and labeled with the feature information, as an image in a training sample set;
the transceiver module is further configured to acquire one or more images marked with changed target scene feature information when an area included in the target scene changes;
the training module is further configured to retrain the region identification model according to the one or more images labeled with the changed target scene feature information, so as to obtain a retrained region identification model.
10. A computer readable storage medium having stored thereon computer instructions which, when executed, cause a computer to perform the method of any of claims 1 to 4, or the method of any of claims 5 to 7.