CN111753574A - Throw area positioning method, device, equipment and storage medium - Google Patents
Throw area positioning method, device, equipment and storage medium
- Publication number
- CN111753574A CN111753574A CN201910231890.XA CN201910231890A CN111753574A CN 111753574 A CN111753574 A CN 111753574A CN 201910231890 A CN201910231890 A CN 201910231890A CN 111753574 A CN111753574 A CN 111753574A
- Authority
- CN
- China
- Prior art keywords
- channel value
- image
- determining
- images
- channel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 55
- 238000007499 fusion processing Methods 0.000 claims abstract description 21
- 230000004927 fusion Effects 0.000 claims description 24
- 238000012549 training Methods 0.000 claims description 19
- 238000004422 calculation algorithm Methods 0.000 claims description 18
- 238000004590 computer program Methods 0.000 claims description 13
- 238000001514 detection method Methods 0.000 claims description 9
- 238000000605 extraction Methods 0.000 claims description 8
- 238000000354 decomposition reaction Methods 0.000 claims description 5
- 238000012545 processing Methods 0.000 description 12
- 230000008569 process Effects 0.000 description 9
- 238000010586 diagram Methods 0.000 description 8
- 230000006870 function Effects 0.000 description 5
- 230000009471 action Effects 0.000 description 4
- 238000004891 communication Methods 0.000 description 4
- 238000013527 convolutional neural network Methods 0.000 description 4
- 238000012935 Averaging Methods 0.000 description 2
- 230000006399 behavior Effects 0.000 description 2
- 238000013136 deep learning model Methods 0.000 description 2
- 230000007547 defect Effects 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 238000012360 testing method Methods 0.000 description 2
- 238000013473 artificial intelligence Methods 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000002349 favourable effect Effects 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 238000012806 monitoring device Methods 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000003909 pattern recognition Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/56—Extraction of image or video features relating to colour
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Human Computer Interaction (AREA)
- Computing Systems (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The application discloses a throwing area positioning method, device, equipment and storage medium. The method comprises the following steps: decomposing the acquired video to be processed into a plurality of consecutive N-frame image sets, wherein N is a positive integer and N is greater than or equal to 3; performing fusion processing on every N frames of images to determine a fused image; and inputting the fused image into a throwing area positioning network model to determine the throwing area in the video to be processed. The technical scheme can accurately locate the throwing area in the video: fusing the multiple frames of images preserves the content information and temporal information of each frame to a great extent, reduces the amount of computation, and yields stable and accurate results.
Description
Technical Field
The invention relates to the field of image processing, in particular to a method, a device, equipment and a storage medium for positioning a throwing area.
Background
In recent years, feature recognition technology has been a research hot spot in the fields of artificial intelligence and pattern recognition, and has been applied more and more in people's daily life, for example in attendance systems and access control systems based on biometric recognition. In the logistics field, the sorting actions of workers can be recognized in real time to monitor whether a worker throws parcels, thereby achieving efficient management of the enterprise.
At present, feature recognition can be performed with a saliency detection method based on a deep learning model; however, a large amount of data is needed to train such a model, training convergence is difficult, and the false alarm rate for cases without a throwing area is high. In addition, the throwing area in a video can be recognized with a generative adversarial network based on the pix2pix model; this method can capture the motion areas in the video, but the localization of the throwing area is not clear, its output must be further processed manually, the processing is complex, and the positioning of the throwing area is inaccurate.
Disclosure of Invention
In view of the above-mentioned defects or shortcomings in the prior art, it is desirable to provide a method, an apparatus, a device and a storage medium for positioning a throwing area, which can solve the problems of inaccurate positioning and high false alarm rate of the throwing area in the prior art.
In a first aspect, the present invention provides a method of locating a throwing area, the method comprising:
decomposing the acquired video to be processed into a plurality of continuous N-frame image sets, wherein N is a positive integer and is more than or equal to 3;
performing fusion processing on every N frames of images to determine fused images;
and inputting the fused image into a throwing area positioning network model, and determining a throwing area in the video to be processed.
In one embodiment, the fusing every N frames of images to determine fused images includes:
determining a B channel value, a G channel value and an R channel value of each N frame of image;
and determining the fused image according to the B channel value, the G channel value and the R channel value.
In one embodiment, the determining the B channel value, the G channel value, and the R channel value of each N frame image includes:
determining a B channel value, a G channel value and an R channel value of each frame of image;
taking the B channel mean of N1 frames of images, the G channel mean of N2 frames of images and the R channel mean of N3 frames of images in every N frames of images as the B channel value, the G channel value and the R channel value of the every N frames of images; wherein N1 + N2 + N3 = N; N, N1, N2 and N3 are all positive integers, and N1 ≥ 1, N2 ≥ 1, N3 ≥ 1.
In one embodiment, the fused image is input into a throwing area positioning network model, and a throwing area in the video to be processed is determined, wherein the throwing area positioning network model is constructed by the following steps:
acquiring a historical video;
decomposing the historical video into a plurality of continuous N-frame image sets according to time sequence; wherein N is a positive integer, and N is more than or equal to 3;
performing fusion processing on every N frames of images to determine a fusion image;
and inputting the fusion map into a preset positioning network model for training, and determining the throwing area positioning network model.
In one embodiment, inputting the fusion map into a preset positioning network model for training, and determining a throw area positioning network model includes:
and training the preset positioning network model by adopting a feature extraction algorithm and a target detection algorithm to obtain a throwing area positioning network model.
In a second aspect, embodiments of the present application provide a throw area positioning device, the device including:
the decomposition module is used for decomposing the acquired video to be processed into a plurality of continuous N-frame image sets, wherein N is a positive integer and is more than or equal to 3;
the image determining module is used for performing fusion processing on every N frames of images and determining fused images;
and the throwing area determining module is used for inputting the fused image into a throwing area positioning network model and determining the throwing area in the video to be processed.
In one embodiment, the image determination module includes:
a first determining unit configured to determine a B channel value, a G channel value, and an R channel value of the every N frame images;
and the second determining unit is used for determining the fused image according to the B channel value, the G channel value and the R channel value.
In one embodiment, the first determining unit is specifically configured to:
determining a B channel value, a G channel value and an R channel value of each frame of image;
taking the B channel mean of N1 frames of images, the G channel mean of N2 frames of images and the R channel mean of N3 frames of images in every N frames of images as the B channel value, the G channel value and the R channel value of the every N frames of images; wherein N1 + N2 + N3 = N; N, N1, N2 and N3 are all positive integers, and N1 ≥ 1, N2 ≥ 1, N3 ≥ 1.
In one embodiment, the tossing area determining module comprises a tossing area locating network model constructed by the following steps:
acquiring a historical video;
decomposing the historical video into a plurality of continuous N-frame image sets according to time sequence; wherein N is a positive integer, and N is more than or equal to 3;
performing fusion processing on every N frames of images to determine a fusion image;
and inputting the fusion map into a preset positioning network model for training, and determining the throwing area positioning network model.
In one embodiment, the tossing area determining module is specifically configured to:
and training the preset positioning network model by adopting a feature extraction algorithm and a target detection algorithm to obtain a throwing area positioning network model.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the method for locating a tossing area as described above when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the method for locating a throwing area.
The method, device, equipment and storage medium for positioning the throwing area decompose the acquired video to be processed into a plurality of consecutive N-frame image sets according to time sequence, fuse every N frames of images to determine a fused image, and finally input the fused image into a throwing area positioning network model to determine the throwing area in the video to be processed. Because the multiple frames of images in the video are fused, identification of the throwing area in the video is converted into identification of the throwing area in an image, and the content information and temporal information of each frame of image are preserved to a great extent.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
fig. 1 is a schematic flowchart of a method for positioning a throwing area according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a method for determining a fused image according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a structure of a fused image according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a process for constructing a throw area positioning network model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a located throwing area provided by an embodiment of the present invention;
FIG. 6 is a schematic structural view of a throw area positioning device according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
As mentioned in the background art, in the logistics industry workers sometimes throw express parcels in pursuit of timeliness. Such behavior not only damages packages but also increases complaints, so detecting the throwing area in a video is particularly important for efficient management of an enterprise. In the prior art, the throwing area in a video can be located by a saliency detection method based on deep learning, which generates a saliency result for each frame of image. However, training such a deep learning model requires a large amount of data, convergence is difficult, a result is produced regardless of whether a throwing area exists in the video, and the false alarm rate for cases without a throwing area is high. In addition, the throwing area in a video can be identified with a generative adversarial network based on the pix2pix model, which captures a picture of the motion areas in the video, i.e., moving regions are marked white and other background regions are marked black; for example, people moving in the video and objects in motion such as a rotating fan are marked white. However, the throwing area cannot be accurately identified, the pictures marked white must be further processed manually, the processing is complex, and the localization of the throwing area in the video is inaccurate.
Based on the above defects, embodiments of the present invention provide a method for positioning a throwing area, which decomposes the acquired video to be processed into a plurality of consecutive N-frame image sets according to time sequence, fuses every N frames of images to obtain a fused image, and inputs the fused image into a throwing area positioning network model, thereby determining the throwing area in the video to be processed. This reduces the computation workload and accurately locates the throwing area in the video.
The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
It should be noted that the execution subject of the method embodiments described below may be a throw area positioning apparatus, which may be implemented by software, hardware, or a combination of software and hardware as part or all of the terminal device. The execution subjects of the method embodiments described below are described taking a computer device as an example.
Fig. 1 is a schematic flow chart of a method for positioning a throwing area according to an embodiment of the present invention, as shown in fig. 1, the method includes:
step S101, decomposing the acquired video to be processed into a plurality of continuous N-frame image sets, wherein N is a positive integer and is more than or equal to 3;
specifically, the video to be processed may be a video in a certain time period that needs to be processed, and may be directly acquired by a camera, downloaded by a cloud, or imported by another device, where the device may be a monitoring device.
Further, after the video to be processed is obtained, it may be decomposed into consecutive single-frame images according to time sequence with video processing software, and a plurality of consecutive N-frame image sets are then obtained with N frames as the step length, where N is a positive integer and N is greater than or equal to 3. Optionally, N may be any such positive integer, for example 3, 4, 5 or 6, which is not limited in this embodiment.
For example, assuming that the video to be processed is 100 frames and N is 10, i.e. ten frames are taken as the step size, the video can be decomposed according to time sequence into 10 image sets: the 1st to 10th frames, the 11th to 20th frames, ..., the 81st to 90th frames, and the 91st to 100th frames.
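By way of illustration only (this sketch is not part of the original disclosure), the decomposition step could be written with OpenCV roughly as follows; the function name decompose_video and the choice to drop trailing frames that do not fill a complete set are assumptions.

```python
import cv2

def decompose_video(video_path: str, n: int = 10):
    """Split a video into consecutive sets of n frames (assumed helper, n >= 3)."""
    cap = cv2.VideoCapture(video_path)
    frame_sets, current = [], []
    while True:
        ok, frame = cap.read()          # frame is a BGR image of shape (H, W, 3)
        if not ok:
            break
        current.append(frame)
        if len(current) == n:           # one full N-frame set is complete
            frame_sets.append(current)
            current = []
    cap.release()
    return frame_sets                   # trailing frames that do not fill a set are dropped
```

Each returned set can then be fused as described in the next step.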
Optionally, the image set may have an image of a throwing area, or may not have an image of a throwing area, which is not limited in this embodiment.
And S102, performing fusion processing on every N frames of images, and determining fused images.
Specifically, image fusion refers to applying image processing, computer technology and the like to image data acquired from multiple source channels, so that the useful information in each channel is extracted to the greatest extent and a high-quality image is finally synthesized, thereby improving the utilization of image information. The data form of image fusion is an image containing scene features such as brightness, color, temperature and distance, and such images may be presented individually or as a sequence.
Generally, image fusion can be divided into pixel-level fusion, feature-level fusion and decision-level fusion. Pixel-level fusion includes spatial-domain and transform-domain algorithms, such as the logical filtering method, the gray-scale weighted average method, the contrast modulation method and the wavelet transform method, and a fusion algorithm often combines the mean, entropy, standard deviation and average gradient of an image.
Each pixel of an image is composed of R, G and B components. After the plurality of consecutive N-frame image sets are obtained, the fused image can be determined by performing fusion processing on every N frames of images with a fusion algorithm.
And S103, inputting the fused image into a throwing area positioning network model, and determining a throwing area in the video to be processed.
Specifically, after each N frames of images are fused to obtain fused images, the fused images are input into a throwing area positioning network model, so that a throwing area in the video to be processed is determined, wherein the throwing area positioning network model has the capability of positioning the throwing area.
The method for positioning the throwing area provided by this embodiment decomposes the acquired video to be processed into a plurality of consecutive N-frame image sets according to time sequence, fuses every N frames of images to determine a fused image, and inputs the fused image into a throwing area positioning network model to determine the throwing area in the video to be processed. Because the multiple frames of images in the video are fused, identification of the throwing area in the video is converted into identification of the throwing area in an image, and the content information and temporal information of each frame of image are preserved to a great extent.
Fig. 2 is a schematic flowchart of a method for determining a fused image according to an embodiment of the present invention, where on the basis of the foregoing embodiment, as shown in fig. 2, the method includes:
step S201, determining a B channel value, a G channel value, and an R channel value of each N frame image.
Specifically, after the acquired video to be processed is decomposed into a plurality of consecutive N-frame image sets according to time sequence, each N-frame image set is processed. The B channel value, G channel value and R channel value of each frame of image can be read with related software, and the B channel mean of N1 frames of images, the G channel mean of N2 frames of images and the R channel mean of N3 frames of images in every N frames of images are calculated; the B channel mean of the N1 frames, the G channel mean of the N2 frames and the R channel mean of the N3 frames are then taken as the B channel value, G channel value and R channel value of the every N frames of images, where N, N1, N2 and N3 are all positive integers, and N1 ≥ 1, N2 ≥ 1, N3 ≥ 1.
Illustratively, when N is 10, i.e. ten frames are taken as the step size, N1 is 3, N2 is 4 and N3 is 3. The acquired video to be processed is decomposed into a plurality of consecutive ten-frame image sets according to time sequence, each ten-frame image set is processed, and the B channel value, G channel value and R channel value of each frame of image are read with related software. The B channel mean of the first three frames, the G channel mean of the middle four frames and the R channel mean of the last three frames in every ten frames of images are calculated, and the B channel mean Bmean of the first three frames, the G channel mean Gmean of the middle four frames and the R channel mean Rmean of the last three frames are taken as the B channel value, G channel value and R channel value of the every ten frames of images.
It should be noted that, the B channel mean value of the first three frames of images can be obtained by summing and averaging the B channel values of the first three frames of images, and similarly, the G channel mean value of the middle four frames of images and the R channel mean value of the last three frames of images can also be obtained by summing and averaging.
Optionally, when N is 6, i.e. six frames are taken as the step size, N1 is 2, N2 is 2 and N3 is 2. The acquired video to be processed is decomposed into a plurality of consecutive six-frame image sets according to time sequence, each six-frame image set is processed, and the B channel value, G channel value and R channel value of each frame of image are read with related software. The B channel mean of the first two frames, the G channel mean of the middle two frames and the R channel mean of the last two frames in every six frames of images are calculated, and the B channel mean Bmean of the first two frames, the G channel mean Gmean of the middle two frames and the R channel mean Rmean of the last two frames are taken as the B channel value, G channel value and R channel value of the every six frames of images, thereby determining the fused image.
And S202, determining a fused image according to the B channel value, the G channel value and the R channel value.
Specifically, after a B channel value, a G channel value, and an R channel value of each N frame image are obtained, a fused image is obtained.
For example, assuming that a video to be processed is 100 frames, the video is decomposed according to time sequence into 10 image sets (the 1st to 10th frames, the 11th to 20th frames, ..., and the 91st to 100th frames). The images of the 1st to 10th frames are then fused according to the above method to determine a fused image of the first ten frames, the fused image is input into the throwing area positioning network, and whether a throwing area exists in the first ten frames is determined; whether a throwing area exists in the 11th to 20th frames, ..., and the 91st to 100th frames is determined in the same way in turn. The process of fusing every ten frames of images can be as shown in fig. 3.
The method for determining the fused image provided by this embodiment determines the B channel value, G channel value and R channel value of each frame of image, and takes the B channel mean of N1 frames of images, the G channel mean of N2 frames of images and the R channel mean of N3 frames of images in every N frames of images as the B channel value, G channel value and R channel value of the every N frames of images, thereby determining the fused image. In this way the video can be converted into images for processing, the content information and temporal information of the image sequence are preserved to a great extent, and complex computation is avoided.
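The following NumPy sketch is offered purely as an illustration of the fusion rule described above and is not part of the original disclosure; the function name fuse_frames, the default split N1 = 3, N2 = 4, N3 = 3 and the BGR channel layout (as returned by OpenCV) are assumptions.

```python
import numpy as np

def fuse_frames(frames, n1=3, n2=4, n3=3):
    """Fuse N = n1 + n2 + n3 consecutive BGR frames into a single image:
    B channel = mean B of the first n1 frames,
    G channel = mean G of the middle n2 frames,
    R channel = mean R of the last n3 frames."""
    assert len(frames) == n1 + n2 + n3
    stack = np.stack(frames).astype(np.float32)        # shape (N, H, W, 3), BGR order
    b_mean = stack[:n1, :, :, 0].mean(axis=0)          # B from the first n1 frames
    g_mean = stack[n1:n1 + n2, :, :, 1].mean(axis=0)   # G from the middle n2 frames
    r_mean = stack[n1 + n2:, :, :, 2].mean(axis=0)     # R from the last n3 frames
    fused = np.stack([b_mean, g_mean, r_mean], axis=-1)
    return fused.astype(np.uint8)
```

A fused image produced in this way can then be fed to the throwing area positioning network model described below.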
Optionally, as shown in fig. 4, after the fused image is determined, the fused image may be input into a toss area positioning network model, and the toss area positioning network model may be constructed through the following steps:
and step S301, acquiring a history video.
Specifically, the historical video refers to existing video stream data. Optionally, the historical video may be downloaded from a cloud or imported from another device, which is not limited in this embodiment.
For example, the process of acquiring the historical video by the computer may be: the computer equipment receives a processing instruction input by a user and acquires historical video data according to the processing instruction. The historical video data may include videos with a throwing area as well as videos without a throwing area.
Step S302, decomposing the historical video into a plurality of continuous N-frame image sets according to time sequence; wherein N is a positive integer, and N is more than or equal to 3.
And step S303, performing fusion processing on every N frames of images to determine a fusion image.
It should be noted that after the historical video is acquired, it may be decomposed into consecutive single-frame images according to time sequence, a plurality of consecutive N-frame image sets are then obtained with N frames as the step length, and each N-frame image set is fused to determine a plurality of fusion images.
And step S304, inputting the fusion map into a preset positioning network model for training, and determining the throwing area positioning network model.
Specifically, after a plurality of fusion maps are obtained, a feature extraction algorithm and a target detection algorithm may be adopted to train the preset positioning network model. Optionally, the target detection algorithm may be the Faster R-CNN algorithm, the SSD algorithm or the YOLO algorithm. Faster R-CNN mainly proceeds as follows: feature extraction is performed first, taking the whole picture as input and obtaining a feature map of the picture with a convolutional neural network (CNN); region proposal is then performed on the final convolutional feature map using k different rectangular anchor boxes, the region corresponding to each box is classified into two classes (object or background), and the position and size of each candidate box are fine-tuned with k regression models; finally, target classification is performed.
In addition, the feature extraction network adopts the deep residual network ResNet-101 model, which has excellent visual representation and generalization capability and is widely applied in fields such as image object detection, image semantic segmentation and video object tracking.
For example, 4900 video samples can be acquired in real complex scenes, of which 2600 videos contain throwing behavior and 2300 do not. The video samples can be divided into a training set and a test set at a ratio of 8:2; the training set is used to train the preset positioning network model to obtain a trained positioning network model, the test set is then input into the trained positioning network model, the performance of the trained model is judged by determining model parameter indexes, and the throwing area positioning network model is finally determined.
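Purely as an assumed sketch rather than the implementation disclosed here, a training loop in this spirit could be built on torchvision's Faster R-CNN; note that the off-the-shelf torchvision model uses a ResNet-50 FPN backbone rather than the ResNet-101 backbone mentioned above, and the two-class setup (background and throwing area) and the data-loader contract are assumptions.

```python
import torch
import torchvision

# Two classes are assumed: 0 = background, 1 = throwing area.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None, num_classes=2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=5e-4)

def train_one_epoch(model, loader, optimizer, device="cuda"):
    """loader is assumed to yield (fused_image_tensor, target) pairs, where each target
    is a dict with 'boxes' (M, 4) and 'labels' (M,) as expected by torchvision detectors."""
    model.to(device).train()
    for images, targets in loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)   # classification + box-regression losses
        loss = sum(loss_dict.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Using a standard anchor-based detector keeps the output in the form of rectangular boxes, which matches the output format described below.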
It should be noted that, after the throwing area positioning network model is determined, a fused image input into the model is output with a rectangular box locating the throwing area; if there is no throwing area in a video, no box is output. For example, when a throwing area exists in a video, the output rectangular box locating the throwing area can be as shown in fig. 5.
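Again as an assumed illustration rather than the disclosed implementation, inference on a single fused image and drawing of the predicted rectangular box might look as follows; the 0.5 score threshold is an arbitrary choice.

```python
import cv2
import torch

@torch.no_grad()
def locate_throwing_area(model, fused_bgr, score_thresh=0.5, device="cuda"):
    """Run the detector on one fused BGR image and draw boxes above the threshold."""
    model.to(device).eval()
    # torchvision detection models expect RGB float tensors in [0, 1] of shape (3, H, W)
    rgb = cv2.cvtColor(fused_bgr, cv2.COLOR_BGR2RGB)
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float().div(255).to(device)
    output = model([tensor])[0]              # dict with 'boxes', 'labels', 'scores'
    for box, score in zip(output["boxes"], output["scores"]):
        if score >= score_thresh:
            x1, y1, x2, y2 = map(int, box.tolist())
            cv2.rectangle(fused_bgr, (x1, y1), (x2, y2), (0, 0, 255), 2)
    return fused_bgr
```

If no box exceeds the threshold, the image is returned unchanged, corresponding to the case in which no throwing area is detected.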
The method for constructing the tossing area locating network model provided by this embodiment determines the tossing area locating network model by acquiring the historical video, decomposing the historical video into a plurality of continuous N-frame image sets according to a time sequence, fusing every N-frame image set to determine a fusion map, and finally inputting the fusion map into the preset locating network model for training.
It should be noted that while the operations of the method of the present invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the steps depicted in the flowcharts may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
Fig. 6 is a schematic structural view of a throwing area positioning device according to an embodiment of the present invention. As shown in fig. 6, the apparatus may implement the methods shown in fig. 1 to 5, and may include:
the decomposition module 10 is configured to decompose the acquired video to be processed into a plurality of consecutive N-frame image sets, where N is a positive integer and is greater than or equal to 3;
the image determining module 20 is configured to perform fusion processing on every N frames of images to determine a fused image;
and a throwing area determining module 30, configured to input the fused image into a throwing area positioning network model, and determine a throwing area in the video to be processed.
Preferably, the image determining module 20 includes:
a first determining unit 201, configured to determine a B channel value, a G channel value, and an R channel value of the every N frame images;
a second determining unit 202, configured to determine the fused image according to the B channel value, the G channel value, and the R channel value.
Preferably, the first determining unit 201 is specifically configured to:
determining a B channel value, a G channel value and an R channel value of each frame of image;
n in every N frame images1B channel mean, N of frame image2G-channel mean and N of frame image3Taking the R channel mean value of the frame image as a B channel value, a G channel value and an R channel value of each N frame image; wherein N is1+N2+N3=N,N、N1、N2、N3Are all positive integers, N1≥1,N2≥1,N3≥1。
Preferably, the tossing area determining module 30 comprises a tossing area locating network model constructed by the following steps:
acquiring a historical video;
decomposing the historical video into a plurality of continuous N-frame image sets according to time sequence; wherein N is a positive integer, and N is more than or equal to 3;
performing fusion processing on every N frames of images to determine a fusion image;
and inputting the fusion map into a preset positioning network model for training, and determining the throwing area positioning network model.
Preferably, the tossing area determining module 30 is specifically configured to:
and training the preset positioning network model by adopting a feature extraction algorithm and a target detection algorithm to obtain a throwing area positioning network model.
The tossing area positioning device provided by the embodiment can execute the embodiment of the method, and the implementation principle and the technical effect are similar, and are not described again here.
Fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present invention. As shown in fig. 7, a schematic structural diagram of a computer system 700 suitable for implementing the terminal device or the server of the embodiment of the present application is shown.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The RAM 703 also stores various programs and data necessary for the operation of the system 700. The CPU 701, the ROM 702 and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory is mounted on the drive 710 as necessary, so that a computer program read therefrom is installed into the storage section 708 as needed.
In particular, the processes described above with reference to fig. 1-5 may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the methods of fig. 1-5. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present application may be implemented by software or hardware. The described units or modules may also be provided in a processor, and may be described as: a processor includes a decomposition module, an image determination module, and a throw area determination module. Where the names of these units or modules do not in some cases constitute a limitation on the units or modules themselves, for example, a decomposition module may also be described as "for decomposing an acquired video to be processed into a plurality of successive sets of ten frames".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the throw area positioning method as described in the above embodiments.
For example, the electronic device may implement the following as shown in fig. 1: step S101, decomposing the acquired video to be processed into a plurality of continuous N-frame image sets, wherein N is a positive integer and is more than or equal to 3; step S102, carrying out fusion processing on every N frames of images, and determining fused images; and step S103, inputting the fused image into a throwing area positioning network model, and determining a throwing area in the video to be processed. As another example, the electronic device may implement the various steps shown in fig. 2 and 3.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Moreover, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc. Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware.
Claims (10)
1. A method of locating a throwing area, comprising:
decomposing the acquired video to be processed into a plurality of continuous N-frame image sets, wherein N is a positive integer and is more than or equal to 3;
performing fusion processing on every N frames of images to determine fused images;
and inputting the fused image into a throwing area positioning network model, and determining a throwing area in the video to be processed.
2. The tossing area positioning method according to claim 1, wherein performing fusion processing on every N frames of images, and determining fused images, includes:
determining a B channel value, a G channel value and an R channel value of each N frame of image;
and determining the fused image according to the B channel value, the G channel value and the R channel value.
3. The tossing area locating method according to claim 2, wherein the determining of the B-channel value, the G-channel value, and the R-channel value every N-frame image includes:
determining a B channel value, a G channel value and an R channel value of each frame of image;
taking the B channel mean of N1 frames of images, the G channel mean of N2 frames of images and the R channel mean of N3 frames of images in every N frames of images as the B channel value, the G channel value and the R channel value of the every N frames of images; wherein N1 + N2 + N3 = N; N, N1, N2 and N3 are all positive integers, and N1 ≥ 1, N2 ≥ 1, N3 ≥ 1.
4. The tossing area positioning method according to claim 1, wherein the fused image is input to a tossing area positioning network model, and the tossing area in the video to be processed is determined, including a tossing area positioning network model constructed by:
acquiring a historical video;
decomposing the historical video into a plurality of continuous N-frame image sets according to time sequence; wherein N is a positive integer, and N is more than or equal to 3;
performing fusion processing on every N frames of images to determine a fusion image;
and inputting the fusion map into a preset positioning network model for training, and determining the throwing area positioning network model.
5. The method of claim 4, wherein inputting the fused map into a predetermined positioning network model for training to determine the tossing area positioning network model comprises:
and training the preset positioning network model by adopting a feature extraction algorithm and a target detection algorithm to obtain a throwing area positioning network model.
6. A throwing area locating device, the device comprising:
the decomposition module is used for decomposing the acquired video to be processed into a plurality of continuous N-frame image sets, wherein N is a positive integer and is more than or equal to 3;
the image determining module is used for performing fusion processing on every N frames of images and determining fused images;
and the throwing area determining module is used for inputting the fused image into a throwing area positioning network model and determining the throwing area in the video to be processed.
7. The tossing area positioning device of claim 6, wherein the image determination module comprises:
a first determining unit configured to determine a B channel value, a G channel value, and an R channel value of the every N frame images;
and the second determining unit is used for determining the fused image according to the B channel value, the G channel value and the R channel value.
8. The tossing area positioning device according to claim 7, characterized in that said first determination unit is particularly adapted to:
determining a B channel value, a G channel value and an R channel value of each frame of image;
taking the B channel mean of N1 frames of images, the G channel mean of N2 frames of images and the R channel mean of N3 frames of images in every N frames of images as the B channel value, the G channel value and the R channel value of the every N frames of images; wherein N1 + N2 + N3 = N; N, N1, N2 and N3 are all positive integers, and N1 ≥ 1, N2 ≥ 1, N3 ≥ 1.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-5 when executing the program.
10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910231890.XA CN111753574A (en) | 2019-03-26 | 2019-03-26 | Throw area positioning method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910231890.XA CN111753574A (en) | 2019-03-26 | 2019-03-26 | Throw area positioning method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111753574A true CN111753574A (en) | 2020-10-09 |
Family
ID=72672201
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910231890.XA Pending CN111753574A (en) | 2019-03-26 | 2019-03-26 | Throw area positioning method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111753574A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115334334A (en) * | 2022-07-13 | 2022-11-11 | 北京优酷科技有限公司 | Video frame insertion method and device |
Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080080787A1 (en) * | 2006-09-28 | 2008-04-03 | Microsoft Corporation | Salience Preserving Image Fusion |
CN101339653A (en) * | 2008-01-30 | 2009-01-07 | 西安电子科技大学 | Infrared and colorful visual light image fusion method based on color transfer and entropy information |
CN104504730A (en) * | 2014-12-19 | 2015-04-08 | 长安大学 | Method for distinguishing parked vehicle from falling object |
CN105812942A (en) * | 2016-03-31 | 2016-07-27 | 北京奇艺世纪科技有限公司 | Data interaction method and device |
CN105913456A (en) * | 2016-04-12 | 2016-08-31 | 西安电子科技大学 | Video significance detecting method based on area segmentation |
CN106570478A (en) * | 2016-11-04 | 2017-04-19 | 北京智能管家科技有限公司 | Object loss determine method and device in visual tracking |
WO2018019126A1 (en) * | 2016-07-29 | 2018-02-01 | 北京市商汤科技开发有限公司 | Video category identification method and device, data processing device and electronic apparatus |
CN108197623A (en) * | 2018-01-19 | 2018-06-22 | 百度在线网络技术(北京)有限公司 | For detecting the method and apparatus of target |
CN108241854A (en) * | 2018-01-02 | 2018-07-03 | 天津大学 | A kind of deep video conspicuousness detection method based on movement and recall info |
CN108288259A (en) * | 2018-01-06 | 2018-07-17 | 昆明物理研究所 | A kind of gray scale fusion Enhancement Method based on color space conversion |
CN108319905A (en) * | 2018-01-25 | 2018-07-24 | 南京邮电大学 | A kind of Activity recognition method based on long time-histories depth time-space network |
CN108875565A (en) * | 2018-05-02 | 2018-11-23 | 淘然视界(杭州)科技有限公司 | The recognition methods of railway column, storage medium, electronic equipment, system |
CN108960207A (en) * | 2018-08-08 | 2018-12-07 | 广东工业大学 | A kind of method of image recognition, system and associated component |
CN109272509A (en) * | 2018-09-06 | 2019-01-25 | 郑州云海信息技术有限公司 | A kind of object detection method of consecutive image, device, equipment and storage medium |
CN109309811A (en) * | 2018-08-31 | 2019-02-05 | 中建三局智能技术有限公司 | A kind of throwing object in high sky detection system based on computer vision and method |
CN109359592A (en) * | 2018-10-16 | 2019-02-19 | 北京达佳互联信息技术有限公司 | Processing method, device, electronic equipment and the storage medium of video frame |
CN109472248A (en) * | 2018-11-22 | 2019-03-15 | 广东工业大学 | A kind of pedestrian recognition methods, system and electronic equipment and storage medium again |
- 2019-03-26: Application CN201910231890.XA filed in China; published as CN111753574A (status: Pending)
Patent Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080080787A1 (en) * | 2006-09-28 | 2008-04-03 | Microsoft Corporation | Salience Preserving Image Fusion |
CN101339653A (en) * | 2008-01-30 | 2009-01-07 | 西安电子科技大学 | Infrared and colorful visual light image fusion method based on color transfer and entropy information |
CN104504730A (en) * | 2014-12-19 | 2015-04-08 | 长安大学 | Method for distinguishing parked vehicle from falling object |
CN105812942A (en) * | 2016-03-31 | 2016-07-27 | 北京奇艺世纪科技有限公司 | Data interaction method and device |
CN105913456A (en) * | 2016-04-12 | 2016-08-31 | 西安电子科技大学 | Video significance detecting method based on area segmentation |
WO2018019126A1 (en) * | 2016-07-29 | 2018-02-01 | 北京市商汤科技开发有限公司 | Video category identification method and device, data processing device and electronic apparatus |
CN106570478A (en) * | 2016-11-04 | 2017-04-19 | 北京智能管家科技有限公司 | Object loss determine method and device in visual tracking |
CN108241854A (en) * | 2018-01-02 | 2018-07-03 | 天津大学 | A kind of deep video conspicuousness detection method based on movement and recall info |
CN108288259A (en) * | 2018-01-06 | 2018-07-17 | 昆明物理研究所 | A kind of gray scale fusion Enhancement Method based on color space conversion |
CN108197623A (en) * | 2018-01-19 | 2018-06-22 | 百度在线网络技术(北京)有限公司 | For detecting the method and apparatus of target |
CN108319905A (en) * | 2018-01-25 | 2018-07-24 | 南京邮电大学 | A kind of Activity recognition method based on long time-histories depth time-space network |
CN108875565A (en) * | 2018-05-02 | 2018-11-23 | 淘然视界(杭州)科技有限公司 | The recognition methods of railway column, storage medium, electronic equipment, system |
CN108960207A (en) * | 2018-08-08 | 2018-12-07 | 广东工业大学 | A kind of method of image recognition, system and associated component |
CN109309811A (en) * | 2018-08-31 | 2019-02-05 | 中建三局智能技术有限公司 | A kind of throwing object in high sky detection system based on computer vision and method |
CN109272509A (en) * | 2018-09-06 | 2019-01-25 | 郑州云海信息技术有限公司 | A kind of object detection method of consecutive image, device, equipment and storage medium |
CN109359592A (en) * | 2018-10-16 | 2019-02-19 | 北京达佳互联信息技术有限公司 | Processing method, device, electronic equipment and the storage medium of video frame |
CN109472248A (en) * | 2018-11-22 | 2019-03-15 | 广东工业大学 | A kind of pedestrian recognition methods, system and electronic equipment and storage medium again |
Non-Patent Citations (1)
Title |
---|
陈海永 (CHEN Haiyong) et al.: "Multispectral image fusion for surface defects of solar cells against complex backgrounds", Proceedings of the 2018 Chinese Automation Congress (CAC 2018), 31 December 2018 (2018-12-31), pages 716-721 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115334334A (en) * | 2022-07-13 | 2022-11-11 | 北京优酷科技有限公司 | Video frame insertion method and device |
CN115334334B (en) * | 2022-07-13 | 2024-01-09 | 北京优酷科技有限公司 | Video frame inserting method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110163080B (en) | Face key point detection method and device, storage medium and electronic equipment | |
CN108229321B (en) | Face recognition model, and training method, device, apparatus, program, and medium therefor | |
CN110348522B (en) | Image detection and identification method and system, electronic equipment, and image classification network optimization method and system | |
CN113313053B (en) | Image processing method, device, apparatus, medium, and program product | |
CN113691733B (en) | Video jitter detection method and device, electronic equipment and storage medium | |
CN111723728A (en) | Pedestrian searching method, system and device based on bidirectional interactive network | |
CN111144337B (en) | Fire detection method and device and terminal equipment | |
CN112364843A (en) | Plug-in aerial image target positioning detection method, system and equipment | |
CN114092759A (en) | Training method and device of image recognition model, electronic equipment and storage medium | |
CN111199238A (en) | Behavior identification method and equipment based on double-current convolutional neural network | |
US20230049656A1 (en) | Method of processing image, electronic device, and medium | |
CN115239508A (en) | Scene planning adjustment method, device, equipment and medium based on artificial intelligence | |
CN111985269A (en) | Detection model construction method, detection device, server and medium | |
CN113705380A (en) | Target detection method and device in foggy days, electronic equipment and storage medium | |
CN111753574A (en) | Throw area positioning method, device, equipment and storage medium | |
US20230048649A1 (en) | Method of processing image, electronic device, and medium | |
CN116959099A (en) | Abnormal behavior identification method based on space-time diagram convolutional neural network | |
CN114120423A (en) | Face image detection method and device, electronic equipment and computer readable medium | |
CN114844889B (en) | Video processing model updating method and device, electronic equipment and storage medium | |
CN111191593A (en) | Image target detection method and device, storage medium and sewage pipeline detection device | |
CN111382603B (en) | Track calculation device and method | |
CN112101279B (en) | Target object abnormality detection method, target object abnormality detection device, electronic equipment and storage medium | |
CN111062337B (en) | People stream direction detection method and device, storage medium and electronic equipment | |
CN116132645B (en) | Image processing method, device, equipment and medium based on deep learning | |
CN110956057A (en) | Crowd situation analysis method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |