CN111753574A - Throw area positioning method, device, equipment and storage medium - Google Patents
Throw area positioning method, device, equipment and storage medium
- Publication number
- CN111753574A CN111753574A CN201910231890.XA CN201910231890A CN111753574A CN 111753574 A CN111753574 A CN 111753574A CN 201910231890 A CN201910231890 A CN 201910231890A CN 111753574 A CN111753574 A CN 111753574A
- Authority
- CN
- China
- Prior art keywords
- channel value
- image
- determining
- images
- channel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 55
- 238000007499 fusion processing Methods 0.000 claims abstract description 21
- 230000004927 fusion Effects 0.000 claims description 24
- 238000012549 training Methods 0.000 claims description 19
- 238000004422 calculation algorithm Methods 0.000 claims description 18
- 238000004590 computer program Methods 0.000 claims description 13
- 238000001514 detection method Methods 0.000 claims description 9
- 238000000605 extraction Methods 0.000 claims description 8
- 238000000354 decomposition reaction Methods 0.000 claims description 5
- 238000012545 processing Methods 0.000 description 12
- 230000008569 process Effects 0.000 description 9
- 238000010586 diagram Methods 0.000 description 8
- 230000006870 function Effects 0.000 description 5
- 230000009471 action Effects 0.000 description 4
- 238000004891 communication Methods 0.000 description 4
- 238000013527 convolutional neural network Methods 0.000 description 4
- 238000012935 Averaging Methods 0.000 description 2
- 230000006399 behavior Effects 0.000 description 2
- 238000013136 deep learning model Methods 0.000 description 2
- 230000007547 defect Effects 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 238000012360 testing method Methods 0.000 description 2
- 238000013473 artificial intelligence Methods 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000002349 favourable effect Effects 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 238000012806 monitoring device Methods 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000003909 pattern recognition Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/56—Extraction of image or video features relating to colour
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Human Computer Interaction (AREA)
- Computing Systems (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The application discloses a throwing area positioning method, device, equipment and storage medium. The method comprises the following steps: decomposing the acquired video to be processed into a plurality of consecutive N-frame image sets, wherein N is a positive integer and N is greater than or equal to 3; performing fusion processing on every N frames of images to determine a fused image; and inputting the fused image into a throwing area positioning network model to determine the throwing area in the video to be processed. The technical scheme can accurately locate the throwing area in the video: fusing the multiple frames of images preserves the content information and temporal information of each frame to a great extent, reduces the amount of computation, and yields stable and accurate results.
Description
Technical Field
The invention relates to the field of image processing, in particular to a method, a device, equipment and a storage medium for positioning a throwing area.
Background
In recent years, feature recognition technology has been a research hot spot in the fields of artificial intelligence and pattern recognition, and has been applied more and more in people's daily life, for example in attendance systems and access control systems based on biometric recognition. In the logistics field, the sorting actions of workers can be recognized in real time to monitor whether a worker throws parcels, thereby achieving efficient management of the enterprise.
At present, feature recognition can be performed with a saliency detection method based on a deep learning model; however, a large amount of data is needed to train such a model, training convergence is difficult, and the false alarm rate for cases without a throwing area is high. In addition, the throwing area in a video can be recognized with a generative adversarial network based on the pix2pix model; this method can capture the motion areas in the video, but the localization of the throwing area is not clear, its output must be further processed manually, the processing is complex, and the positioning of the throwing area is inaccurate.
Disclosure of Invention
In view of the above-mentioned defects or shortcomings in the prior art, it is desirable to provide a method, an apparatus, a device and a storage medium for positioning a throwing area, which can solve the problems of inaccurate positioning and high false alarm rate of the throwing area in the prior art.
In a first aspect, the present invention provides a method of locating a throwing area, the method comprising:
decomposing the acquired video to be processed into a plurality of continuous N-frame image sets, wherein N is a positive integer and is more than or equal to 3;
performing fusion processing on every N frames of images to determine fused images;
and inputting the fused image into a throwing area positioning network model, and determining a throwing area in the video to be processed.
In one embodiment, the fusing every N frames of images to determine fused images includes:
determining a B channel value, a G channel value and an R channel value of each N frame of image;
and determining the fused image according to the B channel value, the G channel value and the R channel value.
In one embodiment, the determining the B channel value, the G channel value, and the R channel value of each N frame image includes:
determining a B channel value, a G channel value and an R channel value of each frame of image;
taking the B channel mean of N1 frames of images, the G channel mean of N2 frames of images and the R channel mean of N3 frames of images in every N frames of images as the B channel value, the G channel value and the R channel value of the every N frames of images; wherein N1 + N2 + N3 = N; N, N1, N2 and N3 are all positive integers, and N1 ≥ 1, N2 ≥ 1, N3 ≥ 1.
In one embodiment, the fused image is input into a throwing area positioning network model, and a throwing area in the video to be processed is determined, wherein the throwing area positioning network model is constructed by the following steps:
acquiring a historical video;
decomposing the historical video into a plurality of continuous N-frame image sets according to time sequence; wherein N is a positive integer, and N is more than or equal to 3;
performing fusion processing on every N frames of images to determine a fusion image;
and inputting the fusion map into a preset positioning network model for training, and determining the throwing area positioning network model.
In one embodiment, inputting the fusion map into a preset positioning network model for training, and determining a throw area positioning network model includes:
and training the preset positioning network model by adopting a feature extraction algorithm and a target detection algorithm to obtain a throwing area positioning network model.
In a second aspect, embodiments of the present application provide a throw area positioning device, the device including:
the decomposition module is used for decomposing the acquired video to be processed into a plurality of continuous N-frame image sets, wherein N is a positive integer and is more than or equal to 3;
the image determining module is used for performing fusion processing on every N frames of images and determining fused images;
and the throwing area determining module is used for inputting the fused image into a throwing area positioning network model and determining the throwing area in the video to be processed.
In one embodiment, the image determination module includes:
a first determining unit configured to determine a B channel value, a G channel value, and an R channel value of the every N frame images;
and the second determining unit is used for determining the fused image according to the B channel value, the G channel value and the R channel value.
In one embodiment, the first determining unit is specifically configured to:
determining a B channel value, a G channel value and an R channel value of each frame of image;
taking the B channel mean of N1 frames of images, the G channel mean of N2 frames of images and the R channel mean of N3 frames of images in every N frames of images as the B channel value, the G channel value and the R channel value of the every N frames of images; wherein N1 + N2 + N3 = N; N, N1, N2 and N3 are all positive integers, and N1 ≥ 1, N2 ≥ 1, N3 ≥ 1.
In one embodiment, the tossing area determining module comprises a tossing area locating network model constructed by the following steps:
acquiring a historical video;
decomposing the historical video into a plurality of continuous N-frame image sets according to time sequence; wherein N is a positive integer, and N is more than or equal to 3;
performing fusion processing on every N frames of images to determine a fusion image;
and inputting the fusion map into a preset positioning network model for training, and determining the throwing area positioning network model.
In one embodiment, the tossing area determining module is specifically configured to:
and training the preset positioning network model by adopting a feature extraction algorithm and a target detection algorithm to obtain a throwing area positioning network model.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the method for locating a tossing area as described above when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the method for locating a throwing area.
The method, device, equipment and storage medium for positioning the throwing area decompose the acquired video to be processed into a plurality of consecutive N-frame image sets according to time sequence, fuse every N frames of images to determine a fused image, and finally input the fused image into a throwing area positioning network model to determine the throwing area in the video to be processed. Because the multiple frames of images in the video are fused, identification of the throwing area in the video is converted into identification of the throwing area in an image, and the content information and temporal information of each frame of image are preserved to a great extent.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
fig. 1 is a schematic flowchart of a method for positioning a throwing area according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a method for determining a fused image according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a structure of a fused image according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a process for constructing a throw area positioning network model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a located throwing area provided by an embodiment of the present invention;
FIG. 6 is a schematic structural view of a throw area positioning device according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
As mentioned in the background art, in the logistics industry workers sometimes throw express parcels in pursuit of timeliness. Such behavior not only damages packages but also increases complaints, so detecting the throwing area in a video is particularly important for efficient management of an enterprise. In the prior art, the throwing area in a video can be located by a saliency detection method based on deep learning, which generates a saliency result for each frame of image. However, training such a deep learning model requires a large amount of data, convergence is difficult, a result is produced regardless of whether a throwing area exists in the video, and the false alarm rate for cases without a throwing area is high. In addition, the throwing area in a video can be identified with a generative adversarial network based on the pix2pix model, which captures a picture of the motion areas in the video, i.e., moving regions are marked white and other background regions are marked black; for example, people moving in the video and objects in motion such as a rotating fan are marked white. However, the throwing area cannot be accurately identified, the pictures marked white must be further processed manually, the processing is complex, and the localization of the throwing area in the video is inaccurate.
Based on the above defects, embodiments of the present invention provide a method for positioning a throwing area, which decomposes the acquired video to be processed into a plurality of consecutive N-frame image sets according to time sequence, fuses every N frames of images to obtain a fused image, and inputs the fused image into a throwing area positioning network model, thereby determining the throwing area in the video to be processed. This reduces the computation workload and accurately locates the throwing area in the video.
The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
It should be noted that the execution subject of the method embodiments described below may be a throw area positioning apparatus, which may be implemented by software, hardware, or a combination of software and hardware as part or all of the terminal device. The execution subjects of the method embodiments described below are described taking a computer device as an example.
Fig. 1 is a schematic flow chart of a method for positioning a throwing area according to an embodiment of the present invention, as shown in fig. 1, the method includes:
step S101, decomposing the acquired video to be processed into a plurality of continuous N-frame image sets, wherein N is a positive integer and is more than or equal to 3;
specifically, the video to be processed may be a video in a certain time period that needs to be processed, and may be directly acquired by a camera, downloaded by a cloud, or imported by another device, where the device may be a monitoring device.
Further, after the video to be processed is obtained, it may be decomposed into consecutive single-frame images according to time sequence with video processing software, and a plurality of consecutive N-frame image sets are then obtained with N frames as the step length, where N is a positive integer and N is greater than or equal to 3. Optionally, N may be any such positive integer, for example 3, 4, 5 or 6, which is not limited in this embodiment.
For example, assuming that the video to be processed is 100 frames and N is 10, i.e. ten frames are taken as the step size, the video can be decomposed according to time sequence into 10 image sets: the 1st to 10th frames, the 11th to 20th frames, ..., the 81st to 90th frames, and the 91st to 100th frames.
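By way of illustration only (this sketch is not part of the original disclosure), the decomposition step could be written with OpenCV roughly as follows; the function name decompose_video and the choice to drop trailing frames that do not fill a complete set are assumptions.

```python
import cv2

def decompose_video(video_path: str, n: int = 10):
    """Split a video into consecutive sets of n frames (assumed helper, n >= 3)."""
    cap = cv2.VideoCapture(video_path)
    frame_sets, current = [], []
    while True:
        ok, frame = cap.read()          # frame is a BGR image of shape (H, W, 3)
        if not ok:
            break
        current.append(frame)
        if len(current) == n:           # one full N-frame set is complete
            frame_sets.append(current)
            current = []
    cap.release()
    return frame_sets                   # trailing frames that do not fill a set are dropped
```

Each returned set can then be fused as described in the next step.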
Optionally, the image set may have an image of a throwing area, or may not have an image of a throwing area, which is not limited in this embodiment.
And S102, performing fusion processing on every N frames of images, and determining fused images.
Specifically, image fusion refers to applying image processing, computer technology and the like to image data acquired from multiple source channels, so that the useful information in each channel is extracted to the greatest extent and a high-quality image is finally synthesized, thereby improving the utilization of image information. The data form of image fusion is an image containing scene features such as brightness, color, temperature and distance, and such images may be presented individually or as a sequence.
Generally, image fusion can be divided into pixel-level fusion, feature-level fusion and decision-level fusion. Pixel-level fusion includes spatial-domain and transform-domain algorithms, such as the logical filtering method, the gray-scale weighted average method, the contrast modulation method and the wavelet transform method, and a fusion algorithm often combines the mean, entropy, standard deviation and average gradient of an image.
Each pixel of an image is composed of R, G and B components. After the plurality of consecutive N-frame image sets are obtained, the fused image can be determined by performing fusion processing on every N frames of images with a fusion algorithm.
And S103, inputting the fused image into a throwing area positioning network model, and determining a throwing area in the video to be processed.
Specifically, after each N frames of images are fused to obtain fused images, the fused images are input into a throwing area positioning network model, so that a throwing area in the video to be processed is determined, wherein the throwing area positioning network model has the capability of positioning the throwing area.
The method for positioning the throwing area provided by this embodiment decomposes the acquired video to be processed into a plurality of consecutive N-frame image sets according to time sequence, fuses every N frames of images to determine a fused image, and inputs the fused image into a throwing area positioning network model to determine the throwing area in the video to be processed. Because the multiple frames of images in the video are fused, identification of the throwing area in the video is converted into identification of the throwing area in an image, and the content information and temporal information of each frame of image are preserved to a great extent.
Fig. 2 is a schematic flowchart of a method for determining a fused image according to an embodiment of the present invention, where on the basis of the foregoing embodiment, as shown in fig. 2, the method includes:
step S201, determining a B channel value, a G channel value, and an R channel value of each N frame image.
Specifically, after the acquired video to be processed is decomposed into a plurality of consecutive N-frame image sets according to time sequence, each N-frame image set is processed. The B channel value, G channel value and R channel value of each frame of image can be read with related software, and the B channel mean of N1 frames of images, the G channel mean of N2 frames of images and the R channel mean of N3 frames of images in every N frames of images are calculated; the B channel mean of the N1 frames, the G channel mean of the N2 frames and the R channel mean of the N3 frames are then taken as the B channel value, G channel value and R channel value of the every N frames of images, where N, N1, N2 and N3 are all positive integers, and N1 ≥ 1, N2 ≥ 1, N3 ≥ 1.
Illustratively, when N is 10, i.e. ten frames are taken as the step size, N1 is 3, N2 is 4 and N3 is 3. The acquired video to be processed is decomposed into a plurality of consecutive ten-frame image sets according to time sequence, each ten-frame image set is processed, and the B channel value, G channel value and R channel value of each frame of image are read with related software. The B channel mean of the first three frames, the G channel mean of the middle four frames and the R channel mean of the last three frames in every ten frames of images are calculated, and the B channel mean Bmean of the first three frames, the G channel mean Gmean of the middle four frames and the R channel mean Rmean of the last three frames are taken as the B channel value, G channel value and R channel value of the every ten frames of images.
It should be noted that, the B channel mean value of the first three frames of images can be obtained by summing and averaging the B channel values of the first three frames of images, and similarly, the G channel mean value of the middle four frames of images and the R channel mean value of the last three frames of images can also be obtained by summing and averaging.
Optionally, when N is 6, i.e. six frames are taken as the step size, N1 is 2, N2 is 2 and N3 is 2. The acquired video to be processed is decomposed into a plurality of consecutive six-frame image sets according to time sequence, each six-frame image set is processed, and the B channel value, G channel value and R channel value of each frame of image are read with related software. The B channel mean of the first two frames, the G channel mean of the middle two frames and the R channel mean of the last two frames in every six frames of images are calculated, and the B channel mean Bmean of the first two frames, the G channel mean Gmean of the middle two frames and the R channel mean Rmean of the last two frames are taken as the B channel value, G channel value and R channel value of the every six frames of images, thereby determining the fused image.
And S202, determining a fused image according to the B channel value, the G channel value and the R channel value.
Specifically, after a B channel value, a G channel value, and an R channel value of each N frame image are obtained, a fused image is obtained.
For example, assuming that a video to be processed is 100 frames, the video is decomposed according to time sequence into 10 image sets (the 1st to 10th frames, the 11th to 20th frames, ..., and the 91st to 100th frames). The images of the 1st to 10th frames are then fused according to the above method to determine a fused image of the first ten frames, the fused image is input into the throwing area positioning network, and whether a throwing area exists in the first ten frames is determined; whether a throwing area exists in the 11th to 20th frames, ..., and the 91st to 100th frames is determined in the same way in turn. The process of fusing every ten frames of images can be as shown in fig. 3.
The method for determining the fused image provided by this embodiment determines the B channel value, G channel value and R channel value of each frame of image, and takes the B channel mean of N1 frames of images, the G channel mean of N2 frames of images and the R channel mean of N3 frames of images in every N frames of images as the B channel value, G channel value and R channel value of the every N frames of images, thereby determining the fused image. In this way the video can be converted into images for processing, the content information and temporal information of the image sequence are preserved to a great extent, and complex computation is avoided.
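The following NumPy sketch is offered purely as an illustration of the fusion rule described above and is not part of the original disclosure; the function name fuse_frames, the default split N1 = 3, N2 = 4, N3 = 3 and the BGR channel layout (as returned by OpenCV) are assumptions.

```python
import numpy as np

def fuse_frames(frames, n1=3, n2=4, n3=3):
    """Fuse N = n1 + n2 + n3 consecutive BGR frames into a single image:
    B channel = mean B of the first n1 frames,
    G channel = mean G of the middle n2 frames,
    R channel = mean R of the last n3 frames."""
    assert len(frames) == n1 + n2 + n3
    stack = np.stack(frames).astype(np.float32)        # shape (N, H, W, 3), BGR order
    b_mean = stack[:n1, :, :, 0].mean(axis=0)          # B from the first n1 frames
    g_mean = stack[n1:n1 + n2, :, :, 1].mean(axis=0)   # G from the middle n2 frames
    r_mean = stack[n1 + n2:, :, :, 2].mean(axis=0)     # R from the last n3 frames
    fused = np.stack([b_mean, g_mean, r_mean], axis=-1)
    return fused.astype(np.uint8)
```

A fused image produced in this way can then be fed to the throwing area positioning network model described below.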
Optionally, as shown in fig. 4, after the fused image is determined, the fused image may be input into a toss area positioning network model, and the toss area positioning network model may be constructed through the following steps:
and step S301, acquiring a history video.
Specifically, the historical video refers to existing video stream data. Optionally, the historical video may be downloaded from a cloud or imported from another device, which is not limited in this embodiment.
For example, the process of acquiring the historical video by the computer may be: the computer equipment receives a processing instruction input by a user and acquires historical video data according to the processing instruction. The historical video data may include videos with a throwing area as well as videos without a throwing area.
Step S302, decomposing the historical video into a plurality of continuous N-frame image sets according to time sequence; wherein N is a positive integer, and N is more than or equal to 3.
And step S303, performing fusion processing on every N frames of images to determine a fusion image.
It should be noted that after the historical video is acquired, it may be decomposed into consecutive single-frame images according to time sequence, a plurality of consecutive N-frame image sets are then obtained with N frames as the step length, and each N-frame image set is fused to determine a plurality of fusion images.
And step S304, inputting the fusion map into a preset positioning network model for training, and determining the throwing area positioning network model.
Specifically, after a plurality of fusion maps are obtained, a feature extraction algorithm and a target detection algorithm may be adopted to train the preset positioning network model. Optionally, the target detection algorithm may be the Faster R-CNN algorithm, the SSD algorithm or the YOLO algorithm. Faster R-CNN mainly proceeds as follows: feature extraction is performed first, taking the whole picture as input and obtaining a feature map of the picture with a convolutional neural network (CNN); region proposal is then performed on the final convolutional feature map using k different rectangular anchor boxes, the region corresponding to each box is classified into two classes (object or background), and the position and size of each candidate box are fine-tuned with k regression models; finally, target classification is performed.
In addition, the feature extraction network adopts the deep residual network ResNet-101 model, which has excellent visual representation and generalization capability and is widely applied in fields such as image object detection, image semantic segmentation and video object tracking.
For example, 4900 video samples can be acquired in real complex scenes, of which 2600 videos contain throwing behavior and 2300 do not. The video samples can be divided into a training set and a test set at a ratio of 8:2; the training set is used to train the preset positioning network model to obtain a trained positioning network model, the test set is then input into the trained positioning network model, the performance of the trained model is judged by determining model parameter indexes, and the throwing area positioning network model is finally determined.
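Purely as an assumed sketch rather than the implementation disclosed here, a training loop in this spirit could be built on torchvision's Faster R-CNN; note that the off-the-shelf torchvision model uses a ResNet-50 FPN backbone rather than the ResNet-101 backbone mentioned above, and the two-class setup (background and throwing area) and the data-loader contract are assumptions.

```python
import torch
import torchvision

# Two classes are assumed: 0 = background, 1 = throwing area.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None, num_classes=2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=5e-4)

def train_one_epoch(model, loader, optimizer, device="cuda"):
    """loader is assumed to yield (fused_image_tensor, target) pairs, where each target
    is a dict with 'boxes' (M, 4) and 'labels' (M,) as expected by torchvision detectors."""
    model.to(device).train()
    for images, targets in loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)   # classification + box-regression losses
        loss = sum(loss_dict.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Using a standard anchor-based detector keeps the output in the form of rectangular boxes, which matches the output format described below.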
It should be noted that, after the throwing area positioning network model is determined, a fused image input into the model is output with a rectangular box locating the throwing area; if there is no throwing area in a video, no box is output. For example, when a throwing area exists in a video, the output rectangular box locating the throwing area can be as shown in fig. 5.
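Again as an assumed illustration rather than the disclosed implementation, inference on a single fused image and drawing of the predicted rectangular box might look as follows; the 0.5 score threshold is an arbitrary choice.

```python
import cv2
import torch

@torch.no_grad()
def locate_throwing_area(model, fused_bgr, score_thresh=0.5, device="cuda"):
    """Run the detector on one fused BGR image and draw boxes above the threshold."""
    model.to(device).eval()
    # torchvision detection models expect RGB float tensors in [0, 1] of shape (3, H, W)
    rgb = cv2.cvtColor(fused_bgr, cv2.COLOR_BGR2RGB)
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float().div(255).to(device)
    output = model([tensor])[0]              # dict with 'boxes', 'labels', 'scores'
    for box, score in zip(output["boxes"], output["scores"]):
        if score >= score_thresh:
            x1, y1, x2, y2 = map(int, box.tolist())
            cv2.rectangle(fused_bgr, (x1, y1), (x2, y2), (0, 0, 255), 2)
    return fused_bgr
```

If no box exceeds the threshold, the image is returned unchanged, corresponding to the case in which no throwing area is detected.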
The method for constructing the tossing area locating network model provided by this embodiment determines the tossing area locating network model by acquiring the historical video, decomposing the historical video into a plurality of continuous N-frame image sets according to a time sequence, fusing every N-frame image set to determine a fusion map, and finally inputting the fusion map into the preset locating network model for training.
It should be noted that while the operations of the method of the present invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the steps depicted in the flowcharts may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
Fig. 6 is a schematic structural view of a throwing area positioning device according to an embodiment of the present invention. As shown in fig. 6, the apparatus may implement the methods shown in fig. 1 to 5, and may include:
the decomposition module 10 is configured to decompose the acquired video to be processed into a plurality of consecutive N-frame image sets, where N is a positive integer and is greater than or equal to 3;
the image determining module 20 is configured to perform fusion processing on every N frames of images to determine a fused image;
and a throwing area determining module 30, configured to input the fused image into a throwing area positioning network model, and determine a throwing area in the video to be processed.
Preferably, the image determining module 20 includes:
a first determining unit 201, configured to determine a B channel value, a G channel value, and an R channel value of the every N frame images;
a second determining unit 202, configured to determine the fused image according to the B channel value, the G channel value, and the R channel value.
Preferably, the first determining unit 201 is specifically configured to:
determining a B channel value, a G channel value and an R channel value of each frame of image;
n in every N frame images1B channel mean, N of frame image2G-channel mean and N of frame image3Taking the R channel mean value of the frame image as a B channel value, a G channel value and an R channel value of each N frame image; wherein N is1+N2+N3=N,N、N1、N2、N3Are all positive integers, N1≥1,N2≥1,N3≥1。
Preferably, the tossing area determining module 30 comprises a tossing area locating network model constructed by the following steps:
acquiring a historical video;
decomposing the historical video into a plurality of continuous N-frame image sets according to time sequence; wherein N is a positive integer, and N is more than or equal to 3;
performing fusion processing on every N frames of images to determine a fusion image;
and inputting the fusion map into a preset positioning network model for training, and determining the throwing area positioning network model.
Preferably, the tossing area determining module 30 is specifically configured to:
and training the preset positioning network model by adopting a feature extraction algorithm and a target detection algorithm to obtain a throwing area positioning network model.
The tossing area positioning device provided by the embodiment can execute the embodiment of the method, and the implementation principle and the technical effect are similar, and are not described again here.
Fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present invention. As shown in fig. 7, a schematic structural diagram of a computer system 700 suitable for implementing the terminal device or the server of the embodiment of the present application is shown.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The RAM 703 also stores various programs and data necessary for the operation of the system 700. The CPU 701, the ROM 702 and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory is mounted on the drive 710 as necessary, so that a computer program read therefrom is installed into the storage section 708 as needed.
In particular, the processes described above with reference to fig. 1-5 may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the methods of fig. 1-5. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present application may be implemented by software or hardware. The described units or modules may also be provided in a processor, and may be described as: a processor includes a decomposition module, an image determination module, and a throw area determination module. Where the names of these units or modules do not in some cases constitute a limitation on the units or modules themselves, for example, a decomposition module may also be described as "for decomposing an acquired video to be processed into a plurality of successive sets of ten frames".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the throw area positioning method as described in the above embodiments.
For example, the electronic device may implement the following as shown in fig. 1: step S101, decomposing the acquired video to be processed into a plurality of continuous N-frame image sets, wherein N is a positive integer and is more than or equal to 3; step S102, carrying out fusion processing on every N frames of images, and determining fused images; and step S103, inputting the fused image into a throwing area positioning network model, and determining a throwing area in the video to be processed. As another example, the electronic device may implement the various steps shown in fig. 2 and 3.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Moreover, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc. Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware.
Claims (10)
1. A method of locating a throwing area, comprising:
decomposing the acquired video to be processed into a plurality of continuous N-frame image sets, wherein N is a positive integer and is more than or equal to 3;
performing fusion processing on every N frames of images to determine fused images;
and inputting the fused image into a throwing area positioning network model, and determining a throwing area in the video to be processed.
2. The tossing area positioning method according to claim 1, wherein performing fusion processing on every N frames of images, and determining fused images, includes:
determining a B channel value, a G channel value and an R channel value of each N frame of image;
and determining the fused image according to the B channel value, the G channel value and the R channel value.
3. The tossing area locating method according to claim 2, wherein the determining of the B-channel value, the G-channel value, and the R-channel value every N-frame image includes:
determining a B channel value, a G channel value and an R channel value of each frame of image;
taking the B channel mean of N1 frames of images, the G channel mean of N2 frames of images and the R channel mean of N3 frames of images in every N frames of images as the B channel value, the G channel value and the R channel value of the every N frames of images; wherein N1 + N2 + N3 = N; N, N1, N2 and N3 are all positive integers, and N1 ≥ 1, N2 ≥ 1, N3 ≥ 1.
4. The tossing area positioning method according to claim 1, wherein the fused image is input to a tossing area positioning network model, and the tossing area in the video to be processed is determined, including a tossing area positioning network model constructed by:
acquiring a historical video;
decomposing the historical video into a plurality of continuous N-frame image sets according to time sequence; wherein N is a positive integer, and N is more than or equal to 3;
performing fusion processing on every N frames of images to determine a fusion image;
and inputting the fusion map into a preset positioning network model for training, and determining the throwing area positioning network model.
5. The method of claim 4, wherein inputting the fused map into a predetermined positioning network model for training to determine the tossing area positioning network model comprises:
and training the preset positioning network model by adopting a feature extraction algorithm and a target detection algorithm to obtain a throwing area positioning network model.
6. A throwing area locating device, the device comprising:
the decomposition module is used for decomposing the acquired video to be processed into a plurality of continuous N-frame image sets, wherein N is a positive integer and is more than or equal to 3;
the image determining module is used for performing fusion processing on every N frames of images and determining fused images;
and the throwing area determining module is used for inputting the fused image into a throwing area positioning network model and determining the throwing area in the video to be processed.
7. The tossing area positioning device of claim 6, wherein the image determination module comprises:
a first determining unit configured to determine a B channel value, a G channel value, and an R channel value of the every N frame images;
and the second determining unit is used for determining the fused image according to the B channel value, the G channel value and the R channel value.
8. The tossing area positioning device according to claim 7, characterized in that said first determination unit is particularly adapted to:
determining a B channel value, a G channel value and an R channel value of each frame of image;
taking the B channel mean of N1 frames of images, the G channel mean of N2 frames of images and the R channel mean of N3 frames of images in every N frames of images as the B channel value, the G channel value and the R channel value of the every N frames of images; wherein N1 + N2 + N3 = N; N, N1, N2 and N3 are all positive integers, and N1 ≥ 1, N2 ≥ 1, N3 ≥ 1.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-5 when executing the program.
10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910231890.XA CN111753574A (en) | 2019-03-26 | 2019-03-26 | Throw area positioning method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910231890.XA CN111753574A (en) | 2019-03-26 | 2019-03-26 | Throw area positioning method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111753574A true CN111753574A (en) | 2020-10-09 |
Family
ID=72672201
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910231890.XA Pending CN111753574A (en) | 2019-03-26 | 2019-03-26 | Throw area positioning method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111753574A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115334334A (en) * | 2022-07-13 | 2022-11-11 | 北京优酷科技有限公司 | Video frame insertion method and device |
Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080080787A1 (en) * | 2006-09-28 | 2008-04-03 | Microsoft Corporation | Salience Preserving Image Fusion |
CN101339653A (en) * | 2008-01-30 | 2009-01-07 | 西安电子科技大学 | Infrared and colorful visual light image fusion method based on color transfer and entropy information |
CN104504730A (en) * | 2014-12-19 | 2015-04-08 | 长安大学 | Method for distinguishing parked vehicle from falling object |
CN105812942A (en) * | 2016-03-31 | 2016-07-27 | 北京奇艺世纪科技有限公司 | Data interaction method and device |
CN105913456A (en) * | 2016-04-12 | 2016-08-31 | 西安电子科技大学 | Video significance detecting method based on area segmentation |
CN106570478A (en) * | 2016-11-04 | 2017-04-19 | 北京智能管家科技有限公司 | Object loss determine method and device in visual tracking |
WO2018019126A1 (en) * | 2016-07-29 | 2018-02-01 | 北京市商汤科技开发有限公司 | Video category identification method and device, data processing device and electronic apparatus |
CN108197623A (en) * | 2018-01-19 | 2018-06-22 | 百度在线网络技术(北京)有限公司 | For detecting the method and apparatus of target |
CN108241854A (en) * | 2018-01-02 | 2018-07-03 | 天津大学 | A kind of deep video conspicuousness detection method based on movement and recall info |
CN108288259A (en) * | 2018-01-06 | 2018-07-17 | 昆明物理研究所 | A kind of gray scale fusion Enhancement Method based on color space conversion |
CN108319905A (en) * | 2018-01-25 | 2018-07-24 | 南京邮电大学 | A kind of Activity recognition method based on long time-histories depth time-space network |
CN108875565A (en) * | 2018-05-02 | 2018-11-23 | 淘然视界(杭州)科技有限公司 | The recognition methods of railway column, storage medium, electronic equipment, system |
CN108960207A (en) * | 2018-08-08 | 2018-12-07 | 广东工业大学 | A kind of method of image recognition, system and associated component |
CN109272509A (en) * | 2018-09-06 | 2019-01-25 | 郑州云海信息技术有限公司 | A kind of object detection method of consecutive image, device, equipment and storage medium |
CN109309811A (en) * | 2018-08-31 | 2019-02-05 | 中建三局智能技术有限公司 | A kind of throwing object in high sky detection system based on computer vision and method |
CN109359592A (en) * | 2018-10-16 | 2019-02-19 | 北京达佳互联信息技术有限公司 | Processing method, device, electronic equipment and the storage medium of video frame |
CN109472248A (en) * | 2018-11-22 | 2019-03-15 | 广东工业大学 | A kind of pedestrian recognition methods, system and electronic equipment and storage medium again |
- 2019-03-26: Application CN201910231890.XA filed in China; published as CN111753574A (status: Pending)
Patent Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080080787A1 (en) * | 2006-09-28 | 2008-04-03 | Microsoft Corporation | Salience Preserving Image Fusion |
CN101339653A (en) * | 2008-01-30 | 2009-01-07 | 西安电子科技大学 | Infrared and colorful visual light image fusion method based on color transfer and entropy information |
CN104504730A (en) * | 2014-12-19 | 2015-04-08 | 长安大学 | Method for distinguishing parked vehicle from falling object |
CN105812942A (en) * | 2016-03-31 | 2016-07-27 | 北京奇艺世纪科技有限公司 | Data interaction method and device |
CN105913456A (en) * | 2016-04-12 | 2016-08-31 | 西安电子科技大学 | Video significance detecting method based on area segmentation |
WO2018019126A1 (en) * | 2016-07-29 | 2018-02-01 | 北京市商汤科技开发有限公司 | Video category identification method and device, data processing device and electronic apparatus |
CN106570478A (en) * | 2016-11-04 | 2017-04-19 | 北京智能管家科技有限公司 | Object loss determine method and device in visual tracking |
CN108241854A (en) * | 2018-01-02 | 2018-07-03 | 天津大学 | A kind of deep video conspicuousness detection method based on movement and recall info |
CN108288259A (en) * | 2018-01-06 | 2018-07-17 | 昆明物理研究所 | A kind of gray scale fusion Enhancement Method based on color space conversion |
CN108197623A (en) * | 2018-01-19 | 2018-06-22 | 百度在线网络技术(北京)有限公司 | For detecting the method and apparatus of target |
CN108319905A (en) * | 2018-01-25 | 2018-07-24 | 南京邮电大学 | A kind of Activity recognition method based on long time-histories depth time-space network |
CN108875565A (en) * | 2018-05-02 | 2018-11-23 | 淘然视界(杭州)科技有限公司 | The recognition methods of railway column, storage medium, electronic equipment, system |
CN108960207A (en) * | 2018-08-08 | 2018-12-07 | 广东工业大学 | A kind of method of image recognition, system and associated component |
CN109309811A (en) * | 2018-08-31 | 2019-02-05 | 中建三局智能技术有限公司 | A kind of throwing object in high sky detection system based on computer vision and method |
CN109272509A (en) * | 2018-09-06 | 2019-01-25 | 郑州云海信息技术有限公司 | A kind of object detection method of consecutive image, device, equipment and storage medium |
CN109359592A (en) * | 2018-10-16 | 2019-02-19 | 北京达佳互联信息技术有限公司 | Processing method, device, electronic equipment and the storage medium of video frame |
CN109472248A (en) * | 2018-11-22 | 2019-03-15 | 广东工业大学 | A kind of pedestrian recognition methods, system and electronic equipment and storage medium again |
Non-Patent Citations (1)
Title |
---|
陈海永 (CHEN Haiyong) et al.: "Multispectral image fusion for surface defects of solar cells against complex backgrounds", Proceedings of the 2018 Chinese Automation Congress (CAC 2018), 31 December 2018 (2018-12-31), pages 716-721 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115334334A (en) * | 2022-07-13 | 2022-11-11 | 北京优酷科技有限公司 | Video frame insertion method and device |
CN115334334B (en) * | 2022-07-13 | 2024-01-09 | 北京优酷科技有限公司 | Video frame inserting method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110163080B (en) | Face key point detection method and device, storage medium and electronic equipment | |
CN108229321B (en) | Face recognition model, and training method, device, apparatus, program, and medium therefor | |
CN110348522B (en) | Image detection and identification method and system, electronic equipment, and image classification network optimization method and system | |
CN113313053B (en) | Image processing method, device, apparatus, medium, and program product | |
CN113691733B (en) | Video jitter detection method and device, electronic equipment and storage medium | |
CN111723728A (en) | Pedestrian searching method, system and device based on bidirectional interactive network | |
CN111144337B (en) | Fire detection method and device and terminal equipment | |
CN112364843A (en) | Plug-in aerial image target positioning detection method, system and equipment | |
CN114092759A (en) | Training method and device of image recognition model, electronic equipment and storage medium | |
CN111199238A (en) | Behavior identification method and equipment based on double-current convolutional neural network | |
US20230049656A1 (en) | Method of processing image, electronic device, and medium | |
CN115239508A (en) | Scene planning adjustment method, device, equipment and medium based on artificial intelligence | |
CN111985269A (en) | Detection model construction method, detection device, server and medium | |
CN113705380A (en) | Target detection method and device in foggy days, electronic equipment and storage medium | |
CN111753574A (en) | Throw area positioning method, device, equipment and storage medium | |
US20230048649A1 (en) | Method of processing image, electronic device, and medium | |
CN116959099A (en) | Abnormal behavior identification method based on space-time diagram convolutional neural network | |
CN114120423A (en) | Face image detection method and device, electronic equipment and computer readable medium | |
CN114844889B (en) | Video processing model updating method and device, electronic equipment and storage medium | |
CN111191593A (en) | Image target detection method and device, storage medium and sewage pipeline detection device | |
CN111382603B (en) | Track calculation device and method | |
CN112101279B (en) | Target object abnormality detection method, target object abnormality detection device, electronic equipment and storage medium | |
CN111062337B (en) | People stream direction detection method and device, storage medium and electronic equipment | |
CN116132645B (en) | Image processing method, device, equipment and medium based on deep learning | |
CN110956057A (en) | Crowd situation analysis method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |