WO2024037660A1 - Method, device, electronic equipment and storage medium for determining an abnormal sorting area - Google Patents

Method, device, electronic equipment and storage medium for determining an abnormal sorting area

Info

Publication number
WO2024037660A1
WO2024037660A1 · PCT/CN2023/119451
Authority
WO
WIPO (PCT)
Prior art keywords
sorting
video
area
abnormal
features
Prior art date
Application number
PCT/CN2023/119451
Other languages
English (en)
French (fr)
Inventor
蔡文杰
Original Assignee
顺丰科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 顺丰科技有限公司 filed Critical 顺丰科技有限公司
Publication of WO2024037660A1 publication Critical patent/WO2024037660A1/zh

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/42Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition

Definitions

  • This application relates to the field of logistics sorting technology, and specifically to a method, device, electronic equipment and storage medium for determining abnormal sorting areas.
  • This application provides a method, device, electronic equipment and storage medium for determining abnormal sorting areas, aiming to solve the current problem of inaccurate abnormal sorting detection.
  • In a first aspect, this application provides a method for determining an abnormal sorting area, including: obtaining a target video of a sorting site; extracting global action features in the target video and long-range action features in the target video, where the long-range action features include the long-range action information in the target video; and determining the abnormal sorting area in the sorting site according to the global action features and the long-range action features.
  • In some embodiments, extracting the global action features in the target video and the long-range action features in the target video includes: dividing each video frame in the target video into blocks according to preset image areas to obtain the image blocks corresponding to each video frame; and arranging the image blocks corresponding to the same preset image area of each video frame in time sequence to obtain multiple image block sequences corresponding to the target video.
  • In some embodiments, performing feature extraction processing on the multiple image block sequences corresponding to the target video through a preset self-attention model to obtain the global action features includes: encoding the multiple image block sequences to obtain the coding features corresponding to each image block sequence; determining, through the preset self-attention model, the attention weights between the coding features corresponding to each image block sequence; and performing weighted fusion processing on the coding features corresponding to each image block sequence according to those attention weights to obtain the global action features.
  • In some embodiments, dividing each video frame in the target video into blocks according to a preset image area to obtain the image blocks corresponding to each video frame includes: performing the block processing on each video frame according to the coordinate range of the preset image area in an image coordinate system, where the image coordinate system is established on every video frame in the target video.
  • In some embodiments, the time sequence refers to the time sequence of the video frames.
  • In some embodiments, determining the abnormal sorting area in the sorting site based on the global action features and the long-range action features includes: fusing the global action features and the long-range action features to obtain enhanced action features of the target video; and determining the abnormal sorting area in the sorting site based on the enhanced action features.
  • In some embodiments, determining the abnormal sorting area in the sorting site based on the enhanced action features includes: performing prediction processing on the enhanced action features to obtain a first position of an initial sorting area to be screened in the sorting site; obtaining a second position of the package placement area in the sorting site; and if the distance between the first position and the second position is greater than a preset distance threshold, determining the initial sorting area as the abnormal sorting area.
  • In some embodiments, performing prediction processing on the enhanced action features to obtain the initial sorting area to be screened in the sorting site includes: generating multiple candidate boxes from the enhanced action features through a region-based convolutional neural network, and filtering the multiple candidate boxes through non-maximum suppression to obtain the initial sorting area.
  • In some embodiments, filtering the multiple candidate boxes through non-maximum suppression to obtain the initial sorting area includes: screening, through non-maximum suppression, at least one candidate box whose intersection-over-union ratio is greater than a preset value from among the multiple candidate boxes; and predicting the initial sorting area based on the at least one candidate box.
  • In some embodiments, the method for determining an abnormal sorting area further includes: matching the abnormal sorting area with the preset sorting areas in the sorting site to obtain the target sorting area to which the abnormal sorting area belongs and the target sorting line corresponding to the target sorting area.
  • In some embodiments, obtaining the target video of the sorting site includes: obtaining a first initial video and a second initial video captured by multiple video acquisition devices in the sorting site; detecting a first sorting line in the first initial video and a second sorting line in the second initial video; and if the first sorting line and the second sorting line include the same sorting sub-line, and the sorting sub-line is incomplete in at least one of the first sorting line and the second sorting line, splicing the video frames in the first initial video with the video frames in the second initial video and determining the spliced video frames as the target video.
  • In a second aspect, this application provides a device for determining an abnormal sorting area, including: an acquisition unit for acquiring a target video of the sorting site; an extraction unit for extracting global action features in the target video and long-range action features in the target video, where the long-range action features include the long-range action information in the target video; and a determination unit configured to determine the abnormal sorting area in the sorting site according to the global action features and the long-range action features.
  • In addition, the present application also provides an electronic device, including a processor, a memory, and a computer program stored in the memory and capable of running on the processor; when the processor executes the computer program, the steps in any of the methods for determining an abnormal sorting area of the above first aspect are implemented.
  • Further, this application also provides a storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps in any method for determining an abnormal sorting area of the first aspect are implemented.
  • The method for determining abnormal sorting areas determines the abnormal sorting area in the sorting site based on the global action features and on long-range action features containing long-range action information, which increases the richness of the long-range action information. In particular, when the sorting site is large and the global action features contain little action information, the long-range action features can enhance the action information in the global action features, thereby improving the detection accuracy of abnormal sorting areas.
  • Figure 1 is a schematic diagram of the application scenario of the method for determining abnormal sorting areas provided by the embodiment of the present application.
  • Figure 2 is a schematic flowchart of the method for determining abnormal sorting areas provided in the embodiment of the present application.
  • Figure 3a is a schematic flowchart of the method for determining abnormal sorting areas provided in the embodiment of the present application.
  • Figure 3b is a schematic diagram of the detection model provided in the embodiment of the present application.
  • Figure 4 is a schematic flowchart of the method for determining abnormal sorting areas provided in the embodiment of the present application.
  • Figure 5 is a schematic diagram of a preset image area provided in an embodiment of the present application.
  • Figure 6 is a schematic diagram of an image block sequence provided in an embodiment of the present application.
  • Figure 7a is a schematic diagram of the detection model provided in the embodiment of the present application.
  • Figure 7b is a schematic flowchart of the method for determining abnormal sorting areas provided in the embodiment of the present application.
  • Figure 8 is a schematic flowchart of a method for determining an abnormal sorting area provided in an embodiment of the present application.
  • Figure 9 is a schematic flowchart of the method for determining abnormal sorting areas provided in the embodiment of the present application.
  • Figure 10 is a schematic flowchart of the method for determining abnormal sorting areas provided in the embodiment of the present application.
  • Figure 11 is a schematic flowchart of the method for determining abnormal sorting areas provided in the embodiment of the present application.
  • Figure 12 is a schematic structural diagram of a device for determining an abnormal sorting area provided in an embodiment of the present application.
  • Figure 13 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
  • In the embodiments of this application, “first” and “second” are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the quantity of the indicated technical features. Thus, features defined as “first” and “second” may explicitly or implicitly include one or more of the described features. In the description of the embodiments of this application, “plurality” means two or more, unless otherwise explicitly and specifically limited.
  • the method based on deep learning can automatically learn the characteristics of violent sorting, thereby eliminating the need for manual setting of rules.
  • existing deep learning-based methods have low accuracy and cannot accurately detect areas where violent sorting occurs.
  • embodiments of the present application provide a method, device, electronic device and storage medium for determining abnormal sorting areas, which will be described in detail below.
  • the device for determining the abnormal sorting area can be integrated in an electronic device, and the electronic device can be a server, a terminal, or other equipment.
  • the execution subject of the method for determining the abnormal sorting area in the embodiment of the present application may be the device for determining the abnormal sorting area provided by the embodiment of the present application, or a server device, a physical host or a user device that integrates the device for determining the abnormal sorting area.
  • the device for determining the abnormal sorting area can be implemented in hardware or software.
  • The UE can be a smartphone, tablet, laptop, handheld computer, desktop computer, personal digital assistant (PDA) or other terminal device.
  • the execution subject of the method of determining the abnormal sorting area in the embodiment of the present application may also be an electronic device having a processor, and the processor is configured to execute the method of determining the abnormal sorting area in the embodiment of the present application.
  • the electronic device can operate individually or in a cluster of devices.
  • Figure 1 is a schematic scene diagram of a system for determining an abnormal sorting area provided by an embodiment of the present application.
  • The method for determining an abnormal sorting area provided by an embodiment of the present application can be applied in the system for determining an abnormal sorting area shown in Figure 1.
  • the system for determining abnormal sorting areas may include an electronic device 101 .
  • the electronic device 101 may be an independent server, or a server network or server cluster composed of servers, including but not limited to a computer, a network host, a single network server, multiple network server sets, or a cloud server composed of multiple servers.
  • Cloud servers consist of a large number of computers or network servers based on cloud computing.
  • Figure 1 is only one application scenario of the solution of the present application and does not constitute a limitation on the application scenarios of the solution of the present application.
  • Other application environments may include more or fewer electronic devices than shown in Figure 1; for example, only one electronic device is shown in Figure 1.
  • the system for determining the abnormal sorting area may also include one or more other electronic devices, which are not limited here.
  • the system for determining abnormal sorting areas may also include a memory 102 for storing data, such as target videos.
  • a communication connection may be established between the electronic device 101 and the memory 102, and the communication connection may be a wired or wireless network connection.
  • the electronic device 101 and the memory 102 can be deployed on the same physical device for implementation, or they can be deployed on different physical devices for implementation. When the electronic device 101 and the memory 102 are deployed on different physical devices, they can be deployed in the same local area network or in different local area networks.
  • The electronic device 101 communicates with the memory 102 to obtain the target video of the sorting site from the memory 102.
  • The electronic device 101 extracts the global action features in the target video and the long-range action features in the target video, and determines the abnormal sorting area in the sorting site according to the global action features and the long-range action features.
  • the electronic device 101 may be integrated with a device for determining an abnormal sorting area.
  • the device for determining an abnormal sorting area is used to perform the above-mentioned method for determining an abnormal sorting area provided by the embodiment of the present application.
  • the method of determining the abnormal sorting area includes: obtaining the target video of the sorting site; extracting the Global action features in the target video, and long-range action features in the target video, where the long-range action features include long-range action information in the target video; according to the global action features and the long-range action features, Determine abnormal sorting areas in the sorting site.
  • Figure 2 is a schematic flowchart of a method for determining an abnormal sorting area provided by an embodiment of the present application. It should be noted that although a logical sequence is shown in the flowcharts, in some cases the steps shown or described may be performed in a sequence different from that herein.
  • the method for determining the abnormal sorting area may specifically include the following steps S201 to S203.
  • The method for determining abnormal sorting areas can be used in the logistics and express delivery field to detect whether sorters in the sorting site engage in violent sorting; if violent sorting exists, the sorting site is considered to contain abnormal sorting.
  • For example, the method for determining an abnormal sorting area provided in this embodiment can detect whether a sorter sorts the packages on the package transport device into the package placement area by throwing them, a behavior that can damage fragile packages.
  • The sorting site may refer to a sorting factory affiliated with the express company.
  • the target video refers to the video to be detected.
  • the electronic device can use the real-time video captured by the video acquisition device as the target video to detect whether there is abnormal sorting in the current sorting site.
  • the electronic device can fetch frames from the video stream uploaded to the preset database by the video acquisition device.
  • The frame-fetching frequency can be, for example, 6 frames per second; when the number of fetched frames reaches a preset value, the fetched video frames make up the target video.
  • For example, the electronic device obtains from the above video stream the video frames whose timestamps are within 1 second of the current time, and randomly selects six of these video frames to constitute the target video.
  • Alternatively, the electronic device obtains from the above video stream the first video frames whose timestamps are within 1 second of the current time and randomly selects six of them as second video frames; it then obtains the third video frames whose timestamps are between 2 seconds and 1 second before the current time and randomly selects six of them as fourth video frames. The second video frames and the fourth video frames together constitute the target video.
  • The preset database may refer to the database used by the express company's backend to store videos.
  • the above preset values can be set according to actual scenarios, and the values in the above examples cannot be used as limitations on the embodiments of the present application.
  • the video acquisition device may refer to a camera installed in the sorting site, etc.
  • the express company can also use the method of determining abnormal sorting areas provided by the embodiments of this application to check historical videos to determine whether any sorters have violently sorted.
  • an express delivery company can use the method of determining abnormal sorting areas provided by the embodiments of this application to determine whether any sorters have engaged in violent sorting during monthly sorter violation inspections.
  • the electronic device can perform the method of determining the abnormal sorting area provided by the embodiments of the present application by taking frames from the historical video, which will not be described in detail.
  • S202 Extract global action features in the target video and long-range action features in the target video, where the long-range action features include the long-range action information in the target video.
  • Global action features refer to the temporal features of the action information of all sorters in the target video.
  • the electronic device can perform feature extraction processing on the target video through a preset convolutional layer in a three-dimensional convolutional neural network (3D Convolutional Neural Network) to obtain global action features.
  • the three-dimensional convolutional neural network is a convolutional neural network that extracts features through a three-dimensional convolution kernel. Through the three-dimensional convolutional neural network, the depth information in the video can be extracted.
  • For example, open-source software such as PoseC3D can be used as the three-dimensional convolutional neural network to extract the global action features.
  • The preset three-dimensional convolutional neural network can be obtained by training an initial three-dimensional convolutional neural network.
  • For example, an initial three-dimensional convolutional neural network can be trained as follows: obtain sample videos and their corresponding action labels, where the action labels are obtained by manual annotation; extract the temporal features in each sample video through the convolutional layer of the initial three-dimensional convolutional neural network, and predict the action type corresponding to the sample video from those temporal features through the fully connected layer; then adjust the parameters of the initial three-dimensional convolutional neural network according to the difference between the predicted action type and the action label to obtain the preset three-dimensional convolutional neural network.
  • the convolutional layer in the initial three-dimensional convolutional neural network can be trained, and the trained convolutional layer can effectively extract the global action features in the target video.
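As a rough illustration of extracting temporal features with three-dimensional convolutions, the sketch below builds a small stack of 3-D convolutional layers in PyTorch. The channel widths and network depth are assumptions for illustration; the patent does not specify the architecture.

```python
# Minimal sketch: a 3-D convolutional feature extractor. Kernels slide over
# (time, height, width), so the output carries temporal action information.
import torch
import torch.nn as nn

class Global3DFeatureExtractor(nn.Module):
    def __init__(self, in_channels=3, feat_channels=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_channels, feat_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(feat_channels, feat_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, video):
        # video: (batch, channels, frames, height, width)
        return self.conv(video)

# e.g. one target video of six 224x224 RGB frames:
features = Global3DFeatureExtractor()(torch.randn(1, 3, 6, 224, 224))
```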
  • The long-range action features refer to the temporal features of the long-range action information in the target video.
  • the long-range action information refers to the action information contained in the long-range area in the corresponding image area in the target video.
  • the long-range area refers to the sorting site area that is far away from the video acquisition device.
  • Global action features include both long-range action information and close-range action information, while long-range action features contain only long-range action information; in other words, global action features contain more types of action information.
  • The electronic device can likewise extract the long-range action features in the target video through a preset three-dimensional convolutional neural network. The specific training method can be referred to above; simply change the labels of the sample videos to the types of the sorters' long-range actions in the sample videos and the areas where those long-range actions occur.
  • S203 Determine the abnormal sorting area in the sorting site according to the global action characteristics and the long-range action characteristics.
  • The abnormal sorting area can refer to an area in the sorting site where abnormal sorting behavior occurs; there may be one or more such areas.
  • Specifically, the electronic device can fuse the global action features and the long-range action features, predict from the fused features the image area in the target video where abnormal sorting behavior occurs, and then obtain the abnormal sorting area through a preset conversion relationship between the image area of the target video and the site area of the sorting site.
  • The preset conversion relationship may refer to a coordinate conversion relationship: an image coordinate system is established on the target video and a site coordinate system is established on the sorting site, and the conversion relationship from coordinates in the image coordinate system to coordinates in the site coordinate system is finally stored in the backend database of the express company as the preset conversion relationship.
  • the electronic device reads the preset conversion relationship from the backend database.
  • The purpose of fusing the global action features and the long-range action features is to improve the richness of the long-range action information, which in turn increases the detection accuracy of abnormal sorting areas. Because the sorting site is large, the distance between the distant-view area and the video acquisition device is usually relatively large, so the image area corresponding to the distant-view area in the target video is small and the global action features contain little long-range action information. If the abnormal sorting area were determined only from global action features extracted by the preset three-dimensional convolutional neural network, the electronic device might fail to detect an abnormal sorting area that lies within the distant-view area. Fusing the global action features with the long-range action features improves the richness of the long-range action information and therefore the accuracy of abnormal sorting area detection, which is especially suitable for large venues.
  • In some embodiments, step S203 “determining the abnormal sorting area in the sorting site based on the global action features and the long-range action features” may include the following steps S310 to S320.
  • S310 Fusion of the global action features and the long-range action features to obtain enhanced action features of the target video.
  • Enhanced action features refer to features obtained by fusing global action features and long-range action features. It can be understood that enhanced action features contain rich long-range action information and close-range action information.
  • the electronic device can fuse global action features and distant action features through a preset Feature Pyramid Network (FPN) to obtain enhanced action features of the target video.
  • Feature pyramid network is a network model that can fuse multi-scale features.
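The following is a minimal FPN-style fusion sketch for two feature maps of different scales, assuming 2-D feature maps and a shared output width of 256 channels; both are illustrative assumptions rather than the patent's actual configuration.

```python
# Minimal sketch: fuse a fine global feature map with a coarser long-range
# feature map, FPN-style: 1x1 lateral convolutions project both to a common
# channel width, then the coarse map is upsampled and added.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLevelFusion(nn.Module):
    def __init__(self, c_global, c_far, c_out=256):
        super().__init__()
        self.lat_global = nn.Conv2d(c_global, c_out, kernel_size=1)
        self.lat_far = nn.Conv2d(c_far, c_out, kernel_size=1)

    def forward(self, global_feat, far_feat):
        g = self.lat_global(global_feat)
        f = self.lat_far(far_feat)
        # bring the coarser long-range map up to the global map's resolution
        f = F.interpolate(f, size=g.shape[-2:], mode="nearest")
        return g + f  # the "enhanced action features"

fused = TwoLevelFusion(64, 64)(torch.randn(1, 64, 56, 56),
                               torch.randn(1, 64, 28, 28))
```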
  • S320 Determine the abnormal sorting area in the sorting site according to the enhanced action characteristics.
  • For example, the region proposal network (RPN) in a preset Faster Region-based Convolutional Neural Network (Faster R-CNN) can be used to determine the abnormal sorting area based on the enhanced action features.
  • Faster R-CNN is a neural network that generates candidate boxes and detects and filters the candidate boxes through non-maximum suppression to obtain the target. It can include a convolutional layer, an RPN layer and a prediction layer: the convolutional layer is used to extract features; the RPN layer is used to generate candidate boxes and to detect and filter them through non-maximum suppression; and the prediction layer, which can be composed of fully connected layers, is used to predict the target based on the features contained in the filtered candidate boxes.
  • IoU (intersection over union) is used to determine the degree of overlap between a candidate box and the abnormal sorting area: the larger the IoU, the higher the degree of overlap; the smaller the IoU, the lower the degree of overlap.
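A minimal sketch of the IoU measure and the non-maximum suppression filtering discussed above, written in plain Python with boxes given as (x1, y1, x2, y2) tuples; the threshold value is an assumption.

```python
# Minimal sketch: IoU between two boxes, and greedy non-maximum suppression.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring boxes, dropping any box that overlaps an
    already-kept box by more than iou_thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep
```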
  • Figure 3b shows a detection model 300 that can be used for anomaly sorting detection.
  • The detection model 300 includes a first feature extraction layer 301, a second feature extraction layer 302, a feature fusion layer 303, and a prediction layer 304.
  • the first feature extraction layer 301 may be composed of a convolution layer of a first three-dimensional convolutional neural network and is used to extract global action features.
  • the second feature extraction layer 302 may be composed of a convolutional layer of a second three-dimensional convolutional neural network, and is used to extract distant action features. It can be understood that the first three-dimensional convolutional neural network and the second three-dimensional convolutional neural network can be different three-dimensional convolutional neural networks.
  • the feature fusion layer 303 may be composed of a feature pyramid network and is used to fuse global action features and distant action features to obtain enhanced action features.
  • the prediction layer 304 can be composed of the RPN layer and the prediction layer in Faster R-CNN, and is used to predict abnormal sorting areas based on enhanced action features.
  • To summarize, the method for determining the abnormal sorting area includes: obtaining a target video of the sorting site; extracting global action features in the target video and long-range action features in the target video; and determining abnormal sorting areas in the sorting site according to the global action features and the long-range action features.
  • This method determines the abnormal sorting areas in the sorting site based on the global action features and on long-range action features containing long-range action information, which increases the richness of the long-range action information. In particular, when the sorting site is large and the global action features contain little action information, the long-range action features can enhance the action information in the global action features and improve the detection accuracy of abnormal sorting areas.
  • In some embodiments, step S202 “extracting global action features in the target video and long-range action features in the target video” may include the following steps S401 to S404.
  • S401 Divide each video frame in the target video into blocks according to the preset image areas to obtain the image blocks corresponding to each video frame.
  • the image block corresponding to the video frame may refer to the sub-image obtained after cropping the video frame.
  • the electronic device can establish an image coordinate system on the video frames in the target video, and then perform block processing on the video frames according to the coordinate range corresponding to the preset image area on the image coordinate system to obtain corresponding image blocks.
  • the number of image blocks corresponding to each video frame should be the same as the number of preset image areas.
  • Figure 5 shows one case of block processing. Assume that in the example of Figure 5 the preset image areas are four rectangular areas of the same size; block processing of video frame 501 then means dividing the video frame 501 into four rectangular image blocks of the same size, namely image blocks A, B, C and D.
  • the number and size of the preset image areas can be set according to the needs of the actual scene, and this is not limited in the embodiments of the present application.
  • S402 Arrange the image blocks corresponding to the same preset image area in each video frame in time sequence to obtain multiple image block sequences corresponding to the target video.
  • Arranging according to time sequence means: arranging image blocks corresponding to the same image area according to the time sequence of the video frames corresponding to each image block.
  • the preset image area in Figure 6 is four rectangular areas of the same size, that is, segmenting the video frame means dividing the video frame into four rectangular image blocks of the same size.
  • Assume the video frames in the target video are video frames 601, 602, and 603. After video frames 601, 602, and 603 are divided into blocks, image blocks 6011, 6012, 6013, 6014, 6021, 6022, 6023, 6024, 6031, 6032, 6033, and 6034 are obtained.
  • Then a first image block sequence composed of image blocks 6011, 6021 and 6031, a second image block sequence composed of image blocks 6012, 6022 and 6032, a third image block sequence composed of image blocks 6013, 6023 and 6033, and a fourth image block sequence composed of image blocks 6014, 6024 and 6034 can be obtained. It can be understood that the number of image block sequences is the same as the number of preset image areas.
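A minimal sketch of steps S401 and S402: each frame is cut into a fixed grid of equal blocks, and blocks from the same grid cell are stacked in frame order to form one image block sequence per preset image area. The 2x2 grid matches the four-area example above; NumPy arrays are an assumed representation.

```python
# Minimal sketch: divide frames into a grid of blocks and arrange blocks of
# the same grid cell in time order, one sequence per preset image area.
import numpy as np

def image_block_sequences(frames, grid=(2, 2)):
    """frames: time-ordered list of HxWxC arrays.
    Returns {(row, col): array of shape (T, h, w, C)}."""
    rows, cols = grid
    h, w = frames[0].shape[0] // rows, frames[0].shape[1] // cols
    sequences = {}
    for r in range(rows):
        for c in range(cols):
            blocks = [frame[r*h:(r+1)*h, c*w:(c+1)*w] for frame in frames]
            sequences[(r, c)] = np.stack(blocks)  # time-ordered
    return sequences

# e.g. four sequences (blocks A, B, C, D of Figure 5) from three frames:
seqs = image_block_sequences([np.zeros((224, 224, 3))] * 3)
```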
  • S403 Perform feature extraction processing on multiple image block sequences corresponding to the target video through a preset self-attention model to obtain the global action features.
  • The detection model 700 includes a blocking layer 701, an encoding layer 702, a self-attention layer 703, a downsampling layer 704, a feature fusion layer 705, and a prediction layer 706.
  • the blocking layer 701 is used to perform blocking processing and sorting processing on video frames to obtain multiple image block sequences corresponding to the target video.
  • the purpose of steps S401 and S402 can be achieved through the blocking layer 701 .
  • The Patch Partition module in the Swin Transformer model can be used as the blocking layer 701.
  • the encoding layer 702 is used to encode each image block sequence to obtain the characteristics of each image block sequence.
  • The Linear Embedding module in the Swin Transformer model can be used as the encoding layer 702.
  • the self-attention layer 703 is used to perform self-attention processing on each image block sequence to obtain the attention weight of each image block sequence, and perform weighted fusion of the features of each image block sequence according to the corresponding attention weight to obtain Global action features.
  • The Basic Layer module in the Swin Transformer model can be used as the self-attention layer 703.
  • the downsampling layer 704 is used to downsample global action features to obtain distant action features.
  • Another Basic Layer module in the Swin Transformer model, connected to the self-attention layer 703, can be used as the downsampling layer 704.
  • the feature fusion layer 705 may be composed of a feature pyramid network and is used to fuse global action features and distant action features to obtain enhanced action features.
  • the prediction layer 706 can be composed of the RPN layer and the prediction layer in Faster R-CNN, and is used to predict abnormal sorting areas based on enhanced action features.
  • The above-mentioned Swin Transformer is a model that improves computation speed through block-wise (windowed) attention.
  • the electronic device can call the trained detection model 700 to achieve the purpose of step S403 through the encoding layer 702 and the self-attention layer 703 in the trained detection model 700.
  • the accuracy of feature extraction can be improved through the self-attention mechanism.
  • The amount of computation required by the self-attention mechanism can be reduced through blocking, thus achieving a lightweight model while ensuring accuracy.
  • step S403 "performs feature extraction processing on multiple image block sequences corresponding to the target video through a preset self-attention model. , obtain the global action features in the target video", which may include the following content.
  • S710 Perform coding processing on the multiple image block sequences to obtain coding features corresponding to each image block sequence in the multiple image block sequences.
  • the electronic device can call the coding layer 702 in the trained detection model 700 to achieve the purpose of step S710.
  • In this way, the coding features corresponding to each image area can be obtained.
  • S720 Determine the attention weight between coding features corresponding to each image block sequence through a preset self-attention model.
  • the attention weight of each encoding feature is the attention weight between the encoding feature and other encoding features except the encoding feature.
  • the electronic device can call the self-attention layer 703 in the trained detection model 700 to achieve the purpose of step S720.
  • The attention weights and the global action feature can be expressed in the standard scaled dot-product self-attention form, reconstructed here from the symbol definitions that follow:

    z_i = \sum_{j=1}^{n} \mathrm{softmax}_j\!\left( \frac{(W_q x_i)(W_k x_j)^\top}{\sqrt{d_k}} \right) W_v x_j

    b = \sum_{i=1}^{n} z_i

  • where z_i refers to the attention weight between the i-th coding feature and the other coding features except the i-th coding feature; j indexes the other coding features; n refers to the total number of coding features, that is, the total number of image block sequences; x_i refers to the i-th coding vector; d_k refers to the dimension of K, that is, the vector dimension of the product of W_k and a coding vector; W_q, W_k and W_v are all preset parameters; and b refers to the global action feature.
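The NumPy sketch below mirrors the formulas above: the coding features are stacked into a matrix X, queries, keys and values are obtained with the preset parameters W_q, W_k and W_v, attention weights are computed by scaled dot products, and the weighted features are pooled into the global action feature b. The dimensions are illustrative assumptions.

```python
# Minimal sketch: scaled dot-product self-attention over the coding
# features of the image block sequences, pooled into one global feature b.
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """X: (n, d) matrix, one d-dimensional coding vector per sequence."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]                        # dimension of K
    A = softmax(Q @ K.T / np.sqrt(d_k))      # attention weights
    Z = A @ V                                # weighted fusion per feature
    return Z.sum(axis=0)                     # global action feature b

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 32))                 # n=4 sequences, 32-dim codes
b = self_attention(X, *(rng.normal(size=(32, 32)) for _ in range(3)))
```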
  • S404 Perform downsampling processing on the global action features to obtain the distant action features.
  • the downsampling layer 704 in the trained detection model 700 can be called to achieve the purpose of step S404.
  • Because the long-range action information makes up only a small part of the global action features, downsampling the global action features is unlikely to remove the long-range action information; instead, other information unrelated to the long-range action information is compressed, yielding the long-range action features that contain the long-range action information.
  • In steps S401 to S404, features are extracted through the self-attention mechanism, and dividing the video frames into blocks during feature extraction reduces the self-attention computation.
  • Moreover, the detection model 700 used in steps S401 to S404 does not need an extra model branch to extract the long-range action features; it can directly downsample the global action features to obtain them, which reduces the number of parameters and the computation time of the detection model.
  • Because the long-range action features are obtained by downsampling in steps S401 to S404, the scales of the global action features and the long-range action features differ after downsampling.
  • Therefore, the detection model 700 uses a feature pyramid network, which can fuse features of different scales, as the feature fusion layer 705 to fuse the global action features and the long-range action features, ensuring that the enhanced action features do not contain erroneous information caused by scale mismatch during feature fusion.
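As a rough illustration of step S404, the sketch below derives the long-range action features by downsampling a global feature map. Average pooling with stride 2 over a 2-D map is an assumed choice; the patent does not fix the pooling operator.

```python
# Minimal sketch: obtain long-range action features by downsampling the
# global action features, compressing information unrelated to the distant
# area while keeping the (already sparse) long-range action information.
import torch
import torch.nn.functional as F

def long_range_features(global_feats):
    # global_feats: (batch, channels, H, W) -> (batch, channels, H/2, W/2)
    return F.avg_pool2d(global_feats, kernel_size=2, stride=2)

far = long_range_features(torch.randn(1, 256, 56, 56))  # (1, 256, 28, 28)
```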
  • S801 Perform prediction processing on the enhanced action features to obtain the first position of the initial sorting area to be screened in the sorting site.
  • the electronic device can call the prediction layer 706 in the trained detection model 700 to achieve the purpose of step S801.
  • the initial sorting area refers to the sorting area in the sorting site that is predicted by the prediction layer 706 to contain abnormal sorting behavior.
  • the first position may refer to the position of the initial sorting area in the sorting site.
  • the description of the package placement area can be referred to step S201, and details will not be described again.
  • the second location may refer to the location of the package placement area in the sorting yard.
  • the electronic device can train an open-source target detection network to obtain a region detection network.
  • Through the region detection network, the image coordinates of the package placement area in the image coordinate system of the target video are detected, and the preset conversion relationship is then used to transform those image coordinates into the second position in the site coordinate system.
  • electronic devices can train YOLOv2 to obtain a region detection network.
  • YOLOv2 is a network that implements target detection through convolutional layers and fully connected layers.
  • The preset distance threshold is used to evaluate the distance between the initial sorting area and the package placement area. If the distance between the first position and the second position is less than or equal to the preset distance threshold, the initial sorting area and the package placement area are close to each other; in that case the sorter's abnormal sorting action may be considered acceptable, because packages thrown over such a short distance into the placement area are unlikely to be damaged, rather than a disregard for package safety, and the initial sorting area is not regarded as an abnormal sorting area. Otherwise, if the distance is greater than the preset distance threshold, the initial sorting area can be taken as an abnormal sorting area.
  • the preset distance threshold can be stored in the courier company's backend database.
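A minimal sketch of this distance screening, with positions as (x, y) site coordinates and an assumed threshold value:

```python
# Minimal sketch: flag an initial sorting area as abnormal only when it is
# farther from the package placement area than the preset threshold.
import math

def is_abnormal(first_position, second_position, distance_threshold=3.0):
    distance = math.dist(first_position, second_position)  # Python 3.8+
    return distance > distance_threshold

# e.g. a throw landing 5 site-units from the placement area is flagged:
assert is_abnormal((0.0, 0.0), (3.0, 4.0))
```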
  • After the abnormal sorting area is obtained, it can be matched with the preset areas corresponding to the sorting lines in the sorting site to determine the sorting line on which the abnormal sorting behavior occurred, and the alarm information corresponding to that sorting line can then be output to facilitate management by the managers of the sorting site.
  • In some embodiments, after the step “determining the abnormal sorting area in the sorting site according to the global action features and the long-range action features”, the method further includes the following steps S901 to S902.
  • S901 Match the abnormal sorting area with the preset sorting areas in the sorting site to obtain the target sorting area to which the abnormal sorting area belongs and the target sorting line corresponding to the target sorting area.
  • the preset sorting area refers to the preset work area corresponding to the sorting line.
  • If abnormal sorting behavior occurs within a preset sorting area, it means that the sorter who performed the abnormal sorting behavior works on the sorting line corresponding to that preset sorting area.
  • the preset sorting areas and corresponding sorting lines can be stored in the express company's backend database.
  • Specifically, the electronic device can respectively obtain the coordinate areas of the abnormal sorting area and of each preset sorting area in the site coordinate system, and then, based on these coordinate areas, obtain from the preset sorting areas the target sorting area containing the abnormal sorting area and the target sorting line corresponding to the target sorting area.
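A minimal sketch of this matching step, assuming both the abnormal sorting area and the preset sorting areas are axis-aligned rectangles (x1, y1, x2, y2) in site coordinates and that a simple overlap test suffices:

```python
# Minimal sketch: match the abnormal sorting area to the preset sorting
# area (and hence sorting line) whose rectangle it overlaps.
def match_sorting_line(abnormal_area, preset_areas):
    """preset_areas: {sorting_line_id: (x1, y1, x2, y2)}."""
    ax1, ay1, ax2, ay2 = abnormal_area
    for line_id, (px1, py1, px2, py2) in preset_areas.items():
        if ax1 < px2 and px1 < ax2 and ay1 < py2 and py1 < ay2:
            return line_id  # overlapping preset area found
    return None

line = match_sorting_line((2, 2, 4, 4), {"line_1": (0, 0, 3, 3),
                                         "line_2": (5, 0, 8, 3)})
```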
  • S902 Send the alarm information corresponding to the target sorting line to the target terminal.
  • the alarm information may include text information, voice information, etc., which are not limited in the embodiments of the present application.
  • the electronic device can generate corresponding alarm information according to the target sorting line.
  • For example, when the alarm information is text information, the electronic device can generate text such as “violent sorting behavior has occurred on the target sorting line” and send it to the target terminal.
  • the target terminal may refer to a smart phone, a personal computer, a management platform, etc., which is not limited in the embodiments of this application.
  • the target terminal may refer to the site management platform of the sorting site.
  • step S201 “Obtaining the target video of the sorting site” may include the following content.
  • The first initial video and the second initial video may refer to two videos captured by two adjacent video acquisition devices deployed in the sorting site, for example, the videos captured by two adjacent cameras. It should be noted that the shooting times of the first initial video and the second initial video should be the same.
  • the first initial video and the second initial video should contain information on at least part of the same area in the sorting site (for example, contain information on at least one identical sorting line).
  • For the first initial video and the second initial video respectively captured by two adjacent cameras, refer to Figure 11.
  • the area 1101 in the sorting site 1100 is the image capture area of the first camera
  • the area 1102 is the image capture area of the adjacent second camera
  • The area 1103 in the sorting site 1100 is the area captured by both the first camera and the second camera, so the first initial video and the second initial video both contain information about this same area of the sorting site.
  • the electronic device can train an open source target detection network to obtain a sorting line detection network, and then detect the first sorting line and the second sorting line through the sorting line detection network.
  • YOLOv2 can be trained to obtain a sorting line detection network. The description of YOLOv2 can be referred to above, and the details will not be repeated.
  • An incomplete sorting line means that the entire sorting line is not captured in the corresponding video. For example, when the sorting line is too long and the entire sorting line cannot be photographed within the field of view of the corresponding video acquisition device, the sorting line is an incomplete sorting line for the video captured by the video acquisition device.
  • If the first sorting line and the second sorting line include the same sorting sub-line, and the sorting sub-line is incomplete in at least one of the first sorting line and the second sorting line, the video frames in the first initial video and the video frames in the second initial video are spliced, and the spliced video frames are determined as the target video.
  • Specifically, the electronic device can obtain the first image coordinates of the first sorting line in the image coordinate system of the first initial video and the second image coordinates of the second sorting line in the image coordinate system of the second initial video, convert the first image coordinates into site coordinates in the site coordinate system through the preset conversion relationship corresponding to the first initial video, and convert the second image coordinates into site coordinates in the site coordinate system through the preset conversion relationship corresponding to the second initial video. If a first sorting line and a second sorting line have the same site coordinates, it means that they contain the same sorting sub-line.
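The sketch below illustrates this coordinate comparison, assuming each camera's preset conversion relationship is a 3x3 homography mapping image coordinates to site coordinates; the homography form and the tolerance are illustrative assumptions.

```python
# Minimal sketch: map detections from two cameras into the shared site
# coordinate system and test whether they name the same sorting sub-line.
import numpy as np

def to_site_coords(image_point, H):
    """Apply a 3x3 homography H to an (x, y) image point."""
    x, y, w = H @ np.array([image_point[0], image_point[1], 1.0])
    return np.array([x / w, y / w])

def same_sub_line(point_cam1, point_cam2, H1, H2, tolerance=0.5):
    """Same sub-line if the site coordinates coincide within tolerance."""
    d = to_site_coords(point_cam1, H1) - to_site_coords(point_cam2, H2)
    return float(np.linalg.norm(d)) < tolerance
```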
  • In some embodiments, the first sorting line and the second sorting line output by the sorting line detection network carry completeness information, which the electronic device can read directly.
  • If the information carried by the first sorting line and/or the second sorting line indicates an incomplete sorting line, the video frames in the first initial video and the video frames in the second initial video are spliced to obtain the spliced video frames, and the target video is composed of the spliced video frames.
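A minimal sketch of the splicing step, assuming time-aligned frames from the two cameras are simply joined side by side along the width axis (overlap handling is omitted for brevity):

```python
# Minimal sketch: splice time-aligned frames from the two initial videos.
import numpy as np

def splice_frames(frames_a, frames_b):
    """frames_a, frames_b: time-aligned lists of HxWxC frames."""
    return [np.concatenate([fa, fb], axis=1)  # join along the width axis
            for fa, fb in zip(frames_a, frames_b)]
```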
  • The embodiment of the present application also provides a device for determining an abnormal sorting area; Figure 12 is a schematic structural diagram of the device for determining an abnormal sorting area in an embodiment of the present application.
  • the device 1200 for determining an abnormal sorting area includes:
  • the acquisition unit 1201 is used to acquire the target video of the sorting site
  • Extraction unit 1202, configured to extract global action features in the target video and long-range action features in the target video, where the long-range action features include the long-range action information in the target video;
  • the determination unit 1203 is configured to determine an abnormal sorting area in the sorting site according to the global action characteristics and the long-range action characteristics.
  • In some embodiments, the extraction unit 1202 is specifically configured to: divide each video frame in the target video into blocks according to the preset image areas to obtain the image blocks corresponding to each video frame; arrange the image blocks corresponding to the same preset image area of each video frame in time sequence to obtain multiple image block sequences corresponding to the target video; perform feature extraction processing on the multiple image block sequences corresponding to the target video through a preset self-attention model to obtain the global action features; and perform downsampling processing on the global action features to obtain the long-range action features.
  • In some embodiments, when the extraction unit 1202 performs feature extraction processing on the multiple image block sequences corresponding to the target video through the preset self-attention model to obtain the global action features, it is specifically configured to: encode the multiple image block sequences to obtain the coding features corresponding to each image block sequence; determine, through the preset self-attention model, the attention weights between the coding features corresponding to each image block sequence; and perform weighted fusion processing on the coding features corresponding to each image block sequence according to those attention weights to obtain the global action features.
  • In some embodiments, when the extraction unit 1202 divides each video frame in the target video into blocks according to the preset image area to obtain the image block corresponding to each video frame, it is specifically configured to: perform the block processing on each video frame according to the coordinate range of the preset image area in an image coordinate system, where the image coordinate system is established on each video frame in the target video.
  • In some embodiments, the time sequence refers to the time sequence of the video frames.
  • In some embodiments, the determination unit 1203 is specifically configured to: fuse the global action features and the long-range action features to obtain enhanced action features of the target video; and determine the abnormal sorting area in the sorting site based on the enhanced action features.
  • In some embodiments, when determining the abnormal sorting area in the sorting site based on the enhanced action features, the determination unit 1203 is specifically configured to: perform prediction processing on the enhanced action features to obtain the first position of the initial sorting area to be screened in the sorting site; obtain the second position of the package placement area in the sorting site; and if the distance between the first position and the second position is greater than the preset distance threshold, determine the initial sorting area as the abnormal sorting area.
  • In some embodiments, when the determination unit 1203 performs prediction processing on the enhanced action features to obtain the initial sorting area to be screened in the sorting site, it is specifically configured to: generate multiple candidate boxes from the enhanced action features through a region-based convolutional neural network, and filter the multiple candidate boxes through non-maximum suppression to obtain the initial sorting area.
  • In some embodiments, when the determination unit 1203 filters the multiple candidate boxes through non-maximum suppression to obtain the initial sorting area, it is specifically configured to: screen, through non-maximum suppression, at least one candidate box whose intersection-over-union ratio is greater than a preset value from among the multiple candidate boxes; and predict the initial sorting area based on the at least one candidate box.
  • In some embodiments, the determination unit 1203 is also used to match the abnormal sorting area with the preset sorting areas in the sorting site to obtain the target sorting area to which the abnormal sorting area belongs and the target sorting line corresponding to the target sorting area.
  • In some embodiments, the acquisition unit 1201 is specifically configured to: acquire the first initial video and the second initial video captured by multiple video acquisition devices in the sorting site; detect the first sorting line in the first initial video and the second sorting line in the second initial video; and if the first sorting line and the second sorting line include the same sorting sub-line, and the sorting sub-line is incomplete in at least one of the first sorting line and the second sorting line, splice the video frames in the first initial video with the video frames in the second initial video and determine the spliced video frames as the target video.
  • each of the above units can be implemented as an independent entity, or can be combined in any way to be implemented as the same or several entities.
  • For the specific implementation of each of the above units, please refer to the previous method embodiments; details are not repeated here.
  • Since the device for determining an abnormal sorting area can perform the steps in the method for determining an abnormal sorting area in any embodiment, it can achieve the beneficial effects achievable by the method for determining an abnormal sorting area in any embodiment of the present application; see the previous description for details, which are not repeated here.
  • Figure 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • the electronic device provided by an embodiment of the present application includes a processor 1301.
  • the processor 1301 is configured to implement the steps of the method for determining an abnormal sorting area in any of the above embodiments when executing the computer program stored in the memory 1302.
  • a computer program may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 1302 and executed by the processor 1301 to complete the embodiments of the present application.
  • One or more modules/units may be a series of computer program instruction segments capable of completing specific functions. The instruction segments are used to describe the execution process of the computer program in the computer device.
  • the electronic device may include, but is not limited to, the processor 1301 and the memory 1302. Those skilled in the art can understand that the illustration is only an example of the electronic device and does not constitute a limitation on the electronic device.
  • the electronic device may include more or fewer components than those shown in the figures, or some components may be combined, or different components may be used.
  • the processor 1301 may be a central processing unit (Central Processing Unit, CPU), another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
  • the processor is the control center of the electronic device and connects the various parts of the entire electronic device through various interfaces and lines.
  • the memory 1302 may be used to store computer programs and/or modules.
  • the processor 1301 implements various functions of the computer device by running or executing the computer programs and/or modules stored in the memory 1302 and calling data stored in the memory 1302.
  • the memory 1302 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playback function, an image playback function, etc.), etc.; the data storage area may store data created according to the use of the electronic device (such as audio data, video data, etc.), etc.
  • the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card (Flash Card), at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage device.
  • embodiments of the present application provide a storage medium on which a computer program is stored.
  • when the computer program is executed by a processor, the steps in the method for determining an abnormal sorting area described in any embodiment of the present application are performed; for specific operations, please refer to the description of the method for determining an abnormal sorting area in any embodiment, which will not be described again here.
  • the storage medium may include: a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, an optical disk, etc.
  • since the instructions stored in the storage medium can execute the steps in the method for determining an abnormal sorting area described in any embodiment of the present application, the beneficial effects achievable by the method for determining an abnormal sorting area described in any embodiment of the present application can be achieved; see the previous description for details, which will not be repeated here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Psychiatry (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Closed-Circuit Television Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present application discloses a method, an apparatus, an electronic device, and a storage medium for determining an abnormal sorting area, including: acquiring a target video of a sorting site; extracting global action features of the target video and far-view action features of the target video, wherein the far-view action features include far-view action information in the target video; and determining an abnormal sorting area in the sorting site based on the global action features and the far-view action features. According to the method for determining an abnormal sorting area provided by the embodiments of the present application, the abnormal sorting area in the sorting site is determined based on the global action features and the far-view action features containing far-view action information, which can enrich the far-view action information. Especially when the sorting site is large, the global action features contain little action information; the far-view action features can enhance the action information in the global action features, thereby improving the detection accuracy of the abnormal sorting area.

Description

Method, apparatus, electronic device, and storage medium for determining an abnormal sorting area — Technical Field
The present application relates to the technical field of logistics sorting, and in particular to a method, an apparatus, an electronic device, and a storage medium for determining an abnormal sorting area.
Background of the Invention
The task of detecting violent sorting actions is very difficult. In logistics scenarios, the scene covered by a surveillance camera is large, while the specific area where throwing occurs is small. In addition, there are many moving objects in the picture, including forklifts, goods moving on belts, people, and so on, which makes it difficult for methods based on manually set rules to detect violent sorting accurately.
Summary of the Invention
The present application provides a method, an apparatus, an electronic device, and a storage medium for determining an abnormal sorting area, aiming to solve the problem that current abnormal sorting detection is inaccurate.
In a first aspect, the present application provides a method for determining an abnormal sorting area, including: acquiring a target video of a sorting site; extracting global action features of the target video and far-view action features of the target video, wherein the far-view action features include far-view action information in the target video; and determining an abnormal sorting area in the sorting site based on the global action features and the far-view action features.
In a possible implementation of the present application, the extracting global action features of the target video and far-view action features of the target video includes: performing block processing on each video frame in the target video according to preset image regions to obtain image blocks corresponding to each video frame; arranging the image blocks of each video frame that correspond to the same preset image region in temporal order to obtain multiple image-block sequences corresponding to the target video; performing feature extraction processing on the multiple image-block sequences corresponding to the target video through a preset self-attention model to obtain the global action features; and performing downsampling processing on the global action features to obtain the far-view action features.
In a possible implementation of the present application, the performing feature extraction processing on the multiple image-block sequences corresponding to the target video through a preset self-attention model to obtain the global action features includes: performing encoding processing on the multiple image-block sequences to obtain an encoded feature corresponding to each of the multiple image-block sequences; determining, through the preset self-attention model, attention weights between the encoded features corresponding to the image-block sequences; and performing weighted fusion processing on the encoded features corresponding to the image-block sequences according to the attention weights corresponding to the encoded features, to obtain the global action features.
In a possible implementation of the present application, the performing block processing on each video frame in the target video according to preset image regions to obtain image blocks corresponding to each video frame includes: performing the block processing on each video frame according to coordinate ranges corresponding to the preset image regions in an image coordinate system to obtain the image blocks corresponding to each video frame, wherein the image coordinate system is established on each video frame in the target video.
In a possible implementation of the present application, the temporal order includes the temporal order of each video frame.
In a possible implementation of the present application, the determining an abnormal sorting area in the sorting site based on the global action features and the far-view action features includes: fusing the global action features and the far-view action features to obtain enhanced action features of the target video; and determining the abnormal sorting area in the sorting site based on the enhanced action features.
In a possible implementation of the present application, the determining the abnormal sorting area in the sorting site based on the enhanced action features includes: performing prediction processing on the enhanced action features to obtain a first position of an initial sorting area to be screened in the sorting site; obtaining a second position of a package placement area in the sorting site; and if the distance between the first position and the second position is greater than a preset distance threshold, determining the initial sorting area as the abnormal sorting area.
In a possible implementation of the present application, the performing prediction processing on the enhanced action features to obtain an initial sorting area to be screened in the sorting site includes: generating multiple candidate boxes through a region-based convolutional neural network based on the enhanced action features, and filtering the multiple candidate boxes through non-maximum suppression to obtain the initial sorting area.
In a possible implementation of the present application, the filtering the multiple candidate boxes through non-maximum suppression to obtain the initial sorting area includes: screening, through the non-maximum suppression, at least one candidate box whose intersection-over-union is greater than a preset value from the multiple candidate boxes; and predicting the initial sorting area based on the at least one candidate box.
In a possible implementation of the present application, the method for determining an abnormal sorting area further includes: matching the abnormal sorting area with preset sorting areas in the sorting site to obtain a target sorting area to which the abnormal sorting area belongs and a target sorting line corresponding to the target sorting area; and sending alarm information corresponding to the target sorting line to a target terminal.
In a possible implementation of the present application, the acquiring a target video of a sorting site includes: acquiring a first initial video and a second initial video respectively captured by multiple video acquisition devices in the sorting site; detecting a first sorting line in the first initial video and a second sorting line in the second initial video; and if the first sorting line and the second sorting line contain the same sorting sub-line, and the sorting sub-line is incomplete in at least one of the first sorting line and the second sorting line, splicing the video frames in the first initial video with the video frames in the second initial video, and determining the spliced video frames as the target video.
In a second aspect, the present application provides an apparatus for determining an abnormal sorting area, including: an acquisition unit configured to acquire a target video of a sorting site; an extraction unit configured to extract global action features of the target video and far-view action features of the target video, wherein the far-view action features include far-view action information in the target video; and a determination unit configured to determine an abnormal sorting area in the sorting site based on the global action features and the far-view action features.
In a third aspect, the present application further provides an electronic device, including a processor, a memory, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps in any method for determining an abnormal sorting area of the first aspect are implemented.
In a fourth aspect, the present application further provides a storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps in any method for determining an abnormal sorting area of the first aspect are implemented.
According to the method for determining an abnormal sorting area provided by the embodiments of the present application, the abnormal sorting area in the sorting site is determined based on the global action features and the far-view action features containing far-view action information, which can enrich the far-view action information. Especially when the sorting site is large, the global action features contain little action information; the far-view action features can enhance the action information in the global action features, thereby improving the detection accuracy of the abnormal sorting area.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of an application scenario of the method for determining an abnormal sorting area provided by an embodiment of the present application.
FIG. 2 is a schematic flowchart of the method for determining an abnormal sorting area provided in an embodiment of the present application.
FIG. 3a is a schematic flowchart of the method for determining an abnormal sorting area provided in an embodiment of the present application.
FIG. 3b is a schematic diagram of a detection model provided in an embodiment of the present application.
FIG. 4 is a schematic flowchart of the method for determining an abnormal sorting area provided in an embodiment of the present application.
FIG. 5 is a schematic diagram of preset image regions provided in an embodiment of the present application.
FIG. 6 is a schematic diagram of image-block sequences provided in an embodiment of the present application.
FIG. 7a is a schematic diagram of a detection model provided in an embodiment of the present application.
FIG. 7b is a schematic flowchart of the method for determining an abnormal sorting area provided in an embodiment of the present application.
FIG. 8 is a schematic flowchart of the method for determining an abnormal sorting area provided in an embodiment of the present application.
FIG. 9 is a schematic flowchart of the method for determining an abnormal sorting area provided in an embodiment of the present application.
FIG. 10 is a schematic flowchart of the method for determining an abnormal sorting area provided in an embodiment of the present application.
FIG. 11 is a schematic flowchart of the method for determining an abnormal sorting area provided in an embodiment of the present application.
FIG. 12 is a schematic structural diagram of the apparatus for determining an abnormal sorting area provided in an embodiment of the present application.
FIG. 13 is a schematic structural diagram of the electronic device provided in an embodiment of the present application.
Modes for Carrying Out the Invention
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present application.
In the description of the embodiments of the present application, it should be understood that the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of such features. In the description of the embodiments of the present application, "multiple" means two or more, unless otherwise explicitly and specifically defined.
The following description is given to enable any person skilled in the art to make and use the present application. In the following description, details are set forth for the purpose of explanation. It should be understood that those of ordinary skill in the art will recognize that the present application can also be implemented without these specific details. In other instances, well-known processes are not described in detail, so as to avoid obscuring the description of the embodiments of the present application with unnecessary details. Therefore, the present application is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed in the embodiments of the present application.
At present, the most widely used methods are based on deep learning; deep-learning-based methods can automatically learn the features of violent sorting, thereby eliminating the need for manually set rules. However, the accuracy of existing deep-learning-based methods is not high, and they cannot accurately detect areas where violent sorting occurs.
To solve the above problems, embodiments of the present application provide a method, an apparatus, an electronic device, and a storage medium for determining an abnormal sorting area, which are described in detail below.
In an example, the apparatus for determining an abnormal sorting area may be integrated in an electronic device, and the electronic device may be a server, a terminal, or another device.
The execution subject of the method for determining an abnormal sorting area in the embodiments of the present application may be the apparatus for determining an abnormal sorting area provided by the embodiments of the present application, or different types of electronic devices integrating the apparatus, such as a server device, a physical host, or user equipment (User Equipment, UE). The apparatus for determining an abnormal sorting area may be implemented in hardware or software, and the UE may specifically be a terminal device such as a smartphone, a tablet computer, a laptop computer, a palmtop computer, a desktop computer, or a personal digital assistant (Personal Digital Assistant, PDA). The execution subject of the method may also be an electronic device with a processor, where the processor is configured to execute the method for determining an abnormal sorting area of the embodiments of the present application.
The electronic device may operate independently, or it may operate as a device cluster.
FIG. 1 is a schematic scenario diagram of the system for determining an abnormal sorting area provided by an embodiment of the present application. The method for determining an abnormal sorting area provided by the embodiments of the present application can be applied to the system for determining an abnormal sorting area shown in FIG. 1. Referring to FIG. 1, the system for determining an abnormal sorting area may include an electronic device 101.
The electronic device 101 may be an independent server, or a server network or server cluster composed of servers, including but not limited to a computer, a network host, a single network server, a set of multiple network servers, or a cloud server composed of multiple servers. A cloud server is composed of a large number of computers or network servers based on cloud computing (Cloud Computing).
Those skilled in the art can understand that the application environment shown in FIG. 1 is only one application scenario of the solution of the present application and does not constitute a limitation on the application scenarios of the solution; other application environments may include more or fewer electronic devices than shown in FIG. 1. For example, only one electronic device is shown in FIG. 1; it can be understood that the system for determining an abnormal sorting area may also include one or more other electronic devices, which is not specifically limited here.
In addition, as shown in FIG. 1, the system for determining an abnormal sorting area may further include a memory 102 for storing data, such as the target video.
A communication connection may be established between the electronic device 101 and the memory 102, and the communication connection may be a wired or wireless network connection. Optionally, in terms of deployment, the electronic device 101 and the memory 102 may be deployed on the same physical device or on different physical devices. When the electronic device 101 and the memory 102 are deployed on different physical devices, the two may be deployed in the same local area network or in different local area networks.
The process of determining an abnormal sorting area is described in detail below with reference to the solution of the present application and the system for determining an abnormal sorting area shown in FIG. 1.
The electronic device 101 communicates with the memory 102 to acquire the target video of the sorting site from the memory 102; the electronic device 101 extracts global action features of the target video and far-view action features of the target video, and determines the abnormal sorting area in the sorting site based on the global action features and the far-view action features.
In an example, the apparatus for determining an abnormal sorting area may be integrated in the electronic device 101, and the apparatus for determining an abnormal sorting area is used to execute the above method for determining an abnormal sorting area provided by the embodiments of the present application.
It should be noted that the schematic scenario diagram of the system for determining an abnormal sorting area shown in FIG. 1 is only an example. The system for determining an abnormal sorting area and the scenario described in the embodiments of the present application are intended to explain the technical solutions of the embodiments of the present application more clearly and do not constitute a limitation on the technical solutions provided by the embodiments of the present application. Those of ordinary skill in the art can understand that, with the evolution of the system for determining an abnormal sorting area and the emergence of new business scenarios, the technical solutions provided by the embodiments of the present invention are equally applicable to similar technical problems.
The method for determining an abnormal sorting area provided by the embodiments of the present application is described in more detail below with reference to FIGS. 2 to 11.
In the embodiments of the present application, the electronic device is used as the execution subject; for simplicity and ease of description, the execution subject will be omitted in subsequent method embodiments. The method for determining an abnormal sorting area includes: acquiring a target video of a sorting site; extracting global action features of the target video and far-view action features of the target video, wherein the far-view action features include far-view action information in the target video; and determining an abnormal sorting area in the sorting site based on the global action features and the far-view action features.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of the method for determining an abnormal sorting area provided by an embodiment of the present application. It should be noted that, although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from that here. The method for determining an abnormal sorting area may specifically include the following steps S201 to S203.
S201: Acquire a target video of a sorting site.
The method for determining an abnormal sorting area provided by the embodiments of the present application can be used in the logistics and express delivery field to detect whether sorters in the sorting site engage in violent sorting; if violent sorting exists, it is considered that abnormal sorting exists in the sorting site. For example, the method can detect whether a sorter moves packages from a package transport device to a package placement area through actions that easily damage packages, such as throwing. The sorting site may refer to a sorting plant affiliated with an express company.
The target video refers to the video to be detected. For example, the electronic device may use real-time video captured by a video acquisition device as the target video to detect whether abnormal sorting currently exists in the sorting site. For example, the electronic device may extract frames from the video stream uploaded by the video acquisition device to a preset database; the frame extraction rate may be 6 frames per second, and when the number of extracted frames reaches a preset value, the extracted video frames are composed into the target video. If the frame extraction rate is 6 frames per second and the preset value is 6 frames, the electronic device acquires, from the above video stream, video frames whose timestamps are within 1 second of the current time, randomly selects 6 video frames from them, and forms the target video from these 6 video frames. If the frame extraction rate is 6 frames per second and the preset value is 12 frames, the electronic device acquires, from the above video stream, first video frames whose timestamps are within 1 second of the current time, randomly selects 6 second video frames from the first video frames, then acquires, from the above video stream, third video frames whose timestamps are between 2 seconds and 1 second from the current time, randomly selects 6 fourth video frames from the third video frames, and forms the target video from the second video frames and the fourth video frames.
The preset database may refer to a database used by the express company's back end to store videos. The above preset value may be set according to the actual scenario, and the values in the above examples should not be taken as a limitation on the embodiments of the present application. The video acquisition device may refer to a camera or the like installed in the sorting site.
In other embodiments, the express company may also use the method for determining an abnormal sorting area provided by the embodiments of the present application to check historically captured videos and determine whether any sorter has engaged in violent sorting. For example, the express company may, during the monthly sorter violation inspection, use the method for determining an abnormal sorting area provided by the embodiments of the present application to determine whether any sorter has engaged in violent sorting. Similarly, the electronic device may execute the method provided by the embodiments of the present application by extracting frames from historical videos, which will not be described in detail.
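For illustration only, the frame-sampling scheme above can be sketched in Python as follows. This is a minimal sketch, assuming OpenCV (cv2) is available for decoding and using a hypothetical stream address; the 6-frames-per-second rate and the preset value of 6 mirror the example above and are not prescribed by the embodiments.

```python
import random
import cv2  # OpenCV, used here only for video decoding

def sample_target_video(stream_url: str, fps_sample: int = 6, preset_frames: int = 6):
    """Collect frames from roughly the last second of a stream and randomly
    keep `preset_frames` of them to form the target video (a list of frames)."""
    cap = cv2.VideoCapture(stream_url)
    native_fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = max(int(native_fps // fps_sample), 1)   # decode every `step`-th frame

    window = []        # frames whose timestamps fall within the last second
    frame_idx = 0
    while len(window) < fps_sample:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % step == 0:
            window.append(frame)
        frame_idx += 1
    cap.release()

    if len(window) < preset_frames:
        return None    # not enough frames yet; the caller retries later
    return random.sample(window, preset_frames)

# Usage (the stream address is a hypothetical placeholder):
# target_video = sample_target_video("rtsp://sorting-site/camera1")
```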
S202: Extract global action features of the target video and far-view action features of the target video, wherein the far-view action features include far-view action information in the target video.
The global action features refer to temporal features of the action information of all sorters in the target video. For example, the electronic device may perform feature extraction processing on the target video through the convolutional layers of a preset 3D convolutional neural network (3D Convolutional Neural Network) to obtain the global action features.
A 3D convolutional neural network is a convolutional neural network that extracts features through 3D convolution kernels; through a 3D convolutional neural network, depth information in the video can be extracted. In the embodiments of the present application, an open-source 3D convolutional neural network such as PoseC3D may be used to extract the global action features.
The preset 3D convolutional neural network may be obtained by training an initial 3D convolutional neural network. For example, the initial 3D convolutional neural network may be trained by the following method:
acquiring sample videos carrying action labels, wherein the action labels include the type of the sorter's sorting action in the sample video and the region where the sorting action occurs in the sample video, and the action labels are obtained by manual annotation;
extracting temporal features from the sample video through the convolutional layers of the initial 3D convolutional neural network, and predicting the predicted action type corresponding to the sample video based on the temporal features through the fully connected layers of the initial 3D convolutional neural network;
adjusting the parameters of the initial 3D convolutional neural network according to the predicted action type and the action labels, to obtain the preset 3D convolutional neural network.
Through the above training method, the convolutional layers of the initial 3D convolutional neural network can be trained, and the trained convolutional layers can effectively extract the global action features of the target video.
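For ease of understanding, the training procedure above can be sketched as follows. This is a minimal PyTorch sketch using a generic stand-in 3D CNN rather than an actual open-source network such as PoseC3D; the layer sizes, class count, clip shape, and optimizer settings are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class Tiny3DConvNet(nn.Module):
    """Stand-in 3D CNN: conv layers extract temporal features,
    a fully connected head predicts the sorting-action type."""
    def __init__(self, num_action_types: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, num_action_types)

    def forward(self, clip):                   # clip: (B, 3, T, H, W)
        feat = self.features(clip).flatten(1)  # temporal features
        return self.classifier(feat)           # predicted action type

model = Tiny3DConvNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One illustrative parameter-adjustment step on random stand-in data
# (a batch of 6-frame 112x112 clips with manually annotated labels).
clips = torch.randn(4, 3, 6, 112, 112)
labels = torch.randint(0, 2, (4,))
loss = criterion(model(clips), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```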
The far-view action features refer to temporal features of the far-view action information in the target video. The far-view action information refers to the action information contained in the image region of the target video that corresponds to the far-view region, and the far-view region refers to the region of the sorting site that is relatively far from the video acquisition device.
It can be understood that the difference between the global action features and the far-view action features is that the global action features include both far-view action information and near-view action information, while the far-view action features only include far-view action information; the global action features contain more types of action information.
Similarly, the electronic device may extract the far-view action features of the target video through a preset 3D convolutional neural network. For the specific training method, reference may be made to the above; it suffices to change the action labels of the sample videos to the type of the sorter's far-view action in the sample video and the region where the far-view action occurs in the sample video.
S203: Determine an abnormal sorting area in the sorting site based on the global action features and the far-view action features.
The abnormal sorting area may refer to a region in the sorting site where abnormal sorting behavior occurs; there may be one or more such regions.
The electronic device may fuse the global action features and the far-view action features, perform prediction based on the fused features to obtain the region in the target video where abnormal sorting behavior occurs, and then obtain the abnormal sorting area according to a preset conversion relationship between the image region of the target video and the site region of the sorting site.
The preset conversion relationship may refer to a coordinate conversion relationship: an image coordinate system is established on the target video, a site coordinate system is established on the sorting site, the conversion relationship between coordinates in the image coordinate system and coordinates in the site coordinate system is used as the preset conversion relationship, and the preset conversion relationship is finally stored in the express company's back-end database; when executing step S203, the electronic device reads the preset conversion relationship from the back-end database.
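One common way to realize such an image-to-site coordinate conversion is a planar homography; the embodiments do not prescribe this particular conversion. The following is a minimal sketch assuming OpenCV and four hand-measured point correspondences, where all coordinate values are hypothetical placeholders.

```python
import numpy as np
import cv2

# Four reference points measured in the image coordinate system (pixels)
# and in the site coordinate system (e.g. meters); values are placeholders.
img_pts  = np.float32([[100, 400], [1180, 420], [900, 120], [300, 110]])
site_pts = np.float32([[0, 0],     [30, 0],     [30, 20],   [0, 20]])

H, _ = cv2.findHomography(img_pts, site_pts)  # the preset conversion relationship

def image_to_site(x: float, y: float) -> tuple:
    """Map a point from the image coordinate system to the site coordinate system."""
    pt = cv2.perspectiveTransform(np.float32([[[x, y]]]), H)
    return float(pt[0, 0, 0]), float(pt[0, 0, 1])

print(image_to_site(640, 260))  # site coordinates of an image point
```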
It can be understood that the purpose of fusing the global action features and the far-view action features is to enrich the far-view action information, which in turn improves the detection accuracy of the abnormal sorting area. Since the sorting site is large and the far-view region is usually far from the video acquisition device, the image region corresponding to the far-view region in the target video is small, and the global action features contain little far-view action information. If only the global action features were extracted through the preset 3D convolutional neural network and the abnormal sorting area were determined based on the global action features alone, the electronic device might not be able to accurately detect the abnormal sorting area when it is part of the far-view region. Fusing the global action features with the far-view action features enriches the far-view action information and therefore improves the detection accuracy of the abnormal sorting area, which is especially suitable for application scenarios with large sites.
For example, as shown in FIG. 3a, step S203, "determining an abnormal sorting area in the sorting site based on the global action features and the far-view action features", may include the following.
S310: Fuse the global action features and the far-view action features to obtain enhanced action features of the target video.
The enhanced action features refer to the features obtained by fusing the global action features and the far-view action features; it can be understood that the enhanced action features contain rich far-view action information as well as near-view action information.
In some embodiments, the electronic device may fuse the global action features and the far-view action features through a preset Feature Pyramid Network (FPN) to obtain the enhanced action features of the target video. A feature pyramid network is a network model that can fuse multi-scale features.
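For illustration only, FPN-style fusion of two feature maps of different scales can be sketched as follows in PyTorch. This is a simplified two-scale sketch rather than a full feature pyramid network, and the channel sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoScaleFPNFusion(nn.Module):
    """FPN-style fusion of a fine-scale global feature map with a
    coarser far-view feature map: lateral 1x1 convs align channels,
    the coarse map is upsampled and added to the fine one."""
    def __init__(self, c_global: int, c_far: int, c_out: int = 256):
        super().__init__()
        self.lat_global = nn.Conv2d(c_global, c_out, kernel_size=1)
        self.lat_far = nn.Conv2d(c_far, c_out, kernel_size=1)
        self.smooth = nn.Conv2d(c_out, c_out, kernel_size=3, padding=1)

    def forward(self, f_global, f_far):
        g = self.lat_global(f_global)                        # fine scale
        p = self.lat_far(f_far)                              # coarse scale
        p = F.interpolate(p, size=g.shape[-2:], mode="nearest")
        return self.smooth(g + p)                            # enhanced features

fusion = TwoScaleFPNFusion(c_global=96, c_far=192)
enhanced = fusion(torch.randn(1, 96, 56, 56), torch.randn(1, 192, 28, 28))
print(enhanced.shape)  # torch.Size([1, 256, 56, 56])
```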
S320: Determine the abnormal sorting area in the sorting site based on the enhanced action features.
In some embodiments, when executing step S320, the electronic device may predict the abnormal sorting area based on the enhanced action features through the Region Proposal Network (RPN) layer and the prediction layer of a preset Faster Region-Convolutional Neural Network (Faster R-CNN).
Faster R-CNN is a neural network that obtains targets by generating candidate boxes and by detecting and screening the candidate boxes through non-maximum suppression; it may include convolutional layers, an RPN layer, and a prediction layer. The convolutional layers are used to extract features; the RPN layer is used to generate candidate boxes and to detect and screen them through non-maximum suppression; the prediction layer may be composed of fully connected layers and is used to predict targets based on the features contained in the screened candidate boxes.
It should be noted that, since the probability of violent sorting behavior appearing in the target video is not high and the number of abnormal sorting areas is small, if the abnormal sorting area is to be predicted through Faster R-CNN, then when screening the candidate boxes through non-maximum suppression, only the n candidate boxes with the largest intersection-over-union (Intersection over Union, IoU) may be extracted, and the abnormal sorting area is predicted based on these n candidate boxes.
The IoU is used to judge the degree of overlap between a candidate box and the abnormal sorting area: the larger the IoU, the higher the degree of overlap between the candidate box and the abnormal sorting area; the smaller the IoU, the lower the degree of overlap between the candidate box and the abnormal sorting area.
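For ease of understanding, the IoU computation and the screening of the n candidate boxes with the largest IoU can be sketched as follows; the (x1, y1, x2, y2) box format and the example values are assumptions.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def top_n_by_iou(candidates, reference, n=3):
    """Keep the n candidate boxes that overlap the reference region most."""
    scored = sorted(candidates, key=lambda b: iou(b, reference), reverse=True)
    return scored[:n]

boxes = [(10, 10, 50, 50), (12, 12, 48, 52), (200, 200, 240, 260)]
print(top_n_by_iou(boxes, reference=(11, 11, 49, 51), n=2))
```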
For ease of understanding, FIG. 3b shows a detection model 300 that can be used for abnormal sorting detection. The detection model 300 includes a first feature extraction layer 301, a second feature extraction layer 302, a feature fusion layer 303, and a prediction layer 304.
The first feature extraction layer 301 may be composed of the convolutional layers of a first 3D convolutional neural network and is used to extract the global action features.
The second feature extraction layer 302 may be composed of the convolutional layers of a second 3D convolutional neural network and is used to extract the far-view action features. It can be understood that the first 3D convolutional neural network and the second 3D convolutional neural network may be different 3D convolutional neural networks.
The feature fusion layer 303 may be composed of a feature pyramid network and is used to fuse the global action features and the far-view action features to obtain the enhanced action features.
The prediction layer 304 may be composed of the RPN layer and prediction layer of Faster R-CNN and is used to predict the abnormal sorting area based on the enhanced action features.
In summary, the method for determining an abnormal sorting area provided by the embodiments of the present application includes: acquiring a target video of a sorting site; extracting global action features of the target video and far-view action features of the target video; and determining an abnormal sorting area in the sorting site based on the global action features and the far-view action features.
It can be seen that the method for determining an abnormal sorting area provided by the embodiments of the present application determines the abnormal sorting area in the sorting site based on the global action features and the far-view action features containing far-view action information, which can enrich the far-view action information. Especially when the sorting site is large, the global action features contain little action information; the far-view action features can enhance the action information in the global action features and improve the detection accuracy of the abnormal sorting area.
To further improve the detection accuracy of the abnormal sorting area, in some embodiments the global action features may be extracted through a self-attention mechanism, and when performing feature extraction through the self-attention mechanism, dividing the video frames into blocks can reduce the computational cost of self-attention. Referring to FIG. 4, step S202, "extracting global action features of the target video and far-view action features of the target video", may include the following.
S401: Perform block processing on each video frame in the target video according to preset image regions to obtain image blocks corresponding to each video frame.
An image block corresponding to a video frame may refer to a sub-image obtained by cropping the video frame.
For example, the electronic device may establish an image coordinate system on the video frames of the target video and then perform block processing on each video frame according to the coordinate ranges corresponding to the preset image regions in the image coordinate system, to obtain the corresponding image blocks. It can be understood that the number of image blocks corresponding to each video frame should be the same as the number of preset image regions. For ease of understanding, FIG. 5 shows a case of block processing: suppose in the example of FIG. 5 the preset image regions are 4 rectangular regions of the same size, so performing block processing on video frame 501 means dividing video frame 501 into 4 rectangular image blocks of the same size, i.e., image blocks A, B, C, and D. The number and size of the preset image regions may be set according to the needs of the actual scenario, which is not limited in the embodiments of the present application.
It should be noted that, to ensure that no duplicate information is extracted when self-attention is subsequently applied to the image blocks, there is no overlapping region between the preset image regions. Taking video frame 502 in FIG. 5 as an example: suppose that after division according to the preset image regions, video frame 502 is divided into 4 image blocks a, b, c, and d, where a and b contain an overlapping region e; then the video frames cannot be divided into blocks according to the preset image regions of this example.
S402: Arrange the image blocks of each video frame that correspond to the same preset image region in temporal order to obtain multiple image-block sequences corresponding to the target video.
Arranging in temporal order means arranging the image blocks corresponding to the same image region according to the temporal order of the video frames to which the image blocks belong. For ease of understanding, referring to FIG. 6: suppose the preset image regions in FIG. 6 are 4 rectangular regions of the same size, i.e., performing block processing on a video frame means dividing it into 4 rectangular image blocks of the same size, and the target video contains video frames 601, 602, and 603. After the video frames 601, 602, and 603 are divided into blocks, image blocks 6011, 6012, 6013, 6014, 6021, 6022, 6023, 6024, 6031, 6032, 6033, and 6034 can be obtained respectively. After sorting, a first image-block sequence composed of image blocks 6011, 6021, and 6031, a second image-block sequence composed of image blocks 6012, 6022, and 6032, a third image-block sequence composed of image blocks 6013, 6023, and 6033, and a fourth image-block sequence composed of image blocks 6014, 6024, and 6034 can be obtained. It can be understood that the number of image-block sequences is likewise the same as the number of preset image regions.
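A minimal sketch of steps S401 and S402 is given below, assuming a 2x2 grid of equally sized, non-overlapping preset image regions as in the example of FIG. 5; the grid size and frame shapes are illustrative only.

```python
import numpy as np

def block_sequences(frames, rows: int = 2, cols: int = 2):
    """Split each frame into rows*cols non-overlapping blocks (S401) and
    arrange blocks of the same region across frames in temporal order (S402).

    frames: list of (H, W, C) arrays ordered by time.
    Returns: list of rows*cols sequences, each a list of blocks over time."""
    h, w = frames[0].shape[:2]
    bh, bw = h // rows, w // cols
    sequences = [[] for _ in range(rows * cols)]
    for frame in frames:                       # temporal order is preserved
        for r in range(rows):
            for c in range(cols):
                block = frame[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
                sequences[r * cols + c].append(block)
    return sequences

frames = [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(6)]
seqs = block_sequences(frames)
print(len(seqs), len(seqs[0]))  # 4 sequences, each 6 blocks long
```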
S403: Perform feature extraction processing on the multiple image-block sequences corresponding to the target video through a preset self-attention model to obtain the global action features.
Referring to FIG. 7a, FIG. 7a shows another detection model 700 that can be used for abnormal sorting detection. The detection model 700 includes: a block layer 701, an encoding layer 702, a self-attention layer 703, a downsampling layer 704, a feature fusion layer 705, and a prediction layer 706.
The block layer 701 is used to perform block processing and arrangement processing on the video frames to obtain the multiple image-block sequences corresponding to the target video; for example, the purposes of steps S401 and S402 can be achieved through the block layer 701. For example, the Patch Partition module of the Swin Transformer model may be used as the block layer 701.
The encoding layer 702 is used to perform encoding processing on each image-block sequence to obtain the features of each image-block sequence. For example, the Linear Embedding module of the Swin Transformer model may be used as the encoding layer 702.
The self-attention layer 703 is used to perform self-attention processing on each image-block sequence to obtain the attention weight of each image-block sequence, and to perform weighted fusion of the features of each image-block sequence according to the corresponding attention weights, to obtain the global action features. For example, the Basic Layer module of the Swin Transformer model may be used as the self-attention layer 703.
The downsampling layer 704 is used to perform downsampling processing on the global action features to obtain the far-view action features. For example, another Basic Layer module of the Swin Transformer model, connected to the self-attention layer 703, may be used as the downsampling layer 704.
The feature fusion layer 705 may be composed of a feature pyramid network and is used to fuse the global action features and the far-view action features to obtain the enhanced action features.
The prediction layer 706 may be composed of the RPN layer and prediction layer of Faster R-CNN and is used to predict the abnormal sorting area based on the enhanced action features.
The above Swin Transformer is a model that can improve computation speed through block-wise attention.
It can be seen that the electronic device can call the trained detection model 700 and achieve the purpose of step S403 through the encoding layer 702 and the self-attention layer 703 of the trained detection model 700. On the one hand, the self-attention mechanism can improve the accuracy of feature extraction; on the other hand, the block-wise approach reduces the computation required by the self-attention mechanism, achieving a lightweight model while ensuring accuracy.
For ease of understanding, a specific implementation of step S403 is given below. As shown in FIG. 7b, step S403, "performing feature extraction processing on the multiple image-block sequences corresponding to the target video through a preset self-attention model to obtain the global action features of the target video", may include the following.
S710: Perform encoding processing on the multiple image-block sequences to obtain an encoded feature corresponding to each of the multiple image-block sequences.
For example, the electronic device may call the encoding layer 702 of the trained detection model 700 to achieve the purpose of step S710; inputting each image sequence into the encoding layer 702 yields the encoded feature of the corresponding image region.
S720: Determine, through the preset self-attention model, the attention weights between the encoded features corresponding to the image-block sequences.
The attention weight of each encoded feature is the attention weight between that encoded feature and the encoded features other than that encoded feature.
For example, the electronic device may call the self-attention layer 703 of the trained detection model 700 to achieve the purpose of step S720. In the self-attention layer 703, the attention weight corresponding to each encoded feature can be calculated through Equations (1) to (4):

$z_i=\sum_{j=1}^{n}\operatorname{softmax}\left(\dfrac{Q_iK_j^{\top}}{\sqrt{d_k}}\right)V_j$   Equation (1);

$Q_i=W_qx_i$   Equation (2);

$K_i=W_kx_i$   Equation (3);

$V_i=W_vx_i$   Equation (4),

where $z_i$ is the attention weight between the i-th encoded feature and the encoded features other than the i-th encoded feature, $j$ denotes the j-th other encoded feature, $n$ is the total number of encoded features, i.e., the total number of image-block sequences, $x_i$ is the i-th encoded vector, $d_k$ is the dimension of $K$, i.e., the vector dimension of the product of $W_k$ and the encoded vector, and $W_q$, $W_k$, and $W_v$ are preset parameters.
S730: Perform weighted fusion processing on the encoded features corresponding to the image-block sequences according to the attention weights corresponding to the encoded features, to obtain the global action features.
For example, the global action features can be obtained through Equation (5):

$b=\sum_{i=1}^{n}z_ix_i$   Equation (5),

where $b$ denotes the global action features, $z_i$ is the attention weight between the i-th encoded feature and the encoded features other than the i-th encoded feature, $x_i$ is the i-th encoded vector, and $n$ is the total number of encoded features, i.e., the total number of image-block sequences.
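For illustration only, Equations (1) to (5) can be sketched in NumPy as follows; the dimensions are arbitrary, and the elementwise reading of the weighted fusion in Equation (5) is one plausible interpretation rather than a definitive implementation.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n, d = 4, 8                        # n image-block sequences, encoding dim d
x = rng.standard_normal((n, d))    # encoded features x_i, one per sequence
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

Q, K, V = x @ W_q.T, x @ W_k.T, x @ W_v.T   # Equations (2)-(4)
d_k = K.shape[-1]

attn = softmax(Q @ K.T / np.sqrt(d_k))      # scaled dot-product weights
z = attn @ V                                # Equation (1): z_i sums over j

b = (z * x).sum(axis=0)   # Equation (5), read here as an elementwise weighting
print(b.shape)            # (8,): the fused global action feature vector
```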
S404: Perform downsampling processing on the global action features to obtain the far-view action features.
In some embodiments, the downsampling layer 704 of the trained detection model 700 may be called to achieve the purpose of step S404.
Since the image region corresponding to the far-view region in the target video is small, the global action features contain little far-view action information. Therefore, even if the global action features are downsampled, the probability of removing the far-view action information from the global action features is relatively low; instead, other information irrelevant to the far-view action information can be compressed, thereby yielding far-view action features containing the far-view action information.
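For illustration only, obtaining the far-view action features by downsampling the global action features can be sketched as follows; average pooling stands in here for the Swin-Transformer-style downsampling layer described above.

```python
import torch
import torch.nn.functional as F

global_feats = torch.randn(1, 96, 56, 56)                    # global action features
far_view_feats = F.avg_pool2d(global_feats, kernel_size=2)   # downsampled features
print(far_view_feats.shape)                                  # torch.Size([1, 96, 28, 28])
```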
It can be seen that through the method of steps S401 to S404, on the one hand, the global action features can be extracted through the self-attention mechanism, and when performing feature extraction through the self-attention mechanism, dividing the video frames into blocks can reduce the computational cost of self-attention. On the other hand, compared with the detection model 300, the detection model 700 used in steps S401 to S404 does not need an additional model branch to extract the far-view action features; instead, the global action features can be directly downsampled to obtain the far-view action features, which reduces the parameter count and computation time of the detection model. In addition, since the far-view action features are obtained by downsampling in steps S401 to S404, the global action features and the far-view action features have different scales after downsampling; therefore, the detection model 700 uses a feature pyramid network capable of fusing features of different scales as the feature fusion layer 705 to fuse the global action features and the far-view action features, which ensures that feature fusion will not cause the enhanced action features to contain erroneous information due to scale mismatch.
To avoid misjudgment, in some embodiments it is also possible to judge, based on the distance between the sorter and the package placement area, whether the abnormal sorting action arises because the sorter is too close to the package placement area. Referring to FIG. 8, step S320, "determining the abnormal sorting area in the sorting site based on the enhanced action features", may include the following.
S801: Perform prediction processing on the enhanced action features to obtain a first position of an initial sorting area to be screened in the sorting site.
For example, the electronic device may call the prediction layer 706 of the trained detection model 700 to achieve the purpose of step S801. Here, the initial sorting area refers to the sorting area in the sorting site that the prediction layer 706 predicts to contain abnormal sorting behavior. The first position may refer to the position of the initial sorting area in the sorting site.
S802: Obtain a second position of the package placement area in the sorting site.
For the description of the package placement area, reference may be made to step S201, which will not be repeated here. The second position may refer to the position of the package placement area in the sorting site.
In some embodiments, the electronic device may train an open-source object detection network to obtain a region detection network, detect through the region detection network the image coordinates corresponding to the placement area in the image coordinate system of the target video, and then convert the image coordinates in the image coordinate system to the second position in the site coordinate system through the preset conversion relationship described above. For example, the electronic device may train YOLOv2 to obtain the region detection network.
For the descriptions of the site coordinate system and the image coordinate system, reference may be made to the above, which will not be repeated here. YOLOv2 is a network that implements object detection through convolutional layers and fully connected layers.
S803: If the distance between the first position and the second position is greater than a preset distance threshold, determine the initial sorting area as the abnormal sorting area.
The preset distance threshold is used to evaluate the distance between the initial sorting area and the package placement area. If the distance between the first position and the second position is less than or equal to the preset distance threshold, it indicates that the initial sorting area and the package placement area are close to each other; the reason for the sorter's abnormal sorting action may be the consideration that, even if a package is thrown to the placement area, it will not be damaged because the distance is short, rather than the sorter disregarding package safety, and in this case the initial sorting area is not taken as the abnormal sorting area. If the distance between the first position and the second position is greater than the preset distance threshold, it indicates that the initial sorting area and the placement area are far apart, and the reason for the sorter's abnormal sorting action is disregard for package safety; in this case, the initial sorting area can be taken as the abnormal sorting area. The preset distance threshold may be stored in the express company's back-end database.
It can be seen that the method of steps S801 to S803 can improve the detection accuracy of the abnormal sorting area and avoid misjudgment.
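A minimal sketch of the distance screening of steps S801 to S803 is given below; the positions are taken as area centers in site coordinates, and the threshold value is a hypothetical placeholder.

```python
import math

def is_abnormal(first_position, second_position, dist_threshold: float = 5.0):
    """S803: keep the initial sorting area as abnormal only when it is
    farther from the package placement area than the preset threshold."""
    dx = first_position[0] - second_position[0]
    dy = first_position[1] - second_position[1]
    return math.hypot(dx, dy) > dist_threshold

# First position (predicted area) vs. second position (placement area), in meters.
print(is_abnormal((12.0, 3.5), (4.0, 3.0)))   # True: far apart -> abnormal
print(is_abnormal((5.5, 3.2), (4.0, 3.0)))    # False: close -> not abnormal
```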
In some embodiments, after the abnormal sorting area is obtained, the abnormal sorting area may be matched with the preset areas corresponding to the sorting lines in the sorting site to determine the sorting line corresponding to the abnormal sorting behavior, and alarm information corresponding to that sorting line is then output, which facilitates management by the managers of the sorting site. Referring to FIG. 9, after the step "determining an abnormal sorting area in the sorting site based on the global action features and the far-view action features", the method further includes the following.
S901: Match the abnormal sorting area with preset sorting areas in the sorting site to obtain a target sorting area to which the abnormal sorting area belongs and a target sorting line corresponding to the target sorting area.
A preset sorting area refers to the preset working area corresponding to a sorting line. When abnormal sorting behavior occurs in a preset sorting area, it indicates that the sorter performing the abnormal sorting behavior is a worker of the sorting line corresponding to that preset sorting area. The preset sorting areas and the corresponding sorting lines may be stored in the express company's back-end database.
For example, the electronic device may respectively obtain the coordinate regions of the abnormal sorting area and of the preset sorting areas in the site coordinate system, and then, based on the respective coordinate regions, obtain from the preset sorting areas the target sorting area containing the abnormal sorting area, as well as the target sorting line corresponding to the target sorting area.
S902: Send alarm information corresponding to the target sorting line to a target terminal.
The alarm information may include text information, voice information, and so on, which is not limited in the embodiments of the present application.
For example, the electronic device may generate the corresponding alarm information according to the target sorting line. For instance, when the alarm information is text information, after determining the target sorting line the electronic device may generate text information such as "violent sorting behavior has occurred on the target production line" and send it to the target terminal.
The target terminal may refer to a smartphone, a personal computer, a management platform, and so on, which is not limited in the embodiments of the present application. For example, the target terminal may refer to the site management platform of the sorting site.
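For illustration only, the matching and alarm generation of steps S901 and S902 can be sketched as follows; the preset-area table, the containment test, and the alarm text are hypothetical placeholders.

```python
# Preset sorting areas in site coordinates, keyed by sorting line (placeholders).
PRESET_AREAS = {
    "line_1": (0.0, 0.0, 10.0, 20.0),    # (x1, y1, x2, y2)
    "line_2": (10.0, 0.0, 20.0, 20.0),
}

def contains(area, region):
    """True if `region` lies entirely inside `area` (both (x1, y1, x2, y2))."""
    return (area[0] <= region[0] and area[1] <= region[1]
            and region[2] <= area[2] and region[3] <= area[3])

def alert_for(abnormal_region):
    """S901/S902: find the target sorting line and build its alarm text."""
    for line, area in PRESET_AREAS.items():
        if contains(area, abnormal_region):
            return f"Violent sorting behavior detected on {line}"
    return None

print(alert_for((12.0, 5.0, 14.0, 7.0)))  # alarm text for line_2
```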
In some embodiments, to avoid the situation where, when different video acquisition devices capture the same sorting line, the far-view image content in the videos differs and the predicted abnormal sorting results for the same sorting line consequently differ, causing confusion for managers, the videos captured by multiple video acquisition devices may be spliced and the spliced video used as the target video. Referring to FIG. 10, step S201, "acquiring a target video of a sorting site", may include the following.
S1001: Acquire a first initial video and a second initial video respectively captured by multiple video acquisition devices in the sorting site.
The first initial video and the second initial video may refer to two videos captured by two adjacent video acquisition devices deployed in the sorting site, for example, two videos captured by two adjacent cameras. It should be noted that the first initial video and the second initial video should be captured at the same time.
According to the technical problem to be solved by the embodiments of the present application, the first initial video and the second initial video should contain information about at least a part of the same region of the sorting site (for example, information about at least one same sorting line). Taking the acquisition of the first initial video and the second initial video through two adjacent cameras as an example, referring to FIG. 11: in FIG. 11, the region 1101 in the sorting site 1100 is the image capture region of the first camera, and the region 1102 is the image capture region of the adjacent second camera; therefore, the region 1103 in the sorting site 1100 is the sorting site region captured by both the first camera and the second camera, and the first initial video and the second initial video contain information about the same region of the sorting site.
S1002: Detect a first sorting line in the first initial video and a second sorting line in the second initial video.
For example, the electronic device may train an open-source object detection network to obtain a sorting-line detection network and then detect the first sorting line and the second sorting line through the sorting-line detection network. For example, YOLOv2 may be trained to obtain the sorting-line detection network. For the description of YOLOv2, reference may be made to the above, which will not be repeated here.
To facilitate the following steps, "complete sorting line" and "incomplete sorting line" may be used as sample labels to train the open-source object detection network, so that the first sorting line and the second sorting line output by the sorting-line detection network carry information about whether they are complete. An incomplete sorting line means that the entire sorting line is not captured in the corresponding video; for example, when a sorting line is too long to be captured in full within the field of view of the corresponding video acquisition device, the sorting line is an incomplete sorting line for the video captured by that device.
S1003: If the first sorting line and the second sorting line contain the same sorting sub-line, and the sorting sub-line is incomplete in at least one of the first sorting line and the second sorting line, splice the video frames in the first initial video with the video frames in the second initial video, and determine the spliced video frames as the target video.
For example, the electronic device may obtain the first image coordinates of the first sorting line in the image coordinate system of the first initial video and the second image coordinates of the second sorting line in the image coordinate system of the second initial video, convert the first image coordinates to site coordinates in the site coordinate system through the preset conversion relationship corresponding to the first initial video, and convert the second image coordinates to site coordinates in the site coordinate system through the preset conversion relationship corresponding to the second initial video. If there exist a first sorting line and a second sorting line with the same site coordinates, it indicates that the first sorting line and the second sorting line contain the same sorting sub-line. As noted above, through training, the first sorting line and the second sorting line output by the sorting-line detection network carry information about whether they are complete, so the electronic device can read this information directly; when the information carried by the sorting sub-line in the first sorting line and/or the second sorting line is "incomplete sorting line", the video frames in the first initial video and the video frames in the second initial video are spliced to obtain spliced video frames, and the target video is composed of the spliced video frames.
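A minimal sketch of the splicing decision of step S1003 is given below; the detection records, the completeness flag, and the side-by-side splice are simplified assumptions.

```python
import numpy as np

def should_stitch(lines_a, lines_b):
    """S1003 trigger: the two videos share a sorting sub-line (same site
    coordinates) that is incomplete in at least one of them.

    Each detection record: {"site_coords": tuple, "complete": bool}."""
    for la in lines_a:
        for lb in lines_b:
            if la["site_coords"] == lb["site_coords"]:
                if not la["complete"] or not lb["complete"]:
                    return True
    return False

def stitch(frame_a, frame_b):
    """Side-by-side splice of two same-height frames into one target frame."""
    return np.concatenate([frame_a, frame_b], axis=1)

a = [{"site_coords": (3.0, 7.0), "complete": False}]
b = [{"site_coords": (3.0, 7.0), "complete": True}]
if should_stitch(a, b):
    target = stitch(np.zeros((480, 640, 3)), np.zeros((480, 640, 3)))
    print(target.shape)  # (480, 1280, 3)
```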
The method embodiments of the present application have been described in detail above with reference to FIGS. 1 to 11, and the apparatus embodiments of the present application are described in detail below with reference to FIGS. 12 and 13. It should be understood that the descriptions of the method embodiments and of the apparatus embodiments correspond to each other; therefore, for parts not described in detail, reference may be made to the previous method embodiments.
To better implement the method for determining an abnormal sorting area in the embodiments of the present application, on the basis of the method for determining an abnormal sorting area, an embodiment of the present application further provides an apparatus for determining an abnormal sorting area. As shown in FIG. 12, which is a schematic structural diagram of the apparatus for determining an abnormal sorting area in an embodiment of the present application, the apparatus 1200 for determining an abnormal sorting area includes:
an acquisition unit 1201, configured to acquire a target video of a sorting site;
an extraction unit 1202, configured to extract global action features of the target video and far-view action features of the target video, wherein the far-view action features include far-view action information in the target video;
a determination unit 1203, configured to determine an abnormal sorting area in the sorting site based on the global action features and the far-view action features.
In a possible implementation of the present application, the extraction unit 1202 is specifically configured to: perform block processing on each video frame in the target video according to preset image regions to obtain image blocks corresponding to each video frame; arrange the image blocks of each video frame that correspond to the same preset image region in temporal order to obtain multiple image-block sequences corresponding to the target video; perform feature extraction processing on the multiple image-block sequences corresponding to the target video through a preset self-attention model to obtain the global action features; and perform downsampling processing on the global action features to obtain the far-view action features.
In a possible implementation of the present application, when performing feature extraction processing on the multiple image-block sequences corresponding to the target video through the preset self-attention model to obtain the global action features, the extraction unit 1202 is specifically configured to: perform encoding processing on the multiple image-block sequences to obtain an encoded feature corresponding to each of the multiple image-block sequences; determine, through the preset self-attention model, the attention weights between the encoded features corresponding to the image-block sequences; and perform weighted fusion processing on the encoded features corresponding to the image-block sequences according to the attention weights corresponding to the encoded features, to obtain the global action features.
In a possible implementation of the present application, when performing block processing on each video frame in the target video according to the preset image regions to obtain the image blocks corresponding to each video frame, the extraction unit 1202 is specifically configured to: perform the block processing on each video frame according to the coordinate ranges corresponding to the preset image regions in an image coordinate system to obtain the image blocks corresponding to each video frame, wherein the image coordinate system is established on each video frame in the target video.
In a possible implementation of the present application, the temporal order includes the temporal order of each video frame.
In a possible implementation of the present application, the determination unit 1203 is specifically configured to: fuse the global action features and the far-view action features to obtain enhanced action features of the target video; and determine the abnormal sorting area in the sorting site based on the enhanced action features.
In a possible implementation of the present application, when determining the abnormal sorting area in the sorting site based on the enhanced action features, the determination unit 1203 is specifically configured to: perform prediction processing on the enhanced action features to obtain a first position of an initial sorting area to be screened in the sorting site; obtain a second position of the package placement area in the sorting site; and if the distance between the first position and the second position is greater than a preset distance threshold, determine the initial sorting area as the abnormal sorting area.
In a possible implementation of the present application, when performing prediction processing on the enhanced action features to obtain the initial sorting area to be screened in the sorting site, the determination unit 1203 is specifically configured to: generate multiple candidate boxes through a region-based convolutional neural network based on the enhanced action features, and filter the multiple candidate boxes through non-maximum suppression to obtain the initial sorting area.
In a possible implementation of the present application, when filtering the multiple candidate boxes through non-maximum suppression to obtain the initial sorting area, the determination unit 1203 is specifically configured to: screen, through the non-maximum suppression, at least one candidate box whose intersection-over-union is greater than a preset value from the multiple candidate boxes; and predict the initial sorting area based on the at least one candidate box.
In a possible implementation of the present application, the determination unit 1203 is further configured to match the abnormal sorting area with preset sorting areas in the sorting site to obtain the target sorting area to which the abnormal sorting area belongs and the target sorting line corresponding to the target sorting area, and to send alarm information corresponding to the target sorting line to a target terminal.
In a possible implementation of the present application, the acquisition unit 1201 is specifically configured to: acquire a first initial video and a second initial video respectively captured by multiple video acquisition devices in the sorting site; detect a first sorting line in the first initial video and a second sorting line in the second initial video; and if the first sorting line and the second sorting line contain the same sorting sub-line, and the sorting sub-line is incomplete in at least one of the first sorting line and the second sorting line, splice the video frames in the first initial video with the video frames in the second initial video, and determine the spliced video frames as the target video.
In specific implementation, each of the above units may be implemented as an independent entity, or combined in any manner and implemented as one or several entities; for the specific implementation of each of the above units, reference may be made to the previous method embodiments, which will not be repeated here.
Since the apparatus for determining an abnormal sorting area can perform the steps in the method for determining an abnormal sorting area in any embodiment, it can achieve the beneficial effects achievable by the method for determining an abnormal sorting area in any embodiment of the present application; see the previous description for details, which will not be repeated here.
In addition, to better implement the method for determining an abnormal sorting area in the embodiments of the present application, on the basis of the method for determining an abnormal sorting area, an embodiment of the present application further provides an electronic device. Referring to FIG. 13, FIG. 13 shows a schematic structural diagram of the electronic device of an embodiment of the present application. Specifically, the electronic device provided by the embodiment of the present application includes a processor 1301; when executing the computer program stored in a memory 1302, the processor 1301 implements the steps of the method for determining an abnormal sorting area in any of the above embodiments, or, when executing the computer program stored in the memory 1302, the processor 1301 implements the functions of the units in the embodiment corresponding to FIG. 12.
For example, the computer program may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 1302 and executed by the processor 1301 to complete the embodiments of the present application. The one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program in the computer device.
The electronic device may include, but is not limited to, the processor 1301 and the memory 1302. Those skilled in the art can understand that the illustration is only an example of the electronic device and does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than shown, or combine certain components, or have different components.
The processor 1301 may be a central processing unit (Central Processing Unit, CPU), another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor is the control center of the electronic device and connects the various parts of the entire electronic device through various interfaces and lines.
The memory 1302 may be used to store computer programs and/or modules; the processor 1301 implements various functions of the computer device by running or executing the computer programs and/or modules stored in the memory 1302 and by calling the data stored in the memory 1302. The memory 1302 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playback function, an image playback function, etc.), etc.; the data storage area may store data created according to the use of the electronic device (such as audio data, video data, etc.), etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card (Flash Card), at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage device.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the apparatus for determining an abnormal sorting area, the electronic device, and their corresponding units described above, reference may be made to the description of the method for determining an abnormal sorting area in any embodiment, which will not be repeated here.
Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above embodiments can be completed by instructions, or completed by instructions controlling related hardware; the instructions may be stored in a storage medium and loaded and executed by a processor.
To this end, an embodiment of the present application provides a storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps in the method for determining an abnormal sorting area described in any embodiment of the present application are performed; for specific operations, reference may be made to the description of the method for determining an abnormal sorting area in any embodiment, which will not be repeated here.
The storage medium may include: a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, an optical disk, etc.
Since the instructions stored in the storage medium can execute the steps in the method for determining an abnormal sorting area described in any embodiment of the present application, the beneficial effects achievable by the method for determining an abnormal sorting area described in any embodiment of the present application can be achieved; see the previous description for details, which will not be repeated here.
The method, apparatus, storage medium, and electronic device for determining an abnormal sorting area provided by the embodiments of the present application have been introduced in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea. At the same time, for those skilled in the art, there will be changes in the specific implementation and scope of application based on the idea of the present application. In summary, the content of this specification should not be understood as a limitation on the present application.

Claims (14)

  1. A method for determining an abnormal sorting area, comprising:
    acquiring a target video of a sorting site;
    extracting global action features of the target video and far-view action features of the target video, wherein the far-view action features comprise far-view action information in the target video;
    determining an abnormal sorting area in the sorting site based on the global action features and the far-view action features.
  2. The method for determining an abnormal sorting area according to claim 1, wherein the extracting global action features of the target video and far-view action features of the target video comprises:
    performing block processing on each video frame in the target video according to preset image regions to obtain image blocks corresponding to each video frame;
    arranging the image blocks of each video frame that correspond to the same preset image region in temporal order to obtain multiple image-block sequences corresponding to the target video;
    performing feature extraction processing on the multiple image-block sequences corresponding to the target video through a preset self-attention model to obtain the global action features;
    performing downsampling processing on the global action features to obtain the far-view action features.
  3. The method for determining an abnormal sorting area according to claim 2, wherein the performing feature extraction processing on the multiple image-block sequences corresponding to the target video through a preset self-attention model to obtain the global action features comprises:
    performing encoding processing on the multiple image-block sequences to obtain an encoded feature corresponding to each of the multiple image-block sequences;
    determining, through the preset self-attention model, attention weights between the encoded features corresponding to the image-block sequences;
    performing weighted fusion processing on the encoded features corresponding to the image-block sequences according to the attention weights corresponding to the encoded features, to obtain the global action features.
  4. The method for determining an abnormal sorting area according to claim 2 or 3, wherein the performing block processing on each video frame in the target video according to preset image regions to obtain image blocks corresponding to each video frame comprises:
    performing the block processing on each video frame according to coordinate ranges corresponding to the preset image regions in an image coordinate system to obtain the image blocks corresponding to each video frame, wherein the image coordinate system is established on each video frame in the target video.
  5. The method for determining an abnormal sorting area according to any one of claims 2 to 4, wherein the temporal order comprises the temporal order of each video frame.
  6. The method for determining an abnormal sorting area according to any one of claims 1 to 5, wherein the determining an abnormal sorting area in the sorting site based on the global action features and the far-view action features comprises:
    fusing the global action features and the far-view action features to obtain enhanced action features of the target video;
    determining the abnormal sorting area in the sorting site based on the enhanced action features.
  7. The method for determining an abnormal sorting area according to claim 6, wherein the determining the abnormal sorting area in the sorting site based on the enhanced action features comprises:
    performing prediction processing on the enhanced action features to obtain a first position of an initial sorting area to be screened in the sorting site;
    obtaining a second position of a package placement area in the sorting site;
    if the distance between the first position and the second position is greater than a preset distance threshold, determining the initial sorting area as the abnormal sorting area.
  8. The method for determining an abnormal sorting area according to claim 7, wherein the performing prediction processing on the enhanced action features to obtain an initial sorting area to be screened in the sorting site comprises:
    generating multiple candidate boxes through a region-based convolutional neural network based on the enhanced action features, and filtering the multiple candidate boxes through non-maximum suppression to obtain the initial sorting area.
  9. The method for determining an abnormal sorting area according to claim 8, wherein the filtering the multiple candidate boxes through non-maximum suppression to obtain the initial sorting area comprises:
    screening, through the non-maximum suppression, at least one candidate box whose intersection-over-union is greater than a preset value from the multiple candidate boxes;
    predicting the initial sorting area based on the at least one candidate box.
  10. The method for determining an abnormal sorting area according to any one of claims 1 to 9, further comprising:
    matching the abnormal sorting area with preset sorting areas in the sorting site to obtain a target sorting area to which the abnormal sorting area belongs and a target sorting line corresponding to the target sorting area;
    sending alarm information corresponding to the target sorting line to a target terminal.
  11. The method for determining an abnormal sorting area according to any one of claims 1 to 10, wherein the acquiring a target video of a sorting site comprises:
    acquiring a first initial video and a second initial video respectively captured by multiple video acquisition devices in the sorting site;
    detecting a first sorting line in the first initial video and a second sorting line in the second initial video;
    if the first sorting line and the second sorting line contain the same sorting sub-line, and the sorting sub-line is incomplete in at least one of the first sorting line and the second sorting line, splicing the video frames in the first initial video with the video frames in the second initial video, and determining the spliced video frames as the target video.
  12. An apparatus for determining an abnormal sorting area, comprising:
    an acquisition unit, configured to acquire a target video of a sorting site;
    an extraction unit, configured to extract global action features of the target video and far-view action features of the target video, wherein the far-view action features comprise far-view action information in the target video;
    a determination unit, configured to determine an abnormal sorting area in the sorting site based on the global action features and the far-view action features.
  13. An electronic device, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, the steps in the method for determining an abnormal sorting area according to any one of claims 1 to 11 are implemented.
  14. A storage medium, wherein a computer program is stored on the storage medium, and when the computer program is executed by a processor, the steps in the method for determining an abnormal sorting area according to any one of claims 1 to 11 are implemented.
PCT/CN2023/119451 2022-08-16 2023-09-18 Method, apparatus, electronic device, and storage medium for determining an abnormal sorting area WO2024037660A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210979392.5A CN117671548A (zh) 2022-08-16 2022-08-16 Abnormal sorting detection method and apparatus, electronic device, and storage medium
CN202210979392.5 2022-08-16

Publications (1)

Publication Number Publication Date
WO2024037660A1 true WO2024037660A1 (zh) 2024-02-22

Family

ID=89940781

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/119451 WO2024037660A1 (zh) 2022-08-16 2023-09-18 Method, apparatus, electronic device, and storage medium for determining an abnormal sorting area

Country Status (2)

Country Link
CN (1) CN117671548A (zh)
WO (1) WO2024037660A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112308885A (zh) * 2019-07-29 2021-02-02 顺丰科技有限公司 Optical-flow-based violent throwing detection method, apparatus, device, and storage medium
CN112507760A (zh) * 2019-09-16 2021-03-16 杭州海康威视数字技术股份有限公司 Method, apparatus, and device for detecting violent sorting behavior
CN112668410A (zh) * 2020-12-15 2021-04-16 浙江大华技术股份有限公司 Sorting behavior detection method and system, electronic apparatus, and storage medium
CN114663793A (zh) * 2020-12-04 2022-06-24 丰田自动车株式会社 Target behavior recognition method and apparatus, storage medium, and terminal

Also Published As

Publication number Publication date
CN117671548A (zh) 2024-03-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23854555

Country of ref document: EP

Kind code of ref document: A1