CN111160314A - Violence sorting identification method and device - Google Patents


Publication number
CN111160314A
Authority
CN
China
Prior art keywords: article, video information, sorting, determining, violent
Prior art date: (not listed)
Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Application number: CN202010005394.5A
Other languages: Chinese (zh)
Other versions: CN111160314B (en)
Inventors: 刘永霞, 汪建新, 吴明辉
Current Assignee: Miaozhen Information Technology Co Ltd (the listed assignees may be inaccurate)
Original Assignee: Miaozhen Information Technology Co Ltd
Priority date: (not listed; the priority date is an assumption and is not a legal conclusion)
Application filed by Miaozhen Information Technology Co Ltd
Priority to CN202010005394.5A
Publication of CN111160314A
Application granted; publication of CN111160314B
Legal status: Active
Anticipated expiration: (not listed)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42: Higher-level, semantic clustering, classification or understanding of sport video content

Abstract

The application provides a violent sorting identification method and device. The method comprises the following steps: first, acquiring video information of an article sorting process; next, inputting the video information into a pre-trained violent sorting recognition model and determining the violent sorting recognition result corresponding to the video information; then, in the case that the recognition result of the video information is violent sorting, determining the residence time in the air of the article in the video information; and finally, determining a final violent sorting recognition result of the video information based on the residence time. Because the video information acquired during sorting is first screened by the model to find suspected violent sorting, and the final determination is then based on the article's residence time in the air, the accuracy of identifying violent sorting behavior is greatly improved.

Description

Violence sorting identification method and device
Technical Field
The application relates to the technical field of monitoring, in particular to a violence sorting identification method and device.
Background
As online shopping volumes continue to grow, the load on the express delivery industry grows with them. During shopping peaks and holidays in particular, parcel volumes rise sharply, and repeated news reports of parcels being violently sorted and transported have harmed the express industry and even online shopping platforms to a certain extent.
At present, judging whether express parcels are violently sorted mostly depends on manual inspection, which is costly and whose accuracy cannot be guaranteed.
How to accurately identify violent sorting behavior has therefore become an urgent problem to be solved.
Disclosure of Invention
In view of this, an object of the present application is to provide a violent sorting identification method and apparatus that improve the accuracy of identifying violent sorting behavior and thereby reduce its occurrence.
In a first aspect, an embodiment of the present application provides a method for identifying violent sorting, including:
acquiring video information corresponding to each of a plurality of articles during sorting; the video information corresponding to each article comprises a plurality of video images;
sequentially inputting the plurality of video images corresponding to each article into a pre-trained violent sorting recognition model, and determining the violent sorting recognition result corresponding to each article;
in the case that the violent sorting recognition result of any article is violent sorting, determining the residence time of the article in the air based on the video information of that article;
and determining a final violent sorting recognition result of the article based on the residence time.
In an optional embodiment, the acquiring video information in the item sorting process includes:
acquiring original video information of a plurality of articles in a sorting process;
and sequentially identifying the articles of each frame of image in the original video information, and intercepting the video information corresponding to each article in the sorting process from the original video information based on the article identification result.
In an optional implementation, the intercepting, from the original video information, the video information corresponding to each article in a sorting process includes:
intercepting original videos corresponding to the articles in the sorting process from the original video information;
and sampling the video images in the original video corresponding to each article at intervals, and obtaining the video information corresponding to each article based on the plurality of video images obtained by the interval sampling.
In an alternative embodiment, the sequentially identifying the article for each frame of image in the video information includes:
sequentially inputting the frames of images into a pre-trained first recognition model according to the sequence of the time stamps of the frames of images in the video information, and determining the position information of the article in the frames of images;
the intercepting the video information respectively corresponding to each article in the sorting process from the video information based on the article identification result comprises the following steps:
determining images respectively corresponding to the articles in the sorting process based on the position information of the articles in the images of each frame;
for each article, video information corresponding to the article is generated based on the image corresponding to the article when the article is sorted.
In an alternative embodiment, the behavior recognition model comprises one or more of the following:
a recurrent neural network (RNN), a long short-term memory network (LSTM), and a gated recurrent unit (GRU).
In an alternative embodiment, the determining the residence time of the article in the video information in the air comprises:
sequentially inputting each frame of image in the video information into a pre-trained second recognition model, and acquiring article and human hand recognition results corresponding to each frame of image in the video information;
determining a first target image when the human hand is separated from the article and a second target image when the article falls to the ground from each frame image in the video information based on the article and the human hand recognition result corresponding to each frame image;
determining the retention time based on timestamps of the first target image and the second target image.
In an optional embodiment, the determining a final violent sorting identification result of the video information based on the retention time includes:
comparing the residence time with a preset residence time threshold;
when the retention time is greater than the retention time threshold, determining that the violent sorting recognition result of the video information is violent sorting;
and when the retention time is less than or equal to the retention time threshold, determining that the violent sorting recognition result of the video information is non-violent sorting.
In a second aspect, an embodiment of the present application further provides an apparatus for identifying violent sorting, where the apparatus for identifying violent sorting includes: the device comprises an acquisition module, a first determination module, a second determination module and a third determination module, wherein:
the acquisition module is used for acquiring video information corresponding to each article in the sorting process of the articles; the video information corresponding to each article comprises a plurality of video images;
the first determining module is used for sequentially inputting a plurality of video images corresponding to each article into a pre-trained violence sorting identification model and determining violence sorting identification results corresponding to each article;
the second determining module is used for determining the residence time of any article in the air based on the video information of the article under the condition that the violent sorting identification result of the article is violent sorting;
and the third determining module is used for determining a final violent sorting identification result of the article based on the residence time.
In an optional implementation, the obtaining module, when obtaining the video information in the article sorting process, is specifically configured to:
acquiring original video information of a plurality of articles in a sorting process;
and sequentially identifying the articles of each frame of image in the original video information, and intercepting the video information corresponding to each article in the sorting process from the original video information based on the article identification result.
In an optional implementation manner, when the video information corresponding to each article in the sorting process is intercepted from the original video information, the obtaining module is configured to:
intercepting original videos corresponding to the articles in the sorting process from the original video information;
and sampling the video images in the original video corresponding to each article at intervals, and obtaining the video information corresponding to each article based on the plurality of video images obtained by the interval sampling.
In an optional implementation manner, the obtaining module, when performing article identification on each frame image in the video information in sequence, is configured to:
sequentially inputting the frames of images into a pre-trained first recognition model according to the sequence of the time stamps of the frames of images in the video information, and determining the position information of the article in the frames of images;
the intercepting the video information respectively corresponding to each article in the sorting process from the video information based on the article identification result comprises the following steps:
determining images respectively corresponding to the articles in the sorting process based on the position information of the articles in the images of each frame;
for each article, video information corresponding to the article is generated based on the image corresponding to the article when the article is sorted.
In an alternative embodiment, the behavior recognition model comprises one or more of the following:
a recurrent neural network (RNN), a long short-term memory network (LSTM), and a gated recurrent unit (GRU).
In an alternative embodiment, the second determining module, when determining the residence time of the article in the air in the video information, is configured to:
sequentially inputting each frame of image in the video information into a pre-trained second recognition model, and acquiring article and human hand recognition results corresponding to each frame of image in the video information;
determining a first target image when the human hand is separated from the article and a second target image when the article falls to the ground from each frame image in the video information based on the article and the human hand recognition result corresponding to each frame image;
determining the retention time based on timestamps of the first target image and the second target image.
In an optional implementation, the third determining module, when determining the final violent sorting identification result of the video information based on the retention time, is specifically configured to:
comparing the residence time with a preset residence time threshold;
when the retention time is greater than the retention time threshold, determining that the violent sorting recognition result of the video information is violent sorting;
and when the retention time is less than or equal to the retention time threshold, determining that the violent sorting recognition result of the video information is non-violent sorting.
In a third aspect, an embodiment of the present application further provides a computer device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the computer device is running, the machine-readable instructions when executed by the processor performing the steps of the first aspect or any possible implementation of the first aspect.
In a fourth aspect, this application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and the computer program is executed by a processor to perform the steps in the first aspect or any one of the possible implementation manners of the first aspect.
According to the violent sorting identification method and device, after video information corresponding to each article during sorting is obtained, the plurality of video images corresponding to each article are sequentially input into a pre-trained violent sorting recognition model, and the violent sorting recognition result corresponding to each article is determined; then, in the case that the violent sorting recognition result of any article is violent sorting, the residence time of the article in the air is determined based on the video information of that article, and the final violent sorting recognition result of the article is determined based on the residence time. In this process, video information suspected of violent sorting is first identified by the violent sorting recognition model, and the final recognition result is then determined based on the article's residence time in the air, so the accuracy of identifying violent sorting behavior is greatly improved.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting its scope; those skilled in the art can derive other related drawings from these drawings without inventive effort.
Fig. 1 is a flowchart illustrating an identification method of violent sorting according to an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram illustrating an identification apparatus for violent sorting according to an embodiment of the present disclosure;
fig. 3 shows a schematic structural diagram of a computer device provided in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
In the prior art, recognition of violent sorting behavior is mostly performed manually, which is costly and yields low efficiency and accuracy.
Based on this, in the violent sorting identification method and apparatus provided in the embodiments of the present application, after video information corresponding to each of a plurality of articles during sorting is acquired, the plurality of video images corresponding to each article are sequentially input into a violent sorting recognition model trained in advance, and the violent sorting recognition result corresponding to each article is determined; then, in the case that the violent sorting recognition result of any article is violent sorting, the residence time of the article in the air is determined based on the video information of that article, and the final violent sorting recognition result of the article is determined based on the residence time. In this process, video information suspected of violent sorting is first identified by the violent sorting recognition model, and the final recognition result is then determined based on the article's residence time in the air, so the accuracy of identifying violent sorting behavior is greatly improved.
The drawbacks described above were identified by the inventor through practice and careful study; the discovery of these problems and the solutions the present application proposes for them below should therefore be regarded as the inventor's contribution to the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
To facilitate understanding of the present embodiment, the violent sorting identification method disclosed in the embodiments of the present application is first described in detail. The execution subject of the method is generally a computer information retrieval system; in particular, the execution subject may also be another computer device.
Example one
Referring to fig. 1, a flowchart of an identification method for violent sorting according to an embodiment of the present application is shown, where the method includes steps S101 to S104, where:
s101: acquiring video information corresponding to each article in a plurality of article sorting processes; the video information corresponding to each article comprises a plurality of video images;
s102: sequentially inputting a plurality of video images corresponding to each article into a pre-trained violent sorting identification model, and determining violent sorting identification results corresponding to each article;
s103: in the case that the violent sorting identification result of any article is violent sorting, determining the residence time of the article in the air based on the video information of the article;
s104: and determining a final violent sorting identification result of the article based on the residence time.
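The four steps S101 to S104 above can be sketched as a minimal pipeline. This is an illustrative sketch only: the function and parameter names (`identify_violent_sorting`, `detect_hand_release`, `detect_landing`, `dwell_threshold`) are hypothetical and not from the patent.

```python
def identify_violent_sorting(frames, timestamps, model, dwell_threshold,
                             detect_hand_release, detect_landing):
    """Sketch of S101-S104: classify a per-article clip with a model,
    then confirm the result by the article's residence time in the air."""
    # S102: the pre-trained model screens the clip for suspected violent sorting
    if model(frames) != "violent":
        return "non-violent"
    # S103: residence time = landing timestamp - hand-release timestamp
    t_release = timestamps[detect_hand_release(frames)]
    t_land = timestamps[detect_landing(frames)]
    residence = t_land - t_release
    # S104: final decision against a preset residence-time threshold
    return "violent" if residence > dwell_threshold else "non-violent"
```

The detectors and model are passed in as callables so each stage (S102's classifier, S103's hand/article recognizer) can be swapped independently.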
The following describes each of the above-mentioned steps S101 to S104 in detail.
First: in S101, the video information collected during article sorting may be obtained as follows:
acquiring original video information of a plurality of articles during sorting; sequentially performing article recognition on each frame image in the original video information, and intercepting, from the original video information, the video information corresponding to each article during sorting based on the article recognition results.
For example, express companies and delivery points are usually fitted with surveillance cameras: on one hand to ensure parcels are not stolen, providing video clues if a loss occurs; on the other hand to monitor whether couriers sort parcels violently. The surveillance video may be used directly as the original video information.
The video information collected during sorting is intercepted from the full surveillance video to obtain a plurality of frame images. For example: while the video information is being acquired, detect in real time whether the surveillance video still corresponds to the same article; when images corresponding to a different article are detected, intercept a preset number of frames before and/or after those images as the video information.
In addition, in another embodiment, for example, the video information corresponding to each article in the sorting process may be intercepted from the original video information in the following manner:
intercepting, from the original video information, the original video corresponding to each article during sorting; sampling the video images in each article's original video at intervals; and obtaining the video information corresponding to each article based on the plurality of video images obtained by the interval sampling.
The sampling interval can be set according to actual needs, which reduces the amount of image data the model must process and improves detection efficiency.
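As a hedged illustration, interval sampling of an article's original video reduces to keeping every Nth frame; the function name and the choice of `step` below are assumptions for the sketch, not values from the patent:

```python
def sample_at_intervals(frames, step):
    """Keep every `step`-th frame of a clip to cut the amount of
    image data the recognition model must process (part of S101)."""
    if step < 1:
        raise ValueError("sampling step must be >= 1")
    return frames[::step]
```

For example, `sample_at_intervals(list(range(10)), 3)` keeps frames 0, 3, 6, and 9.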
After the video information is obtained, sequentially inputting each frame of image into a pre-trained first recognition model according to the sequence of the time stamp of each frame of image in the video information, and determining the position information of the article in each frame of image;
the intercepting the video information respectively corresponding to each article in the sorting process from the video information based on the article identification result comprises the following steps:
determining images respectively corresponding to the articles in the sorting process based on the position information of the articles in the images of each frame;
for each article, video information corresponding to the article is generated based on the image corresponding to the article when the article is sorted.
The violent sorting video information is intercepted from the video information. For example: while acquiring the video information, each frame image is synchronously and sequentially input into the violent sorting recognition model to detect in real time whether violent sorting video information exists; when it does, a preset number of video images before and/or after the triggering image are intercepted as the violent sorting video information.
For example, an article corresponds to a piece of video information, and the corresponding state of the article is determined based on the position information of the article in each frame of the image, for example: when the article is positioned in a human hand, the article is considered to be at the beginning of a section of video information; when the item is located on the ground, the item is considered to be at the end of a segment of video information. Based on the state corresponding to the article, the image corresponding to each article is determined when each article is sorted, and the video information corresponding to the article is generated.
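The start/end rule above (a clip begins when the article is in a hand and ends when it is on the ground) can be sketched as a simple state scan. The per-frame state labels (`"in_hand"`, `"on_ground"`) are assumed to come from the first recognition model's position output; the labels and function name are illustrative:

```python
def split_into_article_clips(frame_states):
    """Group frame indices into per-article clips: a clip opens at a
    frame whose detected state is 'in_hand' and closes at the next
    'on_ground' frame, matching the start/end rule in the description."""
    clips, current = [], None
    for i, state in enumerate(frame_states):
        if state == "in_hand" and current is None:
            current = [i]             # article picked up: clip begins
        elif current is not None:
            current.append(i)
            if state == "on_ground":  # article landed: clip ends
                clips.append(current)
                current = None
    return clips
```

Each returned index list identifies the images from which one article's video information is generated.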
II, secondly: in S102, the video information is input to a violent sorting recognition model trained in advance, and a violent sorting recognition result corresponding to the video information is determined.
Wherein the violence sorting recognition model is trained in the following way:
acquiring a plurality of sample videos and label information whether to violently sort the articles or not corresponding to each sample video; each sample video comprises a plurality of frames of sample images;
inputting the sample video into a behavior recognition model aiming at each sample video to obtain a violence sorting recognition result corresponding to the sample video;
training the behavior recognition model based on the violence sorting recognition result corresponding to the sample video and the label information to obtain a violence sorting recognition model.
Illustratively, for each of a plurality of sample videos obtained in advance, label information indicating whether the article is violently sorted is determined; that is, the samples are divided into violent sorting sample videos and non-violent sorting sample videos.
Wherein each sample video comprises a plurality of frames of sample images.
For example, a plurality of sample videos obtained in advance and label information whether to violently sort the articles or not corresponding to the sample videos are input into the behavior recognition model, and a violence sorting recognition result corresponding to the sample videos is obtained.
Wherein the behavior recognition model comprises one or more of the following:
a recurrent neural network (RNN), a long short-term memory network (LSTM), and a gated recurrent unit (GRU).
Illustratively, based on a plurality of sample videos obtained in advance, the relative spatiotemporal features of the three-dimensional posture of the human body of each frame of sample image in the violent sample video and the non-violent sample video are respectively extracted, for example: joint pair distance characteristic, joint-to-bone distance characteristic, joint-to-plane distance characteristic, bone pair included angle characteristic, bone-to-plane included angle characteristic, plane-to-plane included angle characteristic, joint rotation characteristic and the like.
And obtaining a violence sorting identification result corresponding to the sample video by using the relative time-space characteristics of the human body three-dimensional posture of each frame of sample image in the violent sample video and the non-violent sample video in the plurality of sample videos.
Training the behavior recognition model based on the obtained violence sorting recognition result corresponding to the sample video and the label information corresponding to the sample video to obtain a violence sorting recognition model for the subsequent recognition process of the video information.
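For orientation only, one GRU step (one of the candidate behavior-recognition architectures named above) can be written in a few lines of NumPy. The weight layout, shapes, and the final classifier head are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(x, h, W, U, b):
    """One GRU step: x is a per-frame feature vector, h the running
    clip state. W, U, b each hold three blocks, keyed 'z' (update),
    'r' (reset), and 'n' (candidate state)."""
    z = sigmoid(W["z"] @ x + U["z"] @ h + b["z"])        # update gate
    r = sigmoid(W["r"] @ x + U["r"] @ h + b["r"])        # reset gate
    n = np.tanh(W["n"] @ x + U["n"] @ (r * h) + b["n"])  # candidate state
    return (1.0 - z) * n + z * h

def encode_clip(features, W, U, b, hidden_dim):
    """Fold a sequence of per-frame features into one clip embedding;
    a classifier head would then map it to violent / non-violent."""
    h = np.zeros(hidden_dim)
    for x in features:
        h = gru_step(x, h, W, U, b)
    return h
```

In practice the per-frame features would be the pose-derived spatiotemporal features listed above (joint-pair distances, bone angles, and so on), and the weights would be fitted against the sample-video labels.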
Third: in S103, if the violent sorting recognition result of any article is violent sorting, the residence time of the article in the air is determined based on the video information of that article.
For example, if it is determined in S102 that the violent sorting recognition result of the video information is violent sorting, the multi-frame images contained in the violent sorting segment of the video information are processed as follows:
Sequentially inputting each frame of image in the video information into a pre-trained second recognition model, and acquiring article and human hand recognition results corresponding to each frame of image in the video information;
determining a first target image when the human hand is separated from the article and a second target image when the article falls to the ground from each frame image in the video information based on the article and the human hand recognition result corresponding to each frame image;
determining the retention time based on timestamps of the first target image and the second target image.
And obtaining the retention time of the article based on the first target image when the human hand is separated from the article, the second target image when the article falls to the ground and the timestamps corresponding to the first target image and the second target image respectively.
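A hedged sketch of this timestamp arithmetic follows; the boolean per-frame detector outputs (`hand_holds`, `on_ground`) stand in for the second recognition model's hand and article results, and the names are assumptions:

```python
def residence_time(timestamps, hand_holds, on_ground):
    """S103: residence time in the air = timestamp of the landing frame
    (second target image) minus timestamp of the frame where the hand
    releases the article (first target image)."""
    # first target image: the first frame where the hand no longer holds the article
    release = next(i for i in range(1, len(hand_holds))
                   if hand_holds[i - 1] and not hand_holds[i])
    # second target image: the first subsequent frame where the article is on the ground
    landing = next(i for i in range(release, len(on_ground))
                   if on_ground[i])
    return timestamps[landing] - timestamps[release]
```

With frames 0.1 s apart, release at frame 2 and landing at frame 4 give a residence time of 0.2 s.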
For example, the first recognition model and the second recognition model may be the same model. In that case, when the violent sorting recognition result of the video information is violent sorting, the first target image (when the hand separates from the article), the second target image (when the article lands), and the timestamp corresponding to each can be determined directly from the first model's recognition results, and the residence time of the article in the air can then be determined from those timestamps.
Fourth: in the above S104, the final violent sorting identification result of the video information is determined based on the residence time obtained in step S103.
The residence time is compared with a preset residence time threshold. For example, the residence time threshold may be determined according to actual conditions, and the residence time is compared with that threshold.
When the residence time is greater than the residence time threshold, the violent sorting identification result of the video information is determined to be violent sorting;
and when the residence time is less than or equal to the residence time threshold, the violent sorting identification result of the video information is determined to be non-violent sorting.
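The decision rule of S104 can be sketched as follows. The 0.5 s threshold is a made-up placeholder: the patent only requires a preset value chosen according to actual conditions.

```python
def final_result(residence_time_s: float, threshold_s: float = 0.5) -> str:
    """Final decision rule. The 0.5 s default is purely illustrative; the
    patent leaves the threshold to be set 'according to actual conditions'."""
    if residence_time_s > threshold_s:
        return "violent sorting"
    return "non-violent sorting"

print(final_result(0.8))  # violent sorting
print(final_result(0.3))  # non-violent sorting
```

Note that a residence time exactly equal to the threshold is classified as non-violent, matching the "less than or equal to" branch above.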
For example, when the time from the article leaving the human hand to the article landing is relatively long, the trajectory of the article during the sorting process can be considered not to meet preset requirements: the article may have been thrown relatively high or relatively far, which indicates violent sorting.
For example, after violent sorting of an article is identified, the sorting video information of the article can be stored and the person responsible for handling the article can be traced, so as to reduce violent sorting behavior.
After the video information corresponding to each article in the sorting process is acquired, the multiple video images corresponding to each article are sequentially input into a pre-trained violent sorting identification model, and a violent sorting identification result is determined for each article. Then, for any article whose identification result is violent sorting, the residence time of the article in the air is determined based on the video information of the article, and the final violent sorting identification result of the article is determined based on the residence time. In this process, the video information is first screened by the violent sorting identification model, and the final violent sorting identification result is then determined based on the residence time of the article in the air, which greatly improves the accuracy of identifying violent sorting behavior.
Example two
Referring to fig. 2, a schematic diagram of an identification apparatus for violent sorting according to a second embodiment of the present application is shown, where the apparatus includes: an obtaining module 21, a first determining module 22, a second determining module 23, and a third determining module 24, wherein:
the obtaining module 21 is configured to obtain video information corresponding to each article in a sorting process of a plurality of articles; the video information corresponding to each article comprises a plurality of video images;
the first determining module 22 is configured to sequentially input a plurality of video images corresponding to each article into a pre-trained violent sorting recognition model, and determine violent sorting recognition results corresponding to each article;
the second determining module 23 is configured to determine, based on the video information of any article, a residence time of the article in the air if the result of the violent sorting identification of the article is violent sorting;
the third determination module 24 is configured to determine a final violent sorting identification result of the article based on the residence time.
After the video information corresponding to each article in the sorting process is acquired, the multiple video images corresponding to each article are sequentially input into a pre-trained violent sorting identification model, and a violent sorting identification result is determined for each article. Then, for any article whose identification result is violent sorting, the residence time of the article in the air is determined based on the video information of the article, and the final violent sorting identification result of the article is determined based on the residence time. In this process, the video information is first screened by the violent sorting identification model, and the final violent sorting identification result is then determined based on the residence time of the article in the air, which greatly improves the accuracy of identifying violent sorting behavior.
In an optional embodiment, the obtaining module 21, when obtaining the video information in the article sorting process, is specifically configured to:
acquiring original video information of a plurality of articles in a sorting process;
and sequentially identifying the articles of each frame of image in the original video information, and intercepting the video information corresponding to each article in the sorting process from the original video information based on the article identification result.
In an optional implementation manner, when the video information corresponding to each article in the sorting process is intercepted from the original video information, the obtaining module 21 is configured to:
intercepting original videos corresponding to the articles in the sorting process from the original video information;
and carrying out interval sampling on video images in the original videos corresponding to the objects respectively, and obtaining video information corresponding to the objects respectively based on a plurality of video images obtained by the interval sampling.
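The interval sampling described above amounts to keeping every k-th video image. A minimal sketch, where the step of 5 is an assumed value not specified by the patent:

```python
def sample_frames(frames, step=5):
    """Interval sampling: keep every `step`-th video image, reducing the data
    fed to the recognition model while preserving temporal coverage."""
    return frames[::step]

original_video = list(range(30))      # stand-ins for 30 video images
print(sample_frames(original_video))  # [0, 5, 10, 15, 20, 25]
```

A larger step lowers compute cost but coarsens the timestamps used later for the residence-time calculation, so the step would be chosen against the camera's frame rate.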
In an optional implementation manner, the obtaining module 21, when performing article identification on each frame image in the video information in sequence, is configured to:
sequentially inputting the frames of images into a pre-trained first recognition model according to the sequence of the time stamps of the frames of images in the video information, and determining the position information of the article in the frames of images;
the intercepting the video information respectively corresponding to each article in the sorting process from the video information based on the article identification result comprises the following steps:
determining images respectively corresponding to the articles in the sorting process based on the position information of the articles in the images of each frame;
for each article, video information corresponding to the article is generated based on the image corresponding to the article when the article is sorted.
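The grouping of frames into per-article video information can be sketched as follows, assuming the first recognition model's output is reduced to (frame index, article id, bounding box) triples; that tuple layout is an assumption for illustration:

```python
def clips_per_article(detections):
    """Group frame indices by article identity. `detections` is a list of
    (frame_index, article_id, bbox) tuples in timestamp order, standing in
    for the per-frame position information from the first recognition model.
    The returned per-article frame sequences are the per-article
    'video information'."""
    clips = {}
    for frame_idx, article_id, _bbox in detections:
        clips.setdefault(article_id, []).append(frame_idx)
    return clips

dets = [
    (0, "A", (10, 10, 50, 50)),
    (1, "A", (12, 11, 52, 51)),
    (1, "B", (200, 40, 240, 90)),
    (2, "B", (205, 42, 245, 92)),
]
print(clips_per_article(dets))  # {'A': [0, 1], 'B': [1, 2]}
```

A real system would also need to associate detections across frames (tracking); here the article identity is assumed to be given.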
In an alternative embodiment, the behavior recognition model includes one or more of:
a recurrent neural network RNN, a long short term memory network LSTM, and a gated recurrent unit GRU.
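As a worked example of one of the listed cells, here is a single GRU step implemented with NumPy. All weights are random placeholders, not a trained model; a real system would learn them from labeled sorting video and would feed per-frame image features, not raw random vectors:

```python
import numpy as np

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One gated recurrent unit (GRU) step: the kind of recurrent cell that
    can aggregate per-frame features into a sequence-level decision."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(Wz @ x + Uz @ h)              # update gate
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate state
    return (1 - z) * h + z * h_tilde

rng = np.random.default_rng(0)
d, k = 4, 3                                   # feature dim, hidden dim
W = [rng.standard_normal((k, d)) for _ in range(3)]
U = [rng.standard_normal((k, k)) for _ in range(3)]
h = np.zeros(k)
for x in rng.standard_normal((10, d)):        # 10 per-frame feature vectors
    h = gru_step(x, h, W[0], U[0], W[1], U[1], W[2], U[2])
print(h.shape)  # (3,) -- final hidden state summarizing the sequence
```

The final hidden state would then be passed through a classifier head to yield the violent/non-violent label for the clip.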
In an alternative embodiment, the second determining module 23, when determining the residence time of the article in the air in the video information, is configured to:
sequentially inputting each frame of image in the video information into a pre-trained second recognition model, and acquiring article and human hand recognition results corresponding to each frame of image in the video information;
determining a first target image when the human hand is separated from the article and a second target image when the article falls to the ground from each frame image in the video information based on the article and the human hand recognition result corresponding to each frame image;
determining the residence time based on the timestamps of the first target image and the second target image.
In an optional embodiment, the third determining module 24, when determining the final violent sorting identification result of the video information based on the retention time, is specifically configured to:
comparing the residence time with a preset residence time threshold;
when the residence time is greater than the residence time threshold, determining that the violent sorting identification result of the video information is violent sorting;
and when the residence time is less than or equal to the residence time threshold, determining that the violent sorting identification result of the video information is non-violent sorting.
EXAMPLE III
An embodiment of the present application further provides a computer device 300, as shown in fig. 3, which is a schematic structural diagram of the computer device 300 provided in the embodiment of the present application, and includes:
a processor 31, a memory 32, and a bus 33. The memory 32 is used for storing execution instructions and includes an internal memory 321 and an external storage 322. The internal memory 321 temporarily stores operation data for the processor 31 and data exchanged with the external storage 322, such as a hard disk; the processor 31 exchanges data with the external storage 322 through the internal memory 321. When the computer device 300 runs, the processor 31 communicates with the memory 32 through the bus 33, so that the processor 31 executes the following instructions in user mode:
acquiring video information corresponding to each article in a plurality of article sorting processes; the video information corresponding to each article comprises a plurality of video images;
sequentially inputting a plurality of video images corresponding to each article into a pre-trained violent sorting identification model, and determining violent sorting identification results corresponding to each article;
in the case that the violent sorting identification result of any article is violent sorting, determining the residence time of the article in the air based on the video information of the article;
and determining a final violent sorting identification result of the article based on the residence time.
In a possible implementation, the instructions executed by the processor 31 for obtaining video information during the sorting of the articles include:
acquiring original video information of a plurality of articles in a sorting process;
and sequentially identifying the articles of each frame of image in the original video information, and intercepting the video information corresponding to each article in the sorting process from the original video information based on the article identification result.
In a possible embodiment, the processor 31 executes instructions to intercept, from the original video information, the video information corresponding to each article in the sorting process, including:
intercepting original videos corresponding to the articles in the sorting process from the original video information;
and carrying out interval sampling on video images in the original videos corresponding to the objects respectively, and obtaining video information corresponding to the objects respectively based on a plurality of video images obtained by the interval sampling.
In a possible embodiment, the instructions executed by the processor 31 for sequentially performing item identification on each frame of image in the video information include:
sequentially inputting the frames of images into a pre-trained first recognition model according to the sequence of the time stamps of the frames of images in the video information, and determining the position information of the article in the frames of images;
the intercepting the video information respectively corresponding to each article in the sorting process from the video information based on the article identification result comprises the following steps:
determining images respectively corresponding to the articles in the sorting process based on the position information of the articles in the images of each frame;
for each article, video information corresponding to the article is generated based on the image corresponding to the article when the article is sorted.
In one possible embodiment, the processor 31 executes instructions in which the behavior recognition model includes one or more of the following:
a recurrent neural network RNN, a long short term memory network LSTM, and a gated recurrent unit GRU.
In one possible embodiment, the instructions executed by processor 31 for determining the residence time of the article in the air in the video information include:
sequentially inputting each frame of image in the video information into a pre-trained second recognition model, and acquiring article and human hand recognition results corresponding to each frame of image in the video information;
determining a first target image when the human hand is separated from the article and a second target image when the article falls to the ground from each frame image in the video information based on the article and the human hand recognition result corresponding to each frame image;
determining the residence time based on the timestamps of the first target image and the second target image.
In one possible embodiment, the instructions executed by the processor 31 for determining the final violent sorting identification result of the video information based on the residence time include:
comparing the residence time with a preset residence time threshold;
when the residence time is greater than the residence time threshold, determining that the violent sorting identification result of the video information is violent sorting;
and when the residence time is less than or equal to the residence time threshold, determining that the violent sorting identification result of the video information is non-violent sorting.
The present application also provides a computer readable storage medium, which stores a computer program, and the computer program is executed by a processor to execute the steps of the identification method of violent sorting described in the above method embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present application, and are used for illustrating the technical solutions of the present application, but not limiting the same, and the scope of the present application is not limited thereto, and although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive the technical solutions described in the foregoing embodiments or equivalent substitutes for some technical features within the technical scope disclosed in the present application; such modifications, changes or substitutions do not depart from the spirit and scope of the exemplary embodiments of the present application, and are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for identifying violent sorting, comprising:
acquiring video information corresponding to each article in a plurality of article sorting processes; the video information corresponding to each article comprises a plurality of video images;
sequentially inputting a plurality of video images corresponding to each article into a pre-trained violent sorting identification model, and determining violent sorting identification results corresponding to each article;
in the case that the violent sorting identification result of any article is violent sorting, determining the residence time of the article in the air based on the video information of the article;
and determining a final violent sorting identification result of the article based on the residence time.
2. The identification method according to claim 1, wherein said obtaining video information during the sorting of the articles comprises:
acquiring original video information of a plurality of articles in a sorting process;
and sequentially identifying the articles of each frame of image in the original video information, and intercepting the video information corresponding to each article in the sorting process from the original video information based on the article identification result.
3. The identification method according to claim 2, wherein said intercepting the video information corresponding to each article in the sorting process from the original video information comprises:
intercepting original videos corresponding to the articles in the sorting process from the original video information;
and carrying out interval sampling on video images in the original videos corresponding to the objects respectively, and obtaining video information corresponding to the objects respectively based on a plurality of video images obtained by the interval sampling.
4. The identification method according to claim 2 or 3, wherein the sequentially identifying the article for each frame of image in the video information comprises:
sequentially inputting the frames of images into a pre-trained first recognition model according to the sequence of the time stamps of the frames of images in the video information, and determining the position information of the article in the frames of images;
the intercepting the video information respectively corresponding to each article in the sorting process from the video information based on the article identification result comprises the following steps:
determining images respectively corresponding to the articles in the sorting process based on the position information of the articles in the images of each frame;
for each article, video information corresponding to the article is generated based on the image corresponding to the article when the article is sorted.
5. The recognition method of claim 1, wherein the behavior recognition model comprises one or more of:
a recurrent neural network RNN, a long short term memory network LSTM, and a gated recurrent unit GRU.
6. The identification method of claim 1, wherein said determining a residence time of an item in said video information in air comprises:
sequentially inputting each frame of image in the video information into a pre-trained second recognition model, and acquiring article and human hand recognition results corresponding to each frame of image in the video information;
determining a first target image when the human hand is separated from the article and a second target image when the article falls to the ground from each frame image in the video information based on the article and the human hand recognition result corresponding to each frame image;
determining the residence time based on the timestamps of the first target image and the second target image.
7. The identification method of claim 1, wherein determining the final violent sorting identification result of the video information based on the residence time comprises:
comparing the residence time with a preset residence time threshold;
when the residence time is greater than the residence time threshold, determining that the violent sorting identification result of the video information is violent sorting;
and when the residence time is less than or equal to the residence time threshold, determining that the violent sorting identification result of the video information is non-violent sorting.
8. An apparatus for identifying violent sorting, comprising:
the acquisition module is used for acquiring video information corresponding to each article in the sorting process of the articles; the video information corresponding to each article comprises a plurality of video images;
the first determining module is used for sequentially inputting a plurality of video images corresponding to each article into a pre-trained violence sorting identification model and determining violence sorting identification results corresponding to each article;
the second determining module is used for determining the residence time of any article in the air based on the video information of the article under the condition that the violent sorting identification result of the article is violent sorting;
and the third determining module is used for determining a final violent sorting identification result of the article based on the residence time.
9. A computer device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when a computer device is running, the machine-readable instructions when executed by the processor performing the steps of the identification method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, performs the steps of the identification method according to one of the claims 1 to 7.
CN202010005394.5A 2020-01-03 2020-01-03 Violent sorting identification method and device Active CN111160314B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010005394.5A CN111160314B (en) 2020-01-03 2020-01-03 Violent sorting identification method and device


Publications (2)

Publication Number Publication Date
CN111160314A true CN111160314A (en) 2020-05-15
CN111160314B CN111160314B (en) 2023-08-29

Family

ID=70561046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010005394.5A Active CN111160314B (en) 2020-01-03 2020-01-03 Violent sorting identification method and device

Country Status (1)

Country Link
CN (1) CN111160314B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112668410A (en) * 2020-12-15 2021-04-16 浙江大华技术股份有限公司 Sorting behavior detection method, system, electronic device and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130279757A1 (en) * 2012-04-19 2013-10-24 Intelligence Based Integrated Security Systems, In Large venue security method
CN105809392A (en) * 2016-02-29 2016-07-27 陆勃屹 Remotely monitoring method and system for express delivery packages
CN106713857A (en) * 2016-12-15 2017-05-24 重庆凯泽科技股份有限公司 Campus security system and method based on intelligent videos
CN106897670A (zh) * 2017-01-19 2017-06-27 南京邮电大学 Computer-vision-based express delivery violent sorting recognition method
CN107358194A (zh) * 2017-07-10 2017-11-17 南京邮电大学 Computer-vision-based method for judging violent sorting of express deliveries
CN107679156A (zh) * 2017-09-27 2018-02-09 努比亚技术有限公司 Video image identification method, terminal, and readable storage medium
CN109165894A (zh) * 2018-08-02 2019-01-08 王金虎 Method and system for unattended pickup and home delivery by unmanned aerial vehicle
CN109857025A (en) * 2019-02-11 2019-06-07 北京印刷学院 Express item in-transit state monitoring system
CN110544054A (en) * 2019-09-30 2019-12-06 北京物资学院 Anti-violence sorting active express sorting operation assisting and evaluating system and method


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CASEY E A等: "The situational-cognitive model of adolescent bystander behavior: Modeling bystander decision-making in the context of bullying and teen dating violence" *
冯春等: "半自动分拣线上运动条烟的在线识别" *
嵇乐荣: "基于Flexsim的A公司中转场快件分拣系统仿真与优化" *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112668410A (en) * 2020-12-15 2021-04-16 浙江大华技术股份有限公司 Sorting behavior detection method, system, electronic device and storage medium
CN112668410B (en) * 2020-12-15 2024-03-29 浙江大华技术股份有限公司 Sorting behavior detection method, system, electronic device and storage medium

Also Published As

Publication number Publication date
CN111160314B (en) 2023-08-29

Similar Documents

Publication Publication Date Title
CN108229297B (en) Face recognition method and device, electronic equipment and computer storage medium
CN109325964B (en) Face tracking method and device and terminal
CN110431560B (en) Target person searching method, device, equipment and medium
CN110070029B (en) Gait recognition method and device
EP3678047B1 (en) Method and device for recognizing identity of human target
CN110717885A (en) Customer number counting method and device, electronic equipment and readable storage medium
CN110795584A (en) User identifier generation method and device and terminal equipment
CN110751069A (en) Face living body detection method and device
CN110555349B (en) Working time length statistics method and device
CN110866692A (en) Generation method and generation device of early warning information and readable storage medium
CN114783037B (en) Object re-recognition method, object re-recognition apparatus, and computer-readable storage medium
CN111209847B (en) Violent sorting identification method and device
CN111814690A (en) Target re-identification method and device and computer readable storage medium
CN110175553B (en) Method and device for establishing feature library based on gait recognition and face recognition
CN108875549A (en) Image-recognizing method, device, system and computer storage medium
US20200084416A1 (en) Information processing apparatus, control method, and program
CN111160314A (en) Violence sorting identification method and device
CN110163032B (en) Face detection method and device
CN113920306B (en) Target re-identification method and device and electronic equipment
CN116071569A (en) Image selection method, computer equipment and storage device
CN115482569A (en) Target passenger flow statistical method, electronic device and computer readable storage medium
CN115205589A (en) Object identification method and device, electronic equipment and storage medium
CN114299388A (en) Article information specifying method, showcase, and storage medium
CN112989973A (en) Abnormal behavior detection reminding method and system
CN112800923A (en) Human body image quality detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant