CN111553180B - Garment counting method, counting method and device, and electronic equipment - Google Patents
Garment counting method, counting method and device, and electronic equipment
- Publication number
- CN111553180B CN201910111446.4A
- Authority
- CN
- China
- Prior art keywords
- video frame
- completion
- training
- confidence
- target object
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The embodiment of the application provides a garment counting method, a counting method and device, and electronic equipment. The method comprises the following steps: dividing a video into a sequence of video frames; performing target detection processing on each video frame in the sequence to generate a feature vector; calculating an action completion confidence for each video frame according to a first feature vector sequence formed from the feature vectors of a plurality of video frames, wherein the action completion confidence is the probability that a first target object in the video frame has completed an action on a second target object; and counting according to the video frames whose action completion confidence is higher than a preset threshold. According to the embodiment of the application, the video frames are analyzed, and the probability that an action is complete is obtained from parameters such as the distance between the target objects and their confidences, so as to judge whether the action is complete. Completed actions can thus be counted without manually setting a threshold, which reduces or avoids false counts and improves counting accuracy.
Description
Technical Field
The present application relates to the field of computer technology, and in particular to a garment counting method, a counting method and device, and electronic equipment.
Background
To advance the digitization of industrial production, a low-intrusion approach is generally adopted to link the production flows of non-standard small factories with low levels of standardization and automation. For example, without changing workers' production habits, equipment such as cameras collects data throughout the production flow, so that the stages of the flow are linked, orders can be completed as expected, and factory production efficiency is improved.
For production stages that involve counting workloads, for example in the bagging and packing scenario of a garment factory, a camera typically records video of a worker's operations, and a target detection algorithm analyzes the video frames to identify the target objects (the worker and the object being handled), thereby confirming when each operation starts and ends and counting the worker's workload. In this process, although the target detection algorithm judges the completion of each operation, various thresholds must be set manually to prevent unreasonable counts caused by misjudgment, such as a counting interval threshold between workloads or a distance threshold between target objects.
In implementing the present application, the inventors found at least the following problem in the prior art: these thresholds are usually set manually based on experience, so they are only approximately reasonable, cannot be applied universally across specific production scenes, and cannot guarantee the accuracy of the counting result.
Disclosure of Invention
The embodiment of the application provides a garment counting method, a counting method and device, and electronic equipment, to address the prior-art defect that counting with a manually set threshold cannot guarantee the accuracy of the counting result.
To achieve the above object, an embodiment of the present application provides a garment counting method, including:
processing video frames in a video to acquire distance information between an operator and a garment, a first confidence for the operator, and a second confidence for the garment;
inputting the distance information, the first confidence, and the second confidence into a garment counting model, and calculating a packing completion confidence for each video frame, wherein the packing completion confidence is the probability that the operator in the video frame has completed the packing action on the garment;
and performing a garment packing count according to the video frames whose packing completion confidence is higher than a preset threshold.
The embodiment of the application also provides a counting method, which comprises the following steps:
performing target detection processing on video frames in a video to generate feature vectors, wherein each feature vector at least comprises: distance information between a first target object and a second target object, a first confidence for the first target object, and a second confidence for the second target object;
calculating an action completion confidence for each video frame according to a first feature vector sequence formed from the feature vectors of a plurality of video frames, wherein the action completion confidence is the probability that the first target object in the video frame has completed an action on the second target object;
and counting according to the video frames whose action completion confidence is higher than a preset threshold.
The embodiment of the application also provides a counting device, which comprises:
a target detection module, configured to perform target detection processing on video frames in a video and generate feature vectors, wherein each feature vector at least comprises: distance information between a first target object and a second target object, a first confidence for the first target object, and a second confidence for the second target object;
a computing module, configured to calculate an action completion confidence for each video frame according to a first feature vector sequence formed from the feature vectors of a plurality of video frames, wherein the action completion confidence is the probability that the first target object in the video frame has completed an action on the second target object;
and a counting module, configured to count according to the video frames whose action completion confidence is higher than a preset threshold.
The embodiment of the application also provides electronic equipment, which comprises:
a memory for storing a program;
a processor for running the program stored in the memory for:
performing target detection processing on video frames in a video to generate feature vectors, wherein each feature vector at least comprises: distance information between a first target object and a second target object, a first confidence for the first target object, and a second confidence for the second target object;
calculating an action completion confidence for each video frame according to a first feature vector sequence formed from the feature vectors of a plurality of video frames, wherein the action completion confidence is the probability that the first target object in the video frame has completed an action on the second target object;
and counting according to the video frames whose action completion confidence is higher than a preset threshold.
According to the garment counting method, the counting method and device, and the electronic equipment provided by the embodiments of the application, the video frames are analyzed, and the probability that an action is complete is obtained from parameters such as the distance between the target objects and their confidences, so as to judge whether the action is complete. Completed actions can thus be counted without manually setting a threshold, which reduces or avoids false counts and improves counting accuracy.
The foregoing is merely an overview of the technical solutions of the present application. To make the technical means of the present application clearer and implementable in accordance with this specification, and to make the above and other objects, features, and advantages of the present application more readily apparent, specific embodiments of the present application are set forth below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
fig. 1 is a system block diagram of a service system provided in an embodiment of the present application;
FIG. 2 is a flow chart of one embodiment of a counting method provided by the present application;
FIG. 3 is a flowchart of another embodiment of a counting method according to the present application;
FIG. 4 is a schematic structural diagram of an action completion count model according to an embodiment of the present application;
FIG. 5 is a flow chart of a counting method according to another embodiment of the present application;
FIG. 6 is a schematic diagram of an embodiment of a counting device according to the present application;
FIG. 7 is a schematic diagram of another embodiment of a counting device according to the present application;
fig. 8 is a schematic structural diagram of an embodiment of an electronic device provided by the present application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In the related art, for production stages that involve counting workloads, for example in the bagging and packing scenario of a garment factory, a camera typically records video of a worker's operations, and a target detection algorithm analyzes the video frames to identify the target objects (the worker and the object being handled), thereby confirming when each operation starts and ends and counting the worker's workload. In this process, although the target detection algorithm judges the completion of each operation, various thresholds must be set manually to prevent unreasonable counts caused by misjudgment, such as a counting interval threshold between workloads or a distance threshold between target objects. These thresholds are usually set manually based on experience, so they are only approximately reasonable, cannot be applied universally across specific production scenes, and cannot guarantee the accuracy of the counting result. For example, suppose the average time for a worker to pack one garment is 15 seconds, typically ranging from 10 to 20 seconds; a counting interval threshold of 10 seconds is then set manually, and a packing operation is counted only if the time since the previous count exceeds that threshold. If a highly efficient worker packs a garment in only 8 seconds (with the manually set threshold at 10 seconds), the count for that second packed garment is discarded, and the accuracy of the algorithm drops significantly.
Therefore, the present application provides a counting scheme whose main principle is as follows: the video frames are analyzed, and the action completion confidence of each video frame, i.e., the probability that the target object has completed the action, is obtained from parameters such as the distance between the target objects and their confidences. Whether the action is complete is judged from the action completion confidence, which determines whether the completed action is counted. No threshold needs to be set manually, so false counts can be reduced or avoided and counting accuracy is improved.
The method provided by the embodiments of the application can be applied to any business system with data processing capability. Fig. 1 is a system block diagram of a service system provided by an embodiment of the present application; the structure shown in fig. 1 is only one example of a service system to which the technical solution of the present application can be applied. As shown in fig. 1, the service system includes a counting device comprising a target detection module, a computing module, and a counting module, which may be used to perform the process flows shown in figs. 2, 3, and 5 described below. In the service system, a video is first divided into a sequence of video frames. Then, target detection processing is performed on each video frame in the sequence to generate a feature vector, which at least comprises: distance information between a first target object (the operator) and a second target object (the object being handled), a first confidence for the first target object, and a second confidence for the second target object. The feature vectors of a plurality of video frames form a first feature vector sequence, from which the action completion confidence of each video frame, i.e., the probability that the first target object in the video frame has completed the action on the second target object, is calculated. Finally, counting is performed according to the video frames whose action completion confidence is higher than a preset threshold. Counting can thus be performed without manually setting a threshold, so false counts can be reduced or avoided and counting accuracy is improved.
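As a concrete illustration of this flow, the following is a minimal end-to-end sketch in Python. It assumes helper routines like those sketched in the embodiments below (a `frame_feature` function producing per-frame detection features and an `ActionCompletionModel` producing per-frame confidences); all names and the 0.9 threshold are illustrative assumptions, not part of the patented method.

```python
import torch

def count_actions(video_frames, model, threshold=0.9):
    # Detection step: one feature vector per frame (hypothetical helper).
    feats = [frame_feature(f) for f in video_frames]
    x = torch.tensor([feats], dtype=torch.float32)       # (1, frames, 3)
    # Confidence step: per-frame action completion confidence.
    confidences = model(x)[0].tolist()
    # Counting step: each frame above the threshold counts once.
    return sum(1 for c in confidences if c > threshold)
```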
The foregoing describes the technical principles and an exemplary application framework of the embodiments of the present application. Specific technical solutions of the embodiments are described in detail below through a number of embodiments.
Example I
Fig. 2 is a flowchart of an embodiment of the counting method provided by the present application. The method may be executed by the service system described above, by various terminals or server devices with data processing capability, or by devices or chips integrated into such devices. As shown in fig. 2, the counting method includes the following steps:
s201, performing target detection processing on video frames in the video to generate feature vectors.
In the embodiment of the application, after a video of the first target object performing an action on the second target object is obtained, the video is divided into a sequence of video frames. Target detection processing is then performed on each video frame to obtain the target objects in the frame and to generate the frame's feature vector, which at least comprises: distance information between the first target object and the second target object, a first confidence for the first target object, and a second confidence for the second target object.
Take the scenario of a factory worker packing garments as an example. Here, the first target object is the worker (the operator); the second target object is the garment; the first confidence is the probability that the worker is identified in the current video frame as packing a garment; and the second confidence is the probability that the garment is identified in the current video frame as packed. These data can be acquired by performing target detection on each video frame of the video of the worker packing garments; the data of each video frame then form that frame's feature vector.
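As a rough illustration, the following Python sketch builds such a per-frame feature vector from generic detector output. The `detect` helper, the class names, and the box format are hypothetical stand-ins for whatever detector is actually used; they are not specified by the patent.

```python
import math

def frame_feature(frame):
    """Return [distance, worker_conf, garment_conf] for one video frame."""
    detections = detect(frame)  # hypothetical: list of (label, confidence, (x1, y1, x2, y2))
    worker = max((d for d in detections if d[0] == "worker"),
                 key=lambda d: d[1], default=None)
    garment = max((d for d in detections if d[0] == "garment"),
                  key=lambda d: d[1], default=None)
    if worker is None or garment is None:
        return [0.0, 0.0, 0.0]  # nothing usable detected in this frame

    def center(box):
        return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

    (wx, wy), (gx, gy) = center(worker[2]), center(garment[2])
    return [math.hypot(wx - gx, wy - gy), worker[1], garment[1]]
```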
S202, calculating an action completion confidence for each video frame according to a first feature vector sequence formed from the feature vectors of a plurality of video frames.
In the embodiment of the application, the action completion confidence of a video frame is the probability that the first target object in the frame has completed the action on the second target object. In the garment-packing example, the action completion confidence is the probability that the worker has completed the packing action. By processing the feature vectors of the video frames, for example with a multi-layer neural network, the action completion confidence (packing completion confidence) of each video frame can be calculated.
S203, counting according to the video frames whose action completion confidence is higher than a preset threshold.
In the embodiment of the present application, a probability threshold may be preset; when the probability that the first target object has completed the action on the second target object in a video frame is higher than this threshold, the count is incremented by one. That is, if N video frames in a video satisfy the above condition, the first target object is counted as having completed the action on the second target object N times.
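In code, step S203 is just a thresholded count over the per-frame confidences. This minimal Python sketch assumes `confidences` holds the values computed in S202; the 0.9 threshold is an illustrative assumption.

```python
THRESHOLD = 0.9  # illustrative preset probability threshold

def count_completions(confidences):
    """Each frame whose action completion confidence exceeds the threshold counts once."""
    return sum(1 for c in confidences if c > THRESHOLD)

# e.g. count_completions([0.10, 0.95, 0.20, 0.97]) -> 2
```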
According to the counting method provided by this embodiment of the application, the video frames are analyzed, and the probability that an action is complete is obtained from parameters such as the distance between the target objects and their confidences, so as to judge whether the action is complete. Completed actions can thus be counted without manually setting a threshold, which reduces or avoids false counts and improves counting accuracy.
Example II
Fig. 3 is a flowchart of another embodiment of the counting method provided by the present application. As shown in fig. 3, on the basis of the embodiment shown in fig. 2, the counting method provided in this embodiment may further include the following steps:
s301, dividing the video into a sequence of video frames.
In the embodiment of the application, an entire video covering a preset period (for example, one day, or several hours) can be divided into a sequence of video frames, which is then input into a pre-trained action completion count model (garment counting model) for counting.
S302, performing target detection processing on each video frame in the video frame sequence to generate a feature vector.
Fig. 4 is a schematic structural diagram of an action completion count model according to an embodiment of the present application. As shown in fig. 4, after the video frame sequence is input into the action completion count model, the model first performs target detection processing on the sequence, generating the feature vector of each video frame, such as feature vector 1, feature vector 2, ..., and feature vector n in fig. 4.
S303, processing a first feature vector sequence formed from the feature vectors of a plurality of video frames with a recurrent neural network to generate a second feature vector sequence containing the context of each video frame.
S304, calculating the action completion confidence of each video frame according to the second feature vector sequence.
In the embodiment of the application, the confidence calculation can take the context of each video frame into account to improve accuracy. Thus, the first feature vector sequence may be processed with a recurrent neural network to generate a second feature vector sequence that contains the context of each video frame. The second feature vector sequence is then input to a confidence calculation module, which calculates the action completion confidence of each video frame. Specifically, the confidence calculation module may be obtained during the model training phase by feeding training data into a multi-layer perceptron.
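The following PyTorch sketch shows one plausible shape for such a model. The patent specifies only a recurrent neural network followed by a multi-layer perceptron, so the choice of a GRU, the layer sizes, and the sigmoid output are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ActionCompletionModel(nn.Module):
    def __init__(self, feat_dim=3, hidden_dim=64):
        super().__init__()
        # Recurrent layer (S303): turns the first feature vector sequence
        # into a second sequence carrying each frame's context.
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        # Confidence calculation module (S304): a small multi-layer
        # perceptron applied to every frame's context vector.
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x):                 # x: (batch, frames, feat_dim)
        ctx, _ = self.rnn(x)              # second feature vector sequence
        return torch.sigmoid(self.head(ctx)).squeeze(-1)  # per-frame confidence
```

A binary cross-entropy loss against the per-frame action completion identifiers described in Example III would be a natural training objective for such a head.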
S305, counting according to the video frames whose action completion confidence is higher than a preset threshold.
According to the counting method provided by this embodiment of the application, the video frames are analyzed, and the probability that an action is complete is obtained from the context of each video frame as well as parameters such as the distance between the target objects and their confidences. Whether an action is complete is therefore judged more accurately, completed actions can be counted without manually setting a threshold, false counts can be reduced or avoided, and counting accuracy is improved.
Example III
Fig. 5 is a flowchart of a counting method according to another embodiment of the present application. As shown in fig. 5, on the basis of the embodiment shown in fig. 2 or fig. 3, the counting method provided by the embodiment of the present application may further include the following steps:
s501, training video data is acquired.
In the embodiment of the application, before the counting model is completed by using the actions, the model can be trained by acquiring training video data. The training video data may include a feature vector of each of the plurality of training video frames and an action completion identification (package completion identification) labeled for each training video frame, the action completion identification identifying whether an action in the training video frame is completed. Specifically, the action completion flag marked for each training video frame refers to whether an action is completed in the video frame, for example, for a clothing packing scene, if packing is completed, the action completion flag may be marked as 1 (the video frame is used for counting plus one), and if packing is not completed, the action completion flag is marked as 0 (the video frame cannot be used for counting plus one).
S502, calculating the action completion confidence of each training video frame according to a third feature vector sequence formed from the feature vectors of a plurality of training video frames.
In the embodiment of the application, calculating a training video frame's action completion confidence from the third feature vector sequence follows the same process as, during model use, calculating each video frame's action completion confidence from the first feature vector sequence and then from the second feature vector sequence.
S503, for each training video frame, returning a training result according to its action completion identifier and action completion confidence.
Specifically, when the action completion confidence of a training video frame is higher than the preset threshold and its action completion identifier marks the action in the frame as complete, a correct training result is returned.
In the embodiment of the application, for a given training video frame, an action completion confidence above the preset threshold means the model would use the frame to increment the count; if the frame's action completion identifier is 1, the training result is correct.
Likewise, when the action completion confidence of a training video frame is not higher than the preset threshold and its action completion identifier marks the action in the frame as not complete, a correct training result is also returned.
In the embodiment of the application, for a given training video frame, an action completion confidence not above the preset threshold means the model would not use the frame to increment the count; if the frame's action completion identifier is 0, the training result is correct.
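Both branches of S503 reduce to checking that the thresholded prediction agrees with the label, as in this minimal Python sketch (the 0.9 threshold value is an illustrative assumption):

```python
def training_result(confidence, label, threshold=0.9):
    """True when the thresholded prediction matches the action completion identifier (1 or 0)."""
    predicted_complete = confidence > threshold
    return predicted_complete == bool(label)
```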
In addition, the counting method provided by the embodiment of the application may further include:
S504, among the training video frames whose action completion identifier marks the action as complete, acquiring the proportion whose action completion confidence is higher than the preset threshold.
S505, ending the training process when the proportion is higher than a preset ratio.
In the embodiment of the application, whether the model is ready for use is determined by the accuracy of its output. For example, if 1000 video frames whose action completion identifier is 1 are input to the model, and for 700 of them the model outputs an action completion confidence higher than the preset threshold, the proportion is 70%. If this proportion exceeds the preset ratio, the model is considered trained and the training process can end. The trained action completion count model can then count from input video.
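A minimal Python sketch of this stopping test, assuming per-frame confidences and labels are available as parallel lists; the 70% target mirrors the example above and is otherwise an assumption:

```python
def training_done(confidences, labels, threshold=0.9, target_ratio=0.7):
    """End training when enough positive frames clear the confidence threshold."""
    positives = [c for c, y in zip(confidences, labels) if y == 1]
    if not positives:
        return False
    hits = sum(1 for c in positives if c > threshold)
    return hits / len(positives) >= target_ratio
```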
According to the counting method provided by this embodiment of the application, training video data is acquired to train the action completion count model, and a training result is returned for each training video frame according to its action completion identifier and action completion confidence, thereby improving counting accuracy.
Example IV
Fig. 6 is a schematic diagram of an embodiment of a counting device provided by the present application, which can be used to perform the method steps shown in fig. 2. As shown in fig. 6, the counting device may include: a target detection module 62, a computing module 63, and a counting module 64.
The target detection module 62 is configured to perform target detection processing on video frames in a video and generate feature vectors, each at least comprising: distance information between the first target object and the second target object, a first confidence for the first target object, and a second confidence for the second target object. The computing module 63 is configured to calculate an action completion confidence for each video frame according to a first feature vector sequence formed from the feature vectors of a plurality of video frames, where the action completion confidence is the probability that the first target object in the video frame has completed an action on the second target object. The counting module 64 is configured to count according to the video frames whose action completion confidence is higher than a preset threshold.
In the embodiment of the application, after a video of the first target object performing an action on the second target object is obtained, the video is divided into a sequence of video frames. The target detection module 62 then performs target detection processing on each video frame, obtains the target objects in the frame, and generates the frame's feature vector. The computing module 63 calculates the action completion confidence of each video frame from the first feature vector sequence composed of the feature vectors generated by the target detection module 62. Whenever a video frame's action completion confidence is above the preset threshold, the counting module 64 increments the count by one.
According to the counting device provided by this embodiment of the application, the video frames are analyzed, and the probability that an action is complete is obtained from parameters such as the distance between the target objects and their confidences, so as to judge whether the action is complete. Completed actions can thus be counted without manually setting a threshold, which reduces or avoids false counts and improves counting accuracy.
Example V
Fig. 7 is a schematic diagram of a counting device according to another embodiment of the present application, which may be used to perform the method steps shown in figs. 3 and 5. As shown in fig. 7, on the basis of the embodiment shown in fig. 6, the computing module 63 may include: a processing unit 631 and a calculating unit 632.
The processing unit 631 may be configured to process the first feature vector sequence using a recurrent neural network to generate a second feature vector sequence containing the context of each video frame; the calculating unit 632 may be configured to calculate the action completion confidence of each video frame according to the second feature vector sequence.
In this embodiment of the present application, the computing module 63 may specifically process the first feature vector sequence with a multi-layer neural network to calculate the action completion confidence of each video frame. In particular, the computing module 63 may take the context of each video frame into account in the confidence calculation to improve accuracy. Thus, the processing unit 631 may process the first feature vector sequence with a recurrent neural network to generate a second feature vector sequence containing the context of each video frame, and the calculating unit 632 then calculates the action completion confidence of each video frame from the second feature vector sequence generated by the processing unit 631.
Further, in the embodiment of the present application, an entire video covering a preset period (for example, one day, or several hours) may be divided into a sequence of video frames, which is then input into a pre-trained action completion count model for counting. Therefore, the counting device provided by the embodiment of the application may further include a model training module 71. The model training module 71 may be configured to acquire training video data, where the training video data includes a feature vector for each of a plurality of training video frames and an action completion identifier labeled for each training video frame, the identifier marking whether the action in the frame is complete; calculate the action completion confidence of each training video frame according to a third feature vector sequence formed from the feature vectors of the plurality of training video frames; and, for each training video frame, return a training result according to its action completion identifier and action completion confidence.
Specifically, the model training module 71 may include a first returning unit 711, which may be configured to return a correct training result when the action completion confidence of a training video frame is higher than the preset threshold and the frame's action completion identifier marks the action as complete.
Further, the model training module 71 may also include a second returning unit 712, which may be configured to return a correct training result when the action completion confidence of a training video frame is not higher than the preset threshold and the frame's action completion identifier marks the action as not complete.
In addition, the counting device provided by the embodiment of the application may further include a test module 72. The test module 72 may be configured to acquire, among the training video frames whose action completion identifier marks the action as complete, the proportion whose action completion confidence is higher than the preset threshold, and to end the training process when the proportion is higher than a preset ratio.
The functions of each module in this embodiment of the application are detailed in the method embodiments above and are not repeated here.
According to the counting device provided by this embodiment of the application, the video frames are analyzed, and the probability that an action is complete is obtained from parameters such as the distance between the target objects and their confidences, so as to judge whether the action is complete. Completed actions can thus be counted without manually setting a threshold, which reduces or avoids false counts and improves counting accuracy.
Example VI
The internal functions and structure of the counting device are described above; the device may be implemented as an electronic device. Fig. 8 is a schematic structural diagram of an embodiment of an electronic device provided by the present application. As shown in fig. 8, the electronic device includes a memory 81 and a processor 82.
A memory 81 for storing a program. In addition to the programs described above, the memory 81 may be configured to store various other data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and the like.
The memory 81 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
A processor 82 coupled to the memory 81, executing a program stored in the memory 81 for:
performing target detection processing on video frames in a video to generate feature vectors, wherein each feature vector at least comprises: distance information between a first target object and a second target object, a first confidence for the first target object, and a second confidence for the second target object;
calculating an action completion confidence for each video frame according to a first feature vector sequence formed from the feature vectors of a plurality of video frames, wherein the action completion confidence is the probability that the first target object in the video frame has completed an action on the second target object;
and counting according to the video frames whose action completion confidence is higher than a preset threshold.
Further, as shown in fig. 8, the electronic device may further include: communication component 83, power component 84, audio component 85, display 86, and other components. Only some of the components are schematically shown in fig. 8, which does not mean that the electronic device only comprises the components shown in fig. 8.
The communication component 83 is configured to facilitate wired or wireless communication between the electronic device and other devices. The electronic device may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 83 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 83 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
A power supply assembly 84 provides power to the various components of the electronic device. The power supply components 84 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for electronic devices.
The audio component 85 is configured to output and/or input audio signals. For example, the audio component 85 includes a Microphone (MIC) configured to receive external audio signals when the electronic device is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may be further stored in the memory 81 or transmitted via the communication component 83. In some embodiments, the audio component 85 further comprises a speaker for outputting audio signals.
The display 86 includes a screen, which may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation.
Those of ordinary skill in the art will appreciate that all or part of the steps for implementing the above method embodiments may be carried out by hardware under the control of program instructions. The foregoing program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are only intended to illustrate, not limit, the technical solutions of the present application. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents, and such modifications and substitutions do not depart from the spirit of the application.
Claims (18)
1. A garment counting method, comprising:
processing a plurality of video frames in a video to acquire distance information between an operator and a garment, a first confidence for the operator, and a second confidence for the garment; wherein the first confidence is the probability that the operator is identified in the current video frame as packing the garment, and the second confidence is the probability that the garment is identified in the current video frame as packed;
inputting the distance information, the first confidence, and the second confidence into a garment counting model, and calculating a packing completion confidence for each of the plurality of video frames, wherein the packing completion confidence is the probability that the operator in the video frame has completed the packing action on the garment;
and performing a garment packing count according to the video frames whose packing completion confidence is higher than a preset threshold.
2. The garment counting method of claim 1, wherein inputting the distance information, the first confidence, and the second confidence into a garment counting model and calculating a packing completion confidence for each of the plurality of video frames comprises:
calculating the packing completion confidence of each video frame using a multi-layer neural network.
3. The garment counting method of claim 2, wherein calculating the packing completion confidence of each video frame using a multi-layer neural network comprises:
acquiring the context of each video frame using a recurrent neural network;
and calculating the packing completion confidence of each video frame according to its context.
4. A counting method, comprising:
performing target detection processing on a plurality of video frames in a video to generate feature vectors, wherein each feature vector at least comprises: distance information between a first target object and a second target object, a first confidence for the first target object, and a second confidence for the second target object; wherein the first confidence is the probability that the first target object is identified in the current video frame as acting on the second target object, and the second confidence is the probability that the second target object is identified in the current video frame as having the action completed;
calculating an action completion confidence for each of the plurality of video frames according to a first feature vector sequence formed from the feature vectors of the plurality of video frames, wherein the action completion confidence is the probability that the first target object in the video frame has completed the action on the second target object;
and counting according to the video frames whose action completion confidence is higher than a preset threshold.
5. The counting method of claim 4, wherein calculating the action completion confidence of each of the plurality of video frames from the first feature vector sequence comprises:
processing the first feature vector sequence with a multi-layer neural network to calculate the action completion confidence of each video frame.
6. The counting method of claim 5, wherein processing the first feature vector sequence with a multi-layer neural network to calculate the action completion confidence of each video frame comprises:
processing the first feature vector sequence with a recurrent neural network to generate a second feature vector sequence containing the context of each video frame;
and calculating the action completion confidence of each video frame according to the second feature vector sequence.
7. The counting method of any one of claims 4 to 6, further comprising, before performing the target detection processing on the video frames in the video:
acquiring training video data, wherein the training video data comprises a feature vector for each of a plurality of training video frames and an action completion identifier labeled for each training video frame, the action completion identifier marking whether the action in the training video frame is complete;
calculating the action completion confidence of each training video frame according to a third feature vector sequence formed from the feature vectors of the plurality of training video frames;
and for each training video frame, returning a training result according to its action completion identifier and action completion confidence.
8. The counting method of claim 7, wherein returning a training result for each training video frame according to its action completion identifier and action completion confidence comprises:
returning a correct training result when the action completion confidence of the training video frame is higher than the preset threshold and the frame's action completion identifier marks the action as complete.
9. The counting method of claim 7, wherein returning a training result for each training video frame according to its action completion identifier and action completion confidence comprises:
returning a correct training result when the action completion confidence of the training video frame is not higher than the preset threshold and the frame's action completion identifier marks the action as not complete.
10. The counting method of claim 8, further comprising, after returning a correct training result when the action completion confidence of the training video frame is higher than the preset threshold and the frame's action completion identifier marks the action as complete:
acquiring, among the training video frames whose action completion identifier marks the action as complete, the proportion whose action completion confidence is higher than the preset threshold;
and ending the training process when the proportion is higher than a preset ratio.
11. A counting device, comprising:
a target detection module, configured to perform target detection processing on a plurality of video frames in a video to generate feature vectors, wherein each feature vector at least comprises: distance information between a first target object and a second target object, a first confidence for the first target object, and a second confidence for the second target object; wherein the first confidence is the probability that the first target object is identified in the current video frame as acting on the second target object, and the second confidence is the probability that the second target object is identified in the current video frame as having the action completed;
a computing module, configured to calculate an action completion confidence for each of the plurality of video frames according to a first feature vector sequence formed from the feature vectors of the plurality of video frames, wherein the action completion confidence is the probability that the first target object in the video frame has completed the action on the second target object;
and a counting module, configured to count according to the video frames whose action completion confidence is higher than a preset threshold.
12. The counting device of claim 11, wherein the computing module is configured to process the first feature vector sequence with a multi-layer neural network to calculate the action completion confidence of each video frame.
13. The counting device of claim 12, wherein the computing module comprises:
a processing unit, configured to process the first feature vector sequence with a recurrent neural network to generate a second feature vector sequence containing the context of each video frame;
and a calculating unit, configured to calculate the action completion confidence of each video frame according to the second feature vector sequence.
14. The counting device of any one of claims 11 to 13, further comprising:
a model training module, configured to acquire training video data, wherein the training video data comprises a feature vector for each of a plurality of training video frames and an action completion identifier labeled for each training video frame, the action completion identifier marking whether the action in the training video frame is complete; calculate the action completion confidence of each training video frame according to a third feature vector sequence formed from the feature vectors of the plurality of training video frames; and, for each training video frame, return a training result according to its action completion identifier and action completion confidence.
15. The counting device of claim 14, wherein the model training module comprises:
a first returning unit, configured to return a correct training result when the action completion confidence of a training video frame is higher than the preset threshold and the frame's action completion identifier marks the action as complete.
16. The counting device of claim 14, wherein the model training module further comprises:
a second returning unit, configured to return a correct training result when the action completion confidence of a training video frame is not higher than the preset threshold and the frame's action completion identifier marks the action as not complete.
17. The counting device of claim 15, further comprising:
a test module, configured to acquire, among the training video frames whose action completion identifier marks the action as complete, the proportion whose action completion confidence is higher than the preset threshold, and to end the training process when the proportion is higher than a preset ratio.
18. An electronic device, comprising:
a memory for storing a program;
a processor for running the program stored in the memory for:
performing target detection processing on a plurality of video frames in a video to generate feature vectors, wherein each feature vector at least comprises: distance information between a first target object and a second target object, a first confidence for the first target object, and a second confidence for the second target object; wherein the first confidence is the probability that the first target object is identified in the current video frame as acting on the second target object, and the second confidence is the probability that the second target object is identified in the current video frame as having the action completed;
calculating an action completion confidence for each of the plurality of video frames according to a first feature vector sequence formed from the feature vectors of the plurality of video frames, wherein the action completion confidence is the probability that the first target object in the video frame has completed the action on the second target object;
and counting according to the video frames whose action completion confidence is higher than a preset threshold.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910111446.4A CN111553180B (en) | 2019-02-12 | 2019-02-12 | Garment counting method, counting method and device, and electronic equipment |
TW108143618A TW202030642A (en) | 2019-02-12 | 2019-11-29 | Method for counting items of clothing, counting method and apparatus, and electronic device |
PCT/CN2020/074214 WO2020164401A1 (en) | 2019-02-12 | 2020-02-03 | Method for counting items of clothing, counting method and apparatus, and electronic device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111553180A CN111553180A (en) | 2020-08-18 |
CN111553180B true CN111553180B (en) | 2023-08-29 |
Family
ID=72005429
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910111446.4A Active CN111553180B (en) | 2019-02-12 | 2019-02-12 | Garment counting method, counting method and device, and electronic equipment |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN111553180B (en) |
TW (1) | TW202030642A (en) |
WO (1) | WO2020164401A1 (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101017539A (en) * | 2006-02-06 | 2007-08-15 | Msc有限公司 | System and method for improved production using radio frequency identification devices |
CN107944382A (en) * | 2017-11-20 | 2018-04-20 | 北京旷视科技有限公司 | Method for tracking target, device and electronic equipment |
CN207529154U (en) * | 2017-10-27 | 2018-06-22 | 成都华西天然药物有限公司 | A kind of clothes of electromechanical integration are regulated the traffic device |
CN108241844A (en) * | 2016-12-27 | 2018-07-03 | 北京文安智能技术股份有限公司 | A kind of public traffice passenger flow statistical method, device and electronic equipment |
CN108470255A (en) * | 2018-04-12 | 2018-08-31 | 上海小蚁科技有限公司 | Workload Account method and device, storage medium, computing device |
CN108986064A (en) * | 2017-05-31 | 2018-12-11 | 杭州海康威视数字技术股份有限公司 | A kind of people flow rate statistical method, equipment and system |
CN109074889A (en) * | 2016-02-22 | 2018-12-21 | 拉皮斯坎系统股份有限公司 | System and method for detecting dangerous material and contraband in cargo |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108491759A (en) * | 2018-02-10 | 2018-09-04 | 合肥迪宏自动化有限公司 | A kind of process detection device and its process detection method based on deep learning |
2019
- 2019-02-12 CN CN201910111446.4A patent/CN111553180B/en active Active
- 2019-11-29 TW TW108143618A patent/TW202030642A/en unknown
2020
- 2020-02-03 WO PCT/CN2020/074214 patent/WO2020164401A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN111553180A (en) | 2020-08-18 |
WO2020164401A1 (en) | 2020-08-20 |
TW202030642A (en) | 2020-08-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20180286199A1 (en) | Methods and systems for shape adaptation for merged objects in video analytics | |
CN105279898A (en) | Alarm method and device | |
CN111726689B (en) | Video playing control method and device | |
CN109951476B (en) | Attack prediction method and device based on time sequence and storage medium | |
CN109145771A (en) | A kind of face snap method and device | |
EP3461778B1 (en) | Maintenance trajectory tracing for elevator system | |
CN111078446A (en) | Fault information acquisition method and device, electronic equipment and storage medium | |
US10037504B2 (en) | Methods for determining manufacturing waste to optimize productivity and devices thereof | |
CN106384348B (en) | The method for detecting abnormality and device of monitoring image | |
CN104503888A (en) | Warning method and device | |
CN104036240A (en) | Face feature point positioning method and device | |
CN114760339A (en) | Fault prediction method, apparatus, device, medium, and product | |
CN114584836B (en) | Method, device, system and medium for detecting using behavior of electronic product | |
CN112948704A (en) | Model training method and device for information recommendation, electronic equipment and medium | |
CN108965861B (en) | Method and device for positioning camera, storage medium and intelligent interaction equipment | |
CN111553180B (en) | 2023-08-29 | Garment counting method, counting method and device, and electronic equipment | |
CN111913850A (en) | Data anomaly detection method, device, equipment and storage medium | |
CN112580379B (en) | Data processing system and method, electronic device, and computer-readable storage medium | |
CN108228433B (en) | Electronic equipment, and method and device for counting visit time and stay time of mobile application | |
CN115409094A (en) | Equipment fault prediction method, device and storage medium | |
CN109241729B (en) | Application program detection and processing method and device, terminal device and electronic device | |
CN106293629A (en) | Storehouse acquisition methods and device | |
CN112863096A (en) | Monitoring method and device | |
CN111026658A (en) | Debugging method, device and medium for fast application | |
CN108132882A (en) | Information acquisition method, device and electronic equipment |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40035210; Country of ref document: HK |
| GR01 | Patent grant | |