CN111553180A - Clothing counting method, clothing counting device and electronic equipment - Google Patents

Clothing counting method, clothing counting device and electronic equipment

Info

Publication number
CN111553180A
Authority
CN
China
Prior art keywords
confidence
video frame
action
training
completion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910111446.4A
Other languages
Chinese (zh)
Other versions
CN111553180B (en)
Inventor
张民英
神克乐
龙一民
徐博文
吴剑
胡露露
陈新
尹宁
刘志敏
胡旭
袁炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201910111446.4A (patent CN111553180B)
Priority to TW108143618A (patent TW202030642A)
Priority to PCT/CN2020/074214 (patent WO2020164401A1)
Publication of CN111553180A
Application granted
Publication of CN111553180B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the invention provide a clothing counting method, a clothing counting device and an electronic device. The method comprises: dividing a video into a sequence of video frames; performing target detection processing on each video frame in the sequence to generate a feature vector; calculating an action completion confidence for each video frame according to a first feature vector sequence formed by the feature vectors corresponding to the video frames, the action completion confidence being the probability that a first target object in the video frame has completed an action on a second target object; and counting according to the video frames whose action completion confidence is higher than a preset threshold. By analyzing the video frames and deriving the probability of action completion from parameters such as the distance between the target objects and their detection confidences, whether an action has been completed can be judged and completed actions counted without manually set thresholds, which reduces or avoids false counts and improves counting accuracy.

Description

Clothing counting method, clothing counting device and electronic equipment
Technical Field
The invention relates to the field of computer technology, and in particular to a clothing counting method, a clothing counting device and an electronic device.
Background
To advance the digitalization of industrial production, a low-intrusion approach is generally adopted to link the stages of the production flow in non-standard small factories with low degrees of standardization and automation. For example, without changing workers' production habits, various data in the production flow are collected by installing devices such as cameras, so that the links of the production flow can be associated, orders can be completed as required, and the production efficiency of the factory can be improved.
For links in the production flow that involve workload counting, for example the bagging step in a clothing factory, video of a worker's operation is typically captured by a camera and the video frames are analyzed with a target detection algorithm to identify the target objects (the worker and the operated object), so that the start and end of each operation can be confirmed and the worker's workload counted. In this process, although the completion of each operation is judged by the target detection algorithm, various thresholds must be set manually to prevent unreasonable counts caused by misjudgments, for example a counting interval threshold between consecutive operations and a distance threshold between target objects.
In implementing the invention, the inventors found that the prior art has at least the following problem: the thresholds are usually set manually based on experience, so they are only approximately reasonable, cannot be applied universally across specific production scenarios, and cannot guarantee the accuracy of the counting result.
Disclosure of Invention
Embodiments of the invention provide a clothing counting method, a clothing counting device and an electronic device, so as to overcome the defect in the prior art that the accuracy of the counting result cannot be guaranteed when counting relies on manually set thresholds.
In order to achieve the above object, an embodiment of the present invention provides a clothing counting method, including:
processing video frames in a video to acquire distance information between an operator and a garment, a first confidence for the operator and a second confidence for the garment;
inputting the distance information between the operator and the garment, the first confidence for the operator and the second confidence for the garment into a clothing counting model, and calculating a packing completion confidence for each video frame, the packing completion confidence being the probability that the operator has completed the packing action on the garment in the video frame;
and counting packed garments according to the video frames whose packing completion confidence is higher than a preset threshold.
The embodiment of the invention also provides a counting method, which comprises the following steps:
performing target detection processing on video frames in a video to generate a feature vector, the feature vector at least comprising: distance information between a first target object and a second target object, a first confidence for the first target object and a second confidence for the second target object;
calculating an action completion confidence for each video frame according to a first feature vector sequence formed by the feature vectors corresponding to the video frames, the action completion confidence being the probability that the first target object in the video frame has completed an action on the second target object;
and counting according to the video frames whose action completion confidence is higher than a preset threshold.
The embodiment of the invention also provides a counting device, which comprises:
a target detection module, configured to perform target detection processing on a video frame in a video to generate a feature vector, where the feature vector at least includes: the distance information of the first target object and the second target object, the first confidence of the first target object and the second confidence of the second target object;
the calculation module is used for calculating the action completion confidence of each video frame according to a first feature vector sequence formed by the feature vectors corresponding to the plurality of video frames, wherein the action completion confidence is the probability that a first target object in the video frames completes the action on a second target object;
and the counting module is used for counting according to the video frames with the action completion confidence coefficient higher than a preset threshold value.
An embodiment of the present invention further provides an electronic device, including:
a memory for storing a program;
a processor for executing the program stored in the memory for:
performing target detection processing on video frames in a video to generate a feature vector, the feature vector at least comprising: distance information between a first target object and a second target object, a first confidence for the first target object and a second confidence for the second target object;
calculating an action completion confidence for each video frame according to a first feature vector sequence formed by the feature vectors corresponding to the video frames, the action completion confidence being the probability that the first target object in the video frame has completed an action on the second target object;
and counting according to the video frames whose action completion confidence is higher than a preset threshold.
According to the clothing counting method, the counting method and apparatus, and the electronic device provided by the embodiments of the invention, the video frames are analyzed and the probability of action completion is derived from parameters such as the distance between the target objects and their confidences, so that whether the action has been completed can be judged and completed actions counted without manually setting thresholds, reducing or avoiding false counts and improving counting accuracy.
The foregoing is only an overview of the technical solutions of the invention. In order that the technical means of the invention may be understood more clearly, and that the above and other objects, features and advantages of the invention may be more readily apparent, specific embodiments of the invention are described below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 is a system block diagram of a service system according to an embodiment of the present invention;
FIG. 2 is a flow chart of one embodiment of a counting method provided by the present invention;
FIG. 3 is a flow chart of another embodiment of a counting method provided by the present invention;
FIG. 4 is a schematic structural diagram of an action completion count model according to an embodiment of the present invention;
FIG. 5 is a flow chart of another embodiment of a counting method provided by the present invention;
FIG. 6 is a schematic structural diagram of an embodiment of a counting device provided in the present invention;
FIG. 7 is a schematic structural diagram of another embodiment of a counting device provided by the present invention;
fig. 8 is a schematic structural diagram of an embodiment of an electronic device provided in the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In the prior art, for links in the production flow that involve workload counting, for example the bagging step in a clothing factory, video of a worker's operation is typically captured by a camera and the video frames are analyzed with a target detection algorithm to identify the target objects (the worker and the operated object), so that the start and end of each operation can be confirmed and the worker's workload counted. In this process, although the completion of each operation is judged by the target detection algorithm, various thresholds must be set manually to prevent unreasonable counts caused by misjudgments, for example a counting interval threshold between consecutive operations and a distance threshold between target objects. These thresholds are usually set manually based on experience, so they are only approximately reasonable, cannot be applied universally across specific production scenarios, and cannot guarantee the accuracy of the counting result. For example, suppose the average time for a worker to pack one garment is 15 seconds, typically ranging between 10 and 20 seconds; a counting interval threshold of 10 seconds is then typically set by hand, and a packing operation is only counted if the time since the previous counted operation exceeds this threshold. If a worker takes only 8 seconds to pack a garment (with the manually set threshold at 10 seconds), the count for that second garment is discarded, significantly reducing the accuracy of the algorithm.
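For illustration only, the following Python sketch shows a naive interval-threshold counter of the kind described above; the 10-second threshold and the timestamp representation are assumptions for the example, not part of any embodiment. It makes concrete how a legitimate 8-second pack gets dropped.

```python
def count_with_interval_threshold(completion_times, min_interval=10.0):
    """Prior-art style counter: a detected completion is only counted
    if at least min_interval seconds have passed since the last
    counted completion."""
    count, last_counted = 0, None
    for t in completion_times:  # timestamps (in seconds) of detections
        if last_counted is None or t - last_counted >= min_interval:
            count += 1
            last_counted = t
    return count

# Packs finishing at 0 s, 15 s and 23 s: the 8-second gap before the
# third pack is below the 10-second threshold, so that pack is missed.
print(count_with_interval_threshold([0.0, 15.0, 23.0]))  # -> 2
```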
Therefore, the present application proposes a counting scheme whose main principle is as follows: by analyzing the video frames, the action completion confidence of each video frame, i.e. the probability that the target object has completed the action, is derived from parameters such as the distance between the target objects and their detection confidences, and whether the action has been completed is judged from this confidence, thereby determining whether a completed action should be counted. No threshold needs to be set manually, false counts can be reduced or avoided, and counting accuracy is improved.
The method provided by the embodiments of the invention can be applied in any service system with data processing capability. Fig. 1 is a block diagram of a service system provided by an embodiment of the invention; the structure shown in fig. 1 is only one example of a service system to which the technical solution of the invention can be applied. As shown in fig. 1, the service system includes a counting apparatus comprising a target detection module, a calculation module and a counting module, which may be configured to perform the processing flows shown in figs. 2, 3 and 5 below. In the service system, the video is first divided into a sequence of video frames; then target detection processing is performed on each video frame in the sequence to generate a feature vector that at least includes: distance information between a first target object (the operator) and a second target object (the operated object), a first confidence for the first target object and a second confidence for the second target object. The feature vectors corresponding to the video frames form a first feature vector sequence, from which the action completion confidence of each video frame, i.e. the probability that the first target object has completed the action on the second target object in that frame, is calculated. Finally, counting is performed according to the video frames whose action completion confidence is higher than a preset threshold. Counting can thus be performed without manually setting thresholds, false counts can be reduced or avoided, and counting accuracy is improved.
The above embodiments are illustrations of technical principles and exemplary application frameworks of the embodiments of the present invention, and specific technical solutions of the embodiments of the present invention are further described in detail below through a plurality of embodiments.
Example one
Fig. 2 is a flowchart of an embodiment of the counting method provided by the present invention. The method may be executed by the service system described above, by any terminal or server device with data processing capability, or by an apparatus or chip integrated in such a device. As shown in fig. 2, the counting method includes the following steps:
s201, carrying out target detection processing on video frames in the video to generate feature vectors.
In the embodiment of the present invention, after the recorded video of the first target object performing an action on the second target object is acquired, the video is divided into a sequence of video frames. Target detection processing is then performed on each video frame to locate the target objects in the frame and generate a feature vector for the frame, the feature vector at least including: distance information between the first target object and the second target object, a first confidence for the first target object and a second confidence for the second target object.
Take the scene of a worker packing clothes in a factory as an example. The first target object is the worker (operator); the second target object is the garment; the first confidence is the probability that the packing worker is identified in the current video frame; the second confidence is the probability that the garment being packed is identified in the current video frame. These data can be obtained by performing target detection on each video frame of the video of the worker packing clothes; the data of each video frame then form the feature vector of that frame.
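As a minimal sketch of how such a per-frame feature vector could be assembled from detector output (the bounding-box format, field names and helper functions below are illustrative assumptions, not the embodiment's interface):

```python
import math
from typing import NamedTuple, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format

class FrameFeature(NamedTuple):
    """Per-frame feature vector: worker-garment distance plus the
    detector confidence for each of the two target objects."""
    distance: float
    worker_confidence: float   # first confidence (operator)
    garment_confidence: float  # second confidence (operation object)

def center(box: Box) -> Tuple[float, float]:
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def build_feature(worker_box: Box, worker_conf: float,
                  garment_box: Box, garment_conf: float) -> FrameFeature:
    """Assemble the feature vector of one video frame from the two
    detected boxes and their confidences."""
    (wx, wy), (gx, gy) = center(worker_box), center(garment_box)
    distance = math.hypot(wx - gx, wy - gy)  # Euclidean center distance
    return FrameFeature(distance, worker_conf, garment_conf)
```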
S202, calculating the action completion confidence of each video frame according to a first feature vector sequence formed by feature vectors corresponding to a plurality of video frames.
In the embodiment of the present invention, the action completion confidence of a video frame is the probability that the first target object in the frame has completed the action on the second target object. Taking the clothes-packing worker as an example, the action completion confidence is the probability that the worker has completed the packing action. By processing the feature vectors of the video frames, for example with a multi-layer neural network or other techniques, the action completion confidence (packing completion confidence) of each video frame can be calculated.
And S203, counting according to the video frames whose action completion confidence is higher than a preset threshold.
In the embodiment of the present invention, a probability threshold may be preset, and whenever the probability that the first target object has completed the action on the second target object in a video frame is higher than this threshold, the count is increased by one. That is, if N video frames in a video satisfy this condition, the first target object has completed the action on the second target object N times.
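A minimal sketch of this counting rule, assuming the per-frame action completion confidences have already been computed (the 0.5 threshold is an arbitrary example value):

```python
def count_completed_actions(confidences, threshold=0.5):
    """Each video frame whose action completion confidence exceeds
    the preset threshold adds one to the count."""
    return sum(1 for c in confidences if c > threshold)

# Three frames exceed the threshold, so N = 3 completed actions.
print(count_completed_actions([0.1, 0.7, 0.3, 0.9, 0.8]))  # -> 3
```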
According to the counting method provided by the embodiment of the present invention, the video frames are analyzed and the probability of action completion is derived from parameters such as the distance between the target objects and their confidences, so that whether the action has been completed can be judged and completed actions counted without manually setting thresholds, reducing or avoiding false counts and improving counting accuracy.
Example two
Fig. 3 is a flowchart of another embodiment of the counting method provided by the present invention. As shown in fig. 3, on the basis of the embodiment shown in fig. 2, the counting method provided in this embodiment may further include the following steps:
s301, the video is divided into a sequence of video frames.
In the embodiment of the present invention, the whole video of a preset time period (for example, one day or several hours) may be divided into a sequence of video frames, which is then input into a pre-trained action completion counting model (clothing counting model) for counting.
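For illustration, splitting a video file into a frame sequence might look like the following sketch, assuming OpenCV is available; the optional stride is an assumption, not part of the embodiment:

```python
import cv2  # OpenCV, assumed available for video decoding

def split_into_frames(video_path, stride=1):
    """Decode a video file into a list of frames (numpy arrays),
    keeping every stride-th frame."""
    capture = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:  # end of stream
            break
        if index % stride == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames
```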
S302, carrying out target detection processing on each video frame in the video frame sequence to generate a feature vector.
Fig. 4 is a schematic structural diagram of an action completion counting model according to an embodiment of the present invention. As shown in fig. 4, after the video frame sequence is input into the action completion counting model, the model first performs target detection processing on the sequence, generating the feature vector of each video frame, such as feature vector 1, feature vector 2, ..., feature vector n in fig. 4.
And S303, processing a first feature vector sequence consisting of feature vectors corresponding to a plurality of video frames by using a recurrent neural network to generate a second feature vector sequence containing the context of each video frame.
And S304, calculating the action completion confidence of each video frame according to the second feature vector sequence.
In the embodiment of the present invention, the confidence calculation can incorporate the context of each video frame to improve accuracy. To this end, the first feature vector sequence may be processed with a recurrent neural network to generate a second feature vector sequence containing the context of each video frame. The second feature vector sequence is then input into a confidence calculation module, which calculates the action completion confidence of each video frame. Specifically, the confidence calculation module may be obtained in the model training stage by feeding training data into a multi-layer perceptron.
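One way such a model could be structured is sketched below in PyTorch, with a GRU standing in for the recurrent network and a small multi-layer perceptron as the confidence calculation module; the feature dimension (3: distance plus two confidences), the hidden width and the bidirectionality are all illustrative assumptions:

```python
import torch
import torch.nn as nn

class ActionCompletionModel(nn.Module):
    """Sketch: a recurrent network turns the first feature vector
    sequence into context-aware vectors (the second sequence), and a
    multi-layer perceptron maps each of those to a per-frame action
    completion confidence."""
    def __init__(self, feature_dim=3, hidden_dim=64):
        super().__init__()
        # Bidirectional so each output vector carries context from
        # frames both before and after the current one.
        self.rnn = nn.GRU(feature_dim, hidden_dim,
                          batch_first=True, bidirectional=True)
        self.mlp = nn.Sequential(          # confidence calculation module
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),                  # confidence in [0, 1]
        )

    def forward(self, features):
        # features: (batch, num_frames, feature_dim) -- first sequence
        context, _ = self.rnn(features)    # second feature vector sequence
        return self.mlp(context).squeeze(-1)  # (batch, num_frames)

# Example: per-frame confidences for a 100-frame clip.
model = ActionCompletionModel()
confidences = model(torch.randn(1, 100, 3))
```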
And S305, counting according to the video frames with the action completion confidence higher than a preset threshold.
According to the counting method provided by the embodiment of the present invention, the video frames are analyzed and the probability of action completion is derived from parameters such as the distance between the target objects and their confidences, combined with the context of each video frame, so that whether the action has been completed can be judged more accurately and completed actions counted without manually setting thresholds, reducing or avoiding false counts and improving counting accuracy.
Example three
Fig. 5 is a flowchart of a counting method according to another embodiment of the present invention. As shown in fig. 5, on the basis of the embodiment shown in fig. 2 or fig. 3, the counting method provided in the embodiment of the present invention may further include the following steps:
s501, training video data are obtained.
In the embodiment of the present invention, before counting is performed with the action completion counting model, the model can be trained on acquired training video data. The training video data may include the feature vector of each of a plurality of training video frames and an action completion identifier (packing completion identifier) labeled for each training video frame, the identifier indicating whether the action in that frame has been completed. Specifically, for a clothing packing scene, the identifier may be set to 1 if packing is completed in the frame (the frame adds one to the count) and to 0 if packing is not completed (the frame cannot add one to the count).
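As an illustrative sketch of this data layout (the container types and the example numbers are assumptions), each training video frame pairs its feature vector with a 0/1 action completion identifier:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrainingFrame:
    feature: List[float]   # e.g. [distance, worker_conf, garment_conf]
    completed: int         # 1 = packing completed in this frame, else 0

# A labeled clip: only the frame in which packing finishes is marked 1,
# so that frame alone may add one to the count.
clip = [
    TrainingFrame([120.0, 0.95, 0.90], 0),
    TrainingFrame([35.5, 0.97, 0.93], 0),
    TrainingFrame([12.1, 0.98, 0.95], 1),  # packing completed here
]
```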
And S502, calculating the action completion confidence of each training video frame according to a third feature vector sequence formed by feature vectors corresponding to a plurality of training video frames.
In the embodiment of the present invention, the process of calculating the action completion confidence of each training video frame from the third feature vector sequence is the same as the process used when the model is applied, i.e. calculating the action completion confidence of each video frame from the first feature vector sequence (or from the second feature vector sequence when context is used).
And S503, returning a training result according to the action completion identifier and the action completion confidence of each training video frame.
Specifically, when the action completion confidence of a training video frame is higher than the preset threshold and its action completion identifier identifies the action in the frame as completed, a correct training result is returned.
In the embodiment of the present invention, for a given training video frame, when the action completion confidence calculated by the action completion counting model is higher than the preset threshold, the model's computation indicates that the frame can add one to the count; if the action completion identifier is 1 at this time, the training result is correct.
In addition, when the action completion confidence of a training video frame is not higher than the preset threshold and its action completion identifier identifies the action in the frame as not completed, a correct training result is likewise returned.
In the embodiment of the present invention, for a given training video frame, when the action completion confidence calculated by the action completion counting model is not higher than the preset threshold, the model's computation indicates that the frame cannot add one to the count; if the action completion identifier is 0 at this time, the training result is also correct.
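Both branches of this correctness check can be written compactly, as in the following sketch (the threshold value is again an assumption):

```python
def training_result_correct(confidence, identifier, threshold=0.5):
    """The training result is correct when model and label agree:
    confidence above the threshold with identifier 1, or confidence
    not above the threshold with identifier 0."""
    predicted_complete = confidence > threshold
    return predicted_complete == bool(identifier)
```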
Further, the counting method provided by the embodiment of the present invention may further include:
s504, in the training video frames with the action completion identifiers used for identifying the completion of the action in the training video frames, the occupation ratio of the training video frames with the action completion confidence coefficient higher than the preset threshold value is obtained.
And S505, when the proportion is higher than a preset ratio, ending the training process.
In the embodiment of the present invention, whether the model is ready for use is determined by the accuracy of the output of the action completion counting model. For example, suppose 1000 video frames whose action completion identifier is 1 are input into the model, and the model outputs an action completion confidence higher than the preset threshold for 700 of them, i.e. a proportion of 70%. If this proportion is higher than the preset ratio, the model is considered trained and the training process can end. The trained action completion counting model can then be used to count from input video.
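A sketch of this stopping test, assuming per-frame confidences and identifiers are aligned, and using the 70% target ratio of the example above:

```python
def training_converged(confidences, identifiers,
                       threshold=0.5, target_ratio=0.7):
    """Among training frames labeled as completed (identifier 1),
    compute the fraction whose predicted confidence exceeds the
    threshold; training can end once it beats target_ratio."""
    positives = [c for c, y in zip(confidences, identifiers) if y == 1]
    if not positives:
        return False  # nothing to measure accuracy on yet
    ratio = sum(1 for c in positives if c > threshold) / len(positives)
    return ratio > target_ratio
```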
According to the counting method provided by the embodiment of the present invention, the action completion counting model is trained on acquired training video data, and training results are returned according to the action completion identifier and the action completion confidence of each training video frame, which improves counting accuracy.
Example four
Fig. 6 is a schematic structural diagram of an embodiment of the counting apparatus provided by the present invention, which can be used to execute the method steps shown in fig. 2. As shown in fig. 6, the counting apparatus may include: a target detection module 62, a calculation module 63 and a counting module 64.
The target detection module 62 is configured to perform target detection processing on the video frames in a video and generate feature vectors, each feature vector at least including: distance information between the first target object and the second target object, a first confidence for the first target object and a second confidence for the second target object. The calculation module 63 is configured to calculate the action completion confidence of each video frame according to a first feature vector sequence formed by the feature vectors corresponding to the video frames, the action completion confidence being the probability that the first target object in the video frame has completed the action on the second target object. The counting module 64 is configured to count according to the video frames whose action completion confidence is higher than a preset threshold.
In the embodiment of the present invention, after the recorded video of the first target object performing an action on the second target object is acquired, the video is divided into a sequence of video frames. The target detection module 62 then performs target detection processing on each video frame, locates the target objects in the frame, and generates the feature vector of the frame. The calculation module 63 calculates the action completion confidence of each video frame from the first feature vector sequence formed by the feature vectors generated by the target detection module 62. When the action completion confidence of a video frame is higher than the preset threshold, the count in the counting module 64 is increased by one.
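A minimal sketch of how the three modules might compose, with the detector and model treated as opaque callables (assumptions for illustration, reusing the pieces sketched in the earlier embodiments):

```python
class CountingApparatus:
    """Illustrative composition of the target detection module, the
    calculation module and the counting module described above."""
    def __init__(self, detector, model, threshold=0.5):
        self.detector = detector    # frame -> per-frame feature vector
        self.model = model          # feature sequence -> confidences
        self.threshold = threshold  # preset probability threshold

    def count(self, frames):
        features = [self.detector(f) for f in frames]  # detection module
        confidences = self.model(features)             # calculation module
        return sum(1 for c in confidences              # counting module
                   if c > self.threshold)
```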
According to the counting apparatus provided by the embodiment of the present invention, the video frames are analyzed and the probability of action completion is derived from parameters such as the distance between the target objects and their confidences, so that whether the action has been completed can be judged and completed actions counted without manually setting thresholds, reducing or avoiding false counts and improving counting accuracy.
Example five
Fig. 7 is a schematic structural diagram of another embodiment of the counting device provided by the present invention, which can be used for executing the method steps shown in fig. 3 and fig. 5. As shown in fig. 7, on the basis of the embodiment shown in fig. 6, the calculating module 63 may include: a processing unit 631 and a calculation unit 632.
The processing unit 631 may be configured to process the first feature vector sequence by using a recurrent neural network, and generate a second feature vector sequence including a context of each video frame; the calculating unit 632 may be configured to calculate an action completion confidence of each video frame according to the second feature vector sequence.
In the embodiment of the present invention, the calculation module 63 may specifically be configured to process the first feature vector sequence with a multi-layer neural network and calculate the action completion confidence of each video frame. Specifically, the calculation module 63 may incorporate the context of each video frame into the confidence calculation to improve accuracy. Accordingly, the processing unit 631 may process the first feature vector sequence with a recurrent neural network to generate a second feature vector sequence containing the context of each video frame, and the calculation unit 632 then calculates the action completion confidence of each video frame from the second feature vector sequence generated by the processing unit 631.
Further, in the embodiment of the present invention, the whole video of a preset time period (for example, one day or several hours) may be divided into a sequence of video frames, which is then input into the pre-trained action completion counting model for counting. Therefore, the counting apparatus provided by the embodiment of the present invention may further include a model training module 71. The model training module 71 may be configured to acquire training video data, the training video data including the feature vector of each of a plurality of training video frames and an action completion identifier labeled for each training video frame, the identifier indicating whether the action in that frame has been completed; to calculate the action completion confidence of each training video frame according to a third feature vector sequence formed by the feature vectors corresponding to the training video frames; and to return training results according to the action completion identifier and the action completion confidence of each training video frame.
Specifically, the model training module 71 may include a first returning unit 711, which may be configured to return a correct training result when the action completion confidence of a training video frame is higher than the preset threshold and its action completion identifier identifies the action in the frame as completed.
Further, the model training module 71 may also include a second returning unit 712, which may be configured to return a correct training result when the action completion confidence of a training video frame is not higher than the preset threshold and its action completion identifier identifies the action in the frame as not completed.
In addition, the counting apparatus provided by the embodiment of the present invention may further include a test module 72. The test module 72 may be configured to acquire, among the training video frames whose action completion identifiers identify the action as completed, the proportion of training video frames whose action completion confidence is higher than the preset threshold, and to end the training process when this proportion is higher than a preset ratio.
The functions of the modules in the embodiments of the present invention are described in detail in the above method embodiments, and are not described herein again.
According to the counting apparatus provided by the embodiment of the present invention, the video frames are analyzed and the probability of action completion is derived from parameters such as the distance between the target objects and their confidences, so that whether the action has been completed can be judged and completed actions counted without manually setting thresholds, reducing or avoiding false counts and improving counting accuracy.
Example six
The internal functions and structure of the counting device have been described above, and the device can be implemented as an electronic device. Fig. 8 is a schematic structural diagram of an embodiment of an electronic device provided in the present invention. As shown in fig. 8, the electronic device includes a memory 81 and a processor 82.
The memory 81 stores programs. In addition to the above-described programs, the memory 81 may also be configured to store other various data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and so forth.
The memory 81 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
A processor 82, coupled to the memory 81, for executing programs stored in the memory 81 for:
performing target detection processing on video frames in a video to generate a feature vector, the feature vector at least comprising: distance information between a first target object and a second target object, a first confidence for the first target object and a second confidence for the second target object;
calculating an action completion confidence for each video frame according to a first feature vector sequence formed by the feature vectors corresponding to the video frames, the action completion confidence being the probability that the first target object in the video frame has completed an action on the second target object;
and counting according to the video frames whose action completion confidence is higher than a preset threshold.
Further, as shown in fig. 8, the electronic device may further include: communication components 83, power components 84, audio components 85, a display 86, and the like. Only some of the components are schematically shown in fig. 8, and the electronic device is not meant to include only the components shown in fig. 8.
The communication component 83 is configured to facilitate wired or wireless communication between the electronic device and other devices. The electronic device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 83 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 83 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
A power supply component 84 provides power to the various components of the electronic device. The power components 84 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for an electronic device.
The audio component 85 is configured to output and/or input audio signals. For example, the audio component 85 includes a Microphone (MIC) configured to receive external audio signals when the electronic device is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 81 or transmitted via the communication component 83. In some embodiments, audio assembly 85 also includes a speaker for outputting audio signals.
The display 86 includes a screen, which may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (18)

1. A garment counting method, comprising:
processing a video frame in the video, and acquiring distance information between an operator and the garment, a first confidence coefficient of the operator and a second confidence coefficient of the garment;
inputting the distance information between the operator and the clothes, the first confidence coefficient of the operator and the second confidence coefficient of the clothes into a clothes counting model, and calculating the packing completion confidence coefficient of each video frame, wherein the packing completion confidence coefficient is the probability of the operator completing the packing action on the clothes in the video frames;
and counting the clothing packing according to the video frames with the packing completion confidence higher than a preset threshold.
2. The clothing counting method according to claim 1, wherein the step of inputting the distance information of the operator from the clothing, the first confidence level of the operator and the second confidence level of the clothing into a clothing counting model to calculate the packing completion confidence level of each video frame comprises the steps of:
and calculating the packing completion confidence of each video frame by using a multi-layer neural network.
3. The garment counting method according to claim 2, wherein the calculating of the packing completion confidence of each video frame using a multi-layer neural network comprises:
acquiring the context relationship of each video frame by using a recurrent neural network;
and calculating the packing completion confidence of each video frame according to the context of each video frame.
4. A counting method, comprising:
carrying out target detection processing on video frames in a video to generate a feature vector, wherein the feature vector at least comprises: the distance information of the first target object and the second target object, the first confidence of the first target object and the second confidence of the second target object;
calculating an action completion confidence coefficient of each video frame according to a first feature vector sequence formed by the feature vectors corresponding to the video frames, wherein the action completion confidence coefficient is the probability that a first target object in the video frames completes the action on a second target object;
and counting according to the video frames with the action completion confidence higher than a preset threshold.
5. The counting method according to claim 4, wherein the calculating the motion completion confidence of each video frame according to the first feature vector sequence composed of the feature vectors corresponding to the plurality of video frames comprises:
and processing the first characteristic vector sequence by using a multilayer neural network, and calculating the action completion confidence of each video frame.
6. The counting method of claim 5, wherein the processing the first feature vector sequence using a multi-layer neural network to calculate the motion completion confidence for each video frame comprises:
processing the first feature vector sequence by using a recurrent neural network to generate a second feature vector sequence containing the context of each video frame;
and calculating the action completion confidence of each video frame according to the second feature vector sequence.
7. The counting method according to any one of claims 4 to 6, wherein before the performing the object detection process on the video frames in the video, the method further comprises:
acquiring training video data, wherein the training video data comprises a feature vector of each training video frame in a plurality of training video frames and an action completion identifier labeled for each training video frame, and the action completion identifier is used for identifying whether an action in the training video frame is completed;
calculating the action completion confidence of each training video frame according to a third feature vector sequence formed by the feature vectors corresponding to the training video frames;
and returning a training result according to the action completion identification and the action completion confidence for each training video frame.
8. The counting method according to claim 7, wherein the returning a training result according to the action completion identifier and the action completion confidence for each training video frame comprises:
and when the action completion confidence of the training video frame is higher than the preset threshold and the action completion identifier of the training video frame is used for identifying the completion of the action in the training video frame, returning a correct training result.
9. The counting method according to claim 7, wherein the returning a training result according to the action completion identifier and the action completion confidence for each training video frame comprises:
and when the action completion confidence of the training video frame is not higher than the preset threshold and the action completion identifier of the training video frame is used for identifying that the action in the training video frame is not completed, returning a correct training result.
10. The counting method according to claim 8, further comprising, after returning a correct training result when the action completion confidence of the training video frame is higher than the preset threshold and the action completion identifier of the training video frame is used to identify the action completion in the training video frame:
acquiring the occupation ratio of the training video frames with the action completion confidence coefficient higher than the preset threshold value from the training video frames with the action completion identifiers used for identifying the completion of the action in the training video frames;
and when the occupation ratio is higher than a preset ratio, ending the training process.
11. A counting device, comprising:
a target detection module, configured to perform target detection processing on a video frame in a video to generate a feature vector, where the feature vector at least includes: the distance information of the first target object and the second target object, the first confidence of the first target object and the second confidence of the second target object;
the calculation module is used for calculating the action completion confidence of each video frame according to a first feature vector sequence formed by the feature vectors corresponding to the plurality of video frames, wherein the action completion confidence is the probability that a first target object in the video frames completes the action on a second target object;
and the counting module is used for counting according to the video frames with the action completion confidence coefficient higher than a preset threshold value.
12. The counting apparatus according to claim 11, wherein the computing module is specifically configured to use a multi-layer neural network to process the first feature vector sequence and compute the motion completion confidence of each video frame.
13. The counting device of claim 12, wherein the computing module comprises:
the processing unit is used for processing the first feature vector sequence by using a recurrent neural network to generate a second feature vector sequence containing the context relation of each video frame;
and the calculating unit is used for calculating the action completion confidence of each video frame according to the second characteristic vector sequence.
14. The counting device according to any one of claims 11 to 13, further comprising:
the model training module is used for acquiring training video data, wherein the training video data comprise a feature vector of each training video frame in a plurality of training video frames and an action completion identifier labeled for each training video frame, and the action completion identifier is used for identifying whether actions in the training video frames are completed or not; calculating the action completion confidence of each training video frame according to a third feature vector sequence formed by the feature vectors corresponding to the training video frames; and returning a training result according to the action completion identification and the action completion confidence for each training video frame.
15. The counting device of claim 14, wherein the model training module comprises:
and the first returning unit is used for returning a correct training result under the condition that the action completion confidence of the training video frame is higher than the preset threshold and the action completion identifier of the training video frame is used for identifying the completion of the action in the training video frame.
16. The counting device of claim 14, wherein the model training module further comprises:
and the second returning unit is used for returning a correct training result under the condition that the action completion confidence of the training video frame is not higher than the preset threshold and the action completion identifier of the training video frame is used for identifying that the action in the training video frame is not completed.
17. The counting device of claim 15, further comprising:
the testing module is used for acquiring the proportion of the training video frames with the action completion confidence coefficient higher than the preset threshold value in the training video frames with the action completion identifiers used for identifying the completion of the action in the training video frames; in case the occupation ratio is higher than a preset ratio, the training process is ended.
18. An electronic device, comprising:
a memory for storing a program;
a processor for executing the program stored in the memory for:
carrying out target detection processing on video frames in a video to generate a feature vector, wherein the feature vector at least comprises: the distance information of the first target object and the second target object, the first confidence of the first target object and the second confidence of the second target object;
calculating an action completion confidence coefficient of each video frame according to a first feature vector sequence formed by the feature vectors corresponding to the video frames, wherein the action completion confidence coefficient is the probability that a first target object in the video frames completes the action on a second target object;
and counting according to the video frames with the action completion confidence higher than a preset threshold.
CN201910111446.4A 2019-02-12 2019-02-12 Garment counting method, garment counting method and device and electronic equipment Active CN111553180B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201910111446.4A CN111553180B (en) 2019-02-12 2019-02-12 Garment counting method, garment counting method and device and electronic equipment
TW108143618A TW202030642A (en) 2019-02-12 2019-11-29 Method for counting items of clothing, counting method and apparatus, and electronic device
PCT/CN2020/074214 WO2020164401A1 (en) 2019-02-12 2020-02-03 Method for counting items of clothing, counting method and apparatus, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910111446.4A CN111553180B (en) 2019-02-12 2019-02-12 Garment counting method, garment counting method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111553180A (en) 2020-08-18
CN111553180B (en) 2023-08-29

Family

ID=72005429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910111446.4A Active CN111553180B (en) 2019-02-12 2019-02-12 Garment counting method, garment counting method and device and electronic equipment

Country Status (3)

Country Link
CN (1) CN111553180B (en)
TW (1) TW202030642A (en)
WO (1) WO2020164401A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101017539A (en) * 2006-02-06 2007-08-15 Msc有限公司 System and method for improved production using radio frequency identification devices
US20170242148A1 (en) * 2016-02-22 2017-08-24 Rapiscan Systems, Inc. Systems and Methods for Detecting Threats and Contraband in Cargo
CN107944382A (en) * 2017-11-20 2018-04-20 北京旷视科技有限公司 Method for tracking target, device and electronic equipment
CN207529154U (en) * 2017-10-27 2018-06-22 成都华西天然药物有限公司 A kind of clothes of electromechanical integration are regulated the traffic device
CN108241844A (en) * 2016-12-27 2018-07-03 北京文安智能技术股份有限公司 A kind of public traffice passenger flow statistical method, device and electronic equipment
CN108470255A (en) * 2018-04-12 2018-08-31 上海小蚁科技有限公司 Workload Account method and device, storage medium, computing device
CN108986064A (en) * 2017-05-31 2018-12-11 杭州海康威视数字技术股份有限公司 A kind of people flow rate statistical method, equipment and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491759A (en) * 2018-02-10 2018-09-04 合肥迪宏自动化有限公司 A kind of process detection device and its process detection method based on deep learning

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101017539A (en) * 2006-02-06 2007-08-15 Msc有限公司 System and method for improved production using radio frequency identification devices
US20170242148A1 (en) * 2016-02-22 2017-08-24 Rapiscan Systems, Inc. Systems and Methods for Detecting Threats and Contraband in Cargo
CN109074889A (en) * 2016-02-22 2018-12-21 拉皮斯坎系统股份有限公司 System and method for detecting dangerous material and contraband in cargo
CN108241844A (en) * 2016-12-27 2018-07-03 北京文安智能技术股份有限公司 A kind of public traffice passenger flow statistical method, device and electronic equipment
CN108986064A (en) * 2017-05-31 2018-12-11 杭州海康威视数字技术股份有限公司 A kind of people flow rate statistical method, equipment and system
CN207529154U (en) * 2017-10-27 2018-06-22 成都华西天然药物有限公司 A kind of clothes of electromechanical integration are regulated the traffic device
CN107944382A (en) * 2017-11-20 2018-04-20 北京旷视科技有限公司 Method for tracking target, device and electronic equipment
CN108470255A (en) * 2018-04-12 2018-08-31 上海小蚁科技有限公司 Workload Account method and device, storage medium, computing device

Also Published As

Publication number Publication date
WO2020164401A1 (en) 2020-08-20
CN111553180B (en) 2023-08-29
TW202030642A (en) 2020-08-16

Similar Documents

Publication Publication Date Title
CN111078446A (en) Fault information acquisition method and device, electronic equipment and storage medium
US20230196516A1 (en) Video denoising method and mobile terminal, and storage medium
US20200241483A1 (en) Method and Device for Managing and Controlling Application, Medium, and Electronic Device
CN106384348B (en) The method for detecting abnormality and device of monitoring image
CN107221341A (en) A kind of tone testing method and device
CN106203306A (en) The Forecasting Methodology at age, device and terminal
CN112530205A (en) Airport parking apron airplane state detection method and device
CN114760339A (en) Fault prediction method, apparatus, device, medium, and product
CN110309815B (en) Method and system for processing face recognition data
CN112948704A (en) Model training method and device for information recommendation, electronic equipment and medium
CN114022827A (en) Production line operation management and video processing method, device, equipment and storage medium
CN111047049B (en) Method, device and medium for processing multimedia data based on machine learning model
CN111553180A (en) Clothing counting method, clothing counting device and electronic equipment
US20200311401A1 (en) Analyzing apparatus, control method, and program
CN111913850A (en) Data anomaly detection method, device, equipment and storage medium
CN115409094A (en) Equipment fault prediction method, device and storage medium
CN105956595A (en) Image feature extraction method and system
CN112580379B (en) Data processing system and method, electronic device, and computer-readable storage medium
CN109241729B (en) Application program detection and processing method and device, terminal device and electronic device
CN115116130A (en) Call action recognition method, device, equipment and storage medium
CN112948763B (en) Piece quantity prediction method and device, electronic equipment and storage medium
CN106341291A (en) Test method of network connection stability and apparatus thereof
TW202032417A (en) Method for determining clothing quality inspection status, method and apparatus for determining action status, and electronic device
CN111913942A (en) Data quality detection method and device
CN115665369B (en) Video processing method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40035210

Country of ref document: HK

GR01 Patent grant
GR01 Patent grant