WO2020164401A1 - Method for counting items of clothing, counting method and apparatus, and electronic device - Google Patents

Method for counting items of clothing, counting method and apparatus, and electronic device

Info

Publication number
WO2020164401A1
Authority
WO
WIPO (PCT)
Prior art keywords
action
video frame
confidence
completion
training
Prior art date
Application number
PCT/CN2020/074214
Other languages
French (fr)
Chinese (zh)
Inventor
张民英
神克乐
龙一民
徐博文
吴剑
胡露露
陈新
尹宁
刘志敏
胡旭
袁炜
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司 filed Critical 阿里巴巴集团控股有限公司
Publication of WO2020164401A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Definitions

  • the present invention relates to the field of computer technology, in particular to a clothing counting method, counting method and device, and electronic equipment.
  • for non-standard small factories with a low degree of standardization and automation, the production process is usually linked in a low-intrusion manner.
  • cameras and other equipment are set to collect various data in the production process, so as to correlate each link of the production process to ensure that orders can be completed on schedule and improve the production efficiency of the factory.
  • the video of the worker's operation is generally collected through a camera, and the video frames are analyzed with a target detection algorithm to identify the target objects (the worker and the operated object), so as to confirm the start and end of the worker's operation and thereby count the worker's workload.
  • various thresholds need to be set manually, for example, the counting interval threshold between successive work units, or the distance threshold between target objects.
  • the threshold is usually set manually based on experience, so the manually set threshold can only be roughly reasonable; it cannot be applied universally to the specific scenarios of various production lines, and the accuracy of the counting results cannot be guaranteed.
  • the embodiments of the present invention provide a clothing counting method, a counting method and apparatus, and an electronic device, so as to overcome the defect in the prior art that counting with manually set thresholds cannot guarantee the accuracy of the counting results.
  • an embodiment of the present invention provides a clothing counting method, including:
  • the distance information between the operator and the clothing, the first confidence of the operator, and the second confidence of the clothing are input into a clothing counting model, and the packing completion confidence of each video frame is calculated;
  • the packing completion confidence is the probability that the operator in the video frame has completed the packing action on the clothing;
  • the garment packing count is performed according to the video frames whose packing completion confidence is higher than a preset threshold.
  • the embodiment of the present invention also provides a counting method, including:
  • the action completion confidence of each video frame is calculated.
  • the action completion confidence is the probability that the first target object in the video frame has completed an action on the second target object;
  • the embodiment of the present invention also provides a counting device, including:
  • the target detection module is used to perform target detection processing on the video frames in the video to generate a feature vector, the feature vector including at least: the distance information between the first target object and the second target object, the first confidence of the first target object, and the second confidence of the second target object;
  • the calculation module is configured to calculate the action completion confidence of each video frame according to a first feature vector sequence composed of the feature vectors corresponding to multiple video frames, the action completion confidence being the probability that the first target object in the video frame has completed an action on the second target object;
  • the counting module is used to count according to the video frames whose action completion confidence is higher than a preset threshold.
  • the embodiment of the present invention also provides an electronic device, including:
  • Memory used to store programs
  • the processor is configured to run the program stored in the memory for:
  • the action completion confidence of each video frame is calculated.
  • the action completion confidence is the probability that the first target object in the video frame has completed an action on the second target object;
  • the clothing counting method, counting method and apparatus, and electronic device provided by the embodiments of the present invention analyze video frames and obtain the probability that an action has been completed from parameters such as the distance between target objects and their confidences, so as to determine whether the action is completed; completed actions can thus be counted without manually setting a threshold, which reduces or avoids miscounting and improves counting accuracy.
  • Figure 1 is a system block diagram of a business system provided by an embodiment of the present invention.
  • FIG. 3 is a flowchart of another embodiment of the counting method provided by the present invention.
  • FIG. 4 is a schematic structural diagram of an action completion counting model provided by an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of an embodiment of the counting device provided by the present invention.
  • FIG. 7 is a schematic structural diagram of another embodiment of the counting device provided by the present invention.
  • FIG. 8 is a schematic structural diagram of an embodiment of an electronic device provided by the present invention.
  • the video of the worker's operation is generally collected by a camera, and the video frames are analyzed with a target detection algorithm to identify the target objects (the worker and the operated object), so as to confirm the start and end of the worker's operation and thereby count the worker's workload.
  • these thresholds are usually set manually based on experience, so they can only be roughly reasonable; they cannot be applied universally to specific production scenarios, and the accuracy of the counting results cannot be guaranteed. For example, suppose the average time for a worker to pack one piece of clothing is 15 seconds, with a typical range of 10-20 seconds; the counting interval threshold would then usually be set manually to 10 seconds, and the second packing operation is counted only if the time interval between two counting operations exceeds this threshold. If a more efficient worker needs only 8 seconds to pack a piece of clothing (while the manually set threshold remains 10 seconds), the count for the second packed piece is cancelled, which greatly reduces the accuracy of the algorithm.
  • this application proposes a counting scheme whose main principle is: by analyzing video frames and using parameters such as the distance between target objects and their confidences, the action completion confidence of each video frame, that is, the probability that the target object has completed the action, is obtained; whether the action is completed is determined from this confidence, which in turn decides whether the completed action is counted, without manually setting a threshold, thereby reducing or avoiding miscounting and improving counting accuracy.
  • Fig. 1 is a system block diagram of a business system provided by an embodiment of the present invention.
  • the structure shown in Fig. 1 is only one example of a business system to which the technical solution of the present invention can be applied.
  • the business system includes a counting device.
  • the device includes: a target detection module, a calculation module, and a counting module, which can be used to execute the processing flow shown in Figure 2, Figure 3, and Figure 5 below.
  • in this business system, the video is first divided into a sequence of video frames; target detection processing is then performed on each video frame in the sequence to generate a feature vector, the feature vector including at least: the distance information between the first target object (the operator) and the second target object (the operated object), the first confidence of the first target object, and the second confidence of the second target object; a first feature vector sequence is composed of the feature vectors corresponding to multiple video frames, and the action completion confidence of each video frame is calculated from the first feature vector sequence, that is, the probability that the first target object in the video frame has completed the action on the second target object; finally, the video frames whose action completion confidence is higher than a preset threshold are counted. Counting can thus be performed without manually setting a threshold, which reduces or avoids miscounting and improves counting accuracy.
  • FIG. 2 is a flow chart of an embodiment of the counting method provided by the present invention.
  • the method can be executed by the above-mentioned business system, by various terminal or server devices with data processing capabilities, or by an apparatus or chip integrated in such devices.
  • the counting method includes the following steps:
  • S201 Perform target detection processing on video frames in the video to generate a feature vector.
  • the target detection process is performed on each video frame, the target object in the video frame is obtained, and the feature vector of the video frame is generated.
  • the feature vector includes at least: the distance information between the first target object and the second target object, the first confidence of the first target object, and the second confidence of the second target object.
  • taking the scene of a worker packing clothes in a factory as an example, the first target object is the worker (the operator) and the second target object is the clothes (the clothing); the first confidence of the first target object is the probability that the worker packing clothes is recognized in the current video frame, and the second confidence of the second target object is the probability that packed clothes are recognized in the current video frame.
  • S202 Calculate the action completion confidence of each video frame according to a first feature vector sequence composed of feature vectors corresponding to multiple video frames.
  • the confidence of completion of the action of the video frame is the probability that the first target object in the video frame completes the action on the second target object.
  • the confidence of completion of the action is the probability of the worker completing the packing action.
  • S203 Count the video frames with the confidence of completion of the action being higher than a preset threshold.
  • a probability threshold may be preset, and when the probability of the first target object completing the action on the second target object in a certain video frame is higher than the preset threshold, the count is increased by one. In other words, if it is calculated that there are N video frames satisfying the above conditions in a video, the number of times the first target object completes the action on the second target object is N.
  • the counting method provided by the embodiment of the present invention analyzes video frames and obtains the probability that an action has been completed from parameters such as the distance between target objects and their confidences, so as to determine whether the action is completed; completed actions can thus be counted without manually setting a threshold, which reduces or avoids miscounting and improves counting accuracy.
  • Fig. 3 is a flowchart of another embodiment of the counting method provided by the present invention. As shown in FIG. 3, based on the embodiment shown in FIG. 2, the counting method provided in this embodiment may further include the following steps:
  • the entire video of a preset time period (for example, one day, or several hours) can be divided into a sequence of video frames, and the sequence of video frames is then input into a pre-trained action completion counting model (the clothing counting model) for counting.
  • S302 Perform target detection processing on each video frame in the video frame sequence to generate a feature vector.
  • Fig. 4 is a schematic structural diagram of an action completion counting model provided by an embodiment of the present invention. As shown in Figure 4, after the video frame sequence is input into the action completion counting model, the model first performs target detection processing on the video frame sequence to generate the feature vector of each video frame, i.e. feature vector 1, feature vector 2, ..., feature vector n in Figure 4.
  • S303 Use a recurrent neural network to process a first feature vector sequence composed of feature vectors corresponding to multiple video frames, and generate a second feature vector sequence that includes the context of each video frame.
  • S304 Calculate the action completion confidence of each video frame according to the second feature vector sequence.
  • the contextual content of the video frames may be taken into account in the confidence calculation to improve accuracy. Therefore, a recurrent neural network can be used to process the first feature vector sequence to generate a second feature vector sequence containing the context of each video frame. The second feature vector sequence is then input into a confidence calculation module to calculate the action completion confidence of each video frame. Specifically, the confidence calculation module may be obtained by training a multi-layer perceptron on the training data during the model training phase.
  • the counting method provided by the embodiments of the present invention analyzes video frames and obtains the probability that an action has been completed from parameters such as the distance between target objects and their confidences, combined with the context of each video frame, so as to determine more accurately whether the action is completed; completed actions can thus be counted without manually setting a threshold, which reduces or avoids miscounting and improves counting accuracy.
  • FIG. 5 is a flowchart of another embodiment of the counting method provided by the present invention. As shown in FIG. 5, based on the embodiment shown in FIG. 2 or FIG. 3, the counting method provided in the embodiment of the present invention may further include the following steps:
  • before the above-mentioned action completion counting model is used for counting, the model may be trained on acquired training video data.
  • the training video data may include the feature vector of each training video frame in the multiple training video frames and the action completion identifier (package completion identifier) annotated for each training video frame.
  • the action completion identifier is used to identify whether the action in the training video frame has been completed.
  • the action completion identifier annotated for a training video frame indicates whether the action is completed in that frame; for example, in a clothes packing scene, if packing is completed the action completion identifier can be recorded as 1 (the video frame is used to add one to the count), and if packing is not completed the action completion identifier is recorded as 0 (the video frame cannot be used to add one to the count).
  • S502 Calculate the action completion confidence of each training video frame according to a third feature vector sequence composed of feature vectors corresponding to multiple training video frames.
  • the process of calculating the action completion confidence of the training video frames from the third feature vector sequence is the same as the process used when the above model is applied, in which the action completion confidence of each video frame is calculated from the first feature vector sequence and then from the second feature vector sequence.
  • for each training video frame, whether the training result is correct is determined according to its action completion identifier and its action completion confidence.
  • when the action completion confidence calculated by the action completion counting model is higher than the preset threshold, it means that, according to the model, the video frame can be used to add one to the count.
  • if, at this time, the action completion identifier of the frame is 1, the training result is correct.
  • when the action completion confidence calculated by the action completion counting model is not higher than the preset threshold, it means that, according to the model, the video frame cannot be used to add one to the count; if its action completion identifier is 0 at this time, the training result is also correct.
  • the counting method provided in the embodiment of the present invention may further include:
  • S504 Acquire, among the training video frames whose action completion identifier indicates that the action is completed, the proportion of training video frames whose action completion confidence is higher than the preset threshold.
  • whether the model can be used is determined by the accuracy of the output of the action completion counting model.
  • for example, if 1000 video frames with the action completion identifier of 1 are input to the model, and for 700 of these 1000 frames the model outputs an action completion confidence higher than the preset threshold, the proportion is 70%. If this proportion is higher than a preset ratio, the model is considered trained, so the training process can be ended and the trained action completion counting model can be used to count from the input video.
  • the counting method provided by the embodiment of the present invention trains the action completion counting model by acquiring training video data, and returns the training result according to the action completion identifier and the action completion confidence of each training video frame, thereby improving the accuracy of counting.
  • Fig. 6 is a schematic structural diagram of an embodiment of a counting device provided by the present invention, which can be used to perform the method steps shown in Fig. 2.
  • the counting device may include: a target detection module 62, a calculation module 63 and a counting module 64.
  • the target detection module 62 is configured to perform target detection processing on the video frames in the video to generate a feature vector, which includes at least: the distance information between the first target object and the second target object, the first confidence of the first target object, and the second confidence of the second target object;
  • the calculation module 63 is used to calculate the action completion confidence of each video frame according to the first feature vector sequence composed of the feature vectors corresponding to multiple video frames, the action completion confidence being the probability that the first target object in the video frame has completed the action on the second target object;
  • the counting module 64 is configured to count the video frames whose action completion confidence is higher than the preset threshold.
  • the target detection module 62 performs target detection processing on each video frame in the video, obtains the target object in the video frame, and generates a feature vector of the video frame.
  • the calculation module 63 calculates the action completion confidence of each video frame according to the first feature vector sequence composed of each feature vector generated by the target detection module 62. When the confidence of completion of a certain video frame is higher than the preset threshold, the count in the counting module 64 is increased by one.
  • the counting device provided by the embodiment of the present invention analyzes video frames and obtains the probability that an action has been completed from parameters such as the distance between target objects and their confidences, so as to determine whether the action is completed; completed actions can thus be counted without manually setting a threshold, which reduces or avoids miscounting and improves counting accuracy.
  • FIG. 7 is a schematic structural diagram of another embodiment of the counting device provided by the present invention, which can be used to perform the method steps shown in FIG. 3 and FIG. 5.
  • the calculation module 63 may include: a processing unit 631 and a calculation unit 632.
  • the processing unit 631 can be used to process the first feature vector sequence with a recurrent neural network to generate a second feature vector sequence containing the context of each video frame; the calculation unit 632 can be used to calculate the action completion confidence of each video frame according to the second feature vector sequence.
  • the calculation module 63 may be specifically configured to use a multilayer neural network to process the first feature vector sequence, and calculate the action completion confidence of each video frame.
  • the contextual content of the video frame may be combined to perform a confidence correlation calculation to improve accuracy. Therefore, the processing unit 631 may process the first feature vector sequence using a recurrent neural network to generate a second feature vector sequence containing the context relationship of each video frame. Then, the calculation unit 632 calculates the action completion confidence of each video frame according to the second feature vector sequence generated by the processing unit 631.
  • the entire video of a preset time period (for example, one day, or several hours) can be divided into a sequence of video frames, and the sequence of video frames is then input into the pre-trained action completion counting model for counting. Therefore, the counting device provided in the embodiment of the present invention may further include: a model training module 71.
  • the model training module 71 may be used to obtain training video data.
  • the training video data includes a feature vector of each training video frame among multiple training video frames and an action completion identifier annotated for each training video frame, the action completion identifier being used to identify whether the action in the training video frame is completed; the action completion confidence of each training video frame is calculated according to a third feature vector sequence composed of the feature vectors corresponding to the multiple training video frames; and, for each training video frame, a training result is returned according to its action completion identifier and its action completion confidence.
  • the model training module 71 may include: a first returning unit 711.
  • the first returning unit 711 can be used to return a correct training result when the action completion confidence of a training video frame is higher than the preset threshold and the action completion identifier of that training video frame indicates that the action in the frame is completed.
  • the model training module 71 may further include: a second returning unit 712.
  • the second returning unit 712 can be used to return a correct training result when the action completion confidence of a training video frame is not higher than the preset threshold and the action completion identifier of that training video frame indicates that the action in the frame is not completed.
  • the counting device provided by the embodiment of the present invention may further include a test module 72.
  • the test module 72 can be used to acquire, among the training video frames whose action completion identifier indicates that the action is completed, the proportion of training video frames whose action completion confidence is higher than the preset threshold; if this proportion is higher than a preset ratio, the training process ends.
  • the counting device provided by the embodiment of the present invention analyzes video frames and obtains the probability that an action has been completed from parameters such as the distance between target objects and their confidences, so as to determine whether the action is completed; completed actions can thus be counted without manually setting a threshold, which reduces or avoids miscounting and improves counting accuracy.
  • FIG. 8 is a schematic structural diagram of an embodiment of an electronic device provided by the present invention. As shown in FIG. 8, the electronic device includes a memory 81 and a processor 82.
  • the memory 81 is used to store programs. In addition to the above-mentioned programs, the memory 81 may also be configured to store various other data to support operations on the electronic device. Examples of these data include instructions for any application or method operating on the electronic device, contact data, phone book data, messages, pictures, videos, etc.
  • the memory 81 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
  • the processor 82 is coupled with the memory 81, and executes the program stored in the memory 81 for:
  • the feature vector includes at least: the distance information between the first target object and the second target object, the first confidence of the first target object, and the second confidence of the second target object;
  • the action completion confidence is the probability that the first target object in the video frame completes the action on the second target object;
  • the electronic device may further include: a communication component 83, a power supply component 84, an audio component 85, a display 86 and other components. Only some components are schematically shown in FIG. 8, which does not mean that the electronic device only includes the components shown in FIG. 8.
  • the communication component 83 is configured to facilitate wired or wireless communication between the electronic device and other devices.
  • Electronic devices can access wireless networks based on communication standards, such as WiFi, 2G or 3G, or a combination of them.
  • the communication component 83 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 83 further includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • the power component 84 provides power for various components of the electronic device.
  • the power supply component 84 may include a power management system, one or more power supplies, and other components associated with the generation, management, and distribution of power for electronic devices.
  • the audio component 85 is configured to output and/or input audio signals.
  • the audio component 85 includes a microphone (MIC), and the microphone is configured to receive external audio signals when the electronic device is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode.
  • the received audio signal may be further stored in the memory 81 or transmitted via the communication component 83.
  • the audio component 85 also includes a speaker for outputting audio signals.
  • the display 86 includes a screen, and the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touch, sliding, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure related to the touch or slide operation.
  • a person of ordinary skill in the art can understand that all or part of the steps in the foregoing method embodiments can be implemented by a program instructing relevant hardware.
  • the aforementioned program can be stored in a computer readable storage medium.
  • when the program is executed, the steps of the foregoing method embodiments are performed; the foregoing storage medium includes media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.

Abstract

Disclosed are a method for counting items of clothing, a counting method and apparatus, and an electronic device. The method comprises: segmenting a video into a video frame sequence; carrying out target detection processing on each video frame in the video frame sequence, and generating feature vectors (S201); calculating an action completion confidence level of each video frame according to a first feature vector sequence composed of the feature vectors corresponding to a plurality of video frames (S202), wherein the action completion confidence level is the probability of a first target object in the video frame completing an action on a second target object; and carrying out counting according to video frames of which the action completion confidence levels are higher than a preset threshold (S203). According to the method, by means of analyzing the video frames, an action completion probability is acquired according to parameters, such as the distance of the target object and the confidence level, thus whether the action is completed is determined; and the number of completed actions can be counted without manually setting a threshold, and miscounting can be reduced or avoided, thus improving counting accuracy.

Description

Clothing counting method, counting method and apparatus, and electronic device
This application claims the priority of the Chinese patent application No. 201910111446.4, filed on February 12, 2019 and entitled "Clothing counting method, counting method and apparatus, and electronic device", the entire content of which is incorporated into this application by reference.
Technical field
The present invention relates to the field of computer technology, and in particular to a clothing counting method, a counting method and apparatus, and an electronic device.
Background
In order to promote the digitalization of industrial production, for non-standard small factories with a low degree of standardization and automation, the production process is usually linked in a low-intrusion manner. For example, without changing the workers' production habits, cameras and other equipment are set up to collect various kinds of data during the production process, so as to correlate the individual links of the production process, ensure that orders can be completed on schedule, and improve the production efficiency of the factory.
For certain links in the production process that involve workload counting, for example the bagging scene in a garment factory, the video of the worker's operation is generally collected by a camera, and the video frames are analyzed with a target detection algorithm to identify the target objects (the worker and the operated object), so as to confirm the start and end of the worker's operation and thereby count the worker's workload. In this process, although the completion of each operation of the worker is determined by the target detection algorithm, various thresholds need to be set manually to prevent unreasonable counting caused by misjudgment, for example the counting interval threshold between successive work units, or the distance threshold between target objects.
In the process of implementing the present invention, the inventors found that the prior art has at least the following problem: in the prior art, the thresholds are usually set manually based on experience, so the manually set thresholds can only be roughly reasonable; they cannot be applied universally to the specific scenarios of various production lines, and the accuracy of the counting results cannot be guaranteed.
Summary of the invention
The embodiments of the present invention provide a clothing counting method, a counting method and apparatus, and an electronic device, so as to overcome the defect in the prior art that counting with manually set thresholds cannot guarantee the accuracy of the counting results.
To achieve the foregoing objective, an embodiment of the present invention provides a clothing counting method, including:
processing video frames in a video to obtain distance information between an operator and clothing, a first confidence of the operator, and a second confidence of the clothing;
inputting the distance information between the operator and the clothing, the first confidence of the operator, and the second confidence of the clothing into a clothing counting model, and calculating a packing completion confidence of each video frame, the packing completion confidence being the probability that the operator in the video frame has completed the packing action on the clothing;
counting clothing packing according to the video frames whose packing completion confidence is higher than a preset threshold.
An embodiment of the present invention also provides a counting method, including:
performing target detection processing on video frames in a video to generate a feature vector, the feature vector including at least: distance information between a first target object and a second target object, a first confidence of the first target object, and a second confidence of the second target object;
calculating an action completion confidence of each video frame according to a first feature vector sequence composed of the feature vectors corresponding to multiple video frames, the action completion confidence being the probability that the first target object in the video frame has completed an action on the second target object;
counting according to the video frames whose action completion confidence is higher than a preset threshold.
An embodiment of the present invention also provides a counting apparatus, including:
a target detection module, configured to perform target detection processing on video frames in a video to generate a feature vector, the feature vector including at least: distance information between a first target object and a second target object, a first confidence of the first target object, and a second confidence of the second target object;
a calculation module, configured to calculate an action completion confidence of each video frame according to a first feature vector sequence composed of the feature vectors corresponding to multiple video frames, the action completion confidence being the probability that the first target object in the video frame has completed an action on the second target object;
a counting module, configured to count according to the video frames whose action completion confidence is higher than a preset threshold.
An embodiment of the present invention also provides an electronic device, including:
a memory, configured to store a program;
a processor, configured to run the program stored in the memory, so as to:
perform target detection processing on video frames in a video to generate a feature vector, the feature vector including at least: distance information between a first target object and a second target object, a first confidence of the first target object, and a second confidence of the second target object;
calculate an action completion confidence of each video frame according to a first feature vector sequence composed of the feature vectors corresponding to multiple video frames, the action completion confidence being the probability that the first target object in the video frame has completed an action on the second target object;
count according to the video frames whose action completion confidence is higher than a preset threshold.
The clothing counting method, counting method and apparatus, and electronic device provided by the embodiments of the present invention analyze video frames and obtain the probability that an action has been completed from parameters such as the distance between target objects and their confidences, so as to determine whether the action is completed; completed actions can thus be counted without manually setting a threshold, which reduces or avoids miscounting and improves counting accuracy.
The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented in accordance with the content of the description, and in order to make the above and other objectives, features and advantages of the present invention more apparent and comprehensible, specific embodiments of the present invention are set forth below.
Description of the drawings
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered as limiting the application. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
Figure 1 is a system block diagram of a business system provided by an embodiment of the present invention;
Figure 2 is a flowchart of an embodiment of the counting method provided by the present invention;
Figure 3 is a flowchart of another embodiment of the counting method provided by the present invention;
Figure 4 is a schematic structural diagram of an action completion counting model provided by an embodiment of the present invention;
Figure 5 is a flowchart of yet another embodiment of the counting method provided by the present invention;
Figure 6 is a schematic structural diagram of an embodiment of the counting apparatus provided by the present invention;
Figure 7 is a schematic structural diagram of another embodiment of the counting apparatus provided by the present invention;
Figure 8 is a schematic structural diagram of an embodiment of the electronic device provided by the present invention.
Detailed description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and so that the scope of the present disclosure can be fully conveyed to those skilled in the art.
In the prior art, for certain links in the production process that involve workload counting, for example the bagging scene in a garment factory, the video of the worker's operation is generally collected by a camera, and the video frames are analyzed with a target detection algorithm to identify the target objects (the worker and the operated object), so as to confirm the start and end of the worker's operation and thereby count the worker's workload. In this process, although the completion of each operation of the worker is determined by the target detection algorithm, various thresholds need to be set manually to prevent unreasonable counting caused by misjudgment, for example the counting interval threshold between successive work units, or the distance threshold between target objects. These thresholds are usually set manually based on experience, so they can only be roughly reasonable; they cannot be applied universally to the specific scenarios of various production lines, and the accuracy of the counting results cannot be guaranteed. For example, suppose the average time for a worker to pack one piece of clothing is 15 seconds, with a typical range of 10-20 seconds; the counting interval threshold would then usually be set manually to 10 seconds, and the second packing operation is counted only if the time interval between two counting operations exceeds this threshold. If a more efficient worker needs only 8 seconds to pack a piece of clothing (while the manually set threshold remains 10 seconds), the count for the second packed piece is cancelled, which greatly reduces the accuracy of the algorithm.
Therefore, this application proposes a counting scheme whose main principle is: by analyzing video frames and using parameters such as the distance between target objects and their confidences, the action completion confidence of each video frame, that is, the probability that the target object has completed the action, is obtained; whether the action is completed is determined from this confidence, which in turn decides whether the completed action is counted, without manually setting a threshold, thereby reducing or avoiding miscounting and improving counting accuracy.
The method provided by the embodiments of the present invention can be applied to any business system with data processing capabilities. Fig. 1 is a system block diagram of a business system provided by an embodiment of the present invention; the structure shown in Fig. 1 is only one example of a business system to which the technical solution of the present invention can be applied. As shown in Fig. 1, the business system includes a counting apparatus. The apparatus includes a target detection module, a calculation module and a counting module, and can be used to execute the processing flows shown in Fig. 2, Fig. 3 and Fig. 5 below. In this business system, the video is first divided into a sequence of video frames; target detection processing is then performed on each video frame in the sequence to generate a feature vector, the feature vector including at least: distance information between the first target object (the operator) and the second target object (the operated object), the first confidence of the first target object, and the second confidence of the second target object; a first feature vector sequence is composed of the feature vectors corresponding to multiple video frames, and the action completion confidence of each video frame is calculated from the first feature vector sequence, that is, the probability that the first target object in the video frame has completed the action on the second target object; finally, the video frames whose action completion confidence is higher than a preset threshold are counted. Counting can thus be performed without manually setting a threshold, which reduces or avoids miscounting and improves counting accuracy.
The above embodiments describe the technical principles and an exemplary application framework of the embodiments of the present invention. The specific technical solutions of the embodiments of the present invention are further described in detail below through several embodiments.
Embodiment 1
Fig. 2 is a flowchart of an embodiment of the counting method provided by the present invention. The method can be executed by the above-mentioned business system, by various terminal or server devices with data processing capabilities, or by an apparatus or chip integrated in such devices. As shown in Fig. 2, the counting method includes the following steps:
S201: Perform target detection processing on video frames in the video to generate a feature vector.
In the embodiment of the present invention, after a video recording the action of the first target object on the second target object is obtained, it is divided into a sequence of video frames. Target detection processing is then performed on each video frame to obtain the target objects in the frame and generate the feature vector of the frame, the feature vector including at least: the distance information between the first target object and the second target object, the first confidence of the first target object, and the second confidence of the second target object.
Taking the scene of a worker packing clothes in a factory as an example, the first target object is the worker (the operator) and the second target object is the clothes (the clothing); the first confidence of the first target object is the probability that the worker packing clothes is recognized in the current video frame, and the second confidence of the second target object is the probability that packed clothes are recognized in the current video frame. These data can be obtained by performing target detection on each video frame of the video of the worker packing clothes, and the data of each video frame are then composed into a feature vector for that frame.
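A minimal sketch of how such a per-frame feature vector might be assembled from the outputs of an object detector is given below. The detector interface, the use of bounding-box centers for the distance, and all names are illustrative assumptions; the embodiment does not prescribe a particular detector or distance measure.

```python
import math

def frame_feature_vector(detections):
    """Build the feature vector for one video frame.

    `detections` is assumed to look like:
        {"worker":  {"box": (x1, y1, x2, y2), "confidence": 0.93},
         "garment": {"box": (x1, y1, x2, y2), "confidence": 0.88}}
    i.e. the first target object (the operator) and the second target
    object (the clothing) as returned by some object detector.
    """
    def center(box):
        x1, y1, x2, y2 = box
        return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

    worker, garment = detections["worker"], detections["garment"]
    wx, wy = center(worker["box"])
    gx, gy = center(garment["box"])

    # Distance between the two target objects, here taken as the Euclidean
    # distance between bounding-box centers (one possible choice).
    distance = math.hypot(wx - gx, wy - gy)

    # Feature vector: distance, first confidence, second confidence.
    return [distance, worker["confidence"], garment["confidence"]]
```

Any detector that returns a bounding box and a confidence for the operator and for the clothing could be plugged in here.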
S202: Calculate the action completion confidence of each video frame according to a first feature vector sequence composed of the feature vectors corresponding to multiple video frames.
In the embodiment of the present invention, the action completion confidence of a video frame is the probability that the first target object in the video frame has completed the action on the second target object. Taking the worker packing clothes as an example, the action completion confidence is the probability that the worker has completed the packing action. By processing the feature vectors of the video frames, for example with a multi-layer neural network, the action completion confidence (packing completion confidence) of each video frame can be calculated.
S203: Count according to the video frames whose action completion confidence is higher than a preset threshold.
In the embodiment of the present invention, a probability threshold may be preset, and when the probability that the first target object in a certain video frame has completed the action on the second target object is higher than the preset threshold, the count is increased by one. In other words, if N video frames in a video are calculated to satisfy the above condition, the number of times the first target object has completed the action on the second target object is N.
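A minimal sketch of this counting rule, assuming the per-frame action completion confidences have already been computed; the threshold value shown is purely illustrative.

```python
def count_completed_actions(completion_confidences, threshold=0.5):
    """Count the video frames whose action completion confidence exceeds
    the preset threshold; each such frame adds one to the count."""
    count = 0
    for confidence in completion_confidences:
        if confidence > threshold:
            count += 1
    return count

# Example: three of these frames exceed the threshold, so N = 3.
print(count_completed_actions([0.1, 0.7, 0.2, 0.9, 0.65], threshold=0.5))
```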
The counting method provided by the embodiment of the present invention analyzes video frames and obtains the probability that an action has been completed from parameters such as the distance between target objects and their confidences, so as to determine whether the action is completed; completed actions can thus be counted without manually setting a threshold, which reduces or avoids miscounting and improves counting accuracy.
实施例二Example two
图3为本发明提供的计数方法另一个实施例的流程图。如图3所示,在上述图2所示实施例的基础上,本实施例提供的计数方法还可以包括以下步骤:Fig. 3 is a flowchart of another embodiment of the counting method provided by the present invention. As shown in FIG. 3, based on the embodiment shown in FIG. 2, the counting method provided in this embodiment may further include the following steps:
S301,将视频分割为视频帧序列。S301: Split the video into a sequence of video frames.
在本发明实施例中,可以将预设时间段(例如,一天,或者数个小时等)的整段视频分割为视频帧序列,然后将视频帧序列输入到提前训练好的动作完成计数模型(服装 计数模型)中进行计数。In the embodiment of the present invention, the entire video of a preset time period (for example, one day, or several hours, etc.) can be divided into a sequence of video frames, and then the sequence of video frames is input to the pre-trained action completion count model ( Clothing counting model).
S302,对视频帧序列中的每个视频帧进行目标检测处理,生成特征向量。S302: Perform target detection processing on each video frame in the video frame sequence to generate a feature vector.
图4为本发明实施例提供的动作完成计数模型的结构示意图。如图4所示,在将视频帧序列输入到动作完成计数模型中后,模型首先对视频帧序列进行目标检测处理,从而生成各个视频帧的特征向量,如图4中的特征向量1、特征向量2、……、特征向量n。Fig. 4 is a schematic structural diagram of an action completion counting model provided by an embodiment of the present invention. As shown in Figure 4, after inputting the video frame sequence into the action completion counting model, the model first performs target detection processing on the video frame sequence to generate the feature vector of each video frame, such as feature vector 1, feature in Figure 4 Vector 2,..., feature vector n.
S303: Use a recurrent neural network to process the first feature vector sequence composed of the feature vectors corresponding to the multiple video frames, and generate a second feature vector sequence that contains the context relationship of each video frame.
S304: Calculate the action completion confidence of each video frame according to the second feature vector sequence.
In this embodiment of the present invention, the confidence calculation may take the context of the video frames into account to improve accuracy. Therefore, a recurrent neural network can be used to process the first feature vector sequence to generate a second feature vector sequence that contains the context relationship of each video frame. The second feature vector sequence is then input into a confidence calculation module to calculate the action completion confidence of each video frame. Specifically, the confidence calculation module may be obtained by feeding training data into a multi-layer perceptron during the model training phase.
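A minimal sketch of this two-stage computation, a recurrent network producing context-aware features followed by a per-frame confidence head, might look as follows; the LSTM/MLP choice and all dimensions are assumptions for illustration, since the embodiment only requires a recurrent neural network followed by a confidence calculation module trained as a multi-layer perceptron:

```python
import torch
import torch.nn as nn

class ActionCompletionCounter(nn.Module):
    def __init__(self, feat_dim: int = 3, hidden_dim: int = 64):
        super().__init__()
        # Recurrent network: turns the first feature vector sequence into a
        # second sequence that encodes the context of neighbouring frames.
        self.rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # Confidence calculation module (multi-layer perceptron head).
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1), nn.Sigmoid(),
        )

    def forward(self, first_sequence: torch.Tensor) -> torch.Tensor:
        # first_sequence: (batch, num_frames, feat_dim)
        second_sequence, _ = self.rnn(first_sequence)
        # per-frame action completion confidences: (batch, num_frames)
        return self.head(second_sequence).squeeze(-1)
```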
S305: Count according to the video frames whose action completion confidence is higher than the preset threshold.
The counting method provided by this embodiment of the present invention analyzes video frames and, based on parameters such as the distance between the target objects and their confidences, combined with the context relationship between frames, obtains the probability that the action has been completed, so that whether the action is finished can be determined more accurately. Completed actions can be counted without manually set thresholds, which reduces or avoids miscounting and improves counting accuracy.
Embodiment 3
Fig. 5 is a flowchart of yet another embodiment of the counting method provided by the present invention. As shown in Fig. 5, on the basis of the embodiment shown in Fig. 2 or Fig. 3, the counting method provided by this embodiment of the present invention may further include the following steps:
S501: Obtain training video data.
In this embodiment of the present invention, before the above action completion counting model is used for counting, the model may be trained with training video data. The training video data may include the feature vector of each of a plurality of training video frames and an action completion identifier (packing completion identifier) annotated for each training video frame; the action completion identifier indicates whether the action in the training video frame has been completed. Specifically, the action completion identifier annotated for a training video frame indicates whether the action is completed in that frame. For example, in a clothes packing scene, if the packing is completed, the action completion identifier may be recorded as 1 (the frame contributes one to the count); if the packing is not completed, the action completion identifier is recorded as 0 (the frame cannot be used to increment the count).
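For illustration, training data of this shape could be represented as pairs of a per-frame feature vector and its annotated completion flag; the numeric values below are purely hypothetical:

```python
# (feature vector [distance, worker confidence, clothes confidence], completion flag)
training_data = [
    ([142.3, 0.97, 0.88], 0),  # packing still in progress
    ([ 36.5, 0.99, 0.93], 1),  # packing action completed in this frame
    ([205.1, 0.95, 0.90], 0),
]
```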
S502: Calculate the action completion confidence of each training video frame according to a third feature vector sequence composed of the feature vectors corresponding to the plurality of training video frames.
In this embodiment of the present invention, the process of calculating the action completion confidence of the training video frames from the third feature vector sequence is the same as the process, during model use, of calculating the action completion confidence of each video frame from the first feature vector sequence and from the second feature vector sequence.
S503: For each training video frame, return a training result according to its action completion identifier and its action completion confidence.
Specifically, when the action completion confidence of a training video frame is higher than the preset threshold and the action completion identifier of that frame indicates that the action in the frame is completed, a training-correct result is returned.
In this embodiment of the present invention, for a given training video frame, an action completion confidence calculated by the action completion counting model that is higher than the preset threshold means that, after the model's calculation, the frame would increment the count by one. If its action completion identifier is 1 at this time, the training result is correct.
In addition, when the action completion confidence of a training video frame is not higher than the preset threshold and the action completion identifier of that frame indicates that the action in the frame is not completed, a training-correct result is also returned.
In this embodiment of the present invention, for a given training video frame, an action completion confidence calculated by the action completion counting model that is not higher than the preset threshold means that, after the model's calculation, the frame would not increment the count. If its action completion identifier is 0 at this time, the training result is likewise correct.
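The two cases above amount to checking whether the thresholded model output agrees with the annotated flag. A sketch, with the threshold value again illustrative:

```python
def training_result(confidence: float, label: int, threshold: float = 0.9) -> bool:
    """Return True ("training correct") when the thresholded confidence
    agrees with the annotated action completion identifier."""
    predicted_complete = confidence > threshold
    return predicted_complete == bool(label)
```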
Further, the counting method provided by this embodiment of the present invention may also include:
S504: Among the training video frames whose action completion identifier indicates that the action in the frame is completed, obtain the proportion of training video frames whose action completion confidence is higher than the preset threshold.
S505: When the proportion is higher than a preset ratio, end the training process.
In this embodiment of the present invention, whether the model is ready for use is decided by the accuracy of the output of the action completion counting model. For example, suppose 1000 video frames with an action completion identifier of 1 are input into the model, and for 700 of them the model outputs an action completion confidence higher than the preset threshold, i.e. a proportion of 70%. If this proportion is higher than the preset ratio, the model is considered trained, and the training process can therefore be ended. The trained action completion counting model is then used to count from the input video.
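A sketch of this stopping check follows; the threshold of 0.9 and preset ratio of 0.6 are chosen purely for illustration:

```python
def should_stop_training(confidences, labels, threshold=0.9, preset_ratio=0.6):
    """Among frames annotated as completed (label == 1), compute the share whose
    predicted confidence exceeds the threshold; stop when it beats the preset ratio."""
    positives = [c for c, y in zip(confidences, labels) if y == 1]
    if not positives:
        return False
    proportion = sum(1 for c in positives if c > threshold) / len(positives)
    return proportion > preset_ratio
```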
The counting method provided by this embodiment of the present invention trains the action completion counting model on acquired training video data and returns the training result according to the action completion identifier and action completion confidence of each training video frame, thereby improving counting accuracy.
Embodiment 4
Fig. 6 is a schematic structural diagram of an embodiment of the counting apparatus provided by the present invention, which can be used to perform the method steps shown in Fig. 2. As shown in Fig. 6, the counting apparatus may include: a target detection module 62, a calculation module 63 and a counting module 64.
The target detection module 62 is configured to perform target detection processing on the video frames in the video to generate feature vectors, each feature vector including at least: distance information between the first target object and the second target object, a first confidence of the first target object, and a second confidence of the second target object. The calculation module 63 is configured to calculate the action completion confidence of each video frame according to a first feature vector sequence composed of the feature vectors corresponding to multiple video frames, the action completion confidence being the probability that the first target object in the video frame has completed the action on the second target object. The counting module 64 is configured to count according to the video frames whose action completion confidence is higher than a preset threshold.
In this embodiment of the present invention, after a video recording the action of the first target object on the second target object is obtained, it is split into a sequence of video frames. The target detection module 62 then performs target detection on each video frame in the video, obtains the target objects in the frame, and generates the feature vector of that frame. The calculation module 63 calculates the action completion confidence of each video frame according to the first feature vector sequence composed of the feature vectors generated by the target detection module 62. When the action completion confidence of a video frame is higher than the preset threshold, the count in the counting module 64 is incremented by one.
The counting apparatus provided by this embodiment of the present invention analyzes video frames and, based on parameters such as the distance between the target objects and their confidences, obtains the probability that the action has been completed, thereby determining whether the action is finished. Completed actions can be counted without manually set thresholds, which reduces or avoids miscounting and improves counting accuracy.
Embodiment 5
Fig. 7 is a schematic structural diagram of another embodiment of the counting apparatus provided by the present invention, which can be used to perform the method steps shown in Fig. 3 and Fig. 5. As shown in Fig. 7, on the basis of the embodiment shown in Fig. 6, the calculation module 63 may include: a processing unit 631 and a calculation unit 632.
The processing unit 631 may be configured to process the first feature vector sequence with a recurrent neural network to generate a second feature vector sequence containing the context relationship of each video frame; the calculation unit 632 may be configured to calculate the action completion confidence of each video frame according to the second feature vector sequence.
In this embodiment of the present invention, the calculation module 63 may specifically be configured to process the first feature vector sequence with a multi-layer neural network to calculate the action completion confidence of each video frame. Specifically, within the calculation module 63, the confidence calculation may take the context of the video frames into account to improve accuracy. Therefore, the processing unit 631 may process the first feature vector sequence with a recurrent neural network to generate a second feature vector sequence containing the context relationship of each video frame. The calculation unit 632 then calculates the action completion confidence of each video frame according to the second feature vector sequence generated by the processing unit 631.
Further, in this embodiment of the present invention, the entire video of a preset time period (for example, one day or several hours) may be split into a sequence of video frames, which is then input into a pre-trained action completion counting model for counting. Therefore, the counting apparatus provided by this embodiment of the present invention may further include: a model training module 71. The model training module 71 may be configured to obtain training video data, the training video data including the feature vector of each of a plurality of training video frames and an action completion identifier annotated for each training video frame, the action completion identifier indicating whether the action in the training video frame has been completed; to calculate the action completion confidence of each training video frame according to a third feature vector sequence composed of the feature vectors corresponding to the plurality of training video frames; and, for each training video frame, to return a training result according to its action completion identifier and its action completion confidence.
Specifically, the model training module 71 may include: a first returning unit 711. The first returning unit 711 may be configured to return a training-correct result when the action completion confidence of a training video frame is higher than the preset threshold and the action completion identifier of that frame indicates that the action in the frame is completed.
Further, the model training module 71 may also include: a second returning unit 712. The second returning unit 712 may be configured to return a training-correct result when the action completion confidence of a training video frame is not higher than the preset threshold and the action completion identifier of that frame indicates that the action in the frame is not completed.
In addition, the counting apparatus provided by this embodiment of the present invention may further include: a test module 72. The test module 72 may be configured to obtain, among the training video frames whose action completion identifier indicates that the action in the frame is completed, the proportion of training video frames whose action completion confidence is higher than the preset threshold, and to end the training process when the proportion is higher than the preset ratio.
For the functions of the modules in this embodiment of the present invention, refer to the detailed descriptions in the above method embodiments, which are not repeated here.
The counting apparatus provided by this embodiment of the present invention analyzes video frames and, based on parameters such as the distance between the target objects and their confidences, obtains the probability that the action has been completed, thereby determining whether the action is finished. Completed actions can be counted without manually set thresholds, which reduces or avoids miscounting and improves counting accuracy.
Embodiment 6
The internal functions and structure of the counting apparatus have been described above; the apparatus may be implemented as an electronic device. Fig. 8 is a schematic structural diagram of an embodiment of the electronic device provided by the present invention. As shown in Fig. 8, the electronic device includes a memory 81 and a processor 82.
The memory 81 is used to store programs. In addition to the above programs, the memory 81 may also be configured to store various other data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phone book data, messages, pictures, videos, and so on.
The memory 81 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disk.
The processor 82 is coupled with the memory 81 and executes the program stored in the memory 81 in order to:
perform target detection processing on the video frames in the video to generate feature vectors, each feature vector including at least: distance information between the first target object and the second target object, a first confidence of the first target object, and a second confidence of the second target object;
calculate the action completion confidence of each video frame according to a first feature vector sequence composed of the feature vectors corresponding to multiple video frames, the action completion confidence being the probability that the first target object in the video frame has completed the action on the second target object; and
count according to the video frames whose action completion confidence is higher than a preset threshold.
Further, as shown in Fig. 8, the electronic device may also include: a communication component 83, a power supply component 84, an audio component 85, a display 86 and other components. Only some of the components are schematically shown in Fig. 8, which does not mean that the electronic device includes only the components shown in Fig. 8.
The communication component 83 is configured to facilitate wired or wireless communication between the electronic device and other devices. The electronic device can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 83 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 83 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
The power supply component 84 provides power for the various components of the electronic device. The power supply component 84 may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the electronic device.
The audio component 85 is configured to output and/or input audio signals. For example, the audio component 85 includes a microphone (MIC) configured to receive external audio signals when the electronic device is in an operation mode such as a call mode, a recording mode or a voice recognition mode. The received audio signal may be further stored in the memory 81 or sent via the communication component 83. In some embodiments, the audio component 85 also includes a speaker for outputting audio signals.
The display 86 includes a screen, which may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure related to the touch or swipe operation.
A person of ordinary skill in the art can understand that all or part of the steps of the above method embodiments can be implemented by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium. When the program is executed, the steps of the above method embodiments are performed; the aforementioned storage medium includes various media that can store program code, such as ROM, RAM, a magnetic disk or an optical disk.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (18)

1. A clothing counting method, characterized in that it comprises:
    processing video frames in a video to obtain distance information between an operator and an item of clothing, a first confidence of the operator, and a second confidence of the item of clothing;
    inputting the distance information between the operator and the item of clothing, the first confidence of the operator, and the second confidence of the item of clothing into a clothing counting model, and calculating a packing completion confidence of each video frame, the packing completion confidence being the probability that the operator in the video frame has completed the packing action on the item of clothing; and
    counting packed items of clothing according to the video frames whose packing completion confidence is higher than a preset threshold.
2. The clothing counting method according to claim 1, characterized in that the inputting the distance information between the operator and the item of clothing, the first confidence of the operator, and the second confidence of the item of clothing into the clothing counting model and calculating the packing completion confidence of each video frame comprises:
    using a multi-layer neural network to calculate the packing completion confidence of each video frame.
3. The clothing counting method according to claim 2, characterized in that the using a multi-layer neural network to calculate the packing completion confidence of each video frame comprises:
    using a recurrent neural network to obtain the context relationship of each video frame; and
    calculating the packing completion confidence of each video frame according to the context relationship of each video frame.
4. A counting method, characterized in that it comprises:
    performing target detection processing on video frames in a video to generate feature vectors, each feature vector including at least: distance information between a first target object and a second target object, a first confidence of the first target object, and a second confidence of the second target object;
    calculating an action completion confidence of each video frame according to a first feature vector sequence composed of the feature vectors corresponding to multiple video frames, the action completion confidence being the probability that the first target object in the video frame has completed an action on the second target object; and
    counting according to the video frames whose action completion confidence is higher than a preset threshold.
5. The counting method according to claim 4, characterized in that the calculating an action completion confidence of each video frame according to the first feature vector sequence composed of the feature vectors corresponding to the multiple video frames comprises:
    using a multi-layer neural network to process the first feature vector sequence and calculate the action completion confidence of each video frame.
6. The counting method according to claim 5, characterized in that the using a multi-layer neural network to process the first feature vector sequence and calculate the action completion confidence of each video frame comprises:
    using a recurrent neural network to process the first feature vector sequence to generate a second feature vector sequence containing the context relationship of each video frame; and
    calculating the action completion confidence of each video frame according to the second feature vector sequence.
7. The counting method according to any one of claims 4 to 6, characterized in that, before performing target detection processing on the video frames in the video, the method further comprises:
    obtaining training video data, the training video data including a feature vector of each of a plurality of training video frames and an action completion identifier annotated for each training video frame, the action completion identifier being used to indicate whether the action in the training video frame has been completed;
    calculating an action completion confidence of each training video frame according to a third feature vector sequence composed of the feature vectors corresponding to the plurality of training video frames; and
    for each training video frame, returning a training result according to its action completion identifier and its action completion confidence.
8. The counting method according to claim 7, characterized in that the returning, for each training video frame, a training result according to its action completion identifier and its action completion confidence comprises:
    returning a training-correct result when the action completion confidence of the training video frame is higher than the preset threshold and the action completion identifier of the training video frame indicates that the action in the training video frame has been completed.
9. The counting method according to claim 7, characterized in that the returning, for each training video frame, a training result according to its action completion identifier and its action completion confidence comprises:
    returning a training-correct result when the action completion confidence of the training video frame is not higher than the preset threshold and the action completion identifier of the training video frame indicates that the action in the training video frame has not been completed.
10. The counting method according to claim 8, characterized in that, after the returning a training-correct result when the action completion confidence of the training video frame is higher than the preset threshold and the action completion identifier of the training video frame indicates that the action in the training video frame has been completed, the method further comprises:
    among the training video frames whose action completion identifier indicates that the action in the training video frame has been completed, obtaining the proportion of training video frames whose action completion confidence is higher than the preset threshold; and
    ending the training process when the proportion is higher than a preset ratio.
11. A counting apparatus, characterized in that it comprises:
    a target detection module, configured to perform target detection processing on video frames in a video to generate feature vectors, each feature vector including at least: distance information between a first target object and a second target object, a first confidence of the first target object, and a second confidence of the second target object;
    a calculation module, configured to calculate an action completion confidence of each video frame according to a first feature vector sequence composed of the feature vectors corresponding to multiple video frames, the action completion confidence being the probability that the first target object in the video frame has completed an action on the second target object; and
    a counting module, configured to count according to the video frames whose action completion confidence is higher than a preset threshold.
12. The counting apparatus according to claim 11, characterized in that the calculation module is specifically configured to process the first feature vector sequence with a multi-layer neural network to calculate the action completion confidence of each video frame.
13. The counting apparatus according to claim 12, characterized in that the calculation module comprises:
    a processing unit, configured to process the first feature vector sequence with a recurrent neural network to generate a second feature vector sequence containing the context relationship of each video frame; and
    a calculation unit, configured to calculate the action completion confidence of each video frame according to the second feature vector sequence.
14. The counting apparatus according to any one of claims 11 to 13, characterized in that it further comprises:
    a model training module, configured to obtain training video data, the training video data including a feature vector of each of a plurality of training video frames and an action completion identifier annotated for each training video frame, the action completion identifier being used to indicate whether the action in the training video frame has been completed; to calculate an action completion confidence of each training video frame according to a third feature vector sequence composed of the feature vectors corresponding to the plurality of training video frames; and, for each training video frame, to return a training result according to its action completion identifier and its action completion confidence.
15. The counting apparatus according to claim 14, characterized in that the model training module comprises:
    a first returning unit, configured to return a training-correct result when the action completion confidence of a training video frame is higher than the preset threshold and the action completion identifier of the training video frame indicates that the action in the training video frame has been completed.
16. The counting apparatus according to claim 14, characterized in that the model training module further comprises:
    a second returning unit, configured to return a training-correct result when the action completion confidence of a training video frame is not higher than the preset threshold and the action completion identifier of the training video frame indicates that the action in the training video frame has not been completed.
17. The counting apparatus according to claim 15, characterized in that it further comprises:
    a test module, configured to obtain, among the training video frames whose action completion identifier indicates that the action in the training video frame has been completed, the proportion of training video frames whose action completion confidence is higher than the preset threshold, and to end the training process when the proportion is higher than a preset ratio.
18. An electronic device, characterized in that it comprises:
    a memory, configured to store a program; and
    a processor, configured to run the program stored in the memory in order to:
    perform target detection processing on video frames in a video to generate feature vectors, each feature vector including at least: distance information between a first target object and a second target object, a first confidence of the first target object, and a second confidence of the second target object;
    calculate an action completion confidence of each video frame according to a first feature vector sequence composed of the feature vectors corresponding to multiple video frames, the action completion confidence being the probability that the first target object in the video frame has completed an action on the second target object; and
    count according to the video frames whose action completion confidence is higher than a preset threshold.
PCT/CN2020/074214 2019-02-12 2020-02-03 Method for counting items of clothing, counting method and apparatus, and electronic device WO2020164401A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910111446.4 2019-02-12
CN201910111446.4A CN111553180B (en) 2019-02-12 2019-02-12 Garment counting method, garment counting method and device and electronic equipment

Publications (1)

Publication Number Publication Date
WO2020164401A1 true WO2020164401A1 (en) 2020-08-20

Family

ID=72005429

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/074214 WO2020164401A1 (en) 2019-02-12 2020-02-03 Method for counting items of clothing, counting method and apparatus, and electronic device

Country Status (3)

Country Link
CN (1) CN111553180B (en)
TW (1) TW202030642A (en)
WO (1) WO2020164401A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944382A (en) * 2017-11-20 2018-04-20 北京旷视科技有限公司 Method for tracking target, device and electronic equipment
CN108241844A (en) * 2016-12-27 2018-07-03 北京文安智能技术股份有限公司 A kind of public traffice passenger flow statistical method, device and electronic equipment
CN108470255A (en) * 2018-04-12 2018-08-31 上海小蚁科技有限公司 Workload Account method and device, storage medium, computing device
CN108491759A (en) * 2018-02-10 2018-09-04 合肥迪宏自动化有限公司 A kind of process detection device and its process detection method based on deep learning
CN108986064A (en) * 2017-05-31 2018-12-11 杭州海康威视数字技术股份有限公司 A kind of people flow rate statistical method, equipment and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
HK1081393A2 (en) * 2006-02-06 2006-05-12 Msc Ltd Improving manufacturing using rfid means
WO2017146930A1 (en) * 2016-02-22 2017-08-31 Rapiscan Systems, Inc. Systems and methods for detecting threats and contraband in cargo
CN207529154U (en) * 2017-10-27 2018-06-22 成都华西天然药物有限公司 A kind of clothes of electromechanical integration are regulated the traffic device


Also Published As

Publication number Publication date
CN111553180A (en) 2020-08-18
TW202030642A (en) 2020-08-16
CN111553180B (en) 2023-08-29

Similar Documents

Publication Publication Date Title
US20190370551A1 (en) Object detection and tracking delay reduction in video analytics
CN110659397B (en) Behavior detection method and device, electronic equipment and storage medium
CN105306931A (en) Smart TV anomaly detection method and device
US20180047173A1 (en) Methods and systems of performing content-adaptive object tracking in video analytics
CN111078446A (en) Fault information acquisition method and device, electronic equipment and storage medium
CN109767453A (en) Information processing unit, background image update method and non-transient computer readable storage medium
CN109271929B (en) Detection method and device
CN106559631A (en) Method for processing video frequency and device
CN107508573A (en) Crystal oscillator oscillation frequency correction method and device
CN114760339A (en) Fault prediction method, apparatus, device, medium, and product
CN110717399A (en) Face recognition method and electronic terminal equipment
WO2020039559A1 (en) Information processing device, information processing method, and work evaluation system
WO2019101002A1 (en) Shoe data acquisition method and device, and computer storage medium
WO2020164401A1 (en) Method for counting items of clothing, counting method and apparatus, and electronic device
WO2022057806A1 (en) Background image self-updating method, apparatus and device, and storage medium
CN111047049B (en) Method, device and medium for processing multimedia data based on machine learning model
US20200311401A1 (en) Analyzing apparatus, control method, and program
WO2021057879A1 (en) Data processing system and method, electronic device, and computer readable storage medium
US9218669B1 (en) Image ghost removal
CN115409094A (en) Equipment fault prediction method, device and storage medium
CN112070094B (en) Method and device for screening training data, electronic equipment and storage medium
CN111913942B (en) Data quality detection method and device
US11398091B1 (en) Repairing missing frames in recorded video with machine learning
CN109241729B (en) Application program detection and processing method and device, terminal device and electronic device
US11763561B2 (en) Information processing apparatus, information processing method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20755516

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20755516

Country of ref document: EP

Kind code of ref document: A1