WO2020164401A1 - Method for counting items of clothing, counting method and apparatus, and electronic device - Google Patents

Method for counting items of clothing, counting method and apparatus, and electronic device

Info

Publication number
WO2020164401A1
Authority
WO
WIPO (PCT)
Prior art keywords
action
video frame
confidence
completion
training
Prior art date
Application number
PCT/CN2020/074214
Other languages
French (fr)
Chinese (zh)
Inventor
张民英
神克乐
龙一民
徐博文
吴剑
胡露露
陈新
尹宁
刘志敏
胡旭
袁炜
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司 filed Critical 阿里巴巴集团控股有限公司
Publication of WO2020164401A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Definitions

  • the present invention relates to the field of computer technology, in particular to a clothing counting method, counting method and device, and electronic equipment.
  • for non-standard small factories with a low degree of standardization and automation, the production process is usually linked in a low-intrusion manner.
  • cameras and other equipment are set to collect various data in the production process, so as to correlate each link of the production process to ensure that orders can be completed on schedule and improve the production efficiency of the factory.
  • the video of the worker's operation is generally collected through a camera, and the video frames are analyzed with a target detection algorithm to identify the target objects (the worker and the operated object), so as to confirm the start and end of the worker's operation and thereby count the worker's workload.
  • various thresholds need to be set manually, for example, the counting interval threshold between successive work units, or the distance threshold between target objects.
  • the threshold is usually set manually based on experience, so the manually set threshold can only be roughly reasonable; it cannot be applied universally to the specific scenarios of various production lines, and the accuracy of the counting results cannot be guaranteed.
  • the embodiments of the present invention provide a clothing counting method, a counting method and apparatus, and an electronic device, so as to overcome the defect in the prior art that counting with manually set thresholds cannot guarantee the accuracy of the counting results.
  • an embodiment of the present invention provides a clothing counting method, including:
  • the distance information between the operator and the clothing, the first confidence of the operator, and the second confidence of the clothing are input into a clothing counting model, and the packing completion confidence of each video frame is calculated;
  • the packing completion confidence is the probability that the operator in the video frame has completed the packing action on the clothing;
  • the garment packing count is performed according to the video frames whose packing completion confidence is higher than a preset threshold.
  • the embodiment of the present invention also provides a counting method, including:
  • the action completion confidence of each video frame is calculated.
  • the action completion confidence is the probability that the first target object in the video frame has completed an action on the second target object;
  • the embodiment of the present invention also provides a counting device, including:
  • the target detection module is used to perform target detection processing on the video frames in the video to generate a feature vector, the feature vector including at least: the distance information between the first target object and the second target object, the first confidence of the first target object, and the second confidence of the second target object;
  • the calculation module is configured to calculate the action completion confidence of each video frame according to a first feature vector sequence composed of the feature vectors corresponding to multiple video frames, the action completion confidence being the probability that the first target object in the video frame has completed an action on the second target object;
  • the counting module is used to count according to the video frames whose action completion confidence is higher than a preset threshold.
  • the embodiment of the present invention also provides an electronic device, including:
  • Memory used to store programs
  • the processor is configured to run the program stored in the memory for:
  • the action completion confidence of each video frame is calculated.
  • the action completion confidence is the probability that the first target object in the video frame has completed an action on the second target object;
  • the clothing counting method, counting method and apparatus, and electronic device provided by the embodiments of the present invention analyze video frames and obtain the probability that an action has been completed from parameters such as the distance between target objects and their confidences, so as to determine whether the action is completed; completed actions can thus be counted without manually setting a threshold, which reduces or avoids miscounting and improves counting accuracy.
  • Figure 1 is a system block diagram of a business system provided by an embodiment of the present invention.
  • FIG. 3 is a flowchart of another embodiment of the counting method provided by the present invention.
  • FIG. 4 is a schematic structural diagram of an action completion counting model provided by an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of an embodiment of the counting device provided by the present invention.
  • FIG. 7 is a schematic structural diagram of another embodiment of the counting device provided by the present invention.
  • FIG. 8 is a schematic structural diagram of an embodiment of an electronic device provided by the present invention.
  • the video of the worker's operation is generally collected by a camera, and the video frames are analyzed with a target detection algorithm to identify the target objects (the worker and the operated object), so as to confirm the start and end of the worker's operation and thereby count the worker's workload.
  • these thresholds are usually set manually based on experience, so they can only be roughly reasonable; they cannot be applied universally to specific production scenarios, and the accuracy of the counting results cannot be guaranteed. For example, suppose the average time for a worker to pack one piece of clothing is 15 seconds, with a typical range of 10-20 seconds; the counting interval threshold would then usually be set manually to 10 seconds, and the second packing operation is counted only if the time interval between two counting operations exceeds this threshold. If a more efficient worker needs only 8 seconds to pack a piece of clothing (while the manually set threshold remains 10 seconds), the count for the second packed piece is cancelled, which greatly reduces the accuracy of the algorithm.
  • this application proposes a counting scheme whose main principle is: by analyzing video frames and using parameters such as the distance between target objects and their confidences, the action completion confidence of each video frame, that is, the probability that the target object has completed the action, is obtained; whether the action is completed is determined from this confidence, which in turn decides whether the completed action is counted, without manually setting a threshold, thereby reducing or avoiding miscounting and improving counting accuracy.
  • Fig. 1 is a system block diagram of a business system provided by an embodiment of the present invention.
  • the structure shown in Fig. 1 is only one example of a business system to which the technical solution of the present invention can be applied.
  • the business system includes a counting device.
  • the device includes: a target detection module, a calculation module, and a counting module, which can be used to execute the processing flow shown in Figure 2, Figure 3, and Figure 5 below.
  • in this business system, the video is first divided into a sequence of video frames; target detection processing is then performed on each video frame in the sequence to generate a feature vector, the feature vector including at least: the distance information between the first target object (the operator) and the second target object (the operated object), the first confidence of the first target object, and the second confidence of the second target object; a first feature vector sequence is composed of the feature vectors corresponding to multiple video frames, and the action completion confidence of each video frame is calculated from the first feature vector sequence, that is, the probability that the first target object in the video frame has completed the action on the second target object; finally, the video frames whose action completion confidence is higher than a preset threshold are counted. Counting can thus be performed without manually setting a threshold, which reduces or avoids miscounting and improves counting accuracy.
  • FIG. 2 is a flow chart of an embodiment of the counting method provided by the present invention.
  • the method can be executed by the above-mentioned business system, by various terminal or server devices with data processing capabilities, or by an apparatus or chip integrated in such devices.
  • the counting method includes the following steps:
  • S201 Perform target detection processing on video frames in the video to generate a feature vector.
  • the target detection process is performed on each video frame, the target object in the video frame is obtained, and the feature vector of the video frame is generated.
  • the feature vector includes at least: the distance information between the first target object and the second target object, the first confidence of the first target object, and the second confidence of the second target object.
  • taking the scene of a worker packing clothes in a factory as an example, the first target object is the worker (the operator) and the second target object is the clothes (the clothing); the first confidence of the first target object is the probability that the worker packing clothes is recognized in the current video frame, and the second confidence of the second target object is the probability that packed clothes are recognized in the current video frame.
  • S202 Calculate the action completion confidence of each video frame according to a first feature vector sequence composed of feature vectors corresponding to multiple video frames.
  • the confidence of completion of the action of the video frame is the probability that the first target object in the video frame completes the action on the second target object.
  • the confidence of completion of the action is the probability of the worker completing the packing action.
  • S203 Count the video frames with the confidence of completion of the action being higher than a preset threshold.
  • a probability threshold may be preset, and when the probability of the first target object completing the action on the second target object in a certain video frame is higher than the preset threshold, the count is increased by one. In other words, if it is calculated that there are N video frames satisfying the above conditions in a video, the number of times the first target object completes the action on the second target object is N.
  • the counting method provided by the embodiment of the present invention analyzes video frames and obtains the probability that an action has been completed from parameters such as the distance between target objects and their confidences, so as to determine whether the action is completed; completed actions can thus be counted without manually setting a threshold, which reduces or avoids miscounting and improves counting accuracy.
  • Fig. 3 is a flowchart of another embodiment of the counting method provided by the present invention. As shown in FIG. 3, based on the embodiment shown in FIG. 2, the counting method provided in this embodiment may further include the following steps:
  • the entire video of a preset time period (for example, one day, or several hours) can be divided into a sequence of video frames, and the sequence of video frames is then input into a pre-trained action completion counting model (the clothing counting model) for counting.
  • S302 Perform target detection processing on each video frame in the video frame sequence to generate a feature vector.
  • Fig. 4 is a schematic structural diagram of an action completion counting model provided by an embodiment of the present invention. As shown in Figure 4, after the video frame sequence is input into the action completion counting model, the model first performs target detection processing on the video frame sequence to generate the feature vector of each video frame, i.e. feature vector 1, feature vector 2, ..., feature vector n in Figure 4.
  • S303 Use a recurrent neural network to process a first feature vector sequence composed of feature vectors corresponding to multiple video frames, and generate a second feature vector sequence that includes the context of each video frame.
  • S304 Calculate the action completion confidence of each video frame according to the second feature vector sequence.
  • the contextual content of the video frames may be taken into account in the confidence calculation to improve accuracy. Therefore, a recurrent neural network can be used to process the first feature vector sequence to generate a second feature vector sequence containing the context of each video frame. The second feature vector sequence is then input into a confidence calculation module to calculate the action completion confidence of each video frame. Specifically, the confidence calculation module may be obtained by training a multi-layer perceptron on the training data during the model training phase.
  • the counting method provided by the embodiments of the present invention analyzes video frames and obtains the probability that an action has been completed from parameters such as the distance between target objects and their confidences, combined with the context of each video frame, so as to determine more accurately whether the action is completed; completed actions can thus be counted without manually setting a threshold, which reduces or avoids miscounting and improves counting accuracy.
  • FIG. 5 is a flowchart of another embodiment of the counting method provided by the present invention. As shown in FIG. 5, based on the embodiment shown in FIG. 2 or FIG. 3, the counting method provided in the embodiment of the present invention may further include the following steps:
  • before the above-mentioned action completion counting model is used for counting, the model may be trained on acquired training video data.
  • the training video data may include the feature vector of each training video frame in the multiple training video frames and the action completion identifier (package completion identifier) annotated for each training video frame.
  • the action completion identifier is used to identify whether the action in the training video frame has been completed.
  • the action completion identifier annotated for a training video frame indicates whether the action is completed in that frame; for example, in a clothes packing scene, if packing is completed the action completion identifier can be recorded as 1 (the video frame is used to add one to the count), and if packing is not completed the action completion identifier is recorded as 0 (the video frame cannot be used to add one to the count).
  • S502 Calculate the action completion confidence of each training video frame according to a third feature vector sequence composed of feature vectors corresponding to multiple training video frames.
  • the process of calculating the action completion confidence of the training video frames from the third feature vector sequence is the same as the process used when the above model is applied, in which the action completion confidence of each video frame is calculated from the first feature vector sequence and then from the second feature vector sequence.
  • for each training video frame, whether the training result is correct is determined according to its action completion identifier and its action completion confidence.
  • when the action completion confidence calculated by the action completion counting model is higher than the preset threshold, it means that, according to the model, the video frame can be used to add one to the count.
  • if, at this time, the action completion identifier of the frame is 1, the training result is correct.
  • when the action completion confidence calculated by the action completion counting model is not higher than the preset threshold, it means that, according to the model, the video frame cannot be used to add one to the count; if its action completion identifier is 0 at this time, the training result is also correct.
  • the counting method provided in the embodiment of the present invention may further include:
  • S504 Acquire, among the training video frames whose action completion identifier indicates that the action is completed, the proportion of training video frames whose action completion confidence is higher than the preset threshold.
  • whether the model can be used is determined by the accuracy of the output of the action completion counting model.
  • for example, if 1000 video frames with the action completion identifier of 1 are input to the model, and for 700 of these 1000 frames the model outputs an action completion confidence higher than the preset threshold, the proportion is 70%. If this proportion is higher than a preset ratio, the model is considered trained, so the training process can be ended and the trained action completion counting model can be used to count from the input video.
  • the counting method provided by the embodiment of the present invention trains the action completion counting model by acquiring training video data, and returns the training result according to the action completion identifier and the action completion confidence of each training video frame, thereby improving the accuracy of counting.
  • Fig. 6 is a schematic structural diagram of an embodiment of a counting device provided by the present invention, which can be used to perform the method steps shown in Fig. 2.
  • the counting device may include: a target detection module 62, a calculation module 63 and a counting module 64.
  • the target detection module 62 is configured to perform target detection processing on the video frames in the video to generate a feature vector, which includes at least: the distance information between the first target object and the second target object, the first confidence of the first target object, and the second confidence of the second target object;
  • the calculation module 63 is used to calculate the action completion confidence of each video frame according to the first feature vector sequence composed of the feature vectors corresponding to multiple video frames, the action completion confidence being the probability that the first target object in the video frame has completed the action on the second target object;
  • the counting module 64 is configured to count the video frames whose action completion confidence is higher than the preset threshold.
  • the target detection module 62 performs target detection processing on each video frame in the video, obtains the target object in the video frame, and generates a feature vector of the video frame.
  • the calculation module 63 calculates the action completion confidence of each video frame according to the first feature vector sequence composed of each feature vector generated by the target detection module 62. When the confidence of completion of a certain video frame is higher than the preset threshold, the count in the counting module 64 is increased by one.
  • the counting device provided by the embodiment of the present invention analyzes video frames and obtains the probability that an action has been completed from parameters such as the distance between target objects and their confidences, so as to determine whether the action is completed; completed actions can thus be counted without manually setting a threshold, which reduces or avoids miscounting and improves counting accuracy.
  • FIG. 7 is a schematic structural diagram of another embodiment of the counting device provided by the present invention, which can be used to perform the method steps shown in FIG. 3 and FIG. 5.
  • the calculation module 63 may include: a processing unit 631 and a calculation unit 632.
  • the processing unit 631 can be used to process the first feature vector sequence with a recurrent neural network to generate a second feature vector sequence containing the context of each video frame; the calculation unit 632 can be used to calculate the action completion confidence of each video frame according to the second feature vector sequence.
  • the calculation module 63 may be specifically configured to use a multilayer neural network to process the first feature vector sequence, and calculate the action completion confidence of each video frame.
  • the contextual content of the video frame may be combined to perform a confidence correlation calculation to improve accuracy. Therefore, the processing unit 631 may process the first feature vector sequence using a recurrent neural network to generate a second feature vector sequence containing the context relationship of each video frame. Then, the calculation unit 632 calculates the action completion confidence of each video frame according to the second feature vector sequence generated by the processing unit 631.
  • the entire video of a preset time period (for example, one day, or several hours) can be divided into a sequence of video frames, and the sequence of video frames is then input into the pre-trained action completion counting model for counting. Therefore, the counting device provided in the embodiment of the present invention may further include: a model training module 71.
  • the model training module 71 may be used to obtain training video data.
  • the training video data includes a feature vector of each training video frame among multiple training video frames and an action completion identifier annotated for each training video frame, the action completion identifier being used to identify whether the action in the training video frame is completed; the action completion confidence of each training video frame is calculated according to a third feature vector sequence composed of the feature vectors corresponding to the multiple training video frames; and, for each training video frame, a training result is returned according to its action completion identifier and its action completion confidence.
  • the model training module 71 may include: a first returning unit 711.
  • the first returning unit 711 can be used to return a correct training result when the action completion confidence of a training video frame is higher than the preset threshold and the action completion identifier of that training video frame indicates that the action in the frame is completed.
  • the model training module 71 may further include: a second returning unit 712.
  • the second returning unit 712 can be used to return a correct training result when the action completion confidence of a training video frame is not higher than the preset threshold and the action completion identifier of that training video frame indicates that the action in the frame is not completed.
  • the counting device provided by the embodiment of the present invention may further include a test module 72.
  • the test module 72 can be used to acquire, among the training video frames whose action completion identifier indicates that the action is completed, the proportion of training video frames whose action completion confidence is higher than the preset threshold; if this proportion is higher than a preset ratio, the training process ends.
  • the counting device provided by the embodiment of the present invention analyzes video frames and obtains the probability that an action has been completed from parameters such as the distance between target objects and their confidences, so as to determine whether the action is completed; completed actions can thus be counted without manually setting a threshold, which reduces or avoids miscounting and improves counting accuracy.
  • FIG. 8 is a schematic structural diagram of an embodiment of an electronic device provided by the present invention. As shown in FIG. 8, the electronic device includes a memory 81 and a processor 82.
  • the memory 81 is used to store programs. In addition to the above-mentioned programs, the memory 81 may also be configured to store various other data to support operations on the electronic device. Examples of these data include instructions for any application or method operating on the electronic device, contact data, phone book data, messages, pictures, videos, etc.
  • the memory 81 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
  • the processor 82 is coupled with the memory 81, and executes the program stored in the memory 81 for:
  • the feature vector includes at least: the distance information between the first target object and the second target object, the first confidence of the first target object, and the second confidence of the second target object;
  • the action completion confidence is the probability that the first target object in the video frame completes the action on the second target object;
  • the electronic device may further include: a communication component 83, a power supply component 84, an audio component 85, a display 86 and other components. Only some components are schematically shown in FIG. 8, which does not mean that the electronic device only includes the components shown in FIG. 8.
  • the communication component 83 is configured to facilitate wired or wireless communication between the electronic device and other devices.
  • Electronic devices can access wireless networks based on communication standards, such as WiFi, 2G or 3G, or a combination of them.
  • the communication component 83 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 83 further includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • the power component 84 provides power for various components of the electronic device.
  • the power supply component 84 may include a power management system, one or more power supplies, and other components associated with the generation, management, and distribution of power for electronic devices.
  • the audio component 85 is configured to output and/or input audio signals.
  • the audio component 85 includes a microphone (MIC), and the microphone is configured to receive external audio signals when the electronic device is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode.
  • the received audio signal may be further stored in the memory 81 or transmitted via the communication component 83.
  • the audio component 85 also includes a speaker for outputting audio signals.
  • the display 86 includes a screen, and the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touch, sliding, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure related to the touch or slide operation.
  • a person of ordinary skill in the art can understand that all or part of the steps in the foregoing method embodiments can be implemented by a program instructing relevant hardware.
  • the aforementioned program can be stored in a computer readable storage medium.
  • when the program is executed, the steps of the foregoing method embodiments are performed; the foregoing storage medium includes media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.

Abstract

Disclosed are a method for counting items of clothing, a counting method and apparatus, and an electronic device. The method comprises: segmenting a video into a video frame sequence; carrying out target detection processing on each video frame in the video frame sequence, and generating feature vectors (S201); calculating an action completion confidence level of each video frame according to a first feature vector sequence composed of the feature vectors corresponding to a plurality of video frames (S202), wherein the action completion confidence level is the probability of a first target object in the video frame completing an action on a second target object; and carrying out counting according to video frames of which the action completion confidence levels are higher than a preset threshold (S203). According to the method, by means of analyzing the video frames, an action completion probability is acquired according to parameters, such as the distance of the target object and the confidence level, thus whether the action is completed is determined; and the number of completed actions can be counted without manually setting a threshold, and miscounting can be reduced or avoided, thus improving counting accuracy.

Description

Clothing counting method, counting method and apparatus, and electronic device
This application claims the priority of the Chinese patent application No. 201910111446.4, filed on February 12, 2019 and entitled "Clothing counting method, counting method and apparatus, and electronic device", the entire content of which is incorporated into this application by reference.
Technical field
The present invention relates to the field of computer technology, and in particular to a clothing counting method, a counting method and apparatus, and an electronic device.
Background
In order to promote the digitalization of industrial production, for non-standard small factories with a low degree of standardization and automation, the production process is usually linked in a low-intrusion manner. For example, without changing the workers' production habits, cameras and other equipment are set up to collect various kinds of data during the production process, so as to correlate the individual links of the production process, ensure that orders can be completed on schedule, and improve the production efficiency of the factory.
For certain links in the production process that involve workload counting, for example the bagging scene in a garment factory, the video of the worker's operation is generally collected by a camera, and the video frames are analyzed with a target detection algorithm to identify the target objects (the worker and the operated object), so as to confirm the start and end of the worker's operation and thereby count the worker's workload. In this process, although the completion of each operation of the worker is determined by the target detection algorithm, various thresholds need to be set manually to prevent unreasonable counting caused by misjudgment, for example the counting interval threshold between successive work units, or the distance threshold between target objects.
In the process of implementing the present invention, the inventors found that the prior art has at least the following problem: in the prior art, the thresholds are usually set manually based on experience, so the manually set thresholds can only be roughly reasonable; they cannot be applied universally to the specific scenarios of various production lines, and the accuracy of the counting results cannot be guaranteed.
Summary of the invention
The embodiments of the present invention provide a clothing counting method, a counting method and apparatus, and an electronic device, so as to overcome the defect in the prior art that counting with manually set thresholds cannot guarantee the accuracy of the counting results.
To achieve the foregoing objective, an embodiment of the present invention provides a clothing counting method, including:
processing video frames in a video to obtain distance information between an operator and clothing, a first confidence of the operator, and a second confidence of the clothing;
inputting the distance information between the operator and the clothing, the first confidence of the operator, and the second confidence of the clothing into a clothing counting model, and calculating a packing completion confidence of each video frame, the packing completion confidence being the probability that the operator in the video frame has completed the packing action on the clothing;
counting clothing packing according to the video frames whose packing completion confidence is higher than a preset threshold.
An embodiment of the present invention also provides a counting method, including:
performing target detection processing on video frames in a video to generate a feature vector, the feature vector including at least: distance information between a first target object and a second target object, a first confidence of the first target object, and a second confidence of the second target object;
calculating an action completion confidence of each video frame according to a first feature vector sequence composed of the feature vectors corresponding to multiple video frames, the action completion confidence being the probability that the first target object in the video frame has completed an action on the second target object;
counting according to the video frames whose action completion confidence is higher than a preset threshold.
An embodiment of the present invention also provides a counting apparatus, including:
a target detection module, configured to perform target detection processing on video frames in a video to generate a feature vector, the feature vector including at least: distance information between a first target object and a second target object, a first confidence of the first target object, and a second confidence of the second target object;
a calculation module, configured to calculate an action completion confidence of each video frame according to a first feature vector sequence composed of the feature vectors corresponding to multiple video frames, the action completion confidence being the probability that the first target object in the video frame has completed an action on the second target object;
a counting module, configured to count according to the video frames whose action completion confidence is higher than a preset threshold.
An embodiment of the present invention also provides an electronic device, including:
a memory, configured to store a program;
a processor, configured to run the program stored in the memory, so as to:
perform target detection processing on video frames in a video to generate a feature vector, the feature vector including at least: distance information between a first target object and a second target object, a first confidence of the first target object, and a second confidence of the second target object;
calculate an action completion confidence of each video frame according to a first feature vector sequence composed of the feature vectors corresponding to multiple video frames, the action completion confidence being the probability that the first target object in the video frame has completed an action on the second target object;
count according to the video frames whose action completion confidence is higher than a preset threshold.
The clothing counting method, counting method and apparatus, and electronic device provided by the embodiments of the present invention analyze video frames and obtain the probability that an action has been completed from parameters such as the distance between target objects and their confidences, so as to determine whether the action is completed; completed actions can thus be counted without manually setting a threshold, which reduces or avoids miscounting and improves counting accuracy.
The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented in accordance with the content of the description, and in order to make the above and other objectives, features and advantages of the present invention more apparent and comprehensible, specific embodiments of the present invention are set forth below.
Description of the drawings
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered as limiting the application. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
Figure 1 is a system block diagram of a business system provided by an embodiment of the present invention;
Figure 2 is a flowchart of an embodiment of the counting method provided by the present invention;
Figure 3 is a flowchart of another embodiment of the counting method provided by the present invention;
Figure 4 is a schematic structural diagram of an action completion counting model provided by an embodiment of the present invention;
Figure 5 is a flowchart of yet another embodiment of the counting method provided by the present invention;
Figure 6 is a schematic structural diagram of an embodiment of the counting apparatus provided by the present invention;
Figure 7 is a schematic structural diagram of another embodiment of the counting apparatus provided by the present invention;
Figure 8 is a schematic structural diagram of an embodiment of the electronic device provided by the present invention.
Detailed description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and so that the scope of the present disclosure can be fully conveyed to those skilled in the art.
In the prior art, for certain links in the production process that involve workload counting, for example the bagging scene in a garment factory, the video of the worker's operation is generally collected by a camera, and the video frames are analyzed with a target detection algorithm to identify the target objects (the worker and the operated object), so as to confirm the start and end of the worker's operation and thereby count the worker's workload. In this process, although the completion of each operation of the worker is determined by the target detection algorithm, various thresholds need to be set manually to prevent unreasonable counting caused by misjudgment, for example the counting interval threshold between successive work units, or the distance threshold between target objects. These thresholds are usually set manually based on experience, so they can only be roughly reasonable; they cannot be applied universally to the specific scenarios of various production lines, and the accuracy of the counting results cannot be guaranteed. For example, suppose the average time for a worker to pack one piece of clothing is 15 seconds, with a typical range of 10-20 seconds; the counting interval threshold would then usually be set manually to 10 seconds, and the second packing operation is counted only if the time interval between two counting operations exceeds this threshold. If a more efficient worker needs only 8 seconds to pack a piece of clothing (while the manually set threshold remains 10 seconds), the count for the second packed piece is cancelled, which greatly reduces the accuracy of the algorithm.
Therefore, this application proposes a counting scheme whose main principle is: by analyzing video frames and using parameters such as the distance between target objects and their confidences, the action completion confidence of each video frame, that is, the probability that the target object has completed the action, is obtained; whether the action is completed is determined from this confidence, which in turn decides whether the completed action is counted, without manually setting a threshold, thereby reducing or avoiding miscounting and improving counting accuracy.
The method provided by the embodiments of the present invention can be applied to any business system with data processing capabilities. Fig. 1 is a system block diagram of a business system provided by an embodiment of the present invention; the structure shown in Fig. 1 is only one example of a business system to which the technical solution of the present invention can be applied. As shown in Fig. 1, the business system includes a counting apparatus. The apparatus includes a target detection module, a calculation module and a counting module, and can be used to execute the processing flows shown in Fig. 2, Fig. 3 and Fig. 5 below. In this business system, the video is first divided into a sequence of video frames; target detection processing is then performed on each video frame in the sequence to generate a feature vector, the feature vector including at least: distance information between the first target object (the operator) and the second target object (the operated object), the first confidence of the first target object, and the second confidence of the second target object; a first feature vector sequence is composed of the feature vectors corresponding to multiple video frames, and the action completion confidence of each video frame is calculated from the first feature vector sequence, that is, the probability that the first target object in the video frame has completed the action on the second target object; finally, the video frames whose action completion confidence is higher than a preset threshold are counted. Counting can thus be performed without manually setting a threshold, which reduces or avoids miscounting and improves counting accuracy.
The above embodiments describe the technical principles and an exemplary application framework of the embodiments of the present invention. The specific technical solutions of the embodiments of the present invention are further described in detail below through several embodiments.
Embodiment 1
Fig. 2 is a flowchart of an embodiment of the counting method provided by the present invention. The method can be executed by the above-mentioned business system, by various terminal or server devices with data processing capabilities, or by an apparatus or chip integrated in such devices. As shown in Fig. 2, the counting method includes the following steps:
S201: Perform target detection processing on video frames in the video to generate a feature vector.
In the embodiment of the present invention, after a video recording the action of the first target object on the second target object is obtained, it is divided into a sequence of video frames. Target detection processing is then performed on each video frame to obtain the target objects in the frame and generate the feature vector of the frame, the feature vector including at least: the distance information between the first target object and the second target object, the first confidence of the first target object, and the second confidence of the second target object.
Taking the scene of a worker packing clothes in a factory as an example, the first target object is the worker (the operator) and the second target object is the clothes (the clothing); the first confidence of the first target object is the probability that the worker packing clothes is recognized in the current video frame, and the second confidence of the second target object is the probability that packed clothes are recognized in the current video frame. These data can be obtained by performing target detection on each video frame of the video of the worker packing clothes, and the data of each video frame are then composed into a feature vector for that frame.
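A minimal sketch of how such a per-frame feature vector might be assembled from the outputs of an object detector is given below. The detector interface, the use of bounding-box centers for the distance, and all names are illustrative assumptions; the embodiment does not prescribe a particular detector or distance measure.

```python
import math

def frame_feature_vector(detections):
    """Build the feature vector for one video frame.

    `detections` is assumed to look like:
        {"worker":  {"box": (x1, y1, x2, y2), "confidence": 0.93},
         "garment": {"box": (x1, y1, x2, y2), "confidence": 0.88}}
    i.e. the first target object (the operator) and the second target
    object (the clothing) as returned by some object detector.
    """
    def center(box):
        x1, y1, x2, y2 = box
        return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

    worker, garment = detections["worker"], detections["garment"]
    wx, wy = center(worker["box"])
    gx, gy = center(garment["box"])

    # Distance between the two target objects, here taken as the Euclidean
    # distance between bounding-box centers (one possible choice).
    distance = math.hypot(wx - gx, wy - gy)

    # Feature vector: distance, first confidence, second confidence.
    return [distance, worker["confidence"], garment["confidence"]]
```

Any detector that returns a bounding box and a confidence for the operator and for the clothing could be plugged in here.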
S202: Calculate the action completion confidence of each video frame according to a first feature vector sequence composed of the feature vectors corresponding to multiple video frames.
In the embodiment of the present invention, the action completion confidence of a video frame is the probability that the first target object in the video frame has completed the action on the second target object. Taking the worker packing clothes as an example, the action completion confidence is the probability that the worker has completed the packing action. By processing the feature vectors of the video frames, for example with a multi-layer neural network, the action completion confidence (packing completion confidence) of each video frame can be calculated.
S203: Count according to the video frames whose action completion confidence is higher than a preset threshold.
In the embodiment of the present invention, a probability threshold may be preset, and when the probability that the first target object in a certain video frame has completed the action on the second target object is higher than the preset threshold, the count is increased by one. In other words, if N video frames in a video are calculated to satisfy the above condition, the number of times the first target object has completed the action on the second target object is N.
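A minimal sketch of this counting rule, assuming the per-frame action completion confidences have already been computed; the threshold value shown is purely illustrative.

```python
def count_completed_actions(completion_confidences, threshold=0.5):
    """Count the video frames whose action completion confidence exceeds
    the preset threshold; each such frame adds one to the count."""
    count = 0
    for confidence in completion_confidences:
        if confidence > threshold:
            count += 1
    return count

# Example: three of these frames exceed the threshold, so N = 3.
print(count_completed_actions([0.1, 0.7, 0.2, 0.9, 0.65], threshold=0.5))
```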
The counting method provided by the embodiment of the present invention analyzes video frames and obtains the probability that an action has been completed from parameters such as the distance between target objects and their confidences, so as to determine whether the action is completed; completed actions can thus be counted without manually setting a threshold, which reduces or avoids miscounting and improves counting accuracy.
实施例二Example two
图3为本发明提供的计数方法另一个实施例的流程图。如图3所示,在上述图2所示实施例的基础上,本实施例提供的计数方法还可以包括以下步骤:Fig. 3 is a flowchart of another embodiment of the counting method provided by the present invention. As shown in FIG. 3, based on the embodiment shown in FIG. 2, the counting method provided in this embodiment may further include the following steps:
S301,将视频分割为视频帧序列。S301: Split the video into a sequence of video frames.
在本发明实施例中,可以将预设时间段(例如,一天,或者数个小时等)的整段视频分割为视频帧序列,然后将视频帧序列输入到提前训练好的动作完成计数模型(服装 计数模型)中进行计数。In the embodiment of the present invention, the entire video of a preset time period (for example, one day, or several hours, etc.) can be divided into a sequence of video frames, and then the sequence of video frames is input to the pre-trained action completion count model ( Clothing counting model).
S302,对视频帧序列中的每个视频帧进行目标检测处理,生成特征向量。S302: Perform target detection processing on each video frame in the video frame sequence to generate a feature vector.
图4为本发明实施例提供的动作完成计数模型的结构示意图。如图4所示,在将视频帧序列输入到动作完成计数模型中后,模型首先对视频帧序列进行目标检测处理,从而生成各个视频帧的特征向量,如图4中的特征向量1、特征向量2、……、特征向量n。Fig. 4 is a schematic structural diagram of an action completion counting model provided by an embodiment of the present invention. As shown in Figure 4, after inputting the video frame sequence into the action completion counting model, the model first performs target detection processing on the video frame sequence to generate the feature vector of each video frame, such as feature vector 1, feature in Figure 4 Vector 2,..., feature vector n.
S303: Use a recurrent neural network to process the first feature vector sequence composed of the feature vectors corresponding to the multiple video frames, and generate a second feature vector sequence that contains the context relationship of each video frame.
S304: Calculate the action completion confidence of each video frame according to the second feature vector sequence.
In this embodiment of the present invention, the confidence calculation may take the context of the video frames into account to improve accuracy. Therefore, a recurrent neural network can be used to process the first feature vector sequence to generate a second feature vector sequence that contains the context relationship of each video frame. The second feature vector sequence is then input into a confidence calculation module to calculate the action completion confidence of each video frame. Specifically, the confidence calculation module may be obtained by feeding training data into a multi-layer perceptron during the model training phase.
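A minimal sketch of this two-stage computation, a recurrent network producing context-aware features followed by a per-frame confidence head, might look as follows; the LSTM/MLP choice and all dimensions are assumptions for illustration, since the embodiment only requires a recurrent neural network followed by a confidence calculation module trained as a multi-layer perceptron:

```python
import torch
import torch.nn as nn

class ActionCompletionCounter(nn.Module):
    def __init__(self, feat_dim: int = 3, hidden_dim: int = 64):
        super().__init__()
        # Recurrent network: turns the first feature vector sequence into a
        # second sequence that encodes the context of neighbouring frames.
        self.rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # Confidence calculation module (multi-layer perceptron head).
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1), nn.Sigmoid(),
        )

    def forward(self, first_sequence: torch.Tensor) -> torch.Tensor:
        # first_sequence: (batch, num_frames, feat_dim)
        second_sequence, _ = self.rnn(first_sequence)
        # per-frame action completion confidences: (batch, num_frames)
        return self.head(second_sequence).squeeze(-1)
```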
S305: Count according to the video frames whose action completion confidence is higher than the preset threshold.
The counting method provided by this embodiment of the present invention analyzes video frames and, based on parameters such as the distance between the target objects and their confidences, combined with the context relationship between frames, obtains the probability that the action has been completed, so that whether the action is finished can be determined more accurately. Completed actions can be counted without manually set thresholds, which reduces or avoids miscounting and improves counting accuracy.
Embodiment 3
Fig. 5 is a flowchart of yet another embodiment of the counting method provided by the present invention. As shown in Fig. 5, on the basis of the embodiment shown in Fig. 2 or Fig. 3, the counting method provided by this embodiment of the present invention may further include the following steps:
S501: Obtain training video data.
In this embodiment of the present invention, before the above action completion counting model is used for counting, the model may be trained with training video data. The training video data may include the feature vector of each of a plurality of training video frames and an action completion identifier (packing completion identifier) annotated for each training video frame; the action completion identifier indicates whether the action in the training video frame has been completed. Specifically, the action completion identifier annotated for a training video frame indicates whether the action is completed in that frame. For example, in a clothes packing scene, if the packing is completed, the action completion identifier may be recorded as 1 (the frame contributes one to the count); if the packing is not completed, the action completion identifier is recorded as 0 (the frame cannot be used to increment the count).
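For illustration, training data of this shape could be represented as pairs of a per-frame feature vector and its annotated completion flag; the numeric values below are purely hypothetical:

```python
# (feature vector [distance, worker confidence, clothes confidence], completion flag)
training_data = [
    ([142.3, 0.97, 0.88], 0),  # packing still in progress
    ([ 36.5, 0.99, 0.93], 1),  # packing action completed in this frame
    ([205.1, 0.95, 0.90], 0),
]
```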
S502: Calculate the action completion confidence of each training video frame according to a third feature vector sequence composed of the feature vectors corresponding to the plurality of training video frames.
In this embodiment of the present invention, the process of calculating the action completion confidence of the training video frames from the third feature vector sequence is the same as the process, during model use, of calculating the action completion confidence of each video frame from the first feature vector sequence and from the second feature vector sequence.
S503: For each training video frame, return a training result according to its action completion identifier and its action completion confidence.
Specifically, when the action completion confidence of a training video frame is higher than the preset threshold and the action completion identifier of that frame indicates that the action in the frame is completed, a training-correct result is returned.
In this embodiment of the present invention, for a given training video frame, an action completion confidence calculated by the action completion counting model that is higher than the preset threshold means that, after the model's calculation, the frame would increment the count by one. If its action completion identifier is 1 at this time, the training result is correct.
In addition, when the action completion confidence of a training video frame is not higher than the preset threshold and the action completion identifier of that frame indicates that the action in the frame is not completed, a training-correct result is also returned.
In this embodiment of the present invention, for a given training video frame, an action completion confidence calculated by the action completion counting model that is not higher than the preset threshold means that, after the model's calculation, the frame would not increment the count. If its action completion identifier is 0 at this time, the training result is likewise correct.
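The two cases above amount to checking whether the thresholded model output agrees with the annotated flag. A sketch, with the threshold value again illustrative:

```python
def training_result(confidence: float, label: int, threshold: float = 0.9) -> bool:
    """Return True ("training correct") when the thresholded confidence
    agrees with the annotated action completion identifier."""
    predicted_complete = confidence > threshold
    return predicted_complete == bool(label)
```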
Further, the counting method provided by this embodiment of the present invention may also include:
S504: Among the training video frames whose action completion identifier indicates that the action in the frame is completed, obtain the proportion of training video frames whose action completion confidence is higher than the preset threshold.
S505: When the proportion is higher than a preset ratio, end the training process.
In this embodiment of the present invention, whether the model is ready for use is decided by the accuracy of the output of the action completion counting model. For example, suppose 1000 video frames with an action completion identifier of 1 are input into the model, and for 700 of them the model outputs an action completion confidence higher than the preset threshold, i.e. a proportion of 70%. If this proportion is higher than the preset ratio, the model is considered trained, and the training process can therefore be ended. The trained action completion counting model is then used to count from the input video.
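A sketch of this stopping check follows; the threshold of 0.9 and preset ratio of 0.6 are chosen purely for illustration:

```python
def should_stop_training(confidences, labels, threshold=0.9, preset_ratio=0.6):
    """Among frames annotated as completed (label == 1), compute the share whose
    predicted confidence exceeds the threshold; stop when it beats the preset ratio."""
    positives = [c for c, y in zip(confidences, labels) if y == 1]
    if not positives:
        return False
    proportion = sum(1 for c in positives if c > threshold) / len(positives)
    return proportion > preset_ratio
```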
The counting method provided by this embodiment of the present invention trains the action completion counting model on acquired training video data and returns the training result according to the action completion identifier and action completion confidence of each training video frame, thereby improving counting accuracy.
Embodiment 4
Fig. 6 is a schematic structural diagram of an embodiment of the counting apparatus provided by the present invention, which can be used to perform the method steps shown in Fig. 2. As shown in Fig. 6, the counting apparatus may include: a target detection module 62, a calculation module 63 and a counting module 64.
The target detection module 62 is configured to perform target detection processing on the video frames in the video to generate feature vectors, each feature vector including at least: distance information between the first target object and the second target object, a first confidence of the first target object, and a second confidence of the second target object. The calculation module 63 is configured to calculate the action completion confidence of each video frame according to a first feature vector sequence composed of the feature vectors corresponding to multiple video frames, the action completion confidence being the probability that the first target object in the video frame has completed the action on the second target object. The counting module 64 is configured to count according to the video frames whose action completion confidence is higher than a preset threshold.
In this embodiment of the present invention, after a video recording the action of the first target object on the second target object is obtained, it is split into a sequence of video frames. The target detection module 62 then performs target detection on each video frame in the video, obtains the target objects in the frame, and generates the feature vector of that frame. The calculation module 63 calculates the action completion confidence of each video frame according to the first feature vector sequence composed of the feature vectors generated by the target detection module 62. When the action completion confidence of a video frame is higher than the preset threshold, the count in the counting module 64 is incremented by one.
The counting apparatus provided by this embodiment of the present invention analyzes video frames and, based on parameters such as the distance between the target objects and their confidences, obtains the probability that the action has been completed, thereby determining whether the action is finished. Completed actions can be counted without manually set thresholds, which reduces or avoids miscounting and improves counting accuracy.
Embodiment 5
Fig. 7 is a schematic structural diagram of another embodiment of the counting apparatus provided by the present invention, which can be used to perform the method steps shown in Fig. 3 and Fig. 5. As shown in Fig. 7, on the basis of the embodiment shown in Fig. 6, the calculation module 63 may include: a processing unit 631 and a calculation unit 632.
The processing unit 631 may be configured to process the first feature vector sequence with a recurrent neural network to generate a second feature vector sequence containing the context relationship of each video frame; the calculation unit 632 may be configured to calculate the action completion confidence of each video frame according to the second feature vector sequence.
In this embodiment of the present invention, the calculation module 63 may specifically be configured to process the first feature vector sequence with a multi-layer neural network to calculate the action completion confidence of each video frame. Specifically, within the calculation module 63, the confidence calculation may take the context of the video frames into account to improve accuracy. Therefore, the processing unit 631 may process the first feature vector sequence with a recurrent neural network to generate a second feature vector sequence containing the context relationship of each video frame. The calculation unit 632 then calculates the action completion confidence of each video frame according to the second feature vector sequence generated by the processing unit 631.
Further, in this embodiment of the present invention, the entire video of a preset time period (for example, one day or several hours) may be split into a sequence of video frames, which is then input into a pre-trained action completion counting model for counting. Therefore, the counting apparatus provided by this embodiment of the present invention may further include: a model training module 71. The model training module 71 may be configured to obtain training video data, the training video data including the feature vector of each of a plurality of training video frames and an action completion identifier annotated for each training video frame, the action completion identifier indicating whether the action in the training video frame has been completed; to calculate the action completion confidence of each training video frame according to a third feature vector sequence composed of the feature vectors corresponding to the plurality of training video frames; and, for each training video frame, to return a training result according to its action completion identifier and its action completion confidence.
Specifically, the model training module 71 may include: a first returning unit 711. The first returning unit 711 may be configured to return a training-correct result when the action completion confidence of a training video frame is higher than the preset threshold and the action completion identifier of that frame indicates that the action in the frame is completed.
Further, the model training module 71 may also include: a second returning unit 712. The second returning unit 712 may be configured to return a training-correct result when the action completion confidence of a training video frame is not higher than the preset threshold and the action completion identifier of that frame indicates that the action in the frame is not completed.
In addition, the counting apparatus provided by this embodiment of the present invention may further include: a test module 72. The test module 72 may be configured to obtain, among the training video frames whose action completion identifier indicates that the action in the frame is completed, the proportion of training video frames whose action completion confidence is higher than the preset threshold, and to end the training process when the proportion is higher than the preset ratio.
For the functions of the modules in this embodiment of the present invention, refer to the detailed descriptions in the above method embodiments, which are not repeated here.
The counting apparatus provided by this embodiment of the present invention analyzes video frames and, based on parameters such as the distance between the target objects and their confidences, obtains the probability that the action has been completed, thereby determining whether the action is finished. Completed actions can be counted without manually set thresholds, which reduces or avoids miscounting and improves counting accuracy.
Embodiment 6
The internal functions and structure of the counting apparatus have been described above; the apparatus may be implemented as an electronic device. Fig. 8 is a schematic structural diagram of an embodiment of the electronic device provided by the present invention. As shown in Fig. 8, the electronic device includes a memory 81 and a processor 82.
The memory 81 is used to store programs. In addition to the above programs, the memory 81 may also be configured to store various other data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phone book data, messages, pictures, videos, and so on.
The memory 81 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disk.
The processor 82 is coupled with the memory 81 and executes the program stored in the memory 81 in order to:
perform target detection processing on the video frames in the video to generate feature vectors, each feature vector including at least: distance information between the first target object and the second target object, a first confidence of the first target object, and a second confidence of the second target object;
calculate the action completion confidence of each video frame according to a first feature vector sequence composed of the feature vectors corresponding to multiple video frames, the action completion confidence being the probability that the first target object in the video frame has completed the action on the second target object; and
count according to the video frames whose action completion confidence is higher than a preset threshold.
Further, as shown in Fig. 8, the electronic device may also include: a communication component 83, a power supply component 84, an audio component 85, a display 86 and other components. Only some of the components are schematically shown in Fig. 8, which does not mean that the electronic device includes only the components shown in Fig. 8.
The communication component 83 is configured to facilitate wired or wireless communication between the electronic device and other devices. The electronic device can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 83 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 83 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
The power supply component 84 provides power for the various components of the electronic device. The power supply component 84 may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the electronic device.
The audio component 85 is configured to output and/or input audio signals. For example, the audio component 85 includes a microphone (MIC) configured to receive external audio signals when the electronic device is in an operation mode such as a call mode, a recording mode or a voice recognition mode. The received audio signal may be further stored in the memory 81 or sent via the communication component 83. In some embodiments, the audio component 85 also includes a speaker for outputting audio signals.
The display 86 includes a screen, which may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure related to the touch or swipe operation.
A person of ordinary skill in the art can understand that all or part of the steps of the above method embodiments can be implemented by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium. When the program is executed, the steps of the above method embodiments are performed; the aforementioned storage medium includes various media that can store program code, such as ROM, RAM, a magnetic disk or an optical disk.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (18)

1. A clothing counting method, characterized in that it comprises:
    processing video frames in a video to obtain distance information between an operator and an item of clothing, a first confidence of the operator, and a second confidence of the item of clothing;
    inputting the distance information between the operator and the item of clothing, the first confidence of the operator, and the second confidence of the item of clothing into a clothing counting model, and calculating a packing completion confidence of each video frame, the packing completion confidence being the probability that the operator in the video frame has completed the packing action on the item of clothing; and
    counting packed items of clothing according to the video frames whose packing completion confidence is higher than a preset threshold.
2. The clothing counting method according to claim 1, characterized in that the inputting the distance information between the operator and the item of clothing, the first confidence of the operator, and the second confidence of the item of clothing into the clothing counting model and calculating the packing completion confidence of each video frame comprises:
    using a multi-layer neural network to calculate the packing completion confidence of each video frame.
3. The clothing counting method according to claim 2, characterized in that the using a multi-layer neural network to calculate the packing completion confidence of each video frame comprises:
    using a recurrent neural network to obtain the context relationship of each video frame; and
    calculating the packing completion confidence of each video frame according to the context relationship of each video frame.
4. A counting method, characterized in that it comprises:
    performing target detection processing on video frames in a video to generate feature vectors, each feature vector including at least: distance information between a first target object and a second target object, a first confidence of the first target object, and a second confidence of the second target object;
    calculating an action completion confidence of each video frame according to a first feature vector sequence composed of the feature vectors corresponding to multiple video frames, the action completion confidence being the probability that the first target object in the video frame has completed an action on the second target object; and
    counting according to the video frames whose action completion confidence is higher than a preset threshold.
5. The counting method according to claim 4, characterized in that the calculating an action completion confidence of each video frame according to the first feature vector sequence composed of the feature vectors corresponding to the multiple video frames comprises:
    using a multi-layer neural network to process the first feature vector sequence and calculate the action completion confidence of each video frame.
6. The counting method according to claim 5, characterized in that the using a multi-layer neural network to process the first feature vector sequence and calculate the action completion confidence of each video frame comprises:
    using a recurrent neural network to process the first feature vector sequence to generate a second feature vector sequence containing the context relationship of each video frame; and
    calculating the action completion confidence of each video frame according to the second feature vector sequence.
7. The counting method according to any one of claims 4 to 6, characterized in that, before performing target detection processing on the video frames in the video, the method further comprises:
    obtaining training video data, the training video data including a feature vector of each of a plurality of training video frames and an action completion identifier annotated for each training video frame, the action completion identifier being used to indicate whether the action in the training video frame has been completed;
    calculating an action completion confidence of each training video frame according to a third feature vector sequence composed of the feature vectors corresponding to the plurality of training video frames; and
    for each training video frame, returning a training result according to its action completion identifier and its action completion confidence.
8. The counting method according to claim 7, characterized in that the returning, for each training video frame, a training result according to its action completion identifier and its action completion confidence comprises:
    returning a training-correct result when the action completion confidence of the training video frame is higher than the preset threshold and the action completion identifier of the training video frame indicates that the action in the training video frame has been completed.
9. The counting method according to claim 7, characterized in that the returning, for each training video frame, a training result according to its action completion identifier and its action completion confidence comprises:
    returning a training-correct result when the action completion confidence of the training video frame is not higher than the preset threshold and the action completion identifier of the training video frame indicates that the action in the training video frame has not been completed.
10. The counting method according to claim 8, characterized in that, after the returning a training-correct result when the action completion confidence of the training video frame is higher than the preset threshold and the action completion identifier of the training video frame indicates that the action in the training video frame has been completed, the method further comprises:
    among the training video frames whose action completion identifier indicates that the action in the training video frame has been completed, obtaining the proportion of training video frames whose action completion confidence is higher than the preset threshold; and
    ending the training process when the proportion is higher than a preset ratio.
11. A counting apparatus, characterized in that it comprises:
    a target detection module, configured to perform target detection processing on video frames in a video to generate feature vectors, each feature vector including at least: distance information between a first target object and a second target object, a first confidence of the first target object, and a second confidence of the second target object;
    a calculation module, configured to calculate an action completion confidence of each video frame according to a first feature vector sequence composed of the feature vectors corresponding to multiple video frames, the action completion confidence being the probability that the first target object in the video frame has completed an action on the second target object; and
    a counting module, configured to count according to the video frames whose action completion confidence is higher than a preset threshold.
12. The counting apparatus according to claim 11, characterized in that the calculation module is specifically configured to process the first feature vector sequence with a multi-layer neural network to calculate the action completion confidence of each video frame.
13. The counting apparatus according to claim 12, characterized in that the calculation module comprises:
    a processing unit, configured to process the first feature vector sequence with a recurrent neural network to generate a second feature vector sequence containing the context relationship of each video frame; and
    a calculation unit, configured to calculate the action completion confidence of each video frame according to the second feature vector sequence.
14. The counting apparatus according to any one of claims 11 to 13, characterized in that it further comprises:
    a model training module, configured to obtain training video data, the training video data including a feature vector of each of a plurality of training video frames and an action completion identifier annotated for each training video frame, the action completion identifier being used to indicate whether the action in the training video frame has been completed; to calculate an action completion confidence of each training video frame according to a third feature vector sequence composed of the feature vectors corresponding to the plurality of training video frames; and, for each training video frame, to return a training result according to its action completion identifier and its action completion confidence.
15. The counting apparatus according to claim 14, characterized in that the model training module comprises:
    a first returning unit, configured to return a training-correct result when the action completion confidence of a training video frame is higher than the preset threshold and the action completion identifier of the training video frame indicates that the action in the training video frame has been completed.
16. The counting apparatus according to claim 14, characterized in that the model training module further comprises:
    a second returning unit, configured to return a training-correct result when the action completion confidence of a training video frame is not higher than the preset threshold and the action completion identifier of the training video frame indicates that the action in the training video frame has not been completed.
17. The counting apparatus according to claim 15, characterized in that it further comprises:
    a test module, configured to obtain, among the training video frames whose action completion identifier indicates that the action in the training video frame has been completed, the proportion of training video frames whose action completion confidence is higher than the preset threshold, and to end the training process when the proportion is higher than a preset ratio.
18. An electronic device, characterized in that it comprises:
    a memory, configured to store a program; and
    a processor, configured to run the program stored in the memory in order to:
    perform target detection processing on video frames in a video to generate feature vectors, each feature vector including at least: distance information between a first target object and a second target object, a first confidence of the first target object, and a second confidence of the second target object;
    calculate an action completion confidence of each video frame according to a first feature vector sequence composed of the feature vectors corresponding to multiple video frames, the action completion confidence being the probability that the first target object in the video frame has completed an action on the second target object; and
    count according to the video frames whose action completion confidence is higher than a preset threshold.
PCT/CN2020/074214 2019-02-12 2020-02-03 Method for counting items of clothing, counting method and apparatus, and electronic device WO2020164401A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910111446.4 2019-02-12
CN201910111446.4A CN111553180B (en) 2019-02-12 2019-02-12 Garment counting method, garment counting method and device and electronic equipment

Publications (1)

Publication Number Publication Date
WO2020164401A1 true WO2020164401A1 (en) 2020-08-20

Family

ID=72005429

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/074214 WO2020164401A1 (en) 2019-02-12 2020-02-03 Method for counting items of clothing, counting method and apparatus, and electronic device

Country Status (3)

Country Link
CN (1) CN111553180B (en)
TW (1) TW202030642A (en)
WO (1) WO2020164401A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944382A (en) * 2017-11-20 2018-04-20 北京旷视科技有限公司 Method for tracking target, device and electronic equipment
CN108241844A (en) * 2016-12-27 2018-07-03 北京文安智能技术股份有限公司 A kind of public traffice passenger flow statistical method, device and electronic equipment
CN108470255A (en) * 2018-04-12 2018-08-31 上海小蚁科技有限公司 Workload Account method and device, storage medium, computing device
CN108491759A (en) * 2018-02-10 2018-09-04 合肥迪宏自动化有限公司 A kind of process detection device and its process detection method based on deep learning
CN108986064A (en) * 2017-05-31 2018-12-11 杭州海康威视数字技术股份有限公司 A kind of people flow rate statistical method, equipment and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
HK1081393A2 (en) * 2006-02-06 2006-05-12 Msc Ltd Improving manufacturing using rfid means
WO2017146930A1 (en) * 2016-02-22 2017-08-31 Rapiscan Systems, Inc. Systems and methods for detecting threats and contraband in cargo
CN207529154U (en) * 2017-10-27 2018-06-22 成都华西天然药物有限公司 A kind of clothes of electromechanical integration are regulated the traffic device


Also Published As

Publication number Publication date
CN111553180A (en) 2020-08-18
TW202030642A (en) 2020-08-16
CN111553180B (en) 2023-08-29

Similar Documents

Publication Publication Date Title
US20190370551A1 (en) Object detection and tracking delay reduction in video analytics
CN110659397B (en) Behavior detection method and device, electronic equipment and storage medium
CN105306931A (en) Smart TV anomaly detection method and device
US20180047173A1 (en) Methods and systems of performing content-adaptive object tracking in video analytics
CN111078446A (en) Fault information acquisition method and device, electronic equipment and storage medium
CN109767453A (en) Information processing unit, background image update method and non-transient computer readable storage medium
CN109271929B (en) Detection method and device
CN106559631A (en) Method for processing video frequency and device
CN107508573A (en) Crystal oscillator oscillation frequency correction method and device
CN114760339A (en) Fault prediction method, apparatus, device, medium, and product
CN110717399A (en) Face recognition method and electronic terminal equipment
WO2020039559A1 (en) Information processing device, information processing method, and work evaluation system
WO2019101002A1 (en) Shoe data acquisition method and device, and computer storage medium
WO2020164401A1 (en) Method for counting items of clothing, counting method and apparatus, and electronic device
WO2022057806A1 (en) Background image self-updating method, apparatus and device, and storage medium
CN111047049B (en) Method, device and medium for processing multimedia data based on machine learning model
US20200311401A1 (en) Analyzing apparatus, control method, and program
WO2021057879A1 (en) Data processing system and method, electronic device, and computer readable storage medium
US9218669B1 (en) Image ghost removal
CN115409094A (en) Equipment fault prediction method, device and storage medium
CN112070094B (en) Method and device for screening training data, electronic equipment and storage medium
CN111913942B (en) Data quality detection method and device
US11398091B1 (en) Repairing missing frames in recorded video with machine learning
CN109241729B (en) Application program detection and processing method and device, terminal device and electronic device
US11763561B2 (en) Information processing apparatus, information processing method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20755516

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20755516

Country of ref document: EP

Kind code of ref document: A1