WO2023248676A1 - Estimation method and estimation device - Google Patents
Estimation method and estimation device
- Publication number
- WO2023248676A1 (PCT/JP2023/019081)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- work
- sound
- estimation
- data
- worker
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06311—Scheduling, planning or task assignment for a person or group
- G06Q10/063114—Status monitoring or status determination for a person or group
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/04—Manufacturing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Definitions
- the present disclosure relates to an estimation method and the like for estimating the work of a worker.
- Patent Document 1 discloses a technique for classifying tasks by identifying an object (for example, a transparent object, etc.) handled in the task from images captured under a plurality of imaging conditions.
- the present disclosure provides an estimation method etc. that can accurately estimate the work of handling transparent objects.
- An estimation method is an estimation method for estimating the work of a worker using a computer, wherein the computer acquires data of collected work sounds accompanying the work and, by inputting the work sound data into the first model, estimates whether or not the worker is working with a transparent object.
- FIG. 1 is a block diagram showing an example of the functional configuration of an estimation system according to an embodiment.
- FIG. 2 is a flowchart showing an example 1 of operation of the estimation system according to the embodiment.
- FIG. 3 is a diagram schematically showing an example of the flow in step S02 of FIG.
- FIG. 4 is a diagram showing a graph of the degree of similarity between the feature amount of the collected work sound and the feature amount of the work sound of work that handles transparent objects.
- FIG. 5 is a diagram showing the results of time-series analysis of one hour's worth of work sounds in Verification Example 1.
- FIG. 6 is a diagram for explaining a method for estimating bagging work performed in Verification Example 3.
- FIG. 7 is a diagram illustrating an example of the architecture of a neural network.
- FIG. 8 is a diagram showing a method for calculating the correct answer rate when estimating two categories.
- FIG. 9 is a diagram showing estimation results and correct answer rates for two classifications in Verification Example 3.
- FIG. 10 is a diagram showing a method for calculating the correct answer rate when estimating three categories.
- FIG. 11 is a diagram showing the estimation results and correct answer rates for the three classifications in Verification Example 3.
- FIG. 12 is a diagram showing a method for estimating two categories based on a combination of input data and a method for calculating a correct answer rate.
- FIG. 13 is a diagram showing estimation results and correct answer rates for two classifications based on combinations of input data in Verification Example 3.
- FIG. 14 is a diagram showing a comparison result of the estimation accuracy of the estimation method using image AI and the estimation method of Operation Example 1.
- FIG. 15 is a diagram for explaining the difference between the estimation results using work sound data and the estimation results using image data.
- FIG. 16 is a diagram for explaining an overview of the flow of operation example 2 of the estimation system according to the embodiment.
- FIG. 17 is a flowchart showing a second operation example of the estimation system according to the embodiment.
- FIG. 18 is a flowchart showing a modification 1 of the operation example 2 of the estimation system according to the embodiment.
- FIG. 19 is a diagram schematically illustrating a configuration example 1 of an estimator that executes the flow of the modification 1 of the operation example 2.
- FIG. 20 is a diagram for explaining a method of estimating bagging work performed by Configuration Example 1.
- FIG. 21 is a diagram schematically illustrating a second configuration example of an estimation unit that executes the flow of the first modification of the second operation example.
- FIG. 22 is a diagram schematically illustrating a third configuration example of an estimation unit that executes the flow of the first modification of the second operation example.
- FIG. 23 is a diagram illustrating an example of the architecture of an image subnetwork.
- FIG. 24 is a diagram illustrating an example of the architecture of the sound subnetwork.
- FIG. 25 is a diagram illustrating an example of the architecture of the fusion layer.
- FIG. 26 is a diagram illustrating an example of the architecture of a classification network.
- FIG. 27 is a diagram illustrating an example of the architecture of a contrastive learning network.
- FIG. 28 is a diagram schematically illustrating a configuration example of an estimating unit that executes the flow of Modification 2 of Operation Example 2.
- FIG. 29 is a diagram illustrating an example of work sounds when the estimator incorrectly estimates that a worker is working with a transparent object.
- FIG. 30A is a flowchart of operation example 3 of the estimation system in the embodiment.
- FIG. 30B is a flowchart illustrating an example of an operation for pre-registering feature amounts of work sounds that may be erroneously estimated.
- FIG. 31 is a block diagram showing an example of the functional configuration of an estimation system according to another embodiment.
- the system automatically collects data on the work performed by workers, classifies the work, and measures the time required for each classification. This allows the user to understand which work the worker is spending time on, and thus allows the user to create a work plan so that the worker can work more efficiently.
- the work is classified by photographing the work performed by the worker with a camera and identifying the objects handled by the worker.
- a transparent object is identified from a plurality of images captured under different imaging conditions, and a worker is classified as working with a transparent object.
- However, objects with high transparency (so-called transparent objects) are difficult to identify in video. Therefore, with the technique described in Patent Document 1, it may not be possible to accurately estimate the task of handling a transparent object.
- the inventors of the present application have found that, by collecting work sounds accompanying work (in other words, sounds generated during work), the work of handling transparent objects can be estimated with high accuracy even if a transparent object is moved or deformed by a worker's work.
- the estimation method of Example 1 is an estimation method using a computer to estimate the work of a worker, wherein the computer acquires data of collected work sounds accompanying the work and, by inputting the work sound data into the trained first model, estimates whether or not the worker is working with a transparent object.
- the device that executes the estimation method uses the first model, which takes work sound data as input and outputs whether or not the work is handling a transparent object, so it can estimate the work that handles a transparent object with high accuracy.
- an estimation method of Example 2 is the estimation method of Example 1, in which the computer may further acquire data of an image of the worker performing the work, the image data corresponding to the work sound data, and may estimate whether or not the worker is working with the transparent object based on the estimation result using the first model and the estimation result using the second model.
- the estimation result using the first model is the result estimated from the work sound data by the first model, and the estimation result using the second model is the result estimated from the image data by the second model.
- the device that executes the estimation method estimates whether or not a worker is working with a transparent object based on the estimation result estimated from the work sound data using the first model and the estimation result estimated from the image data using the second model. Therefore, the device that executes the estimation method can estimate the task of handling a transparent object with higher accuracy than when estimating using only the work sound data.
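The combination of the two estimation results described above can be illustrated with a minimal sketch. The text at this point does not state how the two results are combined, so the weighted average, the function name `fuse_estimates`, the weight `w_sound`, and the 0.5 threshold are all assumptions made for illustration only.

```python
def fuse_estimates(p_sound: float, p_image: float,
                   w_sound: float = 0.5, threshold: float = 0.5) -> bool:
    """Combine two model outputs into one decision (illustrative only).

    p_sound: score from the first model (work sound data), assumed in [0, 1].
    p_image: score from the second model (image data), assumed in [0, 1].
    w_sound: hypothetical weight given to the sound-based score.
    Returns True when the fused score suggests transparent-object work.
    """
    if not (0.0 <= p_sound <= 1.0 and 0.0 <= p_image <= 1.0):
        raise ValueError("scores must lie in [0, 1]")
    if not 0.0 <= w_sound <= 1.0:
        raise ValueError("w_sound must lie in [0, 1]")
    # Weighted average of the two scores (an assumed fusion rule).
    fused = w_sound * p_sound + (1.0 - w_sound) * p_image
    return fused >= threshold


if __name__ == "__main__":
    # The sound model is confident while the image model is not.
    print(fuse_estimates(0.9, 0.3))
```

Other fusion rules (for example, a logical OR of the two decisions) would fit the same description; the weighted average is chosen here only because it is simple to state.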
- an estimation method of Example 3 is the estimation method of Example 1, in which the computer may further acquire data of an image of the worker performing the work, the image data corresponding to the work sound data, and may input the work sound data and the image data into the first model to estimate whether or not the worker is working with the transparent object.
- the device that executes the estimation method uses the first model, which receives work sound data and image data corresponding to the work sound as input and outputs whether or not the work involves handling a transparent object. It can therefore estimate the task of handling transparent objects with higher accuracy than when estimating using only the work sound data.
- the estimation method of Example 4 is the estimation method of any one of Examples 1 to 3, wherein the computer calculates the feature amount of the work sound output from the first model.
- the device that executes the estimation method estimates whether the worker is working with a transparent object based on the similarity between the feature amount of the work sound output from the first model and the feature amount of the work sound of work that handles the transparent object, so it can accurately estimate the work of handling transparent objects.
- the estimation method of Example 5 is the estimation method of any one of Examples 1 to 4, in which the computer may further estimate whether or not the worker is working with the transparent object based on (i) the degree of similarity of the feature amount of the work sound output from the first model to the feature amount of the work sound of work handling the transparent object, stored in the storage unit in advance, and (ii) its degree of similarity to a feature amount of a work sound that may be erroneously estimated as the worker working with the transparent object.
- the device that executes the estimation method calculates the degree of similarity between the feature amount of the work sound output from the first model and the feature amount of the work sound of work that handles transparent objects, and compares it with the degree of similarity between the feature amount of the work sound and the feature amount of a work sound that may be erroneously estimated, which reduces the occurrence of erroneous estimation. Therefore, the device that executes the estimation method can accurately estimate the task of handling a transparent object even if only the work sound data is used.
- an estimation method of Example 6 is the estimation method of Example 5, in which the computer may estimate that the worker is working with the transparent object if the degree of similarity of the feature amount of the work sound output from the first model to the feature amount of the work sound of work handling the transparent object exceeds its degree of similarity to the feature amount of the work sound that may be erroneously estimated as the worker working with the transparent object.
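The comparison in Example 6 can be sketched as follows. Cosine similarity is assumed as the similarity measure (the text does not name one), and `cosine_similarity`, `is_transparent_work`, `transparent_refs`, and `confusable_refs` are hypothetical names. Feature amounts are treated as plain embedding vectors.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two feature vectors (assumed measure)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return float(np.dot(a, b) / denom)


def is_transparent_work(feature, transparent_refs, confusable_refs) -> bool:
    """Decide by comparing two similarities, as in Example 6.

    feature:          feature amount output by the first model.
    transparent_refs: stored features of transparent-object work sounds
                      (must contain at least one vector).
    confusable_refs:  stored features of work sounds that may be
                      erroneously estimated (may be empty).
    """
    sim_transparent = max(cosine_similarity(feature, r) for r in transparent_refs)
    # With no confusable references, fall back to the lowest possible score.
    sim_confusable = max(
        (cosine_similarity(feature, r) for r in confusable_refs), default=-1.0
    )
    return sim_transparent > sim_confusable


if __name__ == "__main__":
    transparent = [[1.0, 0.0]]
    confusable = [[0.0, 1.0]]
    print(is_transparent_work([1.0, 0.1], transparent, confusable))
```

Taking the maximum over several stored references is a design choice of this sketch; a single reference vector per category works the same way.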
- the device that executes the estimation method can reduce the occurrence of erroneous estimations, so it can accurately estimate the task of handling a transparent object even if only the work sound data is used.
- the estimation method of Example 7 is the estimation method of Example 5 or Example 6, in which, if the degree of similarity between the feature amount of the work sound of work that handles a non-transparent object different from the transparent object, obtained by inputting the collected work sound data of that work into the first model, and the feature amount of the work sound of work that handles the transparent object exceeds a threshold, the computer may determine that the work sound of the work that handles the non-transparent object is a work sound that can be erroneously estimated as the work sound of work that handles the transparent object, and may store its feature amount in the storage unit as a feature amount of a work sound that can be erroneously estimated.
- the device that executes the estimation method can accurately determine, based on the similarity between the feature amount of the work sound of work that handles the non-transparent object and the feature amount of the work sound of work that handles the transparent object, whether the work sound of the work that handles the non-transparent object is one that can be mistaken for work that handles a transparent object. The device can therefore store in the storage unit feature amounts of work sounds that are relatively likely to be erroneously estimated. By using these stored feature amounts, the device can reduce the occurrence of erroneous estimation and estimate the work of handling transparent objects with high accuracy.
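The pre-registration step of Example 7 can be sketched as below. The cosine similarity measure, the threshold value 0.8, the in-memory list standing in for the storage unit, and the name `register_if_confusable` are assumptions for illustration; the text only says that the similarity is compared with a threshold.

```python
import numpy as np


def _cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity (assumed measure; the text does not specify one)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def register_if_confusable(non_transparent_feature, transparent_refs,
                           confusable_store: list,
                           threshold: float = 0.8) -> bool:
    """Store a non-transparent work-sound feature if it resembles
    transparent-object work closely enough to risk erroneous estimation.

    confusable_store is a plain list standing in for the storage unit.
    Returns True when the feature was registered.
    """
    feature = np.asarray(non_transparent_feature, dtype=float)
    best = max(_cosine(feature, np.asarray(r, dtype=float))
               for r in transparent_refs)
    if best > threshold:  # "exceeds a threshold" in the text
        confusable_store.append(feature)
        return True
    return False


if __name__ == "__main__":
    store: list = []
    refs = [[1.0, 0.0]]
    print(register_if_confusable([0.9, 0.1], refs, store))  # similar sound
    print(register_if_confusable([0.0, 1.0], refs, store))  # dissimilar sound
    print(len(store))
```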
- the estimation method of Example 8 is the estimation method of any one of Examples 1 to 7, wherein the work sound data may include sound data in an inaudible band.
- the device that executes the estimation method estimates whether or not the worker is working with a transparent object using work sound data that includes sounds in the audible band to sounds in the inaudible band.
- Because the work sound data includes sounds in the inaudible band, a device that performs the estimation method can estimate whether or not a worker is working with a transparent object based on more information than when using only sound data in the audible band. Therefore, the device that executes the estimation method can more accurately estimate the task of handling transparent objects.
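As a rough illustration of how much of a recording lies in the inaudible band, the sketch below computes the fraction of spectral energy at or above a cutoff frequency. The 20 kHz cutoff follows the example given in this document; the function name `inaudible_energy_ratio` and the 96 kHz sample rate in the usage example are assumptions. The sample rate must exceed twice the cutoff for any such content to be captured.

```python
import numpy as np


def inaudible_energy_ratio(signal, sample_rate: float,
                           cutoff_hz: float = 20_000.0) -> float:
    """Fraction of spectral energy at or above cutoff_hz.

    signal: 1-D array of audio samples.
    sample_rate: sampling frequency in Hz.
    """
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    power = np.abs(spectrum) ** 2
    total = power.sum()
    if total == 0.0:
        return 0.0  # silent input carries no energy in any band
    return float(power[freqs >= cutoff_hz].sum() / total)


if __name__ == "__main__":
    fs = 96_000.0                       # assumed sample rate
    t = np.arange(9_600) / fs           # 0.1 s of samples
    ultrasonic = np.sin(2 * np.pi * 30_000 * t)
    audible = np.sin(2 * np.pi * 5_000 * t)
    print(inaudible_energy_ratio(ultrasonic, fs))
    print(inaudible_energy_ratio(audible, fs))
```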
- the estimation device of Example 9 is an estimation device that estimates a worker's work, and includes an acquisition unit that acquires data of collected work sounds accompanying the work, and an estimation unit that estimates whether or not the worker is working with a transparent object by inputting the work sound data into the trained first model.
- the estimating device uses the first model that inputs work sound data and outputs whether or not the work is handling a transparent object, so it can accurately estimate the work that involves handling a transparent object.
- Example 10 is a program for causing a computer to execute any of the estimation methods of Examples 1 to 8.
- CD-ROM: compact disc read-only memory
- FIG. 1 is a block diagram showing an example of the functional configuration of an estimation system 200 in an embodiment.
- the estimation system 200 is a system that estimates the work of a worker. For example, the estimation system 200 acquires the work sounds associated with work collected by the sound collection device 10, and estimates whether or not a worker is working with a transparent object by inputting the work sound data into a trained first model 132 (hereinafter also simply referred to as the first model 132).
- the estimation system 200 may display the estimation result estimated by the estimation device 100 on the display unit of the information terminal 50 to present it to the user.
- the user can refer to the estimation results to understand the time required for work that handles transparent objects and for work that handles non-transparent objects. Since this makes it possible to create a work plan, the efficiency of work in the work space 80 can be improved.
- Work sounds associated with work include sounds generated during work.
- Work sounds are, for example, sounds generated when a worker moves or deforms an object handled by a worker.
- the work includes, for example, picking parts, cleaning work, inspection, or packaging.
- the work space 80 refers to a space where workers work in, for example, a manufacturing factory or a distribution warehouse.
- the transparent object is a highly transparent object, and is made of a highly transparent material such as synthetic resin or glass.
- High transparency means, for example, that when the object is a sheet or is composed of a sheet, the haze of the sheet is less than 0.5%, and that when the object is plate-shaped or block-shaped, or is composed of plate-shaped or block-shaped parts, its refractive index is 1.30 or more and 1.70 or less.
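The example criteria for high transparency can be written as a small predicate. The numeric limits come from the text; the function name `is_highly_transparent` and the shape labels `"sheet"`, `"plate"`, and `"block"` are hypothetical, and the text presents these criteria only as an example.

```python
from typing import Optional


def is_highly_transparent(shape: str,
                          haze_percent: Optional[float] = None,
                          refractive_index: Optional[float] = None) -> bool:
    """Apply the example criteria for high transparency.

    shape: "sheet", "plate", or "block" (labels assumed for this sketch).
    haze_percent: haze of the sheet in percent (needed for sheets).
    refractive_index: refractive index (needed for plates and blocks).
    """
    if shape == "sheet":
        if haze_percent is None:
            raise ValueError("haze_percent is required for a sheet")
        return haze_percent < 0.5           # haze below 0.5%
    if shape in ("plate", "block"):
        if refractive_index is None:
            raise ValueError("refractive_index is required for a plate or block")
        return 1.30 <= refractive_index <= 1.70
    raise ValueError(f"unknown shape: {shape!r}")


if __name__ == "__main__":
    print(is_highly_transparent("sheet", haze_percent=0.3))
    print(is_highly_transparent("block", refractive_index=1.49))
```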
- the transparent object is, for example, a container, a bag, a cushioning material, or a component.
- Examples of synthetic resins include vinyl resins such as polyvinyl chloride resins, polycarbonate resins, polyester resins, polyethylene naphthalate resins, polyethylene resins, polypropylene resins, polyimide resins, polystyrene resins, urethane resins, acrylic resins, and fluorine resins. Note that the material constituting the highly transparent object is not limited to the above examples, and may include, for example, a natural polymer such as fine fibrous cellulose.
- the estimation system 200 may acquire data of an image of a worker performing work captured by the imaging device 20 and estimate whether or not the worker is working with a transparent object by inputting the acquired image data and the work sound data into the first model 132. Alternatively, it may estimate whether the worker is working with a transparent object based on the estimation result obtained by inputting the image data into the trained second model 133 (hereinafter also simply referred to as the second model 133) and the estimation result obtained by inputting the work sound data into the first model 132. The image data corresponds to the work sound data.
- the estimation system 200 includes, for example, a sound collection device 10, an imaging device 20, an information terminal 50, and an estimation device 100.
- the sound collecting device 10 and the imaging device 20 are installed in a space where a worker works (work space 80), and are connected to the information terminal 50 and the estimation device 100 via communication.
- the configuration of the estimation system 200 shown in FIG. 1 is just an example, and is not limited to this example.
- the sound collection device 10 collects, for example, work sounds accompanying the work of a worker.
- the sound collection device 10 is installed in a work space 80, for example.
- the sound collection device 10 is capable of collecting sounds from an audible band to an inaudible band.
- the audible band is a frequency band that is perceptible to the human ear
- the inaudible band is a frequency band that is not perceptible to the human ear.
- the sound in the inaudible band is, for example, the sound in the frequency band of 20 kHz or higher.
- the sound collection device 10 is a microphone, and may be, for example, a MEMS (Micro Electro Mechanical Systems) microphone or a laser microphone.
- When the sound collection device 10 is a laser microphone, for example, it can collect sound in a wider band than a normal microphone. Furthermore, since a laser microphone does not have a diaphragm like a normal microphone, it is possible to collect sound even in an environment of electromagnetic waves, high temperature, or high heat.
- Although FIG. 1 shows an example in which the estimation system 200 includes one sound collection device 10, it may include two or more sound collection devices 10. Moreover, the sound collection device 10 may be a directional microphone. This makes the sound collection device 10 less likely to pick up noise such as surrounding sounds, so that it can collect work sounds with high sensitivity.
- the sound collection device 10 converts the collected sound (work sound) into an electrical signal and outputs it to the estimation device 100.
- the sound collection device 10 may attach a time stamp and its own identification number to the collected work sound data, and output the data to the estimation device 100.
- the imaging device 20 captures, for example, an image of a worker performing work.
- the image data corresponds to the work sound data collected by the sound collection device 10.
- the imaging device 20 operates in conjunction with the sound collecting device 10, and the work sound data and the image data may be associated with each other, for example, by attaching a time stamp to the acquired data (work sound data and image data). At this time, for example, the imaging device 20 may attach its own identification number to the image data.
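The time-stamp association described above can be sketched as a nearest-timestamp match. The matching rule, the tolerance of 0.05 seconds, and the name `pair_by_timestamp` are assumptions; the text only states that time stamps are attached so that the two kinds of data can be associated.

```python
from bisect import bisect_left
from typing import List, Tuple


def pair_by_timestamp(sound_ts: List[float], image_ts: List[float],
                      tolerance: float = 0.05) -> List[Tuple[int, int]]:
    """Pair each work-sound record with the closest image record in time.

    sound_ts, image_ts: time stamps in seconds; image_ts must be sorted.
    tolerance: largest time difference (seconds) accepted for a pair.
    Returns (sound_index, image_index) tuples.
    """
    pairs = []
    for i, t in enumerate(sound_ts):
        j = bisect_left(image_ts, t)
        # Only the neighbours on either side of t can be the closest.
        candidates = [k for k in (j - 1, j) if 0 <= k < len(image_ts)]
        if not candidates:
            continue
        best = min(candidates, key=lambda k: abs(image_ts[k] - t))
        if abs(image_ts[best] - t) <= tolerance:
            pairs.append((i, best))
    return pairs


if __name__ == "__main__":
    sound = [0.0, 1.0, 2.0]
    image = [0.01, 0.98, 3.0]
    print(pair_by_timestamp(sound, image))
```

In this usage example the third sound record has no image within the tolerance, so it is left unpaired.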
- the imaging device 20 is installed in a work space 80, for example.
- the imaging device 20 is, for example, an RGB camera, but may also include distance data.
- the imaging device 20 outputs data of the captured image to the estimation device 100.
- the information terminal 50 is an information terminal used by a user, and is, for example, a personal computer or a tablet terminal.
- the information terminal 50 displays the estimation result estimated by the estimation device 100 on the display unit.
- the information terminal 50 also receives instructions input by the user and transmits the instructions to the sound collection device 10, the imaging device 20, and the estimation device 100.
- the estimation device 100 is a device that estimates the work of a worker. For example, the estimation device 100 acquires data on work sounds associated with work collected by the sound collection device 10, and estimates whether the worker is working with a transparent object by inputting the work sound data into the trained first model 132.
- the estimation device 100 includes a communication unit 110, an information processing unit 120, a storage unit 130, a model generation unit 140, and an input reception unit 150.
- Estimation device 100 is, for example, a server device. Note that in the example of FIG. 1, the estimation device 100 includes the second model 133, but does not necessarily need to include the second model 133. Each configuration of the estimation device 100 will be described below.
- the communication unit 110 is a communication circuit (communication module) for the estimation device 100 to communicate with the sound collection device 10 and the imaging device 20.
- the communication unit 110 includes a communication circuit (communication module) for communicating via a wide area communication network, but may also include a communication circuit (communication module) for communicating via a local communication network.
- the communication unit 110 is, for example, a wireless communication circuit that performs wireless communication, but may also be a wired communication circuit that performs wired communication. Note that the communication standard for communication performed by the communication unit 110 is not particularly limited.
- the information processing unit 120 performs various information processing regarding the estimation device 100. More specifically, for example, the information processing unit 120 acquires work sound data (for example, an electrical signal of the work sound) collected by the sound collection device 10 and performs various information processing related to estimating whether or not the worker is working with a transparent object. Further, for example, the information processing unit 120 may acquire data of an image of a worker performing work captured by the imaging device 20 and perform various information processing related to estimating whether or not the worker is working with a transparent object. The information processing unit 120 may estimate the work using work sound data alone, or using work sound data and image data. Specifically, the information processing unit 120 includes an acquisition unit 121 and an estimation unit 122. The functions of the acquisition unit 121 and the estimation unit 122 are realized by a processor or a microcomputer forming the information processing unit 120 executing a computer program stored in the storage unit 130.
- the acquisition unit 121 acquires, for example, data on work sounds collected by the sound collection device 10.
- the work sound data is the sound accompanying the work of the worker, for example, the sound generated in conjunction with the work of the worker. Further, the acquisition unit 121 acquires, for example, data of an image of a worker performing work, which is captured by the imaging device 20 and corresponds to data of work sounds.
- the work sound data may be a spectrogram image created by Fourier transforming the electric signal of the work sound collected by the sound collection device 10, or may be time-series numerical data.
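A spectrogram of the kind mentioned above can be computed with a short-time Fourier transform. The sketch below is a minimal NumPy version; the frame length, hop size, Hann window, and the name `spectrogram` are assumptions, since the text does not specify how the Fourier transform is applied.

```python
import numpy as np


def spectrogram(signal, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Magnitude spectrogram via a short-time Fourier transform.

    Returns an array of shape (frame_len // 2 + 1, n_frames), where rows
    are frequency bins and columns are time frames.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)  # taper each frame to reduce leakage
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Real FFT of each frame, then transpose to (frequency, time).
    return np.abs(np.fft.rfft(frames, axis=1)).T


if __name__ == "__main__":
    fs = 8_000                                   # assumed sample rate
    tone = np.sin(2 * np.pi * 1_000 * np.arange(1_024) / fs)
    spec = spectrogram(tone)
    print(spec.shape)
    # Strongest bin in the first frame, converted back to Hz.
    print(spec[:, 0].argmax() * fs / 256)
```

The resulting 2-D array can be treated as an image and passed to a model, or the raw samples can be used directly as time-series numerical data.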
- the estimation unit 122 estimates whether or not the worker is working with a transparent object from the work sound data. For example, the estimation unit 122 estimates whether or not the worker is working with a transparent object by inputting work sound data to a trained first model 132 (hereinafter referred to as the first model 132). Specifically, for example, the estimation unit 122 estimates whether or not the worker is working with a transparent object based on the similarity between the feature amount of the work sound output from the first model 132 and the feature amount of the work sound of work handling the transparent object, stored in advance in the storage unit 130 (for example, the feature database 131 in the storage unit 130).
- the estimating unit 122 inputs work sound data to the first model 132, calculates the similarity between the feature amount of the work sound extracted by the first model 132 and the feature amount of the work sound of work handling a transparent object stored in advance in the storage unit 130, and, if the calculated similarity is greater than or equal to a predetermined value (that is, a threshold), may estimate that the worker is working with a transparent object.
- the estimation unit 122 is not limited to this example, and may use a model that directly outputs an estimation result of whether or not a worker is working with a transparent object based on work sound data.
- the estimation unit 122 may also estimate whether or not the worker is working with a transparent object from the work sound data and the image data. Specifically, for example, the estimating unit 122 estimates whether the worker is working with a transparent object by inputting, into the first model 132, work sound data and image data of an image of the worker performing the work corresponding to the work sound data. Details of the first model 132 will be described later.
- when the estimation device 100 includes a trained second model 133 and the above image data is acquired by the acquisition unit 121, the estimation unit 122 may estimate whether or not the worker is working with a transparent object by inputting the image data to the second model 133. At this time, the estimation unit 122 inputs into the first model 132 the data of the work sound of the work performed by the worker shown in the image data acquired by the acquisition unit 121, and thereby estimates whether or not the worker is working with a transparent object. Then, based on the estimation result using the first model 132 and the estimation result using the second model 133, the estimating unit 122 estimates whether the worker is working with a transparent object.
- the estimating unit 122 may determine, for example, whether the work sound collected by the sound collection device 10 is a work sound that can be erroneously estimated to be the work sound of work handling a transparent object. Specifically, the estimation unit 122 calculates, for example, the degree of similarity between the feature amount of the work sound of work that handles a non-transparent object, obtained by inputting the data of the work sound of work that handles a non-transparent object different from the transparent object into the first model 132, and the feature amount of the work sound of work handling a transparent object stored in advance in the storage unit 130.
- if the calculated similarity is greater than or equal to a predetermined value (so-called threshold value), the estimation unit 122 determines that the work sound of the work that handles the non-transparent object is a work sound that could be incorrectly estimated as the work sound of work that handles the transparent object. Then, the estimating unit 122 stores the feature amount of the work sound determined to be a work sound that can be incorrectly estimated in the feature amount database 131 (feature amount DB) of the storage unit 130.
- the feature database 131 may store feature amounts of work sounds of work that handles transparent objects that have been stored in advance.
- the feature database 131 will be described later.
- the storage unit 130 is a storage device that stores dedicated application programs and the like for the information processing unit 120 to execute various information processes.
- the storage unit 130 stores a feature database 131, a first model 132, and a second model 133.
- the storage unit 130 is realized by, for example, an HDD (Hard Disk Drive), but may also be realized by a semiconductor memory.
- the feature database 131 stores feature amounts of work sounds extracted in advance. This feature amount may be expressed as a numerical value or a combination of numerical values as an embedding (for example, a tensor, a matrix, etc.), an embedding vector, a distributed representation, or the like.
- the feature database 131 may store feature amounts of work sounds that accompany work that involves handling transparent objects, and feature amounts of work sounds that may be mistakenly estimated to indicate that a worker is working with transparent objects.
- the feature database 131 may store feature amounts of images extracted in advance. For example, the feature database 131 may store feature amounts of an image in which a worker working with a transparent object is shown (specifically, a feature amount indicating a transparent object appearing in the image).
- the first model 132 is, for example, a trained model generated by the model generation unit 140.
- the first model 132 receives, for example, work sound data as input and outputs whether or not the worker is working with a transparent object. More specifically, the first model 132, for example, extracts the feature amount of the input work sound data, calculates the degree of similarity between the extracted feature amount and the feature amount of the work sound of work handling transparent objects stored in advance in the storage unit 130, and, if the calculated degree of similarity is greater than or equal to a predetermined value, estimates that the worker is working with a transparent object.
- the first model 132 may further receive as input data of an image of a worker performing work corresponding to the work sound data, and output whether or not the worker is working with a transparent object. More specifically, the first model 132, for example, extracts the feature amount of the input image data, calculates the degree of similarity between the extracted feature amount and the feature amount of an image showing a worker handling a transparent object stored in advance in the storage unit 130, and, if the calculated similarity is equal to or greater than a predetermined value, may estimate that the worker is working with a transparent object.
- the second model 133 is a trained model generated by the model generation unit 140.
- the second model 133 receives as input, for example, data of an image of a worker performing work corresponding to work sound data, and outputs whether or not the worker is working with a transparent object. More specifically, the second model 133, for example, extracts the feature amount of the input image data, calculates the degree of similarity between the extracted feature amount and the feature amount of an image showing a worker handling a transparent object stored in advance in the storage unit 130, and, if the calculated similarity is equal to or greater than a predetermined value, may estimate that the worker is working with a transparent object.
- first model 132 and the second model 133 may extract the feature amount of the input data and output the extracted feature amount.
- the first model 132 and the second model 133 may be neural network models, such as a convolutional neural network (CNN), a recurrent neural network (RNN), or an LSTM (Long Short-Term Memory).
- the model generation unit 140 generates the first model 132 and the second model 133 by, for example, machine learning using teacher data.
- the model generation unit 140 uses machine learning to generate a sound identification model (hereinafter also referred to as an acoustic subnetwork) that receives work sound data as input and outputs whether or not a worker is working with a transparent object.
- the model generation unit 140 may further use machine learning to generate an image identification model (hereinafter also referred to as a video subnetwork) that receives as input data of an image of a worker performing work corresponding to the work sound data, and outputs whether or not the worker is working with a transparent object.
- the first model 132 may be, for example, a sound identification model or a model including a sound identification model and an image identification model.
- the work sound data input to the first model 132 may be, for example, a spectrogram image or time-series numerical data.
- the work sound data may include sound data in an inaudible band.
- model generation unit 140 may use machine learning to generate an image identification model (for example, the second model 133) that receives image data as input and outputs a feature amount indicating a transparent object appearing in the image.
- the sound identification model extracts the feature amount of the input work sound data, calculates the degree of similarity between the extracted feature amount and the feature amount of the work sound of work handling transparent objects stored in advance in the storage unit 130, and, if the calculated similarity is greater than or equal to a predetermined value, estimates that the worker is working with a transparent object.
- the image identification model extracts the feature amount of the input image data, calculates the degree of similarity between the extracted feature amount and the feature amount of an image showing a worker working with a transparent object stored in advance in the storage unit 130, and, if the calculated degree of similarity is greater than or equal to a predetermined value, estimates that the worker is working with a transparent object. Note that the model including the sound identification model and the image identification model estimates whether or not the worker is working with a transparent object based on the estimation results of these two models.
- the model generation unit 140 may update the first model 132 and the second model 133 by storing the learned model in the storage unit 130.
- the model generation unit 140 is realized, for example, by a processor executing a program stored in the storage unit 130.
- the input accepting unit 150 is an input interface that accepts operation input by a user using the estimation device 100.
- the input reception unit 150 is realized by a touch panel display or the like.
- the touch panel display functions as a display unit (not shown) and the input reception unit 150.
- the input receiving unit 150 is not limited to a touch panel display, and may be, for example, a keyboard, a pointing device (e.g., a touch pen or a mouse), a hardware button, or the like. Further, the input receiving unit 150 may be a microphone when accepting voice input.
- FIG. 2 is a flowchart showing an example 1 of operation of the estimation system 200 according to the embodiment.
- the sound collection device 10 collects work sounds accompanying the work of the worker, and outputs data of the collected work sounds to the estimation device 100.
- the acquisition unit 121 of the estimation device 100 acquires the data of the work sound collected by the sound collection device 10 (S01), and outputs the acquired data of the work sound to the estimation unit 122.
- the estimation unit 122 of the estimation device 100 estimates whether the worker is working with a transparent object by inputting work sound data to the trained first model 132 (S02).
- FIG. 3 is a diagram schematically showing an example of the flow in step S02 of FIG.
- the estimation unit 122 divides the sound data recorded during work (so-called work sound data) acquired from the acquisition unit 121 into data for each predetermined time (for example, 2 seconds), and inputs the divided data to a sound identification model (e.g., the first model 132).
- the work sound data may be subjected to preprocessing such as standardization before being input to the sound identification model.
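The segmentation and preprocessing described above can be sketched as follows. This is only an illustrative sketch: the function names, the decision to drop a trailing partial segment, and the use of zero-mean, unit-variance scaling as the "standardization" are assumptions and are not specified in the description.

```python
import statistics


def split_into_segments(samples, sample_rate, segment_seconds=2.0):
    """Split a list of audio samples into fixed-length segments.

    Assumption: a trailing segment shorter than segment_seconds is dropped.
    """
    seg_len = int(sample_rate * segment_seconds)
    return [
        samples[i:i + seg_len]
        for i in range(0, len(samples) - seg_len + 1, seg_len)
    ]


def standardize(segment):
    """Scale a segment to zero mean and unit variance (assumed preprocessing)."""
    mean = statistics.fmean(segment)
    std = statistics.pstdev(segment)
    if std == 0:
        # A constant segment carries no variation; return all zeros.
        return [0.0 for _ in segment]
    return [(x - mean) / std for x in segment]
```

Each standardized segment would then be passed to the sound identification model one at a time.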
- the sound identification model extracts the feature amount of the work sound of work handling transparent objects from the input work sound data.
- the feature extracted by the sound identification model is referred to as the feature to be evaluated, that is, the evaluation sound feature.
- the estimating unit 122 calculates a degree of similarity indicating how similar the evaluation sound feature output from the sound identification model is to the feature of the work sound of work that involves handling transparent objects (herein referred to as the target sound), which is registered in advance in the storage unit 130 (the registered feature amount), and outputs the calculated degree of similarity.
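A minimal sketch of this similarity calculation follows. It assumes that feature amounts are plain numeric vectors, that cosine similarity is the similarity measure (the description mentions cosine similarity only as one example, in the context of the loss function), and that the best match over several registered features is taken. The function names and the threshold value passed in are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_target_sound(evaluation_feature, registered_features, threshold):
    """Return (best similarity, whether it reaches the threshold).

    Assumption: the highest similarity over all registered features is used.
    """
    score = max(
        cosine_similarity(evaluation_feature, registered)
        for registered in registered_features
    )
    return score, score >= threshold
```

Applying `is_target_sound` to each segment in time order yields a similarity curve and a per-segment estimate.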
- FIG. 4 is a diagram showing a graph of the degree of similarity between the feature amount of the collected work sound and the feature amount of the work sound of work that handles transparent objects.
- FIG. 4 also shows the result of a user visually confirming an image captured by the imaging device 20 and distinguishing between sections in which the worker is working with a transparent object (herein referred to as work sections) and sections in which the worker is not working with a transparent object (herein referred to as non-work sections).
- the broken line in the figure indicates the similarity threshold.
- when the similarity is greater than or equal to the threshold, it is presumed that the worker is working with a transparent object.
- the difference between the working section and the non-working section is expressed in the similarity score.
- the similarity score increases.
- the sound emitted from the transparent object is not picked up, so the similarity score is not calculated.
- FIG. 5 is a diagram showing the results of time-series analysis of one hour's worth of work sounds in Verification Example 1.
- the transparent object is a transparent plastic bag (hereinafter referred to as a transparent bag), and data of work sounds accompanying the work of the worker is input to the first model 132 (for example, the sound identification model in FIG. 3).
- although the work sound data collected in Verification Example 1 is sound data in an audible range, it may also include sound data in an inaudible range.
- "bag work" is a correct label indicating that the worker was performing a task of handling transparent bags.
- the correct label "bag work" was assigned to the state in which the worker is not touching the transparent bag but the transparent bag is on the workbench, and to the state in which the worker is packing products into bags.
- states in which the worker is filling out documents, unpacking, etc. are not work involving transparent bags (in other words, non-bag work).
- the similarity of the image features shown in FIG. 5 is the degree of similarity between the feature amount indicating the transparent bag appearing in the image, extracted using the image identification model, and the feature amount indicating the transparent bag appearing in a pre-registered image.
- the similarity score increased when a sound other than the sound emitted from the transparent bag was generated.
- in Verification Example 1, the accuracy of task identification using the sound identification model was a correct answer rate of 28% and an incorrect answer rate of 5%.
- the first model 132 may be a model that inputs work sound data and directly estimates (in other words, outputs) whether or not the work involves handling a transparent object.
- FIG. 6 is a diagram for explaining a method for estimating bagging work performed in Verification Example 2.
- the neural network shown in FIG. 6 is an example of the first model 132.
- the model generation unit 140 uses, as learning data, an image of a spectrogram of a work sound or image data showing the worker corresponding to the work sound (that is, captured at the same time as the work sound was collected).
- the model generation unit 140 uses, as teacher data, learning data labeled either with two categories (whether or not the worker is doing bagging work, in other words, the presence or absence of bagging work) or with three categories that include the type of bag when there is bagging work (for example, large bag, small bag, etc.).
- the model generation unit 140 determines the parameters of the neural network through learning.
- the estimation unit 122 performs inference with a neural network using the parameters determined during learning. For example, the estimation unit 122 inputs the data whose work is to be classified (work sound data or image data) into the neural network, and outputs estimation results either for the two categories (presence or absence of bag work) or for the three categories that include the type of bag when there is bag work.
- FIG. 7 is a diagram showing an example of the architecture of the neural network shown in FIG. 6.
- the neural network has a convolution layer because the input data is an image, but for example, if the input data is time-series numerical data, it does not need to have a convolution layer. Note that the example in FIG. 7 is just an example and is not limited thereto.
- FIG. 8 is a diagram showing a method for calculating the correct answer rate when estimating two categories.
- the neural network was trained using data labeled with bagging and without bagging as training data.
- the correct answer rate (%) was calculated using the formula shown in FIG. Figure 9 shows the estimation results and correct answer rate.
- FIG. 9 is a diagram showing the estimation results and correct answer rate for two classifications in Verification Example 2 of Operation Example 1.
- (a) of FIG. 9 shows the estimation results and correct answer rate for two classifications when the work sound data input to the neural network is data of sounds in the audible range.
- (b) of FIG. 9 shows the estimation results and correct answer rate for two classifications when the work sound data is wideband sound data including sounds in the inaudible range.
- Bag operation 1 is an operation in which polyethylene bags each having a length and width of approximately 10 cm are handled
- bag operation 2 is an operation in which a polyethylene bag each having a length and width of approximately 30 cm is handled. As shown in FIGS.
- FIG. 10 is a diagram showing a method for calculating the correct answer rate when estimating three categories.
- the neural network was trained using, as training data, data with a label indicating the type of bag when there is bagging work and a label when there is no bagging work.
- the correct answer rate (%) was calculated using the formula shown in FIG.
- the estimation results and correct answer rate are shown in FIG. 11.
- FIG. 11 is a diagram showing the estimation results and correct answer rate for the three classifications in Verification Example 2 of Operation Example 1.
- FIG. 11(a) shows the estimation results and correct answer rate for three classifications when the work sound data input to the neural network is sound data in the audible range
- FIG. 11(b) shows the estimation results and correct answer rate for three classifications when the work sound data is broadband sound data including sounds in the inaudible range.
- as shown in FIGS. 11(a) and 11(b), when broadband work sound data including sounds in the inaudible range was used as input data, the correct answer rate was higher than when work sound data in the audible range was used. Therefore, it was confirmed that the worker's work can be estimated more accurately when the work sound data is broadband sound data than when it is audible band sound data.
- FIG. 12 is a diagram showing a method for estimating two categories based on a combination of input data and a method for calculating a correct answer rate.
- FIG. 12(a) shows a classification method for estimation results
- FIG. 12(b) shows a correspondence relationship between estimation results and labels.
- classification A indicates that, when the input data is (i) image data and when it is (ii) image data + broadband sound data, the work could be estimated as bag work in accordance with the label.
- classification D indicates that, when the input data is at least one of (i) and (ii) above, the work could be estimated as no bagging work in accordance with the label.
- the correct answer rate (%) was calculated using the formula shown in FIG. 12(b). The estimation results and correct answer rate are shown in FIG. 13.
- FIG. 13 is a diagram showing the estimation results and correct answer rate of two classifications based on the combination of input data in Verification Example 2 of Operation Example 1.
- FIG. 13(a) shows the estimation results and correct answer rate for two classifications when the input data input to the neural network is image data
- FIG. 13(b) shows the estimation results and correct answer rate for two classifications when the input data input to the neural network is image data and broadband work sound data.
- as shown in FIGS. 13(a) and 13(b), when broadband work sound data was used as input data in addition to image data, the correct answer rate was higher than when only image data was used. Therefore, it was confirmed that the worker's work can be estimated more accurately when the input data input to the neural network is image data and broadband work sound data, rather than only image data.
- next, Verification Example 3 of Operation Example 1 will be specifically explained.
- in Verification Example 1, work sounds in the audible band were used to estimate the work, but Verification Example 3 differs from Verification Example 1 in that work sound data including sounds in the inaudible band was used.
- in Verification Example 3, the estimation accuracy when the estimation method described in Operation Example 1 (referred to as this method) was implemented using work sound data including sounds in the inaudible band was compared with the estimation accuracy when an estimation method using image AI (in other words, video AI) was implemented.
- FIG. 14 is a diagram showing a comparison result between the estimation accuracy of the estimation method using image AI and the estimation accuracy of this method.
- "1" in the label column indicates that a label (so-called correct label) indicating work with transparent bags (so-called bag work) has been attached, and "0" indicates that the correct label is not attached (that is, non-bag work).
- "1" in the image AI and this method columns indicates that it is estimated that bagging work is being performed, and "0" indicates that it is estimated that bagging work is not being performed.
- FIG. 15 is a diagram for explaining the difference between the estimation results using work sound data and the estimation results using image data.
- FIG. 16 is a diagram for explaining an overview of the flow of operation example 2 of estimation system 200 in the embodiment.
- FIG. 17 is a flowchart showing a second operation example of the estimation system 200 according to the embodiment. In operation example 2, differences from operation example 1 will be mainly explained, and explanations of common steps will be omitted or simplified.
- the user visually checks the worker's work in the image and determines the section where bag work is present (bag work section).
- the number of bag operations can be counted even when the operations are estimated using only image data.
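One simple way to count bag operations from a time series of per-segment estimates is to count the transitions from "no bag work" to "bag work". The sketch below assumes that the estimates are given as a sequence of 0 and 1 values, as in the label columns described above, and that each contiguous run of 1s corresponds to one operation. Both the function name and this counting rule are assumptions, not a method stated in the description.

```python
def count_work_sections(estimates):
    """Count contiguous runs of 1s in a sequence of 0/1 estimates.

    Assumption: one contiguous run of 1s corresponds to one bag operation.
    """
    count = 0
    previous = 0
    for current in estimates:
        if current == 1 and previous == 0:
            # A rising edge marks the start of a new bag work section.
            count += 1
        previous = current
    return count
```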
- the sound similarity score responds (increases) earlier than the image-based estimation.
- the acquisition unit 121 of the estimation device 100 acquires the image data corresponding to the work sound data
- the acquisition unit 121 inputs the image data to the estimation system 200 that receives image data as input.
- the estimation system 200 performs preprocessing such as resizing or standardization on the input image data, inputs it to a neural network (e.g., an image identification model), and, based on the feature amount of the image that is output, calculates the degree of similarity with the feature amount representing a transparent bag appearing in an image.
- the acquisition unit 121 of the estimation device 100 inputs the work sound data to the estimation system 200 which receives the work sound data as input.
- the estimation system 200 performs preprocessing such as standardization on the input work sound data, inputs it into a neural network (e.g., a sound identification model), and, based on the feature amount of the work sound that is output, calculates the degree of similarity with the feature amount of the bag work sound. Then, an estimation result is output by combining the estimation results obtained by these estimation systems 200.
- the sound collection device 10 collects work sounds accompanying the work of the worker, and outputs data of the collected work sounds to the estimation device 100.
- the imaging device 20 also captures an image of the worker performing the work that corresponds to the work sound collected by the sound collection device 10 (that is, the image is captured at the same time as the work sound is collected), and outputs data of the captured image to the estimation device 100. Note that when the worker is working with a transparent object, the image shows the transparent object (here, a transparent bag) together with the worker.
- the acquisition unit 121 of the estimating device 100 acquires data on work sounds accompanying the work of the worker (S01), and outputs the acquired work sound data to the estimation unit 122.
- the estimation unit 122 estimates whether or not the worker is working with a transparent object by inputting work sound data to the first model 132 (S02). Specifically, for example, the estimating unit 122 estimates that the worker is working with a transparent object if the degree of similarity between the feature amount extracted by the first model 132 and the feature amount of the work sound of work handling transparent objects stored in advance in the storage unit 130 is greater than or equal to a predetermined value (so-called threshold value).
- the acquisition unit 121 of the estimation device 100 outputs the data of the acquired image to the estimation unit 122.
- the estimation unit 122 estimates whether or not the worker is working with a transparent object by inputting image data to the second model 133 (S04). Specifically, for example, the estimation unit 122 estimates that the worker is working with a transparent object if the degree of similarity between the feature amount of the image extracted by the second model 133 and the feature amount of an image showing a worker handling a transparent object, stored in advance in the storage unit 130, is greater than or equal to a predetermined value (so-called threshold value).
- the estimation unit 122 estimates whether or not the worker is working with a transparent object based on the estimation result estimated from the work sound data using the first model 132 and the estimation result estimated from the image data using the second model 133 (S05). Specifically, for example, the estimation unit 122 estimates that the worker is working with a transparent object if the degree of similarity between the feature amount of the work sound extracted by the first model 132 and the feature amount of the work sound of work handling transparent objects stored in advance in the storage unit 130 is greater than or equal to a predetermined value (threshold), and the degree of similarity between the feature amount of the image extracted by the second model 133 and the feature amount of an image showing a worker handling a transparent object stored in advance in the storage unit 130 is greater than or equal to a predetermined value (threshold).
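The combination in step S05 can be sketched as a logical AND of two threshold tests. The function and parameter names below are hypothetical, and the use of separate thresholds for sound and image is an assumption; the description only states that each similarity is compared with a predetermined value.

```python
def estimate_transparent_work(sound_similarity, image_similarity,
                              sound_threshold, image_threshold):
    """Return True only if both similarities reach their thresholds.

    sound_similarity: similarity from the first model (work sound).
    image_similarity: similarity from the second model (image).
    """
    return (sound_similarity >= sound_threshold
            and image_similarity >= image_threshold)
```

Requiring both conditions means that a sound resembling bag handling is not counted as bag work unless the image also supports it, and vice versa.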
- in operation example 2, an example was explained of estimating whether or not the worker is working with a transparent object based on the feature amount obtained by inputting work sound data to the first model 132 and the feature amount obtained by inputting image data to the second model 133.
- in Modification 1 of operation example 2, using the first model 132 that directly estimates whether or not the work handles a transparent object, as described in verification example 2 of operation example 1, it is estimated whether or not the worker is working with a transparent object based on the feature amount of the work sound and the feature amount of the image obtained by inputting the work sound data and the image data to the first model 132.
- FIG. 18 is a flowchart showing a first modification of the second operation example of the estimation system 200 in the embodiment.
- the acquisition unit 121 of the estimation device 100 acquires data of the work sound collected by the sound collection device 10 (S01), and outputs the acquired data to the estimation unit 122.
- the acquisition unit 121 of the estimation device 100 acquires data of an image, captured by the imaging device 20, that shows the worker performing the work corresponding to the work sound data (S03), and outputs the acquired data to the estimation unit 122.
- the estimating unit 122 estimates whether or not the worker is working with a transparent object based on the feature amount of the work sound and the feature amount of the image obtained by inputting the work sound data and the image data into the first model 132 (S06).
- FIG. 19 is a diagram schematically showing a configuration example 1 of the estimation unit 122 that executes the flow of the modification 1 of the operation example 2.
- FIG. 20 is a diagram for explaining a method of estimating bagging work performed by Configuration Example 1.
- the estimation unit 122 includes an embedding vector creation unit, a work classification unit, and a bag work identification unit.
- the embedding vector creation unit includes an image subnetwork that receives image data as input and extracts image features, a sound subnetwork that receives sound (here, work sound) data as input and extracts sound features (here, work sound features), and a fusion layer.
- the neural network may include, for example, an image sub-network and a sound sub-network.
- a neural network may be the first model 132.
- the sound sub-network may be the first model 132
- the image sub-network may be the second model 133.
- the model generation unit 140 uses image data and work sound data as learning data, and uses data labeled with the presence or absence of similarity to the learning data as teacher data.
- the model generation unit 140 determines the parameters of the neural network through learning.
- the work sound data is sound data in the audible range or broadband sound data including sounds in the inaudible range.
- the work sound data may be, for example, a spectrogram of 257 × 199 pixels.
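As an illustration of how a spectrogram of 257 by 199 can arise, the sketch below computes the shape produced by a short-time Fourier transform without padding. All parameter values in the usage note (16 kHz sampling, a 2-second segment, a 320-sample window, a 160-sample hop, and a 512-point FFT) are assumptions chosen only because they happen to give this shape; the description does not state the actual analysis settings, and broadband data including the inaudible range would require a higher sampling rate.

```python
def spectrogram_shape(num_samples, window_length, hop_length, fft_size):
    """Return (frequency bins, time frames) of a one-sided STFT magnitude.

    Assumes no padding at the signal edges, so only complete windows count.
    """
    freq_bins = fft_size // 2 + 1
    if num_samples < window_length:
        return freq_bins, 0
    frames = 1 + (num_samples - window_length) // hop_length
    return freq_bins, frames
```

With the assumed settings, `spectrogram_shape(32000, 320, 160, 512)` gives 257 frequency bins and 199 frames.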
- the image data may be, for example, 224 × 224 pixel data. Note that the model generation unit 140 may perform transfer learning on the fusion layer.
- the estimation unit 122 creates an embedding vector using a fusion layer using the parameters determined during learning.
- the estimation unit 122 inputs the embedding vector to the work classification unit, and identifies the bag work based on the probability value output from the softmax layer.
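The role of the softmax layer can be illustrated as follows: the classification network outputs one raw score per category, the softmax converts these scores into probability values that sum to 1, and the category with the highest probability is taken as the estimate. The function names and the category labels in the usage note are hypothetical, and selecting the highest-probability category is an assumed decision rule.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, labels):
    """Return the label with the highest softmax probability and that probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

For example, `classify([2.0, 0.5], ["bag work", "no bag work"])` selects "bag work" because its score is the larger of the two.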
- FIG. 21 is a diagram schematically showing a second configuration example of the estimation unit 122 that executes the flow of the first modification of the second operation example.
- in configuration example 1, the work classification unit includes a classification network and a softmax layer, but in configuration example 2, the work classification unit includes a contrastive learning network.
- Contrastive learning is a type of self-supervised learning that allows you to learn from vast amounts of data as is, using a mechanism that compares data without labeling it. In contrastive learning, features are learned so that similar data are placed close together and different data are placed far away.
- FIG. 22 is a diagram schematically showing a third configuration example of the estimation unit 122 that executes the flow of the first modification of the second operation example.
- the fusion layer is placed before the classification network, but in configuration example 3, the fusion layer is placed after the classification network.
- FIG. 23 is a diagram illustrating an example of the architecture of the image subnetwork.
- FIG. 24 is a diagram illustrating an example of the architecture of the sound subnetwork.
- the sizes of the image data and work sound data that are input data are often different, so the sizes of each layer of the image subnetwork and the sound subnetwork do not have to be the same. However, it is sufficient if the final layer sizes of these subnetworks are the same.
- FIG. 25 is a diagram illustrating an example of the architecture of the fusion layer. As shown in FIG. 25, data output from the image sub-network and data output from the sound sub-network are input to the connection layer, and different outputs are obtained during learning and during inference.
- FIG. 26 is a diagram illustrating an example of the architecture of a classification network.
- the size of the first layer of the classification network is, for example, the same as the size of the data output from the final layer of the image subnetwork when the classification network is placed after the image subnetwork, and the same as the size of the data output from the final layer of the sound subnetwork when the classification network is placed after the sound subnetwork. Further, the size of the first layer of the classification network is the same as the size of the data output from the last layer of the fusion layer, for example, when the classification network is placed after the fusion layer.
- FIG. 27 is a diagram illustrating an example of the architecture of a contrastive learning network.
- the size of the first layer of the contrastive learning network is, for example, the same as the size of the embedding vector output from the embedding vector creation unit.
- The contrastive learning network is used for transfer learning.
- As the loss function, for example, Equation 1 below is used.
- sim(x, y) is a function for calculating similarity, and for example, cosine similarity may be used.
- z_i and z_j are corresponding embedding vectors; for example, the embedding vectors of image data and broadband work sound data may be used, respectively.
- τ is an adjustment parameter.
- [Equation 1]
- The loss function of Equation 1 above is large when the similarity between the two embedding vectors is large, and small when the similarity is small.
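The similarity function sim(x, y) mentioned above may be cosine similarity. The following is a minimal Python sketch of that calculation only, not of Equation 1 itself. The function name `cosine_similarity` and the representation of embedding vectors as plain lists of floats are assumptions made for illustration and do not come from the embodiment.

```python
import math


def cosine_similarity(x, y):
    """Return the cosine similarity of two equal-length vectors.

    Hypothetical helper: embedding vectors are assumed to be plain
    sequences of floats with at least one non-zero element each.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)
```

The result lies between -1 and 1, so a threshold expressed on another scale would need a corresponding conversion.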
- FIG. 28 is a diagram schematically showing a configuration example of the estimation unit 122 that executes the flow of the second modification of the second operation example.
- the flow of Modification 2 of Operation Example 2 will be described with reference to the flow of Modification 1 of Operation Example 2 shown in FIG.
- image data is acquired and used as input data, but in modification 2 of operation example 2, in step S03 of FIG.
- distance data acquired by a distance measuring sensor or the like may be used as input data.
- the estimation unit 122 includes a distance subnetwork instead of the image subnetwork.
- the task classification unit may include a contrastive learning network.
- FIG. 29 is a diagram illustrating an example of work sounds when the estimation unit 122 incorrectly estimates that a worker is working with a transparent object.
- FIG. 30A is a flowchart of operation example 3 of estimation system 200 in the embodiment.
- FIG. 30B is a flowchart illustrating an example of an operation for pre-registering feature amounts of work sounds that may be erroneously estimated.
- Hereinafter, the work sound that accompanies the work of handling transparent bags is called a transparent bag sound.
- The work sound that accompanies the work of handling non-transparent bags is called a non-transparent bag sound.
- The work of handling transparent bags is called bag work.
- The similarity threshold is, for example, 25. If the similarity of the work sound to the transparent bag sound is greater than or equal to the threshold, the estimation unit 122 estimates that the worker is doing the work of handling transparent bags (bag work). In this case, the work of the worker may be estimated correctly based on work sounds that accompany the work of handling transparent bags, such as the sound of opening a plastic bag or taking a bag out of a shelf. In other cases, however, it may be erroneously estimated that the worker is doing bag work even though he or she is not.
- For example, based on non-transparent bag sounds such as the sound of boxes being tied together with rubber bands, the sound of boxes or bags being stored at the bottom of a cart, or the sound of barcode scanning during transportation, it may be erroneously estimated that the worker is doing bag work.
- The estimation unit 122 calculates the similarity between the feature amount of a non-transparent bag sound and the feature amount of the transparent bag sound registered in advance. If the similarity exceeds a threshold value, the non-transparent bag sound is determined to be an incorrect estimation target sound and is stored in the storage unit 130.
- The estimation unit 122 reads from the storage unit 130 the feature amount of a work sound that can be erroneously estimated (hereinafter also referred to as an incorrect estimation target sound) and the feature amount of the transparent bag sound, both of which are registered in advance. It then compares the similarity of the feature amount of the collected work sound to each of these feature amounts to estimate whether or not the worker is doing bag work.
- the acquisition unit 121 of the estimation device 100 acquires data of the work sound collected by the sound collection device 10 and outputs the acquired data to the estimation unit 122.
- the estimation unit 122 inputs the acquired work sound data to the sound identification model (S11), detects sound from the input work sound data, and extracts input feature amounts (S12).
- the estimation unit 122 extracts the feature amount (sound feature amount) of the work sound (hereinafter referred to as input sound) using the sound identification model (S13).
- The estimation unit 122 reads the feature amount of the transparent bag sound and the feature amount of the incorrect estimation target sound from the storage unit 130 (S14).
- The estimation unit 122 calculates the similarity between the transparent bag sound and the input sound, and the similarity between the incorrect estimation target sound and the input sound (S15).
- The estimation unit 122 determines whether the similarity between the transparent bag sound and the input sound exceeds the similarity between the incorrect estimation target sound and the input sound (S16). If it does (Yes in S16), the estimation unit 122 determines whether the similarity between the transparent bag sound and the input sound exceeds a threshold value (S17). If the estimation unit 122 determines that the similarity between the transparent bag sound and the input sound exceeds the threshold (Yes in S17), it determines that the input sound is a transparent bag sound (S18). The estimation unit 122 thereby estimates that the worker is working with transparent bags based on the feature amount of the input sound (work sound).
- On the other hand, if the estimation unit 122 determines in step S16 that the similarity between the transparent bag sound and the input sound does not exceed the similarity between the incorrect estimation target sound and the input sound (No in S16), it determines that the input sound is not a transparent bag sound (S19). Likewise, if the estimation unit 122 determines in step S17 that the similarity between the transparent bag sound and the input sound does not exceed the threshold (No in S17), it determines that the input sound is not a transparent bag sound (S19). The estimation unit 122 thereby estimates that the worker is doing work that does not involve handling transparent bags, based on the feature amount of the input sound (work sound).
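The decision in steps S16 to S19 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the estimation unit 122: the function name, the argument names, and the use of precomputed similarity scores are hypothetical, and the default threshold of 25 is simply the example value mentioned earlier in the description.

```python
def is_transparent_bag_sound(sim_to_transparent, sim_to_confusable, threshold=25.0):
    """Return True if the input sound is judged to be a transparent bag sound.

    sim_to_transparent: similarity between the input sound and the
        registered transparent bag sound.
    sim_to_confusable: similarity between the input sound and the
        registered incorrect estimation target sound.
    Both scores are assumed to be on the same scale as `threshold`.
    """
    # S16: the transparent bag sound must be the closer of the two references.
    if sim_to_transparent <= sim_to_confusable:
        return False  # S19: not a transparent bag sound
    # S17: the similarity must also exceed the threshold ("exceeds" is strict).
    if sim_to_transparent <= threshold:
        return False  # S19: not a transparent bag sound
    return True  # S18: transparent bag sound


# Example: close to the transparent bag sound and above the threshold.
print(is_transparent_bag_sound(40.0, 30.0))
```

Checking S16 before S17 means that a sound resembling a registered incorrect estimation target sound is rejected even when its similarity to the transparent bag sound is above the threshold.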
- the acquisition unit 121 of the estimation device 100 acquires data of the work sound acquired by the sound collection device 10 and outputs the acquired data to the estimation unit 122.
- the work sound data acquired by the acquisition unit 121 is work sound associated with work that does not involve handling transparent bags.
- the estimation unit 122 inputs the acquired work sound data to the sound identification model (S21), detects sound from the input work sound data, and extracts input feature amounts (S22).
- the estimation unit 122 extracts the feature amount (sound feature amount) of the work sound (hereinafter referred to as input sound) using the sound identification model (S23). Next, the estimation unit 122 reads the feature amount of the transparent bag sound from the storage unit 130 (S24).
- The estimation unit 122 calculates the similarity between the transparent bag sound and the input sound (S25).
- The estimation unit 122 determines whether the similarity between the transparent bag sound and the input sound exceeds a threshold (S26). If it determines that the similarity exceeds the threshold (Yes in S26), the estimation unit 122 determines that the input sound is an incorrect estimation target sound (S27). The estimation unit 122 then stores the feature amount of the collected sound (work sound) in the storage unit 130 as the feature amount of an incorrect estimation target sound (S29). On the other hand, if the estimation unit 122 determines that the similarity between the transparent bag sound and the input sound does not exceed the threshold (No in S26), it determines that the input sound is not an incorrect estimation target sound (S28).
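The pre-registration flow in steps S26 to S29 can be sketched in the same way. The function name, the argument names, and the use of a plain Python list in place of the storage unit 130 are assumptions made for illustration; the input is assumed to be a work sound already known not to involve transparent bags.

```python
def register_if_confusable(input_features, sim_to_transparent, registry, threshold=25.0):
    """Register a non-transparent bag sound that resembles a transparent bag sound.

    input_features: feature amount of the input sound (sequence of floats).
    sim_to_transparent: similarity between the input sound and the
        registered transparent bag sound, on the same scale as `threshold`.
    registry: list standing in for the storage unit (hypothetical).
    Returns True if the sound was registered as an incorrect estimation
    target sound.
    """
    if sim_to_transparent > threshold:         # S26: Yes -> S27
        registry.append(list(input_features))  # S29: store a copy of the features
        return True
    return False                               # S26: No -> S28


registry = []
register_if_confusable([0.1, 0.7, 0.2], 40.0, registry)  # registered
register_if_confusable([0.9, 0.0, 0.1], 10.0, registry)  # not registered
print(len(registry))
```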
- The estimation method according to the present embodiment is an estimation method in which a computer (for example, the estimation device 100) estimates the work of a worker. The computer acquires work sound data (S01 in FIG. 2) and, by inputting the work sound data into the trained first model 132, estimates whether the worker is working with a transparent object (S02 in FIG. 2).
- The device that executes the estimation method uses the first model 132, which receives work sound data as input and outputs whether or not the work involves handling a transparent object. Therefore, the work of handling a transparent object can be estimated with high accuracy.
- The computer acquires data of an image showing the worker performing the work, the image corresponding to the work sound data (S03 in FIG. 17), estimates whether the worker is working with a transparent object by inputting the image data into the trained second model 133 (S04 in FIG. 17), and estimates whether the worker is working with a transparent object based on the estimation result using the first model 132 and the estimation result using the second model 133 (S05 in FIG. 17).
- The estimation result using the first model 132 is the result estimated from the work sound data by the first model 132.
- The estimation result using the second model 133 is the result estimated from the image data by the second model 133.
- The device that executes the estimation method (for example, the estimation device 100) thus estimates whether the worker is working with a transparent object based on both the result estimated from the work sound data by the first model 132 and the result estimated from the image data by the second model 133. Therefore, it can estimate the work of handling a transparent object with higher accuracy than when estimating using only the work sound data.
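The description does not state how the two estimation results are combined in step S05. One possible approach, shown here purely as an assumption, is a weighted average of the two models' outputs. In this Python sketch the function name, the weight, the decision threshold, and the premise that each model outputs a probability between 0 and 1 are all hypothetical.

```python
def fuse_estimates(p_sound, p_image, sound_weight=0.5, decision_threshold=0.5):
    """Combine a sound-based and an image-based estimate (late fusion).

    p_sound: assumed probability from the first model (work sound data).
    p_image: assumed probability from the second model (image data).
    Returns True if the fused score indicates work with a transparent object.
    """
    for name, value in (("p_sound", p_sound), ("p_image", p_image),
                        ("sound_weight", sound_weight)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # Weighted average of the two scores; the weights sum to 1.
    fused = sound_weight * p_sound + (1.0 - sound_weight) * p_image
    return fused >= decision_threshold


# Example: a confident sound estimate outweighs a weak image estimate.
print(fuse_estimates(0.9, 0.3))
```

Other combination rules, such as requiring both models to agree, would fit the same description equally well.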
- The computer acquires data of an image showing the worker performing the work, the image corresponding to the work sound data (S03 in FIG. 18), and estimates whether the worker is working with a transparent object by inputting the work sound data and the image data into the first model 132 (S06).
- The device that executes the estimation method uses the first model 132, which receives the work sound data and the image data corresponding to the work sound as input and outputs whether or not the work involves handling a transparent object. It can therefore estimate the work of handling a transparent object with higher accuracy than when estimation is made using only work sound data.
- The device that executes the estimation method (for example, the estimation device 100) estimates whether the worker is working with a transparent object based on the degree of similarity between the feature amount of the work sound output from the first model 132 and the feature amount of the work sound of work that handles transparent objects. It can therefore estimate the work of handling a transparent object with high accuracy.
- The computer estimates whether the worker is working with a transparent object (S16 to S19 in FIG. 30A) based on two degrees of similarity for the feature amount of the work sound output from the first model 132. The first degree of similarity is the similarity to the feature amount of the work sound of work that handles a transparent object, stored in advance in the storage unit 130 (for example, the feature amount database 131). The second degree of similarity is the similarity to the feature amount of a work sound that may be erroneously estimated as the worker handling a transparent object (for example, the incorrect estimation target sound in FIG. 30A), also stored in advance in the storage unit 130 (for example, the feature amount database 131).
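- The device that executes the estimation method estimates whether the worker is working with a transparent object based on both the similarity (first similarity) between the feature amount of the work sound output from the first model 132 and the feature amount of the work sound of work that handles transparent objects, and the similarity (second similarity) to the feature amount of a work sound that may be erroneously estimated.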
- The computer estimates that the worker is working with a transparent object when the degree of similarity between the feature amount of the work sound output from the first model 132 and the feature amount of the work sound of work handling a transparent object (the first degree of similarity above) exceeds the degree of similarity to the feature amount of a work sound that can be erroneously estimated as the worker working with a transparent object (the incorrect estimation target sound in FIG. 30A; the second degree of similarity above) (Yes in S16 of FIG. 30A).
- The device that executes the estimation method (e.g., the estimation device 100) can thereby reduce the occurrence of erroneous estimations, so it can accurately estimate the work of handling transparent objects even when only the work sound data is used.
- The computer (e.g., the estimation device 100) inputs into the first model 132 work sound data of work that handles a non-transparent object, which is different from a transparent object. If the degree of similarity (in other words, the third degree of similarity) between the feature amount of the work sound of the work that handles the non-transparent object and the feature amount of the work sound of work that handles a transparent object exceeds a threshold (Yes in S26 of FIG. 30B), the computer determines that the work sound of the work that handles the non-transparent object is a work sound that can be erroneously estimated as the work sound of work that handles a transparent object (a so-called incorrect estimation target sound) (S27 in FIG. 30B), and stores the feature amount of that work sound in the storage unit 130 (for example, the feature amount database 131) as the feature amount of a work sound that may be erroneously estimated (S29 in FIG. 30B).
- The device that executes the estimation method can thereby accurately determine, based on the third degree of similarity, whether the work sound of work that handles a non-transparent object is a work sound that can be erroneously estimated as work that handles a transparent object. The device can therefore store in the storage unit 130 the feature amounts of work sounds that are relatively likely to be erroneously estimated. By using these stored feature amounts, the device can reduce the occurrence of erroneous estimation, and can thus estimate the work of handling transparent objects with high accuracy even when only the work sound data is used.
- the work sound data may include sound data in an inaudible band.
- The device that executes the estimation method uses work sound data that covers sounds from the audible band to the inaudible band to estimate whether the worker is working with a transparent object.
- When the work sound data includes sounds in the inaudible band, the work sound data contains less environmental noise that can cause erroneous estimation, so the estimation accuracy for work that involves handling transparent objects can be improved.
- A device that performs the estimation method can estimate whether or not a worker is working with a transparent object based on more information than when using only sound data in the audible band. Therefore, the device that executes the estimation method can more accurately estimate the work of handling transparent objects.
- The estimation device 100 is an estimation device that estimates the work of a worker, and includes an acquisition unit 121 that acquires data of a collected work sound accompanying the work, and an estimation unit 122 that estimates whether or not the worker is working with a transparent object by inputting the work sound data into the trained first model 132.
- The estimation device 100 uses the first model 132, which receives work sound data as input and outputs whether or not the work involves handling a transparent object, so it can estimate the work of handling a transparent object with high accuracy.
- the program according to this embodiment is a program for causing a computer to execute the above estimation method.
- FIG. 31 is a block diagram illustrating an example of the functional configuration of an estimation system according to another embodiment.
- Although the estimation system 200 according to the embodiment has been described using an example in which the estimation device 100 is a server device, the estimation device 100 does not need to be a server device.
- the estimation device 100a may be a stationary computer device such as a personal computer.
- the estimating device 100a differs from the estimating device 100 in that it includes a display section 160. Only the different points will be explained below.
- the display unit 160 displays the estimation results, for example.
- the display unit 160 is, for example, a display device that displays image information including characters, and is, for example, a display that includes a liquid crystal (LC) panel or an organic EL (Electro Luminescence) panel as a display device.
- The estimation device 100a may include, for example, a sound collection unit and an imaging unit, and one or more estimation devices 100a may be installed in the work space 80. The estimation device 100a equipped with a sound collection unit and an imaging unit may be connected to the sound collection device 10 and the imaging device 20 by wired or wireless communication, or may be a single device that includes the sound collection device 10 and the imaging device 20.
- The estimation device 100a may be communicatively connected to, for example, a server device or a user's information terminal. In this case, the estimation device 100a may store the estimation results in the storage unit 130 for a predetermined period (for example, one day, several days, or one week) and then output them to the server device or information terminal, or it may output the estimation result each time an estimation is performed.
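The two output modes described above, storing results for a predetermined period before output versus output on every estimation, can be sketched as follows. The class name `ResultReporter`, the `send` callback, and the use of caller-supplied timestamps in seconds are assumptions made for illustration and do not describe the actual interface of the estimation device 100a.

```python
class ResultReporter:
    """Forwards estimation results immediately or in periodic batches."""

    def __init__(self, send, period_seconds=None):
        self._send = send              # callable that receives a list of results
        self._period = period_seconds  # None means "output every time"
        self._buffer = []
        self._last_flush = None

    def report(self, result, now):
        """Handle one estimation result observed at time `now` (seconds)."""
        if self._period is None:
            self._send([result])       # output each time an estimation is made
            return
        if self._last_flush is None:
            self._last_flush = now     # start of the first storage period
        self._buffer.append(result)
        if now - self._last_flush >= self._period:
            self._send(list(self._buffer))  # output the stored results together
            self._buffer.clear()
            self._last_flush = now


sent = []
reporter = ResultReporter(sent.append, period_seconds=86400)  # one day
reporter.report(True, now=0)
reporter.report(False, now=86400)
print(sent)
```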
- the server device may be a cloud server.
- the information terminal may be a stationary computer device such as a personal computer, or a portable computer device such as a tablet terminal.
- each of the estimation systems 200 and 200a is realized by a plurality of devices, but may be realized as a single device. Further, when the system is realized by a plurality of devices, the plurality of components included in each of the estimation systems 200 and 200a may be distributed to the plurality of devices in any manner. Further, for example, a server device capable of communicating with the estimation system 200 or 200a may include a plurality of components included in the information processing unit 120.
- the communication method between devices in the above embodiment is not particularly limited. Further, in communication between devices, a relay device (not shown) may intervene.
- the processing executed by a specific processing unit may be executed by another processing unit. Further, the order of the plurality of processes may be changed, or the plurality of processes may be executed in parallel.
- each component may be realized by executing a software program suitable for each component.
- Each component may be realized by a program execution unit such as a CPU or a processor reading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory.
- each component may be realized by hardware.
- each component may be a circuit (or integrated circuit). These circuits may constitute one circuit as a whole, or may be separate circuits. Further, each of these circuits may be a general-purpose circuit or a dedicated circuit.
- the general or specific aspects of the present disclosure may be implemented in a system, apparatus, method, integrated circuit, computer program, or computer-readable recording medium such as a CD-ROM. Further, the present invention may be realized by any combination of a system, an apparatus, a method, an integrated circuit, a computer program, and a recording medium.
- the present disclosure may be realized as an estimation method executed by a computer such as the estimation device 100, or may be realized as a program for causing a computer to execute such an estimation method. Further, the present disclosure may be realized as a program for causing a general-purpose computer to operate as the estimation device 100 of the above embodiment. The present disclosure may be realized as a computer-readable non-transitory recording medium on which these programs are recorded.
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Theoretical Computer Science (AREA)
- Economics (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Entrepreneurship & Innovation (AREA)
- Marketing (AREA)
- General Health & Medical Sciences (AREA)
- Tourism & Hospitality (AREA)
- Health & Medical Sciences (AREA)
- General Business, Economics & Management (AREA)
- Quality & Reliability (AREA)
- Operations Research (AREA)
- Development Economics (AREA)
- Multimedia (AREA)
- Educational Administration (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Game Theory and Decision Science (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Human Computer Interaction (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Manufacturing & Machinery (AREA)
- Primary Health Care (AREA)
- Image Analysis (AREA)
- General Factory Administration (AREA)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2024528432A JPWO2023248676A1 (ja) | 2022-06-22 | 2023-05-23 | |
| CN202380046481.7A CN119404204A (zh) | 2022-06-22 | 2023-05-23 | Estimation method and estimation device |
| US18/980,330 US20250111305A1 (en) | 2022-06-22 | 2024-12-13 | Estimation method and estimation device |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2022-100193 | 2022-06-22 | | |
| JP2022100193 | 2022-06-22 | | |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/980,330 Continuation US20250111305A1 (en) | 2022-06-22 | 2024-12-13 | Estimation method and estimation device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023248676A1 (ja) | 2023-12-28 |
Family
ID=89379849
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2023/019081 Ceased WO2023248676A1 (ja) | 2022-06-22 | 2023-05-23 | Estimation method and estimation device |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20250111305A1 (en) |
| JP (1) | JPWO2023248676A1 (ja) |
| CN (1) | CN119404204A (zh) |
| WO (1) | WO2023248676A1 (ja) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
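| JP2010186651A (ja) * | 2009-02-12 | 2010-08-26 | Toyota Motor Corp | Connector fitting determination device and connector fitting determination method |
| JP2016051052A (ja) * | 2014-08-29 | 2016-04-11 | 本田技研工業株式会社 | Environment understanding device and environment understanding method |
| JP2019028512A (ja) * | 2017-07-25 | 2019-02-21 | パナソニックIpマネジメント株式会社 | Information processing method and information processing device |
| JP2021076913A (ja) * | 2019-11-05 | 2021-05-20 | 株式会社日立製作所 | Computer and model learning method |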
2023
- 2023-05-23: JP application JP2024528432A (publication JPWO2023248676A1, ja), status: Pending
- 2023-05-23: CN application CN202380046481.7A (publication CN119404204A, zh), status: Pending
- 2023-05-23: WO application PCT/JP2023/019081 (publication WO2023248676A1, ja), status: Ceased
2024
- 2024-12-13: US application US18/980,330 (publication US20250111305A1, en), status: Pending
Also Published As
| Publication number | Publication date |
|---|---|
| CN119404204A (zh) | 2025-02-07 |
| JPWO2023248676A1 (ja) | 2023-12-28 |
| US20250111305A1 (en) | 2025-04-03 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
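| JP6709862B6 (ja) | | Accounting method and equipment using convolutional neural network image recognition technology |
| US20190371134A1 (en) | | Self-checkout system, method thereof and device therefor |
| CN101965576B (zh) | | Object matching for tracking, indexing, and search |
| WO2020046960A1 (en) | | System and method for optimizing damage detection results |
| CN111149129A (zh) | | Abnormality detection device and abnormality detection method |
| US12062013B1 (en) | | Automated planogram generation and usage |
| US20110103652A1 (en) | | Image processing apparatus and image processing method |
| KR102597692B1 (ko) | | Apparatus, method, and computer program for measuring the volume of an object using images |
| US12586171B2 (en) | | Methods and systems for grading devices |
| CN104977038A (zh) | | Identifying movement using a motion sensing device coupled with an associative memory |
| CN114067401A (zh) | | Method and device for training a target detection model and for identity verification |
| CN118153914B (zh) | | Image-analysis-based method and system for detection and early warning of vector organisms in containers |
| CN112183460A (zh) | | Method and device for intelligently identifying environmental sanitation |
| CN111461104B (zh) | | Visual recognition method, apparatus, device, and storage medium |
| WO2023248676A1 (ja) | | Estimation method and estimation device |
| KR102941345B1 (ko) | | Information processing program, information processing method, and information processing device |
| CN120107598A (zh) | | Dinner plate recognition method, apparatus, device, storage medium, and computer program product |
| JP2011150425A (ja) | | Research device and research method |
| CN119919679A (zh) | | Time-series-based transparent object detection method, system, device, and storage medium |
| TW202004619A (zh) | | Self-checkout system, method, and device |
| CN121545106B (zh) | | Interaction relationship recognition method and device |
| Mourey et al. | | Human body fall recognition system |
| US12620230B2 (en) | | Non-transitory computer-readable recording medium, information processing method, and information processing apparatus for detecting fraud at accounting machine |
| Lin | | Multi-sensor fusion for inventory monitoring |
| US20240193993A1 (en) | | Non-transitory computer-readable recording medium, information processing method, and information processing apparatus |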
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23826864; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2024528432; Country of ref document: JP |
| | WWE | Wipo information: entry into national phase | Ref document number: 202380046481.7; Country of ref document: CN |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWP | Wipo information: published in national office | Ref document number: 202380046481.7; Country of ref document: CN |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 23826864; Country of ref document: EP; Kind code of ref document: A1 |