WO2020181553A1 - Method and apparatus for identifying production equipment in an abnormal state in a factory - Google Patents

Method and apparatus for identifying production equipment in an abnormal state in a factory

Info

Publication number
WO2020181553A1
WO2020181553A1 (PCT/CN2019/078152)
Authority
WO
WIPO (PCT)
Prior art keywords
production equipment
abnormal state
factory
machine learning
sound
Prior art date
Application number
PCT/CN2019/078152
Other languages
English (en)
French (fr)
Inventor
莫拉⋅卡洛斯
韩克
哈尔坦托⋅维克多
张子涵
任文科
王文科
Original Assignee
西门子股份公司
西门子(中国)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 西门子股份公司, 西门子(中国)有限公司
Priority to PCT/CN2019/078152 priority Critical patent/WO2020181553A1/zh
Publication of WO2020181553A1 publication Critical patent/WO2020181553A1/zh


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R29/00Monitoring arrangements; Testing arrangements

Definitions

  • the present disclosure relates to the field of industrial control, and more specifically, to methods, devices, computing devices, computer-readable storage media, and program products for identifying production equipment in an abnormal state in a factory.
  • In a factory environment, production machinery and equipment (hereinafter collectively referred to as "production equipment") may cause harm to humans and/or damage to other objects.
  • One cause of injury and/or damage is that the production equipment or its moving parts make undesired contact with an operator or another object, such as a robotic arm of the production equipment colliding with the operator.
  • Another possible cause is items dropped or ejected from the production equipment that may injure the operator or damage other objects, such as a moving part that breaks and falls from the equipment, or a chemical that spills from a container on the equipment.
  • A stop switch can take various forms, such as a button, a lever, or a plug; when the stop switch is activated, the production equipment can be stopped.
  • A protective cage can be placed around the production equipment to keep operators or other objects out of the area that the moving parts of the equipment (for example, a robotic arm) may reach. Sensors can be installed on the cage to detect whether the cage door is opened; when an open door is detected, the operation of the equipment is stopped.
  • The first embodiment of the present disclosure proposes a method for identifying production equipment in an abnormal state in a factory, which includes: obtaining environmental information around at least one production equipment in the factory; obtaining a classification result of the environmental information using a machine learning model, the machine learning model being configured to output a corresponding classification result based on features extracted from the environmental information; determining, based on the classification result, whether there is production equipment in an abnormal state in the factory; and when the classification result indicates that there is production equipment in an abnormal state in the factory, determining candidate production equipment in the abnormal state based on a predetermined rule, so as to control at least one of the candidate production equipment in the abnormal state.
  • This method allows detection, and normal activation of a safety mechanism (for example, having the production equipment issue an audible/visual alarm, or shutting the equipment down in an emergency), when items that may injure an operator or damage other objects are dropped or ejected from the production equipment, without direct contact with the equipment and even outside the line of sight of a sensor. Moreover, the method also allows abnormal operation of the production equipment, or potential injury to operators or damage to other objects, to be detected and a safety mechanism to be activated. In addition, since the sensors of the traditional methods are not required, the method can cover a larger detection range.
  • The second embodiment of the present disclosure proposes an apparatus for identifying production equipment in an abnormal state in a factory, including: an information acquisition unit configured to obtain environmental information around at least one production equipment in the factory; an information classification unit configured to obtain a classification result of the environmental information using a machine learning model, the machine learning model being configured to output a corresponding classification result based on features extracted from the environmental information; an abnormality determination unit configured to determine, based on the classification result, whether there is production equipment in an abnormal state in the factory; and a candidate determination unit configured to, when it is determined that there is production equipment in an abnormal state in the factory, determine candidate production equipment in the abnormal state based on a predetermined rule, so as to control at least one of the candidate production equipment in the abnormal state.
  • The third embodiment of the present disclosure proposes a computing device that includes: a processor; and a memory for storing computer-executable instructions which, when executed, cause the processor to perform the method of the first embodiment.
  • The fourth embodiment of the present disclosure proposes a computer-readable storage medium having computer-executable instructions stored thereon, the computer-executable instructions being used to perform the method of the first embodiment.
  • The fifth embodiment of the present disclosure proposes a computer program product that is tangibly stored on a computer-readable storage medium and includes computer-executable instructions which, when executed, cause at least one processor to perform the method of the first embodiment.
  • Fig. 1 shows a flowchart of a method for identifying production equipment in an abnormal state in a factory according to an embodiment of the present disclosure;
  • Fig. 2 shows a schematic architecture diagram of a system for identifying production equipment in an abnormal state in a factory according to an embodiment of the present disclosure;
  • Fig. 3 shows an example architecture of the convolutional neural network in the machine learning model according to the embodiment of Fig. 2;
  • Fig. 4 shows an example architecture of the recurrent neural network in the machine learning model according to the embodiment of Fig. 2;
  • Fig. 5 shows the time-unrolled architecture of the recurrent neural network illustrated in Fig. 4;
  • Fig. 6 shows an example of unit A of the recurrent neural network illustrated in Fig. 4;
  • Fig. 7 shows a block diagram of an apparatus for identifying production equipment in an abnormal state in a factory according to an embodiment of the present disclosure; and
  • Fig. 8 shows a block diagram of a computing device for identifying production equipment in an abnormal state in a factory according to an embodiment of the present disclosure.
  • Fig. 1 shows a method for identifying production equipment in an abnormal state in a factory according to an embodiment of the present disclosure.
  • the method 100 starts at step 101.
  • In step 101, environmental information around at least one production equipment in the factory is obtained.
  • the environmental information can be sounds or images in the surrounding environment of the production equipment.
  • When humans are frightened by a sudden event or suffer sudden pain, they usually scream, accompanied by surprised or pained facial expressions. Therefore, in a factory environment, when production equipment operates abnormally, or when items that may injure an operator are dropped or ejected from the equipment, this natural human response to fright or pain can be used to activate the safety protection mechanism. In addition, potential injury to operators and/or damage to other objects by the production equipment can also be detected.
  • When the production equipment emits abnormal operating noise, or a breaking sound is present in the environment of the equipment (for example, the sound of shattering glass or of an object snapping), this usually implies that the equipment is operating abnormally or that it may, or is about to, injure an operator or damage another object. Therefore, sounds or images in the environment surrounding at least one production equipment can be captured or collected and analyzed to determine whether the equipment is operating abnormally or whether it is causing, or is about to cause, injury to an operator or damage to another object.
  • the environmental information may include a sound signal collected by at least one sound collection device, and the at least one sound collection device is arranged around at least one production device in the factory.
  • One or more sound collection devices can be arranged around the production equipment in the factory to collect sound signals.
  • The sound collection device may include, but is not limited to, a video camera, a sound pickup, a microphone, and so on.
  • When a microphone is used as the sound collection device, a single microphone or a microphone array can be used. A microphone array of any appropriate topology may be adopted, for example a one-dimensional microphone array such as a linear array, or a two-dimensional microphone array such as a circular array.
  • In order to obtain the short-time characteristics of the sound signal, after the sound signal is collected it may be divided into frames, and each frame of the resulting multi-frame signal may then be converted into a spectrogram, which serves as the input of the machine learning model (mentioned below).
  • When the sound signal is framed, it can be windowed, i.e., a window function is slid along a segment of the sound signal to cut it into multiple short-time sound segments.
  • In some embodiments, an overlapping segmentation method may be used for framing the sound signal, so that frames transition smoothly into one another and continuity is maintained.
  • In other embodiments, a contiguous segmentation method may also be used for framing the sound signal, i.e., there is no overlap between one frame and the next.
  • After framing, a time-frequency transform such as the Fast Fourier Transform (FFT) is used to transform each frame of the sound signal from the time domain to the frequency domain, yielding multiple spectrograms.
  • In other embodiments, the order of framing the sound signal and converting it to the frequency domain can also be exchanged.
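  • As a concrete illustration of the framing and FFT steps described above, the following minimal Python sketch (not part of the disclosure; the 16 kHz sample rate and Hamming window are assumptions) splits a signal into overlapping windowed frames and converts each frame into a magnitude spectrum:

```python
import numpy as np

def frames_to_spectrograms(signal, sample_rate=16000,
                           frame_len_ms=25, frame_shift_ms=10):
    """Split a 1-D sound signal into overlapping windowed frames and
    return the magnitude spectrum of each frame."""
    frame_len = int(sample_rate * frame_len_ms / 1000)    # samples per frame
    frame_shift = int(sample_rate * frame_shift_ms / 1000)
    window = np.hamming(frame_len)                        # windowing function
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        frame = signal[start:start + frame_len] * window  # one short-time segment
        spectra.append(np.abs(np.fft.rfft(frame)))        # time -> frequency (FFT)
    return np.stack(spectra)                              # shape: (num_frames, bins)
```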
  • The collected sound signal may be filtered to remove undesired sounds and thereby increase the accuracy of the machine learning model's predictions. For example, when monitoring whether the production equipment harms operators has a higher priority, it is necessary to monitor whether the sound signal contains human screams; a DSP filter or a digital filter (e.g., a band-pass filter) can be used to filter out frequencies outside the range of the human voice. The filtering can be performed before or after framing the sound signal.
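  • A band-pass filtering step of this kind could be sketched as follows, assuming SciPy is available; the 85-3000 Hz passband used to approximate the human-voice range is an assumption, not a value given in the disclosure:

```python
from scipy.signal import butter, sosfilt

def bandpass_human_voice(signal, sample_rate=16000,
                         low_hz=85.0, high_hz=3000.0, order=4):
    """Attenuate frequencies outside the (assumed) human-voice band."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass",
                 fs=sample_rate, output="sos")   # band-pass digital filter
    return sosfilt(sos, signal)
```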
  • the environmental information may include image signals collected by at least one image capture device, and the at least one image capture device is arranged around the at least one production device.
  • The image collection device may include, but is not limited to, a video camera, a camera module, a still camera, and so on.
  • For example, a camera can be installed near or around each production equipment to collect facial images of operators close to the equipment, in order to recognize human expressions.
  • As another example, cameras installed on the factory floor can be used to collect images of the production equipment, in order to recognize items dropped or ejected from the equipment that may injure operators or damage other objects.
  • In step 102, a machine learning model is used to obtain a classification result of the environmental information, the machine learning model being configured to output a corresponding classification result based on features extracted from the environmental information.
  • In some embodiments, the machine learning model is a neural network model; in other embodiments, it may also be another type of machine learning model.
  • When the environmental information includes a sound signal, the classification result may include at least one of the following: a human scream, noise of abnormal operation of production equipment, sound of normal operation of production equipment, and a breaking sound.
  • When the sound signal contains a human scream, the likely situation is that an operator has been suddenly frightened or is suffering sudden pain; when the sound signal contains noise of abnormal operation of production equipment or a breaking sound, the likely situation is that the equipment is operating abnormally or is potentially injuring an operator or damaging another object.
  • When the environmental information includes an image signal, the classification result may include at least one of the following: a surprised human expression, a frowning human expression, a smiling human expression, and an item dropped or ejected from the production equipment. Similarly to the sound signal, when the human expression in the image is surprised or frowning, the likely situation is that an operator has been suddenly frightened or is suffering sudden pain, or that the operator has found the production equipment in an abnormal operating state; when an item is dropped or ejected from the production equipment, it indicates that the equipment is causing, or is about to cause, injury to an operator or damage to another object. Therefore, by obtaining the classification result of the environmental information with the machine learning model, it can be determined whether there is production equipment in an abnormal state in the factory.
  • the machine learning model is trained to extract features from environmental information.
  • In such embodiments, step 102 further includes: using the machine learning model to extract features from the environmental information.
  • In other embodiments, features may also be extracted manually, for example through manual input, annotation, measurement, or other configuration.
  • In some embodiments, before the machine learning model is used to obtain the classification result of the environmental information, the method 100 further includes training the machine learning model.
  • Training a machine learning model first includes building a machine learning model.
  • The machine learning model can use a combination of various models in a cascaded manner, for example a model for feature extraction cascaded with a model for classification. The next step is to obtain the features of the training samples, each of which carries an actual classification label.
  • Training samples may be collected from one or more memories and/or sensors, and may be stored in and/or transmitted to a buffer, memory, cache, processor, or other device for training. Training samples can also be obtained from the network or from existing databases. The number of training samples can be determined as needed.
  • certain processing may be performed on the training samples to generate additional training samples, thereby increasing the robustness of the model.
  • For example, white noise can be superimposed on images captured by the camera to blur them, and the images can be flipped, translated, and/or rotated to generate additional training samples, as sketched below.
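  • One augmentation pass might look like the following sketch (illustrative noise amplitude and shift; the rotation mentioned above is omitted for brevity):

```python
import numpy as np

def augment(image, rng=np.random.default_rng()):
    """Generate one extra training sample: add white noise, then flip/shift."""
    noisy = image + rng.normal(0.0, 5.0, image.shape)   # superimpose white noise
    flipped = np.fliplr(noisy)                          # horizontal flip
    shifted = np.roll(flipped, shift=4, axis=1)         # small translation
    return np.clip(shifted, 0, 255)                     # keep valid pixel range
```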
  • The training samples may also be preprocessed; for example, when the environmental information is a sound signal, the sound signal samples used for training may be filtered, framed, and converted into spectra.
  • The next step in training the machine learning model is to use the model to obtain classification results for the training samples based on their features.
  • The classifier in the machine learning model can obtain, through a classification algorithm, a classification result corresponding to the features of a training sample. The error between the actual classification label and the classification result is then determined, and the weights/parameters of the machine learning model are adjusted based on this error. By comparing the actual classification label of a training sample with the classification result obtained by the model and minimizing the error between them, the weights/parameters are adjusted or optimized so that the model better represents the relationship between the input training samples and the output classification results. After the machine learning model has been trained, it can be stored in memory in a text file format for future use.
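  • The following PyTorch-style sketch illustrates this error-minimization loop; the model, data loader, optimizer choice, and file name are assumptions rather than details from the disclosure:

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=10, lr=1e-3):
    """Minimize the error between predicted and actual classification labels."""
    criterion = nn.CrossEntropyLoss()                        # error/loss function
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)   # gradient descent
    for _ in range(epochs):
        for spectrogram, label in loader:                    # sample + actual label
            optimizer.zero_grad()
            prediction = model(spectrogram)                  # classification result
            loss = criterion(prediction, label)              # error to minimize
            loss.backward()                                  # backpropagate the error
            optimizer.step()                                 # adjust weights/parameters
    torch.save(model.state_dict(), "sound_model.pt")         # keep for future use
```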
  • As mentioned above, the machine learning model can be saved in memory after being trained, so the trained model can be loaded directly when the machine learning model is applied.
  • the machine learning model can be retrained periodically or when needed. For example, after a trained machine learning model has been applied for a period of time, the environmental information collected during actual application can be used as a supplement to training samples to train the model, thereby increasing the robustness of the model. For another example, when the architecture of the machine learning model is changed, the machine learning model with the new architecture also needs to be retrained.
  • In step 103, it is determined based on the classification result whether there is production equipment in an abnormal state in the factory.
  • The form of the classification result indicates whether there is production equipment in an abnormal state in the factory (for example, a human scream, noise of abnormal operation of production equipment, a surprised human expression, and so on), so whether such equipment exists can be determined from the classification result.
  • When the environmental information corresponds to multiple classification results, rules can be set for making the determination from those multiple results. For example, when the environmental information includes a sound signal, the signal is divided into multiple frames, and whether there is production equipment in an abnormal state in the factory can be determined from the classification results of several consecutive frames, as illustrated below. As another example, when the sound signal is a multi-channel signal, the determination can be made from the classification results of the frames divided from several of the channels.
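  • One illustrative way to encode such a rule (hypothetical label names; requiring k = 3 consecutive abnormal frames is an assumption):

```python
ABNORMAL_LABELS = {"human_scream", "abnormal_operation_noise", "breaking_sound"}

def factory_has_abnormal_equipment(frame_labels, k=3):
    """Return True if k consecutive frame classifications are abnormal."""
    run = 0
    for label in frame_labels:
        run = run + 1 if label in ABNORMAL_LABELS else 0   # reset on normal frames
        if run >= k:
            return True
    return False

# e.g. three consecutive scream frames trigger the abnormality decision:
print(factory_has_abnormal_equipment(
    ["normal_operation", "human_scream", "human_scream", "human_scream"]))  # True
```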
  • The method then proceeds to step 104: when it is determined that there is production equipment in an abnormal state in the factory, candidate production equipment in the abnormal state is determined based on a predetermined rule, so as to control at least one of the candidate production equipment in the abnormal state.
  • In some embodiments, the environmental information includes a sound signal, and step 104 further includes: determining the position of the sound source emitting the sound signal; and determining the candidate production equipment in the abnormal state based on the position of the sound source.
  • Various methods can be used to locate the sound source emitting the sound signal.
  • In some embodiments, a single microphone is used to collect sound signals, and a machine learning model can be trained to localize the sound source. When training this model, the production equipment and the microphone are placed at fixed positions, and a sound source is made to emit different sounds at different positions (for example, an operator screaming, simulated noise of abnormal operation of the equipment, simulated breaking sounds, and so on) as training samples. After enough sound samples emitted at different positions relative to the microphone have been collected, these samples and the corresponding source positions are used to train the model.
  • In other embodiments, a microphone array is used to collect sound signals. A microphone array can localize the sound source, for example by a method based on the time difference of arrival (TDOA), a method based on high-resolution spectral estimation, a steerable-beam method, a subspace-based method, and so on.
  • After the position of the sound source is obtained, the candidate production equipment in an abnormal state can be determined based on that position.
  • the candidate production equipment in an abnormal state can be determined according to predetermined rules.
  • the predetermined rules can be changed according to factory settings or actual needs.
  • In some embodiments, the operator works in collaboration with the production equipment (e.g., collaborative robots).
  • In such embodiments, the predetermined rule may include: when the classification result indicates a human scream (which may mean that the production equipment is injuring the operator, for example through a collision with the operator, a part falling from the equipment, or a chemical spill), all production equipment near the sound source position (for example, within 2 meters) is determined to be candidate production equipment in the abnormal state; when the classification result indicates noise of abnormal operation of production equipment or a breaking sound (which may mean that the equipment is operating abnormally or that a part of it has broken or fallen), all production equipment in the area of the sound source position (for example, within 1 meter) is determined to be candidate production equipment in the abnormal state. In other embodiments, the operator supervises the operation of one or more production equipment.
  • In such embodiments, the predetermined rule may include: when the classification result indicates a human scream (which may mean that one or more of the production equipment the operator is supervising has failed or is operating abnormally), all production equipment within a relatively large distance of the sound source position (for example, 3 meters), taken as the equipment the operator is supervising, is determined to be candidate production equipment in the abnormal state; when the classification result indicates noise of abnormal operation of production equipment or a breaking sound, all production equipment in the area of the sound source position (for example, within 1 meter) is determined to be candidate production equipment in the abnormal state.
  • In some embodiments, the operator works with the production equipment (for example, a collaborative robot), and a camera is arranged next to the equipment to capture facial images of the operator.
  • In such embodiments, the predetermined rule may include: when the classification result indicates a surprised or frowning human expression (which may mean that the production equipment is harming the operator), the production equipment operated by the operator corresponding to the collected facial image is taken as candidate production equipment in the abnormal state.
  • In other embodiments, the operator supervises the operation of one or more production equipment, and a camera is arranged near the equipment to capture facial images of the operator.
  • In such embodiments, the predetermined rule may include: when the classification result indicates a surprised or frowning human expression (which may mean that one or more of the production equipment the operator is supervising has failed or is operating abnormally), all production equipment supervised by the operator corresponding to the collected facial image is determined to be candidate production equipment in the abnormal state.
  • In other embodiments, other predetermined rules may be set as needed to determine the candidate production equipment in the abnormal state.
  • After the candidate production equipment in the abnormal state has been determined, a control signal may be sent to at least one of the determined candidates, for example a control signal instructing at least one of the candidate production equipment in the abnormal state to stop operating.
  • Alternatively, an audible and/or visual alarm may be issued to indicate that the candidate production equipment may be in an abnormal state. After receiving the alarm, operators or other staff first inspect the equipment and then control its operation. In other embodiments, whether to send a control signal or to issue an alarm can also be preset as needed.
  • For example, when the abnormal state is an emergency (for example, an operator is injured), a stop-operation control signal is sent directly to all candidate production equipment in the abnormal state; when the abnormal state is not an emergency (for example, the equipment is operating abnormally), an alarm is issued as an indication.
  • The above method therefore allows detection, and normal activation of a safety mechanism, when items that may injure an operator or damage other objects are dropped or ejected from the production equipment, without direct contact with the equipment and even outside the line of sight of a sensor. Moreover, the method also allows abnormal operation of the production equipment, or potential injury to operators or damage to other objects, to be detected and a safety mechanism to be activated. In addition, since the sensors of traditional safety protection methods are not required, this method can cover a larger detection range.
  • FIG. 2 shows an architectural schematic diagram 200 of a system for identifying production equipment in an abnormal state in a factory according to an embodiment of the present disclosure.
  • Operators work in collaboration with the production equipment 202a-202c, and it is necessary to monitor whether the equipment 202a-202c is operating normally and whether it is harming the operators.
  • Microphones 203a, 203b, 203c, and 203d are arranged next to the production equipment 202a-202c; these microphones form a linear microphone array for collecting sound in the environment surrounding the equipment.
  • The computing device 201 (for example, a server) is communicatively connected (by wire or wirelessly) to the microphone array 203a-203d and to the controllers of the production equipment 202a-202c, so that it can receive the sound signals collected by the microphone array 203a-203d, perform a series of processing operations on them such as framing, frequency-domain transformation, and classification by the machine learning model, and send corresponding control signals to the controllers of the production equipment 202a-202c according to the classification results and the predetermined rules.
  • In this embodiment, the training and application of the machine learning model are both performed on the same computing device 201; that is, the model is first trained, and the trained model is then stored in local memory for future use.
  • the trained model can also be directly applied without being stored for future use.
  • In other embodiments, the training and application of the machine learning model can be performed on different computing devices. For example, the model can be trained on another computing device with a higher specification and stored as a file after training is complete; when the model is applied, only the trained model needs to be loaded on the computing device.
  • FIG. 3 shows an example architecture of the convolutional neural network in the machine learning model according to the embodiment of FIG. 2.
  • FIG. 4 shows an example architecture of a recurrent neural network in the machine learning model according to the embodiment of FIG. 2.
  • FIG. 5 shows the time unfolding architecture of the recurrent neural network illustrated in FIG. 4.
  • Fig. 6 shows an example of unit A of the recurrent neural network illustrated in Fig. 4.
  • the sound in the factory environment is collected by the microphone arrays 203a-203d arranged next to the three production equipment 202a-202c in the factory.
  • The characteristics of sound generally change over time but remain essentially stationary within a short interval. Therefore, in order to characterize how the sound changes over time, in the embodiment of FIG. 2 the features of the sound signal can be extracted in both the time domain and the frequency domain to train and apply the machine learning model.
  • In the embodiment of FIG. 2, the machine learning model may be a cascade of a convolutional neural network (CNN) and a recurrent neural network (RNN).
  • FIG. 3 shows the architecture 300 of the AlexNet neural network as a convolutional neural network.
  • the first layer structure 301 of the network architecture 300 is a convolutional layer with 96 11 ⁇ 11 convolution kernels and two pooling layers.
  • The second layer structure 302 is a convolutional layer with 256 5×5 convolution kernels and two pooling layers.
  • the third layer structure 303 and the fourth layer structure 304 are both convolution layers with 384 3 ⁇ 3 convolution kernels.
  • The fifth layer structure 305 is a convolutional layer with 256 3×3 convolution kernels and two pooling layers.
  • the sixth layer structure 306 and the seventh layer structure 307 are both fully connected layers with 4096-dimensional vector output.
  • The eighth layer structure 308 is a fully connected layer with a 1000-dimensional vector output.
  • Each frame of the sound signal, after being converted into a spectrogram, is input into the first layer structure 301 of the network architecture, and the output is taken from the eighth layer structure 308.
  • In a convolutional layer, the inner product of the weights and the pixel values of a local image region is computed to extract the features of that region; this process is repeated in a sliding manner over the entire image to extract the feature information of the whole image with respect to the weight vector.
  • In a fully connected layer, the feature information input to the layer is converted into a multi-dimensional feature vector; therefore, in the example of FIG. 3, the convolutional neural network extracts a 1000-dimensional feature vector from the spectrogram.
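  • As an illustration, torchvision's AlexNet implementation matches the layer sizes listed above and could serve as the feature extractor; resizing the spectrogram to a 3×224×224 input, as in the sketch below, is an assumption required by that implementation rather than a detail from the disclosure:

```python
import torch
from torchvision.models import alexnet

cnn = alexnet(num_classes=1000)           # final FC layer outputs a 1000-dim vector
spectrogram = torch.rand(1, 3, 224, 224)  # one frame's spectrogram, resized/stacked
features = cnn(spectrogram)               # 1000-dim feature vector for this frame
print(features.shape)                     # torch.Size([1, 1000])
```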
  • the recurrent neural network is used to extract the time domain characteristics of the sound signal.
  • The 1000-dimensional feature vector of the frame at the current time point t, extracted by the convolutional neural network AlexNet illustrated in FIG. 3 (denoted x_t in FIG. 4, where t is the start time of the frame relative to the entire sound signal, e.g., 0 ms, 10 ms, 20 ms), is input into the RNN unit A.
  • The RNN unit A iterates its internal weights or parameters over time.
  • FIG. 6 shows an architecture 600 of a gated recurrent unit (GRU) adopted by the RNN unit A.
  • In other embodiments, the RNN unit A may adopt other types of structures, including but not limited to a long short-term memory (LSTM) network, an LSTM network with peepholes, and so on.
  • The hidden state h_t is input to the fully connected layer to obtain the classification result Y_t.
  • the fully connected layer of the recurrent neural network has a softmax classifier.
  • The role of the classifier is to convert the hidden state h_t into N probability values for N categories.
  • The category with the largest probability value is output as the predicted category (i.e., the classification result) Y_t of the frame at the current time point t.
  • The categories may include four categories: human screaming, noise of abnormal operation of production equipment, sound of normal operation of production equipment, and breaking sound.
  • In that case, the classifier converts the hidden state h_t into 4 probability values for the 4 categories. For example, if the four probability values obtained for the frame at the current time point t are 0.5, 0.2, 0.1, and 0.2, the category with the largest probability value of 0.5, human screaming, is output as the classification result.
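  • A minimal sketch of this recurrent part: a GRU cell consumes the 1000-dimensional per-frame feature vector x_t, and a fully connected layer with softmax turns the hidden state h_t into four class probabilities; the hidden size of 256 is an assumption:

```python
import torch
import torch.nn as nn

class SoundClassifierRNN(nn.Module):
    def __init__(self, feature_dim=1000, hidden_dim=256, num_classes=4):
        super().__init__()
        self.gru = nn.GRUCell(feature_dim, hidden_dim)   # unit A in FIG. 4
        self.fc = nn.Linear(hidden_dim, num_classes)     # fully connected layer

    def forward(self, x_t, h_prev):
        h_t = self.gru(x_t, h_prev)                      # update hidden state
        probs = torch.softmax(self.fc(h_t), dim=-1)      # 4 class probabilities
        return probs, h_t

# Usage: the class with the largest probability is the frame's result Y_t.
model = SoundClassifierRNN()
h = torch.zeros(1, 256)                                  # initial hidden state
x = torch.rand(1, 1000)                                  # CNN feature for frame t
probs, h = model(x, h)
y_t = probs.argmax(dim=-1)                               # predicted category Y_t
```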
  • a cascade architecture of convolutional neural network and recurrent neural network is constructed.
  • the convolutional neural network and the recurrent neural network are trained together as an end-to-end model, so the parameters obtained by training are more accurate.
  • Using this cascade approach also makes it easy to replace the convolutional neural network if needed in the future.
  • In other embodiments, the convolutional neural network and the recurrent neural network can also be trained separately. For example, a softmax classifier can be added after the last fully connected layer of the AlexNet convolutional neural network shown in FIG. 3 in order to train the convolutional neural network separately.
  • other neural network models or other machine learning models may also be used.
  • the next step in training a machine learning model is to obtain training data.
  • Different types of sound samples can be collected through the microphone array 203a-203d, including human screams, noise of abnormal operation of production equipment, sounds of normal operation of production equipment, and breaking sounds.
  • sound samples can also be obtained from the network.
  • For example, audio clips of different categories can be obtained as sound samples through audio/video websites or search engines. Considering that sound samples from the Internet are not necessarily collected in a factory environment, these clips can be preprocessed, for example by superimposing factory background noise, to make them better match the actual factory environment; see the sketch below.
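  • One hedged sketch of such preprocessing, assuming both clips are equal-rate mono arrays and targeting a 10 dB signal-to-noise ratio (an assumption):

```python
import numpy as np

def add_factory_background(clip, background, snr_db=10.0):
    """Mix factory background noise into a downloaded clip at a target SNR."""
    background = background[:len(clip)]                  # trim noise to clip length
    scale = np.sqrt(np.mean(clip**2) /
                    (np.mean(background**2) * 10**(snr_db / 10)))
    return clip + scale * background                     # noisier, factory-like clip
```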
  • sound samples can also be obtained from an existing speech library or sound library. In some embodiments, any number of the sound samples collected by the microphone array, the sound samples obtained from the Internet, and the sound samples obtained from the speech library or the sound library may also be combined as the sound samples.
  • the microphone arrays 203a-203d send the collected sound samples to the computing device 201, and the computing device 201 stores them in its memory.
  • the computing device 201 preprocesses these stored sound signals to obtain training samples, including framing the sound signal and converting each frame of the sound signal into the frequency domain.
  • In this embodiment, the frame shift is set to 10 ms and the frame length to 25 ms; that is, consecutive frames overlap by 15 ms.
  • For example, the first frame spans 0 ms to 25 ms of the sound signal, the second frame spans 10 ms to 35 ms, and so on.
  • After framing, the computing device 201 uses a time-frequency transform such as the Fast Fourier Transform (FFT) to transform each frame of the sound signal from the time domain to the frequency domain, obtaining multiple spectrograms.
  • Since the actual classification label of each training sample is needed to adjust the parameters of the model during training, a corresponding classification label must be made for each frame during or after framing, indicating that the frame belongs to one of: human screaming, noise of abnormal operation of production equipment, sound of normal operation of production equipment, and breaking sound.
  • training samples and corresponding actual classification labels are used as model input and corresponding expected output pairs to train the machine learning model.
  • each spectrogram is sequentially input into the machine learning model for model training.
  • For the spectrogram of the current frame, a set of features is extracted from the spectrogram.
  • the convolutional neural network in the machine learning model is used to extract the frequency domain features of the current frame and the recurrent neural network is used to extract the time domain features of the current frame.
  • the output of the convolutional neural network is used as the input of the recurrent neural network. Therefore, the feature vector obtained from the convolutional neural network is input to the recurrent neural network to extract time-domain features.
  • the model is initialized before it is used to extract features and predict classification results for the first time.
  • The initial value of each weight/parameter in the convolutional neural network and the recurrent neural network can be obtained randomly, and the initial hidden state h_0 can be set equal to the feature vector x_0.
  • The classifier of the machine learning model obtains a predicted classification result Y_t for each spectrogram based on the extracted features.
  • the model can automatically extract features from the spectrogram and output the predicted classification results.
  • Next, the error between the actual classification label of the frame and the predicted classification result Y_t is obtained. A loss function is used to represent this error, i.e., the degree of inconsistency between the predicted classification result Y_t and the actual classification label.
  • An optimization algorithm (for example, gradient descent) is then used to update the weights/parameters of the machine learning model, and the above process is repeated for the spectrogram of the next frame until the machine learning model converges and each weight/parameter stabilizes.
  • When the machine learning model converges, the model has been trained and can be stored in memory in a text file format for future use.
  • The linear microphone array 203a-203d collects sound in the environment surrounding the production equipment 202a-202c in real time and transmits the four collected sound signals to the computing device 201.
  • The computing device 201 saves the received sound signals at predetermined intervals (for example, 5 ms, 100 ms, 1 s, ...) as audio files of a predetermined duration (for example, 1 s).
  • the computing device 201 divides each saved audio file into frames with a frame shift of 10 ms and a frame length of 25 ms.
  • frame shifts and frame lengths of other lengths for framing can also be set as needed, as long as they are consistent with the frame shifts and frame lengths used when training the machine learning model.
  • the computing device 201 can use a DSP filter or a digital filter to filter each frame, for example, to filter out sound signals with frequencies above 5.5 kHz.
  • the computing device 201 transforms each frame of the sound signal from the time domain to the frequency domain through a time-frequency domain transformation method such as fast Fourier transform to obtain multiple spectrograms.
  • The above completes the process of converting the sound signals collected by the microphone array 203a-203d within a period of time (for example, 1 s) into the input of the machine learning model.
  • each of the multiple spectrograms is sequentially input into the trained machine learning model.
  • the machine learning model first extracts features from the input spectrogram.
  • the process of extracting features is the same as the process of extracting features from the spectrogram when training the machine learning model described above, and will not be repeated here.
  • the machine learning model then predicts the classification result of the spectrogram based on the extracted features.
  • The classification result may be one of: human screaming, noise of abnormal operation of production equipment, sound of normal operation of production equipment, or breaking sound.
  • the sound source of the sound signal is located through the microphone array 203a-203d.
  • a method based on time difference of arrival (TDOA) is used to locate the sound source.
  • The first step of localization is to calculate the time difference with which the sound signal arrives at each pair of microphones in the microphone array 203a-203d (203a-203b, 203a-203c, 203a-203d, 203b-203c, 203b-203d, 203c-203d, six pairs in total).
  • Methods of calculating the time difference include, but are not limited to, generalized cross-correlation (GCC), multi-channel cross-correlation coefficient (MCCC), and so on.
  • The second step is to calculate the direction of the sound source relative to the microphone array 203a-203d; this can be done by setting up equations relating the calculated time difference of each microphone pair to the planar coordinates of each microphone.
  • the third step is to calculate the position of the sound source based on the spatial coordinate position of each microphone.
  • the calculation method may include, but is not limited to, a method of determining a hyperbola through the time difference between each pair of microphones, a method based on triangulation, a method based on grid, a method based on machine learning, and so on.
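  • As an illustration of the first step, the time difference for one microphone pair could be estimated with generalized cross-correlation using PHAT weighting, as in the following sketch (a common GCC variant; not stated as the disclosure's exact method):

```python
import numpy as np

def gcc_phat_delay(sig_a, sig_b, sample_rate=16000):
    """Estimate the time difference of arrival (seconds) between two microphones."""
    n = len(sig_a) + len(sig_b)
    spec_a = np.fft.rfft(sig_a, n=n)
    spec_b = np.fft.rfft(sig_b, n=n)
    cross = spec_a * np.conj(spec_b)             # cross-power spectrum
    cross /= np.abs(cross) + 1e-12               # PHAT weighting
    corr = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    corr = np.concatenate((corr[-max_shift:], corr[:max_shift + 1]))
    shift = np.argmax(np.abs(corr)) - max_shift  # lag of the correlation peak
    return shift / sample_rate                   # seconds; sign gives which mic leads
```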
  • After the position of the sound source is obtained, the candidate production equipment in the abnormal state can be determined according to the following rules: when the classification result is human screaming, all production equipment near the sound source position (i.e., the operator's position), for example within 2 meters, is determined to be candidate production equipment in the abnormal state; when the classification result is noise of abnormal operation of production equipment or breaking sound, the production equipment within a predetermined range of the sound source position (for example, within 1 meter) is determined to be candidate production equipment in the abnormal state. The computing device 201 then sends a control signal to all candidate production equipment in the abnormal state to stop their operation.
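  • These rules could be encoded as in the sketch below; the equipment coordinates and label names are hypothetical, while the 2 m / 1 m radii mirror the example above:

```python
import math

# Hypothetical planar coordinates of the three production equipment (meters).
EQUIPMENT = {"202a": (0.0, 0.0), "202b": (3.0, 0.0), "202c": (6.0, 0.0)}
RADIUS = {"human_scream": 2.0,                  # "near" the operator's position
          "abnormal_operation_noise": 1.0,      # area of the sound source
          "breaking_sound": 1.0}

def candidates(classification, source_xy):
    """Return the equipment within the rule's radius of the sound source."""
    r = RADIUS.get(classification)
    if r is None:                               # normal operation: no candidates
        return []
    return [eq for eq, (x, y) in EQUIPMENT.items()
            if math.hypot(x - source_xy[0], y - source_xy[1]) <= r]

# e.g. a scream localized at (1.5, 0.0) flags 202a and 202b to be stopped.
print(candidates("human_scream", (1.5, 0.0)))   # ['202a', '202b']
```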
  • Fig. 7 shows a block diagram of an apparatus for identifying production equipment in an abnormal state in a factory according to an embodiment of the present disclosure.
  • the apparatus 700 includes an information acquisition unit 701, an information classification unit 702, an abnormality judgment unit 703, and a candidate determination unit 704.
  • the information obtaining unit 701 is configured to obtain environmental information around at least one production device in the factory.
  • the information classification unit 702 is configured to obtain a classification result of the environmental information by using a machine learning model, and the machine learning model is configured to output a corresponding classification result based on the features extracted from the environmental information.
  • the abnormality determination unit 703 is configured to determine whether there is a production equipment in an abnormal state in the factory based on the classification result.
  • The candidate determination unit 704 is configured to, when it is determined that there is production equipment in an abnormal state in the factory, determine candidate production equipment in the abnormal state based on a predetermined rule, so as to control at least one of the candidate production equipment in the abnormal state.
  • Each unit in FIG. 7 can be implemented by software, hardware (for example, an integrated circuit, FPGA, etc.), or a combination of software and hardware.
  • the environmental information includes a sound signal collected by at least one sound collection device, and the at least one sound collection device is arranged around the at least one production device.
  • In some embodiments, the classification result includes at least one of the following: human screams, noise of abnormal operation of production equipment, sound of normal operation of production equipment, and breaking sound.
  • the candidate determining unit 704 is further configured to: determine the position of the sound source emitting the sound signal; and determine the candidate production equipment in an abnormal state based on the position of the sound source.
  • In some embodiments, the apparatus 700 further includes a signal framing module (not shown) configured to frame the sound signal, and a signal conversion module (not shown) configured to convert each frame of the framed multi-frame signal into a spectrogram, which is used as the input of the machine learning model.
  • the environmental information includes image signals collected by at least one image capture device, and the at least one image capture device is arranged around the at least one production device.
  • the classification result includes at least one of the following: a human surprised expression, a human frowning expression, and a human smiling expression.
  • the apparatus 700 further includes a model training module (not shown), which is configured to train a machine learning model.
  • the machine learning model is a neural network model.
  • FIG. 8 shows a block diagram of a computing device 800 for identifying production equipment in an abnormal state in a factory according to an embodiment of the present disclosure.
  • the computing device 800 for identifying production equipment in an abnormal state in a factory includes a processor 801 and a memory 802 coupled to the processor 801.
  • the memory 802 is used to store computer-executable instructions, and when the computer-executable instructions are executed, the processor 801 executes the method in the above embodiment.
  • the above method can be implemented by a computer-readable storage medium.
  • the computer-readable storage medium carries computer-readable program instructions for executing various embodiments of the present disclosure.
  • the computer-readable storage medium may be a tangible device that can hold and store instructions used by the instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • More specific examples of the computer-readable storage medium include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device with instructions stored thereon, and any suitable combination of the foregoing.
  • The computer-readable storage medium used here is not to be interpreted as a transient signal itself, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
  • In some embodiments, the present disclosure proposes a computer-readable storage medium having computer-executable instructions stored thereon, the computer-executable instructions being used to execute the methods in the various embodiments of the present disclosure.
  • In some embodiments, the present disclosure proposes a computer program product that is tangibly stored on a computer-readable storage medium and includes computer-executable instructions which, when executed, cause at least one processor to execute the methods in the various embodiments of the present disclosure.
  • In general, the various example embodiments of the present disclosure may be implemented in hardware or special-purpose circuits, software, firmware, logic, or any combination thereof. Certain aspects may be implemented in hardware, while other aspects may be implemented in firmware or software that may be executed by a controller, microprocessor, or other computing device.
  • The computer-readable program instructions or computer program products used to execute the various embodiments of the present disclosure can also be stored in the cloud; when needed, a user can access the files stored in the cloud through the mobile Internet, a fixed network, or another network for execution.
  • Executing the computer-readable program instructions of an embodiment of the present disclosure implements the technical solutions disclosed according to the various embodiments of the present disclosure.

Landscapes

  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • General Factory Administration (AREA)

Abstract

A method, an apparatus, a computing device, a computer-readable storage medium, and a computer program product for identifying production equipment in an abnormal state in a factory. The method includes: obtaining environmental information around at least one production equipment in the factory; obtaining a classification result of the environmental information using a machine learning model, the machine learning model being configured to output a corresponding classification result based on features extracted from the environmental information; determining, based on the classification result, whether there is production equipment in an abnormal state in the factory; and when it is determined that there is production equipment in an abnormal state in the factory, determining candidate production equipment in the abnormal state based on a predetermined rule, so as to control at least one of the candidate production equipment in the abnormal state.

Description

Method and apparatus for identifying production equipment in an abnormal state in a factory
Technical Field
The present disclosure relates to the field of industrial control and, more specifically, to a method, an apparatus, a computing device, a computer-readable storage medium, and a program product for identifying production equipment in an abnormal state in a factory.
Background
In a factory environment, production machinery and equipment (hereinafter collectively referred to as "production equipment") may cause harm to humans and/or damage to other objects. One cause of injury and/or damage is that the production equipment or its moving parts make undesired contact with an operator or another object, for example a robotic arm of the production equipment colliding with the operator. Another possible cause is items dropped or ejected from the production equipment that may injure the operator or damage other objects, for example a moving part that breaks and falls from the equipment, or a chemical that spills from a container on the equipment.
At present, the following methods are commonly used to avoid the injury and/or damage that production equipment may cause. 1) Providing a safety button/foot pedal: the production equipment operates only while the safety button/foot pedal is continuously pressed, and stops when it is released. 2) Providing a stop switch: the stop switch can take various forms such as a button, a lever, or a plug, and activating it stops the production equipment. 3) Enclosing the production equipment in a protective cage to keep operators or other objects out of the area that the moving parts of the equipment (for example, a robotic arm) may reach; sensors can be installed on the cage to detect whether the cage door is opened, and the equipment is stopped when an open door is detected. 4) Arranging a light barrier or pressure mat near the production equipment: sensors on the light barrier or pressure mat can detect whether an operator has entered the operating area of the equipment, and the equipment is stopped when such entry is detected. 5) For production equipment such as collaborative robots, sensors can be mounted on the equipment (for example, as a sensor skin) to detect contact or collision between the collaborative robot and an operator or other objects, and the equipment is stopped when contact or collision is detected.
Summary
In traditional safety protection methods for production equipment, direct physical contact or a line-of-sight relationship with a sensor is usually required to trigger the safety protection mechanism. For example, methods 1) and 2) mentioned above require pressing and releasing a button/foot pedal/switch, while for methods 3), 4), and 5) the detected event must occur within the sensing range of the sensor. However, when items that may injure an operator or damage other objects are dropped or ejected from the production equipment (for example, a moving part breaks and falls from the equipment, or a chemical spills from a container on the equipment), traditional safety protection methods for production equipment provide no protection.
A first embodiment of the present disclosure proposes a method for identifying production equipment in an abnormal state in a factory, comprising: obtaining environmental information around at least one production equipment in the factory; obtaining a classification result of the environmental information using a machine learning model, the machine learning model being configured to output a corresponding classification result based on features extracted from the environmental information; determining, based on the classification result, whether there is production equipment in an abnormal state in the factory; and when the classification result indicates that there is production equipment in an abnormal state in the factory, determining candidate production equipment in the abnormal state based on a predetermined rule, so as to control at least one of the candidate production equipment in the abnormal state.
In this embodiment, environmental information in the factory (for example, ambient sounds and images) is collected, and a trained machine learning model is used to determine whether production equipment is causing, or could potentially cause, injury to operators and/or damage to other objects, so that the operation of the production equipment can be controlled. The method therefore allows detection, and normal activation of a safety mechanism (for example, having the production equipment issue an audible/visual alarm, or shutting the equipment down in an emergency), when items that may injure an operator or damage other objects are dropped or ejected from the production equipment, without direct contact with the equipment and even outside the line of sight of a sensor. Moreover, the method also allows abnormal operation of the production equipment, or potential injury to operators or damage to other objects, to be detected and a safety mechanism to be activated. In addition, since the sensors of the traditional methods are not required, the method can also cover a larger detection range.
A second embodiment of the present disclosure proposes an apparatus for identifying production equipment in an abnormal state in a factory, comprising: an information acquisition unit configured to obtain environmental information around at least one production equipment in the factory; an information classification unit configured to obtain a classification result of the environmental information using a machine learning model, the machine learning model being configured to output a corresponding classification result based on features extracted from the environmental information; an abnormality determination unit configured to determine, based on the classification result, whether there is production equipment in an abnormal state in the factory; and a candidate determination unit configured to, when it is determined that there is production equipment in an abnormal state in the factory, determine candidate production equipment in the abnormal state based on a predetermined rule, so as to control at least one of the candidate production equipment in the abnormal state.
A third embodiment of the present disclosure proposes a computing device comprising: a processor; and a memory for storing computer-executable instructions which, when executed, cause the processor to perform the method of the first embodiment.
A fourth embodiment of the present disclosure proposes a computer-readable storage medium having computer-executable instructions stored thereon, the computer-executable instructions being used to perform the method of the first embodiment.
A fifth embodiment of the present disclosure proposes a computer program product that is tangibly stored on a computer-readable storage medium and comprises computer-executable instructions which, when executed, cause at least one processor to perform the method of the first embodiment.
Brief Description of the Drawings
The features, advantages, and other aspects of the embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, in which several embodiments of the present disclosure are shown by way of example and not limitation. In the drawings:
Fig. 1 shows a flowchart of a method for identifying production equipment in an abnormal state in a factory according to an embodiment of the present disclosure;
Fig. 2 shows a schematic architecture diagram of a system for identifying production equipment in an abnormal state in a factory according to an embodiment of the present disclosure;
Fig. 3 shows an example architecture of the convolutional neural network in the machine learning model according to the embodiment of Fig. 2;
Fig. 4 shows an example architecture of the recurrent neural network in the machine learning model according to the embodiment of Fig. 2;
Fig. 5 shows the time-unrolled architecture of the recurrent neural network illustrated in Fig. 4;
Fig. 6 shows an example of unit A of the recurrent neural network illustrated in Fig. 4;
Fig. 7 shows a block diagram of an apparatus for identifying production equipment in an abnormal state in a factory according to an embodiment of the present disclosure; and
Fig. 8 shows a block diagram of a computing device for identifying production equipment in an abnormal state in a factory according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. Although the exemplary methods and apparatuses described below include, among other components, software and/or firmware executed on hardware, it should be noted that these examples are merely illustrative and should not be regarded as limiting. For example, it is contemplated that any or all of the hardware, software, and firmware components could be implemented exclusively in hardware, exclusively in software, or in any combination of hardware and software. Therefore, although exemplary methods and apparatuses are described below, those skilled in the art will readily appreciate that the examples provided do not limit the ways in which these methods and apparatuses can be implemented.
Furthermore, the flowcharts and block diagrams in the accompanying drawings show possible architectures, functions, and operations of implementations of the methods and systems according to various embodiments of the present disclosure. It should be noted that the functions noted in a block may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented using a dedicated hardware-based system that performs the specified functions or operations, or using a combination of dedicated hardware and computer instructions.
As used herein, the terms "include", "comprise", and similar terms are open-ended terms, i.e., "including/comprising but not limited to", meaning that other content may also be included. The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; and so on.
Fig. 1 shows a method for identifying production equipment in an abnormal state in a factory according to an embodiment of the present disclosure. Referring to Fig. 1, the method 100 starts at step 101. In step 101, environmental information around at least one production equipment in the factory is obtained. The environmental information can be sounds or images in the environment surrounding the production equipment. When humans are frightened by a sudden event or suffer sudden pain, they usually scream, accompanied by surprised or pained facial expressions. Therefore, in a factory environment, when production equipment operates abnormally, or when items that may injure an operator are dropped or ejected from the equipment, this natural human response to fright or pain can be used to activate the safety protection mechanism. In addition, potential injury to operators and/or damage to other objects by the production equipment can also be detected. For example, when the production equipment emits abnormal operating noise, or a breaking sound is present in the environment of the equipment (for example, the sound of shattering glass or of an object snapping), this usually implies that the equipment is operating abnormally or that it may, or is about to, injure an operator or damage another object. Therefore, sounds or images in the environment surrounding at least one production equipment can be captured or collected and analyzed to determine whether the equipment is operating abnormally or whether it is causing, or is about to cause, injury to an operator or damage to another object.
In some embodiments, the environmental information may include a sound signal collected by at least one sound collection device arranged around at least one production equipment in the factory. One or more sound collection devices can be arranged around the production equipment in the factory to collect sound signals. The sound collection device may include, but is not limited to, a video camera, a sound pickup, a microphone, and so on. When a microphone is used as the sound collection device, a single microphone or a microphone array can be used. A microphone array of any appropriate topology may be adopted, for example a one-dimensional microphone array such as a linear array, or a two-dimensional microphone array such as a circular array.
In some embodiments, in order to obtain the short-time characteristics of the sound signal, after the sound signal is collected it may be divided into frames, and each frame of the resulting multi-frame signal may then be converted into a spectrogram to serve as the input of the machine learning model (mentioned below). When framing the sound signal, the signal can be windowed, i.e., a window function is slid along a segment of the sound signal to cut it into multiple short-time sound segments. In some embodiments, an overlapping segmentation method may be used for framing, so that frames transition smoothly into one another and continuity is maintained. In other embodiments, a contiguous segmentation method may also be used for framing, i.e., there is no overlap between one frame and the next. After framing, a time-frequency transform such as the Fast Fourier Transform (FFT) is used to transform each frame of the sound signal from the time domain to the frequency domain, yielding multiple spectrograms. In other embodiments, the order of framing the sound signal and converting it to the frequency domain can also be exchanged.
In some embodiments, the collected sound signal may be filtered to remove undesired sounds and increase the accuracy of the machine learning model's predictions. For example, when monitoring whether the production equipment harms operators has a higher priority, it is necessary to monitor whether the sound signal contains human screams. A DSP filter or a digital filter (for example, a band-pass filter) can be used to filter out frequencies outside the range of the human voice. The filtering can be performed before or after framing the sound signal.
In some embodiments, the environmental information may include image signals collected by at least one image collection device arranged around the at least one production equipment. The image collection device may include, but is not limited to, a video camera, a camera module, a still camera, and so on. For example, a camera can be installed near or around each production equipment to collect facial images of operators close to the equipment, in order to recognize human expressions. As another example, cameras installed on the factory floor can be used to collect images of the production equipment, in order to recognize items dropped or ejected from the equipment that may injure operators or damage other objects. As yet another example, facial images and production equipment images can also be collected simultaneously, in order to recognize both human expressions and items dropped or ejected from the equipment.
Continuing with Fig. 1, the method 100 next proceeds to step 102. In step 102, a classification result of the environmental information is obtained using a machine learning model, the machine learning model being configured to output a corresponding classification result based on features extracted from the environmental information. In some embodiments, the machine learning model is a neural network model. In other embodiments, the machine learning model may also be another kind of machine learning model.
When the environmental information includes a sound signal, the classification result may include at least one of the following: a human scream, noise of abnormal operation of production equipment, sound of normal operation of production equipment, and a breaking sound. When the sound signal contains a human scream, the likely situation is that an operator has been suddenly frightened or is suffering sudden pain; when the sound signal contains noise of abnormal operation of production equipment or a breaking sound, the likely situation is that the equipment is operating abnormally or is potentially injuring an operator or damaging another object.
When the environmental information includes an image signal, the classification result may include at least one of the following: a surprised human expression, a frowning human expression, a smiling human expression, and an item dropped or ejected from the production equipment. Similarly to the sound signal, when the human expression in the image is surprised or frowning, the likely situation is that an operator has been suddenly frightened or is suffering sudden pain, or that the operator has found the production equipment in an abnormal operating state; when an item is dropped or ejected from the production equipment in the image, it indicates that the equipment is causing, or is about to cause, injury to an operator or damage to another object. Therefore, by obtaining the classification result of the environmental information with the machine learning model, it can be determined whether there is production equipment in an abnormal state in the factory.
In some embodiments, the machine learning model is trained to extract features from the environmental information. In such embodiments, step 102 further includes: extracting features from the environmental information using the machine learning model. In other embodiments, features may also be extracted manually, for example through manual input, annotation, measurement, or other configuration.
In some embodiments, before the classification result of the environmental information is obtained using the machine learning model, the method 100 further includes training the machine learning model. Training the machine learning model first includes building the machine learning model. The machine learning model can use a combination of various models in a cascaded manner, for example a model for feature extraction cascaded with a model for classification. The next step is to obtain the features of the training samples, which carry actual classification labels. Training samples can be collected from one or more memories and/or sensors, and can be stored in and/or transmitted to a buffer, memory, cache, processor, or other device for training. Training samples can also be obtained from the network or from existing databases. The number of training samples can be determined as needed. In some embodiments, certain processing may be performed on the training samples to generate additional training samples, thereby increasing the robustness of the model. For example, white noise can be superimposed on images captured by a camera to blur them, and the images can be flipped, translated, and/or rotated to generate additional training samples. In some embodiments, the training samples may be preprocessed; for example, when the environmental information is a sound signal, the sound signal samples used for training may be filtered, framed, and converted into spectra.
训练机器学习模型的下一步是利用机器学习模型基于训练样本的特征获得训练样本的分类结果。机器学习模型中的分类器能够通过分类算法获得与训练样本的特征相对应的分类结果。接着,确定实际分类标记与分类结果之间的误差并基于误差来调整机器学习模型的权重/参数。通过比较训练样本的实际分类标记与通过机器学习模型获得的分类结果之间的误差,并使得该误差最小化来调整或优化机器学习模型的权重/参数,从而机器学习模型能够较好地表示输入的训练样本与输出的分类结果之间的关系。在训练好机器学习模型之后,可以将其以文本文件的格式存储在存储器中,以供将来使用。
应当指出,并非每次在利用机器学习模型获得环境信息的分类结果之前都需要训练机器学习模型。如上面提及的,机器学习模型在训练好之后可以被保存在存储器中。因此,可以在应用机器学习模型时直接加载训练好的机器学习模型。
In some embodiments, the machine learning model may be retrained periodically or on demand. For example, after the trained machine learning model has been in use for some time, the environmental information collected during actual use may be added to the training samples to train the model, thereby increasing the robustness of the model. As another example, when the architecture of the machine learning model is changed, the machine learning model with the new architecture also needs to be retrained.
Next, in step 103, it is determined based on the classification result whether there is production equipment in an abnormal state in the factory. As mentioned above, the form taken by the classification result indicates whether there is production equipment in an abnormal state in the factory (for example, a human scream, the noise of abnormally operating production equipment, a surprised human expression, and so on), so whether there is production equipment in an abnormal state can be determined from the classification result. In some embodiments, when the environmental information corresponds to multiple classification results, rules may be set for making the determination from those multiple results. For example, when the environmental information includes a sound signal, the sound signal is divided into multiple frames, and whether there is production equipment in an abnormal state in the factory may be determined from the classification results of several consecutive frames among them. As another example, when the sound signal consists of multiple channels, the determination may be made from the classification results of the frames derived from several of those channels.
The method then proceeds to step 104: when it is determined that there is production equipment in an abnormal state in the factory, candidate production equipment in an abnormal state is determined based on a predetermined rule, so as to control at least one of the candidate production equipment in an abnormal state. When it is recognized that there is production equipment in an abnormal state in the factory, for example production equipment that is causing, or may potentially cause, injury and/or damage to operators or other objects, certain safety measures need to be taken to prevent or stop such injury and/or damage, for example an emergency stop of the production equipment causing the injury and/or damage, or an audible and/or visual alarm to attract the operators' attention. This requires determining, once it has been determined that there is production equipment in an abnormal state in the factory, which production equipment may be in that abnormal state. The predetermined rule for determining the candidate production equipment in an abnormal state may be set according to actual needs or the factory setup.
In some embodiments, the environmental information includes a sound signal, and step 104 further includes: determining the position of the sound source that emitted the sound signal; and determining the candidate production equipment in an abnormal state based on the position of the sound source. Various approaches may be used to localize the sound source. In some embodiments, a single microphone is used to collect the sound signal, and a machine learning model may be trained to localize the source. When training this machine learning model, the production equipment and the microphone are placed at fixed positions, and sound sources emit different sounds at different positions (for example, operators screaming, simulated noise of abnormally operating production equipment, simulated breaking sounds, and so on) to serve as training samples. After enough sound samples emitted at different positions relative to the microphone have been collected, these sound samples and the corresponding source positions relative to the microphone are used to train the machine learning model. In other embodiments, a microphone array is used to collect the sound signal. A microphone array is capable of localizing a sound source, for example by methods based on the time difference of arrival (TDOA), methods based on high-resolution spectral estimation, steerable-beam methods, subspace-based methods, and so on.
After the position of the sound source has been obtained, the candidate production equipment in an abnormal state can be determined based on that position according to a predetermined rule. The predetermined rule may vary with the factory setup or actual needs. In some embodiments, operators work collaboratively with the production equipment (for example, collaborative robots). In such embodiments, the predetermined rule may include: when the classification result indicates a human scream (which may mean that the production equipment is injuring an operator, for example the production equipment colliding with the operator, a part dropping from the production equipment, a chemical spilling, and so on), all production equipment near the sound source position (for example, within a range of 2 meters) is determined as candidate production equipment in an abnormal state; when the classification result indicates the noise of abnormally operating production equipment or a breaking sound (which may mean that the production equipment is operating abnormally or that a part on the production equipment has broken or dropped), all production equipment in the area of the sound source position (for example, within a range of 1 meter) is determined as candidate production equipment in an abnormal state. In other embodiments, an operator supervises the operation of one or more production equipment. In such embodiments, the predetermined rule may include: when the classification result indicates a human scream (which may mean that one or more of the production equipment the operator is supervising has failed or is operating abnormally), all production equipment within a larger range of the sound source position (for example, 3 meters), taken to be the production equipment the operator is supervising, is determined as candidate production equipment in an abnormal state; when the classification result indicates the noise of abnormally operating production equipment or a breaking sound, all production equipment in the area of the sound source position (for example, within a range of 1 meter) is determined as candidate production equipment in an abnormal state.
In some embodiments, operators work collaboratively with the production equipment (for example, collaborative robots), and a camera is arranged beside the production equipment to capture facial images of the operators. In such embodiments, the predetermined rule may include: when the classification result indicates a surprised or frowning human expression (which may mean that the production equipment is injuring the operator), the production equipment operated by the operator corresponding to the captured facial image is taken as candidate production equipment in an abnormal state. In other embodiments, an operator supervises the operation of one or more production equipment, and a camera is arranged near the production equipment to capture facial images of the operator. In such embodiments, the predetermined rule may include: when the classification result indicates a surprised or frowning human expression (which may mean that one or more of the production equipment the operator is supervising has failed or is operating abnormally), all production equipment supervised by the operator corresponding to the captured facial image is determined as candidate production equipment in an abnormal state.
The above are merely some examples of determining candidate production equipment in an abnormal state based on predetermined rules. In other embodiments, other predetermined rules may be set as needed for this determination.
In some embodiments, after the candidate production equipment in an abnormal state has been determined, a control signal may be sent to at least one of the determined candidates, for example a control signal instructing at least one of them to stop operating. In other embodiments, after the candidates have been determined, an audible and/or visual alarm may be issued indicating that these candidates may be in an abnormal state; upon receiving the alarm, operators or other staff first inspect the equipment and then control its operation. In still other embodiments, whether to send a control signal or issue an alarm may be preset as needed; for example, when the abnormal state is an emergency (for example, an operator is being injured), a stop-operation control signal is sent directly to all candidate production equipment in an abnormal state, whereas when the abnormal state is not an emergency (for example, the production equipment is operating abnormally), an alarm is issued as an indication.
Thus, the above method allows detection, and normal triggering of the safety mechanism, when items that may injure and/or damage operators or other objects drop or are ejected from the production equipment, without requiring direct contact with the production equipment and even outside the line of sight of a sensor. Moreover, the method also allows abnormal operation of the production equipment, or potential injury and/or damage to operators or other objects, to be detected and the safety mechanism triggered. Furthermore, since the sensors of conventional safety protection approaches are not required, the method can also cover a larger detection range.
The method shown in FIG. 1 for identifying production equipment in an abnormal state in a factory is described below with reference to a specific embodiment. FIG. 2 shows a schematic architecture diagram 200 of a system for identifying production equipment in an abnormal state in a factory according to an embodiment of the present disclosure. In the embodiment shown in FIG. 2 there are three production equipment 202a, 202b, and 202c. Operators work collaboratively with the production equipment 202a-202c, and it is necessary to monitor whether the production equipment 202a-202c is operating normally and whether it is injuring the operators. Microphones 203a, 203b, 203c, and 203d are arranged beside the production equipment 202a-202c; these microphones form a linear microphone array for collecting sound from the environment surrounding the production equipment 202a-202c. Using a microphone array makes it possible to cover a larger working area in the factory. A computing device 201 (for example, a server) is communicatively connected (wired or wirelessly) to the microphone array 203a-203d and to the controllers of the production equipment 202a-202c, so that it can receive the sound signals collected by the microphone array 203a-203d, perform on them a series of processing operations such as framing, frequency-domain transformation, and classification by the machine learning model, and send corresponding control signals to the controllers of the production equipment 202a-202c according to the classification results and predetermined rules.
The training process of the machine learning model is described next. In this embodiment, both the training and the application of the machine learning model are performed on the same computing device 201, i.e., the machine learning model is trained first and the trained model is then stored in local memory for future use. In some embodiments, the trained model may also be applied directly without being stored for future use. In other embodiments, the training and the application of the machine learning model may be performed on different computing devices. For example, the machine learning model may be trained on another computing device with a higher-end configuration and stored as a file after training is complete; when the machine learning model is applied, the trained model merely needs to be loaded on the computing device.
Training the machine learning model first requires building it. In this embodiment, a neural network model is used as the machine learning model. A specific machine learning model used in the embodiment of FIG. 2 is described below in conjunction with FIGS. 3-6. FIG. 3 shows an example architecture of the convolutional neural network in the machine learning model according to the embodiment of FIG. 2. FIG. 4 shows an example architecture of the recurrent neural network in the machine learning model according to the embodiment of FIG. 2. FIG. 5 shows the time-unrolled architecture of the recurrent neural network of FIG. 4. FIG. 6 shows an example of the unit A of the recurrent neural network of FIG. 4. As mentioned above, in the embodiment of FIG. 2, sound in the factory environment is collected by the microphone array 203a-203d arranged beside the three production equipment 202a-202c in the factory. The characteristics of sound vary over time on the whole, yet remain essentially stationary over a sufficiently short time interval. Therefore, to characterize how the sound changes over time, in the embodiment of FIG. 2 features of the sound signal can be extracted in both the time domain and the frequency domain to train and apply the machine learning model. In the embodiment of FIG. 2, the machine learning model may be a cascade of a convolutional neural network (CNN) and a recurrent neural network (RNN).
FIG. 3 shows the architecture 300 of the AlexNet neural network serving as the convolutional neural network. As shown in FIG. 3, the first layer structure 301 of the network architecture 300 is a convolutional layer with 96 11×11 convolution kernels applied with a stride of 4, together with two pooling layers; the second layer structure 302 is a convolutional layer with 256 5×5 convolution kernels together with two pooling layers; the third layer structure 303 and the fourth layer structure 304 are each convolutional layers with 384 3×3 convolution kernels; the fifth layer structure 305 is a convolutional layer with 256 3×3 convolution kernels together with two pooling layers; the sixth layer structure 306 and the seventh layer structure 307 are each fully connected layers with 4096-dimensional vector outputs; and the eighth layer structure 308 is a fully connected layer with a 1000-dimensional vector output. Each frame of the sound signal, after being converted into a spectrogram, is input into the first layer structure 301 of the network architecture and output from the eighth layer structure 308. In a convolutional layer, the inner product of the weights and the pixel values of a local image region is computed to extract the features of the local region; by sliding this computation over the entire image, the feature information of the entire image with respect to this weight vector is extracted. In a fully connected layer, the feature information input to the layer is converted into a multidimensional feature vector. Therefore, in the example of FIG. 3, the convolutional neural network extracts a 1000-dimensional feature vector from the spectrogram.
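A PyTorch sketch of this layer structure, not part of the original disclosure, is given below; the single-channel input, the pooling configuration, and the padding values follow the common AlexNet layout where the text above is silent, and are therefore assumptions.

```python
import torch
import torch.nn as nn

class AlexNetFeatures(nn.Module):
    """AlexNet-style feature extractor: maps one spectrogram to a
    1000-dimensional feature vector, mirroring layer structures 301-308."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 96, 11, stride=4), nn.ReLU(), nn.MaxPool2d(3, 2),    # 301
            nn.Conv2d(96, 256, 5, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),  # 302
            nn.Conv2d(256, 384, 3, padding=1), nn.ReLU(),                     # 303
            nn.Conv2d(384, 384, 3, padding=1), nn.ReLU(),                     # 304
            nn.Conv2d(384, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(3, 2)) # 305
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(4096), nn.ReLU(),    # 306
            nn.Linear(4096, 4096), nn.ReLU(),  # 307
            nn.Linear(4096, 1000))             # 308: 1000-dim feature vector

    def forward(self, x):  # x: (batch, 1, height, width) spectrogram
        return self.fc(self.conv(x))

features = AlexNetFeatures()(torch.randn(1, 1, 224, 224))  # shape (1, 1000)
```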
After the frequency-domain features of each frame of the sound signal have been extracted by the convolutional neural network, the time-domain features of the sound signal are extracted by the recurrent neural network. In the architecture 400 of the recurrent neural network shown in FIG. 4, the 1000-dimensional feature vector of the frame at the current time point t extracted by the convolutional neural network AlexNet of FIG. 3 (denoted x_t in FIG. 4, where t denotes the start time of the current frame relative to the whole sound signal, for example 0 ms, 10 ms, 20 ms, and so on) is input into the RNN unit A. The RNN unit A iterates its internal weights or parameters over time. Referring to FIG. 5, the output h_{t-1} of the RNN unit A at the previous time point t-1 serves as an input of the RNN unit A at the current time point t, so that the final output h_t retains the feature inputs starting from time 0: it carries a "knowledge memory" of previous inputs, integrates it with the "current knowledge", and updates the parameters in the RNN unit A. In this way, when the features of each frame of the sound signal are extracted, the correlation between that frame and the preceding frames is taken into account, thereby increasing the robustness of the model.
FIG. 6 shows the architecture 600 of the gated recurrent unit (GRU) adopted by the RNN unit A. Persons skilled in the art should understand that the RNN unit A may adopt other types of structures, including but not limited to a long short-term memory network (LSTM), an LSTM with peephole connections, and so on. Referring to FIG. 6, in this embodiment the GRU architecture merges the "forget gate" and the "input gate" of an LSTM into a single "update gate". In this architecture, the information of the hidden state h_{t-1} from the previous time point is inherited and combined with the input feature vector x_t as the new input at the current time point t. Formulas (1)-(4) below give the state z_t output by the update gate, the state r_t output by the reset gate, the candidate hidden state h̃_t, and the finally generated hidden state h_t, where [a, b] denotes the concatenation of vectors a and b and * denotes element-wise multiplication. The two gates use the sigmoid activation function σ, which produces values in the range 0 to 1, while the candidate hidden state uses the tanh activation function, which produces values in the range -1 to 1, so as to update the previous hidden state h_{t-1}. The finally generated hidden state h_t serves as an input of the RNN unit A at the next time point, so that the hidden state at the next time point inherits the information of the current hidden state h_t.

z_t = σ(W_z · [h_{t-1}, x_t])            (1)
r_t = σ(W_r · [h_{t-1}, x_t])            (2)
h̃_t = tanh(W · [r_t * h_{t-1}, x_t])        (3)
h_t = (1 - z_t) * h_{t-1} + z_t * h̃_t        (4)
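The following NumPy sketch, not part of the original disclosure, transcribes formulas (1)-(4) directly; the vector dimensions and the random initialization are assumptions made only for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h_prev, x_t, W_z, W_r, W):
    """One GRU step per formulas (1)-(4); each weight matrix acts on a concatenation."""
    hx = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ hx)                                    # (1) update gate
    r_t = sigmoid(W_r @ hx)                                    # (2) reset gate
    h_cand = np.tanh(W @ np.concatenate([r_t * h_prev, x_t]))  # (3) candidate state
    return (1.0 - z_t) * h_prev + z_t * h_cand                 # (4) new hidden state

hidden, feat = 128, 1000          # e.g. 1000-dim CNN features, 128-dim hidden state
rng = np.random.default_rng(0)
W_z, W_r, W = (0.01 * rng.normal(size=(hidden, hidden + feat)) for _ in range(3))
h = np.zeros(hidden)
h = gru_step(h, rng.normal(size=feat), W_z, W_r, W)
```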
Returning to FIG. 4, after the hidden state h_t at the current time point has been generated, the hidden state h_t is input into a fully connected layer to obtain the classification result Y_t. Specifically, in this embodiment the fully connected layer of the recurrent neural network has a softmax classifier. The role of this classifier is to convert the hidden state h_t into N probability values for N classes; the class with the largest probability value is output as the predicted class (i.e., the classification result) Y_t of the frame at the current time point t. In this embodiment there may be four classes: human scream, noise of abnormally operating production equipment, sound of normally operating production equipment, and breaking sound. Therefore, for the frame at the current time point t, the classifier converts the hidden state h_t into four probability values for the four classes. For example, if the four probability values obtained for the frame at the current time point t are 0.5, 0.2, 0.1, and 0.2 respectively, then the class with the largest probability value (0.5), namely human scream, is output as the classification result.
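A minimal sketch of this final classification stage (not part of the original disclosure) is shown below; feeding it the logarithms of the probabilities from the worked example reproduces exactly the values 0.5, 0.2, 0.1, and 0.2.

```python
import numpy as np

CLASSES = ["human scream", "noise of abnormal operation",
           "sound of normal operation", "breaking sound"]

def classify(logits):
    """Softmax over the N=4 classes, then pick the most probable class."""
    e = np.exp(logits - logits.max())   # subtract the max for numerical stability
    probs = e / e.sum()
    return CLASSES[int(np.argmax(probs))], probs

label, probs = classify(np.log(np.array([0.5, 0.2, 0.1, 0.2])))
print(label)   # -> human scream (largest probability, 0.5)
```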
An embodiment of the machine learning model has been described above. In this embodiment, a cascaded architecture of a convolutional neural network and a recurrent neural network is built. During training, the convolutional neural network and the recurrent neural network are trained together as an end-to-end model, so the trained parameters are more accurate, and such a cascade also makes it easy to replace the specific architectures of the convolutional and recurrent neural networks if needed in the future. In some embodiments, the convolutional neural network and the recurrent neural network may instead be trained separately; for example, a softmax classifier may be added after the last fully connected layer of the AlexNet convolutional neural network shown in FIG. 3 so that the convolutional neural network can be trained on its own. In some embodiments, other neural network models or other machine learning models may also be used.
The next step in training the machine learning model is obtaining the training data. In the embodiment of FIG. 2, sound samples of the different classes, including human screams, noise of abnormally operating production equipment, sound of normally operating production equipment, and breaking sounds, can be collected through the microphone array 203a-203d. In some embodiments, sound samples may also be obtained from the network; for example, sound clips of the different classes may be obtained as sound samples through audio/video websites or search engines. Considering that sound samples from the network were not necessarily collected in a factory environment, these sound clips may be preprocessed, for example by superimposing factory background noise, to make them better match the actual factory environment. In some embodiments, sound samples may also be obtained from existing speech or sound libraries. In some embodiments, any combination of the sound samples collected by the microphone array, the sound samples obtained from the network, and the sound samples obtained from speech or sound libraries may be used.
In this embodiment, the microphone array 203a-203d sends the collected sound samples to the computing device 201, which stores them in its memory. The computing device 201 preprocesses these stored sound signals to obtain the training samples, which includes dividing the sound signals into frames and converting each frame into the frequency domain. In this embodiment, the frame shift is set to 10 ms and the frame length to 25 ms; that is, adjacent frames overlap by 15 ms, with the first frame spanning 0 ms to 25 ms of the sound signal, the second frame spanning 10 ms to 35 ms, and so on. After framing, the computing device 201 uses a time-frequency transformation method such as the fast Fourier transform (FFT) to transform each frame of the sound signal from the time domain to the frequency domain, thereby obtaining multiple spectrograms.
Since the actual classification labels of the training samples are needed during training to adjust the parameters of the model, each frame also needs to be given a corresponding classification label after or during framing, i.e., an indication of whether the frame belongs to a human scream, noise of abnormally operating production equipment, sound of normally operating production equipment, or a breaking sound. In this way, the training samples and the corresponding actual classification labels serve as pairs of model input and desired output for training the machine learning model.
Afterwards, each spectrogram is input into the machine learning model in turn for training. For the spectrogram of the current frame, a set of features is extracted from the spectrogram. In this embodiment, the convolutional neural network in the machine learning model extracts the frequency-domain features of the current frame, and the recurrent neural network extracts its time-domain features. The output of the convolutional neural network serves as the input of the recurrent neural network; thus, the feature vector obtained from the convolutional neural network is input into the recurrent neural network for the extraction of time-domain features. Before the model is first used to extract features and predict classification results, it is initialized. Specifically, the initial values of the weights/parameters in the convolutional and recurrent neural networks may be obtained at random, while the initial hidden state h_0 may be set equal to the feature vector x_0. For each spectrogram, the classifier of the machine learning model obtains a predicted classification result Y_t based on the extracted features. Once the model has been trained, it can automatically extract features from spectrograms and output predicted classification results.
Then, for the spectrogram of the current frame, the error between the frame's actual classification label and the predicted classification result Y_t is obtained. In this embodiment, a loss function is used to represent this error, i.e., the degree of inconsistency between the predicted classification result Y_t and the actual classification label. Next, the weights/parameters of the machine learning model are adjusted based on the error. An optimization algorithm (for example, gradient descent) may be used to find the minimum of the loss function and thereby update the weights/parameters in the model. After the weights/parameters of the machine learning model have been updated, the above process is repeated for the spectrogram of the next frame until the machine learning model converges and the weights/parameters stabilize. When the machine learning model converges, training is complete, and the model may be stored in memory in the form of a text file for future use.
The process of identifying production equipment in an abnormal state in the factory using the system of FIG. 2 is described below. In this process, the trained machine learning model is applied. In the embodiment shown in FIG. 2, the linear microphone array 203a-203d collects sound from the environment surrounding the production equipment 202a-202c in real time and transmits the four collected channels of sound signals to the computing device 201. Depending on the computing power of its processor, the computing device 201 saves each received channel of the sound signal, at predetermined intervals (for example, 5 ms, 100 ms, 1 s, and so on), as an audio file of predetermined duration (for example, 1 s). The computing device 201 then divides each saved audio file into frames with a frame shift of 10 ms and a frame length of 25 ms. In other embodiments, other frame shifts and frame lengths may be used as needed, as long as they are consistent with those used when training the machine learning model. After framing, the computing device 201 may filter each frame using a DSP filter or a digital filter, for example filtering out sound signal content at frequencies above 5.5 kHz. Next, the computing device 201 transforms each frame of the sound signal from the time domain to the frequency domain by a time-frequency transformation method such as the fast Fourier transform, thereby obtaining multiple spectrograms. This completes the conversion of the sound signals collected by the microphone array 203a-203d over a period of time (for example, 1 s) into inputs for the machine learning model. In other embodiments, the received channels of sound signals may instead be pre-filtered first and then framed and converted to the frequency domain.
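The following sketch, not part of the original disclosure, chains these steps for one saved 1 s clip; it reuses the hypothetical frame_signal and frames_to_spectrogram helpers sketched earlier, assumes a 16 kHz sampling rate, and applies the 5.5 kHz cut-off mentioned above as a low-pass filter.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def preprocess_clip(audio, sample_rate=16000):
    """One saved audio clip -> model inputs: frame (25 ms length / 10 ms shift),
    remove content above 5.5 kHz, then FFT each frame."""
    frames = frame_signal(audio, sample_rate)       # helper sketched earlier
    sos = butter(4, 5500.0, btype="lowpass", fs=sample_rate, output="sos")
    frames = sosfilt(sos, frames, axis=1)           # filter each frame
    return frames_to_spectrogram(frames)            # helper sketched earlier

spectra = preprocess_clip(np.random.randn(16000))
# each row of `spectra` is fed to the trained machine learning model in turn
```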
Each of the multiple spectrograms is then input in turn into the trained machine learning model. The machine learning model first extracts features from the input spectrogram. The feature extraction process is the same as the process, described above, of extracting features from spectrograms when training the machine learning model, and is not repeated here. The machine learning model then predicts the classification result of the spectrogram from the extracted features. In this embodiment, the classification result may be one of: a human scream, noise of abnormally operating production equipment, sound of normally operating production equipment, or a breaking sound.
In this embodiment, when the classification result for a frame of the sound signal is a human scream, noise of abnormally operating production equipment, or a breaking sound, it is determined that there is production equipment in the factory that is operating abnormally, or that is injuring, or may potentially injure, an operator. In these cases, the sound source of the sound signal is first localized using the microphone array 203a-203d. In this embodiment, a method based on the time difference of arrival (TDOA) is used. Specifically, the first step of localization is to compute the time difference with which the sound signal arrives at each pair of microphones in the array 203a-203d (six pairs in total: 203a-203b, 203a-203c, 203a-203d, 203b-203c, 203b-203d, 203c-203d). Methods for computing the time difference include, but are not limited to, generalized cross-correlation (GCC), the multichannel cross-correlation coefficient (MCCC) method, and so on. The second step is to compute the direction of the sound source relative to the microphone array 203a-203d; this can be done by setting up mathematical equations from the computed time difference of each microphone pair and the planar coordinates of each microphone. The third step is to compute the position of the sound source based on the spatial coordinates of each microphone. Computation methods may include, but are not limited to, determining hyperbolas from the time difference of each microphone pair, triangulation-based methods, grid-based methods, machine-learning-based methods, and so on.
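The first localization step can be illustrated with the widely used PHAT-weighted variant of generalized cross-correlation; the sketch below is not part of the original disclosure, and the simulated 40-sample delay is an assumption made for the example.

```python
import numpy as np

def gcc_phat_delay(sig_a, sig_b, sample_rate):
    """Estimate how much sig_b lags sig_a (their TDOA) via generalized
    cross-correlation with PHAT weighting."""
    n = len(sig_a) + len(sig_b)
    cross = np.fft.rfft(sig_b, n=n) * np.conj(np.fft.rfft(sig_a, n=n))
    cross /= np.abs(cross) + 1e-12          # PHAT: keep only the phase information
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    cc = np.concatenate([cc[-max_shift:], cc[:max_shift + 1]])
    return (np.argmax(np.abs(cc)) - max_shift) / sample_rate

fs = 16000
src = np.random.randn(fs)
mic_a, mic_b = src, np.roll(src, 40)        # simulate a 40-sample arrival delay
print(gcc_phat_delay(mic_a, mic_b, fs))     # -> 0.0025 s (40 samples at 16 kHz)
```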
After the sound source position has been determined, the candidate production equipment in an abnormal state may be determined according to the following rules: when the classification result is a human scream, all production equipment near the sound source position (i.e., the operator's position), for example within 2 meters, is determined as candidate production equipment in an abnormal state; when the classification result is noise of abnormally operating production equipment or a breaking sound, the production equipment within a predetermined range of the sound source position, for example within 1 meter, is determined as candidate production equipment in an abnormal state. The computing device 201 then sends control signals to all of the candidate production equipment in an abnormal state to stop their operation.
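A toy sketch of this distance rule (not part of the original disclosure) follows; the planar equipment coordinates and the source position are hypothetical, while the 2 m and 1 m radii are taken from the rules above.

```python
import numpy as np

# Hypothetical planar coordinates (in meters) of the production equipment
EQUIPMENT = {"202a": (0.0, 0.0), "202b": (2.0, 0.0), "202c": (4.0, 0.0)}

# Search radius per classification result, per the rules of this embodiment
RADIUS = {"human scream": 2.0,
          "noise of abnormal operation": 1.0,
          "breaking sound": 1.0}

def candidates(result, source_xy):
    """Return the candidate abnormal equipment near the localized sound source."""
    r = RADIUS[result]
    return [eq for eq, (x, y) in EQUIPMENT.items()
            if np.hypot(x - source_xy[0], y - source_xy[1]) <= r]

print(candidates("human scream", (1.2, 0.5)))  # -> ['202a', '202b']
```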
In other embodiments, whether there is production equipment in an abnormal state in the factory may also be determined from multiple classification results of multiple frames of the channels of sound signals received from the microphone array. For example, at a given time, it is determined that there is production equipment in an abnormal state in the factory only when the classification results of the frames derived from at least three of the sound signals from the microphone array are all a human scream, noise of abnormally operating production equipment, or a breaking sound. As another example, it is determined that there is production equipment in an abnormal state in the factory only when the classification results of at least three consecutive frames derived from at least one of the sound signals from the microphone array are all a human scream, noise of abnormally operating production equipment, or a breaking sound. It should be noted that various rules may be set, according to needs and the factory setup, for determining from the classification results whether there is production equipment in an abnormal state in the factory.
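Both decision rules can be sketched in a few lines; the code below is not part of the original disclosure, and the thresholds simply restate the at-least-three-channels and at-least-three-consecutive-frames examples above.

```python
ALARM = {"human scream", "noise of abnormal operation", "breaking sound"}

def abnormal_by_channels(current_frame_results, min_channels=3):
    """Rule 1: at a given time, at least min_channels of the array's channels
    must classify their current frame as an alarm class."""
    return sum(r in ALARM for r in current_frame_results) >= min_channels

def abnormal_by_consecutive(channel_results, min_run=3):
    """Rule 2: one channel must yield min_run consecutive alarm frames."""
    run = 0
    for r in channel_results:
        run = run + 1 if r in ALARM else 0
        if run >= min_run:
            return True
    return False

print(abnormal_by_channels(["human scream"] * 3 + ["sound of normal operation"]))      # True
print(abnormal_by_consecutive(["sound of normal operation"] + ["breaking sound"] * 3))  # True
```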
FIG. 7 shows a block diagram of an apparatus for identifying production equipment in an abnormal state in a factory according to an embodiment of the present disclosure. Referring to FIG. 7, the apparatus 700 includes an information obtaining unit 701, an information classification unit 702, an abnormality determination unit 703, and a candidate determination unit 704. The information obtaining unit 701 is configured to obtain environmental information around at least one production equipment in the factory. The information classification unit 702 is configured to obtain a classification result of the environmental information using a machine learning model, the machine learning model being configured to output a corresponding classification result based on features extracted from the environmental information. The abnormality determination unit 703 is configured to determine, based on the classification result, whether there is production equipment in an abnormal state in the factory. The candidate determination unit 704 is configured to, when it is determined that there is production equipment in an abnormal state in the factory, determine candidate production equipment in an abnormal state based on a predetermined rule, so as to control at least one of the candidate production equipment in an abnormal state. The units in FIG. 7 may be implemented in software, in hardware (for example, integrated circuits, FPGAs, and the like), or in a combination of software and hardware.
In some embodiments, the environmental information includes a sound signal collected by at least one sound collection device arranged around the at least one production equipment. In some embodiments, the classification result includes at least one of the following: a human scream, noise of abnormally operating production equipment, sound of normally operating production equipment, and a breaking sound.
In some embodiments, the candidate determination unit 704 is further configured to: determine the position of the sound source that emitted the sound signal; and determine the candidate production equipment in an abnormal state based on the position of the sound source.
In some embodiments, the apparatus 700 further includes a signal framing module (not shown) configured to divide the sound signal into frames, and a signal conversion module (not shown) configured to convert each of the resulting frames into a spectrogram to serve as the input of the machine learning model.
In some embodiments, the environmental information includes an image signal collected by at least one image collection device arranged around the at least one production equipment. In some embodiments, the classification result includes at least one of the following: a surprised human expression, a frowning human expression, and a smiling human expression.
In some embodiments, the apparatus 700 further includes a model training module (not shown) configured to train the machine learning model. In some embodiments, the machine learning model is a neural network model.
FIG. 8 shows a block diagram of a computing device 800 for identifying production equipment in an abnormal state in a factory according to an embodiment of the present disclosure. As can be seen from FIG. 8, the computing device 800 for identifying production equipment in an abnormal state in a factory includes a processor 801 and a memory 802 coupled to the processor 801. The memory 802 is used to store computer-executable instructions that, when executed, cause the processor 801 to perform the methods in the above embodiments.
Furthermore, alternatively, the above methods can be implemented by means of a computer-readable storage medium carrying computer-readable program instructions for carrying out the various embodiments of the present disclosure. The computer-readable storage medium may be a tangible device capable of retaining and storing instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (for example, light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Accordingly, in another embodiment, the present disclosure proposes a computer-readable storage medium having computer-executable instructions stored thereon, the computer-executable instructions being for carrying out the methods in the various embodiments of the present disclosure.
In another embodiment, the present disclosure proposes a computer program product tangibly stored on a computer-readable storage medium and comprising computer-executable instructions that, when executed, cause at least one processor to perform the methods in the various embodiments of the present disclosure.
In general, the various example embodiments of the present disclosure may be implemented in hardware or special-purpose circuits, software, firmware, logic, or any combination thereof. Certain aspects may be implemented in hardware, while other aspects may be implemented in firmware or software executable by a controller, microprocessor, or other computing device. While aspects of the embodiments of the present disclosure are illustrated or described as block diagrams or flowcharts or using some other graphical representation, it will be understood that the blocks, apparatuses, systems, techniques, or methods described herein may be implemented, as non-limiting examples, in hardware, software, firmware, special-purpose circuits or logic, general-purpose hardware or controllers or other computing devices, or some combination thereof.
The computer-readable program instructions or computer program products for carrying out the various embodiments of the present disclosure can also be stored in the cloud; when they need to be invoked, a user can access the computer-readable program instructions stored in the cloud for carrying out an embodiment of the present disclosure via the mobile Internet, a fixed network, or another network, thereby implementing the technical solutions disclosed in accordance with the various embodiments of the present disclosure.
Although the embodiments of the present disclosure have been described with reference to several specific embodiments, it should be understood that the embodiments of the present disclosure are not limited to the specific embodiments disclosed. The embodiments of the present disclosure are intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims, the scope of which accords with the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.

Claims (17)

  1. A method for identifying production equipment in an abnormal state in a factory, comprising:
    obtaining environmental information around at least one production equipment in the factory;
    obtaining a classification result of the environmental information using a machine learning model, the machine learning model being configured to output a corresponding classification result based on features extracted from the environmental information;
    determining, based on the classification result, whether there is production equipment in an abnormal state in the factory; and
    when it is determined that there is production equipment in an abnormal state in the factory, determining candidate production equipment in an abnormal state based on a predetermined rule, so as to control at least one of the candidate production equipment in an abnormal state.
  2. The method of claim 1, wherein the environmental information comprises a sound signal collected by at least one sound collection device, the at least one sound collection device being arranged around the at least one production equipment.
  3. The method of claim 2, wherein the classification result comprises at least one of the following: a human scream, noise of abnormally operating production equipment, sound of normally operating production equipment, and a breaking sound.
  4. The method of claim 2, wherein, when it is determined that there is production equipment in an abnormal state in the factory, determining candidate production equipment in an abnormal state based on a predetermined rule further comprises:
    determining the position of the sound source of the sound signal; and
    determining the candidate production equipment in an abnormal state based on the position of the sound source.
  5. The method of claim 2, further comprising:
    dividing the sound signal into frames; and
    converting each of the resulting frames into a spectrogram to serve as the input of the machine learning model.
  6. The method of claim 1, wherein the environmental information comprises an image signal collected by at least one image collection device, the at least one image collection device being arranged around the at least one production equipment.
  7. The method of claim 6, wherein the classification result comprises at least one of the following: a surprised human expression, a frowning human expression, a smiling human expression, and an item dropping or being ejected from production equipment.
  8. The method of claim 1, further comprising: training the machine learning model.
  9. The method of claim 1, wherein the machine learning model is a neural network model.
  10. An apparatus for identifying production equipment in an abnormal state in a factory, comprising:
    an information obtaining unit configured to obtain environmental information around at least one production equipment in the factory;
    an information classification unit configured to obtain a classification result of the environmental information using a machine learning model, the machine learning model being configured to output a corresponding classification result based on features extracted from the environmental information;
    an abnormality determination unit configured to determine, based on the classification result, whether there is production equipment in an abnormal state in the factory; and
    a candidate determination unit configured to, when it is determined that there is production equipment in an abnormal state in the factory, determine candidate production equipment in an abnormal state based on a predetermined rule, so as to control at least one of the candidate production equipment in an abnormal state.
  11. The apparatus of claim 10, wherein the environmental information comprises a sound signal collected by at least one sound collection device, the at least one sound collection device being arranged around the at least one production equipment.
  12. The apparatus of claim 11, wherein the candidate determination unit is further configured to:
    determine the position of the sound source of the sound signal; and
    determine the candidate production equipment in an abnormal state based on the position of the sound source.
  13. The apparatus of claim 10, wherein the environmental information comprises an image signal collected by at least one image collection device, the at least one image collection device being arranged around the at least one production equipment.
  14. The apparatus of claim 10, further comprising:
    a model training module configured to train the machine learning model.
  15. A computing device, comprising:
    a processor; and
    a memory for storing computer-executable instructions that, when executed, cause the processor to perform the method of any one of claims 1-9.
  16. A computer-readable storage medium having computer-executable instructions stored thereon, the computer-executable instructions being for performing the method of any one of claims 1-9.
  17. A computer program product tangibly stored on a computer-readable storage medium and comprising computer-executable instructions that, when executed, cause at least one processor to perform the method of any one of claims 1-9.
PCT/CN2019/078152 2019-03-14 2019-03-14 Method and apparatus for identifying production equipment in an abnormal state in a factory WO2020181553A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/078152 WO2020181553A1 (zh) 2019-03-14 2019-03-14 Method and apparatus for identifying production equipment in an abnormal state in a factory

Publications (1)

Publication Number Publication Date
WO2020181553A1 true WO2020181553A1 (zh) 2020-09-17

Family

ID=72427775

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/078152 WO2020181553A1 (zh) 2019-03-14 2019-03-14 Method and apparatus for identifying production equipment in an abnormal state in a factory

Country Status (1)

Country Link
WO (1) WO2020181553A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220130411A1 (en) * 2020-10-23 2022-04-28 Institute For Information Industry Defect-detecting device and defect-detecting method for an audio device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0997392A (ja) * 1995-10-02 1997-04-08 Hitachi Zosen Corp Abnormality monitoring method and apparatus
JPH11177974A (ja) * 1997-12-15 1999-07-02 Hitachi Information Technology Co Ltd Monitoring device
CN102036158A (zh) * 2009-10-07 2011-04-27 株式会社日立制作所 Sound monitoring system and sound collection system
JP2015031766A (ja) * 2013-07-31 2015-02-16 富士通ファシリティーズ株式会社 Display program, display device and display system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19919192

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19919192

Country of ref document: EP

Kind code of ref document: A1