WO2023060574A1 - Instrument recognition method and apparatus, electronic device, and storage medium - Google Patents

Instrument recognition method and apparatus, electronic device, and storage medium

Info

Publication number
WO2023060574A1
WO2023060574A1 (PCT Application No. PCT/CN2021/124158)
Authority
WO
WIPO (PCT)
Prior art keywords
target
sample
features
encoding
feature
Prior art date
Application number
PCT/CN2021/124158
Other languages
English (en)
French (fr)
Inventor
冀潮
沈鸿翔
欧歌
姜博然
魏书琪
Original Assignee
京东方科技集团股份有限公司
北京京东方技术开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司, 北京京东方技术开发有限公司
Priority to CN202180003003.9A (published as CN116267025A)
Priority to PCT/CN2021/124158 (published as WO2023060574A1)
Publication of WO2023060574A1


Landscapes

  • Image Processing (AREA)

Abstract

The present disclosure relates to an instrument recognition method, including: determining embedded features of pixels in a target instrument image, and encoding position information of the pixels to obtain position-encoding features; and inputting superimposed features, obtained by superimposing the position-encoding features and the embedded features, into an encoder of a target model, wherein the input of the target model includes labels, and the output of the target model includes coordinates of key points in the target instrument image. According to the present disclosure, an instrument image can be processed by the trained target model to output the coordinates of the key points of the target instrument image in the target instrument image; on the one hand this reduces manual operation and improves efficiency, and on the other hand it avoids possible mis-operations during manual operation, which helps improve accuracy.

Description

Instrument recognition method and apparatus, electronic device, and storage medium
Technical Field
The present disclosure relates to the field of display technology, and in particular to an instrument recognition method, an instrument recognition apparatus, an electronic device, and a computer-readable storage medium.
Background
At present, instrument reading is generally performed manually; in some cases the reading personnel must walk to the location of the instrument and read the value by looking at the position of the pointer on the dial. Although in some cases an image of the instrument can be captured first, a person still needs to inspect the instrument image to determine the position of the pointer on the dial for reading.
Moreover, after reading, the reading also needs to be recorded; both the reading and the recording are done manually, which on the one hand is inefficient and on the other hand is not very accurate.
Summary
The present disclosure provides an instrument recognition method, an instrument recognition apparatus, an electronic device, and a computer-readable storage medium, to remedy deficiencies in the related art.
According to a first aspect of the embodiments of the present disclosure, an instrument recognition method is provided, including:
determining embedded features of pixels in a target instrument image, and encoding position information of the pixels to obtain position-encoding features, wherein the target instrument image contains a plurality of key points related to an instrument pointer as labels;
inputting superimposed features, obtained by superimposing the position-encoding features and the embedded features, into an encoder of a target model;
wherein the input of the target model includes the labels, and the output of the target model includes coordinates of the key points in a sample instrument image;
the encoder includes a multi-head self-attention layer, and the target model is used to encode the superimposed features to obtain encoded features and to determine the coordinates of the key points in the target instrument image according to the encoded features.
Optionally, before inputting the superimposed features obtained by superimposing the position-encoding features and the embedded features into the target model, the method further includes:
determining sample embedded features of pixels in a sample instrument image, and encoding sample position information of the pixels to obtain sample position-encoding features, wherein the sample instrument image contains a plurality of sample key points related to an instrument pointer as sample labels;
inputting sample superimposed features, obtained by superimposing the sample position-encoding features and the sample embedded features, into the encoder to obtain sample encoded features;
determining a training sample set based on the sample encoded features obtained from a plurality of sample instrument images;
training an initial model according to the sample encoded features in the training sample set to obtain the target model, wherein the sample model includes the encoder, the input of the sample model includes the labels, and the output of the sample model includes at least the coordinates of the sample key points in the sample instrument image.
Optionally, the key points include at least one of the following: a start position of the dial reading, an end position of the dial reading, a midpoint position of the dial reading, a start position of the instrument pointer, and an end position of the instrument pointer.
Optionally, determining the embedded features of the pixels in the target instrument image and encoding the position information of the pixels to obtain the position-encoding features includes:
dividing the target instrument image into a plurality of blocks;
determining the embedded features of the pixels in each block, and encoding the position information of the pixels to obtain the position-encoding features.
Optionally, the encoder includes a plurality of sequentially connected sub-encoders, and the target model further includes a feature pyramid;
each of the sub-encoders outputs an encoding result according to the encoded features input to it, and the encoding results are input into the feature pyramid to obtain fused features, wherein the size information corresponding to the encoding result output by each sub-encoder is different, and the size information corresponding to the encoded features input to each sub-encoder is different.
Optionally, the output of the target model further includes at least one of the following: a type of the target instrument image; coordinates of at least two diagonal corner points of a quadrilateral circumscribing the instrument in the target instrument image.
Optionally, each of the sub-encoders is connected to a linear layer, and the linear layer is used to reduce the dimensionality of the encoding result output by the sub-encoder and to input the dimension-reduced encoding result into the next sub-encoder.
Optionally, the target model includes a first target sub-model, a second target sub-model and a third target sub-model;
wherein the input of the first target sub-model includes the fused features, and its output includes the type of the target instrument image;
the input of the second target sub-model includes the fused features, and its output includes the coordinates of at least two diagonal corner points of the quadrilateral circumscribing the instrument in the target instrument image;
the input of the third target sub-model includes the fused features, and its output includes the coordinates of the key points in the target instrument image.
According to a second aspect of the embodiments of the present disclosure, an instrument recognition apparatus is provided, including one or more processors, the processors being configured to perform:
determining embedded features of pixels in a target instrument image, and encoding position information of the pixels to obtain position-encoding features, wherein the target instrument image contains a plurality of key points related to an instrument pointer as labels;
inputting superimposed features, obtained by superimposing the position-encoding features and the embedded features, into an encoder of a target model;
wherein the input of the target model includes the labels, and the output of the target model includes coordinates of the key points in a sample instrument image;
the encoder includes a multi-head self-attention layer, and the target model is used to encode the superimposed features to obtain encoded features and to determine the coordinates of the key points in the target instrument image according to the encoded features.
Optionally, the processors are further configured to perform:
determining sample embedded features of pixels in a sample instrument image, and encoding sample position information of the pixels to obtain sample position-encoding features, wherein the sample instrument image contains a plurality of sample key points related to an instrument pointer as sample labels;
inputting sample superimposed features, obtained by superimposing the sample position-encoding features and the sample embedded features, into the encoder to obtain sample encoded features;
determining a training sample set based on the sample encoded features obtained from a plurality of sample instrument images;
training an initial model according to the sample encoded features in the training sample set to obtain the target model, wherein the sample model includes the encoder, the input of the sample model includes the labels, and the output of the sample model includes at least the coordinates of the sample key points in the sample instrument image.
Optionally, the key points include at least one of the following: a start position of the dial reading, an end position of the dial reading, a midpoint position of the dial reading, a start position of the instrument pointer, and an end position of the instrument pointer.
Optionally, the processors are configured to perform:
dividing the target instrument image into a plurality of blocks;
determining the embedded features of the pixels in each block, and encoding the position information of the pixels to obtain the position-encoding features.
Optionally, the encoder includes a plurality of sequentially connected sub-encoders, and the target model further includes a feature pyramid;
each of the sub-encoders outputs an encoding result according to the encoded features input to it, and the encoding results are input into the feature pyramid to obtain fused features, wherein the size information corresponding to the encoding result output by each sub-encoder is different, and the size information corresponding to the encoded features input to each sub-encoder is different.
Optionally, each of the sub-encoders is connected to a linear layer, and the linear layer is used to reduce the dimensionality of the encoding result output by the sub-encoder and to input the dimension-reduced encoding result into the next sub-encoder.
Optionally, the output of the target model further includes at least one of the following: a type of the target instrument image; coordinates of at least two diagonal corner points of a quadrilateral circumscribing the instrument in the target instrument image.
Optionally, the target model includes a first target sub-model, a second target sub-model and a third target sub-model;
wherein the input of the first target sub-model includes the fused features, and its output includes the type of the target instrument image;
the input of the second target sub-model includes the fused features, and its output includes the coordinates of at least two diagonal corner points of the quadrilateral circumscribing the instrument in the target instrument image;
the input of the third target sub-model includes the fused features, and its output includes the coordinates of the key points in the target instrument image.
According to a third aspect of the embodiments of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing a computer program; wherein, when the computer program is executed by the processor, the instrument recognition method described in any of the above embodiments is implemented.
According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided for storing a computer program which, when executed by a processor, implements the steps in the instrument recognition method described in any of the above embodiments.
According to the embodiments of the present disclosure, an instrument image can be processed by the trained target model to output the coordinates of the key points of the target instrument image in the target instrument image, and the pointer reading in the target instrument image can then be determined according to the obtained coordinates, so that the reading of the pointer in the instrument is determined automatically from the image of the instrument; on the one hand this reduces manual operation and improves efficiency, and on the other hand it avoids possible mis-operations during manual operation, which helps improve accuracy.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit the present disclosure.
Brief Description of the Drawings
The accompanying drawings here are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the principles of the present disclosure.
Fig. 1 is a schematic flowchart of an instrument recognition method according to an embodiment of the present disclosure.
Fig. 2 is a schematic diagram of an encoder according to an embodiment of the present disclosure.
Fig. 3 is a schematic flowchart of another instrument recognition method according to an embodiment of the present disclosure.
Fig. 4 is a schematic diagram of a sub-encoder according to an embodiment of the present disclosure.
Fig. 5 is a schematic diagram of a feature pyramid according to an embodiment of the present disclosure.
Fig. 6 is a schematic block diagram of an apparatus for instrument recognition according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present disclosure.
The terms used in the embodiments of the present disclosure are for the purpose of describing particular embodiments only and are not intended to limit the embodiments of the present disclosure. The singular forms "a" and "the" used in the embodiments of the present disclosure and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present disclosure to describe various pieces of information, the information should not be limited by these terms; the terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the embodiments of the present disclosure, a first target sub-model may also be referred to as a second target sub-model, and similarly a second target sub-model may also be referred to as a first target sub-model. Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when" or "in response to determining".
For brevity and ease of understanding, the terms used herein to characterise size relationships are "greater than" or "less than", and "higher than" or "lower than". Those skilled in the art will understand, however, that the term "greater than" also covers the meaning of "greater than or equal to" and "less than" also covers the meaning of "less than or equal to"; likewise, "higher than" covers the meaning of "higher than or equal to" and "lower than" covers the meaning of "lower than or equal to".
Fig. 1 is a schematic flowchart of an instrument recognition method according to an embodiment of the present disclosure. The method shown in this embodiment can be applied to an electronic device provided with a processor, and the steps in the following embodiments can mainly be executed by the processor, wherein the electronic device includes but is not limited to a terminal and a server; the terminal may be, for example, a mobile phone, a tablet computer, a wearable device, etc., and the server may be, for example, a local server, a cloud server, etc.
As shown in Fig. 1, the instrument recognition method includes the following steps:
in step S101, determining embedded features of pixels in a target instrument image, and encoding position information of the pixels to obtain position-encoding features, wherein the target instrument image contains a plurality of key points related to an instrument pointer as labels;
in step S102, inputting superimposed features, obtained by superimposing the position-encoding features and the embedded features, into an encoder of a target model;
wherein the input of the target model includes the labels, and the output of the target model includes coordinates of the key points in a sample instrument image;
the encoder includes a multi-head self-attention layer, and the target model is used to encode the superimposed features to obtain encoded features and to determine the coordinates of the key points in the target instrument image according to the encoded features.
In one embodiment, a plurality of sample instrument images may first be collected, a training sample set is then constructed from the sample instrument images, and the target model is obtained by training on the training sample set.
For example, before inputting the superimposed features obtained by superimposing the position-encoding features and the embedded features into the target model, the method further includes:
determining sample embedded features of pixels in a sample instrument image, and encoding sample position information of the pixels to obtain sample position-encoding features, wherein the sample instrument image contains a plurality of sample key points related to an instrument pointer as sample labels;
inputting sample superimposed features, obtained by superimposing the sample position-encoding features and the sample embedded features, into the encoder to obtain sample encoded features;
determining a training sample set based on the sample encoded features obtained from a plurality of sample instrument images;
training an initial model according to the sample encoded features in the training sample set to obtain the target model, wherein the sample model includes the encoder, the input of the sample model includes the labels, and the output of the sample model includes at least the coordinates of the sample key points in the sample instrument image.
Accordingly, for a target instrument image whose reading needs to be determined, the trained target model can determine the embedding features of the pixels therein, and can also encode the position information of the pixels to obtain the position-encoding features.
For example, key points may first be determined in the target instrument image as labels; for example, the key points include at least one of the following: a start position of the dial reading, an end position of the dial reading, a midpoint position of the dial reading, a start position of the instrument pointer, and an end position of the instrument pointer.
It should be noted that the key points selected as labels can be set as needed; selecting the above five key points as labels in this embodiment is only one implementation.
Next, the pixels in the target instrument image can be expanded in one dimension (either horizontally or vertically), and the embedded features of the pixels are then obtained through a fully connected layer; for example, the dimension of the embedded features is d.
To determine the position-encoding features of the pixels, the expanded pixels can be numbered, and the position-encoding features are then obtained by random initialisation according to the pixel numbers; for example, the dimension of the position-encoding features is the same as that of the embedded features, namely d.
The position-encoding features and the embedded features can then be superimposed to obtain the superimposed features, for example by splicing the position-encoding features and the embedded features by concat.
The encoder of the target model can then process the superimposed features to obtain one or more tensors; the target model can process the tensors further to output the coordinates of the key points of the target instrument image in the target instrument image, and the pointer reading in the target instrument image can then be determined from the obtained coordinates.
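By way of illustration only (the disclosure does not prescribe any framework), the pixel expansion, fully connected embedding and randomly initialised position encoding described above might be sketched in PyTorch as follows; the module name and all dimensions are assumptions of this sketch, and the superposition is shown here as element-wise addition, with concat splicing being the alternative mentioned above:

```python
import torch
import torch.nn as nn

class PixelEmbedding(nn.Module):
    """A minimal sketch of step S101: expand the pixels in one dimension,
    project each pixel to a d-dimensional embedded feature with a fully
    connected layer, and superimpose a randomly initialised, learnable
    position-encoding feature of the same dimension d, indexed by pixel number."""
    def __init__(self, num_pixels: int, in_channels: int = 3, d: int = 128):
        super().__init__()
        self.proj = nn.Linear(in_channels, d)                 # embedded feature of dimension d
        self.pos = nn.Parameter(torch.randn(num_pixels, d))   # one encoding per pixel index

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        # img: (batch, 3, H, W) -> (batch, H*W, 3), a one-dimensional pixel expansion
        pixels = img.flatten(2).transpose(1, 2)
        return self.proj(pixels) + self.pos                   # superimposed feature

z = PixelEmbedding(num_pixels=256 * 256)(torch.rand(1, 3, 256, 256))
print(z.shape)  # torch.Size([1, 65536, 128])
```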
According to the embodiments of the present disclosure, an instrument image can be processed by the trained target model to output the coordinates of the key points of the target instrument image in the target instrument image, and the pointer reading in the target instrument image can then be determined according to the obtained coordinates, so that the reading of the pointer in the instrument is determined automatically from the image of the instrument; on the one hand this reduces manual operation and improves efficiency, and on the other hand it avoids possible mis-operations during manual operation, which helps improve accuracy.
In addition, the encoder of the target model contains a multi-head self-attention layer. For the three main parameters of the attention layer, Q (the query vector sequence), K (the key vector sequence) and V (the value vector sequence), the multi-head self-attention layer performs projections through h different linear transformations, where h is the number of heads. The multi-head self-attention layer maps Q and K into different subspaces of a high-dimensional space α to compute similarities, normalises the results, multiplies them as weights with V, and then splices (e.g. by concat) the attention information from the different subspaces. This reduces the dimension of each vector when computing the attention of each head, which helps avoid overfitting in the process of training the target model.
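As a minimal sketch of this mechanism, assuming PyTorch and illustrative values of d and h (neither is fixed by the disclosure), the h-way projection, per-subspace similarity, weighting of V and concatenation are all encapsulated in the standard multi-head attention module:

```python
import torch
import torch.nn as nn

# h parallel heads project Q, K and V into h subspaces, compute normalised
# similarities, weight V with them, and concatenate the per-head outputs.
d, h = 128, 8
attn = nn.MultiheadAttention(embed_dim=d, num_heads=h, batch_first=True)

z = torch.rand(2, 64, d)         # superimposed features, shape (bsz, n, d)
out, weights = attn(z, z, z)     # self-attention: Q = K = V = z
print(out.shape, weights.shape)  # torch.Size([2, 64, 128]) torch.Size([2, 64, 64])
```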
Fig. 2 is a schematic diagram of an encoder according to an embodiment of the present disclosure.
As shown in Fig. 2, the initial input of the encoder is Inputs, for example the target instrument image. The embedded features Input Embedding of the pixels in the target instrument image can then be determined, and the position-encoding features Positional Encoding of the pixels can also be determined; the superimposed features obtained by superimposing the position-encoding features and the embedded features are then input into the encoder of the target model. The N on the left side of the encoder indicates that N encoders can be connected in series, so that the superimposed features are processed multiple times.
Besides the multi-head self-attention (Multi-head Attention) layer, the encoder may also include a feed-forward (Feed Forward) layer and addition-and-normalisation (Add & Norm) layers; the output and input of the multi-head self-attention layer can be processed by an Add & Norm layer and then input into the Feed Forward layer, and the output and input of the Feed Forward layer can be processed by an Add & Norm layer to obtain the encoded feature output.
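A runnable sketch of such an encoder under the same assumptions (PyTorch; illustrative d, h, feed-forward width and N) might look as follows; the residual Add & Norm placement shown is one common reading of Fig. 2:

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """Multi-head self-attention -> Add & Norm -> Feed Forward -> Add & Norm."""
    def __init__(self, d: int = 128, h: int = 8, d_ff: int = 512):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, h, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d, d_ff), nn.ReLU(), nn.Linear(d_ff, d))
        self.norm1, self.norm2 = nn.LayerNorm(d), nn.LayerNorm(d)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.norm1(x + self.attn(x, x, x)[0])  # Add & Norm around the attention
        return self.norm2(x + self.ff(x))          # Add & Norm around Feed Forward

encoder = nn.Sequential(*[EncoderLayer() for _ in range(6)])  # N = 6 stacked encoders
print(encoder(torch.rand(2, 64, 128)).shape)                  # torch.Size([2, 64, 128])
```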
Fig. 3 is a schematic flowchart of another instrument recognition method according to an embodiment of the present disclosure. As shown in Fig. 3, in one embodiment, determining the embedded features of the pixels in the target instrument image and encoding the position information of the pixels to obtain the position-encoding features includes:
in step S301, dividing the target instrument image into a plurality of blocks;
in step S302, determining the embedded features of the pixels in each block, and encoding the position information of the pixels to obtain the position-encoding features.
In one embodiment, as photography and display technologies improve, image resolution becomes higher and higher and images contain more and more pixels; the instrument image in the present disclosure may thus contain a large number of pixels, and when such a large number of pixels is processed by the encoder together, the resulting encoded features represent the image semantics with relatively low accuracy.
Therefore, according to this embodiment, the target instrument image can first be divided into a plurality of blocks, for example a plurality of regular blocks, or irregular blocks; for each block, a one-dimensional expansion can then be performed in the same way to obtain the embedded features. Taking regular blocks as an example, the image can for instance be divided 3×3 into 9 blocks, or 8×8 into 64 blocks, and so on.
Embedded features and position-encoding features can thus be determined for each block, and the superimposed features obtained from them. Since the semantics represented by the superimposed features relate to a single block, which corresponds to a small number of pixels, the accuracy of the representation is relatively high, and the encoded features obtained through the encoder accordingly represent the image semantics with relatively high accuracy.
For the case in which the target instrument image is divided into a plurality of blocks, the general process of obtaining the embedded features is described below through an embodiment.
For example, the target instrument image is divided into n blocks, X_1 to X_n, with corresponding position-encoding features E_pos1 to E_posn. Each block passes through a fully connected layer E_d to obtain embedded features, where d indicates that the dimension of the embedded features obtained through the fully connected layer is d. The fully connected layer may, for example, sequentially include a linear (Linear) layer, a rectified linear unit (ReLU) layer (which acts as the activation function; other activation functions can be used as needed), and a batch normalisation (Batch Norm) layer.
Then, for the target instrument image, the superimposed feature Z can be obtained by concat as follows:
Z = concat(X_1·E_d + E_pos1, X_2·E_d + E_pos2, ..., X_n·E_d + E_posn).
For example, in the case of batch processing, the number of target instrument images processed each time is batchsize, where batchsize is greater than or equal to 1, and every target instrument image has the same size, 256×256, i.e. 256 pixels both horizontally and vertically.
The size of the images input to the target model can then be batchsize*3*256*256, where 3 denotes the three colour channels of a pixel, e.g. the RGB (red, green, blue) channels. If, for example, the target instrument image is divided 8×8 into 64 blocks and the input dimension of the encoder is 128, the size can be changed as follows:
batchsize*3*256*256 is first transformed into batchsize*3*8*8*32*32; 32*32 is then converted into 128*1 by dimensionality reduction; and the final shape is batchsize*3*64*128.
This process can be understood as a dimensionality-reduction process. The encoded features obtained after processing by the encoder can represent the semantics of each block, and are then used to determine the labels in each block; since the labels correspond to the key points, once the labels are determined the coordinates of the key points can further be determined.
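These size changes can be reproduced step by step; the following sketch (PyTorch, with an assumed batchsize of 4) follows the figures in the text, splitting the 256×256 image into an 8×8 grid of 32×32 blocks and reducing each flattened block from 32*32 = 1024 values to 128:

```python
import torch
import torch.nn as nn

# (bsz, 3, 256, 256) -> (bsz, 3, 8, 8, 32, 32) -> (bsz, 3, 64, 1024) -> (bsz, 3, 64, 128)
bsz = 4
x = torch.rand(bsz, 3, 256, 256)
blocks = x.reshape(bsz, 3, 8, 32, 8, 32).permute(0, 1, 2, 4, 3, 5)  # 8x8 grid of 32x32 blocks
blocks = blocks.reshape(bsz, 3, 64, 32 * 32)  # flatten each block to 1024 values
reduce = nn.Linear(32 * 32, 128)              # the "32*32 to 128*1" dimensionality reduction
print(reduce(blocks).shape)                   # torch.Size([4, 3, 64, 128])
```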
In one embodiment, the encoder includes a plurality of sequentially connected sub-encoders, and the target model further includes a feature pyramid;
each of the sub-encoders outputs an encoding result according to the encoded features input to it, and the encoding results are input into the feature pyramid to obtain a plurality of encoded features, wherein the size information corresponding to the encoded features input to each sub-encoder is different, and the size information corresponding to the encoding result output by each sub-encoder is different.
Since a target instrument image has only one size, for example 256×256 in the embodiment above, the encoded features determined for an image of a single size express relatively weak semantic information. To overcome this, the target instrument image can be processed, for example, by a plurality of sub-encoders that separately process encoded features with different size information, thereby obtaining a plurality of encoding results with different size information; considering the obtained encoding results together, the expressed semantic information is relatively strong.
The encoding results obtained this way, however, have strong semantic information but weak position information. To address this, in this embodiment the encoding result output by each sub-encoder can be further input into the feature pyramid for processing, so as to enrich the position information of the finally output encoded features.
In one embodiment, each of the sub-encoders is connected to a linear layer, and the linear layer is used to reduce the dimensionality of the encoding result output by the sub-encoder and to input the dimension-reduced encoding result into the next sub-encoder.
Fig. 4 is a schematic diagram of a sub-encoder according to an embodiment of the present disclosure.
In one embodiment, the description focuses on the embedded-feature part of the encoded features. For example, the encoder includes three sub-encoders, encoder A, encoder B and encoder C; encoder A is connected to linear layer Liner Layer A, encoder B to Liner Layer B, and encoder C to Liner Layer C.
For example, the embedded feature input to sub-encoder encoder A is embedding(bsz, n, d), with size information (bsz, n, d); after processing by encoder A, the size information of the encoding result stage1 is (bsz, n, d). The encoding result stage1 is then dimension-reduced by Liner Layer A, which can adjust its size information, for example to (bsz, n/4, d); the dimension-reduced encoding result can then be input into the next sub-encoder, encoder B.
Here, bsz is short for batchsize, n indicates that the target instrument image is divided into n blocks, and d is the dimension of the embedded features.
The resized embedded feature embedding(bsz, n/4, d) can then be input into sub-encoder encoder B, with size information (bsz, n/4, d); after processing by encoder B, the size information of the encoding result stage2 is (bsz, n/4, d). The encoding result stage2 is then dimension-reduced by Liner Layer B, which can adjust its size information, for example to (bsz, n/16, d); the dimension-reduced encoding result can then be input into the next sub-encoder, encoder C.
The resized embedded feature embedding(bsz, n/16, d) can then be input into sub-encoder encoder C, with size information (bsz, n/16, d); after processing by encoder C, the size information of the encoding result stage3 is (bsz, n/16, d). The encoding result stage3 can then be dimension-reduced by Liner Layer C, which can adjust its size information, for example to (bsz, n/16, d). encoder C is the last sub-encoder, and its output need not be connected to a linear layer.
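The shapes traced through Fig. 4 can be checked with a sketch such as the following (PyTorch; the standard transformer encoder layer stands in for encoder A/B/C, whose internal structure Fig. 2 describes, and the Liner Layers are modelled as linear maps acting on the block dimension n — all of this is an assumption of the sketch):

```python
import torch
import torch.nn as nn

bsz, n, d = 4, 64, 128
enc_a = nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True)
enc_b = nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True)
enc_c = nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True)
liner_a = nn.Linear(n, n // 4)        # (bsz, n, d)   -> (bsz, n/4, d)
liner_b = nn.Linear(n // 4, n // 16)  # (bsz, n/4, d) -> (bsz, n/16, d)

x = torch.rand(bsz, n, d)
stage1 = enc_a(x)                                                # (bsz, n, d)
stage2 = enc_b(liner_a(stage1.transpose(1, 2)).transpose(1, 2))  # (bsz, n/4, d)
stage3 = enc_c(liner_b(stage2.transpose(1, 2)).transpose(1, 2))  # (bsz, n/16, d)
print(stage1.shape, stage2.shape, stage3.shape)
```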
Fig. 5 is a schematic diagram of a feature pyramid according to an embodiment of the present disclosure.
As shown in Fig. 5, for the target instrument image, the bottom-to-top processing corresponds to bottom-up feature convolution, which can be realised by the plurality of sub-encoders, for example the three sub-encoders shown in Fig. 4.
Then, for the highest feature layer, e.g. the top layer on the left side of Fig. 5, processing can proceed from top to bottom as shown in Fig. 5: for example, upsampling can double the size of the feature, the channel count of a lower layer can be changed by convolution (e.g. 1×1 convolution), and the enlarged feature and the channel-adjusted feature can then be added.
By analogy, as shown in Fig. 5, three layers are determined in turn from top to bottom, and the output of each layer is called a prediction result (predict). The outputs of the three layers can be further spliced through a fully connected layer to obtain the fused features; for example, splicing the size information n, n/4 and n/16 yields (n + n/4 + n/16) as one dimension of the size information of the fused features.
In this way, position features are spliced into the strong semantic information, ensuring that the encoded features represent the instrument image more comprehensively and accurately.
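A one-dimensional analogue of this pyramid fusion, under the same assumptions, is sketched below. Note one adaptation: the text doubles the feature size per top-down step, but along the flattened block axis consecutive stages here differ by a factor of 4 (n, n/4, n/16), so the sketch upsamples by 4 to make the shapes meet; d is shared across stages in this toy setting, so the 1×1 channel-adjusting convolution is not needed:

```python
import torch
import torch.nn.functional as F

bsz, n, d = 4, 64, 128
stage1 = torch.rand(bsz, n, d)
stage2 = torch.rand(bsz, n // 4, d)
stage3 = torch.rand(bsz, n // 16, d)

# Top-down pass: upsample the higher layer along the block axis and add it
# to the layer below (the bottom-up pass came from the sub-encoders above).
p3 = stage3
p2 = stage2 + F.interpolate(p3.transpose(1, 2), scale_factor=4).transpose(1, 2)
p1 = stage1 + F.interpolate(p2.transpose(1, 2), scale_factor=4).transpose(1, 2)

# Fusion: splice the three prediction levels along the block dimension,
# giving the (n + n/4 + n/16) dimension mentioned in the text.
fused = torch.cat([p1, p2, p3], dim=1)
print(fused.shape)  # torch.Size([4, 84, 128]), since 64 + 16 + 4 = 84
```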
In one embodiment, the output of the target model further includes at least one of the following:
a type of the target instrument image; coordinates of at least two diagonal corner points of the quadrilateral circumscribing the instrument in the target instrument image.
Besides outputting the coordinates of the key points in the target instrument image, the initial model can be constructed during training so that the trained target model can output the type of the target instrument image, the coordinates of at least two diagonal corner points of the quadrilateral circumscribing the instrument in the target instrument image, and so on.
From the type of the target instrument image output by the target model, it can be determined whether the target instrument image is an instrument image at all: if it contains no instrument, its type is not an instrument image. In that case, the processing of the target model can be suspended, so as not to waste memory.
From the coordinates of at least two diagonal corner points of the circumscribed quadrilateral output by the target model, the position of the instrument in the target image can be determined, which facilitates subsequently determining the coordinates of the key points in the target instrument image accurately.
In one embodiment, the target model includes a first target sub-model, a second target sub-model and a third target sub-model;
wherein the input of the first target sub-model includes the fused features, and its output includes the type of the target instrument image;
the input of the second target sub-model includes the fused features, and its output includes the coordinates of at least two diagonal corner points of the quadrilateral circumscribing the instrument in the target instrument image;
the input of the third target sub-model includes the fused features, and its output includes the coordinates of the key points in the target instrument image.
In one embodiment, a plurality of sub-models can be constructed to form the target model, for example the first, second and third target sub-models.
The input of the three sub-models can be the fused features described above, and the fully connected layers in the sub-models can be configured according to the required output. For example:
the size information tensor1 of the fully connected layer in the first target sub-model is (bsz, (n+n/4+n/16)*d, 2), where the third dimension, 2, indicates that two results can be output, for example 1 and 0, with 1 indicating that the target instrument image contains an instrument and 0 indicating that it does not;
the size information tensor2 of the fully connected layer in the second target sub-model is (bsz, (n+n/4+n/16)*d, 4), where the third dimension, 4, indicates that four values can be output, the four values forming the two coordinate pairs of the diagonal corner points mentioned above;
the size information tensor3 of the fully connected layer in the third target sub-model is (bsz, (n+n/4+n/16)*d, 10), where the third dimension, 10, indicates that ten values can be output, the ten values forming the coordinates of the five key points mentioned above.
The first target sub-model can implement a classification task, and the loss function type used can be cross entropy; the second target sub-model can implement a regression task, and the loss function type used can be L1 loss; the third target sub-model can likewise implement a regression task, with L1 loss as the loss function type.
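Continuing the sketch, the three heads and their losses might be written as follows (PyTorch; the flattening of the fused features and the random targets are assumptions made purely to exercise the tensor1/tensor2/tensor3 shapes and loss types described above):

```python
import torch
import torch.nn as nn

bsz, n, d = 4, 64, 128
m = n + n // 4 + n // 16                  # 84, the fused block count
fused = torch.rand(bsz, m, d).flatten(1)  # (bsz, (n + n/4 + n/16) * d)

head1 = nn.Linear(m * d, 2)   # classification: instrument present (1) or not (0)
head2 = nn.Linear(m * d, 4)   # regression: two diagonal corner points (x, y)
head3 = nn.Linear(m * d, 10)  # regression: five key points (x, y)

cls_loss = nn.CrossEntropyLoss()(head1(fused), torch.randint(0, 2, (bsz,)))
box_loss = nn.L1Loss()(head2(fused), torch.rand(bsz, 4))
kpt_loss = nn.L1Loss()(head3(fused), torch.rand(bsz, 10))
print(cls_loss.item(), box_loss.item(), kpt_loss.item())
```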
It should be noted that the relationship between each encoded feature and the sub-models is not limited to the situations described in the above embodiments and can be adjusted as needed.
Corresponding to the above embodiments of the instrument recognition method, the embodiments of the present disclosure also provide embodiments of an instrument recognition apparatus.
The embodiments of the present disclosure further provide an instrument recognition apparatus, which can be applied to an electronic device provided with a processor; the steps in the following embodiments can mainly be executed by the processor, wherein the electronic device includes but is not limited to a terminal and a server, the terminal being, for example, a mobile phone, a tablet computer, a wearable device, etc., and the server being, for example, a local server, a cloud server, etc.
In one embodiment, the instrument recognition apparatus includes one or more processors, the processors being configured to perform:
determining embedded features of pixels in a target instrument image, and encoding position information of the pixels to obtain position-encoding features, wherein the target instrument image contains a plurality of key points related to an instrument pointer as labels;
inputting superimposed features, obtained by superimposing the position-encoding features and the embedded features, into an encoder of a target model;
wherein the input of the target model includes the labels, and the output of the target model includes coordinates of the key points in a sample instrument image;
the encoder includes a multi-head self-attention layer, and the target model is used to encode the superimposed features to obtain encoded features and to determine the coordinates of the key points in the target instrument image according to the encoded features.
In one embodiment, the processors are further configured to perform:
determining sample embedded features of pixels in a sample instrument image, and encoding sample position information of the pixels to obtain sample position-encoding features, wherein the sample instrument image contains a plurality of sample key points related to an instrument pointer as sample labels;
inputting sample superimposed features, obtained by superimposing the sample position-encoding features and the sample embedded features, into the encoder to obtain sample encoded features;
determining a training sample set based on the sample encoded features obtained from a plurality of sample instrument images;
training an initial model according to the sample encoded features in the training sample set to obtain the target model, wherein the sample model includes the encoder, the input of the sample model includes the labels, and the output of the sample model includes at least the coordinates of the sample key points in the sample instrument image.
In one embodiment, the key points include at least one of the following: a start position of the dial reading, an end position of the dial reading, a midpoint position of the dial reading, a start position of the instrument pointer, and an end position of the instrument pointer.
In one embodiment, the processors are configured to perform:
dividing the target instrument image into a plurality of blocks;
determining the embedded features of the pixels in each block, and encoding the position information of the pixels to obtain the position-encoding features.
In one embodiment, the encoder includes a plurality of sequentially connected sub-encoders, and the target model further includes a feature pyramid;
each of the sub-encoders outputs an encoding result according to the encoded features input to it, and the encoding results are input into the feature pyramid to obtain fused features, wherein the size information corresponding to the encoding result output by each sub-encoder is different, and the size information corresponding to the encoded features input to each sub-encoder is different.
In one embodiment, each of the sub-encoders is connected to a linear layer, and the linear layer is used to reduce the dimensionality of the encoding result output by the sub-encoder and to input the dimension-reduced encoding result into the next sub-encoder.
In one embodiment, the output of the target model further includes at least one of the following: a type of the target instrument image; coordinates of at least two diagonal corner points of the quadrilateral circumscribing the instrument in the target instrument image.
In one embodiment, the target model includes a first target sub-model, a second target sub-model and a third target sub-model;
wherein the input of the first target sub-model includes the fused features, and its output includes the type of the target instrument image;
the input of the second target sub-model includes the fused features, and its output includes the coordinates of at least two diagonal corner points of the quadrilateral circumscribing the instrument in the target instrument image;
the input of the third target sub-model includes the fused features, and its output includes the coordinates of the key points in the target instrument image.
Regarding the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the related method and will not be elaborated here.
Since the apparatus embodiments basically correspond to the method embodiments, reference can be made to the description of the method embodiments for the relevant parts. The apparatus embodiments described above are merely illustrative: the modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules, i.e. they may be located in one place or distributed over a plurality of network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment, which those of ordinary skill in the art can understand and implement without creative effort.
The embodiments of the present disclosure further provide an electronic device, including: a processor; and a memory for storing a computer program; wherein, when the computer program is executed by the processor, the instrument recognition method described in any of the above embodiments is implemented.
The embodiments of the present disclosure further provide a computer-readable storage medium for storing a computer program which, when executed by a processor, implements the steps in the instrument recognition method described in any of the above embodiments.
Fig. 6 is a schematic block diagram of an apparatus 600 for instrument recognition according to an embodiment of the present disclosure. For example, the apparatus 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
Referring to Fig. 6, the apparatus 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.
The processing component 602 generally controls the overall operation of the apparatus 600, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 602 may include one or more processors 620 to execute instructions so as to complete all or some of the steps of the instrument recognition method described above. In addition, the processing component 602 may include one or more modules that facilitate interaction between the processing component 602 and other components; for example, it may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operation at the apparatus 600. Examples of such data include instructions for any application or method operating on the apparatus 600, contact data, phonebook data, messages, pictures, videos, and the like. The memory 604 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
The power component 606 supplies power to the various components of the apparatus 600. The power component 606 may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the apparatus 600.
The multimedia component 608 includes a screen that provides an output interface between the apparatus 600 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen can be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes and gestures on the touch panel; the touch sensors may not only sense the boundary of a touch or swipe action but also detect the duration and pressure associated with it. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera; when the apparatus 600 is in an operating mode such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front or rear camera can be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a microphone (MIC) configured to receive external audio signals when the apparatus 600 is in an operating mode such as a call mode, a recording mode or a speech recognition mode. The received audio signals may be further stored in the memory 604 or sent via the communication component 616. In some embodiments, the audio component 610 also includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, etc. These buttons may include but are not limited to: a home button, volume buttons, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing status assessments of various aspects of the apparatus 600. For example, the sensor component 614 can detect the open/closed state of the apparatus 600 and the relative positioning of components (e.g. the display and keypad of the apparatus 600), and can also detect a change in position of the apparatus 600 or one of its components, the presence or absence of user contact with the apparatus 600, the orientation or acceleration/deceleration of the apparatus 600, and temperature changes of the apparatus 600. The sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, and may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
The communication component 616 is configured to facilitate wired or wireless communication between the apparatus 600 and other devices. The apparatus 600 can access wireless networks based on communication standards, such as WiFi, 2G or 3G, 4G LTE, 5G NR, or a combination thereof. In one exemplary embodiment, the communication component 616 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 616 also includes a near field communication (NFC) module to facilitate short-range communication; the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
In an exemplary embodiment, the apparatus 600 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components, for performing the instrument recognition method described above.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 604 including instructions, which can be executed by the processor 620 of the apparatus 600 to complete the instrument recognition method described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Those skilled in the art will readily contemplate other embodiments of the present disclosure after considering the specification and practising the disclosure disclosed herein. The present disclosure is intended to cover any variations, uses or adaptations of the present disclosure that follow its general principles and include common general knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes can be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
It should be noted that, herein, relational terms such as first and second are only used to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. The terms "comprise", "include" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
The method and apparatus provided by the embodiments of the present disclosure have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present disclosure, and the description of the above embodiments is only intended to help understand the method of the present disclosure and its core idea. Meanwhile, those of ordinary skill in the art may, based on the idea of the present disclosure, make changes to the specific implementation and scope of application; in summary, the contents of this specification should not be construed as limiting the present disclosure.

Claims (18)

  1. An instrument recognition method, characterized by comprising:
    determining embedded features of pixels in a target instrument image, and encoding position information of the pixels to obtain position-encoding features, wherein the target instrument image contains a plurality of key points related to an instrument pointer as labels;
    inputting superimposed features, obtained by superimposing the position-encoding features and the embedded features, into an encoder of a target model;
    wherein the input of the target model comprises the labels, and the output of the target model comprises coordinates of the key points in a sample instrument image;
    the encoder comprises a multi-head self-attention layer, and the target model is used to encode the superimposed features to obtain encoded features and to determine the coordinates of the key points in the target instrument image according to the encoded features.
  2. The method according to claim 1, characterized in that, before inputting the superimposed features obtained by superimposing the position-encoding features and the embedded features into the target model, the method further comprises:
    determining sample embedded features of pixels in a sample instrument image, and encoding sample position information of the pixels to obtain sample position-encoding features, wherein the sample instrument image contains a plurality of sample key points related to an instrument pointer as sample labels;
    inputting sample superimposed features, obtained by superimposing the sample position-encoding features and the sample embedded features, into the encoder to obtain sample encoded features;
    determining a training sample set based on the sample encoded features obtained from a plurality of sample instrument images;
    training an initial model according to the sample encoded features in the training sample set to obtain the target model, wherein the sample model comprises the encoder, the input of the sample model comprises the labels, and the output of the sample model comprises at least the coordinates of the sample key points in the sample instrument image.
  3. The method according to claim 1, characterized in that the key points comprise at least one of the following:
    a start position of the dial reading, an end position of the dial reading, a midpoint position of the dial reading, a start position of the instrument pointer, and an end position of the instrument pointer.
  4. The method according to claim 1, characterized in that determining the embedded features of the pixels in the target instrument image and encoding the position information of the pixels to obtain the position-encoding features comprises:
    dividing the target instrument image into a plurality of blocks;
    determining the embedded features of the pixels in each block, and encoding the position information of the pixels to obtain the position-encoding features.
  5. The method according to claim 4, characterized in that the encoder comprises a plurality of sequentially connected sub-encoders, and the target model further comprises a feature pyramid;
    each of the sub-encoders outputs an encoding result according to the encoded features input to it, and the encoding results are input into the feature pyramid to obtain fused features, wherein the size information corresponding to the encoding result output by each sub-encoder is different, and the size information corresponding to the encoded features input to each sub-encoder is different.
  6. The method according to claim 5, characterized in that each of the sub-encoders is connected to a linear layer, and the linear layer is used to reduce the dimensionality of the encoding result output by the sub-encoder and to input the dimension-reduced encoding result into the next sub-encoder.
  7. The method according to claim 5, characterized in that the output of the target model further comprises at least one of the following:
    a type of the target instrument image; coordinates of at least two diagonal corner points of a quadrilateral circumscribing the instrument in the target instrument image.
  8. The method according to claim 7, characterized in that the target model comprises a first target sub-model, a second target sub-model and a third target sub-model;
    wherein the input of the first target sub-model comprises the fused features, and its output comprises the type of the target instrument image;
    the input of the second target sub-model comprises the fused features, and its output comprises the coordinates of at least two diagonal corner points of the quadrilateral circumscribing the instrument in the target instrument image;
    the input of the third target sub-model comprises the fused features, and its output comprises the coordinates of the key points in the target instrument image.
  9. An instrument recognition apparatus, characterized by comprising one or more processors, the processors being configured to perform:
    determining embedded features of pixels in a target instrument image, and encoding position information of the pixels to obtain position-encoding features, wherein the target instrument image contains a plurality of key points related to an instrument pointer as labels;
    inputting superimposed features, obtained by superimposing the position-encoding features and the embedded features, into an encoder of a target model;
    wherein the input of the target model comprises the labels, and the output of the target model comprises coordinates of the key points in a sample instrument image;
    the encoder comprises a multi-head self-attention layer, and the target model is used to encode the superimposed features to obtain encoded features and to determine the coordinates of the key points in the target instrument image according to the encoded features.
  10. The apparatus according to claim 9, characterized in that the processors are further configured to perform:
    determining sample embedded features of pixels in a sample instrument image, and encoding sample position information of the pixels to obtain sample position-encoding features, wherein the sample instrument image contains a plurality of sample key points related to an instrument pointer as sample labels;
    inputting sample superimposed features, obtained by superimposing the sample position-encoding features and the sample embedded features, into the encoder to obtain sample encoded features;
    determining a training sample set based on the sample encoded features obtained from a plurality of sample instrument images;
    training an initial model according to the sample encoded features in the training sample set to obtain the target model, wherein the sample model comprises the encoder, the input of the sample model comprises the labels, and the output of the sample model comprises at least the coordinates of the sample key points in the sample instrument image.
  11. The apparatus according to claim 9, characterized in that the key points comprise at least one of the following:
    a start position of the dial reading, an end position of the dial reading, a midpoint position of the dial reading, a start position of the instrument pointer, and an end position of the instrument pointer.
  12. The apparatus according to claim 9, characterized in that the processors are configured to perform:
    dividing the target instrument image into a plurality of blocks;
    determining the embedded features of the pixels in each block, and encoding the position information of the pixels to obtain the position-encoding features.
  13. The apparatus according to claim 12, characterized in that the encoder comprises a plurality of sequentially connected sub-encoders, and the target model further comprises a feature pyramid;
    each of the sub-encoders outputs an encoding result according to the encoded features input to it, and the encoding results are input into the feature pyramid to obtain fused features, wherein the size information corresponding to the encoding result output by each sub-encoder is different, and the size information corresponding to the encoded features input to each sub-encoder is different.
  14. The apparatus according to claim 13, characterized in that each of the sub-encoders is connected to a linear layer, and the linear layer is used to reduce the dimensionality of the encoding result output by the sub-encoder and to input the dimension-reduced encoding result into the next sub-encoder.
  15. The apparatus according to claim 13, characterized in that the output of the target model further comprises at least one of the following:
    a type of the target instrument image; coordinates of at least two diagonal corner points of a quadrilateral circumscribing the instrument in the target instrument image.
  16. The apparatus according to claim 15, characterized in that the target model comprises a first target sub-model, a second target sub-model and a third target sub-model;
    wherein the input of the first target sub-model comprises the fused features, and its output comprises the type of the target instrument image;
    the input of the second target sub-model comprises the fused features, and its output comprises the coordinates of at least two diagonal corner points of the quadrilateral circumscribing the instrument in the target instrument image;
    the input of the third target sub-model comprises the fused features, and its output comprises the coordinates of the key points in the target instrument image.
  17. An electronic device, characterized by comprising:
    a processor;
    a memory for storing a computer program;
    wherein, when the computer program is executed by the processor, the instrument recognition method according to any one of claims 1 to 8 is implemented.
  18. A computer-readable storage medium for storing a computer program, characterized in that, when the computer program is executed by a processor, the steps in the instrument recognition method according to any one of claims 1 to 8 are implemented.
PCT/CN2021/124158 2021-10-15 2021-10-15 Instrument recognition method and apparatus, electronic device, and storage medium WO2023060574A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180003003.9A 2021-10-15 2021-10-15 Instrument recognition method and apparatus, electronic device, and storage medium CN116267025A (zh)
PCT/CN2021/124158 2021-10-15 2021-10-15 Instrument recognition method and apparatus, electronic device, and storage medium WO2023060574A1 (zh)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/124158 WO2023060574A1 (zh) 2021-10-15 2021-10-15 仪表识别方法、装置、电子设备和存储介质

Publications (1)

Publication Number Publication Date
WO2023060574A1 (zh)

Family

ID=85987997

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/124158 WO2023060574A1 (zh) 2021-10-15 2021-10-15 仪表识别方法、装置、电子设备和存储介质

Country Status (2)

Country Link
CN (1) CN116267025A (zh)
WO (1) WO2023060574A1 (zh)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200034436A1 (en) * 2018-07-26 2020-01-30 Google Llc Machine translation using neural network models
CN110826549A (zh) * 2019-11-04 2020-02-21 山东欧玛嘉宝电气科技有限公司 Computer-vision-based instrument image recognition method and system for inspection robots
CN113127631A (zh) * 2021-04-23 2021-07-16 重庆邮电大学 Text summarization method based on multi-head self-attention mechanism and pointer network
CN113221874A (zh) * 2021-06-09 2021-08-06 上海交通大学 Character recognition system based on Gabor convolution and linear sparse attention

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Lukasz; Polosukhin, Illia: "Attention Is All You Need", 9 December 2017 (2017-12-09), Long Beach, CA, USA, pages 1-11, XP055832424. Retrieved from the Internet: <URL:https://papers.nips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf> [retrieved on 2021-08-17] *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117351229A (zh) * 2023-09-25 2024-01-05 昆仑数智科技有限责任公司 Instrument reading method and apparatus, and storage medium

Also Published As

Publication number Publication date
CN116267025A (zh) 2023-06-20

Similar Documents

Publication Publication Date Title
US10007841B2 Human face recognition method, apparatus and terminal
CN109658401B Image processing method and apparatus, electronic device and storage medium
US10115019B2 Video categorization method and apparatus, and storage medium
CN104700353B Image filter generation method and apparatus
WO2017071083A1 Fingerprint identification method and apparatus
CN107527059A Character recognition method, apparatus and terminal
CN105302315A Picture processing method and apparatus
CN106373156A Method, apparatus and terminal device for determining spatial parameters from an image
KR20110020746A Method for providing object information and image pickup device applying the same
CN109360261A Image processing method and apparatus, electronic device and storage medium
EP3767488A1 Method and device for processing untagged data, and storage medium
CN107944367A Face key point detection method and apparatus
CN104867112B Photo processing method and apparatus
CN108009563B Image processing method and apparatus, and terminal
CN105635452A Mobile terminal and contact identification method thereof
CN109271552A Method and apparatus for retrieving a video by picture, electronic device and storage medium
CN109255128A Multi-level label generation method, apparatus and storage medium
CN110399841A Video classification method and apparatus, and electronic device
CN107426088A Picture information processing method and apparatus
CN108122212A Image inpainting method and apparatus
CN110399934A Video classification method and apparatus, and electronic device
US20230224574A1 Photographing method and apparatus
CN113965694A Video recording method, electronic device and computer-readable storage medium
WO2023060574A1 Instrument recognition method and apparatus, electronic device and storage medium
CN104240274B Face image processing method and apparatus

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 18272360

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE