CN112766349A - Object description generation method based on machine vision and tactile perception - Google Patents

Object description generation method based on machine vision and tactile perception

Info

Publication number
CN112766349A
CN112766349A (application CN202110037740.2A)
Authority
CN
China
Prior art keywords
tactile
visual
label
keywords
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110037740.2A
Other languages
Chinese (zh)
Other versions
CN112766349B (en)
Inventor
张鹏
周茂辉
单东日
邹文凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Original Assignee
Qilu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology filed Critical Qilu University of Technology
Priority to CN202110037740.2A priority Critical patent/CN112766349B/en
Publication of CN112766349A publication Critical patent/CN112766349A/en
Application granted granted Critical
Publication of CN112766349B publication Critical patent/CN112766349B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract


The invention relates to an object description generation method based on machine vision and tactile perception. The method takes the machine vision and tactile information of an object as input, uses a deep learning method to identify the object's category and physical properties, and then converts the identification results into keywords that form an object description sentence. The proposed method was trained and tested on the visual and tactile data set (PHAC-2) published by the University of Pennsylvania, where the prediction accuracies of the category keywords and the physical attribute keywords reached 100% and 97.8%, respectively. This method of forming description sentences after a robot explores and perceives an object can effectively promote the development of human-computer interaction technology in the field of robot perception.


Description

Object description generation method based on machine vision and tactile perception
Technical Field
The invention relates to the technical fields of robot perception, multi-modal fusion and object description generation, and in particular to an object description generation method based on machine vision and tactile perception.
Background
With the development of sensor technology and artificial intelligence, the perception and decision-making capabilities of robots continue to improve, and robots are gradually shifting from purely machine-like behaviour toward more human-like capabilities. However, the cognitive and discriminative ability of robots for objects is still far below that of humans.
Humans combine visual and tactile information to accomplish object recognition. Functional magnetic resonance imaging data indicate that human tactile and visual signals are processed in a multi-sensory, coordinated manner when identifying objects. Inspired by this cross-modal co-processing in the human brain, researchers have used tactile and visual signals to design deep learning frameworks for tactile attribute classification, demonstrating that the two signals are complementary and that combining both forms of data improves performance.
Generating object descriptions at the level of visual and tactile perception through robotic exploration is of great significance. The technology can effectively increase the sense of participation and accomplishment of disabled people in daily life; at the same time, object description technology can be applied in high-risk environments, letting robots replace humans in exploring and perceiving objects and producing corresponding feedback reports, which effectively reduces harm to people. At present there is no corresponding object description generation method based on visual and tactile perception, so the method provided by the invention fills this technical gap.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides an object description generation method based on the fusion of tactile vibration signals and visual images.
The invention is realized by the following technical scheme:
an object description generation method based on machine vision and tactile perception is characterized by comprising the following steps:
S1, preprocessing the raw visual and tactile data;
S2, inputting the collected visual and tactile information into a two-dimensional convolutional neural network and a one-dimensional convolutional neural network respectively, and concatenating the feature vectors output by the two networks to obtain a visual-tactile fusion feature vector;
S3, inputting the obtained visual-tactile fusion feature vector into two fully-connected network branches, where the first fully-connected network is used for object recognition and classification and the second fully-connected network is used for recognizing the physical attributes of the object;
and S4, embedding the classification results and physical attributes obtained by the two fully-connected networks into an object description sentence in the form of keywords.
Further, in order to better implement the present invention, in S1, the visual information is preprocessed by resizing the original high-resolution image to 300 × 300 pixels and randomly applying a 30% offset to the brightness, contrast and saturation of the picture to obtain the final input image.
Further, in order to better implement the present invention, in S1, the tactile information is preprocessed by cutting the data with matlab software and compressing the multi-dimensional data of different lengths, so that tactile data of uniform length are finally obtained.
Further, in order to better implement the present invention, in S2, the visual and tactile information are input in pairs: the visual information is fed into the two-dimensional convolutional neural network and the tactile information into the one-dimensional convolutional neural network; three one-dimensional convolutional layers are used to process the tactile information, with the ReLU function as the activation function; the DenseNet169 model is used to process the visual information.
Further, in order to better implement the present invention, in S3, the supervision labels used by the two fully-connected networks take the standard label forms of a multi-class task and a multi-label task; the multi-branch output neural network has two branches and therefore two loss functions: the multi-class task uses the cross-entropy loss function, and the multi-label task uses the multi-label classification loss function MultiLabelSoftMarginLoss() provided by the pytorch neural network framework.
Further, in order to better implement the present invention, in S4, the classification results and physical attributes are converted into keywords as follows: the object category keywords are sorted to form a list of n elements, and the index value of each object category keyword is used as the label of the object, each object having exactly one label; the output of the multi-class task is n probability values, and the corresponding object category keyword is found from the index of the value with the largest probability; label generation in the multi-label classification task is similar: the m physical attribute keywords are first sorted to form a list of m elements, and the multi-label vector consists of m elements corresponding to the m physical attribute keywords; to obtain the physical attribute keywords from the multi-label classification network output, the indices whose predicted value is 1 are obtained, and the corresponding attributes are then retrieved from the physical attribute keyword list according to these indices, completing the extraction of the physical attribute keywords.
The invention has the beneficial effects that:
the object description generation method based on machine vision and tactile perception provided by the invention constructs a multi-branch network model that simultaneously predicts object category keywords and physical attribute keywords, and then forms a description sentence of the object from the predicted keywords. The method effectively improves the robot's ability to express what it perceives of its surroundings and makes the robot more intelligent in human-computer interaction.
Drawings
FIG. 1 is a schematic diagram of the multi-branch network of the present invention;
FIG. 2 is a schematic diagram of image data processing in a data set according to the present invention;
FIG. 3 is a schematic diagram of tactile data processing in the data set according to the present invention;
FIG. 4 is a multi-category label mapping of the present invention;
FIG. 5 is a diagram of a multi-label classification label correspondence of the present invention;
FIG. 6 is a diagram of the results of various physical property predictions of the present invention;
FIG. 7 shows the object class prediction result of the present invention.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
Figs. 1-7 illustrate a specific embodiment of the present invention, an object description generation method based on visual and tactile perception. As shown in Fig. 1, this embodiment proposes a multi-branch neural network structure with multi-modal input and multi-level output, which takes machine vision and tactile sensing as its two modal inputs: the visual input is fed into a two-dimensional convolutional neural network and the tactile input into a one-dimensional convolutional neural network. The feature vectors output by the two-dimensional and one-dimensional convolutional networks are then concatenated to obtain a visual-tactile fusion feature vector. Finally, the visual-tactile fusion feature vector is fed into two fully-connected network branches: the first fully-connected network outputs the object category predicted from the fusion vector, and the second predicts the physical attributes of the object from the same vector. In addition, this embodiment provides an object description generation method that converts the classification result and the physical attributes output by the multi-branch network into keywords and then embeds these keywords into a description sentence template.
The specific implementation process of this embodiment is as follows:
1. Data set
the method of this embodiment was trained and tested on the PHAC-2 data set published by the University of Pennsylvania, which contains visual and tactile data for 53 objects. The visual data for each object consists of 8 photographs, collected by placing the object on an aluminum disk and photographing it once for every 45 degrees of rotation. The tactile data consist of two pressure values, micro-vibration and temperature values, recorded while each object is squeezed, pinched, slid over slowly and slid over quickly. The data set also contains 24 tactile adjectives describing physical properties of the objects, including softness, hardness, temperature, viscosity, elasticity, etc. Each object in the data set is assigned several tactile adjectives; to exclude chance, the adjectives of each object were determined collectively by 36 people.
The method proposed in this embodiment requires the data set to be divided into a training set and a test set; one visual sample and one tactile sample are extracted from each object as the test set. To ensure fairness, during test-set selection a number a between 1 and 8 is randomly generated by the computer for each object, and the a-th image and the a-th tactile recording of that object are taken.
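A minimal sketch of this per-object test selection, assuming each object's eight images and eight tactile recordings are stored in parallel Python lists (the function and variable names are illustrative, not from the patent):

```python
import random

def split_object(images, tactile_recordings):
    """Hold out the a-th (image, tactile) pair of one object as test data; the rest is training data."""
    a = random.randint(1, 8)                     # random number a between 1 and 8
    idx = a - 1                                  # 0-based index of the held-out sample
    test_pair = (images[idx], tactile_recordings[idx])
    train_pairs = [(img, tac)
                   for i, (img, tac) in enumerate(zip(images, tactile_recordings))
                   if i != idx]
    return train_pairs, test_pair
```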
To reduce the number of network parameters, the image data are resized to 300 × 300 pictures. Because the visual information of a robot is most affected by lighting, a 30% random offset is applied to the brightness, contrast and saturation of each picture to improve the robustness of the model.
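This preprocessing can be sketched with torchvision transforms; mapping the 30% random offset onto a ColorJitter factor of 0.3 is an assumption, since the patent does not name a specific library for this step:

```python
import torchvision.transforms as T

# Resize to 300 x 300 and randomly jitter brightness, contrast and saturation by up to 30%.
image_transform = T.Compose([
    T.Resize((300, 300)),
    T.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    T.ToTensor(),                 # PIL image -> (3, 300, 300) float tensor in [0, 1]
])
```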
Since the tactile data in the PHAC-2 data set are 88-dimensional sequences that are long and of varying length, the data need to be compressed. Observation shows that the two tactile actions "slow sliding" and "fast sliding" in the data set are each about 2000 data points long; this part of the data is small in volume yet its features are distinct. matlab software is then used to cut the data. The cutting criterion is the magnitude of the data change: the data are read backwards from the end, and when the absolute value of the slope of the pressure value exceeds 1 the change is considered large; that point is taken as the starting point of the cut, from which a segment of 2000 data points is read forward. To further reduce the data volume, only the important pressure values and micro-vibration signals are kept as tactile data, finally yielding 46-dimensional tactile data with a length of 2000 data points.
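The patent performs this cutting in matlab; the NumPy sketch below reproduces the same idea under stated assumptions (which channel holds the pressure value, and the exact handling of the window boundary, are illustrative choices):

```python
import numpy as np

def cut_tactile(sample, pressure_channel=0, window=2000, slope_thresh=1.0):
    """Scan a (channels, time) recording from the end, find where the pressure slope
    exceeds the threshold, and keep a 2000-point segment starting at that point."""
    pressure = sample[pressure_channel]
    slope = np.abs(np.diff(pressure))
    start = 0
    for t in range(len(slope) - 1, -1, -1):       # read the data backwards from the end
        if slope[t] > slope_thresh:               # "large" change detected
            start = t
            break
    start = min(start, sample.shape[1] - window)  # make sure a full window still fits
    start = max(start, 0)
    return sample[:, start:start + window]        # shape: (channels, 2000)
```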
2. Model introduction
in the present embodiment, the visual and tactile data corresponding to an object are input in pairs: the visual data are fed into the two-dimensional convolution model and the tactile data into the one-dimensional convolution model, with the learning rate set to 0.00002.
The processed tactile data consist of 46 one-dimensional signals. Matching the characteristics of one-dimensional signals, a one-dimensional convolutional neural network is used to extract the tactile features; this embodiment uses three one-dimensional convolutional layers with the ReLU activation function, whose parameters are listed below:
Table 1: One-dimensional convolutional neural network parameters
Layer | Input channels | Output channels | Kernel size | Stride
1 | 46 | 32 | 7 | 5
2 | 32 | 64 | 5 | 3
3 | 64 | 46 | 5 | 3
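A PyTorch sketch of the tactile branch following Table 1 above; using no padding is an assumption, although it is consistent with the 1978-dimensional tactile feature vector reported below:

```python
import torch.nn as nn

class TactileBranch(nn.Module):
    """Three 1-D convolutional layers over the 46-channel, 2000-point tactile signal (Table 1)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(46, 32, kernel_size=7, stride=5), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, stride=3), nn.ReLU(),
            nn.Conv1d(64, 46, kernel_size=5, stride=3), nn.ReLU(),
        )

    def forward(self, x):                # x: (batch, 46, 2000)
        return self.net(x).flatten(1)    # -> (batch, 46 * 43) = (batch, 1978)
```

With these kernel sizes and strides the temporal length shrinks from 2000 to 399, 132 and finally 43 points, so the flattened output has 46 × 43 = 1978 values, matching the tactile feature length given below.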
The processed visual image is a 300 × 300 × 3 three-channel color image, and the images are processed with the mature DenseNet169 model from the vision field.
The two-dimensional and one-dimensional convolutions extract the visual and tactile features respectively, giving feature vectors of length 1664 and 1978; these two vectors are concatenated into a visual-tactile fusion feature vector of length 3642, which is then fed into two fully-connected neural networks for classification. The two fully-connected networks differ in that the first is used for the multi-class task: after the visual and tactile information of an object in the test set is fed into the model, the first fully-connected network predicts which of the 53 objects it is. The second fully-connected network is used for the multi-label classification task, which differs from the multi-class task in that the multi-class task identifies which object among many the sample is, while the multi-label task identifies which of several attributes the object possesses.
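A sketch of the fusion network with its two fully-connected branches, reusing the TactileBranch sketched after Table 1. Treating each head as a single linear layer is an assumption, since the patent does not give the depth of the fully-connected branches:

```python
import torch
import torch.nn as nn
from torchvision.models import densenet169

class VisuoTactileNet(nn.Module):
    """Visual-tactile fusion with a 53-way multi-class head and a 24-way multi-label head."""
    def __init__(self, n_classes=53, n_attributes=24):
        super().__init__()
        backbone = densenet169(weights="DEFAULT")        # older torchvision: densenet169(pretrained=True)
        self.visual = nn.Sequential(backbone.features,   # DenseNet169 convolutional features
                                    nn.ReLU(inplace=True),
                                    nn.AdaptiveAvgPool2d(1),
                                    nn.Flatten())        # -> 1664-dimensional visual feature
        self.tactile = TactileBranch()                   # -> 1978-dimensional tactile feature
        fused_dim = 1664 + 1978                          # 3642-dimensional fusion vector
        self.class_head = nn.Linear(fused_dim, n_classes)     # multi-class branch
        self.attr_head = nn.Linear(fused_dim, n_attributes)   # multi-label branch

    def forward(self, image, tactile):
        fused = torch.cat([self.visual(image), self.tactile(tactile)], dim=1)
        return self.class_head(fused), self.attr_head(fused)
```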
The supervision labels used by the two fully-connected networks take the standard label forms of multi-class and multi-label tasks. Note that a multi-branch output neural network with two branches has two loss functions. In this embodiment, the multi-class task uses the cross-entropy loss function (Equation 1), and the multi-label classification task uses the multi-label classification loss function MultiLabelSoftMarginLoss() (Equation 2) provided by the pytorch neural network framework; its output is thresholded at 0, with output values greater than 0 predicted as 1 and values less than 0 predicted as 0. The optimization goal during training is to minimize the total loss (Equation 3), the sum of the two loss functions.
loss(x1, class) = -x1[class] + log( Σj exp(x1[j]) )        (Equation 1)

where x1 denotes the prediction output of the first fully-connected network, class denotes the index of the label class, and x1[j] denotes the j-th value of x1.

loss(x2, y2) = -(1/C) Σi [ y2[i]·log( 1 / (1 + exp(-x2[i])) ) + (1 - y2[i])·log( exp(-x2[i]) / (1 + exp(-x2[i])) ) ]        (Equation 2)

where x2 denotes the output of the second fully-connected network, y2 denotes the multi-label target, x2[i] and y2[i] denote their i-th values, y2[i] ∈ {0, 1}, i ∈ {0, …, x2.nElement() - 1}, and C = x2.nElement() is the number of output elements.

Loss = loss(x1, class) + loss(x2, y2)        (Equation 3)
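A sketch of the two-branch objective in PyTorch, matching Equations 1-3; the learning rate of 0.00002 comes from the embodiment above, while the choice of the Adam optimizer and the helper name training_step are assumptions:

```python
import torch
import torch.nn as nn

model = VisuoTactileNet()                       # multi-branch model sketched earlier
ce_loss = nn.CrossEntropyLoss()                 # Equation 1, multi-class branch
ml_loss = nn.MultiLabelSoftMarginLoss()         # Equation 2, multi-label branch
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)

def training_step(image, tactile, class_target, attr_target):
    """One optimization step on the summed loss of Equation 3."""
    class_out, attr_out = model(image, tactile)
    loss = ce_loss(class_out, class_target) + ml_loss(attr_out, attr_target.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    attr_pred = (attr_out > 0).long()           # threshold at 0: >0 predicted as 1, otherwise 0
    return loss.item(), attr_pred
```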
3. Conversion to keywords
The labeling process of the multi-class task entails sorting the object category keywords into a list of 53 elements and then using the index value of each object category keyword as the label of the object (as shown in Fig. 4), with exactly one label per object. The labels in the multi-class task are the numbers 0 to 52, and these 53 numbers correspond strictly to the 53 object category keywords. The goal is to convert the numerical output of the multi-class task into the corresponding object category keyword. The output of the multi-class task is 53 probability values; according to the correspondence in Fig. 4, the object category keyword is found from the index of the largest probability value. For example, if the 0th output probability is the largest, the corresponding category keyword is "aluminum", and if the 51st output probability is the largest, the corresponding keyword is "yellow felt". Therefore, to obtain the category keyword for a multi-class output, the index of the maximum among the 53 probabilities is obtained, and this index is used to retrieve the object category keyword at the corresponding position in the keyword list, as in the sketch below.
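A minimal sketch of this lookup; the keyword list is illustrative, the real one holds the 53 PHAC-2 object names in a fixed order (e.g. index 0 = "aluminum", index 51 = "yellow felt"):

```python
import torch

# Illustrative 53-element category keyword list with placeholder entries.
category_keywords = ["aluminum"] + [f"object_{i}" for i in range(1, 51)] + ["yellow felt", "object_52"]

def category_from_output(class_output):
    """Map the 53-value multi-class output to its object category keyword via the argmax index."""
    index = torch.argmax(class_output, dim=-1).item()
    return category_keywords[index]
```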
Label generation in the multi-label classification task is similar to the multi-class task: in this embodiment, the 24 physical attribute keywords are first sorted into a list of 24 elements, and each multi-label vector consists of 24 elements corresponding to these 24 physical attribute keywords. Referring to Fig. 5, the labels are formed of the digits 0/1, and each position in the label corresponds to one attribute: if the n-th position is 1, the object has the attribute at the n-th position of the attribute list; if the (n+1)-th position is 0, the object does not have the attribute at the (n+1)-th position. Therefore, to obtain the physical attribute keywords from the multi-label classification network output, the indices whose predicted value is 1 are obtained, and the corresponding attributes are retrieved from the physical attribute keyword list according to these indices, completing the extraction of the physical attribute keywords; see the sketch after this paragraph.
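The corresponding extraction of physical attribute keywords from the multi-label branch might look like the following sketch; the attribute list is again illustrative, the real one holds the 24 PHAC-2 adjectives:

```python
# Illustrative 24-element physical attribute keyword list with placeholder entries.
attribute_keywords = ["smooth", "elastic", "hard"] + [f"attribute_{i}" for i in range(3, 24)]

def attributes_from_output(attr_output):
    """Return the attribute keywords whose multi-label prediction is 1 (i.e. raw output > 0)."""
    return [kw for value, kw in zip(attr_output, attribute_keywords) if value > 0]
```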
4. Generation of the description sentence
after the object category keywords and the physical attribute keywords are obtained, a simple object description sentence can be formed. The category keyword determines what kind of object it is, while the physical attribute keywords describe how the object feels. Feeding the visual and tactile information of each object in the test set into the multi-branch network model proposed in this embodiment predicts the object category keyword and the physical attribute keywords, which are then filled into a fixed sentence description template to form the object description sentence. For example: "This is a plastic box whose surface is smooth, resilient and somewhat hard.", where "plastic box" is the object category keyword and "smooth", "resilient" and "somewhat hard" are physical attribute keywords.
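A sketch of filling the fixed description template with the predicted keywords; the template wording follows the example sentence above and is only one possible choice:

```python
def describe(category, attributes):
    """Fill a fixed sentence template with the predicted category and attribute keywords."""
    if len(attributes) > 1:
        attr_text = ", ".join(attributes[:-1]) + " and " + attributes[-1]
    else:
        attr_text = attributes[0]
    return f"This is a {category} whose surface is {attr_text}."

# describe("plastic box", ["smooth", "resilient", "somewhat hard"])
# -> "This is a plastic box whose surface is smooth, resilient and somewhat hard."
```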
5. Results and analysis
through testing on the public PHAC-2 data set, after 150 training epochs on the training set, the prediction accuracy of this embodiment's network model on the object category keywords reaches 100%, and the prediction accuracy on the physical attributes reaches 97.8%, indicating that the model can effectively form object description sentences.
Fig. 6 shows the result of predicting the physical attributes of the 53 objects in the test set with the multi-branch network model of this embodiment; the images and tactile data in the test set are not included in the training set. Since the distribution of physical properties among the 53 objects is not uniform, the different attributes do not occur the same number of times in the whole data set. Therefore, the AUC value is used as the evaluation criterion for the prediction results; the AUC lies between 0 and 1, and the closer it is to 1 the higher the accuracy of the model, so it can be regarded as the prediction accuracy. As can be seen from the figure, the AUC values of the predictions for all 24 attributes are above 0.9, with an average of 0.978.
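The per-attribute AUC can be computed with scikit-learn; a sketch, assuming y_true and y_score are arrays of shape (number of test samples, 24):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def attribute_aucs(y_true, y_score):
    """Compute one AUC per attribute column and their mean over the 24 attributes."""
    aucs = [roc_auc_score(y_true[:, i], y_score[:, i]) for i in range(y_true.shape[1])]
    return np.array(aucs), float(np.mean(aucs))
```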
Fig. 7 shows the result of predicting the 53 object categories in the test set with the multi-branch network model of this embodiment, presented as a confusion matrix whose ordinate is the true object category and whose abscissa is the category predicted by the multi-branch network. If the true value and the predicted value of an object are equal, the corresponding point lies on the diagonal of the matrix; if they differ, the point appears off the diagonal. As can be seen from the figure, the multi-branch network model of this embodiment successfully predicts the categories of all 53 objects, reaching an accuracy of 100%.
In summary, the object category keyword prediction and the physical attribute keyword prediction of the multi-branch network model provided by this embodiment reach accuracies of 100% and 97.8% respectively. The description sentences formed from the object category keywords and physical attribute keywords are therefore also highly reliable.
Finally, the above embodiments are only used for illustrating the technical solutions of the present invention and not for limiting, and other modifications or equivalent substitutions made by the technical solutions of the present invention by those of ordinary skill in the art should be covered within the scope of the claims of the present invention as long as they do not depart from the spirit and scope of the technical solutions of the present invention.

Claims (6)

1. An object description generation method based on machine vision and tactile perception, characterized by comprising the following steps:
S1, preprocessing the raw visual and tactile data;
S2, inputting the collected visual and tactile information into a two-dimensional convolutional neural network and a one-dimensional convolutional neural network respectively, and concatenating the feature vectors output by the two networks to obtain a visual-tactile fusion feature vector;
S3, inputting the obtained visual-tactile fusion feature vector into two fully-connected network branches, where the first fully-connected network is used for object recognition and classification and the second fully-connected network is used for recognizing the physical attributes of the object;
S4, embedding the classification results and physical attributes obtained by the two fully-connected networks into an object description sentence in the form of keywords.

2. The object description generation method based on machine vision and tactile perception according to claim 1, characterized in that: in S1, the visual information is preprocessed by resizing the original high-resolution image to a 300*300 picture and randomly applying a 30% offset to the brightness, contrast and saturation of the picture to obtain the final input image.

3. The object description generation method based on machine vision and tactile perception according to claim 1, characterized in that: in S1, the tactile information is preprocessed by cutting the data with matlab software and compressing the multi-dimensional data of different lengths, finally obtaining tactile data of uniform length.

4. The object description generation method based on machine vision and tactile perception according to claim 1, characterized in that: in S2, the visual and tactile information are input in pairs, the visual information into the two-dimensional convolutional neural network and the tactile information into the one-dimensional convolutional neural network; three one-dimensional convolutional layers are used for tactile feature extraction, with the ReLU function as the activation function; the DenseNet169 model is used for visual feature extraction.

5. The object description generation method based on machine vision and tactile perception according to claim 1, characterized in that: in S3, the supervision labels used by the two fully-connected networks take the standard label forms of a multi-class task and a multi-label task; the multi-branch output neural network has two branches and therefore two loss functions: the multi-class task uses the cross-entropy loss function, and the multi-label task uses the multi-label classification loss function MultiLabelSoftMarginLoss() provided by the pytorch neural network framework.

6. The object description generation method based on machine vision and tactile perception according to claim 1, characterized in that: in S4, the classification results and physical attributes are converted into keywords as follows: the object category keywords are sorted to form a list of n elements, and the index value of each object category keyword is used as the label of the object, each object having exactly one label; the output of the multi-class task is n probability values, and the corresponding object category keyword is found from the index of the value with the largest probability; label generation in the multi-label classification task is similar: the m physical attribute keywords are first sorted to form a list of m elements, and the multi-label vector consists of m elements corresponding to the m physical attribute keywords; to obtain the physical attribute keywords from the multi-label classification network output, the indices whose predicted value is 1 are obtained, and the corresponding attributes are retrieved from the physical attribute keyword list according to these indices, completing the extraction of the physical attribute keywords.
CN202110037740.2A 2021-01-12 2021-01-12 An object description generation method based on machine vision and tactile perception Active CN112766349B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110037740.2A CN112766349B (en) 2021-01-12 2021-01-12 An object description generation method based on machine vision and tactile perception

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110037740.2A CN112766349B (en) 2021-01-12 2021-01-12 An object description generation method based on machine vision and tactile perception

Publications (2)

Publication Number Publication Date
CN112766349A true CN112766349A (en) 2021-05-07
CN112766349B CN112766349B (en) 2021-08-24

Family

ID=75699764

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110037740.2A Active CN112766349B (en) 2021-01-12 2021-01-12 An object description generation method based on machine vision and tactile perception

Country Status (1)

Country Link
CN (1) CN112766349B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114219982A (en) * 2021-12-15 2022-03-22 齐鲁工业大学 Self-adaptive feature weighted visual-touch fusion object classification method
CN114330460A (en) * 2022-01-12 2022-04-12 齐鲁工业大学 An object attribute recognition method based on dexterous hand touch

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008027223A (en) * 2006-07-21 2008-02-07 Nippon Telegr & Teleph Corp <Ntt> Apparatus and method for presenting integrated vision and touch
US9189730B1 (en) * 2012-09-20 2015-11-17 Brain Corporation Modulated stochasticity spiking neuron network controller apparatus and methods
CN105718954A (en) * 2016-01-22 2016-06-29 清华大学 Target attribute and category identifying method based on visual tactility fusion
CN106874840A (en) * 2016-12-30 2017-06-20 东软集团股份有限公司 Vehicle information recognition method and device
CN108549926A (en) * 2018-03-09 2018-09-18 中山大学 A kind of deep neural network and training method for refining identification vehicle attribute
CN108921054A (en) * 2018-06-15 2018-11-30 华中科技大学 A kind of more attribute recognition approaches of pedestrian based on semantic segmentation
CN110909637A (en) * 2019-11-08 2020-03-24 清华大学 Outdoor mobile robot terrain recognition method based on visual-touch fusion
CN111598164A (en) * 2020-05-15 2020-08-28 北京百度网讯科技有限公司 Method and device for identifying attribute of target object, electronic equipment and storage medium
CN111651035A (en) * 2020-04-13 2020-09-11 济南大学 A virtual experiment system and method based on multimodal interaction

Also Published As

Publication number Publication date
CN112766349B (en) 2021-08-24

Similar Documents

Publication Publication Date Title
CN111898736B (en) An Efficient Pedestrian Re-identification Method Based on Attribute Awareness
Xiang et al. Fabric image retrieval system using hierarchical search based on deep convolutional neural network
CN113657450B (en) Attention mechanism-based land battlefield image-text cross-modal retrieval method and system
CN114661933B (en) A cross-modal retrieval method based on fetal congenital heart disease ultrasound image-diagnosis report
CN112818861A (en) Emotion classification method and system based on multi-mode context semantic features
CN106779087A (en) A kind of general-purpose machinery learning data analysis platform
CN112949740B (en) A Small Sample Image Classification Method Based on Multi-Level Metric
US11386655B2 (en) Image processing neural network systems and methods with scene understanding
CN112766349B (en) An object description generation method based on machine vision and tactile perception
CN113157913A (en) Ethical behavior discrimination method based on social news data set
CN110097096A (en) A kind of file classification method based on TF-IDF matrix and capsule network
Tavakoli Seq2image: Sequence analysis using visualization and deep convolutional neural network
Thepade et al. Human face gender identification using Thepade's sorted N-ary block truncation coding and machine learning classifiers
CN111898704B (en) Method and device for clustering content samples
Guan et al. A unified probabilistic model for global and local unsupervised feature selection
Abu-Jamie et al. Classification of Sign-Language Using Deep Learning-A Comparison between Inception and Xception models
CN114170460A (en) Multi-mode fusion-based artwork classification method and system
CN118799619A (en) A method for batch recognition and automatic classification and archiving of image content
CN117934957A (en) A method of garbage classification and identification based on capsule network
Anderson et al. Category systems for real-world scenes
CN115080699B (en) Cross-modal retrieval method based on modality-specific adaptive scaling and attention network
CN115392474B (en) Local perception graph representation learning method based on iterative optimization
CN116958624A (en) Method, device, equipment, medium and program product for identifying appointed material
ALtememe et al. Gesture interpreting of alphabet Arabic sign language based on machine learning algorithms
CN109002832B (en) Image identification method based on hierarchical feature extraction

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address
CP03 Change of name, title or address

Address after: 250000 Science and Technology Park of Xincheng University in the West of Jinan City, Changqing District, Jinan City, Shandong Province

Patentee after: Qilu University of Technology (Shandong Academy of Sciences)

Country or region after: China

Address before: 250000 Science and Technology Park of Xincheng University in the West of Jinan City, Changqing District, Jinan City, Shandong Province

Patentee before: Qilu University of Technology

Country or region before: China