CN112766349A - Object description generation method based on machine vision and tactile perception - Google Patents

Object description generation method based on machine vision and tactile perception

Info

Publication number
CN112766349A
Authority
CN
China
Prior art keywords
tactile
visual
keywords
machine vision
method based
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110037740.2A
Other languages
Chinese (zh)
Other versions
CN112766349B (en)
Inventor
张鹏
周茂辉
单东日
邹文凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Original Assignee
Qilu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology filed Critical Qilu University of Technology
Priority to CN202110037740.2A priority Critical patent/CN112766349B/en
Publication of CN112766349A publication Critical patent/CN112766349A/en
Application granted granted Critical
Publication of CN112766349B publication Critical patent/CN112766349B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention relates to an object description generation method based on machine vision and tactile perception, which takes the machine vision and tactile information of an object as input, identifies the category and physical attributes of the object using a deep learning method, and then converts the recognition results into keywords to form an object description sentence. The method is trained and tested on a visual and tactile data set (the PHAC-2 data set) published by the University of Pennsylvania, and the prediction accuracies for the category keywords and the physical attribute keywords reach 100% and 97.8%, respectively. Forming a descriptive sentence after the robot explores and perceives an object can effectively promote the development of human-computer interaction technology in the field of robot perception.

Description

Object description generation method based on machine vision and tactile perception
Technical Field
The invention relates to the technical fields of robot perception, multi-modal fusion and object description generation, and in particular to an object description generation method based on machine vision and tactile perception.
Background
With the development of sensor technology and artificial intelligence technology, the perception and decision-making capabilities of robots are continuously improving, and robots are gradually evolving from purely machine-like behavior toward more human-like capabilities. However, the ability of robots to recognize and discriminate objects still falls far short of that of humans.
Humans combine visual and tactile information to recognize objects. Functional magnetic resonance imaging data indicate that human tactile and visual signals are processed in a multi-sensory, coordinated manner when identifying objects. Inspired by this cross-modal co-processing in the human brain, researchers have used tactile and visual signals to design deep learning frameworks for tactile attribute classification, demonstrating that the two signals are complementary and that combining data of both forms improves performance.
Generating object descriptions at the level of visual and tactile perception through robotic exploration is of great significance. The technology can effectively increase the sense of participation and fulfillment of disabled people in daily life; at the same time, object description technology can be applied in high-risk environments, allowing robots to replace people in exploring and perceiving objects and forming corresponding feedback reports, which can effectively reduce harm to people. At present, no object description generation method based on visual and tactile perception exists, so the object description generation method provided by the invention fills this technical gap.
Disclosure of Invention
Aiming at remedying the deficiencies in the prior art, the invention provides a grasped-object identification method based on the fusion of tactile vibration signals and visual images.
The invention is realized by the following technical scheme:
an object description generation method based on machine vision and tactile perception is characterized by comprising the following steps:
S1, preprocessing the visual and tactile raw data;
S2, inputting the collected visual and tactile information into a two-dimensional convolutional neural network and a one-dimensional convolutional neural network respectively, and concatenating the feature vectors output by the two neural networks to obtain a visual-tactile fusion feature vector;
S3, inputting the obtained visual-tactile fusion feature vector into two fully-connected network branches, wherein the first fully-connected network is used for identifying and classifying objects, and the second fully-connected network is used for identifying the physical attributes of the objects;
S4, embedding the classification results and the physical attributes obtained by the two fully-connected networks into object description sentences in the form of keywords.
Further, in order to better implement the present invention, in S1 the visual information is preprocessed by resizing the original high-resolution image to a 300 × 300 picture and randomly applying 30% offsets to the brightness, contrast and saturation of the picture to obtain the final input image.
Further, in order to better implement the present invention, in S1 the tactile information is preprocessed by cutting the data with MATLAB software and compressing the multidimensional data of different lengths to finally obtain tactile data of equal length.
Further, in order to better implement the present invention, in S2 the visual information and the tactile information are input in pairs, the visual information being input into the two-dimensional convolutional neural network and the tactile information into the one-dimensional convolutional neural network; three one-dimensional convolutional layers are used to process the tactile information, with the ReLU function as the activation function; the DenseNet-169 model is used for visual information processing.
Further, in order to better implement the present invention, in S3 the supervised labels used by the two fully-connected networks take the form of labels in standard multi-classification and multi-label tasks; the neural network with multi-branch output has two branches and therefore two loss functions, the multi-classification task using a cross-entropy loss function and the multi-label task using the multi-label classification loss function MultiLabelSoftMarginLoss() provided by the PyTorch neural network framework.
Further, in order to better implement the present invention, in S4 the specific method of converting the classification results and physical attributes into keywords is to sort the object category keywords to form a list of n elements and then use the index value of each object category keyword as the label of that object, each object having only one label; the output of the multi-classification task is n probability values, and the corresponding object category keyword is found from the index of the maximum probability value; label generation in the multi-label classification task is similar to the multi-classification task: the m physical attribute keywords are first sorted to form a list of m elements, and the multi-label classification label consists of m elements corresponding respectively to the m physical attribute keywords; to obtain the physical attribute keywords from the multi-label classification network output, the indices with a predicted value of 1 in the network output are obtained, and the corresponding attributes are then retrieved from the physical attribute keyword list according to these indices, completing the extraction of the physical attribute keywords.
The invention has the beneficial effects that:
the object description generation method based on machine vision and touch perception provided by the invention constructs a multi-branch network model capable of simultaneously predicting object category keywords and physical attribute keywords, and then forms a description sentence of the object according to the predicted keywords. The method effectively improves the external perception expression capability of the robot, and enables the robot to be more intelligent in the human-computer interaction process.
Drawings
FIG. 1 is a schematic diagram of the multi-branch network of the present invention;
FIG. 2 is a schematic diagram of image data processing in a data set according to the present invention;
FIG. 3 is a schematic diagram of tactile data processing in the data set of the present invention;
FIG. 4 is a diagram of the multi-classification label correspondence of the present invention;
FIG. 5 is a diagram of a multi-label classification label correspondence of the present invention;
FIG. 6 is a diagram of the results of various physical property predictions of the present invention;
FIG. 7 shows the object class prediction result of the present invention.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
Fig. 1-7 illustrate a specific embodiment of the present invention, an object description generation method based on visual and tactile perception. As shown in fig. 1, this embodiment proposes a multi-branch neural network structure with multi-modal input and multi-level output, which takes machine vision and machine touch as its two modal inputs: the machine vision input is fed into a two-dimensional convolutional neural network and the machine touch input into a one-dimensional convolutional neural network. The feature vectors output by the two-dimensional and one-dimensional convolutional neural networks are then concatenated to obtain a visual-tactile fusion feature vector. Finally, the visual-tactile fusion feature vector is fed into two fully-connected network branches: the first fully-connected network outputs the object category predicted from the fusion feature vector, and the second predicts the physical attributes of the object. In addition, this embodiment provides an object description generation method that converts the classification result and the physical attributes output by the multi-branch network structure into keywords and then embeds the keywords into a description sentence template.
The specific implementation process of this embodiment is as follows:
1. Data set
The method of this embodiment was trained and tested on the PHAC-2 data set, published by the University of Pennsylvania, which contains visual and tactile data for 53 objects. The visual data for each object consist of 8 photographs, collected by placing the object on an aluminum disk and photographing it once for every 45 degrees of rotation. The tactile data consist of two pressure values, micro-vibrations and temperature values, recorded while each object was squeezed, pinched, slid over slowly and slid over quickly. The data set also contains 24 tactile adjectives describing physical properties of the objects, including softness, hardness, temperature, viscosity, elasticity, etc. Each object in the data set is assigned several tactile adjectives; to exclude chance, the adjectives for each object were determined jointly by 36 people.
The method proposed in this embodiment requires the data set to be divided into a training set and a test set; one visual sample and one tactile sample are extracted from each object as the test set. To ensure fairness, during test set selection a number a between 1 and 8 is randomly generated by the computer for each object, and the a-th image and the a-th tactile recording of that object are taken.
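By way of illustration, a minimal Python sketch of this test-set selection rule is given below; the function name and the fixed seed are illustrative assumptions, since the description only states that a is drawn at random between 1 and 8 for each object.

```python
import random

def select_test_indices(n_objects: int = 53, n_views: int = 8, seed: int = 0) -> dict:
    """For each object, draw a random number a in 1..8; the a-th image and the
    a-th tactile recording of that object form the test set."""
    rng = random.Random(seed)
    return {obj: rng.randint(1, n_views) for obj in range(n_objects)}

# keys are object indices, values are the held-out view index a for that object
held_out = select_test_indices()
```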
To reduce the number of network parameters, the image data were resized to 300 × 300 pictures. Because the robot's visual information is most susceptible to interference from lighting, offsets of 30% are randomly applied to the brightness, contrast and saturation of the picture to improve the robustness of the model.
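This preprocessing could be expressed, for example, with torchvision transforms as in the sketch below; the use of torchvision and the exact transform composition are assumptions, since the description only specifies the 300 × 300 resize and the random 30% offsets.

```python
from torchvision import transforms

# Visual preprocessing: resize to 300 x 300, then randomly offset brightness,
# contrast and saturation by up to 30% to improve robustness to lighting.
visual_transform = transforms.Compose([
    transforms.Resize((300, 300)),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    transforms.ToTensor(),   # convert the PIL image to a 3 x 300 x 300 float tensor
])
```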
Since the tactile data in the PHAC-2 data set are 88-dimensional signals that are both long and of varying length, the data need to be compressed. Observation shows that the two tactile actions "slow sliding" and "fast sliding" each span about 2000 data points; this portion has a small data volume and obvious data characteristics. MATLAB software is then used to cut the data, with the cutting criterion being the magnitude of data change: the data are read starting from the end, and when the absolute value of the slope of the pressure value exceeds 1 the data change is considered large; this point is taken as the cut point and a length of 2000 data points is read continuously from it. To further reduce the data volume, only the important pressure values and micro-vibration signals in the data set are extracted as the tactile data, finally yielding 46-dimensional tactile data with a length of 2000 data points.
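The patent performs this cutting step in MATLAB; the NumPy sketch below is only one possible reading of the rule, and the scan direction, the slope test on consecutive samples, and keeping the 2000 points that end at the detected change point are assumptions based on the wording above.

```python
import numpy as np

def cut_tactile(sample: np.ndarray, window: int = 2000) -> np.ndarray:
    """sample: (46, length) array of pressure and micro-vibration channels.
    Returns a (46, <=2000) segment ending where the pressure change becomes large."""
    pressure = sample[0]                      # assume the first row is a pressure channel
    slope = np.abs(np.diff(pressure))         # sample-to-sample change of the pressure value
    cut_end = len(pressure)
    # read the data from the end; a change with absolute slope > 1 marks the cut point
    for i in range(len(slope) - 1, -1, -1):
        if slope[i] > 1:
            cut_end = i + 1
            break
    start = max(0, cut_end - window)
    return sample[:, start:start + window]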
2. Model introduction
In this embodiment, the visual and tactile data corresponding to an object are input in pairs: the visual data are fed into the two-dimensional convolutional model and the tactile data into the one-dimensional convolutional model, and the learning rate is set to 0.00002.
The processed tactile data consist of 46 one-dimensional signals. In keeping with the characteristics of one-dimensional signals, the features of the tactile data are extracted with a one-dimensional convolutional neural network; this embodiment uses three one-dimensional convolutional layers with the ReLU function as the activation function, with the specific parameters of each layer as follows:
table 1: one-dimensional convolution neural network parameter table
Number of layers Number of input channels Number of output channels Convolution kernel size Convolution step size
1 46 32 7 5
2 32 64 5 3
3 64 46 5 3
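A minimal PyTorch sketch of this three-layer one-dimensional branch is given below; only the channel counts, kernel sizes and strides come from Table 1, while the absence of padding and the final flattening are assumptions that happen to reproduce the 1978-dimensional tactile feature vector mentioned later.

```python
import torch
import torch.nn as nn

# Three-layer 1-D convolutional branch for the 46-channel tactile signal (Table 1).
tactile_branch = nn.Sequential(
    nn.Conv1d(46, 32, kernel_size=7, stride=5), nn.ReLU(),
    nn.Conv1d(32, 64, kernel_size=5, stride=3), nn.ReLU(),
    nn.Conv1d(64, 46, kernel_size=5, stride=3), nn.ReLU(),
    nn.Flatten(),                               # 46 channels x 43 time steps -> 1978 values
)

x = torch.randn(1, 46, 2000)                    # one preprocessed tactile sample
print(tactile_branch(x).shape)                  # torch.Size([1, 1978])
```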
The processed visual image is a 300 × 300 × 3 three-channel color image, and image processing uses the mature DenseNet-169 model from the field of computer vision.
Features are extracted from the visual and tactile information with the two-dimensional and one-dimensional convolutions respectively, yielding feature vectors of length 1664 and 1978. These two feature vectors are concatenated to obtain a visual-tactile fusion feature vector of length 3642, which is then fed into two fully-connected neural networks for classification. The two fully-connected networks differ in that the first is used for a multi-classification task: after the visual and tactile information of an object in the test set is entered into the model, the first fully-connected network predicts which of the 53 objects it is. The second fully-connected network is used for a multi-label classification task, which differs from the multi-classification task in that the multi-classification task identifies which of several objects the input belongs to, while the multi-label classification task identifies which of several attributes the object possesses.
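The sketch below assembles the two branches and the two fully-connected heads into one module; the single-layer heads, the DenseNet-169 pooling details and the class/attribute counts passed as arguments are assumptions, while the 1664-, 1978- and 3642-dimensional feature sizes follow the description above.

```python
import torch
import torch.nn as nn
from torchvision.models import densenet169

class VisuoTactileNet(nn.Module):
    """Multi-branch network: 2-D CNN for vision, 1-D CNN for touch, two FC branches."""
    def __init__(self, tactile_branch: nn.Module, n_classes: int = 53, n_attrs: int = 24):
        super().__init__()
        backbone = densenet169()                 # pretrained weights could be loaded here
        # DenseNet-169 feature extractor -> 1664-dimensional visual feature vector
        self.visual_branch = nn.Sequential(
            backbone.features, nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.tactile_branch = tactile_branch     # 1-D CNN branch -> 1978-dimensional vector
        self.class_head = nn.Linear(1664 + 1978, n_classes)   # multi-classification branch
        self.attr_head = nn.Linear(1664 + 1978, n_attrs)      # multi-label branch

    def forward(self, image, tactile):
        fused = torch.cat([self.visual_branch(image),
                           self.tactile_branch(tactile)], dim=1)   # 3642-dim fusion vector
        return self.class_head(fused), self.attr_head(fused)
```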
The supervised labels used by the two fully-connected networks take the form of labels in standard multi-classification and multi-label tasks. It should be noted that such a multi-branch output neural network has two branches and therefore two loss functions. In this embodiment, the multi-classification task uses a cross-entropy loss function (Equation 1), while the multi-label classification task uses the multi-label classification loss function MultiLabelSoftMarginLoss() (Equation 2) provided by the PyTorch neural network framework; its output is thresholded at 0, an output value greater than 0 being predicted as 1 and an output value less than 0 as 0. The optimization goal during training is to minimize the total loss (Equation 3), obtained by adding the two loss functions.
loss(x1, class) = -x1[class] + log(∑j exp(x1[j]))    (Equation 1)
wherein: x1 represents the prediction output of the first fully-connected network, class represents the index of the label category, and x1[j] denotes the j-th value of x1.
loss(x2, y2) = -(1/C) ∑i [ y2[i]·log(σ(x2[i])) + (1 - y2[i])·log(1 - σ(x2[i])) ]    (Equation 2)
wherein: x2 represents the prediction output of the second fully-connected network, y2 represents the multi-label target, x2[i] denotes the i-th value of x2, y2[i] denotes the i-th value of y2, y2[i] ∈ {0, 1}, i ∈ {0, …, C-1}, C = x2.nElement() is the number of output elements, and σ(·) denotes the sigmoid function.
Loss = loss(x1, class) + loss(x2, y2)    (Equation 3)
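A sketch of one training step with this joint objective is shown below, reusing the VisuoTactileNet and tactile_branch sketches above; the Adam optimizer and the dummy batch are assumptions, while the learning rate 0.00002 and the two loss terms follow the description.

```python
import torch
import torch.nn as nn

model = VisuoTactileNet(tactile_branch)                   # from the sketches above
ce_loss = nn.CrossEntropyLoss()                           # Equation 1, multi-classification branch
ml_loss = nn.MultiLabelSoftMarginLoss()                   # Equation 2, multi-label branch
optimizer = torch.optim.Adam(model.parameters(), lr=0.00002)

image = torch.randn(4, 3, 300, 300)                       # dummy batch of paired inputs
tactile = torch.randn(4, 46, 2000)
class_label = torch.randint(0, 53, (4,))                  # one of the 53 object categories
attr_label = torch.randint(0, 2, (4, 24)).float()         # 24-element 0/1 attribute labels

class_logits, attr_logits = model(image, tactile)
loss = ce_loss(class_logits, class_label) + ml_loss(attr_logits, attr_label)   # Equation 3
optimizer.zero_grad()
loss.backward()
optimizer.step()
```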
3. Conversion to keywords
The labeling process for the multi-classification task entails sorting the object category keywords into a list of 53 elements and then using the index value of each object category keyword as the label of that object (as shown in fig. 4), with only one label per object. The labels in the multi-classification task consist of the numbers 0 to 52, and these 53 numbers correspond strictly to the 53 object category keywords. Our goal is to convert the numerical output of the multi-classification task into the corresponding object category keyword. The output of the multi-classification task is 53 probability values; according to the correspondence in fig. 4, the object category keyword can be found from the index of the maximum probability value. For example, if the 0th output probability in the multi-classification task is the largest, the corresponding object category keyword is "aluminum"; if the 51st output probability is the largest, it is "yellow felt". Therefore, to obtain the category keyword corresponding to the multi-classification output, the index of the maximum of the 53 probabilities is obtained and then used to retrieve the object category keyword at the corresponding position in the keyword list.
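For example, the index-to-keyword lookup can be written as in the sketch below; the keyword list shown is a placeholder except for the two entries named above ("aluminum" at index 0 and "yellow felt" at index 51).

```python
import torch

# Ordered list of the 53 object category keywords; only the two entries named in
# the description are shown, the rest are illustrative placeholders.
category_keywords = (["aluminum"]
                     + [f"object_{i}" for i in range(1, 51)]
                     + ["yellow felt", "object_52"])

def predict_category(class_logits: torch.Tensor) -> str:
    """Return the keyword at the index of the largest of the 53 probability values."""
    index = int(torch.softmax(class_logits, dim=-1).argmax())
    return category_keywords[index]
```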
Label generation in the multi-label classification task is similar to the multi-classification task: in this embodiment the 24 physical attribute keywords are first sorted to form a list of 24 elements, and the multi-label classification label consists of 24 elements corresponding respectively to the 24 physical attribute keywords. Referring to fig. 5, the labels consist of the digits 0 and 1, and each position in the label corresponds to an attribute: for example, if the digit at the n-th position is 1, the object has the attribute at the n-th position of the attribute list, and if the (n+1)-th position is 0, the object does not have the attribute at the (n+1)-th position of the attribute list. Therefore, to obtain the physical attribute keywords from the multi-label classification network output, the indices with a predicted value of 1 in the network output are obtained, and the corresponding attributes are then retrieved from the physical attribute keyword list according to these indices, completing the extraction of the physical attribute keywords.
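The corresponding extraction of attribute keywords could look like the sketch below; the placeholder attribute names are illustrative, and the 0-threshold follows the loss-function description above.

```python
import torch

# Ordered list of the 24 physical attribute keywords (illustrative placeholders).
attribute_keywords = [f"attribute_{i}" for i in range(24)]

def predict_attributes(attr_logits: torch.Tensor) -> list:
    """attr_logits: the 24 raw outputs for one object; a value greater than 0 is
    treated as label 1, i.e. the attribute is present."""
    return [attribute_keywords[i]
            for i, value in enumerate(attr_logits.tolist())
            if value > 0]
```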
4. Descriptive sentence generation
after the object type key words and the physical attribute key words are obtained, simple object description sentences can be formed. Wherein the category keywords can determine which category the object is, and the physical attribute keywords are used to describe what the object gives. The input of the visual and tactile information of each object in the test set into the multi-branch network model proposed in this embodiment predicts the object category keyword and the physical attribute keyword. And then filling the obtained object category keywords and the obtained physical attribute keywords into a fixed sentence description template to form the object description sentence. For example: this is a plastic box whose surface is smooth, resilient and somewhat hard. Wherein "plastic box" is an object category keyword, and "smooth", "elastic", "somewhat hard" is a physical attribute keyword.
5. Results and analysis
through testing on the international PHAC-2 data set, after 150 rounds of training on the training set, the prediction accuracy of the network model of the embodiment on the object category keywords reaches 100%, and the prediction accuracy on the physical attributes reaches 97.8%, which indicates that the model of the embodiment can effectively form object description sentences.
Fig. 6 shows the results of predicting the physical attributes of the 53 objects in the test set with the multi-branch network model provided in this embodiment; the images and tactile data in the test set are not included in the training set. Since the physical attributes are not uniformly distributed among the 53 objects, the different attributes do not occur the same number of times in the data set. The AUC value is therefore used as the evaluation standard for the prediction results; the AUC lies between 0 and 1, and the closer it is to 1 the higher the model accuracy, so the AUC value can be regarded as the prediction accuracy. As can be seen from the figure, the AUC values of the predictions for all 24 attributes are above 0.9, with an average of 0.978.
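The per-attribute AUC evaluation could be computed, for instance, as in the sketch below; scikit-learn is an assumed tool, since the description only states that AUC is used as the evaluation standard and reports a mean of 0.978.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def per_attribute_auc(y_true: np.ndarray, y_score: np.ndarray) -> np.ndarray:
    """y_true: (n_samples, 24) 0/1 attribute labels; y_score: raw network outputs."""
    return np.array([roc_auc_score(y_true[:, k], y_score[:, k])
                     for k in range(y_true.shape[1])])

# mean_auc = per_attribute_auc(test_labels, test_outputs).mean()
```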
Fig. 7 shows the results of predicting the categories of the 53 objects in the test set with the multi-branch network model provided in this embodiment, presented in the form of a confusion matrix; the ordinate is the true object category and the abscissa is the category predicted by the multi-branch network. If the true value and the predicted value of an object are equal, the corresponding point lies on the diagonal of the figure; if they differ, the point appears off the diagonal. As can be seen from the figure, the multi-branch network model of this embodiment successfully predicts the categories of all 53 objects, reaching an accuracy of 100%.
In summary, the object category keyword prediction and the physical attribute keyword prediction of the multi-branch network model provided by this embodiment reach accuracies of 100% and 97.8%, respectively. The descriptive sentences formed from the object category keyword and the physical attribute keywords therefore also have high credibility.
Finally, the above embodiments are only used for illustrating the technical solutions of the present invention and not for limiting, and other modifications or equivalent substitutions made by the technical solutions of the present invention by those of ordinary skill in the art should be covered within the scope of the claims of the present invention as long as they do not depart from the spirit and scope of the technical solutions of the present invention.

Claims (6)

1. An object description generation method based on machine vision and tactile perception is characterized by comprising the following steps:
S1, preprocessing the visual and tactile raw data;
S2, inputting the collected visual and tactile information into a two-dimensional convolutional neural network and a one-dimensional convolutional neural network respectively, and concatenating the feature vectors output by the two neural networks to obtain a visual-tactile fusion feature vector;
S3, inputting the obtained visual-tactile fusion feature vector into two fully-connected network branches, wherein the first fully-connected network is used for identifying and classifying objects, and the second fully-connected network is used for identifying the physical attributes of the objects;
S4, embedding the classification results and the physical attributes obtained by the two fully-connected networks into object description sentences in the form of keywords.
2. The object description generation method based on machine vision and tactile perception according to claim 1, wherein:
in S1, the visual information is preprocessed by resizing the original high-resolution image to a 300 × 300 picture and randomly applying 30% offsets to the brightness, contrast and saturation of the picture to obtain the final input image.
3. The object description generation method based on machine vision and tactile perception according to claim 1, wherein:
in S1, the tactile information is preprocessed by cutting the data with MATLAB software and compressing the multidimensional data of different lengths to finally obtain tactile data of equal length.
4. The object description generation method based on machine vision and tactile perception according to claim 1, wherein:
in S2, the visual information and the tactile information are input in pairs, the visual information being input into the two-dimensional convolutional neural network and the tactile information into the one-dimensional convolutional neural network; three one-dimensional convolutional layers are used to extract the features of the tactile information, with the ReLU function as the activation function; the DenseNet-169 model is used for visual feature extraction.
5. The object description generation method based on machine vision and tactile perception according to claim 1, wherein:
in S3, the supervised labels used by the two fully-connected networks take the form of labels in standard multi-classification and multi-label tasks; the neural network with multi-branch output has two branches and therefore two loss functions, the multi-classification task using a cross-entropy loss function and the multi-label task using the multi-label classification loss function MultiLabelSoftMarginLoss() provided by the PyTorch neural network framework.
6. The object description generation method based on machine vision and tactile perception according to claim 1, wherein:
in S4, the specific method of converting the classification results and physical attributes into keywords is to sort the object category keywords to form a list of n elements and then use the index value of each object category keyword as the label of that object, each object having only one label;
the output of the multi-classification task is n probability values, and the corresponding object category keyword is found from the index of the maximum probability value;
label generation in the multi-label classification task is similar to the multi-classification task: the m physical attribute keywords are first sorted to form a list of m elements, and the multi-label classification label consists of m elements corresponding respectively to the m physical attribute keywords; to obtain the physical attribute keywords from the multi-label classification network output, the indices with a predicted value of 1 in the network output are obtained, and the corresponding attributes are then retrieved from the physical attribute keyword list according to these indices, completing the extraction of the physical attribute keywords.
CN202110037740.2A 2021-01-12 2021-01-12 Object description generation method based on machine vision and tactile perception Active CN112766349B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110037740.2A CN112766349B (en) 2021-01-12 2021-01-12 Object description generation method based on machine vision and tactile perception

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110037740.2A CN112766349B (en) 2021-01-12 2021-01-12 Object description generation method based on machine vision and tactile perception

Publications (2)

Publication Number Publication Date
CN112766349A true CN112766349A (en) 2021-05-07
CN112766349B CN112766349B (en) 2021-08-24

Family

ID=75699764

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110037740.2A Active CN112766349B (en) 2021-01-12 2021-01-12 Object description generation method based on machine vision and tactile perception

Country Status (1)

Country Link
CN (1) CN112766349B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114219982A (en) * 2021-12-15 2022-03-22 齐鲁工业大学 Self-adaptive feature weighted visual-touch fusion object classification method
CN114330460A (en) * 2022-01-12 2022-04-12 齐鲁工业大学 Object attribute identification method based on dexterous hand touch

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008027223A (en) * 2006-07-21 2008-02-07 Nippon Telegr & Teleph Corp <Ntt> Apparatus and method for presenting integrated vision and touch
US9189730B1 (en) * 2012-09-20 2015-11-17 Brain Corporation Modulated stochasticity spiking neuron network controller apparatus and methods
CN105718954A (en) * 2016-01-22 2016-06-29 清华大学 Target attribute and category identifying method based on visual tactility fusion
CN106874840A (en) * 2016-12-30 2017-06-20 东软集团股份有限公司 Vehicle information recognition method and device
CN108549926A (en) * 2018-03-09 2018-09-18 中山大学 A kind of deep neural network and training method for refining identification vehicle attribute
CN108921054A (en) * 2018-06-15 2018-11-30 华中科技大学 A kind of more attribute recognition approaches of pedestrian based on semantic segmentation
CN110909637A (en) * 2019-11-08 2020-03-24 清华大学 Outdoor mobile robot terrain recognition method based on visual-touch fusion
CN111598164A (en) * 2020-05-15 2020-08-28 北京百度网讯科技有限公司 Method and device for identifying attribute of target object, electronic equipment and storage medium
CN111651035A (en) * 2020-04-13 2020-09-11 济南大学 Multi-modal interaction-based virtual experiment system and method

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008027223A (en) * 2006-07-21 2008-02-07 Nippon Telegr & Teleph Corp <Ntt> Apparatus and method for presenting integrated vision and touch
US9189730B1 (en) * 2012-09-20 2015-11-17 Brain Corporation Modulated stochasticity spiking neuron network controller apparatus and methods
CN105718954A (en) * 2016-01-22 2016-06-29 清华大学 Target attribute and category identifying method based on visual tactility fusion
CN106874840A (en) * 2016-12-30 2017-06-20 东软集团股份有限公司 Vehicle information recognition method and device
CN108549926A (en) * 2018-03-09 2018-09-18 中山大学 A kind of deep neural network and training method for refining identification vehicle attribute
CN108921054A (en) * 2018-06-15 2018-11-30 华中科技大学 A kind of more attribute recognition approaches of pedestrian based on semantic segmentation
CN110909637A (en) * 2019-11-08 2020-03-24 清华大学 Outdoor mobile robot terrain recognition method based on visual-touch fusion
CN111651035A (en) * 2020-04-13 2020-09-11 济南大学 Multi-modal interaction-based virtual experiment system and method
CN111598164A (en) * 2020-05-15 2020-08-28 北京百度网讯科技有限公司 Method and device for identifying attribute of target object, electronic equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114219982A (en) * 2021-12-15 2022-03-22 齐鲁工业大学 Self-adaptive feature weighted visual-touch fusion object classification method
CN114330460A (en) * 2022-01-12 2022-04-12 齐鲁工业大学 Object attribute identification method based on dexterous hand touch

Also Published As

Publication number Publication date
CN112766349B (en) 2021-08-24

Similar Documents

Publication Publication Date Title
CN107516110B (en) Medical question-answer semantic clustering method based on integrated convolutional coding
CN111368896B (en) Hyperspectral remote sensing image classification method based on dense residual three-dimensional convolutional neural network
CN110442684B (en) Class case recommendation method based on text content
CN108596039B (en) Bimodal emotion recognition method and system based on 3D convolutional neural network
Xiang et al. Fabric image retrieval system using hierarchical search based on deep convolutional neural network
CN113657450B (en) Attention mechanism-based land battlefield image-text cross-modal retrieval method and system
Grinstein et al. Benchmark development for the evaluation of visualization for data mining
CN112766349B (en) Object description generation method based on machine vision and tactile perception
Guan et al. A unified probabilistic model for global and local unsupervised feature selection
CN112464865A (en) Facial expression recognition method based on pixel and geometric mixed features
CN112905822A (en) Deep supervision cross-modal counterwork learning method based on attention mechanism
CN112733602B (en) Relation-guided pedestrian attribute identification method
CN114661933A (en) Cross-modal retrieval method based on fetal congenital heart disease ultrasonic image-diagnosis report
CN112182275A (en) Trademark approximate retrieval system and method based on multi-dimensional feature fusion
Thepade et al. Human face gender identification using Thepade's sorted N-ary block truncation coding and machine learning classifiers
Tavakoli Seq2image: Sequence analysis using visualization and deep convolutional neural network
CN111968124A (en) Shoulder musculoskeletal ultrasonic structure segmentation method based on semi-supervised semantic segmentation
Pratama et al. Deep convolutional neural network for hand sign language recognition using model E
CN111898704A (en) Method and device for clustering content samples
CN112347252B (en) Interpretability analysis method based on CNN text classification model
CN115392474B (en) Local perception graph representation learning method based on iterative optimization
Anderson et al. Category systems for real-world scenes
CN112560712B (en) Behavior recognition method, device and medium based on time enhancement graph convolutional network
CN114170460A (en) Multi-mode fusion-based artwork classification method and system
CN114022698A (en) Multi-tag behavior identification method and device based on binary tree structure

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant