CN115758107A - Haptic signal transmission method and apparatus, storage medium, and electronic device - Google Patents

Info

Publication number
CN115758107A
CN115758107A
Authority
CN
China
Prior art keywords
signal
modal
characteristic
signals
tactile
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211338523.8A
Other languages
Chinese (zh)
Other versions
CN115758107B (en)
Inventor
张利平
俞科峰
朱应钊
乔宏明
陈龙杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN202211338523.8A
Publication of CN115758107A
Application granted
Publication of CN115758107B
Legal status: Active
Anticipated expiration

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

The disclosure provides a tactile signal transmission method and apparatus, a storage medium and an electronic device, and relates to the technical field of signal processing. The method collects tactile signals, image data and audio/video data; determines the material type according to the image data and obtains the force feedback features for the corresponding material; inputs the tactile signal, the force feedback features and the audio/video data into a preset neural network model for feature extraction to obtain a multi-modal input signal; compression-encodes the multi-modal input signal based on Weber's law to obtain a multi-modal feature signal; and sends the multi-modal feature signal to a receiving end so that the receiving end can perform simulation according to it. Effective compression is achieved, the amount of transmitted data is reduced, and the accuracy of the tactile signal representation is improved.

Description

Haptic signal transmission method and apparatus, storage medium, and electronic device
Technical Field
The present disclosure relates to the field of signal processing technologies, and in particular, to a method and an apparatus for transmitting a haptic signal, a storage medium, and an electronic device.
Background
With the rapid development of digital signal processing and communication technology, virtual reality has entered the public eye, and the convergence of the virtual world and the real world has become an important direction of scientific and technological research. Moving from audiovisual interaction to multi-sensory interaction, multimedia experiences have begun to incorporate the sense of touch in order to give users more immersive interaction and richer experiences. As a perception channel second only to hearing and vision, the importance of touch is self-evident.
However, techniques such as haptic signal compression and reconstruction are still at an early, exploratory stage and face a number of problems. At present, conventional tactile signals suffer varying degrees of delay or loss during transmission because the data packets are too large, which can disorder the tactile signal timing and degrade the action-simulation experience.
Therefore, it is desirable to solve the problems of signal delay and loss caused by oversized data packets during haptic signal transmission.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure is directed to a haptic signal transmission method and apparatus, a storage medium and an electronic device, which overcome, at least to some extent, the problem of excessively large data packets during transmission in the related art.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of the present disclosure, there is provided a method for processing a multi-modal signal, applied to a transmitting end, including:
collecting tactile signals, image data and audio-video data;
determining the material type according to the image data, and obtaining the force feedback characteristics under the corresponding material;
inputting the tactile signal, the force feedback characteristic and the audio and video data into a preset neural network model for characteristic extraction to obtain a multi-modal input signal;
performing compression coding on the multi-modal input signal based on the Weber's law to obtain a multi-modal characteristic signal;
and sending the multi-modal characteristic signals to a receiving end so that the receiving end can simulate according to the multi-modal characteristic signals.
In one embodiment of the present disclosure, the multi-modal input signal comprises: a tactile input signal, an audio-video input signal and image information;
the inputting the tactile signal, the force feedback characteristic and the audio and video data into a preset neural network model for characteristic extraction to obtain a multi-modal input signal comprises the following steps:
acquiring characteristic information of the force feedback characteristic and label information of the audio and video data;
and according to the feature information and the label information, performing signal feature extraction on the corresponding tactile signals in the same time domain to obtain tactile input signals.
In an embodiment of the present disclosure, before the step of inputting the haptic signal, the force feedback feature, and the audio/video data into a preset neural network model for feature extraction to obtain a multi-modal input signal, the method includes:
detecting a signal type corresponding to the audio and video data;
and carrying out cross-modal signal coding according to the signal type to obtain an audio/video input signal.
In one embodiment of the present disclosure, the haptic signal includes: a vibration signal, a friction signal, and a pressure signal.
In one embodiment of the present disclosure, the multi-modal feature signals include: a tactile feature signal, an audio/video feature signal and image feature information;
the method for compressing and encoding the multi-modal input signal based on the weber's law to obtain a multi-modal characteristic signal comprises the following steps:
determining a perception blind area range of the touch input signal based on the Weber's law;
removing the signals of the tactile input signals in the sensing blind area range according to the sensing blind area range to obtain tactile characteristic signals;
and acquiring image characteristic information and audio-video characteristic signals of corresponding time domains according to the time domains of the tactile characteristic signals.
In one embodiment of the present disclosure, the preset neural network model includes an input layer, a hidden layer, and an output layer; the number of the neurons of the input layer and the output layer is determined according to the dimension of the tactile characteristic signal.
In one embodiment of the present disclosure, the method comprises:
and determining initialization parameters of the preset neural network model by using a genetic algorithm.
According to another aspect of the present disclosure, there is provided an apparatus for processing a multi-modal signal, including:
the information acquisition module is used for acquiring the tactile signals, the image data and the audio and video data;
the force characteristic extraction module is used for determining the material type according to the image data and obtaining the force feedback characteristics under the corresponding material;
the characteristic extraction module is used for inputting the tactile signal, the force feedback characteristic and the audio and video data into a preset neural network model for characteristic extraction to obtain a multi-modal input signal;
the compression coding module is used for carrying out compression coding on the multi-modal input signal based on the Weber's law to obtain a multi-modal characteristic signal;
and the signal sending module is used for sending the multi-modal characteristic signals to a receiving end so that the receiving end can simulate according to the multi-modal characteristic signals.
According to still another aspect of the present disclosure, there is provided an electronic device including:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform any of the above methods of processing multi-modal signals via execution of the executable instructions.
According to yet another aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of processing a multimodal signal as recited in any of the above.
The method for processing a multi-modal signal provided by the embodiments of the disclosure is applied to a transmitting end and comprises the following steps: collecting tactile signals, image data and audio/video data; determining the material type according to the image data and obtaining the force feedback features for the corresponding material; inputting the tactile signal, the force feedback features and the audio/video data into a preset neural network model for feature extraction to obtain a multi-modal input signal; performing compression coding on the multi-modal input signal based on Weber's law to obtain a multi-modal feature signal; and sending the multi-modal feature signal to a receiving end so that the receiving end can perform simulation according to it. The tactile signals of the actual scene are restored with the aid of the image and audio/video information, which improves their accuracy; and the multi-modal input signal is effectively compressed through the preceding feature extraction and compression coding, which reduces the amount of transmitted data and improves the accuracy of the tactile signal representation.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It should be apparent that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived by those of ordinary skill in the art without inventive effort.
FIG. 1 illustrates an existing haptic signal transmission schematic in one embodiment of the present disclosure;
FIG. 2 is a flow diagram illustrating a method for processing a multi-modal signal according to an embodiment of the disclosure;
FIG. 3 illustrates a multi-modal signal flow diagram in one embodiment of the present disclosure;
FIG. 4 is a flow diagram of a multi-modal signal processing method in accordance with yet another embodiment of the present disclosure;
FIG. 5 shows a flow diagram of a method for multi-modal signal processing in yet another embodiment of the present disclosure;
FIG. 6 is a flow diagram of a multi-modal signal processing method according to another embodiment of the disclosure;
FIG. 7 is a schematic diagram of a device for processing multi-modal signals according to an embodiment of the disclosure; and
fig. 8 shows a block diagram of an electronic device in an embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In the scheme provided by this application, haptic signal compression and reconstruction are treated as one of the core technologies for realizing deep immersion and interactive experience.
In practical scenarios, signal transmission and reception usually suffer various losses, so the modal signal is no longer complete. A schematic diagram of prior-art haptic signal transmission is shown in fig. 1. The pressure component of the tactile signal is collected by the pressure collecting unit 110; the collected pressure is processed by the framing unit 120 into frames of multiple rows and columns of pressure values; the framed information is then sent by the information sending unit 130 over a communication network to the information receiving unit 140 at the receiving end; the received framed information is unframed by the deframing unit 150; and the control unit 160 then drives the controlled unit 170 to perform signal simulation according to the unframed information.
In the related prior-art scheme described above, tactile sensations are characterized only by frames built from rows and columns of pressure values, which is a rather limited representation. Moreover, because the data packets are large, the transmission process still suffers varying degrees of delay or loss.
Therefore, to solve the above problem, embodiments of the present disclosure provide a multi-modal signal processing method. The following description is made with reference to the accompanying drawings.
Fig. 2 schematically shows a flow chart of a method for processing a multi-modal signal. In an embodiment of the present disclosure, a method for processing a multi-modal signal is provided, which is applied to a transmitting end and includes:
s201, collecting a touch signal, image data and audio and video data;
specifically, as shown in fig. 3, a multi-modal signal flow diagram is obtained by collecting information through three different modules, and collecting tactile signals through the tactile collection unit 301, where the tactile signals are derived from not only the pressure of the touch object on the skin, but also the surface shape, temperature, friction force, and the like of the object. The image detection unit 302 acquires image information of a current scene, for example, touch objects such as wood, metal, and the like in the current scene, and then acquires audio and video information of the current scene through the audio and video acquisition unit 303, for example, acquires currently emitted knocking sound and a corresponding video picture.
S202, determining the material type according to the image data, and obtaining the force feedback characteristics under the corresponding material;
With reference to fig. 3, the image detection unit 302 may determine the material type from the image data, for example determine that the material of the object currently being acted on is wood, encode the material type, and then determine the characteristics of the acting force corresponding to the wood material. Optionally, this step may train a neural network model to learn the force feedback features of different materials so as to perform this detection. The step therefore starts not only from the angle of tactile pressure but also combines the material type of the current application scene, helping to restore the tactile signal comprehensively and accurately. A minimal sketch of this step is given below.
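As an illustration only, the sketch below shows one way the material-to-force-feedback mapping of S202 could be organized. The classifier stub, the ForceFeedback fields and the numeric values are hypothetical assumptions of the sketch and are not specified by this disclosure.

```python
# Illustrative sketch of step S202: map a material type recognized from image data
# to force feedback features. All names and numbers below are hypothetical.
from dataclasses import dataclass

@dataclass
class ForceFeedback:
    stiffness: float       # illustrative spring constant
    friction_coeff: float  # illustrative friction coefficient
    damping: float         # illustrative damping term

# Hypothetical per-material force feedback table (values are placeholders).
FORCE_FEEDBACK_BY_MATERIAL = {
    "wood":  ForceFeedback(stiffness=2.0e4, friction_coeff=0.45, damping=8.0),
    "metal": ForceFeedback(stiffness=1.5e5, friction_coeff=0.30, damping=3.0),
    "cloth": ForceFeedback(stiffness=5.0e2, friction_coeff=0.60, damping=20.0),
}

def classify_material(image) -> str:
    # Placeholder for the image detection unit 302 (e.g. a trained classifier);
    # it always answers "wood" here only to keep the sketch runnable.
    return "wood"

def force_feedback_for(image):
    material = classify_material(image)              # material type from image data
    return material, FORCE_FEEDBACK_BY_MATERIAL[material]

material, feedback = force_feedback_for(image=None)  # -> ("wood", ForceFeedback(...))
```

In a real system the stub would be replaced by the trained model mentioned above, and the table itself could be learned rather than hand-filled.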
S203, inputting the tactile signal, the force feedback characteristic and the audio and video data into a preset neural network model for characteristic extraction to obtain a multi-mode input signal;
specifically, as described with reference to fig. 3, in the feature extraction process, the feature extraction unit 304 inputs the haptic signals, the force feedback features, and the audio/video data in the unified scene into the preset neural network model, so as to extract the haptic signals in the current scene, extract the signal features through deep learning, mainly extract the features of various types of haptic signals such as key friction and vibration, extract relatively complete signals, better characterize the haptic information, and then obtain the multi-mode input signals, so that the haptic signals can be restored to a greater extent in the subsequent process. The multi-mode input signals comprise touch input signals after characteristic extraction, material types, force feedback characteristics and audio and video input signals.
S204, carrying out compression coding on the multi-modal input signal based on Weber's law to obtain a multi-modal characteristic signal;
specifically, in the compression encoding process, the cognition deviation law based on weber-fisher in the compression encoding module 305 is that only when the current kinesthetic signal is significantly different from the previous kinesthetic signal, people feel that the kinesthetic data can be compressed without affecting the user experience by adjusting the size of the region of the sensing blind region. And signals falling into the human perception blind area due to small stimulation in a short time are removed, and the compression coding of the human perception interval signals is realized. The effective compression can be carried out through the step, the data transmission quantity is reduced, and the delay and the missing degree in the data transmission process are reduced. Obtaining a multi-modal feature signal after compression coding, wherein the multi-modal feature signal specifically comprises: the compressed tactile input signal, the compressed audio and video input signal correspond to time sequence characteristic labels and other information, and material types and other information with different dimensions.
S205, the multi-modal characteristic signals are sent to a receiving end, so that the receiving end can simulate according to the multi-modal characteristic signals.
Specifically, with reference to fig. 3, the compressed multi-modal feature signal is sent to the receiving end through the information sending module 306. After the information receiving module 307 of the receiving end receives the multi-modal feature signal, it is decoded by the decoding unit 308 and then fused and simulated by the multi-modal fusion unit 309. During fusion simulation, the multi-modal feature signal is received and the signals of the different features are fused; the control unit 310 then drives the controlled unit 311 to perform the simulation, so that by simulating an object with the same tactile feel as described by the signal parameters, the user can feel physical attributes of the object such as friction and vibration, which improves the interactive experience.
In the processing method of the multi-modal signal provided in this embodiment, the haptic signal, the image data and the audio/video data are collected; determining the material type according to the image data, and obtaining the force feedback characteristics under the corresponding material; inputting the touch signal, the force feedback characteristic and the audio and video data into a preset neural network model for characteristic extraction, and acquiring a multi-modal input signal; performing compression coding on the multi-modal input signal based on the Weber's law to obtain a multi-modal characteristic signal; and sending the multi-modal characteristic signals to a receiving end so that the receiving end can simulate according to the multi-modal characteristic signals. The tactile signals of the actual scene are restored by the aid of the image and audio and video information, and the accuracy of the tactile signals is improved; the multi-modal input signal is effectively compressed through the processes of pre-feature extraction and compression coding, the data transmission quantity is reduced, and the accuracy of tactile signal representation is improved.
In an embodiment of the present disclosure, the multi-modal input signal comprises: a tactile input signal, an audio-video input signal and image information;
as shown in fig. 4, a flow diagram of another multi-modal signal processing method, where the inputting the haptic signal, the force feedback feature, and the audio/video data into a preset neural network model for feature extraction to obtain a multi-modal input signal includes:
s401, acquiring characteristic information of the force feedback characteristic and label information of the audio and video data;
s402, according to the feature information and the label information, signal feature extraction is carried out on the corresponding touch signals in the same time domain, and touch input signals are obtained.
Specifically, the preset neural network model is trained in advance: a training set and a test set are formed from the collected signal data, the model is first initialized, and training is then carried out. With the collected feature information of the force feedback features and the label information of the audio/video data assisting the extraction, the trained model can obtain a relatively complete tactile input signal. It is used to extract the features of key tactile signal types such as friction and vibration, yielding relatively complete signals that better characterize the tactile information.
In this step, during feature extraction, the tactile signals are combined with the object-material image and the audio/video signals of the same scene by using the abstract label information contained in the image and audio/video signals, and feature extraction is performed on the tactile signals of the same scene, so that the tactile signals are restored to the greatest possible extent. A minimal sketch of this time-aligned extraction follows.
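The following sketch is a minimal illustration, under assumed data layouts (haptic samples as timestamped values, audio/video labels as tagged time windows), of how the label information could be used to cut the haptic stream into same-scene, same-time-domain segments before feature extraction. The function and field names are hypothetical, and the per-window statistics merely stand in for the learned features.

```python
# Minimal sketch of time-domain-aligned extraction (S401-S402) under assumed layouts:
# haptic is an (N, 2) array of [timestamp, value]; av_labels is a list of
# (t_start, t_end, tag) windows taken from the audio/video label information.
import numpy as np

def extract_haptic_input(haptic, av_labels, force_features):
    features = []
    for t_start, t_end, tag in av_labels:
        # keep only the haptic samples that fall inside this labelled time window
        mask = (haptic[:, 0] >= t_start) & (haptic[:, 0] < t_end)
        segment = haptic[mask, 1]
        if segment.size == 0:
            continue
        # simple per-window descriptors standing in for the learned features
        features.append({
            "tag": tag,
            "mean": float(segment.mean()),
            "peak": float(np.abs(segment).max()),
            "force_features": force_features,   # carried along for the same scene
        })
    return features

# tiny demo with made-up numbers
demo_haptic = np.array([[0.00, 0.1], [0.05, 0.4], [0.12, 0.2], [0.20, 0.9]])
demo_labels = [(0.0, 0.1, "tap"), (0.1, 0.3, "slide")]
print(extract_haptic_input(demo_haptic, demo_labels, {"material": "wood"}))
```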
As shown in fig. 5, in another flow diagram of a multi-modal signal processing method, in the embodiment of the present disclosure, before the step of inputting the haptic signal, the force feedback feature, and the audio/video data into a preset neural network model for feature extraction to obtain a multi-modal input signal, the method includes:
s501, detecting a signal type corresponding to the audio and video data;
and S502, carrying out cross-modal signal coding according to the signal type to obtain an audio/video input signal.
Specifically, the audio and video information present when the tactile scene occurs is collected, its signal type is detected, and cross-modal signal coding is performed according to the signal type, yielding the encoded audio/video input signal. Detecting the signal type means detecting the sound and the picture of the audio/video data in the current application scene, which correspond to different signal types. Cross-modal coding is joint coding that exploits the semantic correlation among the multi-modal code streams; this step realizes cross-modal coding of the audio and video.
In an embodiment of the present disclosure, the haptic signal includes: a vibration signal, a friction signal, and a pressure signal.
Optionally, the tactile signal in this embodiment includes, but is not limited to, a vibration signal, a friction signal, a pressure signal, a temperature signal, and the like. The type of the haptic signal may be determined according to an actual application scenario.
As shown in fig. 6, another flow chart of the multi-modal signal processing method, in the embodiment of the present disclosure, the multi-modal feature signal includes: the tactile characteristic signal, the audio and video characteristic signal and the image characteristic information;
the method for compressing and encoding the multi-modal input signal based on the weber's law to obtain a multi-modal characteristic signal comprises the following steps:
S601, determining a perception blind area range of the touch input signal based on Weber's law;
s602, removing the signals of the tactile input signals in the perception blind area range according to the perception blind area range to obtain tactile characteristic signals;
s603, acquiring image characteristic information and audio-video characteristic signals of corresponding time domains according to the time domains of the tactile characteristic signals.
Specifically, people perceive a change only when the current kinesthetic signal differs significantly from the previous kinesthetic signal, so the kinesthetic data can be compressed without affecting the user experience by adjusting the size of the perceptual blind area. Signals that fall into the human perceptual blind area because the stimulus changes little over a short time are removed, realizing compression coding of the signals within the humanly perceivable interval. For example, let m be the threshold of the blind-area range, let the input sampling point be the kinesthetic signal X, and let the kinesthetic signal at the adjacent time T be X_T. When the following expression holds:

|X_T - X| ≤ m · X_T    (1)

the signal at time T changes too little in amplitude to be perceived by a human and can therefore be discarded rather than transmitted. Discarding perceptual-blind-area signals greatly reduces the data packet transmission rate. A minimal sketch of this deadband rule follows.
Image feature information and audio/video feature signals of the corresponding time domain are then selected according to the determined tactile feature signal, ensuring that the signals belong to the same scene.
In an embodiment of the present disclosure, the preset neural network model includes an input layer, a hidden layer, and an output layer; the number of neurons of the input layer and the output layer is determined according to the dimension of the tactile feature signal.
During training, the preset neural network model has an input layer, a hidden layer and an output layer. The numbers of neurons in the input layer and the output layer are selected according to the input and output dimensions of the tactile signal, as in the sketch below.
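For illustration, the sketch below builds such a three-layer network, assuming PyTorch and illustrative dimensions. The disclosure fixes only that the input and output widths follow the dimensions of the haptic feature signal, not the framework, the hidden width or the activation.

```python
# Minimal sketch of the preset model's topology (input / hidden / output layers);
# framework, hidden size and activation are assumptions of the sketch.
import torch
import torch.nn as nn

def build_preset_model(haptic_in_dim: int, haptic_out_dim: int, hidden_dim: int = 64) -> nn.Module:
    return nn.Sequential(
        nn.Linear(haptic_in_dim, hidden_dim),   # input layer width = input dimension
        nn.ReLU(),
        nn.Linear(hidden_dim, haptic_out_dim),  # output layer width = output dimension
    )

model = build_preset_model(haptic_in_dim=12, haptic_out_dim=8)
features = model(torch.randn(4, 12))            # batch of 4 haptic feature vectors
```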
In an embodiment of the present disclosure, the method includes:
and determining initialization parameters of the preset neural network model by using a genetic algorithm.
Specifically, during model initialization the weights and thresholds of the neural network form the individuals of a genetic-algorithm population. The individuals are encoded, a fitness function is selected, and crossover and mutation operations are performed; the individual with the highest fitness is searched for iteratively and taken as the optimal initial weights and thresholds of the neural network, which are then used to train the neural network model. The feature extraction of the resulting preset neural network model is more accurate and efficient.
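A minimal sketch of this genetic-algorithm initialization is given below for a toy one-hidden-layer network. Population size, mutation rate, the fitness data and all dimensions are illustrative assumptions, and the "thresholds" are represented as the network biases.

```python
# Minimal sketch of GA-based initialization: individuals encode the weights and
# thresholds (biases) of a tiny network; the fittest individual (lowest error on
# sample data) becomes the initial parameters. All hyper-parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
IN_DIM, HID, OUT_DIM = 4, 8, 2
GENOME = IN_DIM * HID + HID + HID * OUT_DIM + OUT_DIM   # weights + biases

def decode(genome):
    i = 0
    w1 = genome[i:i + IN_DIM * HID].reshape(IN_DIM, HID); i += IN_DIM * HID
    b1 = genome[i:i + HID]; i += HID
    w2 = genome[i:i + HID * OUT_DIM].reshape(HID, OUT_DIM); i += HID * OUT_DIM
    b2 = genome[i:i + OUT_DIM]
    return w1, b1, w2, b2

def fitness(genome, x, y):
    w1, b1, w2, b2 = decode(genome)
    pred = np.tanh(x @ w1 + b1) @ w2 + b2
    return -np.mean((pred - y) ** 2)                     # higher fitness = lower error

def ga_init(x, y, pop_size=30, generations=50, mut_rate=0.1):
    pop = rng.normal(0.0, 0.5, size=(pop_size, GENOME))
    for _ in range(generations):
        fit = np.array([fitness(ind, x, y) for ind in pop])
        parents = pop[np.argsort(fit)[::-1][:pop_size // 2]]     # selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, GENOME)
            child = np.concatenate([a[:cut], b[cut:]])           # single-point crossover
            mask = rng.random(GENOME) < mut_rate
            child[mask] += rng.normal(0.0, 0.2, mask.sum())      # mutation
            children.append(child)
        pop = np.vstack([parents, children])
    best = pop[np.argmax([fitness(ind, x, y) for ind in pop])]
    return decode(best)                                          # initial weights/biases

# Toy usage: initial parameters found on illustrative random data
x_demo, y_demo = rng.normal(size=(64, IN_DIM)), rng.normal(size=(64, OUT_DIM))
w1, b1, w2, b2 = ga_init(x_demo, y_demo)
```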
As shown in fig. 7, a schematic structural diagram of a processing apparatus for multi-modal signals is provided. In another embodiment of the present disclosure, the processing apparatus for multi-modal signals includes:
an information collecting module 701, configured to collect a haptic signal, image data, and audio/video data;
a force feature extraction module 702, configured to determine a material type according to the image data, and obtain a force feedback feature under a corresponding material;
the feature extraction module 703 is configured to input the haptic signal, the force feedback feature, and the audio/video data into a preset neural network model for feature extraction, so as to obtain a multi-modal input signal;
the compression coding module 704 is used for carrying out compression coding on the multi-modal input signal based on the weber law to obtain a multi-modal characteristic signal;
the signal sending module 705 is configured to send the multi-modal feature signal to a receiving end, so that the receiving end performs simulation according to the multi-modal feature signal.
This embodiment provides a processing apparatus for multi-modal signals, which comprises an information acquisition module 701, a force feature extraction module 702, a feature extraction module 703, a compression coding module 704 and a signal sending module 705. The tactile signals of the actual scene are restored with the aid of the image and audio/video information, improving their accuracy; and the multi-modal input signals are effectively compressed by the feature extraction module and the compression coding module, reducing the amount of transmitted data and improving the accuracy of the tactile signal representation.
In yet another embodiment of the present disclosure, there is provided an electronic device including:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform any of the above methods of processing multi-modal signals via execution of the executable instructions.
In the electronic device provided by this embodiment, the processing method of the multi-modal signal is implemented by the processor; details already described above are not repeated here.
In yet another embodiment of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, implements the method of processing a multimodal signal as set forth in any of the above.
This embodiment provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the processing method of the multi-modal signal; details already described above are not repeated here.
An electronic device 800 according to this embodiment of the invention is described below with reference to fig. 8. The electronic device 800 shown in fig. 8 is only an example and should not bring any limitations to the function and scope of use of the embodiments of the present invention.
As shown in fig. 8, electronic device 800 is in the form of a general purpose computing device. The components of the electronic device 800 may include, but are not limited to: the at least one processing unit 810, the at least one memory unit 820, and a bus 830 that couples the various system components including the memory unit 820 and the processing unit 810.
Wherein the storage unit stores program code that can be executed by the processing unit 810, such that the processing unit 810 performs the steps according to various exemplary embodiments of the present invention described in the above section "exemplary method" of this specification. For example, the processing unit 810 may perform S201 as shown in fig. 2, collecting tactile signals, image data and audio/video data; S202, determining the material type according to the image data and obtaining the force feedback features for the corresponding material; S203, inputting the tactile signal, the force feedback features and the audio/video data into a preset neural network model for feature extraction to obtain a multi-modal input signal; S204, performing compression coding on the multi-modal input signal based on Weber's law to obtain a multi-modal feature signal; and S205, sending the multi-modal feature signal to the receiving end so that the receiving end performs simulation according to it.
The storage unit 820 may include readable media in the form of volatile memory units such as a random access memory unit (RAM) 8201 and/or a cache memory unit 8202, and may further include a read only memory unit (ROM) 8203.
The storage unit 820 may also include a program/utility 8204 having a set (at least one) of program modules 8205, such program modules 8205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 830 may be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 800 may also communicate with one or more external devices 900 (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 800, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 800 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 850. Also, the electronic device 800 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 860. As shown, the network adapter 860 communicates with the other modules of the electronic device 800 via the bus 830. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, to name a few.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.
A program product for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Moreover, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken into multiple step executions, etc.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, and may also be implemented by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. A method for processing multi-modal signals, applied to a transmitting end, includes:
collecting tactile signals, image data and audio-video data;
determining the material type according to the image data, and obtaining the force feedback characteristics under the corresponding material;
inputting the tactile signal, the force feedback characteristic and the audio and video data into a preset neural network model for characteristic extraction to obtain a multi-mode input signal;
performing compression coding on the multi-modal input signal based on the Weber's law to obtain a multi-modal characteristic signal;
and sending the multi-modal characteristic signal to a receiving end so that the receiving end can simulate according to the multi-modal characteristic signal.
2. The method for processing multi-modal signals according to claim 1, wherein the multi-modal input signal comprises: a tactile input signal, an audio-video input signal and image information;
inputting the tactile signal, the force feedback characteristic and the audio and video data into a preset neural network model for characteristic extraction, and acquiring a multi-modal input signal, wherein the method comprises the following steps:
acquiring characteristic information of the force feedback characteristic and label information of the audio and video data;
and according to the feature information and the label information, performing signal feature extraction on the corresponding tactile signals in the same time domain to obtain tactile input signals.
3. The method for processing multi-modal signals according to claim 1, wherein before the step of inputting the haptic signal, the force feedback feature and the audio/video data into a preset neural network model for feature extraction to obtain a multi-modal input signal, the method comprises:
detecting a signal type corresponding to the audio and video data;
and performing cross-modal signal coding according to the signal type to obtain an audio/video input signal.
4. The method of processing multi-modal signals of claim 1, wherein the haptic signals comprise: a vibration signal, a friction signal, and a pressure signal.
5. The method of processing multi-modal signals of claim 1, wherein the multi-modal feature signals comprise: the tactile characteristic signal, the audio and video characteristic signal and the image characteristic information; the method for compressing and encoding the multi-modal input signal based on the weber's law to obtain a multi-modal characteristic signal comprises the following steps:
determining a perception blind area range of the touch input signal based on the Weber's law;
removing the signals of the tactile input signals in the perception blind area range according to the perception blind area range to obtain tactile characteristic signals;
and acquiring image characteristic information and audio-video characteristic signals of corresponding time domains according to the time domains of the tactile characteristic signals.
6. The method of processing multi-modal signals according to claim 1, wherein the pre-defined neural network model comprises an input layer, a hidden layer and an output layer; the number of the neurons of the input layer and the output layer is determined according to the dimension of the tactile characteristic signal.
7. The method for processing multi-modal signals according to claim 6, further comprising:
and determining initialization parameters of the preset neural network model by using a genetic algorithm.
8. An apparatus for processing a multi-modal signal, comprising:
the information acquisition module is used for acquiring the tactile signals, the image data and the audio and video data;
the force characteristic extraction module is used for determining the material type according to the image data and obtaining the force feedback characteristics under the corresponding material;
the characteristic extraction module is used for inputting the tactile signal, the force feedback characteristic and the audio and video data into a preset neural network model for characteristic extraction to obtain a multi-mode input signal;
the compression coding module is used for carrying out compression coding on the multi-modal input signal based on the Weber's law to obtain a multi-modal characteristic signal;
and the signal sending module is used for sending the multi-modal characteristic signals to a receiving end so that the receiving end can simulate according to the multi-modal characteristic signals.
9. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of processing multi-modal signals of any of claims 1 to 7 via execution of the executable instructions.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of processing a multimodal signal as claimed in any one of claims 1 to 7.
CN202211338523.8A | 2022-10-28 | Haptic signal transmission method and device, storage medium and electronic equipment | Active | granted as CN115758107B

Priority Applications (1)

Application Number: CN202211338523.8A (CN115758107B) | Priority Date: 2022-10-28 | Filing Date: 2022-10-28 | Title: Haptic signal transmission method and device, storage medium and electronic equipment


Publications (2)

Publication Number | Publication Date
CN115758107A | 2023-03-07
CN115758107B | 2023-11-14

Family

ID=85355957

Family Applications (1)

Application Number: CN202211338523.8A | Title: Haptic signal transmission method and device, storage medium and electronic equipment | Priority/Filing Date: 2022-10-28 | Status: Active (granted as CN115758107B)

Country Status (1)

CN: CN115758107B (granted)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104049749A (en) * 2013-03-15 2014-09-17 英默森公司 Method and apparatus to generate haptic feedback from video content analysis
US20140347176A1 (en) * 2013-05-24 2014-11-27 Immersion Corporation Method and apparatus to provide haptic feedback based on media content and one or more external parameters
CN109559758A (en) * 2018-11-05 2019-04-02 清华大学 A method of texture image is converted by haptic signal based on deep learning
CN111190481A (en) * 2018-11-14 2020-05-22 意美森公司 Systems and methods for generating haptic effects based on visual characteristics
CN109447038A (en) * 2018-11-27 2019-03-08 南京恩诺网络科技有限公司 Tactile data evaluation method and device, system
CN113840588A (en) * 2021-08-15 2021-12-24 曹庆恒 Touch sensing system and use method thereof
CN114225379A (en) * 2021-12-20 2022-03-25 努比亚技术有限公司 Game scene vibration regulation and control method, equipment and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yu Li; Guo Shan; Xu Shilin; Zhou Gang; Li Rong: "Luminance coefficient compression method based on human visual perception characteristics", Journal of Image and Graphics, no. 03 *

Also Published As

Publication number Publication date
CN115758107B (en) 2023-11-14

Similar Documents

Publication Publication Date Title
CN111933115B (en) Speech recognition method, apparatus, device and storage medium
CN109874029B (en) Video description generation method, device, equipment and storage medium
CN110929098A (en) Video data processing method and device, electronic equipment and storage medium
CN106971009A (en) Speech data library generating method and device, storage medium, electronic equipment
CN112069309A (en) Information acquisition method and device, computer equipment and storage medium
CN112214591A (en) Conversation prediction method and device
CN111597779A (en) Text generation method, device, equipment and storage medium
CN105551493A (en) Method and device of data processing of children voice robot and children voice robot
CN110516749A (en) Model training method, method for processing video frequency, device, medium and calculating equipment
CN113793398A (en) Drawing method and device based on voice interaction, storage medium and electronic equipment
WO2023197749A9 (en) Background music insertion time point determining method and apparatus, device, and storage medium
WO2021227308A1 (en) Video resource generation method and apparatus
CN115019237A (en) Multi-modal emotion analysis method and device, electronic equipment and storage medium
CN114783459B (en) Voice separation method and device, electronic equipment and storage medium
CN114419527B (en) Data processing method, equipment and computer readable storage medium
CN111144138A (en) Simultaneous interpretation method and device and storage medium
CN205451551U (en) Speech recognition driven augmented reality human -computer interaction video language learning system
KR100861653B1 (en) System and method for the distributed speech recognition using the speech features
CN115758107B (en) Haptic signal transmission method and device, storage medium and electronic equipment
CN116705038A (en) 3D virtual speaker driving method based on voice analysis and related device
CN115223244A (en) Haptic motion simulation method, device, equipment and storage medium
CN115116444A (en) Processing method, device and equipment for speech recognition text and storage medium
CN113763925B (en) Speech recognition method, device, computer equipment and storage medium
CN113571063B (en) Speech signal recognition method and device, electronic equipment and storage medium
CN114201041A (en) Human-computer interaction command method and device based on brain-computer interface

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant