CN117202372A - Communication mode determining method and device, electronic equipment and storage medium

Info

Publication number: CN117202372A
Application number: CN202311199399.6A
Authority: CN
Inventors: 向湘鹏
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Suzhou Software Technology Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Suzhou Software Technology Co Ltd
Priority date: 2023-09-18
Filing date: 2023-09-18
Publication date: 2023-12-08

Abstract

The invention provides a method and a device for determining a communication mode, an electronic device, and a storage medium. The method comprises: determining candidate communication modes between a control terminal and a device to be controlled; and determining an optimal communication mode among the candidate communication modes by using a multi-agent deep deterministic policy gradient (MADDPG) algorithm. The optimal communication mode can thus be determined among the candidate communication modes according to actual conditions, which improves communication efficiency while taking communication security into account. Using the MADDPG algorithm to optimize the communication mode selection strategy can further improve system performance and gives the system adaptive and learning capabilities, enhancing its communication flexibility and adaptability.

Description

Communication mode determining method and device, electronic equipment and storage medium

Technical Field

The embodiment of the invention relates to the technical field of communication, in particular to a method and a device for determining a communication mode, electronic equipment and a storage medium.

Background

When a mobile device is used to remotely control other devices, Bluetooth communication, Wi-Fi local area network communication, and remote communication over the Internet are all common communication modes. However, in the related art, the selection of the communication mode depends on a single network connection or on a user setting, so if the network environment changes, for example if the Wi-Fi signal becomes weak or is lost, the system may not work properly. In addition, the related art does not treat communication mode selection as an optimization problem, and the communication mode adopted does not take communication efficiency, security, or resource utilization into account.

Disclosure of Invention

Embodiments of the invention provide a method and a device for determining a communication mode, an electronic device, and a storage medium, to solve the problem in the related art that, when a mobile device is used to remotely control other devices, the communication mode adopted does not take communication efficiency, security, and resource utilization into account.

In a first aspect, an embodiment of the present invention provides a method for determining a communication manner, where the method includes:

determining candidate communication modes of the control terminal and the equipment to be controlled;

and determining an optimal communication mode among the candidate communication modes by using a multi-agent deep deterministic policy gradient (MADDPG) algorithm.

Optionally, determining the optimal communication mode among the candidate communication modes by using the MADDPG algorithm includes:

taking each candidate communication mode as an agent;

determining an expected cumulative reward for each agent;

determining the candidate communication mode represented by the agent with the highest expected cumulative reward as the optimal communication mode.

Optionally, determining the expected cumulative reward for each agent includes:

determining the current environmental state of the control terminal and the device to be controlled, wherein the current environmental state comprises at least one of the following: the distance between the control terminal and the device to be controlled, the Bluetooth signal quality between the control terminal and the device to be controlled, the network connection quality of a local area network and/or a public network between the control terminal and the device to be controlled, and the energy usage efficiency of communication between the control terminal and the device to be controlled;

taking the current environmental state as the environmental state of the agent;

determining a policy network and a value network of each agent;

inputting the environmental state into the policy network to obtain the action of the agent corresponding to the policy network, wherein the action is either of the following: activating the candidate communication mode represented by the corresponding agent, or not activating the candidate communication mode represented by the corresponding agent;

inputting the action and the environmental state into the value network to obtain the expected cumulative reward.

Optionally, the method further includes: optimizing the policy network and the value network using the environmental state of the agent and the action of the agent;

wherein the optimization objective of the policy network is: given the environmental state, to output the action that maximizes the expected cumulative reward output by the value network;

the optimization objective of the value network is: to minimize the squared difference between the expected cumulative reward it outputs and the actual cumulative reward.

Optionally, after the optimal communication mode is determined among the candidate communication modes by using the MADDPG algorithm, the method further comprises:

establishing connection between the control terminal and the equipment to be controlled in the optimal communication mode;

after the connection is established, displaying a simulated touch pad on a display screen;

and receiving an operation instruction of a user on the simulated touch pad, and sending a corresponding control instruction to the device to be controlled according to the operation instruction, wherein the control instruction is used for instructing the device to execute the corresponding operation.

Optionally, in the case that the simulated touch pad is a simulated keyboard, the method further includes:

determining a selected key position on the simulated keyboard based on the operating instruction;

determining a target key value corresponding to the selected key position based on a preset key position and key value corresponding relation table; the target key value is sent to the equipment to be controlled; the target key value is used for indicating execution of the control instruction corresponding to the key position corresponding to the target key value.

Optionally, the candidate communication modes include at least one of: Bluetooth communication, local area network (LAN) communication, public network communication, infrared communication, and radio frequency communication.

In a second aspect, an embodiment of the present invention provides a device for determining a communication manner, where the device includes:

the candidate communication mode determining module is used for determining candidate communication modes of the control terminal and the equipment to be controlled;

and the optimal communication mode determining module is used for determining an optimal communication mode among the candidate communication modes by using a multi-agent deep deterministic policy gradient (MADDPG) algorithm.

Optionally, the optimal communication mode determining module is further configured to take each candidate communication mode as an agent;

determining an expected cumulative reward for each agent;

Optionally, the optimal communication mode determining module is further configured to determine the current environmental state of the control terminal and the device to be controlled, where the current environmental state includes at least one of the following: the distance between the control terminal and the device to be controlled, the Bluetooth signal quality between the control terminal and the device to be controlled, the network connection quality of a local area network and/or a public network between the control terminal and the device to be controlled, and the energy usage efficiency of communication between the control terminal and the device to be controlled;

determining a policy network and a value network of each agent;

Optionally, the optimal communication mode determining module is further configured to optimize the policy network and the value network using the environmental state of the agent and the action of the agent;

Optionally, the apparatus further includes:

the connection establishment module is used for establishing a connection between the control terminal and the device to be controlled in the optimal communication mode, after the optimal communication mode has been determined among the candidate communication modes by using the MADDPG algorithm;

the display module is used for displaying the simulated touch pad on the display screen after the connection is established;

the receiving module is used for receiving an operation instruction of a user on the simulated touch pad;

and the sending module is used for sending a corresponding control instruction to the equipment to be controlled according to the operation instruction, wherein the control instruction is used for indicating the execution of the corresponding operation.

Optionally, the apparatus further includes:

the key position determining module is used for determining the selected key position on the simulated keyboard based on the operation instruction under the condition that the simulated touch pad is the simulated keyboard;

the target key value determining module is used for determining a target key value corresponding to the selected key position based on a preset key position and key value corresponding relation table;

the sending module is further configured to send the target key value to the device to be controlled; the target key value is used for indicating execution of the control instruction corresponding to the key position corresponding to the target key value.

In a third aspect, an embodiment of the present invention provides an electronic device, including: a processor, a memory and a program stored on the memory and executable on the processor, which when executed by the processor implements the steps of the method of determining a communication scheme as described in the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method for determining a communication mode according to the first aspect.

The optimal communication mode can thus be determined among the candidate communication modes according to actual conditions, which improves communication efficiency while taking communication security into account. Using the multi-agent deep deterministic policy gradient (MADDPG) algorithm to optimize the communication mode selection strategy can further improve system performance and gives the system adaptive and learning capabilities, enhancing its communication flexibility and adaptability.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:

FIG. 1 is a flow chart of a method for determining a communication mode according to an embodiment of the present invention;

FIG. 2 is a flowchart of a method for determining a communication mode according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of communication using WebSocket in a local area network according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of communication using a public network transit server according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a simulated touch pad according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a simulated keyboard according to an embodiment of the present invention;

FIG. 7 is a block diagram of a communication mode determining apparatus according to an embodiment of the present invention;

FIG. 8 is a block diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Fig. 1 shows a method for determining a communication manner according to an embodiment of the present invention, where the method includes:

step S101, determining candidate communication modes of a control terminal and equipment to be controlled;

step S102, determining an optimal communication mode among the candidate communication modes by using a multi-agent deep deterministic policy gradient (MADDPG) algorithm.

The method can switch among multiple communication modes such as Bluetooth, local area network, and public network, giving the system greater communication flexibility and adaptability. In addition, the embodiment of the invention optimizes the communication mode selection strategy using the multi-agent deep deterministic policy gradient (MADDPG) algorithm, which can further improve system performance and give the system adaptive and learning capabilities.

The control terminal may be a mobile device such as a mobile phone, and the device to be controlled may be a cloud computer device. The candidate communication modes include at least one of: Bluetooth communication, local area network (LAN) communication, public network communication, infrared communication, and radio frequency communication.

In one possible implementation, step S102 of determining the optimal communication mode among the candidate communication modes by using the MADDPG algorithm includes: taking each candidate communication mode as an agent; determining an expected cumulative reward for each agent; and determining the candidate communication mode represented by the agent with the highest expected cumulative reward as the optimal communication mode.
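The final selection step can be illustrated with a minimal Python sketch. It assumes the expected cumulative rewards have already been produced by each agent's value network; the function name `select_best_mode` and the mode labels are hypothetical, and the numbers reuse the illustrative values from the worked example later in this description.

```python
from typing import Dict


def select_best_mode(expected_rewards: Dict[str, float]) -> str:
    """Return the candidate mode whose agent has the highest expected cumulative reward."""
    if not expected_rewards:
        raise ValueError("no candidate communication modes were provided")
    # max() over the keys, ranked by the reward stored for each key.
    return max(expected_rewards, key=expected_rewards.get)


# Illustrative values only: Bluetooth, local area network, public network.
rewards = {"BT": 10.0, "LAN": 8.0, "WAN": 6.0}
best = select_best_mode(rewards)
```

With these values `best` is `"BT"`. How ties between agents should be broken is not specified in the description; this sketch simply keeps the first maximum encountered.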

In one possible implementation, as shown in FIG. 2, determining the expected cumulative reward for each agent includes:

step S201, determining the current environment states of the control terminal and the equipment to be controlled;

step S202, taking the current environmental state as the environmental state of the agent;

step S203, determining a policy network and a value network of each agent;

step S204, inputting the environmental state into the policy network to obtain the action of the agent corresponding to the policy network;

step S205, inputting the action and the environmental state into the value network to obtain the expected cumulative reward.

The current environmental state comprises at least one of: the distance between the control terminal and the device to be controlled, the Bluetooth signal quality between the control terminal and the device to be controlled, the network connection quality of a local area network and/or a public network between the control terminal and the device to be controlled, and the energy usage efficiency of communication between the control terminal and the device to be controlled. The action is either of the following: activating the candidate communication mode represented by the corresponding agent, or not activating the candidate communication mode represented by the corresponding agent.
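Steps S201 to S205 can be sketched for a single agent as follows. This is a simplified illustration, not the networks of the embodiment: the policy and value networks are replaced by hypothetical one-layer functions, all weights are arbitrary, and the value function sees only this agent's own action, whereas a full MADDPG critic would take the actions of all agents.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class EnvState:
    """The four environmental factors named in the description."""
    distance_m: float
    bt_quality: float   # Bluetooth signal quality, 0..1
    net_quality: float  # LAN / public network connection quality, 0..1
    energy_eff: float   # energy usage efficiency, 0..1

    def as_vector(self) -> List[float]:
        return [self.distance_m, self.bt_quality, self.net_quality, self.energy_eff]


def policy(state: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Stand-in policy network: deterministic activation level in (0, 1)."""
    z = sum(w * x for w, x in zip(weights, state)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def value(state: Sequence[float], action: float, weights: Sequence[float],
          action_weight: float, bias: float) -> float:
    """Stand-in value network: expected cumulative reward of (state, action)."""
    return sum(w * x for w, x in zip(weights, state)) + action_weight * action + bias


# S201/S202: observe the environment and use it as the agent's state.
s = EnvState(distance_m=2.0, bt_quality=0.8, net_quality=0.7, energy_eff=0.6).as_vector()
# S204: the policy maps the state to an action (hypothetical weights).
a = policy(s, weights=[-0.5, 2.0, 0.0, 1.0], bias=0.0)
activate = a >= 0.5  # threshold the continuous output into activate / not activate
# S205: the value function scores the (state, action) pair.
q = value(s, a, weights=[-0.1, 1.0, 0.5, 0.5], action_weight=1.0, bias=0.0)
```

Thresholding the continuous policy output at 0.5 is one possible way to map it onto the two discrete actions; the description does not say how this mapping is done.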

The optimal communication mode can be determined in several alternative ways. One option is to take the communication mode selected by the user, according to the user's instruction, as the optimal communication mode. Another option is to determine the optimal communication mode according to a preset priority. For example, the system (comprising the control terminal and the device to be controlled) first tries to connect to the device through Bluetooth (Core Bluetooth); if the device is within Bluetooth range and the signal quality is good enough, the system preferentially selects this low-power, high-speed communication mode. If the Bluetooth connection is unavailable or the signal quality is poor, the system attempts to use WebSocket communication within the local area network, as shown in FIG. 3. If neither is available, the system finally connects through the transit server in the public network, as shown in FIG. 4.
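The preset-priority option can be sketched as a simple fallback chain. The function name, the mode labels, and the quality threshold `min_bt_quality` are hypothetical; the description only says the Bluetooth signal must be "good enough" and does not give a threshold. Checking public network availability before returning it is also an assumption.

```python
from typing import Optional


def choose_by_priority(bt_in_range: bool, bt_quality: float,
                       lan_available: bool, wan_available: bool,
                       min_bt_quality: float = 0.5) -> Optional[str]:
    """Pick a mode by fixed priority: Bluetooth, then LAN WebSocket, then public relay."""
    if bt_in_range and bt_quality >= min_bt_quality:
        return "bluetooth"
    if lan_available:
        return "lan_websocket"
    if wan_available:
        return "public_relay"
    return None  # no candidate mode is currently usable


mode_near = choose_by_priority(True, 0.8, True, True)    # Bluetooth usable
mode_weak = choose_by_priority(True, 0.2, True, True)    # Bluetooth too weak
mode_far = choose_by_priority(False, 0.0, False, True)   # only the public network
```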

WebSocket is a protocol for full-duplex communication over a single TCP (Transmission Control Protocol) connection. WebSocket simplifies data exchange between the client and the server and allows the server to actively push data to the client. With the WebSocket API (Application Programming Interface), the browser and the server need to complete only one handshake, after which a persistent connection is established between the two and data can be transmitted in both directions.
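The single handshake mentioned above is an HTTP upgrade exchange. The description does not go into its details; as background, the following sketch shows how a server derives the `Sec-WebSocket-Accept` response header from the client's `Sec-WebSocket-Key`, as defined in RFC 6455. The function name is hypothetical.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
_WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + _WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key from RFC 6455, section 1.3.
accept = websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")
```

Once the client has verified this value, both sides keep the TCP connection open and exchange WebSocket frames in either direction.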

Another alternative is for the system to use the MADDPG (Multi-Agent Deep Deterministic Policy Gradient) algorithm.

In the multi-factor communication mode selection strategy, the MADDPG algorithm can handle continuous action spaces and is suitable for multi-agent cooperation and competition. Each communication mode can be regarded as an agent whose goal is to select an action (activate or not activate) under a given environmental state (including the distance between devices, Bluetooth signal quality, network connection quality, and energy usage efficiency), that is, to decide whether to activate the communication mode it represents, so as to maximize the cumulative reward.

In one possible implementation, the method further includes: optimizing the policy network and the value network using the environmental state of the agent and the action of the agent. Specifically, gradient descent can be used to optimize the policy network and the value network. Gradient descent minimizes or maximizes the objective function of a model by adjusting the model's parameters. Here the model is the policy network or the value network; the objective function of the policy network is the policy gradient, and the objective function of the value network is the squared difference between the expected cumulative reward output by the value network and the actual cumulative reward. The optimization objective of the policy network is, given the environmental state, to output the action that maximizes the expected cumulative reward output by the value network. The optimization objective of the value network is to minimize the squared difference between the expected cumulative reward it outputs and the actual cumulative reward.

That is, one policy network and one value network may be defined for each agent. The policy network is used to select an action, and the value network is used to estimate the expected cumulative reward of that action. The parameters of both networks can be optimized by gradient descent.
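As a minimal sketch of the gradient descent idea applied to the value network, the following code fits a hypothetical linear value function to a single target by repeatedly stepping against the gradient of the squared difference. The feature vector, target, and learning rate are illustrative assumptions; a real implementation would use neural networks and batches of experience. The policy network would be updated analogously, but by ascending the gradient of the value output.

```python
from typing import List, Sequence


def critic_predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear stand-in for the value network: Q = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def critic_sgd_step(w: Sequence[float], x: Sequence[float],
                    target: float, lr: float = 0.05) -> List[float]:
    """One gradient descent step on the loss (Q - target)^2."""
    err = critic_predict(w, x) - target
    # d/dw (Q - target)^2 = 2 * (Q - target) * x
    return [wi - lr * 2.0 * err * xi for wi, xi in zip(w, x)]


w = [0.0, 0.0, 0.0]
x = [0.8, 0.7, 1.0]   # e.g. two state features and the action (hypothetical)
target = 10.0         # actual cumulative reward observed (hypothetical)
losses = []
for _ in range(50):
    losses.append((critic_predict(w, x) - target) ** 2)
    w = critic_sgd_step(w, x, target)
```

Each step shrinks the prediction error by a constant factor here, so `losses` decreases steadily toward zero.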

The optimization objective of the policy network is to maximize the output of the value network, that is, to maximize the expected cumulative reward:

$$\max_{\theta}\ \mathbb{E}\left[\,Q(s, a_1, a_2, \ldots, a_N;\ \theta')\,\right]$$

The optimization objective of the value network is to minimize the squared difference between the expected cumulative reward and the actual cumulative reward:

$$\min_{\theta'}\ \mathbb{E}\left[\left(Q(s, a_1, a_2, \ldots, a_N;\ \theta') - r - \gamma\, Q(s', a_1', a_2', \ldots, a_N';\ \theta')\right)^2\right]$$

Here θ is the parameter of the policy network, s is the current environmental state of the agents, $a_1, a_2, \ldots, a_N$ are the actions of the agents in the current environmental state, Q denotes the value network, θ' is the parameter of the value network, r is the actual reward, γ is the discount factor, s' is the next environmental state of the agents, and $a_1', a_2', \ldots, a_N'$ are the actions of the agents in the next environmental state. By optimizing the above objectives, the agents can be trained to automatically select the optimal communication mode.

For example, assume that there are three communication modes: Bluetooth (BT), local area network (LAN), and public network (WAN). If at some moment the distance between the devices is 2 meters, the Bluetooth signal quality is 80%, the network connection quality is 70%, and the energy usage efficiency is 60%, the MADDPG algorithm can be used to calculate the expected cumulative reward of each communication mode and select the mode with the highest value. If the calculated expected cumulative rewards are BT: 10, LAN: 8, and WAN: 6, Bluetooth communication should be selected as the optimal communication mode.
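To make the value-network objective concrete, the following sketch evaluates it for one transition using the symbols defined above: the actual reward r, the discount factor γ, and the value estimated for the next state. All numbers are hypothetical and the function names are illustrative.

```python
def td_target(r: float, gamma: float, q_next: float) -> float:
    """Reward plus discounted value of the next state and next actions."""
    return r + gamma * q_next


def critic_loss(q_pred: float, r: float, gamma: float, q_next: float) -> float:
    """Squared difference between the predicted value and the target."""
    return (q_pred - td_target(r, gamma, q_next)) ** 2


# Hypothetical transition: reward 1.0, discount 0.9, next-state value 10.0.
target = td_target(r=1.0, gamma=0.9, q_next=10.0)
# If the value network currently predicts 9.0 for (s, a_1, ..., a_N):
loss = critic_loss(q_pred=9.0, r=1.0, gamma=0.9, q_next=10.0)
```

Here the target is 1.0 + 0.9 × 10.0 = 10.0 and the squared difference is (9.0 − 10.0)² = 1.0; training adjusts θ' to drive this difference toward zero over many transitions.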

It should be noted that the system may adjust the communication mode according to the actual situation, for example by adjusting once every preset time interval, or by detecting and adjusting in real time. The communication modes are also not limited to the above three; for example, wireless communication such as infrared communication or radio frequency communication may also be used.
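The "once every preset time interval" option can be sketched as a small scheduler that re-runs the selection only when the interval has elapsed. The class name `ModeReevaluator`, its `tick` method, and the injected `select` callback are all hypothetical; time is passed in explicitly so that the sketch stays deterministic.

```python
from typing import Callable, Optional


class ModeReevaluator:
    """Re-run mode selection at most once per interval and report changes."""

    def __init__(self, select: Callable[[], str], interval_s: float) -> None:
        self._select = select          # e.g. the MADDPG-based selection
        self._interval_s = interval_s
        self._last_eval: Optional[float] = None
        self.current_mode: Optional[str] = None

    def tick(self, now_s: float) -> Optional[str]:
        """Return the new mode if it changed at this tick, otherwise None."""
        if self._last_eval is not None and now_s - self._last_eval < self._interval_s:
            return None  # interval not yet elapsed; keep the current mode
        self._last_eval = now_s
        new_mode = self._select()
        if new_mode != self.current_mode:
            self.current_mode = new_mode
            return new_mode
        return None


# Simulated sequence of selection results (illustrative only).
results = iter(["BT", "BT", "LAN"])
reevaluator = ModeReevaluator(select=lambda: next(results), interval_s=5.0)
first = reevaluator.tick(0.0)
```

Real-time adjustment corresponds to calling the selection on every environment change instead of on a timer.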

In summary, the adaptive communication selection strategy can maximize communication efficiency and energy usage efficiency. The system continuously monitors four key factors: distance between devices, Bluetooth signal quality, network connection quality, and energy usage efficiency. When the state of these factors changes, the system considers all four together and automatically adjusts the communication strategy to maintain the best communication effect. This ability to adjust the communication strategy automatically allows the system to keep running efficiently in a continuously changing environment.

The energy use efficiency refers to the ability to effectively utilize and manage energy resources during the use of the device. It measures the proportional relationship between the energy consumed by a device when it performs a particular task or provides a service and the functions it performs.

The MADDPG algorithm can take into account the importance of each factor, as well as the interactions between factors, to select the optimal communication mode more accurately. The system can use the Netty network framework, which is cross-platform and can be used on both iOS and Android devices, giving the system a wider range of application.

It should be noted that other reinforcement learning algorithms, such as Q-learning and Monte Carlo tree search, may also be used to select the optimal communication mode.
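For comparison, a tabular Q-learning update for the same selection problem might look like the following. This is a generic sketch of the standard Q-learning rule, not a method described in the embodiment; the state label `"near"`, the rewards, and the hyperparameters are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Sequence, Tuple

QTable = DefaultDict[Tuple[str, str], float]


def q_learning_update(q: QTable, state: str, action: str, reward: float,
                      next_state: str, actions: Sequence[str],
                      alpha: float = 0.5, gamma: float = 0.9) -> float:
    """Apply Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


q: QTable = defaultdict(float)
actions = ["BT", "LAN", "WAN"]
# Two identical transitions: choosing Bluetooth while "near" yields reward 1.0.
v1 = q_learning_update(q, "near", "BT", 1.0, "near", actions)
v2 = q_learning_update(q, "near", "BT", 1.0, "near", actions)
```

After the first update the estimate is 0.5; after the second it is 0.975, because the next-state maximum now includes the improved estimate for Bluetooth.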

In one possible implementation, after the optimal communication mode is determined among the candidate communication modes by using the MADDPG algorithm, the method further includes: establishing a connection between the control terminal and the device to be controlled in the optimal communication mode; after the connection is established, displaying a simulated touch pad on a display screen; and receiving an operation instruction of a user on the simulated touch pad, and sending a corresponding control instruction to the device to be controlled according to the operation instruction, wherein the control instruction is used for instructing the device to execute the corresponding operation. That is, the device to be controlled is configured to receive the control instruction and execute the operation corresponding to the control instruction.

As shown in FIG. 5, after the control terminal is connected to the device to be controlled (such as a cloud computer device) via Bluetooth or a wireless local area network, the APP on the control terminal can automatically switch to a control mode and display a simulated touch pad. The area of icon 1 in FIG. 5 simulates mouse movement, and the areas of icon 2 and icon 3 simulate the left click and the right click of the mouse, respectively. When the user moves a finger within the screen area, the view layer packages the user-triggered event into an event object through the event delivery and dispatch mechanism; the event object, as the carrier of the event message, is passed to the view controller; finally, the system library transmits the movement coordinates or touch event to the cloud computer through the transmission protocol, and the cloud computer executes the corresponding operation according to the transmitted data.
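The packaging of a touch into a message for the cloud computer can be sketched as follows. The zone geometry, the JSON message format, and all names are assumptions made for illustration; the description specifies neither the layout coordinates nor the wire format of the transmission protocol.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


# Hypothetical layout: icon 1 (movement), icon 2 (left click), icon 3 (right click).
ZONES: Dict[str, Rect] = {
    "move": Rect(0, 0, 300, 400),
    "left_click": Rect(0, 400, 150, 100),
    "right_click": Rect(150, 400, 150, 100),
}


def encode_touch(px: float, py: float, dx: float = 0.0, dy: float = 0.0) -> Optional[str]:
    """Map a touch point to a control message, or None if it hits no zone."""
    for kind, rect in ZONES.items():
        if rect.contains(px, py):
            payload = {"type": kind}
            if kind == "move":
                # Movement messages carry the finger displacement.
                payload.update({"dx": dx, "dy": dy})
            return json.dumps(payload, sort_keys=True)
    return None


move_msg = encode_touch(10, 10, dx=3.0, dy=-2.0)
click_msg = encode_touch(10, 450)
```

The device to be controlled would decode each message and apply the corresponding pointer movement or click.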

In one possible implementation manner, in a case that the simulated touch pad is a simulated keyboard, the method further includes: determining a selected key position on the simulated keyboard based on the operating instruction; determining a target key value corresponding to the selected key position based on a preset corresponding relation table of the key position and the key value; and sending the target key value to the equipment to be controlled, wherein the target key value is used for indicating the execution of the control instruction corresponding to the key position corresponding to the target key value. That is, the device to be controlled is further configured to receive the target key value, search a key position corresponding to the target key value in the mapping table, and execute a control instruction corresponding to the key position corresponding to the target key value.
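The key position and key value correspondence table can be sketched as a pair of dictionaries, one used by the control terminal and its inverse used by the device to be controlled. The key names and numeric values below are arbitrary illustrative choices; the description does not define the actual table.

```python
from typing import Dict

# Hypothetical correspondence table shared by both sides.
KEY_TO_VALUE: Dict[str, int] = {"A": 0x04, "B": 0x05, "ENTER": 0x28, "SPACE": 0x2C}
# Inverse table used on the device to be controlled.
VALUE_TO_KEY: Dict[int, str] = {v: k for k, v in KEY_TO_VALUE.items()}


def key_value_for(key_position: str) -> int:
    """Control terminal side: look up the target key value for a selected key."""
    try:
        return KEY_TO_VALUE[key_position]
    except KeyError:
        raise ValueError(f"unknown key position: {key_position!r}") from None


def key_position_for(key_value: int) -> str:
    """Controlled device side: recover the key position from a received value."""
    try:
        return VALUE_TO_KEY[key_value]
    except KeyError:
        raise ValueError(f"unknown key value: {key_value!r}") from None


sent = key_value_for("ENTER")
received = key_position_for(sent)
```

Both sides must use the same table, which is why the description refers to it as a preset correspondence.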

The control terminal can also simulate keyboard input. A protocol agreed with the cloud computer defines the key value transmitted for each key position; the cloud computer looks up the corresponding key position in the key value table and simulates the corresponding key input. The keyboard also supports voice input: the icon 4 button shown in FIG. 6 connects to the microphone of the cloud computer. Specifically, after the cloud computer is connected, an audio stream (in the agreed H264 coding format) is output over the transmission protocol and decoded with the iOS CoreAudio framework (a digital audio processing library), so that sound from the cloud computer can be played even when the cloud computer has no loudspeaker.

In a specific application scenario, the method of the embodiment of the invention can also use the various hardware of the mobile device (such as an iOS mobile device) to replace various PC peripherals: for example, screen touch replaces the mouse (move, scroll, left click, right click), and the mobile device also replaces the keyboard, microphone, and speaker. The mobile device can also connect to other hardware, such as printers and scanners, through Bluetooth or USB.

An application scenario of the embodiment of the present invention may be one in which the cloud computer hardware (such as a card machine) has no peripherals and the mobile device (control terminal) is used in place of the hardware peripherals, or one in which the cloud computer hardware needs an external screen and control is performed through the simulated touch pad and keyboard using the projection technology of iOS.

In summary, the embodiment of the invention provides a system, which may be a control system in which touch input on a mobile device screen is linked to a cloud computer. The system replaces hardware peripherals at the software level and, by implementing a multi-mode communication selection strategy, dynamically selects the optimal communication mode according to the current environment and requirements, ensuring communication efficiency and flexibility in different network environments, whether Bluetooth, local area network, or public network. In terms of portability, it is superior to solutions that require external hardware; it increases the download rate and usage rate of the product's application software and, because the virtual peripherals rely only on software, improves user stickiness and the number of users. In terms of customizability, because the virtual peripherals rely only on software, they can be flexibly customized according to user needs.

With the rapid development of mobile internet technology, mobile devices have become an indispensable part of people's daily life and work. People increasingly rely on mobile devices such as mobile phones and tablets for communication, entertainment, learning, and work. This trend has driven the rapid growth of the mobile device market, and there is a strong need to improve the experience of using mobile devices, for example through better control and safer data transmission. The current network environment is becoming more and more complex, including not only traditional wired and wireless networks but also various other types such as Bluetooth, public networks, and private clouds. In this case, a control system that can operate efficiently in a variety of network environments clearly helps to improve user experience and efficiency. Replacing hardware peripherals at the software level is more portable than solutions that require external hardware. It also increases the download rate and usage rate of the product's application software; because the virtual peripherals rely only on software, costs are reduced for both developers and users, and user stickiness and the number of users are improved.

Fig. 7 shows a communication manner determining apparatus according to an embodiment of the present invention, the apparatus 70 includes:

a candidate communication mode determining module 701, configured to determine a candidate communication mode of the control terminal and the device to be controlled;

the optimal communication mode determining module 702 is configured to determine an optimal communication mode among the candidate communication modes by using a multi-agent deep deterministic policy gradient (MADDPG) algorithm.

In one possible implementation, the optimal communication mode determining module 702 is further configured to take each candidate communication mode as an agent;

determining an expected cumulative reward for each agent;

In a possible implementation, the optimal communication mode determining module 702 is further configured to determine the current environmental state of the control terminal and the device to be controlled, where the current environmental state includes at least one of the following: the distance between the control terminal and the device to be controlled, the Bluetooth signal quality between the control terminal and the device to be controlled, the network connection quality of a local area network and/or a public network between the control terminal and the device to be controlled, and the energy usage efficiency of communication between the control terminal and the device to be controlled;

taking the current environmental state as the environmental state of the agent;

determining a policy network and a value network of each agent;

inputting the environmental state into the policy network to obtain the action of the agent corresponding to the policy network, wherein the action is either of the following: activating the candidate communication mode represented by the corresponding agent, or not activating the candidate communication mode represented by the corresponding agent;

inputting the action and the environmental state into the value network to obtain the expected cumulative reward.
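The forward pass described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Agent` class, the layer sizes, the random initial weights and the 0.5 activation threshold are all hypothetical, and the state is assumed to be four values normalized to [0, 1]. The networks are untrained, so the numbers they produce carry no meaning.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Create a small dense layer as (weights, biases) with random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def linear(layer, x):
    """Apply a dense layer to the input vector x."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


class Agent:
    """One agent per candidate communication mode (sizes are hypothetical)."""

    def __init__(self, state_dim=4, hidden=8, seed=0):
        rng = random.Random(seed)
        self.policy = [make_linear(state_dim, hidden, rng), make_linear(hidden, 1, rng)]
        # The value network sees the state plus the agent's own action.
        self.value = [make_linear(state_dim + 1, hidden, rng), make_linear(hidden, 1, rng)]

    def act(self, state):
        """Policy network: state -> 1 (activate this mode) or 0 (do not activate)."""
        h = [math.tanh(v) for v in linear(self.policy[0], state)]
        score = 1.0 / (1.0 + math.exp(-linear(self.policy[1], h)[0]))
        return 1 if score >= 0.5 else 0

    def expected_reward(self, state, action):
        """Value network: (state, action) -> expected cumulative reward."""
        h = [math.tanh(v) for v in linear(self.value[0], state + [float(action)])]
        return linear(self.value[1], h)[0]


# Assumed state layout: [distance, Bluetooth signal quality,
# network connection quality, communication energy efficiency].
state = [0.2, 0.9, 0.6, 0.8]
agent = Agent(seed=1)
action = agent.act(state)
q_value = agent.expected_reward(state, action)
```

Each candidate communication mode would get its own `Agent` instance, and all agents would be fed the same environmental state.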

In one possible implementation, the best communication mode determining module 702 is further configured to optimize the policy network and the value network by using the environmental state of the agent and the action of the agent;

wherein the optimization objective of the policy network is: given the environmental state, to output the action that maximizes the expected cumulative reward output by the value network;

the optimization objective of the value network is: to minimize the squared difference between the output expected cumulative reward and the actual cumulative reward.
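The two objectives can be illustrated with a deliberately simplified Python sketch. It substitutes a linear function for the value network and, because the action is binary, a direct comparison of the two action values for the gradient-based policy update that MADDPG normally uses. The function names, the learning rate and the example data are assumptions made for illustration only.

```python
def q_value(weights, bias, state, action):
    """Linear stand-in for the value network: Q(s, a) = w . [s, a] + b."""
    features = state + [float(action)]
    return sum(w * x for w, x in zip(weights, features)) + bias


def critic_step(weights, bias, state, action, actual_return, lr=0.05):
    """One gradient-descent step on the squared error (Q(s, a) - actual_return)^2."""
    features = state + [float(action)]
    error = q_value(weights, bias, state, action) - actual_return
    new_weights = [w - lr * 2.0 * error * x for w, x in zip(weights, features)]
    new_bias = bias - lr * 2.0 * error
    return new_weights, new_bias


def policy_target(weights, bias, state):
    """Action the policy is trained toward: the one with the larger Q(s, a)."""
    return max((0, 1), key=lambda a: q_value(weights, bias, state, a))


state = [0.2, 0.9, 0.6, 0.8]      # hypothetical normalized environmental state
weights, bias = [0.0] * 5, 0.0    # untrained linear critic
before = (q_value(weights, bias, state, 1) - 1.0) ** 2
for _ in range(100):
    # Hypothetical experience: activating the mode yielded an actual return of 1.0.
    weights, bias = critic_step(weights, bias, state, 1, actual_return=1.0)
after = (q_value(weights, bias, state, 1) - 1.0) ** 2
```

After the updates, the squared error `after` is smaller than `before`, and `policy_target(weights, bias, state)` points to the action whose predicted cumulative reward is larger.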

In one possible implementation, the apparatus 70 further includes:

a connection establishment module, configured to establish a connection with the device to be controlled in the optimal communication mode after the optimal communication mode is determined among the candidate communication modes by using the multi-agent deep deterministic policy gradient (MADDPG) algorithm;

In one possible implementation, the apparatus 70 further includes:

a sending module, further configured to send the target key value to the device to be controlled, where the target key value is used to instruct execution of the control instruction corresponding to the key position that corresponds to the target key value.

In one possible implementation, the candidate communication means includes at least one of: bluetooth communication mode, LAN communication mode, public network communication mode, infrared communication mode, radio frequency communication mode.

According to the embodiment of the invention, a multi-factor communication mode selection strategy is provided at the software level, and the optimal communication mode can be dynamically selected according to environmental conditions, so that both communication efficiency and communication security are taken into account. In addition, the MADDPG algorithm is used to optimize the selection strategy, which can further improve system performance. Using the multimedia hardware of the mobile terminal device, the cloud computer device is controlled through a Bluetooth/real-time full-duplex protocol, so that the functions of a keyboard, a mouse, a microphone, a speaker and the like can be realized without additional peripherals. External devices are isolated, and the Windows drivers of the original external devices are installed on a single cloud service, so that user access in a domestic computer environment can be realized and session isolation among users is achieved.

The embodiment of the invention also provides an electronic device 80, as shown in Fig. 8, including: a processor 801, a memory 802, and a program stored in the memory 802 and executable on the processor 801, where the program, when executed by the processor 801, implements the steps of the method for determining a communication mode in the above-described embodiment.

The embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements each process of the method embodiment shown in Fig. 1 and achieves the same technical effects; to avoid repetition, details are not repeated here. The computer-readable storage medium is selected from a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present invention.

The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present invention and the scope of the claims, which are to be protected by the present invention.

Claims

1. A method for determining a communication mode, the method comprising:

2. The method of claim 1, wherein determining the best communication mode among the candidate communication modes by using a multi-agent deep deterministic policy gradient algorithm comprises:

each candidate communication mode is respectively used as an agent;

determining an expected cumulative reward for each agent;

3. The method of claim 2, wherein determining the expected cumulative reward for each agent comprises:

determining a policy network and a value network of each agent;

4. A method according to claim 3, characterized in that the method further comprises:

optimizing the policy network and the value network by using the environmental state of the agent and the action of the agent;

5. The method of claim 1, wherein after determining the best communication mode among the candidate communication modes by using a multi-agent deep deterministic policy gradient algorithm, the method further comprises:

establishing a connection with the device to be controlled in the best communication mode;

6. The method of claim 5, wherein in the case where the simulated touch pad is a simulated keyboard, the method further comprises:

7. The method according to any of claims 1-6, wherein the candidate communication means comprises at least one of: bluetooth communication mode, LAN communication mode, public network communication mode, infrared communication mode, radio frequency communication mode.

8. A communication mode determining apparatus, the apparatus comprising:

9. An electronic device, comprising: a processor, a memory and a program stored on the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the method for determining a communication mode as claimed in any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the method for determining a communication mode according to any of claims 1 to 7.