CN114125354A - Method for cooperation of intelligent sound box and electronic equipment - Google Patents

Method for cooperation of intelligent sound box and electronic equipment

Info

Publication number
CN114125354A
Authority
CN
China
Prior art keywords
electronic device
server
content
electronic equipment
audio
Prior art date
Legal status
Pending
Application number
CN202111309784.2A
Other languages
Chinese (zh)
Inventor
杨毅轩 (Yang Yixuan)
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202111309784.2A
Publication of CN114125354A
Legal status: Pending

Classifications

    • H04N 7/142: Constructional details of the terminal equipment, e.g. arrangements of the camera and the display (systems for two-way working between two video terminals, e.g. videophone)
    • H04L 67/06: Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H04L 67/12: Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H04L 67/146: Markers for unambiguous identification of a particular session, e.g. session cookie or URL-encoding
    • H04N 21/4104: Peripherals receiving signals from specially adapted client devices (selective content distribution, e.g. interactive television or video on demand [VOD])
    • H04N 21/42204: User interfaces specially adapted for controlling a client device through a remote control device; remote control devices therefor
    • H04N 21/4307: Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen


Abstract

A method for cooperation between a smart speaker and an electronic device, and an electronic device, relating to the field of communications technologies and capable of improving user experience. The method includes: a first electronic device receives a voice input and sends it to a server; the server sends the first electronic device a first call instruction carrying a call identifier; the first electronic device establishes a call connection with a third electronic device via the server; the first electronic device sends a second electronic device a second call instruction carrying the call identifier; the second electronic device sends the call identifier to the server; the server associates the second electronic device with the call connection; the first electronic device captures sound information to generate a first audio file containing a first capture time, and the first audio file is sent to the third electronic device via the server; and the second electronic device captures image information to generate a first video file containing a second capture time, and the first video file is sent to the third electronic device via the server.

Description

Method for cooperation of intelligent sound box and electronic equipment
Technical Field
The present application relates to the field of communications technologies, and in particular, to a method for cooperation between a smart speaker and an electronic device, and to an electronic device.
Background
With the rapid development of artificial intelligence technology, the smart speaker, as an entry point for artificial intelligence interaction, has become one of the most popular hardware devices. To provide a better human-computer interaction experience, for example high-definition video calls and high-definition video playback, the market urgently needs a smart speaker equipped with a screen and a camera.
However, installing a screen and a camera on a smart speaker increases its cost. Moreover, a smart speaker fitted with a screen and a camera has further problems: because the volume of the speaker is limited, the installed screen is usually small and of low resolution, giving the user a poor visual experience. Adding a screen also reduces the space available for the loudspeaker and the microphone, so that the loudspeaker and the microphone inside the smart speaker are squeezed. Therefore, how to provide a smart speaker with devices such as a screen and a camera has become a problem to be solved urgently.
Disclosure of Invention
The method for cooperation between a smart speaker and an electronic device, and the electronic device, provided by the present application allow video and audio to be played on different devices, leveraging the strengths of each device and improving user experience.
In a first aspect, the present application provides a method for a video call performed cooperatively by multiple electronic devices, applied to a system comprising a first electronic device, a second electronic device having a display screen, a third electronic device, and a server. The method includes: the first electronic device receives a voice input from a user; the first electronic device sends the voice input to the server; the server sends a first call instruction to the first electronic device according to the voice input, the first call instruction carrying a call identifier; in response to receiving the first call instruction, the first electronic device establishes a call connection with the third electronic device via the server; the first electronic device sends a second call instruction to the second electronic device, the second call instruction carrying the call identifier; in response to receiving the second call instruction, the second electronic device sends the call identifier to the server; in response to receiving the call identifier sent by the second electronic device, the server associates the second electronic device with the call connection; the first electronic device captures sound information to generate a first audio file and sends the first audio file to the server, the first audio file including a first capture time; the second electronic device captures image information to generate a first video file and sends the first video file to the server, the first video file including a second capture time; and the server sends the first audio file and the first video file to the third electronic device over the call connection.
In this way, the first electronic device captures the first audio file and the second electronic device captures the first video file, so the distinct strengths of different electronic devices can be exploited, the quality of the audio and video captured for the user is improved, and the visual and auditory experience is enhanced.
In a possible implementation, the account with which the first electronic device logs in to the server is the same as the account with which the second electronic device logs in to the server.
In a possible implementation, the second call instruction carries the local time of the first electronic device, and the method further includes: the second electronic device determines the difference between its own local time and the local time of the first electronic device according to the local time of the first electronic device and the current local time of the second electronic device.
In a possible implementation, the second capture time is obtained by adding the difference to, or subtracting the difference from, the local time of the second electronic device at the moment the second electronic device captures the first video file.
In this way, the captured first video file and first audio file are both marked on the time standard of the first electronic device, enabling the first video file and the first audio file to be played synchronously.
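As an illustration of this timestamp adjustment, the following minimal Python sketch shows how the second electronic device could derive the second capture time; the message fields, class name, and method names are assumptions made for illustration and are not part of the application:

    import time

    class SecondDevice:
        def __init__(self):
            # Difference between the first device's clock and the local clock
            # (in seconds), learned from the second call instruction.
            self.diff = 0.0

        def on_second_call_instruction(self, instruction):
            # The second call instruction carries the first device's local time.
            self.diff = instruction["first_device_local_time"] - time.time()

        def stamp_frame(self, frame):
            # Second capture time: the local capture time corrected by the
            # stored difference, i.e. expressed on the first device's clock.
            return {"frame": frame, "capture_time": time.time() + self.diff}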
In a possible implementation, the first call instruction carries a phone number corresponding to the third electronic device, and establishing, by the first electronic device in response to receiving the first call instruction, a call connection with the third electronic device via the server includes: in response to receiving the first call instruction, the first electronic device establishes the call connection with the third electronic device via the server according to the phone number corresponding to the third electronic device.
In a possible implementation, the first call instruction carries a voice playing instruction instructing the first electronic device to play a voice prompt.
In a possible implementation, the second call instruction carries a display instruction instructing the second electronic device to display a prompt message.
In a possible implementation, the method further includes: the second electronic device receives a second video file, the second video file carrying a third capture time; the second electronic device sends a notification message to the first electronic device, the notification message indicating that the second electronic device has received the second video file; the first electronic device sends a playing instruction to the second electronic device, the playing instruction instructing the second electronic device to play the second video file; and the first electronic device plays a second audio file corresponding to the third capture time.
In this way, the playing instruction sent by the first electronic device to the second electronic device causes the second electronic device to play the second video file while the first electronic device plays the second audio file, exploiting the distinct strengths of different electronic devices and improving user experience.
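A rough Python sketch of this notification/playing-instruction exchange follows; the message shapes and callback names are illustrative assumptions rather than the application's actual protocol:

    def on_second_video_received(video_file, notify_first_device):
        # Second device: report that the second video file (carrying the
        # third capture time) has arrived, then await the playing instruction.
        notify_first_device({"type": "video_ready",
                             "capture_time": video_file["capture_time"]})

    def on_video_ready(notification, send_play_instruction, play_audio_at):
        # First device: instruct the second device to play the second video
        # file, and itself play the second audio file corresponding to the
        # same (third) capture time.
        send_play_instruction({"type": "play_second_video"})
        play_audio_at(notification["capture_time"])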
In a possible implementation, the first electronic device is a smart speaker, and the second electronic device is any one of a mobile phone, a computer with a screen, a television, or a tablet computer.
In a second aspect, the present application provides a method for a video call performed cooperatively by multiple electronic devices, applied to a system comprising a first electronic device, a second electronic device having a display screen, and a third electronic device. The method includes: the first electronic device receives a voice input from a user and sends the voice input to a server; the first electronic device receives a first call instruction sent by the server, the first call instruction being sent by the server in response to the received voice input and carrying a call identifier; the first electronic device establishes a call connection with the third electronic device via the server; the first electronic device sends a second call instruction to the second electronic device, the second call instruction carrying the call identifier; in response to receiving the second call instruction, the second electronic device sends the call identifier to the server so that the server associates the second electronic device with the call connection; the first electronic device captures sound information to generate a first audio file and sends the first audio file to the third electronic device via the server, the first audio file including a first capture time; and the second electronic device captures image information to generate a first video file and sends the first video file to the third electronic device via the server, the first video file including a second capture time.
In a possible implementation, the account with which the first electronic device logs in to the server is the same as the account with which the second electronic device logs in to the server.
In a possible implementation, the second call instruction carries the local time of the first electronic device, and the method further includes: the second electronic device determines the difference between its own local time and the local time of the first electronic device according to the local time of the first electronic device and the current local time of the second electronic device.
In a possible implementation, the second capture time is obtained by adding the difference to, or subtracting the difference from, the local time of the second electronic device at the moment the second electronic device captures the first video file.
In a possible implementation, the method further includes: the second electronic device receives a second video file, the second video file carrying a third capture time; the second electronic device sends a notification message to the first electronic device, the notification message indicating that the second electronic device has received the second video file; the first electronic device sends a playing instruction to the second electronic device, the playing instruction instructing the second electronic device to play the second video file; and the first electronic device plays a second audio file corresponding to the third capture time.
In a third aspect, the present application provides a method for audio and video playing performed cooperatively by multiple electronic devices, including: a first electronic device receives a voice request to play first content; the first electronic device obtains an address of the first content according to the voice request; the first electronic device sends the address of the first content to a second electronic device that has a display screen and is associated with the first electronic device; the first electronic device sends a first instruction to the second electronic device, the first instruction instructing the second electronic device to display the picture of the first content from a first time, the picture of the first content being obtained by the second electronic device according to the address of the first content; and the first electronic device plays the audio of the first content from the first time, the audio of the first content being obtained by the first electronic device according to the address of the first content.
In a possible implementation, the method further includes: if the distance between the first electronic device and the second electronic device is smaller than a first threshold, the first electronic device and the second electronic device are associated automatically.
In a possible implementation, automatically associating the first electronic device and the second electronic device if the distance between them is smaller than the threshold specifically includes: if the distance between the first electronic device and the second electronic device is smaller than the threshold, and the account with which the first electronic device logs in to the server is the same as the account with which the second electronic device logs in to the server, the first electronic device and the second electronic device are associated automatically.
In a possible implementation, after the first electronic device and the second electronic device are automatically associated, the method further includes: the second electronic device automatically opens a first application and displays the first application in full screen.
In a possible implementation, after the first electronic device and the second electronic device are automatically associated, the method further includes: the second electronic device automatically plays the picture of the first content in full screen.
In a possible implementation, while the first electronic device plays the audio of the first content and the second electronic device plays the picture of the first content, the method further includes: the second electronic device sends a first progress to the first electronic device, the first progress being the progress of the second electronic device in playing the picture of the first content; and the first electronic device sends a progress-adjustment instruction to the second electronic device according to the first progress.
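One way this progress adjustment could work is sketched below in Python; the 0.1-second tolerance and all names are assumptions made for illustration only:

    DRIFT_TOLERANCE = 0.1  # seconds; an assumed acceptable audio/picture drift

    def on_first_progress(first_progress, audio_progress, send_instruction):
        # first_progress: picture position reported by the second device.
        # audio_progress: the first device's own audio playing position.
        drift = first_progress - audio_progress
        if abs(drift) > DRIFT_TOLERANCE:
            # Instruct the second device to realign its picture with the audio.
            send_instruction({"type": "adjust_progress",
                              "seek_to": audio_progress})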
In a possible implementation, the first electronic device is a smart speaker, and the second electronic device is any one of a mobile phone, a computer with a screen, a television, or a tablet computer.
In a fourth aspect, the present application provides a method for cooperation of multiple electronic devices, including: a first electronic device receives a request to play first content; the first electronic device obtains first information and second information according to the request to play the first content; the first electronic device sends the first information to a second electronic device that has a display screen and is associated with the first electronic device; the first electronic device sends a first instruction to the second electronic device, the first instruction instructing the second electronic device to display the image of the first content, the image of the first content being obtained by the second electronic device according to the first information; and the first electronic device plays the audio of the first content, the audio of the first content being obtained by the first electronic device according to the second information.
In this way, the audio of the first content and the image of the first content are played on different electronic devices. In addition, the first electronic device can control the second electronic device so that the two devices play synchronously, improving user experience.
In a fifth aspect, the present application provides a communication system, comprising: a first electronic device, a second electronic device having a display screen, a third electronic device, and a server. The first electronic device is configured to receive a voice input from a user and send the voice input to the server. The server is configured to send a first call instruction to the first electronic device according to the voice input, the first call instruction carrying a call identifier. The first electronic device is further configured to: establish a call connection with the third electronic device via the server in response to receiving the first call instruction; and send a second call instruction to the second electronic device, the second call instruction carrying the call identifier. The second electronic device is configured to send the call identifier to the server in response to receiving the second call instruction. The server is further configured to associate the second electronic device with the call connection in response to receiving the call identifier sent by the second electronic device. The first electronic device is further configured to capture sound information to generate a first audio file and send the first audio file to the server, the first audio file including a first capture time. The second electronic device is further configured to capture image information to generate a first video file and send the first video file to the server, the first video file including a second capture time. The server is further configured to send the first audio file and the first video file to the third electronic device over the call connection.
In a possible implementation, the account with which the first electronic device logs in to the server is the same as the account with which the second electronic device logs in to the server.
In a possible implementation, the second call instruction carries the local time of the first electronic device, and the second electronic device is further configured to determine the difference between its own local time and the local time of the first electronic device according to the local time of the first electronic device and the current local time of the second electronic device.
In a possible implementation, the second capture time is obtained by adding the difference to, or subtracting the difference from, the local time of the second electronic device at the moment the second electronic device captures the first video file.
In a possible implementation, the second electronic device is further configured to: receive a second video file, the second video file carrying a third capture time; and send a notification message to the first electronic device, the notification message indicating that the second electronic device has received the second video file. The first electronic device is further configured to: send a playing instruction to the second electronic device, the playing instruction instructing the second electronic device to play the second video file; and play a second audio file corresponding to the third capture time.
In a sixth aspect, the present application provides a communication system, comprising: a first electronic device, a second electronic device having a display screen, and a third electronic device. The first electronic device is configured to: receive a voice input from a user and send the voice input to a server; receive a first call instruction sent by the server, the first call instruction being sent by the server in response to the received voice input and carrying a call identifier; establish a call connection with the third electronic device via the server; and send a second call instruction to the second electronic device, the second call instruction carrying the call identifier. The second electronic device is configured to send the call identifier to the server in response to receiving the second call instruction, so that the server associates the second electronic device with the call connection. The first electronic device is further configured to capture sound information to generate a first audio file and send the first audio file to the third electronic device via the server, the first audio file including a first capture time. The second electronic device is further configured to capture image information to generate a first video file and send the first video file to the third electronic device via the server, the first video file including a second capture time.
In a seventh aspect, the present application provides a first electronic device, comprising: a processor and a memory coupled to the processor, the memory being configured to store computer program code, the computer program code comprising computer instructions that, when executed by the first electronic device, cause the first electronic device to perform the following operations: receiving a voice request to play first content; obtaining an address of the first content according to the voice request; sending the address of the first content to a second electronic device that has a display screen and is associated with the first electronic device; sending a first instruction to the second electronic device, the first instruction instructing the second electronic device to display the picture of the first content from a first time, the picture of the first content being obtained by the second electronic device according to the address of the first content; and playing the audio of the first content from the first time, the audio of the first content being obtained by the first electronic device according to the address of the first content.
In an eighth aspect, the present application provides a first electronic device, comprising: a processor and a memory coupled to the processor, the memory being configured to store computer program code, the computer program code comprising computer instructions that, when executed by the first electronic device, cause the first electronic device to perform the following operations: receiving a request to play first content; obtaining first information and second information according to the request to play the first content; sending the first information to a second electronic device that has a display screen and is associated with the first electronic device; sending a first instruction to the second electronic device, the first instruction instructing the second electronic device to display the image of the first content, the image of the first content being obtained by the second electronic device according to the first information; and playing the audio of the first content, the audio of the first content being obtained by the first electronic device according to the second information.
In a ninth aspect, the present application provides a computer storage medium comprising computer instructions that, when run on a terminal, cause the terminal to perform the method for a video call performed cooperatively by multiple electronic devices in any possible implementation of the first aspect or the second aspect; or perform the method for audio and video playing performed cooperatively by multiple electronic devices in any possible implementation of the third aspect; or perform the method for cooperation of multiple electronic devices in any possible implementation of the fourth aspect.
In a tenth aspect, the present application provides a computer program product which, when run on a computer, causes the computer to perform the method for a video call performed cooperatively by multiple electronic devices in any possible implementation of the first aspect or the second aspect; or perform the method for audio and video playing performed cooperatively by multiple electronic devices in any possible implementation of the third aspect; or perform the method for cooperation of multiple electronic devices in any possible implementation of the fourth aspect.
Drawings
Fig. 1 is a schematic diagram of a system according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of another electronic device according to an embodiment of the present application;
Fig. 4A is a schematic diagram of association between an electronic device and a smart speaker according to an embodiment of the present application;
Fig. 4B is a schematic diagram of another association between an electronic device and a smart speaker according to an embodiment of the present application;
Fig. 5 is a schematic process diagram of a method for cooperation between a smart speaker and an electronic device according to an embodiment of the present application;
Fig. 6 is a schematic process diagram of another method for cooperation between a smart speaker and an electronic device according to an embodiment of the present application;
Fig. 7 is a schematic process diagram of another method for cooperation between a smart speaker and an electronic device according to an embodiment of the present application;
Fig. 8A is a schematic diagram of association between a smart speaker and an electronic device according to an embodiment of the present application;
Fig. 8B is a schematic flowchart of a method for associating a smart speaker with an electronic device according to an embodiment of the present application;
Fig. 9A is a schematic flowchart of a method for cooperation between a smart speaker and an electronic device according to an embodiment of the present application;
Fig. 9B is a schematic flowchart of a method for cooperation between a smart speaker and an electronic device according to an embodiment of the present application;
Fig. 9C is a schematic flowchart of a method for cooperation between a smart speaker and an electronic device according to an embodiment of the present application;
Fig. 9D is a schematic flowchart of a method for cooperation between a smart speaker and an electronic device according to an embodiment of the present application;
Fig. 9E is a schematic flowchart of a method for cooperation between a smart speaker and an electronic device according to an embodiment of the present application;
Fig. 9F is a schematic flowchart of a method for cooperation between a smart speaker and an electronic device according to an embodiment of the present application;
Fig. 10 is a schematic flowchart of a method for cooperation between a smart speaker and an electronic device according to an embodiment of the present application;
Fig. 11 is a schematic flowchart of a method for cooperation between a smart speaker and an electronic device according to an embodiment of the present application;
Fig. 12 is a schematic flowchart of a method for cooperation between a smart speaker and an electronic device according to an embodiment of the present application;
Fig. 13 is a schematic structural diagram of a smart speaker according to an embodiment of the present application;
Fig. 14 is a schematic structural diagram of another smart speaker according to an embodiment of the present application;
Fig. 15 is a schematic structural diagram of another electronic device according to an embodiment of the present application;
Fig. 16 is a schematic structural diagram of another electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. In the description of the embodiments of the present application, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone.
In the following, the terms "first", "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present application, "a plurality" means two or more unless otherwise specified.
The method for cooperation between a smart speaker and an electronic device provided in the embodiments of the present application can be applied to the communication system shown in fig. 1. The communication system includes an electronic device 100, a smart speaker 200, and a server 300.
The electronic device 100 has a display screen. Optionally, the electronic device 100 may further include a camera, a microphone, and the like. For example, the electronic device 100 in the present application may be a mobile phone, a tablet computer, a personal computer (PC), a personal digital assistant (PDA), a smart watch, a netbook, a wearable electronic device, an augmented reality (AR) device, a virtual reality (VR) device, an in-vehicle device, a smart car, a robot, a digital television, or the like; the present application does not specifically limit the form of the electronic device 100.
The smart speaker 200, which in some embodiments may be a cloud speaker, generally has sound pickup and sound playing functions and is a tool with which a user accesses the Internet by voice, for example to order songs, shop online, or check the weather forecast; it can also control smart home devices, for example opening curtains, setting the refrigerator temperature, or regulating the water heater temperature. For the hardware structure of the smart speaker 200, reference may be made to the structure of the electronic device 100 shown in fig. 2. It should be noted that the smart speaker 200 may include more or fewer components than those shown, or combine some components, or split some components, or use a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware. For example, the smart speaker 200 may include the processor 110, the internal memory 121, the audio module 170, the microphone 170C, the speaker 170A, the wireless communication module 160, the charging management module 140, and the like.
The server 300, in some embodiments, may be a cloud server, a speaker cloud server, or a server cluster composed of multiple servers. The server 300 may provide cloud services for the smart speaker 200, such as a massive music library and a video library. The server 300 may also be communicatively connected to other servers (for example, content-providing servers such as a weather server, a music server, an intention recognition server, and a voice recognition server, or a home-control cloud server) to provide more diversified services (for example, weather forecast, music playing, and voice recognition) for the smart speaker 200.
In some embodiments of the present application, the smart speaker 200 and the electronic device 100 may be connected in a wired or wireless manner. The wired connection may be, for example, through a data cable or an optical fiber; the wireless connection may be, for example, Bluetooth, WiFi, NFC, or ZigBee. For example, the smart speaker 200 and the electronic device 100 may each connect over WiFi to the same router, through which they communicate. As another example, the electronic device 100 may have a WiFi hotspot capability and open a WiFi hotspot to which the smart speaker 200 connects. The smart speaker 200 may access the server 300 through a communication network (a wide area network), such as WiFi or a mobile data network. In other embodiments of the present application, the electronic device 100 may likewise access the server 300 through a communication network such as WiFi or a mobile data network.
In the present application, after the smart speaker is connected to the electronic device, the smart speaker may request to associate with (or control) the electronic device. That is, after the smart speaker is associated with the electronic device, the smart speaker may call the screen, camera, microphone, and the like of the electronic device: the smart speaker may send an instruction to the electronic device instructing the screen of the electronic device to display certain content (for example, a user interface or a video picture), or instructing the camera of the electronic device to start capturing a video picture, and so on.
In some embodiments, as shown in fig. 4A, the smart speaker may request association from the electronic device; the electronic device performs authentication and returns the authentication result to the smart speaker. After receiving a result indicating that the authentication has passed, the smart speaker associates with the electronic device. In other embodiments, as shown in fig. 4B, the electronic device may request association from the smart speaker; the smart speaker performs authentication and returns the authentication result to the electronic device. After the authentication passes, the smart speaker can be associated with the electronic device. In still other embodiments, the smart speaker or the electronic device may send the request to the server, which performs authentication and returns the authentication result to the smart speaker and/or the electronic device. This is not limited in the embodiments of the present application.
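For illustration, the fig. 4A variant of this handshake might look like the following Python sketch; the message fields and the same-account policy shown are assumptions, since the application does not fix a particular authentication method:

    def speaker_request_association(speaker_id, account, send, receive):
        # Smart speaker side: request association and wait for the result.
        send({"type": "associate_request",
              "speaker_id": speaker_id, "account": account})
        result = receive()
        return result.get("authenticated", False)

    def device_handle_association(request, local_account):
        # Electronic device side: authenticate the request and return the
        # result. Example policy: accept only the same logged-in account.
        ok = request.get("account") == local_account
        return {"type": "associate_result", "authenticated": ok}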
In the present application, when a video is played, the smart speaker plays the audio content (or sound information) of the video, and the electronic device plays the video content (or picture information) of the video. However, because the smart speaker and the electronic device are independent devices that very likely obtain their time in different ways, their clocks may differ; that is, the smart speaker and the electronic device may not be time-synchronized. As a result, the sound the user hears from the smart speaker may not match the picture seen on the electronic device, which seriously harms user experience. Similarly, when a video is captured or recorded, the smart speaker captures the audio content (or sound information) and the electronic device captures the video content (or picture information), and the same time-synchronization problem arises between them.
To solve the time-synchronization problem between the electronic device and the smart speaker, the two devices can use time from the same time source, for example the time standard of the smart speaker, the time standard of the electronic device, or the time standard of another time source.
In the following, the implementation of time synchronization between the electronic device and the smart speaker is described by taking as an example the case where both devices adopt the time standard of the smart speaker (i.e., the smart speaker's clock).
In some embodiments of the application, the smart speaker may send its local time (e.g., Time_1) to the electronic device. When the electronic device receives Time_1, its own local time is Time_2, and it calculates the time difference between Time_2 and Time_1. The electronic device subsequently determines the local time of the smart speaker from this time difference and its own local time. For example, when the local time of the electronic device is Time_3, the time of the smart speaker at that moment is Time_3 - (Time_2 - Time_1); that is, on the smart speaker's time standard, the time at that moment is Time_3 - (Time_2 - Time_1).
For example, the current local time Time_1 of the smart speaker is 10:00:10, and the smart speaker sends the Time_1 information to the electronic device. When the electronic device receives the Time_1 information, the local time of the electronic device is Time_2, which is 10:05:20, so the time difference between Time_2 and Time_1 is 00:05:10. Subsequently, the electronic device can determine the time of the smart speaker by subtracting the time difference from its own local time. For example, when the local time of the electronic device is 10:30:00, the local time of the smart speaker can be determined to be 10:24:50.
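The arithmetic of this example can be written out as the short Python sketch below; the variable and function names are illustrative only:

    from datetime import datetime

    FMT = "%H:%M:%S"
    time_1 = datetime.strptime("10:00:10", FMT)  # speaker time carried in the message
    time_2 = datetime.strptime("10:05:20", FMT)  # device local time on receipt
    offset = time_1 - time_2                     # speaker clock minus device clock

    def speaker_time(device_local):
        # Map a device-local reading onto the smart speaker's time standard.
        return (datetime.strptime(device_local, FMT) + offset).strftime(FMT)

    print(speaker_time("10:30:00"))  # -> 10:24:50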
In some examples, the smart speaker sends its local time to the electronic device after associating with it, so that the electronic device can determine the smart speaker's time, i.e., the smart speaker's time standard. The smart speaker may also send its local time to the electronic device while, before, or after it instructs the electronic device to display content (for example, to start playing a video or to display interface content). This is not limited in the embodiments of the present application.
It should be noted that the method provided in the embodiments of the present application is explained below taking as an example the case where both the electronic device and the smart speaker adopt the smart speaker's time standard.
In other embodiments of the present application, after the smart speaker is connected to the electronic device, the smart speaker may pick up a user voice request and determine, according to the request, the video content or display content to be played and the audio content to be played. Alternatively, the smart speaker sends the voice request to the server, and the server determines the video content or display content to be played and the audio content to be played, and then returns the determined content to the smart speaker. Here the video content is the image or picture information in an audio/video file and may be a specific video or image file, or a URL link for online playing or downloading of such a file; the audio content is the sound information in an audio/video file and may likewise be a specific audio file or a URL link for online playing or downloading. The smart speaker then controls the electronic device to play the video content or display content, while the smart speaker itself plays the audio content. When the electronic device and the smart speaker play the video content and the audio content of the same video, the smart speaker controls the two to play synchronously.
For example, fig. 5 is a schematic diagram of a method for playing a video according to an embodiment of the present application. Specifically, in step 1, a user requests by voice to play a video A (e.g., a movie, a music MV, or a video clip), where the video A includes audio content and video content. In step 2, the smart speaker sends the user's voice request to the server. In step 3, the server determines the audio content and the video content of video A and returns them to the smart speaker.
In step 4, the smart speaker sends the electronic device an instruction to play video A, the instruction including the time a at which the electronic device is to start playing video A. The electronic device then adopts the smart speaker's time standard and starts playing the video content of video A at time a (step 5b). At the same time, the smart speaker starts playing the audio content of video A at time a (step 5a). In this way, the electronic device and the smart speaker play video A synchronously.
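Steps 4 and 5a could be driven by code along the following lines (a Python sketch; the one-second lead time and the message fields are assumptions). The device-side handling of such a timed instruction is sketched after the fig. 7 discussion below:

    import time

    def speaker_start_video_a(send_to_device, play_audio, video_url, audio_url):
        # Step 4: pick a start time `a` on the speaker's clock, slightly in
        # the future so the instruction reaches the electronic device in time.
        a = time.time() + 1.0
        send_to_device({"type": "play_video", "content": video_url, "start_at": a})
        # Step 5a: wait until `a`, then play the audio content of video A.
        time.sleep(max(0.0, a - time.time()))
        play_audio(audio_url)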
In still other embodiments of the present application, the smart speaker is connected to the electronic device, picks up a user voice request, and determines, according to the request, the video content to be captured and the audio content to be captured. The smart speaker controls the electronic device to capture the video content, while the smart speaker captures the audio content.
Specifically, when the user requests by voice to record or capture a video B, where video B includes video content and audio content, the smart speaker issues a recording or capture instruction to the electronic device, the instruction including the time b at which recording or capture starts. The electronic device adopts the smart speaker's time standard, starts capturing the video content of video B at time b, and marks the start time of that video content as time b. Meanwhile, the smart speaker starts capturing the audio content of video B at time b and marks the start time of the audio content as time b. Because the electronic device and the smart speaker use the same time standard, namely the smart speaker's time, the video content and the audio content of video B are time-synchronized.
Fig. 6 and 7 are schematic diagrams illustrating an example of applying the method provided by the embodiment of the present application to a video call service.
Fig. 6 is a schematic diagram of the process in which the calling party (user A) makes a video call. Specifically, user A requests a video call by voice, the smart speaker sends the request to the server 1 where user A is located, and server 1 determines that the request is a video call service. A video call request is sent through server 1 to the server of the audio/video call service, and a call link is established. Then server 1 sends a video call instruction to the smart speaker, and the smart speaker sends a video call instruction to the electronic device, the instruction including the time (e.g., Time_4) at which the video call starts.
Subsequently, the electronic device starts capturing video content from Time_4 according to the smart speaker's time standard, marks the captured video content with that time standard (i.e., timestamps it), and uploads it to server 1. Meanwhile, the smart speaker also starts capturing audio content from Time_4, marks the captured audio content with the same time standard (i.e., timestamps it), and uploads it to server 1. Server 1 merges the video content and the audio content that carry the same timestamps and, after merging, sends the result through the audio/video call service to the server 2 corresponding to the receiving party (user B). In some embodiments, the electronic device may instead capture both the video content and the audio content, i.e., capture video through its own camera and audio through its own sound pickup device such as a microphone; in that case the smart speaker does not capture the audio content. The embodiments of the present application are not limited in this respect.
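A simplified Python sketch of the merge performed by server 1 follows; the frame format and the pairing tolerance are assumptions made for illustration:

    def merge_streams(audio_frames, video_frames, tolerance=0.02):
        # Pair audio frames (from the speaker) with video frames (from the
        # electronic device) whose timestamps, taken on the same time
        # standard, match within `tolerance` seconds. Both lists are assumed
        # sorted by their "capture_time" field.
        merged, i = [], 0
        for video in video_frames:
            while (i < len(audio_frames) and
                   audio_frames[i]["capture_time"] < video["capture_time"] - tolerance):
                i += 1
            if (i < len(audio_frames) and
                    abs(audio_frames[i]["capture_time"] - video["capture_time"]) <= tolerance):
                merged.append({"capture_time": video["capture_time"],
                               "audio": audio_frames[i]["data"],
                               "video": video["data"]})
        return merged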
Fig. 7 is a schematic diagram of the process in which the receiving party (user B) receives the video call. In some embodiments, after receiving the audio/video data forwarded by the audio/video call service server (i.e., the data obtained by merging the video content and the audio content marked with the same timestamps), server 2 splits the audio/video data into video content and audio content, then sends the video content to the electronic device and the audio content to the smart speaker. In other embodiments, server 2 may directly send the received audio/video data to the smart speaker, which splits it into video content and audio content, stores the audio content locally, and sends the video content to the electronic device.
Subsequently, the smart speaker sends the electronic device an instruction to start playing. The instruction includes the start time Time_5, where Time_5 is a time on the smart speaker's clock, and also specifies the content time to play, i.e., which moment of the video content to play from. In this way, the electronic device, using the smart speaker's time standard, starts playing the video content of the specified content time at Time_5; at the same moment, the smart speaker starts playing the audio content of that specified content time at Time_5.
For example: the start playing instruction sent by the intelligent sound box instructs the electronic equipment to start playing the video from Time _5, and the video is played in 2018, 12 months, 1 day 13: 10 minutes (specified time) of video content. Then, the electronic device starts playing 2018, 12, 1, 13 at Time _ 5: 10 points of video content; meanwhile, the smart sound box also starts playing at Time _5 for 12 months, 1 day, 13 in 2018: 10 cents of audio content.
In this way, the electronic device and the smart speaker start playing at the same moment, and the specified content time they play from is also the same, so the two devices remain synchronized when playing the same audio and video content.
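On the receiving side, the electronic device's handling of the start-playing instruction could look like the following Python sketch; here `offset` is the speaker-clock-minus-local-clock difference obtained in the time-synchronization step, and the field names are assumptions:

    import time

    def on_start_playing(instruction, offset, seek_and_play):
        # Convert Time_5 from the speaker's time standard to the local clock.
        local_start = instruction["time_5"] - offset
        # Wait until the converted start time, then play the video content
        # from the specified content time (e.g. 13:10 on December 1, 2018).
        time.sleep(max(0.0, local_start - time.time()))
        seek_and_play(instruction["specified_time"])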
Fig. 2 shows a schematic structural diagram of the electronic device 100.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a key 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identification Module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It should be understood that the structure illustrated in this embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown, or combine some components, or split some components, or use a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.
The controller may be the nerve center and command center of the electronic device 100. The controller can generate an operation control signal according to an instruction operation code and a timing signal, to complete the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache. The memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from the memory, which avoids repeated accesses, reduces the waiting time of the processor 110, and thus improves system efficiency.
In some embodiments, processor 110 may include one or more interfaces. The interface may include an integrated circuit (I2C) interface, an integrated circuit built-in audio (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, etc.
The I2C interface is a bi-directional synchronous serial bus that includes a serial data line (SDA) and a Serial Clock Line (SCL). In some embodiments, processor 110 may include multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, the charger, the flash, the camera 193, etc. through different I2C bus interfaces, respectively. For example: the processor 110 may be coupled to the touch sensor 180K via an I2C interface, such that the processor 110 and the touch sensor 180K communicate via an I2C bus interface to implement the touch functionality of the electronic device 100.
The I2S interface may be used for audio communication. In some embodiments, processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 via an I2S bus to enable communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may communicate audio signals to the wireless communication module 160 via the I2S interface, enabling answering of calls via a bluetooth headset.
The PCM interface may also be used for audio communication, sampling, quantizing and encoding analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled by a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to implement a function of answering a call through a bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communication. The bus may be a bidirectional communication bus; it converts the data to be transmitted between serial and parallel forms. In some embodiments, a UART interface is generally used to connect the processor 110 with the wireless communication module 160. For example: the processor 110 communicates with a Bluetooth module in the wireless communication module 160 through a UART interface to implement a Bluetooth function. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through the UART interface, so as to implement the function of playing music through a Bluetooth headset.
MIPI interfaces may be used to connect processor 110 with peripheral devices such as display screen 194, camera 193, and the like. The MIPI interface includes a Camera Serial Interface (CSI), a Display Serial Interface (DSI), and the like. In some embodiments, processor 110 and camera 193 communicate through a CSI interface to implement the capture functionality of electronic device 100. The processor 110 and the display screen 194 communicate through the DSI interface to implement the display function of the electronic device 100.
The GPIO interface may be configured by software. The GPIO interface may be configured to carry control signals or data signals. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, and the like.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 130 may be used to connect a charger to charge the electronic device 100, and may also be used to transmit data between the electronic device 100 and a peripheral device. It may also be used to connect a headset and play audio through the headset. The interface may further be used to connect other electronic devices, such as AR devices and the like.
It should be understood that the connection relationships between the modules according to the embodiment of the present invention are only illustrative and do not constitute a limitation on the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also adopt an interface connection manner different from those in the above embodiments, or a combination of multiple interface connection manners.
The charging management module 140 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive charging input from a wired charger via the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive a wireless charging input through a wireless charging coil of the electronic device 100. The charging management module 140 may also supply power to the electronic device through the power management module 141 while charging the battery 142.
The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 and provides power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle count, battery state of health (leakage, impedance), etc. In some other embodiments, the power management module 141 may also be disposed in the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may be disposed in the same device.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution including 2G/3G/4G/5G wireless communication applied to the electronic device 100. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 150 may receive the electromagnetic wave from the antenna 1, filter, amplify, etc. the received electromagnetic wave, and transmit the electromagnetic wave to the modem processor for demodulation. The mobile communication module 150 may also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating a low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.) or displays an image or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional modules, independent of the processor 110.
The wireless communication module 160 may provide a solution for wireless communication applied to the electronic device 100, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), bluetooth (bluetooth, BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, perform frequency modulation and amplification on the signal, and convert the signal into electromagnetic waves through the antenna 2 to radiate the electromagnetic waves.
In some embodiments, antenna 1 of the electronic device 100 is coupled to the mobile communication module 150 and antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, etc. The GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a satellite based augmentation system (SBAS).
The electronic device 100 implements display functions via the GPU, the display screen 194, and the application processor. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 194 is used to display images, video, and the like. The display screen 194 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), and the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, N being a positive integer greater than 1.
The electronic device 100 may implement a shooting function through the ISP, the camera 193, the video codec, the GPU, the display 194, the application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera's photosensitive element through the lens, the optical signal is converted into an electrical signal, and the photosensitive element transmits the electrical signal to the ISP for processing, which converts it into an image visible to the naked eye. The ISP can also perform algorithm optimization on the noise, brightness, and skin color of the image, and can optimize parameters such as the exposure and color temperature of a shooting scenario. In some embodiments, the ISP may be disposed in the camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, which is then passed to the ISP where it is converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into image signal in standard RGB, YUV and other formats. In some embodiments, the electronic device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
The digital signal processor is used to process digital signals; it can process digital image signals as well as other digital signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to perform a Fourier transform or the like on the frequency bin energy.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as: moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example the transfer mode between neurons in the human brain, it processes input information quickly and can also continuously self-learn. Applications such as intelligent recognition of the electronic device 100 can be implemented through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, and the like.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music, video, etc. are saved in an external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. By running the instructions stored in the internal memory 121, the processor 110 executes various functional applications and data processing of the electronic device 100. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store an operating system, an application required by at least one function (such as a sound playing function or an image playing function), and the like. The data storage area may store data created during use of the electronic device 100 (such as audio data and a phone book), and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.
The electronic device 100 may implement audio functions via the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, and the application processor. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also called a "horn", is used to convert an audio electrical signal into a sound signal. The electronic device 100 can listen to music or to a hands-free call through the speaker 170A.
The receiver 170B, also called an "earpiece", is used to convert an audio electrical signal into a sound signal. When the electronic device 100 receives a call or voice information, the voice can be heard by placing the receiver 170B close to the ear. The microphone 170C, also called a "mic", is used to convert a sound signal into an electrical signal. When making a call or sending voice information, the user can input a sound signal into the microphone 170C by speaking with the mouth close to the microphone 170C. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C to implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may further be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement directional recording, and so on. The headset interface 170D is used to connect a wired headset. The headset interface 170D may be the USB interface 130, or may be a 3.5 mm open mobile terminal platform (OMTP) standard interface or a cellular telecommunications industry association of the USA (CTIA) standard interface.
The pressure sensor 180A is used to sense a pressure signal and convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. There are many types of pressure sensor 180A, such as resistive pressure sensors, inductive pressure sensors, and capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates made of conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes, and the electronic device 100 determines the pressure intensity from the change in capacitance. When a touch operation acts on the display screen 194, the electronic device 100 detects the intensity of the touch operation through the pressure sensor 180A. The electronic device 100 may also calculate the touch position from the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch position but with different intensities may correspond to different operation instructions. For example: when a touch operation whose intensity is less than a first pressure threshold acts on the short message application icon, an instruction to view the short message is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed (see the sketch below).
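As a minimal illustrative sketch only (the embodiment prescribes no code; the threshold constant and the handler names viewMessage and composeMessage are hypothetical), the intensity-based dispatch described above could look like this:

```java
// Hypothetical sketch of pressure-based dispatch on the short message icon.
public class MessagesIconDispatcher {
    // Illustrative first pressure threshold; the embodiment does not fix a value.
    private static final float FIRST_PRESSURE_THRESHOLD = 0.5f;

    public void onIconTouched(float pressure) {
        if (pressure < FIRST_PRESSURE_THRESHOLD) {
            viewMessage();     // lighter press: execute "view the short message"
        } else {
            composeMessage();  // firmer press: execute "create a new short message"
        }
    }

    private void viewMessage() { /* placeholder */ }
    private void composeMessage() { /* placeholder */ }
}
```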
The gyro sensor 180B may be used to determine the motion posture of the electronic device 100. In some embodiments, the angular velocities of the electronic device 100 about three axes (i.e., the x, y, and z axes) may be determined through the gyro sensor 180B. The gyro sensor 180B may be used for image stabilization during shooting. For example, when the shutter is pressed, the gyro sensor 180B detects the shake angle of the electronic device 100, calculates, according to the shake angle, the distance that the lens module needs to compensate, and allows the lens to counteract the shake of the electronic device 100 through reverse motion, thereby achieving image stabilization. The gyro sensor 180B may also be used in navigation and somatosensory gaming scenarios.
The air pressure sensor 180C is used to measure air pressure. In some embodiments, electronic device 100 calculates altitude, aiding in positioning and navigation, from barometric pressure values measured by barometric pressure sensor 180C.
The magnetic sensor 180D includes a Hall sensor. The electronic device 100 may use the magnetic sensor 180D to detect the opening and closing of a flip holster. In some embodiments, when the electronic device 100 is a flip phone, the electronic device 100 may detect the opening and closing of the flip cover through the magnetic sensor 180D. Features such as automatic unlocking upon flipping open can then be set according to the detected open or closed state of the holster or of the flip cover.
The acceleration sensor 180E may detect the magnitude of acceleration of the electronic device 100 in various directions (typically three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. The sensor can also be used to recognize the posture of the electronic device, and is applied in landscape/portrait switching, pedometers, and other applications.
The distance sensor 180F is used to measure distance. The electronic device 100 may measure distance by infrared or laser. In some embodiments, in a shooting scenario, the electronic device 100 may use the distance sensor 180F to measure distance to achieve fast focusing.
The proximity light sensor 180G may include, for example, a light-emitting diode (LED) and a light detector such as a photodiode. The light-emitting diode may be an infrared light-emitting diode. The electronic device 100 emits infrared light outward through the light-emitting diode and uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, the electronic device 100 can determine that there is an object nearby; when insufficient reflected light is detected, the electronic device 100 can determine that there is no object nearby. The electronic device 100 can use the proximity light sensor 180G to detect that the user is holding the electronic device 100 close to the ear during a call, so as to automatically turn off the screen to save power. The proximity light sensor 180G may also be used in holster mode and pocket mode to automatically unlock and lock the screen.
The ambient light sensor 180L is used to sense the ambient light level. Electronic device 100 may adaptively adjust the brightness of display screen 194 based on the perceived ambient light level. The ambient light sensor 180L may also be used to automatically adjust the white balance when taking a picture. The ambient light sensor 180L may also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in a pocket to prevent accidental touches.
The fingerprint sensor 180H is used to collect fingerprints. The electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, application-lock access, fingerprint photographing, fingerprint-based call answering, and so on.
The temperature sensor 180J is used to detect temperature. In some embodiments, the electronic device 100 implements a temperature handling strategy using the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of a processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is below another threshold, the electronic device 100 heats the battery 142 to avoid an abnormal shutdown of the electronic device 100 caused by low temperature. In still other embodiments, when the temperature is below yet another threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid an abnormal shutdown caused by low temperature.
The touch sensor 180K is also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is used to detect a touch operation applied thereto or nearby. The touch sensor can communicate the detected touch operation to the application processor to determine the touch event type. Visual output associated with the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may be disposed on a surface of the electronic device 100, different from the position of the display screen 194.
The bone conduction sensor 180M may acquire a vibration signal. In some embodiments, the bone conduction sensor 180M may acquire a vibration signal of the human vocal part vibrating the bone mass. The bone conduction sensor 180M may also contact the human pulse to receive the blood pressure pulsation signal. In some embodiments, the bone conduction sensor 180M may also be disposed in a headset, integrated into a bone conduction headset. The audio module 170 may analyze a voice signal based on the vibration signal of the bone mass vibrated by the sound part acquired by the bone conduction sensor 180M, so as to implement a voice function. The application processor can analyze heart rate information based on the blood pressure beating signal acquired by the bone conduction sensor 180M, so as to realize the heart rate detection function.
The keys 190 include a power key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys. The electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100.
The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration cues, as well as for touch vibration feedback. For example, touch operations applied to different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The motor 191 may also respond to different vibration feedback effects for touch operations applied to different areas of the display screen 194. Different application scenes (such as time reminding, receiving information, alarm clock, game and the like) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
Indicator 192 may be an indicator light that may be used to indicate a state of charge, a change in charge, or a message, missed call, notification, etc.
The SIM card interface 195 is used to connect a SIM card. A SIM card can be brought into or out of contact with the electronic device 100 by being inserted into or pulled out of the SIM card interface 195. The electronic device 100 may support 1 or N SIM card interfaces, N being a positive integer greater than 1. The SIM card interface 195 may support a Nano SIM card, a Micro SIM card, a SIM card, and the like. Multiple cards may be inserted into the same SIM card interface 195 at the same time; the types of the cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards, as well as with external memory cards. The electronic device 100 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the electronic device 100 employs an eSIM, that is, an embedded SIM card. The eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.
The software system of the electronic device 100 may employ a layered architecture, an event-driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture. The embodiment of the present invention uses an Android system with a layered architecture as an example to exemplarily illustrate a software structure of the electronic device 100.
Fig. 3 is a block diagram of the software configuration of the electronic apparatus 100 according to the embodiment of the present invention.
The layered architecture divides the software into several layers, each layer having a clear role and division of labor. The layers communicate with each other through a software interface. In some embodiments, the Android system is divided into four layers, an application layer, an application framework layer, an Android runtime (Android runtime) and system library, and a kernel layer from top to bottom.
The application layer may include a series of application packages.
As shown in fig. 3, the application layer may include application packages such as camera, gallery, calendar, phone call, map, navigation, WLAN, bluetooth, music, video, short message, etc.
The application framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer includes a number of predefined functions.
In the embodiment of the application, the application program layer includes a first application, and the first application may be a sound box application, a home control application, and the like, and may be used to provide an interface for a user to interact with the smart sound box, so as to implement setting and management of the smart sound box by the user. The first application can also call a screen, a camera device, a microphone, a sensor and other hardware devices of the electronic device through the display driver, the camera driver, the microphone driver, the sensor driver and the like of the kernel layer according to the instruction sent by the intelligent sound box.
For example: the first application may display corresponding content according to the specific content requested by the user. For example: when the user requests a voice call, the first application displays an interface during the call, such as an interface for making a call, an interface for receiving a call, and the like. For another example: and when the user requests to play the audio and video, the first application can play the picture information of the audio and video. Another example is: the user requests to inquire weather, and the first application can display information such as a background picture of the weather. Another example is: if the user requests to play music, the first application may play information related to playing music, such as: song title, singer, lyrics, MV, album information, etc. The display of the first application is not limited in the embodiment of the present application.
In some embodiments, the first application may include a speaker network configuration (distribution network) module, an account management module, an upgrade management module, a problem feedback module, and the like. The network configuration module may be used for associating the electronic device with the smart speaker, and so on; the account management module may be used to manage the account of the first application, and so on; the upgrade management module is used to manage software version upgrades of the first application, and so on; and the problem feedback module is used to collect problems fed back by users and information such as errors that occur while the first application is running.
As shown in FIG. 3, the application framework layers may include a window manager, content provider, view system, phone manager, resource manager, notification manager, and the like.
The window manager is used for managing window programs. The window manager can obtain the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like.
The content provider is used to store and retrieve data and make it accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phone books, etc.
The view system includes visual controls such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
The phone manager is used to provide communication functions of the electronic device 100. Such as management of call status (including on, off, etc.).
The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and the like.
The notification manager enables an application to display notification information in the status bar. It can be used to convey notification-type messages, which can disappear automatically after a short stay without user interaction. For example, the notification manager is used to notify download completion, message alerts, and the like. The notification manager may also present notifications in the form of a chart or scroll-bar text in the status bar at the top of the system, such as notifications of applications running in the background, or notifications that appear on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt tone is sounded, the electronic device vibrates, or an indicator light flashes.
The Android Runtime comprises a core library and a virtual machine. The Android runtime is responsible for scheduling and managing an Android system.
The core library comprises two parts: one part consists of the functions that the Java language needs to call, and the other part is the core library of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules. For example: surface managers (surface managers), Media Libraries (Media Libraries), three-dimensional graphics processing Libraries (e.g., OpenGL ES), 2D graphics engines (e.g., SGL), and the like.
The surface manager is used to manage the display subsystem and provide fusion of 2D and 3D layers for multiple applications.
The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, and the like.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The kernel layer at least comprises a display driver, a camera driver, an audio driver, and a sensor driver.
The following describes exemplary workflow of the software and hardware of the electronic device 100 in connection with capturing a photo scene.
When the touch sensor 180K receives a touch operation, a corresponding hardware interrupt is issued to the kernel layer. The kernel layer processes the touch operation into a raw input event (including information such as the touch coordinates and the timestamp of the touch operation). The raw input event is stored at the kernel layer. The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event. Taking the touch operation being a tap and the control corresponding to the tap being the control of the camera application icon as an example: the camera application calls the interface of the application framework layer to start the camera application, then starts the camera driver by calling the kernel layer, and captures a still image or video through the camera 193.
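Purely as an illustrative sketch of the framework-level portion of this flow (the embodiment specifies no code; the class name and request code below are hypothetical), a tap handler that starts a still-image capture on Android might look like:

```java
import android.app.Activity;
import android.content.Intent;
import android.provider.MediaStore;
import android.view.View;

// Hypothetical launcher-side sketch: the click handler the framework invokes
// after mapping the raw input event to the camera application icon.
public class CameraIconHandler implements View.OnClickListener {
    private static final int REQUEST_IMAGE_CAPTURE = 1; // illustrative request code
    private final Activity host;

    public CameraIconHandler(Activity host) {
        this.host = host;
    }

    @Override
    public void onClick(View v) {
        // Starting this activity causes the camera application to open the
        // camera driver through the kernel layer and capture a still image.
        Intent intent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
        if (intent.resolveActivity(host.getPackageManager()) != null) {
            host.startActivityForResult(intent, REQUEST_IMAGE_CAPTURE);
        }
    }
}
```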
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
After the smart speaker is connected with the electronic device, the smart speaker and the electronic device must first perform authentication; after the authentication passes, the smart speaker can call the screen, camera, microphone, and other devices of the electronic device. That is, the smart speaker may send an instruction to the electronic device, instructing the screen of the electronic device to display certain content (e.g., a user interface, a video picture, or the like), or instructing the camera of the electronic device to start capturing video pictures, or the like.
Generally, if a plurality of devices (e.g., smart speakers, electronic devices, etc.) log in to a server using the same account, the plurality of devices may be considered to belong to the same user and may trust each other. In some embodiments, the smart speaker may request association from the electronic device, perform authentication by the electronic device, and return the authentication result to the smart speaker. And after receiving the result of passing the authentication, the intelligent sound box establishes association with the electronic equipment. In other embodiments, the electronic device may request association from the smart speaker, perform authentication by the smart speaker, and return the authentication result to the electronic device. And after the authentication is passed, the intelligent loudspeaker box establishes association with the electronic equipment. In still other embodiments, the smart speaker or the electronic device may request the server, perform authentication by the server, and return an authentication result to the smart speaker and/or the electronic device, which is not limited in this embodiment of the application.
In some embodiments, after the smart speaker and the electronic device establish the bluetooth connection, when the smart speaker and the electronic device are close to each other, it may be considered that the user has an intention to play audio in audio and video using the smart speaker and play video in audio and video using the electronic device. In other words, if it is determined that the distance between the smart speaker and the electronic device is smaller than the threshold, the smart speaker and the electronic device may automatically perform authentication. After the authentication is passed, the smart sound box can directly instruct the electronic device to open the first application, or automatically switch to a mode of playing the video in a full screen mode. Therefore, after the electronic equipment receives the video playing instruction of the intelligent sound box, the electronic equipment can play the video in a full screen mode.
Fig. 8A is a schematic diagram of the interfaces before and after association of a smart speaker and an electronic device provided in an embodiment of the present application. When the distance between the smart speaker and the electronic device is greater than or equal to the threshold, the smart speaker and the electronic device may remain unassociated. In other words, the electronic device and the smart speaker may be used independently of each other; that is, the electronic device may normally display a corresponding interface or be in a screen-locked state according to the user's operations (as shown in fig. 8A). When the smart speaker and the electronic device approach each other and the distance between them is smaller than the threshold, the smart speaker and the electronic device are automatically associated. After association, the electronic device may automatically be extended into a display device of the smart speaker. In other words, the electronic device may automatically open the first application and display the interface of the first application, or automatically switch to a mode of playing video in full screen, or the like. In some embodiments, the electronic device is in a blank-screen state or a lock-screen state before being automatically associated with the smart speaker. Then, after the electronic device and the smart speaker are automatically associated, the electronic device may automatically turn on the screen, automatically unlock, and automatically open the first application or switch to a full-screen display interface. The embodiment of the present application does not specifically limit the operations automatically performed by the electronic device after it is associated with the smart speaker.
It should be noted that in the above example, whether to associate automatically is determined based on the distance between the smart speaker and the electronic device. Alternatively, whether to associate automatically may be determined based on the signal strength of the Bluetooth connection between the smart speaker and the electronic device; that is, when the signal strength of the Bluetooth connection between the smart speaker and the electronic device is greater than a certain threshold, the smart speaker and the electronic device may be considered close to each other and may associate automatically. The specific method for automatically associating the smart speaker with the electronic device is not limited here.
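As a minimal illustrative sketch only (the embodiment mandates no implementation; the RSSI threshold value and the requestAssociation placeholder are hypothetical), the Bluetooth signal-strength variant might be realized on Android as follows:

```java
import android.bluetooth.BluetoothDevice;
import android.content.BroadcastReceiver;
import android.content.Context;
import android.content.Intent;

// Hypothetical proximity check: treat an RSSI above a chosen threshold as
// "the devices are close" and trigger the automatic association flow.
public class SpeakerProximityReceiver extends BroadcastReceiver {
    private static final short RSSI_THRESHOLD_DBM = -60; // illustrative threshold

    @Override
    public void onReceive(Context context, Intent intent) {
        if (BluetoothDevice.ACTION_FOUND.equals(intent.getAction())) {
            short rssi = intent.getShortExtra(BluetoothDevice.EXTRA_RSSI, Short.MIN_VALUE);
            if (rssi > RSSI_THRESHOLD_DBM) {
                // Close enough: start the authentication/association described above.
                requestAssociation();
            }
        }
    }

    private void requestAssociation() {
        // Placeholder for sending the first request of fig. 8B (S102a).
    }
}
```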
As shown in fig. 8B, a method for associating an intelligent speaker with an electronic device provided in the embodiment of the present application includes the following steps:
S101a, the smart speaker logs in to the server using the first account information, and obtains first login information from the server and stores it.
The first login information is the identity credential that the smart speaker obtains from the server side after logging in to the server.
In some examples, the network of the smart speaker may be configured through a smart speaker APP on an electronic device. The electronic device here may be any electronic device with a screen, for example: a mobile phone. A rough network configuration process may be, for example, as follows. On the one hand, the electronic device and the smart speaker connect via Bluetooth, or via a temporary WiFi network. For example: the smart speaker establishes a temporary WiFi network after being powered on, that is, the smart speaker is in access point mode, and the electronic device connects to this temporary WiFi network to form a local area network. Optionally, the temporary WiFi network may instead be established by the electronic device, with the smart speaker connecting to it to form the local area network. On the other hand, the electronic device is connected to the Internet through a WiFi network. Then, the electronic device sends the WiFi information (for example, the WiFi name and password) to the smart speaker through the Bluetooth connection or the temporary WiFi network, and the smart speaker connects to the Internet according to this WiFi information. The electronic device logs in to the server through the WiFi network to apply for an identifier for the smart speaker (namely, the identifier of the smart speaker). The server sends the identifier of the smart speaker to the smart speaker. Then, the smart speaker registers an account using its identifier, and the server binds the identifier of the smart speaker with the account information registered by the user (namely, the first account information); that is, the smart speaker logs in to the server.
When the smart speaker logs in successfully, the server creates an access token (AT), namely AT1, which contains the security information of the login session. Thereafter, all processes running under the user identity hold a copy of AT1; that is, AT1 is the identity credential of the smart speaker when accessing the server, i.e., the first login information. The server sends the first login information to the smart speaker, and the smart speaker stores the first login information locally. It should be noted that AT1 has a validity period and needs to be reacquired if it expires.
Therefore, the server stores the identification of the intelligent sound box, the first account information of the intelligent sound box and the corresponding relation of the first login information.
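As an illustrative server-side sketch only (the embodiment prescribes no implementation; the class, the 24-hour validity period, and the use of random UUIDs as tokens are all assumptions), the correspondence among access token, device identifier, and account information might be kept as follows:

```java
import java.time.Instant;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical server-side bookkeeping:
// access token -> (device identifier, account information, expiry).
public class LoginRegistry {
    record DeviceRecord(String deviceId, String account, Instant expiresAt) {}

    private final Map<String, DeviceRecord> byToken = new ConcurrentHashMap<>();

    // Called on successful login; the validity period is illustrative.
    public String issueToken(String deviceId, String account) {
        String token = UUID.randomUUID().toString(); // e.g. AT1 or AT2
        byToken.put(token, new DeviceRecord(deviceId, account,
                Instant.now().plusSeconds(24 * 3600)));
        return token;
    }

    // Look up the account bound to a login token, honoring the validity period.
    public String accountFor(String token) {
        DeviceRecord r = byToken.get(token);
        if (r == null || Instant.now().isAfter(r.expiresAt())) {
            return null; // expired tokens must be reacquired
        }
        return r.account();
    }
}
```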
S101b, the electronic device logs in to the server using the second account information, and obtains second login information from the server and stores it.
The second login information is the identity credential of the electronic device on the server side after the electronic device logs in to the server.
Similarly, the electronic device may log in to the server using the second account information (including an account number and password) through WiFi or a mobile data network. When the electronic device logs in successfully, the server creates an access token (AT), namely AT2; AT2 is the identity credential of the electronic device when accessing the server, i.e., the second login information. The server sends the second login information to the electronic device, and the electronic device stores the second login information locally. It should be noted that AT2 has a validity period and needs to be reacquired if it expires.
Therefore, the server stores the correspondence between the second account information and the second login information of the electronic device. That is, as shown in table 1, the server maintains information on all devices logged in to the server. Subsequently, according to an access token, the server can determine the identifier of the corresponding device, the account information used when that device logged in, and the like.
TABLE 1
Access token | Device identifier | Account information used at login
AT1 | identifier of the smart speaker | first account information
AT2 | identifier of the electronic device | second account information
S101c, the smart speaker and the electronic device establish a communication connection.
As mentioned above, the smart speaker and the electronic device may be connected in a wired or wireless manner. The wired connection may be, for example, via a data line (e.g., through a USB interface) or via an optical fiber (e.g., through an optical fiber interface); the wireless connection may be, for example, Bluetooth, WiFi, NFC, ZigBee, and the like, which are not described herein again.
For example: electronic equipment can establish the connection with intelligent audio amplifier through audio amplifier APP (or other APPs such as house control APP). Take the connection between smart speaker and the electronic device through bluetooth as an example, an exemplary description is made.
When connecting for the first time, the smart speaker turns on Bluetooth. The electronic device opens the speaker APP, enables Bluetooth, and scans for nearby Bluetooth devices. After the Bluetooth of the smart speaker is discovered, the two devices pair and connect. The specific Bluetooth establishment process may refer to the prior art and is not described in detail here. Afterwards, when Bluetooth is enabled on both the smart speaker and the electronic device, the two devices can connect automatically when they come close to each other.
It should be noted that the execution sequence of the above steps S101a, S101b, and S101c is not limited in the embodiment of the present application.
It should be noted that after the smart speaker and the electronic device establish a communication connection, they can communicate with each other, but no association has yet been established between them. That is, at this time the smart speaker cannot call the hardware devices of the electronic device, for example: the display, camera, microphone, sensors, and so on.
S102a, the smart speaker sends a first request to the electronic device, where the first request is used to request association with the electronic device.
The first request carries login information of the intelligent sound box, namely first login information, so that the electronic equipment can determine whether to allow the intelligent sound box to be associated according to the first login information.
S103a, the electronic device sends a second request to the server, wherein the second request is used for requesting to acquire the first account information corresponding to the first login information.
As mentioned above, the server stores the corresponding relationship between the account information and the login information of each device logged in to the server. Then, in some embodiments of the application, the server may query, according to the first login information sent by the electronic device, first account information corresponding to the first login information, that is, account information of the smart speaker, and return the first account information to the electronic device, so that the electronic device determines whether to allow association of the smart speaker. In other embodiments of the application, when the server queries the first account information, the server may also query second account information corresponding to the second login information, that is, account information of the electronic device, according to the login information of the electronic device, that is, the second login information. The server compares the first account information with the second account information, determines whether the intelligent sound box and the electronic equipment use the same account for login, and sends a result to the electronic equipment so that the electronic equipment can determine whether to allow the intelligent sound box to be associated. The embodiments of the present application do not limit this.
S104a, if the first account information is the same as the second account information used when the electronic device logs in, the electronic device sends a response allowing association to the smart speaker.
Generally, if multiple devices log in to a server using the same account, the multiple devices may be considered to belong to the same user and may trust each other. Therefore, in the embodiment of the application, if the smart speaker and the electronic device log in the server by using the same account, the smart speaker is allowed to be associated with the electronic device.
In some embodiments of the application, the electronic device compares the first account information returned by the server with the second account information used when the electronic device itself logged in. If the two pieces of account information are the same, the two devices are considered to have logged in using the same account and to belong to the same user, and the electronic device sends a response allowing association to the smart speaker. After receiving the response allowing association, the smart speaker may send corresponding control commands to the electronic device.
If the two pieces of account information are different, the two devices are considered not to belong to the same user, and the smart speaker is not allowed to associate with the electronic device; that is, the electronic device sends a response denying association to the smart speaker, or returns no response. Then, after receiving the response denying association, or after not receiving a response allowing association within a preset time, the smart speaker determines that it cannot send corresponding control commands to the electronic device.
Optionally, the electronic device may also return the authentication result, indicating whether the smart speaker is allowed to associate, to the server, so that when the server subsequently receives audio and video content, it may split the audio and video content into video content and audio content according to the authentication result.
In other embodiments of the present application, the server has determined whether the first account information and the second account information are the same, and may return a determination result (i.e., an authentication result) of whether to allow the smart sound box to associate with the electronic device to the electronic device. In this case, the electronic device may return a response to the smart speaker that allows the association or a response that does not allow the association directly. Or, the server may directly return the determination result of whether to allow the smart speaker to associate with the electronic device to the smart speaker, and the smart speaker may determine whether to associate with the electronic device according to the determination result.
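Purely as an illustrative sketch of the account comparison in steps S102a to S104a (no particular implementation is prescribed by the embodiment; ServerClient and all method names are hypothetical), the device-side check might be:

```java
// Hypothetical server proxy: maps login information (an access token) to the
// account information stored by the server, or null if unknown or expired.
interface ServerClient {
    String accountFor(String loginInfo);
}

// Hypothetical device-side authentication for steps S102a-S104a.
public class AssociationAuthenticator {
    private final ServerClient server;
    private final String secondAccountInfo; // account this device logged in with

    public AssociationAuthenticator(ServerClient server, String secondAccountInfo) {
        this.server = server;
        this.secondAccountInfo = secondAccountInfo;
    }

    // firstLoginInfo is the AT1 carried in the smart speaker's first request.
    public boolean onAssociationRequest(String firstLoginInfo) {
        // S103a: ask the server for the first account information behind AT1.
        String firstAccountInfo = server.accountFor(firstLoginInfo);
        // S104a: allow association only if both devices use the same account.
        return firstAccountInfo != null && firstAccountInfo.equals(secondAccountInfo);
    }
}
```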
In steps S102a to S104a, the smart speaker requests association from the electronic device. In other embodiments of the present application, the electronic device may instead send an association request to the smart speaker; that is, steps S102a to S104a may be replaced with steps S102b to S104b, as follows:
S102b, the electronic device sends a third request to the smart speaker to request that the smart speaker associate with the electronic device.
The third request carries login information of the electronic device, namely second login information, so that the intelligent sound box can determine whether to associate the electronic device according to the second login information.
S103b, the smart sound box sends a fourth request to the server, and the fourth request is used for requesting to acquire second account information corresponding to the second login information.
As mentioned above, the server stores the corresponding relationship between the account information and the login information of each device logged in to the server. In some embodiments, the server may query, according to the second login information sent by the smart speaker, second account information corresponding to the second login information, that is, account information of the electronic device, and return the second account information to the smart speaker, so that the smart speaker determines whether to associate the electronic device. In other embodiments, when the server queries the second account information, the server may also query first account information corresponding to the first login information, that is, the account information of the smart speaker, according to the login information of the smart speaker, that is, the first login information. The server compares the first account information with the second account information, determines whether the intelligent sound box and the electronic equipment use the same account to log in, and sends a result to the intelligent sound box so that the intelligent sound box can determine whether the electronic equipment is associated. The embodiments of the present application do not limit this.
S104b, if the second account information is the same as the first account information used when the smart speaker logs in, the smart speaker sends a response allowing association to the electronic device.
Generally, if multiple devices log in to a server using the same account, the multiple devices may be considered to belong to the same user and may trust each other. Therefore, in the embodiment of the application, if the smart speaker and the electronic device log in the server by using the same account, the smart speaker is allowed to be associated with the electronic device.
In some embodiments of the application, the smart speaker compares the second account information returned by the server with the first account information used when the smart speaker itself logged in. If the two pieces of account information are the same, the two devices are considered to have logged in using the same account and to belong to the same user, and the smart speaker sends a response allowing association to the electronic device. Thereafter, the electronic device performs the corresponding operation after receiving a control command from the smart speaker.
If the two pieces of account information are different, the two devices are considered not to belong to the same user, and the smart speaker is not allowed to associate with the electronic device; that is, the smart speaker sends a response denying association to the electronic device, or returns no response. The smart speaker then determines that it cannot send corresponding control commands to the electronic device.
After step S104a or S104b, the smart speaker may send control commands to the electronic device. For example: the smart speaker instructs the electronic device to display in full screen. For another example: the smart speaker instructs the electronic device to start the camera and collect images. As another example: the smart speaker may instruct the electronic device to start playing video, and so on. As another example: the smart speaker may send its local time to the electronic device, so that the electronic device can determine the time of the smart speaker according to that local time; the specific method for acquiring the time of the smart speaker was mentioned above and is not repeated here. As yet another example: if the electronic device has opened a voice assistant APP, then in order to avoid a functional conflict with the smart speaker, the smart speaker may instruct the electronic device to exit the voice assistant APP, or to suspend use of the voice assistant APP; when the smart speaker and the electronic device are disconnected, the electronic device may resume using the voice assistant APP.
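As an illustrative sketch only (the command set, names, and dispatch below are hypothetical, not a prescribed protocol of the embodiment), the control commands listed above might be modeled on the device side as:

```java
// Hypothetical set of post-association control commands and their dispatch.
public class CommandHandler {
    enum SpeakerCommand {
        DISPLAY_FULL_SCREEN,     // display content in full screen
        START_CAMERA,            // start the camera and collect images
        START_VIDEO_PLAYBACK,    // start playing video
        SYNC_LOCAL_TIME,         // carries the speaker's local time
        SUSPEND_VOICE_ASSISTANT  // avoid conflicts with the speaker's assistant
    }

    // Device-side dispatch; the handler methods are placeholders.
    public void onCommand(SpeakerCommand cmd) {
        switch (cmd) {
            case DISPLAY_FULL_SCREEN -> enterFullScreen();
            case START_CAMERA -> startCameraCapture();
            case START_VIDEO_PLAYBACK -> startVideoPlayback();
            case SYNC_LOCAL_TIME -> syncTimeFromSpeaker();
            case SUSPEND_VOICE_ASSISTANT -> suspendVoiceAssistant();
        }
    }

    private void enterFullScreen() { /* placeholder */ }
    private void startCameraCapture() { /* placeholder */ }
    private void startVideoPlayback() { /* placeholder */ }
    private void syncTimeFromSpeaker() { /* placeholder */ }
    private void suspendVoiceAssistant() { /* placeholder */ }
}
```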
The method for controlling the electronic device to play the video or display the picture by the smart sound box provided by the embodiment of the present application is described in detail below with reference to specific scenes.
As shown in fig. 9A, a method for cooperation between a smart speaker and an electronic device provided in an embodiment of the present application is as follows:
S201, the smart speaker receives a first voice of a user.
The first voice of the user is used for requesting to play a first audio and video, and the first audio and video comprises a first audio and a first video. For example: the first audio and video can be a movie, a video clip, a music MV, etc., and the first audio is sound information of the first audio and video, i.e., audio content. The first video is picture information of the first audio and video, namely video content or display content.
It can be understood that, in this embodiment of the application, considering the smart speaker's high-quality sound pickup and playback, using the smart speaker to pick up sound, that is, to receive the first voice of the user, helps identify the user's intention more accurately. When the first audio and video is played, playing the first audio of the first audio and video through the smart speaker improves the sound quality, and displaying the first video of the first audio and video through the electronic device breaks the limitation of a screenless speaker, so that the picture content of the first audio and video can also be played, improving user experience.
S202, the intelligent sound box reports the first voice to the first server.
S203, the first server determines a first audio and video needing to be played according to the first voice.
In steps S202-S203, the first server may be, for example, a cloud server that the smart speaker logs in to; it may perform intention recognition on the user voice forwarded by the smart speaker and perform corresponding processing according to the recognized intention, for example, return corresponding content or request a corresponding service from another server. For example, when the first server determines, according to the first voice, the service requested by the user, for example, playing a first audio and video, it may find the first audio and video locally or from another server (that is, a second server, for example, an audio and video content server), or obtain an online playing address or a download address of the first audio and video.
S204, the first server returns the determined content of the first audio and video to the intelligent sound box.
The content of the first audio and video may be a file of the first audio and video; it may also be address information of the first audio and video, for example, an online playing address or a download address of the first audio and video; it may also be identification information of the first audio and video, for example, keyword information describing the first audio and video, such as "Xu ru yun", "Du jiao xi", "MV", or "movie", "banging my country", and the like.
Optionally, the content of the first audio and video may further include first video content and first audio content. The first video content may include a file of the first video, or address information of the first video, such as an online playing address or a download address of the first video. The first audio content may include a file of the first audio, or address information of the first audio, such as an online playing address or a download address of the first audio.
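To make this message structure concrete, the following Python sketch models the content of the first audio and video; the class and field names are assumptions for illustration, reflecting that a file, an address, or identification information may be carried:

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class AudioVideoContent:
        # Illustrative container for the content of the first audio and video.
        file_data: Optional[bytes] = None      # the audio/video file itself
        address: Optional[str] = None          # online playing or download address
        keywords: List[str] = field(default_factory=list)  # identification info
        video_address: Optional[str] = None    # optional first video content address
        audio_address: Optional[str] = None    # optional first audio content address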
In some embodiments of the present application, the first server may download the first audio and video from the second server, or play it online, according to the address information of the first audio and video; or the first server may search the second server for the first audio and video according to its identification information and then download it or play it online. Because the first audio and video includes both video content and audio content, the first server may split the first audio and video into the first video and the first audio. Then, first video content is generated from the first video and may include address information of the first video, and first audio content is generated from the first audio and may include address information of the first audio. Then, the first video content and the first audio content are returned to the smart speaker, that is, step S205 is executed.
In other embodiments of the application, the first server may directly return the acquired content of the first audio/video to the smart speaker, that is, execute step S205.
S205, the intelligent sound box sends the content of the first audio and video to the electronic equipment.
S206, the intelligent sound box acquires a first audio file according to the content of the first audio and video; the electronic equipment acquires a first video file according to the content of the first audio and video.
In some embodiments of the application, when the content of the first audio and video includes the address information of the first audio and the address information of the first video, the smart speaker may download the first audio file from the first server, or play it online, according to the address information of the first audio. Meanwhile, the electronic device may download the first video content from the first server, or play it online, according to the address information of the first video sent by the smart speaker.
In other embodiments of the application, when the content of the first audio and video includes a download address of the first audio and video, the smart speaker downloads the content of the first audio and video from the second server according to the download address and splits it to obtain the first audio content. Meanwhile, the electronic device downloads the content of the first audio and video from the second server according to the download address and splits it to obtain the first video content.
It can be understood that, in this embodiment of the present application, the smart speaker may start playing the first audio file after downloading it in full, or may continue downloading the rest of the audio file while playing the portion already downloaded. Similarly, the electronic device may start playing the first video file after downloading it in full, or may continue downloading the rest of the video file while playing the portion already downloaded.
S207, the intelligent sound box sends a first playing instruction to the electronic equipment, wherein the first playing instruction comprises playing time, namely the first time.
It should be noted that the first play instruction sent by the smart speaker to the electronic device may be a separate message, or may be carried in another message, for example, in the message carrying the content of the first video that the smart speaker sends to the electronic device in step S205. This is not specifically limited in this embodiment of the application.
S208, the intelligent sound box starts to play the first audio file from the first time, and the electronic equipment starts to play the first video file from the first time.
In some embodiments, the smart speaker plays the first audio file at a first time using the time standard of the smart speaker. Meanwhile, the electronic equipment adopts the time standard of the intelligent sound box, and the first video file is played at the first time. Therefore, although the video picture and the sound in the first audio and video are respectively played on different devices, the situation that the playing progress of the sound and the picture is inconsistent can be avoided because the time of the two devices is synchronized. In the above description, how the electronic device obtains the time standard of the smart speaker to achieve time synchronization with the smart speaker is described in detail, and details are not repeated here.
It is to be understood that the electronic device and the smart speaker may also adopt other time synchronization methods, for example, both adopt the time standard of the electronic device, or both adopt other time standards, and the embodiment of the present application is not particularly limited.
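As a minimal sketch of starting playback at the same first time on a shared time base, the following Python fragment assumes the electronic device has already received the smart speaker's local time, as described above; the SpeakerClock helper and the polling loop are simplifications of this sketch, not part of the application:

    import time

    class SpeakerClock:
        # Expresses the local clock in the smart speaker's time standard.
        def __init__(self, speaker_time_at_receipt: float):
            # Offset between the speaker's clock and this device's clock,
            # captured when the speaker's local time was received.
            self.offset = speaker_time_at_receipt - time.time()

        def now(self) -> float:
            return time.time() + self.offset

    def start_playing_at(clock: SpeakerClock, first_time: float, play) -> None:
        # Wait until the agreed first time arrives on the shared time base,
        # then start playback; a real implementation would use a scheduler.
        while clock.now() < first_time:
            time.sleep(0.001)
        play()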
Further, considering that during playback, factors such as a difference in playing speed or download speed between the smart speaker and the electronic device may cause the playing progress of the two devices to become inconsistent, in this embodiment of the application the playing progress of the two devices may also be monitored during playback and adjusted in time when it becomes inconsistent, as follows:
In some embodiments, the electronic device reports its playing progress, and the smart speaker adjusts its own playing progress or instructs the electronic device to adjust. Specifically, while playing the first video file, the electronic device periodically reports its playing progress to the smart speaker. When the playing progress of the electronic device differs from that of the smart speaker (or the playing progress of the two devices meets a certain condition, for example, the difference between them reaches a threshold, such as 3 seconds), the smart speaker may adjust its own playing progress according to that of the electronic device, or may send a play instruction to the electronic device again to instruct the electronic device to adjust to the current playing progress of the smart speaker. The playing progress may be the current playing position, or the proportion of the played part to the whole video. For example, if the whole video is 20 minutes long and the electronic device has currently played to the 10 minute 30 second position, the playing progress may be 10 minutes 30 seconds, or 52.5%.
For example, the duration of the first audio and video is 15 minutes, that is, the first video file and the first audio file are both 15 minutes long. The playing progress of the first video file reported by the electronic device is 09:10, and the playing progress of the first audio file on the smart speaker is 09:00. The smart speaker may directly jump the audio to 09:10 and continue playing from there. Alternatively, the smart speaker may send a play instruction again, instructing the electronic device to play the first video file from 09:00. It can be understood that, considering that a certain time elapses from the moment the smart speaker generates the play instruction to the moment the electronic device receives it, the play instruction may instruct the electronic device to start playing the first video file from a time slightly later than 09:00 (for example, 09:02).
In other embodiments, the smart speaker reports its playing progress, and the electronic device adjusts its own playing progress or instructs the smart speaker to adjust. Specifically, while playing the first audio file, the smart speaker periodically sends its playing progress to the electronic device. When the playing progress of the smart speaker differs from that of the electronic device (or the playing progress of the two devices meets a certain condition, for example, the difference between them reaches a threshold, such as 3 seconds), the electronic device may adjust its own playing progress according to that of the smart speaker, or may request the smart speaker to synchronize, so that the smart speaker adjusts to the current playing progress of the electronic device.
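The adjustment logic of both embodiments can be sketched in Python as follows; the threshold and the latency compensation value are assumptions taken from the examples above:

    THRESHOLD_S = 3.0        # e.g., the 3-second difference mentioned above
    COMMAND_LATENCY_S = 2.0  # assumed transfer delay of a re-sent play instruction

    def on_progress_report(own_progress: float, peer_progress: float):
        # Both progress values are playing positions in seconds. Returns None
        # when progress is consistent, otherwise the two possible adjustments.
        if abs(own_progress - peer_progress) < THRESHOLD_S:
            return None
        # Option 1: jump local playback to the peer's position.
        seek_local_to = peer_progress
        # Option 2: instruct the peer to jump to our position, compensated for
        # the instruction's transfer delay (the 09:00 -> 09:02 example above).
        instruct_peer_to = own_progress + COMMAND_LATENCY_S
        return seek_local_to, instruct_peer_to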
It can be understood that the methods of steps S201-S208 described above can also be applied to various scenes where the smart speaker plays audio and the electronic device displays an interface.
For example, as shown in fig. 9B, an example of a scenario in which the method provided by the embodiment of the present application is applied to playing music is shown. Specifically, step 1, the smart speaker receives a voice request for playing music. Step 2, the smart speaker sends the user's voice request to the speaker server. Step 3, the speaker server returns the audio content and the display content of the music to the smart speaker. Step 4, the smart speaker sends a play instruction to the electronic device, where the play instruction includes the display content returned by the server. Step 5a, the smart speaker acquires the music file from the music server. Step 5b, the electronic device acquires the interface display content from the music server or the speaker server. Step 6a, the smart speaker plays the music. Step 6b, the electronic device displays the picture content corresponding to the music.
Fig. 9C is a flowchart of a method for playing music according to an embodiment of the present application. Specifically, the user requests to play music through the smart speaker, and after performing intention recognition, the server returns, for example, a download address of the music and the interface display content corresponding to the music. The interface display content may include information related to the music, such as the album, the singer, and the lyrics. In some embodiments, the interface display content may be predefined, that is, stored in the intention recognition server. The intention recognition server may then directly return the interface display content to the speaker server and the smart speaker, and finally the smart speaker forwards it to the electronic device for display. In other embodiments, the interface display content may be stored in the music server; the intention recognition server then returns the download address corresponding to the interface display content to the speaker server and the smart speaker, and the electronic device downloads the interface display content from the music server according to that download address. The method for the electronic device to acquire the interface display content and the music is similar to the processing method for acquiring the first video in steps S201-S208, and is not repeated here. Then, the smart speaker plays the audio of the music, and the electronic device displays the interface display content corresponding to the music.
For another example, as shown in fig. 9D, an example of a scenario in which the method provided by the embodiment of the present application is applied to a voice dialog is shown. Specifically, step 1, the smart speaker receives a voice dialog of the user. Step 2, the smart speaker sends the user's voice to the server. Step 3, the server returns a response to the smart speaker. Step 4, the smart speaker sends a display instruction to the electronic device, where the display instruction includes the display content (or a download address corresponding to the display content) returned by the server. Step 5, the electronic device acquires the text recognized by the server from the voice, or the text of the response. Step 6a, the smart speaker plays the voice of the reply. Step 6b, the electronic device displays the corresponding content.
Fig. 9E is a flowchart of a method for a voice dialog according to an embodiment of the present application. Specifically, the user conducts a voice dialog with the server through the smart speaker. The server performs voice recognition (or transmits the user's voice to another server, for example, a voice recognition server or an intention recognition server, for recognition) and sends the recognized text to the electronic device, either directly or through the smart speaker. The electronic device displays a dialog page showing the text converted from the voice.
For another example, as shown in fig. 9F, in the method for querying the weather by voice according to the embodiment of the present application, the user asks about the weather through the smart speaker, and after performing intention recognition, the server returns weather content, where the weather content may include voice (for broadcasting the weather condition) and interface display content (weather information displayed on the interface, including text, pictures, animations, and the like). In some embodiments, the interface display content may be predefined, that is, stored in the intention recognition server. The intention recognition server may then directly return the interface display content to the speaker server and the smart speaker, and finally the smart speaker forwards it to the electronic device for display. In other embodiments, the interface display content may be stored in the weather server; the intention recognition server then returns the download address corresponding to the interface display content to the speaker server and the smart speaker, and the electronic device downloads the interface display content from the weather server according to that download address. The method for acquiring the interface display content and the weather broadcast voice is similar to the processing method for acquiring the first video in steps S201-S208, and is not repeated here. Then, the smart speaker plays the weather broadcast voice, and the electronic device displays the text, pictures, or animations related to the weather information.
Fig. 10 and fig. 11 are schematic diagrams of a method for video call according to an embodiment of the present application, where fig. 10 is an interaction flowchart of each device of a calling party and a server, and fig. 11 is an interaction flowchart of each device of a called party and a server.
For convenience of description, the smart speaker used by the calling-party user (denoted as user A) is denoted as smart speaker A, and the electronic device used is denoted as electronic device A. The smart speaker used by the called-party user (denoted as user B) is denoted as smart speaker B, and the electronic device used is denoted as electronic device B. The following describes the establishment of a video call and the communication process in detail with reference to fig. 10 and fig. 11, specifically as follows:
S301, smart speaker A receives a second voice of user A.
Referring to fig. 10, the second voice is used to request a video call with the user B. For example: the second voice may be a request to call user B, or a request to make a video call with user B, etc.
S302, the intelligent sound box A sends the second voice to the first server.
The first server is a cloud server logged in by the intelligent sound box A and can provide cloud service for the intelligent sound box A. The first server may further include an intention recognition server, which may perform intention recognition on the voice input by the user through the smart sound box a, and the like.
As mentioned above, smart speaker A needs to log in to the first server with an account and a password; the first server binds the identifier of smart speaker A to the account and assigns an access token to smart speaker A, and smart speaker A then carries the access token when communicating with the first server.
That is, in this step, when smart speaker A sends the second voice to the first server, it also carries the access token. Therefore, the first server can determine, according to the access token, the account used when smart speaker A logged in and the identifier of smart speaker A; that is, the first server may determine them according to the correspondence shown in Table 1. The identifier of smart speaker A may be, for example, a MAC address of smart speaker A, so that the first server can send messages to smart speaker A according to the identifier.
And S303, the first server determines the call intention according to the second voice.
Specifically, the first server determines the call intention of user A according to the second voice, that is, to establish a video call with user B. Then, the first server creates a call identifier for this call to mark it; that is, messages related to this call carry the call identifier.
It can be understood that smart speaker A and electronic device A have established a connection, and smart speaker A can control electronic device A. For the specific method of establishing the connection and implementing the control, refer to the method described above, which is not repeated here. As mentioned above, the premise for smart speaker A to control electronic device A is that the account used when smart speaker A logs in to the first server is the same as the account used when electronic device A logs in to the first server. That is, the first server may determine, according to the login account used when smart speaker A logs in to the first server, the identifier and the access token of the electronic device logged in with the same account; in other words, the first server may determine the identifier of electronic device A according to the correspondence shown in Table 2, so as to communicate with it. The identifier of the electronic device may be, for example, a MAC address of the electronic device, so that the first server can send messages to electronic device A according to the identifier of electronic device A.
TABLE 2
Type of device | Device identification | Login account | Access token
Smart speaker | MAC1 (smart speaker A) | Account 1 | AT1
Electronic device (for example, mobile phone) | MAC2 (electronic device A) | Account 1 | AT2
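For illustration, the correspondence in Table 2 can be modeled as a simple lookup. The following Python sketch assumes an in-memory table; the identifier, account, and token strings are placeholders:

    # Illustrative in-memory version of Table 2.
    DEVICE_TABLE = [
        {"type": "smart speaker",     "id": "MAC1", "account": "account1", "token": "AT1"},
        {"type": "electronic device", "id": "MAC2", "account": "account1", "token": "AT2"},
    ]

    def resolve_by_token(token: str):
        # Recover the device entry (identifier and login account) from an access token.
        return next((d for d in DEVICE_TABLE if d["token"] == token), None)

    def devices_on_same_account(account: str) -> list:
        # Identifiers of all devices logged in with the same account.
        return [d["id"] for d in DEVICE_TABLE if d["account"] == account]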
S304, the first server sends a voice prompt instruction to smart speaker A, and smart speaker A plays a voice prompt to prompt user A that user B is being called.
S305, the intelligent sound box A sends an interface prompt instruction to the electronic equipment A, and the electronic equipment A displays a prompt interface to prompt the user A that the user B is being called.
It can be understood that, in the present application, the user a may be prompted by any one of the smart sound box a and the electronic device a, that is, step S304 or S305 is executed; both ways may also be employed, i.e. performing steps S304 and S305; step S304 may be combined with step S306, and step S305 may be combined with step S307; the user may not be prompted, that is, steps S304 and S305 are not executed, and the embodiment of the present application is not limited.
S306, the first server sends a calling instruction to the intelligent sound box A.
The call instruction carries a call identifier, which can be used to identify this call. The call instruction may also carry identification information of the called party, for example, a telephone number, so that smart speaker A establishes a call connection with the called party according to the telephone number.
S307, smart speaker A sends a call instruction to electronic device A.
The call instruction carries the call identifier, so that electronic device A can, based on the call identifier, request the second server to associate electronic device A with the call connection corresponding to the call identifier.
S308, smart speaker A sends a call request to the second server through the first server.
The second server is a server capable of providing an audio and video call service, for example, a Communications as a Service (CaaS) server. The second server may be a different server from the first server, or may be the same server. For example, when the second server and the first server are different servers, the first server provides the speaker cloud service and the second server provides the audio and video call service. For another example, when the second server and the first server are the same server, that server provides both the speaker cloud service and the audio and video call service; in this case, information transfer between the first server and the second server may be understood as internal communication within one server, and is not described in detail below.
The call request may carry the calling party number, the called party number, and a service type, where the service type includes a video call, an audio call, and the like.
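An illustrative form of this call request is sketched below in Python; the field names, numbers, and service-type values are assumptions of this sketch, not definitions of the present application:

    # Hypothetical call request sent from smart speaker A to the second server.
    call_request = {
        "call_id": "call-0001",          # the call identifier created in S303
        "caller_number": "130xxxx0001",  # calling party number (placeholder)
        "callee_number": "130xxxx0002",  # called party number (placeholder)
        "service_type": "video_call",    # or "audio_call"
    }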
S309, the second server pushes an incoming call event to smart speaker B through the third server.
Referring to fig. 11, the third server is a cloud server logged in by the smart speaker B, and can provide cloud services for the smart speaker B. The third server may be the same as the first server or may be a different server, which is not limited in this embodiment of the application.
For example, after the second server receives the call request sent by the first server, it determines the third server according to the called party number carried in the call request, and pushes an incoming call event to the third server, where the incoming call event carries the calling party number, the service type, and the like.
S310, smart speaker B rings to prompt user B of the incoming call.
S311, smart speaker B receives a third voice of user B, where the third voice indicates whether to answer the incoming call.
Illustratively, after hearing smart speaker B ring for the incoming call, user B may answer or decline the call through a voice instruction, that is, the third voice, for example, by saying "answer" or "hang up".
S312, smart speaker B uploads the third voice to the third server.
S313, the third server determines, according to the third voice, whether the user intends to answer the call; if the user indicates to answer, the third server sends an answering instruction to smart speaker B.
In some examples, the third server determines, according to the third voice, whether the user answers the incoming call. If the call is answered, an answering instruction is sent to smart speaker B to establish the communication connection. If not, a corresponding response is sent to the second server to notify user A that user B has hung up. The procedure for not answering may refer to the prior art and is not repeated here.
S314a, smart speaker B establishes a communication connection with the second server through the third server.
For a specific process of establishing a communication connection, reference may be made to the prior art, which is not described herein again.
Then, the second server pushes the communication establishment event to the smart sound box a through the first server (i.e., step S316 in fig. 10).
S314b, smart speaker B sends an answering instruction to electronic device B.
In some embodiments, if smart speaker B has established a connection with electronic device B when it receives user B's answering instruction, that is, when performing step S311, smart speaker B may, according to the received answering instruction, send an answering instruction to electronic device B instructing it to establish a communication connection with the second server, that is, perform step S315.
In other embodiments, if smart speaker B has not established a connection with electronic device B when it receives user B's answering instruction, that is, when performing step S311, smart speaker B may attempt to connect to electronic device B according to the answering instruction and, after the connection succeeds, send the answering instruction to electronic device B, that is, perform step S315. If smart speaker B fails to connect to the electronic device, smart speaker B establishes a communication connection with the second server and conducts a voice call with user A.
S315, electronic device B establishes a communication connection with the second server through the third server.
Then, the second server pushes the communication establishment event to the electronic device a through the first server (i.e., step S317 in fig. 10).
At this point, the communication devices of user A (including smart speaker A and electronic device A) and the answering communication devices of user B (including smart speaker B and electronic device B) have established the communication connection. User A can then use the communication connection to conduct a video call with user B.
The following exemplarily describes the video call process provided in the embodiment of the present application by taking the process in which user A speaks and user B listens as an example: smart speaker A collects user A's audio, electronic device A collects user A's video, the collected audio and video are sent to user B, and user B listens and watches using smart speaker B and electronic device B. The details are as follows:
S318a, smart speaker A collects the audio of user A and marks the collected audio with a collection time, where the collection time is the local time of the smart speaker; the audio is then uploaded to the first server.
Referring to fig. 10, in some examples, after receiving the communication establishment event sent by the first server (i.e., step S316a), smart sound box a may start capturing audio of user a.
In other examples, the communication setup event sent by the first server may also carry a time for starting a video call, which is used to instruct the smart speaker a to start capturing audio of the user a at the time.
In still other examples, after receiving the communication establishment event sent by the first server (i.e., step S316a), the smart sound box a may also send an instruction to start a video call (shown as step S317 in fig. 10) to the electronic device a, where the instruction carries a time to start the video call to instruct the electronic device a to start capturing video at the time. Meanwhile, smart speaker a also starts to collect audio from that time.
S318b, electronic device A collects the video of user A and marks the collected video with a collection time, where the collection time is expressed in the local time of the smart speaker; the video is then uploaded to the first server.
In some examples, the electronic device a may start capturing the video of the user a upon receiving the communication establishment event sent by the first server (i.e., step S316 b).
In other examples, the communication establishment event sent by the first server may also carry a time for starting a video call, which is used to instruct the electronic device a to start capturing video of the user a at the time. The embodiments of the present application are not particularly limited.
In still other examples, when electronic device A receives the instruction to start the video call sent by smart speaker A, it acquires the start time carried in the instruction, adopts the time standard of the smart speaker, and collects video from that start time.
The method for acquiring the time standard of the smart speaker by the electronic device a may refer to the above description, and is not described herein again.
It is understood that the execution order of steps S318a and S318b is not limited in the embodiments of the present application.
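A minimal sketch of steps S318a/S318b follows. It reuses the hypothetical SpeakerClock helper from the earlier sketch, and the capture_frame, upload, and call_active interfaces are assumptions:

    def capture_and_upload(clock, capture_frame, upload, start_time, call_active):
        # Stamp every captured frame with the shared (speaker) time base.
        while clock.now() < start_time:
            pass  # wait for the agreed start of the video call
        while call_active():
            frame = capture_frame()
            # Mark the frame with its collection time so that the server can
            # pair audio and video belonging to the same moment (see S319).
            upload({"capture_time": clock.now(), "payload": frame})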
S319, the first server uploads the received audio and video to the second server.
In some embodiments of the application, because the audio and the video received by the first server are both marked with capture times, and those capture times come from the same time source (for example, the time of the smart speaker), audio and video belonging to the same moment can be merged into one audio and video stream according to the capture times. It can be understood that merging audio and video belonging to the same moment for transmission avoids situations in which the audio and the video cannot be played simultaneously because one of them is lost or because they arrive with a large time difference.
In other embodiments of the present application, considering that the receiving side also uses two devices to play the audio and the video separately after receiving them, the first server may also transmit the audio and the video directly to the other side instead of merging them. This reduces the work of splitting the merged audio and video at the receiving side.
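For illustration, the merging in step S319 can be sketched as pairing chunks by capture time; the pairing tolerance below is an assumption of this sketch:

    TOLERANCE_S = 0.02  # assumed maximum capture-time gap for pairing

    def merge_streams(audio_chunks, video_chunks):
        # Pair audio and video chunks that belong to the same moment.
        merged = []
        for a in audio_chunks:
            # Video chunk whose capture time is closest to this audio chunk.
            match = min(video_chunks,
                        key=lambda v: abs(v["capture_time"] - a["capture_time"]),
                        default=None)
            if match and abs(match["capture_time"] - a["capture_time"]) <= TOLERANCE_S:
                merged.append({"capture_time": a["capture_time"],
                               "audio": a["payload"], "video": match["payload"]})
        return merged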
S320, the second server sends the received audio and video to the third server.
Referring to fig. 11, in some embodiments of the present application, if what the third server receives from the second server is merged audio and video, the third server splits it into audio and video, each carrying the capture time. Then, the split audio is sent to smart speaker B and the split video is sent to electronic device B, that is, steps S321a and S321b are executed.
In other embodiments of the present application, if the third server receives the audio and video signals separately from the second server, the third server directly sends the audio signal to the smart sound box B and sends the video signal to the electronic device B, i.e., steps S321a and S321B are performed.
S321a, the third server sends the audio to the smart sound box B.
S321B, the third server sends the video to the electronic device B.
In some embodiments, electronic device B may send a notification message to smart speaker B after receiving the video, where the notification message may indicate that electronic device B has received the video and may also contain the timestamps of the received video. That is, electronic device B tells smart speaker B during which time periods the received video was collected, so that smart speaker B can issue the play instruction accordingly.
S322, smart speaker B sends a play instruction to electronic device B, where the play instruction includes the time at which to start playing and the specified time of the content to be played.
Playing the content of the specified time means that smart speaker B plays the audio collected at the specified time, and electronic device B plays the video collected at the specified time.
S323a, smart speaker B starts playing the specified content (the audio of the specified time) at the designated time (the time to start playing).
S323b, electronic device B starts playing the specified content (the video of the specified time) at the designated time (the time to start playing).
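The play instruction of S322 and its handling in S323a/S323b can be sketched as follows; the field names are assumptions, and clock is the shared time base from the earlier sketch:

    # Hypothetical play instruction: start at a given moment, playing the media
    # collected from a given capture time onwards.
    play_instruction = {
        "start_at": 1636336800.0,      # time to start playing (speaker time base)
        "content_from": 1636336795.0,  # play the media collected from this time
    }

    def handle_play_instruction(clock, instruction, buffered, play):
        # Select the buffered media collected at or after the specified time.
        content = [c for c in buffered
                   if c["capture_time"] >= instruction["content_from"]]
        while clock.now() < instruction["start_at"]:
            pass  # wait until the designated start time
        play(content)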
S324, during playing, smart speaker B and electronic device B synchronize their playing progress.
For this step, refer to the related content above about adjusting the playing progress; other methods of synchronizing the playing progress may also be adopted.
It can be understood that after the communication connection is established, that is, after steps S314a and S315, smart speaker B also collects the audio of user B and uploads it to the third server; meanwhile, electronic device B collects the video of user B and uploads it to the third server. The specific process is similar to that on the user A side and is not repeated here.
It can also be understood that the methods of steps S301-S324 are applicable when either the calling party or the called party uses only one device (for example, an electronic device). Fig. 12 is a schematic diagram of a network in which user B conducts the video call using only electronic device B. In this case, the interaction and processing on the user A side (smart speaker A, electronic device A, the first server, and the second server) may adopt the methods of steps S301-S324. The first server needs to merge the audio collected by smart speaker A and the video collected by electronic device A, send the merged audio and video to the second server, and deliver it to user B through the mobile communication network. The first server also receives the audio and video sent by user B, splits it, and sends the split audio to smart speaker A and the split video to electronic device A respectively.
The interaction and processing on the user B side (electronic device B and the mobile communication network), as well as the interaction between the second server and the mobile communication network, may adopt prior-art mobile communication methods and are not described again.
Therefore, the video and the audio can be played on different devices, the advantages of the different devices are exerted, and the user experience is improved.
It is to be understood that the above-mentioned terminal and the like include hardware structures and/or software modules corresponding to the respective functions for realizing the above-mentioned functions. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is performed as hardware or computer software drives hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.
In the embodiments of the present application, the terminal and the like may be divided into functional modules according to the above method examples; for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of the modules in the embodiments of the present application is schematic and is only a logical function division; there may be other division manners in actual implementation. The following description takes the division of each functional module corresponding to each function as an example:
Fig. 13 is a schematic structural diagram of an apparatus according to the foregoing embodiments; the apparatus can implement the functions of the smart speaker in the methods provided in the embodiments of the present application. The apparatus may be a smart speaker, or an apparatus capable of supporting a smart speaker in implementing the functions of the smart speaker in the embodiments of the present application, for example, a chip system applied in a smart speaker. The apparatus includes: a processing unit 1301 and a communication unit 1302. The communication unit 1302 may be configured to support the smart speaker shown in fig. 8B in performing steps S101a, S101c, S102a, S104a, S102b, S103b, and S104b in the foregoing embodiments, or to support the smart speaker shown in fig. 9A in performing steps S201, S202, S204, and S208, or to support the smart speaker shown in fig. 10 in performing steps S301, S302, S304 to S308, S316a, S317, and S318a, or to support the smart speaker shown in fig. 11 in performing steps S309, S311, S312, S313, S314a, S314b, S321a, S322, and S324. The processing unit 1301 is configured to support the smart speaker shown in fig. 10 in performing step S318a in the foregoing embodiments, that is, stamping the collected audio with the time of the smart speaker, or to support the smart speaker shown in fig. 11 in performing step S323a. For all relevant content of the steps involved in the foregoing method embodiments, refer to the functional descriptions of the corresponding functional modules; details are not repeated here.
Optionally, in the embodiment of the present application, the chip system may be composed of a chip, and may also include a chip and other discrete devices.
Alternatively, the communication unit in the embodiments of the present application may be a circuit, a device, an interface, a bus, a software module, a transceiver, or any other device that can implement communication.
Alternatively, the communication unit 1302 may be a communication interface of the device or of a chip system applied in the device; for example, the communication interface may be a transceiver circuit. The processing unit 1301 may be a processor integrated in the device or in a chip system applied in the device.
Fig. 14 is a schematic diagram of a possible logical structure of the apparatus according to the foregoing embodiments; the apparatus can implement the functions of the smart speaker in the methods provided in the embodiments of the present application. The apparatus may be a smart speaker or a chip system applied in a smart speaker, and includes a processing module 1401 and a communication module 1403. The processing module 1401 is configured to control and manage the actions of the apparatus; for example, the processing module 1401 is configured to support the smart speaker shown in fig. 10 in performing step S318a in the foregoing embodiments, that is, stamping the collected audio with the time of the smart speaker, or to support the smart speaker shown in fig. 11 in performing step S323a. The communication module 1403 may be configured to support the smart speaker shown in fig. 8B in performing steps S101a, S101c, S102a, S104a, S102b, S103b, and S104b in the foregoing embodiments, or to support the smart speaker shown in fig. 9A in performing steps S201, S202, and S204 to S208, or to support the smart speaker shown in fig. 10 in performing steps S301, S302, S304 to S308, S316a, S317, and S318a, or to support the smart speaker shown in fig. 11 in performing steps S309, S311, S312, S313, S314a, S314b, S321a, S322, and S324, and/or to support other processes performed by the apparatus shown in fig. 14 for the techniques described herein. Optionally, the apparatus shown in fig. 14 may further include a storage module 1402 for storing program code and data of the apparatus.
The processing module 1401 may be a processor or controller, such as a central processing unit, a general purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure of the embodiments of the application. A processor may also be a combination of computing functions, e.g., a combination of one or more microprocessors, a digital signal processor and a microprocessor, or the like. The communication module 1403 may be a transceiver, a transceiver circuit or a communication interface, etc. The storage module 1402 may be a memory.
When the processing module 1401 is the processor 110, the communication module 1403 is the mobile communication module 150, the wireless communication module 160, or the USB interface 130, and the storage module 1402 is the internal memory 121, or an external memory connected to the external memory interface 120, the apparatus according to the embodiment of the present application may be the electronic device 100 shown in fig. 2.
Fig. 15 is a schematic structural diagram of an apparatus according to the foregoing embodiments; the apparatus can implement the functions of the electronic device in the methods provided in the embodiments of the present application. The apparatus may be an electronic device, or an apparatus capable of supporting an electronic device in implementing the functions of the electronic device in the embodiments of the present application, for example, a chip system applied in an electronic device. The apparatus includes: a processing unit 1501 and a communication unit 1502. The processing unit 1501 may be configured to support the electronic device shown in fig. 10 in performing step S318b in the foregoing embodiments, that is, stamping the collected video with the time of the smart speaker, or to support the electronic device shown in fig. 11 in performing step S323b. The communication unit 1502 is configured to support the electronic device shown in fig. 8B in performing steps S101b, S101c, S102a, S103a, S104a, S102b, and S104b in the foregoing embodiments, or to support the electronic device shown in fig. 9A in performing steps S205, S206, S207, and S208, or to support the electronic device shown in fig. 10 in performing steps S305, S307, S316b, S317, and S318b, or to support the electronic device shown in fig. 11 in performing steps S314b, S315, S321b, S322, and S324. For all relevant content of the steps involved in the foregoing method embodiments, refer to the functional descriptions of the corresponding functional modules; details are not repeated here.
Optionally, in the embodiment of the present application, the chip system may be composed of a chip, and may also include a chip and other discrete devices.
Alternatively, the communication unit in the embodiments of the present application may be a circuit, a device, an interface, a bus, a software module, a transceiver, or any other device that can implement communication.
Alternatively, the communication unit 1502 may be a communication interface of the electronic device or a system-on-chip applied to the electronic device, for example, the communication interface may be a transceiver circuit, and the processing unit 1501 may be a processor integrated on the electronic device or the system-on-chip applied to the electronic device.
Fig. 16 is a schematic diagram of a possible logical structure of the apparatus according to the foregoing embodiments; the apparatus can implement the functions of the electronic device in the methods provided in the embodiments of the present application. The apparatus may be an electronic device or a chip system applied in an electronic device, and includes a processing module 1601 and a communication module 1603. The processing module 1601 is configured to control and manage the actions of the apparatus; for example, the processing module 1601 may be configured to support the electronic device shown in fig. 10 in performing step S318b in the foregoing embodiments, that is, stamping the collected video with the time of the smart speaker, or to support the electronic device shown in fig. 11 in performing step S323b. The communication module 1603 is configured to support the electronic device shown in fig. 8B in performing steps S101b, S101c, S102a, S103a, S104a, S102b, and S104b in the foregoing embodiments, or to support the electronic device shown in fig. 9A in performing steps S205, S206, S207, and S208, or to support the electronic device shown in fig. 10 in performing steps S305, S307, S316b, S317, and S318b, or to support the electronic device shown in fig. 11 in performing steps S314b, S315, S321b, S322, and S324, and/or to support other processes performed by the apparatus shown in fig. 16 for the techniques described herein. Optionally, the apparatus shown in fig. 16 may further include a storage module 1602 for storing program code and data of the apparatus.
The processing module 1601 may be a processor or a controller, such as a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the disclosure of the embodiments of the application. The processor may also be a combination implementing computing functions, for example, a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor. The communication module 1603 may be a transceiver, a transceiver circuit, a communication interface, or the like. The storage module 1602 may be a memory.
When the processing module 1601 is the processor 110, the communication module 1603 is the mobile communication module 150, the wireless communication module 160, or the USB interface 130, and the storage module 1602 is the internal memory 121 or an external memory connected to the external memory interface 120, the apparatus according to the embodiment of the present application may be the electronic device 100 shown in fig. 2.
An embodiment of the present application further provides a communication system, including a first electronic device, a second electronic device with a display screen, and a third electronic device. The communication system may further include one or more servers.
Specifically, the first electronic device receives a voice input of a user; the first electronic device sends the voice input to the server; the server sends a first call instruction to the first electronic device according to the voice input, where the first call instruction carries a call identifier; in response to receiving the first call instruction, the first electronic device establishes a call connection with the third electronic device via the server; the first electronic device sends a second call instruction to the second electronic device, where the second call instruction carries the call identifier; in response to receiving the second call instruction, the second electronic device sends the call identifier to the server; in response to receiving the call identifier sent by the second electronic device, the server associates the second electronic device with the call connection; the first electronic device collects sound information to generate a first audio file and sends the first audio file to the server, where the first audio file includes a first collection time; the second electronic device collects image information to generate a first video file and sends the first video file to the server, where the first video file includes a second collection time; and the server sends the first audio file and the first video file to the third electronic device through the call connection.
Here, the first electronic device may be the smart speaker in the foregoing embodiments, for example, the apparatus described in fig. 13 and fig. 14. The second electronic device may be the electronic device in the foregoing embodiments, for example, the apparatus described in fig. 15 and fig. 16. The third electronic device may be the electronic device of the called party in the foregoing embodiments. The one or more servers may be one or more of a speaker cloud server, an intention recognition server, and an audio and video call server.
The methods provided in the embodiments of the present application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network appliance, a terminal, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or a data center integrating one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), a semiconductor medium (for example, an SSD), or the like.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
The foregoing descriptions are merely specific implementations of the present application, but the protection scope of the embodiments of the present application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the embodiments of the present application shall fall within the protection scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

Claims (27)

1. A method for a video call performed cooperatively by multiple electronic devices, applied to a system comprising a first electronic device, a second electronic device having a display screen, a third electronic device, and a server, the method comprising:
the first electronic device establishes a call connection with the third electronic device via the server;
the first electronic device associates the second electronic device to the call connection via the server;
the first electronic device collects sound information to generate first audio content, and sends the first audio content to the third electronic device via the server;
and the second electronic device collects image information to generate first video content, and sends the first video content to the third electronic device via the server.
2. The method of claim 1, wherein establishing, by the first electronic device, the call connection with the third electronic device via the server comprises:
the first electronic device receives a voice input of a user;
the first electronic device sends the voice input to the server;
the first electronic device receives a first call instruction sent by the server according to the voice input, wherein the first call instruction carries a call identifier;
in response to receiving the first call instruction, the first electronic device establishes a call connection with the third electronic device via the server.
3. The method of claim 1 or 2, wherein the first electronic device associating the second electronic device to the call connection via the server comprises:
the first electronic device sends a second call instruction to the second electronic device, wherein the second call instruction carries the call identifier of the call connection;
and in response to receiving the second call instruction, the second electronic device sends the call identifier to the server, wherein the call identifier is used by the server to associate the second electronic device with the call connection.
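For illustration, the message flow of claims 1-3 might look like the following minimal Python sketch. It assumes a hypothetical JSON-over-HTTP server API; the server URL, endpoint paths, field names, and device identifiers are all invented and are not part of the claimed method.

```python
import requests  # third-party HTTP client, used here only for brevity

SERVER = "https://call-server.example.com"  # hypothetical server URL

def establish_call(speaker_id: str, callee_number: str) -> str:
    """First electronic device: ask the server to set up a call
    connection with the third electronic device (the called party)."""
    resp = requests.post(f"{SERVER}/calls",
                         json={"caller": speaker_id, "callee": callee_number})
    resp.raise_for_status()
    return resp.json()["call_id"]  # the call identifier of the connection

def associate_video_device(call_id: str, phone_id: str) -> None:
    """Second electronic device: report the call identifier received from
    the speaker so the server can associate this device with the call."""
    resp = requests.post(f"{SERVER}/calls/{call_id}/participants",
                         json={"device": phone_id, "role": "video"})
    resp.raise_for_status()

# Usage: the speaker creates the call and forwards call_id to the phone
# (the second call instruction); the phone then registers as the video leg.
call_id = establish_call("speaker-01", "+86-000-0000-0000")
associate_video_device(call_id, "phone-01")
```

After this exchange, the server can route the speaker's audio stream and the phone's video stream to the third electronic device as one call.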
4. The method of any of claims 1-3, wherein the first audio content comprises a first capture time at which the first audio content is captured, and wherein the first video content comprises a second capture time at which the first video content is captured.
5. The method of claim 1, wherein the account with which the first electronic device logs into the server is the same as the account with which the second electronic device logs into the server.
6. The method of claim 3, wherein the first audio content comprises a first capture time at which the first audio content is captured, the first video content comprises a second capture time at which the first video content is captured, and the first capture time is a local time at which the first electronic device captures the first audio content; the second call instruction carries a local time of the first electronic device, and the method further comprises:
the second electronic device determines a difference between its own local time and the local time of the first electronic device according to the local time of the first electronic device and the current local time of the second electronic device, wherein the second capture time is obtained by adding the difference to, or subtracting the difference from, the local time of the second electronic device at the moment the second electronic device captures the first video content.
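Claim 6 amounts to a simple clock-offset correction. The sketch below, assuming the speaker's local time arrives in the second call instruction, shows one way the second electronic device could express its capture timestamps in the first electronic device's time base; the class and method names are illustrative only.

```python
import time

class CaptureClock:
    """A minimal sketch of the claim 6 clock alignment; all names are
    illustrative. Runs on the second electronic device."""

    def __init__(self, speaker_local_time: float):
        # Difference between this device's clock and the speaker's clock,
        # sampled when the second call instruction arrives carrying the
        # speaker's local time.
        self.offset = time.time() - speaker_local_time

    def second_capture_time(self) -> float:
        # Express the capture moment in the speaker's time base by
        # removing the clock difference from the local time.
        return time.time() - self.offset
```

With both capture times on one time base, the receiving side can align the first audio content and the first video content for synchronized playback.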
7. The method according to claim 2, wherein the first call instruction carries a telephone number corresponding to the third electronic device;
and the establishing, by the first electronic device, a call connection with the third electronic device via the server in response to receiving the first call instruction comprises:
in response to receiving the first call instruction, the first electronic device establishes the call connection with the third electronic device via the server according to the telephone number corresponding to the third electronic device.
8. The method of claim 2, wherein the first call instruction carries a voice play indication instructing the first electronic device to play a voice prompt.
9. The method of claim 3, wherein the second call instruction carries a display indication instructing the second electronic device to display a prompt message.
10. The method of any one of claims 1-9, further comprising:
the second electronic device receives second video content;
the second electronic device sends a notification message to the first electronic device, wherein the notification message indicates that the second electronic device has received the second video content;
the first electronic device sends a play instruction to the second electronic device, wherein the play instruction instructs the second electronic device to play the second video content;
and the first electronic device plays second audio content.
11. The method according to any one of claims 1-10, wherein the first electronic device is a smart speaker and the second electronic device is any one of a mobile phone, a computer with a screen, a television, and a tablet computer.
12. A method for audio and video playback performed cooperatively by multiple electronic devices, comprising:
the method comprises the steps that a first electronic device receives a voice request for playing first content;
the first electronic device acquires an address of the first content according to the voice request;
the first electronic device sends the address of the first content to a second electronic device that has a display screen and is associated with the first electronic device;
the first electronic device sends a first instruction to the second electronic device, wherein the first instruction instructs the second electronic device to display a picture of the first content from a first time, and the picture of the first content is acquired by the second electronic device according to the address of the first content;
and the first electronic device plays audio of the first content from the first time, wherein the audio of the first content is acquired by the first electronic device according to the address of the first content.
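A sketch of the speaker side of claim 12, assuming a line-delimited JSON command channel to the display device and a placeholder local audio player; none of these function names or message fields come from the patent.

```python
import json
import socket

def play_audio(url: str, seek_to: float) -> None:
    # Stand-in for the speaker's local audio decoder and output pipeline.
    print(f"playing audio of {url} from {seek_to:.1f}s")

def start_synced_playback(content_url: str,
                          display_addr: tuple,
                          start_time: float) -> None:
    """Speaker side of claim 12: send the first instruction to the display
    device, then play the audio locally from the same first time."""
    with socket.create_connection(display_addr) as sock:
        command = {
            "cmd": "display",     # tell the second device to show the picture
            "url": content_url,   # the address of the first content
            "from": start_time,   # display the picture from this moment on
        }
        sock.sendall(json.dumps(command).encode() + b"\n")
    # The first electronic device plays the audio from the same moment.
    play_audio(content_url, seek_to=start_time)
```

The key design point is that both devices fetch the first content independently from its address and only the start time and later progress corrections travel between them.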
13. The method of claim 12, further comprising:
if a distance between the first electronic device and the second electronic device is less than a first threshold, automatically associating the first electronic device with the second electronic device.
14. The method of claim 13, wherein the automatically associating the first electronic device with the second electronic device if the distance between the first electronic device and the second electronic device is less than the first threshold comprises:
if the distance between the first electronic device and the second electronic device is less than the first threshold, and the account with which the first electronic device logs into the server is the same as the account with which the second electronic device logs into the server, automatically associating the first electronic device with the second electronic device.
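Claims 13 and 14 reduce to a two-part predicate. A sketch, with the threshold value and account representation chosen arbitrarily for illustration:

```python
FIRST_THRESHOLD_M = 1.0  # assumed value; the claims leave the threshold open

def should_auto_associate(distance_m: float,
                          speaker_account: str,
                          phone_account: str) -> bool:
    """Associate automatically only when the devices are closer than the
    first threshold and logged into the server with the same account."""
    return distance_m < FIRST_THRESHOLD_M and speaker_account == phone_account
```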
15. The method of claim 13 or 14, wherein after the first electronic device and the second electronic device are automatically associated, the method further comprises:
the second electronic device automatically opens a first application and displays the first application in full screen.
16. The method of any of claims 12-15, wherein after the first electronic device and the second electronic device are automatically associated, the method further comprises:
the second electronic device automatically plays the picture of the first content in full screen.
17. The method according to any one of claims 12-16, wherein during the playing of the audio of the first content by the first electronic device and the playing of the picture of the first content by the second electronic device, the method further comprises:
the second electronic device sends a first progress to the first electronic device, wherein the first progress is a progress of the second electronic device in playing the picture of the first content;
and the first electronic device sends a progress adjustment instruction to the second electronic device according to the first progress.
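The feedback loop of claim 17 can be sketched as follows; the drift tolerance and the instruction format are assumptions, not taken from the patent.

```python
from typing import Optional

MAX_DRIFT_S = 0.1  # assumed tolerance before a correction is issued

def progress_adjustment(audio_progress_s: float,
                        video_progress_s: float) -> Optional[dict]:
    """Speaker side of claim 17: compare the first progress reported by
    the display device with the local audio progress and return a progress
    adjustment instruction when they drift apart, or None when in sync."""
    if abs(audio_progress_s - video_progress_s) <= MAX_DRIFT_S:
        return None
    return {"cmd": "adjust_progress", "seek_to": audio_progress_s}
```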
18. The method according to any one of claims 12-17, wherein the first electronic device is a smart speaker and the second electronic device is any one of a mobile phone, a computer with a screen, a television, and a tablet computer.
19. A method of multi-electronic device collaboration, comprising:
the first electronic device receives a request to play first content;
the first electronic device acquires first information and second information according to the request to play the first content;
the first electronic device sends the first information to a second electronic device that has a display screen and is associated with the first electronic device;
the first electronic device sends a first instruction to the second electronic device, wherein the first instruction instructs the second electronic device to display an image of the first content, and the image of the first content is acquired by the second electronic device according to the first information;
and the first electronic device plays audio of the first content, wherein the audio of the first content is acquired by the first electronic device according to the second information.
20. A communication system, comprising: a first electronic device and a second electronic device having a display screen;
the first electronic device is configured to establish a call connection with a third electronic device via a server, and to associate the second electronic device to the call connection via the server;
the first electronic device is further configured to collect sound information to generate first audio content, and to send the first audio content to the third electronic device via the server;
and the second electronic device is configured to collect image information to generate first video content, and to send the first video content to the third electronic device via the server.
21. The communication system of claim 20, further comprising the third electronic device configured to receive and play the first audio content and the first video content.
22. The communication system according to claim 20 or 21, further comprising the server configured to establish a call connection between the first electronic device and the third electronic device, associate the second electronic device to the call connection, receive first audio content from the first electronic device, receive first video content from the second electronic device, and transmit the first audio content and the first video content to the third electronic device.
23. A first electronic device, comprising: a processor and a memory coupled with the processor, the memory for storing computer program code, the computer program code comprising computer instructions that, when executed by the first electronic device, cause the first electronic device to perform operations comprising:
establishing a call connection with a third electronic device via a server;
associating, via the server, a second electronic device to the call connection, so that the second electronic device sends captured first video content to the third electronic device via the server;
and collecting sound information to generate first audio content, and sending the first audio content to the third electronic device via the server.
24. The first electronic device of claim 23, wherein the first electronic device is specifically configured to:
receiving a voice input of a user;
sending the voice input to the server;
receiving a first call instruction sent by the server according to the voice input, wherein the first call instruction carries a call identifier;
establishing a call connection with the third electronic device via the server in response to receiving the first call instruction;
and sending a second call instruction to the second electronic device, wherein the second call instruction carries a call identifier of the call connection, so that the second electronic device sends the call identifier to the server, and the call identifier is used for the server to associate the second electronic device with the call connection.
25. A first electronic device, comprising: a processor and a memory coupled with the processor, the memory for storing computer program code, the computer program code comprising computer instructions that, when executed by the first electronic device, cause the first electronic device to perform operations comprising:
receiving a voice request for playing first content;
acquiring an address of the first content according to the voice request;
sending the address of the first content to a second electronic device that has a display screen and is associated with the first electronic device;
sending a first instruction to the second electronic device, wherein the first instruction instructs the second electronic device to display a picture of the first content from a first time, and the picture of the first content is acquired by the second electronic device according to the address of the first content;
and playing audio of the first content from the first time, wherein the audio of the first content is acquired by the first electronic device according to the address of the first content.
26. A first electronic device, comprising: a processor and a memory coupled with the processor, the memory for storing computer program code, the computer program code comprising computer instructions that, when executed by the first electronic device, cause the first electronic device to perform operations comprising:
receiving a request to play first content;
acquiring first information and second information according to the request to play the first content;
sending the first information to a second electronic device that has a display screen and is associated with the first electronic device;
sending a first instruction to the second electronic device, wherein the first instruction instructs the second electronic device to display an image of the first content, and the image of the first content is acquired by the second electronic device according to the first information;
and playing audio of the first content, wherein the audio of the first content is acquired by the first electronic device according to the second information.
27. A computer storage medium, comprising computer instructions which, when run on a terminal, cause the terminal to perform the method for audio and video playback performed cooperatively by multiple electronic devices according to any one of claims 12-18, or to perform the method of multi-electronic device collaboration according to claim 19.
CN202111309784.2A 2019-02-27 2019-02-27 Method for cooperation of intelligent sound box and electronic equipment Pending CN114125354A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111309784.2A CN114125354A (en) 2019-02-27 2019-02-27 Method for cooperation of intelligent sound box and electronic equipment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910147336.3A CN111628916B (en) 2019-02-27 2019-02-27 Method for cooperation of intelligent sound box and electronic equipment
CN202111309784.2A CN114125354A (en) 2019-02-27 2019-02-27 Method for cooperation of intelligent sound box and electronic equipment

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201910147336.3A Division CN111628916B (en) 2019-02-27 2019-02-27 Method for cooperation of intelligent sound box and electronic equipment

Publications (1)

Publication Number Publication Date
CN114125354A (en) 2022-03-01

Family

ID=72271608

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201910147336.3A Active CN111628916B (en) 2019-02-27 2019-02-27 Method for cooperation of intelligent sound box and electronic equipment
CN202111309784.2A Pending CN114125354A (en) 2019-02-27 2019-02-27 Method for cooperation of intelligent sound box and electronic equipment

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201910147336.3A Active CN111628916B (en) 2019-02-27 2019-02-27 Method for cooperation of intelligent sound box and electronic equipment

Country Status (1)

Country Link
CN (2) CN111628916B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114845078B * 2020-12-01 2023-04-11 Huawei Technologies Co Ltd Call method and electronic device
CN114911603A * 2021-02-09 2022-08-16 Huawei Technologies Co Ltd Distributed device capability virtualization method, medium, and electronic device
CN115314327B * 2021-05-07 2024-02-06 Hisense Group Holding Co Ltd Electronic device, smart device, and smart device control method
CN115460233A * 2021-05-20 2022-12-09 Huawei Technologies Co Ltd Application-based device connection establishment method and related apparatus
CN115550090A * 2021-06-30 2022-12-30 Huawei Technologies Co Ltd Device cooperation method and electronic device
CN115733919A * 2021-08-23 2023-03-03 Beijing Xiaomi Mobile Software Co Ltd Information processing method, apparatus, and storage medium
CN115379283A * 2022-08-16 2022-11-22 Vivo Mobile Communication Co Ltd Video call method and device
CN116048744B * 2022-08-19 2023-09-12 Honor Device Co Ltd Image acquisition method and related electronic device

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104464767B * 2013-09-17 2018-03-20 Huizhou Ultrasonic Audio Co Ltd Method and audio playback system for synchronized audio playback across multiple playback devices
US9741244B2 * 2014-05-30 2017-08-22 Qualcomm Incorporated Methods, smart objects, and systems for naming and interacting with smart objects
CN104601685B * 2014-12-31 2019-03-12 Guangdong Oppo Mobile Telecommunications Corp Ltd Music playing method and apparatus for a smart speaker
CN104902316B * 2015-05-14 2017-09-01 Guangdong Oppo Mobile Telecommunications Corp Ltd Method, apparatus, smart speaker, and mobile terminal for synchronizing playback time
CN106911950A * 2015-12-23 2017-06-30 Fujian Star Net eVideo Information System Co Ltd Video synchronization method and system
CN105847926A * 2016-03-31 2016-08-10 Leshi Holding (Beijing) Co Ltd Multimedia data synchronous playing method and device
CN106507202B * 2016-11-11 2019-12-17 Chuanxian Network Technology (Shanghai) Co Ltd Play control method and device
CN106851470A * 2017-03-02 2017-06-13 Shenzhen Qiteng Audio Visual Co Ltd Multimedia smart speaker for TV or projector applications
US10778463B2 * 2017-05-30 2020-09-15 Harman International Industries, Incorporated Displaying information for a smart-device-enabled environment
CN107360475B * 2017-08-18 2020-06-09 Suzhou Kekeruier Aviation Technology Co Ltd Video synchronized playback and control method, control terminal, and playback terminal
CN107770570A * 2017-09-13 2018-03-06 Shenzhen Tinno Wireless Technology Co Ltd Audio and video synchronized playback method, system, and computer readable storage medium
CN107835444B * 2017-11-16 2019-04-23 Baidu Online Network Technology (Beijing) Co Ltd Information interaction method, apparatus, audio terminal, and computer readable storage medium
CN107966910B * 2017-11-30 2021-08-03 Shenzhen TCL New Technology Co Ltd Voice processing method, smart speaker, and readable storage medium
CN108196465A * 2018-03-07 2018-06-22 Foshan Viomi Electrical Technology Co Ltd Smart speaker controlled by voice instructions and control method thereof
CN108366319A * 2018-03-30 2018-08-03 BOE Technology Group Co Ltd Smart speaker and voice control method thereof
CN108541080B * 2018-04-23 2020-09-22 Guangdong Oppo Mobile Telecommunications Corp Ltd Method for establishing a connection between a first electronic device and a second electronic device, and related product
CN109195009B * 2018-10-22 2021-06-22 Qiku Internet Network Technology (Shenzhen) Co Ltd Audio and video playback method and system, smart speaker, and storage device

Also Published As

Publication number Publication date
CN111628916B (en) 2021-11-09
CN111628916A (en) 2020-09-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination