CN114844873A

CN114844873A - Real-time processing system for audio-visual stream of Internet of things equipment based on artificial intelligence

Info

Publication number: CN114844873A
Application number: CN202210375466.4A
Authority: CN
Inventors: 吉约姆·龙卡里; 索蒂里奥斯·斯塔西诺普洛斯·索毅; 安德烈·翁古雷努·安德烈
Original assignee: Shenma Artificial Intelligence Technology Shenzhen Co ltd
Current assignee: Shenma Artificial Intelligence Technology Shenzhen Co ltd
Priority date: 2022-04-11
Filing date: 2022-04-11
Publication date: 2022-08-02

Abstract

The invention belongs to the field of information technology and provides an artificial-intelligence-based system for real-time processing of audio-visual streams from Internet of Things equipment, comprising: a collection module for collecting the audio-visual stream and transmitting it to the user terminal equipment via a real-time streaming protocol; user terminal equipment for running a WEBRTC graphical client, the user terminal equipment accessing the collection module and the AI cloud server over network connections; and an AI cloud server for performing inference on a CPU or GPU and on compute-optimized chips that support machine learning frameworks. By processing the audio-visual stream through the collection module, the user terminal equipment and the AI cloud server, the invention provides better service to users, solves the problem of high subsequent cost caused by the limited processing capacity of the data acquisition equipment, and avoids delay in transmitting the audio-visual stream to the user terminal equipment.

Description

Real-time processing system for audio-visual stream of Internet of things equipment based on artificial intelligence

Technical Field

The invention belongs to the technical field of information, and particularly relates to an audio-visual stream real-time processing system of Internet of things equipment based on artificial intelligence.

Background

Streaming is not a new concept, and many solutions have been created for transmitting video and audio over wireless connections using different transmission protocols. Some of these protocols, although developed later, can be used to transmit audio-visual streams from robots and connected/Internet of Things devices. However, solving the connection problems of these devices while maintaining a constant transmission rate is more difficult, and the limited computing power of Internet of Things devices may prevent fast encoding of the streams. With recent advances in artificial intelligence making it possible to process audio-visual data and receive AI results in real time, the added AI process introduces another level of complexity to the overall streaming process and requires different approaches.

Traditionally, the successful methods of applying AI to audio and video streams involve a camera/microphone connected to a static processing system. The streams are processed locally immediately after data acquisition, using the processing resources of that system, such as a separate computer or a local server connected to a local network. The results are combined with the original streams, either by applying them visually or by modifying the audio, before streaming over the Internet, after which the end user sees the modified stream on their own device. For example, someone streaming from their computer using an online conferencing application such as Zoom records video and audio using a camera and microphone connected to the computer. The application then uses AI to detect people in the video stream, removes the background and adds a virtual background, and can even process the sound locally to remove background noise, before streaming the processed video and audio to another computer or smartphone that uses the Zoom app for viewing. This process is common and works well in systems where the AI model can run locally on a processing unit with sufficient processing power. The delay depends on the processing power of the local system: the lower the processing power, the greater the delay, so reducing the delay may require a more expensive local processing system. Low-cost mobile robots and other Internet of Things devices usually do not have high processing power, so this approach may make the AI process very slow and add a large delay to the overall streaming experience. In some cases the device does not have enough processing power to complete the AI process at all, and thus the conventional methods cannot be applied to robots or Internet of Things (IOT) devices.

In addition to the above, other methods have the data collection device transmit the audio-visual stream to a cloud server and use the computing power of the cloud server to complete the AI process. After the AI process is completed, the results are merged with the original stream and the final stream is sent to the end user's device for display. This avoids the complexity of performing the entire AI process and the streaming on the original local system, and the AI process on the cloud server can indeed run very quickly, but cloud processing also carries a significant cost.

Disclosure of Invention

The embodiment of the invention aims to provide an audio-visual stream real-time processing system of Internet of things equipment based on artificial intelligence, and aims to solve the problems in the background technology.

In order to achieve the purpose, the invention provides the following technical scheme:

An artificial-intelligence-based system for real-time processing of audio-visual streams from Internet of Things equipment, comprising:

a collection module for collecting the audio-visual stream and transmitting it to the user terminal equipment via a real-time streaming protocol;

user terminal equipment for running a WEBRTC graphical client, the user terminal equipment accessing the collection module and the AI (Artificial Intelligence) cloud server over network connections;

and an AI cloud server for performing inference on a CPU or GPU and on compute-optimized chips that support machine learning frameworks.

Further, the collection module is an IOT device or a robot.

Further, the system of the user terminal equipment is used for processing the audio-visual stream in real time and isolating video frame sequences and audio clips.

Further, a WEBSOCKET connection is established between the user terminal equipment and the AI cloud server.

Further, the user terminal device extracts data blocks from the stream and sends them to the AI cloud server for computation via WEBSOCKET, and the AI cloud server sends the inference results back to the user terminal device for display via WEBSOCKET.

Further, the real-time processing of the audio-visual stream comprises the following specific steps:

1) a P2P connection between the IOT device and the user terminal device is initiated;

2) the real-time stream is received on the user terminal equipment and is directly displayed to the user;

3) the user terminal equipment extracts the data block from the real-time stream and sends the data block to the AI cloud server;

4) the AI cloud server processes the data block and sends an inference result back to the user terminal equipment;

5) the user terminal device processes the inference result and displays the output overlaid on top of the stream.
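The five steps above can be sketched as a small simulation. This is only an illustrative outline, not the implementation described by the invention: the names `Frame`, `UserTerminal`, `fake_cloud_inference`, `run_session` and the `sample_every` parameter are hypothetical, no real networking is performed, and the inference call is made synchronously for brevity, whereas a real client would issue it asynchronously so that display is never held up.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Frame:
    """One data block extracted from the live stream (hypothetical shape)."""
    index: int
    pixels: bytes


@dataclass
class UserTerminal:
    """Stand-in for the user terminal device; `infer` plays the AI cloud server."""
    infer: Callable[[Frame], str]
    displayed: List[int] = field(default_factory=list)  # step 2: frames shown on arrival
    overlays: List[str] = field(default_factory=list)   # step 5: AI output layered on top

    def on_frame(self, frame: Frame, sample_every: int = 3) -> None:
        # Step 2: show the frame first, before any AI work is attempted.
        self.displayed.append(frame.index)
        # Step 3: extract only some frames as data blocks for the server.
        if frame.index % sample_every == 0:
            # Step 4: the server processes the block and returns a result.
            result = self.infer(frame)
            # Step 5: record the result as an overlay on the stream.
            self.overlays.append(f"frame {frame.index}: {result}")


def fake_cloud_inference(frame: Frame) -> str:
    """Placeholder model: reports the payload size instead of running a network."""
    return f"{len(frame.pixels)} bytes analysed"


def run_session(num_frames: int) -> UserTerminal:
    # Step 1: the P2P connection is assumed to be established already;
    # this loop stands in for frames arriving over it.
    terminal = UserTerminal(infer=fake_cloud_inference)
    for i in range(num_frames):
        terminal.on_frame(Frame(index=i, pixels=bytes(4)))
    return terminal
```

In this sketch every frame is displayed, while only every third frame is sent for inference, which mirrors the separation between the display path and the AI path in the steps above.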

Further, the specific steps of interaction among the IOT device, the user terminal device and the AI cloud server are as follows:

a. WEBRTC handshake: the IOT equipment and the user terminal equipment discover each other through a third-party server and exchange the WEBRTC handshake; if the handshake succeeds, the user terminal equipment establishes a WEBSOCKET connection with the AI cloud server;

b. video stream establishment: the IOT equipment and the user terminal equipment exchange video streams, and the video is displayed on a layer of the user terminal equipment;

c. frame extraction and calculation: the user terminal equipment extracts frames from the video stream and sends them to the AI cloud server for calculation via WEBSOCKET;

d. AI information display: after the calculation is finished, the AI cloud server sends the information back to the user terminal equipment via WEBSOCKET for display.
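One way to read the interaction steps above is as an ordered session life cycle. The following sketch encodes that ordering as a small state machine; the phase names, event names and the `Session` class are assumptions made for illustration and do not come from the patent text.

```python
from enum import Enum, auto


class Phase(Enum):
    IDLE = auto()
    HANDSHAKE_DONE = auto()    # step a: WEBRTC handshake succeeded
    WEBSOCKET_OPEN = auto()    # step a: WEBSOCKET link to the AI cloud server is open
    STREAMING = auto()         # step b: video is shown on the video layer
    AWAITING_RESULT = auto()   # step c: a frame was sent for calculation


# Allowed transitions: event name -> (required phase, next phase)
TRANSITIONS = {
    "handshake_ok": (Phase.IDLE, Phase.HANDSHAKE_DONE),
    "websocket_open": (Phase.HANDSHAKE_DONE, Phase.WEBSOCKET_OPEN),
    "stream_started": (Phase.WEBSOCKET_OPEN, Phase.STREAMING),
    "frame_sent": (Phase.STREAMING, Phase.AWAITING_RESULT),
    # step d: the AI information is displayed and streaming continues
    "result_shown": (Phase.AWAITING_RESULT, Phase.STREAMING),
}


class Session:
    """Tracks which interaction step the user terminal is currently in."""

    def __init__(self) -> None:
        self.phase = Phase.IDLE

    def fire(self, event: str) -> Phase:
        required, nxt = TRANSITIONS[event]
        if self.phase is not required:
            raise RuntimeError(f"{event!r} not allowed in phase {self.phase.name}")
        self.phase = nxt
        return self.phase
```

Firing the events in the order a, b, c, d returns the session to `STREAMING`, ready for the next frame, while firing `frame_sent` before the handshake raises an error.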

Compared with the prior art, the invention has the beneficial effects that:

according to the real-time processing system for the audio-visual stream of the Internet of things equipment based on artificial intelligence, the audio-visual stream is processed through the collection module, the user terminal equipment and the AI cloud server, so that better service is provided for a user, the problem of high cost caused by the processing capacity of the data acquisition equipment is solved, and meanwhile, the delay of transmitting the audio-visual stream to the user terminal equipment is avoided.

Drawings

Fig. 1 is a schematic structural diagram of an audiovisual stream real-time processing system of an internet of things device based on artificial intelligence.

Fig. 2 is a schematic diagram of a streaming layer and an AI display layer on a user terminal device in an audio-visual stream real-time processing system of an internet-of-things device based on artificial intelligence.

In the figure: 01-AI cloud server, 02-user terminal equipment, 03-IOT equipment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Specific implementations of the present invention are described in detail below with reference to specific embodiments.

As shown in fig. 1 and fig. 2, an audiovisual stream real-time processing system of an internet of things device based on artificial intelligence according to an embodiment of the present invention includes:

a collection module for collecting the audio-visual stream, the collection module transmitting the audio-visual stream to the user terminal device 02 via a real-time streaming protocol;

user terminal device 02 for running a WEBRTC graphical client, the user terminal device 02 accessing the collection module and the AI cloud server 01 over network connections;

and an AI cloud server 01 for performing inference on a CPU or GPU and on compute-optimized chips that support machine learning frameworks.

In the embodiment of the present invention, preferably, the system of the user terminal device 02 either processes the separated data stream with AI using local resources, or transmits the separated data to the cloud (the AI cloud server 01), which processes it depending on network performance. The processed result is sent back to the user terminal device 02, which combines it with the original stream and reproduces the final stream for the user to watch and listen to.

As shown in fig. 1, as a preferred embodiment of the present invention, the collection module is an IOT device 03 or a robot.

In an embodiment of the invention, preferably, the audiovisual stream is collected by a robot or an internet of things device and transmitted directly to the end-user device using a real-time streaming protocol.

As shown in fig. 1, as a preferred embodiment of the present invention, the system of the user terminal device 02 is used for processing the audio-visual stream in real time and isolating video frame sequences and audio clips.

In the embodiment of the present invention, preferably, the system of the user terminal device 02 is responsible for processing the stream and isolating the sequence of video frames and the audio clips. The system then either processes the isolated streams with AI using the local resources of the user terminal device 02, or transmits some of the isolated data to the cloud, which processes it depending on network performance and sends the results back to the local device. The local device combines the results with the original stream and reproduces the final stream for the end user to listen to and watch.
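The text does not say how the choice between local and cloud processing is made. Purely as an assumed example policy, the following sketch picks whichever path is expected to return a result sooner; the function name `choose_backend` and its three timing parameters are hypothetical.

```python
def choose_backend(local_infer_s: float, cloud_infer_s: float, network_rtt_s: float) -> str:
    """Pick the path with the lower expected time to a result (assumed policy).

    local_infer_s  -- estimated inference time on the user terminal device
    cloud_infer_s  -- estimated inference time on the AI cloud server
    network_rtt_s  -- estimated round trip for sending a block and getting a reply
    """
    cloud_total_s = network_rtt_s + cloud_infer_s
    # Prefer local processing on a tie, since it uses no network bandwidth.
    return "cloud" if cloud_total_s < local_infer_s else "local"
```

Under this policy a slow terminal with a good connection offloads its data to the cloud, while a capable terminal or a poor connection keeps processing local.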

As shown in fig. 1, as a preferred embodiment of the present invention, a WEBSOCKET connection is established between the user terminal device 02 and the AI cloud server 01.

In the embodiment of the present invention, preferably, WEBSOCKET is a protocol for full-duplex communication over a single TCP connection.
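As background on how such a connection is opened, the WebSocket protocol (RFC 6455) starts as an HTTP upgrade request on the TCP connection, and the server proves it understood the request by returning a `Sec-WebSocket-Accept` value derived from the client's `Sec-WebSocket-Key`. The sketch below shows only that derivation; the function name `websocket_accept` is chosen here for illustration, and the patent itself does not describe this level of detail.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header value for a client key."""
    # Concatenate the client's key with the GUID, hash with SHA-1,
    # then base64-encode the raw 20-byte digest.
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

With the sample key from RFC 6455, `websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")` returns `"s3pPLMBiTxaQ9kYGzzhZRbK+xOo="`. Once this exchange completes, both sides can send messages at any time over the same TCP connection.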

As shown in fig. 1, as a preferred embodiment of the present invention, the user terminal device 02 extracts data blocks from the stream and sends them to the AI cloud server 01 for computation via WEBSOCKET, and the AI cloud server 01 sends the inference results back to the user terminal device 02 via WEBSOCKET for display.

In the embodiment of the present invention, preferably, the user terminal device 02 supports WEBSOCKET.

As shown in fig. 1, as a preferred embodiment of the present invention, the real-time processing of the audiovisual stream comprises the following specific steps:

1) a P2P connection between the IOT device 03 and the user terminal device 02 is initiated;

2) the real-time stream is received at the user terminal device 02 and directly displayed to the user;

3) the user terminal device 02 extracts a data block from the real-time stream and sends the data block to the AI cloud server 01;

4) the AI cloud server 01 processes the data block and sends an inference result back to the user terminal device 02;

5) the user terminal device 02 processes the inference result and displays the output overlaid on top of the stream.

In the embodiment of the present invention, preferably, P2P is the abbreviation of Peer-to-Peer. In a P2P network all nodes have equal status, and each node acts as both a server and a client, which relieves the pressure on a central server and makes resource and task processing more decentralized.

As shown in fig. 2, as a preferred embodiment of the present invention, the specific steps of interaction between the IOT device 03, the user terminal device 02 and the AI cloud server 01 are as follows:

In the embodiment of the present invention, preferably, the user terminal device 02 extracts a frame from the stream and sends it to the AI cloud server 01 through the WEBSOCKET connection for computation. The video stream continues to be displayed during this time, and a new extraction is performed only after the reply from the AI cloud server 01 has been received. After the calculation is completed, the AI cloud server 01 sends the information back to the user terminal device 02 via WEBSOCKET for display; since the video is already displayed on one layer, another layer is added on top of it to display the AI information. Under normal circumstances, the delay of the WEBRTC stream is about 0.2 seconds, the request to the AI cloud server 01 takes about 0.2 seconds, and the AI inference takes 0.1 seconds.
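The timing figures above can be turned into a rough latency budget. The sketch below assumes that the 0.2 second request figure covers the full round trip to the AI cloud server and that the three delays simply add up; both are interpretations, not statements from the text. The names `LatencyBudget` and `frames_sent` are hypothetical, and times are kept in integer milliseconds to avoid floating-point rounding.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class LatencyBudget:
    stream_delay_ms: int = 200   # WEBRTC stream delay quoted in the text
    request_ms: int = 200        # request to the AI cloud server (assumed round trip)
    inference_ms: int = 100      # AI inference time quoted in the text

    @property
    def ai_cycle_ms(self) -> int:
        """Time from extracting a frame to receiving its result."""
        return self.request_ms + self.inference_ms

    @property
    def max_ai_updates_per_s(self) -> float:
        """Upper bound when a frame is extracted only after the previous reply."""
        return 1000.0 / self.ai_cycle_ms

    @property
    def overlay_age_ms(self) -> int:
        """Age of the scene an overlay describes, measured from capture."""
        return self.stream_delay_ms + self.ai_cycle_ms


def frames_sent(frame_times_ms: Iterable[int], budget: LatencyBudget) -> List[int]:
    """Timestamps of frames that get sent when only one request is in flight."""
    sent: List[int] = []
    busy_until = 0
    for t in frame_times_ms:
        if t >= busy_until:  # previous reply has arrived, so extract a new frame
            sent.append(t)
            busy_until = t + budget.ai_cycle_ms
    return sent
```

Under these assumptions each AI cycle takes 300 ms, so the overlay can refresh at most about 3.3 times per second, and an overlay describes a scene captured roughly 500 ms earlier. For frames arriving every 100 ms, `frames_sent(range(0, 800, 100), LatencyBudget())` selects the frames at 0, 300 and 600 ms.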

The working principle of the invention is as follows:

according to the real-time processing system for the audio-visual stream of the Internet of things equipment based on artificial intelligence, the audio-visual stream is processed through the collection module, the user terminal equipment 02 and the AI cloud server 01, so that better service is provided for a user, the problem of high cost caused by the processing capacity of data acquisition equipment is solved, and meanwhile, the delay of transmitting the audio-visual stream to the user terminal equipment 02 is avoided.

The above is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the concept of the present invention; these should also be regarded as falling within the protection scope of the present invention, and they will not affect the effect of implementing the present invention or the practicability of the patent.

Claims

1. An artificial-intelligence-based system for real-time processing of audio-visual streams from Internet of Things equipment, characterized in that it comprises:

a collection module for collecting the audio-visual stream and transmitting it to the user terminal equipment via a real-time streaming protocol;

user terminal equipment for running a WEBRTC graphical client, the user terminal equipment accessing the collection module and the AI cloud server over network connections;

and an AI cloud server for performing inference on a CPU or GPU and on compute-optimized chips that support machine learning frameworks.

2. The real-time processing system for the audio-visual stream of the IOT equipment based on artificial intelligence of claim 1, wherein the collection module is an IOT equipment or a robot.

3. The artificial intelligence based internet of things device audiovisual stream real-time processing system of claim 1, wherein the system of the user terminal device is configured to process the audiovisual stream in real time and to isolate video frame sequences and audio clips.

4. The real-time processing system for audiovisual stream of an internet of things device based on artificial intelligence of claim 3, wherein a WEBSOCKET connection is established between the user terminal device and the AI cloud server.

5. The system of claim 4, wherein the user terminal device extracts the data blocks from the stream and sends them to the AI cloud server for computation through WEBSOCKET, and the AI cloud server sends the inference results back to the user terminal device for display through WEBSOCKET.

6. The real-time processing system for the audio-visual stream of the internet-of-things equipment based on artificial intelligence of any one of claims 1 to 5, wherein the real-time processing method for the audio-visual stream comprises the following specific steps:

7. The real-time processing system for the audio-visual stream of the internet of things equipment based on artificial intelligence of any one of claims 1 to 5, wherein the specific steps of interaction among the IOT equipment, the user terminal equipment and the AI cloud server are as follows: