CN114077747A - Media information transmission method and device - Google Patents

Media information transmission method and device

Info

Publication number
CN114077747A
CN114077747A (application CN202011300677.9A)
Authority
CN
China
Prior art keywords
message
media information
data
transmission interface
application
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011300677.9A
Other languages
Chinese (zh)
Inventor
提纯利
何小祥
韦家毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to PCT/CN2021/110644 (published as WO2022033377A1)
Priority to EP21855430.1A (published as EP4187421A4)
Publication of CN114077747A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 Protecting data
    • G06F 21/606 Protecting data by securing the transmission between two devices or processes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/38 Information transfer, e.g. on bus
    • G06F 13/42 Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F 13/4204 Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
    • G06F 13/4221 Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being an input/output bus, e.g. ISA bus, EISA bus, PCI bus, SCSI bus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/70 Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer
    • G06F 21/82 Protecting input, output or interconnection devices
    • G06F 21/85 Protecting input, output or interconnection devices interconnection devices, e.g. bus-connected or in-line devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Bioethics (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The application provides a media information transmission method and device. When the method is applied to a first device that includes a first transmission interface, the method includes: collecting first media information; performing feature extraction on the first media information to determine first feature data of the first media information; and sending the first feature data to a second device through the first transmission interface, where the first feature data is used by the second device to obtain a result of a first application. This reduces the transmission overhead and the encoding/decoding overhead of media information transmission and improves the transmission effect.

Description

Media information transmission method and device
Cross Reference to Related Applications
The present application claims priority to Chinese Patent Application No. 202010819860.3, filed on August 14, 2020 and entitled "Distributed AI Oriented Transport Interface System", the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of this application relate to the field of media technologies, and in particular, to a media information transmission method and device.
Background
In machine-algorithm application scenarios such as machine vision, voice interaction, face recognition, autonomous driving, and environment modeling, an image acquisition device captures original video or image resources, compresses them into a video stream, and transmits the stream to the server or device that implements the machine-algorithm application. That device decodes or decompresses the received video stream to recover standard audio and video signals, and then processes those signals with machine algorithms such as deep learning to obtain a processing result of the artificial intelligence (AI) application.
This process occupies a large amount of bandwidth, and the device running the machine algorithm must also perform extensive decoding or decompression, which wastes both computing resources and transmission bandwidth resources.
Disclosure of Invention
This application provides a media information transmission method and device, to solve prior-art problems such as a media data transmission process that occupies large bandwidth resources and is unfavorable to subsequent use of the media resources.
In a first aspect, this application provides a media information transmission method applied to a first device, where the first device includes a first transmission interface. The method includes: collecting first media information; performing feature extraction on the first media information to determine first feature data of the first media information; and sending the first feature data to a second device through the first transmission interface, where the first feature data is used by the second device to obtain a result of a first application.
The first application may be an AI application, and the method may be executed by the first device or by a module or component with the corresponding functions, such as a chip. Taking the first device as the executor, the first device may be a device capable of both collecting media information and extracting features. The first device may include a first transmission interface that supports transmitting feature data, so that media information is conveyed to the second device in the form of feature data. This avoids the prior-art requirement to compression-encode the media information at the sending end and recover it at the receiving end, a complex encoding and decoding process; it reduces computing-resource overhead and overall system delay, which benefits AI applications with high real-time requirements and improves the user experience of those applications. In addition, because abstract feature data is transmitted, encoding difficulty is reduced, the amount of transmitted information drops greatly, and transmission-resource overhead is reduced. Furthermore, regarding security during media information transmission, the abstracted feature data carried over the transmission interface cannot be reversed into the original media information, so this approach provides better privacy protection than the prior-art practice of transmitting the media information itself.
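To make the sender-side flow concrete, the following Python sketch walks through the three claimed steps: collect media information, extract feature data, and send only the feature data through the transmission interface. All names (FeatureExtractor, TransmissionInterface, capture_frame) are illustrative assumptions, not interfaces defined by this application, and the 8x8 average pooling merely stands in for the input part of a real AI model:

```python
import numpy as np

class FeatureExtractor:
    """Stand-in for the input part of an AI model (e.g. the first layers of a CNN)."""
    def extract(self, frame: np.ndarray) -> np.ndarray:
        # A real first device would run its feature extraction model here;
        # 8x8 average pooling is a placeholder that still shows the
        # large reduction in data volume.
        h, w, c = frame.shape
        pooled = frame[: h // 8 * 8, : w // 8 * 8].reshape(h // 8, 8, w // 8, 8, c)
        return pooled.mean(axis=(1, 3)).ravel()

class TransmissionInterface:
    """Stand-in for the first transmission interface."""
    def send(self, payload: bytes) -> None:
        print(f"sending {len(payload)} bytes of feature data")

def capture_frame() -> np.ndarray:
    # Placeholder for the media collection step (camera/sensor capture).
    return np.random.rand(480, 640, 3).astype(np.float32)

frame = capture_frame()                           # collect first media information
features = FeatureExtractor().extract(frame)      # determine first feature data
TransmissionInterface().send(features.tobytes())  # send via the first transmission interface
```

Even this toy pooling reduces a 480x640x3 frame (921,600 values) to 14,400 feature values, the kind of reduction in transmitted information described above.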
In a possible implementation manner, before the first feature data is sent to the second device, the method further includes:
receiving, through the first transmission interface, a capability negotiation request message sent by the second device, where the capability negotiation request message is used to request the transmission protocol supported by the first device and the feature extraction capability of the first device; the transmission protocol of the first device indicates that the first device supports transmitting feature data, and the feature extraction capability of the first device indicates that the first device supports extracting the first feature data of the first media information; and sending a capability negotiation response message to the second device through the first transmission interface, where the capability negotiation response message is used to confirm that the first device supports the transmission protocol for transmitting feature data and to confirm the feature extraction capability of the first device.
In one possible implementation, in response to a first operation on the first application, the first device sends a first notification message to the second device, where the first device is an electronic device that has established a communication connection with the second device, and the first notification message is used to request that the first device establish first application cooperation with the second device; the first device then receives a first response message returned by the second device, where the first response message is used to confirm that the first device and the second device start the first application cooperation.
In this way, a first operation on the first application of the first device can trigger the process of establishing the first application cooperation between the first device and the second device, and the second device confirms whether to start the cooperation. The first application cooperation may be a cooperative process in which the first device sends the first feature data to the second device through the first transmission interface and the second device obtains the result of the first application from that data, improving the user's experience of processing the first application cooperatively on the two devices.
In another possible implementation, the first device receives a first notification message sent by the second device, where the first device is an electronic device that has established a communication connection with the second device, and the first notification message is used to request that the first device establish first application cooperation with the second device; in response to a third operation on the first application, the first device sends a first response message to the second device, where the first response message is used to confirm that the first device and the second device start the first application cooperation.
In this way, the second device triggers the process of establishing the first application cooperation, and whether to start the cooperation is confirmed by the third operation on the first application on the first device, improving the user's experience of processing the first application cooperatively on the two devices.
In a possible implementation manner, before the first feature data is sent to the second device, the method further includes:
sending a capability negotiation request message to the second device through the first transmission interface, where the capability negotiation request message is used to request the transmission protocol supported by the second device and the feature data processing capability of the second device; the transmission protocol of the second device indicates that the second device supports transmitting feature data, and the feature data processing capability of the second device indicates that the second device supports processing the first feature data to obtain the result of the first application; and receiving a capability negotiation response message from the second device through the first transmission interface, where the capability negotiation response message is used to confirm that the second device supports the transmission protocol for transmitting feature data and to confirm the feature data processing capability of the second device.
Through capability negotiation in either direction (the first device or the second device initiating the request message), the devices can confirm whether the first device can extract and transmit feature data and whether the second device can receive and process it, that is, whether the two devices support transmitting feature data and cooperatively implementing the functions of the first application, which improves the performance of the AI application.
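As an illustration of what such a negotiation exchange might look like (the message schema and field names are assumptions for this sketch, not defined by the application):

```python
import json

def build_capability_request(requested):
    # Either device may initiate the request.
    return json.dumps({"type": "capability_negotiation_request",
                       "requested": requested}).encode()

def build_capability_response(supported):
    return json.dumps({"type": "capability_negotiation_response",
                       "supported": supported}).encode()

# The second device asks whether the first device supports the feature-data
# transport protocol and feature extraction.
request = build_capability_request(["feature_transport_protocol",
                                    "feature_extraction"])

# The first device confirms, optionally reporting its feature extraction model
# version so the peer can check it against its feature data processing model.
response = build_capability_response({
    "feature_transport_protocol": True,
    "feature_extraction": True,
    "feature_extraction_model_version": "1.0",  # hypothetical version string
})

print(json.loads(request)["requested"])
print(json.loads(response)["supported"])
```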
In a possible implementation manner, before the first device performs feature extraction on the first media information, the method further includes:
obtaining a first feature extraction model, where the first feature extraction model is used to perform feature extraction on the first media information, the version of the first feature extraction model corresponds to the version of a first feature data processing model, and the first feature data processing model is used by the second device to process the first feature data to obtain the result of the first application.
In this way, after the first device and the second device establish a connection and determine the task of the first application to be processed cooperatively, they respectively load the input part (the first feature extraction model) and the output part (the first feature data processing model) of the algorithm model corresponding to the first application, thereby realizing the first application cooperation between the two devices.
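The model split can be sketched as follows; the registry, model name, and toy lambdas are hypothetical, and only the version pairing of the two halves mirrors the mechanism described above:

```python
import numpy as np

# Hypothetical registry keyed by (model name, version); both halves of one
# model share a version so they stay compatible.
MODEL_REGISTRY = {
    ("detector", "1.0"): {
        "input_part":  lambda x: x.mean(axis=(0, 1)),                 # feature extraction model
        "output_part": lambda f: "hit" if f.sum() > 1.0 else "miss",  # feature data processing model
    },
}

def load_part(name, version, part):
    return MODEL_REGISTRY[(name, version)][part]

extract = load_part("detector", "1.0", "input_part")   # loaded on the first device
process = load_part("detector", "1.0", "output_part")  # loaded on the second device

frame = np.random.rand(32, 32, 3)  # collected media information
features = extract(frame)          # what actually crosses the interface
print(process(features))           # result of the first application
```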
In one possible implementation manner, the capability negotiation response message further includes:
a version of a feature extraction model in the first device; or, a version of a feature data processing model in the second device.
In this way, the capability negotiation response can be used to confirm whether the version of the feature extraction model in the first device and the version of the feature data processing model in the second device can together complete the first application cooperation.
In one possible implementation manner, obtaining the first feature extraction model includes:
receiving the first feature extraction model from the second device through the first transmission interface, receiving it from a server, or reading a first feature extraction model stored on the first device.
Because the first feature extraction model can be obtained in multiple ways, more flexible AI application cooperation modes are possible.
In one possible implementation, the method further includes:
sending a first feature data processing model to the second device through the first transmission interface;
the version of the first feature extraction model corresponds to the version of the first feature data processing model, and the first feature data processing model is used for the second device to process the first feature data to obtain the result of the first application.
The first device may store the first feature data processing model and send it to the second device through the first transmission interface, so that the two devices can cooperatively process media information for the AI application.
In one possible implementation, the method further includes:
and obtaining a second feature extraction model, where the version of the second feature extraction model corresponds to the version of a second feature data processing model, and the second feature extraction model and the second feature data processing model are obtained by updating the first feature extraction model and the first feature data processing model, respectively.
In this way, an updated feature extraction model (the second feature extraction model) can be obtained after the first feature extraction model is updated, adapting to the requirements of different AI applications and improving the applicability of cooperative media-information processing by the first device and the second device.
In one possible implementation, the method further includes:
performing feature extraction on a training sample according to the first feature extraction model to generate first training feature data;
and transmitting the first training feature data to the second device through the first transmission interface, where the first training feature data is used to train the first feature extraction model and the first feature data processing model.
In this way, the first device and the second device can be used for joint training, making reasonable use of the computing power of both devices and improving the performance of the AI application.
In a possible implementation manner, feedback data from the second device is received through the first transmission interface, where the feedback data is determined after the second device performs training according to the first training feature data, and the feedback data is used by the first device to train the first feature extraction model.
In this way, the feature extraction model on the first device is trained using feedback data returned by the second device, and the joint training of the two devices improves the performance of the feature extraction model.
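A minimal numerical sketch of this joint-training loop, in the spirit of split learning, is shown below. The first device holds the extraction weights W1, the second device holds the processing weights W2; the forward activations play the role of the first training feature data, and the gradient returned at the cut layer plays the role of the feedback data. All shapes, the toy task, and the learning rate are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4)) * 0.1  # first device: feature extraction weights
W2 = rng.normal(size=(4, 1)) * 0.1  # second device: feature data processing weights
lr = 0.05

for step in range(200):
    x = rng.normal(size=(16, 8))                  # training sample batch
    y = (x.sum(axis=1, keepdims=True) > 0) * 1.0  # toy labels

    feats = x @ W1                     # first training feature data (crosses the interface)
    pred = feats @ W2                  # second device: forward through the output part
    err = pred - y                     # squared-error gradient at the output

    grad_W2 = feats.T @ err / len(x)   # second device trains its processing model
    feedback = err @ W2.T              # feedback data returned to the first device
    W2 -= lr * grad_W2

    grad_W1 = x.T @ feedback / len(x)  # first device trains its extraction model
    W1 -= lr * grad_W1

print("final batch loss:", float((err ** 2).mean()))
```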
In one possible implementation, the method further includes:
receiving a first message from the second device through the first transmission interface, where the first message indicates a state in which the first device collects media information; and in response to the first message, adjusting that state.
In this way, the first device can adjust its media-collection state according to the first message sent by the second device, better capturing the media information the first application requires and improving the effect of the first application.
In one possible implementation manner, the state in which the first device collects media information includes at least one of the following: an on state, an off state, or parameters for collecting media information.
In one possible implementation, the method further includes: receiving a second message from the second device through the first transmission interface, where the second message is used to instruct the first device to acquire first data; in response to the second message, collecting or obtaining the first data; and sending the first data to the second device, where the first data is one of: media information collected by the first device, a parameter of the first device, data stored by the first device, or data received by the first device.
In this way, the second device can instruct the first device to acquire the first data, so that the transmission of feature data between the two devices is compatible with the transmission of other data, which helps improve transmission performance and adaptability to the transmission scenarios required by the AI application.
In one possible implementation, the first data is sent to the second device through the first transmission interface.
By the method, the first transmission interface can support the transmission of various data, and the transmission performance and the adaptability of a transmission scene are improved.
In a possible implementation manner, a second message is received from the second device through the first transmission interface, where the second message is used to instruct the first device to collect feature data of third media information; collecting the third media information in response to the second message; extracting the characteristics of the third media information to obtain third characteristic data; and sending the third characteristic data to the second equipment through the first transmission interface.
Through the method, the second message can be sent by the second device to control the first device to collect the media information and transmit the corresponding third characteristic data, for example, the third media information to be collected can be determined according to the processing result of the first application or the requirement of the AI application, so that the media information collected by the first device can be flexibly adjusted, the AI application can obtain a better result, and the effect of the AI application is integrally improved.
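A compact dispatcher on the first device might handle the first and second messages described above roughly as follows (the message schema, field names, and placeholder feature values are assumptions of this sketch):

```python
state = {"collecting": False, "params": {"fps": 30}}

def handle_message(msg: dict) -> dict:
    if msg["type"] == "first_message":      # adjust the media-collection state
        if "on" in msg:
            state["collecting"] = msg["on"]
        state["params"].update(msg.get("params", {}))
        return {"ack": "state_updated", "state": state}
    if msg["type"] == "second_message":     # acquire and return first data
        if msg["what"] == "device_parameters":
            return {"first_data": state["params"]}
        if msg["what"] == "feature_data":
            # Placeholder for: collect third media information, extract features.
            return {"third_feature_data": [0.1, 0.7, 0.2]}
    return {"error": "unknown message"}

print(handle_message({"type": "first_message", "on": True, "params": {"fps": 60}}))
print(handle_message({"type": "second_message", "what": "device_parameters"}))
print(handle_message({"type": "second_message", "what": "feature_data"}))
```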
In a possible implementation manner, the second message or the first message is determined by the second device according to the first feature data.
In this way, the second device can determine the result of the first application based on the first feature data transmitted by the first device, generate the first message or the second message according to that result, and feed it back to the first device, which in response adjusts its collection, acquisition, and transmission of media information, so that the two devices complete the first application cooperation more effectively.
In one possible implementation, the first device may further include a display unit, and the method further includes: receiving a third message from the second device through the first transmission interface, where the third message is determined by the second device according to the first feature data and indicates the content to be displayed by the first device; and in response to the third message, displaying that content through the display unit.
In this way, the content to be displayed is obtained through the third message; it may be the processing result of the first application or other content that the second device needs the first device to display, so that the two devices realize the AI application cooperation better and the user experience of the AI application is improved.
In one possible implementation, the method further includes:
receiving an authentication request message sent by the second device through the first transmission interface, where the authentication request message is used to request whether the first device establishes a communication connection with the second device, and the communication connection is used to confirm the authority of the second device to control the first device;
sending an authentication response message to the second device through the first transmission interface; the authentication response message is used for confirming the authority of the second device to control the first device.
By the method, whether the second device can obtain the authority for controlling the first device or not can be confirmed through authentication between the first device and the second device, so that the first device is adjusted to collect the media information after the second device obtains the result of the first application according to the first characteristic data, a better result of the first application is obtained, and the performance of the AI application is improved.
In one possible implementation, the method further includes:
receiving, through the first transmission interface, an authentication success message sent by the second device, where the authentication success message includes a device identifier corresponding to the first device and an identifier of the distributed system in which the first device and the second device are located.
By the method, the first device and the second device can be set as devices in a distributed system, so that the first device and the second device can be better managed, and AI application cooperation can be realized by utilizing a plurality of devices.
In one possible implementation, the first device includes a first module; the authentication success message further includes at least one of: an identification of a first module of the first device, and an identification of the first module in the distributed system.
Through the method, the module in the first device can be set as the module in the distributed system, so that preparation is made for the second device to control the modules in the devices and cooperatively finish the AI application.
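The authentication exchange and the resulting identifiers could be sketched as below; every identifier and field name is made up for illustration:

```python
def run_authentication(first_device_grants: bool):
    messages = []
    # Second device -> first device: request a communication connection that
    # confirms the second device's authority to control the first device.
    messages.append({"type": "auth_request", "from": "second_device"})
    # First device -> second device: authentication response.
    messages.append({"type": "auth_response", "granted": first_device_grants})
    if first_device_grants:
        # Second device -> first device: authentication success message with
        # the identifiers placing the first device (and its modules) in the
        # distributed system.
        messages.append({
            "type": "auth_success",
            "device_id": "dev-001",                    # identifier of the first device
            "system_id": "dist-sys-42",                # distributed system identifier
            "module_ids": {"camera": "dev-001.cam0"},  # optional module identifiers
        })
    return messages

for msg in run_authentication(first_device_grants=True):
    print(msg)
```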
In a possible implementation manner, the first device and the second device establish a channel connection through a third transmission interface, and feature data or a message sent by the first device is first encapsulated into first bit stream data through the first transmission interface and then sent through the third transmission interface.
Because the feature data is encapsulated through the first transmission interface and the encapsulated data is sent to the second device through the third transmission interface, the third transmission interface can be compatible with multiple transmission protocols and can also realize functions such as aggregated transmission, improving the transmission capability of the first device and the compatibility of media information transmission.
In a possible implementation manner, the first device and the second device establish a channel connection through a third transmission interface, and a message received by the first device is obtained by decapsulating, through the first transmission interface, second bit stream data received through the third transmission interface.
Symmetrically, data received from the second device over the third transmission interface is decapsulated through the first transmission interface; the third transmission interface can be compatible with multiple transmission protocols and can realize functions such as aggregated transmission, improving the transmission capability of the first device and the compatibility of media information transmission.
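The encapsulation step can be illustrated with a simple length-and-type frame; the actual bit-stream format of the transmission interface is not specified by the application, so this framing is purely an assumption:

```python
import struct

FEATURE_DATA, CONTROL_MESSAGE = 1, 2

def encapsulate(payload_type: int, payload: bytes) -> bytes:
    # 1-byte type + 4-byte big-endian length, then the payload itself.
    return struct.pack(">BI", payload_type, len(payload)) + payload

def decapsulate(bitstream: bytes):
    payload_type, length = struct.unpack(">BI", bitstream[:5])
    return payload_type, bitstream[5:5 + length]

frame = encapsulate(FEATURE_DATA, b"\x01\x02\x03\x04")  # bit stream data on the channel
ptype, payload = decapsulate(frame)                     # peer side, via its own interface
print(ptype, payload)
```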
In a second aspect, this application provides a media information transmission method applied to a second device, where the second device includes a second transmission interface. The method includes: receiving first feature data from a first device through the second transmission interface, where the first feature data is determined after the first device performs feature extraction on collected first media information; and processing the first feature data to obtain a processing result of a first application.
The first application may be an AI application, and the method may be executed by the second device or by a module or component with the corresponding functions, such as a chip. Taking the second device as the executor, the second device may be a device capable of receiving and processing feature data. The second device may include a second transmission interface that supports transmitting feature data, so that the second device receives feature data instead of the media data itself. This avoids the prior-art requirement to compression-encode the media information at the sending end and recover it at the receiving end, a complex encoding and decoding process; it reduces computing-resource overhead and overall system delay, which benefits AI applications with high real-time requirements and improves the user experience of those applications. In addition, because abstract feature data is transmitted, encoding difficulty is reduced, the amount of transmitted information drops greatly, and transmission-resource overhead is reduced. Furthermore, regarding security during media information transmission, the abstracted feature data carried over the transmission interface cannot be reversed into the original media information, so this approach provides better privacy protection than the prior-art practice of transmitting the media information itself.
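Mirroring the sender-side sketch given for the first aspect, the second-device flow might look as follows; the function names and the threshold-based stand-in for the feature data processing model are assumptions:

```python
import numpy as np

def receive_feature_data() -> bytes:
    # Placeholder for feature data arriving on the second transmission interface.
    return np.random.rand(64).astype(np.float32).tobytes()

def process_features(feature_bytes: bytes) -> str:
    features = np.frombuffer(feature_bytes, dtype=np.float32)
    # Stand-in for the first feature data processing model (e.g. a classifier head).
    return "target present" if features.mean() > 0.5 else "target absent"

print(process_features(receive_feature_data()))  # result of the first application
```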
In one possible implementation, in response to a second operation on the first application, the second device sends a first notification message to the first device, where the first device is an electronic device that has established a communication connection with the second device, and the first notification message is used to request that the first device establish first application cooperation with the second device; the second device then receives a first response message returned by the first device, where the first response message is used to confirm that the first device and the second device start the first application cooperation.
In this way, the second device, in response to the second operation on the first application, triggers the process of establishing the first application cooperation between the two devices, and the first device confirms whether to start the cooperation, improving the user's experience of processing the first application cooperatively on the two devices.
In a possible implementation manner, a first notification message sent by a first device is received; the first device is an electronic device which establishes communication connection with the second device; the first notification message is used for requesting the first device to establish first application cooperation with the second device; in response to a fourth operation on the first application, sending a first response message to the first device; the first response message is used for confirming that the first device and the second device start first application cooperation.
Through the method, the first device can trigger the establishment process of establishing the first application cooperation between the first device and the second device, and the second device responds to the fourth operation on the first application to confirm whether the first application cooperation is started or not. The first application cooperation may be a cooperation process in which the first device sends the first feature data to the second device through the first transmission interface, and the second device obtains a result of the first application according to the first feature data, so as to improve an experience of a user in cooperatively processing the first application by using the first device and the second device.
In a possible implementation manner, before the first feature data is received from the first device, the method further includes:
sending a capability negotiation request message to the first device through the second transmission interface, where the capability negotiation request message is used to request the transmission protocol supported by the first device and the feature extraction capability of the first device; the transmission protocol of the first device indicates that the first device supports transmitting feature data, and the feature extraction capability of the first device indicates that the first device supports extracting the first feature data of the first media information; and receiving, through the second transmission interface, a capability negotiation response message sent by the first device, where the capability negotiation response message is used to confirm that the first device supports the transmission protocol for transmitting feature data.
In a possible implementation manner, before the first feature data is received from the first device, the method further includes:
receiving, through the second transmission interface, a capability negotiation request message sent by the first device, where the capability negotiation request message is used to request the transmission protocol supported by the second device and the feature data processing capability of the second device; the transmission protocol of the second device indicates that the second device supports transmitting feature data, and the feature data processing capability of the second device indicates that the second device supports processing the first feature data to obtain the result of the first application; and sending a capability negotiation response message to the first device through the second transmission interface, where the capability negotiation response message is used to confirm that the second device supports the transmission protocol for transmitting feature data and to confirm the feature data processing capability of the second device.
In a possible implementation manner, before the first feature data is received, the method further includes:
obtaining a first feature data processing model, where the first feature data processing model is used by the second device to process the first feature data to obtain the result of the first application; the version of a first feature extraction model corresponds to the version of the first feature data processing model, and the first feature extraction model is used to perform feature extraction on the first media information.
In this way, after the first device and the second device establish a connection and determine the task of the first application to be processed cooperatively, they respectively load the input part (the first feature extraction model) and the output part (the first feature data processing model) of the algorithm model corresponding to the first application, thereby realizing the first application cooperation between the two devices.
In one possible implementation manner, the capability negotiation response message further includes: a version of a feature extraction model in the first device; or, a version of a feature data processing model in the second device.
In this way, the capability negotiation response can be used to confirm whether the version of the feature extraction model in the first device and the version of the feature data processing model in the second device can together complete the first application cooperation.
In one possible implementation manner, obtaining the first feature data processing model includes: receiving the first feature data processing model from the first device through the second transmission interface, receiving it from a server, or reading a first feature data processing model stored on the second device.
In one possible implementation, the method further includes: sending a first feature extraction model to the first device through the second transmission interface, where the version of the first feature extraction model corresponds to the version of the first feature data processing model, and the first feature data processing model is used by the second device to process the first feature data to obtain the result of the first application.
In one possible implementation, the method further includes: obtaining a second feature data processing model, where the version of the second feature data processing model corresponds to the version of a second feature extraction model, and the second feature extraction model and the second feature data processing model are obtained by updating the first feature extraction model and the first feature data processing model, respectively.
In this way, the first device can obtain the first feature extraction model and the second device can obtain the first feature data processing model, enabling more flexible AI application cooperation modes.
In one possible implementation, the method further includes: receiving first training feature data, where the first training feature data is determined after the first device performs feature extraction on a training sample according to the first feature extraction model; and training the first feature data processing model according to the first training feature data.
In one possible implementation, the method further includes: obtaining feedback data for the first feature extraction model, where the feedback data is determined after the second device performs training according to the first training feature data and is used by the first device to train the first feature extraction model; and sending the feedback data to the first device.
In one possible implementation, the method further includes: receiving second feature data sent by the first device, where the second feature data is determined after the first device performs feature extraction on collected second media information according to the second feature extraction model; and processing the second feature data according to the second feature data processing model to obtain the result of the first application.
In this way, the first device and the second device can be used for joint training, making reasonable use of the computing power of both devices and improving the performance of the AI application.
In one possible implementation, the method further includes: sending a first message to the first device through the second transmission interface; the first message is used for indicating the state of the first device for collecting the media information.
By the method, the second device can send the first message to the first device to adjust the state of the first device for acquiring the media information, so that the media information required to be acquired by the first application can be better acquired, and the effect of the first application is improved.
In one possible implementation manner, the state in which the first device collects media information includes at least one of the following: an on state, an off state, or parameters for collecting media information.
In one possible implementation, the method further includes: sending a second message to the first device through the second transmission interface; the second message is used for instructing the first equipment to acquire first data; the first data is one of: the media information collected by the first device, the parameters of the first device, the data stored by the first device, and the data received by the first device.
Through the method, the second device can instruct the first device to acquire the first data, so that the transmission of the characteristic data is compatible with the transmission of other data while the characteristic data is transmitted between the first device and the second device, which is beneficial to improving the transmission performance and the adaptability of the transmission scene required by the first device and the second device for realizing the AI application function.
In one possible implementation, the method further includes: receiving the first data from the first device through the second transmission interface.
By the method, the second transmission interface can support the transmission of various data, and the transmission performance and the adaptability of a transmission scene are improved.
In one possible implementation, the method further includes: sending a second message to the first device through the second transmission interface; the second message is used for instructing the first equipment to collect the characteristic data of the third media information; receiving third characteristic data sent by the first equipment through the second transmission interface; the third feature data is determined after the first device performs feature extraction on the acquired third media information.
Through the method, the second message can be sent by the second device to control the first device to collect the media information and transmit the corresponding third characteristic data, for example, the third media information to be collected can be determined according to the processing result of the first application or the requirement of the AI application, so that the media information collected by the first device can be flexibly adjusted, the AI application can obtain a better result, and the effect of the AI application is integrally improved.
In a possible implementation manner, the first message or the second message is determined according to the processing result of the first feature data.
In this way, the second device can determine the result of the first application based on the first feature data transmitted by the first device, generate the first message or the second message according to that result, and feed it back to the first device, which in response adjusts its collection, acquisition, and transmission of media information, so that the two devices complete the first application cooperation more effectively.
In one possible implementation, the first device further includes a display unit; the method further comprises the following steps:
generating a third message in response to a processing result of the first feature data; the third message is used for indicating the content displayed by the first device.
In this way, the content to be displayed is obtained through the third message; it may be the processing result of the first application or other content that the second device needs the first device to display, so that the two devices realize the AI application cooperation better and the user experience of the AI application is improved.
In a possible implementation manner, there are N first devices, and the method further includes:
receiving a fourth message through the second transmission interface, where the fourth message includes M pieces of first feature data from the N first devices, N and M are positive integers greater than 1, and M is greater than or equal to N;
and processing the M pieces of first feature data according to the feature data processing models corresponding to them, to obtain the result of the first application.
In this way, M pieces of first feature data from multiple first devices can be transmitted and processed by the corresponding feature data processing models in the second device, realizing first application cooperation between multiple first devices and the second device and improving its effect.
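A sketch of the N-device case: the fourth message carries M feature-data items from N first devices, and each item is routed to its corresponding feature data processing model. The routing keys and toy models are assumptions:

```python
# Hypothetical per-stream feature data processing models on the second device.
processing_models = {
    "camera_features": lambda f: f"objects={sum(f):.1f}",
    "audio_features":  lambda f: f"speech={max(f):.1f}",
}

fourth_message = {  # M = 3 feature-data items from N = 2 first devices
    "items": [
        {"device": "dev-001", "kind": "camera_features", "data": [0.2, 0.9]},
        {"device": "dev-001", "kind": "audio_features",  "data": [0.1, 0.4]},
        {"device": "dev-002", "kind": "camera_features", "data": [0.7, 0.3]},
    ]
}

results = [processing_models[item["kind"]](item["data"])
           for item in fourth_message["items"]]
print(results)  # combined into the result of the first application
```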
In one possible implementation, the method further includes:
sending an authentication request message to the first device through the second transmission interface, where the authentication request message is used to request whether the first device establishes a communication connection with the second device, and the communication connection is used to confirm the authority of the second device to control the first device; and receiving, through the second transmission interface, an authentication response message sent by the first device, where the authentication response message is used to confirm whether the first device establishes the communication connection with the second device.
By the method, whether the second device can obtain the authority for controlling the first device or not can be confirmed through authentication between the first device and the second device, so that the first device is adjusted to collect the media information after the second device obtains the result of the first application according to the first characteristic data, a better result of the first application is obtained, and the performance of the AI application is improved.
In one possible implementation, the method further includes: in response to the authentication response message sent by the first device, setting, for the first device, a device identifier corresponding to the first device and an identifier of the distributed system in which the first device and the second device are located, where these identifiers are used for communication between the first device and the second device; and sending an authentication success message to the first device through the second transmission interface, where the authentication success message includes the device identifier corresponding to the first device and the identifier of the distributed system in which the first device and the second device are located.
By the method, the first device and the second device can be set as devices in a distributed system, so that the first device and the second device can be better managed, and AI application cooperation can be realized by utilizing a plurality of devices.
In one possible implementation, the first device includes a first module, and the authentication success message further includes at least one of: an identifier of the first module, and an identifier of the first module in the distributed system.
In this way, a module in the first device can be registered as a module in the distributed system, so that the second device can control modules in multiple devices and prepare to complete the AI application cooperatively.
In a possible implementation manner, the second device further includes a third transmission interface; the first equipment and the second equipment establish channel connection through a third transmission interface; and the message sent by the second device is sent through the third transmission interface after being encapsulated into second bit stream data through the second transmission interface.
By the method, the data are packaged through the second transmission interface, and the packaged data are sent to the first equipment through the third transmission interface, so that multiple transmission protocols can be compatible through the third transmission interface, functions such as aggregation transmission and the like can also be realized, and the transmission capability of the second equipment and the compatibility of transmission media information are improved.
In a possible implementation manner, the first device and the second device establish a channel connection through a third transmission interface, and feature data or a message received by the second device is obtained by decapsulating, through the second transmission interface, first bit stream data received through the third transmission interface.
Because data received from the first device over the third transmission interface is decapsulated through the second transmission interface, the third transmission interface can be compatible with multiple transmission protocols and can realize functions such as aggregated transmission, improving the transmission capability of the second device and the compatibility of media information transmission.
In a third aspect, the present application provides an electronic device including a memory and one or more processors, where the memory is configured to store computer program code including computer instructions; when the computer instructions are executed by the processor, the electronic device performs the method of the first aspect or any possible implementation of the first aspect.
In a fourth aspect, the present application provides an electronic device including a memory and one or more processors, where the memory is configured to store computer program code including computer instructions; when the computer instructions are executed by the processor, the electronic device performs the method of the second aspect or any possible implementation of the second aspect.
In a fifth aspect, the present application provides a media information transmission system, including: the electronic device of the third aspect and the electronic device of the fourth aspect.
In a sixth aspect, the present application provides a computer-readable storage medium comprising program instructions which, when run on an electronic device, cause the electronic device to perform any one of the possible methods of the first aspect, or cause the electronic device to perform any one of the possible methods of the second aspect.
Drawings
Fig. 1a is a schematic structural diagram of a media information sending device in the prior art;
Fig. 1b is a schematic diagram of a media information receiving device in the prior art;
Fig. 1c is a schematic diagram of a media information receiving device in the prior art;
Fig. 2a is a schematic diagram of a media information transmission method provided in this application;
Fig. 2b is a schematic structural diagram of an AI algorithm model provided in this application;
Fig. 3a is a schematic structural diagram of a first device provided in this application;
Fig. 3b is a schematic structural diagram of a second device provided in this application;
Fig. 3c is a schematic diagram of a distributed system architecture provided in this application;
Fig. 4a is a schematic flowchart of a method for establishing a communication connection for AI application cooperation provided in this application;
Figs. 4b-4c are schematic diagrams of a search interface of a first device provided in this application;
Figs. 4d-4e are schematic diagrams of an AI application cooperation interface provided in this application;
Fig. 5a is a schematic diagram of a distributed system architecture provided in this application;
Fig. 5b is a schematic flowchart of a media information transmission method provided in this application;
Fig. 5c is a schematic diagram of a scenario provided in this application;
Fig. 5d is a schematic diagram of an AI application provided in this application;
Fig. 6a is a schematic diagram of a distributed system architecture provided in this application;
Fig. 6b is a schematic flowchart of a media information transmission method provided in this application;
Fig. 6c is a schematic diagram of a scenario provided in this application;
Fig. 6d is a schematic diagram of an AI application provided in this application;
Fig. 7a is a schematic diagram of a distributed system architecture provided in this application;
Fig. 7b is a schematic flowchart of a media information transmission method provided in this application;
Fig. 7c is a schematic diagram of a scenario provided in this application;
Fig. 7d is a schematic diagram of an AI application provided in this application;
Fig. 8 is a schematic structural diagram of a possible electronic device according to an embodiment of this application;
Fig. 9 is a schematic structural diagram of another possible electronic device according to an embodiment of this application.
Detailed Description
In the following, some terms in the embodiments of the present application are explained to facilitate understanding by those skilled in the art.
1) Media information
The media information in this application may include image information, audio information, video information, sensor information, and the like collected by the first device. For example, the collected information may be audio, video, visible-light images, or radar or depth information. The first device may include cameras, sensors, microphones, and other devices or units with a media information collection function.
Taking image information as an example, the media information acquired by the first device in this embodiment may be an original image, for example, the output image of a camera: the raw, unprocessed digital image signal converted from the light reflected by an object and collected by the camera. For example, the original image may be raw-format data, which may include object information together with camera parameters such as sensitivity (ISO), shutter speed, aperture value, and white balance. The original image may also be the input image of an ISP, or of a neural-network unit such as the neural-network processing unit (NPU) described below. The output image of the neural-network unit may be a high dynamic range (HDR) image or an image after other processing, which is not limited here.
The media information acquired by the first device may also be an output image of an ISP, obtained after the ISP processes an original image into an RGB-format or YUV-format image and adjusts its brightness. The specific brightness adjustment applied by the ISP may be set by the user or preset when the mobile phone leaves the factory. The media information acquired by the first device may also be the input image of a processor of the first device, for example a graphics processing unit (GPU) described below.
It should be noted that the "media information" in the embodiments of this application, for example the original media information, the media information acquired by the first device, or the media information processed by the second device (for example, an HDR image), may, when it is an image, refer to a picture or to a set of parameters (for example, pixel information, color information, and luminance information).
In the embodiments of the present application, "a plurality of" means two or more.
2) Sending terminal equipment of media information
The sending end device of the media information related to the embodiment of the present application may be a device with a media information acquisition function. The media information may include one or more of image information, video information, audio information, and sensor information. Taking video image information as an example, the sending end device may be a unit or device with a video image acquisition function, and the acquired information may be audio, video, or visible light images, or media information such as radar or depth data.
The media information sending end device may include a video acquisition unit, such as a camera, for acquiring video or image information, and may further include an audio acquisition unit, such as a microphone, for acquiring audio information. The acquisition unit may be one or more of an optical lens, an image sensor, a microphone, and the like for capturing the original media signal (audio, image, or mixed). For example, the sending end device of the media information may be a mobile terminal such as a mobile phone or tablet, a smart home terminal such as a smart television, or an accessory device such as an AR/VR head-mounted display, a vehicle-mounted camera, or an external camera. For example, the sending end device may be a terminal device including a media acquisition unit, such as a smart screen. In this case, the sending end device collects original audio and video information and, after processing, forms audio and video signals in a standard format. The sending end device may also, as the sender of the media information, compress and encode the information and send it to a receiving end through a transmission interface or a network. The transmission interface may be a media transmission interface such as HDMI, DP, or USB.
In other possible scenarios, the sending end device of the media information may also be a device that obtains the media information and sends it; for example, it may obtain the media information from a network or from a local storage unit and send it to the second device. In this case, the sending end device need not have a media information collection function, that is, it may be a device that only has the function of sending media information.
At this time, the sending end device of the media information may send the obtained audio/video media information to the receiving end device through a transmission interface or a network after compression and encoding. The transmission interface may be a media transmission interface such as HDMI, DP, USB, etc.
As shown in fig. 1a, which is a schematic structural diagram of a sending end device of media information provided in the prior art, the sending end device shown in fig. 1a may include: a media signal acquiring unit (e.g., an audio acquisition unit and a video acquisition unit), a media encoding unit (e.g., an audio encoder and a video encoder), and an output interface (e.g., an audio output interface and a video output interface). The media signal acquiring unit may have a plurality of implementation forms; for example, it may include at least one of the following: a media signal acquisition unit, a media signal receiving unit, and a storage unit. The media signal acquisition unit may be used to acquire original media signals and may include, for example, one or more media signal acquisition sensors or devices such as an optical lens, an image sensor, a microphone, and a radar. The acquired media information may be audio, video, visible light images, or radar, depth, and similar information. The media signal receiving unit may be used to receive media signals from a network or other devices. The storage unit may be configured to store the media signal locally on the sending end device, and may also store other information. The media encoding unit is used to perform media coding and channel coding on the media signal acquired by the first device according to a media coding protocol and link layer and physical layer protocols to obtain a physical layer signal, and to transmit the physical layer signal to the output interface so as to send it to the receiving end device of the corresponding media information. Optionally, the sending end device may further include a media signal preprocessing unit, which may be used to preprocess the original audio and video media signals, for example noise reduction and restoration. For example, a video preprocessing unit may perform preprocessing such as noise reduction and demosaicing on the original video frame images.
3) Receiving end equipment of media information
The receiving end device of the media information, as the receiver of the media information, may be a media processing device. The receiving end device may be a terminal device such as a mobile phone, a tablet computer, a smart television, or a vehicle-mounted computer. Such an electronic device may also be referred to as a terminal device, which can also be called a user device, access terminal, subscriber unit, subscriber station, mobile station, remote station, remote terminal, mobile device, user terminal, wireless communication device, user agent, or user equipment. The terminal device may be a mobile phone, a tablet computer (pad), a computer with a wireless transceiving function, a virtual reality (VR) terminal, an augmented reality (AR) terminal, a wireless terminal in industrial control, a wireless terminal in self driving, a wireless terminal in remote medical, a wireless terminal in smart grid, a wireless terminal in transportation safety, a wireless terminal in smart city, a wireless terminal in smart home, and the like. It may also be a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device with wireless communication capabilities, a computing device or another processing device connected to a wireless modem, an in-vehicle device, a wearable device, and the like.
The receiving end device of the media information may also be a set top box, a DOCK, a smart television, a smart large screen, a mobile phone, a tablet computer, a personal computer (PC), a smart screen, a smart camera, a smart speaker, an earphone, or another terminal device. The smart screen can be a video entertainment center in the home, and further an information sharing center, a control management center, and a multi-device interaction center. The terminal device may also be a portable terminal containing functionality such as a personal digital assistant and/or a music player, for example a mobile phone, a tablet computer, a wearable device with wireless communication functionality (e.g., a smart watch), or a vehicle-mounted device. Exemplary embodiments of the portable terminal include, but are not limited to, portable terminals carrying [operating system trademarks shown as figures in the original filing] or other operating systems. The portable terminal may also be, for example, a laptop computer (Laptop) with a touch-sensitive surface such as a touch panel. It should also be understood that in other embodiments, the terminal may be a desktop computer with a touch-sensitive surface (e.g., a touch panel).
The receiving end device of the media information may also be a processor chip in a set top box, a display screen, an intelligent large screen, a Television (TV), a mobile phone, or other terminal devices with media information processing functions, and the processor chip may be, for example, a system on chip (SoC) or a baseband chip. The second device may also be a computing device deployed with a Graphics Processing Unit (GPU), a distributed computing device, or the like.
Fig. 1b is a schematic structural diagram of a receiving end device of media information provided in this application. The media processing apparatus may include: an input interface, a media decoding unit, and a media information processing unit. The input interface may be configured to receive a media signal transmitted from a sending end (e.g., a sending end device of the media information); specifically, the input interface receives the physical layer signal from a transmission channel, and the media decoding unit decodes the media data signal from the physical layer signal according to the link layer protocol and the physical layer protocol.
For example, the media decoding unit and the media information processing unit may include: a parser, an audio decoder, a video decoder, and a video post-processing unit. Each unit may be implemented by hardware, software, or a combination of the two. For example, the video decoder and the video post-processing unit may be implemented by hardware logic; units such as AI analysis of media data and display policy processing may be implemented by software code running on a hardware processor; and other units, such as the audio decoder, may be implemented by software.
In a possible scenario, the receiving end device of the media information decompresses and decodes the coded signal acquired from the interface channel to restore the audio/video media information. Illustratively, a media file in a format such as mp4 is parsed by a parser to obtain an audio coded file, a video coded file, and the like. The audio coded file may be audio elementary stream (ES) data, and the video coded file may be video ES data. The audio coded file is decoded by an audio decoder to obtain audio data; the video coded file is processed by a video decoder to obtain video frames. In addition, the receiving end device can synchronize the image obtained by video post-processing with the audio data, so that the output of the audio output interface is synchronized with the output of the video output interface, that is, the audio output by the audio output interface is synchronized with the video image output by the video output interface.
Optionally, the receiving end device of the media information may further include a display unit, and at this time, the received audio/video media information may be processed and played. Alternatively, the display unit may be located in another device that establishes a communication connection with the media data processing apparatus wirelessly or by wire. For example, the display unit may be located in a terminal device such as a display (or referred to as a display screen), a television, a projector, or the like. The display unit can be used for playing the media files processed by the data processing device and also playing other media files.
In another possible scenario, with the development of deep learning algorithms and of computing hardware such as NPUs, artificial intelligence (AI) applications such as machine vision and voice interaction are rapidly becoming widespread. The data processing device can perform AI application processing on multimedia files, that is, it can also have the AI processing capability needed to realize the corresponding AI application functions.
In this scenario, after the receiving end decompresses and decodes the information acquired from the interface channel to restore the audio/video media information, it can execute operations such as image processing and use the AI hardware and software processing system of the receiving end to perform the subsequent AI application processing, so as to obtain the AI analysis result of the media data.
For example, as shown in fig. 1c, the receiving end device of the media information may further include: a neural network training unit and a neural network processing unit.
The neural network training unit is used to train the AI algorithm model using the labeled data. The training process of the AI algorithm model can be performed offline on a server, or online on the device side or the cloud side.
The neural network processing unit is used to load one or more AI algorithm models, process the media data, and obtain the inference results of the AI algorithm models. For example, the deep learning algorithm of the receiving end device and AI software and hardware systems such as the NPU are used to perform AI processing on the audio/video signal, obtaining inference results such as detection, classification, recognition, positioning, and tracking, which can be used in a corresponding AI application scenario. For example, the inference results can be used to realize functions such as biometric recognition, environment recognition, scene modeling, machine vision interaction, and voice interaction.
4) Transmission of media information
The transmission involved in the embodiments of the present application includes receiving and/or sending. The sending end of the media information and the receiving end of the media information may be connected in a wired or wireless manner and transmit the media information between them. The signal on the transmission interface may take the form of a wired electrical signal, an optical signal transmitted over an optical fiber, a radio signal, a wireless optical signal, or the like. The sending end device and the receiving end device may establish a physical channel connection over copper wire, optical fiber, and the like, and/or through wireless communication protocols.
Fig. 1c is a schematic diagram of an architecture of a network system for media information transmission according to an embodiment of the present application, where the network system includes a sending end device of media information and a receiving end device of media information.
For example, in a wired manner, a physical layer signal carrying the media information may be transmitted through a transmission channel. The transmission channel may be a physical transmission channel such as copper wire or optical fiber, and the transmitted signal may be a wired electrical signal, an optical signal, or the like. The data signal carrying the media information may be a data signal of the HDMI protocol, of the DP protocol, or of another protocol. Interface standards used by electronic devices to transfer media data include: the high definition multimedia interface (HDMI), the USB interface, the DP interface, and the like. HDMI is an interface for transmitting uncompressed digital high-definition multimedia (video and/or audio); in data transmission, HDMI uses transition-minimized differential signaling (TMDS). USB is a serial bus standard and also a specification for input/output interfaces. For example, the USB Type-C interface can support power delivery (PD) and the transfer of data other than multimedia data.
In order to realize the transmission of the media signal, one possible way is to encode the audio/video media information obtained by the sending end before transmission and transmit the encoded information to the receiving end. Although this approach can carry the media signal over the channel, the data volume during transmission is large, considerable computing resources are consumed, the cost is high, and the overall delay of the system is large, which is unfavorable for AI applications with high real-time requirements.
In order to transmit media information over a lower channel bandwidth and reduce the data volume during video transmission, so that video is transmitted more promptly, another possible way is for the sending end to preprocess the video (for example, resolution compression), that is, to perform lossy compression coding on the media information, so that the resulting lossy compressed video is encoded and then sent to the receiving end device. In this way, because of the lossy compression, the receiving end cannot completely recover the media information, which may affect subsequent applications of the media information. In addition, since the media information must be compressed and encoded at the sending end and restored at the receiving end, the encoding and decoding process is complex, consumes more computing resources, and introduces a large overall system delay, which again is unfavorable for AI applications with high real-time requirements.
In addition, since the media information is exposed both on the transmission channel and at the receiving end, there is a risk that the privacy of media contents such as images and audio is leaked. Even if the media signal is encrypted, private information risks being leaked during transmission and at the receiving end.
To address the above problems, the present application provides a media information transmission method, a flow diagram of which is shown in fig. 2a. In the method, an algorithm model is set according to the requirements of an AI application (for example, a first application). When a first device (e.g., a sending end device of media information) and a second device (e.g., a receiving end device of media information) cooperate on a distributed AI application, a pre-trained algorithm model, or an algorithm model obtained by online training, may be divided into two parts: an input part and an output part.
The input part may be a first feature extraction model. The output part may be a feature data processing model that post-processes the feature data. The first feature extraction model may be the feature extraction part of a convolutional neural network (CNN), and the first feature data processing model may be any neural network model (such as a convolutional neural network model) or another algorithm model, which is not limited herein. Fig. 2b takes a convolutional neural network model as an example.
The input part is the feature extraction part (also called the feature extraction model) and may include the input layer of the AI algorithm model; it performs feature extraction on the acquired media information, such as audio and video, to obtain feature data. For example, the output feature data may be feature data obtained through convolution, weighting, and similar operations of the input layer of the AI algorithm model; x1 to x6 in fig. 2b are convolution modules in the input layer. The output part is the part that post-processes the feature data (also called the feature data processing model) and includes, for example, the hidden layer and the output layer of the model. In fig. 2b, the feature data extracted by the convolution modules x1 to x6 in the input layer is input to the convolution module x7 in the hidden layer, and the resulting data is passed to the output layer, which may be, for example, a classifier, to obtain the processing result of the AI algorithm model.
When the first device and the second device cooperate on an AI application, the input part of the AI algorithm model corresponding to the AI application may be loaded onto the first device, and the output part onto the second device. Both the first device and the second device have the hardware and software capabilities required for the AI application, including, for example, NPUs or other computing hardware that runs the respective parts of the AI algorithm model for the AI application cooperation. Taking an NPU as the processor that runs the AI algorithm model, the input part may be deployed in the NPU of the first device, and the output part in the NPU of the second device.
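As an illustration of this division, the following is a minimal sketch assuming a small PyTorch-style convolutional model; the embodiments do not prescribe any particular framework, and the class names, layer sizes, and the correspondence to x1 to x7 in fig. 2b are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class InputPart(nn.Module):
    """Input part (first feature extraction model), deployed on the first device's NPU."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(   # input-layer convolution modules (cf. x1..x6 in fig. 2b)
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, media):            # media: an image tensor from the acquisition unit
        return self.features(media)      # returns the feature data to be transmitted

class OutputPart(nn.Module):
    """Output part (first feature data processing model), deployed on the second device's NPU."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.hidden = nn.Sequential(     # hidden-layer convolution (cf. x7 in fig. 2b)
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(64, num_classes)  # output layer, e.g., a classifier

    def forward(self, feature_data):
        return self.classifier(self.hidden(feature_data))

# End-to-end check in one process (in the real system the feature data
# crosses the transmission interface between the two devices):
img = torch.randn(1, 3, 224, 224)        # stand-in for acquired media information
result = OutputPart()(InputPart()(img))  # inference result of the AI algorithm model
```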
Step 201: the first device obtains first media information.
The manner in which the first device obtains the first media information may refer to the manner in which the sending end device of the media information in fig. 1a obtains the first media information, and is not described herein again.
Step 202: the first device extracts features from the first media information and determines first feature data of the first media information.
Based on the AI application in the cooperation between the first device and the second device, the first device may determine the input part (the first feature extraction model) of the AI algorithm model corresponding to that AI application, and load it on the first device to perform feature extraction on the first media information.
Step 203: the first device sends the first feature data to the second device.
In the present application, the feature data output by the input part of the AI algorithm model loaded on the first device is transmitted to the second device through the transmission interface, and the final output result of the AI model is obtained after further processing by the output part (the first feature data processing model) loaded on the second device. In other words, the media information acquired by the first device need not be transmitted to the second device; instead, the NPU of the first device runs the input part of the AI algorithm model and converts the media information into feature data, and the feature data is transmitted to the second device, through a transmission interface supporting feature data transmission, for processing by the output part of the AI model.
With this method, feature extraction capability is added to the first device that sends the media information, so that the sending end does not send the media information itself but the feature data extracted from it. The data volume of the feature data is significantly lower than that of the original audio/video information, an additional compression coding step for the media information can be omitted, power consumption and system delay are reduced, cost is lowered, real-time transmission under a smaller channel bandwidth becomes possible, and product competitiveness is improved. In addition, because the feature data transmitted over the transmission interface cannot be inversely converted back into the original media information, a better privacy protection capability is obtained.
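The following is a minimal sketch of steps 201 to 203 on the first device side, under the assumption that the feature tensor is serialized with PyTorch and that a TCP socket stands in for the HDMI/DP/USB transmission interface; the host name, port, and framing are hypothetical.

```python
import io
import socket
import torch

def send_feature_data(media: torch.Tensor, input_part: torch.nn.Module,
                      addr=("second-device.local", 9000)):
    """Steps 201-203: acquire media, extract features, send feature data."""
    with torch.no_grad():
        feature = input_part(media.unsqueeze(0))   # step 202: feature extraction
    buf = io.BytesIO()
    torch.save(feature, buf)                       # serialize the feature data
    payload = buf.getvalue()
    with socket.create_connection(addr) as sock:   # step 203: send to the second device
        sock.sendall(len(payload).to_bytes(4, "big") + payload)
```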
In some embodiments, before the AI application collaboration is performed, the input part and the output part of the AI algorithm model may be stored in the first device and the second device, respectively, or may be stored in the first device, the second device, or the cloud server, which is not limited herein.
After the first device and the second device establish a connection and determine that a corresponding collaborative AI processing task can be executed, the first device and the second device respectively load the input part and the output part of the algorithm model corresponding to the AI application, and confirm whether the input part and the output part were loaded successfully.
Take as an example the case where the second device stores the input part and the output part of the algorithm model. When the second device confirms that the first device has failed to load the input part of the algorithm model corresponding to the AI application, the second device checks whether the first device has the capability of loading and running that input part and thus whether it can take part in the cooperation of the AI application. If so, the second device transmits the data of the input part of the model to the first device through the data transmission channel of the transmission interface, so that the first device can load the input part of the AI algorithm model.
The loading failure may have several causes. One possible cause is that the input part of the algorithm model corresponding to the AI application is not stored on the first device. Another possible cause is that the version of the input part stored on the first device is not the version required by the AI application; in that case, the required version may be obtained, according to the model identifier of the algorithm model and the version identifier of its input part, from the device that stores the input part. In combination with the above example, it may be obtained from the second device or from a corresponding server, which is not limited herein.
Take as another example the case where the input part and the output part of the algorithm model are stored on a server in the network. When the server confirms that the first device has failed to load the input part of the algorithm model corresponding to the AI application, the server checks whether the first device has the capability of loading and running that input part and thus whether it can take part in the cooperation of the AI application; if so, the server transmits the data of the input part to the first device via the data transmission channel of the transmission interface. Correspondingly, when the server confirms that the second device has failed to load the output part, the server checks whether the second device has the capability of loading and running that output part; if so, it transmits the data of the output part to the second device through the data transmission channel of the transmission interface, so that the second device can load the output part of the algorithm model. A sketch of this check-and-push flow is given below.
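A minimal sketch of the flow for the input part, with every object and method name hypothetical, since the application does not define an API for this exchange:

```python
def ensure_input_part_loaded(first_dev, model_id, version_id, model_store):
    """Verify the first device's load result; on failure, push the input part."""
    if first_dev.load_input_part(model_id, version_id):
        return True                                # load succeeded
    if not first_dev.can_run_input_part(model_id):
        return False                               # device cannot join the AI task
    blob = model_store.fetch_input_part(model_id, version_id)
    first_dev.receive_model_data(blob)             # via the data transmission channel
    return first_dev.load_input_part(model_id, version_id)
```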
It should be noted that there may be a plurality of first devices, that is, the first feature extraction models in a plurality of first devices serve as input parts for the algorithm model of the second device, so that the algorithm models of the plurality of first devices and of the second device cooperate on the AI application. In addition, for different AI applications, the first feature extraction models in the first device may be the same feature extraction model or different feature extraction models, set according to the specific application, which is not limited herein.
Label information for training the algorithm model is determined according to the requirements of the AI application. The label information may be labeled manually through the network or through man-machine interaction input, or may be label information such as standard label information or clustering information. The AI algorithm model is then trained by the neural network training unit in the second device using the determined label data.
The AI algorithm model may be trained offline or online on a server, or trained or optimized online on the device side or the cloud side. For example, the parameters of the AI algorithm model can be updated by methods such as online learning, transfer learning, reinforcement learning, and federated learning, so as to retrain or re-optimize the model. Optionally, the model may also be obtained by cooperative training or optimization using the distributed system of the present application. In the present application, a designated model ID may be assigned to each AI algorithm model obtained after training.
When the distributed system performs distributed cooperative training of the AI algorithm model, the input information of the model is collected or input by the first device. The transmission interface carries the feature data that the NPU of the first device outputs after feature extraction on a training sample. Using the label information, the second device trains on the training-sample feature data sent by the first device and obtains feedback data from the output part of the AI algorithm model. The second device can transmit this feedback data back to the input part of the model through the transmission interface, for training the input part on the first device, so that the network parameters of the two parts are adjusted in coordination and cooperative training of the first device and the second device is realized.
Taking cooperative training of the first feature extraction model and the first feature data processing model as an example: the first device performs feature extraction on a training sample according to the first feature extraction model to generate first training feature data, and sends the first training feature data to the second device. The second device trains the first feature data processing model according to the received first training feature data and the label information of the training sample, determines the feedback data for training the first feature extraction model, and transmits the feedback data to the first device. The first device then trains the first feature extraction model according to the feedback data.
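A minimal sketch of one such cooperative training round, assuming PyTorch, where the gradient of the loss with respect to the transmitted feature data plays the role of the feedback data; in a real system the two halves run on different devices and these tensors cross the transmission interface rather than staying in one process.

```python
import torch
import torch.nn.functional as F

def cooperative_training_round(input_part, output_part, sample, label,
                               opt_in, opt_out):
    # First device: extract first training feature data from the sample.
    feat = input_part(sample)
    feat_sent = feat.detach().requires_grad_(True)   # "transmitted" copy

    # Second device: train the feature data processing model with the label.
    loss = F.cross_entropy(output_part(feat_sent), label)
    opt_out.zero_grad()
    loss.backward()                                  # also fills feat_sent.grad
    opt_out.step()

    # Feedback data sent back over the transmission interface.
    feedback = feat_sent.grad

    # First device: train the feature extraction model with the feedback.
    opt_in.zero_grad()
    feat.backward(feedback)
    opt_in.step()
    return loss.item()
```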
When online training is started, an initial version ID may be assigned to the model obtained by offline training (for example, the initial version corresponds to the first feature extraction model and the first feature data processing model); if the model is retrained or re-optimized, a designated version ID is updated. After each training or online optimization of the model, an updated version ID is assigned to the input part and the output part (for example, the updated version corresponds to the second feature extraction model and the second feature data processing model). Thus, the input part and the output part of the model are jointly identified by the model ID and the version ID. Further, separate identifiers can be set for the input part and the output part. For example, for the input part of a model, a model ID, a version ID, and an input part ID may be set; for the output part, a model ID, a version ID, and an output part ID may be set. In another possible implementation, the model ID and the version ID may themselves be split between the input part and the output part: the model ID includes a model input part ID and a model output part ID, and the version ID includes an input part ID and an output part ID of the model version. The specific implementation is not limited in the present application.
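One possible encoding of these identifiers is sketched below; the field names and types are assumptions, since the application deliberately leaves the concrete implementation open.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelPartId:
    model_id: int     # assigned to each trained AI algorithm model
    version_id: int   # initial version; updated after each retraining/optimization
    part: str         # "input" (first device) or "output" (second device)

input_id = ModelPartId(model_id=7, version_id=2, part="input")
output_id = ModelPartId(model_id=7, version_id=2, part="output")
```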
In this case, when executing the AI application corresponding to the AI algorithm model, the first device may perform feature extraction on the acquired second media information according to the second feature extraction model of the updated version, obtain second feature data, and transmit the second feature data to the second device. The second device may then determine the second feature data processing model according to the model ID and the updated version ID, and obtain the inference result of the AI algorithm model of the AI application according to the second feature data processing model and the second feature data.
Considering that there may be a scenario with a plurality of first devices, the second device may receive the training data of the plurality of first devices to train its own neural network model, generate the corresponding feedback data for model training in each of the first devices, and send the pieces of feedback data to the respective first devices, so that each first device can perform model training using its corresponding feedback data.
Optionally, when a plurality of units, such as several media information acquisition units, a storage unit, and a media signal receiving unit, exist in the distributed system for AI application collaboration, a unit identifier may be set for each unit. The second device can then address the control information and the feedback data for training the AI algorithm model according to the unit ID of each unit. For example, when control information is sent to a media information acquisition unit, the message carrying the control information may carry the unit ID of that media information acquisition unit; when feedback data is sent to the neural network training unit, the message may carry the unit ID of the neural network training unit.
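A minimal sketch of such unit-ID addressing, with the message layout (a length-prefixed JSON header plus a binary body) assumed rather than specified by the application:

```python
import json

def build_unit_message(unit_id: int, msg_type: str, body: bytes) -> bytes:
    """Tag a control/feedback message with the destination unit's ID."""
    header = json.dumps({"unit_id": unit_id, "type": msg_type,
                         "len": len(body)}).encode()
    return len(header).to_bytes(2, "big") + header + body

# e.g., control information addressed to a media information acquisition unit:
msg = build_unit_message(unit_id=3, msg_type="control", body=b"\x01")
```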
Fig. 3a is a schematic structural diagram of a first device according to an embodiment of the present application. The first device 200 may include a processor 210, an external memory interface 220, an internal memory 221, a transmission interface 230, a media acquisition unit 240, and a communication unit 250. The media acquisition unit 240 may include: an audio unit 270, a speaker 270A, a microphone 270C, a headphone interface 270D, a sensor unit 280, a camera 281, and the like. The communication unit 250 may include: an antenna 1, an antenna 2, a mobile communication unit 251, and a wireless communication unit 252.
Processor 210 may include one or more processing units, such as: the processor 210 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and a Neural Network Processing Unit (NPU), for example. The different processing units may be separate devices or may be integrated into one or more processors. Wherein the controller may be a neural center and a command center of the first device 200. The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.
In an embodiment of the present application, the neural network processor may include a neural network processing unit, which is used to load the input part of the AI algorithm model corresponding to the first device, process the media information input to it (such as the media information acquired by the first device), and output the feature data of the media information.
The first device processes the media information using the input part of the AI algorithm model loaded on its neural network processing unit and obtains abstract feature data; it then transmits this feature data to the second device via its transmission interface, so that the second device performs further processing using the output part of the model and obtains the output result of the AI application.
The AI algorithm model may be determined by the server, by separate training on the first device or the second device, or by cooperative training of the first device and the second device, and it may be trained offline or online. The second device uses the output part of the same AI algorithm model whose input part is used by the first device. Accordingly, in the training process the first device may train the input part of the AI algorithm model while the second device trains the output part, so as to realize cooperative training of the AI algorithm model.
Optionally, the neural network processor may further include a neural network training unit, which enables the first device to train the AI algorithm model using annotation data. The training process may be performed offline on the first device, online on the first device, or in cooperation with the second device.
When training is performed cooperatively, the training data obtained by online training may be sent to the second device, so that the second device can train its own model according to the training data obtained by the first device and generate feedback data for training the model in the first device; the first device then trains its neural network model according to the feedback data.
A memory may also be provided in processor 210 for storing instructions and data. In some embodiments, the memory in the processor 210 is a cache memory. The memory may hold instructions or data that have just been used or recycled by processor 210. If the processor 210 needs to use the instruction or data again, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 210, thereby increasing the efficiency of the system.
The processor 210 may run the media information transmission method provided in the embodiments of the present application, so as to realize multi-device cooperation of the first device under the AI application and improve user experience. After the processor 210 runs the media information transmission method, the processor 210 may generate and send the feature data according to the acquired media information. Optionally, when the first device includes a display screen, it may also receive from the second device the media information and an instruction for playing it, and play the media information through the display screen. The processor 210 may include different devices; for example, when a CPU and a GPU are integrated, the CPU and the GPU may cooperate to execute the media information transmission method, for example with some of its algorithms executed by the CPU and others by the GPU, to obtain higher processing efficiency.
Internal memory 221 may be used to store computer-executable program code, including instructions. The processor 210 executes various functional applications of the first device 200 and data processing by executing instructions stored in the internal memory 221. The internal memory 221 may include a program storage area and a data storage area. Wherein the storage program area may store an operating system, codes of application programs (such as a camera application, a WeChat application, etc.), and the like. The storage data area may store data created during use of the first device 200 (e.g., images, videos, etc. captured by a camera application), etc.
The internal memory 221 may also store one or more computer programs corresponding to the data transmission algorithm provided in the embodiments of the present application. The one or more computer programs stored in the memory 221 and configured to be executed by the one or more processors 210 include instructions that can be used to perform the steps as in the respective embodiments of fig. 2a to 7b, and can be used to implement the media information transmission method as referred to in the embodiments of the present application. When the code of the data transmission algorithm stored in the internal memory 221 is executed by the processor 210, the processor 210 may execute the media information transmission method referred to in the embodiments of the present application.
In addition, the internal memory 221 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (UFS), and the like.
Of course, the codes of the data transmission algorithm provided in the embodiment of the present application may also be stored in the external memory. In this case, the processor 210 may execute the code of the data transmission algorithm stored in the external memory through the external memory interface 220, and the processor 210 may perform the media information transmission method referred to in the embodiments of the present application.
The camera 281 (a front camera or a rear camera, or one camera that can serve as both) is used to capture still images or video. In general, the camera 281 may include a lens group and an image sensor, where the lens group includes a plurality of lenses (convex or concave) for collecting the optical signal reflected by the object to be photographed and transferring it to the image sensor, and the image sensor generates an original image of the object according to the optical signal.
The first device 200 may implement an audio function through the audio unit 270, the speaker 270A, the receiver 270B, the microphone 270C, the headphone interface 270D, and the application processor, etc. Such as music playing, recording, etc. Alternatively, the first device 200 may receive a key 290 input, generating a key signal input related to user settings and function control of the first device 200.
The sensor unit 280 may include a distance sensor, a gyroscope sensor, an acceleration sensor, a proximity light sensor, a fingerprint sensor, a touch sensor, a temperature sensor, a pressure sensor, a magnetic sensor, an ambient light sensor, an air pressure sensor, a bone conduction sensor, and the like, not shown in the drawings.
The transmission interface in the first device 200 is used to connect other devices, so that the first device 200 can transmit media information with them. In one possible implementation, the transmission interface in the first device 200 may include a first transmission interface and a third transmission interface. The first transmission interface is connected to the processor of the first device; data to be sent to the second device 300 is encapsulated into first bitstream data through the first transmission interface and sent through the third transmission interface to the third transmission interface of the second device. Second bitstream data sent by the second device can be received through the third transmission interface, and the data or message corresponding to the second bitstream data is obtained by decapsulation through the first transmission interface (the second bitstream data being the data or message encapsulated by the second device through its second transmission interface). In this way, the transmission channel established between the third transmission interface of the first device and the third transmission interface of the second device supports bidirectional transmission.
In other embodiments, a plurality of first devices 200 may encapsulate the first feature data sent to the second device through their respective first transmission interfaces. For example, suppose N first devices need to send M pieces of first feature data. The M pieces of first feature data may be encapsulated as independent bitstream data through the first transmission interfaces of the N first devices (either into M bitstreams, or into N bitstreams grouped by device), and the encapsulated bitstream data is packaged into a packet (for example, a fourth message) through the third transmission interface and sent to the third transmission interface of the second device. The second device may then receive, through its third transmission interface, the fourth message in which the M pieces of first feature data of the N first devices are separately encapsulated, decapsulate it through the second transmission interface to obtain the M pieces of first feature data, and forward each piece to the feature data processing model corresponding to it for processing, so as to obtain the result of the first application.
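A minimal sketch of packing and unpacking such a fourth message, with the framing (big-endian, length-prefixed records carrying a device ID and a model ID) assumed rather than taken from the application:

```python
import struct

def pack_fourth_message(streams):
    """streams: list of (device_id, model_id, feature_bytes) records."""
    parts = [struct.pack(">H", len(streams))]      # record count
    for device_id, model_id, blob in streams:
        parts.append(struct.pack(">HHI", device_id, model_id, len(blob)))
        parts.append(blob)
    return b"".join(parts)

def unpack_fourth_message(buf):
    """Second device side: recover the individually encapsulated feature data."""
    (count,) = struct.unpack_from(">H", buf)
    off, records = 2, []
    for _ in range(count):
        device_id, model_id, n = struct.unpack_from(">HHI", buf, off)
        off += 8
        records.append((device_id, model_id, buf[off:off + n]))
        off += n
    return records
```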
When the transmission interface is a wired transmission interface 230, a cable adapted to the transmission interface 230 may be inserted into or pulled out of the transmission interface 230 to establish or break contact with the first device 200.
When the transmission interface of the first device 200 is a wireless communication interface, the wireless communication function of the first device 200 may be implemented by the antenna 1, the antenna 2, the mobile communication unit 251, the wireless communication unit 252, the modem processor, the baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the first device 200 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication unit 251 may provide a solution including 2G/3G/4G/5G, etc. wireless communication applied on the first device 200. The mobile communication unit 251 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication unit 251 can receive electromagnetic waves from the antenna 1, and can perform filtering, amplification, and other processing on the received electromagnetic waves, and transmit the electromagnetic waves to the modem processor for demodulation. The mobile communication unit 251 can also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave. In some embodiments, at least some of the functional units of the mobile communication unit 251 may be disposed in the processor 210. In some embodiments, at least some of the functional units of the mobile communication unit 251 may be provided in the same device as at least some of the units of the processor 210. In this embodiment of the application, the mobile communication unit 251 may further be configured to perform information interaction with the second device, that is, send a transmission request of the media information to the second device, and encapsulate the sent transmission request of the media information into a message in a specified format, or the mobile communication unit 251 may be configured to receive the transmission request of the media information sent by the second device.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating a low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor.
The modem processor may further include a channel coding unit and a decoding unit. The channel coding unit is configured to perform channel coding on the data signal acquired by the first device according to the link layer protocol and the physical layer protocol to obtain a physical layer signal, and to transmit the physical layer signal to the transmission interface. The data signal may be feature data determined by feature extraction on the media information, or data of the media information itself. Other parameters and device status information that need to be sent to the second device may also be transmitted through the transmission interface over the corresponding transmission channel according to the corresponding transmission protocol; such information may be sent together with the media information or through other transmission protocols, and the specific implementation is not limited in this application. The modem processor also includes a control information decoding unit, which is used to decode the control signal sent by the second device and to decode the feedback data received from the second device, where the feedback data is used for online training and optimization of the model.
In one possible implementation, the application processor outputs sound signals via an audio device (not limited to speaker 270A, receiver 270B, etc.) or displays images or video via a display screen. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication unit 251 or other functional units, independently of the processor 210.
The wireless communication unit 252 may provide a solution for wireless communication applied on the first device 200, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), bluetooth (bluetooth, BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like. The wireless communication unit 252 may be one or more devices integrating at least one communication processing unit. The wireless communication unit 252 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on electromagnetic wave signals, and transmits the processed signals to the processor 210. The wireless communication unit 252 may also receive a signal to be transmitted from the processor 210, frequency-modulate it, amplify it, and convert it into electromagnetic waves via the antenna 2 to radiate it. In this embodiment of the application, the wireless communication unit 252 is configured to establish a connection with the second device, and cooperatively complete the task of the AI application through the second device. Or the wireless communication unit 252 may be configured to access the access point device, send a message corresponding to the transmission request of the media information of the feature data to the second device, or receive a message corresponding to the transmission request of the media information sent from the second device. Optionally, the wireless communication unit 252 may also be used to receive media information from other devices.
Fig. 3b is a schematic structural diagram of a second device according to an embodiment of the present application.
The second device 300 may include a processor 310, an external memory interface 320, an internal memory 321, a transmission interface 330, an antenna 11, an antenna 12, a mobile communication unit 351, a wireless communication unit 352.
Processor 310 may include one or more processing units, such as: the processor 310 may include an Application Processor (AP), a modem processor, a Graphic Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, a Neural-Network Processing Unit (NPU), and the like. The different processing units may be separate devices or may be integrated into one or more processors. Wherein the controller can be a neural center and a command center of the second device 300. The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.
In an embodiment of the present application, the neural network processor may include a neural network processing unit, which is used to load the output parts of one or more AI algorithm models. The decoded feature data is transmitted to the neural network processing unit of the second device, which processes the feature data using the output part of the AI algorithm model corresponding to that feature data and obtains the final inference result of the AI algorithm processing, such as detection, classification, recognition, positioning, or tracking. The inference result is output to the artificial intelligence application, which uses it to realize functions such as biometric recognition, environment recognition, scene modeling, machine vision interaction, and voice interaction.
Optionally, the second device in this embodiment of the present application may further include: a neural network training unit that causes the second device to train the AI algorithm model using the annotation data. The training process may be performed offline on the second device, may be performed online on the second device, or may be performed in cooperation with the first device. Optionally, when the first device and the second device cooperate to perform joint online training and optimization on the AI model, feedback data obtained by online training may be sent from the model output part to the model input part through the interface system via the transmission interface, so that the first device trains the neural network model of the first device according to the feedback data. At this time, the second device may also transmit online training feedback data of the AI model to the first device.
Considering that there may be a scenario with a plurality of first devices, the second device may receive the training data of the plurality of first devices to train its own neural network model, generate the corresponding feedback data for model training in each of the first devices, and send the pieces of feedback data to the respective first devices, so that each first device can perform model training using its corresponding feedback data.
A memory may also be provided in the processor 310 for storing instructions and data. In some embodiments, the memory in the processor 310 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 310. If the processor 310 needs to reuse the instruction or data, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 310, thereby increasing the efficiency of the system.
The processor 310 may run the media information transmission method provided in the embodiments of the present application, so as to realize cooperation between the second device and the first device under the AI application and improve user experience. After the processor 310 runs the media information transmission method, the processor 310 may generate and send the feature data according to the acquired media information. Optionally, when the first device includes a display screen, the media information and an instruction for playing it may be sent to the first device and played through that display screen. Optionally, when the second device includes a display screen, it may receive media information sent by the first device, or process feature data received from the first device; when the resulting inference result of the AI algorithm processing is media information to be played, that media information may be played through the display screen.
The processor 310 may include different devices. For example, when a CPU and a GPU are integrated, the CPU and the GPU may cooperate to execute the media information transmission method provided in the embodiments of the present application; for instance, some algorithms of the media information transmission method are executed by the CPU and others by the GPU, so as to improve processing efficiency.
The internal memory 321 may be used to store computer-executable program code, which includes instructions. The processor 310 executes various functional applications of the second device 300 and data processing by executing instructions stored in the internal memory 321. The internal memory 321 may include a program storage area and a data storage area. Wherein the storage program area may store an operating system, codes of application programs (such as a camera application, a WeChat application, etc.), and the like. The storage data area may store data created during use of the second device 300 (e.g., images, videos, etc. captured by a camera application), etc.
The internal memory 321 may also store one or more computer programs corresponding to the data transmission algorithm provided in the embodiments of the present application. The one or more computer programs stored in the memory 321 and configured to be executed by the one or more processors 310 include instructions that can be used to perform the steps in the respective embodiments of fig. 2a to 7a, and can be used to implement the media information transmission method in the embodiments of the present application. When the code of the data transmission algorithm stored in the internal memory 321 is executed by the processor 310, the processor 310 may execute the media information transmission method referred to in the embodiments of the present application.
In addition, the internal memory 321 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.
Of course, the codes of the data transmission algorithm provided in the embodiment of the present application may also be stored in the external memory. In this case, the processor 310 may execute the code of the data transmission algorithm stored in the external memory through the external memory interface 320, and the processor 310 may perform the media information transmission method according to the embodiment of the present application.
The transmission interface 330 in the second device 300 is used to connect other devices so that the second device 300 can perform transmission of media information with other devices.
In one possible implementation, the transmission interface in the second device 300 may include a second transmission interface and a third transmission interface. The second transmission interface is connected to the processor of the second device; data to be sent by the second device 300 is encapsulated into second bitstream data through the second transmission interface and sent to the third transmission interface of the first device through the third transmission interface. First bitstream data from the first device can be received through the third transmission interface and decapsulated through the second transmission interface to obtain the feature data, media data, control information, feedback data, handshake data, messages, and the like sent by the first device. In this way, the transmission channel established between the third transmission interface of the first device and the third transmission interface of the second device supports bidirectional transmission.
In other embodiments, the second device 300 may further receive a fourth message through its transmission interface; the fourth message includes M pieces of first feature data of N first devices, where N and M are positive integers greater than 1 and M is greater than or equal to N. Specifically, the fourth message carrying the M pieces of first feature data respectively encapsulated by the N first devices may be received through the third transmission interface (the fourth message may be encapsulated as N data packets or as M data packets, which is not limited herein), and the fourth message is decapsulated through the second transmission interface to obtain the M pieces of first feature data. According to the feature data processing model corresponding to each of the M pieces of first feature data, each piece of first feature data is forwarded to its corresponding feature data processing model for processing, so as to obtain a result of the first application.
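A minimal sketch of the receiving side of this flow follows, assuming for illustration that the fourth message is a JSON-serialized list of packets, each carrying a device identifier, a model identifier, and a feature payload; the field names and the model stub are assumptions, not a format defined by this application.

```python
import json
from collections import defaultdict

def decapsulate(fourth_message: bytes) -> list:
    # Assume the data packets were serialized as JSON for illustration.
    return json.loads(fourth_message.decode("utf-8"))

def dispatch(packets, models):
    # Forward each piece of first feature data to its corresponding
    # feature-data processing model, keeping results per source device.
    results = defaultdict(list)
    for pkt in packets:
        model = models[pkt["model_id"]]
        results[pkt["device_id"]].append(model(pkt["payload"]))
    return dict(results)

models = {"cls-v1": lambda feats: max(feats)}  # hypothetical model stub
msg = json.dumps([
    {"device_id": "cam-1", "model_id": "cls-v1", "payload": [0.1, 0.9]},
    {"device_id": "cam-2", "model_id": "cls-v1", "payload": [0.4, 0.2]},
]).encode("utf-8")
print(dispatch(decapsulate(msg), models))  # {'cam-1': [0.9], 'cam-2': [0.4]}
```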
When the transmission interface 330 is a wired transmission interface, a cable adapted to the transmission interface may be inserted into or pulled out of the transmission interface 330, so as to establish or remove contact with the second device 300.
When the transmission interface of the second device 300 is a wireless communication interface, the wireless communication function of the second device 300 may be implemented by the antenna 11, the antenna 12, the mobile communication unit 351, the wireless communication unit 352, the modem processor, the baseband processor, and the like.
The antennas 11 and 12 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the second device 300 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 11 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication unit 351 may provide a solution for wireless communication including 2G/3G/4G/5G and the like applied on the second device 300. The mobile communication unit 351 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication unit 351 may receive electromagnetic waves from the antenna 11, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication unit 351 may also amplify the signal modulated by the modem processor and convert it into electromagnetic waves radiated through the antenna 11. In some embodiments, at least some of the functional units of the mobile communication unit 351 may be located in the processor 310. In some embodiments, at least some of the functional units of the mobile communication unit 351 may be provided in the same device as at least some of the units of the processor 310. In this embodiment, the mobile communication unit 351 may be further configured to perform information interaction with the first device, that is, to send a transmission request of media information to the first device, where the transmission request of media information may be encapsulated into a message in a specified format; or the mobile communication unit 351 may be configured to send a transmission instruction of media information to the first device or to send a message of control information to the first device.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating a low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor.
The modem processor may further include a channel encoding unit and a channel decoding unit.
The channel decoding unit may decode the data signal sent by the first device from the received physical layer signal of the first device according to the link layer and the physical layer protocol. The data signal may be feature data determined by the first device through feature extraction of the media information (the feature data may be an input of the neural network processing unit), or may be data of the media information. Of course, information such as other parameters and device states that need to be received by the second device may also be transmitted through the transmission interface according to the corresponding transmission protocol through the corresponding transmission channel. These information may be sent together with the media information, or may be sent through other transmission protocols, and the specific implementation manner is not limited in this application.
Correspondingly, the channel encoding unit may be used to encode the data signal sent by the second device. For example, the data signal may be a control instruction sent to the first device. The control instruction may be channel-encoded by the channel encoding unit of the second device according to the interface transmission protocol; the encoded control signal is modulated by the transmission interface, sent to the control channel, and transmitted to the first device through the transmission interface and the control channel of the second device, so that the first device can receive the control instruction through the control channel. The channel encoding unit may be further configured to encode feedback data sent to the first device, where the feedback data is used for online training optimization of the model on the first device.
In one possible implementation, the application processor outputs sound signals through an audio device or displays images or video through a display screen. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be independent of the processor 310 and provided in the same device as the mobile communication unit 351 or other functional units.
The wireless communication unit 352 may provide a solution for wireless communication applied on the second device 300, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), bluetooth (bluetooth, BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like. The wireless communication unit 352 may be one or more devices integrating at least one communication processing unit. The wireless communication unit 352 receives electromagnetic waves via the antenna 12, performs frequency modulation and filtering processing on the electromagnetic wave signal, and transmits the processed signal to the processor 310. The wireless communication unit 352 may also receive signals to be transmitted from the processor 310, frequency modulate them, amplify them, and convert them to electromagnetic radiation via the antenna 12. In this embodiment, the wireless communication unit 352 is configured to establish a connection with the first device, and complete the task of the AI application by cooperating with the first device. In some embodiments, the wireless communication unit 352 may further be configured to access the access point device, receive a message corresponding to the transmission request of the feature data sent by the first device, or send a message corresponding to the transmission request of the media information to the first device, and send a message corresponding to the control information to the first device. Optionally, the wireless communication unit 352 may also be used to receive media information from other first devices or information from other devices.
Fig. 3c is a schematic structural diagram of a distributed system in which a first device and a second device cooperate for an AI application according to an embodiment of the present application. The transmission interface provided in this embodiment may be the transmission interface 230 or the communication unit 250 in the first device, and the transmission interface 330 or the communication unit 350 in the second device; the figure takes the transmission interface 230 in the first device and the transmission interface 330 in the second device as examples. The transmission interface provided in the embodiments of the present application may be applicable to multiple transmission protocols, and may also be referred to as an aggregation interface or a NEW interface (NEW port); other names may also be used, which is not limited in the embodiments of the present application.
The transmission interface protocol in the embodiments of the present application supports transmission of the feature data output by the first device. The data volume of the feature data obtained after processing by the input part of the AI model is far lower than that of the original audio/video media data, so the feature data occupies less bandwidth, which alleviates the problems of high-bandwidth transmission and heavy consumption of bandwidth resources. In particular, under wireless communication technologies with smaller bandwidth, such as Wi-Fi and Bluetooth, real-time transmission over a low-bandwidth transmission interface is realized, creating the conditions for real-time distributed AI processing. In addition, because the original media information cannot be recovered from the transmitted feature data, the potential risk of private data leakage is addressed and the data security of media information transmission is improved.
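To make the bandwidth claim concrete, a rough back-of-the-envelope comparison is sketched below; the figures (1080p video at 30 fps, 24 bits per pixel, versus a 1024-element float32 feature vector per frame) are assumed, illustrative numbers, not values from this application.

```python
raw_bps  = 1920 * 1080 * 24 * 30   # uncompressed 1080p30 video, bits/s
feat_bps = 1024 * 32 * 30          # 1024 float32 features per frame, bits/s
print(f"raw video: {raw_bps / 1e6:.0f} Mbit/s")   # ~1493 Mbit/s
print(f"features : {feat_bps / 1e6:.2f} Mbit/s")  # ~0.98 Mbit/s
print(f"reduction: ~{raw_bps // feat_bps}x")      # ~1518x
```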
In some embodiments, the transmission interface protocol in the embodiments of the present application may support the first device and the second device in negotiating the software and hardware capabilities of AI processing, such as the NPU architecture and the model identifier (ID) and version identifier (ID) of the loaded AI algorithm model, and in determining whether a distributed system with AI application cooperation can be formed to complete the processing task of the corresponding AI application. In the capability negotiation process, the transmission interface supports bidirectional transmission of the AI algorithm model; for example, part or all of an AI algorithm model stored by the first device or the second device, or an AI algorithm model acquired from the network, may be transmitted to the first device or the second device, so as to implement loading of the input part and the output part of the AI algorithm model.
Optionally, through the transmission interface in the embodiments of the present application, the feedback data of the online training output by the second device may be sent to the first device, so as to perform online training or online optimization of the input part of the AI algorithm model on the first device; the first device may then feed the feature data produced by the further-trained input part back to the second device, thereby implementing online training and online optimization of the AI algorithm models of both the first device and the second device.
In other embodiments, the transmission interface may also be used to transmit multiple types of data. The transmission interface can transmit the output feature data of the input part of the AI model, and can also bidirectionally transmit control messages such as handshake signals and control signals. The transmission interface may also transmit signals of media information or other data signals, and can aggregate these signals for simultaneous, bidirectional transmission. For example, the transmission interface may support compatible transmission of media information and AI feature data: it may transmit standard media information data, and may also transmit feature data processed by the NPU of the acquisition end, as well as media signals, other data signals, and handshake and control signals.
In some embodiments, the transmission interface may be a unidirectional or a bidirectional transmission interface. Taking the unidirectional case as an example, a sending interface is provided at the sending end and a receiving interface at the receiving end, thereby implementing transmission of media data from the sending end to the receiving end. In one example, the transmission interface may be bidirectional; in this case, the transmission interface has both a sending function and a receiving function, that is, it supports bidirectional data transmission. For example, the transmission interface supports sending and receiving data signals, that is, it may serve as both the sending end and the receiving end of a data signal.
In some embodiments, the transmission interface has data aggregation transmission capability; for example, the protocol of the interface may support simultaneous transmission of media information and AI feature data in the same channel, using techniques such as data packing and mixing, if bandwidth allows.
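A minimal sketch of such aggregated transmission follows; the packet header layout (a 1-byte type tag plus a 4-byte big-endian length) is an assumed format used only for illustration, not the interface protocol itself.

```python
import struct

MEDIA, FEATURE = 0x01, 0x02  # assumed packet-type tags

def pack(ptype: int, payload: bytes) -> bytes:
    # 1-byte type + 4-byte big-endian length + payload
    return struct.pack(">BI", ptype, len(payload)) + payload

def unpack_stream(stream: bytes):
    # Walk the interleaved channel and yield (type, payload) pairs.
    offset = 0
    while offset < len(stream):
        ptype, length = struct.unpack_from(">BI", stream, offset)
        offset += 5
        yield ptype, stream[offset:offset + length]
        offset += length

# Media data and AI feature data interleaved on the same channel.
channel = pack(MEDIA, b"video-slice") + pack(FEATURE, b"feature-vector")
for ptype, payload in unpack_stream(channel):
    print("media" if ptype == MEDIA else "feature", payload)
```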
In some embodiments, the transport interface may transport raw or compressed media information, e.g., the transport interface may support bi-directional transport of media information and AI feature data by configuring multiple channels.
The transmission interface in the first device 200 may include a first transmission interface and a third transmission interface. The first transmission interface is connected to the processor of the first device; data to be sent to the second device 300 is encapsulated into first bitstream data through the first transmission interface and sent to the third transmission interface of the second device through the third transmission interface. Second bitstream data sent from the second device can be received through the third transmission interface and decapsulated through the first transmission interface to obtain the data or message corresponding to the second bitstream data (the second bitstream data being the data or message encapsulated by the second device through its second transmission interface).
The transmission interface in the second device 300 may include a second transmission interface and a third transmission interface. The second transmission interface is connected to the processor of the second device; data to be sent by the second device 300 is encapsulated into second bitstream data through the second transmission interface and sent to the third transmission interface of the first device through the third transmission interface. First bitstream data from the first device can be received through the third transmission interface and decapsulated through the second transmission interface to obtain the feature data, media data, control information, feedback data, handshake data, messages, and the like sent by the first device. In this way, the transmission channel established between the third transmission interface of the first device and the third transmission interface of the second device supports bidirectional transmission.
In other embodiments, the plurality of first devices 200 may share a common third transmission interface. In this case, the plurality of first bitstream data encapsulated by the first transmission interfaces of the plurality of first devices may be sent to the third transmission interface of the second device 300 through the shared third transmission interface. For example, N first devices generate M pieces of first feature data and encapsulate them through the N first transmission interfaces of the N first devices respectively, and the M pieces of first feature data are packaged into a fourth message through the third transmission interface. The fourth message is received through the third transmission interface of the second device 300, the M pieces of first feature data are decapsulated through the second transmission interface, and each piece of first feature data is forwarded to its corresponding feature data processing model for processing according to the feature data processing model corresponding to that feature data.
It should be noted that the data signal herein may be multimedia data, may also be characteristic data related in the embodiment of the present application, may also be control information for establishing a transmission link, and may also be used for transmitting other parameters and other data signals, which is not limited herein.
The following illustrates a specific transmission process of the transmission interface in the present application.
In one possible implementation, the sending end compresses and encrypts the data to be transmitted, passes it to the transmission interface through channel coding, and, after modulation, sends it to the physical layer channel of the interface.
Take the first device as an example of a sending end that sends feature data to the second device. The first device may further include a channel encoding unit, which performs channel coding on the feature data according to the data transmission protocol agreed by the transmission interface or the standard to obtain an encoded signal. Optionally, before performing channel coding on the feature data, the first device may further compress the feature data to further reduce the amount of data transmitted. Optionally, after performing channel coding on the feature data, the first device may encrypt the channel-coded feature data, modulate it into a physical layer signal according to the electrical layer and physical layer transmission protocol agreed by the transmission interface system or the standard, and send the physical layer signal from the output interface to the transmission channel.
In a possible implementation, after demodulating the physical layer signal, the receiving-end interface may perform channel decoding to obtain the transmitted data; correspondingly, the receiving end may also decrypt and decompress the decoded signal.
Take the second device as an example of a receiving end that receives the feature data sent by the first device. The second device may further comprise a channel decoding unit for receiving the feature data of the first device. For example, the physical layer signal may be received from the transmission channel through the input interface of the second device and demodulated to obtain the encoded signal. The channel decoding unit performs channel decoding on the received encoded signal according to the protocol of the transmission interface to obtain the feature data sent by the first device. Optionally, when the encoded signal is an encrypted and/or compressed signal, the channel decoding unit may further decrypt and decompress the encoded signal.
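The sending and receiving flows described above can be sketched end to end as follows. zlib stands in for the compression step, a XOR with a shared key for encryption, and a one-byte parity checksum for channel coding; all three are purely illustrative stand-ins for the actual interface protocol, which is not specified here.

```python
import zlib

KEY = 0x5A  # illustrative shared key

def xor(data: bytes) -> bytes:
    return bytes(b ^ KEY for b in data)

def channel_encode(data: bytes) -> bytes:
    parity = 0
    for b in data:
        parity ^= b
    return data + bytes([parity])  # append a 1-byte checksum

def channel_decode(frame: bytes) -> bytes:
    data, parity = frame[:-1], frame[-1]
    check = 0
    for b in data:
        check ^= b
    if check != parity:
        raise ValueError("channel error detected")
    return data

def send(feature_data: bytes) -> bytes:   # first-device side
    # compress -> encrypt -> channel-encode, matching the order above
    return channel_encode(xor(zlib.compress(feature_data)))

def receive(frame: bytes) -> bytes:       # second-device side
    # channel-decode -> decrypt -> decompress (the inverse order)
    return zlib.decompress(xor(channel_decode(frame)))

payload = b"feature-vector-bytes"
assert receive(send(payload)) == payload
```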
Optionally, the first device 200 or the second device 300 may further include a display screen for displaying images, videos, and the like. The display screen includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), and the like. In some embodiments, the first device 200 or the second device 300 may include 1 or S display screens, S being a positive integer greater than 1.

The display screen may be used to display information input by or provided to a user (e.g., video information, voice information, image information, text information) and various graphical user interfaces (GUIs). For example, the display screen may display a photograph, video, web page, or file. Alternatively, the display screen may display a graphical user interface as shown in fig. 4b, which may include a status bar, a concealable navigation bar, a time and weather widget, and icons of applications, such as a browser icon. The status bar includes the name of the operator (e.g., China Mobile), the mobile network (e.g., 4G), the time, and the remaining power. The navigation bar includes a back key icon, a home key icon, and a forward key icon. Further, in some embodiments, the status bar may also include a Bluetooth icon, a Wi-Fi icon, an add-on icon, and the like. In other embodiments, the graphical user interface shown in fig. 4b may also include a Dock bar, and the Dock bar may include common application icons.

When the processor 210 detects a touch or gesture event of a user's finger (or a stylus, etc.) on an application icon, in response to the touch or gesture event, the user interface of the application corresponding to that application icon is opened and displayed on the display screen. Illustratively, the display screen of the first device 200 or the second device 300 displays a main interface including icons of a plurality of applications (such as a camera application, a WeChat application, etc.). The user clicks the icon of the camera application in the main interface through the touch sensor, triggering the processor 210 to start the camera application and open the camera. The display screen then displays an interface of the camera application, such as a viewfinder interface.

Alternatively, the first device 200 or the second device 300 may generate a vibration alert (e.g., an incoming call vibration alert) using a motor. The indicator in the first device 200 or the second device 300 may be an indicator light and may be used to indicate a charging status, a power change, or a message, a missed call, a notification, and the like.
Alternatively, the first device 200 or the second device 300 may implement an audio function through an audio unit, an application processor, and the like. Such as music playing, recording, etc. Optionally, the audio unit may include: one or more of a speaker, a receiver, a microphone, and an earphone interface. Alternatively, the first device 200 or the second device 300 may receive a key input, and generate a key signal input related to user setting and function control of the first device 200 or the second device 300.
In the embodiments of the present application, the display screen may be an integrated flexible display screen, or a spliced display screen composed of two rigid screens and one flexible screen located between the two rigid screens.
It should be understood that in practical applications, the first device 200 may include more or less components than those shown in fig. 3a, and the second device 300 may include more or less components than those shown in fig. 3b, and the embodiment of the present application is not limited thereto. The illustrated first apparatus 200 or second apparatus 300 is merely an example, and the first apparatus 200 or second apparatus 300 may have more or fewer components than shown in the figures, may combine two or more components, or may have a different configuration of components. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
Fig. 4a is a flowchart of a media information transmission method according to an embodiment of the present application: a first device and a second device establish a communication connection for AI application cooperation and cooperate with each other, so that the first device and the second device jointly complete an AI task. The following description takes as an example a communication connection mode in which the second device has a display screen and initiates establishment of the AI application cooperation distributed system. In some embodiments, the first device may instead actively initiate establishment of a communication connection with the second device, which may be implemented with reference to the manner of this embodiment and is not limited herein. The method specifically comprises the following steps:
step 401: the second device discovers the first device through a device discovery protocol.
The first device and the second device may be connected to each other through their respective transmission interfaces and corresponding wired or wireless channels; for example, a wireless communication connection may be established through Bluetooth, NFC, or Wi-Fi, and a communication connection may also be established in a wired manner.
Step 402: the second device sends a capability negotiation request message to the first device.
The capability negotiation request message is used for requesting a transmission protocol supported by the first device, and the transmission protocol of the first device is used for indicating whether the first device supports transmission of the feature data.
The second device may discover the first device according to the device discovery protocol adapted in the embodiments of the present application. Furthermore, capability negotiation between the first device and the second device is performed through a control channel and a handshake protocol in the transmission interface, so that the second device determines information such as the type of the first device, software capability supporting AI processing, hardware capability supporting AI processing, a transmission protocol of supported media information, and the like, thereby determining whether the first device can establish communication connection with the second device, and being used for realizing a corresponding AI application cooperation function.
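As a sketch of this handshake, the following shows one way the request/response exchange over the control channel could be represented. The message fields, device attributes, and the compatibility rule are assumptions for illustration, not a schema defined by this application.

```python
# Capability negotiation request sent by the second device (assumed fields).
CAPABILITY_REQUEST = {
    "msg": "capability_negotiation_request",
    "wants": ["transport_protocols", "npu_arch", "model_id", "version_id"],
}

def build_response(device: dict) -> dict:
    # Capability negotiation response returned by the first device.
    return {
        "msg": "capability_negotiation_response",
        "transport_protocols": device["protocols"],
        "supports_feature_data": "feature" in device["protocols"],
        "npu_arch": device["npu_arch"],
        "model_id": device["model_id"],
        "version_id": device["version_id"],
    }

first_device = {"protocols": ["media", "feature"], "npu_arch": "npu-a",
                "model_id": "det-v1", "version_id": "1.2"}
resp = build_response(first_device)

# The second device checks whether a cooperative distributed system can form
# (assumed rule: feature-data transport plus a compatible NPU architecture).
can_cooperate = resp["supports_feature_data"] and resp["npu_arch"] == "npu-a"
print(can_cooperate)  # -> True
```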
In some embodiments, the establishment of the distributed system formed by the first device and the second device may be triggered by a corresponding AI application or in other manners, or the user may actively initiate the establishment of a communication connection between the first device and the second device. For example, on a first device or second device provided with an interface for the AI application cooperation function, the user may set the AI application cooperation function of that device. For example, an interface corresponding to the AI application cooperation function of the first device may be provided on the first device, so that the user may set the AI application cooperation function of the first device on that interface; an interface corresponding to the AI application cooperation function of the second device may likewise be provided on the second device, so that the user may set the AI application cooperation function of the second device on that interface.
In some scenarios, the first device or the second device does not have an interface display function. In this case, the AI application cooperation function of the first device and the second device may be set on the device that has the interface display function; that is, the first device and the second device may be set as a whole to perform the AI application cooperation function. For example, when the first device has the interface display function, the AI application cooperation function of the first device may be set on it, and the AI application cooperation function of the second device may also be set, so as to complete the setting of the AI application cooperation function formed by the first device and the second device, which is not limited herein.
The following description will take the manner of user active triggering as an example.
For example, the second device may be provided with an interface for the AI application cooperation function. Illustratively, as shown in fig. 4b, the control interface 410 is a control interface for the AI application cooperation function of the second device, and the user can operate the control interface 410 to set whether to establish the AI cooperation function with the first device. For example, the user may turn the AI application collaboration function on or off through the on/off control 420 of the AI application collaboration function.
The user opens the AI application collaboration function of the second device. Identifiers of the first devices discovered by the second device may be displayed in the AI application collaboration function interface of the second device. The identifier of a first device may include a device icon of the first device and/or the discovery name used by the first device when it serves as a sender of feature data in AI application cooperation.
In some embodiments, in response to a click operation by the user, the second device displays a first device-lookup interface 430, which may be a device-lookup interface of the AI application collaboration service of the second device. The device-lookup interface 430 may include identifiers of discoverable first devices; specifically, as shown in fig. 4c, it includes an icon 450 of the first device and the discovery name 440 used by the first device when it serves as a sender of feature data for AI application collaboration. This makes it convenient for the user to distinguish the device types of the AI application collaboration feature-data sending ends discovered by the second device, for example, a split-type smart screen, a mobile phone accessory, a monitoring device, or a vehicle-mounted sensor device. As shown in fig. 4c, the first device may be a smart screen 1, a camera 1, a headset 1, or AR glasses. In some other embodiments of the present application, the first device-lookup interface may not distinguish the discovered devices according to the device types of the collaboration service ends.
For example, in some other embodiments, the user may operate a prompt box in a notification bar or task bar of the second device, and in response to the operation, the second device opens the first device-lookup interface. In some other embodiments, the user may operate an associated icon in a notification bar or task bar of the second device, and in response to the operation, the second device opens the first device-lookup interface. In response to a selection operation on the first device, the second device may send a capability negotiation request to the first device.
The capability negotiation request may be a handshake request message sent to the first device through a control channel in a transmission interface of the second device and a corresponding handshake protocol. The capability negotiation request is used for requesting the capability information of the first device, such as media information acquisition capability, main parameters of media information, AI software and hardware processing capability, and a transmission protocol supported by a transmission interface.
Step 403: the first device sends a capability negotiation response message to the second device.
Wherein the capability negotiation response message may be used to confirm that the first device supports a transmission protocol for transmitting the feature data. Optionally, the capability negotiation response message may also be used to confirm the capability information of the first device. For example, the capability negotiation response message further includes at least one of: a feature extraction capability of the first device, and a version of a feature extraction model in the first device.
In some embodiments, after receiving the capability negotiation request message sent by the second device, the first device may return a capability negotiation response to the second device, where the capability negotiation response message may include: the first device includes capability information such as media information acquisition capability, main parameters of the media information, AI software and hardware processing capability, and a transmission protocol supported by the transmission interface.
In response to the capability negotiation response message of the first device, the second device may determine, according to the capability negotiation response message, whether the first device is capable of forming a distributed system for AI application collaboration (e.g., collaboration of the first application) through the respective transmission interfaces; that is, whether the transmission interface of the first device and the transmission interface of the second device can support transmitting the feature data of the media information, whether the first device has the AI processing capability for performing feature extraction on the media information, and whether the first device supports the processing capability of the input part of the model corresponding to the AI application.
When the second device determines that the first device supports the AI application coordination, it may send a capability negotiation confirmation message to the first device. The capability negotiation confirmation message may be used to prompt the first device whether to start the AI application coordination function, and to notify the first device that the capability negotiation succeeded.
In some embodiments, when the AI application collaboration capability negotiation succeeds, a reminder message for the capability negotiation confirmation message may also be displayed on the second device, as shown in (a) in fig. 4d, to prompt the user whether to turn on the AI application collaboration function of the first device and the second device. A setting control may be provided on the interface of the reminder message for jumping to the AI application cooperation interface, so that the user may create an AI application cooperation with the first device.
In some embodiments, when the AI application cooperation capability negotiation fails (for example, the NPU architectures of the first device and the second device are incompatible and cannot support the same AI model), the second device may return a capability negotiation failure message to the first device. As shown in (b) in fig. 4d, a view-details control may be provided on the interface of the alert message for jumping to an interface showing the result of the failed AI application cooperation capability negotiation, so that the user may view the specific failure result. At this time, the user may determine, according to the negotiation failure result, whether media information is still transmitted with the first device to implement the AI application function of the second device. In another possible way, as shown in (b) in fig. 4d, the interface of the alert message may also prompt the user whether to transmit media information with the first device, so as to implement the AI application function of the second device.
In one possible scenario, the first device and the second device may negotiate the capability to transfer media information. And determining the transmission mode of the media information which is simultaneously supported and transmitted by the first equipment, the second equipment and the corresponding transmission interface through the corresponding handshake protocol. Therefore, the first device may establish a communication link for transmitting the media information with the second device, and based on the acquired media information, the first device may perform media encoding and channel encoding on the media information according to the media encoding modes supported by the first device and the second device, and send a signal of the encoded media information to the second device. At this time, the second device decodes the received signal of the media information to obtain corresponding media information, and processes the media information according to the requirement of the AI application. When the media information needs to be displayed/played on the second device, the second device can display/play the media information. When the processing result of the media information needs to be displayed/played on the first device, the second device may perform media encoding and channel encoding on the processing result of the media information, and generate a signal of the media information that enables the first device to display/play, so that the first device may receive the signal of the media information and perform display/play.
Step 404: the second device sends an authentication request message to the first device.
Wherein the authentication request message is used for requesting whether the first device establishes a trusted communication connection with the data processing apparatus (the second device), and the communication connection is used for confirming the authority of the data processing apparatus to control the first device.
In some embodiments, in response to an operation of the AI application cooperation request of the first interface by the user, the second device may send a security authentication request message to the first device; the security authentication request message is used for requesting the second device to acquire the control authority of units used for the AI application cooperative function, such as a media acquisition unit of the first device.
The authentication mode corresponding to the security authentication request may include manual authorization, unified biometric authentication authorization, account authorization, cloud authorization, near field communication authorization, and the like. Take the manner in which the user enters a username and password as an example. The second device may display an authentication interface of the first device, which prompts the user to input a username and password for login authentication of the first device. After the second device receives the user's confirmation input, it may carry the username and password input by the user in the security authentication request message and send the message to the first device. The first device receives the security authentication request message of the second device from the transmission interface and verifies the validity of the username and password carried in the request. After the first device has verified the validity of the second device, the first device may send a security authentication response message to the second device, where the security authentication response message is used to notify the second device whether it can obtain the control authority of the units used for the AI application cooperation function, such as the media collection unit of the first device.
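The manual-authorization path can be sketched as follows. The credential table, hashing scheme, and unit names are illustrative assumptions; a real implementation would additionally require a secure channel and salted credentials.

```python
import hashlib

# Assumed credential store on the first device (illustrative only).
CREDENTIALS = {"owner": hashlib.sha256(b"owner-password").hexdigest()}

def handle_security_auth_request(username: str, password: str) -> dict:
    # Verify the username/password carried in the security auth request.
    ok = CREDENTIALS.get(username) == hashlib.sha256(
        password.encode()).hexdigest()
    return {
        "msg": "security_auth_response",
        "authorized": ok,
        # Units the second device may now control (assumed granularity).
        "granted_units": ["media_capture", "storage"] if ok else [],
    }

print(handle_security_auth_request("owner", "owner-password"))
print(handle_security_auth_request("owner", "wrong"))
```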
Step 405: the first device sends an authentication response message to the second device.
Wherein the authentication response message is used to confirm whether the first device establishes a trusted communication connection with the data processing apparatus.
In some embodiments, the second device may send an authentication success message to the first device.
Wherein the authentication success message includes: the device identifier corresponding to the first device, and the identifier of the distributed system in which the first device and the second device are located.
In some embodiments, in response to the security authentication response message of the first device, the second device may assign an identifier of a distributed system to the distributed system formed by the first device and the second device for AI application cooperation; that is, the first device and the second device may be considered to form a super device or a distributed system. In the distributed system, the first device and the second device may each be assigned a corresponding device identifier for data transmission within the distributed system. The identifier of the distributed system may be used by the second device to establish a communication link with the first device, and to establish, as required, a corresponding media data link, a transmission link for the feature data obtained by the corresponding model of the AI application, and a control link. Therefore, the second device can receive the feature data sent by the first device through the feature data transmission link and, after processing the feature data, use it for the AI application. The second device may receive the media information sent by the first device over the media data link. The second device may further send a control instruction to the first device through the control link, where the control instruction may be used to control units in the first device (the media information acquisition unit, the storage unit, the media information playing unit, and the like), for example, the start and end of media information acquisition by the media information acquisition unit, and parameter adjustment control and operation of the media information acquisition unit. Through the cooperation of the first device and the second device, the corresponding AI application task is implemented on the media information acquired by the first device.
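As an illustration, the bookkeeping the second device might perform after successful authentication could look like the following sketch; the identifier formats and link names are assumptions, not values defined by this application.

```python
import uuid

def build_distributed_system(first_name: str, second_name: str) -> dict:
    # One distributed-system ID, per-device IDs, and the three logical links.
    system_id = f"dsys-{uuid.uuid4().hex[:8]}"
    return {
        "system_id": system_id,
        "devices": {first_name: f"{system_id}/dev-1",
                    second_name: f"{system_id}/dev-2"},
        "links": {
            "feature_data": "open",     # feature data: first -> second
            "media_data": "on_demand",  # media information, as required
            "control": "open",          # control instructions: second -> first
        },
    }

print(build_distributed_system("camera-1", "phone"))
```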
When the security authentication fails, the second device cannot obtain the control authority of the units required for AI application cooperation in the first device, and a notification message indicating that the AI application cooperation between the second device and the first device has failed can be displayed on the second device.
When the security authentication partially fails, for example, when the media collection unit in the first device is authenticated successfully but the storage unit fails authentication, a communication connection for AI application cooperation may be established between the successfully authenticated units and the second device. Further, the AI application cooperation interface of the second device may display a notification message that the media acquisition units of the second device and the first device have been authenticated successfully and the AI application cooperation has been established successfully. For another example, if the AI processing unit of the first device is authenticated successfully, the second device may establish a communication connection for AI application cooperation with the AI processing unit of the first device. That is, the second device may configure device identifiers for the AI processing units of the second device and the first device, so as to form a distributed system with AI application cooperation, which is used to establish a communication link for the feature data between the first device and the second device, and also a communication link for the media information. The first device may send the feature data to the second device, so that the second device performs the operation of the output part of the model of the AI application according to the feature data sent by the first device to obtain the inference result of the AI application, thereby implementing the AI application cooperation of the first device and the second device.
In some embodiments, there may be multiple first devices and second devices. One first device may initiate establishment of a communication connection for AI application cooperation, or multiple devices may jointly initiate a capability negotiation request and a security authentication request to a server. In this case, the server may confirm the capability of the first device and the second device to establish AI application cooperation and perform security authentication on them; after the capability negotiation succeeds and the authentication succeeds, the second device may be configured with the corresponding control authority of the first device, and a device identifier is configured for the formed distributed system.
When a distributed system formed by AI application cooperation contains a plurality of first devices, or a plurality of units of a first device used in the AI application cooperation, such as the media information acquisition unit, the storage unit, and the transmission interface, unit identifiers under the device identifiers may be allocated to the different units. As shown in (c) of fig. 4b, on the display interface shown after the AI application cooperation between the second device and the first device is established successfully, the units that can establish AI application cooperation with the second device, their unit identifiers, and the states of the units (whether AI application cooperation is started) may be displayed. In other embodiments, the units for establishing AI application cooperation with the second device may also be actively set by the user, which is not limited in this application.
Therefore, after the media information acquired by the plurality of first devices or the plurality of media information acquisition units is subjected to feature extraction by the input part of each model, the source of each piece of feature data can be distinguished by its unit identifier, and the feature data is aggregated by the transmission interface and transmitted to the second device in a unified manner. The second device can then identify each piece of feature data by its unit identifier and perform the corresponding processing, and can send control instructions carrying the respective unit identifiers to the corresponding devices. This implements AI application cooperation among multiple devices and makes it convenient for the second device to control and manage the multiple first devices or units in a unified manner.
In some other embodiments of the present application, after the validity of the second device is verified by the first device, the second device may further display a notification message to notify the user that the second device has successfully established a connection with the first device, which may establish a transmission scenario for cooperating with the second device AI application with the media data in the first device.
Further, in the process of establishing a communication connection between the first device and the second device, in order to avoid frequent operation of the AI application cooperation function by the user, in a possible implementation manner, the second device may preset a trigger condition for starting the AI application cooperation function, for example, when the first device establishes a communication connection with the second device and the second device starts a corresponding AI application, the second device automatically starts the AI application cooperation function. The AI application triggering the AI application cooperation function may be determined by setting a white list, as shown in fig. 4b, where the white list may be set on an interface of the AI application cooperation function by a user, or may be set by default through factory settings, which is not limited herein.
Correspondingly, the first device may also preset a trigger condition for starting the AI application cooperation function, for example, after the first device establishes a communication connection with the second device, the first device automatically starts the AI application cooperation function.
In another possible implementation, considering that the communication connection is used for implementing the AI application cooperation function, the user may set the AI application cooperation control interface in the second device to: the searched first device is the first device with the AI application cooperation function opened. Therefore, the second device can establish communication connection only with the first device with the AI application cooperation function opened, thereby avoiding unnecessary communication connection establishment and wasting network resources. Illustratively, when the second device determines that the first device turns on the AI application cooperation function, a communication connection is established with the first device.
After the communication connection for the AI application cooperation function is established between the first device and the second device, it is confirmed through the established communication connection that the first device and the second device have correspondingly loaded the input part and the output part of the AI algorithm model. The first device has the media information acquisition capability and can load the input part of the AI algorithm model; the second device can load the output part of the AI algorithm model; and each of the first device and the second device may load the input parts and output parts of one or more AI algorithm models.
It must be ensured that the input part and the output part of the AI algorithm model loaded on the first device and the second device can be used for the current AI application cooperation function. In one possible implementation, whether the input part and the output part of the corresponding AI algorithm model are respectively loaded on the first device and the second device may be determined, through capability negotiation between the first device and the second device, according to the AI algorithm model corresponding to the AI application.
In some embodiments, the second device may also query the second device for the model ID and the version ID of the loaded AI algorithm model of the first device through the capability negotiation request message. Correspondingly, after receiving the capability negotiation request sent by the second device, the first device may return a capability negotiation response to the second device, where the capability negotiation response message may include: the model ID and version ID of the AI algorithm model loaded by the first device.
In some embodiments, the model ID and version ID of the AI algorithm model may be displayed on the AI application collaboration interface and the load status displayed. For example, as shown in (a) of fig. 4e, the cell 1 of the first device is used to establish AI application cooperation for the first application with the second device, and at this time, a model ID and a version ID corresponding to the first application loaded by the first device may be displayed in the display field of the cell 1.
Take the case where the algorithm model is stored on the second device as an example. The determination may be made by checking whether the model ID and version ID corresponding to the input part of the AI algorithm model loaded by the first device and the model ID and version ID corresponding to the output part of the AI algorithm model loaded by the second device are consistent or compatible.
When the second device determines that the model ID and the version ID corresponding to the input portion of the loaded AI algorithm model of the first device are consistent or compatible with the model ID and the version ID corresponding to the output portion of the AI algorithm model loaded by the second device, it may be confirmed that the establishment of the AI application cooperation of the first device and the second device is completed.
When the second device determines that the model ID and version ID corresponding to the input part of the AI algorithm model loaded by the first device are inconsistent or incompatible with the model ID and version ID corresponding to the output part of the AI algorithm model loaded by the second device, the second device may determine that the model of the first device needs to be updated. At this time, the second device may display a loading-failure interface on the display screen, and may further display a prompt box on the display interface, where the prompt box is used to prompt the user whether to update the model, or the version of the model, of the first device. In response to an instruction to update the input part of the AI algorithm model, the second device may display an update interface on the display screen. For example, as shown in (b) of fig. 4e, the cell 1 of the first device is used to establish AI application cooperation for the first application with the second device; at this time, the model ID and version ID corresponding to the first application loaded by the first device may be displayed as to-be-updated in the display field of cell 1. An input part of the AI algorithm model whose model ID and version ID are consistent or compatible with those of the output part of the AI algorithm model loaded by the second device may then be transmitted to the first device and loaded on the first device. Optionally, as shown in (b) of fig. 4e, the progress or status of the update may be displayed in the update interface on the display screen; when it is determined that the loading of the input part of the AI algorithm model on the first device is completed, it may be determined that the AI application cooperation of the first device and the second device is established.
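The consistency check described above can be sketched as follows; the compatibility policy (same model ID and same major version) is an assumed rule for illustration, not one fixed by this application.

```python
def compatible(input_part: dict, output_part: dict) -> bool:
    # Input part loaded on the first device vs. output part on the second.
    if input_part["model_id"] != output_part["model_id"]:
        return False
    # Assumed policy: versions with the same major number are compatible.
    return (input_part["version_id"].split(".")[0]
            == output_part["version_id"].split(".")[0])

loaded_on_first  = {"model_id": "det-v1", "version_id": "1.2"}
loaded_on_second = {"model_id": "det-v1", "version_id": "1.4"}

if compatible(loaded_on_first, loaded_on_second):
    print("AI application cooperation established")
else:
    print("prompt user to update the first device's model")
```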
When the input part and the output part of the AI algorithm model are stored on the second device, the input part and the output part may be loaded onto the first device and the second device, respectively, with reference to the above embodiment. Alternatively, the input part and the output part of the AI algorithm model may be stored on a server; in this case, the loading of the input part and the output part onto the first device and the second device may be completed by the server, which is not described herein again.
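For illustration only, the consistency/compatibility check described above might look like the following sketch; the ModelInfo structure, the compatibility table, and the model/version values are assumptions, not formats defined by this application.

```python
from dataclasses import dataclass

@dataclass
class ModelInfo:
    model_id: str
    version_id: str

# Hypothetical table of version pairs that are compatible without being equal.
COMPATIBLE_VERSIONS = {("v1.1", "v1.0"), ("v1.2", "v1.0")}

def parts_match(input_part: ModelInfo, output_part: ModelInfo) -> bool:
    """Return True when the input part loaded on the first device is
    consistent or compatible with the output part on the second device."""
    if input_part.model_id != output_part.model_id:
        return False
    if input_part.version_id == output_part.version_id:
        return True
    return (input_part.version_id, output_part.version_id) in COMPATIBLE_VERSIONS

# Example: decide whether to trigger the update flow of fig. 4e (b).
if not parts_match(ModelInfo("gesture", "v0.9"), ModelInfo("gesture", "v1.0")):
    print("versions differ: prompt the user to update the input part")
```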
For an AI application, the input part and the output part may correspond one to one, or a plurality of input parts may correspond to one output part; that is, one AI algorithm model may include a plurality of input parts and one output part. Each input part corresponds to one first device, and the output part corresponds to the second device.
For example, in a scene where machine vision processing is performed on video information acquired by a plurality of cameras, the plurality of cameras may serve as a plurality of first devices, and the device that obtains the machine vision processing result may serve as the second device. For another example, data acquired by multi-modal sensors such as a camera and a laser radar may be used for applications such as environment understanding; in this case, the camera serves as one first device and the lidar serves as another first device, that is, each sensor serves as one first device, and the device that obtains the processing result of the environment-understanding application serves as the second device. The media signals on the one or more first devices may then be processed by a plurality of AI algorithm model input parts loaded on a plurality of different neural network processing units. That is, each first device performs feature extraction on the media information it acquires and sends the extracted feature data to the second device, so that the second device obtains the feature data corresponding to the media information of the sensors of the multiple first devices.
In a possible implementation, multiple pieces of feature data obtained from one or more first devices may be packaged independently, aggregated at the transmission interface, and then transmitted in a unified manner. The second device may recover the feature data corresponding to each first device from the received data packets, synchronize the feature data of each first device, and then input it to the AI algorithm model output part loaded on the neural network processing unit of the second device for processing, to obtain an inference result for the subsequent AI application, thereby realizing a collaborative AI task among multiple devices. A minimal sketch of this packaging and recovery is given below.
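The following sketch illustrates the independent packaging and recovery described above; the packet layout (device ID, timestamp, payload length) is an assumption for illustration, not the interface format defined by this application.

```python
import struct

HEADER = "!HQI"  # device ID (2 B), timestamp in ms (8 B), payload length (4 B)

def pack_feature(device_id: int, timestamp_ms: int, feature: bytes) -> bytes:
    """Independently package one piece of feature data with its source ID."""
    return struct.pack(HEADER, device_id, timestamp_ms, len(feature)) + feature

def unpack_features(stream: bytes):
    """Recover (device_id, timestamp_ms, feature) tuples from an aggregate."""
    offset = 0
    while offset < len(stream):
        device_id, ts, length = struct.unpack_from(HEADER, stream, offset)
        offset += struct.calcsize(HEADER)
        yield device_id, ts, stream[offset:offset + length]
        offset += length

# The second device groups recovered features by device (and timestamp)
# before feeding them to the output part of the AI algorithm model.
aggregate = pack_feature(1, 1000, b"\x01\x02") + pack_feature(2, 1000, b"\x03")
by_device = {dev: feat for dev, ts, feat in unpack_features(aggregate)}
print(by_device)  # {1: b'\x01\x02', 2: b'\x03'}
```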
In another possible implementation, one or more first devices may establish a communication link with each second device, so that each first device may send the corresponding feature data to each second device. Each second device may then input the received feature data from each first device to the output part of the AI algorithm model on that second device for processing, to obtain an inference result of the AI algorithm model, thereby realizing a collaborative AI task among multiple devices.
Example one
This example can be applied to scenarios such as split televisions and split AR/VR. The first device may be the screen end of a split television, an AR/VR head-mounted display device, or the like. The second device may be a split-television host box, a mobile phone, a PC, a game console, or the like. Fig. 5a is a schematic diagram of the system architecture corresponding to this example. The first device and the second device establish a communication connection for AI application cooperation through corresponding transmission interfaces, forming a distributed system of AI application cooperation. In this scenario, as shown in fig. 5b, the flow of the media information transmission method according to the embodiment of the present application may include the following steps:
step 501: the first device obtains media information.
The first device may be a device with audio and video acquisition, display, and playback capabilities, as well as AI algorithm model processing hardware.
The original audio and video signals are collected by the audio and video collecting unit of the first device, preprocessed by the processing unit of the first device to obtain the media information to be transmitted, and the media information is transmitted to the NPU of the first device. For example, as shown in fig. 5c, the first device is the screen end of a split tv, and a plurality of video image frames of a person collected by the camera of the first device serve as the media information to be transmitted.
Step 502: the first device performs feature extraction on the media information according to an input part (e.g., a first feature extraction model) of the AI algorithm model, and determines feature data.
The first device determines, according to the currently determined cooperatively processed AI application (for example, application 1), the input part of the AI algorithm model of the first application loaded in the NPU of the first device, so that feature extraction is performed on the media information through the input part of the AI algorithm model of the first application in the NPU of the first device to obtain feature data. Taking the first application as an application related to machine vision interaction as an example, as shown in fig. 5d, the AI algorithm model corresponding to the first application may be an AI algorithm model for recognizing a gesture of a person. In this case, feature extraction may be performed on the plurality of video image frames of the person captured by the camera of the first device according to the input part of the AI model for gesture recognition, to determine the extracted feature data. A minimal sketch of such an input/output split follows.
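The following sketch illustrates splitting an AI algorithm model into an input part and an output part, assuming PyTorch and an arbitrary toy network; the layer sizes and the split point are illustrative, not the actual gesture-recognition model.

```python
import torch
import torch.nn as nn

full_model = nn.Sequential(          # stand-in for a gesture-recognition model
    nn.Conv2d(3, 16, 3, padding=1),  # input part: runs on the first device NPU
    nn.ReLU(),
    nn.AdaptiveAvgPool2d((8, 8)),
    nn.Flatten(),                    # ---- split point ----
    nn.Linear(16 * 8 * 8, 10),       # output part: runs on the second device
)
input_part = full_model[:4]   # first feature extraction model
output_part = full_model[4:]  # first feature data processing model

frame = torch.rand(1, 3, 224, 224)  # one captured video image frame
feature_data = input_part(frame)    # feature extraction on the first device
print(feature_data.shape)           # torch.Size([1, 1024]) -- sent to device 2
```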
Step 503: the first device sends the feature data to the second device.
The feature data is transmitted to the NPU of the second device. The second device is provided with media information processing and control capabilities, for example, AI algorithm model processing hardware and media information processing and human machine interaction capabilities.
Step 504: the second device processes the feature data according to an output portion (e.g., a first feature data processing model) of the AI algorithm model.
The second device may obtain the inference result of the AI algorithm model by processing the feature data with the output part of the AI algorithm model. The inference result of the AI algorithm model is then provided to the subsequent AI application program to obtain the processing results of AI application tasks such as voice interaction, machine vision interaction, and environment modeling.
In some embodiments, the processing result of the AI application obtained by the second device needs to be displayed on the display interface of the first device. For example, as shown in fig. 5d, a plurality of video image frames of a person captured on the first device serve as the media information to be transmitted; the obtained feature data may then be processed according to the output part of the AI model for gesture recognition, to recognize the gesture in the video images of the person captured by the first device. The recognized gesture, as the processing result of the AI application, may be rendered by generating a recognition box at the corresponding position of the image. Since the processing result of the AI application is to be displayed on the display screen of the first device, the second device may send a control instruction carrying the gesture recognition result to the first device, instructing the first device to display the gesture recognition result at the corresponding position on the display screen, so that the user can determine from the displayed result that the gesture recognition succeeded. Continuing the sketch above, this step may look as follows.
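This continues the sketch from step 502 (reusing output_part and feature_data defined there): the second device runs the output part and returns a display instruction. The message fields are hypothetical.

```python
logits = output_part(feature_data)       # step 504 on the second device
gesture_id = int(logits.argmax(dim=1))   # inference result of the AI model

# Hypothetical control instruction: draw a recognition box on the first
# device's display at the recognized gesture's position.
control_instruction = {
    "type": "show_recognition_box",
    "gesture_id": gesture_id,
    "box": [120, 80, 260, 240],  # x0, y0, x1, y1 in display coordinates
}
print(control_instruction)
```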
Optionally, step 505a: the second device sends a first message to the first device.
Wherein the first message is used for indicating the state of the first device for collecting the media data, and the state of the first device for collecting the media data comprises at least one of the following items: an on state, an off state, or parameters for collecting media data.
Step 506a: the first device adjusts, in response to the first message, the state in which it collects media data.
In other embodiments, when the second device has control over the camera of the first device, the second device may further determine, according to the gesture recognition result, whether the parameter settings of the camera of the first device are reasonable. When it determines that the parameters need to be adjusted, it may generate a corresponding control instruction for adjusting the parameters of the camera of the first device (for example, the pose of the camera, the focal length, the focus position, the type of camera that is turned on, and the like).
Optionally, step 505b: the second device sends a second message to the first device.
In some embodiments, the second message is for instructing the first device to acquire the first data.
Step 506b: the first device acquires the first data.
In response to the second message, the first device may acquire the first data over a network, read it from a storage unit of the first device, or collect it through a media information acquisition unit of the first device. The first device may then transmit the first data to the second device. The first data may be one of: media data collected by the first device, data stored by the first device, or data received by the first device.
For another example, when the second device needs to control the media information acquisition unit of the first device to adjust its parameters, the second device may first send a second message to the first device, where the second message may be used to request the parameters of the media information acquisition unit of the first device.
In other embodiments, the second device may control the first device to collect media information based on the inference result of the AI application; in this case, the second message may be used to instruct the first device to collect feature data of third media data. For example, when the second device determines, according to the inference result of the AI application, that audio information needs to be acquired by the first device, the second message may instruct the first device to acquire the corresponding audio information; optionally, the second message may also indicate the model ID and the version ID of the model to be used for feature extraction after the audio information is acquired. The first device then responds to the second message by collecting the third media data, performs feature extraction on the third media data according to the feature extraction model corresponding to the indicated model ID and version ID to obtain third feature data, and sends the third feature data to the second device. After receiving the third feature data, the second device processes it according to the feature data processing model corresponding to the same model ID and version ID, obtaining the inference result of the AI application for the audio information. The task of the AI application is thus better fulfilled. A minimal sketch of this exchange follows.
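A sketch of this second-message exchange; the field names, the model registry keyed by (model ID, version ID), and the helper callables are assumptions for illustration.

```python
# Hypothetical second message sent by the second device.
second_message = {
    "type": "collect_media",
    "media": "audio",            # the third media data to collect
    "model_id": "speech_feat",   # model to apply for feature extraction
    "version_id": "v1.0",
}

def handle_second_message(msg, extractors, capture_audio, send):
    """First-device handler: collect the third media data, extract the
    third feature data with the indicated model, and send it back."""
    if msg["type"] == "collect_media" and msg["media"] == "audio":
        audio = capture_audio()                                  # third media data
        extractor = extractors[(msg["model_id"], msg["version_id"])]
        send(extractor(audio))                                   # third feature data

# Example wiring with stand-in callables:
handle_second_message(
    second_message,
    extractors={("speech_feat", "v1.0"): lambda a: a[:16]},  # toy "extraction"
    capture_audio=lambda: bytes(320),                        # toy audio frame
    send=print,
)
```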
In some embodiments, the second device may further generate a third message to be sent to the first device according to the received operation information of the user and/or the processing result of the AI model.
For example, the user may operate an interface of an AI application (e.g., application 1); the operation may be a click operation, a gesture operation, or a voice instruction operation, which is not limited herein. The second device may receive the user's operation instruction on the interface of the AI application, as collected by the first device, and thereby generate a third message for the first device according to the operation instruction and the processing result of the AI model.
Optionally, step 505c: the second device sends a third message to the first device.
Wherein the third message is determined by the second device based on the first characteristic data; the third message is used for indicating the content displayed by the first device.
Step 506c: the first device, in response to the third message, displays through a display unit the content that the third message indicates the first device should display.
For example, when a gesture operation instruction of the user is used to open a corresponding AI application, the second device determines, by recognizing the gesture, that the instruction is for opening that AI application; the second device then opens the AI application according to the operation instruction and sends the media information required for displaying the opening interface of the AI application to the first device. After the first device receives the media information, the opening interface of the AI application may be displayed on the display screen of the first device.
For another example, take the operation instruction of the user as an operation for jumping to a video interface and displaying a corresponding video. In this case, the second device acquires the media information of the corresponding video according to the recognized operation instruction of the user, and sends the media information to the first device through the transmission interface to be displayed and played on the display screen of the first device, thereby responding to the operation instruction of the user and completing the human-computer interaction.
Example two
Fig. 6a is a schematic diagram of the system architecture of example two, in which the second device is provided with a display screen. The first device may be a terminal device such as an external camera accessory, a vehicle-mounted camera, a household monitoring camera, a smart home appliance with video acquisition capability, the screen end of a split television, or an AR/VR head-mounted display. The second device may be a terminal device with strong computing and display capabilities, such as a split-television host box, a mobile phone, a vehicle-mounted host, a PC, or a game console. The first device and the second device establish a communication connection for AI application cooperation through corresponding transmission interfaces, forming a distributed system of AI application cooperation. In this scenario, as shown in fig. 6b, the flow of the media information transmission method according to the embodiment of the present application may include the following steps:
step 601: the first device obtains media information.
The first device may be a device with audio and video acquisition capability and AI algorithm model processing hardware.
The original audio and video signals are collected by the audio and video collecting unit of the first device, preprocessed by the processing unit of the first device to obtain the media information to be transmitted, and the media information is transmitted to the NPU of the first device. For example, as shown in fig. 6c, taking the first device as a sensor unit in a vehicle-mounted device, the video images of the road and the vehicle surroundings collected by the camera of the first device during driving serve as the media information to be transmitted.
Step 602: the first device performs feature extraction on the media information according to an input part (e.g., a first feature extraction model) of the AI algorithm model, and determines feature data.
An input part of the AI algorithm model is loaded in the NPU of the first device, so that feature extraction is performed on the media information through the input part of the AI algorithm model in the NPU to obtain feature data. For example, the AI application is an application related to automatic driving; in this case, the AI algorithm model corresponding to the AI application may be an AI algorithm model for recognizing the environment, such as the lane in which the vehicle is traveling and the road conditions. Feature extraction may then be performed on the lane images collected by a sensor of the first device (e.g., a radar sensor or a camera) according to the input part of the AI model for road recognition, to determine the extracted feature data. For example, as shown in fig. 6c, the media information collection unit for AI application cooperation on the first device may be a sensor unit 1 (e.g., a radar sensor); the media information collected by the sensor unit 1 is subjected to feature extraction by the first feature extraction model corresponding to the sensor unit 1, to obtain feature data 1.
Step 603: the first device sends the feature data to the second device.
The feature data is transmitted to the NPU of the second device. The second device is provided with media information processing and control capabilities, e.g., AI algorithm model processing hardware, as well as media information processing and human-computer interaction capabilities, display capabilities, and the like.
Step 604: the second device processes the feature data according to an output portion (e.g., a first feature data processing model) of the AI algorithm model.
The second device may obtain the inference result of the AI algorithm model by processing the feature data with the output part of the AI algorithm model. The inference result of the AI algorithm model is then provided to the subsequent AI application program to obtain the processing results of AI application tasks such as voice interaction, machine vision interaction, and environment modeling.
With reference to the above example, in step 604, the feature data 1, obtained by feature extraction from the point-cloud image of the lane collected by the radar sensor of the first device, is input to the output part of the corresponding lane-recognition AI algorithm model for processing, to obtain the lane information in the image (as shown in fig. 6c, it may be determined that the vehicle is about to enter the 2nd lane from the left). This provides the second device with more accurate positioning information of the first device (i.e., the lane in which the vehicle is located), so that a better navigation path can be provided to the first device according to this positioning information.
In other embodiments, the first device may include media information collection devices of multiple types, to provide more media information to the AI application and achieve better AI algorithm results. Still taking the multiple types of sensors in a vehicle as an example: under certain weather conditions, a large error may occur when only a single sensor is used to identify the current road; in this case, the media information collected by multiple types of sensors can be used for comprehensive identification, yielding a better road identification result.
For example, the collected media information may be one video/image signal or a combination of a plurality of video/image signals, and each video/image signal may be a visible light image or a video/image signal of another modality, such as an infrared image, a radar signal, or depth information. The media information from the acquisition units on the one or more first devices may then be processed by input parts of the AI algorithm models loaded on different NPUs to extract the corresponding feature data. For example, as shown in fig. 6c, the media information collection unit for AI application cooperation on the first device may include a sensor unit 1 (e.g., a radar sensor) and a sensor unit 2 (e.g., a camera). The media information collected by the sensor unit 1 is subjected to feature extraction by the first feature extraction model corresponding to the sensor unit 1, to obtain feature data 1; the media information collected by the sensor unit 2 is subjected to feature extraction by the first feature extraction model corresponding to the sensor unit 2, to obtain feature data 2. The feature data output by each NPU can then be independently packaged, aggregated at the transmission interface, and transmitted to the second device. According to the received feature data 1 and feature data 2, the second device inputs them into the output parts of their respective AI algorithm models for processing, or inputs them into one AI algorithm model for fusion processing, to obtain a better recognition result, as sketched below.
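The multi-sensor fusion variant might look like the following sketch, again assuming PyTorch; the feature dimensions and the choice of fusion by concatenation are illustrative assumptions.

```python
import torch
import torch.nn as nn

radar_extractor = nn.Sequential(nn.Flatten(), nn.Linear(64, 32))    # sensor unit 1
camera_extractor = nn.Sequential(nn.Flatten(), nn.Linear(128, 32))  # sensor unit 2
fusion_head = nn.Linear(64, 4)  # output part: scores for 4 candidate lanes

feature_1 = radar_extractor(torch.rand(1, 64))    # on first device, NPU 1
feature_2 = camera_extractor(torch.rand(1, 128))  # on first device, NPU 2

# On the second device: synchronize both streams, concatenate, and infer.
fused = torch.cat([feature_1, feature_2], dim=1)
lane = int(fusion_head(fused).argmax(dim=1))  # e.g. index of the recognized lane
print(lane)
```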
Step 605: the processing result of the AI application is displayed on the second device.
In some embodiments, the processing results of the AI application may be displayed on the second device. For example, as shown in fig. 6d, a lane where the current vehicle is located may be displayed on the display screen of the second device, a navigation path planned for the user based on the lane where the vehicle is located may be displayed, and the like.
A control instruction for the first device may also be generated according to the processing result of the task of the AI application; for details, refer to the embodiment in fig. 5b, which is not described herein again.
The first device and the second device are connected through the transmission interface to form a distributed system of AI application cooperation, which combines the information perception capability and lightweight AI processing capability of the first device with the stronger computing hardware, AI processing capability, and interaction capability of the second device, to cooperatively complete the tasks of AI applications such as voice interaction, visual interaction, and environment modeling.
Example three
As shown in fig. 7a, the first device and the second device may form a distributed voice interaction system with AI application cooperation. The first device may have audio information acquisition capability and AI algorithm model processing hardware, for example, a smart headset, a smart speaker, an AR/VR head-mounted display, a vehicle-mounted audio collection device, or a smart home appliance with audio collection capability. The second device may be a terminal device with strong computing capability, such as a mobile phone, a smart television, or a vehicle-mounted host. Optionally, the second device may also have a display function. Considering that mainly audio signals are transmitted in this example, the transmission interface corresponding to the AI application cooperation may be a wireless transmission system such as wifi or bluetooth, or may carry electrical signals over a wire, optical signals over an optical fiber, and the like. As shown in fig. 7b, the present application provides a schematic flow of a media information transmission method, which specifically includes:
step 701: the first device obtains media information.
The first device may be a device with audio capture capability and AI algorithm model processing hardware.
The original audio signal is collected by the audio unit of the first device, preprocessed by the processing unit of the first device to obtain the media information to be transmitted, and the media information is transmitted to the NPU of the first device. For example, as shown in fig. 7c, the first device is a smart headset that establishes AI application cooperation with the second device (e.g., an AI application for noise reduction or voice interaction); media information such as the voice input by the user or environmental noise is collected by the microphone unit of the first device.
Step 702: the first device performs feature extraction on the media information according to an input part (e.g., a first feature extraction model) of the AI algorithm model, and determines feature data.
An input part of the AI algorithm model is loaded in the NPU of the first device, so that feature extraction is performed on the media information through the input part of the AI algorithm model in the NPU to obtain feature data. For example, if the AI application is an application related to voice recognition interaction, the AI algorithm model corresponding to the AI application may be an AI algorithm model for recognizing speech; feature extraction may then be performed on the audio information collected by the microphone of the first device according to the input part of the AI model for speech recognition, to determine the extracted feature data. For another example, if the AI application is an application related to automatic noise reduction, the AI algorithm model corresponding to the AI application may be an AI algorithm model for identifying environmental noise; feature extraction may then be performed on the audio information collected by the microphone of the first device according to the input part of the AI model for noise identification, to determine the extracted feature data.
Step 703: the first device sends the feature data to the second device.
The feature data is transmitted to the NPU of the second device. The second device is provided with media information processing and control capabilities, e.g., AI algorithm model processing hardware, as well as media information processing and human-computer interaction capabilities, display capabilities, and the like.
Step 704: the second device processes the feature data according to an output portion (e.g., a first feature data processing model) of the AI algorithm model.
The second device may obtain the inference result of the AI algorithm model by processing the feature data with the output part of the AI algorithm model. The inference result is then provided to the subsequent AI application program to obtain the processing results of voice interaction tasks such as speech recognition, natural language processing, and voiceprint recognition.
In connection with the above example, the speech recognition result captured by the first device may be determined from the processing of the feature data; for example, the speech recognition result is a request to search for a specified video. In this case, as shown in fig. 7d, the second device may display the speech recognition result on its display interface and search for the corresponding video according to the result; further, it may display the found video on the AI collaboration interface or jump to the corresponding video playing application, thereby completing the task of the voice-recognition-interaction AI application.
In combination with the above example, the noise identification result may be determined from the processing of the feature data acquired by the first device, so that corresponding noise-reduction audio information is generated according to the noise identification result and sent to the first device through the transmission interface. When the microphone of the first device records audio, or the audio/video playing unit plays audio or video, noise reduction of the recording or playback of the first device is then achieved through the noise-reduction audio information.
The audio information acquired by the first device is not transmitted to the second device directly; instead, it is converted into abstract feature data through the input part of the AI algorithm model before transmission. After model processing, the feature data exhibits a clear loss of information, and the audio and video information directly intelligible to a person cannot be recovered from it, which improves privacy protection. Moreover, the data volume of the feature data is significantly lower than that of the original audio and video information, so it can be transmitted in real time over a channel with a smaller bandwidth; this saves an additional compression coding process, reduces the power consumption and latency of the system, lowers cost, and improves product competitiveness. A rough estimate of the bandwidth saving is sketched below.
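As a purely illustrative back-of-the-envelope estimate of the bandwidth point above (the feature frame rate and dimension are assumptions, not figures from this application):

```python
# Raw audio versus feature data, in bytes per second (illustrative numbers).
raw_rate = 16_000 * 2          # 16 kHz, 16-bit mono PCM
feature_rate = 50 * 128 * 2    # 50 feature frames/s, 128 dims, 16 bits each

print(f"raw audio:    {raw_rate} B/s")
print(f"feature data: {feature_rate} B/s")
print(f"reduction:    {raw_rate / feature_rate:.1f}x")  # 2.5x in this toy case
```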
Combining the audio sensing capability and lightweight AI processing capability of the sending end with the stronger computing hardware and AI processing capability of the second device cooperatively completes voice interaction tasks such as speech recognition, natural language processing, and voiceprint recognition. The interface does not need to support bidirectional transmission of video information, so the required bandwidth is low, making this approach well suited to short-range wireless transmission.
The above embodiments of the present application can be combined arbitrarily to achieve different technical effects.
In the embodiments provided in the present application, the methods provided in the embodiments of the present application are described from the perspective of the first device and the second device as execution subjects. In order to implement the functions in the method provided by the embodiments of the present application, the electronic device may include a hardware structure and/or a software module, and the functions are implemented in the form of a hardware structure, a software module, or a hardware structure and a software module. Whether any of the above-described functions is implemented as a hardware structure, a software module, or a hardware structure plus a software module depends upon the particular application and design constraints imposed on the technical solution.
Based on the same concept, fig. 8 shows an electronic device 800 of the present application, which includes: a transceiver module 801, an acquisition module 803, and a processing module 802, and optionally, the electronic device 800 may further include a display module. Exemplarily, the electronic apparatus 800 may be a first device in the embodiment of the present application. At this time, the transceiving module 801 includes a first transmission interface.
An acquisition module 803, configured to acquire first media information;
a processing module 802, configured to perform feature extraction on the first media information, and determine first feature data of the first media information; and sending the first characteristic data to a second device through the first transmission interface, wherein the first characteristic data is used for the second device to obtain the result of the first application.
In a possible implementation manner, the transceiver module 801 is configured to receive, through the first transmission interface, a capability negotiation request message sent by the second device; the capability negotiation request message is used for requesting a transmission protocol supported by the first device and the feature extraction capability of the first device; the transmission protocol of the first device is used for indicating that the first device supports transmission characteristic data; the feature extraction capability of the first device is used for indicating that the first device supports extracting first feature data of the first media information; sending a capability negotiation response message to the second device through the first transmission interface; the capability negotiation response message is used for confirming that the first device supports a transmission protocol for transmitting the feature data and the feature extraction capability of the first device.
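Purely as an illustration, the capability negotiation exchange above could be represented by messages such as the following; the field names and values are assumptions, since the embodiment does not define a concrete message encoding.

```python
# Hypothetical capability negotiation messages (field names assumed).
capability_negotiation_request = {
    "type": "capability_negotiation_request",
    "want": ["transmission_protocol", "feature_extraction_capability"],
}

capability_negotiation_response = {
    "type": "capability_negotiation_response",
    "transmission_protocol": "feature-data/1.0",  # supports transmitting feature data
    "feature_extraction": True,                   # can extract first feature data
    "model_id": "gesture",                        # optional: loaded model info
    "version_id": "v1.0",
}
```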
In one possible implementation, the processing module 802 is configured to send a first notification message to the second device through the transceiver module 801 in response to a first operation on a first application; the first device is an electronic device which establishes communication connection with the second device; the first notification message is used for requesting the first device to establish first application cooperation with the second device; a transceiver module 801, configured to receive a first response message returned by the second device; the first response message is used for confirming that the first device and the second device start first application cooperation.
In a possible implementation manner, the transceiver module 801 is configured to receive a first notification message sent by the second device; the first device is an electronic device which establishes communication connection with the second device; the first notification message is used for requesting the first device to establish first application cooperation with the second device; a processing module 802, configured to send a first response message to the second device through the transceiver module 801 in response to a third operation on the first application; the first response message is used for confirming that the first device and the second device start first application cooperation.
In a possible implementation manner, the transceiver module 801 is configured to send a capability negotiation request message to the second device through the first transmission interface; the capability negotiation request message is used for requesting a transmission protocol supported by the second device and the feature data processing capability of the second device, and the transmission protocol of the second device is used for indicating that the second device supports transmission of feature data; the feature data processing capability of the second device is used for indicating the capability of the second device supporting the processing of the first feature data to obtain the result of the first application; receiving a capability negotiation response message from the second device through the first transmission interface; the capability negotiation response message is used for confirming that the second device supports a transmission protocol for transmitting the feature data and the feature data processing capability of the second device.
In a possible implementation manner, the processing module 802 is configured to obtain a first feature extraction model through the transceiver module 801; the first feature extraction model is used for extracting features of the first media information, the version of the first feature extraction model corresponds to the version of a first feature data processing model, and the first feature data processing model is used for processing the first feature data by the second device to obtain a result of the first application.
In one possible implementation manner, the capability negotiation response message further includes: a version of a feature extraction model in the first device; or, a version of a feature data processing model in the second device.
In a possible implementation manner, the transceiver module 801 is configured to receive the first feature extraction model from the second device through the first transmission interface, or receive the first feature extraction model from a server, or read the first feature extraction model stored by the first device.
In a possible implementation manner, the transceiver module 801 is configured to send a first feature data processing model to the second device through the first transmission interface; the version of the first feature extraction model corresponds to the version of the first feature data processing model, and the first feature data processing model is used for the second device to process the first feature data to obtain the result of the first application.
In a possible implementation manner, the processing module 802 is configured to obtain, through the transceiver module 801, a second feature extraction model, where a version of the second feature extraction model corresponds to a version of a second feature data processing model, and the second feature extraction model and the second feature data processing model are determined after the first feature extraction model and the first feature data processing model are updated.
In a possible implementation manner, the processing module 802 is configured to perform feature extraction on a training sample according to the first feature extraction model to generate first training feature data; a transceiver module 801, configured to send the first training characteristic data to the second device through the first transmission interface; the first training feature data is used to train the first feature extraction model and the first feature data processing model.
In a possible implementation manner, the transceiver module 801 is configured to receive feedback data from the second device through the first transmission interface, where the feedback data is determined after the second device is trained according to the first training characteristic data; the feedback data is used by the first device to train the first feature extraction model.
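One way to read the training exchange above is as split learning, where the first training feature data are the activations at the split point and the feedback data are the gradients returned for the input part; this reading, and the toy models below, are assumptions for illustration (PyTorch).

```python
import torch
import torch.nn as nn

extractor = nn.Linear(8, 4)  # first feature extraction model (first device)
head = nn.Linear(4, 2)       # first feature data processing model (second device)

x, label = torch.rand(1, 8), torch.tensor([0])

act = extractor(x)                          # first training feature data
act_remote = act.detach().requires_grad_()  # as received by the second device

loss = nn.functional.cross_entropy(head(act_remote), label)
loss.backward()                             # gradients for the head (second device)
feedback = act_remote.grad                  # feedback data for the first device

act.backward(feedback)                      # gradients for the extractor (first device)
# An optimizer step on each side would then update the respective model parts.
```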
In a possible implementation manner, the transceiver module 801 is configured to receive a first message from the second device through the first transmission interface, where the first message is used for indicating the state of the first device in collecting the media information; and the first device, in response to the first message, adjusts the state in which it collects the media information.
One possible implementation manner, the state of the first device collecting the media information includes at least one of the following: an on state, an off state, or parameters to collect media information.
In a possible implementation manner, the transceiver module 801 is configured to receive a second message from the second device through the first transmission interface; wherein the second message is used for instructing the first device to acquire first data; a processing module 802, configured to, in response to the second message, obtain the first data, or collect the first data; sending the first data to the second device; the first data is one of: the media information collected by the first device, the parameters of the first device, the data stored by the first device, and the data received by the first device.
In a possible implementation manner, the transceiver module 801 is configured to send the first data to the second device through the first transmission interface.
In a possible implementation manner, the transceiver module 801 is configured to receive a second message from the second device through the first transmission interface, where the second message is used to instruct the first device to collect feature data of third media information; collecting the third media information in response to the second message; extracting the characteristics of the third media information to obtain third characteristic data; and sending the third characteristic data to the second equipment through the first transmission interface.
In a possible implementation manner, the second message or the first message is determined by the second device according to the first feature data.
In a possible implementation manner, the transceiver module 801 is configured to receive, through the first transmission interface, a third message from the second device, where the third message is determined by the second device according to the first feature data, and the third message is used to indicate content displayed by the first device; a processing module 802, configured to, in response to the third message, display, by a display module, content in the third message for indicating the first device to display.
In a possible implementation manner, the transceiver module 801 is configured to receive, through the first transmission interface, an authentication request message sent by the second device, where the authentication request message is used to request whether the first device establishes a communication connection with the second device, and the communication connection is used to confirm that the second device controls the authority of the first device; sending an authentication response message to the second device through the first transmission interface; the authentication response message is used for confirming the authority of the second device to control the first device.
In a possible implementation manner, the transceiver module 801 is configured to receive, through the first transmission interface, an authentication success message sent by the second device; the authentication success message includes: and the device identifier corresponds to the first device, and the identifiers of the distributed system in which the first device and the second device are located.
In one possible implementation, the electronic device 800 may further include a first module; the authentication success message further includes at least one of: an identification of a first module of the first device, and an identification of the first module in the distributed system.
In a possible implementation manner, the transceiver module 801 may further include a third transmission interface; the first equipment and the second equipment establish channel connection through a third transmission interface; the feature data or the message sent by the first device is sent through the third transmission interface after being encapsulated into first bit stream data through the first transmission interface.
In a possible implementation manner, the first device and the second device establish a channel connection through a third transmission interface; the message received by the first device arrives as second bit stream data through the third transmission interface, and the message is obtained by decapsulating the second bit stream data through the first transmission interface.
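A minimal sketch of the encapsulation path in the two implementations above; the frame layout (magic number, message type, length) is an assumption, not the bit stream format defined by this application.

```python
import struct

MAGIC = 0xA15E
FRAME_HEADER = "!HBI"  # magic (2 B), message type (1 B), payload length (4 B)

def encapsulate(msg_type: int, payload: bytes) -> bytes:
    """Transmission interface: wrap feature data or a message into bit stream data."""
    return struct.pack(FRAME_HEADER, MAGIC, msg_type, len(payload)) + payload

def decapsulate(bitstream: bytes):
    """Peer transmission interface: recover (msg_type, payload) from bit stream data."""
    magic, msg_type, length = struct.unpack_from(FRAME_HEADER, bitstream, 0)
    assert magic == MAGIC, "not a recognized frame"
    header = struct.calcsize(FRAME_HEADER)
    return msg_type, bitstream[header:header + length]

frame = encapsulate(0x01, b"feature-data")  # carried over the third interface
assert decapsulate(frame) == (0x01, b"feature-data")
```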
Based on the same concept, fig. 9 shows an electronic device 900 of the present application, which includes: a transceiver module 901 and a processing module 902, and optionally, the electronic device 900 may further include a display module. Exemplarily, the electronic apparatus 900 may be the second device in the embodiment of the present application. At this time, the transceiving module 901 includes a second transmission interface.
A transceiver module 901, configured to receive first feature data from a first device through the second transmission interface, where the first feature data is determined after the first device performs feature extraction on the collected first media information;
the processing module 902 is configured to process the first feature data to obtain a processing result of the first application.
In a possible implementation manner, the processing module 902 is configured to send, through the transceiver module 901, a first notification message to the first device in response to a second operation on the first application; the first device is an electronic device that has established a communication connection with the second device; the first notification message is used for requesting the first device to establish first application cooperation with the second device; and to receive a first response message returned by the first device, where the first response message is used for confirming that the first device and the second device start the first application cooperation.
In a possible implementation manner, the processing module 902 is configured to receive, through the transceiver module 901, a first notification message sent by a first device; the first device is an electronic device which establishes communication connection with the second device; the first notification message is used for requesting the first device to establish first application cooperation with the second device; in response to the fourth operation on the first application, sending a first response message to the first device through the transceiving module 901; the first response message is used for confirming that the first device and the second device start first application cooperation.
In a possible implementation manner, the transceiver module 901 is configured to send a capability negotiation request message to the first device through the second transmission interface; the capability negotiation request message is used for requesting a transmission protocol supported by the first device and the feature extraction capability of the first device; the transmission protocol of the first device is used for indicating that the first device supports transmitting feature data; the feature extraction capability of the first device is used for indicating that the first device supports extracting first feature data of the first media information; and to receive, through the second transmission interface, a capability negotiation response message sent by the first device, where the capability negotiation response message is used for confirming that the first device supports a transmission protocol for transmitting the feature data.
In a possible implementation manner, before the first feature data is received from the first device, the following is further included:
a transceiver module 901, configured to receive, through the second transmission interface, a capability negotiation request message sent by the first device; the capability negotiation request message is used for requesting a transmission protocol supported by the second device and the feature data processing capability of the second device, and the transmission protocol of the second device is used for indicating that the second device supports transmission of feature data; the feature data processing capability of the second device is used for indicating the capability of the second device to process the first feature data to obtain the result of the first application; and to send a capability negotiation response message to the first device through the second transmission interface, where the capability negotiation response message is used for confirming that the second device supports a transmission protocol for transmitting the feature data and the feature data processing capability of the second device.
In a possible implementation manner, the processing module 902 is configured to obtain a first feature data processing model; the first characteristic data processing model is used for the second equipment to process the first characteristic data to obtain a result of the first application; the version of the first feature extraction model corresponds to the version of the first feature data processing model, and the first feature extraction model is used for performing feature extraction on the first media information.
In one possible implementation manner, the capability negotiation response message further includes: a version of a feature extraction model in the first device; or, a version of a feature data processing model in the second device.
In a possible implementation manner, the transceiver module 901 is configured to receive the first feature data processing model from the first device through the second transmission interface, or receive the first feature data processing model from a server; or the processing module 902 is configured to read the first feature data processing model stored in the second device.
In a possible implementation manner, the transceiver module 901 is configured to send a first feature extraction model to the first device through the second transmission interface; the version of the first feature extraction model corresponds to the version of the first feature data processing model, and the first feature data processing model is used for the second device to process the first feature data to obtain the result of the first application.
In a possible implementation manner, the processing module 902 is configured to obtain a second feature data processing model, where a version of the second feature data processing model corresponds to a version of the second feature extraction model, and the second feature extraction model and the second feature data processing model are determined after the first feature extraction model and the first feature data processing model are updated.
In a possible implementation manner, the processing module 902 is configured to receive the first training feature data through the transceiver module 901; the first training feature data is determined after the first equipment performs feature extraction on a training sample according to the first feature extraction model; and training the first feature data processing model according to the first training feature data.
In a possible implementation manner, the processing module 902 is configured to obtain feedback data of the first feature extraction model; the feedback data is determined after the second equipment is trained according to the first training characteristic data; the feedback data is used for the first device to train the first feature extraction model; the feedback data is sent to the first device through the transceiver module 901.
In a possible implementation manner, the transceiver module 901 receives second feature data sent by the first device; the second feature data is determined after the first device performs feature extraction on the collected second media information according to the second feature extraction model; and the processing module 902 is configured to process the second feature data according to the second feature data processing model to obtain a result of the first application.
In a possible implementation manner, the transceiver module 901 is configured to send a first message to the first device through the second transmission interface; the first message is used for indicating the state of the first device for collecting the media information.
One possible implementation manner, the state of the first device collecting the media information includes at least one of the following: an on state, an off state, or parameters to collect media information.
In a possible implementation manner, the transceiver module 901 is configured to send a second message to the first device through the second transmission interface; the second message is used for instructing the first equipment to acquire first data; the first data is one of: the media information collected by the first device, the parameters of the first device, the data stored by the first device, and the data received by the first device.
In a possible implementation manner, the transceiver module 901 is configured to receive the first data from the first device through the second transmission interface.
In a possible implementation manner, the transceiver module 901 is configured to send a second message to the first device through the second transmission interface; the second message is used for instructing the first equipment to collect the characteristic data of the third media information; receiving third characteristic data sent by the first equipment through the second transmission interface; the third feature data is determined after the first device performs feature extraction on the acquired third media information.
In a possible implementation manner, the first message or the second message is determined according to a processing result of the first feature data.
In a possible implementation manner, the processing module 902 is configured to generate a third message in response to a processing result of the first feature data; the third message is used for indicating the content displayed by the first device.
In a possible implementation manner, the number of first devices is N, and the following is further included:
the transceiver module 901 is configured to receive a fourth message through the second transmission interface, where the fourth message includes M pieces of first feature data of the N first devices, N and M are positive integers greater than 1, and M is greater than or equal to N;
a processing module 902, configured to process the M first feature data according to the feature data processing models corresponding to the M first feature data, so as to obtain a result of the first application.
In a possible implementation manner, the transceiver module 901 is configured to send an authentication request message to the first device through the second transmission interface, where the authentication request message is used to request whether the first device establishes a communication connection with the second device, and the communication connection is used for confirming the authority of the second device to control the first device; and to receive, through the second transmission interface, an authentication response message sent by the first device, where the authentication response message is used for confirming whether the first device establishes the communication connection with the second device.
In a possible implementation manner, the processing module 902 is configured to, in response to the authentication response message sent by the first device, set, for the first device, a device identifier corresponding to the first device and an identifier of the distributed system in which the first device and the second device are located; the device identifier corresponding to the first device and the identifier of the distributed system are used for communication between the first device and the second device; the second device sends an authentication success message to the first device through the second transmission interface, where the authentication success message includes: the device identifier corresponding to the first device, and the identifier of the distributed system in which the first device and the second device are located.
In one possible implementation, the second device includes a second module; the authentication success message further includes at least one of: an identification of the second module, and an identification of the second module in the distributed system.
In a possible implementation manner, the transceiver module 901 further includes a third transmission interface; the first equipment and the second equipment establish channel connection through a third transmission interface; and the message sent by the second device is sent through the third transmission interface after being encapsulated into second bit stream data through the second transmission interface.
In a possible implementation manner, the first device and the second device establish a channel connection through a third transmission interface; the feature data or the message received by the second device arrives as first bit stream data through the third transmission interface, and is obtained by decapsulating the first bit stream data through the second transmission interface.
An embodiment of the present application further provides a media information transmission system, which includes the electronic apparatus 800 shown in fig. 8 or the first device shown in fig. 3a, and further includes the electronic apparatus 900 shown in fig. 9 or the second device shown in fig. 3 b.
Embodiments of the present application further provide a computer storage medium, which is used to store a computer program, and when the computer program runs on a computer, the computer is caused to execute the method as described in any one of the possible embodiments in fig. 2a to fig. 7 a.
Embodiments of the present application also provide a computer program product including instructions for storing a computer program, which, when executed on a computer, causes the computer to perform the method described in any one of the possible embodiments in fig. 2 a-7 a.
It should be understood that the processor mentioned in the embodiments of the present application may be a CPU, and may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It will also be appreciated that the memory referred to in the embodiments of the application may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be Random Access Memory (RAM), which acts as an external cache. By way of example, but not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
It should be noted that when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (memory module) is integrated in the processor.
It should be noted that the memory described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific implementation of the present application, but the scope of the embodiments of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the embodiments of the present application, and all the changes or substitutions should be covered by the scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

Claims (29)

1. A media information transmission method, applied to a first device, wherein the first device comprises a first transmission interface, and the method comprises the following steps:
collecting first media information;
extracting the characteristics of the first media information, and determining first characteristic data of the first media information;
and sending the first characteristic data to a second device through the first transmission interface, wherein the first characteristic data is used by the second device to obtain a result of a first application.
2. The method of claim 1, wherein the method further comprises:
sending a first notification message to the second device in response to a first operation on the first application, wherein the first device is an electronic device which has established a communication connection with the second device, and the first notification message is used for requesting the first device to establish first application cooperation with the second device;
receiving a first response message returned by the second device, wherein the first response message is used for confirming that the first device and the second device start the first application cooperation.
3. The method of claim 1 or 2, wherein before the sending of the first characteristic data to the second device, the method further comprises:
receiving a capability negotiation request message sent by the second device through the first transmission interface, wherein the capability negotiation request message is used for requesting a transmission protocol supported by the first device and the feature extraction capability of the first device; the transmission protocol of the first device is used for indicating that the first device supports transmission of characteristic data; and the feature extraction capability of the first device is used for indicating that the first device supports extracting first feature data of the first media information;
sending a capability negotiation response message to the second device through the first transmission interface; the capability negotiation response message is used for confirming that the first device supports a transmission protocol for transmitting the feature data and the feature extraction capability of the first device.
4. The method of any one of claims 1-3, wherein before the extracting of the features of the first media information, the method further comprises:
acquiring a first feature extraction model;
the first feature extraction model is used for extracting the features of the first media information, a version of the first feature extraction model corresponds to a version of a first feature data processing model, and the first feature data processing model is used by the second device to process the first feature data to obtain a result of the first application.
5. The method of claim 4, wherein the capability negotiation response message further comprises:
a version of a feature extraction model in the first device; or, a version of a feature data processing model in the second device.
6. The method of any one of claims 1-5, further comprising:
receiving a first message from the second device through the first transmission interface, wherein the first message is used for indicating a state in which the first device collects media information;
and in response to the first message, adjusting the state in which the first device collects media information.
7. The method of any one of claims 1-6, further comprising:
receiving a second message from the second device through the first transmission interface, wherein the second message is used for instructing the first device to collect characteristic data of third media information;
collecting the third media information in response to the second message;
extracting the characteristics of the third media information to obtain third characteristic data;
and sending the third characteristic data to the second device through the first transmission interface.
8. The method of any of claims 6-7, wherein the second message or the first message is determined by the second device based on the first characteristic data.
9. The method of any one of claims 1-8, further comprising:
receiving an authentication request message sent by the second device through the first transmission interface, where the authentication request message is used to request that the first device establish a communication connection with the second device, and the communication connection is used to confirm the authority of the second device to control the first device;
sending an authentication response message to the second device through the first transmission interface; the authentication response message is used for confirming the authority of the second device for controlling the first device;
receiving an authentication success message sent by the second device through the first transmission interface, wherein the authentication success message includes: a device identifier corresponding to the first device, and an identifier of the distributed system in which the first device and the second device are located.
10. The method of any of claims 1-9, wherein the first device further comprises a third transmission interface; the first device and the second device establish a channel connection through the third transmission interface; and the feature data or the message sent by the first device is encapsulated into first bitstream data through the first transmission interface and then sent through the third transmission interface.
11. The method of any one of claims 1-10, wherein the first device establishes a channel connection with the second device through the third transmission interface; and the message received by the first device is obtained by decapsulating, through the first transmission interface, second bitstream data received through the third transmission interface.
12. The method of any of claims 1-11, wherein the first device further comprises a display unit; the method further comprises the following steps:
receiving a third message from the second device through the first transmission interface, wherein the third message is a result of the first application determined by the second device according to the first characteristic data, and the third message is used for indicating content to be displayed by the first device;
and in response to the third message, displaying, through the display unit, the content that the third message indicates the first device to display.
13. A media information transmission method, applied to a second device, wherein the second device comprises a second transmission interface, and the method comprises the following steps:
receiving first characteristic data from a first device through the second transmission interface, wherein the first characteristic data is determined after the first device performs characteristic extraction on the collected first media information;
and processing the first characteristic data to obtain a processing result of the first application.
14. The method of claim 13, wherein the method further comprises:
sending a first notification message to the first device in response to a second operation on the first application, wherein the first device is an electronic device which has established a communication connection with the second device, and the first notification message is used for requesting the first device to establish first application cooperation with the second device;
receiving a first response message returned by the first device, wherein the first response message is used for confirming that the first device and the second device start the first application cooperation.
15. The method of claim 13 or 14, wherein before the receiving of the first characteristic data from the first device, the method further comprises:
sending a capability negotiation request message to the first device through the second transmission interface, wherein the capability negotiation request message is used for requesting a transmission protocol supported by the first device and the feature extraction capability of the first device; the transmission protocol of the first device is used for indicating that the first device supports transmission of characteristic data; and the feature extraction capability of the first device is used for indicating that the first device supports extracting first feature data of the first media information;
receiving a capability negotiation response message sent by the first device through the second transmission interface, wherein the capability negotiation response message is used for confirming that the first device supports a transmission protocol for transmitting the feature data.
16. The method of any of claims 13-15, wherein before the receiving of the first characteristic data, the method further comprises:
acquiring a first characteristic data processing model;
the first characteristic data processing model is used by the second device to process the first characteristic data to obtain a result of the first application; a version of a first feature extraction model corresponds to a version of the first characteristic data processing model, and the first feature extraction model is used for performing feature extraction on the first media information.
17. The method of any one of claims 13-16, further comprising:
sending a first message to the first device through the second transmission interface, wherein the first message is used for indicating a state in which the first device collects media information, and the state comprises at least one of the following: an on state, an off state, or a parameter for collecting media information.
18. The method of any one of claims 13-17, further comprising:
sending a second message to the first device through the second transmission interface, wherein the second message is used for instructing the first device to collect characteristic data of third media information;
receiving third characteristic data sent by the first device through the second transmission interface, wherein the third characteristic data is determined after the first device performs feature extraction on the collected third media information.
19. The method of any of claims 17-18, wherein the first message or the second message is determined according to a result of processing the first characteristic data.
20. The method of any of claims 13-19, wherein the number of first devices is N; the method further comprises the following steps:
receiving a fourth message through the second transmission interface, wherein the fourth message comprises M pieces of first characteristic data of the N first devices, N and M are positive integers greater than 1, and M is greater than or equal to N;
and processing the M pieces of first characteristic data according to the characteristic data processing models corresponding to the M pieces of first characteristic data, to obtain a result of the first application.
21. The method of any one of claims 17-20, further comprising:
sending an authentication request message to the first device through the second transmission interface, wherein the authentication request message is used for requesting that the first device establish a communication connection with the second device, and the communication connection is used for confirming the authority of the second device to control the first device;
receiving, through the second transmission interface, an authentication response message sent by the first device, wherein the authentication response message is used for confirming that the first device establishes the communication connection with the second device.
22. The method of claim 21, wherein the method further comprises:
in response to the authentication response message sent by the first device, setting, for the first device, a device identifier corresponding to the first device and an identifier of the distributed system in which the first device and the second device are located, wherein the device identifier corresponding to the first device and the identifier of the distributed system are used for communication between the first device and the second device;
and sending an authentication success message to the first device through the second transmission interface, wherein the authentication success message includes: the device identifier corresponding to the first device, and the identifier of the distributed system in which the first device and the second device are located.
23. The method of any of claims 13-22, wherein the second device further comprises a third transmission interface; the first device and the second device establish a channel connection through the third transmission interface; and the message sent by the second device is encapsulated into second bitstream data through the second transmission interface and then sent through the third transmission interface.
24. The method of any of claims 13-23, wherein the first device establishes a channel connection with the second device through the third transmission interface; and the characteristic data or the message received by the second device is obtained by decapsulating, through the second transmission interface, first bitstream data received through the third transmission interface.
25. The method of any one of claims 13-24, wherein the second device further comprises a display unit, the method further comprising:
displaying, by the display unit, a result of the first application.
26. An electronic device, wherein the electronic device comprises memory and one or more processors; wherein the memory is to store computer program code comprising computer instructions; the computer instructions, when executed by the processor, cause the electronic device to perform the method of any of claims 1-12.
27. An electronic device, wherein the electronic device comprises memory and one or more processors; wherein the memory is to store computer program code comprising computer instructions; the computer instructions, when executed by the processor, cause the electronic device to perform the method of any of claims 13 to 25.
28. A media information transmission system, comprising: the electronic device of claim 26, and the electronic device of claim 27.
29. A computer readable storage medium, characterized in that the computer readable storage medium comprises program instructions which, when run on an electronic device, cause the electronic device to perform the method of any of claims 1 to 12 or cause the electronic device to perform the method of any of claims 13 to 25.
CN202011300677.9A 2020-08-14 2020-11-19 Media information transmission method and device Pending CN114077747A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2021/110644 WO2022033377A1 (en) 2020-08-14 2021-08-04 Media information transmission method and electronic device
EP21855430.1A EP4187421A4 (en) 2020-08-14 2021-08-04 Media information transmission method and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010819860 2020-08-14
CN2020108198603 2020-08-14

Publications (1)

Publication Number Publication Date
CN114077747A true CN114077747A (en) 2022-02-22

Family

ID=80282863

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011300677.9A Pending CN114077747A (en) 2020-08-14 2020-11-19 Media information transmission method and device

Country Status (1)

Country Link
CN (1) CN114077747A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114615543A (en) * 2022-03-31 2022-06-10 联想(北京)有限公司 Transmission equipment

Similar Documents

Publication Publication Date Title
CN111316598B (en) Multi-screen interaction method and equipment
WO2021051989A1 (en) Video call method and electronic device
CN112449332B (en) Bluetooth connection method and electronic equipment
CN111372325B (en) Method and device for establishing Wi-Fi point-to-point connection
CN113556479B (en) Method for sharing camera by multiple applications and electronic equipment
CN113438354B (en) Data transmission method and device, electronic equipment and storage medium
CN112398855B (en) Method and device for transferring application contents across devices and electronic device
EP4283931A1 (en) Nfc method and system, and electronic device
WO2022033377A1 (en) Media information transmission method and electronic device
CN112527174B (en) Information processing method and electronic equipment
CN111464689A (en) Audio output method and terminal equipment
CN114040242A (en) Screen projection method and electronic equipment
CN114610253A (en) Screen projection method and equipment
CN113408016B (en) Method and device for storing ciphertext
US11895713B2 (en) Data sharing and instruction operation control method and system
CN113676879A (en) Method, electronic device and system for sharing information
CN114077747A (en) Media information transmission method and device
CN113946302B (en) Method and device for opening file
US20230350629A1 (en) Double-Channel Screen Mirroring Method and Electronic Device
CN115379126B (en) Camera switching method and related electronic equipment
CN114928898A (en) Method and device for establishing session based on WiFi direct connection
WO2022033379A1 (en) Media information transmission method and apparatus
CN116170629A (en) Method for transmitting code stream, electronic equipment and computer readable storage medium
CN114143365A (en) Media information transmission method and device
CN114466131A (en) Cross-device shooting method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination