CN113542688B - Audio and video monitoring method, device, equipment, storage medium and system - Google Patents

Publication number
CN113542688B
Authority
CN
China
Prior art keywords
audio
media
audio signals
signals
path
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110797686.1A
Other languages
Chinese (zh)
Other versions
CN113542688A (en)
Inventor
黄凡夫
陈喆
王鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202110797686.1A
Publication of CN113542688A
Application granted
Publication of CN113542688B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/18: Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N7/183: Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a single remote source
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00: Details of television systems
    • H04N5/222: Studio circuitry; Studio devices; Studio equipment
    • H04N5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/268: Signal distribution or switching

Abstract

The embodiments of the present application disclose an audio and video monitoring method, apparatus, device, storage medium, and system, belonging to the field of audio and video monitoring. The method includes: acquiring, from a media acquisition device, description information of each media signal among multiple media signals, where the description information includes a number and a payload type (PT), and the multiple media signals include a single video signal and multiple audio signals; acquiring, from the media acquisition device and based on the numbers of the multiple media signals, a transmission parameter corresponding to each of the multiple audio signals; acquiring data packets of the multiple media signals from the media acquisition device; determining video data packets and audio data packets among the acquired data packets based on the PTs of the multiple media signals; and determining, among the audio data packets, the data packets of each audio signal based on the transmission parameter corresponding to that audio signal. The embodiments of the present application can correctly distinguish the data packets of each media signal, which solves the problem of transmitting multiple media signals.

Description

Audio and video monitoring method, device, equipment, storage medium and system
Technical Field
The embodiment of the application relates to the field of audio and video monitoring, in particular to an audio and video monitoring method, device, equipment, storage medium and system.
Background
In a monitoring scenario, a single video acquisition module covers a wide area, ranging from several meters to several kilometers, whereas a sound pickup covers a very limited area, usually within several meters. To acquire audio signals across the whole coverage area of a single video acquisition module, a monitoring scheme that combines one video acquisition module with multiple sound pickups is therefore usually required. This raises the problem of transmitting a single video signal together with multiple audio signals, that is, how the transmitting end and the receiving end can correctly distinguish the single video signal and the multiple audio signals. This problem urgently needs to be solved.
Disclosure of Invention
The embodiment of the application provides an audio and video monitoring method, device, equipment, storage medium and system, which can solve the problem of audio and video monitoring under multi-path media signals. The technical scheme is as follows:
In a first aspect, an audio and video monitoring method is provided. The method is applied to a media receiving device and includes:
acquiring, from a media acquisition device, description information of each media signal among multiple media signals, where the description information includes a number and a payload type (PT), the multiple media signals include a single video signal and multiple audio signals, and the multiple audio signals are collected by different sound pickups;
acquiring, from the media acquisition device and based on the numbers of the multiple media signals, a transmission parameter corresponding to each of the multiple audio signals, where the transmission parameter indicates the transmission channel or the transmission port of the corresponding audio signal;
acquiring data packets of the multiple media signals from the media acquisition device;
determining video data packets and audio data packets among the acquired data packets based on the PTs of the multiple media signals, and determining, among the audio data packets, the data packets of each audio signal based on the transmission parameter corresponding to that audio signal.
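The two-stage determination in the last step can be illustrated with a minimal Python sketch. The `StreamInfo` and `Packet` types, their field names, and the `classify` function are hypothetical and are not taken from the application; the `transport` field stands for whichever transmission parameter applies (a channel identifier or a port number), and the sketch assumes the transmission parameters of the audio signals all differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StreamInfo:
    number: int      # number from the description information
    kind: str        # "video" or "audio"
    pt: int          # payload type from the description information
    transport: int   # hypothetical field: channel identifier or port number


@dataclass(frozen=True)
class Packet:
    pt: int          # payload type carried in the packet
    transport: int   # channel identifier or port number the packet arrived on
    payload: bytes


def classify(packet: Packet, streams: List[StreamInfo]) -> Optional[int]:
    """Return the number of the media signal the packet belongs to, or None."""
    # Stage 1: the PT separates video data packets from audio data packets.
    for s in streams:
        if s.kind == "video" and s.pt == packet.pt:
            return s.number
    # Stage 2: among audio packets, the transmission parameter selects the stream.
    for s in streams:
        if s.kind == "audio" and s.pt == packet.pt and s.transport == packet.transport:
            return s.number
    return None


streams = [
    StreamInfo(number=0, kind="video", pt=96, transport=0),
    StreamInfo(number=1, kind="audio", pt=8, transport=2),
    StreamInfo(number=2, kind="audio", pt=8, transport=4),
]
print(classify(Packet(pt=8, transport=4, payload=b""), streams))
```

Both audio signals in the example share one PT, so only the transmission parameter tells them apart.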
Optionally, the media receiving device communicates with the media acquisition device over the Transmission Control Protocol (TCP), the transmission parameter includes a channel identifier, and each audio data packet carries the channel identifier of the corresponding audio signal;
the determining, from the audio data packets, the data packets of each of the multiple audio signals based on the transmission parameter corresponding to each audio signal includes:
when it is determined, based on the transmission parameters corresponding to the multiple audio signals, that the channel identifiers of the multiple audio signals all differ, determining the data packets of each audio signal according to the channel identifier carried in the audio data packets.
Optionally, the description information further includes a synchronization source (SSRC);
the method further includes:
when it is determined, based on the transmission parameters corresponding to the multiple audio signals, that at least two of the multiple audio signals have the same channel identifier, determining, from the audio data packets, the data packets of each audio signal based on the PT and the SSRC of that audio signal.
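The TCP case and its SSRC fallback can be sketched as follows. This is an illustration under stated assumptions, not the application's implementation: the text here does not specify how the channel identifier is carried, so the sketch assumes the 4-byte interleaved frame header that RTSP defines for RTP over TCP (`$`, one-byte channel, two-byte length) and a standard 12-byte RTP fixed header. The function names and the `audio_streams` mapping are hypothetical.

```python
import struct
from typing import Dict, List, Optional, Tuple


def split_interleaved(buffer: bytes) -> Tuple[List[Tuple[int, bytes]], bytes]:
    """Split a TCP byte buffer into (channel, packet) frames plus leftover bytes."""
    frames = []
    offset = 0
    while offset + 4 <= len(buffer):
        magic, channel, length = struct.unpack_from("!BBH", buffer, offset)
        if magic != 0x24:  # '$' marks the start of an interleaved frame
            raise ValueError("lost interleaved framing")
        end = offset + 4 + length
        if end > len(buffer):
            break  # incomplete frame: wait for more bytes
        frames.append((channel, buffer[offset + 4:end]))
        offset = end
    return frames, buffer[offset:]


def rtp_pt_ssrc(rtp: bytes) -> Tuple[int, int]:
    """Read the payload type and SSRC from an RTP fixed header."""
    if len(rtp) < 12:
        raise ValueError("RTP packet shorter than the fixed header")
    pt = rtp[1] & 0x7F
    (ssrc,) = struct.unpack_from("!I", rtp, 8)
    return pt, ssrc


def pick_audio_stream(channel: int, rtp: bytes,
                      audio_streams: Dict[int, Tuple[int, int, int]]) -> Optional[int]:
    """audio_streams maps stream number -> (channel identifier, PT, SSRC)."""
    channels = [c for c, _, _ in audio_streams.values()]
    if len(set(channels)) == len(channels):
        # All channel identifiers differ: the channel alone identifies the stream.
        for number, (c, _, _) in audio_streams.items():
            if c == channel:
                return number
        return None
    # At least two streams share a channel: fall back to PT and SSRC.
    pt, ssrc = rtp_pt_ssrc(rtp)
    for number, (_, p, s) in audio_streams.items():
        if (p, s) == (pt, ssrc):
            return number
    return None
```

The same two-branch decision applies to the UDP case by substituting the port number for the channel identifier.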
Optionally, the media receiving device communicates with the media acquisition device over the User Datagram Protocol (UDP), the transmission parameter includes a port number, and each audio data packet carries the port number of the corresponding audio signal;
the determining, from the audio data packets, the data packets of each of the multiple audio signals based on the transmission parameter corresponding to each audio signal includes:
when it is determined, based on the transmission parameters corresponding to the multiple audio signals, that the port numbers of the multiple audio signals all differ, determining the data packets of each audio signal according to the port number carried in the audio data packets.
Optionally, the description information further includes an SSRC;
the method further includes:
when it is determined, based on the transmission parameters corresponding to the multiple audio signals, that at least two of the multiple audio signals have the same port number, determining, from the audio data packets, the data packets of each audio signal based on the PT and the SSRC of that audio signal.
Optionally, the media receiving device is a client;
the method further includes:
playing the video signal and the multiple audio signals;
displaying description information of a plurality of sound pickups, where the sound pickups are used to collect the multiple audio signals;
receiving an audio signal switching instruction issued on the basis of the description information of the plurality of sound pickups, where the instruction carries the number of a target audio signal, and the target audio signal is one of the multiple audio signals;
switching the currently played audio signal to the target audio signal.
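The client-side switching steps can be sketched with a small state holder. The `AudioSelector` class, its method names, and the pickup labels are hypothetical; the sketch only tracks which audio signal is currently played and leaves decoding and rendering aside.

```python
from typing import Dict, List


class AudioSelector:
    """Tracks which audio signal the client currently plays."""

    def __init__(self, pickups: Dict[int, str], initial: int) -> None:
        # pickups maps the number of an audio signal to its pickup description.
        if initial not in pickups:
            raise KeyError(initial)
        self.pickups = dict(pickups)
        self.current = initial

    def menu(self) -> List[str]:
        """Description lines that a client could display for the user."""
        return [f"{number}: {label}" for number, label in sorted(self.pickups.items())]

    def switch(self, target_number: int) -> None:
        """Handle a switching instruction that carries the target number."""
        if target_number not in self.pickups:
            raise KeyError(target_number)
        self.current = target_number

    def should_play(self, stream_number: int) -> bool:
        """Only packets of the currently selected audio signal are played."""
        return stream_number == self.current


selector = AudioSelector({1: "Gate pickup", 2: "Lobby pickup"}, initial=1)
selector.switch(2)
print(selector.menu(), selector.current)
```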
In a second aspect, an audio and video monitoring method is provided. The method is applied to a media acquisition device and includes:
providing, to a media receiving device, description information of each media signal among multiple media signals, where the description information includes a number and a payload type (PT), the multiple media signals include a single video signal and multiple audio signals, and the multiple audio signals are collected by different sound pickups;
providing, to the media receiving device, a transmission parameter corresponding to each of the multiple audio signals, where the transmission parameter indicates the transmission channel or the transmission port of the corresponding audio signal;
sending the data packets of the multiple media signals to the media receiving device, so that the media receiving device determines video data packets and audio data packets among the received data packets and determines, among the audio data packets, the data packets of each of the multiple audio signals.
Optionally, the description information further includes a synchronization source (SSRC);
before the providing, to the media receiving device, the description information of each media signal among the multiple media signals, the method further includes:
allocating an SSRC to each of the multiple audio signals.
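One possible allocation strategy is sketched below. The application does not state how the SSRCs are chosen; drawing random 32-bit values and rejecting collisions is an assumption made for this sketch, and `allocate_ssrcs` is a hypothetical name.

```python
import secrets
from typing import Iterable, List


def allocate_ssrcs(count: int, in_use: Iterable[int] = ()) -> List[int]:
    """Return `count` distinct 32-bit SSRC values that avoid those in `in_use`."""
    used = set(in_use)
    allocated = []
    while len(allocated) < count:
        candidate = secrets.randbits(32)
        if candidate not in used:  # retry on the rare collision
            used.add(candidate)
            allocated.append(candidate)
    return allocated


video_ssrc = 0x11223344  # example value already assigned to the video signal
audio_ssrcs = allocate_ssrcs(3, in_use=[video_ssrc])
print([hex(s) for s in audio_ssrcs])
```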
Optionally, the description information is transmitted by using a session description protocol SDP.
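As an illustration of description information carried in SDP, the sketch below parses a hand-written example. The SDP text is invented for this sketch, and the mapping of the number to `a=control:trackID=` and of the SSRC to an `a=ssrc:` attribute is an assumption; the text here does not say which SDP fields the application uses.

```python
from typing import Dict, List, Optional

EXAMPLE_SDP = """\
v=0
o=- 0 0 IN IP4 192.0.2.10
s=Example session
t=0 0
m=video 0 RTP/AVP 96
a=rtpmap:96 H264/90000
a=control:trackID=0
m=audio 0 RTP/AVP 8
a=rtpmap:8 PCMA/8000
a=control:trackID=1
a=ssrc:305419896 cname:pickup1
m=audio 0 RTP/AVP 8
a=rtpmap:8 PCMA/8000
a=control:trackID=2
a=ssrc:2271560481 cname:pickup2
"""


def parse_sdp(sdp: str) -> List[Dict[str, Optional[object]]]:
    """Extract kind, PT, number, and SSRC for each media section."""
    streams: List[Dict[str, Optional[object]]] = []
    current = None
    for raw_line in sdp.splitlines():
        line = raw_line.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            media, _port, _proto, *formats = line[2:].split()
            current = {"kind": media, "pt": int(formats[0]), "number": None, "ssrc": None}
            streams.append(current)
        elif current is not None and line.startswith("a=control:trackID="):
            current["number"] = int(line.split("=", 2)[2])
        elif current is not None and line.startswith("a=ssrc:"):
            current["ssrc"] = int(line[len("a=ssrc:"):].split()[0])
    return streams


for stream in parse_sdp(EXAMPLE_SDP):
    print(stream)
```

In this example both audio sections declare the same PT, so a receiver would need the transmission parameter or the SSRC to tell them apart.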
Optionally, the method further includes:
and providing the description information of the sound pickup corresponding to each audio signal in the multi-channel audio signals to the media receiving equipment.
In a third aspect, an audio and video monitoring apparatus is provided. The apparatus is applied to a media receiving device and includes:
a description information acquisition module, configured to acquire, from a media acquisition device, description information of each media signal among multiple media signals, where the description information includes a number and a payload type (PT), the multiple media signals include a single video signal and multiple audio signals, and the multiple audio signals are collected by different sound pickups;
a transmission parameter acquisition module, configured to acquire, from the media acquisition device and based on the numbers of the multiple media signals, a transmission parameter corresponding to each of the multiple audio signals, where the transmission parameter indicates the transmission channel or the transmission port of the corresponding audio signal;
a data packet acquisition module, configured to acquire the data packets of the multiple media signals from the media acquisition device;
a determining module, configured to determine video data packets and audio data packets among the acquired data packets based on the PTs of the multiple media signals, and to determine, among the audio data packets, the data packets of each audio signal based on the transmission parameter corresponding to that audio signal.
Optionally, the media receiving device communicates with the media acquisition device over TCP, the transmission parameter includes a channel identifier, and each audio data packet carries the channel identifier of the corresponding audio signal;
the determining module is specifically configured to:
when it is determined, based on the transmission parameters corresponding to the multiple audio signals, that the channel identifiers of the multiple audio signals all differ, determine the data packets of each audio signal according to the channel identifier carried in the audio data packets.
Optionally, the description information further includes a synchronization source (SSRC);
the determining module is specifically configured to:
when it is determined, based on the transmission parameters corresponding to the multiple audio signals, that at least two of the multiple audio signals have the same channel identifier, determine, from the audio data packets, the data packets of each audio signal based on the PT and the SSRC of that audio signal.
Optionally, the media receiving device communicates with the media acquisition device over UDP, the transmission parameter includes a port number, and each audio data packet carries the port number of the corresponding audio signal;
the determining module is specifically configured to:
when it is determined, based on the transmission parameters corresponding to the multiple audio signals, that the port numbers of the multiple audio signals all differ, determine the data packets of each audio signal according to the port number carried in the audio data packets.
Optionally, the description information further includes an SSRC;
the determining module is specifically configured to:
when it is determined, based on the transmission parameters corresponding to the multiple audio signals, that at least two of the multiple audio signals have the same port number, determine, from the audio data packets, the data packets of each audio signal based on the PT and the SSRC of that audio signal.
Optionally, the media receiving device is a client;
the device further comprises:
the playing module is used for playing the video signal and the multi-channel audio signal;
the display module is used for displaying description information of a plurality of sound pickups, and the sound pickups are used for collecting the multi-channel audio signals;
the receiving module is used for receiving an audio signal switching instruction based on the description information of the plurality of sound pickups, wherein the audio signal switching instruction carries the number of a target audio signal, and the target audio signal is one of the plurality of paths of audio signals;
and the switching module is used for switching the currently played audio signal to the target audio signal.
In a fourth aspect, an audio and video monitoring apparatus is provided. The apparatus is applied to a media acquisition device and includes:
a first description information providing module, configured to provide, to a media receiving device, description information of each media signal among multiple media signals, where the description information includes a number and a payload type (PT), the multiple media signals include a single video signal and multiple audio signals, and the multiple audio signals are collected by different sound pickups;
a transmission parameter providing module, configured to provide, to the media receiving device, a transmission parameter corresponding to each of the multiple audio signals, where the transmission parameter indicates the transmission channel or the transmission port of the corresponding audio signal;
a data packet sending module, configured to send the data packets of the multiple media signals to the media receiving device.
Optionally, the description information further includes a synchronization source (SSRC);
the apparatus further includes:
an allocation module, configured to allocate an SSRC to each of the multiple audio signals.
Optionally, the description information is transmitted by using a session description protocol SDP.
Optionally, the apparatus further comprises:
and the second description information providing module is used for providing the description information of the sound pickup corresponding to each audio signal in the multi-channel audio signals to the media receiving equipment.
In a fifth aspect, a media acquisition device is provided, and the device includes:
a video acquisition module, configured to acquire a single video signal;
a plurality of sound pickups, configured to collect multiple audio signals, where each audio signal is collected by a different sound pickup;
a transmitter, configured to: provide, to a media receiving device, description information of each media signal among multiple media signals, where the description information includes a number and a payload type (PT), and the multiple media signals include the single video signal and the multiple audio signals; provide, to the media receiving device, a transmission parameter corresponding to each of the multiple audio signals, where the transmission parameter indicates the transmission channel or the transmission port of the corresponding audio signal; and send the data packets of the multiple media signals to the media receiving device.
In a sixth aspect, a computer device is provided, where the computer device includes a memory and a processor, the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory, so as to implement the steps of the above-mentioned audio/video monitoring method.
In a seventh aspect, a computer-readable storage medium is provided, where a computer program is stored in the storage medium, and when being executed by a processor, the computer program implements the steps of the audio/video monitoring method.
In an eighth aspect, a computer program product containing instructions is provided, which when run on a computer, causes the computer to perform the steps of the audio/video monitoring method described above.
The technical scheme provided by the embodiment of the application can at least bring the following beneficial effects:
According to the embodiments of the present application, the media receiving device acquires the description information and the transmission parameters of the multiple media signals from the media acquisition device. After receiving a data packet of the multiple media signals, the media receiving device can determine, based on the PT and the transmission parameter, which media signal the data packet belongs to. The media receiving device can therefore correctly distinguish the data packets of each media signal, which solves the problem of transmitting multiple media signals.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an implementation environment of an audio/video monitoring method provided in an embodiment of the present application;
Fig. 2 is a schematic diagram of an implementation environment of another audio/video monitoring method provided in an embodiment of the present application;
Fig. 3 is a flowchart of an audio/video monitoring method provided in an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an audio/video monitoring device provided in an embodiment of the present application;
Fig. 5 is a schematic structural diagram of another audio/video monitoring device provided in an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a client according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application more clear, the embodiments of the present application will be further described in detail with reference to the accompanying drawings.
Before explaining the audio and video monitoring method provided by the embodiment of the present application in detail, an implementation environment provided by the embodiment of the present application is introduced.
Referring to Fig. 1, Fig. 1 is a schematic diagram of an implementation environment according to an example embodiment. The implementation environment includes a media capturing device 100, an external memory 200, and a client 300. The media capturing device 100 includes at least one video capturing module and a plurality of sound pickups deployed at different locations. The media capturing device 100 is communicatively connected to the external memory 200, the client 300 is communicatively connected to the media capturing device 100, and the client 300 can also be communicatively connected to the external memory 200. The communication connection may be wired or wireless, which is not limited in the embodiments of the present application.
The media capturing device 100 is configured to capture audio signals and video signals, encode and package them into a plurality of data packets, and transmit the resulting data packets to the external memory 200 or the client 300. As described above, the media capturing device 100 includes at least one video capturing module and a plurality of sound pickups; each video capturing module captures one video signal and each sound pickup captures one audio signal.
The external memory 200 is used for determining video data packets and audio data packets from the data packets sent by the media capturing device 100, determining data packets of the video signals captured by each video capturing module from the video data packets, and determining data packets of the audio signals captured by each sound pickup from the audio data packets by using the method provided by the embodiment of the application. In other embodiments, the external memory 200 is also used to transmit the stored data packets to the client 300.
The client 300 is configured to receive the data packets sent by the media capturing device 100 or the external memory 200 and, using the method provided by the embodiments of the present application, to determine video data packets and audio data packets among the received data packets, determine the data packets of the video signal captured by each video capturing module among the video data packets, and determine the data packets of the audio signal captured by each sound pickup among the audio data packets. The client 300 can then play the video signal and the audio signals.
Optionally, referring to Fig. 2, the media capturing device 100 includes a plurality of sound pickups, an audio encoding module, at least one video capturing module, a video encoding module, a packaging module, and a transmitter. These components may be integrated in the same physical device or distributed across different physical devices, as the user requires; this is not limited here.
Each sound pick-up is used for collecting an audio signal and sending the audio signal to the audio coding module, and the audio signal collected by the sound pick-up can be an analog signal or a digital signal. If the audio signal collected by the sound pickup is an analog audio signal, the analog audio signal needs to be converted into a digital audio signal, and then the digital audio signal is sent to the audio coding module. If the audio signal collected by the sound pick-up is a digital audio signal, the digital audio signal can be directly sent to the audio coding module.
The audio coding module is used for coding the audio signals collected by the sound pick-up and sending the coded audio signals to the packaging module. The audio encoding module supports encoding of audio signals in a plurality of formats, and the format of the audio signal is not limited herein.
It should be noted that, in the above fig. 2, the sound pick-up and the audio coding module are two independent modules, but in other embodiments, the audio coding module may also be integrated in the sound pick-up, that is, the sound pick-up integrated with the audio coding module not only has a function of collecting an audio signal, but also has a function of coding the audio signal.
Optionally, the multiple audio signals may also be uniformly encoded by using other external encoding devices except the media capturing device 100, and then the encoded audio signals are sent to the packaging module.
Each video acquisition module is configured to acquire a video signal, where the video signal may be a digital signal or other types of video signals, and is not limited herein.
The video coding module is used for coding the video signal collected by the video collecting module.
The packaging module is used for packaging the coded audio signal into an audio data packet and packaging the coded video signal into a video data packet according to the message format of the streaming media transmission protocol.
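A minimal sketch of the packaging step follows, assuming the streaming media transport protocol is RTP, which the description names as one option. `build_rtp_packet` is a hypothetical helper that writes only the 12-byte RTP fixed header (version 2, no padding, no extension, no CSRC entries) in front of an already encoded payload; real packetization rules for specific codecs are out of scope.

```python
import struct


def build_rtp_packet(pt: int, seq: int, timestamp: int, ssrc: int,
                     payload: bytes, marker: bool = False) -> bytes:
    """Prepend a 12-byte RTP fixed header to an encoded payload."""
    if not 0 <= pt <= 127:
        raise ValueError("payload type must fit in 7 bits")
    byte0 = 2 << 6                     # version 2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | pt    # marker bit followed by the 7-bit PT
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


# Example values only: PT 8 and the SSRC are illustrative.
audio_packet = build_rtp_packet(pt=8, seq=1, timestamp=160,
                                ssrc=0x12345678, payload=b"\x00" * 4)
print(audio_packet.hex())
```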
The transmitter is used for providing the description information of the multi-channel media signals to the media receiving equipment after receiving the description information request sent by the media receiving equipment, providing the transmission parameters of the multi-channel media signals to the media receiving equipment after receiving the transmission parameter request sent by the media receiving equipment, and then sending the audio data packet and the video data packet to the media receiving equipment.
The video signal and the audio signals have different payload types (PTs). The PTs of different video signals may be the same or different, and the PTs of different audio signals may likewise be the same or different. However, a video signal and an audio signal have different synchronization sources (SSRCs), different video signals have different SSRCs, and different audio signals also have different SSRCs. Each data packet encapsulated by the packaging module may carry the corresponding PT and SSRC.
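The PT and SSRC constraints above can be expressed as a small consistency check. The `check_assignment` function and the tuple layout are hypothetical; the sketch only verifies that no PT is shared between video and audio and that every SSRC is unique.

```python
from typing import List, Tuple


def check_assignment(streams: List[Tuple[str, int, int]]) -> List[str]:
    """streams holds (kind, pt, ssrc) tuples; returns a list of violations."""
    problems = []
    video_pts = {pt for kind, pt, _ in streams if kind == "video"}
    audio_pts = {pt for kind, pt, _ in streams if kind == "audio"}
    shared = video_pts & audio_pts
    if shared:
        problems.append(f"PT shared by video and audio: {sorted(shared)}")
    ssrcs = [ssrc for _, _, ssrc in streams]
    if len(set(ssrcs)) != len(ssrcs):
        problems.append("SSRC values are not unique")
    return problems


# Two audio signals may share a PT as long as their SSRCs differ.
print(check_assignment([("video", 96, 1), ("audio", 8, 2), ("audio", 8, 3)]))
```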
It should be noted that the packaging module supports packaging multiple media signals that have different PTs or different SSRCs, and also supports packaging multiple media signals in different encoding formats. The streaming media transport protocol may be RTP (Real-time Transport Protocol).
The external memory 200 includes a re-encapsulation module and a storage module. The re-encapsulation module is configured to convert the data packets generated by the media capturing device 100 from one encapsulation format to another; for example, data packets in the RTP encapsulation format may be converted into the PS (program stream) or MP4 (MPEG-4) encapsulation format. The storage module is configured to store the re-encapsulated data packets, for example, data packets converted into the PS or MP4 encapsulation format.
The media capturing device may be a camera, a mobile phone, a PDA (Personal Digital Assistant), a tablet computer, or the like.
The client may be any electronic product capable of human-computer interaction with a user through one or more of a keyboard, a touch pad, a touch screen, a remote controller, voice interaction, or a handwriting device, for example, a PC (personal computer), a mobile phone, a smartphone, a PDA, a wearable device, a Pocket PC, a tablet computer, a smart car, a smart television, or a smart speaker.
The external memory may be any memory that supports storing audio and video data, such as a network video recorder. Optionally, the memory may also support forwarding the stored audio and video data.
Those skilled in the art should understand that the above client and external memory are only examples. Other existing or future clients and external memories that are applicable to the embodiments of the present application are also included in the scope of protection of the embodiments of the present application and are incorporated herein by reference.
The following explains the audio/video monitoring method provided in the embodiment of the present application in detail.
Fig. 3 is a flowchart of an audio and video monitoring method provided in an embodiment of the present application, where the method is applied to a communication process between a media receiving device and a media capturing device 100. The media receiving device may be the client 300 in fig. 1 or the external memory 200 in fig. 1. Referring to fig. 3, the method includes the following steps.
S101, the media capturing device captures multiple media signals, where the multiple media signals include a single video signal and multiple audio signals, and the multiple audio signals are captured by different sound collectors.
Based on the above description, the media capturing device includes at least one video capturing module and a plurality of sound collectors, where each video capturing module is used to capture one video signal and each sound collector is used to capture one audio signal. In the case that the media capturing device includes a plurality of video capturing modules, the multiple video signals captured by these modules may be distinguished in the same manner as the multiple audio signals.
Since the media capturing device includes a plurality of sound collectors disposed at different positions, audio signals can be captured over the entire coverage area of a single video capturing module.
After the media acquisition device acquires the multiple media signals, the multiple media signals can be sent to the client or an external memory, played by the client, or stored by the external memory. Please refer to S102-S106 described below for a detailed implementation process.
S102, the media receiving device obtains, from the media capturing device, the description information of each media signal in the multiple media signals, where the description information includes a number and a PT (payload type).
The media receiving device sends a description information request message to the media collecting device, and the description information request message is used for requesting the description information of the multi-channel media signals. After receiving the request message of the description information, the media acquisition equipment sends a response message of the description information to the media receiving equipment, wherein the response message of the description information carries the description information of the multi-channel media signals.
The description information of the multi-path media signal includes the number and PT of each media signal, and optionally, the description information of the multi-path media signal may further include one or more of SSRC, coding format, clock frequency, channel number, and description information of a microphone of each media signal.
It should be noted that the description information of the multiple media signals may be transmitted by using SDP (Session Description Protocol), and of course may also be transmitted by using other protocols. The SSRC of each media signal is allocated to that media signal by the media capturing device. The description information of the sound collector is determined by the media capturing device according to the orientation of the sound collector corresponding to each audio signal, that is, the description information of the sound collector is used to describe the orientation of the sound collector. Of course, the description information of the sound collector may also describe other attribute information of the sound collector.
In addition, in the case that the description information includes the number, the PT and the SSRC, the media receiving device may, after acquiring the description information of the multiple media signals, store the number, the PT and the SSRC of each media signal in a mapping relationship among the number, the PT and the SSRC.
For example, the description information request message is a DESCRIBE message, and the description information response message is a 200 OK message. The media receiving device sends a DESCRIBE message to the media capturing device to obtain the description information of each media signal in the multiple media signals. The media capturing device sends a 200 OK message to the media receiving device, and the description information of each media signal carried in the 200 OK message is as follows:
RTSP/1.0 200 OK
CSeq: 3
Content-Type: application/sdp
Content-Length: 609
v=0
o=- 1610842017961245 1610842017961245 IN IP4 10.112.75.115
s=Media Presentation
e=NONE
b=AS:5100
t=0 0
a=control:*
m=video 0 RTP/AVP 96
c=IN IP4 0.0.0.0
b=AS:5000
a=recvonly
a=x-dimensions:1280,720
a=control:trackID=0
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=420029; packetization-mode=1; sprop-parameter-sets=Z00AH5Y1QKALdNwEBAUAAAcIAAFfkAQ=,aO48gA==
m=audio 0 RTP/AVP 8 0
c=IN IP4 0.0.0.0
b=AS:50
a=recvonly
a=rtpmap:0 PCMU/8000/1
a=ssrc:12345 cname=name1
a=control:trackID=1
a=ssrc:34567 cname=name2
a=control:trackID=2
a=rtpmap:8 PCMA/8000/1
a=ssrc:67890 cname=name3
a=control:trackID=3
a=Media_header:MEDIAINFO=494D4B48010300000400000111710110401F000000FA000000000000000000000000000000000000;
a=appversion:1.0
Here, m=video 0 RTP/AVP 96 indicates that the media signal is a video signal transmitted using RTP, with a PT of 96, and a=control:trackID=0 indicates that the number of the video signal is 0. m=audio 0 RTP/AVP 8 0 indicates multiple audio signals transmitted using RTP, with PTs of 8 and 0. a=rtpmap:0 PCMU/8000/1 indicates that the audio signals with a PT of 0 are encoded in PCMU, with a clock frequency of 8000 and 1 channel. a=ssrc:12345 indicates that the media capturing device allocates an SSRC of 12345 to the first audio signal with a PT of 0, and a=control:trackID=1 indicates that the number of this first audio signal is 1. a=ssrc:34567 indicates that the media capturing device allocates an SSRC of 34567 to the second audio signal with a PT of 0, and a=control:trackID=2 indicates that the number of this second audio signal is 2. a=rtpmap:8 PCMA/8000/1 indicates that the audio signal with a PT of 8 is encoded in PCMA, with a clock frequency of 8000 and 1 channel. a=ssrc:67890 indicates that the media capturing device allocates an SSRC of 67890 to the audio signal with a PT of 8, and a=control:trackID=3 indicates that the number of the audio signal with a PT of 8 is 3.
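As an illustration of how a media receiving device might build the mapping among number, PT and SSRC from such a response, the following is a minimal sketch. It assumes the audio section is laid out as in the example above, that is, each a=control:trackID line closes the description of one audio signal and inherits the most recent a=rtpmap and a=ssrc lines. The function name parse_audio_tracks and the embedded sample text are hypothetical and not part of the embodiment.

```python
import re

# Audio section of the example description information (assumed layout).
SDP_AUDIO = """\
m=audio 0 RTP/AVP 8 0
a=rtpmap:0 PCMU/8000/1
a=ssrc:12345 cname=name1
a=control:trackID=1
a=ssrc:34567 cname=name2
a=control:trackID=2
a=rtpmap:8 PCMA/8000/1
a=ssrc:67890 cname=name3
a=control:trackID=3
"""


def parse_audio_tracks(sdp_text):
    """Return {number: (PT, SSRC)} for the audio section of an SDP body."""
    tracks = {}
    in_audio = False
    current_pt = None
    current_ssrc = None
    for raw in sdp_text.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # A new media section starts; only the audio section is parsed.
            in_audio = line.startswith("m=audio")
            current_pt = current_ssrc = None
            continue
        if not in_audio:
            continue
        m = re.match(r"a=rtpmap:(\d+)\s", line)
        if m:
            current_pt = int(m.group(1))
            continue
        m = re.match(r"a=ssrc:(\d+)", line)
        if m:
            current_ssrc = int(m.group(1))
            continue
        m = re.match(r"a=control:trackID=(\d+)", line)
        if m and current_pt is not None and current_ssrc is not None:
            tracks[int(m.group(1))] = (current_pt, current_ssrc)
    return tracks


mapping = parse_audio_tracks(SDP_AUDIO)
```

With the sample text, mapping associates number 1 with (0, 12345), number 2 with (0, 34567), and number 3 with (8, 67890).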
Optionally, before the media receiving device sends the description information request message to the media acquiring device, the media receiving device may also send a communication mode request message to the media acquiring device, where the communication mode request message is used to request a communication mode supported by the media acquiring device. The media acquisition equipment receives the communication mode request message and sends a communication mode response message to the media receiving equipment, and the media receiving equipment receives the communication mode response message sent by the media acquisition equipment, wherein the communication mode response message carries the communication mode supported by the media acquisition equipment. Thus, the media receiving device can communicate with the media capturing device in a communication manner supported by the media capturing device.
For example, the communication mode request message is an OPTIONS message, and the communication mode response message is a 200 OK message. The media receiving device sends an OPTIONS message to the media capturing device to obtain the communication modes supported by the media capturing device. The media capturing device sends a 200 OK message to the media receiving device, with the following content:
RTSP/1.0 200 OK
CSeq: 1
Public: OPTIONS, DESCRIBE, PLAY, PAUSE, SETUP, TEARDOWN, SET_PARAMETER
Date: Sun, Jan 17 2021 00:06:57 GMT
The Public field indicates the communication modes supported by the media capturing device, and the media capturing device and the media receiving device subsequently communicate using these communication modes. For example, if the media capturing device supports the DESCRIBE communication mode, the media receiving device may send a DESCRIBE message to the media capturing device to obtain the description information of each media signal in the multiple media signals.
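The check described above can be sketched as follows. This is only an illustrative sketch: the function name supported_methods and the sample response text are assumptions, and a real receiver would also validate the status line and the CSeq value.

```python
def supported_methods(response_text):
    """Return the set of methods listed in the Public header, if any."""
    for line in response_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "public":
            return {m.strip().upper() for m in value.split(",") if m.strip()}
    return set()


# Sample response modelled on the 200 OK message above (assumed text).
OPTIONS_RESPONSE = (
    "RTSP/1.0 200 OK\r\n"
    "CSeq: 1\r\n"
    "Public: OPTIONS, DESCRIBE, PLAY, PAUSE, SETUP, TEARDOWN, SET_PARAMETER\r\n"
    "Date: Sun, Jan 17 2021 00:06:57 GMT\r\n"
    "\r\n"
)

methods = supported_methods(OPTIONS_RESPONSE)
can_describe = "DESCRIBE" in methods  # decide whether DESCRIBE may be sent
```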
S103, the media receiving device obtains, from the media capturing device, the transmission parameter corresponding to each audio signal in the multiple audio signals based on the numbers of the multiple media signals, where the transmission parameter is used to indicate the transmission channel or the transmission port of the corresponding audio signal.
The media receiving device obtains the number of each path of media signal from the description information of the multiple paths of media signals. For each path of media signal in the multi-path media signals, the media receiving device sends a transmission parameter request message to the media collecting device, and the transmission parameter request message carries the number of the path of media signal. The media acquisition equipment receives the transmission parameter request message and sends a transmission parameter response message to the media receiving equipment, wherein the transmission parameter response message carries the transmission parameters corresponding to the media signal. And the media receiving equipment receives the transmission parameter response message and determines the transmission parameters carried by the transmission parameter response message as the transmission parameters corresponding to the media signals.
Optionally, the transmission parameter request message may also carry transmission parameters that the media receiving device expects to use with respect to the path of media signal. In this way, after the media capturing device receives the transmission parameter request message, the transmission parameter expected to be used by the media capturing device can be determined based on the transmission parameter expected to be used by the media receiving device with respect to the path of media signal, and the transmission parameter is carried in the transmission parameter response message and sent to the media receiving device.
The transmission protocol between the media receiving device and the media collecting device may be agreed in advance. Of course, the media receiving device may also negotiate with the media capturing device.
Illustratively, the media receiving device sends a transport protocol request message to the media capturing device, the transport protocol request message carrying the type of transport protocol that the media receiving device expects to use. The media acquisition equipment receives the transmission protocol request message and sends a transmission protocol response message to the media receiving equipment, wherein the transmission protocol response message carries the type of the transmission protocol expected to be used by the media acquisition equipment. At this time, the media receiving device may determine the transmission protocol corresponding to the type of the transmission protocol carried in the transmission protocol response message as the transmission protocol for the media receiving device and the media acquiring device to transmit the media signal. Thus, the process of negotiating the transmission protocol between the media receiving device and the media collecting device is completed.
It should be noted that the transmission parameter request message and the transmission protocol request message may be the same request message, that is, the media receiving device may obtain the transmission protocol while obtaining the transmission parameter of the media signal, and of course, the two request messages may also be different request messages, which is not limited in this embodiment of the present application.
In addition, after the media receiving device acquires the transmission parameter corresponding to each audio signal in the multiple audio signals, the number and the transmission parameter of each media signal can be stored in a mapping relationship between the number and the transmission parameter. Of course, the transmission parameter of each media signal can also be stored together with the above-mentioned number, PT and SSRC, that is, a mapping relationship among the number, PT, SSRC, and transmission parameter is stored.
It should be noted that the transmission parameter may be a channel identifier, which is used to indicate a transmission channel of the media signal. Alternatively, the transmission parameter may be a port number indicating a transmission port of the media signal. The port number may include a client port number (client port) and a service port number (server port).
When the transmission parameter is a channel identifier, the transmission parameter stored in the mapping relationship is the channel identifier. When the transmission parameter is a port number, the transmission parameter stored in the mapping relationship is the port number.
For example, the transmission parameter request message and the transmission protocol request message are the same request message, and the request message is a SETUP message, which is as follows:
SETUP rtsp://10.112.75.115:554/PSIA/streaming/channels/101/trackID=1 RTSP/1.0
CSeq: 4
Authorization: Digest username="admin", realm="IP Camera(12345)", nonce="6af89f9fb8ef5bce472ee0dc07c1ca65", uri="rtsp://10.112.75.115:554/PSIA/streaming/channels/101", response="a9acc7819a490abbf01bf2254a5a070c"
User-Agent: NKPlayer-VSPlayer1.0
Transport: RTP/AVP/TCP;unicast;interleaved=0-1
Here, trackID=1 indicates that the media receiving device requests the transmission parameter of the audio signal numbered 1. RTP/AVP/TCP indicates that the transmission protocol that the media receiving device expects to use is RTP/TCP. interleaved=0-1 is the channel identifier, indicating that the media receiving device expects to receive RTP packets on channel 0 and RTCP (Real-time Transport Control Protocol) packets on channel 1.
The media capturing device sends a 200 OK message to the media receiving device, where the 200 OK message carries the transmission protocol and the transmission parameter that the media capturing device expects to use when transmitting the media signal:
RTSP/1.0 200 OK
CSeq: 4
Session: 1656473847;timeout=60
Transport: RTP/AVP/TCP;unicast;interleaved=0-1;ssrc=736dd621;mode="play"
Date: Sun, Jan 17 2021 00:06:57 GMT
Here, Transport: RTP/AVP/TCP indicates that the transmission protocol that the media capturing device expects to use is RTP/TCP, and interleaved=0-1 indicates that the media capturing device expects to send RTP packets on channel 0 and RTCP packets on channel 1.
For another example, the transmission parameter request message and the transmission protocol request message are the same request message, the request message is a SETUP message, and the SETUP message is as follows:
SETUP rtsp://10.112.75.115:554/PSIA/streaming/channels/101/trackID=1 RTSP/1.0
CSeq: 4
Authorization: Digest username="admin", realm="IP Camera(12345)", nonce="a4a2ee6a5ee449f398e8ebcee6b5c78b", uri="rtsp://10.112.75.115:554/PSIA/streaming/channels/101", response="3eb8a1ec516a007cbbbd46e72af684f7"
User-Agent: NKPlayer-VSPlayer1.0
Transport: RTP/AVP;unicast;client_port=58000-58001
Here, trackID=1 indicates that the media receiving device requests the transmission parameter of the audio signal numbered 1, and RTP/AVP indicates that the transmission protocol that the media receiving device expects to use is RTP/UDP. client_port=58000-58001 is the port number, indicating that the media receiving device expects to receive RTP packets on port 58000 and RTCP packets on port 58001.
The media capturing device sends a 200 OK message to the media receiving device, where the 200 OK message carries the transmission protocol and the transmission parameter that the media capturing device expects to use when transmitting the media signal:
RTSP/1.0 200 OK
CSeq: 4
Session: 353530289;timeout=60
Transport: RTP/AVP;unicast;client_port=58000-58001;server_port=8360-8361;ssrc=44b0b59d;mode="play"
Date: Sun, Jan 17 2021 00:09:32 GMT
Here, Transport: RTP/AVP indicates that the transmission protocol that the media capturing device expects to use is RTP/UDP, and server_port=8360-8361 indicates that the media capturing device expects to send RTP packets on port 8360 and RTCP packets on port 8361.
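The two exchanges above differ only in the parameters of the Transport field. The following minimal sketch shows one way a media receiving device might extract the transmission protocol and the channel identifier or port numbers from that field. The function name parse_transport and the returned dictionary layout are assumptions made for illustration.

```python
def parse_transport(value):
    """Split a Transport header value into the protocol and its parameters."""
    parts = [p.strip() for p in value.split(";") if p.strip()]
    result = {"protocol": parts[0]}
    for part in parts[1:]:
        key, sep, val = part.partition("=")
        if sep and key in ("interleaved", "client_port", "server_port"):
            # Ranges are written "a-b": the first value is used for RTP,
            # the second (if present) for RTCP.
            low, _, high = val.partition("-")
            result[key] = (int(low), int(high) if high else None)
        elif sep:
            result[key] = val.strip('"')
        else:
            result[key] = True  # flag-style parameter such as "unicast"
    return result


tcp = parse_transport(
    'RTP/AVP/TCP;unicast;interleaved=0-1;ssrc=736dd621;mode="play"')
udp = parse_transport(
    'RTP/AVP;unicast;client_port=58000-58001;'
    'server_port=8360-8361;ssrc=44b0b59d;mode="play"')
```

The number carried in the request URL (trackID=1 in both examples) can then be stored together with tcp["interleaved"] or with udp["client_port"] and udp["server_port"] in the mapping relationship between the number and the transmission parameter.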
S104, the media receiving device acquires the data packet of the multi-path media signal from the media collecting device.
As described above, after the media receiving device obtains the transmission parameter used by the media acquiring device to send each path of media signal, the media acquiring device may send the data packet of a corresponding path of media signal to the media receiving device according to the transmission parameter corresponding to each path of media signal. Thus, the media receiving device can receive the data packet of the corresponding path of media signal according to the transmission parameter corresponding to each path of media signal.
Based on the above description, the transmission protocol may be RTP/TCP or RTP/UDP, and the transmission parameter may include a channel identifier or a port number. The following describes the process of the media receiving device acquiring the data packets of the multi-path media signal from the media capturing device in two cases.
In the first case, the transport protocol is RTP/TCP and the transport parameters include channel identification.
In this case, after the media capturing device encodes the captured video signal and the multiple audio signals, the media capturing device may use an RTP format to package the encoded video signal and the multiple audio signals to obtain a plurality of RTP data packets, and then use a TCP (Transmission Control Protocol) to send the plurality of RTP data packets to the media receiving device through the Transmission channel indicated by the channel identifier. The media receiving device receives the plurality of RTP packets on the transmission channel indicated by the channel identification using TCP.
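The embodiment does not spell out how a channel identifier travels with each packet on the TCP connection. Assuming the interleaved framing defined for RTSP in RFC 2326 (a "$" byte, a one-byte channel identifier, a two-byte big-endian length, then the packet), a receiver could split the byte stream as in the sketch below. The names split_interleaved and make_frame are hypothetical, and RTSP text messages that may share the same connection are deliberately not handled.

```python
import struct


def split_interleaved(buffer):
    """Return (frames, remainder); frames is a list of (channel, payload)."""
    frames = []
    pos = 0
    while len(buffer) - pos >= 4:
        magic, channel, length = struct.unpack_from("!BBH", buffer, pos)
        if magic != 0x24:  # 0x24 is the ASCII code of "$"
            raise ValueError("not an interleaved frame at offset %d" % pos)
        if len(buffer) - pos - 4 < length:
            break  # frame not complete yet; wait for more bytes
        frames.append((channel, buffer[pos + 4:pos + 4 + length]))
        pos += 4 + length
    return frames, buffer[pos:]


def make_frame(channel, payload):
    """Build one interleaved frame (used here only to create test input)."""
    return struct.pack("!BBH", 0x24, channel, len(payload)) + payload


# Two complete frames on channels 0 and 2, then a truncated frame.
stream = (make_frame(0, b"rtp-1") + make_frame(2, b"rtp-2")
          + make_frame(4, b"rtp-3")[:6])
frames, rest = split_interleaved(stream)
```

The returned remainder is kept and prepended to the next bytes read from the connection, so that a frame split across TCP reads is reassembled.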
In the second case, the transport protocol is RTP/UDP and the transport parameters include port numbers.
In this case, after the media acquisition device encodes the acquired video signal and the multiple audio signals, the media acquisition device may use an RTP format to pack the encoded video signal and the multiple audio signals to obtain a plurality of RTP data packets, and then use a UDP (User Datagram Protocol) to send the plurality of RTP data packets to the media reception device through the transmission port indicated by the port number. The media receiving device receives the plurality of RTP packets on the transmission port indicated by the port number using UDP.
Optionally, when the data packets of the multiple media signals are transmitted, each data packet may carry the SSRC of the corresponding media signal. Moreover, the media capturing device may periodically send an RTCP message to the media receiving device, where the RTCP message carries an NTP timestamp, an RTP timestamp, and an SSRC. The media receiving device may match the SSRC carried in each data packet of the multiple media signals against the SSRC carried in the RTCP message, so as to determine the data packets whose SSRC is the same as that carried in the RTCP message. Audio-video synchronization is then implemented based on the NTP timestamp and the RTP timestamp in the RTCP message and the RTP timestamps in the determined data packets. The NTP timestamp is the absolute time at which the media capturing device sends the RTCP message, and the RTP timestamp in the RTCP message corresponds to the same instant, expressed in the same unit and with the same random initial value as the RTP timestamps in the determined data packets.
It should be noted that the RTCP message may be an SR (sender report) message. The NTP timestamp and the RTP timestamp may be filled in the RTCP message according to RFC 3550.
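The embodiment does not fix a particular synchronization formula. One common approach, sketched below under that assumption, maps each RTP timestamp onto the sender's absolute time by using the NTP and RTP timestamp pair of the most recent sender report with the same SSRC, and then aligns audio and video on that common time axis. The function names and the sample values are hypothetical.

```python
def ntp_to_seconds(ntp_msw, ntp_lsw):
    """Convert a 64-bit NTP timestamp, given as two 32-bit words, to seconds."""
    return ntp_msw + ntp_lsw / 2**32


def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_ntp_seconds, clock_rate):
    """Map an RTP timestamp to sender time using one sender report."""
    # Interpret the 32-bit difference as signed so that timestamp
    # wraparound and packets slightly older than the report both work.
    diff = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return sr_ntp_seconds + diff / clock_rate


# Hypothetical sender reports: video uses a 90000 Hz clock, audio 8000 Hz.
video_time = rtp_to_wallclock(990000, 900000, 1000.0, 90000)
audio_time = rtp_to_wallclock(20000, 16000, 1000.5, 8000)
in_sync = abs(video_time - audio_time) < 0.001
```

In this example both packets map to the same sender time, so they would be presented together.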
S105, the media receiving device determines a video data packet and an audio data packet from the acquired data packets based on the PT of the multi-path media signals, and determines a data packet of each audio signal in the multi-path audio signals from the audio data packet based on the transmission parameter corresponding to each audio signal in the multi-path audio signals.
For each acquired data packet, the media receiving device parses the data packet to obtain the PT carried in it, and matches this PT against the PT of each media signal acquired in S102. If the PT carried in the data packet is the same as the PT of the video signal acquired in S102, the data packet is determined to be a video data packet. If the PT carried in the data packet is the same as the PT of any one of the multiple audio signals acquired in S102, the data packet is determined to be an audio data packet.
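The PT-based classification above can be sketched as follows, using the 12-byte fixed RTP header layout of RFC 3550 (the PT is the low 7 bits of the second byte, and the SSRC occupies bytes 8 to 11). The PT sets are taken from the example description information; the function names are assumptions made for illustration.

```python
import struct

VIDEO_PTS = {96}    # PT of the video signal in the example SDP
AUDIO_PTS = {0, 8}  # PTs of the audio signals in the example SDP


def parse_rtp_header(packet):
    """Extract PT, sequence number, timestamp and SSRC from an RTP packet."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    b0, b1, seq, timestamp, ssrc = struct.unpack_from("!BBHII", packet, 0)
    if b0 >> 6 != 2:
        raise ValueError("unsupported RTP version")
    return {"pt": b1 & 0x7F, "marker": bool(b1 & 0x80),
            "seq": seq, "timestamp": timestamp, "ssrc": ssrc}


def classify(packet):
    """Return "video", "audio" or "unknown" according to the packet's PT."""
    pt = parse_rtp_header(packet)["pt"]
    if pt in VIDEO_PTS:
        return "video"
    if pt in AUDIO_PTS:
        return "audio"
    return "unknown"


def make_rtp(pt, seq, timestamp, ssrc, payload=b""):
    """Build a minimal RTP packet (used here only to create test input)."""
    return struct.pack("!BBHII", 0x80, pt & 0x7F, seq, timestamp, ssrc) + payload


kind = classify(make_rtp(0, 1, 160, 12345, b"\x00" * 160))
```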
Since multiple audio signals exist in the embodiment of the present application, after the audio data packets are determined, it is further required to determine to which audio signal each audio data packet specifically belongs. For this process, the embodiments of the present application provide the following two realizable manners:
In a first implementation manner, the media receiving device communicates with the media capturing device through TCP, the transmission parameter includes a channel identifier, and each audio data packet carries the channel identifier corresponding to the audio signal to which it belongs. When it is determined, based on the acquired transmission parameters, that the channel identifiers corresponding to the multiple audio signals are all different, the data packet of each audio signal in the multiple audio signals is determined according to the channel identifiers carried in the audio data packets. When it is determined, based on the acquired transmission parameters, that the channel identifiers corresponding to at least two of the multiple audio signals are the same, the data packet corresponding to each audio signal in the multiple audio signals is determined from the audio data packets based on the PT and the SSRC corresponding to each audio signal.
Under the condition that the channel identifiers corresponding to the multiple channels of audio signals are determined to be different based on the acquired transmission parameters, because the channel identifier of each channel of audio signals is unique and a mapping relation exists between the channel identifier and the serial number of the audio signals, the data packet of each channel of audio signals in the multiple channels of audio signals can be determined according to the channel identifiers carried in the audio data packets and the mapping relation between the serial numbers and the channel identifiers.
When it is determined, based on the acquired transmission parameters, that the channel identifiers corresponding to at least two of the multiple audio signals are the same, the channel identifier of each audio signal is no longer unique, and the audio signal to which an audio data packet belongs cannot be determined through the channel identifier. However, based on the above description, the data packet corresponding to a media signal may carry the PT and the SSRC, and the PT + SSRC combination of each audio signal is unique. Therefore, the data packet corresponding to each audio signal in the multiple audio signals may be determined from the audio data packets based on the mapping relationship among the number, PT and SSRC, and the PT and SSRC carried in the audio data packets.
The following illustrates, by way of example, the case where the channel identifiers corresponding to the multiple audio signals are all different and the case where they are the same.
For example, the mapping relationship between the numbers generated according to the SETUP message and the 200OK message in step S103 and the channel identifiers is shown in table 1 below.
TABLE 1
Number Channel identifier
1 0-1
2 2-3
3 4-5
...... ......
That is, the channel identifier of the audio signal numbered 1 is 0-1, where 0 is the channel identifier of RTP and 1 is the channel identifier of RTCP. The channel identifier of the audio signal numbered 2 is 2-3, where 2 is the channel identifier of RTP and 3 is the channel identifier of RTCP. The channel identifier of the audio signal numbered 3 is 4-5, where 4 is the channel identifier of RTP and 5 is the channel identifier of RTCP.
Since the channel identifiers of the three audio signals are different, the media receiving device can distinguish subsequently received audio data packets according to their channel identifiers. Since the data packet is encapsulated using the RTP protocol, the data packet contains the channel identifier of RTP. The media receiving device obtains the channel identifier of RTP from the audio data packet. If the channel identifier of RTP is 0, it is determined based on Table 1 that the audio data packet belongs to the audio signal numbered 1. If the channel identifier of RTP is 2, it is determined based on Table 1 that the audio data packet belongs to the audio signal numbered 2. If the channel identifier of RTP is 4, it is determined based on Table 1 that the audio data packet belongs to the audio signal numbered 3.
If it is determined according to the SETUP messages and the 200 OK messages that the RTP channel identifiers of the audio signals numbered 1, 2 and 3 are the same, the media receiving device distinguishes subsequently received audio data packets according to the PT + SSRC combination.
For example, the mapping relationship between the number generated according to the description information response message sent by the media capturing device to the media receiving device in the step S102, the PT and the SSRC is shown in the following table 2.
TABLE 2
Number PT SSRC
1 0 12345
2 0 34567
3 8 67890
...... ...... ......
That is, PT and SSRC of the audio signal numbered 1 are 0 and 12345, respectively. The PT and SSRC of the audio signal numbered 2 are 0 and 34567, respectively. The PT and SSRC of the audio signal numbered 3 are 8 and 67890, respectively.
Since the data packet is encapsulated by the RTP protocol, the data packet includes PT and SSRC. The media receiving device obtains PT and SSRC from the audio data packet, and if PT and SSRC are 0 and 12345, respectively, it determines that the audio data packet belongs to the audio signal numbered 1 based on the above table 2. If PT and SSRC are obtained as 0 and 34567, respectively, it is determined that the audio packet belongs to the audio signal numbered 2 based on the above table 2. If PT and SSRC are obtained as 8 and 67890, respectively, it is determined that the audio data packet belongs to the audio signal numbered 3 based on table 2 above.
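The choice between the two cases of the first implementation manner can be sketched as follows: if every audio signal has its own RTP channel identifier, the channel identifier alone selects the number (Table 1); otherwise the PT + SSRC combination is used (Table 2). The function name build_audio_lookup and the dictionary layout are assumptions made for illustration.

```python
from collections import Counter


def build_audio_lookup(tracks):
    """tracks maps number -> {"channel": int, "pt": int, "ssrc": int}.

    Returns a function (channel, pt, ssrc) -> number, or None if unknown.
    """
    counts = Counter(t["channel"] for t in tracks.values())
    if all(count == 1 for count in counts.values()):
        # Channel identifiers are all different: use them directly.
        by_channel = {t["channel"]: n for n, t in tracks.items()}
        return lambda channel, pt, ssrc: by_channel.get(channel)
    # At least two signals share a channel: fall back to PT + SSRC.
    by_pt_ssrc = {(t["pt"], t["ssrc"]): n for n, t in tracks.items()}
    return lambda channel, pt, ssrc: by_pt_ssrc.get((pt, ssrc))


# RTP channel identifiers 0, 2 and 4 as in Table 1.
distinct = {
    1: {"channel": 0, "pt": 0, "ssrc": 12345},
    2: {"channel": 2, "pt": 0, "ssrc": 34567},
    3: {"channel": 4, "pt": 8, "ssrc": 67890},
}
# Hypothetical case in which all three signals share channel 0.
shared = {n: dict(t, channel=0) for n, t in distinct.items()}

number_by_channel = build_audio_lookup(distinct)(2, 0, 34567)
number_by_pt_ssrc = build_audio_lookup(shared)(0, 0, 34567)
```

The same shape applies when the transmission parameter is a port number: the key becomes the port number instead of the channel identifier, with the same PT + SSRC fallback.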
In a second implementation manner, the media receiving device communicates with the media capturing device through UDP, the transmission parameter includes a port number, and each audio data packet carries the port number corresponding to the audio signal to which it belongs. When it is determined, based on the acquired transmission parameters, that the port numbers corresponding to the multiple audio signals are all different, the data packet corresponding to each audio signal in the multiple audio signals is determined according to the port numbers carried in the audio data packets. When it is determined, based on the acquired transmission parameters, that the port numbers corresponding to at least two of the multiple audio signals are the same, the data packet corresponding to each audio signal in the multiple audio signals is determined from the audio data packets based on the PT and the SSRC corresponding to each audio signal.
Under the condition that the port numbers corresponding to the multiple audio signals are determined to be different based on the acquired transmission parameters, because the port number of each audio signal is unique and a mapping relation exists between the port number and the serial number of the audio signal, the data packet of each audio signal in the multiple audio signals can be determined according to the port number carried in the audio data packet and the mapping relation between the serial number and the port number.
When it is determined, based on the acquired transmission parameters, that the port numbers corresponding to at least two of the multiple audio signals are the same, the port number of each audio signal is no longer unique, and the audio signal to which an audio data packet belongs cannot be determined through the port number. However, based on the above description, the data packet corresponding to a media signal may carry the PT and the SSRC, and the PT + SSRC combination of each audio signal is unique. Therefore, the data packet corresponding to each audio signal in the multiple audio signals may be determined from the audio data packets based on the mapping relationship among the number, PT and SSRC, and the PT and SSRC carried in the audio data packets.
The following illustrates, by way of example, the case where the port numbers corresponding to the multiple audio signals are all different and the case where they are the same.
For example, the mapping relationship between the numbers and the port numbers generated according to the SETUP message and the 200OK message in step S103 is as shown in table 3 below, and the port numbers include the client port number and the service port number.
TABLE 3
Number Client port number Service port number
1 58000-58001 8360-8361
2 58002-58003 8362-8363
3 58004-58005 8364-8365
...... ...... ......
That is, the audio signal numbered 1 has a client port number of 58000-58001 and a service port number of 8360-8361, where 58000 is the client port number of RTP, 58001 is the client port number of RTCP, 8360 is the service port number of RTP, and 8361 is the service port number of RTCP. The audio signal numbered 2 has a client port number of 58002-58003 and a service port number of 8362-8363, where 58002 is the client port number of RTP, 58003 is the client port number of RTCP, 8362 is the service port number of RTP, and 8363 is the service port number of RTCP. The audio signal numbered 3 has a client port number of 58004-58005 and a service port number of 8364-8365, where 58004 is the client port number of RTP, 58005 is the client port number of RTCP, 8364 is the service port number of RTP, and 8365 is the service port number of RTCP.
Since the port numbers of the three audio signals are different, the media receiving device can distinguish subsequently received audio data packets according to their port numbers. Since the audio data packet is encapsulated using the RTP protocol, the audio data packet includes the client port number of RTP and the service port number of RTP. The media receiving device obtains the client port number of RTP and the service port number of RTP from the audio data packet. If the client port number of RTP is 58000 and the service port number of RTP is 8360, it is determined based on Table 3 that the audio data packet belongs to the audio signal numbered 1. If the client port number of RTP is 58002 and the service port number of RTP is 8362, it is determined based on Table 3 that the audio data packet belongs to the audio signal numbered 2. If the client port number of RTP is 58004 and the service port number of RTP is 8364, it is determined based on Table 3 that the audio data packet belongs to the audio signal numbered 3.
If it is determined according to the SETUP messages and the 200 OK messages that at least two of the audio signals numbered 1, 2 and 3 have the same RTP client port number and the same RTP service port number, the mapping from the combination of RTP client port number and RTP service port number to a media signal is no longer unique. The number of the media signal to which an audio data packet belongs therefore cannot be determined from the transmission parameter carried in the audio data packet and the mapping relationship between the number and the port number. In this case, the data packet corresponding to each audio signal in the multiple audio signals may be determined from the audio data packets based on the PT and SSRC carried in the audio data packets and the mapping relationship among the number, PT and SSRC. For the specific determination process, refer to the related content in the first implementation manner; details are not repeated here.
The media receiving device first determines the video data packets and the audio data packets based on the PT, and then determines, from the audio data packets, the data packet corresponding to each audio signal based on the transmission parameter. In this way, the media receiving device can determine the data packet corresponding to each media signal in the multimedia signal, which facilitates subsequent storage or playing.
Optionally, in the two implementation manners above, when it is determined based on the obtained transmission parameters that the transmission channels or the port numbers corresponding to at least two of the multiple audio signals are the same, the data packet corresponding to each audio signal is determined through a combination of PT + SSRC. However, based on the above description, the SSRC of each media signal is different, that is, the SSRC of each media signal is globally unique, so the data packet corresponding to each audio signal in the multiple audio signals can also be determined directly based on the SSRC carried in the data packet.
As an example, the data packet corresponding to each audio signal in the multiple audio signals may be determined from the audio data packets based on the mapping relationship between the numbers and the SSRCs carried in the audio data packets.
For example, the mapping relationship between the number generated according to the description information response message sent by the media capturing device to the media receiving device in step S102 and the SSRC is shown in table 4 below.
TABLE 4
Number    SSRC
1         12345
2         34567
3         67890
......    ......
As can be seen from the mapping relationship between the numbers and the SSRCs in table 4, the SSRC of the audio signal number 1 is 12345. The SSRC of the audio signal numbered 2 is 34567. The SSRC of the audio signal numbered 3 is 67890. Since the data packet is encapsulated using the RTP protocol, the data packet contains the SSRC. The media receiving device obtains the SSRC from the audio data packet, and if the obtained SSRC is 12345, determines that the data packet belongs to the audio signal with number 1 based on the above table 4. If the obtained SSRC is 34567, it is determined that the packet belongs to the audio signal numbered 2 based on table 4 above. If the obtained SSRC is 67890, it is determined that the packet belongs to the audio signal numbered 3 based on table 4 above.
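The SSRC-based lookup can be sketched as follows. The sketch assumes the standard 12-byte RTP fixed header, in which the PT occupies the low 7 bits of the second byte and the SSRC occupies bytes 8 to 11. The function names are hypothetical and the mapping uses the example values of table 4.

```python
import struct
from typing import Optional, Tuple

# SSRC -> audio signal number, using the example values of table 4.
SSRC_TO_NUMBER = {12345: 1, 34567: 2, 67890: 3}


def parse_rtp_pt_ssrc(packet: bytes) -> Tuple[int, int]:
    """Extract (payload type, SSRC) from the 12-byte RTP fixed header."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least a 12-byte fixed header")
    first, second, _seq, _timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if first >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    return second & 0x7F, ssrc  # mask off the marker bit to get the PT


def audio_number_for_packet(packet: bytes) -> Optional[int]:
    """Return the audio signal number for a packet, or None if its SSRC is unknown."""
    _pt, ssrc = parse_rtp_pt_ssrc(packet)
    return SSRC_TO_NUMBER.get(ssrc)
```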
S106, the media receiving device plays or stores each determined video data packet and each determined audio data packet.
In the case that the media receiving device is a client, the client plays the video signal and the multiple audio signals. Moreover, the client playing interface can also display description information of a plurality of sound pickups, and the sound pickups refer to sound pickups used for collecting the multi-channel audio signals. The client receives an audio signal switching instruction based on the description information of the plurality of sound pickups, the audio signal switching instruction carries the number of a target audio signal, and the target audio signal is one of the plurality of paths of audio signals. The client switches the currently played audio signal to the target audio signal.
Since the media acquisition device provides the client with the description information of the sound pickup corresponding to each audio signal, after the client displays the description information of the sound pickups on the playing interface, the user can determine, based on the description information, the direction from which each sound pickup collects its audio signal. In this way, it is convenient for the user to switch to the audio signal of interest based on the description information of the plurality of sound pickups.
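The switching behaviour on the client can be sketched as follows. The class name, method names, and the pickup descriptions in the usage note are hypothetical. The sketch only tracks which numbered audio signal is selected and does not perform any actual playback.

```python
from typing import Dict, List


class AudioSwitcher:
    """Tracks which numbered audio signal the client is currently playing."""

    def __init__(self, pickup_descriptions: Dict[int, str], current: int) -> None:
        if current not in pickup_descriptions:
            raise KeyError(f"unknown audio signal number {current}")
        self.pickup_descriptions = dict(pickup_descriptions)
        self.current = current

    def menu(self) -> List[str]:
        """Lines that a playing interface could display, one per sound pickup."""
        return [f"{n}: {d}" for n, d in sorted(self.pickup_descriptions.items())]

    def switch(self, target_number: int) -> int:
        """Handle a switching instruction that carries the target audio number."""
        if target_number not in self.pickup_descriptions:
            raise KeyError(f"unknown audio signal number {target_number}")
        self.current = target_number
        return self.current
```

For example, with descriptions such as {1: "east side", 2: "west side"}, calling switch(2) selects the audio signal collected by the second sound pickup.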
Optionally, based on the description in the step S104, the media capturing device may periodically send an RTCP packet to the media receiving device. Therefore, when the client plays the video signal and the multiple audio signals, the SSRC can be obtained from the received RTCP message, so as to match the data packet of the media signal corresponding to each RTCP message. And then, acquiring an NTP (network time protocol) timestamp and an RTP timestamp from the received RTCP message, and realizing audio and video synchronous playing according to the NTP timestamp and the RTP timestamp in a data packet of a media signal matched with the RTCP message.
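The timestamp mapping used for synchronization can be sketched as follows. This is a sketch under stated assumptions: the class and function names are hypothetical, the NTP timestamp is assumed to have already been converted to seconds, and the clock rates (for example 90000 Hz for video and 8000 Hz for audio) are illustrative values that would normally be known from the description information.

```python
from dataclasses import dataclass


@dataclass
class SenderReport:
    """The fields of an RTCP message that are needed for synchronization."""
    ssrc: int
    ntp_seconds: float   # NTP timestamp, converted to seconds
    rtp_timestamp: int   # RTP timestamp sampled at the same instant


def rtp_to_wallclock(rtp_ts: int, report: SenderReport, clock_rate: int) -> float:
    """Map an RTP timestamp of a data packet onto the sender's NTP timeline."""
    # Signed 32-bit difference, so that RTP timestamp wraparound is tolerated.
    diff = (rtp_ts - report.rtp_timestamp) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return report.ntp_seconds + diff / clock_rate
```

Packets of different media signals whose computed wall-clock times are equal were captured at the same instant, so a player can present them together.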
In the case that the media receiving device is an external memory, the external memory stores the video data packet and the audio data packet, or stores the video data packet and the audio data packet after being encapsulated into other formats.
For example, the video data packet and the audio data packet are RTP data packets, and the external memory only supports storing PS or MP4 data packets, so the external memory needs to firstly encapsulate the RTP data packets into PS or MP4 data packets and then store the PS or MP4 data packets.
When the data packets are converted and encapsulated into the data packets in the PS format, each data packet in the PS format carries a stream identifier, and the stream identifier is used for identifying each path of media signals in the multi-path media signals. The PS format supports audio signals of different coding formats, audio signals of different audio parameters of the same coding format, and audio signals of the same audio parameters of the same coding format. In the PS format packet, parameters of each audio signal or video signal can be described by a PSM (Program Stream Map). For example, a video PSM is constructed in front of each video I frame, the video PSM including a date, time information accurate to milliseconds, a timestamp, and an encoding format; the PSM of the audio signal includes date, time information accurate to milliseconds, time stamps, coding format, sampling rate, code rate, number of channels, and description information of the microphone.
When the data packets are converted and packaged into MP4 format data packets, each MP4 format data packet can carry the number of the media signal corresponding to the data packet and the description information of the corresponding sound pick-up; the MP4 format supports the repackaging of the data packets of multiple channels of video signals and also supports the repackaging of the data packets of multiple channels of audio signals, and the data packets of the MP4 format may carry the serial number of each channel of media signals and the description information of the corresponding sound pickup.
For example, in the MP4 format, the number of the media signal corresponding to the data packet may be described by a trak field in moov, and the description information of the sound pickup corresponding to each audio signal may be described by a moov->trak->mdia->hdlr->name field.
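Reading the name field along the moov, trak, mdia, hdlr path can be sketched as follows. The sketch assumes the common box layout of a 32-bit size followed by a 4-byte type, and an hdlr payload made of version and flags (4 bytes), a predefined field (4 bytes), the handler type (4 bytes), reserved bytes (12 bytes), and a null-terminated name. Boxes with 64-bit sizes are not handled, and the function names are hypothetical.

```python
import struct
from typing import Iterator, List, Tuple


def iter_boxes(data: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (type, payload) for each box in data; only 32-bit sizes are handled."""
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack("!I4s", data[pos:pos + 8])
        if size < 8 or pos + size > len(data):
            raise ValueError("unsupported or truncated box")
        yield box_type, data[pos + 8:pos + size]
        pos += size


def children(data: bytes, wanted: bytes) -> Iterator[bytes]:
    """Yield the payload of every direct child box of the wanted type."""
    for box_type, payload in iter_boxes(data):
        if box_type == wanted:
            yield payload


def hdlr_names(mp4: bytes) -> List[str]:
    """Collect the hdlr name of every track, in track order."""
    names = []
    for moov in children(mp4, b"moov"):
        for trak in children(moov, b"trak"):
            for mdia in children(trak, b"mdia"):
                for hdlr in children(mdia, b"hdlr"):
                    raw_name = hdlr[24:]  # skip the 24 bytes before the name
                    names.append(raw_name.split(b"\x00", 1)[0].decode("utf-8"))
    return names
```

Because each audio signal is stored as its own trak, the position of a name in the returned list corresponds to the order of the tracks in the file.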
Further, the external memory may also send the video data packet and the audio data packet stored in the external memory to the client, and the client plays the received video data packet and audio data packet, and the playing process of the client may refer to the above description, which is not described herein again.
In the embodiment of the application, the media receiving device acquires the description information of multiple media signals from the media acquisition device and, based on the description information, acquires the number, the transmission parameter, the PT and the SSRC of each media signal. It then establishes the mapping relationship among the number, the transmission parameter, the PT and the SSRC of each media signal. After receiving a data packet of the multimedia signal, the media receiving device determines which media signal the data packet belongs to based on the transmission parameter or on PT + SSRC. The media receiving device can therefore correctly distinguish the data packets of each media signal, which solves the transmission problem of the multiple media signals. Further, the embodiments of the present application respectively introduce methods for determining the data packets of the multiple media signals under different transmission protocol types, so that the media receiving device can correctly distinguish the data packets of the multiple media signals by using the methods provided by the embodiments of the present application, regardless of which transmission protocol type is used. Finally, since the media acquisition device provides the client with the description information of the sound pickup corresponding to each audio signal, after the client displays the description information of the sound pickups on the playing interface, the user can determine, based on the description information, the direction from which each sound pickup collects its audio signal. In this way, it is convenient for the user to switch to the audio signal of interest according to the description information of the plurality of sound pickups.
In summary, the embodiment of the application realizes the correct transmission of the multi-path media signals, and enables the user to freely switch to the interested audio signal when watching the multimedia signal.
Fig. 4 is a schematic structural diagram of an audio/video monitoring apparatus provided in an embodiment of the present application, where the audio/video monitoring apparatus may be implemented by software, hardware, or a combination of the two to become part or all of a media receiving device, and the media receiving device may be the media receiving device shown in fig. 1. Referring to fig. 4, the apparatus includes: a description information obtaining module 401, a transmission parameter obtaining module 402, a data packet obtaining module 403 and a determining module 404.
A description information obtaining module 401, configured to obtain description information of each media signal in multiple media signals from a media collecting device, where the description information includes a serial number and a PT, the multiple media signals include a single video signal and multiple audio signals, and the multiple audio signals are collected by using different sound collectors;
a transmission parameter obtaining module 402, configured to obtain, from the media acquisition device, a transmission parameter corresponding to each of the multiple channels of audio signals based on the number of the multiple channels of media signals, where the transmission parameter is used to indicate a transmission channel or a transmission port of the corresponding audio signal;
a data packet obtaining module 403, configured to obtain data packets of the multiple paths of media signals from the media acquisition device;
a determining module 404, configured to determine, based on the PT of the multiple media signals, a video data packet and an audio data packet from the obtained data packets, and determine, from the audio data packet, a data packet of each audio signal in the multiple audio signals.
Optionally, the media receiving device communicates with the media acquiring device through a transmission control protocol TCP, the transmission parameter includes a channel identifier, the audio data packet carries a channel identifier corresponding to the audio signal, and the determining module 404 is specifically configured to:
and under the condition that the channel identifiers corresponding to the multiple audio signals are different based on the transmission parameters corresponding to the multiple audio signals, determining the data packet of each audio signal in the multiple audio signals according to the channel identifiers carried in the audio data packet.
Optionally, the description information further includes a synchronization source SSRC. The determining module 404 is specifically configured to:
and determining a data packet corresponding to each audio signal in the multi-path audio signals from the audio data packet based on the PT and the SSRC corresponding to each audio signal in the multi-path audio signals under the condition that the channel identifications corresponding to at least two audio signals in the multi-path audio signals are determined to be the same based on the transmission parameters corresponding to the multi-path audio signals.
Optionally, the media receiving device communicates with the media acquiring device through a user datagram protocol UDP, the transmission parameter includes a port number, the audio data packet carries the port number corresponding to the audio signal, and the determining module 404 is specifically configured to:
and under the condition that the port numbers corresponding to the multiple audio signals are different according to the transmission parameters corresponding to the multiple audio signals, determining a data packet corresponding to each audio signal in the multiple audio signals according to the port numbers carried in the audio data packets.
Optionally, the description information further includes an SSRC, and the determining module 404 is specifically configured to:
and under the condition that the port numbers corresponding to at least two paths of audio signals in the multi-path audio signals are determined to be the same based on the transmission parameters corresponding to the multi-path audio signals, determining a data packet corresponding to each path of audio signals in the multi-path audio signals from the audio data packet based on the PT and the SSRC corresponding to each path of audio signals in the multi-path audio signals.
Optionally, the media receiving device is a client, and the apparatus further includes:
the playing module is used for playing the video signal and the multi-channel audio signal;
the display module is used for displaying the description information of a plurality of sound pickups, and the sound pickups are used for collecting the multi-channel audio signals;
the receiving module is used for receiving an audio signal switching instruction based on the description information of the sound pickup devices, wherein the audio signal switching instruction carries the number of a target audio signal, and the target audio signal is one of the multiple paths of audio signals;
and the switching module is used for switching the currently played audio signal to the target audio signal.
In the embodiment of the application, the media receiving device acquires the description information of multiple media signals from the media acquisition device and, based on the description information, acquires the number, the transmission parameter, the PT and the SSRC of each media signal. It then establishes the mapping relationship among the number, the transmission parameter, the PT and the SSRC of each media signal. After receiving a data packet of the multimedia signal, the media receiving device determines which media signal the data packet belongs to based on the transmission parameter or on PT + SSRC. The media receiving device can therefore correctly distinguish the data packets of each media signal, which solves the transmission problem of the multiple media signals.
It should be noted that: in the audio/video monitoring device provided in the above embodiment, only the division of each functional module is used for illustration when audio/video monitoring is performed, and in practical application, the function distribution may be completed by different functional modules as needed, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above. In addition, the audio/video monitoring device and the audio/video monitoring method provided by the above embodiment belong to the same concept, and specific implementation processes thereof are detailed in the method embodiment and are not described herein again.
Fig. 5 is a schematic structural diagram of another audio/video monitoring apparatus provided in an embodiment of the present application, where the audio/video monitoring apparatus may be implemented by software, hardware, or a combination of the two to become part or all of a media acquisition device, and the media acquisition device may be the media acquisition device shown in fig. 1. Referring to fig. 5, the apparatus includes: a first description information providing module 501, a transmission parameter providing module 502 and a data packet sending module 503.
A first description information providing module 501, configured to provide description information of each media signal in multiple media signals to a media receiving apparatus, where the description information includes a number and a payload type PT, the multiple media signals include a single video signal and multiple audio signals, and the multiple audio signals are collected by using different sound collectors;
a transmission parameter providing module 502, configured to provide, to the media receiving device, a transmission parameter corresponding to each of the multiple audio signals, where the transmission parameter is used to indicate a transmission channel or a transmission port of the corresponding audio signal;
a data packet sending module 503, configured to send the data packet of the multi-channel media signal to the media receiving device.
Optionally, the description information further includes a synchronization source SSRC;
the device also includes:
and the allocation module is used for allocating a synchronization source SSRC to each audio signal in the multi-path audio signals.
Optionally, the description information is transmitted by using a session description protocol SDP.
Optionally, the apparatus further comprises:
and the second description information providing module is used for providing the description information of the sound pickup corresponding to each audio signal in the multi-channel audio signals to the media receiving equipment.
In the embodiment of the application, the media receiving device acquires the description information of multiple media signals from the media acquisition device and, based on the description information, acquires the number, the transmission parameter, the PT and the SSRC of each media signal. It then establishes the mapping relationship among the number, the transmission parameter, the PT and the SSRC of each media signal. After receiving a data packet of the multimedia signal, the media receiving device determines which media signal the data packet belongs to based on the transmission parameter or on PT + SSRC. The media receiving device can therefore correctly distinguish the data packets of each media signal, which solves the transmission problem of the multiple media signals.
It should be noted that: in the audio/video monitoring device provided in the above embodiment, only the division of each functional module is used for illustration when audio/video monitoring is performed, and in practical application, the function distribution may be completed by different functional modules as needed, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above. In addition, the audio/video monitoring device provided by the above embodiment and the audio/video monitoring method embodiment belong to the same concept, and specific implementation processes thereof are described in detail in the method embodiment, and are not described again here.
Fig. 6 is a block diagram of a client 600 according to an embodiment of the present disclosure. The client 600 may be a portable mobile client, such as: a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a notebook computer, or a desktop computer. Client 600 may also be referred to by other names such as user equipment, portable client, laptop client, desktop client, and so forth.
In general, client 600 includes: a processor 601 and a memory 602.
Processor 601 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 601 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 601 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 601 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, processor 601 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
The memory 602 may include one or more computer-readable storage media, which may be non-transitory. The memory 602 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 602 is used to store at least one instruction for execution by the processor 601 to implement the audiovisual monitoring method provided by the method embodiments herein.
In some embodiments, client 600 may also optionally include: a peripheral interface 603 and at least one peripheral. The processor 601, memory 602, and peripheral interface 603 may be connected by buses or signal lines. Various peripheral devices may be connected to the peripheral interface 603 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 604, a touch screen display 605, a camera 606, an audio circuit 607, a positioning component 608, and a power supply 609.
The peripheral interface 603 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 601 and the memory 602. In some embodiments, the processor 601, memory 602, and peripheral interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 601, the memory 602, and the peripheral interface 603 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 604 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 604 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 604 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 604 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 604 may communicate with other clients via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to, those of: the world wide web, metropolitan area networks, intranets, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 604 may further include NFC (Near Field Communication) related circuits, which are not limited by the embodiments of the present application.
The display 605 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 605 is a touch display, the display 605 also has the ability to capture touch signals on or over the surface of the display 605. The touch signal may be input to the processor 601 as a control signal for processing. In this case, the display 605 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 605, disposed on the front panel of the client 600; in other embodiments, there may be at least two displays 605, respectively disposed on different surfaces of the client 600 or in a folding design; in still other embodiments, the display 605 may be a flexible display disposed on a curved surface or on a folded surface of the client 600. Furthermore, the display 605 may be arranged in a non-rectangular irregular shape, i.e., a special-shaped screen. The display 605 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 606 is used to capture images or video. Optionally, camera assembly 606 includes a front camera and a rear camera. Generally, the front camera is arranged on the front panel of the client, and the rear camera is arranged on the back of the client. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 606 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. The dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation under different color temperatures.
Audio circuitry 607 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 601 for processing or inputting the electric signals to the radio frequency circuit 604 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different positions of the client 600. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 601 or the radio frequency circuit 604 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, audio circuitry 607 may also include a headphone jack.
The positioning component 608 is used to locate the current geographic location of the client 600 to implement navigation or LBS (Location Based Service). The positioning component 608 can be a positioning component based on the Global Positioning System (GPS) of the United States, the BeiDou system of China, or the Galileo system of the European Union.
Power supply 609 is used to provide power to the various components in client 600. The power supply 609 may be an alternating current supply, a direct current supply, a disposable battery, or a rechargeable battery. When the power supply 609 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also support fast charging technology.
Those skilled in the art will appreciate that the architecture shown in fig. 6 does not constitute a limitation on client 600, and may include more or fewer components than shown, or combine certain components, or employ a different arrangement of components.
In some embodiments, a computer-readable storage medium is further provided, in which a computer program is stored, and when being executed by a processor, the computer program implements the steps of the audio and video monitoring method in the foregoing embodiments. For example, the computer readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
It is noted that the computer-readable storage medium mentioned in the embodiments of the present application may be a non-volatile storage medium, in other words, a non-transitory storage medium.
It should be understood that all or part of the steps for implementing the above embodiments may be implemented by software, hardware, firmware or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The computer instructions may be stored in the computer-readable storage medium described above.
That is, in some embodiments, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the steps of the audiovisual monitoring method described above.
It is to be understood that reference herein to "at least one" means one or more and "a plurality" means two or more. In the description of the embodiments of the present application, "/" indicates an alternative meaning, for example, A/B may indicate A or B; "and/or" herein is merely an association describing associated objects, and means that there may be three relationships, e.g., A and/or B, which may mean: A exists alone, A and B exist simultaneously, and B exists alone. In addition, in order to facilitate clear description of technical solutions of the embodiments of the present application, in the embodiments of the present application, terms such as "first" and "second" are used to distinguish the same items or similar items having substantially the same functions and actions. Those skilled in the art will appreciate that the terms "first," "second," etc. do not denote any order, quantity, or importance.
The above-mentioned embodiments are provided by way of example and should not be construed as limiting the present application, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (15)

1. An audio and video monitoring method, applied to a media receiving device, the media receiving device being a client, characterized in that the method comprises the following steps:
acquiring description information of each path of media signal in a plurality of paths of media signals from media acquisition equipment, wherein the description information comprises a serial number, a payload type PT and a synchronization source SSRC, the plurality of paths of media signals comprise a single path of video signals and a plurality of paths of audio signals, and the plurality of paths of audio signals are acquired by adopting different sound pickup devices;
acquiring a transmission parameter corresponding to each audio signal in the multi-channel audio signals from the media acquisition equipment based on the serial numbers of the multi-channel media signals, wherein the transmission parameter is used for indicating a transmission channel or a transmission port of the corresponding audio signal;
acquiring data packets of the multiple paths of media signals from the media acquisition equipment, and receiving a real-time transport control protocol (RTCP) message sent by the media acquisition equipment, wherein the RTCP message carries a Network Time Protocol (NTP) timestamp, a real-time transport protocol (RTP) timestamp and a single steady state radio frequency (SSRC), and each data packet carries the SSRC of the corresponding media signal;
determining a video data packet and an audio data packet from the obtained data packets based on the PT of the multi-path media signals, and determining a data packet of each audio signal in the multi-path audio signals from the audio data packet based on the transmission parameter corresponding to each audio signal in the multi-path audio signals;
the method further comprises the following steps:
acquiring an SSRC from the RTCP message, and matching the SSRC carried in the data packet of the multi-path media signal with the SSRC carried in the RTCP message to determine the data packet which is the same as the SSRC carried in the RTCP message; performing audio and video synchronization based on the NTP timestamp and the RTP timestamp in the RTCP message and the RTP timestamp in the data packet which is the same as the SSRC carried in the RTCP message;
when the media receiving device communicates with the media collecting device through a transmission control protocol TCP, where the transmission parameter includes a channel identifier, and the audio data packet carries a channel identifier corresponding to the audio signal, the determining, from the audio data packet, a data packet corresponding to each audio signal in the multiple audio signals based on the transmission parameter corresponding to each audio signal in the multiple audio signals includes:
determining a data packet of each audio signal in the multi-channel audio signals according to the channel identifier carried in the audio data packet under the condition that the channel identifiers corresponding to the multi-channel audio signals are different based on the transmission parameters corresponding to the multi-channel audio signals;
and under the condition that the channel identifications corresponding to at least two paths of audio signals in the multi-path audio signals are determined to be the same based on the transmission parameters corresponding to the multi-path audio signals, determining a data packet corresponding to each path of audio signals in the multi-path audio signals from the audio data packets based on the PT and the SSRC corresponding to each path of audio signals in the multi-path audio signals.
2. The method according to claim 1, wherein when the media receiving device communicates with the media capturing device through a user datagram protocol UDP, the transmission parameter includes a port number, and the audio data packet carries the port number corresponding to the audio signal, the determining, from the audio data packet, a data packet corresponding to each audio signal in the multiple audio signals based on the transmission parameter corresponding to each audio signal in the multiple audio signals includes:
and under the condition that the port numbers corresponding to the multiple audio signals are different based on the transmission parameters corresponding to the multiple audio signals, determining a data packet corresponding to each audio signal in the multiple audio signals according to the port number carried in the audio data packet.
3. The method of claim 2, wherein the method further comprises:
and under the condition that the port numbers corresponding to at least two paths of audio signals in the multi-path audio signals are determined to be the same based on the transmission parameters corresponding to the multi-path audio signals, determining a data packet corresponding to each path of audio signals in the multi-path audio signals from the audio data packets based on the PT and the SSRC corresponding to each path of audio signals in the multi-path audio signals.
4. The method of any of claims 1-3, wherein the method further comprises:
playing the video signal and the multi-channel audio signal;
displaying description information of a plurality of sound pickups, wherein the sound pickups are used for collecting the multi-path audio signals;
receiving an audio signal switching instruction based on the description information of the plurality of sound pickups, wherein the audio signal switching instruction carries the number of a target audio signal, and the target audio signal is one of the plurality of paths of audio signals;
and switching the currently played audio signal to the target audio signal.
5. An audio and video monitoring method, applied to a media acquisition device, characterized in that the method comprises:
providing description information of each path of media signal in multiple paths of media signals to a media receiving device, wherein the description information comprises a number, a payload type PT and a synchronization source SSRC, the multiple paths of media signals comprise a single-path video signal and multiple paths of audio signals, the multiple paths of audio signals are collected by different sound pickups, and the media receiving device is a client;
providing a transmission parameter corresponding to each audio signal in the multiple audio signals to the media receiving device, where the transmission parameter is used to indicate a transmission channel or a transmission port of the corresponding audio signal;
sending data packets of the multiple paths of media signals to the media receiving device, so that the media receiving device determines video data packets and audio data packets from the obtained data packets, and determines data packets of each path of audio signals in the multiple paths of audio signals from the audio data packets based on transmission parameters corresponding to each path of audio signals in the multiple paths of audio signals; sending a real-time transport control protocol (RTCP) message to the media receiving device, wherein the RTCP message carries a Network Time Protocol (NTP) timestamp, a real-time transport protocol (RTP) timestamp and an SSRC, and each data packet carries the SSRC of the media signal to which it belongs;
the media receiving device is further configured to obtain an SSRC from the RTCP message, and match the SSRC carried in the data packet of the multiple paths of media signals with the SSRC carried in the RTCP message to determine a data packet that is the same as the SSRC carried in the RTCP message; performing audio and video synchronization based on the NTP timestamp and the RTP timestamp in the RTCP message and the RTP timestamp in the data packet which is the same as the SSRC carried in the RTCP message;
the media receiving device is further configured to: when the media receiving device communicates with the media collecting device through a transmission control protocol TCP, the transmission parameter includes a channel identifier, and the audio data packet carries the channel identifier corresponding to the audio signal, determine a data packet of each audio signal in the multiple audio signals according to the channel identifier carried in the audio data packet under the condition that the channel identifiers corresponding to the multiple audio signals are determined to be different based on the transmission parameters corresponding to the multiple audio signals; and under the condition that the channel identifications corresponding to at least two paths of audio signals in the multi-path audio signals are determined to be the same based on the transmission parameters corresponding to the multi-path audio signals, determine a data packet corresponding to each path of audio signals in the multi-path audio signals from the audio data packets based on the PT and the SSRC corresponding to each path of audio signals in the multi-path audio signals.
6. The method of claim 5, wherein before providing the description information of each of the multiple media signals to the media receiving device, further comprising:
and allocating one SSRC to each audio signal in the multi-path audio signals.
7. A method according to claim 5 or 6, characterized in that said description information is transmitted using the Session description protocol SDP.
8. The method of claim 5, further comprising:
and providing the description information of the sound pickup corresponding to each audio signal in the multi-channel audio signals to the media receiving equipment.
9. An audio/video monitoring apparatus, applied to a media receiving device, where the media receiving device is a client, the apparatus includes:
a description information acquisition module, configured to acquire description information of each path of media signal in multiple paths of media signals from a media acquisition device, where the description information comprises a number, a payload type PT and a synchronization source SSRC, the multiple paths of media signals comprise a single-path video signal and multiple paths of audio signals, and the multiple paths of audio signals are acquired by different sound pickups;
a transmission parameter acquiring module, configured to acquire, from the media acquisition device, a transmission parameter corresponding to each of the multiple audio signals based on the number of the multiple media signals, where the transmission parameter is used to indicate a transmission channel or a transmission port of the corresponding audio signal;
a data packet obtaining module, configured to obtain data packets of the multiple paths of media signals from the media acquisition device, and receive a real-time transport control protocol RTCP message sent by the media acquisition device, where the RTCP message carries a network time protocol NTP timestamp, a real-time transport protocol RTP timestamp, and an SSRC, and each data packet carries an SSRC of the corresponding media signal;
the determining module is used for determining a video data packet and an audio data packet from the acquired data packets based on the PT of the multi-path media signals, and determining a data packet of each audio signal in the multi-path audio signals from the audio data packet based on the transmission parameter corresponding to each audio signal in the multi-path audio signals;
the apparatus also includes means for:
acquiring an SSRC from the RTCP message, and matching the SSRC carried in the data packet of the multi-path media signal with the SSRC carried in the RTCP message to determine the data packet which is the same as the SSRC carried in the RTCP message; performing audio and video synchronization based on the NTP timestamp and the RTP timestamp in the RTCP message and the RTP timestamp in the data packet which is the same as the SSRC carried in the RTCP message;
when the media receiving device communicates with the media collecting device through a Transmission Control Protocol (TCP), the transmission parameters include channel identifiers, and the audio data packets carry the channel identifiers corresponding to the audio signals, the determining module is configured to determine the data packets of each audio signal in the multiple audio signals according to the channel identifiers carried in the audio data packets under the condition that the channel identifiers corresponding to the multiple audio signals are determined to be different based on the transmission parameters corresponding to the multiple audio signals; and under the condition that the channel identifications corresponding to at least two paths of audio signals in the multi-path audio signals are determined to be the same based on the transmission parameters corresponding to the multi-path audio signals, determining a data packet corresponding to each path of audio signals in the multi-path audio signals from the audio data packets based on the PT and the SSRC corresponding to each path of audio signals in the multi-path audio signals.
10. An audio-video monitoring device, applied to a media acquisition device, the device comprising:
a first description information providing module, configured to provide description information of each path of media signal in multiple paths of media signals to a media receiving device, where the description information comprises a number, a payload type PT and a synchronization source SSRC, the multiple paths of media signals comprise a single-path video signal and multiple paths of audio signals, the multiple paths of audio signals are collected by different sound pickups, and the media receiving device is a client;
a transmission parameter providing module, configured to provide, to the media receiving device, a transmission parameter corresponding to each of the multiple audio signals, where the transmission parameter is used to indicate a transmission channel or a transmission port of the corresponding audio signal;
a data packet sending module, configured to send data packets of the multiple paths of media signals to the media receiving device, so that the media receiving device determines a video data packet and an audio data packet from the obtained data packets, and determines a data packet of each path of audio signal in the multiple paths of audio signals from the audio data packet based on a transmission parameter corresponding to each path of audio signal in the multiple paths of audio signals; and send a real-time transport control protocol (RTCP) message to the media receiving device, where the RTCP message carries a Network Time Protocol (NTP) timestamp, a real-time transport protocol (RTP) timestamp and an SSRC, and each data packet carries the SSRC of the media signal to which it belongs;
the media receiving device is further configured to obtain an SSRC from the RTCP message, and match the SSRC carried in the data packet of the multiple paths of media signals with the SSRC carried in the RTCP message to determine a data packet that is the same as the SSRC carried in the RTCP message; performing audio and video synchronization based on the NTP timestamp and the RTP timestamp in the RTCP message and the RTP timestamp in the data packet which is the same as the SSRC carried in the RTCP message;
the media receiving device is further configured to determine, when the media receiving device communicates with the media collecting device through a transmission control protocol TCP, the transmission parameter includes a channel identifier, and when the audio data packet carries a channel identifier corresponding to the audio signal, and it is determined based on the transmission parameter corresponding to the multiple audio signals that the channel identifiers corresponding to the multiple audio signals are different, a data packet of each audio signal in the multiple audio signals according to the channel identifier carried in the audio data packet; and under the condition that the channel identifications corresponding to at least two paths of audio signals in the multi-path audio signals are determined to be the same based on the transmission parameters corresponding to the multi-path audio signals, determining a data packet corresponding to each path of audio signals in the multi-path audio signals from the audio data packets based on the PT and the SSRC corresponding to each path of audio signals in the multi-path audio signals.
11. A media receiving device, wherein the media receiving device is a client, the device comprising:
a description information acquisition module, configured to acquire description information of each path of media signal in multiple paths of media signals from a media acquisition device, where the description information comprises a number, a payload type PT and a synchronization source SSRC, the multiple paths of media signals comprise a single-path video signal and multiple paths of audio signals, and the multiple paths of audio signals are acquired by different sound pickups;
a transmission parameter obtaining module, configured to obtain, from the media acquisition device, a transmission parameter corresponding to each of the multiple audio signals based on the number of the multiple media signals, where the transmission parameter is used to indicate a transmission channel or a transmission port of the corresponding audio signal;
a data packet obtaining module, configured to obtain a data packet of the multiple paths of media signals from the media acquisition device, and receive a real-time transport control protocol RTCP packet sent by the media acquisition device, where the RTCP packet carries a network time protocol NTP timestamp, a real-time transport protocol RTP timestamp, and an SSRC, and each data packet carries an SSRC of the media signal to which the data packet belongs;
the determining module is used for determining a video data packet and an audio data packet from the acquired data packets based on the PT of the multi-path media signals, and determining a data packet of each audio signal in the multi-path audio signals from the audio data packet based on the transmission parameter corresponding to each audio signal in the multi-path audio signals;
the apparatus also includes means for:
acquiring an SSRC from the RTCP message, and matching the SSRC carried in the data packet of the multi-path media signal with the SSRC carried in the RTCP message to determine the data packet which is the same as the SSRC carried in the RTCP message; performing audio and video synchronization based on the NTP timestamp and the RTP timestamp in the RTCP message and the RTP timestamp in the data packet which is the same as the SSRC carried in the RTCP message;
when the media receiving device communicates with the media collecting device through a Transmission Control Protocol (TCP), the transmission parameters include channel identifiers, and the audio data packets carry the channel identifiers corresponding to the audio signals, the determining module is configured to determine the data packets of each audio signal in the multiple audio signals according to the channel identifiers carried in the audio data packets under the condition that the channel identifiers corresponding to the multiple audio signals are determined to be different based on the transmission parameters corresponding to the multiple audio signals; and under the condition that the channel identifications corresponding to at least two paths of audio signals in the multi-path audio signals are determined to be the same based on the transmission parameters corresponding to the multi-path audio signals, determining a data packet corresponding to each path of audio signals in the multi-path audio signals from the audio data packets based on the PT and the SSRC corresponding to each path of audio signals in the multi-path audio signals.
12. A media capturing device, characterized in that the device comprises: a video acquisition module, a plurality of sound pickups and a transmitter;
the video acquisition module is used for acquiring a single-channel video signal;
the plurality of sound pickups are used for collecting multi-path audio signals, and the multi-path audio signals are collected by adopting different sound pickups;
the transmitter is used for providing description information of each media signal in a plurality of media signals to a media receiving device, the description information comprises a number, a payload type PT and a synchronization source SSRC, and the plurality of media signals comprise the single-channel video signal and the plurality of audio signals; providing a transmission parameter corresponding to each audio signal in the multiple audio signals to the media receiving device, where the transmission parameter is used to indicate a transmission channel or a transmission port of the corresponding audio signal; sending data packets of the multiple paths of media signals to the media receiving device, so that the media receiving device determines video data packets and audio data packets from the obtained data packets, and determines data packets of each path of audio signals in the multiple paths of audio signals from the audio data packets based on transmission parameters corresponding to each path of audio signals in the multiple paths of audio signals; sending a real-time transport control protocol (RTCP) message to the media receiving device, wherein the RTCP message carries a Network Time Protocol (NTP) timestamp, a real-time transport protocol (RTP) timestamp and an SSRC, and each data packet carries the SSRC of the media signal to which it belongs;
the media receiving device is further configured to obtain an SSRC from the RTCP message, and match the SSRC carried in the data packet of the multiple paths of media signals with the SSRC carried in the RTCP message to determine a data packet that is the same as the SSRC carried in the RTCP message; performing audio and video synchronization based on the NTP timestamp and the RTP timestamp in the RTCP message and the RTP timestamp in the data packet which is the same as the SSRC carried in the RTCP message, wherein the media receiving equipment is a client;
the media receiving device is further configured to determine, when the media receiving device communicates with the media collecting device through a transmission control protocol TCP, the transmission parameter includes a channel identifier, and when the audio data packet carries a channel identifier corresponding to the audio signal, and it is determined based on the transmission parameter corresponding to the multiple audio signals that the channel identifiers corresponding to the multiple audio signals are different, a data packet of each audio signal in the multiple audio signals according to the channel identifier carried in the audio data packet; and under the condition that the channel identifications corresponding to at least two paths of audio signals in the multi-path audio signals are determined to be the same based on the transmission parameters corresponding to the multi-path audio signals, determining a data packet corresponding to each path of audio signals in the multi-path audio signals from the audio data packets based on the PT and the SSRC corresponding to each path of audio signals in the multi-path audio signals.
13. An audio and video monitoring system, characterized by comprising a media receiving device and a media collecting device;
the media receiving device is used for realizing the steps of the method of any one of the preceding claims 1 to 4;
the media capturing device is adapted to implement the steps of the method of any of the preceding claims 5 to 8.
14. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of the claims 1 to 4.
15. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of the preceding claims 5 to 8.
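The packet-sorting logic recited in claims 1 to 3 can be sketched as follows, purely as an illustration: packets are first split into video and audio by PT; audio packets are then routed by the transmission parameter (channel identifier over TCP, port number over UDP) when every audio signal has a distinct one, and by the (PT, SSRC) pair when at least two audio signals share one. The types `StreamInfo` and `Packet`, the helper names, and all field values below are assumptions made for this sketch and are not defined by the claims.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class StreamInfo:
    """Description information plus transmission parameter of one media signal."""
    number: int      # number of the media signal
    kind: str        # "video" or "audio"
    pt: int          # payload type
    ssrc: int        # synchronization source
    transport: int   # channel identifier (TCP) or port number (UDP)


@dataclass(frozen=True)
class Packet:
    pt: int
    ssrc: int
    transport: int
    payload: bytes = b""


def build_audio_router(streams: List[StreamInfo]) -> Callable[[Packet], Optional[int]]:
    """Return a function mapping an audio packet to its signal number."""
    audio = [s for s in streams if s.kind == "audio"]
    if len({s.transport for s in audio}) == len(audio):
        # All transmission parameters differ: route by channel/port alone.
        by_transport = {s.transport: s.number for s in audio}
        return lambda p: by_transport.get(p.transport)
    # At least two audio signals share a channel/port: fall back to PT + SSRC.
    by_pt_ssrc = {(s.pt, s.ssrc): s.number for s in audio}
    return lambda p: by_pt_ssrc.get((p.pt, p.ssrc))


def demux(streams: List[StreamInfo], packets: List[Packet]) -> Dict[int, List[Packet]]:
    """Group packets by media signal number."""
    video = next(s for s in streams if s.kind == "video")  # single video signal
    audio_pts = {s.pt for s in streams if s.kind == "audio"}
    route = build_audio_router(streams)
    out: Dict[int, List[Packet]] = {s.number: [] for s in streams}
    for p in packets:
        if p.pt == video.pt:
            out[video.number].append(p)
        elif p.pt in audio_pts:
            number = route(p)
            if number is not None:
                out[number].append(p)
        # Packets with an unknown PT are ignored in this sketch.
    return out
```

The router is built once from the description information, so the per-packet work is a single dictionary lookup in either mode.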
CN202110797686.1A 2021-07-14 2021-07-14 Audio and video monitoring method, device, equipment, storage medium and system Active CN113542688B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110797686.1A CN113542688B (en) 2021-07-14 2021-07-14 Audio and video monitoring method, device, equipment, storage medium and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110797686.1A CN113542688B (en) 2021-07-14 2021-07-14 Audio and video monitoring method, device, equipment, storage medium and system

Publications (2)

Publication Number Publication Date
CN113542688A CN113542688A (en) 2021-10-22
CN113542688B true CN113542688B (en) 2023-03-28

Family

ID=78099200

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110797686.1A Active CN113542688B (en) 2021-07-14 2021-07-14 Audio and video monitoring method, device, equipment, storage medium and system

Country Status (1)

Country Link
CN (1) CN113542688B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102868937A (en) * 2011-07-08 2013-01-09 中兴通讯股份有限公司 Method and system for transmitting multimedia data
CN104079870A (en) * 2013-03-29 2014-10-01 杭州海康威视数字技术股份有限公司 Video monitoring method and system for single-channel video and multiple-channel audio frequency
EP3534609A1 (en) * 2018-03-02 2019-09-04 Thomson Licensing Methods for processing audiovisual streams and corresponding devices, electronic assembly, system, computer readable program products and computer readable storage media

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100531398C (en) * 2006-08-23 2009-08-19 中兴通讯股份有限公司 Method for realizing multiple audio tracks in mobile multimedia broadcast system
EP2292013B1 (en) * 2008-06-11 2013-12-04 Koninklijke Philips N.V. Synchronization of media stream components
CN101489090B (en) * 2009-02-20 2014-01-08 华为终端有限公司 Method, apparatus and system for multipath media stream transmission and reception
CN103607663A (en) * 2013-11-27 2014-02-26 福建星网锐捷网络有限公司 Identification method, device and equipment for multimedia streams
CN110086978A (en) * 2018-01-25 2019-08-02 浙江宇视科技有限公司 MCVF multichannel voice frequency transmission method, device and terminal device
CN111385625B (en) * 2018-12-29 2021-12-10 成都鼎桥通信技术有限公司 Non-IP data transmission synchronization method and device
JP7247707B2 (en) * 2019-03-28 2023-03-29 日本電気株式会社 Transmission node, broadcasting station system, control node and transmission control method
CN113114688B (en) * 2021-04-15 2023-03-24 杭州网易智企科技有限公司 Multimedia conference management method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN113542688A (en) 2021-10-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant