WO2021174879A1 - AI video call quality analysis method, apparatus, computer device, and storage medium - Google Patents

AI video call quality analysis method, apparatus, computer device, and storage medium

Info

Publication number
WO2021174879A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
time stamp
database
rtp
video call
Prior art date
Application number
PCT/CN2020/124904
Other languages
English (en)
French (fr)
Inventor
王锁平
周登宇
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021174879A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00 Diagnosis, testing or measuring for television systems or their details
    • H04N17/04 Diagnosis, testing or measuring for television systems or their details for receivers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2477 Temporal data queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • H04L43/106 Active monitoring, e.g. heartbeat, ping or trace-route using time related information in packets, e.g. by adding timestamps
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/1066 Session management
    • H04L65/1101 Session protocols
    • H04L65/1104 Session initiation protocol [SIP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/65 Network streaming protocols, e.g. real-time transport protocol [RTP] or real-time control protocol [RTCP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering

Definitions

  • This application relates to the technical field of remote consultation calls for digital medical treatment, and in particular to an AI video call quality analysis method, device, and computer-readable storage medium.
  • Artificial Intelligence is a new technological science, a science that studies related theories and methods of simulating, extending, and expanding human intelligence, and belongs to a branch of computer science.
  • Various AI technologies have become mature, and AI video calls have gradually replaced manual labor, and have been applied in scenarios such as remote face-to-face interviews, video return visits, and remote account opening.
  • The inventor found that, during a remote consultation call, when network fluctuation or network jitter degrades the audio and video, the AI cannot accurately and intelligently identify the resulting audio and video quality, and the video recording cannot be retained as supporting evidence.
  • This application provides an AI video call quality analysis method, device, electronic equipment, and computer readable storage medium to solve the problem that the quality of audio and video cannot be accurately detected during remote consultation video calls in the prior art.
  • an AI video call quality analysis method includes:
  • Extract the second RTP data corresponding to the port information of the SIP message from the database, compare the first RTP data with the second RTP data, and if the comparison is successful, identify the SIP message and calculate the time stamp difference and the packet loss rate of the first RTP data;
  • the feature code corresponding to the first RTP data is recorded in the database.
  • this application also provides an AI video call quality analysis device, the device includes:
  • an acquisition module, used to acquire all the data when the device is in a video call;
  • a parsing module, used to extract the SIP message, its corresponding port information, and the first RTP data from the data;
  • a comparison analysis module, used to extract the second RTP data corresponding to the port information of the SIP message from the database, compare the first RTP data with the second RTP data, and if the comparison is successful, identify the SIP message and calculate the time stamp difference and the packet loss rate of the first RTP data;
  • a judging module configured to, when it is recognized that the SIP message is a call end instruction, perform abnormal judgment on the video call based on the time stamp difference and the packet loss rate;
  • the feature code recording module is used to record the feature code corresponding to the first RTP data in the database according to the judgment result.
  • The embodiments of the present application also provide a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when the processor executes the computer-readable instructions, the following steps are implemented:
  • Extract the second RTP data corresponding to the port information of the SIP message from the database, compare the first RTP data with the second RTP data, and if the comparison is successful, identify the SIP message and calculate the time stamp difference and the packet loss rate of the first RTP data;
  • the feature code corresponding to the first RTP data is recorded in the database.
  • The embodiments of the present application also provide a computer-readable storage medium storing computer-readable instructions; when the computer-readable instructions are executed by a processor, the following steps of the AI video call quality analysis method are implemented:
  • Extract the second RTP data corresponding to the port information of the SIP message from the database, compare the first RTP data with the second RTP data, and if the comparison is successful, identify the SIP message and calculate the time stamp difference and the packet loss rate of the first RTP data;
  • the feature code corresponding to the first RTP data is recorded in the database.
  • the AI video call quality analysis method, device, computer equipment, and storage medium provided according to the embodiments of the present application have at least the following beneficial effects:
  • All the data during the video call is acquired, and the SIP message, its corresponding port information, and the first RTP data are extracted from it. The second RTP data corresponding to the port of the SIP message is first extracted from the database and compared with the first RTP data to ensure port consistency, and then the time stamp difference and packet loss rate of the first RTP data are calculated. When the call end instruction in the SIP message is obtained, the time stamp difference and the packet loss rate of the first RTP data are evaluated to judge whether the data is abnormal, and the feature code corresponding to the first RTP data is recorded in the database.
  • FIG. 1 is a schematic diagram of the encapsulation structure of the RTP data packet header provided by an embodiment of the application.
  • FIG. 2 is a schematic flowchart of an AI video call quality analysis method provided by an embodiment of this application.
  • FIG. 3 is a schematic diagram of modules of an AI video call quality analysis device provided by an embodiment of the application.
  • FIG. 4 is a schematic structural diagram of a computer device according to an embodiment of the application.
  • RTP: Real-time Transport Protocol.
  • RTP provides end-to-end transport services with real-time characteristics for data such as interactive games, images, voice, fax, and network conferences, over multicast or unicast network services.
  • SIP: Session Initiation Protocol.
  • SMTP Simple Mail Transfer Protocol
  • HTTP Hypertext Transfer Protocol
  • The header fields that must be included are: CALL_ID, a unique identifier used to distinguish different sessions; CSeq, a sequence number used to distinguish request methods within the same session, whose integer value is incremented for each new request in a session so that the values stay in order; FROM, indicating the source of the request; TO, indicating the recipient of the request; and MAX-Forwards, limiting the number of hops, that is, the maximum number of times the request may be forwarded.
  • FIG. 1 is a schematic diagram of an encapsulation structure of an RTP header provided by an embodiment of the present application.
  • V: the version number of the RTP protocol, occupying 2 bits.
  • CSRC: Contributing Source.
  • M: the marker bit, whose interpretation is defined by the profile. It allows important events to be marked in the packet stream and occupies 1 bit. Its meaning depends on the payload: for video, it marks the end of a frame; for audio, it marks the start of a conversation.
  • PT (Payload Type): identifies the type of the RTP payload and indicates the encoding used for the audio or video. It is determined by the sender and occupies 7 bits.
  • SN (Sequence Number): identifies the sequence number of the RTP message sent by the sender. Each time a message is sent, the sequence number increases by 1; the initial value is randomly generated. It can be used to detect lost packets and to sort data packets, and occupies 16 bits.
  • Timestamp: records the sampling time of the first byte of the data in the RTP packet.
  • The receiver can use the timestamp to determine whether the arrival of the data is affected by delay jitter in the stream.
  • The timestamp is initialized to an initial value and increases with time. It occupies 32 bits.
  • SSRC (Synchronization Source): the synchronization source identifier, used to identify the synchronization source. The identifier is randomly generated, and no two synchronization sources participating in the same video conference may have the same SSRC. It occupies 32 bits.
  • CSRC list (Contributing Source list): contains 0 to 15 CSRC identifiers, used to identify all contributors included in the RTP packet payload, so that the receiving end can correctly indicate the identity of the parties in the conversation.
  • Each CSRC identifier occupies 32 bits.
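  • The header layout described above can be illustrated with a minimal parsing sketch. It is not taken from the application; it assumes the standard RTP fixed header of RFC 3550 (which also defines the padding, extension, and CSRC-count bits that sit between V and M), and the names `RtpHeader` and `parse_rtp_header` are hypothetical.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int          # V, 2 bits
    padding: int          # P, 1 bit (RFC 3550)
    extension: int        # X, 1 bit (RFC 3550)
    csrc_count: int       # CC, 4 bits: number of CSRC identifiers that follow
    marker: int           # M, 1 bit
    payload_type: int     # PT, 7 bits
    sequence_number: int  # SN, 16 bits
    timestamp: int        # 32 bits
    ssrc: int             # 32 bits
    csrc_list: Tuple[int, ...]  # 0 to 15 identifiers, 32 bits each


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed 12-byte RTP header plus the optional CSRC list."""
    if len(packet) < 12:
        raise ValueError("an RTP fixed header needs at least 12 bytes")
    # Network byte order: two flag bytes, 16-bit SN, 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("packet is too short for its CSRC list")
    csrcs = struct.unpack("!%dI" % cc, packet[12:end])
    return RtpHeader(
        version=b0 >> 6,
        padding=(b0 >> 5) & 1,
        extension=(b0 >> 4) & 1,
        csrc_count=cc,
        marker=b1 >> 7,
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc_list=csrcs,
    )
```

  • Only the sequence number and timestamp are needed for the later loss and delta analysis; the other fields are decoded here simply to mirror the field list.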
  • This application provides an AI video call quality analysis method.
  • FIG. 2 is a schematic flowchart of the AI video call quality analysis method provided by an embodiment of this application.
  • the AI video call quality analysis method includes:
  • The data sent by the device is mirrored into multiple copies by an optical splitter or a traffic splitter. While one copy is sent to the device at the other end, another is sent to the processing device, which saves it in the database; in this way the mirrored data sent to the device is obtained.
  • The devices are the devices at both ends of the video call, that is, the video call initiator and the video call receiver; in this application, these are the customer's device and the AI-end device. The data from both ends of the video call is analyzed at the same time. Because the quality analysis method is the same at both ends, one end is taken as an example in the description below.
  • the devices include, but are not limited to, smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
  • the processing device includes, but is not limited to, an electronic device with processing capabilities such as a server.
  • The optical splitter is an optical device that distributes optical signals proportionally: a large proportion of the optical signal goes to the service link, and a small proportion goes to the bypass monitoring link, which in this case is the processing device.
  • The traffic splitter can split traffic according to a policy and guarantees same-source, same-destination delivery under multi-port load.
  • Commonly used devices are 2.5G, 10G, and 40G splitters and the test access port (TAP), a hardware device that is plugged directly into the network cable and sends a copy of the network traffic to other devices.
  • After the processing device obtains all the data of the video call, it extracts the SIP message, its corresponding port information, and the first RTP data.
  • S2 specifically includes:
  • The INVITE message in the SIP message is extracted from the data, and a fixed field is obtained from the INVITE message and recorded in the database and memory.
  • The fixed field is the CALL_ID field in the INVITE message.
  • A folder is created with the CALL_ID field as its name; this field is the unique identifier of a call.
  • The first RTP data packet in the data is extracted as the first RTP data.
  • All the first RTP data is parsed, the data packets it carries are extracted in turn and arranged in order of their RTP sequence numbers, and the SIP message and its port information are extracted; the port information is fixed within a video call.
  • the data packets include audio packets and video packets.
  • All the first RTP data is parsed, the audio packets and video packets it carries are extracted in turn, and they are sorted in order of their sequence numbers.
  • all data during the video call can also be stored in a blockchain node.
  • the related information of a video call is saved in a folder named CALL_ID and the folder is saved in the database and memory, which ensures the uniqueness of the video call data and avoids data disorder.
  • The port information is used to find the corresponding second RTP data in the database, and the first RTP data is compared with the second RTP data.
  • The ports are compared because the AI video call quality analysis method of this application does not analyze only one video call; it can analyze the quality of multiple video calls at the same time, for example dozens of them. If the comparison is unsuccessful, the first RTP data extracted earlier is wrong, and continuing the analysis and detection would mix up the data. Analyzing and detecting the data only after a successful comparison avoids this problem.
  • the identification of SIP messages is mainly to identify their status codes and BYE methods.
  • The status codes are divided into: 1XX, provisional response, indicating that the request message is being processed; 2XX, session success, indicating that the request has been successfully received and fully understood; 3XX, redirection, indicating that further action is needed to complete the request; 4XX, request failure, indicating that the request message contains a syntax error or the server cannot complete the client's request; 5XX, server error, indicating that the server cannot complete an apparently valid request; 6XX, global failure, indicating that no server can complete the request.
  • the BYE method means terminating the session connection.
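  • The identification of status codes and the BYE method can be sketched as follows. This is an illustrative sketch, not the applicant's implementation: it assumes text-form SIP messages as defined in RFC 3261, where the header that this description calls CALL_ID is spelled `Call-ID` on the wire and `CSeq` carries the method name after the sequence number. The function name `summarize_sip` and the `ends_call` flag are hypothetical.

```python
from typing import Any, Dict

# Status-code classes as listed in the description above.
STATUS_CLASSES = {
    1: "provisional",
    2: "success",
    3: "redirection",
    4: "request failure",
    5: "server error",
    6: "global failure",
}


def summarize_sip(message: str) -> Dict[str, Any]:
    """Extract the method or status code, the Call-ID, and an end-of-call flag."""
    lines = message.replace("\r\n", "\n").split("\n")
    start = lines[0].strip()
    if not start:
        raise ValueError("empty SIP start line")
    info: Dict[str, Any] = {
        "method": None, "status": None, "status_class": None, "call_id": None,
    }
    if start.startswith("SIP/2.0 "):
        # Response: "SIP/2.0 200 OK"
        info["status"] = int(start.split()[1])
        info["status_class"] = STATUS_CLASSES.get(info["status"] // 100)
    else:
        # Request: "BYE sip:user@host SIP/2.0"
        info["method"] = start.split()[0]
    cseq_method = None
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        key = name.strip().lower()
        if key == "call-id":
            info["call_id"] = value.strip()
        elif key == "cseq":
            parts = value.split()
            cseq_method = parts[-1] if parts else None
    # End of call: a BYE request, or the 200 OK that confirms a BYE.
    info["ends_call"] = info["method"] == "BYE" or (
        info["status"] == 200 and cseq_method == "BYE"
    )
    return info
```

  • Treating the 200 OK whose CSeq names BYE as the end confirmation is one plausible reading of the call end instruction; other ways of pairing the BYE with its confirmation are possible.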
  • The data packet information in the first RTP data is used to calculate the time stamp difference between adjacent data packets, and the sequence numbers missing over the entire video call are used to obtain the packet loss rate of the data packets.
  • the data packet includes an audio packet and/or a video packet
  • S3 specifically includes:
  • Analyze the SIP message, that is, analyze and identify the status code and the BYE method in the SIP message; a status code of 200 OK indicates that the session is successful.
  • The audio packets in the RTP data are analyzed and detected first: the sequence numbers and timestamps of the audio packets carried in the RTP data are recorded in turn, the audio packets are arranged in ascending order of sequence number, and the differences in sequence number and timestamp between each audio packet and the previous audio packet are computed. The timestamp difference is recorded as delta1. If the sequence number difference between two adjacent audio packets is not 1, the corresponding difference is added to the total number of lost audio packets; in this way the total number of audio packets lost in the video call is counted, so that the packet loss rate can be calculated from the lost audio packets.
  • The sequence number and time stamp of each video packet in the first RTP data are recorded in turn, and the differences in sequence number and time stamp between each video packet and the previous video packet are computed to obtain the time stamp difference of two adjacent video packets, recorded as delta2;
  • The video packets in the RTP data are analyzed and detected in the same way: the sequence numbers and timestamps of the video packets carried in the RTP data are recorded in turn, the video packets are arranged in ascending order of sequence number, and the differences in sequence number and timestamp between each video packet and the previous video packet are computed. The timestamp difference is recorded as delta2. If the sequence number difference between two adjacent video packets is not 1, the corresponding difference is added to the total number of lost video packets; in this way the total number of video packets lost in the video call is counted, so that the packet loss rate can be calculated from the lost video packets.
  • the processing sequence of the video packets and the audio packets in the RTP data is in no particular order, and can be performed at the same time.
  • the above-mentioned time-stamp difference analysis and serial number difference analysis are also in no particular order.
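  • The delta and loss analysis above can be sketched as below, applied once to the audio packets and once to the video packets. Several points are assumptions rather than statements from the application: a gap of n in sequence numbers is counted as n - 1 missing packets, the packet loss rate is taken as lost / (received + lost), and 16-bit sequence-number wraparound is ignored. The function name `analyze_stream` is hypothetical.

```python
from typing import Iterable, List, Tuple


def analyze_stream(packets: Iterable[Tuple[int, int]]) -> Tuple[List[int], int, float]:
    """Return (timestamp deltas, lost packet count, packet loss rate).

    Each packet is a (sequence_number, timestamp) pair.
    """
    ordered = sorted(packets)  # ascending sequence number
    deltas: List[int] = []
    lost = 0
    for (prev_seq, prev_ts), (seq, ts) in zip(ordered, ordered[1:]):
        deltas.append(ts - prev_ts)  # delta1 for audio, delta2 for video
        gap = seq - prev_seq
        if gap > 1:
            lost += gap - 1  # assumption: a gap of n means n - 1 packets missing
    expected = len(ordered) + lost
    loss_rate = lost / expected if expected else 0.0
    return deltas, lost, loss_rate
```

  • The largest entry of the returned delta list is the maximum time stamp difference that the later judgment compares against its preset value.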
  • the maximum time stamp difference and the packet loss rate of the video call are calculated to realize the quality analysis of the video call.
  • Timing periods of a preset number of seconds are set in sequence;
  • The time stamp of the data packet with the smallest sequence number is taken as the start, that is, the earliest transmitted data packet is found and used as the starting point;
  • Timing periods of the preset number of seconds are set in sequence from that start, and the number of data packets sent in each timing period is recorded.
  • The number of audio packets or video packets sent per second is obtained by dividing the number sent in a period by the preset number of seconds.
  • The preset number of seconds is 1, which is used to determine the overall quality of the video call. The result is recorded in the database and memory.
  • the above method is used to calculate the number of video packets or audio packets sent per second of the video call, so as to further realize the quality analysis of the video call.
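  • A minimal sketch of the per-period counting follows. It assumes the time stamps are RTP timestamps in clock ticks, so a `clock_rate` parameter (ticks per second, for example 8000 for narrowband audio) is needed to convert them to seconds; that parameter and the function name `packets_per_second` are hypothetical and not part of the application.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def packets_per_second(
    packets: Iterable[Tuple[int, int]],
    clock_rate: int,
    period_seconds: int = 1,
) -> Dict[int, float]:
    """Map each timing-period index to the packets sent per second in it.

    Each packet is a (sequence_number, timestamp) pair; periods that contain
    no packets are simply absent from the result.
    """
    ordered = sorted(packets)
    if not ordered:
        return {}
    start_ts = ordered[0][1]  # time stamp of the smallest sequence number
    ticks_per_period = clock_rate * period_seconds
    counts = Counter((ts - start_ts) // ticks_per_period for _, ts in ordered)
    return {period: n / period_seconds for period, n in sorted(counts.items())}
```

  • With a period of 1 second, as in the embodiment, the division is a no-op and each value is simply the packet count of that second.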
  • While the RTP data is being analyzed and detected, the SIP messages are also analyzed and detected in sequence.
  • When the call end instruction is recognized in the SIP message, the analysis and detection of the audio packets and video packets in the first RTP data ends, and judgment of the video call based on the time stamp difference and the packet loss rate begins; the call end instruction includes a BYE method and a 200 OK message confirming the end.
  • S4 specifically includes:
  • The analysis and detection results are stored in the database and memory as they are produced. Therefore, when the call end instruction is parsed, the previously stored time stamp differences, packet loss rates, and numbers of packets sent per second of the audio packets and video packets are extracted directly from the database;
  • The time stamp difference, the packet loss rate, and the number sent per second are compared with the preset requirements to determine whether the video call is abnormal. This step yields the abnormal conditions found by the quality analysis.
  • The time stamp difference, the packet loss rate, and the number of data packets sent per second each correspond to a preset value. Because the data packets include audio packets and video packets, the preset values for the time stamp difference, packet loss rate, and number of packets sent per second also differ between audio packets and video packets. As long as an audio packet or a video packet fails to meet any one of these preset values, the video call can be determined to be abnormal.
  • The preset values corresponding to the time stamp difference, the packet loss rate, and the number sent per second of audio packets and video packets are described in detail below.
  • Determining that the video call data is abnormal includes:
  • The first preset sending standard is that sending 50 to 60 audio packets per second is normal. If fewer than 50 or more than 60 audio packets are sent per second in the video call, the data is abnormal, specifically the audio in the video call;
  • The second preset sending standard is that sending 45 video packets per second, with an error within 15 video packets, is normal. If fewer than 30 or more than 60 video packets are sent per second in the video call, the data is abnormal, specifically the video in the video call;
  • The first preset value is 2000. When the timestamp difference of audio packets or video packets is greater than 2000, the audio and video recording is abnormal;
  • The second preset value is 1000.
  • The third preset value is 3%.
  • The fourth preset value is 5%.
  • The fifth preset value is 4%. Based on the packet loss rates of audio packets and video packets stored in the database, if the packet loss rate of the audio packets is greater than 5% and the packet loss rate of the video packets is greater than 4%, the data is abnormal, specifically the audio and video stream;
  • The first preset value is greater than the second preset value.
  • The third preset value is less than the fourth and fifth preset values, but the specific values are not limited; this is only the preferred embodiment. In other solutions, the third preset value may be set greater than the fourth or fifth preset value.
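  • The comparison against preset values might be sketched as follows. Only the thresholds whose use is stated above are encoded (the per-second sending ranges, the 2000 limit on the time stamp difference, and the 5% and 4% loss limits); the second and third preset values are left out because the text does not say how they are applied. Each stream is checked independently, following the statement that any single non-compliance makes the call abnormal. The names `Limits`, `AUDIO_LIMITS`, `VIDEO_LIMITS`, and `find_anomalies` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Limits:
    min_pps: float   # fewest packets per second considered normal
    max_pps: float   # most packets per second considered normal
    max_delta: int   # largest acceptable time stamp difference
    max_loss: float  # largest acceptable packet loss rate (fraction)


AUDIO_LIMITS = Limits(min_pps=50, max_pps=60, max_delta=2000, max_loss=0.05)
VIDEO_LIMITS = Limits(min_pps=30, max_pps=60, max_delta=2000, max_loss=0.04)


def find_anomalies(pps: float, max_delta: int, loss_rate: float,
                   limits: Limits) -> List[str]:
    """Return the names of the checks that one stream fails (empty if normal)."""
    problems: List[str] = []
    if not limits.min_pps <= pps <= limits.max_pps:
        problems.append("send_rate")
    if max_delta > limits.max_delta:
        problems.append("delta")
    if loss_rate > limits.max_loss:
        problems.append("packet_loss")
    return problems
```

  • A call would then be treated as abnormal if the list returned for either its audio stream or its video stream is non-empty.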
  • The judgment result is the normal or abnormal condition determined above.
  • The feature code corresponding to that normal or abnormal condition, such as the RTP stream being under-sent or over-sent, is recorded in the database,
  • so that a record of the abnormal or normal condition is kept.
  • The matching feature code is recorded for both abnormal and normal results: if the judgment result is normal, the matching feature code is 2; if the judgment result is abnormal, the matching feature code is 7.
  • S5 specifically includes:
  • If the judgment result is abnormal, the number of anomalies in the first RTP data is added to the total anomaly count in the database, and the time stamp difference, packet loss rate, or number of data packets sent per second that produced the abnormal result is used to determine the anomaly category of the first RTP data;
  • The preset feature code corresponding to the first RTP data is composed, encoded, and recorded in the database.
  • If the first RTP data is normal, the matching preset feature code, meaning that the RTP stream is normal, is acquired directly.
  • The first RTP data is used to determine the anomaly category, as follows:
  • the feature code for abnormal sending (the RTP stream being under-sent or over-sent) is 3;
  • the feature code for RTP packet loss is 4;
  • the feature code for an abnormal delta value is 5;
  • the feature code for unsynthesized audio and video is 6.
  • The preset feature code has the form [identification bits (2 positions)] + [encoding bits (4 positions)], for example VE0000. VE stands for videoerror, but these positions have no practical meaning and can be replaced by any two English letters. The encoding bits represent, in order, SRC_AUDIO (audio condition of the client), SRC_VIDEO (video condition of the client), DST_AUDIO (audio condition of the AI end), and DST_VIDEO (video condition of the AI end), that is, the condition of the audio and video data sent by the client and of the audio and video data sent by the AI end.
  • Abnormal DELTA, RTP packet loss, and audio and video not synthesized are displayed in priority order from high to low.
  • Only the highest-priority feature code is displayed; the feature codes that are not shown and their corresponding abnormal conditions are also saved in the database. For example, if an audio packet in a video call has both RTP stream over-transmission and RTP packet loss, its feature code shows only 3, but the abnormal conditions of RTP stream over-transmission and RTP packet loss and their corresponding feature codes are saved in the database. Finally, the above code is saved in the database.
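  • Composing the displayed feature code could look like the sketch below. Two points are assumptions: the digit shown for a stream with no anomaly is taken as 0, following the VE0000 example, and the sending anomaly (3) is ranked above abnormal DELTA (5), RTP packet loss (4), and unsynthesized audio and video (6), which is inferred from the example in which over-transmission plus packet loss displays 3. The names `STREAMS`, `PRIORITY`, and `build_feature_code` are hypothetical.

```python
from typing import Dict, Iterable

# Order of the four encoding positions, as given in the description.
STREAMS = ("SRC_AUDIO", "SRC_VIDEO", "DST_AUDIO", "DST_VIDEO")

# Display priority from high to low; the position of 3 is an assumption.
PRIORITY = (3, 5, 4, 6)


def build_feature_code(anomalies: Dict[str, Iterable[int]],
                       prefix: str = "VE",
                       normal_digit: int = 0) -> str:
    """Return the two-letter prefix followed by one digit per stream.

    `anomalies` maps a stream name to every anomaly code found for it; only
    the highest-priority code of each stream is displayed.
    """
    if len(prefix) != 2:
        raise ValueError("the identification part must be two characters")
    digits = []
    for stream in STREAMS:
        found = set(anomalies.get(stream, ()))
        shown = next((code for code in PRIORITY if code in found), normal_digit)
        digits.append(str(shown))
    return prefix + "".join(digits)
```

  • The full `anomalies` mapping, including the codes that are not displayed, is what would be stored alongside the displayed code.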
  • the codes in the database are extracted in real time, and the corresponding abnormal conditions are summarized and displayed. You can see the data of each category, and you can understand the call quality of the video call in time.
  • the judgment of the first RTP data is a detailed judgment, and the specific abnormal type of the first RTP data is obtained.
  • the abnormal data of the video call is classified, so that the abnormal data can be displayed more clearly, and the subsequent abnormal data processing is convenient.
  • the data packet includes an audio packet and/or a video packet.
  • the data packet includes an audio packet and a video packet. It can also include only audio packets or video packets.
  • All the data during the video call is acquired, and the SIP message, its corresponding port information, and the first RTP data are extracted from it. The second RTP data corresponding to the port of the SIP message is first extracted from the database and compared with the first RTP data to ensure port consistency, and then the time stamp difference and packet loss rate of the first RTP data are calculated. When the call end instruction in the SIP message is obtained, the time stamp difference and the packet loss rate of the first RTP data are evaluated to judge whether the data is abnormal, and the feature code corresponding to the first RTP data is recorded in the database.
  • FIG. 3 is a functional block diagram of the AI video call quality analysis device of the present application.
  • the AI video call quality analysis apparatus 100 described in this application can be installed in an electronic device.
  • The AI video call quality analysis device 100 may include an acquisition module 101, a parsing module 102, a comparison analysis module 103, a judgment module 104, and a feature code recording module 105.
  • A module in the present application, which can also be called a unit, refers to a series of computer-readable instruction segments that can be executed by the processor of an electronic device to complete a fixed function, and that are stored in the memory of the electronic device.
  • The functions of each module/unit are as follows:
  • The data packets include audio packets and video packets.
  • The acquisition module 101 is used to obtain all data when the device is in a video call.
  • The data sent by the device is mirrored into multiple copies by an optical splitter or a traffic splitter. While one copy is sent to the device at the other end, another is sent to the processing device, which saves it in the database; in this way the mirrored data sent to the device is obtained.
  • The devices are the devices at both ends of the video call, that is, the video call initiator and the video call receiver; in this application, these are the customer's device and the AI-end device. The data from both ends of the video call is analyzed at the same time. Because the quality analysis method is the same at both ends, one end is taken as an example in the description below.
  • the devices include, but are not limited to, smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
  • the processing device includes, but is not limited to, an electronic device with processing capabilities such as a server.
  • the optical splitter uses optical components to distribute an optical signal proportionally. A large proportion of the optical signal goes to the service link, and a small proportion goes to the bypass monitoring link, which in this case is the processing device.
  • the traffic splitter can split traffic according to a policy and, under multi-port load, guarantee that traffic with the same source and the same destination is kept together.
  • commonly used devices are 2.5G, 10G, and 40G traffic splitters and the test access port (TAP), a hardware device that is inserted directly into the network cable and sends a copy of the network traffic to other devices.
  • all data during the video call can also be stored in a node of a blockchain.
  • the parsing module 102 is used to extract the SIP message and its corresponding port information and the first RTP data in the data.
  • the parsing module 102 is configured to extract the SIP message and its corresponding port information and the first RTP data after the video call ends, after the processing device obtains all the data packets of the video call.
  • the comparison analysis module 103 is configured to extract from the database the second RTP data corresponding to the port information of the SIP message, compare the first RTP data with the second RTP data, and, if the comparison succeeds, identify the SIP message and calculate the time stamp difference and the packet loss rate of the first RTP data.
  • the comparison analysis module 103 uses the port information to find the corresponding second RTP data in the database, and compares the first RTP data with the second RTP data.
  • the ports are compared because the AI video call quality analysis method of this application does not analyze only one video call at a time; it analyzes the quality of multiple video calls simultaneously, for example dozens of calls. An unsuccessful comparison therefore means that the first RTP data extracted earlier is wrong, and continuing the analysis would mix up the data. Analyzing the data only after a successful comparison avoids this problem.
  • identifying a SIP message mainly means identifying its status code and the BYE method.
  • the status codes are divided into: 1XX, provisional response, indicating that the request is being processed; 2XX, success, indicating that the request was successfully received and fully understood; 3XX, redirection, indicating that further action is needed to complete the request; 4XX, request failure, indicating that the request contains a syntax error or that the server cannot fulfil the client's request; 5XX, server error, indicating that the server failed to fulfil an apparently valid request; 6XX, global failure, indicating that no server can fulfil the request.
  • the BYE method terminates the session.
  • the packet information in the first RTP data is obtained to calculate the time stamp difference between adjacent data packets, and the gaps in the sequence numbers over the entire video call are obtained to calculate the packet loss rate of the data packets.
  • the judgment module 104 is configured to, when the SIP message is identified as a call end instruction, perform an abnormality judgment on the video call based on the time stamp difference and the packet loss rate.
  • while the judgment module 104 analyzes and inspects the RTP data, it also analyzes and identifies the SIP messages in sequence. When the call end instruction is identified in the SIP messages, the analysis of the audio packets and video packets in the RTP data is ended, and the judgment of the video call based on the time stamp difference and the packet loss rate begins; the call end instruction includes a BYE message and the 200 OK message that confirms the end of the call.
  • the feature code recording module 105 is configured to record the feature code corresponding to the first RTP data in the database according to the judgment result.
  • according to the judgment result, that is, whether the above judgment found the call normal or abnormal, the feature code recording module 105 records in the database the feature code corresponding to the normal condition or to the abnormal condition (for example, too few or too many RTP packets being sent).
  • in this way both abnormal and normal conditions are recorded in the database. Based on the rough judgment above, the matching normal or abnormal feature code is recorded: if the judgment result is normal, feature code 2 is matched; if the judgment result is abnormal, feature code 7 is matched.
  • the apparatus uses the acquisition module, the parsing module, the comparison analysis module, the judgment module, and the feature code recording module together to accurately analyze and detect abnormal conditions in remote consultation video calls.
  • statistics on the abnormal data and abnormal types are compiled to facilitate rapid handling of video call abnormalities afterwards.
  • FIG. 4 is a block diagram of the basic structure of the computer device in this embodiment.
  • the computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are communicatively connected to each other via a system bus. It should be pointed out that the figure only shows the computer device 4 with components 41-43, but it should be understood that not all of the components shown are required, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions.
  • its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
  • the computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
  • the memory 41 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, and the like.
  • the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or memory of the computer device 4.
  • the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk equipped on the computer device 4, a smart memory card (Smart Media Card, SMC), and a secure digital (Secure Digital, SD) card, Flash Card, etc.
  • the memory 41 may also include both the internal storage unit of the computer device 4 and its external storage device.
  • the memory 41 is generally used to store an operating system and various application software installed in the computer device 4, such as computer-readable instructions for an AI video call quality analysis method.
  • the memory 41 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 42 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments.
  • the processor 42 is generally used to control the overall operation of the computer device 4.
  • the processor 42 is configured to run the computer-readable instructions stored in the memory 41 or to process data, for example to run the computer-readable instructions of the AI video call quality analysis method.
  • the network interface 43 may include a wireless network interface or a wired network interface, and the network interface 43 is generally used to establish a communication connection between the computer device 4 and other electronic devices.
  • when the processor executes the computer-readable instructions stored in the memory, the steps of the AI video call quality analysis method of the above embodiment are implemented: all data of the video call is acquired, and the SIP message, its corresponding port information, and the first RTP data are extracted from that data.
  • the second RTP data corresponding to the port of the SIP message is first extracted from the database and compared with the first RTP data to ensure that the ports are consistent, and the time stamp difference and the packet loss rate of the first RTP data are then calculated.
  • when the call end instruction in the SIP message is obtained, the time stamp difference and the packet loss rate of the first RTP data are evaluated to determine whether the data is abnormal, and the feature code corresponding to the first RTP data is recorded in the database.
  • through further analysis of the abnormal data, this application also compiles statistics on the abnormal data and abnormal types, which facilitates rapid handling of video call abnormalities afterwards.
  • the present application also provides another implementation, namely a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor so that the at least one processor executes the steps of the AI video call quality analysis method described above: all data of the video call is acquired, and the SIP message, its corresponding port information, and the first RTP data are extracted from that data. The second RTP data corresponding to the port of the SIP message is first extracted from the database and compared with the first RTP data to ensure that the ports are consistent. The time stamp difference and the packet loss rate of the first RTP data are then calculated.
  • when the call end instruction in the SIP message is obtained, the time stamp difference and the packet loss rate of the first RTP data are evaluated to determine whether the data is abnormal, and the feature code corresponding to the first RTP data is recorded in the database.
  • the technical solution of this application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.
  • the blockchain referred to in this application is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • a blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Health & Medical Sciences (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Business, Economics & Management (AREA)
  • Cardiology (AREA)
  • Business, Economics & Management (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Telephonic Communication Services (AREA)

Abstract

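An AI video call quality analysis method, comprising: acquiring all data generated while a device is in a video call (S1); extracting the SIP message, its corresponding port information, and first RTP data from the data (S2); extracting from a database second RTP data corresponding to the port information of the SIP message, comparing the first RTP data with the second RTP data, and, if the comparison succeeds, identifying the SIP message and calculating the time stamp difference and packet loss rate of the first RTP data (S3); when the SIP message is identified as a call end instruction, performing an abnormality judgment on the video call based on the time stamp difference and the packet loss rate (S4); and, according to the judgment result, recording the feature code corresponding to the first RTP data in the database (S5). All data of the video call is stored in a blockchain. The method can accurately analyze and detect abnormal conditions that occur during remote video consultation calls.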

Description

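AI video call quality analysis method, apparatus, computer device, and storage medium
This application claims priority to Chinese patent application No. 202010990036.4, filed with the Chinese Patent Office on September 18, 2020 and entitled "AI video call quality analysis method, apparatus, computer device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of remote consultation calls in digital healthcare, and in particular to an AI video call quality analysis method, an apparatus, and a computer-readable storage medium.
Background
Artificial intelligence (AI) is a new technical science that studies the theories and methods for simulating, extending, and expanding human intelligence, and it is a branch of computer science. Various AI technologies have matured, and AI video calls have gradually begun to replace human operators in scenarios such as remote interviews, video follow-up visits, and remote account opening. However, the inventors found that during a remote consultation call, when network fluctuation or jitter degrades the audio and video, the AI cannot yet accurately and intelligently recognize the audio and video quality. As a result, no corresponding handling can follow, and the video recording of the consultation call cannot be retained as supporting evidence.
Summary
This application provides an AI video call quality analysis method, an apparatus, an electronic device, and a computer-readable storage medium, to solve the problem in the prior art that audio and video quality cannot be detected accurately during a remote consultation video call.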
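To solve the above problem, this application provides an AI video call quality analysis method, comprising:
acquiring all data generated while a device is in a video call;
extracting the SIP message, its corresponding port information, and first RTP data from the data;
extracting from a database second RTP data corresponding to the port information of the SIP message, comparing the first RTP data with the second RTP data, and, if the comparison succeeds, identifying the SIP message and calculating the time stamp difference and packet loss rate of the first RTP data;
when the SIP message is identified as a call end instruction, performing an abnormality judgment on the video call based on the time stamp difference and the packet loss rate;
recording, according to the judgment result, the feature code corresponding to the first RTP data in the database.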
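To solve the above problem, this application further provides an AI video call quality analysis apparatus, the apparatus comprising:
an acquisition module, configured to acquire all data generated while a device is in a video call;
a parsing module, configured to extract the SIP message, its corresponding port information, and first RTP data from the data;
a comparison analysis module, configured to extract from a database second RTP data corresponding to the port information of the SIP message, compare the first RTP data with the second RTP data, and, if the comparison succeeds, identify the SIP message and calculate the time stamp difference and packet loss rate of the first RTP data;
a judgment module, configured to, when the SIP message is identified as a call end instruction, perform an abnormality judgment on the video call based on the time stamp difference and the packet loss rate;
a feature code recording module, configured to record, according to the judgment result, the feature code corresponding to the first RTP data in the database.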
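To solve the above problem, an embodiment of this application further provides a computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
acquiring all data generated while a device is in a video call;
extracting the SIP message, its corresponding port information, and first RTP data from the data;
extracting from a database second RTP data corresponding to the port information of the SIP message, comparing the first RTP data with the second RTP data, and, if the comparison succeeds, identifying the SIP message and calculating the time stamp difference and packet loss rate of the first RTP data;
when the SIP message is identified as a call end instruction, performing an abnormality judgment on the video call based on the time stamp difference and the packet loss rate;
recording, according to the judgment result, the feature code corresponding to the first RTP data in the database.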
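To solve the above problem, an embodiment of this application provides a computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the following steps of the AI video call quality analysis method:
acquiring all data generated while a device is in a video call;
extracting the SIP message, its corresponding port information, and first RTP data from the data;
extracting from a database second RTP data corresponding to the port information of the SIP message, comparing the first RTP data with the second RTP data, and, if the comparison succeeds, identifying the SIP message and calculating the time stamp difference and packet loss rate of the first RTP data;
when the SIP message is identified as a call end instruction, performing an abnormality judgment on the video call based on the time stamp difference and the packet loss rate;
recording, according to the judgment result, the feature code corresponding to the first RTP data in the database.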
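Compared with the prior art, the AI video call quality analysis method, apparatus, computer device, and storage medium provided by the embodiments of this application have at least the following beneficial effects:
All data of the video call is acquired, and the SIP message, its corresponding port information, and the first RTP data are extracted from that data. The second RTP data corresponding to the port of the SIP message is first extracted from the database and compared with the first RTP data to ensure that the ports are consistent; the time stamp difference and packet loss rate of the first RTP data are then calculated. When the call end instruction in the SIP message is obtained, the time stamp difference and packet loss rate of the first RTP data are evaluated to determine whether the data is abnormal, and the feature code corresponding to the first RTP data is recorded in the database. By inspecting the first RTP data and matching different first RTP data to the corresponding feature codes, abnormal conditions that occur during remote consultation video calls can be detected accurately. Through further analysis of the abnormal data, this application also compiles statistics on the abnormal data and abnormal types, which facilitates rapid handling of consultation video call abnormalities afterwards.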
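Brief Description of the Drawings
To explain the solutions of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of this application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the encapsulation structure of an RTP packet header provided by an embodiment of this application;
FIG. 2 is a schematic flowchart of the AI video call quality analysis method provided by an embodiment of this application;
FIG. 3 is a schematic module diagram of the AI video call quality analysis apparatus provided by an embodiment of this application;
FIG. 4 is a schematic structural diagram of the computer device of an embodiment of this application.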
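Detailed Description
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of this application. The terms used in the specification are only for the purpose of describing specific embodiments and are not intended to limit this application. The terms "including" and "having" and any variations of them in the specification, the claims, and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second", and the like in the specification, the claims, or the above drawings are used to distinguish different objects, not to describe a specific order.
The embodiments of this application are described below with reference to the drawings.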
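The Real-time Transport Protocol (RTP) data packets involved in this application, also called RTP data herein, are introduced first. RTP provides end-to-end delivery services with real-time characteristics for data such as interactive games, video, voice, fax, and network conferencing over multicast or unicast network services.
SIP (Session Initiation Protocol) is used to help provide advanced telephony services across the Internet. It is part of the IETF standards process and was modeled on protocols such as SMTP (Simple Mail Transfer Protocol) and HTTP (Hypertext Transfer Protocol). It is used to establish, modify, and terminate calls between users on an IP network.
The six basic SIP methods are: REGISTER, which registers contact information; INVITE, which initiates a session and can be understood as placing a call; ACK, which confirms the final response to an INVITE; CANCEL, which cancels a request that is pending or still being processed; BYE, which terminates a session; and OPTIONS, which queries a server and its capabilities and can also be used as a ping test.
After the request method, the header fields that must be included are: CALL_ID (the SIP Call-ID header), a unique identifier that distinguishes sessions; CSeq, a sequence number that distinguishes requests within the same session, whose integer value increases with every new request in a dialog so that the requests are ordered; From, which states the origin of the request; To, which states the recipient of the request; and Max-Forwards, which limits the number of hops, that is, the maximum number of times the request can be forwarded.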
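Refer to FIG. 1, which is a schematic diagram of the encapsulation structure of an RTP packet header provided by an embodiment of this application.
V: the version number of the RTP protocol, 2 bits.
P: padding flag, 1 bit; if P=1, one or more additional octets that are not part of the payload are appended at the end of the packet.
X: extension flag, 1 bit; if X=1, the RTP header is followed by an extension header.
CC: CSRC (Contributing Source) count, 4 bits, indicating the number of CSRC identifiers.
M: marker, 1 bit; its interpretation is defined by a profile, and it is intended to allow significant events to be marked in the packet stream. Different payloads give it different meanings: for video it marks the end of a frame; for audio it marks the start of a talkspurt.
PT: (Payload Type), 7 bits; identifies the type of the RTP payload and indicates the encoding used for audio, video, and so on; it is determined by the sender.
SN: (Sequence Number), 16 bits; identifies the sequence number of the RTP packet sent by the sender. The sequence number increases by 1 for each packet sent, and its initial value is generated randomly. It can be used to detect packet loss and to reorder packets.
Timestamp: 32 bits; records the sampling instant of the first byte of data in the RTP packet. From the time stamp the receiver can determine whether the arrival of the data is affected by delay jitter. At the start of a session the time stamp is initialized to an initial value, and its value then increases continuously as time passes.
SSRC: (Synchronization Source) identifier, 32 bits; identifies the synchronization source. The identifier is generated randomly, and two synchronization sources in the same video conference cannot have the same SSRC.
CSRC list: (Contributing Source) list, containing 0 to 15 CSRC identifiers that identify all contributing sources for the payload contained in the RTP packet, so that the receiver can correctly identify the parties to the session; each CSRC identifier occupies 32 bits.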
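The header layout listed above can be illustrated with a short parsing sketch. It is only an illustration under stated assumptions, not the implementation of this application: the names `RtpHeader` and `parse_rtp_header` are hypothetical, and header extensions and padding are reported as flags but not processed.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    """Fields of the RTP header, named after the list above (hypothetical type)."""
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc_list: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed header and the optional CSRC list."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit SN, 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F                      # CC: low 4 bits of the first byte
    end = 12 + 4 * cc                   # each CSRC identifier is 32 bits
    if len(packet) < end:
        raise ValueError("RTP packet truncated inside the CSRC list")
    csrcs = struct.unpack("!%dI" % cc, packet[12:end])
    return RtpHeader(
        version=b0 >> 6,                # V: top 2 bits
        padding=bool(b0 & 0x20),        # P: 1 bit
        extension=bool(b0 & 0x10),      # X: 1 bit
        csrc_count=cc,
        marker=bool(b1 & 0x80),         # M: top bit of the second byte
        payload_type=b1 & 0x7F,         # PT: low 7 bits
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc_list=csrcs,
    )
```

Only the sequence number and time stamp fields are needed by the analysis steps that follow; the remaining fields are parsed here for completeness.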
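This application provides an AI video call quality analysis method. FIG. 2 is a schematic flowchart of the AI video call quality analysis method provided by an embodiment of this application.
In this embodiment, the AI video call quality analysis method includes:
S1. Acquire all data generated while a device is in a video call.
Specifically, while the device is in a video call, the data it sends passes through an optical splitter or a traffic splitter and is mirrored into multiple copies; while one copy is sent to the peer device, another is sent to the processing device, which saves it in the database, so that the mirrored data sent to the processing device can be obtained.
The devices are the devices at both ends of the video call, that is, the video call initiator and the video call receiver. In this application, they are the customer's device and the AI-side device. That is, the data from both ends of one video call is analyzed at the same time. The quality analysis method is the same for both ends of the video call, so one end is taken as an example in the description below.
The devices include, but are not limited to, smartphones, tablet computers, laptop computers, desktop computers, and the like.
The processing device includes, but is not limited to, an electronic device with processing capability, such as a server.
The optical splitter uses optical components to distribute an optical signal proportionally. A large proportion of the optical signal goes to the service link, and a small proportion goes to the bypass monitoring link, which in this case is the processing device.
The traffic splitter can split traffic according to a policy and, under multi-port load, guarantee that traffic with the same source and the same destination is kept together. Commonly used devices are 2.5G, 10G, and 40G traffic splitters and the test access port (TAP), a hardware device that is inserted directly into the network cable and sends a copy of the network traffic to other devices.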
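S2. Extract the SIP message, its corresponding port information, and the first RTP data from the data.
Specifically, after the video call ends and the processing device has obtained all data of the call, the SIP message, its corresponding port information, and the first RTP data are extracted from it.
Further, S2 specifically includes:
extracting the INVITE message among the SIP messages from the data, and obtaining a fixed field from the INVITE message and recording it in the database and memory, where the fixed field is the CALL_ID field of the INVITE message; that is, a folder named after the CALL_ID field is created in the database and memory, and this field uniquely identifies a call;
extracting the corresponding port information from the SIP message;
extracting the first RTP data packets from the data as the first RTP data.
Specifically, all of the first RTP data is parsed, the data packets it carries are extracted in turn, and they are arranged in order of RTP sequence number. The SIP message and its port information are also extracted; within one video call the port information is fixed.
In this case, the data packets include audio packets and video packets. In a preferred embodiment, after the video call ends, all of the first RTP data is parsed, the audio packets and video packets it carries are extracted in turn, and they are arranged by type in order of their sequence numbers.
It should be emphasized that, to further ensure the privacy and security of all data of the video call, the data can also be stored in a node of a blockchain.
With the above method, the information related to one video call is saved in a folder named after the CALL_ID, and the folder is saved in the database and memory, which keeps the data of that video call unique and avoids mixing up data. The SIP message and its corresponding port are extracted, as are the data packets in the first RTP data.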
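Step S2 can be sketched as follows. This is a minimal sketch under assumptions that the application does not state: the port information is taken from the `m=` media lines of the SDP body carried by the INVITE, and the function names `extract_call_id` and `extract_media_ports` are hypothetical.

```python
import re
from typing import Dict, Optional


def extract_call_id(sip_message: str) -> Optional[str]:
    """Return the Call-ID header value (the CALL_ID field), or None."""
    for line in sip_message.splitlines():
        if not line.strip():
            break                       # blank line: headers end, body begins
        name, sep, value = line.partition(":")
        # "i" is the compact form of the Call-ID header name.
        if sep and name.strip().lower() in ("call-id", "i"):
            return value.strip()
    return None


def extract_media_ports(sip_message: str) -> Dict[str, int]:
    """Return the RTP port per media type from SDP 'm=' lines (assumption)."""
    ports: Dict[str, int] = {}
    for line in sip_message.splitlines():
        match = re.match(r"m=(audio|video)\s+(\d+)", line.strip())
        if match:
            ports[match.group(1)] = int(match.group(2))
    return ports
```

The returned Call-ID could then serve as the folder name described above, and the ports as the key for looking up the second RTP data in step S3.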
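S3. Extract from the database the second RTP data corresponding to the port information of the SIP message, compare the first RTP data with the second RTP data, and, if the comparison succeeds, identify the SIP message and calculate the time stamp difference and packet loss rate of the first RTP data.
Specifically, after the port information corresponding to the SIP message is obtained in step S2, that port information is used to find the corresponding second RTP data in the database, and the first RTP data is compared with the second RTP data. The ports are compared because the AI video call quality analysis method of this application does not analyze only one video call at a time; it analyzes the quality of multiple video calls simultaneously, for example dozens of calls. An unsuccessful comparison therefore means that the first RTP data extracted earlier is wrong, and continuing the analysis would mix up the data. Analyzing the data only after a successful comparison avoids this problem.
Identifying a SIP message mainly means identifying its status code and the BYE method. The status codes are divided into: 1XX, provisional response, indicating that the request is being processed; 2XX, success, indicating that the request was successfully received and fully understood; 3XX, redirection, indicating that further action is needed to complete the request; 4XX, request failure, indicating that the request contains a syntax error or that the server cannot fulfil the client's request; 5XX, server error, indicating that the server failed to fulfil an apparently valid request; 6XX, global failure, indicating that no server can fulfil the request. The BYE method terminates the session.
The packet information in the first RTP data is obtained to calculate the time stamp difference between adjacent data packets, and the gaps in the sequence numbers over the entire video call are obtained to calculate the packet loss rate of the data packets.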
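Further, the data packets include audio packets and/or video packets, and S3 specifically includes:
analyzing the status code of the SIP message, that is, identifying the status code and the BYE method in the SIP message, where a status code of 200 OK indicates that the session succeeded;
recording in turn the sequence number and time stamp of each audio packet in the first RTP data, and subtracting the sequence number and time stamp of the previous audio packet from those of each audio packet, to obtain the time stamp difference between two adjacent audio packets, denoted delta1;
analyzing the sequence number difference between two adjacent audio packets and, if the difference is not 1, adding the corresponding difference to the total number of lost audio packets, calculating the packet loss rate of the audio packets from the total number of lost audio packets, and recording it in the database and memory;
comparing the delta1 with the delta1 last recorded in the database and memory; if it is larger than the recorded delta1, updating the delta1 in the database and memory; if it is smaller than the recorded delta1, doing nothing.
Specifically, the audio packets in the RTP data are analyzed first. The sequence number and time stamp of each audio packet carried in the RTP data are recorded in turn, the audio packets are arranged in ascending order of sequence number, the sequence number and time stamp of the previous audio packet are subtracted from those of each audio packet, and the time stamp difference is recorded as delta1. If the sequence number difference between two adjacent audio packets is not 1, the corresponding difference is added to the total number of lost audio packets; that is, the total number of audio packets lost during the video call is counted, so that dividing it by the total number of audio packets gives the packet loss rate of the audio packets for the call. Each delta1 is compared with the delta1 previously stored in the database and memory, that is, the last stored delta1. If it is larger than the stored value, the stored value is updated with the current delta1; if it is smaller, no update is made. The purpose of updating delta1 is to determine the maximum delay of the audio packets in the video call.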
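The delta1 and packet loss calculation described above can be sketched as below; the same sketch would apply to the video packets and delta2. The function name `analyze_stream` is hypothetical, and three interpretations are assumptions rather than statements of the application: a sequence gap of g is counted as g - 1 lost packets, the loss rate is computed against the number of packets expected (received plus lost), and 16-bit sequence number wraparound and duplicate packets are ignored.

```python
from typing import Iterable, Tuple


def analyze_stream(packets: Iterable[Tuple[int, int]]) -> Tuple[int, float]:
    """Return (maximum time stamp difference, packet loss rate).

    `packets` holds (sequence_number, timestamp) pairs for one media type.
    """
    ordered = sorted(packets)           # ascending order of sequence number
    if not ordered:
        return 0, 0.0
    max_delta = 0                       # plays the role of the stored delta1
    lost = 0
    for (prev_seq, prev_ts), (seq, ts) in zip(ordered, ordered[1:]):
        delta = ts - prev_ts
        if delta > max_delta:           # update only when the new delta is larger
            max_delta = delta
        gap = seq - prev_seq
        if gap > 1:                     # assumption: a gap of g means g - 1 lost
            lost += gap - 1
    expected = len(ordered) + lost      # assumption: rate is relative to expected
    return max_delta, lost / expected
```

Keeping only the running maximum mirrors the update rule above, in which the stored value is replaced only when the current difference is larger.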
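Further, the sequence number and time stamp of each video packet in the first RTP data are recorded in turn, and the sequence number and time stamp of the previous video packet are subtracted from those of each video packet, to obtain the time stamp difference between two adjacent video packets, denoted delta2;
the sequence number difference between two adjacent video packets is analyzed and, if the difference is not 1, the corresponding difference is added to the total number of lost video packets, the packet loss rate of the video packets is calculated from the total number of lost video packets, and it is recorded in the database and memory;
the delta2 is compared with the delta2 last recorded in the database and memory; if it is larger than the recorded delta2, the delta2 in the database and memory is updated; if it is smaller than the recorded delta2, nothing is done.
Specifically, the video packets in the RTP data are analyzed. The sequence number and time stamp of each video packet carried in the RTP data are recorded in turn, the video packets are arranged in ascending order of sequence number, the sequence number and time stamp of the previous video packet are subtracted from those of each video packet, and the time stamp difference is recorded as delta2. If the sequence number difference between two adjacent video packets is not 1, the corresponding difference is added to the total number of lost video packets; that is, the total number of video packets lost during the video call is counted, so that dividing it by the total number of video packets gives the packet loss rate of the video packets for the call. Each delta2 is compared with the delta2 previously stored in the database and memory, that is, the last stored delta2. If it is larger than the stored value, the stored value is updated with the current delta2; if it is smaller, no update is made. The purpose of updating delta2 is to determine the maximum delay of the video packets in the video call.
The video packets and audio packets in the RTP data can be processed in either order or at the same time. Likewise, the time stamp difference analysis and the sequence number difference analysis can be performed in either order.
With the above time stamp difference analysis and sequence number difference analysis, the maximum time stamp difference and the packet loss rate of the video call are calculated, which realizes the quality analysis of the call.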
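Further, starting from the time stamp of the data packet with the smallest sequence number, timing periods of a preset number of seconds are set one after another;
the number of data packets sent within one timing period is recorded, the number sent is divided by the preset number of seconds to obtain the number of data packets sent per second, and the result is recorded in the database and memory.
Specifically, the time stamp of the data packet with the smallest sequence number is obtained first, that is, the earliest packet sent is taken as the starting point. Timing periods of a preset number of seconds are then set one after another, the number of data packets sent within one timing period is recorded, and that number is divided by the preset number of seconds to obtain the number of audio packets or video packets sent per second. In this embodiment the preset number of seconds is 1. The result is used to judge the overall quality of the video call and is recorded in the database and memory.
With the above method, the number of video packets or audio packets sent per second in the video call is calculated, which further realizes the quality analysis of the call.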
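The per-second count can be sketched as follows. The application does not state how RTP time stamps are converted to seconds, so the `clock_rate` parameter (time stamp ticks per second, for example 8000 for narrowband audio) is an assumption of this sketch, as is the function name `packets_per_second`.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def packets_per_second(
    packets: Iterable[Tuple[int, int]],
    clock_rate: int,
    window_seconds: int = 1,
) -> Dict[int, float]:
    """Map each timing period index to the packets sent per second in it.

    `packets` holds (sequence_number, timestamp) pairs for one media type.
    """
    ordered = sorted(packets)
    if not ordered:
        return {}
    start_ts = ordered[0][1]            # time stamp of the smallest sequence number
    window_ticks = clock_rate * window_seconds
    # Count how many packets fall into each consecutive timing period.
    counts = Counter((ts - start_ts) // window_ticks for _, ts in ordered)
    return {index: n / window_seconds for index, n in sorted(counts.items())}
```

With the preset number of seconds equal to 1, as in this embodiment, each value is simply the number of packets whose time stamp falls inside that one-second period.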
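S4. When the SIP message is identified as a call end instruction, perform an abnormality judgment on the video call based on the time stamp difference and the packet loss rate.
Specifically, while the RTP data is being analyzed, the SIP messages are also analyzed in sequence. When the call end instruction is identified in the SIP messages, the analysis of the audio packets and video packets in the first RTP data is ended, and the judgment of the video call based on the time stamp difference and packet loss rate begins. The call end instruction includes the BYE method and the 200 OK message that confirms the end of the call.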
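Recognizing the call end instruction can be sketched as below. The sketch assumes that the SIP messages of one call are available as text in capture order, and it treats a 200 response whose CSeq header names BYE as the end confirmation; the function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple

STATUS_CLASSES = {
    1: "provisional",
    2: "success",
    3: "redirection",
    4: "request failure",
    5: "server error",
    6: "global failure",
}


def status_class(code: int) -> str:
    """Map a SIP status code to the class named in the description."""
    return STATUS_CLASSES.get(code // 100, "unknown")


def parse_start_line(message: str) -> Tuple[str, Optional[object]]:
    """Return ("request", method) or ("response", status_code)."""
    lines = message.strip().splitlines()
    parts = lines[0].split(" ", 2) if lines else []
    if len(parts) < 2:
        return ("unknown", None)
    if parts[0].startswith("SIP/"):
        return ("response", int(parts[1]))
    return ("request", parts[0])


def cseq_method(message: str) -> Optional[str]:
    """Return the method named in the CSeq header, if present."""
    for line in message.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "cseq" and value.split():
            return value.split()[-1].upper()
    return None


def call_ended(messages: Iterable[str]) -> bool:
    """True once a BYE request has been followed by its 200 OK."""
    bye_seen = False
    for message in messages:
        kind, value = parse_start_line(message)
        if kind == "request" and value == "BYE":
            bye_seen = True
        elif kind == "response" and value == 200 and bye_seen:
            if cseq_method(message) == "BYE":
                return True
    return False
```

Checking the CSeq method distinguishes the 200 OK that confirms the BYE from the earlier 200 OK that answers the INVITE.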
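Further, S4 specifically includes:
extracting from the database the stored time stamp difference, packet loss rate, and number of data packets sent per second for the audio packets and video packets of the video call;
judging whether the time stamp difference, packet loss rate, and number of data packets sent per second of the audio packets and video packets meet the preset requirements;
if the preset requirements are not met, determining that the video call data is abnormal.
Specifically, the analysis results are stored in the database and memory while the audio packets and video packets are being analyzed, so when the call end instruction is parsed, the previously stored time stamp difference, packet loss rate, and number sent per second of the audio packets and video packets are extracted directly from the database;
preset values are used to judge whether the time stamp difference, packet loss rate, and number of data packets sent per second of the audio packets and video packets meet the requirements;
if the preset requirements are not met, the video call data is determined to be abnormal.
The time stamp difference, packet loss rate, and number sent per second are compared with the preset requirements to judge whether the video call is abnormal; this step yields the abnormal conditions found by the quality analysis.
The time stamp difference, the packet loss rate, and the number of data packets sent per second each have their own preset values, and because the data packets include audio packets and video packets, the preset values for the time stamp difference, packet loss rate, and number sent per second also differ between audio packets and video packets. As soon as the audio packets or the video packets fail any one of the preset values for the time stamp difference, packet loss rate, or number sent per second, the video call is determined to be abnormal. The preset values for the audio packets and video packets are described in detail below.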
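Still further, determining that the video call data is abnormal if the preset requirements are not met includes:
according to the preset requirement for audio packets, the first preset sending standard is that sending 50 to 60 audio packets per second is normal; if fewer than 50 or more than 60 audio packets are sent per second in the video call, the data is abnormal and the audio of the video call is abnormal;
according to the preset requirement for video packets, the second preset sending standard is that sending 45 video packets per second, with an error within 15 video packets, is normal; if fewer than 30 or more than 60 video packets are sent per second in the video call, the data is abnormal and the video of the video call is abnormal;
the first preset value is 2000; when the time stamp difference of the audio packets or video packets is greater than 2000, the audio/video record is abnormal;
the second preset value is 1000 and the third preset value is 3%; when the time stamp difference of the audio packets or video packets is greater than 1000 and the packet loss rate is greater than 3%, the audio/video record is abnormal;
the fourth preset value is 5% and the fifth preset value is 4%; based on the packet loss rates of the audio packets and video packets stored in the database, if the packet loss rate of the audio packets is greater than 5% and the packet loss rate of the video packets is greater than 4%, the data is abnormal, specifically the audio/video stream is abnormal.
As soon as any of the above conditions occurs, the video call can be judged abnormal, but all of the above judgments still need to be made in order to obtain the number of abnormalities in the first RTP data. This is a rough judgment that determines whether the first RTP data is normal or abnormal.
It should be noted that here the first preset value is greater than the second preset value, the third preset value is smaller than the fourth and fifth preset values, and specific values are given; this is only the preferred embodiment. In other solutions, the third preset value may be set greater than the fourth or fifth preset value, for example.
The above steps define the preset requirements explicitly, which makes it possible to judge more clearly whether the quality of the video call is abnormal or normal.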
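The rough judgment above can be sketched as a set of threshold checks. The numeric values are the preset values of this preferred embodiment; the `StreamStats` type, the function name `judge_call`, and the reason strings are hypothetical, and all checks are run (rather than stopping at the first failure) so that the number of abnormalities can be counted, as the description requires.

```python
from dataclasses import dataclass
from typing import List

AUDIO_PPS_RANGE = (50, 60)      # first preset sending standard
VIDEO_PPS_RANGE = (30, 60)      # second preset sending standard: 45 +/- 15
DELTA_HARD_LIMIT = 2000         # first preset value
DELTA_SOFT_LIMIT = 1000         # second preset value
LOSS_WITH_DELTA = 0.03          # third preset value
AUDIO_LOSS_LIMIT = 0.05         # fourth preset value
VIDEO_LOSS_LIMIT = 0.04         # fifth preset value


@dataclass
class StreamStats:
    max_delta: int              # largest time stamp difference (delta1 or delta2)
    loss_rate: float            # packet loss rate as a fraction
    packets_per_second: float


def judge_call(audio: StreamStats, video: StreamStats) -> List[str]:
    """Return the list of failed checks; an empty list means normal."""
    reasons: List[str] = []
    low, high = AUDIO_PPS_RANGE
    if not low <= audio.packets_per_second <= high:
        reasons.append("audio send rate out of range")
    low, high = VIDEO_PPS_RANGE
    if not low <= video.packets_per_second <= high:
        reasons.append("video send rate out of range")
    for name, stats in (("audio", audio), ("video", video)):
        if stats.max_delta > DELTA_HARD_LIMIT:
            reasons.append(name + " time stamp difference too large")
        if stats.max_delta > DELTA_SOFT_LIMIT and stats.loss_rate > LOSS_WITH_DELTA:
            reasons.append(name + " time stamp difference and loss too large")
    if audio.loss_rate > AUDIO_LOSS_LIMIT and video.loss_rate > VIDEO_LOSS_LIMIT:
        reasons.append("audio and video streams abnormal")
    return reasons
```

The length of the returned list corresponds to the number of abnormalities that step S5 accumulates.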
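S5. According to the judgment result, record the feature code corresponding to the first RTP data in the database.
Specifically, according to the judgment result, that is, whether the above judgment found the call normal or abnormal, the feature code corresponding to the normal condition or to the abnormal condition (for example, too few or too many RTP packets being sent) is recorded in the database, so that both abnormal and normal conditions are recorded. Based on the rough judgment above, the matching normal or abnormal feature code is recorded: if the judgment result is normal, feature code 2 is matched; if the judgment result is abnormal, feature code 7 is matched.
Further, S5 specifically includes:
if the judgment result is normal, matching the normal first RTP data to the corresponding preset feature code;
if the judgment result is abnormal, adding the number of abnormalities of the first RTP data to the total number of abnormalities in the database, and determining the abnormality category of the first RTP data from the time stamp difference, packet loss rate, or number of data packets sent per second that corresponds to the abnormal judgment result;
matching the corresponding preset feature code based on the abnormality category;
composing the preset feature codes corresponding to the first RTP data into a code and recording it in the database.
Specifically, in a preferred embodiment, when the above abnormality judgment finds the result normal, the preset feature code corresponding to normal first RTP data, that is, RTP stream normal, is obtained directly.
If the judgment result is abnormal, the number of abnormalities of the first RTP data is counted and added to the total number of abnormalities in the database, which gives the total number of abnormalities detected across multiple video calls. Then, according to the above abnormality judgment result, the time stamp difference, packet loss rate, and number sent per second of the data packets of the abnormal video call are extracted from the database to determine the abnormality category of the abnormal first RTP data. That is: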
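when the number of audio packets or video packets sent per second is 0, the category is RTP stream is 0;
when fewer than 50 audio packets or fewer than 30 video packets are sent per second, the category is RTP stream under-sent;
when more than 60 audio packets or more than 60 video packets are sent per second, the category is RTP stream over-sent;
when the packet loss rate of the audio packets or video packets is not 0, the category is RTP packet loss;
when the time stamp difference of the audio packets or video packets exceeds 2000, the category is Delta abnormal;
when the packet loss rate of the audio packets or video packets is higher than 70%, the audio and video cannot be synthesized, and the category is audio/video not synthesized.
The above abnormality types are matched to preset feature codes, which are set in advance as follows: RTP stream is 0 corresponds to feature code 0, RTP stream under-sent to feature code 1, RTP stream normal to feature code 2, RTP stream over-sent to feature code 3, RTP packet loss to feature code 4, Delta abnormal to feature code 5, and audio/video not synthesized to feature code 6. The preset feature codes corresponding to the data are composed in the format [identification bits (2 characters)] + [code bits (4 characters)], for example VE0000. VE stands for videoerror, but these characters have no practical meaning and can be replaced by any two English letters. The code bits represent, in order, SRC_AUDIO (the audio condition of the client), SRC_VIDEO (the video condition of the client), DST_AUDIO (the audio condition of the AI side), and DST_VIDEO (the video condition of the AI side), that is, the condition of the audio and video data sent by the client and by the AI side. If several of the above conditions occur in one video call, the feature code is displayed by priority from high to low in the order RTP stream over-sent / RTP stream under-sent / RTP stream is 0, Delta abnormal, RTP packet loss, audio/video not synthesized. The feature codes that are not displayed and their corresponding abnormal conditions are nevertheless also saved in the database. For example, if the audio packets of a video call show both RTP stream over-sent and RTP packet loss, only 3 is displayed as the feature code, but the database stores both abnormal conditions, RTP stream over-sent and RTP packet loss, and their corresponding feature codes. Finally, the code is saved in the database.
In addition, the codes in the database are extracted in real time and the corresponding abnormal conditions are summarized and displayed, so that the data for each category can be seen and the quality of the video calls can be known promptly. This further judgment of the first RTP data is a refined judgment that yields the specific abnormality type of the first RTP data.
Through the above steps, the abnormal data of the video call is classified, so that the abnormal data can be presented more clearly, which facilitates its subsequent handling.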
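The refined classification and the composition of the displayed code can be sketched as follows. The feature code numbers, the display priority, and the VE prefix come from the description above; the function names `classify`, `displayed_code`, and `compose_feature_code`, and the choice to return the full set of categories so that undisplayed ones can still be stored, are assumptions of this sketch.

```python
from typing import Iterable, Sequence, Set

# Display priority from high to low: over-sent / under-sent / stream is 0,
# then Delta abnormal, RTP packet loss, audio/video not synthesized.
DISPLAY_PRIORITY = (3, 1, 0, 5, 4, 6)
NORMAL_CODE = 2


def classify(pps: float, loss_rate: float, max_delta: int,
             low: int, high: int) -> Set[int]:
    """Return every abnormality feature code that applies to one stream."""
    codes: Set[int] = set()
    if pps == 0:
        codes.add(0)                    # RTP stream is 0
    elif pps < low:
        codes.add(1)                    # RTP stream under-sent
    elif pps > high:
        codes.add(3)                    # RTP stream over-sent
    if loss_rate > 0:
        codes.add(4)                    # RTP packet loss
    if max_delta > 2000:
        codes.add(5)                    # Delta abnormal
    if loss_rate > 0.70:
        codes.add(6)                    # audio/video not synthesized
    return codes


def displayed_code(codes: Iterable[int]) -> int:
    """Pick the single code to display; 2 (normal) if nothing applies."""
    present = set(codes)
    for code in DISPLAY_PRIORITY:
        if code in present:
            return code
    return NORMAL_CODE


def compose_feature_code(digits: Sequence[int], prefix: str = "VE") -> str:
    """Compose prefix + SRC_AUDIO, SRC_VIDEO, DST_AUDIO, DST_VIDEO digits."""
    if len(digits) != 4:
        raise ValueError("expected four code bits")
    return prefix + "".join(str(d) for d in digits)
```

For the example given above, an audio stream that is both over-sent and lossy yields the set {3, 4}, of which only 3 is displayed, while the whole set remains available for storage.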
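Further, the data packets include audio packets and/or video packets. In the above embodiment the data packets include both audio packets and video packets, but they may also include only audio packets or only video packets.
In other embodiments of this application, whether the video call is abnormal can be judged from only the time stamp difference and packet loss rate of the first RTP data, without recording the number of data packets sent per second.
All data of the video call is acquired, and the SIP message, its corresponding port information, and the first RTP data are extracted from that data. The second RTP data corresponding to the port of the SIP message is first extracted from the database and compared with the first RTP data to ensure that the ports are consistent; the time stamp difference and packet loss rate of the first RTP data are then calculated. When the call end instruction in the SIP message is obtained, the time stamp difference and packet loss rate of the first RTP data are evaluated to determine whether the data is abnormal, and the feature code corresponding to the first RTP data is recorded in the database. By inspecting the first RTP data and matching different first RTP data to the corresponding feature codes, abnormal conditions that occur during remote consultation video calls can be detected accurately. Through further analysis of the abnormal data, this application also compiles statistics on the abnormal data and abnormal types, which facilitates rapid handling of video call abnormalities afterwards.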
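FIG. 3 is a functional module diagram of the AI video call quality analysis apparatus of this application.
The AI video call quality analysis apparatus 100 described in this application can be installed in an electronic device. According to the functions implemented, the AI video call quality analysis apparatus 100 may include an acquisition module 101, a parsing module 102, a comparison analysis module 103, a judgment module 104, and a feature code recording module 105. A module described in this application, which can also be called a unit, is a segment of computer-readable instructions that can be executed by the processor of an electronic device to perform a fixed function, and that is stored in the memory of the electronic device.
In this embodiment, the functions of each module/unit are as follows:
In this embodiment, the data packets include audio packets and video packets.
The acquisition module 101 is configured to acquire all data generated while a device is in a video call.
Specifically, while the device is in a video call, the data it sends passes through an optical splitter or a traffic splitter and is mirrored into multiple copies; while one copy is sent to the peer device, another is sent to the processing device, which saves it in the database, so that the mirrored data sent to the processing device can be obtained.
The devices are the devices at both ends of the video call, that is, the video call initiator and the video call receiver. In this application, they are the customer's device and the AI-side device. That is, the data from both ends of one video call is analyzed at the same time. The quality analysis method is the same for both ends of the video call, so one end is taken as an example in the description below.
The devices include, but are not limited to, smartphones, tablet computers, laptop computers, desktop computers, and the like.
The processing device includes, but is not limited to, an electronic device with processing capability, such as a server.
The optical splitter uses optical components to distribute an optical signal proportionally. A large proportion of the optical signal goes to the service link, and a small proportion goes to the bypass monitoring link, which in this case is the processing device.
The traffic splitter can split traffic according to a policy and, under multi-port load, guarantee that traffic with the same source and the same destination is kept together. Commonly used devices are 2.5G, 10G, and 40G traffic splitters and the test access port (TAP), a hardware device that is inserted directly into the network cable and sends a copy of the network traffic to other devices.
It should be emphasized that, to further ensure the privacy and security of all data of the video call, the data can also be stored in a node of a blockchain.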
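The parsing module 102 is configured to extract the SIP message, its corresponding port information, and the first RTP data from the data.
Specifically, the parsing module 102 is configured to extract the SIP message, its corresponding port information, and the first RTP data after the video call has ended and the processing device has obtained all data packets of the call.
The comparison analysis module 103 is configured to extract from the database the second RTP data corresponding to the port information of the SIP message, compare the first RTP data with the second RTP data, and, if the comparison succeeds, identify the SIP message and calculate the time stamp difference and packet loss rate of the first RTP data.
Specifically, after the parsing module 102 has obtained the port information corresponding to the SIP message, the comparison analysis module 103 uses that port information to find the corresponding second RTP data in the database, and compares the first RTP data with the second RTP data. The ports are compared because the AI video call quality analysis method of this application does not analyze only one video call at a time; it analyzes the quality of multiple video calls simultaneously, for example dozens of calls. An unsuccessful comparison therefore means that the first RTP data extracted earlier is wrong, and continuing the analysis would mix up the data. Analyzing the data only after a successful comparison avoids this problem.
Identifying a SIP message mainly means identifying its status code and the BYE method. The status codes are divided into: 1XX, provisional response, indicating that the request is being processed; 2XX, success, indicating that the request was successfully received and fully understood; 3XX, redirection, indicating that further action is needed to complete the request; 4XX, request failure, indicating that the request contains a syntax error or that the server cannot fulfil the client's request; 5XX, server error, indicating that the server failed to fulfil an apparently valid request; 6XX, global failure, indicating that no server can fulfil the request. The BYE method terminates the session.
The packet information in the first RTP data is obtained to calculate the time stamp difference between adjacent data packets, and the gaps in the sequence numbers over the entire video call are obtained to calculate the packet loss rate of the data packets.
The judgment module 104 is configured to, when the SIP message is identified as a call end instruction, perform an abnormality judgment on the video call based on the time stamp difference and the packet loss rate.
Specifically, while the judgment module 104 analyzes the RTP data, it also analyzes and identifies the SIP messages in sequence. When the call end instruction is identified in the SIP messages, the analysis of the audio packets and video packets in the RTP data is ended, and the judgment of the video call based on the time stamp difference and packet loss rate begins. The call end instruction includes the BYE message and the 200 OK message that confirms the end of the call.
The feature code recording module 105 is configured to record, according to the judgment result, the feature code corresponding to the first RTP data in the database.
Specifically, according to the judgment result, that is, whether the above judgment found the call normal or abnormal, the feature code recording module 105 records in the database the feature code corresponding to the normal condition or to the abnormal condition (for example, too few or too many RTP packets being sent), so that both abnormal and normal conditions are recorded. Based on the rough judgment above, the matching normal or abnormal feature code is recorded: if the judgment result is normal, feature code 2 is matched; if the judgment result is abnormal, feature code 7 is matched.
With the above apparatus, the acquisition module, parsing module, comparison analysis module, judgment module, and feature code recording module work together to accurately analyze and detect abnormal conditions in remote consultation video calls and to compile statistics on the abnormal data and abnormal types, which facilitates rapid handling of video call abnormalities afterwards.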
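To solve the above technical problem, an embodiment of this application further provides a computer device. Refer to FIG. 4, which is a block diagram of the basic structure of the computer device of this embodiment.
The computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are communicatively connected to each other via a system bus. It should be pointed out that the figure only shows the computer device 4 with components 41-43, but it should be understood that not all of the components shown are required, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
The computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, a voice control device, or the like.
The memory 41 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, and the like. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as the hard disk or internal memory of the computer device 4. In other embodiments, the memory 41 may be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the computer device 4. Of course, the memory 41 may also include both the internal storage unit of the computer device 4 and its external storage device. In this embodiment, the memory 41 is generally used to store the operating system and the various application software installed on the computer device 4, such as the computer-readable instructions of the AI video call quality analysis method. In addition, the memory 41 can be used to temporarily store various data that has been output or is to be output.
In some embodiments, the processor 42 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is generally used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to run the computer-readable instructions stored in the memory 41 or to process data, for example to run the computer-readable instructions of the AI video call quality analysis method.
The network interface 43 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 4 and other electronic devices.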
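In this embodiment, when the processor executes the computer-readable instructions stored in the memory, the steps of the AI video call quality analysis method of the above embodiment are implemented. All data of the video call is acquired, and the SIP message, its corresponding port information, and the first RTP data are extracted from that data. The second RTP data corresponding to the port of the SIP message is first extracted from the database and compared with the first RTP data to ensure that the ports are consistent; the time stamp difference and packet loss rate of the first RTP data are then calculated. When the call end instruction in the SIP message is obtained, the time stamp difference and packet loss rate of the first RTP data are evaluated to determine whether the data is abnormal, and the feature code corresponding to the first RTP data is recorded in the database. By inspecting the first RTP data and matching different first RTP data to the corresponding feature codes, abnormal conditions that occur during video calls can be detected accurately. Through further analysis of the abnormal data, this application also compiles statistics on the abnormal data and abnormal types, which facilitates rapid handling of video call abnormalities afterwards.
This application also provides another implementation, namely a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor so that the at least one processor executes the steps of the AI video call quality analysis method described above. All data of the video call is acquired, and the SIP message, its corresponding port information, and the first RTP data are extracted from that data. The second RTP data corresponding to the port of the SIP message is first extracted from the database and compared with the first RTP data to ensure that the ports are consistent; the time stamp difference and packet loss rate of the first RTP data are then calculated. When the call end instruction in the SIP message is obtained, the time stamp difference and packet loss rate of the first RTP data are evaluated to determine whether the data is abnormal, and the feature code corresponding to the first RTP data is recorded in the database. By inspecting the first RTP data and matching different first RTP data to the corresponding feature codes, abnormal conditions that occur during video calls can be detected accurately. Through further analysis of the abnormal data, this application also compiles statistics on the abnormal data and abnormal types, which facilitates rapid handling of video call abnormalities afterwards. The computer-readable storage medium may be non-volatile or volatile.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of this application.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain can include the underlying blockchain platform, the platform product service layer, the application service layer, and so on.
Obviously, the embodiments described above are only some of the embodiments of this application, not all of them. The drawings show preferred embodiments of this application but do not limit its patent scope. This application can be implemented in many different forms; these embodiments are provided so that the disclosure of this application is understood more thoroughly and comprehensively. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific implementations or make equivalent replacements for some of their technical features. Any equivalent structure made using the contents of the specification and drawings of this application, and applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of this application.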

Claims (20)

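  1. An AI video call quality analysis method, the method comprising:
    acquiring all data generated while a device is in a video call;
    extracting the SIP message, its corresponding port information, and first RTP data from the data;
    extracting from a database second RTP data corresponding to the port information of the SIP message, comparing the first RTP data with the second RTP data, and, if the comparison succeeds, identifying the SIP message and calculating the time stamp difference and packet loss rate of the first RTP data;
    when the SIP message is identified as a call end instruction, performing an abnormality judgment on the video call based on the time stamp difference and the packet loss rate;
    recording, according to the judgment result, the feature code corresponding to the first RTP data in the database.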
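  2. The AI video call quality analysis method according to claim 1, wherein extracting the SIP message, its corresponding port information, and first RTP data from the data comprises:
    extracting the INVITE message among the SIP messages from the data, obtaining a fixed field from the INVITE message and recording it in the database and memory, and extracting the corresponding port information from the SIP message;
    extracting the first RTP data packets from the data as the first RTP data.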
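  3. The AI video call quality analysis method according to claim 1, wherein identifying the SIP message and calculating the time stamp difference and packet loss rate of the first RTP data comprises:
    analyzing the status code of the SIP message;
    recording in turn the sequence number and time stamp of each data packet in the first RTP data, and subtracting the sequence number and time stamp of the previous data packet from those of each data packet, to obtain the time stamp difference between two adjacent data packets;
    analyzing the sequence number difference between two adjacent data packets and, if the difference is not 1, adding the corresponding difference to the total number of lost data packets, calculating the packet loss rate of the data packets from the total number of lost data packets, and recording it in the database and memory;
    comparing the time stamp difference with the time stamp difference last recorded in the database and memory and, if it is larger than the recorded time stamp difference, updating the time stamp difference in the database and memory.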
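  4. The AI video call quality analysis method according to claim 3, wherein, after recording in turn the sequence number and time stamp of each data packet in the first RTP data and subtracting the sequence number and time stamp of the previous data packet from those of each data packet to obtain the time stamp difference between two adjacent data packets, the method further comprises:
    starting from the time stamp of the data packet with the smallest sequence number, setting timing periods of a preset number of seconds one after another;
    recording the number of data packets sent within one timing period, dividing the number sent by the preset number of seconds to obtain the number of data packets sent per second, and recording it in the database and memory.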
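  5. The AI video call quality analysis method according to claim 4, wherein, when the SIP message is identified as a call end instruction, performing an abnormality judgment on the video call based on the time stamp difference and the packet loss rate comprises:
    extracting from the database the stored time stamp difference, packet loss rate, and number of data packets sent per second for the data packets of the video call;
    judging whether the time stamp difference, packet loss rate, and number of data packets sent per second of the data packets meet preset requirements;
    if the preset requirements are not met, determining that the video call data is abnormal.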
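  6. The AI video call quality analysis method according to claim 5, wherein the data packets include audio packets and/or video packets, and determining that the video call data is abnormal if the preset requirements are not met comprises:
    when the number of audio packets sent per second does not meet a first preset sending standard in the preset requirements, the audio of the video call is abnormal;
    when the number of video packets sent per second does not meet a second preset sending standard in the preset requirements, the video of the video call is abnormal;
    when the time stamp difference of the audio packets and/or video packets is greater than a first preset value in the preset requirements, the audio and/or video of the video call is abnormal;
    when the time stamp difference of the audio packets and/or video packets is greater than a second preset value in the preset requirements and the packet loss rate is greater than a third preset value in the preset requirements, the audio and/or video of the video call is abnormal;
    when the packet loss rate of the audio packets is greater than a fourth preset value in the preset requirements and the packet loss rate of the video packets is greater than a fifth preset value in the preset requirements, the audio and video of the video call are abnormal.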
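  7. The AI video call quality analysis method according to claim 4, wherein recording, according to the judgment result, the feature code corresponding to the first RTP data in the database comprises:
    if the judgment result is normal, matching the normal first RTP data to the corresponding preset feature code;
    if the judgment result is abnormal, adding the number of abnormalities of the first RTP data to the total number of abnormalities in the database, and determining the abnormality category of the first RTP data from the time stamp difference, packet loss rate, or number of data packets sent per second that corresponds to the abnormal judgment result;
    matching the corresponding preset feature code based on the abnormality category;
    composing the preset feature codes corresponding to the first RTP data into a code and recording it in the database.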
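  8. An AI video call quality analysis apparatus, the apparatus comprising:
    an acquisition module, configured to acquire all data generated while a device is in a video call;
    a parsing module, configured to extract the SIP message, its corresponding port information, and first RTP data from the data;
    a comparison analysis module, configured to extract from a database second RTP data corresponding to the port information of the SIP message, compare the first RTP data with the second RTP data, and, if the comparison succeeds, identify the SIP message and calculate the time stamp difference and packet loss rate of the first RTP data;
    a judgment module, configured to, when the SIP message is identified as a call end instruction, perform an abnormality judgment on the video call based on the time stamp difference and the packet loss rate;
    a feature code recording module, configured to record, according to the judgment result, the feature code corresponding to the first RTP data in the database.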
  9. 一种计算机设备,包括存储器、处理器,以及存储在所述存储器中,并可在所述处理器上运行的计算机可读指令,所述处理器执行所述计算机可读指令时实现如下步骤:
    获取设备进行视频通话时的所有数据;
    提取所述数据中的SIP消息及其对应的端口信息和第一RTP数据;
    在数据库中提取所述SIP消息的端口信息对应的第二RTP数据,将所述第一RTP数据与所述第二RTP数据进行比对,若比对成功,则识别所述SIP消息,并计算所述第一RTP数据的时间戳差值和丢包率;
    当识别到所述SIP消息为通话结束指令时,则基于所述时间戳差值和丢包率对所述视频通话进行异常判断;
    根据所述判断结果,将所述第一RTP数据对应的特征码记录到所述数据库中。
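  10. The computer device according to claim 9, wherein the step of extracting, from the data, the SIP message together with its corresponding port information and first RTP data comprises:
    extracting an INVITE message among the SIP messages from the data, obtaining fixed fields from the INVITE message and recording them in the database and memory, and extracting the corresponding port information in the SIP message;
    extracting the first RTP data packets from the data as the first RTP data.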
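To make the INVITE extraction step concrete, here is a rough sketch that parses a SIP message held as text. The function name `parse_invite` is hypothetical, and the choice of `Call-ID`, `From`, and `To` as the "fixed fields" is an assumption, since the claim does not name them. The port information is read from the `m=` media lines of the SDP body, which is where SIP endpoints normally advertise their RTP ports.

```python
from typing import Dict, Optional


def parse_invite(message: str) -> Optional[Dict[str, object]]:
    """Return selected header fields and media ports of an INVITE, else None."""
    head, _, body = message.replace("\r\n", "\n").partition("\n\n")
    lines = head.split("\n")
    # Only INVITE requests are of interest here.
    if not lines[0].startswith("INVITE "):
        return None
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    ports: Dict[str, int] = {}
    for line in body.split("\n"):
        # SDP media line: m=<media> <port> <proto> <fmt> ...
        if line.startswith("m="):
            media, port = line[2:].split()[:2]
            ports[media] = int(port.split("/")[0])  # drop an optional "/count"
    return {
        "call_id": headers.get("call-id"),
        "from": headers.get("from"),
        "to": headers.get("to"),
        "ports": ports,
    }
```

The sketch does not handle compact header forms, folded header lines, or multipart bodies; a production parser would need all three.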
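  11. The computer device according to claim 9, wherein the step of identifying the SIP message and calculating the timestamp difference and the packet loss rate of the first RTP data comprises:
    analyzing the status code of the SIP message;
    sequentially recording the sequence number and timestamp of each data packet in the first RTP data, and subtracting the sequence number and timestamp of the previous data packet from the sequence number and timestamp of each data packet in turn to obtain the timestamp difference between two adjacent data packets;
    analyzing the sequence number difference between two adjacent data packets; if the difference is not 1, adding the corresponding difference to the total number of lost data packets, calculating the packet loss rate of the data packets from the total number of lost data packets, and recording it in the database and memory;
    comparing the timestamp difference with the timestamp difference last recorded in the database and memory; if it is larger than the timestamp difference recorded in the database and memory, updating the timestamp difference in the database and memory.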
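  12. The computer device according to claim 11, wherein, after the step of sequentially recording the sequence number and timestamp of each data packet in the first RTP data and subtracting the sequence number and timestamp of the previous data packet from the sequence number and timestamp of each data packet in turn to obtain the timestamp difference between two adjacent data packets, the steps further comprise:
    taking the timestamp of the data packet with the smallest sequence number as the start, successively setting timing periods of a preset number of seconds;
    recording the number of data packets sent within one timing period, dividing that number by the preset number of seconds to obtain the number of data packets sent per second, and recording it in the database and memory.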
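  13. The computer device according to claim 12, wherein the step of performing anomaly judgment on the video call based on the timestamp difference and the packet loss rate when the SIP message is identified as a call end instruction comprises:
    extracting, from the database, the stored timestamp difference, packet loss rate, and number of data packets sent per second for the data packets of the video call;
    determining whether the timestamp difference, the packet loss rate, and the number of data packets sent per second meet preset requirements;
    if the preset requirements are not met, determining that the video call data is abnormal.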
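  14. The computer device according to claim 12, wherein the step of recording the feature code corresponding to the first RTP data in the database according to the judgment result comprises:
    if the judgment result is normal, matching the normal first RTP data with a corresponding preset feature code;
    if the judgment result is abnormal, adding the number of anomalies of the first RTP data to the total number of anomalies in the database, and determining the anomaly category of the first RTP data according to the timestamp difference, packet loss rate, or number of data packets sent per second corresponding to the anomaly judgment result;
    matching the corresponding preset feature code based on the anomaly category;
    combining the preset feature codes corresponding to the first RTP data into a code and recording the code in the database.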
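  15. A computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the following steps of the AI video call quality analysis method:
    acquiring all data generated while a device conducts a video call;
    extracting, from the data, a SIP message together with its corresponding port information and first RTP data;
    extracting, from a database, second RTP data corresponding to the port information of the SIP message, comparing the first RTP data with the second RTP data, and, if the comparison succeeds, identifying the SIP message and calculating a timestamp difference and a packet loss rate of the first RTP data;
    when the SIP message is identified as a call end instruction, performing anomaly judgment on the video call based on the timestamp difference and the packet loss rate;
    recording, according to the judgment result, a feature code corresponding to the first RTP data in the database.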
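  16. The computer-readable storage medium according to claim 15, wherein the step of extracting, from the data, the SIP message together with its corresponding port information and first RTP data comprises:
    extracting an INVITE message among the SIP messages from the data, obtaining fixed fields from the INVITE message and recording them in the database and memory, and extracting the corresponding port information in the SIP message;
    extracting the first RTP data packets from the data as the first RTP data.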
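  17. The computer-readable storage medium according to claim 15, wherein the step of identifying the SIP message and calculating the timestamp difference and the packet loss rate of the first RTP data comprises:
    analyzing the status code of the SIP message;
    sequentially recording the sequence number and timestamp of each data packet in the first RTP data, and subtracting the sequence number and timestamp of the previous data packet from the sequence number and timestamp of each data packet in turn to obtain the timestamp difference between two adjacent data packets;
    analyzing the sequence number difference between two adjacent data packets; if the difference is not 1, adding the corresponding difference to the total number of lost data packets, calculating the packet loss rate of the data packets from the total number of lost data packets, and recording it in the database and memory;
    comparing the timestamp difference with the timestamp difference last recorded in the database and memory; if it is larger than the timestamp difference recorded in the database and memory, updating the timestamp difference in the database and memory.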
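  18. The computer-readable storage medium according to claim 17, wherein, after the step of sequentially recording the sequence number and timestamp of each data packet in the first RTP data and subtracting the sequence number and timestamp of the previous data packet from the sequence number and timestamp of each data packet in turn to obtain the timestamp difference between two adjacent data packets, the steps further comprise:
    taking the timestamp of the data packet with the smallest sequence number as the start, successively setting timing periods of a preset number of seconds;
    recording the number of data packets sent within one timing period, dividing that number by the preset number of seconds to obtain the number of data packets sent per second, and recording it in the database and memory.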
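  19. The computer-readable storage medium according to claim 18, wherein the step of performing anomaly judgment on the video call based on the timestamp difference and the packet loss rate when the SIP message is identified as a call end instruction comprises:
    extracting, from the database, the stored timestamp difference, packet loss rate, and number of data packets sent per second for the data packets of the video call;
    determining whether the timestamp difference, the packet loss rate, and the number of data packets sent per second meet preset requirements;
    if the preset requirements are not met, determining that the video call data is abnormal.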
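  20. The computer-readable storage medium according to claim 18, wherein the step of recording the feature code corresponding to the first RTP data in the database according to the judgment result comprises:
    if the judgment result is normal, matching the normal first RTP data with a corresponding preset feature code;
    if the judgment result is abnormal, adding the number of anomalies of the first RTP data to the total number of anomalies in the database, and determining the anomaly category of the first RTP data according to the timestamp difference, packet loss rate, or number of data packets sent per second corresponding to the anomaly judgment result;
    matching the corresponding preset feature code based on the anomaly category;
    combining the preset feature codes corresponding to the first RTP data into a code and recording the code in the database.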
PCT/CN2020/124904 2020-09-18 2020-10-29 AI video call quality analysis method and apparatus, computer device, and storage medium WO2021174879A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010990036.4 2020-09-18
CN202010990036.4A CN112118442A (zh) 2020-09-18 2020-09-18 AI video call quality analysis method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021174879A1 (zh) 2021-09-10

Family

ID=73800827

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/124904 WO2021174879A1 (zh) 2020-09-18 2020-10-29 AI video call quality analysis method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN112118442A (zh)
WO (1) WO2021174879A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112911385B (zh) * 2021-01-12 2021-12-07 平安科技(深圳)有限公司 Method, apparatus, device, and storage medium for extracting pictures to be recognized
CN114095879A (zh) * 2021-11-09 2022-02-25 善理通益信息科技(深圳)有限公司 Voice quality monitoring method and apparatus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101247432A (zh) * 2007-07-18 2008-08-20 北京高信达网络科技有限公司 Method and apparatus for real-time monitoring of VoIP voice data
CN101945407A (zh) * 2010-10-22 2011-01-12 东南大学 Load balancing method for mobile service content monitoring
CN103124412A (zh) * 2012-11-16 2013-05-29 佳都新太科技股份有限公司 Network jitter processing technique based on the RTP protocol
US20170187774A1 (en) * 2015-12-29 2017-06-29 Spreadtrum Communications (Shanghai) Co., Ltd. Method and device for adjusting bit rate in video calling based on voice over long-term evolution and video over long-term evolution, and mobile terminal
CN109302603A (zh) * 2017-07-25 2019-02-01 中国移动通信集团北京有限公司 Video call quality evaluation method and apparatus

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
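CN103067217B (zh) * 2012-12-14 2015-11-18 北京思特奇信息技术股份有限公司 System and method for indicating communication network quality of service
CN103384247B (zh) * 2013-07-05 2016-03-30 福建星网锐捷通讯股份有限公司 Video multicast implementation method based on a SIP monitoring system
US20180359660A1 (en) * 2017-06-09 2018-12-13 Qualcomm Incorporated System and method for maintaining a network connection
CN107566776A (zh) * 2017-08-14 2018-01-09 深圳市金立通信设备有限公司 Image processing method, terminal, and computer-readable storage medium
CN109600341B (zh) * 2017-09-30 2021-10-26 华为技术有限公司 Instant messaging detection method, device, and computer storage medium
CN108650550B (zh) * 2018-07-05 2021-06-25 平安科技(深圳)有限公司 Network transmission quality analysis method and apparatus, computer device, and storage medium
CN110996103A (zh) * 2019-12-12 2020-04-10 杭州叙简科技股份有限公司 Method for adjusting video encoding bit rate according to network conditions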

Also Published As

Publication number Publication date
CN112118442A (zh) 2020-12-22

Similar Documents

Publication Publication Date Title
CN100473025C (zh) Method and system for coordinated monitoring of network transmission events
US8547974B1 (en) Generating communication protocol test cases based on network traffic
US8572382B2 (en) Out-of band authentication method and system for communication over a data network
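WO2021174879A1 (zh) AI video call quality analysis method and apparatus, computer device, and storage medium
WO2020006912A1 (zh) Network transmission quality analysis method and apparatus, computer device, and storage medium
WO2016082371A1 (zh) SSH protocol-based session parsing method and system
WO2022142676A1 (zh) Data transmission method and apparatus, computer-readable medium, and electronic device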
US20230071243A1 (en) Conserving network resources during transmission of packets of interactive services
US20090138959A1 (en) DEVICE, SYSTEM AND METHOD FOR DROPPING ATTACK MULTIMEDIA PACKET IN THE VoIP SERVICE
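JP2006279636A (ja) Consistency assurance management system for inter-client communication logs
WO2023071290A1 (zh) Multicast retransmission method and apparatus, server, and storage medium
CN107451491B (zh) Method for improving protocol parsing accuracy when database connection information is lost
CN116614481A (zh) Multimedia data transmission method, apparatus, device, and storage medium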
US20170104596A1 (en) Media detection of encrypted tunneled data
US20220368622A1 (en) Systems and methods for network optimization using end user telemetry
Li et al. An efficient intrusion detection and prevention system against SIP malformed messages attacks
CN113259621B (zh) Step-by-step cloud conference recording method and system
WO2023114187A1 (en) Video conferencing systems featuring end-to-end encryption watchdog
US9762412B2 (en) Redundant traffic encoding of encapsulated real time communications
US20080008302A1 (en) System and method for H.323 call logging
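WO2017107462A1 (zh) P2P network-based data processing method, apparatus, and system
CN110768930B (zh) Data forwarding method and apparatus for a server
CN112688824B (zh) RTP packet loss detection method, apparatus, device, and computer-readable storage medium
CN110198202B (zh) Method and apparatus for verifying the data source of AFDX bus messages
CN111585962A (zh) RTP data packet processing method, system, and storage medium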

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20923338

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 31/07/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20923338

Country of ref document: EP

Kind code of ref document: A1