WO2022205964A1 - Method, related apparatus, and system for determining video conference quality - Google Patents

Method, related apparatus, and system for determining video conference quality

Info

Publication number
WO2022205964A1
Authority
WO
WIPO (PCT)
Prior art keywords
video conference
sampling period
nth
training data
traffic
Prior art date
Application number
PCT/CN2021/133105
Other languages
English (en)
French (fr)
Inventor
史浩
徐金春
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Priority claimed from CN202110654936.6A (published as CN115174842A)
Application filed by 华为技术有限公司
Publication of WO2022205964A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/14 Systems for two-way working
    • H04N 7/15 Conference systems

Definitions

  • the present application relates to the technical field of video conferences, and further relates to the application of artificial intelligence (Artificial Intelligence, AI) technology in the field of video conferences, and in particular, to a method, related apparatus and system for determining the quality of video conferences.
  • Video conferencing is one of the most popular applications on the Internet today. However, video conferences often suffer from degraded quality, typified by freezing, which results in a poor user experience.
  • Video conferencing users usually do not record the specific time at which a freeze occurs. Therefore, even if a user complains to the video conference service provider that the video conference froze, the provider cannot determine the specific time of the freeze, which makes it difficult to determine the cause of the freeze.
  • the present application provides a method, a related device and a system for determining the quality of a video conference, which can determine the quality of a video conference according to traffic data in the network.
  • According to a first aspect, an embodiment of the present application provides a method for determining the quality of a video conference, including: collecting data on the traffic of a first video conference in N sampling periods to obtain N groups of traffic characteristic data, where the nth group of traffic characteristic data in the N groups is collected in the nth sampling period of the N sampling periods, N is a positive integer greater than or equal to 1, and n is a positive integer greater than or equal to 1 and less than or equal to N;
  • the N groups of traffic characteristic data are input into a quality judgment model to obtain a quality judgment result of the first video conference in a target period, where the quality judgment result indicates the quality of the video conference in the target period, the quality judgment model is obtained by training on video conference training data, and the target period is not earlier than the N sampling periods.
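  • The following is a minimal inference sketch of this step, assuming a trained quality judgment model with a scikit-learn-style predict interface; the model type, the number of sampling periods, and the number of features per group are illustrative assumptions, not values fixed by this application.

```python
import numpy as np

N_PERIODS = 5    # N: sampling periods feeding one judgment (assumed value)
N_FEATURES = 22  # features per group, e.g., 11 uplink + 11 downlink (assumed)

def judge_quality(model, feature_groups):
    """feature_groups: N_PERIODS x N_FEATURES array, one row per sampling period.
    Returns the quality judgment result for the target period, e.g., 1 = freeze."""
    x = np.asarray(feature_groups, dtype=float).reshape(1, -1)  # one flat sample
    return model.predict(x)[0]
```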
  • The above technical solution enables a computer device to judge the quality of a video conference according to traffic characteristic data in the network. In this way, the quality during the video conference can be determined even without access to the conference picture, so that the video conference provider can, according to the video conference quality, optimize the video conference application or the network device (e.g., a server) that provides the video conference service.
  • The nth group of traffic characteristic data includes characteristic data of uplink traffic and/or characteristic data of downlink traffic.
  • The characteristic data of the uplink traffic included in the nth group of traffic characteristic data includes any one or more of the following data: the number of uplink data packets in the nth sampling period; the total number of bytes uploaded in the nth sampling period; the maximum uplink packet size in the nth sampling period; the average uplink packet size in the nth sampling period; the variance of the uplink packet size in the nth sampling period; the maximum uplink packet interval in the nth sampling period; the average uplink packet interval in the nth sampling period; the variance of the uplink packet interval in the nth sampling period; the uplink packet loss rate in the nth sampling period; the maximum number of consecutive uplink packet losses in the nth sampling period; and the uplink byte index in the nth sampling period.
  • The characteristic data of the downlink traffic included in the nth group of traffic characteristic data includes any one or more of the following data: the number of downlink data packets in the nth sampling period; the total number of bytes downloaded in the nth sampling period; the maximum downlink packet size in the nth sampling period; the average downlink packet size in the nth sampling period; the variance of the downlink packet size in the nth sampling period; the maximum downlink packet interval in the nth sampling period; the average downlink packet interval in the nth sampling period; the variance of the downlink packet interval in the nth sampling period; the downlink packet loss rate in the nth sampling period; the maximum number of consecutive downlink packet losses in the nth sampling period; and the downlink byte index in the nth sampling period.
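  • As an illustration of how one group of traffic characteristic data could be computed for a single sampling period, the sketch below derives the count, byte, size, and interval features from (timestamp, size) packet records of one direction; loss-related features and the byte index are omitted because they need information (e.g., sequence numbers) not modeled here.

```python
import numpy as np

def traffic_features(packets):
    """packets: list of (timestamp_seconds, size_bytes) tuples for one direction
    (uplink or downlink) within one sampling period."""
    if not packets:
        return None  # no traffic observed in this sampling period
    times = np.array([ts for ts, _ in packets], dtype=float)
    sizes = np.array([sz for _, sz in packets], dtype=float)
    intervals = np.diff(times) if len(times) > 1 else np.zeros(1)
    return {
        "packet_count": len(sizes),
        "total_bytes": float(sizes.sum()),
        "size_max": float(sizes.max()),
        "size_mean": float(sizes.mean()),
        "size_var": float(sizes.var()),
        "interval_max": float(intervals.max()),
        "interval_mean": float(intervals.mean()),
        "interval_var": float(intervals.var()),
    }
```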
  • the N sampling periods are consecutive in time.
  • the sum of the time lengths of the N sampling periods is the same as the time length of the target period.
  • The Nth sampling period of the N sampling periods is before the target period and is temporally continuous with the target period; or, the start moment of the first sampling period of the N sampling periods is the start moment of the target period, and the end moment of the Nth sampling period of the N sampling periods is the end moment of the target period.
  • The computer device can thus predict the future video conference quality in advance. If there will be a problem with the video conference picture (such as freezing or low resolution), the computer device can notify the user in advance. Users can then know in advance that the image quality of the upcoming video conference will decrease and choose a response according to their needs, such as closing other applications that occupy bandwidth or switching the way they access the network.
  • The video conference training data includes multiple training data sets and multiple pieces of label information. A first training data set in the multiple training data sets includes M groups of traffic feature data, and the M groups of traffic feature data are respectively obtained by collecting data on the traffic of a second video conference in M sampling periods. First label information in the multiple pieces of label information is used to indicate whether the video conference screen corresponding to the first training data set freezes. The first training data set is any one of the multiple training data sets, and M is a positive integer greater than or equal to 1. The quality judgment model is obtained by training on the multiple training data sets and the multiple pieces of label information.
  • If the number of consecutive images with the same image information in the multi-frame images of the video conference in any one of the M sampling periods is greater than or equal to a preset number threshold, the first label information is used to indicate that the video conference screen corresponding to the first training data set freezes; if the number of consecutive images with the same image information is less than the preset number threshold in every one of the M sampling periods, the first label information is used to indicate that the video conference screen corresponding to the first training data set does not freeze.
  • The preset number threshold is determined according to the following formula: Th = ⌈Std/t⌉ (Equation 1), where Th represents the preset number threshold, Std represents a predefined video freeze standard, and t represents the duration of a single frame of image.
  • The image information of two frames of images being the same includes the quality parameter values of some or all of the pictures of the two frames being the same, where the quality parameter values are determined according to the Laplacian operator, the Brenner gradient function, or the Tenengrad gradient function.
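  • A sketch of this comparison using the Laplacian operator (one of the three options named above) follows; the variance of the Laplacian response serves as the quality parameter value, and the exact-equality test with a small tolerance is an assumption made for illustration.

```python
import cv2

def quality_parameter(frame_gray):
    # Variance of the Laplacian response as a sharpness/quality value
    return cv2.Laplacian(frame_gray, cv2.CV_64F).var()

def same_image_information(frame_a, frame_b, tol=1e-6):
    qa = quality_parameter(cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY))
    qb = quality_parameter(cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY))
    return abs(qa - qb) < tol
```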
  • the video conference picture of each sampling period in the plurality of sampling periods includes time-varying elements.
  • Time-varying elements may include images captured by cameras, and may also include scrolling timelines, timers, or GIF images. In this way, even if the user does not enable the camera and the video conference screen stays on a fixed picture (for example, stays on one page of a document for a long time), whether the video conference screen freezes or degrades in quality can still be determined according to time-varying elements such as a scrolling timeline, a timer, or a GIF image.
  • According to a second aspect, an embodiment of the present application provides a method for training a model. The method includes: acquiring multiple training data sets and multiple pieces of label information, where a first training data set in the multiple training data sets includes M groups of feature data, the M groups of feature data are respectively obtained by collecting data on the traffic of a second video conference in M sampling periods, the mth group of feature data in the M groups includes the feature data of the traffic of the second video conference in the mth sampling period of the M sampling periods, first label information in the multiple pieces of label information is used to indicate the quality of the video conference screen corresponding to the first training data set, the first training data set is any one of the multiple training data sets, M is a positive integer greater than or equal to 1, and m is a positive integer greater than or equal to 1 and less than or equal to M; and training a quality judgment model according to the multiple training data sets and the multiple pieces of label information.
  • the above technical solution provides a method for determining a quality judgment model, and the quality judgment model determined by the above method is helpful for determining the quality of a video conference in a target period.
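  • A minimal training sketch follows, assuming each training data set is flattened into one feature vector and each label is 1 (freeze) or 0 (no freeze); a random forest stands in here purely for brevity, whereas the embodiments below describe convolutional and recurrent neural network models.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_quality_model(training_datasets, labels):
    """training_datasets: list of M x F arrays (M groups of feature data each);
    labels: list of 0/1 freeze labels, one per training data set."""
    X = np.array([np.ravel(ds) for ds in training_datasets])
    y = np.array(labels)
    model = RandomForestClassifier(n_estimators=100)
    model.fit(X, y)
    return model
```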
  • If the number of consecutive images with the same image information in the multi-frame images of the video conference in any one sampling period of the M sampling periods is greater than or equal to a preset number threshold, the first label information is used to indicate that the video conference screen corresponding to the first training data set freezes; if the number of consecutive images with the same image information is less than the preset number threshold in every sampling period, the first label information is used to indicate that the video conference screen corresponding to the first training data set does not freeze.
  • The preset number threshold is determined according to the following formula: Th = ⌈Std/t⌉ (Equation 1), where Th represents the preset number threshold, Std represents a predefined video freeze standard, and t represents the duration of a single frame of image.
  • The image information of two frames of images being the same includes the quality parameter values of some or all of the pictures of the two frames being the same, where the quality parameter values are determined according to the Laplacian operator, the Brenner gradient function, or the Tenengrad gradient function.
  • The video conference screen of each sampling period in the plurality of sampling periods includes visual elements that change over time.
  • Visual elements that change over time can include images captured by cameras, and can also include scrolling timelines, timers, or GIF images. In this way, even if the user does not enable the camera and the video conference screen stays on a fixed picture (for example, stays on one page of a document for a long time), whether the video conference screen freezes or degrades in quality can still be determined according to time-varying elements such as a scrolling timeline, a timer, or a GIF image.
  • an embodiment of the present application provides a computer device, where the computer device includes a unit for implementing the first aspect or any possible implementation manner of the first aspect.
  • an embodiment of the present application provides a computer device, where the computer device includes a unit for implementing the second aspect or any possible implementation manner of the second aspect.
  • an embodiment of the present application provides a computer device, where the computer device includes a processor, and the processor is configured to be coupled with a memory and to read and execute instructions and/or program code in the memory, so as to execute the first aspect or any possible implementation manner of the first aspect.
  • an embodiment of the present application provides a computer device, where the computer device includes a processor, and the processor is configured to be coupled with a memory and to read and execute instructions and/or program code in the memory, so as to execute the second aspect or any possible implementation manner of the second aspect.
  • an embodiment of the present application provides a chip system, where the chip system includes a logic circuit, and the logic circuit is configured to be coupled with an input/output interface and to transmit data through the input/output interface, so as to execute the first aspect or any possible implementation manner of the first aspect.
  • an embodiment of the present application provides a chip system, where the chip system includes a logic circuit, and the logic circuit is configured to be coupled with an input/output interface and to transmit data through the input/output interface, so as to execute the second aspect or any possible implementation manner of the second aspect.
  • an embodiment of the present application provides a computer-readable storage medium, where program code is stored in the computer-readable storage medium, and when the program code runs on a computer, the computer is made to execute the first aspect or any possible implementation manner of the first aspect.
  • an embodiment of the present application provides a computer-readable storage medium, where program code is stored in the computer-readable storage medium, and when the program code runs on a computer, the computer is made to execute the second aspect or any possible implementation manner of the second aspect.
  • an embodiment of the present application provides a computer program product, the computer program product including computer program code, and when the computer program code runs on a computer, the computer is made to execute the first aspect or any possible implementation manner of the first aspect.
  • an embodiment of the present application provides a computer program product, the computer program product including computer program code, and when the computer program code runs on a computer, the computer is made to execute the second aspect or any possible implementation manner of the second aspect.
  • FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a method for determining the quality of a video conference provided according to an embodiment of the present application.
  • FIG. 3 is a schematic structural block diagram of a computer device provided according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a sampling period.
  • FIG. 5 is a schematic diagram of another sampling period.
  • FIG. 6 is a schematic diagram of a time window.
  • FIG. 7 is a schematic diagram of a video conference screen.
  • FIG. 8 is a schematic flowchart of a method for training a model according to an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a convolutional neural network.
  • FIG. 10 is a schematic diagram of a recurrent neural network.
  • FIG. 11 is a schematic structural diagram of chip hardware provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural block diagram of a computer device provided according to an embodiment of the present application.
  • FIG. 13 is a schematic structural block diagram of another computer device provided according to an embodiment of the present application.
  • FIG. 14 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • the network architecture and service scenarios described in the embodiments of the present application are for the purpose of illustrating the technical solutions of the embodiments of the present application more clearly, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application.
  • With the evolution of the network architecture and the emergence of new service scenarios, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.
  • "At least one" means one or more, and "plurality" means two or more.
  • "And/or" describes an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate that A exists alone, that A and B exist at the same time, or that B exists alone, where A and B may be singular or plural.
  • The character "/" generally indicates that the associated objects are in an "or" relationship.
  • "At least one item of the following" or a similar expression refers to any combination of these items, including a single item or any combination of plural items.
  • For example, at least one item of a, b, or c can represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple.
  • FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present application.
  • The terminal device 1, the terminal device 2, and the terminal device 3 each have a video conference application client installed.
  • User 1 conducts a video conference with user 2 and user 3 through terminal device 1 .
  • The computer device obtains traffic characteristic data of the video conference traffic, where the traffic characteristic data is obtained by sampling, during the sampling periods, the traffic generated by a video conference between the terminal devices; the quality judgment result of the video conference in the target period is obtained by analyzing the traffic characteristic data.
  • FIG. 2 is a schematic flowchart of a method for determining video conference quality according to an embodiment of the present application.
  • The method shown in FIG. 2 is performed by a computer device, or implemented by a component (e.g., a chip) in a computer device.
  • N is a positive integer greater than or equal to 1
  • n is a positive integer greater than or equal to 1 and less than or equal to N.
  • The technical solution shown in FIG. 2 can judge the quality of the video conference according to traffic characteristic data in the network. In this way, the quality during the video conference can be determined even without access to the conference picture, so that the video conference provider can, according to the video conference quality, optimize the video conference application or the network device (e.g., a server) that provides the video conference service.
  • FIG. 3 is a schematic structural block diagram of a computer device provided according to an embodiment of the present application.
  • The computer device shown in FIG. 3 includes a processor 310, a memory 350, and a communication interface 360.
  • The computer device 300 shown in FIG. 3 further includes a camera 320, a display screen 330, and an audio module 340.
  • the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the computer device 300 .
  • The computer device 300 may include more or fewer components than shown, some components may be combined or split, or the components may be arranged differently.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • The processor 310 may include one or more processing units. For example, the processor 310 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent components, or may be integrated in one or more processors.
  • computer device 300 may also include one or more processors 310 .
  • the controller can generate an operation control signal according to the instruction operation code and the timing signal, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 310 for storing instructions and data.
  • The memory in the processor 310 may be a cache memory. This memory may hold instructions or data that have just been used or cycled by the processor 310. If the processor 310 needs to use the instructions or data again, they can be called directly from this memory. This avoids repeated accesses and reduces the waiting time of the processor 310, thereby improving the efficiency with which the computer device 300 processes data or executes instructions.
  • processor 310 may include one or more interfaces.
  • The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, etc.
  • the interface connection relationship between the modules illustrated in the embodiments of the present application is only a schematic illustration, and does not constitute a structural limitation of the computer device 300 .
  • The computer device 300 may also adopt an interface connection manner different from those in the foregoing embodiments, or a combination of multiple interface connection manners.
  • the communication function of the computer device 300 is implemented through the communication interface 360 .
  • The communication interface 360 may provide wireless and/or wired communication solutions applied to the computer device 300.
  • The communication interface 360 is a wired interface, such as a fiber distributed data interface (FDDI) or a Gigabit Ethernet (GE) interface.
  • The communication interface 360 may also be a wireless interface that provides wireless communication functions such as 2G/3G/4G/5G or wireless local area network (WLAN).
  • the computer device 300 implements a display function through a GPU, a display screen 330, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 330 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 310 may include one or more GPUs that execute program instructions to generate or alter display information.
  • the display screen 330 is used to display images, videos, and the like.
  • the display screen 330 includes a display panel.
  • The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, or a quantum dot light-emitting diode (QLED).
  • computer device 300 may include one or more display screens 330 .
  • the computer device 300 can realize the shooting function through an ISP, a camera 320, a video codec, a GPU, a display screen 330, an application processor, and the like.
  • The ISP is used to process the data fed back by the camera 320.
  • When the shutter is opened, light is transmitted through the lens to the camera photosensitive element, which converts the optical signal into an electrical signal and transmits the electrical signal to the ISP for processing, where it is converted into an image visible to the naked eye.
  • The ISP can also perform algorithm optimization on image noise, brightness, and skin tone, and can optimize parameters such as the exposure and color temperature of the shooting scene.
  • the ISP may be provided in the camera 320 .
  • Camera 320 is used to capture still images or video.
  • An optical image of the object is generated through the lens and projected onto the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV.
  • computer device 300 may include one or more cameras 320 .
  • The digital signal processor is used to process digital signals; in addition to digital image signals, it can also process other digital signals. For example, when the computer device 300 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the energy of the frequency point.
  • Video codecs are used to compress or decompress digital video.
  • Computer device 300 may support one or more video codecs. In this way, the computer device 300 can play or record videos in various encoding formats, such as moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, and MPEG-4.
  • the NPU is a neural-network (NN) computing processor.
  • Applications such as intelligent cognition of the computer device 300 can be implemented through the NPU.
  • Memory 350 may be used to store one or more computer programs including instructions.
  • the memory 350 can also be used to store the trained quality judgment model.
  • the processor 310 may execute the above-mentioned instructions stored in the memory 350, thereby causing the computer device 300 to execute the method for determining the video conference quality provided in some embodiments of the present application, as well as various applications and data processing.
  • the memory 350 may include a stored program area and a stored data area.
  • the stored program area may store an operating system; the stored program area may also store one or more applications and the like.
  • the storage data area may store data (such as conference video, etc.) created during the use of the computer device 300 and the like.
  • memory 350 may include high-speed random access memory, and may also include non-volatile memory, such as one or more disk storage components, flash memory components, universal flash storage (UFS), and the like.
  • The processor 310 may cause the computer device 300 to perform the method for determining video conference quality provided in the embodiments of the present application, as well as other applications and data processing, by executing the instructions stored in the memory 350 and/or the instructions stored in the memory provided in the processor 310.
  • the computer device 300 can implement audio functions through the audio module 340 and an application processor. For example, the sound playback of the video conference, the audio recording of the video conference, etc.
  • After reading the instructions stored in the memory, the processor 310 implements a processing unit 311 and a data collection unit 312.
  • the data collection unit 312 performs data sampling on the traffic of the video conference provided by other external devices, so as to obtain N groups of traffic characteristic data in step 201 of FIG. 2 .
  • The data collection unit 312 in FIG. 3 can also be implemented by a dedicated chip independent of the processor 310.
  • After identifying the traffic of the video conference, the external device provides the traffic of the video conference to the computer device 300 shown in FIG. 3; alternatively, the external device provides traffic in which various service packets are mixed to the computer device 300, and the computer device 300 identifies the traffic of the video conference from the mixed traffic.
  • The external device or the computer device 300 determines, according to a five-tuple, the traffic generated when the video conference service is performed between the terminal devices.
  • the external device is an independent hardware device deployed on the video conference traffic forwarding path, or may be a proxy plug-in installed on the terminal device where the video conference application client is located.
  • Independent hardware devices include but are not limited to traffic forwarding devices such as routers and gateways.
  • The external device captures the traffic of the video conference from multiple concurrent flows according to the source internet protocol (IP) address, the destination port number, and the transport layer protocol in the five-tuple information.
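  • A sketch of this capture step is shown below, assuming each packet record exposes its five-tuple fields as attributes; the attribute names and the UDP default are illustrative assumptions.

```python
def capture_conference_traffic(packets, src_ip, dst_port, protocol="UDP"):
    """Keep only packets whose five-tuple fields match the video conference flow."""
    return [p for p in packets
            if p.src_ip == src_ip
            and p.dst_port == dst_port
            and p.protocol == protocol]
```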
  • the N sampling periods are consecutive in time.
  • The data collection unit 312 collects data on the video conference traffic captured by the external device and obtains N groups of traffic characteristic data.
  • the processing unit 311 inputs the N groups of traffic characteristic data into the quality judgment model, and obtains the quality judgment result of the first video conference in the target period.
  • FIG. 4 is a schematic diagram of the sampling period.
  • FIG. 4 shows 9 sampling periods, namely sampling period 1 to sampling period 9, where the start time of sampling period 1 is 0s and the end time is 1s; the start time of sampling period 2 is 1s and the end time is 2s.
  • The data collection unit 312 samples the traffic with 1s as a time unit and obtains 9 groups of traffic characteristic data.
  • the time unit of the sampling period is set as required.
  • the length of one time unit is 1s.
  • FIG. 5 shows a schematic diagram of another sampling period.
  • The data collection unit 312 samples the traffic with 2s as a time unit and obtains 5 groups of traffic characteristic data.
  • sampling period 1 and sampling period 2 are consecutive in time.
  • two adjacent sampling periods may not be consecutive in time. In other words, two adjacent sampling periods may be separated by one or more time units.
  • For example, if the length of a time unit is 1s and two adjacent sampling periods are separated by 1 time unit, then the start time of sampling period 1 is 0s and the end time is 1s; the start time of sampling period 2 is 2s and the end time is 3s; the start time of sampling period 3 is 4s and the end time is 5s; and so on.
  • The interval between a first pair of adjacent sampling periods and the interval between a second pair of adjacent sampling periods in the plurality of sampling periods may be the same or different.
  • the nth group of traffic characteristic data includes characteristic data of uplink traffic and/or characteristic data of downlink traffic.
  • Table 1 shows possible characteristic data of uplink traffic and characteristic data of downlink traffic.
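  • Since the table itself is not reproduced here, the following reconstruction is based on the uplink and downlink feature lists given earlier; the serial numbering is assumed to follow the order in which the features were listed, and all features are per sampling period:

    Serial number  Uplink traffic feature                      Downlink traffic feature
    1              number of uplink data packets               number of downlink data packets
    2              total number of bytes uploaded              total number of bytes downloaded
    3              maximum uplink packet size                  maximum downlink packet size
    4              average uplink packet size                  average downlink packet size
    5              variance of uplink packet size              variance of downlink packet size
    6              maximum uplink packet interval              maximum downlink packet interval
    7              average uplink packet interval              average downlink packet interval
    8              variance of uplink packet interval          variance of downlink packet interval
    9              uplink packet loss rate                     downlink packet loss rate
    10             maximum consecutive uplink packet losses    maximum consecutive downlink packet losses
    11             uplink byte index                           downlink byte index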
  • The uplink traffic feature data included in each group of traffic feature data includes any one or more of the multiple uplink traffic features shown in Table 1.
  • The feature data of the downlink traffic included in each group of traffic feature data includes any one or more of the multiple downlink traffic features shown in Table 1.
  • The characteristic data of the uplink traffic and the characteristic data of the downlink traffic correspond to each other. For example, if the feature data of the uplink traffic includes the uplink traffic features with serial numbers 1, 3, 5, 6, and 8 in Table 1, then the feature data of the downlink traffic also includes the downlink traffic features with serial numbers 1, 3, 5, 6, and 8 in Table 1.
  • Each group of traffic feature data in the N groups includes the same types of uplink traffic features and downlink traffic features. In other words, if the first group of traffic feature data in the N groups includes the uplink traffic features with serial numbers 1, 3, 5, 6, and 8 in Table 1 and the downlink traffic features with serial numbers 1, 3, 5, 6, and 8 in Table 1, then every group from the second group to the Nth group of traffic feature data also includes the uplink traffic features with serial numbers 1, 3, 5, 6, and 8 and the downlink traffic features with serial numbers 1, 3, 5, 6, and 8 in Table 1.
  • the quality judgment result obtained in step 202 in FIG. 2 indicates the quality of the video conference in the target period.
  • The quality of the video conference may be characterized in absolute terms (for example, freeze/no freeze, or freeze time exceeding/not exceeding a preset threshold) and/or in relative terms (such as resolution dropping/not dropping).
  • The video conference quality judgment result includes two categories: freeze and no freeze. In other words, according to the N groups of traffic characteristic data and the quality judgment model, it is determined whether the video conference in the target period freezes.
  • The video conference quality judgment result includes two categories: resolution reduced and resolution not reduced. In other words, according to the N groups of traffic characteristic data and the quality judgment model, it is determined whether the resolution of the video conference in the target period is reduced.
  • the video conference quality judgment result includes two categories: the freeze time exceeds the preset threshold and the freeze time does not exceed the preset threshold.
  • The video conference quality judgment result includes two categories: the start-up delay exceeds a preset threshold and the start-up delay does not exceed the preset threshold.
  • In other words, according to the N groups of traffic characteristic data and the quality judgment model, it is determined whether the start-up delay of the video conference in the target period exceeds a preset threshold.
  • the video conference quality judgment result includes any one or a combination of the above.
  • For example, the video conference quality judgment result includes any one or more of: freeze and no freeze; resolution reduced and resolution not reduced; freeze time exceeding the preset threshold and freeze time not exceeding the preset threshold; and start-up delay exceeding the preset threshold and start-up delay not exceeding the preset threshold.
  • The quality of the video conference may also be expressed as the probability that freezing, resolution reduction, and the like occur. For example, freezing occurs with a probability of 90%, the resolution drops with a probability of 85%, and so on.
  • In the following, the technical solution of the present application is introduced by taking whether a freeze occurs as the video conference quality judgment result as an example.
  • The implementation for determining other quality judgment results is the same as or similar to that for determining whether the video conference screen freezes.
  • the target period is later than the Nth sampling period of the N sampling periods.
  • The processing unit 311 predicts, according to the N groups of traffic characteristic data corresponding to the N sampling periods before the target period, whether the video conference picture in the target period will freeze.
  • the video conference is temporally divided into multiple time windows, and each time window is divided into multiple sampling periods with the granularity of one time unit.
  • FIG. 6 is a schematic diagram of a time window.
  • FIG. 6 shows two time windows, namely time window 1 and time window 2.
  • Time window 1 takes 2s as a time unit and is divided into five sampling periods, namely sampling period 1_1 to sampling period 1_5;
  • time window 2 also takes 2s as a time unit and is divided into five sampling periods, which are sampling period 2_1 to sampling period 2_5.
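  • A small sketch of this division follows, matching the example above: each time window is split into five sampling periods of 2s each; all durations are illustrative.

```python
def split_into_periods(window_start, window_len=10.0, unit=2.0):
    """Return (start, end) pairs for the sampling periods inside one time window."""
    n = int(window_len / unit)
    return [(window_start + i * unit, window_start + (i + 1) * unit)
            for i in range(n)]

# Time window 1 starting at t=0: periods 1_1..1_5
print(split_into_periods(0.0))  # [(0.0, 2.0), (2.0, 4.0), (4.0, 6.0), (6.0, 8.0), (8.0, 10.0)]
```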
  • the target period is a time window.
  • The N groups of traffic characteristic data collected by the data collection unit 312 are the traffic characteristic data collected from sampling period 1_1 to sampling period 1_5 in time window 1.
  • The target period determined according to the N groups of traffic characteristic data is time window 2.
  • the processing unit 311 determines whether the video conference in the time window 2 will freeze according to the quality judgment model and the traffic characteristic data collected by the data collection unit 312 from the sampling period 1_1 to the sampling period 1_5 in the time window 1.
  • the target period is one or more sampling periods.
  • The N groups of traffic characteristic data collected by the data collection unit 312 are the traffic characteristic data collected from sampling period 1_1 to sampling period 1_5 in time window 1.
  • The target period determined according to the N groups of traffic characteristic data is sampling period 2_1.
  • The processing unit 311 determines, according to the traffic characteristic data collected by the data collection unit 312 from sampling period 1_1 to sampling period 1_5 in time window 1 and the quality judgment model, whether the video conference in sampling period 2_1 will freeze.
  • Next, the processing unit 311 determines, according to the traffic characteristic data collected by the data collection unit 312 from sampling period 1_2 to sampling period 2_1 and the quality judgment model, whether the video conference in sampling period 2_2 will freeze. Then, the processing unit 311 determines, according to the traffic characteristic data collected from sampling period 1_3 to sampling period 2_2 and the quality judgment model, whether the video conference in sampling period 2_3 will freeze, and so on.
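  • The sliding prediction just described can be sketched as below, reusing the judge_quality helper from the earlier inference sketch; treating each sampling period's features as one vector in an ordered list is an assumption made for illustration.

```python
def sliding_predictions(model, feature_groups, n=5):
    """feature_groups: ordered list of per-sampling-period feature vectors.
    Judges each target period from the n periods immediately before it."""
    results = {}
    for target in range(n, len(feature_groups)):
        window = feature_groups[target - n:target]  # e.g., 1_2..2_1 for target 2_2
        results[target] = judge_quality(model, window)
    return results
```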
  • the target period is the period after N groups of traffic characteristic data are collected.
  • The above embodiment predicts, according to the currently collected data, whether the video conference will freeze in the future. If it is determined that the video conference in the target period will freeze, a freeze notification message is sent to the output device.
  • the output device may be the display screen 330 or the audio module 340 .
  • The display screen 330 can remind the user through a pop-up window or other means that the video conference is about to freeze. In this way, users can prepare for the impending freeze; for example, a user can close applications that consume a large amount of bandwidth to ensure that there is enough bandwidth for the video conference.
  • Alternatively, determining whether the video conference screen freezes means determining whether the current video conference screen freezes.
  • The target period is the period during which the N groups of traffic characteristic data are collected. As described above, the N groups of traffic characteristic data are acquired in the N sampling periods respectively. In this case, the start moment of the target period is the start moment of the first sampling period of the N sampling periods, and the end moment of the target period is the end moment of the Nth sampling period.
  • The N groups of traffic characteristic data are collected in the five sampling periods included in time window 1.
  • The target period is time window 1.
  • The processing unit 311 determines, according to the traffic characteristic data collected by the data collection unit 312 from sampling period 1_1 to sampling period 1_5 in time window 1 and the quality judgment model, whether the video conference in time window 1 freezes.
  • A target period in which freezing occurs is marked.
  • Video conference service providers usually cannot obtain video conference images, so they cannot timely and effectively determine the service quality of video conferences or take proactive measures to reduce or avoid quality degradation such as freezing.
  • With the above technical solution, the video conference service provider can determine from the traffic characteristic data whether the video conference screen freezes, so as to determine according to the freeze situation whether the network or the video conference application (APP) needs to be optimized.
  • In the following, predicting according to the currently collected data whether the video conference will freeze in the future (that is, the target period is after the N sampling periods) is referred to as the first application scenario, and determining according to the currently collected data whether the current video conference freezes (that is, the target period overlaps with the N sampling periods) is referred to as the second application scenario.
  • the quality judgment model used to determine the quality judgment results is trained from the video conferencing training dataset.
  • the video conferencing training data includes multiple training datasets and multiple label information.
  • Each training data set in the plurality of training data sets includes M groups of traffic feature data, where the M groups of traffic feature data are traffic feature data obtained in M sampling periods respectively.
  • The mth group of traffic characteristic data in the M groups is acquired in the mth sampling period of the M sampling periods, where M is a positive integer greater than or equal to 1, and m is a positive integer greater than or equal to 1 and less than or equal to M.
  • The video conference service provider or network operator of the video conference used for collecting the training data may be the same as the video conference service provider or network operator providing video conference 1 (that is, the video conference whose quality in the target period needs to be determined).
  • For example, both the video conference used for collecting the training data and video conference 1 are provided by China Mobile.
  • Alternatively, the video conference service provider or network operator of the video conference used for collecting the training data is different from the video conference service provider or network operator providing video conference 1.
  • For example, the video conference used for collecting the training data is provided by China Unicom, while video conference 1 is provided by China Mobile.
  • M is equal to N.
  • one time window includes M sampling periods, and the M groups of traffic feature data included in each training data set are acquired in M sampling periods within one time window.
  • The multiple pieces of label information are in one-to-one correspondence with the multiple training data sets, and each piece of label information is used to indicate whether the video conference screen corresponding to its training data set freezes.
  • the video conference screen corresponding to the training data set is the video conference screen in the reference time window including the M sampling periods.
  • the label information is automatically generated by computer equipment according to image data, or manually calibrated by observing images. The process of determining the label information according to the image data by the computer device will be described in detail in the following embodiments of the present application.
  • The relationship between the reference time window corresponding to each training data set and the M sampling periods in which the M groups of traffic characteristic data included in that training data set are obtained is the same as the relationship between the target period and the N sampling periods.
  • For example, if training data set 1 includes five groups of traffic characteristic data collected during the five sampling periods included in time window 1, then the reference time window corresponding to training data set 1 is time window 2.
  • Alternatively, the reference time window corresponding to each training data set consists of the M sampling periods in which the M groups of traffic characteristic data included in that training data set are obtained.
  • For example, if training data set 1 includes five groups of traffic characteristic data collected during the five sampling periods included in time window 1, then the reference time window corresponding to training data set 1 is time window 1.
  • The reference time window can be divided into multiple sampling periods. If the video conference screen freezes in any one or more of the multiple sampling periods, the video conference screen of the reference time window is considered to have frozen; if the video conference screen does not freeze in any of the multiple sampling periods, the video conference screen of the reference time window is considered not to have frozen.
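  • This labeling rule reduces to a simple aggregation over the per-period freeze decisions, sketched below.

```python
def window_label(period_frozen_flags):
    """period_frozen_flags: one boolean per sampling period of the reference
    time window. The window is labeled frozen if any period is frozen."""
    return 1 if any(period_frozen_flags) else 0
```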
  • the method of determining whether the video conference in the sampling period is stuck is the same for the first application scenario and the second application scenario.
  • The following uses the second application scenario as an example to explain how to determine whether the video conference screen in a sampling period freezes.
  • the first training data set is any one of the plurality of training data sets
  • the first label information is label information corresponding to the first training data set.
  • the first label information is used to indicate whether the video conference screen corresponding to the first training data set is stuck.
  • The M groups of traffic feature data included in the first training data set are acquired in M sampling periods. Since this is the second application scenario, the reference time window consists of the M sampling periods. In this case, if the video conference screen freezes in any one of the M sampling periods, the first label information indicates that the video conference screen of the reference time window corresponding to the first training data set freezes. If the video conference picture does not freeze in any of the M sampling periods, the first label information indicates that the video conference picture of the reference time window corresponding to the first training data set does not freeze.
  • Assume that training data set 1 includes five groups of traffic characteristic data collected during the five sampling periods included in time window 1, and training data set 2 includes five groups of traffic characteristic data collected during the five sampling periods included in time window 2.
  • Label information 1 is label information corresponding to training data set 1
  • label information 2 is label information corresponding to training data set 2 .
  • Assume that label information 1 indicates that the video conference screen corresponding to training data set 1 (i.e., the video conference screen of time window 1) freezes, and label information 2 indicates that the video conference screen corresponding to training data set 2 (i.e., the video conference screen of time window 2) does not freeze.
  • Whether the video conferencing image freezes in a sampling period is determined according to the number of consecutive images with the same image information in the sampling period.
  • A computer vision method is used to extract each frame of image during video conference playback, and the timestamp of each frame can be obtained at the same time.
  • the image information of each frame of image is determined according to the acquired image.
  • If the number of consecutive images with the same image information in a sampling period is greater than or equal to a preset number threshold, it is determined that the video conference screen in that sampling period freezes; if the number of consecutive images with the same image information in a sampling period is less than the preset number threshold, it is determined that the video conference screen in that sampling period does not freeze.
  • The preset number threshold is determined according to the following formula: Th = ⌈Std/t⌉ (Equation 1), where Th represents the preset number threshold, Std represents a predefined video freeze standard, and t represents the duration of a single frame of image.
  • The rounding in Equation 1 is rounding up. This is only an example; in other embodiments, Std/t may also be rounded in other ways, such as rounding down or rounding to the nearest integer.
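  • Combining Equation 1 with the consecutive-image rule above gives the per-sampling-period freeze decision sketched below; the Std and t values are illustrative, and the frame comparison function is assumed to be something like same_image_information from the earlier sketch.

```python
import math

def freeze_threshold(std_ms=200, frame_ms=40):
    """Equation 1: Th = ceil(Std / t); here, e.g., ceil(200/40) = 5 frames."""
    return math.ceil(std_ms / frame_ms)

def period_freezes(frames, same, th):
    """frames: the frames of one sampling period, in playback order;
    same: pairwise comparison, e.g., same_image_information."""
    run = 1  # length of the current run of identical frames
    for prev, cur in zip(frames, frames[1:]):
        run = run + 1 if same(prev, cur) else 1
        if run >= th:
            return True
    return False
```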
  • Two frames of images having the same image information may mean that the quality parameter values of parts of the pictures of the two frames are the same.
  • Alternatively, two frames of images having the same image information may mean that the quality parameter values of all pictures of the two frames are the same.
  • the quality parameter value of the image is determined according to the Laplacian operator, the Brenner gradient function or the Tenengrad gradient function.
  • FIG. 7 is a schematic diagram of a video conference screen.
  • The video conference screen 700 shown in FIG. 7 includes four parts: the first part 701 is the picture captured by the camera of user 1, the second part 702 is the picture captured by the camera of user 2, the third part 703 is the picture captured by the camera of user 3, and the fourth part 704 is the desktop shared by user 1.
  • The video conference screen of user 1 is divided into four parts, and the quality parameter value of each part is determined. If the quality parameter values of a part in two adjacent frames are the same, the image information of the two frames is considered to be the same. For example, assuming that the quality parameter value of the first part of the third frame of the video conference is the same as the quality parameter value of the first part of the fourth frame, the image information of the third frame is considered the same as that of the fourth frame. In other words, when the video conference screen includes multiple parts, as long as the picture of at least one of the parts freezes, the video conference is considered frozen.
  • Alternatively, when the video conference picture includes multiple parts, the video conference picture is not divided; instead, the quality parameter value is determined for the video conference picture as a whole. In this case, even if one of the multiple parts included in the video conference picture freezes, the video conference may be considered not to have frozen.
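  • A sketch of the divided-screen variant follows: each frame is split into four regions in the spirit of FIG. 7, and two frames count as having the same image information if any one region's quality parameter value is unchanged; the even quadrant split is an assumption about the layout.

```python
def split_parts(frame):
    """Split a frame into four quadrants (assumed layout of FIG. 7)."""
    h, w = frame.shape[:2]
    return [frame[:h // 2, :w // 2], frame[:h // 2, w // 2:],
            frame[h // 2:, :w // 2], frame[h // 2:, w // 2:]]

def any_part_unchanged(frame_a, frame_b):
    # Reuses same_image_information from the earlier Laplacian sketch
    return any(same_image_information(pa, pb)
               for pa, pb in zip(split_parts(frame_a), split_parts(frame_b)))
```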
  • the meeting screen of a video conference includes visual elements that change over time.
  • a participant in a video conference will turn on a local camera (for example, a camera built into the terminal device or a camera external to the terminal device), and the local camera generally shoots the participant's head or upper body.
  • the head or upper body of the attendee generally does not remain stationary.
  • the picture captured by the local camera is a visual element that changes over time.
  • a participant in a video conference may play a video while sharing the desktop, and the picture of this video is a visual element that changes over time.
  • the video conference screen may display the duration of the conference recorded through a scrolling timeline or a timer.
  • a time-varying scrolling timeline or timer is a time-varying visual element.
  • A certain place on the video conference screen (e.g., the lower right corner or the upper left corner) may display a constantly changing image (e.g., a graphics interchange format (GIF) image). That GIF image is then a visual element that changes over time.
  • FIG. 8 is a method for training a model according to an embodiment of the present application.
  • The method shown in FIG. 8 is implemented by a computer device or by a component (e.g., a chip) in the computer device.
  • The first training data set in the multiple training data sets includes M groups of feature data, and the M groups of feature data are respectively obtained by collecting data on the traffic of the second video conference in M sampling periods. The mth group of feature data in the M groups includes the feature data of the traffic of the second video conference in the mth sampling period of the M sampling periods. The first label information in the multiple pieces of label information is used to indicate the quality of the video conference picture corresponding to the first training data set, the first training data set is any one of the multiple training data sets, M is a positive integer greater than or equal to 1, and m is a positive integer greater than or equal to 1 and less than or equal to M.
  • CNN: convolutional neural network
  • a CNN is a deep neural network with a convolutional structure and is a deep learning architecture.
  • a deep learning architecture refers to performing multiple levels of learning at different levels of abstraction through machine learning algorithms.
  • as a deep learning architecture, a CNN is a feed-forward artificial neural network in which each neuron responds to overlapping regions in the images fed into it.
  • a convolutional neural network (CNN) 900 includes an input layer 910 , a convolutional layer 920 , a pooling layer 930 and a neural network layer 940 .
  • each input to the input layer 910 is all of the traffic feature data included in one training data set.
  • the convolutional layers are divided into six groups: each of the first group and the second group of convolutional layers includes two convolutional layers 921, each of the third group and the fourth group of convolutional layers includes three convolutional layers 922, and each of the fifth group and the sixth group of convolutional layers includes four convolutional layers 923.
  • the convolutional layer 921 includes a length-2 convolution operator.
  • Convolutional layer 922 includes a length-3 convolution operator.
  • the convolutional layer 923 includes a length-4 convolution operator.
  • the convolution operator is also called a kernel; its role is equivalent to a filter that extracts specific information from the input traffic feature data.
  • the convolution operator is essentially a weight matrix, which is usually predefined. During the convolution operation on the input traffic feature data, the weight matrix processes the input one traffic feature after another along the horizontal direction, thereby extracting specific features from the traffic feature data.
  • the size of the weight matrix should be related to the size of the traffic feature data. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input traffic feature data; during the convolution operation, the weight matrix extends to the entire depth of the input traffic feature data. Therefore, convolution with a single weight matrix produces a convolutional output with a single depth dimension. In most cases, however, a single weight matrix is not used; instead, multiple weight matrices of the same dimension are applied, and the output of each weight matrix is stacked to form the depth dimension of the convolutional data. Different weight matrices are used to extract different features from the traffic feature data. The feature maps extracted by the multiple weight matrices of the same dimension also have the same dimension, and the multiple extracted feature maps of the same dimension are then combined to form the output of the convolution operation.
  • in practical applications, the weight values in these weight matrices need to be obtained through a large amount of training; each weight matrix formed by the trained weight values extracts information from the input traffic feature data, thereby helping the convolutional neural network 900 to make correct predictions.
  • when the convolutional neural network 900 has multiple convolutional layers, the initial convolutional layers often extract more general features, which may also be called low-level features; as the depth of the convolutional neural network 900 increases, the features extracted by the later convolutional layers become more and more complex and thus more suitable for the problem to be solved.
  • each of the first set of convolutional layers and the second set of convolutional layers includes two convolutional layers 921 .
  • each of the first set of convolutional layers and the second set of convolutional layers uses two weight matrices to extract features from the input traffic feature data.
  • Each of the third set of convolutional layers and the fourth set of convolutional layers includes three convolutional layers 922 .
  • each of the third set of convolutional layers and the fourth set of convolutional layers uses three weight matrices to extract features of the input traffic feature data.
  • Each of the fifth set of convolutional layers and the sixth set of convolutional layers includes four convolutional layers 923 . In other words, each of the fifth set of convolutional layers and the sixth set of convolutional layers uses four weight matrices to extract features from the input traffic feature data.
  • the output of the first set of convolutional layers, the output of the second set of convolutional layers, the output of the third set of convolutional layers, the output of the fourth set of convolutional layers, the output of the fifth set of convolutional layers, and the output of the sixth set of convolutional layers are combined as the input to the pooling layer 930.
  • since the number of training parameters often needs to be reduced, a pooling layer often needs to be introduced periodically after the convolutional layers.
  • the pooling layer makes the feature maps output by the convolutional layers smaller, which simplifies the computational complexity of the network, reduces the parameters and computation of the next layer, and prevents overfitting.
  • after processing by the convolutional layers 920 and the pooling layer 930, the convolutional neural network 900 is not yet able to output the required output information, because, as mentioned above, the convolutional layers 920 and the pooling layer 930 only extract features and reduce the parameters brought by the input traffic feature data. To generate the final output information (the required class information or other relevant information), the convolutional neural network 900 needs to use the neural network layer 940 to generate one output or a set of outputs whose number equals the number of required classes. Therefore, the neural network layer 940 may include a fully connected layer (FC) 941 and a normalized exponential function (softmax function) layer 942. The fully connected layer 941 serves to map the learned feature representations to the label space of the samples.
  • FC: fully connected layer
  • softmax function: normalized exponential function
  • the classifier can be implemented by the softmax function.
  • the convolutional neural network 900 shown in FIG. 9 is only used as an example of a convolutional neural network.
  • the convolutional neural network may also exist in the form of other network models.
  • the number of convolutional layers included in each group of convolutional layers can be different from that shown in FIG. 9.
  • the quality judgment model can also be trained with a recurrent neural network.
  • FIG. 10 is a schematic diagram of a recurrent neural network.
  • the recurrent neural network 1000 shown in FIG. 10 includes an FC 1001, a softmax function layer 1002, a first part of long-short term memory (LSTM) neurons (cells) and a second part of LSTM cells; each of the first part of LSTM cells and the second part of LSTM cells includes LSTM cells 1003-1012.
  • the recurrent neural network shown in FIG. 10 is based on the traffic characteristic data of the time window and sampling period shown in FIG. 6 .
  • Data1_1 represents a set of traffic characteristic data collected in sampling period 1_1
  • Data1_2 represents a set of traffic feature data collected in sampling period 1_2, and so on.
  • the five groups of traffic feature data collected in a time window are respectively input into five LSTM cells.
  • a set of traffic characteristic data Data1_1 collected in sampling period 1_1 is input to LSTM cell 1003
  • a set of traffic characteristic data Data1_2 collected in sampling period 1_2 is input to LSTM cell 1005.
  • the arrows shown in FIG. 10 indicate the flow of data.
  • the output data of LSTM cell 1003 is sent to LSTM cell 1004 and LSTM cell 1005, and the output data of LSTM cell 1004 is sent to LSTM cell 1006.
  • the LSTM cell 1012 of the first part of the LSTM cell outputs the output result of the first part of the LSTM cell.
  • the way the second part of LSTM cells processes data is similar to the way the first part of LSTM cells processes data; the difference is that LSTM cell 1004 in the second part of LSTM cells outputs the output result of the second part of LSTM cells.
  • the output results of the first part of LSTM cells and the output results of the second part of LSTM cells are spliced and input to the FC 1001, and then to the softmax function layer 1002.
  • the functions of the FC 1001 and the softmax function layer 1002 are the same as those in the CNN shown in FIG. 9, and are not repeated here for brevity.
  • FIG. 11 is a hardware structural diagram of a chip provided by an embodiment of the present invention.
  • the neural network-based algorithms shown in FIGS. 9 and 10 may be implemented in a neural network processing unit (NPU) 1100 shown in FIG. 11 .
  • NPU neural network processing unit
  • the NPU 1100 can be mounted on the main CPU (Host CPU) as a co-processor, and tasks are assigned by the Host CPU.
  • the core part of the NPU is the arithmetic circuit 1103; the controller 1104 controls the arithmetic circuit 1103 to extract matrix data from the memory and perform multiplication operations.
  • the arithmetic circuit 1103 includes multiple processing units (process engines, PEs). In some implementations, the arithmetic circuit 1103 is a two-dimensional systolic array. The arithmetic circuit 1103 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1103 is a general-purpose matrix processor.
  • for example, suppose there are an input matrix A, a weight matrix B and an output matrix C; the arithmetic circuit fetches the data corresponding to matrix B from the weight memory 1102 and buffers it on each PE in the arithmetic circuit.
  • the operation circuit fetches the data of matrix A and matrix B from the input memory 1101 to perform matrix operation, and stores the partial result or final result of the matrix in the accumulator 1108 .
  • Unified memory 1106 is used to store input data and output data.
  • the weight data is directly transferred to the weight memory 1102 through a direct memory access controller (DMAC) 1105 .
  • DMAC direct memory access controller
  • Input data is also moved to unified memory 1106 via the DMAC.
  • the bus interface unit 1110 (bus interface unit, BIU) is used for the instruction fetch memory 1109 to acquire instructions from the external memory, and also for the storage unit access controller 1105 to acquire the original data of the input matrix A or the weight matrix B from the external memory.
  • the DMAC is mainly used to transfer the input data in the external memory to the unified memory 1106 , the weight data to the weight memory 1102 , or the input data to the input memory 1101 .
  • the vector calculation unit 1107 has multiple operation processing units, and if necessary, further processes the output of the operation circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison and so on. It is mainly used for non-convolutional/FC layer network computation in neural networks, such as Pooling, Batch Normalization, Local Response Normalization, etc.
  • the vector computation unit 1107 can store the processed output vectors to the unified buffer 1106 .
  • the vector calculation unit 1107 may apply a nonlinear function to the output of the arithmetic circuit 1103, such as a vector of accumulated values, to generate activation values.
  • the vector computation unit 1107 generates normalized values, merged values, or both.
  • the vector of processed outputs can be used as activation input to the arithmetic circuit 1103, eg, for use in subsequent layers in a neural network.
  • An instruction fetch buffer 1109 connected to the controller 1104 is used to store the instructions used by the controller 1104 .
  • the unified memory 1106, the input memory 1101, the weight memory 1102 and the instruction fetch memory 1109 are all on-chip memories. External memory is private to the NPU hardware architecture.
  • An embodiment of the present application further provides a computer device, where the computer device includes a chip as shown in FIG. 11 and a memory.
  • FIG. 12 is a schematic structural block diagram of a computer device provided according to an embodiment of the present application.
  • the computer device shown in FIG. 12 may be a computer device for performing the method shown in FIG. 2 .
  • the computer device 1200 shown in FIG. 12 includes a data acquisition unit 1201 and a processing unit 1202 .
  • the data collection unit 1201 is configured to collect data on the traffic of the first video conference in N sampling periods to obtain N groups of characteristic data.
  • the processing unit 1202 is configured to input the N groups of characteristic data into a quality judgment model to obtain a quality judgment result of the first video conference in the target period.
  • FIG. 13 is a schematic structural block diagram of another computer device provided according to an embodiment of the present application.
  • the computer device shown in FIG. 13 may be a computer device for performing the method shown in FIG. 8.
  • the computer device 1300 shown in FIG. 13 includes a data acquisition unit 1301 and a processing unit 1302 .
  • the data acquisition unit 1301 is configured to acquire multiple training data sets and multiple label information.
  • the processing unit 1302 is configured to train a quality judgment model according to the multiple training data sets and the multiple label information.
  • FIG. 14 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • the data collection device 1410 in the system architecture 1400 is used to collect video conference training data and store it in the database 1420.
  • for the working principle of the data collection device 1410, refer to the data collection unit 312 in FIG. 3.
  • the training device 1430 generates a quality judgment model based on the video conference training data set maintained in the database 1420 .
  • the execution device 1440 uses the quality judgment model to process the collected traffic feature data to obtain a final judgment result.
  • the execution device 1440 may be a computer device as shown in FIG. 3 or FIG. 12
  • the training device 1430 may be a computer device as shown in FIG. 13 or a computer device including a chip as shown in FIG. 11 .
  • for the specific functions of the execution device 1440 and the training device 1430, reference may be made to the foregoing embodiments, which are not repeated here for brevity.
  • the present application further provides a computer program product, the computer program product including computer program code which, when run on a computer, causes the computer to execute the method of any one of the above embodiments.
  • the present application further provides a computer-readable medium, where program code is stored in the computer-readable medium; when the program code is run on a computer, the computer is caused to execute the method of any one of the foregoing embodiments.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present application provides a method and a related apparatus for determining the quality of a video conference. The method includes inputting N groups of traffic feature data acquired in N sampling periods into a quality judgment model to obtain a video conference quality judgment result for a target period. The above technical solution can judge the quality of a video conference from traffic feature data in the network. In this way, the quality of the video conference picture during the video conference can be determined even without the conference picture, which makes it convenient for the video conference provider to optimize the video conference service.

Description

Method, related apparatus and system for determining video conference quality
This application claims priority to the Chinese patent application No. 202110654936.6, entitled "Method, related apparatus and system for determining video conference quality", filed with the China Patent Office on June 11, 2021, which is incorporated herein by reference in its entirety.
This application claims priority to the Chinese patent application No. 202110355932.8, entitled "Method, device and storage medium for predicting video conference freezing", filed with the China Patent Office on April 1, 2021, which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the technical field of video conferences, further relates to the application of artificial intelligence (AI) technology in the field of video conferences, and in particular relates to a method, a related apparatus and a system for determining the quality of a video conference.
Background
Video conferencing is one of the hottest applications on the Internet today. However, video conferences often suffer from degraded quality, typified by freezing. These phenomena result in a poor user experience.
Taking freezing as an example, it is usually only after the video conference picture has already frozen that a video conference user manually checks the connection status of the network the user is on. This state of affairs places a burden on video conference users and degrades the user experience.
In addition, video conference users usually do not record the specific time at which freezing occurred. Therefore, even if a video conference user complains to the video conference service provider that the video conference froze, the video conference service provider cannot determine the specific time at which the freezing occurred, which makes it difficult to determine the cause of the freezing.
Summary
The present application provides a method, a related apparatus and a system for determining the quality of a video conference, which can determine the quality of a video conference from traffic data in the network.
In a first aspect, an embodiment of the present application provides a method for determining the quality of a video conference, including: collecting data on the traffic of a first video conference in N sampling periods to obtain N groups of traffic feature data, where the nth group of traffic feature data in the N groups of traffic feature data is collected in the nth sampling period of the N sampling periods, N is a positive integer greater than or equal to 1, and n is a positive integer greater than or equal to 1 and less than or equal to N; and inputting the N groups of traffic feature data into a quality judgment model to obtain a quality judgment result of the first video conference in a target period, the quality judgment result indicating whether the video conference quality in the target period is good or poor, where the quality judgment model is trained on video conference training data, and the target period is not earlier than the N sampling periods.
The above technical solution enables a computer device to judge the quality of a video conference from traffic feature data in the network. In this way, the quality during the video conference can be determined even without the conference picture, which makes it convenient for the video conference provider to optimize, according to the video conference quality, the video conference application or the network devices (for example, servers) used to provide the video conference service.
With reference to the first aspect, in a possible implementation of the first aspect, the nth group of traffic feature data includes feature data of uplink traffic and/or feature data of downlink traffic.
With reference to the first aspect, in a possible implementation of the first aspect, the feature data of the uplink traffic included in the nth group of traffic feature data includes any one or more of the following: the number of uplink packets in the nth sampling period; the total number of bytes uploaded uplink in the nth sampling period; the maximum uplink packet size in the nth sampling period; the average uplink packet size in the nth sampling period; the variance of the uplink packet size in the nth sampling period; the maximum uplink packet interval in the nth sampling period; the average uplink packet interval in the nth sampling period; the variance of the uplink packet interval in the nth sampling period; the uplink packet loss rate in the nth sampling period; the maximum number of consecutive uplink packet losses in the nth sampling period; the uplink byte indicator of the nth sampling period.
With reference to the first aspect, in a possible implementation of the first aspect, the feature data of the downlink traffic included in the nth group of traffic feature data includes any one or more of the following: the number of downlink packets in the nth sampling period; the total number of bytes downloaded downlink in the nth sampling period; the maximum downlink packet size in the nth sampling period; the average downlink packet size in the nth sampling period; the variance of the downlink packet size in the nth sampling period; the maximum downlink packet interval in the nth sampling period; the average downlink packet interval in the nth sampling period; the variance of the downlink packet interval in the nth sampling period; the downlink packet loss rate in the nth sampling period; the maximum number of consecutive downlink packet losses in the nth sampling period; the downlink byte indicator of the nth sampling period.
With reference to the first aspect, in a possible implementation of the first aspect, the N sampling periods are continuous in time.
With reference to the first aspect, in a possible implementation of the first aspect, the sum of the time lengths of the N sampling periods is the same as the time length of the target period.
With reference to the first aspect, in a possible implementation of the first aspect, the Nth sampling period of the N sampling periods precedes the target period and is continuous in time with the target period; or, the starting moment of the first sampling period of the N sampling periods is the starting moment of the target period, and the end moment of the Nth sampling period of the N sampling periods is the end moment of the target period.
If the Nth sampling period of the N sampling periods precedes the target period, the computer device can predict the future video conference quality in advance. If a problem will occur in the video conference picture in the future (for example, freezing or reduced resolution), the computer device can notify the user in advance. The user can know beforehand that the quality of the upcoming video conference picture will degrade, and can choose a countermeasure as needed, for example, closing other bandwidth-consuming applications or switching the way of accessing the network.
With reference to the first aspect, in a possible implementation of the first aspect, the video conference training data includes multiple training data sets and multiple pieces of label information, where the first training data set in the multiple training data sets includes M groups of traffic feature data, the M groups of traffic feature data are respectively obtained by collecting data on the traffic of a second video conference in M sampling periods, the first label information in the multiple pieces of label information is used to indicate whether the video conference picture corresponding to the first training data set freezes, the first training data set is any one of the multiple training data sets, and M is a positive integer greater than or equal to 1; the quality judgment model is trained on the multiple training data sets and the multiple pieces of label information.
With reference to the first aspect, in a possible implementation of the first aspect, if the number of consecutive frames with identical image information among the multiple frames of the second video conference in any one sampling period of the M sampling periods is greater than or equal to a preset number threshold, the first label information is used to indicate that the video conference picture corresponding to the first training data set freezes; if the number of consecutive frames with identical image information among the multiple frames of the video conference in any one sampling period of the M sampling periods is less than the preset number threshold, the first label information is used to indicate that the video conference picture corresponding to the first training data set does not freeze.
With reference to the first aspect, in a possible implementation of the first aspect, the preset number threshold is determined according to the following formula:
Th = ⌈Std / t⌉
where Th denotes the preset number threshold, Std denotes a predefined video freezing standard, and t denotes the duration of a single frame.
With reference to the first aspect, in a possible implementation of the first aspect, two frames having identical image information includes the quality parameter values of part or all of the pictures of the two frames being identical, the quality parameter value being determined according to the Laplacian operator, the Brenner gradient function or the Tenengrad gradient function.
With reference to the first aspect, in a possible implementation of the first aspect, the video conference picture of each of the multiple sampling periods includes elements that change over time.
The time-varying elements may include the picture captured by a camera, and may also include a scrolling timeline, a timer or a GIF image. In this way, if the user has not enabled the camera and the video conference picture stays on one fixed picture (for example, staying on one page of a document for a long time), whether the video conference picture freezes or its quality degrades can still be determined from time-varying elements such as the scrolling timeline, the timer or the GIF image.
In a second aspect, an embodiment of the present application provides a method for training a model, the method including: acquiring multiple training data sets and multiple pieces of label information, where the first training data set in the multiple training data sets includes M groups of feature data, the M groups of feature data are respectively obtained by collecting data on the traffic of a second video conference in M sampling periods, the mth group of feature data in the M groups of feature data includes the feature data of the traffic of the second video conference in the mth sampling period of the M sampling periods, the first label information in the multiple pieces of label information is used to indicate the quality of the video conference picture corresponding to the first training data set, the first training data set is any one of the multiple training data sets, M is a positive integer greater than or equal to 1, and m is a positive integer greater than or equal to 1 and less than or equal to M; and training a quality judgment model according to the multiple training data sets and the multiple pieces of label information.
The above technical solution provides a method for determining a quality judgment model; the quality judgment model determined by the above method helps determine whether the video conference quality in a target period is good or poor.
With reference to the second aspect, in a possible implementation of the second aspect, if the number of consecutive frames with identical image information among the multiple frames of the video conference in any one sampling period of the M sampling periods is greater than or equal to a preset number threshold, the first label information is used to indicate that the video conference picture corresponding to the first training data set freezes; if the number of consecutive frames with identical image information among the multiple frames of the video conference in any one sampling period of the M sampling periods is less than the preset number threshold, the first label information is used to indicate that the video conference picture corresponding to the first training data set does not freeze.
With reference to the second aspect, in a possible implementation of the second aspect, the preset number threshold is determined according to the following formula:
Th = ⌈Std / t⌉
where Th denotes the preset number threshold, Std denotes a predefined video freezing standard, and t denotes the duration of a single frame.
With reference to the second aspect, in a possible implementation of the second aspect, two frames having identical image information includes the quality parameter values of part or all of the pictures of the two frames being identical, the quality parameter value being determined according to the Laplacian operator, the Brenner gradient function or the Tenengrad gradient function.
With reference to the second aspect, in a possible implementation of the second aspect, the video conference picture of each of the multiple sampling periods includes visual elements that change over time.
The time-varying visual elements may include the picture captured by a camera, and may also include a scrolling timeline, a timer or a GIF image. In this way, if the user has not enabled the camera and the video conference picture stays on one fixed picture (for example, staying on one page of a document for a long time), whether the video conference picture freezes or its quality degrades can still be determined from time-varying elements such as the scrolling timeline, the timer or the GIF image.
In a third aspect, an embodiment of the present application provides a computer device, the computer device including units for implementing the first aspect or any possible implementation of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer device, the computer device including units for implementing the second aspect or any possible implementation of the second aspect.
In a fifth aspect, an embodiment of the present application provides a computer device, the computer device including a processor, the processor being configured to be coupled to a memory and to read and execute the instructions and/or program code in the memory, so as to perform the first aspect or any possible implementation of the first aspect.
In a sixth aspect, an embodiment of the present application provides a computer device, the computer device including a processor, the processor being configured to be coupled to a memory and to read and execute the instructions and/or program code in the memory, so as to perform the second aspect or any possible implementation of the second aspect.
In a seventh aspect, an embodiment of the present application provides a chip system, the chip system including a logic circuit, the logic circuit being configured to be coupled to an input/output interface and to transfer data through the input/output interface, so as to perform the first aspect or any possible implementation of the first aspect.
In an eighth aspect, an embodiment of the present application provides a chip system, the chip system including a logic circuit, the logic circuit being configured to be coupled to an input/output interface and to transfer data through the input/output interface, so as to perform the second aspect or any possible implementation of the second aspect.
In a ninth aspect, an embodiment of the present application provides a computer-readable storage medium, the computer-readable storage medium storing program code which, when run on a computer, causes the computer to perform the first aspect or any possible implementation of the first aspect.
In a tenth aspect, an embodiment of the present application provides a computer-readable storage medium, the computer-readable storage medium storing program code which, when run on a computer, causes the computer to perform the second aspect or any possible implementation of the second aspect.
In an eleventh aspect, an embodiment of the present application provides a computer program product, the computer program product including computer program code which, when run on a computer, causes the computer to perform the first aspect or any possible implementation of the first aspect.
In a twelfth aspect, an embodiment of the present application provides a computer program product, the computer program product including computer program code which, when run on a computer, causes the computer to perform the second aspect or any possible implementation of the second aspect.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present application.
FIG. 2 is a method for determining video conference quality provided according to an embodiment of the present application.
FIG. 3 is a schematic structural block diagram of a computer device provided according to an embodiment of the present application.
FIG. 4 is a schematic diagram of sampling periods.
FIG. 5 shows a schematic diagram of other sampling periods.
FIG. 6 is a schematic diagram of time windows.
FIG. 7 is a schematic diagram of a video conference picture.
FIG. 8 is a method for training a model provided according to an embodiment of the present application.
FIG. 9 is a schematic diagram of a convolutional neural network.
FIG. 10 is a schematic diagram of a recurrent neural network.
FIG. 11 is a hardware structural diagram of a chip provided by an embodiment of the present invention.
FIG. 12 is a schematic structural block diagram of a computer device provided according to an embodiment of the present application.
FIG. 13 is a schematic structural block diagram of another computer device provided according to an embodiment of the present application.
FIG. 14 is a schematic diagram of a system architecture provided by an embodiment of the present application.
Detailed Description
The technical solutions in the present application will be described below with reference to the accompanying drawings.
The present application will present various aspects, embodiments or features around a system that may include multiple devices, components, modules, and the like. It should be understood and appreciated that each system may include additional devices, components, modules, etc., and/or may not include all of the devices, components, modules, etc. discussed in connection with the accompanying drawings. In addition, combinations of these solutions may also be used.
In addition, in the embodiments of the present application, words such as "exemplary" and "for example" are used to indicate an example, illustration or explanation. Any embodiment or design described as an "example" in the present application should not be construed as being preferable or advantageous over other embodiments or designs. Rather, the use of the word "example" is intended to present a concept in a concrete manner.
In the embodiments of the present application, "corresponding (relevant)" and "corresponding" may sometimes be used interchangeably; it should be noted that, when the distinction between them is not emphasized, their intended meanings are consistent.
In the embodiments of the present application, a subscript such as W1 may sometimes be mistyped in a non-subscript form such as W1; when the distinction is not emphasized, the intended meanings are consistent.
The network architectures and service scenarios described in the embodiments of the present application are intended to explain the technical solutions of the embodiments of the present application more clearly, and do not constitute a limitation on the technical solutions provided in the embodiments of the present application. A person of ordinary skill in the art knows that, with the evolution of network architectures and the emergence of new service scenarios, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.
In the present application, "at least one" means one or more, and "multiple" means two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A alone, both A and B, and B alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" or a similar expression refers to any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b and c may each be single or multiple.
FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present application. As shown in FIG. 1, a video conference application client is installed on terminal device 1, terminal device 2 and terminal device 3. User 1 holds a video conference with user 2 and user 3 through terminal device 1. A computer device acquires traffic feature data of the video conference traffic, where the traffic feature data is obtained by sampling, in sampling periods, the traffic generated by a video conference conducted between the terminal devices, and obtains a quality judgment result of the above video conference in a target period by analyzing the traffic feature data.
FIG. 2 is a schematic flowchart of a method for determining video conference quality provided according to an embodiment of the present application. Optionally, the method shown in FIG. 2 is executed by a computer device or by a component (for example, a chip) in the computer device.
201: Collect data on the traffic of a first video conference in N sampling periods to obtain N groups of traffic feature data, where the nth group of traffic feature data in the N groups of traffic feature data is collected in the nth sampling period of the N sampling periods, N is a positive integer greater than or equal to 1, and n is a positive integer greater than or equal to 1 and less than or equal to N.
202: Input the N groups of traffic feature data into a quality judgment model to obtain a quality judgment result of the first video conference in a target period, the quality judgment result indicating whether the video conference quality in the target period is good or poor, where the quality judgment model is trained on video conference training data, and the target period is not earlier than the N sampling periods.
The technical solution shown in FIG. 2 can judge the quality of a video conference from traffic feature data in the network. In this way, the quality during the video conference can be determined even without the conference picture, which makes it convenient for the video conference provider to optimize, according to the video conference quality, the video conference application or the network devices (for example, servers) used to provide the video conference service.
FIG. 3 is a schematic structural block diagram of a computer device provided according to an embodiment of the present application. The computer device shown in FIG. 3 includes a processor 310, a memory 350 and a communication interface 360. Optionally, the computer device 300 shown in FIG. 3 further includes a camera 320, a display screen 330 and an audio module 340.
It can be understood that the structure illustrated in this embodiment of the present application does not constitute a specific limitation on the computer device 300. Optionally, in other embodiments of the present application, the computer device 300 includes more or fewer components than shown, or combines some components, or splits some components, or has a different component arrangement. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 310 may include one or more processing units. For example, the processor 310 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent components or may be integrated in one or more processors. In some embodiments, the computer device 300 may also include one or more processors 310. The controller can generate operation control signals according to instruction operation codes and timing signals, and complete the control of instruction fetching and instruction execution. In some other embodiments, a memory may further be provided in the processor 310 for storing instructions and data. Exemplarily, the memory in the processor 310 may be a cache. The memory may hold instructions or data that the processor 310 has just used or uses cyclically. If the processor 310 needs to use the instructions or data again, it can call them directly from the memory. This avoids repeated accesses and reduces the waiting time of the processor 310, thereby improving the efficiency with which the computer device 300 processes data or executes instructions.
In some embodiments, the processor 310 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, etc.
It can be understood that the interface connection relationships among the modules illustrated in this embodiment of the present application are only schematic illustrations and do not constitute a structural limitation on the computer device 300. In other embodiments of the present application, the computer device 300 may also adopt interface connection manners different from those in the above embodiment, or a combination of multiple interface connection manners.
The communication function of the computer device 300 is implemented through the communication interface 360. The communication interface 360 may provide a wireless communication solution and/or a wired communication solution applied to the computer device 300. Optionally, the communication interface 360 is a wired interface, for example, a Fiber Distributed Data Interface (FDDI) or a Gigabit Ethernet (GE) interface. Alternatively, the communication interface 360 may also be a wireless interface providing wireless communication functions such as 2G/3G/4G/5G or wireless local area networks (WLAN).
The computer device 300 implements the display function through the GPU, the display screen 330, the application processor, etc. The GPU is a microprocessor for image processing and connects the display screen 330 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 310 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 330 is used to display images, videos, etc. The display screen 330 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light emitting diode (AMOLED), a flex light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, quantum dot light emitting diodes (QLED), etc. In some embodiments, the computer device 300 may include one or more display screens 330.
The computer device 300 can implement the shooting function through the ISP, the camera 320, the video codec, the GPU, the display screen 330, the application processor, etc.
The ISP is used to process the data fed back by the camera 320. For example, when taking a photo, the shutter is opened, light is transmitted to the photosensitive element of the camera through the lens, the light signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing, converting it into an image visible to the naked eye. The ISP can also perform algorithmic optimization on the noise, brightness and skin tone of the image. The ISP can also optimize parameters such as the exposure and color temperature of the shooting scene. In some embodiments, the ISP may be provided in the camera 320.
The camera 320 is used to capture still images or video. An optical image of an object is generated through the lens and projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the light signal into an electrical signal and then transmits the electrical signal to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the computer device 300 may include one or more cameras 320.
The digital signal processor is used to process digital signals; in addition to digital image signals, it can also process other digital signals. For example, when the computer device 300 selects a frequency point, the digital signal processor is used to perform a Fourier transform or the like on the frequency point energy.
The video codec is used to compress or decompress digital video. The computer device 300 may support one or more video codecs. In this way, the computer device 300 can play or record videos in multiple encoding formats, for example: moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example the transfer mode between neurons in the human brain, it processes input information quickly and can also learn by itself continuously. Applications such as intelligent cognition of the computer device 300 can be implemented through the NPU.
The memory 350 may be used to store one or more computer programs, the one or more computer programs including instructions. The memory 350 may also be used to store the trained quality judgment model. By running the above instructions stored in the memory 350, the processor 310 can cause the computer device 300 to execute the method for determining video conference quality provided in some embodiments of the present application, as well as various applications, data processing, and the like. The memory 350 may include a program storage area and a data storage area. The program storage area may store an operating system; the program storage area may also store one or more applications, etc. The data storage area may store data (such as conference video) created during use of the computer device 300. In addition, the memory 350 may include a high-speed random access memory and may also include a non-volatile memory, for example, one or more magnetic disk storage components, flash memory components, universal flash storage (UFS), etc. In some embodiments, by running the instructions stored in the memory 350 and/or the instructions stored in the memory provided in the processor 310, the processor 310 causes the computer device 300 to execute the method for determining video conference quality provided in the embodiments of the present application, as well as other applications and data processing. The computer device 300 can implement audio functions through the audio module 340, the application processor, etc., for example, sound playback of the video conference and sound pickup of the video conference.
As shown in FIG. 3, after reading the instructions stored in the memory, the processor 310 generates a processing unit 311 and a data collection unit 312. The data collection unit 312 performs data sampling on the video conference traffic provided by other external devices, thereby obtaining the N groups of traffic feature data in step 201 of FIG. 2.
Alternatively, the data collection unit 312 in FIG. 3 may also be implemented by a dedicated chip independent of the processor 310.
In addition, the external device provides the video conference traffic to the computer device 300 shown in FIG. 3 after identifying the video conference traffic, or the external device provides traffic in which packets of various services are mixed to the computer device 300 shown in FIG. 3, and the computer device 300 identifies the video conference traffic from the mixed traffic.
Optionally, the external device or the computer device 300 determines, according to a five-tuple, the traffic generated when the terminal devices conduct the video conference service. Optionally, the external device is an independent hardware device deployed on the forwarding path of the video conference traffic, or may be an agent plug-in installed on the terminal device where the video conference application client is located. Independent hardware devices include, but are not limited to, traffic forwarding devices such as routers and gateways. In general, the external device captures the video conference traffic from multiple concurrent flows according to the source internet protocol (IP) address, the destination port number and the transport layer protocol in the five-tuple information. Optionally, the N sampling periods are continuous in time.
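For illustration only (not part of the claimed subject matter), the following is a minimal Python sketch of picking the packets of one conference flow out of mixed traffic by matching five-tuple fields. The packet-record layout, the addresses and the port number are assumptions made for this example, not a real capture API.

```python
# Hypothetical five-tuple fields of the conference flow (illustrative values).
CONF_FLOW = {"src_ip": "192.0.2.10", "dst_port": 8443, "protocol": "UDP"}

def is_conference_packet(pkt: dict) -> bool:
    """Match on source IP, destination port and transport-layer protocol,
    the three five-tuple fields mentioned in the text above."""
    return (pkt["src_ip"] == CONF_FLOW["src_ip"]
            and pkt["dst_port"] == CONF_FLOW["dst_port"]
            and pkt["protocol"] == CONF_FLOW["protocol"])

mixed_packets = [
    {"src_ip": "192.0.2.10", "dst_port": 8443, "protocol": "UDP", "size": 1200},
    {"src_ip": "192.0.2.99", "dst_port": 443,  "protocol": "TCP", "size": 800},
]
conference_packets = [p for p in mixed_packets if is_conference_packet(p)]
print(len(conference_packets))   # 1 packet belongs to the conference flow
```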
The data collection unit 312 collects data on the conference video traffic captured by the external device to obtain N groups of traffic feature data. The processing unit 311 inputs the N groups of traffic feature data into the quality judgment model to obtain the quality judgment result of the first video conference in the target period.
The sampling periods mentioned in FIG. 2 above are illustrated below with reference to FIG. 4 and FIG. 5. FIG. 4 is a schematic diagram of sampling periods. FIG. 4 shows nine sampling periods, namely sampling period 1 to sampling period 9, where the starting moment of sampling period 1 is 0 s and its end moment is 1 s, and the starting moment of sampling period 2 is 1 s and its end moment is 2 s. In other words, in the example shown in FIG. 4, the data collection unit 312 samples the traffic with 1 s as one time unit to obtain nine groups of traffic feature data.
Optionally, the time unit of the sampling period is set as required. For example, in FIG. 4, the length of one time unit is 1 s. FIG. 5 shows a schematic diagram of other sampling periods. In FIG. 5, the data collection unit 312 samples the traffic with 2 s as one time unit to obtain five groups of traffic feature data.
In the examples shown in FIG. 4 and FIG. 5, two adjacent sampling periods (for example, sampling period 1 and sampling period 2, or sampling period 2 and sampling period 3) are continuous in time.
In other embodiments, two adjacent sampling periods may also be non-continuous in time. In other words, two adjacent sampling periods may be separated by one or more time units.
Still assuming that one time unit is 1 s long, and assuming that two adjacent sampling periods are separated by one time unit, the starting moment of sampling period 1 is 0 s and its end moment is 1 s; the starting moment of sampling period 2 is 2 s and its end moment is 3 s; the starting moment of sampling period 3 is 4 s and its end moment is 5 s; and so on. In addition, among the multiple sampling periods, the interval between a first pair of adjacent sampling periods may be the same as or different from the interval between a second pair of adjacent sampling periods.
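For illustration only, a minimal Python sketch of assigning a packet timestamp to a sampling period under both arrangements (contiguous periods as in FIG. 4/FIG. 5, or periods separated by an idle gap) is given below; the function name and signature are assumptions.

```python
def period_index(timestamp: float, unit: float, gap: float = 0.0):
    """Return the sampling-period number for a packet timestamp, or None if
    the packet falls into the idle gap between two adjacent periods.
    `unit` is the period length in seconds, `gap` the idle time between
    periods (0 for the contiguous case)."""
    stride = unit + gap
    k, offset = divmod(timestamp, stride)
    return int(k) if offset < unit else None

# unit = 1 s, gap = 1 s: periods cover [0,1), [2,3), [4,5), ...
assert period_index(0.4, 1.0, 1.0) == 0      # inside sampling period 1
assert period_index(1.5, 1.0, 1.0) is None   # inside the idle gap
assert period_index(2.2, 1.0, 1.0) == 1      # inside sampling period 2
```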
Optionally, in some embodiments, the nth group of traffic feature data includes feature data of uplink traffic and/or feature data of downlink traffic.
Table 1 shows possible feature data of uplink traffic and feature data of downlink traffic. Optionally, the feature data of the uplink traffic included in each group of traffic feature data includes any one or more of the multiple uplink traffic features shown in Table 1. Similarly, optionally, the feature data of the downlink traffic included in each group of traffic feature data includes any one or more of the multiple downlink traffic features shown in Table 1.
Table 1
No. | Uplink traffic feature                    | Downlink traffic feature
1   | Number of uplink packets                  | Number of downlink packets
2   | Total bytes uploaded uplink               | Total bytes downloaded downlink
3   | Maximum uplink packet size                | Maximum downlink packet size
4   | Average uplink packet size                | Average downlink packet size
5   | Variance of uplink packet size            | Variance of downlink packet size
6   | Maximum uplink packet interval            | Maximum downlink packet interval
7   | Average uplink packet interval            | Average downlink packet interval
8   | Variance of uplink packet interval        | Variance of downlink packet interval
9   | Uplink packet loss rate                   | Downlink packet loss rate
10  | Maximum consecutive uplink packet losses  | Maximum consecutive downlink packet losses
11  | Uplink byte indicator                     | Downlink byte indicator
In some embodiments, the feature data of the uplink traffic corresponds to the feature data of the downlink traffic. For example, if the feature data of the uplink traffic includes the uplink traffic features numbered 1, 3, 5, 6 and 8 in Table 1, the feature data of the downlink traffic also includes the downlink traffic features numbered 1, 3, 5, 6 and 8 in Table 1.
In some embodiments, the types of uplink traffic features and downlink traffic features included in each of the N groups of traffic feature data are the same. In other words, if the first group of the N groups of traffic feature data includes the uplink traffic features numbered 1, 3, 5, 6 and 8 in Table 1 and the downlink traffic features numbered 1, 3, 5, 6 and 8 in Table 1, then any one of the second to Nth groups of the N groups of traffic feature data also includes the uplink traffic features numbered 1, 3, 5, 6 and 8 in Table 1 and the downlink traffic features numbered 1, 3, 5, 6 and 8 in Table 1.
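For illustration only, a minimal Python sketch of computing the size- and interval-based uplink features of Table 1 for one sampling period is given below. It assumes the period's uplink packets are given as (timestamp, size_in_bytes) pairs and that the period is non-empty; the loss-related features (Nos. 9-11) would need sequence numbers and are omitted here.

```python
import statistics

def uplink_features(packets):
    """packets: list of (timestamp, size_in_bytes) for one sampling period."""
    packets = sorted(packets)                      # order by timestamp
    sizes = [size for _, size in packets]
    gaps = [t2 - t1 for (t1, _), (t2, _) in zip(packets, packets[1:])]
    return {
        "pkt_count":   len(sizes),                 # Table 1, No. 1
        "total_bytes": sum(sizes),                 # No. 2
        "size_max":    max(sizes),                 # No. 3
        "size_mean":   statistics.mean(sizes),     # No. 4
        "size_var":    statistics.pvariance(sizes),            # No. 5
        "gap_max":     max(gaps) if gaps else 0.0,              # No. 6
        "gap_mean":    statistics.mean(gaps) if gaps else 0.0,  # No. 7
        "gap_var":     statistics.pvariance(gaps) if gaps else 0.0,  # No. 8
    }

print(uplink_features([(0.00, 1200), (0.03, 900), (0.09, 1200)]))
```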
The quality judgment result obtained in step 202 of FIG. 2 indicates whether the video conference quality in the target period is good or poor. Optionally, the goodness or poorness of the video conference quality includes absolute quality (for example, frozen / not frozen, or freezing time exceeds a preset threshold / freezing time does not exceed the preset threshold) and/or relative quality (for example, resolution decreases / resolution does not decrease), and so on.
In some embodiments, the video conference quality judgment result includes two classes: frozen and not frozen. In other words, whether the video conference in the target period will freeze is determined according to the N groups of traffic feature data and the quality judgment model.
In other embodiments, the video conference quality judgment result includes two classes: resolution decreases and resolution does not decrease. In other words, whether the resolution of the video conference in the target period decreases is determined according to the N groups of traffic feature data and the quality judgment model.
In other embodiments, the video conference quality judgment result includes two classes: freezing time exceeds a preset threshold and freezing time does not exceed the preset threshold. In other words, whether the freezing time of the video conference in the target period exceeds the preset threshold is determined according to the N groups of traffic feature data and the quality judgment model.
In other embodiments, the video conference quality judgment result includes two classes: start-up delay exceeds a preset threshold and start-up delay does not exceed the preset threshold. In other words, whether the start-up delay of the video conference in the target period exceeds the preset threshold is determined according to the N groups of traffic feature data and the quality judgment model.
In other embodiments, the video conference quality judgment result includes a combination of any one or more of the above. In other words, the video conference quality judgment result includes any multiple groups or all of: frozen and not frozen; resolution decreases and resolution does not decrease; freezing time exceeds a preset threshold and freezing time does not exceed the preset threshold; and start-up delay exceeds a preset threshold and start-up delay does not exceed the preset threshold.
In other embodiments, the goodness or poorness of the video conference quality may also be expressed as the probability of freezing, resolution decrease, etc. For example, a 90% probability of freezing, an 85% probability of resolution decrease, and so on.
The technical solution of the present application is introduced below by taking the video conference quality judgment result being whether freezing occurs as an example. A person skilled in the art can understand that the implementation of determining other video conference quality judgment results (whether the resolution decreases, whether the freezing time exceeds a preset threshold, whether the start-up delay exceeds a preset threshold) is the same as or similar to the implementation of determining whether the video conference picture freezes.
In some embodiments, the target period is later than the Nth sampling period of the N sampling periods. The processing unit 311 predicts whether the video conference picture in the target period will freeze according to the N groups of traffic feature data corresponding to the N sampling periods before the target period.
In some embodiments, the video conference is divided in time into multiple time windows, and each time window is divided into multiple sampling periods at the granularity of one time unit.
FIG. 6 is a schematic diagram of time windows. FIG. 6 shows two time windows, namely time window 1 and time window 2. Time window 1 takes 2 s as one time unit and is divided into five sampling periods, namely sampling period 1_1 to sampling period 1_5; time window 2 also takes 2 s as one time unit and is divided into five sampling periods, namely sampling period 2_1 to sampling period 2_5.
In some embodiments, the target period is a time window. Taking FIG. 6 as an example, the N groups of traffic feature data collected by the data collection unit 312 are the traffic feature data collected in sampling period 1_1 to sampling period 1_5 of time window 1. The target period determined from the N groups of traffic feature data is time window 2. In other words, the processing unit 311 determines whether the video conference in time window 2 will freeze according to the quality judgment model and the traffic feature data collected by the data collection unit 312 in sampling period 1_1 to sampling period 1_5 of time window 1.
In other embodiments, the target period is one or more sampling periods. Taking FIG. 6 as an example and assuming the target period is one sampling period, the N groups of traffic feature data collected by the data collection unit 312 are the traffic feature data collected in sampling period 1_1 to sampling period 1_5 of time window 1. The target period determined from the N groups of traffic feature data is sampling period 2_1. In other words, the processing unit 311 determines whether the video conference in sampling period 2_1 will freeze according to the traffic feature data collected by the data collection unit 312 in sampling period 1_1 to sampling period 1_5 and the quality judgment model. Then, the processing unit 311 determines whether the video conference in sampling period 2_2 will freeze according to the traffic feature data collected in sampling period 1_2 to sampling period 2_1 and the quality judgment model. Then, the processing unit 311 determines whether the video conference in sampling period 2_3 will freeze according to the traffic feature data collected in sampling period 1_3 to sampling period 2_2 and the quality judgment model, and so on.
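For illustration only, the sliding prediction just described can be sketched in a few lines of Python; `model` is assumed to map the feature vectors of the last N sampling periods to a frozen / not-frozen result.

```python
N = 5   # number of sampling periods per time window, as in FIG. 6

def predict_next_periods(feature_groups, model):
    """feature_groups[i] is the feature vector of sampling period i.
    Yields (target_period_index, prediction): the window slides forward by
    one sampling period after each prediction."""
    for i in range(N, len(feature_groups)):
        window = feature_groups[i - N:i]     # periods i-N .. i-1
        yield i, model(window)               # prediction for period i

# Tiny usage example with a stand-in model that never predicts freezing.
stand_in_model = lambda window: "not frozen"
print(list(predict_next_periods([[0.0]] * 7, stand_in_model)))
```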
In the above embodiments, the target period is always a period after the N groups of traffic feature data are collected. In other words, the above embodiments predict, from currently collected data, whether the video conference will freeze in the future. If it is determined that the video conference in the target period will freeze, a freezing notification message is sent to an output device. The output device may be the display screen 330 or the audio module 340. For example, assuming the output device is the display screen 330, the display screen 330 may remind the user through a pop-up window or in other ways that the video conference is about to freeze. In this way, the user can prepare for the upcoming freeze. For example, the user may close some applications that occupy a large amount of bandwidth to ensure that there is enough bandwidth for the video conference.
In other embodiments, whether the video conference picture freezes means determining whether the current video conference picture freezes. In this case, the target period is the period in which the N groups of traffic feature data are collected. As described above, the N groups of traffic feature data are respectively acquired in the N sampling periods. The starting moment of the target period is then the starting moment of the first sampling period of the N sampling periods, and the end moment of the target period is the end moment of the N sampling periods.
Taking FIG. 6 as an example, the N groups of traffic feature data are collected in the five sampling periods included in time window 1. In this case, the target period is time window 1. In other words, the processing unit 311 determines whether the video conference in time window 1 freezes according to the traffic feature data collected by the data collection unit 312 in sampling period 1_1 to sampling period 1_5 of time window 1 and the quality judgment model. Target periods in which freezing occurs are marked. At present, video conference service providers usually cannot obtain the video conference picture, so they cannot determine the service quality of the video conference in a timely and effective manner, nor proactively take measures against possible degradation of video conference quality so as to reduce or avoid freezing and other phenomena. Based on the above technical solution, the video conference service provider can judge from the traffic feature data whether the video conference picture freezes, and thus judge, according to the occurrence of freezing, whether the network or the video conference application (APP) needs to be optimized.
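For illustration only, a minimal Python sketch of the marking and notification hooks mentioned above is given below; the log structure and the notification function are assumptions, standing in for a display pop-up or audio prompt.

```python
import time

stall_log = []   # marked target periods, giving the provider concrete timestamps

def record_result(window_start: float, window_end: float, frozen: bool) -> None:
    """Mark a target period in which freezing occurred and notify the user."""
    if frozen:
        stall_log.append((window_start, window_end))
        notify_user(f"Freezing detected in target period [{window_start}, {window_end}]")

def notify_user(message: str) -> None:
    # Stand-in for the output device (display screen 330 or audio module 340).
    print(time.strftime("%H:%M:%S"), message)

record_result(0.0, 10.0, frozen=True)
```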
For ease of description, predicting from currently collected data whether the video conference will freeze in the future (that is, the target period is located after the N sampling periods) is referred to below as the first application scenario; determining from currently collected data whether the current video conference freezes (that is, the target period overlaps the N sampling periods) is referred to as the second application scenario.
The quality judgment model used to determine the quality judgment result is trained on a video conference training data set.
The video conference training data includes multiple training data sets and multiple pieces of label information. Each of the multiple training data sets includes M groups of traffic feature data, the M groups of traffic feature data being the traffic feature data acquired in M sampling periods. In other words, the mth group of the M groups of traffic feature data is acquired in the mth sampling period of the M sampling periods, M is a positive integer greater than or equal to 1, and m is a positive integer greater than or equal to 1 and less than or equal to M.
Optionally, the video conference service provider or network device operator of the video conference used to collect the training data is the same as the video conference service provider or network operator providing video conference 1 (that is, the video conference whose quality in the target period needs to be determined). For example, both the video conference used to collect the training data and video conference 1 are provided network service by China Mobile.
Optionally, the video conference service provider or network device operator of the video conference used to collect the training data is different from the video conference service provider or network operator providing video conference 1. For example, the video conference used to collect the training data is provided network service by China Unicom, while video conference 1 is provided network service by China Mobile.
In some embodiments, M is equal to N. In other words, one time window includes M sampling periods, and the M groups of traffic feature data included in each training data set are acquired in the M sampling periods within one time window.
The multiple pieces of label information correspond one-to-one to the multiple training data sets, and each piece of label information is used to indicate whether the video conference picture corresponding to the corresponding training data set freezes. The video conference picture corresponding to a training data set is the video conference picture within the reference time window containing the M sampling periods. Optionally, the label information is automatically generated by the computer device from image data, or is manually annotated by observing the images. The process by which the computer device determines the label information from image data will be described in detail in later embodiments of the present application.
In the first application scenario, the relationship between the reference time window corresponding to each training data set and the M sampling periods in which the M groups of traffic feature data included in that training data set are acquired is the same as the relationship between the target period and the N sampling periods. Taking FIG. 6 as an example, training data set 1 includes the five groups of traffic feature data collected in the five sampling periods of time window 1; the reference time window corresponding to training data set 1 is then time window 2.
In the second application scenario, the reference time window corresponding to each training data set is the M sampling periods in which the M groups of traffic feature data included in that training data set are acquired. Taking FIG. 6 as an example, training data set 1 includes the five groups of traffic feature data collected in the five sampling periods of time window 1; the reference time window corresponding to training data set 1 is then time window 1.
Whether it is the reference time window in the first application scenario or that in the second application scenario, the reference time window can be divided into multiple sampling periods. If the video conference picture of any one or more of the multiple sampling periods freezes, the video conference picture of the reference time window is considered to freeze; if the video conference of none of the multiple sampling periods freezes, the video conference picture of the reference time window is considered not to freeze.
The method for determining whether the video conference of a sampling period freezes is the same for the first application scenario and the second application scenario. Taking the second application scenario as an example, how to determine whether the video conference picture of one sampling period freezes is introduced below.
Assume that the first training data set is any one of the multiple training data sets, and the first label information is the label information corresponding to the first training data set. The first label information is used to indicate whether the video conference picture corresponding to the first training data set freezes. The M groups of traffic feature data included in the first training data set are acquired in M sampling periods. Since this is the second application scenario, the reference time window is the M sampling periods. In this case, if the video conference picture of any one of the M sampling periods freezes, the first label information indicates that the video conference picture of the reference time window corresponding to the first training data set freezes. If the video conference picture of none of the M sampling periods freezes, the first label information indicates that the video conference picture of the reference time window corresponding to the first training data set does not freeze.
Taking FIG. 6 as an example, training data set 1 includes the five groups of traffic feature data collected in the five sampling periods of time window 1, and training data set 2 includes the five groups of traffic feature data collected in the five sampling periods of time window 2. Label information 1 is the label information corresponding to training data set 1, and label information 2 is the label information corresponding to training data set 2. Assuming that, apart from the video conference picture of sampling period 1_1 freezing, the video conference pictures of the other sampling periods in time window 1 and time window 2 do not freeze, it is determined that label information 1 indicates that the video conference picture corresponding to training data set 1 (that is, the video conference picture of time window 1) freezes, and label information 2 indicates that the video conference picture corresponding to training data set 2 (that is, the video conference picture of time window 2) does not freeze.
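For illustration only, the labelling rule just described reduces to one line of Python; the function name is an assumption.

```python
def window_label(period_frozen: list[bool]) -> int:
    """1 = the video conference picture of this reference time window froze,
    0 = it did not; the window freezes as soon as any sampling period froze."""
    return 1 if any(period_frozen) else 0

# FIG. 6 example: only sampling period 1_1 of time window 1 froze.
assert window_label([True, False, False, False, False]) == 1   # time window 1
assert window_label([False] * 5) == 0                          # time window 2
```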
Whether the video conference picture of a sampling period freezes is determined according to the number of consecutive frames with identical image information in that sampling period. After the video conference is screen-recorded, each frame of the video conference playback is extracted by computer vision methods, and the timestamp of each frame can be obtained at the same time. The image information of each frame is determined from the acquired frame.
If the number of consecutive frames with identical image information in a sampling period is greater than or equal to a preset number threshold, it is determined that the video conference picture of that sampling period freezes; if the number of consecutive frames with identical image information in a sampling period is less than the preset number threshold, it is determined that the video conference picture of that sampling period does not freeze.
The preset number threshold is determined according to the following formula:
Th = ⌈Std / t⌉    (Formula 1)
where Th denotes the preset number threshold, Std denotes a predefined video freezing standard, and t denotes the duration of a single frame.
For example, assuming the frame rate of the video conference is 30 frames/s, the duration of a single frame is t = 1/30 = 33 ms. If the predefined video freezing standard is 500 ms, then Th = 16 is obtained according to Formula 1. In other words, if the number of identical consecutive frames in a sampling period is greater than or equal to 16, the sampling period is considered to freeze.
The rounding in Formula 1 is rounding up. This rounding manner is one example of rounding. In other embodiments, the rounding of Std/t may also be rounding down, rounding to the nearest integer, or another rounding manner.
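For illustration only, a minimal Python sketch of Formula 1 and of the run-length check on consecutive identical frames is given below; the function names are assumptions, and the default frame duration uses the rounded 33 ms from the example above.

```python
import math

def stall_threshold(std_ms: float, frame_ms: float = 33.0) -> int:
    """Th = ceil(Std / t), i.e. Formula 1 with rounding up."""
    return math.ceil(std_ms / frame_ms)

def period_frozen(quality_values, std_ms: float = 500.0,
                  frame_ms: float = 33.0) -> bool:
    """True if some run of consecutive frames with identical quality
    parameter values reaches the threshold within one sampling period."""
    th, run = stall_threshold(std_ms, frame_ms), 1
    for prev, cur in zip(quality_values, quality_values[1:]):
        run = run + 1 if cur == prev else 1
        if run >= th:
            return True
    return False

assert stall_threshold(500.0, 33.0) == 16      # the 30 frames/s example above
assert period_frozen([1.0] * 16) is True       # 16 identical frames: frozen
assert period_frozen([1.0, 2.0] * 8) is False  # alternating frames: not frozen
```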
Optionally, two frames having identical image information includes the quality parameter values of part of the pictures of the two frames being identical.
Optionally, two frames having identical image information includes the quality parameter values of the entire pictures of the two frames being identical. The quality parameter value of an image is determined according to the Laplacian operator, the Brenner gradient function or the Tenengrad gradient function.
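For illustration only, one possible quality parameter is the variance of the Laplacian response of a frame (or of one sub-region of the frame). The Python sketch below assumes OpenCV and NumPy are available; the Brenner or Tenengrad gradient functions could be substituted, and the region layout is an assumption.

```python
import cv2
import numpy as np

def quality_parameter(frame_bgr, region=None) -> float:
    """Variance of the Laplacian of a frame; `region` = (x, y, w, h) selects
    one part of the video conference picture, e.g. one of the four parts of
    FIG. 7, when the picture is evaluated part by part."""
    if region is not None:
        x, y, w, h = region
        frame_bgr = frame_bgr[y:y + h, x:x + w]
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

# Two frames are treated as carrying the same image information when the
# quality parameter values of the compared picture(s) are identical.
print(quality_parameter(np.zeros((120, 160, 3), dtype=np.uint8)))   # 0.0
```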
FIG. 7 is a schematic diagram of a video conference picture. The video conference picture 700 shown in FIG. 7 includes four parts: the first part 701 is the picture captured by the camera of user 1, the second part 702 is the picture captured by the camera of user 2, the third part 703 is the picture captured by the camera of user 3, and the fourth part 704 is the desktop shared by user 1.
In some embodiments, the video conference picture of user 1 is divided into four parts, and the quality parameter value of each part is determined. If the quality parameter values of two adjacent frames of one part are identical, the image information of the two frames is considered identical. For example, assuming that the quality parameter value of the first part of the third frame of the video conference is identical to the quality parameter value of the first part of the fourth frame of the video conference, the image information of the third frame and of the fourth frame of the video conference is considered identical. In other words, when the video conference picture includes multiple parts, as long as the picture of at least one of the multiple parts freezes, the video conference is considered to freeze.
In other embodiments, when the video conference picture includes multiple parts, the video conference picture is not divided; instead, the quality parameter value of the video conference picture is determined by taking the video conference picture as a whole. In this case, even if one of the multiple parts included in the video conference freezes, the video conference is considered not to freeze.
The conference picture of a video conference includes visual elements that change over time.
For example, a participant in a video conference will turn on a local camera (for example, a camera built into the terminal device or a camera external to the terminal device), and the local camera generally captures the participant's head or upper body. The participant's head or upper body generally does not remain stationary. The picture captured by the local camera is therefore a visual element that changes over time.
As another example, a certain participant in the video conference may play a video while sharing the desktop; the picture of this video is a visual element that changes over time.
As another example, the video conference picture may display the duration of the conference by means of a scrolling timeline or a timer. In this case, the time-varying scrolling timeline or timer is a visual element that changes over time.
As another example, a certain place of the video conference picture (for example, the lower right corner or the upper left corner) may display a constantly changing image (for example, a graphics interchange format (gif) image). That gif image is then a visual element that changes over time.
It can be understood that, if the video conference picture freezes, these time-varying visual elements will remain stationary for a period of time. The quality parameter values of the video conference images are calculated from these time-varying visual elements. If there is a continuous period in which the number of identical images exceeds the preset number threshold, the time-varying visual elements in the video conference picture during this period are considered stationary, and the video conference picture during this period is therefore considered to freeze.
FIG. 8 is a method for training a model provided according to an embodiment of the present application. Optionally, the method shown in FIG. 8 is implemented by a computer device or by a component (for example, a chip) in the computer device.
801: Acquire multiple training data sets and multiple pieces of label information, where the first training data set in the multiple training data sets includes M groups of feature data, the M groups of feature data are respectively obtained by collecting data on the traffic of a second video conference in M sampling periods, the mth group of feature data in the M groups of feature data includes the feature data of the traffic of the second video conference in the mth sampling period of the M sampling periods, the first label information in the multiple pieces of label information is used to indicate the quality of the video conference picture corresponding to the first training data set, the first training data set is any one of the multiple training data sets, M is a positive integer greater than or equal to 1, and m is a positive integer greater than or equal to 1 and less than or equal to M.
802: Train a quality judgment model according to the multiple training data sets and the multiple pieces of label information.
For the content related to the training data sets and the label information, reference may be made to the introduction in the above embodiments, which is not repeated here for brevity.
Taking a convolutional neural network (CNN) as an example, how the training device trains the quality judgment model is introduced below.
A CNN is a deep neural network with a convolutional structure and is a deep learning architecture; a deep learning architecture refers to performing multiple levels of learning at different levels of abstraction through machine learning algorithms. As a deep learning architecture, a CNN is a feed-forward artificial neural network in which each neuron responds to overlapping regions in the images fed into it.
As shown in FIG. 9, the convolutional neural network (CNN) 900 includes an input layer 910, convolutional layers 920, a pooling layer 930 and a neural network layer 940.
Each input to the input layer 910 is all of the traffic feature data included in one training data set.
Convolutional layers 920:
The convolutional layers shown in FIG. 9 are divided into six groups. Each of the first group and the second group of convolutional layers includes two convolutional layers 921, each of the third group and the fourth group of convolutional layers includes three convolutional layers 922, and each of the fifth group and the sixth group of convolutional layers includes four convolutional layers 923.
The convolutional layer 921 includes a convolution operator of length 2. The convolutional layer 922 includes a convolution operator of length 3. The convolutional layer 923 includes a convolution operator of length 4.
A convolution operator is also called a kernel; its role is equivalent to a filter that extracts specific information from the input traffic feature data. A convolution operator is essentially a weight matrix, which is usually predefined. During the convolution operation on the input traffic feature data, the weight matrix processes the input traffic feature data one traffic feature after another along the horizontal direction, thereby extracting specific features from the traffic feature data.
The size of the weight matrix should be related to the size of the traffic feature data. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input traffic feature data; during the convolution operation, the weight matrix extends to the entire depth of the input traffic feature data. Therefore, convolution with a single weight matrix produces a convolutional output with a single depth dimension. In most cases, however, a single weight matrix is not used; instead, multiple weight matrices of the same dimension are applied, and the output of each weight matrix is stacked to form the depth dimension of the convolutional data. Different weight matrices are used to extract different features from the traffic feature data. The feature maps extracted by the multiple weight matrices of the same dimension also have the same dimension, and the multiple extracted feature maps of the same dimension are then combined to form the output of the convolution operation.
In practical applications, the weight values in these weight matrices need to be obtained through a large amount of training. Each weight matrix formed by the trained weight values extracts information from the input traffic feature data, thereby helping the convolutional neural network 900 to make correct predictions.
When the convolutional neural network 900 has multiple convolutional layers, the initial convolutional layers often extract more general features, which may also be called low-level features; as the depth of the convolutional neural network 900 increases, the features extracted by the later convolutional layers become more and more complex and thus more suitable for the problem to be solved.
For example, each of the first group and the second group of convolutional layers includes two convolutional layers 921; in other words, each of them uses two weight matrices to extract features from the input traffic feature data. Each of the third group and the fourth group of convolutional layers includes three convolutional layers 922; in other words, each of them uses three weight matrices to extract features from the input traffic feature data. Each of the fifth group and the sixth group of convolutional layers includes four convolutional layers 923; in other words, each of them uses four weight matrices to extract features from the input traffic feature data.
The outputs of the first, second, third, fourth, fifth and sixth groups of convolutional layers are combined as the input to the pooling layer 930.
Pooling layer 930:
Since the number of training parameters often needs to be reduced, a pooling layer often needs to be introduced periodically after the convolutional layers. The pooling layer makes the feature maps output by the convolutional layers smaller, which simplifies the computational complexity of the network, reduces the parameters and computation of the next layer, and prevents overfitting.
Neural network layer 940:
After processing by the convolutional layers 920 and the pooling layer 930, the convolutional neural network 900 is not yet able to output the required output information, because, as mentioned above, the convolutional layers 920 and the pooling layer 930 only extract features and reduce the parameters brought by the input traffic feature data. However, to generate the final output information (the required class information or other relevant information), the convolutional neural network 900 needs to use the neural network layer 940 to generate one output or a set of outputs whose number equals the number of required classes. Therefore, the neural network layer 940 may include a fully connected layer (FC) 941 and a normalized exponential function (softmax function) layer 942. The fully connected layer 941 serves to map the learned feature representations to the label space of the samples. In other words, it integrates the features together (highly purifying the features) for convenient delivery to the final classifier. In this embodiment of the present application, the video conference quality judgment results determined from the traffic feature data are frozen and not frozen. Therefore, the classifier can be implemented by the softmax function.
It should be noted that the convolutional neural network 900 shown in FIG. 9 is only an example of a convolutional neural network. In specific applications, the convolutional neural network may also exist in the form of other network models; for example, the number of convolutional layers included in each group of convolutional layers may differ from that shown in FIG. 9.
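For illustration only, a minimal Python sketch in the spirit of FIG. 9 is given below, assuming PyTorch. Only the six convolution groups with operator lengths 2, 3 and 4 and the FC + softmax head follow the description; the channel sizes, padding, pooling choice and exact wiring are illustrative assumptions, not the claimed architecture.

```python
import torch
from torch import nn

def conv_group(in_ch, out_ch, kernel, depth):
    layers, ch = [], in_ch
    for _ in range(depth):                      # `depth` stacked conv layers
        layers += [nn.Conv1d(ch, out_ch, kernel, padding=kernel // 2), nn.ReLU()]
        ch = out_ch
    return nn.Sequential(*layers)

class QualityCNN(nn.Module):
    def __init__(self, feat_dim=22, out_ch=16, classes=2):
        super().__init__()
        # feat_dim: features per sampling period (e.g. 11 uplink + 11 downlink)
        self.groups = nn.ModuleList(
            [conv_group(feat_dim, out_ch, 2, 2) for _ in range(2)]    # groups 1-2
            + [conv_group(feat_dim, out_ch, 3, 3) for _ in range(2)]  # groups 3-4
            + [conv_group(feat_dim, out_ch, 4, 4) for _ in range(2)]) # groups 5-6
        self.pool = nn.AdaptiveMaxPool1d(1)     # stand-in for pooling layer 930
        self.fc = nn.Linear(6 * out_ch, classes)

    def forward(self, x):                       # x: (batch, feat_dim, N periods)
        z = torch.cat([self.pool(g(x)).squeeze(-1) for g in self.groups], dim=1)
        return torch.softmax(self.fc(z), dim=1) # frozen / not-frozen scores

scores = QualityCNN()(torch.randn(1, 22, 5))    # one window of 5 sampling periods
```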
Besides training the quality judgment model with the convolutional neural network shown in FIG. 9, the quality judgment model can also be trained with a recurrent neural network.
FIG. 10 is a schematic diagram of a recurrent neural network. The recurrent neural network 1000 shown in FIG. 10 includes an FC 1001, a softmax function layer 1002, a first part of long-short term memory (LSTM) neurons (cells) and a second part of LSTM cells; each of the first part of LSTM cells and the second part of LSTM cells includes LSTM cells 1003-1012.
The recurrent neural network shown in FIG. 10 is based on the traffic feature data of the time windows and sampling periods shown in FIG. 6. Data1_1 in FIG. 10 represents the group of traffic feature data collected in sampling period 1_1, Data1_2 represents the group of traffic feature data collected in sampling period 1_2, and so on.
Taking the first part as an example, the five groups of traffic feature data collected in one time window are respectively input into five LSTM cells. For example, the group of traffic feature data Data1_1 collected in sampling period 1_1 is input to LSTM cell 1003, and the group of traffic feature data Data1_2 collected in sampling period 1_2 is input to LSTM cell 1005.
The arrows shown in FIG. 10 indicate the flow of data. For example, the output data of LSTM cell 1003 is sent to LSTM cell 1004 and LSTM cell 1005, and the output data of LSTM cell 1004 is sent to LSTM cell 1006. Finally, LSTM cell 1012 of the first part of LSTM cells outputs the output result of the first part of LSTM cells.
The way the second part of LSTM cells processes data is similar to the way the first part of LSTM cells processes data; the difference is that LSTM cell 1004 in the second part of LSTM cells outputs the output result of the second part of LSTM cells.
The output result of the first part of LSTM cells and the output result of the second part of LSTM cells are spliced and input to the FC 1001, and then to the softmax function layer 1002. The functions of the FC 1001 and the softmax function layer 1002 are the same as those in the CNN shown in FIG. 9 and are not repeated here for brevity.
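For illustration only, a minimal Python sketch approximating the two-part LSTM of FIG. 10 is given below, assuming PyTorch. Running one LSTM over the five feature groups in order and a second one over them in reverse order, then splicing the two final outputs into an FC + softmax head, stands in for the two parts; the hidden size is an illustrative assumption.

```python
import torch
from torch import nn

class QualityRNN(nn.Module):
    def __init__(self, feat_dim=22, hidden=32, classes=2):
        super().__init__()
        self.fwd = nn.LSTM(feat_dim, hidden, batch_first=True)  # first part
        self.bwd = nn.LSTM(feat_dim, hidden, batch_first=True)  # second part
        self.fc = nn.Linear(2 * hidden, classes)                 # FC 1001

    def forward(self, x):                    # x: (batch, 5 periods, feat_dim)
        out_f, _ = self.fwd(x)
        out_b, _ = self.bwd(torch.flip(x, dims=[1]))  # reversed time order
        # splice the final outputs of both parts, then FC + softmax
        z = torch.cat([out_f[:, -1], out_b[:, -1]], dim=1)
        return torch.softmax(self.fc(z), dim=1)

scores = QualityRNN()(torch.randn(1, 5, 22))  # Data1_1 .. Data1_5
```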
The results output in FIG. 9 and FIG. 10 are compared with the labels corresponding to the input traffic feature data to determine an error rate, and the parameters in the model are adjusted according to the error rate, thereby training the final quality judgment model.
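For illustration only, a minimal training-step sketch for either model above is given below, again assuming PyTorch; the optimizer, learning rate and the dummy data batch are assumptions (in practice a data loader over the training data sets would be used, with label 1 = frozen and 0 = not frozen).

```python
import torch
from torch import nn

model = QualityCNN()                         # or QualityRNN() from the sketches above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
nll = nn.NLLLoss()                           # the sketches already apply softmax

loader = [(torch.randn(8, 22, 5), torch.randint(0, 2, (8,)))]  # dummy batch
for features, labels in loader:
    probs = model(features)
    loss = nll(torch.log(probs + 1e-9), labels)  # error vs. the labels
    optimizer.zero_grad()
    loss.backward()                          # adjust parameters from the error
    optimizer.step()
```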
FIG. 11 is a hardware structural diagram of a chip provided by an embodiment of the present invention. The neural-network-based algorithms shown in FIG. 9 and FIG. 10 may be implemented in the neural network processing unit (NPU) 1100 shown in FIG. 11.
The NPU 1100 may be mounted on the main CPU (host CPU) as a co-processor, with tasks assigned by the host CPU. The core part of the NPU is the arithmetic circuit 1103; the controller 1104 controls the arithmetic circuit 1103 to extract matrix data from the memory and perform multiplication operations.
In some implementations, the arithmetic circuit 1103 internally includes multiple processing engines (PEs). In some implementations, the arithmetic circuit 1103 is a two-dimensional systolic array. The arithmetic circuit 1103 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1103 is a general-purpose matrix processor.
For example, suppose there are an input matrix A, a weight matrix B and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 1102 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit fetches the data of matrix A from the input memory 1101, performs a matrix operation with matrix B, and stores the partial result or final result of the resulting matrix in the accumulator 1108.
The unified memory 1106 is used to store input data and output data. The weight data is moved directly to the weight memory 1102 through a direct memory access controller (DMAC) 1105. The input data is also moved to the unified memory 1106 through the DMAC.
The bus interface unit 1110 (BIU) is used for the instruction fetch buffer 1109 to acquire instructions from the external memory, and is also used for the storage unit access controller 1105 to acquire the original data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to move the input data in the external memory to the unified memory 1106, to move the weight data to the weight memory 1102, or to move the input data to the input memory 1101.
The vector calculation unit 1107 includes multiple operation processing units and, where necessary, performs further processing on the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operations, logarithmic operations, magnitude comparison and so on. It is mainly used for non-convolutional/FC layer network computation in neural networks, such as pooling, batch normalization, local response normalization, etc.
In some implementations, the vector calculation unit 1107 can store the processed output vectors to the unified memory 1106. For example, the vector calculation unit 1107 may apply a nonlinear function to the output of the arithmetic circuit 1103, for example a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit 1107 generates normalized values, merged values, or both. In some implementations, the vector of processed outputs can be used as activation input to the arithmetic circuit 1103, for example for use in subsequent layers of the neural network.
The instruction fetch buffer 1109 connected to the controller 1104 is used to store instructions used by the controller 1104.
The unified memory 1106, the input memory 1101, the weight memory 1102 and the instruction fetch buffer 1109 are all on-chip memories. The external memory is private to this NPU hardware architecture.
An embodiment of the present application further provides a computer device, the computer device including the chip shown in FIG. 11 and a memory.
FIG. 12 is a schematic structural block diagram of a computer device provided according to an embodiment of the present application. The computer device shown in FIG. 12 may be a computer device for performing the method shown in FIG. 2. The computer device 1200 shown in FIG. 12 includes a data collection unit 1201 and a processing unit 1202.
The data collection unit 1201 is configured to collect data on the traffic of a first video conference in N sampling periods to obtain N groups of feature data.
The processing unit 1202 is configured to input the N groups of feature data into a quality judgment model to obtain a quality judgment result of the first video conference in a target period.
For the specific functions and beneficial effects of the data collection unit 1201 and the processing unit 1202, reference may be made to the above embodiments, which are not repeated here for brevity.
FIG. 13 is a schematic structural block diagram of another computer device provided according to an embodiment of the present application. The computer device shown in FIG. 13 may be a computer device for performing the method shown in FIG. 8. The computer device 1300 shown in FIG. 13 includes a data collection unit 1301 and a processing unit 1302.
The data collection unit 1301 is configured to acquire multiple training data sets and multiple pieces of label information.
The processing unit 1302 is configured to train a quality judgment model according to the multiple training data sets and the multiple pieces of label information.
For the specific functions and beneficial effects of the data collection unit 1301 and the processing unit 1302, reference may be made to the above embodiments, which are not repeated here for brevity.
FIG. 14 is a schematic diagram of a system architecture provided by an embodiment of the present application.
Referring to FIG. 14, the data collection device 1410 in the system architecture 1400 is used to collect video conference training data and store it in the database 1420. For the working principle of the data collection device 1410, refer to the data collection unit 312 in FIG. 3.
The training device 1430 generates a quality judgment model based on the video conference training data set maintained in the database 1420. The execution device 1440 uses the quality judgment model to process the collected traffic feature data to obtain the final judgment result.
The execution device 1440 may be a computer device as shown in FIG. 3 or FIG. 12, and the training device 1430 may be a computer device as shown in FIG. 13 or a computer device including the chip shown in FIG. 11. For the specific functions of the execution device 1440 and the training device 1430, reference may be made to the above embodiments, which are not repeated here for brevity.
According to the methods provided in the embodiments of the present application, the present application further provides a computer program product, the computer program product including computer program code which, when run on a computer, causes the computer to execute the method of any one of the above embodiments.
According to the methods provided in the embodiments of the present application, the present application further provides a computer-readable medium, the computer-readable medium storing program code which, when run on a computer, causes the computer to execute the method of any one of the above embodiments.
A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. A skilled person may use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of the present application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other manners. For example, the apparatus embodiments described above are only illustrative; for example, the division of the units is only a division by logical function, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can easily think of changes or substitutions within the technical scope disclosed in the present application, and such changes or substitutions shall be covered within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (38)

  1. A method for determining video conference quality, comprising:
    collecting data on the traffic of a first video conference in N sampling periods to obtain N groups of traffic feature data, wherein the nth group of traffic feature data in the N groups of traffic feature data is collected in the nth sampling period of the N sampling periods, N is a positive integer greater than or equal to 1, and n is a positive integer greater than or equal to 1 and less than or equal to N;
    inputting the N groups of traffic feature data into a quality judgment model to obtain a quality judgment result of the first video conference in a target period, the quality judgment result indicating whether the video conference quality in the target period is good or poor, wherein the quality judgment model is trained on video conference training data, and the target period is not earlier than the N sampling periods.
  2. The method according to claim 1, wherein the nth group of traffic feature data comprises feature data of uplink traffic, and/or, feature data of downlink traffic.
  3. The method according to claim 2, wherein the feature data of the uplink traffic comprised in the nth group of traffic feature data comprises any one or more of the following:
    the number of uplink packets in the nth sampling period;
    the total number of bytes uploaded uplink in the nth sampling period;
    the maximum uplink packet size in the nth sampling period;
    the average uplink packet size in the nth sampling period;
    the variance of the uplink packet size in the nth sampling period;
    the maximum uplink packet interval in the nth sampling period;
    the average uplink packet interval in the nth sampling period;
    the variance of the uplink packet interval in the nth sampling period;
    the uplink packet loss rate in the nth sampling period;
    the maximum number of consecutive uplink packet losses in the nth sampling period;
    the uplink byte indicator of the nth sampling period.
  4. The method according to claim 2, wherein the feature data of the downlink traffic comprised in the nth group of traffic feature data comprises any one or more of the following:
    the number of downlink packets in the nth sampling period;
    the total number of bytes downloaded downlink in the nth sampling period;
    the maximum downlink packet size in the nth sampling period;
    the average downlink packet size in the nth sampling period;
    the variance of the downlink packet size in the nth sampling period;
    the maximum downlink packet interval in the nth sampling period;
    the average downlink packet interval in the nth sampling period;
    the variance of the downlink packet interval in the nth sampling period;
    the downlink packet loss rate in the nth sampling period;
    the maximum number of consecutive downlink packet losses in the nth sampling period;
    the downlink byte indicator of the nth sampling period.
  5. The method according to any one of claims 1 to 4, wherein the N sampling periods are continuous in time.
  6. The method according to claim 5, wherein the sum of the time lengths of the N sampling periods is the same as the time length of the target period.
  7. The method according to claim 5 or 6, wherein the Nth sampling period of the N sampling periods precedes the target period and is continuous in time with the target period; or,
    the starting moment of the first sampling period of the N sampling periods is the starting moment of the target period, and the end moment of the Nth sampling period of the N sampling periods is the end moment of the target period.
  8. The method according to any one of claims 1 to 7, wherein the video conference training data comprises multiple training data sets and multiple pieces of label information, wherein a first training data set in the multiple training data sets comprises M groups of traffic feature data, the M groups of traffic feature data are respectively obtained by collecting data on the traffic of a second video conference in M sampling periods, first label information in the multiple pieces of label information is used to indicate whether the video conference picture corresponding to the first training data set freezes, the first training data set is any one of the multiple training data sets, and M is a positive integer greater than or equal to 1;
    the quality judgment model is trained on the multiple training data sets and the multiple pieces of label information.
  9. The method according to claim 8, wherein, if the number of consecutive frames with identical image information among the multiple frames of the second video conference in any one sampling period of the M sampling periods is greater than or equal to a preset number threshold, the first label information is used to indicate that the video conference picture corresponding to the first training data set freezes; or
    if the number of consecutive frames with identical image information among the multiple frames of the video conference in any one sampling period of the M sampling periods is less than the preset number threshold, the first label information is used to indicate that the video conference picture corresponding to the first training data set does not freeze.
  10. The method according to claim 9, wherein the preset number threshold is determined according to the following formula:
    Th = ⌈Std / t⌉
    wherein Th denotes the preset number threshold, Std denotes a predefined video freezing standard, and t denotes the duration of a single frame.
  11. The method according to claim 9 or 10, wherein two frames having identical image information comprises the quality parameter values of part or all of the pictures of the two frames being identical, the quality parameter value being determined according to the Laplacian operator, the Brenner gradient function or the Tenengrad gradient function.
  12. The method according to any one of claims 8 to 11, wherein the video conference picture of each of the multiple sampling periods comprises visual elements that change over time.
  13. The method according to any one of claims 8 to 12, wherein the first video conference and the second video conference are served by different video conference service providers or different network operators.
  14. A method for training a model, the method comprising:
    acquiring multiple training data sets and multiple pieces of label information, wherein a first training data set in the multiple training data sets comprises M groups of feature data, the M groups of feature data are respectively obtained by collecting data on the traffic of a second video conference in M sampling periods, the mth group of feature data in the M groups of feature data comprises the feature data of the traffic of the second video conference in the mth sampling period of the M sampling periods, first label information in the multiple pieces of label information is used to indicate the quality of the video conference picture corresponding to the first training data set, the first training data set is any one of the multiple training data sets, M is a positive integer greater than or equal to 1, and m is a positive integer greater than or equal to 1 and less than or equal to M;
    training a quality judgment model according to the multiple training data sets and the multiple pieces of label information.
  15. The method according to claim 14, wherein, if the number of consecutive frames with identical image information among the multiple frames of the video conference in any one sampling period of the M sampling periods is greater than or equal to a preset number threshold, the first label information is used to indicate that the video conference picture corresponding to the first training data set freezes;
    if the number of consecutive frames with identical image information among the multiple frames of the video conference in any one sampling period of the M sampling periods is less than the preset number threshold, the first label information is used to indicate that the video conference picture corresponding to the first training data set does not freeze.
  16. The method according to claim 15, wherein the preset number threshold is determined according to the following formula:
    Th = ⌈Std / t⌉
    wherein Th denotes the preset number threshold, Std denotes a predefined video freezing standard, and t denotes the duration of a single frame.
  17. The method according to claim 15 or 16, wherein two frames having identical image information comprises the quality parameter values of part or all of the pictures of the two frames being identical, the quality parameter value being determined according to the Laplacian operator, the Brenner gradient function or the Tenengrad gradient function.
  18. The method according to any one of claims 14 to 17, wherein the video conference picture of each of the multiple sampling periods comprises visual elements that change over time.
  19. A computer device, comprising:
    a data collection unit, configured to collect data on the traffic of a first video conference in N sampling periods to obtain N groups of feature data, wherein the nth group of feature data in the N groups of feature data is collected in the nth sampling period of the N sampling periods, N is a positive integer greater than or equal to 1, and n is a positive integer greater than or equal to 1 and less than or equal to N;
    a processing unit, configured to input the N groups of feature data into a quality judgment model to obtain a quality judgment result of the first video conference in a target period, the quality judgment result indicating whether the video conference quality in the target period is good or poor, wherein the quality judgment model is trained on video conference training data, and the target period is not earlier than the N sampling periods.
  20. The computer device according to claim 19, wherein the nth group of traffic feature data comprises feature data of uplink traffic and feature data of downlink traffic.
  21. The computer device according to claim 20, wherein the feature data of the uplink traffic comprised in the nth group of traffic feature data comprises any one or more of the following:
    the number of uplink packets in the nth sampling period;
    the total number of bytes uploaded uplink in the nth sampling period;
    the maximum uplink packet size in the nth sampling period;
    the average uplink packet size in the nth sampling period;
    the variance of the uplink packet size in the nth sampling period;
    the maximum uplink packet interval in the nth sampling period;
    the average uplink packet interval in the nth sampling period;
    the variance of the uplink packet interval in the nth sampling period;
    the uplink packet loss rate in the nth sampling period;
    the maximum number of consecutive uplink packet losses in the nth sampling period;
    the uplink byte indicator of the nth sampling period.
  22. The computer device according to claim 20, wherein the feature data of the downlink traffic comprised in the nth group of traffic feature data comprises any one or more of the following:
    the number of downlink packets in the nth sampling period;
    the total number of bytes downloaded downlink in the nth sampling period;
    the maximum downlink packet size in the nth sampling period;
    the average downlink packet size in the nth sampling period;
    the variance of the downlink packet size in the nth sampling period;
    the maximum downlink packet interval in the nth sampling period;
    the average downlink packet interval in the nth sampling period;
    the variance of the downlink packet interval in the nth sampling period;
    the downlink packet loss rate in the nth sampling period;
    the maximum number of consecutive downlink packet losses in the nth sampling period;
    the downlink byte indicator of the nth sampling period.
  23. The computer device according to any one of claims 19 to 22, wherein the N sampling periods are continuous in time.
  24. The computer device according to claim 23, wherein the sum of the time lengths of the N sampling periods is the same as the time length of the target period.
  25. The computer device according to claim 23 or 24, wherein the Nth sampling period of the N sampling periods precedes the target period and is continuous in time with the target period; or,
    the starting moment of the first sampling period of the N sampling periods is the starting moment of the target period, and the end moment of the Nth sampling period of the N sampling periods is the end moment of the target period.
  26. The computer device according to any one of claims 19 to 25, wherein the video conference training data comprises multiple training data sets and multiple pieces of label information, wherein a first training data set in the multiple training data sets comprises M groups of feature data, the M groups of feature data are respectively obtained by collecting data on the traffic of a second video conference in M sampling periods, first label information in the multiple pieces of label information is used to indicate whether the video conference picture corresponding to the first training data set freezes, the first training data set is any one of the multiple training data sets, and M is a positive integer greater than or equal to 1;
    the quality judgment model is trained on the multiple training data sets and the multiple pieces of label information.
  27. The computer device according to claim 26, wherein, if the number of consecutive frames with identical image information among the multiple frames of the video conference in any one sampling period of the M sampling periods is greater than or equal to a preset number threshold, the first label information is used to indicate that the video conference picture corresponding to the first training data set freezes;
    if the number of consecutive frames with identical image information among the multiple frames of the video conference in any one sampling period of the M sampling periods is less than the preset number threshold, the first label information is used to indicate that the video conference picture corresponding to the first training data set does not freeze.
  28. The computer device according to claim 27, wherein the preset number threshold is determined according to the following formula:
    Th = ⌈Std / t⌉
    wherein Th denotes the preset number threshold, Std denotes a predefined video freezing standard, and t denotes the duration of a single frame.
  29. The computer device according to claim 27 or 28, wherein two frames having identical image information comprises the quality parameter values of part or all of the pictures of the two frames being identical, the quality parameter value being determined according to the Laplacian operator, the Brenner gradient function or the Tenengrad gradient function.
  30. The computer device according to any one of claims 27 to 29, wherein the video conference picture of each of the multiple sampling periods comprises elements that change over time.
  31. The computer device according to any one of claims 27 to 30, wherein the first video conference and the second video conference are served by different video conference service providers or different network operators.
  32. A computer device, comprising:
    a data collection unit, configured to acquire multiple training data sets and multiple pieces of label information, wherein a first training data set in the multiple training data sets comprises M groups of feature data, the M groups of feature data are respectively obtained by collecting data on the traffic of a second video conference in M sampling periods, the mth group of feature data in the M groups of feature data comprises the feature data of the traffic of the second video conference in the mth sampling period of the M sampling periods, first label information in the multiple pieces of label information is used to indicate the quality of the video conference picture corresponding to the first training data set, the first training data set is any one of the multiple training data sets, M is a positive integer greater than or equal to 1, and m is a positive integer greater than or equal to 1 and less than or equal to M;
    a processing unit, configured to train a quality judgment model according to the multiple training data sets and the multiple pieces of label information.
  33. The computer device according to claim 32, wherein, if the number of consecutive frames with identical image information among the multiple frames of the video conference in any one sampling period of the M sampling periods is greater than or equal to a preset number threshold, the first label information is used to indicate that the video conference picture corresponding to the first training data set freezes;
    if the number of consecutive frames with identical image information among the multiple frames of the video conference in any one sampling period of the M sampling periods is less than the preset number threshold, the first label information is used to indicate that the video conference picture corresponding to the first training data set does not freeze.
  34. The computer device according to claim 33, wherein the preset number threshold is determined according to the following formula:
    Th = ⌈Std / t⌉
    wherein Th denotes the preset number threshold, Std denotes a predefined video freezing standard, and t denotes the duration of a single frame.
  35. The computer device according to claim 33 or 34, wherein two frames having identical image information comprises the quality parameter values of part or all of the pictures of the two frames being identical, the quality parameter value being determined according to the Laplacian operator, the Brenner gradient function or the Tenengrad gradient function.
  36. The computer device according to any one of claims 32 to 35, wherein the video conference picture of each of the multiple sampling periods comprises visual elements that change over time.
  37. A system for determining video conference quality, the system comprising the computer device according to any one of claims 19 to 31, and the computer device according to any one of claims 32 to 36.
  38. A computer-readable storage medium for storing computer software instructions for use by a computer, comprising a program for executing the steps comprised in any one of claims 1 to 13, or comprising a program for executing the steps comprised in any one of claims 14 to 18.
PCT/CN2021/133105 2021-04-01 2021-11-25 Method, related apparatus and system for determining video conference quality WO2022205964A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202110355932.8 2021-04-01
CN202110355932 2021-04-01
CN202110654936.6A CN115174842A (zh) 2021-04-01 2021-06-11 Method, related apparatus and system for determining video conference quality
CN202110654936.6 2021-06-11

Publications (1)

Publication Number Publication Date
WO2022205964A1 true WO2022205964A1 (zh) 2022-10-06

Family

ID=83457893

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/133105 WO2022205964A1 (zh) 2021-04-01 2021-11-25 Method, related apparatus and system for determining video conference quality

Country Status (1)

Country Link
WO (1) WO2022205964A1 (zh)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180157899A1 (en) * 2016-12-07 2018-06-07 Samsung Electronics Co., Ltd. Method and apparatus detecting a target
US20190377972A1 (en) * 2018-06-08 2019-12-12 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method and apparatus for training, classification model, mobile terminal, and readable storage medium
CN110837842A (zh) * 2019-09-12 2020-02-25 腾讯科技(深圳)有限公司 一种视频质量评估的方法、模型训练的方法及装置
CN111031403A (zh) * 2019-11-05 2020-04-17 网宿科技股份有限公司 一种卡顿检测方法、系统及设备
CN111263225A (zh) * 2020-01-08 2020-06-09 恒安嘉新(北京)科技股份公司 视频卡顿预测方法、装置、计算机设备及存储介质
CN111597361A (zh) * 2020-05-19 2020-08-28 腾讯科技(深圳)有限公司 多媒体数据处理方法、装置、存储介质及设备

Similar Documents

Publication Publication Date Title
US10372991B1 (en) Systems and methods that leverage deep learning to selectively store audiovisual content
US20200351466A1 (en) Low Power Framework for Controlling Image Sensor Mode in a Mobile Image Capture Device
WO2020177722A1 (zh) 一种视频分类的方法、模型训练的方法、设备及存储介质
US10848709B2 (en) Artificial intelligence based image data processing method and image processing device
CN109640007B (zh) 人工智能图像传感设备
WO2020078027A1 (zh) 一种图像处理方法、装置与设备
WO2019179283A1 (zh) 图像识别方法及装置
KR101876433B1 (ko) 행동인식 기반 해상도 자동 조절 카메라 시스템, 행동인식 기반 해상도 자동 조절 방법 및 카메라 시스템의 영상 내 행동 자동 인식 방법
WO2021115242A1 (zh) 一种超分辨率图像处理方法以及相关装置
WO2022073282A1 (zh) 一种基于特征交互学习的动作识别方法及终端设备
US11917158B2 (en) Static video recognition
WO2020207192A1 (zh) 图像处理器、图像处理方法、拍摄装置和电子设备
WO2024002211A1 (zh) 一种图像处理方法及相关装置
WO2024007948A1 (zh) 频闪图像处理方法、装置、电子设备和可读存储介质
US20220070453A1 (en) Smart timelapse video to conserve bandwidth by reducing bit rate of video on a camera device with the assistance of neural network input
  • WO2022205964A1 (zh) Method, related apparatus and system for determining video conference quality
Yuan et al. AccDecoder: Accelerated decoding for neural-enhanced video analytics
Brzoza-Woch et al. Remotely reconfigurable hardware–software platform with web service interface for automated video surveillance
CN111510629A (zh) 数据显示方法、图像处理器、拍摄装置和电子设备
US20230419505A1 (en) Automatic exposure metering for regions of interest that tracks moving subjects using artificial intelligence
CN111881862A (zh) 手势识别方法及相关装置
US20220294971A1 (en) Collaborative object detection
US20220301278A1 (en) Image processing method and apparatus, storage medium, and electronic device
CN110730335A (zh) 无人机视频实时预览方法及其系统
  • CN115174842A (zh) Method, related apparatus and system for determining video conference quality

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21934600

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21934600

Country of ref document: EP

Kind code of ref document: A1