WO2012174931A1 - Parameter control method and device - Google Patents

Parameter control method and device

Info

Publication number
WO2012174931A1
Authority
WO
WIPO (PCT)
Prior art keywords
terminal
parameter
audio
picture
stream
Prior art date
Application number
PCT/CN2012/074031
Other languages
French (fr)
Chinese (zh)
Inventor
刘军莉
陈军
佟鑫
王福
张良平
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 filed Critical 中兴通讯股份有限公司
Publication of WO2012174931A1 publication Critical patent/WO2012174931A1/en


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/485 End-user interface for client configuration
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems

Definitions

  • the present invention relates to the field of communications, and in particular, to a parameter control method and apparatus.
  • BACKGROUND For services that require a terminal and a server to cooperate (for example, conference television), if the terminal wishes to change certain service-related parameters, it needs to send a dedicated instruction to the server; the instruction is carried in a control message. Therefore, for each additional control method, the terminal needs to be upgraded so that it can send the commands required for that control. Since the number of terminals is large and they are widely distributed, upgrading the terminals is not easy.
  • The following takes a video conference as an example.
  • The existing implementation of a video conference is to add several terminals to a conference by setting the relevant configuration of the conference.
  • After the conference starts, the Multipoint Conference Unit (MCU) first performs signaling interaction with each terminal to determine the format of each terminal, and then each terminal sends its own image compressed code stream to the MCU according to the negotiated format; the MCU decodes the code streams transmitted by the terminals, encodes and compresses the synthesized image, and sends it back to each terminal, so that the image received by every terminal is the same. In order to enable each terminal to customize its number of multi-pictures according to its own needs, a universal port ("universe port") function needs to be added.
  • On the terminal side, in addition to its previous functions, each terminal needs to add signaling processing for acquiring and handling multi-picture-number adjustments: it must receive the user's request in real time and transmit the multi-picture-number information set by the user to the MCU through the signaling channel. On the MCU side, the Multipoint Processor (MP) receives the multi-picture adjustment signaling of each terminal and then adjusts the transfer relationships between the decoding nodes and the encoding nodes; after the corresponding multi-picture synthesis is performed according to the new relationships, the image is encoded and sent back to each terminal.
  • Multipoint Processor (MP)
  • Embodiments of the present invention provide a parameter control method and apparatus to solve at least the above problems.
  • According to an aspect of an embodiment of the present invention, a parameter control method is provided, comprising the steps of: parsing audio data and/or video data from a terminal; determining that the audio in the audio data and/or the picture in the video data includes an audio and/or picture for indicating adjustment of a parameter; and adjusting the parameter according to the audio and/or picture for indicating adjustment of the parameter.
  • It is determined, within a preset period of time, that the audio in the audio data and/or the picture in the video data includes the audio and/or picture for indicating adjustment of a parameter.
  • In the case that the parameter is a parameter used to send a media stream to the terminal, after the parameter is adjusted according to the audio and/or picture for indicating adjustment of the parameter, the media stream is sent to the terminal using the adjusted parameter.
  • When the media stream is a video stream, the parameter includes at least one of the following: the number of terminal pictures included in the video stream, the layout of the video stream displayed on the terminal, the frame rate of the video stream, the bit rate of the video stream, and the format of the video stream.
  • An image linear code is obtained after parsing the video data from the terminal; it is determined that the image linear code includes a picture for indicating adjustment of the parameter.
  • According to another aspect of an embodiment of the present invention, a parameter control apparatus is provided, comprising: a parsing module configured to parse audio data and/or video data from a terminal; a judging module configured to determine that the audio in the audio data and/or the picture in the video data includes an audio and/or picture for indicating adjustment of a parameter; and an adjustment module configured to adjust the parameter according to the audio and/or picture for indicating adjustment of the parameter.
  • The judging module is configured to determine, within a preset period of time, that the audio in the audio data and/or the picture in the video data includes the audio and/or picture for indicating adjustment of a parameter.
  • In the case that the parameter is a parameter used to send a media stream to the terminal, the adjustment module is configured to, after adjusting the parameter according to the audio and/or picture for indicating adjustment of the parameter, send the media stream to the terminal using the adjusted parameter.
  • When the media stream is a video stream, the parameter includes at least one of the following: the number of terminal pictures included in the video stream, the layout of the video stream displayed on the terminal, the frame rate of the video stream, the bit rate of the video stream, and the format of the video stream.
  • The parsing module is configured to parse the video data from the terminal to obtain an image linear code; the judging module is configured to determine that the image linear code includes a picture for indicating adjustment of the parameter.
  • By means of the embodiments of the present invention, the audio data and/or video data from the terminal are parsed; it is determined that the audio in the audio data and/or the picture in the video data includes an audio and/or picture for indicating adjustment of a parameter; and the parameter is adjusted according to that audio and/or picture. This solves the prior-art problem that the terminal must be upgraded in order to add a control function, and thereby achieves the effect of expanding the control functions of the terminal without modifying the terminal.
  • FIG. 1 is a flowchart of a parameter control method according to an embodiment of the present invention;
  • FIG. 2 is a structural block diagram of a parameter control apparatus according to an embodiment of the present invention;
  • FIG. 3 is a flowchart of the whole-system processing of a media stream processing method according to a preferred embodiment of the present invention;
  • FIG. 4 is a flowchart of the MCU side of a media stream processing method according to a preferred embodiment of the present invention;
  • FIG. 5 shows the downward and upward multi-picture-number adjustment gestures of a gesture-recognition multi-picture adjustment method according to an embodiment of the present invention;
  • FIG. 6 is a three-picture layout diagram of a gesture-recognition multi-picture adjustment method according to an embodiment of the present invention;
  • FIG. 7 is a four-picture layout diagram of a gesture-recognition multi-picture adjustment method according to an embodiment of the present invention.
  • As shown in FIG. 1, the method includes the following steps. Step S102: The audio data and/or video data from a terminal are parsed. Step S104: It is determined that the audio in the audio data and/or the picture in the video data includes an audio and/or picture for indicating adjustment of a parameter. Step S106: The parameter is adjusted according to the audio and/or picture for indicating adjustment of the parameter.
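Steps S102-S106 can be sketched as a minimal server-side loop. This is an illustrative sketch only: `detect_adjustment` is a hypothetical stand-in for whatever audio/picture recognition the server performs, not an API named by the patent.

```python
def detect_adjustment(media):
    # Hypothetical recognizer: returns an adjustment request or None.
    # Here it simply looks for a tagged control entry in the parsed media.
    return media.get("control")  # e.g. {"param": "pictures", "value": 3}

def control_parameters(media, params):
    """S102: parse media; S104: judge for an adjustment cue; S106: adjust."""
    adjustment = detect_adjustment(media)      # S102 + S104
    if adjustment is not None:                 # S106: apply the indicated change
        params[adjustment["param"]] = adjustment["value"]
    return params

params = {"pictures": 4, "frame_rate": 30}
frame = {"control": {"param": "pictures", "value": 3}}
print(control_parameters(frame, params))  # {'pictures': 3, 'frame_rate': 30}
```

When no control cue is present, the parameters pass through unchanged, which matches the branch in step S104.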
  • Through the above steps, only the server needs to be adjusted. Since the terminal can transmit video through its camera and audio through its microphone, the terminal does not need any changes: the user simply makes the indicated gesture in front of the camera or speaks the indicated command into the microphone.
  • Compared with signaling interaction in the prior art, this relieves the burden on the signaling channel and expands the control functions of the terminal to some extent.
  • In implementation, if all audio and video data were parsed, the burden on the server might increase slightly (actual tests show the impact is small). To avoid even this possibility, only the audio and video data of an agreed period may be parsed. For example, it can be agreed in advance that, of every 10 minutes of audio and video data, the first 5 minutes are not parsed and the last 5 minutes are parsed; thus, within each hour, the user can make a controlling action or voice command during minutes 5-10, 15-20, 25-30, 35-40, 45-50, and 55-60.
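The agreed parsing windows above reduce to a simple modular check. The sketch below (function name and parameters are the author's illustration, not from the patent) tests whether a moment within the hour falls in a parsed window under the example agreement of 10-minute periods with only the last 5 minutes parsed:

```python
def in_parse_window(second_of_hour, period=600, parsed_tail=300):
    """True if this second of the hour falls in an agreed parsing window.

    With 10-minute periods (600 s) and the last 5 minutes (300 s)
    parsed, the windows are minutes 5-10, 15-20, ..., 55-60.
    """
    return (second_of_hour % period) >= (period - parsed_tail)

assert not in_parse_window(2 * 60)    # minute 2: first half, not parsed
assert in_parse_window(7 * 60)        # minute 7: inside the 5-10 window
assert not in_parse_window(12 * 60)   # minute 12: first half of next period
assert in_parse_window(56 * 60)       # minute 56: inside the 55-60 window
```

Shortening `period` or lengthening `parsed_tail` corresponds to the faster or slower conference-update settings mentioned below.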
  • In a further embodiment, all audio and video data can be parsed at the beginning, until a picture or audio indicating a change of parsing mode is recognized.
  • The preset time can also be set in advance according to the specific situation of the conference: if faster conference updates are desired, a shorter interval can be set; if slower updates are acceptable, the judgment period can be set slightly longer.
  • The following takes the case where the parameter is one used to send a media stream to the terminal as an example. After the server adjusts the parameter according to the audio and/or picture for indicating adjustment of the parameter, the server sends the media stream (for example, an audio stream or a video stream) to the terminal using the adjusted parameter.
  • For example, control of the picture received by the terminal may be implemented by using audio and/or a picture.
  • When the media stream is a video stream, the parameter includes at least one of the following: the number of terminal pictures included in the video stream, the layout of the video stream displayed on the terminal, the frame rate of the video stream, the bit rate of the video stream, and the format of the video stream.
  • For parsing the video data, this embodiment provides a preferred manner: an image linear code is obtained after parsing the video data from the terminal, and it is determined that the image linear code includes a picture for indicating adjustment of the parameter. Because recognition of the linear code is relatively easy, the judgment process is relatively simple.
  • In this embodiment, a parameter control device is also provided, which is used to implement the above embodiment and its preferred implementations; what has already been described will not be repeated. The modules involved in the device are described below.
  • FIG. 2 is a structural block diagram of a parameter control apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes: a parsing module 22, a judging module 24, and an adjustment module 26. The structure is described below.
  • The parsing module 22 is configured to parse the audio data and/or video data from the terminal. The judging module 24, connected to the parsing module 22, is configured to determine that the audio in the audio data and/or the picture in the video data includes an audio and/or picture for indicating adjustment of a parameter. For example, when the media stream is a video stream, the foregoing parameter includes at least one of the following: the number of terminal pictures included in the video stream, the layout of the video stream displayed on the terminal, the frame rate of the video stream, the bit rate of the video stream, and the format of the video stream.
  • The adjustment module 26, connected to the judging module 24, is configured to adjust the parameter according to the audio and/or picture for indicating adjustment of the parameter.
  • Preferably, the judging module 24 is configured to determine, within a preset period of time, that the audio in the audio data and/or the picture in the video data includes the audio and/or picture for indicating adjustment of the parameter.
  • In the case that the parameter is a parameter used to send a media stream to the terminal, the adjustment module 26 is configured to, after adjusting the parameter according to the audio and/or picture for indicating adjustment of the parameter, send the media stream to the terminal using the adjusted parameter.
  • Preferably, the parsing module 22 is configured to parse the video data from the terminal to obtain an image linear code, and the judging module 24 is configured to determine that the image linear code includes a picture for indicating adjustment of the parameter.
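The cooperation of the three modules can be outlined as follows. This is a structural sketch only: the recognition logic inside the judging module is hypothetical, and the class and method names are the author's illustration of the module division, not interfaces defined by the patent.

```python
class ParsingModule:
    """Module 22: decode the terminal's stream into analyzable form."""
    def parse(self, data):
        # For video, a real implementation would yield the image linear code.
        return data

class JudgingModule:
    """Module 24: decide whether the parsed data carries an adjustment cue."""
    def judge(self, parsed):
        # Hypothetical: return the indicated adjustment, or None if absent.
        return parsed.get("gesture")

class AdjustmentModule:
    """Module 26: apply the indicated adjustment to the parameters."""
    def __init__(self, params):
        self.params = params
    def adjust(self, adjustment):
        if adjustment:
            self.params.update(adjustment)
        return self.params

parser, judger, adjuster = ParsingModule(), JudgingModule(), AdjustmentModule({"pictures": 4})
cue = judger.judge(parser.parse({"gesture": {"pictures": 3}}))
print(adjuster.adjust(cue))  # {'pictures': 3}
```

The chain mirrors the connections in FIG. 2: parsing module feeds the judging module, whose result drives the adjustment module.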
  • The following describes a preferred embodiment, taking a conference television system as an example. In this preferred embodiment, a method is provided for acquiring a gesture image using the existing camera of a terminal in a conference television system and adjusting the multi-picture number on the MCU side through gesture or other audio and/or video recognition.
  • In this embodiment, when implementing the universe port function, no modification of the terminal is needed; a gesture recognition module is simply added to the Video Processing Unit (VPU) on the MCU side to recognize multi-picture adjustments in the terminal images, and the adjusted multi-picture information is transmitted to the MP, which, upon receiving the signaling, adjusts the transfer relationships of the decoding and encoding nodes.
  • The functions involved in this technical solution include: (1) the conference control interface sets the conference-related configuration; (2) the MCU and the terminals perform signaling interaction on the encoding format; (3) each terminal sends its image compressed code stream to the MCU; (4) the MCU decodes the received code stream and obtains the image linear code; (5) the MCU performs multi-picture synthesis and re-encoding on the decoded image of each terminal; (6) the MCU finally sends the encoded code stream back to the terminals; (7) the VPU module in the MCU performs gesture recognition on the decoded linear code; (8) if the VPU module recognizes multi-picture-number adjustment information, it sends this signaling to the MP module of the MCU; (9) the MP module receives this signaling and adjusts the corresponding codec node relationships.
  • FIG. 3 shows the whole-system flow, which includes the following steps. Step S302: A user sets the conference-related configuration through a WEB interface, including how many terminals join the current conference, the format and bit rate of each terminal, the number of multi-pictures of the conference, and so on. Step S304: According to the conference settings, the MCU performs signaling interaction on the encoding format with each terminal.
  • Step S306: If the image-format signaling interaction is unsuccessful, the terminal does not join the conference. If the interaction is successful, in step S3062 the terminal and the MCU agree on an output image format, and in step S3064 the terminal sends its image compressed code stream to the MCU in the corresponding format. Step S308: The VPU module in the MCU separately decodes the received image compressed code stream of each terminal and obtains the image linear code of each terminal.
  • Step S310: The VPU separately performs gesture recognition on the image linear code decoded from each terminal. Step S312: It is determined whether there is multi-picture adjustment information; if there is no multi-picture adjustment gesture, the process jumps directly to step S316, and if there is an adjustment gesture, step S314 is performed.
  • Step S314: The corresponding multi-picture-number adjustment information is acquired; the VPU module sends the recognized multi-picture-number adjustment information to the MP module of the MCU, and the MP module receives the signaling and adjusts the corresponding codec node relationships. Step S316: The MCU performs multi-picture synthesis on the decoded image of each terminal according to the codec correspondence.
  • FIG. 4 shows the MCU-side flow, which includes the following steps. Step S402: The MP obtains the relevant configuration of the conference from the conference control interface, including the number of terminals in the conference, the format and bit rate of each terminal, and the number of multi-pictures of the conference.
  • Step S404: According to the conference settings, the MP performs signaling interaction on the encoding format with each terminal. Step S406: If the image-format signaling interaction is unsuccessful, the terminal does not attend; if the interaction is successful, the MP receives the image compressed code stream sent by the terminal in the corresponding format and passes it to the DSP. Step S4062: The DSP receives the terminal code stream. Step S408: The VPU module in the DSP separately decodes the received image compressed code stream of each terminal and obtains the image linear code of each terminal.
  • Step S410: The VPU separately performs gesture recognition on the image linear code decoded from each terminal. Step S412: It is determined whether the image contains multi-picture adjustment information; if there is no multi-picture adjustment gesture, the process jumps to step S420, and if there is an adjustment gesture, step S414 is performed. Step S414: It is determined whether the gesture is legal; if not, the process jumps directly to step S420, and if it is legal, the multi-picture-number adjustment information is sent to the MP module. Step S416: The MP receives the corresponding multi-picture-number adjustment information sent by the DSP. Step S418: The correspondence between the codec nodes is adjusted. Step S420: The DSP performs multi-picture synthesis and re-encoding on the decoded image of each terminal according to the codec correspondence.
  • Step S422: The encoded code stream is sent back to each terminal.
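The per-terminal loop of steps S408-S420 can be condensed into a short sketch. The function names here (`recognize_gesture`, `is_legal`, the dictionary-based `codec_map`) are hypothetical stand-ins for the DSP's recognition, the legality check of step S414, and the MP's codec-node bookkeeping:

```python
def decode(stream):
    # Stand-in decoder: the real DSP would produce an image linear code here.
    return stream

def mcu_side_pass(terminal_streams, codec_map, recognize_gesture, is_legal):
    """Condensed S408-S420: decode, recognize, validate, adjust routing."""
    linear_codes = {t: decode(s) for t, s in terminal_streams.items()}  # S408
    for terminal, code in linear_codes.items():                         # S410
        gesture = recognize_gesture(code)                               # S412
        if gesture is not None and is_legal(gesture):                   # S414
            codec_map[terminal] = gesture["pictures"]                   # S416-S418
    return codec_map    # S420 synthesizes multi-pictures per codec_map

codec_map = {"A": 4, "B": 4}
streams = {"A": {"gesture": {"pictures": 3}}, "B": {}}
result = mcu_side_pass(streams, codec_map,
                       recognize_gesture=lambda c: c.get("gesture"),
                       is_legal=lambda g: g["pictures"] in (1, 2, 3, 4))
print(result)  # {'A': 3, 'B': 4}
```

An illegal gesture falls through to step S420 unchanged, exactly as the flow prescribes.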
  • The system used in the following example is a conference television system; the example is described with reference to the above embodiments and preferred embodiments, and the processing follows the flowcharts above.
  • Suppose the user first holds a four-picture 720P (1280x720) 30-frame conference with four terminals A, B, C, and D.
  • The conference control interface sends this information to the MP module of the MCU. After obtaining this signaling, the MP module performs 720P (1280x720) 30-frame signaling interaction with each terminal.
  • the interaction is successful.
  • The four terminals A, B, C, and D send 720P (1280x720) 30-frame compressed code streams to the MCU; assume that no user has yet controlled the multi-picture by gesture.
  • After receiving the code streams, the MCU sends the four image compressed code streams to the VPU module on the MCU side.
  • The VPU uses four decoding nodes for decoding and performs gesture recognition after decoding. No information for adjusting the number of multi-pictures is found, so the four decoding nodes send their respective image linear codes to the same encoding node a; encoding node a then reduces each image to a quarter of its size, performs multi-picture synthesis according to FIG. 7, encodes the result, and sends it back to each terminal.
  • Each terminal receives the multi-picture compressed code stream sent by the MCU, decodes it, and displays it; the four terminal users then see the four-picture image shown in FIG. 7.
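The quarter-size reduction in this example follows from simple geometry: halving a 1280x720 image in both dimensions gives a 640x360 sub-picture covering a quarter of the original area, and four such sub-pictures tile a 720p canvas in a 2x2 grid. A sketch of the arithmetic (the coordinate convention is the author's illustration; the patent's FIG. 7 layout may order the cells differently):

```python
FULL_W, FULL_H = 1280, 720

def four_picture_layout(width=FULL_W, height=FULL_H):
    """2x2 grid: each cell is (x, y, w, h), the source scaled by 1/2 in
    each dimension, i.e. a quarter of the original area."""
    sw, sh = width // 2, height // 2
    return [(c * sw, r * sh, sw, sh) for r in range(2) for c in range(2)]

cells = four_picture_layout()
assert cells == [(0, 0, 640, 360), (640, 0, 640, 360),
                 (0, 360, 640, 360), (640, 360, 640, 360)]
# Four quarter-area cells exactly cover the 1280x720 canvas.
assert 4 * 640 * 360 == FULL_W * FULL_H
```

The same arithmetic explains why the encoded output can remain a single 720P stream regardless of how many terminals contribute sub-pictures.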
  • When a user wants to adjust the multi-picture display, the gesture shown in FIG. 5 is presented to the camera (either of the two gestures may be used), and the terminal sends the image containing the gesture to the MCU.
  • After receiving the code stream containing the gesture information, the MCU sends the four image compressed code streams to the VPU module on the MCU side.
  • The VPU uses four decoding nodes for decoding, and each decoding node performs gesture recognition after decoding.
  • Decoding nodes B, C, and D find no information for adjusting the number of multi-pictures.
  • Decoding node A finds that terminal A needs to adjust to a three-picture image; decoding node A sends this multi-picture-number information to the MP, and the MP adds one encoding node b.
  • The decoding nodes A, B, and C also need to send their image linear codes to encoding node b for multi-picture synthesis.
  • Encoding node a reduces each image to a quarter of its size, synthesizes the multi-picture according to FIG. 7, encodes the image linear code, and sends it back to terminals B, C, and D.
  • Encoding node b reduces each image to a quarter of its size, synthesizes according to the three-picture layout of FIG. 6, encodes the image linear code, and returns it to terminal A.
  • Terminal A receives the compressed code stream of encoding node b and, after decoding and display, sees the three-picture image shown in FIG. 6; terminals B, C, and D receive the compressed code stream of encoding node a and, after decoding and display, see the four-picture image shown in FIG. 7.
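The re-routing the MP performs in this walkthrough amounts to maintaining a fan-out table from decoding nodes to encoding nodes. The sketch below is a hypothetical data-structure illustration of that bookkeeping, not the MCU's actual internals:

```python
# Before the gesture: all four decoding nodes feed encoding node "a".
routes = {"A": ["a"], "B": ["a"], "C": ["a"], "D": ["a"]}

def add_encode_node(routes, node, sources):
    """MP adds an encoding node and subscribes it to the given decoding nodes."""
    for src in sources:
        routes.setdefault(src, []).append(node)
    return routes

# Terminal A requested a three-picture view: the MP adds node "b",
# fed by decoding nodes A, B, and C (as in the example above).
add_encode_node(routes, "b", ["A", "B", "C"])
assert routes["A"] == ["a", "b"]   # A's linear code now fans out to a and b
assert routes["D"] == ["a"]        # D still feeds only the four-picture node
```

Removing a customized view would be the symmetric operation: unsubscribe the encoding node and delete it when its source list is empty.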
  • Obviously, the above modules or steps of the present invention can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; alternatively, they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present invention provides a parameter control method and device. The method comprises the following steps: analyzing audio data and/or video data from a terminal; determining that the audio in the audio data and/or the picture in the video data comprises an audio and/or picture for instructing adjustment of a parameter; and adjusting the parameter according to the audio and/or picture for instructing adjustment of the parameter. By means of the present invention, the effect of expanding the control functions on a terminal without modifying the terminal is achieved.

Description

参数控制方法及装置 (Parameter control method and device) TECHNICAL FIELD The present invention relates to the field of communications, and in particular, to a parameter control method and apparatus. BACKGROUND For services that require a terminal and a server to cooperate (for example, conference television), if the terminal wishes to change certain service-related parameters, it needs to send a dedicated instruction to the server; the instruction is carried in a control message. Therefore, for each additional control method, the terminal needs to be upgraded so that it can send the commands required for that control. Since the number of terminals is large and they are widely distributed, upgrading the terminals is not easy. The following takes a video conference as an example. The existing implementation of a video conference is to add several terminals to a conference by setting the relevant configuration of the conference. After the conference starts, the Multipoint Conference Unit (MCU) first performs signaling interaction with each terminal to determine the format of each terminal, and then each terminal sends its own image compressed code stream to the MCU according to the negotiated format; the MCU decodes the code streams transmitted by the terminals, encodes and compresses the synthesized image, and sends it back to each terminal, so that the image received by every terminal is the same.
In order to enable each terminal to customize the number of multi-pictures according to their own needs, it is necessary to add a universal port.
(universe port) function. The existing implementation method is as follows. On the terminal side, in addition to its previous functions, each terminal needs to add signaling processing for acquiring and handling multi-picture-number adjustments: it must receive the user's request in real time and transmit the multi-picture-number information set by the user to the MCU through the signaling channel. On the MCU side, the Multipoint Processor (MP) receives the multi-picture adjustment signaling of each terminal and then adjusts the transfer relationships between the decoding nodes and the encoding nodes; after the corresponding multi-picture synthesis is performed according to the new relationships, the image is encoded and sent back to each terminal. This completes the universe port function: the image received by each terminal can be a customized image and can differ between terminals. However, this approach also requires upgrading the terminals to support the function, which brings a series of problems: for example, all previously sold terminals would need large-scale modification and upgrading; signaling interaction between the terminal and the MCU adds an extra burden on the signaling channel; and terminals from other manufacturers cannot implement this function.
SUMMARY OF THE INVENTION Embodiments of the present invention provide a parameter control method and apparatus to solve at least the above problems. According to an aspect of an embodiment of the present invention, a parameter control method is provided, comprising the steps of: parsing audio data and/or video data from a terminal; determining that the audio in the audio data and/or the picture in the video data includes an audio and/or picture for indicating adjustment of a parameter; and adjusting the parameter according to the audio and/or picture for indicating adjustment of the parameter. It is determined, within a preset period of time, that the audio in the audio data and/or the picture in the video data includes the audio and/or picture for indicating adjustment of a parameter. In the case that the parameter is a parameter used to send a media stream to the terminal, after the parameter is adjusted according to the audio and/or picture for indicating adjustment of the parameter, the media stream is sent to the terminal using the adjusted parameter. When the media stream is a video stream, the parameter includes at least one of the following: the number of terminal pictures included in the video stream, the layout of the video stream displayed on the terminal, the frame rate of the video stream, the bit rate of the video stream, and the format of the video stream. An image linear code is obtained after parsing the video data from the terminal; it is determined that the image linear code includes a picture for indicating adjustment of the parameter.
According to another aspect of an embodiment of the present invention, a parameter control apparatus is provided, comprising: a parsing module configured to parse audio data and/or video data from a terminal; a judging module configured to determine that the audio in the audio data and/or the picture in the video data includes an audio and/or picture for indicating adjustment of a parameter; and an adjustment module configured to adjust the parameter according to the audio and/or picture for indicating adjustment of the parameter. The judging module is configured to determine, within a preset period of time, that the audio in the audio data and/or the picture in the video data includes the audio and/or picture for indicating adjustment of a parameter. In the case that the parameter is a parameter used to send a media stream to the terminal, the adjustment module is configured to, after adjusting the parameter according to the audio and/or picture for indicating adjustment of the parameter, send the media stream to the terminal using the adjusted parameter. When the media stream is a video stream, the parameter includes at least one of the following: the number of terminal pictures included in the video stream, the layout of the video stream displayed on the terminal, the frame rate of the video stream, the bit rate of the video stream, and the format of the video stream. The parsing module is configured to parse the video data from the terminal to obtain an image linear code; the judging module is configured to determine that the image linear code includes a picture for indicating adjustment of the parameter.
Through the embodiments of the present invention, audio data and/or video data from the terminal are parsed; it is determined that the audio data and/or the pictures in the video data contain audio and/or a picture indicating that a parameter is to be adjusted; and the parameter is adjusted accordingly. This solves the prior-art problem that the terminal itself must be upgraded whenever a control function is to be added, and achieves the effect of extending the control functions available to the terminal without modifying the terminal. BRIEF DESCRIPTION OF THE DRAWINGS The accompanying drawings are included to provide a further understanding of the present invention and constitute a part of this application; the exemplary embodiments of the present invention and their description serve to explain the present invention and do not unduly limit it. In the drawings: FIG. 1 is a flowchart of a parameter control method according to an embodiment of the present invention; FIG. 2 is a structural block diagram of a parameter control apparatus according to an embodiment of the present invention; FIG. 3 is a flowchart of the overall system processing of a media stream processing method according to a preferred embodiment of the present invention; FIG. 4 is a flowchart of the MCU side of a media stream processing method according to a preferred embodiment of the present invention; FIG. 5 shows the downward and upward gestures of a gesture-recognition multi-picture method according to an embodiment of the present invention; FIG. 6 is a three-picture layout diagram of a gesture-recognition multi-picture method according to an embodiment of the present invention; and FIG. 7 is a four-picture layout diagram of a gesture-recognition multi-picture method according to an embodiment of the present invention. BEST MODE FOR CARRYING OUT THE INVENTION Hereinafter, the present invention will be described in detail with reference to the accompanying drawings and in conjunction with the embodiments. It should be noted that, in the absence of conflict, the embodiments in the present application and the features in the embodiments may be combined with each other.
For the server, control of a given service is reflected in the adjustment of parameters; that is, control is achieved by adjusting different parameters. In this embodiment, a parameter control method is provided. FIG. 1 is a flowchart of a parameter control method according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps: Step S102, parsing audio data and/or video data from a terminal; Step S104, determining that the audio data and/or the pictures in the video data contain audio and/or a picture indicating that a parameter is to be adjusted; Step S106, adjusting the parameter according to that audio and/or picture. With the above steps, only the server needs to change: the terminal can already transmit video through its camera and audio through its microphone, so the terminal requires no modification at all — the user merely performs an action in front of the camera or gives a voice instruction through the microphone. Compared with the prior art, this avoids extra signaling interaction and to some extent extends the control functions available to the terminal. In implementation, parsing all of the audio and video data may slightly increase the load on the server. Although actual tests show the impact is small, to avoid even this possibility, only the audio and video data within certain agreed time periods may be parsed. For example, it can be agreed in advance that, out of every 10 minutes of audio and video data, the first 5 minutes are not parsed and the last 5 minutes are; within each hour the user can then issue control actions or voice commands during minutes 5-10, 15-20, 25-30, 35-40, 45-50, and 55-60.
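The alternating parse window agreed above (skip the first 5 minutes of every 10, parse the last 5) can be sketched as a simple gate. The constants and function name below are illustrative assumptions, not part of the patent:

```python
CYCLE_SECONDS = 600   # agreed 10-minute cycle
PARSE_OFFSET = 300    # recognition runs only in the second half of each cycle

def in_parse_window(elapsed_seconds):
    """Return True when the server should parse media data for control
    gestures or voice commands, per the agreed schedule."""
    return (elapsed_seconds % CYCLE_SECONDS) >= PARSE_OFFSET
```

For a conference clock starting at 0, minutes 5-10, 15-20, and so on fall in the window, matching the schedule described in the text.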
In a further preferred embodiment, all audio and video data can be parsed at the start, until a picture or audio indicating a change of parsing mode is recognized. The preset period can of course also be chosen in advance according to the specifics of the conference: a shorter interval if fast updates are wanted, a longer one if not. The following takes as an example the case where the parameter is one used for sending a media stream to the terminal. After the server adjusts the parameter according to the audio and/or picture indicating the adjustment, it sends the media stream (for example, an audio stream or a video stream) to the terminal using the adjusted parameter. In this way, for example, control over the picture received by the terminal can be implemented through audio and/or pictures. When the media stream is a video stream, the parameter includes at least one of: the number of terminal pictures included in the video stream, the layout in which the video stream is displayed on the terminal, the frame rate of the video stream, the bit rate of the video stream, and the format of the video stream. For parsing the video data, a preferred manner is provided in this embodiment: an image linear code is obtained after parsing the video data from the terminal, and it is determined that the image linear code contains a picture indicating that the parameter is to be adjusted. Because linear codes are relatively easy to identify, the judging process is simple. This embodiment also provides a parameter control apparatus for implementing the above embodiments and preferred implementations; what has already been described is not repeated. FIG. 2 is a structural block diagram of a parameter control apparatus according to an embodiment of the present invention.
As shown in FIG. 2, the apparatus includes a parsing module 22, a judging module 24, and an adjustment module 26, described below. The parsing module 22 is configured to parse the audio data and/or video data from the terminal. The judging module 24 is connected to the parsing module 22 and is configured to determine that the audio data and/or the pictures in the video data contain audio and/or a picture indicating that a parameter is to be adjusted. The adjustment module 26 is connected to the judging module 24 and is configured to adjust the parameter according to that audio and/or picture. Preferably, the judging module 24 is configured to make this determination within a preset time period. Where the parameter is one used for sending a media stream to the terminal, the adjustment module 26 is configured to send the media stream to the terminal using the adjusted parameter after adjusting the parameter according to the indicating audio and/or picture. When the media stream is a video stream, the parameter includes at least one of: the number of terminal pictures included in the video stream, the layout in which the video stream is displayed on the terminal, the frame rate of the video stream, the bit rate of the video stream, and the format of the video stream. Preferably, the parsing module 22 is configured to obtain an image linear code by parsing the video data from the terminal, and the judging module 24 is configured to determine that the image linear code contains a picture indicating that the parameter is to be adjusted. A conference television system is described below as an example in conjunction with a preferred embodiment.
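Before turning to that example, the cooperation of the three modules can be sketched as a minimal parse → judge → adjust pipeline. All names and the gesture-to-parameter mapping here are assumptions for illustration, not from the patent; the recognizer is stubbed as a lookup so that only the control flow is shown:

```python
# Stubbed recognition result: a decoded frame may carry a control gesture.
GESTURE_TO_PICTURE_COUNT = {"three_picture": 3, "four_picture": 4}

def parse_frame(frame):
    """Parsing module (22): extract a candidate control gesture, if any."""
    return frame.get("gesture")

def is_adjustment(gesture):
    """Judging module (24): is this a recognized parameter-adjustment gesture?"""
    return gesture in GESTURE_TO_PICTURE_COUNT

def adjust_params(params, gesture):
    """Adjustment module (26): apply the requested change to the stream parameters."""
    updated = dict(params)
    updated["picture_count"] = GESTURE_TO_PICTURE_COUNT[gesture]
    return updated

def handle_frame(params, frame):
    gesture = parse_frame(frame)
    if is_adjustment(gesture):
        return adjust_params(params, gesture)
    return params
```

A frame with no recognized gesture leaves the parameters untouched; a recognized gesture produces an adjusted copy of the parameters.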
The preferred embodiment provides a method in which, in a conference television system, a gesture image is acquired with the terminal's existing camera and the number of pictures in the multi-picture layout is adjusted on the MCU side through gesture recognition (or other audio and/or video recognition). When this function is implemented, no modification to the terminal is required; a gesture recognition module is merely added to the video processing unit (VPU) on the MCU side to recognize, from the terminal images, the requested multi-picture adjustment. The adjusted multi-picture information is then transmitted to the MP, which, on receiving the signaling, adjusts the transmission relationships between the decode nodes and the encode nodes. The functions involved in the technical solution of this embodiment are: (1) the conference control interface sets the conference-related configuration; (2) the MCU and the terminals perform signaling interaction on the coding format; (3) each terminal sends its compressed image code stream to the MCU; (4) the MCU decodes the received code streams to obtain image linear codes; (5) the MCU performs multi-picture synthesis and re-encoding on the decoded images of the terminals; (6) the MCU sends the encoded code stream back to the terminals; (7) the VPU module in the MCU performs gesture recognition on the decoded linear codes; (8) if the VPU module recognizes multi-picture adjustment information, it sends the corresponding signaling to the MP module of the MCU; (9) the MP module receives the signaling and adjusts the corresponding codec-node relationships. FIG. 3 is a flowchart of the overall system processing of the media stream processing method according to a preferred embodiment of the present invention. As shown in FIG. 3, the process includes the following steps: Step S302, a user sets the conference-related configuration through a WEB interface, including how many terminals will join the conference, each terminal's format and bit rate, the number of pictures in the conference's multi-picture layout, and so on; Step S304, the MCU performs signaling interaction with each terminal on its encoding format according to the conference settings;
Step S306, if the image format signaling interaction is unsuccessful, the terminal does not join the conference; if the interaction is successful, Step S3062, the terminal and the MCU agree on an image format; Step S3064, the terminal sends its compressed image code stream to the MCU in the corresponding format; Step S308, the VPU module in the MCU decodes the received compressed image code streams of the terminals to obtain each terminal's image linear code; Step S310, the VPU performs gesture recognition on each terminal's decoded image linear code; Step S312, it is judged whether there is multi-picture adjustment information: if there is no multi-picture adjustment gesture, the process jumps directly to Step S316; if there is an adjustment gesture, Step S314 is performed.
Step S314, the corresponding multi-picture adjustment information is acquired; the VPU module sends the recognized multi-picture adjustment information to the MP module of the MCU, which receives the signaling and adjusts the corresponding codec-node relationships; Step S316, the MCU performs multi-picture synthesis and re-encoding on the decoded images of the terminals according to the codec relationships; Step S318, the encoded code stream is sent back to the terminals; Step S3182, each terminal receives the code stream, decodes it, and displays it. With the method of this preferred embodiment, the terminal code does not need to be modified at all, no burden is added to the signaling channel between the terminal and the MCU, no large-scale terminal upgrade is needed, and other terminals can provide the same function — none of which the prior art can achieve. In addition, the following extensions are possible: (1) the multi-picture layout, the frame rate, the bit rate, and the image format can all be adjusted; (2) any future extension of the signaling interaction between the terminal and the MCU can be controlled by designing a corresponding gesture; (3) besides gestures, voice control can be implemented in the same way; (4) more generally, any control based on some form of signal recognition can be implemented in this way. FIG. 4 is a flowchart of the MCU side of the media stream processing method according to a preferred embodiment of the present invention. As shown in FIG. 4, the MCU side includes an MP and a digital sound processor (DSP), and the process includes the following steps: Step S402, the MP obtains the conference-related configuration from the conference control interface, including how many terminals will join the conference, each terminal's format and bit rate, the number of pictures in the conference's multi-picture layout, and so on; Step S404, the MP performs signaling interaction with each terminal on its encoding format according to the conference settings; Step S406, if the image format signaling interaction is unsuccessful, the terminal does not join the conference; if the interaction is successful, the MP receives the compressed image code stream sent by the terminal in the corresponding format and forwards it to the DSP; Step S4062, the DSP receives the terminal code streams; Step S408, the VPU module in the DSP decodes the received compressed image code streams of the terminals to obtain each terminal's image linear code; Step S410, the VPU performs gesture recognition on each terminal's decoded image linear code; Step S412, it is judged whether the image linear code contains a multi-picture adjustment: if there is no multi-picture adjustment gesture, the process jumps directly to Step S420; if there is an adjustment gesture, Step S414 is performed; Step S414, it is judged whether the gesture is legal: if not, the process goes directly to Step S420; if it is, the recognized multi-picture adjustment information is sent to the MP module; Step S416, the MP receives the corresponding multi-picture adjustment information sent by the DSP; Step S418, the corresponding relationships between the codec nodes are adjusted; Step S420, the DSP performs multi-picture synthesis and re-encoding on the decoded images of the terminals according to the codec relationships; Step S422, the encoded code stream is sent back to each terminal.
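Steps S410-S418 amount to a recognize → validate → forward loop on the DSP side, with the MP consuming the result. The following sketch stubs the recognizer and assumes a legality rule of picture counts 1-4; all names and that rule are illustrative assumptions, not from the patent:

```python
LEGAL_PICTURE_COUNTS = {1, 2, 3, 4}   # assumed legality rule for gestures

def recognize_gesture(decoded_image):
    """Stand-in for gesture recognition on the decoded linear code: the image
    may carry a requested picture count, or None if no gesture is present."""
    return decoded_image.get("requested_pictures")

def process_terminal(decoded_image, mp_inbox):
    """S410-S414: recognize, validate, and forward adjustment info to the MP."""
    requested = recognize_gesture(decoded_image)
    if requested is None:                        # S412: no adjustment gesture
        return False
    if requested not in LEGAL_PICTURE_COUNTS:    # S414: illegal gesture, ignore
        return False
    mp_inbox.append(requested)                   # S414 -> S416: notify the MP
    return True

def mp_apply(mp_inbox, node_relations, terminal_id):
    """S416-S418: the MP consumes the adjustment and updates node relations."""
    if mp_inbox:
        node_relations[terminal_id] = mp_inbox.pop(0)
    return node_relations
```

Only a legal gesture reaches the MP; illegal or absent gestures fall through to the unchanged synthesis step (S420), as in the flowchart.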
The method of adjusting the multi-picture layout through gesture recognition in a conference television system is illustrated below with the gestures captured by a terminal and the pictures presented to the terminals. The system used in this preferred embodiment is a conference television system processed according to the flowcharts of the embodiments above. The user first convenes a four-picture 720P (1280x720), 30-frame conference of four terminals A, B, C, and D. The conference control interface sends this information to the MP module on the MCU side, which then performs 720P (1280x720), 30-frame signaling interaction with the terminals. If all four terminals support this format, the interaction succeeds, and terminals A, B, C, and D send 720P (1280x720), 30-frame compressed image code streams to the MCU; assume that at this point no user has yet controlled the multi-picture layout by gesture. On receiving the code streams, the MCU sends all four compressed image streams to the VPU module on the MCU side. The VPU decodes them with four decode nodes and performs gesture recognition on each; no multi-picture adjustment information is found, so the four decode nodes send their image linear codes to the same encode node a. Encode node a scales each image down to one quarter, composes the multi-picture layout of FIG. 7, encodes the composed image linear code, and returns it to the terminals. Each terminal decodes and displays the received multi-picture compressed code stream, so all four users see the four-picture image of FIG. 7. Now suppose the user at terminal A wants to see the images of only three terminals, i.e., a three-picture layout. The user makes one of the two gestures shown in FIG. 5 in front of the camera (either will do), and the terminal sends the image containing the gesture to the MCU. On receiving the code stream containing the gesture information, the MCU again sends all four compressed image streams to the VPU module, which decodes them with four decode nodes and performs gesture recognition on each. Decode nodes B, C, and D find no multi-picture adjustment information, but decode node A finds that terminal A requests a three-picture adjustment and sends this information to the MP. The MP adds an encode node b to encode the image required by terminal A and adjusts the codec-node relationships: besides sending their image linear codes to encode node a for multi-picture synthesis, decode nodes A, B, and C now also send their image linear codes to encode node b. After synthesis, encode node a scales each image down to one quarter, composes the layout of FIG. 7, encodes the image linear code, and returns it to terminals B, C, and D; encode node b scales each image down to one quarter, composes the layout of FIG. 6, encodes the image linear code, and returns it to terminal A. Terminal A receives the compressed code stream from encode node b and, after decoding and display, sees the three-picture image of FIG. 6; terminals B, C, and D receive the compressed code stream from encode node a and see the four-picture image of FIG. 7. This completes one operation of reducing the number of pictures through gesture recognition.
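The encode-node fan-out in the worked example — everyone on node a until terminal A's gesture adds a node b fed by decode nodes A, B, and C — can be sketched as a small routing computation. Node names follow the example; the data structures and function name are illustrative assumptions:

```python
def build_relations(adjustments):
    """Map each encode node to the decode nodes it composes, and each terminal
    to the encode node whose stream it receives.

    `adjustments` maps a requesting terminal to the source terminals it wants
    composed (empty -> everyone stays on the default four-picture node "a").
    """
    relations = {"a": ["A", "B", "C", "D"]}        # default four-picture layout
    routing = {t: "a" for t in ["A", "B", "C", "D"]}
    next_node = "b"                                 # new nodes: b, c, ...
    for terminal, sources in adjustments.items():
        relations[next_node] = sources              # new encode node's inputs
        routing[terminal] = next_node               # only the requester switches
        next_node = chr(ord(next_node) + 1)
    return relations, routing

# Terminal A requests a three-picture layout composed from A, B, and C.
relations, routing = build_relations({"A": ["A", "B", "C"]})
```

With this adjustment, decode nodes A, B, and C feed both encode nodes, terminal A receives node b's three-picture stream, and B, C, and D keep receiving node a's four-picture stream — matching the example above.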
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented by a general-purpose computing device; they may be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they may be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they may be fabricated as individual integrated circuit modules, or multiple of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software. The above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims

1. A parameter control method, comprising the following steps:
parsing audio data and/or video data from a terminal;
determining that the audio data and/or the pictures in the video data contain audio and/or a picture indicating that a parameter is to be adjusted;
adjusting the parameter according to the audio and/or picture indicating that the parameter is to be adjusted.
2. The method according to claim 1, wherein it is determined within a preset time period that the audio data and/or the pictures in the video data contain the audio and/or picture indicating that the parameter is to be adjusted.
3. The method according to claim 1, wherein, in the case that the parameter is one used for sending a media stream to the terminal, after the parameter is adjusted according to the audio and/or picture indicating that the parameter is to be adjusted, the media stream is sent to the terminal using the adjusted parameter.
4. The method according to claim 3, wherein, when the media stream is a video stream, the parameter comprises at least one of: the number of terminal pictures included in the video stream, the layout in which the video stream is displayed on the terminal, the frame rate of the video stream, the bit rate of the video stream, and the format of the video stream.
5. The method according to any one of claims 1 to 4, wherein an image linear code is obtained after parsing the video data from the terminal, and it is determined that the image linear code contains a picture indicating that the parameter is to be adjusted.
6. A parameter control apparatus, comprising:
a parsing module configured to parse audio data and/or video data from a terminal;
a judging module configured to determine that the audio data and/or the pictures in the video data contain audio and/or a picture indicating that a parameter is to be adjusted;
an adjustment module configured to adjust the parameter according to the audio and/or picture indicating that the parameter is to be adjusted.
7. The apparatus according to claim 6, wherein the judging module is configured to determine, within a preset time period, that the audio data and/or the pictures in the video data contain the audio and/or picture indicating that the parameter is to be adjusted.
8. The apparatus according to claim 6, wherein, in the case that the parameter is one used for sending a media stream to the terminal, the adjustment module is configured to send the media stream to the terminal using the adjusted parameter after adjusting the parameter according to the audio and/or picture indicating that the parameter is to be adjusted.
9. The apparatus according to claim 8, wherein, when the media stream is a video stream, the parameter comprises at least one of: the number of terminal pictures included in the video stream, the layout in which the video stream is displayed on the terminal, the frame rate of the video stream, the bit rate of the video stream, and the format of the video stream.
10. The apparatus according to any one of claims 6 to 9, wherein the parsing module is configured to obtain an image linear code after parsing the video data from the terminal, and the judging module is configured to determine that the image linear code contains a picture indicating that the parameter is to be adjusted.
PCT/CN2012/074031 2011-06-20 2012-04-13 Parameter control method and device WO2012174931A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2011101656743A CN102256099A (en) 2011-06-20 2011-06-20 Parameter control method and device
CN201110165674.3 2011-06-20

Publications (1)

Publication Number Publication Date
WO2012174931A1 true WO2012174931A1 (en) 2012-12-27



Also Published As

Publication number Publication date
CN102256099A (en) 2011-11-23

