CN113301367A - Audio and video processing method, device and system and storage medium - Google Patents

Audio and video processing method, device and system and storage medium

Info

Publication number
CN113301367A
CN113301367A (application CN202110308870.5A)
Authority
CN
China
Prior art keywords
image data
camera
camera module
display screen
audio
Prior art date
Legal status
Pending
Application number
CN202110308870.5A
Other languages
Chinese (zh)
Inventor
郑坤坤
吴思琦
洪梦初
Current Assignee
Alibaba Innovation Co
Original Assignee
Alibaba Singapore Holdings Pte Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Singapore Holdings Pte Ltd
Priority to CN202110308870.5A
Publication of CN113301367A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/695 Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects

Abstract

The embodiments of the present application provide an audio and video processing method, device, system, and storage medium. When image data is collected, a movable camera is used, and the relative position of the camera is adaptively adjusted according to a reference mark (such as the line of sight) of the photographed subject, so that the camera position matches that reference mark and the subject's gaze exhibits essentially no offset in the captured picture.

Description

Audio and video processing method, device and system and storage medium
Technical Field
The present application relates to the field of audio and video processing technologies, and in particular, to an audio and video processing method, device, system, and storage medium.
Background
With the development of internet and 4G/5G communication technology, large numbers of audio/video pictures are generated in application scenes such as live broadcast, online education, recorded broadcast, and video conferencing. In these pictures, the subject's line of sight often deviates, for example pointing downward. This is especially problematic in live scenes with interaction requirements: if the anchor's line of sight in the live frame is always directed downward or elsewhere instead of meeting the gaze of the watching user, no eye contact is established between them and the user experience is poor. Therefore, line-of-sight deviation in audio/video pictures is a problem that urgently needs to be solved in a variety of audio/video scenes.
Disclosure of Invention
Aspects of the present application provide an audio and video processing method, device, system, and storage medium to solve the problem of human line of sight deviation in an audio and video picture.
An embodiment of the present application provides an audio/video acquisition device, including: a device body provided with a movable structure, on which a camera module is mounted. The movable structure can adjust the relative position of the camera module so that the camera module is adapted to the reference mark of the photographed subject; the camera module collects and outputs image data containing the subject from a position matched with the reference mark.
An embodiment of the present application further provides an audio/video processing system, including: the device comprises audio and video acquisition equipment and a display screen in communication connection with the audio and video acquisition equipment; the audio and video acquisition equipment is hung or fixed on the display screen and comprises a movable structure, a camera module is arranged on the movable structure, and the camera module is positioned in front of the display screen and can move relative to the display screen; and the audio and video acquisition equipment is used for adjusting the relative position of the camera module through the movable structure to enable the camera module to be matched with the reference mark of the shot object, so that the camera module can acquire image data containing the shot object at the position matched with the reference mark and output the image data to the display screen.
An embodiment of the present application further provides an online live broadcast system, including: the system comprises integrated live broadcast equipment and a display screen in communication connection with the live broadcast equipment; the live broadcast equipment is hung or fixed on the display screen and comprises a movable structure, a camera module is arranged on the movable structure, and the camera module is positioned in front of the display screen and can move relative to the display screen; the live broadcast equipment is used for adjusting the relative position of the camera module through the movable structure to enable the camera module to be matched with the reference mark of the anchor broadcast, so that the camera module can collect image data containing the anchor broadcast at the position matched with the reference mark; and synthesizing a live broadcast picture based on the image data, and respectively sending the live broadcast picture to a display screen and a user terminal for displaying.
The embodiment of the application also provides a video conference system, which comprises a plurality of conference participating ends, wherein each conference participating end comprises an integrated conference terminal and a display screen in communication connection with the conference terminal; the conference terminal is hung or fixed on the display screen and comprises a movable structure, a camera module is arranged on the movable structure, and the camera module is positioned in front of the display screen and can move relative to the display screen; the conference terminal is used for adjusting the relative position of the camera module through the movable structure so as to enable the camera module to be matched with the reference mark of the conference speaker, and the camera module is used for collecting image data containing the conference speaker at the position matched with the reference mark; and synthesizing a conference picture based on the image data, and respectively sending the conference picture to a display screen and conference terminals in other conference participants for displaying.
An embodiment of the present application further provides an audio and video processing method, including: adjusting the relative position of a camera module on the audio and video acquisition equipment to enable the camera module to be matched with a reference mark of a shot object, so that the camera module acquires image data containing the shot object at the position matched with the reference mark; and acquiring image data acquired by the camera module and outputting the image data.
The embodiment of the present application further provides an online live broadcasting method, including: adjusting the relative position of a camera module on the live broadcast equipment to enable the camera module to be matched with a reference mark of the anchor, so that the camera module collects image data containing the anchor at the matched position; the method comprises the steps of obtaining image data collected by a camera module, synthesizing a live broadcast picture based on the image data, and sending the live broadcast picture to a user terminal for displaying.
An embodiment of the present application further provides a video conference method, including: adjusting the relative position of a camera module on the conference terminal to enable the camera module to be matched with the reference mark of the conference speaker, so that the camera module collects image data containing the conference speaker at the matched position; and obtaining image data acquired by the camera module, synthesizing a conference picture based on the image data, and sending the conference picture to other conference terminals for displaying.
An embodiment of the present application further provides an audio/video processing system, including: a camera module and a display device; the camera module is suspended or fixed in front of a screen of the display equipment and can move relative to the screen of the display equipment; the camera module is used for adjusting the relative position of the camera module to be matched with a reference mark of a shot object, collecting image data containing the shot object at the position matched with the reference mark and transmitting the image data to the display equipment; a display device for displaying the image data.
An embodiment of the present application further provides a data display method, including: acquiring image data which is acquired by a camera and contains a shot object, wherein the camera is positioned in front of a display screen, and the relative position of the camera is matched with a reference mark of the shot object; and displaying the image data and/or the associated data on the display screen by taking the position of the camera mapped on the display screen as a focus center.
An embodiment of the present application further provides a data processing apparatus, including: a memory and a processor; a memory for storing a computer program; a processor coupled with the memory for executing the computer program for: acquiring image data which is acquired by a camera and contains a shot object, wherein the camera is positioned in front of a display screen, and the relative position of the camera is matched with a reference mark of the shot object; and displaying the image data and/or the associated data on the display screen by taking the position of the camera mapped on the display screen as a focus center.
Embodiments of the present application further provide a computer-readable storage medium storing a computer program, which, when executed by a processor, causes the processor to implement the steps in the methods provided by the embodiments of the present application.
Embodiments of the present application also provide a computer program product, which includes a computer program/instruction, and when the computer program/instruction is executed by a processor, the processor is caused to implement the steps in the methods provided by the embodiments of the present application.
In the embodiments of the present application, a movable camera is used when collecting image data, and the relative position of the camera is adaptively adjusted according to a reference mark (such as the line of sight) of the subject, so that the camera position matches that reference mark.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1a is a perspective structural view of an audio/video capture device according to an exemplary embodiment of the present application;
fig. 1b is a top view of an audio/video capture device provided in an exemplary embodiment of the present application;
fig. 1c is a rear view of an audio/video capture device according to an exemplary embodiment of the present application;
fig. 1d is a bottom view of an audio/video capture device according to an exemplary embodiment of the present application;
fig. 2a is a schematic structural diagram of an audio/video processing system according to an exemplary embodiment of the present application;
fig. 2b is a schematic structural diagram of another audio/video processing system provided in an exemplary embodiment of the present application;
fig. 3a is a schematic diagram showing an audio/video picture and associated data thereof on a display screen;
FIG. 3b is a schematic diagram illustrating interface changes of a display screen when a live-action picture is set;
FIG. 3c is a diagram of the interface layout change of the display screen as the camera module is lowered;
FIG. 3d is another view of the interface layout change of the display screen as the camera module is lowered;
fig. 4 is a schematic structural diagram of an online live broadcast system according to an exemplary embodiment of the present application;
fig. 5 is a schematic structural diagram of a video conference system according to an exemplary embodiment of the present application;
fig. 6a is a schematic flowchart of an audio/video processing method according to an exemplary embodiment of the present application;
fig. 6b is a schematic flowchart of an online live broadcast method according to an exemplary embodiment of the present application;
fig. 6c is a schematic flowchart of a video conference method according to an exemplary embodiment of the present application;
fig. 7 is a schematic structural diagram of another audio/video processing system provided in an exemplary embodiment of the present application;
FIG. 8 is a schematic flow chart diagram illustrating a data display method according to an exemplary embodiment of the present application;
fig. 9 is a schematic structural diagram of a data processing device according to an exemplary embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In various application scenes, such as live broadcast, online education, or video conferencing, large numbers of audio/video pictures are generated. These pictures usually contain a photographed object, referred to simply as the subject: in a live scene, an online education scene, and a video conference scene, the subjects are the anchor, the teacher or students in a class, and the conference participants, respectively. In practical applications, the line of sight of these subjects in the audio/video picture may drift and fail to meet the gaze of the viewing audience, so that no eye contact arises between the user and the anchor and the user experience is poor. Therefore, line-of-sight deviation in audio/video pictures is a problem that urgently needs to be solved in a variety of audio/video scenes.
To address the line-of-sight deviation of the subject in audio/video pictures, some embodiments of the present application use a movable camera when collecting image data and adaptively adjust the camera's relative position according to a reference mark of the subject, so that the camera position matches that reference mark.
Based on the above, an embodiment of the present application provides an audio/video acquisition device 201 with a movable camera; fig. 1a is a perspective structural view of this device. As shown in fig. 1a, the audio/video capture device includes a device body 101 with a movable structure 104 arranged at its bottom; optionally, a processor 103 is also arranged on the device body 101, and a camera module 105 is mounted on the movable structure 104. The camera module 105 includes a camera 105a, which may include, but is not limited to: an optical camera, a monocular camera, a binocular camera, a Red Green Blue Depth (RGBD) camera, a 3D structured light camera, and the like. In one embodiment, the camera 105a is a 3D structured light camera. Its basic principle is that light with certain structural features is projected onto the subject by a laser and collected by cameras located on both sides of the laser. Depending on the depth of each region of the subject, the structured light produces different image phase information, and an arithmetic unit converts the deformation of the structure into depth information, yielding the three-dimensional structure of the subject. Based on this three-dimensional structure, deeper processing such as face unlocking and face beautification can be performed for the subject.
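For illustration only, the structured-light depth recovery described above reduces to a simple triangulation: the projected pattern shifts (disparity) in the captured image in proportion to depth. The sketch below assumes a standard pinhole/baseline model; none of the names or parameters come from the patent.

```python
import numpy as np

def depth_from_disparity(disparity_px: np.ndarray,
                         baseline_m: float,
                         focal_px: float) -> np.ndarray:
    """Triangulate per-pixel depth Z = f * B / d from the observed shift
    (disparity) of the projected structured-light pattern."""
    d = np.where(disparity_px > 0, disparity_px.astype(float), np.nan)
    return focal_px * baseline_m / d  # invalid pixels stay NaN
```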
Further optionally, with reference to fig. 1a to 1d, the audio/video capture device may further include at least one of the following components: a microphone array 102 located on the device body 101, a dot matrix screen 106, physical keys 107, heat dissipation holes 108, and at least one communication interface 109. The microphone array 102 is used for acquiring audio signals adapted to the image data. It should be noted that the component capable of acquiring the audio signal is not limited to the microphone array 102 and may be a single microphone. The dot matrix screen 106 is configured to display status information of the audio/video acquisition device, the camera module, and/or the microphone array, reflecting their current state. Taking a live scene as an example, the status information may be, but is not limited to: live (ONLIVE), picking up audio, OFF, and the like; the "ONLIVE" displayed on the dot matrix screen in fig. 1a indicates that the audio/video capture device is live. The physical keys 107 are used to control the audio/video capture device, for example to power it on or off, and also to extend and retract the movable structure 104. The at least one communication interface 109 is used to connect the audio/video capture device to external devices and may include, but is not limited to: a network interface, a Universal Serial Bus (USB) interface, a display interface, a headphone interface, a microphone interface, and the like, where the display interface may be a High Definition Multimedia Interface (HDMI), a Digital Video Interface (DVI), a Display Port (DP), or the like.
In the present embodiment, the placement of the components on the device body 101 is not limited. As shown in fig. 1a, the dot matrix screen 106 is disposed at the front end of the device body 101 and the physical keys 107 at its side; as shown in the top view of fig. 1b, the heat dissipation holes 108 and the microphone array 102 are arranged on top of the device body 101; as shown in fig. 1c, the communication interface 109 is disposed at the rear end of the device body. Optionally, as shown in fig. 1d, the bottom of the device body 101 is provided with a groove 110 in which the movable structure 104 is mounted, and the movable structure 104 and the camera module 105 can be retracted into the groove 110.
In the present embodiment, the movable structure 104 adjusts the relative position of the camera module 105 by moving, so that it fits the reference mark of the subject. The reference mark of the subject may be a physical feature of the subject, for example the subject's facial features or line of sight; it may also be an attachment on the subject, such as clothing, shoes, or a carried article, but is not limited thereto. The movable structure 104 may move left and right, or up and down. Optionally, the movable structure 104 may be implemented as a lifting structure that moves the camera module 105 up and down; the lifting structure adjusts the height of the camera module 105 by extending and contracting vertically so as to fit the subject's reference mark (e.g., line of sight) in the height direction. Alternatively, the movable structure 104 may be implemented as a horizontal displacement structure that moves the camera module 105 horizontally; this structure adjusts the horizontal position of the camera module 105 by extending left and right to fit the subject's reference mark (e.g., line of sight) in the horizontal direction.
What it means for the camera module 105 to be adapted to the subject's reference mark can differ across application scenes. Taking the line of sight of the subject as the reference mark: adaptation may mean the camera module is flush with the subject's line of sight, so the subject looks directly at the camera; it may mean the camera module sits slightly above the line of sight, so the subject looks up at the camera; or it may mean the camera module sits slightly below the line of sight, so the subject looks down at the camera. The way the relative position of the camera module 105 is adjusted by the movable structure 104 is likewise not limited in this embodiment: the adjustment may be manual, or the processor 103 may automatically control the movement of the movable structure 104.
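As a hedged sketch of the three adaptation modes just described (camera flush with, slightly above, or slightly below the line of sight), a target camera height could be computed as below; the offset values and names are hypothetical, not taken from the patent.

```python
def target_camera_height(eye_level_cm: float, mode: str = "flush") -> float:
    """Return a camera height matching the subject's line of sight under
    one of the three adaptation modes described above."""
    offsets_cm = {
        "flush": 0.0,   # camera level with the line of sight: direct gaze
        "above": 5.0,   # camera slightly higher: subject looks up at it
        "below": -5.0,  # camera slightly lower: subject looks down at it
    }
    return eye_level_cm + offsets_cm[mode]
```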
In this embodiment, the camera module 105 can capture and output image data containing the subject from a position adapted to the subject's reference mark. The way the camera module 105 outputs the image data is not limited in this embodiment; examples are described below.
In an optional embodiment, the camera module 105 may output the image data to a server of the audio/video acquisition device. The server receives the image data and may also receive audio data, collected by the device's microphone array, that is adapted to the image data, for example a sound signal emitted by the subject or an audio signal related to the subject. The server then synthesizes the audio data and the image data into an audio/video picture and sends it to the terminal device for display.
In another optional embodiment, the camera module 105 transmits the image data to the processor 103, which outputs it through the communication interface 109 on the device body 101 to the terminal device for display. Further optionally, where the microphone array 102 is provided on the device body 101, the microphone array 102 may collect an audio signal corresponding to the image data, for example a sound signal emitted by the subject or an audio signal related to the subject, and transmit it to the processor 103; the processor 103 synthesizes the image data and the audio signal into an audio/video picture and sends it to the terminal device for display. The terminal device differs by application scene: in a live scene it is the user terminal watching the audio/video picture, and in a conference scene it is a conference terminal.
Optionally, as shown in fig. 1a, the camera module 105 may further include, in addition to the camera 105a, an indicator light 105b, which can emit a warning signal: for example, the indicator light 105b is normally steady green, while the warning state may be steady yellow or red. The shape of the indicator light 105b is not limited in this embodiment; it may be, for example, a point light, a strip light, or a ring light. In fig. 1a, the indicator light 105b is shown as a ring light surrounding the camera 105a, but this is merely an example. Further optionally, the processor 103 may monitor, based on the image data collected by the camera module, whether the camera's relative position is adapted to the subject's reference mark, and control the indicator light 105b to emit a warning signal when the mismatch persists beyond a set duration threshold, for example 2 s, 3 s, or 5 s, without limitation.
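The duration-threshold warning logic could look like the following minimal sketch. The callbacks `is_adapted` (e.g., a gaze check on the latest frame) and `set_alarm` (driving the indicator light 105b) are hypothetical stand-ins, and the 3-second default simply mirrors the example thresholds above.

```python
import time

def monitor_alignment(is_adapted, set_alarm,
                      threshold_s: float = 3.0, poll_s: float = 0.2) -> None:
    """Trip the indicator-light warning once the camera and the subject's
    reference mark stay mismatched longer than the set duration threshold."""
    misaligned_since = None
    while True:
        if is_adapted():                 # reference mark and camera fit
            misaligned_since = None
            set_alarm(False)             # back to steady green
        else:
            if misaligned_since is None:
                misaligned_since = time.monotonic()
            elif time.monotonic() - misaligned_since > threshold_s:
                set_alarm(True)          # steady yellow/red warning
        time.sleep(poll_s)
```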
The audio and video acquisition device provided by the embodiment can not only provide the image data for the terminal device, but also display the image data on the display screen under the condition that the audio and video acquisition device is used in cooperation with the display screen. Based on this, in an optional embodiment, in the case that the audio/video capture device is externally connected to the display screen, the device body 101 is suspended or fixed on the display screen, and the camera module 105 is located in front of the display screen and is movable. Optionally, the display screen may be connected to the audio/video capture device through a display interface of the audio/video capture device.
Further optionally, the processor 103 is further configured to: and displaying the image data on the display screen according to the camera position mapped on the display screen by the camera module.
Further optionally, with the camera as the center of attention, data priority decreases with distance from the camera, so that the subject can notice important or key information on the display screen while looking at the camera, reducing visual offset. Specifically, when displaying the image data on the display screen according to the camera position of the camera module on the display screen, the processor 103 is specifically configured to: acquire associated data of the image data and determine the priority between the image data and the associated data, where the priority reflects the attention value of the data; and display the image data and the associated data in the surrounding area of the camera position according to priority, with the camera position as the focus center: the higher the priority, and hence the attention value, of a piece of data, the closer its display position on the display screen is to the camera position. The image data may have one or more kinds of associated data, and different associated data may have different priorities.
Further optionally, when the processor 103 displays the image data and the associated data in the peripheral area of the camera position according to the priority with the camera position as the center of attention, the processor is specifically configured to: dividing at least two information distribution areas on a display screen by taking the position of a camera as a focus center, wherein the farther the distance between the information distribution areas and the position of the camera is, the lower the attention degree of the information distribution areas is; and displaying the image data and the related data thereof in the information distribution areas with corresponding attention degrees according to the distances between the at least two information distribution areas and the positions of the cameras and the priority between the image data and the related data thereof.
Further optionally, when the processor 103 displays the image data and the associated data thereof in the information distribution area with the corresponding attention degree according to the distance between the at least two information distribution areas and the camera position and the priority between the image data and the associated data thereof, the processor is specifically configured to: grading the image data and the data in the related data respectively, wherein the higher the grade of the data is, the higher the attention value of the data is; and displaying the data of different grades in the information distribution areas with corresponding attention degrees according to the distances between the at least two information distribution areas and the positions of the cameras and the priority between the image data and the associated data thereof. The display area corresponding to the image data and the associated data on the display screen can be determined according to the priority between the image data and the associated data; for each type of data (either image data or some associated data), the data is displayed in its corresponding display area further according to the ranking between the data in each type of data.
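As a rough illustration of the priority-and-grade placement just described (not the patent's actual algorithm), items could be ranked by category priority and then by grade within the category, and dealt out to zones P0..P(n-1) so that higher-ranked items land closer to the camera position. All field names here are assumptions.

```python
def assign_to_zones(items, n_zones: int = 3):
    """Distribute data items over information distribution areas P0..P(n-1);
    higher priority, then higher grade within a category, lands closer to
    the camera position (zone P0)."""
    layout = {f"P{z}": [] for z in range(n_zones)}
    ranked = sorted(items, key=lambda it: (it["priority"], it["grade"]),
                    reverse=True)
    for rank, item in enumerate(ranked):
        zone = min(rank * n_zones // max(len(ranked), 1), n_zones - 1)
        layout[f"P{zone}"].append(item["name"])
    return layout

# e.g. assign_to_zones([
#     {"name": "commodity info A1", "priority": 2, "grade": 3},
#     {"name": "comment B1",        "priority": 2, "grade": 2},
#     {"name": "live picture",      "priority": 1, "grade": 1},
# ])
```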
In an optional embodiment, the processor 103 is further configured to: in response to a picture setting operation initiated by the subject, adjust the interface position of the image data so that it sits close to the camera position, and/or enlarge the picture so that the subject can view the effect of the settings. Where interaction with the subject is supported, the picture setting operation can be initiated remotely by the subject through a remote control terminal or issued by voice. Where it is not, the picture setting operation may instead be issued by the person in charge of shooting through a remote control terminal or by voice, without limitation.
In an alternative embodiment, the processor 103 is further configured to: responding to an instruction for adjusting the relative position of the camera module, and identifying an interface element on a display screen, wherein the interface element is positioned on the moving path of the camera module; the display position of the interface element on the display screen is adjusted along with the position change of the camera module; and/or adjusting the image data from the current display area to a display area closer to the camera position in response to an interactive request with the subject. When the shot object has an interactive function, the instruction for adjusting the relative position of the camera module can be initiated remotely by the shot object through a remote control terminal, can be sent out by the shot object in a voice mode, or can be sent out manually by the shot object through a physical key.
In an alternative embodiment, the processor 103 is further configured to: and receiving a control instruction or a voice instruction, and executing corresponding control operation according to the control instruction or the voice instruction. The control instruction or the voice instruction may be an instruction for instructing to perform a picture setting operation, may also be an instruction for adjusting a relative position of the camera module, and may also be an instruction for changing a picture background, performing system setting, and the like.
In an optional embodiment, the manner of displaying the image data on the display screen is specifically as follows: and synthesizing the image data and the audio signal adapted to the image data into an audio and video picture, and displaying the audio and video picture on a display screen. The manner of displaying the audio/video image on the display screen is the same as the detailed implementation manner provided in the foregoing and the following embodiments of the present application, and is not described herein again.
The audio/video acquisition device provided by this embodiment uses a movable camera during image data collection and adaptively adjusts the camera's relative position according to the subject's reference mark (such as the line of sight), so that the camera position matches that reference mark. Under this condition, the subject's reference mark (such as the line of sight) in the captured image data exhibits essentially no offset, solving the problem of reference-mark offset in the picture. In particular, in online interactive scenes, this improves the sense of communication between the subject and the user at the other end and enhances that user's experience. Furthermore, the audio/video acquisition device of this embodiment is an integrated software and hardware machine combining a movable camera, a microphone array, and a processor, with a simple structure that is easy to operate.
Based on the above audio/video acquisition device, an embodiment of the present application further provides an audio/video processing system. As shown in fig. 2a, the audio/video processing system includes an audio/video acquisition device 201 and at least one terminal device 204. For the structure of the audio/video acquisition device 201, reference may be made to the foregoing embodiments, which are not repeated here. The terminal device 204 is a terminal that views the audio/video picture and may differ by application scene: in a live scene, it is the user terminal watching the audio/video picture; in a conference scene, it is a conference terminal.
Further, as shown in fig. 2b, the audio/video processing system also includes a display screen 202 communicatively connected to the audio/video acquisition device 201. The display screen 202 can cooperate with the audio/video acquisition device 201 for audio/video processing. The audio/video acquisition device 201 can send the image data over the internet to at least one terminal device 204 for display, and can also send it to the display screen 202 for display, so that the subject can promptly view or learn about the picture quality of the image data. It should be noted that, in the description below, the various actions performed by the audio/video acquisition device 201 are substantially performed by its processor; for convenience of description, the device 201 is taken as the execution subject.
In practical application, on one hand, the display screen 202 is connected through the display interface of the audio/video acquisition device 201, on the other hand, the device body of the audio/video acquisition device 201 can be hung or fixed on the display screen 202, and at this time, the camera module of the audio/video acquisition device 201 is located in front of the display screen 202 and can move relative to the display screen 202, as shown in fig. 2 b.
When image data needs to be generated, the reference mark of the subject is first determined, and the relative position of the camera module on the audio/video acquisition device 201 is adjusted according to that reference mark so as to fit it. The position of the camera module can be adjusted manually. Alternatively, the processor analyzes, from image data collected by the camera module at its current position, the relationship between the subject's reference mark and the camera module's current position; if they do not fit, the processor adjusts the position of the camera module, for example by driving a stepping motor that moves the movable structure, until the camera module fits the subject's reference mark. Further optionally, the processor may adjust the camera module's position multiple times to reach a fit. Note that although figs. 2a and 2b illustrate the reference mark as the height of the line of sight, this is not limiting.
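The processor-driven adjustment can be pictured as a simple closed loop: measure the vertical offset of the subject's eyes in the latest frame and step the motor until the offset falls within tolerance. This is a hedged sketch; `eye_offset_px` (a computer-vision measurement) and `step_motor` (one step of the lift) are hypothetical callbacks, not the patent's interfaces.

```python
def align_camera(eye_offset_px, step_motor,
                 tol_px: float = 10.0, max_iters: int = 20) -> bool:
    """Step the movable structure until the subject's eyes sit within
    tolerance of the frame centre, i.e. the camera fits the reference mark."""
    for _ in range(max_iters):
        err = eye_offset_px()            # signed vertical offset, pixels
        if abs(err) <= tol_px:
            return True                  # adapted; stop moving
        step_motor(+1 if err > 0 else -1)
    return False                         # still unadapted after max_iters
```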
Then, in the case that the camera module is adapted to the reference mark of the subject, the audio/video capture device 201 can capture the image data containing the subject by using the camera module, and on one hand, transmit the image data to the terminal device 204 for display, and on the other hand, can also display the image data on the display screen 202.
In an optional embodiment, the audio/video acquisition device 201 has a microphone array in addition to the movable camera module. While the camera module collects image data containing the subject, the microphone array can collect an audio signal adapted to the image data; the audio/video acquisition device 201 can then synthesize the image data and the audio signal into an audio/video picture which, as shown in fig. 2b, is on the one hand sent to the terminal device 204 for display and on the other hand may also be displayed on the display screen 202. Because the camera module shoots the subject from a position matched with the subject's reference mark, the resulting audio/video picture alleviates, to a certain extent, the offset of the subject's reference mark (such as the line of sight) in the picture; when a user on the terminal device side watches the picture, the sense of communication between the user and the subject is strengthened and the user's experience is enhanced.
In this embodiment, when the audio/video picture is displayed on the display screen 202, it can be displayed with the mapping position of the camera module on the display screen 202 (the camera position, for short) as the focus center. The picture is preferentially displayed in the area near the camera position, so that the subject can see it while watching the camera. The subject can thus learn about the picture in time, for example sensing his or her own state in it, which facilitates timely adjustment and improves the quality of the audio/video picture.
Further optionally, in some application scenarios, in addition to displaying the audio-video picture on the display screen 202, associated data of the audio-video picture may also be displayed. The audio and video picture related data are different according to different application scenes. For example, in a live-broadcast delivery scene, the audio/video picture is a live-broadcast picture, and the associated data of the audio/video picture may be commodity information (for example, commodity numbers, commodity prices, or commodity links) in a live broadcast room, live-broadcast data (average stay time, number of fans, number of praise, amount of bargain, and the like), comment content (message content from fans to a host), and the like. For another example, in an online education scenario, the audio/video screen is a screen in which a teacher gives a lecture, and the associated data of the audio/video screen may be courseware in which the teacher gives a lecture, messages left by students, or broadcast data (the number of people who are present in the lecture, the number of people who are late, or the like). For another example, in a video conference scene, the audio/video screen is a screen of a speaker speaking, and the related data of the audio/video screen is a subtitle of the speaker speaking, a name or a nickname of the current speaker, and the like.
In this embodiment, while recording the audio/video picture, the subject may view the picture or its associated data displayed on the display screen 202 and act on what is viewed. For example, in a live commerce scene, the anchor needs to read fan comments and consultation questions and introduce product details in response; in a teaching scene, a teacher needs to watch student messages in real time and reply to them. However, when the subject views content displayed on the display screen 202, if that content is not within the subject's current line-of-sight range, the subject's gaze may shift, causing a line-of-sight deviation in the audio/video picture. To address this, in some optional embodiments of the present application, the interface layout of the display screen is adjusted so that the data the subject cares most about is displayed within the subject's line-of-sight range during normal recording. This reduces the probability of gaze transitions, reduces line-of-sight deviation in the audio/video picture, and thus protects the viewing experience of users on the terminal device side.
Specifically, the audio/video acquisition device can acquire the associated data of the audio/video picture and determine the priorities among the picture and its associated data. The picture may have one or more kinds of associated data; the picture and each kind of associated data can be regarded as different categories of data, with each category assigned one priority (the picture has one priority and each kind of associated data has its own), where priority reflects the attention value of the data. With the camera position as the focus center, the audio/video picture and its associated data are then displayed in the surrounding area of the camera position according to their priorities: the higher a category's priority, the closer its display position on the display screen is to the camera position, and the more easily the subject notices it. For example, as shown in fig. 3a, in a live commerce scene the audio/video picture is the live picture and its associated data are the commodity shopping bag (containing the details of several commodities for sale), commodity sales data, comment data, and anchor data. In this live scene, the commodity details and the comment data have higher attention values for the anchor, and hence higher priority, so the display areas of the commodity shopping bag and the comment data sit closer to the camera position; the live picture, commodity sales data, and anchor data have relatively lower attention values, so their display areas sit relatively farther from the camera position. Anchor data refers to data related to the anchor, such as the anchor's account and nickname; commodity sales data refers to sales data, such as sales volume, for the commodities the anchor recommends.
In some optional embodiments of the present application, at least two information distribution areas are divided on the display screen with the camera position as the center of attention radiation; the farther an information distribution area is from the camera position, the lower its attention degree. An information distribution area may be a circle, a square, or an ellipse, without limitation. In fig. 3a, the display region is divided into three information distribution areas P0, P1, and P2, where P0 is an ellipse and P1 and P2 are rings. After the information distribution areas are divided, the audio/video picture and its associated data can be displayed in the areas of corresponding attention degree, according to the distances between the areas and the camera position and the priorities between the picture and its associated data. It should be noted that this embodiment does not limit the correspondence between each category of data and the information distribution areas; the correspondence also depends on the shape of the areas, and one category of data may be distributed over one or more areas, with part of its content displayed in the area of corresponding attention degree according to its priority. As shown in fig. 3a, if the shopping bag has high priority, 30% of its content can be displayed in the area P0 with the highest attention value and 60% in the areas P1 and P2 with lower attention values; if the live picture has low priority, it can be displayed in the lower-attention areas P1 and P2.
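As a toy illustration of the ring geometry (simplifying P0 to a circle rather than the ellipse of fig. 3a), membership of a screen coordinate in an information distribution area could be computed as follows; the radii are hypothetical.

```python
import math

def zone_of(point, camera_pos, ring_radii=(200.0, 400.0)) -> str:
    """Return the information distribution area (P0, P1, P2, ...) a screen
    coordinate falls into, with the camera's mapped position as the centre."""
    r = math.dist(point, camera_pos)
    for zone, radius in enumerate(ring_radii):
        if r <= radius:
            return f"P{zone}"
    return f"P{len(ring_radii)}"         # beyond the last ring boundary
```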
Since data distributed around the camera position, with the camera position as the radiation center, is given lower priority the farther it sits from that position, the subject's visual offset can be reduced. Further optionally, the data within the audio/video picture and within each kind of associated data are themselves graded, and the higher a piece of data's grade, the higher its attention value within its category; data of different grades is then displayed in the information distribution areas of corresponding attention degree, according to the distances between the at least two information distribution areas and the camera position and the priorities between the picture and its associated data. As shown in fig. 3a, the shopping bag is first mapped onto the three information distribution areas P0-P2 according to its priority; the commodity details within the shopping bag are then graded according to the order in which the anchor introduces the commodities, giving the ordering: commodity information A1 > commodity information A2 > commodity information A3. Commodity information A1 is displayed in the area P0 with the highest attention degree, A2 in the area P1 with lower attention, and A3 in the area P2 with the lowest attention. When the anchor looks at the information distribution area P0, his or her line of sight still does not leave the camera module, so the anchor can view commodity information A1 while keeping it within the line of sight, alleviating line-of-sight deviation during the live broadcast. Likewise, as shown in fig. 3a, in a live scene the comment data is first mapped onto the three areas P0-P2 according to its priority, and the fan comments are then graded: if comment B1 is the newest fan comment and comment B2 is older than B1, then B1 is ranked higher than B2, so comment B1 can be displayed in the area P0 with the highest attention degree and comment B2 in the areas P1 and P2 with lower attention.
In an optional embodiment, the subject may control the audio/video acquisition device 201 by voice. Specifically, the subject issues a voice instruction to the audio/video acquisition device 201, which executes the corresponding control operation accordingly. For example, in a live scene, the anchor issues the voice instruction "zoom in the lens", and the audio/video acquisition device 201 adjusts the focal length of the lens accordingly; or the anchor issues the voice instruction "put up the link", and the device 201 presents the link to the corresponding product on the display screen 202. In addition, as shown in fig. 2b, the audio/video processing system further includes a control terminal 203, through which the subject can perform a series of control operations on the audio/video acquisition device 201, for example controlling the height of the camera module, adjusting parameters of the display screen 202, or setting parameters of the audio/video acquisition device. In specific application scenarios, the control terminal 203 may also control other parameters; for example, in a live scene it may set live broadcast parameters, co-streaming (mic-connect) parameters, shopping bags, and the like. For far-field interaction scenes, for example a live commerce scene where the anchor broadcasts while standing, a series of control operations can be performed on the audio/video acquisition device through control instructions from the control terminal 203 and/or voice instructions from the subject. This overcomes the inability of touch-screen interaction to serve the far field, ensures smooth far-field interaction, and improves the subject's usability and experience.
In this embodiment, the subject may need to configure the audio/video picture; for example, in a live streaming scene, some parameters of the picture may need to be set. Fig. 3b shows the setting of contrast, saturation, hue, color temperature, sharpness, and the like of the live picture, but the parameters are not limited thereto. During this process, if the display position of the audio/video picture on the display screen does not fall within the subject's line-of-sight range, the subject's gaze may deviate while configuring the picture. For this reason, in some optional embodiments of the present application, the subject initiates a picture setting operation toward the audio/video acquisition device 201 by issuing a voice instruction or by sending a control instruction from the control terminal 203; in response, the audio/video acquisition device 201 adjusts the interface position of the audio/video picture so that it moves close to the camera position, and/or enlarges the picture so that the subject can check the effect of the settings. For example, in a live commerce scene, fig. 3b shows the interface layout of the display screen before and during a picture setting operation on the live picture: before the operation, the live picture sits to the left of its associated data, farther from the camera; during the operation, the live picture is enlarged and moved close to the camera, so that the subject can view the setting effect.
In an optional embodiment, the relative position of the camera module may need to be adjusted, for example because the subject's reference mark (e.g., line of sight) has changed, or because the reference mark is unchanged but the subject wants to look up or down at the lens. For this, the audio/video acquisition device 201 can re-present the interface elements on the display screen in a streaming, self-adaptive manner based on the position of the camera module, ensuring the integrity of the display screen's interface. Specifically, in response to an instruction to adjust the relative position of the camera module, the audio/video acquisition device 201 identifies the interface elements on the display screen 202 that lie on the camera module's movement path, where an interface element is the smallest operable or movable unit on the display interface. Interface elements may be, but are not limited to: floating layers (popups), windows, containers, scroll bars, graphics, and the like, and they can carry the audio/video picture and its associated content. For example, in a live scene, if the content associated with the audio/video picture is a user comment, the comment can be carried in a container to form an interface element, which finally presents the comment on the display screen. The audio/video acquisition device 201 can dynamically adjust the display position of an interface element on the display screen 202 as the camera module's position changes. For example, if an interface element on the camera module's movement path carries user comments, those comments could be blocked as the camera module moves downward; to keep them fully visible on the display screen 202, the interface element carrying them can be moved to the left or right, or its length compressed, and so on.
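The streaming re-layout along the camera's movement path might be sketched as below: any element the descending camera module would cover is slid sideways or compressed so its content stays visible. The rectangle representation and all names are assumptions for illustration, not the patent's implementation.

```python
def reflow_elements(elements, camera_rect, screen_w: int) -> None:
    """Shift or shrink interface elements lying on the camera module's
    movement path so their content remains fully visible on screen."""
    cx, cy, cw, ch = camera_rect
    for el in elements:                  # el["rect"] = (x, y, w, h)
        x, y, w, h = el["rect"]
        overlaps = x < cx + cw and cx < x + w and y < cy + ch and cy < y + h
        if overlaps:
            if cx + cw + w <= screen_w:
                x = cx + cw              # slide the element sideways
            else:
                w = max(w - cw, 0)       # or compress its width to fit
            el["rect"] = (x, y, w, h)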
In an alternative embodiment, the audio/video acquisition device 201 may respond to an interaction request involving the subject. The interaction request differs by application scene: in a live broadcast scene it may be a mic-connect (co-streaming) request; in an online education scene it may be a request from a student to ask the teacher a question. Whatever the interaction request is, while the subject interacts with the user on the terminal device 204 side, the subject may need to view content on the audio/video picture, for example the user's avatar, nickname, or identity document (ID), or to confirm the interaction with the target user according to a prompt on the picture. To ensure that the subject's line of sight does not deviate while viewing the audio/video picture, the audio/video acquisition device 201 may move the picture from its current display area to a display area closer to the camera position. This avoids line-of-sight deviation, improves the sense of communication between the subject and the user viewing the picture, and enhances the user's experience.
In this embodiment, the audio/video processing system may be applied to various application scenarios, such as online live broadcast, online education, and video conferencing. The following describes an online live broadcast scene and a video conference scene in detail as examples.
Fig. 4 is a schematic structural diagram of an online live system according to an exemplary embodiment of the present application. As shown in fig. 4, the online live system includes an integrated live device 401 and a display screen 402 in communication connection with it; the live device 401 is suspended or fixed on the display screen 402, and includes a camera module that is located in front of the display screen and can move relative to the display screen 402.
In this embodiment, the live device 401 can adjust the position of the camera module to adapt to the height of the anchor's line of sight. With the camera module so positioned, it collects image data containing the anchor; the live device 401 then synthesizes a live picture based on the image data, for example by combining it with an adapted audio signal collected by the device's microphone array, and sends the live picture to the display screen 402 and the user terminal 404 for display. In a live scene, the subject is the anchor and the corresponding reference mark may be the anchor's eyes or line of sight. A closed-loop sketch of the height adjustment follows.
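Read as a control loop, the height adjustment might look like the sketch below. The helpers capture_frame, detect_eye_y, and move_camera are assumptions standing in for a face-landmark detector and the lifting-structure driver; they are not APIs from the patent.

```python
from typing import Callable

def adapt_camera_to_gaze(capture_frame: Callable[[], object],
                         detect_eye_y: Callable[[object], float],
                         move_camera: Callable[[float], None],
                         target: float = 0.5, tolerance: float = 0.05,
                         step_mm: float = 2.0, max_steps: int = 100) -> bool:
    """Step the lifting structure until the subject's eyes sit near the frame centre.

    detect_eye_y returns the eyes' vertical position in the frame, normalised to
    [0, 1] with 0 at the top; move_camera takes a signed delta in mm, positive = up.
    """
    for _ in range(max_steps):
        error = detect_eye_y(capture_frame()) - target
        if abs(error) <= tolerance:
            return True  # camera position now adapts to the line of sight
        # eyes appear too low in the frame -> camera sits too high -> move it down
        move_camera(-step_mm if error > 0 else step_mm)
    return False

if __name__ == "__main__":
    # toy demo: eyes start high in the frame; moving the camera up lowers them in-frame
    state = {"eye_y": 0.2}
    ok = adapt_camera_to_gaze(
        capture_frame=lambda: None,
        detect_eye_y=lambda _frame: state["eye_y"],
        move_camera=lambda d: state.update(eye_y=state["eye_y"] + d * 0.01),
    )
    print(ok, round(state["eye_y"], 2))  # True 0.46
```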
In this embodiment, not only the live picture but also its associated data are displayed on the display screen 402. As shown in fig. 3a, the associated data of the live picture includes: merchandise shopping bags, anchor data, merchandise sales data, and comment data. When displaying the live picture and its associated data, the priority of each is determined first; then, with the camera position as the focus center, the live picture and its associated data are displayed in the peripheral area of the camera position such that higher-priority data is placed closer to the camera position.
Further, with the camera position as the focus center, three information distribution areas are divided on the display screen, as shown in fig. 3a. The farther an information distribution area is from the camera position, the lower its attention degree; the live picture and its associated data are displayed, by priority, in the information distribution areas with the corresponding attention degrees.
Further, the data within the live picture and the associated data are each graded. As shown in fig. 3a, the grades within the shopping bag, from high to low, are: commodity information A1 > commodity information A2 > commodity information A3; the grades of the comments in the comment data, from high to low, are: comment B1 > comment B2. A higher grade indicates a higher attention value, and data of different grades are displayed in the information distribution areas with the corresponding attention degrees. A sketch of this zone assignment follows.
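The zone assignment can be sketched as follows. The zone names, item names, and priority numbers are illustrative assumptions (the patent does not fix a particular ordering here); the sketch only shows higher-priority data landing in the zone nearest the camera.

```python
# ordered from nearest to farthest from the camera position
ZONES = ["near", "middle", "far"]

def assign_zones(items):
    """items: list of (name, priority); a lower number means higher priority."""
    ranked = sorted(items, key=lambda it: it[1])
    per_zone = max(1, -(-len(ranked) // len(ZONES)))  # ceiling division
    layout = {zone: [] for zone in ZONES}
    for i, (name, _) in enumerate(ranked):
        layout[ZONES[min(i // per_zone, len(ZONES) - 1)]].append(name)
    return layout

if __name__ == "__main__":
    demo = [("live picture", 0), ("shopping bag", 1), ("anchor data", 2),
            ("sales data", 3), ("comment B1", 4), ("comment B2", 5)]
    # higher-priority data lands in the zone nearest the camera
    print(assign_zones(demo))
```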
In this embodiment, as shown in fig. 4, the online live system further includes a remote control terminal 403 in communication connection with the live device 401. The remote control terminal 403 may be implemented as a smartphone or as a standalone live remote controller. When implemented as a smartphone, a remote-controller interface is provided in the live broadcast application on the phone, through which the anchor can send various control instructions to the integrated live device, thereby controlling it in various ways, such as setting the state of the live picture, controlling the live broadcast, adjusting the camera height, and setting or adjusting picture quality. A sketch of such instruction dispatch follows.
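A minimal dispatch loop for such control instructions might look like the sketch below. The instruction names, the dict-based format, and the LiveDevice methods are hypothetical; a real live broadcast application would define its own control protocol.

```python
class LiveDevice:
    """Hypothetical stand-in for the integrated live device's control surface."""
    def __init__(self):
        self.picture_params = {"contrast": 50, "saturation": 50, "hue": 0}
        self.camera_height_mm = 0.0
        self.streaming = False

    def move_camera(self, delta_mm: float):
        self.camera_height_mm += delta_mm

    def start_streaming(self):
        self.streaming = True

def handle_instruction(device: LiveDevice, instr: dict):
    """Dispatch one control instruction received from the remote control terminal."""
    op = instr.get("op")
    if op == "set_picture":            # contrast / saturation / hue / colour temperature
        device.picture_params.update(instr.get("params", {}))
    elif op == "adjust_camera_height":
        device.move_camera(instr.get("delta_mm", 0.0))
    elif op == "start_live":
        device.start_streaming()
    else:
        raise ValueError(f"unknown instruction: {op!r}")

if __name__ == "__main__":
    dev = LiveDevice()
    handle_instruction(dev, {"op": "set_picture", "params": {"contrast": 60}})
    handle_instruction(dev, {"op": "adjust_camera_height", "delta_mm": -15})
    print(dev.picture_params, dev.camera_height_mm)
```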
In this embodiment, as shown in fig. 3b, the anchor may initiate a picture setting operation to the live device 401 through the remote control terminal 403 so as to adjust the contrast, saturation, hue, color temperature, and sharpness of the live picture. The live device 401 adjusts the interface position of the live picture so that it is close to the camera position and, at the same time, displays it enlarged, so that the anchor can check the picture setting effect in time.
In this embodiment, the anchor can receive a mic-connect (co-streaming) request initiated by a fan through the user terminal. When the anchor co-streams with the fan, the live device 401 can adjust the interface position of the live picture so that it is close to the camera position, which avoids line-of-sight deviation while the anchor views the live picture during the session, improves the sense of communication between the fan and the anchor, and enhances the fan's experience.
In this embodiment, during or before the live broadcast, the position of the camera may need to be adjusted, and when it is, the interface content on the display screen can be reflowed adaptively. As shown in fig. 3c, the entire commodity sales data is carried on one interface element and the entire anchor data on another; when the camera module descends, the lengths of these interface elements are compressed, and the compressed elements for the sales data and the anchor data are displayed on the left and right sides of the camera's lifting path, respectively. As shown in fig. 3d, each comment in the comment data is displayed as an independent interface element, giving elements for comments C1, C2, and C3. When the camera module descends to the position of the element for comment C1, that element may be moved to the right to avoid being blocked by the camera module, while the elements for C2 and C3 need not move yet; when the camera module continues down to the element for comment C2, that element in turn moves to the right, and so on. This improves the utilization of the display screen. A sketch of this stepwise shifting follows.
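The stepwise shifting of fig. 3d can be sketched as follows (self-contained Python; the Comment type and the 80-pixel shift are illustrative assumptions): only the elements whose vertical band the descending camera module currently overlaps are moved to the right.

```python
from dataclasses import dataclass, replace
from typing import List

@dataclass
class Comment:
    name: str
    x: float
    y: float   # top edge of the comment's vertical band
    h: float

def shift_for_camera(comments: List[Comment], cam_y: float, cam_h: float,
                     shift_px: float = 80.0) -> List[Comment]:
    """Shift right only the comments the camera module currently overlaps."""
    return [replace(c, x=c.x + shift_px)
            if c.y < cam_y + cam_h and c.y + c.h > cam_y else c
            for c in comments]

if __name__ == "__main__":
    cs = [Comment("C1", 40, 100, 30), Comment("C2", 40, 140, 30),
          Comment("C3", 40, 180, 30)]
    # camera currently occupies y in [95, 135): only C1 shifts, C2 and C3 stay
    print(shift_for_camera(cs, cam_y=95, cam_h=40))
```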
Fig. 5 is a schematic structural diagram of a video conference system according to an exemplary embodiment of the present application. As shown in fig. 5, the video conference system 500 includes a plurality of conference parties; fig. 5 illustrates three, namely conference parties 501, 502, and 503, but the number is not limited thereto. Each conference party comprises an integrated conference terminal 504 and a display screen 505 in communication connection with it; the conference terminal 504 is suspended or fixed on the display screen 505 and includes a camera module located in front of the display screen 505 that can be adjusted up and down. Only the internal structure of conference party 501 is illustrated in fig. 5; for the internal structures of conference parties 502 and 503, refer to the description of conference party 501, which is not repeated here.
In this embodiment, the conference terminal 504 can adjust the relative position of the camera module to adapt to the line of sight of the conference speaker, so that the camera module collects image data containing the speaker at the adapted position; it then synthesizes a conference picture based on the image data and sends it to the display screen and to the conference terminals of the other conference parties for display. Further, associated data of the conference picture, for example message contents of other conference speakers or their conference pictures, may be displayed on the display screen. For other details of the video conference system, refer to the foregoing embodiments; they are not repeated here. In a video conference scene, the subject is a conference speaker and the corresponding reference mark may be the speaker's eyes or line of sight.
An embodiment of the present application further provides an audio/video processing method, which is applicable to the above-mentioned audio/video acquisition device or audio/video processing system, and as shown in fig. 6a, the method includes:
601a, adjusting the relative position of a camera module on the audio and video acquisition equipment to enable the camera module to be matched with a reference mark of a shot object, so that the camera module acquires image data containing the shot object at the position matched with the reference mark;
and 602a, acquiring image data collected by the camera module and outputting the image data.
In an optional embodiment, the method further comprises: collecting an audio signal adapted to the image data by using a microphone array, and synthesizing the image data and the audio signal into an audio/video picture; accordingly, outputting the image data specifically comprises: sending the audio/video picture to the corresponding terminal device for display. A conceptual sketch of this synthesis follows.
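A conceptual sketch of this synthesis step follows. The pairing-by-timestamp logic is an illustrative assumption; a real device would hand both streams to a hardware muxer or an encoder rather than pairing buffers in Python.

```python
from collections import deque

def synthesize(frames, audio_chunks, max_skew=0.02):
    """frames / audio_chunks: iterables of (timestamp_s, payload), sorted by time.
    Yields (frame, audio) pairs whose timestamps differ by at most max_skew."""
    audio = deque(audio_chunks)
    for t_frame, frame in frames:
        while audio and audio[0][0] < t_frame - max_skew:
            audio.popleft()  # drop audio that is too old for this frame
        if audio and abs(audio[0][0] - t_frame) <= max_skew:
            yield frame, audio[0][1]  # an adapted audio/video pair

if __name__ == "__main__":
    frames = [(0.00, "f0"), (0.04, "f1")]
    chunks = [(0.00, "a0"), (0.02, "a1"), (0.04, "a2")]
    print(list(synthesize(frames, chunks)))  # [('f0', 'a0'), ('f1', 'a1')]
```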
In an optional embodiment, the audio/video acquisition device is suspended or fixed on a display screen to which it is connected, with the camera module located in front of the display screen; outputting the image data/audio-video picture then includes: displaying the image data/audio-video picture on the display screen according to the camera position that the camera module maps onto the display screen.
Further optionally, displaying the image data/audio/video picture on the display screen according to a camera position of the camera module on the display screen, including: acquiring image data/audio and video pictures and associated data of the image data/audio and video pictures, and determining the priority between the image data/audio and video pictures and the associated data, wherein the priority reflects the attention value of the data; taking the position of the camera as a focus center, and displaying the image data/audio/video pictures and the associated data thereof in a surrounding area of the position of the camera according to the priority; wherein, the higher the priority of the data, the closer the display position on the display screen is to the camera position.
Further optionally, taking the camera position as a focus center, displaying the image data/audio/video picture and the associated data thereof in a surrounding area of the camera position according to the priority, including: dividing at least two information distribution areas on a display screen by taking the position of a camera as a focus center, wherein the farther the distance between the information distribution areas and the position of the camera is, the lower the attention degree of the information distribution areas is; and displaying the image data/audio/video picture and the related data thereof in the information distribution area with corresponding attention according to the distance and the priority between the at least two information distribution areas and the position of the camera.
Further optionally, displaying the image data/audio/video picture and the associated data thereof in the information distribution areas with the corresponding attention degrees according to the distance and the priority between the at least two information distribution areas and the position of the camera, including: the data in the image data/audio/video picture and the associated data are respectively graded, and the higher the grade of the data is, the higher the attention value of the data is; and displaying the data of different grades in the information distribution areas with corresponding attention degrees according to the distances and the priorities between the at least two information distribution areas and the positions of the cameras.
In an optional embodiment, the method provided in this embodiment further includes: responding to picture setting operation initiated by a shot object, adjusting the interface position of the image data/audio/video picture to enable the image data/audio/video picture to be close to the position of the camera, and/or amplifying the picture of the image data/audio/video picture to enable the shot object to check the picture setting effect.
In an optional embodiment, the method provided in this embodiment further includes: responding to an instruction for adjusting the position of the camera module, and identifying an interface element on a display screen, wherein the interface element is positioned on a moving path of the camera module; the display position of the interface element on the display screen is adjusted along with the position change of the camera module; and/or adjusting the image data/audio-video picture from the current display area to a display area closer to the position of the camera in response to an interactive request with the shot object.
An embodiment of the present application further provides an online live broadcasting method, as shown in fig. 6b, the method includes:
601b, adjusting the relative position of a camera module on the live broadcast equipment to enable the camera module to be matched with the sight of the anchor, so that the camera module collects image data containing the anchor at the matched position;
and 602b, acquiring image data acquired by the camera module, synthesizing a live broadcast picture based on the image data, and sending the live broadcast picture to the user terminal for displaying.
In an optional embodiment, the method further comprises: acquiring an audio signal adapted to the image data by using a microphone array, and synthesizing the image data and the audio signal into a live broadcast picture.
An embodiment of the present application further provides a video conferencing method, as shown in fig. 6c, the method includes:
601c, adjusting the relative position of a camera module on the conference terminal to enable the camera module to be matched with the sight of a conference speaker, so that the camera module collects image data containing the conference speaker at the matched position;
and 602c, acquiring image data acquired by the camera module, synthesizing a conference picture based on the image data, and sending the conference picture to other conference terminals for displaying.
In an optional embodiment, the method further comprises: acquiring an audio signal adapted to the image data by using a microphone array, and synthesizing the image data and the audio signal into a conference picture.
According to the audio/video processing method provided by the embodiments of the present application, a movable camera is used when collecting image data, and the relative position of the camera is adaptively adjusted according to the reference mark (such as the line of sight) of the subject so that it adapts to that reference mark. In this case, the reference mark of the subject in the collected image data is essentially free of offset, which solves the problem of reference-mark offset in the audio/video picture. Especially in some online interactive scenes, this improves the sense of communication between the subject and the peer user and enhances the peer user's experience.
In addition to the foregoing system embodiments, an embodiment of the present application further provides an audio/video processing system. As shown in fig. 7, the system includes a camera module 702 and a display device 703; the camera module 702 is suspended or fixed in front of the screen of the display device 703 and is movable relative to that screen.
In this embodiment, the position of the camera module 702 is adjustable. When collecting image data of the subject, the position of the camera module 702 is adjusted to adapt to the reference mark of the subject; at the adapted position, the camera module 702 can collect image data containing the subject and transmit it to the display device 703, which sends the image data to its screen for display.
Optionally, as shown in fig. 7, the audio/video processing system further includes an audio acquisition device 701 configured to collect audio data adapted to the image data and send it to the display device 703. The display device 703 may further synthesize the image data and the audio data into an audio/video picture and provide it to the terminal device 704 or send it to its own screen for display.
In an alternative embodiment, the display device 703 can also display the audio/video picture on its screen according to the position that the camera module 702 maps onto the screen (i.e., the camera position). Further, the display device 703 may display the audio/video picture and its associated data on its screen with the camera position as the attention radiation center. The audio acquisition device 701 may be a microphone array, a microphone, or a sound pickup device including either. The difference between the audio/video processing system provided in this embodiment and those shown in fig. 2a and fig. 2b lies in the execution body that generates the audio/video picture and that displays it and its associated data with the camera position as the attention radiation center; the other details are the same as in the foregoing embodiments and are not repeated here.
Further, an embodiment of the present application provides a data display method applicable to an audio/video processing system that includes an external camera and a display screen, where the camera is located in front of the display screen and can be adjusted up and down, such as, but not limited to, the systems provided in the foregoing embodiments.
As shown in fig. 8, the method includes:
801. acquiring image data which is acquired by a camera and contains a shot object, wherein the camera is positioned in front of a display screen, and the relative position of the camera is matched with a reference mark of the shot object;
802. and displaying the image data and/or the associated data on the display screen by taking the position of the camera mapped on the display screen as a focus center.
For detailed implementation of steps 801-802, reference may be made to the foregoing embodiments, which are not described herein again.
According to the data display method provided by the embodiments of the present application, a movable camera is used when collecting image data, and the relative position of the camera is adaptively adjusted according to the reference mark (such as the line of sight) of the subject so that it adapts to that reference mark. In this case, the reference mark of the subject in the collected image data is essentially free of offset, which solves the problem of reference-mark offset in the audio/video picture. Furthermore, based on the camera position on the display screen, an interface layout with the camera as the attention radiation center can be realized: data priority decreases with distance from the camera, so information that needs priority attention is displayed close to the camera on the interface. This camera-centred layout further mitigates the line-of-sight deviation problem and improves the reach efficiency of the interface information as a whole. Especially in some online interactive scenes, it improves the sense of communication between the subject and the peer user and enhances the peer user's experience.
It should be noted that the execution subjects of the steps of the methods provided in the above embodiments may be the same device, or different devices may serve as the execution subjects of different steps. For example, the execution subject of steps 601a and 602a may be device A; alternatively, the execution subject of step 601a may be device A while that of step 602a is device B; and so on.
In addition, in some of the flows described in the above embodiments and the drawings, a plurality of operations are included in a specific order, but it should be clearly understood that the operations may be executed out of the order presented herein or in parallel, and the sequence numbers of the operations, such as 601a, 602a, etc., are merely used for distinguishing different operations, and the sequence numbers themselves do not represent any execution order. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that, the descriptions of "first", "second", etc. in this document are used for distinguishing different messages, devices, modules, etc., and do not represent a sequential order, nor limit the types of "first" and "second" to be different.
Fig. 9 is a schematic structural diagram of a data processing device according to another exemplary embodiment of the present application. As shown in fig. 9, the data processing device includes a memory 94 and a processor 95, and may further include a display screen 97.
The memory 94 is used for storing computer programs and may be configured to store other various data to support operations on the data processing apparatus. Examples of such data include instructions for any application or method operating on a data processing device.
The memory 94 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
A processor 95, coupled to the memory 94, for executing computer programs in the memory 94 for: acquiring image data which is acquired by a camera and contains a shot object, wherein the camera is positioned in front of a display screen, and the relative position of the camera is matched with a reference mark of the shot object; the image data and/or its associated data are displayed on the display screen 97 with the camera position where the camera is mapped on the display screen as the center of attention.
In an alternative embodiment, the image data and the audio data adapted to the image data may also be synthesized into an audio-video picture; the audio/video picture and/or its associated data is displayed on the display screen 97 with the camera position of the camera on the display screen as the center of attention.
In an optional embodiment, when the camera position mapped on the display screen by the camera is taken as a focus center, and the image data/audio/video picture and/or its associated data are displayed on the display screen 97, the processor 95 is specifically configured to: acquiring associated data of the image data/audio/video pictures, and determining the priority between the image data/audio/video pictures and the associated data, wherein the priority reflects the attention value of the data; taking the position of the camera as a focus center, and displaying the image data/audio/video pictures and the associated data thereof in a surrounding area of the position of the camera according to the priority; among them, the higher the priority of the data, the closer the display position thereof on the display screen 97 to the camera position.
In an optional embodiment, when the processor 95 displays the image data/audio/video picture and the associated data thereof in the peripheral area of the camera position according to the priority with the camera position as the center of attention, the processor is specifically configured to: dividing at least two information distribution areas on the display screen 97 by taking the position of the camera as a focus center, wherein the farther the distance between the information distribution area and the position of the camera is, the lower the attention degree of the information distribution area is; and displaying the image data/audio/video picture and the related data thereof in the information distribution area with corresponding attention according to the distance and the priority between the at least two information distribution areas and the position of the camera.
In an optional embodiment, when the processor 95 displays the image data/audio/video picture and the associated data thereof in the information distribution area with the corresponding attention degree according to the distance and the priority between the at least two information distribution areas and the position of the camera, the processor is specifically configured to: the data in the image data/audio/video picture and the associated data are respectively graded, and the higher the grade of the data is, the higher the attention value of the data is; and displaying the data of different grades in the information distribution areas with corresponding attention degrees according to the distances and the priorities between the at least two information distribution areas and the positions of the cameras.
In an alternative embodiment, the processor 95 is further configured to: responding to picture setting operation initiated by a shot object, adjusting the interface position of the image data/audio/video picture to enable the image data/audio/video picture to be close to the position of the camera, and/or amplifying the picture of the image data/audio/video picture to enable the shot object to check the picture setting effect.
In an alternative embodiment, the processor 95 is further configured to: responding to an instruction for adjusting the position of the camera module, and identifying an interface element on a display screen, wherein the interface element is positioned on a moving path of the camera module; and adjust the display position of the interface element on the display screen 97 following the position change of the camera module; and/or adjusting the image data/audio-video picture from the current display area to a display area closer to the position of the camera in response to an interactive request with the shot object.
The data processing device provided by the embodiments of the present application uses a movable camera when collecting image data and adaptively adjusts the relative position of the camera according to the reference mark (such as the line of sight) of the subject so that it adapts to that reference mark. In this case, the reference mark of the subject in the collected image data is essentially free of offset, which solves the problem of reference-mark offset in the audio/video picture. Especially in some online interactive scenes, this improves the sense of communication between the subject and the peer user and enhances the peer user's experience.
Further, as shown in fig. 9, the data processing apparatus further includes: communication components 96, power components 98, audio components 99, and the like. Only some of the components are schematically shown in fig. 9, and it is not intended that the data processing apparatus includes only the components shown in fig. 9. It should be noted that the components within the dashed line in fig. 9 are optional components, not necessary components, and may depend on the product form of the data processing apparatus.
Accordingly, the embodiments of the present application also provide a computer readable storage medium storing a computer program, which when executed by a processor, causes the processor to implement the steps in the method embodiments provided by the embodiments of the present application.
Accordingly, an embodiment of the present application also provides a computer program product comprising a computer program/instructions which, when executed by a processor, cause the processor to implement the steps in the methods provided by the embodiments of the present application.
The communication component of fig. 9 described above is configured to facilitate wired or wireless communication between the device in which it is located and other devices. That device can access a wireless network based on a communication standard, such as WiFi, a 2G, 3G, 4G/LTE, or 5G mobile communication network, or a combination thereof. In an exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The display screen in fig. 9 described above may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the display screen includes a touch panel, the display screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
The power supply assembly of fig. 9 described above provides power to the various components of the device in which the power supply assembly is located. The power components may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device in which the power component is located.
The audio component of fig. 9 described above may be configured to output and/or input an audio signal. For example, the audio component includes a Microphone (MIC) configured to receive an external audio signal when the device in which the audio component is located is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in a memory or transmitted via a communication component. In some embodiments, the audio assembly further comprises a speaker for outputting audio signals.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (35)

1. An audio-video capture device, comprising: a device body, wherein the device body is provided with a movable structure, and a camera module is mounted on the movable structure;
wherein the movable structure is capable of adjusting the relative position of the camera module to adapt to a reference mark of a shot object; and the camera module is configured to collect and output image data containing the shot object at the position adapted to the reference mark.
2. The device according to claim 1, wherein the device body is further provided with at least one of a microphone array, a dot matrix screen, a physical key, a heat dissipation hole, and a communication interface; wherein the microphone array is configured to collect an audio signal adapted to the image data; and the dot matrix screen is configured to display state information of the device, the camera module and/or the microphone array.
3. The device of claim 2, wherein the microphone array and the heat dissipation hole are both disposed at the top of the device body, the physical key is disposed at the side of the device body, the dot matrix screen is disposed at the front end of the device body, and the display interface is arranged at the rear end of the device body.
4. The device of claim 1, wherein, in the case that the audio-video capture device is externally connected with a display screen, the device body is suspended or fixed on the display screen, and the camera module is located in front of the display screen and is movable relative to the display screen.
5. The device according to claim 4, wherein the device body is further provided with a processor configured to display the image data on the display screen according to the camera position mapped onto the display screen by the camera module.
6. The device of claim 5, wherein the processor is specifically configured to:
acquiring associated data of the image data, and determining the priority between the image data and the associated data, wherein the priority reflects the attention value of the data;
taking the camera position as a focus center, and displaying the image data and the associated data in a surrounding area of the camera position according to the priority;
and the higher the priority of the data is, the closer the display position of the data on the display screen is to the position of the camera.
7. The device of claim 6, wherein the processor is specifically configured to:
dividing at least two information distribution areas on the display screen by taking the position of the camera as a focus center, wherein the farther the distance between the information distribution areas and the position of the camera is, the lower the focus degree of the information distribution areas is;
and displaying the image data and the associated data in the information distribution areas with corresponding attention according to the distances between the at least two information distribution areas and the positions of the cameras and the priority.
8. The device of claim 7, wherein the processor is specifically configured to:
grading the image data and the data in the related data respectively, wherein the higher the grade of the data is, the higher the attention value of the data is;
and displaying the data of different grades in the information distribution areas with corresponding attention according to the distance between the at least two information distribution areas and the position of the camera and the priority.
9. The device of claim 5, wherein the processor is further configured to:
responding to the picture setting operation initiated by the shot object, adjusting the interface position of the image data to enable the image data to be close to the camera position, and/or carrying out picture amplification on the image data to enable the shot object to view the picture setting effect.
10. The apparatus of any of claims 5-9, wherein the processor is further configured to:
responding to an instruction for adjusting the relative position of the camera module, and identifying an interface element on the display screen, wherein the interface element is positioned on the moving path of the camera module; adjusting the display position of the interface element on the display screen along with the position change of the camera module;
and/or
And adjusting the image data from the current display area to a display area closer to the camera position in response to the interactive request with the shot object.
11. The apparatus of any of claims 5-9, wherein the camera module comprises a camera and a prompt lamp; and the prompt lamp is configured to emit a prompt signal to indicate that the relative position of the camera does not adapt to the reference mark of the shot object.
12. The device of claim 11, wherein the processor is further configured to: and monitoring whether the relative position of the reference mark of the shot object and the camera is matched or not according to the image data, and controlling the prompting lamp to send out a prompting signal under the condition that the unmatched time length exceeds a set time length threshold value when the unmatched time length is monitored.
13. The apparatus of claim 12, wherein the camera is a three-dimensional structured light camera.
14. The apparatus of any of claims 5-9, wherein the processor is further configured to:
receiving a control instruction or a voice instruction, and executing corresponding control operation according to the control instruction or the voice instruction.
15. The apparatus according to any one of claims 1 to 9, wherein the movable structure is a lifting structure capable of moving the camera module up and down.
16. An audio-video processing system, comprising: the device comprises audio and video acquisition equipment and a display screen in communication connection with the audio and video acquisition equipment; the audio and video acquisition equipment is hung or fixed on the display screen and comprises a movable structure, a camera module is mounted on the movable structure, and the camera module is positioned in front of the display screen and can move relative to the display screen;
the audio and video acquisition equipment is used for adjusting the relative position of the camera module through the movable structure to enable the camera module to be matched with a reference mark of a shot object, so that the camera module can acquire image data containing the shot object at the position matched with the reference mark and output the image data to the display screen.
17. The system of claim 16, wherein the audio-visual capture device is further configured to:
and synthesizing the audio signals acquired by the microphone array of the audio and video acquisition equipment and the image data into audio and video pictures, and outputting the audio and video pictures to the display screen.
18. The system of claim 17, wherein the audio/video capture device is specifically configured to:
and displaying the audio and video picture on the display screen according to the position of the camera, which is mapped on the display screen by the camera module.
19. The system of claim 18, further comprising: the control terminal is bound with the audio and video acquisition equipment;
the control terminal is used for responding to user operation and sending a control instruction to the audio and video acquisition equipment; the audio and video acquisition equipment is further used for executing corresponding operation according to the control instruction.
20. An online live broadcast system, comprising: the system comprises integrated live broadcast equipment and a display screen in communication connection with the live broadcast equipment; the live broadcasting equipment is hung or fixed on the display screen and comprises a movable structure, a camera module is arranged on the movable structure and is positioned in front of the display screen and can move relative to the display screen;
the live broadcast equipment is used for adjusting the relative position of the camera module through the movable structure to enable the camera module to be matched with the sight of a main broadcast, so that the camera module can collect image data containing the main broadcast at the position matched with the sight; and synthesizing a live broadcast picture based on the image data, and respectively sending the live broadcast picture to the display screen and the user terminal for displaying.
21. A video conference system is characterized by comprising a plurality of conference participants, wherein each conference participant comprises an integrated conference terminal and a display screen in communication connection with the conference terminal; the conference terminal is hung or fixed on the display screen and comprises a movable structure, a camera module is mounted on the movable structure, and the camera module is positioned in front of the display screen and can move relative to the display screen;
the conference terminal is used for adjusting the relative position of the camera module through the movable structure to enable the camera module to be matched with the sight of a conference speaker, so that the camera module can collect image data containing the conference speaker at the position matched with the sight; and synthesizing a conference picture based on the image data, and respectively sending the conference picture to the display screen and conference terminals of other conference participants for display.
22. An audio/video processing method, comprising:
adjusting the relative position of a camera module on the audio and video acquisition equipment to enable the camera module to be matched with a reference mark of a shot object, so that the camera module acquires image data containing the shot object at the position matched with the reference mark;
and acquiring the image data acquired by the camera module and outputting the image data.
23. The method of claim 22, wherein the audio/video capture device is suspended or fixed on a display screen to which it is connected, and the camera module is positioned in front of the display screen, and outputting the image data comprises:
and displaying the image data on the display screen according to the camera position of the camera module mapped on the display screen.
24. The method of claim 23, wherein displaying the image data on the display screen according to the camera position mapped on the display screen by the camera module comprises:
acquiring associated data of the image data, and determining the priority between the image data and the associated data, wherein the priority reflects the attention value of the data;
taking the camera position as a focus center, and displaying the image data and the associated data in a surrounding area of the camera position according to the priority;
and the higher the priority of the data is, the closer the display position of the data on the display screen is to the position of the camera.
25. The method of claim 24, wherein displaying the image data and the associated data in the area around the camera position according to the priority with the camera position as a center of attention comprises:
dividing at least two information distribution areas on the display screen by taking the position of the camera as a focus center, wherein the farther the distance between the information distribution areas and the position of the camera is, the lower the focus degree of the information distribution areas is;
and displaying the image data and the associated data in the information distribution areas with corresponding attention according to the distances between the at least two information distribution areas and the positions of the cameras and the priority.
26. The method of claim 25, wherein displaying the image data and its associated data in information distribution areas having respective degrees of interest based on the distance of the at least two information distribution areas from the camera location and the priority comprises:
grading the image data and the data in the related data respectively, wherein the higher the grade of the data is, the higher the attention value of the data is;
and displaying the data of different grades in the information distribution areas with corresponding attention according to the distance between the at least two information distribution areas and the position of the camera and the priority.
27. The method of claim 26, further comprising:
responding to the picture setting operation initiated by the shot object, adjusting the interface position of the image data to enable the image data to be close to the camera position, and/or carrying out picture amplification on the image data to enable the shot object to view the picture setting effect.
28. The method of any one of claims 23-27, further comprising:
responding to an instruction for adjusting the relative position of the camera module, and identifying an interface element on the display screen, wherein the interface element is positioned on the moving path of the camera module; adjusting the display position of the interface element on the display screen along with the position change of the camera module;
and/or
And adjusting the image data from the current display area to a display area closer to the camera position in response to the interactive request with the shot object.
29. An online live broadcast method, comprising:
adjusting the relative position of a camera module on live broadcast equipment to enable the camera module to be matched with the sight of a main broadcast, so that the camera module collects image data containing the main broadcast at the matched position;
and acquiring the image data acquired by the camera module, synthesizing a live broadcast picture based on the image data, and sending the live broadcast picture to a user terminal for displaying.
30. A video conferencing method, comprising:
adjusting the relative position of a camera module on a conference terminal to enable the camera module to be matched with the sight of a conference speaker, so that the camera module collects image data containing the conference speaker at the matched position;
and obtaining the image data acquired by the camera module, synthesizing a conference picture based on the image data, and sending the conference picture to other conference terminals for displaying.
31. An audio-video processing system, comprising: a camera module and a display device; the camera module is suspended or fixed in front of a screen of the display equipment and can move relative to the screen of the display equipment;
the camera module is used for adjusting the relative position of the camera module to be matched with a reference mark of a shot object, collecting image data containing the shot object at the position matched with the reference mark and transmitting the image data to the display equipment;
the display device is used for displaying the image data.
32. A method of displaying data, comprising:
acquiring image data which is acquired by a camera and contains a shot object, wherein the camera is positioned in front of a display screen, and the relative position of the camera is matched with a reference mark of the shot object;
and displaying the image data and/or the associated data on the display screen by taking the position of the camera mapped on the display screen as a focus center.
33. A data processing apparatus, characterized by comprising: a memory and a processor;
the memory for storing a computer program;
the processor, coupled with the memory, to execute the computer program to:
acquiring image data which is acquired by a camera and contains a shot object, wherein the camera is positioned in front of a display screen, and the relative position of the camera is matched with a reference mark of the shot object;
and displaying the image data and/or the associated data on the display screen by taking the position of the camera mapped on the display screen as a focus center.
34. A computer-readable storage medium storing a computer program, which, when executed by a processor, causes the processor to carry out the steps of the method of any one of claims 22-30 and 32.
35. A computer program product comprising computer program/instructions for causing a processor to carry out the steps of the method of any one of claims 22-30 and 32 when the computer program/instructions is executed by the processor.
CN202110308870.5A 2021-03-23 2021-03-23 Audio and video processing method, device and system and storage medium Pending CN113301367A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110308870.5A CN113301367A (en) 2021-03-23 2021-03-23 Audio and video processing method, device and system and storage medium

Publications (1)

Publication Number Publication Date
CN113301367A true CN113301367A (en) 2021-08-24

Family

ID=77319187

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110308870.5A Pending CN113301367A (en) 2021-03-23 2021-03-23 Audio and video processing method, device and system and storage medium

Country Status (1)

Country Link
CN (1) CN113301367A (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006060651A (en) * 2004-08-23 2006-03-02 Hitachi Kokusai Electric Inc Television camera device
CN103621103A (en) * 2011-06-23 2014-03-05 Lg电子株式会社 Method for displaying program information and image display apparatus thereof
CN105912103A (en) * 2016-03-31 2016-08-31 乐视控股(北京)有限公司 Display processing method, device of application desktop of mobile terminal and mobile terminal
US20180043263A1 (en) * 2016-08-15 2018-02-15 Emmanuel Brian Cao Augmented Reality method and system for line-of-sight interactions with people and objects online
CN106658032A (en) * 2017-01-19 2017-05-10 三峡大学 Multi-camera live method and system
CN110874133A (en) * 2018-08-31 2020-03-10 阿里巴巴集团控股有限公司 Interaction method based on intelligent display device, intelligent display device and storage medium
CN112073662A (en) * 2019-06-10 2020-12-11 海信视像科技股份有限公司 Display device
CN110611787A (en) * 2019-06-10 2019-12-24 青岛海信电器股份有限公司 Display and image processing method
CN112351161A (en) * 2019-08-08 2021-02-09 华为技术有限公司 Camera assembly and electronic equipment
CN110719406A (en) * 2019-10-15 2020-01-21 腾讯科技(深圳)有限公司 Shooting processing method, shooting equipment and computer equipment
CN212319339U (en) * 2020-04-17 2021-01-08 深圳宇彤音乐教育科技有限公司 Convenient live camera that removes
CN111669508A (en) * 2020-07-01 2020-09-15 海信视像科技股份有限公司 Camera control method and display device
CN111935532A (en) * 2020-08-14 2020-11-13 腾讯科技(深圳)有限公司 Video interaction method and device, electronic equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113852833A (en) * 2021-08-30 2021-12-28 阿里巴巴(中国)有限公司 Multi-device collaborative live broadcast method and device and electronic device
CN113852833B (en) * 2021-08-30 2024-03-22 阿里巴巴(中国)有限公司 Multi-device collaborative live broadcast method and device and electronic device
CN114422842A (en) * 2022-01-19 2022-04-29 阿里巴巴(中国)有限公司 Content display method and electronic equipment

Similar Documents

Publication Publication Date Title
US20140104396A1 (en) Apparatus and method for streaming live images, audio and meta-data
US9619693B2 (en) Display system, display device, projection device and program
WO2015107817A1 (en) Image display device and image display method, image output device and image output method, and image display system
AU2011237473B2 (en) Remote gaze control system and method
CN103869470A (en) Display device, head-mount type display device, method of controlling display device, and method of controlling head-mount type display device
CN113301367A (en) Audio and video processing method, device and system and storage medium
CN103995685A (en) Information processing device and control method for information processing device
CN102804792A (en) Three-dimensional video processing apparatus, method therefor, and program
WO2021218547A1 (en) Method for superimposing live image of person onto real scene, and electronic device
CN102685537A (en) Display device, display system, and method for controlling display device
WO2022262839A1 (en) Stereoscopic display method and apparatus for live performance, medium, and system
US20240077941A1 (en) Information processing system, information processing method, and program
EP3388036A1 (en) Methods and systems for wireless live video streaming from a welding helmet
CN105472358A (en) Intelligent terminal about video image processing
US20120249758A1 (en) Electric apparatus and control method of indicator
KR102424150B1 (en) An automatic video production system
CN114268775A (en) Projection system, method and storage medium
CN105227828B (en) Filming apparatus and method
WO2013005518A1 (en) Image output device, image output method, and program
US10679589B2 (en) Image processing system, image processing apparatus, and program for generating anamorphic image data
CN114979598B (en) Laser projection display method, three-color laser projection apparatus, and readable storage medium
JP2015159460A (en) Projection system, projection device, photographing device, method for generating guide frame, and program
WO2021226821A1 (en) Systems and methods for detection and display of whiteboard text and/or an active speaker
CN103391445A (en) Image display apparatus and shutter device
CN111630848B (en) Image processing apparatus, image processing method, program, and projection system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240318

Address after: #03-06, Lazada One, 51 Bras Basah Road, Singapore

Applicant after: Alibaba Innovation Co.

Country or region after: Singapore

Address before: Room 01, 45th Floor, AXA Tower, 8 Shenton Way, Singapore

Applicant before: Alibaba Singapore Holdings Ltd.

Country or region before: Singapore
