WO2022001406A1 - A display method and display device - Google Patents

A display method and display device

Info

Publication number
WO2022001406A1
Authority
WO
WIPO (PCT)
Prior art keywords
camera
angle
sound source
sound
character
Prior art date
Application number
PCT/CN2021/093588
Other languages
English (en)
French (fr)
Inventor
杨鲁明
王大勇
王旭升
程晋
于文钦
马乐
丁佳一
Original Assignee
Hisense Visual Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202010621070.4A (published as CN111708383A)
Priority claimed from CN202110014128.3A (published as CN112866772B)
Application filed by Hisense Visual Technology Co., Ltd.
Priority to CN202180047263.6A (published as CN116097120A)
Publication of WO2022001406A1
Priority to US18/060,210 (published as US20230090916A1)


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/695 Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/005 Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/22 Position of source determined by co-ordinating a plurality of position lines defined by path-difference measurements
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D3/00 Control of position or direction
    • G05D3/12 Control of position or direction using feedback
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/4223 Cameras
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/61 Control of cameras or camera modules based on recognised objects
    • H04N23/611 Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/66 Remote control of cameras or camera parts, e.g. by remote control devices
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/02 Casings; Cabinets; Supports therefor; Mountings therein
    • H04R1/028 Casings; Cabinets; Supports therefor; Mountings therein associated with devices performing functions other than acoustics, e.g. electric candles
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/20 Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/32 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • H04R1/40 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
    • H04R1/406 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers microphones
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2499/00 Aspects covered by H04R or H04S not otherwise provided for in their subgroups
    • H04R2499/10 General applications
    • H04R2499/15 Transducers incorporated in visual displaying devices, e.g. televisions, computer displays, laptops

Definitions

  • the present application relates to the technical field of television software, and in particular, to a display method and a display device.
  • the display device can implement functions such as network search, IP TV, BBTV, video on demand (VOD), digital music, network news, and network video telephony.
  • a camera needs to be installed on the display device to collect user images.
  • An embodiment of the present application provides a display device, including:
  • a camera, configured to capture a portrait and able to rotate within a preset angle range;
  • a sound collector, configured to collect character sound source information, where the character sound source information refers to the sound information generated when the character interacts with the display device through voice; and
  • a controller connected to the camera and the sound collector, the controller being configured to: acquire the character sound source information collected by the sound collector and the current shooting angle of the camera;
  • and adjust the shooting angle of the camera, so that the shooting area of the camera faces the position where the person's voice is located.
  • FIG. 1 exemplarily shows a schematic diagram of an operation scene between a display device and a control apparatus according to some embodiments
  • FIG. 2 exemplarily shows a hardware configuration block diagram of a display device 200 according to some embodiments
  • FIG. 3 exemplarily shows a hardware configuration block diagram of the control device 100 according to some embodiments
  • FIG. 4 exemplarily shows a schematic diagram of software configuration in the display device 200 according to some embodiments
  • FIG. 5 exemplarily shows a schematic diagram of displaying an icon control interface of an application in the display device 200 according to some embodiments
  • FIG. 6 exemplarily shows a structural block diagram of a display device according to some embodiments.
  • FIG. 7 exemplarily shows a schematic diagram of implementing a preset angle range for camera rotation according to some embodiments
  • FIG. 8 exemplarily shows a scene graph of camera rotation within a preset angle range according to some embodiments
  • FIG. 9 exemplarily shows a schematic diagram of a sound source angle range according to some embodiments.
  • FIG. 10 exemplarily shows a flowchart of a method for adjusting the shooting angle of a camera according to some embodiments
  • FIG. 11 exemplarily shows a flowchart of a wake-up text comparison method according to some embodiments
  • FIG. 12 exemplarily shows a flowchart of a method for performing sound source identification on character sound source information according to some embodiments
  • FIG. 13 exemplarily shows a flowchart of a method for determining a target rotation direction and a target rotation angle of a camera according to some embodiments
  • FIG. 14 exemplarily shows a scene diagram of adjusting the shooting angle of the camera according to some embodiments
  • FIG. 15a exemplarily shows another scene diagram of adjusting the shooting angle of the camera according to some embodiments.
  • FIG. 15b exemplarily shows a scene graph of the position of a character when speaking according to some embodiments.
  • FIG. 16 is a schematic diagram of the arrangement structure of a display device and a camera in an embodiment of the application.
  • FIG. 17 is a schematic structural diagram of a camera in an embodiment of the present application.
  • FIG. 18a is a schematic diagram of a scene of a display device before adjustment in an embodiment of the present application.
  • FIG. 18b is a schematic diagram of a scene of a display device after adjustment in an embodiment of the present application.
  • FIG. 19 is a schematic diagram of a sound source localization scene in an embodiment of the present application.
  • FIG. 21 is a schematic diagram of the portrait center and the image center in the embodiment of the application.
  • FIG. 22 is a schematic diagram of the geometric relationship of the process of calculating the rotation angle in the embodiment of the application.
  • FIG. 23a is a schematic diagram of the initial state of the process of adjusting the rotation angle in the embodiment of the present application.
  • FIG. 23b is a schematic diagram of the result of the process of adjusting the rotation angle in the embodiment of the present application.
  • FIG. 24a is a schematic diagram of a squatting state in an embodiment of the present application.
  • FIG. 24b is a schematic diagram of a standing posture state in an embodiment of the present application.
  • FIG. 25a is a schematic diagram of the display effect of the initial state of the virtual portrait in the embodiment of the application.
  • FIG. 25b is a schematic diagram of the display effect of the virtual portrait after adjustment in the embodiment of the present application.
  • The term "module" refers to any known or later-developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code capable of performing the function associated with that element.
  • The term "remote control" refers to a component of an electronic device, such as the display device disclosed in this application, that can wirelessly control the electronic device, usually over a short distance.
  • Infrared and/or radio frequency (RF) signals and/or Bluetooth are generally used to connect with the electronic device, and functional modules such as WiFi, wireless USB, Bluetooth, and motion sensors may also be included.
  • a hand-held touch remote control replaces most of the physical built-in hard keys in a general remote control device with a user interface in a touch screen.
  • The term "gesture" as used in this application refers to a user behavior through which the user expresses an intended thought, action, purpose, or result via an action such as a change of hand shape or a hand movement.
  • FIG. 1 is a schematic diagram of an operation scenario between a display device and a control apparatus according to an embodiment. As shown in FIG. 1 , a user can operate the display device 200 through the smart device 300 or the control device 100 .
  • the control apparatus 100 may be a remote controller, and the remote controller communicates with the display device through infrared protocol communication, Bluetooth protocol communication, or other short-distance communication methods to control the display device 200 in a wireless or other wired manner.
  • the user can control the display device 200 by inputting user instructions through keys on the remote control, voice input, control panel input, and the like.
  • A smart device 300 (e.g., a mobile terminal, a tablet computer, a computer, a notebook computer, etc.) may also be used to control the display device 200, for example, through an application running on the smart device.
  • the display device 200 can also be controlled in a manner other than the control apparatus 100 and the smart device 300.
  • the module for acquiring voice commands configured inside the display device 200 can directly receive the user's voice command for control.
  • the user's voice command control can also be received through a voice control device provided outside the display device 200.
  • the display device 200 is also in data communication with the server 400 .
  • the display device 200 may be allowed to communicate via local area network (LAN), wireless local area network (WLAN), and other networks.
  • the server 400 may provide various contents and interactions to the display device 200 .
  • FIG. 3 exemplarily shows a configuration block diagram of the control apparatus 100 according to an exemplary embodiment.
  • the control device 100 includes a controller 110 , a communication interface 130 , a user input/output interface 140 , a memory 190 , and a power supply 180 .
  • the control device 100 can receive the user's input operation instruction, and convert the operation instruction into an instruction that the display device 200 can recognize and respond to, and play an intermediary role between the user and the display device 200 .
  • FIG. 2 is a block diagram showing a hardware configuration of a display device 200 according to an exemplary embodiment.
  • the display device 200 includes at least one of a tuner and demodulator 210 , a communicator 220 , a detector 230 , an external device interface 240 , a controller 250 , a display 275 , an audio output interface 285 , a memory 260 , a power supply 290 , and a user interface 265 .
  • the display 275 includes a display screen component for presenting pictures and a driving component for driving image display, and is used for receiving image signals output from the controller and displaying video content, image content, menu manipulation interfaces, and user manipulation UI interfaces.
  • the display 275 may be a liquid crystal display, an OLED display, or a projection display, and may also be a projection device with a projection screen.
  • the communicator 220 is a component for communicating with external devices or servers according to various communication protocol types.
  • the communicator may include at least one of a Wifi module, a Bluetooth module, a wired Ethernet module and other network communication protocol chips or near field communication protocol chips, and an infrared receiver.
  • the display device 200 may establish transmission and reception of control signals and data signals with the external control device 100 or the server 400 through the communicator 220 .
  • the user interface can be used to receive control signals from the control device 100 (eg, an infrared remote control, etc.).
  • the detector 230 is used to collect external environment or external interaction signals.
  • the detector 230 includes a light receiver, i.e., a sensor for collecting ambient light intensity; alternatively, the detector 230 includes an image collector, such as a camera, which can be used to collect external environmental scenes, user attributes, or user interaction gestures; or the detector 230 includes a sound collector, such as a microphone, for receiving external sound.
  • the external device interface 240 may include, but is not limited to, one or more of the following: a high-definition multimedia interface (HDMI), an analog or data high-definition component input interface (component), a composite video input interface (CVBS), a USB input interface (USB), an RGB port, etc. It may also be a composite input/output interface formed by a plurality of the above-mentioned interfaces.
  • the controller 250 and the tuner 210 may be located in different separate devices, that is, the tuner 210 may also be located in an external device of the main device where the controller 250 is located, such as an external set-top box.
  • the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored in the memory 260 .
  • the controller 250 controls the overall operation of the display apparatus 200 . For example, in response to receiving a user command for selecting a UI object to be displayed on the display 275, the controller 250 may perform an operation related to the object selected by the user command.
  • Objects can be any of the optional objects, such as hyperlinks, icons, or other actionable controls.
  • the operations related to the selected object include: displaying operations connected to hyperlinked pages, documents, images, etc., or executing operations of programs corresponding to the icons.
  • the user may input user commands on a graphical user interface (GUI) displayed on the display 275, and the user input interface receives the user input commands through the graphical user interface (GUI).
  • the user may input a user command by inputting a specific sound or gesture, and the user input interface recognizes the sound or gesture through a sensor to receive the user input command.
  • A GUI (Graphical User Interface) control may include visual interface elements such as icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, and widgets.
  • the system is divided into four layers; from top to bottom, they are the application layer (referred to as the "application layer"), the application framework layer (referred to as the "framework layer"), the Android runtime and system library layer (referred to as the "system runtime layer"), and the kernel layer.
  • At least one application program runs in the application layer. These applications may be a window program, a system setting program, a clock program, a camera application, etc. built into the operating system; they may also be applications developed by third-party developers, such as the Hijian program, the karaoke program, the magic mirror program, etc.
  • the application package in the application layer is not limited to the above examples, and may actually include other application packages, which is not limited in this embodiment of the present application.
  • the framework layer provides an application programming interface (API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer is equivalent to a processing center, which decides the actions to be taken by the applications in the application layer.
  • the application program can access the resources in the system and obtain the services of the system during execution through the API interface.
  • the application framework layer in the embodiment of the present application includes managers (Managers), content providers (Content Providers), etc., wherein the managers include at least one of the following modules: an Activity Manager, used to interact with all activities running in the system; a Location Manager, used to provide system services or applications with access to system location services; a Package Manager, used to retrieve various information related to the application packages currently installed on the device; a Notification Manager, used to control the display and clearing of notification messages; and a Window Manager, used to manage icons, windows, toolbars, wallpapers, and desktop widgets on the user interface.
  • the activity manager is used to manage the life cycle of each application and the usual navigation and back functions, such as controlling the exit of an application (including switching the user interface currently displayed in the display window to the system desktop), opening an application, and going back (including switching the user interface currently displayed in the display window to the upper-level user interface of the currently displayed user interface), and the like.
  • the window manager is used to manage all window programs, such as obtaining the size of the display screen, judging whether there is a status bar, locking the screen, taking screenshots, and controlling changes to the display window (for example, shrinking the display window, shaking the display, distorting and deforming the display, etc.).
  • the system runtime layer provides support for the upper layer, that is, the framework layer.
  • the Android operating system will run the C/C++ library included in the system runtime layer to implement the functions to be implemented by the framework layer.
  • the kernel layer is the layer between hardware and software. As shown in Figure 4, the kernel layer at least includes at least one of the following drivers: audio driver, display driver, Bluetooth driver, camera driver, WIFI driver, USB driver, HDMI driver, sensor driver (such as fingerprint sensor, temperature sensor, touch sensors, pressure sensors, etc.), etc.
  • the kernel layer further includes a power driver module for power management.
  • software programs and/or modules corresponding to the software architecture in FIG. 4 are stored in the first memory or the second memory shown in FIG. 2 or FIG. 3 .
  • When the remote control receiving device receives an input operation from the remote control, a corresponding hardware interrupt is sent to the kernel layer.
  • the kernel layer processes the input operation into the original input event (including the value of the input operation, the timestamp of the input operation, etc.).
  • Raw input events are stored at the kernel layer.
  • the application framework layer obtains the original input event from the kernel layer, identifies the control corresponding to the input event according to the current position of the focus, and regards the input operation as a confirmation operation, and the control corresponding to the confirmation operation is the control of the magic mirror application icon.
  • the magic mirror application then calls the interface of the application framework layer to start the magic mirror application, and then starts the camera driver by calling the kernel layer, so as to capture still images or videos through the camera.
  • When the display device receives an input operation (such as a split-screen operation) performed by the user on the display screen, the kernel layer can generate a corresponding input event according to the input operation and report the event to the application framework layer.
  • the window mode (such as multi-window mode) and window position and size corresponding to the input operation are set by the activity manager of the application framework layer.
  • the window manager of the application framework layer draws the window according to the settings of the activity manager, and then sends the drawn window data to the display driver of the kernel layer, and the display driver displays the corresponding application interface in different display areas of the display screen.
  • the application layer contains at least one application that can display corresponding icon controls in the display, such as: live TV application icon control, video on demand application icon control, media center application Program icon controls, application center icon controls, game application icon controls, etc.
  • the live TV application may provide live TV from different sources.
  • a live TV application may provide a TV signal using input from cable, over-the-air, satellite services, or other types of live TV services.
  • the live TV application may display the video of the live TV signal on the display device 200 .
  • a video-on-demand application may provide video from various storage sources. Unlike live TV applications, video-on-demand provides a display of video from certain storage sources. For example, video-on-demand can come from the server side of cloud storage, from local hard disk storage containing existing video programs.
  • the media center application may provide various multimedia content playback applications.
  • a media center may provide services other than live TV or video-on-demand, where users can access various images or audio through a media center application.
  • the application center may provide storage of various applications.
  • An application can be a game, an application, or some other application that is related to a computer system or other device but can be run on a Smart TV.
  • the application center can obtain these applications from various sources, store them in local storage, and then run them on the display device 200 .
  • the application programs in the display device that need to use the camera include "Hey See", "Look in the Mirror", "Youxuemao", "Fitness", etc., which can realize functions such as "video chat", "watch while chatting", and "fitness".
  • "Hey See" is a video chat application that enables one-click chat between a mobile phone and a TV, and between TVs.
  • "Looking in the Mirror” is an application that provides users with mirror services. By turning on the camera through the mirroring application, users can use the smart TV as a mirror.
  • “Youxuemao” is an application that provides learning functions.
  • the "fitness" function can simultaneously display, on the display of the display device, the fitness instruction video and the image of the user following the video to perform the corresponding actions, so that users can check in real time whether their movements are standard.
  • When the camera is fixedly installed on the display device, the center line of the camera's viewing angle is perpendicular to the display, and the viewing angle of the camera is limited, usually between 60° and 75°; that is, the shooting area of the camera is the area formed by spreading symmetrically to the left and right of the center line of the viewing angle, corresponding to an angle of 60° to 75°.
  • When the user is located outside this area, the camera cannot capture an image containing the user's portrait, so the portrait cannot be shown on the display. In a video chat scenario, the peer user who is in a video chat with the local user will not be able to see the local user; in a fitness scenario, the display will not be able to show the image of the user performing the fitness actions, so the user cannot see his or her own fitness movements and cannot judge whether they are standard, which affects the user experience.
  • FIG. 6 exemplarily shows a structural block diagram of a display device according to some embodiments.
  • the camera is used to capture portraits.
  • the camera is no longer fixedly installed, but is rotatably installed on the display device.
  • the camera 232 is installed on the top of the display in a rotating form, and the camera 232 can rotate along the top of the display.
  • FIG. 7 exemplarily shows a schematic diagram of implementing a preset angle range of camera rotation according to some embodiments
  • FIG. 8 exemplarily shows a scene diagram of camera rotation within the preset angle range according to some embodiments.
  • the camera 232 is preset to be rotatable within a preset angle range in the horizontal direction.
  • the preset angle range is 0°~120°, that is, at the position facing the display, the left side of the user is 0° and the right side of the user is 120°.
  • the camera can be rotated 60° to the left from the initial state, and 60° to the right from the initial state; in the initial state, the center line of the camera's viewing angle is perpendicular to the display, and this position is the 60° position of the camera.
  • the display device provided by the embodiments of the present application uses sound source information to trigger the rotation of the camera; it can automatically identify the real-time location of the user and adjust the shooting angle of the camera accordingly, so that the camera can always capture images that include the portrait.
  • the display device implements the collection of the sound source information of the person by setting the sound collector 231 .
  • multiple sets of sound collectors can be set in the display device.
  • four sets of sound collectors 231 are provided in the display device, and the four sets of sound collectors 231 can be arranged in a linear positional relationship.
  • the sound collector may be a microphone, and four groups of microphones are linearly arranged to form a microphone array. During sound collection, the four groups of sound collectors 231 receive sound information generated when the same user interacts with the display device through voice.
  • FIG. 9 exemplarily shows a schematic diagram of a sound source angle range according to some embodiments.
  • the angle of the sound source generated by the user ranges from 0° to 180°.
  • when the user is at the back of the display device, the sound source angle generated by the user also ranges from 0° to 180°.
  • when the user is located on the left side of the sound collectors, the horizontal angle is 0°, and when the user is located on the right side of the sound collectors, the horizontal angle is 180°.
  • the 30° angular position of the sound source is equal to the 0° angular position of the camera
  • the 90° angular position of the sound source is equal to the 60° angular position of the camera
  • the 150° angular position of the sound source is equal to the 120° angular position of the camera.
  • the controller 250 is connected with the camera 232 and the sound collector 231 respectively, and the controller is used to receive the character sound source information collected by the sound collector, identify the sound source information, determine the azimuth angle of the character's position, and then determine the angle through which the camera needs to rotate.
  • the controller adjusts the shooting angle of the camera according to the determined angle that the camera needs to rotate, so that the shooting area of the camera is facing the position of the voice of the character, and adjusts the shooting angle of the camera according to the position of the character to capture the image containing the character.
  • FIG. 10 exemplarily shows a flowchart of a method for adjusting the shooting angle of a camera according to some embodiments.
  • the controller when adjusting the shooting angle of the camera according to the position of the character, the controller is configured to execute the method for adjusting the shooting angle of the camera shown in FIG. 10 , including:
  • Before the controller in the display device drives the camera to rotate to adjust its shooting angle, it needs to determine the character sound source information generated when the person performs voice interaction with the display device from the person's location.
  • the character sound source information refers to the sound information generated when the character interacts with the display device through voice.
  • From the person's sound source information, the azimuth angle of the person's position when speaking can be determined; in order to accurately determine the angle by which the camera needs to be adjusted, the current state of the camera, i.e., the current shooting angle, must first be obtained.
  • the current shooting angle of the camera needs to be acquired when the camera is in a stopped state, so as to ensure the accuracy of the current shooting angle of the camera, and thus to ensure the accuracy of determining the angle that the camera needs to adjust.
  • the controller before executing the acquisition of the current shooting angle of the camera, the controller is further configured to execute the following steps:
  • Step 11 Query the current running status of the camera.
  • Step 12 If the current operating state of the camera is in the rotating state, wait for the camera to rotate completely.
  • Step 13 If the current operating state of the camera is in the non-rotation state, obtain the current shooting angle of the camera.
  • a motor control service is configured in the controller, and the motor control service is used to drive the camera to rotate, obtain the running status of the camera and the orientation angle of the camera.
  • the motor control service monitors the running status of the camera in real time.
  • the controller queries the current running status of the camera by calling the motor control service.
  • the current running status of the camera can represent the current orientation angle of the camera and whether the camera is in a rotating state.
  • If the camera is rotating, its current shooting angle cannot be obtained at that moment, since an exact value cannot be determined. Therefore, when the camera is in the rotating state, it is necessary to wait for the camera to finish executing the previous rotation instruction, and then perform the step of obtaining the current shooting angle of the camera in the stopped state.
  • If the camera is in the non-rotating state, the step of obtaining the current shooting angle of the camera can be performed directly, as sketched below.
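The wait-and-read check described above can be illustrated with a minimal Python sketch. `MotorControlService` and its methods are hypothetical stand-ins for the motor control service mentioned earlier, not an actual API of the display device.

```python
import time

class MotorControlService:
    """Hypothetical stand-in for the motor control service that tracks the camera."""
    def __init__(self) -> None:
        self._rotating = False
        self._shooting_angle = 60  # initial state: centre line perpendicular to the display

    def is_rotating(self) -> bool:
        return self._rotating

    def shooting_angle(self) -> int:
        return self._shooting_angle

def get_current_shooting_angle(motor: MotorControlService, poll_s: float = 0.05) -> int:
    # Step 11: query the current running status of the camera.
    # Step 12: if the camera is rotating, wait for the previous rotation to complete.
    while motor.is_rotating():
        time.sleep(poll_s)
    # Step 13: the camera is stationary, so its reported shooting angle is exact.
    return motor.shooting_angle()
```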
  • S2. Perform sound source identification on the person's sound source information, and determine the sound source angle information, and the sound source angle information is used to represent the azimuth angle of the person's position during speech.
  • After obtaining the character sound source information generated by the interaction between the character and the display device, the controller needs to perform sound source identification on the character sound source information to determine the character's position when speaking, specifically the azimuth angle, that is, whether the character is located to the left of, to the right of, or directly facing the sound collector, so that the shooting angle of the camera can be adjusted according to the character's position.
  • In some cases, the character's voice may merely be part of a dialogue with the peer user while the character is still within the camera's shooting area; if the controller executed the step of adjusting the camera's shooting angle in this situation, an invalid operation would occur.
  • the wake-up text for triggering the adjustment of the camera shooting angle may be stored in the controller in advance, for example, customizing "Hisense Xiaoju" as the wake-up text for sound source recognition.
  • the character uses the voice "Hisense Xiaoju” as the identification sound source to trigger the process of adjusting the camera's shooting angle.
  • the wake-up text can also be customized as other words, which are not specifically limited in this embodiment.
  • FIG. 11 exemplarily shows a flowchart of a wake-up text comparison method according to some embodiments. Specifically, referring to FIG. 11 , before the controller performs sound source identification on the character sound source information and determines the sound source angle information, the controller is further configured to perform the following steps:
  • the preset wake-up text refers to the text used to trigger the sound source identification process.
  • After acquiring the character's sound source information, the controller first performs text extraction, extracting the voice interaction text produced when the character interacts with the display device through voice. The extracted voice interaction text is compared with the preset wake-up text. If the two are inconsistent, for example, if the character's voice is not "Hisense Xiaoju" but other interactive content, it means that the current character voice is not a voice that triggers adjustment of the camera's shooting angle, and the controller does not need to perform the steps for adjusting the camera's shooting angle.
  • If the comparison is consistent, the controller can continue to perform the subsequent steps to adjust the camera's shooting angle.
  • When the controller determines that the character's sound source information is a wake-up voice, that is, a trigger voice for adjusting the shooting angle of the camera, it needs to perform the subsequent sound source recognition process; a minimal sketch of the comparison is given below.
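A minimal sketch of the wake-up text comparison, assuming the character's speech has already been converted to text by a speech recognition module; the wake word string and the function name are illustrative only.

```python
WAKE_UP_TEXT = "hisense xiaoju"   # preset wake-up text from the example above

def is_wake_up_voice(voice_interaction_text: str) -> bool:
    """True only when the extracted voice interaction text matches the wake-up text."""
    return voice_interaction_text.strip().lower() == WAKE_UP_TEXT

print(is_wake_up_voice("Hisense Xiaoju"))   # True  -> continue with sound source identification
print(is_wake_up_voice("play some music"))  # False -> no camera adjustment is triggered
```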
  • the multiple groups of sound collectors can each collect a set of character sound source information when the same character speaks, so when the controller obtains the character sound source information collected by the sound collectors, it obtains, from each sound collector, the character sound source information generated when the character speaks; that is, the controller acquires multiple sets of character sound source information.
  • FIG. 12 exemplarily shows a flow chart of a method for sound source identification for character sound source information according to some embodiments.
  • the controller is further configured to perform the following steps when performing sound source identification on the character sound source information and determining the sound source angle information:
  • The frequency response of each sound collector is the same, and their sampling clocks are synchronized. However, because the distance between each sound collector and the character differs, the time at which each sound collector picks up the speech also differs, and there is a difference in acquisition time between the multiple groups of sound collectors.
  • the angle and distance of the sound source from the array can be calculated by the sound collector array, so as to realize the tracking of the sound source at the position of the character when speaking.
  • The time difference of arrival of the signal between every two microphones is estimated to obtain a set of equations for the sound source position coordinates, and the exact position coordinates of the sound source, that is, the sound source angle information, can then be obtained by solving the equation set.
  • In step S21, when performing sound source identification on each piece of character sound source information and calculating the speech time differences generated when the multiple groups of sound collectors collect the corresponding character sound source information, the controller is further configured to perform the following steps:
  • Step 211 extract the environmental noise, the sound source signal of the person's voice, and the propagation time of the person's voice to each sound collector from the person's sound source information.
  • Step 212 Determine the received signal of each sound collector according to the environmental noise, the sound source signal and the propagation time.
  • Step 213 using the cross-correlation time delay estimation algorithm to process the received signal of each sound collector to obtain the speech time difference generated when each two sound collectors collect the corresponding character sound source information.
  • The sound collector array can be used to perform direction-of-arrival (DOA) estimation based on the time differences of arrival.
  • the target signal received by each element of the sound collector array comes from the same sound source. Therefore, there is a strong correlation between the signals of each channel.
  • the time delay between the signals observed by each two sound collectors that is, the speech time difference, can be determined.
  • The character sound source information generated when the character speaks includes the environmental noise and the sound source signal of the character's voice; the propagation time of the character's voice to each sound collector can likewise be extracted from the character sound source information, and the received signal of each sound collector can then be written as x_i(t) = α_i·s(t − τ_i) + n_i(t), where x_i(t) is the received signal of the i-th sound collector, s(t) is the sound source signal of the character's voice, τ_i is the propagation time of the character's voice to the i-th sound collector, n_i(t) is the environmental noise, and α_i is a correction coefficient.
  • A cross-correlation delay estimation algorithm is then used to process the received signals of every two sound collectors to estimate the delay. The cross-correlation function can be expressed as R_{i,i+1}(τ) = E[x_i(t)·x_{i+1}(t − τ)], where the value of τ that maximizes R_{i,i+1}(τ) is the time delay between the i-th sound collector and the (i+1)-th sound collector, that is, the speech time difference.
  • Since n_i and n_{i+1} are uncorrelated Gaussian white noise, the above formula further simplifies to R_{i,i+1}(τ) = α_i·α_{i+1}·R_ss(τ − (τ_i − τ_{i+1})), where R_ss is the autocorrelation of the sound source signal.
  • The value of τ at which this function reaches its maximum is the time delay between the two sound collectors, that is, the speech time difference.
  • In practice, the cross-power spectrum can be weighted in the frequency domain according to prior knowledge of the signal and noise, so as to sharpen the correlation peak and suppress noise and reverberation interference.
  • PHAT weighting is used to flatten (whiten) the cross-power spectrum between the signals, and the final speech time difference generated when every two sound collectors collect the corresponding character sound source information is obtained.
  • the cross-power spectrum weighted by PHAT is similar to the expression of a unit impulse response, which highlights the peak at the true delay; this can effectively suppress reverberation noise and improve the accuracy of the delay (speech time difference) estimation.
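The pairwise delay estimate with PHAT weighting can be illustrated numerically as below. This is a sketch using NumPy, not the device's actual implementation; the sampling rate and the synthetic white-noise test signal are assumptions used only for the self-check.

```python
import numpy as np

def gcc_phat_delay(x_i: np.ndarray, x_j: np.ndarray, fs: float) -> float:
    """Estimate the speech time difference (seconds) between two sound collectors.

    The cross-power spectrum is weighted by PHAT so that the inverse transform
    approaches a unit impulse at the true delay, which suppresses reverberation.
    A positive result means the sound reached collector j before collector i.
    """
    n = len(x_i) + len(x_j)
    X_i = np.fft.rfft(x_i, n=n)
    X_j = np.fft.rfft(x_j, n=n)
    cross = X_i * np.conj(X_j)                 # cross-power spectrum
    cross /= np.abs(cross) + 1e-12             # PHAT weighting (phase transform)
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs

# Self-check: the same white-noise burst reaches collector 2 five samples earlier.
fs = 16000
s = np.random.default_rng(0).standard_normal(2048)
x1 = np.concatenate((np.zeros(5), s))
x2 = np.concatenate((s, np.zeros(5)))
print(gcc_phat_delay(x1, x2, fs) * fs)         # ~5.0 samples of delay
```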
  • In step S22, when calculating the sound source angle information of the character's speaking position based on the speech time differences, the controller is further configured to perform the following steps:
  • Step 221 Acquire the speed of sound in the current environmental state, the coordinates of each sound collector, and the set number of sound collectors.
  • Step 222 Determine the number of combined pairs of sound collectors according to the set number of sound collectors, where the number of combined pairs refers to the number of combinations obtained by combining two sound collectors.
  • Step 223 according to the speech time difference, the speed of sound and the coordinates of each sound collector corresponding to each two sound collectors, establish a vector relational equation set, the number of which is the same as the number of combination pairs.
  • Step 224 Solve the vector relation equation system to obtain the vector value of the unit plane wave propagation vector of the sound source at the position of the person's speech.
  • Step 225 Calculate, according to the vector value, the sound source angle information of the position where the character is speaking.
  • the sound source angle information of the position of the character when speaking can be calculated according to each voice time difference.
  • the number of equations can be set to be the same as the number of combinations obtained by combining the sound collectors in pairs. To this end, the set number N of the sound collectors is obtained, and there are N(N-1)/2 pairs of combinations between all the sound collectors.
  • the sound source angle information can be determined by solving the vector value of the sound source unit plane wave propagation vector at the character's voice position.
  • For the i-th and j-th sound collectors with coordinates r_i and r_j, the vector relationship equation can be written as c·Δt_{ij} = u·(r_i − r_j), where c is the speed of sound, Δt_{ij} is the speech time difference between the two collectors, and u is the unit plane-wave propagation vector of the sound source; this formula represents the vector relationship equation established between the i-th sound collector and the j-th sound collector.
  • Based on the solved vector value of u, the sound source angle information, that is, the azimuth angle of the character's position when speaking, can then be calculated.
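Steps 221 to 225 can be sketched as a small least-squares problem. The sketch below assumes a far-field (plane-wave) source and a linear four-microphone array like the one described earlier; the 5 cm spacing, the speed of sound, and all names are illustrative assumptions.

```python
import itertools
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, roughly the speed of sound at room temperature (assumption)

def azimuth_from_tdoa(mic_x: np.ndarray, delays: dict[tuple[int, int], float]) -> float:
    """Estimate the sound source azimuth (degrees, 0..180) for a linear microphone array.

    mic_x  : x-coordinates of the N collectors along the array axis (metres).
    delays : speech time differences {(i, j): t_i - t_j} for the N(N-1)/2 pairs (seconds).

    Far-field model: c * (t_i - t_j) = u_x * (x_i - x_j), where u_x is the component of
    the unit plane-wave propagation vector along the array; solved by least squares.
    """
    a = np.array([[mic_x[i] - mic_x[j]] for (i, j) in delays])      # one equation per pair
    b = np.array([SPEED_OF_SOUND * dt for dt in delays.values()])
    u_x, *_ = np.linalg.lstsq(a, b, rcond=None)
    # The wave propagates from the source toward the array, so the source direction is -u.
    return float(np.degrees(np.arccos(np.clip(-u_x[0], -1.0, 1.0))))

# Tiny self-check: 4 microphones spaced 5 cm apart, source at 60° from the array axis.
mic_x = np.array([0.0, 0.05, 0.10, 0.15])
theta = np.radians(60.0)
arrival = -np.cos(theta) * mic_x / SPEED_OF_SOUND          # relative arrival times (plane wave)
pairs = itertools.combinations(range(len(mic_x)), 2)
delays = {(i, j): arrival[i] - arrival[j] for (i, j) in pairs}
print(round(azimuth_from_tdoa(mic_x, delays), 1))           # ~60.0
```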
  • the controller determines the sound source angle information used to represent the azimuth angle of the person's position when speaking by performing sound source identification on the sound source information of the person.
  • the sound source angle information can identify the current position of the character, and the current shooting angle of the camera can identify the current position of the camera.
  • FIG. 13 exemplarily shows a flowchart of a method for determining a target rotation direction and a target rotation angle of a camera according to some embodiments.
  • the controller is further configured to perform the following steps when determining the target rotation direction and target rotation angle of the camera based on the current shooting angle and sound source angle information of the camera:
  • S31. Since the sound source angle information represents the azimuth angle of the character, the character's sound source angle information can be converted into the coordinate angle of the camera; that is, the coordinate angle of the camera is used in place of the character's sound source angle information.
  • The controller is further configured to perform the following steps when converting the sound source angle information into the coordinate angle of the camera:
  • Step 311 Acquire the sound source angle range of the character when speaking and the preset angle range when the camera rotates.
  • Step 312 Calculate the angle difference between the sound source angle range and the preset angle range, and use the half value of the angle difference as the conversion angle.
  • Step 313 Calculate the angle difference between the angle corresponding to the sound source angle information and the conversion angle, and use the angle difference as the coordinate angle of the camera.
  • Since the preset angle range is 0°~120° while the sound source angle range is 0°~180°, the coordinate angle of the camera cannot directly replace the sound source angle information. Therefore, first calculate the angle difference between the sound source angle range and the preset angle range, then take half of that angle difference, and use the half value as the conversion angle when converting the sound source angle information into the coordinate angle of the camera.
  • the angle difference between the sound source angle range and the preset angle range is 60°, the half value of the angle difference is 30°, and 30° is used as the conversion angle. Finally, the angle difference between the angle corresponding to the sound source angle information and the conversion angle is calculated, which is the coordinate angle of the camera converted from the sound source angle information.
  • For example, if the angle corresponding to the sound source angle information determined by the controller from the character sound source information collected by the multiple sound collectors is 50° and the conversion angle is 30°, the calculated angle difference is 20°; that is, the 50° corresponding to the sound source angle information is represented by a camera coordinate angle of 20°.
  • Similarly, if the angle corresponding to the sound source angle information is 130° and the conversion angle is 30°, the calculated angle difference is 100°; that is, the 130° corresponding to the sound source angle information is represented by a camera coordinate angle of 100°.
  • S32 Calculate the angle difference between the coordinate angle of the camera and the current shooting angle of the camera, and use the angle difference as the target rotation angle of the camera.
  • the coordinate angle of the camera is used to identify the angle of the person's position within the camera coordinates. Therefore, according to the angle difference between the current shooting angle of the camera and the coordinate angle of the camera, the target rotation angle that the camera needs to rotate can be determined.
  • For example, if the current shooting angle of the camera is 100° and the coordinate angle of the camera is 20°, it means that the current shooting area of the camera is not aimed at the person's position, and the difference between the two is 80°. Therefore, only after the camera is rotated by 80° can its shooting area be aimed at the person's position; that is, the target rotation angle of the camera is 80°.
  • Since the left side is taken as the 0° position of the camera and the right side is taken as the 120° position of the camera, after the angle difference is determined from the coordinate angle of the camera and the current shooting angle of the camera, if the current shooting angle is greater than the coordinate angle, it means that the camera's shooting angle is to the right of the character's position, and the angle difference is a negative value; if the current shooting angle is less than the coordinate angle, it means that the camera's shooting angle is to the left of the character's position, and the angle difference is a positive value.
  • The target rotation direction of the camera can be determined according to whether the angle difference is positive or negative. If the angle difference is a positive value, it means that the shooting angle of the camera is to the left of the character's position; in order for the camera to capture an image of the character, the shooting angle needs to be adjusted to the right, so the target rotation direction of the camera is determined to be rightward.
  • If the angle difference is a negative value, it means that the shooting angle of the camera is to the right of the character's position; in order for the camera to capture an image of the character, the shooting angle needs to be adjusted to the left, so the target rotation direction of the camera is determined to be leftward.
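Putting steps 311-313 and S32 together, the conversion and the rotation decision can be sketched as below; the function name is illustrative, and the second check uses the 120° source angle implied by the 90° coordinate angle of the FIG. 15a example rather than a value stated in the text.

```python
SOUND_RANGE = (0.0, 180.0)    # sound source angle range (degrees)
CAMERA_RANGE = (0.0, 120.0)   # preset angle range of the camera (degrees)

def plan_camera_rotation(source_angle: float, current_shooting_angle: float):
    """Return (direction, target_rotation_angle) for the camera.

    Steps 311-313: the conversion angle is half the difference between the two ranges
    (here (180 - 120) / 2 = 30), and the camera coordinate angle is the source angle
    minus the conversion angle. S32: the signed difference to the current shooting
    angle gives the rotation; positive means rotate right, negative means rotate left.
    """
    conversion_angle = ((SOUND_RANGE[1] - SOUND_RANGE[0]) - (CAMERA_RANGE[1] - CAMERA_RANGE[0])) / 2
    coordinate_angle = source_angle - conversion_angle
    diff = coordinate_angle - current_shooting_angle
    return ("right" if diff > 0 else "left"), abs(diff)

print(plan_camera_rotation(50.0, 100.0))   # ('left', 80.0)  -> the FIG. 14 example
print(plan_camera_rotation(120.0, 40.0))   # ('right', 50.0) -> the FIG. 15a example
```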
  • FIG. 14 exemplarily shows a scene graph for adjusting the shooting angle of the camera according to some embodiments.
  • the angle corresponding to the sound source angle information corresponding to the character is 50°
  • the converted coordinate angle of the camera is 20°
  • the current shooting angle of the camera is 100°, that is, the center line of the camera's viewing angle is located at the 100° position of the camera.
  • the calculated angle difference is -80°.
  • It can be seen that the angle difference is a negative value.
  • the camera needs to be adjusted to rotate 80° to the left.
  • FIG. 15a exemplarily shows another scene diagram for adjusting the shooting angle of the camera according to some embodiments.
  • the converted coordinate angle of the camera is 90°; the current shooting angle of the camera is 40°, that is, the center line of the camera's viewing angle is located at the 40° position of the camera.
  • the calculated angle difference is 50°.
  • It can be seen that the angle difference is a positive value; at this time, the camera needs to be adjusted to rotate 50° to the right.
  • After the controller determines the target rotation direction and target rotation angle required to adjust the shooting angle, it can adjust the shooting angle of the camera according to the target rotation direction and target rotation angle, so that the shooting area of the camera faces the character's position and the camera can capture images including the character; in this way, the shooting angle of the camera is adjusted according to the character's position.
  • FIG. 15b exemplarily shows a scene graph of the position of the character when speaking according to some embodiments. Since the preset angle range of the camera differs from the sound source angle range of the human voice, as reflected in the angle diagram of Figure 15b there is a 30° angle difference between the 0° position of the preset angle range and the 0° position of the sound source angle range, and similarly a 30° angle difference between the 120° position of the preset angle range and the 180° position of the sound source angle range.
  • Therefore, when the controller converts the sound source angle information into the coordinate angle of the camera in the aforementioned step S31, the coordinate angle converted from the sound source angle information of the character may be negative, or larger than the maximum value of the camera's preset angle range, that is, the converted coordinate angle is not within the preset angle range of the camera. For example, if the sound source angle information corresponding to the position of person (a) is 20° and the conversion angle is 30°, the calculated coordinate angle of the camera is -10°; if the sound source angle information corresponding to the position of person (b) is 170° and the conversion angle is 30°, the calculated coordinate angle of the camera is 140°. The coordinate angles converted from the positions of person (a) and person (b) are therefore both beyond the preset angle range of the camera.
  • Since the viewing angle range of the camera is between 60° and 75°, when the camera is rotated to the 0° position or the 120° position, the viewing angle of the camera can still cover the region between the 0° position of the preset angle range and the 0° position of the sound source angle range, and likewise between the 120° and 180° positions. Therefore, if the position of the character is within the 30° angle difference between the 0° position of the preset angle range and the 0° position of the sound source angle range, or within the 30° angle difference between the 120° position of the preset angle range and the 180° position of the sound source angle range, then in order to capture an image including the person, the shooting angle of the camera is adjusted according to the position corresponding to the minimum or maximum value of the camera's preset angle range.
  • To this end, the controller is further configured to perform the following step: when the coordinate angle converted from the sound source angle information of the character is beyond the preset angle range of the camera, determine the target rotation direction and target rotation angle of the camera according to the angle difference between the current shooting angle of the camera and the minimum or maximum value of the preset angle range.
  • For example, person (a) is located within the 30° angle difference between the 0° position of the preset angle range and the 0° position of the sound source angle range, that is, the coordinate angle converted from the sound source angle information of person (a) is below the minimum of the preset range, and the current shooting angle of the camera is 50°. The angle difference relative to the 0° minimum is -50°, so the target rotation direction of the camera is determined to be leftward and the target rotation angle is 50°. At this time, the center line (a) of the viewing angle of the camera coincides with the 0° line of the camera.
  • If the sound source angle corresponding to the sound source angle information of person (b) is 170° and the current shooting angle of the camera is 50°, the angle difference relative to the 120° maximum is 70°, so the target rotation direction of the camera is determined to be rightward and the target rotation angle is 70°. At this time, the center line (b) of the viewing angle of the camera coincides with the 120° line of the camera, as in the sketch below.
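The handling of positions that fall outside the preset range can be sketched as follows; this is a hypothetical Python illustration where the 0°-120° limits and the function name are assumptions added for clarity.

```python
PRESET_MIN, PRESET_MAX = 0.0, 120.0   # assumed preset rotation range of the camera

def plan_rotation_clamped(coordinate_angle: float, current_angle: float):
    """Clamp the target to the preset range, then compute direction and angle."""
    target = min(max(coordinate_angle, PRESET_MIN), PRESET_MAX)
    diff = target - current_angle
    direction = "right" if diff > 0 else "left" if diff < 0 else "none"
    return direction, abs(diff)

# Person (a): converted coordinate angle -10 deg, current angle 50 deg -> ("left", 50.0)
print(plan_rotation_clamped(-10, 50))
# Person (b): converted coordinate angle 140 deg, current angle 50 deg -> ("right", 70.0)
print(plan_rotation_clamped(140, 50))
```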
  • It can be seen that, even when the character's position lies outside the preset angle range, the display device provided by the embodiments of the present application can still rotate the camera to the position of the minimum or maximum value of the preset angle range and, depending on the viewing angle coverage of the camera, capture an image containing the person.
  • In the display device provided by the embodiments of the present application, the camera can be rotated within a preset angle range, and the controller is configured to: obtain the character sound source information collected by the sound collector and perform sound source identification to determine the sound source angle information that identifies the azimuth angle of the character's position; determine the target rotation direction and target rotation angle of the camera based on the current shooting angle of the camera and the sound source angle information; and adjust the shooting angle of the camera according to the target rotation direction and target rotation angle, so that the shooting area of the camera faces the position where the character is speaking. The display device provided by the present application can thus trigger the rotation of the camera from the character's sound source information, automatically identifying the user's real-time position and adjusting the shooting angle of the camera, so that the camera can always capture images containing the portrait.
  • FIG. 10 exemplarily shows a flowchart of a method for adjusting the shooting angle of a camera according to some embodiments.
  • a method for adjusting the shooting angle of a camera provided by an embodiment of the present application is executed by the controller in the display device provided by the foregoing embodiment, and the method includes:
  • S1 obtain the character sound source information collected by the sound collector and the current shooting angle of the camera, and the character sound source information refers to the sound information generated when the character interacts with the display device through voice;
  • In some embodiments, before performing sound source identification on the character sound source information and determining the sound source angle information, the method further includes: performing text extraction on the character sound source information to obtain a voice interaction text; comparing the voice interaction text with a preset wake-up text, the preset wake-up text being the text used to trigger the sound source recognition process; and, if the voice interaction text is consistent with the preset wake-up text, performing the step of sound source identification on the character sound source information.
  • In some embodiments, a plurality of groups of sound collectors are included, and the controller acquiring the character sound source information collected by the sound collectors specifically includes: acquiring the character sound source information generated when the character speaks, as collected by each of the sound collectors. Performing sound source identification on the character sound source information to determine the sound source angle information then includes: performing sound source identification on each piece of character sound source information separately, calculating the voice time differences generated when the multiple groups of sound collectors collect the corresponding character sound source information, and, based on the voice time differences, calculating the sound source angle information of the position of the character when speaking.
  • Performing sound source identification on each piece of character sound source information and calculating the voice time differences generated when the multiple groups of sound collectors collect the corresponding character sound source information includes: extracting, from the character sound source information, the ambient noise, the sound source signal of the human voice, and the propagation time of the human voice to each sound collector; determining the received signal of each sound collector according to the ambient noise, the sound source signal and the propagation time; and processing the received signal of each sound collector with a cross-correlation delay estimation algorithm to obtain the voice time difference generated by every two sound collectors when collecting the corresponding character sound source information.
  • Calculating, based on the voice time difference, the sound source angle information of the position where the character is speaking includes: acquiring the speed of sound in the current environment, the coordinates of each sound collector, and the set number of sound collectors; determining, according to the set number of sound collectors, the number of combination pairs, i.e. the number of combinations obtained by pairing the sound collectors two by two; establishing a set of vector relationship equations from the voice time difference, the speed of sound and the coordinates of each sound collector, the number of equations being the same as the number of combination pairs; solving the vector relationship equations to obtain the vector value of the unit plane-wave propagation vector of the sound source at the character's position; and calculating, from that vector value, the sound source angle information of the position where the character is speaking, as sketched below.
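A minimal numerical sketch of this step is given below. It assumes far-field (plane-wave) propagation, so that for a pair of collectors at positions p_i and p_j the measured delay satisfies (p_j − p_i)·u ≈ c·Δt_ij, where u is the unit propagation vector; the pairwise equations are then solved in the least-squares sense. The function names and the collector layout are illustrative assumptions, and a purely collinear array only constrains the angle relative to the array axis, so an L-shaped layout is used for the demonstration.

```python
import itertools
import math
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, nominal room-temperature value

def estimate_propagation_vector(mic_positions, pair_delays, c=SPEED_OF_SOUND):
    """Least-squares estimate of the unit plane-wave propagation vector.

    mic_positions: (M, 2) collector coordinates in metres.
    pair_delays:   {(i, j): dt} with dt > 0 when the wavefront reaches
                   collector j after collector i, one entry per pair.
    """
    rows = np.array([mic_positions[j] - mic_positions[i] for (i, j) in pair_delays])
    rhs = np.array([c * dt for dt in pair_delays.values()])
    u, *_ = np.linalg.lstsq(rows, rhs, rcond=None)   # (p_j - p_i) . u = c * dt
    return u / np.linalg.norm(u)

# Synthetic check: four collectors in an L shape, source 40 deg from the x axis.
mics = np.array([[0.0, 0.0], [0.06, 0.0], [0.12, 0.0], [0.0, 0.06]])
true_u = -np.array([math.cos(math.radians(40)), math.sin(math.radians(40))])  # wave travels toward the array
delays = {(i, j): float((mics[j] - mics[i]) @ true_u) / SPEED_OF_SOUND
          for i, j in itertools.combinations(range(len(mics)), 2)}  # C(4,2) = 6 pairs
u = estimate_propagation_vector(mics, delays)
source_azimuth = math.degrees(math.atan2(-u[1], -u[0]))   # the source lies opposite the propagation direction
print(round(source_azimuth, 1))  # ~40.0
```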
  • the method before acquiring the current shooting angle of the camera, includes: querying the current operating state of the camera; if the current operating state of the camera is in a rotating state, waiting for the camera to rotate; If the current operating state of the camera is in a non-rotation state, the current shooting angle of the camera is acquired.
  • Determining the target rotation direction and target rotation angle of the camera based on the current shooting angle of the camera and the sound source angle information includes: converting the sound source angle information into the coordinate angle of the camera; calculating the angle difference between the coordinate angle of the camera and the current shooting angle of the camera as the target rotation angle of the camera; and determining the target rotation direction of the camera according to the angle difference. Converting the sound source angle information into the coordinate angle of the camera includes: acquiring the sound source angle range of the character when speaking and the preset angle range within which the camera rotates; calculating the angle difference between the sound source angle range and the preset angle range and taking half of that difference as the conversion angle; and calculating the difference between the angle corresponding to the sound source angle information and the conversion angle as the coordinate angle of the camera. Determining the target rotation direction of the camera according to the angle difference includes: if the angle difference is a positive value, determining that the target rotation direction of the camera is rightward; if the angle difference is a negative value, determining that the target rotation direction of the camera is leftward. A minimal sketch of this conversion is given below.
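As a hedged illustration of the conversion just described, assuming the 0°-180° sound source range and 0°-120° camera range used in the examples above (the function name and defaults are assumptions):

```python
def sound_angle_to_camera_angle(sound_angle: float,
                                sound_range: float = 180.0,
                                camera_range: float = 120.0) -> float:
    """Convert a sound source azimuth into the camera's coordinate angle.

    The conversion angle is half of the difference between the two ranges
    (30 deg for 180 deg vs 120 deg); the result may fall outside the camera's
    preset range and then has to be clamped as described earlier.
    """
    conversion_angle = (sound_range - camera_range) / 2.0
    return sound_angle - conversion_angle

print(sound_angle_to_camera_angle(50))    # 20.0  (FIG. 14 example)
print(sound_angle_to_camera_angle(20))    # -10.0 (person (a), outside the preset range)
print(sound_angle_to_camera_angle(170))   # 140.0 (person (b), outside the preset range)
```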
  • the camera 232 as a detector 230 can be built in or externally connected to the display device 200 , and after the operation is started, the camera 232 can detect image data.
  • the camera 232 can be connected with the controller 250 through an interface component, so as to send the detected image data to the controller 250 for processing.
  • the camera 232 may include a lens assembly and a pan/tilt assembly.
  • the lens assembly may be an image acquisition element based on CCD (Charge Coupled Device, charge coupled device) or CMOS (Complementary Metal Oxide Semiconductor, complementary metal oxide semiconductor), so as to generate image data of electrical signals according to user images.
  • the lens assembly is arranged on the gimbal assembly, and the gimbal assembly can drive the lens assembly to rotate, so as to change the orientation of the lens assembly.
  • In some embodiments, the pan/tilt assembly may include at least two rotating parts, so as to drive the lens assembly to rotate left and right about a vertical axis and up and down about a horizontal axis, respectively.
  • Each rotating part can be connected to a motor so that it can be automatically rotated by the motor.
  • the pan/tilt assembly may include a first rotating shaft in a vertical state and a second rotating shaft in a horizontal state, and the first rotating shaft is disposed on the top of the display 275 and is rotatably connected with the top of the display 275;
  • the first rotating shaft is also provided with a fixing piece, the top of the fixing piece is rotatably connected with the second rotating shaft, and the second rotating shaft is connected with the lens assembly to drive the lens assembly to rotate.
  • the first rotating shaft and the second rotating shaft are respectively connected with a motor and a transmission component.
  • the motor can be a servo motor, a stepping motor, etc. that can support automatic control of the rotation angle. After acquiring the control command, the two motors can be rotated respectively to drive the first rotating shaft and the second rotating shaft to rotate, so as to adjust the orientation of the lens assembly.
  • the lens assembly can capture video of users at different positions, so as to obtain user image data. Obviously, different orientations correspond to image capture in different areas.
  • When the user's portrait is imaged toward one side of the picture, the first rotating shaft on the pan/tilt assembly can drive the fixing piece and the lens assembly to rotate toward that side, so that in the captured image the user's portrait is located in the central area of the screen; when the imaging position of the user's body is low, the lens assembly can be rotated upward through the second rotating shaft in the gimbal assembly to raise the shooting angle, so that the user's portrait is located in the central area of the screen.
  • the controller 250 may recognize the position of the user's portrait in the image by executing the method of tracking the human portrait. And when the user's position is not suitable, the camera 232 is controlled to rotate to obtain a suitable image. Wherein, identifying the location of the user may be accomplished through image processing. For example, after the camera 232 is activated, the controller 250 may capture at least one image through the camera 232 as a proofreading image. And feature analysis is performed in the proofreading image, thereby identifying the portrait area in the proofreading image. By judging the position of the portrait area, it is determined whether the user's position is appropriate.
  • the initial orientation of the camera 232 may be offset from the position of the user in space. That is, in some cases, the shooting range of the camera 232 cannot cover the portrait of the user, so that the portrait of the user cannot be photographed by the camera 232, or only a small part of the portrait can be obtained. In this case, the portrait area cannot be recognized during the image processing process, and the rotation control of the camera 232 cannot be realized when the user's position is not appropriate, that is, effective adjustment cannot be performed for the person not in the current image.
  • the display device 200 is further provided with a sound collector 231 .
  • the sound collector 231 may form an array with a plurality of microphones, and collect sound signals sent by the user at the same time, so as to determine the user's orientation through the collected sound signals. That is, as shown in Fig. 18a and Fig. 18b, some embodiments of the present application provide an audio-visual character location tracking method, which includes the following steps:
  • the controller 250 may automatically run the audio-visual character location tracking method after the camera 232 is activated, and acquire the test audio signal input by the user.
  • the activation of the camera 232 may be manual activation or automatic activation.
  • Manual startup means that the startup is completed after the user selects the icon corresponding to the camera 232 in the operation interface through the control device 100 such as the remote controller.
  • the automatic start may be automatically started after the user performs some interactive actions that need to call the camera 232 . For example, when the user selects the "Look in the mirror" application in the "My Application" interface, since the application needs to call the camera 232, the camera 232 is also started when the application is started and run.
  • The posture of the camera 232 after startup can be a default initial posture, for example one in which the lens assembly of the camera 232 faces forward; it can also be the posture maintained when the camera 232 was last used. For example, if in the previous use the camera 232 was adjusted to a posture raised by 45 degrees, then after the camera 232 is activated this time its posture is also raised by 45 degrees.
  • the controller 250 may acquire the test audio signal input by the user through the sound collector 231 . Since the sound collector 231 includes a microphone array, the microphones at different positions can collect different audio signals for the same test audio.
  • In some embodiments, a text prompt can also be automatically displayed on the display 275 and/or a voice prompt can be played through an audio output device such as a speaker, to prompt the user to input the test audio, for example "Please input the test audio: Hi! Xiaoju".
  • test audio may be a variety of audio signals sent by the user, including: the user's voice through speaking, the user's voice through body movements such as clapping hands, and the user's voice through other handheld terminals.
  • If the user makes the sound through a handheld smart terminal, a control command for controlling sound generation can be sent to the smart terminal, so that after receiving the command the smart terminal automatically plays a specific sound for the sound collector 231 to detect.
  • the controller 250 may acquire the sound signal through the sound acquisition component, and extract the voiceprint information from the sound signal. Then compare the voiceprint information with the preset test voiceprint, if the voiceprint information is the same as the preset test voiceprint, mark the voice signal as a test audio signal; if the voiceprint information is different from the preset test voiceprint, control the display 275 to display prompt interface.
  • For example, when the test audio signal is set to be the voice with the content "Hi! Xiaoju", after the microphone detects a sound signal, the voiceprint information in the sound signal can be extracted and it can be judged whether it is the same as the voiceprint information of "Hi! Xiaoju"; after confirming that the voiceprint information is the same, the next steps are performed.
  • Using the intelligent terminal to make the sound ensures that the emitted sound has a specific waveform or loudness, so that the corresponding audio signal has unique sound characteristics; this facilitates the subsequent comparison and analysis of the audio signal and alleviates the effect of other sounds in the environment on the analysis process.
  • Target orientation is located based on the test audio signal.
  • The controller 250 may analyze the test audio signal to determine the target position where the user is located. Since the sound collector 231 includes a plurality of microphones forming a microphone array, the distances between the different microphones and the sound source position differ, and correspondingly the collected audio signals have a certain propagation delay between them. The controller 250 may determine the approximate location where the user utters the sound by analyzing the propagation time delay between at least two microphones, combined with the distance between the two microphones and the propagation speed of sound in air. After the audio signal analysis, the position where the sound is emitted can be located, that is, the target orientation can be determined. Since the purpose of detecting the target azimuth is to orient the lens assembly of the camera 232 toward that azimuth, the target azimuth may be represented simply by a relative angle; the controller 250 can then directly determine the relative angle data after locating the target azimuth and calculate from it the angle by which the camera 232 needs to be adjusted.
  • the relative angle may refer to the relative angle between the target position and the vertical line of the plane where the camera 232 is located (ie, the plane parallel to the screen of the display 275 ), or the relative angle between the target position and the lens axis of the camera 232 .
  • the external sound collector 231 of the display device 200 includes two microphones, which are respectively arranged at two side positions of the display 275 , and the camera 232 is arranged at the center position of the top side of the display 275 .
  • the microphones on both sides can detect the test audio signal respectively. According to the positional relationship in Figure 19, it can be known that:
  • Target orientation θ ≈ arctan(L2/D), where L2 is the horizontal distance between the user and the camera 232 and D is the vertical distance between the user and the camera 232. In the figure, the display width H, the propagation velocity v and the acquisition time difference Δt are known, so L2/D can be solved from the positional relationship above, and the target azimuth θ can then be obtained. Therefore, the controller 250 can obtain the test audio signal collected by at least two microphones, extract the acquisition time difference of the test audio signal, and calculate the target orientation from the acquisition time difference together with the installation position data of the microphones and the camera, as in the sketch below.
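The exact geometric solution depends on the layout in FIG. 19; the sketch below instead uses the common far-field approximation in which the path difference between the two side microphones is H·sin(θ). That approximation, the function name and the numeric example are assumptions added here for illustration, not the patent's own derivation.

```python
import math

def target_azimuth(delta_t: float, mic_spacing_m: float, speed_of_sound: float = 343.0) -> float:
    """Far-field estimate of the user's azimuth from the two side microphones.

    delta_t:       acquisition time difference between the two microphones (s),
                   positive when one side hears the sound first.
    mic_spacing_m: distance H between the two microphones (roughly the display width).
    Returns the azimuth in degrees relative to the perpendicular of the screen.
    """
    ratio = speed_of_sound * delta_t / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))        # guard against measurement noise
    return math.degrees(math.asin(ratio))

# Example: 1.0 m wide display, 0.5 ms time difference -> roughly 10 degrees off-axis.
print(round(target_azimuth(0.0005, 1.0), 1))
```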
  • the positional relationship can also be determined in the horizontal direction and the vertical direction respectively, so as to calculate the horizontal deflection angle and the vertical deflection angle of the user's position relative to the camera position. For example, the number of microphones can be increased or the microphones can be arranged at different heights, thereby determining the positional relationship in the vertical direction to calculate the vertical deflection angle.
  • The greater the number of microphones, the more accurately the user's orientation can be located and the more reliably the delay between the audio signals received by different microphones can be detected; therefore, in practical applications the number of microphones can be increased appropriately to improve the accuracy of target orientation detection. Likewise, increasing the distance between the microphones enlarges the time delay values and reduces the interference of detection errors, giving a more accurate detection result.
  • the rotation angle is calculated according to the target orientation and the current posture of the camera 232 .
  • The rotation angle of the camera 232 can then be calculated so that the lens assembly faces the target azimuth. For example, as shown in Fig. 18a and Fig. 18b, if the camera 232 is currently in the default initial posture and the relative angle between the located target orientation and the vertical line of the screen is 30° to the left, the calculated rotation angle is 30° to the left (+30°). In other postures, the rotation angle is calculated from the located target orientation together with the current attitude of the camera 232. For example, if the camera 232 is currently turned 50° to the left and the relative angle between the located target orientation and the vertical line of the screen is 30° to the left, the calculated rotation angle is 20° to the right (-20°).
  • For a camera 232 that rotates in one direction only, controlling rotation in that direction can already enable the captured proofreading image to include the portrait area. If the camera 232 cannot capture the portrait by rotating in the horizontal direction alone, the target orientation in space (including the height direction) can also be determined through multiple microphones; when calculating the rotation angle, the target orientation is then decomposed into two angular components in the horizontal direction and the vertical direction, so as to control the rotation of the camera 232 about each axis respectively.
  • a rotation instruction is generated according to the rotation angle, and the rotation instruction is sent to the camera 232 .
  • the controller 250 may package the rotation angle to generate a rotation instruction. And send the rotation instruction to the camera 232 .
  • the motor in the camera 232 can rotate after receiving the control command, so as to drive the lens assembly to rotate through the rotating shaft, and adjust the orientation of the lens assembly.
  • In some embodiments, the display device 200 can connect an external camera 232 and sound collector 231 through the interface component; after entering an application that needs to perform portrait tracking, test audio signals are collected through the multiple microphones in the sound collector 231 to locate the target orientation of the user, and the camera 232 is controlled to rotate so that the lens assembly faces the user. The shooting direction of the camera 232 is thereby adjusted toward the target orientation, which facilitates collecting images containing the user's portrait; even when there is no portrait in the current picture, the shooting area can be adjusted so that subsequent character tracking can be carried out.
  • In some embodiments, the controller 250 can also continue to perform the audio-visual person localization tracking method, acquiring images and identifying the position of the person in the image, so that when the position of the person changes, the camera 232 is controlled to rotate to track the user's position and the portrait in the image captured by the camera 232 always remains in an appropriate area.
  • the controller 250 can also acquire a calibration image through the camera 232, and detect a portrait pattern in the calibration image; and then mark the portrait pattern, And when the user moves the position, a tracking instruction is sent to the camera 232 to track the user's position.
  • In this way, the character pattern can always be kept in a proper position, for example in the middle area of the image, so that a better display effect is obtained in the application interface when running applications such as "looking in the mirror" and "motion follow".
  • the controller 250 may acquire the calibration image through the camera 232 at a set frequency, and detect the position of the portrait pattern in the calibration image.
  • different preset area ranges can be set according to the application type.
  • If the portrait pattern is within the preset area, the portrait pattern in the currently collected proofreading image is in a suitable position and the current shooting direction of the camera 232 can remain unchanged. If the portrait pattern is no longer within the preset area, the user has moved a large distance and the position of the portrait pattern in the collected proofreading image is no longer appropriate, so the shooting direction of the camera 232 needs to be adjusted.
  • the controller 250 can generate a tracking instruction according to the position of the portrait pattern, and send the tracking instruction to the camera 232 to control the camera 232 to adjust the shooting direction.
  • the adjusted shooting direction should be able to keep the portrait pattern within the preset area.
  • the audio-visual character location tracking method further includes the following steps:
  • After the camera 232 is rotated and adjusted, it can capture multiple frames of images in real time and send the captured images to the controller 250 of the display device 200. The controller 250 can process the images according to the activated application program, for example, controlling the display 275 to display them.
  • the detection of the user's position can be completed by an image processing program. That is, body information is detected by capturing images captured by the camera 232 in real time.
  • the limb information can contain key points and the outer frame that wraps the limb, and the position information in the image is obtained by the detected key points and the position of the limb frame.
  • Keypoints can refer to a series of points in a human image that can represent human features. For example, eyes, ears, nose, neck, shoulders, elbows, wrists, waist, knees, and ankles.
  • the determination of key points can be obtained through image recognition, that is, the image corresponding to the key points can be determined by analyzing the characteristic shape in the picture and matching with the preset template, and the position corresponding to the image can be obtained, so as to obtain the position corresponding to each key point. .
  • the position can be represented by the number of pixels in the image from the boundary.
  • a plane rectangular coordinate system can be constructed with the upper left corner of the image as the origin and the right and downward directions as the positive directions, then each pixel in the image can pass through this rectangular coordinate system. to express.
  • the viewing angles of the camera in the horizontal and vertical directions are HFOV and VFOV respectively.
  • the viewing angle can be obtained from the camera's CameraInfo.
  • the camera preview image supports 1080P, the width is 1920, and the height is 1080 pixels.
  • the position of each pixel can be (x, y), where the value range of x is (0, 1920); the value range of y is (0, 1080).
  • the number of key points can be set to multiple, and in one detection process, all or part of the multiple key points need to be extracted, so as to determine the outer frame area of the wrapped limb.
  • The key points can include 18 points, i.e. 2 eye points, 2 ear points, 1 nose point, 1 neck point, 2 shoulder points, 2 elbow points, 2 wrist points, 2 waist points (or hip points), 2 knee points, and 2 ankle points, as illustrated in the sketch below.
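Purely as an illustration of how these key points and the pixel coordinate convention can be represented, the following Python sketch uses hypothetical names and a simple dictionary structure; it is not the patent's data format.

```python
# 18 key points of a portrait, each an (x, y) pixel coordinate with the image
# origin at the top-left corner, x growing rightward and y growing downward.
KEYPOINT_NAMES = [
    "left_eye", "right_eye", "left_ear", "right_ear", "nose", "neck",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def portrait_box(keypoints: dict):
    """Outer frame wrapping the detected limbs: (x_min, y_min, x_max, y_max)."""
    xs = [p[0] for p in keypoints.values()]
    ys = [p[1] for p in keypoints.values()]
    return min(xs), min(ys), max(xs), max(ys)

def portrait_center(keypoints: dict):
    x_min, y_min, x_max, y_max = portrait_box(keypoints)
    return (x_min + x_max) / 2.0, (y_min + y_max) / 2.0

# A 1080P preview image: x in [0, 1920), y in [0, 1080).
detected = {"left_eye": (940, 300), "right_eye": (980, 300), "neck": (960, 420)}
print(portrait_center(detected))   # (960.0, 360.0)
```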
  • different identification methods are required according to different user orientations. For example, the position corresponding to the waist is identified as the waist point when the user faces the display 275 , and is identified as the hip point when the user faces away from the display 275 .
  • the positions of some key points will change.
  • the relative position of the human body in the image captured by the camera 232 will also change. For example, when the human body moves to the left, the position of the human body in the image captured by the camera 232 will be shifted to the left, which is inconvenient for image analysis processing and real-time display.
  • In some embodiments, the x-axis coordinate of the center position of the portrait can be judged first, to determine whether it lies at the center of the entire image. For example, when the proofreading image is a 1080P image of (1920, 1080), the horizontal coordinate of the center point of the proofreading image is 960. In some embodiments, an allowable coordinate range can be preset. If the center position of the portrait is within the allowable coordinate range, it is determined that the current user position is within the preset area; for example, if the maximum allowable coordinate error is 300 pixels, the allowable coordinate interval is [660, 1260]. After comparing the user's position with the preset area in the proofreading image, whether portrait tracking is required can be determined from the comparison result. If the current user position is not within the preset area, the camera 232 is controlled to rotate so that the user's imaging position is located in the central area of the screen; if the current user position is within the preset area, the camera 232 does not need to rotate and its orientation can be maintained.
  • Specifically, the controller 250 may calculate the rotation angle according to the user position and generate a control instruction according to the rotation angle to control the camera 232 to rotate. The controller 250 may first calculate the horizontal distance between the center position of the portrait area and the center point of the image area; then, from this distance combined with the maximum horizontal viewing angle of the lens assembly of the camera 232 and the image size, calculate the rotation angle; finally, the calculated rotation angle is sent to the camera 232 in the form of a control command, so that the motor in the camera 232 drives each shaft to rotate and adjusts the orientation of the lens assembly. In this way the angle by which the camera 232 needs to be adjusted is obtained. The controller 250 then compares the center position of the portrait area with the coordinate values of the center point of the image area to determine the orientation of the portrait center relative to the image center, and thereby the rotation direction of the camera 232: if the horizontal coordinate of the portrait center is larger than that of the image center, the camera 232 is turned to the right; otherwise it is turned to the left. A minimal sketch of this calculation follows.
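The in-range check and rotation-angle calculation can be sketched as follows. This is an illustrative Python example following the rule stated above: the 60° horizontal viewing angle and the 300-pixel tolerance are assumed values taken from the examples, and the mirrored rear-camera case noted next is not handled.

```python
def horizontal_adjustment(portrait_center_x: float,
                          img_width: int = 1920,
                          hfov_deg: float = 60.0,
                          tolerance_px: int = 300):
    """Decide whether and how far to pan the camera from one proofreading image.

    Returns (direction, angle_deg); direction is "none" while the portrait
    center stays inside the allowed interval around the image center.
    """
    center_x = img_width / 2.0
    offset = portrait_center_x - center_x
    if abs(offset) <= tolerance_px:
        return "none", 0.0
    # Map the pixel offset onto the lens's horizontal field of view.
    angle = abs(offset) / img_width * hfov_deg
    return ("right" if offset > 0 else "left"), angle

print(horizontal_adjustment(1500))   # right of the allowed interval -> pan right ~16.9 deg
print(horizontal_adjustment(1000))   # inside [660, 1260] -> ("none", 0.0)
```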
  • the camera 232 may adopt a rear camera mode, so that the image displayed on the screen and the image captured by the camera are in a left-right mirror relationship, that is, the horizontal angle rotation is opposite to the left and right.
  • the controller 250 can package the rotation angle and direction data, generate a control command, and send the control command to the camera 232 .
  • the motor in the camera 232 can rotate after receiving the control command, so as to drive the lens assembly to rotate through the rotating shaft, and adjust the orientation of the lens assembly.
  • The above takes the horizontal coordinate as an example for judgment and adjustment. The vertical coordinate is adjusted in the same way: after determining that the current user position is not within the preset area, the controller 250 first calculates the vertical distance between the center position of the portrait area and the center point of the image area; then, from this vertical distance combined with the maximum vertical viewing angle of the lens assembly of the camera 232 and the image size, calculates the rotation angle; finally, the calculated rotation angle is sent to the camera 232 in the form of a control instruction, so that the motor in the camera 232 drives the second rotating shaft to rotate, thereby adjusting the orientation of the lens assembly.
  • controlling the rotation of the camera 232 so that the imaging position of the user is located in the middle area of the screen may also be performed according to the following steps.
  • a first identification point is detected in the proofreading image.
  • the first identification point is to identify one or more key points, which are used to represent the position of a part of the user's limbs.
  • the first identification points may be 2 eye points (or 2 ear points) to represent the position of the user's head.
  • If the proofreading image does not contain the first identification point, a second identification point is detected in the proofreading image.
  • the second identification point is a key point that is spaced apart from the first identification point by a certain distance and can have a relative positional relationship.
  • the second identification point may be a chest point. Since the chest point is located below the eye point in a normal use state, and the distance between the chest point and the eye point is 20-30 cm, it can be determined by detecting the chest point The direction that needs to be adjusted.
  • the rotation direction is determined according to the positional relationship between the second identification point and the first identification point.
  • For example, when the first recognition point is the eye point and the second recognition point is the chest point, if the first identification point is not detected in the proofreading image but the second identification point is detected, the head of the portrait is outside the picture, so the shooting angle can be raised to bring the upper part of the portrait into the preset area of the image. Different choices of identification points lead to different determined rotation directions: when the first identification point is the waist point and the second identification point is the chest point, if the waist point is not detected but the chest point is detected, it means that the captured image is too close to the upper half of the portrait, so the shooting angle can be lowered to bring the lower half of the portrait into the preset area of the image.
  • the camera 232 is controlled to rotate according to the rotation direction and the preset adjustment step, so that the portrait is located in the image preset area.
  • For example, the camera 232 can be lifted up so that the position of the first identification point moves by 100 pixels each time, until the first recognition point lies at the 1/7-1/5 position of the image height.
  • the position of the first identification point relative to the image area is obtained.
  • The position of the first identification point can be further extracted, so as to determine its position relative to the entire image area. For example, as shown in Fig. 23a, after obtaining the proofreading image, if the eye point is identified, that is, the first identification point is detected, the current coordinate P(x1, y1) of the eye point can be obtained. The x-axis coordinate value and/or y-axis coordinate value of the current coordinate is then compared with the overall width imgWidth and/or height imgHeight of the image, so as to determine the position of the first recognition point relative to the image area. In some embodiments, the position of the first identification point relative to the image area may be determined separately in the horizontal and vertical directions: in the horizontal direction it is x1/imgWidth, and in the vertical direction it is y1/imgHeight, as in the sketch below.
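A simplified sketch of this fallback logic follows; the 1/7-1/5 target band, the head/chest point names and the overall structure follow the examples above, while the function name and return values are illustrative assumptions.

```python
def vertical_adjustment(first_point, second_point, img_height: int = 1080) -> str:
    """Decide the tilt adjustment from the head (first) and chest (second) points.

    first_point / second_point: (x, y) pixel coordinates, or None when not detected.
    Returns "up", "down", "hold", or "none" (nothing usable detected).
    """
    if first_point is None:
        if second_point is None:
            return "none"
        # Head is out of frame but the chest is visible: raise the shooting angle.
        return "up"
    rel = first_point[1] / img_height          # relative height of the head point
    if rel < 1.0 / 7.0:
        return "up"                            # head too close to the top edge
    if rel > 1.0 / 5.0:
        return "down"                          # head too low; adjust ~100 px per step
    return "hold"
```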
  • In some embodiments, one portrait among multiple portraits may also be locked for tracking through a locking program. For example, within a certain area at the center of the screen (the central 1/3 of the screen, where faces appear most often), the person closest to the center of the screen can be searched for as the optimal face information, and that person's information is recorded and locked. If no face information is detected, it means that the sound information has a large error, and the person closest to the screen is locked.
  • the adjustment of the camera 232 may only be affected by the position of the locked person. That is, the movement of other people in the image captured by the camera 232 will not adjust the camera 232, and the camera 232 will remain stationary. Only the person in the locked state moves, and after detection through image detection, the camera 232 is driven to rotate following the locked person.
  • The display device 200 can obtain a proofreading image through the camera 232 and detect a portrait pattern in the proofreading image, thereby marking the portrait pattern; when the user moves, a tracking instruction is sent to the camera to track the user's position, achieving the effect that the camera 232 follows the user.
  • the portrait pattern can always be in a proper position, which is convenient for the application to display, call, and analyze.
  • If the proofreading image includes a plurality of portrait patterns, a portrait pattern located in the central area of the proofreading image is searched for; if the central area of the proofreading image contains a portrait pattern, the portrait pattern located in the central area of the image is marked; if there is no portrait pattern in the central area of the proofreading image, the portrait pattern with the largest area in the proofreading image is marked. In some embodiments, the controller 250 may query the status of the camera 232 in real time and, once the rotation of the camera 232 according to the test audio signal has ended, start the AI image detection algorithm: within a certain area at the center of the screen, the face information closest to the center of the screen is found, and the person's information is recorded and locked. If no face information is detected, it means that the sound information has a large error, and the person closest to the screen is locked.
  • an image recognition may be performed on the image captured by the camera 232 to determine whether the current camera 232 can capture a picture with a portrait. If a person is identified from the captured image, the target tracking is performed directly through subsequent image processing without sound source localization. That is, after the camera 232 is activated, an initial image for recognizing a portrait can be obtained first, and a portrait region can be identified in the initial image.
  • the identification method of the portrait area can be the same as the above-mentioned embodiment, that is, it is completed by identifying key points.
  • If the initial image contains a portrait area, the user position detection and subsequent steps are performed directly, and the portrait target is tracked by means of image processing. If the initial image does not contain a portrait area, the test audio signal input by the user is obtained and the subsequent steps are performed, so that the camera 232 is adjusted toward the area of the user's position by means of sound source localization, after which the detection of the user's position and the subsequent steps are performed.
  • In some embodiments, a skeleton line diagram can be established according to the identified key points, so that the position of the portrait can be further determined from the skeleton line graphics.
  • the skeleton line can be determined by connecting multiple key points. Under different user poses, the shape of the skeleton line is also different.
  • The drawn skeleton line can also be used to dynamically adjust the shooting position of the camera according to the movement of the skeleton line. For example, when it is judged that the movement state of the skeleton line changes from a squatting state to a standing state, the viewing angle of the camera 232 can be raised so that the portrait in the standing state remains in a suitable area of the image, that is, the transition from the effect shown in Figure 24a to that shown in Figure 24b; when the movement state of the skeleton line changes from the standing state to the squatting state, the viewing angle of the camera 232 can be lowered so that the portrait in the squatting state remains in a suitable area of the image, that is, the transition from the effect shown in Figure 24b to that shown in Figure 24a. A loose sketch of this decision follows.
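The following is a loose Python illustration only; the state names, the height-ratio classifier and the threshold are assumptions and not the patent's actual pose analysis.

```python
def skeleton_state(keypoints: dict, standing_height_px: float, squat_ratio: float = 0.6) -> str:
    """Crudely classify the pose from the skeleton line's vertical extent."""
    ys = [p[1] for p in keypoints.values()]
    extent = max(ys) - min(ys)
    return "squat" if extent < squat_ratio * standing_height_px else "stand"

def tilt_for_posture_change(previous: str, current: str) -> str:
    """Map a squat/stand transition of the skeleton line to a camera tilt command."""
    if previous == "squat" and current == "stand":
        return "raise"   # FIG. 24a -> FIG. 24b: lift the viewing angle
    if previous == "stand" and current == "squat":
        return "lower"   # FIG. 24b -> FIG. 24a: drop the viewing angle
    return "hold"
```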
  • the above embodiment takes the portrait position at the center of the image as an example to illustrate the tracking of the portrait by the camera 232. It should be understood that, according to actual needs, in the expected captured image, the portrait position may be located in other regions than the central region.
  • For example, the display device 200 can render a virtual coach image together with the video captured by the camera 232, so that the picture the user views through the display device 200 includes both the user portrait and the virtual coach portrait. In this case, the portrait shot by the camera 232 needs to be located on one side of the image, and the other side is used for rendering the virtual coach image.
  • The audio-visual person localization tracking method described above remedies the defect that sound source localization alone, although strong in spatial perception, cannot effectively locate the specific position of a person, and the defect that image processing alone has poor spatial perception and can only locate persons within the area currently targeted by the camera 232. The method comprehensively utilizes sound source localization and image analysis by the camera 232: taking advantage of the stronger spatial perception of sound source localization, the approximate position of the person is first confirmed and the camera 232 is driven toward the direction of the sound source; the camera 232 then performs person detection to determine the specific position and the camera is fine-tuned accordingly, achieving precise positioning so that the person captured by the camera 232 can be focused and displayed in the image.
  • the present application further provides a display device 200 , including: a display 275 , an interface component, and a controller 250 .
  • the display 275 is configured to display a user interface
  • The interface component is configured to connect the camera 232 and the sound collector 231; the camera 232 can rotate its shooting angle and is configured to capture images; the sound collector 231 includes a plurality of microphones forming a microphone array and is configured to acquire audio signals.
  • The controller 250 is configured to obtain the test audio signal input by the user and, in response to the test audio signal, locate the target orientation, the target orientation being calculated from the time difference with which the sound collection component collects the test audio signal; a rotation instruction is then sent to the camera to adjust the shooting direction of the camera toward the target orientation.
  • the camera 232 and the sound collector 231 can be externally connected through the interface component, and the above-mentioned audio-visual person localization tracking method can be completed in combination with the display device 200 .
  • In some embodiments, the camera 232 and the sound collector 231 may also be directly built into the display device 200, that is, the display device 200 includes the display 275, the camera 232, the sound collector 231 and the controller 250, where the camera 232 and the sound collector 231 can be directly connected to the controller 250, so that the test audio signal is obtained directly through the sound collector 231 and the camera 232 is controlled directly to rotate, completing the above audio-visual person localization tracking method.
  • The present application further provides a computer storage medium, wherein the computer storage medium can store a program, and when the program is executed it can include part or all of the steps in the various embodiments of the method for adjusting the shooting angle of the camera provided by the present application.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (English: read-only memory, abbreviated as: ROM) or a random access memory (English: random access memory, abbreviated as: RAM) and the like.

Abstract

The present application discloses a display method and a display device, in which the camera can rotate within a preset angle range. The controller is configured to: acquire the character sound source information collected by the sound collector and perform sound source identification to determine the sound source angle information that identifies the azimuth angle of the character's position; determine the target rotation direction and target rotation angle of the camera based on the current shooting angle of the camera and the sound source angle information; and adjust the shooting angle of the camera according to the target rotation direction and target rotation angle, so that the shooting area of the camera faces the position of the character when speaking.

Description

Display method and display device
This application claims priority to the Chinese patent application filed with the China Patent Office on August 21, 2020, with application number 202010848905.X, entitled "Audio-visual character localization and tracking method"; this application claims priority to the Chinese patent application filed with the China Patent Office on July 1, 2020, with application number 202010621070.4, entitled "Method for adjusting the shooting angle of a camera and display device", the entire contents of which are incorporated herein by reference; this application claims priority to the Chinese patent application filed with the China Patent Office on January 6, 2021, with application number 202110014128.3, entitled "Display device and audio-visual character localization and tracking method", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of television software, and in particular to a display method and a display device.
Background
With the rapid development of display devices, their functions are becoming ever richer and their performance ever more powerful. For example, a display device can implement functions such as network search, IPTV, BBTV, video on demand (VOD), digital music, network news and network video calls. To implement the network video call function on a display device, a camera needs to be installed on the display device to capture the user's image.
Summary
An embodiment of the present application provides a display device, including:
a camera, the camera being configured to capture portraits and to rotate within a preset angle range;
a sound collector, the sound collector being configured to collect character sound source information, the character sound source information referring to the sound information generated when a character interacts with the display device through voice;
a controller connected to the camera and the sound collector, the controller being configured to: acquire the character sound source information collected by the sound collector and the current shooting angle of the camera;
perform sound source identification on the character sound source information to determine sound source angle information, the sound source angle information being used to characterize the azimuth angle of the position of the character when speaking;
determine the target rotation direction and target rotation angle of the camera based on the current shooting angle of the camera and the sound source angle information;
adjust the shooting angle of the camera according to the target rotation direction and target rotation angle, so that the shooting area of the camera faces the position of the character when speaking.
Brief Description of the Drawings
In order to explain the technical solutions of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, for those of ordinary skill in the art, other drawings can also be obtained from these drawings without inventive effort.
FIG. 1 exemplarily shows a schematic diagram of an operation scenario between a display device and a control apparatus according to some embodiments;
FIG. 2 exemplarily shows a hardware configuration block diagram of the display device 200 according to some embodiments;
FIG. 3 exemplarily shows a hardware configuration block diagram of the control device 100 according to some embodiments;
FIG. 4 exemplarily shows a schematic diagram of the software configuration in the display device 200 according to some embodiments;
FIG. 5 exemplarily shows a schematic diagram of the icon control interface display of applications in the display device 200 according to some embodiments;
FIG. 6 exemplarily shows a structural block diagram of a display device according to some embodiments;
FIG. 7 exemplarily shows a schematic diagram of the preset angle range within which the camera rotates according to some embodiments;
FIG. 8 exemplarily shows a scene diagram of camera rotation within the preset angle range according to some embodiments;
FIG. 9 exemplarily shows a schematic diagram of the sound source angle range according to some embodiments;
FIG. 10 exemplarily shows a flowchart of a method for adjusting the shooting angle of a camera according to some embodiments;
FIG. 11 exemplarily shows a flowchart of a method for comparing wake-up text according to some embodiments;
FIG. 12 exemplarily shows a flowchart of a method for performing sound source identification on character sound source information according to some embodiments;
FIG. 13 exemplarily shows a flowchart of a method for determining the target rotation direction and target rotation angle of the camera according to some embodiments;
FIG. 14 exemplarily shows a scene diagram for adjusting the shooting angle of the camera according to some embodiments;
FIG. 15a exemplarily shows another scene diagram for adjusting the shooting angle of the camera according to some embodiments;
FIG. 15b exemplarily shows a scene diagram of the position of the character when speaking according to some embodiments;
FIG. 16 is a schematic diagram of the arrangement of the display device and the camera in an embodiment of the present application;
FIG. 17 is a schematic diagram of the camera structure in an embodiment of the present application;
FIG. 18a is a schematic diagram of the display device scene before adjustment in an embodiment of the present application;
FIG. 18b is a schematic diagram of the display device scene after adjustment in an embodiment of the present application;
FIG. 19 is a schematic diagram of a sound source localization scene in an embodiment of the present application;
FIG. 20 is a schematic diagram of key points in an embodiment of the present application;
FIG. 21 is a schematic diagram of the portrait center and the image center in an embodiment of the present application;
FIG. 22 is a schematic diagram of the geometric relationship in the process of calculating the rotation angle in an embodiment of the present application;
FIG. 23a is a schematic diagram of the initial state in the process of adjusting the rotation angle in an embodiment of the present application;
FIG. 23b is a schematic diagram of the result of the process of adjusting the rotation angle in an embodiment of the present application;
FIG. 24a is a schematic diagram of the squatting posture in an embodiment of the present application;
FIG. 24b is a schematic diagram of the standing posture in an embodiment of the present application;
FIG. 25a is a schematic diagram of the initial display effect of the virtual portrait in an embodiment of the present application;
FIG. 25b is a schematic diagram of the display effect after adjustment of the virtual portrait in an embodiment of the present application.
Detailed Description
To make the objectives, implementations and advantages of the present application clearer, the exemplary implementations of the present application will be described clearly and completely below in conjunction with the drawings of the exemplary embodiments. Obviously, the described exemplary embodiments are only some of the embodiments of the present application, not all of them.
Based on the exemplary embodiments described in the present application, all other embodiments obtained by those of ordinary skill in the art without inventive effort fall within the scope of protection of the claims appended to the present application. In addition, although the disclosure of the present application is introduced by way of one or several exemplary examples, it should be understood that each aspect of the disclosure may also separately constitute a complete implementation.
It should be noted that the brief explanations of terms in the present application are only intended to facilitate understanding of the embodiments described below, and are not intended to limit the embodiments of the present application. Unless otherwise stated, these terms should be understood according to their ordinary and usual meanings.
The terms "first", "second", "third", etc. in the specification, claims and drawings of the present application are used to distinguish similar or same-kind objects or entities, and do not necessarily imply a specific order or sequence, unless otherwise indicated. It should be understood that terms used in this way are interchangeable where appropriate, for example so that the embodiments can be implemented in an order other than that illustrated or described herein.
In addition, the terms "comprising" and "having" and any variations thereof are intended to cover a non-exclusive inclusion; for example, a product or device comprising a series of components is not necessarily limited to the components clearly listed, but may include other components not clearly listed or inherent to the product or device.
The term "module" as used in the present application refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware or/and software code capable of performing the function associated with the element.
The term "remote control" as used in the present application refers to a component of an electronic device (such as the display device disclosed in the present application) that can usually control the electronic device wirelessly over a relatively short distance. It is generally connected to the electronic device using infrared and/or radio frequency (RF) signals and/or Bluetooth, and may also include functional modules such as WiFi, wireless USB, Bluetooth and motion sensors. For example, a handheld touch remote control replaces most of the physical built-in hard keys of a common remote control device with a user interface on a touch screen.
The term "gesture" as used in the present application refers to a user behavior in which the user expresses an intended idea, action, purpose and/or result through a change of hand shape or a hand movement.
FIG. 1 is a schematic diagram of an operation scenario between a display device and a control apparatus according to an embodiment. As shown in FIG. 1, a user can operate the display device 200 through a smart device 300 or the control apparatus 100.
The control apparatus 100 may be a remote control; communication between the remote control and the display device includes infrared protocol communication or Bluetooth protocol communication and other short-range communication methods, and the display device 200 is controlled wirelessly or by wire. The user can control the display device 200 by inputting user instructions through keys on the remote control, voice input, control panel input, etc.
In some embodiments, a smart device 300 (such as a mobile terminal, tablet computer, computer or notebook computer) may also be used to control the display device 200, for example by using an application running on the smart device.
In some embodiments, the display device 200 may also be controlled in ways other than the control apparatus 100 and the smart device 300; for example, the user's voice instructions may be received directly through a module for acquiring voice instructions configured inside the display device 200, or through a voice control device provided outside the display device 200.
In some embodiments, the display device 200 also performs data communication with a server 400. The display device 200 may be allowed to communicate via a local area network (LAN), a wireless local area network (WLAN) and other networks. The server 400 may provide various contents and interactions to the display device 200.
FIG. 3 exemplarily shows a configuration block diagram of the control apparatus 100 according to an exemplary embodiment. As shown in FIG. 3, the control apparatus 100 includes a controller 110, a communication interface 130, a user input/output interface 140, a memory 190 and a power supply 180. The control apparatus 100 can receive the user's input operation instructions and convert them into instructions that the display device 200 can recognize and respond to, acting as an intermediary for the interaction between the user and the display device 200.
FIG. 2 shows a hardware configuration block diagram of the display device 200 according to an exemplary embodiment.
The display device 200 includes at least one of a tuner-demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 275, an audio output interface 285, a memory 260, a power supply 290 and a user interface 265.
The display 275 includes a display screen component for presenting pictures and a driving component for driving image display, and is used for receiving image signals output from the controller and displaying video content, image content, menu control interfaces and the user control UI interface.
The display 275 may be a liquid crystal display, an OLED display or a projection display, and may also be a projection device and projection screen.
The communicator 220 is a component for communicating with external devices or servers according to various communication protocol types. For example, the communicator may include at least one of a WiFi module, a Bluetooth module, a wired Ethernet module and other network communication protocol chips or near-field communication protocol chips, and an infrared receiver. The display device 200 can establish the sending and receiving of control signals and data signals with the external control apparatus 100 or the server 400 through the communicator 220.
The user interface can be used to receive control signals from the control apparatus 100 (such as an infrared remote control).
The detector 230 is used to collect signals from the external environment or from interaction with the outside. For example, the detector 230 includes a light receiver, a sensor for collecting ambient light intensity; or the detector 230 includes an image collector, such as a camera, which can be used to collect the external environment scene, the user's attributes or user interaction gestures; or the detector 230 includes a sound collector, such as a microphone, for receiving external sounds.
The external device interface 240 may include, but is not limited to, any one or more of the following: a high-definition multimedia interface (HDMI), an analog or data high-definition component input interface (Component), a composite video input interface (CVBS), a USB input interface (USB), an RGB port, etc. It may also be a composite input/output interface formed by several of the above interfaces.
The controller 250 and the tuner-demodulator 210 may be located in different separate devices, that is, the tuner-demodulator 210 may also be in an external device of the main device where the controller 250 is located, such as an external set-top box.
The controller 250 controls the operation of the display device and responds to the user's operations through the various software control programs stored in the memory 260. The controller 250 controls the overall operation of the display device 200. For example, in response to receiving a user command for selecting a UI object displayed on the display 275, the controller 250 may perform the operation related to the object selected by the user command.
The object may be any one of the selectable objects, such as a hyperlink, an icon or another operable control. The operations related to the selected object include: displaying the operation of connecting to a hyperlinked page, document, image, etc., or executing the program corresponding to the icon.
In some embodiments, the user may input a user command on a graphical user interface (GUI) displayed on the display 275, and the user input interface receives the user input command through the graphical user interface (GUI). Alternatively, the user may input a user command by inputting a specific sound or gesture, and the user input interface recognizes the sound or gesture through a sensor to receive the user input command.
A "user interface" may refer to a medium interface for interaction and information exchange between an application or operating system and a user, which realizes the conversion between the internal form of information and a form acceptable to the user. A commonly used form of user interface is the graphical user interface (GUI), which refers to a user interface displayed in a graphical manner and related to computer operations. It may be an interface element such as an icon, window or control displayed on the display screen of an electronic device, where the controls may include visible interface elements such as icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars and widgets.
Referring to FIG. 4, in some embodiments the system is divided into four layers, which from top to bottom are the Applications layer ("application layer"), the Application Framework layer ("framework layer"), the Android runtime and system library layer ("system runtime library layer"), and the kernel layer.
In some embodiments, at least one application runs in the application layer. These applications may be the Window program, system setting program, clock program, camera application, etc. that come with the operating system, or applications developed by third-party developers, such as the Hijian program, a karaoke program or a magic mirror program. In specific implementations, the application packages in the application layer are not limited to the above examples and may include other application packages; the embodiments of the present application do not limit this.
The framework layer provides application programming interfaces (APIs) and programming frameworks for the applications of the application layer. The application framework layer includes some predefined functions. The application framework layer acts as a processing center that decides the actions of the applications in the application layer. Through the API interface, an application can access the resources of the system and obtain the services of the system during execution.
As shown in FIG. 4, in the embodiments of the present application the application framework layer includes Managers, a Content Provider, etc., where the Managers include at least one of the following modules: an Activity Manager for interacting with all activities running in the system; a Location Manager for providing system services or applications with access to the system location service; a Package Manager for retrieving various information related to the application packages currently installed on the device; a Notification Manager for controlling the display and clearing of notification messages; and a Window Manager for managing the icons, windows, toolbars, wallpapers and desktop widgets on the user interface.
In some embodiments, the Activity Manager is used to manage the life cycle of each application and the usual navigation and back functions, such as controlling the exit of an application (including switching the user interface currently displayed in the display window to the system desktop), opening, and going back (including switching the user interface currently displayed in the display window to the previous-level user interface of the one currently displayed).
In some embodiments, the Window Manager is used to manage all window programs, for example obtaining the display screen size, determining whether there is a status bar, locking the screen, capturing the screen, and controlling changes of the display window (for example, shrinking the display window, dithering display, distortion display, etc.).
In some embodiments, the system runtime library layer provides support for the upper layer, i.e. the framework layer; when the framework layer is used, the Android operating system runs the C/C++ libraries contained in the system runtime library layer to implement the functions of the framework layer.
In some embodiments, the kernel layer is the layer between hardware and software. As shown in FIG. 4, the kernel layer contains at least one of the following drivers: audio driver, display driver, Bluetooth driver, camera driver, WIFI driver, USB driver, HDMI driver, sensor drivers (such as fingerprint sensor, temperature sensor, touch sensor, pressure sensor, etc.), and so on.
In some embodiments, the kernel layer further includes a power driver module for power management.
In some embodiments, the software programs and/or modules corresponding to the software architecture in FIG. 4 are stored in the first memory or the second memory shown in FIG. 2 or FIG. 3.
In some embodiments, taking the magic mirror application (a photographing application) as an example, when the remote control receiving device receives an input operation of the remote control, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the input operation into a raw input event (including the value of the input operation, its timestamp and other information). The raw input event is stored in the kernel layer. The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event according to the current position of the focus; the input operation being a confirmation operation on the control corresponding to the magic mirror application icon, the magic mirror application calls the interface of the application framework layer to start the magic mirror application, and then starts the camera driver by calling the kernel layer, so that still images or video are captured through the camera.
In some embodiments, for a display device with a touch function, taking a split-screen operation as an example, the display device receives an input operation (such as a split-screen operation) applied by the user on the display screen, and the kernel layer generates a corresponding input event according to the input operation and reports the event to the application framework layer. The Activity Manager of the application framework layer sets the window mode (such as multi-window mode) and the window position and size corresponding to the input operation. The window management of the application framework layer draws the windows according to the settings of the Activity Manager, and then sends the drawn window data to the display driver of the kernel layer, which displays the corresponding application interfaces in different display areas of the display screen.
In some embodiments, as shown in FIG. 5, the application layer contains at least one application whose corresponding icon control can be displayed on the display, such as a live TV application icon control, a video-on-demand application icon control, a media center application icon control, an application center icon control, a game application icon control, etc.
In some embodiments, the live TV application can provide live TV through different signal sources. For example, the live TV application can provide TV signals using input from cable TV, terrestrial broadcast, satellite services or other types of live TV services, and the live TV application can display the video of the live TV signal on the display device 200.
In some embodiments, the video-on-demand application can provide videos from different storage sources. Unlike the live TV application, video on demand provides video display from certain storage sources; for example, video on demand can come from the server side of cloud storage or from a local hard disk storage containing stored video programs.
In some embodiments, the media center application is an application that can play various multimedia content. For example, the media center, unlike live TV or video on demand, lets the user access services provided by various images or audio through the media center application.
In some embodiments, the application center can store various applications. An application may be a game, an application program, or some other application that is related to a computer system or other device but can run on a smart TV. The application center can obtain these applications from different sources, store them in local storage, and then run them on the display device 200.
First aspect:
In some embodiments, the applications on the display device that need to use the camera include "Hijian", "Look in the mirror", "Youxuemao", "Fitness", etc., which can implement functions such as "video chat", "chat while watching" and "fitness". "Hijian" is a video chat application that enables one-key chat between a mobile phone and a TV, or between TVs. "Look in the mirror" is an application that provides a mirror service for the user; by opening the camera through this application, the user can use the smart TV as a mirror. "Youxuemao" is an application that provides learning functions. When implementing the "chat while watching" function, the user watches a video program while running the "Hijian" application for a video call. The "Fitness" function can synchronously display, on the display of the display device, the fitness guidance video and the image captured by the camera of the user following the fitness guidance video, so that the user can check in real time whether their movements are standard.
Since the user may not stay fixed in one position while using the display device for "video chat", "chat while watching" or "fitness", the user may also walk around while using these functions. However, in existing display devices the camera is fixedly installed on the display device, the center line of the camera's viewing angle is perpendicular to the display, and the camera's viewing angle is limited, usually between 60° and 75°; that is, the shooting area of the camera is the region formed by spreading 60° to 75° symmetrically to the left and right of the center line of the camera's viewing angle.
If the user walks out of the shooting area of the camera, the camera cannot capture an image containing the user's portrait, so no portrait can be displayed on the display. In a video chat call scene, the peer user who is in a video call with the local user will not be able to see the local user; in a fitness scene, the display will not be able to show the image of the user performing the fitness movements, so the user cannot see their own movements and cannot judge whether they are standard, which affects the user experience.
FIG. 6 exemplarily shows a structural block diagram of a display device according to some embodiments. In order that the camera can still capture the user's image after the user walks out of the camera's shooting area, referring to FIG. 6, an embodiment of the present application provides a display device including a camera 232, a sound collector 231 and a controller 250. The camera is used to capture portraits; the camera is no longer fixedly installed, but is installed on the display device in a rotatable manner. Specifically, the camera 232 is rotatably installed on the top of the display, and the camera 232 can rotate along the top of the display.
FIG. 7 exemplarily shows a schematic diagram of the preset angle range within which the camera rotates according to some embodiments; FIG. 8 exemplarily shows a scene diagram of camera rotation within the preset angle range according to some embodiments. Referring to FIG. 7 and FIG. 8, the camera 232 can rotate within a preset angle range, in the horizontal direction. In some embodiments, the preset angle range is 0° to 120°, that is, facing the display, the user's left side is 0° and the user's right side is 120°. Taking the state in which the center line of the viewing angle of the camera 232 is perpendicular to the display as the initial state, the camera can rotate 60° to the left and 60° to the right from the initial state; the position where the center line of the camera's viewing angle is perpendicular to the display is the camera's 60° position.
The display device provided by the embodiments of the present application triggers the rotation of the camera using the sound source information, and can automatically identify the user's real-time position and adjust the shooting angle of the camera, so that the camera can always capture images containing the portrait. For this purpose, in some embodiments, the display device collects the character sound source information by providing a sound collector 231.
To ensure the accuracy of sound source collection, multiple groups of sound collectors may be provided in the display device. In some embodiments, four groups of sound collectors 231 are provided and arranged in a linear positional relationship. In some embodiments, the sound collectors may be microphones, and the four groups of microphones are arranged linearly to form a microphone array. During sound collection, the four groups of sound collectors 231 receive the sound information generated when the same user interacts with the display device through voice.
FIG. 9 exemplarily shows a schematic diagram of the sound source angle range according to some embodiments. When the user speaks, the generated sound is received over 360°; therefore, when the user is in front of the display device, the sound source angle range generated by the user is 0° to 180°, and similarly, when the user is behind the display device, the sound source angle range generated by the user is also 0° to 180°. Referring to FIG. 9, taking the case of the user facing the display device as an example, the user being on the left side of the sound collector corresponds to horizontal 0°, and the user being on the right side of the sound collector corresponds to horizontal 180°.
Referring again to FIG. 7 and FIG. 9, the 30° position of the sound source corresponds to the 0° position of the camera, the 90° position of the sound source corresponds to the 60° position of the camera, and the 150° position of the sound source corresponds to the 120° position of the camera.
The controller 250 is connected to the camera 232 and the sound collector 231 respectively. The controller is used to receive the character sound source information collected by the sound collector, identify the character sound source information, determine the azimuth angle of the character's position, and then determine the angle by which the camera needs to rotate. The controller adjusts the shooting angle of the camera according to the determined angle, so that the shooting area of the camera faces the position of the character when speaking, realizing adjustment of the shooting angle of the camera according to the character's position so as to capture an image containing the character.
图10中示例性示出了根据一些实施例的摄像头拍摄角度的调整方法的流程图。本申请实施例提供的一种显示设备,在根据人物的位置调整摄像头的拍摄角度时,控制器被配置为执行图10所示的摄像头拍摄角度的调整方法,包括:
S1、获取声音采集器采集的人物声源信息和摄像头的当前拍摄角度。
在一些实施例中,显示设备中的控制器在驱动摄像头转动,以调整摄像头的拍摄角度时,需根据人物在所处位置与显示设备进行语音交互时产生的人物声源信息来确定,人物声源信息是指人物通过语音与显示设备交互时产生的声音信息。
人物声源信息可确定出人物在语音时所处位置的方位角度,而为准确确定摄像头需要进行调整的角度,需要先获取摄像头的当前状态,即当前拍摄角度。摄像头的当前拍摄角度需要在摄像头处于停止状态时才可被获取,以保证摄像头的当前拍摄角度的准确性,进而保证确定摄像头需要进行调整角度的准确性。
因此,控制器在执行获取摄像头的当前拍摄角度之前,被进一步配置为执行下述步骤:
步骤11、查询摄像头的当前运行状态。
步骤12、如果摄像头的当前运行状态为处于旋转状态,则等待摄像头旋转完毕。
步骤13、如果摄像头的当前运行状态为处于未旋转状态,则获取摄像头的当前拍摄角度。
控制器内配置有马达控制服务,马达控制服务用于驱动摄像头转动、获取摄像头的运行状态和摄像头朝向角度。
马达控制服务实时监控摄像头的运行状态,控制器通过调用马达控制服务查询摄像头的当前运行状态,摄像头的当前运行状态可表征当前摄像头的朝向角度以及摄像头是否处于旋转状态。
如果摄像头正处于旋转状态,此时不能获取摄像头的当前拍摄角度,否则无法确定准确的数值。因此,在摄像头处于旋转状态时,需先等待摄像头执行前一指令完成转动后,在停止状态下,再执行获取摄像头的当前拍摄角度的步骤。
如果摄像头正处于未旋转状态,即摄像头处于停止状态,则可执行获取摄像头的当前拍摄角度的步骤。
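下面给出一个示意性的 Python 片段，演示上述“先查询运行状态、旋转中则等待、停止后再读取当前拍摄角度”的流程；其中 MotorControlService 及其 is_rotating()、get_angle() 接口均为本示例假设的接口命名，并非对马达控制服务具体实现的限定。

```python
import time

class MotorControlService:
    """假设的马达控制服务接口：查询摄像头是否在旋转、读取当前朝向角度。"""
    def is_rotating(self) -> bool: ...
    def get_angle(self) -> float: ...

def get_current_shooting_angle(motor: MotorControlService,
                               poll_interval: float = 0.05,
                               timeout: float = 5.0) -> float:
    """等待摄像头停止旋转后，再读取当前拍摄角度（0°~120°）。"""
    deadline = time.monotonic() + timeout
    while motor.is_rotating():              # 步骤11/12：处于旋转状态则等待旋转完毕
        if time.monotonic() > deadline:
            raise TimeoutError("摄像头长时间处于旋转状态")
        time.sleep(poll_interval)
    return motor.get_angle()                # 步骤13：停止状态下读取当前拍摄角度
```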
S2、对人物声源信息进行声源识别,确定声源角度信息,声源角度信息用于表征人物在语音时所处位置的方位角度。
在获取到人物与显示设备交互产生的人物声源信息后,控制器需对人物声源信息进行声源识别,以判断出人物在语音时的所处位置,具体为方位角度,即人物是位于声音采集器的左侧、右侧还是正对声音采集器的位置,进而根据人物的所处位置调整摄像头的拍摄角度。
由于人物在与显示设备交互时,例如在视频通话场景中,人物语音可能是在与对端用户进行对话,而自身仍位于摄像头的拍摄区域内,若此时控制器执行调整摄像头的拍摄角度的步骤,则会出现无效操作。
因此,为准确的根据人物声源信息确定是否需要对摄像头的拍摄角度进行调整,需要先对人物产生的人物声源信息进行分析,判断人物声源信息是否为触发摄像头调整的信息。
在一些实施例中，可预先在控制器内存储用于触发摄像头拍摄角度调整的唤醒文本，例如，定制“海信小聚”作为声源识别的唤醒文本。人物通过语音“海信小聚”作为识别声源，以触发调整摄像头拍摄角度的过程。唤醒文本也可定制为其他词语，本实施例中不做具体限定。
图11中示例性示出了根据一些实施例的唤醒文本的对比方法的流程图。具体地,参见图11,控制器在执行对人物声源信息进行声源识别,确定声源角度信息之前,被进一步配置为执行下述步骤:
S021、对人物声源信息进行文本提取,得到语音交互文本。
S022、对比语音交互文本和预置唤醒文本,预置唤醒文本是指用于触发声源识别过程的文本。
S023、如果语音交互文本与预置唤醒文本对比一致,则执行对人物声源信息进行声源识别的步骤。
在一些实施例中,控制器在获取到人物声源信息后,先进行文本提取,提取出人物通过语音与显示设备交互时的语音交互文本。将提取出的语音交互文本与预置唤醒文本进行对比,如果对比不一致,例如,人物语音并非“海信小聚”,而是其他交互内容,此时,说明当前人物的语音并非触发调整摄像头拍摄角度的语音,控制器无需执行调整摄像头拍摄角度的相关步骤。
如果对比一致,则说明当前人物的语音为触发调整摄像头拍摄角度的语音,例如,人物语音为预先设置的“海信小聚”,此时,控制器可继续执行后续调整摄像头拍摄角度的步骤。
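以下为一个示意性的 Python 片段，演示步骤S021~S023中“提取语音交互文本并与预置唤醒文本比对”的判断逻辑；其中 extract_text() 表示任一语音识别接口，“海信小聚”为示例唤醒词，均属于本示例的假设，并非对具体实现的限定。

```python
PRESET_WAKE_TEXT = "海信小聚"   # 预置唤醒文本，可按需定制为其他词语

def should_trigger_source_localization(audio_pcm: bytes, extract_text) -> bool:
    """对人物声源信息做文本提取，与预置唤醒文本一致时才触发声源识别。"""
    interaction_text = extract_text(audio_pcm)            # S021：文本提取，得到语音交互文本
    if interaction_text.strip() == PRESET_WAKE_TEXT:      # S022：与预置唤醒文本对比
        return True                                       # S023：对比一致，执行声源识别
    return False                                          # 非唤醒语音，无需调整摄像头
```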
在判断出人物声源信息为唤醒语音,即调整摄像头拍摄角度的触发语音时,控制器需执行后续声源识别的过程。
由于显示设备中设置多组声音采集器,多组声音采集器可采集到同一人物语音时的多组人物声源信息,那么控制器在获取声音采集器采集的人物声源信息时,可获取到每个声音采集器采集的人物在语音时产生的人物声源信息,即控制器会获取到多组人物声源信息。
图12中示例性示出了根据一些实施例的对人物声源信息进行声源识别的方法流程图。多组声音采集器采集同一唤醒文本时，由于每个声音采集器与人物之间的距离并不相同，因此，可对每个人物声源信息进行识别，以确定人物语音时的方位角度，即声源角度信息。具体地，参见图12，控制器在执行对人物声源信息进行声源识别，确定声源角度信息时，被进一步配置为执行下述步骤：
S21、对每个人物声源信息分别进行声源识别,计算多组声音采集器在采集对应的人物声源信息时产生的语音时间差。
S22、基于语音时间差,计算人物在语音时所处位置的声源角度信息。
每个声音采集器的频率响应一致,其采样时钟也同步,但由于每个声音采集器与人物之间的距离并不相同,因此,每个声音采集器能够采集到语音的时刻也并非相同,多组声音采集器之间会存在采集时间差。
在一些实施例中,可以通过声音采集器阵列计算声源距离阵列的角度和距离,实现对人物语音时所处位置的声源进行跟踪。基于TDOA(Time Difference Of Arrival,到达时间差)的声源定位技术,估计信号到达两两麦克风之间的时间差,从而得到声源位置坐标的方程组,然后求解方程组即可得到声源的精确方位坐标,即声源角度信 息。
在一些实施例中,在步骤S21中,控制器在执行对每个所述人物声源信息分别进行声源识别,计算多组所述声音采集器在采集对应的人物声源信息时产生的语音时间差,被进一步配置为执行下述步骤:
步骤211、在人物声源信息中提取环境噪声、人物语音时的声源信号和人物的语音传播至每一声音采集器的传播时间。
步骤212、根据环境噪声、声源信号和传播时间,确定每个声音采集器的接收信号。
步骤213、利用互相关时延估计算法,对每个声音采集器的接收信号进行处理,得到每两个声音采集器在采集对应的人物声源信息时产生的语音时间差。
在计算每两个声音采集器的语音时间差时,可利用声音采集器阵列实现声源到达方向估计(direction-of-arrival(DOA)estimation),由DOA估计算法计算声音到达不同声音采集器阵列间的时间差。
在声源定位系统中,声音采集器阵列的每个阵元接收到的目标信号都来自于同一个声源。因此,各通道信号之间具有较强的相关性,通过计算每两路信号之间的相关函数,就可以确定每两个声音采集器观测信号之间的时延,即语音时间差。
人物在语音时产生的人物声源信息中包括环境噪声和人物语音时的声源信号,还可在人物声源信息中通过识别提取出人物的语音传播至每一声音采集器的传播时间,计算每个声音采集器的接收信号。
$$x_i(t) = \alpha_i s(t-\tau_i) + n_i(t)$$

式中，$x_i(t)$ 为第 $i$ 个声音采集器的接收信号，$s(t)$ 为人物语音时的声源信号，$\tau_i$ 为人物的语音传播至第 $i$ 个声音采集器的传播时间，$n_i(t)$ 为环境噪声，$\alpha_i$ 为修正系数。
利用互相关时延估计算法对每个声音采集器的接收信号进行处理，进行时延估计。第 $i$ 个与第 $i+1$ 个声音采集器接收信号的互相关函数表示为：

$$R_{i,i+1}(\tau) = E\left[x_i(t)\,x_{i+1}(t-\tau)\right]$$

式中，$\tau$ 为相关运算的时延变量；记 $\tau_{i,i+1}$ 为第 $i$ 个声音采集器与第 $i+1$ 个声音采集器之间的时延，即语音时间差。

代入每个声音采集器的接收信号模型，得到：

$$R_{i,i+1}(\tau) = E\Big[\big(\alpha_i s(t-\tau_i)+n_i(t)\big)\big(\alpha_{i+1} s(t-\tau_{i+1}-\tau)+n_{i+1}(t-\tau)\big)\Big]$$

由于 $s(t)$ 与 $n_i(t)$ 互不相关，因此可简化上式为：

$$R_{i,i+1}(\tau) = \alpha_i\alpha_{i+1} R_{s}(\tau-\tau_{i,i+1}) + R_{n_i n_{i+1}}(\tau)$$

其中，

$$\tau_{i,i+1} = \tau_i - \tau_{i+1}$$

$n_i$ 与 $n_{i+1}$ 为互不相关的高斯白噪声，则上式进一步简化为：

$$R_{i,i+1}(\tau) = \alpha_i\alpha_{i+1} R_{s}(\tau-\tau_{i,i+1})$$

由互相关时延估计算法的性质可知，当 $\tau=\tau_{i,i+1}$ 时，$R_{i,i+1}(\tau)$ 取最大值，该峰值对应的 $\tau$ 即为两个声音采集器之间的时延，即语音时间差。
在声音采集器阵列信号处理实际模型中，由于存在混响和噪声影响，导致 $R_{i,i+1}(\tau)$ 的峰值不明显，降低了时延估计的精度。为了锐化 $R_{i,i+1}(\tau)$ 的峰值，可以根据信号和噪声的先验知识，在频域内对互功率谱进行加权，从而抑制噪声和混响干扰，最后进行傅里叶逆变换，得到广义互相关函数：

$$\hat{R}_{i,i+1}(\tau) = \int_{-\infty}^{+\infty} \psi(\omega)\, X_i(\omega) X_{i+1}^{*}(\omega)\, e^{j\omega\tau}\, d\omega$$

其中，$\psi(\omega)$ 表示频域加权函数，$X_i(\omega)$ 为 $x_i(t)$ 的傅里叶变换，$X_{i+1}^{*}(\omega)$ 为 $x_{i+1}(t)$ 傅里叶变换的共轭。

最后采用PHAT加权，即取 $\psi(\omega)=1/\left|X_i(\omega)X_{i+1}^{*}(\omega)\right|$，使得信号间的互功率谱更加平滑，得到最终的每两个声音采集器在采集对应的人物声源信息时产生的语音时间差：

$$\hat{\tau}_{i,i+1} = \arg\max_{\tau}\ \hat{R}_{i,i+1}(\tau)$$

经过PHAT加权的互功率谱近似于单位冲激响应的表达式，突出了时延的峰值，能够有效抑制混响噪声，提高时延（语音时间差）估计的精度和准确度。
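下面给出一个基于上述GCC-PHAT思路的时延估计示意代码（Python/NumPy），对应“频域加权互功率谱 + 傅里叶逆变换求峰值”的过程；函数名、采样率等均为本示例的假设取值，仅用于说明计算步骤，并非对实现方式的限定。

```python
import numpy as np

def gcc_phat_delay(x_i: np.ndarray, x_j: np.ndarray, fs: int) -> float:
    """估计两路声音采集器信号之间的语音时间差（秒），采用PHAT加权。"""
    n = x_i.shape[0] + x_j.shape[0]
    X_i = np.fft.rfft(x_i, n=n)
    X_j = np.fft.rfft(x_j, n=n)
    cross_spectrum = X_i * np.conj(X_j)                  # 互功率谱
    cross_spectrum /= np.abs(cross_spectrum) + 1e-12     # PHAT加权：对幅度归一化
    r = np.fft.irfft(cross_spectrum, n=n)                # 广义互相关函数
    max_shift = n // 2
    r = np.concatenate((r[-max_shift:], r[:max_shift + 1]))
    delay_samples = np.argmax(np.abs(r)) - max_shift     # 峰值位置即时延（采样点数）
    return delay_samples / fs

# 使用示例（假设 fs=16000 的两路麦克风采样数据 mic1、mic2）：
# tau_12 = gcc_phat_delay(mic1, mic2, fs=16000)
```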
在一些实施例中,在步骤S22中,控制器在执行基于语音时间差,计算人物在语音时所处位置的声源角度信息,被进一步配置为执行下述步骤:
步骤221、获取当前环境状态下的声速、每个声音采集器的坐标和声音采集器的设置个数。
步骤222、根据声音采集器的设置个数,确定声音采集器的组合对数量,组合对数量是指声音采集器两两组合得到的组合数。
步骤223、根据每两个声音采集器对应的语音时间差、声速和每个声音采集器的坐标,建立向量关系方程组,向量关系方程组的数量与组合对数量相同。
步骤224、求解向量关系方程组,得到人物语音时所处位置的声源单位平面波传播向量的向量值。
步骤225、根据向量值,计算人物在语音时所处位置的声源角度信息。
在根据前述实施例提供的方法计算出每两个声音采集器的语音时间差后,可根据每个语音时间差计算人物在语音时所处位置的声源角度信息。
在计算声源角度信息时,需要建立多组向量关系方程组,为保证计算结果的准确性,可设定方程组的数量与声音采集器两两组合得到的组合数相同。为此,获取声音采集器的设置个数N,则所有声音采集器之间两两组合共有N(N-1)/2对组合对。
在建立向量关系方程组时，获取当前环境状态下的声速 $c$ 和每个声音采集器的坐标，记第 $k$ 个声音采集器的坐标为 $(x_k, y_k, z_k)$；同时，设定人物语音时所处位置的声源单位平面波传播向量为 $\boldsymbol{u}=(u, v, w)$，求解出人物语音时所处位置的声源单位平面波传播向量的向量值即可确定声源角度信息。

根据每两个声音采集器对应的语音时间差 $\hat{\tau}_{ij}$、声速 $c$、每个声音采集器的坐标 $(x_k, y_k, z_k)$ 和人物语音时所处位置的声源单位平面波传播向量 $(u, v, w)$，建立 $N(N-1)/2$ 个向量关系方程组：

$$(x_i - x_j)\,u + (y_i - y_j)\,v + (z_i - z_j)\,w = c\,\hat{\tau}_{ij}$$

该式代表第 $i$ 个声音采集器与第 $j$ 个声音采集器之间建立的向量关系方程组。
以 $N=3$ 为例，可以建立以下方程组：

$$(x_1 - x_2)\,u + (y_1 - y_2)\,v + (z_1 - z_2)\,w = c\,\hat{\tau}_{12}$$

（第1个声音采集器与第2个声音采集器之间建立的向量关系方程组）；

$$(x_1 - x_3)\,u + (y_1 - y_3)\,v + (z_1 - z_3)\,w = c\,\hat{\tau}_{13}$$

（第1个声音采集器与第3个声音采集器之间建立的向量关系方程组）；

$$(x_3 - x_2)\,u + (y_3 - y_2)\,v + (z_3 - z_2)\,w = c\,\hat{\tau}_{32}$$

（第3个声音采集器与第2个声音采集器之间建立的向量关系方程组）。
将上述三个向量关系方程组写成矩阵形式：

$$\begin{bmatrix} x_1-x_2 & y_1-y_2 & z_1-z_2 \\ x_1-x_3 & y_1-y_3 & z_1-z_3 \\ x_3-x_2 & y_3-y_2 & z_3-z_2 \end{bmatrix} \begin{bmatrix} u \\ v \\ w \end{bmatrix} = c \begin{bmatrix} \hat{\tau}_{12} \\ \hat{\tau}_{13} \\ \hat{\tau}_{32} \end{bmatrix}$$

根据上述矩阵求解出 $\boldsymbol{u}=(u, v, w)$，再利用正余弦关系，即可得到水平方位角：

$$\theta = \arctan\left(\frac{v}{u}\right)$$

即人物在语音时所处位置的方位角度的声源角度信息。
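下面是上述“由各组语音时间差建立向量关系方程组并求解声源方位角”的一个示意实现（Python/NumPy），按最小二乘方式求解 (u, v, w)；声速取值、函数与变量命名均为本示例的假设，仅作流程说明。

```python
import numpy as np
from itertools import combinations

SPEED_OF_SOUND = 340.0  # 当前环境状态下的声速（m/s），示例取值

def estimate_source_angle(mic_positions: np.ndarray, delays: dict) -> float:
    """mic_positions: N x 3 的声音采集器坐标；delays[(i, j)]: 第i、j个采集器的语音时间差（秒）。
    返回声源方位角（度）。"""
    rows, rhs = [], []
    for i, j in combinations(range(len(mic_positions)), 2):   # 共 N(N-1)/2 组组合对
        rows.append(mic_positions[i] - mic_positions[j])       # (x_i-x_j, y_i-y_j, z_i-z_j)
        rhs.append(SPEED_OF_SOUND * delays[(i, j)])            # c * tau_ij
    A, b = np.asarray(rows), np.asarray(rhs)
    u, v, w = np.linalg.lstsq(A, b, rcond=None)[0]             # 最小二乘求解单位平面波传播向量
    azimuth = np.degrees(np.arctan2(v, u))                     # 正余弦关系换算为水平方位角
    return azimuth % 360

# 使用示例：mics 为 N x 3 的麦克风坐标数组，delays 由前述 GCC-PHAT 估计得到
# angle = estimate_source_angle(mics, delays)
```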
S3、基于摄像头的当前拍摄角度和声源角度信息,确定摄像头的目标转动方向和目标转动角度。
控制器通过对人物声源信息进行声源识别,以确定出用于表征人物在语音时所处位置的方位角度的声源角度信息。声源角度信息可标识人物当前的所处位置,摄像头的当前拍摄角度可标识摄像头当前的所处位置,根据两个位置之间的相差角度即可确定摄像头需要转动的目标转动角度,以及摄像头在转动时的目标转动方向。
图13中示例性示出了根据一些实施例的确定摄像头的目标转动方向和目标转动角度的方法流程图。具体地,参见图13,控制器在执行基于摄像头的当前拍摄角度和声源角度信息,确定摄像头的目标转动方向和目标转动角度,被进一步配置为执行下述步骤:
S31、将声源角度信息转换为摄像头的坐标角度。
由于声源角度信息表征人物的所处方位角度,因此,为便于准确地根据声源角度信息和摄像头的当前拍摄角度计算出摄像头需要调整的方位角度,可将人物的声源角度信息转换为摄像头的坐标角度,即用摄像头的坐标角度来代替人物的声源角度信息。
具体地,控制器在执行将声源角度信息转换为摄像头的坐标角度,被进一步配置 为执行下述步骤:
步骤311、获取人物在语音时的声源角度范围和摄像头转动时的预设角度范围。
步骤312、计算声源角度范围与预设角度范围之间的角度差值,将角度差值的半值作为转换角度。
步骤313、计算声源角度信息对应的角度与转换角度的角度差,将角度差作为摄像头的坐标角度。
由于声源角度范围和摄像头的预设角度范围并不相同,预设角度范围为0°~120°,声源角度范围为0°~180°,无法直接由摄像头的坐标角度代替声源角度信息。因此,先计算声源角度范围与预设角度范围之间的角度差值,再计算角度差值的半值,将半值作为由声源角度信息转换为摄像头的坐标角度时的转换角度。
声源角度范围与预设角度范围之间的角度差值为60°,角度差值的半值为30°,将30°作为转换角度。最后,计算声源角度信息对应的角度与转换角度的角度差,即为将声源角度信息转换成的摄像头的坐标角度。
例如,如果人物位于声音采集器的左侧,控制器通过获取多个声音采集器采集的人物声源信息确定出的声源角度信息对应的角度为50°,而转换角度为30°,因此,计算角度差为20°,即实现将声源角度信息对应的50°替换为摄像头的坐标角度20°来表示。
如果人物位于声音采集器的右侧,控制器通过获取多个声音采集器采集的人物声源信息确定出的声源角度信息对应的角度为130°,而转换角度为30°,因此,计算角度差为100°,即实现将声源角度信息对应的130°替换为摄像头的坐标角度100°来表示。
S32、计算摄像头的坐标角度和摄像头的当前拍摄角度的角度差值,将角度差值作为摄像头的目标转动角度。
摄像头的坐标角度用于标识人物所处位置在摄像头坐标内的角度,因此,根据摄像头的当前拍摄角度与摄像头的坐标角度的角度差值,即可确定出摄像头需要转动的目标转动角度。
例如,如果摄像头的当前拍摄角度为100°,摄像头的坐标角度为20°,说明摄像头当前的拍摄区域并未对准人物所处位置,二者相差80°,因此,需将摄像头转动80°后,摄像头的拍摄区域才可对准人物所处位置,即摄像头的目标转动角度为80°。
S33、根据角度差值,确定摄像头的目标转动方向。
由于以面对显示设备的方向,将左侧作为摄像头0°位置,右侧作为摄像头120°位置,因此,在根据摄像头的坐标角度和摄像头的当前拍摄角度确定出角度差值后,如果当前拍摄角度大于坐标角度,则说明摄像头的拍摄角度位于人物所处位置的右侧,此时角度差值为负值;如果当前拍摄角度小于坐标角度,则说明摄像头的拍摄角度位于人物所处位置的左侧,此时角度差值为正值。
在一些实施例中,可根据角度差值的正负来确定摄像头的目标转动方向。如果角度差值为正值,说明摄像头的拍摄角度位于人物所处位置的左侧,此时,为使摄像头拍摄到人物的图像,需向右调整摄像头的拍摄角度,则确定摄像头的目标转动方向为向右转动。
如果角度差值为负值,说明摄像头的拍摄角度位于人物所处位置的右侧,此时, 为使摄像头拍摄到人物的图像,需向左调整摄像头的拍摄角度,则确定摄像头的目标转动方向为向左转动。
例如,图14中示例性示出了根据一些实施例的调整摄像头拍摄角度的一种场景图。参见图14,如果人物对应的声源角度信息对应的角度为50°,则转换成的摄像头的坐标角度为20°;摄像头的当前拍摄角度为100°,即摄像头的视角中心线位于人物所处位置的右侧,计算得到角度差值为-80°。可见角度差值为负值,此时,需调整摄像头向左转动80°。
图15a中示例性示出了根据一些实施例的调整摄像头拍摄角度的另一种场景图。参见图15a,如果人物对应的声源角度信息对应的角度为120°,则转换成的摄像头的坐标角度为90°;摄像头的当前拍摄角度为40°,即摄像头的视角中心线位于人物所处位置的左侧,计算得到角度差值为50°。可见角度差值为正值,此时,需调整摄像头向右转动50°。
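结合步骤S31~S33以及图14、图15a中的两个场景，下面给出一个将声源角度转换为摄像头坐标角度、并计算目标转动方向与目标转动角度的示意函数（Python）；其中30°的转换角度由声源角度范围（0°~180°）与预设角度范围（0°~120°）差值的半值得到，函数命名为本示例假设。

```python
SOURCE_RANGE = 180.0     # 声源角度范围 0°~180°
CAMERA_RANGE = 120.0     # 摄像头预设角度范围 0°~120°
CONVERT_ANGLE = (SOURCE_RANGE - CAMERA_RANGE) / 2   # 转换角度 = 30°

def plan_camera_rotation(source_angle: float, current_angle: float):
    """返回 (目标转动方向, 目标转动角度)。source_angle 为声源角度信息，current_angle 为当前拍摄角度。"""
    coord_angle = source_angle - CONVERT_ANGLE        # S31：声源角度 -> 摄像头坐标角度
    diff = coord_angle - current_angle                # S32：角度差值即目标转动角度
    direction = "right" if diff > 0 else "left"       # S33：正值向右转动，负值向左转动
    return direction, abs(diff)

# 例（对应图14）：声源角度 50°，当前拍摄角度 100° -> ("left", 80.0)
# 例（对应图15a）：声源角度 120°，当前拍摄角度 40° -> ("right", 50.0)
```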
S4、按照目标转动方向和目标转动角度,调整摄像头的拍摄角度,以使摄像头的拍摄区域正对人物语音时的所处位置。
控制器在确定出摄像头需要调整拍摄角度时所需的目标转动方向和目标转动角度后,即可按照目标转动方向和目标转动角度调整摄像头的拍摄角度,将摄像头的拍摄区域正对人物所处位置,使得摄像头可拍摄到包括人物的图像,实现根据人物的所处位置调整摄像头的拍摄角度。
图15b中示例性示出了根据一些实施例的人物语音时所处位置的场景图。由于摄像头的预设角度范围与人物语音时的声源角度范围不同,若体现在角度示意图中,参见图15b,预设角度范围的0°位置与声源角度范围的0°位置之间存在30°的角度差值,同样的,预设角度范围的120°位置与声源角度范围的180°位置之间也存在30°的角度差值。
那么,如果人物在与显示设备交互时,其所处的位置恰好位于30°的夹角区域范围内,如图15b中所示的人物(a)所处位置或人物(b)所处位置。此时,控制器在执行前述步骤S31中将声源角度信息转换为摄像头的坐标角度时,将会出现由人物的声源角度信息转换得到的摄像头的坐标角度为负值的情况,或者大于摄像头的预设角度范围最大值的情况,即转换得到的摄像头的坐标角度并未位于摄像头的预设角度范围内。
例如,若人物(a)所处位置对应的声源角度信息为20°,而转换角度为30°,则计算得到的摄像头的坐标角度为-10°。若人物(b)所处位置对应的声源角度信息为170°,而转换角度为30°,则计算得到的摄像头的坐标角度为140°。可见,根据人物(a)所处位置和人物(b)所处位置分别转换得到的摄像头的坐标角度均超出摄像头的预设角度范围。
如果摄像头的坐标角度均超出摄像头的预设角度范围,说明摄像头无法转动至摄像头的坐标角度(人物语音所处位置)对应的位置。而由于摄像头的可视角度范围位于60°~75°之间,说明在将摄像头转动到0°位置或者120°位置,摄像头的可视角度范围可覆盖预设角度范围的0°位置与声源角度范围的0°位置之间存在30°的角度差,以及,覆盖预设角度范围的120°位置与声源角度范围的180°位置之间存在30°的角度差。
因此,如果人物的所处位置位于预设角度范围的0°位置与声源角度范围的0°位置之间存在30°的角度差范围内,或者,位于预设角度范围的120°位置与声源角度范围的180°位置之间存在30°的角度差范围内,则为了能够拍摄到包含人物的图像,按照摄像头的预设角度范围的最小值或最大值对应的位置,调整摄像头的拍摄角度。
在一些实施例中,控制器被进一步配置为执行下述步骤:在人物的声源角度信息转换为摄像头的坐标角度超出摄像头的预设角度范围时,根据摄像头的当前拍摄角度与预设角度范围的最小值或最大值的角度差值,确定摄像头的目标转动方向和目标转动角度。
例如,如果人物(a)位于预设角度范围的0°位置与声源角度范围的0°位置之间存在30°的角度差范围内,即人物(a)的声源角度信息对应的声源角度为20°,摄像头的当前拍摄角度为50°时。根据摄像头的预设角度范围的最小值0°和当前拍摄角度50°计算角度差值,角度差值为-50°,则确定摄像头的目标转动方向为向左转动,目标转动角度为50°。此时,摄像头的视角中心线(a)与摄像头的0°线重合。
如果人物(b)位于预设角度范围的120°位置与声源角度范围的180°位置之间存在30°的角度差范围内,即人物(b)的声源角度信息对应的声源角度为170°,摄像头的当前拍摄角度为50°时。根据摄像头的预设角度范围的最大值120°和当前拍摄角度50°计算角度差值,角度差值为70°,则确定摄像头的目标转动方向为向右转动,目标转动角度为70°。此时,摄像头的视角中心线(b)与摄像头的120°线重合。
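针对上述坐标角度超出摄像头预设角度范围（小于0°或大于120°）的情形，可以在前述转换逻辑的基础上按预设角度范围的最小值或最大值取值，下面给出一个示意片段（Python），其中的取值与人物(a)、人物(b)两个示例一致，仅作示意。

```python
def plan_camera_rotation_clamped(source_angle: float, current_angle: float):
    """坐标角度超出摄像头预设角度范围（0°~120°）时，按范围的最小值/最大值确定目标转动。"""
    coord_angle = source_angle - 30.0                    # 转换角度 30°
    coord_angle = min(max(coord_angle, 0.0), 120.0)      # 超出范围时取 0° 或 120°
    diff = coord_angle - current_angle
    return ("right" if diff > 0 else "left"), abs(diff)

# 例：人物(a) 声源角度 20°、当前拍摄角度 50° -> 坐标角度取 0°，("left", 50.0)
# 例：人物(b) 声源角度 170°、当前拍摄角度 50° -> 坐标角度取 120°，("right", 70.0)
```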
因此,即使人物所处位置对应的声源角度超出摄像头在转动时的预设角度范围,本申请实施例提供的显示设备,仍可依据人物的所处位置,将摄像头转动至预设角度范围对应的最小值或最大值的位置,依据摄像头的可视角度覆盖范围,拍摄到包含人物的图像。
可见,本申请实施例提供的一种显示设备,其中的摄像头可在预设角度范围内转动,控制器被配置为获取声音采集器采集的人物声源信息并进行声源识别,确定用于标识人物所在位置的方位角度的声源角度信息;基于摄像头的当前拍摄角度和声源角度信息,确定摄像头的目标转动方向和目标转动角度;按照目标转动方向和目标转动角度,调整摄像头的拍摄角度,以使摄像头的拍摄区域正对人物语音时的所处位置。可见,本申请提供的显示设备,可实现利用人物声源信息触发摄像头的转动,能够自动识别用户的实时所处位置并调整摄像头的拍摄角度,使得摄像头始终能够拍摄到包含人像的图像。
图10中示例性示出了根据一些实施例的摄像头拍摄角度的调整方法的流程图。参见图10,本申请实施例提供的一种摄像头拍摄角度的调整方法,由前述实施例提供的显示设备中的控制器执行,该方法包括:
S1、获取所述声音采集器采集的人物声源信息和所述摄像头的当前拍摄角度,所述人物声源信息是指人物通过语音与显示设备交互时产生的声音信息;
S2、对所述人物声源信息进行声源识别,确定声源角度信息,所述声源角度信息用于表征人物在语音时所处位置的方位角度;
S3、基于所述摄像头的当前拍摄角度和声源角度信息,确定摄像头的目标转动方向和目标转动角度;
S4、按照所述目标转动方向和目标转动角度,调整所述摄像头的拍摄角度,以使 摄像头的拍摄区域正对人物语音时的所处位置。
在本申请一些实施例中,所述对人物声源信息进行声源识别,确定声源角度信息之前,还包括:对所述人物声源信息进行文本提取,得到语音交互文本;对比所述语音交互文本和预置唤醒文本,所述预置唤醒文本是指用于触发声源识别过程的文本;如果所述语音交互文本与所述预置唤醒文本对比一致,则执行对人物声源信息进行声源识别的步骤。
在本申请一些实施例中,包括多组声音采集器,所述控制器获取所述声音采集器采集的人物声源信息具体为:获取每个所述声音采集器采集的所述人物在语音时产生的人物声源信息;所述对人物声源信息进行声源识别,确定声源角度信息,包括:对每个所述人物声源信息分别进行声源识别,计算多组所述声音采集器在采集对应的人物声源信息时产生的语音时间差;基于所述语音时间差,计算所述人物在语音时所处位置的声源角度信息。
在本申请一些实施例中,所述对每个所述人物声源信息分别进行声源识别,计算多组所述声音采集器在采集对应的人物声源信息时产生的语音时间差,包括:在所述人物声源信息中提取环境噪声、人物语音时的声源信号和人物的语音传播至每一声音采集器的传播时间;根据所述环境噪声、声源信号和传播时间,确定每个声音采集器的接收信号;利用互相关时延估计算法,对每个声音采集器的接收信号进行处理,得到每两个声音采集器在采集对应的人物声源信息时产生的语音时间差。
在本申请一些实施例中,所述基于语音时间差,计算所述人物在语音时所处位置的声源角度信息,包括:获取当前环境状态下的声速、每个声音采集器的坐标和所述声音采集器的设置个数;根据所述声音采集器的设置个数,确定声音采集器的组合对数量,所述组合对数量是指声音采集器两两组合得到的组合数;根据每两个声音采集器对应的语音时间差、声速和每个声音采集器的坐标,建立向量关系方程组,所述向量关系方程组的数量与组合对数量相同;求解所述向量关系方程组,得到人物语音时所处位置的声源单位平面波传播向量的向量值;根据所述向量值,计算所述人物在语音时所处位置的声源角度信息。
在本申请一些实施例中,所述获取摄像头的当前拍摄角度之前,包括:查询所述摄像头的当前运行状态;如果所述摄像头的当前运行状态为处于旋转状态,则等待摄像头旋转完毕;如果所述摄像头的当前运行状态为处于未旋转状态,则获取所述摄像头的当前拍摄角度。
在本申请一些实施例中,所述基于摄像头的当前拍摄角度和声源角度信息,确定摄像头的目标转动方向和目标转动角度,包括:将所述声源角度信息转换为摄像头的坐标角度;计算所述摄像头的坐标角度和摄像头的当前拍摄角度的角度差值,将所述角度差值作为所述摄像头的目标转动角度;根据所述角度差值,确定摄像头的目标转动方向。
在本申请一些实施例中,所述将声源角度信息转换为摄像头的坐标角度,包括:获取所述人物在语音时的声源角度范围和摄像头转动时的预设角度范围;计算所述声源角度范围与所述预设角度范围之间的角度差值,将所述角度差值的半值作为转换角度;计算所述声源角度信息对应的角度与所述转换角度的角度差,将所述角度差作为摄像头的坐标角度。
在本申请一些实施例中,所述根据角度差值,确定摄像头的目标转动方向,包括:如果所述角度差值为正值,则确定摄像头的目标转动方向为向右转动;如果所述角度差值为负值,则确定摄像头的目标转动方向为向左转动。
第二方面:
本申请实施例中,如图15b所示,摄像头232作为一种检测器230可以内置或外接显示设备200上,在启动运行后,摄像头232可以检测图像数据。摄像头232可以通过接口部件与控制器250连接,从而将检测的图像数据发送给控制器250进行处理。为了检测图像,摄像头232可以包括镜头组件和云台组件。其中,镜头组件可以是基于CCD(Charge Coupled Device,电荷耦合器件)或CMOS(Complementary Metal Oxide Semiconductor,互补金属氧化物半导体)的图像采集元件,以根据用户图像生成电信号的图像数据。
镜头组件设置在云台组件上，云台组件可以带动镜头组件进行转动，以便更改镜头组件的朝向。云台组件可以包括至少两个转动部件，以分别实现带动镜头组件绕竖直方向的转轴进行左右转动，以及绕水平方向的转轴进行上下转动。每个转动部件可以连接电机，以通过电机驱动其自动进行转动。
例如,如图17所示,云台组件可以包括呈竖直状态的第一转轴和呈水平状态的第二转轴,第一转轴设置在显示器275的顶部,与显示器275的顶部可转动地连接;第一转轴上还设有固定件,固定件的顶部可转动的连接有所述第二转轴,第二转轴连接镜头组件,以带动镜头组件进行转动。第一转轴和第二转轴上分别连接有电机以及传动部件。电机可以是能够支持自动控制转角的伺服电机、步进电机等。当获取控制指令后,两个电机可以分别进行旋转以驱动第一转轴和第二转轴进行转动,从而调节镜头组件的朝向。
随着镜头组件的不同朝向,镜头组件可以对位于不同位置上的用户进行视频拍摄,从而获取用户图像数据。显然,不同的朝向对应于不同区域的图像采集,当用户在相对于显示器275正前方位置偏左时,可以通过云台组件上的第一转轴带动固定件以及镜头组件向左转动,以使拍摄的图像中,用户人像位置位于画面的中心区域;而当用户躯体成像位置偏下时,可以通过云台组件中的第二转轴带动镜头组件向上转动,以抬高拍摄角度,使用户人像位置位于画面的中心区域。
为了追踪人像位置,控制器250可以通过执行人物定位追踪方法,识别用户人像在图像中所处的位置。并且在用户位置不合适时,通过控制摄像头232进行旋转,以获取合适的图像。其中,识别用户所处位置可以通过图像处理完成。例如,控制器250可以在启动摄像头232后,通过摄像头232拍摄至少一张图像,作为校对图像。并且在校对图像中进行特征分析,从而在校对图像中识别出人像区域。通过判断人像区域的位置,从而确定用户位置是否合适。
但在实际应用中,由于摄像头232的初始朝向与用户在空间中所处的位置可能具有偏移。即在部分情况下,摄像头232的拍摄范围不能覆盖用户人像,使得摄像头232无法拍摄到用户人像,或只能获取到小部分人像。这种情况下会导致在图像处理过程中无法识别出人像区域,也无法实现在用户位置不合适时摄像头232的旋转控制,即对于不在当前图像中的人物则无法进行有效调整。
因此，为了使摄像头232拍摄的校对图像中能够包括人像区域，可以在获取校对图像前先通过声音信号定位用户所在的方位，并在获得方位后，先控制摄像头232旋转朝向该方位，再采集校对图像，从而使采集的校对图像中更容易包含人像区域。为此，显示设备200上还设有声音采集器231。声音采集器231可以通过多个麦克风形成阵列，同时对用户发出的声音信号进行采集，以便通过采集的声音信号确定用户方位。即如图18a、图18b所示，在本申请的部分实施例中提供一种声像人物定位追踪方法，包括以下步骤：
获取用户输入的测试音频信号。
实际应用中,控制器250可以在启动摄像头232后自动运行所述声像人物定位追踪方法,并获取用户输入的测试音频信号。其中,摄像头232的启动可以为手动启动或自动启动。手动启动即用户通过遥控器等控制装置100在操作界面中选择摄像头232对应的图标后,完成启动。自动启动可以是用户在执行某些需要调用摄像头232的交互动作后,自动启动。例如,用户在“我的应用”界面中选择“照镜子”应用,由于该应用需要调用摄像头232,因此在启动运行该应用的同时,也启动摄像头232。
摄像头232在启动后的姿态可以是默认初始姿态,例如设置默认初始姿态为摄像头232的镜头组件朝向正前方;启动后的姿态也可以是上一次使用摄像头232时所维持的姿态,例如,在上一次使用时,将摄像头232调节至抬高45度的姿态,则在此次启动摄像头232后,摄像头232的姿态也为抬高45度的姿态。
在启动摄像头232后,控制器250可以通过声音采集器231获取用户输入的测试音频信号。由于声音采集器231中包括麦克风阵列,因此在不同位置上的麦克风可以针对同一个测试音频采集到不同的音频信号。
为了能够通过麦克风阵列获取音频信号,在启动摄像头232后,还可以自动在显示器275上显示文字提示和/或通过扬声器等音频输出装置播放语音提示,以提示用户输入测试音频,例如“请输入测试音频:嗨!小聚”。
需要说明的是，测试音频可以是用户发出的多种音频信号，包括：用户通过说话方式发出的语音、用户通过拍手等肢体动作发出的声音以及用户通过其他手持终端发出的声音。例如，用户通过手机等智能终端操控显示设备200时，在需要用户输入测试音频信号时，可以向该智能终端发送用于控制其发声的控制指令，使得该智能终端可以在接收到该控制指令后，自动播放特定声音，以便声音采集器231进行检测。
为此,在一些实施例中,控制器250可以在运行应用程序后,通过声音采集组件获取声音信号,并从声音信号中提取声纹信息。再将声纹信息与预设测试声纹进行对比,如果声纹信息与预设测试声纹相同,标记声音信号为测试音频信号;如果声纹信息与预设测试声纹不同,控制显示器275显示提示界面。
例如,当设定测试音频信号为内容“嗨!小聚”的语音时,则在麦克风检测到声音信号后,可以对声音信号中的声纹信息进行提取,并判断当前声纹信息是否与“嗨!小聚”的声纹信息相同,并在确定声纹信息相同后,执行后续步骤。
显然,这种利用智能终端进行发声的方式,可以实现发出的声音具有特定的波形或响度,使其对应的音频信号具有独特的声音特点,因此便于后续对音频信号进行比较分析,缓解环境中其他声音对分析过程的影响。
根据所述测试音频信号定位目标方位。
在获取到用户输入的测试音频信号后,控制器250可以对测试音频信号进行分析, 以确定用户所处的目标方位。由于声音采集器231中包括多个麦克风,并构成麦克风阵列,因此相对于一个声音源位置,不同麦克风与音源位置之间的距离不同,相应其采集到的音频信号之间具有一定的传播时延。控制器250可以通过分析至少两个麦克风之间的传播时延,结合两个麦克风之间的距离以及声音在空气中的传播速度,确定用户发出声音时所在的大致方位。
通过多个麦克风进行的时延检测,可以定位声音发出位置,即确定目标方位。由于检测目标方位的目的在于将摄像头232镜头组件朝向该方位,因此所述目标方位可以仅通过相对角度的方式进行表示,以使控制器250在定位目标方位后,直接能够确定相对角度数据,并以此来计算摄像头232需要调整的角度。其中,相对角度可以是指目标位置与摄像头232所在平面(即与显示器275屏幕平行的平面)垂线之间的相对角度,也可以是目标位置与摄像头232镜头轴线之间的相对角度。
例如,显示设备200外接的声音采集器231中,包括两个麦克风,分别设置在显示器275的两个侧边位置处,摄像头232则设置在显示器275的顶边中心位置处。当用户在任一位置输入语音信号后,两侧的麦克风可以分别检测到测试音频信号,则根据图19中的位置关系可知:
目标方位φ=arctan(L2/D);其中,L2为用户距离摄像头232的水平距离,D为用户距离摄像头232的垂直距离。
而根据勾股定理可以确定以下的位置关系：显示器宽度 H=L1+L2+L3；D²+(L1+L2)²=S1²；L3²+D²=S2²；其中，S1为用户位置与左侧麦克风之间的距离，S2为用户位置与右侧麦克风之间的距离，并且，S2=vt；S1=v(t+Δt)，其中 v 为声音在空气中的传播速度，t 为声音到达右侧麦克风所消耗的时间，Δt 为左侧麦克风与右侧麦克风获取到测试音频信号的时间差。
在上述各式中,显示器宽度H、传播速度v以及获取时间差Δt是已知的,因此通过上述位置关系,可以求解出L2/D,进而求解出目标方位φ。
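作为补充，下面给出一个在远场近似下由双麦克风时间差估计目标方位的示意计算（Python）；它用 arcsin(v·Δt/H) 近似声源相对屏幕垂线的偏角，是对上述几何关系的一种简化假设，并非对原文求解过程的限定。

```python
import math

def doa_far_field(delta_t: float, mic_distance: float, sound_speed: float = 340.0) -> float:
    """远场近似下的双麦克风声源方位估计。
    delta_t: 左右麦克风接收同一测试音频的时间差（秒），正值表示声音先到达右侧麦克风；
    mic_distance: 两麦克风间距（米），此处可近似取显示器宽度 H；
    返回目标方位与屏幕垂线之间的偏角（度），正值表示声源偏向右侧。"""
    sin_phi = max(-1.0, min(1.0, sound_speed * delta_t / mic_distance))
    return math.degrees(math.asin(sin_phi))

# 例：H = 1.2 m，Δt = 0.8 ms 时，偏角约为 13.1°
# print(doa_far_field(0.0008, 1.2))
```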
可见,在本实施例中,控制器250可以通过获取至少两个麦克风采集的测试音频信号再提取测试音频信号的获取时间差,从而根据获取时间差、麦克风和摄像头的安装位置数据,计算目标方位。为了获得更加准确的目标方位,还可以分别在水平方向和竖直方向上,确定位置关系,从而计算出用户位置相对于所述摄像头位置的水平偏转角度和竖直偏转角度。例如,可以增加麦克风的数量或者将麦克风设置在不同的高度上,从而确定竖直方向上的位置关系,以计算竖直偏转角度。
需要说明的是,麦克风的数量越多,越能够准确的定位用户方位,且越能够检测出不同麦克风所接收到音频信号之间的时延数值,因此在实际应用中可以通过适当增加麦克风的数量来提高目标方位检测的准确度。同时,为了增大时延数值,以减小检测误差干扰,还可以通过加大麦克风之间的距离来获得更加准确的检测结果。
根据所述目标方位与所述摄像头232的当前姿态,计算旋转角度。
在确定用户发出声音时的方位即目标方位后,可以计算摄像头232的旋转角度,以使摄像头按照旋转角度可以使镜头组件朝向目标方位。例如,如图18a、图18b所示,当前摄像头232处于默认初始姿态,而定位的目标方位与屏幕垂线之间的相对角度为向左偏移30°方向,则计算出旋转角度φ为向左30°(+30°)。
显然，无论目标方位通过哪一种相对角度的方式进行表示，都能够结合摄像头232的安装位置与当前姿态转换计算出旋转角度。例如，当前摄像头232处于左转50°的姿态，而定位的目标方位与屏幕垂线之间的相对角度为向左偏移30°，则计算出旋转角度为向右20°（-20°）。
需要说明的是,由于通过测试音频信号检测用户方位的目的在于使摄像头232所拍摄的校对图像中能够包含有用户对应的人像区域,因此在大多数情况下,通过控制摄像头232在一个方向上的旋转即能够使拍摄的校对图像包含人像区域。但在少数情况下,例如摄像头232的当前姿态处于竖直方向最大转角的极端姿态时,通过水平方向上的旋转并不能使摄像头232拍摄到人像。
因此,在部分实施例中,还可以通过多个麦克风确定空间(包括高度方向)上的目标方位,并且在计算旋转角度时,将目标方位分解为水平方向和竖直方向上的两个角度分量,从而分别控制摄像头232的旋转角度。
根据所述旋转角度生成旋转指令,以及将所述旋转指令发送给摄像头232。
在计算获得旋转角度后,控制器250可以对旋转角度进行封装,生成旋转指令。并将旋转指令发送给摄像头232。摄像头232中的电机可以在接收到控制指令后进行转动,从而通过转轴带动镜头组件转动,调整镜头组件的朝向。
由以上技术方案可知,显示设备200可以通过接口组件外接摄像头232和声音采集器231,并在进入需要进行人像追踪的应用后,通过声音采集器231中多个麦克风采集测试音频信号,并定位用户所处的目标方位,从而控制摄像头232进行旋转,使镜头组件朝向用户所在方位,以调整摄像头232的拍摄方向至面对目标方位,便于采集到包含用户人像的图像,使得在当前屏幕中没有人像区域时也能够进行调整,实现后续人物追踪。
为了实现对人物的追踪,在摄像头232完成旋转后,控制器250还可以通过继续执行声像人物定位追踪方法,通过获取图像的方式,对图像中的人像位置进行识别,从而在人像位置发生变化时,控制摄像头232旋转以追踪用户位置,使摄像头232所采集的图像中人像始终位于合适的区域内。
具体地,在一些实施例中,当摄像头232根据旋转指令旋转至面对目标方位后,控制器250还可以通过摄像头232获取校对图像,并在校对图像中检测人像图案;再通过标记人像图案,以及在用户移动位置时向摄像头232发送追踪指令,以追踪用户位置。通过对用户位置的追踪,可以使摄像头232拍摄的图像中,人物图案始终处于合适的位置内,例如处于图像的中部区域内,从而在执行“照镜子”、“运动跟随”等功能的应用时,能够在应用界面中获得更好的显示效果。
为了实现对用户位置的追踪，在一些实施例中，控制器250可以按照设定的频率通过摄像头232获取校对图像，并检测人像图案在校对图像中的位置。根据应用所需要的图像画面布局的不同，可以根据应用类型设置不同的预设区域范围，当人像图案在预设区域内时，即代表当前采集的校对图像中，人像图案位置合适，可以保持当前的摄像头232的拍摄方向不变。当人像图案不在预设区域内时，即代表当前用户的位置移动距离较大，采集的校对图像中人像图案位置不合适，需要对摄像头232的拍摄方向进行调整。
因此,控制器250可以根据人像图案位置生成追踪指令,并将追踪指令发送给摄像头232中,以控制摄像头232调整拍摄方向。显然,在摄像头232接收到追踪指令 后,调整后的拍摄方向应能够保持人像图案位于预设区域内。例如,所述声像人物定位追踪方法还包括以下步骤:
检测用户位置。
在对摄像头232进行旋转调整后,摄像头232可以实时拍摄多帧图像,并将拍摄的图像发送给显示设备200的控制器250。控制器250一方面可以根据所启动的应用程序进行图像处理,例如控制显示器275显示该图像;另一方面可以通过调用检测程序对校对图像进行分析,从而确定用户所在的位置。
其中,用户位置的检测可以通过图像处理程序完成。即通过实时抓取摄像头232拍摄的图像,检测肢体信息。肢体信息可以包含关键点和包裹肢体的外框,通过检测的关键点和肢体框位置在图像中位置信息。关键点可以是指人体图像中能够代表人体特征的一系列点。例如,眼睛、耳朵、鼻子、脖子、肩部、手肘、手腕、腰部、膝关节以及踝关节等。
关键点的确定可以通过图像识别获得,即可以通过分析画面中特征形状,并与预设的模板进行匹配从而确定关键点对应的图像,并获取图像对应的位置,从而获取各关键点对应的位置。其中,位置可以通过图像中距离边界的像素点数量进行表示。可以根据摄像头232的分辨率和可视角度,以图像的左上角为原点,以向右和向下为正方向构建平面直角坐标系,则图像中的各个像素点均能够通过这一直角坐标系进行表示。
例如,如图20所示,水平方向和垂直方向摄像头可视角度分别为HFOV和VFOV,可视角度可以根据摄像头CameraInfo获取,摄像头预览图像支持1080P,宽度为1920,高度1080像素,则图像中每个像素点的位置都可以为(x,y),其中x的取值范围为(0,1920);y的取值范围为(0,1080)。
通常为了能够准确表达用户所在的位置,关键点的数量可以设置为多个,并且在一次检测过程中需要对多个关键点的全部或部分进行位置提取,从而确定包裹肢体的外框区域。例如,关键点可以包括18个,即2个眼睛点、2个耳朵点、1个鼻子点、1个脖子点、2个肩部点、2个肘部点、2个腕部点、2个腰部点(或臀部点)、2个膝关节点以及2个踝关节点。显然,这些关键点在识别的过程中会根据用户的面向不同需要不同的识别方式。例如,腰部对应的位置在用户面向显示器275时识别为腰部点,而在用户背对显示器275时,识别为臀部点。
显然,当用户所处位置发生改变或者姿态发生变化时,部分关键点的位置将发生变化。随着这种变化的出现,摄像头232采集的图像中人体相对位置也将发生变化。例如,当人体向左移动位置时,将使摄像头232采集的图像中人体位置偏左,不便于进行图像分析处理和实时显示。
因此,在检测用户位置后,还需要对比用户位置与校对图像中的预设区域,从而确定当前用户位置是否在预设区域中。
在一些实施例中，用户位置可以通过肢体框中心位置进行表示，而肢体框中心位置可以通过检测的各关键点位置坐标计算获得。例如，通过获取肢体框水平左右两侧的关键点x轴位置坐标，计算肢体框中心位置，即中心位置x轴坐标 x₀=(x₁+x₂)/2。
由于本申请实施例中摄像头232可以进行左右方向和上下方向两个方向的旋转，因此在计算获得中心位置的x轴坐标后，可以先对x轴坐标进行判断，确定中心位置的x轴坐标是否位于整个图像的中心位置。例如，当校对图像为(1920,1080)的1080P图像时，校对图像的中心点水平坐标为960。
在确定人像中心位置和图像中心点后,可以通过对比确定用户位置是否位于预设判断区域中。为了避免频繁调整带来的处理负荷增加,以及允许部分检测误差。根据实际应用条件要求以及摄像头232的水平方向可视角度,可以预设一个允许坐标区间,当人像中心位置位于允许坐标区间内,则确定当前用户位置在预设区域中。例如,最大允许坐标误差为300像素,则允许坐标区间为[660,1260],当检测获得的用户中心位置坐标在这一区间内时,确定用户则在预设判断区域中,即计算获得的人像中心位置坐标与960位置相差不大;当检测获得的用户中心位置坐标不在这一区间内时,确定当前用户位置不在预设区域中,即计算获得的人像中心位置坐标与960位置相差较大。
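下面用一个简短的 Python 片段示意“由肢体框左右关键点求中心位置，并判断其是否落入允许坐标区间”的逻辑；1920 的图像宽度与 300 像素的最大允许坐标误差沿用上文示例取值，函数命名为本示例假设。

```python
IMG_WIDTH = 1920
MAX_ERROR = 300                                           # 最大允许坐标误差（像素）
CENTER_X = IMG_WIDTH // 2                                 # 图像中心水平坐标 960
ALLOWED = (CENTER_X - MAX_ERROR, CENTER_X + MAX_ERROR)    # 允许坐标区间 [660, 1260]

def body_center_x(x_left: float, x_right: float) -> float:
    """由肢体框水平左右两侧关键点的x坐标计算中心位置 x0 = (x1 + x2) / 2。"""
    return (x_left + x_right) / 2

def in_preset_region(x0: float) -> bool:
    """人像中心位于允许坐标区间内则位置合适，无需调整摄像头拍摄方向。"""
    return ALLOWED[0] <= x0 <= ALLOWED[1]
```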
在对比用户位置与校对图像中的预设区域后,可以根据对比结果确定是否需要进行人像追踪,如果当前用户位置不在预设区域内,控制摄像头232旋转,以使用户成像位置位于画面中部区域。如果当前用户位置在预设区域内,则无需控制摄像头232旋转,维持摄像头朝向即可。
在当前用户位置不在预设区域内时,为了控制摄像头232进行旋转,控制器250可以根据用户位置计算旋转角度量,并根据旋转角度量生成控制指令,以控制摄像头232进行旋转。
具体地,在确定当前用户位置不在预设区域内以后,控制器250可以先计算人像区域的中心位置和图像区域的中心点之间的距离;再根据计算的距离,结合摄像头232镜头组件的最大视角以及图像尺寸计算获得旋转角度;最后将计算的旋转角度以控制指令的形式发送给摄像头232,使得摄像头232中电机带动各转轴进行转动,从而调整镜头组件的朝向。
例如，如图21、图22所示，摄像头232的预览分辨率为1920x1080，图像的水平宽度：imgWidth=1920；图像水平中心位置坐标 x=960；人像区域中心位置坐标为 (x₀, y₀)，其水平中心位置坐标为 x₀；水平视角为 hfov；则人像区域和图像区域的中心距离：hd = x − x₀，摄像头232在水平方向上的旋转角度按照下式可计算获得：

$$\theta = \arctan\left(\frac{2\,hd\cdot\tan\left(hfov/2\right)}{imgWidth}\right)$$
通过上式,可以计算出摄像头232需要进行调节的角度,控制器250再对人像区域中心位置与图像区域中心点的坐标数值进行比较,确定人像区域中心位置相对于图像区域中心点的方位,从而确定摄像头232的旋转方向。即,如果人像区域中心水平位置比图像中心大,则向右转动摄像头232;反之向左转动摄像头232。本申请实施例中,摄像头232可以采用后置摄像头模式,使得屏幕显示图像与摄像头拍摄图像是左右镜像关系,即水平角度旋转是左右相反的。
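结合上式与方向判断，下面给出一个计算水平追踪转动量的示意函数（Python）；其中采用针孔成像模型由像素偏移换算角度，属于本示例的假设写法，仅作示意。

```python
import math

def horizontal_tracking(x0: float, img_width: float, hfov_deg: float):
    """由人像区域中心水平坐标 x0 计算摄像头需要转动的角度与方向。"""
    hd = img_width / 2 - x0                                              # 人像区域与图像区域的中心距离
    focal_px = (img_width / 2) / math.tan(math.radians(hfov_deg) / 2)    # 针孔模型等效焦距（像素）
    angle = abs(math.degrees(math.atan(hd / focal_px)))
    direction = "right" if x0 > img_width / 2 else "left"                # 人像中心偏右则向右转动
    # 若显示为左右镜像（后置摄像头模式），实际驱动方向与此相反
    return direction, angle

# 例：horizontal_tracking(1200, 1920, 70) -> ("right", 约9.9°)
```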
在确定旋转角度和方向以后,控制器250可以将旋转角度和方向数据进行封装,生成控制指令,并将控制指令发送给摄像头232。摄像头232中的电机可以在接收到控制指令后进行转动,从而通过转轴带动镜头组件转动,调整镜头组件的朝向。
需要说明的是,在上述实施例中,是以水平方向坐标为例进行判断、调整,实际 应用中还可以通过比较人像区域中心位置与图像区域中心点位置之间的竖直方向差异,对镜头组件也进行同样的调整,具体的调整方法与水平方向的调整方法相同,即在确定当前用户位置不在预设区域内以后,控制器250可以先计算人像区域的中心位置和图像区域的中心点之间的竖直距离;再根据计算的竖直距离,结合摄像头232镜头组件的竖直方向最大视角以及图像尺寸计算获得旋转角度;最后将计算的旋转角度以控制指令的形式发送给摄像头232,使得摄像头232中电机带动第二转轴进行转动,从而调整镜头组件的朝向。
但在实际应用中,由于受到用户姿态的影响,以及不同应用程序中的需求不同,在部分应用场景下使用中心位置作为用户位置判断的方式并不能获得较好的显示、检测、跟踪效果。因此在一些实施例中,控制摄像头232旋转,以使用户成像位置位于画面中部区域还可以按照以下步骤进行。
在校对图像中检测第一识别点。
其中,第一识别点为识别出关键点中的一个或多个,用于表征用户的部分肢体位置。例如,第一识别点可以为2个眼睛点(或2个耳朵点),用以表示用户的头部位置。通过在校对图像中匹配眼睛图案(或耳朵图案)所对应的区域,检测出当前图像中是否含有第一识别点,即是否含有眼睛点(或耳朵点)。
如果所述校对图像中不含有第一识别点,在所述校对图像中检测第二识别点。
第二识别点是与第一识别点间隔一定距离并且能够具有相对位置关系的关键点。例如,第二识别点可以为胸部点,由于在常规使用状态下,胸部点位于眼睛点的下方,并且胸部点与眼睛点之间间隔20-30cm的距离,因此可以通过对胸部点的检测确定需要调整的方向。
如果在所述校对图像中检测到所述第二识别点,则按照第二识别点与第一识别点的位置关系确定转动方向。
例如,当在校对图像中未检测到第一识别点,即眼睛点;而检测到第二识别点,即胸部点,则确定当前校对图像中,未能够显示完全用户的头部图像,需要将摄像头232向上抬起,以使人像头部进入图像的预设区域中。
显然,在实际应用中,根据第二识别点与第一识别点的相对方位不同,在校对图像中未检测到第一识别点,而检测到第二识别点时,所确定的旋转方向也是不同的。例如,第一识别点为腰部点,第二识别点为胸部点时,当未检测到腰部点而检测到胸部点,则说明拍摄的图像太靠人像的上半部,因此可以通过降低拍摄角度,使人像下半部进入图像的预设区域中。
按照所述旋转方向以及预设调节步长控制摄像头232转动,以使人像位于图像预设区域中。
例如，在眼部/耳部等关键点（第一识别点）没有检测到，而肩部等关键点（第二识别点）检测到时，可以向上抬起摄像头232，使第一识别点位置每次调整100个像素点，直到第一识别点处于图像高度的1/7~1/5位置处。
如果校对图像中含有第一识别点,则获取第一识别点相对于图像区域所在的位置。
通过对校对图像中画面的识别，如果识别出第一识别点，则可以进一步对第一识别点所在的位置进行提取，从而确定第一识别点相对于在整个图像区域中所处的位置。例如，如图23a所示，在获得校对图像后，如果识别出眼睛点，即确定检测到第一识别点，则可以获取眼睛点当前坐标 P(x₁, y₁)。再将当前坐标中的x轴坐标值和/或y轴坐标值与图像的整体宽度imgWidth和/或高度imgHeight进行对比，从而确定第一识别点相对于图像区域所在的位置。
其中，在水平方向和竖直方向两个方向上可以确定第一识别点相对于图像区域在两个方向上所在的位置。即水平方向上，所述第一识别点相对于图像区域所在的位置为 x₁/imgWidth；在竖直方向上，所述第一识别点相对于图像区域所在的位置为 y₁/imgHeight。
在获取第一识别点相对于图像区域所在的位置后,还可以对第一识别点对应位置所在区间进行判断,并根据所在的不同区间,确定不同的调整方式。
例如,如图23a所示,通过检测在竖直方向上,第一识别点相对于图像区域所在的位置时,检测到眼睛(或耳部)在图像画面高度的1/5之下,此时,眼睛位置过低,需要将摄像头232下压,以使眼睛位置升高至合适的区域内,在将摄像头232下压的过程中,如果检测到眼睛的点在图像画面的1/5位置处,则停止下压,完成摄像头232的调整,如图23b所示。当检测到眼睛(或耳部)位置在图像画面高度的1/7以下、1/5以上,则确定当前第一识别点位置合适,因此摄像头232的高度不需要进行调整,防止抖动造成摄像头频繁变动。
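下面以眼部点（第一识别点）与肩部/胸部点（第二识别点）为例，给出上述纵向调整策略的示意代码（Python）；1/7、1/5 的区间阈值与每次 100 像素的调节步长沿用上文取值，关键点坐标以可选值的形式假设给出，函数命名为本示例假设。

```python
from typing import Optional, Tuple

STEP = 100  # 每次调整对应的像素步长

def vertical_adjust(eye_y: Optional[float], chest_y: Optional[float],
                    img_height: float) -> Tuple[str, int]:
    """返回 (转动方向, 调整像素量)。'up' 表示抬高摄像头视角，'down' 表示压低，'hold' 表示不动。"""
    if eye_y is None:
        if chest_y is not None:
            return "up", STEP            # 仅检测到第二识别点：头部超出画面上方，抬高摄像头
        return "hold", 0                 # 两类识别点均未检测到，可转入声源定位辅助判断
    ratio = eye_y / img_height           # 第一识别点相对图像高度的位置
    if ratio > 1 / 5:
        return "down", STEP              # 眼部位置过低，压低摄像头使其升至 1/5 线附近
    if ratio < 1 / 7:
        return "up", STEP                # 眼部位置过高，抬高摄像头
    return "hold", 0                     # 位于 1/7~1/5 之间，位置合适，防止频繁抖动
```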
上述实施例通过图像识别相结合的方式,可以实现对摄像头232的朝向进行实时控制,实现对人像目标的追踪。显然,在实际应用中,还可以通过声源定位实现对人像目标的追踪。因此在本申请的部分实施例中,对人像目标的追踪可以采用声源定位与图像识别相结合的方式,对人像目标进行更加准确的定位。
例如,在运行部分运动幅度较大、动作较快的健身类应用时,可以预先通过统计等方式获得哪些时刻容易出现难于确定用户位置的特殊时段,并在这一时段中通过获取音频信号辅助判断用户所处的位置,并按照此时图像识别和音频定位两者的结果进行综合定位,以提高对人像目标进行追踪的准确率。
另外,在部分使用场景中,通过图像识别检测到的人像可能存在多个,这将对摄像头232的追踪过程造成影响。因此在本申请的部分实施例中,还可以通过锁定程序在多个人像中锁定一个人像进行追踪。例如,可以在屏幕中心一定区域内查找离屏幕中心最近人像,作为最优的人脸信息(中心屏幕大小1/3区域,出现次数最多),从而记录该人物信息并进行锁定。而如果没有检测到人脸信息,说明声音信息误差较大,则锁定离屏幕最近的人物。
在锁定其中一个人像后,摄像头232的调节可以仅受到被锁定人物的位置影响。即摄像头232所拍摄图像内其他人的移动将不会调节摄像头232,摄像头232依然保持不动状态。只有锁定状态的人物移动,通过图像检测侦测到之后,驱动摄像头232跟随锁定人物进行转动。
由以上技术方案可知,显示设备200可以通过摄像头232获取校对图像,并在校对图像中检测人像图案,从而标记所述人像图案,以及在用户移动位置时向所述摄像头发送追踪指令,以追踪用户位置,实现摄像头232跟随用户移动的效果。通过对用户位置的追踪,可以使摄像头232拍摄的图像中,人像图案始终处于合适的位置中,便于应用进行显示、调用以及分析处理。
在一些实施例中，在标记所述人像图案的步骤中，如果所述校对图像中包括多个人像图案，则查找位于校对图像中心区域的人像图案；如果校对图像中心区域位置含有人像图案，标记处于图像中心区域的人像图案；如果校对图像中心区域位置不含有人像图案，标记校对图像中面积最大的人像图案。
例如，控制器250可以实时查询摄像头232状态，如果摄像头232根据测试音频信号旋转结束，则启动AI图像检测算法。在屏幕中心一定区域内查找离屏幕中心位置最近的人脸信息，记录该人物信息并进行锁定。如果没有检测到人脸信息，说明声音信息误差较大，则锁定离屏幕最近的人物。
在一些实施例中，获取用户输入的测试音频信号之前，还可以先对摄像头232所拍摄的图像进行一次图像识别，确定当前摄像头232能否拍摄到带有人像的画面。如果从拍摄的图像中识别出具有人像，则无需通过声源定位，而直接通过后续图像处理进行目标追踪。即在启动摄像头232后，可以先获取用于识别人像的初始图像，并在初始图像中识别人像区域。人像区域的识别方法可以与上述实施例相同，即通过识别关键点的方式完成。
如果所述初始图像中含有人像区域,则直接执行检测用户位置以及后续步骤,通过图像处理的方式对人像目标进行追踪。如果所述初始图像中不含有人像区域,则通过执行获取用户输入的测试音频信号以及后续步骤,通过声源定位的方式调整摄像头232至朝向用户位置的区域,再执行检测用户位置以及后续步骤。
为了获得更加准确的人像位置判断,在一些实施例中,如图24a、图24b所示,识别出多个关键点以后,还可以根据识别出的关键点建立骨骼线示意图形,从而根据骨骼线图形进一步确定人像所在位置。其中,骨骼线可以通过连接多个关键点进行确定。在用户不同的姿态下,骨骼线所呈现的形状也不同。
需要说明的是，通过绘制的骨骼线还可以根据骨骼线的运动变化规律来动态调整摄像头的拍摄位置。例如，在判断骨骼线运动状态变化过程为从蹲姿状态变化到站立状态时，可以抬高摄像头232的视角，以使处于站姿状态的人像也能够处于图像中的合适区域内，即从图24a过渡到图24b所示的效果。在判断骨骼线运动状态变化过程为从站立状态变化到蹲姿状态时，则可以降低摄像头232的视角，以使处于蹲姿状态的人像也能够处于图像中的合适区域内，即从图24b过渡到图24a所示的效果。
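下面给出一个根据骨骼线纵向跨度变化近似判断“下蹲/站立”切换、进而决定抬高或降低摄像头视角的示意片段（Python）；以头部到踝部关键点的像素高度作为跨度、0.25 的变化阈值均为本示例的假设判据。

```python
def posture_transition(prev_span: float, curr_span: float, threshold: float = 0.25) -> str:
    """prev_span/curr_span 为前后两帧骨骼线的纵向跨度（头部到踝部的像素高度）。
    返回 'raise'（站起，抬高视角）、'lower'（下蹲，降低视角）或 'keep'（保持不变）。"""
    if prev_span <= 0:
        return "keep"
    change = (curr_span - prev_span) / prev_span
    if change > threshold:
        return "raise"     # 跨度明显变大：由蹲姿变为站姿，抬高摄像头视角
    if change < -threshold:
        return "lower"     # 跨度明显变小：由站姿变为蹲姿，降低摄像头视角
    return "keep"
```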
上述实施例以人像位置处于图像中心为例说明摄像头232对人像的追踪，应当理解的是，根据实际需要，预想拍摄的图像中，人像位置可能位于中心区域以外的其他区域中。例如，如图25a所示，对于运动跟随类应用，显示设备200可以根据摄像头232拍摄的视频，渲染虚拟教练影像，从而使用户通过显示设备200观看到的场景影像中，包括用户人像和虚拟教练人像。此时，为了随场景渲染，需要摄像头232拍摄的人像位于图像的一侧，而另一侧用于渲染虚拟教练影像。
例如,如图25a、图25b所示,当通过校对图像确定当前人像位置位于图像中心区域时,同样需要向摄像头232发送旋转指令,使摄像头232旋转,以使人像位于图像的右侧区域。
由以上技术方案可知，相对于单纯通过图像处理以及单纯通过声源定位的人物追踪方式，本申请实施例提供的声像人物定位追踪方法可以改进声源定位精确度较低、无法有效定位人物具体位置的缺陷，以及图像处理空间感知较差、只能对摄像头232对准的拍摄区域进行定位的缺陷。所述声像人物定位追踪方法通过对声源定位和摄像头232图像分析进行综合利用，利用声源定位空间感知能力较强的优势，首先确认人物的大致位置，驱动摄像头232朝向声源方向；同时利用摄像头232图像分析精准度高的优点，对拍摄图像进行人物检测确定具体位置，驱动摄像头进行微调，以此达到精准定位，使摄像头232拍摄的人物能够在图像中聚焦显示。
基于上述声像人物定位追踪方法,在一些实施例中,本申请还提供一种显示设备200,包括:显示器275、接口组件以及控制器250。
其中,所述显示器275被配置为显示用户界面,接口组件被配置为连接摄像头232和声音采集器231,摄像头232可转动拍摄角度,被配置为拍摄图像;声音采集器231包括多个麦克风组成的麦克风阵列,被配置为采集音频信号。
控制器250被配置为获取用户输入的测试音频信号,并响应于测试音频信号,定位目标方位,目标方位根据声音采集组件采集的测试音频信号时间差计算获得,从而向摄像头发送旋转指令,以调整摄像头的拍摄方向至面对目标方位。
在上述实施例中,可以通过接口组件外接摄像头232和声音采集器231,并结合显示设备200完成上述声像人物定位追踪方法。在一些实施例中,还可以直接将摄像头232和声音采集器231内置在显示设备200中,即显示设备200包括显示器275、摄像头232、声音采集器231以及控制器250,其中,摄像头232、声音采集器231可以直接连接控制器250,从而直接通过声音采集器231获取测试音频信号,并直接控制摄像头232进行旋转,从而完成上述声像人物定位追踪方法。
具体实现中,本申请还提供一种计算机存储介质,其中,该计算机存储介质可存储有程序,该程序执行时可包括本申请提供的摄像头拍摄角度的调整方法的各实施例中的部分或全部步骤。所述的存储介质可为磁碟、光盘、只读存储记忆体(英文:read-only memory,简称:ROM)或随机存储记忆体(英文:random access memory,简称:RAM)等。
本领域的技术人员可以清楚地了解到本申请实施例中的技术可借助软件加必需的通用硬件平台的方式来实现。基于这样的理解,本申请实施例中的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品可以存储在存储介质中,如ROM/RAM、磁碟、光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例或者实施例的某些部分所述的方法。
最后应说明的是:以上各实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述各实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分或者全部技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的范围。
为了方便解释,已经结合具体的实施方式进行了上述说明。但是,上述示例性的讨论不是意图穷尽或者将实施方式限定到上述公开的具体形式。根据上述的教导,可以得到多种修改和变形。上述实施方式的选择和描述是为了更好的解释原理以及实际的应用,从而使得本领域技术人员更好的使用所述实施方式以及适于具体使用考虑的各种不同的变形的实施方式。

Claims (10)

  1. 一种显示设备,其特征在于,包括:
    摄像头,所述摄像头被配置为采集人像以及实现在预设角度范围内的转动;
    声音采集器,所述声音采集器被配置为采集人物声源信息,所述人物声源信息是指人物通过语音与显示设备交互时产生的声音信息;
    与所述摄像头和所述声音采集器连接的控制器,所述控制器被配置为:获取所述声音采集器采集的人物声源信息和所述摄像头的当前拍摄角度;
    对所述人物声源信息进行声源识别,确定声源角度信息,所述声源角度信息用于表征人物在语音时所处位置的方位角度;
    基于所述摄像头的当前拍摄角度和声源角度信息,确定摄像头的目标转动方向和目标转动角度;
    按照所述目标转动方向和目标转动角度,调整所述摄像头的拍摄角度,以使摄像头的拍摄区域正对人物语音时的所处位置。
  2. 根据权利要求1所述的显示设备,其特征在于,所述控制器在执行所述对人物声源信息进行声源识别,确定声源角度信息之前,被进一步配置为:
    对所述人物声源信息进行文本提取,得到语音交互文本;
    对比所述语音交互文本和预置唤醒文本,所述预置唤醒文本是指用于触发声源识别过程的文本;
    如果所述语音交互文本与所述预置唤醒文本对比一致,则执行对人物声源信息进行声源识别的步骤。
  3. 根据权利要求1所述的显示设备,其特征在于,包括多组声音采集器,所述控制器获取所述声音采集器采集的人物声源信息具体为:获取每个所述声音采集器采集的所述人物在语音时产生的人物声源信息;
    所述控制器在执行所述对人物声源信息进行声源识别,确定声源角度信息,被进一步配置为:
    对每个所述人物声源信息分别进行声源识别,计算多组所述声音采集器在采集对应的人物声源信息时产生的语音时间差;
    基于所述语音时间差,计算所述人物在语音时所处位置的声源角度信息。
  4. 根据权利要求3所述的显示设备,其特征在于,所述控制器在执行所述对每个所述人物声源信息分别进行声源识别,计算多组所述声音采集器在采集对应的人物声源信息时产生的语音时间差,被进一步配置为:
    在所述人物声源信息中提取环境噪声、人物语音时的声源信号和人物的语音传播至每一声音采集器的传播时间;
    根据所述环境噪声、声源信号和传播时间,确定每个声音采集器的接收信号;
    利用互相关时延估计算法,对每个声音采集器的接收信号进行处理,得到每两个声音采集器在采集对应的人物声源信息时产生的语音时间差。
  5. 根据权利要求3所述的显示设备,其特征在于,所述控制器在执行所述基于语音时间差,计算所述人物在语音时所处位置的声源角度信息,被进一步配置为:
    获取当前环境状态下的声速、每个声音采集器的坐标和所述声音采集器的设置个数;
    根据所述声音采集器的设置个数,确定声音采集器的组合对数量,所述组合对数量是指声音采集器两两组合得到的组合数;
    根据每两个声音采集器对应的语音时间差、声速和每个声音采集器的坐标,建立向量关系方程组,所述向量关系方程组的数量与组合对数量相同;
    求解所述向量关系方程组,得到人物语音时所处位置的声源单位平面波传播向量的向量值;
    根据所述向量值,计算所述人物在语音时所处位置的声源角度信息。
  6. 根据权利要求1所述的显示设备,其特征在于,所述控制器在执行所述获取摄像头的当前拍摄角度之前,被进一步配置为:
    查询所述摄像头的当前运行状态;
    如果所述摄像头的当前运行状态为处于旋转状态,则等待摄像头旋转完毕;
    如果所述摄像头的当前运行状态为处于未旋转状态,则获取所述摄像头的当前拍摄角度。
  7. 根据权利要求1所述的显示设备,其特征在于,所述控制器在执行所述基于摄像头的当前拍摄角度和声源角度信息,确定摄像头的目标转动方向和目标转动角度,被进一步配置为:
    将所述声源角度信息转换为摄像头的坐标角度;
    计算所述摄像头的坐标角度和摄像头的当前拍摄角度的角度差值,将所述角度差值作为所述摄像头的目标转动角度;
    根据所述角度差值,确定摄像头的目标转动方向。
  8. 根据权利要求7所述的显示设备,其特征在于,所述控制器在执行所述将声源角度信息转换为摄像头的坐标角度,被进一步配置为:
    获取所述人物在语音时的声源角度范围和摄像头转动时的预设角度范围;
    计算所述声源角度范围与所述预设角度范围之间的角度差值,将所述角度差值的半值作为转换角度;
    计算所述声源角度信息对应的角度与所述转换角度的角度差,将所述角度差作为摄像头的坐标角度。
  9. 根据权利要求7所述的显示设备,其特征在于,所述控制器在执行所述根据角度差值,确定摄像头的目标转动方向,被进一步配置为:
    如果所述角度差值为正值,则确定摄像头的目标转动方向为向右转动;
    如果所述角度差值为负值,则确定摄像头的目标转动方向为向左转动。
  10. 一种摄像头拍摄角度的调整方法,其特征在于,所述方法包括:
    获取所述声音采集器采集的人物声源信息和所述摄像头的当前拍摄角度,所述人物声源信息是指人物通过语音与显示设备交互时产生的声音信息;
    对所述人物声源信息进行声源识别,确定声源角度信息,所述声源角度信息用于表征人物在语音时所处位置的方位角度;
    基于所述摄像头的当前拍摄角度和声源角度信息,确定摄像头的目标转动方向和目标转动角度;
    按照所述目标转动方向和目标转动角度,调整所述摄像头的拍摄角度,以使摄像头的拍摄区域正对人物语音时的所处位置。
PCT/CN2021/093588 2020-07-01 2021-05-13 一种显示方法及显示设备 WO2022001406A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180047263.6A CN116097120A (zh) 2020-07-01 2021-05-13 一种显示方法及显示设备
US18/060,210 US20230090916A1 (en) 2020-07-01 2022-11-30 Display apparatus and processing method for display apparatus with camera

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
CN202010621070.4 2020-07-01
CN202010621070.4A CN111708383A (zh) 2020-07-01 2020-07-01 一种摄像头拍摄角度的调整方法及显示设备
CN202010848905 2020-08-21
CN202010848905.X 2020-08-21
CN202110014128.3 2021-01-06
CN202110014128.3A CN112866772B (zh) 2020-08-21 2021-01-06 一种显示设备及声像人物定位追踪方法

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/060,210 Continuation US20230090916A1 (en) 2020-07-01 2022-11-30 Display apparatus and processing method for display apparatus with camera

Publications (1)

Publication Number Publication Date
WO2022001406A1 true WO2022001406A1 (zh) 2022-01-06

Family

ID=79317415

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/093588 WO2022001406A1 (zh) 2020-07-01 2021-05-13 一种显示方法及显示设备

Country Status (3)

Country Link
US (1) US20230090916A1 (zh)
CN (1) CN116097120A (zh)
WO (1) WO2022001406A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115278070A (zh) * 2022-07-23 2022-11-01 宁波市杭州湾大桥发展有限公司 一种桥面监控视频防抖方法、系统、存储介质及智能终端
CN115862668A (zh) * 2022-11-28 2023-03-28 之江实验室 机器人基于声源定位判断交互对象的方法和系统
CN116052667A (zh) * 2023-03-08 2023-05-02 广东浩博特科技股份有限公司 智能开关的控制方法、装置和智能开关

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11963516B2 (en) * 2022-03-28 2024-04-23 Pumpkii Inc. System and method for tracking objects with a pet companion robot device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070195012A1 (en) * 2006-02-22 2007-08-23 Konica Minolta Holdings Inc. Image display apparatus and method for displaying image
CN102186051A (zh) * 2011-03-10 2011-09-14 弭强 基于声音定位的视频监控系统
CN104767970A (zh) * 2015-03-20 2015-07-08 上海大唐移动通信设备有限公司 一种基于声源的监控方法及监控系统
CN105049709A (zh) * 2015-06-30 2015-11-11 广东欧珀移动通信有限公司 一种大视角摄像头控制方法及用户终端
CN105278380A (zh) * 2015-10-30 2016-01-27 小米科技有限责任公司 智能设备的控制方法和装置
CN106292732A (zh) * 2015-06-10 2017-01-04 上海元趣信息技术有限公司 基于声源定位和人脸检测的智能机器人转动方法
CN108668077A (zh) * 2018-04-25 2018-10-16 Oppo广东移动通信有限公司 摄像头控制方法、装置、移动终端及计算机可读介质
CN111708383A (zh) * 2020-07-01 2020-09-25 海信视像科技股份有限公司 一种摄像头拍摄角度的调整方法及显示设备


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115278070A (zh) * 2022-07-23 2022-11-01 宁波市杭州湾大桥发展有限公司 一种桥面监控视频防抖方法、系统、存储介质及智能终端
CN115278070B (zh) * 2022-07-23 2023-06-02 宁波市杭州湾大桥发展有限公司 一种桥面监控视频防抖方法、系统、存储介质及智能终端
CN115862668A (zh) * 2022-11-28 2023-03-28 之江实验室 机器人基于声源定位判断交互对象的方法和系统
CN115862668B (zh) * 2022-11-28 2023-10-24 之江实验室 机器人基于声源定位判断交互对象的方法和系统
CN116052667A (zh) * 2023-03-08 2023-05-02 广东浩博特科技股份有限公司 智能开关的控制方法、装置和智能开关

Also Published As

Publication number Publication date
US20230090916A1 (en) 2023-03-23
CN116097120A (zh) 2023-05-09

Similar Documents

Publication Publication Date Title
WO2022001407A1 (zh) 一种摄像头的控制方法及显示设备
WO2022001406A1 (zh) 一种显示方法及显示设备
CN111541845B (zh) 图像处理方法、装置及电子设备
US7990421B2 (en) Arrangement and method relating to an image recording device
WO2020108261A1 (zh) 拍摄方法及终端
CN112866772B (zh) 一种显示设备及声像人物定位追踪方法
CN110809115B (zh) 拍摄方法及电子设备
WO2019174628A1 (zh) 拍照方法及移动终端
CN110740259A (zh) 视频处理方法及电子设备
JP2020527000A (ja) 撮影モバイル端末
CN110602401A (zh) 一种拍照方法及终端
WO2021190428A1 (zh) 图像拍摄方法和电子设备
WO2022100262A1 (zh) 显示设备、人体姿态检测方法及应用
CN111708383A (zh) 一种摄像头拍摄角度的调整方法及显示设备
CN112672062B (zh) 一种显示设备及人像定位方法
CN109905603B (zh) 一种拍摄处理方法及移动终端
WO2022037535A1 (zh) 显示设备及摄像头追踪方法
CN109922294B (zh) 一种视频处理方法及移动终端
CN111031253B (zh) 一种拍摄方法及电子设备
CN111145192A (zh) 图像处理方法及电子设备
KR20220005087A (ko) 촬영 방법 및 단말
CN113655887A (zh) 一种虚拟现实设备及静态录屏方法
CN110086998B (zh) 一种拍摄方法及终端
CN113473024A (zh) 显示设备、云台摄像头和摄像头控制方法
CN110913133B (zh) 拍摄方法及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21834310

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21834310

Country of ref document: EP

Kind code of ref document: A1