WO2021143574A1 - Augmented reality glasses, augmented reality glasses-based ktv implementation method and medium - Google Patents

Augmented reality glasses, augmented reality glasses-based KTV implementation method and medium

Info

Publication number
WO2021143574A1
WO2021143574A1 (application PCT/CN2021/070281)
Authority
WO
WIPO (PCT)
Prior art keywords
augmented reality
audio
reality glasses
image
scene image
Prior art date
Application number
PCT/CN2021/070281
Other languages
French (fr)
Chinese (zh)
Inventor
劳逸 (Lao Yi)
Original Assignee
Guangdong OPPO Mobile Telecommunications Corp., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong OPPO Mobile Telecommunications Corp., Ltd.
Publication of WO2021143574A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/01Head-up displays
    • G02B27/017Head mounted
    • G02B27/0172Head mounted characterised by optical features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/14Digital output to display device ; Cooperation and interconnection of the display device with other functional units
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00Details of electrophonic musical instruments
    • G10H1/36Accompaniment arrangements
    • G10H1/361Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/01Head-up displays
    • G02B27/017Head mounted
    • G02B2027/0178Eyeglass type

Definitions

  • the present disclosure relates to the technical field of virtual reality and augmented reality, and in particular to an augmented reality glasses, a KTV implementation method based on the augmented reality glasses, and a computer-readable storage medium.
  • KTV is a kind of venue that provides people with karaoke audio-visual equipment and space for singing. Going to KTV to sing karaoke ("K song") has become a popular form of leisure and entertainment.
  • the present disclosure provides an augmented reality glasses, a KTV implementation method based on the augmented reality glasses, and a computer-readable storage medium, thereby overcoming the problem of the relatively limited implementation of KTV at least to a certain extent.
  • an augmented reality glasses, including: a camera unit for capturing images of the current scene; a storage unit for storing executable instructions; a processing unit for executing the executable instructions to render a virtual image in the scene image according to a target video; a display unit for displaying the virtual image; and an audio unit for playing a target audio and for receiving and playing other audio; wherein the target video and the target audio are the video and audio of the same song.
  • a KTV implementation method based on augmented reality glasses, including: acquiring a current scene image; rendering a virtual image in the scene image according to a target video; playing the virtual image and the target audio synchronously, the target video and the target audio being the video and audio of the same song; and, when audio input by the user is received, playing the input audio.
  • a computer-readable storage medium having a computer program stored thereon, and the computer program, when executed by a processor, realizes the above-mentioned KTV implementation method based on augmented reality glasses.
  • With the augmented reality glasses, the KTV implementation method based on the augmented reality glasses, and the computer-readable storage medium, on the one hand, the KTV function is realized through the augmented reality glasses, so that users can wear them for singing entertainment. The implementation process is convenient, requires no assistance from other equipment, has a low implementation cost, and occupies little space.
  • augmented reality glasses render and display virtual images based on real scene images, so that users are immersed in an audio-visual environment that combines virtual and real, with a strong sense of interaction and immersiveness, and a better user experience.
  • FIG. 1 shows a schematic diagram of the structure of augmented reality glasses in this exemplary embodiment
  • FIG. 2 shows a structural diagram of augmented reality glasses in this exemplary embodiment
  • Fig. 3 shows a flowchart of a KTV implementation method based on augmented reality glasses in this exemplary embodiment
  • FIG. 4 shows a flowchart of rendering a virtual image in this exemplary embodiment
  • Fig. 5 shows a schematic diagram of a computer-readable storage medium in this exemplary embodiment.
  • KTV mobile cabinets and home KTV systems have appeared, and karaoke APPs (Applications) have also appeared on terminal platforms such as mobile phones.
  • However, the equipment cost of KTV mobile cabinets and home KTV systems is relatively high, and the equipment occupies a large area, which is inconvenient to use; singing with a karaoke APP on a mobile phone or other terminal offers poor interactivity and a poor audio-visual experience.
  • Exemplary embodiments of the present disclosure provide augmented reality (AR) glasses.
  • the following takes the augmented reality glasses 100 in FIG. 1 and FIG. 2 as an example to illustrate the internal unit structure thereof.
  • the augmented reality glasses 100 may include more or fewer components than shown, or combine certain components, or disassemble certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the interface connection relationship between the components is only shown schematically, and does not constitute a structural limitation on the augmented reality glasses 100.
  • the augmented reality glasses 100 may also adopt a different interface connection mode from that in FIG. 1, or a combination of multiple interface connection modes.
  • the augmented reality glasses 100 may specifically include a camera unit 110, a storage unit 120, a processing unit 130, a display unit 140 and an audio unit 150.
  • The camera unit 110 may be composed of components such as a lens and a photosensitive element. As shown in FIG. 2, it may be located in the middle of the two lenses.
  • the camera unit 110 faces directly in front of the user, and can capture a still image or video in front.
  • When the KTV function of the augmented reality glasses 100 is activated, the camera unit 110 is used to capture images of the current scene. For example, if the user is in a room, the camera unit 110 can capture the scene in front of the user, including objects such as the room's walls, floor, tables, and chairs.
  • The camera unit 110 may include a depth camera 1101, for example a TOF (Time Of Flight) camera or a binocular camera, which can detect the depth information of each part or each object in the scene image (that is, its axial distance from the augmented reality glasses 100).
  • the storage unit 120 is used to store executable instructions, such as operating system code, program code, and data generated during the running of the program, such as image data taken by the camera unit 110, user data in the APP, and so on.
  • The storage unit 120 can be arranged in the frame body between the two lenses, or can be arranged in other positions.
  • the storage unit 120 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (Universal Flash Storage, UFS), and the like.
  • The processing unit 130 may include a central processing unit (CPU), a graphics processing unit (GPU), an application processor (AP), a modem processor, an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a neural-network processing unit (NPU), and the like.
  • different processors can be used as independent units or integrated into one processing unit.
  • The processing unit 130 may be arranged in the frame body between the two lenses, or may be arranged in other positions.
  • the processing unit 130 may execute executable instructions on the storage unit 120 to execute corresponding program commands.
  • the processing unit 130 may obtain a scene image from the camera unit 110, and render a virtual image in the scene image according to the target video.
  • the target video is the video of the currently playing song, usually a song MV (Music Video); the currently playing song can be selected by the user.
  • A song selection interface may be displayed, in which the user can tap, click, or perform other operations to order a song; alternatively, a song may be selected automatically by the system, for example in random play mode or according to a preset playlist.
  • the processing unit 130 may obtain the corresponding target video, and render a virtual image in the scene image according to the video.
  • The virtual image may be a combination of the video and the real scene image; for example, frames of the video are embedded in the scene image for display.
  • the processing unit 130 may be used to: identify a plane in the scene image; render a virtual display screen on the plane, and display the target video in the virtual display screen. For example, if the wall plane is recognized in the scene image, a virtual display screen can be rendered on the wall plane to simulate the display screen in the KTV room.
  • The present disclosure does not limit its style, size, etc. The MV of the currently playing song is then displayed in the virtual display screen, making the user feel as if they were in a KTV room, with a strong sense of immersion.
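Rendering a virtual display screen on a recognized wall plane amounts to warping each video frame onto a quadrilateral in the user's view. A minimal sketch of that warp using a direct linear transform to estimate the homography (the corner coordinates, resolution, and helper names below are illustrative assumptions, not values from the disclosure):

```python
import numpy as np

def homography(src, dst):
    """Direct linear transform: 3x3 matrix H mapping 4 src points to 4 dst points.

    src, dst: (4, 2) arrays of pixel coordinates.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The null space of this 8x9 system gives H up to scale.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def warp_point(H, p):
    """Apply H to a 2-D point in homogeneous coordinates."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

# Map the corners of a 1280x720 MV frame onto a quadrilateral
# found on the recognized wall plane (coordinates are made up).
video_corners = np.array([[0, 0], [1280, 0], [1280, 720], [0, 720]], float)
wall_quad = np.array([[200, 150], [900, 180], [880, 560], [210, 540]], float)
H = homography(video_corners, wall_quad)
```

With four exact correspondences the homography maps each video corner precisely onto the corresponding quad corner; a renderer would then warp every frame pixel through `H`.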
  • The processing unit 130 may perform plane recognition based on the color, shape, and texture information in the scene image, for example recognizing connected areas with consistent color and consistent texture as a plane. If the camera unit 110 includes a depth camera 1101, the collected scene image can carry depth information, and the processing unit 130 can perform more accurate plane recognition based on it, for example recognizing connected areas with consistent color, consistent texture, and continuously varying depth as a plane.
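One concrete way to use the depth information is to fit a plane to a candidate region's depth samples and accept the region only when the fit residual is small. A hypothetical sketch (the threshold and the synthetic data are assumptions for illustration, not values from the disclosure):

```python
import numpy as np

def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c over (x, y, z) samples.

    Returns the coefficients and the RMS residual of the fit.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    A = np.column_stack([x, y, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)
    rms = np.sqrt(np.mean((A @ coef - z) ** 2))
    return coef, rms

def looks_planar(points, tol=0.02):
    """A region counts as a plane if its depth samples fit one closely."""
    _, rms = fit_plane(points)
    return rms < tol

# Synthetic depth samples: a wall whose depth varies linearly across the
# image, and the same wall with strong non-planar noise added.
xs, ys = np.meshgrid(np.arange(10.0), np.arange(10.0))
wall = np.column_stack([xs.ravel(), ys.ravel(), 2.0 + 0.01 * xs.ravel()])
bumpy = wall.copy()
bumpy[:, 2] += np.random.default_rng(0).normal(0.0, 0.2, len(bumpy))
```

A real pipeline would first segment candidate regions by color and texture consistency, then apply a check like `looks_planar` to each region's depth samples.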
  • an image recognition unit 1301 can be provided in the processing unit 130 to recognize planes in the scene image.
  • the image recognition unit 1301 can run a Convolutional Neural Network (CNN) to process the scene image and output the plane recognition result.
  • An image rendering unit 1302 may also be provided in the processing unit 130, specifically for rendering virtual images.
  • Other virtual elements can also be rendered, including but not limited to: rendering virtual lights in the scene image to simulate the lighting effects of a KTV room, switching them according to the style of the song; rendering a virtual microphone or singing stage; and rendering an animation of virtual characters dancing in front of the user.
  • The display unit 140 can display images, videos, and the like. As shown in FIG. 2, the display unit 140 is generally provided in the form of lenses. The user sees the real scene through the lenses, and the processing unit 130 transmits the virtual image to the display unit 140 for display, so that the user sees a superimposition of the real and the virtual. The display unit 140 therefore needs a "see-through" capability: it must show both the real external world and virtual information, realizing the fusion and "augmentation" of reality and virtuality. In an alternative embodiment, as shown in FIG. 1, the display unit 140 may include a micro display (Display) 1401 and a lens (Lens) 1402.
  • The micro display 1401 provides the display content and may be a self-luminous active device, such as a light-emitting diode panel, or a liquid crystal display illuminated by an external light source; the lens 1402 allows the human eye to see the real scene, so that the real scene and the virtual image are superimposed.
  • After the processing unit 130 renders the virtual image, the image is transmitted to the display unit 140 for display, so that the user can simultaneously see the real scene and the virtual image in front through the augmented reality glasses 100.
  • the audio unit 150 can convert a digital audio signal into an analog audio signal for output, can also convert an analog audio input into a digital audio signal, and can also be used to encode and decode audio signals.
  • the audio unit 150 may be provided in the processing unit 130, or part of the functional modules of the audio unit 150 may be provided in the processing unit 130.
  • the audio unit 150 is used to play target audio, and to receive and play other audio.
  • the target video and target audio are the video and audio of the same song; other audio refers to the externally input audio besides the target audio.
  • the audio unit 150 plays the target audio synchronously, so that the user can start singing according to the audio and video.
  • The audio unit 150 may include a microphone 1501 for receiving externally input audio, such as the user's singing voice or the audio of a voice control instruction. The user's singing voice can be converted and played out synchronously.
  • the audio unit 150 may also include earphones, such as a bone conduction earphone 1502, which can realize high-quality audio playback, such as Dolby sound effects.
  • the present disclosure does not limit the number and positions of the microphone 1501 and the bone conduction earphone 1502.
  • The microphone 1501 may be located below the front ends of the temples and arranged as an array, to better receive the user's voice.
  • the bone conduction earphone 1502 can be located at the middle and rear ends of the temples, close to the position of the ear, which is conducive to achieving high-quality sound conduction.
  • the processing unit 130 may optimize the voice and control the audio unit 150 to play the optimized voice.
  • Optimization processing can include any one or more of the following: denoising (such as removing environmental noise), sound modification (such as appropriate beautification of pitch, vibrato, etc.), and volume adjustment (adaptively adjusting the volume of the voice according to the playback volume, to prevent the voice from being too loud or too quiet). This lets users hear an optimized singing voice, for a better experience.
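The volume-adjustment step described above can be sketched as matching the mic signal's RMS level to the backing track's, with a gain cap so background noise is not amplified excessively. The function names, the target ratio, and the signals below are illustrative assumptions:

```python
import numpy as np

def rms(x):
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))

def match_volume(voice, accompaniment, ratio=1.0, max_gain=4.0):
    """Scale the mic signal so its RMS sits at `ratio` times the backing RMS.

    The gain is capped at `max_gain` so that a very quiet (mostly noise)
    input is not blown up.
    """
    target = ratio * rms(accompaniment)
    current = rms(voice)
    if current == 0.0:
        return voice
    gain = min(target / current, max_gain)
    return voice * gain

# A loud accompaniment and a quieter singer, as raw sample arrays.
t = np.linspace(0.0, 1.0, 8000)
backing = 0.5 * np.sin(2 * np.pi * 440 * t)
mic = 0.2 * np.sin(2 * np.pi * 220 * t)
balanced = match_volume(mic, backing)
```

A real implementation would apply this per audio block with gain smoothing between blocks, alongside the denoising and pitch-beautification steps.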
  • During singing, the user may need to perform related song controls, such as switching songs, ordering songs, pausing playback, continuing playback, increasing the volume, and decreasing the volume.
  • the camera unit 110 can also be used to take a user's gesture image, which can be a static image of one frame or a dynamic image of multiple consecutive frames.
  • the processing unit 130 may also recognize the gesture control instruction corresponding to the gesture image, and execute the gesture control instruction. In this way, the user can perform gesture control during the singing process, such as sliding the finger up and down, swinging the finger left and right, etc.
  • the camera unit 110 takes the gesture image and sends it to the processing unit 130.
  • The processing unit 130 recognizes the gesture image, for example through the image recognition unit 1301 running the convolutional neural network to obtain a gesture recognition result, then determines the gesture control instruction according to a preset correspondence between gestures and control instructions, and executes the instruction.
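The "preset correspondence between gestures and control instructions" can be modeled as a lookup table feeding a small command executor. The gesture labels and player state below are hypothetical; a real system would get the labels from the CNN classifier:

```python
# Hypothetical mapping from recognized gesture labels to control commands.
GESTURE_COMMANDS = {
    "swipe_left": "next_song",
    "slide_up": "volume_up",
    "slide_down": "volume_down",
    "palm_open": "pause",
    "fist": "resume",
}

class Player:
    """Minimal playback state acted on by control instructions."""

    def __init__(self):
        self.volume = 5
        self.paused = False
        self.track = 0

    def execute(self, command):
        if command == "volume_up":
            self.volume = min(self.volume + 1, 10)
        elif command == "volume_down":
            self.volume = max(self.volume - 1, 0)
        elif command == "pause":
            self.paused = True
        elif command == "resume":
            self.paused = False
        elif command == "next_song":
            self.track += 1

def on_gesture(player, gesture_label):
    """Dispatch a recognized gesture; unknown gestures are ignored."""
    command = GESTURE_COMMANDS.get(gesture_label)
    if command is not None:
        player.execute(command)

p = Player()
on_gesture(p, "slide_up")    # user slides a finger upward
on_gesture(p, "swipe_left")  # user swings a finger left
```

Keeping the mapping in a table makes it easy to reconfigure gestures without touching the recognition or execution code.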
  • The user can also input voice control instructions through the microphone 1501 of the audio unit 150; for example, the user speaks "cut song" into the microphone 1501. The processing unit 130 then recognizes the voice control instruction input by the user and, upon recognizing the "cut song" instruction, executes it.
  • a voice recognition unit 1303 may be provided in the processing unit 130, and a machine learning model of voice recognition is run to process the user's voice and output the recognition result.
  • A specific wake phrase can be set: when the user speaks the wake phrase, the voice control function is activated, and at other times the voice input by the user is played as singing.
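The wake-phrase behavior is essentially a two-state router: mic input is treated as singing unless the wake phrase was just heard, in which case the next utterance is parsed as a command. The wake phrase and command set below are invented for illustration:

```python
class VoiceRouter:
    """Routes mic input either to playback (singing) or to the command parser."""

    WAKE_PHRASE = "hey glasses"  # hypothetical wake phrase
    COMMANDS = {"cut song", "pause", "resume", "volume up", "volume down"}

    def __init__(self):
        self.awaiting_command = False

    def route(self, utterance):
        text = utterance.strip().lower()
        if text == self.WAKE_PHRASE:
            # Wake phrase heard: the next utterance is a command.
            self.awaiting_command = True
            return ("wake", None)
        if self.awaiting_command and text in self.COMMANDS:
            self.awaiting_command = False
            return ("command", text)
        # Anything else is singing and goes straight to the speakers.
        self.awaiting_command = False
        return ("sing", utterance)

router = VoiceRouter()
```

In practice the utterances would come from a speech-recognition model rather than plain strings, but the routing state machine stays the same.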
  • Whether through gesture control instructions or voice control instructions, operations such as switching songs, ordering songs, pausing playback, continuing playback, increasing the volume, and decreasing the volume can be implemented, which are not limited in the present disclosure.
  • The augmented reality glasses 100 may also include a communication unit 160, which can provide wireless communication solutions such as wireless local area networks (WLAN) (for example, wireless fidelity (Wi-Fi) networks), Bluetooth (BT), Global Navigation Satellite System (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR).
  • the communication unit 160 can be used to obtain the target video and target audio.
  • For example, the communication unit 160 connects to the Internet, and when the user orders a song, the corresponding video and audio are searched for and downloaded from the Internet; or the communication unit 160 connects to Wi-Fi, the song library is automatically updated, and the video and audio are stored in the storage unit 120 in advance, so that when the user sings, the video and audio files can be read directly from the storage unit 120.
  • The communication unit 160 can be used to access a wireless network; if other devices also access the wireless network, the augmented reality glasses 100 can play the target audio synchronously with them. When any device among the augmented reality glasses 100 and the other devices receives voice input by a user, the voice can be synchronized to all devices in the wireless network, realizing simultaneous KTV for multiple people. Further, if the other devices connected to the wireless network are also augmented reality glasses, each pair of glasses can also display the virtual image synchronously, giving users a strong sense of audio-visual interaction.
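The multi-device behavior described above boils down to a shared session that rebroadcasts any device's mic input to every device. A minimal in-memory sketch (class and method names are illustrative; a real system would use network transport and buffering):

```python
class KtvSession:
    """In-memory stand-in for devices sharing a wireless-LAN KTV session."""

    def __init__(self):
        self.devices = []

    def join(self, device):
        self.devices.append(device)
        device.session = self

    def broadcast_audio(self, source, chunk):
        # A voice chunk received on any device is pushed to every device,
        # including its source, so all users hear the same mix.
        for d in self.devices:
            d.play_queue.append((source.name, chunk))

class Device:
    """One pair of glasses (or other device) in the session."""

    def __init__(self, name):
        self.name = name
        self.session = None
        self.play_queue = []

    def on_mic_input(self, chunk):
        self.session.broadcast_audio(self, chunk)

session = KtvSession()
a, b = Device("glasses-A"), Device("glasses-B")
session.join(a)
session.join(b)
a.on_mic_input("verse-1")
```

Over a real network the same pattern would run on top of multicast or a relay server, with each device mixing the received chunks into its local playback.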
  • the augmented reality glasses 100 may further include a sensor unit 170, which is composed of different types of sensors and is used to implement different functions.
  • For example, the 6DOF (Degree Of Freedom) sensor 1701 can detect the posture information of the augmented reality glasses 100; the pressure sensor 1702 senses pressure signals and can convert them into electrical signals; the fingerprint sensor 1703 detects the user's fingerprint data to implement user identity verification and other functions; the air pressure sensor 1704 measures air pressure, from which the altitude can be calculated to assist positioning and navigation; and so on.
  • The augmented reality glasses 100 may further include a USB (Universal Serial Bus) interface 180 that complies with the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like.
  • the USB interface 180 can be used to connect a charger to charge the augmented reality glasses 100, can also connect to a headset, play audio through the headset, and can also be used to connect to other electronic devices, such as a computer, a peripheral device, and the like.
  • the augmented reality glasses 100 may further include a charging management unit 190, configured to receive a charging input from the charger to charge the battery 1901.
  • the charger can be a wireless charger or a wired charger.
  • the charging management unit 190 may receive the charging input of the wired charger through the USB interface 180.
  • the charging management unit 190 may receive a wireless charging input through the wireless charging coil of the augmented reality glasses 100. While charging the battery 1901, the charging management unit 190 can also supply power to the device.
  • Exemplary embodiments of the present disclosure also provide a KTV implementation method based on augmented reality glasses.
  • the method may include the following steps S310 to S340:
  • Step S310 acquiring the current scene image
  • step S320 a virtual image is rendered in the scene image according to the target video
  • step S330 the virtual image and the target audio are played synchronously;
  • Step S340 When the input audio is received, the audio is played.
  • the current scene image can be captured by the camera unit of the augmented reality glasses.
  • a time stamp can be added to the virtual image so that the time stamp corresponds to the frame time stamp of the video.
  • the target audio and target video are the audio and video of the same song.
  • the target audio itself also has a timestamp, so the virtual image and the target audio can be played synchronously. The two start playing at the same time and keep the timestamps synchronized to ensure that the virtual image seen by the user is synchronized with the music heard.
  • the augmented reality glasses can receive the input audio and play the audio, so that the user's singing voice is superimposed with the target audio to achieve the effect of K song.
  • The rendering of the virtual image and its playback can be a simultaneous process, that is, rendering while playing.
  • Playing the target audio and playing the user's voice are two separate processes.
  • the target audio needs to be played according to the timestamp, while the voice is generally played instantly, that is, it is processed and played immediately after being received.
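The timestamp-driven playback can be sketched as deriving the video frame index from the audio clock, with a small drift-correction rule; the frame rate and tolerance below are illustrative assumptions:

```python
def frame_for_audio_clock(audio_ms, fps=30):
    """Index of the video frame whose timestamp matches the audio clock."""
    return int(audio_ms * fps // 1000)

def drift_correction(video_frame, audio_ms, fps=30, tolerance=1):
    """Resync the video when it drifts more than `tolerance` frames.

    Small drift is tolerated to avoid visible stutter; larger drift
    snaps the video back to the audio clock.
    """
    target = frame_for_audio_clock(audio_ms, fps)
    if abs(video_frame - target) > tolerance:
        return target
    return video_frame
```

The user's voice bypasses this clock entirely: it is processed and played as soon as it is received, while only the target audio and virtual image are slaved to the shared timestamp.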
  • step S320 may include the following steps S401 and S402:
  • Step S401 identifying a plane in the scene image
  • step S402 the virtual display screen is rendered on the above-mentioned plane, and the target video is displayed in the virtual display screen.
  • a virtual display screen can be rendered on the wall plane to simulate the display screen in the KTV room.
  • The present disclosure does not limit its style, size, etc. The target video is then displayed in the virtual display screen, making the user feel as if they were in a KTV room, with a strong sense of immersion.
  • plane recognition can be performed based on the color information, shape information, and texture information in the scene image, for example, a connected area with the same color and texture is recognized as a plane.
  • the scene image is a depth image
  • more accurate plane recognition can be performed based on the depth information, for example, a connected area with consistent color, consistent texture, and continuously changing depth can be recognized as a plane.
  • step S320 may also include: rendering virtual lights in the scene image.
  • the lighting effect of a KTV room can be simulated, and the lighting style can be adjusted according to the style of the song.
  • the KTV implementation method may further include the following steps:
  • the gesture control instruction corresponding to the gesture image is recognized, and the gesture control instruction is executed.
  • the KTV implementation method may further include the following steps:
  • when a voice control instruction input by the user is recognized, the voice control instruction is executed.
  • gesture control instructions and voice control instructions may include any one or more of the following instructions: cut songs, order songs, pause playing, continue playing, increase volume, and decrease volume.
  • step S340 may include: receiving the voice input by the user, performing optimization processing on it, and playing it.
  • Optimization processing can include: noise removal, sound modification, volume adjustment, etc.
  • the target video and target audio may be searched on the Internet, or the target video and target audio may be searched in a local pre-stored file library.
  • When the augmented reality glasses are connected to a wireless network, they can play the target audio synchronously with other devices connected to the same network; when any of these devices receives voice input by a user, the voice can be synchronized to all devices in the wireless network to achieve simultaneous KTV for multiple people.
  • each augmented reality glasses can also display virtual images simultaneously.
  • The KTV function is realized through the augmented reality glasses, so that the user can wear them for singing entertainment. The implementation process is convenient, requires no assistance from other equipment, has a low cost, and occupies little space.
  • augmented reality glasses render and display virtual images based on real scene images, so that users are immersed in an audio-visual environment that combines virtual and real, with a strong sense of interaction and immersiveness, and a better user experience.
  • Exemplary embodiments of the present disclosure also provide a computer-readable storage medium, which can be implemented in the form of a program product, which includes program code.
  • When the program product runs on a terminal device, the program code causes the terminal device to perform the steps according to the various exemplary embodiments of the present disclosure described in the "Exemplary Method" section of this specification.
  • A program product 500 for implementing the above method according to an exemplary embodiment of the present disclosure is described. It may adopt a portable compact disk read-only memory (CD-ROM), include program code, and run on a terminal device, such as a personal computer.
  • the program product of the present disclosure is not limited thereto.
  • the readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, device, or device.
  • the program product can adopt any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • The readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the readable medium can be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the foregoing.
  • the program code for performing the operations of the present disclosure can be written in any combination of one or more programming languages.
  • the programming languages include object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computing device, partly on the user's device, as an independent software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
  • the remote computing device can be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (for example, via the Internet using an Internet service provider).
  • the example embodiments described here can be implemented by software, or by software combined with the necessary hardware. The technical solution according to the embodiments of the present disclosure can therefore be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to make a computing device (which may be a personal computer, a server, a terminal device, a network device, etc.) execute the method according to the exemplary embodiment of the present disclosure.
  • although modules or units of the device for action execution are mentioned in the above detailed description, this division is not mandatory.
  • the features and functions of two or more modules or units described above may be embodied in one module or unit.
  • the features and functions of a module or unit described above can be further divided into multiple modules or units to be embodied.

Abstract

Augmented reality glasses, an augmented reality glasses-based KTV implementation method, and a storage medium. The augmented reality glasses comprise: an image capturing unit used for capturing a current scene image; a storage unit used for storing executable instructions; a processing unit used for executing the executable instructions so as to render a virtual image in the scene image according to a target video; a display unit used for displaying the virtual image; and an audio unit used for playing back target audio and for receiving and playing back other audio, wherein the target video and the target audio are the video and audio of the same song. The augmented reality glasses implement a KTV function; the implementation process is very convenient, the device costs are low, and the glasses can give users a strong sense of interaction and immersion.

Description

Augmented reality glasses, KTV implementation method based on augmented reality glasses, and medium
This application claims priority to Chinese patent application No. 202010057956.0, filed on January 16, 2020, and titled "Augmented Reality Glasses, KTV Implementation Method Based on Augmented Reality Glasses, and Medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of virtual reality and augmented reality, and in particular to augmented reality glasses, a KTV implementation method based on augmented reality glasses, and a computer-readable storage medium.
Background
KTV venues provide people with karaoke audio-visual equipment and a space for singing, and going to a KTV to sing karaoke has become a popular form of leisure and entertainment.
Current KTV implementations are rather limited: they require people to travel to KTV venues, which is very inconvenient.
Summary of the Invention
The present disclosure provides augmented reality glasses, a KTV implementation method based on augmented reality glasses, and a computer-readable storage medium, thereby overcoming, at least to some extent, the problem that existing KTV implementations are limited.
According to a first aspect of the present disclosure, augmented reality glasses are provided, including: a camera unit configured to capture a current scene image; a storage unit configured to store executable instructions; a processing unit configured to execute the executable instructions so as to render a virtual image in the scene image according to a target video; a display unit configured to display the virtual image; and an audio unit configured to play target audio and to receive and play other audio; wherein the target video and the target audio are the video and audio of the same song.
According to a second aspect of the present disclosure, a KTV implementation method based on augmented reality glasses is provided, including: acquiring a current scene image; rendering a virtual image in the scene image according to a target video; synchronously playing the virtual image and target audio, the target video and the target audio being the video and audio of the same song; and, when audio input by a user is received, playing the input audio.
According to a third aspect of the present disclosure, a computer-readable storage medium is provided on which a computer program is stored; when executed by a processor, the computer program implements the above KTV implementation method based on augmented reality glasses.
The technical solution of the present disclosure has the following beneficial effects:
According to the augmented reality glasses, the KTV implementation method based on augmented reality glasses, and the computer-readable storage medium described above: on the one hand, the KTV function is realized through augmented reality glasses, so that a user can wear the glasses to sing for entertainment; the realization process is very convenient and requires no auxiliary equipment, so the implementation cost is low and little space is occupied. On the other hand, the augmented reality glasses render and display virtual images based on real scene images, immersing the user in an audio-visual environment that combines the virtual and the real, with a strong sense of interaction and immersion and a better user experience.
Brief Description of the Drawings
FIG. 1 shows a schematic diagram of the construction of augmented reality glasses in this exemplary embodiment;
FIG. 2 shows a structural diagram of augmented reality glasses in this exemplary embodiment;
FIG. 3 shows a flowchart of a KTV implementation method based on augmented reality glasses in this exemplary embodiment;
FIG. 4 shows a flowchart of rendering a virtual image in this exemplary embodiment;
FIG. 5 shows a schematic diagram of a computer-readable storage medium in this exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the present disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in one or more embodiments in any suitable manner. In the following description, many specific details are provided to give a thorough understanding of the embodiments of the present disclosure. However, those skilled in the art will recognize that the technical solutions of the present disclosure can be practiced while omitting one or more of the specific details, or that other methods, components, devices, steps, and so on may be employed. In other cases, well-known technical solutions are not shown or described in detail, to avoid obscuring aspects of the present disclosure.
In addition, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so repeated descriptions of them are omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
With the development of electronic technology and the Internet, KTV mobile cabinets and home KTV systems have appeared to make it easier for people to sing karaoke outside of KTV stores, and karaoke apps have also appeared on mobile phones and other terminal platforms, enriching the ways people sing. However, KTV mobile cabinets and home KTV systems are costly and bulky, making them inconvenient to use; singing with a karaoke app on a mobile phone or similar terminal offers poor interactivity and a poor audio-visual experience.
In view of the above problems, exemplary embodiments of the present disclosure provide augmented reality (AR) glasses. The augmented reality glasses 100 in FIG. 1 and FIG. 2 are taken as an example below to illustrate the internal unit structure. Those skilled in the art should understand that the augmented reality glasses 100 may include more or fewer components than shown, may combine certain components, may split certain components, or may arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware. The interface connections between the components are shown only schematically and do not limit the structure of the augmented reality glasses 100. In other embodiments, the augmented reality glasses 100 may also adopt interface connections different from those in FIG. 1, or a combination of multiple interface connection modes.
As shown in FIG. 1, the augmented reality glasses 100 may specifically include a camera unit 110, a storage unit 120, a processing unit 130, a display unit 140, and an audio unit 150.
The camera unit 110 may be composed of components such as a lens and a photosensitive element. As shown in FIG. 2, it may be located between the two lenses; when the user wears the augmented reality glasses 100, the camera unit 110 faces directly ahead of the user and can capture still images or video of the scene in front.
In this exemplary embodiment, when the KTV function of the augmented reality glasses 100 is activated, the camera unit 110 captures the current scene image. For example, if the user is in a room, the camera unit 110 can capture the scene directly in front of the user, including objects in the room such as walls, the floor, tables, and chairs.
In an optional embodiment, as shown in FIG. 1, the camera unit 110 may include a depth camera 1101, for example a TOF (Time Of Flight) camera or a binocular camera, which can detect the depth information of each part or each object in the scene image (that is, its axial distance from the augmented reality glasses 100).
The storage unit 120 stores executable instructions, which may include, for example, operating system code and program code, and may also store data generated while programs run, such as image data captured by the camera unit 110 and user data within an app. As shown in FIG. 2, the storage unit 120 may be arranged in the frame body between the two lenses, or at other positions. The storage unit 120 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
The processing unit 130 may include a central processing unit (CPU), a graphics processing unit (GPU), an application processor (AP), a modem processor, an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), among others. The different processors may exist as independent units or may be integrated into one processing unit. As shown in FIG. 2, the processing unit 130 may be arranged in the frame body between the two lenses, or at other positions. The processing unit 130 may execute the executable instructions in the storage unit 120 to carry out the corresponding program commands.
In this exemplary embodiment, when the KTV function of the augmented reality glasses 100 is activated, the processing unit 130 may obtain the scene image from the camera unit 110 and, according to the target video, render a virtual image in the scene image. The target video is the video of the currently playing song, generally the song's MV (music video). The currently playing song may be chosen by the user; for example, when the augmented reality glasses 100 run a KTV app, a song selection interface is displayed, and the user can pick a song through hover, click, or similar operations. It may also be chosen automatically by the system, for example in shuffle mode or according to a preset playlist. After the currently playing song is determined, the processing unit 130 may obtain the corresponding target video and, according to that video, render a virtual image in the scene image. The virtual image may be a combination of the video and the real scene image, for example by embedding frames of the video into the scene image for display.
In an optional embodiment, the processing unit 130 may be configured to: identify a plane in the scene image; render a virtual display screen on the plane; and display the target video within the virtual display screen. For example, after a wall plane is identified in the scene image, a virtual display screen may be rendered on the wall plane to simulate the screen in a KTV room (the present disclosure does not limit its style, size, and so on); the MV of the currently playing song is then shown within the virtual display screen, making the user feel as if they were in a KTV room, with a strong sense of immersion.
It should be noted that when the scene image is a flat image, the processing unit 130 may perform plane recognition based on the color, shape, and texture information in the scene image, for example by recognizing a connected region with consistent color and consistent texture as a plane. When the camera unit 110 includes the depth camera 1101, the captured scene image can carry depth information, and the processing unit 130 can then perform more accurate plane recognition based on the depth information, for example by recognizing a connected region with consistent color, consistent texture, and continuously varying depth as a plane.
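The color-consistency and depth-continuity cues above can be expressed as a minimal sketch (the function name, thresholds, and the per-scanline simplification are illustrative assumptions, not part of the disclosure): on a planar surface, depth sampled along an image row changes at a constant rate, so consecutive depth steps are nearly equal.

```python
def is_plane_scanline(colors, depths, color_tol=10.0, depth_tol=0.02):
    """Heuristic plane test for one scanline of a candidate region, following
    the two cues described above: near-uniform color, and depth that changes
    at a constant rate (a plane seen by a depth camera yields a linear depth
    profile along a row). Thresholds are illustrative assumptions."""
    if len(colors) < 3:
        return False
    mean_color = sum(colors) / len(colors)
    if any(abs(c - mean_color) > color_tol for c in colors):
        return False  # color not uniform enough for a single surface
    steps = [depths[i + 1] - depths[i] for i in range(len(depths) - 1)]
    # On a plane, consecutive depth steps are (nearly) equal; a jump
    # indicates an edge between two surfaces.
    return all(abs(s - steps[0]) <= depth_tol for s in steps)
```

A full implementation would apply such a test over connected 2D regions rather than single rows, but the per-row version captures the cue the paragraph describes.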
Further, an image recognition unit 1301 may be provided in the processing unit 130 to identify planes in the scene image. For example, the image recognition unit 1301 may run a convolutional neural network (CNN) to process the scene image and output a plane recognition result.
An image rendering unit 1302 may also be provided in the processing unit 130, dedicated to rendering virtual images.
In an optional embodiment, besides the virtually played video, other virtual elements may also be rendered, including but not limited to: rendering virtual lights in the scene image to simulate the lighting effects of a KTV room, switchable according to the style of the song; rendering a virtual microphone or a singing stage; and rendering an animation of virtual characters dancing in front of the user.
The display unit 140 can display images, video, and the like. As shown in FIG. 2, the display unit 140 is generally in the form of lenses: the user sees the real scene through the lenses, while the processing unit 130 transmits the virtual image to the display unit 140 for display, so that the user sees the real and virtual images superimposed. The display unit 140 therefore needs a see-through capability, showing both the real external world and virtual information, to achieve the fusion and "augmentation" of reality and virtuality. In an optional embodiment, as shown in FIG. 1, the display unit 140 may include a micro display 1401 and a lens 1402. The micro display 1401 provides the display content and may be a self-luminous active device, such as a light-emitting diode panel, or a liquid crystal display illuminated by an external light source; the lens 1402 lets the human eye see the real scene, so that the real scene and the virtual image are superimposed.
In this exemplary embodiment, after the processing unit 130 renders the virtual image, it transmits it to the display unit 140 for display, so that through the augmented reality glasses 100 the user simultaneously sees the real scene ahead and the virtual image.
The audio unit 150 can convert a digital audio signal into an analog audio signal for output, convert an analog audio input into a digital audio signal, and also encode and decode audio signals. In some embodiments, the audio unit 150 may be arranged in the processing unit 130, or some functional modules of the audio unit 150 may be arranged in the processing unit 130.
In this exemplary embodiment, the audio unit 150 plays the target audio, and receives and plays other audio. The target video and target audio are the video and audio of the same song; other audio refers to externally input audio other than the target audio. While the display unit 140 plays the virtual image, the audio unit 150 plays the target audio synchronously, so that the user can start singing along with the picture and sound. The audio unit 150 may include a microphone 1501 for receiving externally input audio, such as the user's singing voice or the audio of a voice control instruction; the user's singing voice can be converted and played back synchronously. The audio unit 150 may also include earphones, for example bone conduction earphones 1502, which can achieve high-quality audio playback, such as Dolby sound effects.
The present disclosure does not limit the number and positions of the microphones 1501 and the bone conduction earphones 1502. As shown in FIG. 2, in an optional embodiment, the microphones 1501 may be located below the front ends of the temples, distributed as an array so as to better pick up the user's voice. The bone conduction earphones 1502 may be located at the middle and rear parts of the temples, close to the ears, which is conducive to high-quality sound conduction.
In an optional embodiment, when the user sings, the voice input by the user is received through the microphone 1501; the processing unit 130 may optimize the voice and control the audio unit 150 to play the optimized voice. The optimization may include any one or more of the following: noise removal (such as removing ambient noise), voice embellishment (such as suitably beautifying the pitch, vibrato, and so on), and volume adjustment (adaptively adjusting the volume of the voice according to the playback volume of the audio, to prevent the voice from being too loud or too quiet). The user thus hears an optimized singing voice, for a better experience.
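The volume-adjustment step above can be sketched as scaling the captured voice so that its level sits just above the backing track's level. This is a minimal sketch; the function name, the RMS-based criterion, and the headroom factor are assumptions, not taken from the disclosure.

```python
import math

def match_volume(voice_samples, backing_rms, headroom=1.2):
    """Adaptive volume adjustment as described above (hypothetical sketch):
    scale the captured voice so its RMS sits slightly above the backing
    track's RMS, preventing the voice from being too quiet or too loud."""
    rms = math.sqrt(sum(s * s for s in voice_samples) / len(voice_samples))
    if rms == 0:
        return list(voice_samples)  # silence: nothing to scale
    gain = backing_rms * headroom / rms
    return [s * gain for s in voice_samples]
```

In practice the gain would be smoothed over time to avoid audible pumping, but a per-buffer RMS match illustrates the adaptive behavior.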
While singing, the user may need to perform song-related controls, such as switching songs, requesting songs, pausing, resuming, or turning the volume up or down. To facilitate user operation, two ways of implementing user control are provided below:
(1) The camera unit 110 can also capture an image of the user's gesture, which may be a single static frame or a dynamic sequence of consecutive frames. The processing unit 130 can also recognize the gesture control instruction corresponding to the gesture image and execute it. The user can thus perform gesture control while singing, for example sliding a finger up and down or swinging a finger left and right. The camera unit 110 captures the gesture image and sends it to the processing unit 130, which recognizes it; for example, the image recognition unit 1301 may run a convolutional neural network to obtain a gesture recognition result, after which the gesture control instruction is determined according to a preset correspondence between gestures and control instructions, and that instruction is executed.
(2) The user can also input a voice control instruction through the microphone 1501 of the audio unit 150, for example by saying "switch song" into the microphone 1501. The processing unit 130 then recognizes the voice control instruction input by the user and, upon recognizing the "switch song" instruction, for instance, executes it. In an optional embodiment, a speech recognition unit 1303 may be provided in the processing unit 130, which runs a machine learning model for speech recognition to process the user's speech and output a recognition result. Generally, a specific wake phrase can be set: when the user speaks the wake phrase, the voice control function is awakened; at other times, the voice input by the user is treated as singing and played back.
Whether through gesture control instructions or voice control instructions, operations such as switching songs, requesting songs, pausing, resuming, and turning the volume up or down can all be implemented; the present disclosure does not limit this.
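The two control paths above (a preset gesture-to-command table, and a wake-phrase-gated voice channel) can be sketched together. The wake phrase, gesture labels, and command names below are illustrative assumptions, not taken from the disclosure.

```python
WAKE_PHRASE = "hey glasses"  # illustrative wake phrase, not from the source

# Hypothetical preset correspondence between gestures and control
# instructions, mirroring mechanism (1) above.
GESTURE_COMMANDS = {
    "swipe_up": "volume_up",
    "swipe_down": "volume_down",
    "swipe_left": "previous_song",
    "swipe_right": "next_song",
}

class ControlRouter:
    """Route recognized gestures and microphone input to playback commands,
    following the two mechanisms described above."""

    def __init__(self):
        self.awaiting_command = False

    def on_gesture(self, gesture_label):
        # (1) Look up the preset gesture table; unknown gestures are ignored.
        return GESTURE_COMMANDS.get(gesture_label)

    def on_voice(self, utterance):
        # (2) After the wake phrase, the next utterance is treated as a
        # control command; otherwise the input is singing and is played back.
        text = utterance.strip().lower()
        if self.awaiting_command:
            self.awaiting_command = False
            return ("command", text)
        if text == WAKE_PHRASE:
            self.awaiting_command = True
            return ("wake", text)
        return ("sing", utterance)
```

The wake-phrase gate is what lets the same microphone serve both singing and control, as the paragraph above describes.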
As shown in FIG. 1, the augmented reality glasses 100 may further include a communication unit 160, which can provide wireless communication solutions including wireless local area networks (WLAN) (such as Wireless Fidelity (Wi-Fi) networks), Bluetooth (BT), Global Navigation Satellite System (GNSS), frequency modulation (FM), Near Field Communication (NFC), and infrared (IR), enabling the augmented reality glasses 100 to connect to the Internet or to other devices. The communication unit 160 may therefore be used to obtain the target video and target audio. For example, the communication unit 160 connects to the Internet, and when the user requests a song, the corresponding video and audio are searched for and downloaded from the Internet; alternatively, the communication unit 160 automatically updates the song library when connected to Wi-Fi, storing the video and audio in advance in the storage unit 120, so that when the user sings, the video and audio files can be read directly from the storage unit 120.
In an optional embodiment, the communication unit 160 can be used to access a wireless network; if other devices also access that wireless network, the augmented reality glasses 100 can play the target audio in synchronization with those devices. When any of the augmented reality glasses 100 and the other devices receives voice input by a user, it can be synchronized to all devices on the wireless network, realizing simultaneous KTV for multiple people. Further, if the other devices on the wireless network are also augmented reality glasses, each pair of glasses can additionally display the virtual image synchronously, giving the users a strong audio-visual sense of interaction.
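The multi-device behavior above can be sketched as a shared session object that forwards any received voice to every joined device. This is a minimal sketch assuming an abstract `play` interface on each device; the class and method names are illustrative, and real devices would communicate over the wireless network rather than direct calls.

```python
class KtvSession:
    """Minimal sketch of the shared-session behavior described above: every
    device on the same wireless network plays the target audio in step, and
    voice received by any one device is forwarded to all of them."""

    def __init__(self):
        self.devices = []

    def join(self, device):
        self.devices.append(device)

    def broadcast_voice(self, voice_chunk):
        # Voice captured on any one device is synchronized to all devices
        # in the session, including the one that captured it.
        for device in self.devices:
            device.play(voice_chunk)
```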
在一种可选的实施方式中,增强现实眼镜100还可以包括传感器单元170,其由不同类型的传感器组成,用于实现不同的功能。例如,6DOF(Degree Of Freedom,自由度)传感器1701可以检测增强现实眼镜100的姿态信息;压力传感器1702用于感受压力信号,可以将压力信号转换成电信号;指纹传感器1703用于检测用户的指纹数据,以实现用户身份验证等功能;气压传感器1704用于测量气压,可以实现海拔高度的计算,辅助进行定位和导航;等等。In an optional implementation manner, the augmented reality glasses 100 may further include a sensor unit 170, which is composed of different types of sensors and is used to implement different functions. For example, the 6DOF (Degree Of Freedom) sensor 1701 can detect the posture information of the augmented reality glasses 100; the pressure sensor 1702 is used to sense pressure signals and can convert the pressure signals into electrical signals; the fingerprint sensor 1703 is used to detect the user's fingerprints Data to achieve user identity verification and other functions; the air pressure sensor 1704 is used to measure air pressure, which can calculate the altitude and assist in positioning and navigation; and so on.
In an optional embodiment, the augmented reality glasses 100 may further include a USB (Universal Serial Bus) interface 180 that complies with the USB standard specification; it may specifically be a Mini USB, Micro USB, or USB Type-C interface. The USB interface 180 can be used to connect a charger to charge the augmented reality glasses 100, to connect a headset and play audio through it, or to connect other electronic devices such as computers and peripherals.
In an optional embodiment, the augmented reality glasses 100 may further include a charging management unit 190 configured to receive charging input from a charger to charge the battery 1901. The charger may be wireless or wired. In some wired charging implementations, the charging management unit 190 may receive the charging input of a wired charger through the USB interface 180. In some wireless charging implementations, the charging management unit 190 may receive wireless charging input through a wireless charging coil of the augmented reality glasses 100. While charging the battery 1901, the charging management unit 190 can also supply power to the device.
Exemplary embodiments of the present disclosure further provide a KTV implementation method based on augmented reality glasses. Referring to FIG. 3, the method may include the following steps S310 to S340:
Step S310: acquire the current scene image;
Step S320: render a virtual image in the scene image according to a target video;
Step S330: play the virtual image and a target audio synchronously;
Step S340: when input audio is received, play the input audio.
The current scene image may be captured by the camera unit of the augmented reality glasses. When rendering the virtual image according to the target video of the currently playing song, timestamps can be added to the virtual image so that they correspond to the frame timestamps of the video. The target audio and the target video are the audio and video of the same song. The target audio itself also carries timestamps, so the virtual image and the target audio can be played synchronously: both start playing at the same moment and keep their timestamps aligned, ensuring that the virtual image the user sees is synchronized with the music the user hears. When the user sings, the augmented reality glasses can receive the input audio and play it back, superimposing the user's singing voice on the target audio to achieve a karaoke effect.
It should be noted that rendering the virtual image and playing the virtual image can proceed simultaneously, i.e., the image is played while being rendered. Playing the target audio and playing the user's voice are two independent processes: the target audio must be played according to its timestamps, whereas the voice is generally played back immediately, i.e., processed and played as soon as it is received.
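The timestamp alignment described above can be sketched as one shared playback clock from which both the video frame index and the audio position are derived, while user voice bypasses this clock and plays immediately; the class and method names are illustrative assumptions, not terms from the disclosure:

```python
# A minimal sketch of synchronized playback: the virtual image frames and
# the target audio share a single reference instant, so their positions
# stay aligned by construction. The frame rate is an assumed parameter.
import time

class SyncedPlayback:
    def __init__(self, frame_rate: float = 30.0):
        self.frame_rate = frame_rate
        self.start_time = None

    def start(self, now: float = None):
        # Both streams start from the same reference instant.
        self.start_time = time.monotonic() if now is None else now

    def elapsed(self, now: float) -> float:
        return now - self.start_time

    def video_frame_index(self, now: float) -> int:
        # Frame whose timestamp matches the shared clock position.
        return int(self.elapsed(now) * self.frame_rate)

    def audio_position_s(self, now: float) -> float:
        # Audio is addressed by the same elapsed time, keeping A/V in step.
        return self.elapsed(now)

playback = SyncedPlayback(frame_rate=30.0)
playback.start(now=100.0)
print(playback.video_frame_index(now=101.5))  # 45 frames at 30 fps
print(playback.audio_position_s(now=101.5))   # 1.5 s into the audio
```

User voice would simply be routed to the audio output on receipt, without consulting this clock.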
In an optional embodiment, referring to FIG. 4, step S320 may include the following steps S401 and S402:
Step S401: identify a plane in the scene image;
Step S402: render a virtual display screen on the plane, and display the target video within the virtual display screen.
For example, if a wall plane is recognized in the scene image, a virtual display screen can be rendered on the wall plane to simulate the display screen of a KTV room; the present disclosure does not limit its style, size, etc. The target video is then displayed within the virtual display screen, making the user feel as if inside a KTV room, with a strong sense of immersion.
When the scene image is a planar image, plane recognition can be performed based on the color, shape, and texture information in the scene image; for example, a connected region with consistent color and consistent texture can be recognized as one plane. When the scene image is a depth image, more accurate plane recognition can be performed using the depth information; for example, a connected region with consistent color, consistent texture, and continuously varying depth can be recognized as one plane.
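The planar-image case above can be sketched as connected-region growing over quantized color labels; treating each pixel as already carrying a color label is a simplifying assumption made here for illustration, and real systems would additionally compare texture statistics:

```python
# Toy plane-candidate detection: connected regions whose color label is
# uniform are treated as plane candidates, and the largest one would host
# the virtual display screen. Uses 4-connected breadth-first flood fill.
from collections import deque

def largest_uniform_region(labels):
    """Return the pixel count of the largest 4-connected equal-label region."""
    h, w = len(labels), len(labels[0])
    seen = [[False] * w for _ in range(h)]
    best = 0
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx]:
                continue
            color, size = labels[sy][sx], 0
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w \
                            and not seen[ny][nx] and labels[ny][nx] == color:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            best = max(best, size)
    return best

scene = [
    ["wall", "wall", "wall", "door"],
    ["wall", "wall", "wall", "door"],
    ["floor", "floor", "wall", "door"],
]
print(largest_uniform_region(scene))  # the wall region: 7 pixels
```

For the depth-image case, the neighbor test would additionally require the depth difference between adjacent pixels to stay below a continuity threshold.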
Further, step S320 may also include rendering virtual lighting in the scene image. For example, the lighting effects of a KTV room can be simulated, and the lighting style can be adjusted to match the style of the song. In addition, a virtual microphone or singing stage may be rendered, or an animation of a virtual character dancing in front of the user; the present disclosure does not limit this.
In an optional embodiment, the KTV implementation method may further include the following steps:
acquiring a gesture image of the user;
recognizing the gesture control instruction corresponding to the gesture image, and executing the gesture control instruction.
In an optional embodiment, the KTV implementation method may further include the following step:
when the voice input by the user is recognized as a voice control instruction, executing the voice control instruction.
The above gesture control instructions and voice control instructions may each include any one or more of the following: skip song, request song, pause playback, resume playback, increase volume, decrease volume.
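Since gesture recognition and voice recognition share the same instruction set, both can feed a single dispatch table; the command names and player state below are illustrative assumptions, as the disclosure only enumerates the instruction types:

```python
# A sketch of unified control dispatch: gesture- and voice-derived commands
# map to the same handler table, so the execution path is identical
# regardless of input modality. State fields are assumed for illustration.

class KtvPlayer:
    def __init__(self):
        self.volume = 50
        self.paused = False
        self.song_index = 0

    def execute(self, command: str):
        handlers = {
            "next_song": lambda: setattr(self, "song_index", self.song_index + 1),
            "pause": lambda: setattr(self, "paused", True),
            "resume": lambda: setattr(self, "paused", False),
            "volume_up": lambda: setattr(self, "volume", min(100, self.volume + 10)),
            "volume_down": lambda: setattr(self, "volume", max(0, self.volume - 10)),
        }
        if command not in handlers:
            raise ValueError(f"unknown control instruction: {command}")
        handlers[command]()

player = KtvPlayer()
for cmd in ("volume_up", "pause", "next_song"):
    player.execute(cmd)  # same path for gesture- or voice-derived commands
print(player.volume, player.paused, player.song_index)  # 60 True 1
```

Clamping the volume to [0, 100] keeps repeated volume commands safe.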
In an optional embodiment, step S340 may include: receiving the voice input by the user, optimizing it, and then playing it. The optimization processing may include noise removal, sound modification, volume adjustment, and so on.
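A minimal sketch of such optimization, covering only noise removal (as a simple amplitude gate) and volume adjustment (as peak normalization); the thresholds and the float sample format are assumptions, and a production pipeline would use spectral noise suppression instead:

```python
# Toy voice optimization: a noise gate zeroes samples below a threshold
# ("noise removal") and a gain rescales the peak to a target level
# ("volume adjustment"). Samples are floats in [-1.0, 1.0].

def optimize_voice(samples, noise_floor=0.02, target_peak=0.8):
    """Gate low-level noise, then normalize peak amplitude to target_peak."""
    gated = [s if abs(s) >= noise_floor else 0.0 for s in samples]
    peak = max((abs(s) for s in gated), default=0.0)
    if peak == 0.0:
        return gated  # silence stays silence; avoid division by zero
    gain = target_peak / peak
    return [s * gain for s in gated]

voice = [0.01, 0.4, -0.2, 0.005, -0.4]
print(optimize_voice(voice))  # [0.0, 0.8, -0.4, 0.0, -0.8]
```

Sound modification (e.g. reverb or pitch correction) would be a further stage between the gate and the gain.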
In an optional embodiment, the target video and the target audio may be searched for on the Internet, or looked up in a locally pre-stored file library.
In an optional embodiment, when the augmented reality glasses access a wireless network, they can play the target audio synchronously with other devices connected to the same network; when any of these devices receives voice input from a user, the voice can be synchronized to all devices in the wireless network, enabling multi-user KTV. Further, if the other devices on the wireless network are also augmented reality glasses, each pair of glasses can also display the virtual images synchronously.
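One simple way such a synchronous start could be coordinated is for a host device to announce a shared start instant slightly in the future, with each device waiting until its own clock reaches it; clock synchronization across devices (e.g. via NTP) is assumed here and not shown, and the function names are illustrative:

```python
# Sketch of a shared-start protocol: the host picks a playback instant with
# a small lead so the announcement reaches every device in time; each
# device then sleeps for the remainder measured on its own clock.

START_LEAD_S = 0.5  # margin so the message reaches everyone before playback

def schedule_start(host_clock_s: float, lead_s: float = START_LEAD_S) -> float:
    """Host side: choose the shared playback start instant."""
    return host_clock_s + lead_s

def wait_remaining(device_clock_s: float, start_at_s: float) -> float:
    """Device side: how long this device should still wait before playing."""
    return max(0.0, start_at_s - device_clock_s)

start_at = schedule_start(host_clock_s=120.0)
# Two devices whose (synchronized) clocks read slightly differently at the
# moment each processes the announcement:
print(round(wait_remaining(120.1, start_at), 3))  # 0.4
print(round(wait_remaining(120.3, start_at), 3))  # 0.2
```

Voice forwarded between devices would skip this scheduling and be mixed in as soon as it arrives, matching the immediate-playback behavior described earlier.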
In summary, based on the above augmented reality glasses and KTV implementation method, this exemplary embodiment, on the one hand, realizes a KTV function through augmented reality glasses, so that users can sing and entertain themselves simply by wearing the glasses; the implementation is very convenient and requires no auxiliary equipment, so the cost is low and the space occupied is small. On the other hand, the augmented reality glasses render and display virtual images based on real scene images, immersing the user in an audiovisual environment that blends the virtual and the real, with a strong sense of interaction and presence and a good user experience.
Exemplary embodiments of the present disclosure further provide a computer-readable storage medium, which may be implemented in the form of a program product including program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps according to the various exemplary embodiments of the present disclosure described in the "Exemplary Method" section of this specification above.
Referring to FIG. 5, a program product 500 for implementing the above method according to an exemplary embodiment of the present disclosure is described. It may take the form of a portable compact disc read-only memory (CD-ROM) containing program code, and may run on a terminal device such as a personal computer. However, the program product of the present disclosure is not limited thereto; in this document, a readable storage medium may be any tangible medium containing or storing a program that can be used by, or in combination with, an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device.
The program code contained on the readable medium may be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, and so on, or any suitable combination of the above.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. In cases involving a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
Through the description of the above embodiments, those skilled in the art will readily understand that the exemplary embodiments described here may be implemented in software, or in software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, USB flash drive, portable hard disk, etc.) or on a network, and which includes several instructions to cause a computing device (which may be a personal computer, a server, a terminal apparatus, a network device, etc.) to execute the method according to the exemplary embodiments of the present disclosure.
Those skilled in the art will understand that various aspects of the present disclosure may be implemented as a system, a method, or a program product. Therefore, various aspects of the present disclosure may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software, which may be collectively referred to herein as a "circuit", "module", or "system".
In addition, the above drawings are merely schematic illustrations of the processing included in the methods according to exemplary embodiments of the present disclosure and are not intended to be limiting. It is easy to understand that the processing shown in the drawings does not indicate or limit the chronological order of these processes. It is also easy to understand that these processes may be executed, for example, synchronously or asynchronously in multiple modules.
It should be noted that although several modules or units of a device for action execution are mentioned in the detailed description above, this division is not mandatory. In fact, according to exemplary embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit; conversely, the features and functions of one module or unit described above may be further divided among multiple modules or units.
Those skilled in the art will easily conceive of other embodiments of the present disclosure after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common general knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure indicated by the claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (20)

  1. Augmented reality glasses, characterized by comprising:
    a camera unit, configured to capture a current scene image;
    a storage unit, configured to store executable instructions;
    a processing unit, configured to execute the executable instructions so as to render a virtual image in the scene image according to a target video;
    a display unit, configured to display the virtual image;
    an audio unit, configured to play a target audio, and to receive and play other audio;
    wherein the target video and the target audio are the video and audio of the same song.
  2. The augmented reality glasses according to claim 1, wherein the camera unit is further configured to capture a gesture image of a user;
    the processing unit is further configured to recognize a gesture control instruction corresponding to the gesture image, and to execute the gesture control instruction.
  3. The augmented reality glasses according to claim 1, wherein the processing unit is further configured to recognize a voice control instruction input by a user, and to execute the voice control instruction.
  4. The augmented reality glasses according to claim 2 or 3, wherein the gesture control instruction or the voice control instruction comprises any one or more of the following:
    skipping a song, requesting a song, pausing playback, resuming playback, increasing the volume, decreasing the volume.
  5. The augmented reality glasses according to any one of claims 1 to 4, further comprising:
    a communication unit, configured to acquire the target video and the target audio.
  6. The augmented reality glasses according to claim 5, wherein the communication unit is further configured to access a wireless network, so that the augmented reality glasses and other devices connected to the wireless network play the target audio synchronously.
  7. The augmented reality glasses according to any one of claims 1 to 4, wherein the processing unit is further configured to optimize the voice input by the user, and to control the audio unit to play the optimized voice.
  8. The augmented reality glasses according to claim 7, wherein the optimization processing comprises any one or more of the following: noise removal, sound modification, volume adjustment.
  9. The augmented reality glasses according to any one of claims 1 to 4, wherein the audio unit comprises a bone conduction headset provided on a temple of the augmented reality glasses.
  10. The augmented reality glasses according to any one of claims 1 to 4, wherein the camera unit comprises a depth camera.
  11. A KTV implementation method based on augmented reality glasses, characterized by comprising:
    acquiring a current scene image;
    rendering a virtual image in the scene image according to a target video;
    playing the virtual image and a target audio synchronously, the target video and the target audio being the video and audio of the same song;
    when input audio is received, playing the input audio.
  12. The method according to claim 11, wherein the rendering a virtual image in the scene image according to a target video comprises:
    identifying a plane in the scene image;
    rendering a virtual display screen on the plane, and displaying the target video within the virtual display screen.
  13. The method according to claim 12, wherein the identifying a plane in the scene image comprises:
    when the scene image is a planar image, identifying a connected region with consistent color and consistent texture in the scene image as one said plane.
  14. The method according to claim 12, wherein the identifying a plane in the scene image comprises:
    when the scene image is a depth image, identifying a connected region with consistent color, consistent texture, and continuously varying depth in the scene image as one said plane.
  15. The method according to claim 12, wherein the rendering a virtual image in the scene image according to a target video further comprises:
    rendering virtual lighting in the scene image.
  16. The method according to claim 11, further comprising:
    acquiring a gesture image of a user;
    recognizing a gesture control instruction corresponding to the gesture image, and executing the gesture control instruction.
  17. The method according to claim 11, further comprising:
    when the voice input by a user is recognized as a voice control instruction, executing the voice control instruction.
  18. The method according to claim 16 or 17, wherein the gesture control instruction or the voice control instruction comprises any one or more of the following:
    skipping a song, requesting a song, pausing playback, resuming playback, increasing the volume, decreasing the volume.
  19. The method according to claim 11, wherein, when input audio is received, the playing the input audio comprises:
    receiving the voice input by a user, and playing the voice after optimization processing;
    the optimization processing comprising any one or more of the following:
    noise removal, sound modification, volume adjustment.
  20. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 11 to 19.
PCT/CN2021/070281 2020-01-16 2021-01-05 Augmented reality glasses, augmented reality glasses-based ktv implementation method and medium WO2021143574A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010057956.0A CN111273775A (en) 2020-01-16 2020-01-16 Augmented reality glasses, KTV implementation method based on augmented reality glasses and medium
CN202010057956.0 2020-01-16

Publications (1)

Publication Number Publication Date
WO2021143574A1 true WO2021143574A1 (en) 2021-07-22

Family

ID=70997467

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/070281 WO2021143574A1 (en) 2020-01-16 2021-01-05 Augmented reality glasses, augmented reality glasses-based ktv implementation method and medium

Country Status (2)

Country Link
CN (1) CN111273775A (en)
WO (1) WO2021143574A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111273775A (en) * 2020-01-16 2020-06-12 Oppo广东移动通信有限公司 Augmented reality glasses, KTV implementation method based on augmented reality glasses and medium
CN112288877A (en) * 2020-10-28 2021-01-29 北京字节跳动网络技术有限公司 Video playing method and device, electronic equipment and storage medium
CN112367426B (en) * 2020-11-09 2021-06-04 Oppo广东移动通信有限公司 Virtual object display method and device, storage medium and electronic equipment
CN113724398A (en) * 2021-09-01 2021-11-30 北京百度网讯科技有限公司 Augmented reality method, apparatus, device and storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
US20070122786A1 (en) * 2005-11-29 2007-05-31 Broadcom Corporation Video karaoke system
CN105163191A (en) * 2015-10-13 2015-12-16 腾叙然 System and method of applying VR device to KTV karaoke
CN207283723U (en) * 2017-09-28 2018-04-27 深圳晶恒数码科技有限公司 A kind of AR mixed realities intelligent control and the song-order machine device of viewing
CN109920065A (en) * 2019-03-18 2019-06-21 腾讯科技(深圳)有限公司 Methods of exhibiting, device, equipment and the storage medium of information
CN110362204A (en) * 2019-07-11 2019-10-22 Oppo广东移动通信有限公司 Information cuing method, device, storage medium and augmented reality equipment
CN111273775A (en) * 2020-01-16 2020-06-12 Oppo广东移动通信有限公司 Augmented reality glasses, KTV implementation method based on augmented reality glasses and medium

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
KR102419065B1 (en) * 2017-02-28 2022-07-07 매직 립, 인코포레이티드 Virtual and real object recording in mixed reality device
CN107635131B (en) * 2017-09-01 2020-05-19 北京雷石天地电子技术有限公司 Method and system for realizing virtual reality
CN207441228U (en) * 2017-11-28 2018-06-01 信利光电股份有限公司 The order programme and KTV of a kind of KTV
CN109841204A (en) * 2017-11-29 2019-06-04 四川动力图划文化传媒有限公司德阳分公司 A kind of K song system based on VR technology
US10712901B2 (en) * 2018-06-27 2020-07-14 Facebook Technologies, Llc Gesture-based content sharing in artificial reality environments
CN110018571A (en) * 2018-12-31 2019-07-16 浙江理工大学 The Portable virtual reality KTV helmet


Also Published As

Publication number Publication date
CN111273775A (en) 2020-06-12

Similar Documents

Publication Publication Date Title
WO2021143574A1 (en) Augmented reality glasses, augmented reality glasses-based ktv implementation method and medium
JP6961007B2 (en) Recording virtual and real objects in mixed reality devices
KR102197544B1 (en) Mixed reality system with spatialized audio
CN110121695B (en) Apparatus in a virtual reality domain and associated methods
US11800313B2 (en) Immersive audio platform
KR101614790B1 (en) Camera driven audio spatialization
US10798518B2 (en) Apparatus and associated methods
CN105163191A (en) System and method of applying VR device to KTV karaoke
JP2022188081A (en) Information processing apparatus, information processing system, and information processing method
KR20220148915A (en) Audio processing methods, apparatus, readable media and electronic devices
US11743645B2 (en) Method and device for sound processing for a synthesized reality setting
CN114422935B (en) Audio processing method, terminal and computer readable storage medium
CN113420177A (en) Audio data processing method and device, computer equipment and storage medium
US20230344973A1 (en) Variable audio for audio-visual content
US20220291743A1 (en) Proactive Actions Based on Audio and Body Movement
CN111696566B (en) Voice processing method, device and medium
CN116194792A (en) Connection evaluation system
CN115623156B (en) Audio processing method and related device
JP7397883B2 (en) Presentation of communication data based on environment
CN111696565B (en) Voice processing method, device and medium
CN111696564B (en) Voice processing method, device and medium
JP2023181567A (en) Information processing apparatus, information processing method, information processing system, and data generation method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21741122

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21741122

Country of ref document: EP

Kind code of ref document: A1