WO2019127759A1 - Voice and image acquisition and encoding method and device - Google Patents

Voice and image acquisition and encoding method and device

Info

Publication number
WO2019127759A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
scene
coordinate
area
voice
Prior art date
Application number
PCT/CN2018/073488
Other languages
English (en)
French (fr)
Inventor
徐奎
Original Assignee
武汉华星光电半导体显示技术有限公司
Application filed by 武汉华星光电半导体显示技术有限公司 (Wuhan China Star Optoelectronics Semiconductor Display Technology Co., Ltd.)
Publication of WO2019127759A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4305 Synchronising client clock from received content stream, e.g. locking decoder clock with encoder clock, extraction of the PCR packets
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43072 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering

Definitions

  • the present invention relates to the field of image processing technologies, and in particular, to a voice image acquisition and encoding method and apparatus.
  • a TV is an interactive audio-visual entertainment device that can not only display vivid images but also produce matching sound.
  • for the image and voice of TVs currently on the market, the acquisition and encoding process is: image source → image acquisition device → analog-to-digital conversion → image coding → digital image.
  • when a scene is photographed, the camera's CCD/CMOS photosensitive sensor (the video signal acquisition module, hereinafter referred to as the Sensor) is exposed to the scene.
  • because the intensity and color of the light differ across different areas of the scene, the signals generated by the corresponding areas of the Sensor also differ, so the color and brightness information of the photographed scene is recorded; at the same time, the positions of the objects within the scene are recorded as well.
  • the scene corresponds spatially to the Sensor: each object image corresponds to a number of Sensor Pixels (hereinafter referred to as Pixels), and since the Pixels are regularly arranged on the Sensor, the position of an object within the scene can be located from the Pixel coordinate information.
  • the process of voice acquisition and encoding for video is: voice source → voice acquisition device → analog-to-digital conversion → voice coding → digital voice; for playback the process is reversed, i.e. the digital voice undergoes DAC (digital-to-analog conversion) and is sent to a speaker.
  • Photographing with a camera alone simply captures images; recording with a microphone alone simply captures speech; when a camera and a microphone work together, voice plus image, i.e. video, is produced.
  • a conventional video signal contains an image signal, a voice signal, and a synchronization signal, but the image and voice information are encoded in a simple way and carry limited content, which cannot give the viewer a truly realistic sense of presence.
  • the technical problem to be solved by the present invention is to provide a voice and image acquisition and encoding method and device that, by changing the way the image and voice information are encoded, further enriches the content of the image and voice information and gives the viewer a more realistic sense of presence.
  • the present invention provides a voice and image acquisition and encoding method comprising the following steps:
  • Step 1: the image in the video signal collected by the video signal acquisition module is formed by an m-row by n-column matrix of pixels, and each image pixel is located by its pixel coordinate values;
  • Step 2: when a picture is taken, image-area coordinate values are used to represent a specific region of the image, and the coordinates of the objects in the image are determined from the differences among the pixel sensing signals collected by the video signal acquisition module together with the pixel coordinate values and the image-area coordinate values;
  • Step 3: the coordinate information of the voice in the video signal is matched with the coordinates of the objects in the image.
  • the image-area coordinate values in Step 2 are expressed as (x1, y1; x2, y2), where (x1, y1; x2, y2) denotes the image region covered by rows x1 to x2 and columns y1 to y2.
  • within the image region, the coordinate region of scene object 1 is expressed as (x11, y11; x12, y12), the coordinate region of scene object 2 as (x21, y21; x22, y22), ..., and the coordinate region of scene object N as (xN1, yN1; xN2, yN2);
  • the coordinate of voice 1 is a specific position (x1, y1) within the coordinate region of scene object 1, the (x1, y1) position depending on the sounding area of scene object 1; the coordinate of voice 2 is a specific position (x2, y2) within the coordinate region of scene object 2, the (x2, y2) position depending on the sounding area of scene object 2; and the coordinate of voice N is a specific position (xN, yN) within the coordinate region of scene object N, the (xN, yN) position depending on the sounding area of scene object N.
  • the video signal collected by the video signal acquisition module includes an image signal, a voice signal, a synchronization signal, and a coordinate signal.
  • the coordinate signal may exist as a separate class of signal independent of the image signal, the voice signal, and the synchronization signal, or may be encoded into any one of the image signal, the voice signal, or the synchronization signal.
  • a voice and image acquisition and encoding device comprises a video signal acquisition module, an image scene-region coordinate value generation module, an image scene-region audio coordinate value judgment module, a module for matching image scene-region coordinates with scene-region audio coordinates, and a processed video signal playback module.
  • the output of the video signal acquisition module is connected to the image scene-region coordinate value generation module and the image scene-region audio coordinate value judgment module; the outputs of these two modules are connected to the module for matching scene-region coordinates with scene-region audio coordinates; and that matching module is in turn connected to the processed video signal playback module.
  • the video signal acquisition module includes an image acquisition sensor and a sound collection sensor.
  • the coordinate region of scene object 1 is generated by the image scene-region coordinate value generation module as (x11, y11; x12, y12), the coordinate region of scene object 2 as (x21, y21; x22, y22), ..., and the coordinate region of scene object N as (xN1, yN1; xN2, yN2);
  • the coordinate of voice 1 is a specific position (x1, y1) within the coordinate region of scene object 1, determined by the sounding area of scene object 1 as judged by the image scene-region audio coordinate value judgment module; the coordinate of voice 2 is a specific position (x2, y2) within the coordinate region of scene object 2, determined by the sounding area of scene object 2 as judged by that module; and the coordinate of voice N is a specific position (xN, yN) within the coordinate region of scene object N, determined by the sounding area of scene object N as judged by that module.
  • with such a device, the voice can be emitted from the actual sounding area of the corresponding displayed scene object; especially when the TV size becomes larger, the voice is no longer emitted simply from the bottom or the side of the TV, so the voice moves across the TV with the scene object, faithfully restoring the video shooting scene and presenting the viewer with a better sense of presence.
  • FIG. 1 is a schematic diagram showing the composition of a video signal processed by the present invention
  • FIG. 2 is a schematic diagram of image pixel coordinates processed by the present invention
  • FIG. 3 is a diagram of an actual scene according to the present invention.
  • FIG. 4 is a schematic diagram of the structure of the device of the present invention.
  • the invention provides a voice and image acquisition and encoding method comprising the following steps:
  • Step 1: the image in the video signal collected by the video signal acquisition module is composed of an m-row by n-column matrix of pixels, and each image pixel is located by its pixel coordinate values, as shown in FIG. 2;
  • Step 2: when a picture is taken, image-area coordinate values are used to represent a specific region of the image, and the coordinates of the objects in the image are determined from the differences among the pixel sensing signals collected by the video signal acquisition module together with the pixel coordinate values and the image-area coordinate values;
  • Step 3: the coordinate information of the voice in the video signal is matched with the coordinates of the objects in the image.
  • the image-area coordinate values in Step 2 are expressed as (x1, y1; x2, y2), where (x1, y1; x2, y2) denotes the image region covered by rows x1 to x2 and columns y1 to y2.
  • within the image region, the coordinate region of scene object 1 is expressed as (x11, y11; x12, y12), the coordinate region of scene object 2 as (x21, y21; x22, y22), ..., and the coordinate region of scene object N as (xN1, yN1; xN2, yN2);
  • the coordinate of voice 1 is a specific position (x1, y1) within the coordinate region of scene object 1, the (x1, y1) position depending on the sounding area of scene object 1; the coordinate of voice 2 is a specific position (x2, y2) within the coordinate region of scene object 2, the (x2, y2) position depending on the sounding area of scene object 2; and the coordinate of voice N is a specific position (xN, yN) within the coordinate region of scene object N, the (xN, yN) position depending on the sounding area of scene object N.
  • the video signal collected by the video signal acquisition module includes an image signal, a voice signal, a synchronization signal, and a coordinate signal, where:
  • Image signal: contains the image information used to present the image;
  • Voice signal: contains the voice information used to present the voice;
  • Synchronization signal: contains the line and field synchronization information of the image signal, to ensure that the image is displayed normally, as well as image-voice synchronization information, to ensure that the corresponding voice is played back in sync when the TV presents the image;
  • Coordinate signal: contains the coordinate information of the voice, which is matched with the coordinates of the objects in the image.
  • the coordinate signal may exist as a type of signal independently of the image signal, the voice signal, and the synchronization signal, or may be incorporated into any of the image signal, the voice signal, and the synchronization signal.
  • a voice and image acquisition and encoding device includes a video signal acquisition module, an image scene-region coordinate value generation module, an image scene-region audio coordinate value judgment module, a module for matching image scene-region coordinates with scene-region audio coordinates, and a processed video signal playback module.
  • the output of the video signal acquisition module is connected to the image scene-region coordinate value generation module and the image scene-region audio coordinate value judgment module; the outputs of the image scene-region coordinate value generation module and the image scene-region audio coordinate value judgment module are connected to the module for matching scene-region coordinates with scene-region audio coordinates; and the module for matching image scene-region coordinates with scene-region audio coordinates is connected to the processed video signal playback module.
  • the video signal acquisition module includes an image acquisition sensor and a sound collection sensor.
  • the coordinate region of scene object 1 is generated by the image scene-region coordinate value generation module as (x11, y11; x12, y12), the coordinate region of scene object 2 as (x21, y21; x22, y22), ..., and the coordinate region of scene object N as (xN1, yN1; xN2, yN2);
  • the coordinate of voice 1 is a specific position (x1, y1) within the coordinate region of scene object 1, determined by the sounding area of scene object 1 as judged by the image scene-region audio coordinate value judgment module; the coordinate of voice 2 is a specific position (x2, y2) within the coordinate region of scene object 2, determined by the sounding area of scene object 2 as judged by that module; and the coordinate of voice N is a specific position (xN, yN) within the coordinate region of scene object N, determined by the sounding area of scene object N as judged by that module.
  • in this way the voice can be emitted from the actual sounding area of the corresponding displayed scene object; especially when the TV size becomes larger, the voice is no longer emitted simply from the bottom or the side of the TV, so the voice moves across the TV with the scene object, faithfully restoring the video shooting scene and presenting the viewer with a better sense of presence.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Processing (AREA)

Abstract

The present invention discloses a voice and image acquisition and encoding method and device. The method includes: having the image in the video signal collected by the video signal acquisition module be composed of an m-row by n-column matrix of pixels, with each image pixel located by its pixel coordinate values; when a picture is taken, using image-area coordinate values to represent a specific region of the image, and determining the coordinates of the objects in the image from the differences among the pixel sensing signals collected by the video signal acquisition module together with the pixel coordinate values and the image-area coordinate values; and matching the coordinate information of the voice in the video signal with the coordinates of the objects in the image. By assigning coordinate values to the voice, and when used with a suitable television (TV) device, the present invention enables the voice, during playback of the video signal, to be emitted from the actual sounding area of the corresponding displayed scene object; especially as the TV size becomes larger, the voice is no longer emitted simply from the bottom or the side of the TV, so the voice moves across the TV with the scene object, faithfully restoring the sense of presence of the video shooting scene.

Description

Voice and image acquisition and encoding method and device
Technical Field
The present invention relates to the field of image processing technologies, and in particular to a voice and image acquisition and encoding method and device.
Background Art
The development of flat-panel display technology has made devices such as televisions (TVs) ubiquitous. A TV is an interactive audio-visual entertainment device that can not only display vivid images but also produce matching sound. For the image and voice of TVs currently on the market, the acquisition and encoding process is: image source → image acquisition device → analog-to-digital conversion → image coding → digital image.
Display on a TV is the reverse of the above process: the digital image undergoes DAC (digital-to-analog conversion) and is then sent to the display module, which displays the image. Image acquisition is explained below using an ordinary still camera as an example:
When a scene is photographed, the camera's CCD/CMOS photosensitive sensor (the video signal acquisition module, hereinafter referred to as the Sensor) is exposed to the scene. Because the intensity and color of the light differ across different areas of the scene, the signals generated by the corresponding areas of the Sensor also differ, so the color and brightness information of the photographed scene is recorded; at the same time, the positions of the objects within the scene are recorded as well.
The scene corresponds spatially to the Sensor: each object image corresponds to a number of Sensor Pixels (hereinafter referred to as Pixels), and since the Pixels are regularly arranged on the Sensor, the position of an object within the scene can be located from the Pixel coordinate information.
Likewise, the voice acquisition and encoding process for video is: voice source → voice acquisition device → analog-to-digital conversion → voice coding → digital voice. When the voice needs to be played back, the process is reversed: the digital voice undergoes DAC (digital-to-analog conversion) and is then sent to a speaker for playback.
Photographing with a camera alone simply captures images; recording with a microphone alone simply captures speech; when a camera and a microphone work together, voice plus image, i.e. video, is produced. A conventional video signal contains an image signal, a voice signal, and a synchronization signal, but its image and voice information are encoded in a simple way and carry limited content, so it cannot give the viewer a truly realistic sense of presence.
Summary of the Invention
In view of the above drawbacks of the prior art, the technical problem to be solved by the present invention is to provide a voice and image acquisition and encoding method and device that, by changing the way the image and voice information are encoded, further enriches the content of the image and voice information and gives the viewer a more realistic sense of presence.
To achieve the above object, the present invention provides a voice and image acquisition and encoding method comprising the following steps:
Step 1: the image in the video signal collected by the video signal acquisition module is composed of an m-row by n-column matrix of pixels, and each image pixel is located by its pixel coordinate values;
Step 2: when a picture is taken, image-area coordinate values are used to represent a specific region of the image, and the coordinates of the objects in the image are determined from the differences among the pixel sensing signals collected by the video signal acquisition module together with the pixel coordinate values and the image-area coordinate values;
Step 3: the coordinate information of the voice in the video signal is matched with the coordinates of the objects in the image.
In the above voice and image acquisition and encoding method, the image-area coordinate values in Step 2 are expressed as (x1, y1; x2, y2), where (x1, y1; x2, y2) denotes the image region covered by rows x1 to x2 and columns y1 to y2.
In the above voice and image acquisition and encoding method, within the image region, the coordinate region of scene object 1 is expressed as (x11, y11; x12, y12), the coordinate region of scene object 2 as (x21, y21; x22, y22), ..., and the coordinate region of scene object N as (xN1, yN1; xN2, yN2); the coordinate of voice 1 is a specific position (x1, y1) within the coordinate region of scene object 1, the (x1, y1) position depending on the sounding area of scene object 1; the coordinate of voice 2 is a specific position (x2, y2) within the coordinate region of scene object 2, the (x2, y2) position depending on the sounding area of scene object 2; and the coordinate of voice N is a specific position (xN, yN) within the coordinate region of scene object N, the (xN, yN) position depending on the sounding area of scene object N.
In the above voice and image acquisition and encoding method, the video signal collected by the video signal acquisition module includes an image signal, a voice signal, a synchronization signal, and a coordinate signal.
In the above voice and image acquisition and encoding method, the coordinate signal may exist as a separate class of signal independent of the image signal, the voice signal, and the synchronization signal, or may be encoded into any one of the image signal, the voice signal, or the synchronization signal.
A voice and image acquisition and encoding device comprises a video signal acquisition module, an image scene-region coordinate value generation module, an image scene-region audio coordinate value judgment module, a module for matching image scene-region coordinates with scene-region audio coordinates, and a processed video signal playback module; the output of the video signal acquisition module is connected to the image scene-region coordinate value generation module and the image scene-region audio coordinate value judgment module; the outputs of the image scene-region coordinate value generation module and the image scene-region audio coordinate value judgment module are connected to the module for matching scene-region coordinates with scene-region audio coordinates; and the module for matching image scene-region coordinates with scene-region audio coordinates is connected to the processed video signal playback module.
In the above voice and image acquisition and encoding device, the video signal acquisition module includes an image acquisition sensor and a sound acquisition sensor.
In the above voice and image acquisition and encoding device, within the image region collected by the video signal acquisition module, the coordinate region of scene object 1 is generated by the image scene-region coordinate value generation module as (x11, y11; x12, y12), the coordinate region of scene object 2 as (x21, y21; x22, y22), ..., and the coordinate region of scene object N as (xN1, yN1; xN2, yN2); the coordinate of voice 1 is a specific position (x1, y1) within the coordinate region of scene object 1, the (x1, y1) position being determined by the sounding area of scene object 1 as judged by the image scene-region audio coordinate value judgment module; the coordinate of voice 2 is a specific position (x2, y2) within the coordinate region of scene object 2, the (x2, y2) position being determined by the sounding area of scene object 2 as judged by the image scene-region audio coordinate value judgment module; and the coordinate of voice N is a specific position (xN, yN) within the coordinate region of scene object N, the (xN, yN) position being determined by the sounding area of scene object N as judged by the image scene-region audio coordinate value judgment module.
The beneficial effects of the present invention are as follows:
By assigning coordinate values to the voice, and when used with a suitable TV device, the present invention enables the voice, during playback of the video signal, to be emitted from the actual sounding area of the corresponding displayed scene object; especially as the TV size becomes larger, the voice is no longer emitted simply from the bottom or the side of the TV, so the voice moves across the TV with the scene object, faithfully restoring the video shooting scene and presenting the viewer with a better sense of presence.
The concept, specific structure, and resulting technical effects of the present invention are further described below with reference to the accompanying drawings, so that the objects, features, and effects of the present invention can be fully understood.
Brief Description of the Drawings
The above and other aspects, features, and advantages of embodiments of the present invention will become clearer from the following description taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic diagram of the composition of a video signal processed by the present invention;
FIG. 2 is a schematic diagram of image pixel coordinates processed by the present invention;
FIG. 3 is a diagram of an actual scene according to the present invention;
FIG. 4 is a schematic diagram of the structure of the device of the present invention.
Detailed Description of the Embodiments
Embodiments of the present invention are described in detail below with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and should not be construed as limited to the specific embodiments set forth herein. Rather, these embodiments are provided to explain the principles of the present invention and their practical application, so that others skilled in the art can understand the various embodiments of the present invention and the various modifications suited to a particular intended use.
The present invention proposes a voice and image acquisition and encoding method comprising the following steps:
Step 1: the image in the video signal collected by the video signal acquisition module is composed of an m-row by n-column matrix of pixels, and each image pixel is located by its pixel coordinate values, as shown in FIG. 2;
Step 2: when a picture is taken, image-area coordinate values are used to represent a specific region of the image, and the coordinates of the objects in the image are determined from the differences among the pixel sensing signals collected by the video signal acquisition module together with the pixel coordinate values and the image-area coordinate values;
Step 3: the coordinate information of the voice in the video signal is matched with the coordinates of the objects in the image.
In this embodiment, the image-area coordinate values in Step 2 are expressed as (x1, y1; x2, y2), where (x1, y1; x2, y2) denotes the image region covered by rows x1 to x2 and columns y1 to y2.
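The disclosure does not spell out how the object coordinates are obtained from the pixel sensing-signal differences mentioned in Step 2. Purely as an illustrative sketch, and not as the patent's method, the following Python snippet (the function name, the thresholding rule, and the example data are assumptions introduced here) shows one simple way to turn a map of per-pixel signal differences into an (x1, y1; x2, y2) region of the kind defined above:

    # Illustrative sketch only: obtain a scene object's (x1, y1; x2, y2) region
    # from per-pixel sensing-signal differences. The threshold-and-bounding-box
    # rule is an assumption for illustration, not the patent's procedure.
    def object_region(diff, threshold):
        """diff is an m-row by n-column matrix (list of lists) of pixel
        sensing-signal differences. Returns (x1, y1, x2, y2), i.e. the rows
        x1..x2 and columns y1..y2 bounding the pixels whose difference exceeds
        the threshold, or None if no pixel qualifies."""
        rows = [i for i, row in enumerate(diff) for v in row if v > threshold]
        cols = [j for row in diff for j, v in enumerate(row) if v > threshold]
        if not rows:
            return None
        return (min(rows), min(cols), max(rows), max(cols))

    # Example: a 4-row by 6-column image with one object in rows 1-2, columns 2-4.
    diff = [
        [0, 0, 0, 0, 0, 0],
        [0, 0, 5, 7, 6, 0],
        [0, 0, 4, 8, 5, 0],
        [0, 0, 0, 0, 0, 0],
    ]
    print(object_region(diff, threshold=1))  # -> (1, 2, 2, 4), read as (x1, y1; x2, y2)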
As shown in FIG. 3, in this embodiment, within the image region, the coordinate region of scene object 1 is expressed as (x11, y11; x12, y12), the coordinate region of scene object 2 as (x21, y21; x22, y22), ..., and the coordinate region of scene object N as (xN1, yN1; xN2, yN2); the coordinate of voice 1 is a specific position (x1, y1) within the coordinate region of scene object 1, the (x1, y1) position depending on the sounding area of scene object 1; the coordinate of voice 2 is a specific position (x2, y2) within the coordinate region of scene object 2, the (x2, y2) position depending on the sounding area of scene object 2; and the coordinate of voice N is a specific position (xN, yN) within the coordinate region of scene object N, the (xN, yN) position depending on the sounding area of scene object N.
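As a minimal sketch of the matching in Step 3 under the coordinate convention above (the class and function names below are hypothetical and are not taken from the disclosure), each scene object can carry its coordinate region, each voice its specific sounding position, and matching then reduces to finding the region that contains a given voice position:

    # Minimal sketch of matching voice coordinates to scene-object regions (Step 3).
    # All names are illustrative assumptions.
    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class SceneRegion:
        index: int   # scene object 1..N
        x1: int      # first row of the region
        y1: int      # first column of the region
        x2: int      # last row of the region
        y2: int      # last column of the region

        def contains(self, x: int, y: int) -> bool:
            return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2

    @dataclass
    class VoiceCoordinate:
        x: int       # row of the sounding position
        y: int       # column of the sounding position

    def match_voice_to_scene(voice: VoiceCoordinate,
                             scenes: List[SceneRegion]) -> Optional[SceneRegion]:
        """Return the scene region whose area contains the voice's sounding position."""
        for scene in scenes:
            if scene.contains(voice.x, voice.y):
                return scene
        return None

    # Example: two scene regions; the voice sounds inside scene object 2.
    scenes = [SceneRegion(1, 0, 0, 99, 199), SceneRegion(2, 100, 200, 299, 399)]
    voice = VoiceCoordinate(x=150, y=250)
    matched = match_voice_to_scene(voice, scenes)
    print(matched.index if matched else "no match")  # -> 2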
As shown in FIG. 1, in this embodiment, the video signal collected by the video signal acquisition module includes an image signal, a voice signal, a synchronization signal, and a coordinate signal, where:
1. Image signal: contains the image information used to present the image;
2. Voice signal: contains the voice information used to present the voice;
3. Synchronization signal: contains the line and field synchronization information of the image signal, to ensure that the image is displayed normally, as well as image-voice synchronization information, to ensure that the corresponding voice is played back in sync when the TV presents the image;
4. Coordinate signal: contains the coordinate information of the voice, which is matched with the coordinates of the objects in the image.
The coordinate signal may exist as a separate class of signal independent of the image signal, the voice signal, and the synchronization signal, or may be encoded into any one of the image signal, the voice signal, or the synchronization signal.
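The disclosure leaves the concrete bitstream layout open, stating only that the coordinate signal may travel as its own class of signal or be embedded in one of the other three. Purely as an illustrative sketch, assuming a simple per-frame container (the field names and structure are not the patent's format), the four signal components could be bundled like this:

    # Illustrative per-frame container for the four signal components named in the
    # text: image, voice, synchronization, coordinate. Layout and field names are
    # assumptions; only the standalone-versus-embedded choice mirrors the text.
    def encode_frame(image_bytes, voice_bytes, sync_info, voice_coordinates,
                     embed_coordinates_in_voice=False):
        """Bundle one frame's signals; coordinates are standalone or embedded."""
        frame = {
            "image_signal": image_bytes,
            "voice_signal": {"samples": voice_bytes},
            "sync_signal": sync_info,   # line/field sync plus image-voice sync
        }
        if embed_coordinates_in_voice:
            # Coordinate information encoded inside the voice signal.
            frame["voice_signal"]["coordinates"] = voice_coordinates
        else:
            # Coordinate information carried as a separate class of signal.
            frame["coordinate_signal"] = voice_coordinates
        return frame

    frame = encode_frame(
        image_bytes=b"",        # encoded image data (placeholder)
        voice_bytes=b"",        # encoded voice data (placeholder)
        sync_info={"line": 1, "field": 1, "av_sync_ts": 0.04},
        voice_coordinates=[{"voice": 1, "x": 150, "y": 250}],
    )
    print(sorted(frame.keys()))
    # -> ['coordinate_signal', 'image_signal', 'sync_signal', 'voice_signal']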
As shown in FIG. 4, a voice and image acquisition and encoding device comprises a video signal acquisition module, an image scene-region coordinate value generation module, an image scene-region audio coordinate value judgment module, a module for matching image scene-region coordinates with scene-region audio coordinates, and a processed video signal playback module; the output of the video signal acquisition module is connected to the image scene-region coordinate value generation module and the image scene-region audio coordinate value judgment module; the outputs of the image scene-region coordinate value generation module and the image scene-region audio coordinate value judgment module are connected to the module for matching scene-region coordinates with scene-region audio coordinates; and the module for matching image scene-region coordinates with scene-region audio coordinates is connected to the processed video signal playback module.
In this embodiment, the video signal acquisition module includes an image acquisition sensor and a sound acquisition sensor.
In this embodiment, within the image region collected by the video signal acquisition module, the coordinate region of scene object 1 is generated by the image scene-region coordinate value generation module as (x11, y11; x12, y12), the coordinate region of scene object 2 as (x21, y21; x22, y22), ..., and the coordinate region of scene object N as (xN1, yN1; xN2, yN2); the coordinate of voice 1 is a specific position (x1, y1) within the coordinate region of scene object 1, the (x1, y1) position being determined by the sounding area of scene object 1 as judged by the image scene-region audio coordinate value judgment module; the coordinate of voice 2 is a specific position (x2, y2) within the coordinate region of scene object 2, the (x2, y2) position being determined by the sounding area of scene object 2 as judged by the image scene-region audio coordinate value judgment module; and the coordinate of voice N is a specific position (xN, yN) within the coordinate region of scene object N, the (xN, yN) position being determined by the sounding area of scene object N as judged by the image scene-region audio coordinate value judgment module.
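To make the connections of FIG. 4 concrete, the skeletal sketch below mirrors the described data flow (all class and method names are hypothetical; in a real device these modules would be hardware or firmware blocks): the acquisition module feeds both the scene-region coordinate value generation module and the scene-region audio coordinate value judgment module, their outputs are matched, and the result is handed to the playback module.

    # Skeletal sketch of the FIG. 4 module pipeline. Names and the toy data are
    # illustrative assumptions that only mirror the connections described above.
    class VideoSignalAcquisitionModule:
        def capture(self):
            # Placeholder: raw image pixels plus recorded audio.
            return {"pixels": [[0] * 8 for _ in range(6)], "audio": b""}

    class SceneRegionCoordinateGenerator:
        def generate(self, capture):
            # Placeholder: one region covering rows 1-2, columns 2-4.
            return [{"scene": 1, "region": (1, 2, 2, 4)}]

    class SceneRegionAudioCoordinateJudge:
        def judge(self, capture):
            # Placeholder: voice 1 located at the sounding position (2, 3).
            return [{"voice": 1, "position": (2, 3)}]

    class CoordinateMatcher:
        def match(self, regions, voices):
            # Pair each voice with the region that contains its sounding position.
            matched = []
            for v in voices:
                x, y = v["position"]
                for r in regions:
                    x1, y1, x2, y2 = r["region"]
                    if x1 <= x <= x2 and y1 <= y <= y2:
                        matched.append({"voice": v["voice"], "scene": r["scene"]})
            return matched

    class ProcessedVideoSignalPlayer:
        def play(self, capture, matches):
            print("playing with voice/scene matches:", matches)

    # Wire the modules as in FIG. 4.
    capture = VideoSignalAcquisitionModule().capture()
    regions = SceneRegionCoordinateGenerator().generate(capture)
    voices = SceneRegionAudioCoordinateJudge().judge(capture)
    matches = CoordinateMatcher().match(regions, voices)
    ProcessedVideoSignalPlayer().play(capture, matches)  # voice 1 -> scene 1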
By assigning coordinate values to the voice, and when used with a suitable TV device, the present invention enables the voice, during playback of the video signal, to be emitted from the actual sounding area of the corresponding displayed scene object; especially as the TV size becomes larger, the voice is no longer emitted simply from the bottom or the side of the TV, so the voice moves across the TV with the scene object, faithfully restoring the video shooting scene and presenting the viewer with a better sense of presence.
Preferred embodiments of the present invention have been described in detail above. It should be understood that a person of ordinary skill in the art can make numerous modifications and variations according to the concept of the present invention without creative effort. Therefore, any technical solution that a person skilled in the art can obtain, on the basis of the prior art, through logical analysis, reasoning, or limited experimentation in accordance with the concept of the present invention shall fall within the scope of protection defined by the claims.

Claims (8)

  1. A voice and image acquisition and encoding method, comprising the following steps:
    Step 1: the image in the video signal collected by the video signal acquisition module is composed of an m-row by n-column matrix of pixels, and each image pixel is located by its pixel coordinate values;
    Step 2: when a picture is taken, image-area coordinate values are used to represent a specific region of the image, and the coordinates of the objects in the image are determined from the differences among the pixel sensing signals collected by the video signal acquisition module together with the pixel coordinate values and the image-area coordinate values;
    Step 3: the coordinate information of the voice in the video signal is matched with the coordinates of the objects in the image.
  2. The voice and image acquisition and encoding method according to claim 1, wherein the image-area coordinate values in Step 2 are expressed as (x1, y1; x2, y2), where (x1, y1; x2, y2) denotes the image region covered by rows x1 to x2 and columns y1 to y2.
  3. The voice and image acquisition and encoding method according to claim 2, wherein, within the image region, the coordinate region of scene object 1 is expressed as (x11, y11; x12, y12), the coordinate region of scene object 2 as (x21, y21; x22, y22), ..., and the coordinate region of scene object N as (xN1, yN1; xN2, yN2); wherein the coordinate of voice 1 is a specific position (x1, y1) within the coordinate region of scene object 1, the (x1, y1) position depending on the sounding area of scene object 1; the coordinate of voice 2 is a specific position (x2, y2) within the coordinate region of scene object 2, the (x2, y2) position depending on the sounding area of scene object 2; and the coordinate of voice N is a specific position (xN, yN) within the coordinate region of scene object N, the (xN, yN) position depending on the sounding area of scene object N.
  4. The voice and image acquisition and encoding method according to claim 1, wherein the video signal collected by the video signal acquisition module includes an image signal, a voice signal, a synchronization signal, and a coordinate signal.
  5. The voice and image acquisition and encoding method according to claim 4, wherein the coordinate signal may exist as a separate class of signal independent of the image signal, the voice signal, and the synchronization signal, or may be encoded into any one of the image signal, the voice signal, or the synchronization signal.
  6. A voice and image acquisition and encoding device, comprising a video signal acquisition module, an image scene-region coordinate value generation module, an image scene-region audio coordinate value judgment module, a module for matching image scene-region coordinates with scene-region audio coordinates, and a processed video signal playback module, wherein the output of the video signal acquisition module is connected to the image scene-region coordinate value generation module and the image scene-region audio coordinate value judgment module; the outputs of the image scene-region coordinate value generation module and the image scene-region audio coordinate value judgment module are connected to the module for matching scene-region coordinates with scene-region audio coordinates; and the module for matching image scene-region coordinates with scene-region audio coordinates is connected to the processed video signal playback module.
  7. The voice and image acquisition and encoding device according to claim 6, wherein the video signal acquisition module includes an image acquisition sensor and a sound acquisition sensor.
  8. The voice and image acquisition and encoding device according to claim 6, wherein, within the image region collected by the video signal acquisition module, the coordinate region of scene object 1 is generated by the image scene-region coordinate value generation module as (x11, y11; x12, y12), the coordinate region of scene object 2 as (x21, y21; x22, y22), ..., and the coordinate region of scene object N as (xN1, yN1; xN2, yN2); wherein the coordinate of voice 1 is a specific position (x1, y1) within the coordinate region of scene object 1, the (x1, y1) position being determined by the sounding area of scene object 1 as judged by the image scene-region audio coordinate value judgment module; the coordinate of voice 2 is a specific position (x2, y2) within the coordinate region of scene object 2, the (x2, y2) position being determined by the sounding area of scene object 2 as judged by the image scene-region audio coordinate value judgment module; and the coordinate of voice N is a specific position (xN, yN) within the coordinate region of scene object N, the (xN, yN) position being determined by the sounding area of scene object N as judged by the image scene-region audio coordinate value judgment module.
PCT/CN2018/073488 2017-12-28 2018-01-19 一种语音图像采集编码方法及装置 WO2019127759A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711458492.9 2017-12-28
CN201711458492.9A CN108156499A (zh) 2017-12-28 2017-12-28 一种语音图像采集编码方法及装置

Publications (1)

Publication Number Publication Date
WO2019127759A1 true WO2019127759A1 (zh) 2019-07-04

Family

ID=62463462

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/073488 WO2019127759A1 (zh) 2017-12-28 2018-01-19 一种语音图像采集编码方法及装置

Country Status (2)

Country Link
CN (1) CN108156499A (zh)
WO (1) WO2019127759A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100135510A1 (en) * 2008-12-02 2010-06-03 Electronics And Telecommunications Research Institute Apparatus for generating and playing object based audio contents
CN103763578A (zh) * 2014-01-10 2014-04-30 北京酷云互动科技有限公司 一种节目关联信息推送方法和装置
CN104065869A (zh) * 2013-03-18 2014-09-24 三星电子株式会社 在电子装置中与播放音频组合地显示图像的方法
CN105379302A (zh) * 2013-07-19 2016-03-02 索尼公司 信息处理设备和信息处理方法

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101844511B1 (ko) * 2010-03-19 2018-05-18 삼성전자주식회사 입체 음향 재생 방법 및 장치
CN103905810B (zh) * 2014-03-17 2017-12-12 北京智谷睿拓技术服务有限公司 多媒体处理方法及多媒体处理装置
CN105979470B (zh) * 2016-05-30 2019-04-16 北京奇艺世纪科技有限公司 全景视频的音频处理方法、装置和播放系统
CN106162206A (zh) * 2016-08-03 2016-11-23 北京疯景科技有限公司 全景录制、播放方法及装置

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100135510A1 (en) * 2008-12-02 2010-06-03 Electronics And Telecommunications Research Institute Apparatus for generating and playing object based audio contents
CN104065869A (zh) * 2013-03-18 2014-09-24 三星电子株式会社 在电子装置中与播放音频组合地显示图像的方法
CN105379302A (zh) * 2013-07-19 2016-03-02 索尼公司 信息处理设备和信息处理方法
CN103763578A (zh) * 2014-01-10 2014-04-30 北京酷云互动科技有限公司 一种节目关联信息推送方法和装置

Also Published As

Publication number Publication date
CN108156499A (zh) 2018-06-12

Similar Documents

Publication Publication Date Title
JP4310916B2 (ja) 映像表示装置
CN106789991B (zh) 一种基于虚拟场景的多人互动网络直播方法及系统
US9160938B2 (en) System and method for generating three dimensional presentations
JP7447077B2 (ja) 映像ストリームにおける動的画像コンテンツ置換のための方法およびシステム
JPH11219446A (ja) 映像音響再生システム
CN101877767A (zh) 一种六通道视频源生成三维全景连续视频的方法和系统
WO2020090458A1 (ja) 表示装置、及び表示制御方法
WO2001035675A1 (en) Virtual presentation system and method
JP2007295559A (ja) ビデオ処理および表示
CN115118880A (zh) 一种基于沉浸式视频终端搭建的xr虚拟拍摄系统
CN112532963B (zh) 一种基于ar的三维全息实时互动系统及方法
KR101839406B1 (ko) 디스플레이장치 및 그 제어방법
WO2019127759A1 (zh) 一种语音图像采集编码方法及装置
CN113382292B (zh) 一种法院庭审公开展示方法
WO2020184316A1 (ja) 情報処理装置、情報処理方法、及びプログラム
JP2000358222A (ja) 表示表現装置および情報伝送方式
JP2004007284A (ja) 映像記録システム、プログラム及び記録媒体
EP1542471A4 (en) IMAGE PROCESSING DEVICE, METHOD, INFORMATION PROCESSING DEVICE, METHOD, RECORDING MEDIUM AND PROGRAM
CN202872950U (zh) Led显示模组、led电视及led电视系统
CN219802409U (zh) 一种xr虚拟制片实时合成系统
KR102652371B1 (ko) 동작 배우기 동영상 제작 시스템
JP2000149041A (ja) 動画像加工装置および方法および記憶媒体
WO2021082742A1 (zh) 一种数据显示方法及媒体处理装置
TWI836141B (zh) 即時三維影像顯示的直播方法
US20240163414A1 (en) Information processing apparatus, information processing method, and system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18896611

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18896611

Country of ref document: EP

Kind code of ref document: A1